Exactly. The question is whether, in normal VoIP, the system sends as many packets for five seconds of silence as it sends for five seconds of sound. If the answer is yes, then the packet stream looks the same whether or not anyone is talking, and an encrypted version of the message should "mask" the cadence.
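To make the question concrete, here's a rough sketch of the two cases. It assumes a 20 ms packet interval, and it assumes that a system with silence suppression sends nothing at all during silence (real ones may still send the occasional comfort-noise update). The function name and parameters are made up for illustration; this isn't any particular VoIP stack.

```python
def packets_sent(seconds, speaking, silence_suppression, frame_ms=20):
    """Count packets for a stretch of audio that is all speech or all silence.

    frame_ms=20 is an assumed packet interval, not a property of every system.
    """
    frames = int(seconds * 1000) // frame_ms
    if silence_suppression and not speaking:
        # Simplification: treat suppressed silence as zero packets.
        return 0
    return frames


for suppression in (False, True):
    talk = packets_sent(5, speaking=True, silence_suppression=suppression)
    quiet = packets_sent(5, speaking=False, silence_suppression=suppression)
    print(f"suppression={suppression}: speech={talk} packets, silence={quiet} packets")
```

Without suppression the two counts match, so the cadence isn't visible in the packet stream. With suppression the gap shows up no matter how well the payload is encrypted.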
From the standpoint of the VoIP carrier, it doesn't know what you are sending; all it sees is a stream of packets. I'm not sure how it would differentiate between encrypted voice and a chatterbox - if the data rates are the same, there's no difference to the network.
There's no such thing as perfect silence at the mouthpiece, and I'm not even sure the compression algorithm would be able to compress the "silence" more than the voice, especially if the background noise happened to be white noise. White noise is random, and randomness doesn't compress well. White noise is also commonly piped into sensitive meeting rooms.
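The "randomness doesn't compress well" part is easy to check with a general-purpose compressor. This is only a sketch: zlib is lossless and is not a voice codec, so it says nothing certain about how a real VoIP codec treats background noise. The five seconds of 8 kHz, 8-bit audio is an assumed format chosen for round numbers.

```python
import random
import zlib

SAMPLES = 8000 * 5  # five seconds at an assumed 8 kHz, one byte per sample


def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means it compressed better)."""
    return len(zlib.compress(data, 9)) / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable
silence = bytes(SAMPLES)  # idealized silence: every sample is zero
white_noise = bytes(rng.randrange(256) for _ in range(SAMPLES))

silence_ratio = compressed_ratio(silence)
noise_ratio = compressed_ratio(white_noise)

print(f"perfect silence: {silence_ratio:.3f} of original size")
print(f"white noise:     {noise_ratio:.3f} of original size")
```

The all-zero buffer shrinks to almost nothing, while the noise buffer stays at roughly its original size. Real background noise at a mouthpiece would fall somewhere in between.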