This is also a "local" exploit, requiring the hacker to be close to the target device. It is also a general attack that cannot single out a specific user's device from among all others in the area unless the device is isolated. If the attacker is close enough to attack the device, he's already close enough to eavesdrop on any conversation the target may be having. In addition, texting using this technique would require zero ambient background conversation, since any nearby speech would override the intended text. In other words, this sounds like a neat but extremely impractical trick, not a hack that could accomplish anything malicious.
My question is why would anyone digitize microphone voice data in such a way as to be sensitive to ultrasound? Even if they want to use a microphone which happens to be sensitive to those frequencies, they need to low-pass filter the output as a first step to getting good-quality digital data. If they don't do it in analog, which in this day and age might not be cheap in context, it is trivial to do it digitally. This should never be a problem.
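To illustrate the "trivial to do it digitally" part, here is a rough sketch of a windowed-sinc FIR low-pass filter in plain Python. Everything specific in it is my own assumption rather than anything a real device is known to use: the 96 kHz sample rate, the 8 kHz cutoff, the 101 taps, and the 1 kHz and 30 kHz test tones. It also assumes the ultrasonic energy is still sitting above the cutoff in the sampled data; anything that has already aliased or otherwise landed in the voice band before sampling would pass straight through a filter like this.

```python
import math

def lowpass_taps(cutoff_hz, sample_rate_hz, num_taps=101):
    """Windowed-sinc FIR low-pass taps (Hamming window), scaled to unity DC gain."""
    fc = cutoff_hz / sample_rate_hz          # normalized cutoff, cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response, with the x == 0 limit handled explicitly.
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def fir_filter(samples, taps):
    """Direct-form FIR convolution; output has the same length as the input."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for k, t in enumerate(taps):
            if i - k >= 0:
                acc += t * samples[i - k]
        out.append(acc)
    return out

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

SAMPLE_RATE = 96_000   # assumed ADC rate, high enough to represent a 30 kHz tone
N = 2000
voice = [math.sin(2 * math.pi * 1_000 * n / SAMPLE_RATE) for n in range(N)]
ultra = [math.sin(2 * math.pi * 30_000 * n / SAMPLE_RATE) for n in range(N)]

taps = lowpass_taps(cutoff_hz=8_000, sample_rate_hz=SAMPLE_RATE)
voice_out = fir_filter(voice, taps)[len(taps):]   # drop the start-up transient
ultra_out = fir_filter(ultra, taps)[len(taps):]

print(f"1 kHz RMS after filter:  {rms(voice_out):.3f}")
print(f"30 kHz RMS after filter: {rms(ultra_out):.5f}")
```

The voice-band tone should come through essentially untouched while the 30 kHz tone is knocked down by orders of magnitude. A real product would more likely use a decimation filter built into the ADC or a DSP library than a hand-rolled loop like this.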