Posted on 04/29/2014 2:48:17 AM PDT by Gene Eric
In 2010, the best systems achieved a word error rate (WER) of around 25% on recorded phone conversations, meaning that roughly a quarter of the words were misrecognized. This is actually pretty good, since people often speak very quickly over lots of background noise and add lots of disfluencies (e.g., like uh yeah so I I ok right). By comparison, the WER for Google Voice data, which includes search queries and dictated messages, was 16%.
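For readers unfamiliar with the metric, here is a minimal sketch of how WER (word error rate) is commonly computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer's output, divided by the number of reference words. The function name `word_error_rate` and the plain whitespace tokenization are assumptions for illustration. The excerpt does not say how the systems it mentions were scored, and real evaluations usually normalize text first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative sketch only: splits on whitespace and does no text
    normalization (case, punctuation), which real scoring tools usually do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# One wrong word out of four reference words.
print(word_error_rate("call me back tomorrow", "call me bag tomorrow"))  # 0.25
```

Because insertions count as errors, WER can exceed 100% when the output contains many extra words, so "a quarter of the words were misrecognized" is a convenient approximation of what a 25% WER means.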
Compared to other areas of artificial intelligence, speech is a mature field, especially since it was commercialized so early by IBM and Dragon Systems. Because of this, by the time 2010 rolled around, even incremental progress had become quite difficult. The four or five very best new ideas over the past few decades have yielded 5-10% improvements in the state of the art. Speech conferences include hundreds or even thousands of presentations on complex models that seem to work a bit better for one genre but not another. I decided to write my dissertation about an easier topic.
(Excerpt) Read more at datascience.berkeley.edu ...
Dragon Systems ping.