Skype Translator, a real-time language translation tool

Started by bill, May 28, 2014, 06:37:40 AM


bill

Quote: Skype Translator, a real-time language translation tool, will begin rolling out later this year

Skype Translator results from decades of work by the industry, years of work by our researchers, and now is being developed jointly by the Skype and Microsoft Translator teams. The demo showed near real-time audio translation from English to German and vice versa, combining Skype voice and IM technologies with Microsoft Translator, and neural network-based speech recognition. Skype Translator is a great example of why Microsoft invests in basic research. We've invested in speech recognition, automatic translation and machine learning technologies for more than a decade, and now they're emerging as important components in this more personal computing era.

Have you ever seen how poorly Google Voice transcribes voicemail? It's unintelligible. How is Microsoft going to do any better with translation and voice?

ergophobe

Saw this in Le Monde this morning. Amazing.

As you know, the historian in me sees 10 years as the short term and has trouble focussing on what something will do in the next 1-2 years...

So I'm not as dismissive as you are, while fully acknowledging that in the short term you are 100% right.

Google Voice transcriptions

First off, I don't think voice recognition is truly a Google strength. I'd bet they blow MS away on image processing, but MS might be ahead on voice - they've put a lot into it.

And even so, Google Voice transcriptions can be pretty good if the call quality is good - very often I don't need to check my messages at all because the email transcription is enough. Is it good enough for contracts and diplomacy? No. Is it good enough for figuring out when your guest is planning to show up? Generally. But yes, it's still fairly primitive.

The OCR Analogue

I remember that in 1993 I was commissioned to get some documents ready for republication. Since there were no electronic copies, I started by scanning the originals. It was abysmal. These were serif fonts on glossy paper and overwhelmingly it transcribed like this:

Aiiin nnininiii iiiiii iiiiihiiiii iriinnjjii

The error rate for OCR in these conditions was literally in the 90% range. Fast forward 10 years to the next time I had to do it and it had about three errors per page, roughly a 0.5% error rate. Today it is more accurate than all but the best typists.

Similarly, text search on Google Books was utterly useless for the first few years. By 2008, it had become a primary tool in my historical research and, though there were errors of course, it was overwhelmingly accurate (albeit always with issues on serif fonts).

Machine translation over time

At that same time, a British company pitched my publisher on doing French to English translations using their automated translation tool. As part of their pitch, they auto-translated the letter, which was passed around to gales of laughter and pinned to the bulletin board for amusement. The simple letter was completely botched, yet they thought they would translate complex literary and historic texts from the 16th century. It was absurd.

But for the first time in many years, I did one of my tests where I take a text in English and use an automated translator to translate it into French and then back into English, and the results were pretty damn good.

And the CAPTCHA Analogy

I'll throw out one more thing that you and I have discussed in the past - CAPTCHA. CAPTCHAs use three kinds of obfuscation: noise, distortion and boundary violations (i.e. connecting letters where they shouldn't connect). Computers are already better than humans at filtering for noise and distortion. CAPTCHAs still work only because we remain, for the short term, better at fixing boundary violations. Within a few years, the only way for a human to solve a CAPTCHA will be to run it through a computer ;-)

Relevance? I believe that in the relatively near term computers will also be better than humans at filtering noise and distortion in a voice signal (which, as the Google Voice transcripts show, they currently are not, at least not at Google), and it may well be that our cell phones will have "English to English" translation as a feature for bad calls. In a foreign language, it has been my experience that the phone is a challenge. I see it all the time: people who are fine in person cannot speak on the phone because a small amount of noise and distortion makes it hard for them to follow at all. Again, I think computers have huge potential here.

Yes, of course you're right overall, but I have been feeling for a couple of years that both of these technologies are approaching a takeoff point.

So I expect that it will be laughable for the first few years and good for jokes. It will then be serviceable for the next few years. And 10-15 years from now it will be better than people who "know" a second language, but not as good, of course, as professional translation or someone who has spent years living in the culture of their foreign language. And kids born 10 years from now will just take it as a given that they can call someone in any modern language and have a reasonable conversation, that they can walk up to the desk at a hotel in a foreign country, speak into their phone and communicate with the staff... and I wonder what that does for the hegemony of English and the long-term evolution of languages.

littleman

I keep thinking about how many devices from Star Trek seem to be actually happening or soon to be happening today. It's not quite the same thing, but it's a step closer to a universal translator.

I think Google Voice has some comedic value as it is right now, other than that it seems pretty useless -- still hopeful about the technology and someday a working portable app.

bill

Just the other week a Japanese colleague was all excited about a Microsoft-based machine translation software package. He was convinced that this product was a lot better than anything seen before. They were going to drop a lot of money on this package, but wanted to have me check first. After hearing the rave reviews I once again suspended my belief that machine translation is still crap, and read over the documents. It didn't take too long for me to figure out that it was still garbage.

I have no idea whether the technology used had any ties to the language engine being used on this voice product, but I'd think text would be a lot easier to handle than the spoken word. If you can't get the text right, how are you going to handle the rest?

Google has made great strides in OCR thru analyzing CAPTCHA on their site. They also have years of voice data from their Voice product. One would hope that these technologies would improve, and they have. I just have not seen it happen as quickly with machine translation.