>>I think they already extract context from images.
Jason, that is 100% true in some contexts. We know from reporting that:
1. They are using text recognition on Street View images and including that information in Maps and Local.
2. They are trying to run image recognition on images and generate descriptions based purely on the image, without using metadata at all. There was a recent report about this on one of the Google blogs. Did you see it? It's really one of the more enlightening pieces.
I said 10 years only because progress in image recognition, though amazing, has always taken longer than people expected, and the error rate is still too high.
There's a famous story... I am probably going to get some of the details wrong, but it goes something like this: "In 1966, Marvin Minsky gave an undergrad a summer project: connect a camera to a computer and get it to describe what it sees. 48 years later, we're still working on it." Back then, people thought teaching computers to understand images and natural language would be easier than teaching one to beat a tournament chess player.
So they seem "close," but "close" in this field can mean years.