As a developer of language understanding software, I find it frustrating to tackle problems that are not completely solvable with today’s technologies, and it’s always a challenge to evaluate the software in meaningful ways. Concepts like precision and recall are useful for comparing two different approaches to the same problem, but they don’t necessarily map well to a person’s experience of using the software. That experience may demand extremely high accuracy for some tasks but not for others, and it may have as much to do with the way the results are presented as with their quality. I have struggled with meaningful evaluation on every project I’ve ever worked on.
As a novice Korean speaker living in Seoul with a Korean wife, Korean in-laws, and Korean friends, I am also a frequent user of Google Translate, which tackles language translation, one of the thorniest language problems around. I often paste in an email or Facebook comment to get the gist of what was said, so I don’t have to bother somebody for a full translation unless it’s important. Sometimes the results are complete nonsense, and sometimes they are really funny, but mostly they are far from perfect yet still completely useful for my needs. (And as an atypical user who appreciates how difficult the problem is, my expectations are perhaps a lot lower than most people’s.)
At any rate, I was delighted to see a feedback drop-down (helpful | not helpful | offensive) in the latest Translate user interface. It seems like an obvious question to ask, but this is the first time I’ve seen such a feedback mechanism on an intelligent system like this one. I really like that it doesn’t ask how correct the answer was, but rather whether the (probably not completely accurate) results were good enough to help users with whatever they were trying to do. I’m sure they are collecting a ton of really great data to improve their tool in a variety of ways, and I can imagine applying the same idea to a lot of other artificially intelligent systems.
Finally, if anybody from the Google Translate team is reading this post, I’d like to make a request. I often share humorously poor translations with my wife, but I’d bet they are quite tame and mundane compared to what you’re collecting with the offensive button. I suspect you even have an internal “hall of shame” for this kind of thing. If you can share any of these without violating your privacy terms, I’m sure all your diehard users would appreciate a peek at the worst of the worst. At least I would. 🙂
Image courtesy Google Translate UI.