
Improve Swype's recognition accuracy with a category-based language model

July 2012

Swype is good at recognising text on a small mobile phone screen, even when you hit the wrong key, but it could be even better. In this article I explain how, using a category-based language model that predicts your next word.

Language models are statistical models of language that use learned word frequency distributions to predict what a user's input sentence will be. They are useful in speech recognition and machine translation, where they are combined with the user's input to guess what the user intended to say. One relatively new application of language models is recognising keyboard input on touch screen devices. The Swype keyboard is one popular example. Swype works by tracing a line from key to key on the keyboard. The keyboard is small, and the user does not always land exactly on the right key. To recover from this, Swype presumably uses a statistical model of the user's errors. So when the user traces a line, Swype would build a list of letter probability distributions, based on the proximity of keys to each vertex in the line. From Bayes' rule, you multiply these error probabilities together with the language model prediction for the word. This gives the probability that the user intended that word:

P(W|E) ∝ P(E1|W) · ... · P(En|W) · P(W)
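The scoring above can be sketched in a few lines of Python. The error distributions and unigram probabilities here are made-up illustrative numbers, not Swype's actual model:

```python
# Hypothetical per-vertex letter likelihoods P(Ei | letter): for each
# vertex of the traced line, the probability mass over nearby keys.
error_probs = [
    {"a": 0.7, "s": 0.2, "q": 0.1},   # vertex landed near 'a'
    {"m": 0.6, "n": 0.4},             # vertex between 'm' and 'n'
]

# Hypothetical unigram language model P(W).
unigram = {"am": 0.012, "an": 0.015}

def score(word, error_probs, unigram):
    """P(W|E) up to a normalising constant:
    the product of the error terms times P(W)."""
    if len(word) != len(error_probs):
        return 0.0
    p = unigram.get(word, 0.0)
    for letter, dist in zip(word, error_probs):
        p *= dist.get(letter, 0.0)
    return p

print(score("am", error_probs, unigram))  # 0.7 * 0.6 * 0.012
print(score("an", error_probs, unigram))  # 0.7 * 0.4 * 0.015
```

The word with the highest score wins; because the scores are only compared against each other, the normalising constant in Bayes' rule can be dropped.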

I wanted to see how good Swype's language model is (the P(W) function in the formula above) and whether it is simply a unigram model (a dictionary of single-word probabilities) or whether it takes account of the previous words in the sentence. So I installed the Android Swype keyboard and typed in "I am" and "I an"; "m" and "n" are neighbouring keys. "I an" is not grammatical, as "I" is almost always followed by a verb, but Swype was not biased towards either one, so I concluded it uses a unigram model.

So how do we take account of the previous words in the sentence? It's difficult because of the combinatorial explosion of recording the frequency of every pair or triple of words that occurs in the training data. One solution is to keep large tables on a central server and provide word predictions as a web service. However, this means you always need to be connected to the internet, and there would be a perceptible processing delay if the network latency is above 100ms. Another disadvantage, for those who prefer to type in private, is that everything they type into their phone is sent to the Swype language model server. There is an alternative solution, however. A category-based language model can automatically use corpus data to compress a large dictionary of words into a small number of word categories, approximating grammatical classes. A small table of relations between these categories is used instead of relations between the words themselves. This enables Swype to know at least the basics of grammar at little memory and CPU cost, and with no link to a server.
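Prediction with such a model is just two table lookups. The class names and probabilities below are invented for illustration, but they show why "I am" would beat "I an":

```python
# Hypothetical word -> class map; real classes would be induced from a corpus.
word_class = {"i": "PRON", "am": "VERB", "an": "DET", "the": "DET"}

# Hypothetical class bigram table: P(class2 | class1).
class_bigram = {
    ("PRON", "VERB"): 0.5,
    ("PRON", "DET"): 0.01,
}

def p_next(prev_word, word):
    """Grammatical plausibility of `word` following `prev_word`,
    looked up via the words' classes, with a small back-off constant."""
    c1 = word_class.get(prev_word)
    c2 = word_class.get(word)
    return class_bigram.get((c1, c2), 0.001)

# After "I", a verb like "am" is far more likely than the determiner "an":
print(p_next("i", "am"))  # 0.5
print(p_next("i", "an"))  # 0.01
```

This factor would simply be multiplied into the word score alongside the unigram P(W), so only the two small tables need to live on the phone.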

Category-based language models are easy to build. You search for similar trigrams in a corpus of text. Suppose you had the word sequence trigrams [x1, y1, z1] and [x2, y2, z2] in a corpus, where x1 = x2 and z1 = z2. This gives you a hint that y1 and y2 might be in the same grammatical class. You just need to make a sufficient number of links between words, and let gravity glob them into classes. Or you could take the 1000 most common words and simply classify them by hand. A bigram table is then constructed by scanning the corpus, recording the probability that a word of class c2 will follow a word of class c1. There are only tens or hundreds of classes, so this table, together with a table mapping dictionary words to their classes, would use only a few kilobytes of memory.
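Building the class bigram table from a corpus can be sketched as follows. The toy corpus and hand-made word-to-class map stand in for the automatic clustering step described above:

```python
from collections import Counter

# Hand-classified words, standing in for automatically induced classes.
word_class = {"i": "PRON", "you": "PRON", "am": "VERB", "are": "VERB",
              "a": "DET", "the": "DET", "cat": "NOUN", "dog": "NOUN"}

# A toy corpus; a real one would have millions of words.
corpus = "i am a cat you are a dog the cat".split()

counts = Counter()   # how often class c2 follows class c1
totals = Counter()   # how often class c1 occurs with a successor
for w1, w2 in zip(corpus, corpus[1:]):
    c1, c2 = word_class.get(w1), word_class.get(w2)
    if c1 and c2:
        counts[(c1, c2)] += 1
        totals[c1] += 1

# P(c2 | c1) as a relative frequency.
class_bigram = {pair: n / totals[pair[0]] for pair, n in counts.items()}
print(class_bigram[("PRON", "VERB")])  # pronouns always precede verbs here
```

With tens of classes, the resulting table has at most a few thousand entries, which is what keeps the whole model within a few kilobytes.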

Nuance recently announced improvements to their language model for Swype, but in my opinion they should use a category-based model as well. I read that their beta release now has an improved language model, but the installer would not work on my phone, and from what I gather it is the less preferred server-based solution, not a category-based language model.

A free (GPL-licensed, for GNU/Linux) alternative to Swype is possible without much effort, and even more so when a category-based model is used, as it would not be tied to any company or centralised language model server, and would be completely open and private. You can read some more about the application of category-based language models here.