Friday, 30 January 2009

Creating a Spell as you type jQuery plugin - Part 1

Recently I've become enamoured with the jQuery library, and we now prefer it over the buggy Microsoft Ajax Control Toolkit for things like modal dialogs, autocompletes, and hiding and showing divs. In short, it rocks, and I wish I'd found it sooner.
A client needed to implement spell as you type, complete with red underline "squiggles", in their web based CRM product running in IE6. This feature is found in most popular word processing packages (e.g. the screenshot from Microsoft Word 2007), but much less often in web pages.
Annoyingly, this functionality is built into Firefox and Chrome, but not into Internet Explorer - not even the latest IE8 beta. A look around the web produced a couple of hopeful options: a third party component, as well as toolbars and plugins from Google and Microsoft Live.
The main problem with all of the above solutions was that they tried to replace the underlying input with a new element in the DOM. The font and layout settings were never copied correctly, so the layout of the page changed. Some browser plugins only offered the popup dialog style of spell checking, not the inline "squiggles" approach.
Happy day - a bespoke development was on the cards. The first things I had to determine were "Is this actually possible in a browser?" and "Could this be turned into a jQuery plugin?". At the start, I made a list of the problems that would need to be resolved to have a chance of creating a decent plugin:

1. How to measure the location of words in an existing input/textarea?
I did recall seeing that IE had some support for returning the exact measurements of text. Although this isn't part of any W3C standard, that potentially wasn't a problem, since the plugin was only targeted at IE users anyway. The TextRange object provides a lot of functionality that would have been very hard to reproduce, and you can return the absolute position and size of a piece of text using it, e.g.:
var wholeWordsOnly = 2; // findText flag: match whole words only
var range = input.createTextRange();
if (range.findText(term, 0, wholeWordsOnly)) {
    // The range now surrounds the word; its pixel rectangle is given by
    // range.boundingLeft, boundingTop, boundingWidth and boundingHeight
}
This will create a range over an input and position it around the first instance of the word, giving you the absolute position of this word (in pixels) on the page.
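The findText call searches for a word you already know, so the text of the input first has to be broken into words. The following is a minimal sketch, not the plugin's code: the helper name `extractWords` is hypothetical, and it assumes a word is a run of ASCII letters with optional internal apostrophes. Recording each word's character offset also helps with repeated words, since the range is positioned around the first instance only; the offsets could, for example, be combined with TextRange's `moveStart`/`moveEnd` methods to target a later occurrence, though that pairing is an assumption.

```javascript
// Hypothetical helper: split text into words with their character offsets.
// Assumes a "word" is a run of ASCII letters, optionally with apostrophes
// between letters (e.g. "didn't"); everything else is a separator.
function extractWords(text) {
  var words = [];
  var pattern = /[A-Za-z]+(?:'[A-Za-z]+)*/g;
  var match;
  while ((match = pattern.exec(text)) !== null) {
    words.push({
      word: match[0],
      start: match.index,                  // offset of the first character
      end: match.index + match[0].length   // offset just past the last character
    });
  }
  return words;
}

// Example: the offsets tell the two occurrences of "teh" apart.
var found = extractWords("teh cat sat on teh mat, didn't it?");
// found[0] -> { word: "teh", start: 0, end: 3 }
// found[4] -> { word: "teh", start: 15, end: 18 }
```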

2. Underlining misspelt words with a squiggly line.
This turned out to be simpler than I thought. Once we have the dimensions of the word, it is relatively easy to place a div with a repeating background image over the word using a bit of CSS:
div.spellayt {position:absolute; z-index:96;
background:url(Images/spellayt.gif) repeat-x; margin: 0px; padding: 0px}
The image itself is only 4 pixels wide and 3 pixels high. It is positioned 3 pixels from the bottom of the word's bounding rectangle, and jQuery is used to append the new div to the document body:
// Append the div to the document body
$("<div class='spellayt' id='divWord'></div>").appendTo(document.body).show();
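The post doesn't show how the div is sized and placed. Here is a minimal sketch, assuming the word's bounding rectangle is already known in page pixels and reading "3 pixels from the bottom" as the 3 pixel high strip filling the bottom 3 pixels of that rectangle. `squiggleCss` and `SQUIGGLE_HEIGHT` are hypothetical names; the returned object is a plain property map of the kind jQuery's `.css()` accepts.

```javascript
// Hypothetical helper: CSS for the squiggle div, given a word's bounding
// rectangle { left, top, width, height } in page pixels. Assumes the
// 3px-high background image should fill the bottom 3 pixels of the word.
var SQUIGGLE_HEIGHT = 3; // height of spellayt.gif in pixels

function squiggleCss(rect) {
  return {
    left: rect.left + "px",
    top: (rect.top + rect.height - SQUIGGLE_HEIGHT) + "px",
    width: rect.width + "px",   // repeat-x tiles the 4px image across the word
    height: SQUIGGLE_HEIGHT + "px"
  };
}

// Usage in the page (requires jQuery):
// $("#divWord").css(squiggleCss({ left: 10, top: 20, width: 40, height: 16 }));
```

Returning strings with an explicit "px" keeps the values unambiguous regardless of how a given jQuery version treats bare numbers.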

3. Where do I get a list of words and how do I find suggestions when there is a misspelling?
My next task was to find a dictionary of commonly used words in the English language. A search on the web located some likely candidates with appropriate licenses (GPL etc.). The best option turned out to be a plain text list of words, including all variations, capitalisations etc.
Projects like GNU Aspell have a dictionary of over 150,000 entries! This clocks in at 1.6 MB uncompressed and 300 KB+ compressed, which is a major consideration when implementing a client side spell checking solution.
Early on I had decided that the spell checking would have to happen entirely in the browser, both for the plugin to be useful to anyone else and to avoid the massive load that checking every word would place on the server. With client side caching and server side gzip compression headers, I decided the download size was acceptable and pushed on.
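Once downloaded, the word list needs to go into something that can be searched quickly. A minimal sketch, assuming the list is plain text with one word per line; the helper names are hypothetical. A plain object serves as a hash table because browsers of the IE6 era have no `Set`, and since the list already contains capitalised variants, the lookup tries the word as typed before falling back to lower case (an assumption, not necessarily what the plugin does).

```javascript
// Hypothetical helper: build a lookup table from a plain-text word list,
// assumed to contain one word per line.
function buildDictionary(wordListText) {
  var dict = {};
  var lines = wordListText.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    var word = lines[i].replace(/^\s+|\s+$/g, ""); // trim whitespace
    if (word) {
      dict[word] = true;
    }
  }
  return dict;
}

// Assumption: try the word as typed, then its lower-case form, so "Hello"
// at the start of a sentence matches "hello" in the list. hasOwnProperty
// avoids false hits on inherited names such as "constructor".
function isKnownWord(dict, word) {
  var has = Object.prototype.hasOwnProperty;
  return has.call(dict, word) || has.call(dict, word.toLowerCase());
}

var dict = buildDictionary("hello\nworld\nLondon\n");
isKnownWord(dict, "Hello");  // true
isKnownWord(dict, "london"); // false (only the capitalised form is listed)
isKnownWord(dict, "wrold");  // false
```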
Next was one of the potentially show-stopping problems: how to get a list of suggestions for misspellings? Again, this turned out to be a well defined problem with a known solution, including source code! A page by Michael Gilleland details the Levenshtein distance algorithm, which measures the "closeness" of two words as a number: the minimum number of single-character insertions, deletions and substitutions needed to turn one into the other. The page contains source code and a nice demo provided by Lukasz Stilger, which I used with some fixes and modifications for performance.
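For reference, here is a standard dynamic-programming implementation of Levenshtein distance, plus a hypothetical `suggest` helper that ranks dictionary words against a misspelling. This is a sketch, not Lukasz Stilger's code or the modified version used in the plugin, and the default limits (distance 2, five results) are illustrative assumptions. It keeps only two rows of the distance matrix, and it skips any word whose length differs from the misspelling by more than the limit, since the distance can never be smaller than the length difference.

```javascript
// Levenshtein distance: the minimum number of single-character insertions,
// deletions and substitutions needed to turn string a into string b.
// Only two rows of the usual (a.length+1) x (b.length+1) matrix are kept.
function levenshtein(a, b) {
  var prev = [], curr, i, j, cost;
  for (j = 0; j <= b.length; j++) {
    prev[j] = j; // "" -> first j characters of b takes j insertions
  }
  for (i = 1; i <= a.length; i++) {
    curr = [i]; // first i characters of a -> "" takes i deletions
    for (j = 1; j <= b.length; j++) {
      cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1,         // delete a[i-1]
                         curr[j - 1] + 1,     // insert b[j-1]
                         prev[j - 1] + cost); // substitute (or keep) a[i-1]
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: dictionary words within maxDistance of the misspelling,
// closest first (ties broken alphabetically), at most maxResults of them.
function suggest(misspelt, words, maxDistance, maxResults) {
  if (typeof maxDistance !== "number") maxDistance = 2; // illustrative default
  if (typeof maxResults !== "number") maxResults = 5;   // illustrative default
  var target = misspelt.toLowerCase();
  var scored = [];
  for (var k = 0; k < words.length; k++) {
    // The distance is never less than the length difference, so skip early.
    if (Math.abs(words[k].length - target.length) > maxDistance) continue;
    var d = levenshtein(target, words[k].toLowerCase());
    if (d <= maxDistance) scored.push({ word: words[k], distance: d });
  }
  scored.sort(function (x, y) {
    if (x.distance !== y.distance) return x.distance - y.distance;
    return x.word < y.word ? -1 : (x.word > y.word ? 1 : 0);
  });
  var result = [];
  for (var n = 0; n < scored.length && n < maxResults; n++) {
    result.push(scored[n].word);
  }
  return result;
}

// Example:
// suggest("teh", ["the", "ten", "tea", "tech", "cat", "them"])
// -> ["tea", "tech", "ten", "the", "them"]
```

Plain Levenshtein distance counts a transposition such as "teh" to "the" as two edits, so "the" ranks behind one-edit words like "ten". The Damerau-Levenshtein variant counts an adjacent transposition as a single edit, which would tie "the" with those words.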
At that point I thought I had all the tools necessary to tackle the problem, and I was ready to create my first jQuery plugin, "Spell as you type".