Tuesday, 6 October 2009

Simple OOXML 2.1.3565 Released

Create .xlsx and .docx documents from templates or from scratch without Microsoft Excel or Microsoft Word. This release fixes previously reported issues and adds the following functionality:

  • Works with August 2009 CTP of the Open XML SDK.
  • DeleteRow and DeleteRows from a Worksheet
  • Use FindColumn to retrieve a column from a Worksheet e.g. to set a column width.

The following issues have been fixed:
  • id# 2227 - PasteDataTable results in a corrupt .xlsx file when pasting a DataTable that contains typeof(float) columns with a CultureInfo other than "en-US"
  • id# 2291 - PasteDataTable fails with adjacent existing data


Note: Download the source code to view examples and unit tests with lots of sample code.

http://simpleooxml.codeplex.com

Announcing the release of the jQuery databind plug-in

The databind plug-in allows you to automatically bind the contents of a JavaScript object (usually retrieved using AJAX/JSON) to HTML elements.

Each key in the object is used as an id (#example) selector within the selector provided, e.g.

var data = { title: 'Lorem ipsum', text: 'Lorem ipsum dolor sit amet, consectetur adipisicing elit.' };
$(document).binddata(data);

<body>
<div>
<p id="title" style="font-weight:bold"></p>
<p id="text"></p>
</div>
</body>

Provides the output visible at http://www.opencomponents.net/databind/example1.htm

More complicated examples include the ability to bind arrays of data and objects to tables and unbind forms into json.

More examples and code can be downloaded from: http://www.opencomponents.net/databind/

Wednesday, 24 June 2009

Databinding with jQuery

I guess the title of this blog should be "Writing a new jQuery plug-in every week" - that's all I seem to be doing right now - but I keep finding functionality that isn't available as a plug-in, or has just been plain poorly implemented.

I'm using jQuery UI for a web project right now, and currently it's missing the ability to turn an html table into a grid styled using the UI css. Now there are some good lightweight plug-ins out there, such as tablesorter, and I didn't want to reinvent the wheel. Nor did I want to use a heavyweight plug-in such as jqGrid, which is impressive but tries to do too much, especially when all you're really after is a styled table.

To this end, I'm currently working on the TableGrid plug-in, which uses the guidelines set out by the jQuery UI development team on how to style tables, whilst incorporating the work done by other developers to enable scrolling, paging, sorting etc. (Example1, Example2)

Which brings me to the subject of this post. I didn't agree with the mechanism used in other grid plug-ins where the data retrieval element is included inside the plug-in, to load data via ajax or some other method. What if I also wanted to load data into say my form?

Following my approach of a super plug-in above, I realised I would first need a data binding plug-in so that developers could load table data on-the-fly. A quick look confirmed there was nothing out there that could be used.

The databind plug-in

View examples and download.

Use the databind plug-in to bind a JSON object to any HTML element that has text, is an input or is a table/tbody. Each key (property name) is matched to the id of an element within the target of the databind function. Either the inner text or the value of an input element is set - the plug-in is smart enough to decide. Pass an array or a collection of objects to the plug-in to bind rows of data to a table or table body.
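
The matching rule described above can be sketched without a browser. This is not the plug-in's source: bindData, the elements map and the fake element objects are hypothetical stand-ins for the DOM, used only to show how keys map to ids and how inputs are treated differently from text elements.

```javascript
// Hypothetical stand-in for the DOM: a map of id -> element-like object.
// Inputs carry a `value`, other elements carry `text`.
function bindData(elements, data) {
  var bound = [];
  for (var key in data) {
    if (!Object.prototype.hasOwnProperty.call(data, key)) continue;
    var el = elements[key];          // the key is used as the element id
    if (!el) continue;               // no element with that id: skip it
    if (el.tagName === 'INPUT') {
      el.value = String(data[key]);  // inputs receive a value
    } else {
      el.text = String(data[key]);   // everything else receives inner text
    }
    bound.push(key);
  }
  return bound;
}

var elements = {
  title: { tagName: 'P', text: '' },
  email: { tagName: 'INPUT', value: '' }
};
var bound = bindData(elements, {
  title: 'Lorem ipsum',
  email: 'a@example.com',
  unused: 1
});
```

Keys without a matching id (unused here) are simply ignored in this sketch; whether the real plug-in does the same is an assumption.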

Wednesday, 10 June 2009

UpdatePanel plug-in 1.0.0 released.

In a previous post, I introduced a plug-in that makes it simple for developers to integrate jQuery function calls with Microsoft Ajax UpdatePanels. The final version has been released today to both http://plugins.jquery.com and http://www.codeplex.com.

Functions

  • .panelCreated(fn) - called when the UpdatePanel is created on a page
  • .panelUpdated(fn) - called when the panel is updated during an asynchronous (ajax) postback.
  • .panelReady(fn) - called when the UpdatePanel is first created or updated.
  • .beginRequest(fn) - called before the processing of an asynchronous postback starts and the postback request is sent to the server.
  • .initializeRequest(fn) - called during the initialization of the asynchronous postback. Allows the cancellation of the request.

The download includes the script file and a simple example.

Sunday, 17 May 2009

Getting started with the Simple OOXML library.

The Simple OOXML library allows you to create Word Processing (.docx) and Spreadsheet (.xlsx) documents quickly and easily, without having to understand the complex nature of the underlying Xml formats. Because the documents are (mostly) pure xml, these documents can be created in environments even where there is no Microsoft Office installation, such as on a web server.

Whilst the Open Xml Format SDK provides a convenient object wrapper around the xml specification, it is still far from easy to create these documents. Even with a sound understanding of the specification, the Simple OOXML library tool is still useful in providing wrapper functionality to developers, without having a noticeable performance overhead.

To use the Simple OOXML library, you need to be using .Net Framework 3.5 (or later). You'll also need to download the Open Xml Format SDK v 2.0. The current release is the April 2009 CTP and is available from the Microsoft website here: http://www.microsoft.com/downloads/details.aspx?FamilyId=C6E744E5-36E9-45F5-8D8C-331DF206E0D0. Obviously this is pre-release code but so far I've found the CTP releases very stable and haven't found any problems or bugs. Finally, you will need to download a copy of the latest Simple OOXML library release, available at CodePlex: http://simpleooxml.codeplex.com/Release/ProjectReleases.aspx. Since Simple OOXML is open source and distributed under the LGPL licence, you can use and distribute the binary with any application, commercial or otherwise. You can also have a look at the C# code to see how the library operates.

To demonstrate the capabilities of the Simple OOXML library, I'll show you a number of ways to work with Spreadsheet documents, starting with a new ASP.NET Web Application in Visual Studio 2008. Follow these steps to get started:

  1. Create a new ASP.NET Web Application.
  2. Include references to DocumentFormat.OpenXml from the SDK and the DocumentFormat.OpenXml.Extensions.dll from the Simple OOXML release.
  3. Finally, add an asp:Button control to the default.aspx page that was created for you, so that we have somewhere to run our code from.

Make sure that you are using the following namespaces:

using System.IO;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
using DocumentFormat.OpenXml.Extensions;

Creating a new Spreadsheet (.xlsx) document.

In the click event handler for the button added above, add the following code:

protected void Button1_Click(object sender, EventArgs e)
{
   MemoryStream stream = SpreadsheetReader.Create();
   SpreadsheetDocument doc = SpreadsheetDocument.Open(stream, true);
   WorksheetPart worksheetPart = SpreadsheetReader.GetWorksheetPartByName(doc, "Sheet1");
   WorksheetWriter writer = new WorksheetWriter(doc, worksheetPart);

   writer.PasteText("B2", "Hello World");

   //Save to the memory stream
   SpreadsheetWriter.Save(doc);
          
   //Write to response stream
   Response.Clear();
   Response.AddHeader("content-disposition", String.Format("attachment;filename={0}", "performance.xlsx"));
   Response.ContentType = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";

   stream.WriteTo(Response.OutputStream);
   Response.End();
}

The first four lines of code create a stream containing a new document. This is loaded into a standard SpreadsheetDocument object. A WorksheetPart is then retrieved using a sheet name. Finally a WorksheetWriter is created to enable us to write content to the worksheet. Next, the writer is used to paste the text "Hello World" into cell reference "B2". The static Save method of the SpreadsheetWriter class is used to update the stream with the document. The final five lines write the document stream to the browser setting the correct header values for a spreadsheet document.

In the next part of this series I'll demonstrate how to add numeric values, dates, shared text and datatables to worksheet documents.

Introducing the UpdatePanel plug-in.

A while ago I blogged about Combining jQuery and Ms Ajax UpdatePanels (http://bloggingdotnet.blogspot.com/2009/03/combining-jquery-and-updatepanels.html). The main problem is that the UpdatePanel contents (a Div element) are replaced during each post back from the server, and this means that jQuery loses all the references to the elements that have now been replaced in the DOM. My initial workaround (also commonly found on the internet) is a little clumsy, as it involves hooking the Ms Ajax pageLoad event and placing all your jQuery code in a separate function which is then called from $(document).ready or pageLoad. Not pretty and not clever, especially when you have lots of jQuery and you only want to rerun the jQuery for the panel that has actually changed.

My subsequent workarounds involved looking into the LiveQuery plug-in and looking at Live events which are new to jQuery 1.3, however I wasn't satisfied with the performance, and the code still looked ugly. And when it looks ugly that usually means there is a better way. What I wanted was code that looked and worked like any other jQuery plug-in. I needed to dig more deeply into the Ms Ajax client runtime. What I found was the PageRequestManager which is created once per page by Ms Ajax and fires the pageLoaded event on each post back. Hooking into this event allows us to examine the panels that have been updated or created. Using this information, it was then relatively straightforward to wrap a jQuery-style functional interface around these events.
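
As a rough sketch of that hook (not the plug-in's actual source), the code below subscribes to pageLoaded and filters the created and updated panels by id. The add_pageLoaded, get_panelsCreated and get_panelsUpdated calls are part of the Ms Ajax client library; watchPanel and stubPrm are hypothetical names, and the stub exists only so the sketch can run outside a browser.

```javascript
// Sketch only: watchPanel is a hypothetical helper.
// In a real page, pass Sys.WebForms.PageRequestManager.getInstance() as prm.
function watchPanel(prm, panelId, handlers) {
  prm.add_pageLoaded(function (sender, args) {
    var created = args.get_panelsCreated();
    var updated = args.get_panelsUpdated();
    var i;
    for (i = 0; i < created.length; i++) {
      if (created[i].id === panelId && handlers.created) handlers.created(created[i]);
    }
    for (i = 0; i < updated.length; i++) {
      if (updated[i].id === panelId && handlers.updated) handlers.updated(updated[i]);
    }
  });
}

// Minimal stand-in for the PageRequestManager (assumption, for illustration).
var stubPrm = {
  handler: null,
  add_pageLoaded: function (fn) { this.handler = fn; },
  fire: function (created, updated) {
    this.handler(this, {
      get_panelsCreated: function () { return created; },
      get_panelsUpdated: function () { return updated; }
    });
  }
};

var panelEvents = [];
watchPanel(stubPrm, 'UpdatePanel1', {
  created: function (el) { panelEvents.push('created:' + el.id); },
  updated: function (el) { panelEvents.push('updated:' + el.id); }
});
stubPrm.fire([{ id: 'UpdatePanel1' }], []);                   // first page load
stubPrm.fire([], [{ id: 'UpdatePanel1' }, { id: 'Other' }]);  // async postback
```

Only the panel being watched triggers a callback, which is what allows jQuery code to be rerun for just the panel that changed.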

The UpdatePanel plug-in has the following three callbacks:

  • .panelCreated(fn) - called when the UpdatePanel is created on a page
  • .panelUpdated(fn) - called when the panel is updated during an asynchronous (ajax) postback.
  • .panelReady(fn) - called when the UpdatePanel is first created or updated

You would generally use the panelReady event like so:

$(document).ready(function() {
  //Place jQuery code here for elements selected outside the update panel
  $('#UpdatePanel1').panelReady(function() {
    //Place jQuery code here for elements selected inside the update panel
  });
});

Generally, you would only want to use panelReady, which is called when the page is first loaded, or on each postback. A sample project as well as the script is available at http://updatepanelplugin.codeplex.com/ or http://plugins.jquery.com/project/updatepanelplugin

Thursday, 14 May 2009

Simple OOXML featured on OpenXmlDeveloper

Simple OOXML has been picked up by http://openxmldeveloper.org - Microsoft's website for promoting the Office Open Xml standard - and is mentioned on their home page. Expect to see an article appear on the site over the next few days.

Monday, 23 March 2009

Exclusive Checkbox jQuery Plug-in

Web interfaces like Hotmail often have a checkbox or radio button next to each item so that you can select that row. I've always disliked the radio button approach as you are limited to one row at a time, e.g. when deleting. However, sometimes you'd like to keep the checkboxes but have them behave like radio buttons. I couldn't find a really simple jQuery plug-in so I decided to write one. Original and minified versions are at the end of the article.

(Update: I changed the code slightly to use the official jQuery extend function for the plug-in definition, and to include the usage in the comments.)

/*
* Exclusive Check Plugin v1.0.2
* Copyright (c) James Westgate
*
* @requires jQuery v1.3.2
*
* Dual licensed under the MIT and GPL licenses:
*   http://www.opensource.org/licenses/mit-license.ph
*   http://www.gnu.org/licenses/gpl.html
*
* @usage $('input:checkbox').exclusiveCheck();
* @usage $('form input:checkbox').exclusiveCheck();
* @usage $('table tbody input:checkbox').exclusiveCheck();
*
*/

//Create closure
(function($) {

    //Plugin definition
    $.fn.extend({

        exclusiveCheck: function() {

            var selector = $(this);

            //Loop through each item in the matched set and apply event handlers
            return this.each(function(i) {

                //When the checkbox gets clicked, uncheck other checkboxes
                $(this).click(function(event) {

                    var clicked = this;

                    //Uncheck all except current
                    if (this.checked) {
                        selector.each(function() {
                            if (this != clicked) this.checked = false;
                        });
                    }
                });
            });
        }
    });

// end of closure
})(jQuery);

Download original and minified version.

Thursday, 5 March 2009

Introducing the Simple OOXML library

The new xml based document formats (.xlsx, .docx, .pptx etc) introduced with Office 2007 finally provided developers with the ability to create documents on a server without having to have either MS Word or Excel installed or to use a 3rd party component.

However producing this Xml is very complicated and it's certainly not a pretty format. The OOXML SDK goes as far as wrapping the Xml elements of the specification into a set of .net classes, but still falls short of the higher-level functionality required to actually create documents. Some open source libraries, such as ExcelPackage, are incomplete or discontinued and I wanted to create a simple, robust fully supported library that fits right into the object model of version 2.0 of the sdk.

The http://simpleooxml.codeplex.com project addresses this issue by providing a layer of abstraction over version 2.0 as a set of simple classes and methods to create new spreadsheet (Excel) and word processing (Word) files with the following benefits:

  1. No Excel or Word is required on the server.
  2. No in-depth knowledge of the OOXML standard or SDK is required.
  3. New documents can be created or existing templates can be modified.
  4. Ability to stream directly to the browser or between servers
  5. High performance

The following classes are used to create and read Office Open XML documents:

  • DocumentReader - static functions used to create new word processing documents.
  • DocumentWriter - static functions used to paste text items into a word processing document using bookmarks and save the document to a stream or file.
  • SpreadsheetReader - static functions to create new spreadsheet documents and to retrieve worksheet, column and row references. Get style and defined name range parts.
  • SpreadsheetStyle - create an object that can set style, color and font information in a spreadsheet and retrieve and compare style parts.
  • SpreadsheetWriter - static functions to create style parts in a spreadsheet, control shared strings and helper functions when working with spreadsheet documents.
  • WorksheetReader - high level and static functions to retrieve cell and style information from a worksheet.
  • WorksheetWriter - high level and static functions to write text, numeric and datatable based values to a worksheet. Draw borders, insert rows, merge cells and set print areas.

Simple OOXML requires version 2.0 of the OOXML SDK which can be found here:
In my next series of posts, I'll show how easy and powerful this library is.

(The DocumentFormat.OpenXml.Extensions.Testing unit test project included in the source on Codeplex contains samples of almost every type of operation supported by the library.)

Tuesday, 3 March 2009

Creating a Spell as you type jQuery plug-in - Part 3

In my previous posts, I described how to create a jQuery plug-in called spellayt (Spell as you Type) that provided spelling correction to Internet Explorer users.

The timed work queue pattern

The performance of the initial implementations of spellayt was disappointing: you would often wait a few seconds whilst the latest words typed in were checked and highlighted. Even worse, you could get the dreaded "A script on this page is causing Internet Explorer to run slowly." message. I soon found I needed a multi-threaded approach so that the spell checking could happen in the background whilst the user was typing. A quick check confirmed that JavaScript does not support more than one thread, although the search did reveal some interesting options.

Because the word breaking and dictionary checks could potentially take a few seconds each, I decided that only one word could be checked against the dictionary at a time. The trick is to use lots of short functions at regular intervals, so that the UI can continue to process events from the user.

This gave me the idea for implementing the following pattern which I call the Timed Work Queue pattern. In the plug-in, a call to doWordBreak() starts the ball rolling when the input receives focus:

//Global values
$.fn.spellayt.global = {
  options: null,         //plug-in options
  wordQueue: new Array() //word breaking queue
};

$(this).focus(function(event) {

  //Start the word breaking queue
  doWordBreak();
});

The doWordBreak() function is actually really straightforward. It shifts the oldest work item off the queue and executes that item. The function then calls itself again after a predetermined delay (50 ms seems to be a good figure).

//Pops the next word off the word breaking queue
function doWordBreak() {
  var g = $.fn.spellayt.global;

  //Get an item off the queue
  if (g.wordQueue.length > 0) {
     var work = g.wordQueue.shift();
     work.call(work.data);
  }

  //Process the next item
  g.wordTimer = setTimeout(function() { doWordBreak(); }, g.options.milliseconds);
}

So what does the object that we push onto the work queue look like? The breakText function breaks text into sentences and words, and loads each sentence onto the queue as a work item:

//Check the spelling for all words in the input provided
function breakText(input) {
  if (input == null) return;

  //Split text into a 2d array of sentences and words
  var sentences = splitWords(input.value);

  if (sentences == null) return;

  //Add a call to checkSentence() to the work queue for each sentence
  for (var i = 0; i < sentences.length; i++) {

    $.fn.spellayt.global.wordQueue.push({ call: function(parm) { checkSentence(parm); }, data: sentences[i] });

  }
};

The object consists of a function pointer (call) and the data to be passed to that function (data). As long as your function has a single parameter (e.g. a JSON object), you could queue up any combination of functions to execute.

This is really handy as you can push completely different function pointers onto the queue, as well as putting some items ahead of others with a higher priority.
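
Here is a minimal, self-contained sketch of the pattern, including the priority idea. It is not the spellayt source: createWorkQueue, pushFirst and step are illustrative names, and the queue is driven by hand so the ordering is easy to follow.

```javascript
// Sketch of a timed work queue; all names here are illustrative assumptions.
function createWorkQueue(intervalMs) {
  var queue = [];
  var timer = null;

  // Run at most one work item, oldest first.
  function step() {
    if (queue.length > 0) {
      var work = queue.shift();
      work.call(work.data);
    }
  }

  // Process one item, then yield to the UI until the next tick.
  function tick() {
    step();
    timer = setTimeout(tick, intervalMs);
  }

  return {
    push: function (call, data) { queue.push({ call: call, data: data }); },
    // Higher priority work jumps to the front of the queue.
    pushFirst: function (call, data) { queue.unshift({ call: call, data: data }); },
    step: step,
    start: function () { if (timer === null) tick(); },
    stop: function () { clearTimeout(timer); timer = null; },
    size: function () { return queue.length; }
  };
}

var seen = [];
var workQueue = createWorkQueue(50);
workQueue.push(function (word) { seen.push('check ' + word); }, 'hello');
workQueue.push(function (word) { seen.push('check ' + word); }, 'wrld');
workQueue.pushFirst(function (msg) { seen.push(msg); }, 'urgent');

// Driven by hand here; in a page you would call workQueue.start() instead.
workQueue.step();
workQueue.step();
workQueue.step();
```

Because each tick runs only one short item before yielding, the browser gets a chance to handle key presses and repaints between items.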

Compress ASP.NET response streams

When sending large amounts of data from IIS to the browser, it is sometimes worth compressing certain types of data such as text documents. Although compression is not enabled by default in IIS 6.0, most browsers support basic gzip compression and they notify the server of this ability by sending a header in each request. The following piece of code shows how to use the System.IO.Compression namespace to add a filter to the output stream that compresses the output whilst checking and setting the correct headers. In this example, a .xls document contained in a string is being sent to the client:

    'Send response with content type to display as MS Excel
    context.Response.Clear()
    context.Response.Buffer = True

    context.Response.AddHeader("content-disposition", String.Format( "attachment;filename={0}", fileName))
    context.Response.ContentEncoding = Encoding.UTF8

    context.Response.Cache.SetCacheability(HttpCacheability.Private)

    'Compress the output as it may be very large
    'When flushing or closing+ending the stream, the compression filter does not have a chance to write the compression footer
    'Therefore, make sure the compression filter stream is closed before flushing
    AddCompression(context)

    context.Response.ContentType = "application/vnd.ms-excel"

    'Write to response
    context.Response.Write(_reportXmlss)

    'context.Response.Flush() 'Do not flush if using compression
    'context.Response.Close()
    context.Response.End()

The AddCompression method checks the appropriate headers and adds a compression filter stream to the output:
'Add compression to the response stream
    Public Sub AddCompression(ByVal context As HttpContext)

        Dim acceptEncoding As String = context.Request.Headers("Accept-Encoding")
        If acceptEncoding Is Nothing OrElse acceptEncoding.Length = 0 Then Return

        'Convert to lower to check
        acceptEncoding = acceptEncoding.ToLower

        'Gzip or Deflate compression
        'Deflate compression is quicker and performs better compression so try that first
        If (acceptEncoding.Contains("deflate")) Then

            context.Response.Filter = New DeflateStream(context.Response.Filter, CompressionMode.Compress)
            context.Response.AppendHeader("Content-Encoding", "deflate")

        ElseIf acceptEncoding.Contains("gzip") Then

            context.Response.Filter = New GZipStream(context.Response.Filter, CompressionMode.Compress)
            context.Response.AppendHeader("Content-Encoding", "gzip")

        End If

    End Sub
To check if compression is being used, I use the awesome HttpWatch, which shows useful information such as headers, amount of compression and bytes sent/received.

Monday, 2 March 2009

Combining jQuery and UpdatePanels

Some of you may have noticed that using Microsoft Ajax and jQuery together works fine until you do a post back in e.g. an UpdatePanel, at which point the jQuery plug-ins referencing elements contained in the panel stop working. I believe this is because the event handlers are bound by jQuery to the old DOM elements, which have since been replaced.

A simple workaround is to combine a bit of Microsoft Ajax with jQuery - in this example I rerun all the jQuery that would normally reside in $(document).ready after each ajax callback instead.

//jQuery document ready
$(document).ready(function() {
  RunScript();
});

//This is called after every page load by ajax
//It is used instead of the normal document ready function
function pageLoad(sender, arg) {
  if (arg.get_isPartialLoad()) {
      RunScript();
  }
}

//Main jQuery function
function RunScript() {
  //... normal jQuery code here etc
}
Friday, 27 February 2009

Spell as you Type 1.0 plug-in released

Today I released the Spell as you Type jQuery plug-in. It provides inline (red wavy) spell checking and correction options to Internet Explorer users - functionality that is built into every other browser.
Jquery: http://plugins.jquery.com/project/spellayt
To use, simply call the spellayt function in jQuery
$(document).ready(function() {
$('#txtTextarea').spellayt();
})
The plug-in works fully on the client side. For more information, view my series on building this plug-in:

Monday, 23 February 2009

Referencing a template file from a Visual Studio Unit Test

As part of the test suite for the Simple OOXML project, I need to reference a .docx file in a template folder in code - the problem being that every unit test is run in its own output folder. The solution is to copy the template files to the output folder and retrieve the path from the test context:
1. Make sure the TestContext is set when the test is run by providing a TestContext property (this is now really simple in 3.5)
    [TestClass()]
    public class SpreadsheetTests
    {
        public TestContext TestContext { get; set; }
        ...
2. Make sure that the requested file is copied to the output folder by using the DeploymentItem attribute
    [TestMethod(), DeploymentItem("Templates\\template.xlsx")]
    public void WorksheetCopyTest()
    {   ...
3. Reference from code using the TestContext.TestDeploymentDir
MemoryStream stream = SpreadsheetReader.Copy(string.Format("{0}\\template.xlsx", TestContext.TestDeploymentDir));
Note that even though we copied the template.xlsx file from the Templates folder, it ended up in the root output folder.

Thursday, 19 February 2009

Writing an xlsx document to the response stream using ASP.NET and OOXML

Have spent a most frustrating day trying to output an Excel document created using OOXML sdk 2.0 in a web service to the web browser.
There are two elements to solving this problem:
  1. Using a web service to transfer binary data
  2. Opening Excel on the browser using the new xlsx format
This is actually pretty straightforward: as long as you use a byte array, the data is base64 encoded in the background automatically. On the client, use the right content type and use BinaryWrite instead of Write, which is where I was having the problem:
Public Function PerformanceReportByUser(ByVal startDate As Date, ByVal endDate As Date) As Byte()
'Get the path to the template
Dim path As String = Server.MapPath("Templates/PerformanceByUserTemplate.xlsx")

'Get the stream containing the report package
Dim stream As MemoryStream = PerformanceReport.ExecuteByUser(path, startDate, endDate)
Return stream.ToArray()
End Function
On the client, process the byte array. Note the content type, and use of BinaryWrite.
Dim bytes() As Byte = portalservice.PerformanceReportByUser(CDate(ctlStartDate.Text), CDate(ctlEndDate.Text))

'Send response with content type to display as MS Excel
Response.Clear()
Response.Buffer = True
Response.AddHeader("content-disposition", String.Format("attachment;filename={0}", "performance.xlsx"))

'The following directive causes an open/save/cancel dialog for Excel to be displayed
Response.Cache.SetCacheability(HttpCacheability.Private)
Response.ContentType = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"

'Write to response
Response.BinaryWrite(bytes)

'Response.Flush() 'Do not flush if using compression
'Response.Close()
Response.End()

Tuesday, 10 February 2009

Creating a Spell as you type jQuery plugin - Part 2

In my previous post, I outlined solutions to the technical challenges I faced trying to implement a complete client side browser based spell checking solution.
With the technical challenges sorted, I looked into writing a JQuery plug-in. The tutorials on the JQuery website are a good place to start. I also found the plugin development pattern from Mike Alsup extremely useful. It helps you set up a closure to keep your methods and variables private as well as setting up defaults and options (passed in as a JSON). I won’t go into these details here as they are covered so well in the articles I’ve referenced here.
Detecting non-IE browsers in your plug-in
The first task, once the structure of the plug-in was in place and the options set up, was to block non-IE clients from using the plug-in, or more simply to exit the main function without executing any additional code. This was made very simple by using the following built-in jQuery code:
//This plugin is only for IE 6.0 + users
if (!$.browser.msie) return;
if (parseFloat($.browser.version) < 6) return;
Loading the dictionary using JQuery AJAX
The next item of code I wanted to tackle was getting the text-based dictionary into a structure the plug-in could understand and use. The $.ajax function provided an asynchronous way of retrieving data from the server. This is useful, because you don’t want the interface to be blocked whilst you potentially download a large file.
The success function passes the data to the loadDictionary function. When successful, the $.fn.spellayt.loaded() function is called if it has been set on the plug-in. This is similar to the event model used in C# programming. Note that this is how function pointers are exposed outside of the plug-in.
The loadDictionary function loads the words in the dictionary into a three dimensional array, consisting of an array for the first letter of the word, the length and the matching words. In this way, the words starting with the same letter and of the same length can be checked very quickly, as they will most likely have the closest Levenshtein distance.
//Load the dictionary from a url
$.ajax({
  type: "GET",
  url: $.fn.spellayt.global.options.url,
  dataType: "text",
  success: function(data) {

    //Load the dictionary into the dictionaryArray.
    loadDictionary(data);
    if ($.fn.spellayt.loaded != null) $.fn.spellayt.loaded();

  },

  error: function(XMLHttpRequest, textStatus, errorThrown) {
    if ($.fn.spellayt.loadError != null) $.fn.spellayt.loadError(textStatus);
  }
});
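
The loadDictionary function itself is not shown in this post, so the following is only a sketch of the structure described above: words are bucketed by first letter and then by length. It assumes one word per line in the downloaded text; inDictionary and the case-insensitive comparison are illustrative assumptions rather than the plug-in's behaviour.

```javascript
// Sketch only: assumes the dictionary text holds one word per line.
function loadDictionary(text) {
  var dictionary = {};
  var words = text.split(/\r?\n/);
  for (var i = 0; i < words.length; i++) {
    var word = words[i].replace(/^\s+|\s+$/g, '');
    if (word.length === 0) continue;
    var letter = word.charAt(0).toLowerCase();
    // First level: first letter. Second level: word length. Third: the words.
    if (!dictionary[letter]) dictionary[letter] = [];
    if (!dictionary[letter][word.length]) dictionary[letter][word.length] = [];
    dictionary[letter][word.length].push(word);
  }
  return dictionary;
}

// Hypothetical lookup: only the bucket for this letter and length is scanned.
function inDictionary(dictionary, word) {
  var byLength = dictionary[word.charAt(0).toLowerCase()];
  var bucket = byLength && byLength[word.length];
  if (!bucket) return false;
  var lower = word.toLowerCase();
  for (var i = 0; i < bucket.length; i++) {
    if (bucket[i].toLowerCase() === lower) return true;
  }
  return false;
}

var dict = loadDictionary('apple\nangle\nant\nbanana\n');
var foundApple = inDictionary(dict, 'Apple');
var foundAple = inDictionary(dict, 'aple');
```

The same buckets are a natural starting point for suggestions, since words of the same length and first letter are the cheapest candidates to compare.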
Highlighting words that have been misspelt
When the input gains focus, a few things need to happen on a regular basis as the user types:
  • Find new words to check
  • Check if a word exists in the dictionary
  • Highlight misspelt words

This is done by using the setTimeout and setInterval functions. The difference between them is small: setInterval fires a function every x milliseconds, whereas setTimeout calls a function once after x milliseconds.

$(this).focus(function(event) {

var g = $.fn.spellayt.global;
if (g.ready) {
 g.current = this;

 //Set a timer to break words every 1 second
 g.breakTimer = setInterval(function() {
   if (g.wordQueue.length == 0) breakText(g.current); }, 1000);

 //Set a timer to highlight words every 1/2 second
 g.highlight = setInterval(function() {
   $.fn.spellayt.highlight(g.current); }, 500);

 //Start the word breaking queue
 doWordBreak();
}
});
The sentence and word breaking is done by the use of regular expression parameters – this is incredibly useful – a word breaking regex can be set as a parameter for e.g. a different language.
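
The splitWords function used by breakText is not listed in this series, so here is a rough sketch of how regex-driven breaking could work. The default patterns and the breaks parameter are assumptions for illustration; the real plug-in's expressions may differ.

```javascript
// Sketch only: default patterns are assumptions suited to plain English text.
var defaultBreaks = {
  sentence: /[^.!?]+/g,   // a run of characters up to sentence punctuation
  word: /[A-Za-z']+/g     // letters and apostrophes
};

// Returns a 2d array: one array of words per sentence, or null for no text.
function splitWords(text, breaks) {
  breaks = breaks || defaultBreaks;
  var sentences = text.match(breaks.sentence);
  if (sentences === null) return null;
  var result = [];
  for (var i = 0; i < sentences.length; i++) {
    var words = sentences[i].match(breaks.word);
    if (words !== null) result.push(words);
  }
  return result;
}

var broken = splitWords("Hello wrld. It's a test!");

// A different word pattern can be supplied, e.g. to allow accented letters.
var french = splitWords('Caf\u00E9 ferm\u00E9.', {
  sentence: defaultBreaks.sentence,
  word: /[A-Za-z\u00C0-\u00FF']+/g
});
```

Keeping the patterns as parameters means the breaking logic stays the same while the definition of a word changes per language.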

In the next part of this series, I'll look into the problems I faced when the processor intensive code for word breaking and checking blocked the UI and how to emulate a multi-threaded environment in the current version of JavaScript.

Friday, 30 January 2009

Creating a Spell as you type Jquery plugin - Part 1

Recently I've become enamoured with the jQuery library and we now prefer it over the buggy Microsoft Ajax Control Toolkit for things like modal dialogs, autocompletes, as well as hiding and showing divs etc. In short it rocks and I wish I'd found it sooner.
A client needed to implement spell as you type, complete with red underline "squiggles", in their web based CRM product running in IE6. An example of this can be found in most popular word processing packages, e.g. Microsoft Word 2007, but less so online in web pages.
Annoyingly, this functionality is built into Firefox and Chrome, but not into Internet Explorer - even the latest IE8 beta. A look around the web produced a couple of hopeful options - a third party component as well as toolbar and plugins from Google and Microsoft Live.
The main problem we had with all of the above solutions was that they tried to overwrite the underlying input with a new element in the DOM. The font and layout settings were never copied correctly and caused the layout of the page to change. Some browser plugins only offered the popup dialog variety of spell checking, and not the inline "squiggles" approach.
Happy day - a bespoke development was on the cards. The first things I had to determine were "Is this actually possible in a browser?" and "Could this be turned into a jQuery plugin?". At the start, I made a list of the problems that would need to be resolved in order to have a chance of creating a decent plugin:

1. How to measure the location of words in an existing input/textarea?
I did recall seeing that IE had some support for returning the exact measurement of text, and although this isn't part of the W3C standard, this potentially wasn't a problem, since the plugin was only targeted at IE users anyway. The TextRange object provides a lot of functionality that would have been very hard to reproduce. You can return the absolute position and size of a piece of text using this object e.g.
var wholeWordsOnly = 2;
var range = input.createTextRange();
range.findText(term, 0, wholeWordsOnly);
This will create a range over an input and position it around the first instance of the word, giving you the absolute position of this word (in pixels) on the page.

2. Underlining misspelt words with a squiggly line.
This turned out to be simpler than I thought. Once we have the dimensions of the word, it is relatively easy to insert a div with a repeating background image over the word by using a bit of css:
div.spellayt {position:absolute; z-index:96;
background:url(Images/spellayt.gif) repeat-x; margin: 0px; padding: 0px}
The image itself is only 4 pixels wide and 3 pixels high. It is positioned 3 pixels from the bottom of the bounding rectangle of the word, using jQuery to append the new div to the document body:
//Append the div to the document body
$('<div class="spellayt" id="divWord"></div>').appendTo(document.body).show();

3. Where do I get a list of words and how do I find suggestions when there is a misspelling?
My next task was to find a dictionary of commonly used words in the English language. A search on the web located some likely candidates with appropriate licenses (GPL etc). The best option turned out to be a plain text list of words, including all variations, capitalisations etc.
Projects like GNU Aspell have a dictionary of over 150 000 entries! This clocks in at 1.6 MB uncompressed and 300 KB+ compressed, which is a major consideration when implementing a client side spell checking solution.
Early on I had decided that the spell checking process would have to occur entirely on the browser for the plugin to be useful to anyone else, and to reduce the massive load that would be placed on the server if every word was to be checked. However with client side caching and server based gzip compression headers, I decided this was acceptable and pushed on.
Next was one of the potentially show stopping problems - how to get a list of suggestions for misspellings? Again this turned out to be a well defined problem with a solution, including source code! The following page by Michael Gilleland details the Levenshtein Distance algorithm used to compare the "closeness" of two words, as a value. This page contains source code and a nice demo provided by Lukasz Stilger, which I used with some fixes and modifications for performance.
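
For reference, here is a compact sketch of the Levenshtein distance calculation together with a simple way of ranking suggestions. This is a standard two-row version written for illustration, not the modified code used in the plug-in; suggest and its maxDistance parameter are hypothetical names.

```javascript
// Levenshtein distance: the minimum number of single-character insertions,
// deletions and substitutions needed to turn string a into string b.
function levenshtein(a, b) {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;

  var prev = [];
  var curr = [];
  var i, j;
  for (j = 0; j <= b.length; j++) prev[j] = j;

  for (i = 1; i <= a.length; i++) {
    curr[0] = i;
    for (j = 1; j <= b.length; j++) {
      var cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution
      );
    }
    // Reuse the two rows instead of building a full matrix.
    var tmp = prev; prev = curr; curr = tmp;
  }
  return prev[b.length];
}

// Hypothetical helper: keep candidates within maxDistance, closest first.
function suggest(word, candidates, maxDistance) {
  var scored = [];
  for (var i = 0; i < candidates.length; i++) {
    var d = levenshtein(word, candidates[i]);
    if (d <= maxDistance) scored.push({ word: candidates[i], distance: d });
  }
  scored.sort(function (x, y) { return x.distance - y.distance; });
  return scored;
}

var suggestions = suggest('wrld', ['world', 'word', 'wild', 'banana'], 2);
```

Only two rows of the distance table are kept in memory at a time, which matters when the function is called against many dictionary words.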
At this point I thought I had all the tools necessary to tackle the problem, and I was ready to go ahead and create my first jQuery plugin, "Spell as you type".