Sunday, 9 September 2012

Integrating Word into a web application

After the success of integrating excel into my website, I wanted to try my hand at expanding the word integration options.

We already had quite extensive abilities to generate documents in html, pdf and rtf format (using an old version of the iTextSharp library).

I wanted to have functionality where more experienced users could download a template file, customise it in any way they like (for example adding their own text, logos etc) and then upload it back to our server and use it as a template.

Again a project on codeplex came to the rescue, Word Doc Generator.

This uses document placeholders, and you can write your own class that inherits from the DocumentGenerator class to output the data you want.

So a template file like this can be used to generate a custom booklet like this.

You can test this on the website, by creating your own logon, selecting some exercises, generating a booklet and then using the word templates tab.

I had to make some modifications to the code to enable it to handle images as part of the document placeholders.

I gave a presentation to the Newcastle Coders group in September and was asked by several people to post how I modified the code for images so I will post the details here.

This is based on the version released on the 24th January. It is a bit of hack, but it might be useful for you (or at least give you a starting point).

In the WordDocumentGenerator.Library project, I had to modify the DocumentGenerator and OpenXmlHelper classes. I include links to the changed files.

First Hack

Add a reference DocumentFormat.OpenXml.Drawing to OpenXmlHelper as well as a reference to the document being handled (this is needed to add images).

Second Hack

Modify the method SetContentOfContentControl, to handle jpg images (this could be any image type).

Third Hack

Add two helper methods to add the image.

Final Hack

Modify the SetContentInPlaceHolders method in the DocumentGenerator class so it passes in a link to the main document.

Amend as necessary and compile into the WordDocumentGenerator.library dll.

When you are making your own documentgenerator class libary, pass in the filename of the image as your content.

Email me if I have missed something and I will amend my blog post.

Integrating Excel into a web application

The new version of my website with new HTML 5 functionality continues to perform well, so I decided to continue the work on getting everything working as best as I can make it.

Originally, there was a data maintenance page where the physiotherapists could edit english text and translators could enter data in for the translations.

We modified the translation functionality so we could download the text as a CSV file and export it to excel. The translators would enter their text and we could upload it back to the database.

This had several problems
  • Excel has a method of "guessing" what the column types are, by looking at the first "X" rows, so columns with numbers and text would be imported incorrectly (as well as text potentially getting truncated).
  • I was a semi-manual process and time-consuming.
  • The translators would "break" the excel file by sorting partial row/column selections and hence corrupt the file because the text and key column would get misaligned.
  • The files were quite large
So I wanted to better automate this and also make it so the file couldn't be corrupted by translators sorting it incorrectly.

I first attempted to use Open Xml to automate the generation of excel files (in 2007+ xml format).

This worked reasonably well, but
  • Documentation on how to do things was not very good
  • Generating files from scrach with code could be quite difficult, a single change can result in the file refusing to open with very cryptic error messages.
  • Reading in large recordsets was quite slow
So I did some research and found the Closed Xml library on codeplex. It provides a more structured interface and generates the xml document for you in the background. This had good documentation and was an active project.

With a bit of work, we now have a simple webpage where the translators can just click on their language of choice and they get :
  • A protected workbook where they can only enter text in the translation column.
  • The translation sheet has frozen columns and can be filtered, this helps them in the translation process.
  • The rows are coloured coded indicating the need for translation/re-translation.
  • The sheet cannot be sorted without using the pre-defined sorting macros, so the translators cannot break the file.
  • The exercise text include a hyperlink to the exercise image. This helps the translators understand the exercise.
We have another simple page, where we can upload the modified file, and add a comment. It then summarises the changes and updates only the records that have been changed. It keeps a copy of the file so we can revert back to a previous version if something goes wrong.

The closed xml library is used in the reading and writing. It is very fast and works very well. I highly recommend it. The ability to base the excel file on pre-existing files is really useful as you can pre-program your own macros and have other sheets in the workbook.

An example of the Norwegian translation file can be found here.

This was received very well, so the next step was to use the library to update the english. With some work, we now have a page where they can :
  • Choose the data sections they wish to update, and it generates a custom file.
  • Columns have validation (ie number ranges, text max length and so on)
  • The sheets have frozen columns and can be filtered.
  • The last row has comments indicating what needs to be entered.
Much like the translations, they can upload the file and it will modify records, delete ones that are no longer required or insert new records. With the validation built in, we can have quite a bit of confidence of it being able to be used by the physiotherapists without "IT assistance".

An example of the English data file can be found here.

This has been very well received, they love the ability for them to use excel with spell checking and being able to email it to one another.

I gave a presentation to the Newcastle Coders Group in September, and there was quite a few developers there who were quite interested in using this library. It may be useful for your own projects.