Docudiff: OpenDocument comparison

I’m a strong supporter of the OpenDocument format, a new open-standard file format for office documents. It was designed to avoid many of the problems you run into with proprietary office file formats. (How often have you opened a Word document that someone sent you, only to find that something doesn’t look right?)

Document comparison (detecting the differences between two versions of a file) has been studied in computer science since the 1970s. However, detecting changes between office documents in a way that is meaningful to a human reader has never been solved particularly well. For my final-year project for my MEng degree in Computer Science at the University of Bristol, I looked at this problem and produced an algorithm and an application that detect changes between two versions of an OpenDocument file.

The result is still fairly rough around the edges, but it does work. You can read more about it in my dissertation, which covers the algorithm and my design choices in detail, or download the software (written in Java).

If you have any questions about my project, or you’d like to take the software further and develop it as either a commercial or an open-source project, please contact me.