DISCLAIMER: This site is a mirror of the original one that was once available at http://iki.fi/~tuomov/b/

LaTeX is great for scientific articles, prose and other simple documents. It fails terribly, however, on more complex documents, especially ones with lots of tables and tabulated information with big fields, such as program API documentation. Not only are most of the related packages broken and mutually incompatible (most people who know low-level TeX having switched from writing rocking software to rocking chairs), but conversion to other formats such as HTML is also a major problem.

To convert (La)TeX to other formats, one has to re-implement almost every package, starting from the parser, as every TeX package may parse the input itself and create its own syntax – and many do. Clearly that means a lot of work for format conversion. Such a design decision would be a big mistake in a modern language. But TeX isn't modern. It was designed with computers with very limited memory in mind, and under such constraints making the markup language an almost full-featured programming language, as TeX is, may be justifiable. In a modern document markup language, however, there should be a clear separation between syntax and semantics, and between markup and the packages that implement commands, environments and tags. Document markup and packages should not even be implemented in the same language. The two tasks are of a very different nature, and hence a language that fits both tasks is suboptimal for both.

Unfortunately, the world seems to have concentrated on producing WYSIWYG word processors and other office package rubbish, and XML, instead of creating an improved LaTeX-like system. Neither is an answer to any problem. They're big problems themselves. Most documents should be written with the semantics of different elements in mind. WYSIWYG hides that, and the documents tend to become a totally unmanageable mess. It is true that word processors do provide ways to insert "sections" and other such elements in documents, but these tend to be more stylistic than semantic. Word has been known to infer what little "semantics" it supports from style and, conversely, to lose semantics when the style changes. WYSIWYM seems a slightly more promising approach, but unfortunately there's little work in that department beyond LyX, to my knowledge. Although it is WYSIWYG, TeXmacs isn't all bad either: most of its constructs are actually rather semantic, and it supports input in the form of commands. It does, however, hide a lot of detail crucial for editing that a WYSIWYM or pure markup approach wouldn't.

XML is not an answer because, being very verbose, it is not human-readable, let alone human-writable. A documentation markup language should be human-readable and writable so as not to force people to use editors (usually WIMP crap these days) that are far inferior to their favourite text editors with a long history. The same applies to configuration file languages. XML isn't suited for either. The only task I can think of that XML is suited for is protocol data, and even there a binary format might be better, as uncompressed XML is quite inefficient. But the XML people clearly think it's a silver bullet. They've even embedded formidably inefficient programming language constructs in the syntax, in XSLT and the like.

However, XML is not all bad for all tasks. The fundamental ideas are pretty much right (as long as you forget those programming/transformation language hacks, which should for the most part be done in a real programming language). It's just the syntax that, in its current form, could just as well be binary. For documentation purposes, changing <foo bar="baz">quk</foo> to TeX-style \foo[bar=baz]{quk} would go far in making the syntax more readable (with an alternative but equivalent \begin[bar=baz]{foo} … quk … \end{foo} syntax for bigger "environments"). It's not enough, however. (La)TeX has many other shortcuts that make editing less laborious, while XML-based formats tend to use heavy markup. These include paragraph separation: an empty line suffices in TeX, while e.g. DocBook requires enclosing paragraphs in <para> tags. XML-based formats also often tend to be too semantic, even, requiring a lot of information and tags everywhere (though LaTeX could at times be more semantic).
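The proposed mapping and the blank-line paragraph rule are simple enough to sketch. What follows is a minimal Python illustration under assumptions of my own, not an existing tool: commands look like \name[key=value,...]{body} and may nest, attribute values contain no commas or brackets, the \begin/\end environment form is not handled, and blank lines become DocBook-style <para> elements. The names convert and convert_inline are hypothetical.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Assumed syntax: \name or \name[key=value,...] immediately followed by '{'.
COMMAND = re.compile(r'\\([A-Za-z]+)(?:\[([^\]]*)\])?\{')


def parse_attrs(spec):
    """Turn 'key=value,key2=value2' into a dict (no quoting or escaping supported)."""
    attrs = {}
    for pair in (spec or '').split(','):
        if pair.strip():
            key, _, value = pair.partition('=')
            attrs[key.strip()] = value.strip()
    return attrs


def convert_inline(text, pos=0, in_group=False):
    """Rewrite TeX-style commands as XML elements, recursing into their bodies.

    Returns (xml, position just after the consumed text)."""
    out = []
    while pos < len(text):
        # Inside a command body, an unmatched '}' closes that body.
        if in_group and text[pos] == '}':
            return ''.join(out), pos + 1
        m = COMMAND.match(text, pos)
        if m:
            name, attrs = m.group(1), parse_attrs(m.group(2))
            body, pos = convert_inline(text, m.end(), in_group=True)
            attr_xml = ''.join(f' {k}={quoteattr(v)}' for k, v in attrs.items())
            out.append(f'<{name}{attr_xml}>{body}</{name}>')
        else:
            # Plain text: escape &, < and > so the output stays well-formed XML.
            out.append(escape(text[pos]))
            pos += 1
    if in_group:
        raise ValueError('unbalanced braces: missing }')
    return ''.join(out), pos


def convert(document):
    """Blank lines separate paragraphs, as in TeX; each becomes a <para> element."""
    paragraphs = re.split(r'\n\s*\n', document.strip())
    return '\n'.join(f'<para>{convert_inline(p.strip())[0]}</para>'
                     for p in paragraphs if p.strip())


print(convert(r'\foo[bar=baz]{quk}'))
# <para><foo bar="baz">quk</foo></para>
```

A real converter would also need the environment form, a way to escape braces in running text, and a schema deciding which elements may appear where; the sketch only suggests that the surface syntax itself is cheap to translate.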

Other TeX niceties include a very usable maths mode that is, for the basic parts, very natural: '$' for marking inline maths mode, '_' (underscore) for subscripts, and '^' for superscripts. Contrast this with MathML, which is even less human-editable than XML formats in general, or even with Lout, which has a very "unsyntactic" maths mode: x sub i sup 2 (IIRC; lout.org seems to be missing documentation at the time of writing) for what would be x_i^2 in TeX. I don't even want to know the MathML for that.
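For the record, here is roughly what that MathML looks like. The small Python sketch below is an illustration of my own, not any existing converter: the function name tex_to_mathml and its restricted input grammar (a one-letter base with an optional one-character subscript and superscript, in that order) are assumptions. Even for x_i^2 the presentation MathML output needs four nested elements inside <math>.

```python
import re

MATHML_NS = 'http://www.w3.org/1998/Math/MathML'

# Assumed input grammar: a single-letter base, optional one-character
# subscript (_) and/or superscript (^), in that order, e.g. x_i^2.
PATTERN = re.compile(r'([A-Za-z])(?:_([A-Za-z0-9]))?(?:\^([A-Za-z0-9]))?')


def token(ch):
    """Digits become <mn> (number), letters become <mi> (identifier)."""
    tag = 'mn' if ch.isdigit() else 'mi'
    return f'<{tag}>{ch}</{tag}>'


def tex_to_mathml(expr):
    m = PATTERN.fullmatch(expr)
    if m is None:
        raise ValueError(f'unsupported expression: {expr!r}')
    base, sub, sup = m.groups()
    if sub and sup:
        # msubsup takes its children in the order: base, subscript, superscript.
        body = f'<msubsup>{token(base)}{token(sub)}{token(sup)}</msubsup>'
    elif sub:
        body = f'<msub>{token(base)}{token(sub)}</msub>'
    elif sup:
        body = f'<msup>{token(base)}{token(sup)}</msup>'
    else:
        body = token(base)
    return f'<math xmlns="{MATHML_NS}">{body}</math>'


print(tex_to_mathml('x_i^2'))
# <math xmlns="http://www.w3.org/1998/Math/MathML"><msubsup><mi>x</mi><mi>i</mi><mn>2</mn></msubsup></math>
```

Six characters of TeX turn into well over a hundred characters of markup, which is the verbosity argument in miniature.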

And, yes, Lout isn't much better than LaTeX. It does make creating new document styles easier, but it also has a "plastic" syntax. As for ConTeXt, while it may avoid some of the incompatibility problems of LaTeX packages, it does not seem to be very semantically geared, and I suppose it would have much the same problems with HTML conversion as LaTeX, if there were any tools for that.

For technical, non-mathematical documents, it probably wouldn't be a lot of work to write a converter from LaTeX-style syntax to DocBook. I may in fact one day get around to doing that, because the Ion documentation is currently in an unbuildable state thanks to some changes in LaTeX packages and ugly hacks in the documentation code itself. But building a documentation system with all the features of LaTeX and the good parts of its syntax, only better implemented, seems to be a lot of work. I think it is imperative to implement one, however. (La)TeX has many problems, and is dying along with the few people who know the obscure TeX language well (as can be seen from the progress on LaTeX3), but the available alternatives are even worse. WYSIWYG word processors are a big mistake that is causing a lot of suffering in the world each day, and XML is too heavy for human editing. A language combining the fundamental ideas of XML with those of LaTeX, together with a lighter LaTeX-like syntax, should serve the needs of both those who want to edit the markup with a plain old text editor and those who want to use a WYSIWYM frontend.

Update 2006-04-15 18:37 EEST: Added a few words on ConTeXt.