Programming Languages

Featherweight TeX and Parser Correctness

by Sebastian Thore Erdweg and Klaus Ostermann

In Proceedings of International Conference on Software Language Engineering (SLE) 6563: . Springer, 2010.

Abstract

TeX (and its LaTeX incarnation) is a widely used document preparation system for technical and scientific documents. At the same time, TeX is also an unusual programming language with a quite powerful macro system. Despite the wide range of TeX users (especially in the scientific community), and despite a widely perceived considerable level of “pain” in using TeX, there is almost no research on TeX. This paper is an attempt to change that. To this end, we present Featherweight TeX, a formal model of TeX which we hope can play a similar role for TeX as Featherweight Java did for Java. The main technical problem which we study in terms of Featherweight TeX is the parsing problem. As for other dynamic languages performing syntactic analysis at runtime, the concept of "static" parsing and its correctness is unclear in TeX and shall be clarified in this paper. Moreover, it is the case that parsing TeX is impossible in general, but we present evidence that parsers for practical subsets exists. We furthermore outline three immediate applications of our formalization of TeX and its parsing: a macro debugger, an analysis that detects syntactic inconsistencies, and a test framework for TeX parsers.