
Replace the "TexSoup" parser

We could rewrite the marker and tokenizer layers in order to switch to a different parser.

The TexSoup parser currently in use only supports a very small subset of LaTeX structures, which causes TransLaTeX to fail when tested on larger and more complex LaTeX files.

Some alternatives that I've found so far are listed below (from most to least preferable):

There are multiple ways we could proceed. We could use another standalone LaTeX parser, similar to TexSoup, in which case only the current parser needs to be replaced.
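Replacing only the parser is easiest if the marker and tokenizer layers depend on a small, parser-agnostic interface, so that switching parsers means writing one new backend. The sketch below is purely hypothetical: `Token`, `LatexBackend`, `RegexBackend` and `translatable_segments` are illustrative names, not part of TransLaTeX, and the regex backend is a toy stand-in for a real LaTeX parser.

```python
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    """Hypothetical parser-independent token: translatable text or markup to keep as-is."""
    text: str
    translatable: bool


class LatexBackend(ABC):
    """Hypothetical interface that TexSoup or any replacement parser would implement."""

    @abstractmethod
    def tokenize(self, source: str) -> List[Token]:
        ...


class RegexBackend(LatexBackend):
    """Toy backend for illustration only: control words and braces are markup, the rest is text."""

    _MARKUP = re.compile(r"(\\[A-Za-z]+|[{}])")

    def tokenize(self, source: str) -> List[Token]:
        tokens: List[Token] = []
        # re.split with a capturing group keeps the matched markup pieces in the result.
        for piece in self._MARKUP.split(source):
            if not piece:
                continue
            is_markup = self._MARKUP.fullmatch(piece) is not None
            tokens.append(Token(piece, translatable=not is_markup))
        return tokens


def translatable_segments(backend: LatexBackend, source: str) -> List[str]:
    """Marker-layer logic that only talks to the interface, never to a concrete parser."""
    return [t.text for t in backend.tokenize(source) if t.translatable]
```

For example, `translatable_segments(RegexBackend(), r"\section{Intro} Hello world")` yields `["Intro", " Hello world"]`, and joining all tokens reproduces the original source, which is what a translator needs to reassemble the document.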

We could use a LaTeX-to-HTML converter that has a more mature and complete parser, and then parse the resulting HTML instead. HTML is easier to parse, and multiple libraries for HTML parsing are readily available.
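To illustrate how little code the HTML route needs, here is a minimal sketch using only Python's standard `html.parser` module. It assumes, hypothetically, that the converter renders formulas as MathML `<math>` elements; text inside them is skipped so that formulas are never sent for translation. The class and function names are illustrative.

```python
from html.parser import HTMLParser
from typing import List


class TextCollector(HTMLParser):
    """Collects non-empty text nodes, skipping anything nested inside <math> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.segments: List[str] = []
        self._math_depth = 0  # how many <math> elements we are currently inside

    def handle_starttag(self, tag, attrs):
        if tag == "math":
            self._math_depth += 1

    def handle_endtag(self, tag):
        if tag == "math" and self._math_depth > 0:
            self._math_depth -= 1

    def handle_data(self, data):
        # Keep only prose outside formulas; surrounding whitespace is stripped for readability.
        if self._math_depth == 0 and data.strip():
            self.segments.append(data.strip())


def extract_text(html: str) -> List[str]:
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return collector.segments
```

For example, `extract_text("<p>Let <math><mi>x</mi></math> be a number.</p>")` returns `["Let", "be a number."]`. A real implementation would also need to keep positions or node references so translated text can be written back.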

Or, we could go for the nuclear option and use a full TeX engine by implementing certain parts of TransLaTeX in LuaTeX or LaTeX itself. This way, no structures would remain unknown to the parser. However, users would then also need a TeX engine installed for TransLaTeX to function. I have read some documentation on LuaTeX and low-level LaTeX commands, but so far I don't know whether this approach is even possible, or whether these engines expose an interface to their underlying parsers at all.
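If the TeX-engine route were chosen, TransLaTeX would at least need to check for the engine at startup. A minimal sketch using the standard `shutil.which`; the function name and the assumption that the executables are called `lualatex` or `luatex` are illustrative.

```python
import shutil
from typing import Optional, Sequence


def find_tex_engine(candidates: Sequence[str] = ("lualatex", "luatex")) -> Optional[str]:
    """Return the full path of the first candidate executable found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None
```

A caller could then stop early with a clear error message when `find_tex_engine()` returns `None`, instead of failing later in the pipeline.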

Further research on this issue is necessary to decide on the best solution. In any case, this change would require a substantial rework of TransLaTeX's internals, so we should first determine whether it is worthwhile.