I hate LaTeX. I love LaTeX.

Everyone who knows me IRL (and, I suppose, anyone who follows me online for long enough) knows that I have a… special relationship with LaTeX. I think it has something to do with its obscurity, even though it wasn’t specifically made to be obtuse, combined with it being so good at what it does, which is typesetting documents. It doesn’t help that people consistently make impressive things with it, thus showing that it’s not just theoretically Turing-complete, but really something you can bend to your will, provided you’re willing to grapple with books from the 70s and obscure PDFs scattered online, in lieu of some modern documentation.

This is to say, I set out to write a document and then suddenly 5 hours have passed and I’m reading about glue and fragile commands. In the end, it’s rarely worth it, but the giddy feeling of having mastered the weird machine lingers, and so the cycle repeats when the next report (or presentation) is due. As an example of this, let me share with you my recent venture into statefulness via auxiliary files with LaTeX.

The goal was simple: fully decouple metadata input from a title page, in terms of order and redundancy. I wanted to be able to do something like this:

\author{James A. First}
\affiliation{Redundant Affiliation}
\affiliation{The Institute}

\author{John B. Deux}
\affiliation{Redundant Affiliation}
\affiliation{The Other Institute}


and get something like this:

James A. First¹²    John B. Deux¹³
    ¹ Redundant Affiliation
    ² The Institute
    ³ The Other Institute


This turned out to be a slightly more complex variant of something that I’d previously managed: creating a Table of Contents. In this version of the problem, we aim to define two commands, \topic{Title} and \maketopics, such that we can get with the latter a list of all titles defined with the former.

If we were promised that all \topic commands preceded \maketopics, then this would be fairly easy¹ ² ³:

    \newcommand{\topic}[1]{%
        \edef\@topics{\@topics \par #1}}
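
For reference, a complete minimal document built around that line might look like the sketch below. The empty initial definition of \@topics and the plain-text titles are assumptions; a title containing fragile commands would likely need \protected@edef instead of \edef.

    \documentclass{article}
    \makeatletter
    \def\@topics{}                         % start with an empty list
    \newcommand{\topic}[1]{%
        \edef\@topics{\@topics \par #1}}   % append this title to the list
    \newcommand{\maketopics}{\@topics}     % typeset whatever was collected
    \makeatother
    \begin{document}
    \topic{Glue}
    \topic{Fragile commands}
    \maketopics % only sees the \topic calls that came before it
    \end{document}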

However, a TOC typically comes before all of the content, so this approach won’t work. If we knew exactly how many \topics were going to be defined, maybe we could make do with an obscene number of \expandafters, but that’s not going to cut it either. Then how can we get around the fact that LaTeX macros are expanded in the order of appearance?

(Now is a good time to pause reading and figure it out.)

File IO in LaTeX

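LaTeX has built-in file IO, via the following commands: \newwrite, \openin, \openout (and the counterparts \closein, \closeout), \read, and \write⁴. Respectively, they do the following: \newwrite allocates an output stream and gives it a name (its sibling \newread does the same for input streams); \openin and \openout attach a file to a stream for reading and for writing; \closein and \closeout detach it again; \read reads one line from an input stream into a macro; and \write writes its argument to an output stream as a line.

The output operations (\openout, \write, and \closeout) require a preceding \immediate, otherwise nothing will happen until the current page is flushed.⁵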
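
As a small illustration of these commands working together, here is a sketch that writes one line to a file and reads it straight back. The stream names \demoout and \demoin, the macro \demoline, and the file name demo.tmp are all made up for the example.

    \documentclass{article}
    \newwrite\demoout   % output stream (hypothetical name)
    \newread\demoin     % input stream (hypothetical name)
    \begin{document}
    % Write one line; \immediate makes each step happen right away.
    \immediate\openout\demoout=demo.tmp\relax
    \immediate\write\demoout{Hello from TeX}
    \immediate\closeout\demoout
    % Read that line back into the macro \demoline.
    \openin\demoin=demo.tmp\relax
    \read\demoin to\demoline
    \closein\demoin
    The file says: \demoline
    \end{document}

Dropping the three \immediate prefixes would defer the writes until a page is shipped out, so the \read would run before the file contained anything.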

Armed with this knowledge, and ready to do some damage, we can go back to the TOC problem: if we’re allowed to compile the tex file more than once⁶, we can do the following:

  1. On the first pass, don’t do anything with \maketopics, but have each \topic command write a line into a topics.aux file.
  2. On the second pass, define \maketopics simply to read and echo the contents of topics.aux.

The catch here is that we don’t necessarily know, at the time of macro expansion, which pass we’re on; for the sake of simplicity let’s assume that we always clean the auxiliary files before compiling, so the difference between the first and second pass is that our topics.aux file only exists during the second pass.

Then, we need to define a macro that tells us whether a file exists:

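\newread\@existsbuf % Input stream used only for the existence check.
\newif\if@fileexists
\newcommand{\fileexists}[1]{ %
    \immediate\openin\@existsbuf=#1 %
    \ifeof\@existsbuf %
        \@fileexistsfalse %
    \else %
        \@fileexiststrue %
    \fi %
    \immediate\closein\@existsbuf %
    \if@fileexists}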

Above I’ve used \ifeof, which is true if we’re at the End Of the opened File (i.e., we’ve already read everything in the file), or if the file never existed in the first place. We can use this as follows⁷:

    \fileexists{file.aux}
        file.aux exists.
    \else
        file.aux does not exist. \fi

Now, and from the description before, our definitions of \topics and \maketopics follow easily, with one caveat: we need all the writes to occur between a single \openout/\closeout pair, since opening a file will truncate any preexisting contents. Luckily, LaTeX has us covered with \AtEndDocument, which inserts its argument (you guessed it) at the end of the document.

\newcommand{\topics}[1]{ %
    \ifx\@empty\@buftopics %
        \relax % \@buftopics wasn't defined,
           % so we're not writing on this pass.
    \else %
        \immediate\write\@buftopics{#1 \par} %
    \fi}

\newcommand{\maketopics}{ %
    \fileexists{topics.aux} %
        % Second pass! Just read the file here
        \input{topics.aux} %
    \else %
        % First pass; do nothing.
    \fi}

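\let\@buftopics\@empty % Default: no stream, so \topics won't write.
\fileexists{topics.aux}\relax\else % If file does not exist:
    % Open the file for writing.
    % We need to only do this if we don't plan to
    %  read from the file! Otherwise we'll truncate it.
    \newwrite\@buftopics %
    \immediate\openout\@buftopics=topics.aux \relax
    % Close the file once every \topics has written to it.
    \AtEndDocument{\immediate\closeout\@buftopics}
\fi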
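
For comparison, here is a self-contained sketch of a slightly different arrangement, closer to how LaTeX treats its own .toc file: \maketopics first reads whatever the previous run left behind, and only afterwards opens the file for writing. This is a variant rather than the approach described above; it swaps \fileexists for the kernel's \IfFileExists, and the file name \jobname.topics is an assumption.

    \documentclass{article}
    \makeatletter
    \newwrite\@buftopics
    \newcommand{\maketopics}{%
        % Echo what the previous run collected, if anything.
        \IfFileExists{\jobname.topics}{\input{\jobname.topics}}{}%
        % Only now is it safe to truncate the file and start over.
        \immediate\openout\@buftopics=\jobname.topics\relax
        \AtEndDocument{\immediate\closeout\@buftopics}}
    \newcommand{\topic}[1]{%
        \immediate\write\@buftopics{#1 \par}}
    \makeatother
    \begin{document}
    \maketopics
    \topic{Glue}
    \topic{Fragile commands}
    \end{document}

The first run prints nothing and creates the file; the second run prints both titles. Because the file is rewritten on every run, this version does not rely on cleaning auxiliary files beforehand, but it does assume that \maketopics is called before any \topic.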

Now Make it Harder

With the previous example under our belt, let’s again tackle the original problem: we can use the same technique to store the different affiliations in an auxiliary file in a first pass, and then produce the correct symbols and text during a second pass, by reading from this file. The complications will come from having to interpret LaTeX as simple text, and vice versa. For convenience, I’ll be using below the catchfile and etoolbox packages, to get, respectively, the \CatchFileDef⁸ command and the \ifdefstrequal command, along with \IfFileExists, which the LaTeX kernel already provides. These are more robust versions of what you’d get with TeX primitives, which spares us some annoyances: for example, while you could compare two strings stored in macros \a and \b with \ifx\a\b, the comparison may incorrectly fail if either of them requires more than one expansion to get to the actual string, or if the two strings differ only in category codes (as detokenized text and a line read back from a file do). On the other hand, \ifdefstrequal{\a}{\b} ignores category codes, so it just works for our purposes.
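
To make the difference concrete, here is a small sketch comparing the same characters stored with different category codes. The macro names \stra and \strb are hypothetical.

    \documentclass{article}
    \usepackage{etoolbox}
    \begin{document}
    \def\stra{The Institute}                  % ordinary letters
    \edef\strb{\detokenize{The Institute}}    % same characters, as plain text
    % \ifx looks at category codes too, so it reports a mismatch:
    \ifx\stra\strb ifx: equal\else ifx: different\fi

    % \ifdefstrequal compares the two definitions as strings:
    \ifdefstrequal{\stra}{\strb}{etoolbox: equal}{etoolbox: different}
    \end{document}

The first test should take its \else branch and the second its true branch, which is the behaviour the affiliation matching relies on.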

I’m running out of steam writing this blog post, because, as is usual with LaTeX, there are so many tiny details justified by complex reasons; one very good example is the use of \protected@edef rather than just \edef. Instead, I now present my final solution to the proposed problem, with no further comment; figuring it out is left as an exercise to the persistent reader, who can email me at miguelmurca æt cumperativa.xyz, or tweet me @mikeevmm. You can also check out the nerd snipe/Beamer hate-letter that inspired this post.

\renewcommand{\author}[1]{%
    \ifx\@empty\@authors % Author list empty
        \global\protected@edef\@authors{#1}%
    \else % Other authors already present
        \global\protected@edef\@authors{\@authors, #1}%
    \fi}

\DeclareRobustCommand{\affiliation}[1]{ %
    \def\affilarg{#1\relax} %
    \protected@edef\affilarg{ %
        \detokenize\expandafter{\affilarg}} %
    % Calculate the footnotemark:
    \setcounter{@affilcounter}{0} %
    % Try to match \affilarg to one of the lines of the aux file
    \immediate\openin\@bufaffils=affils.aux\relax %
    \IfFileExists{affils.aux}{ %
        \newif\ifmatched %
        \matchedfalse %
        % Here I'm using the \unless extension for e-TeX, which
        % comes for free in pdfLaTeX. It's basically \if...\relax\else.
        \loop\unless\ifeof\@bufaffils %
            % Read a line from the file...
            \immediate\read\@bufaffils to\affilline %
            \ifeof\@bufaffils\relax\else %
                % ...and the empty line that follows.
                {\immediate\read\@bufaffils to\relax} %
            \fi %
            \stepcounter{@affilcounter} %
            % Comparing \affilline with \affilarg
            \ifdefstrequal{\affilline}{\affilarg}{ %
                % Matched, at the position held in @affilcounter!
                \global\matchedtrue %
            }{% else
                % Found no match
                \ifeof\@bufaffils %
                    % Also, exhausted the possible matches.
                    \global\setcounter{@affilcounter}{0} %
                \fi %
            } %
            % Break the loop.
            % See this TeX Stack Exchange answer for an explanation:
            % https://tex.stackexchange.com/a/12490
            \ifmatched\let\iterate\relax\fi %
        \repeat}{} %
    % Finished matching.
    \immediate\closein\@bufaffils %
    \ifnum\value{@affilcounter}=0 %
        % The affiliation was not found in the file.
        % Write/append it to the auxiliary file.
        % We do this by reading the file into a macro, appending
        %  our new line, and writing it all back.
        % Read the existing contents:
        \IfFileExists{affils.aux}{ %
            \CatchFileDef %
                {\@affilswrite} %
                {affils.aux} %
                {\endlinechar=`^^J}% Preserve EOLs in the file.
                                   % Note that ^^J is TeX-speak for escaped newline.
            }{\let\@affilswrite\@empty} %
        % Open the file:
        \immediate\openout\@bufaffils=affils.aux\relax %
        % Write everything:
        % (Just writing will guarantee a trailing newline.)
        \unless\ifx\@empty\@affilswrite %
            \protected@edef\@affilswrite{ %
                \detokenize\expandafter{\@affilswrite}} %
            \immediate\write\@bufaffils{\@affilswrite} %
        \fi %
        \immediate\write\@bufaffils{\affilarg} %
        \immediate\closeout\@bufaffils %
    \else %
        \def\affilsymb{\fnsymbol{@affilcounter}} %
        \global\protected@edef\@authors{\@authors${}^\affilsymb$} %
    \fi}

\renewcommand{\maketitle}{ %
    \let\@affils\@empty %
    % Load the affiliations:
    \IfFileExists{affils.aux}{ %
        \setcounter{@affilcounter}{0} %
        \immediate\openin\@bufaffils=affils.aux\relax %
        \loop\unless\ifeof\@bufaffils %
            \immediate\read\@bufaffils to\lineaffil %
            {\unless\ifeof\@bufaffils\immediate\read\@bufaffils to\relax\fi} %
            \stepcounter{@affilcounter} %
            \global\def\affilsymb{\fnsymbol{@affilcounter}} %
            \ifx\@empty\@affils %
                \global\protected@edef\@affils{${}^\affilsymb$\lineaffil} %
            \else %
                \global\protected@edef\@affils{ %
                    \@affils, ${}^\affilsymb$\lineaffil} %
            \fi %
        \repeat %
        \immediate\closein\@bufaffils %
    }{} % else nothing
    % Typeset the authors and affiliations:
    \begin{center} %
    \@authors \par %
    \ifx\@empty\@affils %
        \relax% No affiliations
    \else %
        \textsc{\@affils} \par
    \fi %
    \end{center}}

Fine, maybe some comments. The main thing here is that we’re trying to match each affiliation to a line in affils.aux, and appending the affiliation to the file if it’s not there. If it is there, we convert the line index (which we counted with a counter) into a symbol with \fnsymbol. This lets us independently print the authors with the correct affiliation symbols, and then the different affiliations with their respective symbol.

Each write in LaTeX forcibly ends with an empty new-line, and this causes some trouble parsing back the affils.aux file. I worked around this by always writing lines in pairs: an affiliation followed by an empty line. Then, parsing back the file, I assumed this structure and discarded lines accordingly. This worked well, but I am almost positive that I could have a more elegant solution by going over the file’s lines in a do..while-style loop, rather than the current for-style loop. Speaking of which, in case you’re not familiar, TeX’s loop syntax is a little weird: it’s \loop <content> \if <condition> <true action> \repeat, but the most common pattern is using it as \loop\if<condition> <actions> \repeat as a sort of while loop. But you already knew that.
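
In case the while-style pattern is unfamiliar after all, here is a minimal sketch of it. The counter name \demoi is hypothetical.

    \documentclass{article}
    \newcount\demoi   % hypothetical scratch counter
    \begin{document}
    \demoi=0
    % Runs the body for as long as the condition holds.
    \loop\ifnum\demoi<3
        \advance\demoi by 1
        Iteration \the\demoi.\par
    \repeat
    \end{document}

This typesets three short paragraphs, one per iteration.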

Another thing, which you might already have noticed, is all the %s. LaTeX isn’t actually insensitive to newlines, and it’s not always clear when it’s safe to break a line. It also doesn’t help that LaTeX’s error reporting is cryptic, so to be safe, and not spend mental bandwidth with it, I just end lines that I’m wrapping for source code reasons with %.
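
A small sketch of what can go wrong without them: the two macros below differ only in their line endings, and the names \spacey and \tight are made up for the example.

    \documentclass{article}
    % Each unprotected line end inside the definition becomes a space token.
    \newcommand{\spacey}[1]{
        (#1)
    }
    % Ending every line with % keeps those spaces out.
    \newcommand{\tight}[1]{%
        (#1)%
    }
    \begin{document}
    a\spacey{x}b \quad a\tight{x}b
    \end{document}

The first call should come out with a space on either side of the parentheses, the second without any.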

Finally, I also want to comment on this pattern:

\protected@edef\x{ %
    \detokenize\expandafter{\x}}

What we’re doing here is redefining \x to be the string of its current definition. This is more or less straightforward to do with \detokenize, since what this command does is convert its argument to simple text, but here we have the added complication that we need to expand the argument of \detokenize, before actually converting it to simple text. The \expandafter is interrupting LaTeX’s parsing of { (which indicates the start of \detokenize’s argument), and expanding whatever follows immediately after; in this case \x. The detokenization then proceeds normally. See here for a more careful explanation.
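
The effect of that \expandafter is easiest to see side by side. In the sketch below, the macros \direct and \viaexp are hypothetical names, and \x holds some arbitrary markup.

    \documentclass{article}
    \begin{document}
    \def\x{\textbf{bold}}
    % Without \expandafter, the name of the macro is turned into text.
    \edef\direct{\detokenize{\x}}
    % With \expandafter, \x is expanded once first, so its contents are.
    \edef\viaexp{\detokenize\expandafter{\x}}
    \texttt{\direct} versus \texttt{\viaexp}
    \end{document}

The first should print the characters of the macro name, the second the characters of its definition.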

OK, that’s actually everything. Do send me emails with suggestions or questions, I love to hear from the internet. But also remember I’m just a kid writing a blog post, and am therefore at the top of the Dunning-Kruger peak. Be kind, please.

Discuss this post on HackerNews


Users gus_massa and zauguin on HackerNews cleverly pointed out that it’s perfectly reasonable to expect every \affiliation command to precede \maketitle, and rather than writing into an auxiliary file, proposed the following vector mechanism:

    \expandafter\def\csname #[email protected]#2\endcsname{#3}%

    \csname #[email protected]#2\endcsname%
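
As a sketch of the general idea (not necessarily their exact code), a \csname-based vector can be built from two small macros. The names \setvec and \getvec, and the vec@ naming scheme, are assumptions made for this example.

    \documentclass{article}
    % \setvec{<vector>}{<index>}{<value>} stores a value (hypothetical name).
    \newcommand{\setvec}[3]{%
        \expandafter\def\csname vec@#1@#2\endcsname{#3}}
    % \getvec{<vector>}{<index>} retrieves it (hypothetical name).
    \newcommand{\getvec}[2]{%
        \csname vec@#1@#2\endcsname}
    \begin{document}
    \setvec{affil}{1}{Redundant Affiliation}
    \setvec{affil}{2}{The Institute}
    \getvec{affil}{2}, \getvec{affil}{1}
    \end{document}

Each element lives in its own control sequence, so everything stays in memory and no second pass is needed, as long as all values are stored before they are read.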

After thinking about it, I believe they’re right, and will update beamleeto to use this mechanism instead when I have the time.

  1. Already I’m throwing \edefs at you and mixing them up with \newcommands and so on. I simply don’t know enough (and there’s not enough space in this post) to go over the basics of TeX and LaTeX here, so you may be a little lost if you haven’t already messed around a bit with either one. Furthermore, I have a bad tendency to use Plain TeX, e-TeX, and LaTeX commands interchangeably, since my knowledge is almost strictly operational. In any case, if you’re curious, I can recommend this very good Plain TeX reference.

  2. I will, however, give a brief explanation of \makeatletter and \makeatother: typically, @ is not a “letter” token in (La)TeX. However, in TeX, this type of thing is configurable on-the-fly. This makes for a useful mechanism where you can \makeatletter, then define a command that has an @ in its name, and then go back to the default with \makeatother, such that an ordinary user won’t accidentally call this internal macro. (They can still go out of their way to do so, by calling \makeatletter themselves.)

  3. Fine, I guess I can also explain \edef. It stands for “expanded definition”, and it’s for when you want the definition of the macro to be interpreted right now, rather than when the macro is called. The most common example is exactly the one provided here: if we were to \def\@topics{\@topics etc.} then the definition of \@topics would become infinitely recursive. Instead, we mean “define \@topics to be its contents right now plus some stuff,” and therefore we use \edef.

  4. I often referred to this reference. Note that it’s applicable to LaTeX, not TeX, and, while it’s a good reference, it’s not a complete one. 

  5. Why? Because you might not know some stuff about the page from where you’re calling the macro until the page has actually been flushed: “By default LaTeX does not write string to the file right away. This is because, for example, you may need \write to save the current page number, but when TeX comes across a \write it typically does not know what the page number is, since it has not yet done the page breaking.”

  6. If you’re using something like latexmk, you get this for free: I’m not sure what mechanism it uses to decide how many times it should recompile the files — maybe auxiliary file stability? — but it recompiles your project as many times as needed. This is because the technique we’re describing here is quite common, and is used, e.g., in reference numbering. (If you’re now finding out about latexmk, you’re very welcome.) 

  7. I kept the above as simple as possible, but it’d be way cooler (and ergonomic) to modify \fileexists so that its use was \if\fileexists{...} ... \fi. This is actually quite easy to achieve, so I’m leaving it as an exercise to the reader. (Hint: you can do it by adding three characters to the current definition.) 

  8. This one’s name isn’t so self-explanatory; it reads the contents of a file into a provided macro, which turns out to be surprisingly hard to do robustly with primitives.