Everyone who knows me IRL (and, I suppose, anyone who follows me online for long enough) knows that I have a… special relationship with LaTeX. I think it has something to do with its obscurity, when it wasn’t specifically made to be obtuse, and then being so good at what it does, which is typesetting documents. It doesn’t help that people consistently make impressive things with it, thus showing that it’s not just theoretically Turing-complete, but really something you can bend to your will, provided you’re willing to grapple with books from the 70s and obscure PDFs scattered online, in lieu of some modern documentation.
This is to say, I set out to write a document and then suddenly 5 hours have passed and I’m reading about glue and fragile commands. In the end, it’s rarely worth it, but the giddy feeling of having mastered the weird machine lingers, and so the cycle repeats when the next report (or presentation) is due. As an example of this, let me share with you my recent venture into statefulness via auxiliary files with LaTeX.
The goal was simple: fully decouple metadata input from a title page, in terms of order and redundancy. I wanted to be able to do something like this:
\author{James A. First}
\affiliation{Redundant Affiliation}
\affiliation{The Institute}
\author{John B. Deux}
\affiliation{Redundant Affiliation}
\affiliation{The Other Institute}
\maketitle
and get something like this:
James A. First¹² John B. Deux¹³
¹ Redundant Affiliation
² The Institute
³ The Other Institute
This turned out to be a slightly more complex variant of something that I’d previously managed: creating a Table of Contents. In this version of the problem, we aim to define two commands, \topic{Title} and \maketopics, such that we can get with the latter a list of all titles defined with the former.
If we were promised that all \topic commands preceded \maketopics, then this would be fairly easy¹ ² ³:
\makeatletter
\newcommand{\@topics}{}
\newcommand{\topic}[1]{%
  \edef\@topics{\@topics \par #1}}
\newcommand{\maketopics}{%
  \@topics}
\makeatother
However, a TOC typically comes before all of the content, so this approach won’t work. If we knew exactly how many \topics were going to be defined, maybe we could make do with an obscene amount of \expandafters, but that’s not going to cut it either. Then how can we get around the fact that LaTeX macros are expanded in the order of appearance?
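To see the problem concretely, here's a minimal sketch of a document using the definitions above, with one \maketopics before the topics and one after. The titles are placeholders I made up for illustration.

```latex
\documentclass{article}
\makeatletter
\newcommand{\@topics}{}
\newcommand{\topic}[1]{%
  \edef\@topics{\@topics \par #1}}
\newcommand{\maketopics}{%
  \@topics}
\makeatother
\begin{document}
% No \topic has run yet, so \@topics is still empty here
% and this typesets nothing:
\maketopics

\topic{First}
\topic{Second}

% By now \@topics has accumulated both titles,
% so this typesets "First" and "Second", one per paragraph:
\maketopics
\end{document}
```

The first call is exactly where a TOC would normally go, and it comes out blank.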
(Now is a good time to pause reading and figure it out.)
LaTeX has built in file IO, via the following commands: \newwrite, \openin, \openout (and the counterparts \closein, \closeout), \read, and \write⁴. Respectively, they do the following:
\newwrite gets you an unused file descriptor (a number), which LaTeX requires to do file related operations. Note that this descriptor does not uniquely match a file; you simply point a descriptor at a file path.
\openin and \openout do precisely this; they bind a descriptor to a file.
\read reads a line from the file descriptor and into a macro, with (the fun) syntax \read\filedesc to\myline.
\write does what it says on the tin: \write\filedesc{contents}. Importantly, contents is expanded before the write.
\closein and \closeout just close the file access.
The output operations (\openout, \write and \closeout) need a preceding \immediate, otherwise nothing will happen until the current page is flushed.⁵
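As a quick sanity check of these commands, here's a minimal round trip: write one line to a file, then read it back. The names \demoout, \demoin, \demoline and the file demo.txt are made up for this sketch. I'm also using \newread, which is the input-side sibling of \newwrite and isn't in the list above.

```latex
\documentclass{article}
\newwrite\demoout % descriptor for writing
\newread\demoin   % descriptor for reading
\begin{document}
% Write a single line. \jobname is expanded before the write,
% so the file ends up containing the actual job name.
\immediate\openout\demoout=demo.txt
\immediate\write\demoout{Hello from \jobname}
\immediate\closeout\demoout

% Read the line back into \demoline and typeset it.
\openin\demoin=demo.txt
\ifeof\demoin
  Could not open demo.txt.
\else
  \read\demoin to\demoline
  First line: \demoline
\fi
\closein\demoin
\end{document}
```

If you drop the \immediate prefixes, the file is still empty by the time \openin runs, because the writes are deferred until the page is shipped out.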
Armed with this knowledge, and ready to do some damage, we can go back to the TOC problem: if we’re allowed to compile the tex file more than once⁶ we can do the following:
On the first pass, ignore \maketopics, but have each \topic command write a line into a topics.aux file.
On the second pass, have \maketopics simply read and echo the contents of topics.aux.
The catch here is that we don’t necessarily know, at the time of macro expansion, what’s the current pass; for the sake of simplicity let’s assume that we always clean the auxiliary files before compiling, so the difference between the first and second pass is that our topics.aux file only exists during the second pass.
Then, we need to define a macro that tells us whether a file exists:
\makeatletter
\newread\@existsbuf
\newif\if@fileexists
\newcommand{\fileexists}[1]{%
  \openin\@existsbuf=#1 % the space ends the file name
  \ifeof\@existsbuf
    \@fileexistsfalse
  \else
    \@fileexiststrue
  \fi
  \closein\@existsbuf
  \if@fileexists}
\makeatother
Above I’ve used \ifeof, which is true if we’re at the End Of the opened File (i.e., we’ve already read everything in the file), or if the file never existed in the first place. We can use this as follows⁷:
\fileexists{file.aux}
file.aux exists.
\else
file.aux does not exist.
\fi
Now, and from the description before, our definitions of \topic and \maketopics follow easily, if with one caveat: we need all the writes to occur between a single \openout/\closeout pair, since opening a file will truncate any preexisting contents. Luckily, LaTeX has us covered with \AtEndDocument, which inserts its argument (you guessed it) at the end of the document.
\makeatletter
\let\@buftopics\@empty
\newcommand{\topic}[1]{%
  \ifx\@empty\@buftopics
    \relax % \@buftopics wasn't defined,
           % so we're not writing on this pass.
  \else
    \immediate\write\@buftopics{#1 \par}%
  \fi}
\newcommand{\maketopics}{%
  \fileexists{topics.aux}%
    % Second pass! Just read the file here
    \input{topics.aux}%
  \else
    % First pass; do nothing.
  \fi}
\fileexists{topics.aux}\else % If file does not exist:
  % Open the file for writing.
  % We need to only do this if we don't plan to
  % read from the file! Otherwise we'll truncate it.
  \newwrite\@buftopics
  \immediate\openout\@buftopics=topics.aux \relax
  \AtEndDocument{\immediate\closeout\@buftopics}
\fi
\makeatother
With the previous example under our belt, let’s again tackle the original problem: we can use the same technique to store the different affiliations in an auxiliary file in a first pass, and then produce the correct symbols and text during a second pass, by reading from this file. The complications will come from having to interpret LaTeX as simple text, and vice-versa. For convenience, I’ll be using below the catchfile and etoolbox packages, to get, respectively, the \CatchFileDef⁸ command and the \ifdefstrequal command, along with \IfFileExists, which LaTeX itself provides. These are more robust versions of what you’d get with TeX primitives, which allows us not to have to deal with some annoyances: for example, while you could compare two strings stored in macros \a and \b with \ifx\a\b, that test compares the two definitions token by token, category codes included. If one of the strings was detokenized or read back from a file, or if one of the macros needs a further expansion to get to the actual string, the comparison may incorrectly fail. On the other hand, \ifdefstrequal{\a}{\b} compares the two definitions as plain strings, ignoring category codes, which is what we need here.
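Here's a small sketch of one way \ifx and a string comparison can disagree. The two macros below (hypothetical names \typedaffil and \storedaffil) hold the same characters, but the second has been through \detokenize, so its letters carry a different category code. This assumes etoolbox is installed.

```latex
\documentclass{article}
\usepackage{etoolbox}
\begin{document}
% Typed directly: the letters have category code 11.
\def\typedaffil{The Institute}
% Detokenized: same characters, but category code 12.
\edef\storedaffil{\detokenize{The Institute}}

% \ifx compares token by token, category codes included:
\ifx\typedaffil\storedaffil same\else different\fi % typesets "different"

% \ifdefstrequal compares the definitions as strings:
\ifdefstrequal{\typedaffil}{\storedaffil}{same}{different} % typesets "same"
\end{document}
```

Note that \ifdefstrequal looks at the replacement text as stored and doesn't expand it further, so a macro that merely points at another macro would still need expanding first.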
I’m running out of steam writing this blog post, because, as is usual with LaTeX, there are so many tiny details justified by complex reasons; one very good example is the use of \protected@edef rather than just \edef. Instead, I now present my final solution to the proposed problem, with no further comment; figuring it out is left as an exercise to the persistent reader, who can email me at miguelmurca æt cumperativa.xyz, or tweet me @mikeevmm. You can also check out the nerd snipe/Beamer hate-letter that inspired this post.
\makeatletter
\let\@authors\@empty
\renewcommand{\author}[1]{%
\ifx\@empty\@authors%
% Author list empty
\global\def\@authors{#1}%
\else%
% Other authors already present
\global\protected@edef\@authors{\@authors, #1}%
\fi}
\makeatother
\makeatletter
\newcounter{@affilcounter}
\newwrite\@bufaffils
\DeclareRobustCommand{\affiliation}[1]{ %
\def\affilarg{#1\relax} %
\protected@edef\affilarg{ %
\detokenize\expandafter{\affilarg}} %
% Calculate the footnotemark:
\setcounter{@affilcounter}{0} %
% Try to match \affilarg to one of the lines of the aux file
\immediate\openin\@bufaffils=affils.aux\relax %
\IfFileExists{affils.aux}{ %
\newif\ifmatched %
\matchedfalse %
% Here I'm using the \unless extension for e-TeX, which
% comes for free in pdfLaTeX. It's basically \if...\relax\else.
\loop\unless\ifeof\@bufaffils %
% Read a line from the file...
\immediate\read\@bufaffils to\affilline %
\ifeof\@bufaffils\relax\else %
% ...and the empty line that follows.
{\immediate\read\@bufaffils to\relax} %
\fi %
\stepcounter{@affilcounter} %
% Comparing \affilline with \affilarg
\ifdefstrequal{\affilline}{\affilarg}{ %
% Matched, at position \the@affilcounter!
\global\matchedtrue %
}{% else
% Found no match
\ifeof\@bufaffils %
% Also, exhausted the possible matches.
\global\setcounter{@affilcounter}{0} %
\fi %
} %
% Break the loop.
% See this TeXExchange answer for an explanation:
% https://tex.stackexchange.com/a/12490
\ifmatched\let\iterate\relax\fi %
\repeat}{} %
% Finished matching.
\immediate\closein\@bufaffils %
%
\ifnum\value{@affilcounter}=0 %
% The affiliation was not found in the file.
% Write/append it to the auxiliary file.
% We do this by reading the file into a macro, appending
% our new line, and writing it all back.
% Read the existing contents:
\IfFileExists{affils.aux}{ %
\CatchFileDef %
{\@affilswrite} %
{affils.aux} %
{\endlinechar=`^^J}% Preserve EOLs in the file.
% Note that ^^J is TeX-speak for escaped newline.
}{\let\@affilswrite\@empty} %
% Open the file:
\immediate\openout\@bufaffils=affils.aux\relax %
% Write everything:
% (Just writing will guarantee a trailing newline.)
\unless\ifx\@empty\@affilswrite %
\protected@edef\@affilswrite{ %
\detokenize\expandafter{\@affilswrite}} %
\immediate\write\@bufaffils{\@affilswrite} %
\fi %
\immediate\write\@bufaffils{\affilarg} %
\immediate\closeout\@bufaffils %
%
\else %
\def\affilsymb{\fnsymbol{@affilcounter}} %
% Append the symbol to the author list (superscript markup assumed).
\global\protected@edef\@authors{\@authors\textsuperscript{\affilsymb}} %
\fi}
\makeatother
\makeatletter
\renewcommand{\maketitle}{ %
\let\@affils\@empty %
% Load the affiliations:
\IfFileExists{affils.aux}{ %
\setcounter{@affilcounter}{0} %
\immediate\openin\@bufaffils=affils.aux\relax %
\loop\unless\ifeof\@bufaffils %
\immediate\read\@bufaffils to\lineaffil %
{\unless\ifeof\@bufaffils\immediate\read\@bufaffils to\relax\fi} %
\stepcounter{@affilcounter} %
\global\def\affilsymb{\fnsymbol{@affilcounter}} %
% Prefix each affiliation with its symbol (superscript markup assumed).
\ifx\@empty\@affils %
\global\protected@edef\@affils{\textsuperscript{\affilsymb}\lineaffil} %
\else %
\global\protected@edef\@affils{ %
\@affils, \textsuperscript{\affilsymb}\lineaffil} %
\fi %
\repeat %
\immediate\closein\@bufaffils %
}{} % else nothing
%
% Typeset the authors and affiliations:
\begin{center} %
\@authors \par %
\ifx\@empty\@affils %
\relax% No affiliations
\else%
\textsc{\@affils} \par
\fi%
\end{center}}
\makeatother
Fine, maybe some comments. The main thing here is that we’re trying to match each affiliation to a line in affils.aux, and appending the affiliation to the file if it’s not there. If it is there, we convert the line index (which we counted with a counter) into a symbol with \fnsymbol. This lets us independently print the authors with the correct affiliation symbols, and then the different affiliations with their respective symbol.
Each write in LaTeX forcibly ends with an empty new-line, and this causes some trouble parsing back the affils.aux file. I worked around this by always writing lines in pairs: an affiliation followed by an empty line. Then, parsing back the file, I assumed this structure and discarded lines accordingly. This worked well, but I am almost positive that I could have a more elegant solution by going over the file’s lines in a do..while-style loop, rather than the current for-style loop. Speaking of which, in case you’re not familiar, TeX’s loop syntax is a little weird: it’s \loop <content> \if<condition> <true action> \repeat, but the most common pattern is using it as \loop\if<condition> <actions> \repeat as a sort of while loop. But you already knew that.
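In case you didn't already know that, here's a minimal sketch of both shapes, counting to three. The counter \demoidx is a name I made up for the example.

```latex
\documentclass{article}
\newcount\demoidx
\begin{document}
% "while" shape: the test comes first, so the body may run zero times.
\demoidx=0
\loop\ifnum\demoidx<3
  \advance\demoidx by 1
  Item \the\demoidx.
\repeat

% "do..while" shape: the body runs once before the first test.
\demoidx=0
\loop
  \advance\demoidx by 1
  Item \the\demoidx.
\ifnum\demoidx<3
\repeat
\end{document}
```

Both loops typeset "Item 1. Item 2. Item 3." The difference only shows when the condition is false from the start: the first loop then does nothing, while the second still runs its body once.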
Another thing, which you might already have noticed, is all the %s. LaTeX isn’t actually insensitive to newlines, and it’s not always clear when it’s safe to break a line. It also doesn’t help that LaTeX’s error reporting is cryptic, so to be safe, and not spend mental bandwidth on it, I just end lines that I’m wrapping for source code reasons with %.
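A minimal sketch of what goes wrong without them, using two made-up macros (\spaced and \tight) that differ only in their line endings:

```latex
\documentclass{article}
% Each unprotected line end inside the definition becomes a space token.
\newcommand{\spaced}[1]{
  [#1]
}
% Ending the lines with % swallows the line end, so no space sneaks in.
\newcommand{\tight}[1]{%
  [#1]%
}
\begin{document}
a\spaced{x}b % typesets "a [x] b"

a\tight{x}b % typesets "a[x]b"
\end{document}
```

A line end right after a control word like \relax or \else is harmless, since TeX skips spaces there anyway; it's the ones after braces and ordinary characters that bite.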
Finally, I also want to comment on this pattern:
\protected@edef\x{ %
\detokenize\expandafter{\x}}
What we’re doing here is redefining \x to be the string of its current definition. This is more or less straightforward to do with \detokenize, since what this command does is convert its argument to simple text, but here we have the added complication that we need to expand the argument of \detokenize, before actually converting it to simple text. The \expandafter is interrupting LaTeX’s parsing of { (which indicates the start of \detokenize’s argument), and expanding whatever follows immediately after; in this case \x. The detokenization then proceeds normally.
See here for a more careful explanation.
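To convince yourself of what the \expandafter buys you, here's a small sketch that does it both ways and prints the results to the log. The contents of \x and the second macro \y are placeholders for the example.

```latex
\documentclass{article}
\makeatletter
\def\x{\textbf{The Institute}}
% Without \expandafter: \detokenize sees the token \x itself.
\edef\y{\detokenize{\x}}
% With \expandafter: \x is expanded once before \detokenize sees it.
\protected@edef\x{\detokenize\expandafter{\x}}
\makeatother
\typeout{y: \meaning\y} % log shows: y: macro:->\x 
\typeout{x: \meaning\x} % log shows: x: macro:->\textbf {The Institute}
\begin{document}
Check the log.
\end{document}
```

The space after \textbf in the second line is \detokenize's doing: it appends one after every control word.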
OK, that’s actually everything. Do send me emails with suggestions or questions, I love to hear from the internet. But also remember I’m just a kid writing a blog post, and am therefore at the top of the Dunning-Kruger peak. Be kind, please.
Discuss this post on HackerNews
Users gus_massa and zauguin on HackerNews cleverly pointed out that it’s perfectly reasonable to expect every \affiliation command to precede \maketitle, and rather than writing into an auxiliary file, proposed the following vector mechanism:
\newcommand{\defwithindex}[3]{%
\expandafter\def\csname #1@#2\endcsname{#3}%
}
\newcommand{\getwithindex}[2]{%
\csname #1@#2\endcsname%
}
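As I understand it, the idea is that each (name, index) pair becomes its own macro, which you can then look up by index. Here's a minimal usage sketch; the array name affil and the entries are placeholders of mine, not part of the original suggestion.

```latex
\documentclass{article}
\newcommand{\defwithindex}[3]{%
  \expandafter\def\csname #1@#2\endcsname{#3}%
}
\newcommand{\getwithindex}[2]{%
  \csname #1@#2\endcsname%
}
\begin{document}
% Store two entries in the "affil" array...
\defwithindex{affil}{1}{The Institute}
\defwithindex{affil}{2}{The Other Institute}

% ...and fetch one back by index.
\getwithindex{affil}{2} % typesets "The Other Institute"
\end{document}
```

Since \csname fully expands its contents, the index can also come from a counter (e.g. \arabic{...}), which is what makes this usable as a growing list.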
After thinking about it, I believe they’re right, and will update beamleeto to use this mechanism instead when I have the time.
1. Already I’m throwing \edefs at you and mixing them up with \newcommands and so on. I simply don’t know enough (and there’s not enough space in this post) to go over the basics of TeX and LaTeX here, so you may be a little lost if you haven’t already messed around a bit with either one. Furthermore, I have a bad tendency to interchangeably use Plain TeX, e-TeX, and LaTeX commands, since my knowledge is almost strictly operational. In any case, if you’re curious, I can recommend this very good Plain TeX reference.
2. I will, however, give a brief explanation of \makeatletter and \makeatother: typically, @ is not a “letter” token in (La)TeX. However, in TeX, this type of thing is configurable on-the-fly. This makes for a useful mechanism where you can \makeatletter, then define a command that has an @ in its name, and then go back to the default with \makeatother, such that an ordinary user won’t accidentally call this internal macro. (They can still go out of their way to do so, by calling \makeatletter themselves.)
3. Fine, I guess I can also explain \edef. It stands for “expanded definition”, and it’s for when you want the definition of the macro to be interpreted right now, rather than when the macro is called. The most common example is the one exactly provided here: if we were to \def\@topics{\@topics etc.} then the definition of \@topics would become infinitely recursive. Instead, we mean “define \@topics to be its contents right now plus some stuff,” and therefore we use \edef.
4. I often referred to this reference. Note that it’s applicable to LaTeX, not TeX, and, while it’s a good reference, it’s not a complete one.
5. Why? Because you might not know some stuff about the page from where you’re calling the macro until the page has actually been flushed: “By default LaTeX does not write string to the file right away. This is because, for example, you may need \write to save the current page number, but when TeX comes across a \write it typically does not know what the page number is, since it has not yet done the page breaking.”
6. If you’re using something like latexmk, you get this for free: I’m not sure what mechanism it uses to decide how many times it should recompile the files (maybe auxiliary file stability?) but it recompiles your project as many times as needed. This is because the technique we’re describing here is quite common, and is used, e.g., in reference numbering. (If you’re now finding out about latexmk, you’re very welcome.)
7. I kept the above as simple as possible, but it’d be way cooler (and more ergonomic) to modify \fileexists so that its use was \if\fileexists{...} ... \fi. This is actually quite easy to achieve, so I’m leaving it as an exercise to the reader. (Hint: you can do it by adding three characters to the current definition.)
8. This one’s name isn’t so self-explanatory; it reads the contents of a file into a provided macro, which turns out to be surprisingly hard to do robustly with primitives.