LaTeX3 Automatic Labels for Fun and No Profit

I’m a PhD student in physics, which means I’ve spent the better part of the last 10 years writing LaTeX. For those not in the know, LaTeX is a 40-year-old superset of a 46-year-old typesetting system – i.e., a macro-based programming language for producing print documents1. Notably, it’s mostly not intended as a programming language; its strongest suit is arguably the way it beautifully typesets mathematics, and the syntax it provides for expressing complex mathematical expressions. For example,

\sqrt{\frac{a+b}{\vert{c}\vert}}

refers to √((a + b)/|c|). Indeed, if you’ve ever needed to write a mathematical expression into a computer, you’re likely to have used either TeX or some form of pidgin TeX.2

But, (La)TeX really is Turing complete – it’s just extremely convoluted. This makes (La)TeX a reasonably fun esoteric programming language to play around with.3 On the other hand, being able to wrangle (La)TeX’s macro system lets you automate repetitive tasks, or generally extend (La)TeX’s functionality, which ends up being of practical use.4 This is why there is also a community effort to improve programming (“macro writing”) in LaTeX.

LaTeX3 is a “new kernel for LaTeX [… based on] its own consistent interface to all the functions needed to control TeX."[@expl3.pdf] The “new” qualifier is arguable, seeing as it has been in development since 1989, but, in short, we now have access to a set of base macros in LaTeX which are more sophisticated and behave more predictably. Unfortunately, however, due to Knuth’s own fondness for literate programming (the man invented the concept, after all), and because of the nature of LaTeX’s output,[citation needed] most information about LaTeX3’s functionality is buried deep in long PDFs of interspersed prose and code, accessible only via texdoc <designator you must guess>.5 6 A notable (and welcome) exception is this article by Alan Xiang, which I recommend reading. In any case, this post is my attempt to make a small contribution to practical and digestible LaTeX3 materials, so that you, too, can procrastinate writing your document by writing very convoluted LaTeX macros.

Problem statement and goal

In LaTeX, you may \label{...} sections, equations, …, and later refer to their identifier with \ref{...}. So, for example,

\documentclass{article}
\begin{document}

\section{First section}
\label{sec:first-section}

Some text here.

This is actually section~\ref{sec:first-section}.

\end{document}

yields

1 First section

    Some text here.
    This is actually section 1.

As said, this also works for equations, and is most useful in mathematical documents, where you want to reference equations in the body of text. LaTeX2e provides the equation environment, which automatically typesets a nice (eqno) next to your equation, and which you can label and reference: 7

\documentclass{article}
\let\implies\Longrightarrow % \implies == \Longrightarrow
\begin{document}

\begin{equation} \label{eq:a-implies-b}
    A \implies B
\end{equation}
\begin{equation} \label{eq:b-implies-c}
    B \implies C
\end{equation}

From eqs.~$(\ref{eq:a-implies-b}, \ref{eq:b-implies-c})$, we conclude that $A$ implies
$C$.

\end{document}

yields something like

            A ⇒ B       (1)
            B ⇒ C       (2)

From eqs. (1,2), we conclude that A implies C.

However, labels must be unique throughout the document. And so, when writing out a long document, it quickly becomes quite upsetting to think up a good label for an equation you are certain you will only reference in the next line of text. Wouldn’t it be nice to have some \AutoLabel and \AutoRef macros that would let you just

\begin{equation} \AutoLabel
    1 + 1 = 2
\end{equation}

Equation~\AutoRef{} can be proven with set theory.

First steps

The simplest approach to this problem is to have \AutoLabel systematically generate a different label every time it’s called, and to have \AutoRef reference that label when it’s called.

\documentclass{article}
\let\implies\Longrightarrow

\ExplSyntaxOn
\int_new:N \g_autolabel_int
\NewDocumentCommand{\AutoLabel}{}{
    \int_gincr:N \g_autolabel_int
    \exp_args:Ne \label {autolabelprefix- \int_use:N \g_autolabel_int}
}
\NewDocumentCommand{\AutoRef}{}{
    \exp_args:Ne \ref {autolabelprefix-\int_use:N \g_autolabel_int}
}
\ExplSyntaxOff

\begin{document}

\begin{equation} \AutoLabel
    A \implies B
\end{equation}

I have just defined eq.~\AutoRef{}.

\end{document}

Let’s break this down line by line:

\ExplSyntaxOn 

Anything between \ExplSyntaxOn and \ExplSyntaxOff obeys LaTeX3’s syntax rules, rather than LaTeX2e’s rules. Indeed, (La)TeX allows you to (roughly speaking) re-define syntax rules on the fly; in particular, the “function” of each character as it is parsed. So, for example, while \ is usually a special character (indicating that what follows is a control sequence), you can change its meaning halfway through the document so that it functions as a regular character. A typical (ab)use of this mechanism in LaTeX2e is the switching of the “character type” (the “category code”, or “catcode”)8 of @ between special character and regular letter, to guard internal commands from the end-user. This switching is packaged into the \makeatletter and \makeatother commands (which, respectively, set the catcode of @ to “regular letter” and “other character”), and you’ll often find LaTeX2e code that looks like:

\makeatletter
\newcommand{\@internalcommand}{Do something}
\newcommand{\endusercommand}{\@internalcommand\ and something else.}
\makeatother
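
For the record, these two commands are essentially just catcode assignments; roughly:

\catcode`\@=11 % \makeatletter: @ becomes a "letter", usable in macro names
\catcode`\@=12 % \makeatother:  @ goes back to being an "other" character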

Calling \@internalcommand after \makeatother yields an error, because, after \makeatother, @ is no longer a letter, and so \@internalcommand is read as the macro \@ followed by the ordinary text internalcommand. This also highlights the fact that the catcode switching happens as these declarations are parsed, even though the expressions in the declarations themselves are not evaluated at the time of declaration. This shows up in the following example, a mistake I’ve made more than once:

\newcommand{\boom}{\makeatletter\@destroytheworld\makeatother}

Calling \boom later in the document errors out, because at the time of declaration the parser read \@, not \@destroytheworld. At macro declaration time, the parser sees \makeatletter, but only as a token that is part of the definition – it doesn’t make sense to “run” the definition at declaration time, after all. It thus parses the rest of the definition under its current rules, which do not treat @ as a letter. So, as far as the parser is concerned, \boom reads [macro: \makeatletter] [macro: \@] [letters: destroytheworld] [macro: \makeatother]. When \boom is finally called, LaTeX complains about the use of \@.

This is a reasonably long tangent to say that \ExplSyntaxOn modifies catcodes, to favour better and more consistent macro names, and to provide a nicer “programming” environment. There are two notable effects: first, within expl3 rules, spaces are ignored. Really, dealing with spaces within macro declarations is a true pain point of LaTeX2e; not just spaces you type out, but also spaces implied by new lines. Here’s an example; what do you expect the differences to be between \cmdA, \cmdB, \cmdC, and \cmdD?

\newcommand{\cmdA}{foobar}
\newcommand{\cmdB}{
    foobar
}
\newcommand{\cmdC}{
    foo
    bar%
}
\newcommand{\cmdD}{foo%
    bar}

I tested these out by calling .\cmdA..\cmdB..\cmdC..\cmdD. within a LaTeX document; the periods are there to highlight any resulting spacing. The results are as follows:

.foobar..␣foobar␣..␣foo␣bar..foobar.

(I’ve highlighted the spaces for you, with the ␣ character.) Essentially, what you would expect to happen in the body of text – that a single newline becomes a space – is also happening within the macro declaration, inserting spurious spaces everywhere. These spaces can only be avoided by inserting comment characters, %, before the end of the line, which the parser interprets as an explicit instruction to ignore the newline. This results in a mess of % everywhere in macro declarations, since, at some point, macro writers start putting a % at the end of every line where they don’t explicitly want a space. While, in the example given, the spaces are not too critical (although unwanted), within complicated macro declarations spurious spaces will cause some really nasty bugs and crashes. The situation is even worse with double newlines, which get transformed into a paragraph,9 and that will really mess up your command’s parsing.
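
Under expl3 syntax, by contrast, this particular pain point disappears: spaces and newlines inside definitions are simply ignored, and you write ~ when you actually want a space. A small sketch (the command name here is made up for illustration):

\ExplSyntaxOn
\cs_new:Nn \demo_greet: {
    A ~ winner ~ is ~ you! % all other spaces and newlines are ignored;
                           % only the ~ tokens produce printed spaces
}
\ExplSyntaxOff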

The other notable catcode switch is making : and _ “regular letters”. So, while in LaTeX2e macro names are generally composed of a-zA-Z (and, possibly, @), in LaTeX3, macro names are expected to be of the form (texdoc expl3, sec. 3.2.),

\⟨module⟩_⟨description⟩:⟨arg-spec⟩

where module would be a unique per-package prefix, to avoid name collisions, and description would be the actual name of the macro. arg-spec is a bit more special, telling you what kind of argument the macro expects, and, notably, whether the command will expand it or not. Think of it as only slightly more than type hints. Macro expansion is a big topic – too big to cover here – but here’s an illustrative example:

\newcommand{\foobar}{A winner is you!}
\newcommand{\myname}{foobar}

% \csname...\endcsname fully expands its contents, so \myname becomes
% "foobar", and the whole construct yields the macro \foobar:
\csname\myname\endcsname % prints "A winner is you!"

% \expandafter expands the second-next token before the next one acts.
% Here, the chain of \expandafter's builds \foobar *before* \let reads
% its second argument, so \alias becomes a copy of \foobar:
\expandafter\let\expandafter\alias\csname\myname\endcsname

Read more about expansion here.

In expl3, the arg-spec bit of a macro will tell you what the macro consumes, and whether it will expand what it consumes. The main ones to care about (though you can find a complete list in texdoc expl3, chapter 3, and throughout the main reference for programming in LaTeX3, texdoc interface3) are:

    N – a single token (typically a control sequence), used as-is;
    n – a braced group of tokens, used as-is;
    V – a variable, whose stored value is retrieved and passed on;
    c – a braced name, from which a control sequence is constructed;
    e – a braced group of tokens, exhaustively expanded before use;
    o – a braced group of tokens, expanded once;
    T, F – braced groups acting as the true/false branches of a conditional.

The notion of a “variable” also appears here, as LaTeX3 tries to separate the notion of a macro that does something (a function) from a macro that merely stores some value (a variable). So, in LaTeX3 terms, \foo below would be a variable, whereas \baz is a function:

\newcommand{\foo}{winner}
\newcommand{\baz}[1]{A #1 is you!}

\baz{\foo}

Though I should really say,

\ExplSyntaxOn
\cs_new:Nn \baz:N {
    A ~ #1 ~ is ~ you! % under expl3 syntax, ~ stands for a space
}
\tl_new:N \foo
\tl_set:Nn \foo {winner}
\baz:N \foo
\ExplSyntaxOff

Even more correct would be10

\ExplSyntaxOn
\cs_new:Nn \baz:N {
    A ~ #1 ~ is ~ you!
}
\cs_generate_variant:Nn \baz:N {V}
\tl_new:N \foo
\tl_set:Nn \foo {winner}
\baz:V \foo
\ExplSyntaxOff

And, finally, heck it, texdoc expl3 specifies that variables should be named as \⟨scope⟩_⟨module⟩_⟨description⟩_⟨type⟩, so here’s some good, honest to God, LaTeX3:

\ExplSyntaxOn
\cs_new:Nn \custom_baz:N {
    A ~ #1 ~ is ~ you!
}
\cs_generate_variant:Nn \custom_baz:N {V}
\tl_new:N \l_custom_foo_tl
\tl_set:Nn \l_custom_foo_tl {winner}
\custom_baz:V \l_custom_foo_tl
\ExplSyntaxOff

This is the price to pay for the magical expansion control tools that LaTeX3 gives.

Finally, we move to the following line,

\int_new:N \g_autolabel_int

Here, we’re declaring a new integer variable. As good LaTeX3 netizens, we use the correct variable naming scheme, and declare the variable to be global with g_ (labels will need to be globally unique to the document, so this makes sense). We can expect this variable to be initialized to 0; from page 168 of texdoc interface3:

The ⟨integer⟩ is initially equal to 0.

The idea will be that we can generate infinite unique labels by choosing a unique prefix, and suffixing it with an increasing integer; \g_autolabel_int will hold this integer.
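
As a quick sanity check of the integer API (the variable name below is only for illustration):

\ExplSyntaxOn
\int_new:N \g_demo_int     % declared; starts out at 0
\int_gincr:N \g_demo_int   % globally incremented; now 1
\int_use:N \g_demo_int     % typesets "1"
\ExplSyntaxOff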

Next line:

\NewDocumentCommand{\AutoLabel}{}{
    \int_gincr:N \g_autolabel_int
    \exp_args:Ne \label {autolabelprefix- \int_use:N \g_autolabel_int}
}

Here we declare a user-facing command, with \NewDocumentCommand (LaTeX3’s version of \newcommand, much nicer to use and about which you can read with texdoc xparse). Thus, there is no need to adhere to LaTeX3’s naming conventions. The declaration furthermore specifies that it takes no arguments from the user, and leaves in its place, when called, two instructions: a global increment of the \g_autolabel_int variable, and \exp_args:Ne \label {autolabelprefix- \int_use:N \g_autolabel_int}. Here is what interface3.pdf has to say about \exp_args:Ne:

This function absorbs two arguments (the ⟨function⟩ name and the ⟨tokens⟩) and exhaustively expands the ⟨tokens⟩. The result is inserted in braces into the input stream after reinsertion of the ⟨function⟩. Thus the ⟨function⟩ may take more than one argument: all others are left unchanged.

Ideally, we would have some labelling function \label:n with a variant \label:e, which would expand whatever is given to it before taking the result as the actual label text. But we don’t, since \label is a LaTeX2e command (texdoc latex2e, chap. 7.1). So, we use LaTeX3’s \exp_args:Ne to leave the first token (\label) alone while the braced tokens that follow are exhaustively expanded. Only then is the result passed as argument to \label.

Inside the token list, we find autolabelprefix- \int_use:N \g_autolabel_int: since \int_use:N will “[recover] the content of an ⟨integer⟩ and [place] it directly in the input stream”, we get, after full expansion, our unique label, of the form

autolabelprefix-⟨label number⟩
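
Concretely, if this is the third \AutoLabel call in the document, the \exp_args:Ne step boils down to the plain LaTeX2e call

\label{autolabelprefix-3}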

Finally, we have the declaration of the command that references this label:

\NewDocumentCommand{\AutoRef}{}{
    \exp_args:Ne \ref {autolabelprefix-\int_use:N \g_autolabel_int}
}

It’s fairly similar to \AutoLabel, with the exception that the relevant LaTeX2e function is now \ref, rather than \label. Since we know the form of the generated labels, we reconstruct the label at referencing time, by once again using the \g_autolabel_int variable and \exp_args:Ne.

Multiple Labels

The above solution works fine, but it quickly breaks down if we wish to reference two equations after both their declarations. Returning to a previous example,

% snip
\begin{equation} \AutoLabel
    A \implies B
\end{equation}
\begin{equation} \AutoLabel
    B \implies C
\end{equation}

Eqs.~\AutoRef\ and \AutoRef\ imply $A \implies C$.

will not work, as the output will read

Eqs. (2) and (2) imply A ⇒ C.

We can deal with this quite simply, by counting the number of labels and references separately:

\ExplSyntaxOn
\int_new:N \g_autolabel_int
\int_new:N \g_autoref_int
\tl_const:Nn \c_autoprefix_tl {autolabelprefix-}

\NewDocumentCommand{\AutoLabel}{} {
    \int_gincr:N \g_autolabel_int
    \exp_args:Ne \label{ \tl_use:N \c_autoprefix_tl \int_use:N \g_autolabel_int }
}
\NewDocumentCommand{\AutoRef}{} {
    \int_gincr:N \g_autoref_int
    \exp_args:Ne \ref{ \tl_use:N \c_autoprefix_tl \int_use:N \g_autoref_int}
}
\ExplSyntaxOff

I’ve sneaked in \c_autoprefix_tl as a constant token list holding the prefix, just because it looks that much nicer than repeating an arbitrary string throughout the source, and I’ve also done away with a proper module prefix in the variable names for the sake of readability. Otherwise everything is still quite similar to the previous definition.

Things start going awry when using, for example, the gather environment from the amsmath package:

% snip
\begin{gather} 
    A \implies B \AutoLabel \\
    B \implies C \AutoLabel
\end{gather}

produces

Eqs. ?? and ?? imply A ⇒ C.

What’s going on? LaTeX3 does have some console debugging capabilities, with commands such as \tl_show:N, but it’s easier, here, to just go with LaTeX’s version of print-debugging: placing values directly in the text. We thus modify the \AutoLabel definition to not only label the equations, but also place the label’s name in the text stream:

% snip
\NewDocumentCommand{\AutoLabel}{} {
    \int_gincr:N \g_autolabel_int
    % 👇 new addition
    {\tl_use:N \c_autoprefix_tl \int_use:N \g_autolabel_int}
    \exp_args:Ne \label{ \tl_use:N \c_autoprefix_tl \int_use:N \g_autolabel_int }
}
% snip

Once again compiling, we find that the equations now read:

    A ⇒ Bautolabelprefix−3   (1)
    B ⇒ Cautolabelprefix−4   (2)

Huh?! Where did labels 1 and 2 go? The problem seems to happen only with amsmath environments, and so a little bit of trawling through its documentation (texdoc amsmath) reveals the following paragraph:11

ifmeasuring@ – All display environments get typeset twice—once during a “measuring” phase and then again during a “production” phase; \ifmeasuring@ will be used to determine which case we’re in, so we can take appropriate action.

\newif\ifmeasuring@
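
(If you want to see this double typesetting for yourself, a crude but effective check is to let a counter do the counting; this is just a sketch, with an arbitrary counter name:)

\documentclass{article}
\usepackage{amsmath}
\newcounter{gathercalls}
\begin{document}

\begin{gather}
    A = B \stepcounter{gathercalls}
\end{gather}

The body above was processed \arabic{gathercalls} times. % expect 2: one per pass

\end{document}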

Uh oh. Our \AutoLabel function is getting called twice: once in a box that amsmath uses to perform measurements, and then discards, and then again when the actual typesetting happens. To make matters worse, amsmath is using a Plain TeX Boolean, \ifmeasuring@, and the Boolean’s name has a @ in it. Here’s what interface3.pdf has to say on its Booleans, and Plain TeX’s Booleans:

TeXhackers note: The bool data type is not implemented using the \iffalse/\iftrue primitives, in contrast to \newif, etc., in plain TeX, LaTeX2e and so on. Programmers should not base use of bool switches on any particular expectation of the implementation.

Plain TeX Booleans are very fickle, and likely to misbehave if the Boolean test looks like anything other than

\ifbool\a\else\b\fi

We might get away with giving some arguments to \a or \b, but no more than that; certainly, something like \ifbool\a\b\c\else\d\e\f\fi is bound to cause you trouble. So, our first step is moving the whole of \AutoLabel into a single macro of its own:

\ExplSyntaxOn
\int_new:N \g_autolabel_int
\int_new:N \g_autoref_int
\tl_const:Nn \c_autoprefix_tl {autolabelprefix-}

\cs_new:Nn \autolabel: {
    \int_gincr:N \g_autolabel_int
    \exp_args:Ne \label{ \tl_use:N \c_autoprefix_tl \int_use:N \g_autolabel_int }
}
\NewDocumentCommand{\AutoLabel}{}{
    \autolabel:
}
\NewDocumentCommand{\AutoRef}{}{
    \int_gincr:N \g_autoref_int
    \exp_args:Ne \ref{ \tl_use:N \c_autoprefix_tl \int_use:N \g_autoref_int}
}
\ExplSyntaxOff

Now, we need only to test \ifmeasuring@ inside \AutoLabel, and behave accordingly; i.e., if, indeed, measuring, we don’t want to do anything at all. But, recall that LaTeX3 and LaTeX2e’s relationship with @ is different, and LaTeX3 won’t correctly interpret \ifmeasuring@ at all (it’s not expecting macro names to have @s in them). We may get around this by first using a c-type argument, and an auxiliary definition:12

% snip
\NewDocumentCommand{\AutoLabel}{}{
    \cs_set_eq:Nc \amsmath_ifmeasuring {ifmeasuring@}
    \amsmath_ifmeasuring\else\autolabel:\fi
}
% snip

We’re using the Nc variant of \cs_set_eq:NN, about which interface3.pdf tells us:

Globally creates ⟨control sequence1⟩ and sets it to have the same meaning as ⟨control sequence2⟩ or ⟨token⟩. The second control sequence may subsequently be altered without affecting the copy.

Is this very kosher? Absolutely not. Is this the least cursed LaTeX macro ever written? Not even close, and it works, to boot. Run your compilation steps again, and you’ll find the document now correctly reads

        A ⇒ B               (1)
        B ⇒ C               (2)

Eqs. 1 and 2 imply A ⇒ C.

Some final improvements

The last thing that’s missing is the ability to repeatedly reference the same equation. As it stands, there is no way to typeset something like

Eqs. 1 and 2 imply A ⇒ C, but eq. 1 does not imply B ⇒ A.

Let’s fix that. Specifically, let’s modify \AutoRef to take in an optional argument. If the argument is present, and equal to n, then that command should reference the nth equation before it. Thus, our previous example would correspond to something like

Eqs.~\AutoRef\ and \AutoRef imply $A \implies C$, but \AutoRef[2] does not imply $B \implies A$.

I’ll once again give you the final answer, and then break down any new elements:

\ExplSyntaxOn
\int_new:N \g_autolabel_int
\int_new:N \g_autoref_int
\tl_const:Nn \c_autoprefix_tl {autolabelprefix-}

\cs_new:Nn \autolabel: {
    \int_gincr:N \g_autolabel_int
    \exp_args:Ne \label{ \c_autoprefix_tl \int_use:N \g_autolabel_int}
}
\NewDocumentCommand{\AutoLabel}{}{
    \cs_set_eq:Nc\amsmath_ifmeasuring{ifmeasuring@}
    \amsmath_ifmeasuring\else\autolabel:\fi
}
\NewDocumentCommand{\AutoRef}{o}{
    \cs_set_eq:Nc\amsmath_ifmeasuring{ifmeasuring@}
    \IfValueF{#1}
        {\amsmath_ifmeasuring\else\int_gincr:N \g_autoref_int\fi}
    \IfValueTF{#1}
        {\int_set_eq:NN \l_tmpa_int \g_autolabel_int}
        {\int_set_eq:NN \l_tmpa_int \g_autoref_int}
    \IfValueT{#1}
        {\int_sub:Nn \l_tmpa_int {#1 - 1}}
    \exp_args:Ne \ref {\tl_use:N \c_autoprefix_tl \int_use:N \l_tmpa_int}
}
\ExplSyntaxOff

First, note that I’ve also guarded the \g_autoref_int counter against double incrementation due to measuring – something we’d previously neglected to do, but it’s plausible that \AutoRef gets called within an amsmath environment. Otherwise, the main difference to the previous definitions is the optional (o) argument in the declaration of \AutoRef. When no value is given for such an argument, it takes a special flag value (usually denoted -NoValue- in the documentation). xparse provides the convenient \IfValueT, \IfValueF, and \IfValueTF macros to deal with the cases where the argument was or was not given a value.
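
If you haven’t used xparse-style optional arguments before, here’s a toy example (the \Greet command is made up for illustration):

\NewDocumentCommand{\Greet}{o}{%
    \IfValueTF{#1}{Hello, #1!}{Hello, stranger!}%
}
% \Greet       ->  Hello, stranger!
% \Greet[Ana]  ->  Hello, Ana!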

So, when we find that the user passed some optional value, we calculate the corresponding “absolute label number”; this is given by the current auto-labelling number, minus (the number the user gave, minus one). In other words: if \g_autolabel_int is currently 2, implying autolabelprefix-1 and -2 were already defined, and the user passes in \AutoRef[2], they are referencing autolabelprefix-1, and 1 = 2 − (2 − 1).

If, on the other hand, the user did not pass in any value, then we fall back to the running sequence of references: as before, we increment \g_autoref_int, and set \l_tmpa_int to hold its value:13 14

\amsmath_ifmeasuring\else\int_gincr:N \g_autoref_int\fi
\int_set_eq:NN \l_tmpa_int \g_autoref_int

Then, the final line references the corresponding label:

\exp_args:Ne \ref {\tl_use:N \c_autoprefix_tl \int_use:N \l_tmpa_int}

This almost works, except for the following edge case: what do you expect the following code to produce?

\begin{equation} \AutoLabel
    A = A
\end{equation}

\AutoRef[1] constitutes a ``tautology''.

\begin{equation}
    A = B
\end{equation}

Eq.~\AutoRef{} does not constitute a tautology.

As it stands, the first call to \AutoRef[1] correctly references the label of the first equation, but it does not advance \g_autoref_int, because a value is given. Therefore, when \AutoRef is called a second time, in the second paragraph, it again references the first equation. This is, however, the behaviour we might expect in the following situation (which, with apologies, is more synthetic):

\begin{gather}
    A \AutoLabel \\
    B \AutoLabel \\
    C \AutoLabel
\end{gather}

Eq~\AutoRef[1] and \AutoRef{} and \AutoRef{}.

Indeed, here, we do not expect \AutoRef[1] to advance \g_autoref_int.

The bottom line is that, to get the intended behaviour, calling \AutoRef[1] should be indistinguishable from simply calling \AutoRef – but only in the cases where the two would produce the same result. This is not currently the case. But! LaTeX3 has us covered with some pretty comprehensive arithmetic and comparison tools. The following modification to \AutoRef is sufficient:

% snip
\NewDocumentCommand{\AutoRef}{o}{
    \cs_set_eq:Nc\amsmath_ifmeasuring{ifmeasuring@}
    % 👇 main difference
    \tl_set:Nn \l_tmpb_tl {#1}
    \IfValueT{#1}
        {\int_compare:nNnT {#1}={1} {
            \int_compare:nNnT {\g_autoref_int}={\g_autolabel_int - 1} 
                {\tl_set:NV \l_tmpb_tl \c_novalue_tl}}}
    % 👆
    % 👇 But notice also the use of \l_tmpb_tl below, now:
    \exp_args:Ne \IfValueF{\tl_use:N \l_tmpb_tl}
        {\amsmath_ifmeasuring\else\int_gincr:N \g_autoref_int\fi}
    \exp_args:Ne \IfValueTF{\tl_use:N \l_tmpb_tl}
        {\int_set_eq:NN \l_tmpa_int \g_autolabel_int}
        {\int_set_eq:NN \l_tmpa_int \g_autoref_int}
    \exp_args:Ne \IfValueT{\tl_use:N \l_tmpb_tl}
        {\exp_args:NNe \int_sub:Nn \l_tmpa_int {\tl_use:N \l_tmpb_tl - 1}}
    \exp_args:Ne \ref {\tl_use:N \c_autoprefix_tl \int_use:N \l_tmpa_int}
}
% snip
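
The new ingredient is \int_compare:nNnTF (and its T/F-only siblings): it takes an integer expression, a relation, another integer expression, and then the conditional branches. Schematically (inside \ExplSyntaxOn):

\int_compare:nNnTF { 2 + 2 } = { 4 }
    { math~still~works } % true branch
    { uh~oh }            % false branch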

The idea is not so different; but now, instead of addressing the optional argument directly (#1), we start by assigning it to \l_tmpb_tl. Then, if we find that the user did pass in some value, but that this value is 1 and would produce the same result as not having provided any value at all, we put the no-value marker into \l_tmpb_tl. The rest of the command is therefore processed, in that case, as though the user had not supplied a value.

Using \l_tmpb_tl instead of #1 requires some further care with \exp_args:Ne to make sure the value of \l_tmpb_tl is expanded before it’s tested, but nothing special.

Final remarks

Thus, we have our final set of LaTeX3 macros:

\ExplSyntaxOn
\int_new:N \g_autolabel_int
\int_new:N \g_autoref_int
\tl_const:Nn \c_autoprefix_tl {autolabelprefix-}

\cs_new:Nn \autolabel: {
    \int_gincr:N \g_autolabel_int
    \exp_args:Ne \label{ \c_autoprefix_tl \int_use:N \g_autolabel_int}
}
\NewDocumentCommand{\AutoLabel}{}{
    \cs_set_eq:Nc\amsmath_ifmeasuring{ifmeasuring@}
    \amsmath_ifmeasuring\else\autolabel:\fi
}
\NewDocumentCommand{\AutoRef}{o}{
    \cs_set_eq:Nc\amsmath_ifmeasuring{ifmeasuring@}
    \tl_set:Nn \l_tmpb_tl {#1}
    \IfValueT{#1}
        {\int_compare:nNnT {#1}={1} {
            \int_compare:nNnT {\g_autoref_int}={\g_autolabel_int - 1} 
                {\tl_set:NV \l_tmpb_tl \c_novalue_tl}}}
    \exp_args:Ne \IfValueF{\tl_use:N \l_tmpb_tl}
        {\amsmath_ifmeasuring\else\int_gincr:N \g_autoref_int\fi}
    \exp_args:Ne \IfValueTF{\tl_use:N \l_tmpb_tl}
        {\int_set_eq:NN \l_tmpa_int \g_autolabel_int}
        {\int_set_eq:NN \l_tmpa_int \g_autoref_int}
    \exp_args:Ne \IfValueT{\tl_use:N \l_tmpb_tl}
        {\exp_args:NNe \int_sub:Nn \l_tmpa_int {\tl_use:N \l_tmpb_tl - 1}}
    \exp_args:Ne \ref {\tl_use:N \c_autoprefix_tl \int_use:N \l_tmpa_int}
}
\ExplSyntaxOff

While quite a modest contribution to the very hefty history of LaTeX packages, it nonetheless served as a basis for discussing important points of LaTeX3 macro writing, such as syntax differences, token list manipulation, controlled expansion, variables and functions, Boolean tests, and LaTeX2e interfacing. Not too shabby. Hopefully this post is also a good enough introduction that you can now consult the important reference documents, such as interface3.pdf, directly.

Personally, I also find the macro itself quite useful in practice, and it’s something I’ve been using in my scientific writing. I have yet to find out if Physical Review will be upset by it.

As a parting gift, and to point the interested reader towards property lists, here is an exercise: the revtex4-2 document class, mandated by Physical Review, does not support the use of \footnotemark and \footnotetext. This means that any footnote text really must sit in the middle of the source of the body of text. Can you write \FootnoteLater and \FootnoteNow to rectify this?
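
As a hint to get you started, the property-list API lives in the l3prop module of texdoc interface3; a minimal, purely illustrative sketch of the calls you are likely to need:

\ExplSyntaxOn
\prop_new:N \g_deferred_footnotes_prop  % name is made up for this sketch
\prop_gput:Nnn \g_deferred_footnotes_prop {some-key} {Some footnote text}
\prop_get:NnN \g_deferred_footnotes_prop {some-key} \l_tmpa_tl
\tl_use:N \l_tmpa_tl % -> Some footnote text
\ExplSyntaxOff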


A Post-Scriptum for HackerNews readers: I like to submit these posts to HN, as I feel like the average HN user fits the intended audience. But, 1. famously, HN can be quite predictable in some of their responses (by what I expect is, essentially, a meme effect), and 2. I’ve had some unexpected experiences resulting from previously reaching FP in HN. So, consider this a preemptive response to some points I expect to be raised. If you’re coming in from HN and you see someone fail to account for these answers, you’ll know they haven’t even read the whole thing. So:

  1. It’s weird I even have to say this, but don’t stalk me and email me at my personal address. This is genuinely something that has happened, inexplicably. If you wish to contact me by email, by all means do so to miguelmurca+autoref [æt] cumperativa.xyz.
  2. Again, strange that I would need to point this out, but do not assume my nationality, or language. This post is in English. If you wish to write me, please do so in English.
  3. Yes, we are aware of typst. I think it’s cool, but C++ hasn’t replaced C, Rust hasn’t replaced C++, and Typst is unlikely to replace LaTeX. Likewise, many are aware of LuaTeX, but, again, the entrenchment of a 40-odd-year-old system is not to be underestimated. I am rooting for typst, anyway, and hope it finds its place. A good place to start would be to provide a compilation toolchain from typst to TeX, if they really want to replace TeX.
  4. There are, in fact, reasons why someone would not want to just use your favourite form of Markdown plus pidgin TeX, not least of all because not everyone is just taking notes, and also because many (dare I say, most) of the people using LaTeX do not come from a computer science/technology background. Also, on a purely personal level, if you were taking your class notes in Markdown+TeX, either the syllabus was too easy or you were making it way harder for yourself.
  5. Yes, LaTeX is ugly and antiquated. It is old. It’s an evolving, niche, very precocious thing, and it suffers from this. And, yet, it seems to have its place. But, if you just look at it like an esoteric programming language, why should this be a problem?
  6. Even though I strived to be correct, it’s likely I’ve made some error somewhere. There are a lot of people on HN who are really good with LaTeX, and I expect them to point out any errors (for which I will be very grateful, and which I will duly correct). But, at the end of the day, LaTeX is far from my primary occupation, and I felt that the sort of introduction this text can give the average programmer w.r.t. LaTeX outweighs any minor error, easily corrected by consulting the official references.
  7. Yes, it’s $YEAR and we’re still producing PDFs. Again, historical reasons plus the fact that most people doing maths are not necessarily very interested in computers.
  8. 99% of the time that you’re writing LaTeX, you need to know literally nothing of what’s contained in this blog post.
  9. (La)TeX does not have a grammar. Parsing TeX is Turing complete. This does not mean that you could not write a grammar for, roughly speaking, “much” of TeX, especially “many” mathematical expressions. I suspect this is what quite a few pidgin TeX-related programs do.

  1. Written, respectively, by Leslie Lamport and Donald Knuth, two very significant computer scientists and mathematicians (though this is certainly understating Knuth’s importance). 

  2. You’ll never guess what I typed into my Markdown file to get the expression above. 

  3. If you’re curious, check out this Plain TeX reference

  4. Look no further than LaTeX: originally written in TeX in order to improve TeX’s usage. 

  5. This is true of TeX in its generality. Famously, TeX’s main reference is a book you need to buy. It’s called The TeXbook, and it’s a pretty good read, if you’re into that sort of thing. Knuth is a very good writer. But you either become a TeX expert (a TeXnic, as it were) before using TeX, or you’re out of luck. 

  6. The “what do I type after texdoc” problem is real. Here’s an easy example: what would you type into your terminal in order to get the documentation for the LaTeX3 macros? If you answered texdoc latex3, congratulations, LaTeX has not yet fully consumed your psyche. The correct answer would be texdoc expl3, of course. Or, texdoc interface3, if you’re looking for the LaTeX3 programming reference. Except what’s inherited from LaTeX2e, which is mostly found in texdoc latex2e (not counting its many supporting documents, like texdoc ltshipout-doc). Doesn’t it all roll off the tongue? 

  7. This is ever so slightly controversial; at least, Knuth certainly disagrees with this practice, distinguishing clearly between “display math”, for when you want to prominently display some math, and a proper equation. The latter should be labeled, whereas the former should not. However, in scientific circles, it is also a little controversial not to label every equation in the document – as, for example, it makes discussing the document a lot harder – and most people simply do not share Knuth’s appreciation for good typography. I’ll assume in the body of text the practice of labeling every equation, with apologies to the purists. 

  8. See this reference for a complete list of catcodes. 

  9. Actually, they get transformed into a \par, which is a complicated beast. See texdoc latex2e, chapter 15.1. 

  10. “Ah, but couldn’t you just \cs_new:Nn \foo:V?” From interface3.pdf (texdoc interface3): “[…] the functions in Subsections 4.3.2 and 4.3.3 [cs_new:Nn and friends] are primarily meant to define base functions only. Base functions can only have the following argument specifiers: N and n […] T and F […] p and w […]. […] You should define the base function and then use \cs_generate_variant:Nn to generate custom variants as described in Section 5.2.” 

  11. I’ll be honest with you: I have no idea how I managed to find this. I seem to remember some comment in an only-tangentially-related TeX StackExchange question mentioning \ifmeasuring@, from whence I looked up its documentation and figured out what was happening. 

  12. Why do I define the auxiliary amsmath_ifmeasuring macro inside the \AutoLabel declaration? Because amsmath is also defining \ifmeasuring@ on the fly. You’ll find that, at the beginning of the document, \ifmeasuring@ is undefined; only at the moment of expansion of \AutoLabel (i.e., when \AutoLabel is called), does \ifmeasuring@ contain the correct definition. 

  13. l_tmpa_xx and l_tmpb_xx, where xx varies for each type, are general purpose scratch variables that LaTeX3 provides as already defined. 

  14. This is one of the aforementioned cases where we can get away with a little more than a single macro between an \if and an \else (or \fi). But, be wary.