[TL;DR: this post is about how to make λx.(xxy)
parse as \lambdacalc{x}{xxy}
in TeX.]
I recently found out that you can get lambda calculus in TeX’s mouth. And it’s not even a complicated construction! It just makes good use of the way arguments work in TeX.
This again roused my want/hope for a sane way to program inside TeX2, and getting away from command expansion hell. In particular, I started wondering whether it was possible to write some bootstrapping code (in TeX) to parse a lambda calculus DSL; something like
λ x.(x x y)
to be expanded to
\lambdacalc{x}{x x y}
After all, we know it’s possible to make TeX do lambda calculus, so I just need a macro that parses into the relevant TeX commands. Now, in principle, this would be fairly easy:
\catcode`λ=\active
\defλ#1{...}
but, alas!, λ is not a single byte character, it’s a two byte character. And LaTeX doesn’t do two byte characters. If you try to compile the above, you’ll run into3
! LaTeX Error: Unicode character λ (U+03BB)
not set up for use with LaTeX.
Now, as it turns out, a non-negligible number of people use Greek symbols to speak, rather than to do math, which means there had better be a workaround — and there is:
\usepackage[utf8]{inputenc}
\usepackage[LGR]{fontenc}
So, how do they do it? Well, per this TeX Exchange question:
The unicode char ẟ that you used in your input, is coded in utf8 as three bytes, of binary values: 11100001, 10111010 and 10011111. […] For TeX those three bytes are simply three chars, with codes “E1, “BA and “9F respectively (” is the hexadecimal prefix for TeX). What inputenc basically does is to make the character with code “E1 an active char, and define the command associated with that character in such a way that if after it came characters “BA and “9F then the TeX command \delta is issued.1
Using this idea, it’s fairly simple to make λ
correspond to some command; let’s start by extracting the two
bytes in λ
:
\def\First#1#2{#1}
\def\Second#1#2{#2}
\makeatletter
\edef\lambda@first@oct{\Firstλ}
\edef\lambda@second@oct{\Secondλ}
Now, let’s set the first octet as an active character…
\expandafter\catcode\expandafter`\lambda@first@oct=\active
(note the judicious use of \expandafter
to
guarantee that \lambda@first@oct
is the first thing
to be expanded) …and define its meaning to test whether the
second octet of λ follows4:
\expandafter\def\lambda@first@oct#1{%
\def\@next{#1}%
\ifx\lambda@second@oct\@next%
% TODO
\else
% TODO
\fi}
In order not to mess up anything when doing this (because we are setting a single character to be a command), if we don’t find the second octet we expect, we just output the octet we were taking as an active character, and leave the argument untouched 5:
\expandafter\def\lambda@first@oct#1{%
\def\@next{#1}%
\ifx\lambda@second@oct\@next%
% TODO
\else
\lambda@first@oct#2%
\fi}
The other case is when we do detect the two octet sequence we were looking for; in this case, we can just replace the sequence by our command:
\def\@lambdacalc#1.#2{...}
\expandafter\def\lambda@first@oct#1{%
\def\@next{#1}%
\ifx\lambda@second@oct\@next%
\@lambdacalc%
\else%
\lambda@first@oct#2%
\fi}
\makeatother
With this, λx.{x x y z}
will already expand to
\@lambdacalc{x}{x x y z}
; but I’d really like to use
regular parenthesis, instead of curly ones. This is no problem if
we simply set ()
to have the correct group
catcodes when we encounter a λ. However, catcode gotchas will
require some command juggling:
\def\@lambdacalc#1.{%
\catcode`(=1%
\catcode`)=2%
\@lambdacalc@body{#1}}
\def\@lambdacalc@body#1#2{%
...
\catcode`(=11%
\catcode`)=11}
And that’s it! You can now check that the following outputs
[arg:(x) body:(x x y)]
:
\def\First#1#2{#1}
\def\Second#1#2{#2}
\makeatletter
\edef\lambda@first@oct{\Firstλ}
\edef\lambda@second@oct{\Secondλ}
\expandafter\catcode\expandafter`\lambda@first@oct=13
\expandafter\def\lambda@first@oct#1{%
\def\@next{#1}%
\ifx\lambda@second@oct\@next%
\@lambdacalc%
\else%
\lambda@first@oct#1%
\fi}
\def\@lambdacalc#1.{%
\catcode`(=1%
\catcode`)=2
\@lambdacalc@body{#1}}
\def\@lambdacalc@body#1#2{%
[arg:(#1) body:(#2)]
\catcode`(=11%
\catcode`)=11}
\makeatother
λx.(x x y)
Expanding this into a lambda calculus system is left as an exercise to the reader. :^)
This explains \inputenc
. We don’t care too
much about \fontenc
. ↩
Yes, I’ve heard of LuaTeX. And yes, I know of the L3 programming layer. And LyX. And Pandoc. We’re just hacking on an archaic programming language, it’s good fun… Sort of. ↩
Interestingly, TeX seems to… just not care. If you run
pdftex
on something like Hello λ
Goodbye
, it will happily produce a PDF that reads
“Hello Goodbye”. ↩
The need to define \@next
has to do with
how \ifx
compares two macros: the condition is
only true if the first level expansion of the two
macros matches. You can check for yourself that
\ifx\lambd@second@oct#1
fails. The behaviour
of \if
is even weirder; see this TeX
Exchange answer. ↩
“But won’t we be again outputting an active character?”
No, because we defined \lambda@first@oct
before that octet was an active character. This
behaviour of active characters can and will bite you; see,
for example, this
post in TeX FAQ. ↩