Lambda Operator in LaTeX

[TL;DR: this post is about how to make λx.(xxy) parse as \lambdacalc{x}{xxy} in TeX.]

I recently found out that you can get lambda calculus in TeX’s mouth. And it’s not even a complicated construction! It just makes good use of the way arguments work in TeX.

This again roused my want/hope for a sane way to program inside TeX², and getting away from command expansion hell. In particular, I started wondering whether it was possible to write some bootstrapping code (in TeX) to parse a lambda calculus DSL; something like

λ x.(x x y)

to be expanded to

\lambdacalc{x}{x x y}

After all, we know it’s possible to make TeX do lambda calculus, so I just need a macro that parses into the relevant TeX commands. Now, in principle, this would be fairly easy:

\catcode`λ=\active
\defλ#1{...}

but, alas!, λ is not a single byte character, it’s a two byte character. And LaTeX doesn’t do two byte characters. If you try to compile the above, you’ll run into³

! LaTeX Error: Unicode character λ (U+03BB)
               not set up for use with LaTeX.

Now, as it turns out, a non-negligible number of people use Greek symbols to speak, rather than to do math, which means there had better be a workaround — and there is:

\usepackage[utf8]{inputenc}
\usepackage[LGR]{fontenc}

So, how do they do it? Well, per this TeX Exchange question:

The unicode char ẟ that you used in your input, is coded in utf8 as three bytes, of binary values: 11100001, 10111010 and 10011111. […] For TeX those three bytes are simply three chars, with codes “E1, “BA and “9F respectively (” is the hexadecimal prefix for TeX). What inputenc basically does is to make the character with code “E1 an active char, and define the command associated with that character in such a way that if after it came characters “BA and “9F then the TeX command \delta is issued.¹

Using this idea, it’s fairly simple to make λ correspond to some command; let’s start by extracting the two bytes in λ:

\def\First#1#2{#1}
\def\Second#1#2{#2}

\makeatletter
\edef\lambda@first@oct{\Firstλ}
\edef\lambda@second@oct{\Secondλ}

Now, let’s set the first octet as an active character…

\expandafter\catcode\expandafter`\lambda@first@oct=\active

(note the judicious use of \expandafter to guarantee that \lambda@first@oct is the first thing to be expanded) …and define its meaning to test whether the second octet of λ follows⁴:

\expandafter\def\lambda@first@oct#1{%
    \def\@next{#1}%
    \ifx\lambda@second@oct\@next%
        % TODO
    \else
        % TODO
    \fi}

In order not to mess up anything when doing this (because we are setting a single character to be a command), if we don’t find the second octet we expect, we just output the octet we were taking as an active character, and leave the argument untouched ⁵:

\expandafter\def\lambda@first@oct#1{%
    \def\@next{#1}%
    \ifx\lambda@second@oct\@next%
        % TODO
    \else
        \lambda@first@oct#2%
    \fi}

The other case is when we do detect the two octet sequence we were looking for; in this case, we can just replace the sequence by our command:

\def\@lambdacalc#1.#2{...}

\expandafter\def\lambda@first@oct#1{%
    \def\@next{#1}%
    \ifx\lambda@second@oct\@next%
        \@lambdacalc%
    \else%
        \lambda@first@oct#2%
    \fi}
\makeatother

With this, λx.{x x y z} will already expand to \@lambdacalc{x}{x x y z}; but I’d really like to use regular parenthesis, instead of curly ones. This is no problem if we simply set () to have the correct group catcodes when we encounter a λ. However, catcode gotchas will require some command juggling:

\def\@lambdacalc#1.{%
    \catcode`(=1%
    \catcode`)=2%
    \@lambdacalc@body{#1}}
\def\@lambdacalc@body#1#2{%
    ...
    \catcode`(=11%
    \catcode`)=11}

And that’s it! You can now check that the following outputs [arg:(x) body:(x x y)]:

\def\First#1#2{#1}
\def\Second#1#2{#2}

\makeatletter
\edef\lambda@first@oct{\Firstλ}
\edef\lambda@second@oct{\Secondλ}

\expandafter\catcode\expandafter`\lambda@first@oct=13
\expandafter\def\lambda@first@oct#1{%
    \def\@next{#1}%
    \ifx\lambda@second@oct\@next%
        \@lambdacalc%
    \else%
        \lambda@first@oct#1%
    \fi}
\def\@lambdacalc#1.{%
    \catcode`(=1%
    \catcode`)=2
    \@lambdacalc@body{#1}}
\def\@lambdacalc@body#1#2{%
    [arg:(#1) body:(#2)]
    \catcode`(=11%
    \catcode`)=11}
\makeatother

λx.(x x y)

Expanding this into a lambda calculus system is left as an exercise to the reader. :^)

This explains \inputenc. We don’t care too much about \fontenc. ↩
Yes, I’ve heard of LuaTeX. And yes, I know of the L3 programming layer. And LyX. And Pandoc. We’re just hacking on an archaic programming language, it’s good fun… Sort of. ↩
Interestingly, TeX seems to… just not care. If you run pdftex on something like Hello λ Goodbye, it will happily produce a PDF that reads “Hello Goodbye”. ↩
The need to define \@next has to do with how \ifx compares two macros: the condition is only true if the first level expansion of the two macros matches. You can check for yourself that \ifx\lambd@second@oct#1 fails. The behaviour of \if is even weirder; see this TeX Exchange answer. ↩
“But won’t we be again outputting an active character?” No, because we defined \lambda@first@oct before that octet was an active character. This behaviour of active characters can and will bite you; see, for example, this post in TeX FAQ. ↩