<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Eli Bendersky's website - Python</title><link href="https://eli.thegreenplace.net/" rel="alternate"></link><link href="https://eli.thegreenplace.net/feeds/python.atom.xml" rel="self"></link><id>https://eli.thegreenplace.net/</id><updated>2026-06-14T03:20:58-07:00</updated><entry><title>Plugins case study: Pluggy</title><link href="https://eli.thegreenplace.net/2026/plugins-case-study-pluggy/" rel="alternate"></link><published>2026-06-13T20:21:00-07:00</published><updated>2026-06-14T03:20:58-07:00</updated><author><name>Eli Bendersky</name></author><id>tag:eli.thegreenplace.net,2026-06-13:/2026/plugins-case-study-pluggy/</id><summary type="html">&lt;p&gt;Recently I came upon &lt;a class="reference external" href="https://pluggy.readthedocs.io/en/latest/"&gt;Pluggy&lt;/a&gt;,
a Python library for developing plugin systems. It was originally developed
as part of the &lt;tt class="docutils literal"&gt;pytest&lt;/tt&gt; project - known for its rich plugin ecosystem - and
later extracted into a standalone library. You're supposed to reach for
Pluggy if you want to add a plugin system …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Recently I came upon &lt;a class="reference external" href="https://pluggy.readthedocs.io/en/latest/"&gt;Pluggy&lt;/a&gt;,
a Python library for developing plugin systems. It was originally developed
as part of the &lt;tt class="docutils literal"&gt;pytest&lt;/tt&gt; project - known for its rich plugin ecosystem - and
later extracted into a standalone library. You're supposed to reach for
Pluggy if you want to add a plugin system to your tool or library and want
to use something proven rather than rolling your own.&lt;/p&gt;
&lt;p&gt;In this post I will share some notes on how Pluggy works, and will
then review how it aligns with the
&lt;a class="reference external" href="https://eli.thegreenplace.net/2012/08/07/fundamental-concepts-of-plugin-infrastructures"&gt;fundamental concepts of plugin infrastructures&lt;/a&gt;.&lt;/p&gt;
&lt;img alt="Pluggy plug logo" class="align-center" src="https://eli.thegreenplace.net/images/2026/pluggy-plug.png" /&gt;
&lt;div class="section" id="using-pluggy"&gt;
&lt;h2&gt;Using Pluggy&lt;/h2&gt;
&lt;p&gt;Pluggy is built around the concept of &lt;em&gt;hooks&lt;/em&gt;: functions that host
applications or tools (from here on, just &amp;quot;hosts&amp;quot;) expose and plugins implement.
A host exposes hooks by using
a decorator returned from &lt;tt class="docutils literal"&gt;pluggy.HookspecMarker&lt;/tt&gt; and a plugin implements this
hook using a decorator returned from &lt;tt class="docutils literal"&gt;pluggy.HookimplMarker&lt;/tt&gt;.&lt;/p&gt;
&lt;p&gt;Pluggy's &lt;a class="reference external" href="https://pluggy.readthedocs.io/en/stable/"&gt;documentation&lt;/a&gt; explains
this fairly well; in this post, I'll show how to implement the &lt;tt class="docutils literal"&gt;htmlize&lt;/tt&gt; tool
with some plugins, introduced in &lt;a class="reference external" href="https://eli.thegreenplace.net/2012/08/07/fundamental-concepts-of-plugin-infrastructures"&gt;the original article in my plugin series&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As a reminder, &lt;tt class="docutils literal"&gt;htmlize&lt;/tt&gt; is a toy tool that takes markup notation similar to
reStructuredText and converts it to HTML. It supports plugins to handle
custom &amp;quot;roles&amp;quot; like:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;some text :role:`customized text` and more text
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As well as plugins that do arbitrary processing on the entire text.&lt;/p&gt;
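&lt;p&gt;To make the role syntax concrete, here's a rough sketch of how a host might find roles in the text and dispatch them to handlers. This is not the actual &lt;tt class="docutils literal"&gt;htmlize&lt;/tt&gt; code: the regular expression, the &lt;tt class="docutils literal"&gt;handlers&lt;/tt&gt; dictionary (standing in for a plugin lookup) and the policy of leaving unknown roles alone are all assumptions made for illustration.&lt;/p&gt;

```python
import re

# Matches ":name:`contents`"; a hypothetical approximation of the role syntax.
ROLE_RE = re.compile(r":(\w+):`([^`]*)`")


def expand_roles(text, handlers):
    """Replace each role in text with the output of its handler.

    handlers maps a role name to a function taking the role contents.
    Roles without a handler are left untouched (an assumed policy).
    """
    def replace(match):
        name, contents = match.group(1), match.group(2)
        handler = handlers.get(name)
        if handler is None:
            return match.group(0)
        return handler(contents)

    return ROLE_RE.sub(replace, text)


print(expand_roles("some text :upper:`customized text` and more text",
                   {"upper": str.upper}))
# prints: some text CUSTOMIZED TEXT and more text
```

&lt;p&gt;In a Pluggy-based host, the dictionary lookup would presumably be replaced by a call to a hook that returns the handler for a given role name.&lt;/p&gt;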
&lt;div class="section" id="defining-hooks"&gt;
&lt;h3&gt;Defining hooks&lt;/h3&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/eliben/code-for-blog/tree/main/2026/plugin-pluggy/htmlize/htmlize"&gt;Out host&lt;/a&gt; defines two hooks:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pluggy&lt;/span&gt;

&lt;span class="n"&gt;hookspec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pluggy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HookspecMarker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;htmlize&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@hookspec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;firstresult&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;htmlize_role_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role_name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sd"&gt;&amp;quot;&amp;quot;&amp;quot;Return a function accepting role contents.&lt;/span&gt;

&lt;span class="sd"&gt;    The function will be called with a single argument - the role contents, and&lt;/span&gt;
&lt;span class="sd"&gt;    should return what the role gets replaced with.&lt;/span&gt;
&lt;span class="sd"&gt;    &amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;

&lt;span class="nd"&gt;@hookspec&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;htmlize_contents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sd"&gt;&amp;quot;&amp;quot;&amp;quot;Return a function accepting full document contents.&lt;/span&gt;

&lt;span class="sd"&gt;    The function will be called with a single argument - the document contents&lt;/span&gt;
&lt;span class="sd"&gt;    (after paragraph splitting and role processing), and should return the&lt;/span&gt;
&lt;span class="sd"&gt;    transformed contents.&lt;/span&gt;
&lt;span class="sd"&gt;    &amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Hooks are declared by decorating functions with the marker returned from
&lt;tt class="docutils literal"&gt;HookspecMarker&lt;/tt&gt;, which is created with the project's name. This
project name has to match between the host and its plugins. Pluggy is permissive
about what hooks accept as parameters and what they return; for maximal
flexibility, and to stay true to the original &lt;tt class="docutils literal"&gt;htmlize&lt;/tt&gt; example, our hooks
return functions.&lt;/p&gt;
&lt;p&gt;To accompany this &lt;tt class="docutils literal"&gt;HookspecMarker&lt;/tt&gt;, the host also defines a &lt;tt class="docutils literal"&gt;HookimplMarker&lt;/tt&gt; with
the same name:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;hookimpl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pluggy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HookimplMarker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;htmlize&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This is used by plugins to attach to hooks when they're loaded.&lt;/p&gt;
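&lt;p&gt;One way to think about these markers, as a mental model rather than a description of Pluggy's internals, is as decorators that tag functions with a project-specific attribute that a plugin manager later looks for. The toy &lt;tt class="docutils literal"&gt;Marker&lt;/tt&gt; class and &lt;tt class="docutils literal"&gt;is_impl&lt;/tt&gt; helper below are hypothetical:&lt;/p&gt;

```python
class Marker:
    """Toy stand-in for a hook marker; not Pluggy's real class."""

    def __init__(self, project_name, kind):
        # The attribute name ties a marked function to one project,
        # e.g. "htmlize_impl". This naming scheme is an assumption.
        self.attr = f"{project_name}_{kind}"

    def __call__(self, func=None, **options):
        # Support both the @marker and @marker(option=...) forms.
        def mark(f):
            setattr(f, self.attr, dict(options))
            return f
        return mark(func) if func is not None else mark


hookspec = Marker("htmlize", "spec")
hookimpl = Marker("htmlize", "impl")


@hookspec(firstresult=True)
def htmlize_role_handler(role_name):
    pass


@hookimpl
def htmlize_contents(post, db):
    return lambda contents: contents


def is_impl(func, project_name):
    # A manager for project_name only picks up functions tagged for it.
    return hasattr(func, f"{project_name}_impl")


print(htmlize_role_handler.htmlize_spec)     # {'firstresult': True}
print(is_impl(htmlize_contents, "htmlize"))  # True
print(is_impl(htmlize_contents, "other"))    # False
```

&lt;p&gt;In this model, tagging by project name is what keeps independent plugin systems in the same Python process from picking up each other's hooks.&lt;/p&gt;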
&lt;/div&gt;
&lt;div class="section" id="loading-plugins-in-the-host"&gt;
&lt;h3&gt;Loading plugins in the host&lt;/h3&gt;
&lt;p&gt;The host's main function loads plugins at startup as follows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;pm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pluggy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PluginManager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;htmlize&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;pm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_hookspecs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hookspecs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;pm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load_setuptools_entrypoints&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;htmlize&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;tt class="docutils literal"&gt;hookspecs&lt;/tt&gt; is our Python module containing the hooks shown above.
&lt;tt class="docutils literal"&gt;load_setuptools_entrypoints&lt;/tt&gt; is Pluggy's helper for loading plugins that
were &lt;tt class="docutils literal"&gt;pip&lt;/tt&gt;-installed into the same environment and registered as
setuptools &lt;a class="reference external" href="https://setuptools.pypa.io/en/latest/userguide/entry_point.html"&gt;entry points&lt;/a&gt;.
Entry points let a package declare, in its &lt;tt class="docutils literal"&gt;setup.py&lt;/tt&gt; or &lt;tt class="docutils literal"&gt;pyproject.toml&lt;/tt&gt; file,
metadata that other code can query at runtime. In our project, the plugins
register themselves with this section in the &lt;tt class="docutils literal"&gt;pyproject.toml&lt;/tt&gt; file:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;[project.entry-points.htmlize]
tt = &amp;quot;tt&amp;quot;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This says &amp;quot;in the &lt;tt class="docutils literal"&gt;htmlize&lt;/tt&gt; entry point group, define an entry named &lt;tt class="docutils literal"&gt;tt&lt;/tt&gt; that refers to the &lt;tt class="docutils literal"&gt;tt&lt;/tt&gt; module&amp;quot;.
Pluggy's &lt;tt class="docutils literal"&gt;load_setuptools_entrypoints&lt;/tt&gt; then uses &lt;a class="reference external" href="https://docs.python.org/3/library/importlib.metadata.html"&gt;importlib.metadata&lt;/a&gt;
to access this information.&lt;/p&gt;
&lt;p&gt;Note that Pluggy doesn't require using this mechanism. Hosts can implement any
plugin discovery method they want, and add plugins directly to their
&lt;tt class="docutils literal"&gt;PluginManager&lt;/tt&gt; with the &lt;tt class="docutils literal"&gt;register&lt;/tt&gt; method. But this is the mechanism used
for &lt;tt class="docutils literal"&gt;pytest&lt;/tt&gt; and many other projects; it makes it very easy to
automatically discover and register plugins that are installed with &lt;tt class="docutils literal"&gt;pip&lt;/tt&gt; and
equivalent tools.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="invoking-plugins"&gt;
&lt;h3&gt;Invoking plugins&lt;/h3&gt;
&lt;p&gt;Once &lt;tt class="docutils literal"&gt;PluginManager&lt;/tt&gt; loads the plugins, invoking them is straightforward;
here's how &lt;tt class="docutils literal"&gt;htmlize&lt;/tt&gt; invokes the contents hooks &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[1]&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# Build full contents back again, and ask plugins to act on&lt;/span&gt;
&lt;span class="c1"&gt;# contents.&lt;/span&gt;
&lt;span class="n"&gt;contents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;&amp;#39;&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;plugin_manager&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hook&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;htmlize_contents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;contents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;contents&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
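&lt;p&gt;Generally, a hook invocation returns a &lt;em&gt;list&lt;/em&gt; with the results of all
the implementations that different plugins attached to the hook (a single host
application can have multiple plugins installed and attached to the same hook);
in our case each result is a handler function, which is why the code above loops
over them. When the host invokes the hook as shown above, implementations are
called in LIFO order (most recently registered first) by default, but plugins can affect this with
&lt;a class="reference external" href="https://pluggy.readthedocs.io/en/stable/#call-time-order"&gt;hook options&lt;/a&gt;
like &lt;tt class="docutils literal"&gt;tryfirst&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;trylast&lt;/tt&gt;. Hooks declared with
&lt;tt class="docutils literal"&gt;firstresult=True&lt;/tt&gt;, like our role handler hook, instead return only
the first non-&lt;tt class="docutils literal"&gt;None&lt;/tt&gt; result.&lt;/p&gt;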
&lt;/div&gt;
&lt;div class="section" id="implementing-hooks-in-plugins"&gt;
&lt;h3&gt;Implementing hooks in plugins&lt;/h3&gt;
&lt;p&gt;Here's our entire &lt;tt class="docutils literal"&gt;narcissist&lt;/tt&gt; plugin that's attaching to the contents hook:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;htmlize&lt;/span&gt;

&lt;span class="nd"&gt;@htmlize&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hookimpl&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;htmlize_contents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;repl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&amp;lt;b&amp;gt;I (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;author&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;)&amp;lt;/b&amp;gt;&amp;#39;&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;hook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;\bI\b&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;repl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;hook&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Some notes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;It expects &lt;tt class="docutils literal"&gt;htmlize&lt;/tt&gt; to be installed; as discussed previously, we rely on
Pluggy's default install-based approach where both the host and plugins are
installed into the same Python environment and can thus find each other.
However, Pluggy supports any custom discovery method.&lt;/li&gt;
&lt;li&gt;It uses the &lt;tt class="docutils literal"&gt;hookimpl&lt;/tt&gt; exported value shown earlier.&lt;/li&gt;
&lt;li&gt;It returns a function that acts on contents; this is the &lt;tt class="docutils literal"&gt;htmlize&lt;/tt&gt;-specific
contract (ABI, if you will) we've discussed before.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="fundamental-plugin-concepts-in-this-case-study"&gt;
&lt;h2&gt;Fundamental plugin concepts in this case study&lt;/h2&gt;
&lt;p&gt;Let's see how this case study of Pluggy measures against the
&lt;a class="reference external" href="https://eli.thegreenplace.net/2012/08/07/fundamental-concepts-of-plugin-infrastructures"&gt;Fundamental plugin concepts&lt;/a&gt;
that were covered &lt;a class="reference external" href="https://eli.thegreenplace.net/tag/plugins"&gt;several times on this blog&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It's important to remember that Pluggy is not a specific host application with
a bespoke plugin system; rather, it's a reusable library for creating such
plugin systems. Therefore, this is more of a &lt;em&gt;meta&lt;/em&gt; case study.&lt;/p&gt;
&lt;div class="section" id="discovery"&gt;
&lt;h3&gt;Discovery&lt;/h3&gt;
&lt;p&gt;Generally, Pluggy leaves discovery logic to the user's discretion. Its
&lt;tt class="docutils literal"&gt;PluginManager&lt;/tt&gt; has a &lt;tt class="docutils literal"&gt;register&lt;/tt&gt; method for adding plugins, and these can
be discovered in any way the application chooses.&lt;/p&gt;
&lt;p&gt;That said, Pluggy comes with one discovery mechanism built in - through the
entry points process of Python packaging, as shown above. This is hugely
convenient for a large number of applications, as long as both the application
and its plugins are installed via standard Python packaging tools (which is
a very reasonable assumption in the Python ecosystem).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="registration"&gt;
&lt;h3&gt;Registration&lt;/h3&gt;
&lt;p&gt;In the entry point process, plugins register themselves by adding a
&lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;[project.entry-points.&amp;lt;HOST-ID&amp;gt;]&lt;/span&gt;&lt;/tt&gt; section in their &lt;tt class="docutils literal"&gt;pyproject.toml&lt;/tt&gt;
file.&lt;/p&gt;
&lt;p&gt;Otherwise - as in the previous section - users are free to devise their own
registration schemes.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="hooks"&gt;
&lt;h3&gt;Hooks&lt;/h3&gt;
&lt;p&gt;This one is easy, since it's called &lt;em&gt;hooks&lt;/em&gt; in Pluggy parlance as well!
Pluggy's implementation of hooks is rather elegant, with function decorators
available for plugins to apply. We've seen an example of this above with
&lt;tt class="docutils literal"&gt;&amp;#64;htmlize.hookimpl&lt;/tt&gt; decorating &lt;tt class="docutils literal"&gt;htmlize_contents&lt;/tt&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="exposing-an-application-api-to-plugins"&gt;
&lt;h3&gt;Exposing an application API to plugins&lt;/h3&gt;
&lt;p&gt;Since Pluggy is designed for Python hosts and Python plugins, this one is
fairly straightforward. The plugins typically assume the host project is
already installed in the Python environment and its modules can be imported.&lt;/p&gt;
&lt;p&gt;In our example, &lt;tt class="docutils literal"&gt;hookimpl&lt;/tt&gt; is imported from &lt;tt class="docutils literal"&gt;htmlize&lt;/tt&gt; by the plugin to
accomplish this. It also shows how host data is passed to the plugin - the
&lt;tt class="docutils literal"&gt;post&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;db&lt;/tt&gt; parameters. These are APIs exposed by the host for the
plugins' use.&lt;/p&gt;
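&lt;p&gt;Pluggy's documentation also allows an implementation to declare only a subset of the arguments its hook specification lists, so a plugin that only needs &lt;tt class="docutils literal"&gt;post&lt;/tt&gt; can leave out &lt;tt class="docutils literal"&gt;db&lt;/tt&gt;. The toy helper below sketches that idea with &lt;tt class="docutils literal"&gt;inspect.signature&lt;/tt&gt;. It is not how Pluggy does it internally; &lt;tt class="docutils literal"&gt;call_with_accepted_args&lt;/tt&gt; is a hypothetical name, and the dictionaries stand in for the host's real &lt;tt class="docutils literal"&gt;post&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;db&lt;/tt&gt; objects:&lt;/p&gt;

```python
import inspect


def call_with_accepted_args(impl, **host_args):
    """Call impl with only the host arguments it declares by name.

    A toy illustration of opt-in arguments; the helper is hypothetical.
    """
    params = inspect.signature(impl).parameters
    accepted = {name: value for name, value in host_args.items()
                if name in params}
    return impl(**accepted)


def uses_post_only(post):
    return f"post by {post['author']}"


def uses_both(post, db):
    return (post["author"], len(db))


post = {"author": "Eli"}
db = {"users": ["Eli"]}
print(call_with_accepted_args(uses_post_only, post=post, db=db))  # post by Eli
print(call_with_accepted_args(uses_both, post=post, db=db))       # ('Eli', 1)
```

&lt;p&gt;This kind of opt-in design lets a host add new arguments to a hook later without breaking plugins written against the older signature.&lt;/p&gt;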
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="conclusion-is-pluggy-worth-it"&gt;
&lt;h2&gt;Conclusion - is Pluggy worth it?&lt;/h2&gt;
&lt;p&gt;In footnote 2 of my original &lt;a class="reference external" href="https://eli.thegreenplace.net/2012/08/07/fundamental-concepts-of-plugin-infrastructures"&gt;fundamental concepts of plugin infrastructures&lt;/a&gt;
post, I wrote &lt;a class="footnote-reference" href="#footnote-2" id="footnote-reference-2"&gt;[2]&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
This is probably why there are very few well-established plugin frameworks
in existence (even in low-level languages like C or C++). It's too easy (and
tempting) to roll your own.&lt;/blockquote&gt;
&lt;p&gt;I still believe my statement is true - plugin frameworks are very easy
to create, and the functionality they provide is relatively small compared to
their large surface area. In other words, this is a &lt;em&gt;shallow API&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;That said, Pluggy does provide some nice functionality for the more advanced
uses of plugins:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Automatic entry point registration mechanism - if you need it&lt;/li&gt;
&lt;li&gt;Signature validation&lt;/li&gt;
&lt;li&gt;Consistent plugin result collection across multiple hook attachments in a
single plugin and across many plugins&lt;/li&gt;
&lt;li&gt;Plugin ordering with &lt;tt class="docutils literal"&gt;firstresult&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;tryfirst&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;trylast&lt;/tt&gt;, etc.&lt;/li&gt;
&lt;li&gt;Hook &amp;quot;wrappers&amp;quot; for some special use cases&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Are these worthwhile for your project? It really depends on the project,
and it's always worth keeping the &lt;a class="reference external" href="https://eli.thegreenplace.net/2017/benefits-of-dependencies-in-software-projects-as-a-function-of-effort/"&gt;tradeoff between dependencies and project
effort&lt;/a&gt;
in mind.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="code"&gt;
&lt;h2&gt;Code&lt;/h2&gt;
&lt;p&gt;The full code repository for this post &lt;a class="reference external" href="https://github.com/eliben/code-for-blog/tree/main/2026/plugin-pluggy"&gt;is available here&lt;/a&gt;.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;Here &lt;tt class="docutils literal"&gt;plugin_manager&lt;/tt&gt; is the value previously returned from
&lt;tt class="docutils literal"&gt;pluggy.PluginManager&lt;/tt&gt;; in the previous code snippet it's saved into
&lt;tt class="docutils literal"&gt;pm&lt;/tt&gt; - the different variable name is because a function call is made
and &lt;tt class="docutils literal"&gt;plugin_manager&lt;/tt&gt; is the parameter name.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-2" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-2"&gt;[2]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;To be fair, that post predates the creation of Pluggy!&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
</content><category term="misc"></category><category term="Plugins"></category><category term="Python"></category></entry><entry><title>Rewriting pycparser with the help of an LLM</title><link href="https://eli.thegreenplace.net/2026/rewriting-pycparser-with-the-help-of-an-llm/" rel="alternate"></link><published>2026-02-04T19:35:00-08:00</published><updated>2026-02-05T03:38:39-08:00</updated><author><name>Eli Bendersky</name></author><id>tag:eli.thegreenplace.net,2026-02-04:/2026/rewriting-pycparser-with-the-help-of-an-llm/</id><summary type="html">&lt;p&gt;&lt;a class="reference external" href="https://github.com/eliben/pycparser"&gt;pycparser&lt;/a&gt; is my most widely used open
source project (with ~20M daily downloads from PyPI &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[1]&lt;/a&gt;). It's a pure-Python
parser for the C programming language, producing ASTs inspired by &lt;a class="reference external" href="https://docs.python.org/3/library/ast.html"&gt;Python's
own&lt;/a&gt;. Until very recently, it's
been using &lt;a class="reference external" href="https://www.dabeaz.com/ply/ply.html"&gt;PLY: Python Lex-Yacc&lt;/a&gt; for
the core parsing.&lt;/p&gt;
&lt;p&gt;In this post, I'll describe how …&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;a class="reference external" href="https://github.com/eliben/pycparser"&gt;pycparser&lt;/a&gt; is my most widely used open
source project (with ~20M daily downloads from PyPI &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[1]&lt;/a&gt;). It's a pure-Python
parser for the C programming language, producing ASTs inspired by &lt;a class="reference external" href="https://docs.python.org/3/library/ast.html"&gt;Python's
own&lt;/a&gt;. Until very recently, it's
been using &lt;a class="reference external" href="https://www.dabeaz.com/ply/ply.html"&gt;PLY: Python Lex-Yacc&lt;/a&gt; for
the core parsing.&lt;/p&gt;
&lt;p&gt;In this post, I'll describe how I collaborated with an LLM coding agent (Codex)
to help me rewrite pycparser to use a hand-written recursive-descent parser and
remove the dependency on PLY. This has been an interesting experience and the
post contains lots of information and is therefore quite long; if you're just
interested in the final result, check out the latest code of pycparser - the
&lt;tt class="docutils literal"&gt;main&lt;/tt&gt; branch already has the new implementation.&lt;/p&gt;
&lt;img alt="meme picture saying &amp;quot;can't come to bed because my AI agent produced something slightly wrong&amp;quot;" class="align-center" src="https://eli.thegreenplace.net/images/2026/cantcometobed.png" /&gt;
&lt;div class="section" id="the-issues-with-the-existing-parser-implementation"&gt;
&lt;h2&gt;The issues with the existing parser implementation&lt;/h2&gt;
&lt;p&gt;While pycparser has been working well overall, there were a number of nagging
issues that persisted over years.&lt;/p&gt;
&lt;div class="section" id="parsing-strategy-yacc-vs-hand-written-recursive-descent"&gt;
&lt;h3&gt;Parsing strategy: YACC vs. hand-written recursive descent&lt;/h3&gt;
&lt;p&gt;I began working on pycparser in 2008, and back then using a YACC-based approach
for parsing a whole language like C seemed like a no-brainer to me. Isn't this
what everyone does when writing a serious parser? Besides, the K&amp;amp;R2 book
famously carries the entire grammar of the ANSI C (C89) language in an appendix - so it
seemed like a simple matter of translating that to PLY-yacc syntax.&lt;/p&gt;
&lt;p&gt;And indeed, it wasn't &lt;em&gt;too&lt;/em&gt; hard, though there definitely were some complications
in building the ASTs for declarations (C's &lt;a class="reference external" href="https://eli.thegreenplace.net/2008/10/18/implementing-cdecl-with-pycparser"&gt;gnarliest part&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Shortly after completing pycparser, I got more and more interested in compilation
and started learning about the different kinds of parsers more seriously. Over
time, I grew convinced that &lt;a class="reference external" href="https://eli.thegreenplace.net/tag/recursive-descent-parsing"&gt;recursive descent&lt;/a&gt; is the way to
go - producing parsers that are easier to understand and maintain (and are often
faster!).&lt;/p&gt;
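&lt;p&gt;For readers who haven't seen one, here's a toy recursive-descent parser for arithmetic expressions: one method per grammar rule, with operator precedence encoded by which method calls which. It only conveys the flavor of the approach; none of these names come from pycparser, whose parser is of course far larger:&lt;/p&gt;

```python
import re

TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos != len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"bad input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    tokens.append("EOF")  # sentinel, so the parser never runs off the end
    return tokens


class Parser:
    """Grammar:
    expr := term (("+" | "-") term)*
    term := atom (("*" | "/") atom)*
    atom := NUMBER | "(" expr ")"
    """

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expect(self, tok):
        if self.advance() != tok:
            raise SyntaxError(f"expected {tok!r}")

    def parse(self):
        node = self.expr()
        self.expect("EOF")
        return node

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())  # loop gives left associativity
        return node

    def term(self):
        node = self.atom()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.atom())
        return node

    def atom(self):
        tok = self.advance()
        if tok == "(":
            node = self.expr()
            self.expect(")")
            return node
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected {tok!r}")


print(Parser("1 + 2 * (3 - 4)").parse())  # ('+', 1, ('*', 2, ('-', 3, 4)))
```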
&lt;p&gt;It all ties into the &lt;a class="reference external" href="https://eli.thegreenplace.net/2017/benefits-of-dependencies-in-software-projects-as-a-function-of-effort/"&gt;benefits of dependencies in software projects as a
function of effort&lt;/a&gt;.
Using parser generators is a heavy &lt;em&gt;conceptual&lt;/em&gt; dependency: it's really nice
when you have to churn out many parsers for small languages. But when you have
to maintain a single, very complex parser, as part of a large project - the
benefits quickly dissipate and you're left with a substantial dependency that
you constantly grapple with.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="the-other-issue-with-dependencies"&gt;
&lt;h3&gt;The other issue with dependencies&lt;/h3&gt;
&lt;p&gt;And then there are the usual problems with dependencies; dependencies get
abandoned, and they may also develop security issues. Sometimes, both of these
become true.&lt;/p&gt;
&lt;p&gt;Many years ago, pycparser forked and started vendoring its own version of PLY.
This was part of transitioning pycparser to a dual Python 2/3 code base when PLY
was slower to adapt. I believe this was the right decision, since PLY &amp;quot;just
worked&amp;quot; and I didn't have to deal with active (and very tedious in the Python
ecosystem, where packaging tools are replaced faster than dirty socks)
dependency management.&lt;/p&gt;
&lt;p&gt;A couple of weeks ago &lt;a class="reference external" href="https://github.com/eliben/pycparser/issues/588"&gt;this issue&lt;/a&gt;
was opened for pycparser. It turns out that some old PLY code triggers security
checks used by some Linux distributions; while this code was fixed in a later
commit of PLY, PLY itself was apparently abandoned and archived in late 2025.
And guess what? That happened in the middle of a large rewrite of the package,
so re-vendoring the pre-archiving commit seemed like a risky proposition.&lt;/p&gt;
&lt;p&gt;On the issue it was suggested that &amp;quot;hopefully the dependent packages move on to
a non-abandoned parser or implement their own&amp;quot;; I originally laughed this idea
off, but then it got me thinking... which is what this post is all about.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="growing-complexity-of-parsing-a-messy-language"&gt;
&lt;h3&gt;Growing complexity of parsing a messy language&lt;/h3&gt;
&lt;p&gt;The original K&amp;amp;R2 grammar for ANSI C had - famously - a single shift-reduce
conflict having to do with dangling &lt;tt class="docutils literal"&gt;else&lt;/tt&gt;s belonging to the most recent
&lt;tt class="docutils literal"&gt;if&lt;/tt&gt; statement. And indeed, other than the famous &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Lexer_hack"&gt;lexer hack&lt;/a&gt;
used to deal with &lt;a class="reference external" href="https://eli.thegreenplace.net/2011/05/02/the-context-sensitivity-of-cs-grammar-revisited"&gt;C's type name / ID ambiguity&lt;/a&gt;,
pycparser only had this single shift-reduce conflict.&lt;/p&gt;
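&lt;p&gt;In a hand-written recursive-descent parser, the dangling &lt;tt class="docutils literal"&gt;else&lt;/tt&gt; needs no special conflict resolution: whichever &lt;tt class="docutils literal"&gt;if&lt;/tt&gt;-parsing call is innermost sees the &lt;tt class="docutils literal"&gt;else&lt;/tt&gt; first and claims it, which is exactly C's rule. A toy sketch over a made-up two-rule grammar (&lt;tt class="docutils literal"&gt;parse_stmt&lt;/tt&gt; is hypothetical, not pycparser code):&lt;/p&gt;

```python
def parse_stmt(tokens, pos=0):
    """Parse a toy statement starting at tokens[pos]; return (ast, next_pos).

    Hypothetical grammar: stmt := "if" COND stmt ["else" stmt] | NAME
    """
    tok = tokens[pos]
    if tok == "if":
        cond = tokens[pos + 1]
        then_branch, pos = parse_stmt(tokens, pos + 2)
        # Greedy check: the innermost pending "if" sees the "else" first,
        # so it claims it; outer "if"s never get a chance.
        if pos != len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_stmt(tokens, pos + 1)
            return ("if", cond, then_branch, else_branch), pos
        return ("if", cond, then_branch, None), pos
    return tok, pos + 1


ast, _ = parse_stmt("if a if b x else y".split())
print(ast)  # ('if', 'a', ('if', 'b', 'x', 'y'), None)
```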
&lt;p&gt;But things got more complicated. Over the years, features were added that
weren't strictly in the standard but were supported by all the industrial
compilers. The more advanced C11 and C23 standards weren't beholden to the
promises of conflict-free YACC parsing (since almost no industrial-strength
compilers use YACC at this point), so all caution went out of the window.&lt;/p&gt;
&lt;p&gt;The latest (PLY-based) release of pycparser has many reduce-reduce conflicts
&lt;a class="footnote-reference" href="#footnote-2" id="footnote-reference-2"&gt;[2]&lt;/a&gt;; these are a severe maintenance hazard because it means the parsing rules
essentially have to be tie-broken by order of appearance in the code. This is
very brittle; pycparser has only managed to maintain its stability and quality
through its comprehensive test suite. Over time, it became harder and harder to
extend, because YACC parsing rules have all kinds of spooky-action-at-a-distance
effects. The straw that broke the camel's back was &lt;a class="reference external" href="https://github.com/eliben/pycparser/pull/590"&gt;this PR&lt;/a&gt; which again proposed to
increase the number of reduce-reduce conflicts &lt;a class="footnote-reference" href="#footnote-3" id="footnote-reference-3"&gt;[3]&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This - again - prompted me to think &amp;quot;what if I just dump YACC and switch to
a hand-written recursive descent parser&amp;quot;, and here we are.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="the-mental-roadblock"&gt;
&lt;h2&gt;The mental roadblock&lt;/h2&gt;
&lt;p&gt;None of the challenges described above are new; I've been pondering them for
many years now, and yet biting the bullet and rewriting the parser didn't feel
like something I wanted to get into. By my private estimates it'd take at least
a week of deep heads-down work to port the gritty 2000 lines of YACC grammar
rules to a recursive descent parser &lt;a class="footnote-reference" href="#footnote-4" id="footnote-reference-4"&gt;[4]&lt;/a&gt;. Moreover, it wouldn't be a
particularly &lt;em&gt;fun&lt;/em&gt; project either - I didn't feel like I'd learn much new and
my interests have shifted away from this project. In short, the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Potential_well"&gt;Potential well&lt;/a&gt; was just too deep.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="why-would-this-even-work-tests"&gt;
&lt;h2&gt;Why would this even work? Tests&lt;/h2&gt;
&lt;p&gt;I've definitely noticed the improvement in capabilities of LLM coding
agents in the past few months, and many reputable people online rave about using
them for increasingly larger projects. That said, would an LLM agent really be
able to accomplish such a complex project on its own? This isn't just a toy,
it's thousands of lines of dense parsing code.&lt;/p&gt;
&lt;p&gt;What gave me hope is the concept of &lt;a class="reference external" href="https://simonwillison.net/2025/Dec/31/the-year-in-llms/#the-year-of-conformance-suites"&gt;conformance suites mentioned by
Simon Willison&lt;/a&gt;.
Agents seem to do well when there's a very clear and rigid
goal function - such as a large, high-coverage conformance test suite.&lt;/p&gt;
&lt;p&gt;And pycparser has a &lt;a class="reference external" href="https://github.com/eliben/pycparser/blob/main/tests/test_c_parser.py"&gt;very extensive one&lt;/a&gt;.
Over 2500 lines of test code parsing various C snippets to ASTs with expected
results, grown over a decade and a half of real issues and bugs reported by
users.&lt;/p&gt;
&lt;p&gt;I figured the LLM can either succeed or fail and throw its hands up in despair,
but it's quite unlikely to produce a &lt;em&gt;wrong&lt;/em&gt; port that would still pass all
the tests. So I set it to run.&lt;/p&gt;
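&lt;p&gt;The shape of such a conformance suite is simple: a table of inputs paired with expected ASTs, checked mechanically. As a self-contained stand-in for pycparser, the sketch below uses Python's own &lt;tt class="docutils literal"&gt;ast&lt;/tt&gt; module as the parser under test and reduces each tree to nested tuples, so precedence and associativity are pinned down exactly. The &lt;tt class="docutils literal"&gt;shape&lt;/tt&gt; helper and the cases are illustrative only; they are not from pycparser's test suite.&lt;/p&gt;

```python
import ast
import unittest


def shape(node):
    """Reduce a Python expression AST to nested tuples for easy comparison."""
    if isinstance(node, ast.BinOp):
        return (type(node.op).__name__, shape(node.left), shape(node.right))
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return node.id
    raise TypeError(f"unsupported node: {type(node).__name__}")


# Each case pairs an input snippet with the exact AST shape we expect.
CASES = [
    ("1 + 2 * 3", ("Add", 1, ("Mult", 2, 3))),
    ("(1 + 2) * 3", ("Mult", ("Add", 1, 2), 3)),
    ("a - b - c", ("Sub", ("Sub", "a", "b"), "c")),
]


class ConformanceTest(unittest.TestCase):
    def test_cases(self):
        for src, expected in CASES:
            with self.subTest(src=src):
                tree = ast.parse(src, mode="eval")
                self.assertEqual(shape(tree.body), expected)
```

&lt;p&gt;Saved as a test module, this runs with &lt;tt class="docutils literal"&gt;python3 &lt;span class="pre"&gt;-m&lt;/span&gt; unittest&lt;/tt&gt;; a rewritten parser that mis-associated &lt;tt class="docutils literal"&gt;a - b - c&lt;/tt&gt; would fail the third case immediately, which is what makes such a suite a good goal function for an agent.&lt;/p&gt;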
&lt;/div&gt;
&lt;div class="section" id="the-initial-port"&gt;
&lt;h2&gt;The initial port&lt;/h2&gt;
&lt;p&gt;I fired up Codex in pycparser's repository, and wrote this prompt just to
make sure it understands me and can run the tests:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;This repository implements a Python parser producing an AST. It has a very
extensive test suite; can you first verify you successfully run all the tests?
(with python3 -m unittest discover, from the root dir)
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Codex figured it out (I gave it the exact command, after all!); my next
prompt was the real thing &lt;a class="footnote-reference" href="#footnote-5" id="footnote-reference-5"&gt;[5]&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;pycparser uses the PLY library for its lexer and parser, and its own AST
package for the ASTs created by the parser. I want to explore replacing the
Parser component (leaving the Lexer alone for now) with a hand written
recursive descent parser that will pass all the same tests. Can you try that?
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here Codex went to work and churned &lt;em&gt;for over an hour&lt;/em&gt;. Having never
observed an agent work for nearly this long, I kind of assumed it had gone off
the rails and would fail sooner or later. So I was rather surprised and skeptical
when it eventually came back with:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;Replaced the PLY parse path with a handwritten recursive‑descent parser while
keeping the existing lexer and AST helpers intact; CParser now points to the
new RD implementation, with the old PLY version preserved as PLYCParser in
pycparser/c_parser.py. The RD parser includes a token stream, declarator/type
parsing, expression precedence handling, pragmas, and statement parsing to
match the previous AST output and coordinate behavior.
&lt;/pre&gt;&lt;/div&gt;
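&lt;p&gt;The token stream and expression precedence handling mentioned in that summary are standard recursive descent building blocks. Here is a minimal sketch of both for a tiny C-like expression language (integers, identifiers, parentheses, unary minus and the binary operators &lt;tt class="docutils literal"&gt;+ - * / %&lt;/tt&gt;), using precedence climbing. The names and the grammar subset are hypothetical; this is not the code Codex produced.&lt;/p&gt;

```python
import re

# Integers, identifiers, or any single non-space character (operators, parens).
TOKEN_RE = re.compile(r"\d+|[A-Za-z_]\w*|\S")

# Binding power of each binary operator; higher binds tighter.
BINARY_PREC = {"+": 10, "-": 10, "*": 20, "/": 20, "%": 20}


class TokenStream:
    """A list of tokens with one token of lookahead."""

    def __init__(self, text):
        self.tokens = TOKEN_RE.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if len(self.tokens) > self.pos else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, tok):
        got = self.next()
        if got != tok:
            raise SyntaxError(f"expected {tok!r}, got {got!r}")


def parse_primary(ts):
    tok = ts.next()
    if tok == "(":
        inner = parse_expr(ts)
        ts.expect(")")
        return inner
    if tok == "-":
        return ("neg", parse_primary(ts))
    if tok is None or tok in BINARY_PREC or tok == ")":
        raise SyntaxError(f"unexpected token {tok!r}")
    return int(tok) if tok.isdigit() else tok


def parse_expr(ts, min_prec=0):
    """Precedence climbing: consume operators that bind at least min_prec."""
    left = parse_primary(ts)
    while True:
        prec = BINARY_PREC.get(ts.peek())
        if prec is None or min_prec > prec:
            return left
        op = ts.next()
        # prec + 1 makes operators of equal precedence left-associative.
        right = parse_expr(ts, prec + 1)
        left = (op, left, right)


def parse(text):
    ts = TokenStream(text)
    tree = parse_expr(ts)
    if ts.peek() is not None:
        raise SyntaxError(f"trailing token {ts.peek()!r}")
    return tree
```

&lt;p&gt;For example, &lt;tt class="docutils literal"&gt;parse("1 + 2 * 3")&lt;/tt&gt; returns &lt;tt class="docutils literal"&gt;("+", 1, ("*", 2, 3))&lt;/tt&gt;. Precedence and associativity are decided explicitly by the table and by the &lt;tt class="docutils literal"&gt;prec + 1&lt;/tt&gt; in the recursive call, rather than by conflict resolution inside a generated parse table.&lt;/p&gt;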
&lt;p&gt;It took me a while to poke around the code and run it until I was convinced - it
had actually done it! It wrote a new recursive descent parser with only
ancillary dependencies on PLY, and that parser passed the test suite. After a
few more prompts, we've removed the ancillary dependencies and made the
structure clearer. I hadn't looked too deeply into code quality at this point,
but at least on the functional level - it succeeded. This was very impressive!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="a-quick-note-on-reviews-and-branches"&gt;
&lt;h2&gt;A quick note on reviews and branches&lt;/h2&gt;
&lt;p&gt;A change like the one described above is impossible to code-review as one PR in
any meaningful way; so I used a different strategy. Before embarking on this
path, I created a new branch and once Codex finished the initial rewrite, I
committed this change, knowing that I would review it in detail, piece-by-piece
later on.&lt;/p&gt;
&lt;p&gt;Even though coding agents have their own notion of history and can &amp;quot;revert&amp;quot;
certain changes, I felt much safer relying on Git. In the worst case, if all of
this went south, I could nuke the branch and it would be as if nothing ever happened.
I was determined to only merge this branch into &lt;tt class="docutils literal"&gt;main&lt;/tt&gt; once I was fully
satisfied with the code. In what follows, I had to &lt;tt class="docutils literal"&gt;git reset&lt;/tt&gt; several times
when I didn't like the direction in which Codex was going. In hindsight, doing
this work in a branch was absolutely the right choice.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="the-long-tail-of-goofs"&gt;
&lt;h2&gt;The long tail of goofs&lt;/h2&gt;
&lt;p&gt;Once I had sufficiently convinced myself that the new parser was actually working,
I used Codex to similarly rewrite the lexer and get rid of the PLY dependency
entirely, deleting it from the repository. Then, I started looking more deeply
into code quality - reading the code created by Codex and trying to wrap my head
around it.&lt;/p&gt;
&lt;p&gt;And - oh my - this was quite the journey. Much has been written about the code
produced by agents, and much of it seems to be true. Maybe it's a setting I'm
missing (I'm not using my own custom &lt;tt class="docutils literal"&gt;AGENTS.md&lt;/tt&gt; yet, for instance), but
Codex seems to be that eager programmer that wants to get from A to B whatever
the cost. Readability, minimalism and code clarity are very much secondary
goals.&lt;/p&gt;
&lt;p&gt;Using &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;raise...except&lt;/span&gt;&lt;/tt&gt; for control flow? Yep. Abusing Python's dynamic typing
(like having &lt;tt class="docutils literal"&gt;None&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;False&lt;/tt&gt; and other values all mean different things
for a given variable)? For sure. Spreading the logic of a complex function
all over the place instead of putting all the key parts in a single switch
statement? You bet.&lt;/p&gt;
&lt;p&gt;Moreover, the agent is hilariously &lt;em&gt;lazy&lt;/em&gt;. More than once I had to convince it
to do something it initially said was impossible, and it even insisted on that again in
follow-up messages. The anthropomorphization here is mildly concerning, to be
honest. I could never imagine I would be writing something like the following to
a computer, and yet - here we are: &amp;quot;Remember how we moved X to Y before? You
can do it again for Z, definitely. Just try&amp;quot;.&lt;/p&gt;
&lt;p&gt;My process was to see how I can instruct Codex to fix things, and intervene
myself (by rewriting code) as little as possible. I've &lt;em&gt;mostly&lt;/em&gt; succeeded in
this, and did maybe 20% of the work myself.&lt;/p&gt;
&lt;p&gt;My branch grew &lt;em&gt;dozens&lt;/em&gt; of commits, falling into roughly these categories:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;The code in X is too complex; why can't we do Y instead?&lt;/li&gt;
&lt;li&gt;The use of X is needlessly convoluted; change Y to Z, and T to V in all
instances.&lt;/li&gt;
&lt;li&gt;The code in X is unclear; please add a detailed comment - with examples - to
explain what it does.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Interestingly, after doing (3), the agent was often more effective in giving
the code a &amp;quot;fresh look&amp;quot; and succeeding in either (1) or (2).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="the-end-result"&gt;
&lt;h2&gt;The end result&lt;/h2&gt;
&lt;p&gt;Eventually, after many hours spent in this process, I was reasonably pleased
with the code. It's far from perfect, of course, but taking the essential
complexities into account, it's something I could see myself maintaining (with
or without the help of an agent). I'm sure I'll find more ways to improve it
in the future, but I have a reasonable degree of confidence that this will be
doable.&lt;/p&gt;
&lt;p&gt;It passes all the tests, so I've been able to release a new version (3.00)
without major issues so far. The only issue I've discovered is that some of
CFFI's tests are overly precise about the phrasing of errors reported by
pycparser; this was &lt;a class="reference external" href="https://github.com/python-cffi/cffi/pull/224"&gt;an easy fix&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The new parser is also faster, by about 30% based on my benchmarks! This is
typical of recursive descent when compared with YACC-generated parsers, in my
experience. After reviewing the initial rewrite of the lexer, I spent a while
instructing Codex on how to make it faster, and it worked reasonably well.&lt;/p&gt;
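&lt;p&gt;A common way to make a hand-written, regexp-based lexer fast in Python is to combine all token patterns into a single compiled alternation, so the &lt;tt class="docutils literal"&gt;re&lt;/tt&gt; engine picks the token kind instead of a Python loop trying patterns one by one. A minimal sketch with a hypothetical set of token kinds (this is not pycparser's actual lexer):&lt;/p&gt;

```python
import re

# Hypothetical token kinds; earlier entries win when several could match.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=;(){}]"),
    ("WS", r"\s+"),
]

# One compiled alternation; capturing group i + 1 corresponds to TOKEN_SPEC[i].
MASTER_RE = re.compile("|".join(f"({pattern})" for _, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while len(text) > pos:
        m = MASTER_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        # lastindex is the number of the group that matched.
        kind = TOKEN_SPEC[m.lastindex - 1][0]
        if kind != "WS":
            tokens.append((kind, m.group()))
        pos = m.end()
    return tokens
```

&lt;p&gt;Plain capturing groups plus &lt;tt class="docutils literal"&gt;m.lastindex&lt;/tt&gt; map each match back to its kind. The order of &lt;tt class="docutils literal"&gt;TOKEN_SPEC&lt;/tt&gt; matters: a real C lexer would, for instance, have to list &lt;tt class="docutils literal"&gt;==&lt;/tt&gt; before &lt;tt class="docutils literal"&gt;=&lt;/tt&gt;.&lt;/p&gt;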
&lt;/div&gt;
&lt;div class="section" id="followup-static-typing"&gt;
&lt;h2&gt;Followup - static typing&lt;/h2&gt;
&lt;p&gt;While working on this, it became quite obvious that static typing would make the
process easier. LLM coding agents really benefit from closed loops with strict
guardrails (e.g. a test suite to pass), and type annotations are another such guardrail.
For example, had pycparser already been type annotated, Codex would probably not
have overloaded values to multiple types (like &lt;tt class="docutils literal"&gt;None&lt;/tt&gt; vs. &lt;tt class="docutils literal"&gt;False&lt;/tt&gt; vs.
others).&lt;/p&gt;
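&lt;p&gt;Here is a hypothetical before/after of the kind of change annotations encourage: instead of a helper returning &lt;tt class="docutils literal"&gt;None&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;False&lt;/tt&gt; or a string depending on the case, each case gets an explicit enum tag that a checker such as &lt;tt class="docutils literal"&gt;ty&lt;/tt&gt; can follow. The names &lt;tt class="docutils literal"&gt;DeclKind&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;Declarator&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;classify_declarator&lt;/tt&gt; are made up for illustration and don't come from pycparser.&lt;/p&gt;

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Before (overloaded): return None for "no declarator", False for an
# "abstract declarator" and a str for a "named declarator". Every caller has
# to remember the convention, and `if result:` silently conflates the first two.


class DeclKind(Enum):
    NONE = "none"          # no declarator tokens at all
    ABSTRACT = "abstract"  # no name, e.g. the "*" in a cast like (int *)
    NAMED = "named"        # has a name, e.g. "*p" in "int *p;"


@dataclass(frozen=True)
class Declarator:
    kind: DeclKind
    name: Optional[str] = None


def classify_declarator(tokens: list[str]) -> Declarator:
    """Classify the declarator tokens that follow a type specifier.

    Simplified on purpose: the name, if any, is assumed to be the last token.
    """
    if not tokens:
        return Declarator(DeclKind.NONE)
    if tokens[-1].isidentifier():
        return Declarator(DeclKind.NAMED, tokens[-1])
    return Declarator(DeclKind.ABSTRACT)
```

&lt;p&gt;Callers now compare against &lt;tt class="docutils literal"&gt;DeclKind.NAMED&lt;/tt&gt; (or &lt;tt class="docutils literal"&gt;match&lt;/tt&gt; on &lt;tt class="docutils literal"&gt;decl.kind&lt;/tt&gt;), and returning a bare &lt;tt class="docutils literal"&gt;False&lt;/tt&gt; where a &lt;tt class="docutils literal"&gt;Declarator&lt;/tt&gt; is declared becomes a type error instead of a silent convention.&lt;/p&gt;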
&lt;p&gt;In a followup, I asked Codex to type-annotate pycparser (running checks using
&lt;tt class="docutils literal"&gt;ty&lt;/tt&gt;), and this was also a back-and-forth because the process exposed some
issues that needed to be refactored. Time will tell, but hopefully it will make
further changes in the project simpler for the agent.&lt;/p&gt;
&lt;p&gt;Based on this experience, I'd bet that coding agents will be somewhat more
effective in statically typed languages like Go, TypeScript and especially Rust.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="conclusions"&gt;
&lt;h2&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;Overall, this project has been a really good experience, and I'm impressed with
what modern LLM coding agents can do! While there's no reason to expect that
progress in this domain will stop, even if it does - these are already very
useful tools that can significantly improve programmer productivity.&lt;/p&gt;
&lt;p&gt;Could I have done this myself, without an agent's help? Sure. But it would have
taken me &lt;em&gt;much&lt;/em&gt; longer, assuming that I could even muster the will and
concentration to engage in this project. I estimate it would take me at least
a week of full-time work (so 30-40 hours) spread over who knows how long to
accomplish. With Codex, I put an order of magnitude less work into this
(around 4-5 hours, I'd estimate) and I'm happy with the result.&lt;/p&gt;
&lt;p&gt;It was also &lt;em&gt;fun&lt;/em&gt;. At least in one sense, my professional life can be described
as the pursuit of focus, deep work and &lt;em&gt;flow&lt;/em&gt;. It's not easy for me to get into
this state, but when I do I'm highly productive and find it very enjoyable.
Agents really help me here. When I know I need to write some code and it's
hard to get started, asking an agent to write a prototype is a great catalyst
for my motivation. Hence the meme at the beginning of the post.&lt;/p&gt;
&lt;div class="section" id="does-code-quality-even-matter"&gt;
&lt;h3&gt;Does code quality even matter?&lt;/h3&gt;
&lt;p&gt;One can't avoid a nagging question - does the quality of the code produced
by agents even matter? Clearly, the agents themselves can understand it (if not
today's agent, then at least next year's). Why worry about future
maintainability if the agent can maintain it? In other words, does it make sense
to just go full vibe-coding?&lt;/p&gt;
&lt;p&gt;This is a fair question, and one I don't have an answer to. Right now, for
projects I maintain and &lt;em&gt;stand behind&lt;/em&gt;, it seems obvious to me that the code
should be fully understandable and accepted by me, and the agent is just a tool
helping me get to that state more efficiently. It's hard to say what the future
holds here; it's going to be interesting, for sure.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;pycparser has a fair number of &lt;a class="reference external" href="https://deps.dev/pypi/pycparser/3.0.0/dependents"&gt;direct dependents&lt;/a&gt;,
but the majority of downloads comes through &lt;a class="reference external" href="https://github.com/python-cffi/cffi"&gt;CFFI&lt;/a&gt;,
which itself is a major building block for much of the Python ecosystem.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-2" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-2"&gt;[2]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;The table-building report says 177, but that's certainly an
over-dramatization because it's common for a single conflict to
manifest in several ways.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-3" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-3"&gt;[3]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;It didn't help the PR's case that it was almost certainly vibe coded.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-4" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-4"&gt;[4]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;&lt;p class="first"&gt;There was also the lexer to consider, but this seemed like a much
simpler job. My impression is that in the early days of computing,
&lt;tt class="docutils literal"&gt;lex&lt;/tt&gt; gained prominence because of strong regexp support which wasn't
very common yet. These days, with excellent regexp libraries
existing for pretty much every language, the added value of &lt;tt class="docutils literal"&gt;lex&lt;/tt&gt; over
a &lt;a class="reference external" href="https://eli.thegreenplace.net/2013/06/25/regex-based-lexical-analysis-in-python-and-javascript"&gt;custom regexp-based lexer&lt;/a&gt;
isn't very high.&lt;/p&gt;
&lt;p class="last"&gt;That said, it wouldn't make much sense to embark on a journey to rewrite
&lt;em&gt;just&lt;/em&gt; the lexer; the dependency on PLY would still remain, and besides,
PLY's lexer and parser are designed to work well together. So it wouldn't
help me much without tackling the parser beast.&lt;/p&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-5" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-5"&gt;[5]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;I've decided to ask it to port the parser first, leaving the lexer
alone. This was to split the work into reasonable chunks. Besides, I
figured that the parser is the hard job anyway - if it succeeds in that,
the lexer should be easy. That assumption turned out to be correct.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
</content><category term="misc"></category><category term="Python"></category><category term="Machine Learning"></category><category term="Compilation"></category><category term="Recursive descent parsing"></category></entry><entry><title>Compiling Scheme to WebAssembly</title><link href="https://eli.thegreenplace.net/2026/compiling-scheme-to-webassembly/" rel="alternate"></link><published>2026-01-17T14:37:00-08:00</published><updated>2026-01-17T22:40:40-08:00</updated><author><name>Eli Bendersky</name></author><id>tag:eli.thegreenplace.net,2026-01-17:/2026/compiling-scheme-to-webassembly/</id><summary type="html">&lt;p&gt;One of my oldest open-source projects - &lt;a class="reference external" href="https://github.com/eliben/bobscheme"&gt;Bob&lt;/a&gt;
- &lt;a class="reference external" href="https://eli.thegreenplace.net/2010/11/06/bob-a-scheme-interpreter-compiler-and-vm-in-python"&gt;turned 15 a couple of months ago&lt;/a&gt;.
Bob is a suite of implementations of the Scheme programming language in Python,
including an interpreter, a compiler and a VM. Back then I was doing some hacking
on CPython internals and was very curious …&lt;/p&gt;</summary><content type="html">&lt;p&gt;One of my oldest open-source projects - &lt;a class="reference external" href="https://github.com/eliben/bobscheme"&gt;Bob&lt;/a&gt;
- &lt;a class="reference external" href="https://eli.thegreenplace.net/2010/11/06/bob-a-scheme-interpreter-compiler-and-vm-in-python"&gt;turned 15 a couple of months ago&lt;/a&gt;.
Bob is a suite of implementations of the Scheme programming language in Python,
including an interpreter, a compiler and a VM. Back then I was doing some hacking
on CPython internals and was very curious about how CPython-like bytecode VMs
work; Bob was an experiment to find out, by implementing one from scratch for
R5RS Scheme.&lt;/p&gt;
&lt;p&gt;Several months later I &lt;a class="reference external" href="https://eli.thegreenplace.net/2011/04/09/a-c-vm-added-to-bob"&gt;added a C++ VM to Bob&lt;/a&gt;,
as an exercise to learn how such VMs are implemented in a low-level language
without all the runtime support Python provides; most importantly, without the
built-in GC. The C++ VM in Bob implements its own mark-and-sweep GC.&lt;/p&gt;
&lt;p&gt;After many quiet years (with just a sprinkling of cosmetic changes, porting to
GitHub, updates to Python 3, etc), I felt the itch to work on Bob again just
before the holidays. Specifically, I decided to add another compiler to the
suite - this one from Scheme directly to WebAssembly.&lt;/p&gt;
&lt;p&gt;The goals of this effort were two-fold:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;Experiment with lowering a real, high-level language like Scheme to
WebAssembly. Experiments like the recent &lt;a class="reference external" href="https://eli.thegreenplace.net/2025/revisiting-lets-build-a-compiler/"&gt;Let's Build a Compiler&lt;/a&gt;
compile toy languages that are at the C level (no runtime). Scheme has built-in
data structures, lexical closures, garbage collection, etc. It's much more challenging.&lt;/li&gt;
&lt;li&gt;Get some hands-on experience with the WASM GC extension &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[1]&lt;/a&gt;. I have several
samples of using WASM GC in the &lt;a class="reference external" href="https://github.com/eliben/wasm-wat-samples"&gt;wasm-wat-samples repository&lt;/a&gt;,
but I really wanted to try it for something &amp;quot;real&amp;quot;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Well, it's done now; here's an updated schematic of the Bob project:&lt;/p&gt;
&lt;img alt="Bob project diagram with all the components it includes" class="align-center" src="https://eli.thegreenplace.net/images/2026/bob_toplevel.png" /&gt;
&lt;p&gt;The new part is the rightmost vertical path. A &lt;a class="reference external" href="https://github.com/eliben/bobscheme/blob/main/bob/wasmcompiler.py"&gt;WasmCompiler&lt;/a&gt;
class lowers parsed Scheme expressions all the way down to WebAssembly text,
which can then be compiled to a binary and executed using standard WASM tools &lt;a class="footnote-reference" href="#footnote-2" id="footnote-reference-2"&gt;[2]&lt;/a&gt;.&lt;/p&gt;
&lt;div class="section" id="highlights"&gt;
&lt;h2&gt;Highlights&lt;/h2&gt;
&lt;p&gt;The most interesting aspect of this project was working with WASM GC to
represent Scheme objects. As long as we properly box/wrap all values in
&lt;tt class="docutils literal"&gt;ref&lt;/tt&gt;s, the underlying WASM execution environment will take care of the
memory management.&lt;/p&gt;
&lt;p&gt;For Bob, here's how some key Scheme objects are represented:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;;; PAIR holds the car and cdr of a cons cell.
(type $PAIR (struct (field (mut (ref null eq))) (field (mut (ref null eq)))))

;; BOOL represents a Scheme boolean. zero -&amp;gt; false, nonzero -&amp;gt; true.
(type $BOOL (struct (field i32)))

;; SYMBOL represents a Scheme symbol. It holds an offset in linear memory
;; and the length of the symbol name.
(type $SYMBOL (struct (field i32) (field i32)))
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;tt class="docutils literal"&gt;$PAIR&lt;/tt&gt; is of particular interest, as it may contain arbitrary objects in
its fields; &lt;tt class="docutils literal"&gt;(ref null eq)&lt;/tt&gt; means &amp;quot;a nullable reference to something that
has identity&amp;quot;. &lt;tt class="docutils literal"&gt;ref.test&lt;/tt&gt; can be used to check - for a given
reference - the run-time type of the value it refers to.&lt;/p&gt;
&lt;p&gt;You may wonder - what about numeric values? Here WASM has a trick - the &lt;tt class="docutils literal"&gt;i31&lt;/tt&gt;
type can be used to represent a reference to an integer, but without
actually boxing it (one bit is used to distinguish such an object from a
real reference). So we don't need a separate type to hold references to numbers.&lt;/p&gt;
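&lt;p&gt;To make the 31-bit limit concrete, here is a small Python model of what the relevant instructions do to a value: &lt;tt class="docutils literal"&gt;ref.i31&lt;/tt&gt; keeps only the low 31 bits of an &lt;tt class="docutils literal"&gt;i32&lt;/tt&gt;, and &lt;tt class="docutils literal"&gt;i31.get_s&lt;/tt&gt; / &lt;tt class="docutils literal"&gt;i31.get_u&lt;/tt&gt; sign- or zero-extend those bits back. It models the observable semantics only, not how an engine actually lays out the bits; the Python function names are just stand-ins for the instructions.&lt;/p&gt;

```python
I31_SPAN = 2 ** 31  # number of distinct 31-bit patterns
I31_SIGN = 2 ** 30  # weight of the sign bit within 31 bits


def ref_i31(value):
    """Model ref.i31: wrap an i32 (given as a Python int) to its low 31 bits."""
    return value % I31_SPAN


def i31_get_s(bits):
    """Model i31.get_s: interpret the 31 bits as a signed integer."""
    return bits - I31_SPAN if bits >= I31_SIGN else bits


def i31_get_u(bits):
    """Model i31.get_u: interpret the 31 bits as an unsigned integer."""
    return bits
```

&lt;p&gt;A signed integer therefore survives the round trip only if it lies between -2^30 and 2^30 - 1; for example, &lt;tt class="docutils literal"&gt;2**30&lt;/tt&gt; comes back as &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;-2**30&lt;/span&gt;&lt;/tt&gt;. This is a general property of &lt;tt class="docutils literal"&gt;i31&lt;/tt&gt; that any compiler using it for integers has to keep in mind.&lt;/p&gt;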
&lt;p&gt;Also, the &lt;tt class="docutils literal"&gt;$SYMBOL&lt;/tt&gt; type looks unusual - how is it represented with two
numbers? The key to the mystery is that WASM has no built-in support for
strings; they have to be implemented manually, for example using offsets into linear memory.
The Bob WASM compiler emits the string values of all symbols encountered into
linear memory, keeping track of the offset and length of each one; these are
the two numbers placed in &lt;tt class="docutils literal"&gt;$SYMBOL&lt;/tt&gt;. This also makes it fairly easy
to implement Scheme's symbol interning; multiple instances of the
same symbol are only allocated once.&lt;/p&gt;
&lt;p&gt;Consider this trivial Scheme snippet:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;write&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;foo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;bar&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The compiler emits the symbols &amp;quot;foo&amp;quot; and &amp;quot;bar&amp;quot; into linear memory as follows &lt;a class="footnote-reference" href="#footnote-3" id="footnote-reference-3"&gt;[3]&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;(data (i32.const 2048) &amp;quot;foo&amp;quot;)
(data (i32.const 2051) &amp;quot;bar&amp;quot;)
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And looking for one of these addresses in the rest of the emitted code, we'll
find:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;(struct.new $SYMBOL (i32.const 2051) (i32.const 3))
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This is part of the code that constructs the constant &lt;tt class="docutils literal"&gt;cons&lt;/tt&gt; list passed as the
argument to &lt;tt class="docutils literal"&gt;write&lt;/tt&gt;; address 2051 and length 3 identify the symbol &lt;tt class="docutils literal"&gt;bar&lt;/tt&gt;.&lt;/p&gt;
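&lt;p&gt;The bookkeeping behind this can be sketched in a few lines of Python. &lt;tt class="docutils literal"&gt;SymbolTable&lt;/tt&gt; below is hypothetical (it is not the code in &lt;tt class="docutils literal"&gt;WasmCompiler&lt;/tt&gt;): it hands out consecutive offsets starting at 2048, stores each distinct name only once, and produces &lt;tt class="docutils literal"&gt;data&lt;/tt&gt; segments and &lt;tt class="docutils literal"&gt;struct.new&lt;/tt&gt; expressions of the form shown above. It assumes symbol names contain no quotes or backslashes, which would otherwise need escaping in WAT string literals.&lt;/p&gt;

```python
class SymbolTable:
    """Assigns each distinct symbol name an (offset, length) slot in linear memory."""

    def __init__(self, base_offset=2048):
        self.next_offset = base_offset
        self.slots = {}  # name -> (offset, length), kept in insertion order

    def intern(self, name):
        # A name seen before reuses its slot; new names get the next free bytes.
        if name not in self.slots:
            length = len(name.encode("utf-8"))
            self.slots[name] = (self.next_offset, length)
            self.next_offset += length
        return self.slots[name]

    def data_segments(self):
        # Assumes names need no escaping inside a WAT string literal.
        return [
            f'(data (i32.const {offset}) "{name}")'
            for name, (offset, _) in self.slots.items()
        ]

    def symbol_expr(self, name):
        offset, length = self.intern(name)
        return f"(struct.new $SYMBOL (i32.const {offset}) (i32.const {length}))"
```

&lt;p&gt;Interning &lt;tt class="docutils literal"&gt;foo&lt;/tt&gt; and then &lt;tt class="docutils literal"&gt;bar&lt;/tt&gt; yields offsets 2048 and 2051, matching the listing above; interning &lt;tt class="docutils literal"&gt;foo&lt;/tt&gt; again returns the same (offset, length) pair without emitting any new data.&lt;/p&gt;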
&lt;p&gt;Speaking of &lt;tt class="docutils literal"&gt;write&lt;/tt&gt;, implementing this builtin was quite interesting. For
compatibility with the other Bob implementations in my repository, &lt;tt class="docutils literal"&gt;write&lt;/tt&gt;
needs to be able to print recursive representations of arbitrary Scheme values,
including lists, symbols, etc.&lt;/p&gt;
&lt;p&gt;Initially I was reluctant to implement all of this functionality by hand in
WASM text, but all alternatives ran into challenges:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;Deferring this to the host is difficult because the host environment has
no access to WASM GC references - they are completely opaque.&lt;/li&gt;
&lt;li&gt;Implementing it in another language (maybe C?) and lowering to WASM is also
challenging for a similar reason - the other language is unlikely to have
a good representation of WASM GC objects.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;So I bit the bullet and - with some AI help for the tedious parts - just wrote
an implementation of &lt;tt class="docutils literal"&gt;write&lt;/tt&gt; directly in WASM text; it wasn't really that
bad. I import only two functions from the host:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;(import &amp;quot;env&amp;quot; &amp;quot;write_char&amp;quot; (func $write_char (param i32)))
(import &amp;quot;env&amp;quot; &amp;quot;write_i32&amp;quot; (func $write_i32 (param i32)))
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Though emitting integers &lt;a class="reference external" href="https://eli.thegreenplace.net/2023/itoa-integer-to-string-in-webassembly/"&gt;directly from WASM isn't hard&lt;/a&gt;,
I figured this project already has enough code and some host help here would
be welcome. For all the rest, only the lowest level &lt;tt class="docutils literal"&gt;write_char&lt;/tt&gt; is used.
For example, here's how booleans are emitted in the canonical Scheme notation
(&lt;tt class="docutils literal"&gt;#t&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;#f&lt;/tt&gt;):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;(func $emit_bool (param $b (ref $BOOL))
    (call $emit (i32.const 35)) ;; &amp;#39;#&amp;#39;
    (if (i32.eqz (struct.get $BOOL 0 (local.get $b)))
        (then (call $emit (i32.const 102))) ;; &amp;#39;f&amp;#39;
        (else (call $emit (i32.const 116))) ;; &amp;#39;t&amp;#39;
    )
)
&lt;/pre&gt;&lt;/div&gt;
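&lt;p&gt;The recursive structure of &lt;tt class="docutils literal"&gt;write&lt;/tt&gt; is easier to see in a higher-level language. Here is a hypothetical Python model of it: small dataclasses stand in for &lt;tt class="docutils literal"&gt;$PAIR&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;$BOOL&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;$SYMBOL&lt;/tt&gt;, Python ints stand in for &lt;tt class="docutils literal"&gt;i31&lt;/tt&gt; values, &lt;tt class="docutils literal"&gt;isinstance&lt;/tt&gt; plays the role of &lt;tt class="docutils literal"&gt;ref.test&lt;/tt&gt;, and an &lt;tt class="docutils literal"&gt;emit&lt;/tt&gt; callback taking a character code plays the role of the host's &lt;tt class="docutils literal"&gt;write_char&lt;/tt&gt;. Representing the empty list as &lt;tt class="docutils literal"&gt;None&lt;/tt&gt; (a null reference) is an assumption of this sketch, and symbol names are stored inline rather than read from linear memory.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Pair:  # stands in for $PAIR
    car: Any
    cdr: Any


@dataclass
class Bool:  # stands in for $BOOL: zero -> false, nonzero -> true
    value: int


@dataclass
class Symbol:  # stands in for $SYMBOL, with the name kept inline
    name: str


def emit_str(s: str, emit: Callable[[int], None]) -> None:
    for ch in s:
        emit(ord(ch))  # like write_char, emit takes a character code


def write(obj: Any, emit: Callable[[int], None]) -> None:
    if obj is None:  # assumed: the empty list is a null reference
        emit_str("()", emit)
    elif isinstance(obj, int):  # the real code hands integers to write_i32
        emit_str(str(obj), emit)
    elif isinstance(obj, Bool):
        emit_str("#t" if obj.value != 0 else "#f", emit)
    elif isinstance(obj, Symbol):
        emit_str(obj.name, emit)
    elif isinstance(obj, Pair):
        emit(ord("("))
        write(obj.car, emit)
        rest = obj.cdr
        while isinstance(rest, Pair):  # walk the spine of the list
            emit(ord(" "))
            write(rest.car, emit)
            rest = rest.cdr
        if rest is not None:  # improper tail, printed as (a . b)
            emit_str(" . ", emit)
            write(rest, emit)
        emit(ord(")"))
    else:
        raise TypeError(f"cannot write {obj!r}")


def scheme_list(*items: Any) -> Any:
    result = None
    for item in reversed(items):
        result = Pair(item, result)
    return result


def to_string(obj: Any) -> str:
    chars: list[str] = []
    write(obj, lambda code: chars.append(chr(code)))
    return "".join(chars)
```

&lt;p&gt;For the list from the earlier &lt;tt class="docutils literal"&gt;write&lt;/tt&gt; example, &lt;tt class="docutils literal"&gt;to_string(scheme_list(10, 20, Symbol("foo"), Symbol("bar")))&lt;/tt&gt; gives &lt;tt class="docutils literal"&gt;(10 20 foo bar)&lt;/tt&gt;, and an improper pair such as &lt;tt class="docutils literal"&gt;Pair(1, 2)&lt;/tt&gt; prints as &lt;tt class="docutils literal"&gt;(1 . 2)&lt;/tt&gt;.&lt;/p&gt;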
&lt;/div&gt;
&lt;div class="section" id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;This was a really fun project, and I learned quite a bit about realistic code
emission to WASM. Feel free to check out the source code of &lt;a class="reference external" href="https://github.com/eliben/bobscheme/blob/main/bob/wasmcompiler.py"&gt;WasmCompiler&lt;/a&gt; - it's
very well documented. While it's a bit over 1000 LOC in total &lt;a class="footnote-reference" href="#footnote-4" id="footnote-reference-4"&gt;[4]&lt;/a&gt;, more than half
of that is actually WASM text snippets that implement the builtin types and
functions needed by a basic Scheme implementation.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;The GC proposal &lt;a class="reference external" href="https://github.com/WebAssembly/gc"&gt;is documented here&lt;/a&gt;.
It was officially added to the WASM spec in Oct 2023.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-2" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-2"&gt;[2]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;&lt;p class="first"&gt;In Bob this is currently done with &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;bytecodealliance/wasm-tools&lt;/span&gt;&lt;/tt&gt; for the
text-to-binary conversion and Node.js for the execution environment, but
this can change in the future.&lt;/p&gt;
&lt;p class="last"&gt;I actually wanted to use Python bindings to wasmtime, but these don't
appear to support WASM GC yet.&lt;/p&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-3" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-3"&gt;[3]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2048 is just an arbitrary offset the compiler uses as the beginning of
the section for symbols in memory. We could
also use the multiple memories feature of WASM and dedicate a separate
linear memory just for symbols.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-4" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-4"&gt;[4]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;To be clear, this is just the WASM compiler class; it uses the &lt;tt class="docutils literal"&gt;Expr&lt;/tt&gt;
representation of Scheme that is created by Bob's parser (and lexer);
the code of these other components is shared among all Bob
implementations and isn't counted here.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
</content><category term="misc"></category><category term="Lisp"></category><category term="Python"></category><category term="WebAssembly"></category></entry><entry><title>Plugins case study: mdBook preprocessors</title><link href="https://eli.thegreenplace.net/2025/plugins-case-study-mdbook-preprocessors/" rel="alternate"></link><published>2025-12-17T18:11:00-08:00</published><updated>2026-06-07T00:38:58-07:00</updated><author><name>Eli Bendersky</name></author><id>tag:eli.thegreenplace.net,2025-12-17:/2025/plugins-case-study-mdbook-preprocessors/</id><summary type="html">&lt;p&gt;&lt;a class="reference external" href="https://rust-lang.github.io/mdBook/index.html"&gt;mdBook&lt;/a&gt; is a tool for easily
creating books out of Markdown files. It's very popular in the Rust ecosystem,
where it's used (among other things) to publish &lt;a class="reference external" href="https://doc.rust-lang.org/book/"&gt;the official Rust book&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;mdBook has a simple yet effective plugin mechanism that can be used to modify
the book output in arbitrary …&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;a class="reference external" href="https://rust-lang.github.io/mdBook/index.html"&gt;mdBook&lt;/a&gt; is a tool for easily
creating books out of Markdown files. It's very popular in the Rust ecosystem,
where it's used (among other things) to publish &lt;a class="reference external" href="https://doc.rust-lang.org/book/"&gt;the official Rust book&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;mdBook has a simple yet effective plugin mechanism that can be used to modify
the book output in arbitrary ways, using any programming language or tool. This
post describes the mechanism and how it aligns with the
&lt;a class="reference external" href="https://eli.thegreenplace.net/2012/08/07/fundamental-concepts-of-plugin-infrastructures"&gt;fundamental concepts of plugin infrastructures&lt;/a&gt;.&lt;/p&gt;
&lt;div class="section" id="mdbook-preprocessors"&gt;
&lt;h2&gt;mdBook preprocessors&lt;/h2&gt;
&lt;p&gt;mdBook's architecture is pretty simple: your contents go into a directory tree
of Markdown files. mdBook then renders these into a book, with one file per
chapter. The book's output is HTML by default, but mdBook supports other
outputs like PDF.&lt;/p&gt;
&lt;p&gt;The &lt;a class="reference external" href="https://rust-lang.github.io/mdBook/for_developers/preprocessors.html"&gt;preprocessor mechanism&lt;/a&gt;
lets us register an arbitrary program that runs on the book's source after
it's loaded from Markdown files; this program can modify the book's contents in
any way it wishes before it all gets sent to the renderer for generating output.&lt;/p&gt;
&lt;img alt="Preprocessor flow for mdbook" class="align-center" src="https://eli.thegreenplace.net/images/2025/mdbook-preprocessor.png" /&gt;
&lt;p&gt;The official documentation &lt;a class="reference external" href="https://rust-lang.github.io/mdBook/for_developers/preprocessors.html#hooking-into-mdbook"&gt;explains this process very well&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="sample-plugin"&gt;
&lt;h2&gt;Sample plugin&lt;/h2&gt;
&lt;p&gt;I rewrote &lt;a class="reference external" href="https://eli.thegreenplace.net/2012/08/07/fundamental-concepts-of-plugin-infrastructures"&gt;my classic &amp;quot;narcissist&amp;quot; plugin&lt;/a&gt;
for mdBook; the code is &lt;a class="reference external" href="https://github.com/eliben/code-for-blog/tree/main/2025/plugin-mdbook"&gt;available here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In fact, there are two renditions of the same plugin there:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;One in Python, to demonstrate how mdBook can invoke preprocessors written in
any programming language.&lt;/li&gt;
&lt;li&gt;One in Rust, to demonstrate how mdBook exposes an application API to plugins
written in Rust (since mdBook is itself written in Rust).&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;div class="section" id="fundamental-plugin-concepts-in-this-case-study"&gt;
&lt;h2&gt;Fundamental plugin concepts in this case study&lt;/h2&gt;
&lt;p&gt;Let's see how this case study of mdBook preprocessors measures up against the
&lt;a class="reference external" href="https://eli.thegreenplace.net/2012/08/07/fundamental-concepts-of-plugin-infrastructures"&gt;fundamental plugin concepts&lt;/a&gt;
that were covered &lt;a class="reference external" href="https://eli.thegreenplace.net/tag/plugins"&gt;several times on this blog&lt;/a&gt;.&lt;/p&gt;
&lt;div class="section" id="discovery"&gt;
&lt;h3&gt;Discovery&lt;/h3&gt;
&lt;p&gt;Discovery in mdBook is very explicit: every plugin we want mdBook to use
has to be listed in the project's &lt;tt class="docutils literal"&gt;book.toml&lt;/tt&gt; configuration file. For
example, in &lt;a class="reference external" href="https://github.com/eliben/code-for-blog/tree/main/2025/plugin-mdbook"&gt;the code sample for this post&lt;/a&gt;, the Python narcissist plugin
is listed in &lt;tt class="docutils literal"&gt;book.toml&lt;/tt&gt; as follows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;[preprocessor.narcissistpy]
command = &amp;quot;python3 ../preprocessor-python-narcissist/narcissist.py&amp;quot;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Each preprocessor is a command that &lt;tt class="docutils literal"&gt;mdBook&lt;/tt&gt; executes in a sub-process.
Here the command runs a Python script, but it can be anything that can be validly executed.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="registration"&gt;
&lt;h3&gt;Registration&lt;/h3&gt;
&lt;p&gt;For the purpose of registration, &lt;tt class="docutils literal"&gt;mdBook&lt;/tt&gt; actually invokes the plugin command
&lt;em&gt;twice&lt;/em&gt;. The first time, it passes the arguments &lt;tt class="docutils literal"&gt;supports &amp;lt;renderer&amp;gt;&lt;/tt&gt; where
&lt;tt class="docutils literal"&gt;&amp;lt;renderer&amp;gt;&lt;/tt&gt; is the name of the renderer (e.g. &lt;tt class="docutils literal"&gt;html&lt;/tt&gt;). If the command
returns 0, it means the preprocessor supports this renderer; otherwise, it
doesn't.&lt;/p&gt;
&lt;p&gt;In the second invocation, &lt;tt class="docutils literal"&gt;mdBook&lt;/tt&gt; passes some metadata plus the entire book
in JSON format to the preprocessor through stdin, and expects the preprocessor
to return the modified book as JSON to stdout (using the same schema).&lt;/p&gt;
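&lt;p&gt;Put together, the plugin side of this two-step protocol fits in a few lines of Python. The following is a minimal pass-through sketch, not the code of my narcissist plugin: &lt;tt class="docutils literal"&gt;supports&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;preprocess&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;handle&lt;/tt&gt; are hypothetical names, and claiming support only for &lt;tt class="docutils literal"&gt;html&lt;/tt&gt; is an arbitrary choice. It assumes the stdin payload is a two-element JSON array holding the context and the book, which is how mdBook's preprocessor documentation describes it.&lt;/p&gt;

```python
import json
import sys


def supports(renderer):
    # Arbitrary policy for this sketch: only handle the HTML renderer.
    return renderer == "html"


def preprocess(context, book):
    # Pass-through; a real preprocessor would modify the book here.
    return book


def handle(argv, read_stdin):
    """Return (exit_status, stdout_text) for one invocation by mdBook."""
    # First invocation: "supports RENDERER", answered via the exit status.
    if len(argv) == 3 and argv[1] == "supports":
        return (0 if supports(argv[2]) else 1), ""
    # Second invocation: [context, book] arrives on stdin as JSON;
    # the (possibly modified) book goes back out on stdout.
    context, book = json.loads(read_stdin())
    return 0, json.dumps(preprocess(context, book))


if __name__ == "__main__":
    status, output = handle(sys.argv, sys.stdin.read)
    sys.stdout.write(output)
    sys.exit(status)
```

&lt;p&gt;Passing stdin as a callable means the &lt;tt class="docutils literal"&gt;supports&lt;/tt&gt; branch never tries to read input that mdBook doesn't send in the first invocation.&lt;/p&gt;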
&lt;/div&gt;
&lt;div class="section" id="hooks"&gt;
&lt;h3&gt;Hooks&lt;/h3&gt;
&lt;p&gt;In terms of hooks, &lt;tt class="docutils literal"&gt;mdBook&lt;/tt&gt; takes a very coarse-grained approach. The
preprocessor gets the &lt;em&gt;entire book&lt;/em&gt; in a single JSON object (along with a
context object that contains metadata), and is expected to emit the entire
modified book in a single JSON object.
It's up to the preprocessor to figure out which parts of the book to read and
which parts to modify.&lt;/p&gt;
&lt;p&gt;Since books and other documentation are typically of limited size, this
is a reasonable design choice: even tens of MiB of JSON-encoded data are
quick to pass between processes via stdin/stdout and to marshal/unmarshal. We
wouldn't be able to implement Wikipedia using this design, though.&lt;/p&gt;
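&lt;p&gt;Because the hook is the whole book, the first job of almost any preprocessor is to walk it and find the chapters it cares about. Here is a sketch of that walk, applying a text transformation to every chapter. The JSON shape is an assumption based on mdBook's serialized book types: top-level items under &lt;tt class="docutils literal"&gt;items&lt;/tt&gt; (older versions used &lt;tt class="docutils literal"&gt;sections&lt;/tt&gt;), each chapter wrapped in a single-key object keyed by &lt;tt class="docutils literal"&gt;Chapter&lt;/tt&gt;, with its text in &lt;tt class="docutils literal"&gt;content&lt;/tt&gt; and nested chapters in &lt;tt class="docutils literal"&gt;sub_items&lt;/tt&gt;. &lt;tt class="docutils literal"&gt;map_chapters&lt;/tt&gt; is a hypothetical name.&lt;/p&gt;

```python
def map_chapters(book, rewrite):
    """Apply rewrite(text) -> text to every chapter's content, recursively."""
    # Assumption: newer mdBook versions put top-level items under "items",
    # older ones under "sections"; use whichever key is present.
    key = "items" if "items" in book else "sections"

    def visit(items):
        for item in items:
            # Assumption: a chapter is {"Chapter": {...}}; separators and
            # part titles have a different shape and are left untouched.
            if isinstance(item, dict) and "Chapter" in item:
                chapter = item["Chapter"]
                chapter["content"] = rewrite(chapter["content"])
                visit(chapter.get("sub_items", []))

    visit(book.get(key, []))
    return book


if __name__ == "__main__":
    demo = {"items": [{"Chapter": {"content": "I wrote this.", "sub_items": []}}]}
    print(map_chapters(demo, str.upper))
```

&lt;p&gt;The book is modified in place and also returned, so the result can be serialized straight back to stdout.&lt;/p&gt;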
&lt;/div&gt;
&lt;div class="section" id="exposing-an-application-api-to-plugins"&gt;
&lt;h3&gt;Exposing an application API to plugins&lt;/h3&gt;
&lt;p&gt;This is tricky, given that the preprocessor mechanism is language-agnostic.
Here, &lt;tt class="docutils literal"&gt;mdBook&lt;/tt&gt; only offers additional utilities to preprocessors implemented
in Rust. These get access to &lt;tt class="docutils literal"&gt;mdBook&lt;/tt&gt;'s API to unmarshal the JSON
representing the context metadata and book's contents. &lt;tt class="docutils literal"&gt;mdBook&lt;/tt&gt; offers the
&lt;a class="reference external" href="https://docs.rs/mdbook-preprocessor/latest/mdbook_preprocessor/trait.Preprocessor.html"&gt;Preprocessor trait&lt;/a&gt;
that Rust preprocessors can implement, which makes it easier to wrangle the book's
contents. See &lt;a class="reference external" href="https://github.com/eliben/code-for-blog/tree/main/2025/plugin-mdbook/preprocessor-rust-narcissist"&gt;my Rust version of the narcissist preprocessor&lt;/a&gt;
for a basic example of this.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="renderers-backends"&gt;
&lt;h2&gt;Renderers / backends&lt;/h2&gt;
&lt;p&gt;Actually, &lt;tt class="docutils literal"&gt;mdBook&lt;/tt&gt; has &lt;em&gt;another&lt;/em&gt; plugin mechanism, but it's very similar
conceptually to preprocessors. A &lt;em&gt;renderer&lt;/em&gt; (also called a &lt;em&gt;backend&lt;/em&gt; in some
of &lt;tt class="docutils literal"&gt;mdBook&lt;/tt&gt;'s own doc pages) takes the same input as a preprocessor, but is
free to do whatever it wants with it. The default renderer emits the HTML
for the book; &lt;a class="reference external" href="https://github.com/rust-lang/mdBook/wiki/Third-party-plugins#backends"&gt;other renderers&lt;/a&gt;
can do other things.&lt;/p&gt;
&lt;p&gt;The idea is that the book can go through multiple preprocessors, but ends up
in a &lt;em&gt;single&lt;/em&gt; renderer.&lt;/p&gt;
&lt;p&gt;The data a renderer receives is essentially the same as what a preprocessor
receives - the JSON-encoded book contents plus some context metadata. Due to this
similarity, there's no real point in getting deeper into renderers in this post.&lt;/p&gt;
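&lt;p&gt;As an illustration, a toy renderer that reports the book's word count could look like the sketch below. The names are hypothetical; it assumes top-level items under &lt;tt class="docutils literal"&gt;items&lt;/tt&gt; (or &lt;tt class="docutils literal"&gt;sections&lt;/tt&gt; in older versions) and chapters keyed by &lt;tt class="docutils literal"&gt;Chapter&lt;/tt&gt; with &lt;tt class="docutils literal"&gt;content&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;sub_items&lt;/tt&gt;, and further that the JSON object on stdin carries the book under &lt;tt class="docutils literal"&gt;book&lt;/tt&gt; and the output directory under &lt;tt class="docutils literal"&gt;destination&lt;/tt&gt;. Check mdBook's backend documentation before relying on those field names.&lt;/p&gt;

```python
import json
import pathlib
import sys


def count_words(book):
    """Count whitespace-separated words across all chapters, including nested ones."""
    # Assumption: top-level items live under "items" (older: "sections").
    pending = list(book.get("items", book.get("sections", [])))
    total = 0
    while pending:
        item = pending.pop()
        # Separators and part titles are not dicts with a "Chapter" key.
        if isinstance(item, dict) and "Chapter" in item:
            chapter = item["Chapter"]
            total += len(chapter["content"].split())
            pending.extend(chapter.get("sub_items", []))
    return total


if __name__ == "__main__":
    # Assumption: the render context has "book" and "destination" keys.
    ctx = json.load(sys.stdin)
    out_dir = pathlib.Path(ctx["destination"])
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "wordcount.txt").write_text(f"{count_words(ctx['book'])}\n")
```

&lt;p&gt;Unlike a preprocessor, nothing is written back to stdout: the renderer's output is whatever it chooses to produce.&lt;/p&gt;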
&lt;/div&gt;
</content><category term="misc"></category><category term="Plugins"></category><category term="Python"></category><category term="Rust"></category></entry><entry><title>Revisiting "Let's Build a Compiler"</title><link href="https://eli.thegreenplace.net/2025/revisiting-lets-build-a-compiler/" rel="alternate"></link><published>2025-12-09T20:40:00-08:00</published><updated>2026-01-17T22:40:40-08:00</updated><author><name>Eli Bendersky</name></author><id>tag:eli.thegreenplace.net,2025-12-09:/2025/revisiting-lets-build-a-compiler/</id><summary type="html">&lt;p&gt;There's an old compiler-building tutorial that has become part of the field's
lore: the &lt;a class="reference external" href="https://compilers.iecc.com/crenshaw/"&gt;Let's Build a Compiler&lt;/a&gt;
series by Jack Crenshaw (published between 1988 and 1995).&lt;/p&gt;
&lt;p&gt;I &lt;a class="reference external" href="https://eli.thegreenplace.net/2003/07/29/great-compilers-tutorial"&gt;ran into it in 2003&lt;/a&gt;
and was very impressed, but it's now 2025 and this tutorial is still being mentioned quite
often …&lt;/p&gt;</summary><content type="html">&lt;p&gt;There's an old compiler-building tutorial that has become part of the field's
lore: the &lt;a class="reference external" href="https://compilers.iecc.com/crenshaw/"&gt;Let's Build a Compiler&lt;/a&gt;
series by Jack Crenshaw (published between 1988 and 1995).&lt;/p&gt;
&lt;p&gt;I &lt;a class="reference external" href="https://eli.thegreenplace.net/2003/07/29/great-compilers-tutorial"&gt;ran into it in 2003&lt;/a&gt;
and was very impressed, but it's now 2025 and this tutorial is still being mentioned quite
often &lt;a class="reference external" href="https://hn.algolia.com/?dateRange=pastYear&amp;amp;page=0&amp;amp;prefix=true&amp;amp;query=crenshaw&amp;amp;sort=byDate&amp;amp;type=all"&gt;in Hacker News threads&lt;/a&gt;.
Why is that? Why does a tutorial from 35
years ago, built in Pascal and emitting Motorola 68000 assembly - technologies that
are virtually unknown to the new generation of programmers - hold sway over
compiler enthusiasts? I decided to find out.&lt;/p&gt;
&lt;p&gt;The tutorial is &lt;a class="reference external" href="https://compilers.iecc.com/crenshaw/"&gt;easily available and readable online&lt;/a&gt;, but
just re-reading it seemed insufficient. So I decided to meticulously
translate the compilers built in it to Python, emitting a more modern target -
WebAssembly. It was an enjoyable process and I want to share the outcome and
some insights gained along the way.&lt;/p&gt;
&lt;p&gt;The result is &lt;a class="reference external" href="https://github.com/eliben/letsbuildacompiler"&gt;this code repository&lt;/a&gt;.
Of particular interest is the &lt;a class="reference external" href="https://github.com/eliben/letsbuildacompiler/blob/main/TUTORIAL.md"&gt;TUTORIAL.md file&lt;/a&gt;,
which describes how each part in the original tutorial is mapped to my code. So
if you want to read the original tutorial while playing with code you can
easily run on your own, feel free to follow my path.&lt;/p&gt;
&lt;div class="section" id="a-sample"&gt;
&lt;h2&gt;A sample&lt;/h2&gt;
&lt;p&gt;To get a taste of the input language being compiled and the output my compiler
generates, here's a sample program in the KISS language designed by Jack
Crenshaw:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;var X=0

 { sum from 0 to n-1 inclusive, and add to result }
 procedure addseq(n, ref result)
     var i, sum  { 0 initialized }
     while i &amp;lt; n
         sum = sum + i
         i = i + 1
     end
     result = result + sum
 end

 program testprog
 begin
     addseq(11, X)
 end
 .
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;It's from part 13 of the tutorial, so it showcases procedures along with control
constructs like the &lt;tt class="docutils literal"&gt;while&lt;/tt&gt; loop, and passing parameters both by value and by
reference. Here's the WASM text generated by my compiler for part 13:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;module&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;memory&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="c1"&gt;;; Linear stack pointer. Used to pass parameters by ref.&lt;/span&gt;
  &lt;span class="c1"&gt;;; Grows downwards (towards lower addresses).&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="nv"&gt;$__sp&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="kt"&gt;i32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;i32.const&lt;/span&gt; &lt;span class="mf"&gt;65536&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="nv"&gt;$X&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="kt"&gt;i32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;i32.const&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="nv"&gt;$ADDSEQ&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;param&lt;/span&gt; &lt;span class="nv"&gt;$N&lt;/span&gt; &lt;span class="kt"&gt;i32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;param&lt;/span&gt; &lt;span class="nv"&gt;$RESULT&lt;/span&gt; &lt;span class="kt"&gt;i32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;local&lt;/span&gt; &lt;span class="nv"&gt;$I&lt;/span&gt; &lt;span class="kt"&gt;i32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;local&lt;/span&gt; &lt;span class="nv"&gt;$SUM&lt;/span&gt; &lt;span class="kt"&gt;i32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;loop&lt;/span&gt; &lt;span class="nv"&gt;$loop1&lt;/span&gt;
      &lt;span class="k"&gt;block&lt;/span&gt; &lt;span class="nv"&gt;$breakloop1&lt;/span&gt;
        &lt;span class="nb"&gt;local.get&lt;/span&gt; &lt;span class="nv"&gt;$I&lt;/span&gt;
        &lt;span class="nb"&gt;local.get&lt;/span&gt; &lt;span class="nv"&gt;$N&lt;/span&gt;
        &lt;span class="nb"&gt;i32.lt_s&lt;/span&gt;
        &lt;span class="nb"&gt;i32.eqz&lt;/span&gt;
        &lt;span class="nb"&gt;br_if&lt;/span&gt; &lt;span class="nv"&gt;$breakloop1&lt;/span&gt;
        &lt;span class="nb"&gt;local.get&lt;/span&gt; &lt;span class="nv"&gt;$SUM&lt;/span&gt;
        &lt;span class="nb"&gt;local.get&lt;/span&gt; &lt;span class="nv"&gt;$I&lt;/span&gt;
        &lt;span class="nb"&gt;i32.add&lt;/span&gt;
        &lt;span class="nb"&gt;local.set&lt;/span&gt; &lt;span class="nv"&gt;$SUM&lt;/span&gt;
        &lt;span class="nb"&gt;local.get&lt;/span&gt; &lt;span class="nv"&gt;$I&lt;/span&gt;
        &lt;span class="nb"&gt;i32.const&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="nb"&gt;i32.add&lt;/span&gt;
        &lt;span class="nb"&gt;local.set&lt;/span&gt; &lt;span class="nv"&gt;$I&lt;/span&gt;
        &lt;span class="nb"&gt;br&lt;/span&gt; &lt;span class="nv"&gt;$loop1&lt;/span&gt;
      &lt;span class="k"&gt;end&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
    &lt;span class="nb"&gt;local.get&lt;/span&gt; &lt;span class="nv"&gt;$RESULT&lt;/span&gt;
    &lt;span class="nb"&gt;local.get&lt;/span&gt; &lt;span class="nv"&gt;$RESULT&lt;/span&gt;
    &lt;span class="nb"&gt;i32.load&lt;/span&gt;
    &lt;span class="nb"&gt;local.get&lt;/span&gt; &lt;span class="nv"&gt;$SUM&lt;/span&gt;
    &lt;span class="nb"&gt;i32.add&lt;/span&gt;
    &lt;span class="nb"&gt;i32.store&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="nv"&gt;$main&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;main&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;result&lt;/span&gt; &lt;span class="kt"&gt;i32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nb"&gt;i32.const&lt;/span&gt; &lt;span class="mi"&gt;11&lt;/span&gt;
    &lt;span class="nb"&gt;global.get&lt;/span&gt; &lt;span class="nv"&gt;$__sp&lt;/span&gt;      &lt;span class="c1"&gt;;; make space on stack&lt;/span&gt;
    &lt;span class="nb"&gt;i32.const&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
    &lt;span class="nb"&gt;i32.sub&lt;/span&gt;
    &lt;span class="nb"&gt;global.set&lt;/span&gt; &lt;span class="nv"&gt;$__sp&lt;/span&gt;
    &lt;span class="nb"&gt;global.get&lt;/span&gt; &lt;span class="nv"&gt;$__sp&lt;/span&gt;
    &lt;span class="nb"&gt;global.get&lt;/span&gt; &lt;span class="nv"&gt;$X&lt;/span&gt;
    &lt;span class="nb"&gt;i32.store&lt;/span&gt;
    &lt;span class="nb"&gt;global.get&lt;/span&gt; &lt;span class="nv"&gt;$__sp&lt;/span&gt;    &lt;span class="c1"&gt;;; push address as parameter&lt;/span&gt;
    &lt;span class="nb"&gt;call&lt;/span&gt; &lt;span class="nv"&gt;$ADDSEQ&lt;/span&gt;
    &lt;span class="c1"&gt;;; restore parameter X by ref&lt;/span&gt;
    &lt;span class="nb"&gt;global.get&lt;/span&gt; &lt;span class="nv"&gt;$__sp&lt;/span&gt;
    &lt;span class="nb"&gt;i32.load&lt;/span&gt; &lt;span class="k"&gt;offset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nb"&gt;global.set&lt;/span&gt; &lt;span class="nv"&gt;$X&lt;/span&gt;
    &lt;span class="c1"&gt;;; clean up stack for ref parameters&lt;/span&gt;
    &lt;span class="nb"&gt;global.get&lt;/span&gt; &lt;span class="nv"&gt;$__sp&lt;/span&gt;
    &lt;span class="nb"&gt;i32.const&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
    &lt;span class="nb"&gt;i32.add&lt;/span&gt;
    &lt;span class="nb"&gt;global.set&lt;/span&gt; &lt;span class="nv"&gt;$__sp&lt;/span&gt;
    &lt;span class="nb"&gt;global.get&lt;/span&gt; &lt;span class="nv"&gt;$X&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You'll notice that there is some trickiness in the emitted code w.r.t. handling
the by-reference parameter (my &lt;a class="reference external" href="https://eli.thegreenplace.net/2025/notes-on-the-wasm-basic-c-abi/"&gt;previous post&lt;/a&gt;
deals with this issue in more detail). In general, though, the emitted code is
inefficient - there is close to 0 optimization applied.&lt;/p&gt;
&lt;p&gt;Also, if you're very diligent you'll notice something odd about the global
variable &lt;tt class="docutils literal"&gt;X&lt;/tt&gt; - it seems to be implicitly returned by the generated &lt;tt class="docutils literal"&gt;main&lt;/tt&gt;
function. This is just a testing facility that makes the compilers easy to test.
All the compilers are extensively tested - usually by running the
generated WASM code &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[1]&lt;/a&gt; and verifying expected results.&lt;/p&gt;
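&lt;p&gt;To see what the by-reference dance in &lt;tt class="docutils literal"&gt;main&lt;/tt&gt; accomplishes, and what a test should expect back, here is a plain-Python model of the generated module. It is a hand-written sketch, not part of my compiler: a &lt;tt class="docutils literal"&gt;bytearray&lt;/tt&gt; stands in for linear memory, &lt;tt class="docutils literal"&gt;i32_load&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;i32_store&lt;/tt&gt; are hypothetical helpers mimicking the little-endian WASM instructions (without i32 wraparound), and the KISS variable &lt;tt class="docutils literal"&gt;sum&lt;/tt&gt; is renamed &lt;tt class="docutils literal"&gt;total&lt;/tt&gt; to avoid shadowing a Python builtin.&lt;/p&gt;

```python
# A bytearray stands in for WASM linear memory: (memory 8) is 8 pages of 64 KiB.
MEMORY = bytearray(8 * 65536)


def i32_load(addr):
    # Little-endian, like WASM's i32.load.
    return int.from_bytes(MEMORY[addr:addr + 4], "little", signed=True)


def i32_store(addr, value):
    # No i32 wraparound here: out-of-range values raise OverflowError.
    MEMORY[addr:addr + 4] = value.to_bytes(4, "little", signed=True)


def addseq(n, result_addr):
    """n is passed by value; result is passed by reference (an address)."""
    i = total = 0
    while n > i:
        total += i
        i += 1
    i32_store(result_addr, i32_load(result_addr) + total)


def main():
    sp = 65536            # models $__sp: the linear stack grows downwards
    x = 0                 # models global $X
    sp -= 4               # make space on the stack
    i32_store(sp, x)      # copy X into linear memory so it has an address
    addseq(11, sp)        # pass that address as the by-ref parameter
    x = i32_load(sp)      # copy the possibly updated value back into X
    sp += 4               # clean up the stack slot
    return x              # testing hook: main returns X


if __name__ == "__main__":
    print(main())
```

&lt;p&gt;Calling &lt;tt class="docutils literal"&gt;main()&lt;/tt&gt; returns 55, the sum of 0 through 10, which is what &lt;tt class="docutils literal"&gt;X&lt;/tt&gt; holds after &lt;tt class="docutils literal"&gt;addseq(11, X)&lt;/tt&gt;. The copy-in/copy-out through memory is needed because WASM globals have no address that could be passed to a function.&lt;/p&gt;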
&lt;/div&gt;
&lt;div class="section" id="insights-what-makes-this-tutorial-so-special"&gt;
&lt;h2&gt;Insights - what makes this tutorial so special?&lt;/h2&gt;
&lt;p&gt;While reading the original tutorial again, I had an opportunity to reflect on
what makes it so effective. Other than the very fluent and conversational
writing style of Jack Crenshaw, I think it's a combination of two key
factors:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;The tutorial builds a recursive-descent parser step by step, rather than
giving a long preface on automata and table-based parser generators. When
I first encountered it (in 2003), it was taken for granted that if you want
to write a parser then lex + yacc are the way to go &lt;a class="footnote-reference" href="#footnote-2" id="footnote-reference-2"&gt;[2]&lt;/a&gt;. Following the
development of a simple and clean hand-written
parser was a revelation that wholly changed my approach to the subject;
subsequently, hand-written recursive-descent parsers have been my go-to approach
&lt;a class="reference external" href="https://eli.thegreenplace.net/tag/recursive-descent-parsing"&gt;for almost 20 years now&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Rather than getting stuck in front-end minutiae, the tutorial goes straight
to generating working assembly code, from very early on. This was also a
breath of fresh air for engineers who grew up with more traditional courses
where you spend 90% of the time on parsing, type checking and other semantic
analysis and often run entirely out of steam by the time code generation
is taught.&lt;/li&gt;
&lt;/ol&gt;
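&lt;p&gt;Both points fit in a handful of lines. Below is a sketch in the spirit of the early parts of the tutorial (not a transcription of Crenshaw's Pascal, nor of my repository's code): one Python method per grammar rule, single-digit numbers as in the tutorial's first chapters, and WASM text emitted the moment each construct is recognized, with no tree in between. &lt;tt class="docutils literal"&gt;Compiler&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;compile_expr&lt;/tt&gt; are hypothetical names.&lt;/p&gt;

```python
class Compiler:
    """Single-pass compiler for + - * / and parentheses over single digits."""

    def __init__(self, src):
        self.src = src.replace(" ", "")
        self.pos = 0
        self.out = []  # WASM text instructions, emitted while parsing

    def look(self):
        # Lookahead character, or "" at end of input.
        return self.src[self.pos:self.pos + 1]

    def match(self, ch):
        if self.look() != ch:
            raise SyntaxError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def factor(self):
        # factor ::= digit | "(" expression ")"
        ch = self.look()
        if ch == "(":
            self.match("(")
            self.expression()
            self.match(")")
        elif ch.isdigit():
            self.match(ch)
            self.out.append(f"i32.const {ch}")
        else:
            raise SyntaxError(f"unexpected {ch!r} at position {self.pos}")

    def term(self):
        # term ::= factor (("*" | "/") factor)*
        self.factor()
        while self.look() in ("*", "/"):
            op = self.look()
            self.match(op)
            self.factor()
            self.out.append("i32.mul" if op == "*" else "i32.div_s")

    def expression(self):
        # expression ::= term (("+" | "-") term)*
        self.term()
        while self.look() in ("+", "-"):
            op = self.look()
            self.match(op)
            self.term()
            self.out.append("i32.add" if op == "+" else "i32.sub")


def compile_expr(src):
    compiler = Compiler(src)
    compiler.expression()
    if compiler.look():
        raise SyntaxError(f"unexpected {compiler.look()!r} at position {compiler.pos}")
    return compiler.out
```

&lt;p&gt;For example, &lt;tt class="docutils literal"&gt;compile_expr('2*(3+4)')&lt;/tt&gt; yields &lt;tt class="docutils literal"&gt;i32.const 2&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;i32.const 3&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;i32.const 4&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;i32.add&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;i32.mul&lt;/tt&gt;. The postfix order a stack machine wants falls out of the recursion for free, which is a big part of why WASM is a pleasant target for this style.&lt;/p&gt;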
&lt;p&gt;To be honest, I don't think either of these is a big problem with modern
resources, but back in the day the tutorial clearly hit the right nerve with
many people.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="what-else-does-it-teach-us"&gt;
&lt;h2&gt;What else does it teach us?&lt;/h2&gt;
&lt;p&gt;Jack Crenshaw's tutorial takes the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Syntax-directed_translation"&gt;syntax-directed translation&lt;/a&gt;
approach, where code is emitted &lt;em&gt;while parsing&lt;/em&gt;, without having to divide the
compiler into explicit phases with IRs. As I said above, this is a fantastic
approach for getting started, but in the latter parts of the tutorial it starts
showing its limitations. Especially once we get to types, it becomes painfully
obvious that it would be very nice if we knew the types of expressions &lt;em&gt;before&lt;/em&gt;
we generate code for them.&lt;/p&gt;
&lt;p&gt;I don't know if this contributed to Jack Crenshaw abandoning the tutorial
at some point after part 14, but it may very well have. He repeatedly notes that
the emitted code is clearly sub-optimal &lt;a class="footnote-reference" href="#footnote-3" id="footnote-reference-3"&gt;[3]&lt;/a&gt; and can be improved, but IMHO it's
just not that easy to improve using the syntax-directed translation strategy.
With the benefit of hindsight, I would probably use Part 14 (types) as a turning
point - emitting some kind of AST from the parser and then doing simple type
checking and analysis on that AST prior to generating code from it.&lt;/p&gt;
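&lt;p&gt;To make the idea concrete with WASM as the target, suppose &lt;tt class="docutils literal"&gt;+&lt;/tt&gt; may mix &lt;tt class="docutils literal"&gt;i32&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;i64&lt;/tt&gt; operands. A single-pass compiler has already emitted the left operand by the time it learns the right one is an &lt;tt class="docutils literal"&gt;i64&lt;/tt&gt;, and at that point the left value is no longer on top of the stack, so widening it takes extra shuffling (e.g. through a temporary local). With an AST, a type pass can run first, and each operand can be widened with &lt;tt class="docutils literal"&gt;i64.extend_i32_s&lt;/tt&gt; right after it is pushed. The sketch below uses hypothetical &lt;tt class="docutils literal"&gt;Num&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;Add&lt;/tt&gt; nodes and is not how the tutorial or my repository handles types.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: int
    ty: str  # "i32" or "i64"


@dataclass
class Add:
    left: object
    right: object


def type_of(node):
    """Type analysis: the sum is i64 if either operand is, else i32."""
    if isinstance(node, Num):
        return node.ty
    types = (type_of(node.left), type_of(node.right))
    return "i64" if "i64" in types else "i32"


def gen(node, out):
    """Emit WASM text for node into out, with all types known up front."""
    if isinstance(node, Num):
        out.append(f"{node.ty}.const {node.value}")
        return
    # type_of is recomputed per node for brevity; a real compiler would
    # annotate the tree once in a separate pass.
    result = type_of(node)
    for operand in (node.left, node.right):
        gen(operand, out)
        # The operand is now on top of the stack, so it can be widened in place.
        if type_of(operand) != result:
            out.append("i64.extend_i32_s")
    out.append(f"{result}.add")
```

&lt;p&gt;For instance, generating code for &lt;tt class="docutils literal"&gt;Add(Num(1, 'i32'), Num(2, 'i64'))&lt;/tt&gt; produces &lt;tt class="docutils literal"&gt;i32.const 1&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;i64.extend_i32_s&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;i64.const 2&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;i64.add&lt;/tt&gt;: no temporaries, because the decision was made before any code was emitted.&lt;/p&gt;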
&lt;/div&gt;
&lt;div class="section" id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;All in all, the original tutorial remains a wonderfully readable introduction
to building compilers. This post and the &lt;a class="reference external" href="https://github.com/eliben/letsbuildacompiler"&gt;GitHub repository&lt;/a&gt;
it describes are a modest
contribution that aims to improve the experience of folks reading the original
tutorial today who would rather not use obsolete technologies. As always, let
me know if you run into any issues or have questions!&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;This is done using the &lt;a class="reference external" href="https://pypi.org/project/wasmtime/"&gt;Python bindings to wasmtime&lt;/a&gt;.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-2" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-2"&gt;[2]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;By the way, gcc switched from YACC to hand-written recursive-descent
parsing in the 2004-2006 timeframe, and Clang has been implemented with
a recursive-descent parser from the start (2007).&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-3" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-3"&gt;[3]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;&lt;p class="first"&gt;Concretely: when we compile &lt;tt class="docutils literal"&gt;subexpr1 + subexpr2&lt;/tt&gt; and the two sides have different
types, it would be mighty nice to know that &lt;em&gt;before&lt;/em&gt; we actually generate
the code for both sub-expressions. But the syntax-directed translation
approach just doesn't work that way.&lt;/p&gt;
&lt;p class="last"&gt;To be clear: it's easy to generate &lt;em&gt;working&lt;/em&gt; code; it's just not easy
to generate optimal code without some sort of type analysis that's
done before code is actually generated.&lt;/p&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
</content><category term="misc"></category><category term="Compilation"></category><category term="WebAssembly"></category><category term="Python"></category><category term="Recursive descent parsing"></category></entry></feed>