Eli Bendersky's website - Python

Plugins case study: Pluggy

2026-06-13T20:21:00-07:00

Recently I came upon Pluggy, a Python library for developing plugin systems. It was originally developed as part of the pytest project - known for its rich plugin ecosystem - and later extracted into a standalone library. You're supposed to reach for Pluggy if you want to add a plugin system to your tool or library and want to use something proven rather than rolling your own.

In this post I will share some notes on how Pluggy works, and will then review how it aligns with the fundamental concepts of plugin infrastructures.

Using Pluggy

Pluggy is built around the concept of hooks: functions that host applications or tools (from here on, just "hosts") expose and plugins implement. A host exposes hooks by using a decorator returned from pluggy.HookspecMarker and a plugin implements this hook using a decorator returned from pluggy.HookimplMarker.

Pluggy's documentation explains this fairly well; in this post, I'll show how to implement the htmlize tool (introduced in the original article of my plugin series) along with some plugins.

As a reminder, htmlize is a toy tool that takes markup notation similar to reStructuredText, and converts it to HTML. It supports plugins to handle custom "roles" like:

some text :role:`customized text` and more text

As well as plugins that do arbitrary processing on the entire text.
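To make the role syntax concrete, here's a small stdlib-only sketch of how a host could find roles in a line of text and hand their contents to a per-role handler. The `ROLE_RE` pattern, the `apply_roles` helper and the `tt` handler are hypothetical names for illustration, not htmlize's actual code.

```python
import re
from typing import Callable, Optional

# Matches :name:`contents`, capturing the role name and its contents.
ROLE_RE = re.compile(r':(\w+):`([^`]*)`')

def apply_roles(text: str,
                get_handler: Callable[[str], Optional[Callable[[str], str]]]) -> str:
    """Replace each role with its handler's output; leave unknown roles as-is."""
    def replace(m: re.Match) -> str:
        handler = get_handler(m.group(1))
        if handler is None:
            return m.group(0)  # no plugin claims this role
        return handler(m.group(2))
    return ROLE_RE.sub(replace, text)

# Hypothetical handler for a "tt" role that renders its contents in monospace.
handlers = {'tt': lambda s: f'<tt>{s}</tt>'}
print(apply_roles('some text :tt:`customized text` and more text', handlers.get))
# → some text <tt>customized text</tt> and more text
```

Passing a lookup function (here `dict.get`) keeps the host agnostic about where handlers come from, which is the slot a plugin system fills.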

Defining hooks

Our host defines two hooks:

import pluggy

hookspec = pluggy.HookspecMarker("htmlize")

@hookspec(firstresult=True)
def htmlize_role_handler(role_name):
    """Return a function accepting role contents.

    The function will be called with a single argument - the role contents, and
    should return what the role gets replaced with.
    """
    pass

@hookspec
def htmlize_contents(post, db):
    """Return a function accepting full document contents.

    The function will be called with a single argument - the document contents
    (after paragraph splitting and role processing), and should return the
    transformed contents.
    """
    pass

A hook specification is created by decorating a function with the marker returned from HookspecMarker, which is called with the project's name. This project name has to match between the host and its plugins. Pluggy is permissive about what hooks accept as parameters and what they return; for maximal flexibility and to stay true to the original htmlize example, our hooks return functions.

To accompany this HookspecMarker, the host also defines a HookimplMarker with the same name:

hookimpl = pluggy.HookimplMarker("htmlize")

This is used by plugins to attach to hooks when they're loaded.
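Why does the project name have to match? One way to picture it is with a toy, stdlib-only imitation of the marker idea: each marker tags functions with an attribute derived from the project name, and a manager for a given project only looks for its own tag. This is a simplified model for illustration; `ToyImplMarker`, `find_impls` and the `<project>_impl` attribute naming are assumptions, not Pluggy's actual classes.

```python
from typing import Callable

class ToyImplMarker:
    """Toy stand-in for a marker: tags functions with a project-specific attribute."""
    def __init__(self, project_name: str) -> None:
        self.attr = f'{project_name}_impl'

    def __call__(self, func: Callable) -> Callable:
        setattr(func, self.attr, True)
        return func

def find_impls(namespace: dict, project_name: str) -> list:
    """Collect callables in a plugin namespace tagged for this project."""
    attr = f'{project_name}_impl'
    return [obj for obj in namespace.values()
            if callable(obj) and getattr(obj, attr, False)]

hookimpl = ToyImplMarker('htmlize')
other = ToyImplMarker('someothertool')

@hookimpl
def htmlize_contents(post, db):
    return lambda contents: contents

@other
def unrelated_hook():
    pass

plugin_ns = {'htmlize_contents': htmlize_contents, 'unrelated_hook': unrelated_hook}
print([f.__name__ for f in find_impls(plugin_ns, 'htmlize')])  # → ['htmlize_contents']
```

A plugin decorated with a marker for a different project name is simply invisible to the htmlize manager, rather than producing an error.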

Loading plugins in the host

The host's main function loads plugins at startup as follows:

pm = pluggy.PluginManager("htmlize")
pm.add_hookspecs(hookspecs)
pm.load_setuptools_entrypoints("htmlize")

hookspecs is our Python module containing the hooks shown above. load_setuptools_entrypoints is Pluggy's helper for loading plugins that were pip-installed into the same environment and registered as setuptools entry points. Entry points are a way to declare - in one's setup.py or pyproject.toml file - metadata that other code can query at runtime. In our project, the plugins register themselves with this section in the pyproject.toml file:

[project.entry-points.htmlize]
tt = "tt"

This says "in the entry point group htmlize, define a new entry point named tt that refers to the module tt". Pluggy's load_setuptools_entrypoints then uses importlib.metadata to find and load these entries.

Note that Pluggy doesn't require using this mechanism. Hosts can implement any plugin discovery method they want, and add plugins directly to their PluginManager with the register method. But this is the mechanism used by pytest and many other projects; it makes it very easy to automatically discover and register plugins that are installed with pip and equivalent tools.

Invoking plugins

Once PluginManager loads the plugins, invoking them is straightforward; here's how htmlize invokes the contents hooks [1]:

# Build full contents back again, and ask plugins to act on
# contents.
contents = ''.join(parts)
for handler in plugin_manager.hook.htmlize_contents(post=post, db=db):
    contents = handler(contents)
return contents

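Generally, invoking a hook returns a list with the results of all its implementations, which may come from different plugins (a single host application can have multiple plugins installed that implement the same hook). For hooks declared with firstresult=True, like htmlize_role_handler, the call instead returns the first non-None result. When the host invokes the hook as shown above, implementations are called in LIFO registration order by default, but plugins can affect this with hook options like tryfirst and trylast.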
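To illustrate these calling conventions, here's a toy, stdlib-only model: implementations run in LIFO registration order, `tryfirst`/`trylast` move an implementation to the front or back, `None` results are dropped, and a `firstresult` call stops at the first non-`None` value. This is a simplified sketch of the semantics as I understand them, not Pluggy's code; `ToyHook` and its methods are hypothetical.

```python
from typing import Any, Callable, List, Tuple

class ToyHook:
    """Toy model of a hook caller with LIFO ordering plus tryfirst/trylast."""
    def __init__(self, firstresult: bool = False) -> None:
        self.firstresult = firstresult
        self._impls: List[Tuple[Callable[..., Any], str]] = []

    def register(self, func: Callable[..., Any], order: str = 'normal') -> None:
        # order is one of 'tryfirst', 'normal', 'trylast'
        self._impls.append((func, order))

    def _call_order(self) -> List[Callable[..., Any]]:
        lifo = list(reversed(self._impls))  # most recently registered first
        rank = {'tryfirst': 0, 'normal': 1, 'trylast': 2}
        # sorted() is stable, so LIFO order is preserved within each rank.
        return [f for f, order in sorted(lifo, key=lambda p: rank[p[1]])]

    def __call__(self, **kwargs: Any) -> Any:
        results = []
        for func in self._call_order():
            res = func(**kwargs)
            if res is None:
                continue  # None results are not collected
            if self.firstresult:
                return res
            results.append(res)
        return None if self.firstresult else results

hook = ToyHook()
hook.register(lambda x: f'a:{x}')
hook.register(lambda x: f'b:{x}')
hook.register(lambda x: f'c:{x}', order='trylast')
hook.register(lambda x: f'd:{x}', order='tryfirst')
print(hook(x=1))  # → ['d:1', 'b:1', 'a:1', 'c:1']
```

Keyword-only calls mirror how Pluggy hooks are invoked (`post=post, db=db` above), which is what lets implementations be matched to arguments by name.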

Implementing hooks in plugins

Here's our entire narcissist plugin that's attaching to the contents hook:

import re

import htmlize

@htmlize.hookimpl
def htmlize_contents(post, db):
    repl = f'<b>I ({post.author})</b>'

    def hook(contents):
        return re.sub(r'\bI\b', repl, contents)

    return hook

Some notes:

It expects htmlize to be installed; as discussed previously, we rely on Pluggy's default install-based approach where both the host and plugins are installed into the same Python environment and can thus find each other. However, Pluggy supports any custom discovery method.
It uses the hookimpl exported value shown earlier.
It returns a function that acts on contents; this is the htmlize-specific contract (ABI, if you will) we've discussed before.
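To see that contract in action, here's the narcissist hook exercised outside of Pluggy. `Post` is a hypothetical stand-in for the host's post object (the real one comes from htmlize), and the `@htmlize.hookimpl` decorator is omitted so the snippet runs on its own.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for the host's post object; only `author` is used.
Post = namedtuple('Post', ['author'])

def htmlize_contents(post, db):
    repl = f'<b>I ({post.author})</b>'

    def hook(contents):
        # \b ensures only the standalone word "I" is replaced, not e.g. "It".
        return re.sub(r'\bI\b', repl, contents)

    return hook

handler = htmlize_contents(Post(author='Eli'), db=None)
print(handler('I think It is I who wrote this'))
# → <b>I (Eli)</b> think It is <b>I (Eli)</b> who wrote this
```

The outer function runs once per document with host data; the returned closure is what the host's loop applies to the contents.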

Fundamental plugin concepts in this case study

Let's see how this case study of Pluggy measures up against the Fundamental plugin concepts that were covered several times on this blog.

It's important to remember that Pluggy is not a specific host application with a bespoke plugin system; rather, it's a reusable library for creating such plugin systems. Therefore, this is more of a meta case study.

Discovery

Generally, Pluggy leaves discovery logic to the user's discretion. Its PluginManager has a register method for adding plugins, and these can be discovered in any way the application chooses.

That said, Pluggy comes with one discovery mechanism built in - through the entry points process of Python packaging, as shown above. This is hugely convenient for a large number of applications, as long as both the application and its plugins are installed via standard Python packaging tools (which is a very reasonable assumption in the Python ecosystem).

Registration

In the entry point process, plugins register themselves by adding a [project.entry-points.<HOST-ID>] section in their pyproject.toml file.

Otherwise - as in the previous section - users are free to devise their own registration schemes.

Hooks

This one is easy, since it's called hooks in Pluggy parlance as well! Pluggy's implementation of hooks is rather elegant, with function decorators that plugins apply to their implementations. We've seen an example of this above with @htmlize.hookimpl decorating htmlize_contents.

Exposing an application API to plugins

Since Pluggy is designed for Python hosts and Python plugins, this one is fairly straightforward. The plugins typically assume the host project is already installed in the Python environment and its modules can be imported.

In our example, hookimpl is imported from htmlize by the plugin to accomplish this. It also shows how host data is passed to the plugin - the post and db parameters. These are APIs exposed by the host for the plugins' use.

Conclusion - is Pluggy worth it?

In footnote 2 of my original fundamental concepts of plugin infrastructures post, I wrote [2]:

This is probably why there are very few well-established plugin frameworks in existence (even in low-level languages like C or C++). It's too easy (and tempting) to roll your own.

I still believe my statement is true - plugin frameworks are very easy to create, and the functionality they provide is relatively small compared to their large surface area. In other words, this is a shallow API.

That said, Pluggy does provide some nice functionality for the more advanced uses of plugins:

Automatic entry point registration mechanism - if you need it
Signature validation
Consistent plugin result collection across multiple hook attachments in a single plugin and across many plugins
Plugin ordering with firstresult, tryfirst, trylast, etc.
Hook "wrappers" for some special use cases
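Of these, signature validation is easy to picture: as I understand Pluggy's rule, an implementation may accept a subset of the arguments named in the spec, but not an argument the spec doesn't provide. Here's a toy stdlib-only check in that spirit using `inspect.signature`; `validate_impl` is hypothetical, and Pluggy's real checks and error types differ.

```python
import inspect
from typing import Callable

def validate_impl(spec: Callable, impl: Callable) -> None:
    """Raise TypeError if impl asks for arguments the spec doesn't provide."""
    spec_args = set(inspect.signature(spec).parameters)
    impl_args = set(inspect.signature(impl).parameters)
    extra = impl_args - spec_args
    if extra:
        raise TypeError(f'{impl.__name__} requests unknown arguments: {sorted(extra)}')

def htmlize_contents(post, db):  # the spec
    ...

def ok_impl(post):  # uses only a subset of the spec's arguments
    ...

def bad_impl(post, config):  # 'config' is not part of the spec
    ...

validate_impl(htmlize_contents, ok_impl)  # passes silently
try:
    validate_impl(htmlize_contents, bad_impl)
except TypeError as e:
    print(e)  # → bad_impl requests unknown arguments: ['config']
```

Checking at registration time means a mistyped argument name fails when the plugin loads, not deep inside a hook call.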

Are these worthwhile for your project? It really depends on the project, and it's always worth keeping the tradeoff between dependencies and project effort in mind.

Code

The full code repository for this post is available here.

[1]	Here `plugin_manager` is the value previously returned from `pluggy.PluginManager`; in the previous code snippet it's saved into `pm`. The name differs because this code lives in a function that receives the manager as its `plugin_manager` parameter.

[2]	To be fair, that post predates the creation of Pluggy!

Rewriting pycparser with the help of an LLM

2026-02-04T19:35:00-08:00

pycparser is my most widely used open source project (with ~20M daily downloads from PyPI [1]). It's a pure-Python parser for the C programming language, producing ASTs inspired by Python's own. Until very recently, it's been using PLY: Python Lex-Yacc for the core parsing.

In this post, I'll describe how I collaborated with an LLM coding agent (Codex) to help me rewrite pycparser to use a hand-written recursive-descent parser and remove the dependency on PLY. This has been an interesting experience and the post contains lots of information and is therefore quite long; if you're just interested in the final result, check out the latest code of pycparser - the main branch already has the new implementation.

The issues with the existing parser implementation

While pycparser has been working well overall, there were a number of nagging issues that persisted over years.

Parsing strategy: YACC vs. hand-written recursive descent

I began working on pycparser in 2008, and back then using a YACC-based approach for parsing a whole language like C seemed like a no-brainer to me. Isn't this what everyone does when writing a serious parser? Besides, the K&R2 book famously carries the entire grammar of ANSI C (C89) in an appendix - so it seemed like a simple matter of translating that to PLY-yacc syntax.

And indeed, it wasn't too hard, though there definitely were some complications in building the ASTs for declarations (C's gnarliest part).

Shortly after completing pycparser, I got more and more interested in compilation and started learning about the different kinds of parsers more seriously. Over time, I grew convinced that recursive descent is the way to go - producing parsers that are easier to understand and maintain (and are often faster!).

It all ties into the benefits of dependencies in software projects as a function of effort. Using parser generators is a heavy conceptual dependency: it's really nice when you have to churn out many parsers for small languages. But when you have to maintain a single, very complex parser, as part of a large project - the benefits quickly dissipate and you're left with a substantial dependency that you constantly grapple with.

The other issue with dependencies

And then there are the usual problems with dependencies; dependencies get abandoned, and they may also develop security issues. Sometimes, both of these become true.

Many years ago, pycparser forked and started vendoring its own version of PLY. This was part of transitioning pycparser to a dual Python 2/3 code base when PLY was slower to adapt. I believe this was the right decision, since PLY "just worked" and I didn't have to deal with active (and very tedious in the Python ecosystem, where packaging tools are replaced faster than dirty socks) dependency management.

A couple of weeks ago this issue was opened for pycparser. It turns out that some old PLY code triggers security checks used by some Linux distributions; while this code was fixed in a later commit of PLY, PLY itself was apparently abandoned and archived in late 2025. And guess what? That happened in the middle of a large rewrite of the package, so re-vendoring the pre-archiving commit seemed like a risky proposition.

On the issue it was suggested that "hopefully the dependent packages move on to a non-abandoned parser or implement their own"; I originally laughed this idea off, but then it got me thinking... which is what this post is all about.

Growing complexity of parsing a messy language

The original K&R2 grammar for ANSI C had - famously - a single shift-reduce conflict having to do with dangling elses belonging to the most recent if statement. And indeed, other than the famous lexer hack used to deal with C's type name / ID ambiguity, pycparser only had this single shift-reduce conflict.
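For a feel of why recursive descent sidesteps that particular conflict, here's a toy Python parser for a tiny if/else statement language (not C, and not pycparser's code). After parsing the then-branch, `parse_stmt` greedily consumes an `else` if one follows, which binds each `else` to the nearest `if`; that's the same resolution YACC picks by preferring shift. The token format and the AST tuples are assumptions made for the example.

```python
from typing import List, Optional, Tuple, Union

Node = Union[str, Tuple, None]

class ToyParser:
    """Recursive descent over pre-split tokens like ['if', 'c1', 's1', 'else', 's2']."""
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ''

    def advance(self) -> str:
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_stmt(self) -> Node:
        # stmt := 'if' COND stmt ['else' stmt] | SIMPLE
        if self.peek() == 'if':
            self.advance()
            cond = self.advance()
            then = self.parse_stmt()
            else_branch: Optional[Node] = None
            if self.peek() == 'else':  # greedy: attach to the innermost open 'if'
                self.advance()
                else_branch = self.parse_stmt()
            return ('if', cond, then, else_branch)
        return self.advance()

tree = ToyParser('if c1 if c2 s1 else s2'.split()).parse_stmt()
print(tree)  # → ('if', 'c1', ('if', 'c2', 's1', 's2'), None)
```

The disambiguation is a single, local `if` in the code rather than a table-level conflict resolved by a generator.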

But things got more complicated. Over the years, features were added that weren't strictly in the standard but were supported by all the industrial compilers. The more advanced C11 and C23 standards weren't beholden to the promises of conflict-free YACC parsing (since almost no industrial-strength compilers use YACC at this point), so all caution went out of the window.

The latest (PLY-based) release of pycparser has many reduce-reduce conflicts [2]; these are a severe maintenance hazard because it means the parsing rules essentially have to be tie-broken by order of appearance in the code. This is very brittle; pycparser has only managed to maintain its stability and quality through its comprehensive test suite. Over time, it became harder and harder to extend, because YACC parsing rules have all kinds of spooky-action-at-a-distance effects. The straw that broke the camel's back was this PR which again proposed to increase the number of reduce-reduce conflicts [3].

This - again - prompted me to think "what if I just dump YACC and switch to a hand-written recursive descent parser", and here we are.

The mental roadblock

None of the challenges described above are new; I've been pondering them for many years now, and yet biting the bullet and rewriting the parser didn't feel like something I'd like to get into. By my private estimates it'd take at least a week of deep heads-down work to port the gritty 2000 lines of YACC grammar rules to a recursive descent parser [4]. Moreover, it wouldn't be a particularly fun project either - I didn't feel like I'd learn much new and my interests have shifted away from this project. In short, the Potential well was just too deep.

Why would this even work? Tests

I've definitely noticed the improvement in capabilities of LLM coding agents in the past few months, and many reputable people online rave about using them for increasingly larger projects. That said, would an LLM agent really be able to accomplish such a complex project on its own? This isn't just a toy, it's thousands of lines of dense parsing code.

What gave me hope is the concept of conformance suites mentioned by Simon Willison. Agents seem to do well when there's a very clear and rigid goal function - such as a large, high-coverage conformance test suite.

And pycparser has a very extensive one: over 2500 lines of test code parsing various C snippets to ASTs with expected results, grown over a decade and a half of real issues and bugs reported by users.

I figured the LLM can either succeed or fail and throw its hands up in despair, but it's quite unlikely to produce a wrong port that would still pass all the tests. So I set it to run.

The initial port

I fired up Codex in pycparser's repository, and wrote this prompt just to make sure it understands me and can run the tests:

This repository implements a Python parser producing an AST. It has a very
extensive test suite; can you first verify you successfully run all the tests?
(with python3 -m unittest discover, from the root dir)

Codex figured it out (I gave it the exact command, after all!); my next prompt was the real thing [5]:

pycparser uses the PLY library for its lexer and parser, and its own AST
package for the ASTs created by the parser. I want to explore replacing the
Parser component (leaving the Lexer alone for now) with a hand written
recursive descent parser that will pass all the same tests. Can you try that?

Here Codex went to work and churned for over an hour. Having never observed an agent work for nearly this long, I kind of assumed it had gone off the rails and would fail sooner or later. So I was rather surprised and skeptical when it eventually came back with:

Replaced the PLY parse path with a handwritten recursive‑descent parser while
keeping the existing lexer and AST helpers intact; CParser now points to the
new RD implementation, with the old PLY version preserved as PLYCParser in
pycparser/c_parser.py. The RD parser includes a token stream, declarator/type
parsing, expression precedence handling, pragmas, and statement parsing to
match the previous AST output and coordinate behavior.

It took me a while to poke around the code and run it until I was convinced - it had actually done it! It wrote a new recursive descent parser with only ancillary dependencies on PLY, and that parser passed the test suite. After a few more prompts, we've removed the ancillary dependencies and made the structure clearer. I hadn't looked too deeply into code quality at this point, but at least on the functional level - it succeeded. This was very impressive!

A quick note on reviews and branches

A change like the one described above is impossible to code-review as one PR in any meaningful way; so I used a different strategy. Before embarking on this path, I created a new branch and once Codex finished the initial rewrite, I committed this change, knowing that I will review it in detail, piece-by-piece later on.

Even though coding agents have their own notion of history and can "revert" certain changes, I felt much safer relying on Git. In the worst case if all of this goes south, I can nuke the branch and it's as if nothing ever happened. I was determined to only merge this branch onto main once I was fully satisfied with the code. In what follows, I had to git reset several times when I didn't like the direction in which Codex was going. In hindsight, doing this work in a branch was absolutely the right choice.

The long tail of goofs

Once I had sufficiently convinced myself that the new parser was actually working, I used Codex to similarly rewrite the lexer and get rid of the PLY dependency entirely, deleting it from the repository. Then, I started looking more deeply into code quality - reading the code created by Codex and trying to wrap my head around it.
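A hand-written lexer doesn't need much machinery in Python. Here's a minimal sketch in the style of the tokenizer recipe from Python's `re` documentation: one master pattern of named groups, scanned with `finditer`. The token kinds are a made-up handful of C-like ones; this is not pycparser's lexer.

```python
import re
from typing import Iterator, NamedTuple

class Token(NamedTuple):
    kind: str
    value: str

# Order matters: earlier alternatives win when several could match at a position.
TOKEN_SPEC = [
    ('NUMBER', r'\d+'),
    ('ID',     r'[A-Za-z_]\w*'),
    ('OP',     r'==|[-+*/=;(){}]'),
    ('SKIP',   r'\s+'),
    ('ERROR',  r'.'),
]
MASTER_RE = re.compile('|'.join(f'(?P<{name}>{pat})' for name, pat in TOKEN_SPEC))

def tokenize(text: str) -> Iterator[Token]:
    for m in MASTER_RE.finditer(text):
        kind = m.lastgroup  # name of the alternative that matched
        if kind == 'SKIP':
            continue
        if kind == 'ERROR':
            raise SyntaxError(f'unexpected character {m.group()!r} at {m.start()}')
        yield Token(kind, m.group())

print([t.value for t in tokenize('int x = 42;')])  # → ['int', 'x', '=', '42', ';']
```

A real C lexer adds more token kinds (string and character literals, multi-character operators, keywords), but the structure stays the same.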

And - oh my - this was quite the journey. Much has been written about the code produced by agents, and much of it seems to be true. Maybe it's a setting I'm missing (I'm not using my own custom AGENTS.md yet, for instance), but Codex seems to be that eager programmer that wants to get from A to B whatever the cost. Readability, minimalism and code clarity are very much secondary goals.

Using raise...except for control flow? Yep. Abusing Python's dynamic typing (like having None, False and other values all mean different things for a given variable)? For sure. Spreading the logic of a complex function all over the place instead of putting all the key parts in a single switch statement? You bet.

Moreover, the agent is hilariously lazy. More than once I had to convince it to do something it initially said is impossible, and even insisted again in follow-up messages. The anthropomorphization here is mildly concerning, to be honest. I could never imagine I would be writing something like the following to a computer, and yet - here we are: "Remember how we moved X to Y before? You can do it again for Z, definitely. Just try".

My process was to see how I can instruct Codex to fix things, and intervene myself (by rewriting code) as little as possible. I've mostly succeeded in this, and did maybe 20% of the work myself.

My branch grew dozens of commits, falling into roughly these categories:

1. The code in X is too complex; why can't we do Y instead?
2. The use of X is needlessly convoluted; change Y to Z, and T to V in all instances.
3. The code in X is unclear; please add a detailed comment - with examples - to explain what it does.

Interestingly, after doing (3), the agent was often more effective in giving the code a "fresh look" and succeeding in either (1) or (2).

The end result

Eventually, after many hours spent in this process, I was reasonably pleased with the code. It's far from perfect, of course, but taking the essential complexities into account, it's something I could see myself maintaining (with or without the help of an agent). I'm sure I'll find more ways to improve it in the future, but I have a reasonable degree of confidence that this will be doable.

It passes all the tests, so I've been able to release a new version (3.00) without major issues so far. The only issue I've discovered is that some of CFFI's tests are overly precise about the phrasing of errors reported by pycparser; this was an easy fix.

The new parser is also faster, by about 30% based on my benchmarks! This is typical of recursive descent when compared with YACC-generated parsers, in my experience. After reviewing the initial rewrite of the lexer, I've spent a while instructing Codex on how to make it faster, and it worked reasonably well.

Followup - static typing

While working on this, it became quite obvious that static typing would make the process easier. LLM coding agents really benefit from closed loops with strict guardrails (e.g. a test suite to pass), and type-annotations act as such. For example, had pycparser already been type annotated, Codex would probably not have overloaded values to multiple types (like None vs. False vs. others).
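As a small illustration of the kind of overloading annotations discourage, imagine a hypothetical name lookup where `None` meant "never declared" and `False` meant "declared, but not a typedef". Spelling the states out as an `Enum` makes each one explicit, and with the return type annotated, a type checker (ty, mypy and the like) would reject a stray `return None` or `return False`. The names here are invented for the example, not taken from pycparser.

```python
from enum import Enum, auto
from typing import Dict

class NameKind(Enum):
    UNKNOWN = auto()   # name never declared in any scope
    TYPEDEF = auto()   # declared as a typedef name
    IDENT = auto()     # declared as an ordinary identifier

def classify(name: str, scope: Dict[str, bool]) -> NameKind:
    """Classify a name; scope maps declared names to 'is it a typedef?'."""
    if name not in scope:
        return NameKind.UNKNOWN
    return NameKind.TYPEDEF if scope[name] else NameKind.IDENT

scope = {'size_t': True, 'count': False}
print([classify(n, scope).name for n in ('size_t', 'count', 'x')])
# → ['TYPEDEF', 'IDENT', 'UNKNOWN']
```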

In a followup, I asked Codex to type-annotate pycparser (running checks using ty), and this was also a back-and-forth because the process exposed some issues that needed to be refactored. Time will tell, but hopefully it will make further changes in the project simpler for the agent.

Based on this experience, I'd bet that coding agents will be somewhat more effective in statically typed languages like Go, TypeScript and especially Rust.

Conclusions

Overall, this project has been a really good experience, and I'm impressed with what modern LLM coding agents can do! While there's no reason to expect that progress in this domain will stop, even if it does - these are already very useful tools that can significantly improve programmer productivity.

Could I have done this myself, without an agent's help? Sure. But it would have taken me much longer, assuming that I could even muster the will and concentration to engage in this project. I estimate it would take me at least a week of full-time work (so 30-40 hours) spread over who knows how long to accomplish. With Codex, I put an order of magnitude less work into this (around 4-5 hours, I'd estimate) and I'm happy with the result.

It was also fun. At least in one sense, my professional life can be described as the pursuit of focus, deep work and flow. It's not easy for me to get into this state, but when I do I'm highly productive and find it very enjoyable. Agents really help me here. When I know I need to write some code and it's hard to get started, asking an agent to write a prototype is a great catalyst for my motivation. Hence the meme at the beginning of the post.

Does code quality even matter?

One can't avoid a nagging question - does the quality of the code produced by agents even matter? Clearly, the agents themselves can understand it (if not today's agent, then at least next year's). Why worry about future maintainability if the agent can maintain it? In other words, does it make sense to just go full vibe-coding?

This is a fair question, and one I don't have an answer to. Right now, for projects I maintain and stand behind, it seems obvious to me that the code should be fully understandable and accepted by me, and the agent is just a tool helping me get to that state more efficiently. It's hard to say what the future holds here; it's going to be interesting, for sure.

[1]	pycparser has a fair number of direct dependents, but the majority of downloads comes through CFFI, which itself is a major building block for much of the Python ecosystem.

[2]	The table-building report says 177, but that's certainly an over-dramatization because it's common for a single conflict to manifest in several ways.

[3]	It didn't help the PR's case that it was almost certainly vibe coded.

[4]

There was also the lexer to consider, but this seemed like a much simpler job. My impression is that in the early days of computing, lex gained prominence because of strong regexp support which wasn't very common yet. These days, with excellent regexp libraries existing for pretty much every language, the added value of lex over a custom regexp-based lexer isn't very high.

That said, it wouldn't make much sense to embark on a journey to rewrite just the lexer; the dependency on PLY would still remain, and besides, PLY's lexer and parser are designed to work well together. So it wouldn't help me much without tackling the parser beast.

[5]	I decided to ask it to port the parser first, leaving the lexer alone. This was to split the work into reasonable chunks. Besides, I figured that the parser is the hard job anyway - if it succeeds in that, the lexer should be easy. That assumption turned out to be correct.

Compiling Scheme to WebAssembly

2026-01-17T14:37:00-08:00

One of my oldest open-source projects - Bob - celebrated its 15th anniversary a couple of months ago. Bob is a suite of implementations of the Scheme programming language in Python, including an interpreter, a compiler and a VM. Back then I was doing some hacking on CPython internals and was very curious about how CPython-like bytecode VMs work; Bob was an experiment to find out, by implementing one from scratch for R5RS Scheme.

Several months later I added a C++ VM to Bob, as an exercise to learn how such VMs are implemented in a low-level language without all the runtime support Python provides; most importantly, without the built-in GC. The C++ VM in Bob implements its own mark-and-sweep GC.

After many quiet years (with just a sprinkling of cosmetic changes, porting to GitHub, updates to Python 3, etc), I felt the itch to work on Bob again just before the holidays. Specifically, I decided to add another compiler to the suite - this one from Scheme directly to WebAssembly.

The goals of this effort were two-fold:

Experiment with lowering a real, high-level language like Scheme to WebAssembly. Experiments like the recent Let's Build a Compiler compile toy languages that are at the C level (no runtime). Scheme has built-in data structures, lexical closures, garbage collection, etc. It's much more challenging.
Get some hands-on experience with the WASM GC extension [1]. I have several samples of using WASM GC in the wasm-wat-samples repository, but I really wanted to try it for something "real".

Well, it's done now; here's an updated schematic of the Bob project:

The new part is the rightmost vertical path. A WasmCompiler class lowers parsed Scheme expressions all the way down to WebAssembly text, which can then be compiled to a binary and executed using standard WASM tools [2].

Highlights

The most interesting aspect of this project was working with WASM GC to represent Scheme objects. As long as we properly box/wrap all values in refs, the underlying WASM execution environment will take care of the memory management.

For Bob, here's how some key Scheme objects are represented:

;; PAIR holds the car and cdr of a cons cell.
(type $PAIR (struct (field (mut (ref null eq))) (field (mut (ref null eq)))))

;; BOOL represents a Scheme boolean. zero -> false, nonzero -> true.
(type $BOOL (struct (field i32)))

;; SYMBOL represents a Scheme symbol. It holds an offset in linear memory
;; and the length of the symbol name.
(type $SYMBOL (struct (field i32) (field i32)))

$PAIR is of particular interest, as it may contain arbitrary objects in its fields; (ref null eq) means "a nullable reference to something that has identity". ref.test can be used to check - for a given reference - the run-time type of the value it refers to.

You may wonder - what about numeric values? Here WASM has a trick - the i31 type can be used to represent a 31-bit integer as a reference, without actually boxing it (one bit is used to distinguish such a value from a real reference). So we don't need a separate type to hold references to numbers.
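Here's a hypothetical Python helper in the spirit of a compiler's code emission (not Bob's actual code): it checks that an integer literal fits the i31 range, which holds signed values from -2^30 to 2^30 - 1, and emits the WASM GC `ref.i31` instruction that wraps an `i32` into an i31 reference. Reading the value back would use `i31.get_s`.

```python
I31_MIN = -(1 << 30)
I31_MAX = (1 << 30) - 1

def emit_int_literal(n: int) -> str:
    """Return WAT that produces an unboxed i31 reference for n (hypothetical helper)."""
    if not I31_MIN <= n <= I31_MAX:
        # A real compiler would need a boxed representation (or bignums) here.
        raise ValueError(f'{n} does not fit in 31 bits')
    return f'(ref.i31 (i32.const {n}))'

print(emit_int_literal(10))  # → (ref.i31 (i32.const 10))
```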

Also, the $SYMBOL type looks unusual - how is it represented with two numbers? The key to the mystery is that WASM has no built-in support for strings; they have to be implemented manually using offsets into linear memory. The Bob WASM compiler emits the string values of all symbols encountered into linear memory, keeping track of the offset and length of each one; these are the two numbers placed in $SYMBOL. This also makes it fairly easy to implement Scheme's symbol interning; multiple instances of the same symbol will only be allocated once.

Consider this trivial Scheme snippet:

(write '(10 20 foo bar))

The compiler emits the symbols "foo" and "bar" into linear memory as follows [3]:

(data (i32.const 2048) "foo")
(data (i32.const 2051) "bar")

And looking for one of these addresses in the rest of the emitted code, we'll find:

(struct.new $SYMBOL (i32.const 2051) (i32.const 3))

This is part of the code constructing the constant cons list that represents the argument to write; address 2051 with length 3 is the symbol bar.
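The bookkeeping behind that layout can be sketched in a few lines of Python. `SymbolTable` is a hypothetical helper, not Bob's actual code; it assumes the 2048 base offset mentioned in the footnote, hands out (offset, length) pairs in first-seen order, interns repeated names, and renders the `data` segments and `struct.new` expressions in the same shape as the output above.

```python
from typing import Dict, List, Tuple

class SymbolTable:
    """Lay out interned symbol names in linear memory starting at `base`."""
    def __init__(self, base: int = 2048) -> None:
        self.next_offset = base
        self.symbols: Dict[str, Tuple[int, int]] = {}

    def intern(self, name: str) -> Tuple[int, int]:
        if name not in self.symbols:  # each distinct name is laid out once
            length = len(name.encode('utf-8'))
            self.symbols[name] = (self.next_offset, length)
            self.next_offset += length
        return self.symbols[name]

    def data_segments(self) -> List[str]:
        # Assumes names need no escaping inside a WAT string literal.
        return [f'(data (i32.const {off}) "{name}")'
                for name, (off, _) in self.symbols.items()]

    def emit_symbol(self, name: str) -> str:
        off, length = self.intern(name)
        return f'(struct.new $SYMBOL (i32.const {off}) (i32.const {length}))'

table = SymbolTable()
print(table.emit_symbol('foo'))  # → (struct.new $SYMBOL (i32.const 2048) (i32.const 3))
print(table.emit_symbol('bar'))  # → (struct.new $SYMBOL (i32.const 2051) (i32.const 3))
print(table.emit_symbol('foo'))  # reuses offset 2048
print('\n'.join(table.data_segments()))
```

Because the same name always maps to the same (offset, length) pair, comparing two symbols can reduce to comparing those two numbers.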

Speaking of write, implementing this builtin was quite interesting. For compatibility with the other Bob implementations in my repository, write needs to be able to print recursive representations of arbitrary Scheme values, including lists, symbols, etc.
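To show the shape of that logic without WASM text, here's a Python model of a `write`-style printer over a toy object representation (`Pair`, `Symbol`, `bool`, `int`, and `None` for the empty list; all assumptions for this sketch rather than Bob's types). The interesting part is walking `cdr` links so proper lists print as `(a b c)` and improper tails as `(a . b)`.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Symbol:
    name: str

@dataclass
class Pair:
    car: Any
    cdr: Any

def write_repr(obj: Any) -> str:
    if obj is None:
        return '()'  # the empty list
    if isinstance(obj, bool):  # check bool before int: bool is an int subclass
        return '#t' if obj else '#f'
    if isinstance(obj, int):
        return str(obj)
    if isinstance(obj, Symbol):
        return obj.name
    if isinstance(obj, Pair):
        parts = []
        while isinstance(obj, Pair):
            parts.append(write_repr(obj.car))
            obj = obj.cdr
        tail = '' if obj is None else ' . ' + write_repr(obj)
        return '(' + ' '.join(parts) + tail + ')'
    raise TypeError(f'cannot write {obj!r}')

def make_list(*items: Any) -> Any:
    result = None
    for item in reversed(items):
        result = Pair(item, result)
    return result

print(write_repr(make_list(10, 20, Symbol('foo'), Symbol('bar'))))  # → (10 20 foo bar)
print(write_repr(Pair(1, 2)))  # → (1 . 2)
```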

Initially I was reluctant to implement all of this functionality by hand in WASM text, but all alternatives ran into challenges:

Deferring this to the host is difficult because the host environment has no access to WASM GC references - they are completely opaque.
Implementing it in another language (maybe C?) and lowering to WASM is also challenging for a similar reason - the other language is unlikely to have a good representation of WASM GC objects.

So I bit the bullet and - with some AI help for the tedious parts - just wrote an implementation of write directly in WASM text; it wasn't really that bad. I import only two functions from the host:

(import "env" "write_char" (func $write_char (param i32)))
(import "env" "write_i32" (func $write_i32 (param i32)))

Though emitting integers directly from WASM isn't hard, I figured this project already has enough code and some host help here would be welcome. For all the rest, only the lowest level write_char is used. For example, here's how booleans are emitted in the canonical Scheme notation (#t and #f):

(func $emit_bool (param $b (ref $BOOL))
    (call $emit (i32.const 35)) ;; '#'
    (if (i32.eqz (struct.get $BOOL 0 (local.get $b)))
        (then (call $emit (i32.const 102))) ;; 'f'
        (else (call $emit (i32.const 116))) ;; 't'
    )
)

Conclusion

This was a really fun project, and I learned quite a bit about realistic code emission to WASM. Feel free to check out the source code of WasmCompiler - it's very well documented. While it's a bit over 1000 LOC in total [4], more than half of that is actually WASM text snippets that implement the builtin types and functions needed by a basic Scheme implementation.

[1]	The GC proposal is documented here. It was officially added to the WASM spec in Oct 2023.

[2]	In Bob this is currently done with `bytecodealliance/wasm-tools` for the text-to-binary conversion and Node.js for the execution environment, but this can change in the future. I actually wanted to use Python bindings to wasmtime, but these don't appear to support WASM GC yet.

[3]	2048 is just an arbitrary offset the compiler uses as the beginning of the section for symbols in memory. We could also use the multiple memories feature of WASM and dedicate a separate linear memory just for symbols.

[4]	To be clear, this is just the WASM compiler class; it uses the `Expr` representation of Scheme that is created by Bob's parser (and lexer); the code of these other components is shared among all Bob implementations and isn't counted here.

Plugins case study: mdBook preprocessors

2025-12-17T18:11:00-08:00

mdBook is a tool for easily creating books out of Markdown files. It's very popular in the Rust ecosystem, where it's used (among other things) to publish the official Rust book.

mdBook has a simple yet effective plugin mechanism that can be used to modify the book output in arbitrary ways, using any programming language or tool. This post describes the mechanism and how it aligns with the fundamental concepts of plugin infrastructures.

mdBook preprocessors

mdBook's architecture is pretty simple: your contents go into a directory tree of Markdown files. mdBook then renders these into a book, with one file per chapter. The book's output is HTML by default, but mdBook supports other outputs like PDF.

The preprocessor mechanism lets us register an arbitrary program that runs on the book's source after it's loaded from Markdown files; this program can modify the book's contents in any way it wishes before it all gets sent to the renderer for generating output.

The official documentation explains this process very well.

Sample plugin

I rewrote my classical "narcissist" plugin for mdBook; the code is available here.

In fact, there are two renditions of the same plugin there:

One in Python, to demonstrate how mdBook can invoke preprocessors written in any programming language.
One in Rust, to demonstrate how mdBook exposes an application API to plugins written in Rust (since mdBook is itself written in Rust).

Fundamental plugin concepts in this case study

Let's see how this case study of mdBook preprocessors measures up against the Fundamental plugin concepts that were covered several times on this blog.

Discovery

Discovery in mdBook is very explicit. Every plugin we want mdBook to use has to be listed in the project's book.toml configuration file. For example, in the code sample for this post, the Python narcissist plugin is listed in book.toml as follows:

[preprocessor.narcissistpy]
command = "python3 ../preprocessor-python-narcissist/narcissist.py"

Each preprocessor is a command for mdBook to execute in a sub-process. Here it uses Python, but it can be any other program that can be executed.

Registration

For the purpose of registration, mdBook actually invokes the plugin command twice. The first time, it passes the arguments supports <renderer>, where <renderer> is the name of the renderer (e.g. html). If the command exits with status 0, the preprocessor supports this renderer; otherwise, it doesn't.

In the second invocation, mdBook passes some metadata plus the entire book in JSON format to the preprocessor through stdin, and expects the preprocessor to return the modified book as JSON to stdout (using the same schema).
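The two invocations can be sketched from the host's side with Python's subprocess module. This is a hypothetical model of the protocol described above, not mdBook's implementation. supports_renderer and run_preprocessor are illustrative names. The sketch assumes the stdin payload is a JSON array of the form [context, book], which is how mdBook's documentation describes it, but treat the exact payload shape as an assumption.

```python
import json
import subprocess
from typing import Any


def supports_renderer(argv: list[str], renderer: str) -> bool:
    """First invocation: `<command> supports <renderer>`; exit status 0 means yes."""
    proc = subprocess.run(argv + ["supports", renderer], capture_output=True)
    return proc.returncode == 0


def run_preprocessor(
    argv: list[str], context: dict[str, Any], book: dict[str, Any]
) -> dict[str, Any]:
    """Second invocation: send [context, book] on stdin, read the modified book back."""
    proc = subprocess.run(
        argv,
        input=json.dumps([context, book]),
        capture_output=True,
        text=True,
        check=True,  # a failing preprocessor raises CalledProcessError
    )
    return json.loads(proc.stdout)
```

A host would call supports_renderer first, and call run_preprocessor only if it returns True.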

Hooks

In terms of hooks, mdBook takes a very coarse-grained approach. The preprocessor gets the entire book in a single JSON object (along with a context object that contains metadata), and is expected to emit the entire modified book in a single JSON object. It's up to the preprocessor to figure out which parts of the book to read and which parts to modify.

Given that books and other documentation typically have limited sizes, this is a reasonable design choice. Even tens of MiB of JSON-encoded data are quick to marshal, pass between processes over stdin/stdout, and unmarshal. But we wouldn't be able to implement Wikipedia using this design.
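On the plugin side, receiving the whole book at once means walking the chapter tree yourself. Below is a minimal Python sketch of such a preprocessor. It assumes the following book JSON layout:

- Top-level items live under "sections".
- Chapters are wrapped as {"Chapter": {...}}.
- Sub-chapters are nested under "sub_items".

This matches the mdBook 0.4 layout as far as I know, so check the schema of your version. narcissize is a hypothetical stand-in for the real plugin's transformation.

```python
import json
import sys
from typing import Any, Callable, TextIO


def map_chapters(items: list[Any], fn: Callable[[str], str]) -> None:
    """Recursively apply fn to every chapter's Markdown content, in place."""
    for item in items:
        # Non-chapter items (e.g. "Separator" or {"PartTitle": ...}) are left alone.
        if isinstance(item, dict) and "Chapter" in item:
            chapter = item["Chapter"]
            chapter["content"] = fn(chapter["content"])
            map_chapters(chapter.get("sub_items", []), fn)


def narcissize(text: str) -> str:
    # Hypothetical transformation; the real plugin may do something else.
    return text.replace(" I ", " I (the author) ")


def run(args: list[str], stdin: TextIO, stdout: TextIO) -> int:
    """Handle one mdBook invocation and return the process exit status."""
    if args and args[0] == "supports":
        return 0  # this sketch claims support for every renderer
    context, book = json.load(stdin)
    map_chapters(book["sections"], narcissize)
    json.dump(book, stdout)
    return 0


def main() -> None:
    sys.exit(run(sys.argv[1:], sys.stdin, sys.stdout))
```

To use this as an actual preprocessor script, call main() under an `if __name__ == "__main__":` guard.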

Exposing an application API to plugins

This is tricky, given that the preprocessor mechanism is language-agnostic. Here, mdBook only offers additional utilities to preprocessors implemented in Rust. These get access to mdBook's API to unmarshal the JSON representing the context metadata and the book's contents. mdBook offers the Preprocessor trait that Rust preprocessors can implement, which makes it easier to wrangle the book's contents. See my Rust version of the narcissist preprocessor for a basic example of this.

Renderers / backends

Actually, mdBook has another plugin mechanism, but it's very similar conceptually to preprocessors. A renderer (also called a backend in some of mdBook's own doc pages) takes the same input as a preprocessor, but is free to do whatever it wants with it. The default renderer emits the HTML for the book; other renderers can do other things.

The idea is that the book can pass through multiple preprocessors, but is finally handed to a single renderer.

The data a renderer receives is the same as what a preprocessor receives - JSON-encoded book contents. Due to this similarity, there's no real point getting deeper into renderers in this post.

Revisiting "Let's Build a Compiler"

2025-12-09T20:40:00-08:00

There's an old compiler-building tutorial that has become part of the field's lore: the Let's Build a Compiler series by Jack Crenshaw (published between 1988 and 1995).

I ran into it in 2003 and was very impressed, but it's now 2025 and this tutorial is still mentioned quite often in Hacker News threads. Why is that? Why does a tutorial from 35 years ago, built in Pascal and emitting Motorola 68000 assembly - technologies that are virtually unknown to the new generation of programmers - hold sway over compiler enthusiasts? I've decided to find out.

The tutorial is easily available and readable online, but just re-reading it seemed insufficient. So I decided to meticulously translate the compilers built in it to Python, emitting a more modern target - WebAssembly. It was an enjoyable process and I want to share the outcome and some insights gained along the way.

The result is this code repository. Of particular interest is the TUTORIAL.md file, which describes how each part in the original tutorial is mapped to my code. So if you want to read the original tutorial but play with code you can actually easily try on your own, feel free to follow my path.

A sample

To get a taste of the input language being compiled and the output my compiler generates, here's a sample program in the KISS language designed by Jack Crenshaw:

var X=0

{ sum from 0 to n-1 inclusive, and add to result }
procedure addseq(n, ref result)
    var i, sum  { 0 initialized }
    while i < n
        sum = sum + i
        i = i + 1
    end
    result = result + sum
end

program testprog
begin
    addseq(11, X)
end
.

It's from part 13 of the tutorial, so it showcases procedures along with control constructs like the while loop, and passing parameters both by value and by reference. Here's the WASM text generated by my compiler for part 13:

(module
  (memory 8)
  ;; Linear stack pointer. Used to pass parameters by ref.
  ;; Grows downwards (towards lower addresses).
  (global $__sp (mut i32) (i32.const 65536))

  (global $X (mut i32) (i32.const 0))

  (func $ADDSEQ (param $N i32) (param $RESULT i32)
    (local $I i32)
    (local $SUM i32)
    loop $loop1
      block $breakloop1
        local.get $I
        local.get $N
        i32.lt_s
        i32.eqz
        br_if $breakloop1
        local.get $SUM
        local.get $I
        i32.add
        local.set $SUM
        local.get $I
        i32.const 1
        i32.add
        local.set $I
        br $loop1
      end
    end
    local.get $RESULT
    local.get $RESULT
    i32.load
    local.get $SUM
    i32.add
    i32.store
  )

  (func $main (export "main") (result i32)
    i32.const 11
    global.get $__sp      ;; make space on stack
    i32.const 4
    i32.sub
    global.set $__sp
    global.get $__sp
    global.get $X
    i32.store
    global.get $__sp    ;; push address as parameter
    call $ADDSEQ
    ;; restore parameter X by ref
    global.get $__sp
    i32.load offset=0
    global.set $X
    ;; clean up stack for ref parameters
    global.get $__sp
    i32.const 4
    i32.add
    global.set $__sp
    global.get $X
  )
)
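The while loop in the output above follows a fixed WASM idiom. A block is nested inside a loop. br_if to the block's label exits the loop, and br to the loop's label jumps back to re-test the condition. Here is a minimal Python sketch of a helper that wraps already-emitted condition and body instructions in this shape. LoopEmitter and emit_while are hypothetical names; this is not the repository's code.

```python
class LoopEmitter:
    """Emits WASM text for `while cond ... end` loops with unique labels."""

    def __init__(self) -> None:
        self.label_count = 0

    def emit_while(self, cond: list[str], body: list[str]) -> list[str]:
        # `cond` must leave an i32 on the stack; non-zero means "keep looping".
        self.label_count += 1
        n = self.label_count
        return [
            f"loop $loop{n}",
            f"  block $breakloop{n}",
            *["    " + ins for ins in cond],
            "    i32.eqz",  # invert: 1 when the condition is false
            f"    br_if $breakloop{n}",  # exit past the end of the block
            *["    " + ins for ins in body],
            f"    br $loop{n}",  # branching to a loop label restarts it
            "  end",
            "end",
        ]
```

Because the counter only grows, nested or sequential loops get distinct labels, mirroring the $loop1/$breakloop1 naming in the output.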

You'll notice that there is some trickiness in the emitted code w.r.t. handling the by-reference parameter (my previous post deals with this issue in more detail). In general, though, the emitted code is inefficient - there is close to zero optimization applied.

Also, if you're very diligent you'll notice something odd about the global variable X - it seems to be implicitly returned by the generated main function. This is just a testing facility: returning X lets the tests inspect the program's result. All the compilers are extensively tested - usually by running the generated WASM code [1] and verifying expected results.

Insights - what makes this tutorial so special?

While re-reading the original tutorial, I had an opportunity to reflect on what makes it so effective. Other than Jack Crenshaw's very fluent and conversational writing style, I think it's a combination of two key factors:

The tutorial builds a recursive-descent parser step by step, rather than giving a long preface on automata and table-based parser generators. When I first encountered it (in 2003), it was taken for granted that if you want to write a parser then lex + yacc are the way to go [2]. Following the development of a simple and clean hand-written parser was a revelation that wholly changed my approach to the subject; subsequently, hand-written recursive-descent parsers have been my go-to approach for almost 20 years now.
Rather than getting stuck in front-end minutiae, the tutorial goes straight to generating working assembly code, from very early on. This was also a breath of fresh air for engineers who grew up with more traditional courses where you spend 90% of the time on parsing, type checking and other semantic analysis and often run entirely out of steam by the time code generation is taught.
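For readers who haven't seen this style, here is a small Python sketch of a recursive-descent expression parser that emits WASM instructions while it parses. It follows the spirit of the tutorial's early parts, with single-character tokens. It is an illustrative sketch, not code from the tutorial or from the repository, and the Parser class and its methods are hypothetical. Because WASM is a stack machine, emitting both operands and then the operator is all that's needed.

```python
class Parser:
    """Recursive descent with syntax-directed WASM emission.

    Tokens are single characters, as in the tutorial's early parts:
    one-digit numbers and one-letter names (emitted here as WASM locals).
    """

    def __init__(self, text: str) -> None:
        self.text = text.replace(" ", "")
        self.pos = 0
        self.out: list[str] = []

    def look(self) -> str:
        # Current character, or "" at end of input.
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def match(self, ch: str) -> None:
        if self.look() != ch:
            raise SyntaxError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def parse(self) -> list[str]:
        self.expression()
        if self.pos != len(self.text):
            raise SyntaxError(f"unexpected {self.look()!r} at position {self.pos}")
        return self.out

    def expression(self) -> None:
        # <expression> ::= <term> (('+' | '-') <term>)*
        self.term()
        while self.look() in ("+", "-"):
            op = self.look()
            self.match(op)
            self.term()
            # Both operands are already on the WASM stack; just emit the op.
            self.out.append("i32.add" if op == "+" else "i32.sub")

    def term(self) -> None:
        # <term> ::= <factor> (('*' | '/') <factor>)*
        self.factor()
        while self.look() in ("*", "/"):
            op = self.look()
            self.match(op)
            self.factor()
            self.out.append("i32.mul" if op == "*" else "i32.div_s")

    def factor(self) -> None:
        # <factor> ::= <digit> | <letter> | '(' <expression> ')'
        ch = self.look()
        if ch == "(":
            self.match("(")
            self.expression()
            self.match(")")
        elif ch.isdigit():
            self.match(ch)
            self.out.append(f"i32.const {ch}")
        elif ch.isalpha():
            self.match(ch)
            self.out.append(f"local.get ${ch.upper()}")
        else:
            raise SyntaxError(f"unexpected {ch!r} at position {self.pos}")
```

For example, Parser("2*(x+3)").parse() returns ["i32.const 2", "local.get $X", "i32.const 3", "i32.add", "i32.mul"]. No tree is ever built; each grammar rule emits its code as soon as it has recognized its input.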

To be honest, I don't think either of these is a big problem with modern resources, but back in the day the tutorial clearly struck a chord with many people.

What else does it teach us?

Jack Crenshaw's tutorial takes the syntax-directed translation approach, where code is emitted while parsing, without having to divide the compiler into explicit phases with IRs. As I said above, this is a fantastic approach for getting started, but in the latter parts of the tutorial it starts showing its limitations. Especially once we get to types, it becomes painfully obvious that it would be very nice if we knew the types of expressions before we generate code for them.

I don't know if this contributed to Jack Crenshaw abandoning the tutorial at some point after part 14, but it may very well have. He repeatedly notes that the emitted code is clearly sub-optimal [3] and can be improved, but IMHO it's just not that easy to improve using the syntax-directed translation strategy. With perfect hindsight, I would probably use Part 14 (types) as a turning point - emitting some kind of AST from the parser and then doing simple type checking and analysis on that AST prior to generating code from it.
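Here is a minimal Python sketch of that alternative. The parser (not shown) builds an AST, a separate pass computes each node's type, and only then is code generated. This lets a widening conversion be placed right after the operand that needs it. With syntax-directed translation, by contrast, the left operand's code is already emitted by the time the parser discovers that the right operand is wider. The Var/Add node classes, the i32/i64 type names and the type_of/gen functions are hypothetical. They stand in for Part 14's types rather than reproduce them.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Var:
    name: str
    type: str  # hypothetical declared type: "i32" or "i64"


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Var, Add]


def type_of(e: Expr) -> str:
    """Analysis pass: computes a node's type without emitting any code."""
    if isinstance(e, Var):
        return e.type
    types = (type_of(e.left), type_of(e.right))
    return "i64" if "i64" in types else "i32"


def gen(e: Expr, want: str) -> list[str]:
    """Emit code leaving a value of type `want` (never narrower than e's type)."""
    have = type_of(e)
    if isinstance(e, Var):
        code = [f"local.get ${e.name}"]
    else:
        # The operand type is known *before* either side is emitted.
        code = gen(e.left, have) + gen(e.right, have) + [f"{have}.add"]
    if have == "i32" and want == "i64":
        code.append("i64.extend_i32_s")  # widen exactly where it's needed
    return code
```

Code generation starts with gen(tree, type_of(tree)). Purely i32 subtrees come out with no conversions at all.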

Conclusion

All in all, the original tutorial remains a wonderfully readable introduction to building compilers. This post and the GitHub repository it describes are a modest contribution that aims to improve the experience of folks reading the original tutorial today and not willing to use obsolete technologies. As always, let me know if you run into any issues or have questions!

[1]	This is done using the Python bindings to wasmtime.

[2]	By the way, gcc switched from YACC to hand-written recursive-descent parsing in the 2004-2006 timeframe, and Clang has been implemented with a recursive-descent parser from the start (2007).

[3]	Concretely: when we compile subexpr1 + subexpr2 and the two sides have different types, it would be mighty nice to know that before we actually generate the code for both sub-expressions. But the syntax-directed translation approach just doesn't work that way.

To be clear: it's easy to generate working code; it's just not easy to generate optimal code without some sort of type analysis that's done before code is actually generated.