Student Writeup
For this project, I built a small Julia reimplementation of the original Matexpr idea.
In my first attempt, I wrote a lexer and parser to read user metadata from code annotations, similar to the original implementation. That approach quickly became tedious, and I realized that I had jumped into coding a little too quickly before fully understanding Julia macros. Once I took some time to deepen my understanding of how macros work, it became clear that they were a much better fit for the project.
One major advantage is that Julia already parses code into an Expr tree. This let me focus on compiler passes instead of tokenization. Another advantage is that macro expansion produces ordinary Julia code. I could inspect generated functions with @macroexpand, run them directly in tests, and compare the results against standard Julia array operations.
In my implementation, the user writes a normal Julia function, wraps it in @matexpr, and can add an @declare block to tell the compiler about input shapes and matrix structure.
Requirements Map
Here is how the implementation lines up with the project requirements:
| Requirement | Where I implemented it |
|---|---|
| Julia macro interface | @matexpr and @declare in src/macros.jl |
| Expression normalization | src/core and src/frontend/pipeline.jl |
| Symbolic differentiation and AD mode selection | src/frontend/diff.jl |
| Shape and structure analysis | src/analysis/structure.jl |
| Fixed-size code generation | src/backend/structured_codegen.jl |
| Generic Julia lowering fallback | src/backend/lowering.jl and src/backend/emit.jl |
| Tests | test/runtests.jl and the focused test files |
| Timing script | bench/benchmark.jl |
| Documentation and writeup | README.md and docs/src |
I also implemented some extra / potential A+ features!
I remember learning, both on my own and in one of Prof. Bindel’s research meetings, that forward-mode AD is especially efficient for functions with small input dimension and large output dimension, while reverse-mode AD is better for functions with large input dimension and small output dimension (like backpropagation). Because of that, I thought it would be cool to implement both and have the program automatically choose which differentiation mode to use based on the size information in the declaration metadata! (I go into more detail about this later).
User-Facing Design
The main syntax looks like this:
```julia
@matexpr function dense_mv(A, x)
    @declare begin
        input(A, (2, 3), Dense())
        input(x, (3, 1), Dense())
    end
    A * x
end
```

I chose this form because it feels like normal Julia. The macro gets a Julia expression tree from the parser, and the declarations are also Julia syntax instead of comments or strings.
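Because macro expansion produces ordinary Julia code, the generated function can also be inspected directly in the REPL. A quick sketch of that workflow, assuming the project module is loaded (the printed body depends on which backend ends up handling the expression):

```julia
# Inspect the code @matexpr generates for dense_mv. @macroexpand and
# Base.remove_linenums! are standard Julia tools.
expanded = @macroexpand @matexpr function dense_mv(A, x)
    @declare begin
        input(A, (2, 3), Dense())
        input(x, (3, 1), Dense())
    end
    A * x
end
Base.remove_linenums!(expanded)   # drop line-number nodes for readability
```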
The @declare block gives the compiler facts that are hard to recover from a plain expression:
- the variable role, currently only `input`
- fixed integer dimensions
- optional structure tags like `Dense()`, `Diagonal()`, or `Symmetric()`
I learned that validating metadata early is worth it. For example, a Diagonal() or Symmetric() declaration must be square. Catching that at declaration time keeps later compiler passes simpler.
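A minimal sketch of that kind of check, with stand-in structure tags (the real declaration handling in src/macros.jl may look different):

```julia
# Hypothetical structure tags plus a squareness check at declaration time.
abstract type StructureTag end
struct Dense <: StructureTag end
struct Diagonal <: StructureTag end
struct Symmetric <: StructureTag end

function validate_decl(name::Symbol, dims::Tuple{Int,Int}, tag::StructureTag)
    rows, cols = dims
    if (tag isa Diagonal || tag isa Symmetric) && rows != cols
        error("$name is declared $(nameof(typeof(tag))) but has size ($rows, $cols)")
    end
    return (name, dims, tag)
end

validate_decl(:x, (3, 1), Dense())      # fine
validate_decl(:A, (2, 3), Diagonal())   # errors at declaration time
```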
Compiler Pipeline
The macro expansion pipeline is:
- Read the function body and extract `@declare` metadata.
- Remove Julia line-number nodes so expression comparisons are stable (see the sketch after this list).
- Expand supported `deriv(...)` calls.
- Normalize simple algebraic forms.
- Infer matrix shape and structure from declarations.
- Apply structure-aware simplifications.
- Use a fixed-size specialization if the expression matches a supported matrix pattern.
- Fall back to ordinary lowered Julia code when no structured specialization applies.
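As a small illustration of the line-number step, this is roughly what stripping LineNumberNode entries looks like; Julia's own Base.remove_linenums! does the same thing for block expressions:

```julia
# Julia inserts LineNumberNode entries into block expressions; removing them
# keeps expression comparisons in tests stable.
strip_linenums(x) = x
function strip_linenums(ex::Expr)
    kept = [strip_linenums(a) for a in ex.args if !(a isa LineNumberNode)]
    return Expr(ex.head, kept...)
end

body = quote
    y = A * x
    y
end
strip_linenums(body)               # :(begin y = A * x; y end) with no line nodes
Base.remove_linenums!(copy(body))  # built-in equivalent for block expressions
```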
I learned that keeping these steps separate made debugging much easier. When a test failed, I could usually tell whether the problem was in differentiation, shape inference, simplification, or code generation.
Expression Rewriting
The core rewrite layer works on Julia expressions. It supports bottom-up rewrites, fixed-point rewrites, and small pattern rules. I used it for cleanup such as:
```
x + 0     => x
0 + x     => x
x * 1     => x
1 * x     => x
x * 0     => 0
(A')'     => A
(A * B)'  => B' * A'
```
I learned that this kind of rewrite system is useful even when it is small. It made derivative results cleaner and kept generated code from getting too noisy. At the same time, I avoided trying to build a full computer algebra system.
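A minimal sketch of the bottom-up, fixed-point idea (the project's rule set is larger and structure-aware):

```julia
# Tiny rewriter over Julia Exprs: local rules, applied bottom-up, repeated to
# a fixed point. Only a few of the cleanup rules above are shown.
rewrite_node(ex) = ex
function rewrite_node(ex::Expr)
    if ex.head == :call && length(ex.args) == 3
        f, a, b = ex.args
        if f == :+
            a == 0 && return b               # 0 + x => x
            b == 0 && return a               # x + 0 => x
        elseif f == :*
            (a == 0 || b == 0) && return 0   # x * 0 => 0
            a == 1 && return b               # 1 * x => x
            b == 1 && return a               # x * 1 => x
        end
    end
    return ex
end

# Bottom-up pass: rewrite children first, then the node itself.
bottom_up(ex) = ex
bottom_up(ex::Expr) = rewrite_node(Expr(ex.head, map(bottom_up, ex.args)...))

# Fixed point: repeat until nothing changes.
function fixpoint(ex)
    while true
        next = bottom_up(ex)
        next == ex && return ex
        ex = next
    end
end

fixpoint(:((x + 0) * 1 + y * 0))   # => :x
```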
Symbolic Differentiation
Inside a @matexpr function, the user can write deriv(...):
```julia
deriv(x * y, x)         # y
deriv(sin(x), x)        # cos(x)
deriv([x, x * y], x)    # [1, y]
deriv(x * y, [x, y])    # [y, x]
```

The forward path walks the expression and applies local symbolic rules like the product rule. This works well when there are only a few derivative directions or when the output is vector-valued.
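A minimal sketch of that forward path for scalar expressions (the real pass in src/frontend/diff.jl covers vectors and more operators):

```julia
# Recursive forward differentiation with local rules; fwd_diff is a
# hypothetical helper, not the project's actual function.
fwd_diff(ex, var::Symbol) = ex == var ? 1 : 0          # symbols and constants
function fwd_diff(ex::Expr, var::Symbol)
    f, args = ex.args[1], ex.args[2:end]
    if ex.head == :call && f == :+
        return Expr(:call, :+, (fwd_diff(a, var) for a in args)...)
    elseif ex.head == :call && f == :* && length(args) == 2
        a, b = args
        return :($(fwd_diff(a, var)) * $b + $a * $(fwd_diff(b, var)))  # product rule
    elseif ex.head == :call && f == :sin && length(args) == 1
        return :(cos($(args[1])) * $(fwd_diff(args[1], var)))          # chain rule
    end
    error("unsupported expression: $ex")
end

fwd_diff(:(x * y), :x)    # :(1 * y + x * 0), cleaned up to y by the rewriter
fwd_diff(:(sin(x)), :x)   # :(cos(x) * 1), cleaned up to cos(x)
```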
The backward path is for scalar-output expressions. It starts with output adjoint 1 and pushes adjoints backward through the expression tree. When declarations show that the derivative input has more entries than the scalar output, the compiler picks this path automatically.
For example:
```julia
@matexpr function dot_grad(c, x)
    @declare begin
        input(c, (3, 1), Dense())
        input(x, (3, 1), Dense())
    end
    deriv(c' * x, x)
end
```

Here the output is scalar and x has three entries, so the compiler chooses the backward symbolic path and returns c.
I kept the implementation bounded to scalar-output objectives instead of trying to support every possible Jacobian case.
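The mode choice itself comes down to comparing sizes from the declaration metadata. A rough sketch with hypothetical names:

```julia
# Pick an AD mode from @declare shape metadata: reverse when many inputs feed
# a scalar output, forward otherwise. Not the project's actual API.
function choose_ad_mode(shapes::Dict{Symbol,Tuple{Int,Int}}, wrt::Symbol,
                        out_dims::Tuple{Int,Int})
    n_in  = prod(shapes[wrt])
    n_out = prod(out_dims)
    return (n_out == 1 && n_in > 1) ? :reverse : :forward
end

shapes = Dict(:c => (3, 1), :x => (3, 1))
choose_ad_mode(shapes, :x, (1, 1))   # :reverse, as in dot_grad above
```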
Shape And Structure Analysis
The structure pass tracks three facts:
- row count
- column count
- coarse matrix structure
The supported structures are Dense(), Symmetric(), Diagonal(), ZeroStruct(), and IdentityStruct().
The pass understands declared symbols, transpose, addition, subtraction, and multiplication. That enables simplifications such as:
- `Z + A` and `A + Z` become `A`.
- `A - Z` becomes `A`.
- `I * A` and `A * I` become `A` when dimensions match.
- `D1 * D2` stays diagonal.
- `S'` becomes `S` for square symmetric matrices.
- `D'` becomes `D` for square diagonal matrices.
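As an illustration, the multiplication case of this kind of structure propagation can be sketched like this (the real pass in src/analysis/structure.jl also tracks dimensions):

```julia
# Coarse structure propagation for products; the enum tags are stand-ins for
# the project's Dense()/Diagonal()/... types.
@enum MatStructure DenseS SymmetricS DiagonalS ZeroS IdentityS

function mul_structure(a::MatStructure, b::MatStructure)
    (a == ZeroS || b == ZeroS)         && return ZeroS      # anything times zero is zero
    a == IdentityS                     && return b          # I * B has B's structure
    b == IdentityS                     && return a          # A * I has A's structure
    (a == DiagonalS && b == DiagonalS) && return DiagonalS  # D1 * D2 stays diagonal
    return DenseS   # e.g. symmetric * symmetric is not symmetric in general
end

mul_structure(DiagonalS, DiagonalS)    # DiagonalS
mul_structure(IdentityS, SymmetricS)   # SymmetricS
```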
Fixed-Size Code Generation
The structured backend emits scalarized Julia expressions for a few fixed-size patterns:
- diagonal matrix-vector multiplication
- dense or symmetric matrix-vector multiplication
- diagonal-diagonal multiplication
- dense or symmetric matrix-matrix multiplication
- matrix addition and subtraction
For a dense matrix-vector multiply, the generated result is a literal vector whose entries are scalar sums:
```julia
[
    A[1, 1] * x[1] + A[1, 2] * x[2],
    A[2, 1] * x[1] + A[2, 2] * x[2],
]
```

I also added support for transposed declared operands by reversing generated indices. That means A' * x can use the same scalarized backend without materializing A' first.
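A sketch of how such an expression can be generated, including the index reversal for transposed operands (a hypothetical helper, not the actual emitter in src/backend/structured_codegen.jl):

```julia
# Build a literal-vector Expr for a fixed-size matrix-vector product. For a
# transposed operand the indices are swapped instead of materializing A'.
function scalarized_matvec(A::Symbol, x::Symbol, m::Int, n::Int; transposed::Bool=false)
    rows = transposed ? n : m
    cols = transposed ? m : n
    entries = map(1:rows) do i
        terms = map(1:cols) do j
            transposed ? :($A[$j, $i] * $x[$j]) : :($A[$i, $j] * $x[$j])
        end
        foldl((s, t) -> :($s + $t), terms)   # sum of scalar products for row i
    end
    return Expr(:vect, entries...)           # [row1, row2, ...] literal
end

scalarized_matvec(:A, :x, 2, 2)
# :([A[1, 1] * x[1] + A[1, 2] * x[2], A[2, 1] * x[1] + A[2, 2] * x[2]])
```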
Fallback Lowering
Not every expression gets a structured specialization. If the frontend can still represent the expression, the compiler lowers it into ordinary Julia code instead.
This fallback made the prototype more usable. It let simple scalar arithmetic and derivative examples compile even when no matrix optimization applied.
Challenges
The biggest challenge was shape correctness. At first, some simplifications looked obvious but were not safe once dimensions were included. This forced me to make shape checks part of the rewrite logic instead of treating them as a separate afterthought.
Another challenge was Julia's expression syntax. Matrix literals use internal forms like :vect, :row, and :vcat, so both differentiation and emission needed explicit support for those cases.
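For reference, these are the expression heads Julia's parser produces for those literal forms:

```julia
# How Julia parses the vector and matrix literal forms mentioned above.
:( [x, x * y] ).head            # :vect  (vector literal)
:( [1 2; 3 4] ).head            # :vcat  (matrix literal built from rows)
:( [1 2; 3 4] ).args[1].head    # :row   (one row of the matrix literal)
```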
The reverse AD extension was also harder than I expected. It was not enough to differentiate scalar algebra, as the pass also needed enough shape metadata to handle expressions like c' * x cleanly.
Benchmark Results
After looking at the first benchmark results, I realized that some of my cases were not really the cases Matexpr is designed to win. Dense matrix-matrix multiplication, matrix addition, and matrix subtraction are already very fast in Julia, and my generated code still has to allocate dense output arrays.
So I changed the benchmark to focus on cases where the current implementation has a better chance: small fixed-size matrix-vector multiplication, diagonal matrix-vector multiplication, and structure-based simplifications like identity and zero operands. A representative local run with Julia 1.12.3 produced:
| Case | Base Julia | Matexpr | Base / Matexpr |
|---|---|---|---|
| dense matvec N=4 | 30.01 ns | 14.18 ns | 2.12x |
| dense matvec N=8 | 63.86 ns | 29.55 ns | 2.16x |
| dense matvec N=16 | 133.0 ns | 99.7 ns | 1.33x |
| diagonal matvec N=4 | 29.94 ns | 12.18 ns | 2.46x |
| diagonal matvec N=8 | 64.2 ns | 15.06 ns | 4.26x |
| diagonal matvec N=16 | 133.1 ns | 20.52 ns | 6.49x |
| identity matvec N=8 | 64.2 ns | 2.083 ns | 30.8x |
| zero add N=8 | 40.49 ns | 2.125 ns | 19.1x |
These results made a lot more sense to me. Dense matrix-vector multiplication does pretty well at small sizes because the generated code avoids some generic array-operation overhead. By N = 16, it still wins, but the win is smaller, which makes sense because Julia's built-in dense operations are already good.
The diagonal matrix-vector benchmark is the clearest arithmetic win. The input is still represented as a dense matrix with zeros, but the declaration tells Matexpr to read only the diagonal entries. As N grows, base Julia is still doing dense matrix-vector work, while Matexpr is only doing one multiply per row. That is why the speedup grows from 2.46x at N = 4 to 6.49x at N = 16. However, the results in this category are somewhat misleading, because Julia's linear algebra library already supports declaring diagonal matrices.
The identity and zero cases show the value of structure metadata even more directly. I * x simplifies to x, and Z + A simplifies to A, so the generated function mostly just returns an existing input. That is a real compiler optimization, but it is also potentially misleading, since Julia's linear algebra library also has support for identity and zero matrices. I suspect that benchmarking against those would show similar, though probably slightly worse, timings, as Julia definitely has more built-in optimizations.
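A sketch of what that fairer comparison could look like, using BenchmarkTools and LinearAlgebra (the table above does not include this baseline):

```julia
# Compare against LinearAlgebra's structured Diagonal type instead of a dense
# matrix of zeros.
using BenchmarkTools, LinearAlgebra

N = 16
d = rand(N); x = rand(N)
A_dense = Matrix(Diagonal(d))   # dense storage, what the current benchmark uses as "base"
A_diag  = Diagonal(d)           # structure-aware baseline

@btime $A_dense * $x            # O(N^2) dense matvec
@btime $A_diag * $x             # O(N) diagonal matvec
```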
I was also really confused when I saw such bad performance for the diagonal product in my first benchmark. I think the reason is that my diagonal product code returns a full dense matrix, even though it exploits sparsity for the arithmetic. The cost is dominated by building the result, not by the multiplication count. The better design would be to generate a diagonal result instead of a full dense matrix.
Testing
I organized tests by compiler layer:
- AST utilities and rewrite rules
- symbolic differentiation
- frontend processing
- code emission and lowering
- declaration parsing
- structure inference and simplification
- fixed-size structured code generation
- macro integration and executable generated functions
This helped a lot while developing. A derivative failure usually pointed to test/test_diff.jl, while a matrix-shape failure usually pointed to test/test_structure.jl or the structured-codegen tests.
What I Learned
I learned that explicit metadata is one of the most useful tools in a small compiler. Once the compiler knows dimensions and structure, it can make better choices and reject impossible cases earlier.
I also learned that a compiler project benefits from having a working path all the way through, even if it is not optimized. The generic lowering fallback made the project feel complete because every supported frontend expression still had some way to become executable Julia code.
Future Work
The next thing I would improve is storage-aware declarations. A diagonal input should be stored as its diagonal entries instead of as a full dense matrix. That would make diagonal specializations much more meaningful. Another reasonable extension would be to add more of the features implemented in the original matexpr, such as scratch temporaries and output and inout declarations.