Expander & ECC Prover Update
We're excited to share the latest improvements to Expander, our proving backend, and its integration with ECC. This update continues our focus on performance, flexibility, and real-world usability, especially for demanding applications like zkML. Here's a high-level overview of what's changed since our last tagged release:
- (general) Enhanced shared memory in MPI for more efficient inter-process collaboration
- (general) Introduced a more flexible and robust interface for the polynomial commitment scheme (PCS)
- (general) Enabled configurable SIMD support to allow arbitrary levels of parallelism
- (zkML) Optimized memory usage by sharing the full computation graph and witness
- (zkML) Fine-grained control over CPU resource allocation to align with proof requirements
- (zkML) Clear separation of setup, proving, and verification phases for cleaner service integration
- (zkML) Improved efficiency in merging multiple PCS claims
- (minor) Bug fixes and removal of deprecated features
Enhanced Shared Memory
Shared memory has always been a core component of our proving system, enabling multiple processes to efficiently collaborate on the same data. However, using shared memory in scenarios where each process contains multiple threads has been less explored. We've now implemented and tested our shared memory tools under this model, achieving better concurrency and memory efficiency.
Additionally, we've added full support for data types requiring alignment beyond 64 bits (e.g., 256-bit AVX2 and 512-bit AVX-512 vector types), manually handling pointer arithmetic and alignment where needed.
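The pointer-alignment arithmetic involved can be sketched as follows. This is a minimal Rust illustration; `align_up` and `first_aligned_offset` are hypothetical helper names, not Expander's API:

```rust
/// Round `addr` up to the next multiple of `align` (a power of two).
/// This is the classic bit trick behind manual alignment handling.
fn align_up(addr: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    (addr + align - 1) & !(align - 1)
}

/// Given a raw shared-memory base address and a required alignment
/// (e.g. 32 bytes for 256-bit vectors, 64 bytes for 512-bit vectors),
/// return the first usable aligned offset inside the mapping.
fn first_aligned_offset(base: usize, align: usize) -> usize {
    align_up(base, align) - base
}
```

Over-allocating the mapping by `align - 1` bytes and starting at this offset guarantees that vector loads land on suitably aligned addresses.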
Flexible PCS Interface
We've improved the interface of our polynomial commitment scheme to better handle cases where the polynomial is smaller than the setup size. Depending on the scheme in use, we either commit directly to the smaller polynomial for performance, or pad it to satisfy proof requirements. For example, in the KZG scheme, it’s straightforward to commit to a smaller polynomial using a subset of the setup. In contrast, Hyrax requires zero-padding to ensure completeness and soundness.
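The two strategies can be captured in a small planning helper. This is an illustrative Rust sketch; `plan_commit`, the enum, and the `supports_prefix` flag are assumed names, not the real interface:

```rust
/// How a scheme handles a polynomial smaller than the setup (illustrative).
#[derive(Debug, PartialEq)]
enum SmallPolyStrategy {
    /// KZG-style: commit using a prefix of the setup; no padding needed.
    UseSetupPrefix { coeffs_used: usize },
    /// Hyrax-style: zero-pad the polynomial up to the setup size
    /// to preserve completeness and soundness.
    ZeroPad { padded_len: usize },
}

fn plan_commit(poly_len: usize, setup_len: usize, supports_prefix: bool) -> SmallPolyStrategy {
    assert!(poly_len <= setup_len, "polynomial must fit within the setup");
    if supports_prefix {
        SmallPolyStrategy::UseSetupPrefix { coeffs_used: poly_len }
    } else {
        SmallPolyStrategy::ZeroPad { padded_len: setup_len }
    }
}
```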
Flexible SIMD
We’ve extensively used SIMD instructions in Expander. Previously, each field had a fixed SIMD width determined by the CPU's native vector capabilities (e.g., 256-bit AVX2 or 512-bit AVX-512). However, there are cases where increasing SIMD parallelism beyond the native width can better utilize the full capacity of a CPU core. We've added support for flexible SIMD configurations, allowing arbitrary SIMD sizes and improving overall throughput.
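As a rough illustration of lane-configurable arithmetic, here is a software-SIMD sketch in Rust. The type, its methods, and the Mersenne-31 stand-in scalar are all hypothetical, not Expander's actual types; the point is that the lane count `L` is a parameter rather than a value fixed by the hardware vector width:

```rust
/// A field element replicated across `L` configurable lanes.
#[derive(Clone, Copy, Debug, PartialEq)]
struct SimdField<const L: usize> {
    lanes: [u64; L], // stand-in for one scalar field element per lane
}

impl<const L: usize> SimdField<L> {
    /// Mersenne-31 prime as a toy scalar modulus.
    const MODULUS: u64 = (1 << 31) - 1;

    fn broadcast(x: u64) -> Self {
        Self { lanes: [x % Self::MODULUS; L] }
    }

    /// Lane-wise addition; simple loops like this are what the
    /// compiler maps onto however many native vector units exist.
    fn add(self, rhs: Self) -> Self {
        let mut out = [0u64; L];
        for i in 0..L {
            out[i] = (self.lanes[i] + rhs.lanes[i]) % Self::MODULUS;
        }
        Self { lanes: out }
    }
}
```

With `L` larger than the native width, one "SIMD field element" simply spans several hardware vectors, which is how arbitrary parallelism levels can be expressed.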
Share Computation Graph and Witness
In zkML applications, our machine learning models are first compiled into a computation graph with multiple kernels, which are then fed into the prover. Thanks to recent shared memory improvements, we've significantly reduced memory usage during this step. For instance, proving a VGG network now uses under 8GB, which makes it feasible on a personal laptop.
Note that storing the network and quantized witness alone exceeds 3GB in our case, meaning the prover itself adds minimal overhead. We're now working toward streaming witnesses, which would further enable proving large models on home devices.
Better Control of Computation Resources
Algorithms often aggressively consume computational resources. However, careful management can both improve performance and reduce cost. In zkML, we typically operate with high levels of data parallelism. Previously, CPU resources were greedily allocated for each parallel task, leading to oversubscription when parallelism exceeded available capacity.
With this update, we’ve introduced manual control over CPU usage, allowing users to cap the prover at a subset of the available cores without affecting the proof: the generated proof is identical to one produced with unrestricted resources.
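The scheduling rule can be sketched as a tiny allocation function. This is illustrative Rust only; the name and parameters are assumptions, not Expander's actual interface:

```rust
/// Split a user-specified CPU budget across `tasks` data-parallel
/// proving jobs without oversubscribing the machine.
fn threads_per_task(cpu_budget: usize, tasks: usize) -> usize {
    assert!(cpu_budget > 0 && tasks > 0);
    // Every task gets at least one thread; when parallelism exceeds
    // the budget, tasks queue instead of oversubscribing cores, and
    // the proof output is unchanged either way.
    std::cmp::max(1, cpu_budget / tasks)
}
```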
zkML as a Service
We’ve now fully separated the setup, proving, and verification phases of our zkML system, enabling the prover to operate as a standalone service. This mirrors real-world deployment scenarios, and we’ll be launching our zkML service soon.
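The phase boundaries can be sketched with toy stand-ins (hypothetical Rust, not the real prover). Note that in the real system verification uses only public inputs and the proof; this toy simply recomputes the "proof" to show where each phase's inputs live:

```rust
// Phase 1: setup — run once per model, offline.
fn setup(model_hash: u64) -> u64 {
    model_hash ^ 0x00C0_FFEE
}

// Phase 2: prove — needs the setup params and the private witness;
// this is the part that runs as a standalone service per request.
fn prove(params: u64, witness: &[u64]) -> u64 {
    witness.iter().fold(params, |acc, &w| acc.rotate_left(7) ^ w)
}

// Phase 3: verify — here recomputed for illustration only; a real
// verifier is succinct and never sees the witness.
fn verify(params: u64, witness: &[u64], proof: u64) -> bool {
    prove(params, witness) == proof
}
```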
Combine Multiple PCS Claims
Polynomial commitment schemes (PCS) are foundational to proving systems. In zkML, our system naturally requires opening PCS commitments at multiple points. Previously, we padded polynomials before attempting to merge multiple claims. We've now optimized this with a dedicated sumcheck protocol that efficiently merges multilinear polynomials and their corresponding claims. The total overhead is now linear in the sum of the polynomial sizes, which is asymptotically optimal.
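The linear-cost folding at the heart of such a merge can be illustrated by evaluating a single multilinear polynomial one variable per round, as a sumcheck-style prover does. This is a sketch over a small prime field, not Expander's implementation:

```rust
/// Evaluate a multilinear polynomial, given by its evaluations on the
/// boolean hypercube (lowest index bit = first variable), at `point`,
/// working modulo the prime `p`. Each round halves the table, so the
/// total cost is O(n) in the table size.
fn eval_multilinear(evals: &[u64], point: &[u64], p: u64) -> u64 {
    assert_eq!(evals.len(), 1usize << point.len());
    let mut cur: Vec<u64> = evals.iter().map(|&x| x % p).collect();
    for &r in point {
        let half = cur.len() / 2;
        for i in 0..half {
            let (lo, hi) = (cur[2 * i] as u128, cur[2 * i + 1] as u128);
            let (r, pp) = ((r % p) as u128, p as u128);
            // (1 - r) * lo + r * hi  ==  lo + r * (hi - lo)   (mod p)
            cur[i] = ((lo + r * ((hi + pp - lo) % pp)) % pp) as u64;
        }
        cur.truncate(half);
    }
    cur[0]
}
```

Running this fold once per polynomial costs the sum of the individual table sizes, matching the linear total overhead described above.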
Summary
This update pushes our proving system closer to practical, scalable deployment, particularly for zkML. With improvements in shared memory, configurable SIMD and PCS handling, and reduced memory overhead, we’re enabling faster, more flexible proofs that can run even on personal devices. The clearer separation of proving phases also lays the foundation for service-oriented zkML applications.