Most of an LLM's compute is matrix multiply. At 1-bit, multiplies become XNOR + popcount — gates that cost a fraction of a femtojoule.
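To make the XNOR + popcount claim concrete, here is a minimal sketch of a 1-bit dot product. It assumes weights and activations are both restricted to +1/-1 and packed one bit per value. The helper names `pack_signs` and `binary_dot` are illustrative only and are not taken from our implementation.

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into an int; bit i is set when values[i] == +1."""
    bits = 0
    for i, v in enumerate(values):
        if v == 1:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n using XNOR + popcount."""
    mask = (1 << n) - 1
    # XNOR: a bit is 1 wherever the two signs agree (their product is +1).
    agree = ~(a_bits ^ b_bits) & mask
    # Popcount: number of agreeing positions.
    matches = bin(agree).count("1")
    # Each agreement contributes +1 and each disagreement -1.
    return 2 * matches - n


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
result = binary_dot(pack_signs(a), pack_signs(b), len(a))
print(result)  # same value as sum(x * y for x, y in zip(a, b))
```

No multiplier appears anywhere in the inner loop: the whole product reduces to one bitwise operation and one bit count per machine word.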
The model runs locally. No GPU clusters. No per-token API meter. No rate limits. Use AI as much as you want, with no hidden costs.
Designed to run on the silicon shipping today — ARM, Qualcomm, Apple Silicon — while future-proofing for custom 1-bit accelerators.
Perplexity is the standard measure of language-model quality; lower is better. In an internal feasibility study on an underparameterised model, our model beats BitNet by 6× and 2-bit QAT by 18× on perplexity, while using less memory than either. Current methods collapse on the task; our method retains a level of performance.
*In internal testing.
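For readers unfamiliar with the perplexity figures quoted above, the sketch below shows the standard definition: the exponential of the average negative log-likelihood per token. The function name `perplexity` and the sample values are illustrative assumptions, not data from our study.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities assigned by a model."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Average negative log-likelihood per token.
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# Hypothetical example: a model that gives every token probability 0.25
# is as uncertain as a uniform choice among 4 options.
uniform_guess = perplexity([math.log(0.25)] * 4)
print(uniform_guess)
```

A lower value means the model assigned higher probability to the text it was evaluated on.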