Zyphra’s NeuraNoC is a pioneering packet-switched network-on-chip (NoC), named for its routing mechanism, which resembles the spiking behavior of neurons in the brain: processor connections are encoded as Bernoulli processes.
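That one-line description is light on detail, so here is a purely hypothetical sketch of what "processor connections as Bernoulli processes" could mean in simulation; the topology, probabilities, and `step` function are our own illustration, not NeuraNoC's actual design:

```python
import random

def step(links: dict[tuple[int, int], float]) -> set[tuple[int, int]]:
    """Return the links that 'spike' (carry a packet) this routing cycle.

    `links` maps (src, dst) processor pairs to a firing probability p,
    so each connection behaves as an independent Bernoulli(p) process,
    loosely analogous to a neuron's probabilistic spiking.
    """
    return {edge for edge, p in links.items() if random.random() < p}

# Example: a 4-node ring with a uniform 0.3 firing probability per link.
ring = {(i, (i + 1) % 4): 0.3 for i in range(4)}
for cycle in range(3):
    print(f"cycle {cycle}: active links = {sorted(step(ring))}")
```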
Zyphra is excited to release Zamba2-mini, a state-of-the-art SLM for on-device applications. Zamba2-mini achieves highly competitive evaluation scores and performance numbers and fits in a tiny memory footprint of <700MB at 4-bit quantization. Zamba2-mini (1.2B) performs comparably to Llama2 7B.
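A back-of-envelope check of that footprint (our own arithmetic with an assumed metadata overhead, not Zyphra's published breakdown):

```python
params = 1.2e9        # Zamba2-mini parameter count
bits_per_param = 4    # 4-bit quantized weights
overhead = 1.10       # assumed ~10% for quantization scales/zero-points (illustrative)

weight_bytes = params * bits_per_param / 8 * overhead
print(f"~{weight_bytes / 2**20:.0f} MiB")  # ~629 MiB, consistent with the <700MB claim
```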
Training hybrid models is hard, and papers tend to gloss over the practical engineering work that goes into building good ones. The purpose of this cookbook is to enable other technical groups to hit the ground running when building their own hybrid (SSM, Transformer, MoE) models.
In this post, we discuss and illustrate the usefulness of graph-based RAG systems for multi-hop Question-Answering (QA) tasks. Multi-hop questions are those that require a chain of multiple retrieval steps to answer.
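As a toy example of that chain, consider "Where was the director of Inception born?": the answer requires resolving a bridge entity first. The tiny graph and `retrieve` helper below are hypothetical, just to show the hop structure:

```python
# A two-hop question answered by chaining lookups over a toy knowledge graph.
graph = {
    ("Inception", "directed_by"): "Christopher Nolan",
    ("Christopher Nolan", "born_in"): "London",
}

def retrieve(entity: str, relation: str) -> str:
    """Stand-in for one retrieval step (graph lookup, vector search, etc.)."""
    return graph[(entity, relation)]

director = retrieve("Inception", "directed_by")   # hop 1: find the bridge entity
birthplace = retrieve(director, "born_in")        # hop 2: query its attribute
print(birthplace)  # -> London
```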
This blog post discusses the key factors to consider when deploying models on edge devices. We emphasize the significant hardware constraints of these devices and identify techniques to efficiently utilize local hardware resources: quantization, low-rank adapters, and real-time parameter offloading from storage.
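To make the low-rank-adapter point concrete, here is the generic LoRA parameter arithmetic (illustrative layer sizes, not tied to any particular model):

```python
d_in, d_out, rank = 4096, 4096, 8

full_ft = d_in * d_out        # parameters updated by full fine-tuning of one weight matrix
lora = rank * (d_in + d_out)  # parameters in the low-rank factors A (r x d_in) and B (d_out x r)

print(f"full: {full_ft:,}  LoRA: {lora:,}  -> {full_ft // lora}x fewer trainable params")
# full: 16,777,216  LoRA: 65,536  -> 256x fewer trainable params
```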
We empirically study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs, finding minimal degradation of performance on different question-answering benchmarks until after a large fraction (up to half) of the layers are removed.
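Mechanically, the pruning itself is trivial; which layers to drop is the interesting question the study measures. A minimal sketch assuming a HuggingFace-style model whose decoder blocks live in an `nn.ModuleList` (attribute names are illustrative):

```python
import torch.nn as nn

def prune_layers(layers: nn.ModuleList, start: int, n_drop: int) -> nn.ModuleList:
    """Drop `n_drop` consecutive decoder blocks starting at index `start`."""
    kept = [blk for i, blk in enumerate(layers) if not (start <= i < start + n_drop)]
    return nn.ModuleList(kept)

# e.g. remove 8 of 32 blocks; per the finding above, QA accuracy tends to
# hold up until a large fraction (up to half) of the layers is removed.
# model.model.layers = prune_layers(model.model.layers, start=22, n_drop=8)
```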
Investors need to think outside the box when it comes to addressing artificial intelligence’s energy problem.
Zyphra is excited to announce Tree Attention, a novel method for efficiently parallelizing multi-GPU transformer decoding with significant advantages in speed and memory.
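The announcement doesn't restate the mechanism, but the key enabler for a tree-shaped multi-GPU reduction is that partial attention results combine associatively via their logsumexps. A NumPy paraphrase of that combine step (our sketch, not Zyphra's kernel):

```python
import numpy as np

def combine(o1, lse1, o2, lse2):
    """Merge two shards' partial attention outputs.

    Each shard contributes its locally normalized output `o` (queries x dim)
    and the logsumexp `lse` (per query) of its attention scores. Because this
    merge is associative, num_devices shards can be reduced in a tree of
    depth log2(num_devices) rather than sequentially.
    """
    m = np.maximum(lse1, lse2)                   # stabilize the exponentials
    w1, w2 = np.exp(lse1 - m), np.exp(lse2 - m)  # relative shard weights
    o = (w1[..., None] * o1 + w2[..., None] * o2) / (w1 + w2)[..., None]
    return o, m + np.log(w1 + w2)
```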
Effective retrieval from long-form conversational data faces two unique problems compared to static database retrieval.
A Fast and Efficient Small Language Model
A 1.3T-token language modeling dataset that Zyphra claims outperforms Pile, C4, and arXiv
An LLM training dataset with 1.3T tokens
An SSM-hybrid foundation model to bring AI to more devices
The Startup Tackling Karpathy’s Vision
A Novel Architecture that Combines the Mamba SSM with MoE to Obtain the Benefits of Both
Zyphra is excited to release Zamba2-small, a 2.7B state-of-the-art (SOTA) small language model for on-device applications.