Zyphra Blog

Explore our latest product and research announcements

Frontier Training Kernels for Transformers (FA2) and SSMs (Mamba2) on AMD Instinct MI300X Accelerators

In this blog, we demonstrate the first backward kernels for both transformers (Flash Attention v2) and hybrid models (Mamba2) to surpass H100 performance, enabling foundation-model training on AMD Instinct MI300X accelerators.

Category
Research
Publication Date
December 10, 2024
Authors
Quentin Anthony

Building Zyda-2, a 5 Trillion Token High-Quality Dataset, with NVIDIA NeMo Curator

Zyphra is excited to release Zyda-2, a 5-trillion-token dataset composed of filtered and cross-deduplicated DCLM, FineWeb-Edu, Zyda-1, and the Common Crawl portion of Dolma v1.7. Leveraging NVIDIA NeMo Curator, we accelerated data processing from 3 weeks to 2 days while reducing costs.

Category
Data
Publication Date
October 15, 2024
Authors
Zyphra & NVIDIA

Zamba2-7B

Zyphra is excited to release Zamba2-7B, a state-of-the-art small language model. At the 7B scale, it outperforms the leading models from Mistral, Google's Gemma, and Meta's Llama3 series in both quality and performance. We believe Zamba2-7B is the leading model for running on-device and on consumer GPUs, as well as for the many enterprise applications that require a powerful but compact and efficient model for natural-language tasks.

Category
Model
Publication Date
October 14, 2024
Authors
Zyphra Team

Reaching 1B Context Length with RAG

We demonstrate a retrieval system that extends any off-the-shelf LLM to a 1B (billion) token context on a standard CPU at inference time. These preliminary results suggest our algorithm is a promising approach for long-context tasks, especially in compute-constrained scenarios (on-device, cost-effective on-prem and cloud, etc.).

Category
Research
Publication Date
October 21, 2024
Authors
Nick Alonso, Beren Millidge

NeuraNoC - A neuroscience-inspired packet-switched network-on-chip (NoC)

Zyphra’s NeuraNoC is a pioneering packet-switched network-on-chip (NoC), named for its routing mechanism, which encodes processor connections as Bernoulli processes in a way that resembles the spiking behavior of neurons in the brain.

Category
Research
Publication Date
November 10, 2023
Authors
Tomas Figliolia

Zamba2-mini (1.2B)

Zyphra is excited to release Zamba2-mini, a state-of-the-art SLM for on-device applications. Zamba2-mini achieves highly competitive evaluation scores and performance numbers while fitting in a tiny memory footprint of under 700MB at 4-bit quantization; Zamba2-mini (1.2B) performs comparably to Llama2 7B. A minimal 4-bit loading sketch follows this entry.

Category
Model
Publication Date
August 27, 2024
Authors
Paolo Glorioso, Quentin Anthony, Yury Tokpanov, James Whittington, Jonathan Pilault, Beren Millidge
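As a rough illustration of the 4-bit quantization mentioned above, here is a minimal sketch of loading a small causal LM in 4-bit via Hugging Face transformers and bitsandbytes. The repository id and generation settings are illustrative assumptions, not Zyphra's official deployment recipe.

```python
# Minimal sketch: 4-bit weight quantization with transformers + bitsandbytes.
# The repo id below is an assumption for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Zyphra/Zamba2-1.2B"  # assumed repo id, for illustration

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit NF4 format
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)

inputs = tokenizer("The key advantage of on-device language models is", return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

NF4 storage with bf16 compute is a common default; actual on-device memory use depends on the runtime and context length.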

The Zyphra Training Cookbook

Training hybrid models is hard, and papers tend to gloss over the practical engineering work that goes into building good ones. The purpose of this cookbook is to enable other technical groups to hit the ground running when building their own hybrid (SSM, Transformer, MoE) models.

Category
Research
Publication Date
August 26, 2024
Authors
Quentin Anthony, Beren Millidge, Paolo Glorioso, and Yury Tokpanov

Understanding Graph-based RAG and Multi-Hop Question Answering

In this post, we discuss and illustrate the usefulness of graph-based RAG systems for multi-hop question-answering (QA) tasks. Multi-hop questions are those that require a chain of multiple retrieval steps to answer; a toy multi-hop retrieval sketch follows this entry.

Category
Research
Publication Date
August 22, 2024
Authors
Nick Alonso, Beren Millidge
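To make the multi-hop idea concrete, here is a toy sketch of chained retrieval over a tiny hand-made triple store. The entities, relations, and fixed hop count are all invented for illustration; this is not the graph-based RAG system described in the post.

```python
# Toy sketch of multi-hop retrieval over a small knowledge graph.
from collections import defaultdict

# (subject, relation, object) triples form the graph; all entries are made up.
triples = [
    ("Ada Lovelace", "wrote_notes_on", "Analytical Engine"),
    ("Analytical Engine", "designed_by", "Charles Babbage"),
    ("Charles Babbage", "born_in", "London"),
]

graph = defaultdict(list)
for s, r, o in triples:
    graph[s].append((r, o))

def multi_hop(start_entity, hops):
    """Follow `hops` retrieval steps from a starting entity, collecting the
    chain of facts needed to answer a multi-hop question."""
    facts, frontier = [], [start_entity]
    for _ in range(hops):
        next_frontier = []
        for entity in frontier:
            for relation, obj in graph.get(entity, []):
                facts.append((entity, relation, obj))
                next_frontier.append(obj)
        frontier = next_frontier
    return facts

# "Where was the designer of the machine Ada Lovelace annotated born?"
# requires three hops: Lovelace -> Analytical Engine -> Babbage -> London.
for fact in multi_hop("Ada Lovelace", hops=3):
    print(fact)
```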

Edge LLMs: Benefits, Challenges, and Solutions

This blog post discusses the key factors to consider when deploying models on edge devices. We emphasize the significant hardware constraints of these devices and identify techniques for efficiently utilizing local hardware resources: quantization, low-rank adapters (LoRA), and real-time parameter offloading from storage. A minimal LoRA sketch follows this entry.

Category
Research
Publication Date
August 21, 2024
Authors
Andrew Greene, Kamil Rocki, Tomas Figliolia, Travis Oliphant, Beren Millidge
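As a small illustration of one of the techniques listed above, the sketch below attaches low-rank adapters (LoRA) to a stand-in model with the Hugging Face peft library. The base model, target modules, and hyperparameters are assumptions chosen for brevity.

```python
# Minimal sketch: attaching LoRA adapters to a small causal LM with peft.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in base model

lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling applied to the LoRA update
    lora_dropout=0.05,
    target_modules=["c_attn"],  # attention projection in GPT-2; model-specific
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Because only the adapter parameters are trainable, the compute and storage overhead stays small, which is what makes adapters attractive on edge-class hardware.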

The Unreasonable Ineffectiveness of the Deeper Layers

We empirically study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs, finding minimal degradation of performance on different question-answering benchmarks until after a large fraction (up to half) of the layers are removed. A minimal sketch of the pruning operation follows this entry.

Category
Research
Publication Date
March 28, 2024
Authors
Paolo Glorioso, Beren Millidge
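For intuition, here is a minimal sketch of what dropping a block of layers looks like in code, using a small stand-in model from Hugging Face transformers. The model choice and the pruned range are arbitrary assumptions; the post itself covers how the block to remove is actually selected for the models studied.

```python
# Illustrative sketch: drop a contiguous block of transformer layers and keep the rest.
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in small model

layers = model.transformer.h   # GPT-2 stores its blocks in `transformer.h`
drop_start, drop_end = 6, 9    # drop layers [6, 9) as an arbitrary example

kept = [layer for i, layer in enumerate(layers) if not (drop_start <= i < drop_end)]
model.transformer.h = nn.ModuleList(kept)
model.config.n_layer = len(kept)  # keep the config consistent with the new depth

print(f"Pruned model now has {len(model.transformer.h)} layers")
```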

How investors can help solve AI’s energy problem

Investors need to think outside the box when it comes to addressing artificial intelligence’s energy problem.

Category
Press
Publication Date
August 14, 2024
Authors
Cipher

Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters

Zyphra is excited to announce Tree Attention, a novel method for efficiently parallelizing multi-GPU transformer decoding with significant advantages in speed and memory. A sketch of the exact reduction underlying the method follows this entry.

Category
Research
Publication Date
August 7, 2024
Authors
Vasudev Shyam, Jonathan Pilault, Emily Shepperd, Quentin Anthony, Beren Millidge
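The core trick that makes a tree-style reduction possible is that softmax attention over a long context can be computed from per-shard partial statistics and merged exactly. The sketch below demonstrates that reduction for a single query with two key/value shards; the shapes and two-shard split are illustrative assumptions, and the actual method maps this reduction onto a tree of GPUs.

```python
# Sketch: exact combination of per-shard attention statistics (single query).
import torch

def partial_attention(q, k, v):
    """Attention statistics for one shard of keys/values."""
    scores = q @ k.T / q.shape[-1] ** 0.5   # (num_keys,)
    m = scores.max()
    w = torch.exp(scores - m)
    return m, w @ v, w.sum()                # shard max, numerator, denominator

def combine(parts):
    """Exactly merge per-shard statistics into the full softmax-attention output."""
    m = max(p[0] for p in parts)
    num = sum(torch.exp(p[0] - m) * p[1] for p in parts)
    den = sum(torch.exp(p[0] - m) * p[2] for p in parts)
    return num / den

torch.manual_seed(0)
d, n = 16, 128
q, k, v = torch.randn(d), torch.randn(n, d), torch.randn(n, d)

# Split the context into two shards, reduce, and check against full attention.
parts = [partial_attention(q, k[:64], v[:64]), partial_attention(q, k[64:], v[64:])]
full = torch.softmax(q @ k.T / d ** 0.5, dim=-1) @ v
print(torch.allclose(combine(parts), full, atol=1e-5))  # True
```

Because the merge is associative, the partial results can be combined pairwise up a tree rather than gathered on one device, which is where the speed and memory advantages come from.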

Toward Conversational Agents with Context and Time Sensitive Long-term Memory

Effective retrieval from long-form conversational data faces two unique problems compared to static database retrieval.

Category
Research
Publication Date
June 4, 2024
Authors
Nick Alonso, Tomás Figliolia, Anthony Ndirango, Beren Millidge

Zyphra Launches Zamba2

An Efficient And Faster Small Language Model

Category
Press
Publication Date
July 31, 2024
Authors
Forbes

Zyphra's Zyda

A 1.3T-token language modeling dataset that it claims outperforms the Pile, C4, and arXiv

Category
Press
Publication Date
June 7, 2024
Authors
VentureBeat

Zyphra Debuts Zyda

An LLM training dataset with 1.3T tokens

Category
Press
Publication Date
June 7, 2024
Authors
SiliconAngle

Zyphra Releases Zamba

An SSM-hybrid foundation model to bring AI to more devices

Category
Press
Publication Date
April 16, 2024
Authors
VentureBeat

What To Make of OpenAI’s New Board

The Startup Tackling Karpathy’s Vision

Category
Press
Publication Date
March 11, 2024
Authors
TheInformation

Zyphra Open-Sources BlackMamba

A Novel Architecture that Combines the Mamba SSM with MoE to Obtain the Benefits of Both

Category
Press
Publication Date
February 6, 2024
Authors
MarkTechPost

Zamba2-small (2.7B)

Zyphra is excited to release Zamba2-small, a 2.7B state-of-the-art (SOTA) small language model for on-device applications.

Category
Model
Publication Date
July 28, 2024
Authors
Paolo Glorioso, Quentin Anthony, Yury Tokpanov, James Whittington, Jonathan Pilault, Beren Millidge

Zyda

Zyphra is pleased to announce Zyda, a 1.3-trillion-token open dataset for language modeling.

Category
Data
Publication Date
June 7, 2024
Authors
Yury Tokpanov, Beren Millidge, Paolo Glorioso, Jonathan Pilault, James Whittington, Quentin Anthony

Zamba

Zyphra is proud to release Zamba, a novel 7B parameter foundation model.

Category
Model
Publication Date
April 16, 2024
Authors
Paolo Glorioso, Quentin Anthony, Yury Tokpanov, James Whittington, Jonathan Pilault, Adam Ibrahim, Beren Millidge