Joint collaboration between Zyphra, AMD, and IBM delivers ZAYA1, the first large-scale Mixture-of-Experts foundation model trained entirely on an AMD platform using AMD Instinct MI300X GPUs, AMD Pollara networking, and ROCm software.
Zyphra today announced a major milestone in its AI infrastructure and model development with the release of a technical report demonstrating large-scale model training on AMD GPUs and networking.
The paper introduces ZAYA1, the first large-scale Mixture-of-Experts (MoE) foundation model trained entirely on an integrated AMD platform (AMD Instinct™ GPUs, AMD Pensando™ networking interconnect, and the ROCm software stack), establishing that platform as a viable, high-performance, production-ready alternative for frontier-scale AI training.
Despite activating only a fraction of its parameters per token, ZAYA1-base (8.3B total parameters, 760M active) achieves performance comparable to leading models such as Qwen3-4B (Alibaba) and Gemma3-12B (Google), and outperforms models including Llama-3-8B (Meta) and OLMoE across reasoning, mathematics, and coding benchmarks.
“Efficiency has always been a core guiding principle at Zyphra. It shapes how we design model architectures, develop algorithms for training and inference, and choose the hardware with the best price-performance to deliver frontier intelligence to our customers,” said Krithik Puthalath, CEO of Zyphra. “ZAYA1 reflects this philosophy and we are thrilled to be the first company to demonstrate large-scale training on an AMD platform. Our results highlight the power of co-designing model architectures with silicon and systems, and we’re excited to deepen our collaboration with AMD and IBM as we build the next generation of advanced multimodal foundation models.”
Mixture-of-Experts (MoE) models have become the foundational architecture for modern frontier AI systems, using specialized expert networks that activate dynamically to deliver greater efficiency, scalability, and reasoning performance than traditional dense architectures. This paradigm shift defines today’s leading frontier models, including GPT-5, Claude 4.5, DeepSeek-V3, and Kimi K2, all of which leverage MoE designs to expand capability while optimizing compute utilization. ZAYA1 represents the first large-scale pretraining of an MoE model on an AMD platform, demonstrating that the AMD AI ecosystem is ready to power frontier-class AI development end-to-end.
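The dynamic-activation idea behind MoE can be sketched with a toy top-k router: a learned gate scores every expert for each token, but only the highest-scoring few actually run, which is why a model's active parameter count can be far below its total. This is a minimal illustrative sketch only; ZAYA1's actual routing architecture is more sophisticated, and the expert functions and weights here are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, router_weights, top_k=2):
    # Router scores each expert for this token (one logit per expert).
    logits = [sum(wi * xi for wi, xi in zip(w, x)) for w in router_weights]
    probs = softmax(logits)
    # Only the top-k experts run; the rest stay inactive, which is why
    # active parameters can be far smaller than total parameters.
    chosen = sorted(range(len(experts)), key=lambda i: -probs[i])[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        y = experts[i](x)  # each expert is just a function of the token
        out = [o + (probs[i] / norm) * yi for o, yi in zip(out, y)]
    return out, chosen

# Toy setup: four "experts" that merely scale the input by different factors,
# and an identity router so expert i is scored by coordinate i of the token.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
out, chosen = moe_forward([1.0, 0.0, 0.0, 0.0], experts, identity, top_k=2)
```

With this input, only experts 0 and 1 execute; the other two contribute no compute at all, mirroring how an MoE model keeps per-token cost low while total capacity stays large.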
Zyphra co-designed ZAYA1 around AMD silicon, introducing innovations such as an advanced routing architecture, Compressed Convolutional Attention (CCA), and lightweight residual scaling to achieve higher training throughput and more efficient inference through improved expert utilization.
“Zyphra’s work with AMD and IBM demonstrates how an open platform built on AMD Instinct GPUs and AMD Pensando networking can deliver breakthrough performance and efficiency for large-scale AI,” said Philip Guido, EVP and Chief Commercial Officer, AMD. “This milestone underscores how AMD hardware and software innovations are enabling the next wave of frontier AI development with industry leaders.”
Building on prior collaborative work, Zyphra worked closely with AMD and IBM to design and deploy a large-scale training cluster powered by AMD Instinct GPUs with an AMD Pensando Ethernet interconnect. The jointly engineered AMD and IBM cluster, announced earlier this quarter, combines AMD Instinct MI300X GPUs with IBM Cloud’s high-performance fabric and storage architecture, providing the foundation for ZAYA1’s large-scale pretraining.
“As AI creates opportunities for enterprises to innovate, foundation models are key to unlocking accelerated development, efficiency and productivity,” said Alan Peacock, GM of IBM Cloud. “We are proud to deliver IBM’s scalable AI infrastructure as the foundation for ZAYA1’s large-scale model and are excited to continue collaborating with AMD on AI model development across our mutual clients.”
The collaboration demonstrates how Zyphra’s advanced AI research and optimized software stack, combined with AMD’s platform and IBM Cloud’s infrastructure, can deliver the performance needed for reliable frontier-scale AI model development.
For more information, please see the technical report on arXiv, the Zyphra technical blog post, and the AMD blog post.

Zyphra is a full-stack, open-source superintelligence company based in San Francisco on a mission to build human-aligned AGI.
Zyphra’s core research thesis toward general superintelligence is focused on developing next-generation multimodal architectures for long-context reasoning, long-term memory, and continual learning.

For more than 55 years AMD has driven innovation in high-performance computing, graphics and visualization technologies. Billions of people, leading Fortune 500 businesses and cutting-edge scientific research institutions around the world rely on AMD technology daily to improve how they live, work and play. AMD employees are focused on building leadership high-performance and adaptive products that push the boundaries of what is possible. For more information about how AMD is enabling today and inspiring tomorrow, visit the AMD (NASDAQ: AMD) website, blog, LinkedIn, Facebook and X pages.

IBM is a leading provider of global hybrid cloud and AI, and consulting expertise. We help clients in more than 175 countries capitalize on insights from their data, streamline business processes, reduce costs and gain a competitive edge in their industries. Thousands of governments and corporate entities in critical infrastructure areas such as financial services, telecommunications and healthcare rely on IBM's hybrid cloud platform and Red Hat OpenShift to effect their digital transformations quickly, efficiently and securely. IBM's breakthrough innovations in AI, quantum computing, industry-specific cloud solutions and consulting deliver open and flexible options to our clients. All of this is backed by IBM's long-standing commitment to trust, transparency, responsibility, inclusivity and service. Visit www.ibm.com for more information.
IBM is a leading provider of global hybrid cloud and AI, and consulting expertise. We help clients in more than 175 countries capitalize on insights from their data, streamline business processes, reduce costs and gain a competitive edge in their industries. Thousands of governments and corporate entities in critical infrastructure areas such as financial services, telecommunications and healthcare rely on IBM's hybrid cloud platform and Red Hat OpenShift to affect their digital transformations quickly, efficiently and securely. IBM's breakthrough innovations in AI, quantum computing, industry-specific cloud solutions and consulting deliver open and flexible options to our clients. All of this is backed by IBM's long-standing commitment to trust, transparency, responsibility, inclusivity and service. Visit www.ibm.com for more information.
Zyphra today announced a major milestone in its AI infrastructure and model development with the release of a technical report showing how Zyphra has demonstrated large scale training on AMD GPUs and networking.
The paper introduces ZAYA1, the first large-scale Mixture-of-Experts (MoE) foundation model trained entirely on an integrated AMD platform (AMD Instinct™ GPUs, AMD Pensando™ networking interconnect & ROCm software stack) as a viable high-performance, production-ready alternative platform for frontier-scale AI training.
Despite operating at a fraction of the active parameter count, ZAYA1-base (8.3B total parameters, 760m active) achieves performance comparable to leading models such as Qwen3-4B (Alibaba) and Gemma3-12B (Google), and outperforms models including Llama-3-8B (Meta) and OLMoE across reasoning, mathematics, and coding benchmarks.
“Efficiency has always been a core guiding principle at Zyphra. It shapes how we design model architectures, develop algorithms for training and inference, and choose the hardware with the best price-performance to deliver frontier intelligence to our customers,” said Krithik Puthalath, CEO of Zyphra. “ZAYA1 reflects this philosophy and we are thrilled to be the first company to demonstrate large-scale training on an AMD platform. Our results highlight the power of co-designing model architectures with silicon and systems, and we’re excited to deepen our collaboration with AMD and IBM as we build the next generation of advanced multimodal foundation models.”
Mixture-of-Experts (MoE) models have become the foundational architecture for modern, frontier AI systems, using specialized expert networks that activate dynamically to deliver greater efficiency, scalability, and reasoning performance than traditional dense architectures. This paradigm shift defines today’s leading frontier models including GPT-5, Claude-4.5 DeepSeek-V3 and Kimi2 all of which leverage MoE designs to expand capability while optimizing compute utilization. ZAYA1 represents the first large-scale pretraining of an MoE model on an AMD platform, demonstrating that the AMD AI ecosystem is ready to power frontier-class AI development end-to-end.
Zyphra co-designed ZAYA1 around AMD silicon, introducing innovations such as an advanced routing architecture, Compressed Convolutional Attention (CCA), and lightweight residual scaling to achieve higher training throughput and more efficient inference through improved expert utilization.
“Zyphra’s work with AMD and IBM demonstrates how an open platform built on AMD Instinct GPUs and AMD Pensando networking can deliver breakthrough performance and efficiency for large-scale AI,” said Philip Guido, EVP and Chief Commercial Officer, AMD. “This milestone underscores how AMD hardware and software innovations are enabling the next wave of frontier AI development with industry leaders.”
Building on prior collaborative work and to achieve this milestone, Zyphra collaborated closely with AMD and IBM to design and deploy a large-scale training cluster powered by AMD Instinct GPUs with AMD Pensando networking (ethernet) interconnect. The jointly engineered AMD and IBM cluster announced earlier this quarter, combines AMD Instinct MI300X GPUs with IBM Cloud’s high-performance fabric and storage architecture providing the foundation for ZAYA1’s large-scale pretraining.
“As AI creates opportunities for enterprises to innovate, foundation models are key to unlocking accelerated development, efficiency and productivity,” said Alan Peacock, GM of IBM Cloud. "We are proud to deliver IBM’s scalable AI infrastructure as the foundation for ZAYA1’s large-scale model and are excited to continue collaborating with AMD on AI model development across our mutual clients.”
The joint collaboration demonstrates how Zyphra’s advanced AI research and optimized software stack, combined with AMD’s platform powered by IBM’s infrastructure through IBM Cloud can deliver the performance needed for reliable frontier-scale AI model development.
For more information please reference the technical report on arXiv, the Zyphra technical blog post, and AMD blog post.

IBM is a leading provider of global hybrid cloud and AI, and consulting expertise. We help clients in more than 175 countries capitalize on insights from their data, streamline business processes, reduce costs and gain a competitive edge in their industries. Thousands of governments and corporate entities in critical infrastructure areas such as financial services, telecommunications and healthcare rely on IBM's hybrid cloud platform and Red Hat OpenShift to affect their digital transformations quickly, efficiently and securely. IBM's breakthrough innovations in AI, quantum computing, industry-specific cloud solutions and consulting deliver open and flexible options to our clients. All of this is backed by IBM's long-standing commitment to trust, transparency, responsibility, inclusivity and service. Visit www.ibm.com for more information.
Zyphra is a full-stack, open-source superintelligence company based in San Francisco on a mission to build human-aligned AGI.
Zyphra’s core research thesis toward general superintelligence is focused on developing next-generation multimodal architectures for long-context reasoning, long-term memory, and continual learning.

IBM is a leading provider of global hybrid cloud and AI, and consulting expertise. We help clients in more than 175 countries capitalize on insights from their data, streamline business processes, reduce costs and gain a competitive edge in their industries. Thousands of governments and corporate entities in critical infrastructure areas such as financial services, telecommunications and healthcare rely on IBM's hybrid cloud platform and Red Hat OpenShift to affect their digital transformations quickly, efficiently and securely. IBM's breakthrough innovations in AI, quantum computing, industry-specific cloud solutions and consulting deliver open and flexible options to our clients. All of this is backed by IBM's long-standing commitment to trust, transparency, responsibility, inclusivity and service. Visit www.ibm.com for more information.

IBM is a leading provider of global hybrid cloud and AI, and consulting expertise. We help clients in more than 175 countries capitalize on insights from their data, streamline business processes, reduce costs and gain a competitive edge in their industries. Thousands of governments and corporate entities in critical infrastructure areas such as financial services, telecommunications and healthcare rely on IBM's hybrid cloud platform and Red Hat OpenShift to affect their digital transformations quickly, efficiently and securely. IBM's breakthrough innovations in AI, quantum computing, industry-specific cloud solutions and consulting deliver open and flexible options to our clients. All of this is backed by IBM's long-standing commitment to trust, transparency, responsibility, inclusivity and service. Visit www.ibm.com for more information.

IBM is a leading provider of global hybrid cloud and AI, and consulting expertise. We help clients in more than 175 countries capitalize on insights from their data, streamline business processes, reduce costs and gain a competitive edge in their industries. Thousands of governments and corporate entities in critical infrastructure areas such as financial services, telecommunications and healthcare rely on IBM's hybrid cloud platform and Red Hat OpenShift to affect their digital transformations quickly, efficiently and securely. IBM's breakthrough innovations in AI, quantum computing, industry-specific cloud solutions and consulting deliver open and flexible options to our clients. All of this is backed by IBM's long-standing commitment to trust, transparency, responsibility, inclusivity and service. Visit www.ibm.com for more information.
Zyphra today announced a major milestone in its AI infrastructure and model development with the release of a technical report showing how Zyphra has demonstrated large scale training on AMD GPUs and networking.
The paper introduces ZAYA1, the first large-scale Mixture-of-Experts (MoE) foundation model trained entirely on an integrated AMD platform (AMD Instinct™ GPUs, AMD Pensando™ networking interconnect & ROCm software stack) as a viable high-performance, production-ready alternative platform for frontier-scale AI training.
Despite operating at a fraction of the active parameter count, ZAYA1-base (8.3B total parameters, 760m active) achieves performance comparable to leading models such as Qwen3-4B (Alibaba) and Gemma3-12B (Google), and outperforms models including Llama-3-8B (Meta) and OLMoE across reasoning, mathematics, and coding benchmarks.
“Efficiency has always been a core guiding principle at Zyphra. It shapes how we design model architectures, develop algorithms for training and inference, and choose the hardware with the best price-performance to deliver frontier intelligence to our customers,” said Krithik Puthalath, CEO of Zyphra. “ZAYA1 reflects this philosophy and we are thrilled to be the first company to demonstrate large-scale training on an AMD platform. Our results highlight the power of co-designing model architectures with silicon and systems, and we’re excited to deepen our collaboration with AMD and IBM as we build the next generation of advanced multimodal foundation models.”
Mixture-of-Experts (MoE) models have become the foundational architecture for modern, frontier AI systems, using specialized expert networks that activate dynamically to deliver greater efficiency, scalability, and reasoning performance than traditional dense architectures. This paradigm shift defines today’s leading frontier models including GPT-5, Claude-4.5 DeepSeek-V3 and Kimi2 all of which leverage MoE designs to expand capability while optimizing compute utilization. ZAYA1 represents the first large-scale pretraining of an MoE model on an AMD platform, demonstrating that the AMD AI ecosystem is ready to power frontier-class AI development end-to-end.
Zyphra co-designed ZAYA1 around AMD silicon, introducing innovations such as an advanced routing architecture, Compressed Convolutional Attention (CCA), and lightweight residual scaling to achieve higher training throughput and more efficient inference through improved expert utilization.
“Zyphra’s work with AMD and IBM demonstrates how an open platform built on AMD Instinct GPUs and AMD Pensando networking can deliver breakthrough performance and efficiency for large-scale AI,” said Philip Guido, EVP and Chief Commercial Officer, AMD. “This milestone underscores how AMD hardware and software innovations are enabling the next wave of frontier AI development with industry leaders.”
Building on prior collaborative work and to achieve this milestone, Zyphra collaborated closely with AMD and IBM to design and deploy a large-scale training cluster powered by AMD Instinct GPUs with AMD Pensando networking (ethernet) interconnect. The jointly engineered AMD and IBM cluster announced earlier this quarter, combines AMD Instinct MI300X GPUs with IBM Cloud’s high-performance fabric and storage architecture providing the foundation for ZAYA1’s large-scale pretraining.
“As AI creates opportunities for enterprises to innovate, foundation models are key to unlocking accelerated development, efficiency and productivity,” said Alan Peacock, GM of IBM Cloud. "We are proud to deliver IBM’s scalable AI infrastructure as the foundation for ZAYA1’s large-scale model and are excited to continue collaborating with AMD on AI model development across our mutual clients.”
The joint collaboration demonstrates how Zyphra’s advanced AI research and optimized software stack, combined with AMD’s platform powered by IBM’s infrastructure through IBM Cloud can deliver the performance needed for reliable frontier-scale AI model development.
For more information please reference the technical report on arXiv, the Zyphra technical blog post, and AMD blog post.

Zyphra is a full-stack, open-source superintelligence company based in San Francisco on a mission to build human-aligned AGI.
Zyphra’s core research thesis toward general superintelligence is focused on developing next-generation multimodal architectures for long-context reasoning, long-term memory, and continual learning.
For more than 55 years AMD has driven innovation in high-performance computing, graphics and visualization technologies. Billions of people, leading Fortune 500 businesses and cutting edge scientific research institutions around the world rely on AMD technology daily to improve how they live, work and play. AMD employees are focused on building leadership high- 3 performance and adaptive products that push the boundaries of what is possible. For more information about how AMD is enabling today and inspiring tomorrow, visit the AMD (NASDAQ: AMD) website, blog, LinkedIn, Facebook and X pages.

IBM is a leading provider of global hybrid cloud and AI, and consulting expertise. We help clients in more than 175 countries capitalize on insights from their data, streamline business processes, reduce costs and gain a competitive edge in their industries. Thousands of governments and corporate entities in critical infrastructure areas such as financial services, telecommunications and healthcare rely on IBM's hybrid cloud platform and Red Hat OpenShift to affect their digital transformations quickly, efficiently and securely. IBM's breakthrough innovations in AI, quantum computing, industry-specific cloud solutions and consulting deliver open and flexible options to our clients. All of this is backed by IBM's long-standing commitment to trust, transparency, responsibility, inclusivity and service. Visit www.ibm.com for more information.


Zyphra today announced a major milestone in its AI infrastructure and model development with the release of a technical report showing how Zyphra has demonstrated large scale training on AMD GPUs and networking.
The paper introduces ZAYA1, the first large-scale Mixture-of-Experts (MoE) foundation model trained entirely on an integrated AMD platform (AMD Instinct™ GPUs, AMD Pensando™ networking interconnect & ROCm software stack) as a viable high-performance, production-ready alternative platform for frontier-scale AI training.
Despite operating at a fraction of the active parameter count, ZAYA1-base (8.3B total parameters, 760m active) achieves performance comparable to leading models such as Qwen3-4B (Alibaba) and Gemma3-12B (Google), and outperforms models including Llama-3-8B (Meta) and OLMoE across reasoning, mathematics, and coding benchmarks.
“Efficiency has always been a core guiding principle at Zyphra. It shapes how we design model architectures, develop algorithms for training and inference, and choose the hardware with the best price-performance to deliver frontier intelligence to our customers,” said Krithik Puthalath, CEO of Zyphra. “ZAYA1 reflects this philosophy and we are thrilled to be the first company to demonstrate large-scale training on an AMD platform. Our results highlight the power of co-designing model architectures with silicon and systems, and we’re excited to deepen our collaboration with AMD and IBM as we build the next generation of advanced multimodal foundation models.”
Mixture-of-Experts (MoE) models have become the foundational architecture for modern, frontier AI systems, using specialized expert networks that activate dynamically to deliver greater efficiency, scalability, and reasoning performance than traditional dense architectures. This paradigm shift defines today’s leading frontier models including GPT-5, Claude-4.5 DeepSeek-V3 and Kimi2 all of which leverage MoE designs to expand capability while optimizing compute utilization. ZAYA1 represents the first large-scale pretraining of an MoE model on an AMD platform, demonstrating that the AMD AI ecosystem is ready to power frontier-class AI development end-to-end.
Zyphra co-designed ZAYA1 around AMD silicon, introducing innovations such as an advanced routing architecture, Compressed Convolutional Attention (CCA), and lightweight residual scaling to achieve higher training throughput and more efficient inference through improved expert utilization.
“Zyphra’s work with AMD and IBM demonstrates how an open platform built on AMD Instinct GPUs and AMD Pensando networking can deliver breakthrough performance and efficiency for large-scale AI,” said Philip Guido, EVP and Chief Commercial Officer, AMD. “This milestone underscores how AMD hardware and software innovations are enabling the next wave of frontier AI development with industry leaders.”
Building on prior collaborative work and to achieve this milestone, Zyphra collaborated closely with AMD and IBM to design and deploy a large-scale training cluster powered by AMD Instinct GPUs with AMD Pensando networking (ethernet) interconnect. The jointly engineered AMD and IBM cluster announced earlier this quarter, combines AMD Instinct MI300X GPUs with IBM Cloud’s high-performance fabric and storage architecture providing the foundation for ZAYA1’s large-scale pretraining.
“As AI creates opportunities for enterprises to innovate, foundation models are key to unlocking accelerated development, efficiency and productivity,” said Alan Peacock, GM of IBM Cloud. "We are proud to deliver IBM’s scalable AI infrastructure as the foundation for ZAYA1’s large-scale model and are excited to continue collaborating with AMD on AI model development across our mutual clients.”
The joint collaboration demonstrates how Zyphra’s advanced AI research and optimized software stack, combined with AMD’s platform powered by IBM’s infrastructure through IBM Cloud can deliver the performance needed for reliable frontier-scale AI model development.
For more information please reference the technical report on arXiv, the Zyphra technical blog post, and AMD blog post.
Zyphra is a full-stack, open-source superintelligence company based in San Francisco on a mission to build human-aligned AGI.
Zyphra’s core research thesis toward general superintelligence is focused on developing next-generation multimodal architectures for long-context reasoning, long-term memory, and continual learning.
For more than 55 years AMD has driven innovation in high-performance computing, graphics and visualization technologies. Billions of people, leading Fortune 500 businesses and cutting edge scientific research institutions around the world rely on AMD technology daily to improve how they live, work and play. AMD employees are focused on building leadership high- 3 performance and adaptive products that push the boundaries of what is possible. For more information about how AMD is enabling today and inspiring tomorrow, visit the AMD (NASDAQ: AMD) website, blog, LinkedIn, Facebook and X pages.
IBM is a leading provider of global hybrid cloud and AI, and consulting expertise. We help clients in more than 175 countries capitalize on insights from their data, streamline business processes, reduce costs and gain a competitive edge in their industries. Thousands of governments and corporate entities in critical infrastructure areas such as financial services, telecommunications and healthcare rely on IBM's hybrid cloud platform and Red Hat OpenShift to affect their digital transformations quickly, efficiently and securely. IBM's breakthrough innovations in AI, quantum computing, industry-specific cloud solutions and consulting deliver open and flexible options to our clients. All of this is backed by IBM's long-standing commitment to trust, transparency, responsibility, inclusivity and service. Visit www.ibm.com for more information.
Zyphra co-designed ZAYA1 around AMD silicon, introducing innovations such as an advanced routing architecture, Compressed Convolutional Attention (CCA), and lightweight residual scaling to achieve higher training throughput and more efficient inference through improved expert utilization.
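To make the Mixture-of-Experts idea concrete, the sketch below shows generic top-k expert routing, the mechanism that lets an MoE model activate only a small fraction of its parameters per token. This is an illustrative toy in NumPy; the expert count, dimensions, and k are arbitrary placeholders and do not reflect ZAYA1's actual routing architecture or configuration.

```python
import numpy as np

# Toy top-k Mixture-of-Experts layer. Each token is scored against every
# expert by a router, but only the k highest-scoring experts actually run.
rng = np.random.default_rng(0)

n_experts, d_model, k = 8, 16, 2
router_w = rng.standard_normal((d_model, n_experts))           # router projection
expert_w = rng.standard_normal((n_experts, d_model, d_model))  # one (simplified) expert per slot

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts and gate-mix their outputs."""
    logits = x @ router_w                        # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -k:]    # indices of the k best experts per token
    sel = np.take_along_axis(logits, top, axis=-1)
    gates = np.exp(sel - sel.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)        # softmax over selected experts only
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                  # per-token dispatch, written for clarity
        for j in range(k):
            e = top[t, j]
            out[t] += gates[t, j] * (x[t] @ expert_w[e])
    return out

tokens = rng.standard_normal((4, d_model))
y = moe_layer(tokens)
print(y.shape)  # (4, 16): each token used only 2 of the 8 experts
```

Because only k of the n experts execute per token, total parameter count can grow far beyond the active parameter count, which is how a model like ZAYA1-base can hold 8.3B total parameters while activating roughly 760M per token.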
“Zyphra’s work with AMD and IBM demonstrates how an open platform built on AMD Instinct GPUs and AMD Pensando networking can deliver breakthrough performance and efficiency for large-scale AI,” said Philip Guido, EVP and Chief Commercial Officer, AMD. “This milestone underscores how AMD hardware and software innovations are enabling the next wave of frontier AI development with industry leaders.”
Building on prior collaborative work, Zyphra worked closely with AMD and IBM to design and deploy a large-scale training cluster powered by AMD Instinct GPUs with an AMD Pensando Ethernet networking interconnect. The jointly engineered AMD and IBM cluster, announced earlier this quarter, combines AMD Instinct MI300X GPUs with IBM Cloud’s high-performance fabric and storage architecture, providing the foundation for ZAYA1’s large-scale pretraining.
“As AI creates opportunities for enterprises to innovate, foundation models are key to unlocking accelerated development, efficiency and productivity,” said Alan Peacock, GM of IBM Cloud. “We are proud to deliver IBM’s scalable AI infrastructure as the foundation for ZAYA1’s large-scale model and are excited to continue collaborating with AMD on AI model development across our mutual clients.”
The collaboration demonstrates how Zyphra’s advanced AI research and optimized software stack, combined with AMD’s compute platform running on IBM Cloud infrastructure, can deliver the performance needed for reliable frontier-scale AI model development.
For more information, please see the technical report on arXiv, the Zyphra technical blog post, and the AMD blog post.
Zyphra is a full-stack, open-source superintelligence company based in San Francisco on a mission to build human-aligned AGI.
Zyphra’s core research thesis toward general superintelligence is focused on developing next-generation multimodal architectures for long-context reasoning, long-term memory, and continual learning.
For more than 55 years AMD has driven innovation in high-performance computing, graphics and visualization technologies. Billions of people, leading Fortune 500 businesses and cutting-edge scientific research institutions around the world rely on AMD technology daily to improve how they live, work and play. AMD employees are focused on building leadership high-performance and adaptive products that push the boundaries of what is possible. For more information about how AMD is enabling today and inspiring tomorrow, visit the AMD (NASDAQ: AMD) website, blog, LinkedIn, Facebook and X pages.
IBM is a leading provider of global hybrid cloud and AI, and consulting expertise. We help clients in more than 175 countries capitalize on insights from their data, streamline business processes, reduce costs and gain a competitive edge in their industries. Thousands of governments and corporate entities in critical infrastructure areas such as financial services, telecommunications and healthcare rely on IBM's hybrid cloud platform and Red Hat OpenShift to effect their digital transformations quickly, efficiently and securely. IBM's breakthrough innovations in AI, quantum computing, industry-specific cloud solutions and consulting deliver open and flexible options to our clients. All of this is backed by IBM's long-standing commitment to trust, transparency, responsibility, inclusivity and service. Visit www.ibm.com for more information.
Zyphra today announced a major milestone in its AI infrastructure and model development with the release of a technical report showing how Zyphra has demonstrated large scale training on AMD GPUs and networking.
The paper introduces ZAYA1, the first large-scale Mixture-of-Experts (MoE) foundation model trained entirely on an integrated AMD platform (AMD Instinct™ GPUs, AMD Pensando™ networking interconnect & ROCm software stack) as a viable high-performance, production-ready alternative platform for frontier-scale AI training.
Despite operating at a fraction of the active parameter count, ZAYA1-base (8.3B total parameters, 760m active) achieves performance comparable to leading models such as Qwen3-4B (Alibaba) and Gemma3-12B (Google), and outperforms models including Llama-3-8B (Meta) and OLMoE across reasoning, mathematics, and coding benchmarks.
“Efficiency has always been a core guiding principle at Zyphra. It shapes how we design model architectures, develop algorithms for training and inference, and choose the hardware with the best price-performance to deliver frontier intelligence to our customers,” said Krithik Puthalath, CEO of Zyphra. “ZAYA1 reflects this philosophy and we are thrilled to be the first company to demonstrate large-scale training on an AMD platform. Our results highlight the power of co-designing model architectures with silicon and systems, and we’re excited to deepen our collaboration with AMD and IBM as we build the next generation of advanced multimodal foundation models.”
Mixture-of-Experts (MoE) models have become the foundational architecture for modern, frontier AI systems, using specialized expert networks that activate dynamically to deliver greater efficiency, scalability, and reasoning performance than traditional dense architectures. This paradigm shift defines today’s leading frontier models including GPT-5, Claude-4.5 DeepSeek-V3 and Kimi2 all of which leverage MoE designs to expand capability while optimizing compute utilization. ZAYA1 represents the first large-scale pretraining of an MoE model on an AMD platform, demonstrating that the AMD AI ecosystem is ready to power frontier-class AI development end-to-end.
Zyphra co-designed ZAYA1 around AMD silicon, introducing innovations such as an advanced routing architecture, Compressed Convolutional Attention (CCA), and lightweight residual scaling to achieve higher training throughput and more efficient inference through improved expert utilization.
“Zyphra’s work with AMD and IBM demonstrates how an open platform built on AMD Instinct GPUs and AMD Pensando networking can deliver breakthrough performance and efficiency for large-scale AI,” said Philip Guido, EVP and Chief Commercial Officer, AMD. “This milestone underscores how AMD hardware and software innovations are enabling the next wave of frontier AI development with industry leaders.”
Building on prior collaborative work and to achieve this milestone, Zyphra collaborated closely with AMD and IBM to design and deploy a large-scale training cluster powered by AMD Instinct GPUs with AMD Pensando networking (ethernet) interconnect. The jointly engineered AMD and IBM cluster announced earlier this quarter, combines AMD Instinct MI300X GPUs with IBM Cloud’s high-performance fabric and storage architecture providing the foundation for ZAYA1’s large-scale pretraining.
“As AI creates opportunities for enterprises to innovate, foundation models are key to unlocking accelerated development, efficiency and productivity,” said Alan Peacock, GM of IBM Cloud. "We are proud to deliver IBM’s scalable AI infrastructure as the foundation for ZAYA1’s large-scale model and are excited to continue collaborating with AMD on AI model development across our mutual clients.”
The joint collaboration demonstrates how Zyphra’s advanced AI research and optimized software stack, combined with AMD’s platform powered by IBM’s infrastructure through IBM Cloud can deliver the performance needed for reliable frontier-scale AI model development.
For more information please reference the technical report on arXiv, the Zyphra technical blog post, and AMD blog post.
Zyphra is a full-stack, open-source superintelligence company based in San Francisco on a mission to build human-aligned AGI.
Zyphra’s core research thesis toward general superintelligence is focused on developing next-generation multimodal architectures for long-context reasoning, long-term memory, and continual learning.
For more than 55 years AMD has driven innovation in high-performance computing, graphics and visualization technologies. Billions of people, leading Fortune 500 businesses and cutting edge scientific research institutions around the world rely on AMD technology daily to improve how they live, work and play. AMD employees are focused on building leadership high- 3 performance and adaptive products that push the boundaries of what is possible. For more information about how AMD is enabling today and inspiring tomorrow, visit the AMD (NASDAQ: AMD) website, blog, LinkedIn, Facebook and X pages.
IBM is a leading provider of global hybrid cloud and AI, and consulting expertise. We help clients in more than 175 countries capitalize on insights from their data, streamline business processes, reduce costs and gain a competitive edge in their industries. Thousands of governments and corporate entities in critical infrastructure areas such as financial services, telecommunications and healthcare rely on IBM's hybrid cloud platform and Red Hat OpenShift to affect their digital transformations quickly, efficiently and securely. IBM's breakthrough innovations in AI, quantum computing, industry-specific cloud solutions and consulting deliver open and flexible options to our clients. All of this is backed by IBM's long-standing commitment to trust, transparency, responsibility, inclusivity and service. Visit www.ibm.com for more information.
Zyphra today announced a major milestone in its AI infrastructure and model development with the release of a technical report showing how Zyphra has demonstrated large scale training on AMD GPUs and networking.
The paper introduces ZAYA1, the first large-scale Mixture-of-Experts (MoE) foundation model trained entirely on an integrated AMD platform (AMD Instinct™ GPUs, AMD Pensando™ networking interconnect & ROCm software stack) as a viable high-performance, production-ready alternative platform for frontier-scale AI training.
Despite operating at a fraction of the active parameter count, ZAYA1-base (8.3B total parameters, 760m active) achieves performance comparable to leading models such as Qwen3-4B (Alibaba) and Gemma3-12B (Google), and outperforms models including Llama-3-8B (Meta) and OLMoE across reasoning, mathematics, and coding benchmarks.
“Efficiency has always been a core guiding principle at Zyphra. It shapes how we design model architectures, develop algorithms for training and inference, and choose the hardware with the best price-performance to deliver frontier intelligence to our customers,” said Krithik Puthalath, CEO of Zyphra. “ZAYA1 reflects this philosophy and we are thrilled to be the first company to demonstrate large-scale training on an AMD platform. Our results highlight the power of co-designing model architectures with silicon and systems, and we’re excited to deepen our collaboration with AMD and IBM as we build the next generation of advanced multimodal foundation models.”
Mixture-of-Experts (MoE) models have become the foundational architecture for modern, frontier AI systems, using specialized expert networks that activate dynamically to deliver greater efficiency, scalability, and reasoning performance than traditional dense architectures. This paradigm shift defines today’s leading frontier models including GPT-5, Claude-4.5 DeepSeek-V3 and Kimi2 all of which leverage MoE designs to expand capability while optimizing compute utilization. ZAYA1 represents the first large-scale pretraining of an MoE model on an AMD platform, demonstrating that the AMD AI ecosystem is ready to power frontier-class AI development end-to-end.
Zyphra co-designed ZAYA1 around AMD silicon, introducing innovations such as an advanced routing architecture, Compressed Convolutional Attention (CCA), and lightweight residual scaling to achieve higher training throughput and more efficient inference through improved expert utilization.
“Zyphra’s work with AMD and IBM demonstrates how an open platform built on AMD Instinct GPUs and AMD Pensando networking can deliver breakthrough performance and efficiency for large-scale AI,” said Philip Guido, EVP and Chief Commercial Officer, AMD. “This milestone underscores how AMD hardware and software innovations are enabling the next wave of frontier AI development with industry leaders.”
Building on prior collaborative work and to achieve this milestone, Zyphra collaborated closely with AMD and IBM to design and deploy a large-scale training cluster powered by AMD Instinct GPUs with AMD Pensando networking (ethernet) interconnect. The jointly engineered AMD and IBM cluster announced earlier this quarter, combines AMD Instinct MI300X GPUs with IBM Cloud’s high-performance fabric and storage architecture providing the foundation for ZAYA1’s large-scale pretraining.
“As AI creates opportunities for enterprises to innovate, foundation models are key to unlocking accelerated development, efficiency and productivity,” said Alan Peacock, GM of IBM Cloud. "We are proud to deliver IBM’s scalable AI infrastructure as the foundation for ZAYA1’s large-scale model and are excited to continue collaborating with AMD on AI model development across our mutual clients.”
The joint collaboration demonstrates how Zyphra’s advanced AI research and optimized software stack, combined with AMD’s platform powered by IBM’s infrastructure through IBM Cloud can deliver the performance needed for reliable frontier-scale AI model development.
For more information please reference the technical report on arXiv, the Zyphra technical blog post, and AMD blog post.
Zyphra is a full-stack, open-source superintelligence company based in San Francisco on a mission to build human-aligned AGI.
Zyphra’s core research thesis toward general superintelligence is focused on developing next-generation multimodal architectures for long-context reasoning, long-term memory, and continual learning.
For more than 55 years AMD has driven innovation in high-performance computing, graphics and visualization technologies. Billions of people, leading Fortune 500 businesses and cutting edge scientific research institutions around the world rely on AMD technology daily to improve how they live, work and play. AMD employees are focused on building leadership high- 3 performance and adaptive products that push the boundaries of what is possible. For more information about how AMD is enabling today and inspiring tomorrow, visit the AMD (NASDAQ: AMD) website, blog, LinkedIn, Facebook and X pages.
IBM is a leading provider of global hybrid cloud and AI, and consulting expertise. We help clients in more than 175 countries capitalize on insights from their data, streamline business processes, reduce costs and gain a competitive edge in their industries. Thousands of governments and corporate entities in critical infrastructure areas such as financial services, telecommunications and healthcare rely on IBM's hybrid cloud platform and Red Hat OpenShift to affect their digital transformations quickly, efficiently and securely. IBM's breakthrough innovations in AI, quantum computing, industry-specific cloud solutions and consulting deliver open and flexible options to our clients. All of this is backed by IBM's long-standing commitment to trust, transparency, responsibility, inclusivity and service. Visit www.ibm.com for more information.
Zyphra today announced a major milestone in its AI infrastructure and model development with the release of a technical report showing how Zyphra has demonstrated large scale training on AMD GPUs and networking.
The paper introduces ZAYA1, the first large-scale Mixture-of-Experts (MoE) foundation model trained entirely on an integrated AMD platform (AMD Instinct™ GPUs, AMD Pensando™ networking interconnect & ROCm software stack) as a viable high-performance, production-ready alternative platform for frontier-scale AI training.
Despite operating at a fraction of the active parameter count, ZAYA1-base (8.3B total parameters, 760m active) achieves performance comparable to leading models such as Qwen3-4B (Alibaba) and Gemma3-12B (Google), and outperforms models including Llama-3-8B (Meta) and OLMoE across reasoning, mathematics, and coding benchmarks.
“Efficiency has always been a core guiding principle at Zyphra. It shapes how we design model architectures, develop algorithms for training and inference, and choose the hardware with the best price-performance to deliver frontier intelligence to our customers,” said Krithik Puthalath, CEO of Zyphra. “ZAYA1 reflects this philosophy and we are thrilled to be the first company to demonstrate large-scale training on an AMD platform. Our results highlight the power of co-designing model architectures with silicon and systems, and we’re excited to deepen our collaboration with AMD and IBM as we build the next generation of advanced multimodal foundation models.”
Mixture-of-Experts (MoE) models have become the foundational architecture for modern, frontier AI systems, using specialized expert networks that activate dynamically to deliver greater efficiency, scalability, and reasoning performance than traditional dense architectures. This paradigm shift defines today’s leading frontier models including GPT-5, Claude-4.5 DeepSeek-V3 and Kimi2 all of which leverage MoE designs to expand capability while optimizing compute utilization. ZAYA1 represents the first large-scale pretraining of an MoE model on an AMD platform, demonstrating that the AMD AI ecosystem is ready to power frontier-class AI development end-to-end.
Zyphra co-designed ZAYA1 around AMD silicon, introducing innovations such as an advanced routing architecture, Compressed Convolutional Attention (CCA), and lightweight residual scaling to achieve higher training throughput and more efficient inference through improved expert utilization.
“Zyphra’s work with AMD and IBM demonstrates how an open platform built on AMD Instinct GPUs and AMD Pensando networking can deliver breakthrough performance and efficiency for large-scale AI,” said Philip Guido, EVP and Chief Commercial Officer, AMD. “This milestone underscores how AMD hardware and software innovations are enabling the next wave of frontier AI development with industry leaders.”
Building on prior collaborative work and to achieve this milestone, Zyphra collaborated closely with AMD and IBM to design and deploy a large-scale training cluster powered by AMD Instinct GPUs with AMD Pensando networking (ethernet) interconnect. The jointly engineered AMD and IBM cluster announced earlier this quarter, combines AMD Instinct MI300X GPUs with IBM Cloud’s high-performance fabric and storage architecture providing the foundation for ZAYA1’s large-scale pretraining.
“As AI creates opportunities for enterprises to innovate, foundation models are key to unlocking accelerated development, efficiency and productivity,” said Alan Peacock, GM of IBM Cloud. "We are proud to deliver IBM’s scalable AI infrastructure as the foundation for ZAYA1’s large-scale model and are excited to continue collaborating with AMD on AI model development across our mutual clients.”
The joint collaboration demonstrates how Zyphra’s advanced AI research and optimized software stack, combined with AMD’s platform powered by IBM’s infrastructure through IBM Cloud can deliver the performance needed for reliable frontier-scale AI model development.
For more information please reference the technical report on arXiv, the Zyphra technical blog post, and AMD blog post.
Zyphra is a full-stack, open-source superintelligence company based in San Francisco on a mission to build human-aligned AGI.
Zyphra’s core research thesis toward general superintelligence is focused on developing next-generation multimodal architectures for long-context reasoning, long-term memory, and continual learning.
For more than 55 years AMD has driven innovation in high-performance computing, graphics and visualization technologies. Billions of people, leading Fortune 500 businesses and cutting edge scientific research institutions around the world rely on AMD technology daily to improve how they live, work and play. AMD employees are focused on building leadership high- 3 performance and adaptive products that push the boundaries of what is possible. For more information about how AMD is enabling today and inspiring tomorrow, visit the AMD (NASDAQ: AMD) website, blog, LinkedIn, Facebook and X pages.
IBM is a leading provider of global hybrid cloud and AI, and consulting expertise. We help clients in more than 175 countries capitalize on insights from their data, streamline business processes, reduce costs and gain a competitive edge in their industries. Thousands of governments and corporate entities in critical infrastructure areas such as financial services, telecommunications and healthcare rely on IBM's hybrid cloud platform and Red Hat OpenShift to affect their digital transformations quickly, efficiently and securely. IBM's breakthrough innovations in AI, quantum computing, industry-specific cloud solutions and consulting deliver open and flexible options to our clients. All of this is backed by IBM's long-standing commitment to trust, transparency, responsibility, inclusivity and service. Visit www.ibm.com for more information.
Zyphra today announced a major milestone in its AI infrastructure and model development with the release of a technical report showing how Zyphra has demonstrated large scale training on AMD GPUs and networking.
The paper introduces ZAYA1, the first large-scale Mixture-of-Experts (MoE) foundation model trained entirely on an integrated AMD platform (AMD Instinct™ GPUs, AMD Pensando™ networking interconnect & ROCm software stack) as a viable high-performance, production-ready alternative platform for frontier-scale AI training.
Despite operating at a fraction of the active parameter count, ZAYA1-base (8.3B total parameters, 760m active) achieves performance comparable to leading models such as Qwen3-4B (Alibaba) and Gemma3-12B (Google), and outperforms models including Llama-3-8B (Meta) and OLMoE across reasoning, mathematics, and coding benchmarks.
“Efficiency has always been a core guiding principle at Zyphra. It shapes how we design model architectures, develop algorithms for training and inference, and choose the hardware with the best price-performance to deliver frontier intelligence to our customers,” said Krithik Puthalath, CEO of Zyphra. “ZAYA1 reflects this philosophy and we are thrilled to be the first company to demonstrate large-scale training on an AMD platform. Our results highlight the power of co-designing model architectures with silicon and systems, and we’re excited to deepen our collaboration with AMD and IBM as we build the next generation of advanced multimodal foundation models.”
Mixture-of-Experts (MoE) models have become the foundational architecture for modern, frontier AI systems, using specialized expert networks that activate dynamically to deliver greater efficiency, scalability, and reasoning performance than traditional dense architectures. This paradigm shift defines today’s leading frontier models including GPT-5, Claude-4.5 DeepSeek-V3 and Kimi2 all of which leverage MoE designs to expand capability while optimizing compute utilization. ZAYA1 represents the first large-scale pretraining of an MoE model on an AMD platform, demonstrating that the AMD AI ecosystem is ready to power frontier-class AI development end-to-end.
Zyphra co-designed ZAYA1 around AMD silicon, introducing innovations such as an advanced routing architecture, Compressed Convolutional Attention (CCA), and lightweight residual scaling to achieve higher training throughput and more efficient inference through improved expert utilization.
“Zyphra’s work with AMD and IBM demonstrates how an open platform built on AMD Instinct GPUs and AMD Pensando networking can deliver breakthrough performance and efficiency for large-scale AI,” said Philip Guido, EVP and Chief Commercial Officer, AMD. “This milestone underscores how AMD hardware and software innovations are enabling the next wave of frontier AI development with industry leaders.”
Building on prior collaborative work and to achieve this milestone, Zyphra collaborated closely with AMD and IBM to design and deploy a large-scale training cluster powered by AMD Instinct GPUs with AMD Pensando networking (ethernet) interconnect. The jointly engineered AMD and IBM cluster announced earlier this quarter, combines AMD Instinct MI300X GPUs with IBM Cloud’s high-performance fabric and storage architecture providing the foundation for ZAYA1’s large-scale pretraining.
“As AI creates opportunities for enterprises to innovate, foundation models are key to unlocking accelerated development, efficiency and productivity,” said Alan Peacock, GM of IBM Cloud. "We are proud to deliver IBM’s scalable AI infrastructure as the foundation for ZAYA1’s large-scale model and are excited to continue collaborating with AMD on AI model development across our mutual clients.”
The joint collaboration demonstrates how Zyphra’s advanced AI research and optimized software stack, combined with AMD’s platform powered by IBM’s infrastructure through IBM Cloud can deliver the performance needed for reliable frontier-scale AI model development.
For more information please reference the technical report on arXiv, the Zyphra technical blog post, and AMD blog post.
Zyphra is a full-stack, open-source superintelligence company based in San Francisco on a mission to build human-aligned AGI.
Zyphra’s core research thesis toward general superintelligence is focused on developing next-generation multimodal architectures for long-context reasoning, long-term memory, and continual learning.
For more than 55 years AMD has driven innovation in high-performance computing, graphics and visualization technologies. Billions of people, leading Fortune 500 businesses and cutting edge scientific research institutions around the world rely on AMD technology daily to improve how they live, work and play. AMD employees are focused on building leadership high- 3 performance and adaptive products that push the boundaries of what is possible. For more information about how AMD is enabling today and inspiring tomorrow, visit the AMD (NASDAQ: AMD) website, blog, LinkedIn, Facebook and X pages.
Zyphra today announced a major milestone in its AI infrastructure and model development with the release of a technical report showing how Zyphra has demonstrated large scale training on AMD GPUs and networking.
The paper introduces ZAYA1, the first large-scale Mixture-of-Experts (MoE) foundation model trained entirely on an integrated AMD platform (AMD Instinct™ GPUs, AMD Pensando™ networking interconnect & ROCm software stack) as a viable high-performance, production-ready alternative platform for frontier-scale AI training.
Despite operating at a fraction of the active parameter count, ZAYA1-base (8.3B total parameters, 760m active) achieves performance comparable to leading models such as Qwen3-4B (Alibaba) and Gemma3-12B (Google), and outperforms models including Llama-3-8B (Meta) and OLMoE across reasoning, mathematics, and coding benchmarks.
“Efficiency has always been a core guiding principle at Zyphra. It shapes how we design model architectures, develop algorithms for training and inference, and choose the hardware with the best price-performance to deliver frontier intelligence to our customers,” said Krithik Puthalath, CEO of Zyphra. “ZAYA1 reflects this philosophy and we are thrilled to be the first company to demonstrate large-scale training on an AMD platform. Our results highlight the power of co-designing model architectures with silicon and systems, and we’re excited to deepen our collaboration with AMD and IBM as we build the next generation of advanced multimodal foundation models.”
Mixture-of-Experts (MoE) models have become the foundational architecture for modern, frontier AI systems, using specialized expert networks that activate dynamically to deliver greater efficiency, scalability, and reasoning performance than traditional dense architectures. This paradigm shift defines today’s leading frontier models including GPT-5, Claude-4.5 DeepSeek-V3 and Kimi2 all of which leverage MoE designs to expand capability while optimizing compute utilization. ZAYA1 represents the first large-scale pretraining of an MoE model on an AMD platform, demonstrating that the AMD AI ecosystem is ready to power frontier-class AI development end-to-end.
Zyphra co-designed ZAYA1 around AMD silicon, introducing innovations such as an advanced routing architecture, Compressed Convolutional Attention (CCA), and lightweight residual scaling to achieve higher training throughput and more efficient inference through improved expert utilization.
“Zyphra’s work with AMD and IBM demonstrates how an open platform built on AMD Instinct GPUs and AMD Pensando networking can deliver breakthrough performance and efficiency for large-scale AI,” said Philip Guido, EVP and Chief Commercial Officer, AMD. “This milestone underscores how AMD hardware and software innovations are enabling the next wave of frontier AI development with industry leaders.”
Building on prior collaborative work and to achieve this milestone, Zyphra collaborated closely with AMD and IBM to design and deploy a large-scale training cluster powered by AMD Instinct GPUs with AMD Pensando networking (ethernet) interconnect. The jointly engineered AMD and IBM cluster announced earlier this quarter, combines AMD Instinct MI300X GPUs with IBM Cloud’s high-performance fabric and storage architecture providing the foundation for ZAYA1’s large-scale pretraining.
“As AI creates opportunities for enterprises to innovate, foundation models are key to unlocking accelerated development, efficiency and productivity,” said Alan Peacock, GM of IBM Cloud. "We are proud to deliver IBM’s scalable AI infrastructure as the foundation for ZAYA1’s large-scale model and are excited to continue collaborating with AMD on AI model development across our mutual clients.”
The joint collaboration demonstrates how Zyphra’s advanced AI research and optimized software stack, combined with AMD’s platform and IBM Cloud’s infrastructure, can deliver the performance needed for reliable frontier-scale AI model development.
For more information, please see the technical report on arXiv, the Zyphra technical blog post, and the AMD blog post.
Zyphra is a full-stack, open-source superintelligence company based in San Francisco on a mission to build human-aligned AGI.
Zyphra’s core research thesis toward general superintelligence is focused on developing next-generation multimodal architectures for long-context reasoning, long-term memory, and continual learning.
For more than 55 years AMD has driven innovation in high-performance computing, graphics and visualization technologies. Billions of people, leading Fortune 500 businesses and cutting-edge scientific research institutions around the world rely on AMD technology daily to improve how they live, work and play. AMD employees are focused on building leadership high-performance and adaptive products that push the boundaries of what is possible. For more information about how AMD is enabling today and inspiring tomorrow, visit the AMD (NASDAQ: AMD) website, blog, LinkedIn, Facebook and X pages.
IBM is a leading provider of global hybrid cloud and AI, and consulting expertise. We help clients in more than 175 countries capitalize on insights from their data, streamline business processes, reduce costs and gain a competitive edge in their industries. Thousands of governments and corporate entities in critical infrastructure areas such as financial services, telecommunications and healthcare rely on IBM's hybrid cloud platform and Red Hat OpenShift to effect their digital transformations quickly, efficiently and securely. IBM's breakthrough innovations in AI, quantum computing, industry-specific cloud solutions and consulting deliver open and flexible options to our clients. All of this is backed by IBM's long-standing commitment to trust, transparency, responsibility, inclusivity and service. Visit www.ibm.com for more information.


We present histograms depicting the distribution of cluster sizes in all the datasets (see Fig. 7-11); note that all figures are on log-log scales. We see a significant drop in the number of clusters starting at sizes around 100. This drop is present in both DCLM and FineWeb-Edu2 (see Fig. 8 and 9, respectively), and is most likely explained by a combination of the deduplication strategy and the quality filtering used when creating both datasets: DCLM deduplication was done individually within 10 shards, while FineWeb-Edu2 was deduplicated within every Common Crawl snapshot. We find that large clusters usually contain low-quality material (repeated advertisements, license-agreement templates, etc.), so it is not surprising that such documents were removed. Notably, DCLM still contained one cluster of close to 1 million documents, consisting of low-quality documents seemingly coming from advertisements (see Appendix). We find that both Zyda-1 and Dolma-CC contain a small number of duplicates, which is expected, since both datasets were deduplicated globally by their authors; the remaining duplicates are likely false negatives from the initial deduplication procedure. Note that the distributions of duplicate-cluster sizes for these two datasets (Fig. 10 and 11) do not contain any sharp drops, but instead decrease hyper-exponentially with cluster size.
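A log-log cluster-size histogram of the kind described above can be computed from cluster labels in a few lines. This is a minimal sketch under assumed inputs (an array of per-document cluster ids), not the paper's actual analysis code:

```python
# Minimal sketch: histogram of duplicate-cluster sizes with log-spaced bins,
# suitable for plotting on log-log axes. Inputs are assumed, not the paper's.
import numpy as np

def cluster_size_histogram(cluster_ids, n_bins=20):
    """Return log-spaced bin edges and per-bin counts of cluster sizes."""
    _, sizes = np.unique(cluster_ids, return_counts=True)   # size of each cluster
    edges = np.logspace(0, np.log10(max(sizes.max(), 2)), n_bins + 1)
    counts, _ = np.histogram(sizes, bins=edges)
    return edges, counts

# Toy data mimicking the heavy tail: one 1000-document cluster plus
# 5000 singleton clusters.
ids = np.concatenate([np.zeros(1000, dtype=int), np.arange(1, 5001)])
edges, counts = cluster_size_histogram(ids)
# To plot on log-log axes, e.g.:
#   import matplotlib.pyplot as plt
#   plt.loglog(edges[:-1], counts, drawstyle="steps-post")
```

On such axes, a hyper-exponential decay shows up as a curve falling faster than any straight (power-law) line, while a sharp drop near size 100 would appear as a visible cliff.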




Below is an example document from the largest cluster (~1M documents) of duplicates in DCLM (quality score 0.482627):
Is safe? Is scam?
Is safe for your PC?
Is safe or is it scam?
Domain is SafeSafe score: 1
The higher the number, the more dangerous the website.Any number higher than 1 means DANGER.
Positive votes:
Negative votes:
Vote Up Vote Down review
Have you had bad experience with Warn us, please!
Below are a few documents with different quality scores from DCLM, all coming from the same duplicates cluster. Quality scores range from ~0.2 down to ~0.04.
Footnote: Training on the Eurus-2-RL dataset did not match the DeepScaleR math evaluation numbers, possibly due to lower quality synthetic math questions in NuminaMath-CoT providing a mixed training signal, or the solvability filtering process with QwQ-preview reducing the difficulty of the dataset. Additionally, the relatively small percentage of code (5%) likely led to math dominating training at the expense of code performance. Training on domain specific datasets and merging resulting models seems to be a potential way to counteract this problem, as demonstrated with SFT in Light-R1.
Zyphra today announced a major milestone in its AI infrastructure and model development with the release of a technical report showing how Zyphra has demonstrated large scale training on AMD GPUs and networking.
The paper introduces ZAYA1, the first large-scale Mixture-of-Experts (MoE) foundation model trained entirely on an integrated AMD platform (AMD Instinct™ GPUs, AMD Pensando™ networking interconnect & ROCm software stack) as a viable high-performance, production-ready alternative platform for frontier-scale AI training.
Despite operating at a fraction of the active parameter count, ZAYA1-base (8.3B total parameters, 760m active) achieves performance comparable to leading models such as Qwen3-4B (Alibaba) and Gemma3-12B (Google), and outperforms models including Llama-3-8B (Meta) and OLMoE across reasoning, mathematics, and coding benchmarks.
“Efficiency has always been a core guiding principle at Zyphra. It shapes how we design model architectures, develop algorithms for training and inference, and choose the hardware with the best price-performance to deliver frontier intelligence to our customers,” said Krithik Puthalath, CEO of Zyphra. “ZAYA1 reflects this philosophy and we are thrilled to be the first company to demonstrate large-scale training on an AMD platform. Our results highlight the power of co-designing model architectures with silicon and systems, and we’re excited to deepen our collaboration with AMD and IBM as we build the next generation of advanced multimodal foundation models.”
Mixture-of-Experts (MoE) models have become the foundational architecture for modern, frontier AI systems, using specialized expert networks that activate dynamically to deliver greater efficiency, scalability, and reasoning performance than traditional dense architectures. This paradigm shift defines today’s leading frontier models including GPT-5, Claude-4.5 DeepSeek-V3 and Kimi2 all of which leverage MoE designs to expand capability while optimizing compute utilization. ZAYA1 represents the first large-scale pretraining of an MoE model on an AMD platform, demonstrating that the AMD AI ecosystem is ready to power frontier-class AI development end-to-end.
Zyphra co-designed ZAYA1 around AMD silicon, introducing innovations such as an advanced routing architecture, Compressed Convolutional Attention (CCA), and lightweight residual scaling to achieve higher training throughput and more efficient inference through improved expert utilization.
“Zyphra’s work with AMD and IBM demonstrates how an open platform built on AMD Instinct GPUs and AMD Pensando networking can deliver breakthrough performance and efficiency for large-scale AI,” said Philip Guido, EVP and Chief Commercial Officer, AMD. “This milestone underscores how AMD hardware and software innovations are enabling the next wave of frontier AI development with industry leaders.”
Building on prior collaborative work and to achieve this milestone, Zyphra collaborated closely with AMD and IBM to design and deploy a large-scale training cluster powered by AMD Instinct GPUs with AMD Pensando networking (ethernet) interconnect. The jointly engineered AMD and IBM cluster announced earlier this quarter, combines AMD Instinct MI300X GPUs with IBM Cloud’s high-performance fabric and storage architecture providing the foundation for ZAYA1’s large-scale pretraining.
“As AI creates opportunities for enterprises to innovate, foundation models are key to unlocking accelerated development, efficiency and productivity,” said Alan Peacock, GM of IBM Cloud. “We are proud to deliver IBM’s scalable AI infrastructure as the foundation for ZAYA1’s large-scale model and are excited to continue collaborating with AMD on AI model development across our mutual clients.”
The collaboration demonstrates how Zyphra’s advanced AI research and optimized software stack, combined with the AMD platform and IBM’s infrastructure delivered through IBM Cloud, can provide the performance needed for reliable frontier-scale AI model development.
For more information, please see the technical report on arXiv, the Zyphra technical blog post, and the AMD blog post.
Zyphra is a full-stack, open-source superintelligence company based in San Francisco on a mission to build human-aligned AGI.
Zyphra’s core research thesis toward general superintelligence is focused on developing next-generation multimodal architectures for long-context reasoning, long-term memory, and continual learning.
For more than 55 years AMD has driven innovation in high-performance computing, graphics, and visualization technologies. Billions of people, leading Fortune 500 businesses, and cutting-edge scientific research institutions around the world rely on AMD technology daily to improve how they live, work, and play. AMD employees are focused on building leadership high-performance and adaptive products that push the boundaries of what is possible. For more information about how AMD is enabling today and inspiring tomorrow, visit the AMD (NASDAQ: AMD) website, blog, LinkedIn, Facebook and X pages.
IBM is a leading provider of global hybrid cloud and AI, and consulting expertise. We help clients in more than 175 countries capitalize on insights from their data, streamline business processes, reduce costs and gain a competitive edge in their industries. Thousands of governments and corporate entities in critical infrastructure areas such as financial services, telecommunications and healthcare rely on IBM's hybrid cloud platform and Red Hat OpenShift to effect their digital transformations quickly, efficiently and securely. IBM's breakthrough innovations in AI, quantum computing, industry-specific cloud solutions and consulting deliver open and flexible options to our clients. All of this is backed by IBM's long-standing commitment to trust, transparency, responsibility, inclusivity and service. Visit www.ibm.com for more information.