Tachyum will begin mass production of its Prodigy Universal Processor (PUP) later this year. The product, which combines the functionality of a CPU, GPU, and TPU in a single unit, threatens to turn the AI market on its head.

The 192-core 5nm processor delivers up to 4.5 times the performance of the best available processors for cloud workloads, up to three times the performance of GPUs for high-performance computing (HPC), and up to six times the performance of GPUs for AI applications. Because the processor incorporates functionality for these different workload types, it can dynamically switch between computational domains while also eliminating the need for expensive hardware dedicated to AI workloads.

What IT Do

Cheaper and More Efficient?

Prodigy is purported to significantly outperform the best-performing processors currently available in the hyperscale, HPC, and AI markets. It delivers up to 3x the performance of the highest-performing x86 processors for cloud workloads, up to 3x that of the highest-performing GPUs for HPC, and up to 6x for AI applications. Because of its utility for both high-performance and line-of-business applications, Prodigy-powered data center servers can seamlessly and dynamically switch between workloads.

AI

The Prodigy Cloud/AI/HPC supercomputer processor chip offers 4x the performance of the fastest Xeon, 3x the raw performance of NVIDIA’s H100 for HPC, 6x its raw performance for AI training and inference workloads, and up to 10x the performance at the same power. Prodigy eliminates the need for expensive dedicated AI hardware and significantly increases server utilization.

Unlike other CPU and GPU solutions, Tachyum’s Prodigy was designed from the ground up to handle matrix and vector processing, rather than treating it as an afterthought. Among Prodigy’s vector and matrix features are support for a range of data types (FP64, FP32, TF32, BF16, Int8, FP8, and TAI); two 1024-bit vector units per core; AI sparsity and super-sparsity support; and no penalty for misaligned vector loads or stores that cross cache lines. This built-in support delivers high performance for AI training and inference workloads while reducing memory utilization.
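
To make the memory benefit of those low-precision data types concrete, the sketch below quantizes a block of FP32 weights to Int8 with a simple symmetric scheme. It is a generic, minimal illustration in plain NumPy and assumes nothing about Tachyum’s toolchain or instruction set; the function names are hypothetical.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization of FP32 weights to Int8 (illustrative only)."""
    scale = np.abs(weights).max() / 127.0                      # map the largest magnitude to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an FP32 approximation of the original weights."""
    return q.astype(np.float32) * scale

weights = np.random.randn(1024, 1024).astype(np.float32)      # 4 MiB of FP32 weights
q, scale = quantize_int8(weights)

print(f"FP32 size: {weights.nbytes / 2**20:.0f} MiB")          # 4 MiB
print(f"Int8 size: {q.nbytes / 2**20:.0f} MiB")                # 1 MiB, a 4x reduction
print(f"max abs rounding error: {np.abs(weights - dequantize_int8(q, scale)).max():.4f}")
```

The same trade-off, precision for footprint, is what the hardware support for BF16, Int8, and FP8 exploits during training and inference.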

Data Centers

Without Prodigy, hyperscale data centers must use a combination of different CPU, GPU, and TPU hardware for these different workloads, creating inefficiency, expense, and the complexity of separate supply and maintenance infrastructures. Using hardware dedicated to each type of workload (e.g., data center, AI, HPC) results in underutilization of hardware resources and more challenging programming, support, and maintenance. Prodigy delivers unprecedented data center performance, power efficiency, and economics.

Prodigy’s ability to seamlessly switch among these various workloads radically changes the competitive landscape and the economics of data centers. Prodigy is designed to overcome the challenges of increasing data center power consumption and stalled performance scaling.

Side-by-Side Comparison of Similar Models

CATEGORY | AMD EPYC 8534P | TACHYUM PRODIGY T832 or T848
PLATFORM | Server | Server
FAMILY | AMD EPYC™ 8004 Series | Prodigy Universal
# OF THREADS | 128 | 128
# OF CPU CORES | 64 | 48-96
SYSTEM MEMORY TYPE | DDR5 | DDR5-6400
SYSTEM MEMORY SPECS | Up to 4800 MT/s | ?*
L3 CACHE | 128 MB | 48 MB L2+L3 cache w/ ECC
PCI EXPRESS® VERSION | PCIe 5.0 x96 | PCIe 5.0 x48
WORKLOAD AFFINITY | Client-Middleware Computing, HCI, Media streaming, Networking/NFV & SW-defined storage | HPC, Big AI, Exascale Supercomputers, Big Data, Cloud, Edge Computing, Storage, Databases, Data Analytics, Web Hosting
PRICING | 4,950 USD | 23,000 USD
* The specification sheet for this processor family does not state a system memory speed.

How IT Do It

Prodigy is the smallest and fastest general-purpose 64-bit processor, requiring 10x less power and reducing server cost by 3x. New proprietary software makes many parts of the hardware found in a typical processor redundant, and the shorter wires that come with a smaller core translate into much greater processor speed. This is how Prodigy overcomes the hardware inefficiencies that make generative AI cost-prohibitive and energy-hungry, and how it enables quantization with 8-bit floating point (FP8) to support enormous model sizes.
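
As a rough, back-of-the-envelope illustration of why FP8 matters for large models, the snippet below compares the memory needed just to store model weights at different precisions. The 70-billion-parameter figure is an arbitrary example, not a claim about any particular system.

```python
# Back-of-the-envelope weight-storage arithmetic for different precisions.
# 70B parameters is an arbitrary example model size, not a Tachyum figure.

params = 70e9                        # 70 billion parameters
bytes_per_element = {"FP32": 4, "BF16": 2, "FP8": 1}

for dtype, nbytes in bytes_per_element.items():
    gib = params * nbytes / 2**30    # weight storage in GiB
    print(f"{dtype}: {gib:,.0f} GiB")

# FP32: ~261 GiB, BF16: ~130 GiB, FP8: ~65 GiB. FP8 quarters the footprint of FP32,
# which is what lets much larger models fit in a given amount of memory.
```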

The PUP’s components include DDR5 RAM controllers and high-performance, low-power digital signal processing (DSP). A DSP-based PHY, the physical layer that provides an electrical, mechanical, and procedural interface to the transmission medium, is incorporated into the Prodigy Universal Processor, allowing the chip to achieve 6400 megatransfers per second (MT/s) at nominal voltage and providing headroom for expected speeds of up to, or even over, 7200 MT/s.
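
For context on what those transfer rates mean in bandwidth terms, the quick calculation below assumes a standard 64-bit (8-byte) DDR5 channel; the per-socket total would depend on the channel count, which is not given here.

```python
# Peak theoretical bandwidth of a single 64-bit DDR5 channel at the quoted rates.
# Channel count per socket is not specified above, so only per-channel figures are shown.

channel_width_bytes = 8              # 64-bit data path per DDR5 channel

for mt_per_s in (6400, 7200):
    gb_per_s = mt_per_s * 1e6 * channel_width_bytes / 1e9
    print(f"DDR5-{mt_per_s}: {gb_per_s:.1f} GB/s per channel")

# DDR5-6400: 51.2 GB/s per channel; DDR5-7200: 57.6 GB/s per channel.
```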

The Prodigy family addresses a wide array of workloads, providing the same high performance and efficiency across all of them.

The critical components are supplied by a global leader in high-speed DDR DRAM controllers and DDR DRAM PHYs for the world’s technology infrastructure. The quality of the PUP and the support provided by working closely with Tachyum engineers have allowed the team to close DDR5 timings. Samsung 1-beta 16Gb and 32Gb monolithic DDR5 DRAM, Micron 1-beta DDR5, and Hynix 1-beta DDR5 enable the processor family to reach up to 7200 MT/s.

Prodigy’s floating-point unit (FPU) carries out operations on floating-point numbers, such as addition, subtraction, multiplication, division, and square root. On the Field-Programmable Gate Array (FPGA) implementation, the FPU runs vector operations, including mask operations and operations on unaligned vectors. Vectorization in the compiler has reached the production stage, and vectorizing compilers and vectorized libraries are fully enabled for coming chip shipments.
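
As a rough illustration of what a mask operation looks like from a programmer’s point of view, the NumPy sketch below applies an operation only to the vector lanes selected by a predicate. This is plain NumPy, not Prodigy’s instruction set or any Tachyum API.

```python
import numpy as np

# Conceptual masked (predicated) vector operation: only lanes where the mask is
# true are transformed; the other lanes pass through unchanged.

a = np.array([1.0, -4.0, 9.0, -16.0, 25.0])
mask = a > 0                                    # predicate selecting active lanes

result = np.where(mask, np.sqrt(np.abs(a)), a)  # masked square root
print(result)                                   # [  1.  -4.   3. -16.   5.]
```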

Conclusion

The PUP eliminates the need for expensive dedicated AI hardware and dramatically increases server utilization. Tachyum’s Prodigy integrates 128 high-performance, custom-designed 64-bit compute cores to deliver up to 4x the performance of the highest-performing x86 processors for cloud workloads, up to 3x that of the highest-performing GPUs for HPC, and up to 6x for AI applications.

In 2023, Tachyum received a major purchase order from a U.S. company to build a large-scale system based on its 5nm Prodigy Universal Processor chip. With a performance of 50 EF of FP64 and 8 ZF of AI training for large language models, the system has attracted considerable attention from parties interested in building similar-scale systems for their own AI applications and workloads.

Side-by-Side Comparison of Similar Models

CATEGORY | AMD EPYC 7763 | TACHYUM PRODIGY T16192-HT
PLATFORM | Server | Server
FAMILY | AMD EPYC™ 7003 Series | Prodigy Universal
# OF THREADS | 128 | ?
# OF CPU CORES | 64 | 192
SYSTEM MEMORY TYPE | DDR4 | DDR5-6400
SYSTEM MEMORY SPECS | Up to 3200 MT/s | ?*
L3 CACHE | 256 MB | 48 MB L2+L3 cache w/ ECC
PCI EXPRESS® VERSION | PCIe 4.0 x128 | PCIe 5.0 x96
WORKLOAD AFFINITY | Analytics, Cache-sensitive scale-up/out, CAE/CFD/FEA, ERM/SCM/CRM apps, High-capacity data mgmt (NR/RDBMS), VM Density | HPC, Big AI, Exascale Supercomputers, Big Data, Cloud, Edge Computing, Storage, Databases, Data Analytics, Web Hosting
PRICING | 7,890 USD | ?¹
* No values were presented for system memory.
¹ No pricing was available for this model.
