Meta working on new AI chip, next-gen GPUs for video workloads

The company also plans a new AI-optimised data centre design and the second phase of its 16,000 GPU supercomputer for AI research.

New Delhi: Meta (formerly Facebook) is building its first-generation custom silicon chip for running artificial intelligence (AI) models, saying its AI compute needs will grow dramatically over the next decade as it breaks new ground in AI research.

Called MTIA (Meta Training and Inference Accelerator), the in-house, custom accelerator chip will provide greater compute power and efficiency than CPUs and is customised for internal workloads.

“By deploying both MTIA chips and GPUs, we’ll deliver better performance, decreased latency, and greater efficiency for each workload,” said Santosh Janardhan, VP and Head of Infrastructure at Meta.

The company also plans a new AI-optimised data centre design and the second phase of its 16,000 GPU supercomputer for AI research.

“These efforts — and additional projects still underway — will enable us to develop larger, more sophisticated AI models and then deploy them efficiently at scale,” Janardhan added.

The next-generation data centre will be an AI-optimised design, supporting liquid-cooled AI hardware and a high-performance AI network connecting thousands of AI chips together for data centre-scale AI training clusters.

“It will also be faster and more cost-effective to build, and it will complement other new hardware such as our first in-house-developed ASIC solution, MSVP (Meta Scalable Video Processor), which is designed to power the constantly growing video workloads at Meta,” Janardhan said.

Meta’s Research SuperCluster (RSC) AI supercomputer, which the company believes is one of the fastest AI supercomputers in the world, was built to train the next generation of large AI models to power new augmented reality tools, content understanding systems, real-time translation technology and more.

It features 16,000 GPUs, all accessible across the 3-level Clos network fabric that provides full bandwidth to each of the 2,000 training systems.

“Custom-designing much of our infrastructure enables us to optimize an end-to-end experience from the physical layer to the virtual layer to the software layer to the actual user experience,” said Meta.
