Google strengthens its artificial intelligence offerings in the cloud

by Blogger, 4 September 2023

Google has further enhanced its AI-optimized infrastructure offerings in the cloud, introducing a new Tensor Processing Unit (TPU) called v5e and making available new A3 virtual machines, designed for training and running large AI models.

TPU v5e: Efficiency and Scalability

TPU v5e has been described by Google as the “most efficient, versatile, and scalable cloud computing unit yet.” This TPU is optimized for training and inference on medium and large models, delivering up to 2x higher training performance per dollar and up to 2.5x higher inference performance per dollar than the previous TPU v4.
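
To make the performance-per-dollar figures concrete, here is a minimal sketch in Python. The $10,000 baseline and the function name are made up for illustration; only the 2x and 2.5x ratios come from the announcement, and since they are "up to" figures, real savings will depend on the workload.

```python
def cost_for_same_work(baseline_cost: float, perf_per_dollar_gain: float) -> float:
    """Cost of the same job when each dollar buys `perf_per_dollar_gain` times more work."""
    if perf_per_dollar_gain <= 0:
        raise ValueError("gain must be positive")
    return baseline_cost / perf_per_dollar_gain


# Hypothetical baseline: a job that costs $10,000 on TPU v4 (not a real price).
baseline = 10_000.0
training_cost = cost_for_same_work(baseline, 2.0)    # best-case training ratio
inference_cost = cost_for_same_work(baseline, 2.5)   # best-case inference ratio

print(f"training:  ${training_cost:,.0f}")
print(f"inference: ${inference_cost:,.0f}")
```

Read the other way around, the same ratios mean that a fixed budget buys up to twice the training work or two and a half times the inference work.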

TPU v5e pods can accommodate up to 256 chips, delivering over 400 Tb/s of aggregate bandwidth and 100 petaOps of INT8 performance. Additionally, TPU v5e can support up to 8 different virtual machine configurations, allowing Google Cloud customers to easily scale their infrastructure to their needs.
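
As a rough back-of-the-envelope check, the pod figures above can be divided by the chip count to get a per-chip view. This is only a sketch: it assumes a full 256-chip pod, averages evenly across chips, and treats the rounded pod numbers as exact, so the results are approximations rather than official per-chip specifications.

```python
POD_CHIPS = 256              # maximum chips per TPU v5e pod (from the article)
POD_INT8_OPS = 100e15        # 100 petaOps of INT8 per pod (from the article)
POD_BANDWIDTH_TBPS = 400     # "over 400 Tb/s", so this is a lower bound


def per_chip(total: float, chips: int = POD_CHIPS) -> float:
    """Average share of a pod-level total for a single chip."""
    return total / chips


int8_teraops_per_chip = per_chip(POD_INT8_OPS) / 1e12
bandwidth_tbps_per_chip = per_chip(POD_BANDWIDTH_TBPS)

print(f"INT8 per chip: about {int8_teraops_per_chip:.0f} teraOps")
print(f"Bandwidth per chip: at least about {bandwidth_tbps_per_chip:.2f} Tb/s")
```

Under these assumptions each chip contributes roughly 390 teraOps of INT8 compute and a little over 1.5 Tb/s of the aggregate bandwidth.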

Compatibility and Integration

The v5e TPUs natively support frameworks like JAX, PyTorch, and TensorFlow, and integrate with open-source tools like Transformers and Accelerate by Hugging Face, PyTorch Lightning, and Ray. Google also introduced Multislice technology, which enables large-scale model training by leveraging thousands of interconnected v5e and v4 TPUs.

New A3 Virtual Machines: Enhanced Performance

The new A3 virtual machines follow the success of the G2 and are optimized for AI workloads. The A3s are equipped with eight NVIDIA H100 Tensor Core GPUs with Transformer Engines, allowing them to run models with trillions of parameters.

Combining Google Cloud capabilities with NVIDIA GPUs enables 3x faster training and up to 10x greater network bandwidth than the previous generation of VMs. Each A3 VM features dual 4th Gen Intel Xeon processors and 2 TB of host memory. Additionally, with NVIDIA NVLink technology, the new VMs offer 3.6 TB/s of bisectional GPU bandwidth.
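
The 3.6 TB/s figure can be sanity-checked with a short calculation. The sketch below assumes 900 GB/s of NVLink bandwidth per H100 GPU, which is NVIDIA's published figure and does not appear in the announcement itself, and it assumes a non-blocking fabric between the eight GPUs. The function name is hypothetical.

```python
GPUS_PER_VM = 8               # from the article
NVLINK_GBPS_PER_GPU = 900     # assumption: NVIDIA's per-GPU NVLink figure for H100, in GB/s


def bisection_bandwidth_tbps(gpus: int, per_gpu_gbps: float) -> float:
    """Bandwidth in TB/s across a cut that splits the GPUs into two equal halves.

    Assumes a non-blocking fabric, so the cut is limited only by the
    link bandwidth of the GPUs on one side of it.
    """
    if gpus <= 0 or gpus % 2:
        raise ValueError("need a positive, even number of GPUs to bisect")
    return (gpus // 2) * per_gpu_gbps / 1000


print(f"{bisection_bandwidth_tbps(GPUS_PER_VM, NVLINK_GBPS_PER_GPU):.1f} TB/s")
```

Four GPUs on each side of the cut, each with 900 GB/s of link bandwidth, give 3.6 TB/s, which matches the number quoted for the A3.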

Personalization and Innovation

These recent announcements from Google Cloud represent a step forward in supporting businesses and innovators in developing and deploying increasingly advanced AI models. Customers can tailor their infrastructure to their needs, leveraging the power of cloud AI offered by Google.

In short, with the new TPU v5e, A3 virtual machines, and advanced technologies like Multislice, Google is redefining the cloud AI landscape, offering increasingly high-performance and scalable solutions for training and running AI models of any size.