Google and NVIDIA Detail Infrastructure to Lower AI Inference Costs and Address Enterprise Security
At the Google Cloud Next conference, Google and NVIDIA outlined a hardware roadmap specifically designed to manage the cost of AI inference at scale. The two companies detailed a new class of A5X bare-metal instances that run on the NVIDIA Vera Rubin NVL72 rack-scale system. Through hardware and software co-design, this architecture aims to deliver up to ten times lower inference cost per token compared to previous generations, while also achieving ten times higher token throughput per megawatt.
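The two headline figures are linked: at a fixed power budget and electricity cost, cost per token scales inversely with token throughput per megawatt. A back-of-the-envelope sketch makes the relationship concrete; every number below is a hypothetical placeholder, not a figure from the announcement.

```python
# Back-of-the-envelope comparison of inference cost per token.
# All numbers are hypothetical placeholders, not figures from
# the Google/NVIDIA announcement.

POWER_MW = 1.0               # fixed power budget for the deployment
COST_PER_MW_HOUR = 150.0     # hypothetical all-in cost, USD per MW-hour

def cost_per_million_tokens(tokens_per_second_per_mw: float) -> float:
    """Cost (USD) to generate one million tokens at a given efficiency."""
    tokens_per_hour = tokens_per_second_per_mw * POWER_MW * 3600
    cost_per_hour = COST_PER_MW_HOUR * POWER_MW
    return cost_per_hour / tokens_per_hour * 1_000_000

old_gen = cost_per_million_tokens(tokens_per_second_per_mw=50_000)
new_gen = cost_per_million_tokens(tokens_per_second_per_mw=500_000)  # 10x throughput/MW

print(f"previous generation: ${old_gen:.3f} per million tokens")
print(f"Vera Rubin class:    ${new_gen:.3f} per million tokens")  # ~10x cheaper
```

Holding power and price constant, a tenfold gain in tokens per megawatt mechanically produces the tenfold drop in cost per token.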

Connecting thousands of processors requires massive bandwidth to prevent processing delays. The A5X instances address this challenge by pairing NVIDIA ConnectX-9 SuperNICs with Google’s Virgo networking technology. This configuration can scale to 80,000 NVIDIA Rubin GPUs within a single-site cluster, and up to 960,000 GPUs across a multi-site deployment. Operating at this scale requires sophisticated workload management, as routing data across nearly a million parallel processors demands precise synchronization to avoid idle compute time.
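The idle-time concern is easy to quantify. In a synchronous step, every worker waits at a barrier for the slowest one, so even small per-worker jitter becomes fleet-wide waste. A minimal simulation of that straggler effect, with purely illustrative numbers:

```python
import random

# Idle time in a synchronous step: all workers wait for the slowest one.
# Numbers are illustrative, not measurements of any real cluster.

random.seed(0)
NUM_WORKERS = 100_000
MEAN_STEP_S = 1.0

# Each worker's step time jitters slightly around the mean.
step_times = [random.gauss(MEAN_STEP_S, 0.02) for _ in range(NUM_WORKERS)]

barrier_time = max(step_times)   # the step ends when the last worker finishes
busy_time = sum(step_times)      # total useful compute across the fleet
idle_fraction = 1 - busy_time / (barrier_time * NUM_WORKERS)

print(f"slowest worker: {barrier_time:.3f}s")
print(f"fleet compute idle at the barrier: {idle_fraction:.1%}")
```

With this toy distribution, 2 percent jitter in step times leaves several percent of the fleet's compute idle at every barrier, which is why networking and scheduling are co-designed at this scale.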

Data Governance and Sovereign Cloud Security

Beyond raw processing capabilities, data governance remains a primary concern for enterprise deployments. Highly regulated sectors, including finance and healthcare, often stall machine learning initiatives due to data sovereignty requirements and the risk of exposing proprietary information. To address these compliance mandates, Google Gemini models running on NVIDIA Blackwell and Blackwell Ultra GPUs are entering preview on Google Distributed Cloud.

This deployment method allows organizations to retain frontier models entirely within their controlled environments, alongside their most sensitive data stores. The architecture incorporates NVIDIA Confidential Computing, a hardware-level security capability that keeps models inside a protected execution environment where prompts and fine-tuning data remain encrypted even while in use. This protection prevents unauthorized parties, including the cloud infrastructure operators themselves, from viewing or altering the underlying data.
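The usage pattern this enables is attestation-gated data release: the client verifies a hardware-signed report proving it is talking to a genuine protected environment before sending keys or sensitive data. A conceptual sketch of that flow; every name and value below is a hypothetical placeholder, not an actual NVIDIA or Google Cloud API.

```python
# Conceptual flow of attestation-gated data release in confidential
# computing. All names and values are hypothetical placeholders.

def fetch_attestation_report(instance_url: str) -> dict:
    # In a real system this is a hardware-signed report from the
    # protected environment; a canned value keeps the sketch runnable.
    return {"gpu_mode": "confidential", "image_digest": "sha256:abc..."}

def verify_report(report: dict, expected: dict) -> bool:
    # Real verification also checks the hardware signature chain;
    # this sketch only compares the measured fields we care about.
    return all(report.get(k) == v for k, v in expected.items())

def release_data_key(instance_url: str, wrapped_key: bytes) -> None:
    print(f"releasing key to verified environment at {instance_url}")

endpoint = "https://gdc.example.internal/gemini"   # hypothetical endpoint
report = fetch_attestation_report(endpoint)

if verify_report(report, expected={"gpu_mode": "confidential"}):
    release_data_key(endpoint, wrapped_key=b"\x00" * 32)
else:
    raise RuntimeError("attestation failed; sensitive data stays local")
```

The key design point is that the data-encryption key is released only after the environment proves what it is running, so the infrastructure operator never handles plaintext.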

For multi-tenant public cloud environments, a preview of Confidential G4 VMs equipped with NVIDIA RTX PRO 6000 Blackwell GPUs introduces the same cryptographic protections. This gives regulated industries access to high-performance hardware without violating data privacy standards. This release represents the first cloud-based confidential computing offering for NVIDIA Blackwell GPUs.

Operational Overhead in Agentic AI Training

Building multi-step agentic systems requires connecting large language models to complex application programming interfaces, maintaining continuous vector database synchronization, and actively mitigating algorithmic hallucinations during execution. To reduce this engineering burden, NVIDIA Nemotron 3 Super is now available on the Gemini Enterprise Agent Platform. The platform provides developers with tools to customize and deploy reasoning and multimodal models specifically designed for agentic tasks.
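At its core, an agentic system is a loop: the model decides whether to call a tool or answer, the runtime executes the call, and the result is fed back until a final answer emerges. A minimal sketch of that loop follows; the model client and tool registry are hypothetical stand-ins, not the actual Gemini Enterprise or Nemotron API.

```python
# Minimal agent loop: the model either calls a tool or answers.
# `call_model` and the tool registry are hypothetical stand-ins.
import json

def search_orders(customer_id: str) -> str:
    return json.dumps([{"order": "A-1001", "status": "shipped"}])  # canned data

TOOLS = {"search_orders": search_orders}

def call_model(messages: list[dict]) -> dict:
    # Placeholder: a real client would send `messages` to an LLM endpoint.
    # Here we hard-code one tool call followed by a final answer.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "search_orders", "args": {"customer_id": "C-42"}}
    return {"answer": "Order A-1001 has shipped."}

def run_agent(user_query: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_query}]
    for _ in range(max_steps):              # cap steps to bound runaway loops
        decision = call_model(messages)
        if "answer" in decision:
            return decision["answer"]
        result = TOOLS[decision["tool"]](**decision["args"])
        messages.append({"role": "tool", "content": result})
    return "Stopped: step budget exhausted."

print(run_agent("Where is my order?"))
```

Grounding each answer in tool output, and capping the number of steps, are the two simplest levers for containing hallucination during execution.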

The broader NVIDIA platform on Google Cloud is optimized for various models, including Google’s Gemini and Gemma families. This gives developers the tools to construct systems that reason, plan, and act. Training these models at scale introduces heavy operational overhead, particularly when managing cluster sizing and hardware failures during long reinforcement learning cycles.

Google Cloud and NVIDIA introduced Managed Training Clusters on the Gemini Enterprise Agent Platform. This includes a managed reinforcement learning API built with NVIDIA NeMo RL. The system automates cluster sizing, failure recovery, and job execution, allowing data science teams to concentrate on model quality rather than low-level infrastructure management.
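What a developer hands to such a service is essentially a declarative job spec, with sizing and recovery delegated to the platform. The sketch below illustrates the shape of that interaction; none of the class or field names are the actual Managed Training Clusters or NeMo RL API.

```python
# Hypothetical job spec for a managed RL fine-tuning run. Class and
# field names are illustrative, not the actual service API.
from dataclasses import dataclass

@dataclass
class ManagedRLJob:
    base_model: str
    reward_model: str
    algorithm: str = "grpo"      # hypothetical default policy-optimization method
    max_restarts: int = 3        # failure recovery is handled by the service

    def submit(self) -> None:
        # A real client would size the cluster, provision nodes, and
        # resubmit on hardware failure; here we just echo the spec.
        print(f"submitting {self.algorithm} job for {self.base_model}, "
              f"auto-recovery up to {self.max_restarts} restarts")

ManagedRLJob(base_model="gemma-3-27b", reward_model="my-reward-model").submit()
```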

Legacy Architecture Integration and Physical Simulations

The integration of machine learning into heavy industry and manufacturing presents a different class of engineering challenges. Connecting digital models to physical factory floors requires physically accurate simulation, massive compute power, and standardization across legacy data formats. NVIDIA’s AI infrastructure and physical AI libraries are now available on Google Cloud, providing the foundation for organizations to simulate and automate real-world manufacturing workflows.

Major industrial software providers, such as Cadence and Siemens, have made their solutions available on Google Cloud, accelerated by NVIDIA infrastructure. These tools power the engineering and manufacturing of heavy machinery, aerospace platforms, and autonomous vehicles. Manufacturing firms often run on decades-old product lifecycle management systems, making the translation of geometry and physics data difficult.

By utilizing NVIDIA Omniverse libraries and the open-source NVIDIA Isaac Sim framework via the Google Cloud Marketplace, developers can bypass some of these translation issues. This allows them to construct physically accurate digital twins and train robotics simulation pipelines prior to physical deployment.
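For orientation, a headless simulation run in Isaac Sim follows a consistent skeleton: create the application, build or load a scene, then step physics. The sketch below shows that pattern; import paths shift between Isaac Sim releases (newer builds expose `from isaacsim import SimulationApp`), so treat the module names as approximate rather than canonical.

```python
# Minimal headless Isaac Sim loop for a simulation-first robotics workflow.
# Module paths vary across Isaac Sim releases; treat them as approximate.
from omni.isaac.kit import SimulationApp

# The SimulationApp must be created before any other Isaac imports.
app = SimulationApp({"headless": True})

from omni.isaac.core import World

world = World()
world.scene.add_default_ground_plane()   # placeholder scene; a real digital
world.reset()                            # twin would load converted CAD assets

for step in range(100):                  # run physics without rendering
    world.step(render=False)

app.close()
```

Running the same scene headlessly at scale is what makes it practical to train and validate robot policies in simulation before any physical deployment.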

The developments outlined at the conference signal a continued focus by major cloud providers on closing the gap between raw compute performance and the practical, secure deployment of AI at industrial scale. Further previews and general availability timelines for both the A5X instances and the new confidential computing offerings are expected to be announced in the coming quarters, as these companies work to meet evolving enterprise demands for cost efficiency and data sovereignty.