Developers & Practitioners

Scaling the Next Generation of Global Innovation: How Google Supports Top Startups Around the World

Thu, 18 Jun 2026 12:51:00 +0000

In the high-stakes world of tech entrepreneurship, the leap from a brilliant prototype to a scalable, market-defining business can be brutal. Founders need much more than capital; they need deep architectural guidance, sovereign-level policy alignment, and technical systems engineered to enable rapid growth.

Joy’s Law states: "[N]o matter who you are, most of the smartest people work for someone else."

We recognize that true innovation inherently happens “elsewhere.” This philosophy drives our support of accelerators across a geographically diverse set of innovation markets, so that we can tap into this decentralized talent. For over a decade, our Google accelerator program has been a catalyst for exactly this transition. By bridging the gap between raw entrepreneurial ambition and Google’s world-class engineering ecosystem, the program has quietly built one of the most resilient, high-performing startup portfolios on Earth.

The Power of the Network: A Decade by the Numbers

While many startup accelerators see high failure rates, our accelerator program has set a high bar for long-term success. By pairing top-tier founders and CTOs with customized, deeply technical engagement from Google and with proven industry best practices, the program has consistently helped build highly valuable companies and products.

The scope of this global network is impressive:

Metric	Impact to Date
Global Footprint	2,011 startups supported across 88 countries
Program Experience	144 cohorts graduated over 10 years
Survival Rate	93% portfolio survival rate
Financial Momentum	$46.3B in funding raised; $135.1B collective portfolio valuation
Startup Job Creation	305,900 employees across the entire startup portfolio

The Developer Value-Add: By design, this isn't a high-level business bootcamp. Each accelerator startup identifies a deep technical problem and then works to solve it with bespoke support from Google, including access to Google engineers and product managers as well as to our platforms and tools. From advising on architectures to optimizing AI model pipelines, Google experts work directly with the founding teams to tackle some of their most complex technical hurdles.

Strategic Momentum: Geopolitics, Green Infrastructure, and Robotics

The startup ecosystem is shifting rapidly, and our accelerator program is evolving with it. This year, Google launched new initiatives to support global economic development and to advance critical environmental infrastructure. A few examples:

Sovereign-Level Policy & Strategic Wins

Australia: Accelerator alumni have anchored the Google AI stack in the country's national R&D strategy, engaging directly with Members of Parliament in Canberra.
Canada: Innovation, Science and Economic Development Canada (ISED) officially recognized and cited the impact of the Canada accelerator program in its formal report for the G7 Summit.

Cutting-Edge Frontier Programs

This year marks a major expansion into specialized, frontier tech verticals:

The Google DeepMind Accelerator (Europe): Dedicated strictly to hardening technical builds for AI-native robotics companies, effectively bridging the gap between lab prototyping and commercial market success.
The GDM Accelerator (AI for Planet) in APAC: A joint initiative between Google DeepMind and Google's Sustainability teams. The program focuses heavily on biodiversity foundation models to position Google at the forefront of the critical ESG (Environmental, Social, and Governance) infrastructure market.
Japan Relaunch: Marking a major strategic re-entry into one of Asia's most vital technology hubs.

The hive mind opportunity

To maximize the power of this unique network, earlier this year we merged our separate regional alumni networks into a Unified Alumni Community. It now brings together more than 1,750 startups and 3,000 founders across 90+ countries through shared online channels and in-person events. There, founders get access to Google senior leadership and our newest models and technology, have opportunities to directly influence the development of new Google products so that they better support their businesses’ growth, and learn from and support each other.

Don't Miss It: Upcoming Demo Days

The culmination of each accelerator journey is Demo Day, where top-tier cohorts showcase their technical builds and new market-defining concepts. You can watch these milestones streamed live via the Google for Startups events on YouTube. Mark your calendar for the remaining 2026 showcases:

Summer & Fall 2026

Africa Accelerator: June 19
Middle East, North Africa, and Turkey Accelerator: June 26
Korea Accelerator: July 15
Brazil Accelerator: July 16
Europe and Israel DeepMind Accelerator (Robotics): September 11
India: September 30

Winter 2026

India Accelerator: November 4
Southeast Asia Accelerator: November 13
North America Accelerator (Energy): November 19
South Africa Accelerator: December 11
Europe and Israel (Energy): December 11
Global Google.org Accelerator (Government Innovation): December 11

Open & Upcoming Applications

If you are a founder or CTO looking to radically scale your technical infrastructure, sharpen your product-market fit, and gain equity-free support from Google's global talent pool, now is the time to apply.

Applications Open Right Now:

GFSA Southeast Asia (Leverage the newly launched AI Startup Innovation Corridor connecting SEA to Silicon Valley)
GFSA China
Google.org Accelerator: AI for Science

Agent Factory Recap: 100X engineering with AI agents in Google Antigravity 2.0

Thu, 18 Jun 2026 07:00:00 +0000

In this episode of the Agent Factory, I sat down with Rody Davis, one of Google’s top agentic engineers. We dive into the massive shift from traditional IDEs to agent-first platforms, the reality of code reviews in an AI-driven world, and how to use "skills" to perform at a 100X level.

This post guides you through the key ideas from our conversation. Use it to quickly recap topics or dive deeper into specific segments with links and timestamps.

Google Antigravity 2.0 - What is it?

Antigravity 2.0 has evolved from a simple agentic IDE into a full-scale agent-first platform. It now consists of four core pillars: a standalone desktop Agent Manager for orchestration, a robust CLI for server-side work, an SDK for custom Python-based workflows, and a specialized IDE. This unbundled approach allows developers to compose their own environment, managing multiple folders and complex project structures without being forced into a single-workspace layout.

Rody Davis on 100X Engineering

We explored the strategies elite engineers use to scale their impact and reduce the "cognitive toil" of daily development.

Scaling Impact and Reducing Toil

Timestamp: 01:55

Rody explains that AI isn't just about writing code; it's about accelerating the entire lifecycle. He uses agents to write richer test suites and to prototype multiple versions of an app before committing to a framework. By offloading toil, such as building marketing sites, he can focus on high-level architecture and problem-solving.

Skills as "Context Cheat Sheets"

Timestamp: 03:05

A core philosophy in Rody’s workflow is the use of "Skills." He views skills as a way to compress context for the model. "It’s literally a cheat sheet for the agent," Rody notes. By providing the agent with specific design systems or API documentation, the model becomes significantly faster and more accurate, avoiding the latency of searching through massive, unorganized docs.

Customizations, Skills, and MCP Servers

Timestamp: 04:17

Rody walks us through the customizations tab in Antigravity 2.0, showing how to extend an agent's capabilities:

Android CLI: Building and deploying mobile apps directly from the command line.
Modern Web Guidance: Grounding the agent in the latest CSS and accessibility standards.
MCP Servers: Using the Model Context Protocol to enable features like hot reloading for Flutter and Dart.

The Bonsai Approach to Code Review

Timestamp: 05:27

Rody compares maintaining a codebase to being a Bonsai artist: constantly pruning to keep things simple. He advocates for flat architectures where state, UI, and data are strictly separated. This makes it easier for a human to "steer" the agent; if the agent starts putting files in the wrong place, the architectural violation is immediately obvious.

Do you review 100% of agent-generated code?

Timestamp: 07:11

Rody’s answer depends on the task. For a marketing site, he focuses on the visual output rather than the code. However, for backend logic, he cares deeply about API contracts and schemas. He recommends writing the first example yourself so the agent can simply "copy the pattern" for the rest of the codebase.

Building Extensions to Solve Daily Friction

Timestamp: 09:05

To solve the problem of managing files across multiple Git projects, Rody used Antigravity to build a custom macOS Finder extension in Swift. This tool allows him to filter files by time boxes (today, last week, etc.), demonstrating how agents can build specialized utilities that reduce daily friction.

Do AI engineers still write code by hand?

Timestamp: 10:22

"Oh yeah," Rody says. He still loves the syntax of languages like Go and the challenge of controlling computers. He believes it's vital to understand the building blocks deeply so that when you face a problem two years down the road, you know exactly which "old project" to reach back for.

Powering Personal Websites with Gemma 4

Timestamp: 11:42

Rody showcases his personal website, which uses Gemma 4 and Embedding Gemma to provide dynamic content recommendations offline. By vectorizing post summaries at compile time, the site can suggest related content via a local vector database without needing a live backend server.
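The compile-time idea can be sketched in a few lines. This is a minimal illustration, not Rody's implementation: it assumes each post summary has already been turned into a vector by an embedding model at build time (the slugs and three-dimensional vectors below are made-up placeholders, not Embedding Gemma output), and it ranks other posts by cosine similarity entirely on the client, with no backend call.

```python
import math

# Hypothetical precomputed embeddings, one per post summary.
# In a real build these would be produced by an embedding model at compile time.
POST_EMBEDDINGS = {
    "flutter-hot-reload": [0.9, 0.1, 0.0],
    "dart-mcp-server": [0.8, 0.2, 0.1],
    "go-cli-tips": [0.1, 0.9, 0.2],
    "gemma-on-device": [0.2, 0.3, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_posts(slug, k=2):
    """Return the k posts whose embeddings are closest to the given post's."""
    query = POST_EMBEDDINGS[slug]
    scored = [
        (cosine_similarity(query, vec), other)
        for other, vec in POST_EMBEDDINGS.items()
        if other != slug
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]


if __name__ == "__main__":
    print(related_posts("flutter-hot-reload"))
```

A linear scan like this is adequate for a personal site with a few hundred posts; a larger corpus would typically move to a proper vector index.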

The Factory Floor

The Factory Floor is our segment for getting hands-on. Here, we moved from high-level concepts to practical code with live demos.

Multi-Agent Parallelism in Action

Timestamp: 14:02

In this demo, Rody uses a single stream-of-thought voice prompt to build a full-stack application. We watched as Antigravity:

Spun up parallel sub-agents, including dedicated DevOps and QA engineer agents (see 19:48).
Built a multilingual note-taking app using Vite, Go, and SQLite.
Orchestrated the entire stack via Docker Compose.
Localized the app into five different languages simultaneously.

Unbundling the IDE Ecosystem

Timestamp: 15:35

We discussed why Google separated the IDE from the Agent Manager. Rody highlights that this unlocks different workflows: the CLI is perfect for SSH sessions on a Raspberry Pi, while the Agent Manager handles general knowledge work and orchestration across multiple folders.

Turning Documentation into Reusable Skills

Timestamp: 25:41

Rody shares his process for turning documentation into skills. He wrote a Go CLI that parses websites into markdown, allowing him to install hundreds of skills for the sites he visits frequently. This ensures the agent always has access to the specific version of the docs he is using.
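Rody's tool is written in Go, and its internals aren't covered in the episode. As a rough illustration of the same idea, here is a hypothetical Python sketch using only the standard library: it keeps headings, paragraphs, list items, and preformatted code from an HTML page, drops navigation and scripts, and emits markdown that could be saved as a skill file. The class and function names are invented for this example.

```python
from html.parser import HTMLParser
from typing import List, Optional


class MarkdownExtractor(HTMLParser):
    """Convert a subset of HTML (h1-h3, p, li, pre) into markdown lines."""

    PREFIXES = {"h1": "# ", "h2": "## ", "h3": "### ", "p": "", "li": "- "}
    SKIPPED = {"script", "style", "nav"}
    FENCE = "`" * 3  # markdown code fence

    def __init__(self) -> None:
        super().__init__()
        self.lines: List[str] = []
        self._tag: Optional[str] = None  # block element currently being captured
        self._text: List[str] = []
        self._skip = 0  # nesting depth inside skipped elements

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip += 1
        elif self._skip == 0 and (tag in self.PREFIXES or tag == "pre"):
            self._tag, self._text = tag, []

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self._skip = max(0, self._skip - 1)
        elif tag == self._tag:
            text = "".join(self._text)
            if tag == "pre":
                # Keep code verbatim inside a fenced block.
                self.lines.append(self.FENCE + "\n" + text.strip("\n") + "\n" + self.FENCE)
            elif text.strip():
                # Collapse whitespace in prose elements.
                self.lines.append(self.PREFIXES[tag] + " ".join(text.split()))
            self._tag = None

    def handle_data(self, data):
        if self._tag is not None and self._skip == 0:
            self._text.append(data)


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.lines)
```

Pinning each generated file to the docs version you actually use is what keeps the agent from mixing APIs across releases.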

Rapid Fire: Future Tech Predictions

Timestamp: 27:35

We put Rody on the spot with some controversial takes:

Vibe Coding: Rody believes a non-technical founder will launch a company using only vibe coding by 2026, but the real test will be maintaining it in years 2 through 5.
Production Failures: Rody agrees that vibe coding will cause significant production failures, leading to a new hot job for software engineers: consulting to solve those failures.
Codebase Health: Rody argues that poor codebase health, not context windows, is the biggest bottleneck in AI speed.

Grounding Yourself in a Changing Landscape

Timestamp: 31:10

Rody advises engineers to focus on why they were hired: to solve problems and engineer things that didn't exist before. He suggests using AI to provide better communication handoffs between colleagues, making artifacts so easy to approve that they are "ready to sign off" the moment they are handed over.

Conclusion

The era of agentic engineering is here, but as Rody Davis demonstrated, it requires more architectural discipline, not less. By treating your codebase like a Bonsai tree and your agents like an orchestra, you can move past the "toil" and focus on building the frameworks of the future.

Your turn to build

Are you ready to build anything? We’ve officially launched the #NapkinChallenge. Take a handwritten sketch of an app idea, use Antigravity 2.0 to build it, and share your creation on social media.

Try Antigravity 2.0: antigravity.google
Join the Challenge: Napkin Challenge Details
Rody’s personal website, GitHub repo, and skills

Connect with us

Rody Davis → X, LinkedIn
Shir Meir Lador → X, LinkedIn

Cloud Network Insights: end-to-end observability for the Cross-Cloud Network

Wed, 17 Jun 2026 19:30:00 +0000

In today’s digital landscape, the network is no longer confined to a single data center or even a single cloud provider. Enterprises are increasingly adopting cross-cloud strategies, connecting Google Cloud workloads to on-premises environments, other clouds like AWS and Azure, and a vast array of internet-facing applications. While this flexibility drives innovation, it can also introduce significant operational complexity. When a user experiences degradation in application performance, the critical question remains: Is it the network, the application, or something else?

We are excited to announce the general availability of Cloud Network Insights, an out-of-the-box, Google Cloud-native solution that provides comprehensive visibility into network and digital experience performance across complex multi-cloud and hybrid environments.

Closing the visibility gap with active monitoring

Cloud Network Insights, offered in partnership with Broadcom AppNeta, expands your observability beyond Google Cloud to your entire global deployment. By utilizing active synthetic probing, the solution monitors network routes even when no user traffic is present, allowing teams to be proactive rather than reactive.

Whether the source of degradation is in the cloud, on-premises data centers, internet applications, ISPs, or last-mile connectivity, Cloud Network Insights helps you pinpoint the exact location of the bottleneck.

Cloud Network Insights integrates directly into the Google Cloud Observability suite, bringing sophisticated network intelligence into the tools you already use. With Cloud Network Insights, you get:

End-to-end network path visibility: Gain a hop-by-hop visualization of the network path between your sources and destinations. Monitor critical metrics like round-trip time (RTT), packet loss, and jitter across networks you don’t directly manage.
Digital experience insights: Go beyond the network layer to monitor digital experience for web applications. Measure DNS resolution times, HTTP response codes, and full browser page-load times to identify whether an application's degradation is due to the network or the application itself.
Proactive detection and alerting: Use synthetic testing to identify performance dips before they impact your customers. Alarms are integrated with Cloud Monitoring and Cloud Logging, enabling alerting via email, Slack, or PagerDuty.
SLA validation: Arm your team with the data needed to verify if ISPs and service providers are meeting their performance commitments.
Rapid root-cause analysis: Quickly differentiate between network problems, application-level issues, or browser performance impacts.
Integrated monitoring: Access metrics and logs directly within Google Cloud, leveraging Cloud Monitoring and Cloud Logging for dashboards and alerting. Utilize the open partner ecosystem of Google Cloud as well as support for the OpenTelemetry protocol for metrics and logs, allowing direct ingestion by OTel SDKs and collectors.
Agentic workload monitoring: Use synthetic testing to monitor connectivity and network performance to help ensure optimal connectivity to your agents and tools.

Network performance and multi-path routes to/from Google Cloud, AWS, and Azure in one view

How it works: active synthetic probing

Cloud Network Insights uses active synthetic probing technology that consists of three main components:

Monitoring Points: You deploy lightweight software agents, called Monitoring Points, into critical network segments, such as a central VPC, a remote branch, or an on-premises data center. These can be deployed as containers or virtual machines.
Synthetic probes: These Monitoring Points send small, frequent bursts of synthetic traffic (simulating a user or application) to a target destination. This allows you to monitor performance 24/7, even when no real users are on the network.
Data synchronization: The Monitoring Points send real-time performance telemetry to a central backend service. This data is then synchronized back to Google Cloud, with metrics exported to Cloud Monitoring, and alarms and events sent to Cloud Logging.
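To make the path metrics concrete, here is a minimal sketch of how a burst of synthetic probes can be reduced to RTT, packet loss, and jitter. It is not how AppNeta Monitoring Points are implemented: it assumes a TCP handshake as the probe (raw ICMP sockets usually need elevated privileges) and uses one common simplification of jitter, the mean absolute difference between consecutive answered RTTs. RFC 3550 defines a smoothed variant for RTP streams.

```python
import socket
import statistics
import time
from dataclasses import dataclass
from typing import List, Optional


def tcp_connect_rtt(host: str, port: int, timeout: float = 2.0) -> Optional[float]:
    """Time one TCP handshake in milliseconds; None means the probe got no answer."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        return None
    return (time.perf_counter() - start) * 1000.0


@dataclass
class PathSummary:
    rtt_ms: Optional[float]     # mean RTT of the probes that got an answer
    loss_pct: float             # percentage of probes with no answer
    jitter_ms: Optional[float]  # mean |RTT[i] - RTT[i-1]| over answered probes


def summarize(samples: List[Optional[float]]) -> PathSummary:
    """Reduce one burst of probe results to RTT, loss, and jitter."""
    if not samples:
        raise ValueError("need at least one probe result")
    answered = [s for s in samples if s is not None]
    loss_pct = 100.0 * (len(samples) - len(answered)) / len(samples)
    rtt_ms = statistics.fmean(answered) if answered else None
    jitter_ms = (
        statistics.fmean(abs(b - a) for a, b in zip(answered, answered[1:]))
        if len(answered) > 1
        else None
    )
    return PathSummary(rtt_ms, loss_pct, jitter_ms)
```

Usage: `summarize([tcp_connect_rtt("203.0.113.10", 443) for _ in range(10)])` (placeholder address). Passing an IP address rather than a host name keeps DNS resolution out of the measured time.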

Core capabilities

Cloud Network Insights supports two primary types of monitoring to give you a full picture of your infrastructure:

1. Network performance monitoring (Layers 3 and 4)

This mode provides a hop-by-hop visualization of the network path between a source and a destination:

Metrics captured: Round-trip time (RTT), packet loss, jitter, and path changes.
Single-ended mode: The agent probes an external target (like a URL, IP address or an API endpoint) that doesn't have a Monitoring Point installed.
Dual-ended mode: The Monitoring Point probes another Monitoring Point. This provides richer data, including precise one-way latency and the ability to detect asymmetric routing (when data takes a different path going out than it does coming back).
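The asymmetric-routing check above can be illustrated with a small sketch. It assumes you already have the hop list for each direction, for example from path traces run at both Monitoring Points; the hop names below are hypothetical. A symmetric route is simply the forward path walked backwards. One-way latency likewise needs a timestamp taken at the far end, which a single-ended probe has no way to collect.

```python
from typing import List


def is_asymmetric(forward: List[str], reverse: List[str]) -> bool:
    """True if the B->A path is not the A->B path walked backwards.

    Hops are identified by router or network names rather than interface IPs,
    because a router often answers from a different interface in each direction.
    """
    return forward != list(reversed(reverse))


def divergent_hops(forward: List[str], reverse: List[str]) -> List[str]:
    """Hops that appear in only one of the two directions."""
    return sorted(set(forward) ^ set(reverse))


def one_way_latency_ms(sent_at_s: float, received_at_s: float) -> float:
    """One-way delay from sender and receiver timestamps.

    Only meaningful when both endpoints' clocks are synchronized.
    """
    return (received_at_s - sent_at_s) * 1000.0


if __name__ == "__main__":
    # Hypothetical hop names for a VPC-to-AWS path.
    forward = ["vpc-gw", "isp-a", "ix-1", "aws-edge"]
    reverse = ["aws-edge", "ix-2", "isp-b", "vpc-gw"]
    print(is_asymmetric(forward, reverse), divergent_hops(forward, reverse))
```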

Network path metrics in Google Cloud console

2. Digital experience monitoring (Layer 7)

With digital experience monitoring, you can track the end-to-end experience of a web application. Here, you can choose from:

Browser mode: Drives a real web browser with Selenium to load full web pages, execute JavaScript, and render content. It measures complete page-load times to validate the actual user experience.
HTTP mode: Sends synthetic HTTP/S requests to a URL or API endpoint. This is a lightweight check for server availability, response time, and DNS/TLS performance.
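An HTTP-mode check can be approximated with the Python standard library. The sketch below illustrates the idea and is not the product's probe: it times DNS resolution on its own, then issues a GET and records the status code and total time. Proxy environment variables are ignored so the direct path is measured. The `local_demo` helper is hypothetical and starts a throwaway server on 127.0.0.1, so the example runs without external network access.

```python
import http.server
import socket
import threading
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit

# Ignore proxy environment variables so the probe measures the direct path.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


@dataclass
class HttpCheck:
    dns_ms: float          # time spent resolving the host name
    status: Optional[int]  # HTTP status code, or None if no response arrived
    total_ms: float        # DNS lookup plus request/response time


def http_check(url: str, timeout: float = 5.0) -> HttpCheck:
    start = time.perf_counter()
    try:
        socket.getaddrinfo(urlsplit(url).hostname, None)
    except OSError:  # the name could not be resolved
        elapsed = (time.perf_counter() - start) * 1000.0
        return HttpCheck(elapsed, None, elapsed)
    dns_ms = (time.perf_counter() - start) * 1000.0
    try:
        # The opener resolves the name again, so total_ms includes a second lookup.
        with _OPENER.open(url, timeout=timeout) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # 4xx/5xx: the server answered, just not with success
    except OSError:  # URLError, timeouts, refused connections
        status = None
    return HttpCheck(dns_ms, status, (time.perf_counter() - start) * 1000.0)


class _OkHandler(http.server.BaseHTTPRequestHandler):
    """Demo-only handler that answers every GET with 200 OK."""

    def do_GET(self) -> None:
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # silence per-request logging


def local_demo() -> HttpCheck:
    """Run http_check against a throwaway server on 127.0.0.1."""
    server = http.server.HTTPServer(("127.0.0.1", 0), _OkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return http_check(f"http://127.0.0.1:{server.server_address[1]}/healthz")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(local_demo())
```

Against a real target you would call, for example, `http_check("https://www.example.com/")`, where the URL is a placeholder.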

Intelligence and automation

Cloud Network Insights also offers a variety of monitoring and troubleshooting capabilities.

Proactive alarms: Cloud Network Insights leverages auto-baselining to establish dynamic performance thresholds based on your historical metric data. If a metric deviates from your defined parameters, the system instantly triggers an event in Google Cloud, routing alerts directly to your team via email, Slack, or PagerDuty.
Monitoring policies: You can automate monitoring setups across large-scale environments by defining policies that dynamically create or remove paths based on custom tags. For instance, you can automatically track a core web application's performance from specific geographic regions.
Root-cause analysis: Because Cloud Network Insights extends visibility into traditionally "unwatched" areas like ISPs and transit networks, it instantly pinpoints whether a slowdown is occurring within Google Cloud, at the ISP level, or inside another cloud environment like AWS or Azure.
AI-driven insights: With integration to Gemini Cloud Assist, you can use natural language to interrogate Cloud Network Insights telemetry alongside your broader infrastructure data. Rather than manually pivoting between dashboards, ask Gemini to cross-reference specific Cloud Network Insights metrics against other Google Cloud metrics, reducing mean time to resolution (MTTR).
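The auto-baselining described above is only summarized here. As a rough illustration of the general technique, and not the product's algorithm, a dynamic threshold can be derived from recent history, for example the rolling mean plus k standard deviations. The window size of 20 samples and k = 3 below are arbitrary assumptions, and only upward deviations are flagged because higher values are worse for metrics like RTT.

```python
import statistics
from collections import deque
from typing import Deque, Iterable, List, Tuple


def baseline_alerts(
    values: Iterable[float], window: int = 20, k: float = 3.0
) -> List[Tuple[int, float, float]]:
    """Return (index, value, threshold) for points above mean + k * stdev.

    The baseline is computed from the previous `window` points, and no alerts
    are raised until a full window of history has been collected.
    """
    history: Deque[float] = deque(maxlen=window)
    alerts = []
    for i, value in enumerate(values):
        if len(history) == window:
            threshold = statistics.fmean(history) + k * statistics.stdev(history)
            if value > threshold:
                alerts.append((i, value, threshold))
        history.append(value)  # this sketch lets anomalies into the baseline
    return alerts
```

Letting flagged points into the history keeps the sketch short; a production system might exclude them so that a sustained incident does not quietly raise its own threshold.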

What customers are saying

We are already seeing strong interest from customers looking to simplify their cross-cloud operations. Organizations like Sabre and Pexip are already using Cloud Network Insights to gain clarity in their hybrid environments.

"In an environment as complex and high-scale as Sabre’s, total visibility isn't just a luxury — it's a requirement for operational resilience. Cloud Network Insights will enable us to further shift our posture towards proactive optimization. By providing granular, real-time telemetry across our global cloud footprint, it helps eliminate the traditional 'black box' of the network, allowing our teams to resolve bottlenecks before they impact the traveler experience." - Alfredo Rodriguez, VP of Cloud and Infrastructure, Sabre

“Cloud Network Insights closes the 'visibility gap' between the private corporate network and the public cloud, empowering our joint customers to pinpoint performance bottlenecks in seconds rather than hours.” - Alan Davidson, CIO, Broadcom

Get started today

Navigating complex digital ecosystems shouldn't mean sacrificing visibility. Cloud Network Insights bridges the gap across multi-cloud and hybrid environments by combining deep network performance metrics with digital experience monitoring. Coupled with direct integrations into Google Cloud Observability and Gemini Cloud Assist, your teams are empowered with intelligent alerting, robust SLA validation, and rapid root-cause analysis. We look forward to helping you gain a clearer, unified view of your Cross-Cloud Network.

You can get started in the Google Cloud console today. To learn more:

Explore our product documentation for deep dives into deploying Monitoring Points and configuring policies.
Check out the latest release notes to stay updated on new features.
Watch the overview video
Hear more about the partnership between Google Cloud and Broadcom:

Build and Deploy a Remote MCP Server to GKE in 30 Minutes

Wed, 17 Jun 2026 00:00:00 +0000


Integrating context from tools and data sources into LLMs can be challenging, which makes AI agents harder to develop. To address this challenge, Anthropic introduced the Model Context Protocol (MCP), which standardizes how applications provide context to these models. Developers often want to build an MCP server for their APIs so that other developers can use those APIs as context in their own applications. Google Kubernetes Engine (GKE) provides a scalable, reliable, and secure environment for deploying these remote MCP servers.

This guide shows the straightforward process of setting up a secure remote MCP server on GKE.

MCP transports

The Model Context Protocol follows a client-server architecture. It initially only supported running the server locally using the stdio transport. The protocol has since evolved and now supports remote access transports, specifically Streamable HTTP.

With Streamable HTTP, the server operates as an independent process that can handle multiple client connections. This transport uses HTTP POST and GET requests. The server must provide a single HTTP endpoint path that supports both POST and GET methods, such as https://example.com/mcp. You can learn more about the different transports in the official documentation.
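To make the single-endpoint design concrete, here is a sketch of the first message a client sends over Streamable HTTP: a JSON-RPC `initialize` request POSTed to the MCP endpoint. The `Accept` header lists both `application/json` and `text/event-stream` because the server may answer with either a plain JSON body or an SSE stream. The URL, client name, and protocol version string are assumptions for illustration; check which specification revision your server implements. The request is only built here, not sent.

```python
import json
import urllib.request

MCP_URL = "https://example.com/mcp"  # placeholder endpoint from the text above


def build_initialize_request(url: str = MCP_URL) -> urllib.request.Request:
    """Build (but do not send) an MCP initialize call as an HTTP POST."""
    message = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumed spec revision
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(message).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # The server may reply with plain JSON or open an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
    )
```

The server's response may carry an `Mcp-Session-Id` header, which the client then includes on later requests; a GET to the same endpoint lets the client open an SSE stream for server-initiated messages.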

Benefits of running an MCP server on GKE

Running an MCP server remotely on GKE provides several architectural benefits:

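Scalability: GKE Autopilot is built to handle highly variable traffic. Because the tools in this example keep no state between calls, GKE can scale the server horizontally to absorb spikes in demand. Note that the Streamable HTTP transport can also track client sessions through an Mcp-Session-Id header, so before running several replicas behind a load balancer, check whether your framework keeps sessions in memory and, if so, use its stateless mode or session affinity.
Centralized access: Teams can share a centralized MCP server, so developers can connect from local machines, agents, or pipelines instead of running redundant local servers. Updates to the central server immediately benefit everyone.
Enhanced security: The Kubernetes Gateway API combined with SSL/TLS certificates provides an easy way to enforce encrypted traffic, so clients connect to the MCP server over HTTPS. Encryption alone does not authenticate callers, so add an authentication layer if you need to prevent unauthorized access.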

Prerequisites

Before starting, ensure the following tools are installed:

Python 3.10 or higher
uv (for package and project management, see the installation documentation)
Google Cloud SDK (gcloud)
kubectl command-line tool

Installation

Prepare environment variables

```bash
# Reads the project currently configured in gcloud; if none is set yet,
# replace the command substitution with your project ID.
export PROJECT_ID=$(gcloud config get-value project)
export REGION=us-central1
```

Create a folder, mcp-on-gke, to store the code for the server and deployment.

```bash
mkdir mcp-on-gke && cd mcp-on-gke
```

Now configure the Google Cloud credentials and set the active project.

```bash
gcloud auth login
gcloud config set project $PROJECT_ID
```

Initiate the GKE Autopilot cluster creation in the background. This process takes a few minutes, so starting it now allows the cluster to provision while you complete the rest of the setup. Make sure to use an Autopilot version that ensures Cost-Optimized Compute (CCOP) is enabled for fast autoscale.

```bash
gcloud container clusters create-auto mcp-cluster \
  --region $REGION \
  --release-channel rapid \
  --async
```

Use uv to create a project, which will generate a pyproject.toml file.

```bash
uv init
```

Next, create the additional files needed: server.py for the MCP server code, test_mcp_server.py for testing, and a Dockerfile for the container deployment.

Math MCP server

Large language models are excellent at non-deterministic tasks, such as generating text, summarizing ideas, and reasoning about concepts. However, they can be unreliable for deterministic tasks like math operations. To solve this, developers can create tools that provide valuable context. Using FastMCP, a framework for building MCP servers in Python, it is possible to create a simple math server with two tools: add and subtract.

First, add FastMCP as a dependency. The asyncio module used by the scripts ships with Python's standard library, so it does not need to be installed separately.

```bash
uv add fastmcp
```

Copy the following code into server.py to create the server.

```python
from fastmcp import FastMCP
from starlette.requests import Request
from starlette.responses import PlainTextResponse
import asyncio
import logging

logger = logging.getLogger(__name__)
logging.basicConfig(format="[%(levelname)s]: %(message)s", level=logging.INFO)

mcp_port = 3000

# Initialize the FastMCP server
server = FastMCP("Math Server")


@server.tool()
def add(a: int, b: int) -> int:
    """Add two numbers together."""
    return a + b


@server.tool()
def subtract(a: int, b: int) -> int:
    """Subtract the second number from the first."""
    return a - b


@server.custom_route("/healthz", methods=["GET"])
async def health_check(request: Request) -> PlainTextResponse:
    """Health check endpoint used by the Kubernetes liveness and readiness probes."""
    return PlainTextResponse("OK")


if __name__ == "__main__":
    logger.info(f"Starting MCP server on port {mcp_port}")
    # Bind to 0.0.0.0 (not localhost) so the server is reachable from outside
    # the container, e.g. by the Kubernetes Service and health probes.
    asyncio.run(
        server.run_async(
            transport="streamable-http",
            host="0.0.0.0",
            port=mcp_port,
        )
    )
```

This example uses the streamable-http transport, which is recommended for remote servers. The script encapsulates the logic needed to run a scalable MCP endpoint.

Testing the MCP server locally

Create the test_mcp_server.py script to connect to the MCP server. It lets you test the server locally before deploying it to GKE.

```python
import asyncio

from fastmcp import Client

# Connect to the MCP server started locally with `uv run server.py`.
# The local server speaks plain HTTP, so the URL uses http://, not https://.
client = Client("http://localhost:3000/mcp")


async def test_remote_server():
    async with client:
        # Basic server interaction
        await client.ping()

        # List available tools
        tools = await client.list_tools()
        print(f"Available tools: {tools}\n")

        # Execute add operation
        result = await client.call_tool("add", {"a": 5, "b": 3})
        print(f"Result of addition: {result}\n")

        # Execute subtract operation
        result = await client.call_tool("subtract", {"a": 5, "b": 3})
        print(f"Result of subtraction: {result}\n")


if __name__ == "__main__":
    asyncio.run(test_remote_server())
```

Run the MCP server locally to test the connection:

```bash
uv run server.py
```

Then execute the test script in a new terminal to verify the connection.

```bash
uv run test_mcp_server.py
```

The output should list the available tools and print the results of invoking the add and subtract tools, confirming that the MCP server is functional.

Building the container image

To speed up the deployment process, build the container image while the cluster is still creating.

First, prepare the Dockerfile:

```dockerfile
FROM python:3.10-slim
COPY --from=ghcr.io/astral-sh/uv:0.4.15 /uv /bin/uv
WORKDIR /app
COPY pyproject.toml .
COPY server.py .
RUN uv sync
CMD ["uv", "run", "server.py"]
```

Now, set up the Artifact Registry and build the container image.

Set up Artifact Registry

```bash
gcloud artifacts repositories create mcp-repo \
  --repository-format=docker \
  --location=$REGION
```

Build and push the image in parallel

```bash
gcloud builds submit --tag $REGION-docker.pkg.dev/$PROJECT_ID/mcp-repo/math-mcp-server:latest
```

Once the image build is complete, verify that the cluster is ready and retrieve its credentials. If the cluster status is not RUNNING, wait until it is.

```bash
gcloud container clusters list
gcloud container clusters get-credentials mcp-cluster --region $REGION
```

Deploying to GKE with Gateway API and SSL

The next step is to deploy the server workloads and expose them securely using the Kubernetes Gateway API rather than the legacy Ingress API, with SSL/TLS certificates encrypting the traffic.

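Create a deployment.yaml file to define the Kubernetes Deployment and Service. Replace $REGION and $PROJECT_ID in the image path with your actual region and project ID, because kubectl does not expand environment variables inside manifests (alternatively, render the file with envsubst before applying it).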

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mcp-server
  template:
    metadata:
      labels:
        app: mcp-server
    spec:
      containers:
      - name: mcp-server
        image: $REGION-docker.pkg.dev/$PROJECT_ID/mcp-repo/math-mcp-server:latest
        ports:
        - containerPort: 3000
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /healthz
            port: 3000
          initialDelaySeconds: 15
          periodSeconds: 20
        readinessProbe:
          httpGet:
            path: /healthz
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: mcp-service
spec:
  selector:
    app: mcp-server
  ports:
  - port: 80
    targetPort: 3000
```

Apply this configuration to the cluster:

```bash
kubectl apply -f deployment.yaml
```

Check that the pods are up and running:

```bash
kubectl get pods
```

To confirm that the remote MCP server is reachable, forward a local port to the Service:

```bash
kubectl port-forward svc/mcp-service 8080:80
```

The port-forward keeps running in the foreground, so open a second terminal. Set the MCP server URL in the test script to http://localhost:8080/mcp, then run the script to verify the connection:

```bash
uv run test_mcp_server.py
```

Now let's secure the connection. To do so, we'll use a Google-managed SSL certificate and attach it to a Gateway API resource. First, reserve a static IP address for your load balancer:

```bash
gcloud compute addresses create mcp-server-ip --global
export MCP_SERVER_IP=$(gcloud compute addresses describe mcp-server-ip --global --format="value(address)")
echo "Your IP: $MCP_SERVER_IP"
```

Point your domain's DNS A record (for example, mcp.yourdomain.com) at $MCP_SERVER_IP.
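Google-managed certificates only finish provisioning after the domain resolves to the load balancer's IP address, so it can save time to confirm the A record before moving on. A minimal Go sketch of that check follows; the lookup function is injected so it can be tested without real DNS, and the program name and usage line are hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// resolvesTo reports whether host resolves to wantIP. The lookup function
// is injected so the check can be unit-tested without real DNS; pass
// net.LookupHost in practice.
func resolvesTo(lookup func(string) ([]string, error), host, wantIP string) (bool, error) {
	want := net.ParseIP(wantIP)
	if want == nil {
		return false, fmt.Errorf("invalid IP %q", wantIP)
	}
	addrs, err := lookup(host)
	if err != nil {
		return false, fmt.Errorf("lookup %s: %w", host, err)
	}
	for _, a := range addrs {
		if ip := net.ParseIP(a); ip != nil && ip.Equal(want) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Usage (hypothetical): go run . mcp.yourdomain.com "$MCP_SERVER_IP"
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: dnscheck HOST IP")
		os.Exit(2)
	}
	ok, err := resolvesTo(net.LookupHost, os.Args[1], os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if !ok {
		fmt.Println("A record does not point at the reserved IP yet")
		os.Exit(1)
	}
	fmt.Println("A record points at the reserved IP")
}
```

DNS changes can take time to propagate, so a negative result right after editing the record is not unusual.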

Create a Google-managed certificate. Replace mcp.yourdomain.com with your actual domain.

```bash
gcloud compute ssl-certificates create mcp-cert --domains mcp.yourdomain.com --global
```

Create a gateway.yaml file to provision the load balancer and configure Transport Layer Security (TLS) termination.

```yaml
# Gateway: HTTPS load balancer with the managed certificate and static IP
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: mcp-gateway
spec:
  gatewayClassName: gke-l7-global-external-managed
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    tls:
      mode: Terminate
      options:
        networking.gke.io/pre-shared-certs: mcp-cert
  addresses:
  - type: NamedAddress
    value: mcp-server-ip
---
# HTTPRoute: forward traffic to the MCP server
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: mcp-route
spec:
  parentRefs:
  - name: mcp-gateway
  hostnames:
  - "mcp.yourdomain.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /mcp
    backendRefs:
    - name: mcp-service
      port: 80
---
# GCPBackendPolicy: configures session affinity and other backend settings.
# Because MCP servers are often stateful, we enable session affinity so that
# requests from the same client are sent to the same backend.
apiVersion: networking.gke.io/v1
kind: GCPBackendPolicy
metadata:
  name: mcp-backend-policy
spec:
  default:
    sessionAffinity:
      type: CLIENT_IP
  targetRef:
    group: ""
    kind: Service
    name: mcp-service
---
# HealthCheckPolicy: configures custom health probes for the MCP server.
apiVersion: networking.gke.io/v1
kind: HealthCheckPolicy
metadata:
  name: mcp-health
  namespace: default
spec:
  default:
    checkIntervalSec: 15
    timeoutSec: 5
    healthyThreshold: 1
    unhealthyThreshold: 2
    logConfig:
      enabled: false
    config:
      type: HTTP
      httpHealthCheck:
        port: 3000
        requestPath: /healthz
  targetRef:
    group: ""
    kind: Service
    name: mcp-service
```
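The CLIENT_IP session affinity set in the GCPBackendPolicy sends requests from the same client address to the same backend. Conceptually, that amounts to hashing the client IP and mapping the hash onto the available backends. The Go sketch below only illustrates the idea with an FNV hash and a modulo; it is not how the Google Cloud load balancer is implemented, and the replica names are made up.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickBackend deterministically maps a client IP to one backend, so the
// same client keeps reaching the same replica while the backend list is
// unchanged. It returns "" when there are no backends.
func pickBackend(clientIP string, backends []string) string {
	if len(backends) == 0 {
		return ""
	}
	h := fnv.New32a()
	h.Write([]byte(clientIP)) // hash.Hash writes never return an error
	return backends[h.Sum32()%uint32(len(backends))]
}

func main() {
	replicas := []string{"mcp-server-0", "mcp-server-1"} // hypothetical names
	for _, ip := range []string{"198.51.100.10", "198.51.100.10", "203.0.113.5"} {
		fmt.Println(ip, "->", pickBackend(ip, replicas))
	}
}
```

Because the mapping in this sketch depends on the number of backends, scaling the Deployment can move some clients to a different pod, and every client behind a shared NAT lands on the same pod. It is safest to treat client-IP affinity as best-effort.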

Deploying this configuration creates the infrastructure required to route external traffic securely to the MCP server.

```bash
kubectl apply -f gateway.yaml
```

Wait a few minutes for the load balancer to become active and for the certificate to provision. You can check the status with kubectl get gateway mcp-gateway.

Finally, reach the remote MCP server over HTTPS. Set the MCP server URL in the test script to https://mcp.yourdomain.com/mcp, then run the script to verify the connection:

```bash
uv run test_mcp_server.py
```

Cleanup

```bash
kubectl delete -f deployment.yaml
kubectl delete -f gateway.yaml
gcloud compute addresses delete mcp-server-ip --global
gcloud compute ssl-certificates delete mcp-cert --global
gcloud artifacts repositories delete mcp-repo --location=$REGION
gcloud container clusters delete mcp-cluster --region $REGION
```

Deploying Model Context Protocol servers to Kubernetes enables new use cases for integrated agents and AI workflows. To dive deeper into these capabilities, explore the following resources:

How customer collaboration is shaping the future of GenAI security with Model Armor

Tue, 16 Jun 2026 07:00:00 +0000

At Google Cloud, we believe that the best products are built in partnership with our customers. Their feedback and real-world experiences are invaluable in helping refine our services and deliver solutions that truly meet our customers' needs. In January 2026, our Google Cloud Developer Advocacy team took part in a high-velocity technical sprint with a major Google Cloud customer that is a leader in the telecommunications industry.

This collaborative engagement gave us deep insights that led to significant enhancements to the Model Armor information experience. Model Armor is our runtime security service for generative and agentic AI.

Accelerating GenAI adoption through "radical empathy"

The objective of this engagement was to support the productionization of a next-generation GenAI customer support platform built using Google Cloud's Agent Development Kit (ADK) and Agent Platform. By sitting directly with the customer's developers and security specialists, we gained a unique opportunity to observe how developers interact with Gemini Enterprise Agent Platform in a live, complex environment.

This experience provided something traditional documentation cycles cannot replicate: radical empathy. By logging friction points as developers worked, we translated functional blockers into technical insights in real time, identifying exactly where developers were hindered by ambiguous configuration guidance or a lack of granular detail.

Key discoveries from the front lines

By observing the development workflow firsthand, we identified four critical friction points:

Search-first workflows: Developers rarely navigate through documentation hierarchies; instead, they rely on search to jump straight to specific code examples. A lack of comprehensive, copy-pasteable snippets for common use cases—like PII redaction—was a primary point of friction.
Balancing confidence levels: Finding the right balance between comprehensive threat detection and minimizing disruptive false positives proved challenging. For instance, using aggressive settings like "low and above" often caused a high volume of false positives that interrupted legitimate customer support flows.
The need for granular guidance: While the core concepts of Model Armor were understood, developers needed more detail on how different enforcement methods function in practice to balance security with usability.
Integration roadblocks (the 403 error): When integrating Model Armor with other services like Apigee, developers frequently encountered 403 PERMISSION_DENIED errors. This indicated a gap in our documentation regarding necessary cross-service IAM roles and permissions.

Turning insights into action

The insights gained from this partnership were immediately channeled into a comprehensive overhaul of Model Armor’s documentation and guidance:

Tested, copy-pasteable code samples: We have added numerous tested, ready-to-use code samples throughout the documentation to support search-first workflows.
The confidence level matrix: We introduced a new technical reference to help users understand the trade-offs between different filter levels. We now explicitly recommend "High" or "Medium" thresholds for general content to minimize false positives, reserving "Low and above" for high-security threats like prompt injection and jailbreak detection.
Explicit integration guides: We updated our integration guides, with a focus on Apigee, Gemini Enterprise Agent Platform, and GKE. These now clearly outline the specific IAM roles required (such as roles/modelarmor.user) to ensure smooth, error-free deployments.
Deeper technical documentation: We have enhanced the documentation to provide in-depth explanations of enforcement methods and their real-world applications.
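The trade-off behind the confidence level matrix fits in a few lines: a filter set to a given threshold fires on every finding at or above that confidence, so a lower threshold fires more often. The Go types below are hypothetical and only mirror the three setting names used in this post; they are not the Model Armor API or its client library, and the sample findings are illustrative.

```go
package main

import "fmt"

// Confidence is a hypothetical, ordered stand-in for a detector's
// confidence in a finding.
type Confidence int

const (
	Low Confidence = iota + 1
	Medium
	High
)

// Threshold mirrors the filter settings discussed above.
type Threshold Confidence

const (
	LowAndAbove    = Threshold(Low)
	MediumAndAbove = Threshold(Medium)
	HighOnly       = Threshold(High)
)

func (t Threshold) String() string {
	switch t {
	case LowAndAbove:
		return "low and above"
	case MediumAndAbove:
		return "medium and above"
	case HighOnly:
		return "high"
	}
	return "unknown"
}

// fires reports whether a finding at confidence c trips a filter set to t.
func fires(c Confidence, t Threshold) bool {
	return c >= Confidence(t)
}

func main() {
	findings := []Confidence{Low, Low, Medium, High} // illustrative sample
	for _, t := range []Threshold{LowAndAbove, MediumAndAbove, HighOnly} {
		n := 0
		for _, c := range findings {
			if fires(c, t) {
				n++
			}
		}
		fmt.Printf("%-16s fires on %d of %d findings\n", t, n, len(findings))
	}
}
```

On this sample, the three settings fire on 4, 2, and 1 of the four findings. That is why the guidance reserves "Low and above" for prompt-injection and jailbreak detection, where a missed attack costs more than an interrupted conversation, and recommends "Medium" or "High" for general content.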

The power of partnership

Getting "in the room" with our customers allowed us to bridge the gap between technical accuracy and operational utility. This journey of co-innovation ensures that Model Armor serves as a genuine catalyst for your success. We encourage you to explore the updated documentation and share your feedback as we continue to build the most secure platform for your GenAI workloads.

Get started:

Explore the updated Model Armor documentation

How I learned Go in a Day with Antigravity 2.0 and How You Can Do the Same

Mon, 15 Jun 2026 09:29:00 +0000

I have been exploring how to reclaim my software stack from npm dependency overhead and replace my resource-intensive Node.js runtime with a compiled, single-binary Go CLI. The result is skl, a fast tool we use to manage Agent Skills; it launches in 2 ms and uses only 11 MB of memory.

But how exactly did I do it?

In short, I set the architectural goals and audited the logic, while Antigravity handled the mechanical work of code translation, test generation, and platform path mappings. This post walks step by step through our migration workflow to help you build your own.

Step 0: Seed personal learning goals

Before writing any code, start by defining the boundaries of your project. In our case, I wanted a zero-dependency core with as few external packages as possible. I decided that our CLI tool had to be fast, and that our security model had to be zero-trust wherever appropriate. Along the way, my agent added specific constraints: sanitizing all of our inputs, blocking path traversal, and enforcing depth limits on our folder scans to prevent CPU hangs.

I began by prompting Gemini to audit alternative stacks and help us weigh their tradeoffs.

```
Research online and identify 3-5 CLI tool building alternatives to use over TS and explain why (focus on performance and security) with specific example and links
```

Here are some alternatives we considered:

Rust was exceptionally performant, but navigating its borrow checker rules and managing its lifetime annotations added too much friction for our simple symlinking tool.
Python would have required distributing a runtime interpreter and managing virtual environments, dragging in the pip packaging overhead we wanted to avoid.
Zig offered excellent low-level memory control and compilation speed, but it lacked high-level standard library abstractions for HTTP operations and archive extraction out of the box.
Compiled Swift provided clean scripting on macOS, but its cross-platform compilation capabilities for Windows and Linux were less suited for our multi-platform requirements.

For us, Go struck the right balance: it gave us synchronous, linear code, near-instant compilation, and a rich standard library.

To ensure I was not doing the same work that someone had already completed before me, I kicked off the project by asking directly:

```
I want to port the `npx skills` to go. Did anyone do this before?
```

The agent researched the web and verified that there was no official Go port of the vercel-labs/skills repository. It confirmed that while the official CLI is TypeScript-based and distributed via npm, the Agent Skills specification itself is open and language-agnostic. This meant we were free to build a compiled Go port from scratch.

And since I wanted to learn along the way, I also asked for Go-specific tips, tricks, and traps:

```
Identify 3-5 patterns on how to / how NOT to use GO and explain them to me
```

Step 1: It's about Skills

To follow best practices in a language I wasn't familiar with, I decided to find the most popular, well-received Agent Skill (instructions that guide AI coding assistants) for Go and install it before writing any code or even starting to plan. Grounding the environment first ensures that any code written or planned afterward conforms to the community's consensus style.

Skill search prompt

I asked the agent what community agent skills were available for Go:

```
what are the top community agent skills for `go`?
```

Once the agent suggested samber/cc-skills-golang, I directed it to install the skill pack:

```
add all skills from samber/cc-skills-golang
```

Once installed, I manually verified that the skill was discovered and ready by typing /golang- to invoke autocompletion.

Step 2: Gap analysis and planning

I initialized the architectural goals by providing the agent with the following instruction:

```
Plan 100% functionality port of `npx skills` to Go, focusing on safety, best practices, and with 90% unit test coverage. Pull the repo and map things out. Ask me any questions.
```

Our first task was the dynamic onboarding flow. When the agent asked what the default should be, I suggested prompting the user to install antigravity-cli if no agent is found. I also defined the fallback to the universal directory when multiple active agents are detected:

```
For the MVP, we target Antigravity 2 support as default and fallback to universal through the standards-compliant '.agents' directory (if multiple agents detected).
```
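These onboarding rules boil down to a small decision function. The sketch below is hypothetical: resolveInstallTarget, the installTarget fields, and the "antigravity" identifier are our names, not skl's, and the single-agent branch (install straight into that agent's directory) is our assumption about the behavior between the two cases described above.

```go
package main

import "fmt"

// installTarget says where skills go and whether to print the
// Antigravity onboarding tip.
type installTarget struct {
	Agents     []string // agents whose skills directories receive the install
	Universal  bool     // install into the shared, standards-compliant .agents directory
	ShowAgyTip bool     // suggest installing antigravity-cli
}

// resolveInstallTarget applies the MVP defaults to the list of detected agents.
func resolveInstallTarget(detected []string) installTarget {
	switch len(detected) {
	case 0:
		// Nothing detected: default to Antigravity and suggest its CLI.
		return installTarget{Agents: []string{"antigravity"}, ShowAgyTip: true}
	case 1:
		// Exactly one agent: install straight into its own skills directory.
		return installTarget{Agents: detected}
	default:
		// Several active agents: fall back to the universal .agents directory.
		return installTarget{Agents: detected, Universal: true}
	}
}

func main() {
	for _, detected := range [][]string{nil, {"zed"}, {"zed", "cursor"}} {
		fmt.Printf("%v -> %+v\n", detected, resolveInstallTarget(detected))
	}
}
```

Keeping this decision in one pure function makes each branch easy to cover with a table-driven test.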

Implementation

After I approved the plan, Antigravity systematically converted all 51+ agent configuration records from TypeScript to Go, mapping the distinct directories for Aider, Claude Code, Cursor, Zed, and others so that every environment was covered. I hadn't explicitly asked for all of them, but the agent correctly judged the task simple enough to include in the MVP scope.

The core structures are conveniently located in a single file, types.go:

```go
type AgentType string

type AgentConfig struct {
	Name                string
	DisplayName         string
	SkillsDir           string
	GlobalSkillsDir     string
	ShowInUniversalList bool
	DetectInstalled     func(home, configHome, cwd string) bool
}

// ...
```

This mapping works well. For example, the detection logic for Zed handles Linux (Flatpak), macOS, and Windows configurations dynamically in just a few lines:

```go
"zed": {
	Name:            "zed",
	DisplayName:     "Zed",
	SkillsDir:       ".agents/skills",
	GlobalSkillsDir: filepath.Join(home, ".agents/skills"),
	DetectInstalled: func(h, c, w string) bool {
		return exists(filepath.Join(c, "zed")) ||
			(zedAppDataHome != "" && exists(filepath.Join(zedAppDataHome, "Zed"))) ||
			(zedFlatpakConfigHome != "" && exists(filepath.Join(zedFlatpakConfigHome, "zed")))
	},
},
```

Next, I noticed that the Antigravity user onboarding code was intermingled with the automated mapping. A default like this reflects a personal choice, so it is better isolated in its own file, agy-onboarding.go:

```
move default Antigravity 2 prompting to agy-onboarding.go
```

With version zero scaffolded, it was time to test.

Step 3: Enforcing a quality assurance (QA) loop

To guarantee that the Go port behaved identically to the original TypeScript CLI, we adopted a Test-Driven Development (TDD) loop. I kicked it off with this prompt:

```
Apply TDD principles and https://preslav.me/2026/05/19/10-golang-error-handling-commandments/
```

This initiated the TDD process. Rather than explicitly prompting the agent to use skills, I pointed it at a third-party best-practices blog post, which reminded it of the relevant Agent Skills (golang-how-to, golang-testing, golang-error-handling, and golang-cli). Because Antigravity runs in a sandbox, it parsed these skills and automatically started executing the QA loop. It keeps reapplying these TDD principles for the rest of the current trajectory whenever it is about to change functional code.

Test-first frontmatter parsing

For frontmatter parsing, the agent wrote frontmatter_test.go first using Go's table-driven test pattern (which was a delightful new pattern for me to discover):

```go
import (
	"reflect"
	"testing"
)

func TestParseFrontmatter(t *testing.T) {
	tests := []struct {
		name        string
		raw         string
		wantData    map[string]interface{}
		wantContent string
	}{
		{
			name:        "valid frontmatter",
			raw:         "---\nname: my-skill\n---\n# Content\n",
			wantData:    map[string]interface{}{"name": "my-skill"},
			wantContent: "# Content\n",
		},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			gotData, gotContent, err := ParseFrontmatter(tt.raw)
			if err != nil {
				t.Fatalf("ParseFrontmatter() error: %v", err)
			}
			if !reflect.DeepEqual(gotData, tt.wantData) {
				t.Errorf("data = %v, want %v", gotData, tt.wantData)
			}
			if gotContent != tt.wantContent {
				t.Errorf("content = %q, want %q", gotContent, tt.wantContent)
			}
		})
	}
}
```

When Antigravity ran go test, the test failed cleanly, as we expected. My agent then generated frontmatter.go, implementing a linear string-scanning loop that splits the document and unmarshals its YAML metadata. The original Node implementation used regular expressions; by scanning linearly instead, we avoided the Regular Expression Denial of Service (ReDoS) patterns that can stall backtracking regex engines such as JavaScript's. (Go's standard regexp package is RE2-based and matches in linear time, so in Go the scan mainly keeps the parser simple.) Including safety as a goal in my initial prompt resulted in safer code.
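skl's frontmatter.go isn't reproduced here, so the following is a minimal sketch of the linear-scan idea, assuming frontmatter is delimited by "---" lines. It only splits the document; the metadata string would then go to a YAML library, which we leave out to keep the example dependency-free. The name splitFrontmatter is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFrontmatter separates a leading "---" fenced YAML block from the
// Markdown body using plain string operations (no regular expressions).
// It returns ok=false, and the original input as content, when the
// document has no well-formed frontmatter.
func splitFrontmatter(raw string) (meta, content string, ok bool) {
	const delim = "---"
	// Normalize Windows line endings so the scan only deals with "\n".
	s := strings.ReplaceAll(raw, "\r\n", "\n")
	first, rest, found := strings.Cut(s, "\n")
	if !found || strings.TrimSpace(first) != delim {
		return "", raw, false
	}
	var b strings.Builder
	for {
		line, next, more := strings.Cut(rest, "\n")
		if strings.TrimSpace(line) == delim {
			return b.String(), next, true
		}
		if !more {
			// Reached the end without a closing delimiter.
			return "", raw, false
		}
		b.WriteString(line)
		b.WriteByte('\n')
		rest = next
	}
}

func main() {
	meta, content, ok := splitFrontmatter("---\nname: my-skill\n---\n# Content\n")
	fmt.Printf("%q %q %v\n", meta, content, ok)
}
```

Each line is visited once, so the cost grows linearly with the input no matter how the document is shaped.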

Grounding via error commandments

Since we're talking about error handling, here is how we aligned our error structures with Preslav Rachev's 10 Golang Error Handling Commandments. Go requires you to return error values explicitly rather than catching them as exceptions. Following these rules, I directed the agent to check errors immediately at every level (if err != nil) and wrap them with contextual detail (fmt.Errorf("action: %w", err)) before propagating them up the call stack. During a final review of the generated code, I noticed that Antigravity had not consistently followed the commandments' guidance on concise error messages, so I reminded it:

```
shorten error messages in all files, remove 'failed to' prefixes, etc. See the 10 golang commandments
```

It promptly fixed them across the codebase.

Are unit tests enough?

The short answer is no.

To ensure that the AI did not introduce subtle bugs or hallucinations during the translation, I reviewed the code rather than blindly trusting passing test suites.

When I audited the generated tests, I realized that passing green checks alone weren't enough: we were missing tests for the long list of installation locations and for the various combinations of having no agents, a single agent, or multiple agents active at the same time. Since this was a complete rewrite, I wanted end-to-end integration coverage for these journeys. To close this gap, I prompted Antigravity with a set of targeted scenarios:

```
Add integration tests:
1. no agents installed: verify that it installs to antigravity and outputs the agy-cli onboarding tip.
2. support for all agents but one
3. exactly one agent installed, including cases where the same path might be attributed to multiple agents
4. support for non-parametrized agents
```

Note: Non-parameterized agents like Claude Code or Codex define their configuration paths globally when the package loads (or via environment variables) instead of scanning the active workspace folder at runtime.

The changelist that added these tests didn't touch any production files; the logic was solid. But I didn't want to leave this to luck. If you care about a specific feature or workflow, you have to be explicit about it. Taking five minutes to verify your end-to-end coverage and defining a few solid tests protects your users from experiencing a broken release down the line.

Step 4: Parallel subagents for CLI commands

When you port a full suite of CLI commands (init, add, list, remove, find, update, ...) along with their sub-options, you face a large surface area. Rather than migrating them sequentially, you can parallelize the work. In our case, this was a good choice: we wanted each subagent to focus on a single command rather than keep the entire tool in mind, and this helped us spot a few gaps.

However, subagents are not always the best choice; reserve parallel execution for large, independent tasks that are clearly bounded. When done right, parallel subagents won't consume significantly more tokens than a single long-running thread, but they protect the main coordinator agent from hitting context compression limits under the weight of a massive codebase. Most simple projects do not require this level of scale. A good rule of thumb is to reserve subagents for workloads equivalent to tens of features with tens of subfeatures.

In previous steps, I ran a single agent to quickly and efficiently build an MVP. But I was not sure whether it fully ported the code. So I asked it directly:

```
did you cover 100% of the original CLI?
have subagents research each option individually and each test and fill in the gaps
```

It turned out this was the right call. The subagents conducted an in-depth audit of the commands, catching several option gaps and missing tests, which were then integrated in an audit commit.


Each subagent worked on exactly one command. They analyzed flag permutations like -g/--global and --copy, drafted table-driven unit tests, and verified their code compiled cleanly. Once they reported back, the main coordinator integrated their changes, resolved any conflicts, and validated that the entire combined project compiled successfully.
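Go's standard flag package accepts both -name and --name, and two flag names can share one variable, which covers a -g/--global pair without extra dependencies. The sketch below parses a hypothetical add subcommand; the option struct, function names, and help strings are ours, not skl's.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

type addOptions struct {
	Global bool
	Copy   bool
	Args   []string // positional arguments, such as skill names
}

// parseAdd parses the arguments of a hypothetical "add" subcommand.
func parseAdd(args []string) (addOptions, error) {
	var opts addOptions
	fs := flag.NewFlagSet("add", flag.ContinueOnError)
	// Register both spellings against the same variable.
	fs.BoolVar(&opts.Global, "g", false, "install globally (shorthand)")
	fs.BoolVar(&opts.Global, "global", false, "install globally")
	fs.BoolVar(&opts.Copy, "copy", false, "copy files instead of symlinking")
	if err := fs.Parse(args); err != nil {
		return addOptions{}, err
	}
	opts.Args = fs.Args()
	return opts, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: skl <command> [flags] [args]")
		os.Exit(2)
	}
	switch os.Args[1] {
	case "add":
		opts, err := parseAdd(os.Args[2:])
		if err != nil {
			os.Exit(2) // flag has already printed the error
		}
		fmt.Printf("%+v\n", opts)
	default:
		fmt.Fprintf(os.Stderr, "unknown command %q\n", os.Args[1])
		os.Exit(2)
	}
}
```

One caveat: flag stops parsing at the first positional argument, so in this sketch skl add my-skill -g would leave -g in Args. A faithful port of a CLI that accepts flags anywhere needs to handle that explicitly, which is exactly the kind of permutation a table-driven test should pin down.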

The Elephant and the Goldfish

To keep our agent focused during this migration, we used the Elephant and Goldfish metaphor, an architectural pattern documented in Google Research's Elephants, Goldfish, and the New Golden Age of Software Engineering. This relies on two distinct roles: the Elephant (the long-term coordinator session holding design rules and codebase memory) and the Goldfish (transient, clean subagents that you spawn to run a single task without background history).

While Antigravity does use automated session compression to manage its context size, you might still want to manage your context window actively: maintain your own checklists and partition work across isolated, transient subagents when less (context) means more (clarity).

Step 5: Package structure, compilation, and CI/CD

Through some back-and-forth with the agent, I learned how Go packages are structured and which limitations I needed to consider. I now had a cleanly structured, well-documented main package (main.go) that supported native installation:

```bash
go install github.com/alexastrum/skl@latest
```

I prompted the agent to capture the implementation details and document them for future reference:

```
summarize findings for humans in README.md, considerations for agents in AGENTS.md
```

To verify the build, auto-run tests, and make sure it works on other machines as well, I asked the agent to:

```
make sure it builds on all supported platforms
```

Antigravity set up the ci.yml workflow to run a matrix build, which had a surprising dependency:

```yaml
env:
  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: "true" # HMMMMMM ???
jobs:
  test:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
# ...
```

Unexpected caveats

Paradoxically, even though we migrated from Node to Go, our GitHub pipeline still depends on Node for standard GitHub Actions helpers like actions/checkout and actions/setup-go.
The tool is completely ready to be run and compiled locally. However, if we want to distribute pre-compiled binaries to other users, we would need to configure code signing for macOS and Windows.

Since building a custom action with code signing is a complex process, it is best reserved for another time.

Step 6: Create an Agent Skill

It was time to document the process itself. To codify this workflow, we created a reusable Agent Skill.

I started by asking the agent to plan a skill creation prompt that included the most important steps:

```
Review the current trajectory (including my specific prompts that generated accepted results) and lets plan to create a `/cli-to-go-migration` skill. What steps should the skill follow?
```

I got a draft prompt which I iterated upon. After some back-and-forth, I anchored my final instructions on five core rules (though yours might be different). Here's the final prompt I used:

```
Review the current trajectory (including my specific prompts that generated accepted results) and lets plan to create a `/cli-to-go-migration` skill. Rules:

#### 1. Goals
The agent must start with research before proposing code. It identifies broader user goals, reviews multiple stack alternatives, and checks for prior work to lock in on one target language and research its idioms.

#### 2. Setup
Before modifying any files, the agent verifies or initializes a Git repository to keep a clean history. Later, it must also report download failures directly and fail gracefully once all independent work is finished, rather than falling back to placeholders or non-terminating loops.

#### 3. Importing existing knowledge
If required grounding skills (like `golang-cli` or `golang-testing`) are missing but are explicitly named in a prompt, the agent blocks execution and offers to install them automatically after asking for confirmation, rather than printing instructions for the developer to follow.

#### 4. Breakpoints
The skill establishes hard halts for known AI pain points. The agent stops for human or algorithmic validation when encountering specific problems and anytime confusion sets in.

#### 5. Alignment checks
Whenever we see signs of misalignment, we need to set explicit rules. For example, when I noticed that the agent was over-editing some docs and missing others, I set the rule that the agent should only apply the `/humanizer` skill to human-facing files, like the `README.md` or help docs, while leaving structured developer context, like `AGENTS.md`, clean of style edits so that other agents can parse its metadata accurately.
```

There isn't a one-size-fits-all approach, but asking the agent to create a skill and anchoring it on a few guardrails is a good start. In practice, you will likely take turns polishing multiple prompts until the agent's responses align with your goals. Then you ask the AI to proofread, and finally perform a human review of the SKILL.md contents.

Conclusion

Rebuilding skl in Go was a fun, educational experience that solved a personal tooling need. It worked, so I decided to document the process. Looking back, I realized that the journey itself was the reward. You grow as an engineer by codifying your architectural choices into reusable skills and personal experience, while the compiled binary is the physical proof that your process worked.

Surprisingly, the most significant shift I experienced during this migration is behavioral.

Pulling away from an IDE (integrated development environment) and using Antigravity 2.0 made it easier for me to keep a high-level view and kept me from jumping in to fix issues myself as they arose during the migration. Instead, it pushed me to understand why the issues occurred and to learn Go-specific details.

In a traditional IDE, the moment your assistant encounters an issue, your instinct is to grab your keyboard and debug. Operating without an editor forces you to remain the architect, steering the machine from the navigation deck rather than fighting the engine room fires yourself. That's exactly how we learn to manage agents at scale.

10 Indispensable Prompts Our Team Refuses to Build Without

Thu, 11 Jun 2026 07:00:00 +0000

Look at any builder's prompt history and you'll see a collection of highly specific, sometimes chaotic, one-off prompts. We use AI to debug a single error message, refactor a messy email, or generate a quick boilerplate.

If you sit down with people who consistently ship high-quality work, you'll find something interesting. They aren't just improvising. They have a set of go-to prompts they have tweaked and improved over time and used on nearly every project.

I asked some of my peers and leaders a simple question: "What prompt do you use most often, and why?"

What they shared wasn't just a list of arbitrary commands. Here's an unfiltered look at the prompts our team refuses to ship without, and, more importantly, why they use them.

Build a spec

Maja Bilić

Senior Outbound Product Manager • Engineering


Prompt:

```
Act as a cynical Principal Architect and Technical PM. I want to build a [product] that allows [user] to do [action]. Do not write code. Analyze this concept and list the top 5 technical, UX and architectural considerations. Then ask me key questions for each of the 5 considerations so we can work together on building the spec. Once you have all the answers, create a PRD doc and implementation plan. Don't over engineer or over simplify the design or implementation plan.
```

Why? I have written bad product requirements documents (PRDs), and I have read many bad PRDs. This prompt makes the model adopt the persona of a cynical architect/PM who helps distill the idea, critique the approach and concept, and collaborate on defining the most important pieces. This way I work through the plan with an agent's help while also developing the product design idea further. I also love the guardrail against over-engineering or oversimplifying; AI tends to do both sometimes, especially when writing product design docs.

Widget tests

Andrew Brogdon

Staff Developer Relations Engineer • Engineering


Prompt:

```
I'd like to partner with you on increasing the robustness of this project by creating widget tests. If you haven't already, please read the Flutter team's skill for creating widget tests (https://github.com/flutter/skills/tree/main/skills/flutter-add-widget-test). Then, let's do these things:

* Examine my application's codebase to identify areas of the UI/UX that are not being tested properly.
* Determine if the existing code is written in a testable way (are dependencies injected? Are domains loosely or tightly coupled? Etc.).
* Determine which domains require more rigor than others.
* Create an overall testing plan for the application.
* Determine which areas of functionality are already aligned with that plan, and which are missing tests.
* Create a plan to implement those tests.
* Execute that plan.

Do not proceed from one step to another unless you are completely confident about your reasoning. You are encouraged to ask as many questions as needed.
```

Why? My favorite use of agentic coding tools is to actually do all the things I used to feel guilty about not doing in my projects. Proper testing is definitely on that list. The official skills from the Dart/Flutter team do a great job of instructing agents on what good widget tests look like, so combining them with this prompt (which essentially just fits those steps into my own coding workflow) helps me reduce the toil required to maintain reliable, guilt-free codebases.

Find all the tests / Clean-up commit

Aja Hammerly

Director of Builder Relations • Engineering


Prompt:

```
Run all the tests and identify any missing tests and write them. Pay special attention to edge cases and race conditions.
```

```
Find any unused code, embarrassing comments, comment to code inconsistencies, unresolved TODOs, or other things in this commit that shouldn't be in there.
```

Why? I find that when I'm working on code I often get extremely focused on the "happy path", the main path I want a user to take through the code. While I'm focused on that, I'll put in TODO or FIX comments on edge cases I don't want to think about yet. I'll also sometimes forget to update comments or leave debugging comments in. And while I try to follow test-driven development, I don't always get tests in for all the edge cases. I run these two prompts, usually in a new conversation without the development context, as a first round of code review before submitting to an AI or human reviewer for the next step. This ensures that what I've built is in good shape for others to review and use.

Check for correct and compliant permissions

Rich Hyndman

Head of Antigravity Developer Relations • Engineering


Prompt:

```
Run a comprehensive check on this Android project to ensure all permissions are correct and compliant. Perform the following steps:
1. Locate and analyze all 'AndroidManifest.xml' files (including main, debug, and flavor-specific manifests), extract a master list of declared <uses-permission> tags.
2. Cross-reference these declared permissions against the codebase to verify where they are actually used. Identify any bloatware or unused permissions that can be safely removed.
3. Check the Kotlin/Java source files to ensure that all runtime permissions implement the dynamic runtime permission request flow 'checkSelfPermission','onRequestPermissionsResult' or the Activity Result API.
4. Verify that any hardware features associated with the permissions (like android.hardware.camera) are correctly declared.
Output your findings as a Markdown report. Provide file paths and suggested code diffs for any fixes. Do not make any file edits until I approve the plan.
```

Why? Antigravity, with Gemini 3.5 Flash and the Android plugin, is an excellent Android development partner! Checking for the correct permissions can keep your app running smoothly and help avoid delays when uploading to the Play Store.

Conduct code review

Shir Meir Lador

Head of AI, Developer Relations • Engineering


Prompt:

```
Act as a strict, highly analytical Principal Engineer conducting a pre-production code review. You have incredibly high standards and zero tolerance for fragile, "happy-path" code. Your goal is to guide me to write bulletproof, production-ready systems.
Grade my uncommitted changes on an A-to-F scale for production readiness.
Do not award an "A" unless my code is exceptionally robust. Specifically, analyze the changes for:
1. Efficiency: Redundant API calls, wasteful database queries, or un-cached resource leaks.
2. Resilience: Silent failure points, lack of explicit error boundaries, and missing rate-limit fallbacks.
3. Architecture: Tight coupling and lack of clear separation of concerns.
For every issue, explain pragmatically where the code is vulnerable to real-world production failures. Then, provide the exact git diffs needed to upgrade my code and earn that "A."
```

Why? If you ask an LLM to review your code, it almost always defaults to being polite. It tells you your naming is clean, suggests a few docstrings, and hands you a green checkmark. But polite reviews don't prevent production outages. I like this prompt because it completely cuts through that AI fluff. By forcing the model to grade your work on a harsh scale and demanding a working git diff to fix it, you turn it into a real partner. It stops guessing and starts actually reading your network calls and database queries to find where the code is going to break. It’s like having an uncompromising senior dev sitting over your shoulder, pointing out exactly where you got lazy, and then handing you the exact code to fix it.

Explain trade-offs to aid decision-making

James O'Reilly

Staff Developer Relations Engineer • Engineering


Prompt:

```
Explain the pros and cons of executing your suggested Implementation Plan. Be specific about the trade-offs we're making related to performance, cost, security and maintainability so I can make an informed decision on how to proceed.
```

Why? I force AI to stress-test its own logic. By asking it about the trade-offs being made, I find the AI will rethink its strategy, stay hyper-focused on our specific implementation and avoid giving vague, hand-wavy responses. I also find this approach prevents AI from acting like the final authority and keeps me in control of the decision making.

Improve AI-generated code through research

Emma Twersky

Head of Flutter & Dart Developer Relations • Engineering


Prompt:

```
Research online, focusing on X threads, StackOverflow, GitHub issues and tech blogs for common security pitfalls, architectural misalignments, and subtle logic errors found in AI-generated INSERT_TECH_YOU'RE_USING_HERE code. Based on these findings, generate a manual review checklist specifically for auditing high-risk areas like platform channel validation, deep link routing, and sensitive data logging in crash reports.
```

Why? While AI can write code 10x faster, it often produces slop—code that is rational but conceptually buggy because it makes incorrect assumptions about unspecified details. Research shows that up to 40% of AI-generated code contains vulnerabilities, and developers often trust it more than their own, which creates a dangerous mismatch. I use this prompt to generate a targeted checklist that protects against 'rubber-stamping' verbose AI changes and ensures my human judgment focuses on the high-risk 'seams' where models typically fail. Use AI to generate the tasks, but still keep a human in the loop where it matters most.

Find problems through iteration

Fred Sauer

Head of Frameworks & Languages Developer Relations • Engineering


Prompt:

Simplified, my "last" (series of) prompt(s) looks something like:

```
- Code review the uncommitted changes.
```

I prefer being less specific, as oversteering can lead to blind spots. I prefer a new chat session for a fresh set of "eyes". I iterate until the results returned are boring and I'm satisfied.

If I come into this last phase with an opinion (e.g. the change feels too complex), or I feel I don't have good insight into how "good" the change is, then I might challenge the model with this prompt:

```
- Code review the uncommitted changes. Identify any unhandled corner cases. Assess performance. Summarize findings.
```

Then, having received 5 findings:

```
- Fix 1, 3 and 5.
```

Why? I don't have ONE last prompt I send. It's more that my change goes through stages. The earliest stage is often about discovery (finding the needle, or the thread to pull on). Then I move on to an existence proof, i.e. I just want it to prove the thing I want to do can be done. Then I evaluate: Is the PoC reasonable? Too complex? Does it make changes in entirely the wrong place(s)? I then iterate and try to make the solution elegant, both in how it's implemented and in where the changes are made. Once I have something I'm happy with, something I'd be happy to have written myself, I move on to the last phase, which is code review. This is about finding problems or identifying opportunities to make the change even better. I'm often surprised by the insights the model comes up with.

Review every pull request

Remigiusz Samborski

Lead Developer Relations Engineer • Engineering


Prompt:

I use the following prompt embedded in GitHub Actions for most of my engineering projects:

```
## Role

You are a world-class autonomous code review agent. You operate within a secure GitHub Actions environment. Your analysis is precise, your feedback is constructive, and your adherence to instructions is absolute. You do not deviate from your programming. You are tasked with reviewing a GitHub Pull Request.

## Primary Directive

Your sole purpose is to perform a comprehensive code review and post all feedback and suggestions directly to the Pull Request on GitHub using the provided tools. All output must be directed through these tools. Any analysis not submitted as a review comment or summary is lost and constitutes a task failure.

[...]
```

Full prompt: link

Why? Using an automated Gemini CLI review in PRs helps catch issues and improvement opportunities during the review process. Additionally, as more code is generated by AI agents and development speed increases, reviews are becoming the bottleneck. By ensuring every PR gets reviewed automatically, human reviewers can focus on the higher-level architectural and conceptual review of the proposed change.

Apply directed acyclic graph analysis for tests

Karl Weinmeister

Director, Developer Relations • Engineering


Prompt:

```
Analyze the application workflow as a directed acyclic graph. Identify impactful tests for components, seams across components, and across the system. Present your findings in a markdown table as a prioritized gap analysis.
```

Why?

Most application workflows aren't linear. When you ask an LLM to suggest tests, you typically get a generic checklist that could apply to any project.

However, when you force it to think about your system as a Directed Acyclic Graph (DAG) with nodes and edges, it starts reasoning structurally about where things can break.

I’ve also asked it to consider the “seams”, a term from Michael Feathers' Working Effectively with Legacy Code. It points the model toward boundaries between components that are often under-tested.

Finally, I’ve asked the model to summarize the results as a prioritized table of opportunities. This gives your agent a clear roadmap for making your app more resilient.
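Karl's prompt asks the model to do this structural reasoning, but the underlying idea is easy to show directly. Here is a minimal Python sketch, an illustration rather than part of the prompt, that models a hypothetical workflow as a DAG. It derives one valid execution order with the standard library's graphlib (Python 3.9+) and lists the seams, interpreted here as edges whose endpoints belong to different components. The step and component names are invented for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step is owned by one component.
COMPONENT = {
    "upload": "frontend",
    "validate": "api",
    "store": "storage",
    "index": "search",
    "notify": "api",
}

# Edges point from a step to the steps that run after it.
EDGES = {
    "upload": ["validate"],
    "validate": ["store", "notify"],
    "store": ["index"],
}


def execution_order(edges):
    """Return one topological order; raises graphlib.CycleError if not a DAG."""
    # TopologicalSorter expects node -> predecessors, so invert the edges.
    preds = {}
    for src, dsts in edges.items():
        preds.setdefault(src, set())
        for dst in dsts:
            preds.setdefault(dst, set()).add(src)
    return list(TopologicalSorter(preds).static_order())


def seams(edges, component):
    """Edges whose two endpoints belong to different components."""
    return [
        (src, dst)
        for src, dsts in edges.items()
        for dst in dsts
        if component[src] != component[dst]
    ]


if __name__ == "__main__":
    print("Order:", execution_order(EDGES))
    for src, dst in seams(EDGES, COMPONENT):
        print(f"Seam: {src} ({COMPONENT[src]}) -> {dst} ({COMPONENT[dst]})")
```

Each seam is a natural candidate for an integration test, which is the kind of prioritized gap list the prompt asks the model to produce.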

Conclusion

The thread connecting all of these prompts is de-risking human assumptions. Whether it's hunting for obscure edge cases, translating developer speak for end users, or stress-testing an architecture before code is written, our team uses AI as an adversarial thinker designed to ask the hard questions we might overlook when we're deep in the weeds.

By building these "must-run" prompts into our daily workflows, we don't just ship faster, we ship with a level of confidence that used to require entire committees to achieve.

Choosing your surface: Antigravity 2.0, Antigravity CLI, Antigravity IDE, or Antigravity SDK

Wed, 10 Jun 2026 07:00:00 +0000

TL;DR:

Antigravity 2.0: A desktop app to orchestrate multiple autonomous agents working in parallel across independent projects.
Antigravity CLI: A terminal interface designed for command-line workflows and headless execution.
Antigravity IDE: An editor for developers who want to write code directly alongside an agent.
Antigravity SDK: A Python library for building and deploying your own custom agents that use the Antigravity Harness.

Quick Comparison

Feature	Antigravity 2.0	Antigravity CLI	Antigravity IDE	Antigravity SDK
Interface	Desktop App	Terminal (TUI)	Desktop App	Python Code
Best For	Multiple simultaneous tasks	Command-line / Headless	Directly editing code	Building custom agents

The Four Surfaces of Antigravity

1. Antigravity 2.0

The default recommendation. Manages tasks across multiple projects at the same time.

Antigravity 2.0 is a standalone desktop application. It is designed to let you run multiple tasks without blocking your main workspace. You can easily switch between and monitor different projects from one screen. You can also schedule recurring tasks, for example to check code quality or find outdated packages.

2. Antigravity CLI

For terminal workflows and headless execution.

Built in Go for speed, the Antigravity CLI is for those who prefer to work in the terminal with fast, keyboard-driven navigation and simple shortcuts. You can start background agents using terminal commands without locking up your active command-line window. Choose the CLI if you need headless execution (such as working over SSH or inside remote containers).

3. Antigravity IDE

For developers who want to see and edit the code directly.

The IDE surface puts agents directly inside your current workspace. This is the best choice if you want to see exactly what code the agent is editing and accept or reject changes line-by-line. With built-in debugging, the agent can see runtime errors and offer a one-click fix right in your editor.

4. Antigravity SDK (Python)

Best for: Writing custom agent logic and automated pipelines.

```python
import asyncio

from google.antigravity import Agent, LocalAgentConfig


async def main():
    config = LocalAgentConfig(
        system_instructions="You are an expert assistant for codebase navigation.",
        # api_key="your_api_key_here",
    )
    async with Agent(config) as agent:
        response = await agent.chat("What files are in the current directory?")
        print(await response.text())


if __name__ == "__main__":
    asyncio.run(main())
```

The Google Antigravity SDK is a Python library that lets you build your own custom agents from scratch. Because it runs on the same shared harness, you get direct access to the exact same tools and rules that power Google’s official Antigravity tools. You can write an agent locally and deploy it to Google Cloud with zero code changes.

Summary

While each interface looks different, they all run on the same underlying agent harness. No matter which of the Antigravity surfaces you choose, you get support for plugins, skills, and more. Your agents have access to the same core logic, so pick the one that works best for your project.

For guides and documentation, visit antigravity.google, and when you’re ready to get started, visit the Antigravity Download Page.

Scaling AI Agents: A Step-by-Step Guide to Deploying ADK on GKE Autopilot

Thu, 04 Jun 2026 07:00:00 +0000

While building AI agents locally using Google’s Agent Development Kit (ADK) is an excellent way to prototype, production-ready agents require a robust, scalable infrastructure. For developers looking to move beyond simple instances and into the world of managed container orchestration, Google Kubernetes Engine (GKE) Autopilot offers the perfect balance of flexibility and ease of use.

In this tutorial, I will walk you through building a simple agent with ADK and deploying it to GKE Autopilot. We will use Gemini on Vertex AI as the core model and follow security best practices by using Workload Identity for permission management.

Understanding the GKE ADK Architecture

Deploying an ADK agent on GKE Autopilot involves more than just running a container. We leverage GKE's native capabilities to handle scaling and security. Our architecture consists of an ADK-based Python application packaged as a Docker image and stored in Artifact Registry. This container runs as a Deployment on GKE Autopilot, where it communicates securely with Vertex AI using Workload Identity—mapping a Kubernetes Service Account to a Google Cloud IAM Service Account.

To expose the agent to the world, we use the Kubernetes Gateway API, the modern successor to Ingress, which provides a cleaner separation of concerns and native support for Google Cloud Load Balancing.

Prerequisites

Before we begin, ensure you have the following tools and accounts ready:

Python 3.10 or higher.
uv for package management.
Google Cloud SDK (gcloud) installed and configured.
A Google Cloud project with billing enabled.
kubectl command-line tool.
jq for parsing JSON responses.
The following APIs enabled: Kubernetes Engine, Artifact Registry, Cloud Build (used by gcloud builds submit), and Vertex AI.

Step 0: Configuring Google Cloud and Authentication

Before interacting with Google Cloud services, you must authenticate your environment and set the active project. This ensures that both the gcloud CLI and your local Python environment can access Vertex AI.

Log in to the Google Cloud SDK:
```
gcloud auth login
```
Set your active project:
```
gcloud config set project [PROJECT_ID]
```
Set up Application Default Credentials (ADC): This is crucial for the ADK library to authenticate with Vertex AI during local testing.
```
gcloud auth application-default login
```
Define Environment Variables: To ensure we can easily reuse our configuration in subsequent steps, let's export our project, region, and cluster name as environment variables.
```
export PROJECT_ID=$(gcloud config get-value project)
export REGION=us-central1
export CLUSTER_NAME=adk-cluster
```

Step 1: Provisioning GKE Autopilot

GKE Autopilot is the recommended way to run Kubernetes without managing nodes. It allows you to focus on your agent deployment while Google manages the infrastructure. Starting the cluster creation now allows it to provision in the background while we build the agent.

```
gcloud container clusters create-auto $CLUSTER_NAME --region $REGION
```

While the cluster is provisioning, we can move on to building our agent.

Step 2: Building the Agent with ADK

First, let's create our agent. Start by creating a folder for the agent code:

```
mkdir adk-agent
cd adk-agent
```

Initialize a new Python project with uv:

```
uv init
```

Add the ADK dependency:

```
uv add google-adk
```

Create a new agent using the ADK CLI:

```
uv run adk create weather_agent
```

You will be asked to choose a model for the root agent; choose gemini-2.5-flash (option 1). Next, choose Vertex AI as the backend (option 2). Then enter your Google Cloud project ID and the region you want to use, for example us-central1.

The previous command scaffolded a new directory weather_agent with the following structure:

```
weather_agent/
├── .env
├── __init__.py
└── agent.py
```

ADK expects the agent to be defined in agent.py, as a variable named root_agent. Let's edit agent.py to add a simple tool for the agent.

```
from google.adk import Agent


# Define a simple tool for the agent
def get_weather(city: str) -> str:
    """Returns the current weather in a city."""
    return f"The weather in {city} is 90 degrees Fahrenheit and sunny."


# Initialize the agent with Gemini (the Vertex AI backend is configured in .env)
root_agent = Agent(
    name="weather_agent",
    model="gemini-2.5-flash",
    tools=[get_weather],
)
```

The agent.py file is the entry point for the agent. It is used to define the agent and its tools. The get_weather function is a simple tool that returns the current weather in a city. For the purpose of this tutorial, we are using a hardcoded value for the weather. In a real-world scenario, you would use an API to get the current weather.
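ADK derives the tool description it sends to the model from the function's name, parameters, type hints, and docstring, which is why those details on get_weather matter. As a rough, standard-library-only illustration (the describe_tool helper is hypothetical and is not ADK's schema-generation code), this sketch shows the metadata available on the function. It also shows that the tool is an ordinary function you can unit-test without calling the model.

```python
import inspect


def get_weather(city: str) -> str:
    """Returns the current weather in a city."""
    return f"The weather in {city} is 90 degrees Fahrenheit and sunny."


def describe_tool(func):
    """Collect a function's name, docstring, and parameter type names.

    Approximates the information a framework can read from a plain
    Python function; it does not reproduce ADK's actual declaration format.
    """
    sig = inspect.signature(func)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func),
        "parameters": {
            name: param.annotation.__name__
            for name, param in sig.parameters.items()
        },
    }


if __name__ == "__main__":
    print(describe_tool(get_weather))
    # The tool itself can be tested directly, without the LLM in the loop.
    assert "Paris" in get_weather("Paris")
```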

Step 3: Testing the Agent Locally

Before deploying the agent to GKE Autopilot, we need to test it locally to ensure it works as expected. Run the following command to start the agent with the ADK development web UI:

```
uv run adk web
```

Open http://localhost:8000 in your browser and you should see the ADK web UI. You can then interact with your agent by typing messages in the chat interface.

If the agent returns a message like "The weather in [CITY] is 90 degrees Fahrenheit and sunny.", congratulations! Your ADK agent is working, and you can proceed to the next step.

Step 4: Preparing for GKE Autopilot

The ADK CLI has a built-in command to deploy the agent to GKE Autopilot. However, its default settings are not suitable for a production environment. For example, they do not use Workload Identity to authenticate with Vertex AI, and they expose the web UI through a load balancer on port 80.

We will instead manage the lifecycle of the container ourselves. First we need to containerize the agent.

Create a .dockerignore file in the adk-agent directory to prevent your local virtual environment from being copied into the image:

```
.venv
.adk
__pycache__
*.pyc
.env
```

Create a Dockerfile for your agent in the adk-agent directory. We will use a multi-stage build to keep the final production image lightweight and secure. Note that uv init records the Python version it ran with as requires-python in pyproject.toml, so make sure the base image satisfies it; for example, if you initialized the project with Python 3.12, use python:3.12-slim in both FROM lines.

```
# Stage 1: Build the virtual environment
FROM python:3.10-slim AS builder

# Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

# Set working directory
WORKDIR /app

# Force uv to use the system Python and use copy instead of symlinks
ENV UV_PYTHON_PREFERENCE=only-system
ENV UV_LINK_MODE=copy
ENV UV_COMPILE_BYTECODE=1
ENV UV_PYTHON=/usr/local/bin/python3

# Install dependencies
# We copy only files needed for installation to maximize cache
COPY pyproject.toml uv.lock ./
# Note: We don't use --frozen yet as the host lock file might be slightly out of sync
# but sync will update it in the builder stage.
RUN uv sync --no-install-project --no-dev --no-cache

# Copy the agent code
COPY . .
# Sync the project itself
RUN uv sync --no-dev --no-cache

# Stage 2: Runtime image
FROM python:3.10-slim

WORKDIR /app

# Copy the pre-built environment from the builder
COPY --from=builder /app/.venv /app/.venv
# Copy the application code (including weather_agent folder)
COPY . .

# Add the environment to the PATH
ENV PATH="/app/.venv/bin:$PATH"
ENV PYTHONUNBUFFERED=1

# Run the ADK API server
# Serve agents from /app, which contains the weather_agent folder
CMD ["adk", "api_server", ".", "--host", "0.0.0.0", "--port", "8080"]
```

Build and push the image to Artifact Registry:

```
# Create repository
gcloud artifacts repositories create adk-repo --repository-format=docker --location=$REGION

# Build and push
gcloud builds submit --tag $REGION-docker.pkg.dev/$PROJECT_ID/adk-repo/gke-agent:latest
```

Step 5: Implementing Workload Identity for Security

Security is paramount. Instead of hardcoding API keys, we use Workload Identity to grant the GKE pod permission to access Vertex AI.

1. Create an IAM Service Account:

```
gcloud iam service-accounts create adk-gke-sa
```

2. Grant Vertex AI permissions:

```
gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:adk-gke-sa@$PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/aiplatform.user"
```

3. Allow the Kubernetes Service Account to impersonate the IAM SA:

```
gcloud iam service-accounts add-iam-policy-binding adk-gke-sa@$PROJECT_ID.iam.gserviceaccount.com \
    --role="roles/iam.workloadIdentityUser" \
    --member="serviceAccount:$PROJECT_ID.svc.id.goog[default/adk-ksa]"
```
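The member strings in these commands, the Kubernetes Service Account annotation, and the image path only work if they agree on the same names. As a hypothetical convenience that is not part of the tutorial's tooling, this sketch derives each identifier from the project ID and region, using this tutorial's names (adk-gke-sa, adk-ksa, adk-repo, gke-agent:latest) as defaults. That makes it easy to compare against what you typed.

```python
import os


def gke_names(project_id, region, gsa="adk-gke-sa", namespace="default",
              ksa="adk-ksa", repo="adk-repo", image="gke-agent:latest"):
    """Derive the identifiers used in this tutorial from project ID and region."""
    gsa_email = f"{gsa}@{project_id}.iam.gserviceaccount.com"
    return {
        # IAM service account email (also used in the KSA annotation)
        "gsa_email": gsa_email,
        # --member for the roles/aiplatform.user binding
        "vertex_member": f"serviceAccount:{gsa_email}",
        # --member for the roles/iam.workloadIdentityUser binding
        "wi_member": f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa}]",
        # Artifact Registry image built by Cloud Build and used by the Deployment
        "image": f"{region}-docker.pkg.dev/{project_id}/{repo}/{image}",
    }


if __name__ == "__main__":
    # Falls back to placeholder values if the shell variables are not exported.
    names = gke_names(os.environ.get("PROJECT_ID", "my-project"),
                      os.environ.get("REGION", "us-central1"))
    for key, value in names.items():
        print(f"{key}: {value}")
```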

Step 6: Deploying the Agent to GKE

Now, we define the Kubernetes resources. Create a deployment.yaml that includes the Service Account annotation for Workload Identity. Replace $PROJECT_ID and $REGION with your actual project ID and region.

```
apiVersion: v1
kind: ServiceAccount
metadata:
  name: adk-ksa
  annotations:
    iam.gke.io/gcp-service-account: adk-gke-sa@$PROJECT_ID.iam.gserviceaccount.com
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: adk-agent
spec:
  replicas: 2
  selector:
    matchLabels:
      app: adk-agent
  template:
    metadata:
      labels:
        app: adk-agent
    spec:
      serviceAccountName: adk-ksa
      containers:
      - name: adk-agent
        image: $REGION-docker.pkg.dev/$PROJECT_ID/adk-repo/gke-agent:latest
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "1"
            memory: "1Gi"
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: adk-service
spec:
  selector:
    app: adk-agent
  ports:
  - port: 80
    targetPort: 8080
```
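Rather than editing the placeholders by hand, you can render them from your exported shell variables. The $PROJECT_ID and $REGION placeholders happen to match the syntax of Python's string.Template, so a few lines of standard-library code are enough. This is a minimal sketch; the helper names are hypothetical.

```python
import os
import string
from pathlib import Path


def render_manifest(template_text, values):
    """Fill $NAME placeholders such as $PROJECT_ID and $REGION.

    Template.substitute raises KeyError for any placeholder without a value,
    so a half-rendered manifest is never produced silently.
    """
    return string.Template(template_text).substitute(values)


def render_file(path, values=None):
    """Render a manifest file, using environment variables by default."""
    mapping = dict(os.environ) if values is None else values
    return render_manifest(Path(path).read_text(), mapping)
```

For example, a small script that prints render_file("deployment.yaml") can be piped into kubectl apply -f -, leaving the template untouched. A literal dollar sign in a manifest would need to be written as $$. The envsubst shell utility is a common alternative.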

Apply the configuration:

```
kubectl apply -f deployment.yaml
```

Check the status of the deployment:

```
kubectl get pods -w
```

Once the pods are running, you can use kubectl port-forward to access the agent locally:

```
kubectl port-forward svc/adk-service 8080:80
```

Because the container runs the API server without the web UI, opening http://localhost:8080 in a browser won't show the chat interface. You can still interact with the agent through its REST API using curl.

In a new terminal, run the following commands:

```
# Create a new session
curl -X POST http://localhost:8080/apps/weather_agent/users/u_123/sessions/s_123

# Run a message
curl -s -X POST http://localhost:8080/run \
-H "Content-Type: application/json" \
-d '{
"appName": "weather_agent",
"userId": "u_123",
"sessionId": "s_123",
"newMessage": {
    "role": "user",
    "parts": [{
    "text": "Hey whats the weather in new york today"
    }]
}
}' | jq .
```

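The /run endpoint returns a JSON array of events, which can include the tool call and the tool response as well as the final answer, and jq pretty-prints it. The last event holds the agent's reply. Trimmed to the relevant fields, it looks something like this (the exact wording will vary):

```
{
  "content": {
    "parts": [
      {
        "text": "The weather in New York today is sunny with a high of 90 degrees Fahrenheit."
      }
    ],
    "role": "model"
  },
  "author": "weather_agent"
}
```

To print only the final answer, pipe the output to jq -r '.[-1].content.parts[0].text' instead of jq .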
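If you'd rather script these calls than use curl, this sketch makes the same two requests with Python's standard library. It assumes /run returns a JSON list of ADK events whose content.parts carry the text, and it treats the last text-bearing event as the answer. The helper names and the default IDs (u_123, s_123, taken from the curl example) are illustrative.

```python
import json
import urllib.request


def post_json(url, body=None, timeout=60.0):
    """POST an optional JSON body to url and decode the JSON response."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {} if body is None else {"Content-Type": "application/json"}
    req = urllib.request.Request(url, data=data, headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_payload(app, user, session, text):
    """Build the same request body as the curl example."""
    return {
        "appName": app,
        "userId": user,
        "sessionId": session,
        "newMessage": {"role": "user", "parts": [{"text": text}]},
    }


def final_text(events):
    """Return the text of the last event that contains text parts.

    Events without a text part (for example, tool calls) are skipped.
    """
    for event in reversed(events):
        parts = (event.get("content") or {}).get("parts") or []
        texts = [part["text"] for part in parts if "text" in part]
        if texts:
            return "".join(texts)
    return ""


def create_session(base_url, app="weather_agent", user="u_123", session="s_123"):
    """Create the session once, like the first curl command."""
    return post_json(f"{base_url}/apps/{app}/users/{user}/sessions/{session}")


def ask(base_url, text, app="weather_agent", user="u_123", session="s_123"):
    """Send one message to /run and return the agent's final text."""
    events = post_json(f"{base_url}/run", run_payload(app, user, session, text))
    return final_text(events)
```

With the kubectl port-forward running, call create_session("http://localhost:8080") once, then print(ask("http://localhost:8080", "What's the weather in New York?")).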

(Optional) Step 7: Exposing via Gateway API and HTTPS load balancer

Finally, we expose the agent using the GKE Gateway API with a Google-managed TLS certificate. This is the recommended, production-grade approach — Google will automatically provision and renew the certificate for your domain.

NB: GKE supports other options to provision certificates. You can use Let's Encrypt with cert-manager, pre-shared certificates, or any other certificate authority. You can check the GKE documentation for more details.

First, reserve a static IP address for your load balancer:

```
gcloud compute addresses create adk-agent-ip --global
export AGENT_IP=$(gcloud compute addresses describe adk-agent-ip --global --format="value(address)")
echo "Your IP: $AGENT_IP"
```

Point your domain's DNS A record at $AGENT_IP, for example adk.yourdomain.com.
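Google-managed certificates are only provisioned once the domain resolves to the load balancer's IP address, so it can help to confirm the A record before waiting on the certificate. This is a minimal sketch using the standard library's resolver; resolves_to is a hypothetical helper name.

```python
import socket


def resolves_to(hostname: str, expected_ip: str) -> bool:
    """Return True if hostname currently resolves (IPv4) to expected_ip.

    Uses the local resolver, so cached or not-yet-propagated records can
    give a stale answer; False may just mean "wait and retry".
    """
    try:
        infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    except socket.gaierror:
        return False
    # For AF_INET, each sockaddr is an (address, port) tuple.
    return expected_ip in {info[4][0] for info in infos}
```

For example, resolves_to("adk.yourdomain.com", os.environ["AGENT_IP"]) should return True once the record has propagated.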

Create a Google-managed certificate. Replace adk.yourdomain.com with your actual domain:

```
gcloud compute ssl-certificates create adk-cert --domains adk.yourdomain.com --global
```

Create a gateway.yaml with the following content:

```
# Gateway: HTTPS load balancer with the managed certificate and static IP
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: adk-gateway
spec:
  gatewayClassName: gke-l7-global-external-managed
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    tls:
      mode: Terminate
      options:
        networking.gke.io/pre-shared-certs: adk-cert
  addresses:
  - type: NamedAddress
    value: adk-agent-ip
---
# HTTPRoute: forward traffic to the ADK service
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: adk-route
spec:
  parentRefs:
  - name: adk-gateway
  hostnames:
  - "adk.yourdomain.com"
  rules:
  - backendRefs:
    - name: adk-service
      port: 80
---
apiVersion: networking.gke.io/v1
kind: HealthCheckPolicy
metadata:
  name: adk-health
  namespace: default
spec:
  default:
    checkIntervalSec: 15
    timeoutSec: 5
    healthyThreshold: 1
    unhealthyThreshold: 2
    logConfig:
      enabled: false
    config:
      type: HTTP
      httpHealthCheck:
        port: 8080
        requestPath: /health
  targetRef:
    group: ""
    kind: Service
    name: adk-service
```

Apply the configuration:

kubectl apply -f gateway.yaml

Certificate provisioning starts once your DNS record resolves to the load balancer's IP address and can take up to 20 minutes. Monitor the status with:

gcloud compute ssl-certificates describe adk-cert --global
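Provisioning is asynchronous, so a small polling helper saves you from re-running the describe command by hand. This is a minimal sketch that is not part of the original walkthrough. `wait_for_status` is a hypothetical helper name, and the budget of 30 attempts at 60-second intervals is an assumption you should adjust. The helper reads the certificate's `managed.status` field, which reports ACTIVE once provisioning completes.

```bash
#!/usr/bin/env bash
# Poll a command until its output equals an expected value, or give up.
# Usage: wait_for_status EXPECTED MAX_ATTEMPTS DELAY_SECONDS CMD [ARGS...]
wait_for_status() {
  local expected="$1" max_attempts="$2" delay="$3"
  shift 3
  local attempt status
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    # A failing command counts as "no status yet" rather than aborting the loop.
    status="$("$@" 2>/dev/null)" || status=""
    echo "attempt ${attempt}/${max_attempts}: ${status:-<no output>}" >&2
    if [[ "$status" == "$expected" ]]; then
      return 0
    fi
    # Skip the final sleep: there is no further attempt to wait for.
    if (( attempt < max_attempts )); then
      sleep "$delay"
    fi
  done
  return 1
}

# Example (assumed budget: 30 attempts, 60 s apart):
# wait_for_status ACTIVE 30 60 \
#   gcloud compute ssl-certificates describe adk-cert --global \
#   --format="value(managed.status)"
```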

Once the certificate's managed status shows ACTIVE, your agent is live at https://api.yourdomain.com. You can test it with:

# Create a new session
curl -X POST https://api.yourdomain.com/apps/weather_agent/users/u_124/sessions/s_124

# Run a message
curl -s -X POST https://api.yourdomain.com/run \
  -H "Content-Type: application/json" \
  -d '{
    "appName": "weather_agent",
    "userId": "u_124",
    "sessionId": "s_124",
    "newMessage": {
      "role": "user",
      "parts": [{
        "text": "Hey whats the weather in new york today"
      }]
    }
  }' | jq .

Conclusion & Looking Ahead

By following these steps, you have deployed a production-ready ADK agent to GKE Autopilot that calls Gemini on Vertex AI and authenticates with Workload Identity. This setup lets your agent scale horizontally to meet demand while maintaining a strong security posture.

As you look ahead, consider integrating more complex tools or leveraging GKE's multi-cluster capabilities for even greater resilience. For more details on the technologies used here, explore the official GKE documentation and the ADK repository.

To avoid ongoing charges, delete the resources you created when you are finished, including the GKE cluster and the Artifact Registry repository:

kubectl delete -f gateway.yaml
kubectl delete -f deployment.yaml
gcloud compute addresses delete adk-agent-ip --global
gcloud compute ssl-certificates delete adk-cert --global
gcloud container clusters delete $CLUSTER_NAME --region $REGION
gcloud artifacts repositories delete adk-repo --location $REGION

Connecting AI agents with unstructured data using Google Cloud Storage MCP Servers

Tue, 02 Jun 2026 17:00:00 +0000

Google Cloud Storage (GCS) is a foundational component of the modern agentic tech stack and the preferred home for unstructured data at scale. As enterprises deploy agents in production, the focus has shifted to turning data into context and to building secure, standardized integrations that give agents access to that context. This is the core of smart storage: making unstructured data agent-ready by turning passive objects into rich context for reasoning. Whether the task is automating complex financial workflows or diagnosing system failures in seconds, AI success now depends on how seamlessly agents can use this context to make smart, high-stakes decisions.

In this blog, we share three examples of agents that customers built on GCS, and then show how you can securely and reliably connect your agents to GCS using the Model Context Protocol (MCP). Combined with smart storage features like auto annotations and object contexts, the GCS MCP server makes deploying agents on your GCS data straightforward.

Real-world agent success on Google Cloud Storage

We are seeing incredible innovation from customers leveraging MCP and Google’s agentic tech stack to solve complex business problems:

Palo Alto Networks built the Strata Co-Pilot agent, a screen-aware AI assistant that guides network security administrators through complex configuration flows—either by highlighting steps or executing them directly. The agent is powered by the Gemini Live API, with GCS serving as its “historical memory” connected via the GCS MCP server.
Airwallex developed an AI Assistant that understands user context, answers questions, and executes workflows on users' behalf. For example, it can analyze expense policy documents and generate detailed approval workflows, a task that would normally take hours to do manually. The agent stores the documents in GCS and the extracted information in GCS metadata.
Snap's Job Optimization Agent analyzes Flink and Spark job specs, metadata, and historical metrics stored on GCS across thousands of jobs to find optimization opportunities, generate cost estimates, and tune configurations. Using this agent, Snap is already seeing investigation time reduced from 30 minutes to 30 seconds!

In all three agents, the GCS MCP server handles data operations and enforces standard RBAC and access policies.

Connecting agents to GCS using MCP

MCP has rapidly emerged as the universal standard for connecting agents to data sources, but building custom servers from scratch is often a slow, distracting process that diverts focus from innovation. This path introduces significant development overhead and risk, as it forces you to manage everything from authentication and error handling to keeping pace with GCS’s evolving capabilities. To solve this, GCS offers two powerful MCP server options — Remote and Local — allowing you to offload the foundational plumbing and focus on creating value.

1. Remote MCP server: Fully managed
Connecting your agents to the Cloud Storage MCP server requires zero infrastructure deployment. By simply pointing your agent configuration to the managed endpoint, you gain immediate access to your unstructured data on GCS, allowing you to scale your agentic workloads effortlessly without the burden of operational overhead.

Because the Cloud Storage MCP server follows the open MCP standard, it works seamlessly with major agentic frameworks like ADK and is compatible with MCP clients. You can easily connect clients like Google Antigravity and Anthropic’s Claude by adding a Custom Connector in the settings. Simply point it to your Cloud Storage MCP endpoint, and you are ready to start building — no complex configuration files required.
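MCP clients talk to a server using JSON-RPC 2.0 messages such as `tools/list` and `tools/call`. The sketch below builds those two request bodies so you can see what an agent framework sends on your behalf. It is illustrative only. `MCP_ENDPOINT` and the `list_objects` tool name used in the test are hypothetical placeholders, so call `tools/list` against your real endpoint to discover the tool names the Cloud Storage MCP server exposes. Note also that a real client performs MCP's `initialize` handshake before calling tools.

```bash
#!/usr/bin/env bash
# Build MCP JSON-RPC 2.0 request bodies (illustrative sketch).
# Tool names and arguments are hypothetical; discover real ones with tools/list.

# Request the list of tools the server exposes.
mcp_tools_list() {
  local id="$1"
  printf '{"jsonrpc":"2.0","id":%s,"method":"tools/list"}' "$id"
}

# Invoke one tool. ARGS_JSON must already be a valid JSON object.
mcp_tools_call() {
  local id="$1" tool="$2" args_json="$3"
  printf '{"jsonrpc":"2.0","id":%s,"method":"tools/call","params":{"name":"%s","arguments":%s}}' \
    "$id" "$tool" "$args_json"
}

# Example (not run here). MCP_ENDPOINT is a placeholder for your Cloud Storage
# MCP endpoint; a real client sends an "initialize" request before these calls.
# curl -s "$MCP_ENDPOINT" \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -H "Accept: application/json, text/event-stream" \
#   -d "$(mcp_tools_list 1)"
```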

Connecting an agent to storage requires robust security and governance. The GCS MCP server is built on Google Cloud's standard identity, observability, and security frameworks:

Identity-first security: Authentication is handled entirely through Identity and Access Management (IAM) rather than shared keys. This ensures agents can only access data (buckets and objects) explicitly authorized by the user.
Full observability: To track agent activity, every request and action taken via these MCP servers is logged in Cloud Audit Logs. This provides security teams with a record of every interaction, maintaining visibility alongside ease of access.
MCP security, content scanning: You can optionally configure the MCP endpoint with Google’s content security service, Google Cloud Model Armor. This lets you implement security controls against common MCP attack vectors, such as direct and indirect prompt injection attacks, MCP tool poisoning attacks, and malicious URL/SQL injections, and also prevent the leakage of sensitive data.
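Access is governed by IAM, so the usual least-privilege pattern applies: grant the identity your agent runs as read access only to the buckets it needs. This is a minimal sketch that assumes the agent runs as a dedicated service account and only needs to read objects. `grant_bucket_read` and the `DRY_RUN` switch are hypothetical helpers, and the bucket and account names are placeholders. Check the Cloud Storage MCP documentation for any additional roles the endpoint itself requires.

```bash
#!/usr/bin/env bash
# Grant one service account read-only access to one bucket (least privilege).
# Set DRY_RUN=1 to print the command instead of running it.
grant_bucket_read() {
  local bucket="$1" sa_email="$2"
  local -a cmd
  cmd=(gcloud storage buckets add-iam-policy-binding "gs://${bucket}"
    "--member=serviceAccount:${sa_email}"
    "--role=roles/storage.objectViewer")
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Example (placeholders): only the expense-policy bucket is visible to the agent.
# grant_bucket_read expense-policies agent-sa@my-project.iam.gserviceaccount.com
```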

The remote Cloud Storage MCP server fits most production use cases; however, as with any remote server, you can't fully customize its MCP tools.

2. Local MCP Server: Self-managed for controlled customization
While the Remote server handles standard data access, a Local MCP server is the right choice when you need to build custom tools specific to your business logic. For example, suppose your agent needs to perform specialized data transformations (such as redacting PII or adding context from another internal system) whenever it reads a file from GCS. A Local MCP server lets you define those capabilities.

The GCS Local MCP server is an open-source GitHub repository of Google-maintained tools that provides you with a reliable bridge to your data. Here are a few tips to keep in mind while designing custom tools:

Provide precise, clear descriptions to minimize incorrect invocations by the models
Implement model-friendly error handling for models to understand their mistakes and self-correct
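To make the two tips concrete, here is the core logic of a hypothetical custom tool, `read_object_redacted`, sketched in shell rather than in any particular MCP SDK. It redacts email addresses, which is only one narrow kind of PII; a real redactor needs much broader coverage. It also returns errors that tell the model what went wrong and what to try next. The tool name, error wording, and regex are assumptions. `gcloud storage cat` is the real CLI command used to read the object.

```bash
#!/usr/bin/env bash
# Core of a hypothetical "read_object_redacted" tool for a self-managed MCP server.

# Replace anything that looks like an email address with a fixed token.
redact_emails() {
  sed -E 's/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/[REDACTED_EMAIL]/g'
}

# Model-friendly error: state what failed and the concrete next step.
tool_error() {
  printf 'ERROR: %s. Next step: %s\n' "$1" "$2"
}

read_object_redacted() {
  local bucket="$1" object="$2" content
  if [[ -z "$bucket" || -z "$object" ]]; then
    tool_error "missing argument; both 'bucket' and 'object' are required" \
      "call the tool again with both arguments set"
    return 1
  fi
  if ! content="$(gcloud storage cat "gs://${bucket}/${object}" 2>/dev/null)"; then
    tool_error "object gs://${bucket}/${object} could not be read" \
      "list the bucket's objects to check the exact name, or ask the user for access"
    return 1
  fi
  printf '%s\n' "$content" | redact_emails
}
```

The error text is written for the model, not for a human operator. Naming the missing argument or suggesting a listing call gives the model enough to retry correctly instead of guessing.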

The GCS Local MCP is now also a part of the MCP Toolbox for Databases, a single open-source repository containing connectors for major data services such as GCS, BigQuery, AlloyDB, Spanner, and Cloud SQL, making it easier to monitor and manage your data ecosystem. The Toolbox offers simplified development with reduced boilerplate code, enhanced security through OAuth2 and OIDC, and end-to-end observability with OpenTelemetry integration.

Get started

Whether you are optimizing an existing process like Snap or automating workflow creation like Airwallex, your unstructured data is one of your agent's greatest assets.

Explore the generally available GCS Remote MCP Server.
Check out our GCS Local MCP GitHub repository to start building custom tools today, or use it as part of MCP Toolbox for Databases.
Reach out to us to discuss your Agent use case with GCS data.

Experimenting with TPUs, GKE Managed DRANET, and Multi-cluster Inference Gateway

Tue, 02 Jun 2026 07:00:00 +0000

What happens when your workload fails in one region but your users still need the service? This is a common availability and uptime concern. The Kubernetes ecosystem has recently gained capabilities such as Dynamic Resource Allocation (DRA) and the Inference Gateway, so I decided to experiment with them in Google Cloud using a simple AI inference workload.

In this blog, we explore this setup. You can also jump straight to the detailed configs in the codelab Build multi-cluster GKE Inference Gateway, with TPUs, Cloud Storage FUSE and managed DRANET.

Building blocks

To build out this experiment, use the following products, features, and tools:

Google Kubernetes Engine (GKE) managed DRANET: A managed feature, built on Dynamic Resource Allocation, that lets you request and share resources among Pods. It supports GPUs and TPUs. In this test, TPUs were used in two different regions, with networking assigned by managed DRANET.
Multi-cluster GKE Inference Gateway: Load balances your AI/ML inference workloads across multiple GKE clusters, including failing over between them, which is what my experiment set out to test. The GatewayClass that supports this is gke-l7-cross-regional-internal-managed-mc, the multi-cluster cross-region internal Application Load Balancer.
Cloud Storage FUSE: Mounts a Cloud Storage bucket as a file system, so you can keep data, models, checkpoints, and logs directly in Cloud Storage. To speed up deployment, an open Gemma model was downloaded to the bucket once and loaded from there.
Virtual Private Cloud (VPC): The foundational global network providing isolated, secure communication for the internal load balancers and compute nodes.
GKE Fleets: Fleets group the separate regional clusters under a unified management control plane.
TPU v6e: Google's custom AI accelerators that provide the high-performance compute required to serve the model. The machine type used was ct6e-standard-4t in a 2x2 slice.

Design pattern example

The aim is to deploy an LLM (Gemma 3) onto two GKE clusters in different regions. Each cluster uses 4 TPU v6e chips, and the model is stored in Cloud Storage. The workload is served through the GKE Inference Gateway, which supports multiple clusters. Traffic should be routed to the region closest to the user and fail over to the other region if one region fails.

Putting it together

To use TPUs in two regions, make sure your project has the necessary TPU quota in both regions.

Begin: Set up the environment.

Create a standard VPC with a subnet in each region that hosts your TPU capacity (for example, the region of your TPU reservation).
Create a proxy-only subnet in each region. The cross-region internal Application Load Balancer behind the GKE Inference Gateway uses it.
Set up firewall rules allowing traffic and health checks.
Reserve static internal IP addresses in both regions for the Gateway.
Provision a Cloud Storage bucket and a dedicated IAM service account. Bind the service account to a Kubernetes service account with Workload Identity Federation for GKE, so your Pods can securely mount the bucket with Cloud Storage FUSE and read the model weights directly.
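The per-region networking pieces can be scripted. This sketch assumes a VPC and one workload subnet per region already exist. It creates a proxy-only subnet with purpose `GLOBAL_MANAGED_PROXY`, the purpose used by cross-region internal Application Load Balancers. It also reserves internal addresses named `gemma-gateway-ip-REGION`, which the Gateway manifest later references. The network and subnet names and the CIDR ranges are hypothetical placeholders. Set `DRY_RUN=1` to print the commands instead of running them.

```bash
#!/usr/bin/env bash
# Print commands instead of running them when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: setup_region NETWORK REGION WORKLOAD_SUBNET PROXY_ONLY_RANGE
setup_region() {
  local network="$1" region="$2" workload_subnet="$3" proxy_range="$4"
  # Proxy-only subnet for the cross-region internal Application Load Balancer.
  run gcloud compute networks subnets create "proxy-only-${region}" \
    --purpose=GLOBAL_MANAGED_PROXY --role=ACTIVE \
    --region="$region" --network="$network" --range="$proxy_range"
  # Internal static IP that the Gateway references as
  # regions/REGION/addresses/gemma-gateway-ip-REGION.
  run gcloud compute addresses create "gemma-gateway-ip-${region}" \
    --region="$region" --subnet="$workload_subnet"
}

# Example (placeholder names and ranges):
# setup_region my-vpc europe-west4 my-subnet-europe-west4 10.129.0.0/23
# setup_region my-vpc us-east5 my-subnet-us-east5 10.130.0.0/23
```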

Next: Create standard GKE clusters and node pools.

Deploy two separate GKE Standard clusters, one in each of your chosen regions.
Enable the Gateway API (--gateway-api=standard) and the Cloud Storage FUSE CSI driver (--addons GcsFuseCsiDriver) during cluster creation.
Create dedicated TPU v6e node pools (ct6e-standard-4t) for both clusters.
Enable managed DRANET on these TPU node pools by setting the flags --accelerator-network-profile=auto and --node-labels=cloud.google.com/gke-networking-dra-driver=true.
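Taken together, the cluster and node pool steps look roughly like this for one region. This is a sketch under several assumptions. The project, cluster, pool, and zone names are placeholders, and you should pick zones where you have v6e capacity. `--workload-pool` is added because Pods need Workload Identity to mount the bucket with Cloud Storage FUSE. Managed DRANET may have further requirements, such as a minimum GKE version, that you should confirm in the DRANET documentation before running this. `DRY_RUN=1` prints the commands.

```bash
#!/usr/bin/env bash
# Print commands instead of running them when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: create_tpu_cluster PROJECT CLUSTER REGION ZONE
create_tpu_cluster() {
  local project="$1" cluster="$2" region="$3" zone="$4"
  # Standard cluster with Gateway API, the Cloud Storage FUSE CSI driver,
  # and Workload Identity so Pods can read the model bucket.
  run gcloud container clusters create "$cluster" \
    --project="$project" --region="$region" \
    --gateway-api=standard \
    --addons=GcsFuseCsiDriver \
    --workload-pool="${project}.svc.id.goog"
  # Single-host TPU v6e pool (ct6e-standard-4t, 2x2) with managed DRANET enabled.
  run gcloud container node-pools create tpu-v6e-pool \
    --project="$project" --cluster="$cluster" --region="$region" \
    --node-locations="$zone" \
    --machine-type=ct6e-standard-4t --num-nodes=1 \
    --accelerator-network-profile=auto \
    --node-labels=cloud.google.com/gke-networking-dra-driver=true
}

# Example (placeholders):
# create_tpu_cluster my-project gemma-eu europe-west4 europe-west4-a
# create_tpu_cluster my-project gemma-us us-east5 us-east5-b
```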

Next: Establish the global mesh via Fleet Registration.

Register both GKE clusters to a unified GKE Fleet by following the fleet creation and registration setup.
Enable Multi-Cluster Service Discovery and Multi-Cluster Ingress on your fleet.
Designate your primary region as the configuration hub to act as the control plane for routing rules across both regions.
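Here is a sketch of the fleet steps, assuming both clusters live in one project. It registers each cluster, enables multi-cluster Services, and enables the fleet ingress feature with the primary cluster's membership as the config membership. All names are placeholders. The membership location in `--config-membership` must match where the membership was actually created, which you can check with `gcloud container fleet memberships list`.

```bash
#!/usr/bin/env bash
# Print commands instead of running them when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: setup_fleet PROJECT CONFIG_MEMBERSHIP CONFIG_LOCATION LOCATION/CLUSTER...
setup_fleet() {
  local project="$1" config_membership="$2" config_location="$3"
  shift 3
  local location_cluster
  for location_cluster in "$@"; do
    # Membership name = cluster name (the part after the slash).
    run gcloud container fleet memberships register "${location_cluster#*/}" \
      --gke-cluster="$location_cluster" \
      --enable-workload-identity \
      --project="$project"
  done
  run gcloud container fleet multi-cluster-services enable --project="$project"
  # The config membership is the cluster that hosts the routing configuration.
  run gcloud container fleet ingress enable \
    --config-membership="projects/${project}/locations/${config_location}/memberships/${config_membership}" \
    --project="$project"
}

# Example (primary/config cluster in europe-west4):
# setup_fleet my-project gemma-eu europe-west4 europe-west4/gemma-eu us-east5/gemma-us
```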

Next: Deploy the AI workload.

Use a temporary Kubernetes job to download the Gemma 3 (gemma-3-27b-it) model weights directly into your Cloud Storage bucket.
Define a ResourceClaimTemplate that explicitly requests the managed DRANET device class (deviceClassName: netdev.google.com) with the allocation mode set to "All".

apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: all-netdev
  namespace: default
spec:
  spec:
    devices:
      requests:
      - name: req-netdev
        exactly:
          deviceClassName: netdev.google.com
          allocationMode: All

Deploy your inference server (e.g. vLLM) on the TPU nodes in both regions. Ensure the pod spec utilizes node selectors for the 2x2 TPU topology, requests exactly 4 TPUs, and mounts the netdev claim. This guarantees your pods utilize the dedicated accelerator networking alongside standard Ethernet.
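A pod spec that meets those three requirements might look like the following. It is rendered from a shell function so the per-region values can be filled in. Treat it as a sketch:

- The image, bucket name, Kubernetes service account, and the `/models/gemma-3-27b-it` path inside the bucket are assumptions.
- `vllm serve` with `--tensor-parallel-size=4` is one way to spread the model over the four chips.
- The `resourceClaims` entry assumes the `all-netdev` ResourceClaimTemplate defined above.
- The `app: gemma-server` label matches the selector used by the AutoscalingMetric later on.

```bash
#!/usr/bin/env bash
# Render a per-region vLLM Deployment (sketch; all values are assumptions).
# Usage: render_gemma_deployment IMAGE BUCKET KSA | kubectl --context=CTX apply -f -
render_gemma_deployment() {
  local image="$1" bucket="$2" ksa="$3"
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gemma-server
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gemma-server
  template:
    metadata:
      labels:
        app: gemma-server
      annotations:
        gke-gcsfuse/volumes: "true"      # inject the Cloud Storage FUSE sidecar
    spec:
      serviceAccountName: ${ksa}         # KSA bound to the bucket reader via Workload Identity
      nodeSelector:
        cloud.google.com/gke-tpu-accelerator: tpu-v6e-slice
        cloud.google.com/gke-tpu-topology: 2x2
      resourceClaims:
      - name: netdev
        resourceClaimTemplateName: all-netdev
      containers:
      - name: vllm
        image: ${image}
        command: ["vllm", "serve"]
        args: ["/models/gemma-3-27b-it", "--tensor-parallel-size=4", "--port=8000"]
        ports:
        - containerPort: 8000
        resources:
          limits:
            google.com/tpu: "4"
          claims:
          - name: netdev                 # attach the managed DRANET claim
        volumeMounts:
        - name: models
          mountPath: /models
          readOnly: true
      volumes:
      - name: models
        csi:
          driver: gcsfuse.csi.storage.gke.io
          readOnly: true
          volumeAttributes:
            bucketName: ${bucket}
EOF
}
```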

Next: Configure the Multi-Cluster Inference Gateway.

Install the necessary Custom Resource Definitions (CRDs) so Kubernetes can process specialized routing objects like the InferenceObjective.
Deploy an AutoscalingMetric to track hardware utilization, such as KV cache usage.
Use Helm to group the independent AI deployments from both regions into a single, logical InferencePool.
Deploy the Cross-Region Gateway and its associated HTTPRoute to manage incoming global traffic.
Apply health checks and backend policies to the pool to ensure load balancing relies on your custom hardware metrics.

Configure an InferenceObjective to instruct the gateway to route prompts to the region with the highest availability, avoiding overloaded TPUs.

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: cross-region-gateway
  namespace: default
spec:
  gatewayClassName: gke-l7-cross-regional-internal-managed-mc
  addresses:
  - type: networking.gke.io/named-address-with-region
    value: "regions/europe-west4/addresses/gemma-gateway-ip-europe-west4"
  - type: networking.gke.io/named-address-with-region
    value: "regions/us-east5/addresses/gemma-gateway-ip-us-east5"
  listeners:
  - name: http
    protocol: HTTP
    port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: gemma-route
  namespace: default
spec:
  parentRefs:
  - name: cross-region-gateway
    kind: Gateway
  rules:
  - backendRefs:
    - group: networking.gke.io
      kind: GCPInferencePoolImport
      name: gemma-pool
      port: 8000
---
apiVersion: networking.gke.io/v1
kind: HealthCheckPolicy
metadata:
  name: gemma-health-check
  namespace: default
spec:
  targetRef:
    group: networking.gke.io
    kind: GCPInferencePoolImport
    name: gemma-pool
  default:
    config:
      type: HTTP
      httpHealthCheck:
        requestPath: /health
        port: 8000
---
apiVersion: networking.gke.io/v1
kind: GCPBackendPolicy
metadata:
  name: gemma-backend-policy
  namespace: default
spec:
  targetRef:
    group: networking.gke.io
    kind: GCPInferencePoolImport
    name: gemma-pool
  default:
    timeoutSec: 100
    balancingMode: CUSTOM_METRICS
    trafficDuration: LONG
    customMetrics:
    - name: gke.named_metrics.tpu-cache
      dryRun: false
      maxUtilizationPercent: 60
---
apiVersion: autoscaling.gke.io/v1beta1
kind: AutoscalingMetric
metadata:
  name: tpu-cache
  namespace: default
spec:
  selector:
    matchLabels:
      app: gemma-server
  endpoints:
  - port: 8000
    path: /metrics
    metrics:
    - name: vllm:kv_cache_usage_perc
      exportName: tpu-cache
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceObjective
metadata:
  name: gemma-objective
  namespace: default
spec:
  priority: 10
  poolRef:
    name: gemma-pool
    group: "inference.networking.k8s.io"

Testing the Failover

Verify the highly available architecture by simulating a primary region outage. Once the primary deployment is taken offline, the Gateway automatically detects the failure and seamlessly reroutes all subsequent user requests to the active secondary cluster, ensuring continuous availability without dropping traffic.
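One way to observe failover is to send the same batch of requests before and after scaling the primary Deployment to zero, then compare the HTTP status codes. This sketch rests on several assumptions:

- `GATEWAY_IP`, the cluster context name, and the request body are placeholders.
- The body targets vLLM's OpenAI-compatible `/v1/completions` endpoint and assumes the model is served under the name `/models/gemma-3-27b-it`.
- The Gateway is internal, so the requests must come from inside the VPC.

```bash
#!/usr/bin/env bash
# Run CMD N times and print "STATUS COUNT" lines, sorted by status.
# Usage: tally_status_codes N CMD [ARGS...]   (CMD should print one status code)
tally_status_codes() {
  local n="$1"
  shift
  local i code
  local -A counts=()
  for (( i = 0; i < n; i++ )); do
    code="$("$@")" || true
    code="${code:-none}"
    counts["$code"]=$(( ${counts["$code"]:-0} + 1 ))
  done
  for code in $(printf '%s\n' "${!counts[@]}" | sort); do
    echo "${code} ${counts[$code]}"
  done
}

# Hypothetical probe: GATEWAY_IP is the internal Gateway address (run from a VM
# in the VPC). The model name assumes vLLM serves /models/gemma-3-27b-it.
probe_gateway() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 60 \
    -H "Content-Type: application/json" \
    -d '{"model": "/models/gemma-3-27b-it", "prompt": "Hello", "max_tokens": 16}' \
    "http://${GATEWAY_IP}/v1/completions"
}

# tally_status_codes 20 probe_gateway   # both regions healthy
# kubectl --context=PRIMARY_CONTEXT scale deployment gemma-server --replicas=0
# tally_status_codes 20 probe_gateway   # if failover works, codes should match the first run
```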

Next Steps

For a hands-on deep dive and more information on these features, see the following resources:

Hands-on Codelab: Build multi-cluster GKE Inference Gateway, with TPUs, Cloud Storage FUSE and managed DRANET
Document set: DRANET
Documentation: AI Hypercomputer

Want to ask a question, find out more or share a thought? Please connect with me on LinkedIn.

Developer's guide to Gemini Enterprise and A2UI integration

Fri, 29 May 2026 16:00:00 +0000

If you've built a chatbot, you know this conversation:

User: "Book a table for two tomorrow at 7pm."
Agent: "Okay, for what day?"
User: "Tomorrow."
Agent: "What time?"

A date picker would have ended this in one tap. But until recently, agents had no standard way to render a date picker (or a map, or a multi-select list) inside the chat surface they live in. They could only return text or Markdown.

Today, we're walking through how to fix that with A2UI, an open protocol for agent-driven user interfaces, and how to integrate an A2UI-enabled agent with Gemini Enterprise (GE) so your agent renders rich and interactive UI natively in the GE chat surface — and in your own custom frontend if you want one. We'll use a working restaurant-finder agent — built with the Google Agent Development Kit (ADK), the A2A protocol, and Gemini — as the reference. The full source is on GitHub and there's a 2-minute demo video.

The problem: agents speak text, but users want UI

Most agent frameworks today return strings. That's fine for short answers, but it breaks down quickly:

Multi-turn slot filling (date, time, party size) burns turns and patience.
Choices among options (which restaurant? which insurance plan?) become long bulleted lists the user has to copy-paste back.
Spatial information (locations, routes, floor plans) is reduced to addresses.

Developers have tried to patch this by sending HTML or JavaScript fragments, but that introduces real risks: cross-site scripting, UI injection from a remote agent you don't fully control, and visual drift from the host app's design system. What's needed is a way to transmit UI that's safe like data and expressive like code.

What A2UI is

A2UI is an open protocol, introduced by Google and co-developed with the Flutter team and product teams behind Gemini Enterprise. Instead of returning text or HTML, an agent returns a JSON payload that describes a UI: a tree of components (Card, Text, Button, ChoicePicker, Image, …) and a separate data model holding the values those components display.

Three properties make this useful in practice:

Declarative, not executable. The payload is data. The client only renders components from a pre-approved catalog, so a remote agent can't inject arbitrary code or steal credentials through a UI widget.
Streaming-friendly. The format is a flat list of small JSON messages, so the LLM can emit them incrementally and the client can paint as they arrive.
Framework-agnostic. The same agent response renders through Lit, Angular, Flutter, or native mobile. The agent doesn't know — or care — what's on the other end.

A2UI is also transport-agnostic. The messages ride inside whatever pipe you already use: A2A JSON-RPC, AG-UI, WebSockets, SSE. In our reference implementation, A2UI rides inside the A2A protocol as DataPart objects with the MIME type application/json+a2ui.
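To make the transport concrete, this sketch wraps each A2UI message in an A2A DataPart tagged with the `application/json+a2ui` MIME type. It emits one part per message, so a client can render messages as they stream in. The envelope layout is an assumption based on the A2A DataPart shape (`kind: "data"`, a `data` object, optional `metadata`). Check against the A2UI A2A extension exactly where the MIME type is recorded. The sample messages in the comment are placeholders, not valid A2UI component trees.

```bash
#!/usr/bin/env bash
# Wrap one A2UI message (a JSON object) in an A2A DataPart envelope.
# Envelope field names are an assumption; verify against the A2UI A2A extension.
A2UI_MIME_TYPE="application/json+a2ui"

a2ui_data_part() {
  local a2ui_message_json="$1"
  printf '{"kind":"data","data":%s,"metadata":{"mimeType":"%s"}}\n' \
    "$a2ui_message_json" "$A2UI_MIME_TYPE"
}

# Read A2UI messages as JSON Lines (one object per line) and emit one DataPart
# per message, so each can be forwarded to the client as soon as it is produced.
wrap_a2ui_stream() {
  local line
  while IFS= read -r line || [[ -n "$line" ]]; do
    if [[ -n "$line" ]]; then
      a2ui_data_part "$line"
    fi
  done
}

# Placeholder messages for illustration only (not the A2UI schema):
# printf '%s\n' '{"msg":1}' '{"msg":2}' | wrap_a2ui_stream
```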

Where A2UI sits in the stack

A2UI is one piece of a four-layer stack. Confusion usually comes from conflating these layers — they're each doing a different job:

Layer	Owns	Examples
App experience	Client shell and conversation state — chat window, input box, message history	CopilotKit, AG-UI
Pixel drawing	Turning component descriptions into actual rendered UI	Lit, Flutter, Angular
Conversation pipeline	Client–server transport — sending messages, receiving responses	A2A Protocol
Cargo (data format)	The thing flowing through the pipeline that describes the UI	A2UI

Read top to bottom: CopilotKit/AG-UI owns the app experience. Lit/Flutter/Angular own the rendering. While CopilotKit and AG-UI provide valuable abstractions, they remain strictly optional for implementing A2UI. In this architecture, A2A serves as the underlying conversation pipeline, while A2UI represents the structured cargo that actually traverses that pipe.

That separation is why the same A2UI payload works across three very different deployment shapes:

Bespoke web app — a custom client shell (like the reference repo's Lit frontend/) plus a custom A2UI renderer.
CopilotKit / AG-UI app — CopilotKit owns the chat shell, an A2UI renderer is registered inside it for rich cards.
Gemini Enterprise — GE is the shell, the renderer, and the transport client. You only build the agent.

So for the GE path, the stack collapses to two layers you control: the A2A endpoint (your agent) and the A2UI cargo it emits. The other two layers are GE's responsibility. CopilotKit and AG-UI are great if you're building a standalone product UI elsewhere — they're just out of scope for embedding an agent inside Gemini Enterprise.

Pattern revisions

The protocol evolves quickly, and different clients support different revisions. Two patterns are common today:

Inline pattern — the agent sends a component tree with the data baked into each component (the pattern Gemini Enterprise renders today).
Decoupled pattern — the agent sends the component tree and the data model as separate messages, so subsequent turns can update one without re-sending the other. This reduces tokens and latency for long-running conversations and is the direction the protocol is heading.

The reference repo serves both patterns from one backend, picking which to emit per request based on the client's X-A2A-Extensions header. As new revisions ship, you add another catalog and the same negotiation pattern keeps working.
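The negotiation itself can be small. Here is a sketch of the decision in shell. It assumes the header carries a comma-separated list of extension URIs. `DECOUPLED_EXT_URI` is a hypothetical placeholder for whatever URI identifies the decoupled catalog revision. Any other value falls back to the inline pattern that Gemini Enterprise renders today.

```bash
#!/usr/bin/env bash
# Decide which A2UI pattern to emit from the X-A2A-Extensions request header.
# DECOUPLED_EXT_URI is a hypothetical placeholder, not a published extension URI.
DECOUPLED_EXT_URI="https://example.com/a2ui/decoupled"

pick_a2ui_pattern() {
  local header="$1" uri
  local -a uris=()
  IFS=',' read -r -a uris <<< "$header"
  for uri in "${uris[@]}"; do
    uri="${uri#"${uri%%[![:space:]]*}"}"   # trim leading whitespace
    uri="${uri%"${uri##*[![:space:]]}"}"   # trim trailing whitespace
    if [[ "$uri" == "$DECOUPLED_EXT_URI" ]]; then
      echo "decoupled"
      return 0
    fi
  done
  # Default: the inline pattern, which Gemini Enterprise renders today.
  echo "inline"
}
```

Defaulting to inline keeps older clients working. A client only receives the decoupled pattern when it explicitly advertises support for it.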

How A2UI works inside Gemini Enterprise

Gemini Enterprise ships with a built-in A2UI renderer. For the developer, that means the integration story is short:

Build your A2A agent, embedding an A2UI catalog and example payloads alongside the regular tool definitions.
Register the agent with Gemini Enterprise as an A2A endpoint. (Use make register-gemini-enterprise in the reference repo.)
A GE admin shares the agent with employees, just like any other agent in the GE catalog.

At runtime, the flow looks like this:

The user types a request in the GE chat. GE calls your agent's A2A endpoint and sends along GE's own A2UI catalog — the list of UI components GE knows how to render.
Your agent decides whether a UI widget is the right response. If yes, it emits an A2UI JSON message (e.g., a ChoicePicker of restaurant options). If no, it falls back to text. Both can coexist in the same response.
GE receives the JSON, validates it against its catalog, and renders the widget natively in GE's own design language — so it visually matches the rest of the chat surface.
When the user interacts with the widget (selects three options, picks a date), GE serializes the interaction back into JSON and sends it to your agent as the next turn. Your agent processes structured input, not free-form text.

One thing worth flagging: because your agent doesn't ship its own renderer for GE, you don't need to choose a frontend framework to start. Your A2A endpoint can run anywhere — Cloud Run, GKE, on-prem — and GE handles the rendering.

High-level architecture example

The reference implementation is an ADK backend on Cloud Run designed to plug seamlessly into Gemini Enterprise.

Gemini Enterprise connects directly to your agent using standard A2A JSON-RPC calls.
The agent serves the inline message pattern expected by the Gemini Enterprise managed UI.
Custom components like GoogleMap render via Google Maps Embed iframes, with the API key injected server-side so the LLM never sees it.

The following demonstration illustrates how Google Maps functions as a live, interactive component within Gemini Enterprise rather than a static image. Leveraging A2UI's streaming-friendly architecture, the agent updates the map view in real-time—dropping pins and adjusting coordinates incrementally as results arrive from the Maps API.

See it running, then build your own

Detailed implementation guide here.
Demo video (2 minutes, end-to-end with both the Lit shell and Gemini Enterprise): https://youtu.be/_5AaYwyqVio
A2UI spec and component reference: a2ui.org
Gemini Enterprise updates, including the A2UI renderer: What's new in Gemini Enterprise
A2UI generative UI announcement: Introducing A2UI generative UI

If you're already building agents on Google Cloud, the fastest path is to clone the reference repo, run make local-backend for a local smoke test, and then make register-gemini-enterprise to wire it into GE. From there, swap in your own catalog, your own tools, and your own domain. The next time a user asks your agent for "a table for two tomorrow at 7pm," the answer can be a date picker instead of another question.

A Guide to AI Cold Starts on Cloud Run

Wed, 27 May 2026 17:23:00 +0000

I saw a developer asking on Reddit if there was any “sane way” to manage Cloud Run cold starts for AI across multiple regions. They were experiencing startup latencies of up to 20 seconds, a frustrating gap where the infrastructure is spinning up while the user waits for a response.

The discussion was full of developers who had almost given up on serverless GPUs, with some even migrating back to GKE just to escape the latency. I decided it was time to dive deep into the mechanics of AI cold starts and see if we could find that "sane way."

During my research into hosting models like Gemma 4 on Cloud Run, I had the privilege of co-presenting at Google Cloud Next '26 with Oded Shahar (Senior Engineering Manager for Cloud Run) and our guest speaker Ajay Nair (Global VP of Platform at Elastic).

In our session, "Build AI architectures with custom models on Cloud Run," Ajay shared the production-hardened strategies that allow Elastic to serve millions of daily requests across 17+ model variants, all while maintaining the 'scale-to-zero' efficiency of Cloud Run.


Ajay showed us that the secret isn't just in the model, but in treating GPUs as fungible compute rather than infrastructure to manage.

I realized then that minimizing cold start latency isn't just about the model, it's about the infrastructure patterns and architectural decisions that keep it fast, scalable, and secure.

The anatomy of an AI cold start

As the official Google Cloud GPU best practices explain, an AI cold start differs fundamentally from a standard web microservice cold start. You aren't just booting code; you're moving gigabytes of weights into a specialized physical accelerator.

Think of it as a four-phase race. If you don't optimize each step, you're going to lose your users.

Phase 1: Infrastructure Provisioning (~5s)

Cloud Run allocates the physical GPU and injects pre-installed NVIDIA drivers. Since Google manages the drivers for you, you don't have to bloat your Dockerfile.

Phase 2: Block-Level Container Image Streaming (1-2s)

Cloud Run uses "image streaming," meaning it pulls only the blocks needed to boot. Your 15GB CUDA image can actually start as fast as a tiny Node.js app!

Phase 3: Engine Initialization (5-15s)

This is where your inference engine (vLLM, Ollama) warms up. This is a massive CPU-heavy task, and it's where most people get throttled without realizing it.

Phase 4: Model Loading & VRAM Transfer

This is the final hurdle - moving those model weights from storage into the GPU memory. Unlike standard web apps where CPU is king, GPU memory is your primary constraint here. If your model’s weights don’t fit entirely within the GPU memory, performance degrades significantly as it swaps to slower system RAM.

Best practices for handling AI cold starts

To build a "sane" production environment, here are a few crucial levers you can pull, informed by the official Google Cloud documentation on AI inference with GPUs.

Optimize Phase 4

Pick the Right Deployment Option

Phase 4 is the "final hurdle" where you move gigabytes of weights from storage into GPU memory. Your choice of storage determines how fast this transfer happens:

Cloud Storage (Concurrent Download) - Fastest: Using the Google Cloud CLI (gcloud storage cp) allows you to download model files in parallel. This is the recommended method for massive weights because it maximizes network throughput and drastically reduces transfer time.
Cloud Storage (FUSE) - Easiest: This provides "zero-code" changes by mounting a bucket as a local file system. However, because it does not parallelize the initial download, it is significantly slower for large model weights.
Container Image - Best for <10GB: Baking weights into your image is efficient for smaller models thanks to Cloud Run's Image Streaming. For models over 10GB, however, the import and streaming overhead can become a bottleneck.
Internet: Avoid this. It is the slowest and least predictable path for production inference.

Model Format & Size

Optimizing your model's format and size is a direct "hack" to shorten Phase 4 (Model Loading & VRAM Transfer). Because this phase is constrained by how fast you can move gigabytes of data into VRAM, smaller and more efficient files are critical.

4-bit Quantization: This is the ultimate cold start hack. Smaller weights mean fewer gigabytes to pull from storage, which directly accelerates the download and transfer portion of Phase 4.
Fast Formats: Pick a model format with fast load times like GGUF to minimize startup time. For the fastest performance, move away from Python "pickle" files and use Safetensors for zero-copy loading.
Ensure VRAM Fit: Use quantized models to ensure the weights fit entirely within the GPU memory. If the model exceeds VRAM, Phase 4 will stall as the system swaps to significantly slower RAM.
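A back-of-the-envelope check makes the effect of quantization on Phase 4 concrete. This sketch uses several simplifications:

- Weight size is approximately parameters × bits per weight ÷ 8, with 1 GB = 1000 MB.
- KV cache, activations, and engine overhead are ignored. They also need VRAM, so leave headroom.
- The 24 GB figure matches an NVIDIA L4.
- The 500 MB/s throughput is an assumed number, not a measurement.

```bash
#!/usr/bin/env bash
# Rough Phase 4 sizing. Assumes weight size ~= params x bits / 8 and 1 GB = 1000 MB;
# ignores KV cache, activations, and engine overhead, which also need VRAM.

# Approximate weight size in MB for a model with PARAMS_BILLION parameters.
weights_mb() {
  local params_billion="$1" bits="$2"
  echo $(( params_billion * 1000 * bits / 8 ))
}

# Succeeds (exit 0) if the weights alone fit in VRAM_GB of GPU memory.
fits_in_vram() {
  local params_billion="$1" bits="$2" vram_gb="$3"
  (( $(weights_mb "$params_billion" "$bits") < vram_gb * 1000 ))
}

# Seconds to move MB megabytes at MB_PER_S sustained throughput, rounded up.
transfer_seconds() {
  local mb="$1" mb_per_s="$2"
  echo $(( (mb + mb_per_s - 1) / mb_per_s ))
}

for bits in 16 8 4; do
  mb=$(weights_mb 27 "$bits")
  if fits_in_vram 27 "$bits" 24; then fit="fits"; else fit="does not fit"; fi
  echo "27B @ ${bits}-bit: ${mb} MB, ${fit} in 24 GB, ~$(transfer_seconds "$mb" 500) s at 500 MB/s"
done
```

Under these assumptions, the loop prints 54000 MB (~108 s) at 16-bit, 27000 MB (~54 s) at 8-bit, and 13500 MB (~27 s) at 4-bit. Only the 4-bit variant fits in 24 GB.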

Optimize Phases 3 & 4: Infrastructure & Network Levers

These infrastructure settings provide the necessary resources to accelerate the most demanding parts of the startup process.

Startup CPU Boost (Accelerates Phase 3)

This feature temporarily doubles your CPU power during startup. A 1 vCPU instance boosts to 2 vCPUs for the duration of startup and the first 10 seconds of serving. It is essential for Phase 3, as engine initialization is a massive CPU-heavy task.

Direct VPC Egress & PGA (Accelerates Phase 4)

Utilizing Direct VPC Egress with Private Google Access (PGA) ensures your model weight traffic stays on Google’s internal high-speed backbone. This optimizes the network path to shorten the time spent moving gigabytes of weights into VRAM.
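Both levers are deploy-time settings. This sketch puts them on one GPU service:

- `--cpu-boost` enables Startup CPU Boost.
- `--network`, `--subnet`, and `--vpc-egress=all-traffic` configure Direct VPC egress, so traffic to Google APIs goes through the VPC.
- `--enable-private-ip-google-access` turns on Private Google Access for the subnet.
- `--no-cpu-throttling` is included because GPU services keep CPU allocated for the instance lifetime.

The service, image, network, and subnet names, and the 8 vCPU / 32 GiB / one NVIDIA L4 sizing, are assumptions. Check current GPU regions and limits before deploying. `DRY_RUN=1` prints the commands.

```bash
#!/usr/bin/env bash
# Print commands instead of running them when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: deploy_inference_service SERVICE IMAGE REGION NETWORK SUBNET CONCURRENCY
deploy_inference_service() {
  local service="$1" image="$2" region="$3" network="$4" subnet="$5" concurrency="$6"

  # Private Google Access lets VPC traffic reach Cloud Storage without public IPs.
  run gcloud compute networks subnets update "$subnet" \
    --region="$region" --enable-private-ip-google-access

  # --no-cpu-throttling: GPU services keep CPU allocated.
  # --cpu-boost: extra CPU during startup for engine initialization (Phase 3).
  # --network/--subnet/--vpc-egress: Direct VPC egress for weight downloads (Phase 4).
  run gcloud run deploy "$service" \
    --image="$image" --region="$region" \
    --gpu=1 --gpu-type=nvidia-l4 --cpu=8 --memory=32Gi \
    --no-cpu-throttling --cpu-boost \
    --network="$network" --subnet="$subnet" --vpc-egress=all-traffic \
    --concurrency="$concurrency"
}

# Example (placeholders):
# deploy_inference_service gemma-svc IMAGE_URI us-central1 my-vpc my-subnet 24
```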

Concurrency Tuning (Cold Start Avoidance)

In Cloud Run, "concurrency" refers to the maximum number of requests a single instance can handle before the platform scales out to start a new one. For AI workloads, you must tune this setting in tandem with your model engine's internal parallelism flags (e.g., --max-num-seqs for vLLM or OLLAMA_NUM_PARALLEL for Ollama).

Use the official Google Cloud formula to find your ideal Cloud Run concurrency:

$$\text{Cloud Run concurrency} = (\text{model instances} \times \text{parallel queries per model}) + (\text{model instances} \times \text{ideal batch size})$$

Example: If your instance loads 3 model instances onto the GPU, and each model instance can handle 4 parallel queries with an ideal batch size of 4, you would set your Cloud Run maximum concurrent requests to 24: (3×4)+(3×4)

How the math works: The goal is to keep the GPU fully saturated while ensuring users aren't stuck in a long queue. In this example, the total of 24 concurrent requests is split into two functional groups:

Active Processing (12 requests): Calculated as (3 instances×4 queries), this represents the total number of requests the GPU can actively process at any given moment.
The "Next Batch" Buffer (12 requests): Calculated as (3 instances×4 batch size), these are the requests waiting "on deck" inside the container. As soon as the GPU finishes the first batch, it immediately picks up these waiting requests.
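The same arithmetic can be captured in a small helper, shown here as a Python sketch. The function and field names are illustrative. parallel_queries would typically correspond to your engine's parallelism setting (such as --max-num-seqs or OLLAMA_NUM_PARALLEL), and the returned total is the value you would give the service as its maximum concurrency (for example with gcloud run deploy --concurrency).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConcurrencyPlan:
    active: int   # requests the GPU processes at once
    on_deck: int  # "next batch" requests waiting inside the container
    total: int    # value to use as the Cloud Run maximum concurrent requests


def cloud_run_concurrency(model_instances: int, parallel_queries: int,
                          ideal_batch_size: int) -> ConcurrencyPlan:
    """Apply (instances * parallel queries) + (instances * ideal batch size)."""
    if min(model_instances, parallel_queries, ideal_batch_size) < 1:
        raise ValueError("all inputs must be positive integers")
    active = model_instances * parallel_queries
    on_deck = model_instances * ideal_batch_size
    return ConcurrencyPlan(active=active, on_deck=on_deck, total=active + on_deck)


plan = cloud_run_concurrency(model_instances=3, parallel_queries=4, ideal_batch_size=4)
print(plan)  # ConcurrencyPlan(active=12, on_deck=12, total=24)
```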

By tuning this value as high as your VRAM allows (usually 10-20 users), one warm instance can serve many requests without triggering a new scale-out event and the cold start that comes with it.

Scaling Controls (Tuning the Threshold)

While the formula above defines your maximum capacity, you can also tune when Cloud Run decides to start the next instance. Cloud Run's autoscaler typically targets 60% utilization, but for long-running AI cold starts, you can increase this threshold to 80% or 90% via Scaling Controls.

Concurrency Target: Increasing this allows you to "pack" more requests into a single warm instance before triggering a scale-out.
CPU Target: Increasing the CPU target prevents the platform from starting a new instance just because initialization or high-intensity inference spiked the CPU utilization.
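To see why raising the target matters, here is a deliberately simplified Python model. It assumes the autoscaler adds instances whenever in-flight requests exceed the target fraction of each instance's maximum concurrency. Cloud Run's real autoscaler weighs more signals than this, so treat the numbers as intuition rather than a prediction.

```python
import math


def instances_needed(in_flight: int, max_concurrency: int,
                     target_utilization: float) -> int:
    """Simplified model (not Cloud Run's actual algorithm): keep every instance
    at or below target_utilization of its maximum concurrency."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    if in_flight <= 0:
        return 0
    per_instance = max_concurrency * target_utilization
    return math.ceil(in_flight / per_instance)


for target in (0.6, 0.8, 0.9):
    print(target, instances_needed(in_flight=20, max_concurrency=24,
                                   target_utilization=target))
```

With a maximum concurrency of 24 and 20 in-flight requests, this model calls for two instances at a 60% or 80% target but only one at 90%. That second instance is exactly the cold start a higher threshold avoids.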

Scaling & Reliability Strategies

Sometimes the best way to handle a cold start is to avoid it entirely or manage it proactively.

The Single-Region "Always-On" Tradeoff

If you are deploying globally, the cost of keeping minimum instances set to 1 in every region adds up. Instead, consider an 'always-on' service in just one region. A 100ms global network delay is a much better user experience than a 20s local cold start.

The 15-Minute Grace Period: A common question is 'How long will my instance stay warm after a request?' Cloud Run generally keeps instances alive for 15 minutes after they become idle (processing zero requests). If your traffic is predictable and arrives every 10–12 minutes, you might not even need an 'always-on' service; the platform’s default shutdown policy will keep a warm instance ready for your next user.

Note: While this idle time is "free" for standard request-based services, remember that GPU services require instance-based billing, so you will be billed for the duration the instance remains warm between requests.
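A quick way to reason about the grace period is to replay request arrival times against an idle timeout. The Python sketch below assumes a single instance, instantaneous requests, and an instance that is always kept for exactly 15 idle minutes. The platform only says it generally keeps instances around that long, so real results can be worse.

```python
def count_cold_starts(arrivals_min: list[float], idle_timeout_min: float = 15.0) -> int:
    """Count cold starts for one instance with no minimum instances configured.

    Hypothetical model: requests take no time, and an idle instance is shut
    down exactly idle_timeout_min after the last request it served.
    """
    cold_starts = 0
    last = None
    for t in sorted(arrivals_min):
        if last is None or t - last > idle_timeout_min:
            cold_starts += 1  # no warm instance left, so this request pays the cold start
        last = t
    return cold_starts


print(count_cold_starts([0, 12, 24, 36]))  # 1: every gap fits inside the 15-minute window
print(count_cold_starts([0, 20, 40, 60]))  # 4: every gap exceeds it
```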

The "Wake-Up Call" Strategy

Sometimes the best way to handle a cold start is to proactively mask it. If your UI can predict an upcoming request, for example, when a user clicks "New Chat" or begins hovering over a text area, you can send a lightweight health check to your service immediately. By the time the user finishes typing their prompt, the first two phases of the cold start (Infrastructure Provisioning and Container Image Streaming) are already finished in the background.

Pro-Tip: Use Non-Inference Endpoints. To make this "wake-up call" as fast as possible, always use a non-inference endpoint rather than sending a dummy prompt like "hi".

Why it’s faster: Non-inference endpoints (like /v1/models for vLLM or /api/tags for Ollama) are handled by the container’s web server the moment it starts. They don’t have to wait for the slow "Phase 4" model loading and VRAM transfer to complete before sending a success response.
No Chat Pollution: Because these endpoints don't trigger the model's completion logic, they won't interfere with the user's actual chat history or accidentally trigger session creation in your backend.

Recommended Endpoints:

vLLM: GET /health or GET /v1/models
Ollama: GET /api/tags or GET /api/version
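A minimal Python sketch of the wake-up call follows, for example sent from a backend that receives the "New Chat" event. It assumes the service accepts the request without an Authorization header (add an identity token yourself if the service requires authentication) and uses only the endpoints listed above. The WARMUP_PATHS mapping and warm_up function are hypothetical names. In a real UI you would fire this in the background so the user never waits on it.

```python
import urllib.request

# Non-inference endpoints from the list above (hypothetical convenience mapping).
WARMUP_PATHS = {
    "vllm": "/health",
    "ollama": "/api/version",
}


def warm_up(base_url: str, engine: str, timeout_s: float = 10.0) -> bool:
    """Send a lightweight GET so the service starts provisioning an instance.

    Returns True on a 2xx response and False on any network or HTTP error.
    """
    url = base_url.rstrip("/") + WARMUP_PATHS[engine]
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError, and timeouts all subclass OSError
        return False


# Example (hypothetical service URL):
# warm_up("https://my-llm-service.example.run.app", "vllm")
```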

Tune Startup Probes for VRAM

AI models take significant time to move gigabytes of weights from storage into GPU memory (Phase 4). If your startup check fails too many times, Cloud Run will assume your container is broken and kill it.

To prevent this:

Increase the Failure Threshold: Use a high failureThreshold (e.g., 60 or more). Since the total allowed startup time is failureThreshold × periodSeconds, a threshold of 60 with a 5-second period gives your model a healthy 5-minute window to load.
Utilize the 30-Minute Maximum: While standard services are limited to 4 minutes, Cloud Run supports a total startup time of up to 30 minutes (1,800 seconds) for intensive workloads.
Avoid False Positives (The Ollama Fix): Be careful with engines like Ollama, which may open a TCP port as soon as the service starts, but before the model is actually in VRAM. Always preload models in the container's entrypoint script so that the startup probe only passes once the model is truly ready for inference.
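The failureThreshold × periodSeconds arithmetic is easy to get wrong, so here is a small Python sketch that derives a threshold from a measured load time. The 1.5× safety factor is an assumption, and the 1,800-second cap is the maximum quoted above. The returned keys match the startupProbe field names, but the helper itself is hypothetical.

```python
import math

MAX_STARTUP_SECONDS = 1800  # 30-minute maximum startup time quoted above


def startup_probe_settings(expected_load_s: float, period_s: int = 5,
                           safety_factor: float = 1.5) -> dict:
    """Choose failureThreshold so failureThreshold * periodSeconds covers the
    expected model load time plus headroom (safety_factor is an assumption)."""
    budget_s = expected_load_s * safety_factor
    if budget_s > MAX_STARTUP_SECONDS:
        raise ValueError(
            f"{budget_s:.0f}s exceeds the {MAX_STARTUP_SECONDS}s startup limit; "
            "shrink or quantize the model, or speed up weight loading")
    return {"periodSeconds": period_s,
            "failureThreshold": math.ceil(budget_s / period_s)}


print(startup_probe_settings(expected_load_s=200))
# {'periodSeconds': 5, 'failureThreshold': 60}
```

A model that takes about 200 seconds to load ends up with the 60 × 5-second, 5-minute window described above.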

Lessons from Elastic’s strategy

In our NEXT ‘26 session, Ajay Nair highlighted three architectural decisions that allowed Elastic to treat GPUs as fungible compute, rather than infrastructure to manage:

Bypass the Compilation Tax: By setting enforce_eager=True in vLLM, they traded a tiny bit of throughput for cold starts that finish in less than a minute rather than multiple minutes.
Standalone Checkpoints: They avoided the latency of runtime adapter-switching by pre-merging each LoRA variant into a standalone checkpoint.
One Workload, One Service: Each independently scalable workload (defined by model, task adapter, and traffic shape) is deployed as its own Cloud Run service. This produces 30+ services across ~15 model families, with some models split by task (e.g., v5 retrieval vs. clustering) or by query/passage role.

Ready to get started?

Optimizing the cold start process is the difference between a hobby project and a production-ready application. The best part? Cloud Run handles the NVIDIA driver and CUDA installation for you, starting the instance in about 5 seconds.

For a deeper dive, the official documentation is your best friend:

For the full technical breakdown, I highly recommend watching the recording of the session from Google Cloud Next '26. It provides the most comprehensive blueprint for hosting high-performance open models on serverless infrastructure.

Happy building!

Special thanks to Sara Ford and Shane Ouchi from the Cloud Run team and to Zac Li from Elastic for the helpful review and feedback on this article.

Shipping features to production just got easier with new feature flags in AppLifecycle Manager

Thu, 21 May 2026 16:00:00 +0000

Many development teams are familiar with the hesitation that comes right before pushing a new feature live. As AI helps developers write code faster, the gap between rapid code generation and safe production deployment continues to grow.

Feature flags offer a practical way to manage this risk by separating the act of deploying code from the act of releasing a feature to users. Instead of a single, high-risk launch event that affects all users simultaneously, teams can ship code to production with new features hidden by default in a controlled manner.

To help teams adopt this workflow, we are announcing the public preview of AppLifecycle Manager Feature Flags (ALM FF). This service provides a rule-based solution to manage software behavior across Google Cloud, helping you support rapid development without sacrificing production stability.

Read on to learn four ways these feature flags will help accelerate your deployment.

1. Decouple for safety and velocity

The core mission of ALM FF is to increase development velocity by decoupling your feature releases from your code deployments. Traditionally, releasing a feature required a binary deployment: a high-risk event that affected all users simultaneously.

With ALM FF, you can ship code to production with new features disabled by default. This allows your team to move faster, deploying code continuously while choosing the exact moment to enable a feature via a toggle. If an issue is detected, the flag acts as an instant kill switch, disabling the problematic feature immediately without the need for a full, time-consuming code rollback.
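The deploy/release split can be illustrated with a hypothetical in-memory flag store in Python. This is not the ALM FF API. With ALM FF you would instead evaluate the flag through an OpenFeature client backed by the service, but the control flow is the same: both code paths ship in the binary, and the flag value alone decides which one runs.

```python
from dataclasses import dataclass, field


@dataclass
class FlagStore:
    """Hypothetical in-memory stand-in for a managed flag service."""
    flags: dict[str, bool] = field(default_factory=dict)

    def is_enabled(self, key: str, default: bool = False) -> bool:
        # Unknown flags fall back to the default, so new code ships disabled.
        return self.flags.get(key, default)

    def set(self, key: str, enabled: bool) -> None:
        self.flags[key] = enabled


def checkout(store: FlagStore) -> str:
    # Both implementations are deployed; the flag picks one at request time.
    if store.is_enabled("new-checkout"):
        return "new checkout flow"
    return "legacy checkout flow"


store = FlagStore()
print(checkout(store))            # legacy checkout flow (flag not defined yet)
store.set("new-checkout", True)   # release: flip the flag, no redeploy
print(checkout(store))            # new checkout flow
store.set("new-checkout", False)  # kill switch: instant rollback
print(checkout(store))            # legacy checkout flow
```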

2. Gradual enablement with precise targeting

Safety is about precision. ALM FF leverages the Common Expression Language (CEL) to implement sophisticated logic for gradual feature enablement.

Percentage feature enablement: Instead of a global launch, you can ramp up a feature to 1%, 5%, or 50% of your traffic. This allows you to monitor system health and performance metrics incrementally, ensuring stability before reaching your entire user base.
Precise allowlisting: You can target specific internal teams, beta testers, or early-access customers by allowlisting their identifiers. This ensures that only the intended audience sees a feature during its initial validation phase.
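ALM FF expresses these rules in CEL. The Python sketch below is neither CEL nor ALM FF's bucketing algorithm. It only illustrates the two ideas, assuming each request carries a stable user identifier. Hashing the flag key together with the user ID gives every user a fixed bucket from 0 to 99, so ramping from 5% to 50% keeps the original 5% enabled, and an allowlist bypasses the percentage entirely. The flag key and user IDs are hypothetical.

```python
import hashlib


def rollout_bucket(flag_key: str, user_id: str) -> int:
    """Map a user to a stable bucket in [0, 100) for a given flag."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled_for(flag_key: str, user_id: str, percentage: int,
                   allowlist: frozenset = frozenset()) -> bool:
    """Allowlisted users always get the feature; everyone else by stable percentage."""
    if user_id in allowlist:
        return True
    return rollout_bucket(flag_key, user_id) < percentage


users = [f"user-{n}" for n in range(10_000)]
at_5 = {u for u in users if is_enabled_for("new-search", u, 5)}
at_50 = {u for u in users if is_enabled_for("new-search", u, 50)}
print(len(at_5), len(at_50))
print(at_5 <= at_50)  # True: the 5% cohort is always inside the 50% cohort
print(is_enabled_for("new-search", "beta-tester-1", 0,
                     allowlist=frozenset({"beta-tester-1"})))  # True
```

Because the bucket depends only on the flag key and user ID, a user's experience does not flip between requests while the percentage stays the same.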

3. Dynamic configuration for the AI era

Beyond simple toggles, ALM FF offers a dynamic way to inject configuration into your applications. By using string-type flags, you can update application behavior, such as system prompts for LLM integrations, in real time. This allows product managers and business owners to tweak AI responses and application logic without requiring any code changes or infrastructure rollouts.
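On the application side, a string flag should always have a safe fallback. This Python sketch assumes the flag values have already been fetched into a dictionary (for instance from an OpenFeature client). The flag key and default prompt are hypothetical. Falling back on missing, blank, or mistyped values keeps a bad flag edit from producing an empty system prompt.

```python
DEFAULT_SYSTEM_PROMPT = "You are a concise, helpful support assistant."  # hypothetical


def resolve_system_prompt(flag_values: dict,
                          key: str = "support-bot-system-prompt") -> str:
    """Return the string flag value, or the compiled-in default if the flag is
    missing, blank, or not a string (for example, misconfigured as a boolean)."""
    value = flag_values.get(key)
    if isinstance(value, str) and value.strip():
        return value
    return DEFAULT_SYSTEM_PROMPT


print(resolve_system_prompt({"support-bot-system-prompt": "Answer in formal English."}))
print(resolve_system_prompt({"support-bot-system-prompt": True}))  # falls back to the default
```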

4. Built on open standards

We believe safety should not mean lock-in. ALM FF is built on the OpenFeature standard, utilizing industry-standard SDKs and the flagd evaluation engine. This ensures your feature management patterns are portable and follow best practices without adding Google-specific dependencies to your core application code.

Get started

ALM FF is now in public preview. To take control of your releases, you can:

Review the docs: Public Documentation
Onboard today: Quickstart Guide
Give us feedback: Help us shape the future of feature management

Securing Your Gemini and Google API Keys

Thu, 21 May 2026 10:19:00 +0000

Today, AI services rely heavily on API keys. To run AI agents, users provide API keys that are tied to paid tokens, subscriptions, or paid accounts. While API keys are easy to use, they are just as easy to use unsafely. A hijacked key leaves your environment open to misuse and abuse by whoever holds it.

I decided to write this blog post after seeing a thread in the r/googlecloud subreddit asking for a tutorial on how users can protect themselves. In this post, you will find a few simple steps you can take to reduce your risk and improve the security of API keys created by Google.

You use Google API keys to access Gemini and other Google AI products, as well as Google Cloud APIs. In fact, a Gemini API key is a standard Google API key behind the scenes. While I will focus on Google API key security, you can apply some of these recommendations to API keys and product tokens created elsewhere.

Step 1: Generate a New API Key

Regardless of where you start, you end up creating a new API key in one of your Google Cloud projects. You will probably use the Credentials page under the "APIs & Services" menu in the Cloud console.

Alternatively, you may use the gcloud services api-keys create command, or some other interface that creates a new Google Cloud API key. Regardless of the path and the interface, you need to do the following:

Create the key in a stand alone project that is not used for any other purpose.
Restrict API access and client applications for the new API key.

These steps limit the potential reach of the key and greatly simplify troubleshooting activities if something goes wrong.

API Restrictions

API restrictions define exactly which services can be accessed using a specific API key. To keep your environment secure, always limit this list to the absolute minimum set of services required. While the Google Cloud console now prevents the creation of entirely unrestricted keys, it can still be tempting to add extra APIs to "future-proof" or speed up development. However, we strongly advise against this. By strictly adhering to the principle of least privilege, you significantly reduce the potential damage (or "blast radius") if a key is ever accidentally exposed or hijacked.

It is also important to audit keys generated automatically through integrated developer tools. For example, an API key created in Firebase is restricted to 24 APIs, including Datastore, Firestore, Cloud SQL Admin, and others.

If you use Firebase to host your website, you probably will not use most of them. When you create an API key to use with AI Studio, restrict it to only the "Gemini API".
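To audit existing keys for overly broad API restrictions, you can inspect the JSON from gcloud services api-keys list --format=json. The Python sketch below assumes each entry follows the API Keys v2 Key resource, where allowed APIs appear under restrictions.apiTargets[].service. The three-API threshold and the sample keys are hypothetical.

```python
import json

MAX_API_TARGETS = 3  # hypothetical policy threshold; choose one that fits your org


def audit_api_targets(keys: list[dict]) -> list[str]:
    """Return findings for keys with no API restrictions or too many allowed APIs."""
    findings = []
    for key in keys:
        label = key.get("displayName") or key.get("name", "<unnamed>")
        targets = key.get("restrictions", {}).get("apiTargets", [])
        services = sorted(t.get("service", "?") for t in targets)
        if not services:
            findings.append(f"{label}: no API restrictions")
        elif len(services) > MAX_API_TARGETS:
            findings.append(f"{label}: {len(services)} APIs allowed ({', '.join(services)})")
    return findings


# In practice, load the output of: gcloud services api-keys list --format=json
sample = json.loads("""[
  {"displayName": "ai-studio-key",
   "restrictions": {"apiTargets": [{"service": "generativelanguage.googleapis.com"}]}},
  {"displayName": "old-firebase-key",
   "restrictions": {"apiTargets": [{"service": "firestore.googleapis.com"},
                                   {"service": "datastore.googleapis.com"},
                                   {"service": "sqladmin.googleapis.com"},
                                   {"service": "identitytoolkit.googleapis.com"}]}},
  {"displayName": "unrestricted-key"}
]""")

for finding in audit_api_targets(sample):
    print(finding)
```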

Attention points:

If you search for an API that you want to select but it is missing, this API is probably not enabled in the Google Cloud project that you use. Go to the API Library in your Cloud console, find the API by name and enable it first.
You can perform all actions using the Cloud console or the gcloud CLI. Other interfaces (e.g., Firebase) may not give you access to all API key parameters.

Application Restrictions

Similar to API restrictions, which limit the services your key can access, application restrictions limit the applications that can use the key. For example, if you create an API key only for use with Google AI Studio, restricting it to the website "https://aistudio.google.com/" prevents automated scripts from using your key to consume a high volume of Gemini tokens at scale.

You can set up one or more restrictions of one of the following types:

Websites and web applications, using a list of URLs
Servers and services, using a list of IPv4 or IPv6 addresses or subnets
iOS applications, using a list of bundle IDs
Android applications, using a list of package name and SHA-1 certificate fingerprint pairs

Note that you can restrict the key to a single application type only. Create a designated API key for each application type. Having a key per application type helps when observing the key usage and investigating potentially compromised keys.
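Application restrictions appear in the same JSON. In the API Keys v2 Key resource, each restriction type has its own field under restrictions, and a key can use at most one of them. The Python sketch below relies on that assumption to report which type a key uses and to list keys that have none. The sample data is hypothetical.

```python
from typing import Optional

# One field per application restriction type under the Key resource's restrictions.
APP_RESTRICTION_FIELDS = (
    "browserKeyRestrictions",  # websites (allowed referrers)
    "serverKeyRestrictions",   # IP addresses
    "androidKeyRestrictions",  # package name + SHA-1 certificate fingerprint
    "iosKeyRestrictions",      # bundle IDs
)


def application_restriction_type(key: dict) -> Optional[str]:
    """Return the application restriction field a key uses, or None if it has none."""
    restrictions = key.get("restrictions", {})
    for field_name in APP_RESTRICTION_FIELDS:
        if restrictions.get(field_name):
            return field_name
    return None


def keys_without_app_restrictions(keys: list[dict]) -> list[str]:
    """Display names (or resource names) of keys usable from any application."""
    return [
        k.get("displayName") or k.get("name", "<unnamed>")
        for k in keys
        if application_restriction_type(k) is None
    ]


sample = [
    {"displayName": "ai-studio-key",
     "restrictions": {"browserKeyRestrictions": {
         "allowedReferrers": ["https://aistudio.google.com/*"]}}},
    {"displayName": "backend-key",
     "restrictions": {"apiTargets": [{"service": "generativelanguage.googleapis.com"}]}},
]
print(keys_without_app_restrictions(sample))  # ['backend-key']
```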

Step 2: Store the API key

I want to reiterate that the API key is not paired with your identity. ANYONE can use it. So, storing the key securely is as important as restricting the key use in Step 1.

The rule is simple: NEVER EVER store the key where it can be easily seen.

If you use an API key in your application, store it in Secret Manager or a similar secret management service. Secret Manager lets you inject your API key into Cloud Run and GKE environments easily. However, for stronger protection, you may want to read the key in your code instead. See the documentation for an example.
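For reading the key in code, the following Python sketch uses the google-cloud-secret-manager client library. The project and secret names are hypothetical, and the calling identity is assumed to have the Secret Manager Secret Accessor role on the secret. Treat the official documentation as the authoritative example.

```python
def secret_version_name(project_id: str, secret_id: str, version: str = "latest") -> str:
    """Build the Secret Manager resource name for a secret version."""
    return f"projects/{project_id}/secrets/{secret_id}/versions/{version}"


def read_api_key(project_id: str, secret_id: str) -> str:
    """Fetch the API key at runtime instead of exposing it as an environment variable.

    Requires the google-cloud-secret-manager package and an identity with the
    Secret Manager Secret Accessor role on the secret.
    """
    # Imported lazily so the name helper above works without the client installed.
    from google.cloud import secretmanager

    client = secretmanager.SecretManagerServiceClient()
    response = client.access_secret_version(
        request={"name": secret_version_name(project_id, secret_id)}
    )
    return response.payload.data.decode("utf-8")


# Example (hypothetical names): api_key = read_api_key("my-project", "gemini-api-key")
```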

If you use an API key with an external application that asks you to type in the key, take extra steps to understand how the application manages your key: how the key is stored and how it is used in requests. For web applications, you can use browser developer tools to inspect application traffic and ensure that the key is never sent over an unencrypted channel. For example, Google AI Studio uses encrypted local storage and sends the key over a TLS-encrypted channel.

If Something Goes Wrong

What should you do if you suspect that your key is compromised? The first action is the same as with a credit card: delete the key. You can do this in the Cloud console or with the gcloud services api-keys delete command. If it turns out to be a false alarm, you can undelete the key within 30 days.

What if you do not know which key is compromised? In that case, you need a two-step investigation:

Find out all API keys in your organization or project(s)
Check the API consumption graphs for the APIs each key can access

Find out all your API keys

There is more than one way to find your API key resources. You can use Asset Inventory in the Cloud console and filter the dashboard by the Resource type to check apikeys.Key. If you do not see this resource type, find and click on "View more…" to expand the resource type list. Note that the list shows deleted API keys as well.

If you prefer the CLI and know which project(s) to check, you can use the gcloud services api-keys list command.

To see all active keys in your organization, you will need to use the gcloud asset search-all-resources command and query its JSON output to filter out deleted keys:

```shell
gcloud asset search-all-resources \
  --scope='organizations/123456789012' \
  --asset-types='apikeys.googleapis.com/Key' \
  --read-mask="name,displayName,versionedResources" \
  --format=json \
  --order-by='createTime' \
| jq '.[] | select(.versionedResources | all(.resource.data.deleteTime == null))'
```

Find out API consumption

You can track API key usage with the Cloud Monitoring metric serviceruntime.googleapis.com/api/request_count. This metric shows the number of times different services have been invoked. To see the number of service requests for a particular API key, use the metric's credential_id label and filter it by the API key's unique ID. You can view the metric data in Metrics Explorer or query the Monitoring API with the following PromQL expression:

```promql
sum(
  rate({
    "__name__"="serviceruntime.googleapis.com/api/request_count",
    "monitored_resource"="consumed_api",
    "credential_id"="apikey:00000000-0000-0000-0000-000000000000"
  }[${__interval}])
)
```

You can further filter this metric by the service_name label using the API name (e.g., mapstools.googleapis.com).

To find the API key ID, use one of the following methods:

Using the Cloud console, open the Credentials page and select the API key that you want. Inspect the URL of the API key page in the browser, which will look like: https://console.cloud.google.com/apis/credentials/key/[KEY_ID]?project=[PROJECT_ID]. Copy the [KEY_ID] part.
Using the gcloud CLI, run the gcloud services api-keys list --format='value(displayName,uid)' command and find the key by its display name. Copy the UID next to the display name.
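Both methods give you the ID that goes into the credential_id filter. The Python sketch below extracts the ID from a console URL and builds a PromQL query in the same shape as the one above, with a fixed rate window in place of the ${__interval} variable. The helper names and sample URL are hypothetical.

```python
from urllib.parse import urlparse


def key_id_from_console_url(url: str) -> str:
    """Extract [KEY_ID] from .../apis/credentials/key/[KEY_ID]?project=[PROJECT_ID]."""
    parts = urlparse(url).path.rstrip("/").split("/")
    if len(parts) < 2 or parts[-2] != "key":
        raise ValueError(f"not an API key page URL: {url}")
    return parts[-1]


def credential_id_label(key_id: str) -> str:
    """Value for the credential_id label filter, e.g. apikey:<KEY_ID>."""
    return f"apikey:{key_id}"


def request_rate_query(key_id: str, window: str = "5m") -> str:
    """PromQL request rate for one API key, using a fixed rate window."""
    return (
        'sum(rate({"__name__"="serviceruntime.googleapis.com/api/request_count",'
        '"monitored_resource"="consumed_api",'
        f'"credential_id"="{credential_id_label(key_id)}"}}[{window}]))'
    )


url = ("https://console.cloud.google.com/apis/credentials/key/"
       "00000000-0000-0000-0000-000000000000?project=my-project")
print(request_rate_query(key_id_from_console_url(url)))
```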

An abnormally high level of API invocations usually indicates that the API key was compromised and is being used by a malicious party.
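What counts as "abnormally high" depends on your traffic, but a simple heuristic is to compare each new data point with its recent baseline. The Python sketch below flags counts above the trailing mean plus k standard deviations. It assumes you have exported per-key request counts (for example, hourly values from Metrics Explorer or the Monitoring API), and the window and k values are arbitrary starting points. For continuous monitoring, a Cloud Monitoring alerting policy is the more robust option.

```python
from statistics import mean, stdev


def flag_spikes(counts: list[float], baseline_len: int = 24, k: float = 4.0,
                min_stdev: float = 1.0) -> list[int]:
    """Return indices where a count exceeds mean + k * stdev of the preceding window.

    min_stdev keeps a perfectly flat baseline from flagging every tiny uptick.
    """
    if baseline_len < 2:
        raise ValueError("baseline_len must be at least 2 to compute a stdev")
    spikes = []
    for i in range(baseline_len, len(counts)):
        window = counts[i - baseline_len:i]
        threshold = mean(window) + k * max(stdev(window), min_stdev)
        if counts[i] > threshold:
            spikes.append(i)
    return spikes


# Hypothetical hourly request counts: a steady day, then a sudden burst.
hourly = [100 + (h % 5) for h in range(24)] + [103, 5000]
print(flag_spikes(hourly))  # [25]
```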

Step 3: API key management hygiene

Whether you are an engineer, an experienced cloud user, or just came to experiment, proper API key hygiene is important to keep your environment from being hijacked.

If you already use Google API keys do the following right now:

Find out all API keys that you have
Delete all keys that you no longer use or do not recognize (do not worry, you can restore them within 30 days)
Restrict API keys to only the APIs that you intend to use. Narrow the list of clients that can use the APIs if you can
If you administer your Google Cloud projects or organization, consider setting up the apikeys.googleapis.com/Key org policy to minimize wrangling API keys
Consider periodically rotating (refreshing) your API keys by replacing them with newly created ones that share the exact same restrictions. Be careful to track down and update every place where the existing key is used before deleting it, so you don't unexpectedly break your application or lose access to a service.
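For periodic rotation, you can flag keys past a chosen age using the createTime field in the output of gcloud services api-keys list --format=json. This Python sketch assumes createTime and deleteTime are RFC 3339 UTC timestamps, as in the API Keys v2 Key resource, and truncates them to whole seconds. The 90-day policy and sample keys are hypothetical.

```python
from datetime import datetime, timezone


def parse_utc_timestamp(ts: str) -> datetime:
    """Parse '2025-01-31T12:00:00.123456789Z'-style UTC timestamps to whole seconds."""
    return datetime.strptime(ts[:19], "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def keys_due_for_rotation(keys: list[dict], max_age_days: int,
                          now: datetime) -> list[str]:
    """Names of keys that are not deleted and are older than max_age_days."""
    due = []
    for key in keys:
        if key.get("deleteTime"):
            continue  # already deleted; restorable, but not in active use
        age = now - parse_utc_timestamp(key["createTime"])
        if age.days > max_age_days:
            due.append(key.get("displayName") or key.get("name", "<unnamed>"))
    return due


# In real use, pass now=datetime.now(timezone.utc); a fixed date keeps the example stable.
now = datetime(2026, 6, 1, tzinfo=timezone.utc)
keys = [
    {"displayName": "prod-gemini", "createTime": "2025-01-15T09:30:00.123456Z"},
    {"displayName": "new-ai-studio", "createTime": "2026-05-01T00:00:00Z"},
    {"displayName": "retired", "createTime": "2024-01-01T00:00:00Z",
     "deleteTime": "2026-05-20T00:00:00Z"},
]
print(keys_due_for_rotation(keys, max_age_days=90, now=now))  # ['prod-gemini']
```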

Wrapping up

Securing API keys is a vital step in protecting your cloud ecosystem. Implementing strict API and application restrictions, utilizing secure storage, and proactively monitoring consumption are highly effective ways to prevent unauthorized access. These practices safeguard your development environment from exploitation and prevent unexpected billing charges.

To help you implement these practices, here are a few practical tools and resources you can explore next:

Check more about APIs: Review Best practices for managing API keys and practice Search for and use Google APIs.
Watch a quick tutorial: Check out this great Google Cloud Tech video on Manage your Cloud Run secrets securely with Secret Manager to see secure storage concepts in action.
Get hands-on with a Codelab: Practice fetching credentials safely in a guided environment by trying Secret Manager with Python or with Spring Boot codelabs.

Dive deeper into the docs: Learn how to select metrics, create charts, and set up alerts to observe your API consumption.

What Google I/O '26 means for developing agents on Google Cloud

Tue, 19 May 2026 17:45:00 +0000

At Google I/O, we introduced a unified development toolkit featuring Antigravity 2.0 and the Managed Agents API, giving developers better ways to build locally and deploy securely to the cloud on a shared protocol layer. In this blog, we’re going to show you how Gemini Enterprise Agent Platform and the new developer tools shared at I/O fit together, unpack the spectrum of choice for building, and share what we’d actually try first.

Following the evolution of Vertex AI into the Gemini Enterprise Agent Platform – a comprehensive platform to build, scale, govern, and optimize agents with new features like session memory and centralized governance – we are now extending these capabilities directly into your local development tools. Our goal is to bridge the gap between high-speed prototyping and secure, compliant corporate deployment, offering a modular approach where you can choose between quick-start workflows or full production control to fit your stack's specific needs.

Here’s how those pieces now lay out across the entire spectrum of choice.

The four rungs: The spectrum of how to build agents

We like to think of the agent development ecosystem as four rungs on a ladder, designed to give you a clear slider between out-of-the-box configuration and complete code-first control. They're deliberately additive, meaning that starting fast on the lower rungs never locks you out of graduating to the deeper customization of the higher rungs.

Underneath all four rungs is the A2A protocol. This interoperability ensures that an agent built on the first rung can be called as a sub-agent on the fourth rung, allowing your entire architecture to scale seamlessly on the same infrastructure.

Rung one: Agent Studio (low code)

A visual workspace inside Agent Platform. You discover models in Model Garden, engineer prompts, wire up tools, and ship an agent without writing code. Best for business-facing teams and rapid prototyping. The agent you build here runs on the exact same runtime as everything below it.

Rung two: Managed Agents API

New at I/O, the Managed Agents API is for technical teams who want to “manage the mission, not the machine." It allows you to define agentic behavior and let Google Cloud handle the heavy lifting, acting as an agent-as-a-service with nothing to manage.

You use the Managed Agents API to configure your agent, and the Interactions API to invoke it. You package your instructions, skills, and tools, POST them, and Gemini builds and runs the agent.

What makes this deployable is the Google Cloud sandbox, which is secure by design. The agent harness runs on our servers, and each agent has its own ephemeral sandbox provisioned with your skills, Model Context Protocol (MCP) servers, and server-side tools. Full integration with A2A and Agent Platform governance and security are coming soon.

Rung three: Antigravity and friends

Antigravity is our primary solution for developers looking to leverage AI for coding tasks and agent orchestration, enabling teams to transform how apps are built and deployed. We've consolidated our developer-facing coding strategy into this single, powerful harness shared across multiple surfaces.

It’s co-optimized with the Gemini family of models, offering high efficiency to speed up development cycles and reduce costs. Skills you develop with Antigravity are intended to be portable across different surfaces.

This is for development teams who want to utilize Google's advanced reasoning capabilities within their coding workflows, implement custom development loops, and transform how they build, deploy, and manage applications.

Today, we are expanding this with new tools:

Antigravity 2.0: A new standalone desktop application providing a centralized workspace to steer, customize, and orchestrate coding agents. Developers can use it to manage complex tasks, such as orchestrating agents to refactor code, generate unit tests, or even scaffold new service components based on a specification. Agents can spin up subagents from a single prompt, and multi-agent orchestration allows tasks to run in parallel.
Antigravity CLI: This brings the full Antigravity experience to the command line: same harness, same agent, same quality of intelligence as Antigravity 2.0, with a product experience tailored for the terminal. It's optimized for speed and lower overhead, and adapts entirely to you. The CLI is tightly integrated with the desktop app, sharing authentication, context, skills, and configurations, providing a consistent experience across both interfaces. Use the Antigravity SDK to build your own runtime.
Enterprise security and compliance: Google Cloud customers can now use Antigravity 2.0 and Antigravity CLI with their Gemini Enterprise Agent Platform project. All you have to do is log in with Cloud OAuth and set your Agent Platform project ID and region. All agent inference then runs via Agent Platform models within your secure cloud boundary, inheriting Google Cloud’s standard data privacy protections and Terms of Service. This keeps your customer data in your control and lets you use regional model endpoints.

Integrating other coding agents

While Antigravity is our recommended agentic coding solution, Google Cloud is designed to work well with any coding agent you choose. Our platform is open, and we provide tools to ensure flexibility:

The Agent CLI and Agent Development Kit (ADK) allow you to build and interact with agents from various sources, including tools like Claude Code. This means developers can often keep their preferred interfaces while running the underlying AI inference on Google Cloud. This approach ensures your workflows benefit from Google Cloud's security, compliance, and infrastructure.
Our Skills for Google products, launched at Next, are designed to be compatible with multiple coding tools, enabling you to enhance different agents with a consistent set of capabilities.

This flexibility allows teams to integrate their existing favorite tools and models, ensuring seamless and compliant operation within their established workflows.

Rung four: Agent Development Kit (ADK 2.0)

Code-first, low floor, high ceiling. If Managed Agents are configuration-first, ADK is engineering-first. This is for software engineers who want to build custom agent meshes from the ground up - any architecture, any model, unconstrained.

ADK enhancements launched at Google Cloud Next are now available for everyone. It introduces a unified graph-based engine that gives you a slider from dynamic, model-led reasoning to strict, deterministic workflows. The framework handles the heavy lifting of multi-agent coordination, managing how sub-agents, tools, and data pass between one another.

Collaborative workflows (Python v2.0.0): Previously called the Task-based Agent Collaboration API, this is how you build self-managing agent teams. A coordinator delegates to subagents using explicit operating modes:
- chat: Full user interaction and manual return to the parent. This is “hand off the conversation to sub-agents”.
- task: User interaction for clarifications and automatic return to the parent. This new “collaborate for this assignment” mode combines the best of the other two.
- single-turn: No user interaction, parallel execution, and automatic return. This is “agent as tool”.
Dynamic workflows: Dynamic workflows in ADK allow you to put aside graph-based path structures and use the full power of your chosen programming language to build workflows. With Dynamic workflows, you can create workflows with simple decorators, invoke workflow nodes as functions, and build complex routing logic.
ADK Kotlin (Beta): "ADK for Android." Kotlin support joins Python, Go, and Java, increasing language coverage so your on-device mobile agents can seamlessly coordinate with your backend Python agents.

Finally, the Agents CLI packages Google's expert skills for ADK, eval, deploy, observability, and publishing - turning any AI coding agent (like Antigravity, Gemini CLI, Claude Code, or Cursor) into an expert at agent app building as well as agent ops. It gives your AI Agent skills to understand the Google Cloud agent stack, turning an expansive ecosystem into a seamless assembly line for developers hillclimbing their agent builds.

What we'd actually try first

If we were starting today, here's the order we'd reach for things:

Start with the Antigravity 2.0 desktop app: Explore the interface, add a pre-built agent, and interact with it to understand the core functionality. This provides a more intuitive entry point before diving into API specifics.
Build a mesh: Feel free to explore the Managed Agents API through the Agents API skill and Interactions API skill. When you start hitting routing decisions you want to make explicit, or need complex multi-agent orchestration, port your logic to ADK 2.0. The graph model is worth the learning curve as soon as you have more than two branching paths. Don't worry about stringing together a bunch of separate pieces to make this happen - this is exactly where the Agents CLI shines.
Govern and reuse shared domain logic: Check out Skill Registry (public preview): A centralized catalog to govern and promote the reuse of packaged domain logic. Skills are accessible via the Managed Agents API, Agent Platform SDK, and ADK (via SkillToolset). Skill Registry will be part of Agent Registry shortly.
Evaluate: Use the Gemini Enterprise Agent Platform evaluation suite to move beyond basic text-matching vibe checks. Leverage synthetic user simulation to auto-generate multi-turn testing scenarios and safely mock API environments to pressure-test tool resilience. Finally, utilize its LLM-based autoraters and trace logging to evaluate complex logic, group failures, and continuously optimize your agent.
Secure the pipeline: Leverage Gemini Enterprise Agent Platform governance capabilities like Agent Identity, Agent Gateway, Agent Security, and Agent Registry to secure your deployment. Once CodeMender releases, add it to your CI/CD to proactively secure the code your human (and AI) developers are pushing.

Note: You can do this whole loop on a Google Cloud Starter Tier account without a billing account attached. First two app deployments are on us.

We’re excited and hope you are, too

The agent space is evolving rapidly. Agent Platform offers a secure and adaptable foundation. Core components like the Agent Gateway, identity management, and the Skill Registry work together to ensure a robust and controlled environment for your agents, enabling you to innovate flexibly without vendor lock-in.

Pick the rung that fits the project. Bring whatever coding agent your team prefers. The platform you graduate to is the same one either way, and the data stays inside your Cloud project the whole time.

If you only read one set of docs after this post, make it the Agents overview in the Agent Platform documentation. If you build something interesting, show us - the best examples will land in the next round of templates.

We can’t wait to see what you build!

Gemini Live Agent Challenge: Announcing the winners and highlights

Fri, 15 May 2026 16:00:00 +0000

The Gemini Live Agent Challenge is officially in the books! We challenged developers worldwide to break out of the traditional 'text box' paradigm by building next-generation AI agents. The challenge drew 11,878 participants and 1,536 submitted projects from 151 countries, and the results were nothing short of spectacular.

The mission was to seamlessly integrate multimodal capabilities, building agents that help you see, hear, speak, and create in real time using the Gemini Live API, the Agent Development Kit (ADK), and the robust infrastructure of Google Cloud. Participants pushed the boundaries of interactive AI across three distinct categories: The Live Agent, The Creative Storyteller, and The UI Navigator.

Congratulations to the builders who took home the top prizes! These winning teams combined technical precision with bold imagination, completely redefining how users can interact with and experience agents. Two of these standout developers were even recognized in person at Google Cloud Next 2026. Here’s a look at their experience, alongside the complete list of winning agents.

Celebrating our category winners at Google Cloud Next ’26

Category winners Jeremiah Somoine and Bryen Param were invited to attend Google Cloud Next 2026 in Las Vegas, where they shared their experiences and insights with the broader developer community. Both winners presented Lightning Talks at the Developer Theatre on the expo floor and sat down for exclusive interviews in the Creator Studio Pod at the GDE and Certified Lounge.

During his time at the event, Bryen discussed the core inspiration behind drone-copilot. He explained that his project was driven by the question of "what if a model could interact with the real world?", showcasing how multimodal capabilities can bridge the gap between AI and physical environments.

Jeremiah, currently a college student, reflected on the development process behind Sankofa, noting that "the best response to a technical limitation was a creative one." When asked what advice he would give to other students looking to build the next generation of AI applications, he emphasized the importance of jumping at any opportunity to get hands-on with the technology. "The best way to learn is by doing," he said, encouraging aspiring developers to simply dive in and start building.

Winners

Grand Prize winner: ORION - Operating Room Intelligent Orchestration Node
By: Aditya Shukla

ORION, or Operating Room Intelligent Orchestration Node, is a voice-directed surgical co-pilot for robotic surgery. Surgeons can speak naturally and instantly receive answers, live data on display, and real-time visual assistance, all without breaking scrub.

The Live Agent winner: drone-copilot
By: Bryen Param

Drone-copilot transforms how users interact with hardware by enabling natural, real-time conversations with a drone instead of using a joystick or complex menus. Simply by speaking, users can instruct the drone to navigate, perform autonomous visual inspections, or describe its surroundings, while the drone verbally responds and confirms its actions in real time.

Creative Storyteller winner: Sankofa
By: Jeremiah Somoine

Sankofa acts as a multimodal AI "griot"—a traditional West African storyteller—transforming fragmented family histories into deeply immersive narratives. Based on just a few user details, it weaves together rich voice narration, watercolor imagery, and ambient soundscapes into a historical story, allowing users to engage in a real-time voice conversation with the storyteller to explore their roots further.

UI Navigator winner: Moonwalk
By: Enaiho Uwas Paul and Aman Kumar Sah

Moonwalk is a conversational, hands-free desktop assistant that helps users intuitively navigate their computer and complete complex tasks using just their voice. By remembering personal preferences and past interactions, it acts as an intelligent co-pilot that can seamlessly control your mouse and keyboard to execute everyday workflows—like booking flights or managing spreadsheets—while you simply sit back and speak.

Best multimodal integration and user experience winner: Wand
By: David Li

Wand is a voice-first, pointer-aware browser assistant that helps you seamlessly navigate and interact with any website using a combination of natural speech and hand gestures. By simply pointing at your screen and speaking (for example, asking to "play this video" or "zoom in here"), this live agent helps you instantly execute clicks, searches, and commands without ever needing to touch a mouse or keyboard.

Best technical execution and agent architecture winner: JohnKeats.AI
By: Matthew Keats

JohnKeats.AI is a voice-first emotional companion designed to actively listen and hold space for users without rushing to offer solutions. By processing subtle vocal cues like pitch, pacing, and tone, it reacts naturally to a user's emotional state in real time to provide a deeply reflective and empathetic conversational experience.

Best innovation and thought leadership winner: Rayan Memory
By: Yusuf Elnady

Rayan Memory tackles the universal problem of forgetting by turning your daily learnings into a fully explorable 3D "memory palace." A background agent passively listens to your real-world audio to extract important ideas as physical artifacts, allowing you to walk through themed virtual rooms and converse with a dedicated AI companion to easily retrieve your exact memories.

Honorable mention: NagarDrishti
By: Nikita Dongre and Omkar Dongre

NagarDrishti tackles dangerous road conditions by allowing citizens to safely report potholes and waterlogging using a hands-free voice assistant while driving. These real-time reports instantly populate an interactive dashboard, where city officials can use natural language to easily identify hazard hotspots and manage critical repairs.

Honorable mention: Ekaette
By: Bassey John

Ekaette revolutionizes customer service by replacing frustrating hold queues with a conversational, multimodal AI assistant that operates across live phone calls and text messaging. Customers can speak naturally with the agent over a standard phone line while seamlessly sharing photos, reviewing product options, or completing payments via WhatsApp, c

Honorable mention: VibeCat
By: Sejun Kim and Michael Chang

VibeCat is a proactive macOS desktop companion that continuously watches your screen, understands your context, and suggests helpful actions before you even ask. Instead of waiting for a command, it speaks up first — like offering to fix a missing line of code or execute a terminal command — and completes the task only after receiving your permission.

Honorable mention: Call My Parts
By: Sugam Palav, Nikhil Lohar, Siddhant Panday, and Vishal Parekh

Call My Parts automates the tedious, time-consuming process of sourcing used vehicle parts by doing the research and vendor outreach for you. Users simply speak their part request, and the AI agent autonomously searches vendor websites, calls suppliers to check pricing and inventory, and compiles the best options into a ranked, easy-to-read dashboard.

Honorable mention: Relay
By: Faith Ogundimu

Relay is an interactive AI lab partner that uses your webcam to watch and guide your physical electronics projects in real time. It provides step-by-step voice instructions to help you build circuits, catches wiring mistakes before they happen, and reinforces your skills with a built-in 3D simulation sandbox and adaptive quizzes.

Keep the momentum going

Inspired by these incredible projects? Start building and stay connected with the community through our latest programs and events:

Join Gemini Enterprise Agent Ready (GEAR), designed to help developers and decision-makers build and deploy production-ready AI agents.
Catch up on Google Cloud Next 2026: We just wrapped up an amazing Google Cloud Next! If you weren't able to join us in person — or simply want to relive the energy — take a look at our social and livestream recaps to catch up on some of the exciting developer activations straight from the expo floor.
Tune in on Tuesdays: Want to be the first to hear about new tools, product updates, and upcoming hackathons? Join us for our weekly livestream every Tuesday 9:00 A.M. PDT / 12:00 P.M. EDT for the latest in all things Google Cloud.

Congratulations again to all of our winners and participants. We can't wait to see what you build next!

Ship code within minutes with the Gemini CLI DevOps Extension

Fri, 08 May 2026 19:00:00 +0000

With AI coding tools like Antigravity and Claude Code, I can build a working web app in record time. But deploying it? That's where I'd historically lose the rest of the afternoon to Dockerfiles, IAM bindings, and YAML. So I'd take the shortcut most developers take: I just wouldn't do it. The app would stay on my laptop, and my work would never ship.

This is the classic tension between the inner loop (the fast, local cycle of writing and testing code) and the outer loop (containerization, CI/CD pipelines, and production infrastructure). Most developers are productive in one but not the other, and the gap between them is where projects stall.

The Gemini CLI Extension for CI/CD bridges this gap. It handles both quick deployments and full pipeline generation from a single terminal interface. Let me show you how.

Building the Cosmic Guestbook App

To demonstrate this workflow, we need an app. Let's start from an empty directory and use our agent to "vibe code" a brand new project: the Cosmic Guestbook.

We want a full-stack architecture: a React frontend and a Node.js Express backend API. Instead of scaffolding this by hand, we can ask our agent to jumpstart the app:

```
"Build a 'Cosmic Guestbook' web app. I need a dynamic Node.js Express backend and a React frontend utilizing Vite. Make the frontend look like a beautiful, glassmorphic sci-fi interface."
```

Within moments, our agent scaffolds the backend/ directory with server.js and the frontend/ directory with a fully styled React app. We now have a functioning, two-tier web app sitting on our laptop.

Installing the Extension

But code on a laptop isn't shipping. To get this guestbook online, we need to equip our chosen environment with the CI/CD extension. Regardless of your setup, start by ensuring that you have the gcloud CLI installed and authenticate using Application Default Credentials: gcloud auth application-default login.

Now, install the extension in your preferred development environment:

For Gemini CLI

Run the following command directly in your terminal:

```shell
gemini extensions install https://github.com/gemini-cli-extensions/cicd
```

For Claude Code

Add the marketplace and install the plugin directly from the terminal:

```shell
# 1. Add the Marketplace
claude plugin marketplace add https://github.com/gemini-cli-extensions/cicd.git

# 2. Install the Plugin
claude plugin install cicd
```

For Antigravity and agents supported by npx skills

You can enable the extension's MCP server as a custom MCP server and add its skills to your workspace:

```shell
# Add the Skills
npx skills add https://github.com/gemini-cli-extensions/cicd --global --all --agent antigravity
```

How It Works

The CI/CD extension is a three-tier system designed to translate your intent into secure, production-ready infrastructure in each of these agent environments:

Skills: The extension defines specialized AI skills such as google-cicd-deploy and google-cicd-pipeline-design. These instruct your AI agent (Gemini CLI, Claude Code, or Antigravity) how to approach the work: analyzing your code, asking the right questions, and handling errors gracefully.
CI/CD MCP server: Running in the background is a specialized Go-based Model Context Protocol (MCP) server. It provides a suite of tools that give your agent the hands it needs to actually manipulate Google Cloud, from scanning for secrets to provisioning Cloud Run services.
Local knowledge base: To keep answers accurate, the system includes a pre-indexed retrieval-augmented generation (RAG) database of verified architecture patterns, which lets the agent ground its design decisions in a trusted source of truth.

Your chosen AI assistant orchestrates these tools and patterns into a cohesive deployment lifecycle.

The Inner Loop

When you're building a prototype or testing a new feature, you don't need a massive, multi-environment CI/CD pipeline. You just need a public URL to test your webhook or show a stakeholder. This is the inner loop, and it needs to be fast.

The traditional approach involves manually writing a Dockerfile, authenticating with a container registry, building the image, pushing it, and finally deploying it. The CI/CD extension turns this into a single natural language prompt: gemini "Deploy this application to Google Cloud using the google-cicd-deploy skill". If you're using Claude Code, you can prompt it exactly the same way via claude -p "Deploy this application...", and in Antigravity, simply type your deployment request.

When you run this prompt, your AI agent analyzes your local workspace to figure out the best deployment approach.

Step 1: Pre-Deployment Security Scan

Leaked secrets are one of the most common and expensive security failures in software. GitGuardian's 2025 State of Secrets Sprawl report found 23.8 million new credentials exposed on public GitHub in a single year; 70% of secrets that were leaked in 2022 are still active today. It happens fast: you hardcode a database password during local testing, forget to remove it, and push.

The extension catches this before it becomes a problem. Before any code leaves your machine, it runs a secret check across your workspace. If it finds a Stripe API key or a database credential sitting in your source, the agent halts the deployment and warns you. No secrets ship to the cloud by accident.

This is what true shift-left security looks like in practice.
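A pre-deployment secret check can be sketched as line-by-line pattern matching plus a gate that blocks on any finding. The rules below are a deliberately small, hypothetical set chosen for illustration. They are not the extension's actual detectors, which the post does not list, and real scanners add many more rules and entropy checks.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Hypothetical, deliberately small rule set (NOT the extension's actual rules).
SECRET_PATTERNS = {
    "stripe_secret_key": re.compile(r"\bsk_(?:live|test)_[0-9A-Za-z]{16,}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "url_with_password": re.compile(r"\b[a-z][a-z0-9+.-]*://[^\s:/@]+:[^\s@]+@[^\s/]+"),
}
SKIP_DIRS = {".git", "node_modules", "dist"}


def scan_text(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, rule_name) for every rule that matches a line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def scan_workspace(root: Path) -> List[Tuple[str, int, str]]:
    """Scan every file under root, skipping vendored and VCS directories."""
    results = []
    for path in root.rglob("*"):
        if not path.is_file() or SKIP_DIRS.intersection(path.relative_to(root).parts):
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        results.extend((str(path), ln, name) for ln, name in scan_text(text))
    return results


def safe_to_deploy(findings: list) -> bool:
    """Gate: deployment proceeds only when the scan found nothing."""
    return not findings


if __name__ == "__main__":
    findings = scan_workspace(Path("."))
    print("deploy" if safe_to_deploy(findings) else f"blocked: {findings}")
```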

Step 2: Application Analysis & Containerization

Next, your agent checks your package.json or go.mod to figure out your framework. It automatically decides whether to use Google Cloud Storage (for static sites) or Cloud Run (for dynamic services).

If you're building a dynamic service that doesn't have a Dockerfile, the extension leverages Google Cloud's buildpacks to containerize it for you on the fly.
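The decision in this step can be sketched as a small heuristic over manifest files. The rules and the `SERVER_DEPENDENCIES` set are assumptions for illustration, since the post does not describe the extension's actual detection logic. A real agent would ask the user rather than return "unknown".

```python
import json
from pathlib import Path

# Hypothetical heuristic: these dependencies imply a long-running HTTP server.
SERVER_DEPENDENCIES = {"express", "fastify", "koa", "@nestjs/core"}


def choose_target(root: Path) -> str:
    """Guess a hosting target from manifest files (illustrative rules only)."""
    if (root / "go.mod").exists():
        return "cloud-run"  # assume a Go module builds a server binary
    manifest_path = root / "package.json"
    if manifest_path.exists():
        manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
        deps = {**manifest.get("dependencies", {}), **manifest.get("devDependencies", {})}
        if SERVER_DEPENDENCIES.intersection(deps):
            return "cloud-run"
        if "build" in manifest.get("scripts", {}):
            return "cloud-storage"  # build output is static files, e.g. a Vite app
    if (root / "index.html").exists():
        return "cloud-storage"
    return "unknown"


def needs_buildpacks(root: Path, target: str) -> bool:
    """Dynamic services without a Dockerfile get containerized by buildpacks."""
    return target == "cloud-run" and not (root / "Dockerfile").exists()
```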

Step 3: Clarification and Deployment

Conversational agents don't blindly guess how you want your app configured. Before making any changes to your Google Cloud environment, the assistant pauses to ask targeted questions based on what it found during its analysis.

For our Cosmic Guestbook, I didn't specify a region or whether the service should be public. The assistant paused and asked me about it:

"I have completed the analysis and prepared the application for a unified deployment to Cloud Run. Since the Node.js buildpack requires a single entry point, I created a root package.json with a custom build script that builds the React frontend and allows the Express backend to serve it statically.

Additionally, the secret scan completed successfully, and no sensitive credentials were found.

Before we deploy using the Cloud Run source deployment option (buildpacks), I need to clarify a few deployment parameters:

Region/Location: The gcloud defaults do not specify a location. Which Google Cloud region would you like to deploy to (e.g., us-central1, europe-west1)?
Access: Should this application be publicly accessible or private?
Service Name: I propose naming the service cosmic-guestbook. Does that sound good to you?"

This conversational pause ensures that even in the fast inner loop, you retain complete control over your cloud architecture. After you confirm the details, the agent pushes the code live and returns the public URL:

```
# Final Output:
Your application is now live and publicly accessible at the following URL:
https://cosmic-guestbook-xxxxxxxx-uc.a.run.app
```

Behind the scenes, the deployment is handled automatically via cloudrun.deploy_to_cloud_run_from_source.
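To reproduce the same deployment by hand, map the three clarified answers onto flags of `gcloud run deploy --source`. That command builds from source and uses Google Cloud's buildpacks when no Dockerfile is present. The sketch below only assembles the command. It is not how the `cloudrun.deploy_to_cloud_run_from_source` tool is implemented, and `cloud_run_source_deploy_args` is a hypothetical helper.

```python
import shlex
from typing import List


def cloud_run_source_deploy_args(service: str, region: str, public: bool,
                                 source: str = ".") -> List[str]:
    """Assemble argv for a source-based Cloud Run deployment."""
    if not service or not region:
        # Mirrors the agent's behavior: never deploy with unconfirmed parameters.
        raise ValueError("service name and region must be confirmed before deploying")
    access_flag = "--allow-unauthenticated" if public else "--no-allow-unauthenticated"
    return ["gcloud", "run", "deploy", service,
            "--source", source, "--region", region, access_flag]


if __name__ == "__main__":
    args = cloud_run_source_deploy_args("cosmic-guestbook", "us-central1", public=True)
    print(shlex.join(args))  # paste into a shell, or pass args to subprocess.run
```

Passing the list directly to `subprocess.run(args, check=True)` avoids shell quoting issues entirely.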

The Outer Loop

A scrappy deployment prompt is perfect for a Tuesday afternoon prototype, but you can't run a production system from your laptop. Eventually, you need the rigors of the outer loop: automated testing, source control integration, and formal continuous deployment.

Writing cloudbuild.yaml files and provisioning the necessary infrastructure (like Artifact Registry repositories or GitHub connections through Developer Connect) is notoriously tedious and error-prone. With the google-cicd-pipeline-design skill, your AI agent acts as your personal platform engineering consultant.

Instead of writing YAML from scratch, you have a conversation. Your agent will ask you about your testing strategy and where you want to deploy, and then it autonomously provisions the required Google Cloud infrastructure.

Step 1: Architectural Design & Feedback

You start the process directly in your conversational interface:

```shell
# Prompt your agent to kick off the design process:
gemini "Design a CI/CD pipeline using the google-cicd-pipeline-design skill"
# OR
claude -p "Design a CI/CD pipeline using the google-cicd-pipeline-design skill"
```

Your assistant doesn't work in a black box. It retrieves common CI/CD patterns from its knowledge base. With the most relevant knowledge in hand, it proposes a concrete plan in YAML for you to review.
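The post does not say how the knowledge base is indexed. The sketch below therefore swaps in the simplest possible retriever, term-overlap scoring, to illustrate the grounding step: rank stored patterns against the request and pass only the top matches to the model. Pattern names and contents are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical pattern summaries; the real knowledge base's contents are not shown.
PATTERNS: Dict[str, str] = {
    "cloud-run-container": "build container image push artifact registry deploy cloud run service",
    "static-site-gcs": "build static frontend upload cloud storage bucket cdn",
    "go-lint-test": "go test golangci-lint unit tests before build",
}


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9-]+", text.lower())


def retrieve(query: str, k: int = 2) -> List[str]:
    """Rank patterns by shared-term count with the query; drop zero-overlap results."""
    q = Counter(tokenize(query))
    scores = {
        name: sum(min(q[term], count) for term, count in Counter(tokenize(doc)).items())
        for name, doc in PATTERNS.items()
    }
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return [name for name in ranked[:k] if scores[name] > 0]


if __name__ == "__main__":
    print(retrieve("Deploy a Node.js service to Cloud Run with an Artifact Registry image"))
```

A production system would more likely use embeddings, but the contract is the same: the model sees retrieved, verified patterns instead of relying on memory alone.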

Step 2: Infrastructure Provisioning

After you approve the plan, the assistant works sequentially through the required infrastructure steps. For example, it might first create a registry for your containers.

```jsonc
// Example MCP call to provision the registry
{
  "name": "create_artifact_repository",
  "arguments": {
    "repository_id": "demo-app-repo",
    "location": "us-central1",
    "format": "DOCKER"
  }
}
```

It might then set up a Git connection so that Cloud Build can read your source code.

Step 3: Pipeline Generation & Trigger

Finally, the agent generates the actual cloudbuild.yaml file that defines the pipeline stages (test, build, deploy). Here's a snippet of a generated configuration from the repository that highlights the initial build steps:

```yaml
steps:
  # Step 1: Install tools (like the linter) and clean the cache.
  - name: 'golang:1.24'
    id: 'Install Tools'
    entrypoint: 'sh'
    args:
      - '-c'
      - |
        set -e
        export PATH=/workspace/bin:$$PATH
        echo "Installing golangci-lint..."
        go install github.com/golangci/golangci-lint/cmd/golangci-lint@v1.64.8
        echo "Cleaning module cache..."
        go clean -modcache
    env:
      - 'GOPATH=/workspace'
    dir: 'devops-mcp-server'
```

With the pipeline defined, we need a way to execute it automatically. The agent finishes by creating a Cloud Build trigger. The trigger acts as the glue between your GitHub repository and Cloud Build, ensuring that every push to the main branch automatically fires off the cloudbuild.yaml steps.

```jsonc
// Example MCP call setting the trigger
{
  "name": "create_build_trigger",
  "arguments": {
    "trigger_name": "main-branch-deploy",
    "filename": "cloudbuild.yaml",
    "branch_pattern": "^main$"
  }
}
```
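MCP clients send each tool invocation as a JSON-RPC 2.0 request with method `tools/call`, so payloads like the two above travel inside an envelope like the one built below. The tool names and argument keys are copied from the post's examples. `tools_call` and `trigger_fires` are hypothetical helpers. The branch check assumes the trigger's pattern is applied as a regular-expression search against the branch name. Cloud Build documents RE2 syntax, which agrees with Python's `re` for simple anchored patterns like `^main$`.

```python
import itertools
import json
import re
from typing import Any, Dict

_request_ids = itertools.count(1)  # JSON-RPC ids must be unique per session


def tools_call(name: str, arguments: Dict[str, Any]) -> str:
    """Wrap a tool invocation in the JSON-RPC 2.0 `tools/call` request used by MCP."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def trigger_fires(branch_pattern: str, pushed_ref: str) -> bool:
    """Would a push to `pushed_ref` match the trigger's branch regex?"""
    branch = pushed_ref.removeprefix("refs/heads/")
    return re.search(branch_pattern, branch) is not None


if __name__ == "__main__":
    print(tools_call("create_artifact_repository",
                     {"repository_id": "demo-app-repo", "location": "us-central1", "format": "DOCKER"}))
    print(trigger_fires("^main$", "refs/heads/main"))
```

Anchoring the pattern with both `^` and `$` is what keeps branches such as `main-hotfix` from deploying to production by accident.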

Security and Control

AI-assisted infrastructure generation sounds incredible, but it's reasonable to ask: is it safe?

The extension operates strictly within the permissions of your local Application Default Credentials (ADC): it can't do anything that you can't do. Because it uses the Model Context Protocol (MCP), every action it takes, from creating an Artifact Registry repository to modifying a Cloud Build trigger, runs through strongly typed, verifiable tools.

If you don't like a step in the proposed pipeline, you tell your agent to change it. You're always the "Editor-in-Chief" of your infrastructure. We strongly recommend that you adhere to the principle of least privilege for both your local ADC and any service accounts that are used by the generated pipelines.

When Dev and Ops Converge

The friction between wanting to write code and needing to ship it is finally dissolving. We're moving past the era where deep expertise in YAML formatting was a prerequisite for putting an app on the internet.

By handling the boilerplate of both the scrappy inner loop and the automated outer loop, conversational AI lets developers focus on the business logic that actually matters.

Next Steps

If you want to experience this convergence yourself, here are your immediate next steps:

Get the tools: Install the CI/CD Extension for Gemini CLI.
Deploy the inner loop: Take an existing side project (or ask your chosen agent to scaffold a new one like our Cosmic Guestbook) and prompt it to deploy to Google Cloud to instantly see it live on Cloud Run or Cloud Storage.
Automate the outer loop: Run a design command against a repository that you're ready to productionize, and watch your agent generate your cloudbuild.yaml and provision your infrastructure.

Stop wrestling with configuration files and start shipping. Let me know what you build by reaching out on LinkedIn, X, or Bluesky!

How BASF manages thousands of supply chain decisions with AlphaEvolve’s agentic algorithms

Thu, 07 May 2026 16:00:00 +0000

The agricultural and crop protection supply chain is one of the most intricate networks in the world. It takes up to two years to turn active ingredients into the final products farmers need, and a single change in weather or regulations can disrupt everything. Planners at BASF Agricultural Solutions navigate this reality daily across 180 production sites. To understand how local decisions ripple across their entire global network, BASF turned to AlphaEvolve on Google Cloud to build a digital twin of their supply chain.

Planning across a two-year lead time

BASF Agricultural Solutions manages a network with over 5,000 distinct value chains. Creating a single end product requires a bill of materials that can be over 30 levels deep, moving across different production sites and regions.

Currently, human planners make thousands of local decisions every day. They decide what to produce, when to produce it, and how much safety stock to hold. Because the network is so large, a planner can’t easily see how a localized decision affects the rest of the global supply chain.
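To see why a single local decision is hard to trace, consider a toy bill of materials modeled as a graph. The sketch below computes how many levels deep a product's BOM goes and which downstream items a change to one component touches. The item names are hypothetical, and the code assumes the BOM is acyclic, as bills of materials normally are.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy BOM: each item maps to the components it consumes.
BOM: Dict[str, List[str]] = {
    "end-product": ["formulation"],
    "formulation": ["active-ingredient", "solvent"],
    "active-ingredient": ["intermediate-b"],
    "intermediate-b": ["intermediate-a"],
    "intermediate-a": [],
    "solvent": [],
}


def bom_depth(bom: Dict[str, List[str]], item: str,
              memo: Optional[Dict[str, int]] = None) -> int:
    """Levels from `item` down to its deepest raw material (a raw material counts as 1)."""
    memo = {} if memo is None else memo
    if item not in memo:
        children = bom.get(item, [])
        memo[item] = 1 + max((bom_depth(bom, c, memo) for c in children), default=0)
    return memo[item]


def affected_by(bom: Dict[str, List[str]], component: str) -> Set[str]:
    """Every item that directly or transitively consumes `component`."""
    parents = {item for item, kids in bom.items() if component in kids}
    result = set(parents)
    for parent in parents:
        result |= affected_by(bom, parent)
    return result


if __name__ == "__main__":
    print(bom_depth(BOM, "end-product"), sorted(affected_by(BOM, "intermediate-a")))
```

With more than 5,000 value chains and BOMs over 30 levels deep, the same traversal touches far more nodes than a planner can inspect by hand.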

This scale can lead to additional working capital and inventory, or cause production imbalances. Traditional mathematical models struggle to capture the dynamic reality of the network, which planners navigate based on years of experience.

Building a foundation for decision support

AlphaEvolve is an evolutionary coding agent that generates and refines algorithms autonomously. In collaboration with Google Cloud and prognostica GmbH, BASF’s objective was not to replace human decision-making, but to establish a new model for decision support that helps planners handle the real-world complexity of the production network.

The team gave AlphaEvolve a foundational "seed" program. This initial code established a standard planning logic that translated demand forecasts into production schedules, serving as a functional baseline before introducing dynamic, network-wide coordination. From there, they fed the model three years of historical data, including inventory levels, market demand, and actual production outputs. AlphaEvolve then generated variations of the code, mutating the logic to see if it could simulate a supply chain that matched the real-world historical data.

Measuring what good looks like in initial tests

For AlphaEvolve to improve, it needed a specific goal. The evaluation function scored every new piece of generated code on one primary metric: how closely the simulated inventory levels and production decisions matched the actual historical reality recorded by BASF.
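The seed, mutate, and evaluate cycle can be illustrated with a deliberately tiny stand-in. AlphaEvolve uses Gemini models to propose changes to program code. The sketch below instead applies Gaussian mutations to a single parameter of a toy planning rule, and keeps a mutation only if it lowers the error against "historical" inventory. The planning rule, the synthetic history, and every name here are hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple

Params = Dict[str, float]


def simulate(params: Params, demand: List[float]) -> List[float]:
    """Toy planning rule: produce enough to end each period at safety_factor * demand."""
    inventory, history = 0.0, []
    for d in demand:
        target = params["safety_factor"] * d
        production = max(0.0, target + d - inventory)  # cannot produce negative amounts
        inventory += production - d
        history.append(inventory)
    return history


def evaluate(params: Params, demand: List[float], actual: List[float]) -> float:
    """Fitness: mean absolute error between simulated and recorded inventory (lower is better)."""
    simulated = simulate(params, demand)
    return sum(abs(s - a) for s, a in zip(simulated, actual)) / len(actual)


def evolve(seed: Params, demand: List[float], actual: List[float], generations: int = 200,
           children: int = 8, sigma: float = 0.2,
           rng: Optional[random.Random] = None) -> Tuple[Params, float]:
    """Elitist (1 + lambda) search: mutate the best candidate, keep strict improvements."""
    rng = rng or random.Random(0)
    best, best_err = dict(seed), evaluate(seed, demand, actual)
    for _ in range(generations):
        for _ in range(children):
            child = {k: v + rng.gauss(0.0, sigma) for k, v in best.items()}
            err = evaluate(child, demand, actual)
            if err < best_err:
                best, best_err = child, err
    return best, best_err


if __name__ == "__main__":
    demand = [100.0, 120.0, 80.0, 150.0, 90.0, 110.0]
    actual = simulate({"safety_factor": 1.5}, demand)  # synthetic "history"
    seed = {"safety_factor": 0.0}  # seed rule: produce exactly the forecast, hold no stock
    best, err = evolve(seed, demand, actual)
    seed_err = evaluate(seed, demand, actual)
    print(best, err, f"relative error reduction: {(seed_err - err) / seed_err:.0%}")
```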

The latest AlphaEvolve runs delivered more than 80% relative improvement in accuracy compared to the initial seed model. With further adjustments, the team expects to push performance even higher — bringing the model to a level of accuracy not achieved with other approaches and making it actionable for operational use.

The results

The evolved planning logic delivered immediate, measurable improvements over the initial seed model: the final algorithm closely mirrored the actual historical performance of the supply chain, with a significantly lower error rate than the seed.

“We had several attempts to build a digital twin for our complex supply network using deterministic models, and all of them failed,” said Dr. Goetz Krabbe, vice president for global supply chain at BASF. “By using AlphaEvolve, we can not only map the complex network based on system data, but at the same time understand and copy the human decisions that drive our daily operations. This gives us a highly accurate, easy-to-maintain, data-driven digital twin of the entire network. Using it, we can optimize our inventory levels and respond to market volatility with confidence while avoiding stockouts.”

What the evolved algorithm actually does

By running thousands of experiments, AlphaEvolve developed a clear, human-readable algorithm that explains how the BASF network truly operates. It automatically discovered factually correct, domain-specific supply chain rules that explain the observed production outputs and inventory levels for the tested product value chain:

Production consolidation: The algorithm learned to group production amounts together, accurately mapping how planners optimize plant time.
Dynamic safety stocks: It introduced safety stock parameters to handle volatile and seasonal demand patterns, helping to strictly manage capital costs while preventing out-of-stock situations.
Network-wide coordination: The model successfully mapped the dependencies between different production tiers, providing a clear foundation for optimizing asset utilization globally.
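The post does not publish the evolved rules themselves. The sketch below therefore shows only textbook versions of the first two ideas above. The first is a safety stock recomputed from recent demand volatility as z * sigma * sqrt(L), assuming normally distributed demand and a fixed lead time L in periods. The second is production consolidation that pulls small requirements forward into batches of at least a minimum size. Function names and parameters are hypothetical.

```python
import math
from statistics import NormalDist, stdev
from typing import List


def dynamic_safety_stock(demand: List[float], window: int, lead_time: float,
                         service_level: float) -> List[float]:
    """Textbook z * sigma * sqrt(L), re-estimated over a rolling window of demand."""
    if window < 2:
        raise ValueError("window must be >= 2 to estimate a standard deviation")
    z = NormalDist().inv_cdf(service_level)  # about 1.645 for a 95% service level
    return [z * stdev(demand[t - window:t]) * math.sqrt(lead_time)
            for t in range(window, len(demand) + 1)]


def consolidate(requirements: List[float], min_batch: float) -> List[float]:
    """Pull small period requirements forward into runs of at least min_batch.

    Each run is produced in the first period it covers, so nothing is produced late.
    """
    plan = [0.0] * len(requirements)
    start, accumulated = 0, 0.0
    for t, need in enumerate(requirements):
        accumulated += need
        if accumulated >= min_batch:
            plan[start] = accumulated
            start, accumulated = t + 1, 0.0
    if accumulated > 0:
        plan[start] = accumulated  # leftover tail becomes one final, smaller run
    return plan
```

Volatile windows raise the buffer and calm windows shrink it. This is the trade-off between capital cost and stockout risk that the evolved rules reportedly manage.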

What's next

The initial simulations showed that evolutionary AI can accurately model large-scale, dynamic supply chains. BASF’s objective is to create a digital twin of their entire global production network as a new foundation for simulation, decision support, scenario forecasting and optimization. This will allow the team to continuously simulate operations, identify hidden bottlenecks before they affect throughput, and optimize asset utilization across all global facilities.

This project was a collaboration between the BASF SE team including: Benjamin Priese, Michael Arlt, Debora Morgenstern and Tobias Hausen as well as Manuel Doerr and Thomas Christ from Prognostica GmbH Würzburg, and the AI for Science team at Google Cloud including (but not limited to): Kartik Sanu, Laurynas Tamulevičius, Nicolas Stroppa, Chris Page, Srikanth Soma, John Semerdjian, Skandar Hannachi, Vishal Agarwal and Anant Nawalgaria as well as Christoph Tittelbach from the Google account team and partners at Google DeepMind.

Pioneering AI-assisted code migration: How Google achieved 6x faster migration from TensorFlow to JAX

Wed, 06 May 2026 16:00:00 +0000

AI coding agents are rapidly becoming ubiquitous across the software industry, fundamentally changing how developers write, test, and debug daily code. While these tools excel at localized, self-contained tasks, applying them to massive, systemic codebase migrations requires an entirely new approach.

Google is already addressing this challenge by incorporating AI into many migration workflows: x86 to ARM (enabling workloads on Google Axion processors); int32 to int64 identifiers (to avoid running out of IDs); JUnit3 to JUnit4 (for testing); and Joda-Time to java.time (a modern time library). However, AI model migration represents a whole new level of complexity that requires even more advanced methods for AI-assisted migration.

Translating a production-grade machine learning model from one framework to another, for example, from TensorFlow (TF) to JAX, is not a simple syntax update. It is a long-horizon task that requires untangling thousands of lines of code, managing complex states across multiple files, and preserving precise mathematical equivalence. Generic, single-agent coding assistants typically struggle under this weight — they frequently lose context over long workflows, hallucinate APIs, or fail to produce buildable code across an entire repository.

Google’s AI and Infrastructure team has pioneered a new approach to this industry-wide problem. The result is 6x faster model migration, a milestone Sundar Pichai highlighted in the recent Google Cloud Next keynote. In this post, we share how we deployed specialized, multi-agent AI systems to migrate some of Google’s largest-scale production models from TF to JAX.

Accelerating the transition from TF to JAX

For many teams at Google — and across the industry — the future of scalable machine learning is being built on JAX. Designed around a functional, stateless paradigm, JAX is heavily optimized for modern Tensor Processing Unit (TPU) infrastructure and XLA compilation, making it the bedrock of the modern AI stack.

Evolving to this future presents a monumental challenge. Thousands of production models are built on TensorFlow, a framework characterized by object-oriented, stateful layer initialization and static execution graphs. Manually migrating these models to JAX requires a fundamental rethinking of how layers interact, and how state is explicitly managed. Across large organizations, this type of migration alone represents hundreds (if not thousands) of software engineering (SWE) years — time better spent on researching new architectures and driving product innovation.
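The difference between the two paradigms is easiest to see side by side. The plain-Python sketch below imports neither TensorFlow nor JAX. `StatefulDense` mimics the object-owns-its-weights style. `dense_apply` and `sgd_step` mimic the JAX style, where parameters are passed in explicitly and updates return new values. In real JAX code the gradient would come from `jax.grad` rather than being written by hand.

```python
from typing import Dict


class StatefulDense:
    """TF/Keras-style: the layer object owns and mutates its parameters."""

    def __init__(self, weight: float, bias: float) -> None:
        self.weight = weight
        self.bias = bias

    def __call__(self, x: float) -> float:
        return self.weight * x + self.bias

    def sgd_step(self, x: float, y: float, lr: float) -> None:
        """One squared-error gradient step that mutates the layer in place."""
        grad = 2.0 * (self(x) - y)  # d(loss)/d(prediction)
        self.weight -= lr * grad * x
        self.bias -= lr * grad


Params = Dict[str, float]


def dense_apply(params: Params, x: float) -> float:
    """JAX-style: a pure function; parameters arrive as an explicit argument."""
    return params["weight"] * x + params["bias"]


def sgd_step(params: Params, x: float, y: float, lr: float) -> Params:
    """Returns new parameters instead of mutating the old ones."""
    grad = 2.0 * (dense_apply(params, x) - y)
    return {"weight": params["weight"] - lr * grad * x,
            "bias": params["bias"] - lr * grad}
```

Every hidden `self.` mutation in the first style has to become an explicit input and output in the second. At production scale, that untangling is what makes the migration more than a syntax update.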

Overcoming this challenge with AI started as an ambitious experiment within Google’s AI and Infrastructure team, but has evolved into a repeatable blueprint for addressing complex engineering problems across the company.

Moving beyond single-agent coding

Our early experiments with agentic code translation showed promise for simple models. However, when faced with the realities of a Google-scale migration — complex, production-grade models spanning multiple files and thousands of lines of code — generic, single-agent setups struggled. They could not balance high-level structural rules with low-level execution details, resulting in a variety of failures, such as overwriting critical files or skipping necessary functionality. To overcome these common challenges inherent to enterprise migrations, we developed a highly specialized multi-agent architecture that consists of:

The Planner agent: Using deterministic, compiler-based static analysis, the Planner maps out the codebase's entire dependency tree. It then works alongside other agents to break the migration down into a discrete, step-by-step plan, helping ensure the migration happens logically from the "leaf nodes" (layers without unmigrated dependencies) upward.
The Orchestrator agent: This agent acts as the project manager. It dynamically groups plan steps into manageable chunks to keep the context window focused, injects the necessary domain knowledge, and handles failure recovery if a step doesn't build.
The Coder agent: Built as a reasoning and acting agent, the Coder is the workhorse. Integrated directly into our internal IDE tools, it has the ability to read files, write code, run builds, and execute unit tests. Crucially, it operates in a "test-and-fix" loop, self-correcting until it produces a compilable, verifiable component in the target language.
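The leaf-first ordering the Planner produces is, in essence, a topological sort of the dependency graph. The sketch below is minimal and assumes that static analysis has already produced a mapping from each module to the modules it depends on. The module names are hypothetical, and the real Planner's interface is not described in this post. The sketch groups ready modules into "waves." Modules in the same wave have no dependencies on each other, so they can be migrated in any order.

```python
from typing import Dict, List, Set


def leaf_first_plan(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group modules into migration waves, leaves first (Kahn-style sort).

    `deps` maps each module to the set of modules it depends on. A module
    becomes ready once every dependency was migrated in an earlier wave.
    """
    # Include modules that only ever appear as dependencies (pure leaves).
    nodes = set(deps) | {d for ds in deps.values() for d in ds}
    remaining = {n: set(deps.get(n, ())) for n in nodes}
    migrated: Set[str] = set()
    waves: List[List[str]] = []
    while remaining:
        # Ready = every dependency is already migrated.
        ready = sorted(n for n, ds in remaining.items() if ds <= migrated)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        waves.append(ready)
        migrated.update(ready)
        for n in ready:
            del remaining[n]
    return waves


# Hypothetical model: a top-level model built from a tower and a loss.
plan = leaf_first_plan({
    "model": {"tower", "loss"},
    "tower": {"dense", "embedding"},
    "loss": {"dense"},
})
```

For this input, the first wave holds the two leaf layers, the second holds the components built from them, and the top-level model comes last. A cycle raises an error rather than producing a plan that could never build.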

Figure: Multi-agent AI system for complex code migrations. Process diagram describing the multi-agent system used to migrate legacy model code to JAX. Image generated with Gemini Nano Banana 2.
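The interaction between the Orchestrator and the Coder agents described above can be sketched as a simple control loop. This is a minimal illustration under stated assumptions, not the production system:

- `attempt` and `check` are hypothetical stand-ins for the Coder's code generation and for the build-and-test tooling.
- The chunk size and retry limit are arbitrary.
- Splitting a failing chunk into single steps is just one plausible recovery policy.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-ins: in this sketch they are plain callables.
Attempt = Callable[[List[str], Optional[str]], str]  # (steps, last error) -> code
Check = Callable[[str], Optional[str]]               # code -> error text, or None


def chunk_steps(steps: List[str], max_per_chunk: int) -> List[List[str]]:
    """Orchestrator: split the ordered plan into small, focused chunks."""
    return [steps[i:i + max_per_chunk] for i in range(0, len(steps), max_per_chunk)]


def run_test_and_fix_loop(chunk: List[str], attempt: Attempt, check: Check,
                          max_tries: int = 3) -> Tuple[bool, str, int]:
    """Coder: generate code, build/test it, feed any error into the next try."""
    error: Optional[str] = None
    code = ""
    for tries in range(1, max_tries + 1):
        code = attempt(chunk, error)
        error = check(code)
        if error is None:
            return True, code, tries
    return False, code, max_tries


def orchestrate(steps: List[str], attempt: Attempt, check: Check,
                max_per_chunk: int = 2, max_tries: int = 3) -> List[str]:
    """Run chunks in plan order; return steps that still fail for human review.

    Recovery policy (an assumption): if a whole chunk keeps failing, retry
    its steps one at a time so one bad step does not block the others.
    """
    failed: List[str] = []
    for chunk in chunk_steps(steps, max_per_chunk):
        ok, _, _ = run_test_and_fix_loop(chunk, attempt, check, max_tries)
        if ok:
            continue
        for step in chunk:
            step_ok, _, _ = run_test_and_fix_loop([step], attempt, check, max_tries)
            if not step_ok:
                failed.append(step)
    return failed
```

Keeping chunks small mirrors the goal of a focused context window. Feeding the previous error back into `attempt` is what turns a one-shot generation into a self-correcting loop.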

Scalable validation and dynamic Playbooks

Generative AI models are only as good as the context they are provided. Because source and target architectures rarely map 1-to-1, we engineered a scalable, hierarchical system of Playbooks.

These Playbooks range from general repository instructions to highly specific "golden examples" distilled from successful manual migrations. By feeding the Orchestrator a client-specific Playbook (for instance, one tailored to YouTube's unique ranking model infrastructure), the system avoids generic hallucinations and strictly adheres to internal coding standards. This Playbook architecture is framework-agnostic, meaning it can be adapted to guide migrations between any two programming languages or frameworks.
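A hierarchical Playbook system can be pictured as layered configuration, where more specific layers override more general ones. The post does not describe the actual Playbook format. This sketch therefore assumes a hypothetical structure of keyed rules plus a list of golden examples, and the layer names and rule keys are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Playbook:
    """One layer of migration guidance (hypothetical field names)."""
    name: str
    rules: Dict[str, str] = field(default_factory=dict)
    golden_examples: List[str] = field(default_factory=list)


def resolve(layers: List[Playbook]) -> Playbook:
    """Merge Playbooks ordered from most general to most specific.

    A more specific layer overrides a rule with the same key, while golden
    examples accumulate with the most specific ones listed first.
    """
    rules: Dict[str, str] = {}
    examples: List[str] = []
    for layer in layers:
        rules.update(layer.rules)
        examples = layer.golden_examples + examples
    return Playbook("+".join(p.name for p in layers), rules, examples)


def render(playbook: Playbook) -> str:
    """Format a resolved Playbook as plain-text context for the Orchestrator."""
    lines = [f"Playbook: {playbook.name}"]
    lines += [f"- {key}: {text}" for key, text in sorted(playbook.rules.items())]
    lines += [f"Golden example:\n{ex}" for ex in playbook.golden_examples]
    return "\n".join(lines)
```

Because the merge logic never inspects the content of the rules, the same mechanism could carry guidance for any source and target framework. This is consistent with the framework-agnostic design described above.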

Furthermore, we instituted rigorous quality metrics to ensure the generated code is actually production-ready:

Quantitative verification: For each unit of code, we verify correctness quantitatively. In the case of the TF-to-JAX migration, the system uses gradient ascent to search for the inputs that maximize the error between the original TF layer and the new JAX layer. If even this worst-case search finds only negligible error, that is strong evidence of functional equivalence.
Qualitative evaluation: We also evaluate the migrated code against a set of qualitative standards. In the case of the TF-to-JAX migration, we deploy a blind-audit LLM Judge that scores the migrated code against a framework-agnostic architectural checklist to confirm that critical, domain-specific logic has been fully captured.
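The quantitative check is an adversarial search. Rather than comparing the two layers on a few fixed inputs, it pushes the inputs toward wherever the layers disagree most. This minimal, stdlib-only sketch rests on several assumptions:

- Each layer can be called as a plain function from an input vector to an output vector.
- Central finite differences stand in for the automatic differentiation a real TF/JAX harness would use.
- The input bounds, step size, step count, and number of random restarts are arbitrary choices.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Layer = Callable[[Sequence[float]], List[float]]  # input vector -> output vector


def discrepancy(f_ref: Layer, f_new: Layer, x: Sequence[float]) -> float:
    """Squared difference between the two implementations' outputs at x."""
    return sum((a - b) ** 2 for a, b in zip(f_ref(x), f_new(x)))


def numeric_grad(g: Callable[[List[float]], float], x: List[float],
                 h: float = 1e-5) -> List[float]:
    """Central finite differences, standing in for automatic differentiation."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((g(up) - g(down)) / (2 * h))
    return grad


def max_error_search(f_ref: Layer, f_new: Layer, dim: int, bound: float = 3.0,
                     steps: int = 100, lr: float = 0.05, restarts: int = 5,
                     seed: int = 0) -> Tuple[float, List[float]]:
    """Gradient ascent on the discrepancy within the box [-bound, bound]^dim.

    Returns the largest root-squared error found and the input producing it.
    """
    rng = random.Random(seed)

    def g(x: List[float]) -> float:
        return discrepancy(f_ref, f_new, x)

    best_err, best_x = -1.0, []
    for _ in range(restarts):
        x = [rng.uniform(-bound, bound) for _ in range(dim)]
        for _ in range(steps):
            grad = numeric_grad(g, x)
            # Step *up* the error surface, then clip back into the box.
            x = [min(bound, max(-bound, xi + lr * gi)) for xi, gi in zip(x, grad)]
        err = g(x)
        if err > best_err:
            best_err, best_x = err, x
    return math.sqrt(best_err), best_x
```

Consider a correct port and a port with a sign bug. The correct port yields a worst-case error of zero. The buggy one is driven to the edge of the search box, where the disagreement is largest. Like any search, this can only report the worst error it finds, which is why a near-zero result is treated as strong evidence rather than proof.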

Redefining migration velocity

By deploying this multi-agent system, we dramatically alter the economics of software migration.

In our evaluations on real-world, highly complex YouTube models (featuring thousands of lines of code, hundreds of layers, and deep metric dependencies), the multi-agent system achieved a 6.4x to 8x speedup over performing the migration manually. What traditionally took several SWE-months can now be done in a few weeks of AI-assisted code generation, followed by expert human review.
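Here is a back-of-the-envelope check of how the measured speedup translates into calendar effort. It assumes a hypothetical manual estimate of 6 SWE-months and applies the speedup to the total effort. Under those assumptions, the 6.4x to 8x range brings roughly 26 weeks of work down to about 3 to 4 weeks, in line with "a few weeks."

```python
WEEKS_PER_MONTH = 52 / 12  # average number of weeks in a month


def assisted_weeks(manual_swe_months: float, speedup: float) -> float:
    """Effort in weeks once a speedup factor is applied to a manual estimate."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return manual_swe_months * WEEKS_PER_MONTH / speedup


# Hypothetical 6 SWE-month migration, at both ends of the reported range.
for s in (6.4, 8.0):
    print(f"{s}x speedup: {assisted_weeks(6, s):.1f} weeks")
```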

The system effectively handles the boilerplate, identifies target idioms, maps the dependencies, and generates the unit tests, allowing engineers to act as reviewers and architects rather than manual translators.

Looking ahead into the AI-assisted era

AI is transforming the pace of technological innovation. Without AI to accelerate large-scale migrations, organizations will find it increasingly difficult to adopt the latest breakthroughs and to maintain the security, reliability, and performance of their systems.

Our work migrating models from one ML framework to another demonstrates what this combination can do. Deterministic static analysis, strict testing loops, and specialized multi-agent architectures together let us safely automate some of the most complex software engineering challenges in the industry. A detailed description of the process is published in our technical paper.

This work is the result of collaboration across Google. We thank key contributors: Stoyan Nikolov, Niyati Parameswaran, Bernhard Konrad, Moritz Gronbach, Niket Kumar, Ann Yan, Varun Singh, Yaning Liang, Antoine Baudoux, Xevi Miró Bruix, Daniele Codecasa, Madhura Dudhgaonkar, Elian Dumitru, Alex Ivanov, Christopher Milne-O’Grady, Ahmed Omran, Ivan Petrychenko, Assaf Raman, Stefan Schnabl, Yurun Shen, Maxim Tabachnyk, Niranjan Tulpule, Amin Vahdat, and Jeff Zhou.