Google Gemini Robotics-ER 1.6: Embodied Reasoning Upgrade for Real-World Robots
Google DeepMind released Gemini Robotics-ER 1.6 with improved spatial reasoning, instrument reading, and multi-view success detection. Here's what changed and why it matters for AI robotics.
On April 14, 2026, Google DeepMind released Gemini Robotics-ER 1.6, the latest version of its reasoning-first model designed specifically for physical robotics tasks. The update brings measurable improvements in spatial reasoning, multi-view success detection, and a new instrument reading capability developed in collaboration with Boston Dynamics.
This is the third iteration of Google's embodied reasoning line, following ER 1.5, which shipped earlier in 2026. The model is available now through the Gemini API and Google AI Studio.
What is Gemini Robotics-ER
Gemini Robotics-ER (Embodied Reasoning) is a specialized variant of Gemini built to function as the high-level reasoning engine for robots. Rather than controlling motors or joints directly, it handles the cognitive side: understanding environments, planning tasks, and determining when a task is complete.
The model works by processing camera feeds and other sensor data, then outputting structured instructions that downstream systems like vision-language-action models (VLAs) execute. It can also natively call tools like Google Search or custom functions during task execution.
ER 1.6 sits above Gemini 3.0 Flash in the robotics stack. While Flash handles general-purpose reasoning, ER adds spatial awareness, physical reasoning, and safety compliance that general models lack.
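To make the division of labor concrete, here is a minimal sketch of the glue code that might sit between a reasoning layer and a downstream executor. The plan schema and skill names are hypothetical -- the article does not specify ER 1.6's actual output format -- but the modular pattern (high-level plan in, discrete skill calls out) is the architecture it describes.

```python
import json

def dispatch_plan(plan_json: str):
    """Parse a hypothetical ER plan and yield (skill, args) pairs for a
    downstream VLA or skill library to execute. The schema is
    illustrative, not the actual ER 1.6 output format."""
    plan = json.loads(plan_json)
    for step in plan["steps"]:
        yield step["skill"], step.get("args", {})

# Example plan of the kind a reasoning layer might emit:
plan = json.dumps({
    "task": "put the blue pen into the black pen holder",
    "steps": [
        {"skill": "pick", "args": {"object": "blue pen"}},
        {"skill": "place", "args": {"target": "black pen holder"}},
    ],
})
steps = list(dispatch_plan(plan))
```

The point of the split is that the reasoning model never touches motor commands: it emits symbolic steps, and the VLA owns everything below that.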
Key improvements in ER 1.6
Spatial reasoning and pointing
Pointing is the foundational capability that enables all other spatial tasks. ER 1.6 uses points to:
- Detect and count objects with higher precision
- Define spatial relationships (move X to location Y)
- Map trajectories for motion planning
- Handle complex constraints like "point to every object small enough to fit inside the blue cup"
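Gemini's pointing responses are documented as JSON entries with coordinates normalized to a 0-1000 range in [y, x] order; verify this against the current API docs before relying on it. A small converter to pixel coordinates, assuming that convention:

```python
def to_pixels(points, width, height):
    """Convert points from the 0-1000 normalized [y, x] convention
    (as described in Gemini's pointing docs) to pixel (x, y) tuples.
    Assumes the model's JSON has already been parsed into dicts."""
    return [
        {
            "label": p["label"],
            "xy": (round(p["point"][1] / 1000 * width),
                   round(p["point"][0] / 1000 * height)),
        }
        for p in points
    ]

# A response of the documented shape, mapped onto a 1280x720 frame:
resp = [{"point": [500, 250], "label": "blue cup"}]
pixels = to_pixels(resp, width=1280, height=720)
```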
Google's benchmarks show ER 1.6 correctly identifies object counts that ER 1.5 misses. In one test involving a toolbox scene, ER 1.5 failed to count hammers correctly, missed scissors entirely, and hallucinated a wheelbarrow that wasn't present. ER 1.6 got all counts right and correctly noted absent items.
Multi-view success detection
Knowing when a task is done matters as much as knowing how to start. Success detection lets a robot decide whether to retry a failed action or move to the next step in a plan.
ER 1.6 advances multi-view reasoning, allowing robots to synthesize information from multiple camera angles (overhead, wrist-mounted, etc.) even with occlusions, poor lighting, or ambiguous scenes. The model understands how different viewpoints relate to each other across time, which is critical for tasks like "put the blue pen into the black pen holder" where the pen disappears from one camera but appears in another.
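In practice you would hand all camera frames to ER 1.6 in one request and let the model fuse them internally. For illustration only, here is the kind of conservative fusion logic a downstream controller might apply if it instead collected a per-camera verdict -- this is hypothetical glue code, not how the model reasons:

```python
def fuse_views(view_verdicts):
    """Illustrative fusion of per-camera success judgments.
    view_verdicts: one of "success", "failure", or "occluded" per camera.
    Occluded views abstain; if no view is usable, signal a retry."""
    usable = [v for v in view_verdicts if v != "occluded"]
    if not usable:
        return "retry"  # no camera had a clear line of sight
    return "success" if all(v == "success" for v in usable) else "failure"

# Wrist camera occluded by the gripper, two overhead views agree:
verdict = fuse_views(["occluded", "success", "success"])
```

The retry-vs-advance decision is exactly the one the article says success detection exists to answer.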
Instrument reading (new)
This is the headline feature. Developed through the Boston Dynamics partnership, instrument reading lets robots interpret industrial gauges, thermometers, pressure gauges, chemical sight glasses, and digital readouts.
The use case comes directly from real facility inspection needs. Spot, Boston Dynamics' quadruped robot, already patrols industrial facilities capturing images of instruments. ER 1.6 adds the ability to actually read and interpret those images.
The model handles multiple complexity levels:
- Circular analog pressure gauges with tick marks and needles
- Vertical level indicators showing liquid fill levels
- Modern digital readouts
- Multi-needle gauges with different decimal places
Reading a gauge requires several steps: zooming into the image for detail, identifying the needle position relative to tick marks, estimating proportions and intervals, accounting for camera perspective distortion (especially for curved sight glasses), reading unit labels, and combining multiple needles into a single reading.
ER 1.6 uses "agentic vision" to accomplish this, combining visual reasoning with code execution. The model takes intermediate reasoning steps rather than trying to read the gauge in a single pass.
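The "estimating proportions and intervals" step is ordinary arithmetic once the needle angle is known, which is why pairing vision with code execution helps. A sketch of that intermediate computation, with gauge geometry values chosen purely for illustration:

```python
def read_gauge(needle_deg, min_deg, max_deg, min_val, max_val):
    """Linearly interpolate a reading from needle angle -- the kind of
    intermediate step an agentic-vision pass could run as code.
    All angles must use the same measurement convention."""
    frac = (needle_deg - min_deg) / (max_deg - min_deg)
    return min_val + frac * (max_val - min_val)

# A 0-10 bar gauge sweeping from -135 deg to +135 deg, needle at 0 deg:
reading = read_gauge(0, -135, 135, 0.0, 10.0)  # 5.0 bar
```

Perspective correction and multi-needle combination would layer on top of this, but the core of each reading reduces to an interpolation like the one above.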
Safety improvements
Google claims ER 1.6 is the safest Gemini robotics model to date. Specific improvements:
- Better compliance with physical safety constraints (e.g., "don't handle liquids," "don't pick up objects heavier than 20kg")
- +6% improvement over Gemini 3.0 Flash in identifying text-based safety hazards
- +10% improvement in video-based hazard perception
- Superior performance on adversarial spatial reasoning tasks designed to test policy compliance
The model also makes safer spatial decisions about which objects can be manipulated given gripper or material constraints.
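Constraints like these can also be enforced belt-and-suspenders style outside the model. A hypothetical downstream guard implementing the two example constraints from the article (object names, masses, and tags here are invented):

```python
def manipulable(objects, max_kg=20.0, banned_tags=("liquid",)):
    """Hypothetical downstream filter enforcing the constraints the
    article cites: no liquids, nothing heavier than 20 kg.
    objects: dicts with "name", "mass_kg", and optional "tags"."""
    allowed = []
    for obj in objects:
        if obj["mass_kg"] > max_kg:
            continue
        if any(tag in banned_tags for tag in obj.get("tags", [])):
            continue
        allowed.append(obj["name"])
    return allowed

scene = [
    {"name": "wrench", "mass_kg": 1.2, "tags": []},
    {"name": "water bottle", "mass_kg": 0.5, "tags": ["liquid"]},
    {"name": "engine block", "mass_kg": 90.0, "tags": []},
]
safe = manipulable(scene)  # ['wrench']
```

Relying on model-level compliance alone would be unwise for physical systems; a hard filter like this is cheap insurance.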
Benchmark results
Google's published benchmarks compare ER 1.6 against ER 1.5 and Gemini 3.0 Flash across four task categories:
| Task | ER 1.6 | ER 1.5 | Gemini 3.0 Flash |
|---|---|---|---|
| Single-view success detection | Highest | Lowest | Middle |
| Multi-view success detection | Highest | Lowest | Middle |
| Instrument reading (agentic vision) | Highest | N/A | Lower |
| Pointing / spatial reasoning | Highest | Lowest | Close second |
ER 1.6 outperforms both its predecessor and Gemini 3.0 Flash across all tested categories. The gap is widest on instrument reading (a new capability) and multi-view success detection.
Pricing and availability
Gemini Robotics-ER 1.6 is available through:
- Gemini API at https://ai.google.dev/gemini-api/docs/robotics-overview
- Google AI Studio (model: gemini-robotics-er-1.6-preview)
Google provides a developer Colab notebook with configuration examples and sample prompts for embodied reasoning tasks, hosted on the google-gemini/robotics-samples GitHub repository.
Pricing follows standard Gemini API tiers. The model runs as a preview, so expect potential changes to rate limits and pricing before general availability.
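Getting started requires nothing beyond an API key. The sketch below assembles a generateContent request body for the preview model ID listed above; the payload shape follows the standard Gemini REST API, but field names should be checked against the current API reference before use:

```python
import base64
import json

MODEL = "gemini-robotics-er-1.6-preview"  # model ID from Google AI Studio
URL = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL}:generateContent"
)

def build_request(prompt: str, jpeg_bytes: bytes) -> str:
    """Assemble a generateContent request body with one text part and
    one inline JPEG (standard Gemini REST shape -- verify field names
    against the current API reference)."""
    body = {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {
                    "mime_type": "image/jpeg",
                    "data": base64.b64encode(jpeg_bytes).decode(),
                }},
            ],
        }],
    }
    return json.dumps(body)

payload = build_request("Point to every mug. Answer in JSON.", b"\xff\xd8fake")
```

POST the payload to `URL` with your API key in the `x-goog-api-key` header to run it against the live endpoint.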
Boston Dynamics partnership
The instrument reading capability came from a direct collaboration with Boston Dynamics. Marco da Silva, VP and GM of Spot at Boston Dynamics, stated that capabilities like instrument reading and task reasoning will enable Spot to operate completely autonomously in real-world inspection scenarios.
This partnership signals where Google is focusing its robotics efforts: not consumer robots, but industrial applications where embodied reasoning has immediate ROI. Facility inspection, equipment monitoring, and hazard detection are high-value use cases that justify the compute cost of running a frontier model on every camera frame.
How it compares to competitors
The embodied reasoning model category is still relatively new, but a few comparisons stand out:
- vs. NVIDIA GR00T: GR00T focuses on training foundation models for humanoid robots, while ER 1.6 is a reasoning layer that sits above any robot's control system. Different architectural approaches.
- vs. Physical Intelligence (pi0): pi0 trains end-to-end from demonstration data. ER 1.6 takes a more modular approach, reasoning at a higher level and delegating low-level control to VLAs.
- vs. ER 1.5: The upgrade path is clear. ER 1.6 fixes ER 1.5's object counting failures, adds instrument reading, and improves multi-view reasoning significantly. If you're using ER 1.5, the upgrade is worth it.
Limitations
Google is upfront that ER 1.6 has boundaries. The company is actively soliciting feedback from developers, asking for 10-50 labeled images showing specific failure modes in specialized applications. This suggests the model still struggles with edge cases in niche industrial settings.
The reliance on agentic vision (which requires code execution) means higher latency and compute cost per inference. For real-time control loops, this is a constraint. The model is best suited for task-level planning rather than direct high-frequency control.
Bottom line
Gemini Robotics-ER 1.6 represents a meaningful step in making AI models useful for physical world tasks. The instrument reading capability alone justifies the update for anyone working in industrial automation. The safety improvements and multi-view reasoning advances make it a practical choice for autonomous inspection and facility management.
For developers, the key takeaway is that Google is making embodied reasoning accessible through standard API channels. You don't need a robotics lab to experiment with ER 1.6 -- a Gemini API key and some images of your environment are enough to start.