Multimodal coding tasks — where you provide an image or document alongside a text prompt — are a growing part of developer workflows. Uploading a database schema diagram and asking for the ORM models, sharing a UI screenshot and asking for the React component, or providing an architecture diagram and asking for Kubernetes config. This guide tests Gemini 1.5 Pro and Claude Opus on these specific developer tasks.
Test Setup
All tests use:
- Gemini 1.5 Pro via Google AI Studio API
- Claude Opus 4 via Anthropic API
- Real development tasks, not synthetic benchmarks
- Evaluation on correctness, completeness, and code quality
Task 1: Database ERD to ORM Models
Input: A database entity-relationship diagram (PNG) showing 5 tables with relationships, indices, and foreign keys.
Prompt: “Generate SQLAlchemy models for all tables shown in this diagram. Include relationships, foreign keys, and the appropriate indices.”
Gemini 1.5 Pro Output
from sqlalchemy import Column, Integer, String, ForeignKey, DateTime, Index
from sqlalchemy.orm import relationship, declarative_base
from datetime import datetime
Base = declarative_base()
class User(Base):
__tablename__ = 'users'
id = Column(Integer, primary_key=True)
email = Column(String(255), unique=True, nullable=False)
created_at = Column(DateTime, default=datetime.utcnow)
orders = relationship('Order', back_populates='user')
class Product(Base):
__tablename__ = 'products'
id = Column(Integer, primary_key=True)
name = Column(String(255), nullable=False)
price = Column(Integer, nullable=False) # cents
category_id = Column(Integer, ForeignKey('categories.id'))
category = relationship('Category', back_populates='products')
# ... continued
Assessment: Gemini read the diagram accurately, identified all tables and relationships, and used correct SQLAlchemy syntax. It missed the composite index shown in the diagram (idx_orders_user_created on (user_id, created_at)).
Claude Opus Output
Claude’s output included the missing composite index and added a note about it:
# Claude additionally generated:
class Order(Base):
__tablename__ = 'orders'
__table_args__ = (
Index('idx_orders_user_created', 'user_id', 'created_at'), # From diagram
)
id = Column(Integer, primary_key=True)
user_id = Column(Integer, ForeignKey('users.id'), nullable=False)
status = Column(String(50), nullable=False)
created_at = Column(DateTime, default=datetime.utcnow)
user = relationship('User', back_populates='orders')
items = relationship('OrderItem', back_populates='order', cascade='all, delete-orphan')
Claude also added cascade='all, delete-orphan' on the OrderItem relationship, which is visible in the ERD (the diamond notation indicates a composition relationship). Gemini missed this.
Winner: Claude — more careful diagram reading, caught subtle annotation details.
Task 2: Screenshot to React Component
Input: A screenshot of a notification dropdown UI (bell icon, unread badge, list of notifications with icons, timestamps, and a “Mark all read” button).
Prompt: “Convert this screenshot to a React TypeScript component using Tailwind CSS.”
Both models produced working components. The difference was in fidelity:
Gemini generated a correct component structure but used approximate styling:
// Gemini: approximated colors and spacing
<div className="bg-white rounded-lg shadow-lg w-80 p-4">
<div className="flex justify-between mb-3">
<h3 className="font-semibold">Notifications</h3>
<button className="text-blue-500 text-sm">Mark all read</button>
</div>
Claude matched the screenshot more precisely:
// Claude: exact color values and spacing matching the screenshot
<div className="bg-white rounded-xl shadow-xl w-[320px] border border-gray-100">
<div className="flex items-center justify-between px-4 py-3 border-b border-gray-100">
<div className="flex items-center gap-2">
<h3 className="text-[15px] font-semibold text-gray-900">Notifications</h3>
<span className="bg-blue-100 text-blue-700 text-xs font-medium px-2 py-0.5 rounded-full">3</span>
</div>
<button className="text-[13px] text-blue-600 hover:text-blue-700 font-medium">
Mark all read
</button>
</div>
Claude noticed the unread count badge next to the title (Gemini missed it), used the exact border color visible in the screenshot, and matched the font size more closely.
Winner: Claude — better visual fidelity in component generation.
Task 3: Architecture Diagram to Kubernetes Config
Input: An AWS architecture diagram (PNG) showing: ALB → ECS cluster (2 services) → RDS → ElastiCache, with VPC subnets and security groups visible.
Prompt: “Generate Kubernetes manifests that implement the architecture shown in this diagram, adapted for Kubernetes (ALB → Ingress, ECS → Deployments, ElastiCache → Redis).”
Gemini’s Approach:
Gemini recognized all components and generated correct Kubernetes manifests:
# Gemini output
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-service
spec:
replicas: 2
selector:
matchLabels:
app: api-service
template:
spec:
containers:
- name: api
image: api-service:latest
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: db-secret
key: url
- name: REDIS_URL
valueFrom:
secretKeyRef:
name: redis-secret
key: url
Claude’s Approach:
Claude generated the same manifests but added a NetworkPolicy based on the security group rules visible in the diagram, and noted which assumptions it made:
# Claude also generated:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: api-service-netpol
spec:
podSelector:
matchLabels:
app: api-service
policyTypes:
- Ingress
- Egress
ingress:
- from:
- podSelector:
matchLabels:
app: nginx-ingress
ports:
- port: 8080
egress:
- to:
- podSelector:
matchLabels:
app: postgres
ports:
- port: 5432
- to:
- podSelector:
matchLabels:
app: redis
ports:
- port: 6379
Claude noticed the security group arrows in the diagram and translated them to NetworkPolicies — a detail that significantly affects security posture.
Winner: Claude — translated architecture constraints, not just topology.
Task 4: API Documentation PDF to SDK Code
Input: A 12-page PDF of an API reference for a payment gateway.
Prompt: “Generate a Python SDK for this payment API with typed request/response models and proper error handling.”
Gemini: With its 1M token context window, Gemini read the entire PDF correctly and generated a complete SDK. Generation time: ~45 seconds.
Claude: Also read the PDF completely. Generation time: ~60 seconds.
Both generated similarly complete SDKs. Gemini’s larger context window (1M vs Claude’s 200K) would matter for very long documents, but most API PDFs are under 50 pages.
For SDK generation, both are equivalent. Gemini’s edge is for very long documents.
Performance Summary
| Task | Gemini 1.5 Pro | Claude Opus |
|---|---|---|
| ERD → ORM | Good | Better (catches details) |
| Screenshot → UI code | Good | Better (color fidelity) |
| Architecture → K8s | Good | Better (security constraints) |
| PDF → SDK | Excellent | Excellent |
| Very long documents (>100p) | Better (1M context) | Good |
| Latency | Faster (30-40% faster) | Slower |
| Cost | Similar | Similar |
Workflow Recommendation
For UI and diagram tasks where visual fidelity matters, use Claude. For long document processing (large API docs, technical specifications), Gemini’s larger context window gives it an advantage. Both are far better than GPT-4V for reading technical diagrams with text annotations.
# Route by task type
def multimodal_coding_task(image_path, prompt, task_type):
if task_type == 'long_document':
# Use Gemini for documents > 50 pages
return gemini_analyze(image_path, prompt)
else:
# Use Claude for diagrams, screenshots, architecture
return claude_analyze(image_path, prompt)
Related Reading
- Best AI Tools for Generating CSS from Designs
- Which AI Generates Better Swift UI Views from Design Specs
- AI Coding Assistant Comparison for React Component Generation
Built by theluckystrike — More at zovo.one