AI Tools Compared

Multimodal coding tasks — where you provide an image or document alongside a text prompt — are a growing part of developer workflows. Uploading a database schema diagram and asking for the ORM models, sharing a UI screenshot and asking for the React component, or providing an architecture diagram and asking for Kubernetes config. This guide tests Gemini 1.5 Pro and Claude Opus on these specific developer tasks.

Test Setup

All tests use:

Task 1: Database ERD to ORM Models

Input: A database entity-relationship diagram (PNG) showing 5 tables with relationships, indices, and foreign keys.

Prompt: “Generate SQLAlchemy models for all tables shown in this diagram. Include relationships, foreign keys, and the appropriate indices.”

Gemini 1.5 Pro Output

from sqlalchemy import Column, Integer, String, ForeignKey, DateTime, Index
from sqlalchemy.orm import relationship, declarative_base
from datetime import datetime

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    email = Column(String(255), unique=True, nullable=False)
    created_at = Column(DateTime, default=datetime.utcnow)
    orders = relationship('Order', back_populates='user')

class Product(Base):
    __tablename__ = 'products'
    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)
    price = Column(Integer, nullable=False)  # cents
    category_id = Column(Integer, ForeignKey('categories.id'))
    category = relationship('Category', back_populates='products')

# ... continued

Assessment: Gemini read the diagram accurately, identified all tables and relationships, and used correct SQLAlchemy syntax. It missed the composite index shown in the diagram (idx_orders_user_created on (user_id, created_at)).

Claude Opus Output

Claude’s output included the missing composite index and added a note about it:

# Claude additionally generated:
class Order(Base):
    __tablename__ = 'orders'
    __table_args__ = (
        Index('idx_orders_user_created', 'user_id', 'created_at'),  # From diagram
    )
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('users.id'), nullable=False)
    status = Column(String(50), nullable=False)
    created_at = Column(DateTime, default=datetime.utcnow)
    user = relationship('User', back_populates='orders')
    items = relationship('OrderItem', back_populates='order', cascade='all, delete-orphan')

Claude also added cascade='all, delete-orphan' on the OrderItem relationship, which is visible in the ERD (the diamond notation indicates a composition relationship). Gemini missed this.

Winner: Claude — more careful diagram reading, caught subtle annotation details.

Task 2: Screenshot to React Component

Input: A screenshot of a notification dropdown UI (bell icon, unread badge, list of notifications with icons, timestamps, and a “Mark all read” button).

Prompt: “Convert this screenshot to a React TypeScript component using Tailwind CSS.”

Both models produced working components. The difference was in fidelity:

Gemini generated a correct component structure but used approximate styling:

// Gemini: approximated colors and spacing
<div className="bg-white rounded-lg shadow-lg w-80 p-4">
  <div className="flex justify-between mb-3">
    <h3 className="font-semibold">Notifications</h3>
    <button className="text-blue-500 text-sm">Mark all read</button>
  </div>

Claude matched the screenshot more precisely:

// Claude: exact color values and spacing matching the screenshot
<div className="bg-white rounded-xl shadow-xl w-[320px] border border-gray-100">
  <div className="flex items-center justify-between px-4 py-3 border-b border-gray-100">
    <div className="flex items-center gap-2">
      <h3 className="text-[15px] font-semibold text-gray-900">Notifications</h3>
      <span className="bg-blue-100 text-blue-700 text-xs font-medium px-2 py-0.5 rounded-full">3</span>
    </div>
    <button className="text-[13px] text-blue-600 hover:text-blue-700 font-medium">
      Mark all read
    </button>
  </div>

Claude noticed the unread count badge next to the title (Gemini missed it), used the exact border color visible in the screenshot, and matched the font size more closely.

Winner: Claude — better visual fidelity in component generation.

Task 3: Architecture Diagram to Kubernetes Config

Input: An AWS architecture diagram (PNG) showing: ALB → ECS cluster (2 services) → RDS → ElastiCache, with VPC subnets and security groups visible.

Prompt: “Generate Kubernetes manifests that implement the architecture shown in this diagram, adapted for Kubernetes (ALB → Ingress, ECS → Deployments, ElastiCache → Redis).”

Gemini’s Approach:

Gemini recognized all components and generated correct Kubernetes manifests:

# Gemini output
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api-service
  template:
    spec:
      containers:
      - name: api
        image: api-service:latest
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-secret
              key: url
        - name: REDIS_URL
          valueFrom:
            secretKeyRef:
              name: redis-secret
              key: url

Claude’s Approach:

Claude generated the same manifests but added a NetworkPolicy based on the security group rules visible in the diagram, and noted which assumptions it made:

# Claude also generated:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-service-netpol
spec:
  podSelector:
    matchLabels:
      app: api-service
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: nginx-ingress
    ports:
    - port: 8080
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: postgres
    ports:
    - port: 5432
  - to:
    - podSelector:
        matchLabels:
          app: redis
    ports:
    - port: 6379

Claude noticed the security group arrows in the diagram and translated them to NetworkPolicies — a detail that significantly affects security posture.

Winner: Claude — translated architecture constraints, not just topology.

Task 4: API Documentation PDF to SDK Code

Input: A 12-page PDF of an API reference for a payment gateway.

Prompt: “Generate a Python SDK for this payment API with typed request/response models and proper error handling.”

Gemini: With its 1M token context window, Gemini read the entire PDF correctly and generated a complete SDK. Generation time: ~45 seconds.

Claude: Also read the PDF completely. Generation time: ~60 seconds.

Both generated similarly complete SDKs. Gemini’s larger context window (1M vs Claude’s 200K) would matter for very long documents, but most API PDFs are under 50 pages.

For SDK generation, both are equivalent. Gemini’s edge is for very long documents.

Performance Summary

Task Gemini 1.5 Pro Claude Opus
ERD → ORM Good Better (catches details)
Screenshot → UI code Good Better (color fidelity)
Architecture → K8s Good Better (security constraints)
PDF → SDK Excellent Excellent
Very long documents (>100p) Better (1M context) Good
Latency Faster (30-40% faster) Slower
Cost Similar Similar

Workflow Recommendation

For UI and diagram tasks where visual fidelity matters, use Claude. For long document processing (large API docs, technical specifications), Gemini’s larger context window gives it an advantage. Both are far better than GPT-4V for reading technical diagrams with text annotations.

# Route by task type
def multimodal_coding_task(image_path, prompt, task_type):
    if task_type == 'long_document':
        # Use Gemini for documents > 50 pages
        return gemini_analyze(image_path, prompt)
    else:
        # Use Claude for diagrams, screenshots, architecture
        return claude_analyze(image_path, prompt)

Built by theluckystrike — More at zovo.one