China Golden Shield Project How Censorship Detection Works

The Great Firewall detects VPN traffic through multiple layers: deep packet inspection (DPI) identifies protocol signatures, DNS poisoning blocks forbidden domains, SNI filtering inspects unencrypted domain names during TLS handshakes, and active probing tests suspected servers. Even encrypted traffic reveals metadata through packet size patterns, timing analysis, and connection duration that machine learning classifiers can use to identify circumvention tools. Understanding these detection mechanisms is essential for developers building resilient privacy applications and power users seeking effective evasion strategies.

Core Detection Mechanisms

The GFW operates at multiple network layers, combining several detection techniques to identify and block traffic it deems undesirable.

Deep Packet Inspection (DPI)

Deep Packet Inspection examines the contents of network packets beyond just header information. Unlike simple port blocking, DPI analyzes payload data in real-time to identify protocols, applications, and forbidden content.

How DPI Works in Practice:

When you send data through a network path that passes through GFW inspection points, your packets get captured and analyzed. The system looks for specific patterns in unencrypted traffic or attempts to identify protocol fingerprints in encrypted traffic.

Here’s a conceptual example of how packet inspection might identify a protocol:

import re

def detect_shadowsocks_packet(packet_data):
    """
    Simplified example showing pattern detection logic.
    Real GFW uses more sophisticated machine learning and
    protocol fingerprinting techniques.
    """
    # Shadowsocks traffic has characteristic patterns
    shadowsocks_patterns = [
        rb'\x03\x00',  # Common header pattern
        rb'\x05\x01\x00',  # SOCKS5 greeting
    ]

    for pattern in shadowsocks_patterns:
        if pattern in packet_data:
            return True, "Shadowsocks protocol detected"

    return False, None

def detect_openvpn_packet(packet_data):
    """OpenVPN has distinct packet characteristics."""
    # OpenVPN packet headers have specific markers
    if packet_data[:2] in [b'\x00\x12', b'\x00\x14']:
        if b'OpenVPN' in packet_data or len(packet_data) > 100:
            return True, "OpenVPN traffic detected"
    return False, None

The actual GFW uses much more sophisticated pattern matching, including:

Protocol fingerprinting: Identifying traffic by examining packet sizes, timing, and structure
Statistical analysis: Detecting encrypted traffic that doesn’t match expected HTTPS characteristics
Machine learning classifiers: Training models on known circumvention tool traffic patterns

DNS Manipulation and Poisoning

The GFW extensively manipulates DNS responses to block access to forbidden domains. This happens at the DNS resolution level before any connection attempt reaches the target server.

DNS Poisoning Techniques:

# Example: Simulating DNS resolution in Python to demonstrate detection
# This shows what a client might experience

import socket

def resolve_with_fallback(domain):
    """
    Demonstrates how DNS poisoning affects resolution.
    In China, certain domains return incorrect IPs or timeout.
    """
    try:
        # First attempt - may return poisoned result
        ip = socket.gethostbyname(domain)

        # Check if IP is in known blocked ranges
        blocked_ranges = ['104.16.0.0/12', '172.16.0.0/12']

        return ip
    except socket.gaierror:
        return None

# Common blocked domains returnNXDOMAIN or wrong IPs
# Example: google.com, facebook.com, twitter.com from within China

Types of DNS-based Blocking:

NXDOMAIN Injection: Returning “domain does not exist” for blocked domains
Sinkholing: Returning IP addresses that point to blocking infrastructure
TTL Manipulation: Setting very short TTLs to force frequent re-resolution
Selective Dropping: Simply not responding to DNS queries for certain domains

SNI Filtering

Server Name Indication (SNI) is a TLS extension that indicates which hostname the client wants to connect to. The GFW inspects SNI fields in TLS handshake packets to block connections to forbidden domains—even when the connection content is encrypted.

Practical Impact:

When you establish a TLS connection, the SNI field is sent in plaintext during the handshake:

# This is what the GFW sees during TLS handshake
def extract_sni_from_tls_packet(packet):
    """
    TLS Client Hello contains SNI as Server Name Indication.
    This field is visible to network observers.
    """
    # TLS record header
    # TLS handshake type (0x01 = ClientHello)
    # SNI extension type (0x0000)
    # SNI list length and hostname

    # The GFW can read this and match against blocklists
    blocked_snis = [
        'google.com',
        '*.google.com',
        'facebook.com',
        'twitter.com',
        'youtube.com',
        't.co',
    ]

    # If SNI matches blocked list, connection gets terminated
    return sni_matches(sni, blocked_snis)

This means simply using HTTPS isn’t sufficient for bypassing the GFW—the hostname itself becomes detectable.

URL Filtering and Keyword Detection

Beyond DPI and DNS, the GFW maintains keyword blocklists that trigger connection termination when specific terms appear in HTTP requests or even HTTPS metadata.

Detection Targets:

URL paths: /tweet, /facebook, /youtube
Query parameters: Search queries containing sensitive terms
HTTP headers: User-Agent strings, Accept-Language headers
Body content: For unencrypted HTTP connections

Traffic Analysis and Behavioral Detection

The GFW doesn’t rely solely on content inspection. It analyzes traffic patterns to identify circumvention工具 (circumvention tools) based on how they behave.

Connection Pattern Analysis

Even perfectly encrypted traffic reveals metadata that can trigger detection:

# Simplified model of traffic pattern analysis
class TrafficAnalyzer:
    def __init__(self):
        self.packet_sizes = []
        self.timing_intervals = []

    def analyze_connection(self, packets):
        """
        The GFW analyzes:
        - Packet size distribution
        - Timing between packets
        - Ratio of sent to received data
        - Connection duration
        - Number of concurrent connections
        """

        # Tor-like patterns: small fixed-size packets, regular timing
        if self.is_tor_pattern(packets):
            return "BLOCK", "Tor-like traffic detected"

        # VPN patterns: large encrypted packets
        if self.is_vpn_pattern(packets):
            return "BLOCK", "VPN protocol detected"

        # WireGuard pattern: very small headers
        if self.is_wireguard_pattern(packets):
            return "BLOCK", "WireGuard detected"

        return "ALLOW", None

    def is_tor_pattern(self, packets):
        """Tor cells are exactly 514 bytes."""
        cell_size = 514
        return all(len(p) == cell_size for p in packets[:10])

Active Probing

The GFW employs active probing—reaching out to suspected servers to test their responses:

Connection testing: GFW connects to suspected proxy servers
Protocol verification: Sends protocol-specific probes
Response analysis: Checks if responses match expected patterns
Blocking: Adds confirmed circumvention servers to blocklists

Practical Implications for Developers

Understanding these detection mechanisms informs how to build more resilient systems:

Traffic Obfuscation Strategies

TLS-based transport hides content but not metadata:

# Using TLS to encrypt traffic - hides content but not SNI
import ssl
import socket

def create_obfuscated_connection(target, port):
    """
    This encrypts your traffic but:
    - SNI is visible in handshake
    - Traffic patterns may still be detectable
    - DPI can sometimes identify TLS fingerprints
    """
    context = ssl.create_default_context()
    conn = context.wrap_socket(
        socket.socket(socket.AF_INET),
        server_hostname=target
    )
    conn.connect((target, port))
    return conn

Protocol layering adds more layers of indirection:

# V2Ray style: WebSocket over TLS over TCP
# The traffic looks like normal HTTPS web browsing
# But timing and other metadata can still reveal it

Recommended Approaches for 2026

For developers building applications that need to work in censored environments:

Domain fronting: Using allowed CDNs to proxy traffic
Meek-like techniques: Hiding traffic inside legitimate service connections
Regular protocol rotation: Changing protocols to avoid blocklists
Custom TLS fingerprints: Making traffic appear as common browsers

Built by theluckystrike — More at zovo.one