Privacy Tools Guide

When you make a call through a SIP (Session Initiation Protocol) provider, the call itself is only part of the story. The metadata generated during call setup, routing, and termination creates an extensive trail that reveals who you called, when, for how long, and often where you were at the time. Understanding what providers log is essential for anyone building privacy-sensitive communications systems or selecting a VoIP service.

SIP Call Logging Fundamentals

SIP operates in two distinct layers: signaling and media. The signaling layer (using SIP messages) handles call setup, teardown, and modification. The media layer (using RTP - Real-time Transport Protocol) carries the actual audio. Most providers log both, but the signaling logs contain the most sensitive metadata.

Every SIP call begins with an INVITE request, followed by a 200 OK response, and concludes with a BYE request. These messages travel through proxy servers operated by your SIP provider, creating detailed logs regardless of whether the call content itself is encrypted.

What Providers Actually Log

Connection Metadata

When your SIP client registers with a provider, it sends a REGISTER request containing your SIP URI, contact address, and often the IP address of your device. Providers retain this registration data along with timestamps showing when you came online and went offline.

REGISTER sip:sipprovider.com SIP/2.0
Via: SIP/2.0/UDP 192.168.1.100:5060;rport;branch=z9hG4bK
From: <sip:1234567890@sipprovider.com>;tag=as7f3d9b2
To: <sip:1234567890@sipprovider.com>
Contact: <sip:1234567890@192.168.1.100:5060>
Expires: 3600

This single registration message reveals your phone number (in the SIP URI), your device’s IP address, and with basic traffic analysis, your approximate physical location. Providers typically retain registration history for months or years.

Call Detail Records (CDRs)

Every completed call generates a Call Detail Record containing:

These CDRs form the backbone of billing systems but also represent a record of your communication patterns. Law enforcement regularly requests CDR data from VoIP providers as part of investigations.

###SIP Message Headers

Beyond basic CDRs, providers capture the full SIP message exchange:

# Example: Parsing SIP headers from a pcap capture
from scapy.all import *

def extract_sip_metadata(pcap_file):
    packets = rdpcap(pcap_file)
    for pkt in packets:
        if pkt.haslayer(UDP) and pkt.haslayer(Raw):
            try:
                sip_data = pkt[Raw].load.decode('utf-8', errors='ignore')
                if 'INVITE' in sip_data or 'BYE' in sip_data or 'CANCEL' in sip_data:
                    from_email = re.search(r'From: <sip:([^@]+)@', sip_data)
                    to_email = re.search(r'To: <sip:([^@]+)@', sip_data)
                    call_id = re.search(r'Call-ID: ([^\s]+)', sip_data)
                    print(f"From: {from_email.group(1) if from_email else 'Unknown'}")
                    print(f"To: {to_email.group(1) if to_email else 'Unknown'}")
                    print(f"Call-ID: {call_id.group(1) if call_id else 'Unknown'}")
            except:
                pass

This script demonstrates how trivially the Call-ID header links all SIP messages belonging to a single call. Providers use this to reconstruct complete call flows.

Media Relay Information

When calls go through provider-operated media relays (often necessary for NAT traversal), the provider knows:

Even with encrypted SRTP (Secure RTP), the metadata about who exchanged media with whom remains visible to the provider.

Provider Retention Practices

Retention periods vary significantly by provider and jurisdiction:

Provider Type Typical Retention Data Shared with Third Parties
Major US VoIP (Comcast, AT&T) 3-7 years Law enforcement, subpoenas
Business SIP providers 1-5 years Often sold to data brokers
Consumer VoIP (Google Voice) Indefinite Ad networks, law enforcement
Privacy-focused SIP 0-30 days Only with warrant
Self-hosted Asterisk/FreePBX You decide Your choice

The “free” consumer VoIP services often have the most aggressive retention because your usage data is monetized through advertising and data sharing.

Practical Implications for Developers

Building Privacy-Aware SIP Applications

When implementing SIP-based applications, you have several options to minimize logging exposure:

# Minimal logging configuration for Asterisk
# /etc/asterisk/logger.conf

[general]
; Disable detailed logging to files
; Instead, only log warnings and errors
; This reduces the metadata footprint

[logfiles]
; Console logging at notice level (minimal)
console => notice,warning,error

; File logging disabled or minimal
; full => /var/log/asterisk/full
; messages => /var/log/asterisk/messages
# Python script to query your own Asterisk CDR database
# showing what data your system collects

import sqlite3

def show_cdr_data(db_path='/var/log/asterisk/cdr.db'):
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()

    # Show what columns are being recorded
    cursor.execute("PRAGMA table_info(cdr)")
    columns = cursor.fetchall()
    print("CDR Columns stored:")
    for col in columns:
        print(f"  - {col[1]} ({col[2]})")

    # Show sample data (sanitized)
    cursor.execute("SELECT src, dst, duration, calldate FROM cdr LIMIT 5")
    print("\nSample CDR data:")
    for row in cursor.fetchall():
        print(f"  From: {row[0]} -> To: {row[1]}, Duration: {row[2]}s, Date: {row[3]}")

    conn.close()

if __name__ == '__main__':
    show_cdr_data()

This demonstrates that even self-hosted systems capture substantial data. The solution is intentional log rotation and sanitization:

import sqlite3
import time
from datetime import datetime, timedelta

def sanitize_cdr_for_privacy(db_path='/var/log/asterisk/cdr.db',
                              keep_days=7):
    """
    Remove sensitive CDR data after retention period
    while preserving aggregate statistics
    """
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()

    cutoff_date = datetime.now() - timedelta(days=keep_days)

    # Delete detailed records older than retention period
    cursor.execute(
        "DELETE FROM cdr WHERE calldate < ?",
        (cutoff_date.strftime('%Y-%m-%d'),)
    )

    deleted = cursor.rowcount
    conn.commit()
    conn.close()

    print(f"Deleted {deleted} CDR records older than {keep_days} days")

# Run daily via cron: 0 3 * * * /usr/bin/python3 /path/to/sanitize_cdr.py

Evaluating Third-Party SIP Providers

When selecting a provider for production systems, request answers to these questions:

  1. How long are CDRs retained? - Less than 30 days is reasonable for privacy
  2. Is call content logged? - Should be no for encrypted calls
  3. Is metadata shared with advertisers or data brokers? - Must be no
  4. Do you support SRTP encryption? - Mandatory for privacy
  5. Can I use my own SIP domain? - Prevents number-to-identity linking
  6. What happens to data if I close my account? - Must be deleted within 30 days

Providers like Twilio, Vonage, and Bandwidth are designed for business compliance and retain extensive records. For privacy-sensitive applications, consider self-hosting or providers specifically marketed as privacy-focused.

Mitigating SIP Metadata Exposure

Complete metadata elimination is difficult because some information exchange is required for call delivery. However, several strategies reduce exposure:

The reality is that SIP was designed for functionality, not privacy. Understanding what your provider logs is the first step toward making informed decisions about the services you use and the systems you build.

Built by theluckystrike — More at zovo.one