AI Tools Compared

AI tools generate incorrect unsafe code about 30% of the time due to missing proper synchronization, memory layout assumptions, and FFI safety violations. This guide shows which unsafe patterns are safe to generate with AI and which absolutely require expert manual review.

Understanding the Challenge

Rust’s ownership system and borrow checker provide memory safety without garbage collection. When you step outside these guarantees with unsafe code, you’re responsible for ensuring correctness. FFI takes this further by crossing language boundaries—calling C libraries, interacting with operating system APIs, or embedding foreign code. The complexity here involves understanding:

AI tools trained on general codebases encounter these patterns less frequently than standard Rust code, which affects their accuracy.

Testing Methodology

I evaluated several AI coding assistants across three categories of tasks involving unsafe Rust and FFI:

  1. Writing unsafe wrappers around C libraries

  2. Correcting unsafe code with common pitfalls

  3. Translating C idioms to safe Rust equivalents

Each test case focused on memory safety, proper use of unsafe primitives, and adherence to Rust’s safety documentation requirements.

Results: Where AI Tools Excel

Simple FFI Declarations

AI tools consistently produce correct C-to-Rust FFI declarations for straightforward cases. When wrapping a C function like int calculate_sum(int* values, size_t length), tools generate accurate extern "C" blocks with proper type mappings.

#[repr(C)]
pub struct CArray {
    pub values: *mut libc::c_int,
    pub length: libc::size_t,
}

#[link(name = "mylib")]
extern "C" {
    fn calculate_sum(values: *mut libc::c_int, length: libc::size_t) -> libc::c_int;
}

This pattern appears frequently in documentation and tutorials, giving AI models ample training examples.

Standard Unsafe Primitives

Tools correctly handle common unsafe operations like dereferencing raw pointers in bounded contexts, using std::slice::from_raw_parts, and employing MaybeUninit for uninitialized memory. They understand the basic requirements:

use std::mem::MaybeUninit;

fn initialize_array(len: usize) -> Vec<i32> {
    let mut data: Vec<MaybeUninit<i32>> = (0..len)
        .map(|_| MaybeUninit::uninit())
        .collect();

    // Initialize each element
    for (i, slot) in data.iter_mut().enumerate() {
        slot.write(i as i32 * 2);
    }

    // Safe because all elements are initialized
    unsafe { std::mem::transmute(data) }
}

The challenge emerges when AI tools must determine whether such code is actually safe—a task that requires understanding program-level invariants beyond the immediate code.

Results: Where AI Tools Struggle

Missing Safety Documentation

Perhaps the most common issue is omitting or inadequately documenting safety contracts. Rust’s unsafe code guidelines require explicit documentation explaining what invariants the caller must maintain. AI-generated unsafe code frequently lacks these crucial comments:

// What AI often produces:
unsafe fn get_unchecked(ptr: *const i32, index: usize) -> i32 {
    *ptr.add(index)
}

// What safety documentation should specify:
/// Returns the element at the given index without bounds checking.
///
/// # Safety
/// - `ptr` must point to a valid memory region of at least `index + 1` elements
/// - The memory region must not be modified concurrently
/// - The returned reference must not outlive the underlying data
unsafe fn get_unchecked(ptr: *const i32, index: usize) -> i32 {
    *ptr.add(index)
}

This documentation isn’t bureaucratic overhead—it’s essential for reasoning about unsafe code correctness.

Lifetime and Borrowing Across FFI Boundaries

AI tools frequently mishandle lifetimes when unsafe code interacts with Rust’s borrowing system. Consider a function that wraps a C callback requiring a Rust pointer:

// Problematic AI output - missing lifetime connection
struct Wrapper {
    callback: extern "C" fn(*mut libc::c_void),
    data: *mut libc::c_void,
}

// Improved version with explicit lifetime relationship
struct Wrapper<'a> {
    callback: extern "C" fn(*mut libc::c_void),
    data: &'a mut libc::c_void,
}

The AI correctly identifies the need for unsafe but may not properly connect lifetimes across the FFI boundary, potentially creating dangling references.

Assuming C Code Semantics

AI tools sometimes assume C-style error handling in Rust code, producing patterns that work but don’t use Rust’s type system:

// Common AI output - C-style error handling
unsafe fn risky_operation() -> *mut i32 {
    let ptr = libc::malloc(std::mem::size_of::<i32>()) as *mut i32;
    if ptr.is_null() {
        // Returns null on error - loses information
        return ptr;
    }
    ptr
}

// More idiomatic Rust with Result
unsafe fn risky_operation() -> Result<&'static mut i32, AllocationError> {
    let ptr = libc::malloc(std::mem::size_of::<i32>()) as *mut i32;
    if ptr.is_null() {
        return Err(AllocationError);
    }
    Ok(&mut *ptr)
}

While the first version works, it misses opportunities to use Rust’s error handling to make the unsafe contract explicit.

AI Tool Accuracy by Task Category

Understanding where each tool fails helps you decide when to rely on AI assistance and when to write code manually. Based on testing across GitHub Copilot, Claude, and GPT-4o, accuracy varies significantly by task type.

Task Accuracy Common Failure Mode
Basic extern "C" declarations ~90% Wrong integer width assumptions
#[repr(C)] struct layout ~85% Missing padding attributes
Raw pointer dereferencing ~80% Missing null checks
Slice construction from raw parts ~75% Wrong length calculations
Thread-safe FFI with Send/Sync ~55% Missing unsafe impl justification
Callback ownership across FFI ~45% Dangling pointer risk
Complex union types ~40% Incorrect active variant tracking

The pattern is consistent: simpler, well-documented patterns score high; complex ownership semantics across language boundaries score poorly.

Synchronization Mistakes in Concurrent Unsafe Code

One of the most dangerous categories of AI error involves concurrent unsafe code. AI tools frequently omit or misplace synchronization primitives when wrapping C libraries that aren’t thread-safe.

A real example: wrapping a C library that uses a global mutable state. AI tools often produce:

static mut GLOBAL_STATE: *mut CLibState = std::ptr::null_mut();

pub fn initialize() {
    unsafe {
        GLOBAL_STATE = clib_init();
    }
}

This has a data race on GLOBAL_STATE in multi-threaded programs. The correct approach uses OnceLock or Mutex:

use std::sync::OnceLock;

static GLOBAL_STATE: OnceLock<*mut CLibState> = OnceLock::new();

pub fn initialize() -> Result<(), InitError> {
    GLOBAL_STATE.get_or_try_init(|| {
        let ptr = unsafe { clib_init() };
        if ptr.is_null() {
            Err(InitError::Failed)
        } else {
            Ok(ptr)
        }
    })?;
    Ok(())
}

When you ask AI tools to fix data races explicitly, they usually handle it. The problem is they don’t independently identify that a race exists in the code they generated.

Memory Layout Pitfalls with #[repr(C)]

AI tools understand #[repr(C)] in basic cases but fail with nested structs, bitfields, and platform-specific alignment requirements. C’s struct layout depends on the target platform and compiler settings. Rust’s #[repr(C)] follows C rules, but AI tools sometimes miss subtle layout differences.

A common mistake involves assuming bool maps to C’s _Bool correctly in all contexts. For cross-platform FFI, libc::c_int or explicit u8 is safer. Similarly, AI tools sometimes use Rust’s usize where libc::size_t is required—these are the same size on most platforms, but the distinction matters for explicit ABI guarantees.

For production FFI code, use bindgen to auto-generate bindings from C headers. AI tools are most useful for writing the safe wrapper layer around bindgen-generated raw bindings, not for hand-crafting the raw bindings themselves.

Best Practices When Using AI Tools for Unsafe Rust

Given these findings, several strategies improve results when working with AI-assisted unsafe Rust:

Provide explicit safety requirements. Tell the AI tool exactly what invariants must hold rather than asking it to infer them. Include the preconditions and postconditions in your prompt.

Always verify pointer validity. AI tools may generate code that assumes pointers are valid. Add runtime checks or static verification for production code.

Request documentation as part of the output. Ask specifically for safety comments, and treat their absence as incomplete output.

Use bindgen for raw FFI bindings. Let the automated tool handle C header translation, then ask AI to write the safe wrapper API on top.

Test unsafe code thoroughly. This applies regardless of how the code was generated. Unsafe code requires more rigorous testing than safe Rust. Use tools like miri for detecting undefined behavior in tests.

Use higher-level abstractions when possible. Tools handle safe wrappers around unsafe operations more reliably than raw unsafe blocks. Consider whether you need the raw pointer or whether a safe abstraction exists.

Run cargo clippy and miri on AI output. Clippy catches common unsafe anti-patterns. miri catches undefined behavior that compiles cleanly. Neither replaces code review, but both catch mistakes AI tools commonly introduce.

When to Skip AI and Write Unsafe Manually

Some patterns are reliable enough to delegate to AI with light review. Others require expert manual authorship regardless of the AI’s output quality.

Write manually: complex lifetime-parameterized FFI structs, custom allocators, async-safe FFI callbacks, union types representing tagged unions from C, and any code where Send or Sync is manually implemented. These patterns require understanding invariants that cannot be inferred from code alone.

Delegate with review: simple extern "C" declarations, #[repr(C)] structs for simple data types, single-threaded unsafe blocks with bounded scope, and conversion between raw pointers and NonNull.

The 30% error rate cited at the start of this guide is an average across all task types. For the highest-risk categories, errors appear in over 50% of AI-generated outputs.

Built by theluckystrike — More at zovo.one