AI tools generate incorrect unsafe code about 30% of the time, typically through missing synchronization, faulty memory layout assumptions, and FFI safety violations. This guide shows which unsafe patterns are safe to generate with AI and which absolutely require expert manual review.
## Understanding the Challenge
Rust’s ownership system and borrow checker provide memory safety without garbage collection. When you step outside these guarantees with unsafe code, you’re responsible for ensuring correctness. FFI takes this further by crossing language boundaries—calling C libraries, interacting with operating system APIs, or embedding foreign code. The complexity here involves understanding:
- Pointer arithmetic and lifetime relationships across unsafe boundaries
- Rust's safety invariants that must hold even when the compiler can't verify them
- ABI compatibility between Rust and foreign calling conventions
- Proper error handling when dealing with C APIs that lack Rust's safety guarantees
AI tools trained on general codebases encounter these patterns less frequently than standard Rust code, which affects their accuracy.
## Testing Methodology
I evaluated several AI coding assistants across three categories of tasks involving unsafe Rust and FFI:
- Writing unsafe wrappers around C libraries
- Correcting unsafe code with common pitfalls
- Translating C idioms to safe Rust equivalents
Each test case focused on memory safety, proper use of unsafe primitives, and adherence to Rust’s safety documentation requirements.
## Results: Where AI Tools Excel

### Simple FFI Declarations
AI tools consistently produce correct C-to-Rust FFI declarations for straightforward cases. When wrapping a C function like `int calculate_sum(int* values, size_t length)`, tools generate accurate `extern "C"` blocks with proper type mappings.
```rust
#[repr(C)]
pub struct CArray {
    pub values: *mut libc::c_int,
    pub length: libc::size_t,
}

#[link(name = "mylib")]
extern "C" {
    fn calculate_sum(values: *mut libc::c_int, length: libc::size_t) -> libc::c_int;
}
```
This pattern appears frequently in documentation and tutorials, giving AI models ample training examples.
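A safe wrapper over such a declaration is also where AI assistance pays off. The sketch below substitutes a Rust implementation for `calculate_sum` so it runs without linking `mylib`; the wrapper shape is the same either way:

```rust
// Stand-in for the C function so the example runs without linking mylib;
// with a real library this would be the extern "C" declaration instead.
unsafe extern "C" fn calculate_sum(values: *mut i32, length: usize) -> i32 {
    // SAFETY: callers must pass a valid pointer/length pair.
    unsafe { std::slice::from_raw_parts(values, length).iter().sum() }
}

// The safe wrapper converts a slice into the pointer/length pair the
// C ABI expects, so callers never touch raw pointers directly.
fn sum(values: &mut [i32]) -> i32 {
    // SAFETY: the slice guarantees a valid pointer and exact length,
    // and the borrow keeps the memory alive across the call.
    unsafe { calculate_sum(values.as_mut_ptr(), values.len()) }
}

fn main() {
    let mut v = [1, 2, 3, 4];
    assert_eq!(sum(&mut v), 10);
}
```

Keeping the `unsafe` block inside one small wrapper function is the pattern AI tools reproduce most reliably.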
### Standard Unsafe Primitives

Tools correctly handle common unsafe operations like dereferencing raw pointers in bounded contexts, using `std::slice::from_raw_parts`, and employing `MaybeUninit` for uninitialized memory. They understand the basic requirements:
```rust
use std::mem::MaybeUninit;

fn initialize_array(len: usize) -> Vec<i32> {
    let mut data: Vec<MaybeUninit<i32>> = (0..len)
        .map(|_| MaybeUninit::uninit())
        .collect();
    // Initialize each element
    for (i, slot) in data.iter_mut().enumerate() {
        slot.write(i as i32 * 2);
    }
    // Safe because all elements are initialized
    unsafe { std::mem::transmute(data) }
}
```
The challenge emerges when AI tools must determine whether such code is actually safe—a task that requires understanding program-level invariants beyond the immediate code.
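Here, for instance, transmuting the `Vec` type itself relies on layout details the language does not guarantee. One sound alternative, sketched below, rebuilds the `Vec` from its raw parts instead:

```rust
use std::mem::{ManuallyDrop, MaybeUninit};

fn initialize_array(len: usize) -> Vec<i32> {
    let mut data: Vec<MaybeUninit<i32>> = (0..len)
        .map(|_| MaybeUninit::uninit())
        .collect();
    for (i, slot) in data.iter_mut().enumerate() {
        slot.write(i as i32 * 2);
    }
    // Prevent the original Vec from freeing the buffer we are adopting.
    let mut data = ManuallyDrop::new(data);
    // SAFETY: every element was initialized above, MaybeUninit<i32> has
    // the same layout as i32, and length/capacity carry over unchanged
    // from the allocation being reused.
    unsafe { Vec::from_raw_parts(data.as_mut_ptr().cast::<i32>(), data.len(), data.capacity()) }
}

fn main() {
    assert_eq!(initialize_array(4), vec![0, 2, 4, 6]);
}
```

AI tools rarely make this distinction on their own; it is exactly the kind of program-level reasoning a reviewer must supply.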
## Results: Where AI Tools Struggle

### Missing Safety Documentation
Perhaps the most common issue is omitting or inadequately documenting safety contracts. Rust’s unsafe code guidelines require explicit documentation explaining what invariants the caller must maintain. AI-generated unsafe code frequently lacks these crucial comments:
```rust
// What AI often produces:
unsafe fn get_unchecked(ptr: *const i32, index: usize) -> i32 {
    *ptr.add(index)
}
```

```rust
// What safety documentation should specify:

/// Returns the element at the given index without bounds checking.
///
/// # Safety
/// - `ptr` must point to a valid memory region of at least `index + 1` elements
/// - The memory region must not be mutated concurrently during the call
/// - The offset `index` must stay within the same allocation and not overflow `isize`
unsafe fn get_unchecked(ptr: *const i32, index: usize) -> i32 {
    *ptr.add(index)
}
```
This documentation isn’t bureaucratic overhead—it’s essential for reasoning about unsafe code correctness.
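The documented contract also shows where `unsafe` should stop: a safe public wrapper can discharge the preconditions once, with a `SAFETY` comment at the call site. A minimal sketch:

```rust
/// Returns the element at `index` without bounds checking.
///
/// # Safety
/// `ptr` must point to at least `index + 1` readable, initialized `i32`s.
unsafe fn get_unchecked(ptr: *const i32, index: usize) -> i32 {
    unsafe { *ptr.add(index) }
}

/// Safe wrapper: the bounds check discharges the precondition.
fn get(data: &[i32], index: usize) -> Option<i32> {
    if index < data.len() {
        // SAFETY: index is in bounds, and the slice borrow keeps the
        // memory valid and un-mutated for the duration of the call.
        Some(unsafe { get_unchecked(data.as_ptr(), index) })
    } else {
        None
    }
}

fn main() {
    let data = [10, 20, 30];
    assert_eq!(get(&data, 1), Some(20));
    assert_eq!(get(&data, 5), None);
}
```
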
### Lifetime and Borrowing Across FFI Boundaries
AI tools frequently mishandle lifetimes when unsafe code interacts with Rust’s borrowing system. Consider a function that wraps a C callback requiring a Rust pointer:
```rust
// Problematic AI output - missing lifetime connection
struct Wrapper {
    callback: extern "C" fn(*mut libc::c_void),
    data: *mut libc::c_void,
}

// Improved version with an explicit lifetime relationship
struct Wrapper<'a> {
    callback: extern "C" fn(*mut libc::c_void),
    data: &'a mut libc::c_void,
}
```
The AI correctly identifies the need for unsafe but may not properly connect lifetimes across the FFI boundary, potentially creating dangling references.
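One way to encode that connection without changing the field types the C API expects is a `PhantomData` marker tying the wrapper to the borrow the pointer came from. A self-contained sketch, using a Rust function as a stand-in for the C callback:

```rust
use std::ffi::c_void;
use std::marker::PhantomData;

struct Wrapper<'a> {
    callback: extern "C" fn(*mut c_void),
    data: *mut c_void,
    // Ties the wrapper to the borrow the raw pointer came from, so the
    // borrow checker rejects use-after-free at compile time.
    _marker: PhantomData<&'a mut i32>,
}

impl<'a> Wrapper<'a> {
    fn new(callback: extern "C" fn(*mut c_void), data: &'a mut i32) -> Self {
        Wrapper { callback, data: (data as *mut i32).cast(), _marker: PhantomData }
    }

    fn invoke(&mut self) {
        (self.callback)(self.data);
    }
}

// Rust stand-in for a C callback; real code would receive this from C.
extern "C" fn double_it(p: *mut c_void) {
    // SAFETY: in this sketch the pointer always originates from the
    // &mut i32 passed to Wrapper::new.
    unsafe { *p.cast::<i32>() *= 2 }
}

fn main() {
    let mut x = 21;
    let mut w = Wrapper::new(double_it, &mut x);
    w.invoke();
    drop(w);
    assert_eq!(x, 42);
}
```

With the marker in place, code that frees or moves the pointee while a `Wrapper` is alive fails to compile.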
### Assuming C Code Semantics
AI tools sometimes assume C-style error handling in Rust code, producing patterns that work but don’t use Rust’s type system:
```rust
// Common AI output - C-style error handling
unsafe fn risky_operation() -> *mut i32 {
    let ptr = libc::malloc(std::mem::size_of::<i32>()) as *mut i32;
    if ptr.is_null() {
        // Returns null on error - loses information
        return ptr;
    }
    ptr
}
```

```rust
// More idiomatic Rust with Result
unsafe fn risky_operation() -> Result<&'static mut i32, AllocationError> {
    let ptr = libc::malloc(std::mem::size_of::<i32>()) as *mut i32;
    if ptr.is_null() {
        return Err(AllocationError);
    }
    // Initialize before creating a reference: a reference to
    // uninitialized memory is undefined behavior.
    ptr.write(0);
    Ok(&mut *ptr)
}
```
While the first version works, it misses opportunities to use Rust’s error handling to make the unsafe contract explicit.
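The same idea extends to cleanup: pairing the fallible allocation with a `Drop` impl makes both the error path and the free path explicit. A sketch using `std::alloc` in place of `libc::malloc` so it is self-contained:

```rust
use std::alloc::{alloc, dealloc, Layout};

#[derive(Debug, PartialEq)]
struct AllocationError;

struct HeapInt {
    ptr: *mut i32,
}

impl HeapInt {
    fn new(value: i32) -> Result<HeapInt, AllocationError> {
        let layout = Layout::new::<i32>();
        // SAFETY: i32 has a non-zero-sized layout, as alloc requires.
        let ptr = unsafe { alloc(layout) } as *mut i32;
        if ptr.is_null() {
            return Err(AllocationError);
        }
        // SAFETY: ptr is non-null and properly aligned; write() initializes
        // the memory without reading the uninitialized contents.
        unsafe { ptr.write(value) };
        Ok(HeapInt { ptr })
    }

    fn get(&self) -> i32 {
        // SAFETY: ptr was initialized in new() and is freed only in drop().
        unsafe { *self.ptr }
    }
}

impl Drop for HeapInt {
    fn drop(&mut self) {
        // SAFETY: ptr was allocated in new() with this exact layout.
        unsafe { dealloc(self.ptr as *mut u8, Layout::new::<i32>()) }
    }
}

fn main() {
    let v = HeapInt::new(7).expect("allocation failed");
    assert_eq!(v.get(), 7);
}
```

The caller never sees a raw pointer, and forgetting to free becomes impossible rather than merely unlikely.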
## AI Tool Accuracy by Task Category
Understanding where each tool fails helps you decide when to rely on AI assistance and when to write code manually. Based on testing across GitHub Copilot, Claude, and GPT-4o, accuracy varies significantly by task type.
| Task | Accuracy | Common Failure Mode |
|---|---|---|
| Basic `extern "C"` declarations | ~90% | Wrong integer width assumptions |
| `#[repr(C)]` struct layout | ~85% | Missing padding attributes |
| Raw pointer dereferencing | ~80% | Missing null checks |
| Slice construction from raw parts | ~75% | Wrong length calculations |
| Thread-safe FFI with `Send`/`Sync` | ~55% | Missing `unsafe impl` justification |
| Callback ownership across FFI | ~45% | Dangling pointer risk |
| Complex union types | ~40% | Incorrect active variant tracking |
The pattern is consistent: simpler, well-documented patterns score high; complex ownership semantics across language boundaries score poorly.
## Synchronization Mistakes in Concurrent Unsafe Code
One of the most dangerous categories of AI error involves concurrent unsafe code. AI tools frequently omit or misplace synchronization primitives when wrapping C libraries that aren’t thread-safe.
A real example: wrapping a C library that relies on global mutable state. AI tools often produce:
```rust
static mut GLOBAL_STATE: *mut CLibState = std::ptr::null_mut();

pub fn initialize() {
    unsafe {
        GLOBAL_STATE = clib_init();
    }
}
```
This has a data race on `GLOBAL_STATE` in multi-threaded programs. The correct approach uses `OnceLock` or a `Mutex`:
```rust
use std::sync::OnceLock;

// Newtype so thread safety is asserted explicitly rather than implied.
struct StatePtr(*mut CLibState);

// SAFETY: the C library documents that the state handle may be shared
// across threads once initialization completes. Without these impls,
// a static containing a raw pointer will not compile.
unsafe impl Send for StatePtr {}
unsafe impl Sync for StatePtr {}

static GLOBAL_STATE: OnceLock<StatePtr> = OnceLock::new();

pub fn initialize() -> Result<(), InitError> {
    // get_or_init is stable; a null result is cached as a failed init.
    let state = GLOBAL_STATE.get_or_init(|| StatePtr(unsafe { clib_init() }));
    if state.0.is_null() {
        Err(InitError::Failed)
    } else {
        Ok(())
    }
}
```
When you ask AI tools to fix data races explicitly, they usually handle it. The problem is they don’t independently identify that a race exists in the code they generated.
## Memory Layout Pitfalls with #[repr(C)]

AI tools understand `#[repr(C)]` in basic cases but fail with nested structs, bitfields, and platform-specific alignment requirements. C's struct layout depends on the target platform and compiler settings. Rust's `#[repr(C)]` follows C rules, but AI tools sometimes miss subtle layout differences.

A common mistake involves assuming `bool` maps to C's `_Bool` correctly in all contexts. For cross-platform FFI, `libc::c_int` or explicit `u8` is safer. Similarly, AI tools sometimes use Rust's `usize` where `libc::size_t` is required. These are the same size on most platforms, but the distinction matters for explicit ABI guarantees.
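Layout assumptions like these can at least be checked mechanically. The sketch below asserts the size and alignment of a `#[repr(C)]` struct; the expected values hold on common 32- and 64-bit targets but are ultimately platform-dependent:

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
struct Header {
    kind: u8,    // 1 byte, then 3 bytes of padding before `length`
    length: u32, // 4-byte aligned on common targets
}

fn main() {
    // A C compiler for the same target computes the same layout,
    // which is exactly what #[repr(C)] guarantees.
    assert_eq!(size_of::<Header>(), 8);
    assert_eq!(align_of::<Header>(), 4);
}
```

Putting assertions like these in a test catches AI-generated layout mistakes before they reach the FFI boundary.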
For production FFI code, use bindgen to auto-generate bindings from C headers. AI tools are most useful for writing the safe wrapper layer around bindgen-generated raw bindings, not for hand-crafting the raw bindings themselves.
## Best Practices When Using AI Tools for Unsafe Rust
Given these findings, several strategies improve results when working with AI-assisted unsafe Rust:
**Provide explicit safety requirements.** Tell the AI tool exactly what invariants must hold rather than asking it to infer them. Include the preconditions and postconditions in your prompt.

**Always verify pointer validity.** AI tools may generate code that assumes pointers are valid. Add runtime checks or static verification for production code.

**Request documentation as part of the output.** Ask specifically for safety comments, and treat their absence as incomplete output.

**Use bindgen for raw FFI bindings.** Let the automated tool handle C header translation, then ask AI to write the safe wrapper API on top.

**Test unsafe code thoroughly.** This applies regardless of how the code was generated. Unsafe code requires more rigorous testing than safe Rust. Use tools like Miri to detect undefined behavior in tests.

**Use higher-level abstractions when possible.** Tools handle safe wrappers around unsafe operations more reliably than raw unsafe blocks. Consider whether you need the raw pointer or whether a safe abstraction exists.

**Run `cargo clippy` and Miri on AI output.** Clippy catches common unsafe anti-patterns; Miri catches undefined behavior that compiles cleanly. Neither replaces code review, but both catch mistakes AI tools commonly introduce.
## When to Skip AI and Write Unsafe Manually
Some patterns are reliable enough to delegate to AI with light review. Others require expert manual authorship regardless of the AI’s output quality.
**Write manually:** complex lifetime-parameterized FFI structs, custom allocators, async-safe FFI callbacks, union types representing tagged unions from C, and any code where `Send` or `Sync` is manually implemented. These patterns require understanding invariants that cannot be inferred from code alone.

**Delegate with review:** simple `extern "C"` declarations, `#[repr(C)]` structs for simple data types, single-threaded unsafe blocks with bounded scope, and conversion between raw pointers and `NonNull`.
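The `NonNull` conversion in that last category is worth showing, since it turns an unchecked invariant into a checked one at the boundary:

```rust
use std::ptr::NonNull;

fn main() {
    let mut x = 5i32;
    let raw: *mut i32 = &mut x;

    // NonNull::new returns None for null, so nullness is checked once
    // at the conversion instead of assumed at every later dereference.
    let nn: NonNull<i32> = NonNull::new(raw).expect("pointer was null");

    // SAFETY: nn came from a live &mut i32 above and is properly aligned.
    unsafe { *nn.as_ptr() += 1 };
    assert_eq!(x, 6);

    assert!(NonNull::new(std::ptr::null_mut::<i32>()).is_none());
}
```
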
The 30% error rate cited at the start of this guide is an average across all task types. For the highest-risk categories, errors appear in over 50% of AI-generated outputs.
## Related Articles
- How Accurate Are AI Tools at Generating Rust Crossbeam
- How Accurate Are AI Tools
- How Accurate Are AI Tools at Rust WASM Compilation and Bindg
- Best Prompting Strategies for Getting Accurate Code from
- Best AI Tools for Writing Rust Async Code with Tokio Runtime
Built by theluckystrike — More at zovo.one