/posts/sqlite-lying-concurrent-writes

WAL Mode Does Not Make SQLite Multi-Writer

TL;DR

Three PHP-FPM workers, one SQLite file, and a WAL checkpoint race that corrupts your database without raising an error. The fix trades concurrent access for a single dedicated writer backed by Redis Streams.


Three PHP-FPM workers. One SQLite file. WAL mode on. Every best-practice PRAGMA set.

The database corrupts anyway. No error fires. Your backups replicate the damage before you notice.

WAL mode makes concurrent reads safe. It does not make concurrent writes safe. A checkpoint can move pages while a second writer sits between lock attempts -- and SQLite will not tell you. The fix is not a PRAGMA or a retry loop. It is architectural: one process writes, everything else submits to a queue.


Three PHP Workers, One Database File

Run lsof app.db on a busy server. Three PIDs. Three open file descriptors. Three processes convinced they own the database.

// Each PHP-FPM worker opens its own connection
$db = new PDO('sqlite:app.db');
$db->exec('PRAGMA journal_mode=WAL');
$db->exec('PRAGMA busy_timeout=5000');

function writeEvent($data) {
    global $db;
    $stmt = $db->prepare('INSERT INTO events (data) VALUES (?)');
    $stmt->execute([$data]);
}

busy_timeout tells SQLite to retry for 5 seconds before returning "database is locked." The retry loop is where things break.
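The retry window is easy to reproduce locally with two connections to one file. A minimal sketch against a throwaway temp database (not production code):

```php
// Two connections to one database: A holds the write lock while
// B exhausts its busy_timeout and surfaces SQLITE_BUSY.
$path = tempnam(sys_get_temp_dir(), 'demo');

$a = new PDO("sqlite:$path");
$a->exec('PRAGMA journal_mode=WAL');
$a->exec('CREATE TABLE events (id INTEGER PRIMARY KEY, data TEXT)');

$b = new PDO("sqlite:$path");
$b->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$b->exec('PRAGMA busy_timeout=100'); // retry for 100ms, then give up

$a->exec('BEGIN IMMEDIATE'); // A takes the write lock and holds it
$a->exec("INSERT INTO events (data) VALUES ('from A')");

$caught = '';
try {
    $b->exec("INSERT INTO events (data) VALUES ('from B')");
} catch (PDOException $e) {
    $caught = $e->getMessage(); // "... database is locked"
}
$a->exec('COMMIT');
```

B spends its entire busy_timeout inside SQLite's retry loop before the error surfaces -- exactly the gap the strace output exposes.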

Attach strace to all three processes under load:

# Process 1
openat(AT_FDCWD, "app.db", O_RDWR) = 3
fcntl(3, F_SETLK, {type=F_WRLCK...}) = 0  # Got the write lock

# Process 2 (50 microseconds later)
openat(AT_FDCWD, "app.db", O_RDWR) = 3
fcntl(3, F_SETLK, {type=F_WRLCK...}) = -1 EAGAIN
# EAGAIN -- lock held by Process 1, entering retry loop

# Process 3 (during Process 2's retry)
openat(AT_FDCWD, "app.db", O_RDWR) = 3
fcntl(3, F_SETLK, {type=F_RDLCK...}) = 0  # Got a READ lock

Process 3 grabbed a read lock while Process 2 sat between retries. That read lock is the trigger.

WAL Checkpoints Collide With Lock Gaps

Timing logs from production reconstructed the sequence:

[P1] 1706284800.001 - Acquiring write lock
[P2] 1706284800.002 - Lock busy, entering retry
[P3] 1706284800.003 - Checkpoint triggered
[P1] 1706284800.004 - Writing row ID 1234
[P3] 1706284800.005 - Checkpoint moving pages
[P2] 1706284800.006 - Got lock, writing row ID 1234

Two processes wrote the same row ID. No error.

WAL mode appends changes to a write-ahead log instead of modifying the main database file. Periodically, a checkpoint copies committed pages from the WAL back into the main file. The checkpoint needs a read lock to verify no readers are using the pages it moves.
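Checkpoints can also be driven and inspected by hand. PRAGMA wal_checkpoint returns three columns -- whether the checkpoint was blocked, how many frames the WAL holds, and how many were copied back. A sketch against a throwaway database:

```php
// Run a checkpoint manually and read its result row:
// (busy, frames in WAL, frames copied into the main file).
$path = tempnam(sys_get_temp_dir(), 'demo');

$db = new PDO("sqlite:$path");
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('PRAGMA journal_mode=WAL');
$db->exec('CREATE TABLE t (x)');
$db->exec('INSERT INTO t VALUES (1)'); // appended to the -wal file

$row = $db->query('PRAGMA wal_checkpoint(PASSIVE)')->fetch(PDO::FETCH_NUM);
// $row[0] == 0 -> the checkpoint was not blocked by a reader
// $row[1]      -> total frames currently in the WAL
// $row[2]      -> frames successfully moved into the main file
```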

The race unfolds in four steps. Process 1 holds the write lock and writes a row. Process 3 starts a checkpoint with a read lock, begins moving pages -- including the page Process 1 just wrote. Process 2, stuck in its busy_timeout retry loop, finally acquires the write lock after Process 1 releases it. Process 2 writes to a page the checkpoint is actively relocating.

The result: a page containing partial data from two writes. PRAGMA integrity_check catches it -- after the fact:

$result = $db->query('PRAGMA integrity_check')->fetchAll();
// Anything other than [['ok']] means the database is already damaged

The corruption is silent at write time. The INSERT succeeds. The error surfaces when a future read hits the damaged page or when you run an explicit integrity check. By then, your backups have already copied the corrupted file.
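One defense is to gate every backup on the integrity check so damage never replicates. A sketch with hypothetical paths, assuming SQLite 3.27+ for VACUUM INTO, which produces a consistent copy even while the WAL is active:

```php
// Sketch: refuse to snapshot a database that fails integrity_check.
function backupIfHealthy(string $dbPath, string $backupPath): bool
{
    $db = new PDO("sqlite:$dbPath");
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $result = $db->query('PRAGMA integrity_check')->fetchAll(PDO::FETCH_COLUMN, 0);
    if ($result !== ['ok']) {
        error_log('integrity_check failed: ' . implode('; ', $result));
        return false; // keep the last known-good backup instead
    }

    // VACUUM INTO writes a consistent snapshot; the target must not exist.
    $db->exec('VACUUM INTO ' . $db->quote($backupPath));
    return true;
}
```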

File Locks Do Not Survive Process Death

The obvious fix: an application-level lock. First attempt -- advisory file lock:

function writeEvent($data) {
    $fp = fopen('app.db.lock', 'w');
    if (flock($fp, LOCK_EX)) {
        global $db;
        $stmt = $db->prepare('INSERT INTO events (data) VALUES (?)');
        $stmt->execute([$data]);
        flock($fp, LOCK_UN);
    }
    fclose($fp);
}

Works until a worker hangs while holding the lock. flock is released when a process dies, so a killed worker is survivable -- but a worker stuck on a slow downstream call holds the lock indefinitely. Every other worker blocks. You get paged at 3 AM.

Second attempt -- Redis as a distributed lock:

$redis = new Redis();
$redis->connect('127.0.0.1');

function writeEvent($data) {
    global $redis, $db;
    $lock = $redis->set('sqlite_write_lock', 1, ['nx', 'ex' => 5]);
    if ($lock) {
        $stmt = $db->prepare('INSERT INTO events (data) VALUES (?)');
        $stmt->execute([$data]);
        $redis->del('sqlite_write_lock');
    }
}

Better -- the TTL prevents permanent deadlock. But write throughput is now gated by lock acquisition latency, and every web request blocks until it gets the lock, writes, and releases. Under load, a worker that cannot acquire the lock simply returns without writing -- silent data loss. And if a write takes longer than 5 seconds, the TTL expires, another worker acquires the lock, and the first worker's DEL deletes the new holder's lock.

BEGIN IMMEDIATE acquires a reserved lock at transaction start rather than at the first write statement. This eliminates one class of race -- a transaction promoting from reader to writer mid-flight. It does not solve the checkpoint collision. The fundamental issue is multiple processes holding open connections to the same file.
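What BEGIN IMMEDIATE buys you is visible in a small helper -- a sketch of a wrapper (hypothetical name) that takes the reserved lock at transaction start:

```php
// Sketch: acquire the write lock at BEGIN, so SQLITE_BUSY surfaces
// at the start of the transaction instead of mid-flight.
function withImmediateWrite(PDO $db, callable $fn)
{
    $db->exec('BEGIN IMMEDIATE'); // fails fast here if the lock is busy
    try {
        $result = $fn($db);
        $db->exec('COMMIT');
        return $result;
    } catch (Throwable $e) {
        $db->exec('ROLLBACK');
        throw $e;
    }
}
```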

Both fixes try to make multi-writer safe. SQLite is not multi-writer. It is single-writer with a retry loop bolted on.

One Writer Process, Zero Lock Contention

Stop coordinating multiple writers. Run one.

Web requests submit writes to a Redis Stream. A single dedicated process consumes the stream and writes to SQLite. One lock holder. Zero contention.

The full SQLiteWriteQueue class:

class SQLiteWriteQueue
{
    private Redis $redis;
    private PDO $db;
    private string $stream = 'sqlite:writes';
    private string $consumer = 'sqlite:writer';
    private string $dlq = 'sqlite:dlq';

    public function __construct()
    {
        $this->redis = new Redis();
        $this->redis->connect('127.0.0.1');
    }

    public function submitWrite(string $sql, array $params): void
    {
        // Any PHP process can call this -- web, CLI, queue workers
        $this->redis->xAdd($this->stream, '*', [
            'sql' => $sql,
            'params' => json_encode($params),
            'attempt' => 0,
            'submitted_at' => microtime(true)
        ]);
    }

    public function processWrites(): void
    {
        // Only ONE process runs this -- the dedicated writer
        try {
            // MKSTREAM (the trailing true) creates the stream if it does
            // not exist yet; without it, CREATE fails on a fresh deployment
            // and the consumer loop dies with NOGROUP.
            $this->redis->xGroup('CREATE', $this->stream, $this->consumer, '0', true);
        } catch (RedisException $e) {
            // Group already exists
        }

        $this->db = new PDO('sqlite:app.db');
        $this->db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

        while (true) {
            $messages = $this->redis->xReadGroup(
                $this->consumer,
                'worker-1',
                [$this->stream => '>'],
                100,
                1000
            );

            if (!$messages) {
                continue;
            }

            foreach ($messages[$this->stream] as $messageId => $data) {
                if (!$this->processMessage($messageId, $data)) {
                    $this->sendToDLQ($messageId, $data);
                }
            }
        }
    }

    private function processMessage(string $messageId, array $data): bool
    {
        $attempt = (int) ($data['attempt'] ?? 0);

        try {
            $sql = $data['sql'];
            $params = json_decode($data['params'], true);

            $start = microtime(true);
            $stmt = $this->db->prepare($sql);
            $stmt->execute($params);
            $duration = microtime(true) - $start;

            if ($duration > 0.1) {
                error_log("Slow write ({$duration}s): $sql");
            }

            $this->redis->xAck($this->stream, $this->consumer, $messageId);
            return true;

        } catch (PDOException $e) {
            if ($e->getCode() === '23000') {
                // Constraint violation -- permanent failure
                $this->redis->xAck($this->stream, $this->consumer, $messageId);
                error_log("Permanent failure: " . $e->getMessage());
                return false;
            }

            if ($attempt >= 3) {
                $this->redis->xAck($this->stream, $this->consumer, $messageId);
                return false;
            }

            // Requeue with incremented attempt
            $data['attempt'] = $attempt + 1;
            $data['last_error'] = $e->getMessage();
            $this->redis->xAdd($this->stream, '*', $data);
            $this->redis->xAck($this->stream, $this->consumer, $messageId);

            usleep(pow(2, $attempt) * 100000); // 100ms, 200ms, 400ms
            return true;
        }
    }

    private function sendToDLQ(string $messageId, array $data): void
    {
        $this->redis->xAdd($this->dlq, '*', array_merge($data, [
            'original_id' => $messageId,
            'failed_at' => date('c'),
            'worker' => gethostname() . ':' . getmypid()
        ]));

        $dlqSize = $this->redis->xLen($this->dlq);
        if ($dlqSize > 100) {
            error_log("WARNING: DLQ size is $dlqSize");
        }
    }

    public function getStats(): array
    {
        $info = $this->redis->xInfo('STREAM', $this->stream);
        $pending = $this->redis->xPending($this->stream, $this->consumer);

        return [
            'queue_length' => $info['length'],
            'pending_count' => $pending[0] ?? 0,
            // Stream IDs embed a millisecond timestamp; the age of the
            // oldest pending entry falls out of the smallest pending ID.
            'oldest_pending_ms' => isset($pending[1])
                ? (int) (microtime(true) * 1000) - (int) explode('-', (string) $pending[1])[0]
                : 0,
            'dlq_size' => $this->redis->xLen($this->dlq),
            // XINFO STREAM reports 'groups' as a bare count; per-group
            // consumer counts live under XINFO GROUPS.
            'consumers' => $this->redis->xInfo('GROUPS', $this->stream)[0]['consumers'] ?? 0
        ];
    }
}

The caller side:

$queue = new SQLiteWriteQueue();
$queue->submitWrite(
    'INSERT INTO users (email, created_at) VALUES (?, ?)',
    [$email, date('Y-m-d H:i:s')]
);

The writer runs as a systemd service. It reads up to 100 messages per tick via xReadGroup, writes them sequentially, and acknowledges each. Failed writes retry with exponential backoff (100ms, 200ms, 400ms). Three failures route the message to a dead letter queue.

Consumer groups with manual acknowledgment prevent double-processing. Writer crashes mid-batch? Unacknowledged messages stay in the pending list and get redelivered on restart.

Throughput Increases When Contention Disappears

Benchmark results from this deployment:

Before (3 concurrent FPM workers writing directly to SQLite):
  Throughput:         1,200 writes/sec
  Corruption events:  3-5 per day
  Data loss:          ~500 rows per corruption event

After (Redis Streams + single writer):
  Submitted to Redis: 45,000 writes/sec
  SQLite writes:      8,000 writes/sec (batched)
  Corruption events:  0 in 6 months
  Data loss:          0

One writer faster than three? Yes. The three writers spent most of their cycles waiting for locks, retrying after EAGAIN, and recovering from failed transactions. The single writer owns the file. No lock acquisition. No retry storms. Sequential writes at full disk speed.

Web requests return in microseconds after the Redis XADD. The user never waits for the SQLite write. The queue absorbs spikes.

Monitor the queue. pending_count above 1,000 or oldest_pending_ms above 5,000 means the writer is falling behind. dlq_size above zero means permanent failures need investigation.
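The thresholds are simple enough to encode directly. A sketch (hypothetical queueAlerts helper) that turns a getStats() array into alerts:

```php
// Sketch: evaluate queue stats against the thresholds above.
function queueAlerts(array $stats): array
{
    $alerts = [];
    if (($stats['pending_count'] ?? 0) > 1000) {
        $alerts[] = 'writer falling behind: pending_count > 1000';
    }
    if (($stats['oldest_pending_ms'] ?? 0) > 5000) {
        $alerts[] = 'writer falling behind: oldest_pending_ms > 5000';
    }
    if (($stats['dlq_size'] ?? 0) > 0) {
        $alerts[] = 'permanent failures in DLQ: investigate';
    }
    return $alerts;
}
```

Wire it into whatever pages you -- cron plus error_log is enough to start.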

Direct Writes vs Queued Writes vs Switching Databases

                        Direct SQLite (WAL)           Redis Stream + Single Writer     PostgreSQL
Write latency (caller)  1-50ms (lock-dependent)       <1ms (Redis XADD)                1-5ms (network)
Write throughput        1,200/sec (3 workers)         8,000/sec (batched)              50,000+/sec
Corruption risk         Race under concurrent writes  None (single writer)             None (MVCC)
Operational cost        Zero (embedded)               Redis process + writer service   Separate server
When it wins            Single-process CLI tools,     Multi-process web apps on a      Anything with >1 server or
                        mobile apps                   single machine                   >1 app needing the same data

When your deployment grows past one server, stop using SQLite. The single-writer queue works because it runs on one machine. Writing from two machines means a client/server database -- PostgreSQL, MySQL, whatever speaks TCP.

The Same Race Hides in Other File-Backed Stores

This is not an SQLite bug. It is a file-locking bug. Any system using fcntl or flock for write coordination has the same gap between lock release and acquisition.

  • LevelDB and RocksDB enforce single-writer by design -- the process refuses to open if another holds the lock file. SQLite is more permissive, which is why it fails more quietly.
  • PHP file-based sessions use flock during session_start(). Under high concurrency, session writes corrupt the file. Fix: move sessions to Redis or Memcached.
  • Composer's download cache uses file locks for concurrent installs. Those "cache corrupted" errors during parallel CI? Same pattern.

The principle: write serialization. One writer process, zero contention. Throughput rises because queue overhead is cheaper than filesystem lock overhead.

Detect whether you have this problem now:

# Count files with direct SQLite connections
grep -r "new PDO.*sqlite" . --include="*.php" | grep -v test | wc -l
# If > 1 file opens the same database, you have multiple potential writers

# Check for multiple PIDs with open handles during load
lsof | grep your_database.db | awk '{print $2}' | sort -u
# Multiple PIDs = the race condition is possible