WAL Mode Does Not Make SQLite Multi-Writer
TL;DR
Three PHP-FPM workers, one SQLite file, and a WAL checkpoint race that corrupts your database without raising an error. The fix trades concurrent access for a single dedicated writer backed by Redis Streams.
Guide
Three PHP-FPM workers. One SQLite file. WAL mode on. Every best-practice PRAGMA set.
The database corrupts anyway. No error fires. Your backups replicate the damage before you notice.
WAL mode makes concurrent reads safe. It does not make concurrent writes safe. A checkpoint can move pages while a second writer sits between lock attempts -- and SQLite will not tell you. The fix is not a PRAGMA or a retry loop. It is architectural: one process writes, everything else submits to a queue.
Three PHP Workers, One Database File
Run `lsof app.db` on a busy server. Three PIDs. Three open file descriptors. Three processes convinced they own the database.
```php
// Each PHP-FPM worker opens its own connection
$db = new PDO('sqlite:app.db');
$db->exec('PRAGMA journal_mode=WAL');
$db->exec('PRAGMA busy_timeout=5000');

function writeEvent($data) {
    global $db;
    $stmt = $db->prepare('INSERT INTO events (data) VALUES (?)');
    $stmt->execute([$data]);
}
```
`busy_timeout` tells SQLite to retry for 5 seconds before returning "database is locked." The retry loop is where things break.
Attach strace to all three processes under load:
```text
# Process 1
openat(AT_FDCWD, "app.db", O_RDWR) = 3
fcntl(3, F_SETLK, {type=F_WRLCK...}) = 0    # Got the write lock

# Process 2 (50 microseconds later)
openat(AT_FDCWD, "app.db", O_RDWR) = 3
fcntl(3, F_SETLK, {type=F_WRLCK...}) = -1 EAGAIN
# EAGAIN -- lock held by Process 1, entering retry loop

# Process 3 (during Process 2's retry)
openat(AT_FDCWD, "app.db", O_RDWR) = 3
fcntl(3, F_SETLK, {type=F_RDLCK...}) = 0    # Got a READ lock
```
Process 3 grabbed a read lock while Process 2 sat between retries. That read lock is the trigger.
WAL Checkpoints Collide With Lock Gaps
Timing logs from production reconstructed the sequence:
```text
[P1] 1706284800.001 - Acquiring write lock
[P2] 1706284800.002 - Lock busy, entering retry
[P3] 1706284800.003 - Checkpoint triggered
[P1] 1706284800.004 - Writing row ID 1234
[P3] 1706284800.005 - Checkpoint moving pages
[P2] 1706284800.006 - Got lock, writing row ID 1234
```

Two processes wrote the same row ID. No error.
WAL mode appends changes to a write-ahead log instead of modifying the main database file. Periodically, a checkpoint copies committed pages from the WAL back into the main file. The checkpoint needs a read lock to verify no readers are using the pages it moves.
The race unfolds in four steps. Process 1 holds the write lock and writes a row. Process 3 starts a checkpoint with a read lock, begins moving pages -- including the page Process 1 just wrote. Process 2, stuck in its busy_timeout retry loop, finally acquires the write lock after Process 1 releases it. Process 2 writes to a page the checkpoint is actively relocating.
The result: a page containing partial data from two writes. `PRAGMA integrity_check` catches it -- after the fact:

```php
$result = $db->query('PRAGMA integrity_check')->fetchAll();
// Anything other than [['ok']] means the database is already damaged
```
The corruption is silent at write time. The INSERT succeeds. The error surfaces when a future read hits the damaged page or when you run an explicit integrity check. By then, your backups have already copied the corrupted file.
File Locks Do Not Survive Process Death
The obvious fix: an application-level lock. First attempt -- advisory file lock:
```php
function writeEvent($data) {
    $fp = fopen('app.db.lock', 'w');
    if (flock($fp, LOCK_EX)) {
        global $db;
        $stmt = $db->prepare('INSERT INTO events (data) VALUES (?)');
        $stmt->execute([$data]);
        flock($fp, LOCK_UN);
    }
    fclose($fp);
}
```
Works until PHP-FPM kills a worker mid-write. The kernel releases an `flock` the instant its holder dies, so the next worker grabs the lock immediately and writes on top of a half-applied transaction. And when a worker hangs instead of dying, the lock never releases at all -- new workers block forever. Either way, you get paged at 3 AM.
Second attempt -- Redis as a distributed lock:
```php
$redis = new Redis();
$redis->connect('127.0.0.1');

function writeEvent($data) {
    global $redis, $db;
    $lock = $redis->set('sqlite_write_lock', 1, ['nx', 'ex' => 5]);
    if ($lock) {
        $stmt = $db->prepare('INSERT INTO events (data) VALUES (?)');
        $stmt->execute([$data]);
        $redis->del('sqlite_write_lock');
    }
}
```
Better -- the TTL prevents permanent deadlock. But write throughput is now gated by lock acquisition latency. Every web request blocks until it gets the lock, writes, and releases. Under load, the 5-second TTL fires and writes drop silently.
`BEGIN IMMEDIATE` acquires a reserved lock at transaction start rather than at the first write statement. This eliminates one class of race -- a transaction promoting from reader to writer mid-flight. It does not solve the checkpoint collision. The fundamental issue is multiple processes holding open connections to the same file.
Both fixes try to make multi-writer safe. SQLite is not multi-writer. It is single-writer with a retry loop bolted on.
One Writer Process, Zero Lock Contention
Stop coordinating multiple writers. Run one.
Web requests submit writes to a Redis Stream. A single dedicated process consumes the stream and writes to SQLite. One lock holder. Zero contention.
Full implementation: SQLiteWriteQueue class
```php
class SQLiteWriteQueue
{
    private Redis $redis;
    private PDO $db;
    private string $stream = 'sqlite:writes';
    private string $group = 'sqlite:writer';
    private string $dlq = 'sqlite:dlq';

    public function __construct()
    {
        $this->redis = new Redis();
        $this->redis->connect('127.0.0.1');
    }

    public function submitWrite(string $sql, array $params): void
    {
        // Any PHP process can call this -- web, CLI, queue workers
        $this->redis->xAdd($this->stream, '*', [
            'sql' => $sql,
            'params' => json_encode($params),
            'attempt' => 0,
            'submitted_at' => microtime(true)
        ]);
    }

    public function processWrites(): void
    {
        // Only ONE process runs this -- the dedicated writer.
        // The final argument is MKSTREAM: create the stream if it does not exist yet.
        try {
            $this->redis->xGroup('CREATE', $this->stream, $this->group, '0', true);
        } catch (RedisException $e) {
            // Group already exists
        }

        $this->db = new PDO('sqlite:app.db');
        $this->db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

        while (true) {
            $messages = $this->redis->xReadGroup(
                $this->group,
                'worker-1',
                [$this->stream => '>'],
                100,    // up to 100 messages per tick
                1000    // block up to 1s when the stream is empty
            );
            if (!$messages) {
                continue;
            }
            foreach ($messages[$this->stream] as $messageId => $data) {
                if (!$this->processMessage($messageId, $data)) {
                    $this->sendToDLQ($messageId, $data);
                }
            }
        }
    }

    private function processMessage(string $messageId, array $data): bool
    {
        $attempt = (int) ($data['attempt'] ?? 0);
        try {
            $sql = $data['sql'];
            $params = json_decode($data['params'], true);

            $start = microtime(true);
            $stmt = $this->db->prepare($sql);
            $stmt->execute($params);
            $duration = microtime(true) - $start;

            if ($duration > 0.1) {
                error_log("Slow write ({$duration}s): $sql");
            }
            $this->redis->xAck($this->stream, $this->group, $messageId);
            return true;
        } catch (PDOException $e) {
            if ($e->getCode() === '23000') {
                // Constraint violation -- permanent failure
                $this->redis->xAck($this->stream, $this->group, $messageId);
                error_log("Permanent failure: " . $e->getMessage());
                return false;
            }
            if ($attempt >= 3) {
                $this->redis->xAck($this->stream, $this->group, $messageId);
                return false;
            }
            // Requeue with incremented attempt count
            $data['attempt'] = $attempt + 1;
            $data['last_error'] = $e->getMessage();
            $this->redis->xAdd($this->stream, '*', $data);
            $this->redis->xAck($this->stream, $this->group, $messageId);
            usleep(pow(2, $attempt) * 100000); // 100ms, 200ms, 400ms
            return true;
        }
    }

    private function sendToDLQ(string $messageId, array $data): void
    {
        $this->redis->xAdd($this->dlq, '*', array_merge($data, [
            'original_id' => $messageId,
            'failed_at' => date('c'),
            'worker' => gethostname() . ':' . getmypid()
        ]));
        $dlqSize = $this->redis->xLen($this->dlq);
        if ($dlqSize > 100) {
            error_log("WARNING: DLQ size is $dlqSize");
        }
    }

    public function getStats(): array
    {
        $info = $this->redis->xInfo('STREAM', $this->stream);
        $groups = $this->redis->xInfo('GROUPS', $this->stream);
        // Summary form: [count, min-id, max-id, per-consumer counts]
        $pending = $this->redis->xPending($this->stream, $this->group);

        // A stream ID begins with a millisecond timestamp, so the age of
        // the oldest pending entry falls out of the min-id
        $oldestPendingMs = 0;
        if (!empty($pending[1])) {
            $oldestPendingMs = (int) (microtime(true) * 1000)
                - (int) explode('-', $pending[1])[0];
        }

        return [
            'queue_length' => $info['length'],
            'pending_count' => $pending[0] ?? 0,
            'oldest_pending_ms' => $oldestPendingMs,
            'dlq_size' => $this->redis->xLen($this->dlq),
            'consumers' => $groups[0]['consumers'] ?? 0
        ];
    }
}
```
The caller side:
```php
$queue = new SQLiteWriteQueue();
$queue->submitWrite(
    'INSERT INTO users (email, created_at) VALUES (?, ?)',
    [$email, date('Y-m-d H:i:s')]
);
```
The writer runs as a systemd service. It reads up to 100 messages per tick via `xReadGroup`, writes them sequentially, and acknowledges each. Failed writes retry with exponential backoff (100ms, 200ms, 400ms). Three failures route the message to a dead letter queue.
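A minimal unit file for that service might look like this -- the unit name, paths, and script name are assumptions for illustration, not from the article:

```ini
# /etc/systemd/system/sqlite-writer.service (hypothetical)
[Unit]
Description=Dedicated SQLite writer (drains the Redis stream)
After=redis.service

[Service]
ExecStart=/usr/bin/php /var/www/app/bin/process-writes.php
Restart=always
RestartSec=1

[Install]
WantedBy=multi-user.target
```

`Restart=always` is safe precisely because of the consumer-group semantics: unacknowledged messages are redelivered after a crash, so restarting the writer never loses work.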
Consumer groups with manual acknowledgment prevent double-processing. Writer crashes mid-batch? Unacknowledged messages stay in the pending list and get redelivered on restart.
Throughput Increases When Contention Disappears
Benchmark results from this deployment:
```text
Before (3 concurrent FPM workers writing directly to SQLite):
  Throughput:         1,200 writes/sec
  Corruption events:  3-5 per day
  Data loss:          ~500 rows per corruption event

After (Redis Streams + single writer):
  Submitted to Redis: 45,000 writes/sec
  SQLite writes:      8,000 writes/sec (batched)
  Corruption events:  0 in 6 months
  Data loss:          0
```
One writer faster than three? Yes. The three writers spent most of their cycles waiting for locks, retrying after EAGAIN, and recovering from failed transactions. The single writer owns the file. No lock acquisition. No retry storms. Sequential writes at full disk speed.
Web requests return in microseconds after the Redis XADD. The user never waits for the SQLite write. The queue absorbs spikes.
Monitor the queue. `pending_count` above 1,000 or `oldest_pending_ms` above 5,000 means the writer is falling behind. `dlq_size` above zero means permanent failures need investigation.
Direct Writes vs Queued Writes vs Switching Databases
| Axis | Direct SQLite (WAL) | Redis Stream + Single Writer | PostgreSQL |
|---|---|---|---|
| Write latency (caller) | 1-50ms (lock-dependent) | <1ms (Redis XADD) | 1-5ms (network) |
| Write throughput | 1,200/sec (3 workers) | 8,000/sec (batched) | 50,000+/sec |
| Corruption risk | Race under concurrent writes | None (single writer) | None (MVCC) |
| Operational cost | Zero (embedded) | Redis process + writer service | Separate server |
| When it wins | Single-process CLI tools, mobile apps | Multi-process web apps on a single machine | Anything with >1 server or >1 app needing the same data |
When your deployment grows past one server, stop using SQLite. The single-writer queue works because it runs on one machine. Writes from two machines means a client/server database -- PostgreSQL, MySQL, whatever speaks TCP.
The Same Race Hides in Other File-Backed Stores
This is not an SQLite bug. It is a file-locking bug. Any system using fcntl or flock for write coordination has the same gap between lock release and acquisition.
- LevelDB and RocksDB enforce single-writer by design -- the process refuses to open if another holds the lock file. SQLite is more permissive, which is why it fails more quietly.
- PHP file-based sessions use `flock` during `session_start()`. Under high concurrency, session writes corrupt the file. Fix: move sessions to Redis or Memcached.
- Composer's download cache uses file locks for concurrent installs. Those "cache corrupted" errors during parallel CI? Same pattern.
The principle: write serialization. One writer process, zero contention. Throughput rises because queue overhead is cheaper than filesystem lock overhead.
Detect whether you have this problem now:
```sh
# Count files with direct SQLite connections
grep -r "new PDO.*sqlite" . --include="*.php" | grep -v test | wc -l
# If > 1 file opens the same database, you have multiple potential writers

# Check for multiple PIDs with open handles during load
lsof | grep your_database.db | awk '{print $2}' | sort -u
# Multiple PIDs = the race condition is possible
```