The SQLite Renaissance: Why Cloudflare D1 and Workers Are the Right Shape for Agentic Apps
Most of the infrastructure we use to run web applications was designed for a world where a user clicks a button and a human waits for a response. That world is ending faster than most teams realize. The new shape of traffic is agentic: long-running processes, hundreds of concurrent tool calls, bursty writes, per-task state, and workloads that scale not with the number of humans but with the number of tasks a fleet of agents is chewing through at any given moment.
That shift is exposing assumptions baked into every stack we grew up with. Connection pools. Central database primaries. Cold-start penalties on serverless functions. Per-request pricing models that assume a human is behind each request. VPC networking rules that assume workloads live in one region.
We have spent the last several months rebuilding parts of our stack around a very different set of primitives — and the combination that keeps winning is embarrassingly simple: SQLite, as exposed through Cloudflare D1, running behind Cloudflare Workers. This post is about why that combination is the right shape for agentic applications, what the tradeoffs are, and the patterns we have landed on after shipping several of these systems into production.
The Shape of Agentic Workloads
Before we can talk about the right database, we have to be honest about what agentic traffic actually looks like. It is not the same as a traditional web app, and treating it like one is the root cause of most of the operational pain teams are currently feeling.
It is bursty. A single agent run can generate dozens of tool calls in a few seconds, then sit idle for ten minutes waiting for a model response or a human review. Multiply that across a hundred agents and you have traffic that swings from near-zero to thousands of requests per second and back again, with no predictable pattern.
It is highly parallel. Agents do not queue politely. They fan out. A planning agent spawns sub-agents. Each sub-agent calls tools. Each tool call reads and writes state. A single user task can trigger hundreds of simultaneous reads against shared context and dozens of writes to task-specific state.
It is stateful in small increments. Unlike a batch job that reads a million rows at the start and writes a million rows at the end, agents accumulate state in tiny increments. A few bytes here, a few bytes there. A tool call record. A partial result. A memory write. An event in an audit log. The write pattern is closer to a conversation than a transaction.
It is isolated per task. Most agent state is scoped to a single run, a single session, or a single tenant. Two agents working on different tasks should have no reason to contend for the same rows. Two tenants should not share a database at all.
It is latency-sensitive in unexpected places. The latency that matters is not human perception latency. It is the latency between a tool call being issued and its result being written to state, because that latency multiplies across every step the agent takes. Thirty milliseconds of extra write latency per tool call becomes several seconds across a complex plan.
These five characteristics together describe a workload that is brutal for a traditional Postgres-behind-a-connection-pool setup and almost tailor-made for SQLite at the edge.
Why SQLite, Specifically
SQLite is twenty-five years old and quietly runs on more devices than any other database in history. Every iPhone. Every Android phone. Every browser. Every aircraft. It is the most deployed database on the planet and nobody thinks about it because it works.
For most of those twenty-five years, SQLite was considered an embedded database — a library you linked into your app, not a database you ran a web service against. The conventional wisdom was that serious web applications used a real database server. SQLite was for testing, for mobile, for tiny use cases.
That conventional wisdom was wrong, and it has taken the industry a long time to notice.
Here is what SQLite gives you that a remote Postgres does not:
Zero network latency on reads. When your database is in the same process as your code, a read is a function call. There is no wire. There is no serialization. There is no pool. A point lookup on an indexed row takes microseconds, not milliseconds.
Unlimited concurrent readers. SQLite’s WAL mode lets any number of processes read the database simultaneously with zero contention. Writers are serialized, but readers are free. For agentic workloads that are read-heavy against shared context — “what did the previous step do?” “what is my task definition?” “what is my tool schema?” — this is a massive win.
Tiny surface area. A SQLite database is a single file. You can copy it, ship it, archive it, replicate it. Backup is cp (or VACUUM INTO if the database is live). Disaster recovery is cp. Migration is cp. There is no dump-and-restore. There is no cluster to rebuild. There is no pg_upgrade. There is no vacuuming catastrophe at 3 a.m.
Transactional guarantees you can actually trust. SQLite is famously the most rigorously tested piece of software in open source. Its transactional guarantees are the benchmark other databases measure themselves against, not the other way around.
Per-tenant or per-task databases become free. If spinning up a new database is as cheap as creating a file, you can have one database per tenant, per task, or even per agent run without thinking about it. This is the single most important unlock for agentic architectures and we will come back to it.
The catch has always been that SQLite is a library, and agentic applications run as services. You cannot trivially put a SQLite file behind a web service and have it work well for a distributed workload. That is exactly the problem D1 solves.
What D1 Actually Is
Cloudflare D1 is Cloudflare’s managed SQLite service. Under the hood it is real SQLite — not “SQLite-compatible,” not a fork, but the actual SQLite engine — running inside Cloudflare’s infrastructure and exposed to your Workers as a database binding.
The important details:
Each D1 database is a real SQLite database. You can inspect it, you can dump it, you can import a SQL file into it. It speaks the SQLite dialect. Your queries are SQL queries. Your transactions are SQL transactions.
It is accessed from Workers as a binding, not a connection string. This is a bigger deal than it sounds. There is no connection pool. There is no connection lifecycle. There is no “connection exhausted” error. Your Worker receives a database handle as part of its runtime environment, and queries are dispatched through the D1 Worker API. No TCP handshake. No TLS negotiation. No pool saturation.
It has a generous free tier and cheap paid pricing. A single D1 database on the Workers paid plan includes gigabytes of storage and hundreds of millions of row reads per month for a few dollars. This matters more than it seems because it means the cost of spinning up many databases is near zero, which enables the patterns we will discuss below.
It replicates reads globally. D1 now offers read replication — your database has a primary region for writes and replicates reads to other regions near your users. For agentic workloads where reads vastly outnumber writes, this is material.
It is not infinite. D1 has size limits (roughly ten gigabytes per database at the time of writing) and write throughput limits (on the order of hundreds of writes per second per database, since a SQLite database has a single writer). These limits are not a problem if you design around them. They become a problem if you try to use D1 as a single monolithic database for your entire application. More on that shortly.
Why Workers Are the Right Runtime
D1 is only half the story. The other half is Cloudflare Workers, the serverless compute platform that D1 is designed to pair with.
Workers are not Lambda. The difference matters.
Lambda was designed for heavy compute in a traditional cloud environment. It has a cold-start penalty. It charges per invocation and per GB-second of execution time. It runs in a specific region. It talks to databases over a network.
Workers were designed for the edge, for high concurrency, and for short, bursty execution. A Worker starts in well under a millisecond because it is not a container: it is a V8 isolate. You can run tens of thousands of Workers concurrently on a single physical machine. Workers charge per request and per millisecond of CPU time actually used, not for wall-clock duration, and the per-request cost is fractional cents.
Combine that with D1 and you get something interesting: a compute and storage primitive where the cost scales almost perfectly with actual work done, where cold starts are not a factor, and where the database is colocated with the code.
For agentic workloads, the characteristics that matter are:
No cold starts. When your agent issues a tool call and that call hits a Worker, the isolate begins executing in well under a millisecond. There is no 500-millisecond Lambda warm-up. There is no container scheduling delay. The agent’s loop is not blocked.
Massive concurrency for free. A hundred concurrent agents, each firing ten tool calls a second, is a thousand requests per second. That is nothing for a Workers deployment. You do not provision for it. You do not scale it. It just works, and you pay a few dollars for the burst.
Global edge execution. Your agent is running somewhere — maybe in a browser, maybe in a user’s environment, maybe in a CI runner, maybe in another cloud. Wherever it is, the Worker it calls runs at the nearest Cloudflare PoP. This shaves real latency off every tool call, and as we said, that latency multiplies.
Tight coupling with storage. When your Worker reads from D1, it reads from a binding — the query path is optimized for in-region access. No network hop across AZs. No connection pooling. No DNS lookup. No TLS handshake. Just a query and a result.
The practical consequence of this pairing is that the cost of “do a thing and write it down” drops to near zero. And agentic systems are nothing but hundreds of thousands of instances of “do a thing and write it down.”
The Pattern That Keeps Winning: Database Per Scope
Here is the insight that changed how we think about state for agentic applications. Because D1 databases are cheap and quick to provision, you should create many of them — one per tenant, one per workspace, one per agent run, or even one per task — rather than one giant shared database.
Let that sink in. The assumption in every traditional architecture is that a database is an expensive, long-lived thing you share across all users. That assumption evaporates when a database is a SQLite file and you can have thousands of them for pennies.
The patterns we use:
Database per tenant. Every customer gets their own D1 database. Full stop. Their tool call history, their memory store, their configuration, their audit log, everything. This gives you perfect isolation (a noisy neighbor cannot affect anyone else’s write throughput), trivial data export (hand them their .sqlite file), and simple compliance stories (deletion is dropping a database).
Database per agent run. For agents that execute complex, multi-step plans with significant intermediate state, we create a dedicated D1 database for the run itself. It holds the plan, the step log, the tool call audit trail, the intermediate artifacts. When the run completes, we archive the database to R2 as a single file and reclaim the D1 resources. The entire run is then a single artifact that we can replay, debug, or share.
Database per workspace. For multi-user agentic applications, workspaces get their own database. This keeps collaborative context tightly scoped and makes “share a workspace” equivalent to “clone a database.”
Shared control-plane database. There is still one shared database, but it is small and slow-changing — user accounts, workspace metadata, billing, routing information. The per-tenant databases handle the hot path. The control plane handles provisioning and lookup.
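The archival step from the database-per-run pattern above can be sketched in a few lines. `dumpDatabase` is a stand-in for however you export the run database to bytes (for example, the D1 HTTP export endpoint or a `wrangler d1 export` step in a pipeline); it is an assumption, not a Cloudflare API. The `put()` call matches the shape of a real R2 bucket binding.

```typescript
// Archive a completed run's database to R2 as a single .sqlite artifact.
// dumpDatabase is an assumed helper; R2Like mirrors a subset of the R2 binding.
interface R2Like {
  put(key: string, value: ArrayBuffer | string): Promise<unknown>;
}

async function archiveRun(
  runId: string,
  dumpDatabase: (runId: string) => Promise<ArrayBuffer>,
  bucket: R2Like,
): Promise<string> {
  const bytes = await dumpDatabase(runId); // full SQLite image of the run DB
  const key = `runs/${runId}.sqlite`;      // one self-contained file per run
  await bucket.put(key, bytes);            // R2: no egress fees on later reads
  return key;                              // record this key in the control plane
}
```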
The routing logic in your Worker becomes: look up which database this request belongs to, bind to it, run the query. That lookup is itself a D1 query against the control plane, and it is fast enough to be invisible.
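That routing step can be sketched as follows. The control-plane schema (a `task_routes` table) and the `resolve` function that turns a database name into a live handle are assumptions; in practice the handle comes from a binding or the D1 HTTP API, and `DbHandle` is a deliberately minimal stand-in.

```typescript
// Route a request to its per-task database via a control-plane lookup.
// task_routes, resolve, and DbHandle are illustrative assumptions.
interface DbHandle {
  query(sql: string, ...params: unknown[]): Promise<unknown[]>;
}

async function routeAndQuery(
  controlPlane: DbHandle,
  resolve: (dbName: string) => DbHandle,
  taskId: string,
  sql: string,
): Promise<unknown[]> {
  // 1. One indexed point lookup against the small control-plane database.
  const rows = await controlPlane.query(
    "SELECT db_name FROM task_routes WHERE task_id = ?",
    taskId,
  );
  if (rows.length === 0) throw new Error(`unknown task: ${taskId}`);
  // 2. Bind to the per-task database and run the real query there.
  const taskDb = resolve((rows[0] as { db_name: string }).db_name);
  return taskDb.query(sql);
}
```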
This architecture is impossible with a traditional Postgres setup. You cannot have ten thousand Postgres instances without going bankrupt and insane. You can have ten thousand D1 databases and the monthly bill is a rounding error.
What This Looks Like in Practice
Let’s walk through a concrete example. Imagine you are building an agentic application that helps teams write technical documentation. Each team is a tenant. Each documentation task is an agent run. The agent uses tools to read the existing codebase, search the web, consult a style guide, and draft content. Along the way it logs every step for later review.
In a traditional architecture you would have one Postgres cluster with tables for tenants, users, tasks, tool_calls, artifacts, and audit_events. Every request hits the same database. As you scale, you shard, you add read replicas, you fight connection pool exhaustion, you set up pgbouncer, you monitor slow queries, you worry about vacuum bloat, you page your ops team at 3 a.m. because a single runaway tool call generated ten million rows in tool_calls.
In the architecture we are describing:
- A small shared D1 database holds tenants, users, and task metadata. It is read-heavy, write-light, and it maps a task ID to the name of the D1 database that holds that task’s state.
- Each tenant has their own D1 database for workspace-level state — style guides, persistent memory, shared artifacts.
- Each agent run creates a new D1 database, scoped to that run. The plan lives there. The tool call log lives there. Every intermediate artifact lives there.
- When the run completes, the run-specific database is exported to R2 as a single file and its D1 resources are released. The R2 file is now a complete, self-contained record of the run that you can hand to a debugger, a compliance auditor, or a replay harness.
- Workers handle the whole thing. Tool calls hit Workers. The Workers route based on task ID. They bind to the right D1 database. They execute the query. They return a result.
The entire system has no connection pool, no shared database bottleneck, no vacuum schedule, no read replica lag, no noisy neighbor problem, and no ops pager. It scales because each tenant and each run is independent. It is cheap because per-database costs are negligible and you pay almost exclusively for the work you actually do.
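One possible shape for the per-run database described above, expressed as the DDL you would run when the run's database is created. Table and column names are illustrative assumptions, not a prescribed layout; each statement is executed individually so it works through prepared statements on any D1-shaped handle.

```typescript
// A sketch of the run-scoped schema: the plan, the tool call log, and
// pointers to artifacts. All names here are assumptions for illustration.
const RUN_SCHEMA: string[] = [
  `CREATE TABLE IF NOT EXISTS plan (
     step        INTEGER PRIMARY KEY,
     description TEXT NOT NULL,
     status      TEXT NOT NULL DEFAULT 'pending'
   )`,
  `CREATE TABLE IF NOT EXISTS tool_calls (
     id          INTEGER PRIMARY KEY AUTOINCREMENT,
     step        INTEGER REFERENCES plan(step),
     tool        TEXT NOT NULL,
     args_json   TEXT NOT NULL,
     result_json TEXT,
     started_at  TEXT NOT NULL DEFAULT (datetime('now'))
   )`,
  `CREATE TABLE IF NOT EXISTS artifacts (
     key       TEXT PRIMARY KEY,
     r2_key    TEXT,    -- large payloads live in R2, not in the row
     meta_json TEXT
   )`,
];

// Runs the DDL against any handle exposing D1's prepare().run() shape.
async function initRunDb(db: {
  prepare(sql: string): { run(): Promise<unknown> };
}): Promise<void> {
  for (const stmt of RUN_SCHEMA) {
    await db.prepare(stmt).run();
  }
}
```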
Durable Objects: The Piece That Makes It All Coherent
There is one thing we have not talked about yet, and it is the piece that makes the whole pattern work: Cloudflare Durable Objects.
A Durable Object is a tiny stateful primitive that lives in one specific place in Cloudflare’s network, has its own persistent storage, and serializes all the requests that target it. Think of it as a single-instance actor with attached storage. You can have millions of them. Each one is cheap.
Why this matters for agentic applications: Durable Objects are the right primitive for coordinating work that must not run concurrently. If you have an agent run that should only be executed by one process at a time — no matter how many clients try to poke it — a Durable Object gives you that guarantee without a single line of locking code.
The pattern we use:
- Every agent run is represented by a Durable Object keyed on the run ID.
- The Durable Object holds the run’s in-memory state (the current plan, the next step, the pending tool calls) and persists checkpoints to its own embedded storage.
- Tool call results arrive at the Durable Object, which applies them atomically to the state.
- Larger, queryable state — the full audit log, the list of artifacts, the history of tool calls — is written to the run’s dedicated D1 database.
- When the run finishes, the Durable Object archives the D1 database to R2 and shuts itself down.
This gives you the best of both worlds: the single-writer guarantee and in-memory speed of a Durable Object for the hot path, plus the queryable, exportable, auditable surface of a D1 database for anything you need to inspect or share.
It is also how you get around D1’s write throughput limit. Instead of hammering the database with concurrent writes from many processes, you funnel all writes through the Durable Object, which batches and commits them on its own schedule. The database sees orderly, rate-limited writes. The clients see immediate acknowledgment from the Durable Object. Everyone wins.
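The write-funneling idea can be sketched as a small batching coordinator. In production this object would live inside a Durable Object, whose single-instance guarantee makes the serialization real; here the commit sink is an injected function (for D1 it might wrap `db.batch(...)`), and the batch size is an arbitrary assumption.

```typescript
// Funnel many small writes into few orderly commits. A Durable Object
// would host this; the commit callback and maxBatch value are assumptions.
class WriteBatcher {
  private pending: string[] = [];
  private flushed = 0;

  constructor(
    private commit: (batch: string[]) => Promise<void>, // e.g. wraps db.batch()
    private maxBatch = 50,
  ) {}

  // Accept a write and return immediately; flush when the batch fills up.
  async write(record: string): Promise<void> {
    this.pending.push(record);
    if (this.pending.length >= this.maxBatch) {
      await this.flush();
    }
  }

  // Drain everything queued so far as one commit; returns the batch size.
  async flush(): Promise<number> {
    if (this.pending.length === 0) return 0;
    const batch = this.pending;
    this.pending = [];
    await this.commit(batch); // the database sees one write, not fifty
    this.flushed += batch.length;
    return batch.length;
  }

  get totalFlushed(): number {
    return this.flushed;
  }
}
```

A real implementation would also flush on a timer (a Durable Object alarm is the natural fit) so a quiet run does not strand a partial batch.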
Queues, KV, and Workers AI: The Supporting Cast
The D1-plus-Workers-plus-Durable-Objects core is enough for a lot of agentic applications. For production systems you will usually also want:
Cloudflare Queues for async work that should not block the agent’s loop. A tool call fires, the Worker enqueues a heavy task, the agent gets an immediate acknowledgment, and a consumer Worker processes the task in the background. This is critical for tool calls that involve long-running external APIs — you do not want your agent’s loop to sit waiting on a five-second HTTP call if you can fan it out.
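The enqueue-and-acknowledge step looks roughly like this. The queue binding, the `ToolJob` message shape, and the processing body are assumptions; `send()` matches the Queues producer method, and the consumer mirrors the `messages[].body` shape a `queue()` handler receives.

```typescript
// Hand a slow tool call to a queue and acknowledge immediately.
// QueueLike and ToolJob are illustrative assumptions.
interface QueueLike<T> {
  send(message: T): Promise<void>;
}
interface ToolJob {
  runId: string;
  tool: string;
  args: unknown;
}

// Producer side: the agent's loop gets its acknowledgment right away.
async function enqueueToolCall(
  q: QueueLike<ToolJob>,
  job: ToolJob,
): Promise<{ queued: true }> {
  await q.send(job);
  return { queued: true };
}

// Consumer side: a background Worker drains batches of jobs.
async function consume(
  batch: { messages: { body: ToolJob }[] },
  handle: (job: ToolJob) => Promise<void>,
): Promise<void> {
  for (const msg of batch.messages) {
    await handle(msg.body);
  }
}
```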
Workers KV for cheap, globally distributed reads of rarely-changing data. Tool schemas, prompt templates, model routing tables, feature flags. KV is eventually consistent, which is fine for data that changes slowly and is read constantly.
R2 for large artifacts and archived databases. When an agent run produces a file, it goes in R2. When a run completes, the run database goes in R2 as a .sqlite file. R2 has no egress fees, which makes it cheap to pull archived runs back into a debugger or replay harness.
Workers AI for model inference at the edge. For the growing class of agentic applications where you want model inference close to state — say, a reranker that reads from D1 and scores results — running the model in the same Worker as the query is a real speedup and often a cost reduction.
The composition here is deliberate. Each primitive does one thing well. Workers are the compute. D1 is the queryable state. Durable Objects are the coordination. Queues are the async backbone. KV is the cache. R2 is the archive. Workers AI is the inference. They all speak to each other through bindings, with no connection strings, no credentials to rotate, and no network hops outside Cloudflare’s infrastructure.
This is not a collection of services. It is a coherent runtime for a particular shape of application, and that shape happens to be the shape of agentic software.
The Honest Tradeoffs
We are not going to pretend this architecture has no downsides. It has several, and you should know them before you commit.
D1 is not for massive single-database workloads. If you need one database with hundreds of gigabytes of data and thousands of writes per second, D1 is not the right tool. You want Postgres, or a distributed SQL system, or a purpose-built analytical store. D1 shines when you can partition your workload into many databases, each of which is modestly sized.
Write throughput per database is limited. Hundreds of writes per second sounds like a lot until you actually hit it. If you have a hot agent run that is generating thousands of tool call records per second, you will saturate the database. The answer is usually to batch writes through a Durable Object, but you have to design for it.
SQL dialect differences matter at the margins. D1 supports the SQLite dialect, which is close to but not identical to Postgres or MySQL. If you are migrating an existing application with a lot of vendor-specific SQL, expect some work to port queries.
Tooling is younger. The observability story for D1 and Workers is improving quickly but still lags a mature Postgres-plus-Kubernetes setup. You will be writing some of your own dashboards. You will be learning how to read the metrics the Workers platform exposes. You will be a little further out on the frontier.
Vendor concentration is real. Moving your stack into Cloudflare means you have one vendor for compute, storage, networking, CDN, and coordination primitives. That is a lot of eggs in one basket. For some teams that is a good trade — one bill, one SLA, one set of credentials, one integration surface. For other teams it is an unacceptable concentration risk. Know which one you are.
It does not look like what most of your engineers know. Most backend engineers have spent their careers building against a central SQL database behind an ORM, in a Kubernetes cluster, behind a load balancer. This architecture is different enough that they will need to rethink some habits. That is a cost. It is also, in our experience, a learning investment that pays back quickly.
Why This Matters For Agentic Apps Specifically
Pull back from the technical details and look at the overall shape of what we have described. We have a runtime that:
- Scales automatically with bursty, parallel workloads
- Charges proportionally to work actually done
- Provisions state with near-zero friction
- Colocates compute with storage
- Isolates tenants and runs by default
- Has no connection pools to saturate
- Has no cold starts to absorb
- Has no ops pager at 3 a.m. for routine workloads
- Produces archivable, portable artifacts for every run
- Runs globally at the edge with no regional pinning
Now read that list again and think about what a fleet of AI agents actually needs from its infrastructure. It is the same list. The match is not coincidental — it is the result of SQLite’s architecture being a better fit for the workload shape than anything else commercially available.
The irony is that we had to go back twenty-five years, to an embedded database that everyone thought was unserious, to find the primitive that fits the most modern workload we know how to build. That is a pattern we see often in this industry. The right tool was sitting in plain sight the whole time. Someone just had to put it in the right place.
D1 and Workers put SQLite in the right place.
When You Should Use This Stack
To be specific about it, here is when we reach for this architecture:
- You are building an agentic application where many agents are running concurrently and producing significant per-run state.
- Your workload is naturally partitioned by tenant, workspace, session, or run.
- You need low-latency reads and writes from globally distributed clients.
- Your write pattern is many small writes rather than a few huge ones.
- You want to ship and iterate quickly without spending weeks on infrastructure.
- You value per-run artifacts that can be archived, shared, and replayed.
- You are comfortable investing in the Cloudflare ecosystem.
And here is when we would not:
- Your workload is dominated by a single massive database with complex cross-table analytics.
- You need write throughput that exceeds a few hundred ops per second on a single logical dataset.
- You have hard regulatory requirements that prevent running on a specific vendor’s infrastructure.
- You have a large investment in Postgres-specific features that would be expensive to port.
- Your team is early and you need to optimize for the hiring market rather than for architecture fit.
Most agentic applications we are involved in today fall squarely in the first bucket. Your mileage may vary.
Where We Are Going
We are going to keep writing about this. There is a lot more to say about how we handle schema migrations across thousands of per-tenant databases, how we run backfills without affecting live agent runs, how we route requests to the right database with minimal overhead, and how we observe and debug these systems in production.
For now, if you are wrestling with the pain of running agentic workloads on a stack that was designed for human-scale web traffic, take a serious look at D1 and Workers. Start with a single agent run. Put its state in a dedicated D1 database. Coordinate it through a Durable Object. Ship the whole thing as a Worker. Measure the latency. Measure the cost. Compare it to what you have today.
We think you will find, as we did, that the SQLite renaissance is not a nostalgia trip. It is the most underrated piece of infrastructure in the business right now, and it is quietly becoming the right default for a whole class of applications that nobody had invented when it was written.
The future of backend infrastructure for agents is small databases, close to the compute, arranged in the millions. D1 and Workers happen to be the easiest way to get there today.

