Back to Blog

Why Filesystems Are the Natural Execution Substrate for AI Agents

A filesystem is the most natural execution substrate for AI agents because it mirrors how computation actually happens in practice. Learn how Duvo combines real filesystems with ephemeral, sandboxed environments to give agents deterministic state, native tool access, and the ability to chain complex multi-step workflows reliably.

Ondrej Romancov, Pavel Kral, Daniel Bukac
February 04, 2026 9 min read

Don't want to scroll? Summarize with AI

Why Filesystems Are the Natural Execution Substrate for AI Agents
11:23

A filesystem is the most natural execution substrate for agents because it mirrors how computation actually happens in practice. At Duvo, agents don't just call APIs or return text, they execute real programs. They run scripts, transform data, orchestrate multi-step workflows, read and write files, create directories, and produce intermediate artifacts that subsequent steps depend on.

Key Takeaways

  • Filesystems unlock tool-native reasoning - By giving agents a real filesystem, they can leverage decades of battle-tested developer tools (grep, git, Python scripts) and work with large datasets that far exceed context-window limits, enabling tighter feedback loops between generation, inspection, and execution.
  • Per-run sandboxing delivers security by construction - Every agent executes in its own isolated, ephemeral E2B sandbox that spins up in milliseconds and disappears after execution, eliminating shared-state risks and making horizontal scaling practical.
  • The workspace is the connective tissue for multi-step workflows - A structured /workspace directory holds reference documents, skills, memory, and tool outputs, letting agents chain complex operations—query a database, write results to a file, run analysis scripts, produce reports—just like a developer working in a project directory.

Why a Filesystem

This maps cleanly onto the assumptions behind modern developer and data tooling, where the filesystem is treated as a first-class primitive. Bash commands, CLIs, and scripting languages like Python are deeply represented in model training data, and coding agents are increasingly optimized around filesystem interaction. By giving agents a real filesystem, we let them stand on decades of battle-tested tools — grep, find, git, globbing, diffing — instead of forcing us to re-implement fragile, bespoke abstractions.

Crucially, native discovery via files often outperforms RAG-style retrieval when agents reason over generated artifacts. Rather than relying on embeddings or summaries, the agent can directly inspect the ground-truth state: what files exist, what changed, and how data is structured right now. This leads to more reliable reasoning, fewer hallucinations, and simpler mental models.

Another major advantage is low-latency local I/O. Agents can work with large files like CSVs, Parquet, Arrow, logs and exports, that far exceed context-window limits, yet remain immediately accessible to code and tools. This creates a tight feedback loop where generation, inspection, and execution reinforce each other.

Taken together, a real filesystem gives agents a deterministic, inspectable state across steps. That determinism materially improves reliability, debuggability, and auditability in production workflows — which is a prerequisite for agents that are expected to do real work consistently and reliably.

Security

A filesystem only becomes viable for agents when it is fully sandboxed. In Duvo, every agent works inside its own isolated sandbox backed by E2B — not just a logical namespace, but a hard isolation boundary. Agents cannot access Duvo's infrastructure, other tenants' data, or even other runs of themselves.

This gives us security by construction. If an agent executes destructive, buggy, or even malicious code, the blast radius is strictly limited to its own environment. There's no shared state to corrupt and no implicit trust boundary to cross.

Sandboxes are intentionally ephemeral by default. They are created per run and disappear after execution unless the state is explicitly persisted. This eliminates lingering state and cleanup complexity. Default ephemerality also reduces data-retention risk: nothing survives unless we make a deliberate decision for it to do so.

Just as importantly, sandbox startup is fast. We're talking tens of milliseconds. This makes per-run isolation practical at scale. Hundreds of agent runs can execute in parallel without long warm-up times or complex capacity planning.

Features like direct shell execution, programmatic filesystem access, pause/resume for long-running agents, and presigned URLs for large file transfers let agents move data efficiently without routing everything through Duvo's backend.

The result is a system designed for bursty, ephemeral workloads: reproducible, isolated, and horizontally scalable.

Filesystem Persistence: Keeping the Workspace Alive For Followups

A micro-VM can be further useful as it can outlive a single interaction. At Duvo, agents are empowered to work autonomously but may still ask for help or approvals from human counterparts. Users can enter the loop from services like Slack, Microsoft Teams and other places where your company already operates. That's why Duvo sandboxes aren't always strictly ephemeral, they're semi-persistent. When an agent finishes a task or waits for user input, the VM is temporarily paused rather than destroyed.

The entire filesystem freezes in place, preserving:

  • Scripts the agent wrote
  • Data files it downloaded
  • Full agent trajectory

When the user sends a follow-up message or responds to a human in the loop request, Duvo reconnects to the same paused sandbox, and the agent picks up exactly where it left off, with every file still on disk. This makes multi-turn workflows feel continuous rather than starting from scratch each time. When a session resumes, the agent's state, conversation history, and todo list is reconstructed entirely from the sandbox itself, no external database required. The filesystem serves as the canonical source of truth for the exact time necessary and not a minute longer.

How Duvo Organizes the Agent Filesystem

At the heart of that VM is a single directory: /workspace.

Everything the agent needs: reference documents, skills, memory and tool outputs lives here, giving it a predictable environment where it can read, write, and reason about files the same way a developer works in a project directory.

Before work begins, Duvo populates the workspace:

  • Reference Documents: Uploaded files like policy manuals, product specs, and CSV datasets are placed on disk. Files live here rather than inline in the prompt, so a 500-page manual costs zero tokens until the agent actually opens the relevant section.
  • Skills: Structured instructions encoding domain knowledge or workflows are placed into the workspace. Multiple skills can be loaded simultaneously, each in its own directory.
  • Memory: Accumulated memory from previous runs is written to a markdown file at the workspace root — following the pattern established by AI coding agents like Claude Code, Gemini CLI, and OpenAI Codex, which use dedicated files (CLAUDE.md, GEMINI.md, AGENTS.md) to persist instructions and context across sessions.

The workspace is also where work product resides:

  • Tool Outputs: Query results land as CSVs, exported Google Sheets become structured files, and browser screenshots are saved to disk.
  • Agent Artifacts: The agent creates its own files here, such as Python scripts, JSON files, markdown reports, and generated images.

This makes the filesystem the connective tissue for chaining operations: query a database, write results to a file, run a Python script against it, produce a chart — all within /workspace. The agent can iterate on its work the same way a human developer does: draft a script, run it, inspect the output, revise, and repeat.

The Workspace in Action: Real-World Examples

To see how this plays out, here are three patterns we use daily.

Data Analysis: Turning Query Results into Actionable Insights

When an AI agent runs a database query, the results can be massive: thousands or even millions of rows. Dumping all of that into the agent's conversation would overwhelm its context window and waste precious tokens on raw data the agent can't meaningfully process inline. Instead, Duvo writes query results directly to the agent's workspace as files. The agent receives only a compact summary that includes the row count, column names, file location and a hint on how to load the data. From there, the agent can write its own Python scripts to slice, filter, aggregate, and visualize the dataset however it needs. This turns the agent from a passive reader of data into an active analyst: it can iterate on its analysis, try different approaches, and produce charts or reports, all without ever clogging its thinking space with raw rows.

Google Sheets: Understanding Structure, Not Just Values

A spreadsheet is more than its cell values. Formulas encode business logic where as formatting communicates meaning. Bold headers, color-coded statuses, currency symbols. To give the agent a full picture, Duvo exports each sheet as three separate CSV files: one for the displayed data, one for the underlying formulas, and one for the cell formatting. This three-layer representation lets the agent truly understand a spreadsheet the way a human would. It can see that column D isn't just numbers — it's a SUM formula. It can notice that red-highlighted rows signal overdue items. When the agent needs to update the sheet, it can modify the right layer: updating a formula rather than overwriting it with a static value, or preserving formatting that carries meaning. The files live in the agent's workspace, so it can read, compare, and manipulate them with standard tools before pushing changes back.

Browser Screenshots: Bridging the Visual and Digital Worlds

When an AI agent browses the web, it sometimes encounters information that only exists visually — a QR code on a payment page, a CAPTCHA-protected download link, or a chart that needs to be shared. Duvo saves browser screenshots to the agent's filesystem, making them first-class artifacts the agent can work with. This is more powerful than just "seeing" a page: because the screenshot is a real file, the agent can upload it to other systems, attach it to emails, pass it through image processing, or store it for documentation. A practical example: the agent navigates to a vendor portal, captures a QR code displayed on screen, and then uploads that image to an internal payment system — a workflow that bridges the visual web with programmatic action, all through the filesystem.

Summary

The filesystem isn't just a storage layer. It's the foundation that makes reliable agent execution possible.

By treating the filesystem as a first-class primitive, Duvo agents gain three critical capabilities:

  • Deterministic state — Every file, script, and artifact is inspectable and reproducible. No hidden state, no magic. The workspace is the source of truth.
  • Native tooling — Agents leverage the same battle-tested tools developers use daily (grep, git, Python) rather than fragile abstractions built from scratch.
  • Composable workflows — Complex operations chain naturally: query data, write to disk, run analysis, generate reports. Each step builds on the last, just like working in a real project directory.

Combined with millisecond-startup sandboxes and semi-persistent pause/resume, this architecture delivers agents that don't just produce plausible output — they do real, verifiable work.

Like what you read? Share with a friend

Ondrej Romancov, Pavel Kral, Daniel Bukac

Related Articles