
In November 2025, Austrian developer Peter Steinberger – previously known as the founder of PSPDFKit – released OpenClaw, a free, open-source autonomous AI agent that connects to your messaging platforms, uses LLMs, and executes real-world tasks on your behalf. By early 2026, it had exploded to over 135,000 GitHub stars. In February 2026, Steinberger announced he was joining OpenAI.

And by March 2026, OpenClaw had become one of the most significant security incidents in the history of consumer-facing AI.

This is not a hit piece on Steinberger or the OpenClaw project. The engineering is impressive. The vision of a personal AI agent that can manage your calendar, control your smart home, triage your messages, and automate repetitive tasks across 50+ integrations is genuinely compelling. The problem is not ambition. The problem is architecture. And the consequences of getting the architecture wrong, when you are giving an AI agent near-total access to a user’s operating system and communications infrastructure, are catastrophic.

At CONFLICT, we have been building agentic AI systems for over two years. We wrote about zero trust for AI workloads last month because this exact class of problem keeps us up at night. OpenClaw is a case study in everything we have been warning about – and the speed at which it unraveled makes the lessons impossible to ignore.

What OpenClaw Does (And Why People Loved It)

OpenClaw connects to WhatsApp, Telegram, Slack, Discord, Signal, iMessage, and Microsoft Teams. You message it like you would message a person. It uses large language models to understand your intent and then executes tasks using over 100 built-in skills and 50+ integrations. Need to check your calendar, draft an email, adjust your thermostat, query a database, and summarize a PDF? One message. OpenClaw handles it.

The skill system is extensible through ClawHub, a marketplace where developers publish and share new capabilities. Want your agent to manage your Spotify playlists, monitor your crypto portfolio, or automate your Shopify store? There is probably a ClawHub skill for that.

The appeal is obvious. This is the AI assistant that every demo has promised since GPT-3. The fact that it is open-source, self-hosted, and free made it irresistible to developers and power users. One hundred thirty-five thousand GitHub stars in a few months is not hype. That is genuine demand for a product that solves a real problem.

The trouble is that solving the product problem without solving the security problem first is like building a car with an incredible engine and no brakes.

The Security Meltdown: A Timeline

The security issues with OpenClaw did not emerge gradually. They came in waves, each more alarming than the last, and they came from every direction – the skill marketplace, the core application, the deployment model, and the agent’s interaction with the broader internet.

ClawHub: The Malware Marketplace

ClawHub is OpenClaw’s equivalent of npm, PyPI, or the Chrome Web Store – a community-driven marketplace where anyone can publish skills that extend the agent’s capabilities. Like those platforms, it became a malware delivery channel almost immediately.

In early 2026, Snyk published the results of a comprehensive audit they called the ToxicSkills study. Their team analyzed 3,984 skills published on ClawHub. The findings were staggering:

  • Nearly 1 in 3 packages (over 1,184 of the 3,984 analyzed) were malicious. These were not borderline cases. They contained keyloggers, backdoors, infostealers, and remote access trojans (RATs).
  • 127+ skills requested raw secrets – private keys, Stripe API keys, Azure secrets, password-manager master passwords – instead of using OAuth or any other standard credential delegation mechanism.
  • 36% of skills contained prompt injection vulnerabilities that could cause the agent to execute unintended actions.

To put this in perspective: the worst supply chain security numbers in the npm ecosystem have never come close to a 30% malware rate. The Python Package Index has had incidents, but nothing at this scale relative to the total package count. ClawHub achieved in months what took other ecosystems years to accumulate, and it did so because the barrier to entry was even lower. A ClawHub skill is not a traditional software package. It is a set of instructions and configurations that an AI agent follows. Writing a malicious skill does not require sophisticated software engineering. It requires understanding how to manipulate an LLM.

Critical Vulnerabilities in the Core

The ClawHub problems would be bad enough in isolation. But the core OpenClaw application had its own critical vulnerabilities.

CVE-2026-30741, documented by SentinelOne, was a remote code execution vulnerability triggered through prompt injection. A carefully crafted message sent to an OpenClaw instance – through any of its connected messaging platforms – could execute arbitrary code on the host machine. Not code within a sandbox. Code on the actual operating system. If your OpenClaw instance was running on your laptop, an attacker who knew your Telegram handle could potentially gain full access to your machine.

The ClawJacked vulnerability, reported by The Hacker News, was arguably worse in its subtlety. A malicious website could hijack a local OpenClaw instance via WebSocket, without requiring any plugins, extensions, or user interaction beyond visiting the page. Browse a compromised website while OpenClaw is running locally, and the website can issue commands to your agent. Your agent, which has access to your messaging platforms, your email, your smart home, and whatever other integrations you have configured.
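The standard defense against this class of drive-by attack is origin validation: a browser attaches the requesting page's Origin to every WebSocket handshake, and page scripts cannot forge it. A minimal sketch of such a check – the function name and allow-list are our illustration, not OpenClaw's actual API:

```python
# Illustrative only: reject cross-origin WebSocket upgrades to a local agent.
ALLOWED_ORIGINS = {"http://localhost:8080", "http://127.0.0.1:8080"}

def is_authorized_upgrade(headers: dict) -> bool:
    """Accept the handshake only if it carries a known-good Origin.

    Browsers attach the page's Origin to every WebSocket handshake and
    scripts cannot forge it, so a drive-by website cannot masquerade
    as the agent's own local UI.
    """
    return headers.get("Origin") in ALLOWED_ORIGINS
```

A handshake from a compromised page carries that page's origin and is refused; only the agent's own UI passes. The fact that this well-known check was missing is what made "just visit a website" a complete attack.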

Exposed Instances at Scale

Between January 27 and February 8, 2026, Bitsight identified over 30,000 OpenClaw instances exposed directly to the internet. Not behind a VPN. Not behind authentication. Exposed. These instances were accessible to anyone who found them, and finding them was trivial with standard internet scanning tools.

Thirty thousand instances, each potentially connected to the operator’s messaging platforms, smart home devices, and business tools. Each running with whatever permissions the user configured, which – given the documentation’s emphasis on ease of setup over security hardening – often meant broad permissions on the host system.
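The gap between "local tool" and "internet-exposed service" is often a single bind address in a config file, which is why mass exposure like this happens so easily. A small illustration of the difference:

```python
import socket

# Illustrative only: the same server code is local-only or world-reachable
# depending on one argument.
exposed = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
exposed.bind(("0.0.0.0", 0))        # reachable on every network interface
exposed_addr = exposed.getsockname()[0]

local = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
local.bind(("127.0.0.1", 0))        # reachable only from this machine
local_addr = local.getsockname()[0]

exposed.close()
local.close()
```

Defaulting to the loopback interface, and requiring a deliberate step (plus authentication) to expose the service more widely, is the conventional safe default that these 30,000 instances lacked.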

Industry Response

The security community’s response was comprehensive and unambiguous. Major analyses were published by CrowdStrike, Trend Micro, Cisco, Microsoft, Kaspersky, Malwarebytes, SentinelOne, and Snyk. When that many security firms independently flag the same product, it is not a difference of opinion. It is a consensus.

In March 2026, China restricted state agencies from using OpenClaw, citing security concerns. Whatever your politics, when a nation-state’s security apparatus decides a consumer tool is too dangerous for government employees, that is a signal worth paying attention to.

The Architectural Failure

It would be easy to frame this as a series of bugs that need to be patched. Fix the RCE. Fix the WebSocket hijacking. Add review processes to ClawHub. Problem solved, ship the next version.

That framing misses the point entirely. The security failures in OpenClaw are not bugs. They are consequences of architectural decisions that are fundamentally incompatible with secure operation.

Problem 1: Unrestricted OS Access

OpenClaw runs as a process on the host operating system with the permissions of the user who launched it. It can read files, execute commands, make network requests, and interact with other applications. This is by design – the agent needs broad capabilities to be useful.

But “useful” and “secure” are in direct tension here. A personal AI agent needs to do things on your behalf. That means it needs access. The question is how you scope and control that access. OpenClaw’s answer is: you don’t, really. The agent runs with your permissions, and trust is assumed.

This is the equivalent of giving every npm package you install full access to your filesystem, network, and running processes. We know that is a terrible idea. We have spent years building sandboxing, permission systems, and isolation mechanisms to prevent exactly that scenario in other software. OpenClaw discarded all of that institutional knowledge.

Problem 2: Unvetted Skill Marketplace

ClawHub has no meaningful security review process. Skills can be published by anyone, and they execute within the agent’s permission context – which, as established, is the user’s full operating system access.

This is npm’s supply chain problem, amplified by the fact that these “packages” are consumed and executed by an LLM that can be manipulated through prompt injection. A traditional malicious npm package needs to execute code during installation or import. A malicious ClawHub skill just needs to include instructions that the LLM follows. The attack surface is fundamentally larger.

The Snyk ToxicSkills study found that 127+ skills requested raw secrets – master passwords, private keys, API keys – instead of using OAuth or similar delegation mechanisms. In a traditional software marketplace, this would be a red flag caught by automated scanning or manual review. In ClawHub, there was no review to catch it. And the request for raw secrets was not a bug in those skills. It was the intended behavior. The skills were designed to harvest credentials.
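The kind of automated scan that catches this is not exotic. A sketch of a first-pass filter a marketplace review pipeline could run over skill instructions – the patterns are illustrative and a real scanner would be far more thorough:

```python
import re

# Illustrative patterns for instructions that ask users for raw credentials
# instead of delegated (OAuth-style) access.
RAW_SECRET_PATTERNS = [
    re.compile(r"private\s+key", re.IGNORECASE),
    re.compile(r"master\s+password", re.IGNORECASE),
    re.compile(r"sk_live_"),  # Stripe live secret key prefix
    re.compile(r"paste\s+your\s+api\s+key", re.IGNORECASE),
]

def flags_raw_secret_request(skill_text: str) -> bool:
    """Flag a skill whose instructions solicit raw secrets from the user."""
    return any(p.search(skill_text) for p in RAW_SECRET_PATTERNS)
```

Even this crude check would have flagged the 127+ credential-harvesting skills; the point is not that the scan is hard, but that no scan existed.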

Problem 3: Trust Model Inversion

Secure systems follow a principle: trust is earned, not assumed. New code, new users, new connections start with minimal trust and gain more as they prove safe. OpenClaw inverts this model. The agent starts with maximum trust (full OS access, all messaging platforms, all integrations) and there is no mechanism to reduce it.

When you connect OpenClaw to your WhatsApp, Telegram, and Slack simultaneously, you are creating a single point of compromise that spans your entire communications infrastructure. A successful attack on the agent – through a malicious skill, a prompt injection, or a vulnerability like ClawJacked – gives the attacker access to all of it.

Problem 4: No Execution Boundary

There is no meaningful boundary between what the agent is asked to do and what it can do. If a prompt injection causes the agent to execute a shell command, that command runs with full user permissions. There is no sandbox, no policy engine, no approval step for high-risk actions.

Compare this to how we build agentic systems at CONFLICT. Every agent we deploy operates within explicit execution boundaries. It has a defined set of actions it can take, a policy engine that evaluates whether each action is authorized, and a human-in-the-loop checkpoint for any action that crosses a risk threshold. The agent cannot escalate its own permissions. It cannot perform actions outside its scope even if the underlying LLM is convinced it should.
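The boundary described above can be sketched in a few lines: an explicit allow-list of actions plus a risk threshold that forces escalation to a human. The class and field names here are illustrative, not a real CONFLICT API:

```python
from dataclasses import dataclass

# Minimal sketch of an execution boundary: scope check first, then a
# risk threshold that routes borderline actions to a human reviewer.
@dataclass
class AgentPolicy:
    allowed_actions: set
    risk_threshold: float = 0.5  # actions at or above this need approval

    def evaluate(self, action: str, risk: float) -> str:
        if action not in self.allowed_actions:
            return "deny"        # out of scope: hard refusal
        if risk >= self.risk_threshold:
            return "escalate"    # in scope but risky: human-in-the-loop
        return "allow"

policy = AgentPolicy(allowed_actions={"read_calendar", "create_event"})
```

The crucial property is that the policy is enforced outside the LLM: a prompt injection can convince the model to request `run_shell_command`, but the request still lands in the "deny" branch.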

What This Means for the Industry

OpenClaw is not an isolated incident. It is a preview of what happens when the agentic AI paradigm collides with the real world’s security requirements. The same architectural flaws – broad permissions, unvetted extensions, no execution boundaries, inverted trust models – will appear in every agentic system that prioritizes capability over security.

The Supply Chain Problem Is Getting Worse

Software supply chain security has been a growing concern for years. SolarWinds, Log4Shell, the xz utils backdoor – each incident demonstrated that modern software depends on a web of dependencies that are difficult to audit and easy to compromise.

Agentic AI makes this worse in two ways. First, the “dependencies” are no longer just code libraries. They are skills, plugins, and tool integrations that execute with the agent’s full permissions. Second, the attack vector is not just malicious code. It is prompt injection – manipulating the LLM into taking unintended actions. Prompt injection is easier to execute, harder to detect, and harder to defend against than traditional code-based attacks.

The 36% prompt injection vulnerability rate in ClawHub skills is a number that should alarm anyone building or deploying agentic systems. More than a third of the skills on the platform could be weaponized to make the agent do things the user did not intend. And these were not sophisticated attacks. They were straightforward prompt injections that a basic security review would have caught.

Consumer AI Agents Need Industrial Security

There is a temptation to treat consumer AI agents as a different category from enterprise AI systems. They are personal tools, running on personal machines, for personal use. Surely they do not need the same security rigor as a system processing financial transactions or managing critical infrastructure.

This is wrong. A consumer AI agent that is connected to your messaging platforms, your email, your smart home, and your financial accounts has access to your entire digital life. Compromising it is equivalent to compromising you. The security requirements are not lower because the context is personal. They are arguably higher because the blast radius includes everything.

The Regulatory Signal

China’s decision to restrict OpenClaw for state agencies is the first, but it will not be the last regulatory response. The EU’s AI Act already includes provisions for AI system security. The NIST AI Risk Management Framework addresses agent safety and security. As agentic AI systems proliferate and incidents accumulate, regulatory pressure will increase.

Organizations deploying agentic systems today should be planning for a regulatory environment that will eventually require the kinds of security controls that OpenClaw lacks: auditable decision logs, permission scoping, execution boundaries, and human oversight for high-risk actions.

What We Would Do Differently

We are not armchair critics. We build agentic systems for production use, and we have grappled with every one of these problems. Our approach, which we detailed in our zero trust for AI workloads post, is built on principles that directly address the failures we see in OpenClaw.

Scoped Permissions, Not Full Access

Every agent we build operates with the minimum permissions required for its specific task. An agent that manages calendar scheduling does not have access to the filesystem. An agent that processes documents does not have access to messaging platforms. Permissions are task-scoped, time-limited, and revocable.

This is not optional. It is the first principle of any secure system. The fact that an AI agent “might need” broad access is not a reason to grant it. Humans “might need” root access to production databases. We do not give it to them by default.

Controlled Execution Environments

Agents run in sandboxed environments with explicit boundaries. System calls are restricted. Network access is controlled. File system access is limited to specific directories. These constraints are enforced by the infrastructure, not by trusting the agent to police itself.

When an agent needs to execute code – which many agentic workflows require – that execution happens in an isolated container with no network access, no persistent storage, and a time limit. The results are validated before being returned to the agent’s main workflow.
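The container isolation described above (no network, no persistent storage) cannot be shown in a few lines, but the time-limit element can. A sketch, assuming the simplest mechanism of running untrusted code in a child process and killing it on overrun:

```python
import subprocess
import sys

def run_with_budget(code: str, timeout_s: float = 2.0) -> str:
    """Execute a code snippet in a child interpreter with a hard time limit.

    Real deployments layer network and filesystem isolation (containers,
    seccomp) on top of this; the process boundary and timeout are only
    one piece of the sandbox.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
        return result.stdout.strip()
    except subprocess.TimeoutExpired:
        return "KILLED: exceeded time budget"
```

The key design choice is that the constraint lives in the infrastructure: an infinite loop or a stalling exfiltration attempt is killed by the supervisor, not by asking the agent to stop.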

Vetted and Audited Extensions

We do not run a public skill marketplace, and we would not use one without a rigorous vetting process. Every tool, integration, and capability that an agent can use is reviewed for security implications before it is made available. This includes reviewing the prompts and instructions that tools use, not just the code, because prompt injection is a tool-level concern.

Human-in-the-Loop for High-Risk Actions

Any action that crosses a defined risk threshold – sending a message, making a financial transaction, modifying infrastructure, accessing sensitive data – requires human approval. The agent can prepare the action, explain its reasoning, and request authorization. It cannot proceed without it.

This is not a limitation. It is a feature. An AI agent that can autonomously send messages from your accounts, modify your infrastructure, and access your financial data without human approval is not a productivity tool. It is a liability.
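The checkpoint described above can be sketched as a small queue: the agent stages an action with its reasoning, and execution is gated on an explicit human decision that is also recorded for the audit trail. Names are illustrative:

```python
# Illustrative human-in-the-loop gate: nothing runs until a decision lands.
class ApprovalQueue:
    def __init__(self):
        self.pending = []
        self.log = []

    def propose(self, action: str, reasoning: str) -> int:
        """Agent side: stage an action and its rationale; nothing runs yet."""
        self.pending.append({"action": action, "reasoning": reasoning})
        return len(self.pending) - 1

    def decide(self, ticket: int, approved: bool) -> bool:
        """Human side: every decision is recorded for later audit."""
        item = self.pending[ticket]
        self.log.append((item["action"], approved))
        return approved

queue = ApprovalQueue()
ticket = queue.propose("send_message", "Reply to Alice confirming 3pm")
```

Because the decision log is written by the gate rather than the agent, it can later reconstruct what the agent attempted and why, regardless of what the model "believed" at the time.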

The Uncomfortable Truth

OpenClaw is popular because it works. It does what it promises – it connects your communication platforms to an LLM and automates tasks across your digital life. The 135,000+ GitHub stars represent genuine user value.

But it works the way a building without fire exits works. Everything is fine until it isn’t, and when it isn’t fine, the consequences are severe. Over 1,184 malicious skills on ClawHub. A critical RCE exploitable through prompt injection. Thirty thousand instances exposed to the internet. WebSocket hijacking without user interaction.

The agentic AI paradigm is not going away. The demand for AI agents that can take actions on our behalf is real and growing. But the way we build and deploy these agents has to change. The OpenClaw crisis is the fire that should trigger the code review.

Every organization that is building or deploying agentic AI systems should be asking: do our agents have scoped permissions or blanket access? Do we vet the tools and integrations our agents use? Do we have execution boundaries that prevent prompt injection from becoming code execution? Do we have human oversight for high-risk actions? Do we have audit trails that can reconstruct what an agent did and why?

If the answer to any of those questions is “not yet,” the answer needs to become “now.” OpenClaw has demonstrated, at scale, what happens when it stays “not yet” for too long.