OpenClaw Security Report

June 2026 · static analysis of 63,697 skills · By 4Worlds

Executive Summary

We scanned every skill in the OpenClaw registry — 63,697 in total — using ClawAudit's static analysis engine. Start with what the code can do, measured directly: 12.8% (1 in 8) can read environment variables — where apps keep API keys and tokens — and 6.2% can both read environment variables and make outbound network calls, the shape of data exfiltration (co-occurring capabilities, not verified data-flow). A much smaller 0.4% reach dedicated credential stores like SSH keys or AWS configs — the distinction between "can read an env var" and "can read your secrets" matters, and we count them separately.

Layering ClawAudit's risk heuristics on top of those capabilities, 0.3% of skills land in the Dangerous tier and only 89% in Trusted, with an average trust score of 84.3/100, which sits in the lower part of the Trusted band (80-100). Read these tier percentages as an automated triage signal, not a precise finding: they come from pattern-matching heuristics that over-flag substantially on the Risky and Dangerous bands (see Methodology and Limitations). A deep-scan review layer is rolling out to correct that, and only a subset of the corpus has been deep-scanned so far. What the numbers do support is the takeaway that the typical skill warrants manual review before installation.

Methodology

ClawAudit performs zone-aware static analysis on SKILL.md files. It parses the markdown structure, classifies content zones (prose, code blocks, YAML frontmatter, headings), and applies 60+ detection patterns weighted by zone context. Code blocks are treated as executable instructions and weighted higher than prose descriptions. Matches inside security documentation (sections that describe threats as warnings) are suppressed to avoid false positives.
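To make the idea of zone-weighted matching concrete, here is a minimal sketch, not ClawAudit's actual engine. The zone weights, the single `ENV_READ` pattern, and the function names are all assumptions chosen for illustration; the real analyzer applies 60+ patterns and also suppresses matches in security documentation, which this sketch omits.

```python
import re
from typing import Iterator, Tuple

# Markdown fence marker, built indirectly so this example can sit inside a fence.
FENCE = "`" * 3

# Hypothetical weights: illustrative values only, not ClawAudit's.
ZONE_WEIGHTS = {"code": 1.0, "frontmatter": 0.8, "heading": 0.5, "prose": 0.3}

# Hypothetical detection pattern for environment-variable reads.
ENV_READ = re.compile(r"os\.environ|process\.env")


def classify_zones(markdown: str) -> Iterator[Tuple[str, str]]:
    """Yield (zone, line) pairs for a SKILL.md-style document."""
    in_code = False
    in_frontmatter = False
    for index, line in enumerate(markdown.splitlines()):
        stripped = line.strip()
        # YAML frontmatter opens with '---' on the very first line.
        if index == 0 and stripped == "---":
            in_frontmatter = True
            continue
        if in_frontmatter:
            if stripped == "---":
                in_frontmatter = False
            else:
                yield "frontmatter", line
            continue
        # A fence line toggles code mode and is not itself content.
        if stripped.startswith(FENCE):
            in_code = not in_code
            continue
        if in_code:
            yield "code", line
        elif stripped.startswith("#"):
            yield "heading", line
        elif stripped:
            yield "prose", line


def weighted_hits(markdown: str) -> float:
    """Sum the zone weight of every line that matches the pattern."""
    return sum(
        ZONE_WEIGHTS[zone]
        for zone, line in classify_zones(markdown)
        if ENV_READ.search(line)
    )
```

Under these assumed weights, the same `os.environ` match counts roughly three times as much inside a code block as in a prose sentence, which is the behavior the paragraph above describes.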

Each skill receives a trust score from 0 to 100 based on the severity and quantity of findings, positive trust signals (version numbers, documentation, metadata), and the presence of compound threats (e.g., file read + network out = potential data exfiltration).
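The scoring inputs named above (finding severity and count, positive trust signals, compound threats) can be combined in many ways. The following is one possible sketch under stated assumptions: every weight, the `SkillFindings` type, and the `trust_score` function are hypothetical and are not ClawAudit's published formula. Only the compound pair (file read plus network out) comes from the text.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical penalty per finding, by severity (illustrative values only).
SEVERITY_PENALTY = {"critical": 25, "high": 10, "medium": 4, "low": 1}
# Hypothetical bonus per positive trust signal (version, docs, metadata).
SIGNAL_BONUS = 2
# Hypothetical extra penalty when a compound threat is present.
COMPOUND_PENALTY = 15
# The one compound pair named in the report.
COMPOUND_PAIRS = [({"file_read", "network_out"}, "potential data exfiltration")]


@dataclass
class SkillFindings:
    severities: List[str] = field(default_factory=list)
    capabilities: Set[str] = field(default_factory=set)
    trust_signals: int = 0


def trust_score(skill: SkillFindings) -> int:
    """Return a score clamped to the 0-100 range."""
    score = 100
    score -= sum(SEVERITY_PENALTY[s] for s in skill.severities)
    score += SIGNAL_BONUS * skill.trust_signals
    for required, _label in COMPOUND_PAIRS:
        # Subset test: every capability in the pair is present.
        if required <= skill.capabilities:
            score -= COMPOUND_PENALTY
    return max(0, min(100, score))
```

With these assumed weights, a skill with one high-severity finding, one trust signal, and both `file_read` and `network_out` would score 100 - 10 + 2 - 15 = 77.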

Trust tiers:

  • Trusted (80-100): No significant issues. 56,695 skills (89%)
  • Caution (60-79): Minor concerns, review recommended. 294 skills (0.5%)
  • Risky (40-59): Significant issues found. 6,504 skills (10.2%)
  • Dangerous (0-39): Critical findings flagged. 204 skills (0.3%)
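The tier boundaries and counts above can be checked mechanically. This is a small sketch: the band floors and counts are taken from the list above, while the names `tier_for` and `tier_shares` are illustrative and not part of any ClawAudit API.

```python
# Band floors and skill counts, copied from the tier list above.
TIERS = [(80, "Trusted"), (60, "Caution"), (40, "Risky"), (0, "Dangerous")]
TIER_COUNTS = {"Trusted": 56_695, "Caution": 294, "Risky": 6_504, "Dangerous": 204}
TOTAL_SKILLS = 63_697


def tier_for(score: float) -> str:
    """Map a 0-100 trust score to its tier name."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for floor, name in TIERS:
        if score >= floor:
            return name
    raise AssertionError("unreachable: the lowest floor is 0")


def tier_shares() -> dict:
    """Percentage of the registry in each tier, rounded to one decimal."""
    return {
        name: round(100 * count / TOTAL_SKILLS, 1)
        for name, count in TIER_COUNTS.items()
    }


# The four tier counts account for every scanned skill.
assert sum(TIER_COUNTS.values()) == TOTAL_SKILLS
```

Calling `tier_shares()` reproduces the percentages quoted in the list, which is a quick consistency check on the counts.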

Findings by the Numbers

  • Critical findings: 9,713
  • High severity: 29,168
  • Total findings: 91,335

Across the entire registry, we flagged 91,335 security findings. Of these, 9,713 are critical severity — patterns like credential access from environment variables, obfuscated eval chains, or direct prompt-injection language.

Capability Landscape

Understanding what capabilities skills request reveals the attack surface of the ecosystem. The most common capabilities are:

  • credential_access: 12.8%
  • network_out: 11.7%
  • package_install: 11.4%
  • network_in: 9.7%
  • data_encoding: 3.3%
  • file_read: 2.8%
  • agent_memory: 2.8%
  • file_write: 1.8%
  • process_exec: 1.2%
  • dynamic_eval: 0.8%

12.8% of skills have credential_access capabilities. When read access (to files or credentials) combines with outbound network access, it creates a potential exfiltration channel, and 7,424 skills (11.7%) have outbound network capabilities.
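Counting co-occurring capabilities is a set-membership check, not a data-flow analysis. The sketch below shows the idea on a tiny hypothetical sample: the skill names and the `count_cooccurring` helper are invented for illustration, and only the capability labels come from the list above.

```python
from typing import Dict, Iterable, Set


def count_cooccurring(skills: Dict[str, Set[str]], required: Iterable[str]) -> int:
    """Count skills whose capability set contains every required capability."""
    needed = set(required)
    return sum(1 for capabilities in skills.values() if needed <= capabilities)


# Hypothetical sample registry, not real scan data.
sample = {
    "weather-lookup": {"network_out"},
    "env-printer": {"credential_access"},
    "sync-helper": {"credential_access", "network_out", "file_read"},
    "note-taker": {"file_write", "agent_memory"},
}

both = count_cooccurring(sample, ["credential_access", "network_out"])
share = 100 * both / len(sample)
```

In this sample only `sync-helper` has both capabilities, so `both` is 1 and `share` is 25.0. A match here says the two capabilities appear in the same skill; it does not show that a secret actually reaches the network.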

Common Threat Patterns

Environment-variable access

8,177 skills (12.8%) read environment variables. That's where apps keep API keys and tokens — but also ordinary config like NODE_ENV and feature flags. So the honest read is "can read env vars," not "reads your secrets": most of this is benign, and the volume means the ecosystem normalizes env-var access, making genuinely malicious reads harder to spot.

A far smaller 0.4% (239 skills) touch dedicated credential stores — SSH keys, AWS/GCP/Azure configs, the OS keychain. That is the capability that actually means "can read your secrets," and we count it separately from broad env-var access on purpose. Conflating the two is how a scanner overstates its own findings.
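One way to keep the two capabilities apart is to match them with separate patterns and report them under separate labels. The patterns and label names below are assumptions for illustration and are far narrower than a production rule set; they are not ClawAudit's detection patterns.

```python
import re
from typing import Set

# Hypothetical pattern: broad environment-variable reads (Python and Node.js idioms).
ENV_VAR_READ = re.compile(r"os\.environ|os\.getenv|process\.env")

# Hypothetical pattern: dedicated credential stores (SSH keys, cloud configs,
# and the macOS keychain CLI).
CREDENTIAL_STORE = re.compile(
    r"~/\.ssh/|\.aws/credentials|\.config/gcloud|\.azure/"
    r"|\bsecurity\s+find-generic-password\b"
)


def classify_access(snippet: str) -> Set[str]:
    """Return the access labels that apply to a code snippet."""
    found = set()
    if ENV_VAR_READ.search(snippet):
        found.add("env_var_access")
    if CREDENTIAL_STORE.search(snippet):
        found.add("credential_store_access")
    return found
```

For example, `classify_access('os.getenv("NODE_ENV")')` yields only `env_var_access`, while `classify_access("cat ~/.ssh/id_rsa")` yields only `credential_store_access`, so the two counts never inflate each other.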

Package Installation

7,230 skills install packages at runtime. This is a supply chain risk — a compromised dependency could execute arbitrary code during installation. Skills that install packages and have network access create a particularly dangerous combination.

Prompt Injection

We flagged prompt-injection patterns — language that attempts to override agent instructions, manipulate system prompts, or hijack agent behavior — in hundreds of skills. Some are benign (security tools demonstrating attacks), but many appear in unexpected contexts.

Recommendations

  1. Audit before installing. Use ClawAudit or similar tooling to check skills before adding them to your agent. A 30-second scan can flag a credential-stealing skill before you install it.
  2. Review credential requirements. If a skill asks for API keys, verify it actually needs them. Overprivileged skills are a red flag.
  3. Watch for compound threats. A skill that reads files and makes network requests could be exfiltrating data. Individual capabilities are fine; certain combinations are not.
  4. Sandbox untrusted skills. Run skills with minimal permissions. Don't give file system or network access unless required.
  5. Registry-level gatekeeping. OpenClaw should consider automated security scanning as part of the skill submission process.

Limitations

ClawAudit's tier scores come from a static analyzer — it reads SKILL.md files and applies pattern matching. It cannot execute code, trace data flows, or detect novel obfuscation techniques. That pattern-matching over-flags substantially on the flagged tiers: when we checked the regex verdicts against a deep scan, a majority of the regex-Dangerous verdicts did not survive — they were over-calls, not confirmed threats. We are correcting this with a two-layer, deep-scanned verdict (the capability measurements stay deterministic; the tier judgment gets a deeper second pass), and we mark deep-scanned verdicts as such. Until a skill is deep-scanned, treat its tier as a triage flag, not a finding. False negatives are also possible — for highly obfuscated or novel attack vectors, and for prose-based risks that leave no detectable code pattern.
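The distinction between a triage flag and a deep-scanned verdict can be carried in the data itself. This is a hypothetical sketch of such a record; the `Verdict` type and its field names are assumptions, not ClawAudit's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Verdict:
    regex_tier: str                       # tier from the pattern-matching pass
    deep_scan_tier: Optional[str] = None  # tier from the second pass, if one has run

    @property
    def deep_scanned(self) -> bool:
        return self.deep_scan_tier is not None

    def effective_tier(self) -> str:
        """Prefer the deep-scan tier when one exists."""
        if self.deep_scan_tier is not None:
            return self.deep_scan_tier
        return self.regex_tier

    def label(self) -> str:
        """Render the tier together with how much weight it deserves."""
        kind = "deep-scanned verdict" if self.deep_scanned else "triage flag"
        return f"{self.effective_tier()} ({kind})"
```

A skill flagged only by the first pass would render as `Verdict("Dangerous").label()`, giving "Dangerous (triage flag)", while one whose second pass downgraded it would render as "Caution (deep-scanned verdict)".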

This report represents a snapshot as of June 2026. The registry is constantly changing as skills are added, updated, and removed.
