Security and Safety

Permission Models

definition

Permission models define the authorization framework that governs what actions an agent can take, which resources it can access, and under what conditions it needs human approval. Common patterns include allow-list models (only explicitly permitted actions are available), deny-list models (everything is allowed except listed actions), tiered permissions (routine actions auto-approved, destructive actions require approval), and capability-based security (agents receive unforgeable tokens granting specific permissions).
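To make the first two patterns concrete, here is a minimal sketch of the difference between an allow-list and a deny-list check. The class names and action names (`AllowListPolicy`, `DenyListPolicy`, `read_file`, `send_email`, and so on) are hypothetical and chosen only for illustration; they do not come from any particular agent framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowListPolicy:
    """Permit only actions that are explicitly listed."""
    allowed: frozenset

    def permits(self, action: str) -> bool:
        return action in self.allowed


@dataclass(frozen=True)
class DenyListPolicy:
    """Permit every action except those explicitly listed."""
    denied: frozenset

    def permits(self, action: str) -> bool:
        return action not in self.denied


# Hypothetical action names, for illustration only.
allow = AllowListPolicy(frozenset({"read_file", "search"}))
deny = DenyListPolicy(frozenset({"delete_file"}))

# An action nobody anticipated is blocked by the allow-list
# but passes the deny-list.
print(allow.permits("send_email"))  # False
print(deny.permits("send_email"))   # True
```

The two policies differ only in how they treat actions that were never listed, which is why an allow-list fails closed and a deny-list fails open.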

The choice of permission model sets the trust boundary of the entire agent system: too restrictive and agents can't do useful work; too permissive and every prompt injection becomes a potential breach. Production systems are typically best served by tiered permissions that match the risk level of each action, allowing agents to read freely, write with logging, and delete only with human approval. Permission models connect to least privilege (the underlying principle), human-in-the-loop (the mechanism for approval gates), supervision (the monitoring layer above permissions), and tool sandboxing (execution-level enforcement).
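The read / write-with-logging / delete-with-approval scheme could be sketched roughly as follows. This is an illustrative outline, not a reference implementation: the `Tier` enum, the `ACTION_TIERS` mapping, the `authorize` function, and the `approve` callback are all assumed names, and sending unknown actions to the strictest tier is a design assumption made here in the spirit of least privilege.

```python
import logging
from enum import Enum
from typing import Callable

logger = logging.getLogger("agent.permissions")


class Tier(Enum):
    AUTO = "auto"          # run without extra checks
    LOGGED = "logged"      # run, but record an audit entry
    APPROVAL = "approval"  # run only if a human approves


# Hypothetical mapping from action name to risk tier.
ACTION_TIERS = {
    "read": Tier.AUTO,
    "write": Tier.LOGGED,
    "delete": Tier.APPROVAL,
}


def authorize(action: str, target: str,
              approve: Callable[[str, str], bool]) -> bool:
    """Return True if the agent may perform `action` on `target`.

    `approve` stands in for whatever human-approval mechanism the
    system uses (assumed here to be a blocking callback).
    """
    # Assumption: unknown actions fall into the strictest tier.
    tier = ACTION_TIERS.get(action, Tier.APPROVAL)
    if tier is Tier.AUTO:
        return True
    if tier is Tier.LOGGED:
        logger.info("agent action=%s target=%s", action, target)
        return True
    return approve(action, target)


# Usage: a stand-in approver that rejects every request.
def reject_all(action: str, target: str) -> bool:
    return False


print(authorize("read", "notes.txt", reject_all))    # True
print(authorize("write", "notes.txt", reject_all))   # True
print(authorize("delete", "notes.txt", reject_all))  # False
```

Keeping the tier lookup separate from the approval callback means the risk classification can be reviewed as plain data, while the approval step can be swapped for a real review queue without touching the policy.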

related concepts

Tool Sandboxing
Human in the Loop
The Autonomy Spectrum