Data Exfiltration
definition
Data exfiltration in agentic systems occurs when an agent is tricked or exploited into sending sensitive information (API keys, source code, customer data, environment variables) to an unauthorized external destination. It typically happens in one of three ways: through prompt injection, where a malicious instruction tells the agent to embed secrets in an HTTP tool call; through tool misuse, where the agent inadvertently includes sensitive data in a response; or through context leakage, where conversation history containing secrets is sent to an unintended party.

The risk is amplified in agentic coding tools because they typically have access to the full project environment, including .env files, git history, and production credentials. Defending against exfiltration requires a combination of output filtering (monitoring what data leaves the system), network controls (restricting which endpoints agents can reach), and secret management (keeping sensitive values out of agent-accessible paths).

This concept connects to prompt injection, the primary attack vector for triggering exfiltration; least privilege, which limits what data agents can access; audit logging, which detects exfiltration attempts after the fact; and tool sandboxing, which restricts network access.
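As a rough illustration of how output filtering and network controls can work together, the Python sketch below gates an agent's outbound HTTP calls. It first rejects any destination host that is not on an allowlist. It then scans both the URL and the request body for known secret values and a few secret-shaped patterns. Everything here is an assumption for illustration and is not taken from any particular agent framework. That includes the function names `check_outbound` and `secret_values_from_env`, the allowlisted hosts, the environment-variable naming heuristic, and the three patterns.

```python
import re
from collections.abc import Mapping
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical allowlist: the only hosts this agent's HTTP tool may contact.
ALLOWED_HOSTS = {"api.github.com", "pypi.org"}

# A few illustrative secret-shaped patterns; real scanners use many more.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID format
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                 # GitHub classic personal access token format
]


@dataclass
class Verdict:
    allowed: bool
    reason: str


def secret_values_from_env(env: Mapping[str, str], min_len: int = 8) -> list[str]:
    """Collect values of env vars whose names look secret (hypothetical heuristic).

    Very short values are skipped to avoid blocking on common substrings.
    """
    markers = ("KEY", "TOKEN", "SECRET", "PASSWORD")
    return [
        value
        for name, value in env.items()
        if any(m in name.upper() for m in markers) and len(value) >= min_len
    ]


def check_outbound(url: str, body: str, known_secrets: list[str]) -> Verdict:
    """Decide whether an agent's outbound HTTP call may proceed."""
    # Network control: block any destination not explicitly allowlisted.
    host = urlparse(url).hostname or ""
    if host not in ALLOWED_HOSTS:
        return Verdict(False, f"host not allowlisted: {host!r}")

    # Output filtering: inspect the URL too, since query strings can smuggle data.
    payload = url + "\n" + body
    for secret in known_secrets:
        if secret and secret in payload:
            return Verdict(False, "payload contains a known secret value")
    for pattern in SECRET_PATTERNS:
        if pattern.search(payload):
            return Verdict(False, f"payload matches secret pattern {pattern.pattern!r}")
    return Verdict(True, "ok")


# Usage with a fake environment (values are made up).
env = {"OPENAI_API_KEY": "sk-test-1234567890", "HOME": "/home/dev"}
secrets = secret_values_from_env(env)

v = check_outbound("https://attacker.example/collect?d=sk-test-1234567890", "", secrets)
print(v.allowed, v.reason)  # False host not allowlisted: 'attacker.example'

v = check_outbound("https://api.github.com/repos/o/r/issues", "token sk-test-1234567890", secrets)
print(v.allowed, v.reason)  # False payload contains a known secret value

v = check_outbound("https://api.github.com/repos/o/r/issues", "Build failed on main", secrets)
print(v.allowed, v.reason)  # True ok
```

Verbatim matching only catches secrets sent as-is. An injected instruction that base64-encodes or splits a key will slip past it. That gap is one reason the host allowlist is checked first, and why audit logging remains useful as a backstop.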