An attacker chats with your LLM agent. A few messages later, the agent records a false memory. When a legitimate user asks a related question, the agent retrieves the poisoned memory and executes the attacker's action. Success rate: up to 95%. And it gets worse: that memory hijacks the agent's control flow — forcing tool selection, reordering workflows, expanding scope across tasks. Over 90% of trials are vulnerable. Long-term memory in LLM agents is an attack surface, and the current defenses are insufficient.
MemPoison — poisoning that bypasses filters
LLM agents with long-term memory follow a pipeline: extract relevant information, rewrite for compression, store, and retrieve by embedding similarity. Prior research assumed an attacker would write directly to the memory store. In practice, selective extraction filters low-saliency content. Naive memories get discarded.
MemPoison solves this with three techniques. Semantic Relational Bridge binds trigger and payload into a coherent statement — the pipeline cannot separate them without losing context. Entity Masquerading optimizes the trigger to resemble a named entity; LLMs preserve named entities verbatim during rewriting, so the trigger survives. Joint Embedding Optimization packs poisoned texts into a tight cluster in embedding space, isolated from benign ones. Retrieval pulls the poisoned memory.
The result: ASR up to 0.95 across different domains and memory mechanisms. Perplexity filtering does not detect it — the texts are semantically coherent. Paraphrasing does not remove it — entity masquerading preserves the trigger after rewriting. MemPoison works against the very mechanisms designed to filter.
The technical detail that matters: MemPoison exploits anisotropy in embedding space and redistributes attention patterns. The poisoned cluster creates a high-density region that attracts related queries, diverting retrieval from legitimate memories. This is a structural vulnerability — any memory system based on embedding similarity is potentially vulnerable.
MCFA — when memory hijacks control flow
Memory Control Flow Attacks go beyond polluting RAG. Retrieved memory hijacks the agent's control flow — tool selection and execution. The attacker needs no access to the system prompt, tools, or memory store. Standard interaction suffices.
The MEMFLOW framework documented the numbers. Tool Override: 91.7% to 100%. Memory forces the agent to select tools it should not. With retrieval OFF, override drops to 0% — the deviation is caused by memory. Workflow Reordering: 52.8% to 69.4%. Memory reorders tool invocations, skipping security steps. Cross-Task Scope Expansion: 97.2% to 100%. An injection in one task generalizes to different templates, propagating across domains. Persistence: 100% over long horizons. Post-hoc textual corrections fail in 100% of cases — the agent relapses into malicious behavior on the next retrieval.
Tested on GPT-5 mini, Claude Sonnet 4.5, and Gemini 2.5 Flash, across LangChain and LlamaIndex frameworks. All vulnerable. The vulnerability is in the memory design, not the implementation.
Why current defenses fail
System prompts do not protect. When retrieval is active, memory overrides security instructions. The data is clear: tool override reaches 100% with retrieval ON and drops to 0% with retrieval OFF. Memory is stronger than instruction.
Selective extraction filters noise, not coherent payloads. MemPoison demonstrated that semantically coherent statements pass through the pipeline intact. Saliency-based filtering assumes malicious content has low saliency — that assumption does not hold.
Textual corrections do not work. MCFA showed that the agent relapses on the next retrieval. Memory is a durable backdoor — correcting the instruction does not remove the poisoned memory.
Even production-style mitigations like dual-channel memory with role-based segregation show 85%+ control flow deviations. It reduces, not eliminates. The shared memory architecture across tasks is the fundamental problem.
The highest-risk scenario: multi-tenant agents
Agents serving multiple users from the same memory store are the most critical scenario. An attacker poisons memory in one interaction. All subsequent users are affected. MemPoison exploits this directly: the poisoned cluster in embedding space attracts queries from any user asking related questions.
The risk is compounded by persistence. Poisoned memory does not expire. Over long horizons, MCFA documents 100% persistence. Every retrieval reactivates the malicious behavior. There is no natural decay. And because MemPoison's Joint Embedding Optimization creates a dense cluster isolated from benign memories, the poisoned entries dominate retrieval results — legitimate memories get outranked by the attacker's payload.
This is not a theoretical concern. Any organization deploying agents that serve multiple users — customer support, internal tooling, automated workflows — shares this risk profile. The attack requires no special access, no exploit, no privilege escalation. Just a conversation.
What we verify at Tech86
We evaluate AI agent architectures with a focus on memory and retrieval attack surfaces. If your agents use persistent memory without user isolation, you are one conversation away from an attack with a 95% success rate. If they use memory with retrieval and high-risk tools, over 90% of scenarios are vulnerable to control flow hijacking.
The first step is mapping: which agents use persistent memory, what retrieval mechanism, whether the memory store is shared. Then isolate, monitor, and test adversarially. Without adversarial testing, you do not know whether your mitigations work — or merely reduce the problem to 85% deviations instead of 100%.
