A Quick Stop at the HostileShop

Day 2 17:35 Fuse en Security
Dec. 28, 2025 17:35-18:15
Fahrplan__event__banner_image_alt A Quick Stop at the HostileShop
Nothing stops [this train](https://ai-2027.com/). It just [might not arrive on schedule](https://www.interconnects.ai/p/brakes-on-an-intelligence-explosion)... LLMs appear unlikely to become capable of either true human-level novelty creation or AGI. However, they excel at task execution in [well-established task domains](https://epochai.substack.com/p/most-ai-value-will-come-from-broad), even exceeding most humans in some of these domains. This capability set has yielded an "Agentic Revolution", where LLMs are being deployed as components of software systems for various tasks. These **LLM Agents** work **_just well enough_** to deploy in scenarios for which they are either [not yet safe](https://brave.com/blog/comet-prompt-injection/), or are [fundamentally impossible to secure against](https://labs.zenity.io/p/why-aren-t-we-making-any-progress-in-security-from-ai-bf02). The resulting vulnerability surface is very much reminiscent of the hacking scene in the 1990s, but at a lightning pace, with exploits often being patched within hours after they widely circulate. The hacking dopamine treadmill has become an express train. Rather than hop right on what looked like an express train to Fail City, I wanted a tool that would **hack LLM Agents automatically**, and also let me know if and when LLM Agents finally become secure enough for use in privacy preserving systems, without the need to rely on [oppressive](https://runtheprompts.com/resources/chatgpt-info/chatgpt-is-reporting-your-prompts-to-police/) [levels of surveillance](https://www.anthropic.com/news/activating-asl3-protections). All of this led me to create [HostileShop](https://github.com/mikeperry-tor/HostileShop).

In this talk, I will give you an overview of LLM Agent hacking. I will briefly introduce LLM Agents, their vulnerability surface, and types of attacks. Because everything old is new again, I will draw some parallels to the 90s hacking scene.

I will then present HostileShop, which is a python-based LLM Agent security testing tool that was selected as one of the ten prize winners in OpenAI's GPT-OSS-20B RedTeam Contest.

HostileShop creates a fully simulated web shopping environment where an Attacker Agent LLM attempts to manipulate a Target Shopping Agent LLM into performing unauthorized actions that are automatically detected by the framework.

HostileShop is best at discovering prompt injections that induce LLM Agents to make improper "tool calls". In other words, HostileShop finds the magic spells that make LLM Agents call functions that they have available to them, often with the specific input of your choice.

HostileShop is also capable of enhancement and mutation of "universal" jailbreaks. This allows cross-LLM adaptation of universal jailbreaks that are powerful enough to make the target LLM become fully under your control, for arbitrary actions. This also enables public jailbreaks that have been partially blocked to work again, until they are more comprehensively addressed.

For 1990s hacking vibes, HostileShop has a text-based chat interface that lets you chat with the attacker agent, or become the attacker yourself.

For 2025 contrarian vibes (in some circles), HostileShop was vibe coded without an ounce of shame. Basically, I used LLMs to write a framework to have LLMs attack other LLMs. Crucially however, for reasons that I will explain in the talk, HostileShop does not use an LLM to judge attack success. Instead, success is determined automatically and immediately by the framework.

At the time of this writing, HostileShop has working attack examples for the entire LLM frontier, though things move very fast in this arena.

There is a chance that by the time of the conference, all of the attacks that HostileShop is capable of discovering will have been fixed. In this case, the talk will focus on the current state of LLM security, the future of private Agentic AI, and either some thoughts on how to avoid complete dystopia, or amusing rants about the dystopia that has already arrived.

Speakers of this event