LogoAgentHunter
  • Submit
  • Industries
  • Categories
  • Agency
Logo
LogoAgentHunter

Discover, Compare, and Leverage the Best AI Agents

Featured On

Featured on yo.directory
yo.directory
Featured on yo.directory
Featured on Startup Fame
Startup Fame
Featured on Startup Fame
AIStage
Listed on AIStage
Sprunkid
Featured on Sprunkid
Featured on Twelve Tools
Twelve Tools
Featured on Twelve Tools
Listed on Turbo0
Turbo0
Listed on Turbo0
Featured on Product Hunt
Product Hunt
Featured on Product Hunt
Game Sprunki
Featured on Game Sprunki
AI Toolz Dir
Featured on AI Toolz Dir
Featured on Microlaunch
Microlaunch
Featured on Microlaunch
Featured on Fazier
Fazier
Featured on Fazier
Featured on Techbase Directory
Techbase Directory
Featured on Techbase Directory
backlinkdirs
Featured on Backlink Dirs
Featured on SideProjectors
SideProjectors
Featured on SideProjectors
Submit AI Tools
Featured on Submit AI Tools
AI Hunt
Featured on AI Hunt
Featured on Dang.ai
Dang.ai
Featured on Dang.ai
Featured on AI Finder
AI Finder
Featured on AI Finder
Featured on LaunchIgniter
LaunchIgniter
Featured on LaunchIgniter
Imglab
Featured on Imglab
AI138
Featured on AI138
600.tools
Featured on 600.tools
Featured Tool
Featured on Featured Tool
Dirs.cc
Featured on Dirs.cc
Ant Directory
Featured on Ant Directory
Featured on MagicBox.tools
MagicBox.tools
Featured on MagicBox.tools
Featured on Code.market
Code.market
Featured on Code.market
Featured on LaunchBoard
LaunchBoard
Featured on LaunchBoard
Genify
Featured on Genify
Copyright © 2025 All Rights Reserved.
Product
  • AI Agents Directory
  • AI Agent Glossary
  • Industries
  • Categories
Resources
  • AI Agentic Workflows
  • Blog
  • News
  • Submit
  • Coummunity
  • Ebooks
Company
  • About Us
  • Privacy Policy
  • Terms of Service
  • Sitemap
Friend Links
  • AI Music API
  • ImaginePro AI
  • Dog Names
  • Readdit Analytics
Back to News List

Meta Releases Open Source LlamaFirewall to Protect AI Agents

May 13, 2025•Sergio De Simone•Original Link•2 minutes
AI Security
Open Source
Meta

Meta's LlamaFirewall is a security framework designed to protect AI agents from prompt injection, goal misalignment, and insecure code generation, achieving over 90% efficacy in reducing attack success rates.

Meta has introduced LlamaFirewall, an open-source security framework aimed at safeguarding AI agents against threats like prompt injection, goal misalignment, and insecure code generation. According to Meta's research paper, the framework demonstrated over 90% efficacy in reducing attack success rates when tested on the AgentDojo benchmark.

Key Features of LlamaFirewall

LlamaFirewall operates as a real-time guardrail monitor with three primary protection layers:

  1. PromptGuard 2: A fine-tuned BERT-style model designed to detect jailbreak attempts in real time. It analyzes user prompts and untrusted data sources, addressing tactics like instruction overrides and token injection. Meta claims it improves performance over its predecessor, with lower latency in its lightweight variant.

  2. AlignmentCheck: An experimental chain-of-thought auditor that monitors an agent’s reasoning for signs of goal hijacking or misalignment. Unlike traditional methods, it evaluates the entire execution trace, flagging deviations that suggest covert prompt injection or misleading tool output.

  3. CodeShield: An online static analysis engine for LLM-generated code, supporting Semgrep and regex-based rules. Originally part of the Llama 3 launch, it now integrates into LlamaFirewall, offering syntax-aware pattern matching across eight programming languages.

"Although CodeShield is effective in identifying a wide range of insecure code patterns, it is not comprehensive and may miss nuanced or context-dependent vulnerabilities." — Meta Researchers

Performance and Use Cases

  • PromptGuard 2 and AlignmentCheck combined improve performance on the AgentDojo benchmark.
  • CodeShield achieved 96% precision and 79% recall in identifying insecure code during CyberSecEval3 testing.

Meta outlined two practical workflows:

  1. Travel Planning Agent: Uses PromptGuard to scan web content (e.g., travel reviews) for jailbreak attempts, while AlignmentCheck monitors for goal shifts.
  2. Coding Agent: Generates SQL code, retrieves examples from the web, and verifies them with CodeShield.

LlamaFirewall in action

Future Developments

Meta plans to expand LlamaFirewall’s capabilities, including:

  • Support for multimodal agents.
  • Reduced latency.
  • Broader threat coverage.
  • More realistic benchmarking.

This release underscores Meta’s commitment to AI safety and open-source innovation, providing developers with tools to mitigate risks in AI agent deployments.

Related News

August 14, 2025•Tom Field

AI Agents Pose New Security Challenges for Defenders

Palo Alto Networks' Kevin Kin discusses the growing security risks posed by AI agents and the difficulty in distinguishing their behavior from users.

AI Security
Threat Detection
Zero Trust
August 12, 2025•Michael Nuñez

AI OS Agents Pose Security Risks as Tech Giants Accelerate Development

New research highlights rapid advancements in AI systems that operate computers like humans, raising significant security and privacy concerns across industries.

AI Security
OS Agents
Tech Innovation

About the Author

Dr. Emily Wang

Dr. Emily Wang

AI Product Strategy Expert

Former Google AI Product Manager with 10 years of experience in AI product development and strategy formulation. Led multiple successful AI products from 0 to 1 development process, now provides product strategy consulting for AI startups while writing AI product analysis articles for various tech media outlets.

Expertise

AI Product Management
User Experience
Business Strategy
Market Analysis
Experience
10 years
Publications
65+
Credentials
2
LinkedInMedium

Agent Newsletter

Get Agentic Newsletter Today

Subscribe to our newsletter for the latest news and updates