We Tested 4 Secret Scanners on 8 Real Repos. The Results Were Eye-Opening.
We scanned 2.2 million lines of code across 8 popular open-source repositories using Gitleaks, TruffleHog, GitGuardian, and Puaro. Here are the raw numbers — and what they actually mean.
You've probably heard the advice: "Add a secret scanner to your pipeline." Good advice. But which one? And once it's running — can you actually trust what it tells you?
We ran a benchmark to find out.
We took 8 popular open-source repositories, scanned 2.2 million lines of code, and ran four tools against all of it: Gitleaks, TruffleHog, GitGuardian, and Puaro. Then we looked at the raw numbers.
The results surprised us — not because any tool was secretly bad, but because of how differently they answered the same question: "Is this a real security threat?"
The problem nobody talks about: alert noise
Before the numbers, a quick framing.
When a secret scanner flags something, it's saying: "There might be a sensitive credential here." That flag is either correct (a real threat) or incorrect (a false positive — a test value, a placeholder, a code example).
The problem? Most tools fire alerts without telling you which is which. You get a list of 8,449 items and you're on your own.
That's not a small problem. It's a fundamental one.
Alert fatigue is real. When developers see hundreds of false positives, they start ignoring alerts entirely. The tool that cried wolf doesn't protect you — it just creates noise.
The headline number: OpenSSL
We start with OpenSSL — the most important cryptography library in the world, used in almost every application that handles secure connections. If anything is a real-world stress test for a scanner, it's this.
Here's what each tool flagged:
| Tool | Alerts fired | What it told you about them |
|---|---|---|
| Gitleaks | 8,449 | Nothing. A flat list. |
| GitGuardian | 675 | Nothing. A flat list. |
| TruffleHog | 349 | Nothing. A flat list. |
| Puaro | 15 | Severity level + reasoning for every single one. |
Puaro found 15 alerts. That's a 99.8% noise reduction compared to Gitleaks.
More importantly: of those 15, we found 5 actual threats — 2 critical and 3 high severity. Every finding came with an explanation of why it matters.
Here's an example. Two private cryptographic keys were found in OpenSSL's fuzzing test suite:
"Contains a standard PEM header for an EC private key. Part of a fuzzing harness for OpenSSL using static test keys. Private keys are highly sensitive assets that should never be committed to version control."
And one more, rated HIGH:
"PEM-encoded DSA private key in a fuzzing test suite. Non-production context, but committing raw private keys risks accidental propagation to production."
Notice what's happening: the AI isn't just pattern-matching. It's reading the context — which file, what kind of project, what the code around it looks like — and then explaining its reasoning in plain English.
The full benchmark: all 8 repos
We didn't just test OpenSSL. Here's the complete data across 8 repositories — ranging from crypto libraries to intentional vulnerability demos.
| Repository | Gitleaks | TruffleHog | GitGuardian | Puaro |
|---|---|---|---|---|
| openssl | 8,449 | 349 | 675 | 15 (2 critical, 3 high) |
| trufflehog | 1,211 | 1,793 | 1,216 | 1,619 (all low) |
| gitleaks | 2 | 181 | 320 | 413 (all low) |
| bitcoin | 28 | 0 | 15 | 90 (all low) |
| juice-shop | 49 | 3 | 131 | 51 (1 high, 50 low) |
| wrongsecrets | 38 | 19 | 40 | 40 (all low) |
| AgentDefense | 14 | 1 | 12 | 7 (4 high, 2 medium) |
| BenchmarkJava | 2 | 194 | 16 | 1 (1 low) |
| TOTAL | 9,793 | 2,540 | 2,425 | 2,236 |
A few things stand out immediately.
Gitleaks fires almost 10,000 alerts across 8 repos. That's not a typo. In a single repo (OpenSSL), it fires 8,449 times. If your security team is working through Gitleaks output manually, that's not a security process — that's a full-time job.
TruffleHog found zero threats in bitcoin. The Bitcoin Core codebase is one of the most security-conscious projects in existence. It has real test keys and sensitive data in its repo. TruffleHog missed all of it.
Puaro found 7 high and medium severity issues in AgentDefense that every other tool either missed or buried in noise.
A study in intelligence: OWASP WrongSecrets
One data point deserves special attention because it shows something the raw numbers alone don't capture.
OWASP WrongSecrets is a project specifically designed to contain fake secrets for training purposes. It's full of obviously fake credentials with names like youCantHandleThisSecret. Every tool found roughly the same number of alerts:
- Gitleaks: 38
- TruffleHog: 19
- GitGuardian: 40
- Puaro: 40
Similar counts. Very different output.
GitGuardian, Gitleaks, and TruffleHog hand you a list. They say "these look like secrets" and stop there.
Puaro classifies every single one as Low severity and explains why:
"The repository name 'wrongsecrets' is a known benchmark tool. The value is a human-readable phrase rather than a high-entropy credential. Risk is minimal."
That's the difference between a fire alarm and a fire fighter. One tells you something might be burning. The other tells you it's probably the toaster, not the building.
The goal of secret scanning isn't to find everything that looks like a secret. It's to find the things that are actually dangerous — and tell you clearly which ones those are.
The key numbers at a glance

- 99.8% noise reduction on OpenSSL (15 vs 8,449 alerts)
- 2.2 million lines of code scanned across 8 repos
- 5 real threats confirmed in OpenSSL (2 critical + 3 high)
- The only tool in the benchmark to classify severity or explain findings
Why the numbers look this way: how Puaro works
The benchmark results aren't an accident. They come from a fundamentally different approach to scanning.
Most secret scanners work in one step: search through code for strings that match known patterns. If it matches, fire an alert. End of story.
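To make that concrete, here is roughly what a one-step scanner amounts to, sketched with a toy rule set (these two regexes are illustrative stand-ins, not any real tool's patterns):

```python
import re

# Toy rules in the style of a one-step scanner: match -> alert, no context.
RULES = {
    "private-key": re.compile(r"-----BEGIN (?:RSA |EC |DSA )?PRIVATE KEY-----"),
    "generic-secret": re.compile(r"(?i)(?:secret|password|token)\s*[:=]\s*['\"][^'\"]{8,}['\"]"),
}

def scan(text: str) -> list[tuple[str, int]]:
    """Return (rule_name, line_number) for every match -- each one is an alert."""
    alerts = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RULES.items():
            if pattern.search(line):
                alerts.append((name, lineno))
    return alerts

sample = 'password = "hunter2hunter2"\nx = 1\n'
print(scan(sample))  # fires on line 1 regardless of whether it's a test fixture
```

Notice there is no notion of severity, file context, or intent anywhere in that loop. That's the whole design.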
Puaro runs four stages before declaring anything a finding.
Stage 1: Filter out the obvious noise
Before even looking for secrets, Puaro throws away everything that clearly isn't dangerous. Test directories, documentation folders, example files, placeholder values, code comments. The vast majority of what a generic scanner would flag never even reaches the next stage.
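A minimal sketch of this kind of pre-filter, using hypothetical directory and placeholder lists rather than Puaro's actual rules:

```python
from pathlib import PurePosixPath

# Illustrative noise lists -- hypothetical, not Puaro's real configuration.
NOISY_DIRS = {"test", "tests", "docs", "examples", "fixtures"}
PLACEHOLDERS = {"changeme", "your-api-key", "xxxx", "example", "dummy"}

def is_obvious_noise(path: str, value: str) -> bool:
    """Drop candidates living in test/docs paths or carrying placeholder values."""
    parts = {p.lower() for p in PurePosixPath(path).parts}
    if parts & NOISY_DIRS:
        return True
    return value.lower() in PLACEHOLDERS

print(is_obvious_noise("tests/fixtures/key.pem", "whatever"))       # True
print(is_obvious_noise("src/config.py", "changeme"))                # True
print(is_obvious_noise("src/config.py", "AKIAABCD1234EFGH5678"))    # False
```

Anything that returns True here never reaches the later, more expensive stages.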
Stage 2: Find real candidates
From what's left, Puaro uses provider-specific detection patterns — targeting the actual formats of real credentials across dozens of services — to find genuine suspects. This is far more targeted than generic pattern matching.
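Two well-known public credential formats show why provider-specific shapes are tighter than generic "looks random" rules (these are standard documented formats; real scanners carry dozens more):

```python
import re

# Well-known public credential shapes: AWS access key IDs are a 4-letter
# prefix plus 16 uppercase alphanumerics; classic GitHub PATs are ghp_ + 36.
PROVIDER_PATTERNS = {
    "aws-access-key-id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "github-pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}

def find_candidates(text: str) -> list[str]:
    """Return the names of provider patterns that match somewhere in text."""
    return [name for name, pat in PROVIDER_PATTERNS.items() if pat.search(text)]

# AWS's own documented example key ID matches; arbitrary strings don't.
print(find_candidates("key = AKIAIOSFODNN7EXAMPLE"))  # ['aws-access-key-id']
```

A generic entropy rule would flag thousands of random-looking strings; a format rule like this only fires on strings that could actually be that provider's credential.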
Stage 3: Ask AI to reason about it
Here's the step the other tools skip. Puaro's AI reasoning engine evaluates each candidate in full context: which file is this in, what kind of project is this, what does the surrounding code look like?
The AI doesn't just say "this looks like a key." It reasons about it:
- Is this a test fixture?
- Is it a configuration default that ships with many projects?
- Is the entropy high enough to be a real credential?
- Does the surrounding code suggest this is a real usage or a documentation example?
The output is a structured verdict: severity (critical, high, medium, low), a confidence level, and a plain-English explanation.
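The entropy check is the easiest of these signals to make concrete. Shannon entropy measures how random a string looks per character; human-readable phrases score noticeably lower than machine-generated credentials. A quick sketch (the comparison, not any particular threshold, is the point):

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character: H = -sum(p * log2(p)) over char frequencies."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A human-readable phrase vs. a random-looking token of similar length.
phrase = "youCantHandleThisSecret"
token = "fR9xK2mQ7vLp4sW8zTn3bYc6"
print(round(shannon_entropy(phrase), 2))
print(round(shannon_entropy(token), 2))
```

The phrase repeats common letters, so its per-character entropy sits well below the token's, which is one reason a reasoning layer can confidently downgrade WrongSecrets-style values.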
Stage 4: Trace where it goes
For real findings, Puaro goes one step further. It uses code analysis to map the "blast radius" — where is this secret used? Does it flow to a network call? Does it get logged anywhere? What would an attacker be able to do with it?
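As a rough illustration of what "tracing where it goes" means, here is a toy Python-AST walk that flags sink calls fed a named secret variable. This is a crude stand-in for real dataflow analysis, and the sink list is hypothetical:

```python
import ast

# Illustrative sink names only -- a real analysis resolves imports and dataflow.
SINKS = {"post", "get", "info", "debug", "print"}

def trace_secret(source: str, secret_name: str) -> list[tuple[str, int]]:
    """Return (call_name, line) for every sink call that receives the secret."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", "")
            args = node.args + [kw.value for kw in node.keywords]
            uses_secret = any(
                isinstance(a, ast.Name) and a.id == secret_name for a in args
            )
            if name in SINKS and uses_secret:
                hits.append((name, node.lineno))
    return hits

code = (
    "api_key = load_key()\n"
    "requests.post(url, headers=api_key)\n"
    "logger.info(api_key)\n"
)
print(trace_secret(code, "api_key"))  # the key reaches a network call and a logger
```

Even this toy version answers a question a flat alert list can't: does the secret actually flow anywhere an attacker could reach?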
The result: By the time the AI makes a decision, it's only evaluating a highly filtered set of genuine suspects — not your entire codebase. That's why the output is signal, not noise.
"Isn't this just asking ChatGPT about your code?"
Fair question. The short answer is no.
A "GPT wrapper" approach would send your entire codebase to an AI and ask it to find secrets. That approach has real problems: it's expensive, it's slow, AI models can invent findings that don't exist (called hallucinations), and it gives you no way to verify the results.
Puaro's approach is the opposite. The AI only sees candidates that have already passed through three rounds of deterministic filtering. By the time AI reasoning happens, the hard work is done — the AI is the final check, not the primary scanner.
| | GPT Wrapper | Puaro (Compound AI) |
|---|---|---|
| How it works | Code → AI → Results | Code → Filters → Patterns → AI → Analysis → Results |
| AI sees | 100% of your code | Only pre-filtered suspects |
| Hallucination risk | High — no verification layer | Low — deterministic filters run first |
| Reliability | Entirely dependent on the LLM | Multi-layer system with built-in redundancy |
What only Puaro gives you
Here's the feature comparison from the benchmark:
| Capability | Puaro | Gitleaks | TruffleHog | GitGuardian |
|---|---|---|---|---|
| AI-powered detection | ✓ | ✗ | ✗ | partial |
| Severity classification (Critical / High / Medium / Low) | ✓ | ✗ | ✗ | ✗ |
| Plain-English reason for every finding | ✓ | ✗ | ✗ | ✗ |
| Secret lifecycle analysis (where it flows in code) | ✓ | ✗ | ✗ | ✗ |
| Remediation guidance per finding | ✓ | ✗ | ✗ | ✗ |
| Real-time PR scanning | ✓ | CI only | ✓ | ✓ |
| Secret flow visualization | ✓ | ✗ | ✗ | ✗ |
Every other tool in this benchmark outputs a flat list. You get a location ("line 42 in config.js") and a type ("potential AWS key"). No severity. No reason. No guidance.
Puaro is the only tool in the benchmark that answers the question developers actually need answered: "Should I be worried about this right now?"
What this means in practice
Let's make this concrete.
Imagine your team scans a 200,000-line codebase. Here's what you're likely looking at with each tool:
- Gitleaks → You receive 1,000+ alerts with no severity and no context. Your most junior developer gets assigned to triage. They close half as false positives (guessing). The other half sit unresolved for weeks.
- TruffleHog → Fewer alerts, because it verifies credentials against providers. But verification only works for credential types it knows about; anything else gets missed or shown unverified.
- GitGuardian → A cleaner interface than the others, but the same fundamental problem: no reasoning, no severity, no guidance. A list is still a list.
- Puaro → Alerts come with severity levels and explanations. Your team acts on critical and high findings first; everything else is triaged automatically. The developer who receives the alert knows exactly what to do.
The bottom line: A finding you understand is one you can fix. A finding you can't understand is just noise — and noise gets ignored.
The honest takeaway
The other tools in this benchmark aren't bad. Gitleaks is fast and free, which makes it a reasonable pre-commit gate. TruffleHog's credential verification is genuinely useful for deep historical audits. GitGuardian has a polished interface and solid detection.
But none of them answer the question that matters: "Is this actually dangerous?"
That gap — between finding something and understanding it — is where most security breaches actually happen. Not because the tool missed the secret, but because the alert was buried in noise and nobody got to it in time.
The benchmark numbers tell that story clearly. 9,793 alerts from Gitleaks across 8 repos, 8,449 of them on OpenSSL alone. Puaro fired 15 on that same repo, every one explained, every one classified.
You can't act on 10,000 alerts. You can act on 15.
Want to see what Puaro finds in your own codebase? Start scanning free — setup takes under 5 minutes, no credit card required.