
We Tested 4 Secret Scanners on 8 Real Repos. The Results Were Eye-Opening.

We scanned 2.2 million lines of code across 8 popular open-source repositories using Gitleaks, TruffleHog, GitGuardian, and Puaro. Here are the raw numbers — and what they actually mean.

Author
Apr 08
15 min read

You've probably heard the advice: "Add a secret scanner to your pipeline." Good advice. But which one? And once it's running — can you actually trust what it tells you?

We ran a benchmark to find out.

We took 8 popular open-source repositories, scanned 2.2 million lines of code, and ran four tools against all of it: Gitleaks, TruffleHog, GitGuardian, and Puaro. Then we looked at the raw numbers.

The results surprised us — not because any tool was secretly bad, but because of how differently they answered the same question: "Is this a real security threat?"


The problem nobody talks about: alert noise

Before the numbers, a quick framing.

When a secret scanner flags something, it's saying: "There might be a sensitive credential here." That flag is either correct (a real threat) or incorrect (a false positive — a test value, a placeholder, a code example).

The problem? Most tools fire alerts without telling you which is which. You get a list of 8,449 items and you're on your own.

That's not a small problem. It's a fundamental one.

⚠️

Alert fatigue is real. When developers see hundreds of false positives, they start ignoring alerts entirely. The tool that cried wolf doesn't protect you — it just creates noise.


The headline number: OpenSSL

We start with OpenSSL, one of the most widely deployed cryptography libraries in the world and a dependency of a large share of the software that handles secure connections. If anything is a real-world stress test for a scanner, it's this.

Here's what each tool flagged:

| Tool | Alerts fired | What it told you about them |
|---|---|---|
| Gitleaks | 8,449 | Nothing. A flat list. |
| GitGuardian | 675 | Nothing. A flat list. |
| TruffleHog | 349 | Nothing. A flat list. |
| Puaro | 15 | Severity level + reasoning for every single one. |

Puaro found 15 alerts. That's a 99.8% noise reduction compared to Gitleaks.
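The 99.8% figure is simply the relative drop in alert count. A minimal check, using only the OpenSSL counts reported above (the helper name `noise_reduction` is ours, for illustration):

```python
def noise_reduction(baseline_alerts: int, tool_alerts: int) -> float:
    """Relative drop in alert volume, as a percentage of the baseline."""
    return (1 - tool_alerts / baseline_alerts) * 100


# OpenSSL alert counts reported in this benchmark.
openssl_alerts = {"Gitleaks": 8449, "GitGuardian": 675, "TruffleHog": 349, "Puaro": 15}

for tool in ("Gitleaks", "GitGuardian", "TruffleHog"):
    pct = noise_reduction(openssl_alerts[tool], openssl_alerts["Puaro"])
    print(f"Puaro vs {tool}: {pct:.1f}% fewer alerts")
```

Note that this measures alert volume only. On its own it says nothing about whether the alerts that disappeared were false positives.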

More importantly: of those 15, we found 5 actual threats — 2 critical and 3 high severity. Every finding came with an explanation of why it matters.

Here's an example. Two private cryptographic keys were found in OpenSSL's fuzzing test suite:

"Contains a standard PEM header for an EC private key. Part of a fuzzing harness for OpenSSL using static test keys. Private keys are highly sensitive assets that should never be committed to version control."

And one more, rated HIGH:

"PEM-encoded DSA private key in a fuzzing test suite. Non-production context, but committing raw private keys risks accidental propagation to production."

Notice what's happening: the AI isn't just pattern-matching. It's reading the context — which file, what kind of project, what the code around it looks like — and then explaining its reasoning in plain English.


The full benchmark: all 8 repos

We didn't just test OpenSSL. Here's the complete data across 8 repositories — ranging from crypto libraries to intentional vulnerability demos.

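| Repository | Gitleaks | TruffleHog | GitGuardian | Puaro |
|---|---|---|---|---|
| openssl | 8,449 | 349 | 675 | 15 (2 critical, 3 high) |
| trufflehog | 1,211 | 1,793 | 1,216 | 1,619 (all low) |
| gitleaks | 2 | 181 | 320 | 413 (all low) |
| bitcoin | 28 | 0 | 15 | 90 (all low) |
| juice-shop | 49 | 3 | 131 | 51 (1 high, 50 low) |
| wrongsecrets | 38 | 19 | 40 | 40 (all low) |
| AgentDefense | 14 | 1 | 12 | 7 (4 high, 2 medium) |
| BenchmarkJava | 2 | 194 | 16 | 1 (1 low) |
| TOTAL | 9,793 | 2,540 | 2,425 | 2,236 |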
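As a sanity check on the table, the per-repo counts can be summed and compared with the TOTAL row. The sketch below hard-codes the counts from the table; nothing in it is produced by the scanners themselves.

```python
# Alert counts per repository, in column order
# (Gitleaks, TruffleHog, GitGuardian, Puaro), copied from the table above.
COUNTS = {
    "openssl":       (8449, 349, 675, 15),
    "trufflehog":    (1211, 1793, 1216, 1619),
    "gitleaks":      (2, 181, 320, 413),
    "bitcoin":       (28, 0, 15, 90),
    "juice-shop":    (49, 3, 131, 51),
    "wrongsecrets":  (38, 19, 40, 40),
    "AgentDefense":  (14, 1, 12, 7),
    "BenchmarkJava": (2, 194, 16, 1),
}
TOOLS = ("Gitleaks", "TruffleHog", "GitGuardian", "Puaro")

# Column sums, one per tool.
totals = {tool: sum(row[i] for row in COUNTS.values()) for i, tool in enumerate(TOOLS)}
print(totals)

# How much of each tool's alert volume comes from its single noisiest repo.
for i, tool in enumerate(TOOLS):
    repo, row = max(COUNTS.items(), key=lambda kv: kv[1][i])
    print(f"{tool}: {row[i] / totals[tool]:.0%} of its alerts come from {repo}")
```

The sums come out to 9,793, 2,540, 2,425, and 2,236, matching the TOTAL row.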

A few things stand out immediately.

Gitleaks fires almost 10,000 alerts across 8 repos. That's not a typo. In a single repo (OpenSSL), it fires 8,449 times. If your security team is working through Gitleaks output manually, that's not a security process — that's a full-time job.

TruffleHog fired zero alerts on bitcoin. Bitcoin Core is one of the most security-conscious projects in existence, and its repo still contains test keys and other key-like data. TruffleHog flagged none of it.

Puaro flagged 7 issues in AgentDefense, including 4 high and 2 medium severity, that every other tool either missed or buried in noise.


A study in intelligence: OWASP WrongSecrets

One data point deserves special attention because it shows something the raw numbers alone don't capture.

OWASP WrongSecrets is a project specifically designed to contain fake secrets for training purposes. It's full of obviously fake credentials with names like youCantHandleThisSecret. Three of the four tools flagged almost the same number of alerts, and TruffleHog flagged about half as many:

  • Gitleaks: 38
  • TruffleHog: 19
  • GitGuardian: 40
  • Puaro: 40

Similar counts. Very different output.

GitGuardian, Gitleaks, and TruffleHog hand you a list. They say "these look like secrets" and stop there.

Puaro classifies every single one as Low severity and explains why:

"The repository name 'wrongsecrets' is a known benchmark tool. The value is a human-readable phrase rather than a high-entropy credential. Risk is minimal."

That's the difference between a fire alarm and a firefighter. One tells you something might be burning. The other tells you it's probably the toaster, not the building.

The goal of secret scanning isn't to find everything that looks like a secret. It's to find the things that are actually dangerous — and tell you clearly which ones those are.


The key numbers at a glance

  • 99.8%: noise reduction on OpenSSL (15 vs 8,449 alerts)
  • 2.2M: lines of code scanned across 8 repos
  • 5: real threats in OpenSSL (2 critical + 3 high)
  • 0: competitors that classify severity or explain findings

Why the numbers look this way: how Puaro works

The benchmark results aren't an accident. They come from a fundamentally different approach to scanning.

Most secret scanners work in one step: search through code for strings that match known patterns. If it matches, fire an alert. End of story.

Puaro runs four stages: three before declaring anything a finding, and a fourth on the findings that remain.

Stage 1: Filter out the obvious noise

Before even looking for secrets, Puaro throws away everything that clearly isn't dangerous. Test directories, documentation folders, example files, placeholder values, code comments. The vast majority of what a generic scanner would flag never even reaches the next stage.
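Puaro's actual filter rules are not published in this post, so the following is only a sketch of what a Stage 1 style pre-filter could look like. The directory names, the placeholder list, and the function names (`is_noise_path`, `is_placeholder`, `prefilter`) are all assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical skip lists; a real tool would make these configurable.
NOISE_DIRS = {"test", "tests", "docs", "doc", "examples", "example", "fixtures"}
PLACEHOLDERS = {"changeme", "your_api_key_here", "example", "dummy"}


def is_noise_path(path: str) -> bool:
    """True if any directory component of the path is in the skip list."""
    parts = PurePosixPath(path).parts[:-1]  # directories only, not the filename
    return any(part.lower() in NOISE_DIRS for part in parts)


def is_placeholder(value: str) -> bool:
    """True for well-known dummy values or strings made of one repeated character."""
    v = value.strip().lower()
    return v in PLACEHOLDERS or (len(v) > 0 and len(set(v)) == 1)


def prefilter(candidates):
    """Keep only (path, value) pairs that survive both checks."""
    return [(p, v) for p, v in candidates if not is_noise_path(p) and not is_placeholder(v)]


survivors = prefilter([
    ("tests/unit/test_auth.py", "abc123"),   # dropped: lives under tests/
    ("src/settings.py", "changeme"),         # dropped: placeholder value
    ("src/deploy.py", "s3cr3t-value"),       # kept
])
print(survivors)
```

Skipping by path is a blunt instrument: a real key committed under `tests/` would be dropped too, so a production filter would presumably be more selective than this.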

Stage 2: Find real candidates

From what's left, Puaro uses provider-specific detection patterns — targeting the actual formats of real credentials across dozens of services — to find genuine suspects. This is far more targeted than generic pattern matching.
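To make "provider-specific" concrete, here is a small sketch built on a few publicly known credential shapes (AWS access key ID prefixes, classic GitHub personal access tokens, PEM private key headers). This is not Puaro's rule set, which the post does not publish; the `PATTERNS` table and `find_candidates` are illustrative names.

```python
import re

# A few publicly known credential shapes. NOT Puaro's rules, just an
# illustration of matching concrete formats instead of "any long string".
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "github_pat_classic": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "pem_private_key": re.compile(r"-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"),
}


def find_candidates(text: str):
    """Yield (pattern_name, line_number, matched_text) for every match."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                yield name, lineno, match.group(0)


# AKIAIOSFODNN7EXAMPLE is the dummy key ID used in AWS's own documentation.
sample = 'key = "AKIAIOSFODNN7EXAMPLE"\n-----BEGIN EC PRIVATE KEY-----\n'
hits = list(find_candidates(sample))
print(hits)
```

A format-specific pattern like these fires far less often than a generic "long random-looking string" rule, at the cost of missing credential types it has no pattern for.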

Stage 3: Ask AI to reason about it

Here's the step the other tools skip. Puaro's AI reasoning engine evaluates each candidate in full context: which file is this in, what kind of project is this, what does the surrounding code look like?

The AI doesn't just say "this looks like a key." It reasons about it:

  • Is this a test fixture?
  • Is it a configuration default that ships with many projects?
  • Is the entropy high enough to be a real credential?
  • Does the surrounding code suggest this is a real usage or a documentation example?

The output is a structured verdict: severity (critical, high, medium, low), a confidence level, and a plain-English explanation.
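One of the questions above, whether the entropy is high enough, can be made concrete with a few lines of code. The sketch below computes per-character Shannon entropy and defines a container for the verdict. The `Verdict` field names are hypothetical; the post only says a verdict carries a severity, a confidence level, and an explanation.

```python
import math
from collections import Counter
from dataclasses import dataclass


def shannon_entropy(value: str) -> float:
    """Shannon entropy, in bits per character, of the string's character distribution."""
    if not value:
        return 0.0
    counts = Counter(value)
    n = len(value)
    # Sum of p * log2(1/p) over the distinct characters.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


@dataclass
class Verdict:
    # Hypothetical shape, mirroring the three parts named in the text.
    severity: str      # "critical" | "high" | "medium" | "low"
    confidence: float  # 0.0 to 1.0
    explanation: str


for s in ("aaaaaaaa", "abcd", "youCantHandleThisSecret"):
    print(s, round(shannon_entropy(s), 2))
```

Entropy is only one signal. A short human-readable phrase can score close to a short random token, which is presumably why it is weighed together with file and project context rather than used as a standalone threshold.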

Stage 4: Trace where it goes

For real findings, Puaro goes one step further. It uses code analysis to map the "blast radius" — where is this secret used? Does it flow to a network call? Does it get logged anywhere? What would an attacker be able to do with it?
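The post does not describe how Puaro's code analysis works, so what follows is only a toy illustration of the idea, using Python's standard `ast` module (`ast.unparse` needs Python 3.9+). It finds variables assigned directly from a flagged literal and lists every call that receives one of them. It handles a single file, with no aliasing and no flow through function parameters; `trace_secret` and the sample source are ours.

```python
import ast

# Sample code to analyze. It is only parsed, never executed,
# so `requests` does not need to be installed.
SOURCE = '''
import logging, requests
API_KEY = "not-a-real-secret"
def sync():
    logging.info("using key %s", API_KEY)
    requests.get("https://api.example.com", headers={"Authorization": API_KEY})
'''


def trace_secret(source: str, secret_value: str):
    """Return (line, callee) for every call that references a name bound to the secret."""
    tree = ast.parse(source)

    # Pass 1: names assigned directly from the secret literal.
    tainted = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            if node.value.value == secret_value:
                tainted.update(t.id for t in node.targets if isinstance(t, ast.Name))

    # Pass 2: calls whose arguments mention a tainted name anywhere inside them.
    sinks = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            args = list(node.args) + [kw.value for kw in node.keywords]
            if any(isinstance(sub, ast.Name) and sub.id in tainted
                   for arg in args for sub in ast.walk(arg)):
                sinks.append((node.lineno, ast.unparse(node.func)))
    return sorted(sinks)


print(trace_secret(SOURCE, "not-a-real-secret"))
```

Here the secret reaches both a log call and a network call, which are exactly the two questions the paragraph above asks.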

💡

The result: By the time the AI makes a decision, it's only evaluating a highly filtered set of genuine suspects — not your entire codebase. That's why the output is signal, not noise.


"Isn't this just asking ChatGPT about your code?"

Fair question. The short answer is no.

A "GPT wrapper" approach would send your entire codebase to an AI and ask it to find secrets. That approach has real problems: it's expensive, it's slow, AI models can invent findings that don't exist (called hallucinations), and it gives you no way to verify the results.

Puaro's approach is the opposite. The AI only sees candidates that have already passed through the deterministic stages: noise filtering and provider-specific pattern matching. By the time AI reasoning happens, the hard work is done. The AI is the final check, not the primary scanner.

| | GPT Wrapper | Puaro (Compound AI) |
|---|---|---|
| How it works | Code → AI → Results | Code → Filters → Patterns → AI → Analysis → Results |
| AI sees | 100% of your code | Only pre-filtered suspects |
| Hallucination risk | High (no verification layer) | Low (deterministic filters run first) |
| Reliability | Entirely dependent on the LLM | Multi-layer system with built-in redundancy |
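The structural difference in the "AI sees" row can be shown in a few lines. This is a sketch of the general pattern (cheap deterministic checks first, model call only on survivors), not Puaro's code. The stage functions and `fake_model` are stand-ins, and the list `calls` exists only to count how often the expensive step runs.

```python
from typing import Callable, Iterable

Candidate = tuple[str, str]  # (file path, matched text)


def run_pipeline(
    candidates: Iterable[Candidate],
    deterministic_stages: list[Callable[[Candidate], bool]],
    reason: Callable[[Candidate], str],
) -> dict[Candidate, str]:
    """Apply cheap deterministic checks first; call `reason` only on survivors."""
    verdicts = {}
    for cand in candidates:
        if all(stage(cand) for stage in deterministic_stages):
            verdicts[cand] = reason(cand)  # the expensive, model-backed step
    return verdicts


# Stand-ins for the real stages (all hypothetical).
def not_in_docs(c: Candidate) -> bool:
    return not c[0].startswith("docs/")


def looks_like_pem(c: Candidate) -> bool:
    return c[1].startswith("-----BEGIN")


calls = []


def fake_model(c: Candidate) -> str:
    calls.append(c)  # record every time the "model" is invoked
    return "needs review"


found = run_pipeline(
    [
        ("docs/readme.md", "-----BEGIN EC PRIVATE KEY-----"),
        ("src/app.py", "password = 'hunter2'"),
        ("fuzz/keys.pem", "-----BEGIN EC PRIVATE KEY-----"),
    ],
    [not_in_docs, looks_like_pem],
    fake_model,
)
print(len(calls), found)
```

Of the three inputs, only one reaches the model. A wrapper that sends everything to the model would have made three calls and had three chances to be wrong.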

What only Puaro gives you

Here's the feature comparison from the benchmark:

Capabilities compared across Puaro, Gitleaks, TruffleHog, and GitGuardian:

  • AI-powered detection (one tool is marked "partial")
  • Severity classification (Critical / High / Medium / Low)
  • Plain-English reason for every finding
  • Secret lifecycle analysis (where it flows in code)
  • Remediation guidance per finding
  • Real-time PR scanning (one tool is marked "CI only")
  • Secret flow visualization

Every other tool in this benchmark outputs a flat list. You get a location ("line 42 in config.js") and a type ("potential AWS key"). No severity. No reason. No guidance.

Puaro is the only tool in the benchmark that answers the question developers actually need answered: "Should I be worried about this right now?"


What this means in practice

Let's make this concrete.

Imagine your team scans a 200,000-line codebase. Here's what you're likely looking at with each tool:

  • Gitleaks → You receive 1,000+ alerts, no severity, no context. Your most junior developer gets assigned to triage. They close half as false positives (guessing). The other half sit unresolved for weeks.

  • TruffleHog → Fewer alerts because it verifies credentials against providers. But it only works for credential types it knows about. Anything else gets missed or shown unverified.

  • GitGuardian → A cleaner interface than the others, but the same fundamental problem: no reasoning, no severity, no guidance. A list is still a list.

  • Puaro → Alerts come with severity levels and explanations. Your team acts on critical and high findings first. Everything is triaged automatically. The developer who receives the alert knows exactly what to do.
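Once findings carry a severity label, the "critical and high first" workflow is a simple sort and split. A minimal sketch, assuming findings arrive as dictionaries with a `severity` key (that shape, the file names, and the `triage` helper are hypothetical):

```python
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

# Made-up findings, for illustration only.
findings = [
    {"file": "a.pem", "severity": "low"},
    {"file": "b.pem", "severity": "high"},
    {"file": "c.env", "severity": "critical"},
    {"file": "d.txt", "severity": "medium"},
]


def triage(findings, act_on=("critical", "high")):
    """Sort by severity, then split into an act-now queue and a backlog."""
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER[f["severity"]])
    urgent = [f for f in ordered if f["severity"] in act_on]
    backlog = [f for f in ordered if f["severity"] not in act_on]
    return urgent, backlog


urgent, backlog = triage(findings)
print([f["file"] for f in urgent])
print([f["file"] for f in backlog])
```

With a flat, unlabeled list there is nothing to sort on, which is the practical cost the other bullets describe.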

The bottom line: A finding you understand is one you can fix. A finding you can't understand is just noise — and noise gets ignored.


The honest takeaway

The other tools in this benchmark aren't bad. Gitleaks is fast and free, which makes it a reasonable pre-commit gate. TruffleHog's credential verification is genuinely useful for deep historical audits. GitGuardian has a polished interface and solid detection.

But none of them answer the question that matters: "Is this actually dangerous?"

That gap, between finding something and understanding it, is where leaked secrets slip through. Not because the tool missed the secret, but because the alert was buried in noise and nobody got to it in time.

The benchmark numbers tell that story clearly. Gitleaks fired 9,793 alerts across 8 repos, 8,449 of them in OpenSSL alone. Puaro fired 15 on OpenSSL, every one explained, every one classified.

You can't act on 8,449 alerts. You can act on 15.


Want to see what Puaro finds in your own codebase? Start scanning free — setup takes under 5 minutes, no credit card required.
