Model Comparison — LangChain Audit (same prompt)

Date: 2026-06-17 (updated with Fable)
Prompt: Four-phase audit (langchain-prompt.md)
Reports compared:

Model	File	Overall grade
Claude Opus 4.8	`audit-report-opus.md`	A−
Claude Fable 5	`audit-report-fable.md`	A−
Claude Sonnet 5	`audit-report-sonnet-5.md`	B+
Claude Sonnet 4.6	`audit-report-sonnet-4-6.md`	B+
Claude Haiku 4.5	`audit-report-haiku.md`	A

Overall verdict: No single report replaces the others; the five offer complementary perspectives on the same repo.

Opus: Advanced threat modeling (TOCTOU on unprotected paths, shell default, dual bypass).
Fable: Holistic view with the same calibration as Opus (A−), covering god-file debt, unsafe load() defaults, parked lint rules (BLE/ERA/ANN401), and the best task plan (M0–M3 milestones with effort/risk).
Sonnet 5: Primary Sonnet audit covering incomplete SSRF adoption, graph_mermaid.py, silent exceptions, and repo hygiene.
Sonnet 4.6: Ops complement covering lockfile CI, the invalid `gpt-5.4` in the README, and SECURITY.md (none repeated in S5 or Fable).
Haiku: LOC and architecture map, but with factual errors about CI (the lockfile check).

Ideal composite report: Haiku (exploration) → Sonnet 5 (audit) → Sonnet 4.6 (CI/adoption checks) → Opus (threat review) → Fable (strategy + executable backlog) → human review.

0. Fable — What does it add to the experiment?

Aspect	Fable	Comparison
Grade	A−	Ties with Opus; more honest than Haiku (A)
Focus	Technical debt + disabled guardrails + plan	Opus = threats; Fable = how to fix it
Top risk 1	God files (5 files >1,800 LOC; `base.py` 6,574)	Also flagged by Haiku/Opus/S5; Fable quantifies all 5
Top risk 2	Unsafe default `load()` `allowed_objects='core'`	Same as Sonnet 4.6; Opus didn't rank it in top 3
Top risk 3	208 `type: ignore` + `disallow_any_generics=false`	S5 counts ignores; Fable ties it to a CI ratchet
Exclusive	Vendored `mustache.py` (704 LOC), C90 disabled, `usage.py` swallows `AttributeError`	No one else caught this
Phase 4 plan	M0–M3 milestones, quick wins, non-goals, measurable DoD	The most actionable of the five
Missed	TOCTOU, shell host, SSRF 2 sites, `graph_mermaid`, lockfile CI, broken README	Complements Opus + S5 + S4.6

Conclusion on Fable: It doesn't replace Opus (agent threats) or Sonnet 5 (SSRF adoption), but it is the best candidate for turning findings into a roadmap with priorities and implementation risk.

0b. Sonnet 4.6 vs Sonnet 5 — What changed?

Aspect	Sonnet 4.6	Sonnet 5
Grade	B+	B+ (same calibration, better justification)
Methodology	file:line citations	+ acknowledges shallow clone, no coverage %, no guessing
SSRF	Default `load()`, env bypass	Transport only in 2 call sites; `graph_mermaid.py:461` unprotected
Exceptions	16/8/7 `except Exception` (High)	Blocks that swallow without logging — Medium, more actionable
CI	Lockfile check commented out (High)	Working-tree clean after tests; no lockfile in `_lint.yml`
Docs	Broken README `gpt-5.4`	README correct; audit artifacts at repo root

Sonnet conclusion: Use Sonnet 5 as the primary pass, and cross-check with Sonnet 4.6 for lockfile/load()/README if the repo hasn't changed. Fable already covers load() but not the lockfile or README.

1. Executive summary

Aspect	Best	Why
Grade calibration (A–F)	Opus ≈ Fable	A− balances strengths and risks. Both Sonnets give a cautious B+. Haiku's A is inflated.
Threat modeling	Opus	TOCTOU; default shell host; dual bypass.
Strategy and task plan	Fable	M0–M3 milestones, quick wins, non-goals, DoD with metrics (type:ignore ratchet, active BLE, etc.).
Security "infra vs adoption"	Sonnet 5	2 `ssrf_safe_client` call sites; `graph_mermaid.py`; IP pinning in `_transport.py`.
Ops / CI / adoption	Sonnet 4.6	Commented lockfile, README, `SECURITY.md` — absent in S5, Fable, and Opus.
Technical debt / complexity	Fable (+ Haiku)	God files, lint TODOs, C90 off, in-tree mustache, agents v1 coverage.
Methodological honesty	Opus ≈ Sonnet 5 ≈ Fable	Fact/Judgment; explicit limits (Fable flags coverage % as unverified).
Quick wins	Fable (+ S5, S4.6)	Fable: `.gitignore` for audit artifacts, `logger.debug` in usage.py, issues for ruff rules. S5: silent excepts. S4.6: lockfile, SECURITY.md.
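
The `logger.debug` quick win for `usage.py` amounts to keeping the fallback but leaving a trace. A minimal sketch, assuming a hypothetical helper `get_total_tokens` and a hypothetical `response.usage.total_tokens` shape; this is not the actual code in `usage.py`:

```python
import logging

logger = logging.getLogger(__name__)


def get_total_tokens(response: object) -> int:
    """Hypothetical helper: read response.usage.total_tokens, defaulting to 0."""
    try:
        return int(response.usage.total_tokens)
    except AttributeError:
        # Same fallback as a silent `pass`, but the miss is now visible
        # at debug level, including the traceback.
        logger.debug(
            "No usage metadata on %s", type(response).__name__, exc_info=True
        )
        return 0
```

Behavior is unchanged for callers; the difference is only that a missing attribute can be found in debug logs instead of disappearing.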

2. Phase 1 — Repository Map

Aspect	Best	Why
Factual accuracy	Sonnet 4.6 (+ Sonnet 5)	S4.6: caught the commented-out lockfile check in `_lint.yml`. Haiku: wrong about the CI lockfile check. Fable: correct on per-package lockfiles but misses the CI check.
Structural depth	Haiku (+ Fable, Opus, S5)	Fable: 27 workflows, 454 tests, 15 partners, test/source ratios. Haiku: LOC per module.
Analysis limitations	Sonnet 5 / Opus / Fable	Fable: coverage % and CVE scan flagged as not executed.

3. Phase 2 — Audit (summary by dimension)

Security

Finding	Opus	Fable	S5	S4.6	Haiku
TOCTOU / DNS rebinding	✅	❌	⚠️	❌	❌
Default shell host	✅	❌	❌	❌	❌
SSRF transport barely adopted (2 sites)	❌	❌	✅	❌	❌
`graph_mermaid.py` fetch without SSRF protection	❌	❌	✅	❌	❌
Unsafe default `load()` `allowed_objects='core'`	❌	✅	❌	✅	❌
`LANGCHAIN_ENV` bypass	✅	⚠️	✅	✅	❌
`subprocess` S603	❌	❌	❌	✅	❌
No eval/exec/pickle on input	❌	✅	❌	❌	❌
Dedicated `_security` SSRF module	❌	✅	✅	❌	❌

Security winner: Opus (threats) + Sonnet 5 (adoption) + Fable/S4.6 (load() default).
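
The TOCTOU row (validation without pinning) comes down to resolving a hostname once and then connecting to the address that was validated, rather than letting the HTTP client resolve it a second time. A minimal sketch, assuming hypothetical helper names `is_public_address` and `resolve_and_pin`; it is not the code in the `_security` module or `_transport.py`:

```python
import ipaddress
import socket


def is_public_address(ip: str) -> bool:
    """Reject loopback, private, link-local, and other non-global addresses."""
    return ipaddress.ip_address(ip).is_global


def resolve_and_pin(host: str, port: int = 443) -> str:
    """Resolve `host` once and return one validated IP to connect to.

    Checking the hostname and then resolving it again at connect time
    leaves a window for DNS rebinding; connecting to the returned IP
    (hypothetical caller responsibility) closes that window.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = [info[4][0] for info in infos]
    for ip in addresses:
        if not is_public_address(ip):
            raise ValueError(f"{host} resolves to non-public address {ip}")
    return addresses[0]
```

The sketch rejects the host if any resolved address is non-public, which is a conservative choice; a real transport would also need to keep the original hostname for TLS verification and the `Host` header.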

Testing / CI

Finding	Opus	Fable	S5	S4.6	Haiku
Lockfile check commented out in `_lint.yml`	❌	❌	❌	✅	✗ incorrect
Working-tree clean in CI	❌	❌	✅	❌	❌
Agent factory low test coverage vs LOC	❌	✅	❌	❌	❌
454 unit tests + pytest-socket	❌	✅	⚠️	❌	❌

Testing winner: Sonnet 4.6 (lockfile) + Fable (agents v1 gap) + Sonnet 5 (CI hygiene).
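
Because Haiku and Sonnet 4.6 disagree on the lockfile check, it is worth verifying mechanically. A minimal sketch that flags commented-out workflow lines mentioning a lock step; the function name and the sample workflow text are hypothetical and do not reproduce the real contents of `_lint.yml`:

```python
import re

# A YAML comment line whose text mentions a word starting with "lock",
# e.g. "# run: uv lock --check" or "# - name: Check lockfile".
COMMENTED_LOCK = re.compile(r"^\s*#.*\block", re.IGNORECASE)


def commented_lock_lines(workflow_text: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for commented-out lock-related lines."""
    return [
        (number, line.strip())
        for number, line in enumerate(workflow_text.splitlines(), start=1)
        if COMMENTED_LOCK.search(line)
    ]


# Hypothetical workflow fragment, for illustration only.
SAMPLE_WORKFLOW = """\
steps:
  - uses: actions/checkout@v4
  # - name: Check lockfile
  #   run: uv lock --check
  - name: Lint
    run: make lint
"""

for number, line in commented_lock_lines(SAMPLE_WORKFLOW):
    print(f"line {number}: {line}")
```

Any hit is only a prompt for the human reviewer to open the file; the pattern cannot tell a disabled step from an explanatory comment.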

Architecture

Finding	Haiku	Fable	Opus	S5	S4.6
6,574-LOC god object	✅	✅	✅	✅	✅
5,064-LOC OpenAI partner	❌	✅	❌	✅	❌
Callback/tracer cycles	✅	❌	❌	❌	❌
Duplicated block_translators	✅	❌	❌	❌	❌
Vendored `mustache.py` 704 LOC	❌	✅	❌	❌	❌
McCabe C90 disabled	❌	✅	❌	❌	❌

Code quality

Finding	Fable	S5	S4.6	Opus
208 `type: ignore` in core	✅	✅	❌	⚠️ ANN401 only
BLE/ERA/ANN401 in ignore list (TODO)	✅	❌	❌	⚠️ BLE only
`usage.py` swallows AttributeError	✅	⚠️ similar issue	❌	❌
Zero bare `except:`	❌	✅	❌	❌
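
The `type: ignore` ratchet that Fable proposes can be expressed as a count compared against a stored baseline. A minimal sketch, assuming hypothetical function names and a baseline of 208 taken from the table above; how Fable's report wires this into CI is not specified here:

```python
import re
from pathlib import Path

TYPE_IGNORE = re.compile(r"#\s*type:\s*ignore")


def count_type_ignores(root: Path) -> int:
    """Count `# type: ignore` comments in all .py files under `root`."""
    return sum(
        len(TYPE_IGNORE.findall(path.read_text(encoding="utf-8", errors="ignore")))
        for path in root.rglob("*.py")
    )


def check_ratchet(current: int, baseline: int) -> bool:
    """A ratchet passes only while the count has not grown past the baseline."""
    return current <= baseline
```

In a CI job, a failing `check_ratchet` would exit non-zero, and the baseline would be lowered each time ignores are removed so the count can only go down.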

DX / documentation

Finding	Fable	S5	Opus	S4.6	Haiku
Audit artifacts at repo root	✅	✅	❌	❌	❌
Invalid README `gpt-5.4`	❌	❌	❌	✅	❌
Missing SECURITY.md	❌	❌	❌	✅	❌
Threat model in `load.py` docstring	✅	⚠️	✅	✅	❌

4. Exclusive findings matrix (5 models)

Finding	Opus	Fable	S5	S4.6	Haiku
TOCTOU (validation without pinning)	✅	❌	⚠️	❌	❌
ShellToolMiddleware default host	✅	❌	❌	❌	❌
SSRF transport only 2 adoptions	❌	❌	✅	❌	❌
graph_mermaid.py without SSRF protection	❌	❌	✅	❌	❌
Silent excepts without logging (tools/base…)	❌	⚠️	✅	❌	❌
`usage.py` AttributeError without logging	❌	✅	❌	❌	❌
208 type:ignore + proposed CI ratchet	❌	✅	✅	❌	❌
Vendored mustache.py + template risk	❌	✅	❌	❌	❌
C90 complexity lint disabled	❌	✅	❌	❌	❌
M0–M3 plan with effort/risk/deps	❌	✅	❌	❌	❌
Unsafe default load()	❌	✅	❌	✅	❌
Commented-out lockfile CI	❌	❌	❌	✅	✗
README gpt-5.4	❌	❌	❌	✅	❌
Missing SECURITY.md	❌	❌	❌	✅	❌
Callback/tracer cycles	❌	❌	❌	❌	✅
Duplicated block_translators	❌	❌	❌	❌	✅
Partners without pre-commit	✅	❌	❌	❌	❌
Leftover audit reports at repo root	❌	✅	✅	❌	❌
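
Not every row in the matrix above is exclusive to one model, so it can help to derive the truly exclusive findings from the ✅ marks. A minimal sketch over a subset of the rows; the `REPORTED_BY` mapping and the function name are illustrative, not part of any report:

```python
from collections import defaultdict

# Subset of the matrix: finding -> models marked with a check.
REPORTED_BY = {
    "TOCTOU (validation without pinning)": {"Opus"},
    "ShellToolMiddleware default host": {"Opus"},
    "Unsafe default load()": {"Fable", "S4.6"},
    "Commented-out lockfile CI": {"S4.6"},
    "Callback/tracer cycles": {"Haiku"},
    "Leftover audit reports at repo root": {"Fable", "S5"},
}


def exclusive_findings(reported_by: dict[str, set[str]]) -> dict[str, list[str]]:
    """Group findings flagged by exactly one model under that model."""
    exclusives: dict[str, list[str]] = defaultdict(list)
    for finding, models in reported_by.items():
        if len(models) == 1:
            (model,) = models  # unpack the single reporting model
            exclusives[model].append(finding)
    return dict(exclusives)
```

Findings reported by two models, such as the unsafe default `load()`, drop out of the result, which is a quick way to see what would be lost if a given model were removed from the pipeline.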

5. Prompt compliance

Requirement	Opus	Fable	Sonnet 5	Sonnet 4.6	Haiku
file:line citations	✅	✅	✅	✅	⚠️
Fact vs Judgment	✅	✅	✅	✅	⚠️
No guessing	✅	✅	✅	⚠️	❌ lockfile
Trade-offs / non-goals	✅	✅✅	✅	✅	✅
Actionable Phase 4	⚠️	✅✅	✅	✅	⚠️
Honest grade	✅ A−	✅ A−	✅ B+	✅ B+	❌ A

Compliance winner: Fable (Phase 4) → Opus → Sonnet 5 → Sonnet 4.6 → Haiku.

6. Ranking by goal

Goal	1st	2nd	3rd	4th	5th
Threat modeling	Opus	Sonnet 5	Fable	Sonnet 4.6	Haiku
Strategy / backlog	Fable	Sonnet 5	Opus	Sonnet 4.6	Haiku
SSRF adoption security	Sonnet 5	Opus	Sonnet 4.6	Fable	Haiku
CI / ops / adoption	Sonnet 4.6	Sonnet 5	Fable	Opus	Haiku
Primary audit (Sonnet)	Sonnet 5	Sonnet 4.6	—	—	—
Grade calibration	Opus / Fable	Sonnet 5	Sonnet 4.6	Haiku	—
Architecture / LOC	Haiku	Fable	Opus	Sonnet 5	Sonnet 4.6
Prompt compliance	Fable	Opus	Sonnet 5	Sonnet 4.6	Haiku
Ready to act on	Merge of S5+S4.6+Opus+Fable	—	—	—	—
Cost / quality	Sonnet 5	Fable	Sonnet 4.6	Opus	Haiku

7. Recommended hybrid pipeline (5 models)

1. Haiku        → LOC map + cycles + translators (fast)
2. Sonnet 5     → Primary audit + SSRF adoption + hygiene
3. Sonnet 4.6   → Ops pass: lockfile, README, SECURITY.md
4. Opus         → Threat model + shell default + TOCTOU
5. Fable        → Strategy + milestones + quick wins + non-goals
6. Human        → Merge the backlog; verify _lint.yml and README in your current repo

Budget alternative: If you can't run both Opus and Fable, choose Opus for agent/shell-facing products and Fable for refactoring/technical debt work.

8. Conclusion

Sonnet 5 remains the primary auditor of the mid tier: it combines rigor with a check of actual SSRF adoption (2 call sites).
Sonnet 4.6 is still worth an ops pass for findings neither S5 nor Fable repeated (lockfile CI, README, SECURITY.md).
Opus is irreplaceable for agent threats and TOCTOU.
Fable adds the missing piece: the same honest grade as Opus (A−) with the best execution plan — ideal for closing the audit loop.
Haiku is useful only for initial exploration, and its output must always be verified.

For video/marketing: the story shifts from "4 models" to "5 reports, 5 roles" — tiering + merging, not "the most expensive tier wins."

Updated: 2026-06-17
Location: management/marketing/PROMPTS-CATALOG/17-06-fable/comparison-models-report.md