Model Comparison — LangChain Audit (same prompt)
Date: 2026-06-17 (updated with Fable)
Prompt: Four-phase audit (langchain-prompt.md)
Reports compared:
| Model | File | Overall grade |
| --- | --- | --- |
| Claude Opus 4.8 | audit-report-opus.md | A− |
| Claude Fable 5 | audit-report-fable.md | A− |
| Claude Sonnet 5 | audit-report-sonnet-5.md | B+ |
| Claude Sonnet 4.6 | audit-report-sonnet-4-6.md | B+ |
| Claude Haiku 4.5 | audit-report-haiku.md | A |
Overall verdict: No single report replaces the others; the five offer complementary perspectives on the same repo.
- Opus: advanced threat modeling (TOCTOU on unprotected paths, shell default, dual bypass).
- Fable: holistic view with the same calibration as Opus (A−), covering god-file debt, unsafe load() defaults, parked lint rules (BLE/ERA/ANN401), and the best task plan (M0–M3 milestones with effort/risk).
- Sonnet 5: primary Sonnet audit, covering incomplete SSRF adoption, graph_mermaid.py, silent exceptions, and repo hygiene.
- Sonnet 4.6: ops complement, covering lockfile CI, README gpt-5.4, and SECURITY.md (not repeated in S5 or Fable).
- Haiku: LOC and architecture map, but with factual CI errors.
Ideal composite report: Haiku (exploration) → Sonnet 5 (audit) → Sonnet 4.6 (CI/adoption checks) → Opus (threat review) → Fable (strategy + executable backlog) → human review.
0. Fable — What does it add to the experiment?
| Aspect | Fable | Comparison |
| --- | --- | --- |
| Grade | A− | Ties with Opus; more honest than Haiku (A) |
| Focus | Technical debt + disabled guardrails + plan | Opus = threats; Fable = how to fix it |
| Top risk 1 | God files (5 files >1,800 LOC; base.py 6,574) | Also flagged by Haiku/Opus/S5; Fable quantifies all 5 |
| Top risk 2 | Unsafe default load() allowed_objects='core' | Same as Sonnet 4.6; Opus didn't rank it in top 3 |
| Top risk 3 | 208 type: ignore + disallow_any_generics=false | S5 counts ignores; Fable ties it to a CI ratchet |
| Exclusive | Vendored mustache.py (704 LOC), C90 disabled, usage.py swallows AttributeError | No one else caught this |
| Phase 4 plan | M0–M3 milestones, quick wins, non-goals, measurable DoD | The most actionable of the five |
| Missed | TOCTOU, shell host, SSRF 2 sites, graph_mermaid, lockfile CI, broken README | Complements Opus + S5 + S4.6 |
Fable's conclusion: Doesn't replace Opus (agent threats) or Sonnet 5 (SSRF adoption). It's the best candidate for turning findings into a roadmap with priorities and implementation risk.
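The CI ratchet that Fable ties to the 208 type: ignore comments could look roughly like the sketch below. It is an illustration, not the script from any of the reports: the function names, the regex, and the idea of passing the baseline as an argument are all assumptions.

```python
import re
from pathlib import Path

# Matches "# type: ignore", with or without error codes such as "[arg-type]".
TYPE_IGNORE = re.compile(r"#\s*type:\s*ignore")


def count_type_ignores(root: Path) -> int:
    """Count lines carrying a type-ignore comment in all .py files under root."""
    total = 0
    for path in root.rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="replace")
        total += sum(1 for line in text.splitlines() if TYPE_IGNORE.search(line))
    return total


def ratchet_ok(current: int, baseline: int) -> bool:
    """The ratchet passes only while the count has not grown past the baseline."""
    return current <= baseline


def main(root: str, baseline: int) -> int:
    """Return a process exit code: 0 when the ratchet holds, 1 otherwise."""
    current = count_type_ignores(Path(root))
    if not ratchet_ok(current, baseline):
        print(f"type: ignore count rose from {baseline} to {current}")
        return 1
    print(f"type: ignore count OK: {current} <= {baseline}")
    return 0
```

In CI this would be called with the audited baseline (208 in the Fable report) and the baseline lowered whenever ignores are removed, so the count can only move down.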
0b. Sonnet 4.6 vs Sonnet 5 — What changed?
| Aspect | Sonnet 4.6 | Sonnet 5 |
| --- | --- | --- |
| Grade | B+ | B+ (same calibration, better justification) |
| Methodology | file:line citations | + acknowledges shallow clone, no coverage %, no guessing |
| SSRF | Default load(), env bypass | Transport only in 2 call sites; graph_mermaid.py:461 unprotected |
| Exceptions | 16/8/7 except Exception (High) | Blocks that swallow without logging (Medium, more actionable) |
| CI | Lockfile check commented out (High) | Working-tree clean after tests; no lockfile in _lint.yml |
| Docs | Broken README gpt-5.4 | README correct; audit artifacts at repo root |
Sonnet conclusion: Sonnet 5 as the primary pass; cross-check with Sonnet 4.6 for lockfile/load()/README if the repo hasn't changed. Fable already covers load() but not lockfile or README.
1. Executive summary
| Aspect | Best | Why |
| --- | --- | --- |
| Grade calibration (A–F) | Opus ≈ Fable | A− balances strengths and risks. Sonnet is a cautious B+. Haiku's A is inflated. |
| Threat modeling | Opus | TOCTOU; default shell host; dual bypass. |
| Strategy and task plan | Fable | M0–M3 themes, quick wins, non-goals, DoD with metrics (type:ignore ratchet, active BLE, etc.). |
| Security "infra vs adoption" | Sonnet 5 | 2 ssrf_safe_client call sites; graph_mermaid.py; IP pinning in _transport.py. |
| Ops / CI / adoption | Sonnet 4.6 | Commented lockfile, README, SECURITY.md (absent in S5, Fable, and Opus). |
| Technical debt / complexity | Fable (+ Haiku) | God files, lint TODOs, C90 off, in-tree mustache, agents v1 coverage. |
| Methodological honesty | Opus ≈ Sonnet 5 ≈ Fable | Fact/Judgment; explicit limits (Fable flags coverage % as unverified). |
| Quick wins | Fable (+ S5, S4.6) | Fable: .gitignore for audit artifacts, logger.debug in usage.py, issues for ruff rules. S5: silent excepts. S4.6: lockfile, SECURITY.md. |
2. Phase 1 — Repository Map
| Aspect | Best | Why |
| --- | --- | --- |
| Factual accuracy | Sonnet 4.6 (+ Sonnet 5) | S4.6: _lint.yml lockfile. Haiku: CI lockfile error. Fable: correct on per-package lockfiles but misses the CI check. |
| Structural depth | Haiku (+ Fable, Opus, S5) | Fable: 27 workflows, 454 tests, 15 partners, test/source ratios. Haiku: LOC per module. |
| Analysis limitations | Sonnet 5 / Opus / Fable | Fable: coverage % and CVE scan flagged as not executed. |
3. Phase 2 — Audit (summary by dimension)
Security
| Finding | Opus | Fable | S5 | S4.6 | Haiku |
| --- | --- | --- | --- | --- | --- |
| TOCTOU / DNS rebinding | ✅ | ❌ | ⚠️ | ❌ | ❌ |
| Default shell host | ✅ | ❌ | ❌ | ❌ | ❌ |
| SSRF transport barely adopted (2 sites) | ❌ | ❌ | ✅ | ❌ | ❌ |
| graph_mermaid.py fetch without SSRF protection | ❌ | ❌ | ✅ | ❌ | ❌ |
| Unsafe default load() allowed_objects='core' | ❌ | ✅ | ❌ | ✅ | ❌ |
| LANGCHAIN_ENV bypass | ✅ | ⚠️ | ✅ | ✅ | ❌ |
| subprocess S603 | ❌ | ❌ | ❌ | ✅ | ❌ |
| No eval/exec/pickle on input | ❌ | ✅ | ❌ | ❌ | ❌ |
| Dedicated _security SSRF module | ❌ | ✅ | ✅ | ❌ | ❌ |
Security winner: Opus (threats) + Sonnet 5 (adoption) + Fable/S4.6 (load() default).
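To make the TOCTOU / DNS rebinding row concrete: the gap arises when a hostname is validated in one DNS lookup and then resolved again at connect time. A minimal sketch of the "resolve once, validate, pin" pattern follows, using only the standard library. This is not LangChain's _transport.py code; `is_blocked_address` and `resolve_and_pin` are hypothetical names, and the list of blocked ranges is an assumption.

```python
import ipaddress
import socket


def is_blocked_address(ip: str) -> bool:
    """Return True for address classes an SSRF guard would typically refuse."""
    addr = ipaddress.ip_address(ip)
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_multicast
        or addr.is_reserved
        or addr.is_unspecified
    )


def resolve_and_pin(host: str, port: int = 443) -> str:
    """Resolve host once, validate every returned address, return one pinned IP.

    The caller should connect to the returned IP (still sending the original
    host name for TLS and the Host header) instead of resolving the name a
    second time, which is where a rebinding attacker could swap the answer.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = [info[4][0] for info in infos]
    if not addresses:
        raise ValueError(f"no addresses for {host!r}")
    for ip in addresses:
        if is_blocked_address(ip):
            raise ValueError(f"{host!r} resolves to blocked address {ip}")
    return addresses[0]
```

Rejecting the request when any resolved address is blocked, rather than picking the first safe one, is a deliberately conservative choice in this sketch.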
Testing / CI
| Finding | Opus | Fable | S5 | S4.6 | Haiku |
| --- | --- | --- | --- | --- | --- |
| Lockfile check commented out in _lint.yml | ❌ | ❌ | ❌ | ✅ | ✗ incorrect |
| Working-tree clean in CI | ❌ | ❌ | ✅ | ❌ | ❌ |
| Agent factory low test coverage vs LOC | ❌ | ✅ | ❌ | ❌ | ❌ |
| 454 unit tests + pytest-socket | ❌ | ✅ | ⚠️ | ❌ | ❌ |
Testing winner: Sonnet 4.6 (lockfile) + Fable (agents v1 gap) + Sonnet 5 (CI hygiene).
Architecture
| Finding | Haiku | Fable | Opus | S5 | S4.6 |
| --- | --- | --- | --- | --- | --- |
| 6,574-LOC god object | ✅ | ✅ | ✅ | ✅ | ✅ |
| 5,064-LOC OpenAI partner | ❌ | ✅ | ❌ | ✅ | ❌ |
| Callback/tracer cycles | ✅ | ❌ | ❌ | ❌ | ❌ |
| Duplicated block_translators | ✅ | ❌ | ❌ | ❌ | ❌ |
| Vendored mustache.py 704 LOC | ❌ | ✅ | ❌ | ❌ | ❌ |
| McCabe C90 disabled | ❌ | ✅ | ❌ | ❌ | ❌ |
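The god-file figures in this table can be re-checked against a current checkout with a few lines of Python. The sketch below is an assumption about how such a count might be done: it counts physical lines, which may differ from the LOC method the reports used, and `find_god_files` is a hypothetical helper. The 1,800-line default mirrors the threshold quoted in the Fable summary.

```python
from pathlib import Path


def find_god_files(root: Path, threshold: int = 1800) -> list[tuple[str, int]]:
    """Return (relative path, line count) for .py files above threshold, largest first."""
    hits = []
    for path in root.rglob("*.py"):
        with path.open(encoding="utf-8", errors="replace") as handle:
            loc = sum(1 for _ in handle)  # physical lines, blanks and comments included
        if loc > threshold:
            hits.append((path.relative_to(root).as_posix(), loc))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Running it from the repo root and comparing the output with the 6,574 and 5,064 figures is a cheap way to verify the finding before it goes into a backlog.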
Code quality
| Finding | Fable | S5 | S4.6 | Opus |
| --- | --- | --- | --- | --- |
| 208 type: ignore in core | ✅ | ✅ | ❌ | ANN401 |
| BLE/ERA/ANN401 in ignore list (TODO) | ✅ | ❌ | ❌ | BLE ignored |
| usage.py swallows AttributeError | ✅ | ⚠️ similar issue | ❌ | ❌ |
| Zero bare except: | ❌ | ✅ | ❌ | ❌ |
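Findings like "swallows AttributeError" and "zero bare except:" can be checked mechanically with the standard `ast` module. The sketch below assumes that "swallowing" means a handler whose body is only `pass` or `...`; handlers that do other work without logging would need a stricter rule. `find_silent_excepts` is a hypothetical helper, not something taken from the audited repo or from the reports.

```python
import ast


def find_silent_excepts(source: str) -> list[tuple[int, str]]:
    """Return (line, caught type) for except handlers whose body is only pass or '...'."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ExceptHandler):
            continue
        silent = all(
            isinstance(stmt, ast.Pass)
            or (
                isinstance(stmt, ast.Expr)
                and isinstance(stmt.value, ast.Constant)
                and stmt.value.value is Ellipsis
            )
            for stmt in node.body
        )
        if silent:
            # node.type is None for a bare "except:" clause.
            caught = ast.unparse(node.type) if node.type is not None else "<bare>"
            findings.append((node.lineno, caught))
    return sorted(findings)
```

The quick win Fable proposes for usage.py (a logger.debug call in the handler) would make such a handler non-silent under this rule, so the check can double as a regression guard.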
DX / documentation
| Finding | Fable | S5 | Opus | S4.6 | Haiku |
| --- | --- | --- | --- | --- | --- |
| Audit artifacts at repo root | ✅ | ✅ | ❌ | ❌ | ❌ |
| Invalid README gpt-5.4 | ❌ | ❌ | ❌ | ✅ | ❌ |
| Missing SECURITY.md | ❌ | ❌ | ❌ | ✅ | ❌ |
| Threat model in load.py docstring | ✅ | ⚠️ | ✅ | ✅ | ❌ |
4. Exclusive findings matrix (5 models)
| Finding | Opus | Fable | S5 | S4.6 | Haiku |
| --- | --- | --- | --- | --- | --- |
| TOCTOU (validation without pinning) | ✅ | ❌ | - | ❌ | ❌ |
| ShellToolMiddleware default host | ✅ | ❌ | ❌ | ❌ | ❌ |
| SSRF transport only 2 adoptions | ❌ | ❌ | ✅ | ❌ | ❌ |
| graph_mermaid.py without SSRF protection | ❌ | ❌ | ✅ | ❌ | ❌ |
| Silent excepts without logging (tools/base…) | ❌ | ⚠️ | ✅ | ❌ | ❌ |
| usage.py AttributeError without logging | ❌ | ✅ | ❌ | ❌ | ❌ |
| 208 type:ignore + proposed CI ratchet | ❌ | ✅ | ✅ | ❌ | ❌ |
| Vendored mustache.py + template risk | ❌ | ✅ | ❌ | ❌ | ❌ |
| C90 complexity lint disabled | ❌ | ✅ | ❌ | ❌ | ❌ |
| M0–M3 plan with effort/risk/deps | ❌ | ✅ | ❌ | ❌ | ❌ |
| Unsafe default load() | ❌ | ✅ | ❌ | ✅ | ❌ |
| Commented-out lockfile CI | ❌ | ❌ | ❌ | ✅ | ✗ |
| README gpt-5.4 | ❌ | ❌ | ❌ | ✅ | ❌ |
| Missing SECURITY.md | ❌ | ❌ | ❌ | ✅ | ❌ |
| Callback/tracer cycles | ❌ | ❌ | ❌ | ❌ | ✅ |
| Duplicated block_translators | ❌ | ❌ | ❌ | ❌ | ✅ |
| Partners without pre-commit | ✅ | ❌ | ❌ | ❌ | ❌ |
| Leftover audit reports at repo root | ❌ | ✅ | ✅ | ❌ | ❌ |
5. Prompt compliance
| Requirement | Opus | Fable | Sonnet 5 | Sonnet 4.6 | Haiku |
| --- | --- | --- | --- | --- | --- |
| file:line citations | ✅ | ✅ | ✅ | ✅ | ⚠️ |
| Fact vs Judgment | ✅ | ✅ | ✅ | ✅ | ⚠️ |
| No guessing | ✅ | ✅ | ✅ | ⚠️ | ❌ lockfile |
| Trade-offs / non-goals | ✅ | ✅✅ | ✅ | ✅ | ✅ |
| Actionable Phase 4 | ⚠️ | ✅✅ | ✅ | ✅ | ⚠️ |
| Honest grade | ✅ A− | ✅ A− | ✅ B+ | ✅ B+ | ❌ A |
Compliance winner: Fable (phase 4) → Opus → Sonnet 5 → Sonnet 4.6 → Haiku.
6. Ranking by goal
| Goal | 1st | 2nd | 3rd | 4th | 5th |
| --- | --- | --- | --- | --- | --- |
| Threat modeling | Opus | Sonnet 5 | Fable | Sonnet 4.6 | Haiku |
| Strategy / backlog | Fable | Sonnet 5 | Opus | Sonnet 4.6 | Haiku |
| SSRF adoption security | Sonnet 5 | Opus | Sonnet 4.6 | Fable | Haiku |
| CI / ops / adoption | Sonnet 4.6 | Sonnet 5 | Fable | Opus | Haiku |
| Primary audit (Sonnet) | Sonnet 5 | Sonnet 4.6 | - | - | - |
| Grade calibration | Opus / Fable | Sonnet 5 | Sonnet 4.6 | Haiku | - |
| Architecture / LOC | Haiku | Fable | Opus | Sonnet 5 | S4.6 |
| Prompt compliance | Fable | Opus | Sonnet 5 | Sonnet 4.6 | Haiku |
| Ready to act on | Merge of S5+S4.6+Opus+Fable | - | - | - | - |
| Cost / quality | Sonnet 5 | Fable | Sonnet 4.6 | Opus | Haiku |
7. Recommended hybrid pipeline (5 models)
1. Haiku → LOC map + cycles + translators (fast)
2. Sonnet 5 → Primary audit + SSRF adoption + hygiene
3. Sonnet 4.6 → Ops pass: lockfile, README, SECURITY.md
4. Opus → Threat model + shell default + TOCTOU
5. Fable → Strategy + milestones + quick wins + non-goals
6. Human → Merge the backlog; verify _lint.yml and README in your current repo
Budget alternative: If you can't run both Opus and Fable, choose Opus for agent/shell-facing products and Fable for refactoring/technical debt work.
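The merge in step 6 of the pipeline above is essentially a group-by over findings. A minimal sketch, assuming each report has already been reduced by hand to a list of normalized finding labels (that reduction is the hard part and is not shown); `merge_findings` and `exclusive_findings` are hypothetical helpers, and the sample data is a small excerpt of the exclusive findings matrix.

```python
from collections import defaultdict


def merge_findings(reports: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each finding label to the sorted list of models that reported it."""
    merged = defaultdict(list)
    for model, findings in reports.items():
        for finding in set(findings):  # ignore duplicates within one report
            merged[finding].append(model)
    return {finding: sorted(models) for finding, models in merged.items()}


def exclusive_findings(merged: dict[str, list[str]]) -> dict[str, str]:
    """Keep only findings reported by exactly one model, mapped to that model."""
    return {finding: models[0] for finding, models in merged.items() if len(models) == 1}


if __name__ == "__main__":
    # Excerpt of the matrix in section 4, entered manually for illustration.
    reports = {
        "Opus": ["TOCTOU", "ShellToolMiddleware default host"],
        "Fable": ["Unsafe default load()", "Vendored mustache.py"],
        "Sonnet 4.6": ["Unsafe default load()", "Commented-out lockfile CI"],
    }
    merged = merge_findings(reports)
    for finding, model in sorted(exclusive_findings(merged).items()):
        print(f"{finding}: only {model}")
```

Findings reported by a single model are the ones most worth verifying by hand before they enter the backlog, since no second report corroborates them.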
8. Conclusion
- Sonnet 5 remains the primary auditor of the mid tier: rigor + real SSRF adoption.
- Sonnet 4.6 is still worth an ops pass for findings neither S5 nor Fable repeated (lockfile CI, README, SECURITY.md).
- Opus is irreplaceable for agent threats and TOCTOU.
- Fable adds the missing piece: the same honest grade as Opus (A−) with the best execution plan — ideal for closing the audit loop.
- Haiku only as initial exploration, always verified.
For video/marketing: the story shifts from "4 models" to "5 reports, 5 roles" — tiering + merging, not "the most expensive tier wins."
Updated: 2026-06-17
Location: management/marketing/PROMPTS-CATALOG/17-06-fable/comparison-models-report.md