Internal artifact

Model Comparison — LangChain Audit (same prompt)

Date: 2026-06-17 (updated with Fable)
Prompt: Four-phase audit (langchain-prompt.md)
Reports compared:

| Model | File | Overall grade |
| --- | --- | --- |
| Claude Opus 4.8 | audit-report-opus.md | A− |
| Claude Fable 5 | audit-report-fable.md | A− |
| Claude Sonnet 5 | audit-report-sonnet-5.md | B+ |
| Claude Sonnet 4.6 | audit-report-sonnet-4-6.md | B+ |
| Claude Haiku 4.5 | audit-report-haiku.md | A |

Overall verdict: No single report replaces the others. Five complementary perspectives on the same repo.

  • Opus: Advanced threat modeling (TOCTOU on unprotected paths, shell default, dual bypass).
  • Fable: Holistic view, graded A− like Opus. Covers god-file debt, unsafe load() defaults, parked lint rules (BLE/ERA/ANN401), and the best task plan (M0–M3 milestones with effort/risk).
  • Sonnet 5: Primary Sonnet audit: incomplete SSRF adoption, graph_mermaid.py, silent exceptions, repo hygiene.
  • Sonnet 4.6: Ops complement: lockfile CI, README gpt-5.4, SECURITY.md (not repeated in S5 or Fable).
  • Haiku: Useful LOC and architecture map, but contains factual errors about CI.

Ideal composite report: Haiku (exploration) → Sonnet 5 (audit) → Sonnet 4.6 (CI/adoption checks) → Opus (threat review) → Fable (strategy + executable backlog) → human review.


0. Fable — What does it add to the experiment?

| Aspect | Fable | Comparison |
| --- | --- | --- |
| Grade | A− | Ties with Opus; more honest than Haiku (A) |
| Focus | Technical debt + disabled guardrails + plan | Opus = threats; Fable = how to fix it |
| Top risk 1 | God files (5 files >1,800 LOC; base.py 6,574) | Also flagged by Haiku/Opus/S5; Fable quantifies all 5 |
| Top risk 2 | Unsafe default load() allowed_objects='core' | Same as Sonnet 4.6; Opus didn't rank it in top 3 |
| Top risk 3 | 208 type: ignore + disallow_any_generics=false | S5 counts ignores; Fable ties it to a CI ratchet |
| Exclusive | Vendored mustache.py (704 LOC), C90 disabled, usage.py swallows AttributeError | No one else caught this |
| Phase 4 plan | M0–M3 milestones, quick wins, non-goals, measurable DoD | The most actionable of the five |
| Missed | TOCTOU, shell host, SSRF 2 sites, graph_mermaid, lockfile CI, broken README | Complements Opus + S5 + S4.6 |

Conclusion on Fable: It does not replace Opus (agent threats) or Sonnet 5 (SSRF adoption). It is the best candidate for turning findings into a roadmap with priorities and implementation risk.
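The CI ratchet that Fable ties to the `type: ignore` count can be illustrated with a minimal sketch. It assumes the baseline is the 208 figure cited above and that a plain line scan is good enough; the names `count_type_ignores` and `check_ratchet` are hypothetical and do not come from any of the reports or from the LangChain repo.

```python
import re
from pathlib import Path

# Assumption: the baseline is the count the comparison cites for core (208).
BASELINE = 208

# Matches "# type: ignore" with optional error codes, e.g. "# type: ignore[arg-type]".
TYPE_IGNORE = re.compile(r"#\s*type:\s*ignore\b")


def count_type_ignores(root: Path) -> int:
    """Count lines containing a type-ignore comment in all .py files under root."""
    total = 0
    for path in root.rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="replace")
        total += sum(1 for line in text.splitlines() if TYPE_IGNORE.search(line))
    return total


def check_ratchet(current: int, baseline: int = BASELINE) -> tuple[bool, str]:
    """Pass when the count has not grown; suggest a lower baseline when it shrank."""
    if current > baseline:
        return False, f"type: ignore count rose from {baseline} to {current}"
    if current < baseline:
        return True, f"count fell to {current}; lower the baseline to lock in the gain"
    return True, f"count unchanged at {baseline}"
```

A CI job could call `check_ratchet(count_type_ignores(Path("libs/core")))` (the path is an assumption) and fail the build when the first element is `False`. The scan is approximate: a string literal that happens to contain the comment text would also be counted.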


0b. Sonnet 4.6 vs Sonnet 5 — What changed?

| Aspect | Sonnet 4.6 | Sonnet 5 |
| --- | --- | --- |
| Grade | B+ | B+ (same calibration, better justification) |
| Methodology | file:line citations | + acknowledges shallow clone, no coverage %, no guessing |
| SSRF | Default load(), env bypass | Transport only in 2 call sites; graph_mermaid.py:461 unprotected |
| Exceptions | 16/8/7 except Exception (High) | Blocks that swallow without logging (Medium, more actionable) |
| CI | Lockfile check commented out (High) | Working-tree clean after tests; no lockfile in _lint.yml |
| Docs | Broken README gpt-5.4 | README correct; audit artifacts at repo root |

Sonnet conclusion: Sonnet 5 as the primary pass; cross-check with Sonnet 4.6 for lockfile/load()/README if the repo hasn't changed. Fable already covers load() but not lockfile or README.


1. Executive summary

| Aspect | Best | Why |
| --- | --- | --- |
| Grade calibration (A–F) | Opus / Fable | A− balances strengths and risks. Sonnet is a cautious B+. Haiku's A is inflated. |
| Threat modeling | Opus | TOCTOU; default shell host; dual bypass. |
| Strategy and task plan | Fable | M0–M3 themes, quick wins, non-goals, DoD with metrics (type:ignore ratchet, active BLE, etc.). |
| Security "infra vs adoption" | Sonnet 5 | 2 ssrf_safe_client call sites; graph_mermaid.py; IP pinning in _transport.py. |
| Ops / CI / adoption | Sonnet 4.6 | Commented lockfile, README, SECURITY.md (absent in S5, Fable, and Opus). |
| Technical debt / complexity | Fable (+ Haiku) | God files, lint TODOs, C90 off, in-tree mustache, agents v1 coverage. |
| Methodological honesty | Opus / Sonnet 5 / Fable | Fact/Judgment; explicit limits (Fable flags coverage % as unverified). |
| Quick wins | Fable (+ S5, S4.6) | Fable: .gitignore for audit artifacts, logger.debug in usage.py, issues for ruff rules. S5: silent excepts. S4.6: lockfile, SECURITY.md. |
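The "logger.debug in usage.py" quick win amounts to keeping the fallback behaviour of a swallowed exception while making the miss observable. The sketch below shows the pattern on a hypothetical helper, `read_token_count`; it is not the code in usage.py, and the attribute names are assumptions chosen for illustration.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


def read_token_count(response: Any) -> int:
    """Return response.usage.total_tokens, or 0 when the attribute chain is missing.

    Hypothetical helper for illustration; it is not the code in usage.py.
    """
    try:
        return int(response.usage.total_tokens)
    except AttributeError:
        # Same fallback as a silent `pass`, but the miss is now visible at DEBUG level.
        logger.debug("No usage metadata on %s", type(response).__name__, exc_info=True)
        return 0
```

Logging at DEBUG keeps production output quiet while leaving a trail for anyone investigating why a count came back as zero.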

2. Phase 1 — Repository Map

| Aspect | Best | Why |
| --- | --- | --- |
| Factual accuracy | Sonnet 4.6 (+ Sonnet 5) | S4.6: _lint.yml lockfile. Haiku: CI lockfile error. Fable: correct on per-package lockfiles but misses the CI check. |
| Structural depth | Haiku (+ Fable, Opus, S5) | Fable: 27 workflows, 454 tests, 15 partners, test/source ratios. Haiku: LOC per module. |
| Analysis limitations | Sonnet 5 / Opus / Fable | Fable: coverage % and CVE scan flagged as not executed. |

3. Phase 2 — Audit (summary by dimension)

Security

Finding Opus Fable S5 S4.6 Haiku
TOCTOU / DNS rebinding ⚠️
Default shell host
SSRF transport barely adopted (2 sites)
graph_mermaid.py fetch without SSRF protection
Unsafe default load() allowed_objects='core'
LANGCHAIN_ENV bypass ⚠️
subprocess S603
No eval/exec/pickle on input
Dedicated _security SSRF module

Security winner: Opus (threats) + Sonnet 5 (adoption) + Fable/S4.6 (load() default).
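The TOCTOU finding (validation without pinning) describes a gap where a hostname is checked once and then resolved again at connect time, so a DNS rebinding attacker can swap in an internal address between the two lookups. The sketch below shows the general resolve-once-and-pin idea using only the standard library. It is not the implementation in `_transport.py`; `resolve_and_pin`, `is_public`, and the injectable `resolver` parameter are hypothetical names.

```python
import ipaddress
import socket
from typing import Callable, List

Resolver = Callable[[str], List[str]]


def system_resolver(hostname: str) -> List[str]:
    """Resolve a hostname to its IP address strings using the OS resolver."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]


def is_public(ip: str) -> bool:
    """Reject loopback, private, link-local, and other non-global addresses."""
    return ipaddress.ip_address(ip).is_global


def resolve_and_pin(hostname: str, resolver: Resolver = system_resolver) -> str:
    """Resolve once, validate every answer, and return the single IP to connect to.

    The caller must connect to the returned IP (sending the original hostname
    for Host/SNI) instead of resolving the name a second time.
    """
    addresses = resolver(hostname)
    if not addresses:
        raise ValueError(f"{hostname} did not resolve")
    for ip in addresses:
        if not is_public(ip):
            raise ValueError(f"{hostname} resolves to non-public address {ip}")
    return addresses[0]
```

Rejecting the whole lookup when any answer is non-public is a deliberately strict choice for this sketch; a real transport may apply a different policy.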

Testing / CI

Finding Opus Fable S5 S4.6 Haiku
Lockfile check commented out in _lint.yml ✗ incorrect
Working-tree clean in CI
Agent factory low test coverage vs LOC
454 unit tests + pytest-socket ⚠️

Testing winner: Sonnet 4.6 (lockfile) + Fable (agents v1 gap) + Sonnet 5 (CI hygiene).

Architecture

Finding Haiku Fable Opus S5 S4.6
6,574-LOC god object
5,064-LOC OpenAI partner
Callback/tracer cycles
Duplicated block_translators
Vendored mustache.py 704 LOC
McCabe C90 disabled

Code quality

Finding Fable S5 S4.6 Opus
208 type: ignore in core ANN401
BLE/ERA/ANN401 in ignore list (TODO) BLE ignored
usage.py swallows AttributeError ⚠️ similar issue
Zero bare except:

DX / documentation

Finding Fable S5 Opus S4.6 Haiku
Audit artifacts at repo root
Invalid README gpt-5.4
Missing SECURITY.md
Threat model in load.py docstring ⚠️

4. Exclusive findings matrix (5 models)

Finding Opus Fable S5 S4.6 Haiku
TOCTOU (validation without pinning)
ShellToolMiddleware default host
SSRF transport only 2 adoptions
graph_mermaid.py without SSRF protection
Silent excepts without logging (tools/base…) ⚠️
usage.py AttributeError without logging
208 type:ignore + proposed CI ratchet
Vendored mustache.py + template risk
C90 complexity lint disabled
M0–M3 plan with effort/risk/deps
Unsafe default load()
Commented-out lockfile CI
README gpt-5.4
Missing SECURITY.md
Callback/tracer cycles
Duplicated block_translators
Partners without pre-commit
Leftover audit reports at repo root

5. Prompt compliance

Requirement Opus Fable Sonnet 5 Sonnet 4.6 Haiku
file:line citations ⚠️
Fact vs Judgment ⚠️
No guessing ⚠️ ❌ lockfile
Trade-offs / non-goals ✅✅
Actionable Phase 4 ⚠️ ✅✅ ⚠️
Honest grade ✅ A− ✅ A− ✅ B+ ✅ B+ ❌ A

Compliance winner: Fable (phase 4) → Opus / Sonnet 5 → Sonnet 4.6 → Haiku.


6. Ranking by goal

| Goal | 1st | 2nd | 3rd | 4th | 5th |
| --- | --- | --- | --- | --- | --- |
| Threat modeling | Opus | Sonnet 5 | Fable | Sonnet 4.6 | Haiku |
| Strategy / backlog | Fable | Sonnet 5 | Opus | Sonnet 4.6 | Haiku |
| SSRF adoption security | Sonnet 5 | Opus | Sonnet 4.6 | Fable | Haiku |
| CI / ops / adoption | Sonnet 4.6 | Sonnet 5 | Fable | Opus | Haiku |
| Primary audit (Sonnet) | Sonnet 5 | Sonnet 4.6 | | | |
| Grade calibration | Opus / Fable | Sonnet 5 | Sonnet 4.6 | Haiku | |
| Architecture / LOC | Haiku | Fable | Opus | Sonnet 5 | S4.6 |
| Prompt compliance | Fable | Opus | Sonnet 5 | Sonnet 4.6 | Haiku |
| Ready to act on | Merge of S5+S4.6+Opus+Fable | | | | |
| Cost / quality | Sonnet 5 | Fable | Sonnet 4.6 | Opus | Haiku |

7. Recommended hybrid pipeline (5 models)

1. Haiku        → LOC map + cycles + translators (fast)
2. Sonnet 5     → Primary audit + SSRF adoption + hygiene
3. Sonnet 4.6   → Ops pass: lockfile, README, SECURITY.md
4. Opus         → Threat model + shell default + TOCTOU
5. Fable        → Strategy + milestones + quick wins + non-goals
6. Human        → Merge the backlog; verify _lint.yml and README in your current repo
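The human merge step at the end of the pipeline can be approximated by grouping findings across reports and flagging those that only one model raised, since those are the ones most worth verifying by hand. This is a minimal sketch under the assumption that findings have already been normalized to a shared label; `merge_findings` and `exclusive` are hypothetical helpers, and the sample rows reuse attributions stated in this comparison.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# A (model, finding) pair; finding labels are assumed to be normalized already.
Report = Tuple[str, str]


def merge_findings(reports: Iterable[Report]) -> Dict[str, List[str]]:
    """Group models by finding, keeping first-seen order and dropping duplicates."""
    merged: Dict[str, List[str]] = defaultdict(list)
    for model, finding in reports:
        if model not in merged[finding]:
            merged[finding].append(model)
    return dict(merged)


def exclusive(merged: Dict[str, List[str]]) -> Dict[str, str]:
    """Findings reported by exactly one model, mapped to that model."""
    return {f: models[0] for f, models in merged.items() if len(models) == 1}


# Sample rows taken from attributions made in this comparison.
reports = [
    ("Sonnet 4.6", "unsafe default load()"),
    ("Sonnet 4.6", "lockfile check commented out"),
    ("Fable", "unsafe default load()"),
    ("Fable", "vendored mustache.py"),
]
backlog = merge_findings(reports)
single_source = exclusive(backlog)
```

In practice the hard part is the normalization this sketch assumes: two reports rarely describe the same issue with the same wording, so a reviewer still has to map raw findings to shared labels before merging.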

Budget alternative: If you can't run both Opus and Fable, choose Opus for agent/shell-facing products and Fable for refactoring/technical debt work.


8. Conclusion

  • Sonnet 5 remains the primary auditor of the mid tier: rigor + real SSRF adoption.
  • Sonnet 4.6 is still worth an ops pass for findings neither S5 nor Fable repeated (lockfile CI, README, SECURITY.md).
  • Opus is irreplaceable for agent threats and TOCTOU.
  • Fable adds the missing piece: the same honest grade as Opus (A−) with the best execution plan — ideal for closing the audit loop.
  • Haiku only as initial exploration, always verified.

For video/marketing: the story shifts from "4 models" to "5 reports, 5 roles" — tiering + merging, not "the most expensive tier wins."

Updated: 2026-06-17
Location: management/marketing/PROMPTS-CATALOG/17-06-fable/comparison-models-report.md