🦜 LangChain Python Audit Report

Technical Deep Dive & Improvement Roadmap

Audit Date: 2026-06-10 | Repository: LangChain Python OSS | Focus: libs/core

Aβˆ’
Executive Summary

LangChain Core is a production-quality, well-maintained open-source library that serves as the foundational abstraction layer for the LangChain ecosystem. The codebase demonstrates strong engineering practices: comprehensive test coverage (1,693+ test functions), strict type checking, security-focused design, and mature governance.

Top 3 Risks
God Object: Runnable class (6,574 lines)
langchain_core/runnables/base.py:125–6574
Single class handles composition, async/sync variants, configuration, streaming, and serialization. Difficult to test and extend.
Circular dependency: Callbacks ↔ Runnables
callbacks/manager.py, runnables/base.py, runnables/config.py
Tight coupling complicates feature integration and forces TYPE_CHECKING workarounds.
Undocumented critical behaviors & 30+ TODOs
prompts/, messages/, tools/ modules
Incomplete implementations, edge cases not documented. Contributors uncertain about intended behavior.
Top 3 Opportunities
1. Extract common patterns into reusable utilities
Reduce duplication across language models, runnables, and callback managers. Centralize error handling, async/sync bridging, and configuration merge logic.
2. Simplify deserialization architecture
The load/mapping system is secure but complex. Refactoring would reduce maintenance burden and make "safe mode" more accessible to users.
3. Establish hard module boundaries
Enforce isolation between "core abstractions," "implementations," and "integrations" via linting rules and documentation to reduce coupling.
Key Metrics
Metric Value Assessment
Source Files 349 .py files Moderate size, well-organized
Total Lines ~68.5k lines Healthy (core abstractions only)
Test Files 167 files, 1,693+ tests Excellent coverage
Type Safety mypy strict, 100% hints Production-grade
Largest File runnables/base.py: 6,574 lines Complex (needs refactoring)
Security Issues 0 critical found Well-designed (SSRF, safe deserialization)
Health Grade Justification

Aβˆ’ (Excellent with minor improvements needed)

Tech Stack
Component Technology
Language Python 3.10–3.14
Package Manager uv (fast, deterministic)
Build System hatchling
Type Checking mypy (strict mode)
Linting/Formatting ruff (0.15.0+)
Testing pytest, pytest-asyncio, syrupy
Core Dependencies pydantic (2.7.4+), tenacity, langsmith, jsonpatch, PyYAML
Security Custom SSRF protection, deserialization allowlists
Directory Structure

Main Modules

Module Purpose Key Files
runnables/ Composition & execution model base.py (6,574 lines), config.py, schema.py
language_models/ LLM & chat model abstractions base.py, chat_models.py (2,714 lines), llms.py
callbacks/ Event handling & tracing manager.py (2,792 lines), base.py
messages/ Message abstractions utils.py (2,400 lines), content.py, block_translators/
prompts/ Prompt templates chat.py (1,491 lines), string.py, loading.py
tools/ Tool/agent framework base.py (1,633 lines), simple.py, structured.py
load/ Serialization/deserialization load.py, serializable.py, mapping.py
_security/ SSRF & transport security _policy.py, _transport.py
Architectural Sketch
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  PUBLIC API LAYER (High-level abstractions)              β”‚
β”‚  β”œβ”€ Runnable (composition, invoke/batch/stream)          β”‚
β”‚  β”œβ”€ BaseLanguageModel (chat & LLM protocols)             β”‚
β”‚  └─ BaseTool, BaseRetriever, BaseVectorStore             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         ↓          ↓          ↓          ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  IMPLEMENTATION LAYER (Concrete classes)                 β”‚
β”‚  β”œβ”€ RunnableSequence, RunnableParallel (composition)     β”‚
β”‚  β”œβ”€ Messages, Prompts (domain models)                    β”‚
β”‚  β”œβ”€ CallbackManager, EventStreamCallbackHandler          β”‚
β”‚  └─ ToolCall, ToolMessage (agent framework)              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         ↓          ↓          ↓          ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  UTILITY LAYER (Cross-cutting concerns)                  β”‚
β”‚  β”œβ”€ Config merge, async/sync bridges                     β”‚
β”‚  β”œβ”€ Serialization (load, Serializable, mapping)          β”‚
β”‚  β”œβ”€ SSRF protection, error handling                      β”‚
β”‚  └─ Type checking, function calling, JSON schema         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
Key Observations
βœ“ Strong security focus: SSRF protection and deserialization safeguards built in from the ground up
βœ“ Async-first design: Most public APIs support both sync and async with intelligent bridging
βœ“ Mature versioning: Semantic versioning with beta/deprecation markers
βœ“ Modular testing: Unit tests (no network), integration tests, benchmarks
Architecture & Design Findings
CRITICALGod Object: Runnable class (base.py)
langchain_core/runnables/base.py:125–6574
The Runnable class spans 6,574 lines and handles composition, async/sync variants, configuration, streaming, and serialization in a single class.
Difficult to test, refactor, and extend. New contributors must understand the entire class to make changes.
HIGHCircular dependency: Callbacks ↔ Runnables
callbacks/manager.py, runnables/base.py, runnables/config.py
Runnables import CallbackManager; callbacks import RunnableConfig. Circular imports force TYPE_CHECKING workarounds.
Brittle interdependencies. Adding new callback types or runnable behaviors requires touching both modules.
MEDIUMLeaky abstractions in message block translators
messages/block_translators/ (6 files: openai.py ~1,086 lines, anthropic.py, groq.py, etc.)
Each translator reimplements similar content parsing, tool call conversion, image handling logic.
Adding a new provider requires understanding 1000+ lines of similar code. Updates must be propagated to all translators.
Code Quality Findings
HIGHFile size hotspot: runnables/base.py (6,574 lines)
Hard to reason about, cascading effects from single changes, testing requires large context.
MEDIUM30+ TODO comments indicating incomplete work
prompts/string.py:378, messages/ai.py:301, language_models/chat_models.py:397
Incomplete implementations are sources of bugs. They signal unfinished architectural decisions.
MEDIUMError handling lacks granularity
Only 3 exception types (LangChainException, TracerException, OutputParserException) for 68k lines. No granular exceptions for async errors, config errors, validation errors.
Security Findings
βœ“ SECURE SSRF protection well-implemented

Comprehensive SSRF protection in _security/_policy.py:

  • 18 blocked IPv4 networks (private, reserved, cloud metadata)
  • 8 blocked IPv6 networks
  • Cloud metadata IP/hostname blocklists (AWS, GCP, Azure)
  • DNS-aware URL checking with async socket resolution
βœ“ SECURE Deserialization uses allowlists

Safe-by-default design with proper threat model documentation:

  • Allowlist-based instantiation (allowed_objects parameter)
  • Escape-based injection protection
  • Namespace validation
  • Default is 'core' (safer than 'all')
βœ“ SECURE No dangerous eval/exec/pickle usage

Audit of 57 files containing pickle references found no unsafe patterns. Pickle is used carefully for internal caching, not on untrusted data.

Testing Assessment
βœ“ Comprehensive test coverage
  • 167 test files, 1,693+ test functions
  • Unit tests with socket disabled (good isolation)
  • Integration tests for external services
  • Benchmarks for performance profiling
⚠ Async test coverage could be expanded

Most tests are sync; async variants are less common. Current async tests may not catch edge cases like race conditions, context variable leaks, or deadlocks.

Strengths
βœ“ Comprehensive type checking β€” mypy strict mode, type hints on 100% of public APIs
βœ“ Security-first design β€” SSRF protection, safe deserialization, careful serialization
βœ“ Async-first architecture β€” Native async support, elegant bridging to sync
βœ“ Mature test suite β€” 1,693 tests, unit/integration separation, snapshot tests
βœ“ Stable public APIs β€” Deprecation markers, changelog, version policy
βœ“ Active maintenance β€” Regular releases, responsive to issues
Improvement Strategy
Theme 1: Monolithic Runnable class
Root Cause

Runnable accumulates responsibilities for composition, execution, configuration, and introspection in a single 6,574-line class.

Target State

Extract interfaces into focused protocols. Move implementation details to private mixins or composition.

Principles
  • Single Responsibility β€” each class has one reason to change
  • Interface Segregation β€” clients depend on minimal interface
  • Composition over inheritance
Trade-offs
Effort M–L (significant refactoring, low risk)
Risk Must maintain 100% backward compat
Benefit Easier testing, onboarding, feature addition
Done Criteria
  • runnables/base.py split into 3–4 focused modules
  • Each module <1,500 lines
  • All tests pass
  • Public API unchanged
Theme 2: Circular dependency (Callbacks ↔ Runnables)
Target State

Define minimal Event protocol. Runnables emit events; callbacks subscribe via registry. Config becomes optional metadata.

Done Criteria
  • No circular imports between runnables and callbacks
  • Custom callbacks can be added without modifying Runnable
  • All tests pass
Theme 3: Duplicated block translator logic
Target State

Extract common patterns (content parsing, tool call conversion) into base class. Each provider implements only overrides.

Done Criteria
  • Common base class with shared utilities
  • Each translator <600 lines
  • 100% test coverage maintained
Theme 4: Incomplete implementations (30+ TODOs)
Target State

Each TODO resolved: either implement, document with ticket, or remove with rationale.

Done Criteria
  • 0 unjustified TODOs in core/
  • Each TODO has GitHub issue link or inline explanation
  • Contributors know which features are incomplete
Measurable Success Metrics
Dimension Current Target
Largest file size 6,574 lines <2,000 lines
Circular imports 3–5 major cycles 0
Unjustified TODOs 30+ 0
Type coverage 100% 100% (maintain)
Test coverage Good Excellent (async parity)
Quick Wins ⚑

High impact, low effort (S = <2 hours).

Remove unused imports

Run `ruff check --select F401` and remove unreachable code. ~2 hours

Document SSRF protection in README

Add 1 paragraph explaining SSRF protection, link to _security/_policy.py. ~30 minutes

Create "architecture" section in CLAUDE.md

Add ASCII diagram of module relationships. ~1 hour

Consolidate repeated type aliases

Create types.py; Input, Output, Callbacks are redefined in multiple files. ~1.5 hours

Add pre-commit hooks

Add .pre-commit-config.yaml for ruff, mypy, pytest. ~1–2 hours

Milestone 0 β€” Safety Net

Establish baseline and safety mechanisms before refactoring.

Task 0.1: Snapshot test coverage and performance β–Ό
Generate baseline metrics
S (1–2h) Risk: Low

Run full test suite locally; record coverage, execution time, memory usage.

Acceptance: Coverage report generated, baseline metrics stored, regression detection enabled.

Task 0.2: Set up pre-commit hooks β–Ό
Add ruff, mypy, pytest hooks
S (1–2h) Risk: Low

Add .pre-commit-config.yaml for linting, formatting, type checking, unit tests.

Acceptance: Hooks run on commit, fail on issues, developers can skip with --no-verify.

Milestone 1 β€” Critical Fixes
Task 1.1: Audit pickle usage β–Ό
Ensure no unsafe pickle patterns
M (2–4h) Risk: Low

Find all 57 pickle references; ensure none use untrusted input. Document findings.

Acceptance: Report lists each pickle call, no unsafe patterns, if found remediate or file ticket.

Task 1.2: Resolve unjustified TODOs β–Ό
Fix, document, or remove 30+ TODOs
M (4–6h) Risk: Medium

For each TODO, implement, add GitHub issue link, or remove with rationale.

Acceptance: Each TODO resolved, 0 unjustified TODOs remain, contributors know status.

Milestone 2 β€” High-Leverage Improvements
⭐ Task 2.1: Extract block translator logic (Top 3 Priority) β–Ό
Refactor 6 message translators to share common base
L (3–4d) Risk: Medium

Implementation Sketch:

  1. Create messages/block_translators/base.py with BaseBlockTranslator
  2. Extract common patterns: content block validation, tool call conversion, null handling, image processing
  3. Each provider inherits from base, overrides only provider-specific logic
  4. Move shared test fixtures to conftest
  5. Add integration tests for new providers

Acceptance: Each translator <600 lines, 0 behavioral changes, new providers can reuse base, coverage maintained.

Pitfalls: Providers have subtle differences; don't over-abstract. Tests must cover all providers.

⭐ Task 2.2: Reduce Runnable size by 50% (Top 3 Priority) β–Ό
Split runnables/base.py into focused modules
L (4–5d) Risk: High

Implementation Sketch:

  1. Identify orthogonal concerns: Execution, Composition, Configuration, Introspection
  2. Extract each into a mixin class (private)
  3. Runnable inherits from mixins (maintains 100% API compat)
  4. Each mixin <800 lines, focused on one concern
  5. Tests organized by concern

Acceptance: runnables/base.py reduced to <1,500 lines, each mixin <800 lines, 0 API changes, import time unchanged, coverage maintained.

Pitfalls: Runnable is used everywhere; high chance of subtle breakage. Don't introduce new public methods. Circular references between mixinsβ€”design carefully.

⭐ Task 2.3: Simplify callback/runnable coupling (Top 3 Priority) β–Ό
Decouple callbacks from runnables via event bus
M (2–3d) Risk: High

Implementation Sketch:

  1. Define Event protocol (minimal, core only)
  2. Create EventBus class (simple pub-sub)
  3. Refactor CallbackManager to emit events instead of tight coupling
  4. Runnables emit events, don't know listeners
  5. Backward compat layer: wrap old callbacks as event listeners

Acceptance: No circular imports, existing API unchanged, new callbacks don't modify Runnable, all tests pass, event bus well-tested.

Pitfalls: Event bus must be thread-safe and async-safe. Maintain backward compat strictly.

Milestone 3 β€” Quality & Polish
Task 3.1: Expand async test coverage β–Ό
Add stress tests for async paths
M (2–3d) Risk: Low

Add concurrency, context var isolation, cancellation, and high-concurrency tests.

Acceptance: 20+ new async tests, no flaky tests, async coverage matches sync.

Task 3.2: Extract error handling patterns β–Ό
Consolidate error logic from multiple modules
M (2–3d) Risk: Medium

Create utils/error_handling.py; remove duplicated patterns from language_models, runnables, tools.

Acceptance: ~100 lines saved, all tests pass, error handling centralized.

Task 3.3: Document architecture β–Ό
Write architecture guide and design docs
M (2–3d) Risk: Low

Document module relationships, extension points, anti-patterns. Create ASCII architecture diagram.

Acceptance: ARCHITECTURE.md with diagram, 1-para overviews per module, "how to extend" guides.

Task 3.4: Add module docstrings β–Ό
Explain responsibilities of each module
S (1–2d) Risk: Low

Add 2–3 sentence docstrings to __init__.py and main modules explaining their role.

Acceptance: Each module has docstring, public/private distinction clear, no docstring >5 sentences.

Prioritized Roadmap
Immediate (Next Sprint)
  • Task 0.1: Snapshot metrics
  • Task 0.2: Pre-commit hooks
  • Task 1.1: Audit pickle
  • Task 1.2: Resolve TODOs
  • Quick wins (5 Γ— S tasks)
Short-term (2–3 sprints)
  • Task 2.1: Extract block translator logic ⭐
  • Task 2.2: Reduce Runnable size ⭐
  • Task 3.1: Expand async tests
Medium-term (4–6 sprints)
  • Task 2.3: Simplify callback coupling ⭐
  • Task 3.2: Extract error handling
  • Task 3.3: Document architecture
  • Task 3.4: Add module docstrings