Technical Audit Report  ·  Claude Sonnet 4.6  ·  2026-06-12
Overall grade: B+

LangChain Monorepo Audit

langchain-ai/langchain  |  Python monorepo  |  Production maturity

The LangChain monorepo is a mature, production-grade Python framework supporting millions of developers. The codebase demonstrates strong engineering discipline: comprehensive CI/CD with per-package change detection, strict ruff (ALL rules), mypy strict mode across all packages, production-quality SSRF protection baked into core, an allowlist-based deserialization model with documented threat modeling, and an extensive test suite covering unit, integration, VCR-cassette, and benchmark levels.

The grade B+ reflects excellent security intent, comprehensive CI, and a strong typing commitment. These are offset by a ~990-line god-function as the primary agent API entry point, suppressed mypy strictness at critical boundaries, an unsafe deserialization default whose breaking-change fix has not yet shipped, and a pre-commit gap for the most actively developed package.

Top 3 Risks

1. create_agent() God-Function (~990 lines): A single point of failure for the primary agent API. Its sub-behaviors cannot be audited or tested in isolation, and every agent feature change requires reading the entire function.
2. Deserialization Default 'core' Still Active: load()/loads() called without allowed_objects silently fall back to 'core'. On untrusted payloads this permits SSRF through a deserialized base_url. The only signal is a pending deprecation warning, which does not block the call.
3. HostExecutionPolicy Has No Sandboxing: The shell middleware's default execution policy provides no syscall, filesystem, or network isolation, so an agent using this policy has full host access.

Top 3 Opportunities

1. Decompose create_agent: Extract focused builder sub-functions. This improves testability and readability and sets the architectural pattern for future agent API evolution.
2. Enforce a Safe Deserialization Default: Change the loads()/load() default to 'messages', or raise immediately when allowed_objects is omitted. Callers who never pass it are then no longer exposed to the deserialization SSRF risk.
3. Add Coverage Gates and Pre-commit Parity: Gate CI on coverage thresholds and add langchain_v1 to the pre-commit hooks to prevent regressions in the most actively developed package.

Health Scorecard

Dimension | Rating | Key Finding
Architecture & Design | ⚠ High Risk | create_agent god-function; duplicate async/sync composition code
Code Quality | Medium | Suppressed mypy rules; 116 TODOs; swallowed exceptions in error metadata
Security | Medium | Unsafe deserialization default (pending warning only); HostExecutionPolicy has no sandboxing
Testing | Medium | No coverage gates; all ruff rules suppressed on test_react_agent.py
Performance | Low | Linear IP scan in hot path; non-thread-safe cache
Dependencies | Low | CVE constraints only via uv; legacy SQLAlchemy 1.4 range in classic
DevEx & Operations | Medium | langchain_v1 missing from pre-commit; manual release process
Documentation | Medium | README quickstart uses non-existent model gpt-5.4
CI/CD | Strong | Change-detection matrix; SHA-pinned Actions; Pydantic compat matrix

Repository Map

Purpose, tech stack, architecture, and key directories β€” based on evidence from source files, not assumptions.

Purpose

LangChain is an open-source Python framework for building agents and LLM-powered applications. It provides standard model interfaces (BaseChatModel, BaseLanguageModel), composable chains via the LCEL Runnable protocol, a middleware-based agent framework built on LangGraph, document loaders, vector store abstractions, memory/history primitives, and a large ecosystem of partner integrations.

Target users: Application developers, ML engineers, AI product teams. Maturity: Production (Development Status :: 5 - Production/Stable in all pyproject.toml classifiers).

Tech Stack

Language: Python 3.10–3.14
Build System: hatchling + uv workspace
Linting: ruff (ALL rules)
Type Checking: mypy strict + pydantic.mypy
Testing: pytest + asyncio + syrupy + blockbuster + VCR
Agent Graphs: langgraph (StateGraph, ToolNode)
Tracing: langsmith (integrated at call boundaries)
Async: asyncio + run_in_executor bridging
CI: GitHub Actions (change-detection matrix)
Validation: pydantic v2
Pre-commit: ruff, texthooks, YAML/TOML validation
Serialization: Custom Serializable + Reviver allowlist

Architectural Sketch

langchain/
├── libs/core/                     langchain_core: base abstractions, Runnable protocol
│   ├── _api/                      deprecation/beta decorator machinery
│   ├── _security/                 SSRF protection, URL validation (production-quality)
│   ├── callbacks/                 callback manager (sync + async, ~2000 lines)
│   ├── language_models/           BaseChatModel, BaseLLM, model_profile
│   ├── messages/                  AIMessage, HumanMessage + block_translators/
│   ├── load/                      serialization allowlist (Reviver, load/loads)
│   ├── runnables/                 LCEL Runnable, RunnableConfig, compose primitives
│   └── tools/                     BaseTool, structured tools
├── libs/langchain_v1/             langchain (v1, ACTIVE): high-level public API
│   └── langchain/
│       ├── agents/                create_agent(), middleware framework
│       │   ├── factory.py         ← primary entry point, 1,886 lines ⚠
│       │   └── middleware/        before/after hooks + 13 implementations
│       └── chat_models/           init_chat_model()
├── libs/langchain/                langchain-classic (LEGACY, no new features)
├── libs/standard-tests/           shared integration test suite
├── libs/model-profiles/           model capability data + CLI
├── libs/text-splitters/           document chunking
└── libs/partners/                 openai, anthropic, ollama, groq, mistralai, …

Key Directories

Directory | Description
libs/core/langchain_core/_security/ | SSRF policy and DNS-aware URL validation with NAT64 support (production-quality)
libs/core/langchain_core/load/ | Allowlist-based deserialization with an injection-escape mechanism and a documented threat model
libs/core/langchain_core/callbacks/manager.py | Callback dispatch; large file (~2,000 lines) and the primary async/sync bridge
libs/langchain_v1/langchain/agents/factory.py | create_agent(), the primary agent builder, in a 1,886-line module ⚠
libs/langchain_v1/langchain/agents/middleware/ | 13 production middleware implementations (PII, shell, summarization, HITL, retry…)
libs/langchain_v1/tests/unit_tests/agents/ | ~70 unit test files covering agent/middleware behavior, composition, and type-checking
.github/workflows/ | 20+ workflow files; change-detection matrix CI with SHA-pinned Actions
libs/partners/ | openai, anthropic, ollama, groq, mistralai, huggingface, deepseek, xai, perplexity, chroma, qdrant, exa, nomic

Surprising Observations

Counter-Intuitive Package Naming

langchain_v1 (directory) publishes the active langchain package. langchain (directory) publishes langchain-classic (frozen). New contributors expecting the "langchain" directory to be the main package will be confused.

recursion_limit Hardcoded to 9,999 factory.py:1664

The agent graph's recursion limit is hardcoded to 9_999, with an explicit comment referencing a LangGraph issue (# https://github.com/langchain-ai/langgraph/issues/7313). This is a live workaround, and create_agent exposes no parameter to change it.

Pre-commit Skips the Active Package .pre-commit-config.yaml

The pre-commit configuration covers core, langchain (classic), and all listed partners. It has no entry at all for libs/langchain_v1/, which is the actively developed langchain package published to PyPI.

Audit Report

Evidence-based findings grouped by dimension. Each finding includes: what was found, where (file:line), why it matters, and severity. Fact vs Judgment clearly labeled.

3.1 Architecture & Design

High God-Function: create_agent(), ~990 Lines (factory.py:697–1685)
Fact The public create_agent function spans roughly 990 lines (697–1685) of a 1,886-line module. It contains 10+ nested function definitions (closures) and multiple graph-construction decision trees. It handles model initialization, middleware composition, structured output configuration, tool validation, graph wiring, recursion-limit overrides, and LangSmith metadata.
Why it matters

Any bug in tool routing, middleware edge construction, or structured output handling requires reading the entire function. It is impossible to unit-test sub-behaviors in isolation. Onboarding a new contributor to this area is extremely slow. Side effects from graph.compile() at the end of a 1,000-line function mean failures surface far from their cause.

Medium Duplicate Sync/Async Composition Logic factory.py:221–401, 585–694
Fact _chain_model_call_handlers (sync, ~90 lines) and _chain_async_model_call_handlers (async, ~80 lines) are structurally identical. The _to_composed_result helper, compose_two inner function, and accumulation pattern appear twice each. Same for tool-call wrapper chains.
Why it matters

Any logic change in one path must be replicated manually in the other. The two paths will drift apart unless every change is mirrored.

Medium Callback Manager as Mega-Class langchain_core/callbacks/manager.py
Judgment The callback manager file (~2,000 estimated lines) contains CallbackManager, AsyncCallbackManager, multiple run-specific managers, context-manager helpers, and the chain-group pattern. While this is an established pattern, it's difficult to navigate and changes here carry a high blast radius.
Low Counter-Intuitive Package Directory Naming libs/langchain/pyproject.toml:6, libs/langchain_v1/pyproject.toml:6
Fact The directory libs/langchain_v1/ publishes the package named langchain. The directory libs/langchain/ publishes the package named langchain-classic. This is an artifact of the migration but creates onboarding friction for new contributors.

3.2 Code Quality

High _to_composed_result Helper Duplicated Across Sync and Async Paths (factory.py:242–258, 333–349)
Fact Identical helper function body (normalization of ModelResponse | AIMessage | ExtendedModelResponse | _ComposedExtendedModelResponse) appears twice with only async/await differences. The tool-call chain composition helpers have the same duplication.
Medium Suppressed mypy Strictness: Any Gaps at Critical Boundaries (libs/core/pyproject.toml:95, libs/langchain/pyproject.toml:149–150, libs/langchain_v1/pyproject.toml:120)
Fact disallow_any_generics = false is set in langchain_core and langchain-classic, and warn_return_any = false in langchain-classic and langchain (libs/langchain_v1). With these overrides, mypy strict mode accepts bare generic annotations such as list or dict, which are implicitly parameterized with Any. It also stays silent when a function declared with a concrete return type returns an Any-typed value.
Why it matters

Type errors at the model/callback boundary can result in runtime AttributeError or silent data loss in production. The callbacks manager and runnable base are the most likely areas to harbor unchecked Any propagation.

Medium Nested Broad except Exception Blocks in Error Metadata Extraction (chat_models.py:114–119)
Fact Nested except Exception blocks silently set metadata["body"] = None on any failure. The outer try/except wraps response.json(), and its handler falls back to response.text inside a second try/except.
try:
    metadata["body"] = response.json()
except Exception:
    try:
        metadata["body"] = getattr(response, "text", None)
    except Exception:
        metadata["body"] = None
Medium 116 TODO/FIXME Comments Across 68 Files
Fact grep for TODO|FIXME|HACK|XXX returned 116 occurrences across 68 files. Notable concentrations in todo.py (middleware), langchain_classic agents, and partner packages.
Low Hardcoded recursion_limit=9_999 β€” No Public Override factory.py:1664
Fact config: RunnableConfig = {"recursion_limit": 9_999} is set with a comment referencing LangGraph issue #7313. Users cannot adjust the recursion limit through create_agent's public API.
Low Stale-Prone FALLBACK_MODELS_WITH_STRUCTURED_OUTPUT List factory.py:152–160
Fact ["grok", "gpt-5", "gpt-4.1", "gpt-4o", "gpt-oss"] is a static fallback list with no update mechanism.

3.3 Security

Medium Deserialization Default allowed_objects=None → 'core', Guarded Only by a Pending Warning (load.py:408–419, 657–668, 793–804)
Fact When allowed_objects=None, all three entry points emit a pending=True deprecation warning and silently default to 'core'. The 'core' allowlist includes chat models with attacker-configurable base_url.
Why it matters

'core' allows deserialization of chat models with attacker-controlled base_url, enabling SSRF. Any caller that hasn't explicitly passed allowed_objects is silently in the unsafe mode. The pending warning is not a blocking error.

Security Context

The deserialization threat model in the module docstring is exemplary. This finding concerns the gap between the documented best practice and the current default behavior.

Medium HostExecutionPolicy: No Isolation Boundary (middleware/_execution.py:27–48)
Fact _launch_subprocess calls subprocess.Popen with the same environment as the Python process. The HostExecutionPolicy class doc states "for trusted, same-host execution" but this trust requirement is easy to violate in production agent deployments.
Why it matters

If the model invokes the shell tool under HostExecutionPolicy, the commands run with full host access. Production deployments that do not explicitly choose DockerExecutionPolicy or CodexSandboxExecutionPolicy get no sandboxing, and the API does not make this visible.

Low Test-Environment URL Bypass in SSRF Validation _ssrf_protection.py:69–74
Fact The bypass is triggered by LANGCHAIN_ENV=local_test with a hostname matching test*server*. Any deployment that accidentally sets this env var disables SSRF protection for matching hostnames.
Low Pre-commit Hooks Not Language-Isolated .pre-commit-config.yaml:22–131
Fact All per-package hooks use language: system. Contributors without the correct packages installed will get hook failures or silently-skipped linting.

3.4 Testing

Medium No Coverage Gates in CI
Fact The CI workflows run pytest but enforce no minimum coverage threshold. pyproject.toml specifies [tool.coverage.run] but no [tool.coverage.report] fail_under.
Why it matters

Coverage can regress silently on new features. The middleware implementations have good breadth, but there is no CI gate to detect future regressions.

Medium test_react_agent.py β€” All Ruff Rules Disabled libs/langchain_v1/pyproject.toml:168
Fact "tests/unit_tests/agents/test_react_agent.py" = ["ALL"] disables the entire ruff ruleset for a test file that covers a core public API.
Low VCR Cassettes Not Validated for Staleness
Judgment VCR cassettes can become stale as API schemas evolve. There is no mechanism to detect or alert when cassettes might no longer match live API behavior.

3.5 Performance

Medium _ip_in_blocked_networks: Linear Scan on Every URL Validation (_policy.py:138–183)
Fact The code's own comment at lines 143–144 notes the potential need for memoisation. The current implementation iterates through all 15 IPv4 and 8 IPv6 blocked networks for every IP check.
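The memoisation that comment alludes to could look like the sketch below. It keys the check by the IP string and uses functools.lru_cache with a bound, so attacker-supplied addresses cannot grow the cache without limit. The network list is an illustrative subset, not the 15 IPv4 and 8 IPv6 networks in _policy.py, and the function name is hypothetical.

```python
import functools
import ipaddress

# Illustrative subset only; the real _policy.py list is not reproduced here.
_BLOCKED_V4 = tuple(
    ipaddress.ip_network(net)
    for net in ("127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "169.254.0.0/16")
)
_BLOCKED_V6 = tuple(ipaddress.ip_network(net) for net in ("::1/128", "fc00::/7", "fe80::/10"))


@functools.lru_cache(maxsize=4096)
def ip_in_blocked_networks(ip: str) -> bool:
    """Return True if ip falls in a blocked range; results are cached per string."""
    addr = ipaddress.ip_address(ip)
    if isinstance(addr, ipaddress.IPv6Address):
        mapped = addr.ipv4_mapped
        if mapped is None:
            return any(addr in net for net in _BLOCKED_V6)
        # An IPv4-mapped address (::ffff:a.b.c.d) is checked against the IPv4 list.
        return any(mapped in net for net in _BLOCKED_V4)
    return any(addr in net for net in _BLOCKED_V4)
```

With only about 23 networks, each scan is already cheap. The cache mainly pays off when the same hosts are validated repeatedly. functools.lru_cache is documented as safe to use from multiple threads, which matters for a check on a shared hot path.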
Low _default_class_paths_cache: Non-Thread-Safe Module-Level Dict (load.py:184)
Fact Module-level mutable dict used as a cache. Population is not protected by a lock. Race conditions during initialization are theoretically possible.

3.6 Dependencies

Medium CVE Constraints Only Enforced via uv libs/core/pyproject.toml:82, libs/langchain_v1/pyproject.toml:96
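Fact pygments>=2.20.0 (annotated # CVE-2026-4539) and urllib3>=2.6.3 are declared only as uv constraint-dependencies. That setting lives under [tool.uv] and is honored only by uv. It is not part of the published package metadata, so users who install with plain pip do not receive these CVE floors.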
Low langchain-classic Allows SQLAlchemy 1.4 (EOL) libs/langchain/pyproject.toml:31
Fact "SQLAlchemy>=1.4.0,<3.0.0" in the legacy package allows SQLAlchemy 1.4 which reached end-of-life in 2023.

3.7 Developer Experience & Operations

Medium langchain_v1 Missing from Pre-commit .pre-commit-config.yaml
Fact The pre-commit config covers core, langchain (classic), standard-tests, text-splitters, and partner packages, but NOT langchain_v1 (the active langchain package). Contributors modifying libs/langchain_v1/ get no local format/lint hooks on commit.
Low Manual Release Process via workflow_dispatch .github/workflows/_release.yml
Fact Releases are manually triggered. No automated release-on-tag or version-bump automation.

3.8 Documentation

Medium README Quickstart References Non-Existent Model gpt-5.4 README.md:40
Fact model = init_chat_model("openai:gpt-5.4") β€” gpt-5.4 is not a valid OpenAI model identifier. New users following the quickstart will hit an immediate error.
Low create_agent Docstring Uses Date-Specific Model ID factory.py:843
Fact model="anthropic:claude-sonnet-4-5-20250929" β€” a date-suffixed model ID that will become outdated.

Strengths

Excellent Security Engineering in langchain_core._security

The _security module (SSRF policy, DNS-aware validation, NAT64 handling, cloud metadata endpoint blocking) is production-quality and comprehensively tested. The threat model in load.py is one of the most thoroughly documented deserialization threat models in any open-source Python project.

Strong CI Infrastructure

Change-detection matrix CI runs only affected packages. Pre-commit enforces format+lint locally. VCR cassettes enable integration test replay without API keys. Pydantic version compatibility matrix. All Actions pinned to full commit SHAs (supply-chain protection).

Thorough Type Coverage Commitment

mypy strict is enforced on all packages with the pydantic.mypy plugin, and ruff selects ALL rules with a small, documented ignore list. Apart from the exceptions noted in 3.2 and 3.4, suppressions are few and explicitly commented.

Comprehensive Middleware Test Suite

~40 test files covering every middleware implementation, including sync/async variants, composition, state update, edge cases, and type-checking tests.

Excellent Deprecation Machinery

The _api/ module provides @deprecated, @beta, LangChainDeprecationWarning, and LangChainBetaWarning with stacklevel control used consistently throughout.

Allowlist-Based Deserialization with Injection-Escape

The Reviver class + _block_jinja2_templates + _is_escaped_dict provides layered defense against deserialization attacks. The threat model is clearly documented.

blockbuster in Test Dependencies

Inclusion of blockbuster (detects blocking calls in async paths) in test deps shows active attention to async correctness β€” a common and hard-to-detect class of bugs.

Improvement Strategy

Five themes synthesizing the audit findings into a strategic approach with target states, principles, and explicit trade-offs.

Theme 1: Decompose the God-Function

Root cause: create_agent() accumulated all agent-building logic incrementally over time. Each new feature (middleware composition, structured output, recursion limit override, LangSmith metadata) was appended rather than extracted.

Target state: create_agent becomes a thin orchestrator (~100 lines) that calls focused builder functions: _build_middleware_stack(), _build_tool_node(), _build_graph_edges(), _build_model_node(). Each builder is independently testable and documented.

Principles: Single Responsibility, Testability First, Progressive Decomposition.

Trade-off: Re-architecting the entire agent framework or changing the public API signature is explicitly NOT in scope. It is too risky with active users and offers no behavioral benefit.
Success criteria: create_agent body ≤ 200 lines. Each extracted builder has dedicated unit tests. All existing agent tests pass unchanged.

Theme 2: Eliminate Async/Sync Code Duplication

Root cause: Sync code cannot await, so Python async variants are usually written as separate functions that mirror their sync counterparts. The composition helpers were written independently rather than around a shared backbone.

Target state: A generic _chain_handlers_generic(handlers, make_inner_caller) that works for both sync and async by parameterizing the execution behavior. Four functions reduce to two or one.

Principles: DRY, reduce surface area for divergence.

Trade-off: Full async abstraction of all middleware paths is too invasive for a behavior-neutral refactor, so the scope is limited to the composition helpers.
Success criteria: Duplicate composition helpers consolidated. No behavioral change detectable by existing tests.
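Below is a minimal sketch of such a backbone. It assumes every handler has the shape handler(request, call_next), and that async handlers await whatever call_next returns. All names here (chain_handlers, with_normalize_sync, with_normalize_async) are hypothetical and are not the factory.py API. The sketch relies on one observation: composing two handlers only builds closures, so the same code serves both flavors. Only the step that post-processes the final result needs separate sync and async variants; that is the role _to_composed_result plays today.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence
from functools import reduce
from typing import Any

# Hypothetical handler shape: handler(request, call_next) -> result. An async
# handler returns a coroutine and awaits call_next(...) internally.
Handler = Callable[[Any, Callable[[Any], Any]], Any]


def _compose_two(outer: Handler, inner: Handler) -> Handler:
    # Shared by both flavors: for async handlers, inner(...) returns a
    # coroutine, which the async outer handler awaits via call_next.
    def composed(request: Any, call_next: Callable[[Any], Any]) -> Any:
        return outer(request, lambda req: inner(req, call_next))

    return composed


def chain_handlers(handlers: Sequence[Handler]) -> Handler | None:
    """Compose handlers so that handlers[0] is the outermost layer."""
    if not handlers:
        return None
    return reduce(_compose_two, handlers)


def with_normalize_sync(chain: Handler, normalize: Callable[[Any], Any]) -> Handler:
    def run(request: Any, call_next: Callable[[Any], Any]) -> Any:
        return normalize(chain(request, call_next))

    return run


def with_normalize_async(chain: Handler, normalize: Callable[[Any], Any]) -> Handler:
    async def run(request: Any, call_next: Callable[[Any], Any]) -> Any:
        return normalize(await chain(request, call_next))

    return run
```

With sync handlers a and b, chain_handlers([a, b])(request, base) runs a around b around base. With async handlers, the same call returns a coroutine for the caller to await.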

Theme 3: Harden Deserialization Defaults

Root cause: Backward compatibility prevented an immediate switch from the implicit None → 'core' behavior to a safe default. The pending=True warning was meant as a stepping stone but was never promoted to a breaking change.

Target state: load()/loads() default to 'messages' (or raise ValueError) when allowed_objects is not specified. No more pending warning β€” the behavior change is explicit and documented.

Principles: Secure by Default, Fail Fast.

Trade-off: This is a breaking change for code relying on the implicit 'core' default. It requires a major version bump or a deprecation timeline announcement. Partner packages that call load() internally must be updated first.
Success criteria: No warn_deprecated call is reachable via allowed_objects=None. Callers must pass an explicit value.

Theme 4: Enforce Coverage Gates

Root cause: The project has a strong testing culture but no enforcement mechanism to prevent coverage regressions.

Target state: CI fails if coverage drops below 80% for langchain_core and 70% for langchain_v1. Thresholds are documented and reviewed quarterly.

Principles: Automate Quality Gates, Trust but Verify.

Trade-off: Coverage numbers can be misleading with complex mocks. Set conservative thresholds initially and adjust them upward once baselines are established. Do not let the gate become a "gaming" target.
Success criteria: fail_under set in coverage config. The CI step fails and reports which modules fell below the threshold.
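coverage.py's fail_under applies to the total percentage only, so reporting which modules fell below the threshold needs an extra step. The sketch below assumes CI writes a Cobertura-style coverage.xml, as pytest --cov-report=xml does. The function names and the per-file threshold policy are hypothetical.

```python
import xml.etree.ElementTree as ET


def files_below_threshold(xml_text: str, threshold: float) -> list[tuple[str, float]]:
    """Return (filename, percent) for each file whose line coverage < threshold."""
    root = ET.fromstring(xml_text)
    below: list[tuple[str, float]] = []
    # Cobertura layout: coverage/packages/package/classes/class, where each
    # <class> element describes one source file with a line-rate in [0, 1].
    for cls in root.iter("class"):
        percent = float(cls.get("line-rate", "0")) * 100
        if percent < threshold:
            below.append((cls.get("filename", "?"), round(percent, 1)))
    # Worst-covered files first, so the CI log leads with the biggest gaps.
    return sorted(below, key=lambda item: item[1])


def total_percent(xml_text: str) -> float:
    """Overall line coverage, the number fail_under itself compares against."""
    return float(ET.fromstring(xml_text).get("line-rate", "0")) * 100
```

A CI step could print the files_below_threshold list whenever coverage report --fail-under exits non-zero. That way the failure message names the modules responsible.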

Theme 5: Pre-commit Parity for langchain_v1

Root cause: The pre-commit config predates langchain_v1's emergence as the active package or was simply not updated.

Target state: langchain_v1 has a format and lint pre-commit hook identical to other packages, plus a version-consistency check.

Principles: Automate Consistency, Shift Left.

Trade-off: Pre-commit hooks are language: system, so contributors must have the right venv. This limitation exists for all packages and is documented in CLAUDE.md.
Success criteria: The pre-commit config includes a langchain-v1 entry. git commit on any libs/langchain_v1/ file triggers format+lint.

What NOT to Fix Now

Problem | Why Defer
Refactor CallbackManager mega-file | High blast radius; not a correctness issue; would require extensive regression testing across all callback paths
Enable warn_return_any in langchain_v1 | Would reveal many existing type gaps; best done after the god-function decomposition stabilizes the codebase
Replace VCR cassettes with full integration tests | Would require API keys in CI and increase flakiness; cassettes are a reasonable trade-off for a framework this size
Automated release pipeline | Manual releases provide a natural human review gate; automation adds complexity without clear ROI for this release cadence
Migrate langchain-classic to langchain_v1 patterns | langchain-classic is frozen; migration effort exceeds value for a deprecated package

Task Plan

Quick Wins: High-impact, S-effort tasks that can be done immediately

QW-1  Add langchain_v1 pre-commit hook (Effort: S · Risk: Low)

A single YAML stanza in .pre-commit-config.yaml gives the active package the same local format/lint hooks as the other packages.
QW-2  Fix README quickstart model reference (Effort: S · Risk: None)

Replace "openai:gpt-5.4" with a current, valid model ID. This is an immediate fix for the new-user experience.
QW-3  Add recursion_limit param to create_agent (Effort: S · Risk: Low)

Add a keyword-only recursion_limit: int = 9_999 argument to create_agent and use it in place of the hardcoded value.
QW-4  Add fail_under = 70 to coverage config (Effort: S · Risk: Low)

Add [tool.coverage.report] fail_under = 70 to the langchain_v1 pyproject.toml once a baseline has been established.
QW-5  Promote loads()/load() warning from pending to active (Effort: S · Risk: Medium)

Change pending=True to pending=False in the three warn_deprecated calls in load.py. Callers then see a regular deprecation warning instead of a pending one.

Milestone 0: Safety Net

Things that must be in place before refactoring can proceed safely: tests on key paths, CI gates, and baselines.
M0-1  Add Snapshot Tests for create_agent Graph Structure (Effort: M · Risk: Low)

Add tests that serialize the compiled graph structure (edges, nodes, order) to syrupy snapshots for common configurations (no middleware, one middleware, structured output). These snapshots become the regression baseline before any refactoring of create_agent.

Affected Files

libs/langchain_v1/tests/unit_tests/agents/test_create_agent_graph_structure.py (new)

Acceptance Criteria
  • Snapshots exist and pass in CI for 5 representative configurations.
  • Any change to graph structure (node names, edge routing) fails the snapshot comparison until it is reviewed and the snapshot is explicitly regenerated (syrupy's --snapshot-update).
  • Snapshots are committed to the repository.
Dependencies
None
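One way to keep these snapshots stable is to reduce the graph to sorted, plain data before handing it to syrupy. The sketch assumes the object passed in exposes nodes (a mapping keyed by node id) and edges (items with source, target, and conditional attributes). That matches the Graph object returned by get_graph() on a compiled LangGraph graph, but verify it against the installed version. The helper name graph_signature is hypothetical, and the test uses stand-in objects rather than a real agent.

```python
from typing import Any


def graph_signature(graph: Any) -> dict[str, list[Any]]:
    """Reduce a graph to sorted plain data suitable for a snapshot assertion."""
    nodes = sorted(str(node_id) for node_id in graph.nodes)
    edges = sorted(
        (str(edge.source), str(edge.target), bool(edge.conditional))
        for edge in graph.edges
    )
    # Lists rather than tuples keep the snapshot JSON-friendly and order-stable.
    return {"nodes": nodes, "edges": [list(edge) for edge in edges]}
```

Inside a syrupy test, this would be used as assert graph_signature(agent.get_graph()) == snapshot, where snapshot is syrupy's fixture.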
M0-2  Enable Baseline Coverage Reporting in CI (Effort: S · Risk: None)

Add --cov --cov-report=xml to the pytest invocation in the langchain_v1 Makefile. Do not enforce a threshold yet β€” just collect and surface data.

Affected Files

libs/langchain_v1/Makefile

Acceptance Criteria
  • CI uploads a coverage XML artifact.
  • Coverage % visible in PR status checks or comments.
Dependencies
None

Milestone 1: Critical Fixes

Security issues and correctness problems.
M1-1  Promote Deserialization Default to Breaking Change (Effort: M · Risk: Medium)

Change allowed_objects=None behavior from pending-deprecation + fallback to 'core', to either an immediate ValueError or a safe default of 'messages'. Document migration path in release notes.

Affected Files

libs/core/langchain_core/load/load.py (lines 408, 657, 793)

Acceptance Criteria
  • Calling load(data) without allowed_objects raises immediately or defaults safely to 'messages'.
  • No more silent 'core' behavior via omission.
  • All internal callers updated to pass explicit values.
  • Migration guide in CHANGELOG.
Dependencies
None (but coordinate with partner packages)

Implementation Sketch

  1. Grep all callers of load( and loads( without allowed_objects across the entire monorepo.
  2. Update all internal callers to pass explicit allowed_objects='core' or the appropriate restrictive value.
  3. Change the if allowed_objects is None: block in Reviver.__init__, loads(), and load() to raise ValueError with a clear migration message.
  4. Alternatively: default to 'messages' with a one-cycle active (not pending) deprecation warning.
  5. Update docs and changelog.

Pitfall: External partner packages that call load() internally will need updates. Open a tracking issue before merging to coordinate ecosystem-wide.

M1-2  Document and Gate HostExecutionPolicy Usage (Effort: M · Risk: Low)

Add a prominent DANGER docstring to HostExecutionPolicy stating it provides no sandboxing. Add an explicit opt-in parameter (acknowledge_unsafe_execution: bool = False) that must be set to True; raise ValueError or emit LangChainBetaWarning if not set.

Affected Files

libs/langchain_v1/langchain/agents/middleware/_execution.py

Acceptance Criteria
  • Using HostExecutionPolicy() without an explicit opt-in kwarg emits a warning.
  • Docstring clearly explains the sandboxing alternatives (DockerExecutionPolicy, CodexSandboxExecutionPolicy).
  • All existing tests updated to pass the opt-in kwarg.
Dependencies
None
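Below is a minimal sketch of the opt-in gate, written against a stand-in class because the real HostExecutionPolicy constructor is not reproduced in this report. The class name, the UnsafeExecutionWarning category, and the choice to warn rather than raise are all assumptions; M1-2 leaves both options open.

```python
import warnings


class UnsafeExecutionWarning(UserWarning):
    """Hypothetical warning category for unsandboxed execution."""


class HostPolicySketch:
    """DANGER: commands run directly on the host with no sandboxing.

    For untrusted input, prefer a container- or sandbox-backed policy (this
    report names DockerExecutionPolicy and CodexSandboxExecutionPolicy).
    """

    def __init__(self, *, acknowledge_unsafe_execution: bool = False) -> None:
        if not acknowledge_unsafe_execution:
            warnings.warn(
                "HostPolicySketch provides no sandboxing: shell commands run with "
                "the host process's environment and privileges. Pass "
                "acknowledge_unsafe_execution=True to confirm this is intended.",
                UnsafeExecutionWarning,
                stacklevel=2,
            )
        self.acknowledge_unsafe_execution = acknowledge_unsafe_execution
```

A UserWarning subclass is displayed under Python's default warning filters, unlike PendingDeprecationWarning. stacklevel=2 attributes the warning to the caller that constructed the policy.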
M1-3  Fix README Quickstart Model Reference (Effort: S · Risk: None)

Replace "openai:gpt-5.4" with a current, valid OpenAI model identifier. Also update factory.py:843 docstring example model.

Affected Files

README.md:40, libs/langchain_v1/langchain/agents/factory.py:843

Acceptance Criteria
  • The quickstart code example uses a currently valid model ID.
  • Running the example does not produce a "model not found" error.
Dependencies
None

Milestone 2: High-Leverage Improvements

Changes that make subsequent work easier and improve long-term maintainability.
M2-1  Extract Builder Sub-Functions from create_agent (Effort: L · Risk: High)

Decompose factory.py:create_agent by extracting: _compose_middleware_stack(), _setup_tool_node(), _configure_structured_output(), _wire_graph_edges(). Keep create_agent as the public entry point.

Affected Files

libs/langchain_v1/langchain/agents/factory.py

Acceptance Criteria
  • create_agent body ≤ 200 lines.
  • Each extracted builder function has dedicated unit tests.
  • All existing agent tests (graph structure snapshots from M0-1 + behavior tests) pass unchanged.
  • Public API signature of create_agent is unchanged.
Dependencies
M0-1 (snapshots as regression guard)

Implementation Sketch

  1. Start with _setup_tool_node(tools, middleware, ...), the most self-contained block (~lines 939–966).
  2. Extract _compose_middleware_stack(middleware, ...) for the 6 middleware filter lists and composition chains (lines 972–1035).
  3. Extract _configure_structured_output(response_format, model, tools) for the strategy derivation and structured_output_tools dict (lines 868–893).
  4. Extract _wire_graph_edges(graph, ...) for the conditional edge construction (lines 1503–1660).
  5. Keep inner closures (model_node, _execute_model_sync, amodel_node) inside create_agent initially β€” they close over too many outer variables.

Pitfall: structured_output_tools dict is shared between setup and inner closures. Must be returned from the builder and passed explicitly. Identify all free variables in each inner function before extracting.

M2-2  Deduplicate Sync/Async Composition Helpers (Effort: M · Risk: Medium)

Create a shared backbone for _chain_model_call_handlers / _chain_async_model_call_handlers and _chain_tool_call_wrappers / _chain_async_tool_call_wrappers.

Affected Files

libs/langchain_v1/langchain/agents/factory.py:221–694

Acceptance Criteria
  • Duplicate composition logic consolidated into shared helpers.
  • All composition tests pass with identical semantics.
  • No behavioral change detectable by existing tests.
Dependencies
M0-1
M2-3  Add langchain_v1 Pre-commit Hook (Effort: S · Risk: Low)

Add a hook entry to .pre-commit-config.yaml for libs/langchain_v1/ that runs make -C libs/langchain_v1 format lint. Also add a version-consistency check.

Affected Files

.pre-commit-config.yaml

Acceptance Criteria
  • pre-commit run --all-files includes langchain_v1 format+lint.
  • Contributors modifying libs/langchain_v1/ get local hook feedback on commit.
Dependencies
None

Implementation Sketch

- id: langchain-v1
  name: format and lint langchain_v1
  language: system
  entry: make -C libs/langchain_v1 format lint
  files: ^libs/langchain_v1/
  pass_filenames: false
- id: langchain-v1-version
  name: check langchain_v1 version consistency
  language: system
  entry: make -C libs/langchain_v1 check_version
  files: ^libs/langchain_v1/(pyproject\.toml|langchain/__init__\.py)$
  pass_filenames: false
M2-4  Add Coverage Gates to CI (Effort: M · Risk: Low)

Add fail_under = 70 to the langchain_v1 coverage config and fail_under = 80 to langchain_core. Update CI to report and enforce the thresholds.

Affected Files

libs/langchain_v1/pyproject.toml, libs/core/pyproject.toml, libs/langchain_v1/Makefile, libs/core/Makefile

Acceptance Criteria
  • CI fails if coverage drops below threshold.
  • make test also reports coverage summary.
  • Thresholds documented in a comment explaining the rationale.
Dependencies
M0-2 (establish baseline first)
M2-5  Enable disallow_any_generics in langchain_core (Effort: L · Risk: Medium)

Remove disallow_any_generics = false from langchain_core's pyproject.toml and fix all resulting mypy errors.

Affected Files

libs/core/pyproject.toml:95, various langchain_core source files

Acceptance Criteria
  • make lint passes with disallow_any_generics = true.
  • No new Any-typed generics introduced.
Dependencies
None (but start with langchain_core before tackling langchain-classic)

Milestone 3: Quality & Polish

Remaining medium- and low-priority items worth addressing.
M3-1  Thread-Safe _default_class_paths_cache (Effort: S · Risk: Very Low)

Add a threading.Lock() around the _default_class_paths_cache population in _get_default_allowed_class_paths.

Affected Files

libs/core/langchain_core/load/load.py:187–217

Acceptance Criteria
  • Cache population is protected by a lock. No functional change in single-threaded use.
Dependencies
None
M3-2  Fix test_react_agent.py: Remove ["ALL"] Ruff Suppression (Effort: M · Risk: Low)

Re-enable ruff linting on tests/unit_tests/agents/test_react_agent.py and fix any violations found.

Affected Files

libs/langchain_v1/pyproject.toml:168, libs/langchain_v1/tests/unit_tests/agents/test_react_agent.py

Acceptance Criteria
  • File passes ruff with at most specific per-file-ignores (not ["ALL"]).
M3-3  Update or Document FALLBACK_MODELS_WITH_STRUCTURED_OUTPUT (Effort: S · Risk: Low)

Either document the list as intentionally curated with a review process, or replace with model profile lookups.

Affected Files

libs/langchain_v1/langchain/agents/factory.py:152–160

Acceptance Criteria
  • The fallback list has a clear maintenance strategy documented in a comment.
M3-4  Expose recursion_limit in create_agent Public API (Effort: S · Risk: Low)

Add recursion_limit: int = 9_999 as a keyword-only argument to create_agent. Use it instead of the hardcoded value.

Affected Files

libs/langchain_v1/langchain/agents/factory.py:697 (signature), :1664 (usage)

Acceptance Criteria
  • Users can pass recursion_limit=25 to create_agent.
  • LangGraph issue reference preserved in a comment.
  • Default behavior unchanged.
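Below is a minimal sketch of the signature change. It uses a stand-in function because create_agent's full parameter list is not reproduced here. make_agent_config is hypothetical. It shows the keyword-only parameter, the preserved issue reference, and how the value would flow into the RunnableConfig dictionary that the current code builds with a hardcoded 9_999. The lower-bound check is an added assumption.

```python
from typing import Any

# Workaround for https://github.com/langchain-ai/langgraph/issues/7313; keep
# this reference next to the default when the value becomes configurable.
DEFAULT_RECURSION_LIMIT = 9_999


def make_agent_config(*, recursion_limit: int = DEFAULT_RECURSION_LIMIT) -> dict[str, Any]:
    """Build the config that create_agent currently hardcodes (hypothetical helper)."""
    if recursion_limit < 1:
        raise ValueError(f"recursion_limit must be >= 1, got {recursion_limit}")
    return {"recursion_limit": recursion_limit}
```

The bare * makes recursion_limit keyword-only. Adding it therefore cannot shift the positional arguments that existing callers already pass.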
M3-5  VCR Cassette Staleness Detection (Effort: M · Risk: Low)

Add a CI step that runs cassette-backed tests with record_mode=none. Flag cassettes older than 90 days for review.

Affected Files

.github/workflows/_test_vcr.yml

Acceptance Criteria
  • CI fails if a cassette does not exist when expected in playback mode.
  • A scheduled report lists cassettes older than 90 days.
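For the 90-day report, file modification times are unreliable in CI, because a fresh checkout stamps every file with the checkout time. The last commit date per cassette is a better age signal. The sketch below assumes those dates are collected beforehand, for example with git log -1 --format=%cI -- <path> per cassette, parsed via datetime.fromisoformat. That needs full history (fetch-depth: 0 in actions/checkout); a depth-1 clone reports the same commit for every file. The function name and input shape are hypothetical.

```python
from datetime import datetime, timedelta


def stale_cassettes(
    last_commit: dict[str, datetime], now: datetime, max_age_days: int = 90
) -> list[str]:
    """Return cassette paths last committed more than max_age_days ago, oldest first."""
    cutoff = now - timedelta(days=max_age_days)
    stale = [(when, path) for path, when in last_commit.items() if when < cutoff]
    return [path for _, path in sorted(stale)]
```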