LangChain Monorepo — Technical Audit Report

Scope: langchain/ Python monorepo (langchain-core, langchain (v1), langchain-classic, text-splitters, standard-tests, model-profiles, 16 partner packages).
Method: Evidence-based static review. All file:line references are to the repository tree rooted at the directory containing this file. Where something could not be verified statically (e.g. live coverage %, runtime behavior), it is labeled as such.
Audit date: 2026-06-17.

1. Executive Summary

Overall health grade: A− (strong, mature, production-grade project with a small number of real but bounded security/design risks).

LangChain is a large, actively maintained, MIT-licensed Python monorepo that is the de-facto standard framework for building LLM applications and agents. The engineering culture is unusually disciplined for an OSS project of this size: ruff is configured with select = ["ALL"], mypy runs in strict mode, GitHub Actions are pinned to full commit SHAs, CI is change-scoped for speed, dependency ranges are bounded, and there is a dedicated _security package with SSRF protection and a documented usedforsecurity=False posture around SHA-1. The codebase is well-documented (Google-style docstrings enforced) and has a deep test footprint (167 test files in core, 90 in langchain-v1). The grade is held just below A because of a handful of architectural and security items that matter at this project's scale: an SSRF guard that is inherently vulnerable to DNS-rebinding (time-of-check/time-of-use), an environment-variable-driven validation bypass that is broader than its docstring claims, a host-shell agent tool that defaults to full host access, and several genuine God-files (notably runnables/base.py at 6,574 lines).

Top 3 risks

SSRF protection is TOCTOU-vulnerable (DNS rebinding). validate_safe_url / validate_url resolve DNS at validation time, but the real HTTP request resolves DNS again later — an attacker-controlled DNS record can pass validation then re-point to a private IP. (libs/core/langchain_core/_security/_ssrf_protection.py:86, libs/core/langchain_core/_security/_policy.py:259)
Environment-variable SSRF bypass is broader than documented. _effective_allowed_hosts allows localhost/testserver for any LANGCHAIN_ENV starting with "local", while validate_safe_url's own bypass and docstrings describe a narrower local_test condition. (libs/core/langchain_core/_security/_policy.py:231, _ssrf_protection.py:69)
ShellToolMiddleware defaults to HostExecutionPolicy (full host shell, redaction is post-execution only). Safe defaults matter because agents execute model-chosen commands. (libs/langchain_v1/langchain/agents/middleware/shell_tool.py:503, :565, :538)

Top 3 opportunities

Adopt connection-time IP pinning / a custom transport for SSRF to close the DNS-rebinding gap (a _transport.py already exists — wire validation into the actual socket connect).
Decompose the God-files (runnables/base.py 6,574 lines; callbacks/manager.py 2,792; language_models/chat_models.py 2,714) to improve navigability, review velocity, and import cost.
Tighten the security defaults and make them explicit (opt-in host shell, a narrower env bypass, a documented SHA-1 key_encoder default): high trust impact, low effort.

2. Repository Map (Phase 1)

Purpose & maturity

Purpose: "The agent engineering platform" — a framework for building agents and LLM-powered applications with a standard interface across model providers, embeddings, vector stores, retrievers, and tools.
Intended users: Python application developers building LLM/agent apps; partner integrators.
Maturity: Production library. pyproject.toml classifiers declare Development Status :: 5 - Production/Stable. langchain-core==1.4.3, langchain==1.3.6. (libs/core/pyproject.toml:11, :24; libs/langchain_v1/pyproject.toml:24)

Tech stack

Area	Choice
Language	Python `>=3.10,<4.0` (3.10–3.14 classifiers)
Packaging/build	`uv` workspace + `hatchling` build backend; per-package `pyproject.toml` + `uv.lock`
Core runtime deps (core)	`pydantic>=2.7.4,<3`, `langsmith`, `tenacity`, `jsonpatch`, `PyYAML`, `typing-extensions`, `packaging`, `uuid-utils`, `langchain-protocol`
Agents	`langgraph>=1.2.4,<1.3` (langchain v1 depends on langgraph)
Lint/format	`ruff` (`select = ["ALL"]`)
Types	`mypy` `strict = true`, pydantic mypy plugin
Tests	`pytest`, `pytest-asyncio` (auto), `syrupy` snapshots, `pytest-socket` (no-network enforcement), `pytest-xdist`, `blockbuster`, `pytest-benchmark`/`codspeed`
CI/CD	GitHub Actions (27 workflows), change-scoped matrix, SHA-pinned actions, manual release workflow

Architecture sketch

langchain-protocol (external)        langgraph (external, 1.2.x)
        │                                   │
        ▼                                   ▼
  langchain-core  ──────────────────►  langchain (v1, public)
  (base abstractions:                  (init_chat_model, create_agent,
   Runnables, messages,                 middleware, tools, structured output)
   tools, callbacks,                          │
   _security, indexing)                       │  optional extras
        ▲                                     ▼
        │                       partners/* (openai, anthropic, ollama, groq,
  text-splitters                 mistralai, huggingface, qdrant, chroma, exa,
  standard-tests                 nomic, fireworks, deepseek, openrouter,
  model-profiles                 perplexity, xai)
        │
        └──► langchain-classic (libs/langchain) — legacy, maintenance-only

Key directories (one line each)

Path	Description
`libs/core/langchain_core/`	Base abstractions: Runnables, messages, tools, callbacks, tracers, indexing, `_security`.
`libs/langchain_v1/langchain/`	Actively maintained public `langchain` package: `init_chat_model`, agents, middleware, tools.
`libs/langchain/langchain_classic/`	Legacy `langchain-classic` package (maintenance only, no new features).
`libs/partners/*/`	16 first-party provider integrations, each its own package.
`libs/text-splitters/`	Document chunking utilities.
`libs/standard-tests/`	Shared standardized test suite for partner integrations.
`libs/model-profiles/`	Model capability profile data + `langchain-profiles` CLI.
`.github/workflows/`	27 CI/CD workflows (lint, test, release, labeling, codspeed perf).

What surprised me (positively & otherwise)

A dedicated _security package with a real, policy-driven SSRF implementation (IPv4/IPv6 blocklists, cloud-metadata IPs, NAT64-embedded-IPv4 extraction, k8s .svc.cluster.local blocking). This is far more than most OSS libraries ship. (libs/core/langchain_core/_security/_policy.py)
ruff select = ["ALL"] + mypy strict across the monorepo — an aggressive quality bar that is rare at this scale.
A LANGCHAIN_ENV-based validation bypass baked into the security policy (_policy.py:231) — convenient for tests but a security-relevant surprise.
AGENTS.md and CLAUDE.md are byte-identical (318 lines each) — duplicated guidance rather than one file referencing the other.
The directory layout has a doubled root (langchain/langchain/) and the repo is a shallow git clone (.git/shallow present), so full history-based analysis is not possible here.

3. Audit Report (Phase 2)

Findings are grouped by dimension and sorted by severity. Each is tagged [Fact] (directly verifiable in a file) or [Judgment] (informed assessment).

Security

S1 — SSRF validation is TOCTOU / DNS-rebinding vulnerable — High [Fact + Judgment]

What: validate_safe_url resolves the hostname via socket.getaddrinfo and validates the returned IPs, then returns the URL string. The actual HTTP request happens later in the caller and re-resolves DNS. An attacker controlling the DNS record can return a public IP during validation and a private/metadata IP at fetch time.
Where: libs/core/langchain_core/_security/_ssrf_protection.py:86–98; async equivalent libs/core/langchain_core/_security/_policy.py:259–268.
Why it matters: The function's stated purpose is to "prevent SSRF" (_ssrf_protection.py:49). Validating the URL but not pinning the validated IP at connection time means the guarantee does not hold against an active attacker. Consequences: access to cloud metadata (credentials) and internal services.
Severity: High.
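
The gap can be illustrated with a minimal, self-contained sketch. Everything here is a hypothetical stand-in rather than the repository's code: the stub resolver, the hostname, and the helper names are assumptions made for illustration. The stub returns a public address on the first lookup and the cloud-metadata address on the second, which is the shape of a DNS-rebinding attack.

```python
import ipaddress
from collections.abc import Callable, Iterator


def make_rebinding_resolver(answers: list[str]) -> Callable[[str], str]:
    """Return a stub resolver that yields a different answer on each lookup."""
    remaining: Iterator[str] = iter(answers)

    def resolve(hostname: str) -> str:
        # The stub ignores the hostname and simply replays the scripted answers.
        return next(remaining)

    return resolve


def is_public(ip: str) -> bool:
    """Treat anything that is not globally routable as blocked."""
    return ipaddress.ip_address(ip).is_global


def check_then_fetch(hostname: str, resolve: Callable[[str], str]) -> tuple[bool, str]:
    """Point-in-time validation: check one resolution, connect with another."""
    validated_ok = is_public(resolve(hostname))  # time of check
    connect_ip = resolve(hostname)  # time of use (a second, unvalidated lookup)
    return validated_ok, connect_ip


resolver = make_rebinding_resolver(["8.8.8.8", "169.254.169.254"])
passed_validation, connected_to = check_then_fetch("attacker.example", resolver)
```

In this sketch validation succeeds, yet the address used for the connection is the link-local metadata address that validation would have rejected.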

S2 — Env-driven SSRF bypass is broader than its docstring — Medium [Fact]

What: _effective_allowed_hosts adds localhost and testserver to the allow-list whenever LANGCHAIN_ENV starts with "local" (e.g. local, localdev, local_anything). Separately, validate_safe_url has its own bypass requiring LANGCHAIN_ENV == "local_test" AND hostname test...server.
Where: libs/core/langchain_core/_security/_policy.py:231; libs/core/langchain_core/_security/_ssrf_protection.py:69–74.
Why it matters: Two different bypass conditions for the same subsystem are confusing, and the one in _policy.py is wider than a reader of validate_safe_url would expect. If an environment is misconfigured (or an attacker can influence environment variables), localhost SSRF is silently re-enabled. Neither bypass is described in the public docstring (see DOC2).
Severity: Medium.
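
The difference between the two conditions can be made concrete with a small sketch. The two predicates below are hypothetical reductions of the behavior described above, not the functions in the repository; the additional hostname requirement of the validate_safe_url bypass is omitted for brevity.

```python
def policy_allows_local(env: str) -> bool:
    """Wider condition, as described for _policy.py: any value starting with 'local'."""
    return env.startswith("local")


def validator_allows_test(env: str) -> bool:
    """Narrower condition, as described for validate_safe_url: exact match only."""
    return env == "local_test"


candidates = ["local_test", "local", "localdev", "local_anything", "production"]

# Values for which the two bypass conditions disagree.
mismatched = [
    env for env in candidates if policy_allows_local(env) != validator_allows_test(env)
]
```

Under these assumptions, every value in `mismatched` enables the localhost allowance through one code path while the other path still treats the environment as non-test.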

S3 — ShellToolMiddleware defaults to full host shell access — High [Fact + Judgment]

What: When no execution_policy is supplied, the middleware uses HostExecutionPolicy() — the model can run arbitrary commands on the host. Redaction rules are applied after execution and explicitly "do not prevent exfiltration of secrets" under host policy.
Where: libs/langchain_v1/langchain/agents/middleware/shell_tool.py:503 (class docstring), :565 (default), :538 (warning).
Why it matters: This is opt-out rather than opt-in for the most dangerous capability an agent can have. The risk is partially mitigated by documentation, but a "safe by default" posture (e.g. require an explicit policy, or default to a sandbox when available) is the safer design.
Severity: High by impact. Because the default is an intentional, documented design choice, the rating is partly a Judgment about default selection.

S4 — SHA-1 is the default key_encoder for the indexing API — Low [Fact]

What: index/aindex default key_encoder="sha1". A one-time UserWarning is emitted and usedforsecurity=False is set, but SHA-1 remains the default fingerprint algorithm.
Where: libs/core/langchain_core/indexing/api.py:307, :646, :46, :55–70.
Why it matters: SHA-1 is not collision-resistant; the code itself warns of this. For document de-duplication this is mostly a correctness/robustness concern (deliberate collisions could cause documents to be treated as identical). Defaulting to blake2b/sha256 would be safer, but changing a default is a breaking change, hence the Low rating and the recommendation to document the default rather than change it.
Severity: Low.
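
For reference, the trade-off looks roughly like this with the standard library alone. The `fingerprint` helper is a hypothetical illustration, not the indexing API's key encoder; only the `hashlib` calls are real.

```python
import hashlib


def fingerprint(content: str, algorithm: str = "sha1") -> str:
    """Hash document content for de-duplication (not for security)."""
    data = content.encode("utf-8")
    if algorithm == "sha1":
        # usedforsecurity=False marks the digest as a non-cryptographic use.
        return hashlib.sha1(data, usedforsecurity=False).hexdigest()
    if algorithm in {"sha256", "blake2b"}:
        return hashlib.new(algorithm, data).hexdigest()
    msg = f"unsupported algorithm: {algorithm}"
    raise ValueError(msg)


sha1_key = fingerprint("abc")
sha256_key = fingerprint("abc", "sha256")
blake2b_key = fingerprint("abc", "blake2b")
```

The stronger digests are longer (64 and 128 hex characters versus 40), so a change of default would also change every stored key, which is why the report treats it as a breaking change.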

S5 — Proactive dependency-CVE pinning is present (positive, but note maintenance burden) — Low [Fact]

What: constraint-dependencies pin pygments>=2.20.0 # CVE-2026-4539 (core) and urllib3>=2.6.3, pygments>=2.20.0 (langchain v1).
Where: libs/core/pyproject.toml:82; libs/langchain_v1/pyproject.toml:96.
Why it matters: Demonstrates active CVE tracking. The minor risk is that hand-maintained constraint comments can drift; these belong in a tracked SCA process. Largely a strength.
Severity: Low.

Architecture & Design

A1 — God-file: runnables/base.py at 6,574 lines — Medium [Fact + Judgment]

What: The core Runnable abstraction file is 6,574 lines; callbacks/manager.py 2,792; language_models/chat_models.py 2,714; messages/utils.py 2,400.
Where: libs/core/langchain_core/runnables/base.py (6574 LOC).
Why it matters: Single very large modules raise the cost of review, increase merge-conflict surface, slow IDE/type-checker performance, and inflate import time. Runnable is the most central abstraction, so the blast radius of any change here is large.
Severity: Medium (it is cohesive and stable, so this is partly Judgment).

A2 — init_chat_model provider registry is a hardcoded God-dict — Low [Fact]

What: _BUILTIN_PROVIDERS hardcodes 28 providers with import paths/class names/creator lambdas, plus a parallel _attempt_infer_model_provider prefix table and a docstring list — three sources of the same truth that must be kept in sync.
Where: libs/langchain_v1/langchain/chat_models/base.py:38–100, :521–594, :207–309 (docstring).
Why it matters: Adding/renaming a provider requires editing three places; drift produces confusing inference behavior. Low because it is well-contained and covered by the CLAUDE.md "FOR CONTRIBUTORS" note.
Severity: Low.

A3 — Three coexisting langchain packages (core / v1 / classic) — Low [Judgment]

What: libs/core (langchain-core), libs/langchain_v1 (langchain), libs/langchain (langchain-classic) coexist; CLAUDE.md labels classic "legacy, no new features."
Where: libs/langchain/, libs/langchain_v1/, CLAUDE.md:16–17.
Why it matters: Necessary for a major-version migration, but newcomers can edit the wrong package. The directory name langchain_v1 vs published name langchain is a known footgun.
Severity: Low.

Code Quality

Q1 — Broad-exception handling is intentionally allowed and used — Medium [Fact]

What: ruff ignores the BLE (blind-except) rule monorepo-wide; 28 except (Base)Exception/bare-pattern occurrences exist across 9 files in langchain_v1/langchain, e.g. _create_resources catches BaseException (shell_tool.py:716, :775).
Where: libs/core/pyproject.toml:114 and libs/langchain_v1/pyproject.toml:145 ("BLE" ignored); occurrences in factory.py, structured_output.py, model_fallback.py, summarization.py, types.py, shell_tool.py, etc.
Why it matters: Catching BaseException can swallow KeyboardInterrupt/SystemExit and mask real errors. Several uses are legitimate (resource cleanup re-raises), but disabling the lint rule globally removes the guardrail that would force each case to be justified.
Severity: Medium.
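
The distinction between a harmful and a legitimate BaseException handler can be sketched as follows. The function names are hypothetical and the example is independent of the repository's code.

```python
from collections.abc import Callable


def swallow_everything(action: Callable[[], None]) -> str:
    """Anti-pattern: a BaseException handler that never re-raises."""
    try:
        action()
    except BaseException:  # also catches KeyboardInterrupt and SystemExit
        return "swallowed"
    return "ok"


def cleanup_then_reraise(action: Callable[[], None], cleanup: Callable[[], None]) -> str:
    """Legitimate use: run cleanup on any failure, then propagate the exception."""
    try:
        action()
    except BaseException:
        cleanup()
        raise
    return "ok"


def interrupt() -> None:
    """Simulate the user pressing Ctrl+C inside the guarded action."""
    raise KeyboardInterrupt
```

With the lint rule re-enabled, the second pattern would typically carry an inline justification, while the first would have to be narrowed to specific exception types.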

Q2 — mypy strictness is partially disabled with TODO markers — Low [Fact]

What: core sets disallow_any_generics = false # TODO: activate for 'strict' checking; langchain v1 sets warn_return_any = false # TODO. v1 also excludes several agent test trees from type checking.
Where: libs/core/pyproject.toml:94–95; libs/langchain_v1/pyproject.toml:112–120.
Why it matters: These are honest, tracked gaps in an otherwise strict config; they leave some Any-leakage unchecked in central code.
Severity: Low.

Q3 — ANN401 (no Any in annotations) globally ignored — Low [Fact]

What: Any annotations are pervasive (e.g. _ConfigurableModel.invoke(... ) -> Any, **kwargs: Any). The ANN401 rule is in the ignore list.
Where: libs/core/pyproject.toml:113; libs/langchain_v1/pyproject.toml:144; usage throughout chat_models/base.py.
Why it matters: Any is sometimes unavoidable at framework boundaries (pluggable kwargs), but blanket-ignoring the rule means accidental Any is invisible. Low — largely a pragmatic framework tradeoff.
Severity: Low.

Testing

T1 — Substantial unit-test footprint with network isolation enforced — Strength/Low [Fact]

What: 167 test files in libs/core, 90 in libs/langchain_v1; pytest-socket blocks network in unit tests; snapshot testing via syrupy; blockbuster detects blocking calls in async paths.
Where: test trees under libs/*/tests; libs/core/pyproject.toml:61–78, :146–154.
Why it matters: Strong baseline. The one caveat: actual coverage % could not be measured statically here, so coverage-gap claims are deferred.
Severity: Low (informational).

T2 — Whole agent test trees excluded from type checking — Medium [Fact]

What: mypy excludes tests/unit_tests/agents/middleware/, .../specifications/, and test_*.py under agents; ruff also relaxes ANN/ARG for tests/unit_tests/agents/* and disables ALL rules for test_react_agent.py.
Where: libs/langchain_v1/pyproject.toml:112–117, :161–168.
Why it matters: The agents subsystem is the newest and highest-churn area; excluding its tests from type/lint checks reduces the safety net exactly where it is most needed.
Severity: Medium.

Performance

P1 — Linear blocklist scans per IP in the SSRF hot path — Low [Fact]

What: _ip_in_blocked_networks iterates all blocked networks for each address; the code itself notes "if profiling shows this is a hot path, consider memoising".
Where: libs/core/langchain_core/_security/_policy.py:138–183 (note at :143).
Why it matters: Negligible for typical request volumes; only relevant if used in tight retrieval loops. Documented and bounded.
Severity: Low.
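
Should profiling ever justify it, the memoisation the code comment hints at could look like the sketch below. The network list is a small illustrative subset and the function names are hypothetical; the real policy covers many more ranges.

```python
import ipaddress
from functools import lru_cache

# Illustrative subset only, not the repository's blocklist.
BLOCKED_NETWORKS = tuple(
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8",
        "127.0.0.0/8",
        "169.254.0.0/16",
        "192.168.0.0/16",
        "fc00::/7",
    )
)


def ip_is_blocked(ip: str) -> bool:
    """Linear scan: compare the address against every blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(
        addr.version == net.version and addr in net for net in BLOCKED_NETWORKS
    )


@lru_cache(maxsize=4096)
def ip_is_blocked_cached(ip: str) -> bool:
    """Memoised wrapper; safe only while the blocklist is immutable."""
    return ip_is_blocked(ip)
```

Caching by address string is only sound because the blocklist in this sketch is a module-level constant; a policy that can change at runtime would need cache invalidation.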

P2 — Per-line encode("utf-8") and list-append accumulation in shell output collection — Low [Judgment]

What: _collect_output encodes every line to measure bytes and appends to a Python list; for very chatty commands this is O(lines) allocations.
Where: libs/langchain_v1/langchain/agents/middleware/shell_tool.py:277–298.
Why it matters: Output is already truncated by line/byte limits, so unbounded growth is mitigated; allocation overhead is minor.
Severity: Low.

Dependencies

D1 — Bounded version ranges + per-package lockfiles — Strength/Low [Fact]

What: All runtime deps use bounded ranges (e.g. pydantic>=2.7.4,<3, langgraph>=1.2.4,<1.3); each package ships uv.lock; dependabot.yml present.
Where: libs/core/pyproject.toml:26–36; libs/langchain_v1/pyproject.toml:26–30; libs/*/uv.lock.
Why it matters: Reproducible builds and controlled upgrades. Strong.
Severity: Low (informational).

Developer Experience & Operations

O1 — Pre-commit lint/format hooks omit several partner packages — Medium [Fact]

What: .pre-commit-config.yaml defines per-package format and lint hooks for core, langchain, standard-tests, text-splitters, anthropic, chroma, exa, fireworks, groq, huggingface, mistralai, nomic, ollama, openai, qdrant. However, deepseek, openrouter, perplexity, and xai (which exist under libs/partners/) have no corresponding local hook.
Where: .pre-commit-config.yaml:48–113; partner dirs from libs/partners/ listing.
Why it matters: Contributors editing those partner packages get no local format/lint enforcement; they rely solely on CI. Inconsistent DX and a drift risk as packages are added.
Severity: Medium.
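
A drift check of this kind is small enough to run in CI. The sketch below is a hypothetical helper, and it assumes each hook scopes itself with a path of the form libs/partners/<name>/ somewhere in the config text; the actual hook layout should be confirmed before relying on it.

```python
def partners_missing_hooks(partner_dirs: list[str], precommit_text: str) -> list[str]:
    """Return partner directory names that have no hook path in the config text."""
    missing = []
    for name in partner_dirs:
        # Assumption: a covered package appears as "libs/partners/<name>/" in the config.
        if f"libs/partners/{name}/" not in precommit_text:
            missing.append(name)
    return sorted(missing)


# Trimmed, made-up config excerpt used only to exercise the helper.
sample_config = """
- id: openai
  files: ^libs/partners/openai/
- id: groq
  files: ^libs/partners/groq/
"""

uncovered = partners_missing_hooks(["openai", "groq", "xai", "deepseek"], sample_config)
```

In a real check the directory names would come from listing libs/partners/ and the text from reading .pre-commit-config.yaml, with a non-empty result failing the job.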

O2 — AGENTS.md and CLAUDE.md are duplicated verbatim — Low [Fact]

What: The two files are identical 318-line copies of the same guidance.
Where: AGENTS.md, CLAUDE.md.
Why it matters: Two full copies can drift unless they are actively kept in sync. One should be the source of truth and the other a pointer (or a symlink, or a generated file checked in CI). A check_agents_sync.yml workflow exists, which suggests sync is enforced, but maintaining two full copies is still heavier than necessary.
Severity: Low.

O3 — Mature, security-conscious CI — Strength/Low [Fact]

What: 27 workflows; change-scoped test matrix; GitHub Actions pinned to full commit SHAs (e.g. actions/checkout@de0fac2e…); least-privilege permissions: contents: read; concurrency cancellation.
Where: .github/workflows/check_diffs.yml:33–56; CLAUDE.md:310–312 (SHA-pin policy).
Why it matters: Strong supply-chain hygiene.
Severity: Low (informational).

Documentation

DOC1 — Docstrings are extensive and enforced — Strength [Fact]

What: Google-style docstrings enforced via ruff pydocstyle; init_chat_model has a multi-hundred-line docstring with examples; security functions document Raises.
Where: libs/*/pyproject.toml pydocstyle config; chat_models/base.py:218–474.
Severity: Strength.

DOC2 — The SSRF bypass behavior is not surfaced in the public docstring — Low [Fact]

What: validate_safe_url's docstring describes blocking private/metadata but not the LANGCHAIN_ENV test bypass (_ssrf_protection.py:69) nor the _policy.py:231 localhost allowance.
Where: _ssrf_protection.py:47–63 vs :69–74; _policy.py:231.
Severity: Low.

Strengths (preserve these)

Dedicated, policy-based SSRF protection with IPv6/NAT64/cloud-metadata awareness — rare and valuable. (_policy.py)
ruff ALL + mypy strict monorepo-wide quality bar.
SHA-pinned GitHub Actions, least-privilege permissions, change-scoped CI.
Bounded dependency ranges + per-package lockfiles and active CVE pinning.
Deep unit-test footprint with network isolation (pytest-socket) and async-blocking detection (blockbuster).
Strong, enforced documentation standards and contributor guidance (CLAUDE.md/AGENTS.md).
Clean layered architecture (core → langchain → partners) with a deliberate classic/v1 split for migration.

4. Improvement Strategy (Phase 3)

Theme 1 — "Security guarantees should be end-to-end, not point-in-time"

Explains: S1 (TOCTOU), S2 (env bypass), DOC2.
Target state: SSRF validation pins the validated IP through to the actual socket connect (no second, unvalidated DNS resolution), the env bypass has exactly one well-documented condition, and all bypasses are documented in the public docstring.
Principles: Time-of-check must equal time-of-use; least surprise; document security-relevant escape hatches.

Theme 2 — "Dangerous capabilities should be safe-by-default and opt-in"

Explains: S3 (host shell default).
Target state: The most dangerous middleware (host shell) requires an explicit execution policy or defaults to the strongest available sandbox; the host policy is a conscious opt-in.
Principles: Secure defaults; principle of least privilege for agent tools.

Theme 3 — "Decompose the central God-files to protect velocity"

Explains: A1, partially A2.
Target state: runnables/base.py and the other 2k+-line core modules are split along cohesive seams (sync/async, declarative ops, schema) behind a stable public surface, with no public API changes.
Principles: High cohesion / low coupling; keep public __init__ exports stable (CLAUDE.md's stable-interface rule).

Theme 4 — "Make the quality net uniform across the monorepo"

Explains: O1 (missing pre-commit hooks), T2 (agent tests excluded from typing), Q2/Q3 (strictness TODOs).
Target state: Every package present in libs/partners/ has a pre-commit hook; agent tests are type-checked; strictness TODOs are burned down or ticketed.
Principles: Consistency reduces cognitive load and drift; the safety net should be strongest in the highest-churn area (agents).

Trade-offs — what NOT to fix now (and why)

Do not change key_encoder default from SHA-1 (S4) — it is a breaking change for existing indexes; the warning + usedforsecurity=False are adequate for now. Revisit at the next major version.
Do not re-enable BLE/ANN401 globally overnight (Q1/Q3) — would generate large, low-signal churn across a 1.4M+ token codebase. Burn down per-package instead.
Do not merge classic/v1/core (A3) — the split is intentional for the v1 migration; consolidating now is high-risk and low-reward.
Do not micro-optimize the SSRF blocklist (P1) or shell output loop (P2) — bounded and not on a measured hot path; the code already flags where to optimize if profiling justifies it.

Definition of done (measurable signals)

No High security findings remain (S1, S3 resolved or explicitly accepted with mitigations).
SSRF subsystem has exactly one documented env bypass; a regression test asserts a rebinding-style scenario is blocked at connect time.
ShellToolMiddleware has no implicit HostExecutionPolicy default (or a test asserting the documented opt-in).
Every directory under libs/partners/ has a matching pre-commit hook (CI check passes).
Agent test trees are type-checked (removed from mypy exclude) or each exclusion has a tracked ticket.
runnables/base.py reduced below an agreed LOC budget with no public API diff (snapshot of __init__ exports unchanged).

5. Task Plan (Phase 4)

Workload: S = <2h · M = half day · L = 1–2 days · XL = needs breakdown.

⚡ Quick Wins (high-impact, S-effort, do immediately)

QW1 — Unify the SSRF env bypass + document it. Make _effective_allowed_hosts use the same single, narrow condition as validate_safe_url, and document the bypass in the public docstring. (S, low risk)
QW2 — Add pre-commit hooks for deepseek, openrouter, perplexity, xai. Mirror existing per-package hook blocks. (S, low risk)
QW3 — Collapse AGENTS.md/CLAUDE.md duplication to one source + a pointer, relying on check_agents_sync.yml. (S, low risk)
QW4 — Document the SHA-1 key_encoder default and recommend blake2b/sha256 in the index/aindex docstrings (no behavior change). (S, no risk)

Milestone 0 — Safety Net (do before refactoring)

M0.1 — Add SSRF rebinding regression tests

Description: Add unit tests that simulate a host resolving to a public IP at validation and a private/metadata IP at "connect" time, asserting the request is blocked.
Affected: libs/core/tests/unit_tests/_security/, _security/_ssrf_protection.py, _policy.py.
Acceptance: The test fails against current code (demonstrating the gap) and passes after the S1 fix (M1.1).
Workload: M · Risk: Low · Depends on: none.

M0.2 — Snapshot public API surface of langchain_core.runnables

Description: Capture the exported names of runnables/__init__.py as a test fixture to guard the M2 refactor.
Affected: libs/core/tests/unit_tests/runnables/.
Acceptance: A test asserts the export set is unchanged.
Workload: S · Risk: Low · Depends on: none.
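
One possible shape for the export snapshot is sketched below. The helper names are hypothetical, and the standard-library `string` module stands in for langchain_core.runnables so the example runs without the repository installed.

```python
import importlib


def exported_names(module_name: str) -> frozenset[str]:
    """Return the names a module declares in __all__ (empty if it has none)."""
    module = importlib.import_module(module_name)
    return frozenset(getattr(module, "__all__", ()))


def diff_exports(snapshot: frozenset[str], current: frozenset[str]) -> dict[str, list[str]]:
    """Describe how the current export set differs from the stored snapshot."""
    return {
        "removed": sorted(snapshot - current),
        "added": sorted(current - snapshot),
    }


# Stand-in module; the real test would target "langchain_core.runnables".
snapshot = exported_names("string")
unchanged = diff_exports(snapshot, exported_names("string"))
after_bad_refactor = diff_exports(snapshot, snapshot - {"Template"})
```

A test built on this would store the snapshot as a fixture and assert that both lists stay empty across the M2 refactor.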

Milestone 1 — Critical Fixes (security & correctness)

M1.1 — Close the SSRF TOCTOU gap (IP pinning at connect) (TOP PRIORITY #1)

Description: Wire validated IPs into the actual transport so the connection uses the IP that was validated, eliminating the second DNS resolution. Leverage the existing _security/_transport.py.
Affected: _security/_transport.py, _security/_ssrf_protection.py, callers that fetch URLs.
Acceptance: M0.1 rebinding test passes; existing SSRF tests pass; no public signature change to validate_safe_url.
Workload: L · Risk: Medium (touches request path) · Depends on: M0.1.

M1.2 — Make ShellToolMiddleware safe-by-default (TOP PRIORITY #2)

Description: Require an explicit execution_policy, OR default to the strongest available sandbox (CodexSandboxExecutionPolicy/DockerExecutionPolicy) when present, falling back to host only with an explicit flag.
Affected: libs/langchain_v1/langchain/agents/middleware/shell_tool.py:508–571.
Acceptance: Constructing the middleware without a policy does not silently grant host shell; a test asserts the documented default; docstring updated.
Workload: M · Risk: Medium (default change is user-visible — follow CLAUDE.md stable-interface rule, use keyword-only + warn) · Depends on: none.

M1.3 — Unify & document the env bypass (QW1, promoted)

Description: Single bypass condition + public docstring note.
Affected: _policy.py:231, _ssrf_protection.py:69.
Acceptance: One code path; test covers it; docstring documents it.
Workload: S · Risk: Low · Depends on: none.

Milestone 2 — High-Leverage Improvements

M2.1 — Decompose runnables/base.py (TOP PRIORITY #3)

Description: Split the 6,574-line module into cohesive submodules (e.g. base protocol, sync impl, async impl, declarative/config ops, schema) re-exported from runnables/__init__.py.
Affected: libs/core/langchain_core/runnables/base.py (+ new submodules), runnables/__init__.py.
Acceptance: M0.2 export snapshot unchanged; mypy strict + ruff pass; import time not regressed.
Workload: XL (needs design breakdown) · Risk: Medium-High (most central abstraction) · Depends on: M0.2.

M2.2 — Type-check the agents test trees

Description: Remove the mypy excludes for agents tests; fix resulting errors incrementally.
Affected: libs/langchain_v1/pyproject.toml:112–117, :161–168; agent test files.
Acceptance: mypy . passes without the excludes (or excludes reduced with tickets for the rest).
Workload: L · Risk: Low · Depends on: none.

M2.3 — Single source of truth for the provider registry

Description: Derive the inference prefix table and docstring provider list from _BUILTIN_PROVIDERS (or a generated check) to prevent drift.
Affected: libs/langchain_v1/langchain/chat_models/base.py:38–100, :521–594.
Acceptance: A test asserts inference table ⊆ registry; adding a provider requires one edit.
Workload: M · Risk: Low · Depends on: none.
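
A minimal sketch of the single-source idea follows. The registry shape, the `prefixes` field, and the prefix values are assumptions made for illustration; the real _BUILTIN_PROVIDERS is structured differently and is much larger.

```python
from typing import Optional

# Hypothetical, trimmed registry; field names and prefixes are illustrative only.
REGISTRY: dict[str, dict] = {
    "openai": {"package": "langchain_openai", "prefixes": ("gpt-", "o1", "o3")},
    "anthropic": {"package": "langchain_anthropic", "prefixes": ("claude",)},
    "mistralai": {"package": "langchain_mistralai", "prefixes": ("mistral",)},
}


def build_prefix_table(registry: dict[str, dict]) -> dict[str, str]:
    """Derive the prefix-to-provider table from the registry itself."""
    return {
        prefix: provider
        for provider, spec in registry.items()
        for prefix in spec["prefixes"]
    }


def infer_provider(model_name: str, registry: dict[str, dict] = REGISTRY) -> Optional[str]:
    """Return the provider whose prefix matches the model name, if any."""
    lowered = model_name.lower()
    for prefix, provider in build_prefix_table(registry).items():
        if lowered.startswith(prefix):
            return provider
    return None


def provider_doc_lines(registry: dict[str, dict]) -> list[str]:
    """Generate the docstring provider list from the same registry."""
    return [f"- {name} -> {spec['package']}" for name, spec in sorted(registry.items())]
```

Because the table and the documentation lines are both computed from the registry, the subset property in the acceptance criterion holds by construction in this sketch.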

Milestone 3 — Quality & Polish

M3.1 — Burn down BLE (blind-except) per package

Description: Re-enable BLE package-by-package; replace except BaseException/broad catches with specific exceptions or justified # noqa with a reason.
Affected: ignore lists in libs/*/pyproject.toml; ~9 files in langchain_v1/langchain.
Acceptance: BLE enabled for at least core + langchain_v1; remaining exceptions justified inline.
Workload: L · Risk: Low-Medium · Depends on: none.

M3.2 — Burn down mypy strictness TODOs

Description: Enable disallow_any_generics (core) and warn_return_any (v1), fixing fallout.
Affected: libs/core/pyproject.toml:94, libs/langchain_v1/pyproject.toml:120.
Acceptance: Flags enabled; mypy passes.
Workload: L · Risk: Low · Depends on: M2.1 (touches same central code).

M3.3 — Make SHA-1 default explicit / plan migration

Description: Keep SHA-1 default but document it loudly and schedule a default change for the next major.
Affected: libs/core/langchain_core/indexing/api.py docstrings.
Acceptance: Docstrings recommend stronger algorithms; a tracked issue exists for the major-version change.
Workload: S · Risk: Low · Depends on: none.

Implementation sketches — Top 3 tasks

#1 — M1.1: Close the SSRF TOCTOU gap

Approach: Resolve the hostname once, validate every resolved IP, then connect to the validated IP directly (passing the original hostname for TLS SNI / Host header). Implement via a custom requests/httpx/urllib transport adapter (a _transport.py already exists to build on).
Key steps: (1) Extend the transport to accept a pre-validated IP set; (2) have validate_safe_url/validate_url return the validated IP(s), not just the string; (3) route fetches through the transport; (4) add the M0.1 rebinding test using a stub resolver.
Pitfalls: Breaking TLS hostname verification if you connect by IP without preserving SNI; IPv6 literal formatting in the Host header; keeping validate_safe_url's public signature stable (return type must stay str — expose IPs via a new internal function). Round-robin DNS / multiple A records must all be validated and the connect must use a validated one.
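
The resolve-once-then-pin approach can be sketched with the standard library alone. This is an illustration under stated assumptions, not the repository's transport: the function names and the injectable resolver are hypothetical, and `is_global` is used as a simplified stand-in for the full blocklist policy.

```python
import ipaddress
import socket
from collections.abc import Callable

Resolver = Callable[[str, int], list[str]]


def system_resolver(hostname: str, port: int) -> list[str]:
    """Resolve via getaddrinfo and return the unique IP strings in order."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    return list(dict.fromkeys(info[4][0] for info in infos))


def resolve_and_pin(hostname: str, port: int, resolve: Resolver = system_resolver) -> list[str]:
    """Resolve once and return the IPs only if every one of them is public."""
    ips = resolve(hostname, port)
    if not ips:
        msg = f"{hostname} did not resolve"
        raise ValueError(msg)
    blocked = [ip for ip in ips if not ipaddress.ip_address(ip).is_global]
    if blocked:
        msg = f"{hostname} resolves to blocked address(es): {blocked}"
        raise ValueError(msg)
    return ips


def connect_pinned(
    hostname: str, port: int, resolve: Resolver = system_resolver, timeout: float = 5.0
) -> socket.socket:
    """Connect to a validated IP literal so no second DNS lookup can occur."""
    pinned_ip = resolve_and_pin(hostname, port, resolve)[0]
    # For HTTPS, wrap this socket with server_hostname=hostname so SNI and
    # certificate verification still use the original name.
    return socket.create_connection((pinned_ip, port), timeout=timeout)
```

Rejecting the whole answer when any record is blocked is the stricter of two reasonable choices; filtering to the public subset would also satisfy the requirement that the connection uses a validated address.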

#2 — M1.2: Safe-by-default shell middleware

Approach: Change the constructor so an unspecified policy does not mean "host". Prefer a sandbox if available; otherwise require an explicit HostExecutionPolicy() (or an allow_host=True keyword-only flag) and emit a warning.
Key steps: (1) Add keyword-only opt-in; (2) detect sandbox availability (Codex/Docker) and select it; (3) update class docstring + the post-exec-redaction warning; (4) add tests for each default path.
Pitfalls: This is a user-visible behavior change — follow CLAUDE.md's stable-interface rule: introduce via keyword-only argument with a deprecation/transition warning rather than silently flipping the default; document clearly in release notes.
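
The constructor logic described above might look like the following sketch. All class names, the `allow_host` flag, and the `detect_sandbox` hook are hypothetical stand-ins; the real middleware's signature and policy classes are not reproduced here.

```python
import warnings
from collections.abc import Callable
from typing import Optional


class HostPolicySketch:
    """Stand-in for a policy that runs commands directly on the host."""


class SandboxPolicySketch:
    """Stand-in for a sandboxed policy (for example, a container)."""


def no_sandbox() -> Optional[object]:
    """Default detector for the sketch: report that no sandbox is available."""
    return None


class ShellMiddlewareSketch:
    """Illustrative constructor only; command execution is out of scope."""

    def __init__(
        self,
        *,
        execution_policy: Optional[object] = None,
        allow_host: bool = False,
        detect_sandbox: Callable[[], Optional[object]] = no_sandbox,
    ) -> None:
        if execution_policy is None:
            # Prefer a sandbox when one can be detected.
            execution_policy = detect_sandbox()
        if execution_policy is None:
            if not allow_host:
                msg = "No sandbox available; pass execution_policy or allow_host=True."
                raise ValueError(msg)
            warnings.warn(
                "Running shell commands on the host; redaction happens after execution.",
                UserWarning,
                stacklevel=2,
            )
            execution_policy = HostPolicySketch()
        self.execution_policy = execution_policy
```

Raising immediately is the end state; during a transition period the same branch could emit a deprecation warning and keep the host default, in line with the stable-interface rule mentioned above.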

#3 — M2.1: Decompose `runnables/base.py`

Approach: Identify cohesive seams (core Runnable/RunnableSerializable base, RunnableSequence/RunnableParallel, binding/config/declarative ops, schema generation) and move each into a submodule, re-exporting from runnables/__init__.py so the public surface is byte-identical.
Key steps: (1) Land M0.2 export snapshot; (2) move one cohesive group at a time, running mypy+ruff+tests after each; (3) keep relative-import ban in mind (ruff ban-relative-imports = "all").
Pitfalls: Circular imports between the split modules (use TYPE_CHECKING guards, already idiomatic here); import-time regressions; accidental changes to __all__. Because Runnable is the most depended-on abstraction, do this in small, individually-reviewable PRs, not one mega-diff.

End of report.