LangChain Monorepo — Technical Audit Report
Scope: langchain/ Python monorepo (langchain-core, langchain (v1), langchain-classic, text-splitters, standard-tests, model-profiles, 16 partner packages).
Method: Evidence-based static review. All file:line references are to the repository tree rooted at the directory containing this file. Where something could not be verified statically (e.g. live coverage %, runtime behavior), it is labeled as such.
Audit date: 2026-06-17.
1. Executive Summary
Overall health grade: A− (strong, mature, production-grade project with a small number of real but bounded security/design risks).
LangChain is a large, actively maintained, MIT-licensed Python monorepo that is the de-facto standard framework for building LLM applications and agents. The engineering culture is unusually disciplined for an OSS project of this size: ruff is configured with select = ["ALL"], mypy runs in strict mode, GitHub Actions are pinned to full commit SHAs, CI is change-scoped for speed, dependency ranges are bounded, and there is a dedicated _security package with SSRF protection and a documented usedforsecurity=False posture around SHA-1. The codebase is well-documented (Google-style docstrings enforced) and has a deep test footprint (167 test files in core, 90 in langchain-v1). The grade is held just below A because of a handful of architectural and security items that matter at this project's scale: an SSRF guard that is inherently vulnerable to DNS-rebinding (time-of-check/time-of-use), an environment-variable-driven validation bypass that is broader than its docstring claims, a host-shell agent tool that defaults to full host access, and several genuine God-files (notably runnables/base.py at 6,574 lines).
Top 3 risks
- SSRF protection is TOCTOU-vulnerable (DNS rebinding). validate_safe_url/validate_url resolve DNS at validation time, but the real HTTP request resolves DNS again later, so an attacker-controlled DNS record can pass validation and then re-point to a private IP. (libs/core/langchain_core/_security/_ssrf_protection.py:86, libs/core/langchain_core/_security/_policy.py:259)
- Environment-variable SSRF bypass is broader than documented. _effective_allowed_hosts allows localhost/testserver for any LANGCHAIN_ENV starting with "local", while validate_safe_url's own bypass and docstrings describe a narrower local_test condition. (libs/core/langchain_core/_security/_policy.py:231, _ssrf_protection.py:69)
- ShellToolMiddleware defaults to HostExecutionPolicy (full host shell, redaction is post-execution only). Safe defaults matter because agents execute model-chosen commands. (libs/langchain_v1/langchain/agents/middleware/shell_tool.py:503, :565, :538)
Top 3 opportunities
- Adopt connection-time IP pinning / a custom transport for SSRF to close the DNS-rebinding gap (a _transport.py already exists; wire validation into the actual socket connect).
- Decompose the God-files (runnables/base.py 6,574 lines; callbacks/manager.py 2,792; language_models/chat_models.py 2,714) to improve navigability, review velocity, and import cost.
- Tighten the security defaults and make them explicit (opt-in host shell, narrow the env bypass, default key_encoder documentation): high trust-impact, low effort.
2. Repository Map (Phase 1)
Purpose & maturity
- Purpose: "The agent engineering platform" — a framework for building agents and LLM-powered applications with a standard interface across model providers, embeddings, vector stores, retrievers, and tools.
- Intended users: Python application developers building LLM/agent apps; partner integrators.
- Maturity: Production library. pyproject.toml classifiers declare Development Status :: 5 - Production/Stable. langchain-core==1.4.3, langchain==1.3.6. (libs/core/pyproject.toml:11, :24; libs/langchain_v1/pyproject.toml:24)
Tech stack
| Area | Choice |
|---|---|
| Language | Python >=3.10,<4.0 (3.10–3.14 classifiers) |
| Packaging/build | uv workspace + hatchling build backend; per-package pyproject.toml + uv.lock |
| Core runtime deps (core) | pydantic>=2.7.4,<3, langsmith, tenacity, jsonpatch, PyYAML, typing-extensions, packaging, uuid-utils, langchain-protocol |
| Agents | langgraph>=1.2.4,<1.3 (langchain v1 depends on langgraph) |
| Lint/format | ruff (select = ["ALL"]) |
| Types | mypy strict = true, pydantic mypy plugin |
| Tests | pytest, pytest-asyncio (auto), syrupy snapshots, pytest-socket (no-network enforcement), pytest-xdist, blockbuster, pytest-benchmark/codspeed |
| CI/CD | GitHub Actions (27 workflows), change-scoped matrix, SHA-pinned actions, manual release workflow |
Architecture sketch
langchain-protocol (external)         langgraph (external, 1.2.x)
          │                                     │
          ▼                                     ▼
   langchain-core ──────────────────► langchain (v1, public)
   (base abstractions:                (init_chat_model, create_agent,
    Runnables, messages,               middleware, tools, structured output)
    tools, callbacks,                           │
    _security, indexing)                        │ optional extras
          ▲                                     ▼
          │                           partners/* (openai, anthropic, ollama, groq,
   text-splitters                       mistralai, huggingface, qdrant, chroma, exa,
   standard-tests                       nomic, fireworks, deepseek, openrouter,
   model-profiles                       perplexity, xai)
          │
          └──► langchain-classic (libs/langchain): legacy, maintenance-only
Key directories (one line each)
| Path | Description |
|---|---|
| libs/core/langchain_core/ | Base abstractions: Runnables, messages, tools, callbacks, tracers, indexing, _security. |
| libs/langchain_v1/langchain/ | Actively maintained public langchain package: init_chat_model, agents, middleware, tools. |
| libs/langchain/langchain_classic/ | Legacy langchain-classic package (maintenance only, no new features). |
| libs/partners/*/ | 16 first-party provider integrations, each its own package. |
| libs/text-splitters/ | Document chunking utilities. |
| libs/standard-tests/ | Shared standardized test suite for partner integrations. |
| libs/model-profiles/ | Model capability profile data + langchain-profiles CLI. |
| .github/workflows/ | 27 CI/CD workflows (lint, test, release, labeling, codspeed perf). |
What surprised me (positively & otherwise)
- A dedicated _security package with a real, policy-driven SSRF implementation (IPv4/IPv6 blocklists, cloud-metadata IPs, NAT64-embedded-IPv4 extraction, k8s .svc.cluster.local blocking). This is far more than most OSS libraries ship. (libs/core/langchain_core/_security/_policy.py)
- ruff select = ["ALL"] + mypy strict across the monorepo: an aggressive quality bar that is rare at this scale.
- A LANGCHAIN_ENV-based validation bypass baked into the security policy (_policy.py:231): convenient for tests but a security-relevant surprise.
- AGENTS.md and CLAUDE.md are byte-identical (318 lines each): duplicated guidance rather than one file referencing the other.
- The directory layout has a doubled root (langchain/langchain/) and the repo is a shallow git clone (.git/shallow present), so full history-based analysis is not possible here.
3. Audit Report (Phase 2)
Findings are grouped by dimension and sorted by severity. Each is tagged [Fact] (directly verifiable in a file) or [Judgment] (informed assessment).
Security
S1 — SSRF validation is TOCTOU / DNS-rebinding vulnerable — High [Fact + Judgment]
- What: validate_safe_url resolves the hostname via socket.getaddrinfo and validates the returned IPs, then returns the URL string. The actual HTTP request happens later in the caller and re-resolves DNS. An attacker controlling the DNS record can return a public IP during validation and a private/metadata IP at fetch time.
- Where: libs/core/langchain_core/_security/_ssrf_protection.py:86–98; async equivalent libs/core/langchain_core/_security/_policy.py:259–268.
- Why it matters: The function's stated purpose is to "prevent SSRF" (_ssrf_protection.py:49). Validating the URL but not pinning the validated IP at connection time means the guarantee does not hold against an active attacker. Consequences: access to cloud metadata (credentials) and internal services.
- Severity: High.
S2 — Env-driven SSRF bypass is broader than its docstring — Medium [Fact]
- What: _effective_allowed_hosts adds localhost and testserver to the allow-list whenever LANGCHAIN_ENV starts with "local" (e.g. local, localdev, local_anything). Separately, validate_safe_url has its own bypass requiring LANGCHAIN_ENV == "local_test" AND hostname test...server.
- Where: libs/core/langchain_core/_security/_policy.py:231; libs/core/langchain_core/_security/_ssrf_protection.py:69–74.
- Why it matters: Two different bypass conditions for the same subsystem are confusing, and the _policy.py one is wider than a reader of validate_safe_url would expect. If an environment is misconfigured (or an attacker can influence env), localhost SSRF is silently re-enabled. The bypass is undocumented in the public docstring.
- Severity: Medium.
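The difference between the two conditions can be illustrated with a minimal sketch. Both helper names are hypothetical and mirror only the environment-variable part of the conditions described in this finding (the hostname check in validate_safe_url is omitted); this is not the repository's code.

```python
def policy_bypass_active(env: str) -> bool:
    """Hypothetical mirror of the wider prefix check described for _policy.py."""
    return env.startswith("local")


def url_bypass_active(env: str) -> bool:
    """Hypothetical mirror of the narrower equality check described for validate_safe_url."""
    return env == "local_test"


# Values that satisfy the prefix check but not the equality check are the
# "surprise" cases: the allow-list widens even though the narrow bypass is off.
for value in ("local_test", "local", "localdev", "production"):
    wide = policy_bypass_active(value)
    narrow = url_bypass_active(value)
    print(f"{value!r}: prefix check={wide}, equality check={narrow}")
```

Under these assumptions, values such as local or localdev enable the wider path only, which is the mismatch a unified condition would remove.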
S3 — ShellToolMiddleware defaults to full host shell access — High [Fact + Judgment]
- What: When no execution_policy is supplied, the middleware uses HostExecutionPolicy(), so the model can run arbitrary commands on the host. Redaction rules are applied after execution and explicitly "do not prevent exfiltration of secrets" under host policy.
- Where: libs/langchain_v1/langchain/agents/middleware/shell_tool.py:503 (class docstring), :565 (default), :538 (warning).
- Why it matters: This is opt-out rather than opt-in for the most dangerous capability an agent can have. The risk is partially mitigated by documentation, but a "safe by default" posture (e.g. require an explicit policy, or default to a sandbox when available) is the safer design.
- Severity: High (by impact; it is an intentional, documented design choice, so partly a Judgment on default-selection).
S4 — SHA-1 is the default key_encoder for the indexing API — Low [Fact]
- What: index/aindex default to key_encoder="sha1". A one-time UserWarning is emitted and usedforsecurity=False is set, but SHA-1 remains the default fingerprint algorithm.
- Where: libs/core/langchain_core/indexing/api.py:307, :646, :46, :55–70.
- Why it matters: SHA-1 is not collision-resistant; the code itself warns of this. For document de-duplication this is mostly a correctness/robustness concern (deliberate collisions could cause documents to be treated as identical). Defaulting to blake2b/sha256 would be safer, but changing a default is a breaking change, hence Low + documented.
- Severity: Low.
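As a rough illustration of the trade-off, the sketch below computes a content fingerprint with each of the algorithms named above using only the standard library. The fingerprint function and its algorithm parameter are hypothetical names for this example and do not reproduce the indexing API's key_encoder implementation.

```python
import hashlib


def fingerprint(text: str, algorithm: str = "sha1") -> str:
    """Return a hex content fingerprint for de-duplication (illustrative only)."""
    data = text.encode("utf-8")
    if algorithm == "sha1":
        # usedforsecurity=False marks the digest as a non-security fingerprint.
        return hashlib.sha1(data, usedforsecurity=False).hexdigest()
    if algorithm == "sha256":
        return hashlib.sha256(data).hexdigest()
    if algorithm == "blake2b":
        return hashlib.blake2b(data).hexdigest()
    raise ValueError(f"unsupported algorithm: {algorithm}")


for name in ("sha1", "sha256", "blake2b"):
    print(name, len(fingerprint("same document text", name)), "hex chars")
```

Because the digests differ in length and value, switching the default would change every stored key, which is why the finding treats a default change as breaking.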
S5 — Proactive dependency-CVE pinning is present (positive, but note maintenance burden) — Low [Fact]
- What: constraint-dependencies pin pygments>=2.20.0 # CVE-2026-4539 (core) and urllib3>=2.6.3, pygments>=2.20.0 (langchain v1).
- Where: libs/core/pyproject.toml:82; libs/langchain_v1/pyproject.toml:96.
- Why it matters: Demonstrates active CVE tracking. The minor risk is that hand-maintained constraint comments can drift; these belong in a tracked SCA process. Largely a strength.
- Severity: Low.
Architecture & Design
A1 — God-file: runnables/base.py at 6,574 lines — Medium [Fact + Judgment]
- What: The core Runnable abstraction file is 6,574 lines; callbacks/manager.py 2,792; language_models/chat_models.py 2,714; messages/utils.py 2,400.
- Where: libs/core/langchain_core/runnables/base.py (6574 LOC).
- Why it matters: Single very large modules raise the cost of review, increase merge-conflict surface, slow IDE/type-checker performance, and inflate import time. Runnable is the most central abstraction, so the blast radius of any change here is large.
- Severity: Medium (it is cohesive and stable, so this is partly Judgment).
A2 — init_chat_model provider registry is a hardcoded God-dict — Low [Fact]
- What: _BUILTIN_PROVIDERS hardcodes 28 providers with import paths/class names/creator lambdas, plus a parallel _attempt_infer_model_provider prefix table and a docstring list: three sources of the same truth that must be kept in sync.
- Where: libs/langchain_v1/langchain/chat_models/base.py:38–100, :521–594, :207–309 (docstring).
- Why it matters: Adding/renaming a provider requires editing three places; drift produces confusing inference behavior. Low because it is well-contained and covered by the CLAUDE.md "FOR CONTRIBUTORS" note.
- Severity: Low.
A3 — Three coexisting langchain packages (core / v1 / classic) — Low [Judgment]
- What: libs/core (langchain-core), libs/langchain_v1 (langchain), libs/langchain (langchain-classic) coexist; CLAUDE.md labels classic "legacy, no new features."
- Where: libs/langchain/, libs/langchain_v1/, CLAUDE.md:16–17.
- Why it matters: Necessary for a major-version migration, but newcomers can edit the wrong package. The directory name langchain_v1 vs published name langchain is a known footgun.
- Severity: Low.
Code Quality
Q1 — Broad-exception handling is intentionally allowed and used — Medium [Fact]
- What: ruff ignores the BLE (blind-except) rule monorepo-wide; 28 except (Base)Exception/bare-pattern occurrences exist across 9 files in langchain_v1/langchain, e.g. _create_resources catches BaseException (shell_tool.py:716, :775).
- Where: libs/core/pyproject.toml:114 and libs/langchain_v1/pyproject.toml:145 ("BLE" ignored); occurrences in factory.py, structured_output.py, model_fallback.py, summarization.py, types.py, shell_tool.py, etc.
- Why it matters: Catching BaseException can swallow KeyboardInterrupt/SystemExit and mask real errors. Several uses are legitimate (resource cleanup re-raises), but disabling the lint rule globally removes the guardrail that would force each case to be justified.
- Severity: Medium.
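The distinction between a legitimate and a risky broad catch can be shown with a small sketch of the cleanup-and-re-raise pattern. The Resource class and create_resources function are hypothetical stand-ins, not the code at the cited lines.

```python
class Resource:
    """Hypothetical resource that must be released if setup is interrupted."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def create_resources(fail: bool, created: list[Resource]) -> list[Resource]:
    """Acquire two resources; release everything acquired so far on any failure."""
    try:
        created.append(Resource())
        if fail:
            raise KeyboardInterrupt  # stands in for an interruption mid-setup
        created.append(Resource())
    except BaseException:
        # Broad catch used for cleanup only: the exception is always re-raised,
        # so KeyboardInterrupt and SystemExit are never swallowed.
        for resource in created:
            resource.close()
        raise
    return created
```

The risky variant is the same block without the trailing raise, which would silently convert an interrupt into a normal return.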
Q2 — mypy strictness is partially disabled with TODO markers — Low [Fact]
- What: core sets disallow_any_generics = false # TODO: activate for 'strict' checking; langchain v1 sets warn_return_any = false # TODO. v1 also excludes several agent test trees from type checking.
- Where: libs/core/pyproject.toml:94–95; libs/langchain_v1/pyproject.toml:112–120.
- Why it matters: These are honest, tracked gaps in an otherwise strict config; they leave some Any-leakage unchecked in central code.
- Severity: Low.
Q3 — ANN401 (no Any in annotations) globally ignored — Low [Fact]
- What: Any annotations are pervasive (e.g. _ConfigurableModel.invoke(... ) -> Any, **kwargs: Any). The ANN401 rule is in the ignore list.
- Where: libs/core/pyproject.toml:113; libs/langchain_v1/pyproject.toml:144; usage throughout chat_models/base.py.
- Why it matters: Any is sometimes unavoidable at framework boundaries (pluggable kwargs), but blanket-ignoring the rule means accidental Any is invisible. Low: largely a pragmatic framework tradeoff.
- Severity: Low.
Testing
T1 — Substantial unit-test footprint with network isolation enforced — Strength/Low [Fact]
- What: 167 test files in libs/core, 90 in libs/langchain_v1; pytest-socket blocks network in unit tests; snapshot testing via syrupy; blockbuster detects blocking calls in async paths.
- Where: test trees under libs/*/tests; libs/core/pyproject.toml:61–78, :146–154.
- Why it matters: Strong baseline. The one caveat: actual coverage % could not be measured statically here, so coverage-gap claims are deferred.
- Severity: Low (informational).
T2 — Whole agent test trees excluded from type checking — Medium [Fact]
- What: mypy excludes tests/unit_tests/agents/middleware/, .../specifications/, and test_*.py under agents; ruff also relaxes ANN/ARG for tests/unit_tests/agents/* and disables ALL rules for test_react_agent.py.
- Where: libs/langchain_v1/pyproject.toml:112–117, :161–168.
- Why it matters: The agents subsystem is the newest and highest-churn area; excluding its tests from type/lint checks reduces the safety net exactly where it is most needed.
- Severity: Medium.
Performance
P1 — Linear blocklist scans per IP in the SSRF hot path — Low [Fact]
- What: _ip_in_blocked_networks iterates all blocked networks for each address; the code itself notes "if profiling shows this is a hot path, consider memoising".
- Where: libs/core/langchain_core/_security/_policy.py:138–183 (note at :143).
- Why it matters: Negligible for typical request volumes; only relevant if used in tight retrieval loops. Documented and bounded.
- Severity: Low.
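If profiling ever did justify the memoisation the code comment mentions, one possible shape is sketched below with the standard library. The network list is a small illustrative subset chosen for this example, and is_blocked is a hypothetical name rather than the repository's _ip_in_blocked_networks.

```python
import ipaddress
from functools import lru_cache

# Illustrative subset only; the blocklist described in this report is larger.
BLOCKED_NETWORKS = tuple(
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8",
        "127.0.0.0/8",
        "169.254.0.0/16",
        "192.168.0.0/16",
        "::1/128",
        "fc00::/7",
    )
)


@lru_cache(maxsize=4096)
def is_blocked(address: str) -> bool:
    """Linear scan over the blocklist, cached per address string."""
    ip = ipaddress.ip_address(address)
    return any(ip.version == net.version and ip in net for net in BLOCKED_NETWORKS)


print(is_blocked("169.254.169.254"), is_blocked("8.8.8.8"))
print(is_blocked.cache_info())
```

The cache only helps when the same addresses recur, and it must be cleared if the blocklist is mutable at runtime, which is part of why the finding recommends leaving this alone until measured.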
P2 — Per-line encode("utf-8") and list-append accumulation in shell output collection — Low [Judgment]
- What: _collect_output encodes every line to measure bytes and appends to a Python list; for very chatty commands this is O(lines) allocations.
- Where: libs/langchain_v1/langchain/agents/middleware/shell_tool.py:277–298.
- Why it matters: Output is already truncated by line/byte limits, so unbounded growth is mitigated; allocation overhead is minor.
- Severity: Low.
Dependencies
D1 — Bounded version ranges + per-package lockfiles — Strength/Low [Fact]
- What: All runtime deps use bounded ranges (e.g. pydantic>=2.7.4,<3, langgraph>=1.2.4,<1.3); each package ships uv.lock; dependabot.yml present.
- Where: libs/core/pyproject.toml:26–36; libs/langchain_v1/pyproject.toml:26–30; libs/*/uv.lock.
- Why it matters: Reproducible builds and controlled upgrades. Strong.
- Severity: Low (informational).
Developer Experience & Operations
O1 — Pre-commit lint/format hooks omit several partner packages — Medium [Fact]
- What: .pre-commit-config.yaml defines per-package format lint hooks for core, langchain, standard-tests, text-splitters, anthropic, chroma, exa, fireworks, groq, huggingface, mistralai, nomic, ollama, openai, qdrant, but deepseek, openrouter, perplexity, and xai (which exist under libs/partners/) have no corresponding local hook.
- Where: .pre-commit-config.yaml:48–113; partner dirs from libs/partners/ listing.
- Why it matters: Contributors editing those partner packages get no local format/lint enforcement; they rely solely on CI. Inconsistent DX and a drift risk as packages are added.
- Severity: Medium.
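A check of this kind could be automated along the following lines. This is a sketch under one stated assumption: that each hook in .pre-commit-config.yaml references its package by a path of the form libs/partners/NAME/. The function name partners_without_hook is hypothetical.

```python
from pathlib import Path


def partners_without_hook(repo_root: Path) -> list[str]:
    """List partner directories that the pre-commit config never mentions.

    Assumes hooks reference packages by a 'libs/partners/<name>/' path; a plain
    substring search is used, so no YAML parser is required.
    """
    config_text = (repo_root / ".pre-commit-config.yaml").read_text(encoding="utf-8")
    partners_dir = repo_root / "libs" / "partners"
    names = sorted(p.name for p in partners_dir.iterdir() if p.is_dir())
    return [name for name in names if f"libs/partners/{name}/" not in config_text]


if __name__ == "__main__":
    root = Path.cwd()
    if (root / ".pre-commit-config.yaml").exists() and (root / "libs" / "partners").is_dir():
        for missing in partners_without_hook(root):
            print(f"no pre-commit hook found for libs/partners/{missing}/")
```

Run in CI, a non-empty result could fail the build, which would keep the hook list from drifting as partner packages are added.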
O2 — AGENTS.md and CLAUDE.md are duplicated verbatim — Low [Fact]
- What: The two files are identical 318-line copies of the same guidance.
- Where: AGENTS.md, CLAUDE.md.
- Why it matters: Two copies will drift; one should be the source of truth and the other a pointer (or a symlink / generated file checked in CI). There is a check_agents_sync.yml workflow, which suggests sync is enforced, but maintaining two full copies is still heavier than necessary.
- Severity: Low.
O3 — Mature, security-conscious CI — Strength/Low [Fact]
- What: 27 workflows; change-scoped test matrix; GitHub Actions pinned to full commit SHAs (e.g. actions/checkout@de0fac2e…); least-privilege permissions: contents: read; concurrency cancellation.
- Where: .github/workflows/check_diffs.yml:33–56; CLAUDE.md:310–312 (SHA-pin policy).
- Why it matters: Strong supply-chain hygiene.
- Severity: Low (informational).
Documentation
DOC1 — Docstrings are extensive and enforced — Strength [Fact]
- What: Google-style docstrings enforced via ruff pydocstyle; init_chat_model has a multi-hundred-line docstring with examples; security functions document Raises.
- Where: libs/*/pyproject.toml pydocstyle config; chat_models/base.py:218–474.
- Severity: Strength.
DOC2 — The SSRF bypass behavior is not surfaced in the public docstring — Low [Fact]
- What: validate_safe_url's docstring describes blocking private/metadata but not the LANGCHAIN_ENV test bypass (_ssrf_protection.py:69) nor the _policy.py:231 localhost allowance.
- Where: _ssrf_protection.py:47–63 vs :69–74; _policy.py:231.
- Severity: Low.
Strengths (preserve these)
- Dedicated, policy-based SSRF protection with IPv6/NAT64/cloud-metadata awareness: rare and valuable. (_policy.py)
- ruff ALL + mypy strict monorepo-wide quality bar.
- SHA-pinned GitHub Actions, least-privilege permissions, change-scoped CI.
- Bounded dependency ranges + per-package lockfiles and active CVE pinning.
- Deep unit-test footprint with network isolation (pytest-socket) and async-blocking detection (blockbuster).
- Strong, enforced documentation standards and contributor guidance (CLAUDE.md/AGENTS.md).
- Clean layered architecture (core → langchain → partners) with a deliberate classic/v1 split for migration.
4. Improvement Strategy (Phase 3)
Theme 1 — "Security guarantees should be end-to-end, not point-in-time"
- Explains: S1 (TOCTOU), S2 (env bypass), DOC2.
- Target state: SSRF validation pins the validated IP through to the actual socket connect (no second, unvalidated DNS resolution), the env bypass has exactly one well-documented condition, and all bypasses are documented in the public docstring.
- Principles: Time-of-check must equal time-of-use; least surprise; document security-relevant escape hatches.
Theme 2 — "Dangerous capabilities should be safe-by-default and opt-in"
- Explains: S3 (host shell default).
- Target state: The most dangerous middleware (host shell) requires an explicit execution policy or defaults to the strongest available sandbox; the host policy is a conscious opt-in.
- Principles: Secure defaults; principle of least privilege for agent tools.
Theme 3 — "Decompose the central God-files to protect velocity"
- Explains: A1, partially A2.
- Target state: runnables/base.py and the other 2k+-line core modules are split along cohesive seams (sync/async, declarative ops, schema) behind a stable public surface, with no public API changes.
- Principles: High cohesion / low coupling; keep public __init__ exports stable (CLAUDE.md's stable-interface rule).
Theme 4 — "Make the quality net uniform across the monorepo"
- Explains: O1 (missing pre-commit hooks), T2 (agent tests excluded from typing), Q2/Q3 (strictness TODOs).
- Target state: Every package present in libs/partners/ has a pre-commit hook; agent tests are type-checked; strictness TODOs are burned down or ticketed.
- Principles: Consistency reduces cognitive load and drift; the safety net should be strongest in the highest-churn area (agents).
Trade-offs — what NOT to fix now (and why)
- Do not change key_encoder default from SHA-1 (S4): it is a breaking change for existing indexes; the warning + usedforsecurity=False are adequate for now. Revisit at the next major version.
- Do not re-enable BLE/ANN401 globally overnight (Q1/Q3): it would generate large, low-signal churn across a 1.4M+ token codebase. Burn down per-package instead.
- Do not merge classic/v1/core (A3): the split is intentional for the v1 migration; consolidating now is high-risk and low-reward.
- Do not micro-optimize the SSRF blocklist (P1) or shell output loop (P2): both are bounded and not on a measured hot path; the code already flags where to optimize if profiling justifies it.
Definition of done (measurable signals)
- No High security findings remain (S1, S3 resolved or explicitly accepted with mitigations).
- SSRF subsystem has exactly one documented env bypass; a regression test asserts a rebinding-style scenario is blocked at connect time.
- ShellToolMiddleware has no implicit HostExecutionPolicy default (or a test asserting the documented opt-in).
- Every directory under libs/partners/ has a matching pre-commit hook (CI check passes).
- Agent test trees are type-checked (removed from mypy exclude) or each exclusion has a tracked ticket.
- runnables/base.py reduced below an agreed LOC budget with no public API diff (snapshot of __init__ exports unchanged).
5. Task Plan (Phase 4)
Workload: S = <2h · M = half day · L = 1–2 days · XL = needs breakdown.
⚡ Quick Wins (high-impact, S-effort, do immediately)
- QW1: Unify the SSRF env bypass + document it. Make _effective_allowed_hosts use the same single, narrow condition as validate_safe_url, and document the bypass in the public docstring. (S, low risk)
- QW2: Add pre-commit hooks for deepseek, openrouter, perplexity, xai. Mirror existing per-package hook blocks. (S, low risk)
- QW3: Collapse AGENTS.md/CLAUDE.md duplication to one source + a pointer, relying on check_agents_sync.yml. (S, low risk)
- QW4: Document the SHA-1 key_encoder default and recommend blake2b/sha256 in the index/aindex docstrings (no behavior change). (S, no risk)
Milestone 0 — Safety Net (do before refactoring)
M0.1 — Add SSRF rebinding regression tests
- Description: Add unit tests that simulate a host resolving to a public IP at validation and a private/metadata IP at "connect" time, asserting the request is blocked.
- Affected: libs/core/tests/unit_tests/_security/, _security/_ssrf_protection.py, _policy.py.
- Acceptance: Test fails against current code (demonstrating the gap), passes after S1 fix.
- Workload: M · Risk: Low · Depends on: none.
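One way such a test could simulate rebinding without touching the network is a stub resolver that answers differently on each lookup. The sketch below is self-contained and uses hypothetical names throughout (make_rebinding_resolver, check_then_fetch, check_and_pin); it models the two patterns rather than calling the repository's validators.

```python
import ipaddress
from collections.abc import Callable, Iterator

Resolver = Callable[[str], str]


def make_rebinding_resolver(answers: list[str]) -> Resolver:
    """Return a stub resolver that yields the next answer on every lookup."""
    remaining: Iterator[str] = iter(answers)
    return lambda _hostname: next(remaining)


def is_public(ip: str) -> bool:
    return ipaddress.ip_address(ip).is_global


def check_then_fetch(hostname: str, resolve: Resolver) -> str:
    """Vulnerable pattern: validate one lookup, then connect using a second one."""
    if not is_public(resolve(hostname)):
        raise ValueError("blocked at validation")
    return resolve(hostname)  # second, unvalidated resolution


def check_and_pin(hostname: str, resolve: Resolver) -> str:
    """Pinned pattern: the address that was validated is the one connected to."""
    ip = resolve(hostname)
    if not is_public(ip):
        raise ValueError("blocked at validation")
    return ip


answers = ["8.8.8.8", "169.254.169.254"]  # public first, metadata address second
print(check_then_fetch("attacker.example", make_rebinding_resolver(answers)))
print(check_and_pin("attacker.example", make_rebinding_resolver(answers)))
```

A regression test built this way would assert that the address finally used for the connection is never the second, non-public answer.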
M0.2 — Snapshot public API surface of langchain_core.runnables
- Description: Capture the exported names of runnables/__init__.py as a test fixture to guard the M2 refactor.
- Affected: libs/core/tests/unit_tests/runnables/.
- Acceptance: A test asserts the export set is unchanged.
- Workload: S · Risk: Low · Depends on: none.
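A snapshot guard of this kind can be quite small. The sketch below shows one possible shape using a synthetic module so that it runs on its own; exported_names and assert_exports_unchanged are hypothetical helpers, and in practice the module under test would be the runnables package and the snapshot a checked-in fixture.

```python
import types


def exported_names(module: types.ModuleType) -> frozenset[str]:
    """Public names of a module: __all__ if defined, else non-underscore attributes."""
    declared = getattr(module, "__all__", None)
    if declared is not None:
        return frozenset(declared)
    return frozenset(name for name in vars(module) if not name.startswith("_"))


def assert_exports_unchanged(module: types.ModuleType, snapshot: frozenset[str]) -> None:
    """Raise AssertionError describing any drift from the recorded snapshot."""
    current = exported_names(module)
    missing, added = snapshot - current, current - snapshot
    if missing or added:
        raise AssertionError(f"export drift: missing={sorted(missing)} added={sorted(added)}")


# Synthetic stand-in for the real package, used only to make the example runnable.
demo = types.ModuleType("demo")
demo.__all__ = ["Runnable", "RunnableSequence"]
assert_exports_unchanged(demo, frozenset({"Runnable", "RunnableSequence"}))
```

Reporting both missing and added names keeps the failure message actionable when a refactor moves a symbol between submodules.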
Milestone 1 — Critical Fixes (security & correctness)
M1.1 — Close the SSRF TOCTOU gap (IP pinning at connect) (TOP PRIORITY #1)
- Description: Wire validated IPs into the actual transport so the connection uses the IP that was validated, eliminating the second DNS resolution. Leverage the existing _security/_transport.py.
- Affected: _security/_transport.py, _security/_ssrf_protection.py, callers that fetch URLs.
- Acceptance: M0.1 rebinding test passes; existing SSRF tests pass; no public signature change to validate_safe_url.
- Workload: L · Risk: Medium (touches request path) · Depends on: M0.1.
M1.2 — Make ShellToolMiddleware safe-by-default (TOP PRIORITY #2)
- Description: Require an explicit execution_policy, OR default to the strongest available sandbox (CodexSandboxExecutionPolicy/DockerExecutionPolicy) when present, falling back to host only with an explicit flag.
- Affected: libs/langchain_v1/langchain/agents/middleware/shell_tool.py:508–571.
- Acceptance: Constructing the middleware without a policy does not silently grant host shell; a test asserts the documented default; docstring updated.
- Workload: M · Risk: Medium (default change is user-visible — follow CLAUDE.md stable-interface rule, use keyword-only + warn) · Depends on: none.
M1.3 — Unify & document the env bypass (QW1, promoted)
- Description: Single bypass condition + public docstring note.
- Affected: _policy.py:231, _ssrf_protection.py:69.
- Acceptance: One code path; test covers it; docstring documents it.
- Workload: S · Risk: Low · Depends on: none.
Milestone 2 — High-Leverage Improvements
M2.1 — Decompose runnables/base.py (TOP PRIORITY #3)
- Description: Split the 6,574-line module into cohesive submodules (e.g. base protocol, sync impl, async impl, declarative/config ops, schema) re-exported from runnables/__init__.py.
- Affected: libs/core/langchain_core/runnables/base.py (+ new submodules), runnables/__init__.py.
- Acceptance: M0.2 export snapshot unchanged; mypy strict + ruff pass; import time not regressed.
- Workload: XL (needs design breakdown) · Risk: Medium-High (most central abstraction) · Depends on: M0.2.
M2.2 — Type-check the agents test trees
- Description: Remove the mypy excludes for agents tests; fix resulting errors incrementally.
- Affected: libs/langchain_v1/pyproject.toml:112–117, :161–168; agent test files.
- Acceptance: mypy . passes without the excludes (or excludes reduced with tickets for the rest).
- Workload: L · Risk: Low · Depends on: none.
M2.3 — Single source of truth for the provider registry
- Description: Derive the inference prefix table and docstring provider list from _BUILTIN_PROVIDERS (or a generated check) to prevent drift.
- Affected: libs/langchain_v1/langchain/chat_models/base.py:38–100, :521–594.
- Acceptance: A test asserts inference table ⊆ registry; adding a provider requires one edit.
- Workload: M · Risk: Low · Depends on: none.
Milestone 3 — Quality & Polish
M3.1 — Burn down BLE (blind-except) per package
- Description: Re-enable BLE package-by-package; replace except BaseException/broad catches with specific exceptions or justified # noqa with a reason.
- Affected: ignore lists in libs/*/pyproject.toml; ~9 files in langchain_v1/langchain.
- Acceptance: BLE enabled for at least core + langchain_v1; remaining exceptions justified inline.
- Workload: L · Risk: Low-Medium · Depends on: none.
M3.2 — Burn down mypy strictness TODOs
- Description: Enable disallow_any_generics (core) and warn_return_any (v1), fixing fallout.
- Affected: libs/core/pyproject.toml:94, libs/langchain_v1/pyproject.toml:120.
- Acceptance: Flags enabled; mypy passes.
- Workload: L · Risk: Low · Depends on: M2.1 (touches same central code).
M3.3 — Make SHA-1 default explicit / plan migration
- Description: Keep SHA-1 default but document it loudly and schedule a default change for the next major.
- Affected: libs/core/langchain_core/indexing/api.py docstrings.
- Acceptance: Docstrings recommend stronger algorithms; a tracked issue exists for the major-version change.
- Workload: S · Risk: Low · Depends on: none.
Implementation sketches — Top 3 tasks
#1 — M1.1: Close the SSRF TOCTOU gap
- Approach: Resolve the hostname once, validate every resolved IP, then connect to the validated IP directly (passing the original hostname for TLS SNI / Host header). Implement via a custom requests/httpx/urllib transport adapter (a _transport.py already exists to build on).
- Key steps: (1) Extend the transport to accept a pre-validated IP set; (2) have validate_safe_url/validate_url return the validated IP(s), not just the string; (3) route fetches through the transport; (4) add the M0.1 rebinding test using a stub resolver.
- Pitfalls: Breaking TLS hostname verification if you connect by IP without preserving SNI; IPv6 literal formatting in the Host header; keeping validate_safe_url's public signature stable (return type must stay str, so expose IPs via a new internal function). Round-robin DNS / multiple A records must all be validated and the connect must use a validated one.
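The approach above can be sketched at the socket level with the standard library alone. This is a minimal illustration, assuming a plain TLS connection rather than a requests/httpx adapter; resolve_and_validate and connect_pinned are hypothetical names and do not describe what _transport.py currently does.

```python
import ipaddress
import socket
import ssl


def resolve_and_validate(hostname: str, port: int) -> list[str]:
    """Resolve once and return the addresses only if every one of them is public."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    addresses = sorted({info[4][0] for info in infos})
    for address in addresses:
        # Every record is checked, so round-robin DNS cannot hide a private IP.
        if not ipaddress.ip_address(address).is_global:
            raise ValueError(f"{hostname} resolves to non-public address {address}")
    return addresses


def connect_pinned(hostname: str, port: int = 443, timeout: float = 5.0) -> ssl.SSLSocket:
    """Connect to a validated IP while verifying TLS against the original hostname."""
    pinned_ip = resolve_and_validate(hostname, port)[0]
    # Connecting by IP means no second DNS lookup happens between check and use.
    raw = socket.create_connection((pinned_ip, port), timeout=timeout)
    context = ssl.create_default_context()
    try:
        # server_hostname keeps SNI and certificate verification tied to the hostname.
        return context.wrap_socket(raw, server_hostname=hostname)
    except BaseException:
        raw.close()  # cleanup only; the error is re-raised
        raise
```

An HTTP client built on top of this would still need to send the original hostname in the Host header, which is the IPv6-literal pitfall noted in the list above.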
#2 — M1.2: Safe-by-default shell middleware
- Approach: Change the constructor so an unspecified policy does not mean "host". Prefer a sandbox if available; otherwise require an explicit HostExecutionPolicy() (or an allow_host=True keyword-only flag) and emit a warning.
- Key steps: (1) Add keyword-only opt-in; (2) detect sandbox availability (Codex/Docker) and select it; (3) update class docstring + the post-exec-redaction warning; (4) add tests for each default path.
- Pitfalls: This is a user-visible behavior change — follow CLAUDE.md's stable-interface rule: introduce via keyword-only argument with a deprecation/transition warning rather than silently flipping the default; document clearly in release notes.
#3 — M2.1: Decompose runnables/base.py
- Approach: Identify cohesive seams (core Runnable/RunnableSerializable base, RunnableSequence/RunnableParallel, binding/config/declarative ops, schema generation) and move each into a submodule, re-exporting from runnables/__init__.py so the public surface is byte-identical.
- Key steps: (1) Land M0.2 export snapshot; (2) move one cohesive group at a time, running mypy+ruff+tests after each; (3) keep relative-import ban in mind (ruff ban-relative-imports = "all").
- Pitfalls: Circular imports between the split modules (use TYPE_CHECKING guards, already idiomatic here); import-time regressions; accidental changes to __all__. Because Runnable is the most depended-on abstraction, do this in small, individually-reviewable PRs, not one mega-diff.
End of report.