LangChain Monorepo — Audit Report
Date: 2026-06-17
Auditor: Claude Sonnet 4.6 (principal-engineer-level audit)
Scope: c:\CTRLNODE_EXAMPLE\.ctrlnode\langchain\langchain
1. Executive Summary
Overall Health Grade: B+
The LangChain Python monorepo is a mature, production-grade open-source framework for building LLM-powered applications and agents. The codebase shows significant engineering discipline: strict linting with ruff, static typing with mypy (strict mode), per-package isolation with uv, and a sophisticated CI/CD pipeline including lint, test, minimum-version, and integration test gates. The security posture is notably strong, with a dedicated _security module implementing DNS-aware SSRF protection, cloud metadata blocking, and a comprehensive PII redaction middleware — unusual for an open-source AI framework.
Top 3 Risks:
- Broad exception catching in core hot paths: runnables/base.py and language_models/chat_models.py have 16 and 8 broad except Exception catches respectively; swallowed errors make production debugging extremely difficult.
- Deserialization threat model acknowledged but high risk to adopters: load.py documents SSRF risks in the allowed_objects='core' default, but the default remains 'core', meaning unsuspecting integrators calling load() on partially-trusted inputs face constructor side effects, including network calls.
- Dual package identity (langchain vs langchain-classic): Two active packages (langchain_v1 → published as langchain, libs/langchain → published as langchain-classic) create confusion; the classic package depends on SQLAlchemy>=1.4 and requests, adding heavy transitive dependencies even when unused.
Top 3 Opportunities:
- Tighten load() default to 'messages': A one-line change dramatically reduces inadvertent SSRF exposure for the majority of adopters who deserialize chat history.
- Introduce structured error hierarchy: Replace ad-hoc except Exception with domain-specific exception types, enabling callers to distinguish retriable vs fatal errors.
- Enable disallow_any_generics and warn_return_any in mypy strict mode: Both are explicitly disabled with TODO comments; enabling them would catch a class of runtime type errors before deployment.
2. Repository Map
Purpose
LangChain is an agent engineering platform — a Python framework enabling developers to build LLM-powered applications and autonomous agents. Target users are AI application developers integrating models from OpenAI, Anthropic, Google, Ollama, and others. Maturity: production-stable (Development Status :: 5 in all packages).
Tech Stack
| Layer | Technology |
|---|---|
| Language | Python 3.10–3.14 |
| Package management | uv (lockfile per package) |
| Build backend | hatchling |
| Linting | ruff (ALL rules, selective ignore) |
| Type checking | mypy (strict, per package) |
| Testing | pytest + pytest-asyncio + syrupy (snapshots) + blockbuster |
| Serialization | pydantic v2 + custom Serializable base |
| Async | asyncio throughout; both sync and async method pairs |
| CI | GitHub Actions (matrix: lint, unit test, integration test, min-version test) |
| Observability | LangSmith integration (tracing, callbacks) |
| Dependency graphs | langgraph (agent orchestration) |
Architectural Sketch
User Code
│
▼
langchain (libs/langchain_v1) ← public API, agents, chat_models, tools
│ depends on
▼
langchain-core (libs/core) ← base abstractions, Runnable, callbacks, messages
│ depends on
▼
langsmith + langgraph + pydantic ← observability, graph execution, validation
│
▼
Partner packages (libs/partners/*) ← OpenAI, Anthropic, Ollama, Groq, etc.
│ tested by
▼
langchain-tests (libs/standard-tests) ← shared test suite for integrations
Key Directories
| Directory | Description |
|---|---|
| libs/core/langchain_core/ | Base abstractions: Runnable, messages, callbacks, prompts, output parsers, tools, SSRF security |
| libs/langchain_v1/langchain/ | Active langchain package: agents, middleware, chat_models, tools |
| libs/langchain/ | Legacy langchain-classic package (no new features) |
| libs/partners/ | 15 first-party provider integrations (openai, anthropic, ollama, groq, etc.) |
| libs/standard-tests/ | Shared integration test suite for partner packages |
| libs/model-profiles/ | Model capability profiles (context windows, feature flags) |
| libs/text-splitters/ | Document chunking utilities |
| .github/workflows/ | 20+ CI/CD workflow files |
| .github/scripts/ | PR labeling, diff detection, min-version calculation |
Surprises
- libs/langchain (classic) is frozen; libs/langchain_v1 is the active langchain package, so the naming is confusing.
- The lockfile check step in _lint.yml is commented out (lines 50-54), meaning lockfile drift is not enforced in CI.
- A _security module with SSRF and PII protection is present in langchain-core, which is unusually sophisticated for an OSS AI library.
- mypy strict mode is on per package, but disallow_any_generics (core) and warn_return_any (core and langchain_v1) are explicitly disabled with TODO comments.
3. Audit Report
Dimension 1: Architecture & Design
Finding A1 — Dual Package Identity Confusion
Severity: Medium
Fact: libs/langchain/ publishes as langchain-classic (v1.0.7), libs/langchain_v1/ publishes as langchain (v1.3.6). The CLAUDE.md refers to langchain-classic as "legacy, no new features" but both directories coexist, share similar naming patterns, and both reference each other in uv.sources.
File: libs/langchain/pyproject.toml:6 (name = "langchain-classic"), libs/langchain_v1/pyproject.toml:6 (name = "langchain")
Why it matters: Contributors and external integrators risk importing from the wrong package silently. The libs/langchain directory name suggests it is langchain but it is the legacy package.
Finding A2 — langchain-classic Carries Heavy Unused Dependencies
Severity: Medium
Fact: langchain-classic declares SQLAlchemy>=1.4.0, requests>=2.0.0, async-timeout as hard runtime dependencies — all legacy baggage.
File: libs/langchain/pyproject.toml:26-34
Why it matters: Users installing langchain-classic transitively pull SQLAlchemy and requests even if they only need prompt templates. This inflates install size and attack surface.
Finding A3 — runnables/base.py is a God Object
Severity: Medium
Judgment: The Runnable base class in libs/core/langchain_core/runnables/base.py imports from 20+ modules (callbacks, tracers, serialization, utilities, output parsers, prompt templates) within 120 lines of imports. The file is the central orchestration hub for all streaming, batching, retry, and composition logic.
File: libs/core/langchain_core/runnables/base.py:1-120
Why it matters: High coupling means any change risks cascading breakage; it is difficult to unit-test in isolation.
Finding A4 — Direct langgraph Internal Import
Severity: Low
Fact: libs/langchain_v1/langchain/agents/factory.py imports langgraph._internal._runnable.RunnableCallable — a private internal module.
File: libs/langchain_v1/langchain/agents/factory.py:22
Why it matters: Private imports break on minor langgraph version bumps without a SemVer contract; this is a latent breakage vector.
Dimension 2: Code Quality
Finding Q1 — Broad Exception Catching in Hot Paths
Severity: High
Fact: runnables/base.py has 16 occurrences of except Exception catch blocks. language_models/chat_models.py has 8. language_models/llms.py has 7. callbacks/manager.py has 7.
Files: libs/core/langchain_core/runnables/base.py, libs/core/langchain_core/language_models/chat_models.py, libs/core/langchain_core/callbacks/manager.py
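Why it matters: except Exception does not intercept KeyboardInterrupt or SystemExit (both derive from BaseException), but it does catch almost everything else, including programming errors such as TypeError and AttributeError. When those are caught and not re-raised, callers cannot distinguish recoverable from fatal errors. Debugging production failures requires clear exception chaining.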
Finding Q2 — mypy Strict Mode Partially Disabled
Severity: Medium
Fact: libs/core/pyproject.toml:95 has disallow_any_generics = false with a TODO comment. libs/langchain_v1/pyproject.toml:120 and libs/langchain/pyproject.toml:149-150 disable warn_return_any and disallow_any_generics.
Files: libs/core/pyproject.toml:94-95, libs/langchain_v1/pyproject.toml:119-120
Why it matters: These disabled checks mean Any-typed returns and generic containers slip through static analysis, hiding runtime type errors that only manifest with real data.
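As a small illustration of the class of bug that warn_return_any is meant to surface, consider the sketch below. Both function names are hypothetical; the example relies only on the fact that json.loads is typed as returning Any.

```python
import json
from typing import Any


def parse_count_unchecked(raw: str) -> int:
    # json.loads returns Any, so mypy accepts this return unless
    # warn_return_any is enabled; the declared int is not guaranteed.
    payload: Any = json.loads(raw)
    return payload["count"]


def parse_count_checked(raw: str) -> int:
    # Narrow the Any value explicitly so the annotation holds at runtime.
    payload = json.loads(raw)
    count = payload["count"]
    if isinstance(count, bool) or not isinstance(count, int):
        raise TypeError(f"expected int for 'count', got {type(count).__name__}")
    return count
```

Given the input {"count": "3"}, the unchecked version returns a string despite its int annotation, while the checked version raises TypeError at the boundary.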
Finding Q3 — 33 TODO/FIXME/HACK Markers in Core
Severity: Low
Fact: Grepping TODO|FIXME|HACK|XXX across langchain_core/ yields 33 occurrences across 22 files.
Files: libs/core/langchain_core/language_models/model_profile.py:4, libs/core/langchain_core/messages/content.py:5, and 20 others
Why it matters: Unresolved TODOs in production code accumulate as technical debt; some may gate correctness improvements (e.g., the mypy TODO: activate for 'strict' checking comments).
Finding Q4 — Duplicate Awaitable Import in TYPE_CHECKING Block
Severity: Low
Fact: libs/langchain_v1/langchain/agents/middleware/types.py imports Awaitable from collections.abc twice: once at the top (line 5) and once inside if TYPE_CHECKING: (line 20).
File: libs/langchain_v1/langchain/agents/middleware/types.py:5,20
Why it matters: Minor code smell; the duplicate import is harmless but indicates the file was edited without a linter pass catching the redundancy (or the linter ignore is covering it).
Dimension 3: Security
Finding S1 — Deserialization Default Exposes SSRF (Acknowledged)
Severity: High
Fact: load.py clearly documents that allowed_objects='core' (the current default) is "unsafe with untrusted manifests" and can trigger network calls, file operations, or SSRF via constructor kwargs. The default has not been changed to the safer 'messages'.
File: libs/core/langchain_core/load/load.py:1-65
Why it matters: Adopters calling load()/loads() on chat history payloads that arrive from external systems (webhooks, databases, user uploads) are exposed unless they explicitly override allowed_objects. The threat model is documented but the dangerous default remains.
Finding S2 — SSRF Protection Excellent but Has Test-Environment Bypass
Severity: Medium
Fact: _ssrf_protection.py:68-74 contains a bypass: if LANGCHAIN_ENV == "local_test" and the hostname starts with "test" and contains "server", URL validation is skipped entirely.
File: libs/core/langchain_core/_security/_ssrf_protection.py:68-74
Why it matters: If an attacker can set LANGCHAIN_ENV=local_test (e.g., via environment variable injection in a serverless or container context), they can bypass SSRF protection for any URL with test*server in the hostname.
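To make the exposure concrete, the condition described above can be approximated as below. This is reconstructed from the finding's wording, not copied from _ssrf_protection.py; the function name bypass_applies and the environ parameter are assumptions for illustration.

```python
import os
from collections.abc import Mapping


def bypass_applies(hostname: str, environ: Mapping[str, str] = os.environ) -> bool:
    # Approximation of the described check: env var set, hostname starts
    # with "test" and contains "server" anywhere after that.
    return (
        environ.get("LANGCHAIN_ENV") == "local_test"
        and hostname.startswith("test")
        and "server" in hostname
    )
```

Under this reading, a hostname such as test.internal-server.example would match just as testserver does, which is why a substring pattern is a weak gate.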
Finding S3 — subprocess.Popen in Shell Middleware with # noqa: S603
Severity: Medium
Fact: _execution.py suppresses the S603 (subprocess-without-shell-equals-true) lint rule on subprocess.Popen. While the command is passed as a list (which is generally safe), the code accepts Mapping[str, str] environment overrides from callers that could influence child process behavior.
File: libs/langchain_v1/langchain/agents/middleware/_execution.py:35
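Why it matters: Shell middleware executing subprocess commands is a high-risk surface. The # noqa: S603 suppression silences ruff's untrusted-input check on that call, so a future edit that passes unvalidated arguments or environment overrides would go unflagged. An accidental shell=True would be reported under a separate rule (S602) and is not covered by this suppression.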
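One possible mitigation is to validate caller-supplied environment overrides against an allowlist before spawning the child. The sketch below is an assumption about how that could look, not the middleware's actual design; ALLOWED_ENV_OVERRIDES, build_child_env, and run_command are hypothetical names.

```python
import os
import subprocess
import sys
from collections.abc import Mapping, Sequence

# Hypothetical allowlist; the real middleware may need a different set.
ALLOWED_ENV_OVERRIDES = frozenset({"LANG", "LC_ALL", "TZ"})


def build_child_env(overrides: Mapping[str, str]) -> dict[str, str]:
    # Reject keys such as LD_PRELOAD or PATH that change what the child loads or runs.
    rejected = set(overrides) - ALLOWED_ENV_OVERRIDES
    if rejected:
        raise ValueError(f"env overrides not permitted: {sorted(rejected)}")
    env = dict(os.environ)
    env.update(overrides)
    return env


def run_command(argv: Sequence[str], overrides: Mapping[str, str]) -> str:
    # argv is passed as a list, so no shell parses it; shell=True is never used.
    completed = subprocess.run(
        list(argv),
        env=build_child_env(overrides),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


if __name__ == "__main__":
    # Prints "UTC" because TZ is on the allowlist.
    script = "import os; print(os.environ['TZ'])"
    print(run_command([sys.executable, "-c", script], {"TZ": "UTC"}), end="")
```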
Finding S4 — No Rate Limiting on DNS Resolution in SSRF Validation
Severity: Low
Fact: validate_url in _policy.py:238 and validate_safe_url in _ssrf_protection.py:86 both perform synchronous DNS resolution on every call with no caching or rate limiting.
File: libs/core/langchain_core/_security/_policy.py:238-268, libs/core/langchain_core/_security/_ssrf_protection.py:84-105
Why it matters: In high-throughput scenarios, repeated DNS resolution for the same hostname adds latency. More importantly, an attacker supplying many unique hostnames could force a large volume of lookups and exhaust the resolver or the validating service's worker threads.
Dimension 4: Testing
Finding T1 — Lockfile Verification Commented Out in CI
Severity: High
Fact: The lockfile check step in _lint.yml:50-54 is commented out: # - name: "🔒 Verify Lockfile is Up-to-Date".
File: libs/core/.github/workflows/_lint.yml:50-54 (also at repo-root .github/workflows/_lint.yml:50-54)
Why it matters: Without lockfile verification, uv.lock can drift from pyproject.toml. The next uv sync by a developer or CI runner could silently upgrade or downgrade dependencies, breaking reproducibility and introducing untested version combinations.
Finding T2 — Integration Tests Require Manual Trigger
Severity: Medium
Fact: Integration tests are not run on every PR; they require manual scheduling or explicit trigger. The integration_tests.yml is a separate workflow.
File: .github/workflows/integration_tests.yml
Why it matters: Partner integrations (OpenAI, Anthropic, etc.) can break on model API changes between manual runs. PRs may merge with undetected integration regressions.
Finding T3 — No Observed Coverage Gate in CI
Severity: Medium
Judgment: No CI step enforces a minimum test coverage threshold. tool.coverage.run is configured in pyproject.toml (omitting tests) but no --cov-fail-under or coverage reporting step was found in the workflow files.
Files: libs/core/pyproject.toml:143-144, .github/workflows/_test.yml
Why it matters: Coverage can silently regress. Without a gate, new code paths can ship without test coverage, especially in complex middleware like factory.py and pii.py.
Dimension 5: Performance
Finding P1 — Synchronous DNS Resolution Blocks Async Code
Severity: Medium
Fact: validate_safe_url in _ssrf_protection.py:86 calls socket.getaddrinfo() synchronously. The async variant in _policy.py:260 correctly wraps DNS resolution with asyncio.to_thread(), but the sync wrapper used in Pydantic validators runs on the event loop thread if called from async code paths.
File: libs/core/langchain_core/_security/_ssrf_protection.py:86
Why it matters: If SSRF validation is invoked during object construction in an async context (e.g., as a Pydantic BeforeValidator), it will block the event loop for the duration of DNS resolution.
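The pattern the finding credits to the async variant can be sketched as follows. The function names resolve_sync and resolve_async are hypothetical; only socket.getaddrinfo and asyncio.to_thread are real standard-library calls.

```python
import asyncio
import socket


def resolve_sync(host: str, port: int) -> list[str]:
    # Blocking call: fine in sync code, but it stalls the event loop
    # if invoked directly from a coroutine.
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


async def resolve_async(host: str, port: int) -> list[str]:
    # Run the blocking resolver in a worker thread so the loop stays responsive.
    return await asyncio.to_thread(resolve_sync, host, port)
```

A validator that can be reached from async code would need to route through something like resolve_async, or be documented as sync-only.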
Finding P2 — _ip_in_blocked_networks Iterates Full Blocklist on Every Call
Severity: Low
Fact: _ip_in_blocked_networks in _policy.py:138 iterates over 14 IPv4 and 8 IPv6 networks sequentially with a comment noting lru_cache could be used.
File: libs/core/langchain_core/_security/_policy.py:138-183 (note on line 144)
Why it matters: In high-throughput URL-fetch scenarios (e.g., retrieval pipelines), this is called per URL. The code itself acknowledges this and suggests memoization, but it has not been implemented.
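A minimal sketch of the memoization the code comment reportedly suggests is shown below. The network list is an illustrative subset, and ip_is_blocked is a hypothetical name rather than the module's actual function.

```python
import ipaddress
from functools import lru_cache

# Illustrative subset only; the audited module reportedly lists
# 14 IPv4 and 8 IPv6 networks.
BLOCKED_NETWORKS = tuple(
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8",
        "127.0.0.0/8",
        "169.254.0.0/16",
        "192.168.0.0/16",
        "::1/128",
        "fc00::/7",
    )
)


@lru_cache(maxsize=4096)
def ip_is_blocked(ip_text: str) -> bool:
    ip = ipaddress.ip_address(ip_text)
    # Compare only against networks of the same IP version.
    return any(ip.version == net.version and ip in net for net in BLOCKED_NETWORKS)
```

Keying the cache on the string form keeps the function hashable-friendly; because the blocklist is static, cached answers do not go stale.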
Dimension 6: Dependencies
Finding D1 — CVE-Motivated Constraint on pygments in pyproject
Severity: High
Fact: libs/core/pyproject.toml:82 contains constraint-dependencies = ["pygments>=2.20.0"] with comment # CVE-2026-4539. Similarly libs/langchain_v1/pyproject.toml:96 constrains urllib3>=2.6.3.
Files: libs/core/pyproject.toml:81-82, libs/langchain_v1/pyproject.toml:95-96
Why it matters: These CVE constraints are applied via uv constraint-dependencies (which only affect the resolved environment, not published package metadata). Downstream users who install langchain-core without using the provided lockfile may not get the patched versions. The CVEs should also be documented in a SECURITY.md or release notes.
Finding D2 — langsmith Pin Allows Pre-1.0 with Wide Range
Severity: Low
Fact: langchain-core depends on langsmith>=0.3.45,<1.0.0. LangSmith is the core tracing/observability dependency; a wide pre-1.0 range could introduce breaking API changes between minor versions.
File: libs/core/pyproject.toml:27
Why it matters: Pre-1.0 packages conventionally allow breaking changes in minor versions. The wide range means langsmith 0.4.0 (hypothetical) could break tracing without a version bump in langchain-core.
Finding D3 — blockbuster Pinned to Very Narrow Range in Test Deps
Severity: Low
Fact: blockbuster>=1.5.18,<1.6.0 (core) and blockbuster>=1.5.26,<1.6.0 (langchain_v1) are very tightly pinned.
Files: libs/core/pyproject.toml:70, libs/langchain_v1/pyproject.toml:73
Why it matters: Overly tight test-dependency pins can cause dependency resolution to fail if the matching releases are yanked, and they delay adoption of fixes released outside the range. This is a minor operational friction issue.
Dimension 7: Developer Experience & Operations
Finding O1 — Lockfile Verification Disabled (Repeat)
Severity: High (cross-cutting with T1)
Fact: _lint.yml:50-54 has the lockfile check commented out.
Why it matters: Reproducibility of builds is not enforced; developers checking out the repo may silently get different dependency resolutions than CI.
Finding O2 — No SECURITY.md
Severity: Medium
Fact: No SECURITY.md file exists at the repo root or in any libs/ package. CVE constraints are silently embedded in pyproject.toml.
Files: (absence of SECURITY.md verified by directory scan)
Why it matters: Security researchers have no documented disclosure path. Known CVE constraints (pygments CVE-2026-4539, urllib3) are not communicated to downstream users.
Finding O3 — PR_LINT Allows Certain Test Files to Skip ALL Rules
Severity: Low
Fact: libs/langchain_v1/pyproject.toml:168 sets "tests/unit_tests/agents/test_react_agent.py" = ["ALL"] in ruff per-file ignores, completely disabling linting for that test file.
File: libs/langchain_v1/pyproject.toml:168
Why it matters: One test file entirely escapes quality enforcement. This sets a precedent and may mask issues.
Dimension 8: Documentation
Finding Doc1 — README References gpt-5.4 as a Model Name
Severity: Medium
Fact: README.md:40 uses init_chat_model("openai:gpt-5.4") as the quickstart example. As of June 2026, gpt-5.4 is not a known GA OpenAI model name.
File: langchain/README.md:40
Why it matters: New users copy-paste the quickstart and get an API error immediately. This damages the onboarding experience and contradicts CLAUDE.md's instruction to always use verified GA model names.
Finding Doc2 — CLAUDE.md Instructs AI Agents on Model References but README Violates It
Severity: Low
Fact: CLAUDE.md:248-253 says "Always use the latest generally available (GA) models… do not rely on memorized or cached model names." The README.md:40 uses gpt-5.4, which may not be valid.
Files: CLAUDE.md:248-253, README.md:40
Why it matters: The project's own AI-agent guidance contradicts the public-facing quickstart, creating confusion about which model names to use.
Strengths
- Rigorous linting config: ruff with select = ["ALL"] and targeted ignores shows genuine commitment to code quality rather than permissive silence.
- Strict mypy across all packages: strict = true is set in every pyproject.toml; this is rare in large Python OSS projects.
- SSRF protection module: A dedicated _security sub-package with DNS-aware IP blocklisting, cloud metadata endpoint protection, NAT64 unwrapping, and K8s internal DNS blocking is genuinely best-in-class for an OSS AI framework.
- PII middleware: The PIIMiddleware and _PIIStreamTransformer provide comprehensive PII redaction across streaming and non-streaming surfaces (AI messages, tool calls, state snapshots) with multiple strategies.
- Standard test suite: langchain-tests provides a shared, standardized integration test suite that partner implementations must pass, ensuring consistent behavior across integrations.
- CI rigor: Minimum-version testing (testing with lowest allowed dependency versions) is a strong practice that catches compatibility regressions early.
- Deserialization threat model documented: load.py documents its SSRF threat model clearly in module-level docstrings; this is unusually transparent for a library.
- Reproducible builds: Per-package uv.lock files with SHA-pinned dependencies ensure reproducible environments when lockfiles are used.
- Conventional commits enforced: PR title linting enforces Conventional Commits with scopes, improving changelog automation and traceability.
- SHA-pinned GitHub Actions: All CI action references use full commit SHAs rather than tags, preventing supply-chain attacks via action tag mutation.
4. Improvement Strategy
Theme 1: "Exception Handling is Ad-Hoc"
Evidence: 80 broad except Exception occurrences in core; no domain exception hierarchy beyond SSRFBlockedError and PIIDetectionError.
Target State: A domain exception hierarchy under langchain_core.exceptions (e.g., LangChainError > RetriableError, FatalError, ConfigurationError). Core hot paths catch specific exceptions and re-raise with context.
Principles: Fail loudly with actionable context; distinguish recoverable from fatal errors at the boundary.
Done when: Zero broad except Exception handlers remain in runnables/base.py, chat_models.py, callbacks/manager.py; mypy passes with the new exception types in place.
Theme 2: "Security Defaults Favor Convenience Over Safety"
Evidence: load() defaults to allowed_objects='core' (unsafe with untrusted input). validate_safe_url bypass activatable via env var. DNS resolution not rate-limited.
Target State: load() defaults to allowed_objects='messages' with a migration guide for users needing 'core'. The test-environment bypass requires an explicit allowlist rather than a string prefix pattern. DNS resolution is cached with a short TTL.
Principles: Secure by default; explicit opt-in for dangerous operations.
Done when: No Critical/High security findings; security CHANGELOG entry per change.
Theme 3: "Type Safety Has Known Gaps"
Evidence: disallow_any_generics = false in core; warn_return_any = false in core and langchain_v1; 33 TODOs including mypy activation TODOs.
Target State: All mypy strict checks enabled. Any usage explicitly annotated with # type: ignore[misc] with a comment explaining why.
Principles: Types are executable documentation; every Any is a potential runtime failure.
Done when: disallow_any_generics = true and warn_return_any = true in all pyproject.toml; CI mypy passes.
Theme 4: "Build Reproducibility is Not Enforced"
Evidence: Lockfile check commented out in CI. No coverage gate.
Target State: Lockfile verification re-enabled in CI. Coverage gate at ≥75% for core package.
Principles: CI must be the single source of truth for "this build is correct."
Done when: _lint.yml lockfile check is uncommented and passing; coverage threshold fails CI if dropped below.
Theme 5: "Documentation Has Accuracy Gaps"
Evidence: README.md uses an unverifiable model name. No SECURITY.md. CVE constraints are invisible to downstream users.
Target State: README uses init_chat_model("openai:gpt-4o") or another verified GA model. SECURITY.md with responsible disclosure process. Release notes call out known CVE mitigations.
Done when: README quickstart passes uv run without model-not-found errors. SECURITY.md exists at repo root.
Trade-offs — What NOT to Fix Now
- runnables/base.py God object decomposition: High effort, extreme breakage risk. The Runnable abstraction is intentionally the composition hub; breaking it apart would require a major version.
- Changing langchain-classic transitive dependencies: Would be a breaking change for existing langchain-classic users. Freeze it and let it sunset naturally.
- Enforcing per-IP DNS cache TTL in SSRF validation: Requires introducing a stateful cache that complicates the pure-function design of the security module. Low reward vs effort.
- Moving all 33 TODOs to issues: Mechanical work with limited engineering value. Triage them, close the non-actionable ones.
5. Task Plan
Quick Wins (High Impact, S-effort)
| # | Title | Effort | Risk |
|---|---|---|---|
| QW1 | Fix README quickstart model name | S | None |
| QW2 | Uncomment lockfile verification step in CI | S | Low |
| QW3 | Add SECURITY.md with CVE disclosure process | S | None |
| QW4 | Remove duplicate Awaitable import in types.py | S | None |
Milestone 0 — Safety Net
Task M0-1: Re-enable Lockfile Verification in CI
Description: Uncomment the disabled lockfile check in _lint.yml:50-54.
Affected files: .github/workflows/_lint.yml
Acceptance criteria: CI fails on any PR where uv.lock is not in sync with pyproject.toml; verified by making a test change without updating the lockfile.
Workload: S
Risk: Low (may surface existing drift that needs a one-time uv lock run)
Dependencies: None
Task M0-2: Add Core Path Unit Tests for SSRF Validation
Description: Verify existing _security tests cover the env-var bypass path and NAT64 unwrapping edge cases.
Affected files: libs/core/tests/unit_tests/_security/ (create if absent)
Acceptance criteria: Tests exist for: (a) env-var bypass activation, (b) NAT64 prefix unwrapping, (c) K8s internal suffix blocking, (d) cloud metadata hostname blocking.
Workload: M
Risk: None
Dependencies: None
Task M0-3: Add Coverage Gate to Core CI
Description: Add --cov-fail-under=75 to the make test invocation in libs/core/Makefile.
Affected files: libs/core/Makefile
Acceptance criteria: CI build for libs/core fails if line coverage drops below 75%.
Workload: S
Risk: Low (may require writing tests to meet threshold initially)
Dependencies: None
Milestone 1 — Critical Fixes
Task M1-1: Tighten load() Default to allowed_objects='messages'
Description: Change the default value of allowed_objects in load() and loads() from 'core' to 'messages'. Update docstring to explain the change. Add migration note.
Affected files: libs/core/langchain_core/load/load.py
Acceptance criteria: loads(json_str) without explicit allowed_objects only deserializes message classes. Existing unit tests updated. A deprecation warning is emitted when 'core' or 'all' is passed without explicit opt-in flag.
Workload: M
Risk: High (breaking change for users relying on default deserialization of non-message objects — requires semver bump)
Dependencies: M0-1 (need stable CI before merging breaking changes)
Implementation Sketch:
- In load.py, change the function signature: def load(..., allowed_objects: AllowedObjects = 'messages', ...).
- Update the module docstring to note the default change.
- Add an if allowed_objects in ('core', 'all') warning block emitting LangChainDeprecationWarning with guidance to pass allowed_objects explicitly.
- Update all existing unit tests in tests/unit_tests/load/test_serializable.py and test_secret_injection.py to pass allowed_objects='core' explicitly where needed.
- Pitfall: Partners using load() internally to reconstruct objects from LangSmith traces may break silently; grep for load( / loads( in libs/partners/ before releasing.
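The intended behavior of the new default can be sketched in isolation. This is a toy model, not the real load.py: the class-name sets are invented, the function returns its input instead of instantiating anything, and the standard DeprecationWarning stands in for LangChainDeprecationWarning.

```python
import warnings
from typing import Any, Literal

AllowedObjects = Literal["messages", "core", "all"]

# Hypothetical class-name sets; the real load.py resolves against its own mappings.
_ALLOWED: dict[str, frozenset[str]] = {
    "messages": frozenset({"HumanMessage", "AIMessage"}),
    "core": frozenset({"HumanMessage", "AIMessage", "PromptTemplate"}),
}


def load(obj: dict[str, Any], *, allowed_objects: AllowedObjects = "messages") -> dict[str, Any]:
    if allowed_objects in ("core", "all"):
        # Stand-in for LangChainDeprecationWarning.
        warnings.warn(
            f"allowed_objects={allowed_objects!r} can run constructors with side "
            "effects; use it only on trusted input.",
            DeprecationWarning,
            stacklevel=2,
        )
    allowed = _ALLOWED.get(allowed_objects)
    # "all" has no allowlist in this sketch, so nothing is rejected for it.
    if allowed is not None and obj.get("type") not in allowed:
        raise ValueError(f"{obj.get('type')!r} is not permitted under {allowed_objects!r}")
    return obj  # a real loader would instantiate the class here
```

Callers that deserialize only chat history keep working unchanged, while anything broader has to opt in and sees a warning.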
Task M1-2: Harden SSRF Test-Environment Bypass
Description: Replace the string-prefix bypass in validate_safe_url with an explicit allowlist mechanism. The env-var LANGCHAIN_ENV=local_test should only bypass validation for hosts explicitly in policy.allowed_hosts.
Affected files: libs/core/langchain_core/_security/_ssrf_protection.py:68-74, libs/core/langchain_core/_security/_policy.py:229-235
Acceptance criteria: validate_safe_url("http://testserver", allow_private=False) raises ValueError unless testserver is in policy.allowed_hosts. Existing test helpers that rely on this bypass are updated to use allowed_hosts=frozenset({"testserver"}).
Workload: M
Risk: Medium (test infrastructure that relies on the bypass needs updating)
Dependencies: M0-2
Implementation Sketch:
- Remove lines 68-74 of _ssrf_protection.py.
- Update _effective_allowed_hosts in _policy.py:228-235 to add "testserver" and "localhost" when LANGCHAIN_ENV starts with "local", keeping the intent but removing the pattern-match bypass.
- Update any integration tests using http://testserver* URLs to either use allowed_hosts on their policy or switch to allow_private=True.
- Pitfall: Some Django/FastAPI test runners use testserver as a hostname; ensure the framework-level hosts list covers this.
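The exact-match approach in the sketch above could look roughly like this. The names TEST_HOSTS, effective_allowed_hosts, and host_is_exempt are assumptions for illustration and may not match the real _policy.py.

```python
import os
from collections.abc import Mapping

# Hosts added only in local environments; exact names, no patterns.
TEST_HOSTS = frozenset({"testserver", "localhost"})


def effective_allowed_hosts(
    allowed_hosts: frozenset[str],
    environ: Mapping[str, str] = os.environ,
) -> frozenset[str]:
    if environ.get("LANGCHAIN_ENV", "").startswith("local"):
        return allowed_hosts | TEST_HOSTS
    return allowed_hosts


def host_is_exempt(
    hostname: str,
    allowed_hosts: frozenset[str] = frozenset(),
    environ: Mapping[str, str] = os.environ,
) -> bool:
    # Exact, case-insensitive match only; substrings never qualify.
    return hostname.lower() in effective_allowed_hosts(allowed_hosts, environ)
```

Because membership is an exact set lookup, a hostname like test.internal-server.example no longer slips through in a local environment.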
Task M1-3: Add SECURITY.md
Description: Create a SECURITY.md at the repo root documenting the responsible disclosure process, known CVE mitigations (pygments CVE-2026-4539, urllib3), and the SSRF/deserialization threat model.
Affected files: SECURITY.md (new)
Acceptance criteria: SECURITY.md is present; GitHub recognizes it as the security policy (shows in Security tab).
Workload: S
Risk: None
Dependencies: None
Milestone 2 — High-Leverage Improvements
Task M2-1: Define Domain Exception Hierarchy in langchain_core.exceptions
Description: Introduce a structured exception hierarchy: LangChainError as base, with subclasses RetriableError, ConfigurationError, SerializationError, ToolError. Replace the most impactful except Exception catch sites in runnables/base.py, chat_models.py, and callbacks/manager.py with specific catches.
Affected files: libs/core/langchain_core/exceptions.py, libs/core/langchain_core/runnables/base.py, libs/core/langchain_core/language_models/chat_models.py, libs/core/langchain_core/callbacks/manager.py
Acceptance criteria: Top 20 except Exception sites in core replaced with typed catches. langchain_core.exceptions exports the new hierarchy. mypy confirms no type errors.
Workload: L
Risk: Medium (callers that catch specific existing exception types may stop matching if core starts raising new types; deriving LangChainError from Exception keeps catch-all handlers working)
Dependencies: M0-3 (need coverage gate to verify no regressions)
Implementation Sketch:
- In exceptions.py, add: class LangChainError(Exception): ..., class RetriableError(LangChainError): ..., etc.
- In runnables/base.py, replace patterns like except Exception as e: logger.error(...) with except (RetriableError, ToolError) as e: ... where the intent is clear from context.
- In chat_models.py, the 8 broad catches likely guard LLM API call failures; replace with except (httpx.HTTPError, RetriableError) as e: ....
- Pitfall: Some except Exception blocks re-raise via raise (correct) vs silently swallow (incorrect). The swallowing cases are the priority; identify them with grep -A3 "except Exception" and check for raise presence.
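A minimal sketch of the proposed hierarchy, and of how a caller could use it to separate retriable from fatal failures, is given below. The class names follow the task description; invoke_with_retry is a hypothetical helper added for illustration, and none of this is existing langchain-core code.

```python
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


class LangChainError(Exception):
    """Proposed base class; deriving from Exception keeps catch-all handlers working."""


class RetriableError(LangChainError):
    """Transient failure, for example a timeout or rate limit."""


class ConfigurationError(LangChainError):
    """Caller-side misconfiguration; retrying will not help."""


class SerializationError(LangChainError):
    """Failure while loading or dumping a serialized object."""


class ToolError(LangChainError):
    """Failure raised by a tool invocation."""


def invoke_with_retry(fn: Callable[[], T], attempts: int = 3) -> T:
    # Retry only on RetriableError; every other error propagates immediately.
    if attempts < 1:
        raise ConfigurationError("attempts must be >= 1")
    for remaining in range(attempts - 1, -1, -1):
        try:
            return fn()
        except RetriableError:
            if remaining == 0:
                raise
    raise AssertionError("unreachable")
```

The retry helper never catches Exception itself, so a ConfigurationError or an unrelated bug surfaces on the first attempt.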
Task M2-2: Enable warn_return_any in mypy Strict Mode
Description: Enable warn_return_any = true in libs/langchain_v1/pyproject.toml and libs/core/pyproject.toml, and disallow_any_generics = true in libs/core/pyproject.toml. Fix resulting mypy errors.
Affected files: libs/core/pyproject.toml:94-95, libs/langchain_v1/pyproject.toml:119-120, multiple source files
Acceptance criteria: make lint passes with warn_return_any = true and disallow_any_generics = true in core.
Workload: L
Risk: Low (mypy-only change; no runtime behavior change)
Dependencies: None
Task M2-3: Fix README Quickstart Model Name
Description: Replace gpt-5.4 in README.md:40 with a verified GA model such as gpt-4o.
Affected files: README.md:40
Acceptance criteria: Quickstart code block uses a model name that works with the current OpenAI API.
Workload: S
Risk: None
Dependencies: None
Task M2-4: Cache DNS Resolution in SSRF Validation
Description: Add @functools.lru_cache(maxsize=512) or a short-TTL dict cache to the DNS resolution in validate_safe_url and validate_url.
Affected files: libs/core/langchain_core/_security/_ssrf_protection.py, libs/core/langchain_core/_security/_policy.py
Acceptance criteria: Repeated calls to validate_safe_url with the same hostname do not make repeated DNS calls. Cache key is (hostname, port). Unit test verifies socket.getaddrinfo is called once for repeated identical calls.
Workload: S
Risk: Low (cached DNS results can go stale; a maximum TTL of 60s is recommended, using cachetools.TTLCache rather than functools.lru_cache, which never expires entries by age)
Dependencies: M0-2
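A dependency-free sketch of a short-TTL cache keyed on (hostname, port) is shown below, as one way to satisfy the acceptance criteria. TTLDnsCache and system_resolver are hypothetical names; the resolver and clock are injectable so the behavior can be tested without network access. The sketch is not thread-safe and would need a lock in concurrent use.

```python
import socket
import time
from collections.abc import Callable

Resolver = Callable[[str, int], list[str]]


def system_resolver(host: str, port: int) -> list[str]:
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


class TTLDnsCache:
    def __init__(
        self,
        resolver: Resolver = system_resolver,
        ttl_seconds: float = 60.0,
        max_entries: int = 512,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolver = resolver
        self._ttl = ttl_seconds
        self._max_entries = max_entries
        self._clock = clock
        # Maps (hostname, port) to (expiry time, resolved addresses).
        self._entries: dict[tuple[str, int], tuple[float, list[str]]] = {}

    def resolve(self, host: str, port: int) -> list[str]:
        key = (host.lower(), port)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        addresses = self._resolver(host, port)
        if key not in self._entries and len(self._entries) >= self._max_entries:
            # Evict the entry closest to expiry to keep the cache bounded.
            oldest = min(self._entries, key=lambda k: self._entries[k][0])
            del self._entries[oldest]
        self._entries[key] = (now + self._ttl, addresses)
        return addresses
```

Unlike functools.lru_cache, entries here expire after ttl_seconds, so a changed DNS record is picked up within one TTL window.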
Milestone 3 — Quality & Polish
Task M3-1: Remove Blanket ["ALL"] Ruff Per-File Ignore for test_react_agent.py
Description: Fix the linting issues in tests/unit_tests/agents/test_react_agent.py and remove the blanket ["ALL"] ignore.
Affected files: libs/langchain_v1/pyproject.toml:168, libs/langchain_v1/tests/unit_tests/agents/test_react_agent.py
Acceptance criteria: File passes ruff check without any per-file ignores.
Workload: M
Risk: Low
Dependencies: None
Task M3-2: Triage and Close Resolved TODOs
Description: Review the 33 TODO/FIXME/HACK markers in langchain_core. Close the ones that are already resolved; convert the remainder to GitHub issues with priority labels.
Affected files: All 22 files with TODO markers identified in audit
Acceptance criteria: All remaining TODOs reference a GitHub issue URL.
Workload: M
Risk: None
Dependencies: None
Task M3-3: Document langchain vs langchain-classic Distinction Prominently
Description: Add a clear notice at the top of libs/langchain/README.md and in the monorepo-root README.md stating that langchain-classic is legacy and langchain (from libs/langchain_v1) is the active package.
Affected files: libs/langchain/README.md, libs/README.md
Acceptance criteria: A contributor arriving at libs/langchain/ knows immediately it is the frozen legacy package.
Workload: S
Risk: None
Dependencies: None
Task M3-4: Remove Redundant Import in types.py
Description: Remove the duplicate from collections.abc import Awaitable inside the if TYPE_CHECKING: block in types.py.
Affected files: libs/langchain_v1/langchain/agents/middleware/types.py:19-21
Acceptance criteria: ruff check shows no F811 or duplicate-import warnings for this file.
Workload: S
Risk: None
Dependencies: None
End of audit report.