Multi-Agent Systems

Video (best)

  • Harrison Chase (LangChain) — “Multi-Agent Systems with LangGraph”
  • Watch: YouTube
  • Why: N/A — see coverage notes below
  • Level: intermediate

Coverage note: 3Blue1Brown, Andrej Karpathy, Yannic Kilcher, StatQuest, and Serrano.Academy have no well-known dedicated videos on multi-agent systems. The best video content is in conference talks and framework-specific tutorials, but no exact YouTube ID could be verified for this entry, and none was identified from the preferred educator list.


Blog / Written explainer (best)

  • Lilian Weng — “LLM-powered Autonomous Agents”
  • Link: https://lilianweng.github.io/posts/2023-06-23-agent/
  • Why: Weng’s post is the most cited, pedagogically structured written explainer covering agent architectures, memory, tool use, and multi-agent coordination patterns. It bridges theory and practice with clear diagrams and references to seminal work. While it covers single-agent foundations heavily, the multi-agent and orchestration sections are the best freely available written treatment from a trusted author.
  • Level: intermediate

Deep dive

  • AutoGen / Microsoft Research Documentation & Technical Report
  • Link: https://microsoft.github.io/autogen/stable/index.html
  • Why: The AutoGen framework documentation is the most comprehensive technical reference for multi-agent system design patterns including supervisor patterns, hierarchical teams, human-in-the-loop, blackboard/shared state, and conflict resolution. It combines conceptual explanation with architectural diagrams and is actively maintained by a research team. The accompanying technical report (see paper section) grounds it academically.
  • Level: advanced

Original paper

  • Wu et al. (2023) — “AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation”
  • Link: https://arxiv.org/abs/2308.08155
  • Why: This is the most readable and widely adopted seminal paper specifically on LLM-based multi-agent systems. It introduces the conversational multi-agent paradigm and covers agent roles, message passing, human-in-the-loop integration, and flexible conversation topologies (peer-to-peer, hierarchical). It is highly cited, maps directly to the related concepts listed for this topic, and is more accessible than the earlier MAS literature from classical AI.
  • Level: intermediate/advanced

Code walkthrough

  • LangChain / LangGraph — “LangGraph Multi-Agent Tutorials (Supervisor & Hierarchical Patterns)”
  • Link: https://langchain-ai.github.io/langgraph/tutorials/introduction/
  • Why: LangGraph’s official tutorials provide the best hands-on code walkthroughs for the exact patterns listed in this topic — supervisor pattern, hierarchical teams, handoffs, shared state, human-in-the-loop, and peer-to-peer architectures. The code is minimal, well-commented, and directly tied to conceptual diagrams. It uses Python and is appropriate for learners in an agentic AI course context.
  • Level: intermediate

Coverage notes

  • Strong: Written explainers (Lilian Weng), framework documentation (AutoGen, LangGraph), and the AutoGen paper cover agent roles, message passing, supervisor patterns, hierarchical teams, and human-in-the-loop very well.
  • Weak: Blackboard pattern and classical conflict resolution/consensus mechanisms are underserved in LLM-era resources; most coverage comes from classical MAS literature (Weiss 1999) rather than modern tutorials.
  • Gap: No high-quality YouTube explainer exists from the preferred educator list (3B1B, Karpathy, Yannic, StatQuest, Serrano) for multi-agent LLM systems specifically. This is a meaningful content gap for the platform — a custom video may be warranted. Conference talks (e.g., from NeurIPS or AI Engineer Summit) exist but lack the pedagogical structure of the preferred educators.
  • Gap: Debate-and-consensus mechanisms (e.g., Society of Mind-style approaches, Du et al. 2023 “Improving Factuality via Multi-Agent Debate”) are covered in papers but have no strong standalone tutorial resource.


Additional Resources for Tutor Depth

35 sources — papers, official docs, working code, benchmarks, and deep explainers that give the AI tutor precision on this topic.

📄 Blackboard Architecture (Components + Control Alternatives)

Paper · source

Architecture-level decomposition of blackboard systems with explicit component roles and control-cycle rationale

Key content
  • Blackboard architecture = 3 components (Intro, Fig. 1):
    (1) Blackboard (global database / shared solution state); (2) Knowledge Sources (KSs) (agents that create/modify blackboard contents); (3) Control component (realizes behavior in serial computing environment).
  • Task characteristics suited to blackboards (Sec. 3.1):
    (1) complex/ill-structured, large spaces; systematic generation infeasible;
    (2) opportunistic, situation-dependent invocation of diverse knowledge; control decisions made during solving (not pre-set paths);
    (3) mix of synthetic + analytic processes (bottom-up fusion + top-down model-based reasoning).
  • Solution strategies and architectural implications (Secs. 3.2–4):
    • Search: needs generator + evaluator; in HEARSAY-II, KS action generates hypotheses; KS condition does look-ahead; scheduler performs global evaluation (Fig. 3). Condition/action may be scheduled separately; blackboard can change between them → may require re-evaluation (Sec. 5.3.1).
    • Recognition: match → apply; KS condition specifies situations; scheduler selects best region/event to process next (Fig. 4).
  • Blackboard structure defaults (Sec. 5.1): organized into levels (abstraction/compositional hierarchies). Level object as class, nodes as instances; nodes created dynamically; attribute values may include credibility, timestamps, history. Panels: multiple hierarchies; common second panel = control info (e.g., BB1).
  • KS design pattern (Sec. 5.2, Fig. 5): condition + action; condition often multi-stage filters: context-independent trigger then context-dependent filters; action modifies solution state and may post goals/expectations.
  • Control design axes (Sec. 5.3): schedulable entities (whole KS vs condition/action), event-oriented vs knowledge-oriented scheduling (Figs. 6–7), posting/noticing events, and where control data lives (event records vs scheduling queue vs control panel).
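The three components and the condition + action KS pattern above can be sketched in a few lines. This is a minimal illustration, not code from the paper: the `KnowledgeSource` class, the `priority` rating, the `run_control_loop` function, and the toy "letters to word" task are all hypothetical names chosen for the example, and the scheduler simply picks the highest-priority triggered KS.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Blackboard = Dict[str, object]  # shared solution state (component 1)


@dataclass
class KnowledgeSource:  # component 2
    name: str
    condition: Callable[[Blackboard], bool]  # is this KS applicable right now?
    action: Callable[[Blackboard], None]     # modifies the blackboard in place
    priority: int = 0                        # hypothetical rating used by the scheduler


def run_control_loop(bb: Blackboard, sources: List[KnowledgeSource],
                     max_cycles: int = 10) -> List[str]:
    """Serial control cycle (component 3): collect triggered KSs, run the best one."""
    trace: List[str] = []
    for _ in range(max_cycles):
        triggered = [ks for ks in sources if ks.condition(bb)]
        if not triggered:
            break  # nothing left to contribute
        chosen = max(triggered, key=lambda ks: ks.priority)
        chosen.action(bb)
        trace.append(chosen.name)
    return trace


# Toy task with two abstraction levels: letters -> word -> uppercase word.
join_letters = KnowledgeSource(
    name="join_letters",
    condition=lambda bb: "letters" in bb and "word" not in bb,
    action=lambda bb: bb.update(word="".join(bb["letters"])),
)
uppercase_word = KnowledgeSource(
    name="uppercase_word",
    condition=lambda bb: "word" in bb and "upper" not in bb,
    action=lambda bb: bb.update(upper=bb["word"].upper()),
    priority=1,
)

board: Blackboard = {"letters": ["c", "a", "t"]}
trace = run_control_loop(board, [uppercase_word, join_letters])
print(trace, board["upper"])
```

The order in which KSs run is decided during solving from the blackboard contents, not from the order of the `sources` list, which is the opportunistic invocation the section describes.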

📄 Det-Dec-POMDP formalism + IDPP (JESP best-response via Det-POMDP)

Paper · source

Dec-POMDP/Det-Dec-POMDP tuple definition; determinism assumptions; scalable solution procedure (IDPP) via best-response Det-POMDPs

Key content
  • Det-Dec-POMDP definition (Def. 1, Sec. 4): tuple
    \[ \langle I,S,A,Z,T,O,R,\gamma,b_0\rangle \] where \(I=\{1,\dots,n\}\) is the set of agents; \(S\) states; joint actions \(A=\times_i A_i\); joint observations \(Z=\times_i Z_i\); deterministic transition \(T:S\times A\to S\); deterministic observation \(O:A\times S\to Z\); reward \(R\); discount \(\gamma\); initial belief \(b_0\in \Delta(S)\).
    Local policy: \(\pi_i\) maps local action-observation history to action. Uncertainty only in \(b_0\); thereafter dynamics fully deterministic via \(T,O\).
  • Belief property (Sec. 4): in deterministic POMDPs, belief support monotonically decreases as deterministic observations rule out inconsistent states.
  • Why hard despite determinism (Sec. 5): initial-state uncertainty induces many possible observation sequences ⇒ exponential joint history space; sufficient-statistic/occupancy approaches become intractable (esp. thousands of observations/agent).
  • IDPP procedure (Sec. 5, Alg. 2): JESP-style iterative best response: repeatedly pick agent \(i\), fix other agents’ FSC policies, build \(i\)’s best-response model, solve, update \(i\)’s policy; iterate until convergence (Nash equilibrium policy set).
  • Key reduction (Thm. 1, Sec. 5): with other agents’ FSCs fixed, agent \(i\)’s best-response model is a Det-POMDP on extended state
    \[ \bar S_i = \{(s, q_{-i}, o_i)\} \] (environment state \(s\), other agents’ FSC nodes \(q_{-i}\), and \(i\)’s current observation \(o_i\)); transition becomes deterministic because other agents’ actions/nodes and \(O\) are deterministic.
  • Solver choice (Sec. 5): IDPP solves each best-response Det-POMDP using Det-MCVI (Schutz et al. 2025), exploiting determinism for scalability.
  • Heuristic initialization (Sec. 5, Alg. 3): avoid joint-observation planning: assume others follow a default MDP policy \(\mu:S\to A\); compute each agent’s initial policy via standard value iteration in a deterministic model.
  • Empirical setup defaults (Sec. 6): planning time limit 10,000 s; MARL training budget 10,000 episodes, max 100 steps/episode; MCJESP uses 1 s per FSC node planning budget.
  • Key empirical claim (Sec. 6): optimal finite-horizon MAA* fails even at horizon 10 due to memory exhaustion; IDPP maintains low memory by avoiding enumeration of joint histories; IDPP outperforms others on large instances where InfJESP fails.
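The belief property noted above (support can only shrink under deterministic dynamics) is easy to see in a single-agent toy model. The sketch below is an illustration under stated assumptions, not the paper's algorithm: the five-cell corridor, the `T` and `O` functions, and the convention that the observation is a function of the action and the successor state are all assumptions made for the example.

```python
from typing import Callable, FrozenSet


def update_support(support: FrozenSet[int], action: str, observation: str,
                   T: Callable[[int, str], int],
                   O: Callable[[str, int], str]) -> FrozenSet[int]:
    """Push every candidate state through T, keep successors consistent with the observation."""
    successors = set()
    for s in support:
        s_next = T(s, action)                  # exactly one successor per state
        if O(action, s_next) == observation:   # deterministic observation rules states in or out
            successors.add(s_next)
    return frozenset(successors)


# Hypothetical corridor with cells 0..4; walls are sensed at both ends.
def T(s: int, a: str) -> int:
    return min(s + 1, 4) if a == "right" else max(s - 1, 0)


def O(a: str, s: int) -> str:
    return "wall" if s in (0, 4) else "free"


support = frozenset(range(5))   # support of b_0: the start cell is unknown
true_state = 2                  # hidden from the agent
sizes = [len(support)]
for a in ["right", "right"]:
    true_state = T(true_state, a)
    support = update_support(support, a, O(a, true_state), T, O)
    sizes.append(len(support))

print(sizes, sorted(support))
```

Because each state has exactly one successor, the new support can never be larger than the old one; observations then only remove candidates.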

📄 HLER human-in-the-loop multi-agent research pipeline (ops metrics + gates)

Paper · source

Production-oriented multi-agent pipeline decisions with operational metrics/tradeoffs + human-in-the-loop escalation gates

Key content
  • Architecture (Section 3.1): Central orchestrator dispatches tasks to specialized agents and maintains shared state RunState (records intermediate outputs). Sequential workflow: Data Audit → Data Profiling → Questioning → Data Collection → Analysis → Writing → Self-Critique → Review. Agent outputs stored as structured objects and passed forward.
  • Human decision gates (Section 3.1, 3.4):
    1. Research Question Selection (PI chooses among candidates)
    2. Publication Decision (final quality gate)
  • Agent roles (Section 3.2):
    • DataAuditAgent: builds variable inventory from headers/schema to prevent proposing missing variables.
    • DataProfilingAgent: summary stats, missingness, distributions, correlations; flags endogeneity risks; outputs DataProfile.
    • QuestionAgent: dataset-aware hypothesis generation conditioned on audit + profile (availability, missingness, distributional diagnostics).
    • Data agents: retrieve/merge from public APIs (World Bank, FRED, OpenAlex) + local datasets.
    • EconometricsAgent: executes OLS, fixed-effects panel, DiD, event-study; exports tables/figures/structured summaries.
    • PaperAgent: drafts full manuscript; versioned (e.g., draft_v1.md, draft_v2.md).
    • ReviewerAgent: scores (1–10) on novelty, identification credibility, data quality, clarity, policy relevance; issues revision requests.
  • Two-loop control (Section 3.3):
    • Question quality loop: generate → feasibility screen → human selects; can regenerate with modified constraints.
    • Research revision loop: reviewer requests → re-analysis + rewrite → re-review; typically converges in 2–4 iterations.
  • Empirical results (Section 4):
    • Feasible questions: dataset-aware 87% (69/79) vs unconstrained 41% (34/82). Unconstrained failures: 42% missing variables; 35% design incompatible.
    • End-to-end completion: 86% (12/14) runs completed; 2 failed at econometrics (fixed-effects convergence on sparse subsamples) with graceful halt + logs.
    • Revision improves scores: overall mean 4.8 → 5.9 → 6.3 (v1, v2, final). Biggest gains: clarity +2.1, identification +1.4; novelty ~+0.3.
    • Ops metrics: runtime 20–25 min/run; API cost $1.5/run (vs. $15 for AI Scientist).
  • Implementation defaults (Section 3.5): Python; LLM via Anthropic Claude Sonnet 4.6 (model-agnostic); structured schema objects (e.g., ResearchQuestion, DataProfile, AnalysisResult); manuscripts in Markdown → PDF via Pandoc/LaTeX.
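The orchestrator, shared RunState, and two human gates described above can be sketched as a plain sequential loop. This is a hedged illustration only: the stage names come from the source, but the `run_pipeline` function, the stub agents, the callback signatures, and the exact point at which the question gate fires are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

STAGES = ["Data Audit", "Data Profiling", "Questioning", "Data Collection",
          "Analysis", "Writing", "Self-Critique", "Review"]


@dataclass
class RunState:
    outputs: Dict[str, object] = field(default_factory=dict)  # intermediate agent outputs
    log: List[str] = field(default_factory=list)              # order of completed steps


def run_pipeline(agents: Dict[str, Callable[[RunState], object]],
                 select_question: Callable[[List[str]], str],
                 approve_publication: Callable[[RunState], bool]) -> RunState:
    state = RunState()
    for stage in STAGES:
        state.outputs[stage] = agents[stage](state)  # each agent reads the shared state
        state.log.append(stage)
        if stage == "Questioning":
            # Human gate 1 (assumed placement): the PI picks one candidate question.
            state.outputs["selected_question"] = select_question(state.outputs[stage])
            state.log.append("gate: question selected")
    # Human gate 2: final publication decision.
    state.outputs["published"] = approve_publication(state)
    state.log.append("gate: publication decision")
    return state


# Stub agents and stub human decisions, for illustration only.
agents = {stage: (lambda st, name=stage: f"{name} output") for stage in STAGES}
agents["Questioning"] = lambda st: ["Q1: effect of X on Y", "Q2: effect of Z on Y"]
final = run_pipeline(agents,
                     select_question=lambda candidates: candidates[0],
                     approve_publication=lambda st: True)
print(final.log)
```

In a real system the two callbacks would block on a person's input; here they return immediately so the control flow is visible.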

📊 AgentBench (LLM-as-Agent benchmark + failure modes)

Benchmark · source

AgentBench suite definition (8 environments) + comparative results (29 LLMs) + categorized failure modes

Key content
  • Formalization (Section 2): Interactive evaluation of an LLM agent is modeled as a POMDP:
    \[ \langle \mathcal{S}, \mathcal{A}, \mathcal{T}, \mathcal{R}, \mathcal{I}, \mathcal{O}\rangle \] where \(\mathcal{S}\) = state space, \(\mathcal{A}\) = action space, \(\mathcal{T}\) = transition function, \(\mathcal{R}\) = reward, \(\mathcal{I}\) = task-instruction space, \(\mathcal{O}\) = observation space; LLM agent denoted \(\pi\).
  • 8 environments (Section 3):
    • Code-grounded: OS (bash/Ubuntu Docker, metric SR); DB (authentic SQL, metric SR); KG (partially observable KG QA, metric F1).
    • Game-grounded: DCG Aquawar (metric win rate); LTP (yes/no/irrelevant host, metric “game progress”); HH ALFWorld (metric SR).
    • Web-grounded: WS WebShop (metric reward); WB Mind2Web (metric step SR).
  • Dataset scale & defaults (Section 4.1): Dev/Test total sizes 269 / 1,014; ~3k / 11k inference calls (≈ MMLU). Estimated rounds per problem 5–50. Temperature=0 (greedy). Context truncated to ≤3500 tokens; omitted history marked with "[NOTICE] messages are omitted."
  • Overall score procedure (Section 4.1): avoid naive averaging; divide each task’s score by that task’s average score across evaluated models, then average the rescaled scores. This is a weighted average with fixed weights equal to the reciprocal of each task’s average score (Table 2 lists the averages, i.e. the inverse weights: OS 10.8, DB 13.0, KG 13.9, DCG 12.0, LTP 3.5, HH 13.0, WS 30.7, WB 11.6).
  • Failure/finish reasons (Section 2, 4.3): Context Limit Exceeded (CLE), Invalid Format (IF), Invalid Action (IA), Task Limit Exceeded (TLE), Complete. IF/IA mainly reflect instruction-following failures; TLE indicates weak multi-turn reasoning/decision-making.
  • Key empirical results (Table 3, Section 4.2):
    • gpt-4 best on 6/8 datasets; HH SR = 78%.
    • API vs OSS gap: average overall score OSS 0.51 vs API 2.32; all API models >1.00 overall.
    • Best OSS (≤70B) reported: CodeLLaMA-34B overall 0.96, still below gpt-3.5-turbo overall 2.32.
  • Outcome ratios example (Table 4): predominant failure is TLE; e.g., KG TLE 67.9%, LTP TLE 82.5%; DB IF 53.3%; HH IA 64.1%.
  • Design rationale: evaluate “primitive” CoT + Action prompting (no ensembles/reflection/search) as cheapest/common deployment; environments isolated via Docker + per-task workers; API-centric toolkit via HTTP server-client.
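The overall score procedure above can be sketched as follows. This assumes the Table 2 figures are the per-task average scores (so each weight is their reciprocal) and that the overall score is the plain mean of the rescaled task scores; `overall_score` and the two example models are hypothetical and not taken from the benchmark code.

```python
# Per-task figures from Table 2, read here as average scores across evaluated models.
TASK_AVERAGES = {"OS": 10.8, "DB": 13.0, "KG": 13.9, "DCG": 12.0,
                 "LTP": 3.5, "HH": 13.0, "WS": 30.7, "WB": 11.6}


def overall_score(task_scores: dict) -> float:
    """Rescale each task score by the task average, then take the mean."""
    rescaled = [task_scores[task] / avg for task, avg in TASK_AVERAGES.items()]
    return sum(rescaled) / len(rescaled)


# Hypothetical models: one exactly average on every task, one twice as good.
average_model = dict(TASK_AVERAGES)
strong_model = {task: 2 * avg for task, avg in TASK_AVERAGES.items()}

print(overall_score(average_model), overall_score(strong_model))
```

Rescaling keeps a task with large raw numbers (WS, average 30.7) from dominating a task with small ones (LTP, average 3.5), which is the stated reason for avoiding a naive average.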

📊 LangGraph production metrics: latency, checkpoints, replay, scale

Benchmark · source

Operational framing + concrete latency/storage math for production agent workflows (TTFT/TPOT targets, checkpointing, replay/idempotency, fan-out amplification).

Key content
  • Interactive latency targets (MLPerf framing): real-time systems often target TTFT < 1s and TPOT in “tens of ms” to feel responsive; orchestration overhead must fit inside these budgets.
  • LangGraph execution model (supersteps): each step has 3 phases: (1) plan which actors/nodes to execute, (2) execute selected actors in parallel, (3) apply updates. Intermediate updates aren’t visible until the next step → clearer race-condition boundaries and replay points.
  • Eq. 1 — Checkpoint write rate sizing:
    writes/sec = steps_per_request × requests/sec
    Example from source: 12 steps/request and 2,000 req/s ⇒ 24,000 writes/s (plus reads for resume/inspection).
  • Checkpoint alignment: checkpoints persist at superstep boundaries; snapshot count scales with steps, not wall-clock runtime.
  • Replay/resume semantics: workflows can pause and resume (even up to a week later) by restoring state and replaying from a safe point. To avoid duplicated side effects (e.g., “write ticket twice”, “charge twice”), non-deterministic/side-effectful operations must be isolated as separate tasks; design for determinism + idempotency.
  • State types: (a) thread-local execution state keyed by thread_id for checkpointing/resume; (b) cross-thread long-term memory with TTL and optional similarity search (semantic search disabled by default; requires index + compatible store).
  • Tool catalog guidance: the source cites OpenAI’s function-calling guidance to keep fewer than 20 functions available at once for higher accuracy; function definitions are injected into the system message and billed as input tokens.
  • Empirical comparison: external benchmark (5-agent workflow × 100 runs) reports LangGraph >2× faster than CrewAI; CrewAI had ~5s of a 9s segment as tool-interaction gap; LangGraph passes state deltas vs full histories (token/latency savings).
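Eq. 1 above is simple enough to check directly. The sketch below reproduces the source's 12 steps × 2,000 req/s example; the 4 KiB average checkpoint size used for the bandwidth estimate is purely an assumption for illustration and does not come from the source.

```python
def checkpoint_write_rate(steps_per_request: int, requests_per_sec: float) -> float:
    """Eq. 1: one checkpoint write per superstep, per request."""
    return steps_per_request * requests_per_sec


writes_per_sec = checkpoint_write_rate(12, 2_000)

# Assumed average serialized checkpoint size (hypothetical, for sizing only).
ASSUMED_CHECKPOINT_BYTES = 4 * 1024
bytes_per_sec = writes_per_sec * ASSUMED_CHECKPOINT_BYTES

print(writes_per_sec, bytes_per_sec / 1e6, "MB/s")
```

Since snapshot count scales with steps rather than wall-clock time, reducing the number of supersteps per request lowers write load proportionally.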

📊 MCP Server + LangGraph Performance Benchmarks

Benchmark · source

Benchmark tables comparing latency/throughput/error/cost for LangGraph-based MCP server vs alternatives under controlled conditions.

Key content
  • Benchmark methodology / defaults
    • Hardware (GCP): n2-standard-4, 4 vCPU (Intel Xeon 2.3GHz), 16GB RAM, SSD 1000 IOPS, 10Gbps, region us-central1.
    • LLM: Gemini 2.0 Flash. Load tool: k6. Duration: 5 min per scenario after 1-min ramp-up. Avg of 3 runs. Metrics: Prometheus + Grafana.
    • Workloads: Simple Agent (single node); Multi-Agent (3 sequential agents); Complex Workflow (5-node graph w/ conditionals); High Concurrency (100+ concurrent).
  • Empirical results (end-to-end latency includes LLM + network + orchestration + persistence)
    • Simple Agent (MCP+LangGraph, Cloud Run self-hosted): 142 req/s, p50 245ms, p95 890ms, p99 1210ms, error 0.02%, CPU 68% avg (85% peak), mem 4.2GB avg (5.8GB peak).
    • Simple Agent (LangGraph Cloud): 135 req/s, p50 280ms, p95 950ms, p99 1450ms, error 0.05%, cost $0.001 per node execution.
    • Multi-Agent 3-step (MCP+LangGraph on GKE): 48 req/s, p50 1850ms, p95 4200ms, p99 6100ms, error 0.08%, scaling ~linear (2× pods ≈ 2× throughput).
    • Multi-Agent (CrewAI self-hosted): 52 req/s, p50 1650ms, p95 3800ms, p99 5400ms, error 0.12%; rationale: lower overhead for simple sequential delegation.
    • Complex 5-node conditional graph (MCP+LangGraph): 32 req/s, p50 2800ms, p95 6500ms, p99 9200ms, error 0.15%; Redis checkpointing, automatic retries.
    • High concurrency (100 VUs; MCP+LangGraph on K8s+HPA): max 425 req/s (autoscale 2→10 pods), p50 320ms, p95 1200ms, p99 2400ms, error 0.25%, recovery 45s.
    • High concurrency (Google ADK): max 380 req/s, p50 360ms, p95 1450ms, error 0.18%, cost higher (Vertex fees).
  • Cost-per-1M complex requests
    • MCP (GKE): $300 + LLM $512; LangGraph Cloud: $1,015.
  • Scaling table (vertical)
    • 2 vCPU/8GB: 75 req/s (baseline); 4 vCPU/16GB: 142 req/s (90% efficient); 8 vCPU/32GB: 260 req/s (87% efficient).

📊 Multi-agent orchestration benchmark (SEC filing extraction)

Benchmark · source

Comparative tables across multi-agent orchestration architectures with measurable outcomes + ablations + scaling

Key content
  • Architectures compared (Sec. III):
    • A Sequential pipeline: fixed agent chain; cumulative JSON state passed forward; max context 128K; split long docs into sections then merge.
    • B Parallel fan-out + merge: dispatcher routes sections to domain extractors; merge agent resolves conflicts via confidence-weighted voting.
    • C Hierarchical supervisor-worker: supervisor maintains task queue; confidence threshold = 0.85; low-confidence fields re-assigned; max 2 re-extraction iterations; supports heterogeneous model routing.
    • D Reflexive self-correcting loop: verifier checks (1) format, (2) cross-field consistency, (3) source grounding; example rule: Total Assets = Total Liabilities + Equity; max 3 correction iterations, else emit best-confidence with low-confidence flag.
  • Dataset (Sec. IV-A): 10,000 SEC filings: 4k 10-K (avg 187,340 tokens), 4k 10-Q (82,150), 2k 8-K (14,820); 25 fields across financial metrics (10), governance (8), exec comp (7).
  • Defaults (Sec. IV-C): temperature 0.0 for extraction calls; 0.3 for supervisor/critique.
  • Primary results (Table III, Claude 3.5 Sonnet):
    • Sequential: F1 0.903, cost $0.187, latency 38.7s
    • Parallel: F1 0.914, cost $0.221, latency 21.3s
    • Hierarchical: F1 0.929, cost $0.261, latency 46.2s
    • Reflexive: F1 0.943, cost $0.430, latency 74.1s
    • Key tradeoff: hierarchical achieves 98.5% of reflexive F1 at 60.7% of cost.
  • Ablations on hierarchical+Claude (Tables V–VIII):
    • Semantic cache (embed sim 0.95, text-embedding-3-small): field-level cache cost $0.171 (−34.5%), F1 0.924.
    • Model routing: 2-tier (Claude+Mixtral) F1 0.912, cost $0.127 (−51.3%).
    • Retries: escalation (retry with stronger model) best F1 0.931.
    • Combined “Hierarchical-Optimized”: F1 0.924, cost $0.148, latency 30.2s (near-sequential cost).
  • Scaling (Table IX): reflexive degrades fastest: F1 0.943→0.871 from 1K→100K docs/day; sequential most resilient (0.903→0.886). Reflexive falls below hierarchical by 50K/day due to queueing/timeouts truncating correction loops.
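The supervisor-worker control flow of architecture C can be sketched as below. The 0.85 threshold and the limit of 2 re-extraction iterations come from the source; everything else is an assumption for illustration, including the `supervise` function, the rule of keeping the higher-confidence candidate, and the scripted extractor standing in for real model calls.

```python
from typing import Callable, Dict, List, Tuple

CONFIDENCE_THRESHOLD = 0.85  # from the source
MAX_REEXTRACTIONS = 2        # from the source

Extraction = Tuple[str, float]  # (value, confidence)


def supervise(fields: List[str],
              extract: Callable[[str, int], Extraction]) -> Dict[str, Extraction]:
    """Extract every field once, then re-assign low-confidence fields up to the limit."""
    results = {name: extract(name, 0) for name in fields}
    for attempt in range(1, MAX_REEXTRACTIONS + 1):
        low = [name for name, (_, conf) in results.items() if conf < CONFIDENCE_THRESHOLD]
        if not low:
            break
        for name in low:
            candidate = extract(name, attempt)
            if candidate[1] > results[name][1]:  # assumption: keep the best-confidence value
                results[name] = candidate
    return results


# Scripted worker outputs per attempt (hypothetical data, not from the paper).
SCRIPTED = {
    "total_assets": [("1200", 0.95)],
    "ceo_salary": [("900k", 0.60), ("950k", 0.80), ("950k", 0.90)],
    "board_size": [("9", 0.50), ("9", 0.50), ("9", 0.50)],
}
calls: Dict[str, int] = {}


def scripted_extract(name: str, attempt: int) -> Extraction:
    calls[name] = calls.get(name, 0) + 1
    return SCRIPTED[name][attempt]


results = supervise(list(SCRIPTED), scripted_extract)
low_confidence = [n for n, (_, c) in results.items() if c < CONFIDENCE_THRESHOLD]
print(results, calls, low_confidence)
```

Only the uncertain fields cost extra calls, which is consistent with hierarchical sitting between parallel and reflexive on cost in Table III. The `attempt` argument is also where heterogeneous model routing or escalation to a stronger model could plug in.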

📖 AutoGen GroupChat & GroupChatManager (speaker selection + defaults)

Reference Doc · source

Concrete parameter defaults + semantics for GroupChat/GroupChatManager (admin, rounds, function-call filtering, auto speaker selection, last_speaker)

Key content
  • GroupChat dataclass fields (core config):
    • agents: list of participating agents; messages: group message list; max_round: max conversation rounds.
    • admin_name default “Admin”; KeyboardInterrupt causes admin agent to take over.
    • func_call_filter default True: if a message is a function call suggestion, next speaker must be an agent whose function_map contains that function name.
  • Speaker selection configuration (defaults + options):
    • speaker_selection_method default “auto”. Allowed: "auto", "manual", "random", "round_robin" (case-insensitive), or a Callable (last_speaker, groupchat) -> Agent | str | None.
      • Callable may return: an Agent in the chat; a string selecting a default method; or None to terminate gracefully.
    • max_retries_for_selecting_speaker default 2 (auto mode requery attempts when LLM returns multiple/no names).
    • allow_repeat_speaker default True; can be False (no repeats) or a list of Agents allowed to repeat.
    • allowed_or_disallowed_speaker_transitions (dict) + speaker_transitions_type ("allowed"/"disallowed"). Mutually exclusive with allow_repeat_speaker.
  • Auto speaker selection workflow (procedure):
    1. Create nested two-agent chat: speaker selector + speaker validator.
    2. Inject group messages; selector proposes next agent.
    3. If invalid (multiple/none), append follow-up prompt and retry up to max_retries_for_selecting_speaker.
    4. If still unresolved, fallback: next agent in list.
  • Prompt templates (defaults):
    • select_speaker_message_template default: role-play instruction with {roles} and {agentlist}; appears first in context.
    • select_speaker_prompt_template default: “Read the above… select next role from {agentlist}… Only return the role.” Appears last; set to None to disable.
    • Follow-ups: select_speaker_auto_multiple_template, select_speaker_auto_none_template (both enforce returning ONLY one case-sensitive agent name).
  • GroupChatManager last_speaker property (semantics):
    • Agents receive messages from the manager; sender.last_speaker reveals the real originating agent of the last group message.
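The auto speaker selection procedure above (propose, validate, retry, fall back) can be sketched without the library. This is a plain-Python illustration of the control flow only, not AutoGen's implementation: `select_next_speaker` and the `propose` callable, which stands in for the selector LLM reply, are hypothetical, and the fallback is assumed to mean the agent after the last speaker in list order.

```python
from typing import Callable, List


def select_next_speaker(agent_names: List[str], last_speaker: str,
                        propose: Callable[[int], List[str]],
                        max_retries: int = 2) -> str:
    """Accept a proposal only if it names exactly one known agent; else retry, then fall back."""
    for attempt in range(max_retries + 1):  # first try plus up to max_retries requeries
        valid = [name for name in propose(attempt) if name in agent_names]
        if len(valid) == 1:
            return valid[0]
    # Fallback (assumed semantics): next agent in the list after the last speaker.
    idx = agent_names.index(last_speaker)
    return agent_names[(idx + 1) % len(agent_names)]


AGENTS = ["Planner", "Coder", "Reviewer"]

# Scripted selector replies: two names, then none, then a single valid name.
replies = [["Coder", "Reviewer"], [], ["Reviewer"]]
picked = select_next_speaker(AGENTS, "Planner", lambda attempt: replies[attempt])

# A selector that never produces a usable answer triggers the fallback.
fallback = select_next_speaker(AGENTS, "Reviewer", lambda attempt: [])

print(picked, fallback)
```

In the real library a custom Callable passed as speaker_selection_method replaces this whole procedure, so deterministic routing rules avoid the extra LLM calls that retries incur.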

📖 BaseCheckpointSaver.list (LangGraph.js checkpoint listing)

Reference Doc · source

Concrete semantics + parameters for listing checkpoints (filters/pagination) on BaseCheckpointSaver

Key content
  • Core concepts
    • Checkpoint = snapshot of graph state at a superstep (enables “memory”, resumability, human-in-the-loop).
    • CheckpointTuple = { checkpoint, config, metadata, pendingWrites } (checkpoint plus associated config/metadata/pending writes).
    • Thread = unique thread_id grouping a series of checkpoints (supports multi-tenant separation).
  • Required/optional run identifiers (configurable config)
    • Always pass thread_id.
    • Optionally pass checkpoint_id to resume from a specific checkpoint within a thread.
    • Examples:
      • { configurable: { thread_id: "1" } }
      • { configurable: { thread_id: "1", checkpoint_id: "0c62ca34-ac19-445d-bbb0-5b4984975b2a" } }
  • Design rationale: pending writes for durable execution
    • If a node fails mid-superstep, LangGraph stores pending checkpoint writes from nodes that already succeeded so resuming from that superstep avoids re-running successful nodes.
  • BaseCheckpointSaver.list / alist parameters (checkpoint retrieval semantics)
    • config: base configuration used to scope listing (typically includes configurable.thread_id).
    • filter: additional filtering criteria (metadata filter).
    • before: list checkpoints created before this configuration (cursor-style pagination).
    • limit: maximum number of checkpoints to return.
    • Returns: Iterator[CheckpointTuple] (sync) / AsyncIterator[CheckpointTuple] (async).
  • Procedure: list checkpoints (JS example)
    • for await (const checkpoint of checkpointer.list(readConfig)) { console.log(checkpoint); }

📖 CompiledStateGraph runtime surface (JS)

Reference Doc · source

Methods/properties on the compiled graph artifact (invoke/stream/batch/state/checkpointing/interrupts/config)

Key content
  • What it is: CompiledStateGraph is the final result of building + compiling a StateGraph; do not instantiate directly—create via StateGraph.compile(). (Since v0.3; docs shown for v1.2.8)
  • Core execution methods
    • invoke(): Promise<ExtractStateType<O, O>> — run graph once with input + config; returns final output state (per outputChannels).
    • batch(): Promise<OperationResults<Op>> — execute multiple operations in one batch (more efficient than individual runs).
    • stream(): Promise<IterableReadableStream<StreamOutputMap<...>>> — primary real-time observation API; emits per enabled streamMode.
    • streamEvents(): IterableReadableStream<StreamEvent> — stream runnable events.
  • Streaming defaults & modes
    • streamMode: StreamMode[] defaults to ["values"].
    • Supported modes listed: "values" (full state each step), "updates" (state changes), "messages", "custom", "tools" (tool lifecycle events), "debug" (execution tracing). (stream() docs also mention "checkpoints" and "tasks" as streamable event types.)
    • streamChannels optional; if not specified, all channels are streamed.
  • State persistence / HITL
    • checkpointer: boolean | BaseCheckpointSaver<number> — when provided, saves a checkpoint at every superstep; when false/undefined, checkpointing disabled and graph cannot save/restore.
    • getState(): Promise<StateSnapshot> and getStateHistory(): AsyncIterableIterator<StateSnapshot> require a checkpointer.
    • updateState(): Promise<RunnableConfig<...>> — update graph state (requires checkpointer); used for human-in-the-loop, breakpoints, external inputs.
    • Interrupt controls: interruptBefore / interruptAfter: "*" or "__start__" or N[] (node names) to interrupt around nodes.
  • Validation/config defaults
    • autoValidate: boolean defaults to true (validate structure at compile).
    • debug: boolean defaults to false.
    • withConfig(...) returns a new instance (immutable merge pattern).
    • validate(): this checks: no orphaned nodes, valid input/output channels, valid interrupt configs.
  • Graph introspection
    • getGraph() / getGraphAsync() return a drawable Graph.
    • getSubgraphsAsync() yields nested Pregel subgraphs (the synchronous getSubgraphs() is deprecated).

📖 LangGraph Checkpointing (Threads, Checkpoints, Replay)

Reference Doc · source

checkpointing API surface (checkpointer interfaces/classes), persistence semantics, replay/resume configuration

Key content
  • Core requirement (config): to persist/resume, pass thread_id in config:
    config = {"configurable": {"thread_id": "my-thread"}}; graph.invoke(inputs, config)
    thread_id is the primary key for storing/retrieving checkpoints; without it no save/resume/time-travel.
  • Checkpoint metadata (TypedDict):
    source: Literal["input","loop","update","fork"] (origin of checkpoint)
    step: int with conventions: -1 = first "input" checkpoint; 0 = first "loop" checkpoint; increasing thereafter.
  • Checkpoint (TypedDict) fields:
    id: str (unique, monotonically increasing; sortable)
    channel_values: dict[str, Any] (deserialized channel snapshots)
    channel_versions: {channel -> version} (monotonic version strings)
    versions_seen: {node_id -> {channel -> version}} (drives which nodes execute next).
  • BaseCheckpointSaver API (sync + async): get, get_tuple, list, put, put_writes, delete_thread; async: aget, aget_tuple, alist, aput, aput_writes, adelete_thread; plus get_next_version(current)->V (must be monotonically increasing; can be float).
  • Super-step semantics: checkpoint saved at each super-step boundary (“tick” where scheduled nodes run). Resume/replay only from checkpoints.
  • StateSnapshot key fields (for graph.get_state*): values, next: tuple[str,...], config (includes thread_id, checkpoint_ns, checkpoint_id), metadata (includes source, writes, step), created_at, parent_config, tasks.
  • Checkpoint namespace (checkpoint_ns): "" for root graph; "node_name:uuid" for subgraph; nested joined by |.
  • Empirical example: sequential START->A->B->END yields 4 checkpoints: empty/START-next; input/next=A; after A/next=B; after B/next=().
  • Replay rule: invoke with prior checkpoint_id to re-run after it; steps before are skipped (replayed from saved results). LLM/tool calls/interrupts after checkpoint are re-triggered.

📖 LangGraph Functional API @entrypoint

Reference Doc · source

Exact decorator/function signature + runtime semantics (inputs/config binding, execution, persistence)

Key content
  • Purpose: entrypoint decorator defines a LangGraph workflow in functional style (sync or async).
  • Decorator signature (v1.1.6):
    entrypoint(self, checkpointer: BaseCheckpointSaver | None = None, store: BaseStore | None = None, cache: BaseCache | None = None, context_schema: type[ContextT] | None = None, cache_policy: CachePolicy | None = None, retry_policy: RetryPolicy | Sequence[RetryPolicy] | None = None, **kwargs: Unpack[DeprecatedKwargs] = {})
  • Decorated function signature rule: must accept one positional input parameter (any type). To pass multiple inputs, use a dict.
  • Injectable runtime parameters (auto-injected at run time):
    • config: RunnableConfig (run-time configuration values)
    • previous: previous return value for the same thread id (only if checkpointer provided)
    • runtime: Runtime (run info incl. context, store, writer)
  • State management / persistence:
    • previous is available only when a checkpointer is enabled and the same config["configurable"]["thread_id"] is used across invocations.
    • To return one value but checkpoint another, return entrypoint.final[value_type, save_type](value=..., save=...). Next run’s previous receives the saved value.
  • Execution patterns:
    • .invoke(input, config) runs once.
    • .stream(input_or_Command, config) streams results; can resume after interrupt(...) using Command(resume=...).
  • Deprecation: config_schema deprecated since v0.6.0; use context_schema (removal in v2.0.0).

📖 LangGraph Runtime Types (interrupts, streaming, state snapshots)

Reference Doc · source

Canonical type definitions that constrain what nodes can return/raise and what the runner expects (interrupts, streaming, checkpointing, dynamic sends, state snapshots).

Key content
  • Interrupting execution (HITL)
    • interrupt(value: Any) -> Any: first call raises GraphInterrupt and surfaces value to the client; graph later resumes via Command(resume=...) and re-executes the node from the start.
    • Multiple interrupt() calls in one node: resume values are matched by call order, scoped per task (not shared across tasks).
    • Requires checkpointing enabled (interrupt relies on persisted state).
    • Interrupt info surfaced in stream as {'__interrupt__': (Interrupt(value=..., id=...),)}.
  • Checkpoint configuration
    • Checkpointer = None | bool | BaseCheckpointSaver
      • True: enable persistent checkpointing for subgraph
      • False: disable even if parent has one
      • None: inherit from parent
  • Streaming modes
    • StreamMode = Literal["values","updates","checkpoints","tasks","debug","messages","custom"]
      • "values": emit full state after each step (incl. interrupts)
      • "updates": emit node/task names + returned updates (each update separately if multiple in a step)
      • "messages": token-by-token LLM messages + metadata
      • "checkpoints": emit when checkpoint created (format like get_state())
      • "tasks": task start/finish + results/errors
      • "debug": includes "checkpoints" + "tasks"
      • "custom": emit via StreamWriter
    • StreamWriter = Callable[[Any], None]: injected kwarg; no-op unless stream_mode="custom".
  • Retry defaults (RetryPolicy, v0.2.24)
    • initial_interval=0.5s, backoff_factor=2.0, max_interval=128.0s, max_attempts=3, jitter=True.
  • Caching (CachePolicy)
    • key_func default: default_cache_key (hashes input via pickle).
  • Dynamic fan-out
    • Send(node: str, arg: Any): used in conditional edges to invoke a node next step with custom per-send state (map-reduce style).
  • State snapshot structure (StateSnapshot)
    • next: tuple[str,...], config: RunnableConfig, metadata: CheckpointMetadata|None, parent_config: RunnableConfig|None, tasks: tuple[PregelTask,...].
  • Reducer bypass
    • Overwrite(value=...): bypass BinaryOperatorAggregate reducer; multiple Overwrite to same channel in one super-step ⇒ InvalidUpdateError.
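
The resume rule above (the node re-executes from the start, and resume values are matched to interrupt() calls by call order) can be illustrated with a toy model. This is a minimal sketch that does not import LangGraph: ToyInterrupt, run_node, and approval_node are hypothetical names, and the list of stored resume values stands in for what a checkpointer would persist.

```python
class ToyInterrupt(Exception):
    """Stand-in for GraphInterrupt: carries the value surfaced to the client."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


def run_node(node, resume_values):
    """Run `node` from the start, answering interrupt() calls in call order."""
    pending = iter(resume_values)

    def interrupt(value):
        try:
            return next(pending)  # a stored resume value exists for this call
        except StopIteration:
            raise ToyInterrupt(value) from None  # no value yet: pause here

    return node(interrupt)


def approval_node(interrupt):
    # Two interrupts in one node; each resume re-runs this function from line 1.
    name = interrupt("Who is approving?")
    approved = interrupt(f"{name}, approve the action?")
    return {"approver": name, "approved": approved}


human_answers = iter(["alice", True])  # what the client sends on each resume
resumes = []   # resume values persisted so far (assumed checkpoint contents)
surfaced = []  # interrupt payloads shown to the client
while True:
    try:
        result = run_node(approval_node, resumes)
        break
    except ToyInterrupt as exc:
        surfaced.append(exc.value)
        resumes.append(next(human_answers))
```

Because the node restarts on every resume, any side effect placed before an interrupt() call would run again, which is why such code is usually kept idempotent or moved after the interrupt.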

📖 LangGraph StateGraph node signature + reducers

Reference Doc · source

Explicit node signature State -> Partial<State> and per-key reducer annotation semantics

Key content
  • Core abstraction (StateGraph): Nodes communicate by reading/writing a shared state.
  • Node signature (Eq. 1):
    node: State -> Partial<State>
    Meaning: each node receives the full current State and returns a dict of updates (a “partial” state) containing only keys it wants to write.
  • Per-key reducers (Eq. 2):
    Each state key may be annotated with a reducer used to aggregate multiple node updates to that key in the same step.
    Reducer signature: reducer: (Value, Value) -> Value
    Used when multiple nodes emit updates for the same key; reducer combines them into one value.
  • Reducer annotation mechanism: Use typing_extensions.Annotated in a TypedDict state schema, e.g.
    x: Annotated[list, reducer] (state key x is a list aggregated via reducer).
  • Edge execution rule: add_edge(start_key: str | list[str], end_key: str)
    • Single start_key: downstream runs after that node completes.
    • Multiple start_key list: downstream waits for ALL listed start nodes to complete.
  • Compilation result: compile() returns CompiledStateGraph implementing Runnable with methods like invoke, stream, ainvoke, astream.
  • Streaming modes (enumeration): StreamMode ∈ {"values","updates","debug","messages","custom"}
    • "values": emit full state after each step
    • "updates": emit per-node updates/events
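
The node signature and per-key reducer rules above can be sketched without the library. The following is an illustrative model only: apply_update is a hypothetical helper (not a LangGraph function) that reads the reducer from the Annotated metadata and falls back to overwrite when a key has none.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class State(TypedDict):
    items: Annotated[list, operator.add]  # reducer: concatenate lists
    status: str                           # no reducer: plain overwrite


def apply_update(schema, state, update):
    """Merge a node's partial update into state, one key at a time."""
    hints = get_type_hints(schema, include_extras=True)
    new_state = dict(state)
    for key, value in update.items():
        metadata = getattr(hints[key], "__metadata__", ())
        reducer = metadata[0] if metadata else None
        new_state[key] = reducer(new_state[key], value) if reducer else value
    return new_state


state = {"items": ["a"], "status": "new"}
state = apply_update(State, state, {"items": ["b"]})                    # partial update
state = apply_update(State, state, {"items": ["c"], "status": "done"})
```

Keys absent from an update are left untouched, which is what "Partial<State>" means in practice.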

📖 LangGraph streamMode (CompiledStateGraph) — modes & semantics

Reference Doc · source

Enumerated stream modes, what each yields, and defaults

Key content
  • Default: CompiledStateGraph.streamMode: StreamMode[] defaults to ["values"].
  • Supported stream modes (pass to graph.stream() / graph.astream() via streamMode):
    • "values" → ValuesStreamPart: streams the full state snapshot after each step.
    • "updates" → UpdatesStreamPart: streams state updates after each step; multiple updates in the same step are streamed separately. Output includes node name → update mapping.
    • "messages" → MessagesStreamPart: streams 2-tuples (LLM token/messageChunk, metadata) from LLM calls. (Docs note message events can be emitted even when the LLM is run with .invoke rather than .stream.)
    • "custom" → CustomStreamPart: streams arbitrary custom data emitted from nodes via config.writer(...) / get_stream_writer.
    • "checkpoints" → CheckpointStreamPart: streams checkpoint events (same format as get_state()).
    • "tools": streams tool-call lifecycle events: on_tool_start, on_tool_event, on_tool_end, on_tool_error.
    • "debug": streams all available execution info (node name + full state; “as much information as possible”).
  • Multiple modes at once: set streamMode: ["updates","custom", ...]. Stream yields tuples [mode, chunk] (mode name + that mode’s data).
  • Unified StreamPart format: examples use version="v2" where each yielded item has { type, data, ... }.
  • Subgraphs: subgraphs: true streams outputs from nested subgraphs; chunks may include a namespace field (e.g., ns) to distinguish subgraph vs root.

📖 StateGraph.compile() (LangGraph Python)

Reference Doc · source

Exact StateGraph.compile() signature/params + compiled graph is a Runnable (invoke/stream/batch/async)

Key content
  • Core model (StateGraph):
    • Nodes communicate via shared state.
    • Node signature: State -> Partial<State> (returns an update dict merged into existing state).
    • Optional reducers per state key: annotate a key with a reducer to aggregate multiple updates.
      • Reducer signature (Eq. 1): (Value, Value) -> Value (left/current, right/update).
  • Must compile to execute: StateGraph is a builder; call .compile() to get an executable graph supporting invoke(), stream(), ainvoke(), astream().
  • compile() signature (Python):
    compile(
      checkpointer: Checkpointer = None,
      *,
      cache: BaseCache | None = None,
      store: BaseStore | None = None,
      interrupt_before: All | list[str] | None = None,
      interrupt_after: All | list[str] | None = None,
      debug: bool = False,
      name: str | None = None,
    ) -> CompiledStateGraph[StateT, ContextT, InputT, OutputT]
    • Defaults: checkpointer=None, cache=None, store=None, interrupt_before=None, interrupt_after=None, debug=False, name=None.
    • Return: CompiledStateGraph implementing Runnable (invokable, streamable, batchable, async).
  • Execution flow procedure (Quickstart):
    1. Define State schema (e.g., TypedDict).
    2. Write node functions returning partial updates.
    3. Add nodes + edges (START → …).
    4. app = graph.compile() then app.invoke(initial_state).
  • Conditional looping example: add_conditional_edges("increment", should_continue) where should_continue returns "increment" while count < 3, else END; the run ends with count: 3.
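
The conditional looping example can be traced with a small driver loop. This is a sketch of the routing idea only, not the compiled graph's actual runner: run is a hypothetical helper, and END is defined locally as a stand-in for the library's sentinel.

```python
END = "__end__"  # local stand-in for the END sentinel


def increment(state):
    # Node: returns a partial update containing only the key it writes.
    return {"count": state["count"] + 1}


def should_continue(state):
    # Router: loop back to "increment" while count < 3, otherwise stop.
    return "increment" if state["count"] < 3 else END


def run(nodes, routers, entry, state):
    current = entry
    while current != END:
        state = {**state, **nodes[current](state)}  # overwrite-merge the update
        current = routers[current](state)           # evaluate the conditional edge
    return state


final_state = run(
    nodes={"increment": increment},
    routers={"increment": should_continue},
    entry="increment",
    state={"count": 0},
)
```

The router is evaluated after the node's update has been merged, so it sees count values 1, 2, 3 and stops on the third pass.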

📋 # Source: https://danielfridljand.de/post/temporal-human-in-the-loop

Source ·

📋 # Source: https://docs.temporal.io/ai-cookbook/human-in-the-loop-python

Source ·

📋 # Source: https://github.com/langchain-ai/langgraph/blob/main/docs/docs/concepts/functional_api.md

Source ·

📋 # Source: https://github.com/langchain-ai/langgraph/blob/main/docs/docs/concepts/persistence.md

Source ·

📋 # Source: https://github.com/langchain-ai/langgraph/issues/1568

Source ·

🔍 Atomic message replacement vs reducers (LangGraph StateGraph)

Explainer · source

Node return-type rules + reducer semantics (replace vs append) + replay/idempotency implications

Key content
  • StateGraph contract (Core rule): Each node reads State and returns a Partial<State> update. Returned keys are merged into shared state.
  • Reducer semantics (per-key aggregation): A state key can be annotated with a reducer used to combine multiple updates.
    Reducer signature (Eq. 1): reducer(left: Value, right: UpdateValue) => Value
    • left = current accumulated state value
    • right = node’s returned update for that key
  • Append-style messages reducer (example):
    messages: Annotation<BaseMessage[]>({ reducer: (left, right) => left.concat(Array.isArray(right) ? right : [right]), default: () => [] })
    Default: messages starts as [].
  • Implication for “replace history atomically”: If messages uses an append reducer, returning {"messages": [...]} will append, not overwrite. To overwrite/replace, define messages with a reducer that returns right as the new value (i.e., replacement reducer) so the update is atomic at the state-key level.
  • Idempotency / replay rationale: Durable execution + retries/replay can re-run nodes; append reducers can duplicate messages on re-execution. Replacement reducers (or otherwise idempotent update logic) avoid duplication by making the update deterministic for a given run.
  • Execution workflow: Build graph (addNode, addEdge(START, node), addEdge(node, END)), then must call .compile() before .invoke().
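
The replay concern above can be made concrete with two plain reducer functions. This is a minimal sketch in Python rather than the JS form shown above; summarize_node and run_with_replay are hypothetical names, and the "replay" is modeled simply as running the same node twice.

```python
def append_reducer(left, right):
    # Append-style: every update is added to the accumulated history.
    return left + right


def replace_reducer(left, right):
    # Replacement-style: the update becomes the new value for the key.
    return right


def summarize_node(state):
    # Intends to replace the whole history with a single summary message.
    return {"messages": ["summary of conversation"]}


def run_with_replay(reducer, executions):
    messages = ["hi", "hello"]
    for _ in range(executions):  # executions > 1 models a retry or replay
        update = summarize_node({"messages": messages})
        messages = reducer(messages, update["messages"])
    return messages


appended = run_with_replay(append_reducer, executions=2)
replaced = run_with_replay(replace_reducer, executions=2)
```

With the append reducer the summary appears twice after a replay; with the replacement reducer the result is the same no matter how many times the node runs.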

🔍 Blackboard System Architecture & Control Cycle

Explainer · source

Architecture-level decomposition (blackboard, knowledge sources, control) + event-driven control/message flow for opportunistic problem solving.

Key content
  • Core components (Basic Blackboard Architecture, Summary sections):

    • Blackboard = global database containing ALL solution-state data; organized into levels (often matching a solution decomposition / abstraction hierarchy) and nodes (attribute–value structures). Nodes can be created/deleted dynamically; nodes across levels can be linked (inter-node relations).
    • Knowledge Sources (KSs) = event-triggered specialist modules (“demons”) containing rules/procedures/tables; only KSs may modify the blackboard; KSs are procedurally independent (no KS references another KS directly).
    • Control component = event-based scheduling: selects focus of attention (context), selects events, selects and invokes triggered KS instances.
  • KS protocol / structure (Knowledge Sources summary):

    • Condition part often split into trigger (event tokens) + precondition (executability test) + other filters.
    • Action part performs blackboard modifications, I/O, and posts events.
  • Primitive knowledge application cycle (Control III):

    1. Select an event
    2. Select KS(s) triggered by that event (in the event’s context)
    3. Execute KS instance → creates new events (repeat)
  • Event-driven control loop with multiple event lists (Control Design Choices / Control Flow):

    • Loop: Select event list → select event(s) → determine triggered KSs → invoke KS(s).
    • Supports clocked events (activate when evaluation-time ≤ current time) and periodic events (repost with time incremented by Δt).
    • Scheduling alternatives: round-robin vs fixed priority vs dynamic utility weights; FIFO/LIFO/priority by token or time&token; execute one vs all triggered KSs.
  • Concrete example numbers (PROTEAN/881 state example): Agenda shows Executable: 98, 94, 86 (rated KSARs), illustrating rating-based scheduling.
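
The primitive knowledge application cycle can be sketched as a short event loop. This is an illustrative toy under stated assumptions: a single FIFO event list, all triggered KSs executed per event, and a dict as the blackboard. KnowledgeSource, control_loop, and the two example KSs are hypothetical and not taken from any particular blackboard system.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class KnowledgeSource:
    name: str
    trigger: str                          # event token that triggers this KS
    precondition: Callable[[dict], bool]  # executability test on the blackboard
    action: Callable[[dict], list]        # modifies the blackboard, returns new events


def control_loop(blackboard, knowledge_sources, initial_events, max_cycles=100):
    events = deque(initial_events)  # FIFO scheduling (one of several options)
    trace = []
    for _ in range(max_cycles):
        if not events:
            break
        event = events.popleft()                       # 1. select an event
        triggered = [ks for ks in knowledge_sources    # 2. select triggered KSs
                     if ks.trigger == event and ks.precondition(blackboard)]
        for ks in triggered:                           # 3. execute, post new events
            events.extend(ks.action(blackboard))
            trace.append(ks.name)
    return trace


def sort_readings(bb):
    bb["sorted"] = sorted(bb["readings"])
    return ["sorted-posted"]


def find_peak(bb):
    bb["peak"] = bb["sorted"][-1]
    return []


blackboard = {"readings": [3, 1, 2]}
sources = [
    KnowledgeSource("peak-finder", "sorted-posted", lambda bb: "sorted" in bb, find_peak),
    KnowledgeSource("sorter", "readings-posted", lambda bb: "readings" in bb, sort_readings),
]
trace = control_loop(blackboard, sources, ["readings-posted"])
```

Neither KS references the other; they cooperate only through blackboard contents and posted events, which is the procedural independence described above.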

🔍 Conditional Edges & Dynamic Routing Semantics (LangGraph)

Explainer · source

Conditional edge routing function signatures + runtime evaluation semantics (how next nodes are chosen/applied)

Key content
  • Node function contract (core pattern): node(state) -> dict returning partial state updates (e.g., {"messages": [new_msg]}), which are merged into graph state via reducers (example uses add_messages to append rather than overwrite).
  • Conditional edge procedure (runtime routing):
    1. Execute a node (e.g., "chatbot") to produce state updates.
    2. Evaluate a routing function on the current input (either a messages list or a state dict containing "messages").
    3. Routing function returns a label (e.g., "tools" or END).
    4. add_conditional_edges(from_node, router, mapping) interprets router outputs via an optional mapping dict (defaults to identity if omitted).
  • Routing function signature/semantics (example route_tools(state)):
    • Accepts either:
      • state: list (treated as messages; uses state[-1]), or
      • state: dict with state.get("messages", []) (uses last message).
    • Decision rule:
      • If last AI message has attribute tool_calls and len(tool_calls) > 0 ⇒ return "tools"
      • Else ⇒ return END
    • Error behavior: if no messages found ⇒ raises ValueError("No messages found...").
  • Looping design rationale: After tool execution, add a normal edge "tools" -> "chatbot" so the LLM can decide next step; this forms the main agent loop.
  • Default/parameter notes shown:
    • Conditional mapping example: {"tools": "tools", END: END} (lets you rename targets, e.g., "tools": "my_tools").
    • Human-in-the-loop tool example asserts len(message.tool_calls) <= 1 to avoid repeated invocations on resume when interrupts/checkpointing are used.
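
The routing function described above can be written out as follows. This is a sketch that avoids importing LangGraph: ToyAIMessage is a hypothetical stand-in for an AI message object, END is a local stand-in for the library sentinel, and the exact error text is abbreviated in the source, so the message below is an assumption.

```python
from dataclasses import dataclass, field

END = "__end__"  # local stand-in for the END sentinel


@dataclass
class ToyAIMessage:
    content: str
    tool_calls: list = field(default_factory=list)


def route_tools(state):
    """Return "tools" if the last message requests a tool call, else END."""
    if isinstance(state, list):
        messages = state                      # input is a bare messages list
    else:
        messages = state.get("messages", [])  # input is a state dict
    if not messages:
        raise ValueError(f"No messages found in input state: {state}")
    last = messages[-1]
    if hasattr(last, "tool_calls") and len(last.tool_calls) > 0:
        return "tools"
    return END


# Optional mapping from router labels to node names (renames "tools").
path_map = {"tools": "my_tools", END: END}

with_tool = route_tools([ToyAIMessage("searching", [{"name": "search"}])])
without_tool = route_tools({"messages": [ToyAIMessage("final answer")]})
next_node = path_map[with_tool]
```

When no mapping is supplied the label itself is used as the node name, so the mapping is only needed to rename targets.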

🔍 Custom state reducers beyond add_messages

Explainer · source

Concrete guidance and pointers on custom reducers beyond add_messages, incl. official “Reducers” concept section + state-reducers how-to.

Key content
  • Reducer definition (per-state-key): Each key/channel in LangGraph State has an independent reducer that merges a node’s update into the prior state value.
    • Default reducer (override): if no reducer specified, updates replace the prior value.
  • Reducer function form (Eq. 1):
    new_value = reducer(old_value, update_value)
    • old_value: current state value for that key
    • update_value: partial update returned by a node for that key
  • Procedure: define reducers in state schema
    • JS/TS (Annotation):
      const State = Annotation.Root({
        bar: Annotation<string[]>({
          reducer: (state, update) => state.concat(update),
          default: () => [],
        }),
      });
    • Python (conceptual parallel): define a typed state schema and attach reducer logic per field (messages commonly use a prebuilt reducer).
  • Concrete behavior examples
    • Example A (no reducers): input {foo:1, bar:["hi"]}; node returns {foo:2} ⇒ state {foo:2, bar:["hi"]}; later {bar:["bye"]} ⇒ {foo:2, bar:["bye"]} (overwrite).
    • Example B (custom reducer for bar): with concat reducer + default []; input {foo:1, bar:["hi"]}; later update {bar:["bye"]} ⇒ {foo:1, bar:["hi","bye"]}.
  • Design rationale for messages reducer: naive concat breaks manual edits (e.g., human-in-the-loop) because it always appends; messagesStateReducer handles message IDs (overwrite existing) and deserializes OpenAI-style {role, content} into LangChain BaseMessage.
  • Prebuilt state: MessagesAnnotation / MessagesState provides messages: BaseMessage[] with messagesStateReducer; can be extended via ...MessagesAnnotation.spec.

🔍 Durable Human-in-the-Loop with Temporal (Signals, Waits, Queries)

Explainer · source

Step-by-step mechanics for durable approval gates (signals + wait_condition + query) with retry-safe state persistence.

Key content
  • Core rationale (durability for HITL):

    • Human decisions are durably stored in Workflow history; after crashes/timeouts, the Workflow resumes without re-asking for approval.
    • workflow.wait_condition(...) pauses without consuming CPU; Temporal records the “waiting” checkpoint and resumes only when condition becomes true.
  • Signal data model (Step 1):

    • UserDecision = {KEEP, EDIT, WAIT} (enum).
    • UserDecisionSignal(decision: UserDecision, additional_prompt: str="") (dataclass).
  • Workflow state persistence (Step 2):

    • Instance vars persist across execution/replay:
      • _current_prompt: str
      • _user_decision: UserDecisionSignal = UserDecisionSignal(decision=WAIT)
      • (for queries) _research_result: str = ""
  • Signal handler (Step 3):

    • @workflow.signal async def user_decision_signal(decision_data): self._user_decision = decision_data
  • Approval/edit loop (Step 4 + waiting):

    • Loop:
      1. research_facts = execute_activity(llm_call, start_to_close_timeout=30s)
      2. Store for query: _research_result = research_facts["choices"][0]["message"]["content"]
      3. Gate: await workflow.wait_condition(lambda: _user_decision.decision != WAIT)
      4. If KEEP: exit loop → create_pdf activity (start_to_close_timeout=20s)
      5. If EDIT: append additional_prompt to _current_prompt, set llm_call_input.prompt, then reset _user_decision = WAIT and repeat.
  • Query support:

    • @workflow.query def get_research_result(self)->str: return _research_result
    • Queries are synchronous read-only; do not create history events; can query during/after completion.
  • Client interactions:

    • handle = client.get_workflow_handle(workflow_id)
    • Send signal: await handle.signal("user_decision_signal", UserDecisionSignal(decision=KEEP|EDIT,...))
    • Query: await handle.query(GenerateReportWorkflow.get_research_result)
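
The signal / wait / query control flow above can be imitated with plain asyncio to show how the pieces interact. This is a toy simulation, not the Temporal SDK: ToyReportWorkflow and _wait_condition are hypothetical, the LLM and PDF activities are replaced by string formatting, and nothing here is durable (no history, no replay).

```python
import asyncio
from dataclasses import dataclass
from enum import Enum


class UserDecision(Enum):
    KEEP = "KEEP"
    EDIT = "EDIT"
    WAIT = "WAIT"


@dataclass
class UserDecisionSignal:
    decision: UserDecision
    additional_prompt: str = ""


class ToyReportWorkflow:
    def __init__(self, prompt):
        self._current_prompt = prompt
        self._user_decision = UserDecisionSignal(UserDecision.WAIT)
        self._research_result = ""
        self._changed = asyncio.Event()

    def user_decision_signal(self, decision_data):  # plays the signal handler
        self._user_decision = decision_data
        self._changed.set()

    def get_research_result(self):                  # plays the read-only query
        return self._research_result

    async def _wait_condition(self, predicate):     # toy wait_condition
        while not predicate():
            self._changed.clear()
            await self._changed.wait()

    async def run(self):
        while True:
            # Stand-in for the LLM activity call.
            self._research_result = f"draft for: {self._current_prompt}"
            await self._wait_condition(
                lambda: self._user_decision.decision != UserDecision.WAIT)
            if self._user_decision.decision == UserDecision.KEEP:
                return f"PDF({self._research_result})"  # stand-in for create_pdf
            # EDIT: extend the prompt, reset the gate, and loop again.
            self._current_prompt += " " + self._user_decision.additional_prompt
            self._user_decision = UserDecisionSignal(UserDecision.WAIT)


async def main():
    workflow = ToyReportWorkflow("solar power")
    task = asyncio.create_task(workflow.run())
    await asyncio.sleep(0.01)  # let the workflow reach the approval gate
    first = workflow.get_research_result()
    workflow.user_decision_signal(UserDecisionSignal(UserDecision.EDIT, "in Germany"))
    await asyncio.sleep(0.01)
    second = workflow.get_research_result()
    workflow.user_decision_signal(UserDecisionSignal(UserDecision.KEEP))
    return first, second, await task


first, second, pdf = asyncio.run(main())
```

What the simulation leaves out is the point of the original design: in Temporal the received signals live in Workflow history, so a crash between the EDIT and KEEP signals would not lose the earlier decision.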

🔍 Durable LangGraph Agents w/ DynamoDBSaver

Explainer · source

End-to-end durable agent architecture using LangGraph + DynamoDB as checkpoint store (schema/flow for resume/replay)

Key content
  • Why LangGraph (graph control flow): Define nodes (tasks) and edges (control flow) that can branch, merge, and loop (cyclic graphs), enabling complex, stateful workflows beyond linear chains.
  • Core persistence concepts (LangGraph):
    • Thread = unique identifier for accumulated state across runs; must pass thread_id in config:
      Eq. 1 (Thread config): {"configurable": {"thread_id": "1"}}
    • Checkpoint = snapshot saved each super-step as a StateSnapshot containing: config, metadata, state channel values, next nodes to execute, and task info (errors/interrupts).
    • Example: a 2-node graph yields 4 checkpoints: empty at START, after user input (before node_a), after node_a output (before node_b), final after node_b at END.
  • Why persistence matters (production rationale): In-memory checkpoints are ephemeral + local → lost on restart; multi-worker runs have isolated state → cannot resume across workers or recover mid-run. Persistent store enables resume, replay, human-in-the-loop, time travel debugging, audit.
  • DynamoDBSaver design (langgraph-checkpoint-aws):
    • Small checkpoint threshold: < 350 KB stored directly in DynamoDB (serialized item + metadata: thread_id, checkpoint_id, timestamps, state).
    • Large checkpoints: ≥ 350 KB state stored in S3; DynamoDB stores an S3 pointer; retrieval transparently loads from S3.
    • Cost/lifecycle knobs: ttl_seconds (auto-expire checkpoints) and enable_checkpoint_compression (serialize+compress to reduce DynamoDB/S3 costs).
  • Required DynamoDB table schema: partition key PK (String) and sort key SK (String).
  • IAM permissions (minimum):
    • DynamoDB: GetItem, PutItem, Query, BatchGetItem, BatchWriteItem
    • S3 (large checkpoints): PutObject, GetObject, DeleteObject, PutObjectTagging, plus bucket lifecycle GetBucketLifecycleConfiguration, PutBucketLifecycleConfiguration.

🔍 Durable execution + interrupt/resume + state updates (LangGraph #4730)

Explainer · source

Concrete edge-case behavior context: state persistence + interrupt/resume patterns (esp. around human-in-the-loop) and how state is updated via reducers; pointers to subgraph concepts for nested graphs.

Key content
  • LangGraph core model (Graphs as control flow):
    • State = shared snapshot passed to nodes; defined as TypedDict/Pydantic + reducers that specify how updates apply.
    • Nodes: functions node(state) -> dict emitting partial state updates.
    • Edges: determine next node(s); can be conditional or fixed.
    • Runtime proceeds in discrete “super-steps” (Pregel-inspired): nodes execute, emit messages along edges; execution halts when all nodes are inactive and no messages are in transit.
  • Reducer formula (state update rule):
    • For key messages with reducer add_messages: append new messages rather than overwrite.
      Example reducer shown: messages: Annotated[list, lambda x, y: x + y] where x=prior list, y=new list.
    • Keys without reducer annotations overwrite previous values.
  • Durable execution / memory procedure (checkpointing):
    1. Compile graph with a checkpointer (example: memory = MemorySaver(); graph = builder.compile(checkpointer=memory)).
    2. Invoke with configurable.thread_id to persist/load state across calls.
    3. State is saved after each step; later invocations with same thread_id resume from saved checkpoint.
  • Human-in-the-loop procedure (interrupt/resume):
    • Tool uses interrupt({"query": query}) and returns human_response["data"].
    • Design rationale: disable parallel tool calling when interrupts can occur to avoid repeating tool invocations on resume: assert len(message.tool_calls) <= 1.
  • Defaults/parameters shown:
    • TavilySearch(max_results=2)
    • Example recursion limit usage: graph.invoke(..., {"recursion_limit": 10}).

📋 Dynamic Workflow Mode for Conditional Edges (workflow_mode)

Code · source

PR description of an alternate conditional-edge execution procedure + compile-time flag (workflow_mode=True) and requirements (path_map or Literal)

Key content
  • Problem (conditional edges + parallelism):

    • Workflow expectation: each node executes only once unless explicitly handled otherwise.
    • In conditional branching (selector node A with branches Type 1 and Type 2) where both branches converge to node E, LangGraph’s parallel processing can cause node E to execute only once (i.e., convergence behavior depends on runtime scheduling/structure rather than intended workflow semantics).
    • Reported discrepancy: adding an extra node D after B (without changing logical intent) can change whether E executes once or twice, implying non-rigorous conditional-edge triggering based on graph shape.
  • Proposed solution / procedure: “Dynamic Workflow Mode”

    • Add an Analyzer that maintains the directed graph of actual execution paths during runtime.
    • During execution, dynamically adjust trigger conditions for subsequent nodes based on the actual path taken, so nodes in the workflow execute exactly once per run (unless special logic says otherwise).
    • Rationale: because nodes/paths are dynamically generated, branch paths can’t be fully determined pre-execution, so path selection must be dynamic at runtime.
  • Configuration / defaults:

    • Enable via compilation: compile(workflow_mode=True).
    • When workflow_mode=True: must provide either path_map or a Literal return type for the conditional function (only one required).
    • path_map/Literal must cover all possible execution paths.
    • Default: if workflow_mode is unset or False, original LangGraph execution mode is used (backward compatible).

📋 LangGraph + gotoHuman human-approval (interrupt/resume) lead-email agent

Code · source

End-to-end LangGraph + external human review integration surface (webhook-driven interrupt/resume), with persistence/checkpointing via Postgres env var.

Key content
  • Workflow (end-to-end procedure):
    1. Trigger agent with a new lead email address.
      • API trigger: HTTP POST [DEPLOY_URL]/api/agent with JSON body: { "email": "new.lead@email.com" }.
      • Manual trigger in gotoHuman: create a trigger form with a text input field ID email and configure the same webhook URL.
    2. Agent researches + drafts a personalized outreach email (LangGraph).
    3. Agent requests human review/approval in gotoHuman; reviewers see it in gotoHuman inbox.
    4. Webhook callback is invoked for each review response to resume the graph (interrupt/resume pattern).
    5. Human can revise draft before final send (approval workflow).
  • Review form setup:
    • Import gotoHuman form template with ID OmmAnhbnWmird3oz60q2.
    • Configure webhook URL (deployment URL) used to resume execution after review.
    • Optional: generate a short-lived public link to share with reviewers.
  • Deployment/config defaults (env vars):
    • OPENAI_API_KEY=sk-proj-XXX
    • GOTOHUMAN_API_KEY=XYZ
    • GOTOHUMAN_FORM_ID=abcdef123
    • POSTGRES_CONN_STRING="postgres://..."
  • Design rationale: use gotoHuman as a central dashboard for approving critical actions/providing input; integrates with LangGraph via webhook to enable durable, resumable human-in-the-loop execution.

📋 LangGraph add_messages reducer (exact merge + deletion semantics)

Code · source

Reference implementation of message-state reducer utilities (add_messages), including ID-based merging, conversion helpers, and RemoveMessage behavior.

Key content
  • Reducer signature (Eq. 1):
    add_messages(left, right, *, format: Literal["langchain-openai"]|None=None) -> Messages
    • Messages = list[MessageLikeRepresentation] | MessageLikeRepresentation
  • Wrapper behavior: must pass both left and right (non-null) or neither (returns partial); else raises ValueError("Must specify non-null arguments for both 'left' and 'right'...").
  • Coercion pipeline (Procedure A):
    1. If left/right not a list → wrap into list.
    2. Convert to BaseMessage via convert_to_messages(...).
    3. Convert chunks to full messages via message_chunk_to_message(...).
    4. Assign missing IDs: if m.id is None → m.id = str(uuid.uuid4()) (done for all in left and right).
  • Remove-all sentinel (Procedure B):
    • Constant: REMOVE_ALL_MESSAGES = "__remove_all__".
    • If any RemoveMessage in right has m.id == REMOVE_ALL_MESSAGES, return only messages after that index: right[remove_all_idx+1:] (drops all prior state).
  • ID-based merge rule (Eq. 2):
    • Build merged = left.copy() and merged_by_id = {m.id: index}.
    • For each m in right:
      • If m.id exists: replace merged[existing_idx] = m. If m is RemoveMessage, mark ID for deletion.
      • If m.id missing in left:
        • If m is RemoveMessage → error: deleting non-existent ID.
        • Else append and record index.
    • After loop: filter out IDs marked for deletion.
  • Formatting parameter:
    • format=="langchain-openai" → _format_messages uses convert_to_openai_messages then convert_to_messages.
    • Any other truthy format → ValueError("Unrecognized format=...").
  • State schema helper:
    MessagesState(TypedDict): messages: Annotated[list[AnyMessage], add_messages]
  • Deprecation: MessageGraph(StateGraph) is deprecated (since v1.0.0; removal in v2.0.0); it uses Annotated[list[AnyMessage], add_messages] as the whole state.
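
The ID-based merge rule can be sketched as a simplified reducer. This is not the library's implementation: Msg and Remove are hypothetical stand-ins for BaseMessage and RemoveMessage, and the sketch omits message conversion, the format parameter, and the remove-all sentinel.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Msg:
    content: str
    id: Optional[str] = None


@dataclass
class Remove:
    id: str


def toy_add_messages(left, right):
    # Wrap single values into lists, then assign missing IDs on both sides.
    left = left if isinstance(left, list) else [left]
    right = right if isinstance(right, list) else [right]
    for m in left + right:
        if m.id is None:
            m.id = str(uuid.uuid4())

    merged = list(left)
    index_by_id = {m.id: i for i, m in enumerate(merged)}
    to_remove = set()
    for m in right:
        if m.id in index_by_id:
            if isinstance(m, Remove):
                to_remove.add(m.id)               # existing ID: mark for deletion
            else:
                to_remove.discard(m.id)
                merged[index_by_id[m.id]] = m     # existing ID: replace in place
        elif isinstance(m, Remove):
            raise ValueError(f"Cannot delete a message with unknown ID: {m.id}")
        else:
            index_by_id[m.id] = len(merged)       # new ID: append
            merged.append(m)
    return [m for m in merged if m.id not in to_remove]


history = [Msg("hi", id="1"), Msg("helo", id="2")]
update = [Msg("hello", id="2"), Remove(id="1"), Msg("new message")]
result = toy_add_messages(history, update)
```

Here message "2" is corrected in place, message "1" is removed, and the ID-less message is appended with a generated UUID, which is the behavior that makes manual edits to history possible.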

🔍 LangGraph execution + state editing (super-steps, reducers, checkpoints)

Explainer · source

End-to-end execution model (state representation, manual edits, update semantics) + how it relates to checkpoints/threads

Key content
  • Graph primitives (definition):
    • State = shared snapshot (schema + per-key reducers).
    • Nodes = functions that take current state (optionally config, runtime) and return partial updates (dict of keys→values).
    • Edges = routing logic (fixed or conditional) selecting next node(s).
  • Execution model (Pregel-like “super-steps”):
    • Nodes start inactive; become active when they receive a message (state) on an incoming edge/channel.
    • Active nodes run, emit updates/messages; recipients run in the next super-step.
    • Nodes with no incoming messages “vote to halt” (become inactive).
    • Termination condition: all nodes inactive and no messages in transit.
  • Reducers (per state key):
    • Default reducer = overwrite (latest update replaces prior value).
    • Example reducer for chat history: add_messages appends to messages and handles message IDs + deserialization.
  • Parallelism rule: if a node has multiple outgoing edges, all destination nodes execute in parallel in the next super-step.
  • Schema filtering + write-anywhere rule: nodes may write to any channel in the graph’s internal state union, even if their input schema is a subset (input/output schemas can filter external I/O).
  • Checkpointing/thread config: state snapshots include values, next, and config with thread_id + checkpoint_id; using the same thread_id reloads saved state for multi-turn continuity.
  • Human-in-the-loop procedure: interrupt(prompt_or_payload) pauses; resume via graph.invoke(Command(resume=...)). Example: first call pauses; second call resumes with "yes".

🔍 LangGraph v0.2 — Checkpointers, durability, replay/resume

Explainer · source

Maintainer rationale for standardizing checkpointer interfaces; how persistence enables durable, replayable graph/state execution.

Key content
  • Core design pillar: LangGraph includes a built-in persistence layer via checkpointers. A checkpointer saves a checkpoint of graph state at each step (step-level durability).
  • Capabilities enabled by step checkpoints:
    • Session memory: store checkpoint history of user interactions; resume from a saved checkpoint in follow-up interactions.
    • Error recovery: continue from last successful step checkpoint after failures.
    • Human-in-the-loop: tool approval, wait for human input, edit agent actions.
    • Time travel / forking: edit graph state at any point in execution history and create an alternative execution from that point (“fork the thread”).
  • Rationale for v0.2 changes: community demand for DB-specific checkpointers (Postgres/Redis/MongoDB) existed, but no clear blueprint for custom implementations; v0.2 introduces standardized interfaces + dedicated libraries to simplify creation/customization and foster a community ecosystem.
  • New checkpointer library ecosystem (interchangeable implementations):
    • langgraph_checkpoint: base interfaces BaseCheckpointSaver, SerializationProtocol; includes MemorySaver (in-memory).
    • langgraph_checkpoint_sqlite: SQLite implementation (local/experimentation).
    • langgraph_checkpoint_postgres: production-grade Postgres implementation (open-sourced from LangGraph Cloud).
  • Postgres checkpointer optimizations:
    • Write-side: Postgres pipeline mode to reduce roundtrips; store each channel value separately and versioned so each checkpoint stores only changed values.
    • Read-side: cursor for list endpoint to efficiently fetch long thread histories.
  • Imports / installs (namespace packages):
    • from langgraph.checkpoint.base import BaseCheckpointSaver
    • from langgraph.checkpoint.memory import MemorySaver
    • from langgraph.checkpoint.sqlite import SqliteSaver (requires pip install langgraph-checkpoint-sqlite)
    • from langgraph.checkpoint.postgres import PostgresSaver (requires pip install langgraph-checkpoint-postgres)
  • Versioning: checkpointer libs follow semantic versioning starting at 1.0; breaking changes in main interfaces → major bump (e.g., langgraph_checkpoint 2.0 implies implementations update to 2.0).
  • Breaking rename: thread_ts → checkpoint_id; parent_ts → parent_checkpoint_id (still recognized if passed via config).
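
The capabilities listed above (session memory, resuming, forking from an earlier checkpoint) follow from keeping a per-thread history of snapshots. The class below is a hypothetical in-memory illustration of that idea; it does not implement the BaseCheckpointSaver interface, and its put and get methods are invented for this sketch.

```python
import copy
import itertools


class ToyCheckpointer:
    def __init__(self):
        self._threads = {}            # thread_id -> list of checkpoints
        self._ids = itertools.count(1)

    def put(self, thread_id, state, parent_checkpoint_id=None):
        checkpoint_id = str(next(self._ids))
        self._threads.setdefault(thread_id, []).append({
            "checkpoint_id": checkpoint_id,
            "parent_checkpoint_id": parent_checkpoint_id,
            "state": copy.deepcopy(state),  # snapshot, isolated from later edits
        })
        return checkpoint_id

    def get(self, thread_id, checkpoint_id=None):
        history = self._threads.get(thread_id, [])
        for checkpoint in reversed(history):
            if checkpoint_id is None or checkpoint["checkpoint_id"] == checkpoint_id:
                return copy.deepcopy(checkpoint)
        return None


saver = ToyCheckpointer()
c1 = saver.put("thread-1", {"count": 1})
c2 = saver.put("thread-1", {"count": 2}, parent_checkpoint_id=c1)

latest = saver.get("thread-1")["state"]   # resume: latest state for the thread

forked_state = saver.get("thread-1", c1)["state"]
forked_state["count"] = 10                # edit an earlier state
c3 = saver.put("thread-1", forked_state, parent_checkpoint_id=c1)  # fork from c1
```

Recording the parent ID is what lets a fork branch from the middle of a thread without rewriting the checkpoints that came after it.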

🔍 Markov Games (Multi-Agent RL) + Minimax-Q

Explainer · source

Formal Markov game definition; value/objectives; reduction from MDP to multi-agent; Minimax-Q update + LP.

Key content
  • Markov game definition (Slide 15):
    • Agents: \(N\). States: \(S\) (joint configuration).
    • Actions per agent: \(A_1,\dots,A_N\). Joint action \((a_1,\dots,a_N)\).
    • Transition: \(T(s,a_1,\dots,a_N,s') = P(s' \mid s, a_1,\dots,a_N)\).
    • Rewards: \(R_i(s,a_1,\dots,a_N)\) for each agent \(i\).
  • Policies + objective (Slide 17): stochastic policy \(\pi_i(s,a)=P(a\mid s)\). Agent \(i\) maximizes discounted return \(\sum_t \gamma^t r_t^i\) (discount \(\gamma\)).
  • Single-agent value iteration (Slide 20, Eq. SA-Bellman):
    \(V_{k+1}(s)=\max_a \sum_{s'} P(s'\mid s,a)\big(R(s,a,s')+\gamma V_k(s')\big)\).
    \(Q^*(s,a)=R(s,a)+\gamma\sum_{s'}P(s'\mid s,a)V^*(s')\), \(V^*(s)=\max_a Q^*(s,a)\).
  • Two-player zero-sum Markov game backup (Slides 21–22, Eq. MG-Bellman):
    \(Q^*(s,a_1,a_2)=R(s,a_1,a_2)+\gamma\sum_{s'}P(s'\mid s,a_1,a_2)V^*(s')\).
    \(V^*(s)=\max_{\pi_1\in\Delta(A_1)}\min_{a_2\in A_2}\sum_{a_1}\pi_1(s,a_1)\,Q^*(s,a_1,a_2)\).
  • Minimax-Q algorithm (Slides 24–27, Eq. MMQ-update):
    \(Q(s,a_1,a_2)\leftarrow (1-\alpha)Q(s,a_1,a_2)+\alpha\big(r_1+\gamma V(s')\big)\).
    \(V(s)\leftarrow \min_{a_2}\sum_{a_1}\pi_1(s,a_1)Q(s,a_1,a_2)\).
    \(\pi_1(s,\cdot)\leftarrow \arg\max_{\pi_1'\in\Delta(A_1)} \min_{a_2}\sum_{a_1}\pi_1'(s,a_1)Q(s,a_1,a_2)\).
    Action selection: \(\epsilon\)-greedy w.r.t. \(\pi_1\).
  • LP to compute \(\max\min\) policy (Slide 31, Eq. LP): maximize \(v\) s.t.
    \(v \le \sum_{a_1}\pi_1'(s,a_1)Q(s,a_1,a_2)\ \forall a_2\); \(\sum_{a_1}\pi_1'(s,a_1)=1\); \(\pi_1'(s,a_1)\ge 0\).
  • Defaults/examples: Matching pennies example uses \(\gamma=0.9\) (Slide 29); init \(Q\leftarrow 1\), \(V\leftarrow 1\), \(\pi_1\) uniform; step size example \(\alpha=1/\#\text{visits}(a_1,a_2)\) (Slide 31).
  • Evaluation workflow (Slides 36–38): train \(\pi_1\) against (self-play / random / other learner), then test by fixing \(\pi_1\) and training/evaluating \(\pi_2\) including best response \(BR(\pi_1)\) via single-agent RL (treat fixed opponent as environment).
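
The max-min step inside Minimax-Q can be checked numerically on matching pennies. The sketch below substitutes a brute-force grid search for the LP described above, which only works because agent 1 has two actions; maxmin_policy is a hypothetical helper, and the payoff matrix is the standard matching-pennies table rather than values taken from the slides.

```python
def maxmin_policy(q, steps=1000):
    """Grid search for agent 1's max-min mixed policy in a single state.

    q[a1][a2] is agent 1's Q-value; agent 1 must have exactly two actions.
    Returns (p, value), where p is the probability of playing action 0.
    """
    best_p, best_value = 0.0, float("-inf")
    for i in range(steps + 1):
        p = i / steps
        # The opponent picks the column that minimizes agent 1's expected value.
        value = min(p * q[0][a2] + (1 - p) * q[1][a2] for a2 in range(len(q[0])))
        if value > best_value:
            best_p, best_value = p, value
    return best_p, best_value


# Matching pennies from agent 1's view: +1 on a match, -1 otherwise.
matching_pennies = [[1.0, -1.0],
                    [-1.0, 1.0]]
p_heads, game_value = maxmin_policy(matching_pennies)
```

The search lands on the uniform policy with value 0, the known solution of the game; for more than two actions the LP (or an equivalent solver) is needed because the policy space is no longer one-dimensional.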

🔍 Merging state after parallel nodes (reducers + ordering)

Explainer · source

How LangGraph combines state updates from parallel branches; reducer requirements; what ordering/determinism to expect.

Key content
  • Execution/merge model (parallel branches):
    • When multiple nodes run “in parallel” and each returns a partial state update (a dict), LangGraph merges updates field-by-field using the state schema’s reducers.
    • Reducer signature (Eq. 1): reducer(existing_value, new_value) -> updated_value
  • Reducer annotation in typed state (procedure):
    • Define a TypedDict state and annotate merge behavior with typing.Annotated[field_type, reducer].
    • Example (code pattern):
      • count: Annotated[int, operator.add] → updates accumulate (count += delta)
      • data: Annotated[dict, merge_dicts] where merge_dicts(existing, new) = {**existing, **new}
      • messages: Annotated[list, add_messages] → appends messages; add_messages is the recommended reducer for chat history (handles duplicates by message ID).
  • Default behavior (important):
    • Without a reducer, a field update overwrites the existing value (last write wins), so parallel branches can appear to “not merge.”
  • Built-in reducers / common choices:
    • operator.add: sums numbers / concatenates lists.
    • operator.or_: merges dicts (with overwrite on key conflicts).
    • Custom reducers for policies like max: keep_max(existing, new) = max(existing, new).
  • Ordering expectation (design rationale):
    • For parallel updates, do not rely on branch completion order for correctness; instead, make merges deterministic by using appropriate reducers (especially for lists/messages).
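
The ordering expectation can be tested directly by applying the same branch updates in every possible order. This is a minimal sketch using plain functions: merge, outcomes, and overwrite are hypothetical helpers, and merging dicts with operator.or_ assumes Python 3.9 or later.

```python
import operator
from functools import reduce
from itertools import permutations


def keep_max(existing, new):
    return max(existing, new)


def overwrite(existing, new):
    # Models a field with no reducer: last write wins.
    return new


def merge(initial, updates, reducer):
    return reduce(reducer, updates, initial)


branch_updates = [3, 7, 5]  # values returned by three parallel branches


def outcomes(initial, reducer):
    # Set of final values over every possible branch completion order.
    return {merge(initial, order, reducer) for order in permutations(branch_updates)}


sum_outcomes = outcomes(0, operator.add)
max_outcomes = outcomes(0, keep_max)
overwrite_outcomes = outcomes(0, overwrite)
merged_dict = operator.or_({"a": 1, "b": 1}, {"b": 2})
```

Sum and max give one outcome regardless of order, while overwrite gives a different result for each possible last writer. List concatenation sits in between: no update is lost, but element order still follows merge order.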