Changelog¶
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Copyright 2026 Firefly Software Foundation. Licensed under the Apache License 2.0.
[26.06.13] - 2026-06-22¶
Best-in-class auto-instrumentation, on by default — plus an observability-metrics de-dup.
Changed (behavioural)¶
- Native pydantic-ai instrumentation is now ON by default (native_instrumentation_enabled defaults to True, gated by observability_enabled). Every agent emits rich GenAI-convention spans and metrics for each model request (chat) and tool call (execute_tool), nested under the framework's agent.{name} span. Default-on is safe because the framework leaves the tracer/meter providers unset (telemetry flows through the global OTel API and costs nothing until the host wires an exporter) and prompt/response content is stripped by default (instrumentation_include_content=False). Set native_instrumentation_enabled=false to opt out (e.g. to cut span volume on very high-throughput hosts).
- Single source of truth for metrics. ObservabilityMiddleware no longer emits firefly.tokens.total / firefly.latency. These (along with firefly.cost.total and the prompt/completion token split) are now emitted solely by the usage/cost-sink path (UsageTracker → OTelMetricsSink); previously both paths emitted them, so every metrics backend double-counted. The middleware now owns only the span lifecycle (matching the existing agent.completed de-dup). Metrics flow when cost_tracking_enabled is set (default True). The span also now carries firefly.correlation_id.
Fixed¶
- Removed a stale "# RubricReviewer (placeholder — full implementation in subsequent tasks)" comment. RubricReviewer is fully implemented and tested, so the comment was misleading.
[26.06.12] - 2026-06-22¶
SP-10: native pydantic-ai OpenTelemetry instrumentation.
Added¶
- Native instrumentation for FireflyAgents via pydantic-ai's capabilities=[Instrumentation(InstrumentationSettings(...))]: rich GenAI-semantic-convention spans for each model request (chat <model>) and each tool call (execute_tool <name>), with provider, model, and token usage. Off by default; enable with native_instrumentation_enabled. Four config fields (env FIREFLY_AGENTIC_*):
  - native_instrumentation_enabled (False): master switch; effective only when observability_enabled is also True.
  - instrumentation_include_content (False): privacy-safe default; prompt/response text and tool args/results are stripped from spans unless opted in (overrides pydantic-ai's own include_content=True).
  - instrumentation_version (2): GenAI convention version (1–5).
  - instrumentation_event_mode (attributes): span attributes vs OTel logs.

  Uses the non-deprecated capabilities API (never Agent(instrument=), which emits a PydanticAIDeprecationWarning the SP-9 gate would reject). Providers are left unset, so spans flow through the global OTel API; the host owns the SDK/exporter.
Changed¶
- ObservabilityMiddleware now activates its agent.{name} span in the OTel context (attach in before_run, detach in after_run and on_error) so spans created during the run, notably the native instrumentation spans, nest under it instead of becoming disjoint roots in a separate trace. The span also carries firefly.correlation_id (the join key to cost records). The detach runs on every exit path (success, error, streaming, HITL pause) with no context-token leak.
Notes¶
- Two complementary altitudes per run: the coarse firefly agent.{name} envelope (name/method/correlation_id) and the fine native invoke_agent → chat / execute_tool tree (gen_ai.*). No duplicate model span: pydantic-ai de-dups internally.
- Verified with an in-memory OTel exporter (nesting, gating, privacy, no deprecation warning, one chat span per request) and live against Anthropic (real claude-haiku-4-5 spans with real token usage, nested, content stripped).
- This closes the deferred SP-10: the memory's "blocked on pydantic-ai 2.0 capabilities surface" was outdated; the Instrumentation capability is available and warning-free on the pinned 1.x line.
[26.06.11] - 2026-06-22¶
SP-3: human-in-the-loop tool approval re-based onto pydantic-ai native deferred-tools.
Added¶
- Native tool approval / HITL. Tools can declare requires_approval=True (firefly_tool(...), BaseTool, and threaded through ToolKit.as_pydantic_tools() / as_toolset()). When the model calls such a tool, the agent run pauses before executing it and returns a DeferredToolRequests as result.output. Detect with the new is_deferred(result) helper; resume via agent.run(message_history=..., deferred_tool_results=DeferredToolResults(approvals={call_id: True | ToolApproved(override_args=...) | ToolDenied(message=...)})). FireflyAgent auto-detects HITL (any approval-requiring tool/ToolKit/as_toolset(), or an ApprovalRequiredToolset in toolsets) and widens its output union to allow the pause only then; non-HITL agents are unchanged. Force with hitl=True.
- Inline (non-pausing) approval. FireflyAgent(approval_handler=...) resolves approvals inside the run via a native HandleDeferredToolCalls capability, for programmatic / policy-based auto-approval.
- Native re-exports from fireflyframework_agentic.tools: DeferredToolRequests, DeferredToolResults, ToolApproved, ToolDenied, ApprovalRequired (plus the already-exported ApprovalRequiredToolset). is_deferred and the ApprovalHandler type are exported from fireflyframework_agentic.agents.
Changed¶
- Post-run cross-cutting code now treats a paused run as a control object, not a final answer: _persist_memory, the output-guard, validation, cache, logging, and explainability middleware all skip a DeferredToolRequests output (preventing corrupted memory turns, spurious OutputGuardError / OutputReviewError, and caching a pause).
- Tool guard denials (validation / rate-limit / sandbox) now raise ToolGuardError instead of a plain ToolError. ToolGuardError subclasses ToolError, so existing except ToolError handlers are unaffected.
- BaseTool._guarded_execute now lets pydantic-ai's ApprovalRequired / CallDeferred control signals propagate untouched (like ModelRetry), instead of wrapping them as ToolError. This makes dynamic approval work: a tool body (with takes_ctx=True) may raise ApprovalRequired(metadata=...) to defer that specific call; pair with FireflyAgent(hitl=True) so the output union allows the pause.
Removed (breaking)¶
- ApprovalGuard (and the ApprovalCallback alias). The bespoke guard-chain approval (sync bool callback → ToolError on denial, no pause/resume/metadata) is replaced by the native protocol above. Migration: docs/migration.md §6.
Notes¶
- HITL stays three distinct layers by design: tool approval (native deferred-tools, agent layer), workflow human() / WorkflowInterrupt (journal-replay), and pipeline Pause / approve_pause (checkpoint). They are not collapsed.
- Validated against a live Anthropic model: a requires_approval tool pauses the real run (tool body does not execute), and resuming with approval runs it exactly once.
[26.06.10] - 2026-06-22¶
SP-5: native structured-output modes for reasoning patterns.
Added¶
- Selectable output modes on the real-model reasoning paths. Reasoning's structured calls (_structured_run) now wrap the output_type in a pydantic-ai output mode chosen via a new per-pattern output_mode= argument or the framework-wide reasoning_output_mode config value (FIREFLY_AGENTIC_REASONING_OUTPUT_MODE):
  - None (default): pydantic-ai's default tool-calling output (no behaviour change).
  - "tool": force tool-based structured output (ToolOutput).
  - "native": provider-native JSON-schema output (NativeOutput; OpenAI/Google/…).
  - "prompted": schema-in-prompt JSON parsing (PromptedOutput); the portable choice for models without tool-calling or native structured output.

  Threaded through all six concrete patterns (ReAct, Chain-of-Thought, Plan-and-Execute, Tree-of-Thoughts, Reflexion, Goal-Decomposition). New OutputMode type alias exported from fireflyframework_agentic.reasoning. Resolution order: per-pattern argument → config default → pydantic-ai default.
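The resolution order stated in this entry can be pictured with a small stand-alone sketch. It is not the framework's code: the OutputMode alias here is a local stand-in and resolve_output_mode is a hypothetical helper name, used only to show that an explicit per-pattern argument beats the config default, and that None leaves pydantic-ai's own default in place.

```python
from typing import Literal, Optional

# Local stand-in for the OutputMode alias named in the entry (assumption).
OutputMode = Optional[Literal["tool", "native", "prompted"]]


def resolve_output_mode(per_pattern: OutputMode, config_default: OutputMode) -> OutputMode:
    """Hypothetical helper: first explicitly set mode wins, None keeps the library default."""
    if per_pattern is not None:
        return per_pattern
    return config_default


print(resolve_output_mode("prompted", "native"))  # prompted
print(resolve_output_mode(None, "native"))        # native
print(resolve_output_mode(None, None))            # None
```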
Notes¶
- The model-less duck-typed fallback (_fallback_parse, used by mocks/agents without a resolvable model) is unchanged: it cannot make an LLM call, so it keeps its text-parsing graceful-degradation cascade. Output modes apply only to the two real-model paths (FireflyAgent route + ephemeral agent).
- Validated against a live Anthropic model: a Chain-of-Thought pattern with output_mode="prompted" produces correct validated structured steps end to end.
[26.06.9] - 2026-06-22¶
Documentation coverage pass — every 26.06.x change is now explained, with two new reference guides and several corrected docs (driven by a per-doc audit verified against the source).
Added¶
- docs/resilience.md: the circuit breaker (state machine, direct async with CircuitBreaker(...) usage, CircuitBreakerMiddleware agent wiring, inspection/reset, API reference).
- docs/storage.md: the managed-SQLite durable layer (StorageBackend, LocalBackend, DatabaseStore, WriteSession / LockToken leasing, atomic writes, the SqliteVecVectorStore consumer, types/exceptions reference).
- Multi-Provider Support section in docs/architecture.md consolidating the "works with any provider" story (identity normalisation, provider-aware cost, prompt-cache routing, tool-schema portability, failover/rate-limit handling).
Changed (docs)¶
- agents.md: documented the middleware error lifecycle (on_error / LIFO run_error), the ObservabilityMiddleware span-on-error and CircuitBreaker failure-recording-via-on_error, MiddlewareContext fields, and model_settings / default_temperature (provider-safe None default). Fixed a factual error: corrected the OpenAI prompt-cache mechanism and documented that Claude via Bedrock/OpenRouter is skipped with a warning.
- reasoning.md: documented that structured runs route through the source FireflyAgent (middleware/retry/usage) rather than a bare pydantic_ai.Agent.
- observability.md: provider-agnostic identity → cost, provider-aware reasoning tokens (Gemini thoughts_tokens), the strict-mode/Budget-priceable-only caveat, and the Bedrock vendor-prefix retry; fixed the misleading reasoning-token claim.
- tools.md: corrected retryable (wraps _execute), documented CachedTool single-flight, the FallbackComposer chained traceback, and the Gemini free-form-schema constraint.
- workflows.md: stream() structured-output fallback, the precise using= rejection/forwarding behaviour, and cascade() output_type / instructions / max_escalations.
- tutorial.md: corrected the stale default_temperature default.
- docs index + README Module Reference now list Resilience and Storage.
Fixed¶
- The middleware error lifecycle is now uniform: run_with_reasoning() also fires on_error on failure (previously only run / run_sync / run_stream did), so circuit breakers and span cleanup work for reasoning runs too.
[26.06.8] - 2026-06-22¶
Provider-aware reasoning-token cost accounting (COST-001/COST-002) — the last open audit item.
Fixed¶
- Gemini thinking tokens are now priced. Gemini reports thinking under usage.details["thoughts_tokens"] and excludes them from output_tokens, so they were previously uncounted (a ~4× undercount for thinking-heavy calls). The agent, reasoning and workflow cost paths now fold them in at the output rate via a shared reasoning_tokens_not_in_output(usage) helper.
- No double-counting on OpenAI/Anthropic. The helper reads the Gemini-specific thoughts_tokens key only. OpenAI's reasoning_tokens are already inside output_tokens (reading them would inflate o-series cost ~53%) and Anthropic folds thinking into output_tokens too, so both contribute 0. Corrected the stale resolver docstrings that implied otherwise.
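The accounting rule in these two fixes can be illustrated with a minimal sketch. The Usage dataclass below is a hypothetical stand-in for a provider usage object, billable_output_tokens is an invented helper, and the body of reasoning_tokens_not_in_output is a guess at the behaviour the entry describes, not the framework's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Usage:
    """Hypothetical minimal usage record; real usage objects carry more fields."""
    input_tokens: int = 0
    output_tokens: int = 0
    details: dict[str, int] = field(default_factory=dict)


def reasoning_tokens_not_in_output(usage: Usage) -> int:
    # Only the Gemini-specific key is read. OpenAI's "reasoning_tokens" are
    # already counted inside output_tokens, so they are deliberately ignored.
    return int(usage.details.get("thoughts_tokens", 0) or 0)


def billable_output_tokens(usage: Usage) -> int:
    # Tokens to price at the output rate (helper name is an assumption).
    return usage.output_tokens + reasoning_tokens_not_in_output(usage)


gemini = Usage(input_tokens=100, output_tokens=50, details={"thoughts_tokens": 150})
openai = Usage(input_tokens=100, output_tokens=200, details={"reasoning_tokens": 150})

print(billable_output_tokens(gemini))  # 200
print(billable_output_tokens(openai))  # 200
```

With these illustrative numbers, ignoring the Gemini key would bill 50 tokens instead of 200, while adding the OpenAI key would bill 350 instead of 200.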
Changed¶
- genai-prices pinned to >=0.0.66,<0.1. It is a pre-1.0 package whose Usage / calc_price surface could change between minors and break all cost resolution at once; bumps are now deliberate.
- provider_reported_cost (OpenRouter authoritative per-call USD) documented as a seam for custom integrations: pydantic-ai 1.107 does not surface that cost on the result/usage, so it is populated only when a caller passes provider_payload.
[26.06.7] - 2026-06-21¶
The deferred P2/P3 wave from the framework audit — provider robustness, durability, thread-safety and observability of failures.
Fixed¶
- Cache stampede: CachedTool now single-flights concurrent identical misses (the first caller computes; the rest await the same result), and uses an asyncio.Lock, so an expensive/rate-limited tool runs once per key.
- Atomic durable writes: FileCheckpointer (pipeline) and FileJournalBackend (workflows) write to a temp file and then os.replace it over the target, so a crash mid-write can't leave a truncated checkpoint / resume journal.
- Delegation thread-safety: RoundRobinStrategy (cursor) and ContentBasedStrategy (lazy router build) are guarded by locks; concurrent routing no longer races the cursor or constructs duplicate routers.
- Prompt-cache no longer silently no-ops on Bedrock/OpenRouter: Claude via those providers logs a warning and skips (the Anthropic cache settings only apply to the direct AnthropicModel) instead of writing dead settings.
- Bedrock cost resolution: on a genai-prices miss for a bedrock:vendor.model id, retry on the bare model name with the vendor as the provider before giving up.
- Provider retry hints: the rate-limit backoff prefers a provider's structured retry hint (e.g. Gemini ResourceExhausted.retry_delay) before the OpenAI/Anthropic-shaped textual body.
- RetryMiddleware.backoff_multiplier is now honoured (it was dead config: the override path never passed it to AdaptiveBackoff).
- Gemini-safe tool schema: the built-in DatabaseTool params argument is now a JSON-object string instead of a free-form dict[str, Any] (whose open object schema Gemini's FunctionDeclaration rejects); a dict is still accepted from non-LLM callers.
- Error context preserved: FallbackComposer and run_with_fallback keep the original traceback (raise … from …) instead of discarding it on exhaustion.
- Usage-recording failures now log at warning (not debug), so a silently-broken cost/budget pipeline is visible.
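The single-flight behaviour described in the cache-stampede fix can be sketched with plain asyncio. This is an illustration under assumptions, not CachedTool's code: SingleFlightCache and its get method are hypothetical names. The first caller for a key becomes the leader and computes; later callers for the same key wait on the leader's future.

```python
import asyncio
from typing import Any, Awaitable, Callable


class SingleFlightCache:
    """Hypothetical single-flight cache; not the framework's CachedTool."""

    def __init__(self) -> None:
        self._values: dict[str, Any] = {}
        self._inflight: dict[str, asyncio.Future] = {}
        self._lock = asyncio.Lock()

    async def get(self, key: str, compute: Callable[[], Awaitable[Any]]) -> Any:
        async with self._lock:
            if key in self._values:
                return self._values[key]
            future = self._inflight.get(key)
            leader = future is None
            if leader:
                future = asyncio.get_running_loop().create_future()
                self._inflight[key] = future
        if not leader:
            # shield() keeps a cancelled follower from cancelling the shared future.
            return await asyncio.shield(future)
        try:
            value = await compute()
            self._values[key] = value
            future.set_result(value)
            return value
        except Exception as exc:
            future.set_exception(exc)
            future.exception()  # mark as retrieved so asyncio does not warn if nobody waited
            raise
        finally:
            if not future.done():
                future.cancel()  # leader was cancelled: release any followers
            self._inflight.pop(key, None)


async def main() -> int:
    calls = 0

    async def expensive() -> str:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)
        return "result"

    cache = SingleFlightCache()
    results = await asyncio.gather(*(cache.get("k", expensive) for _ in range(5)))
    assert results == ["result"] * 5
    return calls


print(asyncio.run(main()))  # 1
```

Five concurrent misses on the same key trigger one call to the expensive function; failures are propagated to every waiter rather than cached.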
Changed¶
- QuotaManager exposes a public backoff property; framework internals no longer reach into the private _backoff.
Deferred (documented)¶
- COST-001/COST-002 (provider-aware reasoning-token pricing + provider_payload plumbing) remain open: a naive caller-side reasoning-token add would double-count OpenAI (whose output_tokens already includes them), so the correct fix needs provider-aware normalisation and is tracked separately.
[26.06.6] - 2026-06-21¶
Provider-agnosticism, cross-subsystem cohesion and correctness hardening, driven by a framework-wide audit (6 lenses, adversarially verified) and validated against a live Anthropic model end to end. All nine confirmed P0/P1 findings addressed.
Fixed¶
- Provider identity for Model objects: model_utils.extract_model_info now reads the provider from the model's own _provider.name, so OpenAIResponsesModel, GoogleModel, XaiModel, OpenRouterModel, Azure/DeepSeek, etc. resolve to the correct provider:model instead of dropping the provider (which corrupted cost lookup, quota keys and usage grouping for any non-string model).
- Model-family detection: match on the model name only, fixing ollama:… / groq:… being misclassified as meta (the substring llama lived inside ollama); added grok → xai; multi-vendor proxies fall back to unknown.
- Structured-output streaming no longer crashes: StreamHandle.text() falls back to stringified stream_output() snapshots and stream_tokens() raises a clear AgentError when a run is non-text, instead of an opaque pydantic-ai UserError.
- Middleware error lifecycle: added an on_error hook to the middleware protocol/chain and wired it into run / run_sync / run_stream. The CircuitBreakerMiddleware can now actually open, and ObservabilityMiddleware ends its OTel span on a failed run instead of leaking it.
- Reasoning runs through FireflyAgent: a reasoning pattern's structured calls now route through the source FireflyAgent (middleware, 429 retry, usage recording) when available, instead of a bare ephemeral pydantic_ai.Agent; the recorded model identifier is normalised (no more str(Model) reprs in cost).
- cost_strict is honoured on every path: the workflow price_call and the agent usage path no longer swallow UnknownModelCostError, so an unpriceable model fails closed under cost_strict instead of being billed as $0.
- retryable() applies where it matters: it wraps the _execute hook, so retries now cover the pydantic-ai handler path and RunContext-aware tools (not just a direct tool.execute() call).
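The model-family misclassification is easy to reproduce in isolation. The sketch below is hypothetical: the function names and the marker table are illustrative, and only the llama → meta and grok → xai mappings come from this entry. It contrasts matching on the whole provider:model string with matching on the model name alone.

```python
# Illustrative marker table (assumption); the entry only confirms llama and grok.
FAMILY_MARKERS = {
    "llama": "meta",
    "grok": "xai",
}


def detect_family_buggy(model_id: str) -> str:
    # Matches the whole "provider:model" string, so the provider prefix
    # "ollama" is mistaken for a Llama model.
    return "meta" if "llama" in model_id.lower() else "unknown"


def detect_family(model_id: str) -> str:
    # Match on the model name only: drop everything up to the first colon.
    name = model_id.split(":", 1)[-1].lower()
    for marker, family in FAMILY_MARKERS.items():
        if marker in name:
            return family
    return "unknown"


print(detect_family_buggy("ollama:qwen2.5"))  # meta
print(detect_family("ollama:qwen2.5"))        # unknown
print(detect_family("ollama:llama3.1"))       # meta
print(detect_family("xai:grok-3"))            # xai
```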
Changed¶
- default_temperature is wired and provider-safe: it now defaults to None (use the provider's own default) and is merged into an agent's model settings only when configured and the caller omits it; previously the knob was silently ignored. This avoids forcing a temperature on models that reject one (e.g. OpenAI o1 / o3).
- config.budget_limit_usd docstring corrected (it raises BudgetExceededError, not "logs a warning") and documents that the gate enforces only over priceable models.
Added¶
- Live end-to-end test suite (tests/integration/test_real_anthropic_e2e.py, @pytest.mark.nightly, skipped without a real ANTHROPIC_API_KEY) covering the whole stack against a real Anthropic model: agent + tools, structured output, streaming, multi-turn memory, a reasoning pattern, a pipeline, a Dynamic Workflow (FireflyAgentRunner + using=) and cost tracking. Closes the previous gap where every provider-facing behaviour was validated only against mocks; tests/README.md corrected to match.
[26.06.5] - 2026-06-21¶
Framework alignment: the Dynamic Workflows engine and the tools system now run on the framework's own primitives and on pydantic-ai's native model — no parallel implementations, no lossy shims. Includes breaking changes; see the Migration Guide.
Changed (BREAKING)¶
- Tool parameters use a real python_type. ParameterSpec and ToolBuilder.parameter(name, python_type, …) now take a real Python type object (list[str], Literal[...], a nested BaseModel, dict[str, Any] | None, …) instead of a string type_annotation. pydantic-ai introspects it directly, so the LLM gets full-fidelity schemas (element types, enums and nested models are preserved). The string type_annotation field and its resolver (_TYPE_MAP, _resolve_param_type) are removed; all built-in tools were migrated.
- FireflyAgentRunner is the default workflow runner. Workflow sub-agents now run through a FireflyAgent (middleware, observability, guards, 429-retry, global usage tracker / budget gate, model fallback) instead of a bare pydantic_ai.Agent. Global cost tracking and a configured budget_limit_usd now apply to sub-agents. The model-resolution contract is unchanged; pass runner=DefaultAgentRunner() for the previous lightweight path.
Added¶
- FireflyAgentRunner: runs each workflow call through a FireflyAgent; its source may be None (a fresh isolated ephemeral agent per call), a FireflyAgent instance (reused), a registry name, or a factory. Tokens/cost are booked once per ledger (per-run WorkflowBudget and the global tracker, disjoint).
- agent(..., using=<FireflyAgent | name>) (and stream(..., using=)): target a specific configured agent for one call, enabling multi-model sub-agents and per-task cost optimisation. Composes with SmartRoutingRunner.
- Tool RunContext opt-in: BaseTool(..., takes_ctx=True) delivers pydantic-ai's RunContext (agent deps, usage, retries) to _execute as the keyword-only _ctx; guards and the cache never see it, so it cannot poison a cache key.
- Native toolset combinators re-exported from fireflyframework_agentic.tools: RunContext, FilteredToolset, PrefixedToolset, RenamedToolset, CombinedToolset, WrapperToolset, PreparedToolset, ApprovalRequiredToolset, plus the to_pydantic_handler(tool) helper.
- Migration guide (docs/migration.md).
Fixed¶
- ToolKit.as_toolset() now forwards each tool's description to the model (pydantic-ai's add_function dropped it before), so toolset tools are no longer description-less to the LLM.
[26.06.4] - 2026-06-21¶
Dynamic Workflows — the final SOTA wave: token-level streaming and end-to-end static typing of the DSL.
Added¶
- Streaming: stream(prompt, ...) is an async context manager that streams one sub-agent's output token-by-token. Iterate handle.text() for deltas, then read handle.output for the full result after the block. It honours the budget, concurrency gate, journal (a resumed call yields its cached output once) and cost accounting exactly like agent(). Backed by a StreamingAgentRunner protocol that DefaultAgentRunner implements via pydantic-ai's run_stream; a non-streaming runner raises WorkflowError. Streamed calls emit agent.start / agent.end with stream: True. New exports: stream, StreamHandle, StreamingAgentRunner.
- Typed generics: agent(output_type=T) is now typed to return T (via @overload) instead of Any, and @workflow produces a Workflow[OutputT] inferred from the function's return annotation, so await my_workflow(args) is statically typed end-to-end with no casts.
[26.06.3] - 2026-06-20¶
Dynamic Workflows — the durable-composition wave: sub-workflows, durable resume, and human-in-the-loop (the top remaining SOTA gaps from the analysis).
Added¶
- Sub-workflows: subworkflow(name_or_wf, args) runs another workflow inline, inheriting the parent's budget, concurrency gate, journal and runner (one deterministic sequence stream across the nested run). Emits subworkflow.start / subworkflow.end.
- Durable resume: JournalBackend protocol + FileJournalBackend. Attach one to a Journal (Journal(backend=…, run_key=…)) and every completed call flushes to durable storage, so an out-of-process crash resumes from the last call.
- Human-in-the-loop: human(prompt) pauses a run by raising WorkflowInterrupt; the caller provide()s the answer and re-runs with the same journal to resume past the pause. Sequence-keyed, so resume is deterministic; pairs with a JournalBackend for approvals that survive a restart. Emits human.pause.
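The sequence-keyed pause-and-resume idea can be shown with a toy journal. Everything below is a hypothetical stand-in written for illustration (the classes, step, human and their signatures are assumptions and do not mirror the framework's API). It only demonstrates why re-running with the same journal skips completed calls and resumes past the pause.

```python
class WorkflowInterrupt(Exception):
    """Hypothetical stand-in: raised when a run needs a human answer."""

    def __init__(self, seq: int, prompt: str) -> None:
        super().__init__(prompt)
        self.seq = seq
        self.prompt = prompt


class Journal:
    """Toy journal: records each call's result under its sequence number."""

    def __init__(self) -> None:
        self.entries: dict[int, object] = {}
        self._seq = 0

    def begin_run(self) -> None:
        self._seq = 0  # every (re-)run replays the same deterministic sequence

    def next_seq(self) -> int:
        self._seq += 1
        return self._seq

    def provide(self, seq: int, answer: object) -> None:
        self.entries[seq] = answer


def step(journal: Journal, fn):
    seq = journal.next_seq()
    if seq not in journal.entries:  # only execute calls that are not journaled yet
        journal.entries[seq] = fn()
    return journal.entries[seq]


def human(journal: Journal, prompt: str):
    seq = journal.next_seq()
    if seq in journal.entries:
        return journal.entries[seq]
    raise WorkflowInterrupt(seq, prompt)


executed: list[str] = []


def draft() -> str:
    executed.append("draft")
    return "draft-v1"


def workflow(journal: Journal) -> str:
    journal.begin_run()
    text = step(journal, draft)
    verdict = human(journal, f"Approve {text}?")
    return f"{text}:{verdict}"


journal = Journal()
try:
    workflow(journal)
except WorkflowInterrupt as pause:
    journal.provide(pause.seq, "approved")

result = workflow(journal)
print(result)    # draft-v1:approved
print(executed)  # ['draft']
```

The draft step runs once even though the workflow function runs twice, which is the property that makes the resume deterministic.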
Fixed¶
- WorkflowInterrupt is in the never-swallow set, so a human() pause inside a parallel / pipeline branch propagates instead of being silenced.
[26.06.2] - 2026-06-20¶
Dynamic Workflows: smart model routing, multi-model cost optimization, and the budget/quality wiring that makes them observable. Driven by a SOTA gap analysis (verdict: sound architecture, under-wired — the cost stack existed but wasn't connected to the engine).
Added¶
- SmartRoutingRunner: a drop-in AgentRunner that picks the cheapest capable model per call from ordered tiers, with fallback escalation on transient errors. Pluggable ModelSelectionStrategy (ComplexityHeuristicStrategy is the default and is training-free; CostFloorStrategy picks the genai-prices cheapest). Emits route.select / route.escalate events. An explicit agent(model=…) always wins.
- cascade(): cheap-first, escalate-on-low-confidence (FrugalGPT-style), returning a CascadeResult; a judge model scores each tier by default. Emits cascade.tier.
- USD + wall-clock budgets: WorkflowBudget(max_cost_usd=…, max_wall_seconds=…); DefaultAgentRunner now prices every call via genai-prices (AgentCall.cost_usd), and WorkflowContext exposes cost_spent_usd / remaining_cost_usd().
- judge_panel() / Verdict: heterogeneous-model verification with a structured verdict.
- map_agents(): concurrent map sugar over parallel (removes the late-binding-lambda footgun).
- price_model() helper.
Fixed¶
- Budget kill-switch swallowed in parallel / pipeline. A WorkflowBudgetError raised inside a fan-out branch resolved to None instead of aborting the run; structural/kill-switch errors now propagate (ordinary branch failures still resolve to None).
- run_id collisions. Default run id is now f"{name}-{uuid4().hex[:8]}" (was f"{name}-run"), so concurrent runs no longer merge in logs/telemetry.
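Both fixes can be sketched in a few lines of asyncio. The parallel function, the NEVER_SWALLOW tuple and the exception class below are hypothetical stand-ins, not the engine's implementation. The sketch shows ordinary branch failures resolving to None while a kill-switch error aborts the fan-out, plus the run-id format quoted in the entry.

```python
import asyncio
from uuid import uuid4


class WorkflowBudgetError(Exception):
    """Hypothetical stand-in for the kill-switch error named above."""


# Errors that must abort the whole run instead of resolving to None (assumption).
NEVER_SWALLOW = (WorkflowBudgetError,)


async def parallel(*branches):
    results = await asyncio.gather(*branches, return_exceptions=True)
    for item in results:
        if isinstance(item, NEVER_SWALLOW):
            raise item  # kill-switch errors propagate
    # Ordinary branch failures still resolve to None.
    return [None if isinstance(item, BaseException) else item for item in results]


def default_run_id(name: str) -> str:
    # Format quoted in the changelog entry.
    return f"{name}-{uuid4().hex[:8]}"


async def ok() -> str:
    return "ok"


async def flaky() -> str:
    raise ValueError("transient")


async def over_budget() -> str:
    raise WorkflowBudgetError("budget exhausted")


async def main():
    soft = await parallel(ok(), flaky())
    try:
        await parallel(ok(), over_budget())
        aborted = False
    except WorkflowBudgetError:
        aborted = True
    return soft, aborted


soft, aborted = asyncio.run(main())
print(soft)     # ['ok', None]
print(aborted)  # True
print(len(default_run_id("triage")))  # 15
```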
[26.06.1] - 2026-06-20¶
Completeness & wiring fixes for the new subsystems, validated end-to-end against a real model (structured output, parallel fan-out, pipeline, a sub-agent calling a toolset tool, budget enforcement, journal resume, adversarial verify, and a connected workflow + secure-execution run).
Added¶
- Workflow sub-agents can use tools. agent() (and the AgentRunner seam / DefaultAgentRunner) now accept tools= and toolsets=, so a workflow sub-agent can use a ToolKit.as_toolset(), an MCP server, or raw tools, just like a top-level agent. deps now also sets the underlying agent's deps_type.
Fixed¶
- Code Mode async tools. MontyEnvironment.run_code now bridges async external functions (e.g. Firefly tools exposed via toolkit_external_functions) to sync, so sandboxed guest code calls them naturally without await. Previously an async tool returned an unresolved coroutine to the script.
[26.06.0] - 2026-06-20¶
pydantic-ai modernization program (phase 1): dependency upgrade + deprecation / stability fixes, two new headline subsystems (Dynamic Workflows, Secure Script Execution), and native-capability adoption (message persistence, toolsets).
Added¶
- Dynamic Workflows engine (fireflyframework_agentic.workflows): a code-defined orchestration DSL over pydantic-ai agents, mirroring Claude's Workflow mechanism. Provides @workflow, agent, parallel (barrier; failures → None), pipeline (streaming, no inter-stage barrier), phase, log; WorkflowBudget (concurrency / agent-count / token ceilings); Journal deterministic resume; pluggable AgentRunner (DefaultAgentRunner + test fakes); workflow_registry / run_workflow; verify combinators (adversarial_verify, loop_until_dry). See docs/workflows.md.
- Secure Script Execution (fireflyframework_agentic.execution, new [script-execution] extra): run untrusted/generated Python in a self-hosted, deny-by-default sandbox. ExecutionEnvironment protocol; MontyEnvironment (pydantic-monty Rust micro-interpreter with no FS/network/env; host access only via registered external functions); SecureScriptRunner (validate → execute → capture with optional output scrubbing); analyze_code / SafetyPolicy AST pre-screen; ExecutionLimits; Firefly Code Mode (toolkit_external_functions). See docs/execution.md.
- Native message persistence: model_utils.serialize_model_messages / deserialize_model_messages (built on pydantic-ai's ModelMessagesTypeAdapter); ConversationMemory export/import now round-trips typed ModelMessage history losslessly (previously dropped as "not portable"). Backward-compatible with pre-existing exports.
- Native toolsets: ToolKit.as_toolset() returns a composable pydantic-ai FunctionToolset; FireflyAgent(toolsets=...) passes toolsets through to the underlying agent (enables WrapperToolset / ApprovalRequiredToolset / MCP servers).
- Deprecation CI gate: pytest promotes pydantic / pydantic-ai deprecation warnings to errors so a future dependency bump that reintroduces one fails CI immediately (the framework's own intentional DeprecationWarnings are unaffected).
Changed¶
- Upgraded pydantic-ai 1.99.0 → 1.107.0 and pinned >=1.107.0,<2 (the previously-unbounded requirement could resolve to the breaking 2.0 line); pydantic>=2.13,<3; pydantic-settings>=2.14.2,<3.
- AgentRunResult.usage is now read as a property via the new observability.usage.resolve_run_usage helper (no PydanticAIDeprecationWarning), with forward-compatible support for method-style/legacy usage objects.
- observability/decorators use inspect.iscoroutinefunction (the asyncio variant is removed in Python 3.16); the content-based delegation router uses instructions= and is cached.
Fixed¶
- OutputGuardMiddleware output sanitisation degraded results to a bare string (its _replace guard never matched the dataclass AgentRunResult), dropping usage/messages/type; now preserved via dataclasses.replace / NamedTuple / mutable handling.
- RetryMiddleware was a silent no-op: its configuration never reached the retry loop; a per-agent RetryMiddleware now takes effect.
- Cost (_firefly_cost_usd) was read for logging but never written; now wired from the recorded UsageRecord.
- Nullable/generic tool parameter annotations ("str | None", "dict[str, Any] | None") fell back to str, producing incorrect non-nullable JSON schemas for the LLM; now resolved correctly (Optional/union/generic forms).
- Removed the dead ModelRetry optional-import guard (pydantic-ai is a hard dependency).
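The OutputGuardMiddleware fix names three result shapes: dataclass, NamedTuple and mutable object. A minimal sketch of that dispatch follows. with_output, FakeRunResult and TupleResult are hypothetical names for illustration; the real middleware and AgentRunResult are not reproduced here.

```python
import dataclasses
from typing import Any, NamedTuple


def with_output(result: Any, new_output: Any) -> Any:
    """Return the result with .output swapped, keeping every other field (sketch)."""
    if dataclasses.is_dataclass(result) and not isinstance(result, type):
        return dataclasses.replace(result, output=new_output)
    if isinstance(result, tuple) and hasattr(result, "_replace"):
        return result._replace(output=new_output)  # NamedTuple path
    if hasattr(result, "output"):
        result.output = new_output  # mutable object path
        return result
    return new_output  # nothing to preserve: plain value


@dataclasses.dataclass(frozen=True)
class FakeRunResult:
    """Hypothetical stand-in for a dataclass run result."""
    output: str
    usage: dict


class TupleResult(NamedTuple):
    output: str
    usage: dict


cleaned = with_output(FakeRunResult("secret=abc", {"tokens": 12}), "[redacted]")
print(cleaned)  # FakeRunResult(output='[redacted]', usage={'tokens': 12})
print(with_output(TupleResult("secret=abc", {"tokens": 3}), "[redacted]").usage)  # {'tokens': 3}
```

A guard that only checks for _replace skips the dataclass branch entirely, which is how a result can degrade to the bare sanitised string.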
[26.05.33] - 2026-05-31¶
Removed¶
- BREAKING: REST/queue exposure layer. Deleted the fireflyframework_agentic.exposure package (FastAPI app factory, HTTP/WS controllers, health probes, SSE, CORS/rate-limit/auth middleware, and Kafka/RabbitMQ/Redis consumer/producer hosts), the rest / kafka / rabbitmq / redis / queues extras, the ExposureError / QueueConnectionError exceptions, and the REST-serving config fields auth_api_keys / auth_bearer_tokens / cors_allowed_origins. Serving/hosting is now owned by the consuming service. The framework is a pure in-process library: it serves no port and consumes no broker.
- BREAKING: service/infra observability. Removed observability.configure_exporters (global OTel SDK provider/exporter wiring), the W3C trace-context propagation helpers (inject_trace_context / extract_trace_context / get_trace_context / set_trace_context / trace_context_scope), the WebhookSink, and the otlp_endpoint config field. The framework still emits model/agent spans/metrics via the OpenTelemetry API; configuring the SDK/exporters and cross-service trace propagation is now the host's responsibility.
- BREAKING: inbound RBAC auth. Removed security.RBACManager / require_permission, the rbac_enabled / rbac_jwt_secret / rbac_multi_tenant config fields, and the pyjwt dependency from the security extra (cryptography stays for EncryptedMemoryStore). Inbound-request authorization is a hosting concern owned by the service.
Changed¶
- experiments / lab documented as optional leaf developer-tooling modules (no code or dependency change; they were already not imported by the core).
[26.05.32] - 2026-05-31¶
Fixed¶
- QdrantVectorStore.delete is now namespace-scoped: it deletes only points matching both the namespace and the requested ids (via a FilterSelector combining a _namespace FieldCondition with HasIdCondition), mirroring the namespace filter applied on search. Previously it deleted by a bare id list, ignoring the namespace.
Changed¶
- scope_namespace validates its inputs: it rejects empty components or components containing /, so distinct (tenant_id, workspace_id) scopes can never encode to a colliding namespace. The guard lives where the namespace is built rather than trusting callers.
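Why the validation matters is clearest with a collision example. The sketch below assumes the canonical "t/<tenant>/w/<workspace>" encoding described in the 26.05.31 entry. The function bodies are illustrative guesses, not the framework's implementation, and unchecked_namespace is a hypothetical helper that shows what happens without the guard.

```python
def unchecked_namespace(tenant_id: str, workspace_id: str) -> str:
    # Hypothetical unguarded encoder, shown only to demonstrate the collision.
    return f"t/{tenant_id}/w/{workspace_id}"


def scope_namespace(tenant_id: str, workspace_id: str) -> str:
    for label, value in (("tenant_id", tenant_id), ("workspace_id", workspace_id)):
        if not value:
            raise ValueError(f"{label} must not be empty")
        if "/" in value:
            raise ValueError(f"{label} must not contain '/'")
    return f"t/{tenant_id}/w/{workspace_id}"


def parse_scope_namespace(namespace: str) -> tuple[str, str]:
    parts = namespace.split("/")
    if len(parts) != 4 or parts[0] != "t" or parts[2] != "w" or not parts[1] or not parts[3]:
        raise ValueError(f"not a scoped namespace: {namespace!r}")
    return parts[1], parts[3]


# Two distinct scopes encode to the same string when "/" is allowed.
print(unchecked_namespace("a/w/b", "c") == unchecked_namespace("a", "b/w/c"))  # True

print(scope_namespace("acme", "research"))            # t/acme/w/research
print(parse_scope_namespace("t/acme/w/research"))     # ('acme', 'research')

try:
    scope_namespace("a/w/b", "c")
except ValueError as exc:
    print(exc)  # tenant_id must not contain '/'
```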
[26.05.31] - 2026-05-31¶
Added¶
- pgvector vector store: fireflyframework_agentic.vectorstores.PgVectorVectorStore, an asyncpg-backed BaseVectorStore peer to the Chroma / Pinecone / Qdrant adapters. Owns its table with an HNSW cosine index, namespace-scoped storage, idempotent runtime schema bootstrap, and metadata filtering. Adds an overridable _prepare_session(conn, *, namespace) per-transaction hook (default no-op) for connection-level session setup, e.g. SET LOCAL for Postgres Row-Level Security GUCs. New optional extra [vectorstores-pgvector] (asyncpg); requires the pgvector extension on the server. This adds the one vector backend the framework was missing.
- Tenant-scoped vector store layer (fireflyframework_agentic.vectorstores.scoped): ScopedVectorStore (an explicit, fail-loud Protocol with required keyword-only tenant_id / workspace_id) and TenantScopedVectorStore, a backend-agnostic wrapper that folds (tenant_id, workspace_id) into the canonical "t/<tenant>/w/<workspace>" namespace (and stamps it onto document metadata), making any VectorStoreProtocol backend multi-tenant with one wrapper. Adds scope_namespace / parse_scope_namespace helpers. The existing single-namespace VectorStoreProtocol is unchanged (additive, non-breaking).
Changed¶
- QdrantVectorStore now creates its collection on initialise() (cosine distance, sized to vector_size, idempotent) and exposes close(). Previously the collection had to be created out-of-band before the first upsert.
Fixed¶
- QdrantVectorStore search now uses query_points instead of the removed AsyncQdrantClient.search, restoring compatibility with qdrant-client >= 1.12 (the method was dropped upstream).
[26.05.30] - 2026-05-31¶
Added¶
- fireflyframework_agentic.content.binary: a host-agnostic binary normalisation stack that turns uploaded files (PDF, Office, images, archives, emails) into consumer-ready BinaryArtifact rows for document loaders or multimodal LLMs. Plain classes + a BinaryConfig DTO (no DI framework), pluggable OfficeConverter (Gotenberg / LibreOffice / NoOp) via build_office_converter. New optional extra [binary] (pypdf, Pillow, pillow-heif, cairosvg, py7zr, extract-msg). This unifies the normalizers previously duplicated in the flycanon and flydocs services.
Removed (BREAKING)¶
- RAG subsystem: deleted fireflyframework_agentic.rag (CorpusAgent, SqliteCorpus, StoredChunk, ChunkHit, HybridRetriever, reciprocal_rank_fusion, ingest/retrieval pipelines) and tools.builtins.corpus_rag. Consumers that used the corpus dataclasses / hybrid retriever should vendor them (flycanon now owns its StoredChunk / ChunkHit / HybridRetriever locally). The reusable embeddings, vectorstores, content and storage modules are unchanged.
- MCP subsystem: deleted fireflyframework_agentic.exposure.mcp (server, HTTP CLI, OAuth/Entra auth, transports), the firefly-mcp-http console script, and the mcp + corpus-search optional extras.
- corpus_search example and its docs (corpus-search-overview, use-case-corpus-search, comparison-vs-qmd, deploy/mcp-corpus-auth, deploy/corpus-persistence), the .mcp.json.template, and the MCP-server Dockerfile.
- Azure deployment/infra: removed the deploy-mcp.yml workflow (Azure Container Apps deploy of the MCP server), the Azurite / Azure-OIDC / Key Vault machinery from the nightly workflow, the azure optional extra (azure-identity / azure-keyvault-secrets / msal / azure-monitor exporter), the Application Insights / Azure Monitor OTel exporter from observability.exporters (observability stays vendor-neutral: console / OTLP), and the dead Entra ID config fields. Kept the AzureEmbedder Azure OpenAI model provider (azure-embeddings extra).
- MarkItDown: removed the Microsoft markitdown document converter: deleted content.loaders (MarkitdownLoader + the loaders package) and the markitdown optional extra. Services that relied on the universal MarkItDown loader now use native per-format loaders.
- Dead Azurite test fixture (and its mcr.microsoft.com/azure-storage/azurite image reference) and the stale corpus / MCP / Azure entries in .env.template.
Changed¶
- markdown-it-py (used by content.markdown_chunker) is promoted from the removed markitdown extra to a core dependency.
- CI workflows (pr-gate, nightly) install --extra binary and no longer install the removed mcp / corpus-search / azure / markitdown extras.
[26.05.29] - 2026-05-29¶
Added¶
- State-based pipelines, unified on PipelineEngine. PipelineBuilder gains an opt-in state= mode: pass a Pydantic model (PipelineBuilder(name, state=SomeModel)) and nodes become async (state) -> dict | None functions over a typed shared state instead of port-wired DAG steps. add_node(fn) derives the node id from fn.__name__, the first node added is auto-detected as the entry point, and the legacy port-based mode is unchanged. There is a single executor: PipelineEngine runs both modes, and PipelineBuilder(state=...) simply constructs an engine configured with state_schema, recursion_limit, audit_log, checkpointer, event_handler, and any routers registered via .branch(...). State-mode runs go through a cycle-aware frontier scheduler and execute independent nodes concurrently (#147, #245).
- Reducers for merge semantics. Field-level merge is declared with Annotated[T, reducer] on the state model. Four reducers ship from fireflyframework_agentic.pipeline: replace (default), append, extend, and merge_dict. Each node returns a partial dict and the engine folds it into shared state per the declared reducer, so concurrent fan-out workers accumulate rather than clobber.
- Unified branching via .branch(source, router, mapping=None). One call replaces the legacy BranchStep / FanOutStep (now soft-deprecated with a DeprecationWarning). With no mapping the router returns a target node id directly; with a mapping it returns an abstract label that resolves to a node. PipelineEngine.to_mermaid() renders branch-edge labels from the registered mappings, and DAG.to_mermaid() / DAG.to_json() export any DAG.
- Cycles and Send fan-out for agentic loops. State-mode DAGs are built with allow_cycles=True, so a node can route back to itself (or an earlier node) for ReAct loops and retry-with-critique. A recursion_limit kwarg (default 25) bounds runaway cycles with a clean failure result via a per-node visit counter. A router may return list[Send] (Send(target, payload)) for runtime fan-out: each worker runs concurrently over its own payload-merged state copy and the results reduce back into shared state, with per-target visit counters preserved for observability.
- Human-in-the-loop pause gates. A node returning the Pause(reason=...) sentinel halts the pipeline cleanly and writes a paused checkpoint (CheckpointRecord gains backward-compatible paused / pause_reason fields). The result carries paused / paused_node / pause_reason, and an event handler can observe on_node_pause. Pauses are sticky: resuming requires invoke(run_id=..., approve_pause=True), which restarts from the successor of the paused node; resuming without it raises PipelineError.
- Checkpoint, resume, and mid-pipeline entry. FileCheckpointer persists state after each successful node; invoke(run_id=...) resumes from the latest checkpoint, skipping completed nodes. invoke(state, start_at=node) jumps into a pipeline mid-flow with an explicit state, which is useful for replays and partial reruns.
- Pipeline audit log. New pipeline/audit.py exports a split protocol, AuditLog (write-only) and QueryableAuditLog (adds list_entries), over an AuditEntry model, plus three concrete backends that wrap stdlib / framework primitives: FileAuditLog (JSONL per pipeline + run id, queryable), LoggingAuditLog, and OtelAuditLog. Every node visit is recorded with its status, including paused.
- Unified EventHandler protocol and OTel spans. A single EventHandler protocol (with PipelineEventHandler as the built-in implementation) covers both pipeline modes. State-mode spans use the pipeline.state.* taxonomy so existing observability dashboards keep working.
- examples/software_factory/ example. A self-contained package that exercises the headline state-mode features end to end: typed state with reducers, router-driven branching with a qa → codegen cycle (recursion_limit=3), checkpoint/resume on a transient builder failure, and StatePipelineEventHandler progress output. It also ships plug-and-play durable backends (checkpointers/{postgres,redis}.py and audit/postgres.py, each a flat ~50–80 LOC class against a caller-supplied connection), swappable via the FIREFLY_CKPT env var.
- Contradiction surfacing in the corpus answerer. Both the fast-path and reasoning prompts gain a MUST rule: when two or more retrieved chunks disagree on the same fact, the answer must surface the conflict and cite the competing sources rather than silently picking one. Verified against contradicting fixtures (e.g. the same quarter's revenue reported as two different figures).
Changed¶
- Durable checkpointer / audit backends now live in examples, not the framework. PostgresCheckpointer, RedisCheckpointer, and PostgresAuditLog (and the internal PsycopgBackend helper) have been dropped from the framework; the psycopg[binary] dependency is removed from the [postgres] extra. The Checkpointer and AuditLog protocols plus the framework-native FileCheckpointer, FileAuditLog, LoggingAuditLog, and OtelAuditLog remain. Operators who need a database-backed store implement the protocol against their own connection; see the ~50–80 LOC reference classes under examples/software_factory/.
Fixed¶
- IngestLedger now records fetch failures. A failed fetch previously advanced the cursor without writing anything, so files silently disappeared from the ledger. Each failure is now recorded so retries and audits can observe it (#219).
- StructuredRetriever works on cloud backends. The retriever was hardcoded to self.root / "corpus.sqlite", which broke on AzureBlobBackend where the SQLite database lives in blob storage. It now routes through _db_store.ensure_fresh(), materialising a local copy on cloud backends and remaining a no-op on LocalBackend (#219).
- firefly-mcp-http now wires OpenTelemetry exporters at startup. A _configure_telemetry() helper runs at the top of main(), before any framework code records a measurement, so when APPLICATIONINSIGHTS_CONNECTION_STRING is set the metrics and traces actually reach Application Insights. Resolves the operator-reported "App Insights is empty despite the connection string being set".
Changed (dependencies)¶
- pydantic-ai upgraded 1.75 → 1.99 and mistralai un-pinned. With mistralai back on PyPI (2.4.5), the [tool.uv.sources] git workaround is removed and the pydantic-ai floor is lifted to >=1.99.0. The Mistral import now targets the 2.x layout (mistralai.client).
Internal¶
- Inline imports lifted to module top-level across the codebase for project-rule compliance, with optional-dependency imports guarded via TYPE_CHECKING so pyright narrows correctly without importing at runtime. No behavioural change.
- PR-gate CI sped up with shallow checkout, no coverage on PRs, and a cached uv resolver across jobs (#218).
- Cost-tracking docs now point users at examples/cost_tracking.py for the cost-resolver override pattern.
[26.05.21] - 2026-05-21¶
Changed (BREAKING — delegation routing API)¶
- DelegationStrategy.select() replaced by decide() -> RoutingDecision. Strategies now return ranked, scored Candidate tuples plus metadata instead of a single agent. No deprecation shim: a shim would lock in the single-agent return shape we are explicitly escaping. External implementers get a clean Protocol mismatch at type-check time. DelegationRouter.route() keeps its exact current signature, so the common call site is unaffected. New combinators ChainStrategy, FallbackStrategy, and WeightedStrategy nest strategies without subclassing; DelegationRouter.decide() / execute() split selection from execution and emit a firefly.routing.decision OTel event.
- CapabilityStrategy and ContentBasedStrategy now return empty decisions instead of raising / silently falling back. Previously CapabilityStrategy raised DelegationError on no-match (blocking composition with fallback) and ContentBasedStrategy silently returned the first agent on LLM failure (hiding errors). Both now return empty RoutingDecision objects. Callers using bare router.route() still see DelegationError("Empty routing decision") from execute() (same exception class, different message).
- CostAwareStrategy no longer carries a hardcoded model→tier table. Cost per agent is computed via resolve_cost from fireflyframework_agentic.observability.cost_resolvers against a synthetic CostContext (defaults: 1000 input / 500 output tokens), and scores are pool-relative linear normalisations. New keyword arguments configure the sample tokens, the resolver chain, and the on_unknown policy ("skip" / "lowest" / "raise").
Added¶
- Tool-using corpus answer agent. CorpusAgent gains an answer_strategy: Literal["fast", "reasoning"] = "fast" constructor flag. The fast path is unchanged (one-shot expand → retrieve → rerank → answer); the reasoning path delegates the answer phase to a new ReasoningAnswerAgent (in fireflyframework_agentic.rag.retrieval) that runs a tool-using ReAct loop over four tools: knowledge_search, sql_query, inspect_table, and a restricted Python python_compute sandbox. Construction adds three tunables (max_reasoning_tool_calls, max_reasoning_llm_calls, reasoning_wall_clock_seconds). Default behaviour is unchanged.
- Answer.reasoning_trace: new optional field of type ReasoningTrace | None (default None). Populated by ReasoningAnswerAgent when CorpusAgent.query(..., include_trace=True) is set. Every ActionStep carries tool_name + tool_args (a plain dict), so a recorded trace is re-executable: see tests/examples/corpus_search/test_trace_is_replayable.py.
- MCP corpus_query tool gains two optional params, strategy and include_trace. include_trace defaults to True, so callers that hit the reasoning path receive the typed ReasoningTrace in the response without opting in. The fast path never populates a trace regardless of the flag, so the legacy fast-path JSON shape is unchanged. Pass include_trace=false to opt out (smaller payload). Process-wide agent cache keys by (corpus_id, strategy) so both paths can coexist for the same corpus.
- New optional extra [reasoning-eval] pulls in numpy>=2.0 and pandas>=2.2 for the python_compute sandbox. The sandbox itself is AST-validated (denylist on dunder names, eval / exec / compile / __import__ / open / input, attribute access to dunder names like __class__ / __bases__), runs in a worker thread with a 5 s wall-clock timeout, and caps combined stdout + result rendering at 8 KB.
- Reasoning telemetry. Two new OTel instruments: histogram firefly.rag.reasoning.tool_call_duration (labelled by tool_name) and counter firefly.rag.reasoning.terminal_state (labelled by outcome: answered | no_info | tool_limit | llm_limit | timeout | error). The existing firefly.rag.query span gains a firefly.rag.answer_strategy attribute on both fast and reasoning paths.
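The AST denylist mentioned for the python_compute sandbox could look roughly like this. It is a sketch under stated assumptions: the validate function and its return shape are hypothetical, and only the denylist itself (banned builtins, dunder names, dunder attribute access) comes from the entry. The worker-thread timeout and the 8 KB output cap are left out.

```python
import ast

# Builtins named in the changelog entry as denied.
BANNED_NAMES = {"eval", "exec", "compile", "__import__", "open", "input"}


def _is_dunder(name: str) -> bool:
    return name.startswith("__") and name.endswith("__")


def validate(source: str) -> list[str]:
    """Return the list of violations; an empty list means the snippet passed."""
    problems: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and (node.id in BANNED_NAMES or _is_dunder(node.id)):
            problems.append(f"name {node.id!r} is not allowed")
        elif isinstance(node, ast.Attribute) and _is_dunder(node.attr):
            problems.append(f"attribute {node.attr!r} is not allowed")
    return problems
```

A denylist of this kind narrows the attack surface but is not a security boundary on its own, which is presumably why the entry pairs it with a timeout and an output cap.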
Changed (BREAKING — internal layout)¶
- Per-corpus token store is now provider-agnostic in the framework. fireflyframework_agentic.security.corpus_token exports a CorpusTokenStore Protocol plus the in-memory CorpusTokenCache and the corpus_token_digest helper. The Azure-specific KeyVaultTokenStore + build_default_store factory moved to examples/corpus_search/azure_security.py alongside the existing Entra/OBO code. The firefly-mcp-http server resolves the concrete store at startup via the FIREFLY_MCP_TOKEN_STORE_FACTORY env var (defaults to examples.corpus_search.azure_security:build_default_store) so existing Azure deployments keep working, and operators on a different back-end can swap the factory without touching the framework. The firefly-mcp-token CLI moved to examples/corpus_search/firefly_mcp_token.py and is no longer registered as a top-level script; invoke it as python -m examples.corpus_search.firefly_mcp_token ….
Changed (BREAKING for clients of the auth flag)¶
- firefly-mcp-http per-corpus auth now requires the X-Firefly-Corpus-Id header on every gated request (in addition to Authorization: Bearer …). The middleware validates the bearer against Key Vault before letting any request through (including the JSON-RPC handshake, tools/list, and list_corpora), closing the gap where an outsider could enumerate tool schemas or corpus_ids by sending only a bearer-shaped string. Body-side arguments.corpus_id must match the header value for corpus-scoped tools. Update Claude Desktop / mcp-remote entries to pass --header X-Firefly-Corpus-Id: <id>.
Fixed¶
- SQL agent reasoning: discriminator filters, parent-level GROUP BY, and sibling-column scans. The text-to-SQL retriever now annotates each string column in the schema context with its COUNT(DISTINCT) cardinality (e.g. metric_line (string, 3 distinct)) so the agent can spot categorical / discriminator axes and parent-vs-child cardinality gaps at schema-read time. The system prompt gains three rules and three worked examples covering: filtering on a discriminator before aggregating heterogeneous rows (#161), using GROUP BY <parent> when the user says "by X" / "for each X" / "per X" (#162), and scanning semantically-related sibling columns before concluding "no record" on a NULL result (#163). No new tools or schema-model fields.
- firefly-mcp-http now loads .env on startup. The CLI calls load_dotenv(find_dotenv(usecwd=True)) at the top of main(), so a developer running the server from a project directory gets its variables (e.g. EMBEDDING_MODEL, FIREFLY_MCP_KEYVAULT_URL) without an explicit shell source. Real process env vars always win (load_dotenv defaults to override=False), so Azure / Container Apps deployments (which inject env from the manifest before the process starts) see no behavioural change. python-dotenv is now a core dependency (previously declared only under the corpus-search / dev extras); promoted so the import in main() can be unconditional rather than guarded. Resolves the KeyError: 'EMBEDDING_MODEL' operators hit when running firefly-mcp-http locally with a .env present.
- firefly-mcp-http logs unhandled asyncio task exceptions to stderr before the loop has a chance to die silently. Previously, an exception in a task scheduled on the asyncio loop (request-cleanup callbacks, fire-and-forget tool work, SSE long-poll teardown) was routed by BaseEventLoop to the asyncio logger at ERROR, but uvicorn's default log config doesn't surface that logger. Operators saw "the server died" / "the bridge can't reconnect" with no traceback. The CLI now installs a loop-level exception handler that routes through logging.getLogger("…http_cli") (which basicConfig wires up at startup, level overridable via FIREFLY_MCP_LOG_LEVEL), preserving the exception's traceback via exc_info=. Does NOT swallow exceptions or change loop behaviour; it only makes them visible.
- LocalBackend corpus state now lives under CORPUS_ROOT, not in ~/.cache/. DatabaseStore previously kept its working copy at ~/.cache/fireflyframework_agentic/dbstore/<store_id>/db.sqlite for every backend, and LocalBackend.upload / download shutil.copyfile'd between that cache and the file under CORPUS_ROOT. The two copies could drift, and a rm -rf $CORPUS_ROOT did not reset corpus state (the dedup ledger and embeddings stayed alive in the cache, so re-ingestion silently skipped every file). The store now reads StorageBackend.local_path at construction; for LocalBackend it co-locates the working copy with the backend file (same inode, no duplicate), and every file used by a corpus (SQLite, WAL/SHM, the metadata sidecar, the lock sentinel) lives under the configured root. LocalBackend.upload / download short-circuit when source and destination are the same inode, so the existing call sites needed no changes. Remote backends (AzureBlobBackend) keep the legacy cache-dir layout because their working copy MUST be a separate local file. Operators upgrading should rm -rf ~/.cache/fireflyframework_agentic/dbstore/corpus_search: to reclaim disk; the new layout takes effect automatically on next startup (#170).
- Answerer preserves diacritical marks in non-English responses. The RAG answerer's instructions now tell the model to answer in the same language as the question and to keep correct orthography (á/é/í/ó/ú/ñ/ü/ç/à/è/ê/ô and equivalents) rather than transliterating to ASCII. Resolves the regression where Spanish answers came back as produccion / aprobacion / Cual? instead of producción / aprobación / ¿Cuál? (#157).
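For the unhandled-task logging fix above, a loop-level exception handler can be sketched as below. The logger name and both function names are hypothetical (the entry elides the real logger path); loop.set_exception_handler and the context keys "message" and "exception" are standard asyncio.

```python
import asyncio
import logging

logger = logging.getLogger("example.http_cli")  # hypothetical logger name


def log_unhandled(loop: asyncio.AbstractEventLoop, context: dict) -> None:
    """Make an unhandled task exception visible without altering loop behaviour."""
    exc = context.get("exception")
    # Passing the exception instance as exc_info keeps its traceback in the record.
    logger.error(
        "Unhandled asyncio exception: %s",
        context.get("message", "no message"),
        exc_info=exc,
    )


def install(loop: asyncio.AbstractEventLoop) -> None:
    """Route loop-level errors through a logger the host actually surfaces."""
    loop.set_exception_handler(log_unhandled)
```

The handler only reports; the task has already failed by the time it runs, so nothing is swallowed.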
Added¶
- list_corpus_schemas and corpus_sql MCP tools. Two new read-only entrypoints that expose the structured side of a corpus directly, without going through the LLM-driven corpus_query pipeline. list_corpus_schemas(corpus_id) returns every TargetSchema saved by ingest_corpus_structured (column names, types, primary/foreign keys, units) so a host can discover what's queryable; corpus_sql(corpus_id, sql, params?, limit?) runs a single SELECT and returns raw rows. Safety: the connection is opened in SQLite mode=ro so writes physically cannot land, the SQL is parsed with sqlglot and only SELECT is accepted, and table references are whitelisted against the schema registry; internal tables (chunks, _schemas, ingestions, …) are rejected. Adds sqlglot>=26.0.0 to the corpus-search extra.
- Optional unit field on ColumnSpec. Schemas can now declare the human-readable unit a numeric column stores ("USD millions", "headcount", "percent", "days", …). The SQL retriever's schema context surfaces it to the agent as name (type, unit=…), the retriever's system prompt requires the agent to preserve the unit in SELECT results (via alias or co-selection), and the answerer is instructed to quote the unit alongside any numeric quantity it cites, or to flag the ambiguity explicitly when no unit is known, rather than presenting a unit-less number the user cannot verify (#158).
- firefly-mcp-token CLI for operators managing per-corpus tokens in Azure Key Vault. Commands: create, rotate, revoke, list, show-name. Uses DefaultAzureCredential; the minted token goes to stdout (pipe-friendly), status to stderr. Registered as a [project.scripts] entry alongside firefly-mcp-http.
- Fuzzy entity matching in the SQL retriever. The agentic inspect-loop gains a find_similar op on inspect_table that tokenises the user's value on whitespace and matches accent-folded, case-insensitive substrings (AND-of-LIKEs, with OR fallback). A new unaccent_lower(col) SQL UDF is registered on every connection so the LLM can write diacritic-tolerant filters in run_select. The system prompt now steers the LLM to probe find_similar for free-text entity columns and to retry rather than stop when an equality filter returns 0 rows.
- numeric_summary op on inspect_table in the SQL retriever. Returns total rows, non-null count, null count, sum, min, max, and two mean variants: mean_excluding_nulls (SQL default AVG) and mean_blanks_as_zero (treats NULL cells as 0). The two means diverge whenever the column carries NULLs, so the agent can detect the blank-as-zero spreadsheet convention and pick the right interpretation instead of silently averaging over the smaller non-null subset. The system prompt now steers the LLM to probe numeric_summary before averaging numeric columns, and to surface both interpretations when ambiguous.
- Per-corpus capability tokens for firefly-mcp-http. When FIREFLY_MCP_CORPUS_AUTH_ENABLED=true, every MCP tool call must present a bearer matching the firefly-mcp-corpus-token-<corpus_id> secret in the Azure Key Vault at FIREFLY_MCP_KEYVAULT_URL. A token leak now exposes one corpus, not the whole server. list_corpora is filtered to the caller's authorised corpora. Off by default; stdio transport and existing ingress-fronted HTTP deployments are unaffected. See docs/deploy/mcp-corpus-auth.md.
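A minimal sketch of an accent-folding UDF like the unaccent_lower mentioned above, using only the standard library. The UDF name comes from the entry; the NFKD-based folding and the connect helper are assumptions about how such a function might be built, not the retriever's actual implementation.

```python
import sqlite3
import unicodedata


def unaccent_lower(value):
    """Strip combining marks and lowercase, so 'Producción' matches 'produccion'."""
    if value is None:
        return None
    decomposed = unicodedata.normalize("NFKD", str(value))
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def connect(path: str = ":memory:") -> sqlite3.Connection:
    """Open a connection with the UDF registered (hypothetical helper)."""
    conn = sqlite3.connect(path)
    conn.create_function("unaccent_lower", 1, unaccent_lower, deterministic=True)
    return conn
```

With the UDF registered, a diacritic-tolerant filter applies the function to both sides, for example WHERE unaccent_lower(name) LIKE '%' || unaccent_lower(?) || '%'.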
[26.05.11] - 2026-05-11¶
Changed (BREAKING)¶
- Repo layout flattened. src/fireflyframework_agentic/ moved to fireflyframework_agentic/ at the repo root. Vendor- and example-specific code (cli/, vendor backends, SharePoint source, the corpus_search reference agent's CLI) moved under examples/corpus_search/. The storage-azure extra and the previous top-level [project.scripts] block were removed (#134, #137).
- corpus_retrieve → knowledge_search. The MCP corpus retrieval tool was renamed for clarity (#134). Update any client code or MCP wiring that referenced corpus_retrieve.
- firefly-mcp-http entry point relocated. Now registered as fireflyframework_agentic.exposure.mcp.http_cli:main in [project.scripts]. The MCP HTTP server is a first-class deliverable of the package, not an example (#139). Closes #138.
Added¶
- Unified structured + unstructured ingestion in corpus_search. CorpusAgent.ingest_source accepts both tabular and document sources through a single pipeline, with separate retrievers feeding the answerer's prompt (#108).
- Schema-aware structured ingestion. Discover-review-ingest workflow for tabular sources: schema discovery first, then per-column review, then ingest only the approved columns. Closes #117 (#118).
- RubricReviewer. Rubric-based grader loop for validation; an LLM judges candidate outputs against an explicit rubric and feeds back deltas for retry. Exposed from validation (#130).
- Managed SQLite storage backends. Local-file and Azure Blob backends expose a uniform managed-SQLite surface for memory and other persistence needs (#112).
- list_corpora MCP tool. Discovery endpoint that enumerates available corpora; nightly e2e test added to keep it honest (#115).
Fixed¶
- Nightly auth via Key Vault + OIDC. Replaced direct ${{ secrets.ANTHROPIC_API_KEY }} injection with azure/login (OIDC) followed by az keyvault secret show against kv-firefly-signature. The previous wiring resolved to empty strings and broke every Anthropic-using test on the nightly (#120, follow-up #137). Closes #125.
- MCP container deploy. Repaired the Dockerfile COPY paths and console-script entry point so deploy-mcp builds and pushes again after the flat-layout move (#139). Closes #138.
- Retrieval benchmark. runner.py now ingests only *.md files; the 25,870-row billing-ledger CSV was being fed through the markdown chunker, producing ~24k chunks and corrupting the SQLite-vec store. Smoke test updated for the 12-doc corpus (#140).
- Structured ingest folder walks. Filter to tabular file types so non-tabular files in mixed corpora don't trip the structured loader (#123).
- Real-LLM e2e tests. Switched test_e2e_real_llm to Azure OpenAI embeddings to align with the production embedder path (#119).
Changed¶
- No hardcoded VERSION constants in installers. install.sh, install.ps1, and their uninstall.* counterparts no longer carry a hand-bumped VERSION string; the post-install verify reads fireflyframework_agentic.__version__ from package metadata (#136).
- .python-version removed. pyproject.toml's requires-python = ">=3.13" is the sole source of truth (#136).
- CLAUDE.md gitignored. Developer-local agent guidance is no longer tracked (#136).
- Dependabot bumps. urllib3 2.6.3 → 2.7.0 (#135); langchain-core 1.3.2 → 1.3.3 (#124).
- CI hardening. deploy-mcp.yml bumped actions/checkout@v4 → @v6 and SHA-pinned docker/setup-buildx-action to v4.0.0 to clear the Node 20 deprecation (#137).
Tests¶
- tests/examples/corpus_search/ consolidated. Vendor backend tests and the structured-ingestion ledger test moved under the example's tree alongside their production code (#110, #111).
- Benchmark smoke test updated to assert the new 12-md corpus shape after the runner fix (#140).
[26.04.30] - 2026-04-30¶
Added¶
- Entra ID security. Token verification and on-behalf-of (OBO) exchange for Azure AD authentication flows. New [azure] extra installs the required dependencies (#92).
- MCP server. New exposure module ships an MCP server and the firefly-mcp CLI for exposing agents over the Model Context Protocol (#93).
- Corpus-search example agent. New examples/corpus_search/ ships a folder-ingestion + hybrid-search agent: markitdown → chunk → embed (Azure OpenAI by default) → SQLite FTS5 + Chroma. Query pipeline is expand (Haiku) → BM25 + vector → RRF fuse → rerank (Haiku) → answer (Sonnet) with inline citations. Framework additions: content/loaders/MarkitdownLoader and pipeline/triggers/FolderWatcher. New extras: [markitdown], [watch], [corpus-search] (#82).
- SQLite memory store. New SQLiteStore provides stdlib-backed local persistence for memory, sitting alongside FileStore with the same surface (#87).
- Refactored prompt manager. New prompt implementation with template scheme, registry, and explicit Prompt type used by reasoning prompts (#85).
- Nightly CI workflow. Full test suite runs once per day under the nightly pytest marker, separated from the per-PR pr-gate. On failure, the workflow opens (or comments on) a nightly-failure tracking issue; a subsequent green run auto-closes it. README gains a Nightly badge alongside PR gate (#89).
Changed¶
- Security extra renamed. entra.py → azure.py; the security manager now inherits from RBACManager. The extra is renamed [entra] → [azure] and is installed in the PR gate.
- Memory store layout. SQLiteStore lives in store.py and is aligned with the other stdlib backends.
- EmbeddingResult.usage is now Optional. Backward-compatible change to support embedding backends that do not report usage (#82).
- Examples simplified. Use bare load_dotenv() and source MODEL from .env; removed examples/_common.py (#81).
- CI rename. Workflow ci → pr-gate; triggers only on pull_request, not on push.
Fixed¶
- Nightly perf benchmarks. Replaced the broken benchmark(lambda: pytest.asyncio.fixture(coro)) pattern with sync tests driven by a shared bench_loop event-loop fixture (required so HttpTool's httpx.AsyncClient stays bound to a single loop across iterations). Test classes dropped per project convention; skipif and benchmark(group=...) decorators moved onto each function (#91).
Tests¶
- Test tree reorganized under tests/unit/ for agents, memory, observability, pipeline, tools, resilience, and core (#88).
- Responsible AI category (tests/responsible_ai/) groups output_guard and prompt_guard.
- Benchmarks moved to tests/performance/, marked nightly, and renamed to test_bench_*.py for pytest collection.
- Tests README documents per-category descriptions and the nightly marker.
[26.04.28] - 2026-04-28¶
Changed (BREAKING)¶
- Project rename: fireflyframework-genai → fireflyframework-agentic. Comprehensive rebrand from genai to agentic across every public surface. See the Migration section below for an upgrade checklist.
- Python module: fireflyframework_genai → fireflyframework_agentic.
- PyPI package: fireflyframework-genai → fireflyframework-agentic.
- Class names: FireflyGenAI* → FireflyAgentic* (covers FireflyGenAIConfig and FireflyGenAIError).
- Environment-variable prefix: FIREFLY_GENAI_* → FIREFLY_AGENTIC_*.
- REST factory: create_genai_app() → create_agentic_app().
- Repository URLs: github.com/fireflyframework/fireflyframework-genai → …/fireflyframework-agentic.
- Brand prose: "Firefly GenAI" → "Firefly Agentic".
Mentions of "GenAI" as a category (e.g. "GenAI metaframework", "GenAI
workloads", keywords = ["genai"]) are intentionally preserved -- the
framework targets the GenAI domain. References to the external
genai-prices library and the GenAIPricesCostCalculator wrapper class
also remain.
Removed (BREAKING)¶
- Studio extracted to its own repository. The visual IDE, project runtime, scheduler, tunnel, code generation, and AI assistant now live in fireflyframework-agentic-studio. Removed from this repo:
  - src/fireflyframework_agentic/studio/ (Python module).
  - studio-frontend/ (SvelteKit SPA).
  - studio-desktop/ (Tauri desktop bundle and PyInstaller spec).
  - scripts/build_studio.py.
  - tests/test_studio/ (~30 test files).
  - Studio-only docs: studio.md, studio-agents.md, api-reference.md, scheduling.md, tunnel-exposure.md, input-output-nodes.md, project-api.md, tutorial-bpm-pipeline.md.
  - examples/studio_launch.py.
  - .github/workflows/desktop.yml (Tauri build pipeline).
  - [studio] extra in pyproject.toml (FastAPI, Uvicorn, Strawberry-GraphQL, APScheduler).
  - firefly CLI entry point (now ships with the studio package).
  - frontend-build job and studio artifact wiring in CI.
Added¶
- Pre-commit hooks. .pre-commit-config.yaml with ruff (lint + format), file hygiene (trailing whitespace, EOF, YAML/TOML/JSON validation, merge-conflict markers, large-file guard, AST check), gitleaks for secret scanning, and no-commit-to-branch for main / master. CI gains a Pre-commit job that runs the same hooks on every PR so --no-verify bypasses are caught.
Migration¶
- from fireflyframework_genai import FireflyGenAIConfig, get_config
+ from fireflyframework_agentic import FireflyAgenticConfig, get_config
- from fireflyframework_genai.exposure.rest import create_genai_app
+ from fireflyframework_agentic.exposure.rest import create_agentic_app
For users who previously installed the embedded Studio:
A bulk replace covers most call sites:
grep -rl 'fireflyframework_genai' . | xargs sed -i 's/fireflyframework_genai/fireflyframework_agentic/g'
grep -rl 'fireflyframework-genai' . | xargs sed -i 's/fireflyframework-genai/fireflyframework-agentic/g'
grep -rl 'FireflyGenAI' . | xargs sed -i 's/FireflyGenAI/FireflyAgentic/g'
grep -rl 'FIREFLY_GENAI_' . | xargs sed -i 's/FIREFLY_GENAI_/FIREFLY_AGENTIC_/g'
The full migration guide for Studio users lives in the fireflyframework-agentic-studio README.
Changed¶
- Middleware Protocol -- Renamed before / after to before_run / after_run on PromptCacheMiddleware and CircuitBreakerMiddleware to conform to the AgentMiddleware protocol contract.
- Exception Hierarchy -- Renamed MemoryError to FireflyMemoryError to avoid shadowing the Python built-in. A deprecated alias is kept for backwards compatibility.
- Quota Defaults -- quota_enabled now defaults to False to avoid unexpected enforcement on first install.
- Cost Calculator Type -- cost_calculator config field is now Literal["auto", "genai_prices", "static"].
Security¶
- ShellTool -- Replaced create_subprocess_shell with create_subprocess_exec to prevent command injection via shell metacharacters.
- FileSystemTool -- Replaced str.startswith path check with Path.is_relative_to to prevent symlink-based path traversal.
- RBAC Decorator -- Fixed require_permission to use inspect.signature for positional argument binding and replaced nonlocal mutation with a local manager variable.
- Encryption -- Each AESEncryptionProvider.encrypt() call now generates a random 16-byte salt for PBKDF2 key derivation, stored as salt[16]+nonce[12]+ciphertext+tag.
- REST Middleware -- allow_credentials is now automatically set to False when allow_origins=["*"]. API key comparison uses hmac.compare_digest.
- REST Router -- Exception details are no longer exposed to clients; errors are logged server-side and a generic message is returned.
- Database Store -- Schema name is validated against ^[a-zA-Z_][a-zA-Z0-9_]*$ to prevent SQL injection.
- FileStore -- Added Path.is_relative_to check in _path() to prevent namespace-based path traversal.
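The difference between the two path checks mentioned above can be sketched as follows. This is an illustration only, not the framework's code: naive_is_inside and is_inside are hypothetical helpers, and resolving both paths before comparing is an assumption about how symlinks and ".." segments would be handled.

```python
from pathlib import Path, PurePosixPath


def naive_is_inside(base: str, candidate: str) -> bool:
    """String-prefix check: it also matches sibling directories."""
    return candidate.startswith(base)


def is_inside(base: Path, candidate: Path) -> bool:
    """Resolve symlinks and '..' first, then compare whole path components."""
    resolved_base = base.resolve()
    resolved_candidate = (resolved_base / candidate).resolve()
    return resolved_candidate.is_relative_to(resolved_base)


# A sibling directory shares the string prefix but is not inside the base.
print(naive_is_inside("/srv/data", "/srv/data-secret/key.pem"))  # prints True
print(PurePosixPath("/srv/data-secret/key.pem").is_relative_to("/srv/data"))  # prints False
```

Path.is_relative_to requires Python 3.9 or later. It compares path components rather than characters, which is why the sibling directory is rejected.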
Fixed¶
- Thread Safety -- Added threading.Lock to InMemoryStore, CachedTool, RateLimitGuard, ConversationMemory.get_turns/get_total_tokens/clear/clear_all/new_conversation/conversation_ids.
- Pipeline Engine -- _gather_inputs now correctly extracts output_key from dict and object results. started_at is initialised before the retry loop.
- asyncio.run Crash -- database_store.py and manager.py sync wrappers now detect a running event loop and offload to a ThreadPoolExecutor instead of crashing.
- TextTool ReDoS -- Regex operations in _extract, _replace, _split now run via asyncio.to_thread with a 5-second timeout.
- SandboxGuard ReDoS -- User-supplied patterns are compiled with a safe _safe_compile helper.
- Observability Decorators -- @metered now records latency in a finally block so it is captured even on exceptions.
- Logging -- ColoredFormatter.format now operates on a copy.copy(record) to avoid mutating shared log records.
- SlidingWindowManager -- Uses collections.deque and _running_tokens counter instead of re-estimating the entire window on every eviction.
- PromptTemplate -- Added _UNSET sentinel for PromptVariable.default so that default=None is correctly propagated.
- Queue Consumers -- Kafka, RabbitMQ, and Redis consumers now wrap _process_message in try/except to prevent one bad message from killing the consumer loop.
- Goal Decomposition -- _execute_task now passes memory=memory to the delegated _task_pattern.execute().
- ConversationMemory -- clear() and clear_all() now also clear _summaries to prevent stale summary leaks.
- Reasoning Registry -- Six built-in patterns are auto-registered at import time.
- Observability Exports -- extract_trace_context, inject_trace_context, and trace_context_scope are now re-exported from observability/__init__.py.
- UsageTracker -- _check_budget exception handler now logs at DEBUG instead of silently passing.
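The asyncio.run fix above follows a common pattern, sketched here under assumptions: run_sync is a hypothetical helper, not the framework's wrapper. asyncio.run raises RuntimeError when called from a thread that already has a running event loop, so the sketch hands the coroutine to a worker thread that starts its own loop.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def run_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run is safe.
        return asyncio.run(coro)
    # A loop is already running here: execute the coroutine on a fresh
    # loop in a worker thread and block until it finishes.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


async def fetch_answer() -> int:
    await asyncio.sleep(0)
    return 42


async def caller_inside_loop() -> int:
    # Calling asyncio.run() directly here would raise RuntimeError.
    return run_sync(fetch_answer())


print(run_sync(fetch_answer()))           # prints 42
print(asyncio.run(caller_inside_loop()))  # prints 42
```

In this sketch the blocking wait stalls the caller's event loop until the worker thread finishes, so the pattern suits short operations rather than long-running ones.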
[26.02.07] - 2026-02-17¶
Added¶
- Multi-Provider Support Hardening -- New model_utils module providing centralized model identity extraction (extract_model_info, get_model_identifier, detect_model_family) for uniform handling of both "provider:model" strings and pydantic_ai.models.Model objects across the framework's observability and resilience layers.
- Cross-Provider Cost Tracking -- StaticPriceCostCalculator now resolves pricing through proxy providers. bedrock:anthropic.claude-3-5-sonnet-latest maps to Anthropic pricing, azure:gpt-4o maps to OpenAI pricing, and ollama:* models report $0.00. Added Mistral pricing entries.
- Bedrock Throttling Detection -- _is_rate_limit_error() now detects AWS Bedrock ThrottlingException and TooManyRequestsException (boto3 ClientError shapes) in addition to HTTP 429 and string-pattern matching. Also added "throttl" as a fallback string pattern.
- Cross-Provider Prompt Caching -- PromptCacheMiddleware now uses detect_model_family() to route caching configuration by model family rather than string matching. bedrock:anthropic.claude-* correctly routes to Anthropic caching; azure:gpt-* routes to OpenAI caching.
- Model Object Fallback -- FallbackModelWrapper now accepts Sequence[str | Model], allowing cross-provider fallback chains with pre-configured Model objects (e.g. Azure → OpenAI → Anthropic). run_with_fallback() updates _model_identifier on each swap so cost tracking and rate-limit backoff keys remain accurate.
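The routing described in the entries above could look roughly like the sketch below. The real detect_model_family signature and matching rules are not shown in this changelog, so split_model_id, guess_model_family, and the hint table are hypothetical stand-ins built only from the examples given.

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split a "provider:model" string; the provider is empty when absent."""
    provider, separator, name = model_id.partition(":")
    if not separator:
        return "", provider
    return provider, name


# Assumed substring hints, derived from the examples in the entries above.
_FAMILY_HINTS = (
    ("claude", "anthropic"),
    ("gpt", "openai"),
    ("mistral", "mistral"),
)


def guess_model_family(model_id: str) -> str:
    """Pick a family from the model name, falling back to the provider."""
    provider, name = split_model_id(model_id)
    lowered = name.lower()
    for hint, family in _FAMILY_HINTS:
        if hint in lowered:
            return family
    return provider or "unknown"


print(guess_model_family("bedrock:anthropic.claude-3-5-sonnet-latest"))  # prints anthropic
print(guess_model_family("azure:gpt-4o"))                                # prints openai
```

Keying on the model name rather than the provider prefix is what lets a proxy provider such as Bedrock or Azure inherit the pricing and caching rules of the underlying model family.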
26.01.01 - 2026-02-10¶
Changed¶
- CalVer Migration -- Migrated versioning scheme from M.YY.Patch to YY.MM.Patch for clearer calendar-based version identification. This release consolidates all changes from the previous 2.26.x releases.
2.26.1 - 2026-02-09¶
Removed¶
- Studio / CLI / TUI -- Removed the Firefly GenAI Studio package (src/fireflyframework_genai/studio/), the flygenai CLI entry point, the [cli] optional extra, all studio tests (tests/test_studio/), and studio documentation (docs/studio.md). The framework is now a pure library without any CLI or TUI components. Room persistence configuration fields have been removed from FireflyGenAIConfig.
2.26.1 - 2026-02-08¶
Added¶
- Database Persistence Backends -- PostgreSQL and MongoDB support for production-grade conversation memory and working memory persistence. PostgreSQLStore and MongoDBStore implement the MemoryStore protocol with connection pooling via asyncpg and motor. Automatic schema/collection creation on first use. Configuration via environment variables or direct initialization. Install with pip install fireflyframework-genai[postgres] or pip install fireflyframework-genai[mongodb].
- Distributed Trace Correlation -- W3C Trace Context propagation across service boundaries (HTTP, message queues, pipelines). Functions inject_trace_context() and extract_trace_context() for manual propagation. Automatic integration with REST API middleware, Kafka/RabbitMQ/Redis queue consumers, and pipeline context via correlation_id. Enables end-to-end trace correlation in distributed GenAI applications.
- API Quota Management -- Production-grade quota enforcement with QuotaManager, RateLimiter, and AdaptiveBackoff. Supports daily budget limits (USD), per-model rate limits (requests/minute), and exponential backoff with jitter for 429 responses. Sliding window rate limiting for accurate enforcement. Configuration via environment variables (FIREFLY_GENAI_QUOTA_*). Integrates with UsageTracker for unified cost and quota management.
- Security Hardening -- Four new security features for enterprise deployments:
  - RBAC -- Role-Based Access Control with JWT authentication, role/permission management, multi-tenant isolation, and @require_permission decorator.
  - Encryption -- AES-256-GCM encryption for data at rest via AESEncryptionProvider and EncryptedMemoryStore wrapper for transparent encryption of any MemoryStore backend.
  - SQL Injection Prevention -- Automatic detection and blocking of 15+ SQL injection patterns in DatabaseTool queries. Enforces parameterized queries and rejects string concatenation.
  - CORS Security -- Restrictive CORS policy by default (no origins allowed). Explicit allow-list configuration for production via environment variables.
- HTTP Connection Pooling -- HttpTool now supports connection pooling via httpx.AsyncClient for 50-70% latency reduction on repeated requests. Configurable pool size, keepalive connections, and timeout. Automatic fallback to urllib when httpx not installed. Configuration via environment variables (FIREFLY_GENAI_HTTP_POOL_*). Async context manager support for cleanup.
- Incremental Streaming -- True token-by-token streaming mode for FireflyAgent. New streaming_mode parameter accepts "buffered" (default, chunk-based) or "incremental" (token-by-token). Incremental mode provides stream_tokens() method with optional debounce_ms parameter. REST API endpoints: /agents/{name}/stream (buffered) and /agents/{name}/stream/incremental. Both modes work with all middleware.
- Batch Processing -- BatchLLMStep for pipeline batch processing of multiple prompts through an agent concurrently. Supports both initial inputs and previous step outputs via flexible prompts_key parameter. Configurable batch size, completion polling, and per-batch callbacks. Automatic error handling captures individual prompt failures without blocking the batch. Respects all agent middleware including caching and circuit breakers.
- Provider Prompt Caching -- PromptCacheMiddleware enables provider-specific prompt caching for 90-95% cost reduction on cached tokens. Supports Anthropic (cache_control), OpenAI (cached_content), and Gemini (cachedContent) caching mechanisms. Automatic configuration based on model provider. Cache statistics tracking with hit rate and estimated savings calculation. Configurable system prompt caching, minimum token threshold, and TTL.
- Circuit Breaker Pattern -- CircuitBreaker and CircuitBreakerMiddleware for resilient agent execution. Three states: CLOSED (healthy), OPEN (rejecting requests), HALF_OPEN (testing recovery). Configurable failure threshold, recovery timeout, and success threshold. Prevents cascading failures and allows failing services time to recover. CircuitBreakerOpenError raised when circuit is open. Metrics tracking via get_metrics().
- Integration Test Suite -- 11 comprehensive integration tests in tests/integration/test_full_integration.py covering all production features working together: agent with all middleware, streaming with middleware, pipeline with batch processing, memory persistence, circuit breaker with batch processing, cost guard with streaming, multiple agents sharing memory, and feature composition scenarios.
- Examples and Documentation -- Updated examples showing all features in production context: examples/full_integration.py (comprehensive production agent with all middleware), examples/circuit_breaker.py (resilience patterns), examples/batch_processing.py (batch API usage). Updated documentation in docs/agents.md, docs/pipeline.md, docs/memory.md, docs/observability.md, docs/security.md, and docs/tools.md with detailed usage examples and configuration guides.
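The three-state behaviour described in the Circuit Breaker Pattern entry can be sketched as a small state machine. This is not the framework's CircuitBreaker: SketchBreaker, BreakerOpenError, the constructor defaults, and the injectable clock are hypothetical, chosen to mirror the thresholds the entry lists.

```python
import time
from enum import Enum
from typing import Any, Callable


class State(Enum):
    CLOSED = "closed"        # healthy: calls pass through
    OPEN = "open"            # rejecting calls
    HALF_OPEN = "half_open"  # probing whether the service has recovered


class BreakerOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class SketchBreaker:
    def __init__(self, failure_threshold: int = 3, recovery_timeout: float = 30.0,
                 success_threshold: int = 2,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.success_threshold = success_threshold
        self._clock = clock
        self.state = State.CLOSED
        self._failures = 0
        self._successes = 0
        self._opened_at = 0.0

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state is State.OPEN:
            if self._clock() - self._opened_at < self.recovery_timeout:
                raise BreakerOpenError("circuit is open")
            # Recovery timeout elapsed: let trial calls through.
            self.state = State.HALF_OPEN
            self._successes = 0
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        self._record_success()
        return result

    def _record_failure(self) -> None:
        self._failures += 1
        # Any failure while probing, or too many while closed, opens the circuit.
        if self.state is State.HALF_OPEN or self._failures >= self.failure_threshold:
            self.state = State.OPEN
            self._opened_at = self._clock()
            self._failures = 0

    def _record_success(self) -> None:
        if self.state is State.HALF_OPEN:
            self._successes += 1
            if self._successes >= self.success_threshold:
                self.state = State.CLOSED
        else:
            # A success while closed resets the consecutive-failure count.
            self._failures = 0
```

Passing the clock in as a parameter keeps the sketch testable without sleeping; a caller would normally rely on the default time.monotonic.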
Fixed¶
- Pipeline Data Flow -- BatchLLMStep now correctly accesses previous step outputs via context.get_node_result() with fallback to inputs dict. Supports both node-to-node data flow and initial input patterns.
- Streaming API -- Fixed UsageTracker API usage in streaming tests (changed from get_all() to get_summary()). Fixed async generator cleanup to prevent StopAsyncIteration errors.
Changed¶
- Middleware Count -- Updated documentation from "eight" to "ten" built-in middleware classes to include PromptCacheMiddleware and CircuitBreakerMiddleware.
- Defence-in-Depth Example -- Updated production middleware stack example to include prompt caching and circuit breaker alongside existing security and observability middleware.
2.26.0 - 2026-02-07¶
Added¶
- Agent Middleware System -- Pluggable before/after hooks for agent runs via AgentMiddleware protocol and MiddlewareChain. Supports prompt mutation, result transformation, and cross-cutting concerns (audit, guardrails, logging).
- Agent Run Timeout -- timeout parameter on FireflyAgent.run() and run_sync() backed by asyncio.wait_for().
- Model Fallback -- FallbackModelWrapper and run_with_fallback() for automatic retry with backup models on failure.
- Result Caching -- ResultCache with TTL, LRU eviction, and hash(model+prompt) keying for deduplicating identical agent calls.
- Conversation Summarisation -- ConversationMemory now accepts a summarizer callback; oldest turns are evicted and summarised when token usage exceeds the threshold.
- JSON Structured Logging -- JsonFormatter and format_style="json" option on configure_logging() for machine-parseable log output.
- Prompt Injection Guard -- security.PromptGuard with 10 default regex-based injection patterns, optional sanitisation, max-length check, and extensible custom patterns.
- REST Rate Limiting -- RateLimiter and add_rate_limit_middleware() for sliding-window per-client rate limiting on FastAPI/Starlette apps.
- Async Memory I/O -- FileStore gains async_save, async_load, async_load_by_key, async_delete, async_clear wrappers via asyncio.to_thread() to avoid blocking the event loop.
- Pipeline Eager Scheduling -- PipelineEngine replaced level-by-level asyncio.gather() with a task-queue approach using asyncio.create_task() and asyncio.wait(FIRST_COMPLETED) so nodes start as soon as their upstream dependencies complete.
- Metering & Cost Tracking -- Automatic token usage tracking, cost estimation, and budget enforcement across agents, reasoning patterns, and pipelines. UsageTracker, CostCalculator protocol with static and genai-prices backends, budget alerts and limits.
- Streaming Usage Tracking -- run_stream() wrapped in _UsageTrackingStreamContext to capture usage on __aexit__.
- Pipeline Error Propagation -- FailureStrategy enum (PROPAGATE, SKIP_DOWNSTREAM, FAIL_PIPELINE) on DAGNode with transitive successor skipping.
- Thread-Safe Registries -- threading.Lock added to AgentRegistry, ToolRegistry, ReasoningPatternRegistry, and ConversationMemory.
- Config Cross-Validation -- @model_validator on FireflyGenAIConfig enforcing budget, chunk-overlap, and QoS constraints.
- Type Safety -- Replaced Any with concrete types (UsageSummary, FireflyAgent, MemoryManager) in pipeline/result.py, pipeline/context.py, agents/delegation.py; fixed Protocol import in pipeline/steps.py.
- Comprehensive Test Suite -- 509 tests covering all modules including middleware, fallback, cache, config validation, JSON logging, lifecycle, agent/tool decorators, guards, composers, toolkit, observability decorators/events, pipeline builder/steps/context, plugin discovery, memory summarisation, prompt guard, rate limiter, and async FileStore.
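The Result Caching entry combines three ideas: a hash of model and prompt as the key, a TTL, and LRU eviction. A minimal sketch of that combination follows. It is not the framework's ResultCache: SketchResultCache, its method names, the SHA-256 key, and the injectable clock are all assumptions made for illustration.

```python
import hashlib
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class SketchResultCache:
    def __init__(self, max_size: int = 128, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps key -> (expiry time, value); insertion order doubles as LRU order.
        self._entries: "OrderedDict[str, tuple[float, Any]]" = OrderedDict()

    @staticmethod
    def make_key(model: str, prompt: str) -> str:
        # The NUL separator keeps ("ab", "c") distinct from ("a", "bc").
        return hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()

    def get(self, model: str, prompt: str) -> Optional[Any]:
        key = self.make_key(model, prompt)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry and report a miss.
            del self._entries[key]
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, model: str, prompt: str, value: Any) -> None:
        key = self.make_key(model, prompt)
        self._entries[key] = (self._clock() + self._ttl, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict least recently used
```

A caller would check get() before invoking the agent and call put() with the result afterwards, so identical model and prompt pairs within the TTL are served from memory.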
2.25.0 - 2026-02-07¶
Added¶
- Logging -- configure_logging function for structured framework-wide logging with level, format, and handler configuration.
- Examples -- 15 runnable example scripts in examples/ covering agents (basic, conversational, summarizer, classifier, extractor, router), all six reasoning patterns (CoT, ReAct, Reflexion, Plan-and-Execute, ToT, Goal Decomposition), reasoning pipeline and memory integration, and a complex IDP pipeline.
- IDP Pipeline Example (examples/idp_pipeline.py + idp_tools.py) -- Full Intelligent Document Processing pipeline that downloads a real 33-page Unilever PDF and processes it through a 7-node DAG: ingest → split → classify → extract → validate → assemble → explain. Features LLM-powered document splitting (detects 4 sub-documents), create_classifier_agent with category descriptions, OutputReviewer with custom retry prompts, GroundingChecker validation, LLM-powered explainability narrative generation, ANSI-colored pretty JSON output, TraceRecorder/AuditTrail/ReportBuilder integration, and exercises all major framework features together.
- Core -- Configuration management via Pydantic Settings, typed enumerations, structured exception hierarchy, and a plugin discovery system.
- Agents -- Pydantic AI agent wrapper with lifecycle management, a central registry, round-robin and capability-based delegation strategies, execution context, and the @firefly_agent decorator.
- Tools -- Protocol-driven tool interface, fluent ToolBuilder, ToolRegistry, ToolKit grouping, guard system (validation, rate-limiting, approval, sandboxing), sequential/fallback/conditional composition, @firefly_tool decorator, and built-in tools for HTTP, filesystem, search, database, and shell operations.
- Prompts -- Jinja2-based PromptTemplate engine, versioned PromptRegistry, sequential/conditional/merge composition strategies, variable validation, and file/directory loaders.
- Reasoning Patterns -- Abstract ReasoningPattern with Template Method design, ReasoningTrace for step-by-step audit, a pattern registry, and a composable pipeline. Ships six patterns: ReAct, Chain of Thought, Plan-and-Execute, Reflexion, Tree of Thoughts, and Goal Decomposition.
- Observability -- OpenTelemetry-native FireflyTracer, FireflyMetrics counter and histogram helpers, FireflyEvents event emitter, configurable exporters, and @traced/@metered decorators.
- Explainability -- TraceRecorder for decision-level recording, ExplanationGenerator for natural-language summaries, AuditTrail for compliance, and ReportBuilder for Markdown and JSON reports.
- Experiments -- Experiment and Variant models, ExperimentRunner for executing A/B tests, ExperimentTracker for persistence, and ExperimentComparator for statistical analysis.
- Lab -- LabSession for interactive exploration, Benchmark for performance measurement, Comparison for side-by-side evaluation, Dataset for test data management, and Evaluator protocol for custom scoring.
- Exposure REST -- FastAPI application factory, auto-generated agent routes, request-ID and CORS middleware, health-check endpoints, and SSE streaming.
- Exposure Queues -- Abstract consumer/producer model with Kafka, RabbitMQ, and Redis Pub/Sub implementations, plus a pattern-based message router.
- Installation Scripts -- Cross-platform interactive installers (install.sh, uninstall.sh, install.ps1, uninstall.ps1) with TUI, requirement detection, and remote execution support via curl | bash and irm | iex.
- Documentation Index -- Professional docs/README.md landing page with documentation map organized by architecture layer.