Architecture Guide¶
Copyright 2026 Firefly Software Foundation. Licensed under the Apache License 2.0.
This document describes the high-level architecture of fireflyframework-agentic, the relationships between its modules, and the design principles that guided its construction.
Design Principles¶
The framework follows five guiding principles:
- Protocol-driven contracts -- Public APIs are defined as Python Protocol classes or abstract base classes. This allows any module to be replaced or extended without modifying framework internals.
- Convention over configuration -- Sensible defaults are provided for every setting. A single FireflyAgenticConfig object (backed by Pydantic Settings) centralises configuration and supports environment-variable overrides.
- Layered composition -- Modules are organised into layers (Core, Security, Agent, Intelligence, Orchestration, plus optional Experimentation dev-tooling). Higher layers depend on lower layers but never the reverse.
- Optional dependencies -- Heavy third-party libraries (embedding providers, vector store clients, storage backends, document-conversion tooling) are declared as extras. The core framework imports them lazily so that users only install what they need.
- Pure in-process library -- fireflyframework-agentic is a library, not a server. It serves no HTTP port and consumes no message broker; the host service owns all serving, hosting, and inbound authentication. The framework emits model and agent spans and metrics through the OpenTelemetry API, but configuring the OTel SDK, exporters, and cross-service trace propagation is the host's responsibility.
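The optional-dependencies principle can be illustrated with a small lazy-import helper. This is a minimal sketch using only the standard library, not the framework's actual loader: the name require_extra, the error wording, and the extra name in the install hint are all hypothetical.

```python
import importlib
from types import ModuleType


def require_extra(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency on first use (hypothetical helper).

    Raises ImportError with an install hint when the module is missing.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature; "
            f"install the '{extra}' extra to enable it."
        ) from exc


# The import cost is paid only when the feature is actually used.
json_mod = require_extra("json", extra="demo")
```

Calling the helper inside the function that needs the dependency, rather than at module import time, keeps the base install importable without the extra.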
Layer Diagram¶
graph TD
subgraph Orchestration Layer
PIPE["Pipeline / DAG Engine<br/><small>DAG · DAGNode · DAGEdge<br/>PipelineEngine · PipelineBuilder · PipelineEventHandler<br/>AgentStep · ReasoningStep · CallableStep · BranchStep<br/>FanOutStep · FanInStep · BatchLLMStep · EmbeddingStep · RetrievalStep<br/>Checkpointer · AuditLog · state reducers · exponential backoff + jitter</small>"]
end
subgraph Experimentation Layer
EXP["Experiments<br/><small>Experiment · Variant<br/>ExperimentRunner · VariantComparator<br/>ExperimentTracker</small>"]
LAB["Lab<br/><small>LabSession · Benchmark<br/>EvalOrchestrator · EvalDataset<br/>ModelComparison</small>"]
end
subgraph Intelligence Layer
REASON["Reasoning Patterns<br/><small>ReAct · CoT · PlanAndExecute<br/>Reflexion · ToT · GoalDecomposition<br/>ReasoningPipeline</small>"]
VAL["Validation & QoS<br/><small>OutputReviewer · OutputValidator<br/>ConfidenceScorer · ConsistencyChecker<br/>GroundingChecker · 5 rule types</small>"]
OBS["Observability<br/><small>FireflyTracer · FireflyMetrics<br/>FireflyEvents · UsageTracker<br/>BudgetGate · cost resolver chain<br/>@traced · @metered</small>"]
EXPL["Explainability<br/><small>TraceRecorder · ExplanationGenerator<br/>AuditTrail · ReportBuilder</small>"]
end
subgraph Security Layer
SEC["Security<br/><small>PromptGuard (25 patterns) · OutputGuard<br/>PromptGuardResult · OutputGuardResult<br/>AESEncryptionProvider · EncryptedMemoryStore<br/>injection detection · sanitisation · output scanning</small>"]
end
subgraph Agent Layer
AGT["Agents<br/><small>FireflyAgent · AgentRegistry<br/>DelegationRouter · AgentLifecycle<br/>@firefly_agent · 5 templates · 11 middleware<br/>7 delegation strategies · FallbackModelWrapper<br/>ResultCache · run timeout</small>"]
TOOLS["Tools<br/><small>BaseTool · ToolBuilder · ToolKit · CachedTool<br/>4 guards · 3 composers · tool timeout · HITL approval<br/>ToolRegistry · 9 built-ins</small>"]
PROMPTS["Prompts<br/><small>PromptTemplate · PromptRegistry<br/>3 composers · PromptValidator<br/>PromptLoader</small>"]
CONTENT["Content<br/><small>TextChunker · DocumentSplitter · MarkdownChunker<br/>ImageTiler · BatchProcessor<br/>ContextCompressor · SlidingWindowManager<br/>content.binary (BinaryNormalizer · office converters)</small>"]
MEM["Memory<br/><small>MemoryManager · ConversationMemory<br/>WorkingMemory · TokenEstimator<br/>InMemoryStore · FileStore · SQLiteStore<br/>summarization · create_llm_summarizer<br/>export/import · async wrappers</small>"]
EMB["Embeddings & Vector Stores<br/><small>BaseEmbedder · EmbedderRegistry · 8 providers<br/>BaseVectorStore · 7 backends<br/>ScopedVectorStore · TenantScopedVectorStore</small>"]
end
subgraph Core Layer
CFG["Config<br/><small>FireflyAgenticConfig<br/>get_config · reset_config</small>"]
TYPES["Types & Protocols<br/><small>AgentLike protocol<br/>TypeVars · type aliases</small>"]
EXC["Exceptions<br/><small>FireflyAgenticError hierarchy<br/>34 exception classes</small>"]
PLUG["Plugin System<br/><small>PluginDiscovery<br/>3 entry-point groups</small>"]
RES["Resilience<br/><small>CircuitBreaker<br/>CircuitBreakerMiddleware<br/>CircuitState</small>"]
STORE["Storage<br/><small>DatabaseStore · LocalBackend<br/>WriteSession · LockToken · RetryPolicy</small>"]
end
PIPE --> AGT
PIPE --> REASON
PIPE --> VAL
PIPE --> EMB
SEC --> AGT
REASON --> AGT
OBS --> AGT
EXPL --> OBS
VAL --> AGT
AGT --> TOOLS
AGT --> PROMPTS
AGT --> CONTENT
AGT --> MEM
AGT --> EMB
AGT --> RES
AGT --> CFG
TOOLS --> CFG
PROMPTS --> CFG
CONTENT --> CFG
MEM --> CFG
MEM --> STORE
EMB --> CFG
REASON --> CFG
VAL --> CFG
%% Experiments/Lab are optional leaf dev-tooling modules. They depend on
%% the agent layer, but the core framework never imports them.
EXP -.optional.-> AGT
LAB -.optional.-> EXP
Protocol & Class Hierarchy¶
Every extension point is a @runtime_checkable protocol. Implement the protocol to
provide your own components; the framework discovers them via duck typing.
classDiagram
class AgentLike {
<<Protocol>>
+run(prompt, **kwargs) Any
}
class ToolProtocol {
<<Protocol>>
+name: str
+description: str
+execute(**kwargs) Any
}
class GuardProtocol {
<<Protocol>>
+check(tool_name, kwargs) GuardResult
}
class ReasoningPattern {
<<Protocol>>
+execute(agent, input, **kwargs) ReasoningResult
}
class StepExecutor {
<<Protocol>>
+execute(context, inputs) Any
}
class DelegationStrategy {
<<Protocol>>
+decide(agents, prompt, **kwargs) RoutingDecision
}
class CompressionStrategy {
<<Protocol>>
+compress(text, max_tokens) str
}
class MemoryStore {
<<Protocol>>
+save(namespace, entry)
+load(namespace) list
+delete(namespace, entry_id)
+clear(namespace)
}
class ValidationRule {
<<Protocol>>
+name: str
+validate(value) ValidationRuleResult
}
AgentLike <|.. FireflyAgent
AgentLike <|.. pydantic_ai.Agent
ToolProtocol <|.. BaseTool
ToolProtocol <|.. SequentialComposer
ToolProtocol <|.. FallbackComposer
ToolProtocol <|.. ConditionalComposer
GuardProtocol <|.. ValidationGuard
GuardProtocol <|.. RateLimitGuard
GuardProtocol <|.. SandboxGuard
GuardProtocol <|.. CompositeGuard
ReasoningPattern <|.. AbstractReasoningPattern
ReasoningPattern <|.. ReasoningPipeline
StepExecutor <|.. AgentStep
StepExecutor <|.. ReasoningStep
StepExecutor <|.. CallableStep
StepExecutor <|.. BranchStep
StepExecutor <|.. FanOutStep
StepExecutor <|.. FanInStep
DelegationStrategy <|.. RoundRobinStrategy
DelegationStrategy <|.. CapabilityStrategy
DelegationStrategy <|.. ContentBasedStrategy
DelegationStrategy <|.. CostAwareStrategy
DelegationStrategy <|.. ChainStrategy
DelegationStrategy <|.. FallbackStrategy
DelegationStrategy <|.. WeightedStrategy
CompressionStrategy <|.. TruncationStrategy
CompressionStrategy <|.. SummarizationStrategy
CompressionStrategy <|.. MapReduceStrategy
MemoryStore <|.. InMemoryStore
MemoryStore <|.. FileStore
MemoryStore <|.. SQLiteStore
ValidationRule <|.. RegexRule
ValidationRule <|.. FormatRule
ValidationRule <|.. RangeRule
ValidationRule <|.. EnumRule
ValidationRule <|.. CustomRule
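To show what satisfying a runtime-checkable protocol looks like in practice, here is a minimal sketch. The AgentLike class below is a local stand-in whose shape is copied from the diagram, and EchoAgent and NotAnAgent are hypothetical; the framework's real protocol may differ (for example, its run method may be asynchronous).

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class AgentLike(Protocol):
    """Local stand-in for the framework protocol (shape taken from the diagram)."""

    def run(self, prompt: str, **kwargs: Any) -> Any: ...


class EchoAgent:
    """Does not inherit from AgentLike; it only matches its shape."""

    def run(self, prompt: str, **kwargs: Any) -> str:
        return f"echo: {prompt}"


class NotAnAgent:
    def speak(self) -> str:
        return "hi"


# isinstance() on a runtime_checkable protocol checks that the method exists,
# not its signature or return type.
print(isinstance(EchoAgent(), AgentLike))   # True
print(isinstance(NotAnAgent(), AgentLike))  # False
```

Because the check is structural, third-party classes can be used as extension points without importing or subclassing anything from the framework.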
Module Responsibilities¶
Core Layer¶
The Core layer provides foundational types, configuration, exceptions, and the plugin system. Every other module depends on at least one Core component.
- types.py -- Enumerations for model providers, agent states, and log levels, the AgentLike protocol, TypeVars, and shared type aliases. (The other extension-point protocols -- ToolProtocol, GuardProtocol, ReasoningPattern, StepExecutor, DelegationStrategy, CompressionStrategy, MemoryStore, ValidationRule -- live in their respective modules, not in types.py.)
- config.py -- FireflyAgenticConfig, a Pydantic Settings singleton that reads from environment variables and .env files. It actively rejects removed serving/exposure config fields (e.g. otlp_endpoint, rbac_enabled, cors_allowed_origins, cost_calculator) with a ValueError.
- exceptions.py -- A structured exception hierarchy of 42 classes rooted at FireflyAgenticError.
- plugin.py -- PluginDiscovery discovers and loads entry-point plugins at startup.
- resilience/circuit_breaker.py -- CircuitBreaker (with CircuitState and CircuitBreakerOpenError) and CircuitBreakerMiddleware for guarding agent runs against cascading failures.
- storage/ -- DatabaseStore and LocalBackend (behind the StorageBackend protocol) for binary/large-object persistence, with WriteSession, LockToken leasing, RetryPolicy, StorageMetadata, and a family of storage error types.
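The circuit-breaker idea behind the resilience module can be sketched as a three-state machine. The code below is a generic illustration of the pattern under assumed names (SimpleCircuitBreaker, State, BreakerOpenError, failure_threshold, reset_after); it does not reproduce the framework's CircuitBreaker API.

```python
import time
from enum import Enum
from typing import Callable


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class BreakerOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class SimpleCircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = 0.0
        self.state = State.CLOSED

    def call(self, fn: Callable[[], object]) -> object:
        if self.state is State.OPEN:
            if self._clock() - self._opened_at < self.reset_after:
                raise BreakerOpenError("circuit is open")
            self.state = State.HALF_OPEN  # cool-down elapsed: allow one trial call
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.state is State.HALF_OPEN or self._failures >= self.failure_threshold:
                self.state = State.OPEN
                self._opened_at = self._clock()
            raise
        self._failures = 0
        self.state = State.CLOSED
        return result
```

Injecting the clock keeps the state transitions testable without sleeping; a middleware wrapper would simply route each agent run through call().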
Security Layer¶
The Security layer provides input sanitisation and prompt injection defence.
- security/prompt_guard.py -- PromptGuard scans user prompts for 25 known injection patterns (including encoding bypass, zero-width evasion, multi-language, jailbreak, and system prompt extraction), reports matches, and optionally sanitises suspicious fragments. default_prompt_guard provides a shared instance.
- security/output_guard.py -- OutputGuard scans LLM responses for PII (6 patterns), secrets (9 patterns), harmful content (4 patterns), custom patterns, and deny patterns. default_output_guard provides a shared instance. See the Security Guide.
- security/encryption.py -- EncryptionProvider protocol with AESEncryptionProvider, and EncryptedMemoryStore(store, encryption_key, provider=None), which transparently encrypts MemoryEntry.content at rest (keys, metadata, and timestamps stay plaintext).
Inbound-request authorization (RBAC/JWT) is intentionally not part of the framework -- it is a hosting concern owned by the host service.
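Pattern-based prompt scanning of the kind described above can be sketched with a handful of regular expressions. The three patterns, the ScanResult dataclass, and the scan_prompt function are illustrative assumptions; they are not the framework's 25 patterns or the PromptGuard API.

```python
import re
from dataclasses import dataclass, field

# Illustrative patterns only; a real guard would carry a larger, curated set.
_PATTERNS = {
    "ignore_instructions": re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    "system_prompt_extraction": re.compile(r"(reveal|print|show) (your )?system prompt", re.I),
    "zero_width": re.compile("[\u200b\u200c\u200d\u2060]"),
}


@dataclass
class ScanResult:
    safe: bool
    matched: list[str] = field(default_factory=list)
    sanitised: str = ""


def scan_prompt(prompt: str, sanitise: bool = False) -> ScanResult:
    """Report which patterns match and optionally redact the matched spans."""
    matched = [name for name, rx in _PATTERNS.items() if rx.search(prompt)]
    cleaned = prompt
    if sanitise:
        for name in matched:
            cleaned = _PATTERNS[name].sub("[REDACTED]", cleaned)
    return ScanResult(safe=not matched, matched=matched, sanitised=cleaned)
```

Returning the matched pattern names alongside the verdict lets the caller decide whether to block, log, or forward the sanitised text.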
Agent Layer¶
The Agent layer wraps Pydantic AI's Agent class and adds lifecycle management,
a global registry, delegation strategies, and declarative decorators.
- base.py -- FireflyAgent wraps pydantic_ai.Agent with metadata, hooks, middleware chain, run timeout, and streaming usage tracking.
- registry.py -- AgentRegistry is a thread-safe singleton that maps names to agents.
- lifecycle.py -- AgentLifecycle handles init, warmup, and shutdown hooks.
- delegation.py -- Multi-agent delegation. Strategies return RoutingDecision objects (ranked, scored Candidate tuples plus metadata). Built-ins: RoundRobinStrategy, CapabilityStrategy, ContentBasedStrategy (LLM routing), and CostAwareStrategy (priced via the cost resolver chain, pool-relative normalisation). Combinators ChainStrategy, FallbackStrategy, and WeightedStrategy nest strategies without subclassing. DelegationRouter separates decide() (pure, emits the firefly.routing.decision OTel event) from execute() (runs the chosen agent, forks memory); route() remains the one-call convenience.
- context.py -- AgentContext carries request-scoped data through an agent run.
- decorators.py -- @firefly_agent registers an agent declaratively.
- middleware.py -- AgentMiddleware protocol, MiddlewareContext, and MiddlewareChain for pluggable before/after hooks on every agent run.
- builtin_middleware.py -- The concrete middleware stack: LoggingMiddleware, PromptGuardMiddleware, CostGuardMiddleware, ObservabilityMiddleware, ExplainabilityMiddleware, CacheMiddleware, OutputGuardMiddleware, ValidationMiddleware, and RetryMiddleware. Two more live elsewhere: PromptCacheMiddleware (prompt_cache.py) and CircuitBreakerMiddleware (resilience/) -- 11 concrete middleware in total. By default FireflyAgent always auto-wires LoggingMiddleware, and ObservabilityMiddleware when config.observability_enabled is set; the rest are opt-in. (Rate-limit retry is handled inside FireflyAgent.run() rather than by RetryMiddleware.)
- prompt_cache.py -- PromptCacheMiddleware and CacheStatistics for prompt-level response caching.
- fallback.py -- FallbackModelWrapper and run_with_fallback() for automatic model failover. Accepts both str and Model objects for cross-provider fallback chains.
- model_utils.py -- Centralized model identity extraction (extract_model_info, get_model_identifier, detect_model_family) for uniform handling of both "provider:model" strings and Model objects across the framework's observability and resilience layers.
- cache.py -- ResultCache with TTL, LRU eviction, and thread-safe access.
- templates/ -- Pre-built template agents (summarizer, classifier, extractor, conversational, router) available as factory functions. See the Template Agents Guide.
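The combination of TTL expiry, LRU eviction, and thread-safe access mentioned for cache.py can be sketched with an OrderedDict and a lock. The class name TTLCache and its get/put methods are hypothetical; this is a generic illustration, not the ResultCache implementation.

```python
import threading
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable


class TTLCache:
    """Thread-safe cache with time-to-live expiry and LRU eviction (illustrative)."""

    def __init__(self, max_size: int = 128, ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_size = max_size
        self._ttl = ttl
        self._clock = clock
        self._lock = threading.Lock()
        self._items = OrderedDict()  # key -> (expires_at, value), oldest first

    def get(self, key: Hashable, default: Any = None) -> Any:
        with self._lock:
            entry = self._items.get(key)
            if entry is None:
                return default
            expires_at, value = entry
            if self._clock() >= expires_at:
                del self._items[key]        # expired: drop and report a miss
                return default
            self._items.move_to_end(key)    # mark as most recently used
            return value

    def put(self, key: Hashable, value: Any) -> None:
        with self._lock:
            self._items[key] = (self._clock() + self._ttl, value)
            self._items.move_to_end(key)
            while len(self._items) > self._max_size:
                self._items.popitem(last=False)  # evict least recently used
```

A single lock around both operations is the simplest way to keep the ordering and the expiry check consistent under concurrent access.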
Intelligence Layer¶
- reasoning/ -- Pluggable reasoning patterns (ReAct, Chain of Thought, etc.) with a pipeline for chaining patterns sequentially.
- observability/ -- Emits OpenTelemetry spans (FireflyTracer), custom metrics (FireflyMetrics), and events (FireflyEvents) through the OTel API; UsageTracker rolls up token usage, cost is resolved via the resolver chain (resolve_cost, genai_prices_cost, provider_reported_cost, DEFAULT_RESOLVERS; gated by the cost_strict config flag), and BudgetGate enforces budgets. Configuring the OTel SDK and exporters is the host service's responsibility.
- explainability/ -- Decision recording (TraceRecorder.record(category, ...) with a .records property), natural-language explanation generation (ExplanationGenerator), audit trails (AuditTrail.append(actor, action, ...)), and report building (ReportBuilder(title=...).build(records)).
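A resolver chain of the kind mentioned for cost resolution can be sketched as "first resolver that returns a value wins". Everything below is an assumption made for illustration: the resolver signature, the price table, the function names, and the reading of a strict flag as "an unpriced model is an error" are not taken from the framework.

```python
from typing import Callable, Optional, Sequence

# A resolver returns a cost in USD, or None when it cannot price the usage.
Resolver = Callable[[str, int, int], Optional[float]]

# Hypothetical price table: USD per 1M (input, output) tokens.
_PRICES = {"demo:small": (1.0, 2.0)}


def provider_reported(model: str, input_tokens: int, output_tokens: int) -> Optional[float]:
    return None  # this sketch has no provider-reported cost to read


def table_lookup(model: str, input_tokens: int, output_tokens: int) -> Optional[float]:
    price = _PRICES.get(model)
    if price is None:
        return None
    return (input_tokens * price[0] + output_tokens * price[1]) / 1_000_000


def resolve_first(model: str, input_tokens: int, output_tokens: int,
                  resolvers: Sequence[Resolver], strict: bool = False) -> Optional[float]:
    """Return the first non-None cost; in strict mode an unpriced model is an error."""
    for resolver in resolvers:
        cost = resolver(model, input_tokens, output_tokens)
        if cost is not None:
            return cost
    if strict:
        raise LookupError(f"no resolver could price {model!r}")
    return None
```

Ordering the chain from most to least authoritative means a provider-reported figure, when present, takes precedence over a table estimate.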
Memory Layer¶
- memory/conversation.py -- ConversationMemory: token-aware, per-conversation chat history wrapping pydantic-ai's message_history mechanism.
- memory/working.py -- WorkingMemory: scoped key-value scratchpad for session facts, entities, and intermediate state.
- memory/store.py -- MemoryStore protocol with InMemoryStore, FileStore, and SQLiteStore backends (MemoryScope namespacing). create_llm_summarizer builds an LLM-backed history summarizer.
- memory/manager.py -- MemoryManager facade composing conversation and working memory, with fork() for multi-agent scope isolation.
Embeddings & Vector Store Layer¶
These modules are reusable building blocks for retrieval-augmented workflows; the framework
ships no turnkey corpus/RAG agent, but RetrievalStep and EmbeddingStep (orchestration)
let you assemble retrieval pipelines.
- embeddings/ -- BaseEmbedder (the EmbeddingProtocol), EmbedderRegistry, similarity helpers (cosine_similarity, dot_product, euclidean_distance), and 8 provider backends under embeddings/providers/: OpenAI, Azure, Cohere, Google, Mistral, Voyage, Bedrock, and Ollama.
- vectorstores/ -- BaseVectorStore (the VectorStoreProtocol), VectorStoreRegistry, VectorDocument, SearchFilter / SearchResult, and 7 backends, including InMemoryVectorStore, ChromaVectorStore, PineconeVectorStore, QdrantVectorStore, PgVectorVectorStore, and SqliteVecVectorStore. The scoped layer (ScopedVectorStore, TenantScopedVectorStore, scope_namespace, parse_scope_namespace) partitions a shared store by tenant/scope.
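For readers unfamiliar with the three similarity measures named above, here is a stand-alone sketch of each. These are plain reimplementations written for illustration; the signatures, the length check, and the zero-vector convention are assumptions and may differ from the framework's helpers.

```python
import math
from typing import Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    norm = math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    if norm == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot_product(a, b) / norm


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Cosine similarity ignores vector magnitude, which is why it is the usual default for comparing embeddings from the same model.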
Content Layer¶
- content/chunking.py -- TextChunker, DocumentSplitter, ImageTiler, and BatchProcessor for splitting large inputs into model-friendly chunks.
- content/markdown_chunker.py -- MarkdownChunker for structure-aware Markdown splitting.
- content/compression.py -- ContextCompressor with pluggable strategies (TruncationStrategy, SummarizationStrategy, MapReduceStrategy) and SlidingWindowManager. ContextCompressor.compress(...) is async.
- content/binary/ (the [binary] extra) -- A document-conversion subsystem. BinaryNormalizer (configured by BinaryConfig) turns uploaded files into BinaryArtifact objects: sniff_media_type detects the type, office documents are converted via build_office_converter (GotenbergConverter / LibreOfficeConverter / NoOpOfficeConverter, all implementing OfficeConverter), PdfGuard rejects encrypted or corrupt PDFs, ImageNormalizer standardises images, and ArchiveUnpacker / EmailUnpacker expand archives and email attachments.
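Splitting a large input into model-friendly chunks is commonly done with fixed-size windows that overlap slightly so context is not lost at the boundaries. The function below is a character-based sketch under assumed names and defaults (chunk_text, chunk_size, overlap); it is not the TextChunker implementation, which may well count tokens rather than characters.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```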
Validation Layer¶
- validation/rules.py -- Composable validation rules (RegexRule, FormatRule, RangeRule, EnumRule, CustomRule), FieldValidator, OutputValidator, and ValidationReport.
- validation/qos.py -- ConfidenceScorer, ConsistencyChecker, GroundingChecker, and QoSGuard (returning QoSResult) for post-generation quality checks.
- validation/reviewer.py -- OutputReviewer and RubricReviewer (LLM-as-judge, with from_rubric_file(...)) for criteria-based review, returning ReviewResult with RetryAttempt history.
Orchestration Layer¶
- pipeline/dag.py -- DAG, DAGNode, DAGEdge with topological sort, cycle detection, execution-level grouping, and per-node FailureStrategy.
- pipeline/engine.py -- PipelineEngine runs DAGs with eager scheduling, concurrency, retries, timeouts, condition gates, and failure strategy enforcement.
- pipeline/builder.py -- Fluent PipelineBuilder for constructing pipelines.
- pipeline/steps.py -- Step executors implementing the StepExecutor protocol: AgentStep, ReasoningStep, CallableStep, BranchStep, FanOutStep, FanInStep, BatchLLMStep, EmbeddingStep, and RetrievalStep (signature: RetrievalStep(store, *, embedder=None, top_k=5, input_key="input")).
- pipeline/context.py -- PipelineContext shared data bus, with state reducers (append, extend, merge_dict, replace) and control signals (Pause, Send).
- pipeline/result.py -- NodeResult, PipelineResult, and ExecutionTraceEntry.
- pipeline checkpointing & audit -- Checkpointer / FileCheckpointer (recording CheckpointRecord entries) for resumable runs, and the audit-log family AuditLog / FileAuditLog / LoggingAuditLog / OtelAuditLog / QueryableAuditLog (emitting AuditEntry records). Event hooks are wired via EventHandler / PipelineEventHandler.
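State reducers decide how a node's update is combined with the value already stored under the same key. The sketch below reuses the four reducer names listed above, but the function signatures and the apply_update helper are assumptions made for illustration, not the PipelineContext API.

```python
from typing import Any, Callable

Reducer = Callable[[Any, Any], Any]


def replace(current: Any, update: Any) -> Any:
    return update


def append(current: Any, update: Any) -> list:
    return [*(current or []), update]


def extend(current: Any, update: Any) -> list:
    return [*(current or []), *update]


def merge_dict(current: Any, update: Any) -> dict:
    return {**(current or {}), **update}


def apply_update(state: dict, key: str, update: Any, reducers: dict[str, Reducer]) -> dict:
    """Return a new state dict with `key` combined via its reducer (default: replace)."""
    reducer = reducers.get(key, replace)
    return {**state, key: reducer(state.get(key), update)}
```

Returning a new dict instead of mutating in place makes it straightforward to merge results from nodes that ran concurrently.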
Experimentation Layer (optional dev-tooling)¶
These are optional, leaf dev-tooling modules; the core framework never imports them.
- experiments/ -- Define experiments with named variants, run them through an ExperimentRunner(experiment, agent_factory, *, context=None), track metrics with ExperimentTracker(storage_path=...), and compare results with VariantComparator.
- lab/ -- Interactive sessions, benchmarks, datasets, side-by-side comparisons, and pluggable evaluators.
Request Flow¶
The following diagram shows the typical lifecycle of an in-process agent run: a caller resolves an agent from the registry and invokes it, the agent reasons with tools, and observability and explainability artefacts are produced.
sequenceDiagram
participant Caller
participant Reg as AgentRegistry
participant Agent as FireflyAgent
participant Mem as MemoryManager
participant Reason as ReasoningPattern
participant Tool as BaseTool / Guard
participant Val as OutputReviewer
participant OBS as FireflyTracer<br/>FireflyMetrics
participant EXPL as TraceRecorder<br/>AuditTrail
Caller->>Reg: agent_registry.get(name)
Reg-->>Caller: FireflyAgent instance
Caller->>Agent: agent.run(prompt, conversation_id)
Agent->>OBS: tracer.agent_span(agent_name, model=...)
Agent->>Mem: load conversation history
Mem-->>Agent: message_history
Agent->>Reason: pattern.execute(agent, prompt)
loop Reasoning iterations (_reason → _act → _observe)
Reason->>Agent: LLM call via pydantic_ai.Agent
Reason->>Tool: guard.check() → tool.execute()
Tool-->>Reason: tool result
Reason->>OBS: metrics.record_tokens() · tracer.event(...)
Reason->>EXPL: recorder.record(category, ...)
end
Reason-->>Agent: ReasoningResult(output, trace)
Agent->>Val: reviewer.review(output)
Val-->>Agent: validated output (retry on failure)
Agent->>Mem: save conversation turn
Agent->>OBS: metrics.record_latency() (span closes)
Agent->>EXPL: audit_trail.append(actor, action, ...)
Agent-->>Caller: AgentResponse
Pipeline Execution Flow¶
When agents are orchestrated through a DAG pipeline, PipelineEngine executes
nodes level-by-level. Each node wraps a StepExecutor implementation.
sequenceDiagram
participant Caller
participant Builder as PipelineBuilder
participant DAG as DAG<br/>(topological sort)
participant Engine as PipelineEngine
participant Ctx as PipelineContext
participant S1 as AgentStep
participant S2 as ReasoningStep
participant S3 as FanOutStep
participant S4 as FanInStep
participant S5 as CallableStep
Caller->>Builder: .add_node() · .add_edge() · .chain()
Builder->>DAG: build DAG with nodes and edges
Caller->>Engine: engine.run(dag, inputs)
Engine->>DAG: topological_sort() → execution levels
Engine->>Ctx: create PipelineContext(inputs)
loop For each execution level
Engine->>Engine: asyncio.gather(nodes in level)
Note over Engine: condition gate check per node
alt AgentStep node
Engine->>S1: execute(context, inputs)
S1-->>Engine: agent output
else ReasoningStep node
Engine->>S2: execute(context, inputs)
S2-->>Engine: reasoning result
else FanOut → FanIn
Engine->>S3: fan-out to parallel branches
S3-->>Engine: branch outputs
Engine->>S4: fan-in / aggregate
S4-->>Engine: merged result
else CallableStep node
Engine->>S5: execute(context, inputs)
S5-->>Engine: function output
end
Engine->>Ctx: store node results
end
Engine-->>Caller: PipelineResult(node_results, trace)
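The "topological sort into execution levels" step in the diagram can be sketched with Kahn's algorithm. This is a generic illustration using plain node names and edge tuples; the function name execution_levels and its input shapes are assumptions, not the DAG class's API.

```python
from collections import defaultdict


def execution_levels(nodes: list[str], edges: list[tuple[str, str]]) -> list[list[str]]:
    """Group nodes into levels; every node's dependencies sit in earlier levels.

    Raises ValueError when the graph contains a cycle (Kahn's algorithm).
    """
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1

    levels = []
    ready = [n for n in nodes if indegree[n] == 0]
    seen = 0
    while ready:
        levels.append(ready)
        seen += len(ready)
        next_ready = []
        for node in ready:
            for child in children[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_ready.append(child)
        ready = next_ready
    if seen != len(nodes):
        raise ValueError("cycle detected in DAG")
    return levels


print(execution_levels(["a", "b", "c", "d"],
                       [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]))
# [['a'], ['b', 'c'], ['d']]
```

Nodes within one level have no dependencies on each other, which is what makes it safe to run them concurrently.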
Memory Architecture¶
MemoryManager composes ConversationMemory and WorkingMemory, delegating
persistence to a pluggable MemoryStore backend.
graph TD
subgraph MemoryManager
MM["MemoryManager<br/><small>new_conversation · fork<br/>get_working · get_conversation</small>"]
end
subgraph Conversation
CM["ConversationMemory<br/><small>add_turn · get_history<br/>token budget · FIFO eviction</small>"]
TE["TokenEstimator<br/><small>estimate_tokens</small>"]
end
subgraph Working
WM["WorkingMemory<br/><small>set · get · delete<br/>scoped namespaces</small>"]
end
subgraph Backends
IMS["InMemoryStore<br/><small>dict-backed</small>"]
FS["FileStore<br/><small>JSON file per namespace</small>"]
SQL["SQLiteStore<br/><small>SQLite-backed</small>"]
end
MM --> CM
MM --> WM
CM --> TE
WM -->|MemoryStore protocol| IMS
WM -->|MemoryStore protocol| FS
WM -->|MemoryStore protocol| SQL
style MM fill:#4a90d9,color:#fff
style CM fill:#7eb8da,color:#000
style WM fill:#7eb8da,color:#000
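The "token budget with FIFO eviction" behaviour shown for ConversationMemory can be sketched as follows. The class BudgetedHistory, the four-characters-per-token heuristic, and the rule that the newest turn is always kept are assumptions for illustration; the framework's TokenEstimator and eviction policy may differ.

```python
from collections import deque


def estimate_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token); an assumption, not a tokenizer."""
    return max(1, len(text) // 4)


class BudgetedHistory:
    """Keeps the most recent turns whose combined estimate fits the token budget."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self._turns = deque()  # (text, estimated_tokens), oldest first
        self._total = 0

    def add_turn(self, text: str) -> None:
        cost = estimate_tokens(text)
        self._turns.append((text, cost))
        self._total += cost
        # FIFO eviction: drop the oldest turns first, but always keep the newest.
        while self._total > self.max_tokens and len(self._turns) > 1:
            _, dropped = self._turns.popleft()
            self._total -= dropped

    def get_history(self) -> list[str]:
        return [text for text, _ in self._turns]
```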
Reasoning Pattern Architecture¶
All six reasoning patterns extend AbstractReasoningPattern, which provides a
template-method loop: _reason → _act → _observe → _should_continue.
graph TD
subgraph AbstractReasoningPattern
EX["execute(agent, input)"]
R["_reason()"]
A["_act()"]
O["_observe()"]
SC["_should_continue()"]
EX --> R --> A --> O --> SC
SC -->|yes| R
SC -->|no| OUT["ReasoningResult"]
end
subgraph Concrete Patterns
REACT["ReActPattern<br/><small>observe → think → act</small>"]
COT["ChainOfThoughtPattern<br/><small>step-by-step reasoning</small>"]
PAE["PlanAndExecutePattern<br/><small>plan → execute → replan</small>"]
REF["ReflexionPattern<br/><small>execute → critique → retry</small>"]
TOT["TreeOfThoughtsPattern<br/><small>branch → evaluate → select</small>"]
GD["GoalDecompositionPattern<br/><small>goal → phases → tasks</small>"]
end
subgraph Pipeline
RP["ReasoningPipeline<br/><small>chains patterns sequentially</small>"]
end
REACT --> EX
COT --> EX
PAE --> EX
REF --> EX
TOT --> EX
GD --> EX
RP --> REACT
RP --> COT
RP --> PAE
style EX fill:#e67e22,color:#fff
style OUT fill:#27ae60,color:#fff
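The template-method loop in the diagram can be sketched in a few lines. The code is synchronous and agent-free to keep it self-contained; LoopPattern, LoopResult, CountdownPattern, and max_steps are hypothetical names, and the real AbstractReasoningPattern takes an agent and is likely asynchronous.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class LoopResult:
    output: str
    trace: list[str] = field(default_factory=list)


class LoopPattern(ABC):
    """Template method: subclasses fill in the hooks, the base class owns the loop."""

    max_steps = 5

    def execute(self, task: str) -> LoopResult:
        trace, observation = [], task
        for step in range(self.max_steps):
            thought = self._reason(observation)
            action_result = self._act(thought)
            observation = self._observe(action_result)
            trace.append(f"step {step}: {thought} -> {observation}")
            if not self._should_continue(observation):
                break
        return LoopResult(output=observation, trace=trace)

    @abstractmethod
    def _reason(self, observation: str) -> str: ...

    @abstractmethod
    def _act(self, thought: str) -> str: ...

    def _observe(self, action_result: str) -> str:
        return action_result

    def _should_continue(self, observation: str) -> bool:
        return False


class CountdownPattern(LoopPattern):
    """Toy pattern: decrement a number until it reaches zero."""

    def _reason(self, observation: str) -> str:
        return f"decrement {observation}"

    def _act(self, thought: str) -> str:
        return str(int(thought.split()[-1]) - 1)

    def _should_continue(self, observation: str) -> bool:
        return int(observation) > 0
```

The step cap in execute() guards against a hook that never signals completion, which matters when the loop body is an LLM call.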
Multi-Provider Support¶
The framework is provider-agnostic: it never re-implements a provider client.
Model construction is delegated entirely to pydantic-ai, so any provider pydantic-ai
supports — OpenAI, Anthropic, Google/Gemini, Groq, Bedrock, Mistral, Cohere,
DeepSeek, xAI, OpenRouter, Azure, Ollama, … — works by passing either a
"provider:model" string or a pydantic-ai Model object to any agent. Everything
the framework adds on top is built to be provider-uniform:
- Identity normalisation -- model_utils (get_model_identifier, extract_model_info, detect_model_family) turns both "provider:model" strings and Model objects (reading the provider from the model's own _provider.name) into a canonical provider:model string. This single key feeds cost lookup, quota/backoff, and usage grouping, so a model object never "drops" its provider.
- Cost -- pricing keys off that identifier across all providers, with provider-aware nuances: Gemini thoughts_tokens are counted (OpenAI/Anthropic fold reasoning into output), Bedrock vendor-prefixed ids get a retry on the bare model name, and an authoritative provider cost (OpenRouter usage.cost) wins when available. See Observability → Cost Resolution.
- Prompt caching -- routed by model family: Anthropic writes cache_control breakpoints, OpenAI supplies a routing key (its caching is automatic), Gemini uses cachedContent; Claude via Bedrock/OpenRouter is skipped with a warning (those backends cache differently). See Agents → PromptCacheMiddleware.
- Tool schemas -- real python_types keep tool schemas portable, with one caveat: Gemini rejects free-form dict[str, Any] object schemas; use a JSON string or a nested model instead. See Tools.
- Failover & rate limits -- FallbackModelWrapper / run_with_fallback fail over across providers, and rate-limit backoff prefers a provider's structured retry hint (e.g. Gemini retry_delay) before falling back to exponential backoff.
This is validated end-to-end against a real provider by
tests/integration/test_real_anthropic_e2e.py (nightly; see tests/README).
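The rate-limit behaviour described above (prefer a provider's retry hint, otherwise back off exponentially) can be sketched as a single delay function. The name next_delay, its parameters, the cap applied to the hint, and the use of full jitter are assumptions for illustration, not the framework's implementation.

```python
import random
from typing import Optional


def next_delay(attempt: int, retry_hint: Optional[float] = None, *,
               base: float = 1.0, cap: float = 60.0,
               rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    A provider-supplied hint wins; otherwise use capped exponential backoff
    with full jitter, i.e. a uniform draw from [0, min(cap, base * 2**attempt)].
    """
    if retry_hint is not None and retry_hint >= 0:
        return min(retry_hint, cap)
    rng = rng or random.Random()
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)
```

Jitter spreads out retries from many concurrent callers so they do not hit the provider again at the same instant.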
Plugin System¶
Plugins are discovered via Python entry points under three well-known groups:
fireflyframework_agentic.agents, fireflyframework_agentic.tools, and
fireflyframework_agentic.reasoning_patterns. The PluginDiscovery class scans
these groups and loads the referenced objects so they can self-register with
their respective registries.
flowchart LR
subgraph Package pyproject.toml
EP1["fireflyframework_agentic.agents<br/><small>my_agent = my_pkg:MyAgent</small>"]
EP2["fireflyframework_agentic.tools<br/><small>my_tool = my_pkg:MyTool</small>"]
EP3["fireflyframework_agentic.reasoning_patterns<br/><small>my_pattern = my_pkg:MyPattern</small>"]
end
PD["PluginDiscovery<br/><small>discover_all() · discover_group()</small>"]
subgraph Registries
AR["AgentRegistry<br/><small>register · get · list_agents</small>"]
TR["ToolRegistry<br/><small>register · get · list_tools</small>"]
RR["Reasoning Registry<br/><small>(pattern catalog)</small>"]
end
EP1 --> PD
EP2 --> PD
EP3 --> PD
PD --> AR
PD --> TR
PD --> RR
To create a plugin, add entry points in your package's pyproject.toml:
[project.entry-points."fireflyframework_agentic.agents"]
my_agent = "my_package.agents:MyAgent"
[project.entry-points."fireflyframework_agentic.tools"]
my_tool = "my_package.tools:MyTool"
Then call discovery at startup:
from fireflyframework_agentic.plugin import PluginDiscovery
result = PluginDiscovery.discover_all()
print(f"Loaded {len(result.successful)} plugins, {len(result.failed)} failed")
Configuration¶
All configuration is managed through FireflyAgenticConfig, which reads values from
environment variables prefixed with FIREFLY_AGENTIC_. For example:
export FIREFLY_AGENTIC_DEFAULT_MODEL=openai:gpt-4o
export FIREFLY_AGENTIC_LOG_LEVEL=DEBUG
export FIREFLY_AGENTIC_OBSERVABILITY_ENABLED=true
export FIREFLY_AGENTIC_NATIVE_INSTRUMENTATION_ENABLED=true # native pydantic-ai GenAI spans (see observability.md)
export FIREFLY_AGENTIC_REASONING_OUTPUT_MODE=prompted # reasoning structured-output strategy (see reasoning.md)
The framework emits telemetry through the OpenTelemetry API but does not configure the OTel SDK or any exporter. Wiring up the SDK/exporter endpoint (including any OTLP endpoint) is the host service's responsibility; config.observability_enabled only gates whether the framework emits spans and metrics. Supplying removed serving/exposure fields (e.g. otlp_endpoint, rbac_enabled, cors_allowed_origins) raises a ValueError.
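The two behaviours described here, prefix-based environment overrides and rejection of removed fields, can be sketched without Pydantic. The load_settings function, the DEFAULTS value, and the error wording are hypothetical; the real FireflyAgenticConfig is a Pydantic Settings class with typed fields.

```python
import os
from typing import Mapping, Optional

PREFIX = "FIREFLY_AGENTIC_"
# Field names taken from the text above; the rejection logic itself is a sketch.
REMOVED_FIELDS = {"otlp_endpoint", "rbac_enabled", "cors_allowed_origins"}
DEFAULTS = {"log_level": "INFO"}  # hypothetical default, for illustration only


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Collect prefixed variables into a settings dict, rejecting removed fields."""
    env = os.environ if env is None else env
    settings = dict(DEFAULTS)
    for name, value in env.items():
        if not name.startswith(PREFIX):
            continue
        field_name = name[len(PREFIX):].lower()
        if field_name in REMOVED_FIELDS:
            raise ValueError(f"'{field_name}' was removed; configure it in the host service")
        settings[field_name] = value
    return settings
```

Failing loudly on a removed field is safer than silently ignoring it, since an operator who sets an exporter endpoint presumably expects telemetry to arrive somewhere.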
The configuration singleton is available via: