Architecture
Firefly DataScience is a hexagonal, auto-configured data-science framework: a lean DI container wires ports to adapters, and a Spring-Boot-style application context boots it all from packaging entry points.
This page explains how the pieces fit together: the five layers, the ports-and-adapters (hexagonal) core, entry-point auto-configuration, the dependency-injection container, and the FireflyDataScienceApplication startup lifecycle. The design goal throughout is that the domain never depends on a vendor SDK, and that an adapter can be added, swapped, or overridden without touching calling code.
The five layers
The framework is organised top-to-bottom so that the domain never depends on a vendor SDK:
- Application: FireflyDataScienceApplication / ApplicationContext, the bootstrap and the started, wired context you resolve beans from.
- Auto-configuration: @auto_configuration @configuration classes discovered via entry points; each contributes adapters conditionally.
- Container: the Container, a type-hint-driven IoC container with singleton/transient scopes and constructor injection.
- Domain / Ports: protocol interfaces (e.g. DatasetLoaderPort) plus the light, dependency-free core types in core.types (TaskType, Modality, Scope).
- Adapters: concrete implementations of the ports backed by optional extras (scikit-learn, OpenML, deep-learning, GenAI, ...), each gated by a condition.
The core stays importable with no optional ML extra installed — vendor imports live inside adapters and @bean methods, never at module top level. core.types enforces this with hand-written StrEnums (TaskType, Modality, Scope) and no third-party ML imports.
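Both conventions can be sketched with the standard library alone. The enum members and the load_iris_lazily helper below are hypothetical, chosen only for illustration; this page does not list the real TaskType values.

```python
from enum import Enum


class StrEnum(str, Enum):
    """Hand-written string enum: each member is also a plain str."""

    def __str__(self) -> str:
        return str(self.value)


class TaskType(StrEnum):
    # Hypothetical members; the real TaskType values are not listed on this page.
    CLASSIFICATION = "classification"
    REGRESSION = "regression"


def load_iris_lazily() -> object:
    """Hypothetical adapter helper: scikit-learn is imported only when called."""
    from sklearn import datasets  # deferred vendor import

    return datasets.load_iris()
```

Importing a module written this way succeeds even when scikit-learn is absent; the ImportError would surface only if load_iris_lazily() were actually called.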
The reproducible pattern — the LLM proposes; the classical engine decides
The same separation that keeps vendor SDKs out of the domain keeps GenAI out of the decision path. GenAI lives in adapters behind ports; the deterministic classical engine trains, scores and selects. The architecture is what makes the rule enforceable: a GenAI adapter can only ever propose — the container resolves a port, and the classical engine decides whether a proposal survives a measured improvement over a seeded baseline.
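The rule can be illustrated with a minimal gate, assuming each proposal can be scored deterministically from a seed. Proposal, decide, and min_gain are hypothetical names for this sketch, not part of the framework's API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Proposal:
    """A candidate suggested by a GenAI adapter (hypothetical shape)."""

    name: str
    score: Callable[[int], float]  # seed -> measured score, higher is better


def decide(
    baseline: Callable[[int], float],
    proposals: Iterable[Proposal],
    seed: int = 0,
    min_gain: float = 0.0,
) -> List[str]:
    """Keep only proposals whose measured score beats the seeded baseline."""
    baseline_score = baseline(seed)
    return [p.name for p in proposals if p.score(seed) - baseline_score > min_gain]
```

The proposing side never sees the acceptance decision: it only supplies candidates, and the deterministic comparison against the baseline decides which ones survive.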
Hexagonal: ports and adapters
A port is a Protocol the domain depends on. An adapter is a concrete class that implements it. The container binds them by type annotation, so swapping an adapter never touches calling code.
from typing import Protocol


class DatasetLoaderPort(Protocol):
    def load(self, name: str) -> object: ...


class SklearnDatasetLoader:
    def load(self, name: str) -> object:
        # The vendor import stays inside the method, so the core imports without scikit-learn.
        from sklearn import datasets

        return getattr(datasets, f"load_{name}")()
Each data-science port is declared as a Protocol in its own domain module (not in a central package): DatasetLoaderPort in datasets, TrainerPort in models, AutoMLBackendPort in automl, FeatureEngineerPort in features, SearchPolicyPort in search, MetricsEvaluatorPort in evaluation, ValidatorPort in validation, and TrackerPort / RegistryPort in tracking. Each is a contract the domain calls; the concrete class that fulfils it is decided at boot.
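Because a port is a Protocol, an adapter satisfies it structurally, without inheriting from it. The sketch below assumes a hypothetical in-memory adapter and a hypothetical row_count domain function; @runtime_checkable is added only so that isinstance() works in the demonstration and is not claimed to be how the framework declares its ports.

```python
from typing import Dict, List, Protocol, runtime_checkable


@runtime_checkable  # added here only so isinstance() works in the demo
class DatasetLoaderPort(Protocol):
    def load(self, name: str) -> object: ...


class InMemoryDatasetLoader:
    """Hypothetical adapter: satisfies the port structurally, without inheriting."""

    def __init__(self, tables: Dict[str, List[int]]) -> None:
        self._tables = dict(tables)

    def load(self, name: str) -> object:
        return self._tables[name]


def row_count(loader: DatasetLoaderPort, name: str) -> int:
    """Domain code: depends on the port only, never on a concrete adapter."""
    return len(loader.load(name))  # assumes the loaded object is sized
```

Swapping InMemoryDatasetLoader for any other class with a compatible load method leaves row_count untouched, which is the property the hexagonal layout is after.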
The adapter is contributed by an auto-configuration, gated on the optional dependency being importable:
from fireflyframework_datascience.container.conditions import (
    auto_configuration,
    conditional_on_class,
)
from fireflyframework_datascience.container.stereotypes import bean, configuration
from fireflyframework_datascience.datasets import DatasetLoaderPort


@auto_configuration
@conditional_on_class("sklearn")
@configuration
class DatasetsAutoConfiguration:
    @bean(name="sklearn_dataset_loader")
    def sklearn_loader(self) -> DatasetLoaderPort:  # (1)!
        from fireflyframework_datascience.datasets.adapters import SklearnDatasetLoader

        return SklearnDatasetLoader()
1. The @bean method's return annotation is the provided type (DatasetLoaderPort here), so the container registers SklearnDatasetLoader under the port. Resolving DatasetLoaderPort yields whichever adapter won. (At boot, _apply_one reads get_type_hints(method)["return"]; a @bean method with no return annotation is skipped.)
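Reading the provided type off a factory method can be sketched with typing.get_type_hints. The provided_type helper and ExampleConfiguration class are hypothetical; only the use of get_type_hints(method)["return"] comes from the description above.

```python
from typing import Optional, Protocol, get_type_hints


class DatasetLoaderPort(Protocol):
    def load(self, name: str) -> object: ...


def provided_type(method) -> Optional[type]:
    """Return the annotated return type, or None when there is none."""
    return get_type_hints(method).get("return")


class ExampleConfiguration:
    def annotated(self) -> DatasetLoaderPort: ...

    def unannotated(self): ...
```

In this sketch a None result plays the role of the "skipped" case: with no return annotation there is no type to register the bean under.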
Key types
A small, stable vocabulary spans the wiring layer. These are the names you actually import:
| Type / decorator | Module | Role |
|---|---|---|
| Container | container.container | The IoC container; resolution by type annotation. |
| Scope | core.types | SINGLETON (cached, default) or TRANSIENT (new each resolve). |
| @configuration / @bean | container.stereotypes | Mark a class as holding factory methods; mark a method as a bean factory. |
| @component | container.stereotypes | Mark a class as injectable by its own type. |
| @auto_configuration | container.conditions | Mark a class discoverable via the entry-point group. |
| @order | core.ordering | Set ordering (lower runs/resolves first). |
| ConditionContext | container.conditions | What a condition is evaluated against (config + container). |
| ApplicationContext | application | A started app: the loaded config plus the wired container. |
| WiringError | core.exceptions | Raised on ambiguous, missing, or circular dependencies. |
The @bean decorator defaults to scope=Scope.SINGLETON and primary=False; pass name=, scope=, or primary= to override. @component and the container's register_* methods share the same defaults.
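As a rough illustration of those defaults, a decorator with the same keyword arguments might look like the sketch below. This is not the framework's implementation: the __bean_meta__ attribute and the enum member values are assumptions made for the example.

```python
from enum import Enum
from typing import Any, Callable, Optional


class Scope(str, Enum):
    SINGLETON = "singleton"  # member values are assumed
    TRANSIENT = "transient"


def bean(
    func: Optional[Callable[..., Any]] = None,
    *,
    name: Optional[str] = None,
    scope: Scope = Scope.SINGLETON,
    primary: bool = False,
):
    """Mark a factory; usable bare (@bean) or with keywords (@bean(name=...))."""

    def mark(target: Callable[..., Any]) -> Callable[..., Any]:
        # __bean_meta__ is a hypothetical attribute name used by this sketch only.
        target.__bean_meta__ = {"name": name, "scope": scope, "primary": primary}
        return target

    return mark(func) if func is not None else mark
```

Accepting the function positionally and the options as keyword-only arguments is what lets the same decorator be written both as @bean and as @bean(name="...").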
Entry-point auto-configuration
Adapter packages register their auto-configuration class under the firefly_datascience.auto_configuration entry-point group in pyproject.toml:
[project.entry-points."firefly_datascience.auto_configuration"]
core = "fireflyframework_datascience.core.auto_configuration:CoreAutoConfiguration"
datasets = "fireflyframework_datascience.datasets.auto_configuration:DatasetsAutoConfiguration"
models = "fireflyframework_datascience.models.auto_configuration:ModelsAutoConfiguration"
At startup discover_auto_configurations() loads every class in the group, tolerating any whose optional extra is missing (it is simply skipped — its @conditional_on_class would have excluded it anyway), then sorts them by @order:
from fireflyframework_datascience.core.plugin import discover_auto_configurations

for cls in discover_auto_configurations():
    print(cls.__name__)
CoreAutoConfiguration is the always-on reference example: it has no @conditional_on_class, so it always applies, and it registers a single RuntimeInfo bean snapshotting the framework version, Python version, platform, default ML framework, and whether GenAI is enabled:
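import platform

from fireflyframework_datascience.container.conditions import auto_configuration
from fireflyframework_datascience.container.stereotypes import bean, configuration

# FireflyDataScienceConfig, RuntimeInfo and __version__ come from the framework;
# their import paths are not shown on this page.


@auto_configuration
@configuration
class CoreAutoConfiguration:
    @bean
    def runtime_info(self, config: FireflyDataScienceConfig) -> RuntimeInfo:  # (1)!
        return RuntimeInfo(
            framework_version=__version__,
            python_version=platform.python_version(),
            platform=platform.platform(),
            default_ml_framework=config.default_ml_framework,
            genai_enabled=config.genai.enabled,
        )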
1. The method's only parameter, config, is filled by type hint: FireflyDataScienceConfig is already registered as a bean (the application context registers it first), so the container injects it when it calls the factory.
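Filling factory parameters by type hint can be sketched with inspect and typing. call_with_injection and the plain dict of beans are hypothetical stand-ins for the container; Optional[X] handling is omitted to keep the sketch short.

```python
import inspect
from typing import Any, Callable, Dict, get_type_hints


def call_with_injection(factory: Callable[..., Any], beans: Dict[type, Any]) -> Any:
    """Call factory, filling each parameter from beans by its type hint."""
    hints = get_type_hints(factory)
    kwargs = {}
    for name, param in inspect.signature(factory).parameters.items():
        wanted = hints.get(name)
        if wanted in beans:
            kwargs[name] = beans[wanted]
        elif param.default is inspect.Parameter.empty:
            raise LookupError(f"no bean registered for parameter {name!r}")
        # Otherwise the parameter keeps its declared default.
    return factory(**kwargs)
```

Registering the config before any auto-configuration runs is what makes this work for runtime_info above: by the time the factory is called, a bean of the hinted type already exists.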
Conditions
Conditions gate both whole auto-configurations and individual @bean methods. Each is evaluated against a ConditionContext (the loaded config plus the partially-wired container):
from fireflyframework_datascience.container.conditions import (
    conditional_on_class,         # an optional extra is importable
    conditional_on_property,      # a config key is set / equals a value
    conditional_on_missing_bean,  # user override wins
    conditional_on_bean,          # another bean is already present
)
- conditional_on_class("sklearn") matches when importlib.util.find_spec("sklearn") resolves, i.e. the optional extra is installed.
- conditional_on_property("genai.enabled") reads a dotted path off the config; with no having_value it matches when the value is truthy ("1", "true", "yes", "on", or any truthy object), and match_if_missing=True controls behaviour when the key is absent.
- conditional_on_bean(SomePort) / conditional_on_missing_bean(SomePort) query the partially-wired container, so ordering (@order) decides what is already present when a condition runs.
conditional_on_missing_bean(DatasetLoaderPort) is the secure-by-default override hook: a framework default applies only when you have not already registered your own. Because conditions see the live container, registering your adapter first (lower @order, or via extra_auto_configurations) is enough to win.
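The class and property checks described in this section can be approximated as plain predicates. The function names, the nested-dict config, the case-insensitive string handling, and the equality test for having_value are assumptions of this sketch; only find_spec and the truthy strings come from the text above.

```python
import importlib.util
from typing import Any, Mapping

_TRUTHY = {"1", "true", "yes", "on"}


def class_available(module_name: str) -> bool:
    """True when the module can be found, i.e. the optional extra is installed."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def property_matches(
    config: Mapping[str, Any],
    dotted_key: str,
    having_value: Any = None,
    match_if_missing: bool = False,
) -> bool:
    """Walk a dotted path through nested mappings and test the value found."""
    value: Any = config
    for part in dotted_key.split("."):
        if not isinstance(value, Mapping) or part not in value:
            return match_if_missing
        value = value[part]
    if having_value is not None:
        return value == having_value  # assumed comparison rule
    if isinstance(value, str):
        return value.strip().lower() in _TRUTHY
    return bool(value)
```

Keeping each condition a side-effect-free predicate is what allows it to be evaluated early, before any adapter module has been imported.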
The DI container
Container is a lean IoC container; resolution is by type annotation, with constructor injection and circular-dependency detection. There are three ways to register a bean, all through the container's register_* methods; registering a pre-built instance is shown here:
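from fireflyframework_datascience.container.container import Container
from fireflyframework_datascience.datasets import DatasetLoaderPort
from fireflyframework_datascience.datasets.adapters import SklearnDatasetLoader

container = Container()
container.register_instance(DatasetLoaderPort, SklearnDatasetLoader())  # (1)!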
1. Register an already-constructed object as a singleton. Use this when you built the instance yourself (e.g. the application context registers the loaded config this way).
Resolution mirrors the three shapes you need in practice:
loader = container.resolve(DatasetLoaderPort) # single bean (honours @primary)
maybe = container.resolve_optional(DatasetLoaderPort) # None if absent
allof = container.resolve_all(DatasetLoaderPort) # every bean, sorted by @order
Key behaviours:
- Scopes: Scope.SINGLETON (cached, the default) and Scope.TRANSIENT (new each resolve).
- Ambiguity: multiple beans for one type require exactly one marked primary=True, else resolve raises WiringError. Resolve by name with resolve_by_name(...) to disambiguate.
- Injection: constructor / factory parameters are filled by type hint; Optional[X] / X | None params resolve to None when no bean exists, and a parameter with a default is left to its default when no matching bean is found.
- Circular dependencies: detected during construction; a cycle raises WiringError rather than recursing.
- Fail-fast: eager_init() instantiates every singleton at boot, validating the whole wiring graph before your code runs.
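The scope, ambiguity, and cycle rules above can be seen together in a toy container. MiniContainer, its add method, and the convention that factories receive the container are all simplifications invented for this sketch; the real Container injects by type hint and has its own registration API.

```python
from typing import Any, Callable, Dict, List


class WiringError(Exception):
    """Stand-in for the framework's WiringError."""


class MiniContainer:
    def __init__(self) -> None:
        self._registrations: Dict[type, List[Dict[str, Any]]] = {}
        self._resolving: List[type] = []  # types currently under construction

    def add(
        self,
        provided: type,
        factory: Callable[["MiniContainer"], Any],
        *,
        singleton: bool = True,
        primary: bool = False,
    ) -> None:
        entry = {"factory": factory, "singleton": singleton, "primary": primary, "built": False}
        self._registrations.setdefault(provided, []).append(entry)

    def resolve(self, wanted: type) -> Any:
        candidates = self._registrations.get(wanted, [])
        if not candidates:
            raise WiringError(f"no bean for {wanted.__name__}")
        if len(candidates) > 1:
            # Several beans for one type: exactly one must be marked primary.
            primaries = [c for c in candidates if c["primary"]]
            if len(primaries) != 1:
                raise WiringError(f"ambiguous beans for {wanted.__name__}")
            chosen = primaries[0]
        else:
            chosen = candidates[0]
        if chosen["singleton"] and chosen["built"]:
            return chosen["instance"]  # cached singleton
        if wanted in self._resolving:
            raise WiringError(f"circular dependency on {wanted.__name__}")
        self._resolving.append(wanted)
        try:
            instance = chosen["factory"](self)
        finally:
            self._resolving.pop()
        if chosen["singleton"]:
            chosen["instance"], chosen["built"] = instance, True
        return instance
```

Tracking the types currently under construction is what turns an infinite recursion into an immediate WiringError, and resolving every singleton once at boot would surface such errors before application code runs.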
Resolution is by annotation, not by name
resolve(...) looks up registrations by the provided type. Names are a side channel:
register_* accept a name=, and resolve_by_name(...) / bean_names() work off it. A bean
with no usable return annotation is never registered (the application context skips it).
The application lifecycle
FireflyDataScienceApplication.start() runs a fixed sequence, mirroring pyfly's lifecycle:
1. Load config (FireflyDataScienceConfig.load), unless one is passed in.
2. Print the banner.
3. Create the Container and register the config as a bean.
4. Discover auto-configurations (entry points + any extras), de-duplicate while preserving order, sort by @order.
5. Evaluate each auto-configuration's conditions; for those that pass, instantiate the class and register every @bean method whose own conditions also pass.
6. eager_init() all singletons (fail-fast).
7. Print the wiring summary and return a ready ApplicationContext.
from fireflyframework_datascience.application import FireflyDataScienceApplication
# One-call bootstrap.
ctx = FireflyDataScienceApplication.run()
print(ctx.bean_count, "beans")
print([ac.__name__ for ac in ctx.applied_auto_configurations])
loader = ctx.get(DatasetLoaderPort) # resolve a bean by type
tracker = ctx.get_optional(SomeOptionalPort) # None if not wired
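The discovery, de-duplication, ordering, and condition steps of the boot sequence can be sketched in a few lines. The plan function and the order / conditions class attributes are hypothetical; the framework attaches this metadata through its decorators, in a form this page does not describe.

```python
from typing import Any, Iterable, List


def plan(
    discovered: Iterable[type],
    extras: Iterable[type] = (),
    context: Any = None,
) -> List[type]:
    """De-duplicate (first occurrence wins), sort by order, keep passing classes."""
    unique = list(dict.fromkeys([*discovered, *extras]))
    # sorted() is stable, so classes with equal order keep their discovery order.
    ordered = sorted(unique, key=lambda cls: getattr(cls, "order", 0))
    return [
        cls
        for cls in ordered
        if all(condition(context) for condition in getattr(cls, "conditions", ()))
    ]
```

De-duplicating before sorting matters: a class listed both by an entry point and as an extra is applied once, at the position its order dictates.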
When the banner is on, boot ends by printing the wiring summary — a quick check that the expected adapters were applied:
Expected
Firefly DataScience is ready.
profiles : default
beans : 7
auto-config : 3 applied (CoreAutoConfiguration, DatasetsAutoConfiguration, ModelsAutoConfiguration)
ml framework : sklearn
genai : disabled
sandbox : ...
The exact bean count and applied list depend on which optional extras are installed; the line shape (profiles, beans, auto-config, ml framework, genai, sandbox) is fixed.
You can steer the boot without forking the framework:
ctx = FireflyDataScienceApplication.run(
    config_dir="config",
    profiles=["prod"],
    extra_auto_configurations=[MyCustomAutoConfiguration],  # add your own
    print_output=False,  # quiet boot for tests
)
Passing auto_configurations=[...] replaces discovery entirely (handy for hermetic tests); extra_auto_configurations=[...] appends to whatever was discovered. The returned ApplicationContext exposes .config, .container, .applied_auto_configurations, .bean_count, and the get / get_optional resolvers.
Quiet boots and hermetic tests
Pass print_output=False to silence the banner and wiring summary, and
auto_configurations=[...] to pin an exact set of auto-configurations — together they make the
context fully deterministic for tests, with no dependence on which extras happen to be installed.
Auto-configuration flow
Adapters self-register via the firefly_datascience.auto_configuration entry-point group; the application context discovers them, evaluates their conditions, and registers the surviving beans.
See also
- Quickstart: boot the application context in one call.
- Configuration: the FireflyDataScienceConfig that conditions read.
- AutoML: what the wired ports drive end to end.
- GenAI features: the gated adapters behind the ports.
- Security: the override and sandbox guarantees this wiring underpins.