Firefly DataScience
AutoML that fuses GenAI with classical ML & Deep Learning — hexagonal, secure-by-default, and native to the Firefly Framework.
fireflyframework-datascience is a Python metaframework for AutoML. It pairs GenAI, built on fireflyframework-agentic (which wraps Pydantic AI), with traditional ML and Deep Learning, so any team can apply data science to any project quickly, with production governance, hexagonal swappability, and security by default.
The reproducible pattern — the LLM proposes; the classical engine decides
GenAI proposes code, features, pipelines and seeds; a deterministic classical engine trains, scores and selects; and every GenAI step is gated behind a measured improvement over a seeded classical baseline. GenAI is a governed, measurably gated accelerator over a battle-tested classical core, never a black box.
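The gate can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the classical engine is exposed as a `score` callable that is deterministic for a fixed seed; `gated_select`, `Decision` and the `min_gain` threshold are hypothetical names, not the framework's API, and the scores in the usage lines are made up.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Decision:
    chosen: str
    score: float
    baseline_score: float
    accepted_genai: bool


def gated_select(
    baseline: str,
    proposals: Iterable[str],
    score: Callable[[str], float],
    min_gain: float = 0.01,
) -> Decision:
    """The LLM proposes, the classical engine decides.

    `score` is a hypothetical stand-in for the seeded classical engine: it must
    return the same value for the same pipeline every time.
    """
    base = score(baseline)
    best_name, best_score = baseline, base
    for name in proposals:  # GenAI proposals, each scored by the classical engine
        s = score(name)
        if s > best_score:
            best_name, best_score = name, s
    # Gate: a proposal only ships if it beats the baseline by at least min_gain.
    if best_name != baseline and best_score - base >= min_gain:
        return Decision(best_name, best_score, base, accepted_genai=True)
    return Decision(baseline, base, base, accepted_genai=False)


# Made-up scores, for illustration only.
scores = {"lgbm": 0.812, "lgbm+llm-ratio-features": 0.829, "lgbm+llm-text-embed": 0.815}
proposals = ["lgbm+llm-ratio-features", "lgbm+llm-text-embed"]
print(gated_select("lgbm", proposals, scores.__getitem__).chosen)  # lgbm+llm-ratio-features
print(gated_select("lgbm", proposals, scores.__getitem__, min_gain=0.05).chosen)  # lgbm
```

In this sketch a failed gate falls back to the baseline instead of raising, so there is always a deployable classical model.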
Want the whole story in one document?
The Complete Guide (PDF) combines the executive summary and strategic case (faster time-to-value, governed GenAI, no lock-in) with the full architecture, a hands-on tutorial, and the benchmark evidence — one document for both leaders and engineers.
Why Firefly DataScience?
- Classical-first AutoML: a deterministic engine trains, scores and selects across scikit-learn, XGBoost, LightGBM, CatBoost, AutoGluon and TabPFN, reproducible from a seed.
- GenAI as a gated accelerator: the LLM proposes features and pipelines; nothing ships unless it beats the seeded classical baseline (`genai.cost_benefit_gate` is on by default).
- The agentic ML-engineering loop: propose → train → score → select, driven by the agentic runtime, with a measured improvement required at every step.
- Deep Learning, swappable: PyTorch Lightning and HuggingFace sit behind the same ports as the classical adapters, covering tabular, text, vision, timeseries and multimodal data.
- Hexagonal & swappable: every ML/MLOps library (MLflow, Feast, BentoML, …) is a swappable adapter behind a `Protocol` port; the core stays library-agnostic.
- Secure by default: LLM-generated code runs in a sandbox (`monty` by default) with timeouts and approval gates; GenAI is off until you enable it.
- Explainable & trustworthy: deterministic global + local feature importances (permutation, SHAP) and calibrated probabilities, so every model can be explained and its scores trusted for real decisions.
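The "adapter behind a `Protocol` port" idea can be sketched with the standard library's `typing.Protocol`. This assumes a tracking port with a single `log_metric` method; `ExperimentTracker`, `InMemoryTracker`, `MlflowTracker` and `train_and_report` are hypothetical names, not the framework's real ports, and the only real library call is `mlflow.log_metric`.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ExperimentTracker(Protocol):
    """Hypothetical port: the core only ever sees this interface."""

    def log_metric(self, key: str, value: float) -> None: ...


class InMemoryTracker:
    """Dependency-free adapter, handy for tests and local runs."""

    def __init__(self) -> None:
        self.metrics: dict[str, float] = {}

    def log_metric(self, key: str, value: float) -> None:
        self.metrics[key] = value


class MlflowTracker:
    """Adapter that delegates to MLflow; imported lazily so the core has no hard dependency."""

    def log_metric(self, key: str, value: float) -> None:
        import mlflow  # only needed when this adapter is actually wired in

        mlflow.log_metric(key, value)


def train_and_report(tracker: ExperimentTracker, auc: float) -> None:
    # Core logic depends on the port, never on a concrete library.
    tracker.log_metric("roc_auc", auc)


tracker = InMemoryTracker()
train_and_report(tracker, auc=0.91)
print(tracker.metrics)  # {'roc_auc': 0.91}
```

Neither adapter inherits from the port; structural typing is what keeps them swappable. Note that `runtime_checkable` only checks that the method exists, not its signature.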
Get started in 30 seconds
```shell
uv add fireflyframework-datascience                  # core (ports, app, DI; no heavy ML libs)
uv add 'fireflyframework-datascience[automl-stack]'  # + classical AutoML + tracking
```
Requires Python 3.13+. Extras can be combined in one install, for example `uv add 'fireflyframework-datascience[tabular,tracking,genai]'`.
```python
from fireflyframework_datascience import FireflyDataScienceApplication

# load config -> print banner -> wire DI container -> wiring summary -> ready context
app = FireflyDataScienceApplication.run()
print(app.bean_count)                   # number of wired beans
print(app.config.default_ml_framework)  # "sklearn"
print(app.applied_auto_configurations)  # discovered auto-configurations
```
GenAI is classical-first and off by default. To use it, opt in explicitly and keep the cost-benefit gate on, so a GenAI step is only accepted when it measurably beats the classical baseline:
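```python
# FireflyDataScienceConfig import path assumed to match the application class
from fireflyframework_datascience import FireflyDataScienceApplication, FireflyDataScienceConfig

config = FireflyDataScienceConfig(app_name="lumen-credit-risk", default_ml_framework="lightgbm")
config.genai.enabled = True            # opt in to the GenAI accelerator
config.genai.cost_benefit_gate = True  # require a measured win over the baseline (already the default)
config.execution.sandbox = "docker"    # sandbox LLM-generated code in Docker instead of the default monty
app = FireflyDataScienceApplication.run(config=config)
```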
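To make explicit which defaults the configuration snippet overrides, here is a stand-in model built from plain dataclasses. `ConfigSketch`, `GenAISettings`, `ExecutionSettings` and `changed_from_defaults` are hypothetical and are not the framework's classes; the default values are the ones stated on this page (`sklearn`, GenAI off, gate on, `monty` sandbox).

```python
from dataclasses import asdict, dataclass, field


@dataclass
class GenAISettings:
    enabled: bool = False           # GenAI is off until you opt in
    cost_benefit_gate: bool = True  # a measured win over the baseline is required


@dataclass
class ExecutionSettings:
    sandbox: str = "monty"          # LLM-generated code is sandboxed by default


@dataclass
class ConfigSketch:
    app_name: str
    default_ml_framework: str = "sklearn"
    genai: GenAISettings = field(default_factory=GenAISettings)
    execution: ExecutionSettings = field(default_factory=ExecutionSettings)


def changed_from_defaults(cfg: ConfigSketch) -> dict[str, object]:
    """Flatten one level of nesting and keep only values that differ from the defaults."""
    default = asdict(ConfigSketch(app_name=cfg.app_name))
    diff: dict[str, object] = {}
    for key, value in asdict(cfg).items():
        if isinstance(value, dict):
            for sub, v in value.items():
                if default[key][sub] != v:
                    diff[f"{key}.{sub}"] = v
        elif default[key] != value:
            diff[key] = value
    return diff


cfg = ConfigSketch(app_name="lumen-credit-risk", default_ml_framework="lightgbm")
cfg.genai.enabled = True
cfg.genai.cost_benefit_gate = True
cfg.execution.sandbox = "docker"
print(changed_from_defaults(cfg))
# {'default_ml_framework': 'lightgbm', 'genai.enabled': True, 'execution.sandbox': 'docker'}
```

Setting `cost_benefit_gate = True` does not show up in the diff: it restates the default, which documents intent without changing behavior.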
Explore the docs
- Hexagonal ports/adapters, the DI container, and auto-configuration.
- Install, boot an `ApplicationContext`, and run your first AutoML job.
- `FireflyDataScienceConfig`, profiles, env vars, and YAML overlays.
- Dataset backends (pandas, …) and `Modality`.
- Train, score and select, with calibration, stacking ensembles, PR-AUC selection and CV strategies.
- Deterministic global + local feature importances (permutation, SHAP).
- The gated GenAI accelerator and the cost-benefit gate.
- Propose → train → score → select on the agentic runtime.
- PyTorch Lightning and HuggingFace behind the ports.
- Model registry, feature store, and BentoML serving.
- Sandboxed code execution, approval gates, and secure defaults.
- Reproducible measurement of GenAI vs. classical baselines.
- An end-to-end worked example for the lending vertical.