Open Source Reference Implementations: Complete, production-quality codebases demonstrating local-first data and AI patterns. MIT Licensed • Fork & Build Your Own

Production-Ready Examples for Building Local-First Data Tools

Five complete, working implementations showing how to build data pipelines, ML-in-SQL, semantic layers, and AI analytics—without cloud dependencies.

📦

Complete Codebases

Not tutorials or docs—actual production-quality code you can run, study, and fork.

🔗

Working Examples

See how DLT, dbt, DuckDB, Rust extensions, and MCP integration actually fit together.

Skip Months of R&D

Proven patterns for local-first architecture, tested and documented. Just fork and adapt.

Open source • MIT Licensed • Archived as reference implementations (Nov 2025)

Who Should Use These?

Whether you're building similar tools or learning modern data engineering, these implementations save you months of research.

Building Local-First Tools?

Stop piecing together scattered docs. See complete working examples of:

  • How to structure a data pipeline with DLT + dbt + DuckDB
  • Building Rust extensions for DuckDB (ML/AI in SQL)
  • MCP server integration for AI assistants
  • Statistical rigor in AI-powered analytics

Learning Modern Data Stack?

Skip tutorials. Learn from production-quality code that shows:

  • Real architecture patterns, not toy examples
  • How components actually connect and work together
  • Error handling, testing, and deployment practices
  • Why certain tech choices were made (comments explain decisions)

How The Ecosystem Fits Together

Each project builds on the foundation to create a complete local-first analytics platform.

1

Foundation: SBDK.dev

The core framework providing local-first data pipelines with DLT (ingestion), dbt (transformation), and DuckDB (analytics). Everything else builds on this foundation.

2

Intelligence: Mallard (local-inference)

A DuckDB extension adding ML/AI capabilities. Run zero-shot predictions, generate embeddings, and get feature importance—all in SQL, no separate ML infrastructure needed.

3

Visualization: Semantic Tracer

Visualizes dbt semantic layers with interactive lineage graphs. Understand how your metrics, dimensions, and entities connect. Built with Tauri and React Flow.

4

Conversational: Local AI Analyst

Ask questions in natural language and get answers based on real query results. An execution-first approach prevents AI fabrication, and every answer comes with confidence intervals and significance testing.

5

Integration: knowDB

Connects everything to AI assistants via MCP (Model Context Protocol). Query your data through Claude Desktop or ChatGPT Desktop with automatic dbt model syncing and semantic layer integration.

The Result

A complete stack for building local-first analytics tools. Start with raw data, transform it, analyze it with ML, visualize relationships, and query it conversationally—all without cloud dependencies.

Five Projects, One Ecosystem

Each project is a complete, production-quality reference implementation. Fork any or all to build your own local-first data tools.

SBDK.dev (Sandbox Development Kit)

A developer sandbox framework for local-first data pipeline development using DLT, DuckDB, and dbt. It provides a complete local environment for prototyping, learning, and developing data solutions before deploying them to production systems.

Key Features

  • 11x Faster Installation: Lightning-fast setup with `uv`.
  • 100% Local: No cloud dependencies, no complex setup.
  • Intelligent Guided UI: A clean, intuitive interface with actionable options.
  • Hot Reload: Automatic re-runs when files change for iterative development.

Mallard (local-inference)

ARCHIVED

A DuckDB extension bringing ML/AI capabilities to local-first databases, a "Snowflake Cortex for Local-First Databases." It lets you run powerful, zero-shot tabular predictions directly in your database with simple SQL.

Key Features

  • Zero-Shot Predictions: Use powerful foundation models for classification and regression without training.
  • Simple SQL Interface: All functionality is exposed through declarative SQL UDFs.
  • High-Performance & Local-First: Built in Rust as a DuckDB extension.
  • Embeddings & Explainability: Generate dense vector embeddings and get feature importance explanations.

Semantic Tracer

ARCHIVED

A local-first application for visualizing and exploring dbt semantic layers. It connects directly to your dbt project and Snowflake account to provide a real-time, interactive lineage graph of your metrics, dimensions, and entities.

Key Features

  • Local-First: Your data and semantic models never leave your machine.
  • dbt Semantic Layer Integration: Connects seamlessly to your `semantic_models.yml` file.
  • Interactive Lineage Graph: Utilizes React Flow to create a dynamic and explorable graph.
  • Tauri Backend: A lightweight Rust backend provides high performance and a secure application shell.
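The `semantic_models.yml` file Semantic Tracer reads follows dbt's standard semantic model spec; a minimal example is sketched below (the model and field names are illustrative, not from any of these projects):

```yaml
semantic_models:
  - name: orders
    model: ref('stg_orders')
    entities:
      - name: order_id
        type: primary
    dimensions:
      - name: ordered_at
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: order_count
        agg: count
        expr: order_id
```

The lineage graph is built from exactly these relationships: entities link semantic models together, while dimensions and measures hang off each model as leaf nodes.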

Local AI Analyst

ARCHIVED

An AI-powered data analyst with a semantic layer, statistical rigor, and natural language insights. It allows you to ask questions in natural language and get answers based on real query results, not AI guesses.

Key Features

  • Natural Language Queries: Ask questions like "What's our conversion rate by plan type?"
  • Statistical Rigor: Automatic significance testing, confidence intervals, and sample size validation.
  • Execution-First: Prevents AI fabrication by building, executing, and then annotating results.
  • Multi-Query Workflows: Built-in analytical workflows for comprehensive analysis.
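The statistical checks described above can be sketched in plain Python. This is a generic two-proportion z-test with normal-approximation confidence intervals, not the Local AI Analyst's actual implementation:

```python
import math

def two_proportion_test(successes_a, n_a, successes_b, n_b, z=1.96):
    """Compare two conversion rates with a z-test and 95% CIs."""
    p_a, p_b = successes_a / n_a, successes_b / n_b

    # Normal-approximation confidence interval for a single rate.
    def ci(p, n):
        half = z * math.sqrt(p * (1 - p) / n)
        return (p - half, p + half)

    # Pooled standard error for the difference in proportions.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z_stat = (p_a - p_b) / se
    significant = abs(z_stat) > z  # two-sided test at the 5% level

    return {"rate_a": p_a, "rate_b": p_b, "ci_a": ci(p_a, n_a),
            "ci_b": ci(p_b, n_b), "z": z_stat, "significant": significant}

# 10.0% vs 15.0% conversion on 1,000 users each:
result = two_proportion_test(100, 1000, 150, 1000)
print(round(result["z"], 2), result["significant"])  # -3.38 True
```

Running checks like these against real query results, rather than letting the model assert significance, is what the "execution-first" approach refers to.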

knowDB

ARCHIVED

A local-first agentic analytics platform that extends `sbdk-dev` to enable natural language queries against your data through AI assistants like Claude Desktop and ChatGPT Desktop via the Model Context Protocol (MCP).

Key Features

  • Multi-AI Support: Works with any MCP-compatible AI assistant.
  • dbt Integration: Sync dbt models to the semantic layer automatically.
  • Local-First: Runs entirely on your machine with DuckDB.
  • Open Source: MIT License - free for personal and commercial use.
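Claude Desktop discovers MCP servers through its `claude_desktop_config.json`. A hypothetical entry for knowDB might look like the following; the `command` and `args` here are illustrative, so check the project's README for the actual invocation:

```json
{
  "mcpServers": {
    "knowdb": {
      "command": "uvx",
      "args": ["knowdb", "serve"]
    }
  }
}
```

Once registered, the assistant can call the server's MCP tools to list dbt models, inspect the semantic layer, and run queries against the local DuckDB file.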

Ready to Build?

These projects are reference implementations showing how to build local-first data tools. Here's how to get started:

1

Pick a Project

Start with SBDK.dev for the foundation, or choose any project that matches your needs. Each works standalone or as part of the ecosystem.

2

Fork & Explore

Fork the repository, read the README, explore the code. Each project includes comprehensive documentation and examples.

# Example: SBDK.dev
git clone https://github.com/sbdk-dev/sbdk-dev
cd sbdk-dev
pip install -e .  # or, for faster installs: uv pip install -e .
3

Adapt & Extend

These are reference implementations—take what works, modify what doesn't, and build your own tools. All projects are MIT licensed for maximum flexibility.

4

Share Your Work

Built something cool? Share it! Open an issue on the original repo to showcase your fork or derivative work.

Why These Are Archived

These are complete, stable reference implementations—not active products. They're archived because they're done: production-quality code demonstrating proven patterns. Perfect for forking, learning, or adapting for your own projects.