Open Source Reference Implementations: Complete, production-quality codebases demonstrating local-first data and AI patterns. MIT Licensed • Fork & Build Your Own

Production-Ready Examples for Building Local-First Data Tools

Five complete, working implementations showing how to build data pipelines, ML-in-SQL, semantic layers, and AI analytics—without cloud dependencies.

📦

Complete Codebases

Not tutorials or docs—actual production-quality code you can run, study, and fork.

🔗

Working Examples

See how DLT, dbt, DuckDB, Rust extensions, and MCP integration actually fit together.

Skip Months of R&D

Proven patterns for local-first architecture, tested and documented. Just fork and adapt.

Open source • MIT Licensed • Archived as reference implementations (Nov 2025)

Who Should Use These?

Whether you're building similar tools or learning modern data engineering, these implementations save you months of research.

Building Local-First Tools?

Stop piecing together scattered docs. See complete working examples of:

  • How to structure a data pipeline with DLT + dbt + DuckDB
  • Building Rust extensions for DuckDB (ML/AI in SQL)
  • MCP server integration for AI assistants
  • Statistical rigor in AI-powered analytics

Learning Modern Data Stack?

Skip tutorials. Learn from production-quality code that shows:

  • Real architecture patterns, not toy examples
  • How components actually connect and work together
  • Error handling, testing, and deployment practices
  • Why certain tech choices were made (comments explain decisions)

How The Ecosystem Fits Together

Each project builds on the foundation to create a complete local-first analytics platform.

1

Foundation: SBDK.dev

The core framework providing local-first data pipelines with DLT (ingestion), dbt (transformation), and DuckDB (analytics). Everything else builds on this foundation.

2

Intelligence: Mallard (local-inference)

A DuckDB extension adding ML/AI capabilities. Run zero-shot predictions, generate embeddings, and get feature importance—all in SQL, no separate ML infrastructure needed.

3

Visualization: Semantic Tracer

Visualizes dbt semantic layers with interactive lineage graphs. Understand how your metrics, dimensions, and entities connect. Built with Tauri and React Flow.

4

Conversational: Local AI Analyst

Ask questions in natural language and get answers based on real query results. An execution-first approach prevents AI fabrication, and every answer comes with confidence intervals and significance testing.

5

Integration: knowDB

Connects everything to AI assistants via MCP (Model Context Protocol). Query your data through Claude Desktop or ChatGPT Desktop with automatic dbt model syncing and semantic layer integration.

The Result

A complete stack for building local-first analytics tools. Start with raw data, transform it, analyze it with ML, visualize relationships, and query it conversationally—all without cloud dependencies.

Five Projects, One Ecosystem

Each project is a complete, production-quality reference implementation. Fork any or all to build your own local-first data tools.

SBDK.dev (Sandbox Development Kit)

A developer sandbox framework for local-first data pipeline development using DLT, DuckDB, and dbt. It provides a complete local environment for prototyping, learning, and developing data solutions before deploying them to production systems.

Key Features

  • 11x Faster Installation: Lightning-fast setup with `uv`.
  • 100% Local: No cloud dependencies, no complex setup.
  • Intelligent Guided UI: A clean, intuitive interface with actionable options.
  • Hot Reload: Automatic re-runs when files change for iterative development.

Mallard (local-inference)

ARCHIVED

A DuckDB extension bringing ML/AI capabilities to local-first databases, a "Snowflake Cortex for Local-First Databases." It lets you run powerful, zero-shot tabular predictions directly in your database with simple SQL.

Key Features

  • Zero-Shot Predictions: Use powerful foundation models for classification and regression without training.
  • Simple SQL Interface: All functionality is exposed through declarative SQL UDFs.
  • High-Performance & Local-First: Built in Rust as a DuckDB extension.
  • Embeddings & Explainability: Generate dense vector embeddings and get feature importance explanations.

Semantic Tracer

ARCHIVED

A local-first application for visualizing and exploring dbt semantic layers. It connects directly to your dbt project and Snowflake account to provide a real-time, interactive lineage graph of your metrics, dimensions, and entities.

Key Features

  • Local-First: Your data and semantic models never leave your machine.
  • dbt Semantic Layer Integration: Connects seamlessly to your `semantic_models.yml` file.
  • Interactive Lineage Graph: Utilizes React Flow to create a dynamic and explorable graph.
  • Tauri Backend: A lightweight Rust backend provides high performance and a secure application shell.
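The `semantic_models.yml` file Semantic Tracer reads follows dbt's standard semantic model spec; a minimal example is sketched below (the model and field names are illustrative, not from any of these projects):

```yaml
semantic_models:
  - name: orders
    model: ref('stg_orders')
    entities:
      - name: order_id
        type: primary
    dimensions:
      - name: ordered_at
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: order_count
        agg: count
        expr: order_id
```

The lineage graph is built from exactly these relationships: entities link semantic models together, while dimensions and measures hang off each model as leaf nodes.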

Local AI Analyst

ARCHIVED

An AI-powered data analyst with a semantic layer, statistical rigor, and natural language insights. It allows you to ask questions in natural language and get answers based on real query results, not AI guesses.

Key Features

  • Natural Language Queries: Ask questions like "What's our conversion rate by plan type?"
  • Statistical Rigor: Automatic significance testing, confidence intervals, and sample size validation.
  • Execution-First: Prevents AI fabrication by building, executing, and then annotating results.
  • Multi-Query Workflows: Built-in analytical workflows for comprehensive analysis.
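The statistical checks described above can be sketched in plain Python. This is a generic two-proportion z-test with normal-approximation confidence intervals, not the Local AI Analyst's actual implementation:

```python
import math

def two_proportion_test(successes_a, n_a, successes_b, n_b, z=1.96):
    """Compare two conversion rates with a z-test and 95% CIs."""
    p_a, p_b = successes_a / n_a, successes_b / n_b

    # Normal-approximation confidence interval for a single rate.
    def ci(p, n):
        half = z * math.sqrt(p * (1 - p) / n)
        return (p - half, p + half)

    # Pooled standard error for the difference in proportions.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z_stat = (p_a - p_b) / se
    significant = abs(z_stat) > z  # two-sided test at the 5% level

    return {"rate_a": p_a, "rate_b": p_b, "ci_a": ci(p_a, n_a),
            "ci_b": ci(p_b, n_b), "z": z_stat, "significant": significant}

# 10.0% vs 15.0% conversion on 1,000 users each:
result = two_proportion_test(100, 1000, 150, 1000)
print(round(result["z"], 2), result["significant"])  # -3.38 True
```

Running checks like these against real query results, rather than letting the model assert significance, is what the "execution-first" approach refers to.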

knowDB

ARCHIVED

A local-first agentic analytics platform that extends `sbdk-dev` to enable natural language queries against your data through AI assistants like Claude Desktop and ChatGPT Desktop via the Model Context Protocol (MCP).

Key Features

  • Multi-AI Support: Works with any MCP-compatible AI assistant.
  • dbt Integration: Sync dbt models to the semantic layer automatically.
  • Local-First: Runs entirely on your machine with DuckDB.
  • Open Source: MIT License - free for personal and commercial use.
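Claude Desktop discovers MCP servers through its `claude_desktop_config.json`. A hypothetical entry for knowDB might look like the following; the `command` and `args` here are illustrative, so check the project's README for the actual invocation:

```json
{
  "mcpServers": {
    "knowdb": {
      "command": "uvx",
      "args": ["knowdb", "serve"]
    }
  }
}
```

Once registered, the assistant can call the server's MCP tools to list dbt models, inspect the semantic layer, and run queries against the local DuckDB file.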

Ready to Build?

These projects are reference implementations showing how to build local-first data tools. Here's how to get started:

1

Pick a Project

Start with SBDK.dev for the foundation, or choose any project that matches your needs. Each works standalone or as part of the ecosystem.

2

Fork & Explore

Fork the repository, read the README, explore the code. Each project includes comprehensive documentation and examples.

# Example: SBDK.dev
git clone https://github.com/sbdk-dev/sbdk-dev
cd sbdk-dev
pip install -e .  # or, for faster installs: uv pip install -e .
3

Adapt & Extend

These are reference implementations—take what works, modify what doesn't, and build your own tools. All projects are MIT licensed for maximum flexibility.

4

Share Your Work

Built something cool? Share it! Open an issue on the original repo to showcase your fork or derivative work.

Why These Are Archived

These are complete, stable reference implementations—not active products. They're archived because they're done: production-quality code demonstrating proven patterns. Perfect for forking, learning, or adapting for your own projects.