Economic Data Project

A personal data platform for U.S. economic and financial market data — Dagster, dbt, MotherDuck, DSPy. Queried over MCP.
An end-to-end data application that ingests, transforms, and analyzes economic and financial market data using modern open-source tools. The application combines traditional data engineering workflows with AI-powered analysis agents to provide insights into economic cycles, market trends, and asset allocation strategies.
Documentation
For detailed documentation, see the docs/ directory:
| Documentation | Description |
|---|---|
| Data Platform Overview | High-level architecture |
| Dagster Pipeline | Data orchestration documentation |
| dbt Models | SQL transformation documentation |
| GCP Deployment | Cloud deployment guide |
Tools and Technologies
Core Frameworks
- Dagster: Orchestration framework for data pipelines, asset management, schedules, and sensors
- dbt: SQL-based transformation framework for data modeling and analytics
- DSPy: Framework for building and optimizing AI agents with LLMs
- DuckDB/MotherDuck: Embedded analytical database with cloud sync capabilities
Supporting Technologies
- Polars: High-performance dataframe library
- Sling: Data replication tool for syncing between databases
- BigQuery: Cloud data warehouse for replication target
- Python: Primary programming language (3.10-3.13)
Data Sources
All data is sourced from publicly available APIs:
Economic Data
- Federal Reserve Economic Data (FRED): Comprehensive economic indicators including GDP, inflation, employment, housing, trade, and financial conditions
- Bureau of Labor Statistics (BLS): Employment and labor market data
- Census Bureau: Population and demographic data
Market Data
- Market Stack API: Stock market data for major indices, sectors, global markets, currencies, fixed income, and commodities
- Treasury Yields: U.S. Treasury bond yield curve data
- Realtor.com: Housing market inventory and pricing data
Project Structure
economic-data-project/
├── macro_agents/ # Main Dagster project
│ ├── src/macro_agents/
│ │ ├── definitions.py # Central Dagster definitions
│ │ └── defs/
│ │ ├── domains/ # Domain assets + automation
│ │ │ ├── markets.py # MarketStack ETFs/commodities/company prices
│ │ │ ├── calendars.py # Economic/earnings calendars
│ │ │ ├── macro.py # FRED + Treasury + FOMC minutes
│ │ │ ├── housing.py # Realtor/housing data
│ │ │ ├── social.py # Reddit ingestion
│ │ │ ├── sec.py # SEC filings
│ │ │ └── fomc_transcripts.py
│ │ ├── transformation/ # Data transformation
│ │ │ └── financial_condition_index.py
│ │ ├── agents/ # AI analysis agents (DSPy)
│ │ ├── resources/ # Dagster resources
│ │ ├── replication/ # Data replication
│ │ ├── telemetry/ # Monitoring assets
│ │ ├── sensors/ # Event-driven triggers
│ │ ├── shared_resources.py # Common resources
│ │ └── asset_failure_sensor.py
│ └── tests/ # Test suite
├── dbt_project/ # dbt transformation project
│ ├── dbt_project.yml # dbt configuration
│ ├── profiles.yml # Connection profiles
│ └── models/
│ ├── staging/ # Staging layer models
│ ├── government/ # Government data models
│ ├── markets/ # Market data models
│ ├── commodities/ # Commodity data models
│ └── analysis/ # Analysis layer models
├── dagster_cloud.yaml # Dagster Cloud deployment config
└── makefile # Build and automation commands
Data Flow
1. Ingestion Layer (Dagster Assets)
Raw data assets pull from external APIs and store in DuckDB/MotherDuck:
- FRED Data: Partitioned by 70+ economic series codes, scheduled weekly
- Market Data: Partitioned by ticker and month for indices, sectors, commodities, currencies
- Treasury Yields: Daily yield curve data
- Housing Data: Inventory and pricing data from BLS and Realtor.com
2. Transformation Layer (dbt Models)
SQL-based transformations organized in layers:
- Staging: Standardizes and cleans raw data (stg_* models)
- Government: Aggregates economic indicators (fred_*, housing_* models)
- Markets: Analyzes market returns and summaries (*_summary, *_analysis_return models)
- Commodities: Commodity-specific analysis
- Analysis: Combines economic and market data for advanced analytics (base_historical_analysis, leading_econ_return_indicator)
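As a hypothetical illustration of what the staging layer does (the model name, source, and columns below are invented, not the repository's actual schema), a stg_* model typically just renames, casts, and filters raw columns:

```sql
-- Hypothetical staging model, e.g. models/staging/stg_fred_series.sql.
-- Source and column names are illustrative only.
select
    series_id,
    cast(observation_date as date)  as observation_date,
    cast(value as double)           as observation_value
from {{ source('raw', 'fred_observations') }}
where value is not null
```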
3. AI Analysis Layer (DSPy Agents)
AI-powered analysis agents that operate on transformed data:
- Economic Cycle Analysis: Identifies economic phases (expansion, peak, contraction, trough)
- Asset Allocation: Generates portfolio recommendations based on economic conditions
- Backtesting: Tests investment strategies against historical data
- Model Evaluation: Continuous improvement of AI models using DSPy metrics
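The DSPy agents themselves need an LLM backend, but the economic-cycle idea can be sketched with a deliberately simplified heuristic. This rule-of-thumb classifier is invented for illustration and is not the project's actual agent logic:

```python
def classify_cycle_phase(gdp_growth: float, prior_growth: float) -> str:
    """Toy four-phase classifier: sign of growth plus its direction of change.

    The real agents in defs/agents/ are DSPy programs over many indicators;
    this only sketches the expansion / peak / contraction / trough framing.
    """
    if gdp_growth >= 0:
        # Positive growth: accelerating = expansion, decelerating = peak.
        return "expansion" if gdp_growth >= prior_growth else "peak"
    # Negative growth: worsening = contraction, improving = trough.
    return "contraction" if gdp_growth <= prior_growth else "trough"
```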
4. Replication Layer (Sling)
Replicates transformed data from MotherDuck to BigQuery for downstream consumption.
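A Sling replication config along these lines would express that flow. The connection names and stream pattern below are placeholders, not the repository's actual configuration:

```yaml
# Hypothetical Sling replication config (names are placeholders).
source: MOTHERDUCK
target: BIGQUERY
defaults:
  mode: full-refresh
streams:
  analysis.*:
```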
Environment Variables
Create a .env file in the macro_agents directory with the following variables:
Required
- MODEL_NAME: OpenAI model to use (e.g., gpt-4-turbo-preview, gpt-3.5-turbo)
- OPENAI_API_KEY: OpenAI API authentication key
- FRED_API_KEY: Federal Reserve Economic Data API key
- MARKETSTACK_API_KEY: Market Stack API key
- MOTHERDUCK_TOKEN: MotherDuck authentication token (for cloud sync)
Optional (Development)
- ENVIRONMENT: Environment setting (dev or prod, defaults to dev)
- DBT_TARGET: dbt target environment (local, dev, or prod, defaults to local)
- DBT_PROJECT_DIR: Path to dbt project directory (auto-detected if not set)
Optional (Production/Replication)
- MOTHERDUCK_DATABASE: MotherDuck database name
- MOTHERDUCK_PROD_SCHEMA: MotherDuck production schema
- BIGQUERY_PROJECT_ID: Google Cloud project ID for BigQuery
- BIGQUERY_LOCATION: BigQuery dataset location
- BIGQUERY_DATASET: BigQuery dataset name
- GOOGLE_APPLICATION_CREDENTIALS: Path to Google Cloud service account credentials JSON file
- CENSUS_API_KEY: Census Bureau API key (if using Census data)
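Putting the required variables together, a minimal macro_agents/.env might look like this (all values below are placeholders):

```
MODEL_NAME=gpt-4-turbo-preview
OPENAI_API_KEY=your-openai-key
FRED_API_KEY=your-fred-key
MARKETSTACK_API_KEY=your-marketstack-key
MOTHERDUCK_TOKEN=your-motherduck-token
```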
Quick Start
Prerequisites
- Python 3.10-3.13 (3.11 recommended)
- uv (recommended) or pip for package management
- DuckDB and MotherDuck account (for cloud sync)
- API keys for data sources
- OpenAI API key for AI agents
Installation
- Clone the repository
git clone <repository-url>
cd economic-data-project
- Install dependencies
cd macro_agents
uv sync # or pip install -e .[dev]
- Install dbt packages
cd ../dbt_project
dbt deps
- Set up environment variables
Create a .env file in the macro_agents directory with required variables (see the Environment Variables section above).
- Validate setup
# Test Dagster definitions
cd macro_agents
dg check defs
# Test dbt models
cd ../dbt_project
dbt compile
dbt parse
Running Locally
Start Dagster UI:
cd macro_agents
dagster dev
Navigate to http://localhost:3000 to view and materialize assets.
Run dbt models manually:
cd dbt_project
dbt run # Run all models
dbt run --select staging.* # Run specific layer
Run tests:
cd macro_agents
pytest tests/ -v
# Or use the makefile
make test
Deployment
The project is configured for deployment on Dagster Cloud using the dagster_cloud.yaml configuration file. The deployment builds from the macro_agents directory and uses macro_agents.definitions as the entry point.
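For orientation, a dagster_cloud.yaml matching that description might look like the fragment below. The repo's actual file may differ, so treat this as a sketch:

```yaml
# Hypothetical dagster_cloud.yaml sketch matching the description above.
locations:
  - location_name: macro_agents
    code_source:
      module_name: macro_agents.definitions
    build:
      directory: ./macro_agents
```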
Development
Common Commands
# Run tests
make test
# Lint Python code
make ruff
# Lint SQL code
make lint
# Fix SQL linting issues
make fix
# Run pre-PR checks (linting, type checking, tests, security scans)
make pre-pr
First Run Workflow
- Materialize ingestion assets - Start with FRED data or Market Stack data via Dagster UI
- Run dbt transformations - Transform raw data through staging → marts → analysis layers (automated via eager assets)
- Run analysis agents - Execute DSPy agents on transformed data via Dagster UI
- View results - Check DuckDB/MotherDuck for analysis outputs
Automation
- Ingestion Assets: Scheduled weekly on Mondays at midnight (FRED data)
- dbt Models: Eager automation (run automatically when upstream data changes)
- Analysis Agents: On-demand or scheduled via Dagster jobs
- Replication: Monthly partitioned replication to BigQuery via Sling
Testing
Test suite located in macro_agents/tests/:
- Unit tests for analysis agents
- Integration tests for end-to-end workflows
- Tests for Dagster asset descriptions
- Tests for dbt model descriptions
- Resource and schedule tests
Run tests using make test or pytest tests/ -v from the macro_agents directory.