Economic Data Project

A personal data platform for U.S. economic and financial market data — Dagster, dbt, MotherDuck, DSPy. Queried over MCP.
An end-to-end data application that ingests, transforms, and analyzes economic and financial market data using modern open-source tools. The application combines traditional data engineering workflows with AI-powered analysis agents to provide insights into economic cycles, market trends, and asset allocation strategies.
Documentation
For detailed documentation, see the docs/ directory:
| Documentation | Description |
|---|---|
| Data Platform Overview | High-level architecture |
| Dagster Pipeline | Data orchestration documentation |
| dbt Models | SQL transformation documentation |
| GCP Deployment | Cloud deployment guide |
Tools and Technologies
Core Frameworks
- Dagster: Orchestration framework for data pipelines, asset management, schedules, and sensors
- dbt: SQL-based transformation framework for data modeling and analytics
- DSPy: Framework for building and optimizing AI agents with LLMs
- DuckDB/MotherDuck: Embedded analytical database with cloud sync capabilities
Supporting Technologies
- Polars: High-performance dataframe library
- Sling: Data replication tool for syncing between databases
- BigQuery: Cloud data warehouse for replication target
- Python: Primary programming language (3.10-3.13)
Data Sources
All data is sourced from publicly available APIs:
Economic Data
- Federal Reserve Economic Data (FRED): Comprehensive economic indicators including GDP, inflation, employment, housing, trade, and financial conditions
- Bureau of Labor Statistics (BLS): Employment and labor market data
- Census Bureau: Population and demographic data
Market Data
- Market Stack API: Stock market data for major indices, sectors, global markets, currencies, fixed income, and commodities
- Treasury Yields: U.S. Treasury bond yield curve data
- Realtor.com: Housing market inventory and pricing data
Project Structure
economic-data-project/
├── macro_agents/ # Main Dagster project
│ ├── src/macro_agents/
│ │ ├── definitions.py # Central Dagster definitions
│ │ └── defs/
│ │ ├── domains/ # Domain assets + automation
│ │ │ ├── markets.py # MarketStack ETFs/commodities/company prices
│ │ │ ├── calendars.py # Economic/earnings calendars
│ │ │ ├── macro.py # FRED + Treasury + FOMC minutes
│ │ │ ├── housing.py # Realtor/housing data
│ │ │ ├── social.py # Reddit ingestion
│ │ │ ├── sec.py # SEC filings
│ │ │ └── fomc_transcripts.py
│ │ ├── transformation/ # Data transformation
│ │ │ └── financial_condition_index.py
│ │ ├── agents/ # AI analysis agents (DSPy)
│ │ ├── resources/ # Dagster resources
│ │ ├── replication/ # Data replication
│ │ ├── telemetry/ # Monitoring assets
│ │ ├── sensors/ # Event-driven triggers
│ │ ├── shared_resources.py # Common resources
│ │ └── asset_failure_sensor.py
│ └── tests/ # Test suite
├── dbt_project/ # dbt transformation project
│ ├── dbt_project.yml # dbt configuration
│ ├── profiles.yml # Connection profiles
│ └── models/
│ ├── staging/ # Staging layer models
│ ├── government/ # Government data models
│ ├── markets/ # Market data models
│ ├── commodities/ # Commodity data models
│ └── analysis/ # Analysis layer models
├── dagster_cloud.yaml # Dagster Cloud deployment config
└── makefile # Build and automation commands
Data Flow
1. Ingestion Layer (Dagster Assets)
Raw data assets pull from external APIs and store in DuckDB/MotherDuck:
- FRED Data: Partitioned by 70+ economic series codes, scheduled weekly
- Market Data: Partitioned by ticker and month for indices, sectors, commodities, currencies
- Treasury Yields: Daily yield curve data
- Housing Data: Inventory and pricing data from BLS and Realtor.com
2. Transformation Layer (dbt Models)
SQL-based transformations organized in layers:
- Staging: Standardizes and cleans raw data (stg_* models)
- Government: Aggregates economic indicators (fred_*, housing_* models)
- Markets: Analyzes market returns and summaries (*_summary, *_analysis_return models)
- Commodities: Commodity-specific analysis
- Analysis: Combines economic and market data for advanced analytics (base_historical_analysis, leading_econ_return_indicator)
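As a hypothetical illustration of what the staging layer does (the model name, source, and columns below are invented, not the repository's actual schema), a stg_* model typically just renames, casts, and filters raw columns:

```sql
-- Hypothetical staging model, e.g. models/staging/stg_fred_series.sql.
-- Source and column names are illustrative only.
select
    series_id,
    cast(observation_date as date)  as observation_date,
    cast(value as double)           as observation_value
from {{ source('raw', 'fred_observations') }}
where value is not null
```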
3. AI Analysis Layer (DSPy Agents)
AI-powered analysis agents that operate on transformed data:
- Economic Cycle Analysis: Identifies economic phases (expansion, peak, contraction, trough)
- Asset Allocation: Generates portfolio recommendations based on economic conditions
- Backtesting: Tests investment strategies against historical data
- Model Evaluation: Continuous improvement of AI models using DSPy metrics
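The DSPy agents themselves need an LLM backend, but the economic-cycle idea can be sketched with a deliberately simplified heuristic. This rule-of-thumb classifier is invented for illustration and is not the project's actual agent logic:

```python
def classify_cycle_phase(gdp_growth: float, prior_growth: float) -> str:
    """Toy four-phase classifier: sign of growth plus its direction of change.

    The real agents in defs/agents/ are DSPy programs over many indicators;
    this only sketches the expansion / peak / contraction / trough framing.
    """
    if gdp_growth >= 0:
        # Positive growth: accelerating = expansion, decelerating = peak.
        return "expansion" if gdp_growth >= prior_growth else "peak"
    # Negative growth: worsening = contraction, improving = trough.
    return "contraction" if gdp_growth <= prior_growth else "trough"
```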
4. Replication Layer (Sling)
Replicates transformed data from MotherDuck to BigQuery for downstream consumption.
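A Sling replication config along these lines would express that flow. The connection names and stream pattern below are placeholders, not the repository's actual configuration:

```yaml
# Hypothetical Sling replication config (names are placeholders).
source: MOTHERDUCK
target: BIGQUERY
defaults:
  mode: full-refresh
streams:
  analysis.*:
```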
Environment Variables
Create a .env file in the macro_agents directory with the following variables:
Required
- MODEL_NAME: OpenAI model to use (e.g., gpt-4-turbo-preview, gpt-3.5-turbo)
- OPENAI_API_KEY: OpenAI API authentication key
- FRED_API_KEY: Federal Reserve Economic Data API key
- MARKETSTACK_API_KEY: Market Stack API key
- MOTHERDUCK_TOKEN: MotherDuck authentication token (for cloud sync)
Optional (Development)
- ENVIRONMENT: Environment setting (dev or prod, defaults to dev)
- DBT_TARGET: dbt target environment (local, dev, or prod, defaults to local)
- DBT_PROJECT_DIR: Path to dbt project directory (auto-detected if not set)
Optional (Production/Replication)
- MOTHERDUCK_DATABASE: MotherDuck database name
- MOTHERDUCK_PROD_SCHEMA: MotherDuck production schema
- BIGQUERY_PROJECT_ID: Google Cloud project ID for BigQuery
- BIGQUERY_LOCATION: BigQuery dataset location
- BIGQUERY_DATASET: BigQuery dataset name
- GOOGLE_APPLICATION_CREDENTIALS: Path to Google Cloud service account credentials JSON file
- CENSUS_API_KEY: Census Bureau API key (if using Census data)
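Putting the required variables together, a minimal macro_agents/.env might look like this (all values below are placeholders):

```
MODEL_NAME=gpt-4-turbo-preview
OPENAI_API_KEY=your-openai-key
FRED_API_KEY=your-fred-key
MARKETSTACK_API_KEY=your-marketstack-key
MOTHERDUCK_TOKEN=your-motherduck-token
```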
Quick Start
Prerequisites
- Python 3.10-3.13 (3.11 recommended)
- uv (recommended) or pip for package management
- DuckDB and MotherDuck account (for cloud sync)
- API keys for data sources
- OpenAI API key for AI agents
Installation
- Clone the repository
git clone <repository-url>
cd economic-data-project
- Install dependencies
cd macro_agents
uv sync # or pip install -e .[dev]
- Install dbt packages
cd ../dbt_project
dbt deps
- Set up environment variables
Create a .env file in the macro_agents directory with required variables (see the Environment Variables section above).
- Validate setup
# Test Dagster definitions
cd macro_agents
dg check defs
# Test dbt models
cd ../dbt_project
dbt compile
dbt parse
Running Locally
Start Dagster UI:
cd macro_agents
dagster dev
Navigate to http://localhost:3000 to view and materialize assets.
Run dbt models manually:
cd dbt_project
dbt run # Run all models
dbt run --select staging.* # Run specific layer
Run tests:
cd macro_agents
pytest tests/ -v
# Or use the makefile
make test
Deployment
The project is configured for deployment on Dagster Cloud using the dagster_cloud.yaml configuration file. The deployment builds from the macro_agents directory and uses macro_agents.definitions as the entry point.
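For orientation, a dagster_cloud.yaml matching that description might look like the fragment below. The repo's actual file may differ, so treat this as a sketch:

```yaml
# Hypothetical dagster_cloud.yaml sketch matching the description above.
locations:
  - location_name: macro_agents
    code_source:
      module_name: macro_agents.definitions
    build:
      directory: ./macro_agents
```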
Development
Common Commands
# Run tests
make test
# Lint Python code
make ruff
# Lint SQL code
make lint
# Fix SQL linting issues
make fix
# Run pre-PR checks (linting, type checking, tests, security scans)
make pre-pr
First Run Workflow
- Materialize ingestion assets - Start with FRED data or Market Stack data via Dagster UI
- Run dbt transformations - Transform raw data through staging → marts → analysis layers (automated via eager assets)
- Run analysis agents - Execute DSPy agents on transformed data via Dagster UI
- View results - Check DuckDB/MotherDuck for analysis outputs
Automation
- Ingestion Assets: Scheduled weekly on Mondays at midnight (FRED data)
- dbt Models: Eager automation (run automatically when upstream data changes)
- Analysis Agents: On-demand or scheduled via Dagster jobs
- Replication: Monthly partitioned replication to BigQuery via Sling
Testing
Test suite located in macro_agents/tests/:
- Unit tests for analysis agents
- Integration tests for end-to-end workflows
- Tests for Dagster asset descriptions
- Tests for dbt model descriptions
- Resource and schedule tests
Run tests using make test or pytest tests/ -v from the macro_agents directory.