SignalPilot is now #1 on Spider 2.0-DBT with 51.56.
Back

Benefits of Using SignalPilot Governed MCP to Build dbt Models

Luiz Fernando

Luiz Fernando

/5 min read
Benefits of Using SignalPilot Governed MCP to Build dbt Models

The Status Quo is Broken

Here's how most people build dbt models today:

  1. Open a SQL client (DBeaver, pgAdmin, DataGrip)
  2. Manually explore tables: \d table_name, SELECT * FROM ... LIMIT 10
  3. Switch to editor, write the model
  4. Switch to terminal, run dbt run
  5. Switch back to SQL client, check results
  6. Repeat

Four context switches per iteration. And if you're using an AI assistant? You're copy-pasting schemas and query results between windows.

What Changes with MCP

MCP (Model Context Protocol) lets AI assistants call tools directly — including database tools. SignalPilot's MCP server exposes governed database access:

list_tables          → Full schema with columns, PKs, FKs, row counts
describe_table       → Column-level detail with types and annotations
query_database       → Read-only SQL with automatic governance
schema_overview      → Database-level statistics

The AI assistant gets direct, structured access to your database. No copy-paste. No context switching.

But Direct Database Access is Terrifying

This is where governance matters. SignalPilot doesn't just proxy SQL — it enforces safety:

1. AST-Level Query Parsing

Every SQL query is parsed to an Abstract Syntax Tree before execution. DDL (CREATE, DROP, ALTER) and DML (INSERT, UPDATE, DELETE) are blocked at the parse level — not with regex, but with actual SQL grammar analysis.

2. Automatic LIMIT Injection

No query runs without a row limit. If you write SELECT * FROM gateway_health_events (31K rows), the governance layer injects LIMIT 1000. The AI gets enough data to understand patterns without accidentally pulling gigabytes.

3. Statement Stacking Detection

SELECT 1; DROP TABLE users; — blocked. The parser detects multiple statements and rejects the query before it reaches the database.

4. Full Audit Trail

Every query is logged with:

  • Who ran it (user ID)
  • Which tables were accessed
  • How many rows returned
  • Cost in USD
  • Whether it was blocked (and why)

5. Cost Governance

Session budgets cap total spend. An AI agent can't run runaway queries — the budget system tracks cumulative cost and blocks when limits are hit.

How This Played Out Building dbt Models

Schema Discovery: One Call, Full Picture

SignalPilot MCP: list_tables("SignalPilot-Ops")

→ public.gateway_orgs (12 rows): org_id*, byok_enabled, created_at
→ public.gateway_audit_logs (4K rows): id*, user_id, timestamp, event_type, ...
→ public.gateway_health_events (31K rows): id*, org_id, connection_name, ...
... (10 tables total)

The AI had the full schema in one structured response. No iterating through information_schema. No manual exploration.

Data Profiling: Governed Exploration

The AI ran exploratory queries to understand data patterns:

-- Discover health tiers
SELECT org_id, connection_name,
  ROUND(AVG(CASE WHEN success THEN 1.0 ELSE 0.0 END) * 100, 1) as uptime_pct
FROM gateway_health_events GROUP BY 1, 2 ORDER BY uptime_pct ASC

Each query was:

  • Automatically limited to 1000 rows
  • Logged to the audit trail
  • Tracked for cost
  • Read-only enforced

The AI discovered 5 natural health tiers, 3 AI agent types, industry-specific compliance patterns, and budget spending patterns — all through governed queries.

Type Discovery: Catching Data Quirks

One query revealed that gateway_api_keys.created_at stores epoch timestamps as VARCHAR, not DOUBLE PRECISION. The AI's first model failed:

function to_timestamp(character varying) does not exist

It ran a follow-up query through SignalPilot to check the actual type:

SELECT created_at, pg_typeof(created_at) FROM gateway_api_keys LIMIT 3
→ '1735689600' | character varying

Then fixed the model: to_timestamp(created_at::double precision). The governed query tool was the debugging tool.

Validation: Query the Output

After dbt run, the AI queried the marts through SignalPilot to validate:

SELECT org_id, security_score, pii_masking_mode, byok_enabled
FROM public_marts.security_posture

The same governance applies to model outputs. Read-only, audited, limited.

The Compound Benefits

For Individual Developers

  • Zero context switching — schema discovery, profiling, model building, validation all in one flow
  • Faster iteration — AI writes the model AND validates it, catching type mismatches immediately
  • Built-in safety — explore production-like databases without fear

For Teams

  • Audit trail — every query the AI ran is logged. You can review what it explored, what it cost
  • Agent attribution — track which AI tool (claude-code, cursor, dbt-agent) generated which queries
  • Cost governance — budget caps prevent runaway exploration. Our 12-org project cost fractions of a cent

For Compliance

  • Read-only by default — no model building process can accidentally modify source data
  • PII masking — sensitive columns can be masked before the AI sees them
  • BYOK encryption — credentials never leave your infrastructure

The Numbers from Our Build

MetricValue
Schema discovery to full understanding2 MCP calls
Governed SQL queries for profiling10
Total dbt models built16
Tests passing25/25
Every query audited with agent attributionYes
Automatic LIMIT injection on all queriesYes

Setting It Up

1. Configure SignalPilot MCP in Your Coding Agent

Note: SignalPilot works with any MCP-compatible client — Claude Code, Cursor, Windsurf, Codex, Cline, or any agent that speaks the protocol.

claude mcp add --transport http signalpilot https://gateway.signalpilot.ai/mcp

2. Connect Your Database

Register your database connection in SignalPilot. Postgres, Snowflake, BigQuery, Redshift, ClickHouse, and DuckDB are supported.

3. Start Building

Tell your coding agent what you want to build. It will use SignalPilot's MCP tools to explore your schema, profile your data, and build dbt models — all governed.

"I want to build a dbt project against the SignalPilot-Ops database.
Start by exploring the schema and building staging models."

The combination of AI code generation and governed database access isn't just convenient — it's the only safe way to let AI agents build analytics infrastructure against real databases.

Built with Claude Code + SignalPilot MCP.

mcpdbtgovernancedatabaseai-agents

Try SignalPilot today

Install in under a minute. Open source. No credit card.

Get Started Free