Understanding Context Aggregation

Context aggregation is SignalPilot’s ability to gather and synthesize information from across your entire data stack—databases, notebooks, collaboration tools, and documentation—into a unified context for AI-powered investigations.

Why Context Matters

Traditional AI tools require you to manually copy-paste context:
❌ Without Context Aggregation:
1. Copy table schema from Snowflake
2. Paste into ChatGPT
3. Ask question
4. Get answer with hallucinated column names
5. Fix and retry
6. Repeat for each new question
SignalPilot automatically aggregates context:
✅ With Context Aggregation:
1. Ask question
2. SignalPilot fetches relevant schemas, lineage, docs
3. Get accurate answer that references real tables

The MCP Architecture

SignalPilot uses the Model Context Protocol (MCP) to connect to your data stack. MCP provides a standardized way for AI systems to access external context sources.
┌─────────────────────────────────────────────────────────────────┐
│                     SIGNALPILOT CORE                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│    ┌─────────────────────────────────────────────────────────┐  │
│    │              INTERNAL MCP SIDECAR                        │  │
│    │  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐        │  │
│    │  │ Kernel  │ │ Schema  │ │  Query  │ │  File   │        │  │
│    │  │ Context │ │ Explorer│ │ History │ │ System  │        │  │
│    │  └─────────┘ └─────────┘ └─────────┘ └─────────┘        │  │
│    └─────────────────────────────────────────────────────────┘  │
│                              │                                   │
│                              ▼                                   │
│    ┌─────────────────────────────────────────────────────────┐  │
│    │              EXTERNAL MCP SERVERS                        │  │
│    │  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐        │  │
│    │  │   dbt   │ │  Slack  │ │  Jira   │ │ Notion  │        │  │
│    │  │ Lineage │ │ Threads │ │ Tickets │ │  Docs   │        │  │
│    │  └─────────┘ └─────────┘ └─────────┘ └─────────┘        │  │
│    └─────────────────────────────────────────────────────────┘  │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Internal Context Sources

The internal MCP sidecar provides access to local context that doesn’t require external connections:

Kernel Context

Access to the current Jupyter kernel state:
| Context Type | What It Provides | Example Use |
| --- | --- | --- |
| Variables | Active dataframes, values | “What columns are in df?” |
| Execution History | Recently run code | “What did I just calculate?” |
| Outputs | Cell outputs, plots | “Explain this visualization” |
| Errors | Stack traces, exceptions | “Why did this fail?” |
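To make the Variables row concrete, here is a minimal sketch of how a kernel-context provider could summarize a live namespace. The function name and the example namespace are illustrative, not part of any real SignalPilot API.

```python
# Simplified sketch: summarize the variables in a kernel namespace so the
# AI knows which dataframes and values exist, without shipping their data.
def summarize_namespace(namespace):
    """Return a name -> type summary of user-defined variables."""
    return {
        name: type(value).__name__
        for name, value in namespace.items()
        if not name.startswith("_")  # skip kernel-internal names
    }

# Example: what the AI would see for a small working namespace
workspace = {"df_orders": [{"order_id": 1}], "revenue": 1234.5, "_hidden": None}
print(summarize_namespace(workspace))  # {'df_orders': 'list', 'revenue': 'float'}
```

A real provider would also attach per-variable detail (e.g. dataframe column names) rather than just the type, but the shape of the context is the same.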

Schema Explorer

Direct database introspection:
| Context Type | What It Provides | Example Use |
| --- | --- | --- |
| Tables | List of available tables | “What tables contain user data?” |
| Columns | Column names, types, nullability | “What’s the schema of orders?” |
| Relationships | Foreign keys, joins | “How are users and orders related?” |
| Indexes | Performance hints | “Is this query optimized?” |
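The kind of introspection the Schema Explorer performs can be sketched with an in-memory SQLite database standing in for your warehouse (a real deployment would use your warehouse’s own information schema):

```python
# Sketch of schema introspection: list a table's columns, types, and
# nullability directly from the database. SQLite is used only as a
# self-contained stand-in.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, total REAL)"
)

def describe_table(conn, table):
    """Return (column_name, declared_type, nullable) tuples for a table."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # PRAGMA table_info columns: cid, name, type, notnull, dflt_value, pk
    return [(name, ctype, not notnull) for _, name, ctype, notnull, _, _ in rows]

print(describe_table(conn, "orders"))
# [('id', 'INTEGER', True), ('user_id', 'INTEGER', False), ('total', 'REAL', True)]
```

Answers like “What’s the schema of orders?” are grounded in exactly this kind of query result, which is why they reference real columns instead of hallucinated ones.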

Query History

Recent database activity:
| Context Type | What It Provides | Example Use |
| --- | --- | --- |
| Past Queries | SQL that was previously run | “Run that revenue query again” |
| Query Results | Cached result summaries | “What did we find yesterday?” |
| Error History | Failed queries and reasons | “Why did this query fail before?” |

File System

Local file access:
| Context Type | What It Provides | Example Use |
| --- | --- | --- |
| Notebooks | .ipynb files in workspace | “What analysis is in notebook X?” |
| Data Files | CSVs, Parquet, JSON | “Load the customer_data.csv” |
| Configs | Connection strings, settings | “What database am I connected to?” |
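Workspace discovery of this sort can be sketched in a few lines; the file names and patterns below are illustrative, created in a temporary directory purely so the example is self-contained:

```python
# Sketch of the File System context source: find notebooks and data files
# in a workspace directory so the AI can reference them by name.
import tempfile
from pathlib import Path

def discover_context_files(workspace, patterns=("*.ipynb", "*.csv", "*.parquet", "*.json")):
    """Return workspace file names grouped by suffix."""
    found = {}
    for pattern in patterns:
        for path in sorted(workspace.rglob(pattern)):
            found.setdefault(path.suffix, []).append(path.name)
    return found

workspace = Path(tempfile.mkdtemp())
(workspace / "analysis.ipynb").touch()
(workspace / "customer_data.csv").touch()
print(discover_context_files(workspace))
# {'.ipynb': ['analysis.ipynb'], '.csv': ['customer_data.csv']}
```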

External MCP Servers

External MCP servers connect SignalPilot to your broader data ecosystem:

dbt Integration

What it provides:
  • Model lineage (upstream/downstream dependencies)
  • Model documentation and descriptions
  • Test results and data quality status
  • Column-level lineage
Example queries:
  • “What models feed into the revenue dashboard?”
  • “Show me the lineage for customer_lifetime_value”
  • “Are there any failing dbt tests?”

Setup Guide

Configure dbt Cloud or dbt Core integration
Slack Integration

What it provides:
  • Channel discussions about data and metrics
  • Data team decisions and context
  • Recent incident threads
  • @mentions of specific metrics or tables
Example queries:
  • “What did the team discuss about conversion last week?”
  • “Are there any known issues with the orders table?”
  • “Who owns the attribution model?”

Setup Guide

Configure Slack workspace integration
Jira Integration

What it provides:
  • Tickets related to data issues
  • Deployment history affecting data
  • Sprint context for data projects
  • Issue status and assignments
Example queries:
  • “Were there any deployments to the pipeline last week?”
  • “Is there a ticket for the missing data issue?”
  • “What’s the status of the ETL fix?”

Setup Guide

Configure Jira project integration
Notion Integration

What it provides:
  • Data dictionaries and glossaries
  • Design documents and specifications
  • Runbooks and troubleshooting guides
  • Meeting notes and decisions
Example queries:
  • “What’s the definition of ‘active user’ in our docs?”
  • “Is there a runbook for pipeline failures?”
  • “What decisions were made in the data review?”

Parallel Context Resolution

When you ask a question, SignalPilot resolves context from all relevant sources simultaneously:
Question: "Why did conversion drop last week?"

Context Resolution (parallel):
├─ Kernel: Check for existing conversion analysis
├─ Schema: Fetch conversion_events table structure
├─ Query History: Find recent conversion queries
├─ dbt: Get model lineage for conversion metrics
├─ Slack: Search for "conversion" in data channels
└─ Jira: Look for related deployment tickets

Total time: ~2 seconds (not 6x sequential)
Parallel resolution means adding more context sources doesn’t significantly increase latency. The total time is determined by the slowest source, not the sum of all sources.
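This fan-out can be sketched with `asyncio.gather`, using sleeps as stand-ins for real MCP calls. The source names mirror the diagram above; the latencies are invented for illustration:

```python
# Sketch of parallel context resolution: total latency tracks the slowest
# source, not the sum of all sources.
import asyncio
import time

async def fetch(source, latency):
    # Stand-in for a real MCP request to one context source
    await asyncio.sleep(latency)
    return f"{source}: context fetched"

async def resolve_all():
    sources = {"kernel": 0.05, "schema": 0.10, "query_history": 0.05,
               "dbt": 0.20, "slack": 0.15, "jira": 0.10}
    # gather launches every fetch concurrently and awaits them together
    return await asyncio.gather(*(fetch(s, t) for s, t in sources.items()))

start = time.perf_counter()
results = asyncio.run(resolve_all())
elapsed = time.perf_counter() - start
print(f"{len(results)} sources in {elapsed:.2f}s")  # ~0.2s, not 0.65s
```

With sequential awaits the same six fetches would take the sum of the latencies; concurrent resolution takes roughly the maximum, which is why adding sources barely moves the total.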

Context Prioritization

Not all context is equally relevant. SignalPilot uses semantic understanding to prioritize:
| Priority | Context Type | When Used |
| --- | --- | --- |
| High | Direct mentions (table names, metrics) | Always included |
| High | Recent query history | Always included |
| Medium | Related schemas | Included if space allows |
| Medium | dbt lineage | Included for data questions |
| Low | General Slack discussions | Summarized if relevant |
| Low | Old documentation | Referenced but not full text |
Context prioritization ensures the AI focuses on the most relevant information without being overwhelmed by tangential details.
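One simple way to realize this table is greedy packing under a token budget: include high-priority items unconditionally-sized first, then fill the remaining space with lower-priority sources. The priorities and token counts below are invented for illustration, not SignalPilot’s actual weights:

```python
# Sketch of priority-based context packing under a token budget.
def pack_context(items, budget):
    """items: (name, priority, tokens); lower priority number = more important."""
    packed, used = [], 0
    for name, priority, tokens in sorted(items, key=lambda i: i[1]):
        if used + tokens <= budget:  # skip anything that would overflow
            packed.append(name)
            used += tokens
    return packed

items = [
    ("orders schema", 1, 300),   # direct table mention: high priority
    ("recent queries", 1, 400),  # always included
    ("dbt lineage", 2, 500),
    ("slack summary", 3, 800),
    ("old docs", 3, 900),
]
print(pack_context(items, budget=1500))
# ['orders schema', 'recent queries', 'dbt lineage']
```

The low-priority Slack and documentation items are dropped (or, per the table, summarized) because the budget is exhausted, which keeps the AI focused on the most relevant information.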

Context Security

All context aggregation respects your security boundaries:

Read-Only Access

SignalPilot only reads context—it cannot modify external systems. Database connections are read-only. Slack integration is read-only. No write access to any external MCP source.
Local Credential Storage

Database credentials and API tokens are stored locally in your environment. They are never transmitted to SignalPilot servers. Connection strings are processed locally.

Metadata-Only by Default

Only metadata and schemas are transmitted by default. Actual data values stay local unless explicitly included in a query. You control what context is shared.

Audit Logging

All context resolutions are logged locally. You can see exactly what was fetched and when. Hooks can enforce additional access controls.
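As a sketch of what “connection strings are processed locally” can mean in practice, the snippet below extracts only shareable metadata (scheme, account, database) and leaves credentials out of the result. The connection string is a made-up example:

```python
# Sketch: derive shareable connection metadata locally; credentials in the
# connection string never appear in the output.
from urllib.parse import urlsplit

def connection_metadata(connection_string):
    """Return only the metadata needed for context resolution."""
    parts = urlsplit(connection_string)
    return {
        "scheme": parts.scheme,
        "account": parts.hostname,           # username/password are dropped
        "database": parts.path.lstrip("/"),
    }

meta = connection_metadata("snowflake://user:s3cret@acme-account/analytics")
print(meta)  # {'scheme': 'snowflake', 'account': 'acme-account', 'database': 'analytics'}
```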

Learn More: Security & Privacy

Complete security model documentation

Configuring Context Sources

Adding a Database

# In your notebook or config
signalpilot.add_connection(
    name="production",
    type="snowflake",
    connection_string="snowflake://user@account/database"
)

Adding an MCP Server

// In signalpilot.config.json
{
  "mcp_servers": {
    "dbt": {
      "type": "dbt-cloud",
      "account_id": "12345",
      "project_id": "67890"
    },
    "slack": {
      "type": "slack",
      "workspace": "your-workspace"
    }
  }
}
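A client reading this file would parse the JSON and check each server entry before connecting. The sketch below inlines the config from above as a string and assumes only one validation rule (every server needs a `type`); SignalPilot’s actual loader may check more:

```python
# Sketch of loading and validating signalpilot.config.json.
import json

CONFIG = """
{
  "mcp_servers": {
    "dbt": {"type": "dbt-cloud", "account_id": "12345", "project_id": "67890"},
    "slack": {"type": "slack", "workspace": "your-workspace"}
  }
}
"""

def load_mcp_servers(raw):
    config = json.loads(raw)
    servers = config.get("mcp_servers", {})
    for name, server in servers.items():
        if "type" not in server:  # minimal sanity check, assumed rule
            raise ValueError(f"MCP server '{name}' is missing a 'type'")
    return servers

servers = load_mcp_servers(CONFIG)
print(sorted(servers))  # ['dbt', 'slack']
```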

Best Practices

1. Start with Core Sources: begin with database and kernel context, and add external MCP sources as needed.

2. Use Descriptive Names: name your connections clearly (e.g., “production-snowflake”, “analytics-postgres”) for easy reference.

3. Review Context in Plans: when approving investigation plans, check that the right context sources are being used.

4. Add Context Incrementally: if an investigation misses context, mention it; SignalPilot learns to include similar context next time.