Understanding Context Aggregation

Context aggregation is SignalPilot’s ability to gather and synthesize information from across your entire data stack—databases, notebooks, collaboration tools, and documentation—into a unified context for AI-powered investigations.

Why Context Matters

Traditional AI tools require you to manually copy-paste context:
❌ Without Context Aggregation:
1. Copy table schema from Snowflake
2. Paste into ChatGPT
3. Ask question
4. Get answer with hallucinated column names
5. Fix and retry
6. Repeat for each new question
SignalPilot automatically aggregates context:
✅ With Context Aggregation:
1. Ask question
2. SignalPilot fetches relevant schemas, lineage, docs
3. Get accurate answer that references real tables

The MCP Architecture

SignalPilot uses the Model Context Protocol (MCP) to connect to your data stack. MCP provides a standardized way for AI systems to access external context sources.
┌─────────────────────────────────────────────────────────────────┐
│                     SIGNALPILOT CORE                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│    ┌─────────────────────────────────────────────────────────┐  │
│    │              INTERNAL MCP SIDECAR                        │  │
│    │  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐        │  │
│    │  │ Kernel  │ │ Schema  │ │  Query  │ │  File   │        │  │
│    │  │ Context │ │ Explorer│ │ History │ │ System  │        │  │
│    │  └─────────┘ └─────────┘ └─────────┘ └─────────┘        │  │
│    └─────────────────────────────────────────────────────────┘  │
│                              │                                   │
│                              ▼                                   │
│    ┌─────────────────────────────────────────────────────────┐  │
│    │              EXTERNAL MCP SERVERS                        │  │
│    │  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐        │  │
│    │  │   dbt   │ │  Slack  │ │  Jira   │ │ Notion  │        │  │
│    │  │ Lineage │ │ Threads │ │ Tickets │ │  Docs   │        │  │
│    │  └─────────┘ └─────────┘ └─────────┘ └─────────┘        │  │
│    └─────────────────────────────────────────────────────────┘  │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Internal Context Sources

The internal MCP sidecar provides access to local context that doesn’t require external connections:

Kernel Context

Access to the current Jupyter kernel state:
| Context Type | What It Provides | Example Use |
| --- | --- | --- |
| Variables | Active dataframes, values | “What columns are in df?” |
| Execution History | Recently run code | “What did I just calculate?” |
| Outputs | Cell outputs, plots | “Explain this visualization” |
| Errors | Stack traces, exceptions | “Why did this fail?” |
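To make the Variables row concrete, here is a minimal sketch of how a kernel-context provider could summarize a live namespace. The function name and the example namespace are illustrative, not part of any real SignalPilot API.

```python
# Simplified sketch: summarize the variables in a kernel namespace so the
# AI knows which dataframes and values exist, without shipping their data.
def summarize_namespace(namespace):
    """Return a name -> type summary of user-defined variables."""
    return {
        name: type(value).__name__
        for name, value in namespace.items()
        if not name.startswith("_")  # skip kernel-internal names
    }

# Example: what the AI would see for a small working namespace
workspace = {"df_orders": [{"order_id": 1}], "revenue": 1234.5, "_hidden": None}
print(summarize_namespace(workspace))  # {'df_orders': 'list', 'revenue': 'float'}
```

A real provider would also attach per-variable detail (e.g. dataframe column names) rather than just the type, but the shape of the context is the same.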

Schema Explorer

Direct database introspection:
| Context Type | What It Provides | Example Use |
| --- | --- | --- |
| Tables | List of available tables | “What tables contain user data?” |
| Columns | Column names, types, nullability | “What’s the schema of orders?” |
| Relationships | Foreign keys, joins | “How are users and orders related?” |
| Indexes | Performance hints | “Is this query optimized?” |
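The kind of introspection the Schema Explorer performs can be sketched with an in-memory SQLite database standing in for your warehouse (a real deployment would use your warehouse’s own information schema):

```python
# Sketch of schema introspection: list a table's columns, types, and
# nullability directly from the database. SQLite is used only as a
# self-contained stand-in.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, total REAL)"
)

def describe_table(conn, table):
    """Return (column_name, declared_type, nullable) tuples for a table."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # PRAGMA table_info columns: cid, name, type, notnull, dflt_value, pk
    return [(name, ctype, not notnull) for _, name, ctype, notnull, _, _ in rows]

print(describe_table(conn, "orders"))
# [('id', 'INTEGER', True), ('user_id', 'INTEGER', False), ('total', 'REAL', True)]
```

Answers like “What’s the schema of orders?” are grounded in exactly this kind of query result, which is why they reference real columns instead of hallucinated ones.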

Query History

Recent database activity:
| Context Type | What It Provides | Example Use |
| --- | --- | --- |
| Past Queries | SQL that was previously run | “Run that revenue query again” |
| Query Results | Cached result summaries | “What did we find yesterday?” |
| Error History | Failed queries and reasons | “Why did this query fail before?” |

File System

Local file access:
| Context Type | What It Provides | Example Use |
| --- | --- | --- |
| Notebooks | .ipynb files in workspace | “What analysis is in notebook X?” |
| Data Files | CSVs, Parquet, JSON | “Load the customer_data.csv” |
| Configs | Connection strings, settings | “What database am I connected to?” |
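Workspace discovery of this sort can be sketched in a few lines; the file names and patterns below are illustrative, created in a temporary directory purely so the example is self-contained:

```python
# Sketch of the File System context source: find notebooks and data files
# in a workspace directory so the AI can reference them by name.
import tempfile
from pathlib import Path

def discover_context_files(workspace, patterns=("*.ipynb", "*.csv", "*.parquet", "*.json")):
    """Return workspace file names grouped by suffix."""
    found = {}
    for pattern in patterns:
        for path in sorted(workspace.rglob(pattern)):
            found.setdefault(path.suffix, []).append(path.name)
    return found

workspace = Path(tempfile.mkdtemp())
(workspace / "analysis.ipynb").touch()
(workspace / "customer_data.csv").touch()
print(discover_context_files(workspace))
# {'.ipynb': ['analysis.ipynb'], '.csv': ['customer_data.csv']}
```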

External MCP Servers

External MCP servers connect SignalPilot to your broader data ecosystem:

dbt Integration

What it provides:
  • Model lineage (upstream/downstream dependencies)
  • Model documentation and descriptions
  • Test results and data quality status
  • Column-level lineage
Example queries:
  • “What models feed into the revenue dashboard?”
  • “Show me the lineage for customer_lifetime_value”
  • “Are there any failing dbt tests?”

Setup Guide

Configure dbt Cloud or dbt Core integration
Slack Integration

What it provides:
  • Channel discussions about data and metrics
  • Data team decisions and context
  • Recent incident threads
  • @mentions of specific metrics or tables
Example queries:
  • “What did the team discuss about conversion last week?”
  • “Are there any known issues with the orders table?”
  • “Who owns the attribution model?”

Setup Guide

Configure Slack workspace integration
Jira Integration

What it provides:
  • Tickets related to data issues
  • Deployment history affecting data
  • Sprint context for data projects
  • Issue status and assignments
Example queries:
  • “Were there any deployments to the pipeline last week?”
  • “Is there a ticket for the missing data issue?”
  • “What’s the status of the ETL fix?”

Setup Guide

Configure Jira project integration
Notion Integration

What it provides:
  • Data dictionaries and glossaries
  • Design documents and specifications
  • Runbooks and troubleshooting guides
  • Meeting notes and decisions
Example queries:
  • “What’s the definition of ‘active user’ in our docs?”
  • “Is there a runbook for pipeline failures?”
  • “What decisions were made in the data review?”

Parallel Context Resolution

When you ask a question, SignalPilot resolves context from all relevant sources simultaneously:
Question: "Why did conversion drop last week?"

Context Resolution (parallel):
├─ Kernel: Check for existing conversion analysis
├─ Schema: Fetch conversion_events table structure
├─ Query History: Find recent conversion queries
├─ dbt: Get model lineage for conversion metrics
├─ Slack: Search for "conversion" in data channels
└─ Jira: Look for related deployment tickets

Total time: ~2 seconds (not 6x sequential)
Parallel resolution means adding more context sources doesn’t significantly increase latency. The total time is determined by the slowest source, not the sum of all sources.
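This fan-out can be sketched with `asyncio.gather`, using sleeps as stand-ins for real MCP calls. The source names mirror the diagram above; the latencies are invented for illustration:

```python
# Sketch of parallel context resolution: total latency tracks the slowest
# source, not the sum of all sources.
import asyncio
import time

async def fetch(source, latency):
    # Stand-in for a real MCP request to one context source
    await asyncio.sleep(latency)
    return f"{source}: context fetched"

async def resolve_all():
    sources = {"kernel": 0.05, "schema": 0.10, "query_history": 0.05,
               "dbt": 0.20, "slack": 0.15, "jira": 0.10}
    # gather launches every fetch concurrently and awaits them together
    return await asyncio.gather(*(fetch(s, t) for s, t in sources.items()))

start = time.perf_counter()
results = asyncio.run(resolve_all())
elapsed = time.perf_counter() - start
print(f"{len(results)} sources in {elapsed:.2f}s")  # ~0.2s, not 0.65s
```

With sequential awaits the same six fetches would take the sum of the latencies; concurrent resolution takes roughly the maximum, which is why adding sources barely moves the total.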

Context Prioritization

Not all context is equally relevant. SignalPilot uses semantic understanding to prioritize:
| Priority | Context Type | When Used |
| --- | --- | --- |
| High | Direct mentions (table names, metrics) | Always included |
| High | Recent query history | Always included |
| Medium | Related schemas | Included if space allows |
| Medium | dbt lineage | Included for data questions |
| Low | General Slack discussions | Summarized if relevant |
| Low | Old documentation | Referenced but not full text |
Context prioritization ensures the AI focuses on the most relevant information without being overwhelmed by tangential details.
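One simple way to realize this table is greedy packing under a token budget: include high-priority items unconditionally-sized first, then fill the remaining space with lower-priority sources. The priorities and token counts below are invented for illustration, not SignalPilot’s actual weights:

```python
# Sketch of priority-based context packing under a token budget.
def pack_context(items, budget):
    """items: (name, priority, tokens); lower priority number = more important."""
    packed, used = [], 0
    for name, priority, tokens in sorted(items, key=lambda i: i[1]):
        if used + tokens <= budget:  # skip anything that would overflow
            packed.append(name)
            used += tokens
    return packed

items = [
    ("orders schema", 1, 300),   # direct table mention: high priority
    ("recent queries", 1, 400),  # always included
    ("dbt lineage", 2, 500),
    ("slack summary", 3, 800),
    ("old docs", 3, 900),
]
print(pack_context(items, budget=1500))
# ['orders schema', 'recent queries', 'dbt lineage']
```

The low-priority Slack and documentation items are dropped (or, per the table, summarized) because the budget is exhausted, which keeps the AI focused on the most relevant information.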

Context Security

All context aggregation respects your security boundaries:

Read-Only Access

SignalPilot only reads context—it cannot modify external systems. Database connections are read-only. Slack integration is read-only. No write access to any external MCP source.
Local Credential Storage

Database credentials and API tokens are stored locally in your environment. They are never transmitted to SignalPilot servers. Connection strings are processed locally.

Metadata-Only by Default

Only metadata and schemas are transmitted by default. Actual data values stay local unless explicitly included in a query. You control what context is shared.

Audit Logging

All context resolutions are logged locally. You can see exactly what was fetched and when. Hooks can enforce additional access controls.
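As a sketch of what “connection strings are processed locally” can mean in practice, the snippet below extracts only shareable metadata (scheme, account, database) and leaves credentials out of the result. The connection string is a made-up example:

```python
# Sketch: derive shareable connection metadata locally; credentials in the
# connection string never appear in the output.
from urllib.parse import urlsplit

def connection_metadata(connection_string):
    """Return only the metadata needed for context resolution."""
    parts = urlsplit(connection_string)
    return {
        "scheme": parts.scheme,
        "account": parts.hostname,           # username/password are dropped
        "database": parts.path.lstrip("/"),
    }

meta = connection_metadata("snowflake://user:s3cret@acme-account/analytics")
print(meta)  # {'scheme': 'snowflake', 'account': 'acme-account', 'database': 'analytics'}
```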

Learn More: Security & Privacy

Complete security model documentation

Configuring Context Sources

Adding a Database

# In your notebook or config
signalpilot.add_connection(
    name="production",
    type="snowflake",
    connection_string="snowflake://user@account/database"
)

Adding an MCP Server

// In signalpilot.config.json
{
  "mcp_servers": {
    "dbt": {
      "type": "dbt-cloud",
      "account_id": "12345",
      "project_id": "67890"
    },
    "slack": {
      "type": "slack",
      "workspace": "your-workspace"
    }
  }
}
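A client reading this file would parse the JSON and check each server entry before connecting. The sketch below inlines the config from above as a string and assumes only one validation rule (every server needs a `type`); SignalPilot’s actual loader may check more:

```python
# Sketch of loading and validating signalpilot.config.json.
import json

CONFIG = """
{
  "mcp_servers": {
    "dbt": {"type": "dbt-cloud", "account_id": "12345", "project_id": "67890"},
    "slack": {"type": "slack", "workspace": "your-workspace"}
  }
}
"""

def load_mcp_servers(raw):
    config = json.loads(raw)
    servers = config.get("mcp_servers", {})
    for name, server in servers.items():
        if "type" not in server:  # minimal sanity check, assumed rule
            raise ValueError(f"MCP server '{name}' is missing a 'type'")
    return servers

servers = load_mcp_servers(CONFIG)
print(sorted(servers))  # ['dbt', 'slack']
```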

Best Practices

1. Start with Core Sources: begin with database and kernel context, and add external MCP sources as needed.

2. Use Descriptive Names: name your connections clearly (e.g., “production-snowflake”, “analytics-postgres”) for easy reference.

3. Review Context in Plans: when approving investigation plans, check that the right context sources are being used.

4. Add Context Incrementally: if an investigation misses context, mention it; SignalPilot learns to include similar context next time.