Spider 2.0-DBT

SignalPilot is officially the #1-ranked AI agent on the Spider 2.0-DBT leaderboard, resolving 42 of 64 tasks on the world's hardest data-engineering benchmark.

The full task-by-task evaluation results are listed below.

Read the full story: How We Beat JetBrains to #1

65.62% pass rate · 42 passed · 22 failed · 64 total · claude-sonnet-4-6

Task                            Status
activity001                     Pass
airbnb001                       Pass
airport001                      Pass
app_reporting001                Pass
app_reporting002                Pass
apple_store001                  Pass
asana001                        Pass
asset001                        Pass
chinook001                      Pass
divvy001                        Pass
f1002                           Pass
f1003                           Pass
google_play001                  Pass
google_play002                  Pass
greenhouse001                   Pass
hive001                         Pass
hubspot001                      Pass
intercom001                     Pass
lever001                        Pass
marketo001                      Pass
maturity001                     Pass
mrr001                          Pass
mrr002                          Pass
pendo001                        Pass
playbook001                     Pass
qualtrics001                    Pass
quickbooks002                   Pass
quickbooks003                   Pass
recharge002                     Pass
reddit001                       Pass
retail001                       Pass
salesforce001                   Pass
shopify_holistic_reporting001   Pass
shopify001                      Pass
shopify002                      Pass
superstore001                   Pass
tickit001                       Pass
tpch001                         Pass
twilio001                       Pass
workday001                      Pass
workday002                      Pass
zuora001                        Pass
analytics_engineering001        Fail
atp_tour001                     Fail
f1001                           Fail
flicks001                       Fail
inzight001                      Fail
jira001                         Fail
movie_recomm001                 Fail
nba001                          Fail
netflix001                      Fail
playbook002                     Fail
provider001                     Fail
quickbooks001                   Fail
recharge001                     Fail
sap001                          Fail
scd001                          Fail
social_media001                 Fail
synthea001                      Fail
tickit002                       Fail
tpch002                         Fail
xero_new001                     Fail
xero_new002                     Fail
xero001                         Fail

Run ID: dbt-run9 · Suite: spider2-dbt · Spider2 GitHub
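
The headline pass rate can be checked from the pass and fail counts reported above. The following is a minimal sketch in Python; the helper name `pass_rate` is hypothetical and is not part of the benchmark harness or of SignalPilot.

```python
def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * passed / total


# Counts taken from the summary line of this page.
passed, failed = 42, 22
total = passed + failed

rate = pass_rate(passed, total)
summary = f"{rate:.2f}% pass rate"

print(total)    # 64
print(rate)     # 65.625
print(summary)  # 65.62% pass rate
```

The exact value is 65.625%. Python's fixed-point formatting rounds an exact tie to the even digit, which yields 65.62 and agrees with the figure shown in the summary; a round-half-up convention would display 65.63 instead.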