A 63-node sharded DuckDB cluster, coordinated with DataFusion Ballista, completes a one-trillion-row aggregation in 5 seconds. The setup echoes a broader push toward open-source, distributed analytics, where sharded query planners are the next frontier [1]. The build uses 63 Azure Standard E64pds v6 nodes, each with 64 vCPUs and 504 GiB of RAM, totaling roughly 4,000 vCPUs and 30 TB of memory. The cost math is striking: about $235.87/hr, cheaper than a Snowflake 4XL cluster at $384/hr for the same scale. The discussion also brings in BigQuery as a comparison point [1].
• Spot instances could cut the hourly price to roughly $45.99, a reminder of how cloud pricing and availability shape feasibility as workloads scale [1].
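The cost comparison above can be reproduced with simple arithmetic. A minimal sketch, assuming a per-node on-demand rate of $3.744/hr (implied by the $235.87 total over 63 nodes; not stated directly in the source):

```python
NODES = 63
VCPUS_PER_NODE = 64
RAM_GIB_PER_NODE = 504
ON_DEMAND_PER_NODE = 3.744   # $/hr, assumed: implied by the quoted ~$235.87 total
SPOT_TOTAL = 45.99           # $/hr for the whole cluster, per the discussion
SNOWFLAKE_4XL = 384.0        # $/hr quoted for a comparable Snowflake cluster

# Aggregate cluster resources.
total_vcpus = NODES * VCPUS_PER_NODE          # 4032 vCPUs
total_ram_gib = NODES * RAM_GIB_PER_NODE      # 31752 GiB, ~30 TB

# On-demand cost and savings versus the managed option.
cluster_cost = NODES * ON_DEMAND_PER_NODE     # ~235.87 $/hr
print(f"{total_vcpus} vCPUs, {total_ram_gib} GiB RAM")
print(f"on-demand: ${cluster_cost:.2f}/hr vs Snowflake 4XL ${SNOWFLAKE_4XL:.0f}/hr")
print(f"spot: ${SPOT_TOTAL:.2f}/hr ({SPOT_TOTAL / cluster_cost:.0%} of on-demand)")
```

The spot figure is the eye-catching one: at roughly a fifth of the on-demand rate, availability risk becomes the main trade-off rather than price.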
The dataset and the challenge itself are in public view in the 1trc repo on GitHub, so teams can explore the reality of trillion-row sharding in practice [1].
Meanwhile, a parallel thread spotlights PostgreSQL feature evolution via Pgfeaturediff, a tool that compares features across PostgreSQL versions. That version-to-version visibility helps teams gauge compatibility and plan roadmaps when weighing a move between releases [2].
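The core idea behind such a comparison is a set difference over per-version feature lists. A minimal sketch in that spirit, with hand-picked sample data; it does not use Pgfeaturediff's actual data model or API:

```python
# Illustrative per-version feature sets (sample data, not exhaustive).
# MERGE arrived in PostgreSQL 15; pg_stat_io is new in PostgreSQL 16.
FEATURES = {
    "15": {"MERGE", "ICU collations", "logical replication"},
    "16": {"MERGE", "ICU collations", "logical replication", "pg_stat_io"},
}

def feature_diff(old: str, new: str) -> dict:
    """Return features added and removed between two versions."""
    return {
        "added": sorted(FEATURES[new] - FEATURES[old]),
        "removed": sorted(FEATURES[old] - FEATURES[new]),
    }

print(feature_diff("15", "16"))
```

Set semantics keep the comparison symmetric and order-independent, which is why a "what changed between 15 and 16" question reduces to two subtractions.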
Taken together, the chatter shows distributed analytics tooling shaping real decisions about distribution, cost, and compatibility—between open-source paths and cloud options—as workloads creep into trillions of rows.
References

[1] A sharded DuckDB on 63 nodes runs the 1T-row aggregation challenge in 5 seconds. Discusses cross-DB sharding, DuckDB, DataFusion Ballista, cost/performance tradeoffs, and comparisons with Snowflake/BigQuery, including the one-trillion-row challenge dataset and distributed query planning.
[2] Pgfeaturediff: Compare PostgreSQL features between versions. A tool to compare PostgreSQL features across versions; highlights changes, capabilities, and feature differences.