DuckLake is a SQL-powered lakehouse format, a sign of the move toward SQL-native data lake architectures. In his talk on DuckLake, Prof. H. Mühleisen frames SQL as the default access layer for lake data [1].
Timeplus Proton 3.0, billed as the first vectorized streaming SQL engine, is the highlight. The open-source release promises enterprise-grade streaming in a single binary with zero dependencies for easier deployment [2]. Its headline features:
• Vectorized engine written in modern C++ with JIT compilation, aimed at high-throughput, low-latency processing
• End-to-end streaming: ETL, joins, aggregation, alerts, and tasks
• Native connectors: Kafka, Redpanda, Pulsar, ClickHouse, Splunk, Elastic, MongoDB, S3, Iceberg
• Native Python UDF/UDAF support
Meanwhile, Dan Cohen's newsletter piece 'The Index and the Vector' weighs indexing and vector search as complementary approaches to fast retrieval [3].
Separately, 'Query Decomposition for RAG' looks at breaking RAG queries into smaller sub-queries, a retrieval technique that carries over to practical analytics pipelines [4].
Taken together, these threads sketch a 2025–2026 analytics landscape where SQL lakehouses, vectorized streaming, and retrieval-augmented workflows collide in real-world pipelines.
References
[1] DuckLake – SQL-Powered Lakehouse Format for the Rest of Us, by Prof. H. Mühleisen [video]
Video presentation on DuckLake, a SQL-powered lakehouse format for data lake enthusiasts.
[2] Show HN: Timeplus Proton 3.0 – First vectorized streaming SQL engine
Show HN post announcing Timeplus Proton 3.0, a vectorized streaming SQL engine with connectors, UDFs, and end-to-end streaming; the authors are seeking feedback.
[3] The Index and the Vector
Discusses how indexing relates to vector databases and modern search techniques.
[4] Query Decomposition for RAG
arXiv paper on decomposing queries for RAG, with relevance to database querying and retrieval techniques.