Shodata is Git for datasets: a way to version data instead of chasing file names through a maze of S3 buckets [1]. It is an open platform built around dataset workflows. Uploading a file automatically creates a new version (v2, v3, and so on), and each version comes with its own discussion thread, a full history, and clean previews and statistics.
That workflow mirrors code versioning but centers on the data itself. The discussion notes that big-data tooling commonly treats information about files as metadata, whereas Shodata treats the dataset itself as the unit of truth [1].
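The upload-creates-a-version workflow can be made concrete with a minimal Python sketch of an append-only dataset history. It is not Shodata's implementation or API. The `DatasetRepo` and `Version` names, the SHA-256 content addressing, and the CSV row count standing in for "statistics" are all assumptions chosen for illustration.

```python
import csv
import hashlib
import io
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Version:
    number: int        # 1, 2, 3, ... (shown to users as v1, v2, v3)
    sha256: str        # content hash of the uploaded bytes
    size_bytes: int
    row_count: int     # toy "statistic": CSV data rows, excluding the header
    comments: List[str] = field(default_factory=list)  # per-version discussion


class DatasetRepo:
    """Hypothetical in-memory store: every upload becomes a new, immutable version."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._blobs: Dict[str, bytes] = {}  # the data itself, keyed by content hash
        self.history: List[Version] = []    # full, append-only version history

    def upload(self, data: bytes) -> Version:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)  # identical content is stored once
        rows = sum(1 for _ in csv.reader(io.StringIO(data.decode("utf-8"))))
        version = Version(
            number=len(self.history) + 1,
            sha256=digest,
            size_bytes=len(data),
            row_count=max(rows - 1, 0),
        )
        self.history.append(version)
        return version

    def read(self, number: int) -> bytes:
        return self._blobs[self.history[number - 1].sha256]


repo = DatasetRepo("sales")
v1 = repo.upload(b"id,amount\n1,10\n")
v2 = repo.upload(b"id,amount\n1,10\n2,25\n")
v2.comments.append("Added order 2")
print([f"v{v.number}: {v.row_count} rows" for v in repo.history])
# prints ['v1: 1 rows', 'v2: 2 rows']
```

Storing the bytes under their hash keeps the dataset itself as the source of truth. Each `Version` record carries only metadata about it: its number, size, statistics, and comments.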
On the tooling side, Apache Iceberg and its Catalog concept are cited as inspiration. Iceberg can run locally via Docker Compose while still letting you manage data with SQL, which gives you a low-friction entry point before you scale up [1]. This places data versioning in the metadata-versus-data debate: information about files is metadata, but the dataset's history remains the primary asset [1].
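A toy model can show the Catalog concept. This is a deliberately simplified sketch and not Iceberg's API. In real Iceberg the catalog tracks a pointer to each table's current metadata file, which in turn lists snapshots and manifests, and writers commit by atomically swapping that pointer. In this sketch, `ToyCatalog` and `Snapshot` are hypothetical names, and a lock plus an equality check stands in for the atomic swap.

```python
import threading
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]     # links each snapshot to the one it replaced
    data_files: Tuple[str, ...]  # hypothetical data file paths for this table state


class ToyCatalog:
    """Maps table names to their current snapshot; commits use compare-and-swap."""

    def __init__(self) -> None:
        self._current: Dict[str, Snapshot] = {}
        self._lock = threading.Lock()

    def current(self, table: str) -> Optional[Snapshot]:
        return self._current.get(table)

    def commit(self, table: str, expected: Optional[Snapshot], new: Snapshot) -> bool:
        # Optimistic concurrency: succeed only if nobody moved the pointer
        # since the writer read `expected`; otherwise the writer must retry.
        with self._lock:
            if self._current.get(table) != expected:
                return False
            self._current[table] = new
            return True


catalog = ToyCatalog()
s1 = Snapshot(1, None, ("data/part-0.parquet",))
s2 = Snapshot(2, 1, ("data/part-0.parquet", "data/part-1.parquet"))
print(catalog.commit("events", None, s1))  # True: first commit
print(catalog.commit("events", s1, s2))    # True: s1 was still current
print(catalog.commit("events", s1, s2))    # False: pointer already moved to s2
```

In this model, a commit changes only the pointer, so older snapshots stay intact and their history can still be walked through `parent_id`.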
The discussion also touches on broader concerns: moving data between instances and owning storage outright. One thread treats moving tables across PostgreSQL instances as a parallel challenge [3], while another looks for self-hostable alternatives to Jsonbin.io, reflecting a desire to control storage end to end [2].
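For the PostgreSQL case, one common approach is to stream `pg_dump` output for a single table directly into `psql` on the target instance. The Python sketch below wraps that pipeline with `subprocess`. The function names and connection strings are hypothetical, and this is not necessarily the method the linked thread recommends.

```python
import subprocess
from typing import List, Tuple


def build_commands(table: str, src_dsn: str, dst_dsn: str) -> Tuple[List[str], List[str]]:
    """Return (dump_cmd, restore_cmd) for copying one table between instances."""
    # Plain-SQL dump of one table (schema + data); skip ownership commands so
    # the target role does not need to match the source.
    dump_cmd = ["pg_dump", f"--dbname={src_dsn}", f"--table={table}", "--no-owner"]
    # Stop at the first error instead of continuing with a partial restore.
    restore_cmd = ["psql", f"--dbname={dst_dsn}", "-v", "ON_ERROR_STOP=1"]
    return dump_cmd, restore_cmd


def copy_table(table: str, src_dsn: str, dst_dsn: str) -> None:
    """Pipe pg_dump into psql without writing an intermediate file."""
    dump_cmd, restore_cmd = build_commands(table, src_dsn, dst_dsn)
    dump = subprocess.Popen(dump_cmd, stdout=subprocess.PIPE)
    restore = subprocess.Popen(restore_cmd, stdin=dump.stdout)
    dump.stdout.close()  # let pg_dump receive SIGPIPE if psql exits early
    restore_rc = restore.wait()
    dump_rc = dump.wait()
    if dump_rc != 0 or restore_rc != 0:
        raise RuntimeError(f"copy failed (pg_dump={dump_rc}, psql={restore_rc})")


# Example (hypothetical hosts and database names):
# copy_table("public.events", "postgresql://src-host/app", "postgresql://dst-host/app")
```

`pg_dump` reads from a consistent snapshot, but rows written to the source after the dump starts are not copied. A clean cutover usually means pausing writes or using another mechanism such as logical replication. If the table already exists on the target, the dump's `CREATE TABLE` fails and `ON_ERROR_STOP` halts the restore.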
If you follow data workflows, the pieces covered here (versioned datasets, catalog-style metadata, and self-hosted storage) point to growing momentum behind data versioning.
References
[1] Show HN: I built a tool to version control datasets (like Git, but for data). Introduces Shodata, a Git-like version control system for datasets; discusses Iceberg, catalog concepts, metadata vs. data, the MVP, and feedback needs.
[2] Ask HN: Self Hostable Alternative to Jsonbin.io? Asks for a self-hosted JSON storage alternative to Jsonbin.io; seeks recommendations and comparisons of self-hosted data storage services and their features.
[3] Moving tables across PostgreSQL instances. Discusses methods for moving tables between PostgreSQL instances, the challenges involved, and data consistency considerations.