Hacker News

Show HN: BemiDB – Open-source data warehouse with zero-ETL (bemidb.com)

13 points by exAspArk, a month ago

Hi HN! We're Evgeny and Arjun, and we’re building a simpler way for startups to do data analytics.

Since open-sourcing our Postgres read replica optimized for analytics (https://github.com/BemiHQ/BemiDB), we've kept hearing a familiar story. Teams would connect Postgres and feel relieved that they didn't have to wrangle complex ETL pipelines, but they hit a wall as soon as they wanted to join that data with HubSpot, Stripe, and other sources. They'd resort to workarounds such as using Airbyte to sync the external data into Postgres, just so it would then be replicated automatically into their BemiDB analytical database.

We want to remove the layers of data complexity that startups pile on as they scale. That's why BemiDB now also lets you connect supplementary data sources directly. This makes it a zero-ETL data warehouse for companies that don't want a typical heavyweight warehouse with expensive ETL pipelines.

Under the hood, we store data as Apache Iceberg tables (with Parquet data files) in S3. This gives us virtually unlimited, inexpensive storage, compressed columnar files, and an open format that other Iceberg-compatible data tools can read. We use Trino for table maintenance tasks such as compaction.
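To make "compressed data in columnar files" concrete, here is a toy Python sketch. It is an illustration only, not Parquet's on-disk format and not BemiDB code, and the sample rows are hypothetical. It pivots row-oriented records into one list per column and applies run-length encoding, one of the encoding families Parquet supports, to show why storing similar values together compresses well and why a query that needs one column can skip the others.

```python
from itertools import groupby

# Hypothetical sample rows: (user_id, country, plan). Not real BemiDB data.
rows = [
    (1, "US", "free"),
    (2, "US", "free"),
    (3, "US", "pro"),
    (4, "DE", "pro"),
    (5, "DE", "pro"),
]

def to_columns(rows, names):
    """Pivot row-oriented tuples into one list per column (columnar layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}

def run_length_encode(values):
    """Collapse consecutive repeated values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

columns = to_columns(rows, ["user_id", "country", "plan"])
encoded_country = run_length_encode(columns["country"])

# An aggregate over one column only has to scan that column's values.
pro_users = sum(1 for plan in columns["plan"] if plan == "pro")

print(encoded_country)                     # [('US', 3), ('DE', 2)]
print(run_length_encode(columns["plan"]))  # [('free', 2), ('pro', 3)]
print(pro_users)                           # 3
```

Real Parquet files combine several encodings (dictionary, RLE/bit-packing) with general-purpose compression, so actual savings depend heavily on the data.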

We embed DuckDB as the query engine, which runs analytics in memory and handles complex queries. With columnar storage and vectorized execution, we aim to return results quickly without heavy infrastructure. BemiDB speaks the Postgres wire protocol and accepts Postgres-compatible SQL syntax, so standard Postgres clients can connect to it.
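Speaking the Postgres wire protocol means a client that already talks to Postgres can, in principle, talk to BemiDB unchanged. As an illustration of what goes over the socket, here is a minimal Python sketch that builds the first two frontend messages of the Postgres v3 protocol: the StartupMessage and a simple Query. It follows the public Postgres protocol documentation, not BemiDB's source, and the user and database names are hypothetical placeholders.

```python
import struct

# Protocol version 3.0 is encoded as (major << 16) | minor.
PROTOCOL_VERSION_3_0 = (3 << 16) | 0  # 196608

def startup_message(params):
    """Build a StartupMessage: Int32 length, Int32 version, key/value C-strings, final NUL."""
    body = struct.pack("!i", PROTOCOL_VERSION_3_0)
    for key, value in params.items():
        body += key.encode() + b"\x00" + value.encode() + b"\x00"
    body += b"\x00"  # terminator after the last key/value pair
    # The startup message has no type byte; its length counts itself (4 bytes) plus the body.
    return struct.pack("!i", len(body) + 4) + body

def simple_query(sql):
    """Build a simple Query message: 'Q' type byte, Int32 length, NUL-terminated SQL."""
    payload = sql.encode() + b"\x00"
    # The length field counts itself and the payload, but not the type byte.
    return b"Q" + struct.pack("!i", len(payload) + 4) + payload

# Hypothetical connection parameters, for illustration only.
startup = startup_message({"user": "postgres", "database": "bemidb"})
query = simple_query("SELECT 1")
print(len(startup))  # 39
print(query)         # b'Q\x00\x00\x00\rSELECT 1\x00'
```

A real client would also handle the server's authentication exchange and parse its responses (RowDescription, DataRow, ReadyForQuery); in practice you would just point psql or an existing Postgres driver at the server.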

We want to simplify data infrastructure for companies that use Postgres and other data sources by reducing complexity (automatic data source syncs), using non-proprietary data formats (Iceberg open tables), and removing vendor lock-in (open source). We'd love to hear any and all feedback! Thoughts, HN?