Hmmm. But maybe such a big difference between json+gzip and iceberg+parquet isn't really down to Iceberg itself. It's binary Parquet (with metadata in it) vs. text JSON...
Whoa, this was awesome, guys! Loved how you showed off Hudi's multi-modal indexing. But first, I've got to upgrade my current Hudi 0.10.1 lakehouse setup. After that, I'm super excited to try this out with Trino. Really looking forward to it. Great stuff! Thanks!
Rather than writing your own task scheduler/runner, consider using the open-source HPC tools out there. Slurm with auto-scaling is an absolute beast: it was designed, and is used, to schedule millions of jobs daily for thousands of users on extremely busy, constrained supercomputers around the world (over 60% of supercomputers use it), with job runtimes ranging from sub-second to months. You also benefit from a massive set of other features such as user/team management, quotas, accounting/budgeting, and flexible scheduler resources/constraints.
Can Airbyte do some form of transformation/aggregation as the T part of ETL? I just see 100s of demos of source-to-sink, but not enough on transformation.