Holden Karau, OSS Engineer, Data Platform Engineering, talks about why reliable data pipelines matter and how to build them, covering tools from testing to validation and auditing. The talk uses Apache Spark as its example, but the concepts apply regardless of your specific tools.
Some related projects include:
github.com/holdenk/spark-test...
github.com/unionai-oss/pandera
github.com/target/data-validator
github.com/tensorflow/data-va...
#netflix
#datascience
#dataengineering
#etl
#bigdata
13 Dec 2023