
Automated data profiling and quality scan via Dataplex 

PracticalGCP

Data quality is a critical concern in a complex data environment, particularly when a substantial volume of data is distributed across multiple locations. If you need to identify and visualise potential issues systematically, schedule periodic scans, and notify the relevant teams across a whole organisation, where should you begin?
This is precisely where the automated data profiling and data quality scanning capabilities of Dataplex on Google Cloud can prove invaluable. Requiring no infrastructure setup and offering a straightforward method for defining and implementing rules for data profiling and quality checks, it could serve as an excellent foundation for your large-scale data quality framework.
01:16 - Data Profiling vs Data Quality Scan
02:37 - Dataplex auto profiling
08:15 - Dataplex auto data quality scan
10:47 - Profiling hinted quality rules & YAML via CLI
18:36 - Other options to create scans
21:08 - Sensitive data considerations
22:02 - Summary
Slide: drive.google.com/file/d/13khs...
Repo: github.com/rocketechgroup/dat...
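
As a rough illustration of what rule definitions for a data quality scan can look like, here is a minimal Python sketch that builds a rule spec and serialises it as JSON. The column names (`transaction_id`, `amount`), the thresholds, and the helper `build_quality_spec` are assumptions made up for this example and are not taken from the video or the repo; the field names follow the Dataplex data quality rule format as I understand it and should be checked against the current documentation.

```python
import json


def build_quality_spec(id_column: str, amount_column: str) -> dict:
    """Build an illustrative data quality spec with three rules.

    Hypothetical helper: the columns and thresholds are placeholders.
    """
    return {
        "rules": [
            # Every value in the ID column should be unique.
            {
                "column": id_column,
                "dimension": "UNIQUENESS",
                "uniquenessExpectation": {},
            },
            # The ID column should never be null (100% of rows must pass).
            {
                "column": id_column,
                "dimension": "COMPLETENESS",
                "threshold": 1.0,
                "nonNullExpectation": {},
            },
            # At least 99% of non-null amounts should be zero or greater.
            {
                "column": amount_column,
                "dimension": "VALIDITY",
                "threshold": 0.99,
                "ignoreNull": True,
                "rangeExpectation": {"minValue": "0"},
            },
        ]
    }


spec = build_quality_spec("transaction_id", "amount")
spec_text = json.dumps(spec, indent=2)
print(spec_text)
```

Saved to a file, a spec of this shape is the kind of input that `gcloud dataplex datascans create data-quality` is meant to accept through its `--data-quality-spec-file` flag; treat that as a pointer to verify in the CLI reference rather than a tested command.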

Category: Science

Published: 21 Jul 2024

Comments: 8
@user-bl6kx6ld4y 8 months ago
Great video!!
@rubelahmed-je6bo 8 months ago
Great video, it would be good if you did a basic tutorial on how to set up a catalog from start to finish
@DExpertz 2 months ago
I appreciate this video Sir, 😍 (Subscribed and liked) will share too with my team.
@practicalgcp2780 2 months ago
Thanks so much for your support ❤
@DExpertz 2 months ago
@practicalgcp2780 Of course man, thank you for sharing this information in a simpler way
@QuynhNguyen-zy2rs 2 months ago
Hi, after you have created a data profile scan and a data quality scan, is the Insights tab displayed? I don't see the Insights tab in your video. Please explain. Thanks!
@yogeshsahu4943 6 months ago
Great video. Can you make a video on the data lineage API, for cases where Dataplex can't be enabled directly and lineage API data can be used manually to reflect lineage in Dataplex?
@practicalgcp2780 6 months ago
Thanks for the comment. Can you clarify what you mean by Dataplex not being enabled directly? I've not used the lineage API yet, but my understanding is that lineage is generated automatically as long as you enable the Data Lineage API, and BigQuery does it via SQL parsing through audit logs. I do believe there is an option to add your own lineage via the API for sources outside the context of BigQuery; are you referring to that one? I've not tried it yet, as I haven't had a use case that needs it.