
How And Why Data Engineers Need To Care About Data Quality Now - And How To Implement It 

Seattle Data Guy
96K subscribers
10K views

As companies look to incorporate AI and ML into their data strategies and roadmaps, there is a new opportunity to refocus on data quality.
Regardless of how fancy or sophisticated a company's AI model might be, poor data quality will make the outputs of these models useless at best, and misleading and company-destroying at worst.
So, as your company is rolling out its internally developed LLM or implementing a dynamic pricing model, it’s a great time to review your data quality strategy.
The first key point to cover is that data quality is not limited to a single pillar, like accuracy. Instead, several key pillars need to be considered when developing your data quality system.
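To make those pillars concrete, here is a minimal, illustrative Python sketch of three common check types — a range check (accuracy), a category check (validity), and a freshness check (timeliness). The field names and thresholds (`order_total`, `country`, `loaded_at`, the 10,000 cap) are invented for the example, not from the video.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical rows: field names and values are invented for illustration.
rows = [
    {"order_total": 42.50, "country": "US",
     "loaded_at": datetime.now(timezone.utc)},
    {"order_total": -5.00, "country": "XX",
     "loaded_at": datetime.now(timezone.utc) - timedelta(days=3)},
]

def range_check(row):      # accuracy: value within an expected range
    return 0 <= row["order_total"] <= 10_000

def category_check(row):   # validity: value drawn from a known set (enum-like)
    return row["country"] in {"US", "CA", "GB"}

def freshness_check(row):  # timeliness: data arrived recently
    return datetime.now(timezone.utc) - row["loaded_at"] < timedelta(days=1)

checks = [range_check, category_check, freshness_check]
failures = [(i, c.__name__) for i, row in enumerate(rows)
            for c in checks if not c(row)]
print(failures)  # row 1 fails all three checks
```

Each check is a plain predicate over a row, so adding a new pillar is just adding a new function to the list.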
Again, special thanks to decube for sponsoring this video! If you're looking to improve your data quality, you can check out the link below:
bit.ly/3Sndk1B
If you enjoyed this video, check out some of my other top videos.
Top Courses To Become A Data Engineer
• Top Courses To Become ...
Data Modeling Challenges - The Issues Data Engineers & Architects Face When Implementing Data Models
• Data Modeling Challeng...
If you would like to learn more about data engineering, then check out Google's GCP certificate
bit.ly/3NQVn7V
If you'd like to read up on my updates about the data field, then you can sign up for our newsletter here.
seattledataguy.substack.com/​​
Or check out my blog
www.theseattledataguy.com/
And if you want to support the channel, then you can become a paid member of my newsletter
seattledataguy.substack.com/s...
Tags: Data engineering projects, Data engineer project ideas, data project sources, data analytics project sources, data project portfolio
_____________________________________________________________
Subscribe: / @seattledataguy
_____________________________________________________________
About me:
I have spent my career focused on all forms of data. I have focused on developing algorithms to detect fraud, reduce patient readmission, and redesign insurance provider policy to help reduce the overall cost of healthcare. I have also helped develop analytics for marketing and IT operations in order to optimize limited resources such as employees and budget. I privately consult on data science and engineering problems, both solo and with a company called Acheron Analytics. I have experience both working hands-on on technical problems and helping leadership teams develop strategies to maximize their data.
*I do participate in affiliate programs, if a link has an "*" by it, then I may receive a small portion of the proceeds at no extra cost to you.

Published: 29 Jul 2024

Comments: 25
@SeattleDataGuy · 5 months ago
If you need help with your data analytics strategy or are having problems with your data quality, feel free to set up some time with me! calendly.com/ben-rogojan/consultation
@richardgui2934 · 5 months ago
# Short Summary

## Types of data quality checks
- Range check: checks for outliers
- Category check: works like an "enum" in programming
- Data freshness check: fails if there was no (or too little) new data
- Volume check
- Null check: allow no nulls, or allow a percentage of fields to be null

## How to create a system to perform checks for you
Nice to have:
- Sending alert notifications if checks fail!
- A "data quality" dashboard containing freshness, volume, null checks, etc.
- Tracking the change of volume, freshness, and null checks over time
- Abstraction layers so that setting up test cases is a breeze

## Platforms
Data quality/lineage tools exist. You can either use those or write your own tool; project requirements will help you choose. There are data quality checks in dbt as well: there are built-in ones, and the great-expectations library contains many more. You can also use the unit test library in order to test your data transformations in dbt.

Thank you for the video. I love your content!
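The "system to perform checks for you" part of the summary above could be sketched, very roughly, as a tiny check runner: each check returns pass/fail, and any failure triggers an alert hook. The column name (`email`) and thresholds (5% nulls, 100-row minimum) are invented for illustration.

```python
def null_check(rows, column, max_null_pct=0.05):
    """Pass if at most max_null_pct of the values in `column` are null."""
    nulls = sum(1 for r in rows if r.get(column) is None)
    return nulls / len(rows) <= max_null_pct

def volume_check(rows, expected_min=100):
    """Pass if at least expected_min rows arrived in this batch."""
    return len(rows) >= expected_min

def run_checks(rows, alert=print):
    """Run all checks; fire the alert hook for each failure."""
    results = {
        "null_check(email)": null_check(rows, "email"),
        "volume_check": volume_check(rows),
    }
    for name, passed in results.items():
        if not passed:
            alert(f"DATA QUALITY ALERT: {name} failed")
    return results
```

In practice the `alert` hook would post to Slack or PagerDuty rather than print, and the per-run results would be stored so freshness, volume, and null rates can be tracked over time on a dashboard, as the summary suggests.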
@SeattleDataGuy · 5 months ago
Thanks for the summary!
@Supersheep19 · 1 month ago
Thank you so much!! It saved me the time of summarising the video, which is what I planned to do. Glad I checked the comments section before I did.
@PyGorka · 5 months ago
Great talk. We are implementing more checks like this in our systems and they are nice. One check we like to do in Snowflake is trying to load a file into a check table that has the same schema as the final table. We then capture any errors in that check table, store the data in a blob, and record metadata about it. We use this to see whether a file can be loaded into the table or not. If a file can be loaded but one record is bad (e.g., missing columns), we just exclude that one row into a reject table. I'll have to look into the data operators; I wonder how well those run. This topic is so big, and you could go so deep into explaining how to handle problems.
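The pattern this commenter describes can be sketched language-agnostically: validate each record against the target table's schema, keep good rows, and route bad ones to a reject list with error metadata. The schema here (`id`, `amount`) is invented for illustration; the real workflow uses Snowflake check tables and blob storage.

```python
# Hypothetical required schema for the target table: column name -> type.
REQUIRED_COLUMNS = {"id": int, "amount": float}

def validate(record):
    """Return an error string if the record violates the schema, else None."""
    for col, typ in REQUIRED_COLUMNS.items():
        if col not in record:
            return f"missing column: {col}"
        if not isinstance(record[col], typ):
            return f"bad type for {col}"
    return None

def load(records):
    """Split records into accepted rows and rejected rows with error metadata."""
    accepted, rejected = [], []
    for rec in records:
        error = validate(rec)
        if error:
            rejected.append({"record": rec, "error": error})
        else:
            accepted.append(rec)
    return accepted, rejected
```

The key design choice mirrors the comment: a single bad record lands in the reject list with a reason attached, rather than failing the whole file load.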
@SeattleDataGuy · 5 months ago
Thanks for sharing how your team is implementing some data quality checks, it's super helpful for everyone else!!!
@thndesmondsaid · 10 days ago
Such a good video! Data quality checks are simple/common sense but many organizations don't take the time to implement them!
@heljava · 5 months ago
Thank you. Those are really great tips and as always the examples are great!
@SeattleDataGuy · 5 months ago
Glad you found this video helpful!
@JAYRROD · 5 months ago
Great topic - appreciate the practical examples!
@SeattleDataGuy · 5 months ago
Glad you liked it!
@jzthegreat · 5 months ago
Your video quality has gotten a lot better my guy. I like the different zooms of focus
@SeattleDataGuy · 5 months ago
Thank you!
@andrejhucko6823 · 5 months ago
Good video, I liked the editing and explanations. I'm using mostly GX (great-expectations) for quality checks.
@nishijain7993 · 3 months ago
Insightful!
@SeattleDataGuy · 3 months ago
thank you!
@wilsonroberto3817 · 5 months ago
Hello man, really nice video! Please, I'm in doubt about which certification I should take in AWS: Solutions Architect, or wait for the Data Engineer certification which starts in March? I work as a DE and I already have the Cloud Practitioner and AZ-900 certifications!
@sanjayplays5010 · 3 months ago
Thanks for the video Ben, using this to implement some DQ checks now. How do you reckon something like Deequ fits in here? Would you run a Deequ job prior to each ETL job?
@daegrun · 5 months ago
If data quality checks are done at this level, then why do I hear that a data analyst has to do a lot of data cleaning and data quality checks as well? Is the allowance for a certain amount of failures, which you mentioned, the reason why?
@SeattleDataGuy · 5 months ago
There are a few reasons why: not everyone implements checks; data sources can still be wrong; and sometimes, due to the level of integration, different analysts might pull the same data from different sources (some from the data warehouse, some from 3-4 different source systems), among a few other reasons...
@alecryan8220 · 5 months ago
Are these videos AI generated? The editing is weird lol
@jorgeperez7742 · 5 months ago
🫵😹🫵😹🫵😹
@andydataguy · 5 months ago
Great to see a video talking about the trade-offs! The sign of a good architect 🙌🏾🫡