Part two, Pytest integration with an ETL pipeline: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-7FPksG-LYOA.html
Part three, Pytest data quality report: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-Sv6QWF7J63k.html
@@srh1034 Sure. Here is an overview of the channel's content and the ETL series sequence: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-pjiv6j7tyxY.html
You did a great job. I was looking for this material for a long time. Thanks, man, for sharing such great content. I have many questions on pytest and will ask them once I go through all the videos. Thanks!
Here is the link to the video in the series that runs data quality tests against SQL Server: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-7FPksG-LYOA.html
Here is the link to the series: ru-vid.com/group/PLaz3Ms051BAkgmoRZEcGFvQzY4YW_SR8b
You can embed these tests in your data pipeline; below is an example. Once you schedule the pipeline via an orchestrator, these tests will run each time it is triggered. You can use any tool, such as Airflow, Dagster, Prefect, or cron, to schedule Python-based pipelines.
Example: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-7FPksG-LYOA.html&ab_channel=BIInsightsInc
Airflow: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-eZfD6x9FJ4E.html&ab_channel=BIInsightsInc
Dagster: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-f1TbVGdhmYg.html&ab_channel=BIInsightsInc
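As a rough sketch of that idea (the extract/transform/load functions, sample data, and the "tests" folder name here are hypothetical placeholders, not the channel's actual code base), a pipeline script can invoke pytest after the load step and exit with pytest's return code, so a failing data quality test fails the scheduled job and the orchestrator can alert:

```python
import sys
import pytest
import pandas as pd

# Hypothetical ETL steps standing in for the real pipeline.
def extract():
    return pd.DataFrame({"ProductKey": [1, 2, 3]})

def transform(df):
    return df.dropna()

def load(df):
    print(f"loaded {len(df)} rows")

def main():
    df = extract()
    df = transform(df)
    load(df)
    # Run the data quality test suite after loading. pytest.main returns
    # an exit code (0 = all tests passed); propagating a non-zero code
    # makes Airflow/Dagster/cron mark the run as failed.
    exit_code = pytest.main(["-q", "tests"])
    sys.exit(exit_code)

if __name__ == "__main__":
    main()
```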
I will try to cover this in the future. In the meantime you can check out the following videos on testing and automating ETL pipelines:
ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-7FPksG-LYOA.html
ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-Sv6QWF7J63k.html&t
ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-7UQ91Ib7PtU.html&t
How to automate Python-based ETL pipelines:
ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-f1TbVGdhmYg.html&t
ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-eZfD6x9FJ4E.html&t
ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-IsuAltPOiEw.html
If you want to log the tests for review or sharing, check out the next video. I haven't played around with tqdm, but here are its docs and implementation. Maybe I will use it in a project in the future. github.com/tqdm/tqdm
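For reference, a minimal tqdm sketch (the chunked-processing function and sample data are made up for illustration, not from the channel's code): wrapping any iterable in tqdm() prints a live progress bar while the loop runs.

```python
from tqdm import tqdm

def process_chunks(chunks):
    # Wrap the iterable in tqdm to show progress as each chunk is handled.
    results = []
    for chunk in tqdm(chunks, desc="processing"):
        # Stand-in for the real per-chunk transform/load work.
        results.append(sum(chunk))
    return results

process_chunks([[1, 2], [3, 4], [5, 6]])
```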
Thank you for a great tutorial! You already have a few different videos; could you add a number to each tutorial to order them? That would help show which video is first and which is last.
Thanks, and good suggestion. I have consolidated the data quality videos into their own playlist. Here is the link: ru-vid.com/group/PLaz3Ms051BAkgmoRZEcGFvQzY4YW_SR8b
Thanks for spotting this. I have updated the code base. You can use the following assertion:
# check for nulls
def test_null_check(df):
    assert df['ProductKey'].notnull().all()
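Dropped into a test file, that assertion might look like this; the fixture and sample data below are illustrative stand-ins for the DataFrame produced by the pipeline, not the video's actual code:

```python
import pandas as pd
import pytest

@pytest.fixture
def df():
    # Illustrative sample data; in practice this would come from the ETL load step.
    return pd.DataFrame({"ProductKey": [1, 2, 3]})

# check for nulls: notnull() marks each value True/False,
# and .all() requires every value to be non-null
def test_null_check(df):
    assert df["ProductKey"].notnull().all()
```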
If the data type of this column is string or object, the test will pass. If the data type is int or float, it will fail. You can also remove the "O" and test for string, if that's the objective. Here is an example of this test with an int: github.com/hnawaz007/pythondataanalysis/blob/main/ETL%20Pipeline/Pytest/Session%20one/string%20and%20object%20test%20result.png
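To illustrate the dtype behavior described above (the sample data is made up): pandas stores string columns as object dtype ("O") by default, while numeric columns get int64/float64, so an object-dtype assertion passes on a string column and fails on an int column.

```python
import pandas as pd

# A string column loads as object dtype ("O") by default in pandas.
str_df = pd.DataFrame({"Genre": ["Action", "Drama"]})
assert str_df["Genre"].dtype == "O"

# A numeric column gets int64, so the same object-dtype check would fail.
int_df = pd.DataFrame({"Genre": [1, 2]})
assert int_df["Genre"].dtype != "O"
```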
@@BiInsightsInc Thanks for responding. When the column value is 1, which is an int, the assertion below passes. I tried removing the "O", and then it fails, but it fails even when the data type is string.
assert (df["Genre"].dtype == str or df["Genre"].dtype == 'O')
@@dmunagala You need to check the data type. The value might be 1, but it can be stored as a string. Check my previous comment; I linked to this test, and it fails with an int data type.
@@BiInsightsInc Yes, you are right. I checked the data types using df.info() and found the exact data types for all the columns in my CSV file. It is working as expected. Thank you so much for your help, you are amazing!