
Data Cleaning in MySQL | Full Project 

Alex The Analyst
Subscribe · 793K subscribers
68K views
Full MySQL Course: www.analystbuilder.com/course...
In this lesson we are going to be building a data cleaning project in MySQL!
Download Dataset: github.com/AlexTheAnalyst/MyS...
GitHub Code: github.com/AlexTheAnalyst/MyS...
____________________________________________
SUBSCRIBE!
Do you want to become a Data Analyst? That's what this channel is all about! My goal is to help you learn everything you need in order to start your career or even switch your career into Data Analytics. Be sure to subscribe to not miss out on any content!
____________________________________________
RESOURCES:
Coursera Courses:
📖Google Data Analyst Certification: coursera.pxf.io/5bBd62
📖Data Analysis with Python - coursera.pxf.io/BXY3Wy
📖IBM Data Analysis Specialization - coursera.pxf.io/AoYOdR
📖Tableau Data Visualization - coursera.pxf.io/MXYqaN
Udemy Courses:
📖Python for Data Science - bit.ly/3Z4A5K6
📖Statistics for Data Science - bit.ly/37jqDbq
📖SQL for Data Analysts (SSMS) - bit.ly/3fkqEij
📖Tableau A-Z - bit.ly/385lYvN
Please note I may earn a small commission for any purchase through these links - Thanks for supporting the channel!
____________________________________________
BECOME A MEMBER -
Want to support the channel? Consider becoming a member! I do monthly livestreams, and you get some awesome emojis to use in chat and comments!
/ @alextheanalyst
____________________________________________
Websites:
💻Website: AlexTheAnalyst.com
💾GitHub: github.com/AlexTheAnalyst
📱Instagram: @Alex_The_Analyst
____________________________________________
All opinions or statements in this video are my own and do not reflect the opinion of any company I work for or have ever worked for.

Published: 15 Apr 2024
Comments: 251
@user-sp8sw7vt5k 1 month ago
Timestamps: Removing Duplicates 8:00 · Standardizing Data 17:32 · Null/Blank Values 33:30 · Remove Unnecessary Columns/Rows 46:12. Great video!
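The four steps this commenter lists map to the queries built in the video. As a sketch of the first step (removing duplicates), here is the ROW_NUMBER() CTE pattern, assuming the staging table and column names from the video's layoffs dataset:

```sql
-- Flag duplicates: any row with row_num > 1 is a repeat of an earlier row
WITH duplicate_cte AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY company, location, industry, total_laid_off,
                            percentage_laid_off, `date`, stage, country,
                            funds_raised_millions
           ) AS row_num
    FROM layoffs_staging
)
SELECT *
FROM duplicate_cte
WHERE row_num > 1;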
@obeliskphaeton 28 days ago
I converted the 'date' column from text to DATE format after importing, and when I ran duplicate_cte I only got 5 rows in the output. Note: I used date instead of 'date' in the PARTITION BY section.
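For readers following along with the conversion this commenter mentions: in MySQL the text-to-date step is typically done with STR_TO_DATE, previewing the result before updating. A sketch, assuming the column is named `date` and the raw values look like mm/dd/yyyy as in the video's dataset:

```sql
-- Preview the conversion side by side before changing anything
SELECT `date`, STR_TO_DATE(`date`, '%m/%d/%Y')
FROM layoffs_staging2;

-- Rewrite the text values into the canonical YYYY-MM-DD form
UPDATE layoffs_staging2
SET `date` = STR_TO_DATE(`date`, '%m/%d/%Y');

-- Only then change the column's type itself
ALTER TABLE layoffs_staging2
MODIFY COLUMN `date` DATE;
```

If STR_TO_DATE returns NULL, the format string does not match the actual text (check separators and day/month order) — run the preview SELECT first to catch that before the UPDATE.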
@harshitthakur8300 23 days ago
Life saver I was just searching for this
@usmanabid5452 4 days ago
nice video
@anyamunkh1787 1 month ago
Watched all the ads without even skipping, that's how much I am grateful for your work and time you put into this.
@tejjm4263 1 month ago
Thanks for the kind words! I made it to the end and learned a lot while working on the project simultaneously.
@10cutie207 2 months ago
Alex! This is why I subscribed, thank you so much for doing this in MySQL!!
@ibrahimdenisfofanah6420 1 month ago
Patiently waiting for the exploratory aspect of the clean data. Thanks very much
@AnnNguyenHo 1 month ago
Amazing Alex, this is exactly what I'm looking for my project too. Thank you so much
@alamsgodwin6179 1 month ago
Thanks Alex, can't wait to start this project!
@cjbrown3396 1 month ago
watch till the end it's awesome Alex ! thanks so much
@ahmadmarzodandy6054 2 months ago
Thanks for this video, Alex! Really need it
@utkarshrana39 1 month ago
Hey Alex! I'm from India and have been following you for months, but I could never finish a project. From my first encounter with your content, though, I knew I'd walk in your footsteps. I had been looking for data like this for the last two weeks — I tried most of the US and Indian data bureaus and who knows where else. Yesterday I decided to do this project at hand, and WOW, it was exactly the data I was looking for. Thank you so much. This is my second SQL project ever, and I loved it from beginning to end. I had so much fun: I laughed at the funny parts and cheered at the industry-populating part at the end. I made a mistake too — I forgot to run the query that deletes the duplicates, and had to rerun from the start to find where I went wrong. I love your energy and how calmly you carry the learner all the way through. One more thing I tried that I want to show you: in the populating part, we can do it without first converting the blanks to NULL. This is the query I tried:

UPDATE layoffs_staging2 t1
JOIN layoffs_staging2 t2
  ON  t1.company = t2.company
  AND t1.location = t2.location
SET t1.industry = t2.industry
WHERE (t1.industry IS NULL OR t1.industry = '')
  AND (t2.industry IS NOT NULL AND t2.industry != '');
@alphaghost4330 22 days ago
Hey, I'm working on the latest layoffs dataset. I matched some unknowns in the stage column by doing a self join on the company column; should I update those unknown values?
@peaceandlove8862 1 month ago
Alex's videos are always so real, authentic, and relevant!
@michaelp9061 1 month ago
Incredible tutorial Alex. Thank you!
@luckyraji6828 1 month ago
Thanks a lot. This tutorial is on point and timely. This is what I have been looking for.
@muneebbolo 1 month ago
Thanks for sharing this helpful content, Alex! We need more of this.
@user-uj2om9it6u 1 month ago
Thanks Alex, great video. The best tip so far was from the data cleaning vid: I didn't realize I could check the results before executing changes on the database — e.g. SELECT start_time, LENGTH(start_time), SUBSTRING(start_time, 1, 19) to check the truncation of that string prior to the real deal.
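The habit this commenter describes generalizes: run the transformation as a SELECT next to the original column, and only once the output looks right, put the same expression into an UPDATE. A sketch using the trailing-period cleanup applied to the country column in the video (the exact WHERE filter is illustrative):

```sql
-- Inspect the transformation side by side before committing it
SELECT country, TRIM(TRAILING '.' FROM country)
FROM layoffs_staging2
ORDER BY 1;

-- Only after the preview looks right, apply the same expression for real
UPDATE layoffs_staging2
SET country = TRIM(TRAILING '.' FROM country)
WHERE country LIKE 'United States%';
```

The key point is that the SELECT and the UPDATE share the identical expression, so what you previewed is exactly what gets written.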
@gauravtanwar8886 2 months ago
exactly what i was looking for! thanks a lot 🙌🏻
@thamizarasan1913 1 month ago
Thanks for doing a project in SQL. Waited for long.
@ratnakshtyagi3564 1 month ago
thanks alex for this data cleaning practice
@hazoom 1 month ago
I appreciate your work Alex, well done.
@newenglandnomad9405 2 months ago
Outstanding video. I did follow most of it, the rest I'll rewind and study. Definitely going to be doing the code along myself and posting to my portfolio. Thanks for the very detailed walk through. I am trying to get better so I can try this as a side hustle while looking for a data job. I have a comfy IS help desk job, I'm just bored to death of it and not learning anything new.
@sujayy6851 1 month ago
Thanks a lot for simplifying MYSQL Alex!
@cgadison 1 month ago
This was very insightful, thank you so much for this Alex.
@zakhelembhele7046 20 days ago
Alex, You're so natural. The Best yet!
@MrWonderninja 27 days ago
Learned a lot following along through this, excited to follow the EDA next!
@rakeshbasak6842 1 month ago
Awesome work Alex! And thanks for providing this kind of content.
@rokibhasan5184 1 month ago
Looking forward to next project
@Ladyhadassah 2 months ago
Great work, Alex. we love you
@SafiaSingla 1 month ago
This was an amazing tutorial!! Thank you
@hasanrazakhan7154 1 month ago
Thanks Alex for this amazing video
@derrickmedina2796 23 days ago
Great work! Love how it was all broken down
@yustone 1 month ago
Thanks, I really like this project
@ibrahimolasunkanmi7576 15 days ago
Alex The Analyst, You are a Blessing to this Generation...
@alice60372 1 month ago
Alex, you are the best! Thank you so very much. Please do more videos on data cleaning.
@Arslan-nm3rz 2 months ago
Thank you very much Alex!!!
@zaidahmad9735 2 months ago
Hey Alex, thanks for the video. Please cover data cleaning in Stata or R as well.
@leosch80 2 months ago
Excellent Alex!!! You read my mind, man! This is just what I needed to put in my portfolio. THANK YOU
@user-tz5cq6vh5k 1 month ago
Truly Blessing for us.
@yuko2732 2 days ago
Thank you so much, Alex. 😊
@dennisbunarta1190 1 month ago
I love this channel.. God bless you alex
@womanonamission8677 12 days ago
Took me all day, but yay — I'm done with my first SQL project!!
@harshitthakur8300 28 days ago
Great video and easy to learn from these kind of videos.
@krystalbrantley4996 1 month ago
Thank you so much for sharing your expertise. I learned and I laughed (your comments 😄) throughout the tutorial. You're awesome!
@AlexTheAnalyst 1 month ago
Haha glad to hear it! Learning should be fun :D
@AnalystInMaking 1 month ago
Has anybody told you that you are not just good but you are AWESOME !👑
@bayonette1 1 month ago
it was a very useful project, thank you very much 🙌
@medhaniemahari4265 1 month ago
...and another lesson taken. 47:35 — "I can't trust that data, I really can't!" We should get to this level before confidently deleting 'useless' rows! As always, Alex, you're the best. Thank you very much for all your contributions!
@kahinaabbas9792 2 months ago
just on time 🤩
@truthgaming2296 1 month ago
thank you alex, i love your explanation very much thank you XD
@chethuchethu6530 1 month ago
great explanation
@tcrawford8430 2 months ago
Thank you 🙏
@usmanabid5452 4 days ago
great work
@CalvinDsilva-cz8iq 9 days ago
Really Amazing videos
@AkashKumar-jo7ec 18 days ago
@Alex The Analyst, thank you so much for sharing genuine content. I have gone through lots of SQL tutorials by now, but there is one issue I'm stuck on: fixing the text type to Date. I hope when you find this message you can help me out.
@duurduranto 1 month ago
Absolutely phenomenal work. Thank you very, very much for this. It cleared up some concepts and produced a project at the same time. Absolutely love it.
@TheSupersayan6 1 month ago
Can you make a tutorial on how to connect this MySQL database to Power BI and build a dashboard for it?
@user-wg1kf5rc3h 2 months ago
Appreciated
@muneebbolo 1 month ago
I got the same results as you, except in the third step — removing the null or blank values. Anyway, I hope that in future I will get results as close to the desired ones as possible in the data cleaning process.
@MaidulRashidRaj 1 day ago
Everything is OK except I cannot update the date format. I've checked the preferences section and it's still not working 😢. Help!
@allyoucaneat12 1 month ago
Hi Alex, thanks for the good work. A quick question: when cleaning a dataset and you have an age column with a negative value, is it advisable to delete the negative age values or to assign another age to them? Also, where someone less than 20 years old has a PhD degree, should I assign another age or delete the rows?
@AlexTheAnalyst 1 month ago
Those are questions I would ask the person I got the data from. Since we don't have that, you just have to do the best you can. If you think the negative should be positive, that may be the right approach, or deleting the row altogether may be correct. Hard to say if we can't go back to the client or stakeholder.
@piromaniaco3579 19 days ago
Just finished this one — really fun and practical. Now heading to the EDA part. A question: I'm not sure how to post these projects in a portfolio. I normally publish projects on my GitHub page when it's web or app development, but I've never done SQL projects before. How should they be published to make them properly visible to recruiters, for example? Thanks Alex for all the value you share.
@charlesolukoju4809 1 month ago
Alex, thank you very much for this. Please, I'm having issues downloading the dataset.
@sayematazinshaikh6656 2 months ago
Nice!
@jaden13art50 18 days ago
Hey Alex, I was wondering if you could help me troubleshoot what I'm doing incorrectly. All my row numbers are different when I run the query at the 9:31 mark of the video.
@nickelikem 20 days ago
I find your method of removing duplicates too complicated. When inserting into the new table, I removed duplicates by running SELECT DISTINCT * .... Is there any downside to my method? Are there cases where it wouldn't work?
@andiralee408 13 days ago
Oh, me too! I used a UNION of the same table to get rid of duplicates, but SELECT DISTINCT * is just so much quicker and shorter! Thanks for sharing :) Great job, us! PS: Alex, if you're seeing nickelikem's comment, please also say whether there might be any downside to my method in the long run. Thanks!
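For context on the alternative discussed in this thread: SELECT DISTINCT * only drops rows that are identical in every column, and it gives you no chance to inspect the duplicates before they disappear, whereas the ROW_NUMBER() approach lets you pick which columns define a duplicate and review the extras first. A minimal sketch of the DISTINCT variant, assuming a second staging table created with the same schema (e.g. CREATE TABLE layoffs_staging2 LIKE layoffs_staging):

```sql
-- De-dup by copying only fully distinct rows into the new table
INSERT INTO layoffs_staging2
SELECT DISTINCT *
FROM layoffs_staging;
```

One concrete case where this differs: if two rows agree on company, location, and date but differ in a single other column (say, a typo in funds_raised_millions), DISTINCT keeps both, while a PARTITION BY on the chosen columns would flag one as a duplicate for review.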
@kevinmugo2662 1 month ago
In the part on dealing with null values, specifically checking where company is 'Airbnb', I'm getting 4 rows instead of 2. It's a clear duplicate and I can't figure out why.
@ShortClipsPodcasts 1 month ago
I'm having a problem importing the data: the original data has 2000+ rows, but when I import it, it only has 564. Does anyone know how to fix this issue?
@Pato-rt1vh 1 month ago
Same; if anyone knows a video I can watch to fix it, just let me know. 👍🏽
@SomeStatus 1 month ago
@@Pato-rt1vh convert that .csv into a JSON!
@piromaniaco3579 1 month ago
I am facing the same issue. I came to the comment section to see if anyone could shed some light on it. I really want to practice and do the project; it's frustrating to be stuck right before starting.
@ichigokurosaki6470 1 month ago
I’m having the same issue
@rahulganeshregalla1165 16 days ago
I faced the same issue. I don't have a solution, but a suggestion: just follow the video with whatever data you could import. It won't be perfect, but try to get what we're doing here. Then practice data cleaning on some raw data, which you can find on Kaggle.
@sachinsrivastava2177 1 month ago
The CTE is not working in MySQL version 8.03; it is showing a syntax error. How can I fix this?
@alphaghost4330 22 days ago
I matched some unknowns in the stage column by doing a self join on the company column; should I update those unknown values?
@ryanjames3235 1 month ago
I'm trying to do this project following the steps, but I'm having difficulty creating a new table from the existing table: the LIKE clause is not working, and the same goes for the AS and SELECT FROM statements.
@derrickmedina2796 23 days ago
Should I add this to my portfolio now, or finish the analysis part and then add both parts together?
@mcxineupdates1117 27 days ago
I'm unable to create the table using the copy-to-clipboard method. Please, is there any alternative?
@sanchitsharma2899 1 month ago
Is there any other way than manually scrolling through the data to standardize it?
@yousraahmedd 20 days ago
Regarding populating the rows: can I google missing info, such as the country, if it's missing?
@USA_Diariez 28 days ago
At 27:20, why are we using WHERE country LIKE the USA value? If we skip the WHERE condition, won't it fix every place in the column where there is a '.'?
@muneebbolo 1 month ago
Null values vs. blank values was the trickiest part for me to understand. I'm still trying to work out what just happened with that column 😀
@shekharwadhawan6707 15 days ago
Hey, I'm unable to import the whole dataset from the CSV file into MySQL; every time it returns 565 rows. Can someone suggest a solution to this?
@ppiercejr 1 month ago
I understand how easy the upload tool makes it to create a table and import the data all at once, but I find it cumbersome in this case since there is no unique column. Is there a reason you wouldn't create the table schema yourself with an auto-incrementing id column as the primary key, assigning a unique id to every row, then use the ROW_NUMBER function to search for duplicate rows across all the columns except the id column? This would save you from having to create a duplicate table to store row_num, since you could just use the id column to delete the duplicate records. It also seems like it would make the database easier to deal with, since it would have a proper primary key. Sure, it is a meaningless primary key, but it would make updating the data easier and faster in many cases.
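The surrogate-key idea above can be sketched as follows. The join columns are abbreviated for illustration (a real de-dup would compare every meaningful column), and note that plain `=` never matches two NULLs — MySQL's NULL-safe `<=>` operator can be substituted where columns may be NULL:

```sql
-- Hypothetical variant: give every row a surrogate key first
ALTER TABLE layoffs_staging2
ADD COLUMN id INT AUTO_INCREMENT PRIMARY KEY FIRST;

-- Delete the higher-id row of each duplicate pair, keeping the lowest id
DELETE t1
FROM layoffs_staging2 t1
JOIN layoffs_staging2 t2
  ON  t1.company  = t2.company
  AND t1.location = t2.location
  AND t1.`date`  <=> t2.`date`   -- NULL-safe compare for a nullable column
  AND t1.id > t2.id;
```

This trades the second staging table for an extra column on the original, which can then be dropped (or kept as a proper primary key) once the cleanup is done.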
@tanishquegupta 26 days ago
Hi Alex, can you explain how you are using ROW_NUMBER() without an ORDER BY?
@johithsriman2220 1 month ago
I'm using Oracle SQL Developer. When I use UPDATE with a JOIN it throws an error, and when I checked the error on Stack Overflow it said that Oracle does not support a JOIN clause in UPDATE statements. How can I work around this?
@huonggiang8277 8 days ago
I have a question. I did the exact same thing, but nothing comes up when I do the 'removing duplicates' part. I went through all the steps again and just realized that my table has only 564 records. I don't know why. Can you explain how to fix it?
@georgek398 1 month ago
Why does the data appear to be pre-cleaned? I'm not seeing the duplicates, and I'm not seeing the different crypto industries...
@AlexTheAnalyst 1 month ago
Are you getting the data from the github link?
@georgek398 1 month ago
@@AlexTheAnalyst thanks for your response. I am getting it from GitHub. With the help of the comments I believe I have figured it out: I had to convert the data to JSON and raise the 'Limit to 1000 rows' dropdown to something higher than the length of this data. Otherwise I was about to give up, so perhaps a description update would help other viewers. Now I just have to change all the 'NULL' strings in the JSON data into actual NULL values. Thanks again.
@ichigokurosaki6470 1 month ago
@@georgek398 Have you figured out how to change the null strings into null values? I'm stuck on that at the moment.
@razeeop5673 1 month ago
What if we clean our dataset in pandas and then do the EDA in SQL? But how can we open that pandas dataset in SQL?
@zeboulounyoan5723 25 days ago
Hi, amazing video; the explanations are awesome. I just had a question: when I did the import, only 564 records were imported. I don't understand why not all records are imported.
@AbhishekSharma-vm7tr 1 month ago
I am getting NULL when I try to convert the text date into date format. What can I do?
@aslamshaikhisdi 7 days ago
Hello, could anyone please help me with uploading this dataset to MySQL Workbench 8? The table has been created, but the data is not loading.
@kusumsharma7293 1 month ago
Hello sir, thank you for making such precious videos. I have subscribed and follow all your recent videos. I tried to do what you did here, but I'm doing everything in MySQL Workbench, and removing duplicates with duplicate_cte doesn't work there. Could you please suggest what I should do for MySQL Workbench?
@irfankhan-qj4mu 28 days ago
Sir, if you write table_name.date (the column name) it works better, because 'date' alone does not work in my Workbench. But all the same, you are super cool, Sir. Thanks and respect from Pakistan.
@Super8-wf7iy 1 month ago
When will you upload the next part?
@typicalstoic2089 3 hours ago
Is it normal that the results sometimes appear different from the video when I'm writing a query, even though I'm doing exactly the same thing?
@alamsgodwin6179 1 month ago
What software are you using?
@user-su9br6zz7e 23 days ago
Where do you get the data sets?
@inshakhan3356 1 month ago
I am unable to import the data file in SSMS; it shows errors while importing.
@saniya_thousif 13 days ago
I didn't understand how the ROW_NUMBER function is working. It basically assigns a unique number to each record, but in your case it's assigning the same number. @alex
@maxwelltsangya3810 1 month ago
Null handling, 46:29: these are coming from the dates. It appears that on these dates there were no layoffs, but the same companies might have laid people off on other days.
@rnielvhinsieans.elpidama6935 2 months ago
Question: can I still follow this if I use PostgreSQL/pgAdmin?
@huntercoleman1347 2 months ago
Yes. The SQL will largely be the same, except importing the data will be different. In PostgreSQL, you could try something like:

COPY the_table_name
FROM 'C:\Path_to_file\file_name.csv'
WITH (FORMAT CSV, HEADER);

Note: the table must already exist. HEADER means your CSV file has a header row, which the import should skip. If you decide to import via the pgAdmin GUI, it will differ from MySQL Workbench. Also, where Alex used STR_TO_DATE in MySQL, the equivalent in PostgreSQL is TO_DATE, and you don't need the percent signs before the format characters.
@potato-sweet 19 days ago
I'm unfortunately stuck right at the beginning T_T. When importing the table, my MySQL shows that only 564 records have been imported, but I know there's a total of 2631. I managed to find a query to import the data, and it shows the following error:

Error Code: 3948. Loading local data is disabled; this must be enabled on both the client and server sides.

Is anyone else facing this issue and knows how to solve it?
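For anyone hitting Error Code 3948: a commonly suggested fix is to enable local_infile on both the server and the client, then load the file with LOAD DATA. A sketch with a placeholder file path and table name; the exact field and line terminators depend on how the CSV is written:

```sql
-- Check whether local data loading is enabled on the server
SHOW GLOBAL VARIABLES LIKE 'local_infile';

-- Enable it (requires a privileged account)
SET GLOBAL local_infile = 1;

-- Load the CSV, skipping the header row
LOAD DATA LOCAL INFILE '/path/to/layoffs.csv'
INTO TABLE layoffs_staging
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
```

The client must also allow local loading — e.g. start the command-line client with mysql --local-infile=1, or enable the equivalent OPT_LOCAL_INFILE option in the Workbench connection settings. This route also tends to avoid the partial imports (564 of 2000+ rows) several commenters report with the import wizard.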
@pensenaute 15 days ago
I have this problem in MySQL with other datasets from Kaggle too. It's so frustrating 😭
@things_that 1 month ago
The laugh of triumph at 43:18-19 😂😂😂
@muneebbolo 1 month ago
I would love to know more about the SCHEMAS section of MySQL; I'm still learning it, so please suggest a video on this topic if you have already made one. Thanks in advance.
@bharathlingampalli4708 1 month ago
While importing the dataset, it automatically drops many rows; it has only imported 534 rows. Can anyone help me with this?
@J.L1212-dv9tr 29 days ago
Hi, I'm not sure about this, but did he forget to delete the duplicates?
@harshitthakur8300 23 days ago
It would be better if you divided the video into subtopics, so that we could come back and watch a specific part of the video.