Clean MESSY String Data in Pandas 

Rob Mulla
174K subscribers
81K views

When coding in #python with #pandas, you can easily clean messy string columns using built-in methods. Here we show an example of cleaning address values to standardize them. #datascience

Entertainment

Published: 3 Mar 2023

Comments: 55
@ErulianADRaghath 1 year ago
I love pandas for cleaning up problematic data sets before feeding them into a model. It's just so dang satisfying to see messy data turn into nice consistent points on a scatter plot :D
@robmulla 1 year ago
Totally agree. Pandas is a powerful tool for data manipulation and wrangling.
@Fubbel42 1 year ago
I would omit the " " in str.split (it defaults to splitting on all whitespace). It doesn't matter much if you use n=1, but with messy data chances are there are double spaces too, which can give you empty strings that cause issues down the line.
@robmulla 1 year ago
I didn’t realize whitespace was the default. Good to know. Good point about double spaces.
@G2Chanakya 10 months ago
@robmulla Yes. Also, in some places the separator could be commas and other characters.
@Ras-kr5nw 10 months ago
What would be a solution? I have a dataset with many different kinds of "spaces" (double, triple, ...) and I just can't figure it out.
@ruidodevinilo 8 months ago
@Ras-kr5nw Maybe a regex: r'\s+'. That means match one or more whitespace characters.
@RubenRyuZMoya 4 months ago
@Ras-kr5nw First, use a regex pattern to replace every run of more than one whitespace character with a single space.
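A minimal sketch of the whitespace points raised in this thread. The `addresses` Series and its values are hypothetical (the video's actual data is not reproduced on this page); the calls used are `Series.str.split`, `Series.str.replace`, and `Series.str.strip`.

```python
import pandas as pd

# Hypothetical messy addresses with repeated spaces.
addresses = pd.Series(["123  Main St", "45 Oak   Rd"])

# Splitting on a literal " " treats every single space as a separator,
# so a double space leaves stray whitespace (or empty strings without n=1).
with_literal_space = addresses.str.split(" ", n=1)

# Omitting the separator splits on any run of whitespace.
with_default = addresses.str.split(n=1)

# Collapse each run of whitespace to one space, then trim the ends.
normalized = addresses.str.replace(r"\s+", " ", regex=True).str.strip()

print(with_literal_space.tolist())
print(with_default.tolist())
print(normalized.tolist())
```

Normalizing first and splitting afterwards keeps later steps from having to care which separator was used.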
@kmateti 1 year ago
This content is exactly what I needed. Thank you!
@robmulla 1 year ago
You're so welcome!
@Blessed_91 2 months ago
Great and informative short!
@UnholyRenton 1 year ago
Nice! I would also recommend using method chaining to make it a bit more readable.
@robmulla 1 year ago
Great point! I have a video with pandas tips and that's one of them. I should use my own advice :D
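A small sketch of what chaining could look like for this kind of cleanup. The `address` column and the specific steps are assumptions chosen for illustration, not the code from the video.

```python
import pandas as pd

# Hypothetical data; the column name "address" is an assumption.
df = pd.DataFrame({"address": ["  123 main st ", "45  OAK rd"]})

# Each step returns a new Series, so the steps can be chained
# inside parentheses and read top to bottom.
cleaned = (
    df["address"]
    .str.strip()                           # drop leading/trailing whitespace
    .str.replace(r"\s+", " ", regex=True)  # collapse repeated whitespace
    .str.title()                           # normalize capitalization
)

print(cleaned.tolist())
```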
@jonathangarciasilveira4297 1 year ago
I love your channel. Keep it up
@robmulla 1 year ago
Thanks man!!
@daironperezfrias7819 1 year ago
I'm just beginning in this world; this is very helpful for me. Thanks!
@robmulla 1 year ago
Glad you found it helpful!
@DiegoEmeGe 8 months ago
Thank you so much!
@CarolinaMunoz-vy3ni 1 year ago
Awesome 🎉 thanks
@robmulla 1 year ago
Glad you liked it!
@gemfire400 1 year ago
Thank you!
@robmulla 1 year ago
No. Thank you!
@ZeuSonRed 10 months ago
You are rising
@code2compass 8 months ago
Wow
@rahul98003 1 year ago
Nice video 👍 I don't know how to code, but I can relate this to MS Excel.
@robmulla 1 year ago
Thanks! If you use Excel you should definitely try out pandas.
@gsp_admirador 1 year ago
Nyce
@julesdrums6167 1 year ago
This is rad.
@robmulla 1 year ago
Thanks!
@rikaminski 1 year ago
dict_rp = {'St.':'Stress', 'Rd':'Road'} df_data['Address'].replace(dict_rp, regex = True)
@robmulla 1 year ago
I like it. But I think 'Stress' is a typo.
@LethalLuggage 6 months ago
That data looks so clean I'm jealous. This wouldn't work on the address data I deal with
@tintindb 1 year ago
😮👍
@robmulla 1 year ago
🙌
@littlepianist89 1 year ago
The second line of code raises an error for me (TypeError: string indices must be integers). Does anyone know why this happens? When I'm not trying to reassign the column it works just fine.
@Fine_Mouche 1 year ago
What does the $ do in the strings, since regex is set to False?
@robmulla 1 year ago
Good catch. I had to change regex to True but it got cut out of the video.
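To illustrate the point about `$`, here is a minimal sketch. The sample addresses and the " St" pattern are assumptions; the exact code from the video is not shown on this page. With `regex=False` the pattern is matched as literal text, so `$` is just a dollar sign; with `regex=True` it anchors the match to the end of the string.

```python
import pandas as pd

# Hypothetical addresses; only the trailing "St" should be expanded.
addresses = pd.Series(["12 St Marks St", "9 Elm St"])

# regex=False: looks for the literal text " St$", which never occurs.
literal = addresses.str.replace(" St$", " Street", regex=False)

# regex=True: "$" anchors the match to the end of each string.
anchored = addresses.str.replace(r" St$", " Street", regex=True)

print(literal.tolist())
print(anchored.tolist())
```

Passing `regex` explicitly avoids relying on the default, which is `False` in pandas 2.0 and later.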
@vinikun9105 1 year ago
Can I know what kind of software you're using, please?
@Xarxes104 6 months ago
The "ohh it looks better now" feeling after cleaning up some dogshit data.
@pewster31 6 months ago
Any suggestions on dealing with date strings? I can’t seem to parse them into a date object to save my life. Formats are all over the place: mmddyyyy, yyyymmdd. Nightmare.
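One possible approach to mixed date layouts, sketched under the assumption that only the two formats named in the comment (mmddyyyy and yyyymmdd) occur. The sample values are hypothetical. Each layout is tried with `pd.to_datetime(..., errors="coerce")`, which turns non-matching rows into `NaT`, and the results are then combined.

```python
import pandas as pd

# Hypothetical mixed-format date strings.
raw = pd.Series(["03152023", "20230316", "12012022", "not a date"])

# Try each known layout; rows that do not fit become NaT.
as_mdy = pd.to_datetime(raw, format="%m%d%Y", errors="coerce")
as_ymd = pd.to_datetime(raw, format="%Y%m%d", errors="coerce")

# Prefer the mmddyyyy result and fall back to yyyymmdd where it failed.
parsed = as_mdy.fillna(as_ymd)

print(parsed)
```

A string that happens to fit both layouts resolves to whichever format is tried first, so rows left as `NaT` and any ambiguous values are worth inspecting by hand.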
@alejandropu 1 year ago
Next video: how to get time control of shorts on youtube with Python ;)
@robmulla 1 year ago
Hah. I don’t know why it doesn’t work for you. I can scrub forward and backwards no problem…
@alejandropu 1 year ago
@robmulla My workaround is sometimes to tweet a Short; then I can use the time controls. But not on the RU-vid website.
@MachineLearningPro 7 months ago
Great video! Take a look at my Pandas tutorial if you want.
@br4252 7 months ago
Bro, some condos and apartments are labeled by the half (e.g., 354.5 urmoms lane). You just messed it all up in a matter of 10 seconds.
@sirJ0rd4n 1 year ago
What is this terminal, dude?
@robmulla 1 year ago
VS Code + Jupyter.
@rikaminski 1 year ago
Replace function with dictionary...
@robmulla 1 year ago
That’s a great idea. Have a code example of how I could do it?
@rikaminski 1 year ago
I will do it here and send it to you later. My code here is direct: when I use a dictionary like { 'Y': 'Yes', 'N': 'No'...} on a column, replace just works, but in this case the data needs to be handled a bit first.
@rikaminski 1 year ago
@robmulla dict_rp = {'St.':'Stress', 'Rd':'Road'} df_data['Address'].replace(dict_rp, regex = True)
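A runnable sketch of the dictionary idea above, with a few adjustments that are assumptions rather than the commenter's code: the sample addresses are hypothetical, 'Street' is used as the intended replacement, and the keys are written as stricter regex patterns. In a regex, a bare `.` matches any character and `Rd` would also match inside longer words, so the patterns below use a word boundary and a lookahead.

```python
import pandas as pd

# Hypothetical data reusing the df_data / Address names from the comment.
df_data = pd.DataFrame({"Address": ["12 Main St.", "5 Oak Rd", "7 Stone Rd."]})

# Keys are regex patterns: the abbreviation must start at a word boundary,
# may end with a literal dot, and must be followed by whitespace or the end.
dict_rp = {
    r"\bSt\.?(?=\s|$)": "Street",
    r"\bRd\.?(?=\s|$)": "Road",
}

# replace returns a new Series, so assign it back to keep the change.
df_data["Address"] = df_data["Address"].replace(dict_rp, regex=True)

print(df_data["Address"].tolist())
```

Keeping the mapping in one dictionary makes it easy to add more abbreviations without adding more lines of replacement code.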
@unapologeticchetan6566 1 year ago
Will Excel Hold My beer 🍺 😅
@unapologeticchetan6566 1 year ago
*while
@robmulla 1 year ago
😂