
Clean Excel Data With Python Pandas - Removing Unwanted Characters 

Derrick Sherrill
84K subscribers
111K views

Hey Everyone, in this one we're looking at the replace method in pandas to remove characters from your spreadsheet columns.
Be sure to post what you want to see next!
Kite helps fund the channel, thanks for checking them out and supporting me --
⭐ Kite is a free AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. www.kite.com/g...
Support the Channel on Patreon --
/ derricksherrill
Join The Socials --
Reddit - / codewithderrick
FB - / codewithderrick
Insta - / codewithderrick
Twitter - / codewithderrick
LinkedIn - / derricksherrill
GitHub - github.com/Der...
*****************************************************************
Full code from the video:
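import pandas as pd

excel_file_path = 'office_info.xlsx'
df = pd.read_excel(excel_file_path)
print(df.head(2))

# Strip every non-word character (anything except letters, digits, underscore) from each column
for column in df.columns:
    # regex=True is needed on pandas 2.0+, where str.replace matches literally by default
    df[column] = df[column].str.replace(r'\W', "", regex=True)

df.to_excel("removed_characters.xlsx")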
github.com/Der...
Packages (& Versions) used in this video:
Python 3.8
Pandas 0.25
Atom 1.41
*****************************************************************
Code from this tutorial and all my others can be found on my GitHub:
github.com/Der...
Check out my website:
www.derrickshe...
If you liked the video - please hit the like button. It means more than you know. Thanks for watching and thank you for all your support!!
--- Channel FAQ --
What text editor do you use?
Atom - atom.io/
What Equipment do you use to film videos?
Blue Yeti Microphone - amzn.to/2PcNj5d
Mic sound shield - amzn.to/3bVNkEt
Soundfoam - amzn.to/37NV9ci
Camera desk stand - amzn.to/3bX8xhm
Box Lights - amzn.to/2PanL95
Side Lights - amzn.to/37KSNut
Green Screen - amzn.to/37SFFnc
What computer do you use/desk setup?
Film on imac (4k screen) - amzn.to/37SEu7g
Work on Macbook Pro - amzn.to/2HJ5b3G
Video Storage - amzn.to/2Pey8sw
Mouse - amzn.to/2PhCtv3
Desk - amzn.to/37O1Mv1
Chair - amzn.to/2uqHE4E
What editing software do you use?
Adobe CC - www.adobe.com/...
Premiere Pro for video editing
Photoshop for images
After Effects for animations
Do I have any courses available?
Yes & always working on more!
www.udemy.com/...
Where do I get my music?
I get all my music from the copyright free YouTube audio library
www.youtube.co...
Let me know if there's anything else you want answered!
-------------------------
Always looking for suggestions on what video to make next -- leave me a comment with your project! Happy Coding!

Published: 20 Nov 2019

Comments: 199
@CodeWithDerrick 4 years ago
Thanks for being here! What do you want to see next?
@shahzan525 4 years ago
Make one on merge sort.
@andrzejandrzejos6674 4 years ago
I'd love to see a stock filter that scrapes data from finviz :)
@KostasPanagias 4 years ago
Hi Derrick! I would like to see a way (if there is any) to extract part of a large excelsheet which is highlighted with specific color (font or background color). For example in an excel with 1000 rows, to extract only those rows that have yellow color (or red font, or even combination) in a new excel file (with the first row which could be the data labels).
@cu806 4 years ago
I keep receiving the following error. I am trying to remove the pipes from my Excel spreadsheet.
Traceback (most recent call last):
  File "C:/Users/User/Documents/remove_characters.py", line 18, in <module>
    df[column] = df[column].str.replace('\|\|','')
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\pandas\core\generic.py", line 5175, in __getattr__
    return object.__getattribute__(self, name)
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\pandas\core\accessor.py", line 175, in __get__
    accessor_obj = self._accessor(obj)
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\pandas\core\strings.py", line 1917, in __init__
    self._inferred_dtype = self._validate(data)
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\pandas\core\strings.py", line 1967, in _validate
    raise AttributeError("Can only use .str accessor with string " "values!")
AttributeError: Can only use .str accessor with string values!
Here is my code and I formatted all my data as text as well. Added Noob comments in code lol
import pandas as pd
import os
# file should be formatted as text using the format cells>text option
# since we're not working in the same file directory you need to specify the file path
# Make sure you are using double backslashes to separate and single quotes to enclose the path inside parentheses
excel_file_path = ('C:\\Users\\User\\Documents\\Teset2.xlsx')
# creates data frame and reads from excel
df = pd.read_excel(excel_file_path)
# only returns the first 2 lines
print(df.head(2))
for column in df.columns:
    df[column] = df[column].str.replace('\|\|','')
print(df)
df.to_excel('C:\\Users\\User\\Documents\\Clean_Data.xlsx')
@customnotion 4 years ago
But for the r'' is for raw string?, Can we use it for regex also?
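On the raw-string question: the r prefix only changes how Python reads backslashes in the literal, while it is the regex argument of str.replace that decides whether pandas treats the pattern as a regular expression. A minimal sketch, using made-up sample names:

```python
import pandas as pd

# A raw string and a normal string with an escaped backslash hold the same text.
same_pattern = (r'\W' == '\\W')

# Hypothetical sample values, not taken from the video's spreadsheet.
names = pd.Series(["O'Brien", "Smith-Jones", "Lee (Jr.)"])

# regex=True asks pandas to interpret the pattern as a regular expression;
# no explicit "import re" is needed because pandas uses it internally.
cleaned = names.str.replace(r'\W', '', regex=True)
print(cleaned.tolist())
```

The raw prefix is a convenience so that backslashes in the pattern do not have to be doubled.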
@aatsw 3 years ago
I've binge-watched quite a lot of your videos in the last two days. Amazing contents, and you always managed to keep it concise yet very clearly explained. Thank you. Please keep updating.
@vanakornsirijongprasert1726 3 months ago
I like how you went back explaining each methods and what they would do.
@aduck24 4 years ago
This was what I needed and you made it so clear to understand. Thank you 😊
@subzeroLV 4 years ago
Thank you Derrick! That was exactly what I was looking for! I have no doubt I can apply this to my situation. I really appreciate you taking the time to make this video!
@CodeWithDerrick 4 years ago
I'm glad it works for you. Always happy to help!
@kadhraedgerly4267 3 years ago
Great video! I'm currently trying to remove the units (and characters) in 3 columns of a dataset so I can gain greater insights and analytics. Basically, I want to just leave the numbers and convert it to a float from an object. I'm working in Python for this project. Would love a video on this!
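One possible approach to the units question above, sketched with a hypothetical "weight" column: keep only digits and the decimal point, then convert the result to float. The column name and values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical column with units mixed into the values.
df = pd.DataFrame({"weight": ["12.5 kg", "7kg", "100 kg"]})

# Remove everything that is not a digit or a decimal point, then cast to float.
# Note: this also drops a leading minus sign; add "-" to the character class
# if the data can contain negative numbers.
df["weight"] = (
    df["weight"]
    .str.replace(r'[^\d.]', '', regex=True)
    .astype(float)
)
print(df["weight"].tolist())
```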
@sahilkhan2470 1 year ago
clear, easy and straight to the point. A real man
@amritamishra2313 3 years ago
I Was stuck here since 2 hours..your code did help..!!Thanks...
@namitachadha3882 4 years ago
Great!! Thankyou. But when i execute the for loop for removing the special character ,cell consisting of ''no special character'' or noise word is giving empty cell . While exporting, I am getting only the clean data..I dont know why is this happening.. Please help!!
@shawnpetersen5338 4 years ago
Great video, keep up the good work. I just started my Data Analytics journey and using "R" but self-teaching myself on Python.
@mohamed4743 3 years ago
Thank you Derrick, great video. I'm new to Python and was a bit sceptical about using it with Excel. I'm quite proficient in Excel, but this seems faster than Excel's text-to-columns functionality. I'll give it a bash and see.
@jesusbvasquezq6597 2 years ago
Excellent video, straight to the point and clearly explained. It helped me a lot. Thanks a million for this.
@vipanpatial2243 1 year ago
my csv have brackets in one my column for exp [ ALHSD1 (125mm) ALHSD2 (175mm) ALHSD3 (225mm)] , how i can remove these brackets and keep the text and numbers. Please help
@darshitsolanki7352 3 years ago
I like really easy and short videos to explain things and ur contents are really like mine category 😂
@petterstensland3888 4 years ago
Derrick, super thank you for these videos. Invaluable!
@samr598 4 years ago
This was what I needed and you made it so clear to understand. Thank you
@sheyi5471 10 months ago
Your English was clear and understandable. But please how do I remove a specific character from a list in excel for instance I want to remove PG from the list of PG134589…………PG463737, PG637389. Your response would be highly appreciated. Thank you
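For removing a fixed prefix such as PG, a small sketch: anchoring the pattern with ^ limits the replacement to the start of each value. The Series below simply reuses the IDs quoted in the comment.

```python
import pandas as pd

ids = pd.Series(["PG134589", "PG463737", "PG637389"])

# "^PG" matches the two letters only when they begin the string,
# so a "PG" appearing later in a value would be left alone.
stripped = ids.str.replace(r'^PG', '', regex=True)
print(stripped.tolist())
```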
@adeboyeopatimehin578 2 years ago
Good job, pls trying to do this on a data set that is on multiple excel spreadsheets. Will appreciate any useful input.
@ZaidSoftware 3 years ago
Hi @Derrick Sherrill how can split words in one column to many columns in Excel by python thank for everything
@DebayanKar7 4 years ago
What if I have a dataset full of mistyped words and a well-defined dictionary of correct, standard words? Is there an efficient way to replace the mistyped words in pandas?
@jaredoirere4746 5 months ago
Brilliant and precisely well explained
@hasibulhasan2798 4 years ago
Thanks man! This was very helpful also to remove unnecessary character from csv data.
@johnnynicolas4622 2 years ago
thanks I spent about two days to solve that problem finally you help me !
@sudiptomitra 3 years ago
Very simple, yet so effective & precise !
@JohnsonKongor 4 years ago
Thanks, Derrick. I love this cause I work with many spreadsheets.
@darshitsolanki7352 3 years ago
Amazing video 🎥 keep hustling 🔥🔥🔥🔥
@shankerm3959 4 years ago
Excellent job How about an advanced regex tutorial? That would be awesome to have.
@matattz 1 year ago
I need a recommendation for the BEST Regex course online, or books!It should be easy to digest but also all you really need. Thanks guys!
@leeblack2103 2 years ago
What if you wanted to reduce the length of each field? For example, truncating any field that exceeds 35,000 characters.
@ilmiahgunggus7078 2 years ago
So when we use str.replace(special_characters), a column which contains integer will be ignored ?
@aqiltank 3 years ago
Love from Malaysia, I like your videos. Keep it up 😇
@luismoreyra6804 3 years ago
wow! i'm really amazed with this vid! you saved me a lot of headaches dude!! thanks a lot!!!
@joxa6119 2 years ago
Question. How can we remove such data that has space in the middle of the value? ie "123 4445"
@jayh240 2 years ago
Can anyone help me as given code is not working for df[column] = df[column].str.replace(r'\W',"") .str.replace is not working
@Richkotite1 2 years ago
Great video, I wanted to output to excel at the end of the routine but its not creating the file. It worked in your "if replacement video" but not here, any ideas?
@douglaslopez2311 2 years ago
Thanks for your video. I tried to delete any special character of my pandas DF as you explained and I still have the error "None of Index are in the [columns]". Do you know what is the cause? Thank you.
@hassanmahamat-pz8fx 3 months ago
Very clear explanation
@brendafosmire6519 4 years ago
Thanks. Very useful for me since I’m learning Pandas.
@rish_1823 1 year ago
Hi Derrick how could we remove special characters from starting of a string and ending of a string in a dataframe
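To trim special characters only at the start and end of each string, one option is a pattern with two anchored alternatives. This is a sketch on invented sample values; characters in the middle of the text are left untouched.

```python
import pandas as pd

# Made-up values with junk at the edges.
s = pd.Series(["##hello, world!!", "--data--", "clean"])

# "^\W+" matches a run of non-word characters at the start,
# "\W+$" matches a run at the end; the "|" applies either one.
trimmed = s.str.replace(r'^\W+|\W+$', '', regex=True)
print(trimmed.tolist())
```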
@enricomendiola9952 5 months ago
Hello Derrick great video. I just want to ask what IDE or editor are you using in this video?
@torque6389 4 years ago
Another great video! You are very good at explaining things.
@samuelmensahbaffoe8303 3 years ago
hi, how do you remove Numbers attached to letters as a string in a column
@monihareddy5491 3 years ago
You are a life savior for me. Thank you so much for this.
@cindywang2658 2 years ago
This is very helpful. Thank you! Do you have example to remove all characters that are not in ascii[0-127]
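For dropping everything outside the ASCII range 0-127, a character class built from hexadecimal escapes is one way to do it. The sketch below uses invented values; note that accented letters are deleted rather than converted to their unaccented form.

```python
import pandas as pd

# Sample values written with escapes so the accented characters are unambiguous.
s = pd.Series(["caf\u00e9", "na\u00efve", "plain"])

# [^\x00-\x7F] matches any character whose code point is above 127.
ascii_only = s.str.replace(r'[^\x00-\x7F]', '', regex=True)
print(ascii_only.tolist())
```

The same pattern would also strip emoji and non-Latin scripts, since those all sit outside the ASCII range.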
@soegengriyadi7459 1 year ago
Thanks for the tutorial. Could you help me how to remove time with format 23:58:00+05 become 23:58:00
@shivanireddy7734 2 years ago
Hi, can you please do a video on how to remove text from an image? Maybe by using Opencv with python
@GracefulTalesPluto 9 months ago
Hi, I would like to do a v-look up using two excel tables in python, don't want to do it in excel but would like python to do it.
@habibtmg 3 years ago
Hey, kindly make a video on file paths where we are outside the same directory and can perform different calculations. Thanks.
@MildlyAmusingComedyC 3 years ago
Awesome Simple video. Exactly What I was looking for
@user-jw7jb9sd5s 7 months ago
Hi, my code is not working, can you please help me?
import pandas as pd
df=pd.read_excel(r"E: eg_cleaningfile.xlsx")
for column in df.columns:
    df[column]=df[column].str.replace(r'\W',"")
print(df)
@swatisingh4041 2 years ago
Please make a video on removing elements and their corresponding numeric values from an alloy. For example, Al20Co5Fe6 Cr2, How to split these numeric values in individual columns?
@bennri 3 years ago
A bigger problem are things like donâ€TMt which should be changed to don't. I found python's ftfy module to be useful for this problem which arises because Windows Excel uses windows-1252 encoding by default, but pandas interprets it as utf-8. It affects not only apostrophes but also accents like café, “quotes,” etc.
@geriray4412 2 years ago
Hi, how do I apply this with multiple sheets from the same workbook?
@melinaballario2649 4 years ago
Hi Derrick! I have a problem... (r'\W', '') deleted all the special characters but also it deleted the spaces ! how can i keep the spaces?
@sameerbhale6989 2 years ago
Replace it by “ “ instead of “” in replace
@snakesnarroz 4 years ago
Thanks for this. Oddly my excel file has all columns formatted as text. However, running the script throws an error AttributeError: Can only use .str accessor with string values! This is due to one of the columns is not a string, its a float64 or something. Any advice on how to correct this?
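The AttributeError reported here and in a few other comments comes from calling .str on a column that pandas loaded as numbers, regardless of how the cells are formatted in Excel. One way around it, sketched on a made-up frame, is to clean only the columns that actually hold text. Related: when a text column contains a few numeric cells, .str.replace returns NaN for those cells, which may explain the emptied cells some commenters saw.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

# Hypothetical data: one text column and one numeric column.
df = pd.DataFrame({
    "Name": ["A.n.n", "B@b"],
    "Salary": [50000.0, 62000.5],
})

for column in df.columns:
    # Skip numeric columns so the .str accessor is never used on them.
    if is_string_dtype(df[column]):
        df[column] = df[column].str.replace(r'\W', '', regex=True)

print(df)
```

An alternative is converting every column with astype(str) before cleaning, at the cost of having to convert numeric columns back afterwards.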
@ShadabKhan-sn9xc 3 years ago
Hi you are great May i please request you to make video on excel formulas using python and getting it upto last row?
@conscioussleeppill1093 3 years ago
Great 👍.. I got this useful for my Current code
@yibrahim7 2 years ago
Hey man, thanks a lot. great work, simple and to the point just like we want it. but I have a question, in last step, when I removed the characters from all the columns, I had one cell that was removed completely(originally it contained only numbers and was converted to text as per your advise) so could you help on the reason and solution please?
@raphaelokoye4310 4 years ago
Nice video. Very explanatory. The report I generate on a weekly basis is so messy. The software churns out four reports on a single sheet. When loaded in pandas most columns are lost in the dataframe and their corresponding column names in pandas becomes unnamed.. still figuring out the best way to handle that report. I could clean it in Excel but want to automate it with python
@keifer7813 1 year ago
I thought typing "r" before a string makes it a raw string. I didnt know it had anything to do with regular expressions I thought you need to import the 're' module to use regex
@MAYUKHization 2 years ago
If I only want to remove the white spaces, not any special characters, what should I do?
@eking3469 2 years ago
How do we write code to not generating column A in the cleaned spreadsheet?
@jyotiali5670 1 year ago
I want to clean a column and replacing all the special characters with correct letters , how do I do
@rushas 4 years ago
Thanks for the good content. The other day I needed to replace a non-empty column with non-empty column in the same sheet. Assume that we have these columns: col-A: Names col-B: Old phone numbers col-C: Email addresses col-D: Some other infos col-E: New phone numbers I simply want to replace col-B with the col-E. You know, in the Excel you can just select col-E and cut the column then select col-B and insert the new column and then delete the old data (col-B). I was wondering if there is an easy way to solve this kind of problems with pandas?
@terboonway4346 4 years ago
Thanks for this great video! May I know what text editor do you use? Im using Jupyternotebook, and it seems like your text editor is better, as it has auto fill up function
@howstuffworks-channel 1 year ago
This is amazing! Thank you! I wonder if there is a way to remove non-english characters as well like those in Chinese etc. from a list.
@lulu_achoo 2 years ago
could this work for emojis?
@deepakmehra8627 3 years ago
Some rows have three names and some rows have two name in one column. I want to split name in separate columns. How can I do this please help and make a video
@hemantsah8567 4 years ago
Great Work Dude... Can you make a video on extracting and cleaning data from excel file? e.g. those excel sheets contain some row as headers or uneven number of rows.
@nkunam 4 years ago
You are the best Derrick. Thank you.
@sirfsimran482 3 years ago
Hey Derrick, I have a question. I successfully ran this code and it's working perfectly, but I want to wrap this up, I mean I want to present it as a tool, so how can I do it? Can you please help me? I want every other individual to be able to use it as well, just by following some steps. Awaiting your comment. Abhishekk
@cordularaecke 4 years ago
Great video, you explain things very clearly. I take your point about different approaches to apply cleaning function. You could perhaps elaborate about more advanced pandas like using lambda functions? For example:
cleanup = lambda x: x.astype(str).str.replace(r'\W', '')
df = df.apply(cleanup)
# restore salary to integer value
df.Salary = df.Salary.astype(int)
# export to excel without index (not normally needed)
df.to_excel('removed_characters.xlsx', index=False)
What about grouping and aggregation? Or timeseries analysis ... I'm terrible at all of them ! :) Thanks again for great series.
@CodeWithDerrick 4 years ago
Nice catch on returning the salary to int type, totally slipped my mind. I tend to stay away from Lambda's because my dislikes on the video tend to skyrocket haha. Very useful though, just hard to implement and not lose beginners. A full video on them in the future is a good idea though! I've done a little bit with groupby from time to time but it would be good to do a deep dive on more techniques. Timeseries are a must in the future too! Thanks for your kind words.
@srikanthp503 3 years ago
Hi Bro, hope you are doing good. I have a list of columns in an Excel file and I need to remove special characters in only one column. Please help me.
@ramprakash0 2 years ago
Can you show me how to replace empty spaces with nan in a data frame
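For turning empty or whitespace-only cells into NaN, DataFrame.replace accepts a regular expression when regex=True is passed. A minimal sketch on invented data:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with empty strings and whitespace-only cells.
df = pd.DataFrame({
    "city": ["Paris", "", "   "],
    "code": ["A1", " ", "B2"],
})

# "^\s*$" matches a cell that is empty or contains nothing but whitespace.
df = df.replace(r'^\s*$', np.nan, regex=True)
print(df)
```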
@greenchiptechnology5 4 years ago
Hi Derrick, great video Create one video for set timer and auto trigger py script to run excel work and save in excel as xlsb format.
@musamushi1733 11 months ago
what if i actually want to remove number and special characters? plz reply mr @Derrick Sherrill
@ridingdatatoGenAI 3 years ago
Hey Derick, great video. Could you assist around the situation wherein COlumn 22K 2234M 12K 23M Where 'M' occurs need to multiply by 1000 & where 'K' happens to leave it as it & the column type is string this needs to be changed to float
@MrStudent1978 4 years ago
Can you please show how the word spellings can be corrected after removing special characters in an excel file...your video was really very helpful! Thanks for sharing! Respects to you from Punjab India
@Snooch5991 3 years ago
you just saved my homework, thank you good sir
@divyashreebg6124 4 years ago
i want split one EXCEL of 45k number of rows into multiple excels , split by rows of 15k each. how to do it
@manideep7717 3 years ago
Hi, I am unable to convert -1 values to NAN in Excel can you help me
@saradadras6526 1 year ago
Do you think you can post a code to remove a colored part of text?
@parshuramsinghthakur6405 4 years ago
Thank you Derrick for sharing this. It helped me a lot. Can you please help me on how to do this - Remove special characters from the column headings and replace all spaces with single underscore
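Column headings can be cleaned with the same string methods, because df.columns exposes a .str accessor too. The sketch below uses invented headings: it drops special characters first, then collapses each run of spaces into a single underscore.

```python
import pandas as pd

# Hypothetical headings with punctuation and repeated spaces.
df = pd.DataFrame(columns=["First  Name", "Salary ($)", "Start-Date"])

df.columns = (
    df.columns
    .str.replace(r'[^\w\s]', '', regex=True)  # remove special characters, keep spaces
    .str.strip()                              # drop spaces left at the edges
    .str.replace(r'\s+', '_', regex=True)     # any run of whitespace becomes one underscore
)
print(list(df.columns))
```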
@nicolasjousson5162 3 years ago
Great content, thank you Derrick. Do you know if this would also remove Enter's from a cell?
@phil.pinsky 4 years ago
Fantastic video, thanks for the great content! Very tangential question, but how are you doing your face at the bottom right corner during the screen recordings?
@CodeWithDerrick 4 years ago
Thanks for the kind words! I'm using a green screen behind me. So I key the screen out and resize the headshot during editing with premiere pro.
@testbetadasa4367 4 years ago
Thank you Derrick! can we extract Excel charts(bar,stacked charts) to power point presentations(PPT) ,if possible can you make a video on it.
@nhibnguyen5547 4 years ago
Thank you so much for sharing Derrick! This video is really helpful. Also, how can we remove the unwanted characters but still keep the space?
@CodeWithDerrick 4 years ago
Thanks for your kind words! You would just change the regular expression, so the W in this script. Haven't tested it (So you might need to tweak it) but I think it would look something like this: ^a-zA-Z0-9_ Where the underscore would represent the white space. Happy coding!
@muhanadkais 3 years ago
@@CodeWithDerrick What about the opposite case where to remove only the spaces and keep the characters?
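Both cases in this thread can be sketched with small pattern changes. The reply above is described as untested; in a regular expression an underscore matches a literal underscore, while whitespace is written as \s. The sample values below are made up.

```python
import pandas as pd

s = pd.Series(["New York, NY!", "123 4445"])

# Case 1: remove special characters but keep spaces.
# [^\w\s] matches anything that is neither a word character nor whitespace.
keep_spaces = s.str.replace(r'[^\w\s]', '', regex=True)

# Case 2: remove only whitespace and keep every other character.
no_spaces = s.str.replace(r'\s+', '', regex=True)

print(keep_spaces.tolist())
print(no_spaces.tolist())
```

Replacing with " " instead of "", as suggested in another reply, keeps word boundaries but turns every special character into an extra space.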
@radoonhadoon 4 years ago
hey, what should i do if i need to remove all the special characters and also the numbers
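To remove digits together with special characters, one option is to keep letters only. This sketch assumes plain English letters are all that should survive; spaces, underscores and accented letters are removed as well, so the character class would need extending for other needs.

```python
import pandas as pd

# Made-up values mixing letters, digits and punctuation.
s = pd.Series(["Room #42B", "abc123!", "x_y-9"])

# [^A-Za-z] matches every character that is not an unaccented letter.
letters_only = s.str.replace(r'[^A-Za-z]', '', regex=True)
print(letters_only.tolist())
```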
@findthetruth3021 4 years ago
How to create a dashboard in python by using importing raw data from Excel?
@muhammadnouman307 1 year ago
THANKS BROTHER
@sirfsimran482 3 years ago
Thank You Derrick Bro :) Keep up the good Work ;)
@waynefmj 3 years ago
Very nicely done
@abhishekb1324 3 years ago
How do u overwrite and save it in same file ?
@findthetruth3021 4 years ago
I love your videos, but I want to make a data entry form interface in Python which feeds the data to SQL Server, an Excel sheet or a Google Sheet. Thanks in advance.
@andreswanepoel4826 4 years ago
HI please do a demo on searching for specific characters in a cell (Search and contain)
@bayanalkhatib5817 3 years ago
Hello Derrick Do you think I can remove all junk characters except the normal characters using this script? I’m very very new to python and programming
@akshaykhoyani1336 4 years ago
Bro one thing i want to ask I want to collect lots of customer number from website but can't export it in excelsheet so that i want to collect it one by one through change domain path serial number and than i copy customer contact number and name but its too large data so its too boring work but too important for me so Can automatically change serial number in website path and collect contact number and name automatically from website to excel sheet. Please reply bro. Love from india.❤️
@jeff111 4 years ago
You said using the for loop isn’t the fastest. What are the other faster ways to complete the same “removing unwanted characters” task?
@jeffreyeng9503 4 years ago
Yes, I want to know what is the other more efficient method. Your video is straight to the point with an example which is very easy to follow and fun to learn python. Is very useful for a beginner like me to learn python. Thanks.
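The video does not spell out which alternatives it had in mind, so the following is only one loop-free option: DataFrame.replace with regex=True applies the pattern to every text column in a single call and leaves numeric columns as they are. Whether it is actually faster depends on the data, so it is worth timing on a real file. The frame below is invented.

```python
import pandas as pd

# Hypothetical data with two text columns and one numeric column.
df = pd.DataFrame({
    "Name": ["J@ne", "B.ob"],
    "Dept": ["R&D", "H/R"],
    "Salary": [100, 200],
})

# One call covers the whole frame; numeric values are not touched.
cleaned = df.replace(r'\W', '', regex=True)
print(cleaned)
```

The apply-based version quoted in an earlier comment is another alternative, though it converts every column to strings first.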
@anelisabolosha9934 1 year ago
Thank you so much this is really great💟💯