Coding in #python and #pandas, you can easily clean messy string columns with a few built-in methods. Here is an example of cleaning address values to make them more standardized. #datascience
I love pandas for cleaning up problematic data sets before feeding them into a model. It's just so dang satisfying to see messy data turn into nice consistent points on a scatter plot :D
I would omit the " " in str.split (with no separator it splits on any run of whitespace). It doesn't matter that much if you pass n=1, but if you have messy data, chances are there are double spaces too, and splitting on a single " " can then give you empty strings that may cause issues down the line.
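To make that concrete, here is a minimal sketch of the difference. The sample addresses and variable names are made up for illustration and are not from the original post.

```python
import pandas as pd

# Hypothetical sample data: the first address contains a double space.
addresses = pd.Series(["123  Main St", "45 Oak Ave"])

# Explicit " " separator: every single space is a split point.
explicit_all = addresses.str.split(" ")        # first row gets an empty string piece
explicit_n1 = addresses.str.split(" ", n=1)    # first row keeps a leading space in the remainder

# No separator: any run of whitespace counts as one split point.
default_all = addresses.str.split()
default_n1 = addresses.str.split(n=1)

print(list(explicit_all[0]))  # ['123', '', 'Main', 'St']
print(list(explicit_n1[0]))   # ['123', ' Main St']
print(list(default_all[0]))   # ['123', 'Main', 'St']
print(list(default_n1[0]))    # ['123', 'Main St']
```

With n=1 the explicit separator does not produce an empty string, but the street part still starts with a stray space, so the default is the safer choice either way.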
The second line of code raises an error for me (TypeError: string indices must be integers). Does anyone know why this happens? When I'm not trying to reassign the column it works just fine.
Any suggestions on dealing with date strings? I can't seem to parse them into a date object to save my life. The formats are all over the place: mmddyyyy, yyyymmdd. Nightmare.
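One possible approach, sketched under the assumption that the column only mixes the two 8-digit layouts mentioned (mmddyyyy and yyyymmdd): parse once per format with errors="coerce" and fill the gaps from the second attempt. The sample values and names are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data mixing both layouts, plus one junk value.
raw = pd.Series(["01152023", "20230116", "not a date"])

# Try each format separately; values that do not fit become NaT instead of raising.
as_mdy = pd.to_datetime(raw, format="%m%d%Y", errors="coerce")
as_ymd = pd.to_datetime(raw, format="%Y%m%d", errors="coerce")

# Keep the mmddyyyy result where it parsed, otherwise fall back to yyyymmdd.
parsed = as_mdy.fillna(as_ymd)

print(parsed)
```

Anything still NaT afterwards matched neither format and needs a manual look. A string that happens to be valid under both layouts will be read as mmddyyyy here, so the order of the attempts is a judgment call.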
I will try it here and send it to you later. My own code is straightforward: when I apply a dictionary like { 'Y': 'Yes', 'N': 'No'...} to a column using replace, it works, but in this case the data needs to be handled a bit first.
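For anyone following along, a rough sketch of that dictionary-with-replace idea. The comment does not say what handling is needed first, so the trimming and upper-casing step below is only an assumed example, and the sample values are made up.

```python
import pandas as pd

# Hypothetical flag column with inconsistent case and stray spaces.
flags = pd.Series([" y", "N ", "Y", "n"])

# Only the two entries quoted in the comment; the rest of the dictionary is not shown there.
mapping = {"Y": "Yes", "N": "No"}

# Assumed pre-handling: strip whitespace and upper-case so the keys match.
normalized = flags.str.strip().str.upper()

# replace swaps matching values and leaves anything not in the dictionary untouched.
cleaned = normalized.replace(mapping)

print(cleaned.tolist())  # ['Yes', 'No', 'Yes', 'No']
```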