First of all, thanks to you I learned the ffill, bfill and interpolate functions from here. But many professionals recommend that missing values be imputed with the mean, median or mode.
Please explain how to use fillna, or replace zeros, with the mean value per group using groupby. For example, you have 3 groups in a data frame and you want to fill the NaNs in each group with that group's mean.
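For anyone wondering what mean, median and mode imputation look like in pandas, here is a minimal sketch. The series s and its values are made up for illustration; they are not from the video.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value.
s = pd.Series([1.0, 2.0, 2.0, np.nan, 10.0])

# mean(), median() and mode() all skip NaN by default.
mean_filled = s.fillna(s.mean())
median_filled = s.fillna(s.median())
# mode() returns a Series (there can be ties), so take the first entry.
mode_filled = s.fillna(s.mode().iloc[0])
```

Which one fits best depends on the column: the median is less sensitive to outliers than the mean, and the mode is the usual choice for categorical data.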
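Not from the video, but one way to do the per-group fill is groupby plus transform. This is a minimal sketch, assuming a hypothetical data frame with columns named group and value, and assuming a 0 in value really means "missing".

```python
import numpy as np
import pandas as pd

# Hypothetical data: three groups, with NaN and 0 standing in for missing values.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "c", "c"],
    "value": [1.0, np.nan, 3.0, 10.0, 0.0, np.nan, 5.0],
})

# Assumption: 0 is not a valid reading, so treat it as missing too.
df["value"] = df["value"].replace(0, np.nan)

# transform("mean") returns one value per row, aligned with the original index,
# holding the mean of that row's group (NaN is skipped when averaging).
group_mean = df.groupby("group")["value"].transform("mean")
df["value"] = df["value"].fillna(group_mean)
```

Drop the replace line if zeros are real measurements in your data.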
Could you make a tutorial on Big Data as well, for situations with e.g. 500k rows and 200 columns where you don't see all of your data and don't know what kinds of NaN values to expect and therefore can't name them textually? Big thanks in advance :)
It’s a keyword argument (kwarg) called chunksize=; the integer passed to it is the number of rows per chunk. So if you pass 1000 and have a df of 500k rows, it would load in 500 pieces.
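A minimal sketch of chunked reading, assuming the function in question is pd.read_csv (which does accept chunksize). The CSV here is built in memory and is much smaller than 500k rows, purely so the example runs anywhere.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV with 5000 data rows; a real file path works the same way.
csv_text = "x\n" + "\n".join(str(i) for i in range(5000))

# With chunksize set, read_csv returns an iterator of DataFrames
# instead of loading everything into one DataFrame.
n_chunks = 0
n_rows = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=1000):
    n_chunks += 1
    n_rows += len(chunk)
```

Each chunk is an ordinary DataFrame, so you can clean or aggregate it inside the loop without holding the whole file in memory.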
How do you find out which different kinds of values to put in na_values? I mean the values you listed for missing_value: how did you get them? We can't check the file by eye if it has a huge amount of data.
I have a question. I wrote print(os.listdir()), but it listed many files that are inside my Jupyter folder. May I know how I can import the CSV file that I have cleaned?
Hello sir, thank you so much for the tutorial. I'm actually stuck, since my source is a CSV file and sadly the file I'm working with is extremely complex, with an indefinite number of columns, because my main columns are repeated every day based on the date. I've been stuck on this problem for over a week. Is there a way I could reach out to you by mail so you could maybe help solve this problem? Thanks a lot in advance.
The dataset you have has only a few rows. What if we have thousands of rows of data: how do we find the NaN and NA values in the dataset? If you see this, please respond ASAP.
I have a CSV file, and when I use the concat function it automatically adds names like unnamed group 1, 2, 3... Also, the alignment gets messy with a single line of code. How do I fix it?
Hi, very nice explanation. I am totally new to Python. Can you please make a tutorial on how to install Jupyter and all the other libraries required for forecasting?
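One possible way to discover candidates for na_values without scrolling through the file is to read everything as text and list the cells that do not parse as numbers. This is only a sketch, assuming the columns are supposed to be numeric; the sample CSV and its placeholder strings are invented.

```python
import io
import pandas as pd

# Hypothetical file contents with a few textual placeholders for missing data.
csv_text = "temperature,humidity\n21.5,40\nn.a.,42\n19.0,--\n?,45\n"

# dtype=str and keep_default_na=False keep every cell as raw text.
raw = pd.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)

candidates = set()
for col in raw.columns:
    # errors="coerce" turns anything that is not a number into NaN.
    parsed = pd.to_numeric(raw[col], errors="coerce")
    # Collect the raw text of the cells that failed to parse.
    candidates.update(raw.loc[parsed.isna(), col].unique())
```

The resulting set can then be reviewed and passed as na_values=list(candidates) when reading the file again.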
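For large data you normally count the missing values instead of looking for them by eye. A minimal sketch, with a made-up data frame standing in for the real one:

```python
import numpy as np
import pandas as pd

# Hypothetical data; the same calls work on thousands or millions of rows.
df = pd.DataFrame({
    "temperature": [21.5, np.nan, 19.0, np.nan],
    "humidity": [40.0, 42.0, np.nan, 45.0],
})

# Number of missing values in each column.
missing_per_column = df.isna().sum()

# Only the rows that contain at least one missing value.
rows_with_missing = df[df.isna().any(axis=1)]
```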
Great tutorial! No need to hesitate on referring to the code snippets btw… I don’t think any sane person watching this has the expectation for you to memorise a to z what you want to articulate…
Hi Soumil, right now I'm working on a language translation project. I have collected the data for it, but I'm facing problems preprocessing it. Could you please help me with that?
Sir, I'm still new to Python and this data cleaning thing. I want to ask: what is the 11 in df11? Is it some kind of function? I also don't understand the snippet concept.
Sir, I ran the data cleaning code in a Jupyter notebook, following the same instructions you gave, but my output does not show Unnamed: 0, temperature and humidity. On my system it shows v1 and v2 instead. Why is that? Can you please explain?
I also faced the same problem. When we create a new CSV file there is no Unnamed: 0 column. But if we save the same data frame as a CSV into a folder, the index is written as an extra column, which shows up as Unnamed: 0 when we read the file back, as in this video. If you repeat this, one extra column is added each time. To avoid it, use index=False when saving the file. It will work.
This video really helped me a lot, but I still have more to understand. I have zero basic knowledge of this. I'm working on a thesis which needs some coding to complete. I have a few questions about what you explained in this video: 1. What if there are a lot of datasets? How do you define the missing values for each? 2. What was that "np.nan" in the missing values you defined? As I said earlier, I'm working on a project about human-in-the-loop code. Initially I'll be given the dataset and will have to figure out code that includes a human for feedback from the system (Reinforcement Learning). I would like to get your response and, if possible, any helpful ideas or suggestions on the project mentioned above. Thank you.
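A small sketch of the behaviour described above, using to_csv and read_csv on an in-memory string instead of a real file. The column names follow the video; the values are invented.

```python
import io
import pandas as pd

df = pd.DataFrame({"temperature": [21.5, 19.0], "humidity": [40, 45]})

# Default: the index is written as an extra first column with an empty header,
# which read_csv then labels "Unnamed: 0".
with_index = df.to_csv()
cols_with_index = list(pd.read_csv(io.StringIO(with_index)).columns)

# index=False leaves the index out, so no extra column appears.
without_index = df.to_csv(index=False)
cols_without_index = list(pd.read_csv(io.StringIO(without_index)).columns)
```

If the file has already been saved with the index, reading it with index_col=0 is another way to get rid of the extra column.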
1. You can use separate lists (with different variable names) or a single master list for all the datasets. It depends on the datasets and the data they contain. 2. np.nan is the NaN value, which stands for Not a Number; pandas uses it to mark missing entries. It is not a function: np.nan is simply a float constant whose value is NaN. Hope this helps.
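A quick sketch to see for yourself what np.nan is. The variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# np.nan is an ordinary Python float.
nan_is_float = isinstance(np.nan, float)

# NaN never compares equal to anything, not even to itself,
# so == cannot be used to look for missing values.
nan_equals_itself = np.nan == np.nan

# Use pd.isna (or Series.isna / DataFrame.isna) to test for it instead.
detected = pd.isna(np.nan)
```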
1) You can also use the unique function to get only the unique values. Example: df['Customers'].unique() gives you all the unique values in the column 'Customers'.
Sir, I'm facing an issue in code that converts a variable from object to integer in a Jupyter notebook. It shows the error: invalid literal for int() with base 10: '-'
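A minimal sketch of the unique approach, with an invented 'Customers' column. value_counts is added here as a related call that also shows how often each value occurs, which helps spot odd placeholders.

```python
import pandas as pd

# Hypothetical column; "N/A" is a plain string here because the frame
# is built in code and not parsed from a file.
df = pd.DataFrame({"Customers": ["Ann", "Bob", "Ann", "N/A", "Bob"]})

# unique() is a method, so the parentheses are required.
unique_values = df["Customers"].unique()

# value_counts() lists each distinct value with its frequency.
counts = df["Customers"].value_counts(dropna=False)
```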
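That error usually means the column contains a placeholder such as '-' that int() cannot parse. One possible fix, sketched on a made-up series, is to let pd.to_numeric turn such cells into NaN first:

```python
import pandas as pd

# Hypothetical object column where "-" marks a missing value.
s = pd.Series(["12", "7", "-", "30"])

# errors="coerce" converts anything unparseable into NaN instead of raising.
numbers = pd.to_numeric(s, errors="coerce")

# A plain int column cannot hold NaN; the nullable "Int64" dtype can.
as_int = numbers.astype("Int64")
```

Alternatively, fill or drop the missing cells first and then call astype(int).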
AttributeError Traceback (most recent call last) Cell In[7], line 1 ----> 1 pd_cleaned = pd.dropna() AttributeError: module 'pandas' has no attribute 'dropna' I can't find dropna.
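The error is accurate: dropna is a method of a DataFrame (or Series), not of the pandas module itself, so it has to be called on the data frame. A minimal sketch, assuming the data frame is called df:

```python
import numpy as np
import pandas as pd

# Hypothetical data frame standing in for the one loaded in the notebook.
df = pd.DataFrame({
    "temperature": [21.5, np.nan, 19.0],
    "humidity": [40.0, 42.0, np.nan],
})

# Call dropna on the data frame, not on pd.
# By default it drops every row that has at least one missing value.
df_cleaned = df.dropna()
```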