Tutorial 11-Exploratory Data Analysis(EDA) of Titanic dataset 

Krish Naik
984K subscribers
311K views

Here is a detailed walkthrough of Exploratory Data Analysis on the Titanic dataset. At the end, we apply Logistic Regression to predict the Survived column.
GitHub URL: github.com/krishnaik06/EDA1
References: Jose Portilla's EDA materials and Kaggle
⭐ Kite is a free AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I've been using Kite for a few months and I love it! www.kite.com/get-kite/?...
Stats playlist : • Population vs Sample i...
You can buy my book, in which I explain in detail how to use Machine Learning and Deep Learning in finance using Python.
Packt url : prod.packtpub.com/in/big-data...
Amazon url: www.amazon.com/Hands-Python-F...

Published: 16 Jan 2019

Comments: 282
@aakritiroy7336 3 years ago
After struggling so much with my LMS, I was finally able to understand the entire EDA within 30 minutes. Thank you.🙏👍
@bhadrakadabra 3 years ago
Is it the inmovidu one?
@ytg6663 3 years ago
What is LMS?
@ONE-THING-2RAY 3 years ago
@ytg6663 learning machine shorts
@shinosukenohara.123 3 years ago
@ONE-THING-2RAY Where is it?
@akashravindra.. 2 years ago
@ytg6663 Learning Management System
@Esha25ghosh 3 years ago
You are awesome sir! Not only are you a great mentor, but also a great motivator. Thanks for all the great work you have been doing. Stay blessed!
@chaos8514 1 year ago
I am learning this to become a data analyst, but I'm not sure what more I should learn to get a job ASAP. If you can help, please, we can connect on Instagram.
@PiyushSingh-cq2xv 3 years ago
This is one of the best datasets for understanding how to fix nulls. Great job, and thank you.
@VVV-wx3ui 4 years ago
Doing the job of a true Guru. Ekalavyas are all around and raring for such knowledge-impartation. Thanks much Krish.
@aliakbarrayhan6389 4 years ago
Sir, I'm very impressed by your amazing video. Though I am very weak in programming, I now feel that I should start my programming journey again, because I have someone like you who can explain anything in a very simple way.
@souvikdas3905 4 years ago
What a beautiful video for a beginner who is just getting his hands on data science.
@sunnychandra5064 5 years ago
You have actually cleared up the EDA concept for me. Thanks a lot!!
@ShivamChaudhary-jn4kw 6 months ago
Why are 0 and 1 used in cols when the indexes of the columns are 2 and 5? Can you clarify?
@brainfuck007 4 years ago
You are a gem! Making India learn ML. Thank you for all the stuff you do for us. :)
@vital4statistix 3 years ago
Krish, this material is FIRST CLASS. Appreciate it very much.
@classicemmaeasy2292 1 year ago
I was trying to understand data analysis with Python a couple of days ago; now you actually make it simpler and beginner-friendly. More unction to function, sir.
@girishmahamuni1830 3 years ago
Thank you for providing knowledge in a simple way.
@theayodejipopshow 1 year ago
This video is amazing. Thanks so much for sharing your wealth of knowledge.
@sudeeprajput1830 3 years ago
You are amazing brother. Your videos are helping me gain confidence in ML. Keep up the good work.
@pravinmore434 3 years ago
Thanks a lot for the very detailed lesson, sir. It was really fruitful and helped me complete one of my projects. Thanks a ton.
@imranullah7355 3 years ago
Thanks a lot, sir. You've explained it in a great way. Love from Pakistan.
@akanshabhandari1062 3 years ago
Very helpful. You did a lot of hard work for us. Thank you so much, sir🙌🙌🙏🙏. And your way of teaching is very good, starting from the basics.
@RajatSharma-ct6ie 4 years ago
Great work sir, learning a lot from your videos. Please upload more videos on EDA.
@abhinavmahajan448 3 years ago
Thanks for the detailed video. Really helpful :)
@vinothv8514 5 years ago
Nice work Mr. Krish. It's really helpful.
@RahulRoy-qy8rk 4 years ago
This was so helpful. Thank you.
@ifhamaslam9088 4 years ago
Superb explanations, and interesting to learn.
@rupeshnandanyadav8108 2 years ago
Awesome tutorial on Exploratory Data Analysis ❤️❤️
@venkatadeviprasadkankanala7387 4 years ago
Very nice one, thank you very much for sharing valuable information.
@GauravVerma-jk6cf 3 years ago
This was really one of the most useful resources available!
@AshishRoy 2 years ago
Very nicely explained. Awesome
@lavanyameesa6432 2 years ago
Wonderful explanation
@ganeshrao405 3 years ago
Really helpful, thank you so much.
@garvitjain4106 3 years ago
@Krish You are doing an amazing job.
@pandian3731 4 years ago
Another great and very useful video, bro, like the NLP one. 📍
@thePrabhuChannel 3 years ago
21:30 The median age of passengers in each Pclass can be calculated with the code below instead of looking at the boxplot and guessing the number:
df[df['Pclass']==1]['Age'].median()
df[df['Pclass']==2]['Age'].median()
df[df['Pclass']==3]['Age'].median()
@viveksingh881 3 years ago
Good one brother, I was thinking the same: why guess it when we can actually calculate it?
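The per-class medians suggested above can also be computed in a single call with groupby. This is a minimal sketch on a tiny made-up frame (the values are illustrative, not the real Titanic data); only the column names Pclass and Age come from the comments.

```python
import pandas as pd

# Tiny illustrative frame; the values are made up, not the real Titanic data.
df = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3, 3],
    "Age": [38.0, 42.0, 29.0, None, 22.0, 26.0, None],
})

# One call returns the median Age for every class; missing ages are skipped.
median_age_by_class = df.groupby("Pclass")["Age"].median()
print(median_age_by_class)

# Fill each missing Age with the median of that passenger's own class.
df["Age"] = df["Age"].fillna(df.groupby("Pclass")["Age"].transform("median"))
```

The transform step is one possible replacement for hard-coding the three medians by hand.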
@tusharmahuri2439 2 years ago
I get an error when I try to use sns.countplot. The error is "could not interpret input 'survived'".
@yashikaarora8573 1 year ago
@tusharmahuri2439 Bro, copy the column headers from the dataset instead of typing them. The language is case-sensitive: it is 'Survived', not 'survived'.
@saylisuryawanshi3989 4 years ago
Great job sir, please do make more such practice videos for beginners.
@ManishKumar-gg2vm 4 years ago
Awesome explanation. I really couldn't stop myself from commenting on this video. One of the great videos on data visualization.
@Sab_Moh_Maya_Hal 4 years ago
Very knowledgeable, thanks man :)
@mssnal 3 years ago
Great one Krish. Basically covers most of the things a beginner needs to understand.
@unnatiraut9553 1 year ago
Great to understand. Thanks a lot.
@ShubhamJain-in6sz 4 years ago
Great work sir!!👍🏻👍🏻
@tusharikajoshi8410 1 year ago
Hey @Krish! Should we do this data visualization for each and every column, or do we do it after feature selection? If we are supposed to do it for each column, wouldn't the code get too big and complex for data with hundreds or thousands of features?
@VengalraoPachavaedu 5 years ago
I have seen some of your videos, excellent work. I really appreciate your work Mr. Krish Naik.
@naveenrawat6505 3 years ago
Loving the playlist :)))))
@pedrocrespo2681 3 years ago
Pretty nice explanation!
@vinniKP 2 years ago
Hi Krish, should univariate and bivariate analysis be done before null value imputation or after it?
@premkishanmishra1574 6 months ago
Loved your video, far better than the uni teachers :P
@MrKmdmustaq 4 years ago
Can you please make a video on treating outliers? It will help us a lot in solving problems.
@saifkhan4541 4 years ago
Thank you sir, it is very helpful 😊.
@devanshusharma9386 4 years ago
Very helpful for beginners
@gkmadhav 3 years ago
Is there a part 2 and 3 for this video, about feature engineering on the same dataset?
@honey9111 4 years ago
Thanks a lot Krish. EDA was well explained. I could not understand the last part, starting from the confusion matrix. How do I read the final result of the analysis?
@fancy4926 3 years ago
In some cases, I use label encoding etc. to change a character column into numbers. dtypes then reports that column as int32 (or int64 or float), but I think it should actually be categorical before I use it for ML. Is it right that I should use astype('category') to convert the format and then use it for ML?
@arniloy9358 2 years ago
There is another null left in the Embarked column, in the 831st entry. It still shows in my heatmap, while in the video it doesn't (25:07). If I continue down this path, do I apply the same method used for removing the Age nulls (defining a class), or should I just replace it directly with the average value by indexing the null (as it is just a single cell)?
@karthikeyanradhakrishnan3219 4 years ago
I have one question: why didn't you use feature scaling for Age and Fare?
@yashkhilavdiya5693 2 years ago
Thank You So Much
@classicgd 3 years ago
Hi Krish, thanks for the videos. Do you have a playlist explaining all the algorithms?
@MrArvindSaha 4 years ago
At In[26], in the box plot results, you call the straight line inside the box (the 2nd quartile, or 50th percentile) the mean value. Is it the mean or the median?
@sohamdeshpande3654 3 years ago
Thank you very much!!!
@ashishgoyal7020 3 years ago
Thank you Krish.
@aination7302 3 years ago
Neither imputing nor dropping missing values (NaN) is good practice with real-world data. The ideal way is to derive a new field indicating missing values: 1 for missing, else 0. Sometimes a missing value can be new information in itself. Just sharing some learning from my job :)
@okonvictor8711 2 years ago
Hi, would you mind sharing how to do that here? Or can I reach you via email?
@waqarmehdi4394 2 years ago
Yes, it depends on the dataset and the problem you want to solve. In this case, dropping the null values is the best option in my opinion.
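For the "how to do that" question above, here is a minimal sketch of a missing-value indicator column in pandas. The frame and the column name Age_missing are hypothetical, chosen only for illustration; the follow-up median imputation is optional and is not something the comment prescribes.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "Age": [22.0, None, 35.0, None],
    "Fare": [7.25, 8.05, 53.1, 13.0],
})

# 1 where Age was originally missing, 0 otherwise.
# Create the flag BEFORE any imputation, or the information is lost.
df["Age_missing"] = df["Age"].isna().astype(int)

# Optional: impute afterwards; the flag still records which rows were empty.
df["Age"] = df["Age"].fillna(df["Age"].median())
```

A model can then use Age_missing as an ordinary feature alongside the filled Age.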
@asfandyarsaeed6402 2 years ago
Hi Krish, do I need to run a Shapiro-Wilk test to check normality? The Age column is not normal if you apply this test.
@anjalis4016 2 years ago
Sir, can we only use seaborn for the built-in datasets available in seaborn? After data cleaning I am unable to use seaborn, please help me.
@priyansharajsinha1159 4 years ago
Could you please provide links to the videos you refer to for logistic regression and confusion_matrix?
@gunjanmishra6673 2 years ago
Hi, can you suggest some other datasets that can be used to practice all these techniques?
@babupatil2416 4 years ago
Hi Krish, please create some more videos on EDA, it will be helpful.
@shubhamthapa7586 3 years ago
I have a question: why is he not using the SimpleImputer class from scikit-learn instead of finding a relationship to fill the NaN values? We can easily do it through the sklearn module. And why isn't he using a label encoder for binary values?
@pramishprakash 2 years ago
Thank you sir
@bhavanshah1368 3 years ago
@Krish Naik: Hi Krish, could you please explain why Age is assigned cols[0] and Pclass cols[1]? I have not understood this.
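A likely explanation for the cols[0] and cols[1] question, assuming the function is applied to a two-column selection such as df[['Age', 'Pclass']] with axis=1: the function receives one row of that selection at a time, so position 0 is Age and position 1 is Pclass regardless of where those columns sit in the full table. The sketch below is a reconstruction under that assumption, not the video's exact code; the FILL_AGE values are hypothetical.

```python
import pandas as pd

# Made-up rows; PassengerId shows that positions in the full table do not matter.
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Pclass": [3, 1, 2],
    "Age": [None, 38.0, None],
})

# Hypothetical per-class fill values, for illustration only.
FILL_AGE = {1: 37, 2: 29, 3: 24}

def impute_age(row):
    # `row` holds only the two selected columns, in the order they were selected:
    # position 0 is Age, position 1 is Pclass.
    age, pclass = row.iloc[0], row.iloc[1]
    if pd.isnull(age):
        return FILL_AGE[int(pclass)]
    return age

# axis=1 passes one row at a time; axis=0 would pass one whole column at a time.
df["Age"] = df[["Age", "Pclass"]].apply(impute_age, axis=1)
```

The sketch uses row.iloc[0] rather than cols[0] because recent pandas versions warn about positional access through plain square brackets on a labelled Series.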
@bharath_v 3 years ago
Good One!
@louerleseigneur4532 3 years ago
Thanks Krish
@paulohenriquews 4 years ago
Thank you!
@naveengoud3264 4 years ago
Best explanation
@pringlesss7701 4 years ago
For imputing the age, why did you define your own function and not simply use SimpleImputer?
@puneetyadav2951 3 years ago
For me logistic regression shows blank every time, like this: "[ LogisticRegression()]". What should I do?
@ds-hy9nc 4 years ago
When I try to apply my function (23:20), it shows "unexpected EOF while parsing".
@aasthasingh67 3 years ago
How do you know exactly which plot to use for a given kind of result?
@tumul1474 4 years ago
This is beyond amazing. An amazing place to learn and to revise the important techniques.
@joelbraganza3819 3 years ago
Why do we need to get dummy variables for binary class variables like Sex and Embarked, and why didn't we apply one-hot encoding to the Pclass variable? Is it because we are treating it as ordinal? But wouldn't that cause problems when applying linear regression and DNN algorithms to it? Let me know, sir. Thanks.
@TheKhubaib313 3 years ago
Thank you so much
@parisworld4326 3 years ago
For me lbfgs failed to converge to the local minimum. How do I fix it? I believe more categorical columns like Pclass need to be encoded, and a standard scaler is required for Age and Fare.
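Two common remedies for the lbfgs convergence warning are scaling the numeric features and raising max_iter. The sketch below shows both on synthetic data standing in for Age and Fare; the data, the threshold used to build y, and the max_iter value are assumptions for illustration, not settings from the video.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for two unscaled numeric features such as Age and Fare.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(1, 80, 200), rng.uniform(5, 500, 200)])
y = (X[:, 1] > 100).astype(int)

# Scale the features first, then fit; max_iter raises the solver's iteration cap.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)
accuracy = model.score(X, y)
```

Putting the scaler inside the pipeline means the same scaling is reapplied automatically at prediction time.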
@aryanrana5658 2 years ago
My doubt is about the apply call on 'Age' and 'Pclass': what is the use of axis=1 there? Could you please explain?
@Parshant17 3 years ago
Are you sure that is the average in the boxplot near the 20th minute? When we talk about percentiles, the 50th percentile should be the median.
@MrTANKASALA 4 years ago
Age and passenger class have a negative correlation score, so why did you still fill the missing values for Age based on Pclass?
@biswajitsahoo1542 3 years ago
Fantastic
@dipeshlimaje8998 1 year ago
Sir, I'm confused: we are predicting survival, which is 0 or 1, so it is categorical data, yet we are solving it with regression.
@prabhakarakshay 2 years ago
Sir, is there any way this countplot can be converted to percentages, i.e. what percentage of males survived and died, and similarly what percentage of females died and survived? If the number of males is much larger than the number of females, the countplot will not present a clear picture because the scales are very different. Can we convert it to percentages?
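One way to get the percentages asked about above is pd.crosstab with normalize="index", which makes each row sum to 1. A minimal sketch on made-up rows (only the column names Sex and Survived come from the dataset):

```python
import pandas as pd

# Made-up rows: four males and two females.
df = pd.DataFrame({
    "Sex": ["male", "male", "male", "male", "female", "female"],
    "Survived": [0, 0, 0, 1, 1, 1],
})

# normalize="index" makes each row (each Sex) sum to 1; multiply to get percent.
pct = pd.crosstab(df["Sex"], df["Survived"], normalize="index") * 100
print(pct)
```

Calling pct.plot(kind="bar", stacked=True) would then draw the percentages, so the two groups are comparable even when their sizes differ.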
@raj345to 3 years ago
THANK YOU SO MUCH!!!!
@mohamedshathik8045 2 years ago
Hi Krish, you didn't drop the PassengerId column before fitting the logistic regression model, even though it doesn't contain any information.
@AliRaza-zd7eb 3 years ago
Sir, after the concat df1=pd.concat([df1,sex,embarked],axis=1), a lot of NaN values occur in the male, Q and S columns. How can I solve this problem?
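One possible cause of those NaN values (an assumption, since the earlier cells are not shown in the comment) is index misalignment: pd.concat with axis=1 matches rows by index label, so if the dummy columns were built from a frame with different rows than df1, the unmatched rows are filled with NaN. The frames below are made up to reproduce the effect.

```python
import pandas as pd

full = pd.DataFrame({
    "Sex": ["male", "female", "male", "female"],
    "Age": [22.0, None, 30.0, 41.0],
})

# Dropping the row with a missing Age leaves index labels 0, 2, 3.
cleaned = full.dropna()

# Dummies built from the cleaned frame carry the same labels 0, 2, 3.
sex = pd.get_dummies(cleaned["Sex"], drop_first=True, dtype=int)

# Concatenating with the ORIGINAL frame: label 1 has no match, so male is NaN there.
misaligned = pd.concat([full, sex], axis=1)

# Concatenating with the frame the dummies came from: every label matches.
aligned = pd.concat([cleaned, sex], axis=1)
```

Building the dummies from the exact frame being concatenated, after all row drops, keeps the indexes in step.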
@truptigedam7983 4 years ago
Can you please tell me when to use Jupyter Notebook and when to use the Spyder IDE?
@prashanthi8492 4 years ago
Sir, why didn't you treat outliers in the Age column? Could anyone tell me? I have a project deadline tomorrow!
@LearnwithNaviOfficial 4 months ago
@krish Naik We dropped the Age column, so how does the Age column appear again?
@madhumeenamk6531 3 years ago
Can regression be done using unsupervised algorithms?
@gangasekar3224 2 years ago
Instead of matplotlib and seaborn, can we use Power BI?
@sowjanyadharmavarapu2653 3 years ago
Sir, I really liked your video. But according to the roadmap video, you asked us to watch Python lectures 1-24 first. In this EDA video you use some new terms like get_dummies and a few others, so I got stuck on the last 10 minutes of the explanation. Otherwise everything is really clear and understandable. Thanks for all the effort.
@dynamictechnocrat 1 year ago
get_dummies is a pandas function.
@ashridas9896 1 year ago
It is basically one-hot encoding. Encoding techniques are used to convert categorical data into numerical data. Here it is applied to the 'Embarked' column: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-OTPz5plKb40.html
@pepetisiddhardha9848 3 years ago
I didn't understand why the categorical features disappeared from the training data for logistic regression.
@nabeelsj3631 3 years ago
Hi Krish, on analysing the Titanic data, I could see that there is one missing value in the Embarked column. Since there is only one value missing, it was hard to find via visualisation. On checking the percentage of null values, I found the following:
data.isnull().mean() * 100
PassengerId     0.000000
Survived        0.000000
Pclass          0.000000
Name            0.000000
Sex             0.000000
Age            19.865320
SibSp           0.000000
Parch           0.000000
Ticket          0.000000
Fare            0.000000
Cabin          77.104377
Embarked        0.224467
dtype: float64
Could you please confirm whether this can be ignored?
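Most scikit-learn estimators reject NaN input, so a small remaining gap in Embarked usually still needs handling. One common option, sketched below on made-up values, is to fill it with the most frequent port; dropping the affected rows with dropna(subset=["Embarked"]) is another. Neither is claimed to be what the video does.

```python
import pandas as pd

# Made-up values with a single missing entry.
df = pd.DataFrame({"Embarked": ["S", "C", None, "S", "Q"]})

# mode() returns a Series because ties are possible, so take the first entry.
most_common = df["Embarked"].mode()[0]
df["Embarked"] = df["Embarked"].fillna(most_common)
```

A mean is not defined for a text column like Embarked, which is why the mode is used here instead.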
@AlphaGodzilla1 2 years ago
After running model.fit(X_train,y_train) I get this error: ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
@shishirrd 2 years ago
Great explanation video! I had a question: would using a Label Encoder for the features make any difference to the results? I obtained an 82% score with an 80:20 train/test split on a normal DT model. I'm not enough of an expert to predict why this would happen, though. Any comments would be appreciated!
@kkckk4360 5 years ago
Can you please make a video on hypothesis testing in stats?
@aradhyakanth8409 2 years ago
Sir, what is the need to visualise the data in this problem? You haven't used any insight from the visualisation to help with data cleaning.
@vamshikrishna5333 2 years ago
Hi, can anyone help me with the difference between the EDA of this Titanic dataset and the EDA of Housing Price Prediction? They follow different steps and I am quite confused. I would really appreciate any help.
@samyakkumarsahoo8706 3 years ago
It was a resourceful video. But why is EDA done before the train-test split?
@shantonuchowdhury2169 3 years ago
Can you please explain how you know that Embarked C = 00, Q = 10, S = 01? Thank you.
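The C = 00, Q = 10, S = 01 pattern matches what pd.get_dummies produces with drop_first=True: the categories are sorted (C, Q, S), the first one (C) is dropped, and a row with 0 in both remaining columns is therefore C. A minimal sketch, assuming that is how the columns were created; dtype=int is added here only so the output prints as 0/1.

```python
import pandas as pd

embarked = pd.Series(["C", "Q", "S"], name="Embarked")

# Categories are sorted (C, Q, S); drop_first=True drops C, the first one.
dummies = pd.get_dummies(embarked, drop_first=True, dtype=int)
print(dummies)
```

The first row (C) has 0 in both the Q and S columns, the second (Q) has 1 under Q, and the third (S) has 1 under S.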