Having problems with the code? I just finished updating the notebooks to use *scikit-learn 0.23* and *Python 3.9* 🎉! You can download the updated notebooks here: github.com/justmarkham/scikit-learn-videos
I'm a beginner, but your way of teaching makes me love machine learning; it feels so easy. You even make me understand how the algorithm works behind the scenes. Love from India...
This is unreal! I literally abandoned my DataCamp machine learning course for this one, with no regrets at all. I especially like that you taught the underlying mathematical concepts behind the code. You also speak clear and understandable English, and the sound quality is top notch. I've taken your data science course, and yours and Prof. Allen's remain my best to date, with Hugo's coming in a distant third. And to think you recorded this more than 7 years ago makes you conclude that it was way ahead of its time.
MANY THANKS!!! All other data science tutorials (for beginners) go by way too quickly. Some people may find your slower pace a nuisance, but I found it EXTREMELY HELPFUL. THANK YOU! Subbed ^__^
I was searching for good videos on ML for a long time. After following this series, I can say it is the best I have ever seen. Each and every concept is covered in great detail. The same applies to the study material and links. Thanks, Data School .....!!!!
You're a way better instructor than my college professors. The syntax is fairly simple and the explanation of the statistical intuition behind the metrics made this enjoyable.
*Note:* This video was recorded using Python 2.7 and scikit-learn 0.16. Recently, I updated the code to use Python 3.6 and scikit-learn 0.19.1. You can download the updated code here: github.com/justmarkham/scikit-learn-videos
+Siddharth Gupta For people who have random chunks of exposure to certain aspects of sklearn/pandas/etc: watch the video at 1.25 or 1.5x speed. You can get through the lesson faster, and the increased speed will actually have a counterintuitive effect of making you focus more. Also when you start losing focus or miss a concept, you will notice right away because you will suddenly be totally lost, so you will know to rewind.
Thank you so much for your explanations of sklearn; it finally makes sense to me! I'm already pretty familiar with pandas, so I'd love to learn more about sklearn, because there are so many other machine learning algorithms I'd love to get my head around.
Wonderful videos! I would like you to focus on scikit-learn; your style of teaching, which combines hands-on work with scikit-learn, real examples, and explanations of ML techniques, is very helpful!
Your video tutorials are outstanding! You simplify complex concepts in an elegant manner. And unlike other instructors, you don't show off how smart you are. That's why we know that you're really a smart guy :)
This is the best video tutorial series on machine learning I have seen. You have me hooked! Thanks for creating the series; you are an amazing teacher. Keep it up!
More pandas please! And more seaborn! A large part of machine learning is "messing" with the data BEFORE you apply any of the algorithms to it, and pd and sns are really good at that. Also, I think it'd be interesting (maybe later in the series) if you could go through an all-out example, like working with the Titanic dataset from Kaggle, and give hints on how to visualize and understand the data and choose the best algorithm for it. As a final note, I'm already a bit familiar with the techniques you use, but your comments and clear explanations make everything clearer and help me solidify some of these techniques. Thank you for that! Excellent series; keep up the good work.
Antonio Augusto Santos Thanks for the feedback! I am planning to cover more examples later in the series, probably using a Kaggle competition. And, I appreciate your kind words! I was hoping to reach both users new to machine learning and those with some machine learning familiarity, so it's nice to hear that it's working :)
Thank you for the awesome videos. I am currently learning machine learning as part of a course. I have no previous knowledge of Python (I'm taking an introductory Python course as well), so I was really struggling to understand. Over my midterm break, I found one of your videos while searching, and I feel fortunate to have found them. Thanks for your effort.
I'll say one thing: you are an excellent teacher. My teachers at engineering school and on Udemy don't explain things half as well as you do! That should tell you a lot! I wish I could hire you personally.
Really thankful for your video series. It is straightforward and easy to understand; highly recommended to anyone interested in Python, machine learning, etc.
As for an answer to your question: I would like to learn more about sklearn. Pandas is amazing, and I'm just starting to learn it, but there are already a lot of nice tutorials out there. Keep up the good work :)
This is one of the best online resources available for an introduction to data science. Thank you for these amazing videos. It's teachers like you who inspire students like me :)
Great work! Please upload more tutorials like these; they're really helpful for getting started. Before watching this tutorial I was not at all aware of ML, but now, after watching 4-5 videos, I've got a good overview. Thank you!
A very, very great way of teaching. I really liked the pace, your pronunciation, the possible mistakes you cover, and the explanations. This is a great series and you are a great tutor. I'm a fan, and subscribed. Please make separate series on machine learning (a bit more detailed), deep learning, AI, and data science. I am not sure which one should be learned first, and how. I've decided you are the best guru to bring me to a good level in all these skills. Please help.
Hi Kevin, first of all, thank you very much for these great videos. If you have a chance to make a tutorial on deep learning, it would be great. You are the best instructor I've ever seen in this field.
I am answering your question 5 years later, but I would love to see more video tutorials from you about scikit-learn (e.g., supervised neural network models) or scikit-multilearn if you want!! :) Thanks a lot, Kevin!
These videos are outstanding. I'm new to data science, and many of the videos out there are either too simple or too hard. You have found the Goldilocks zone of data science. I also like that they are on YouTube, where I can speed them up to 1.5x to match my comprehension rate; Vimeo can't do that. I would like you to focus on scikit-learn, but use pandas, as most of us will be using both. I think a single lesson on how to use pandas, as well as how to customize IPython/Jupyter, would also be useful. I'd also like to see a video focused on data sources and on how to approach complex problems (a la Kaggle challenges). Improvement suggestions: 1) Focus on technical quality: use basic stage lighting (diffused above, side, front, with reflector) and a condenser mic to better pick up your voice without echo. 2) Put a whiteboard or similarly simple background behind you; there is way too much background clutter. And I think you are missing an opportunity to end by marketing your courses at Data School, your book, etc. Not that I love ads, but... marketing!
Harvey Summers Thanks for all of the suggestions, and your kind comments! Very helpful. Building up to more complex problems is definitely on the list. And, it's nice to know that I'm hitting the "sweet spot" in terms of difficulty level.
Cool video! I just finished your pandas video series, though I thought pandas should be learned before sklearn. Anyway, thank you for making such great videos for us.
Your videos really helped me understand the sklearn basics easily. It would be great if you could do a similar video series on SVMs using scikit-learn and its applications. Your explanations and methods are great! Thanks a lot!
Pretty amazing video! +1 for sk-learn as next video in this series. I also think that plotting stuff helps a lot. Whenever possible it would be nice to show seaborn in action. Great job and looking forward to the next one.
Watched all your videos. Your teaching skills are amazing, thank you for compiling those videos. I'm looking forward to your next videos about machine learning using sklearn.
+AvivProg Wow, thank you! You are very welcome -- I enjoyed creating the videos. Here is the playlist containing the entire video series: ru-vid.com/group/PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A
Excellent and straight to the point content again. Thanks a lot for the videos and also the additional references you provide. It's always good to know where to go next :) And please continue on with scikit-learn rather than pandas/seaborn.
Absolutely amazing material, thank you Kevin! I just wanted to know: how would you deal with non-numerical features (e.g., Gender, Occupation, Education, etc.) when constructing your ML model? Would you assign them numerical values? If possible, I'd like some guidance or a push in the right direction. Again, you explain this material much better than most channels do; please keep up the phenomenal work!
Great video once again. I think the focus of this series should be on ML and scikit-learn. You can explain the relevant pandas code wherever required, as you did in this video. One question: is there any ML algorithm that can select the most relevant/explanatory predictor variables (features) from the data set (instead of the user taking a trial-and-error approach)? I think this is critical for data sets with a high number of features.
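For anyone else with this question: one common approach (just one option, not necessarily what Kevin would recommend) is one-hot encoding with pandas. A minimal sketch with made-up data:

```python
import pandas as pd

# Made-up toy data with one non-numeric column
df = pd.DataFrame({
    'Gender': ['M', 'F', 'F', 'M'],
    'Age': [23, 31, 45, 52],
})

# One-hot encode the categorical column; drop_first=True avoids a redundant dummy
encoded = pd.get_dummies(df, columns=['Gender'], drop_first=True)
print(encoded.columns.tolist())
```

The resulting dummy columns are numeric and can be passed straight to a scikit-learn model.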
umair durrani Great question! There is no "silver bullet" for feature selection, meaning no single strategy that will always tell you which variables to keep in your model. Domain understanding, data exploration, and human intuition are key. That being said, the Random Forests model will give you a measure of "variable importance" (on a scale of 0 to 1), and you could use that to guide the selection. As well, regularized linear models will shrink coefficients down to zero as the "penalty term" increases, effectively performing feature selection. Just keep in mind that both need to be tuned to perform properly, and features need to be scaled when performing regularization. scikit-learn has some more guidance on feature selection here: scikit-learn.org/stable/modules/feature_selection.html Thanks again for your kind and helpful comments!
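As a rough sketch of the "variable importance" idea mentioned above, using synthetic data where only the first of three hypothetical features actually drives the target:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: only feature 0 is informative
rng = np.random.RandomState(0)
X = rng.rand(200, 3)
y = 5 * X[:, 0] + 0.1 * rng.rand(200)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# feature_importances_ sums to 1; the informative feature should dominate
print(model.feature_importances_)
```

On real data the split is rarely this clean, so treat the importances as a guide, not a verdict.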
umair durrani Umair, there are several useful techniques for feature selection that I recommend you look into. Statistical methods such as forward and backward elimination are perfectly suited for determining the most predictive variables in a regression model, and are easy to understand and implement. Decision trees inherently perform feature selection, in that the variable splits are deemed significant and chosen automatically by the algorithm. A bit more on the complex side are Principal Component Analysis (PCA) and association rules; I believe PCA is in scikit-learn. Good luck! Darron. www.linkedin.com/in/votefordata
+Data School Could you please cover more about feature selection in another course? For example, which models are more suitable for which cases: sorting features' scores from RandomizedLasso, ranking from recursive feature elimination (RFE), or selecting the K best?
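For reference, the recursive feature elimination mentioned above is available in scikit-learn as RFE. A minimal sketch on synthetic data (hypothetical setup: only features 1 and 3 are informative):

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data where only features 1 and 3 drive the target
rng = np.random.RandomState(0)
X = rng.rand(100, 4)
y = 3 * X[:, 1] + 2 * X[:, 3]

# Recursively drop the weakest feature until 2 remain
selector = RFE(LinearRegression(), n_features_to_select=2)
selector.fit(X, y)
print(selector.support_)   # boolean mask of the selected features
```

SelectKBest works similarly but scores each feature with a univariate statistic instead of refitting a model.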
Great content, you have an inspiring way of presenting, keep it up! I have one question though, why is the TV coefficient smaller than the Radio coefficient, even though from the plots and best fit line it looks like the sales go up faster with more TV ad spending?
Great video!! Thanks for that. I'd like to keep learning about scikit-learn, although pandas is also definitely a powerful Python data analysis toolkit.
It's the most wonderful tutorial on machine learning I've ever seen. I expect more videos related to machine learning. If you made some videos on optimization techniques for linear regression (like BFGS, etc.), that would be even more beneficial.
Hi Kevin, I'm new to both Python and machine learning. Your tutorials are great learning materials. I understand this is a 5-year-old presentation, and I'm wondering if you would still answer a question I have related to this tutorial. Specifically, when I was trying to get the pairplots you demonstrated, I got the following error: KeyError: "['Sales'] not in index", and I got three blank boxes. What went wrong? Many thanks for your help. FYI, I also tried to find answers by Googling online, and haven't been able to find any that work.
Thank you so much for the video, really great introduction to Pandas and SKlearn, I hope you can focus more on the sklearn with pandas dataframe, again, thanks for the great video!
Thanks for the good video! It would be great if, in a future video, you could take a data set from some Kaggle competition and work through it; feature engineering is an interesting topic too. Two technical notes:
- For people who work behind a proxy: to install seaborn with Anaconda, you have to define the http/https proxy first, so in the Anaconda prompt execute the following command: "set http_proxy=X.X.X.X:port_number"
- For Python 3 users, the zip command looks like: "list(zip(feature_cols,linreg.coef_))"
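To expand on the Python 3 zip note above, a tiny self-contained sketch (the feature names and coefficient values here are made up):

```python
# Hypothetical feature names and coefficient values
feature_cols = ['TV', 'radio', 'newspaper']
coefs = [0.046, 0.188, -0.001]

# In Python 3, zip() returns a lazy iterator, so wrap it in list() to view it
pairs = list(zip(feature_cols, coefs))
print(pairs)
```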
Hello! Would you happen to have a video on how to create a logistic regression model using scikit-learn's LogisticRegression() that uses 2 or more independent variables to predict a dependent variable?
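In case it helps others with the same question, a minimal sketch of LogisticRegression with two independent variables, fit on synthetic data (not from the video):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: two independent variables, one binary dependent variable
rng = np.random.RandomState(0)
X = rng.rand(200, 2)
y = (X[:, 0] + X[:, 1] > 1).astype(int)

# The API is the same for any number of feature columns in X
clf = LogisticRegression()
clf.fit(X, y)
print(clf.predict([[0.9, 0.9], [0.1, 0.1]]))
```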
Great tutorial!! After watching this and looking at the sklearn docs, it seems as if the LinearRegression() object has only coef_ and intercept_ attributes. Does sklearn not provide metrics such as standard errors, t-statistics, p-values, and R-squared? If not, what is the reasoning behind it ? Thanks.
Troy Walters Thanks for your comment! You can indeed compute R-squared using the r2_score function in the sklearn.metrics module. Regarding the others, I think the scikit-learn contributors would argue that those metrics belong in a statistics library, not a machine learning library. Here is a relevant discussion from the scikit-learn mailing list: www.mail-archive.com/scikit-learn-general%40lists.sourceforge.net/msg13102.html
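A quick sketch of the r2_score usage mentioned above, with made-up true and predicted values:

```python
from sklearn.metrics import r2_score

# Made-up true and predicted values
y_true = [3.0, 2.5, 4.0, 5.1]
y_pred = [2.8, 2.7, 4.2, 5.0]

# R-squared: 1.0 is a perfect fit; can be negative for very poor fits
r2 = r2_score(y_true, y_pred)
print(round(r2, 3))
```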
Fantastic tutorial series for Python beginners... Can you please start teaching us deep learning and neural networks? I learned pandas and NumPy from your tutorials. Thanks a lot, man!
Thank you for your explanation; it's very clear. But what I don't understand is that you say the algorithm you are working with is called linear regression. But if you predict the dependent variable (Y) from multiple independent variables (x1, x2, etc.), then we are dealing with multiple linear regression, right? Can you please explain why that is not the case?
Guys, if anyone is getting an error on this line: sns.pairplot(data, x_vars=['TV','radio','newspaper'], y_vars='sales'), you need to use the exact same column names (they are case-sensitive) in the x_vars and y_vars arguments.
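To make the case-sensitivity point concrete, a tiny pandas check on a toy version of the data (made-up values); the same rule applies to the x_vars/y_vars you pass to sns.pairplot:

```python
import pandas as pd

# Toy version of the advertising data (made-up values)
data = pd.DataFrame({'TV': [230.1, 44.5], 'radio': [37.8, 39.3],
                     'newspaper': [69.2, 45.1], 'sales': [22.1, 10.4]})

# Column lookups are case-sensitive, which is what triggers the KeyError
print('Sales' in data.columns)   # False
print('sales' in data.columns)   # True
```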
First of all I would like to thank you for these amazing videos :D My question is, do you know why this is an issue now and why you don't have a problem with it in your video?
Your material is second only to "Introduction to Statistical Learning" so far for me. You know your subject "to the core" and recommend resources that I have already collected (so I will value your judgement :). Do you have a recommendation for a cheat sheet for hacking around in python notebook? I'm keeping all my notes there, but only just learned how to add an image when you showed the iris. I really don't have time to read up on python properly. Thanks!
+V Kandinski What a nice compliment, thank you! Are you asking for a cheat sheet about the notebook itself, or the Python language as a whole? For the notebook, I have a brief list of keyboard shortcuts here, plus links to some good resources: github.com/justmarkham/scikit-learn-videos/blob/master/02_machine_learning_setup.ipynb For the Python language, this is kind of like a cheat sheet: www.dataschool.io/python-quick-reference/ Hope that helps!
Thank you for the awesome videos, clear and to the point. However, I have a question regarding the retraining for the feature selection part (starting at 30:31): won't it introduce data snooping bias when retraining to pick different features?
SungDeuk Park For self-study, this book is excellent if you want to go deeper into machine learning: www-bcf.usc.edu/~gareth/ISL/ For getting better at Python (especially Pandas), this book is very good: shop.oreilly.com/product/0636920023784.do