
Getting started in scikit-learn with the famous iris dataset 

Data School
Subscribers: 241K
Views: 252K

Now that we've set up Python for machine learning, let's get started by loading an example dataset into scikit-learn! We'll explore the famous "iris" dataset, learn some important machine learning terminology, and discuss the four key requirements for working with data in scikit-learn.
Download the notebook: github.com/justmarkham/scikit...
Iris dataset: archive.ics.uci.edu/ml/dataset...
scikit-learn dataset loading utilities: scikit-learn.org/stable/datasets/
Fast Numerical Computing with NumPy (slides): speakerdeck.com/jakevdp/losin...
Fast Numerical Computing with NumPy (video): • Losing your Loops Fast...
Introduction to NumPy (PDF): www.engr.ucsb.edu/~shell/che21...
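
The description mentions four key requirements for working with data in scikit-learn without listing them. As a rough sketch, assuming they are the usual ones (features and response stored as separate objects, numeric values, NumPy arrays, and matching shapes), the built-in iris data can be checked like this:

```python
import numpy as np
from sklearn.datasets import load_iris

# Load the built-in iris dataset (returned as a Bunch object).
iris = load_iris()

# Features and response are kept as separate objects.
X = iris.data    # feature matrix: one row per observation
y = iris.target  # response vector: one value per observation

# Both should be numeric NumPy arrays.
print(type(X), type(y))
print(X.dtype, y.dtype)

# The first dimension of X must match the length of y.
print(X.shape)  # (150, 4)
print(y.shape)  # (150,)
```

The capital `X` and lowercase `y` follow the common matrix/vector naming convention; the names themselves are not required by scikit-learn.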
WANT TO GET BETTER AT MACHINE LEARNING? HERE ARE YOUR NEXT STEPS:
1) WATCH my scikit-learn video series:
• Machine learning in Py...
2) SUBSCRIBE for more videos:
ru-vid.com?su...
3) JOIN "Data School Insiders" to access bonus content:
/ dataschool
4) ENROLL in my Machine Learning course:
www.dataschool.io/learn/
5) LET'S CONNECT!
- Newsletter: www.dataschool.io/subscribe/
- Twitter: / justmarkham
- Facebook: / datascienceschool
- LinkedIn: / justmarkham

Published: Jul 10, 2024

Comments: 308
@dataschool 3 years ago
Having problems with the code? I just finished updating the notebooks to use *scikit-learn 0.23* and *Python 3.9* 🎉! You can download the updated notebooks here: github.com/justmarkham/scikit-learn-videos
@mliuzzolino 8 years ago
Your clarity in speech and your thorough explanations are absolutely outstanding. I've spent 5 years at university and have recently finished a triple major in math and sciences, and I must say that you are an excellent educator. Your teaching skills easily surpass those of the majority of professors I've experienced. Very nice work. I have been accepted into a CompSci PhD program for this upcoming fall semester, and I am extremely interested in machine learning. Consequently, I have been trying to self-learn data science in the meantime to better prepare for my studies, and finding good tutorials that offer lucid explanations has been quite difficult. Andrew Ng's course is wonderful, but lacks the Python element, a language I am very keen on and will be using in my graduate studies. Others are difficult to follow, put together in haste, or offer no explanations - just code. Thank you for the effort you've put into creating these videos, and for your incredible teaching abilities. Keep up the fantastic work, and I wish you the best of luck for your future endeavors - although I'm sure your skill is what will enable your success, not luck!
@dataschool 8 years ago
+Michael Iuzzolino WOW! Thank you for your incredibly thoughtful and generous comments. I take pride in my teaching, and it is truly a delight to hear it appreciated so thoroughly. In a few weeks, I will be announcing an updated and expanded version of this course, in case you are interested: www.dataschool.io/learn/ Best of luck to you as well!
@jnscollier 8 years ago
Just want to say you are an amazing speaker/presenter. It's inspiring to see thoughts in a human so collected and clear-headed. I can only aspire to.
@dataschool 8 years ago
+jnscollier Wow, thank you! I greatly appreciate the compliment!
@fahdciwan8709 4 years ago
Thanks a ton brother !! one of the best tutorials for a python library. the clarity in explanation is 10/10
@abhimanyukatyayan 8 years ago
It can't get any simpler than this. You are an awesome teacher. I've seen many tutors and videos; no one is so concrete and simple, and your explanation style is outstanding. Looking forward to some complex ML topics.
@dataschool 8 years ago
+abhimanyu katyayan What a nice thing for you to say! I really appreciate it.
@kelliesukhi9921 8 years ago
Goodness. These videos are amazingly useful and the best of their kind I've found online. Thanks, Data School, for putting them together!
@dataschool 8 years ago
Wow, thanks so much for your kind comment! I'm glad the videos have been so useful to you!
@RayedWahed 8 years ago
I never imagined something as archaic as machine learning could be taught with such ease and grace. Everything about your videos is spot on. From your lectures, to the development environment, to the recommendations and links. Love the follow up links. So helpful. Can't wait for more!!!
@dataschool 8 years ago
+Rayed Bin Wahed Wow! Thanks so much for your kind and generous comments!
@manjuappu89 8 years ago
Your explanation is so good; referring to 10 minutes of this content helps me understand things much better.
@dataschool 8 years ago
Excellent! I'm very glad to hear.
@MichaelSartore 4 years ago
Clear and concise explanations. You've covered things other tutorials I've seen have missed. Thanks!
@dataschool 4 years ago
Thanks very much for your kind words!
@yuvaraj2457 3 years ago
I never thought that someone would teach so clearly. Even a layman would understand this stuff. Great, thumbs up. You mentioned the data has to be in a particular shape, but what if some data is missing and not in a proper shape?
@gonzalomolina2988 7 years ago
I recently found this tutorial series while I'm looking for something to learn about machine learning and I'll just say Thank you! This is especially useful for those people whose mother language isn't English! Thanks again!
@dataschool 7 years ago
You're very welcome! So glad to hear that it has been helpful to you!
@anarefin 9 years ago
I was looking for some easy but effective tutorials on ML. I have no doubt this series will be what I was looking for. Thank you very much for taking this initiative. (Y)
@wenliangzhang2986 9 years ago
Thank you so much for the tutorial. I love it! You speak so clearly and concisely! I can't wait for your next video!
@dataschool 9 years ago
Wenliang Zhang Awesome! What a nice comment to receive, thank you! :)
@alexandermrkich8734 2 years ago
Thank you for explaining this slowly - it makes it very easy to follow.
@dataschool 2 years ago
Great to hear!
@louisebuijs3221 3 years ago
I'm just pausing the video to say that your explanation couldn't be better. So clear! THANK YOU !!!
@dataschool 3 years ago
Thank you! 🙌
@kannanv8831 4 years ago
A smooth and calm voice. It is easy to absorb.
@dataschool 4 years ago
Thanks!
@lakshaywadhwa 6 years ago
The greatest video I have ever watched, because it gives clarity and full understanding and, most importantly, also provides resources. Hats off to you, man. What else does one need?
@dataschool 6 years ago
Thanks so much for your kind comment!
@prashanthreddy8537 5 years ago
Thank you for making some really structured and good content. I have learned to use pandas in python by watching your videos. The way you explain is clear and structured. Please, keep making more videos so that most people like me will be able to learn even from halfway across the world. Keep up the good work!
@dataschool 5 years ago
Thank you so much for your kind comment! So glad to hear I've been helpful to you 🙌
@michelepaglialonga9382 6 years ago
thank you! In my opinion, the best tutorial on the topic I found on the net. I know Python and I'm approaching machine learning ... very nice
@dataschool 6 years ago
Thanks so much for your kind words!
@navinkamdar961 7 years ago
I have seen a lot of stuff on machine learning but couldn't get it. Your work is amazing. You let me know that I can learn ML too. Thank you soooooo much.....
@dataschool 7 years ago
Awesome! I'm so glad to hear! Thanks so much for your kind comments :)
@dataschool 6 years ago
Hi Navin, I have a favor to ask of you... would you mind emailing me? Thank you so much! Email address - kevin at dataschool dot io
@sgbalakrishna 8 years ago
you are the best teacher :) Thanks for these videos
@dataschool 8 years ago
+sgbalakrishna Thanks, and you're welcome! :)
@jasinthasravanthi448 6 years ago
Just Love the way you teach. Point to Point.. Thanks a lot for the series
@dataschool 6 years ago
You're very welcome! :)
@julianferry3059 8 years ago
Thank you so much for these videos. This is exactly what I needed to get started with Machine Learning
@dataschool 8 years ago
Great to hear! Good luck with your education :)
@albertgao7256 8 years ago
Thanks so much for the great video, and specially for introducing the , now I finally know why should we use Numpy! And seems very simple to use!
@dataschool 8 years ago
+Albert Gao You're very welcome!
@bardeeaaaa 6 years ago
Hands down the best machine learning tuts on youtube
@dataschool 6 years ago
Thanks very much!
@tatavares1985 7 years ago
Excellent classes!! Amazing for those who have English as a second language like me!! Tks a lot!
@dataschool 7 years ago
You're very welcome! I'm glad the videos are helpful to you!
@victorblaer 7 years ago
Quick tip for new users to Python like me: in Python 3 you need to wrap the arguments to your print statement/function in parentheses. So, print iris.data won't work but print(iris.data) will.
@dataschool 7 years ago
Thanks so much for sharing with others!
@dataschool 6 years ago
I recently updated the code to use Python 3.6. The updated code can be found here: github.com/justmarkham/scikit-learn-videos
@aromax504 8 years ago
Absolutely brilliant and clutter less. Thank you
@dataschool 8 years ago
+aromax504 You're very welcome!
@AsifMehedi 9 years ago
Excellent - well thought out and lucidly explained.
@dataschool 9 years ago
Asif Mehedi Thanks! It takes a lot of planning, but it's great to hear when the videos are helpful to people :)
@AsifMehedi 9 years ago
Data School I can imagine how much preparation must go behind this. As someone here commented, this series is turning out to be one of the best introductions to ML ever. Keep up the great work!
@ankitvashisht3519 5 years ago
This is "REALLY AWESOME".....Great work...Thanks a lot for these awesome tutorials..
@dataschool 5 years ago
Thanks! :)
@maniesha4599 5 years ago
Very helpful videos- clear and precise. Thanks!
@dataschool 5 years ago
You're welcome!
@dataschool 6 years ago
*Note:* This video was recorded using Python 2.7 and scikit-learn 0.16. Recently, I updated the code to use Python 3.6 and scikit-learn 0.19.1. You can download the updated code here: github.com/justmarkham/scikit-learn-videos
@TheAlderFalder 5 years ago
Hey Kevin, for some reason your updated jupyter code at 1:40 doesn't work. (There is no output, instead a window pops up, suggesting to save the iris data file) I've also tried to write the HTML code, like in this video, instead of using IFrame... same problem. (Not a huge deal, but would be nice if you could clarify)
@storyfactory1603 4 years ago
Please do a video about LabelEncoder and OneHotEncoder.
@AD-sf1vn 7 years ago
Thank you: you speak very clearly and slowly, which is perfect for a non-native English speaker. And the course is very clear and step by step: perfect.
@dataschool 6 years ago
You're very welcome! I'm glad my videos are helpful to you!
@TarunPrasadKottary 7 years ago
Awesome stuff man! very clean explanation and easy to learn. cheers
@dataschool 7 years ago
Thanks very much!
@usmanshaikh1115 5 years ago
Very useful and easy to understand. Thank you
@dataschool 5 years ago
You're welcome!
@incanberra 4 years ago
These videos are so useful. Thanks.
@dataschool 4 years ago
Thanks for your kind comment!
@ohserra 8 years ago
You are a great professor! I hope all the best success to you ;)
@dataschool 8 years ago
+Diogo Gonçalves Thanks, I wish you success also!
@doupanpan7271 6 years ago
thank you very much to post the videos, you are my role model :)
@dataschool 5 years ago
Ha! Thanks! :)
@pavanarameshchandra8531 6 years ago
Thank you. . You are gifted , you made it simple
@dataschool 6 years ago
You are very welcome!
@chanukyasai2860 4 years ago
your voice is amazing and explanation is also very good....you just seem like Sheldon while explaining
@mojtabavahdati7648 4 years ago
This video helped me a lot. Thanks.
@dataschool 4 years ago
You're welcome!
@shivayshakti6575 2 years ago
Dude was doing epic shit in 2015, salute :)
@dataschool 2 years ago
😄
@financialfreedom6832 9 years ago
Very good explanation.. Liked it (Y) .. Thanks for other links as well..
@diegocavalcante9568 8 years ago
I Love your videos! Great Job and good luck!
@dataschool 8 years ago
+Diego Cavalcante Awesome! Good luck to you too :)
@shahjoyal4 8 years ago
Fantastic video series :)
@dataschool 8 years ago
Thanks for your kind comment!
@bernsbuenaobra473 3 years ago
Still very relevant there is just newer versions of the same thing (IPython is now Jupyter Notebooks and version numbers of the same library were expanded and improved ) code that will still work for a demo. Fundamentals don't change anyway and the delivery of the tutorial remains excellent to this day. I like to access data locally from my hard drive needing no internet to read IRIS data as HTML location but data frame for Pandas instead - it wouldn't hurt to read the text file into excel and convert that to .csv file instead.
@12vak 3 years ago
This helped a lot, thanks!
@dataschool 3 years ago
Glad it helped!
@vanmemet 7 years ago
Thanks a lot, you are a perfect teacher.
@dataschool 7 years ago
Wow, thanks for your kind comment!
@milliekim5072 5 years ago
Awesome videos!!! Thank you so much
@dataschool 5 years ago
You're welcome!
@nathantum9157 8 years ago
in this video you said that we know the target as 0,1,2 so we know it is a classification problem but not a regression problem. But according to my understanding, we can apply multi-class logistic regression in this problem.
@Dan-tf9kb 7 years ago
Very well explained
@dataschool 7 years ago
Thanks! Glad it was helpful to you.
@abhinavbhatnagar5651 5 years ago
YOUR VIDEOS are awesome
@dataschool 5 years ago
Thanks!
@GauravSharmalife 7 years ago
Amazing lectures :-) You remind me of sheldon cooper too
@dataschool 7 years ago
Thanks! :)
@bilalmohammad4242 3 years ago
Thank you very much
@SteveCollins527 8 years ago
Thanks for these videos, these are a great help for getting started! Quick question: I may have missed it, since I skipped a couple of videos because I already had Anaconda and Jupyter Notebook installed when starting, but are you using Python 2.7? I am using 3.4, and as I go through I am getting some syntax errors and such. Nothing that's too difficult to troubleshoot or anything; I just wanted to know if I need to keep that in mind throughout the series.
@dataschool 8 years ago
+Steve Collins That's correct, I am using Python 2.7 in this series. The main changes you will need to make are using "print" as a function (instead of a statement), and explicitly converting the results of the range function to a list. If you run into any other issues, I'd love to hear about it!
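The two changes named in this reply can be sketched in a few lines. This is a minimal illustration for Python 3, and the variable names are made up for the example:

```python
# In Python 3, range() returns a lazy range object rather than a list.
values = range(5)

# Convert explicitly when an actual list is needed.
as_list = list(values)

# print is a function in Python 3, so its arguments go in parentheses.
print(as_list)                      # [0, 1, 2, 3, 4]
print("first three:", as_list[:3])  # first three: [0, 1, 2]
```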
@SajidTechTidbits 6 years ago
A great resource for machine learning .....
@dataschool 6 years ago
Thanks!
@flamboyantperson5936 6 years ago
Awesome explanation
@dataschool 6 years ago
Thanks!
@putin9614 6 years ago
very nice explanation
@dataschool 6 years ago
Thanks!
@Shiva-zy7jq 5 years ago
Awesome video
@dataschool 5 years ago
Thanks!
@stargaryen3383 5 years ago
Watch this at 1.25x speed, that's much better. Thank you, sir.
@dataschool 5 years ago
Everyone has their own preference, and so I'm glad YouTube allows you to select the best speed for you!
@maggietang1369 4 years ago
Hi, Kevin, thank you for all great videos. Is there any easier way to import sklearn modules? Like from sklearn import all? Thank you!
@ernestoquisbert5046 7 years ago
Excellent
@dataschool 7 years ago
Thanks!
@Belfran 5 years ago
Thank you!
@dataschool 5 years ago
You're welcome!
@youssefnasef7736 3 years ago
I have a little question: why do we have to predict data based on information that we already have? E.g., when I defined the X matrix for the features and the y vector for the predicted data, is all we want to do to verify how accurately the computer can analyse, learn, and predict which group an observation belongs to?
@BadriNathJK 8 years ago
Your voice is amazing.
@dataschool 8 years ago
Ha, thank you! :)
@user-gx1oh9nx8j 6 years ago
sir. i am facing some problems, can you send me your E-mail, i will contact you sir. thanks indeed
@easy_3d 3 years ago
Love all your videos. Could you make videos on coding machine learning algorithms?
@dataschool 3 years ago
Thanks for your suggestion!
@musiclover21187 7 years ago
Omg. This is excellent. Will you be coming up with tensorflow tutorials?
@dataschool 7 years ago
Thanks for the suggestion - I'll consider it for the future!
@Nana19912 8 years ago
Great tutorial! I have a question if I had a dataset that is not a toy dataset -not built in sklearn-, how do i import it? what function do I use?
@dataschool 8 years ago
I recommend you use pandas to read in the dataset. Here's a video that should help you: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-5_QXMwezPJE.html
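A minimal sketch of the pandas route suggested here, assuming a small CSV with hypothetical column names. The CSV text is inlined through `io.StringIO` only so that the example is self-contained; with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """sepal_length,sepal_width,species
5.1,3.5,setosa
7.0,3.2,versicolor
6.3,3.3,virginica
"""

# Read the CSV into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Split the columns into a feature matrix and a response vector.
X = df[["sepal_length", "sepal_width"]]
y = df["species"]

print(X.shape)  # (3, 2)
print(y.shape)  # (3,)
```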
@robindong3802 6 years ago
yes, very amazing voice as well.
@dataschool 6 years ago
Thank you!
@transportation-talk 9 years ago
Great tutorial! I am sure this series will become an excellent resource for all machine learning enthusiasts. Bundle of thanks! I have a question: What if there are more than 1 observations (rows) associated with an outcome? For example, we want to predict if a certain vehicle is a car or truck using labelled data and each vehicle has, say, 10 observations in a second? Would we consider all of them in the X matrix?
@dataschool 9 years ago
umair durrani That's a great question. It is required that each logical "unit" in your dataset is represented by a single row of feature data and a single outcome. So in your example, where the logical unit is "vehicle", you have to find a way to fit all of the feature data about that vehicle into that single row. It can be a challenging task to turn "time series data" (as you described) into single features, but that's ultimately what you have to do. Depending upon what those "10 observations in a second" represent, you might use the average of those numbers as one feature, the standard deviation of those numbers as a second feature, and the min and max of those numbers as two more features. What I'm describing is called "feature engineering", and it is an art rather than an exact science. But as you can see, by calculating these features, I can fit that time series data into a single row of my feature matrix X. Let me know if that helps. Thanks for your kind words!
@transportation-talk 9 years ago
Data School Thanks for the explanation. I am understanding your description of feature engineering which is quite interesting. The possible problem is that there is considerable variation of the features for the same vehicle across the time. So, maybe classification might not be appropriate for this task. I hope you'll introduce other techniques in future videos. Thanks for your time.
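The feature-engineering idea from the reply above (mean, standard deviation, min, and max of each vehicle's readings) can be sketched with the standard library alone. The readings, vehicle names, and labels below are invented for illustration, and `summarize` is a hypothetical helper rather than anything from scikit-learn.

```python
from statistics import mean, stdev


def summarize(readings):
    """Collapse one vehicle's time series into a fixed-length feature row."""
    return [mean(readings), stdev(readings), min(readings), max(readings)]


# Hypothetical data: 10 readings per vehicle captured within one second.
readings_by_vehicle = {
    "vehicle_a": [12.1, 12.4, 12.3, 12.8, 13.0, 12.9, 12.7, 12.5, 12.6, 12.2],
    "vehicle_b": [8.0, 8.1, 7.9, 8.3, 8.2, 8.0, 7.8, 8.1, 8.4, 8.2],
}
labels_by_vehicle = {"vehicle_a": "car", "vehicle_b": "truck"}

# One row of four features per vehicle, plus one label per vehicle.
X = [summarize(r) for r in readings_by_vehicle.values()]
y = [labels_by_vehicle[name] for name in readings_by_vehicle]

for name, row in zip(readings_by_vehicle, X):
    print(name, row)
```

Each vehicle ends up as a single row regardless of how many raw readings it had, which is the shape a feature matrix needs.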
@claire2247 2 years ago
Thank you for your videos! Can you tell me, is .target_names an action, and one that can be used on all scikitlearn datasets in Python? Or is it specific to the iris dataset? Thanks again!
@dataschool 2 years ago
Great question! target_names is available for all of scikit-learn's built-in datasets that are intended for classification: scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets
@claire2247 2 years ago
@@dataschool Thank you!
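To make this exchange concrete: `target_names` is an attribute of the object a loader returns, not a function you call. A short sketch, using `load_wine` as an arbitrary second built-in classification dataset:

```python
from sklearn.datasets import load_iris, load_wine

# target_names is read as an attribute, so it takes no parentheses.
for loader in (load_iris, load_wine):
    dataset = loader()
    print(loader.__name__, list(dataset.target_names))
```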
@KaSulaiman 6 years ago
Good explanation. You need to keep it up the good work. Sadly though, we rarely can find methods on how to create dataset exactly like iris or similar rather than explaining from finished data onwards. Mostly, it would be at the extreme ends i.e. either show how to create data set or from a ready dataset to machine learning which hopefully won't deteriorate the passion of beginners to improve their skills.
@dataschool 6 years ago
I agree that it's important to have tutorials that involve more "real world" datasets, though I also find that you have to start with the basics (simple datasets) before moving on to more complicated examples. Thanks for your comment!
@tinghofung 7 years ago
I am a beginner in machine learning. The concepts are well-explained, thank you! Btw, I watched the video with 1.5x speed. haha
@dataschool 7 years ago
You're very welcome! I'm glad the videos are helpful to you at any speed :)
@geethasaikrishna8286 7 years ago
Hi, Thanks for the video series. They are really helpful. In the video above you have described that scikit learn always takes numeric values, we need to convert the categorical to numeric. But how would it segregate between which columns have numeric values & which other columns are categorical. Please share some links to read more about this topic if you have some
@DontTakeCrack 5 years ago
BTW, I really enjoy this speed for education purposes. Rarely do I get an opportunity to actually digest each sentence properly.
@dataschool 5 years ago
That's my goal! Thank you so much for your comment 👍
@RajeshSharma-bd5zo 6 years ago
Hi Kevin, how can I feed text data into ML model. Like what pre processing steps I need to perform over that data so that our model will be able to understand it. Or how we deal with such kind of data to train our model?
@dataschool 6 years ago
This video will be perfect for you: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-ZiKMIuYidY0.html
@hanyerfan8204 5 years ago
amazing =)
@dataschool 5 years ago
Thanks!
@python_by_abhishek 4 years ago
I love your video series. Can you make a video teaching the above using .csv file and not database. Also, while predicting accuracy of model made, is there any method to make train data and test data? If it's not mentioned which column is my target data then what is the approach to determine the accuracy? Please make a lecture on this.
@dataschool 4 years ago
Thanks for your suggestion!
@aniruddhasamant7640 5 years ago
I have a question: when we do a normal train/test split on a dataset, our X and y are pandas Series objects and not NumPy arrays, but they still work. Please elaborate on this point.
@dataschool 5 years ago
That's right, and it's because scikit-learn understands how to access the NumPy arrays that underlie DataFrames.
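A small sketch of the behaviour described in this reply, using the iris data wrapped in pandas objects. The choice of `KNeighborsClassifier`, the split size, and the `random_state` are arbitrary assumptions for the example.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# Wrap the NumPy arrays in pandas objects.
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target, name="species")

# train_test_split accepts pandas objects and returns pandas objects.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The estimator also accepts them directly; predictions come back as a NumPy array.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
predictions = knn.predict(X_test)

print(type(X_train).__name__, type(y_train).__name__)
print(type(predictions).__name__, predictions.shape)
```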
@suzitbiswas2652 8 years ago
Great video. Your teaching style is awesome, and so are your explanations. I am new to machine learning, so I find some things difficult to understand. Please give your advice on improving my understanding.
@dataschool 8 years ago
+Suzit Biswas Glad it is helpful to you! Have you already watched this video? ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-elojMnjn4kk.html It gives a brief introduction to machine learning. Let me know what you think!
@suzitbiswas2652 8 years ago
+Data School Thank you very much. now i have seen this video.
@anarefin 9 years ago
As every week only a single video will be published, can you please share some other resources as study materials? It might be any video/book or whatever you like to share. Thanks again!
@arsenalacid 7 years ago
Hi i really am enjoying this video series so thank you for that. I am just learning python at the moment do you think its harder to get a job in machine learning compared to say django? I find machine learning more interesting but I feel its harder and there are fewer jobs in the market. Im from UK btw, thanks again.
@dataschool 7 years ago
I think that it's harder to get a job in machine learning not because there are fewer jobs, but because machine learning has a higher barrier to entry in terms of academic skills. However, I'd advise you to choose the path that interests you the most. Hope that helps!
@bicepjai 9 years ago
Thanks for the material. If you are following ISLR book, could you share the chapters that we might read along the way, just like how you added to the first video.
@dataschool 9 years ago
Jayaram Prabhu Durairaj I love the ISLR book, though this video series doesn't follow the book. However, I'll do my best to include resources that go along with each video for those people who want to go deeper!
@kostasnikoloutsos5172 6 years ago
Wow, I think x,y are like in math in high-school when we set an independent variable X (which is the observation right now) and we were getting a result y(the target).
@dataschool 6 years ago
Sounds similar to me!
@sapnapatil27 7 years ago
Hi, Very nice video and well explained. As mentioned sklearn only uses numeric for features and response. How can we convert values into numeric for classification or regression? I know we can manually assign 1,2,3 .... for each value but does sklearn has a library which can convert categorical to numerical.
@dataschool 7 years ago
This video will guide you how to convert categorical variables to numeric variables using pandas: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-0s_1IsROgDc.html It can also be done using scikit-learn, see here: scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features Hope that helps!
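As a rough illustration of the pandas approach mentioned here, the sketch below encodes one ordered category with an explicit mapping and one unordered category with dummy (one-hot) columns. The DataFrame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical data with two categorical columns.
df = pd.DataFrame(
    {
        "size": ["small", "large", "medium", "small"],
        "color": ["red", "blue", "red", "green"],
    }
)

# Ordered categories: map each level to a number that preserves the order.
df["size_num"] = df["size"].map({"small": 0, "medium": 1, "large": 2})

# Unordered categories: create one 0/1 column per level.
color_dummies = pd.get_dummies(df["color"], prefix="color", dtype=int)

# Combine the numeric columns into a single feature table.
encoded = pd.concat([df[["size_num"]], color_dummies], axis=1)
print(encoded)
```

Dummy columns avoid implying an order between colors, which a plain 1, 2, 3 coding would do.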
@adrianmanuelorayeabrasaldo 4 years ago
Can you also do a tutorial on how to create your own dataset, that would be amazing :D
@dataschool 4 years ago
I recommend just using a CSV file.
@sonalivv 7 years ago
Thanks!! This video was exactly what I was looking for. You mentioned that you will explain how to import our own dataset. Can you please point me to that video?
@dataschool 7 years ago
Glad the video is helpful to you! I think the video you are looking for is this one: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-3ZWuPVWq7p4.html Also, you may want to check out my pandas video series: ru-vid.com/group/PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y Hope that helps!
@sonalivv 7 years ago
Thanks!!
@abhishekkumaryadav652 3 years ago
I want you to tell me what did you do for your vocal it's natural ,it's so cool
@dataschool 3 years ago
Thanks!
@umarnasir1 8 years ago
1. Hi, at 08:10, you said that predicting whether an email is spam or ham is also an example of a classification problem. I don't understand how that is; can you please elaborate? 2. One other question I have: in my experiment, the target can be coded by either of two features, one numerical discrete and one numerical continuous (it can be coded by both individually; it's up to me which feature to use to code my target). Will it then be a regression case or a classification case? I am asking this because you said that if the response being predicted is ordered and continuous then it's a regression case. (So if I choose the continuous feature, which is rounded to the nearest decimal, can it be a regression case?)
@dataschool 8 years ago
+umar0021
1. A classification problem is one in which the response value is categorical. In this problem, the response value is either "ham" or "spam", which are categories.
2. Here are some scenarios:
- If your target (also known as "response value") is continuous, you would use a regression model.
- If your target is discrete numbers, but they represent categories with no logical order, you would use a classification model.
- If your target is discrete numbers, and they represent categories with a logical order, you could use a classification model or a regression model. The downside of using classification here is that the models are not aware of the ordering. The downside of using regression here is that you have to round the regression output so that the predicted values can be converted back to categories.
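The rounding step mentioned in the last scenario might look like the sketch below. It assumes the ordered categories are coded 0, 1, 2. The helper name `to_category` and the sample predictions are made up, and clipping to the valid range is an extra assumption, added because a regression model can predict values outside it.

```python
def to_category(prediction, lowest=0, highest=2):
    """Round a continuous prediction to the nearest valid category code."""
    rounded = int(round(prediction))
    # Clip so that out-of-range predictions still map to a real category.
    return max(lowest, min(highest, rounded))


# Hypothetical raw outputs from a regression model.
raw_predictions = [-0.3, 0.4, 1.2, 1.8, 2.7]

categories = [to_category(p) for p in raw_predictions]
print(categories)  # [0, 0, 1, 2, 2]
```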
@ericgilkey3549 6 years ago
Thanks for the tutorial. However, I'm interested in using data from my own excel table (so CSV file) rather than a pre-compiled dataset. Is there a way to convert this to a Bunch form that scikit requires?
@dataschool 6 years ago
Great question! The right approach would be to use pandas rather than creating a bunch object. Here's a video that introduces pandas for use with scikit-learn: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-3ZWuPVWq7p4.html And if you want to learn more pandas, I have a huge video series: ru-vid.com/group/PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y Hope that helps!
@ericgilkey3549 6 years ago
Thank you very much for the reply. I actually came across these videos shortly after I asked the question. That's what I get for asking too soon!
@dataschool 6 years ago
Great to hear! Hope you enjoy the videos!
@Davidemmanuelkatz 7 years ago
Will classification problems only deal with mutually exclusive sets?
@dataschool 7 years ago
Are you asking if you can have a classification problem in which a given observation belongs to more than one class? If so, the answer is yes, and it's called multilabel classification. scikit-learn does have some support for multilabel classification. You can read more here: scikit-learn.org/stable/modules/multiclass.html Hope that helps!
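One way to picture a multilabel target is as a binary indicator matrix with one column per class. The sketch below builds such a matrix with scikit-learn's `MultiLabelBinarizer`; the tag sets are invented for illustration.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical observations, each belonging to one or more classes.
labels = [{"python", "ml"}, {"ml"}, {"pandas", "python"}]

# Convert the label sets into a 0/1 indicator matrix.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

print(list(mlb.classes_))  # ['ml', 'pandas', 'python']
print(Y.tolist())          # [[1, 0, 1], [1, 0, 0], [0, 1, 1]]
```

A row with more than one `1` is an observation that belongs to several classes at once.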
@architgupta3d 5 years ago
You remind me of Sheldon Cooper. Nice video btw.
@dataschool 5 years ago
Ha! I have heard that many times 😜
@abhinavbhatnagar5651 5 years ago
Are you also going to create a few videos on unsupervised learning and dimensionality reduction concepts?
@dataschool 5 years ago
That's not in my plans, but thanks for your suggestion!
@sandeeppreetam 4 years ago
Are response and target the same?
@ShreyasAgarwal 7 years ago
I am not able to use the scikit-learn module as skllearn. I get an error saying: no module named "skllearn" found.
@dataschool 7 years ago
That should be 'sklearn', not 'skllearn'. Hope that helps!
@datascienceds7965 6 years ago
Great tutorial. Can I ask how you would set feature_names and target_names on a dataset? Sorry if that is a stupid question! I was trying data.feature_names on a dataset downloaded from AdventureWorks2012, and it gives me the error below: AttributeError: 'DataFrame' object has no attribute 'feature_names'
@dataschool 6 years ago
Great question! You wouldn't set those attributes on your own dataset. I think one of these videos would be helpful to you: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-3ZWuPVWq7p4.html ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-ylRlGCtAtiE.html
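As a hedged side note on the error above: a pandas DataFrame has no `feature_names` attribute, but the same information is available from its column names. The sketch below uses a made-up DataFrame with hypothetical columns `height`, `width`, and `label`.

```python
import pandas as pd

# Made-up data standing in for your own table.
df = pd.DataFrame({
    "height": [1.0, 2.0],
    "width": [3.0, 4.0],
    "label": ["a", "b"],
})

# The column names play the role of feature_names ...
feature_names = [col for col in df.columns if col != "label"]

# ... and the distinct values of the response column play the
# role of target_names.
target_names = sorted(df["label"].unique())

print(feature_names)
print(target_names)
```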
@datascienceds7965 6 years ago
Thanks for your reply. Great videos.
@dataschool 6 years ago
Thanks!
@Achooification 7 years ago
In playing with scikit-learn I found that it doesn't much mind strings as targets. To test this I created a simple classifier of colors where I gave it the HTML codes of a few basic colors; then, when I fed it the codes of other colors, it would classify, for example, pink as a type of red, or aqua as a type of blue. In doing this I gave it a list of color names as the target. During prediction it returns the string within an array marked with a byte-type. In principle, why is this a bad practice? Thank you for your videos.
@dataschool 7 years ago
Great point! There's no reason that this is a bad practice. In the "old days", the target for some classification models had to be a number (not a string). However, I think that this issue has since been addressed, and now it's likely that all classification models can accept strings as the target.
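A small sketch of the color experiment described above, with string targets passed directly to a classifier. The RGB values, the color names, and the use of `KNeighborsClassifier` are assumptions for illustration; the original commenter's code is not shown in the thread.

```python
from sklearn.neighbors import KNeighborsClassifier

# A few basic colors as RGB triples (made-up training data).
X = [[255, 0, 0], [200, 30, 30], [0, 0, 255], [30, 30, 200]]

# The target is a plain list of strings, not numbers.
y = ["red", "red", "blue", "blue"]

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X, y)

# Classify pink (255, 192, 203) and aqua (0, 255, 255).
pred = knn.predict([[255, 192, 203], [0, 255, 255]])
print(pred)
```

The predictions come back as a NumPy array of strings, matching the labels used during fitting.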
@Kingofqueers1 5 years ago
How can I make my own dataset in scikit-learn? Do I have to make an Excel file first and then download it as a CSV?
@dataschool 5 years ago
This video might help: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-3ZWuPVWq7p4.html
@sokhibtukhtaev9693 7 years ago
Thanks for the amazing and well-explained tutorials. I have a dataset of depth information from Kinect, and the files have a .bin extension. How can I load and massage them?
@dataschool 7 years ago
Glad you like the tutorials! I don't know how to load 'bin' files, I'm sorry! Perhaps someone online has written a library that can help you to extract the data from bin files into a usable format.
@sokhibtukhtaev9693 7 years ago
Thanks anyway! You are awesome!
@dataschool 7 years ago
Thanks! :)
@zjzhuang2981 7 years ago
Hi, are there videos about how to prepare the raw data before building a model, such as how to turn a categorical target into numbers? I'd also like to ask how the model can tell that the target numbers represent categories rather than numeric values, so that there is no scale or distance between two target numbers.
@dataschool 7 years ago
Regarding your first question, you could use the pandas map method: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-P_q0tkYqvSk.html Or, you could use LabelEncoder from scikit-learn: scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html Regarding your second question, scikit-learn will infer that the numbers represent categories as long as you use a classification model. Hope that helps!
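The two options mentioned in the reply above can be sketched as follows. The species names and the integer codes in the mapping are example values chosen for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# A categorical target stored as strings (made-up values).
species = pd.Series(["setosa", "versicolor", "virginica", "setosa"])

# Option 1: the pandas map method with an explicit dictionary,
# so you control which number each category gets.
mapped = species.map({"setosa": 0, "versicolor": 1, "virginica": 2})

# Option 2: LabelEncoder assigns integer codes automatically,
# in sorted order of the class names.
encoder = LabelEncoder()
encoded = encoder.fit_transform(species)

print(mapped.tolist())
print(encoded.tolist())
print(list(encoder.classes_))
```

With `LabelEncoder`, `encoder.inverse_transform` can convert predicted codes back to the original category names.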
@zjzhuang2981 7 years ago
I know it's not easy to review all the comments and reply to them, so I really appreciate your detailed and friendly reply. Thank you!
@zjzhuang2981 7 years ago
Thanks!
@dataschool 7 years ago
You're very welcome! And you're right, it is a lot of work, but I'm happy to help :)