If you're using Seaborn 0.11.0+, check out my videos all about the new Seaborn distribution plots: displot (ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-4DA_dgc521o.html) and histplot (ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-Bjz00ygERxY.html) 😄
Hello, the title is misleading: you haven't interpreted the plots, you've just shown how to make distplots. The important part is interpreting the plots.
Seaborn 0.11.0 deprecates distplot and suggests using histplot or displot instead, but several of distplot's arguments are not supported by displot 😅
Yes -- the most recent Seaborn update was a big one! The distribution plot is now called the displot and a simple histogram plot now exists, the histplot. The displot is supposed to be more similar to the catplot and the relplot, so some arguments were removed (for example, fit) while others only work if "kind" matches (for example, bins only works for kind="hist"). Planning to do videos on both the updated displot and the histplot in the future!
As far as I know, binwidth and binrange were introduced with the seaborn histplot (in seaborn version 0.11.0) -- the distplot only has the bins argument. So, you can pass in your own custom bin list to the bins argument, but just not the binwidth and binrange options. And I agree - it was a bummer that the "fit" argument was removed!
Great question! I installed nbextensions (jupyter-contrib-nbextensions.readthedocs.io/en/latest/) which allows for additional functionality in Jupyter Notebook. Those dropdowns are an extension called "Collapsible Headings" (jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/collapsible_headings/readme.html).
You're absolutely amazing in everything. The best part of your videos is a really sweet voice and the explanations are awesome!! Thank you so much Kimberly. Your channel is quite underrated.
Hey, great work here, really awesome! A small suggestion: please update some of the code for the new version, either in the description box or in the video. I'm kind of stuck with a huge warning box.
Thank you! And yes, a few months after I launched this series, Seaborn underwent a big revamp in version 0.11.0. Will definitely take your suggestion into consideration, but I did go ahead and make new videos about the updated functions: displot (ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-4DA_dgc521o.html) and histplot (ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-Bjz00ygERxY.html) 👍
I would probably use matplotlib to add a mean line. First import pyplot (from matplotlib import pyplot as plt) and calculate your mean (say, m = cars.horsepower.mean()). Then you can use pyplot to add a vertical line at the mean (plt.axvline(m)). This code can be added right after your seaborn figure in the same Jupyter Notebook cell. I also have a video about adding vertical or horizontal lines with matplotlib if you'd like to learn more: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-xKeu1W2mn64.html
Sure thing - I mostly use datasets that are included in the seaborn library, and those datasets are also publicly available through GitHub (github.com/mwaskom/seaborn-data). In this particular video, I used the "mpg" dataset from seaborn (github.com/mwaskom/seaborn-data/blob/master/mpg.csv).
Great job Kimberly, Please keep on adding more tutorials. Really liked the way you teach the concepts and the code. Already Subscribed and will look out for more videos on ML and Data Science from you!!!
Hi -- Your best bet is probably to use pandas to filter down to the apples first and then do a displot of the apple weights. So if your dataframe, df, has two columns "fruit_type" and "weight", you could do a displot on df[df.fruit_type == "apple"].weight to draw out a distribution plot for just the apple weights. This assumes that you have several apple weight measurements.
Distplot is used to show distributions of numerical values. Since "name" in this dataset has descriptive categories, distplot won't be able to show you much. You could perhaps build a barplot to count up the number of cars from each make, which is the first word in name: cars.name.apply(lambda x: x.split()[0]).value_counts().plot(kind='bar')
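Here is that one-liner spelled out step by step. The `cars` dataframe below is a tiny invented stand-in with a `name` column, so the counts are only illustrative.

```python
import pandas as pd

# Tiny made-up stand-in for the cars dataframe
cars = pd.DataFrame({
    "name": [
        "ford torino",
        "ford pinto",
        "toyota corolla",
        "buick skylark 320",
        "toyota corona",
    ]
})

# The make is the first word in the name
makes = cars.name.apply(lambda x: x.split()[0])

# Count the cars per make and draw a bar chart of the counts
make_counts = makes.value_counts()
ax = make_counts.plot(kind="bar")
```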
Great video tutorials Kimberly! I am wondering what the best way is to handle bounded data with the kdeplot. For example, measurements from an instrument that will always be positive (so bounded on the left by zero). The kdeplot can show positive probability for values below zero. I understand you can limit the x-axis of the plot to start at zero, but the probability is underestimated because some of it "spills over" onto the negative side of the x-axis. I have read about (and tried) a method to include all negative values of the same dataset and the resulting kdeplot shows accurate probability density as it approaches zero. Then you can limit the x-axis to start at zero. Have you used this approach or is it better to use the scipy.stats.skewnorm?
Hi Richard - thank you for this interesting question! Yes, that is definitely a drawback of the KDE, that values can spill into unnatural areas. Seaborn has an option called "clip". If you set that to a pair of numbers, Seaborn will not evaluate density outside of the bounds you provide. But I haven't looked into the source code. Not sure if it will redistribute the density into the allowed area or just clip off the ends. Otherwise, the idea you mentioned should work! If you add in equivalent negative values and THEN lop off your x-axis, you will get an equivalent added part from the negative values even though part of the positive values are getting cut off. 👍
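For the reflection idea, here is a minimal sketch using `scipy.stats.gaussian_kde` rather than Seaborn itself (a swapped-in tool, so the bandwidth may differ from what kdeplot would pick). It assumes the data is bounded below by zero. One detail worth noting: the KDE of the mirrored dataset puts half its area on each side of zero, so the sketch doubles the heights on the positive side to get back to a total area of one.

```python
import numpy as np
from scipy import stats

# Synthetic measurements that are always positive (an assumption for illustration)
rng = np.random.default_rng(42)
data = rng.exponential(scale=1.0, size=500)

# Mirror the data around zero and fit a KDE to the combined sample
reflected = np.concatenate([data, -data])
kde = stats.gaussian_kde(reflected)

# Evaluate only on the allowed side and double the heights,
# since half of the mirrored KDE's area sits below zero
x = np.linspace(0, data.max() * 1.5, 2000)
density = 2 * kde(x)

# Numerical check (trapezoid rule) that the area is close to one
area = np.sum((density[1:] + density[:-1]) / 2 * np.diff(x))
print(round(area, 3))
```

The resulting `x` and `density` arrays can be drawn with `plt.plot(x, density)` if you want the curve on a figure.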
@KimberlyFessel thank you for the detailed reply. I think I might have tried "clip" before, but I might have to revisit. Your videos are great and truly appreciated. They have helped me a lot. Keep up the great work!
Excellent -- glad you enjoyed the video. Also glad you asked this question! xkcd is a fun comic series. About ten years ago they conducted a large-scale color survey: blog.xkcd.com/2010/05/03/color-survey-results/ The resulting named colors (xkcd.com/color/rgb/) can be accessed by matplotlib or seaborn by prepending the color name with 'xkcd:' I am planning to do a video all about color options and color palettes within seaborn soon!
Hi Kimberly, these are amazing tutorials. Just one question: at 5:23, when you make the plot right-skewed using skewnorm, how would you do it for a left skew?
Thank you! And yes - Himanshu is correct - skewnorm would automatically detect left or right skewness. Unfortunately, however, the "fit" argument I mentioned in this video has been removed from the new displot and histplot functions in Seaborn version 0.11.0.
You can add a legend to the distplot using matplotlib commands. If you have imported and aliased pyplot (from matplotlib import pyplot as plt), add "plt.legend(["label"])" as a line of code after your distplot. You can also label the kde and histogram separately by making a longer list, e.g. "plt.legend(["kde", "histogram"])".
Hi there -- you can add a line of code before your Seaborn plot to change the figure size via matplotlib: plt.figure(figsize=(10,6)). Just input your required dimensions in place of my example tuple, (10,6), and be sure to do: "from matplotlib import pyplot as plt" at the start of your code.
Hi there -- the KDEplot provides you with an estimate of your data's probability density function. The height of the graph is scaled so that the area under the curve sums (or integrates) to one.