
Master Box-Violin Plots in {ggplot2} and Discover 10 Reasons Why They Are Useful 

yuzaR Data Science
9K subscribers
3.3K views

Boxplots display a wealth of useful information about the dataset. In this video, we'll start with the most basic boxplot, build every part of this notched box-violin plot in {ggplot2} step by step, and understand why every detail matters 😉
If you only want the code (or want to support me), consider joining the channel (the join button is below every video), because I provide the code to members on request.
Enjoy! 🥳
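A minimal sketch of what a notched box-violin plot can look like in {ggplot2}. This is not the code from the video (which is not reproduced here); the built-in `iris` data and all styling choices are assumptions made purely for illustration.

```r
library(ggplot2)

# Assumption: the built-in iris data stands in for the data used in the video.
p <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  # Violin layer: shows the shape of the distribution of each group
  geom_violin(alpha = 0.3) +
  # Narrow notched boxplot on top: median, hinges, whiskers and outliers;
  # the notches give a rough visual comparison of the medians
  geom_boxplot(notch = TRUE, width = 0.25) +
  # Group means as white diamonds
  stat_summary(fun = mean, geom = "point", shape = 23, size = 3, fill = "white") +
  theme_minimal() +
  theme(legend.position = "none")

# print(p) draws the plot
```

Layer order matters here: the violin is drawn first so that the narrower boxplot and the mean marker stay visible on top of it.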
Welcome to my VLOG! My name is Yury Zablotski & I love to use R for Data Science = "yuzaR Data Science" ;)
This channel is dedicated to data analytics, data science, statistics, machine learning and computational science! Join me as I dive into the world of data analysis, programming & coding. Whether you're interested in business analytics, data mining, data visualization, or pursuing an online degree in data analytics, I've got you covered. If you are curious about Google Data Studio, data centers & certified data analyst & data scientist programs, you'll find the necessary knowledge right here. You'll greatly increase your odds of getting an online master's degree in data science & data analytics. Boost your knowledge & skills in data science and analytics with my engaging content. Subscribe to stay up-to-date with the latest & most useful data science programming tools. Let's embark on this data-driven journey together!

Published: Oct 3, 2024

Comments: 51
@zane.walker 10 months ago
A very informative (and well produced) 17 minute video. I picked up on your trick of wrapping a plot inside of a ggplotly command a video or two ago and find it very useful (wish I had discovered that earlier)! Also, some nice tips on adding mean, CI, etc. to the standard boxplots. I like using the ggbetweenstats command, which I started using after one of your earlier videos, on small sets of groups, but it doesn't always work that well with larger numbers of groups. Adding more information to standard boxplots seems like a good compromise. Very much appreciate your videos and thank you for sharing your insights!
@yuzaR-Data-Science 10 months ago
Thanks indeed for such nice feedback! I very much enjoy creating content, and the fact that it's useful for more people than just me means a lot to me! Appreciate your support!
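The two tips mentioned in this thread (adding a mean with a confidence interval to a standard boxplot, and wrapping the plot in `ggplotly()`) can be sketched as follows. This is an illustration under assumptions, not the code from the video: the `iris` data is a stand-in, and the interval is a normal approximation (mean ± 1.96 standard errors) built with ggplot2's `mean_se()`.

```r
library(ggplot2)
library(plotly)

# Assumption: iris stands in for the data used in the video.
p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  # Mean with an approximate 95% CI: mean_se() returns mean +/- mult * SE
  stat_summary(fun.data = mean_se, fun.args = list(mult = 1.96),
               geom = "pointrange", colour = "red")

# Wrapping the finished ggplot in ggplotly() turns it into an interactive widget
interactive_p <- ggplotly(p)

# print(interactive_p) opens the widget in the viewer or browser
```

Hovering over the interactive version shows the exact values behind the boxes and points, which is what makes the wrapper handy for exploration.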
@akanequeen 6 months ago
This is sooo great!!
@yuzaR-Data-Science 6 months ago
Glad you liked it! Thanks for watching!
@moviezone8130 4 months ago
You absolutely set the bar dear. I can't wait to watch it again and again. Can you share the codes as pdf or some other method so that I can practice on my own. Thanks.
@yuzaR-Data-Science 4 months ago
Thanks again for such great feedback! I am very happy it's useful! As I said in reply to your other comment, please feel free to rewatch and pause the video to write down the code yourself, since it is a good learning strategy. Better than copy-pasting. But if you wish to have the whole code, consider joining the channel (it's the join button below every video) and I'll send you the code. Kind regards!
@WilOspinoC 10 months ago
As usual, the content does not disappoint. You always keep expectations high and deliver. Dopamine and serotonin run through my body every time you upload a new video. Once again Me Yury, thank you so much for your educational work.
@yuzaR-Data-Science 10 months ago
Wow, thank you, Wil! That's by far the best feedback I have ever received! I'll try to make sure your dopamine and serotonin levels continue to rise 😉 Thanks for your support!
@tarasst6887 10 months ago
super high quality material presentation
@yuzaR-Data-Science 10 months ago
Thanks a lot, Tarass! I also enjoy creating content!
@shadyamigo 10 months ago
Would you mind checking. In the first part you say the whiskers extend to the maximum and minimum but I think the geom_boxplot doesn’t go all the way to max and minimum-hence why there are outliers. From the documentation “The upper whisker extends from the hinge to the largest value no further than 1.5 * IQR from the hinge (where IQR is the inter-quartile range, or distance between the first and third quartiles). The lower whisker extends from the hinge to the smallest value at most 1.5 * IQR of the hinge. Data beyond the end of the whiskers are called "outlying" points and are plotted individually.”
@yuzaR-Data-Science 10 months ago
thanks for pointing it out, you are correct: maximum should have been defined as the largest value no further than 1.5 * IQR from the hinge. I guess, I just wanted to first describe the box, then outliers later, and this step by step slow explanation has a cost of not being able to be precise all the time. Being precise immediately would throw several concepts at the learner, like box, outliers, IQR, hinge ... I just hope that I compensated for it later in the video. Thanks again for being attentive!
@shadyamigo 10 months ago
@@yuzaR-Data-Science it was all very clear. Thanks for providing this material
@yuzaR-Data-Science 10 months ago
@@shadyamigo glad you liked it! cheers, mate
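The whisker rule quoted in this thread can be checked by hand in base R. The numbers below are made up for illustration, and the hinges are taken as ordinary quartiles from `quantile()`, which is an assumption about how the plotting function computes them.

```r
# Toy data with one extreme value (hypothetical numbers, illustration only)
x <- c(2, 4, 5, 6, 7, 8, 9, 30)

# Hinges (first and third quartiles) and the inter-quartile range
q1  <- unname(quantile(x, 0.25))
q3  <- unname(quantile(x, 0.75))
iqr <- q3 - q1

# Fences: 1.5 * IQR beyond each hinge
upper_fence <- q3 + 1.5 * iqr
lower_fence <- q1 - 1.5 * iqr

# Whiskers end at the most extreme data values that are still inside the fences
upper_whisker <- max(x[x <= upper_fence])
lower_whisker <- min(x[x >= lower_fence])

# Everything beyond the fences is drawn as an individual "outlying" point
outliers <- x[x < lower_fence | x > upper_fence]

upper_whisker  # 9, not max(x) = 30
outliers       # 30
```

The upper whisker stops at 9 even though the maximum is 30, which is exactly the distinction the comment points out: the whiskers reach the data maximum and minimum only when there are no outlying points.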
@statlab_stat.solution 10 months ago
Great. Keep going
@yuzaR-Data-Science 10 months ago
Thank you, I will!
@kennethgottfredsen767 10 months ago
Hi Youzar, Great video, and I really like the random jokes thrown in here and there. Keep it up! / Kenneth
@yuzaR-Data-Science 10 months ago
Thanks for the feedback, Kenneth! :) It's good to see that people get my jokes. Because I am never sure, whether they are funny to more people than just me 😁
@kennethgottfredsen767 10 months ago
@@yuzaR-Data-Science Do you have any videos on how to connect to a cloud or local SQL-server in R?
@yuzaR-Data-Science 9 months ago
Not yet. It might come in the distant future; until then I plan to cover some modelling and machine learning topics.
@Violetblue1307 4 days ago
I really like the way you explain codes, beginning with simple levels and increasing gradually the complexity. I also understand every argument you wrote, so interesting! Previously, I used to copy codes online or through some classes but I can't know all of them, so I can't design graphs by myself. Now it changed. Thank you!
@yuzaR-Data-Science 4 days ago
I am so glad to hear that, Violet! Thanks a lot! I felt the same by going through lots of R content, that's why I started to produce my own R content ... it was actually only for me in the beginning, so, I learn better, but then I started to get more and more positive feedback, like yours ;) and being helpful by producing useful content makes me kind of happy :)
@Marcosls2015 4 months ago
Hi Yuri, really thanks for sharing this knowledge! This was fantastic to open the mind to the possibilities of this plot. Please, I wonder if you could share the code? Thanks
@yuzaR-Data-Science 4 months ago
Hi Marcos, thanks a ton for joining the channel! Your support is much appreciated! Of course you can have the code. I just posted it on the community tab for members only. Please let me know whether you can see/find it. Kind regards! Yury
@Learner_2000 3 months ago
Excellent video and easy to learn. Thank you so much. I have one query: after plotting this graph, when I save it with proper dimensions, the text size appears very small. I tried many times with a manually fixed font size, but without success. Could you suggest how to fix this? Note: this problem only appears with graphs from ggstatsplot functions, like pairwise comparisons and violin plots. Thank you in advance.
@yuzaR-Data-Science 3 months ago
Yes, you can use the "device" argument, like I said near the end of the video, and use the same extension as in your picture. Here is the code for a jpeg example:
ggsave(
  "magic_boxplot.jpeg",
  # "pdf", "jpeg", "tiff", "png", "bmp",
  # "svg", "eps", "ps", "tex" or "wmf"
  device = "jpeg",
  plot   = p6,
  width  = 10,
  height = 7,
  dpi    = 1000)
Thanks for the positive feedback! Glad you liked the video! :)
@Learner_2000 2 months ago
@@yuzaR-Data-Science Greetings from Nepal! Now I can make a beautiful, good-looking graph by adjusting point.args, centrality.point.args, centrality.label.args, and theme. I had not even heard of the ggstatsplot package; after watching your videos my interest is growing day by day, and now I can easily make a publication-quality graph. Thank you so much.
@yuzaR-Data-Science 2 months ago
You are very welcome 🙏 thank you for watching!
@sebbikankondi5546 10 months ago
Excellent video as always, thank you so much for sharing this. One question, you mentioned replacing or removing incorrect representations of sample sizes on the x.axis that materialize as a result of further splitting the plots into smaller sub-plots. What approach would you use to still display sample sizes on your plot after splitting them into sub-plots i.e., replacing and not simply removing them?
@yuzaR-Data-Science 10 months ago
Thanks for the excellent question! I knew I'd get this question, because I asked myself the same one :) I don't have a quick solution for it, to be honest, because there is already a function which calculates the sample size and puts the values on the x-axis. So, I never needed to figure it out. It only works with one additional variable, though. Here is this function:
library(ggstatsplot)
library(ISLR)  # assumption: the Wage data set used here comes from the ISLR package
grouped_ggbetweenstats(data = Wage, x = education, y = wage, grouping.var = health_ins)
@sebbikankondi5546 10 months ago
Thank you, grouped_ggbetweenstats() works really well and adds useful additional info. To simply add sample sizes to the already existing plot, adding stat_n_text() from the EnvStats package works really well too:
p6 + facet_grid(jobclass ~ health_ins) + stat_n_text(y.pos = 5)
But that displays sample sizes on the plot and not on the x-axis.
@yuzaR-Data-Science 9 months ago
You can also produce separate plots and put them together at any time if this would reduce the complexity of the programming. patchwork is an amazing package for that; I will release a review of it very soon.
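The suggestion to build separate plots and combine them afterwards can be sketched with patchwork as below. The `mpg` data that ships with ggplot2 and the split by `drv` are assumptions for illustration; the plot from the video (`p6`) and its data are not reproduced here.

```r
library(ggplot2)
library(patchwork)

# Assumption: mpg (bundled with ggplot2) stands in for the data from the video.
# Each subset gets its own plot, so each plot only sees its own observations.
p_4wd <- ggplot(subset(mpg, drv == "4"), aes(x = class, y = hwy)) +
  geom_boxplot() +
  ggtitle("drv = 4")

p_fwd <- ggplot(subset(mpg, drv == "f"), aes(x = class, y = hwy)) +
  geom_boxplot() +
  ggtitle("drv = f")

# patchwork operators: `+` places plots next to each other, `/` stacks them
side_by_side <- p_4wd + p_fwd
stacked      <- p_4wd / p_fwd

# print(side_by_side) draws the combined figure
```

Because every panel is an independent ggplot object, per-panel annotations such as sample sizes can be added to each plot before combining, instead of being worked out inside a single faceted plot.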
@MoritzSchorn 10 months ago
Hi Youry, I really like your videos and they make me want to learn more of R and Data Science :) Do you have any recommendations for students who want to master both? I am looking forward to the next video!
@yuzaR-Data-Science 10 months ago
Yes, definitely, Moritz. The best start in my opinion is the R4DS book. The best finish is the tidymodels book. Both are online and free. In between you'd need to go through a few classic statistics books, and learn and compute statistical tests and models. Some of the topics you'll find on my channel. This will prepare you for machine learning. Thanks for such nice feedback! I am glad my content is useful!
@MoritzSchorn 10 months ago
@@yuzaR-Data-Science Thank you for the tips :)
@yuzaR-Data-Science 10 months ago
you are very welcome @@MoritzSchorn
@hikeaway1596 10 months ago
I love your tutorials! They are soo informative that I need to rewatch them in order not to miss any important detail :) Thanks for doing this, keep up the great work!
@yuzaR-Data-Science 10 months ago
Glad you like them!
@eliapp 10 months ago
I love the way you explain these concepts. It's almost as if you live inside the data ❤
@yuzaR-Data-Science 10 months ago
Glad you enjoy my explanations 😊 I probably sometimes live inside of the data 🙈😂 thank you for such a nice feedback! Much love!
@suelook9562 8 months ago
Very educative and simple to understand
@yuzaR-Data-Science 8 months ago
Thanks for your nice feedback!
@juliusirungu1363 10 months ago
Great and very informative
@yuzaR-Data-Science 10 months ago
Glad you liked it! Thanks for watching!
@Walker-nb9de 10 months ago
Great. Thanks for the up.
@Walker-nb9de 10 months ago
Please upload a tutorial on raster data manipulation in R. That would be really helpful.
@yuzaR-Data-Science 10 months ago
thanks a lot!
@yuzaR-Data-Science 10 months ago
thanks for the idea, I did not know about the raster data manipulation yet, but I'll have a look at it and put it on my list of tutorials I plan to do. thank you for watching!
@Walker-nb9de 10 months ago
@@yuzaR-Data-Science Thanks.
@yuzaR-Data-Science 10 months ago
you are very welcome!