TidyX
TidyX is a screencast where we go through data science topics and code line by line, explaining what the authors did and how the functions they used work. We also break down the visualizations they create and talk about how to apply similar approaches to other data sets. The objective is to help more people learn R and get involved in the TidyTuesday community.

The hosts are Ellis Hughes (@ellis_hughes) and Patrick Ward (@OSPpatrick).

Ellis has been working with R since 2015 and has a background as a statistical programmer supporting work in both statistical genetics and HIV vaccines. He also runs the Seattle UseR Group.

Patrick's current work centers on research and development in professional sport, with an emphasis on data analysis in American football. Previously, he was a sport scientist in the Nike Sports Research Lab. His research interests include training and competition analysis as they apply to athlete health, injury, and performance.
I Likert Coffee | TidyX Episode 181
29:00
4 months ago
Are you sure? | TidyX Episode 176
27:43
6 months ago
The Line Plot Saga | TidyX Episode 166
20:18
9 months ago
Shinylive - Is this thing on? | TidyX Episode 161
16:37
11 months ago
Shiny URL Queries | TidyX Episode 160
20:11
11 months ago
TidyX Posit::Conf 2023 Recap
20:28
11 months ago
Comments
@jodarove 15 days ago
Thank you, guys! This was really helpful.
@coenraadmarais4074 23 days ago
Flame tree next, please.
@eustrainroblero729 1 month ago
gganimate? A comparison between packages would be great.
@k5555-b4f 2 months ago
Very cool, and so useful simply because event-level data frames are ubiquitous (especially in a work/company setting), particularly when fetching all this data from nested, tree-like JSON structures. You have a big fan in me here, gentlemen. Thanks!! (I'm willing to bet Cohen is a data analyst/scientist at a company involved in sports/NBA bookkeeping or analytics, lmao.)
@k5555-b4f 2 months ago
Great stuff as always! You made something that can easily be confusing clear and easy to understand, so thanks! (Very) unrelated to this, but would you be interested in doing either a deep dive into or an introduction to the logger package? I think it was created with the intent of mimicking Python's version, and while I find Python's pretty straightforward, for some reason R's version is a little more obscure and harder for me to grasp.
@TidyX_screencast 2 months ago
Thanks for the comment! I'll have to look at the logger package a bit again - I think I've used it in the past, but there may be a few other things/concepts you need to know before it makes sense ~ Ellis
@k5555-b4f 2 months ago
@TidyX_screencast No worries if you can't, of course. Thanks, Ellis, either way!
@alelust7170 2 months ago
Great content 🎉
@hosseinkhandani3937 2 months ago
Why weren't the panel data package (plm) or the LSDV regression method used?
@blaisepascal3905 3 months ago
Nice video! Do you know how the modify() function works in purrr? I really struggle to see how it differs from the map() function.
@djangoworldwide7925 3 months ago
You can try using the \(df) notation for anonymous functions:

purrr::map(my_list, \(df) lm(y ~ x1 + x2, data = df))

This expects a list of data frames called my_list. For each data frame in the list, it regresses y on x1 + x2, using that data frame as the data for the regression. It is the same as what you guys did, but I think this notation was introduced more recently and is now considered better than the ~ .x notation.
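A minimal sketch of the two lambda styles side by side, assuming a hypothetical list `my_list` of toy data frames with columns `y`, `x1` and `x2`. The `\(df)` shorthand is base R syntax (R 4.1.0 or later) for `function(df)`, so it works in `purrr::map()` and in base `lapply()` alike.

```r
library(purrr)

# Hypothetical toy input: a list of three data frames with columns x1, x2, y
set.seed(1)
my_list <- map(1:3, \(i) data.frame(x1 = rnorm(20), x2 = rnorm(20), y = rnorm(20)))

# Formula-style lambda: .x stands for the current list element
fits_tilde <- map(my_list, ~ lm(y ~ x1 + x2, data = .x))

# Base R shorthand: \(df) is the same as function(df)
fits_backslash <- map(my_list, \(df) lm(y ~ x1 + x2, data = df))

# Both styles fit identical models; compare the coefficients
map(fits_backslash, coef)
```

Beyond brevity, a named argument such as `df` tends to read more clearly than `.x` once lambdas get nested.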
@liangzhao2659 3 months ago
Great job!
@kurokami254 3 months ago
This was great! Pretty concise, and I learnt a lot. Cleaning data is a lot less time-consuming and more intuitive for me now. I didn't know base R was so good at dealing with strings.
@TidyX_screencast 3 months ago
Thanks for the comment! R handles strings quite well in a variety of different ways. We just scratched the surface here ~ Ellis
@rayflyers 3 months ago
Take the code I commented on episode 144 and replace purrr::map() functions with furrr::future_map() functions.
@guirodriguues 3 months ago
Awesome. One package I like a lot is furrr. You use its future_map() function just as you would use map() from the purrr package, but it runs in parallel. Pretty easy.
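A sketch of that drop-in swap, assuming the {furrr} and {future} packages are installed. The toy `my_list` is hypothetical. `future_map()` only runs in parallel once a `future::plan()` other than `sequential` is set. With fast per-element work like this, the cost of starting worker sessions can outweigh any speed-up.

```r
library(purrr)
library(furrr)

# Hypothetical toy input: four small data frames
set.seed(2)
my_list <- lapply(1:4, \(i) data.frame(x = rnorm(50), y = rnorm(50)))

# Resolve futures in two background R sessions
future::plan(future::multisession, workers = 2)

# Same call shape: only the function name changes
fits_seq <- map(my_list, \(df) coef(lm(y ~ x, data = df)))
fits_par <- future_map(my_list, \(df) coef(lm(y ~ x, data = df)))

# Switch back to sequential processing, which shuts the workers down
future::plan(future::sequential)
```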
@mikep8857 4 months ago
Great approach for doing multiple comparisons. Couldn't you just replace filter(r1 != r2) with filter(as.numeric(r1) > as.numeric(r2))? I think broom::tidy() on the output of the t-test might have made it a bit easier to combine and extract the data you wanted, although your approach works fine.
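A sketch of both suggestions on hypothetical toy data, where a `score` is measured in groups `A`, `B` and `C`. It assumes {dplyr}, {purrr} (1.0 or later, for `list_rbind()`) and {broom} are installed. Comparing factor level indices keeps each unordered pair exactly once, whereas `r1 != r2` keeps both `A`/`B` and `B`/`A`.

```r
library(dplyr)
library(purrr)
library(broom)

# Hypothetical toy data: one numeric score in three groups
set.seed(42)
dat <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 10)),
  score = c(rnorm(10, 5), rnorm(10, 6), rnorm(10, 7))
)

# Every combination of levels; expand.grid() turns the strings into factors,
# so as.numeric() returns each level's position (A = 1, B = 2, C = 3)
pairs <- expand.grid(r1 = levels(dat$group), r2 = levels(dat$group)) |>
  # drops self-pairs and mirror images, leaving B-A, C-A, C-B
  filter(as.numeric(r1) > as.numeric(r2))

# One Welch t-test per pair; tidy() turns each htest into a one-row tibble
results <- map2(as.character(pairs$r1), as.character(pairs$r2), \(g1, g2) {
  t.test(score ~ group, data = droplevels(filter(dat, group %in% c(g1, g2)))) |>
    tidy() |>
    mutate(comparison = paste(g1, "vs", g2))
}) |>
  list_rbind()

results |> select(comparison, estimate, p.value)
```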
@patrickward6067 3 months ago
Haven't tried that. Will give it a shot! Thanks! ~patrick
@omarahomar 3 months ago
The same solution came to my mind last week for a similar problem: filtering the indices with i > j gave me just the upper triangle (excluding the diagonal) of the Cartesian product matrix. 👍
@scotmorrsn 4 months ago
Good stuff, gents. Appreciate the extra package you shared, though.
@mikep8857 4 months ago
Another great episode. Looking forward to the episode 200 party edition! In your plots, the y axis was count data. It irritates me that ggplot will often put decimal points on the scale even if you bother to define the variable as an integer. For one plot it's not too hard to set the breaks manually, but is there an easy way around this problem when you are generating lots of plots, as in your example? It should be as simple as saying "this is integer data; 10.5 is meaningless!"
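One possible approach, sketched here as an assumption rather than anything shown in the episode: write a reusable breaks function once and pass it to every plot's `scale_y_continuous()`. ggplot2 accepts a function for `breaks` and calls it with the axis limits. `integer_breaks()` is a hypothetical helper that keeps only the whole-number candidates produced by base `pretty()`.

```r
library(ggplot2)

# Hypothetical helper: returns a breaks function that only yields whole numbers
integer_breaks <- function(n = 5) {
  function(limits) {
    b <- pretty(limits, n)                # candidate breaks, possibly fractional
    b <- b[abs(b - round(b)) < 1e-8]      # keep whole numbers (tolerant of float error)
    round(b)
  }
}

# Reuse the same scale in every plot of counts
counts <- data.frame(g = c("a", "b"), n = c(1L, 2L))
p <- ggplot(counts, aes(g, n)) +
  geom_col() +
  scale_y_continuous(breaks = integer_breaks())
```

If an axis range is very narrow (say 0.2 to 0.8), no whole numbers fall inside it, and this helper returns no breaks at all.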
@mikep8857 4 months ago
Great episode! I learnt about the invisible() function, which seems particularly appropriate for an episode on The Lord of the Rings!
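A small illustration of what `invisible()` does, using a hypothetical helper `report_rows()`. The function still returns its argument, but R skips auto-printing the result at the console. This is handy for functions that are called mainly for their side effects or that are used mid-pipe.

```r
# Hypothetical helper: report a side effect, then return the data invisibly
report_rows <- function(df) {
  message("Rows: ", nrow(df))
  invisible(df)
}

report_rows(mtcars)          # shows only the message; the data frame is not printed
res <- report_rows(mtcars)   # the value is still returned and can be assigned
withVisible(report_rows(mtcars))$visible   # FALSE: the result is flagged as invisible
```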
@TidyX_screencast 4 months ago
I hadn't even thought about how well that worked! That's great! ~ Ellis
@IO-qt8kv 5 months ago
How do you bring in a new dataset (other than the test data) and make predictions on it?
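A minimal base R sketch, assuming a model fitted with `lm()` on the built-in `mtcars` data and a hypothetical data frame `new_cars`. The new rows only need the same predictor columns the model was trained on, and `predict()` scores them through its `newdata` argument. For a fitted tidymodels workflow, the equivalent call is `predict(fit, new_data = ...)`, with an underscore.

```r
# Fit on the training data
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical new observations with the same predictor columns (wt, hp)
new_cars <- data.frame(wt = c(2.5, 3.2), hp = c(110, 175))

# Score rows the model has never seen
preds <- predict(fit, newdata = new_cars)

# Keep the predictions alongside the inputs
cbind(new_cars, mpg_pred = preds)
```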
@tomkmb4120 5 months ago
Any plans to do more NBA stuff? I'd love to see something like trying to predict some of the awards as the season winds down, maybe Most Improved Player predicted using ML, similar to the HOF pitching one.
@tomkmb4120 5 months ago
This was a fun one
@rafabws 6 months ago
Great video, guys! And it's great that baseball season is around the corner, so lots of people should be itching for new data points (I mean, games, lol).
@ahmedmo8814 7 months ago
Excellent showcase, but we also need to know more about the map() function from purrr.
@scotmorrsn 7 months ago
🐈🐈
@joshsmith8389 7 months ago
Now do Barry Bonds.
@tdawry 7 months ago
I'm not sure if the "A Aron" was a Key and Peele joke or not, but an interesting video either way
@ciensalud 7 months ago
You guys rockin', keep it up!
@fredericrioux6937 7 months ago
Good work. You can get innings pitched by dividing IPOuts by 3 (IPOuts = outs pitched).
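A sketch of that conversion on hypothetical toy rows, assuming the outs column is named `IPouts`, as in the Lahman `Pitching` table. Three outs make one inning, so `IPouts / 3` gives innings as a decimal. Box scores, however, usually write the leftover outs after the point, so 200.1 means 200 and 1/3 innings.

```r
# Hypothetical pitching lines; IPouts = outs recorded while pitching
pitching <- data.frame(playerID = c("p1", "p2", "p3"), IPouts = c(600, 601, 602))

# Innings pitched as a true decimal (3 outs = 1 inning)
pitching$IP <- pitching$IPouts / 3

# Box-score notation: full innings, then the remaining outs (0, 1 or 2)
pitching$IP_display <- sprintf("%d.%d",
                               as.integer(pitching$IPouts %/% 3),
                               as.integer(pitching$IPouts %% 3))
pitching
```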
@IgnacioAguilarToledo 7 months ago
Great!
@ToniGril 8 months ago
Nice work, guys! It would be great if you could do this with the tidymodels framework.
@cornellmihkail1238 8 months ago
First
@djangoworldwide7925 8 months ago
You should really focus on advanced ggplot, tidymodels, and Shiny stuff...
@patrickward6067 8 months ago
Thanks for the reply. We have many episodes on ggplot2 and entire series on tidymodels and shiny. Is there something in particular you'd be interested in seeing? ~patrick
@yarriofultramar 8 months ago
Fantastic presentation! Thanks! I am very interested in learning more on Bayesian statistics.
@TidyX_screencast 8 months ago
We did a whole series on Bayes, starting with episode 99! Bit.ly/TidyX_Ep99
@Aaqib.. 8 months ago
Thanks a lot, @TidyX_screencast
@ArcenisRojas 8 months ago
If you must use a for loop...

# Using a for loop
library(rlang)
library(stringr)  # for str_c()

wgt_stat_list <- list()
for (i in 1:5) {
  wgt_stat_list[[i]] <- get_wgt_score(
    fake_dat,
    sym(str_c("stat", i)),
    sym(str_c("stat", i, "_n"))
  )
}
@ArcenisRojas 8 months ago
Also, thank you both for this video!
@ArcenisRojas 8 months ago
I'd like to offer two solutions, both from the tidyverse; the second one uses {rlang} for tidy evaluation (avoid for loops!!!):

library(tidyverse)

# Doing it with pivot_longer() and pivot_wider()
fake_dat |>
  pivot_longer(starts_with("stat")) |>
  mutate(
    stat_number = str_extract(name, "\\d"),
    name = str_remove(name, "\\d")
  ) |>
  pivot_wider(
    names_from = name,
    values_from = value
  ) |>
  group_by(athlete, stat_number) |>
  summarise(
    total_obs = sum(stat_n),
    wgt_stat = weighted.mean(stat, stat_n)
  )

# Using tidy eval
library(rlang)

get_wgt_score <- function(dat, variable, N) {
  dat |>
    group_by(athlete) |>
    summarise(
      total_obs = sum(!!N),   # observation counts come from the *_n column
      wgt_stat = weighted.mean(!!variable, !!N)
    ) |>
    mutate(stat = as_string(variable))
}

map2(
  str_c("stat", 1:5),
  str_c("stat", 1:5, "_n"),
  \(x, y) fake_dat |> get_wgt_score(sym(x), sym(y))
) |>
  list_rbind()
@brianingersoll5604 8 months ago
Hey Ellis, great walk-through. I see you predicted both Beltre and Mauer, both of whom made it. A-Rod won't make it due to external shenanigans, as you note. The other player who made it was Todd Helton. Going to do pitchers next?
@zl1061 8 months ago
Really appreciate the for loop solution, as I often run into this situation myself and have always wondered whether you can do this without running the same code multiple times. This has been super helpful!
@rayflyers 8 months ago
You can accomplish it with a single pivot_longer() if the column names have a consistent pattern/separator. I gave the stat variables a new suffix ("_score") to make that happen.

library(tidyverse)

fake_dat |>
  # add a "_score" suffix to columns ending in a digit (stat1 -> stat1_score)
  rename_with(\(x) str_replace(x, "(.+\\d$)", "\\1_score")) |>
  pivot_longer(
    c(ends_with("_score"), ends_with("_n")),
    names_to = c("stat", ".value"),
    names_sep = "_"
  ) |>
  summarize(
    total_obs = sum(n),
    wgt_stat = weighted.mean(score, n),
    .by = c(athlete, stat)
  )
@Matt_Kumar 8 months ago
Keeping Patrick on his toes :)
@caty863 8 months ago
I am currently building a Shiny application prototype that I would like to demo to my colleagues. We use Dataiku in my organization, and it can host Shiny apps. However, since our tech guys are all Pythonistas, most of the R packages I am using don't run well on our Dataiku deployment (and our tech guys don't care). I explored other options for Shiny app deployment but couldn't find one that is practical enough. I am now going to try this *shinylive* option. Wish me luck.
@patrickolson4594 8 months ago
Thank you for the awesome tutorial. Did you try building an application with raw data from something like a CSV? Edit: I was able to get my personal data to work. Thank you again for your awesome tutorial.
@johnyuill8551 8 months ago
@patrickolson4594 If you have details to share, that would be great. I have struggled to import data into the app. 😤
@valentinkamm4824 8 months ago
How did you do it? I'm also struggling.
@gustavoortizvasquez6442 8 months ago
Good content. Keep up with this channel!
@AHMED198750 9 months ago
Nice presentation
@scotmorrsn 9 months ago
🤜🤛 Good to remember that base R still slaps.
@djangoworldwide7925 9 months ago
You guys, it's really kind of pointless to teach/show base R. There are so many cool things to learn in broom, purrr, vroom, prophet, ggplot extensions with HTML, more advanced joins with closest(), and more... Putting time into base R functionality that is so much easier in the tidyverse is just... meh 🤷🏽‍♀️
@jkaeromero2288 9 months ago
amazing tutorial
@fefoanza3283 9 months ago
Very interesting! I've always used ggplot2 for plotting, but there are some situations in which I wonder whether base R could help me. The main setback is that I'm really tidyverse-oriented, so my ability to find resources for non-tidy approaches is a bit lacking. I'll drop a few questions here that interest me and might interest others too.

1. gganimate is slow! I often work with 2D path data, and it's not always easy to "see" the dynamics of the path. One example is gaze data from eye tracking, where plotting the whole gaze plot at once often results in an unreadable mess. So I've tried rendering the paths as animations, but rendering is so slow that it is infeasible to use in the "question-wrangle-plot" loop. I wonder whether there are ways to do animations in base R that would make the process quicker.
2. The second question is tied to the first. Since most of my data has a sampling frequency of at most 60 Hz, and could probably be visualised at even lower frequencies, I've often wondered whether there are ways to render animations in real time. I've seen some work from coolbutuseless, but it's all a bit out of my league. Here are some examples:
   coolbutuseless.github.io/package/eventloop/articles/stream-plotting.html
   coolbutuseless.github.io/package/eventloop/articles/demo-particles.html
   github.com/coolbutuseless/nara
3. How do you save the plots you make with base R? In ggplot2 you have ggsave(), and you pass it the plot object. Can you even save a base plot as an object?

Thank you for what you do, it's truly helpful.
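For question 3, here is a sketch of the usual base R pattern. You open a file graphics device, draw, and then close the device with `dev.off()`, which is the point at which the file is written. Base plots are drawn rather than returned as objects. Wrapping the plotting code in a function (the hypothetical `draw_scatter()` below) therefore gives you something reusable that you can redraw on any device. `png()` and `svg()` follow the same pattern as the `pdf()` device used here. `grDevices::recordPlot()` and `replayPlot()` can also capture the current plot and redraw it later.

```r
# Hypothetical plotting function: the reusable "plot object" is the code itself
draw_scatter <- function() {
  plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "MPG", pch = 19)
  abline(lm(mpg ~ wt, data = mtcars), col = "red")
}

# Open a file device, draw, then close it; the file is written on dev.off()
out_file <- file.path(tempdir(), "scatter.pdf")
pdf(out_file, width = 7, height = 5)
draw_scatter()
invisible(dev.off())
```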
@manueltiburtini6528 9 months ago
Nice to see you!
@rayflyers 9 months ago
Some of my colleagues did a card sorting task for survey development and wanted to display the results in a dendrogram. Can you show how to make a dendrogram (both in ggplot2 and whatever else you find that's useful)?
@manueltiburtini6528 10 months ago
thanksss!