Intermediate R Workshop - Data Management with dplyr and tidyr

26K views

Workshop on data management using dplyr and tidyr at UW Tacoma. Assumes basic R knowledge. Recorded 1/31/2019.
clanfear.githu...

Published:

20 Aug 2024

Comments: 52

@vijay006 3 years ago

Dear Professor, I have looked at many R tutorial videos for self-learning. Your lectures are by far the best, with absolute clarity and a great deal of explanation. The slides are very useful. Thank you so much.

@krismopolitan 3 years ago

I'm mad at everyone in the world for not pointing me to this video sooner!

@user-gt1jk8vx1n 2 years ago

I wouldn't watch any other R tutorial after watching this awesome tutorial. Thank you!

@StefanoVerugi 3 years ago

One of the most complete, professional, well-explained, and slide-supported videos on R I have found on YT. Thank you.

@cclanfear 3 years ago

Thanks, much appreciated!

@BiologistDillon 2 years ago

Just a note at 8:00. Object assignment does not _need_ to be at the start of a pipe. It can be at the end through the use of a right-facing arrow (->) or equals sign. Sure, it may be standard to put it at the top, but with pipes reading left to right, top to bottom, I find it more intuitive to assign the output of the pipe at the end of said pipe. For example:

input_df %>%
  filter(some_var == "value") %>%
  group_by(some_var) -> output_df

From a consistency standpoint, I still tend to assign the output at the start of the pipe, but it absolutely doesn't always need to be at the start.

@cclanfear 2 years ago

In most classes I note the right assignment operator (as well as bidirectional operator), but not in these quick workshops as I don't spend much time on what is considered good R style (e.g., no equals assignment).
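A minimal sketch of the point raised in this exchange, assuming dplyr is installed. `input_df` and `some_var` are the commenter's placeholder names, filled here with hypothetical toy data; both forms produce the same object. Note that `=` only assigns leftward, like `<-`, so assigning at the end of a pipe needs `->`.

```r
library(dplyr)

# Hypothetical toy data standing in for the commenter's input_df
input_df <- tibble(some_var = c("value", "value", "other"),
                   x        = c(1, 2, 3))

# Conventional style: the assignment opens the pipe
output_left <- input_df %>%
  filter(some_var == "value") %>%
  group_by(some_var)

# Right assignment: the same pipe, with the result assigned at its end
input_df %>%
  filter(some_var == "value") %>%
  group_by(some_var) -> output_right
```

Because `->` binds more loosely than `%>%`, the whole pipe is evaluated before the result is assigned.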

@windowviews150 2 years ago

This is the best tidyverse video on RU-vid. You have a new subscriber. Thanks for sharing!

@dantelangone4829 2 years ago

Thank you for posting this on RU-vid. I have the habit of going through lessons at 1.5x (or even faster), but I loved this class and enjoyed every minute of the full length. I loved your comment about "spiraling" on a problem, that is very applicable to life (and graduate research in general) and not only to programming.

@cclanfear 2 years ago

Thank you, much appreciated!

@haraldurkarlsson1147 2 years ago

I have been working with R for a few years now and this is some of the best stuff I have come across. Thanks!

@_subrata 2 years ago

I'm new to R and stumbled upon your video. This is the best thing that has happened to me. Thank you so much, Charles.

@ruthreshmahadevan3023 3 years ago

Thank you Charles for the crystal clear explanations... 🙏

@birdfullo7 3 years ago

Dude loves pipes.

@israkvelvarga993 2 years ago

Thank you for sharing this class, I have learned a lot! Greetings from Costa Rica

@ButterfaceGMusicSlump 1 year ago

Super great video! Thank you for the upload.

@glenpjanson5538 2 years ago

Sir, you are great!! Really love your classes.

@revolution77N 3 years ago

Best dplyr course ever! Thank you so much man!!

@uAkide 3 years ago

What a useful presentation, very clear, easy to follow, well explained, thank you for preparing this video!

@haraldurkarlsson1147 2 years ago

Charles, you might not be interested in this, but I think dplyr's "lag" function also works in the Usable Dates part. That is: mutate(date = date.entered + lag(week, default=0) * 7). By setting default=0 you avoid an NA in the first row.
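A minimal sketch comparing the lag() version with a direct `week - 1` offset, on hypothetical toy data (the column names follow the comment; the dates and tracks are made up). lag() simply shifts the previous row's value, so the two agree only when rows are grouped by song and sorted by week with no gaps; without group_by(), the first week of one song would pick up the last week of the song before it.

```r
library(dplyr)

# Hypothetical long-format chart data: two songs, three consecutive weeks each
chart <- tibble(
  track        = rep(c("A", "B"), each = 3),
  date.entered = rep(as.Date(c("2000-01-01", "2000-02-05")), each = 3),
  week         = rep(1:3, times = 2)
)

chart_dates <- chart %>%
  group_by(track) %>%                    # lag() must not cross song boundaries
  arrange(week, .by_group = TRUE) %>%    # lag() depends on row order
  mutate(
    date_lag   = date.entered + lag(week, default = 0L) * 7,  # commenter's version
    date_minus = date.entered + (week - 1) * 7                # direct offset
  ) %>%
  ungroup()
```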

@user-dl5go9tg6g 3 months ago

So smooth. Thank you

@cristianmarcelovillegaslob9860 2 years ago

amazing video! congrats

@Motivational_Child 3 years ago

Excellent presentation! Thanks for sharing.

@sakhawat3003 3 years ago

A little correction at 40:30: funs() can no longer be used inside summarize_at(). Instead, list() is suggested, for example: list(mean=mean, sd=sd)
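A minimal sketch of the suggested change, using R's built-in mtcars data rather than the workshop data. The first form is the list() replacement the comment describes; the second uses across(), which dplyr 1.0.0 introduced and which supersedes the _at() variants. The two order their output columns differently, so the sketch compares values rather than whole tables.

```r
library(dplyr)

# summarize_at() with a named list() of functions in place of funs()
at_style <- mtcars %>%
  group_by(cyl) %>%
  summarize_at(vars(mpg, hp), list(mean = mean, sd = sd))

# The same summary with across() (dplyr >= 1.0.0)
across_style <- mtcars %>%
  group_by(cyl) %>%
  summarize(across(c(mpg, hp), list(mean = mean, sd = sd)))

# Both name the output columns <variable>_<function>, e.g. mpg_mean, hp_sd
```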

@neontrain 2 years ago

Wow, this is one of the most informative videos on this I've seen. I've been having an issue with a data frame: I can't calculate the mean/max of a column that has null values in it. My line currently looks like: trackman %>% group_by(tagged_pitch_type) %>% summarise(mean(as.numeric(spin_rate, na.rm = TRUE))). It works for a few of the pitch types, but the ones that have null values don't work.

@cclanfear 2 years ago

You mean NA values, not NULL values, I assume. If so, you just have an argument out of place in summarize: summarise(spin_rate = mean(as.numeric(spin_rate), na.rm = TRUE)).
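A minimal sketch of the reply's fix on hypothetical toy data (the column names follow the comment; the numbers are made up, and spin_rate is stored as text to mirror the as.numeric() call). In the original line, na.rm = TRUE sits inside as.numeric(), so mean() never receives it and returns NA for any group containing a missing value; moving it to mean() drops the NAs before averaging.

```r
library(dplyr)

# Hypothetical pitch data: one Fastball spin rate is missing
trackman <- tibble(
  tagged_pitch_type = c("Fastball", "Fastball", "Slider", "Slider"),
  spin_rate         = c("2200", NA, "2500", "2400")
)

# Without na.rm in mean(), any group with a missing value returns NA
no_na_rm <- trackman %>%
  group_by(tagged_pitch_type) %>%
  summarise(spin_rate = mean(as.numeric(spin_rate)))

# Fixed: na.rm = TRUE belongs to mean(), not to as.numeric()
spin_summary <- trackman %>%
  group_by(tagged_pitch_type) %>%
  summarise(spin_rate = mean(as.numeric(spin_rate), na.rm = TRUE))
```

Here the Fastball group averages only its one non-missing value, while the Slider group is unaffected either way.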

@hobao4965 2 years ago

thank you so much for such a great lecture !!

@cyberbrodi 4 years ago

Very helpful Charles! Thank you!

@spirosgyparakis8888 3 years ago

amazing job! Thanks a lot

@hongkaizhang3000 2 years ago

Very good! Thanks!

@eileenxu1423 4 years ago

Great video! Well explained!

@heraldfinch7010 5 years ago

A really, really good explanation.

@mightyowl1668 4 years ago

awesome presentation! thanks a lot

@SevenRavens007 4 years ago

Awesome thanks for sharing!

@mariocailotto7128 4 years ago

Very nice video, thank you!

@haraldurkarlsson1147 2 years ago

I guess gather and spread are now pivot_longer and pivot_wider... It is hard to keep up with these changes (gather and spread actually made sense to me).

@cclanfear 2 years ago

Yep, I was fine with gather and spread, but in changing these they also added some useful features. They're a bit more powerful.
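A minimal sketch of the newer functions on a hypothetical miniature of the wide Billboard layout (one column per week; the track names and ranks are made up), assuming tidyr 1.1.0 or later for names_transform. The names_prefix and names_transform arguments are examples of the added features: they strip the "wk" prefix and convert the week to an integer in the same call.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one column per chart week
wide <- tibble(
  track = c("A", "B"),
  wk1   = c(10, 50),
  wk2   = c(5, NA)
)

# pivot_longer() in place of gather(wide, week, rank, wk1:wk2),
# here also stripping the prefix and converting week to integer
long <- wide %>%
  pivot_longer(cols = wk1:wk2,
               names_to = "week", values_to = "rank",
               names_prefix = "wk",                       # "wk1" -> "1"
               names_transform = list(week = as.integer)) # "1"   -> 1L

# pivot_wider() in place of spread(long, week, rank);
# names_prefix puts the "wk" back on the column names
back <- long %>%
  pivot_wider(names_from = week, values_from = rank, names_prefix = "wk")
```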

@diegopacheco9367 4 years ago

nice video.

@patrickmuvunyi55 3 years ago

46:31 Thanks, Charles, for your straightforward explanation! However, when I try to apply the group_by function, R gives me this error: Error: unexpected ')' in " n = n())"

@cclanfear 3 years ago

You likely have an extra ) somewhere or are missing something higher up in the code. Check RStudio for a red mark on the left side pointing out an error in the code.

@patrickmuvunyi55 3 years ago

Thanks! To make my question clear, here is my code:

library(magrittr)
aa %>% group_by(Year) %>% summarise(Life.expectancy mean = mean(Life.expectancy), Life.expectancy median = median(Life.expectancy), n = n()) %>%

And this is what I got in the console:

Error: unexpected symbol in " Life.expectancy median"
> n = n()) %>%
Error: unexpected ')' in " n = n())"

@cclanfear 3 years ago

@@patrickmuvunyi55 Two issues: (1) The code ends with a pipe, (2) there are spaces in variable names, which is not permitted unless surrounded by backticks (`). Fixed: aa %>% group_by(Year) %>% summarise(Life.expectancy.mean = mean(Life.expectancy), Life.expectancy.median = median(Life.expectancy), n = n())

@patrickmuvunyi55 3 years ago

@@cclanfear I appreciate it, sir! It has finally worked! I'm not going to fail at this anymore!
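A minimal sketch of the fix, on a hypothetical stand-in for the commenter's `aa` (the numbers are made up). It loads dplyr rather than only magrittr: magrittr supplies `%>%`, but group_by(), summarise(), and n() come from dplyr. It also shows that a name containing a space does work when wrapped in backticks.

```r
library(dplyr)

# Hypothetical stand-in for the commenter's data frame
aa <- tibble(Year            = c(2000, 2000, 2001),
             Life.expectancy = c(70, 74, 75))

# Syntactic names (no spaces), and no trailing %>% after the last call
by_year <- aa %>%
  group_by(Year) %>%
  summarise(Life.expectancy.mean   = mean(Life.expectancy),
            Life.expectancy.median = median(Life.expectancy),
            n = n())

# A name with a space is allowed only inside backticks
with_space <- aa %>%
  group_by(Year) %>%
  summarise(`Life.expectancy mean` = mean(Life.expectancy))
```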

@miztx2syuiip590 2 years ago

For me and the current gen, I find that starting with Pen first works much better, and ending with Pen, inputting more in between, makes it flow so much more easily. Replace pen with Pipe. My, my, too many pens before pipes, airplanes eject, whoa

@haraldurkarlsson1147 2 years ago

In regards to the preponderance of NAs in the Billboard rank data would it not be rendered useless in terms of data analysis as it stands (without some sort of fancy imputation etc)?

@cclanfear 2 years ago

Billboard has two types of NAs: (1) False NAs that appear when a song is no longer on the billboard, which are the result only of the data being in wide format. (2) What appear to be true NAs where some songs are no longer tracked after like 20 weeks (truncated observations). If you were modeling these data, you could use something like a survival model for the truncated observations.

@haraldurkarlsson1147 2 years ago

@@cclanfear I was wondering about the same thing. Survival analysis might work here. NAs would be censored data. This might be an interesting problem to tackle in class (if you have not done so already). Thanks.
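A minimal sketch of the first kind of NA described in this exchange, using a hypothetical miniature of the wide layout (made-up values). When a song leaves the chart early, the wide table still has to fill its remaining week columns, so NAs appear; pivot_longer() with values_drop_na = TRUE removes them. It cannot tell the two kinds apart, since it drops every NA, so truncated tracking would still need separate handling such as the survival approach discussed here.

```r
library(tidyr)

# Hypothetical wide data: song B leaves the chart after week 1
wide <- data.frame(track = c("A", "B"),
                   wk1   = c(10, 50),
                   wk2   = c(5, NA),
                   wk3   = c(3, NA))

# Keeping every cell carries the "false" NAs into the long table
long_all <- pivot_longer(wide, wk1:wk3, names_to = "week", values_to = "rank")

# Dropping NA values keeps only the weeks each song actually charted
long_charted <- pivot_longer(wide, wk1:wk3, names_to = "week", values_to = "rank",
                             values_drop_na = TRUE)
```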

@1622roma 2 years ago

I keep getting errors at 1:11:13: Error: unexpected symbol in: " select(-minutes, -seconds) summary" A few pieces of code from your slides keep giving me an error message; for example:

billboard_1 %>% select(artist, track, weeks_at_1) %>% distinct((artist, track, weeks_at_1) %>% arrange(desc(weeks_at_1)) %>% head(7)

I did give billboard-1 a different name, because billboard_2000 gave me problems. Please check!

@cclanfear 2 years ago

I'd advise looking carefully at the code and perhaps checking the website: clanfear.github.io/Intermediate_R_Workshop/ In your code example, for instance, you have an extra ( in your distinct() call. In your select() error you likely also have an added or missing character like ( or ,
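A minimal sketch of the pipeline from the question with the extra ( removed from distinct(). The billboard_1 table here is a hypothetical stand-in with made-up rows; only the column names come from the comment.

```r
library(dplyr)

# Hypothetical stand-in: one row per song-week, so songs repeat
billboard_1 <- tibble(
  artist     = c("X", "X", "Y"),
  track      = c("Song 1", "Song 1", "Song 2"),
  weeks_at_1 = c(2L, 2L, 1L),
  week       = c(1L, 2L, 5L)
)

top_songs <- billboard_1 %>%
  select(artist, track, weeks_at_1) %>%
  distinct(artist, track, weeks_at_1) %>%   # a single "(" after distinct
  arrange(desc(weeks_at_1)) %>%
  head(7)
```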