Introduction to the dplyr R package

Roger Peng

Published: 13 Oct 2024

Comments: 46

@ArunRangarajan 7 years ago

Short, complete and crystal clear! You absolutely rock, Dr Roger Peng!

@pensivenincompoop2016 8 years ago

I am new to R and I am learning it for my phylogenetics and statistics and I can already tell that this package is very useful. Thanks for the tutorial!

@ChristopherSkyi 9 years ago

To get chicago.rds, go here: github.com/DataScienceSpecialization/courses/blob/master/03_GettingData/dplyr/chicago.rds

@jyotijain5157 8 years ago

Thank You.

@anthonychariton9952 6 years ago

Brilliant overview, thank you kindly for this

@bodobruckner9600 9 years ago

Good, flawless and fast, as we have come to appreciate in Roger Peng's and friends' Coursera courses :-)

@PandiMengri 4 years ago

This is exactly what I was looking for! Thank you, Roger! :)

@kvafsu225 4 years ago

Really nice video. Thanks.

@AllenMartin-hp5yf 1 year ago

What/where is the website you downloaded "chicago" from?

@tuanlong9238 6 years ago

My god, it looks like he uses the original version of R, super =)))

@WahranRai 5 years ago

14:27 Assigning work variables and splitting the code into one instruction per line is useful for debugging and improves the readability of the code!!!

@michelemelchiori7628 9 years ago

Very nice! Please consider adding an explanation of joins, which are important too.

@MrAlivallo 5 years ago

So the hardest part of getting started with 'dplyr' is getting the data wrangled to match for manipulation. How do I do this inside {r}? If I do this in PowerBI it is all Drag/Drop/Click. Why doesn't this exist for RStudio?

@kevinmaeir1612 7 years ago

Hey, I have a table with 4 columns. Two of them are lists of different dates and the other two are numbers. I want to compare the date columns and get a new table with just the numbers that share the same date. Can you help me? Thanks
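One possible reading of this question is that each date column is paired with one number column, and the goal is to keep only the dates that occur in both. A minimal sketch under that assumption follows; the table `tbl` and its column names are hypothetical, not taken from the comment.

```r
library(dplyr)

# Hypothetical table: two date columns, each paired with a number column.
tbl <- data.frame(
  date_a = as.Date(c("2017-01-01", "2017-01-02", "2017-01-03")),
  num_a  = c(10, 20, 30),
  date_b = as.Date(c("2017-01-02", "2017-01-03", "2017-01-05")),
  num_b  = c(200, 300, 500)
)

# Split into two narrow tables that share a common "date" column name.
left  <- select(tbl, date = date_a, num_a)
right <- select(tbl, date = date_b, num_b)

# inner_join() keeps only the dates present in both tables.
matched <- inner_join(left, right, by = "date")
print(matched)
```

If the dates are stored as text in different formats, they would need to be converted with `as.Date()` first so that equal dates actually compare as equal.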

@c.deg.7982 5 years ago

For some reason I cannot get tally() or count() to work inside the summarize() function for a dataset grouped by a categorical variable...
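A hedged note on the counting question: `tally()` and `count()` are verbs that take a whole data frame, so inside `summarise()` the usual helper is `n()`. The sketch below uses the built-in `mtcars` data as a stand-in, since the commenter's dataset is not shown.

```r
library(dplyr)

# Group by a categorical-like variable (number of cylinders).
by_cyl <- group_by(mtcars, cyl)

# Inside summarise(), n() returns the number of rows in each group.
counts_summarise <- summarise(by_cyl, n = n(), mean_mpg = mean(mpg))

# count() does the grouping and counting in one step, outside summarise().
counts_count <- count(mtcars, cyl)

print(counts_summarise)
print(counts_count)
```

Both approaches give the same group sizes; `n()` is the one to reach for when other summaries are computed in the same call.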

@linussunil83 9 years ago

Can someone explain to me the step where he mutates the tempcat column in df? I don't understand the arguments used for factor: factor(1*(tmpd

@rohanshingade7228 8 years ago

It is 1 multiplied by (tmpd < 80). If we simply type (tmpd < 80) we get a logical vector, but if we multiply it by 1 we get a numeric vector.
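The explanation above can be tried on a small vector. This is only a sketch: the temperatures are made up, the comparison direction follows the reply, and the label names "hot" and "cold" are assumptions.

```r
# Hypothetical temperatures standing in for the tmpd column.
tmpd <- c(75, 82, 79, 91)

is_below_80 <- tmpd < 80        # logical vector: TRUE FALSE TRUE FALSE
as_number   <- 1 * is_below_80  # numeric vector: 1 0 1 0

# factor() sorts the distinct values (0, 1) and applies the labels in that order,
# so 0 (not below 80) becomes "hot" and 1 (below 80) becomes "cold".
tempcat <- factor(as_number, labels = c("hot", "cold"))

print(class(is_below_80))
print(class(as_number))
print(tempcat)
```

The multiplication is just a compact way to turn TRUE/FALSE into 1/0 before labelling; `factor(is_below_80, labels = c("hot", "cold"))` would label FALSE and TRUE in the same order.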

@linussunil83 8 years ago

Thanks buddy

@kevintan6484 8 years ago

Hello everyone, I am a complete beginner in R. I could not even import the Chicago.rds file correctly: I clicked the import data button on the right-hand side, selected the file, and it turned out to be garbled. So I imported my own data set (named data1) from a txt file and tried to follow the steps in the video. Only a few of them worked, so please help me out. I have checked many times that I have downloaded the "dplyr" package, and I even tried reinstalling R and RStudio. My R version is 3.2.4.

data1 looks like this:

V1             V2   V3            V4
Product Names  Qty  Numeric No.1  Numeric No.2

1. head(select(data1, V1:V3)) returns:
   Error in head(select(data1, V1)) : could not find function "select"

2. data1.f = filter(data1, V4 > 50) returns:
   Error in filter(data1, V4 > 50) : object 'V4' not found
   Then I tried data1.f = filter(data1, "V4" > 50). It ran, but when I View data1.f there are still numbers smaller than 50 in V4.
   Then I tried data1.f = filter(data1, data1$V4 > 50), and View shows "N/A" throughout the frame.

3. Rename: data.1 = rename(data.1, V1 = Productnames, V2 = Qty) returns:
   Error in rename(data.1, V1 = Productnames, V2 = Qty) : unused arguments (V1 = Productnames, V2 = Qty)

4. Group_by: goodbad = group_by(data1, tempcat) returns:
   Error: could not find function "group

I would really appreciate your help getting me out of the woods!!

@lobbielobbie1766 8 years ago

Hey Kelvin,

It is quite difficult to diagnose this from the error messages alone, without the dataset and a reproducible example. Here's a code sample which you can try. I am using RStudio, and you can find a good dplyr cheat sheet at www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf. If you are worried or confused by the %>% pipe in the code, it just means 'passing the result of one statement to the next' in layman's terms. In addition, downloading a package only makes it available to be used. To use any package in your code, you need to load it with library() as shown.

# import libraries
library(dplyr)

# create a data frame with named columns
set.seed(888)
MyDF <- [...] 50
MyFilter <- MyDF %>% filter(SalesAmount > 50)
View(MyFilter)

# create a new sales commission variable using 1% of TotalSales
MySales <- [...] %>% mutate(MyCommission = 0.01 * SalesAmount)
View(MySales)

# sum totals by SalesID
MySummary <- MySales %>% group_by(SalesID) %>% summarise(NumbOfSales = n(), TotalSales = sum(SalesAmount), TotalCommission = sum(MyCommission))
View(MySummary)

# sum sales amount by LocationID
MyLocationSales <- [...] %>% group_by(LocationID) %>% summarise(LocationSalesTotal = sum(SalesAmount))
View(MyLocationSales)

HTH, Lobbie
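Because parts of the sample above did not survive on this page, here is a self-contained sketch in the same spirit. The data frame definition is an assumption (made-up IDs and random amounts), as is the choice of `MyDF` as the input to each step; only the column and result names come from the comment. It also shows the `rename(new = old)` argument order, which is relevant to the question being answered.

```r
library(dplyr)

# Hypothetical data frame; the original definition is not recoverable.
set.seed(888)
MyDF <- data.frame(
  SalesID     = rep(c("A", "B", "C"), times = 4),
  LocationID  = rep(c("North", "South"), each = 6),
  SalesAmount = round(runif(12, min = 10, max = 100), 2)
)

# Keep rows with a sales amount above 50.
MyFilter <- MyDF %>% filter(SalesAmount > 50)

# Add a commission column worth 1% of each sale.
MySales <- MyDF %>% mutate(MyCommission = 0.01 * SalesAmount)

# Totals per SalesID.
MySummary <- MySales %>%
  group_by(SalesID) %>%
  summarise(NumbOfSales     = n(),
            TotalSales      = sum(SalesAmount),
            TotalCommission = sum(MyCommission))

# Totals per LocationID.
MyLocationSales <- MyDF %>%
  group_by(LocationID) %>%
  summarise(LocationSalesTotal = sum(SalesAmount))

# rename() takes new_name = old_name.
MyRenamed <- MyDF %>% rename(Amount = SalesAmount)

print(MySummary)
print(MyLocationSales)
```

Outside RStudio, `print()` or `head()` can stand in for `View()`.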

@calefalejandrorodriguezcue3754 8 years ago

Hi Roger. Thanks for this video. I have a data frame in R that has several variables (at least three). What I would like to do is make a pivot table that shows subtotals for each of the variables. I've achieved this with only 2 variables but, unfortunately, when I add a third or a fourth variable, its subtotal is not added under its parent variable. Do you know how to do this in R? I've also tried it with pandas pivot_table but I got the same result. Please help :'(

@karimlameer 9 years ago

Thanks Roger, where can I get the data set from? I tried looking for it.

@claveralvaro6245 5 years ago

You can even do it from Excel; just make sure you have the right kinds of variables to work with. To load data in xlsx format (an Excel file), the package you need is called "readxl". But if you are too lazy or something, there are ready-made datasets to work with, like "iris" or "crabs" (the latter ships with the MASS package): just put one into a variable as a data frame, print it and KAPOO YAH!

@mikebosko9077 9 years ago

I'm new to R, what is meant by 'making sure all the factors are annotated'? I understand factors, but annotated how? Thanks much! -Mike

@mdev1187 9 years ago

@3:14 it's the *levels* of any factors present (there aren't any in the chicago data.frame), so you can control if and when levels are kept or dropped. Usually I'd want to retain the levels of an *ordered* factor (like a Year), but not unordered ones (like City). If data is missing for a Year (derived from a date variable) in one City, I wouldn't want to lose that Year as a level, so make Year an ordered factor before filtering. If City were a factor I probably wouldn't want to retain every level after filtering, so it's best left as a character variable so the issue doesn't arise.
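The behaviour described above can be checked on the built-in `iris` data, used here as a stand-in because the chicago data frame has no factors. A small sketch:

```r
library(dplyr)

# Species is a factor with three levels.
kept <- filter(iris, Species != "setosa")

# Filtering removes the rows but the unused level is still recorded.
print(nlevels(kept$Species))     # 3
print(table(kept$Species))       # setosa appears with a count of 0

# droplevels() discards levels that no longer occur in the data.
dropped <- droplevels(kept)
print(nlevels(dropped$Species))  # 2
```

Keeping the empty level is what makes a missing year still show up in a table or plot; calling `droplevels()` is the explicit way to opt out.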

@lalaithan 7 years ago

Can someone explain why it is that I get all "NA"s when I input chicago

@claudiuskerth9497 9 years ago

Where can chicago.rds be downloaded from? It isn't the same dataset as in the gamair package. Many thanks

@michelemelchiori7628 9 years ago

github.com/DataScienceSpecialization/courses/blob/master/03_GettingData/dplyr/chicago.rds then click on "Raw" button

@ghtyu99 7 years ago

I have tried several times to download this dataset from GitHub using the link above, and I receive an error message (see below) whether or not I use the "View Raw" button. I am running R for Mac OS, R 3.3.3 GUI 1.69 Mavericks build (7328). Does anyone have a workaround or correction? Thanks.

"Error: bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message: file ‘chicago.rds’ has magic number 'X' Use of save versions prior to 2 is deprecated".
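One possible cause of this message, offered as a guess rather than a diagnosis, is opening the file with `load()`, which expects files written by `save()`. A single object stored as `.rds` is read with `readRDS()` instead. The sketch below reproduces the difference with a temporary file so it runs without a network connection; the commented-out download lines use a raw URL that is assumed from the link above and has not been verified.

```r
# Write a small data frame to a temporary .rds file.
path <- tempfile(fileext = ".rds")
saveRDS(head(mtcars), path)

# readRDS() is the matching reader and returns the object itself.
restored <- readRDS(path)
print(identical(restored, head(mtcars)))

# load() is meant for save() files; on an .rds file it signals an error.
load_result <- tryCatch(
  suppressWarnings(load(path)),
  error = function(e) e
)
print(inherits(load_result, "error"))

# For the real dataset (assumed raw URL, binary mode so the file is not altered):
# raw_url <- "https://github.com/DataScienceSpecialization/courses/raw/master/03_GettingData/dplyr/chicago.rds"
# download.file(raw_url, "chicago.rds", mode = "wb")
# chicago <- readRDS("chicago.rds")
```

If `readRDS()` also fails, the downloaded file may be the HTML page rather than the raw binary, in which case downloading again from the "Raw" view is worth trying.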

@jdlopez131 5 years ago

Isn't sqldf package a lot better than dplyr? I mean sql commands :) need I say more?

@yousfoss4367 4 years ago

thks grand prof

@MultiHunter36 5 years ago

Why am I not able to use the select function?
Error in select(chicago, city:dptp) : could not find function "select"

@rrmaximiliano 5 years ago

Maybe you didn't load the dplyr package. Use library(dplyr)
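A short sketch of that fix, using the built-in `mtcars` data in place of chicago. Installing a package (`install.packages("dplyr")`) is done once; attaching it with `library()` is needed in every new R session.

```r
# Attach dplyr so that select(), filter() and friends can be found.
library(dplyr)

picked <- select(mtcars, mpg:disp)
print(head(picked))

# The package-qualified form works even if another attached package
# (MASS, for example) also defines a function called select().
picked_qualified <- dplyr::select(mtcars, mpg:disp)
print(identical(picked, picked_qualified))
```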

@Dwright3316 9 years ago

What version of R is Dr. Peng using here? I have downloaded R version 3.2.1 (2015-06-18). Unfortunately, I cannot use the "chicago.rds" package; the error message says it is not available (for R version 3.2.1). Are there any workarounds for this? Or would I need to uninstall my current version of R and find the older version in order to install/load this package? Thank you! I'm new to programming in R, so any help would be greatly appreciated!

@lalaithan 7 years ago

It's a dataset, not a package.

@kunalbali810 9 years ago

I have two data frames, for example:

latitude  longitude  values
20        11         3.5
20        12         1.5
20        13         4.5
20        14         4
21        11         1.2
21        12         1.4
21        13         1.4
21        14         1.8

and

latitude  longitude  values
20        11         3
20        12         1
20        13         4
20        14         4
21        11         1
21        12         1
21        13         1.4
21        14         1.2

Now I need to get a result like:

20        11         3.32
20        12         1.25
20        13         4.25
20        14         4
21        11         1.1
21        12         1.2
21        13         1.4
21        14         1.5

You see, I just took the mean of the third column for each row. How can I do that? I am dealing with atmospheric data, so I need to do this. Please tell me how.

@sushantchoudhary6393 9 years ago

you could just say dataframe3$values = dataframe1$values + dataframe2$values. How you got 3.32 there in the third table though is ... it's not the mean of 3 and 3.5, just so we're on the same page.

@sushantchoudhary6393 9 years ago

Sorry forgot to divide by 2. dataframe3$values = dataframe3$values/2
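The add-then-divide approach works when both data frames list the same locations in the same row order. A slightly more defensive sketch, using the values from the question, matches rows on latitude and longitude first; the names `df1`, `df2` and the `_1`/`_2` suffixes are arbitrary choices.

```r
library(dplyr)

df1 <- data.frame(
  latitude  = rep(c(20, 21), each = 4),
  longitude = rep(11:14, times = 2),
  values    = c(3.5, 1.5, 4.5, 4, 1.2, 1.4, 1.4, 1.8)
)
df2 <- data.frame(
  latitude  = rep(c(20, 21), each = 4),
  longitude = rep(11:14, times = 2),
  values    = c(3, 1, 4, 4, 1, 1, 1.4, 1.2)
)

# Pair rows by location, then average the two value columns.
averaged <- inner_join(df1, df2,
                       by = c("latitude", "longitude"),
                       suffix = c("_1", "_2")) %>%
  mutate(values = (values_1 + values_2) / 2) %>%
  select(latitude, longitude, values)

print(averaged)
```

With these inputs the first row comes out as 3.25, which agrees with the observation above that 3.32 is not the mean of 3.5 and 3.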

@kunalbali810 9 years ago

Sushant Choudhary Do you know how to plot standard error bars (or a bar plot with them) on a time series graph??

@sushantchoudhary6393 9 years ago

Yes, I do. To say any more than that, I would need a more precise question, though.

@kunalbali810 9 years ago

Sushant Choudhary Great. I have 13 years of monthly data, from 2002-09-01 to 2014-12-01. Now I need to plot the annual mean with standard deviation, and the monthly mean (2002-2014) with standard deviation. The data is below. I hope you get my point.

            africa_co  china_co  SM_CO
2002-09-01  2.05       2.11      2.09
2002-10-01  2.125      2.095     2.21
2002-11-01  2.035      2.175     2.095
2002-12-01  2.095      2.175     1.905
2003-01-01  2.15       2.29      1.815
2003-02-01  2.12       2.33      1.775
2003-03-01  2.025      2.475     1.875
2003-04-01  1.92       2.415     1.765
2003-05-01  1.885      2.335     1.585
2003-06-01  1.775      2.35      1.56
2003-07-01  1.87       1.91      1.59
2003-08-01  2.035      1.945     1.755
2003-09-01  2.145      1.95      2.125
2003-10-01  2.12       2.025     1.98
2003-11-01  2          2.12      1.89
2003-12-01  2.04       2.195     1.85
2004-01-01  2.105      2.285     1.72
2004-02-01  2.14       2.335     1.81
2004-03-01  2.07       2.52      1.75
2004-04-01  1.915      2.45      1.68
2004-05-01  1.82       2.185     1.57
2004-06-01  1.775      2.085     1.545
2004-07-01  1.88       1.91      1.62
2004-08-01  1.965      1.97      1.755
2004-09-01  2.09       2.035     2.33
2004-10-01  2.095      2.075     2.17
2004-11-01  1.98       2.075     2.02
2004-12-01  2.13       2.145     1.89
2005-01-01  2.185      2.34      1.78
2005-02-01  2.11       2.365     1.7
2005-03-01  2.005      2.535     1.725
2005-04-01  1.91       2.505     1.655
2005-05-01  1.805      2.26      1.585
2005-06-01  1.77       2.065     1.495
2005-07-01  1.85       1.87      1.59
2005-08-01  2.025      1.885     1.95
2005-09-01  2.19       1.955     2.365
2005-10-01  2.18       2.035     2.455
2005-11-01  2.09       2.065     2.08
2005-12-01  2.165      2.275     1.845
2006-01-01  2.115      2.265     1.72
2006-02-01  2.06       2.25      1.685
2006-03-01  1.905      2.38      1.69
2006-04-01  1.8        2.31      1.645
2006-05-01  1.74       2.135     1.545
2006-06-01  1.73       1.955     1.5
2006-07-01  1.795      1.885     1.515
2006-08-01  1.995      1.99      1.775
2006-09-01  2.09       2.1       2.205
2006-10-01  2.01       2.17      2.03
2006-11-01  2.005      2.165     1.9
2006-12-01  2.125      2.195     1.885
2007-01-01  2.215      2.315     1.8
2007-02-01  2.2        2.42      1.865
2007-03-01  2.17       2.535     1.825
2007-04-01  1.955      2.57      1.715
2007-05-01  1.81       2.225     1.585
2007-06-01  1.72       2.13      1.51
2007-07-01  1.84       1.87      1.53
2007-08-01  1.98       1.945     1.815
2007-09-01  2.115      2.05      2.54
2007-10-01  2.14       2.065     2.52
2007-11-01  2.005      2.07      2.03
2007-12-01  2.12       2.15      1.75
2008-01-01  2.115      2.25      1.71
2008-02-01  2.2        2.355     1.765
2008-03-01  2.09       2.45      1.815
2008-04-01  1.84       2.36      1.725
2008-05-01  1.75       2.265     1.545
2008-06-01  1.74       2.055     1.485
2008-07-01  1.85       1.855     1.525
2008-08-01  1.99       1.88      1.7
2008-09-01  2.095      1.885     1.995
2008-10-01  2.01       1.865     2.08
2008-11-01  1.98       1.865     1.915
2008-12-01  2.07       2.005     1.755
2009-01-01  2.125      2.18      1.695
2009-02-01  1.975      2.155     1.665
2009-03-01  1.945      2.375     1.635
2009-04-01  1.84       2.37      1.655
2009-05-01  1.73       2.17      1.565
2009-06-01  1.73       1.975     1.49
2009-07-01  1.83       1.83      1.48
2009-08-01  1.925      1.91      1.635
2009-09-01  2.04       1.91      1.82
2009-10-01  1.985      1.97      1.895
2009-11-01  1.925      1.95      1.89
2009-12-01  2.055      2.105     1.87
2010-01-01  2.09       2.125     1.74
2010-02-01  2.02       2.225     1.705
2010-03-01  1.95       2.415     1.7
2010-04-01  1.92       2.395     1.67
2010-05-01  1.775      2.16      1.555
2010-06-01  1.735      2.01      1.53
2010-07-01  1.835      1.83      1.55
2010-08-01  1.995      1.865     1.91
2010-09-01  2.16       1.91      2.38
2010-10-01  2.275      1.885     2.47
2010-11-01  2.045      1.97      1.91
2010-12-01  2.01       2.045     1.75
2011-01-01  2.12       2.245     1.675
2011-02-01  2.115      2.265     1.685
2011-03-01  2.06       2.35      1.685
2011-04-01  1.865      2.355     1.635
2011-05-01  1.755      2.075     1.54
2011-06-01  1.72       1.93      1.475
2011-07-01  1.84       1.89      1.51
2011-08-01  2.025      1.87      1.64
2011-09-01  2.175      1.92      2.04
2011-10-01  1.94       1.925     1.9
2011-11-01  1.895      1.88      1.735
2011-12-01  2.045      2.095     1.77
2012-01-01  2.155      2.215     1.705
2012-02-01  2.15       2.28      1.7
2012-03-01  2.065      2.385     1.685
2012-04-01  1.965      2.34      1.625
2012-05-01  1.765      2.2       1.535
2012-06-01  1.78       2.045     1.465
2012-07-01  1.82       1.93      1.5
2012-08-01  2.025      1.935     1.685
2012-09-01  2.11       1.955     2.07
2012-10-01  2.005      1.995     2.005
2012-11-01  1.94       1.925     1.9
2012-12-01  1.965      2.065     1.755
2013-01-01  2.065      2.17      1.64
2013-02-01  2.085      2.205     1.715
2013-03-01  1.975      2.305     1.7
2013-04-01  1.86       2.355     1.6
2013-05-01  1.8        2.1       1.54
2013-06-01  1.8        1.855     1.505
2013-07-01  1.9        1.775     1.52
2013-08-01  2.115      1.795     1.64
2013-09-01  2.085      1.865     1.825
2013-10-01  1.905      1.895     1.85
2013-11-01  1.895      1.895     1.685
2013-12-01  1.915      2.04      1.68
2014-01-01  2.07       2.115     1.645
2014-02-01  2.075      2.175     1.69
2014-03-01  2.035      2.34      1.73
2014-04-01  1.855      2.435     1.635
2014-05-01  1.725      2.09      1.545
2014-06-01  1.745      1.99      1.465
2014-07-01  1.8        1.775     1.48
2014-08-01  1.95       1.875     1.675
2014-09-01  2.005      1.835     1.915
2014-10-01  1.99       1.89      1.92
2014-11-01  1.975      1.92      1.79
2014-12-01  1.985      2.07      1.73
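One way to approach this with dplyr is to derive a year from each date, group by it, and summarise the mean and standard deviation. The sketch below uses only the first eight rows of the data above to stay short; the data frame name `co`, the summary column names and the helper `plot_with_sd()` are all assumptions made for illustration.

```r
library(dplyr)

# First eight rows of the posted data, with the dates as a proper Date column.
co <- data.frame(
  date = as.Date(c("2002-09-01", "2002-10-01", "2002-11-01", "2002-12-01",
                   "2003-01-01", "2003-02-01", "2003-03-01", "2003-04-01")),
  africa_co = c(2.05, 2.125, 2.035, 2.095, 2.15, 2.12, 2.025, 1.92),
  china_co  = c(2.11, 2.095, 2.175, 2.175, 2.29, 2.33, 2.475, 2.415),
  SM_CO     = c(2.09, 2.21, 2.095, 1.905, 1.815, 1.775, 1.875, 1.765)
)

# Annual mean and standard deviation for one series.
annual <- co %>%
  mutate(year = as.integer(format(date, "%Y"))) %>%
  group_by(year) %>%
  summarise(mean_africa = mean(africa_co),
            sd_africa   = sd(africa_co))

print(annual)

# Hypothetical helper: points joined by lines, with +/- 1 SD error bars
# drawn using base graphics.
plot_with_sd <- function(x, m, s, xlab) {
  plot(x, m, type = "b", xlab = xlab, ylab = "mean",
       ylim = range(m - s, m + s, na.rm = TRUE))
  arrows(x, m - s, x, m + s, angle = 90, code = 3, length = 0.05)
}
```

Calling `plot_with_sd(annual$year, annual$mean_africa, annual$sd_africa, "year")` would draw the annual series. For the monthly climatology, the same pipeline with `format(date, "%m")` in place of `"%Y"` groups all Januaries together, all Februaries together, and so on; that only gives a meaningful standard deviation once the full 2002-2014 data is loaded.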