I am new to R, and I am learning it for phylogenetics and statistics. I can already tell that this package is very useful. Thanks for the tutorial!
So the hardest part of getting started with 'dplyr' is getting the data wrangled into the right shape for manipulation. How do I do this inside R? If I do this in Power BI it is all drag, drop, and click. Why doesn't this exist for RStudio?
Hey, I have a table with four columns. Two of them are lists of different dates, and the other two contain numbers. I want to compare the date columns and get a new table with just the numbers from the rows where the dates match. Can you help me? Thanks.
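A minimal dplyr sketch of one way to do this. The column names (date1, date2, num1, num2) and the sample data are made up here; substitute your own:

```r
library(dplyr)

# made-up example: two date columns and two number columns
df <- data.frame(
  date1 = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03")),
  date2 = as.Date(c("2020-01-01", "2020-01-05", "2020-01-03")),
  num1  = c(10, 20, 30),
  num2  = c(1, 2, 3)
)

# keep only the rows where the two date columns agree,
# then keep one date column plus the numbers
matched <- df %>%
  filter(date1 == date2) %>%
  select(date = date1, num1, num2)

matched   # rows 1 and 3 survive
```

If the two date columns live in two different tables rather than side by side, an inner_join() on the date column is the usual approach instead.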
Hello everyone, I am a complete beginner in R. I could not even import the chicago.rds file correctly: I clicked "Import Dataset" on the right-hand side, selected the file, and it turned into garbled text. So I imported my own dataset (named data1) from a txt file and tried to follow the steps in the video, but only a few of them worked. Please help me out. I have checked many times that I have installed the "dplyr" package, and I even tried reinstalling R and RStudio. My R version is 3.2.4. data1 looks like this:

V1: Product Names
V2: Qty
V3: Numeric No.1
V4: Numeric No.2

1. head(select(data1, V1:V3)) returns:
Error in head(select(data1, V1)) : could not find function "select"

2. data1.f = filter(data1, V4 > 50) returns:
Error in filter(data1, V4 > 50) : object 'V4' not found
Then I tried data1.f = filter(data1, "V4" > 50). It ran, but when I View data1.f, there are still numbers smaller than 50 in V4.
Then I tried data1.f = filter(data1, data1$V4 > 50), and View shows "N/A" everywhere in the frame.

3. data.1 = rename(data.1, V1 = Productnames, V2 = Qty) returns:
Error in rename(data.1, V1 = Productnames, V2 = Qty) : unused arguments (V1 = Productnames, V2 = Qty)

4. goodbad = group_by(data1, tempcat) returns:
Error: could not find function "group_by"

I would really appreciate you guys helping me out of the woods!!
Hey Kelvin, it is quite difficult to tell from the error messages alone, without the dataset and a reproducible example. Here's a code sample you can try. I am using RStudio, and you can find a good dplyr cheat sheet at www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf. If you are worried or confused by the %>% pipe in the code, it just means 'passing the result of one statement to the next', in layman's terms. In addition, downloading a package only gets it ready to be used; to use any package in your code, you need to load it with library() as shown.

# import libraries
library(dplyr)

# create a data frame with named columns (example data; adjust to your own)
set.seed(888)
MyDF <- data.frame(SalesID     = sample(1:5, 20, replace = TRUE),
                   LocationID  = sample(1:3, 20, replace = TRUE),
                   SalesAmount = round(runif(20, 0, 100), 2))

# keep only rows where SalesAmount > 50
MyFilter <- MyDF %>% filter(SalesAmount > 50)
View(MyFilter)

# create a new sales commission variable using 1% of SalesAmount
MySales <- MyDF %>% mutate(MyCommission = 0.01 * SalesAmount)
View(MySales)

# sum totals by SalesID
MySummary <- MySales %>%
  group_by(SalesID) %>%
  summarise(NumbOfSales = n(),
            TotalSales = sum(SalesAmount),
            TotalCommission = sum(MyCommission))
View(MySummary)

# sum sales amount by LocationID
MyLocationSales <- MyDF %>%
  group_by(LocationID) %>%
  summarise(LocationSalesTotal = sum(SalesAmount))
View(MyLocationSales)

HTH, Lobbie
Hi Roger. Thanks for this video. I have a data frame in R with several variables (at least three). What I would like to do is make a pivot table that shows subtotals for each of the variables. I've achieved this with only two variables but, unfortunately, when I add a third or fourth variable it doesn't add its subtotal under its parent variable. Do you know how to do this in R? I've also tried it in pandas pivot_table but got the same result. Please help :'(
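One way to sketch subtotal rows in dplyr is to summarise at each grouping level separately and stack the results. The column names (region, product, sales) and data below are invented for illustration; extending the pattern to a third variable means adding one more summarise-and-stack step:

```r
library(dplyr)

# made-up example data
df <- data.frame(
  region  = c("N", "N", "S", "S"),
  product = c("A", "B", "A", "B"),
  sales   = c(10, 20, 30, 40)
)

# detail rows: one per region/product pair
detail <- df %>%
  group_by(region, product) %>%
  summarise(sales = sum(sales), .groups = "drop")

# subtotal rows: one per region, labelled in the product column
subtotal <- df %>%
  group_by(region) %>%
  summarise(sales = sum(sales), .groups = "drop") %>%
  mutate(product = "SUBTOTAL")

# stack and sort so each subtotal sits under its parent region
pivot <- bind_rows(detail, subtotal) %>% arrange(region, product)
```

This is hand-rolled; the "SUBTOTAL" label is just a sentinel value in the grouping column, which is the same trick pandas margins use internally.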
You can do it even from Excel; just make sure you have the right kind of variables to work with. Also look for the package you need to load the data: for the xlsx format (Excel file) it is the package called "readxl". But if you are, like, too lazy or something, there are some built-in datasets to work with, such as "iris" or "crabs": just put one into a variable as a data frame, print it, and KAPOO YAH!
@3:14 it's the *levels* of any factors present (there aren't any in the chicago data.frame), so you can control if and when levels are kept or dropped. Usually I'd want to retain the levels of an *ordered* factor (like Year), but not unordered ones (like City). If data is missing for a Year (derived from the date variable) in one City, I wouldn't want to lose that Year as a level, so make Year an ordered factor before filtering. If City were a factor, I probably wouldn't want to retain every level after filtering, so it's best left as a character variable and the issue doesn't arise.
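To see the level behaviour concretely, here is a small sketch (data invented): dplyr's filter() keeps unused factor levels by default, and droplevels() removes them when you don't want them:

```r
library(dplyr)

# Year as an ordered factor, City left as character
df <- data.frame(
  Year = factor(c(2010, 2011, 2012), ordered = TRUE),
  City = c("Chicago", "Boston", "Chicago")
)

sub <- df %>% filter(City == "Chicago")

levels(sub$Year)               # all three Years kept, even 2011 with no rows left
levels(droplevels(sub)$Year)   # unused 2011 level dropped
```

So the choice is under your control: keep the full level set for plotting or tabulating every Year, or call droplevels() when the empty levels would only add clutter.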
I have tried several times to download this dataset from GitHub using the link above, and I receive an error message (see below) whether or not I use the "View Raw" button. I am running R for Mac OS, R 3.3.3 GUI 1.69 Mavericks build (7328). Does anyone have a workaround or correction? Thanks.

"Error: bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file 'chicago.rds' has magic number 'X'
Use of save versions prior to 2 is deprecated"
What version of R is Dr. Peng using here? I have downloaded R version 3.2.1 (2015-06-18), but unfortunately I cannot use the "chicago.rds" package; the error message says it is not available (for R version 3.2.1). Are there any workarounds for this? Or would I need to uninstall my current version of R and install an older version in order to load it? Thank you! I'm new to programming in R, so any help would be greatly appreciated!
I have two data frames, something like:

latitude longitude values
20 11 3.5
20 12 1.5
20 13 4.5
20 14 4
21 11 1.2
21 12 1.4
21 13 1.4
21 14 1.8

and

latitude longitude values
20 11 3
20 12 1
20 13 4
20 14 4
21 11 1
21 12 1
21 13 1.4
21 14 1.2

Now I need to get a result like:

20 11 3.32
20 12 1.25
20 13 4.25
20 14 4
21 11 1.1
21 12 1.2
21 13 1.4
21 14 1.5

You see, I just took the mean of the 3rd column, row by row. How can I do that? I am dealing with atmospheric data, so I need to do this. Please tell me how.
If the rows line up, you could just say dataframe3$values = (dataframe1$values + dataframe2$values) / 2. How you got 3.32 there in the third table though is ... it's not the mean of 3 and 3.5, just so we're on the same page.
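A sketch that doesn't rely on the two data frames having their rows in the same order: join on the coordinate columns first, then average the two values columns. Column names are taken from the question; the data below are the first few rows of it:

```r
library(dplyr)

df1 <- data.frame(latitude  = c(20, 20, 21),
                  longitude = c(11, 12, 11),
                  values    = c(3.5, 1.5, 1.2))
df2 <- data.frame(latitude  = c(20, 20, 21),
                  longitude = c(11, 12, 11),
                  values    = c(3, 1, 1))

# match rows by coordinates; the duplicated 'values' columns
# come back as values.x and values.y
result <- inner_join(df1, df2, by = c("latitude", "longitude")) %>%
  mutate(values = (values.x + values.y) / 2) %>%
  select(latitude, longitude, values)

result   # means: 3.25, 1.25, 1.1
```

The join also protects you if one file has a coordinate the other is missing, which is common with gridded atmospheric data.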