Data Manipulation in R with dplyr

  • The R language is widely used among data scientists, statisticians, researchers and students.

    It is one of the leading tools for statistics and data analysis, and it is widely used for machine learning. It is platform-independent, open-source, and has a large, vibrant community of users.

    The Comprehensive R Archive Network (CRAN) is the main repository for R packages.

    This brings us to the package discussed in this post: dplyr. Its documentation is available on CRAN.

    In this post, I will demonstrate the five core operations of the package. The first thing we need to do is install the package and load the library.

    > install.packages("dplyr")

    > library(dplyr)

    We then need a dataset on which to run these operations. The download logs for CRAN packages are publicly available here: CRAN package download logs. Let us download the file for July 8, 2014 (a log from any other date would work just as well) into RStudio's working directory.

    Once the file has been copied into R's working directory, execute the line below, where the variable path2csv stores the location of the CSV file:

    > mydf <- read.csv(path2csv, stringsAsFactors = FALSE)

     

    We then convert the data frame to a tbl_df, which prints more readably, and store it in a variable called cran. Typing cran prints its contents. (In dplyr 1.0.0 and later, tbl_df() is deprecated in favor of as_tibble().)

    > cran <- tbl_df(mydf)
    > cran
    
    

    [Screenshot: printed output of cran]
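To try the loading step without downloading a log first, the sketch below writes a few made-up rows to a temporary CSV file and reads them back in the same way. The rows are hypothetical; only the column names are meant to follow the layout of the CRAN download logs. It uses as_tibble(), the current replacement for tbl_df().

```r
library(dplyr)

# Hypothetical stand-in for the downloaded log: a three-row CSV written to a
# temporary file, so that path2csv points at a file that exists.
path2csv <- tempfile(fileext = ".csv")
writeLines(c(
  "date,time,size,r_version,r_arch,r_os,package,version,country,ip_id",
  "2014-07-08,00:54:41,80589,3.1.0,x86_64,mingw32,htmltools,0.2.4,US,1",
  "2014-07-08,00:59:53,321767,3.1.0,x86_64,mingw32,tseries,0.10-32,US,2",
  "2014-07-08,00:48:05,606104,3.0.2,x86_64,linux-gnu,swirl,2.2.9,IN,3"
), path2csv)

# Same two steps as in the post: read the CSV, then convert it to a tibble.
mydf <- read.csv(path2csv, stringsAsFactors = FALSE)
cran <- as_tibble(mydf)

dim(cran)     # number of rows and columns
names(cran)   # column names taken from the CSV header
```

With the real log, only path2csv changes; the read.csv() and conversion steps stay the same.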

    The dplyr philosophy is to have small functions that each do one thing well. Five functions cover most of the fundamental data manipulation tasks.

    • select()
    In most analyses, we are interested in only a few columns of the full data set. This function selects the columns that are required. If I only need the columns ip_id, package and country, I execute the following statement:
    > select(cran, ip_id, package, country)

    [Screenshot: output of select(cran, ip_id, package, country)]

    It is important to note that the columns are returned in the order in which we specified them, irrespective of their order in the original data frame.
    We could also use the '-' sign to omit the columns we do not need.
    > select(cran, -time)
    [Screenshot: output of select(cran, -time)]
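Beyond naming columns one by one, select() also accepts column ranges and selection helpers. The following is a minimal sketch on a small, made-up tibble; the rows are hypothetical and only mimic the columns of the log used in this post.

```r
library(dplyr)

# Hypothetical rows in the layout of a CRAN download log (values are made up).
cran <- tibble(
  date      = "2014-07-08",
  time      = c("00:54:41", "00:59:53", "00:48:05"),
  size      = c(80589L, 321767L, 606104L),
  r_version = c("3.1.0", "3.1.0", "3.0.2"),
  r_arch    = "x86_64",
  r_os      = c("mingw32", "mingw32", "linux-gnu"),
  package   = c("htmltools", "tseries", "swirl"),
  version   = c("0.2.4", "0.10-32", "2.2.9"),
  country   = c("US", "US", "IN"),
  ip_id     = 1:3
)

# A range of adjacent columns, written first:last.
range_cols <- select(cran, r_arch:country)

# Drop a whole range by negating it.
without_head <- select(cran, -(date:size))

# Pick columns by name pattern with a selection helper.
r_cols <- select(cran, starts_with("r_"))

names(range_cols)
names(without_head)
names(r_cols)
```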
    
    • filter()
    Now that we know how to select columns, the next logical step is to select rows. That is where the filter() function comes in.
    It is like the WHERE clause in SQL. Let us understand this with an example:
    > filter(cran, package == "swirl")

    [Screenshot: output of filter(cran, package == "swirl")]

    Looking at the package column, we see that the resulting data frame contains only the rows in which the package is "swirl".
    Multiple conditions can be passed to filter(), separated by commas; a row is kept only if it satisfies all of them. For example, to fetch all downloads of swirl made from Linux machines in India:
    > filter(cran, package == "swirl", r_os == "linux-gnu", country == "IN")

    [Screenshot: output of the filter() call with three conditions]
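Conditions in filter() are ordinary R logical expressions, so they can also be combined with | (or), use numeric comparisons, or test for missing values. A small sketch, assuming a made-up tibble with a few of the log's columns (the rows are hypothetical):

```r
library(dplyr)

# Hypothetical rows; the last one has missing R version and OS information.
cran <- tibble(
  size      = c(80589L, 321767L, 606104L, 79825L),
  r_version = c("3.1.0", "3.1.0", "3.0.2", NA),
  r_os      = c("mingw32", "linux-gnu", "linux-gnu", NA),
  package   = c("htmltools", "tseries", "swirl", "swirl"),
  country   = c("US", "US", "IN", "US")
)

# Either condition may hold: downloads of swirl or of tseries.
swirl_or_tseries <- filter(cran, package == "swirl" | package == "tseries")

# Both conditions must hold: larger than 100500 bytes and downloaded on Linux.
big_linux <- filter(cran, size > 100500, r_os == "linux-gnu")

# Keep only the rows in which r_version is recorded.
known_version <- filter(cran, !is.na(r_version))

swirl_or_tseries
big_linux
known_version
```

Rows for which a condition evaluates to NA are dropped, which is why the row with the missing r_os never appears in big_linux.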

    • arrange()
    This function orders the rows of a dataset by the values of one or more columns. The order is ascending by default; wrapping a column in desc() sorts it in descending order. In the output below, the ip_id column is listed in descending order.
    > arrange(cran, desc(ip_id))

    [Screenshot: output of arrange(cran, desc(ip_id))]
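arrange() can take more than one column, in which case later columns break ties in the earlier ones. A minimal sketch on made-up rows (the values are hypothetical):

```r
library(dplyr)

# Hypothetical rows with just the columns needed for sorting.
cran <- tibble(
  package = c("swirl", "party", "swirl", "party"),
  size    = c(606104L, 748063L, 79825L, 500000L),
  ip_id   = c(3L, 1L, 4L, 2L)
)

# Sort by package (ascending), then by size (descending) within each package.
sorted <- arrange(cran, package, desc(size))
sorted
```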

    • mutate()

    This function adds new columns to the data frame or modifies existing ones, usually by computing them from columns that are already there. Suppose I want to convert the size column, which is in bytes, to megabytes and store the values in a column called size_mb.

    > mutate(cran, size_mb = size / 2^20)

    [Screenshot: output of mutate(cran, size_mb = size / 2^20)]
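A single mutate() call can create several columns, and a later column may use one created earlier in the same call. The sketch below assumes a made-up tibble whose sizes are chosen so that the arithmetic is easy to check; size_gb is a hypothetical extra column added only for illustration.

```r
library(dplyr)

# Hypothetical rows; sizes are in bytes.
cran <- tibble(
  package = c("swirl", "party", "tseries"),
  size    = c(1048576, 2097152, 524288)
)

# size_gb is computed from size_mb, which is created in the same call.
cran_sizes <- mutate(cran, size_mb = size / 2^20, size_gb = size_mb / 2^10)
cran_sizes
```

mutate() returns a new data frame; cran itself is unchanged unless the result is assigned back to it.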

    • summarize()

    This function collapses the dataset into a single row, which makes it the natural choice for computing summary statistics such as the mean of a column. The column should be free of missing values; otherwise mean() returns NA.

    For example, to find the average download size from the size column:

    > summarize(cran, avg_bytes = mean(size))

    [Screenshot: output of summarize(cran, avg_bytes = mean(size))]

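    summarize() is most useful on grouped data: when the data frame has first been grouped with group_by(), it returns one row for each group instead of a single row for the whole dataset.
    Disclosure: The examples above are from the dplyr lesson in the swirl package.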
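The grouped use of summarize() can be sketched as follows. This example is not from the swirl lesson; the rows are made up, and count and avg_bytes are just illustrative column names.

```r
library(dplyr)

# Hypothetical rows: five downloads spread over two packages.
cran <- tibble(
  package = c("swirl", "party", "swirl", "party", "swirl"),
  size    = c(100, 400, 300, 600, 200)
)

# Group the rows by package, then compute one summary row per group.
by_package <- group_by(cran, package)
pack_sum <- summarize(by_package, count = n(), avg_bytes = mean(size))
pack_sum
```

Here n() counts the rows in each group, so pack_sum has one row per package with its number of downloads and mean download size.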
