1  R Basics

Learning objectives
  • Be able to install and load packages in R.
  • Be able to set the working directory in R.
  • Be able to import .RData, .csv and .xlsx-files.
  • Understand why scripts are the main way of working with code in R.
Using LLMs for coding

Large Language Models (LLMs) like ChatGPT, Claude, and GitHub Copilot are powerful tools for programming tasks. They can be used to create, explain, and debug code. If you want to use LLMs to help you write code, please read Appendix B - LLMs in programming.

1.1 Installing packages

A package in R is a set of commands that are not part of base R. Many of the R commands used throughout this course require a certain package to be installed on your computer. It is a good idea to get familiar with installing packages and loading them in your R script, mainly so you won't be missing them at the exercises, casework or examination.

In R there are two important commands concerning installation of packages.

  • install.packages() installs the target package on your computer.
  • remove.packages() uninstalls the package from your computer.

For example, install.packages("readxl") installs the package readxl on the computer, and remove.packages("readxl") uninstalls it again.
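
In a script that is re-run or shared, it can be convenient to install a package only when it is missing. The following is a minimal sketch, not part of the course material: the helper name install_if_missing is hypothetical, and it relies on base R's requireNamespace() to check whether a package is already available.

```r
# Hypothetical helper: install each package in `pkgs` only if it is missing.
install_if_missing <- function(pkgs) {
  for (pkg in pkgs) {
    # requireNamespace() returns TRUE if the package can be loaded
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)
    }
  }
  invisible(pkgs)
}

# "stats" ships with R, so this call installs nothing
install_if_missing("stats")
```

Calling install_if_missing("readxl") would install readxl the first time and do nothing on later runs.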

1.1.1 Loading a package

When the packages are installed on the computer, you can load them into your workspace/script whenever you start an analysis in R. To do this, you use the library() command. library() looks the package up in a package library stored on your computer. Every time you open a new R session, you need to load the needed packages again.

For example, library(readxl) loads the package readxl into the workspace.

When you load a package, you might get warning messages like the following:

library(readxl)

Warning message:
package 'readxl' was built under R version 3.1.3
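
A warning of this kind means that the installed package was built with a different version of R than the one you are running; it is usually harmless, and the package is still loaded. As a rough sketch, you can compare the versions yourself. The example uses the stats package, which ships with R, as a stand-in so that it runs without installing anything; replace it with readxl to inspect that package.

```r
# R version of the current session
R.version.string

# Version of an installed package ("stats" is a stand-in for e.g. "readxl")
pkg <- "stats"
pkg_version <- packageVersion(pkg)
pkg_version

# The "Built" field records the R version the package was built under
pkg_built <- packageDescription(pkg)$Built
pkg_built
```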

1.2 Working directory

R uses something called a working directory, or wd for short. This is the folder on your computer in which R by default saves and looks for the files that you are working on. This also makes it easier to load datasets. The working directory can be changed in R either manually or through code. getwd() and setwd() are the two important commands: getwd() shows the current working directory and setwd() changes it.

getwd() # Show the current working directory

[1] "/Users/madsbjorlie/Documents/Statistik/Exercises/Week 1"

setwd("~/Documents/R-træning") # Change the working directory 
getwd() # Show the current working directory

[1] "/Users/madsbjorlie/Documents/R-træning"
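
When changing the working directory in a script, it can be useful to store the old one so that you can switch back. A minimal sketch, using R's temporary folder (tempdir()) as a stand-in for a real project folder:

```r
old_wd <- getwd()          # remember the current working directory

setwd(tempdir())           # stand-in for e.g. setwd("~/Documents/R-træning")
getwd()                    # now points at the temporary folder
list.files()               # files that R can find by file name alone

setwd(old_wd)              # switch back to the original working directory
```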

1.3 Importing a dataset

Throughout this course you will need to import a lot of data into R. Getting familiar with the following packages and commands will help minimize your R-related frustration. Datasets can be imported into R in numerous ways. Like changing the working directory, it can be done both manually and through coding. We recommend doing it through coding since this makes it easier to maintain an overview.

Almost all of the datasets handed out in this course will be available both as Excel files (.xlsx) and in the .RData format. R can also import text files such as .txt or .csv.

.xlsx is Microsoft Excel's standard file type, whereas .csv is short for comma-separated values and denotes text files where the values are separated by a comma (or, in the Danish version of Excel, a semicolon).

You can either load .RData files, import datasets through R's built-in commands, or use data-import packages to import file types such as .xlsx or .xls. All of these approaches work fine, and which one you use depends on your personal preference.

1.3.1 Importing an .RData file

If someone imported and stored the data as an .RData file, you can simply import it using the load() function. For this you do not need any libraries.

load(file.choose())
load("~/Documents/..../Beer GCMS.RData")

The only difference compared with the import methods below is that you do not assign the result (with the <- operator) to an object you name yourself. The data object keeps the name it was saved with. However, if you would like your object to have a particular name (like X), simply add a line below load() where you define it, e.g., X <- beer.
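
The behaviour of load() can be tried out without any course files. The sketch below makes up a tiny data frame called beer (hypothetical values, for illustration only), saves it to a temporary .RData file, removes it, and loads it again. Note that load() returns the names of the objects it restored, which is a handy way to find out what a file contained.

```r
# Made-up example data, only used to create an .RData file
beer <- data.frame(sample = c("A", "B"), alcohol = c(4.6, 5.2))

rdata_file <- file.path(tempdir(), "beer_example.RData")
save(beer, file = rdata_file)   # store the object under its own name
rm(beer)                        # remove it from the workspace

loaded_names <- load(rdata_file)  # restores the object 'beer'
loaded_names                      # names of the objects that were loaded

X <- beer                       # optional: give the data a name of your choice
```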

1.3.2 Importing a dataset through R’s own commands

By default, R cannot import Excel files such as .xls and .xlsx. To use R's read.csv() function, you need to save the Excel dataset as a .csv file. This is done by choosing Save As (in Excel) and then selecting the .csv file type. This might seem a bit tedious, but it removes the need for other packages.

read.csv() imports the dataset specified in the parentheses. This can be done in two ways: by typing the path to the file on your computer, or by using the command file.choose(), which opens a dialog where you select the file. If the dataset is in the working directory, you do not have to type the full path, just the file name.
For example:

Beer <- read.csv(file.choose(), header=TRUE, sep=";", dec=",")
Beer <- read.csv("Beerdata.csv", header=TRUE, sep=";", dec=",")
Beer <- read.csv("~/Documents/R-traening/Øldata.csv", header=TRUE, sep=";", dec=",")

The arguments header =, sep = and dec = tell R how to import the data. header=TRUE tells R that the first row in the dataset is not part of the data itself but carries the variable names. sep=";" defines which separator the document uses. With Danish Excel, this will be a semicolon. This can be checked by opening the dataset in Notepad on Windows or TextEdit on Mac. dec="," defines which symbol is used as the decimal mark in the file. After import, the numbers in R should use a full stop rather than a comma as the decimal mark; otherwise the column has not been read as numeric. This can be checked by using summary commands after the data has been imported.
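
To see what sep and dec do in practice, the sketch below writes a small semicolon-separated file with decimal commas to a temporary folder and reads it back. The file name and values are made up for illustration. It also shows read.csv2(), a base R variant of read.csv() whose defaults are sep=";" and dec=",".

```r
# Create a small Danish-style .csv file (made-up values)
csv_file <- file.path(tempdir(), "beer_example.csv")
writeLines(c("Sample;Bitterness;Alcohol",
             "A;12,5;4,6",
             "B;30,0;5,2"), csv_file)

# Correct import: semicolon as separator, comma as decimal mark
Beer <- read.csv(csv_file, header = TRUE, sep = ";", dec = ",")
str(Beer)    # Bitterness and Alcohol are numeric

# Forgetting dec = "," leaves the numbers as text instead of numeric
Wrong <- read.csv(csv_file, header = TRUE, sep = ";")
str(Wrong)

# read.csv2() uses sep = ";" and dec = "," by default
Beer2 <- read.csv2(csv_file)
```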

1.3.3 Importing a dataset using packages

By using various packages, it is possible to import Excel documents directly into R. This can be quite handy, but some of the packages will not run on Mac or on Windows because they depend on other programs that may be missing. The most commonly used data-import packages are gdata, readxl, xlsx and rio. gdata requires Perl, which is installed by default on Mac and Linux but not on Windows, and therefore it will not run on Windows unless Perl is installed. The following are a couple of examples using the various packages.

library(readxl)
Beer <- read_excel(file.choose())
Beer <- read_excel("~/Documents/R-traening/Beerdata.xls")
Beer <- read_excel("~/Documents/R-traening/Beerdata.xlsx")

library(gdata)
Beer <- read.xls(file.choose())
Beer <- read.xls("~/Documents/R-traening/Beerdata.xls")

library(xlsx)
Beer <- read.xlsx(file.choose(), sheetIndex = 1)
Beer <- read.xlsx("~/Documents/R-traening/Beerdata.xls", sheetIndex = 1)
Beer <- read.xlsx("~/Documents/R-traening/Beerdata.xlsx", sheetIndex = 1)

library(rio)
Beer <- import(file.choose())
Beer <- import("~/Documents/R-traening/Beerdata.xls")
Beer <- import("~/Documents/R-traening/Beerdata.xlsx")
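
Excel files often contain several sheets. As a small sketch of how to pick one with readxl, the example below uses an example workbook that is bundled with the readxl package (located with readxl_example()), so it does not depend on the course files. It only runs if readxl is installed.

```r
Sheet1 <- NULL
if (requireNamespace("readxl", quietly = TRUE)) {
  # Path to an example workbook shipped with readxl
  path <- readxl::readxl_example("datasets.xlsx")

  # List the sheet names in the workbook
  sheets <- readxl::excel_sheets(path)
  print(sheets)

  # Import one specific sheet by name (here: the first one)
  Sheet1 <- readxl::read_excel(path, sheet = sheets[1])
  print(head(Sheet1))
}
```

For your own data, replace path with the path to your .xlsx file and sheet with the name or number of the sheet you need.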

1.3.4 Getting an overview of the dataset

When the dataset is imported into R, you can use different commands to check that it was imported correctly. The commands are head(), str() and dim().

  • head() shows the first 6 rows in the dataset.
  • str() shows the types of the various columns, such as numeric and factor.
  • dim() shows the dimensions of the data-matrix.
Coffee <- read.csv(file.choose(), header=TRUE, sep=";", dec=",")
head(Coffee)
  Sample Assessor Replicate Intensity  Sour Bitter
1    31C        1         1      9.30  6.90   6.75
2    31C        1         2      8.70  8.10   7.95
3    31C        1         3      9.75  8.70  10.20
4    31C        1         4     11.70 10.95  11.40
5    31C        2         1      8.70  5.70  11.40
6    31C        2         2      8.70  9.30  10.35
str(Coffee)
'data.frame':   192 obs. of  6 variables:
 $ Sample   : Factor w/ 6 levels "31C","37C","44C",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Assessor : int  1 1 1 1 2 2 2 2 3 3 ...
 $ Replicate: int  1 2 3 4 1 2 3 4 1 2 ...
 $ Intensity: num  9.3 8.7 9.75 11.7 8.7 ...
 $ Sour     : num  6.9 8.1 8.7 10.9 5.7 ...
 $ Bitter   : num  6.75 7.95 10.2 11.4 11.4 ...
dim(Coffee) # show dimensions of data
[1] 192   6
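
A few more base R commands can complement head(), str() and dim(). The sketch below rebuilds only the six rows printed by head(Coffee) above as a small data frame (named Coffee_small here, a hypothetical name) so that it runs without the data file, and then checks column types, summary statistics and missing values.

```r
# The six rows shown by head(Coffee), typed in by hand
Coffee_small <- data.frame(
  Sample    = factor(rep("31C", 6)),
  Assessor  = c(1L, 1L, 1L, 1L, 2L, 2L),
  Replicate = c(1L, 2L, 3L, 4L, 1L, 2L),
  Intensity = c(9.30, 8.70, 9.75, 11.70, 8.70, 8.70),
  Sour      = c(6.90, 8.10, 8.70, 10.95, 5.70, 9.30),
  Bitter    = c(6.75, 7.95, 10.20, 11.40, 11.40, 10.35)
)

names(Coffee_small)                 # variable names
sapply(Coffee_small, class)         # type of each column
summary(Coffee_small)               # min, mean, max etc. per column
colSums(is.na(Coffee_small))        # number of missing values per column
```

If a column that should be numeric shows up as character or factor, the decimal mark or separator was most likely specified incorrectly during import.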

1.4 Scripts

We highly recommend that you do your data analysis using a script. A script is simply a plain text file with the file extension .R, so that R can interpret the commands. It holds all the commands needed for the analysis: defining necessary functions, importing data, initial inspection, modeling and plots.

Each analysis task is slightly different; however, there is almost always a set of generic steps that are needed: cleaning up the workspace, loading packages, setting the working directory, loading data and checking the data structure. These typically fill the first 5-10 lines of code in every script, as follows:

rm(list = ls()) # remove all variables in environment
graphics.off() # close all graphics devices (clears generated plots)

library(ggplot2) # load package ggplot2
library(readxl) # load package readxl

setwd("~/MyComputer/Courses/FDA/Exercises/Week1") # set working directory

data <- read_excel("SomeData.xlsx") # load file

head(data) # show first 6 lines of data

# plot data (x, y and treatment are placeholders for column names in data)
ggplot(data, aes(x, y, color = treatment)) +
    geom_point()

1.5 Rmarkdown

Rmarkdown is a framework for producing beautiful reports where code and text are combined side by side. It makes it possible to combine scripts (your data analysis) with output (figures, tables and numbers) and narrative (the story of why, discussions etc.) in ONE document. It is rather simple once you get used to how it works, and it really makes life much easier, both for this course and for all other tasks that involve data analysis and are to be presented as a report.

This is referred to as reproducible data analysis. It is the opposite of Excel-Hell, which is characterized by being non-transparent and making it really hard to figure out what happened between data and results.

Rmarkdown is especially recommended for the hand-in assignments during the course.

1.5.1 How to set up an Rmarkdown document

Setting up an Rmarkdown document is easy and fast using RStudio. See the video below or follow the written instructions.

To set up a new Rmarkdown document, do the following:

  1. In RStudio go to File -> New File -> RMarkdown
  2. In the popup set your title, author name and date (these can all be changed later).
  3. Choose your default output. HTML or PDF is recommended (this can be changed later).
  4. Click “OK”.

Now your new document has been created. It contains some settings at the very top:

---
title: "My first project"
author: "Jeppe"
date: "`r Sys.Date()`"
output: html_document # Change this to pdf_document for pdf output
---

These settings can be changed at any time. For more information on advanced settings, see the guide on HTML outputs or the guide on PDF outputs.

Figure 1.1: A video on how to set up an Rmarkdown document using RStudio.

1.5.2 Rendering markdown documents

For a guide on how to render as html and pdf see the video below.

LaTeX distribution for pdf output

In order to create a PDF document from Rmarkdown, you need to have a LaTeX distribution installed on your system. TinyTeX is recommended for this, and it can easily be installed from within R using the tinytex package:

  1. Go to RStudio and run the following code to install the package used for installing TinyTeX:
install.packages("tinytex")
  2. Run the following code to install TinyTeX using the tinytex package:
tinytex::install_tinytex()
  3. You should now be able to create a pdf output.
Figure 1.2: A video on how to render Rmarkdown documents as HTML and PDF.
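
Besides the button in RStudio, a document can also be rendered from the R console with rmarkdown::render(). The following is a sketch under a few assumptions: it writes a minimal, made-up .Rmd file (with text, inline math and one code chunk) to a temporary folder, and it only renders if the rmarkdown package and pandoc are available on the machine.

```r
# Write a small .Rmd file to a temporary folder (file name is hypothetical)
fence <- strrep("`", 3)  # three backticks, used to open and close a code chunk
rmd_file <- file.path(tempdir(), "my_first_project.Rmd")
writeLines(c(
  "---",
  "title: \"My first project\"",
  "output: html_document",
  "---",
  "",
  "Some text with inline math: $y = a + b x$.",
  "",
  paste0(fence, "{r}"),
  "summary(c(1, 2, 3))",
  fence
), rmd_file)

# Render only if rmarkdown and pandoc are available on this machine
out_file <- NULL
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  out_file <- rmarkdown::render(rmd_file,
                                output_format = "html_document",
                                quiet = TRUE)
}
out_file  # path to the rendered HTML file, or NULL if nothing was rendered
```

Replacing "html_document" with "pdf_document" would produce a PDF instead, provided a LaTeX distribution such as TinyTeX is installed.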

1.5.3 Mixing text, code, and math

In Rmarkdown you can easily mix text, code, mathematical expressions and much more. For a guide on how to format text, code etc. see the video or the instructions below.

Instructions on how to format text, renderable code etc. can be found in multiple places:

  • As the standard text that is already in your .Rmd document when you first create it.
  • As the Markdown Quick Reference within RStudio.
    • Go to Help > Markdown Quick Reference
  • As the Rmarkdown Cheat Sheet.
    • Go to Help > Cheat Sheets > R Markdown Cheat Sheet
Figure 1.3: A video on how to format text, code, math, lists etc. within Rmarkdown documents.