# Importing Data Files into R

Preparing and managing data is one of the most time-consuming task for data scientists. Importing data into a data science software is the first step for the downstream analysis, which seems trivial but at times frustrating and time consuming to figure out the right way. Data can be directly entered into a system (often feasible only for small data), or imported into analytical and visualization software from different data file formats. In this post, I put together a list of R packages/functions (see Table below) with illustrations that, I hope, will ease the burden of locating the right package/function. Note that the table neither list all data file formats nor all available packages. For a comprehensive R import/export discussion you can read R Import/Export documentation.

R packages / functions to read/import data files

Before running the data import function, make sure that the required packages are installed. If the package is not base package or not installed previously, the first task is installing and loading the package. I wrote the following R function to install R packages along with any package that the given package depends on.

### Install R package¶

In [ ]:
#  R function (r.pkg.install) to install multiple packages at once.

r_pkg_install <- function(r_pkgs)
for (i in 1:length(r_pkgs)){
# "dependencies" option is TRUE to install all packages that a given package depends on.
install.packages(paste(r_pkgs[i]), lib="directory you want to install package",
repos = "https://cran.cnr.berkeley.edu/", dependencies = TRUE)
}

# List all R packages you need to install into your computer ;

r_pkg_install(r_pkgs=c("xlsx", "XML", "sas7bdat", "foreign", "R.matlab"))

In [4]:
# Locating the data file directory.
# You can specify the full directory in the R function or, or change the working directory so that you don't need to specify every time you run your R function.

setwd("C:/Users/TM/MWWW/dataset1/") # Replace by your working directory

# Check working directory
getwd()

"C:/Users/TM/MWWW/dataset1"
In [ ]:
# Note that each function has several arguments and can be checked using help() or ?. For example,


### Import tab delimeted text file¶

In [5]:
# using read.table ;


Sepal_LSepal_WPetal_LPetal_WSpecies
15.1 3.5 1.4 0.2 setosa
24.9 3 1.4 0.2 setosa
34.7 3.2 1.3 0.2 setosa
44.6 3.1 1.5 0.2 setosa
55 3.6 1.4 0.2 setosa
65.4 3.9 1.7 0.4 setosa

Caution should be taken in reading text files with an empty (" ") missing data value. For example, for missing data values within a tab delimited text file, you can specify sep="\t" option to read in the data properly.

In [ ]:
tab_data <- read.table("iris.txt", sep="\t", header = TRUE)


Some data files coded missing values differently, for example "?". We can read such data using the na.string option. This will automatically replace the special missing values by NA. For example, if you are reading a data with missing value recorded as "?", then use the following code

In [ ]:
tab_data_missing <- read.table("iris_missing.txt", na.strings=c("?"), header = TRUE)


On the other hand, if you want to replace NA by a different value, for example, replacing all NA by 0, you can use the following code

In [ ]:
tab_data[is.na(tab_data)] <- 0


### Import comma separated value (CSV) file¶

In [ ]:
csv_data <- read.csv("iris.csv" )


### Import excel file¶

In [ ]:
library(rJava)
library(xlsxjars)
library(xlsx)

xlsx_data <- read.xlsx("iris.xlsx", sheetName = "iris" )


### Import SPSS file¶

In [ ]:
library(foreign)

spss_data <- read.spss("iris.sav", to.data.frame = TRUE )


### Import STATA file¶

In [ ]:
library(sas7bdat)



### Import SAS file¶

In [ ]:
library(sas7bdat)



### Import SAS export file¶

In [ ]:
library(foreign)