Using Plotly in R for Panel Data Visualization

Gifa Delyani Nursyafitri
5 min read · Apr 12, 2019

Holaaa, readers!

Today I want to share how to use plotly in R for panel data visualization. Stay tuned :)

Source : https://plot.ly/products/dash/

Before we talk about how to use plotly in R, let me tell you about panel data.

What is Panel Data?

Panel data is data that combines two data structures: time series and cross section.

A time series is a group of observations on a single entity over time. For example: the daily rainfall in Indonesia over 10 years.

A cross section is a group of observations on multiple entities at a single point in time. For example: the population of each Indonesian province in 2018.

We can call our data panel data if it is organized along both dimensions. For example: the population of each province from 2010 to 2018.
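To make the three structures concrete, here is a minimal sketch in base R. The province names and population numbers are made up purely for illustration; they do not come from any real dataset.

```r
# Hypothetical toy values, for illustration only
provinces <- c("Province A", "Province B", "Province C")
years <- 2010:2012

# Panel data: one row for every province-year combination
# (expand.grid varies the first argument fastest, so years cycle within a province)
panel <- expand.grid(year = years, province = provinces, stringsAsFactors = FALSE)
panel$population <- c(100, 102, 104, 200, 203, 206, 50, 51, 52)

# Cross section: many entities at a single point in time
cross_section <- panel[panel$year == 2012, ]

# Time series: a single entity over time
time_series <- panel[panel$province == "Province A", ]

print(panel)
```

The panel has 3 x 3 = 9 rows, while the cross section and the time series are each just a 3-row slice of it.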

The data I use is the Gapminder dataset. Just click the link below to get the data!

Dataset Gapminder

First of all, read the data from sheet 1, sheet 2, sheet 3, and sheet 4. The file format is .xlsx, so we must install the “openxlsx” package before we can use the script below to read the XLSX data into R.

# install.packages("openxlsx")  # run once if the package is not installed yet
library(openxlsx)

# Sheet 1: GDP data
gapminder <- read.xlsx("E:\\file kuliah\\smt 6\\Data Visualization\\uk2\\gapminder.xlsx", sheet = 1, startRow = 1, colNames = TRUE)
View(gapminder)

# Sheet 2: population data
gapminder1 <- read.xlsx("E:\\file kuliah\\smt 6\\Data Visualization\\uk2\\gapminder.xlsx", sheet = 2, startRow = 1, colNames = TRUE)
View(gapminder1)

# Sheet 3: life expectancy data
gapminder2 <- read.xlsx("E:\\file kuliah\\smt 6\\Data Visualization\\uk2\\gapminder.xlsx", sheet = 3, startRow = 1, colNames = TRUE)
View(gapminder2)

# Sheet 4: region data
gapminder3 <- read.xlsx("E:\\file kuliah\\smt 6\\Data Visualization\\uk2\\gapminder.xlsx", sheet = 4, startRow = 1, colNames = TRUE)

Then create a vector that contains only one column, namely the country, and repeat each country 47 times, once for every year from 1970 to 2016. Next, repeat the sequence of years (1970 to 2016) 170 times, because there are 170 countries. Finally, flatten the GDP values row by row so they line up with the country and year vectors.

# Take the country variable (first column)
country.vec <- gapminder[, 1]
country.vec

# Build the panel variable by repeating each country 47 times
country_panel <- c()
for (i in 1:170)
{
  x = rep(country.vec[i], 47)
  country_panel <- append(country_panel, x)
}
View(country_panel)

# Repeat the years 1970 to 2016 once per country
years_panel <- rep(1970:2016, 170)
years_panel

# Flatten the GDP sheet row by row, dropping the first three columns
gdp_panel <- c()
for (i in 1:170)
{
  x = gapminder[i,]
  x = x[-c(1:3)]
  x = t(x)
  gdp_panel <- append(gdp_panel, x)
}
gdp_panel
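The loops above can also be written without a loop. The sketch below is only an illustration on a tiny made-up table (3 countries, 4 years, and two placeholder columns named `code` and `note` that I invented to stand in for the three leading columns); it is not the real Gapminder sheet. It shows that `rep(..., each = )` and `as.vector(t(...))` give the same result as the loop.

```r
# Toy wide table with hypothetical values: 3 leading columns, then one column per year
wide <- data.frame(
  country = c("A", "B", "C"),
  code    = c("a", "b", "c"),   # placeholder column (assumption)
  note    = c("x", "y", "z"),   # placeholder column (assumption)
  `1970`  = c(1, 5, 9),
  `1971`  = c(2, 6, 10),
  `1972`  = c(3, 7, 11),
  `1973`  = c(4, 8, 12),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
years <- 1970:1973
n_years <- length(years)

# Loop version, in the style used in this article
country_loop <- c()
value_loop <- c()
for (i in seq_len(nrow(wide))) {
  country_loop <- append(country_loop, rep(wide$country[i], n_years))
  value_loop <- append(value_loop, t(wide[i, -c(1:3)]))
}

# Vectorised version
country_vec <- rep(wide$country, each = n_years)    # A A A A B B B B ...
years_vec   <- rep(years, times = nrow(wide))       # 1970..1973, 1970..1973, ...
value_vec   <- as.vector(t(wide[, -c(1:3)]))        # row-by-row flattening

# Both approaches agree
stopifnot(identical(country_loop, country_vec))
stopifnot(isTRUE(all.equal(value_loop, value_vec)))
```

On the real data the same idea would use `each = 47` and `times = 170`, assuming the sheet layout matches the one described here.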

Do the same for the population variable in sheet 2 and the life expectancy variable in sheet 3.

# Flatten the population data (sheet 2)
pop_panel <- c()
for (i in 1:170)
{
  x = gapminder1[i,]
  x = x[-c(1:3)]
  x = t(x)
  pop_panel <- append(pop_panel, x)
}
pop_panel

# Flatten the life expectancy data (sheet 3)
life_panel <- c()
for (i in 1:170) {
  x = gapminder2[i,]
  x = x[-c(1:3)]
  x = t(x)
  life_panel <- append(life_panel, x)
}
life_panel
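Since the same loop is repeated for three sheets, it could be wrapped in a small helper. This is just a sketch: `flatten_rows()` and `make_sheet()` are hypothetical names I made up, and the toy sheets below only imitate the layout (three leading columns, then year columns) with invented numbers.

```r
# Hypothetical helper: drop the first `n_skip` columns, then flatten row by row
flatten_rows <- function(sheet, n_skip = 3) {
  values <- as.matrix(sheet[, -seq_len(n_skip), drop = FALSE])
  as.vector(t(values))
}

# Toy sheets standing in for gapminder, gapminder1 and gapminder2 (made-up values)
make_sheet <- function(offset) {
  data.frame(
    country = c("A", "B"),
    code    = c("a", "b"),   # placeholder column (assumption)
    note    = c("x", "y"),   # placeholder column (assumption)
    `1970`  = offset + c(1, 3),
    `1971`  = offset + c(2, 4),
    check.names = FALSE,
    stringsAsFactors = FALSE
  )
}
sheets <- list(gdp = make_sheet(0), pop = make_sheet(10), life = make_sheet(20))

# One call per sheet instead of one copied loop per sheet
panels <- lapply(sheets, flatten_rows)
str(panels)
```

With the real sheets this would look like `gdp_panel <- flatten_rows(gapminder)`, and likewise for `gapminder1` and `gapminder2`.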

Next, create a vector that contains only one column (column 6 of sheet 4), namely the region. Then repeat each region 47 times, once for every year from 1970 to 2016.

region.vec <- gapminder3 [,6]
region.vec
region_panel <- c()
for (i in 1:170) {
x = rep(region.vec[i], 47)
region_panel <- append(region_panel, x)
}
region_panel

After getting the six vectors, namely country_panel, years_panel, gdp_panel, pop_panel, life_panel, and region_panel, combine them into a data frame.

gapminder_frame <- data.frame(region_panel,country_panel, years_panel, gdp_panel, pop_panel, life_panel)
View(gapminder_frame)
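Before plotting, it can help to check that the vectors really line up. With 170 countries and 47 years, the frame should have 170 x 47 = 7990 rows, one per country-year pair. Here is a minimal sketch of such checks on a toy frame with made-up values; the column names mirror gapminder_frame, everything else is an assumption.

```r
n_countries <- 3   # the article uses 170
n_years <- 4       # the article uses 47 (1970 to 2016)

# Toy frame built the same way as gapminder_frame (hypothetical values)
toy_frame <- data.frame(
  country_panel = rep(c("A", "B", "C"), each = n_years),
  years_panel   = rep(1970:1973, times = n_countries),
  gdp_panel     = seq_len(n_countries * n_years),
  stringsAsFactors = FALSE
)

# One row per country-year pair
stopifnot(nrow(toy_frame) == n_countries * n_years)

# Every country appears exactly once per year
stopifnot(all(table(toy_frame$country_panel) == n_years))

# No country-year pair is duplicated
stopifnot(!any(duplicated(toy_frame[, c("country_panel", "years_panel")])))

str(toy_frame)
```

If one of the checks fails on the real data, the most likely cause is that a sheet has a different number of rows or year columns than expected.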

Below is a snapshot of what the data looks like once it is in the data frame.

Next, visualize gapminder_frame with ggplot and name the plot gap1. The x axis is the log of gdp_panel and the y axis is life_panel.

library(ggplot2)  # builds the plot
library(plotly)   # provides ggplotly() for the animated versions below
gap1 <- ggplot(gapminder_frame, aes(x = log(gdp_panel), y = life_panel)) + geom_point()
gap1

The plot above does not tell us much, because the points for all years are drawn on top of each other. So we need to add a layer based on the year, and we name the result gap2.

gap2 <- ggplot(gapminder_frame, aes(x = log(gdp_panel), y = life_panel)) + geom_point(aes(frame = years_panel))
ggplotly(gap2)
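The frame aesthetic is what turns the static plot into an animation: plotly draws one frame per distinct value of years_panel. As a side note, a similar animation can be sketched with plotly's own plot_ly() function instead of going through ggplot. The toy data below is made up; only the column names are borrowed from gapminder_frame.

```r
# Toy data with made-up values; column names mirror gapminder_frame
toy <- data.frame(
  country_panel = rep(c("A", "B"), each = 3),
  years_panel   = rep(2014:2016, times = 2),
  gdp_panel     = c(1000, 1200, 1500, 20000, 21000, 23000),
  life_panel    = c(60, 61, 62, 78, 79, 80),
  stringsAsFactors = FALSE
)

# Only build the plot if plotly is installed
if (requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::plot_ly(
    toy,
    x = ~log(gdp_panel),
    y = ~life_panel,
    frame = ~years_panel,   # one animation frame per year
    type = "scatter",
    mode = "markers"
  )
  # Show each frame for 1000 milliseconds
  p <- plotly::animation_opts(p, frame = 1000)
  # Type `p` in an interactive session to display the animation
}
```

Whether to use ggplotly() or plot_ly() is mostly a matter of taste; this article sticks with ggplotly().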

The gap2 animation still does not give detailed information, because all points are black and cannot be told apart by country. To distinguish the points by country, we add “color = country_panel” and name the result gap3.

gap3 <- ggplot(gapminder_frame, aes(x = log(gdp_panel), y = life_panel, color = country_panel)) + geom_point(aes(frame = years_panel))
ggplotly(gap3)

For the final result, the visualization must use the following mappings:

x axis = log of gdp_panel
y axis = life_panel
Color = country_panel
Size = pop_panel
Shape = region_panel (by continent)

gap4 <- ggplot(gapminder_frame, aes(x = log(gdp_panel), y = life_panel, color = country_panel)) + geom_point(aes(shape = region_panel,size = pop_panel,frame = years_panel))
ggplotly(gap4)
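With 170 countries mapped to color, the legend gets 170 entries and can crowd the plot. One possible tweak, shown here only as a sketch on made-up toy data, is to hide the legend in the ggplot theme before converting the plot and rely on the hover tooltips instead. The region names and all numbers below are assumptions for illustration.

```r
# Toy data with made-up values; column names mirror gapminder_frame
toy <- data.frame(
  region_panel  = rep(c("Asia", "Europe"), each = 3),
  country_panel = rep(c("A", "B"), each = 3),
  years_panel   = rep(2014:2016, times = 2),
  gdp_panel     = c(1000, 1200, 1500, 20000, 21000, 23000),
  pop_panel     = c(50, 52, 54, 10, 10, 11),
  life_panel    = c(60, 61, 62, 78, 79, 80),
  stringsAsFactors = FALSE
)

# Only build the plot if both packages are installed
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("plotly", quietly = TRUE)) {
  library(ggplot2)
  # ggplot2 warns about the unknown `frame` aesthetic; ggplotly() still uses it
  gap_toy <- ggplot(toy, aes(x = log(gdp_panel), y = life_panel, color = country_panel)) +
    geom_point(aes(shape = region_panel, size = pop_panel, frame = years_panel)) +
    labs(x = "log(GDP)", y = "Life expectancy") +
    theme(legend.position = "none")   # hide the crowded legend
  anim <- plotly::ggplotly(gap_toy)
  # Type `anim` in an interactive session to display the animation
}
```

The mappings are the same as in gap4; only the axis labels and the hidden legend are new.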

Yeay! This is what the gap4 animation looks like. So beautiful, right?

Now you can enjoy your well deserved GIF animation!

See you on another topic!
