Monday, 7 July 2014

Bar plots and histograms in R

Last week I had to analyse some data and present some statistics about it. To make the presentation more appealing, I needed to plot the statistical distributions as either bar plots or histograms. I know Microsoft Excel can do the job, but I wanted to do it with the R statistical package. This is how it went:

Data entry

The first step is to have the data that will be used to generate the plots. In R, data can be entered manually or imported. For very small data sets, I preferred to enter the data manually, but it is often the case that we have to generate plots for large data sets, where importing is more practical. In both cases, the data has to be captured in a specific data structure such as a vector (the simplest case) or a matrix.
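As a rough illustration of the two routes, here is a minimal sketch. The values, the variable names and the CSV file with its "score" column are all made up for the example; the file is written to a temporary location first so that the snippet runs on its own.

```r
# Manual entry: a small data set typed straight into a vector
# (these values are invented for illustration)
scores <- c(12, 47, 55, 63, 71)

# The same values arranged as a one-column matrix
score_matrix <- matrix(scores, ncol = 1)

# Import: write a tiny example file first so the sketch is self-contained,
# then read it back. The file and its "score" column are hypothetical.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(score = scores), csv_path, row.names = FALSE)
imported <- read.csv(csv_path)

# Pull the column out as a plain vector, ready for plotting
imported_scores <- imported$score
```

With a real data set you would skip the write.csv step and point read.csv at your own file.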

My first attempt was to plot a histogram, but this turned out to be very confusing, whereas bar plots were relatively easy to implement.

Bar Plots

Manual Process

Consider some frequencies for 10 interval ranges between 0 and 100%:
0-9:  2
10-19:  4
20-29:  8
30-39:  18
40-49:  30
50-59:  50
60-69:  40
70-79:  28
80-89:  15
90-100:  5

A bar plot is suitable in this case. The idea is to plot the ranges horizontally and the frequencies vertically. To do this, put the frequencies in a vector (call it freqs). This is how it goes in R:

R-prompt> freqs <- c(2, 4, 8, 18, 30, 50, 40, 28, 15, 5)

Then you specify the ranges for the horizontal axis:

R-prompt> names(freqs) <- c("0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90-100")

And finally do the plot:

R-prompt> barplot(freqs, xlab="range", ylab="frequency")
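For reference, the three steps can be collected into one script. This is only a sketch: saving to a PDF in the temporary directory, the file name and the cex.names setting (which shrinks the bar labels so they fit) are my own additions, not something the plot requires.

```r
# Frequencies for the ten ranges listed above
freqs <- c(2, 4, 8, 18, 30, 50, 40, 28, 15, 5)

# Label each frequency with its range; these become the bar labels
names(freqs) <- c("0-9", "10-19", "20-29", "30-39", "40-49",
                  "50-59", "60-69", "70-79", "80-89", "90-100")

# Assumption: write the plot to a PDF instead of the screen
# (the output location and file name are arbitrary)
out_file <- file.path(tempdir(), "freqs_barplot.pdf")
pdf(out_file)

# barplot() returns the x positions of the bar midpoints
mids <- barplot(freqs, xlab = "range", ylab = "frequency", cex.names = 0.8)

# Close the device so the file is actually written
dev.off()
```

At an interactive prompt you can drop the pdf() and dev.off() lines and the plot will appear in a window instead.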


