The base package in R does not have a command to calculate confidence intervals for risk ratios (RRs) or odds ratios (ORs). However, there are supplemental packages that can be loaded into R to add additional analytical tools, including confidence intervals for the RR and OR. These tools are in the 'epitools' package.
You must first install the package on your computer (just once), but each time you want to use it in an active R session, you need to load it.
Type the following to install the epitools package (this only needs to be done once):
>install.packages('epitools')
You should see the following message as a response in red:
Installing package into 'C:/Users/healeym/Documents/R/win-library/3.3' (as 'lib' is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.3/epitools_0.5-7.zip'
Content type 'application/zip' length 228486 bytes (223 KB)
downloaded 223 KB
package 'epitools' successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\yourusername\AppData\Local\Temp\RtmpsLajiU\downloaded_packages
You only have to install the epitools package once, but you have to call it up each time you use it.
>library(epitools)
Warning message: package 'epitools' was built under R version 3.4.2
If you are given the counts in a contingency table, i.e., you do not have the raw data set, you can re-create the table in R and then compute the risk ratio and its 95% confidence limits using the riskratio.wald() function in the epitools package.
| | No CVD | CVD | Total |
| No HTN | 1017 | 165 | 1182 |
| HTN | 2260 | 992 | 3252 |
| Total | 3277 | 1157 | 4434 |
This is where the orientation of the contingency table is critical, i.e., with the unexposed (reference) group in the first row and the subjects without the outcome in the first column.
We create the contingency table in R using the matrix function and entering the data for the 1st column, then 2nd column. Note that we only enter the observed counts for each of the exposure-disease categories; we do not enter the totals in the margins. The solution in R is as follows:
R Code:
# The 1st line below creates the contingency table; the 2nd line prints the table so you can check the orientation
>RRtable<-matrix(c(1017,2260,165,992),nrow = 2, ncol = 2)
>RRtable
[,1] [,2]
[1,] 1017 165
[2,] 2260 992
# The next line asks R to compute the RR and 95% confidence interval
>riskratio.wald(RRtable)
$data
Outcome
Predictor Disease1 Disease2 Total
Exposed1 1017 165 1182
Exposed2 2260 992 3252
Total 3277 1157 4434
$measure
risk ratio with 95% C.I.
Predictor estimate lower upper
Exposed1 1.000000 NA NA
Exposed2 2.185217 1.879441 2.540742
$p.value
two-sided
Predictor midp.exact fisher.exact chi.square
Exposed1 NA NA NA
Exposed2 0 7.357611e-31 1.35953e-28
$correction
[1] FALSE
attr(,'method')
[1] 'Unconditional MLE & normal approximation (Wald) CI'
The risk ratio and 95% confidence interval are listed in the output under $measure.
Case-control studies use the odds ratio as the measure of association, and the procedure in R is very similar to the analysis above for the RR.
>ORtable<-matrix(c(1017,2260,165,992),nrow = 2, ncol = 2)
>ORtable
[,1] [,2]
[1,] 1017 165
[2,] 2260 992
>oddsratio.wald(ORtable)
$data
Outcome
Predictor Disease1 Disease2 Total
Exposed1 1017 165 1182
Exposed2 2260 992 3252
Total 3277 1157 4434
$measure
odds ratio with 95% C.I.
Predictor estimate lower upper
Exposed1 1.000000 NA NA
Exposed2 2.705455 2.258339 3.241093
$p.value
two-sided
Predictor midp.exact fisher.exact chi.square
Exposed1 NA NA NA
Exposed2 0 7.357611e-31 1.35953e-28
$correction
[1] FALSE
attr(,'method')
[1] 'Unconditional MLE & normal approximation (Wald) CI'
If you have a raw data set, computing risk ratios and odds ratios and their corresponding 95% confidence intervals is even easier, because the contingency table can be created using the table() command instead of the matrix function.
For example, if I have data from the Framingham Heart Study and I want to compute the risk ratio for the association between type 2 diabetes and risk of being hospitalized with a myocardial infarction, I first use the table() command.
> table(diabetes,hospmi)
hospmi
diabetes 0 1
0 2557 210
1 183 48
Then, to compute the risk ratio and confidence limits, I insert the table parameters into the riskratio.wald() function:
> riskratio.wald(table(diabetes,hospmi))
$data
hospmi
diabetes 0 1 Total
0 2557 210 2767
1 183 48 231
Total 2740 258 2998
$measure
risk ratio with 95% C.I.
diabetes estimate lower upper
0 1.00000 NA NA
1 2.73791 2.062282 3.63488
Using the same data, I can similarly compute an odds ratio and its confidence interval using the oddsratio.wald() function:
> oddsratio.wald(table(diabetes,hospmi))
$data
hospmi
diabetes 0 1 Total
0 2557 210 2767
1 183 48 231
Total 2740 258 2998
$measure
odds ratio with 95% C.I.
diabetes estimate lower upper
0 1.000000 NA NA
1 3.193755 2.256038 4.521233
Note that, since this is a cohort study, it makes sense to compute the risk ratio, but I also have the option of computing an odds ratio; in a case-control study, by contrast, only the odds ratio can be calculated. Notice also that in the example above the odds ratio is somewhat more extreme (farther from 1) than the risk ratio.
Test Yourself
Problem #1
A clinical trial was conducted to compare a new blood pressure-lowering medication to a placebo. Patients were enrolled and randomized to receive either the new medication or a placebo. The data below were collected at the end of the 6 week study.
| | Treatment (n=100) | Placebo (n=100) |
Systolic Blood Pressure, mean (sd) | 120.2 (15.4) | 131.4 (18.9) |
Hypertensive, % | 14% | 22% |
Side Effects, % | 6% | 8% |
Generate a point estimate and 95% confidence interval for the risk ratio of side effects in patients assigned to the experimental group as compared to placebo. Use both the hand calculation method and the method using R to see if you get the same answers. Interpret the results in a sentence or two.
Link to Answer in a Word file
Problem #2
The table below summarizes parental characteristics for children of normal weight and children classified as overweight or obese. Perform a chi-square test by hand to determine if there is an association between the mother's BMI and the child's weight status. Compute the p-value and report your conclusion.
Characteristics | Child - Normal Weight (n=62) | Child - Overweight/Obese (n=38) | Total (n=100) |
Mean (SD) Age, years | 13.4 (2.6) | 11.1 (2.9) | 12.5 (2.7) |
% Male | 45% | 51% | 47% |
Mother's BMI | |||
Normal (BMI<25) | 40 (65%) | 16 (41%) | 56 (56%) |
Overweight (BMI 25-29.9) | 15 (24%) | 14 (38%) | 29 (29%) |
Obese (BMI > 30) | 7 (11%) | 8 (21%) | 15 (15%) |
Father's BMI | |||
Normal (BMI<25) | 34 (55%) | 16 (41%) | 50 (50%) |
Overweight (BMI 25-29.9) | 20 (32%) | 14 (38%) | 34 (34%) |
Obese (BMI > 30) | 8 (13%) | 8 (21%) | 16 (16%) |
Mean (SD) Systolic BP | 123 (15) | 139 (12) | 129 (14) |
Mean (SD) Total Cholesterol | 186 (25) | 211 (28) | 196 (26) |
Here we look at some examples of calculating confidence intervals. The examples are for both normal and t distributions. We assume that you can enter data and know the commands associated with basic probability. Note that an easier way to calculate confidence intervals using the t.test command is discussed in the section The Easy Way.
Here we will look at a fictitious example. We will make some assumptions for what we might find in an experiment and find the resulting confidence interval using a normal distribution. Here we assume that the sample mean is 5, the standard deviation is 2, and the sample size is 20. In the example below we will use a 95% confidence level and wish to find the confidence interval. The commands to find the confidence interval in R are the following:
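One way to carry out these steps (the variable names a, s, and n are simply illustrative choices) is:

> a <- 5                            # sample mean
> s <- 2                            # standard deviation
> n <- 20                           # sample size
> error <- qnorm(0.975)*s/sqrt(n)   # half-width of the 95% interval
> left <- a - error
> right <- a + error
> left
[1] 4.123477
> right
[1] 5.876523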
We are 95% confident that the true mean lies within the interval between 4.12 and 5.88, assuming that the original random variable is normally distributed and the samples are independent.
Calculating the confidence interval when using a t-test is similar to using a normal distribution. The only difference is that we use the command associated with the t-distribution rather than the normal distribution. Here we repeat the procedures above, but we will assume that we are working with a sample standard deviation rather than an exact standard deviation.
Again we assume that the sample mean is 5, the sample standard deviation is 2, and the sample size is 20. We use a 95% confidence level and wish to find the confidence interval. The commands to find the confidence interval in R are the following:
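A sketch of those commands, reusing the same illustrative variable names and swapping in the t quantile with n - 1 degrees of freedom:

> a <- 5                                     # sample mean
> s <- 2                                     # sample standard deviation
> n <- 20                                    # sample size
> error <- qt(0.975, df = n - 1)*s/sqrt(n)   # half-width using the t distribution
> left <- a - error
> right <- a + error
> left
[1] 4.063971
> right
[1] 5.936029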
We are 95% confident that the true mean lies within the interval between 4.06 and 5.94, assuming that the original random variable is normally distributed and the samples are independent.
We now look at an example where we have a univariate data set and want to find the 95% confidence interval for the mean. In this example we use one of the data sets given in the data input chapter, the w1.dat data set. We first calculate the error for the mean, and the confidence interval is then found by adding and subtracting this error from the mean:
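A sketch of these steps, assuming w1.dat is a comma-separated file whose single column is named vals (as in the data input chapter):

> w1 <- read.csv(file = "w1.dat", sep = ",", header = TRUE)   # load the data set
> error <- qt(0.975, df = length(w1$vals) - 1)*sd(w1$vals)/sqrt(length(w1$vals))
> left <- mean(w1$vals) - error    # lower confidence limit
> right <- mean(w1$vals) + error   # upper confidence limit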
We are 95% confident that the true mean lies within the interval between 0.66 and 0.87, assuming that the original random variable is normally distributed and the samples are independent.
Suppose that you want to find the confidence intervals for manytests. This is a common task and most software packages will allow youto do this.
We have three different sets of results:
Comparison 1
| | Mean | Std. Dev. | Number (pop.) |
| Group I | 10 | 3 | 300 |
| Group II | 10.5 | 2.5 | 230 |

Comparison 2
| | Mean | Std. Dev. | Number (pop.) |
| Group I | 12 | 4 | 210 |
| Group II | 13 | 5.3 | 340 |

Comparison 3
| | Mean | Std. Dev. | Number (pop.) |
| Group I | 30 | 4.5 | 420 |
| Group II | 28.5 | 3 | 400 |
For each of these comparisons we want to calculate the associated confidence interval for the difference of the means. For each comparison there are two groups. We will refer to group one as the group whose results are in the first row of each comparison above. We will refer to group two as the group whose results are in the second row of each comparison above. Before we can do that we must first compute a standard error and a t-score. We will find general formulae, which is necessary in order to do all three calculations at once.
We assume that the means for the first group are defined in a variable called m1. The means for the second group are defined in a variable called m2. The standard deviations for the first group are in a variable called sd1. The standard deviations for the second group are in a variable called sd2. The number of samples for the first group are in a variable called num1. Finally, the number of samples for the second group are in a variable called num2.
With these definitions the standard error is the square root of (sd1^2)/num1 + (sd2^2)/num2. The R commands to do this can be found below:
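The commands below enter the three comparisons as vectors (using the variable names just defined) and compute the standard error for each comparison:

> m1 <- c(10, 12, 30)        # means for group one
> m2 <- c(10.5, 13, 28.5)    # means for group two
> sd1 <- c(3, 4, 4.5)        # standard deviations for group one
> sd2 <- c(2.5, 5.3, 3)      # standard deviations for group two
> num1 <- c(300, 210, 420)   # sample sizes for group one
> num2 <- c(230, 340, 400)   # sample sizes for group two
> se <- sqrt(sd1*sd1/num1 + sd2*sd2/num2)   # standard error of the difference in means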
To see the values just type in the variable name on a line alone:
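For example, typing se prints the three standard errors (approximately 0.239, 0.399, and 0.266):

> se
[1] 0.2391107 0.3985074 0.2659216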
Now we need to define the confidence interval around the assumed differences. Just as in the case of finding the p values in the previous chapter, we have to use the pmin command to get the number of degrees of freedom. In this case the null hypotheses are for a difference of zero, and we use a 95% confidence interval:
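The corresponding commands, continuing with the vectors defined above; typing left and right afterwards displays the limits for the three comparisons:

> error <- qt(0.975, df = pmin(num1, num2) - 1)*se   # t quantile times the standard error
> left <- (m1 - m2) - error    # lower limits of the three confidence intervals
> right <- (m1 - m2) + error   # upper limits of the three confidence intervals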
This gives the confidence intervals for each of the three tests. For example, in the first experiment the 95% confidence interval is between -0.97 and -0.03, assuming that the random variables are normally distributed and the samples are independent.