Abstract
PURPOSE Researchers who conduct cluster-randomized studies must account for clustering during study planning; failure to do so can result in insufficient study power. To plan adequately, investigators need accurate estimates of clustering in the form of intraclass correlation coefficients (ICCs).
METHODS We used data for 5,042 patients, from 61 practices in 8 practice-based research networks, obtained from the Prescription for Health program, sponsored by the Robert Wood Johnson Foundation, to estimate ICCs for demographic and behavioral variables and for physician and practice characteristics. We used an approach similar to analysis of variance to calculate ICCs for binary variables and mixed models that directly estimated between- and within-cluster variances to calculate ICCs for continuous variables.
RESULTS ICCs indicating substantial within-practice clustering were calculated for age (ICC = 0.151), race (ICC = 0.265), and such behaviors as smoking (ICC = 0.118) and unhealthy diet (ICC = 0.206). Patients’ intent-to-change behaviors related to smoking, diet, or exercise were less clustered (ICCs ≤0.007). Within-network ICCs were generally smaller, reflecting heterogeneity among practices within the same network. ICCs for practice-level measures indicated that practices within networks were relatively homogenous with respect to practice type (ICC = 0.29) and the use of electronic medical records (ICC = 0.23), but less homogenous with respect to size and rates of physician and staff turnover.
CONCLUSION ICCs for patient behaviors and intent to change those behaviors were generally less than 0.1. Though small, such ICCs are not trivial; if cluster sizes are large, even a small level of clustering, if unaccounted for, reduces the statistical power of a cluster-randomized study.
 intraclass correlation coefficient
 primary care
 cluster-randomized trial
 practice-based research network
 estimation techniques
INTRODUCTION
Research conducted in practice-based research networks (PBRNs) often randomizes interventions, not by individuals, but according to a natural clustering unit, such as the physician or the practice in which patients receive care. Patients who receive care from the same physician or at the same practice are likely to resemble one another more than they do patients who see other physicians or attend other practices. These shared within-cluster characteristics may relate to patients inhabiting the same geographic area or community, or to commonalities in physician or practice styles. In studies involving multiple networks, clustering may also occur at the network level because of differences in geography or member selection. For example, some networks recruit mostly small rural practices, whereas others primarily include inner-city community health centers.
Researchers who conduct cluster-randomized studies must explicitly account for clustering at every stage of design and analysis. Failing to account for clustering, and for the associations that naturally exist among patients within clinics, leads investigators to underestimate variability in outcome measures. Specifically, it underestimates the standard errors for between-subject effects, for example, effects related to cluster-randomized treatments. As a consequence, confidence intervals that estimate treatment effects will be too narrow and tests of hypotheses concerning treatment effects will be vulnerable to type 1 errors, in which investigators declare differences to exist when they actually do not.^{1}
The need to account for clustering begins during study planning, before data are collected or analyzed. Underestimating within-cluster variability in the study’s primary outcome measures during design and planning will, in turn, underestimate the number of subjects necessary to detect a hypothesized treatment difference.^{2} Failure to account for clustering can lead to studies that are underpowered.
To plan studies that have appropriate power, investigators need good estimates of clustering effects, typically in the form of intraclass correlation coefficients (ICCs). This coefficient, a parameter customarily signified as ρ, is defined as the proportion of a measure’s total variance (σ^{2}_{y}) that is shared among members of defined clusters. Recognizing that an outcome’s total variance (σ^{2}_{y}) is the sum of the between-cluster variance (σ^{2}_{c}) and the within-cluster variance (σ^{2}_{w}), then,
ρ = σ^{2}_{c} / σ^{2}_{y} = σ^{2}_{c} / (σ^{2}_{c} + σ^{2}_{w}).
If patients who attend the same clinic are relatively homogenous with respect to a measure, the within-cluster variance (σ^{2}_{w}) will be relatively small, and the between-cluster variability (σ^{2}_{c}) and ICC will be relatively large. When the between-cluster variability is large, it is difficult to attribute between-cluster differences to a treatment that is randomly assigned by cluster. As a result, studies that fail to account for this kind of clustering during their planning stage may be unable to detect treatment effects when they are executed.
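As a minimal sketch of this definition, the following Python function computes ρ from the two variance components. The function name and the variance values are hypothetical and serve only to show that, for a fixed between-cluster variance, a smaller within-cluster variance yields a larger ICC.

```python
def icc_from_variances(var_between: float, var_within: float) -> float:
    """Return rho = var_between / (var_between + var_within)."""
    total = var_between + var_within
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_between / total


# Hypothetical variance components, chosen only for illustration.
# Same between-cluster variance, different within-cluster variance.
homogeneous_clinics = icc_from_variances(var_between=2.0, var_within=8.0)
heterogeneous_clinics = icc_from_variances(var_between=2.0, var_within=38.0)
print(homogeneous_clinics, heterogeneous_clinics)
```
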
Although investigators are most often interested in ICC estimates that quantify clustering in a study’s outcome variable, ICCs are estimable for any variable measured in a sample. A population estimate for any variable’s ICC is obtained using variance estimates for σ^{2}_{c} and σ^{2}_{w} that are derived from the sample. As are all population estimates, the ICCs are subject to some uncertainty, which is quantified in a confidence interval. Because an ICC’s estimate involves a nonlinear combination of variances, the estimate’s standard error and confidence interval involve calculations that are not straightforward. Obtaining confidence intervals for the ICC by bootstrapping^{3} avoids this computational obstacle.
To plan cluster-randomized studies, investigators use the well-known variance inflation factor (VIF), generally expressed as VIF = 1 + ρ(m−1), which requires estimates of the ICC (ρ) and of the study’s mean cluster size (m). This formula for the VIF is based on a ratio that compares an outcome’s variance in a study with independent clusters whose average size is m, with the outcome’s variance calculated in a manner that ignores clustering and, instead, treats each patient as an independent cluster of size m=1.^{4} The Supplemental Appendix outlines the logic that underlies the formula, and is available at http://annfammed.org/content/10/3/235/suppl/DC1.
Also called the design effect, the VIF quantifies the effect that clustering among observations has on the variance of an outcome under study. Investigators use the VIF to produce both sample size calculations and hypothesis tests that are appropriately adjusted for the effect of clustering on an outcome’s variance. Calculating a sample size that produces adequate power under the assumption that treatments are randomized at the level of the individual, but then multiplying that sample size by the VIF, ensures that a cluster-randomized design is of equal statistical power.^{5}^{(pp112–113)} Similarly, calculating χ^{2} or t statistics to test hypotheses, while treating observations as unclustered, but then dividing these statistics by the VIF or the square root of the VIF, respectively, produces appropriate cluster-adjusted tests.^{6}^{(p333)}
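The VIF formula can be sketched in a few lines of Python. The ICC values below are the within-practice estimates reported in the abstract; the mean cluster size of 50 is a hypothetical planning value, not a figure from this study.

```python
def variance_inflation_factor(icc: float, mean_cluster_size: float) -> float:
    """Return VIF = 1 + rho * (m - 1)."""
    if mean_cluster_size < 1:
        raise ValueError("mean cluster size must be at least 1")
    return 1.0 + icc * (mean_cluster_size - 1.0)


# Hypothetical mean cluster size (an assumption for illustration).
m = 50
vif_smoking = variance_inflation_factor(0.118, m)  # ICC reported for smoking
vif_intent = variance_inflation_factor(0.007, m)   # upper value for intent to change
print(round(vif_smoking, 3), round(vif_intent, 3))
```

Even the smallest of these ICCs inflates the variance noticeably once clusters hold dozens of patients, and with m = 1 the factor reduces to 1, the unclustered case.
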
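The test-statistic adjustment described above might be sketched as follows. The function names, the unadjusted statistics, and the VIF of 1.95 are all hypothetical; only the division by the VIF (for χ^{2}) or by its square root (for t) comes from the text.

```python
import math


def cluster_adjusted_chi2(chi2: float, vif: float) -> float:
    """Divide an unadjusted chi-square statistic by the VIF."""
    return chi2 / vif


def cluster_adjusted_t(t: float, vif: float) -> float:
    """Divide an unadjusted t statistic by the square root of the VIF."""
    return t / math.sqrt(vif)


# Hypothetical values, chosen only for illustration.
vif = 1.95
CHI2_CRITICAL_1DF = 3.841  # 5% critical value, chi-square with 1 degree of freedom

unadjusted_chi2 = 6.0
adjusted_chi2 = cluster_adjusted_chi2(unadjusted_chi2, vif)
adjusted_t = cluster_adjusted_t(2.8, vif)

print(unadjusted_chi2 > CHI2_CRITICAL_1DF, adjusted_chi2 > CHI2_CRITICAL_1DF)
print(adjusted_t)
```

In this illustration the unadjusted χ^{2} exceeds the 5% critical value but the adjusted statistic does not, which is the type 1 error risk that ignoring clustering creates.
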
METHODS
From 2003 to 2007, the Robert Wood Johnson Foundation (RWJF) funded 2 rounds of practice-based research network (PBRN) research on methods that might be used in primary care settings to identify and address 4 unhealthy behaviors: unhealthy eating, lack of physical activity, tobacco use, and alcohol overuse and abuse. Ten networks participated in the second round of the RWJF-sponsored Prescription for Health program and its Common Measures Better Outcomes (COMBO) study.^{7–9}
One of the 10 networks enrolled only families with small children and another network enrolled only adolescents. Using data from the other 8 PBRNs, we calculated intraclass correlation coefficients (ICCs) for each of a list of patient-level behavioral and demographic variables and for certain physician and practice characteristics (Table 1). Table 1 organizes these variables and characteristics among 3 levels of the hierarchy within which observations were clustered: (1) patients within practices, (2) patients within PBRNs, and (3) practices within networks.
Patients Within Practices and Within Networks
The 8 PBRNs reported data on 5,042 patients who were aged at least 18 years and who received care in 61 practices. Networks that reported patient-level data included between 3 and 13 practices, and the practices enrolled between 1 and 364 patients. Although the 8 networks’ projects differed in design, all collected practice-level data using the same practice information form and patient-level data on the same set of common measures.^{8}
Practices Within Networks
While 61 practices contributed both patient-level and practice-level information, an additional 28 practices enrolled in the studies but contributed only practice-level data. Using data from these 89 practices, which numbered from 6 to 26 practices per network, we calculated ICCs on practice-level variables that included the number of full-time equivalent physicians and staff, physician and staff turnover, and use of electronic medical records (Table 1).
Calculation of ICCs
The ICC is conventionally calculated using 2 quantities obtained from an analysis of variance.^{10} One quantity is a mean square that estimates between-cluster variability (MSC), that portion of an outcome’s variability that patients share because they are nested within clinics. The other quantity is a mean square that estimates within-subject variability (MSE) that is unique to (but assumed to be equal among) each subject regardless of cluster membership. These quantities are inserted into formulae established by Shrout and Fleiss,^{10} the relevant one for this study being
ICC = (MSC − MSE) / (MSC + (m − 1)MSE).
Because the size of clusters typically varies in a cluster-randomized study, the formula for the ICC also requires combining each cluster’s size (m_{k}) to calculate an overall weighted mean cluster size (m).^{6}^{(equation 8)} This calculation of m was also necessary to calculate VIFs.
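A minimal sketch of an analysis-of-variance ICC for clusters of unequal size follows. It assumes the one-way form ICC = (MSC − MSE) / (MSC + (m − 1)MSE) and a commonly used weighted mean cluster size, m = (N − Σ m_{k}^{2}/N) / (K − 1), where N is the total number of subjects and K the number of clusters. Whether this is exactly the weighting given in the cited equation is an assumption, and the function name and toy binary data are hypothetical.

```python
from typing import Sequence, Tuple


def anova_icc(clusters: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (ICC, weighted mean cluster size) from a one-way ANOVA."""
    k = len(clusters)
    n = sum(len(c) for c in clusters)
    if k < 2 or n <= k or any(len(c) == 0 for c in clusters):
        raise ValueError("need at least 2 non-empty clusters and n > k")

    grand_mean = sum(sum(c) for c in clusters) / n
    means = [sum(c) / len(c) for c in clusters]

    # Between-cluster and within-cluster mean squares.
    ss_between = sum(len(c) * (mu - grand_mean) ** 2
                     for c, mu in zip(clusters, means))
    ss_within = sum((y - mu) ** 2
                    for c, mu in zip(clusters, means) for y in c)
    msc = ss_between / (k - 1)
    mse = ss_within / (n - k)

    # Weighted mean cluster size (assumed form; see lead-in).
    m = (n - sum(len(c) ** 2 for c in clusters) / n) / (k - 1)

    denominator = msc + (m - 1) * mse
    if denominator == 0:
        raise ValueError("ICC is undefined when all observations are equal")
    return (msc - mse) / denominator, m


# Toy binary outcome (1 = behavior present) in 3 hypothetical practices.
toy_clusters = [[1, 1, 1, 1, 0], [0, 0, 0], [1, 0, 1, 0]]
icc, m = anova_icc(toy_clusters)
print(round(icc, 4), round(m, 4))
```

When all clusters have the same size, this weighted m equals that common size, so the expression reduces to the balanced-design formula.
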
We used an analysis of variance approach promoted by Reed^{11} and Taljaard et al^{12} to arrive at ICCs for binary variables. The approach is equivalent to that of a mixed model that estimates a random intercept for each cluster.
We arrived at ICCs for each continuous variable by directly estimating the between-cluster (σ^{2}_{c}) and within-cluster (σ^{2}_{w}) variances in a mixed model that treated clusters as random effects.^{13}^{(pp329–339)} The models, calculated in SAS PROC MIXED 9.2 (SAS Institute Inc), were structurally equivalent to hierarchical models where, for example, observations on patients were nested within either clinics or networks. These models estimated σ^{2}_{c} and σ^{2}_{w} using restricted maximum likelihood estimation, which produces less biased estimates than maximum likelihood estimation when observations are clustered or correlated.^{14}^{(p101)} We also used this mixed model approach to calculate ICCs for ordinal variables that reflected patients’ intention to change health-related behaviors.
Point estimates for the ICCs are accompanied by 95% confidence intervals. To avoid the complicated estimate of a standard error that is required for an estimate that, similar to the ICC’s, involves a nonlinear combination of variances, we calculated bootstrap 95% confidence intervals. Specifically, we resampled with replacement to produce 1,000 bootstrap samples,^{3} calculated the ICC for each sample so obtained, then reported empirical 95% bootstrap confidence intervals. These intervals’ limits are simply the ICC values that demarcate the 2.5th and 97.5th percentiles of the estimate’s bootstrap distribution.
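The percentile bootstrap described above can be sketched as follows. The text does not state which unit was resampled, so this sketch assumes that whole clusters are resampled with replacement. The ICC estimator is the one-way ANOVA form with a weighted mean cluster size, and the data are simulated; all function names and simulation settings are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def anova_icc(clusters: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA estimate of the ICC for clusters of unequal size."""
    k = len(clusters)
    n = sum(len(c) for c in clusters)
    grand_mean = sum(sum(c) for c in clusters) / n
    means = [sum(c) / len(c) for c in clusters]
    msc = sum(len(c) * (mu - grand_mean) ** 2
              for c, mu in zip(clusters, means)) / (k - 1)
    mse = sum((y - mu) ** 2
              for c, mu in zip(clusters, means) for y in c) / (n - k)
    m = (n - sum(len(c) ** 2 for c in clusters) / n) / (k - 1)
    return (msc - mse) / (msc + (m - 1) * mse)


def bootstrap_icc_interval(clusters: Sequence[Sequence[float]],
                           n_boot: int = 1000,
                           seed: int = 2012) -> Tuple[float, float]:
    """Empirical 95% interval: 2.5th and 97.5th percentiles of bootstrap ICCs."""
    rng = random.Random(seed)
    k = len(clusters)
    estimates: List[float] = []
    for _ in range(n_boot):
        # Assumption: resample whole clusters with replacement.
        resample = [clusters[rng.randrange(k)] for _ in range(k)]
        estimates.append(anova_icc(resample))
    estimates.sort()
    # Nearest-rank percentiles of the sorted bootstrap distribution.
    lower = estimates[int(0.025 * (n_boot - 1))]
    upper = estimates[int(round(0.975 * (n_boot - 1)))]
    return lower, upper


# Simulated data: 20 hypothetical practices of 10 to 40 patients each,
# with between-cluster SD 1 and within-cluster SD 3.
sim = random.Random(1)
clusters = []
for _ in range(20):
    effect = sim.gauss(0.0, 1.0)
    size = sim.randint(10, 40)
    clusters.append([effect + sim.gauss(0.0, 3.0) for _ in range(size)])

point = anova_icc(clusters)
lower, upper = bootstrap_icc_interval(clusters)
print(round(point, 3), round(lower, 3), round(upper, 3))
```

Resampling clusters rather than individual patients keeps the within-cluster correlation intact in each bootstrap sample; a different resampling scheme would give different intervals.
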
RESULTS
Patients Within Practices and Within Networks
Table 2 summarizes, for variables measured on individual patients, calculated ICCs and their 95% confidence intervals, adjusted cluster sizes (m), and VIFs. Large ICCs that reflect substantial clustering of patient characteristics within physician practices were evident for such demographic variables as age (ICC = 0.151) and the proportion of patients who are nonwhite and white (ICC = 0.265). Large ICCs were also found for such behaviors as smoking status (ICC = 0.118) and unhealthy diet (ICC = 0.206). The extent of within-practice clustering for alcohol use depended on how the behavior was measured; the ICC was estimated to be 0.076 when we assessed drinks per day but only 0.001 when we assessed average drinks per month. Relatively small ICCs (0.007 or lower) were calculated for the intention to change behaviors related to smoking, diet, and exercise. Patients’ intent to change these behaviors was relatively diverse within the practices.
Corresponding ICCs within networks were generally smaller, which suggests that, even though within-practice clustering was evident for many measures, practices within the same network were relatively heterogenous with respect to the measures.
Practices Within Networks
Table 3 summarizes the ICCs and VIFs calculated for practice-level variables, measured in 89 practices within 8 PBRNs. The project did not collect data at the level of individual physicians. Whereas practices within PBRNs were relatively homogenous with respect to practice type (ICC = 0.29) and the use of electronic medical records (ICC = 0.23), they were less homogenous with respect to their size and to the rate of turnover of physicians and staff.
DISCUSSION
Our analyses suggest that the ICCs for certain measures of health behavior are small, generally less than 0.1. Bland^{4} describes this magnitude as typical for outcome variables in cluster-randomized studies. Though small, these ICCs are not trivial; if cluster sizes are large, even small levels of clustering, if unaccounted for, can reduce a study’s statistical power.
We found larger ICCs for patient-level demographic variables and for practice-level variables, such as the presence of an electronic medical record, a measure that relates to the control of clinical processes. In this regard, the study reinforces others’ observation that clustering is less evident for outcome variables than for other independent and process variables.^{12}
High levels of within-practice clustering among demographic and other independent variables underscore the need, when analyzing data from studies that randomize interventions among practices, to adjust for confounding that arises as a result of between-cluster differences. Statistical methods exist to adjust for confounding. Moreover, where outcomes are measured on continuous scales, mixed or hierarchical models can adjust for practice- and patient-level clustering among covariates. For outcomes that are binomial or measured as counts, marginal models that use generalized estimating equations can derive cluster-adjusted estimates of treatment and other effects and are applicable as long as clusters are numerous.^{15}
Because methods are available to adjust for clustering in the analysis of data that have already been collected, the primary use for information about clustering is for study planning. Investigators can use estimates of ICCs such as those provided here to ensure that a planned cluster-randomized study affords adequate power to detect a treatment effect. Investigators can initially use conventional sample size estimation techniques to determine that a sample of, for example, 100 independent and randomly selected subjects affords 80% power to detect a prespecified and clinically meaningful effect. They can proceed to calculate a VIF by using a published estimate of the ICC for the study’s outcome measure, along with an estimate of the study’s likely cluster size. By multiplying the conventional sample size estimate by the VIF, the investigators can arrive at an appropriately inflated or augmented goal for subject recruitment. Recruiting a sample of this increased size ensures that, under the planned cluster randomization, the study affords 80% power to detect the prespecified effect.
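The planning steps in this paragraph can be sketched as a short calculation. The starting sample of 100 subjects comes from the example above; the ICC of 0.05, the cluster size of 20, and the function name are hypothetical planning values.

```python
import math
from typing import Tuple


def cluster_adjusted_sample_size(n_individual: int,
                                 icc: float,
                                 cluster_size: int) -> Tuple[float, int, int]:
    """Return (VIF, inflated total sample size, number of clusters needed)."""
    vif = 1.0 + icc * (cluster_size - 1)
    # Round before taking the ceiling so floating-point noise
    # (for example 195.00000000000003) does not add a spurious subject.
    n_total = math.ceil(round(n_individual * vif, 6))
    n_clusters = math.ceil(n_total / cluster_size)
    return vif, n_total, n_clusters


# 100 subjects under individual randomization (from the text);
# ICC and cluster size are assumptions for illustration.
vif, n_total, n_clusters = cluster_adjusted_sample_size(
    n_individual=100, icc=0.05, cluster_size=20)
print(round(vif, 2), n_total, n_clusters)
```

Under these assumed values, the recruitment goal nearly doubles relative to individual randomization, and the subjects would be spread over 10 practices of about 20 patients each.
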
This study provides estimates of the ICC at 3 levels of clustering: patients within practices, patients within networks, and practices within networks. The study estimated ICCs for binary outcome and process measures using an approach similar to analysis of variance (ANOVA) that, although advocated by Reed,^{11} may apply only to data which, like the COMBO data, involve large clusters. Estimates of the variances that make up the ICC were robust in that we obtained similar results whether we used Reed’s single-factor ANOVA approach^{11} or the hierarchical models constructed using SAS PROC MIXED.^{13} To estimate ICCs for binary and ordinal measures from studies with smaller clusters, a more appropriate approach might construct hierarchical logistic or cumulative logistic regression models, respectively, in software such as SAS PROC GENMOD, which can apply appropriate distributional assumptions along with generalized estimating equations methodology.^{16,17}
In addition to estimating ICCs, this study provides confidence intervals on those estimates. Obtained by resampling, these intervals provide investigators with realistic ranges for the ICCs’ true values. In particular, the intervals’ upper bounds will generate the largest and most conservative VIFs that investigators might use in calculating sample size estimates for cluster-randomized studies.
Investigators who plan studies with interventions that are randomized not to individuals, but to relatively homogenous groups or clusters of individuals, must account for clustering, particularly when planning the size of the studies’ samples. A standard approach multiplies initial sample size estimates, made on the assumption that individuals are heterogenous and not clustered within groups, such as medical practices, by a variance inflation factor calculated on the basis of approximate cluster size and an estimate of the appropriate ICC. This study used data from the RWJF-sponsored Prescription for Health program and its COMBO study^{7–9} to provide point estimates and confidence intervals for ICCs for health behaviors and other patient- and practice-level characteristics. These estimates will be of interest to practice-based researchers as they plan research on similar health outcomes and patient behaviors.
Acknowledgments
We would like to thank the primary care practices around the country who participated in round 2 of the Prescription for Health studies.
Footnotes

Conflicts of interest: authors report none.

Funding support: This research was supported by The Robert Wood Johnson Foundation, Princeton, New Jersey.
 Received for publication February 18, 2011.
 Revision received October 30, 2011.
 Accepted for publication October 13, 2011.
 © 2012 Annals of Family Medicine, Inc.