Editorial
1 Departments of Family Medicine, Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio
2 Department of Family Medicine, University of Colorado Health Science Center, Denver, Colo
CORRESPONDING AUTHOR: Stephen J. Zyzanski, PhD, Department of Family Medicine, Case Western Reserve University, 11001 Cedar Ave, Suite 306, Cleveland, OH 44106, sjz{at}po.cwru.edu
Key Words: Clustered data analysis sample size estimation intraclass correlation analysis bias
Studies in which data from multiple patients are collected per clinician or per practice are becoming common in primary care research, particularly with the increase in studies conducted in practice-based research networks. These studies generate data that are clustered. A special case of clustered data is an intervention study where clinicians or practices are randomized into an intervention or control group. In such cluster-randomized designs, all patients of a clinician or practice are assigned to the same treatment, and this design is often used when logistics of implementation or the need to avoid contamination of treatment arms is a priority.
A major issue in the analysis of clustered data is that observations within a cluster are not independent, and the degree of similarity is typically measured by the intracluster correlation coefficient (ICC).1 Ignoring the intracluster correlation in the analysis could lead to incorrect P values, confidence intervals that are too small, and biased estimates and effect sizes, all of which can lead to incorrect interpretation of associations between variables.2 Failure to take into account the clustered structure of the study design during the planning phase of the study also can lead to underpowered study designs in which the effective sample size and statistical power to detect differences are smaller than planned.
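As a rough illustration of how the intracluster correlation might be estimated, the sketch below uses the common one-way analysis-of-variance estimator for equal-sized clusters. The function name `anova_icc` and the input layout (a list of clusters, each a list of patient outcomes) are assumptions made for this example and are not taken from the articles discussed here.

```python
from statistics import mean


def anova_icc(clusters):
    """One-way ANOVA estimate of the ICC; assumes equal cluster sizes."""
    k = len(clusters)        # number of clusters (eg, practices)
    m = len(clusters[0])     # observations per cluster (eg, patients)
    if k < 2 or m < 2:
        raise ValueError("need at least 2 clusters with at least 2 observations each")
    if any(len(c) != m for c in clusters):
        raise ValueError("this sketch assumes equal cluster sizes")

    grand_mean = mean(x for c in clusters for x in c)
    cluster_means = [mean(c) for c in clusters]

    # Between-cluster and within-cluster mean squares.
    msb = m * sum((cm - grand_mean) ** 2 for cm in cluster_means) / (k - 1)
    msw = sum(
        (x - cm) ** 2 for c, cm in zip(clusters, cluster_means) for x in c
    ) / (k * (m - 1))

    denominator = msb + (m - 1) * msw
    if denominator == 0:
        raise ValueError("no variation in the data; ICC is undefined")
    return (msb - msw) / denominator


# Hypothetical data: 3 practices with 4 patient outcomes each.
practices = [[5.1, 5.4, 4.9, 5.2], [6.0, 6.3, 5.8, 6.1], [4.2, 4.6, 4.4, 4.1]]
print(round(anova_icc(practices), 3))
```

This estimator can return negative values when patients within a cluster differ more than the clusters do; in practice such estimates are often truncated at zero.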
In most situations, the numeric value of the intracluster correlation tends to be small and positive. Several authors have provided guidelines for interpreting the magnitude of the intraclass correlation,3 with small, medium, and large values of the intraclass correlation coefficient reported as .05, .10, and .15. Small values of the intracluster correlation can be deceiving, however. Investigators need to be aware that the cluster effect is a combination of both the intracluster correlation and the cluster size. Small intracluster correlations coupled with large cluster size can still affect the validity of conventional statistical analyses.
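One widely used way to express this combination is the design effect (variance inflation factor) for equal-sized clusters. The notation below is an assumption for illustration: $\rho$ is the intracluster correlation and $m$ the number of observations per cluster.

```latex
\[
  \mathrm{DEFF} = 1 + (m - 1)\,\rho
\]
% Worked values:
%   \rho = 0.05, m = 10   =>  DEFF = 1 + 9(0.05)   = 1.45
%   \rho = 0.05, m = 100  =>  DEFF = 1 + 99(0.05)  = 5.95
%   \rho = 0.01, m = 200  =>  DEFF = 1 + 199(0.01) = 2.99
```

Even with an intracluster correlation of only .01, clusters of 200 patients nearly triple the variance of an estimate relative to what an analysis assuming independence would report.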
Although clustered data are common, investigators often overlook both the special analysis challenges and the unique opportunities inherent in clustered data.4,5 In this issue of the Annals, Reed suggests a convenient correction procedure to address clustered data.6 The correction involves applying a formula to the standard errors and then conducting the planned analysis with the corrected standard errors. Also in this issue, the article by Killip et al7 provides a formula to compute an effective sample size for clustered data. Computation of the effective sample size is important, as it avoids costly sample size errors caused by underpowered studies. Examples in the Killip et al article show how the intracluster correlation, number of observations within a cluster, and number of clusters are all interrelated in estimating sample size and power for clustered data.
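The sketch below illustrates the general idea behind both kinds of adjustment, using the standard design effect 1 + (m - 1) × ICC for equal-sized clusters. It is not a reproduction of the specific formulas in the Reed or Killip et al articles; the function names and the example numbers are assumptions chosen for illustration.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation from clustering, assuming equal cluster sizes."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)


def inflate_standard_error(naive_se, cluster_size, icc):
    """Scale a standard error computed under independence by sqrt(design effect)."""
    return naive_se * math.sqrt(design_effect(cluster_size, icc))


# Hypothetical study: 20 practices, 50 patients each, ICC = 0.05.
n_eff = effective_sample_size(n_clusters=20, cluster_size=50, icc=0.05)
print(round(n_eff))  # about 290 effective observations out of 1,000 patients
```

In this hypothetical study, 1,000 enrolled patients carry roughly the information of 290 independent patients, which is why a power calculation that ignores clustering can be badly optimistic.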
Clustered data imply a hierarchical nature to the data, and while many levels can be considered, two levels are most commonly specified. The outcome measure is always assessed at the lowest level. Explanatory variables, however, may be considered at any of the levels (eg, patient variables and/or physician or practice level variables). Consequently, clustered data provide considerable opportunities to explore, in greater depth, the interrelationships among variables at any level; these analyses are generically called multilevel analyses.
Consider an example of data in which patients are clustered within physicians. A comprehensive multilevel data analysis aims to assess the direct effects of patient-level and clinician- or practice-level variables on the outcome. One could also determine whether the variables at the clinician/practice level serve as moderators of patient-level relationships by testing cross-level interactions between variables from the patient level and the physician level.8 Hence, multilevel analyses are designed to analyze variables from different levels simultaneously, all the while taking into account the intracluster correlation.
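A generic two-level model of this kind can be written as follows. The notation is an assumption introduced for illustration: $Y_{ij}$ is the outcome for patient $i$ of physician $j$, $x_{ij}$ is a patient-level variable, and $z_j$ is a physician- or practice-level variable.

```latex
% Level 1 (patients within physicians)
\[
  Y_{ij} = \beta_{0j} + \beta_{1j}\,x_{ij} + e_{ij},
  \qquad e_{ij} \sim N(0, \sigma_e^2)
\]
% Level 2 (physicians)
\[
  \beta_{0j} = \gamma_{00} + \gamma_{01}\,z_j + u_{0j},
  \qquad
  \beta_{1j} = \gamma_{10} + \gamma_{11}\,z_j + u_{1j}
\]
% Combined model
\[
  Y_{ij} = \gamma_{00} + \gamma_{10}\,x_{ij} + \gamma_{01}\,z_j
         + \gamma_{11}\,x_{ij} z_j
         + u_{0j} + u_{1j}\,x_{ij} + e_{ij}
\]
```

Here $\gamma_{10}$ and $\gamma_{01}$ are the direct effects of the patient-level and physician-level variables, and $\gamma_{11}$ is the cross-level interaction that tests whether $z_j$ moderates the patient-level relationship. In the corresponding model with no explanatory variables, the intracluster correlation is $\sigma_{u0}^2 / (\sigma_{u0}^2 + \sigma_e^2)$, where $\sigma_{u0}^2$ is the variance of $u_{0j}$.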
Statistical software for conducting these types of analyses and for computing sample size for clustered data now exists, and we encourage its wider use.9–11 While the two articles featured in this issue help raise awareness of the challenges and some solutions to analyzing clustered data, the skills required for optimal analysis of clustered data often are beyond those of most clinician-investigators. Studies involving clustered data would greatly benefit from the expertise provided by statisticians versed in the analysis of clustered data. Several recent textbooks3,9,12–14 and Web sites15–17 provide a good introduction to the area with realistic health care examples. Finally, the recent CONSORT statement delineating guidelines for reporting of randomized controlled trials has now been extended to the special case of cluster-randomized trials.18
FOOTNOTES
Conflicts of interest: none reported
Received for publication March 26, 2004. Accepted for publication April 9, 2004.