Annals of Family Medicine
Editorial

On the Nature and Analysis of Clustered Data

Stephen J. Zyzanski, Susan A. Flocke and L. Miriam Dickinson
The Annals of Family Medicine May 2004, 2 (3) 199-200; DOI: https://doi.org/10.1370/afm.197
  • Clustered data analysis
  • sample size estimation
  • intraclass correlation
  • analysis bias

Studies in which data from multiple patients are collected per clinician or per practice are becoming common in primary care research, particularly with the increase in studies conducted in practice-based research networks. These studies generate data that are clustered. A special case of clustered data is an intervention study in which clinicians or practices are randomized into an intervention or control group. In such cluster-randomized designs, all patients of a clinician or practice are assigned to the same treatment. This design is often used when the logistics of implementation or the need to avoid contamination of treatment arms is a priority.

A major issue in the analysis of clustered data is that observations within a cluster are not independent, and the degree of similarity is typically measured by the intracluster correlation coefficient (ICC).1 Ignoring the intracluster correlation in the analysis could lead to incorrect P values, confidence intervals that are too small, and biased estimates and effect sizes, all of which can lead to incorrect interpretation of associations between variables.2 Failure to take into account the clustered structure of the study design during the planning phase of the study also can lead to underpowered study designs in which the effective sample size and statistical power to detect differences are smaller than planned.
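To make the ICC concrete, here is a minimal sketch of one common way to estimate it, the one-way ANOVA estimator for equal-sized clusters. The editorial does not prescribe an estimator; the function name `anova_icc` and the toy data are assumptions chosen for illustration only.

```python
from statistics import mean


def anova_icc(clusters):
    """One-way ANOVA estimate of the ICC (hypothetical helper).

    `clusters` is a list of lists, one inner list of outcome values per
    cluster (e.g. per clinician). Assumes every cluster has the same size.
    """
    k = len(clusters)        # number of clusters
    m = len(clusters[0])     # observations per cluster
    if k < 2 or m < 2 or any(len(c) != m for c in clusters):
        raise ValueError("need at least 2 clusters of equal size >= 2")

    grand_mean = mean(x for c in clusters for x in c)
    cluster_means = [mean(c) for c in clusters]

    # Between-cluster mean square: how far cluster means sit from the grand mean.
    msb = m * sum((cm - grand_mean) ** 2 for cm in cluster_means) / (k - 1)
    # Within-cluster mean square: spread of patients around their own cluster mean.
    msw = sum(
        (x - cm) ** 2 for c, cm in zip(clusters, cluster_means) for x in c
    ) / (k * (m - 1))

    denominator = msb + (m - 1) * msw
    if denominator == 0:
        raise ValueError("ICC is undefined when all observations are identical")
    return (msb - msw) / denominator


# Toy data: three clinicians, three patients each; patients of the same
# clinician resemble each other, so the estimate is close to 1.
icc = anova_icc([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

With real data the estimate is usually far smaller than in this toy example, and it can be slightly negative when clusters are no more alike than chance would predict.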

In most situations, the numeric value of the intracluster correlation tends to be small and positive. Several authors have provided guidelines for interpreting the magnitude of the intraclass correlation,3 with small, medium, and large values reported as .05, .10, and .15, respectively. Small values of the intracluster correlation can be deceiving, however. Investigators need to be aware that the cluster effect is a combination of both the intracluster correlation and the cluster size. A small intracluster correlation coupled with a large cluster size can still affect the validity of conventional statistical analyses.
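The interplay between intracluster correlation and cluster size is often summarized by the design effect (variance inflation factor). The editorial does not state this formula; the version below is the commonly used form for equal cluster sizes, with the symbols $\rho$ (ICC) and $m$ (observations per cluster) introduced here as assumptions.

```latex
\mathrm{DEFF} = 1 + (m - 1)\,\rho

% Worked arithmetic: a "small" ICC with large clusters
\rho = 0.05,\; m = 100:\quad \mathrm{DEFF} = 1 + 99 \times 0.05 = 5.95

% An even smaller ICC with larger clusters gives a similar inflation
\rho = 0.01,\; m = 500:\quad \mathrm{DEFF} = 1 + 499 \times 0.01 = 5.99
```

In both cases the variance of an estimate is roughly six times what an analysis assuming independence would report, so standard errors are understated by a factor of about $\sqrt{6} \approx 2.4$.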

Although clustered data are common, investigators often overlook both the special analysis challenges and the unique opportunities inherent in clustered data.4,5 In this issue of the Annals, Reed suggests a convenient correction procedure to address clustered data.6 The correction involves applying a formula to the standard errors and then conducting the planned analysis with the corrected standard errors. Also in this issue, the article by Killip et al7 provides a formula to compute an effective sample size for clustered data. Computing the effective sample size is important because it helps avoid the costly error of running an underpowered study. Examples in the Killip et al article show how the intracluster correlation, number of observations within a cluster, and number of clusters are all interrelated in estimating sample size and power for clustered data.
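The relationship among the intracluster correlation, cluster size, and number of clusters can be sketched in a few lines. This is an illustration under stated assumptions, not a reproduction of the formulas in the Reed or Killip et al articles: it uses the standard design-effect form for equal-sized clusters, and all function names are hypothetical.

```python
def design_effect(icc, cluster_size):
    """Variance inflation for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent observations the clustered sample is 'worth'."""
    total = n_clusters * cluster_size
    return total / design_effect(icc, cluster_size)


def inflate_standard_error(naive_se, icc, cluster_size):
    """Generic correction: scale a naive standard error by sqrt(design effect).

    Assumption: a generic adjustment for illustration, not the specific
    procedure described by Reed.
    """
    return naive_se * design_effect(icc, cluster_size) ** 0.5


# 10 practices with 100 patients each and an ICC of .05:
# 1000 patients behave like roughly 168 independent ones.
n_eff = effective_sample_size(n_clusters=10, cluster_size=100, icc=0.05)

# Same total sample spread over 50 practices of 20 patients each:
# the effective sample size is much larger (roughly 513).
n_eff_more_clusters = effective_sample_size(n_clusters=50, cluster_size=20, icc=0.05)
```

The comparison shows why, for a fixed total sample, adding clusters generally buys more power than adding patients per cluster.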

Clustered data imply a hierarchical nature to the data, and while many levels can be considered, two levels are most commonly specified. The outcome measure is always assessed at the lowest level. Explanatory variables, however, may be considered at any of the levels (eg, patient variables and/or physician or practice level variables). Consequently, clustered data provide considerable opportunities to explore, in greater depth, the interrelationships among variables at any level; these analyses are generically called multilevel analyses.

Considering an example of data with patients clustered within physicians, a comprehensive multilevel data analysis aims to assess the direct effects of patient-level and clinician/practice-level variables on the outcome. One could also determine whether variables at the clinician/practice level serve as moderators of patient-level relationships by testing cross-level interactions between variables from the patient level and the physician level.8 Hence, multilevel analyses are designed to analyze variables from different levels simultaneously, all the while taking into account the intracluster correlation.
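A standard two-level formulation makes these ideas explicit. The notation below is an assumption introduced for illustration (it does not appear in the editorial): $Y_{ij}$ is the outcome for patient $i$ of physician $j$, $X_{ij}$ is a patient-level variable, and $Z_j$ is a physician-level variable.

```latex
% Level 1 (patients within physician j)
Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + e_{ij}, \qquad e_{ij} \sim N(0, \sigma^2_e)

% Level 2 (physicians)
\beta_{0j} = \gamma_{00} + \gamma_{01} Z_j + u_{0j}
\beta_{1j} = \gamma_{10} + \gamma_{11} Z_j + u_{1j}

% Combined model
Y_{ij} = \gamma_{00} + \gamma_{10} X_{ij} + \gamma_{01} Z_j
       + \gamma_{11} X_{ij} Z_j + u_{0j} + u_{1j} X_{ij} + e_{ij}
```

Here $\gamma_{10}$ and $\gamma_{01}$ are the direct effects of the patient-level and physician-level variables, and $\gamma_{11}$ is the cross-level interaction: a nonzero value indicates that $Z_j$ moderates the patient-level relationship. In the model with no predictors and only a random intercept, the intracluster correlation is $\sigma^2_{u0} / (\sigma^2_{u0} + \sigma^2_e)$, where $\sigma^2_{u0}$ is the variance of $u_{0j}$.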

Statistical software for conducting these types of analyses and for computing sample size for clustered data now exists, and we encourage its wider use.9–11 While the two articles featured in this issue help raise awareness of the challenges of analyzing clustered data and of some solutions, the skills required for optimal analysis of clustered data are often beyond those of most clinician-investigators. Studies involving clustered data would greatly benefit from the expertise of statisticians versed in the analysis of clustered data. Several recent textbooks3,9,12–14 and Web sites15–17 provide a good introduction to the area with realistic health care examples. Finally, the recent CONSORT statement delineating guidelines for reporting of randomized controlled trials has now been extended to the special case of cluster-randomized trials.18

Footnotes

  • Conflicts of interest: none reported

  • Received for publication March 26, 2004.
  • Accepted for publication April 9, 2004.
  • © 2004 Annals of Family Medicine, Inc.

REFERENCES

  1. Kerry SM, Bland JM. The intracluster correlation coefficient in cluster randomization. BMJ. 1998;316:1455–1460.
  2. Campbell MK, Grimshaw JM. Cluster randomized trials: time for improvement. The implications of adopting a cluster design are still largely being ignored [editorial]. BMJ. 1998;317:1171–1172.
  3. Hox J. Multilevel Analysis: Techniques and Application. Mahwah, NJ: Lawrence Erlbaum; 2002.
  4. Varnell SP, Murray DM, Janega JB, Blitstein MS. Design and analysis of group-randomized trials: a review of recent practices. Am J Pub Health. 2004;94:393–399.
  5. Localio AR, Berlin JA, Ten TR, Kimmel SE. Adjustments for center in multi-center studies: an overview. Ann Intern Med. 2000;135:112–123.
  6. Reed JF. Adjusted chi-square statistics: application to clustered binary data in primary care. Ann Fam Med. 2004;2:201–203.
  7. Killip S, Mahfoud Z, Pearce K. What is an intraclass correlation coefficient? Ann Fam Med. 2004;2:204–208.
  8. Kraemer HC, Wilson T, Fairburn CG, Agras WS. Mediators and moderators of treatment effects in randomized clinical trials. Arch Gen Psychiatry. 2002;59:877–883.
  9. Raudenbush SW, Bryk AS. Hierarchical Linear Models: Application and Data Analysis Methods. Thousand Oaks, Calif: Sage Publications; 2001.
  10. Singer J. Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models. J Educ Behav Stat. 1998;24:323–355.
  11. Donner A, Klar N. Design and Analysis of Cluster Randomization Trials in Health Research. London: Arnold; 2000.
  12. Murray DM. Design and Analysis of Group Randomization Trials. New York, NY: Oxford University Press; 1998.
  13. Kreft I, De Leeuw J. Introducing Multilevel Modeling. Thousand Oaks, Calif: Sage Publications; 1998.
  14. Leyland AH, Goldstein H. Multilevel Modeling of Health Statistics. London: John Wiley & Sons; 2001.
  15. Hedeker D. Multilevel data analysis. Available at: http://tigger.uic.edu/~hedeker/ml.html.
  16. Cluster for multilevel modeling. Available at: http://multilevel.ioe.ac.uk/index.html.
  17. Multilevel modeling resources. University of California Los Angeles Web site. Available at: http://www.ats.ucla.edu/stat/mlm/default.htm.
  18. Campbell MK, Elbourne DR, Altman DG. CONSORT statement: extension to cluster randomized trials. BMJ. 2004;328:702–708.