Abstract
Visual diagnosis of radiographs, histology and electrocardiograms lends itself to deliberate practice, facilitated by large online banks of cases. Which cases to supply to which learners in which order has yet to be worked out, and there is considerable potential for adapting the learning to the individual. Advances in statistical modeling of an accumulating learning curve offer methods for more effectively pairing learners with cases of known calibrations. Using demonstration radiograph and electrocardiogram datasets, we demonstrate the advantages of moving from traditional regression to multilevel methods for modeling growth in ability or performance, with a final step of integrating case-level item-response information based on diagnostic grouping. This produces more precise individual-level estimates that can eventually support learner-adaptive case selection. The progressive increase in model sophistication is not merely statistical; it brings the models into alignment with core learning principles, including the importance of accounting for individual differences in baseline skill and learning rate as well as the differential interaction with cases of varying diagnosis and difficulty. The developed approach can thus give researchers and educators a better basis on which to anticipate learners’ pathways and individually adapt their future learning.
References
Anderson, S. J., Hecker, K. G., Krigolson, O. E., & Jamniczky, H. A. (2018). A Reinforcement-Based Learning Paradigm Increases Anatomical Learning and Retention—A Neuroeducation Study. Frontiers in Human Neuroscience, 12, 38.
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). lme4: Linear mixed-effects models using Eigen and S4. R package version 1.1-10. http://CRAN.R-project.org/package=lme4
Bok, H. G., de Jong, L. H., O’Neill, T., Maxey, C., & Hecker, K. G. (2018). Validity evidence for programmatic assessment in competency-based education. Perspectives on Medical Education, 7(6), 362–372.
Bolsin, S., & Colson, M. (2000). The use of the Cusum Technique in the assessment of trainee competence in new procedures. International Journal for Quality in Health Care, 12(5), 433–438.
Boutis, K., Pecaric, M., Carrière, B., Stimec, J., Willan, A., Chan, J., & Pusic, M. (2019). The effect of testing and feedback on the forgetting curves for radiograph interpretation skills. Medical Teacher, 41(7), 756–764.
Bryk, A. S., & Raudenbush, S. W. (1987). Application of hierarchical linear models to assessing change. Psychological Bulletin, 101, 147–158.
Cepeda, N. J., Vul, E., Rohrer, D., Wixted, J. T., & Pashler, H. (2008). Spacing effects in learning: A temporal ridgeline of optimal retention. Psychological Science, 19(11), 1095–1102.
Cepeda, N. J., Coburn, N., Rohrer, D., Wixted, J. T., Mozer, M. C., & Pashler, H. (2009). Optimizing distributed practice: Theoretical analysis and practical implications. Experimental Psychology, 56(4), 236–246.
Chaiklin, S. (2003). The zone of proximal development in Vygotsky’s analysis of learning and instruction. In A. Kozulin, B. Gindis, V. Ageyev, & S. Miller (Eds.), Vygotsky’s educational theory in cultural context (1st ed., pp. 39–64). Cambridge, England: Cambridge University Press.
Davis, A. L., Pecaric, M., Pusic, M. V., Smith, T., Shouldice, M., Brown, J., & Boutis, K. (2020). Deliberate practice as an educational method for learning to interpret the prepubescent female genital examination. Child Abuse and Neglect, 101, 104379.
De Boeck, P., et al. (2011). The Estimation of Item Response Models with the lmer Function from the lme4 Package in R. Journal of Statistical Software, 39(12), 1–28.
De Boeck, P., & Wilson, M. (2004). Explanatory item response models: A generalized linear and nonlinear approach. New York: Springer.
Diederich, E., Thomas, L., Mahnken, J., & Lineberry, M. (2018). Pretest scores uniquely predict 1-year-delayed performance in a simulation-based mastery course for central line insertion. Simulation in Healthcare, 13(3), 163–167.
Doran, H., Bates, D., Bliese, P., & Dowling, M. (2007). Estimating the multilevel Rasch model: With the lme4 package. Journal of Statistical Software, 20(2), 1–18.
Downing, S. M. (2003). Item response theory: Applications of modern test theory in medical education. Medical Education, 37(8), 739–745.
Ericsson, K. A. (2004). Deliberate practice and the acquisition and maintenance of expert performance in medicine and related domains. Academic Medicine, 79(10), S70–S81.
Ericsson, K. A. (2015). Acquisition and maintenance of medical expertise. Academic Medicine, 90(11), 1471–1486.
Ertmer, P. A., & Newby, T. J. (1993). Behaviorism, cognitivism, constructivism: comparing critical features from an instructional design perspective. Performance Improvement Quarterly, 6(4), 50–72.
Faraway, J. J. (2016). Linear models with R. Boca Raton: Chapman and Hall/CRC.
Gelman, A., & Hill, J. (2007). Multilevel linear models: the basics. Data analysis using regression and multilevel/hierarchical models (pp. 251–278). Cambridge: Cambridge University Press.
Gelman, A., & Hill, J. (2007). Simulation of probability models and statistical inferences. Data analysis using regression and multilevel/hierarchical models (pp. 251–278). Cambridge: Cambridge University Press.
Guadagnoli, M., Morin, M., & Dubrowski, A. (2012). The application of the challenge point framework in medical education. Medical Education, 46, 447–453.
Gulliksen, H. (1934). A rational equation of the learning curve based on Thorndike’s law of effect. The Journal of General Psychology, 11(2), 395–434.
Hatala, R., Gutman, J., Lineberry, M., Triola, M., & Pusic, M. (2019). How well is each learner learning? Validity investigation of a learning curve-based assessment approach for ECG interpretation. Advances in Health Sciences Education, 24(1), 45–63.
Jaber, M. Y., & Bonney, M. (1997). A comparative study of learning curves with forgetting. Applied Mathematical Modelling, 21(8), 523–531.
Jonassen, D. (1991). Objectivism versus constructivism: Do we need a new philosophical paradigm? Educational Technology Research and Development, 39(3), 5–14.
Kerfoot, B. P., Baker, H., Pangara, L., Agarwal, K., Taffet, G., Mechaber, A. J., et al. (2012). An online spaced-education game to teach and assess medical students. Academic Medicine, 87(10), 1443–1449.
Linacre, J. (1994). Sample size and item calibration stability. Rasch Measurement Transactions, 7(4), 328.
Lindsey, R. V., Shroyer, J. D., Pashler, H., & Mozer, M. C. (2014). Improving students’ long-term knowledge retention through personalized review. Psychological Science, 25(3), 639–647.
Park, O., & Lee, J. (2003). Adaptive instructional systems. Educational Technology Research and Development, 25, 651–684.
Pavlik, P. I., & Anderson, J. R. (2008). Using a model to compute the optimal schedule of practice. Journal of Experimental Psychology: Applied, 14(2), 101–117.
Pecaric, M., Boutis, K., Beckstead, J., & Pusic, M. (2017). A big data and learning analytics approach to process-level feedback in cognitive simulations. Academic Medicine, 92(2), 175–184.
Price, D. W., Swanson, D. B., Irons, M. B., & Hawkins, R. E. (2018). Longitudinal assessments in continuing specialty certification and lifelong learning. Medical Teacher, 40(9), 917–919.
Pusic, M. V., Pecaric, M., & Boutis, K. (2011). How much practice is enough? Using learning curves to assess the deliberate practice of radiograph interpretation. Academic Medicine, 86, 731–736.
Pusic, M. V., Boutis, K., Hatala, R., & Cook, D. (2015). Learning curves in health professions education. Academic Medicine, 90(8), 1034–1042.
Pusic, M. V., Boutis, K., Pecaric, M. R., Savenkov, O., Beckstead, J. W., & Jaber, M. Y. (2017). A primer on the statistical modelling of learning curves in health professions education. Advances in Health Sciences Education, 22(3), 741–759.
Pusic, M. V., Boutis, K., & McGaghie, W. C. (2018). Role of scientific theory in simulation education research. Simulation in Healthcare, 13(3S), S7-14.
R Core Team (2015). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
Rijmen, F., Tuerlinckx, F., Boeck, P. D., & Kuppens, P. (2003). A nonlinear mixed model framework for item response theory. Psychological Methods, 8(2), 185–205.
Robson, K., & Pevalin, D. J. (2016). Multilevel modeling in plain language. London: SAGE.
Singer, J. D., & Willett, J. B. (2003). Doing data analysis with the multilevel model for change. Applied longitudinal data analysis: modeling change and event occurrence (pp. 75–137). Oxford: Oxford University Press.
Thurstone, L. L. (1919). The learning curve equation. Psychological Review, 34, 278–286.
van der Linden, W. J. (2009). Constrained adaptive testing with shadow tests. Elements of adaptive testing (pp. 31–55). New York: Springer.
Versteeg, M., Hendriks, R. A., Thomas, A., Ommering, B. W. C., & Steendijk, P. (2020). Conceptualising spaced learning in health professions education: A scoping review. Medical Education, 54(3), 205–216.
Wang, L., Zhang, Z., McArdle, J. J., & Salthouse, T. A. (2008). Investigating ceiling effects in longitudinal data analysis. Multivariate Behavioral Research, 43(3), 476–496.
Wolfe, J. M., Evans, K. K., Drew, T., Aizenman, A., & Josephs, E. (2016). How do radiologists use the human search engine? Radiation Protection Dosimetry, 169(1–4), 24–31.
Wood, G., Batt, J., Appelboam, A., Harris, A., & Wilson, M. R. (2013). Exploring the impact of expertise, clinical history, and visual search on electrocardiogram interpretation. Medical Decision Making, 34(1), 75–83.
Acknowledgements
This work was funded by the U.S. Department of Defense Medical Simulation and Information Sciences Research Program Grant Number W81XWH-16-1-0797. The funding source played no role in the design and conduct of the study. The authors would like to acknowledge the support in developing the ECG dataset by the full grant team including Drs. Julie Friedman, Joseph Bennett, David Rhee, Barry Rosenzweig and Jeffrey Lorin. The elbow radiograph dataset is provided courtesy of Drs. Kathy Boutis and Martin Pecaric with that work having been funded by a Royal College of Physicians and Surgeons of Canada Medical Education Research Grant.
Appendices
Appendix 1
Variable Definition and Notation
See Table 4
Appendix 2
Comments on Local Independence
Conditional independence is an important limiting assumption of both IRT and traditional linear models (De Boeck, 2004). Specifically, the errors in a regression model are assumed to be independent of each other. This may be more or less plausible depending on what is conditioned on in the model. For instance, once we include the covariate vector \({\mathbf{x}}\) (the fixed effects for diagnosis, or for item, or for item order), the assumption applies only to the portion of the response not explained by these inputs. This makes the independence assumption more plausible, because responses are allowed to be more similar to each other within groups defined by the inputs (for instance, among responses to items with the same diagnosis).
A critical advantage of multilevel modeling is the ability to explicitly account for non-independence of observations beyond this. For instance, the random effects associated with each person p in our models allow for a specific type of correlation across measurements from the same person, beyond predicting that those measurements will share a similar average value. They allow us to make the more flexible assumption of a within-person correlation of measures.
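The paper's analyses were run in R with lme4; as a language-neutral illustration, the following Python sketch simulates data from a random-intercept, random-slope logistic learning-curve model of the kind described here. All parameter values are hypothetical, not the paper's estimates. It shows how shared learner-level effects induce within-person correlation even though responses are independent conditional on those effects.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical population parameters (illustrative, not fitted values)
beta0, beta1 = -0.5, 0.04        # fixed intercept and per-case learning rate
sigma_b0, sigma_b1 = 0.8, 0.02   # SDs of the learner-level random effects

n_learners, n_cases = 200, 50
b0 = rng.normal(0.0, sigma_b0, n_learners)  # random intercepts (baseline skill)
b1 = rng.normal(0.0, sigma_b1, n_learners)  # random slopes (learning rate)

t = np.arange(1, n_cases + 1)    # case order, identical for every learner
# Linear predictor: shared trajectory plus learner-specific deviations
eta = (beta0 + b0[:, None]) + (beta1 + b1[:, None]) * t
p = 1.0 / (1.0 + np.exp(-eta))   # inverse-logit link
y = rng.binomial(1, p)           # simulated correct/incorrect responses

# Conditional on (b0, b1) the responses are independent Bernoulli draws,
# but marginally the shared effects make a learner's responses correlated:
# learners with a high intercept tend to be right more often on all cases.
learner_accuracy = y.mean(axis=1)
print(learner_accuracy.std())    # between-learner spread induced by b0, b1
```

In lme4 terms, this data-generating process corresponds to a model with correlated-free random intercepts and slopes for each learner.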
Our models do not include this type of explicit correlation within items or within diagnoses. However, in a learning context with longitudinal data, we expect the knowledge gained on early items to improve learners’ accuracy on subsequent items, especially items of the same diagnosis. For our final model in Eq. 14 ("Discussion" section), we have assumed that a learner’s responses at a given time are independent of previous responses, conditional on the learner-specific parameters (learning rate and baseline knowledge). In the data used for this paper, items were reviewed by the learners in a random order, and we measured the learning trajectory adjusted for the difficulty of the items within diagnosis. By including fixed effects for diagnosis, we ameliorate some of the problem in that we allow items with the same diagnosis to share a common mean. Additional correlation may nonetheless exist, and the time trajectory could induce further correlation.
Future models could consider other covariance structures to account for response dependence, for example by assuming that the learning rate at future times depends on the amount of learning up to that point in time. Another strategy would be to include random effects for diagnosis, thereby allowing us to distinguish specific item effects within the different diagnosis groups. Our intent is not to map out all possibilities but rather to point out that the multilevel modeling context has the flexibility to address a limiting assumption of the traditional logistic model.
Appendix 3
Model Evaluation
This appendix evaluates and compares the models used to develop the nonlinear learning trajectories – the multilevel logistic models from Eqs. 6, 8 and 14. The primary goal of this comparison is to evaluate whether adjusting for the diagnosis helps explain the learning in greater detail and provides more accurate learner-level predictions.
We first evaluate relative model performance by comparing the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) across the models. As we add informative predictors to our models, we expect these measures to improve (i.e., decrease). As Table 5 demonstrates, both criteria do indeed favor the learning trajectory models that incorporate diagnosis information.
See Table 5
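As a concrete illustration of this kind of comparison (with toy data and a crude grid-search fit, not the paper's models or estimates), the sketch below computes AIC = 2k − 2·logL and BIC = k·ln(n) − 2·logL for an intercept-only logistic model versus one that includes a practice trend. A better-specified model should show lower values of both criteria.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: accuracy improves with the number of cases practiced
n = 500
t = rng.uniform(0, 50, n)
y = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + 0.05 * t))))

def loglik(p):
    """Bernoulli log-likelihood of the observed responses under fitted p."""
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Model 1: intercept only (k = 1 parameter, the overall proportion correct)
ll1, k1 = loglik(np.full(n, y.mean())), 1

# Model 2: logistic trend in practice, fit by a crude grid search (k = 2)
ll2, k2 = -np.inf, 2
for b0 in np.linspace(-2, 0, 21):
    for b1 in np.linspace(0, 0.1, 21):
        ll2 = max(ll2, loglik(1 / (1 + np.exp(-(b0 + b1 * t)))))

aic = lambda ll, k: 2 * k - 2 * ll
bic = lambda ll, k: k * np.log(n) - 2 * ll
print(aic(ll1, k1), aic(ll2, k2))  # the trend model attains a lower AIC
print(bic(ll1, k1), bic(ll2, k2))  # and a lower BIC despite its extra parameter
```

In practice these criteria are read directly off the fitted lme4 objects rather than computed by hand; the point is only that the likelihood gain from an informative predictor outweighs the parameter-count penalty.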
To further evaluate our models, we can perform simulations to generate new data corresponding to the assumptions implicit in each model (Gelman & Hill, 2007a). For each simulated dataset we calculate relevant summaries of the data. We use two such statistics: the mean and the standard deviation of the total number of correct responses at the population level (across learners; see Fig. 10, Elbow Dataset). We then compare the same statistics computed on our observed data to the distribution of statistics formed from the simulated datasets. This evaluation helps us understand whether our modeling assumptions are reasonable. Said another way, we can see whether it is plausible that our observed data arose from a world meeting the assumptions of the model.
See Fig. 10.
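A minimal version of this predictive simulation can be sketched as follows (in Python rather than the R used for the paper; `p_hat` stands in for a fitted model's per-response probabilities and is generated here as toy data):

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_summaries(p_hat, n_sims=1000):
    """Draw replicated datasets from fitted per-response probabilities
    p_hat (learners x cases) and return, for each replicate, the mean and
    standard deviation across learners of the total-correct score."""
    means = np.empty(n_sims)
    sds = np.empty(n_sims)
    for s in range(n_sims):
        y_rep = rng.binomial(1, p_hat)   # one simulated dataset
        totals = y_rep.sum(axis=1)       # total correct per learner
        means[s], sds[s] = totals.mean(), totals.std()
    return means, sds

# Toy stand-in for fitted probabilities (100 learners x 40 cases)
p_hat = np.clip(rng.beta(2, 2, size=(100, 40)), 0.01, 0.99)
means, sds = simulate_summaries(p_hat)

# With real data, compare the observed statistics to these distributions;
# an observed value far in the tails signals a misfitting model.
obs_mean = 20.0  # hypothetical observed mean total-correct score
print(np.mean(means <= obs_mean))  # a simple predictive p-value
```

A model whose assumptions match the data should place the observed mean and standard deviation well within the body of the simulated distributions, as examined in Fig. 10.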
Figure 10a and b show the results for our two test statistics across the three models compared in Table 2. Although the complete pooling approach closely reproduces the mean of the total correct responses, it underestimates the standard deviation of this summary statistic by an important margin. When we include random intercepts and slopes, the additional variance terms correct this behavior: we obtain a better fit not only to the standard deviation but to the mean as well. Both multilevel models closely reproduce the first summary statistic, and the second falls well within the simulated predictive distributions for both models. The datasets and code necessary to repeat these analyses are available at: https://github.com/IlanReinstein/ecg_paper
Cite this article
Reinstein, I., Hill, J., Cook, D.A. et al. Multi-level longitudinal learning curve regression models integrated with item difficulty metrics for deliberate practice of visual diagnosis: groundwork for adaptive learning. Adv in Health Sci Educ 26, 881–912 (2021). https://doi.org/10.1007/s10459-021-10027-0