Abstract
Visual diagnosis of radiographs, histology and electrocardiograms lends itself to deliberate practice, facilitated by large online banks of cases. Which cases to supply to which learners in which order has yet to be worked out, and there is considerable potential for adapting the learning to the individual. Advances in statistical modeling of an accumulating learning curve offer methods for more effectively pairing learners with cases of known calibrations. Using demonstration radiograph and electrocardiogram datasets, we demonstrate the advantages of moving from traditional regression to multilevel methods for modeling growth in ability or performance, with a final step of integrating case-level item-response information based on diagnostic grouping. This produces more precise individual-level estimates that can eventually support learner-adaptive case selection. The progressive increase in model sophistication is not merely statistical; it brings the models into alignment with core learning principles, including the importance of accounting for individual differences in baseline skill and learning rate as well as the differential interaction with cases of varying diagnosis and difficulty. The developed approach can thus give researchers and educators a better basis on which to anticipate learners’ pathways and individually adapt their future learning.
References
Anderson, S. J., Hecker, K. G., Krigolson, O. E., & Jamniczky, H. A. (2018). A Reinforcement-Based Learning Paradigm Increases Anatomical Learning and Retention—A Neuroeducation Study. Frontiers in Human Neuroscience, 12, 38.
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). lme4: Linear mixed-effects models using Eigen and S4. R package version 1.1-10. http://CRAN.R-project.org/package=lme4
Bok, H. G., de Jong, L. H., O’Neill, T., Maxey, C., & Hecker, K. G. (2018). Validity evidence for programmatic assessment in competency-based education. Perspectives on Medical Education, 7(6), 362–372.
Bolsin, S., & Colson, M. (2000). The use of the Cusum Technique in the assessment of trainee competence in new procedures. International Journal for Quality in Health Care, 12(5), 433–438.
Boutis, K., Pecaric, M., Carrière, B., Stimec, J., Willan, A., Chan, J., & Pusic, M. (2019). The effect of testing and feedback on the forgetting curves for radiograph interpretation skills. Medical Teacher, 41(7), 756–764.
Bryk, A. S., & Raudenbush, S. W. (1987). Application of hierarchical linear models to assessing change. Psychological Bulletin, 101, 147–158.
Cepeda, N. J., Vul, E., Rohrer, D., Wixted, J. T., & Pashler, H. (2008). Spacing effects in learning: A temporal ridgeline of optimal retention. Psychological Science, 19(11), 1095–1102.
Cepeda, N. J., Coburn, N., Rohrer, D., Wixted, J. T., Mozer, M. C., & Pashler, H. (2009). Optimizing distributed practice: Theoretical analysis and practical implications. Experimental Psychology, 56(4), 236–246.
Chaiklin, S. (2003). The zone of proximal development in Vygotsky’s analysis of learning and instruction. In A. Kozulin, B. Gindis, V. Ageyev, & S. Miller (Eds.), Vygotsky’s educational theory in cultural context (1st ed., pp. 39–64). Cambridge, England: Cambridge University Press.
Davis, A. L., Pecaric, M., Pusic, M. V., Smith, T., Shouldice, M., Brown, J., & Boutis, K. (2020). Deliberate practice as an educational method for learning to interpret the prepubescent female genital examination. Child Abuse and Neglect, 101, 104379.
De Boeck, P., et al. (2011). The Estimation of Item Response Models with the lmer Function from the lme4 Package in R. Journal of Statistical Software, 39(12), 1–28.
De Boeck, P., & Wilson, M. (2004). Explanatory item response models: A generalized linear and nonlinear approach. New York: Springer.
Diederich, E., Thomas, L., Mahnken, J., & Lineberry, M. (2018). Pretest scores uniquely predict 1-year-delayed performance in a simulation-based mastery course for central line insertion. Simulation in Healthcare, 13(3), 163–167.
Doran, H., Bates, D., Bliese, P., & Dowling, M. (2007). Estimating the multilevel Rasch model: With the lme4 package. Journal of Statistical Software, 20(2), 1–18.
Downing, S. M. (2003). Item response theory: Applications of modern test theory in medical education. Medical Education, 37(8), 739–745.
Ericsson, K. A. (2004). Deliberate practice and the acquisition and maintenance of expert performance in medicine and related domains. Academic Medicine, 79(10), S70–S81.
Ericsson, K. A. (2015). Acquisition and maintenance of medical expertise. Academic Medicine, 90(11), 1471–1486.
Ertmer, P. A., & Newby, T. J. (1993). Behaviorism, cognitivism, constructivism: comparing critical features from an instructional design perspective. Performance Improvement Quarterly, 6(4), 50–72.
Faraway, J. J. (2016). Linear models with R. Boca Raton: Chapman and Hall/CRC.
Gelman, A., & Hill, J. (2007). Multilevel linear models: the basics. Data analysis using regression and multilevel/hierarchical models (pp. 251–278). Cambridge: Cambridge University Press.
Gelman, A., & Hill, J. (2007). Simulation of probability models and statistical inferences. Data analysis using regression and multilevel/hierarchical models (pp. 251–278). Cambridge: Cambridge University Press.
Guadagnoli, M., Morin, M., & Dubrowski, A. (2012). The application of the challenge point framework in medical education. Medical Education, 46, 447–453.
Gulliksen, H. (1934). A rational equation of the learning curve based on Thorndike’s law of effect. The Journal of General Psychology, 11(2), 395–434.
Hatala, R., Gutman, J., Lineberry, M., Triola, M., & Pusic, M. (2019). How well is each learner learning? Validity investigation of a learning curve-based assessment approach for ECG interpretation. Advances in Health Sciences Education, 24(1), 45–63.
Jaber, M. Y., & Bonney, M. (1997). A comparative study of learning curves with forgetting. Applied Mathematical Modelling, 21(8), 523–531.
Jonassen, D. (1991). Objectivism versus constructivism: Do we need a new philosophical paradigm? Educational Technology Research and Development, 39(3), 5–14.
Kerfoot, B. P., Baker, H., Pangara, L., Agarwal, K., Taffet, G., Mechaber, A. J., et al. (2012). An online spaced-education game to teach and assess medical students. Academic Medicine, 87(10), 1443–1449.
Linacre, J. (1994). Sample size and item calibration stability. Rasch Measurement Transactions, 7(4), 328.
Lindsey, R. V., Shroyer, J. D., Pashler, H., & Mozer, M. C. (2014). Improving students’ long-term knowledge retention through personalized review. Psychological Science, 25(3), 639–647.
Park, O., & Lee, J. (2003). Adaptive instructional systems. Educational Technology Research and Development, 25, 651–684.
Pavlik, P. I., & Anderson, J. R. (2008). Using a model to compute the optimal schedule of practice. Journal of Experimental Psychology: Applied, 14(2), 101–117.
Pecaric, M., Boutis, K., Beckstead, J., & Pusic, M. (2017). A big data and learning analytics approach to process-level feedback in cognitive simulations. Academic Medicine, 92(2), 175–184.
Price, D. W., Swanson, D. B., Irons, M. B., & Hawkins, R. E. (2018). Longitudinal assessments in continuing specialty certification and lifelong learning. Medical Teacher, 40(9), 917–919.
Pusic, M. V., Pecaric, M., & Boutis, K. (2011). How much practice is enough? Using learning curves to assess the deliberate practice of radiograph interpretation. Academic Medicine, 86, 731–736.
Pusic, M. V., Boutis, K., Hatala, R., & Cook, D. (2015). Learning curves in health professions education. Academic Medicine, 90(8), 1034–1042.
Pusic, M. V., Boutis, K., Pecaric, M. R., Savenkov, O., Beckstead, J. W., & Jaber, M. Y. (2017). A primer on the statistical modelling of learning curves in health professions education. Advances in Health Sciences Education, 22(3), 741–759.
Pusic, M. V., Boutis, K., & McGaghie, W. C. (2018). Role of scientific theory in simulation education research. Simulation in Healthcare, 13(3S), S7-14.
R Core Team (2015). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
Rijmen, F., Tuerlinckx, F., Boeck, P. D., & Kuppens, P. (2003). A nonlinear mixed model framework for item response theory. Psychological Methods, 8(2), 185–205.
Robson, K., & Pevalin, D. J. (2016). Multilevel modeling in plain language. London: SAGE.
Singer, J. D., & Willett, J. B. (2003). Doing data analysis with the multilevel model for change. Applied longitudinal data analysis: modeling change and event occurrence (pp. 75–137). Oxford: Oxford University Press.
Thurstone, L. L. (1919). The learning curve equation. Psychological Review, 34, 278–286.
van der Linden, W. J. (2009). Constrained adaptive testing with shadow tests. Elements of adaptive testing (pp. 31–55). New York: Springer.
Versteeg, M., Hendriks, R. A., Thomas, A., Ommering, B. W. C., & Steendijk, P. (2020). Conceptualising spaced learning in health professions education: A scoping review. Medical Education, 54(3), 205–216.
Wang, L., Zhang, Z., McArdle, J. J., & Salthouse, T. A. (2008). Investigating ceiling effects in longitudinal data analysis. Multivariate Behavioral Research, 43(3), 476–496.
Wolfe, J. M., Evans, K. K., Drew, T., Aizenman, A., & Josephs, E. (2016). How do radiologists use the human search engine? Radiation Protection Dosimetry, 169(1–4), 24–31.
Wood, G., Batt, J., Appelboam, A., Harris, A., & Wilson, M. R. (2013). Exploring the impact of expertise, clinical history, and visual search on electrocardiogram interpretation. Medical Decision Making, 34(1), 75–83.
Acknowledgements
This work was funded by the U.S. Department of Defense Medical Simulation and Information Sciences Research Program Grant Number W81XWH-16-1-0797. The funding source played no role in the design and conduct of the study. The authors would like to acknowledge the support in developing the ECG dataset by the full grant team including Drs. Julie Friedman, Joseph Bennett, David Rhee, Barry Rosenzweig and Jeffrey Lorin. The elbow radiograph dataset is provided courtesy of Drs. Kathy Boutis and Martin Pecaric with that work having been funded by a Royal College of Physicians and Surgeons of Canada Medical Education Research Grant.
Appendices
Appendix 1
Variable Definition and Notation
See Table 4
Appendix 2
Comments on Local Independence
Conditional independence is an important limiting assumption of both IRT and traditional linear models (De Boeck, 2004). Specifically, the errors in a regression model are assumed to be independent of each other. This may be more or less plausible depending on what is conditioned on in the model. For instance, once we include the covariate vector \({\mathbf{x}}\) (the fixed effects for diagnosis, or for item, or for item order), the assumption applies only to the portion of the response not explained by these inputs. This makes the independence assumption more plausible, because responses are allowed to be more similar to each other within groups defined by the inputs (for instance, among responses to items with the same diagnosis).
A critical advantage of multilevel modeling is the ability to explicitly account for non-independence of observations beyond this. For instance, the random effects associated with each person p in our models allow for a specific type of correlation across measurements from the same person, beyond predicting that those measurements will share a similar average value. They allow us to make the more flexible assumption of a within-person correlation of measures.
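The paper's analyses were run in R with lme4; as a language-neutral illustration, the following Python sketch simulates data from a random-intercept, random-slope logistic learning-curve model of the kind described here. All parameter values are hypothetical, not the paper's estimates. It shows how shared learner-level effects induce within-person correlation even though responses are independent conditional on those effects.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical population parameters (illustrative, not fitted values)
beta0, beta1 = -0.5, 0.04        # fixed intercept and per-case learning rate
sigma_b0, sigma_b1 = 0.8, 0.02   # SDs of the learner-level random effects

n_learners, n_cases = 200, 50
b0 = rng.normal(0.0, sigma_b0, n_learners)  # random intercepts (baseline skill)
b1 = rng.normal(0.0, sigma_b1, n_learners)  # random slopes (learning rate)

t = np.arange(1, n_cases + 1)    # case order, identical for every learner
# Linear predictor: shared trajectory plus learner-specific deviations
eta = (beta0 + b0[:, None]) + (beta1 + b1[:, None]) * t
p = 1.0 / (1.0 + np.exp(-eta))   # inverse-logit link
y = rng.binomial(1, p)           # simulated correct/incorrect responses

# Conditional on (b0, b1) the responses are independent Bernoulli draws,
# but marginally the shared effects make a learner's responses correlated:
# learners with a high intercept tend to be right more often on all cases.
learner_accuracy = y.mean(axis=1)
print(learner_accuracy.std())    # between-learner spread induced by b0, b1
```

In lme4 terms, this data-generating process corresponds to a model with correlated-free random intercepts and slopes for each learner.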
Our models do not include this type of explicit correlation within items or within diagnoses. However, in a learning context with longitudinal data, we expect the knowledge gained on early items to improve learners’ accuracy on subsequent items, especially items of the same diagnosis. For our final model in Eq. 14 ("Discussion" section), we have assumed that a learner’s responses at a given time are independent of previous responses, conditional on the learner-specific parameters (learning rate and baseline knowledge). In the data used for this paper, items were reviewed by the learners in a random order, and we measured the learning trajectory adjusted for the difficulty of the items within diagnosis. By including fixed effects for diagnosis, we ameliorate some of the problem in that we allow items with the same diagnosis to share a common mean. Additional correlation may nonetheless exist, and the time trajectory could induce further correlation.
Future models could consider other covariance structures to account for response dependence, for example by assuming that the learning rate at future times depends on the amount of learning up to that point in time. Another strategy would be to include random effects for diagnosis, thereby allowing us to distinguish specific item effects within the different diagnosis groups. Our intent is not to map out all possibilities but rather to point out that the multilevel modeling context has the flexibility to address a limiting assumption of the traditional logistic model.
Appendix 3
Model Evaluation
This appendix evaluates and compares the models used to develop the nonlinear learning trajectories – the multilevel logistic models from Eqs. 6, 8 and 14. The primary goal of this comparison is to evaluate whether adjusting for the diagnosis helps explain the learning in greater detail and provides more accurate learner-level predictions.
We first evaluate relative model performance by comparing the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) across the models. As we add informative predictors to our models, we expect these measures to improve (i.e., decrease). As Table 5 demonstrates, both criteria do indeed favor the learning trajectory models that incorporate diagnosis information.
See Table 5
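As a concrete illustration of this kind of comparison (with toy data and a crude grid-search fit, not the paper's models or estimates), the sketch below computes AIC = 2k − 2·logL and BIC = k·ln(n) − 2·logL for an intercept-only logistic model versus one that includes a practice trend. A better-specified model should show lower values of both criteria.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: accuracy improves with the number of cases practiced
n = 500
t = rng.uniform(0, 50, n)
y = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + 0.05 * t))))

def loglik(p):
    """Bernoulli log-likelihood of the observed responses under fitted p."""
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Model 1: intercept only (k = 1 parameter, the overall proportion correct)
ll1, k1 = loglik(np.full(n, y.mean())), 1

# Model 2: logistic trend in practice, fit by a crude grid search (k = 2)
ll2, k2 = -np.inf, 2
for b0 in np.linspace(-2, 0, 21):
    for b1 in np.linspace(0, 0.1, 21):
        ll2 = max(ll2, loglik(1 / (1 + np.exp(-(b0 + b1 * t)))))

aic = lambda ll, k: 2 * k - 2 * ll
bic = lambda ll, k: k * np.log(n) - 2 * ll
print(aic(ll1, k1), aic(ll2, k2))  # the trend model attains a lower AIC
print(bic(ll1, k1), bic(ll2, k2))  # and a lower BIC despite its extra parameter
```

In practice these criteria are read directly off the fitted lme4 objects rather than computed by hand; the point is only that the likelihood gain from an informative predictor outweighs the parameter-count penalty.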
To further evaluate our models, we can perform simulations to generate new data corresponding to the assumptions implicit in each model (Gelman & Hill, 2007a). For each simulated dataset we calculate relevant summaries of the data. We use two such statistics: the mean and the standard deviation of the total number of correct responses at the population level (across learners; see Fig. 10, Elbow Dataset). We then compare the same statistics computed on our observed data to the distribution of statistics formed from the simulated datasets. This evaluation helps us understand whether our modeling assumptions are reasonable. Said another way, we can see whether it is plausible that our observed data arose from a world meeting the assumptions of the model.
See Fig. 10.
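A minimal version of this predictive simulation can be sketched as follows (in Python rather than the R used for the paper; `p_hat` stands in for a fitted model's per-response probabilities and is generated here as toy data):

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_summaries(p_hat, n_sims=1000):
    """Draw replicated datasets from fitted per-response probabilities
    p_hat (learners x cases) and return, for each replicate, the mean and
    standard deviation across learners of the total-correct score."""
    means = np.empty(n_sims)
    sds = np.empty(n_sims)
    for s in range(n_sims):
        y_rep = rng.binomial(1, p_hat)   # one simulated dataset
        totals = y_rep.sum(axis=1)       # total correct per learner
        means[s], sds[s] = totals.mean(), totals.std()
    return means, sds

# Toy stand-in for fitted probabilities (100 learners x 40 cases)
p_hat = np.clip(rng.beta(2, 2, size=(100, 40)), 0.01, 0.99)
means, sds = simulate_summaries(p_hat)

# With real data, compare the observed statistics to these distributions;
# an observed value far in the tails signals a misfitting model.
obs_mean = 20.0  # hypothetical observed mean total-correct score
print(np.mean(means <= obs_mean))  # a simple predictive p-value
```

A model whose assumptions match the data should place the observed mean and standard deviation well within the body of the simulated distributions, as examined in Fig. 10.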
Figure 10a and b show the results for our two test statistics across the three models compared in Table 2. Although the complete pooling approach closely reproduces the mean of the total correct responses, it underestimates the standard deviation of this summary statistic by an important margin. When we include random intercepts and slopes, the additional variance terms correct this behavior: we obtain a better fit not only to the standard deviation but to the mean as well. Both multilevel models closely reproduce the first summary statistic, and the second falls well within the simulated predictive distributions for both models. The datasets and code necessary to repeat these analyses are available at: https://github.com/IlanReinstein/ecg_paper
Cite this article
Reinstein, I., Hill, J., Cook, D.A. et al. Multi-level longitudinal learning curve regression models integrated with item difficulty metrics for deliberate practice of visual diagnosis: groundwork for adaptive learning. Adv in Health Sci Educ 26, 881–912 (2021). https://doi.org/10.1007/s10459-021-10027-0