

The Journals of Gerontology Series A: Biological Sciences and Medical Sciences 63:1246-1256 (2008)
© 2008 The Gerontological Society of America
Creating a Computer Adaptive Test Version of the Late-Life Function and Disability Instrument
Alan M. Jette,
Stephen M. Haley,
Pengsheng Ni,
Sippy Olarsch and
Richard Moed
1 Health & Disability Research Institute, Boston University School of Public Health, Boston, Massachusetts.
2 CREcare, LLC, Boston, Massachusetts.
Address correspondence to Alan M. Jette, PhD, PT, Health & Disability Research Institute, Boston University School of Public Health, 580 Harrison Ave., 2nd Floor, Boston, MA 02118. E-mail: ajette{at}bu.edu
Abstract
Background. This study applied item response theory (IRT) and computer adaptive testing (CAT) methodologies to develop a prototype function and disability assessment instrument for use in aging research. Herein, we report on the development of the CAT version of the Late-Life Function and Disability Instrument (Late-Life FDI) and evaluate its psychometric properties.
Methods. We used confirmatory factor analysis, IRT methods, validation, and computer simulation analyses of data collected from 671 older adults residing in residential care facilities. We compared accuracy, precision, and sensitivity to change of scores from CAT versions of two Late-Life FDI scales with scores from the fixed-form instrument. Score estimates from the prototype CAT versus the original instrument were compared in a sample of 40 older adults.
Results. Distinct function and disability domains were identified within the Late-Life FDI item bank and used to construct two prototype CAT scales. Using retrospective data, scores from computer simulations of the prototype CAT scales were highly correlated with scores from the original instrument. The results of computer simulation, accuracy, precision, and sensitivity to change of the CATs closely approximated those of the fixed-form scales, especially for the 10- or 15-item CAT versions. In the prospective study, each CAT was administered in <3 minutes and CAT scores were highly correlated with scores generated from the original instrument.
Conclusions. CAT scores of the Late-Life FDI were highly comparable to those obtained from the full-length instrument with a small loss in accuracy, precision, and sensitivity to change.
Key Words: Outcome assessment (Health Care), Geriatrics, Rehabilitation
Difficulty with physical functioning and disability are widely recognized as serious problems among older adults and predict nursing home admission, hospitalization, physician utilization, overall dependency, and mortality (1–4). Consequently, physical functioning and disability assessment have become standards in the evaluation of older persons in gerontological research.
Although the field has seen great improvement over the past 30 years in patient-reported instruments that assess function and disability, the length and complexity of many fixed-form instruments are problematic and raise concerns over respondent burden and administration costs (5,6). The shift to shorter fixed-form versions of patient-reported instruments has raised concern over resultant loss of precision and insensitivity to clinically meaningful change (7). These well-recognized limitations are largely caused by the defining signature of traditional instruments that consist of a fixed set of questions that must be administered to all participants. Respondents are often frustrated by redundant items and items that to them are of low salience and relevance (8–10). Consequently, no function and disability measure can be considered as a "gold standard."
Contemporary methods of instrument construction coupled with new data collection models provide a promising means for simultaneously achieving measurement breadth, precision, and sensitivity across the broad range of function for older adults while reducing the burden and cost of data collection. Outcome measurement has seen important advances with the introduction of item response theory (IRT) methods (11–14), which have allowed researchers to develop quantitative scales that are sensitive to the smaller functional change often seen in older adults. Nevertheless, IRT methods alone have been insufficient to balance comprehensiveness of coverage, precision, and sensitivity against assessment feasibility. Recently, computer adaptive testing (CAT) methods have been introduced into the health field as a potential solution to this measurement dilemma (15,16). Adaptive testing approaches tailor the assessment to the current level of function or disability of the older adult so that only items that are neither too hard nor too easy are administered. In CAT administration, the program uses the response to an initial question to establish a general range of likely function. Subsequent questions are selected through application of algorithms to progressively refine the estimated score to the range of precision established a priori by the examiner. Regardless of the actual items administered, all scores are on the same scale, which supports comparisons across time or across groups of individuals with different levels of current functional performance. A detailed discussion of IRT and CAT methodology is available elsewhere (17).
The development of a CAT requires: (i) a set of items (item bank) that examines each outcome; (ii) items that scale consistently on a single dimension from low to high ability; and (iii) rules to guide starting, stopping, and scoring. Although CAT offers a potential solution to the conflict between psychometric adequacy and measurement feasibility, the psychometric properties of CAT instruments must be demonstrated empirically. In the present study, we conducted analyses to identify distinct conceptual domains within the Late-Life Function and Disability Instrument (Late-Life FDI), calibrated quantitative scales, and created a prototype CAT version of the instrument. We conducted computer simulation studies of two CAT scales to assess their accuracy, precision, and sensitivity to change. In a small prospective study, we estimated the comparability of CAT scores with fixed-form scales of the Late-Life FDI and calculated the time to complete each CAT.
METHODS
Samples
CAT development sample.--
Participants for confirmatory factor analysis, IRT analyses, and simulation studies come from the Promoting Independence in Residential Care (PIRC) study that examined outcomes of a functional training intervention in 671 individuals residing in 32 residential care facilities in Auckland and Christchurch, New Zealand. Baseline data from the PIRC study were used to assess the underlying structure of the Late-Life FDI and for simulation studies of the accuracy and precision of the CAT version of the Late-Life FDI. Although 671 participants completed the baseline full-length instrument, there were 476 participants who finished the 12-month follow-up and whose data were used to assess the CAT scales' sensitivity to change. Because the attrition of participants could affect the generalizability of findings, comparisons of background information between completers and noncompleters were performed.
Pilot study sample.--
We recruited a sample of 40 older adults from the greater Boston area with limitations in function and/or physical disabilities who were participants in previous studies conducted by the Boston University Health & Disability Research Institute. Data from these participants were used to calculate the time needed to complete each CAT and to compare scores generated from the CAT and fixed-form versions of the Late-Life FDI.
Instrument
The Late-Life FDI.--
The fixed-form Late-Life FDI consists of 64 items that provide a patient-reported assessment of distinct function and disability domains in older adults (18–22). The Function component assesses patient-reported difficulty in performing 32 functional tasks. The Disability component, which contains 32 life activities, assesses two dimensions: (i) frequency (16 items – "how often..."), and (ii) limitation (16 items – "how limited..."). Two approaches to assessing disability allow older persons to respond differently to questions of what they actually do in daily life versus what they are capable of doing. Raw logit scores for each domain were transformed to summary scale scores that were transformed to a t-scale where a score of 50 is the mean with 10 as the standard deviation (SD). Based on a one-parameter Rasch model, each domain was scored on a similar metric where higher scores reflected better function or less disability (23). The Late-Life FDI is being used in gerontological research (24) although some concern has been reported regarding the 20- to 30-minute administration time (25).
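The rescaling to a metric with a mean of 50 and an SD of 10 can be illustrated with a minimal sketch. The function name `to_t_scale`, the example logits, and the use of the population SD of the supplied sample as the reference are assumptions made for illustration; the reference distribution actually used for the instrument is not specified here.

```python
from statistics import mean, pstdev

def to_t_scale(logits, ref_mean=None, ref_sd=None):
    """Linearly rescale logit scores so the reference mean maps to 50 and one SD to 10.

    If no reference values are given, the mean and population SD of the
    supplied scores are used (an assumption made for this illustration).
    """
    m = mean(logits) if ref_mean is None else ref_mean
    s = pstdev(logits) if ref_sd is None else ref_sd
    return [50.0 + 10.0 * (x - m) / s for x in logits]

# Hypothetical logit scores for five respondents.
scores = to_t_scale([-1.2, -0.3, 0.0, 0.4, 1.1])
```

Because the transformation is linear, the ordering of respondents and the relative distances between them are unchanged; only the unit and origin of the scale differ.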
Analyses
Late-Life FDI domains.--
We tested the underlying structure of the Late-Life FDI items in a series of confirmatory factor analyses (26) and used MPlus software (Muthén & Muthén, Los Angeles, CA) to evaluate item loadings and residual correlations between items (27). Because the data were not normally distributed, we used weighted least squares estimation, which is based on polychoric correlations. We used multiple criteria to check model fit: (i) the chi-square to degrees of freedom ratio, comparative fit index (CFI), Tucker–Lewis Index (TLI), and root mean square error of approximation (RMSEA) (CFI and TLI compare the tested model with a baseline null model, and values range from 0 to 1 [>0.95 suggests acceptable fit]; RMSEA assesses misfit per degree of freedom, and values <0.08 suggest an acceptable fit, whereas values <0.05 suggest very good fit); (ii) the magnitude of the factor loadings on the primary factor; and (iii) residual correlations greater than ±0.2 (a higher residual correlation indicates that the primary factor could not fully explain the correlation between items, that is, a violation of the local independence assumption).
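As a rough illustration of how these cutoffs can be applied, the sketch below computes a point estimate of RMSEA from a chi-square statistic and labels the fit, reading the CFI and TLI cutoff as values above 0.95. The formula is the conventional one for maximum likelihood chi-squares; estimators that adjust the chi-square and degrees of freedom may report slightly different values, so this is not a reproduction of the software output. The function names and the example numbers are hypothetical.

```python
import math

def rmsea(chi2, df, n):
    """Conventional RMSEA point estimate from chi-square, degrees of freedom, and sample size."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def fit_summary(chi2, df, n, cfi, tli):
    """Apply the fit cutoffs quoted in the text to one fitted model."""
    r = rmsea(chi2, df, n)
    if r < 0.05:
        label = "very good"
    elif r < 0.08:
        label = "acceptable"
    else:
        label = "poor"
    return {
        "chi2_df_ratio": chi2 / df,
        "rmsea": r,
        "rmsea_label": label,
        "cfi_tli_acceptable": cfi > 0.95 and tli > 0.95,
    }

# Hypothetical values, not taken from the study's tables.
result = fit_summary(chi2=1200.0, df=464, n=671, cfi=0.96, tli=0.97)
```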
We used weighted least squares means and variance-adjusted estimation methods, which are more precise when analyzing moderately sized samples with skewed categorical data (26,28). To determine the extent to which a unidimensional model adequately represented scale structure, we considered the eigenvalues associated with each factor extracted, item loadings on the primary factor, and results from the overall model fit tests.
Item calibrations.--
The item calibrations for each scale were estimated using the Rasch IRT model, which estimates the item difficulty parameters (29–31). The Rasch model was selected as the best solution for this phase of the project because of simplicity in interpretation and flexibility in adjusting to the underlying form of the population or trait distributions. Item calibrations from a Rasch partial credit model are estimates of each item's level of difficulty based on the item response pattern in the sample data. Using item calibrations for all items, we estimated IRT-based scores for each function and disability domain using weighted maximum likelihood estimation methods (23,28). We evaluated fit using the INFIT and OUTFIT statistics for each item based on the comparison of expected and observed value across the distribution of each latent variable; Bonferroni-corrected p values were used for significance testing. The scores estimated from the IRT model were standardized to have a mean of 50 (SD = 10).
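To make the role of the item difficulty parameters concrete, the following sketch computes response-category probabilities for a single item under the Rasch partial credit model. The step difficulties passed in are hypothetical values in logits, and the function names are illustrative; this is not the calibration software used in the study.

```python
import math

def pcm_probabilities(theta, thresholds):
    """Category probabilities for one item under the Rasch partial credit model.

    theta: person ability in logits.
    thresholds: step difficulties delta_1..delta_m in logits.
    Returns m + 1 probabilities for response categories 0..m.
    """
    cumulative = [0.0]  # category 0 has an empty sum
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    top = max(cumulative)  # subtract the maximum for numerical stability
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

def expected_item_score(theta, thresholds):
    """Model-expected item score at a given ability."""
    return sum(k * p for k, p in enumerate(pcm_probabilities(theta, thresholds)))

# Hypothetical four-category item with step difficulties at -1, 0, and 1 logits.
probs = pcm_probabilities(0.0, [-1.0, 0.0, 1.0])
```

Comparing these expected scores with the observed responses across the ability range is the basis of item fit statistics such as INFIT and OUTFIT.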
Differential item functioning.--
In IRT, a participant's score on an item should depend entirely on the latent variable being measured. Significant differential item functioning (DIF) indicates that variables other than the latent variable, such as age or gender, are likely influencing the response (32). There are two kinds of DIF: uniform DIF, in which the item response difference is constant across participants' ability levels, and non-uniform DIF, in which the item response difference changes across ability levels. Logistic regression was used to detect uniform and non-uniform DIF across gender and age and between the baseline and follow-up assessments using statistical significance and a cutoff of R² change >0.02. In these regressions, the dependent variable was the function/disability item score and the independent variables were the ability level as assessed by the Late-Life FDI, the background variable being examined for DIF (e.g., cognitive level), and the interaction between the background variable and the ability level estimate. In a DIF analysis, if the background variable effect is significant and the interaction term is not, then the item displays uniform DIF. If the interaction term is significant, then the item has non-uniform DIF. We assessed statistical significance with the likelihood ratio test, using Bonferroni-corrected p values. R² change was used to quantify the effect size of uniform and non-uniform DIF using the criterion based on the work of Jodoin and Gierl (33).
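The decision rule described above can be sketched as a small classification function. It takes as inputs the likelihood ratio test p values and R² changes that would come from the nested logistic regressions (fitting those regressions is not shown), and it assumes that an item is flagged only when both the Bonferroni-corrected significance test and the R² change cutoff are met. That joint reading, the function name, and the example values are assumptions for illustration.

```python
def classify_dif(p_group, p_interaction, r2_change_group, r2_change_interaction,
                 n_tests, alpha=0.05, r2_cutoff=0.02):
    """Label one item as 'non-uniform', 'uniform', or 'none'.

    p_group / r2_change_group: test of adding the background variable.
    p_interaction / r2_change_interaction: test of adding the interaction
    between the background variable and ability.
    n_tests: number of comparisons used for the Bonferroni correction.
    """
    threshold = alpha / n_tests  # Bonferroni-corrected significance level
    if p_interaction < threshold and r2_change_interaction > r2_cutoff:
        return "non-uniform"
    if p_group < threshold and r2_change_group > r2_cutoff:
        return "uniform"
    return "none"

# Hypothetical results for three items from a 32-item scale.
labels = [
    classify_dif(0.0001, 0.40, 0.030, 0.001, n_tests=32),
    classify_dif(0.5, 0.00001, 0.000, 0.050, n_tests=32),
    classify_dif(0.2, 0.30, 0.030, 0.030, n_tests=32),
]
```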
Development of the CAT program.--
After a final item pool was identified and item calibrations were generated for each domain, we constructed the Late-Life FDI CAT algorithms on the HDRI software developed at the Boston University Health & Disability Research Institute. The CATs were designed to be patient-reported and are administered from a stand-alone or laptop computer. We programmed the CATs to use weighted maximum likelihood score estimation, and we selected initial items from those in the middle of the function and disability range. The response to the first item was fed into the CAT algorithm, and the application calculated a probable score as well as a person-specific measure of how precise that score was. If the score was not estimated with sufficient precision, according to internal guidelines, additional questions were selected and administered until either the precision standard was reached or the defined maximum number of items had been administered. The CAT process is illustrated in Figure 1.

Figure 1. Example of Late-Life Function and Disability Instrument computer adaptive testing (Late-Life FDI CAT), using four items to meet stopping rule. CI = confidence interval.
Psychometric evaluation of the CAT.--
We conducted a series of simulations to estimate the CAT scales' accuracy, precision, validity, and sensitivity to change. In these simulations, responses to items selected by the CAT software were obtained for cases in the PIRC data set and were "fed" to the computer to simulate the conditions of an actual CAT assessment. As in an actual CAT, the simulation used the IRT model to select the best item to administer next, i.e., the one with the highest information function given the current score level, re-estimated the domain score and confidence interval, and decided whether to continue testing. To compare results from the CAT and fixed-length scales in the simulation studies, we used a fixed-stopping rule of 5, 10, and 15 items for the 32-item function scale, and 5 and 10 items for the disability scale because the disability limitation scale had only 16 items. For the prospective evaluation, we used a 10-item CAT and compared that to each fixed-form function and disability limitation scale.
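The simulation loop described above (select the most informative remaining item, look up the stored response, re-estimate the score and its precision, check the stopping rule) can be sketched as follows. This is a simplified illustration and not the HDRI software: it assumes dichotomous Rasch items instead of the polytomous partial credit items in the Late-Life FDI, and it swaps in a grid-based expected a posteriori (EAP) estimate with a standard normal prior in place of weighted maximum likelihood. The item bank, the responses, and all function names are hypothetical.

```python
import math

GRID = [g / 10.0 for g in range(-40, 41)]  # ability grid from -4 to 4 logits

def p_endorse(theta, difficulty):
    """Rasch probability of a positive response to a dichotomous item."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def information(theta, difficulty):
    """Rasch item information p(1 - p); largest when difficulty is close to theta."""
    p = p_endorse(theta, difficulty)
    return p * (1.0 - p)

def estimate(responses, difficulties):
    """EAP score and posterior SD for {item index: 0/1} under a standard normal prior."""
    posterior = []
    for theta in GRID:
        weight = math.exp(-0.5 * theta * theta)
        for item, x in responses.items():
            p = p_endorse(theta, difficulties[item])
            weight *= p if x == 1 else 1.0 - p
        posterior.append(weight)
    total = sum(posterior)
    mean = sum(t * w for t, w in zip(GRID, posterior)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, posterior)) / total
    return mean, math.sqrt(var)

def simulate_cat(full_responses, difficulties, max_items=10, se_target=None):
    """Replay stored answers as if the items had been administered adaptively."""
    answered = {}
    theta, se = 0.0, float("inf")  # start in the middle of the range
    while len(answered) < min(max_items, len(difficulties)):
        remaining = [i for i in range(len(difficulties)) if i not in answered]
        nxt = max(remaining, key=lambda i: information(theta, difficulties[i]))
        answered[nxt] = full_responses[nxt]  # "feed" the stored response
        theta, se = estimate(answered, difficulties)
        if se_target is not None and se <= se_target:
            break  # precision standard reached before the item limit
    return theta, se, list(answered)

# Hypothetical 10-item bank (difficulties in logits) and one stored response pattern.
bank = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
answers = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]
theta5, se5, used5 = simulate_cat(answers, bank, max_items=5)
theta10, se10, used10 = simulate_cat(answers, bank, max_items=10)
```

With a fixed-length stopping rule, `max_items` plays the role of the 5-, 10-, or 15-item limits; passing `se_target` instead stops as soon as the requested precision is reached. An approximate 95% confidence interval for a score is the estimate plus or minus 1.96 times its standard error.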
The accuracy and precision simulations were conducted in the baseline PIRC sample (N = 671). To assess CAT accuracy, Pearson correlations were calculated between each of the CAT-generated scores and the fixed-form Late-Life FDI domain scales to assess the extent to which simulated CAT scores were consistent with scores from the full-length instrument. To compare the relative precision of the CAT scores to scores from the full-length scales, we plotted the standard errors in relation to the ability scores in the sample.
The comparability of simulated CAT-based estimates in measuring change over time was examined within the development sample who had been administered each Late-Life FDI scale at baseline and at a 12-month follow-up assessment. Average scores, change scores, effect sizes, and standardized response means were compared (34). We did not examine responsiveness in relation to an external standard.
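For readers unfamiliar with these two indices, the sketch below uses their conventional definitions: the effect size divides the mean change by the SD of baseline scores, and the standardized response mean divides it by the SD of the change scores. The exact formulas applied in the study are those of the cited source and are not reproduced here; the function names and the scores in the example are hypothetical.

```python
from statistics import mean, stdev

def effect_size(baseline, follow_up):
    """Mean change divided by the SD of the baseline scores."""
    change = [f - b for b, f in zip(baseline, follow_up)]
    return mean(change) / stdev(baseline)

def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the change scores."""
    change = [f - b for b, f in zip(baseline, follow_up)]
    return mean(change) / stdev(change)

# Hypothetical scale scores for five participants at baseline and 12 months.
baseline = [50, 45, 55, 60, 40]
follow_up = [48, 44, 50, 59, 39]
es = effect_size(baseline, follow_up)
srm = standardized_response_mean(baseline, follow_up)
```

Computing both indices for the CAT scores and for the full-length scores on the same participants allows a direct comparison of sensitivity to change.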
Known-groups validity was assessed by calculating Spearman correlation coefficients (±95% confidence intervals) between Late-Life FDI CAT scales and measures of physical performance in the Elderly Mobility Scale (EMS) (35). We used quartiles to group scores from the EMS into <25th, 25th–49th, 50th–74th, and >75th.
Pilot study.--
The Late-Life FDI CATs and fixed-form scales were administered by telephone interview with a sample of community-dwelling older adults with disabilities. Pearson correlations were calculated between each of the CAT-generated scores and the fixed-form Late-Life FDI scales to assess the extent to which CAT scores were consistent with scores from the original instrument.
We provided interviewers with formal training in the administration of the CAT. We had an internal clock to track the amount of time and the number of items needed to meet preset levels of precision for each CAT. Participants were told that they were being asked to help us evaluate a new instrument and that similar items would be asked. Demographic information (ethnicity, sex, age, and functional level) was available for each participant. All procedures were approved by the Institutional Review Board at Boston University Medical Center.
RESULTS
In the baseline PIRC sample, the average age was 84.1 years (SD = 7.4) and 73% were female (Table 1). There were no statistically significant differences in mean age, gender, or Late-Life FDI scores between participants who completed the 12-month follow-up and those who dropped out.
Function and Disability Domains
We tested several different models within each domain of the Late-Life FDI. In the function domain, a one-dimensional model across all 32 items achieved an acceptable level of fit, explained 57% of the variance, and was easily interpretable (Table 2). In this one-dimensional function model, 2.2% of the residual covariances were greater than ±0.2, which indicated that local independence was acceptable. The 16 items in the disability limitation scale also fit a one-dimensional model (Table 2). In this model, no residual covariance was greater than ±0.2, which means that the local independence assumption was satisfied. In contrast, the 16-item disability frequency scale did not fit an acceptable one-dimensional model. Even when the disability domain was broken into two scales, the level of fit was not acceptable. After removing five items from the frequency scale, the one-dimensional model achieved an acceptable degree of fit, resulting in an 11-item disability frequency scale. With only 11 items, a disability frequency CAT was not developed.
Item Calibrations
The distribution of the Late-Life FDI function and disability scale scores in this sample is displayed in Figures 2 and 3 along with the distribution of response categories at each level of the scoring distribution. For the function scale at baseline, one member of the sample displayed floor and ceiling effects, whereas two participants had a ceiling effect at 12 months. At baseline, in the disability limitation scale, 17 participants (2.5% of the sample) exhibited a ceiling effect, but no floor effects were seen at either time point. For the disability frequency scale, one participant had a floor effect, but no participants displayed a ceiling effect at baseline or at the 12-month follow-up. Item calibrations in each domain of the Late-Life FDI are displayed in Table 3.
DIF
One item in the function scale (unscrewing a jar lid) displayed gender DIF. Because this item displayed only marginal DIF (an R² change of <0.0284) and because of the importance of the content of this item, we decided to retain the item in the function scale. The effect of this item on the CAT simulation results was very small. If gender DIF is identified in future analyses of an expanded function item bank, the analysis can be stratified by gender. There were no function scale items that showed DIF across level of cognitive impairment or across the two assessment time points.
There were no items that showed gender, cognitive impairment, or age DIF within the disability limitation scale. However, logistic regression detected significant DIF between the baseline and 12-month assessments for the Active Recreation item.
Accuracy, Precision, Validity, and Sensitivity to Change of the CAT
As Table 4 displays, the correlation among scores based on the fixed-form versions and the 5-, 10-, 15-item function domain CATs and the 5- and 10-item disability limitation domain CATs ranged from 0.90 to 0.99 in the baseline sample and at comparably high levels in the 12-month sample, indicating a high and consistent degree of accuracy using CATs.
Table 4. Intraclass Correlations Between Fixed-Form Late-Life FDI and CAT Simulation Scores in the Baseline and Follow-Up Samples.
In the function and disability limitation CATs, the standard errors of the 5-item CATs were consistently larger than those of the 10- and 15-item CATs across all ranges (Figures 4 and 5), indicating less precision for the 5-item CAT. The CAT standard errors were only slightly larger than those from the full-length version, reflecting the smaller number of items used to calculate the overall score. For both scales, the standard errors were greater at extreme score ranges.

Figure 4. Plot of standard error of participant scores based on 5- and 10-item Late-Life disability computer adaptive testing (CAT) compared with all items for that scale.
Figure 5. Plot of standard error of participant scores based on 5-, 10-, and 15-item Late-Life function computer adaptive testing (CAT) compared with all items for that scale.
The ability of each CAT version to discriminate between groups of older adults on the basis of mobility limitations was evaluated by comparing average scores on the function and disability CATs across participant groups that scored in different quartiles on the EMS. As hypothesized, the CAT average scores were significantly different for participants in lower EMS quartiles as compared with scores for participants in higher EMS quartiles, an indication of the CAT's validity (Table 5).
The descriptive statistics and SD values of baseline and follow-up scores from the 5- and 10-item CATs were quite similar to those from the original version of the Late-Life FDI for all domains (Table 6). The effect sizes and standardized response mean values for both CAT versions are somewhat smaller than those for the full-length version of the instrument.
Pilot Study
The function CAT took 2.5 minutes (SD = 0.87) on average, and the disability limitation CAT took 2.8 minutes (SD = 1.4) on average, to complete by telephone administration, compared with an average of 20 minutes to complete the fixed-form versions of both scales of the Late-Life FDI. Scores estimated by each CAT scale were highly correlated with the scores generated by the fixed-form versions of each scale: the correlation was 0.94 between the function CAT and the fixed-form function scale and 0.82 between the disability CAT and the disability fixed-form scale (Table 7).
DISCUSSION
The results of these analyses revealed that prototype CAT models built for the function and disability domains of the Late-Life FDI instrument provided accurate, precise, valid, and sensitive estimates of late-life function and disability while reducing administrative burden. Two 10-item CATs were administered by telephone in 5.4 minutes, on average, compared with 20 minutes for the fixed-length scales. Although preliminary, the results from the present study are encouraging in that they demonstrate that the goal of a patient-reported function and disability assessment is achievable with little sacrifice of psychometric quality while reducing data collection time and administrative burden. Although further work is clearly needed to expand the Late-Life FDI item banks to improve breadth and depth of coverage and to refine the CAT scales, these results provide evidence that a CAT version of the Late-Life FDI offers the possibility of a patient-reported outcome measure that could be usefully applied across diverse populations of older adults to monitor change in function and disability. These results are consistent with previous research with CAT models for function (36,37).
In comparing the Late-Life FDI structure revealed in this sample with that found in others, it is interesting to note that previous factor analysis and IRT analyses suggested a more complex structure underlying the Late-Life FDI than was revealed in this study. In prior work, the function component consisted of three subscales: upper extremity function, basic lower extremity function, and advanced lower extremity function. Analysis of the disability limitation domain revealed two subdomains (instrumental role and management role), and two different subdomains were identified in the disability frequency domain (social role and personal role). Other investigators suggested a somewhat different structure across the instrument (25,38). In contrast to these previous analyses, the confirmatory factor analysis findings in this study revealed a more parsimonious structure consisting of three subdomains: function (32 items), disability limitation (16 items), and disability frequency (11 items). However, given the small number of items that fit a one-dimensional solution in the disability frequency scale, we do not recommend the construction of a CAT version of this domain at this time. Further research is needed to expand the disability frequency item bank subdomain before a useful CAT version can be constructed.
One option would be to delete entirely the disability frequency domain within the Late-Life FDI. We believe, however, that the distinction between disability limitation and frequency of daily activity performance may be important and worth preserving. Recent analyses of a sample with knee pain and functional limitations, for example, revealed that factors within a person's community environment had a differential impact on frequency of activities in contrast to their perceived limitation in the same activities (39). Specifically, the presence of transportation facilitators in one's community was associated with less limitation in disability but was not associated with the frequency with which persons did those same activities. The finding may suggest that, regardless of the level of transportation resources available in one's environment, older persons continue to participate at the level of frequency they desire. Yet, more transportation resources make it easier for them to do so, resulting in feeling less limited.
One potential concern with the introduction of CAT instruments is that administering fewer items may contribute to a loss of sensitivity to change, which would greatly dampen enthusiasm for their use. As has been reported for other CAT instruments, the present findings reveal that modest, if any, sensitivity to change is lost compared with the full-length instrument as long as the CAT program has ≥10 items. This finding has been replicated in prospective studies of other CAT instruments that assess function in different samples (35,40). However, it should be noted that the 5-item CATs were less accurate, precise, and sensitive and therefore would not be recommended for many research applications. One of the advantages of CAT is that it allows the user to specify the level of score precision desired for a particular application. In a CAT model using a stopping rule based on a relaxed level of score precision, it is quite possible that the scores of some individuals might be estimated with <10 items. Also, in individual assessment, where high precision is desirable, a 15-item stopping rule or a criterion reflecting a smaller degree of measurement error might be more desirable. In contrast, for large-scale studies in which efficiency of administration is essential and less precision is required, even the 5-item CAT might be acceptable.
Scores from the function CAT were correlated 0.94 with those generated by the fixed-form scale, and a correlation of 0.82 was observed between the disability CAT and the disability fixed-form scale. Although a correlation of this magnitude between a CAT version and item bank is not unusual (35,41), because the pilot sample size was small, the observed correlations were influenced by a small number of participants. We found three participants who displayed response patterns that were inconsistent between the fixed-form and the CAT version of the Late-Life FDI. When we removed those three participants, the correlation between the disability CAT and the disability fixed-form scale rose to 0.934.
We note a number of limitations to this study. First, simulations of CAT scores, such as those reported in this study for the PIRC sample, are possible whenever data sets include responses to all items in an item pool, in this case, the fixed-form version of the Late-Life FDI. Simulations are based on the assumption that the answers to a subset of those items selected using CAT would be identical to the answers given if they were embedded in a larger fixed-form instrument. Such simulations are likely good (but not perfect) approximations of actual CAT administrations (42,43). Results from the pilot study sample, which used both a CAT version of the Late-Life FDI and the fixed-form version of the instrument, provide preliminary real-time evidence of levels of scoring comparability. Clearly, the results of this real-time comparison need to be replicated in a larger and more representative sample of older adults.
The findings from this study are further limited by the relatively small number of questionnaire items available in the fixed-form version of the Late-Life FDI. In future work, we plan to expand considerably the number of functions and disabilities in the Late-Life FDI item banks and construct an updated CAT instrument that will be able to draw from a more comprehensive pool of function and disability items to estimate an older person's function and disability levels. One of the advantages of CAT-based instruments is the ability to replenish (on a regular basis) the item pools underlying a CAT instrument (44). An important advantage of this process is that CAT instruments can be improved relatively quickly based on data from ongoing outcome assessments while scores from the different CAT versions of the same instrument are made comparable along a common scoring metric.
Finally, it should be noted that this analysis was performed on a sample of older adults living in residential care facilities in New Zealand. Both the New Zealand cohort and the Boston pilot study sample comprised persons with a restricted range of health and functional status and were not representative. This study needs to be replicated by repeated administrations of CAT instruments in prospective field studies with different types of older samples that are more representative of defined populations.
Conclusion
This study revealed that CAT methodology can be applied successfully to assess patient-reported functioning and disability in older adults. These preliminary findings suggest that the application of CAT methodology can reduce the time required for administration without significant loss of accuracy, precision, or sensitivity to change. Although further work is needed to expand and refine the item pools in all three outcome domains, the results suggest that the CAT approach offers a viable solution to the long-standing conflict between the need for accuracy in clinical assessment and the equal need for practicality of administration.
Acknowledgments
This work was supported by the National Institute on Aging/National Institutes of Health (R41 AG027620-01) and an Independent Scientist Award (K02 HD45354-01) to Dr. Haley.
Drs. Haley and Jette and Mr. Moed have stock interest in CREcare, LLC, which distributes the Late-Life Function and Disability Instrument products.
Footnotes
Decision Editor: Luigi Ferrucci, MD, PhD
Received November 16, 2007
Accepted February 29, 2008
References
- Department of Health and Human Services (U.S.). Healthy People 2010: With Understanding and Improving Health and Objectives for Improving Health. 2nd ed. Washington, DC: U.S. Government Printing Office; 2000.
- Guralnik JM, Simonsick EM, Ferrucci L, et al. A short physical performance battery assessing lower extremity function: association with self-reported disability and prediction of mortality and nursing home admission. J Gerontol. 1994;49:M85-M94.
- Guralnik JM, Ferrucci L, Simonsick EM, et al. Lower-extremity function in persons over the age of 70 years as a predictor of subsequent disability. N Engl J Med. 1995;332:556-561.
- Dunlop DD, Hughes SL, Manheim LM. Disability in activities of daily living: patterns of change and a hierarchy of disability. Am J Public Health. 1997;87:378-383.
- McHorney CA. Generic health measurement: past accomplishments and a measurement paradigm for the 21st century. Ann Intern Med. 1997;127:743-750.
- Ware JE, Jr. Conceptualization and measurement of health-related quality of life: comments on an evolving field. Arch Phys Med Rehabil. 2003;84(Suppl 2):S43-S51.
- Rubenach S, Shadbolt B, McCallum J, et al. Assessing health-related quality of life following myocardial infarction: is the SF-12 useful? J Clin Epidemiol. 2002;55:306-309.
- Chen AL-T, Broadhead WE, Doe EA, Broyles WK. Patient acceptance of two health status measures: the Medical Outcomes Study Short-Form General Health Survey and the Duke Health Profile. Fam Med. 1993;25:536-539.
- Beaton DE, Richards RR. Measuring function of the shoulder. A cross-sectional comparison of five questionnaires. J Bone Joint Surg Am. 1996;78:882-890.
- McHorney CA, Bricker DE, Jr. A qualitative study of patients' and physicians' views about practice-based functional health assessment. Med Care. 2002;40:1113-1125.
- Lord F. Applications of Item Response Theory to Practical Testing Problems. Hillsdale, NJ: Erlbaum Associates; 1990.
- van der Linden W, Hambleton R. Handbook of Modern Item Response Theory. New York: Springer-Verlag New York, Inc.; 1997.
- Hambleton R, Swaminathan H, Rogers H. Fundamentals of Item Response Theory. Newbury Park, CA: Sage Publications; 1991.
- Hambleton RK, Pitoniak MJ. Testing and measurement. Advances in item response theory and selected testing practices. In: Pashler H, Yantis S, Medin D, Gallistel R, Wixted J, eds. Stevens' Handbook of Experimental Psychology. New York: John Wiley & Sons, Inc.; 2002.
- Jette AM, Haley SM. Contemporary measurement techniques for rehabilitation outcomes assessment. J Rehabil Med. 2005;37:339-345.
- Cella D, Gershon R, Lai J-S, Choi S. The future of outcomes measurement: item banking, tailored short forms, and computerized adaptive assessment. Qual Life Res. 2007;16:133-141.
- Wainer H. Computerized Adaptive Testing: A Primer. Mahwah, NJ: Lawrence Erlbaum Associates; 2000.
- Haley SM, Ludlow LH, Kooyoomjian JT. Extending the range of functional assessment in older adults: development of the late-life function and disability instrument. J Aging Phys Act. 2002;10:453-465.
- Haley SM, Jette AM, Coster WJ, et al. Late life function and disability instrument: II. Development and evaluation of the function component. J Gerontol Med Sci. 2002;57A:M217-M222.
- Jette AM, Haley SM, Coster WJ, et al. Late life function and disability instrument: I. Development and evaluation of the disability component. J Gerontol Med Sci. 2002;57A:M209-M216.
- Sayers SP, Jette AM, Haley SM, et al. Validation of the late-life function and disability instrument (LLFDI). J Am Geriatr Soc. 2004;52:1-6.
- Dubuc N, Haley SM, Ni PS, et al. Function and disability in late life: comparison of the Late-Life Function and Disability Instrument to the Short-Form-36 and the London Handicap Scale. Disabil Rehabil. 2004;26:362-370.
- Warm TA. Weighted likelihood estimation of ability in item response theory. Psychometrika. 1989;54:427-450.
- Ouellette M, LeBrasseur NK, Bean JF, et al. High-intensity resistance training improves muscle strength, self-reported function and disability in long-term stroke survivors. Stroke. 2004;35:1404-1409.
- McAuley E, Konopack JF, Motl RW, et al. Measuring disability and function in older women: psychometric properties of the late-life function and disability instrument. J Gerontol A Biol Sci Med Sci. 2005;60:901-909.
- Mislevy RJ. Recent developments in the factor analysis of categorical variables. J Educ Stat. 1986;11:3-31.
- Muthen B, Muthen L. Mplus User's Guide. Los Angeles: Muthen & Muthen; 1998.
- Beauducel A, Herzberg PY. On the performance of maximum likelihood versus means and variance adjusted weighted least squares estimation in CFA. Struct Equat Model. 2006;13:186-203.
- Fischer G, Molenaar I. Rasch Models: Foundations, Recent Developments, and Applications. Berlin: Springer-Verlag; 1995.
- Andrich D. Rasch Models for Measurement. Beverly Hills, CA: Sage Publications; 1998.
- Masters GN. A Rasch model for partial credit scoring. Psychometrika. 1982;47:149-174.
- Swaminathan H, Rogers HJ. Detecting differential item functioning using logistic regression procedures. J Educ Meas. 1990;27:361-370.
- Jodoin MG, Gierl MJ. Evaluating type I error and power rates using an effect size measure with logistic regression procedure for DIF detection. Appl Measure Educ. 2001;14:329-349.
- Haley SM, Fragala-Pinkham MA. Interpreting change scores of tests and measures used in physical therapy. Phys Ther. 2006;86:735-743.
- Haley S, Siebens H, Coster W, et al. Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation: I. Activity outcomes. Arch Phys Med Rehabil. 2006;87:1033-1042.
- Jette A, Haley S, Tao W, et al. Prospective evaluation of the AM-PAC-CAT in outpatient rehabilitation settings. Phys Ther. 2007;87:385-398.
- Haley SM, Coster WJ, Andres PL, et al. Score comparability of short-forms and computerized adaptive testing: an illustration with the Activity Measure for Post-Acute Care (AM-PAC). Arch Phys Med Rehabil. 2004;85:661-666.
- Haley S, Ni P, Hambleton R, Slavin M, Jette A. Computer adaptive testing improved accuracy and precision of scores over random item selection in a physical functioning item bank. J Clin Epidemiol. 2006;59:1174-1182.
- Bean JF, Olveczky D, Sharon L, et al. Self-reported vs observed mobility performance: are the underlying factors identical? Journal of Gerontology: Medical Sciences: Blackwell Publishing. 2007 Annual Meeting of the American Geriatrics Society, Seattle, WA, May 2–6, 2007.
- White DK, Keysor JJ, Jette AM, et al. Which environmental factors in the community are associated with disability?: The MOST Study. Association of Rheumatology Health Professionals. Annual Meeting, Boston, MA, November 2007.
- Haley S, Gandek B, Siebens H, et al. Computer adaptive testing follow-up after discharge from inpatient rehabilitation. II. Participation outcomes. Arch Phys Med Rehabil. 2008;89:275-283.
- Ware J, Bjorner J, Kosinski M. Practical implications of item response theory and computerized adaptive testing: a brief summary of ongoing studies of widely used headache impact scales. Med Care. 2000;38(9 Suppl):I173-I182.
- Ware JE, Jr. Conceptualization and measurement of health-related quality of life: comments on an evolving field. Arch Phys Med Rehabil. 2003;84(Suppl 2):S43-S51.
- Haley S, Ni P, Jette A, et al. Replenishing a Computerized Adaptive Test (CAT) of patient reported outcomes. Qual Life Res. In press.