Center for the Study of Issues in Public Mental Health

Identifying Prognostic Factors That Predict Recovery in the Presence of Loss-to-Follow-up.

Investigators: Christina Drake, Ph.D., Richard Levine, Ph.D. and Eugene Laska, Ph.D.

PROJECT GOALS

Loss to follow-up is a common occurrence in longitudinal studies, and the longer the follow-up period, the greater the chance it will occur. In the context of health services research, it is entirely possible that the distribution of outcomes among subjects in the uncensored group is not representative of the distribution in the entire baseline cohort. Thus, without making rather strong assumptions about the nature of the censoring mechanism (ignorable missingness) or obtaining more information about why subjects are missing, it may be erroneous to base estimates of the outcome distribution, and an assessment of the prognostic effects of variables measured at baseline, on data from the uncensored group alone. Our goal was to develop statistical techniques that can be used to:

    a. estimate outcomes, and

    b. identify important prognostic factors of outcome when there is non-ignorable loss to follow-up in a study.                      

RESEARCH ACTIVITIES AND RESULTS

We have adopted Rubin's model for causal inference (Holland, 1986) to provide a conceptual basis for assessing prognostic effects in the presence of potential confounders. We assume that data on candidate baseline prognostic factors are available for all subjects in the study. Our approach is to use candidate baseline prognostic factors to stratify subjects, so that similar outcomes are expected for subjects in the same stratum. In particular, the expected outcomes for censored individuals are the same as the expected outcomes for uncensored individuals who are in the same stratum. Therefore, outcomes may be imputed for the censored individuals based on model fits obtained from the uncensored individuals. A global assessment of the importance of candidate prognostic factors may then be made using the baseline and outcome data of the uncensored subjects together with the baseline data and imputed outcomes of the censored subjects.

The method we have developed is based on the propensity score, introduced by Rosenbaum and Rubin (1983). In the present context, the propensity score is the probability of being censored given a set of covariates. Let y be an outcome variable, x a vector of covariates and z the censoring variable (censored=1, not censored=0). The propensity score is

e(x, y) = p(z = 1 | x, y).

Rosenbaum and Rubin (1983) showed that conditional on the propensity score, the joint distribution of the covariates (y, x) is independent of the censoring status. That is,

p(x, y | z, e(x, y)) = p(x, y | e(x, y)) (1)

which provides the basis for identifying outcome-equivalent groups. They showed that Equation 1 is asymptotically true for any consistent estimator of the propensity score. That is, estimating the propensity score does not introduce further biases. Unfortunately, in situations with loss to follow-up, the propensity score cannot be estimated directly, since the outcome is missing for some subjects. To overcome this difficulty, we have developed an iterative estimation procedure. Denote the outcome of interest in the complete data by Y, which comprises the outcomes from individuals not censored, Ync, and the missing outcomes from censored individuals, Yc. Thus Y = {Yc, Ync}. The procedure is:

a. Estimate the propensity score using a logistic regression model and only the baseline variables x. That is, estimate ec(x) = p(z = 1| x) and denote the estimate by êc(x).

b. Stratify the sample based on values of the estimated probabilities of being censored. That is, for each subject in the cohort, substitute the baseline value x into the propensity logistic regression to obtain the estimate êc(x), and use these values to form subgroup strata.

c. Within each stratum, estimate the distribution of the outcome variable given the baseline covariates. That is, fit a regression model, y(x), in each stratum, using the data from the uncensored subjects in that stratum.

d. For each censored patient, impute the missing outcome value yc, denoted ŷc, by substituting the baseline value x into the regression model y(x) from the stratum to which the patient belongs, as determined in step b.

e. Re-estimate the propensity score using a logistic regression model and both the baseline variables x and the known and estimated outcome variables. That is, estimate e(ync, ŷc, x) = p(z = 1 | ync, ŷc, x).

f. Repeat steps b - e, substituting e(ync, ŷc, x) for ec(x).
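
The steps above can be sketched in code. The following is only a minimal illustration under several assumptions that are not in the text: a single baseline covariate, a straight-line outcome regression within each stratum, equal-size strata formed by ranking the estimated scores, a fixed number of repetitions in place of a convergence rule, and a plain gradient-ascent logistic fit. All function and parameter names (impute_outcomes, n_strata, n_iter and so on) are hypothetical.

```python
import math
import random


def sigmoid(s):
    # Clamp the linear predictor so math.exp cannot overflow.
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, s))))


def fit_logistic(rows, z, steps=300, lr=0.1):
    """Logistic regression by gradient ascent; returns [intercept, coefs...]."""
    k = len(rows[0]) + 1
    w = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for r, zi in zip(rows, z):
            f = [1.0] + r
            resid = zi - sigmoid(sum(wj * fj for wj, fj in zip(w, f)))
            for j in range(k):
                grad[j] += resid * f[j]
        w = [wj + lr * g / len(rows) for wj, g in zip(w, grad)]
    return w


def fit_line(xs, ys):
    """Least-squares intercept and slope; slope 0 if x has no spread."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    if sxx == 0.0:
        return my, 0.0
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def impute_outcomes(x, y, z, n_strata=5, n_iter=3):
    """x: baseline covariate, y: outcome (None if censored), z: 1 = censored."""
    n = len(x)
    observed = [i for i in range(n) if z[i] == 0]
    if not observed:
        raise ValueError("need at least one uncensored subject")
    y_hat = list(y)
    rows = [[xi] for xi in x]                    # step a: baseline variables only
    for _ in range(n_iter):                      # step f: repeat (fixed count, assumed)
        w = fit_logistic(rows, z)                # step a on first pass, step e after
        score = [sigmoid(w[0] + sum(c * v for c, v in zip(w[1:], r))) for r in rows]
        order = sorted(range(n), key=lambda i: score[i])
        stratum = [0] * n
        for rank, i in enumerate(order):         # step b: equal-size strata by rank
            stratum[i] = rank * n_strata // n
        for s in range(n_strata):
            members = [i for i in range(n) if stratum[i] == s]
            # Assumed fallback: pool all uncensored subjects if a stratum has none.
            fit_on = [i for i in members if z[i] == 0] or observed
            a, b = fit_line([x[i] for i in fit_on], [y[i] for i in fit_on])  # step c
            for i in members:
                if z[i] == 1:
                    y_hat[i] = a + b * x[i]      # step d: impute censored outcome
        rows = [[xi, yi] for xi, yi in zip(x, y_hat)]  # inputs for step e
    return y_hat


# Simulated cohort in which censoring depends on the outcome itself.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y_true = [1 + 2 * xi + rng.gauss(0, 0.5) for xi in x]
z = [1 if rng.random() < sigmoid(-1 + 0.5 * yi) else 0 for yi in y_true]
y = [None if zi else yi for yi, zi in zip(y_true, z)]
y_completed = impute_outcomes(x, y, z)
```

Observed outcomes are returned unchanged and only the censored entries are filled in. In practice the number of strata, the form of the outcome model, and the stopping rule would all need to be chosen for the data at hand.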

When the process is complete, a regression ync(x) can be computed, without stratification, using the uncensored subjects only. This can be compared with a regression computed, again without stratification, using all of the data together with the imputed outcome values for the censored subjects. The distributions of outcome in the censored and uncensored subjects can also be compared.
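
As a rough illustration of this comparison, the sketch below fits a straight line to the uncensored subjects alone and to the full cohort with imputed outcomes, and reports the mean outcome in each group. It assumes a single covariate and uses made-up toy numbers; the names ols_line and compare_fits are hypothetical, and the completed outcome vector is taken as given.

```python
from statistics import mean


def ols_line(xs, ys):
    """Least-squares (intercept, slope) for one covariate."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def compare_fits(x, y_completed, z):
    """y_completed holds observed outcomes (z = 0) and imputed ones (z = 1)."""
    unc = [i for i in range(len(x)) if z[i] == 0]
    cen = [i for i in range(len(x)) if z[i] == 1]
    return {
        # Regression from the uncensored subjects only.
        "uncensored_only": ols_line([x[i] for i in unc],
                                    [y_completed[i] for i in unc]),
        # Regression from all subjects, imputed outcomes included.
        "all_with_imputed": ols_line(x, y_completed),
        # Simple summaries of the two outcome distributions.
        "mean_uncensored": mean(y_completed[i] for i in unc),
        "mean_censored_imputed": mean(y_completed[i] for i in cen),
    }


# Toy data, invented purely for illustration.
x_demo = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
z_demo = [0, 0, 1, 0, 1, 0]
y_demo = [1.0, 3.1, 5.5, 6.9, 9.8, 11.0]
report = compare_fits(x_demo, y_demo, z_demo)
```

A marked difference between the two fitted lines, or between the two group means, would suggest that the uncensored subjects alone do not represent the cohort.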

 

SIGNIFICANCE OF FINDINGS

Typically, the identification of prognostic factors is based on data from those individuals in the cohort who are not lost or censored. If the majority of subjects are uncensored, it is unlikely that estimates will be biased. But if the percentage of censored individuals is large, the chance of erroneous inference may be large. Bias is not a problem if the mechanism that governs loss to follow-up is statistically independent of the outcome measure. However, if this assumption does not hold, the possibility of bias must be considered. This method overcomes this problem and enables valid estimates to be made even in the presence of non-ignorable censoring.

PLANS

Further work is necessary to assess the validity of the method under different types of informative censoring. This work will be done using simulation approaches. We have begun to analyze a data set, the WHO/NKI-sponsored international outcome study of schizophrenia (ISoS), and further analysis using this method will be carried out. The results need to be written up for publication.

Publications and Presentations:

The methods were presented at the annual meeting of the American Statistical Association in August of 1998. An abstract is published in the proceedings of the meeting.

A chapter describing the method as applied to data from the WHO-sponsored study on the long-term course and outcome of schizophrenia will appear in Prospects for Recovery in Schizophrenia: Results from a WHO-Collaborative Investigation, edited by Center co-director Kim Hopper, Ph.D., et al., in conjunction with the SSED at NKI and WHO.

Project completed, March 1999.
