Estimating Population Size and the Number of Duplicates on Two Lists Based only on Birth Dates  

Co-Principal Investigators: Eugene Laska, Ph.D., Morris Meisner, Ph.D., Carole Siegel, Ph.D.

PROJECT GOALS:

Methods for estimating the size of a population are required when no simple way exists to enumerate it. The approaches that have received the most attention are capture-recapture methods. Suppose there are two lists of individuals, labeled A and B. The data are the number of distinct individuals on each list, $N_A$ and $N_B$, and the number common to both lists, $N_{AB}$. An estimate of the binomial probability, $p_B$, of being on one of the lists, say B, is given by $\hat{p}_B = N_{AB}/N_A$. The estimator of the size of the population from which the two lists A and B arise is therefore $N_B/\hat{p}_B = N_A N_B/N_{AB}$. Critical to the method is an accurate count of the number of duplicates, that is, the number of individuals on both lists.
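As a minimal sketch of the two-list arithmetic described above, the following Python function computes the estimate of $p_B$ and the population size estimate $N_A N_B/N_{AB}$. The function name and the counts in the example are hypothetical and chosen only for illustration; the sketch assumes the number of duplicates is known exactly, which is the situation the rest of this project does not take for granted.

```python
from typing import Tuple


def two_list_estimate(n_a: int, n_b: int, n_ab: int) -> Tuple[float, float]:
    """Return (p_b_hat, n_hat) for the two-list capture-recapture estimator."""
    if n_ab <= 0:
        raise ValueError("n_ab must be positive; the estimator is undefined otherwise")
    if n_ab > min(n_a, n_b):
        raise ValueError("n_ab cannot exceed the size of either list")
    p_b_hat = n_ab / n_a        # estimated probability of being on list B
    n_hat = n_a * n_b / n_ab    # algebraically the same as n_b / p_b_hat
    return p_b_hat, n_hat


# Hypothetical counts, for illustration only.
p_b_hat, n_hat = two_list_estimate(n_a=200, n_b=150, n_ab=30)
print(f"p_B estimate: {p_b_hat:.3f}, population estimate: {n_hat:.0f}")
```

The estimate is sensitive to the duplicate count: a small error in a small $N_{AB}$ changes the population estimate substantially, which is why an accurate count of duplicates matters.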

To obtain $N_{AB}$, individuals on the two lists need to be matched, and not uncommonly, identifying information on individuals on one or the other or both lists is either incomplete or in error. We consider an extreme case in which the only information available is the date of birth of each individual on each list. Clearly, with such limited information, it is not possible to specifically match those who appear on one list with individuals on the other. Our specific aims were to

 


RESEARCH ACTIVITIES AND RESULTS: 

Conditioning on the set of data from the two lists, we found the maximum likelihood estimator (MLE) of the duplication rate for members of one of the lists, and the variance of the estimator. These can be used to form a confidence interval for the number of individuals common to both lists. The MLEs of the size of the population and of the probabilities of appearing on each list, together with the asymptotic variances of these estimators, were obtained. Using the delta method, we derived estimators and variances of several natural duplication rates based on the parameters of the model.
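The following Python sketch illustrates, in generic terms, how a duplication rate estimate and its variance can be turned into a confidence interval for the number of duplicates, and how the delta method gives an approximate variance for a population size estimate. It is not the authors' estimator: `rate_hat` and `var_rate` are hypothetical inputs standing in for the MLE and its variance, which this summary does not state. The sketch also assumes a normal approximation (a Wald interval with z = 1.96) and treats the list sizes as fixed.

```python
import math
from typing import Tuple


def duplicates_interval(n_a: int, rate_hat: float, var_rate: float,
                        z: float = 1.96) -> Tuple[float, float]:
    """Wald interval for the number of list-A members who are also on list B."""
    half_width = z * math.sqrt(var_rate)
    lower = max(0.0, n_a * (rate_hat - half_width))          # cannot be negative
    upper = min(float(n_a), n_a * (rate_hat + half_width))   # cannot exceed list A
    return lower, upper


def population_size_delta(n_b: int, rate_hat: float,
                          var_rate: float) -> Tuple[float, float]:
    """Estimate N = n_b / rate and its delta-method variance.

    With g(r) = n_b / r, g'(r) = -n_b / r**2, so Var(g) ~ g'(r)**2 * Var(r).
    """
    n_hat = n_b / rate_hat
    var_n_hat = (n_b / rate_hat ** 2) ** 2 * var_rate
    return n_hat, var_n_hat


# Hypothetical inputs, for illustration only.
low, high = duplicates_interval(n_a=200, rate_hat=0.15, var_rate=0.0004)
n_hat, var_n_hat = population_size_delta(n_b=150, rate_hat=0.15, var_rate=0.0004)
print(f"duplicates interval: ({low:.1f}, {high:.1f})")
print(f"population estimate: {n_hat:.0f}, standard error: {math.sqrt(var_n_hat):.1f}")
```

Because the population estimate divides by the rate, its approximate variance grows with the inverse fourth power of the rate, so low duplication rates yield wide intervals.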


POLICY IMPLICATIONS:

Such problems arise in a variety of mental health contexts where the need for protection of the privacy of individuals may prevent sharing identifying information. For example, it is important for the payers and managers of mental health systems to be aware of the number of individuals who use multiple resources, as well as the number in their catchment area who are mentally ill. Because of the need to protect the rights to privacy, such systems may not be able to share personal identifying information. Without compromising their confidentiality obligations, they may be able to provide limited information such as gender, county of residence and date of birth of each individual served by their respective system.

PLANS:

In New York State, there are many mental health service providers, including two large systems of care that deliver mental health services to veterans. One is the Veterans Administration (VA), and the second is an amalgam of state, county and local mental health agencies under the auspices of the New York State Office of Mental Health (NYSOMH). These agencies provide care to the general population, including many veterans eligible for service from the VA. Limited information, such as the birth dates of those served by each system, enables an estimate of the number served by both, which in turn permits an estimate of the total population size. We intend to apply the newly developed methods to the New York State data. (See Veterans' Study: Systems Integration Core).

 

Laska EM, Meisner M, Wanderling J, Siegel C. (2002). Estimating duplication rate and population size based only on birth dates. Submitted, Statistics in Medicine.
