Healthcare Databases in Thailand and Japan: Potential Sources for Health Technology Assessment Research

Background Health technology assessment (HTA) has been increasingly used for value-based healthcare decisions over the last decade. Healthcare databases represent an important source of information for HTA, and their use has surged in Western countries. Although HTA agencies have been established in the Asia-Pacific region, the application and understanding of healthcare databases for HTA remain rather limited. Thus, we reviewed existing databases to assess their potential for HTA in Thailand, where HTA has been used officially, and Japan, where HTA is going to be officially introduced. Method Existing healthcare databases in Thailand and Japan were compiled and reviewed. The databases' characteristics (e.g. name of database, host, scope/objective, time/sample size, design, data collection method, population/sample, and variables) were described. Databases were assessed for their potential HTA use in terms of the safety/efficacy/effectiveness, social/ethical, organization/professional, economic, and epidemiological domains. The request route for each database was also provided. Results Forty databases, 20 from Thailand and 20 from Japan, were included. These comprised national censuses, surveys, registries, administrative data, and claims databases. All databases could potentially be used for epidemiological studies. In addition, data on mortality, morbidity, disability, adverse events, quality of life, service/technology utilization, length of stay, and economics were found in some databases. However, access to patient-level data was limited, since information about the databases was not available from public sources. Conclusion Our findings show that existing databases provide valuable information for HTA research, although accessibility is limited. Mutual dialogue on healthcare database development and usage for HTA across the Asia-Pacific region is needed.


Introduction
Health technology assessment (HTA) has been defined as "A multidisciplinary field of policy analysis that studies the medical, social, ethical, and economic consequences of health-care interventions" [1]. It is a process for evaluating new healthcare interventions that examines the available information to aid decision making. Over the last decade, the global use of HTA for value-based healthcare decisions has grown continuously [2][3][4]. Notably, in Asia, South Korea, Taiwan and Thailand have formally adopted HTA as part of their health policy decision making [5][6][7]. The adoption of HTA in these three countries was driven by a number of factors, mainly a clear direction towards universal healthcare coverage and the need for rational allocation of scarce resources.
It is well recognized that HTA is a dynamic process; conducting one therefore requires a timely supply of research findings to inform decisions about the value of the interventions in question. Hence, information in the form of healthcare databases is considered an essential part of HTA research. These databases have become a widely accepted source of information, especially in Western countries. To date, a number of HTAs have used healthcare databases to produce meaningful and contextually relevant findings for policy makers. For example, in the United States (US), administrative claims data were used by patient outcome research teams (PORT) to assess the effectiveness of a number of treatments [8]. Denmark has developed standards for the use of its Biobank, a long-term store of biological samples from patients used in studies of disease control and treatment, which includes personal identifiers [9]. Sweden has over fifty quality registries, which comprise patient data, diagnoses, interventions, and outcomes [10]. Previous reports have publicized repositories of available databases in the US and Canada [11,12], which has facilitated the use of those databases.
In contrast, the amount of HTA research that makes use of healthcare databases in Asia-Pacific countries is relatively small, despite the availability of such databases. This is largely because access to these databases has proven difficult [13]. Furthermore, there is a lack of formal repositories. Thus, making a repository of existing healthcare databases known to the public may be regarded as a first step towards accessibility, which could in turn increase database use and the amount of HTA research in the Asia-Pacific region.
This study aimed to compile and describe healthcare databases currently available in Thailand and Japan, two countries at different stages of HTA use. Thailand is an example of a country with a long history of HTA development [5]. The economic recession and health financing reforms played a crucial role in driving the demand for HTA research to aid policy makers. Japan, at a different stage of HTA adoption, also has a long history of interest in HTA, but there is minimal evidence of its use in policy decision making [7,14]. In 2012, the Japanese Government demonstrated a strong interest in adopting HTA in its policy decision-making process. HTA is expected to be formally introduced in Japan in the near future [15]. We believe that compiling existing healthcare databases and assessing their variables, particularly in terms of potential use for HTA research, will encourage further development of HTA research, not only in Thailand and Japan but also across the Asia-Pacific region.

Methods
Since no comprehensive lists of healthcare databases are available for Thailand and Japan, we compiled lists through a combination of search engines (e.g. Google and PubMed), references from published articles, and a network of people with experience in healthcare databases. The search covered the period through February 2013, without language restriction. The characteristics of the included databases were listed and described (e.g. name of database, host, scope/objective, time/sample size, design, data collection method, population/sample, and variables). Databases were then classified into three groups according to their sources and methods of data collection: 1) population and household surveys, 2) surveillances and registries, and 3) administrative and claims databases.
To be included in our compiled list, a database had to be of potential value for HTA. That is, it had to cover at least one of the following domains from 'best practice in undertaking and reporting health technology assessment' [16]: safety/efficacy/effectiveness, social/ethical, organization/professional, economic, and epidemiological. The details of the outcomes/variables in each domain are given in Table 1. The request route for access to each database was also provided.
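The inclusion rule described above (a database qualifies if it covers at least one of the five HTA domains) can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure actually used in the review: the function name `is_eligible` and the three candidate entries are hypothetical and do not correspond to any database in Tables 2 and 3.

```python
# The five HTA domains named in the Methods section.
HTA_DOMAINS = frozenset({
    "safety/efficacy/effectiveness",
    "social/ethical",
    "organization/professional",
    "economic",
    "epidemiological",
})


def is_eligible(domains_covered):
    """Return True if a database covers at least one HTA domain."""
    return bool(HTA_DOMAINS & set(domains_covered))


# Hypothetical candidates for illustration only (not real databases).
candidates = {
    "example_survey": ["epidemiological", "economic"],
    "example_registry": ["safety/efficacy/effectiveness"],
    "example_other": ["weather"],  # covers no HTA domain, so it is excluded
}

included = sorted(name for name, doms in candidates.items() if is_eligible(doms))
print(included)
```

In practice, deciding which domains a database covers requires reading its variable list against Table 1; the sketch only encodes the final "at least one domain" check.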

Results
We included 40 databases, 20 from Thailand and 20 from Japan, as a representation of available databases. Most of the data were nationally representative. Sixty percent (24 of 40) were longitudinal. These databases serve as foundations for research in health technology assessment and evaluation for health decision making. Host agencies take primary responsibility for designing, collecting, maintaining and updating the datasets, as well as disseminating the data. The characteristics of the databases are listed and described in terms of name of database, host, scope/objective, time/sample size, design, data collection method, population/sample, and variables in Table 2. The 40 databases were classified into three groups: 1) population and household surveys, 2) surveillances and registries, and 3) administrative and claims databases (Table 3).
Of the 20 databases from Thailand, all (100%) contained national data. Fourteen databases (70%) comprised more than 10,000 individuals, 15 (75%) provided information on chronic diseases, and 9 (45%) had longitudinal data. When the databases were classified by source and method of data collection, the first group contained 9 databases from surveys of selected samples of the population or households. These contained information on population health status (e.g. mortality and morbidity), health states, and behaviors of specific interest, for example mental health, risk factors, physical exercise, and nutrition. The second group contained 9 databases from disease surveillances and registries. In this group, two surveillance databases were primarily population-based, while the other seven registry databases were restricted to patients who had visited health care facilities and so were not population-based. Economic information was lacking in the facility-based data. The third group contained 2 databases obtained from hospital-based administrative data for reimbursement. Cost/payment for health care, utilization of services, morbidity, and mortality were available in these electronic databases (Table 3).
Of the 20 databases from Japan, 17 (85%) represented national-level data. Thirteen databases (65%) consisted of more than 10,000 individuals, 8 (40%) had information on chronic diseases, and 15 (75%) provided longitudinal data. The first group comprised 3 databases. Some included social aspects (e.g. patients' satisfaction) and economic variables as well as population health status. The second group contained 5 databases obtained from disease surveillances and registries. These provided population health status (e.g. mortality, morbidity, prevalence) but no economic variables. The third group contained 12 databases obtained from hospital-based administrative data for reimbursement or from claims data. We included claims data from private services in this group and regarded claims data as utilization information. It is noteworthy that the reimbursement price of each medical device is available in the Databases of Medical Device, which are freely accessible (Table 3).
These data have potential for use in HTA research according to the criteria in the Methods section. All 20 databases from Thailand (100%) provided epidemiological data such as incidence, prevalence, and health state, compared with 19 of 20 databases (95%) from Japan. Data on safety/efficacy/effectiveness (e.g. mortality, morbidity, disability, adverse events, quality of life) were found in 18 of 20 databases (90%) from Thailand and 14 of 20 (70%) from Japan. Two databases (10%) from Japan provided information on social/ethical issues, which was not found in any database from Thailand. Organizational/professional information (e.g. service/technology utilization, length of stay, personnel required, material required) was provided by 17 (85%) and 13 (65%) of 20 databases from Thailand and Japan, respectively. In addition, data on economics were found in 12 (60%) and 4 (20%) of 20 databases from Thailand and Japan, respectively (Table 3). Databases containing information about disability, adverse events, quality of life, social/ethical issues, and organization/professional issues were limited (Table 3 and Fig 1). Access to patient-level data was also limited, since some databases were not available on websites and some were available only as an in-house service or for commercial use (S1 Table). To obtain patient-level data, formal contact with the database's host was needed.
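The domain coverage percentages reported in this paragraph follow directly from the counts per country. The short Python sketch below reproduces that arithmetic. The counts are taken from the text; the Thai social/ethical count of 0 is our reading of the statement that no such information was found in the Thai databases, and the variable names are illustrative only.

```python
N_PER_COUNTRY = 20  # databases included per country

# Number of databases covering each domain, as reported in the Results text.
# The Thai social/ethical count of 0 is inferred from "not found".
counts = {
    "epidemiological": {"Thailand": 20, "Japan": 19},
    "safety/efficacy/effectiveness": {"Thailand": 18, "Japan": 14},
    "social/ethical": {"Thailand": 0, "Japan": 2},
    "organization/professional": {"Thailand": 17, "Japan": 13},
    "economic": {"Thailand": 12, "Japan": 4},
}


def percent(count, total=N_PER_COUNTRY):
    """Share of databases covering a domain, as a whole-number percentage."""
    return round(100 * count / total)


coverage = {
    domain: {country: percent(n) for country, n in by_country.items()}
    for domain, by_country in counts.items()
}

for domain, by_country in coverage.items():
    print(f"{domain}: Thailand {by_country['Thailand']}%, Japan {by_country['Japan']}%")
```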

Discussion
To the best of our knowledge, this is the first study in the Asia-Pacific region to provide a list of healthcare databases and assess their potential for HTA research. In the present study, a sample of 40 large-scale databases (20 from Thailand and 20 from Japan) implies an abundance of data with high potential for HTA research. Although many people may not expect a number of healthcare databases to be available in Thailand, the adoption of universal coverage and the use of a case-mix classification system partly drove the development of electronic claims databases in the early 2000s. This is consistent with Japan. One may question whether making this compilation of databases available to the public would make a difference in terms of accessibility. A number of reports have shown that the lack of a listing of databases can be one of the major causes of minimal use of existing databases; having a list demonstrates that the databases are available.
Using databases for HTA research, particularly for coverage decisions, is not uncommon [17]. These databases generally contain large samples of subjects and are designed with appropriate sampling to be nationally representative. Most importantly, database analysis can be performed with minimal time and cost. Database-based studies, for example adverse event surveillance reports [18], are useful for rare diseases/events that are sometimes never reported in randomized controlled trials (RCTs). Databases collated from routine services provide 'real-world' data that are essential for HTA research. Furthermore, databases can serve multiple purposes. For example, the Health and Welfare Survey (HWS) database provides information on morbidity, service or technology utilization, cost/price, prevalence, health state, and demographics, which can be used for various purposes such as cost estimation, estimation of disease incidence/prevalence, and outcome ascertainment.
Database-based studies have both strengths and limitations. For all non-randomized data, the most significant concern is the potential for bias. Database studies do not meet the methodological rigor of RCTs, regardless of the use of statistical approaches to adjust for selection bias [19,20]. Thus, database-based studies need to rigorously identify the sources of bias and confounding and adjust for them before estimating the impact of interventions. In addition, routine databases usually suffer from collection errors and missing data. The approach to data selection or data imputation should therefore also be justified before estimating the impact of interventions or costs. Moreover, each type of data source has its own characteristics and limitations.
Population, household and health surveys are designed to collect data on the population (e.g. household, economic status, health status and well-being, healthcare utilization and expenditure, and treatment patterns). They typically collect information from individuals in the target population using rigorous methodologies, i.e. relying on proper sample designs. With these methods, surveys can provide information on the target population. Accordingly, health survey data can make unique contributions in terms of the generalizability of treatments and their impacts, and the utilization of health services. The major limitation of surveys for HTA research is the lack of data on specific products. In addition, the issues of subjectivity and recall bias should always be considered [21].
Disease surveillances are reports of patients who have experienced diseases or unexpected events, while registries are prospective studies of patients who have a particular disease and/or receive a particular intervention. They can be used for understanding natural history, assessing and monitoring safety and effectiveness, evaluating quality of care and performance, and estimating cost-effectiveness [22]. They include larger and more diverse groups of patients than RCTs and thus reflect real-world patients, practices, and outcomes. Moreover, patients are typically followed over a long time frame, allowing assessment of long-term outcomes. Treatment patterns reflect the everyday clinical decision making that is most relevant to providers, decision makers, and payers. Since disease surveillances and registries are intended to capture data on a population of interest, the inherent limitations of this kind of observational study should be considered. Such databases may not be suitable for hypothesis testing but are useful for hypothesis generation. In addition, incidence and prevalence derived from disease surveillances and registries may not be accurate because they are typically based on case registers and thus lack an actual population denominator. However, they remain useful for estimating the burden of disease.
Administrative and claims databases are, in general, collected primarily for reimbursement purposes but contain some clinical variables, procedures used, and charge information. Claims databases usually lend themselves to longitudinal and cross-sectional collection of clinical and economic outcomes at the patient, group, or population level. They can therefore be used for HTA research at low cost and in a short time. Given the large size of administrative and claims databases, rare events/outcomes can potentially be detected. In addition, these databases are useful for measuring economic impact and the association between interventions and outcomes [23,24]. The use of this kind of database has been challenged on a number of grounds, including: 1) data quality, e.g. missing data and coding errors, 2) limited information on clinical outcomes and symptoms, 3) absence of a population denominator, and 4) lack of information on the difference between cost and charge. However, the large data size may overcome the issue of randomly missing data. Sometimes the estimation of treatment effects and costs can be biased by unobserved variables, such as baseline characteristics, that are associated with both treatment selection and outcomes.
In Thailand, the Medical Research Foundation is developing a website to collect databases/registries [25], and the Ganesh SAP Research Unit is collecting published outcomes research and health economics studies, reports, theses, and proceedings conducted in Thailand [26,27]. In Japan, a single computerized database linking medical records from all hospitals in the country is also under development [28]. These initiatives make it increasingly possible to conduct rapid reviews that can be used for HTA and provide information for decision making.
Another important aspect to consider when using databases is the need to balance data access with the need to protect patient confidentiality. To protect the rights and interests of patients, it is important to ensure that all possible steps are taken to limit access to confidential information. To minimize the risk of disclosure, informed consent for the use of clinical data should be obtained if possible [29]. The most effective strategy for maintaining confidentiality is to remove all identifying information from medical records before any use [29]. However, in some situations, patient-identifying information is needed. For example, researchers need patient identification to link information across sources (e.g. death records, pharmacy data, hospitalization data). Ideally, informed consent should be obtained from each patient. In reality, samples used in analyses based on computerized databases often include thousands of cases or more, making the acquisition of individual informed consent practically impossible. Thus, investigators have to demonstrate a clear plan to adequately protect identifiers and prevent disclosure of confidential information. In addition, identifying information should be destroyed at the earliest opportunity.
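One common way to reconcile record linkage with removal of identifiers is to replace each patient identifier with a keyed hash before the data leave the custodian, so that records from different sources can still be matched. The Python sketch below illustrates this idea using the standard-library `hmac` and `hashlib` modules. It is an assumption-laden illustration, not a method described in this study or in [29]: the field names, the secret key, and the sample records are all hypothetical, and keyed hashing alone does not guarantee anonymity, since other variables may still allow re-identification.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(records, secret_key, id_field="patient_id"):
    """Drop the raw identifier and add a pseudonymous linkage key."""
    cleaned_records = []
    for rec in records:
        cleaned = {k: v for k, v in rec.items() if k != id_field}
        cleaned["link_key"] = pseudonymize(rec[id_field], secret_key)
        cleaned_records.append(cleaned)
    return cleaned_records


def link(left, right):
    """Join two de-identified sources on the shared linkage key."""
    right_by_key = {r["link_key"]: r for r in right}
    return [
        {**l, **right_by_key[l["link_key"]]}
        for l in left
        if l["link_key"] in right_by_key
    ]


# Hypothetical key and records, for illustration only.
SECRET_KEY = b"held-by-data-custodian-only"
hospital = [
    {"patient_id": "A001", "length_of_stay": 4},
    {"patient_id": "A002", "length_of_stay": 9},
]
deaths = [{"patient_id": "A002", "died": True}]

linked = link(deidentify(hospital, SECRET_KEY), deidentify(deaths, SECRET_KEY))
print(len(linked))
```

The same key must be used for every source to be linked, and it should stay with the data custodian rather than the analyst; destroying the key once linkage is complete is one way to act on the advice that identifying information be destroyed at the earliest opportunity.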
The main limitation of the present study is that we did not perform a comprehensive search of all existing healthcare databases in Thailand and Japan. However, we sought to include relevant databases in order to illustrate their characteristics and potential use for HTA to readers and researchers.
In summary, our study provides a list of data sources that HTA researchers can use as a starting point for data searches. It should be noted that the use of databases for HTA research has strengths and limitations. Readers and decision makers are referred to comprehensive summaries and critical appraisals [30] of the strengths and limitations noted in the literature before using database-based studies for HTA. Further study describing how databases have been used in HTA is warranted.
Supporting Information S1 Table. Database request route.