A scoping review of the landscape of health-related open datasets in Latin America

  • David Restrepo,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing

    davidres@mit.edu

    Affiliations Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America, Telematics Department, University of Cauca, Popayán, Cauca, Colombia

  • Justin Quion,

    Roles Data curation, Formal analysis, Methodology, Writing – original draft, Writing – review & editing

    Affiliation Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America

  • Constanza Vásquez-Venegas,

    Roles Investigation, Writing – review & editing

    Affiliation Scientific Image Analysis Lab, Integrative Biology Program, Biomedical Sciences Institute (ICBM), Faculty of Medicine, Universidad de Chile, Santiago, Chile

  • Cleva Villanueva,

    Roles Writing – review & editing

    Affiliation Instituto Politécnico Nacional, Escuela Superior de Medicina, Ciudad de Mexico, Mexico

  • Leo Anthony Celi,

    Roles Writing – review & editing

    Affiliation Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America

  • Luis Filipe Nakayama

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Validation, Writing – original draft, Writing – review & editing

    Affiliations Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America, Department of Ophthalmology, São Paulo Federal University, São Paulo, São Paulo, Brazil

Abstract

Artificial intelligence (AI) algorithms have the potential to revolutionize healthcare, but their successful translation into clinical practice has been limited. One crucial factor is the data used to train these algorithms, which must be representative of the population. However, most healthcare databases are derived from high-income countries, leading to non-representative models and potentially exacerbating health inequities. This review focuses on the landscape of health-related open datasets in Latin America, aiming to identify existing datasets, examine data-sharing frameworks, techniques, platforms, and formats, and identify best practices in Latin America. The review found 61 datasets from 23 countries, with the DATASUS dataset from Brazil contributing to the majority of articles. The analysis revealed a dearth of datasets created by the authors themselves, indicating a reliance on existing open datasets. The findings underscore the importance of promoting open data in Latin America. We provide recommendations for enhancing data sharing in the region.

Author summary

In this review, we explore the potential of artificial intelligence (AI) algorithms to revolutionize healthcare while addressing the challenges of translating them into clinical practice. One crucial obstacle we identify is the limited availability of representative data to train these algorithms. Most healthcare databases are sourced from high-income countries, resulting in non-representative models that may worsen health inequities. Our focus is on health-related open datasets in Latin America, where we aim to identify existing datasets, analyze data-sharing frameworks, techniques, platforms, and formats, and highlight best practices in the region. Through our analysis, we found 61 datasets from 23 countries, with the majority relying heavily on the DATASUS dataset from Brazil. Surprisingly, there is a lack of datasets created by the authors themselves, indicating a reliance on existing open datasets. Our findings underscore the urgent need to promote open data initiatives in Latin America, and we provide recommendations for enhancing data sharing in the region. By fostering data accessibility, we can unlock the potential of AI to advance healthcare for all.

Introduction

Artificial intelligence (AI) algorithms hold great promise in healthcare, enhancing clinical decision-making and diagnosis and helping to identify new genomic patterns and drugs [1,2]. However, few AI systems have been translated into clinical practice, and those that have been deployed have shown limited success [3–5].

Much of the machine learning community has focused on developing new algorithms and increasingly complex techniques, such as the Transformer-based models BERT [6] and GPT [7] or the diffusion model Stable Diffusion [8]. However, there is a lack of research investigating the data used by these algorithms and whether it is representative of the population in question. Algorithms trained on non-representative data can lead to dangerous and biased outcomes [9,10]. More often than not, those who are underrepresented in the data are harmed the most, such as people in low- and middle-income countries (LMICs), women [11], and non-White populations [12]. If left unchecked, these encoded biases will continue to generate inequities in health and widen the gap between populations [13].

An important and generally overlooked aspect of data is the amount, quality, and accessibility of datasets. As it stands, most healthcare databases originate from high-income countries [14–16], limiting the generalizability of any resulting models. Quality is another important aspect of datasets, and one that is difficult to achieve in healthcare because of the personal nature of the information: the data must be de-identified, which can be a laborious and expensive process. Finally, a dataset has little value if it cannot be accessed [17]. Publicly available datasets promote reproducibility, enable validation studies, and are a valuable alternative to the elevated costs and challenges of developing a database [14,18].

This review seeks to explore the landscape of health-related open datasets in Latin America. Specifically, the authors aim to identify the existing open health datasets in Latin America through a mapping of the existing literature in Latin American countries; take note of the modalities, techniques, platforms and formats being used to share data; and highlight the initiatives and practices around the publication of open data in Latin America. Additionally, limitations and gaps surrounding the current landscape of health data sharing in Latin America will be identified. Finally, recommendations and suggestions to promote the use of open data in Latin America are provided.

Results

The initial search yielded 700 papers (Fig 1), from which 170 duplicates were removed. Of the 530 papers that remained, a primary screening based on title and abstract excluded 344. A full-text analysis then excluded a further 45 papers, leaving a total of 141 documents for the final quantitative and qualitative analysis.
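
The screening counts reported above can be re-derived with simple arithmetic. The following is only a sanity-check sketch; the variable names are illustrative and are not taken from the review's repository.

```python
# Counts as reported in the text and Fig 1; variable names are illustrative.
identified = 700
duplicates_removed = 170
excluded_title_abstract = 344
excluded_full_text = 45

after_dedup = identified - duplicates_removed              # records screened
after_screening = after_dedup - excluded_title_abstract    # full texts assessed
included = after_screening - excluded_full_text            # final corpus

print(after_dedup, after_screening, included)  # prints: 530 186 141
```

The intermediate value of 186 full texts assessed is not stated in the text but follows directly from the reported figures.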

Fig 1. Flow diagram for article inclusion and exclusion from PRISMA.

https://doi.org/10.1371/journal.pdig.0000368.g001

The 141 included articles were published between 2006 and 2023, with the majority, 96 (68.1%), published from 2020 to 2022.

Authors

In 120 (85.1%) of the articles, at least one author was working in a Latin American institution. Note that nationality was inferred from the location of the affiliated institution, which ignores the possibility of Latin American authors working at institutions abroad and vice versa. Current infrastructure does not allow for easy parsing of author nationality, and doing so is outside the scope of this review.

Datasets

This review identified 61 datasets (38 described in Table 1 and 23 described in Table 2) from 23 countries. Among them, the Brazilian database DATASUS stands out, having contributed to 54 (38.3%) of the 141 documents. The Brazilian Institute of Geography and Statistics (IBGE) also stands out as the main source of data on social determinants of health in Brazil, contributing to 9 articles (6.4%). Finally, datasets available for all of Latin America contributed to 43 articles (30.5%). It is important to note that many of the datasets included are global in scope and are not specific to Latin America.

Table 1. Open databases found in Latin American countries.

Databases that were created and released by the reviewed articles are not listed here; they are described in Table 2. All Latin America means Argentina, Bolivia, Brazil, Chile, Colombia, Costa Rica, Cuba, Dominican Republic, Ecuador, El Salvador, Guatemala, Guyana, Haiti, Honduras, Mexico, Nicaragua, Panama, Paraguay, Peru, Puerto Rico, Suriname, Uruguay, Venezuela.

https://doi.org/10.1371/journal.pdig.0000368.t001

Table 2. Open datasets in Latin America generated through articles found in the review.

Although only Latin American countries appear in the table in the column “Latin American Countries”, it is also important to note that in many cases those are global datasets that are also available for countries in other continents. All Latin America means Argentina, Bolivia, Brazil, Chile, Colombia, Costa Rica, Cuba, Dominican Republic, Ecuador, El Salvador, Guatemala, Guyana, Haiti, Honduras, Mexico, Nicaragua, Panama, Paraguay, Peru, Puerto Rico, Suriname, Uruguay, Venezuela.

https://doi.org/10.1371/journal.pdig.0000368.t002

Country

Brazil is the most represented country, mainly due to the influence of DATASUS, appearing in 83 (58.9%) of the papers, followed by Mexico with 47 (33.3%) and Colombia with 32 (22.7%). Guyana and Suriname, on the other hand, appear in the fewest articles, with only 14 (9.9%) and 13 (9.2%), respectively. A heat map showing the distribution of paper appearances across countries is shown in Fig 2.

Fig 2. Heat map showing the distribution of articles published using open data in Latin America.

More intense colors indicate a greater presence of articles, and lighter colors indicate a lesser presence. Map created by adapting the naturalearth_lowres layer using GeoPandas (© 2013–2022, GeoPandas developers), an open-source Python package released under the liberal terms of the BSD-3-Clause license [49].

https://doi.org/10.1371/journal.pdig.0000368.g002

Local datasets

Despite the large number of articles resulting from the search, articles rarely generated and analyzed datasets from the authors’ institutions. Instead, many of the datasets used were generated by governments and NGOs. Of the 141 resulting papers, only 23 of them created their own dataset. The other 118 papers utilized public datasets.

Of the datasets created by those papers, 12 were created for specific Latin American countries. Five datasets were created for Colombia: 1 with social determinants of health and nutrition data for public health tasks [21], and 4 with a more clinical focus, namely a muscle dysmorphia dataset [22], a body fat measurement dataset [23], endoscopic ultrasound scans [24], and a dataset on the treatment of Helicobacter pylori [25]. Eight datasets were created for Brazil: 2 from a public health perspective, namely BASICS [26], with epidemiologic data, and the dataset on child vaccination [27]; 1 dataset of laboratory exams [28]; 1 dataset of leprosy images called AI4Leprosy [29]; 1 open dataset with electronic health records called ORBDA [30]; 2 datasets for genomics [31,32]; and 1 trial on the effects of BCG vaccination for COVID-19 [33]. Of note, 4 datasets were created for Mexico, followed by Cuba (1) and Honduras (1). The remaining datasets were global, mainly for public health, and included some or all Latin American countries. The full list of datasets, with more information about each, can be found in Table 2.

Open vs. credentialed users

Of the total datasets in Latin America, only 7 (5%) require credentialing. Three of these correspond to databases created in research papers (Table 2): SELAdb database [31,32], the Helicobacter pylori dataset [25], and the Cuban Human Brain Mapping Project (CHBMP) [42].

Dates

A majority of the papers were published from 2020 to 2022 (Fig 3), a spike that may have been fueled by the COVID-19 pandemic. Of the 96 papers published during that period, 33 (34.4%) focused on COVID-19, either studying it in Latin America or using Latin American data.

Fig 3. Number of papers published that created open datasets in Latin America or that used open datasets created in Latin America.

https://doi.org/10.1371/journal.pdig.0000368.g003

Data type

The types of data available also matter, since a wide range of data sources and forms enables a wide range of models and investigations in the region. Fig 4 shows the distribution of data types in the databases found. The most utilized data type is tabular data, used in 125 (88.7%) of the papers, followed by images with 9 (6.4%) and finally genomic data, signals, and text with 4 (2.8%), 2 (1.4%), and 1 (0.7%), respectively. Of the 23 datasets that were created by papers, 15 (65.2%) were tabular, 6 (26.1%) used images, and 2 (8.7%) were genomic data. No text datasets appeared among the papers that generated their own datasets. It should be noted that some papers used multiple datasets and combined different forms of data. Additionally, DATASUS is primarily a tabular dataset, and its prevalence throughout the papers skews the results.
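
A distribution like the one above can be tallied from one label per dataset. The sketch below uses the counts reported for the 23 author-created datasets (15 tabular, 6 image, 2 genomic); the list of labels is reconstructed for illustration and the `shares` helper is hypothetical, not code from the review.

```python
from collections import Counter

# One label per author-created dataset, rebuilt from the counts in the text.
created = ["tabular"] * 15 + ["images"] * 6 + ["genomic"] * 2


def shares(labels):
    """Return {label: (count, percent)} with percent rounded to one decimal."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.most_common()}


distribution = shares(created)
for label, (n, pct) in distribution.items():
    print(f"{label}: {n} ({pct}%)")
```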

Fig 4. Data Type used in datasets from Latin America in all documents.

https://doi.org/10.1371/journal.pdig.0000368.g004

Discussion

The landscape of publicly available datasets in Latin America is still in its infancy. While Brazil has made great strides in open data, mainly through its DATASUS platform, the other Latin American countries are lagging behind. There are two possible reasons: either the current infrastructure (personnel and systems) does not support the creation of datasets within organizations, so the datasets simply do not exist, or health datasets do exist in these countries but are not being curated and analyzed, for reasons ranging from lack of awareness to limited accessibility.

Based on the results, there is much to be improved upon. A majority of the papers and datasets used originate from Brazil, and many Latin American countries are not represented. Despite the robustness of the DATASUS databases, models developed solely with this data will have limited applicability outside of Brazil. As Movva et al. have shown [9], the use of broad demographic groups can hide disparities among subpopulations. In our case, Brazil alone is not representative of Latin America: South and Central American countries each have sociocultural and other characteristics that distinguish them from one another.

Even when the other Latin American countries are represented in the data, on many occasions it is not due to local initiatives, but rather by initiatives of international organizations that collect and release this data globally. It is not sufficient to simply be represented by data alone; it is crucial that the communities investigating these problems are also representative of the population in question. These groups will be more attuned to the unique socio-cultural context of the problem and are more likely to come up with solutions.

The number of publicly available healthcare datasets originating from Latin America is limited, with DATASUS being the primary database used and providing limited options for research questions to be addressed. The problem becomes more evident given that of the existing datasets, 88.7% aggregate data at the level of population living in cities or even countries. In terms of topics and modalities, the search did not find any clinical dataset. Of the datasets found, only 6.4% were of medical images, making research in specialties such as radiology and ophthalmology not possible.

It is crucial that open data becomes more mainstream in order to promote transparent and reproducible health research [17,44], support processes and quality improvements in health systems, and mitigate algorithmic biases [45,46] as the interest in artificial intelligence intensifies. Sharing data should be at the core of the scientific process to ensure reproducibility especially in the area of health where lives of patients are at stake.

Compliance with FAIR (findable, accessible, interoperable, and reusable) [47] practices for health data sharing in Latin American articles falls short, which hampers reproducibility, research advancement, and the reduction of health inequities. Adherence to the FAIR principles should be a priority for datasets in Latin America. The mere existence of data is not enough if many barriers stand in the way of its use; the data must be available and in suitable condition for effective utilization.

This review has limitations. The data sources for extracting the articles were limited to 3 databases (Scopus, Web of Science, and PubMed), and despite their combined scope, some papers and databases may have been omitted, although we argue that databases should be “findable” if they are to be useful. It should also be taken into account that many of the data sources found do not come from scientific publications but are published by governments and/or local or international organizations. We understand that many of the open data sources in Latin America may not be present in scientific databases and, therefore, may have been excluded from this review. Other data sources, such as open data repositories, should be explored in future reviews.

Recommendations

Support funding and infrastructure

It is necessary to support, encourage, and streamline the creation and maintenance of datasets through funding and infrastructure. Not only should governments and funding agencies assist, but other institutions such as hospitals and research groups should also be encouraged to take part in order to increase the robustness of the data and datasets available. This includes other forms of data as well, such as images and text. Investments in technological infrastructure are also necessary, as the storage and processing of the data are key components that cannot be overlooked.

While data sharing policies vary across the region, some countries are aligning with global practices by introducing obligations to release data openly, often tied to funding support. These policies promote transparency and accessibility, and they highlight the need for dedicated funding mechanisms and enhanced data infrastructure. They can also foster a data-sharing culture and strengthen Latin America's position in open data.

Improve data quality and standardization

Attention should be given to the quality and standardization of health datasets in Latin America in order to promote and facilitate the use of open data. A good first step towards this goal would be the establishment of guidelines and protocols regarding the collection, storage, and sharing of data. Other possibilities include establishing a rigorous de-identification process and implementing data governance practices. Practices such as controlled access can also be used in cases where de-identification of medical data is not possible. In certain cases, credentialed data access, rather than fully open data, may be a suitable approach, ensuring that data security and ethical considerations are addressed.

Incorporating benchmarks is also extremely important in the context of open health data in Latin America. To facilitate this, we propose the use of open-source and free tools like GitHub or GitLab, together with programming languages like Python or R, to run benchmark models and ensure reproducibility. Benchmarks not only provide a standardized framework for evaluating the quality and performance of open health datasets but also serve as a foundation for ongoing research and collaboration. By establishing benchmarks, the Latin American research community can accelerate the development of research-based algorithms and encourage the generation of additional data, thus fostering innovation and collaboration.

Foster data sharing culture

Encouraging a culture of data sharing is crucial for the advancement of open data initiatives in Latin America. Key steps to achieve this include recognition, academic or otherwise, and incentives for those who share data openly, such as funding opportunities. Health institutions should be made aware that sharing data is an important but delicate process. Incorporating data treatment policies to avoid risks related to data privacy and identification is vital in each institution. However, sharing data represents more benefits than risks for health institutions.

Address ethical considerations

As the use of open data grows, it is important to consider the ethical ramifications of its use, ranging from data privacy to the potential biases within. Clear guidelines and frameworks should be established to ensure the responsible and ethical use of such data. Additionally, frequent investigations into the datasets may be necessary in order to secure future use of data and models.

Promote multidisciplinary collaboration

As data becomes accessible, the various fields become more entwined and as a result, a multidisciplinary approach is necessary. Health disparities and inequities cannot be tackled by health researchers and data alone. Instead, local communities should be engaged at all levels to better identify solutions with the unique sociocultural perspective in mind.

Conclusion

The authors have performed a review of the research carried out in Latin America, yielding 141 articles that use open health-related data, with no year or citation limitations. From these 141 articles, information such as the country of the authors’ affiliations, the most utilized data sources, the data type, and a description of the datasets was extracted. We conclude the following:

  • A standardized and accessible data source, as is the case with DATASUS in Brazil, provides a valuable resource for local research, decision-making, and development in the region.
  • Latin America is a region in which work is still needed around open data from many perspectives such as governments and open data policies. This problem can be seen very easily, especially when comparing the number of articles generated by data from the local health ministries of each country with data generated through international organizations such as NGOs. Special emphasis must be placed on the fact that it is not just having the data, but how to release and share it, since on many occasions the NGO datasets are derived from reports from the Ministries of Health of each region.
  • It is recommended that Latin America take a multidimensional approach in which different stakeholders from government, research institutions, and health organizations, among others, work together to create an open health ecosystem. These kinds of initiatives can be advanced through events such as datathons [48], conferences, and congresses in which topics related to local needs are addressed with local experts in multidisciplinary teams. Rules and research in the region should no longer be shaped by people who are unfamiliar with the local problems.

Implementing these kinds of efforts requires adequate equipment, resources, and a long-term commitment from all parties. In any case, it must be taken into account that the possible benefits of these changes would be much greater in the long term for the improvement of health ecosystems, reduction of biases and inequities in health in Latin America.

Materials and methods

This scoping review focused on publicly available Latin American datasets and followed the PRISMA methodology [19]. The literature search covered the PubMed, Scopus, and Web of Science databases for documents published before 21 June 2023. The search strategy utilized variations of the keywords “dataset” and “publicly available”, along with the names of Latin American countries. Document types were limited to journal papers, conference papers, and data papers, with no language, year, or citation exclusions. Exact search criteria can be found in S1 File.

Papers were considered eligible if they met the following criteria: i) Papers were published in academic journals, conference proceedings, and reputable sources; ii) Studies that focus on health-related open datasets in Latin America; iii) Studies that provide information on the availability, accessibility, and use of open health datasets; plus any of the following criteria: a) Studies that discuss the modalities, techniques, platforms, and formats used for sharing data in Latin America; b) Studies that highlight initiatives and practices related to the publication of open data in Latin America; c) Studies that identify limitations, gaps, and challenges in the current landscape of health data sharing in Latin America.
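
The inclusion rule above combines three mandatory criteria (i to iii) with at least one of three further criteria (a to c). As a minimal sketch of that logic only, the class and field names below are hypothetical labels for a reviewer's judgments and do not come from the review's own code.

```python
from dataclasses import dataclass


@dataclass
class Screening:
    """Reviewer judgments for one paper; field names are hypothetical."""
    academic_source: bool                    # criterion i
    latam_health_open_data: bool             # criterion ii
    reports_availability: bool               # criterion iii
    discusses_sharing_formats: bool = False  # criterion a
    highlights_initiatives: bool = False     # criterion b
    identifies_gaps: bool = False            # criterion c


def is_eligible(s: Screening) -> bool:
    """All of i-iii must hold, plus at least one of a-c."""
    mandatory = (
        s.academic_source
        and s.latam_health_open_data
        and s.reports_availability
    )
    additional = (
        s.discusses_sharing_formats
        or s.highlights_initiatives
        or s.identifies_gaps
    )
    return mandatory and additional
```

For example, `is_eligible(Screening(True, True, True, identifies_gaps=True))` is true, whereas a paper meeting only criteria i to iii is not eligible under this reading.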

Papers were excluded if any of the following applied: i) Non-academic sources such as blog posts, opinion pieces, and news articles; ii) Studies not focused on health-related datasets or not specific to Latin America; iii) Studies that do not discuss the availability, accessibility, or use of open datasets; iv) Studies that are not related to the modalities, techniques, platforms, or formats used for sharing data in Latin America; and v) Studies that do not address initiatives and practices related to the publication of open data in Latin America.

Document analysis and eligibility assessment were performed by three authors (DR, LFN, and JQ); each paper was reviewed at least twice, by two of the authors, to ensure eligibility. As a first step, the software ScientoPy [20] was used to clean the search results and remove duplicates. Since ScientoPy only works with Scopus and Web of Science, duplicates from PubMed had to be removed through an alternative method: Python 3.10.12 was used to check the DOIs from the preprocessed Scopus and Web of Science results and remove those that were also present in the PubMed results. Finally, two screening steps were employed: first by title and abstract, then by full-text assessment. All authors participated in the full-text assessment.
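
One way the DOI-based deduplication step could look is sketched below. This is not the authors' script: the function names, the record layout (a dict with a "doi" key), and the choice to drop matches from the PubMed side are all assumptions, and the DOIs in the example are made-up placeholders.

```python
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip common URL/prefix forms so equal DOIs compare equal."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi.strip()


def drop_known_dois(pubmed_records, known_dois):
    """Keep PubMed records whose DOI is absent from the Scopus/Web of Science set.

    Records without a DOI are kept, since they cannot be matched this way.
    """
    seen = {normalize_doi(d) for d in known_dois if d}
    kept = []
    for record in pubmed_records:
        doi = record.get("doi")
        if doi and normalize_doi(doi) in seen:
            continue  # already present in the Scopus/Web of Science results
        kept.append(record)
    return kept


# Placeholder data for illustration only.
scopus_wos_dois = ["https://doi.org/10.1000/EXAMPLE.1"]
pubmed = [
    {"title": "Already found", "doi": "10.1000/example.1"},
    {"title": "Only in PubMed", "doi": "10.1000/example.2"},
    {"title": "No DOI", "doi": None},
]
unique = drop_known_dois(pubmed, scopus_wos_dois)
```

Normalizing case and URL prefixes before comparison matters because the same DOI is often exported in different forms by different databases. Records without a DOI would still need a manual or title-based check.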

The retrieved articles, reviewers’ assessments, and code are publicly available at: https://github.com/dsrestrepo/MIT_Review_datasets_Latin_America

Reviewed variables

From the included articles, the following characteristics were extracted about the datasets: name of the datasets used, whether it was open-access or required credentials, the type of data used (e.g. tabular, images, etc.), the originating country of the dataset, and whether or not the authors of the paper created the dataset cited. Further, author nationality was extracted and was based on the country of the affiliated institution of the author. This was used to determine the presence or absence of a Latin American author amongst the group.
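
The extracted variables can be pictured as one record per article. The sketch below is illustrative only: the class and field names are hypothetical, the country set is the "All Latin America" list given with Tables 1 and 2, and the example record is invented rather than taken from the review's data.

```python
from dataclasses import dataclass, field

# Countries listed under "All Latin America" in Tables 1 and 2.
LATIN_AMERICA = frozenset({
    "Argentina", "Bolivia", "Brazil", "Chile", "Colombia", "Costa Rica",
    "Cuba", "Dominican Republic", "Ecuador", "El Salvador", "Guatemala",
    "Guyana", "Haiti", "Honduras", "Mexico", "Nicaragua", "Panama",
    "Paraguay", "Peru", "Puerto Rico", "Suriname", "Uruguay", "Venezuela",
})


@dataclass
class ExtractedArticle:
    """Variables extracted per article; field names are hypothetical."""
    dataset_names: list[str]
    open_access: bool             # False means credentials are required
    data_types: list[str]         # e.g. ["tabular", "images"]
    dataset_countries: list[str]
    created_by_authors: bool
    affiliation_countries: list[str] = field(default_factory=list)

    def has_latin_american_author(self) -> bool:
        """True if any author affiliation is in a Latin American country."""
        return any(c in LATIN_AMERICA for c in self.affiliation_countries)


# Invented example record, for illustration only.
example = ExtractedArticle(
    dataset_names=["DATASUS"],
    open_access=True,
    data_types=["tabular"],
    dataset_countries=["Brazil"],
    created_by_authors=False,
    affiliation_countries=["Brazil", "United States of America"],
)
```

As in the review, the author flag here reflects institutional affiliation rather than nationality.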

Supporting information

References

  1. 1. Rajpurkar P, Chen E, Banerjee O, Topol EJ. AI in health and medicine. Nat Med. 2022;28: 31–38. pmid:35058619
  2. 2. Yu K-H, Beam AL, Kohane IS. Artificial intelligence in healthcare. Nat Biomed Eng. 2018;2: 719–731. pmid:31015651
  3. 3. Habib AR, Lin AL, Grant RW. The Epic Sepsis Model Falls Short—The Importance of External Validation. JAMA Intern Med. 2021;181: 1040–1041. pmid:34152360
  4. 4. Wong A, Otles E, Donnelly JP, Krumm A, McCullough J, DeTroyer-Cooley O, et al. External Validation of a Widely Implemented Proprietary Sepsis Prediction Model in Hospitalized Patients. JAMA Internal Medicine. 2021. pmid:34152373
  5. 5. Heaven WD. Google’s medical AI was super accurate in a lab. Real life was a different story. MITS Technol Rev.
  6. 6. Tenney I, Das D, Pavlick E. BERT Rediscovers the Classical NLP Pipeline. arXiv [cs.CL]. 2019. Available: http://arxiv.org/abs/1905.05950
  7. 7. Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, et al. Language Models are Few-Shot Learners. arXiv [cs.CL]. 2020. Available: http://arxiv.org/abs/2005.14165
  8. 8. Rombach R, Blattmann A, Lorenz D, Esser P, Ommer B. High-Resolution Image Synthesis with Latent Diffusion Models. arXiv [cs.CV]. 2021. Available: http://arxiv.org/abs/2112.10752
  9. 9. Movva R, Shanmugam D, Hou K, Pathak P, Guttag J, Garg N, et al. Coarse race data conceals disparities in clinical risk score performance. arXiv [cs.CY]. 2023. Available: http://arxiv.org/abs/2304.09270
  10. 10. Zou J, Schiebinger L. Ensuring that biomedical AI benefits diverse populations. EBioMedicine. 2021;67: 103358. pmid:33962897
  11. 11. Lucy L, Bamman D. Gender and Representation Bias in GPT-3 Generated Stories. Proceedings of the Third Workshop on Narrative Understanding. Virtual: Association for Computational Linguistics; 2021. pp. 48–55.
  12. 12. Nicoletti L, Bass D. Generative AI Takes Stereotypes and Bias From Bad to Worse. In: Bloomberg [Internet]. 8 Jun 2023 [cited 19 Jun 2023]. Available: https://www.bloomberg.com/graphics/2023-generative-ai-bias/?srnd=graphics-v2&utm_source=www.healthcareainews.com&utm_medium=newsletter&utm_campaign=healthcare-s-hidden-gold
  13. 13. Celi LA, Cellini J, Charpignon M-L, Dee EC, Dernoncourt F, Eber R, et al. Sources of bias in artificial intelligence that perpetuate healthcare disparities-A global review. PLOS Digit Health. 2022;1: e0000022. pmid:36812532
  14. 14. Khan SM, Liu X, Nath S, Korot E, Faes L, Wagner SK, et al. A global review of publicly available datasets for ophthalmological imaging: barriers to access, usability, and generalisability. Lancet Digit Health. 2021;3: e51–e66. pmid:33735069
  15. 15. Yi PH, Kim TK, Siegel E, Yahyavi-Firouz-Abadi N. Demographic Reporting in Publicly Available Chest Radiograph Data Sets: Opportunities for Mitigating Sex and Racial Disparities in Deep Learning Models. J Am Coll Radiol. 2022;19: 192–200. pmid:35033310
  16. 16. Sauer CM, Dam TA, Celi LA, Faltys M, de la Hoz MAA, Adhikari L, et al. Systematic Review and Comparison of Publicly Available ICU Data Sets—A Decision Guide for Clinicians and Data Scientists. Crit Care Med. 2022;50: e581. pmid:35234175
  17. 17. de Kok JWTM, MÁA de la Hoz, de Jong Y, Brokke V, Elbers PWG, Thoral P, et al. A guide to sharing open healthcare data under the General Data Protection Regulation. Sci Data. 2023;10: 404. pmid:37355751
  18. Seastedt KP, Schwab P, O’Brien Z, Wakida E, Herrera K, Marcelo PGF, et al. Global healthcare fairness: We should be sharing more, not less, data. PLOS Digit Health. 2022;1: e0000102. pmid:36812599
  19. Tricco AC, Lillie E, Zarin W, O’Brien KK, Colquhoun H, Levac D, et al. PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation. Ann Intern Med. 2018;169: 467–473. pmid:30178033
  20. Ruiz-Rosero J, Ramirez-Gonzalez G, Viveros-Delgado J. Software survey: ScientoPy, a scientometric tool for topics trend analysis in scientific publications. Scientometrics. 2019;121: 1165–1188.
  21. Restrepo DS, Pérez LE, López DM, Vargas-Cañas R, Osorio-Valencia JS. Multi-Dimensional Dataset of Open Data and Satellite Images for Characterization of Food Security and Nutrition. Front Nutr. 2021;8: 796082. pmid:35155518
  22. Kuzmar I, Consuegra J, Jiménez J, López E, Hernández J, Noreña-Peña A. Dataset for estimation of muscle Dysmorphia in individuals from Colombia. Data Brief. 2020;31: 105967. pmid:32671163
  23. Kuzmar I, Arroyo JRM, Villanueva MAC, Ortega LVS, Cortissoz GSG, Bandera XPG, et al. Dataset for the estimation of a new body fat measurement method. Data Brief. 2021;34: 106656. pmid:33385025
  24. Jaramillo M, Ruano J, Gómez M, Romero E. Endoscopic ultrasound database of the pancreas. 16th International Symposium on Medical Information Processing and Analysis. SPIE; 2020. pp. 130–135. https://doi.org/10.1117/12.2581321
  25. Valladales-Restrepo LF, Correa-Sánchez Y, Aristizábal-Carmona BS, Machado-Alba JE. Treatment regimens used in the management of Helicobacter pylori in Colombia. Braz J Infect Dis. 2022;26: 102331. pmid:35182470
  26. Fernandes Santos Alves R, de Moraes Mello Boccolini P, Baroni LR, de Almeida Relvas-Brandt L, de Abreu Junqueira Gritz R, Siqueira Boccolini C. Brazilian spatial, demographic, and socioeconomic data from 1996 to 2020. BMC Res Notes. 2022;15: 159. pmid:35538501
  27. Boccolini P de MM, Boccolini CS, de Almeida Relvas-Brandt L, Alves RFS. Dataset on child vaccination in Brazil from 1996 to 2021. Sci Data. 2023;10: 23. pmid:36631497
  28. Szwarcwald CL, Malta DC, Souza Júnior PRB de, Almeida W da S de, Damacena GN, Pereira CA, et al. Laboratory exams of the National Health Survey: methodology of sampling, data collection and analysis. Rev Bras Epidemiol. 2019;22 Suppl 02: E190004.SUPL.2. pmid:31596375
  29. Barbieri RR, Xu Y, Setian L, Souza-Santos PT, Trivedi A, Cristofono J, et al. Reimagining leprosy elimination with AI analysis of a combination of skin lesion images with demographic and clinical data. Lancet Reg Health Am. 2022;9: 100192. pmid:36776278
  30. Teodoro D, Sundvall E, João Junior M, Ruch P, Miranda Freire S. ORBDA: An openEHR benchmark dataset for performance assessment of electronic health record servers. PLoS One. 2018;13: e0190028. pmid:29293556
  31. da Costa GE, Fernandes GL, Rodrigues JCG, da V B Leal DF, Pastana LF, Pereira EEB, et al. Exome Evaluation of Autism-Associated Genes in Amazon American Populations. Genes. 2022;13. pmid:35205412
  32. Lerario AM, Mohan DR, Montenegro LR, Funari MF de A, Nishi MY, Narcizo A de M, et al. SELAdb: A database of exonic variants in a Brazilian population referred to a quaternary medical center in São Paulo. Clinics. 2020;75: e1913. pmid:32785571
  33. Pittet LF, Messina NL, Gardiner K, Orsini F, Abruzzo V, Bannister S, et al. BCG vaccination to reduce the impact of COVID-19 in healthcare workers: Protocol for a randomised controlled trial (BRACE trial). BMJ Open. 2021;11: e052101. pmid:34711598
  34. James WHM, Tejedor-Garavito N, Hanspal SE, Campbell-Sutton A, Hornby GM, Pezzulo C, et al. Gridded birth and pregnancy datasets for Africa, Latin America and the Caribbean. Sci Data. 2018;5: 180090. pmid:29786689
  35. Sorichetta A, Hornby GM, Stevens FR, Gaughan AE, Linard C, Tatem AJ. High-resolution gridded population datasets for Latin America and the Caribbean in 2010, 2015, and 2020. Sci Data. 2015;2: 150045. pmid:26347245
  36. Karlinsky A, Kobak D. Tracking excess mortality across countries during the COVID-19 pandemic with the World Mortality Dataset. Elife. 2021;10. pmid:34190045
  37. Hajjou M, Krech L, Lane-Barlow C, Roth L, Pribluda VS, Phanouvong S, et al. Monitoring the quality of medicines: results from Africa, Asia, and South America. Am J Trop Med Hyg. 2015;92: 68–74. pmid:25897073
  38. Balducci T, Rasgado-Toledo J, Valencia A, van Tol M-J, Aleman A, Garza-Villarreal EA. A behavioral and brain imaging dataset with focus on emotion regulation of women with fibromyalgia. Sci Data. 2022;9: 581. pmid:36138036
  39. Albores-Mendez EM, Aguilera Hernández AD, Melo-González A, Vargas-Hernández MA, Gutierrez de la Cruz N, Vazquez-Guzman MA, et al. A diagnostic model for overweight and obesity from untargeted urine metabolomics of soldiers. PeerJ. 2022;10: e13754. pmid:35898940
  40. Padilla-Rivas GR, Delgado-Gallegos JL, Montemayor-Garza R de J, Franco-Villareal H, Cosio-León MDLÁ, Avilés-Rodriguez G, et al. Dataset of the adapted COVID stress scales for healthcare professionals of the northeast region of Mexico. Data Brief. 2021;34: 106733. pmid:33521178
  41. Menzies NA, Suharlim C, Geng F, Ward ZJ, Brenzel L, Resch SC. The cost determinants of routine infant immunization services: a meta-regression analysis of six country studies. BMC Med. 2017;15: 178. pmid:28982358
  42. Valdes-Sosa PA, Galan-Garcia L, Bosch-Bayard J, Bringas-Vega ML, Aubert-Vazquez E, Rodriguez-Gil I, et al. The Cuban Human Brain Mapping Project, a young and middle age population-based EEG, MRI, and cognition dataset. Sci Data. 2021;8: 45. pmid:33547313
  43. Angeles-Valdez D, Rasgado-Toledo J, Issa-Garcia V, Balducci T, Villicaña V, Valencia A, et al. The Mexican magnetic resonance imaging dataset of patients with cocaine use disorder: SUDMEX CONN. Sci Data. 2022;9: 133. pmid:35361781
  44. Celi LA, Citi L, Ghassemi M, Pollard TJ. The PLOS ONE collection on machine learning in health and biomedicine: Towards open code and open data. PLoS One. 2019;14: e0210232. pmid:30645625
  45. Noble SU. Algorithms of Oppression: How Search Engines Reinforce Racism. NYU Press; 2018. Available: https://play.google.com/store/books/details?id=-ThDDwAAQBAJ
  46. Impact of healthcare algorithms on racial and ethnic disparities in health and healthcare. [cited 3 Jul 2023]. Available: https://effectivehealthcare.ahrq.gov/products/racial-disparities-health-healthcare/protocol
  47. Wilkinson MD, Dumontier M, Aalbersberg IJJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016;3: 160018. pmid:26978244
  48. Aboab J, Celi LA, Charlton P, Feng M, Ghassemi M, Marshall DC, et al. A “datathon” model to support cross-disciplinary collaboration. Sci Transl Med. 2016;8: 333ps8. pmid:27053770
  49. Van den Bossche J, Jordahl K, Fleischmann M, McBride J, Wasserman J, Richards M, et al. Geopandas/geopandas: v0.13.2. 2023. https://doi.org/10.5281/zenodo.8009629