Bioinformatics in Sudan: Status and challenges case study: The National University-Sudan

The ever-increasing applications of bioinformatics in providing effective interpretation of large and complex biological data require expertise in the use of sophisticated computational tools and advanced statistical tests, skills that are mostly lacking in the Sudanese research community. This can be attributed to limited development and promotion of bioinformatics, a lack of senior bioinformaticians, and the generally inadequate research funding in Sudan. In this paper, we describe the challenges that have confronted the development of bioinformatics as a discipline in Sudan. Additionally, we highlight specific actions that may help develop and promote bioinformatics education and training. The paper takes the National University Biomedical Research Institute (NUBRI) as an example of an institute that has tackled many of these challenges and strives to lead efforts in the development of bioinformatics in the country.

Introduction
Advances in bioinformatics continue to enable interdisciplinary research teams to handle, analyze, and interpret large biological datasets. Technological advances in genome sequencing and computational capacity have provided deeper insights into various fields, including biomedical sciences, healthcare, and agriculture [1]. The emergence of bioinformatics in Africa was closely tied to initiatives aimed at establishing strong bioinformatics capacities across the continent. In 1996, Winston Hide recognized the need for bioinformatics development when he founded the South African National Bioinformatics Institute (SANBI, http://www.sanbi.ac.za/) at the University of the Western Cape (UWC) [2]. Seven years later, several initiatives and training nodes were established in South Africa, Kenya, Tunisia, and Sudan, with the aim of providing training programs and workshops for African researchers and postgraduate bioinformatics students. In 2004, the African Society for Bioinformatics and Computational Biology (ASBCB) was established as a dedicated society for bioinformatics development in Africa [3]. Following its first conference in Kenya in 2007, the ASBCB has held biennial meetings accompanied by training workshops in partnership with the International Society for Computational Biology (ISCB) [4]. These meetings have introduced students and researchers to a wide range of opportunities in the field. The 2 most pivotal events, which occurred in 2010, were the launch of the Human Heredity and Health in Africa initiative (H3Africa, http://www.h3africa.org) and the development of a pan-African bioinformatics network (H3ABioNet, https://www.h3abionet.org/), in response to gaps in human genomic research in African populations and to develop bioinformatics capacity on the continent, respectively [5].
Recently, and despite the challenges described later, there has been notable progress in the development of bioinformatics on the African continent. This is manifested in the rising number of initiatives, research funding, degree programs (BSc, MSc, and PhD), internships, and training opportunities. It can be argued that bioinformatics education in Africa has been revitalized through H3ABioNet initiatives since 2010. As a result, bioinformaticians and wet lab researchers alike have become increasingly interested in workshops and academic programs in this field. These efforts, together with published studies on the development of bioinformatics in several countries [6,7], have inspired us to investigate the status of bioinformatics in Sudan. We examined the current state of bioinformatics education and research in the country by reviewing the number of Sudanese bioinformatics publications, studying the outcomes of genomics and bioinformatics research projects in national institutions, examining the effects of the shortage of trained bioinformaticians, and exploring logistical and infrastructural limitations. The challenges and proposed solutions are discussed in the light of our own initiatives at the National University-Sudan (NUSU).

Sudanese bioinformatics institutions
As of 2021, there are 4 Sudanese institutions that offer bioinformatics data analysis and education: (1) the Centre for Bioinformatics and Systems Biology (CBSB), University of Khartoum; (2) the Department of Molecular Biology and Bioinformatics at the College of Veterinary Medicine, University of Bahri, Khartoum; (3) the Department of Applied Bioinformatics and Genomics at the Africa City of Technology; and (4) the Department of Bioinformatics and Biostatistics at the National University Biomedical Research Institute (NUBRI), NUSU.

Challenges to the growth of bioinformatics in Sudan
Although bioinformatics has become well established in some parts of Africa, it is still in its formative stages in Sudan. This is attributable to many challenges, including limited access to local bioinformatics expertise and training; limited internet availability, low connection speeds, and unstable connections; and the lack of laboratory infrastructure. These difficulties have hindered bioinformatics development and education. Additionally, until recently, Sudan faced strict economic sanctions that obstructed funding and the possibility of international collaborations with reputable institutions. These restrictions also extended to a number of advanced massive open online courses (MOOCs) in molecular biology and bioinformatics. The expected relief from sanctions, although widely reported in local and international media, is yet to be fully implemented. The Coronavirus Disease 2019 (COVID-19) pandemic has further crippled an already struggling economy. Despite this, enthusiasm for bioinformatics among Sudanese researchers has grown in recent years as a result of ongoing workshops and courses. This interest, however, has not been matched by the establishment of a sufficient number of formal bioinformatics degree programs at the undergraduate and graduate levels. Analyzing the large datasets processed through bioinformatics pipelines requires a dedicated bioinformatics laboratory. Setting up such facilities remains difficult, however, even when experts with the required skills and experience are available. Expanding computing and IT infrastructure is therefore a priority, as it will allow the wide-scale application of bioinformatics in Sudan.
Moreover, the availability of technical facilities will put Sudan in line with global institutions in terms of capacities and will lead to more effective collaborations, the enhanced sharing of data, and, eventually, the translation of these research and academic efforts into science and clinical practice.

Publications in bioinformatics
The bioinformatics community in Sudan has been growing steadily, with researchers from various scientific backgrounds and diverse interests, including pathogen genomics, human genetics, computer science, and plant and animal genetics. The PubMed and ScienceDirect [24] databases were searched for the period 2003 to 2020 using the query terms "next-generation sequencing," "computational biology," "bioinformatics," "genomics," and "in silico," as described in [25]. An additional search term, "sequencing," was added to capture publications that used first-generation (Sanger) sequencing. After duplicate papers were discarded, all publications were filtered according to the following inclusion and exclusion criteria.
Inclusion criteria. The selection included articles written in English, authored by at least 1 scientist affiliated with a Sudanese institution, and incorporating bioinformatics techniques according to the definition in [26].
Exclusion criteria. We excluded publications affiliated with South Sudan (an independent country since 2011); publications available only as abstracts, for which we could not confirm whether a bioinformatics tool had been used; and reviews (40 papers), because they did not apply a bioinformatics tool.
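The screening steps above (deduplication followed by inclusion and exclusion criteria) can be made reproducible in code. The study does not say how screening was implemented, so the following is only a minimal sketch, assuming each retrieved record has already been annotated by hand with hypothetical fields (`language`, `countries`, `abstract_only`, `is_review`, `uses_bioinformatics`). Duplicates are matched on DOI where available and otherwise on a normalized title, which is one reasonable choice among several.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    # Hypothetical record layout; the study does not describe its data format.
    title: str
    doi: Optional[str]
    language: str
    countries: List[str] = field(default_factory=list)  # author affiliation countries
    abstract_only: bool = False
    is_review: bool = False
    uses_bioinformatics: bool = False  # judged manually against the definition in [26]


def dedup_key(rec: Record) -> str:
    """Prefer the DOI; fall back to a case- and whitespace-normalized title."""
    if rec.doi:
        return "doi:" + rec.doi.strip().lower()
    return "title:" + " ".join(rec.title.lower().split())


def deduplicate(records: List[Record]) -> List[Record]:
    """Keep the first record seen for each key, dropping later duplicates."""
    seen = set()
    unique = []
    for rec in records:
        key = dedup_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def meets_criteria(rec: Record) -> bool:
    # Inclusion: written in English, with at least one author affiliated with Sudan.
    # The exact match on "Sudan" means "South Sudan" affiliations do not count.
    if rec.language != "English" or "Sudan" not in rec.countries:
        return False
    # Exclusion: abstract-only records and reviews.
    if rec.abstract_only or rec.is_review:
        return False
    # Inclusion: the article applies a bioinformatics technique.
    return rec.uses_bioinformatics


def screen(records: List[Record]) -> List[Record]:
    """Deduplicate first, then apply the inclusion and exclusion criteria."""
    return [rec for rec in deduplicate(records) if meets_criteria(rec)]
```

Deduplicating before filtering mirrors the order described in the text, so each paper is judged against the criteria only once.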

Publications data analysis
The final outcome was 379 articles (Fig 1; S1 Table; retrieved on 10-5-2020). The included articles used methodologies that incorporate bioinformatics techniques, such as genome/exome sequencing, sequence alignment, phylogenetic tree construction, primer design, and single nucleotide polymorphism (SNP) analysis. The present study indicates that bioinformatics tools and software have been used to investigate a number of research areas in Sudan, as illustrated in Fig 2. These studies have mostly focused on animal research and human microbiology (malaria, leishmania, theileria, mycetoma, echinococcus, and staphylococcus), genetics, and genomics and, later, on human disease, drug design, plant studies, crop research, mycology, cancer research, population genetics, and human nutrition. Articles on subjects from Sudan are usually collaborative projects between scientists in Sudanese institutions and those in regional or international research centers. These projects include studies that analyzed the full genome sequence of dromedary camels to evaluate genomic diversity and signatures of positive selection [27], the phylogenetic analysis of Streptomyces isolates from soil [28], and the anti-hepatitis B virus activity of compounds isolated from Guiera senegalensis leaves [29]. Other articles investigated the prediction of an epitope-based vaccine against Middle East Respiratory Syndrome Coronavirus (MERS-CoV) [30], the identification of mutations in the Theileria annulata prolyl isomerase I gene (TaPIN1) [31], malaria drug resistance molecular markers [32], whole genome sequencing of Klebsiella pneumoniae [33] and Acinetobacter baumannii [34], and the function of EIL/EIN3 transcription factor genes in cotton fiber development [35].
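The breakdown by research area and subarea summarized in Fig 2 (with the underlying counts in S1 Data) amounts to a two-level tally over the included articles. A minimal sketch, assuming each article has been manually assigned one hypothetical (area, subarea) label; the example labels below are illustrative only and are not the study's data.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def tally(labels: Iterable[Tuple[str, str]]):
    """Count articles per research area and per subarea within each area.

    labels: one (area, subarea) pair per included article.
    """
    by_area: Counter = Counter()
    by_subarea: Dict[str, Counter] = defaultdict(Counter)
    # A single pass, so a one-shot iterator (e.g. rows streamed from a file) also works.
    for area, subarea in labels:
        by_area[area] += 1
        by_subarea[area][subarea] += 1
    return by_area, by_subarea


# Illustrative labels only; these are not the study's data.
example = [
    ("Human microbiology", "Malaria"),
    ("Human microbiology", "Malaria"),
    ("Human microbiology", "Mycetoma"),
    ("Plant studies", "Cotton"),
]
areas, subareas = tally(example)
for area, n in areas.most_common():
    print(area, n, dict(subareas[area]))
```

Because every article carries exactly one label in this sketch, the area counts always sum to the number of included articles; articles spanning several areas would need a different convention.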

Development of Bioinformatics in National University-Sudan
Despite the abovementioned challenges, NUBRI has been working since 2017 to overcome these obstacles and to lead efforts to advance bioinformatics applications and learning in the country. The institute continues to offer dependable solutions to research problems through the analysis and interpretation of biological data. The Department of Bioinformatics and Biostatistics at NUBRI uses a wide range of bioinformatics tools, supported by adequate computational and IT infrastructure, including network and storage systems. The research team comprises experienced researchers skilled in running bioinformatics pipelines as well as in wet lab methodologies and techniques. In this section, we describe NUBRI's efforts in the development of bioinformatics.
Fig 1. Number of hits obtained in a search of PubMed and ScienceDirect using the terms "next-generation sequencing," "computational biology," "bioinformatics," "genomics," "in silico," and "sequencing," filtered by authors affiliated with Sudanese institutions, from 2003 to 2020.

The mission and objectives of the bioinformatics and biostatistics department
Mission. The mission of this department is to (a) conduct bioinformatics and computational biology research relevant to Sudan; (b) develop human resources in bioinformatics and computational biology by educating and mentoring scientists; and (c) raise awareness of, and improve access to, bioinformatics and computational biology resources.
Fig 2. Research areas, research subareas, and topics retrieved from a search of PubMed and ScienceDirect using the terms "next-generation sequencing," "computational biology," "bioinformatics," "genomics," "in silico," and "sequencing," filtered by authors affiliated with Sudanese institutions. All the numerical data used in all figures are included in S1 Data. https://doi.org/10.1371/journal.pcbi.1009462.g002
Objectives. The objectives of the department are to (a) generate and publish high-quality biomedical research; (b) train competent researchers; (c) upgrade the academic programs of the NUSU; (d) enhance other research fields through collaborative projects; and (e) establish sources of sustainable funding to realize NUBRI's mission.
Infrastructure. One of the key challenges to the application of bioinformatics in Sudan is the lack of reliable access to a stable internet connection, which is essential for using web-based bioinformatics tools; even locally installed client tools often require access to online datasets. An even more significant obstacle is the unstable electricity supply to educational institutions. NUBRI has been able to overcome these challenges thanks to substantial technical, logistical, and financial resources, which have made its designated objectives attainable. The institute has invested in the software and hardware for networking, storage, and data analysis needed to process sequence data efficiently. These facilities also include a 24-hour power supply, robust internet connections, and a bioinformatics lab equipped with high-performance computers. The university has provided a server with high processing power, memory, and storage (2 × 12 Intel Xeon CPU, 350 GB DDR4 RAM, 10 TB SSD HD) that is suitable for large-scale data storage and computation.
Training capacity. Basic bioinformatics training programs are essential for increasing bioinformatics capacity. NUBRI has provided such opportunities with the ultimate aim of enhancing the development of bioinformatics in Sudan. It is committed to delivering a number of short training courses, available both on-site and online, throughout the year. This has supported students in their MSc and PhD programs in bioinformatics-related disciplines. Below is a brief description of the activities and programs offered at the institute:
1. Short-term training courses and workshops: These are intended to familiarize students and others interested in the field with a wide range of bioinformatics topics, including databases, DNA sequence analysis, and genome browsing. Table 1 lists the workshops and training programs held at NUBRI between 2018 and 2019. They were structured to engage individuals from diverse academic backgrounds. Most of these workshops were accompanied by hands-on activities that enabled participants to work with real data and bioinformatics tools. To continuously improve the quality of the workshops, participants were invited to fill in evaluation forms at the end of each workshop. They were also given handouts and suggestions for further reading and practice to help them apply what they had learned.

2. Research Attachment Program (RAP):
In 2018, the RAP was established to train students in one of the subdisciplines of bioinformatics. During their 2 to 3 months at the Department of Bioinformatics and Biostatistics, students have the opportunity to work on projects, analyze sequencing data, and participate in writing a scientific publication. The program focuses on training participants to use bioinformatics data packages and web-based interactive tools to analyze real data generated from the department's research projects. Between 2018 and 2020, the program recruited 50 researchers from various disciplines, including biotechnology, pharmacy, molecular biology, laboratory technology, veterinary medicine, human medicine, and dentistry.

3. Classrooms for H3ABioNet workshops:
In 2017, the department succeeded in hosting 1 of 20 classrooms of the Pan African Bioinformatics Network for H3Africa (H3ABioNet). The department has so far hosted classrooms for 3 introductory-level courses, as shown in Table 1. These 3-month courses are free of charge, with no application or participation fees. This has enabled participants from local universities and major research institutions to gain access to African expertise and bioinformatics trainers via video conferencing and discussion forums [36].

4. Staff training:
Researchers at NUBRI are encouraged to attend training workshops regularly to ensure that they have the skills and knowledge needed to collaborate effectively with regional and global bioinformatics institutes. This is supported through bursaries and travel grants. Trained staff, in turn, act as trainers and arrange courses to transfer skills to other university staff. Table 2 summarizes the participation of NUBRI staff in conferences and workshops worldwide.
5. E-learning: NUBRI has created an easily accessible electronic learning platform for its diploma and master's programs to conduct teaching sessions and assessments. NUBRI is also planning to establish an updatable education platform that facilitates knowledge transfer through lecture videos and hands-on practical exercises. Because bioinformatics is computer-intensive by nature, e-learning technologies benefit instructors by allowing easy preparation of teaching materials and efficient evaluation of students' progress. Learners, in turn, gain the freedom to learn at their own pace and to interact easily with other learners and instructors. Along these lines, the Department of Bioinformatics and Biostatistics is establishing a server platform for data and resource storage and for computational analyses. This server platform is a key achievement that will enhance international collaborations and provide online, country-wide, and region-wide training and services.
Degree programs. The NUSU is currently the only university in the country that offers a degree program in bioinformatics. Sudanese universities have made many attempts to introduce bioinformatics into their curricula, usually by integrating aspects of the discipline into existing life science degree programs. At NUBRI, we have designed courses to suit scientists from both biology and computer science backgrounds. This will help build multidisciplinary teams able to combine their respective basic science knowledge with newly acquired bioinformatics skills to handle complex datasets.
The bioinformatics master's degree program was established in 2019 and runs over 12 months. It comprises 13 coursework modules and a research project that runs concurrently throughout the program. The program equips students with the following: 1. a basic background in modern biology, biochemistry, cell biology, genetics, and molecular biology; 2. familiarity with computational methods to address problems of digital technology in molecular biology; 3. knowledge of the storage, retrieval, sharing, and use of biological information in core areas of bioinformatics: multiple sequence alignment, phylogenetic trees, genomics, and proteomics; and 4. skills in applied bioinformatics: immunoinformatics, drug design, and drug discovery.
The program's courses cover introductory programming and molecular biology before moving on to more advanced topics. The program is evenly divided between coursework and research. The coursework includes diverse modules covering mathematics, statistics, computer science, and biology, with an emphasis on applications to bioinformatics research. While establishing and running the program, we faced a number of challenges, such as the required infrastructure and administrative support, the range of topics, the diversity of students, and finding examiners and program evaluators. The institute actively recruits students from different backgrounds in the belief that diversity positively affects student interactions and the overall academic experience. To transfer knowledge and promote collaborations, international experts are invited to give workshops at the institute to keep students up to date with the latest bioinformatics tools and software used in big data analysis. Upon successful completion of the master's program, students should be able to (a) outline strategies and apply appropriate tools in bioinformatics; (b) identify potential bioinformatics applications drawn from ongoing research; (c) interpret data related to nucleotide and amino acid sequences, protein domains, and protein structures; (d) develop and implement tools that enable efficient data access and management; (e) apply essential programming skills; and (f) demonstrate an understanding of the algorithms used in bioinformatics.
In addition to the master's degree program, diploma and PhD programs are already being developed at NUBRI, as part of its vision to expand and promote the discipline in Sudan.
Collaborations. Building bridges between African and non-African scientists is one approach that can help improve Africa's participation in bioinformatics and facilitate the development of scientific capacity and productivity in the discipline [16]. At NUBRI, researchers are encouraged to collaborate and engage with bioinformatics researchers within the country and from institutions and organizations around the world, such as Universiti Putra Malaysia (UPM), the ISCB, the ASBCB, SANBI, and the National Institute for Communicable Diseases (NICD). The institute has collaborative projects with the NICD in the field of pathogen genomics. A number of NICD staff were invited to conduct workshops and to train NUBRI staff at their laboratories in South Africa. In another collaboration with South Africa, one of our staff members joined SANBI as an intern, received bioinformatics training, and subsequently initiated joint research projects. Partnership agreements with SANBI and other African institutes are currently underway to support bioinformatics training in Sudan through sponsored conferences, seminars, and internship programs. In 2018, a NUBRI researcher was trained in NGS analysis at UPM and succeeded in promoting collaborations, as evidenced by the institute's contribution to organizing the 3rd International Symposium and Workshop on Functional Genomics and Structural Biology (FGSB 2018) in Malaysia. These networks have contributed to the rapid development of bioinformatics research at the institute by providing training opportunities and enabling the sharing of bioinformatics resources.
Projects. For the bioinformatics research group at NUBRI, pathogen genomics is the core theme of ongoing projects, as shown in Table 3. NGS-based whole genome sequencing is being performed to study clinically relevant pathogenic microorganisms. The ultimate aim of these projects is to reduce the health burden of infectious diseases and help control them. Microbiology screening tests, nucleic acid extraction, and amplification using PCR and reverse transcription PCR (RT-PCR) are carried out in the well-established laboratories of NUSU and NUBRI. The bioinformatics analysis for these research projects is run by the bioinformatics department at NUBRI, using the available server to handle and store large volumes of biological data. In addition, a project to develop a user-friendly Linux operating system for bioinformatics analyses is under way, as listed in Table 3.
Future strategies. The NUSU aims to accelerate the implementation of bioinformatics in the country at the micro level and across the continent at the macro level. This requires a clear road map for dedicated research and support, as demonstrated by the South African experience [2]. Fig 3 illustrates our short-, medium-, and long-term strategic plans, which focus on providing enhanced educational programs, workshops, resources, scientific research projects, and collaborations. This plan was drafted after carefully following the progress of bioinformatics not only at NUBRI but also at other institutes in Sudan. Regular evaluation and feedback are important aspects of the plan, used to assess progress against the timeline and to provide the resources needed to meet objectives.
Short term (1 to 2 years). At NUBRI, bioinformatics resources and opportunities are regularly highlighted in discussions and meetings with researchers, faculty members, policymakers, and industry leaders. Interactions with researchers at national conferences and symposia have reaffirmed that bioinformatics research development must be a priority in Sudan because of its crucial role in and impact on other life sciences. One short-term project currently being established at NUBRI is an NGS laboratory. This is expected to increase the volume of data generated and enable the institute to expand its research beyond pathogen genomics. A website and a web-based educational portal are also in progress; these will provide researchers and students with access to training materials and bioinformatics protocols. Additionally, seminars and conferences are being organized to promote networking not only with educational institutions but also with governmental and industrial entities.
Medium term (2 to 4 years). With the establishment of the bioinformatics and NGS laboratory, NUBRI will be at the forefront of bioinformatics development in the country, able to sequence project samples independently and share results nationally and with international research groups. The goal is to collaborate on 3 projects per year by the end of 2024. NUBRI will also focus on securing funds from government-sponsored grants and international funding agencies, drawing on the successful South African experience, which has enabled South Africa to lead bioinformatics research in Africa. The institute will continue to host bioinformatics activities and work closely with H3ABioNet, SANBI, and the NICD. These collaborations will address Africa's increasing demand for bioinformatics training and research. Implementing these medium-term strategic goals will also increase Sudan's contribution to bioinformatics publications in Africa.
Long term (4 to 10 years). In the long term, we plan to align with the nationwide goals set by the Sudanese Ministry of Health and WHO, which aim to combat the growing threat of antimicrobial resistance by implementing the National Action Plan (NAP). This action plan was initiated to alleviate the health burden associated with both communicable and noncommunicable diseases in Sudan and Africa, and bioinformatics is necessary for understanding pathogen genomes and identifying drug targets within it. At the same time, PhD and postdoctoral training programs are to be established, as such opportunities will help tackle local problems efficiently and consequently improve the country's publication record.

Conclusions
The field of bioinformatics is rapidly evolving and is increasingly recognized as important to the work of scientists in Africa. In Sudan, interest in life science research and healthcare has created demand for bioinformatics training because of the need to analyze and interpret complex biological data. The field was previously overlooked, largely because of numerous challenges, notably limited funding and few local training opportunities. The NUSU and NUBRI have been leading efforts toward a capacity-building plan, detailed in short-, medium-, and long-term goals, to support bioinformatics education, training, and research. The institute has succeeded in establishing collaborations, hosting experts in the field, and providing training, and it has established a bioinformatics degree program. We hope that these efforts will build even stronger regional and international partnerships and establish Sudan as a key bioinformatics hub in Africa.
Supporting information
S1 Data. Excel spreadsheet containing, in separate sheets, the underlying numerical data for Figs 1 and 2.