International Glossina Genome Initiative 2004–2014: A Driver for Post-Genomic Era Research on the African Continent

Human African trypanosomiasis (HAT), also known as sleeping sickness, is a neglected disease that impacts 70 million people distributed over 1.55 million km² in sub-Saharan Africa, including at least 50% of the population of the Democratic Republic of the Congo [1]. Trypanosoma brucei gambiense accounts for more than 98% of the infections in central and West Africa, the remaining infections being from Trypanosoma brucei rhodesiense in East Africa [2]. The parasites are transmitted to their hosts through the bite of an infected tsetse fly. Disease control is challenging as there are no vaccines, and effective, easily delivered drugs are still lacking. Treatment invariably involves lengthy hospitalization, with both medical and socioeconomic consequences. The disease can be controlled, however, through vector control, which to date has largely aimed to reduce insect populations rather than eliminate them. In the mid-1990s, disease cases were increasing, but tsetse research and facilities that could maintain tsetse fly colonies were in decline globally, particularly in Africa. This was also a time when new scientific advances were being realized in other vector-borne disease systems, particularly building on the revolution in genome sequencing and genome-wide analyses.
 
In 2004, the Molecular Entomology program of the World Health Organization Tropical Diseases Research (WHO-TDR) unit began an initiative (International Glossina Genome Initiative, IGGI) that brought together an interdisciplinary group of researchers from multiple countries and institutions to explore possibilities for genomics and post-genomics research in the tsetse field. The first major goal of IGGI was to produce a reference genome sequence for a single Glossina species, work that is described in [3]. Alongside this direct output, however, the initiative had two further goals: first, to understand the genomic basis of unique aspects of tsetse biology, with a focus on applied discoveries that could be exploited to aid vector control; and second, to use the network to help build capacity, in Africa and globally, for genetics and genomics-based research into tsetse. This vision was crystallized at the outset with a decision to generate sustainable genomic resources on the African continent, and in turn to use these resources to provide training in genome analytical skills (Figure 1). The IGGI consortium met annually to organize training workshops, to initiate transcriptomics-based projects, to recruit financial resources for the whole-genome sequencing (WGS) initiative, and to track progress once the project was successfully launched. Wellcome Trust funding for the WGS project provided the much-needed momentum to ensure the project remained on track.
 
 
 
Figure 1. Glossina molecular toolbox to facilitate genomics training.


Data Curation as a Model for Genomics Training
In lieu of a completely sequenced Glossina genome, the consortium recruited global funds that enabled the development of a molecular toolbox. This initially included the production of expressed sequence tags (ESTs), which are sequences produced from the ends of cloned cDNA fragments, from 11 large cDNA libraries, along with the construction and sequencing of a bacterial artificial chromosome (BAC) library. The molecular toolbox became the training instrument for five ten-day bioinformatics workshops held at the University of the Western Cape, South Africa, involving a total of 75 students and researchers from 18 African countries (Table 1). Although, by their very nature, the EST libraries offered only a partial and highly fragmented picture of the genome, they were nonetheless an opportunity to gain insight into the molecular mechanisms and processes that govern the vectorial capacity of Glossina [4][5][6][7][8][9]. These workshops provided an environment for students to interact with international experts from Yale University, the European Bioinformatics Institute, the Wellcome Trust Sanger Institute, the Liverpool School of Tropical Medicine, and RIKEN. The manual annotation exercise on the Glossina transcriptome resulted in a curated Glossina dataset (www.vectorbase.org) and afforded graduate students an opportunity to acquire a skill set that could be transferred to other sequencing projects. At least 20% of participants have subsequently forged international collaborations with these trainers or have, themselves, become sought after as genomics trainers and supervisors of graduate students.

International Student Exchange Programs
The IGGI consortium held annual executive meetings, which either coincided with workshop activities or took place at scientific meetings where many members were present. During the annual IGGI consortium meeting in November 2009, the tsetse biological laboratories within the consortium established themselves as remote mentoring sites for the Glossina Functional Genomics Network. This enabled three African students to participate in a two-month exchange program hosted annually in laboratories in the United States and Europe. These graduate students gained additional skills in genomics and functional genomics, and the visits provided them with an established scientific network for academic support while they continued their graduate studies. The first cohort of the exchange program was subsequently funded to present their work at the bi-annual African Society for Bioinformatics and Computational Biology meeting in Cape Town, South Africa, in 2011. These students have since either completed their PhD degrees or are in the final stages of doing so. Four additional centers on the African continent were identified as partners within the Glossina Functional Genomics Network, in Malawi, Zambia, Sudan, and Uganda.
The skills development funding obtained from the WHO-TDR program was maximized to create a research-enabling environment in tsetse-endemic countries by supporting recipients of the exchange programs. To this end, in collaboration with Professor Erik Bongcam-Rudloff, Uppsala University, the EBIOKIT was adopted so that African tsetse research laboratories could carry out genomic analyses locally without relying on internet connectivity. The bioinformatics computing environment was rolled out in Kenya, where 44 participants received hands-on training in the many bioinformatics analytical tools embedded within the EBIOKIT. This international collaborative training session reached Egerton University's Faculty of Agriculture, Jomo Kenyatta University of Science and Technology, Moi University, Kenya Wildlife Services, and the Kenyan Ministry of Higher Education.

African Institution Partnerships
The impact of the IGGI consortium activities over the past eight years can be measured by the capacity development activities afforded to both students and junior researchers in Africa. The IGGI activities have been a catalyst for graduate training in genomics on the African continent. Beyond the training workshops, at least ten MSc and 12 PhD students were trained as a direct consequence of access to Glossina genome data (Table S1). Having an established network allowed some of the African participants to obtain additional research funds from the WHO-TDR Entomology Committee to begin a population genetics investigation in the Lake Victoria Basin. This initiative further enabled the network to recruit three PhD students to be trained in genomics research. Similarly, the Yale scientists within IGGI were successful in securing funds from the Fogarty International Center to help support capacity in tsetse research in East Africa. These funds allowed the network participants to organize workshops in Kenya and Uganda focusing on population genetics/genomics, bioinformatics, and functional genomics for tsetse researchers and students.
Researchers, on the other hand, who have been part of the IGGI consortium, have facilitated short courses at their local institutions; been invited to act as reviewers for international journals; and

Conclusion
The genomic activities of the IGGI consortium over the past ten years have provided a skill set for African researchers to exploit the recently sequenced Glossina morsitans morsitans genome. The completion of a ten-year genome project has cemented a scientific network of trypanosomiasis researchers that will be required to provide mentoring to new students and researchers as strategies are sought to best integrate the valuable genomic resources for the tsetse and the TriTryp genomes. Moreover, the involvement of scientists from disease-endemic countries from the start sets the tsetse genome project apart from other vector or parasite projects and will help to ensure a greater sense of ownership and long-term commitment. Access to the human genome sequence, the trypanosomatid genomes [10][11][12][13], and trypanosome 'omics data [14,15] has provided insight into host-parasite interactions and the identification of new vaccine candidates or chemotherapeutic targets. Targeting tsetse-trypanosome interactions with the goal of identifying genes that modulate trypanosome transmission has entered the realm of feasibility with the completion of the tsetse genome sequencing project.