EWET: Data collection and interface for the genetic analysis of Echinococcus multilocularis based on EmsB microsatellite

The evolution and dispersal history of organisms can best be studied through biological markers in molecular epidemiological studies. The biological diversity of the cestode Echinococcus multilocularis has been investigated using different cladistic approaches. First, the morphological aspects were explored in connection with its ecology. More recently, molecular aspects were investigated to better understand the nature of the variations observed among isolates. The study of the tandemly repeated multilocus microsatellite EmsB revealed a high level of genetic diversity where other classic markers had failed. Since 2006, EmsB data have been collected on specimens from various endemic foci of the parasite in Europe (in historic and newly endemic areas), Asia (China, Japan and Kyrgyzstan), and North America (Canada and Alaska). Biological data on the isolates and metadata were also recorded (e.g. host, geographical location, EmsB analysis, citation in the literature). To make available the data set of 1,166 isolates from classic and aberrant domestic and wild animal hosts (larval lesions and adult worms) and from human origin, an open-access web interface, written in PHP and connected to a PostgreSQL database, was developed in the EmsB Website for the Echinococcus Typing (EWET) project. It allows researchers to access the data collection, perform genetic analyses online (e.g. defining the genetic distance between their own samples and the samples in the database), consult distribution maps of EmsB profiles, and record and share their new EmsB genotyping data. To standardize the EmsB analyses performed in different laboratories throughout the world, a calibrator was developed. The final aim of this project was to gather and arrange the available data in order to better understand the dispersion and transmission patterns of the parasite among definitive and intermediate hosts, and thus to organize control strategies on the ground.


I. Introduction to the EmsB microsatellite
The EmsB marker is a microsatellite present in about 40 copies in the Echinococcus multilocularis genome, located on chromosome 5 [1]. The flanking regions of EmsB are highly conserved, but the microsatellite pattern (CA)n(GA)n presents a size polymorphism, with independent mutations occurring in the (CA) and the (GA) repeats (Figure 1). To study the microsatellite polymorphism in E. multilocularis specimens, a PCR targeting these repeats was developed, with primers designed in the microsatellite flanking regions [2]. The EmsB marker has been used for genotyping E. multilocularis on different geographical scales, from a micro-local scale to a regional scale [3][4][5][6][7][8]. These guidelines will allow researchers to produce and analyze EmsB data from their own samples, from DNA extraction to genotyping studies.

II. DNA extraction
Total genomic DNA is isolated and purified from a tissue sample (a single worm, an isolated egg or approx. 50 mg of metacestode), using a DNA extraction kit for tissue. The procedure is carried out according to the manufacturer's protocol. Purified DNA is eluted with 200 µl of elution buffer (provided by the manufacturer) for metacestode samples and single worms, or 100 µl for eggs, in order to obtain optimal DNA concentrations. The DNA concentration is checked with a spectrophotometer. The reported sensitivity limit of the EmsB-PCR is 1 fg of DNA [2]. Theoretically, DNA purified from one egg can therefore be used as a template for PCR. The DNA samples must be stored at -20°C until use in PCR.

III. Primers
Primers were designed in the highly conserved flanking regions of the microsatellite (Figure 1). The EmsB A primer is 20 bp long and 5'-labeled with a fluorochrome (e.g. FAM); it can be ordered as a "modified" oligonucleotide. The EmsB C primer, also 20 bp long, can be ordered as an "unmodified" primer.

IV. Amplification by PCR
The EmsB-PCR is performed in a 30 µl reaction mixture containing 50 to 100 ng of DNA, 200 µM of each deoxynucleoside triphosphate, 0.4 µM of forward primer 5'-labeled with a specific fluorescent dye, 0.7 µM of unlabeled reverse primer, and 0.5 U of a DNA polymerase (e.g. Taq DNA polymerase) with the corresponding PCR buffer. The PCR amplification is carried out in a thermocycler under the following conditions: an initial denaturation step at 94°C for 5 min, then 30 cycles of denaturation at 94°C for 30 s, annealing at 60°C for 30 s and extension at 72°C for 1 min, followed by a final extension step at 72°C for at least 10 min. A single PCR is enough to obtain an EmsB profile. It is possible (but not essential) to check the size of the PCR products by electrophoresis on a 1% agarose gel.
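
As an illustration of how the final concentrations above translate into pipetting volumes, the sketch below applies the dilution relation C1·V1 = C2·V2 to a 30 µl reaction. The stock concentrations (10 mM of each dNTP, 10 µM primers, 5 U/µl polymerase) are assumptions chosen for the example and are not specified in this protocol; substitute the values of your own reagents.

```python
REACTION_VOLUME_UL = 30.0

# Final concentrations taken from the protocol (µM)
FINAL_UM = {"dNTP mix": 200.0, "forward primer": 0.4, "reverse primer": 0.7}

# ASSUMED stock concentrations (µM); hypothetical, adapt to your reagents
STOCK_UM = {"dNTP mix": 10_000.0, "forward primer": 10.0, "reverse primer": 10.0}

POLYMERASE_UNITS = 0.5            # from the protocol
POLYMERASE_STOCK_U_PER_UL = 5.0   # ASSUMED stock activity


def stock_volume_ul(final_conc, stock_conc, reaction_volume_ul=REACTION_VOLUME_UL):
    """Volume of stock needed to reach final_conc in the reaction (C1*V1 = C2*V2)."""
    return final_conc * reaction_volume_ul / stock_conc


volumes = {name: stock_volume_ul(FINAL_UM[name], STOCK_UM[name]) for name in FINAL_UM}
volumes["polymerase"] = POLYMERASE_UNITS / POLYMERASE_STOCK_U_PER_UL

for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} µl")
```

The remaining volume is made up of PCR buffer, template DNA and water, in proportions that depend on the buffer supplied with the enzyme.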

V. Size polymorphism analysis
PCR products are studied by fragment size analysis. To assess the size polymorphism, an automatic sequencer can be used, such as the ABI Prism 3100 or 3500 (Life Technologies, Foster City, CA) or the Beckman CEQ 8000 (Beckman Coulter, Fullerton, CA). A molecular-weight size marker is used to determine the size of the PCR fragments. The fluorescence signal generated by the labeled primer is detected by colorimetric reading.
The sizes of the amplified fragments are then assigned by correspondence with the size marker, using dedicated software (e.g. Genotyper 3.7 for the ABI apparatus or Genetic Analysis System 8.0.52 for the Beckman apparatus).

VI. Example of EmsB electrophoregram and interpretation
After the fragment size analysis is performed, an EmsB electrophoregram is obtained (Figure 2). It shows the size standard (1) and a series of peaks (2) of different sizes (from 209 to 241 bp in E. multilocularis samples from endemic regions worldwide). The height differs from one peak to another and reflects the number of EmsB fragment copies of a given size. The size and fluorescence intensity of each peak are recorded in a spreadsheet (Table 2).
1. Recording EmsB data on a spreadsheet

a) EmsB fragment size
On the electrophoregram, the size of the fragments has to be adjusted (see the "raw peak size" and "adjusted peak size" lines in Table 2 - Step 1). Because the EmsB variations are due to the addition or deletion of a 2-nucleotide repeat unit, (CA) or (GA), the minimum distance between two peaks is 2 bp.
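
The adjustment rule itself is given in Table 2 and is not reproduced here. As a rough sketch of one way to do it, the function below snaps each raw size to the nearest point on a 2-bp grid. Anchoring the grid at 209 bp (the lower bound of the range reported above) is an assumption made for illustration, as are the raw sizes in the example; check the result against the calibrator and Table 2 before relying on it.

```python
import math


def adjust_peak_size(raw_size, anchor=209, step=2):
    """Snap a raw fragment size (bp) to the nearest value of anchor + k*step.

    anchor=209 is an ASSUMED grid origin, not a value prescribed by the protocol.
    """
    k = math.floor((raw_size - anchor) / step + 0.5)  # round half up
    return anchor + step * k


# Hypothetical raw peak sizes as exported by the sizing software
raw_sizes = [214.87, 217.12, 219.05]
adjusted = [adjust_peak_size(size) for size in raw_sizes]
print(adjusted)
```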

b) Fluorescence intensity
For each peak, the fluorescence intensity (FI) is recorded (Table 2 - Step 2). This intensity depends on the initial quantity of DNA used for the PCR. First, the lowest peaks (those under 10% of the highest peak) have to be removed. Second, the FI values are normalized: for each peak, the FI is divided by the sum of all FI values for the sample (e.g. for the peak at 215 bp, 6185/21797 = 0.28). The sum of the normalized values is therefore equal to 1. The distance between samples will be calculated from these normalized values. NB: use a point, not a comma, as the decimal separator. Fragment sizes and normalized FI (Table 2 - Step 3) are saved in a tab-delimited text file (*.txt) (Figure 3).

The relationships between samples are represented by generating distance dendrograms using the Unweighted Pair Group Method with Arithmetic mean (UPGMA), a simple agglomerative hierarchical clustering method based on pairwise similarity between units. The algorithm builds a tree reflecting the structure present in a pairwise similarity matrix. Here the tree is treated as a dendrogram, not as a phylogenetic tree, because the homozygous or heterozygous origin of the EmsB loci is unknown. At each successive iteration, the two nearest clusters are combined into a higher-level cluster. The distance between two clusters is the arithmetic mean of the distances between the isolates they contain.
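
The intensity filtering and normalization steps described in this section, and the pairwise comparison that feeds the dendrogram, can be sketched as follows. Only the 215-bp intensity (6185) and the total (21797) come from the worked example in the text; the other peak values are hypothetical and were chosen so that the total matches. The choice of the Euclidean distance is also an assumption, since the distance measure is not named here.

```python
from math import sqrt


def normalize_profile(peaks, min_fraction=0.10):
    """Filter and normalize the peaks of one sample.

    peaks: dict mapping adjusted fragment size (bp) -> fluorescence intensity.
    Peaks under min_fraction of the highest peak are removed, then each
    remaining intensity is divided by the sum of the remaining intensities.
    """
    threshold = min_fraction * max(peaks.values())
    kept = {size: fi for size, fi in peaks.items() if fi >= threshold}
    total = sum(kept.values())
    return {size: fi / total for size, fi in kept.items()}


def euclidean_distance(p, q):
    """ASSUMED distance measure between two normalized profiles.

    A peak absent from one profile counts as 0 in that profile.
    """
    sizes = set(p) | set(q)
    return sqrt(sum((p.get(s, 0.0) - q.get(s, 0.0)) ** 2 for s in sizes))


# 215 bp -> 6185 is from the text; all other intensities are hypothetical
sample = {213: 2950, 215: 6185, 217: 5412, 219: 4100, 221: 3150, 223: 400}
profile = normalize_profile(sample)

print(round(profile[215], 2))   # the 223-bp peak (400 < 618.5) was removed first
print(round(sum(profile.values()), 6))
```

Because every profile sums to 1 after normalization, samples amplified from different DNA quantities become directly comparable.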
Because UPGMA builds its clusters from the pairwise distances among all the isolates included in the analysis, the representation of the relationships between samples may change when isolates are added to or removed from the data set.
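
To make the clustering procedure concrete, here is a minimal pure-Python sketch of UPGMA applied to a small distance table. The isolate names and distances are hypothetical, and this is an illustration of the general method rather than the code used by the EWET interface.

```python
from itertools import combinations


def upgma(names, dist):
    """Agglomerate samples by UPGMA.

    names: list of unique sample labels.
    dist:  dict mapping frozenset({a, b}) -> distance between samples a and b.
    Returns the merges in order as (cluster_1, cluster_2, mean_distance),
    where each cluster is a tuple of sample labels.
    """
    def mean_distance(c1, c2):
        # UPGMA linkage: arithmetic mean over all between-cluster sample pairs
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        # Combine the two nearest clusters into a higher-level cluster
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: mean_distance(*pair))
        merges.append((c1, c2, mean_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(c1 + c2)
    return merges


# Hypothetical pairwise distances between four isolates
names = ["iso1", "iso2", "iso3", "iso4"]
dist = {
    frozenset(("iso1", "iso2")): 0.05,
    frozenset(("iso1", "iso3")): 0.30,
    frozenset(("iso1", "iso4")): 0.34,
    frozenset(("iso2", "iso3")): 0.32,
    frozenset(("iso2", "iso4")): 0.36,
    frozenset(("iso3", "iso4")): 0.10,
}

merges = upgma(names, dist)
for c1, c2, d in merges:
    print(c1, "+", c2, f"joined at {d:.3f}")
```

Each entry of `merges` corresponds to one node of the dendrogram, and the recorded mean distance is the height at which the two branches join. Rerunning the function on a subset of `names` shows how the joining heights depend on which isolates are included.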