Molecular Docking Simulations Provide Insights in the Substrate Binding Sites and Possible Substrates of the ABCC6 Transporter

The human ATP-binding cassette family C member 6 (ABCC6) gene encodes an ABC transporter protein (ABCC6), primarily expressed in liver and kidney. Mutations in the ABCC6 gene cause pseudoxanthoma elasticum (PXE), an autosomal recessive connective tissue disease characterized by ectopic mineralization of the elastic fibers. The pathophysiology underlying PXE is incompletely understood, which can at least partly be explained by the undetermined nature of the ABCC6 substrates as well as the unknown substrate recognition and binding sites. Several compounds, including anionic glutathione conjugates (N-ethylmaleimide; NEM-GS) and leukotriene C4 (LTC4) were shown to be modestly transported in vitro; conversely, vitamin K3 (VK3) was demonstrated not to be transported by ABCC6. To predict the possible substrate binding pockets of the ABCC6 transporter, we generated a 3D homology model of ABCC6 in both open and closed conformation, qualified for molecular docking and virtual screening approaches. By docking 10 reported in vitro substrates in our ABCC6 3D homology models, we were able to predict the substrate binding residues of ABCC6. Further, virtual screening of 4651 metabolites from the Human Serum Metabolome Database against our open conformation model disclosed possible substrates for ABCC6, which are mostly lipid and biliary secretion compounds, some of which are found to be involved in mineralization. Docking of these possible substrates in the closed conformation model also showed high affinity. Virtual screening expands this possibility to explore more compounds that can interact with ABCC6, and may aid in understanding the mechanisms leading to PXE.

Introduction ABCC6 (also known as Multidrug Resistance Protein-6, MRP6) is a transporter protein, belonging to the adenosine triphosphate (ATP)-binding cassette (ABC) family, primarily expressed in liver and kidney [1]. Perturbance of the ABCC6 transporter due to mutations in the encoding ABCC6 gene causes pseudoxanthoma elasticum (PXE), an autosomal recessive connective tissue disorder, affecting the skin, eyes and cardiovascular system, due to progressive mineralization and fragmentation of elastic fibers in the extracellular matrix [2]. Current evidence suggests PXE to be a metabolic disease, in which the defective ABCC6 protein fails to transport one or more metabolites from liver and kidney cells into the circulation [3,4,5,6]. However, until now only in vitro substrates of the ABCC6 transporter have been identified, mainly glutathione conjugates, demonstrating ABCC6 to be an organic anion transporter [6,7,8]. Next to the speculation on the main physiological substrates, no structural data on possible substrate recognition and binding sites of ABCC6 are presently available.
As currently no high resolution X-ray crystallographic structure for ABCC6 is available, 3D homology modeling may be an alternative to gain insights into potential substrate binding sites and the mechanisms of substrate interaction with the protein [9]. A 3D configuration of a prokaryotic ABCC6 protein was successfully modeled using the X-ray structure of the Staphylococcus aureus Sav1866 export pump [10]. Fulop et al (2009) [11] used this model as a template to build an outward-facing conformation (that corresponds to the ATP-bound or closed conformation state) 3D homology model for human ABCC6 and have used this model and the distribution of PXE-causing mutations to demonstrate the relevance of the transmission interfaces as well as the ABC-ABC domain contacts for the function of the transporter. Varadi et al (2011) [12] were also able to build an ATP-free/wide open inward facing confirmation of this 3D model (http://www.enzim.hu/,varadi/ABCC6/).
In this study, we have first generated an ABCC6 homology model in open conformation, and applied molecular docking of known in vitro substrates of ABCC6, to evaluate if we could predict potential sites responsible for protein-substrate interaction. Based on the prediction of binding pockets in the ABCC6 transporter, we carried out a virtual screening in the open conformation ABCC6 model using the Human Serum Metabolome Database (HSMD) against these substrate binding sites of ABCC6, to generate a list of potential substrate molecules. In the second phase, we have generated a closed conformation homology model of ABCC6 to evaluate the reproducibility of the open conformation docking results. The compounds in this list with the highest binding energy are discussed on their potential role in PXE pathophysiology.

Homology modeling of ABCC6
The primary sequence of the human ABCC6 protein was retrieved from the SwissProt Protein database (UniProt Accession: O95255) which has a sequence length of 1503 amino acids. In order to find suitable template(s) for the ABCC6 protein, PSI-BLAST against all existing molecules in the Protein Data Bank (PDB) was performed [13], which revealed highest homology of ABCC6 with P-glycoprotein (P-gp, mouse ABCB1 transporter). Compared to P-gp, ABCC1 has a higher homology with ABCC6, but no PDB file of ABCC1 is listed in the PDB database. P-gp was therefore used as a template to generate an open conformation ABCC6 3D homology model. To generate the open conformation structure, initially four models which include amino acid residues 302 to 1503 of ABCC6 were generated using three Metaservers including I-TASSER [14] Genesilico [15] and Pcons [16], and the MODELLER v9.9 program [17]. Models obtained from I-TASSER, Genesilico, Pcons and MODELLER were then used again as templates to build a final model of ABCC6 using MODELLER 9.9. Secondary structure elements were revealed by Genesilico Metaserver. The loop regions of the final model were rebuilt using Modloop [18], and the energy minimization of the resulting final model was performed using the YASARA energy minimization server [19].
To meet the observation of a higher homology between ABCC6 and ABCC1 compared to P-gp, we generated a second homology model with ABCC1 as a template. Because of its absence in the PDB database, the template was based on the only available ABCC1 model reported by DeGorter et al. (2008), which is only in a closed conformation [20]. Because a 3D homology model always follows the spatial arrangement of the template, we could thus only generate a closed conformation ABCC6 model based on the ABCC1 template. The MODELLER v9.9 program and the template counterparts from 302-856 and 943-1503 were used to generate this closed conformation 3D homology model. In addition, a full length 3D structure of ABCC6 was generated using MODELLER v9.9 containing all 1503 amino acid residues with P-gp as a template.

Model validation
The stereo-chemical quality of the final models were evaluated using three different servers (ERRAT [21], Rampage [22] and ProQ [23]), which are used to determine the statistical significance of a protein 3D model considering spatial position of amino-acids and overall stability of the structure. Usually more than 90% accuracy is expected for the validity of a model. Furthermore, template-target superposition based on structural alignment has been used in SuperPose [24] to understand the coverage between template and target-protein, and the reliability of the model.

Topology and intrinsically disordered region prediction
The membrane topology of the open conformation ABCC6 protein was predicted using software from the PONGO server [25]. It uses four selected and high scoring predictors -MEMSAT, TMHMM2, PRODIV and ENSEMBLE 1.0, and the predicted result can be displayed in a graphical view. As TMHMM [26] has been considered the best transmembrane prediction method, we have used the TMHMM server (http://www.cbs.dtu.dk/services/ TMHMM/) individually to assess the probability of the transmembrane domain position in the ABCC6 sequence (302-1503). Furthermore, the ABCC6 amino acid sequence (302-1503) was divided into four segments and submitted to the MetaDisorder server [27], a meta-server for the prediction of the intrinsically disordered region. Exclusion of these disordered regions was considered during generation of our ABCC6 models.

Electrostatic charge distribution
Electrostatic potential surface (EPS) of both the open and closed conformation ABCC6 modeled protein including its substrate binding sites was calculated using PBEQ solver (http://www. charmm-gui.org/?doc = input/pbeqsolver) and resulted outputs (3D structure with surface potential scale of 210 Kcal/mol to + 10 Kcal/mol) were displayed in PyMOL (http://www.pymol.org).

Ligand molecules selection
Initially, 8 previously reported in vitro substrates (Table S1) were designed and validated to use as reference ligands to perform molecular docking. Ligand coordinates (2D structure; Figure S1) were obtained from the DrugBank [28], PubChem Compound database (http://pubchem.ncbi.nlm.nih.gov/) and ZINC database [29]. Second, vitamin K1 (VK1) and vitamin K2 (VK2) were evaluated as ligands, as well as vitamin K3 (VK3). The latter was used as a negative control, as it has been previously shown not to be transported by ABCC6 [30]. Third, after delineation of predicted binding site(s), all 4651 metabolites listed in the Human Serum Metabolome Database (HSMD; http://www. serummetabolome.ca/scripts/hmdbDownload.cgi) were used to dock our ABCC6 open conformation 3D homology model. The substrate molecules with no available coordinates were drawn and converted into 3D structure, and generation of mol-files was carried out using the ACD/ChemSketch software [31]. Energy minimization of these compounds was then performed using the United Force field (UFF) program [32].

Molecular docking approach for the identification of substrate binding sites
To identify the substrate binding sites and possible binding pattern of 8 reported substrates and three vitamin K isoforms with the ABCC6 transporter protein, molecular docking was performed using Autodock 4.2 [33] and Autodock Vina [34]. In this study, a rigid docking protocol was considered in order to reduce the computational cost and time in which the receptor was kept rigid and ligands were allowed to rotate. For blind docking, in order to remove the bias, the whole protein transmembrane (TM) region was covered with a grid. The size of the grid box was set to 80 Å x 80 Å x 80 Å (x, y and z) with 0.925 Å spacing between the grid points. To search the best conformation space of the ligand, the Lamarckian Genetic Algorithm was employed. During the blind docking process, 100 different conformers were generated for each ligand with a population size of 150 individuals. A maximum number of evaluation was set to 2,500,000 and maximum number of generation to 27,000, while other parameters were kept as default.
A refined docking was then carried out in a specific area to optimize the results, for which the grid box size was also set to 80 Å680 Å680 Å (x, y and z) whereas the space between grid points was reduced to 0.55277 Å . For the docking process, the number of generation was reduced to 20 conformers for each ligand with a population size of 250 individuals. Maximum number of evaluation was set to 25,000,000 while maximum number of generation was set to 27,000. Other parameters were used as default. In both the initial blind docking and refined docking approach, selection of the final pose of ligand bound with the protein was done by giving priority to the lowest bindingenergy conformation with largest binding-cluster of the total conformers.
A detailed analysis of residues involved in the interaction between ligands and protein was conducted using the Discovery Studio 3.1 Visualizer software (Accelyrs Software Inc., Discovery Studio Modeling Environment, and Release 3.1.San Diego: Accelrys Software Inc., 2012).
Manual inspection was employed to investigate the occurrence of mutations in substrate binding sites (SBS). To identify the number of naturally found mutations residing in SBS, the aminoacids involved in SBS were compared to the list of natural variants in ABCC6 at UniProt (http://www.uniprot.org/uniprot/ O95255#section_features) and the unique mutations annotated in the Leiden Open Variation Database (LOVD) of the ABCC6 gene (http://www.ncbi.nlm.nih.gov/lovd/home. php?select_db = ABCC6).

Virtual screening and docking
First, structure based virtual screening against HSMD was carried out with Autodock Vina (for each compound 10 bound conformations were generated). During virtual screening the docking was applied only to the substrate binding sites in the transmembrane region. For this, the grid box was set to 70 Å670 Å660 Å with default value of 1.0 Å spacing by Autodock Vina. The grid box was large enough to cover the whole predicted substrate binding region. Furthermore, Autodock was used to verify the docking results by re-docking the resulted compounds, and to analyze the interaction between top scoring compounds and the ABCC6 protein models.

Homology modeling of human ABCC6
To generate a homology model of the ABCC6 protein, we initially performed a PSI-BLAST search towards the PDB database against the ABCC6 sequence extracted from UniProt (AC no: O95255). Generally, identity of $30% between the target and template sequence is widely accepted for comparative modeling [35]. However, for membrane transporter proteins identity of $20-40% is expected to establish a protein model of sufficient quality [36,37]. We found mouse P-gp (PDB ID: 3G5U, 3G60 and 3G61), and Sav1866 (prokaryotic transport pump) to have the highest sequence identity and query coverage among all the proteins available in the PDB database, and chose them as a template to generate an open conformation ABCC6 homology model. BLAST result shows that ABCC6 shares a 26% sequence identity and 57% query coverage with mouse P-glycoprotein, and 27% identity and 58% coverage with Sav1866. Though within the ABCC transporter family ABCC6 has highest sequence homology with ABCC1 (44.045%), no PDB file of ABCC1 is listed in the PDB database. However, De Gorter et al. (2008) published an ABCC1 closed conformation model [20]; comprehensive literature mining revealed this to be the only available template which we used to generate a closed conformation ABCC6 homology model. ABCC6 shows 43.97% sequence identity and a 90.57% query coverage with the human ABCC1 model. Generally, native state or an open conformation state is a more stable form of a protein and more eligible for substrates to bind. Thus, to investigate potential substrate binding sites of a protein, use of an open conformation model is first choice. Conformational change of the closed conformation model during binding and release of the substrate is a common phenomenon in ATP-dependent transporters. Moreover, a closed conformation model expected to show low affinity or a higher binding energy to the substrate compared to an open conformation model. For these reasons, we selected Pgp as template in our principle docking ABCC6 open conformation model. An additional advantage of P-gp is the presence of complex ligands in inward facing conformation, which can provide information about the substrate binding sites (SBS) in ABCC6. It has been previously shown that TMD 0 and L 0 are responsible for the membrane localization as N-terminal truncated ABCC6 mutants missing both transmembrane (TMD 0 ) and cytoplasmic linker (L 0 ) domains were found to be absent from the plasma membrane [3]. There is however no evidence of TMD 0 being related to substrate binding. Rather, the reported experiments indicate that the core region of ABCC6 is more likely to be involved in substrate recognition and binding. Moreover, as the counterparts of TMD 0 and L 0 are unavailable in databases, those external domains were excluded from the N-terminal region (1-301 amino acids) of the ABCC6 amino acid sequence. Contrary, our models contain the recently reported conserved C-terminal PDZ-like sequence-facilitating protein-protein interactions by acting as scaffold, regulating protein activity, protein stability, and protein mobility in the membrane -which was found to be critical for the regulation of trafficking and membrane localization of ABCC6. Removal of or mutation in this sequence results in decreased protein expression and increased degradation of ABCC6 [38]. We also have generated an open conformation 3D homology model including all 1503 amino acid residues by using P-gp as a template, but failed to overcome the quality threshold ( Figure S2) as discussed below.
To prevent generation of low quality models due to low targettemplate sequence identity, we used three different Metaservers (I-Tasser, GeneSilico, and Pcons) and the MODELLER program, and generated four initial models for each server and program. Initially, MODELLER used 3G5U, 3G60 and 3G61 as templates and generated a good open conformation model. I-TASSER also resulted in a reliable model with a C-value of 20.31 (the C-value indicates the confidence score calculated by I-TASSER for the predicted model and ranges from 25 to 2) and a TM score of 0.67 (the TM score measures the structural similarity between two structures; a TM score .0.5 indicates a model of correct topology) ( Figure S3). Then, by using these models as templates, the final open conformation model was generated by using MODELLER which generate a high quality model with a DOPE score of 2 90805.80 ( Figure 1). In PyMOL, two chains (chain A and chain B) were detected in the ABCC6 structure. Chain A had been constructed by the amino acids from 302 to 856 and chain B from 857 to 1503. Furthermore, the loop regions less than 12 residues of the final model (560-565, 973-979, 1189-1200, 1221-1225) were remodeled using the ModLoop server, as it is very difficult to model loops having more than 12 residues [39]. For the closed conformation model (Figure 1), only MODELLER program (obtained DOPE score of 2131757.60) and the counterparts of the template were used to generate the model.

Validation of 3D homology models
Validation of the stereo-chemical properties of the final models was done using three servers: Rampage, Errat and ProQ. Stereochemical quality results for the open conformation ABCC6 model obtained by Rampage ( Figure S4A) shows that 96.6% residues are in the favored region, 3.0% residues are in the allowed region and 0.4% residues are in the outliner region. In the closed conformation ABCC6 model, Rampage showed that 92% of the amino acids residues are plotted in the favored region, 7.1% residues are in the allowed region and 0.3% residues are in outliner region ( Figure S4B). Third, an open conformation fulllength ABCC6 model ( Figure S2A) containing all amino acid residues (thus including the disordered region) did not pass the quality threshold as only 77.5% of amino acid residues were plotted in the favored region ( Figure S2B). The residues in the outliner or disallowed region belong to either the disordered region or the NBD domain and are thus less likely to play a role in    Figure S4C, S4D), where a value above 90% usually indicates a high quality model [21].  Figure S6), shows the closeness between these two models. Thus, overall quality validation revealed that the open conformation ABCC6 homology model generated using P-gp as a template is the most accurate model to perform molecular docking analysis.

Analysis of topology and secondary structure, and prediction of protein disordered region
According to previous studies, the membrane topology model of human ABCC6 is composed of 17 transmembrane helices with the topology arrangement of TMD0-L0-TMD1-ABC1-L-TMD2-ABC2, two nucleotide-binding domains (NBD1 and NBD2) and two Walker motifs Walker-A and Walker-B [3]. The TMD0 is formed by five membrane spanning helices whereas the TMD1 and TMD2 domains are formed by 6 membrane helices each [3,40]. The predicted result found by the PONGO server (presented only for the open conformation ABCC6 model in Figure S7) in our study shows a similar and consistent conformation. This result is also supported by the TMHMM server result where it showed a plot of posterior probabilities (0 to 1) of inside, outside and TM helix. At the top of the plot (between 1 and 1.2) the N-best predictions are shown. TMHMM showed two major spots of transmembrane helices from 302-600 and 900-1225 as expected. In TMHMM, the first and second portion present 6 and 4 membrane helices respectively, while PONGO predicts 6 membrane helices in both portions.
The intrinsically disordered/unstructured region of a protein is the region which does not have a well-defined three dimensional structure and can exhibit a large number of conformations over time [41]. The region between amino acid residues 856 to 943 in our study was found to be disordered and less likely to form a stable structure for the ABCC6 protein ( Figure S8). Indeed, the model including this region did not fulfil the quality criteria. This disordered region was removed from the ABCC6 sequence in the open and closed confirmation model, and thus ignored for model generation and blind docking analysis. Since the substrate binding sites of the ABCC6 transporter are yet to be defined, we used a blind docking approach by using Autodock tools for the 8 compounds that were reported to be transported in vitro by ABCC6 as reference substrates to our models. In addition, because of the suggestion of involvement of vitamin K in the PXE pathophysiology, we also performed blind docking for the three known vitamin K isoforms: VK1, VK2 and VK3. The latter has already been shown not be transported by ABCC6 and was thus used as a negative control [30].
In the blind docking approach, by using the grid volume that covers the whole transmembrane region of the protein for the 10 compounds (8 reported substrates, VK1 and VK2) distributed over the open conformation homology model the potent binding areas were predicted. Further, a refined docking for specific areas was carried out to optimize the results. This refined docking approach resulted in 20 different binding conformations for each compound. These conformations were clustered together with RMSD (root mean square deviation) ,4 Au. The lowest binding energy conformation from the largest cluster was chosen.
We noted that some ligands that bind in these predicted binding sites also partly overlap with each other (Figure 2A and Figure 3), thus resembling a large flexible cavity instead of two separate binding pockets. The presence of such a large flexible substrate binding cavity has been shown for several ABC transporters (e.g. P-gp, ABCC1) and allows the possibility of a wide range of substrates [42,43].
On the other hand, refined docking of the 10 reference compounds shows that all the compounds bind in a specific area of the closed conformation ABCC6 model. Structure analysis revealed only Chain-A (302-856AA) to be involved in the substrate binding site conformation (substantial amino acids are Ser470, Gln540, Thr543, Asn575, Asn578, Lys579), which revealed .55% amino acid residues similarity to SBS-1 in the open conformation ABCC6 homology model. Taking into account the closed conformation refined docking result, this is suggestive for ABCC6 to have a single big flexible binding pocket. However, we need to take into account that the closed conformation model result may be less reliable and definite conclusion at this point are therefore not possible.
Remarkably, average docking energy of Autodock and Autodock Vina in both models showed highest binding affinity for compound BQ-123 followed by teniposide. BQ-123 was seen to interact with the active site residue Lys579 by forming two hydrogen bonds with bond distance of 1.73Au and 2.26Au and single hydrogen bond with residue Gln1217 with bond distance of 1.75Au. Teniposide was found to interact with residue Gln540 by two hydrogen bonds with bond distance of 2.375Au and 2.166Au and single hydrogen bond with residue Thr543 with bond distance of 1.93Au. In contrast, VK3 shows low binding affinity and higher binding energy (than the average of 10 reference compounds). Moreover, this binding energy was generated for an outer face site in the open conformation ABCC6 model ( Figure S9). Its low affinity and lack of specificity for a binding site is consistent with VK3 not being transported by ABCC6. The final docking result of the 10 compounds along with corresponding binding energy for both models is illustrated in Table 1 and the representative binding mode is shown in Figure 4. Residues involved in hydrophobic interactions with corresponding substrate molecules are shown in Table S2.
Furthermore, multiple sequence alignment with the two most closely related members of the human ABC family, ABCC1 and ABCC2 shows that most of the residues in the predicted binding regions from ABCC6 are found to be conserved with those residues of ABCC1 or ABCC2 that are reported to play significant role in substrate binding ( Figure S10) [44,45,46,47]. Residues such as Trp1218 contributing in hydrogen bonding and hydrophobic interactions between ABCC6 and its substrates, was found conserved. Asn575, Asn578, Lys579, Thr1213 and Arg1221 residues are conserved among ABCC6 paralogs, found in the substrate binding pockets of other ABC proteins, and have been described to be efficient for substrate specificity and transport function [45,47,48,49,50]. As shown in Table S2, the majority of residues that contribute in the interactions with substrate molecules are present in TM 9, 10, 11, 16 and 17. This is consistent with a number of observations suggesting that ABCC1, ABCB1 and other related ABC proteins possess multiple substrate binding sites and that the transmembrane domains such as TM9, 10, 11, 16 and 17 play a significant role in substrate recognition and binding [42,51,52].
Evaluation of the mutations which affect the amino acids involved in SBS revealed that, among 288 known unique mutations (www.ncbi.nlm.nih.gov/lovd/), only 16 mutations have been found to be involved in SBS-1, and 4 mutations have been found in SBS-2 of the open conformation model ( Table 2). This result may signify that few ABCC6 mutations cause disruption of substrate binding to ABCC6, but that most disease causing mutations of ABCC6 are probably involved in Nucleotide Binding Domain (NBD) or other portions of the protein that are important for protein localization, so that it causes the inactivation or mislocalization of ABCC6. This was recently also suggested by Xue et al. [38].

Electrostatic Potential Surfaces (EPS)
EPS of the ligand recognition area in the ABC transporters is of particular interest, since EPS can elucidate the substrate differentiation between these transporters [53]. For this reason, the EPS value of our models was calculated using the PBEQ solver program. Analysis of the electrostatic charge distribution in our models revealed variations in surface electrostatic at different cavities in the ABCC6 protein ( Figure 5). Our study shows that the predicted substrate binding site(s) of ABCC6 model exhibit an overall positive (or polar) with neutral (or hydrophobic) and large negative area ( Figure 5). Most of the positively charged residues (in blue color) were found in the cytoplasmic part of the protein, whereas mostly neutral (or hydrophobic; in white color) with some spots of positively and negatively (in red color) charged residues were found at the transmembrane region of the protein surface. The presence of hydrophobic pockets with spots of negatively and positively charged residues indicate the role of these residues in electrostatic interaction with specific substrates [54]. Moreover, this also suggest that the negatively charged substrates such as S-2(2, 4-dinitrophenyl glutathione), Leukotriene (LTC4) and BQ-123 might interact electrostatically with positively charged residues within two predicted binding regions whereas the positively charged and neutral hydrophobic substrates such as Doxorubicin, Daunorubicin, Etoposide, Teniposide, vitamin K1 and vitamin K2 might interact electrostatically with negatively charged residues of the ABCC6 open conformation model. Since the majority of multidrug transporters transport drugs with a positive charge, it is reasonable to assume the involvement of negatively charged intra-membranous acidic residues in substrate binding [54]. The role of these residues in substrate recognition has already been reported in number of MDR proteins such as the E.coli MdfA protein [55] and in murine MRP1, where a single 'Glu' residue at the TM15 region was found to have a role not only in transporting cationic anthracyclines but was also important in transporting conjugated organic anions, such as LTC4 [56].

Virtual screening of Serum Metabolome Database
The virtual screening of 4,651 compounds from HSMD in the ABCC6 open conformation model using mean binding energy 2 8.5 Kcal/mol for Autodock Vina and 27.2 Kcal/mol for Autodock from blind docking as a cut-off threshold value (Table  S1) resulted in the selection of the top 50 compounds (1.075% of the total compounds in HSMD) ( Table 3)  From the listed 50 compounds, several compounds are involved in vitamin D metabolism, disturbances of which can play a role in calcification [57,58]. The highest affinity for SBS-1 with a binding free energy of 210 Kcal/mol, was found for 25-Hydroxyvitamin D2 (HMDB01438) in ABCC6 open conformation model. It forms 3 hydrogen bonds with residues Thr543, Gln957, Arg964 with bond distance of 2.1Au, 1.8Au, and 2.3Au respectively (shown in Figure 6). Besides vitamin D metabolites, several other compounds were found to be involved in in calcification and calcification related mechanisms ( Table 3). Several of the listed compounds, such as HMDB01830, HMDB00760, HMDB00907, HMDB00121, HMDB00761, HMDB00917, HMDB00946, HMDB01926, HMDB01562, and HMDB00326, have previously been reported to be transported by other ABC transporter proteins [59]. Further in vitro and in vivo validation is necessary to evaluate whether the actual substrates of ABCC6 are among these compounds.

Conclusion
The current study provides an insight of the 3D structure and predicted substrate binding sites of the ABCC6 transporter. Molecular docking of reference substrates predicted two binding sites or a large flexible binding pocket of the protein, which allowed searching novel substrates of ABCC6 by performing ligand-based virtual screening. We have identified compounds from HSMD which bind with high affinity to the predicted substrate binding sites of the ABCC6 protein. The amino-acid residues which are involved in the predicted binding sites are conserved among ABC transporters and were also found to reside in the binding sites of ABCC1 and ABCC2 ( Figure S10). Though    this can be seen as indirect support of the relevance of our findings, the main limitations of this study remain that it is based on in silico modeling and prediction. For the blind docking experiment with HSMD metabolites, we have so far only considered the 50 compounds with highest binding affinity. We should also consider the possibility that the substrate of ABCC6 is ranked lower than the top 50 of these compounds, that more than one of the top 50 compounds can be transported at minimal level by ABCC6 and that there may be a third substrate binding site for the main substrates of ABCC6. Further, we cannot exclude that the ABCC6 substrate has no direct relation to mineralization but is an intermediate in a regulatory process. Nevertheless, the results of this study provide opportunities to further analyze in vitro and in vivo compounds which may interact with ABCC6, with potential insights into the function of this transporter and subsequently the mechanisms underlying PXE.   The core portion (300 to 600 amino-acid positions) of ABCC6 between two transmembrane bundles was found to be highly conserved among the templates and target sequence. The probable helix beta sheet has been indicated by small boxes; the conserved amino acids are indicated by the red boxes.

Supporting Information
(TIF) Figure S7 Prediction of transmembrane (TM) topology of the ABCC6 protein model using the PONGO server. PONGO server (used different programs including MEMSAT, TMHMM2, PRODIV and ENSEMBLE 1.0) revealed the presence of 12 alpha-transmembrane domains (shown in red); 6 of them are located in between 302-600 amino acid residues, the remaining 6 are located in between 945-1225 amino acid residues. (TIF) Figure S8 Structural disordered region of ABCC6 as identified using the MetaDisorder server. The plot shows the results from a series of predictors including-(a) DISOPRED2 [89], (b) PrDOS [90] (c) IUPred short [91] (d) IUpred long [91] (e) Pdisorder (SoftBerry Product) (f) DISPROT [92] and GlobPlot [93]. The region between amino acids 850-950 was found to be disordered with a Disorder tendency value greater than 0.5.