Cysteine proteases in protozoan parasites

Cysteine proteases (CPs) play key roles in the pathogenesis of protozoan parasites, including cell/tissue penetration, hydrolysis of host or parasite proteins, autophagy, and evasion or modulation of the host immune response, making them attractive chemotherapeutic and vaccine targets. This review highlights current knowledge on clan CA cysteine proteases, the best-characterized group of cysteine proteases, from 7 protozoan organisms causing human diseases with significant impact: Entamoeba histolytica, Leishmania species (sp.), Trypanosoma brucei, T. cruzi, Cryptosporidium sp., Plasmodium sp., and Toxoplasma gondii. Clan CA proteases from three organisms (T. brucei, T. cruzi, and Plasmodium sp.) are well characterized as druggable targets based on in vitro and in vivo models. A number of candidate inhibitors are under development. CPs from these organisms and from other protozoan parasites should be further characterized to improve our understanding of their biological functions and identify novel targets for chemotherapy.


Introduction
Proteases are enzymes that catalyze the hydrolysis of peptide bonds and are important in a number of biological activities, including digestion of peptides, activation of other enzymes, modulation of the immune system, participation in the cell cycle, differentiation, and autophagy. There are at least 6 classes of proteases, classified according to the nucleophilic group responsible for the first step in proteolysis: serine, cysteine, metallo, aspartate, glutamate, and threonine proteases. Cysteine proteases (CPs) are categorized into 72 families, but not all are represented in protozoan parasites [1]. The most abundant and best-characterized CPs in these organisms are the clan CA papain-family enzymes, named after papain, an abundant protease present in papaya fruit. We selected 7 protozoan organisms with medical relevance for review of the current knowledge regarding their clan CA CPs. Table 1 summarizes well-characterized clan CA CPs from these organisms, and Table 2 lists some well-studied inhibitors.

Table 1 (excerpt)
Plasmodium sp. | Falcipain-1 | PF3D7_1458000.1 | Oocyst production [110]
Plasmodium sp. | SERA family. Atypical CP and dipeptidyl aminopeptidase 3.
Toxoplasma gondii | | TGME49_249670 | Cell invasion and replication [140,142] | Single copy gene
Toxoplasma gondii | CatL (TgCPL) | TGME49_321530 | Host cell invasion, digestion of cytosolic proteins [134,141]
Toxoplasma gondii | TgCPC1 and TgCPC2 (CatC-like), exopeptidases | TGME49_289620, TGME49_276130 | Expressed in tachyzoites, degrade peptides [137]

disease is the most lethal, killing over 60,000 people per year [20]. The factors determining clinical manifestations are not fully understood; these are strongly influenced by parasite strain or species and by immunological factors [21]. Leishmania express a broad range of CPs, the best characterized of which are CPA, CPB, and CPC (CPs A, B, and C), all part of clan CA, family C1. CPA and CPB are cathepsin L-like and likely show some functional redundancy [22], while CPC is cathepsin B-like. Cathepsin B-like and cathepsin L-like enzymes have key structural differences associated with divergent substrate binding and proteolytic mechanisms. In cathepsin L, binding of peptide substrates spans the entire channel between the two domains of the protein, leading to endopeptidolytic activity. In contrast, cathepsin B has an additional occlusion loop that restricts substrate binding at low pH, leading to carboxypeptidase activity; this loop is displaced at high pH, at which point cathepsin B displays endopeptidase activity [23,24].
CPB genes are arranged in tandem arrays [25], with 19 copies in L. mexicana, 8 in L. major [26], and 5 in L. chagasi [27], while CPA and CPC are single copy [28,29]. CP gene expression is stage regulated; most are expressed at higher levels in the mammalian amastigote stage than in the insect promastigote stage [25,29]. However, CPB1 and CPB2 are expressed at higher levels in metacyclic promastigotes [30] and CPC in procyclic promastigotes [28]. Consistent with elevated CP expression in the amastigote stage, Leishmania CPs play key roles in the interactions between Leishmania and its mammalian host. CPA-knockdown or knockout L. chagasi and L. infantum showed decreased infectivity in vitro [31] and in vivo [32]. Likewise, CPB-knockout L. mexicana displayed impaired macrophage infectivity and delayed lesion progression [33]. Multiple CPB copies are required for significant restoration of virulence [34]. CPB modulates host responses to L. mexicana by down-regulating protective Th1 immune responses, and in particular IFN-γ production, via degradation of the transcription factor NF-κB and subsequent inhibition of IL-12 production by infected macrophages [35]. L. mexicana CPs also cleave JNK and ERK MAP kinases. Because both kinases participate in the signaling that regulates IL-12 production, CPB can thereby alter host macrophage signaling and suppress IL-12 transcription [35]. L. amazonensis likewise inhibits antigen presentation via CP-mediated degradation of MHC class II molecules [36]. CPB also modulates levels of parasite proteins, including gp63 [37], a major virulence factor in Leishmania. Episomal expression

Trypanosoma brucei
Human African trypanosomiasis (sleeping sickness), caused by various subspecies of T. brucei, is found in sub-Saharan Africa. The flagellated parasite crosses the blood-brain barrier, causing fatal damage in the central nervous system, with approximately 7,000 deaths per year [20]. As with Leishmania, the best-characterized T. brucei CPs are cathepsin-like enzymes: cathepsin L (TbCatL, also known as brucipain or rhodesain, depending on the T. brucei subspecies) and cathepsin B (TbCatB) [49].
TbCatL and TbCatB are both involved in virulence. TbCatL promotes parasite crossing of the blood-brain barrier via activation of host G-protein-coupled receptors such as PAR2 (protease-activated receptor 2) and subsequent induction of host calcium-signaling pathways [

Trypanosoma cruzi
Chagas disease is endemic in the Americas and is the main cause of heart failure in Latin America, leading to more than 12,000 deaths every year [66]. It is caused by the parasite T. cruzi, whose genome encodes CPs from four clans: CA, CD, CE, and CF. Clan CA includes the most abundant protease of this parasite: cruzain (cruzipain), a cathepsin L-like member of the papain family (C1).
Cruzain is present in epimastigotes and bloodstream trypomastigotes and has been identified as a major antigen in infected humans [67]; for that reason, it has been considered as a vaccine candidate [68]. It plays important roles in differentiation [69], metabolism [70], evasion of the immune response, and invasion of host cells [71]. In trypomastigotes (the infective stage), cruzain is localized in the flagellar pocket, whereas in intracellular amastigotes, the enzyme is on the cell surface. Recombinant cruzain was expressed in bacteria and demonstrated protease activity after autocatalytic activation [72]. The crystal structure of cruzain complexed with the inhibitor Z-Phe-Ala-fluoromethyl ketone was solved, confirming structural similarities with papain.
Genes coding for a 30-kDa cathepsin B-like CP, two other cathepsins, and homologues of calpains (family C2) are also present [74]. Other members of clan CA include the autophagin-like Atg4 protease (family C54), which processes Atg8 for the formation of autophagosomes; the resulting autophagy is essential for metacyclogenesis and virulence [75,76]. Atg4 has recently been proposed as a potential new target for chemotherapy [76].
The idea of developing CP inhibitors as Chagas disease chemotherapy originated in observations of in vitro antiparasitic effects associated with cruzain inhibition. Because of multiple copies of the cruzain gene, gene knockout could not be achieved to confirm the essentiality of the enzyme, even though "chemical validation" with protease inhibitors suggested that it is essential. The first proof of concept in an animal model showed that mice could be rescued from a lethal T. cruzi infection with the vinyl sulfone inhibitor K11777 dosed for 20 days [3]. Surviving mice had negative hemoculture, indicating parasitological cure. The same inhibitor was later tested in a dog model of Chagas disease, and it prevented cardiomyopathy after 7 days of treatment [2].
A number of other cruzain inhibitors are under study. Computational screening of the ZINC database identified inhibitors with nanomolar potency against cruzain [77]. Some isatins also have the thiosemicarbazone functionality, a well-known cruzain inhibitor group [78]. However, isatin compounds without a thiosemicarbazone also demonstrated inhibitory activity against cruzain [79]. These compounds have a peptide sequence recognized by the catalytic site of cruzain as well as an epoxy electrophile reminiscent of the E-64 inhibitor. Odanacatib, a reversible inhibitor of human cathepsin K with an amino nitrile warhead, was in Phase III trials to treat osteoporosis, but its development was discontinued due to risk of stroke (see the associated chapter, "Cysteine proteases as digestive enzymes in parasitic helminths," for information on odanacatib efficacy against hookworm infection). The nitrile warhead forms a reversible thioimidate (albeit with a slow off rate) with the active-site cysteine of human cathepsin K [80]. Inspired by this approach, reversible inhibitors have been developed against cruzain with 100× selectivity compared to human cathepsins L, B, S, and F. Further optimization of these compounds led to the discovery of Cz007 and Cz008, with IC50s against cultured parasites in the nanomolar range and antiparasitic efficacy in a mouse model of Chagas disease [81]. More recently, the vinyl sulfone WRR-669 was demonstrated to be a noncovalent inhibitor of cruzain. These results, showing both in vitro and in vivo efficacy against T. cruzi, support cruzain as a valid and druggable target for Chagas disease. A review published in 2015 highlights inhibitors tested against cruzain [83].

Cryptosporidium parvum
Cryptosporidiosis, caused by Cryptosporidium sp., is a concern worldwide. In a large study in sub-Saharan Africa and southern Asia, this protozoan was the second most common cause of moderate-to-severe diarrhea in infants and the third most common cause in toddlers, accounting for an estimated 2.9 and 4.7 million cases annually in children <2 years old in these regions, respectively [84]. Patients with acquired immune deficiency syndrome (AIDS), people receiving immunosuppressive treatment, patients with inherited immunodeficiency, people with diabetes, infants, and elderly or malnourished people are the most susceptible to severe cryptosporidiosis [85]. Outbreaks have been described among caregivers and students of veterinary hospitals after contact with calves infected with C. parvum, the zoonotic species [86]. A major outbreak was reported in Milwaukee in 1993, in which 403,000 people were infected and the cost of outbreak-associated illness was US$96.
Recently, otubain protease (OTU), a CP that participates in the ubiquitin pathway, has also been identified [92]. The biochemical properties of the otubain-like CP of C. parvum (CpOTU) were characterized, and the enzyme may have an essential function during the oocyst stage of the parasite, when its expression reaches maximum levels [93]. The protein contains an unusual C-terminal extension (217 amino acids) compared to other OTUs previously identified in human, mouse, and Drosophila, and deletion of the extension resulted in complete loss of enzyme activity.

Plasmodium sp.
Malaria is by far the deadliest parasitic disease of humans, with 446,000 or 631,000 deaths estimated in 2015 using different modeling approaches [94,95]. Plasmodium falciparum and P. vivax are the most common species infecting humans, with P. falciparum responsible for nearly all deaths. The parasites express multiple CPs, some of which are the subjects of recent reviews [96-98]. For P. falciparum, the genome sequence predicts 33 CP-like proteins, although a number of these are probably not active enzymes. The best characterized are 4 falcipains and 3 dipeptidyl aminopeptidases, all clan CA proteases [96].
The functions of plasmodial CPs have been characterized using selective CP inhibitors and in some cases by gene knockout. Erythrocytic malaria parasites import erythrocyte cytosol and degrade hemoglobin in an acidic food vacuole as a source of amino acids [99] in a cooperative process involving enzymes of multiple catalytic classes, including CPs, aspartic proteases, and metalloproteases. Incubating parasites with broadly active CP inhibitors causes the food vacuole to swell and fill with undegraded hemoglobin, suggesting that CPs are essential for hemoglobin processing. CPs also contribute indirectly to hemoglobin hydrolysis via the processing of plasmepsins to active enzymes [100]. The CPs with clear roles in hemoglobin hydrolysis are falcipain-2, falcipain-3, and dipeptidyl aminopeptidase 1. Knockout of falcipain-2 caused the food vacuoles of P. falciparum trophozoites to fill with undegraded hemoglobin, but this abnormality resolved later in the life cycle, presumably due to expression of falcipain-3 [101]. In contrast, knockout of falcipain-3 was not tolerated, suggesting that this protease is essential [102]. Some studies have also suggested roles for CPs in erythrocyte rupture at the completion of the erythrocytic cycle or in merozoite invasion of erythrocytes [103]. CP inhibitors blocked the rupture of erythrocytes by mature schizonts. Predicted mediators of this process include members of the SERA family, namely the pseudo-CP SERA5 [104] and the functional CP SERA6 [105], as well as dipeptidyl aminopeptidase 3 [103,106], although recent reports refute activity of dipeptidyl aminopeptidase 3 in erythrocyte rupture [98,107]. Recent advances have demonstrated a proteolytic cascade responsible for egress of merozoites from host erythrocytes; this cascade includes cleavage of the actin-binding domain of the erythrocyte cytoskeletal protein β-spectrin by SERA6 to mediate erythrocyte rupture, the final step required for merozoite egress [108].
Some studies have also suggested that CPs participate in erythrocyte invasion by asexual parasites, notably falcipain-1 [109] and dipeptidyl aminopeptidase 3 [98]. However, studies with protease inhibitors have generally not supported this conclusion; knockout of falcipain-1 did not block P. falciparum development [110,111], and antibodies against the endogenous P. falciparum CP inhibitor falstatin blocked invasion, suggesting that inhibiting falcipain-like CP activity facilitates invasion [112]. Considering another family of clan CA CP, a nucleolar calpain-like P. falciparum CP appears to be required for the development of erythrocytic parasites [113,114]. Also, a P. falciparum otubain-like CP was recently shown to localize to the apicoplast organelle and to be required for normal apicoplast and parasite development via inhibition of the predicted role of P. falciparum Atg8 in protein import to the apicoplast [115].
CPs appear to play additional roles in nonerythrocytic plasmodial stages. Considering liver stages, an unidentified plasmodial CP cleaves the circumsporozoite protein, which coats the sporozoites injected by mosquitoes, to enable invasion of hepatocytes [116]. In the murine parasite P. berghei, the orthologue of falcipain-1 appears to be critical for invasion of erythrocytes by hepatocyte-derived merozoites [117]. Considering mosquito stages, CP inhibitors and the knockout of falcipain-1 decreased oocyst production in mosquitoes [110,118]. Also, dipeptidyl aminopeptidase 2, which is expressed in gametocytes, may contribute to gamete egress [119].
Our understanding of the roles of CPs in the plasmodial life cycle suggests numerous potential drug targets. Falcipain-2 and falcipain-3 have low pH optima, consistent with activity in the acidic food vacuole, and both enzymes were localized to the food vacuole [120]. Compared with falcipain-3, falcipain-2 is more active against peptidyl substrates, is uniquely able to activate and undergo autohydrolysis at neutral pH, and is more stable at neutral pH. Considering specificity for peptide substrates and inhibitors, important differences were seen between the P. falciparum enzymes (falcipain-2 and falcipain-3) and homologs from the rodent parasites P. berghei and P. vinckei; differences in specificity between falcipain-2 and falcipain-3 were less pronounced [121]. Both enzymes are synthesized as membrane-bound proforms that are processed, probably by autohydrolysis, to soluble mature forms. A related enzyme, falcipain-2′, is nearly identical in sequence and biochemical features to falcipain-2, but its role is uncertain because, in contrast to the case with falcipain-2 and falcipain-3, knockout of falcipain-2′ had no clear phenotype [122]. The falcipains have some unique features for papain-family proteases, including unusually long N-terminal domains and insertions in the catalytic domain. Identified functions of these domains include trafficking of falcipain-2 to the food vacuole by upstream portions of the prodomain; enzyme inhibition by downstream portions of the prodomain; mediation of enzyme folding by short peptides immediately upstream of the catalytic domain [123]; and mediation of binding to the native substrate, hemoglobin, by a small insertion near the C-terminus of the catalytic domain.
Multiple studies have demonstrated that CP inhibitors have potent antimalarial effects [124]. With these inhibitors, a block in P. falciparum development is accompanied by a specific block in hemoglobin hydrolysis, marked by the appearance of swollen, hemoglobin-filled food vacuoles, and antiparasitic effects correlated with the degree of inhibition of falcipain-2 and falcipain-3. Drug discovery directed against falcipains is facilitated by the available structures of falcipain-2 and falcipain-3 complexed with small-molecule and protein inhibitors [125].
Peptidyl falcipain inhibitors with nanomolar antimalarial activity have included fluoromethyl ketones, vinyl sulfones, and aldehydes; in some cases, in vivo activity against murine malaria has also been demonstrated [126]. Promising nonpeptidyl falcipain inhibitors have included a series of nitriles that was extensively studied with many promising features, including excellent in vitro and in vivo potency [127]; this project was halted because of tissue binding that might predict idiosyncratic toxicity, but the evaluation of nitrile inhibitors is ongoing. Another interesting approach is the optimization of natural products, including analogues of gallinamide A, a compound from cyanobacteria with nanomolar antimalarial activity [128], and sugarcane cystatin [129].
Concerning the potential for resistance, parasites were selected for resistance to a vinyl sulfone falcipain inhibitor, but the selection was slow and the mechanism of resistance complex, without mutations in target enzymes, suggesting that resistance to falcipain inhibitors may develop slowly, especially with combination therapy [130]. Considering combinations, the activity of artemisinins, the mainstay of modern treatment for falciparum malaria, requires falcipain activity [131]; thus, falcipain inhibitors should probably not be combined with artemisinins. In contrast, inhibitors of CPs and aspartic proteases showed synergistic antimalarial effects, consistent with a complementary role for these two classes of enzymes and suggesting the potential for synergistic combination antimalarial therapy [132].

Toxoplasma gondii
Toxoplasma gondii is a foodborne pathogen with seroprevalence ranging from 10% to 30% in North America and northern Europe to more than 80% in areas of Latin America and Africa [133]. Most infected individuals remain asymptomatic despite lifelong infection. In contrast, congenital transmission or infection of immunocompromised patients with AIDS or organ transplants can lead to fatal, disseminated disease [133]. T. gondii CPs have been shown to be important for invasion, digestion of host proteins for nutrition [134,137], and autophagy for cyst survival [138]. The Toxoplasma genome project revealed that the redundancy of CP genes is lower in T. gondii than in most other studied parasites. For example, E. histolytica has more than 50 genes encoding CPs with similar structure and specificity [139]. In contrast, T. gondii has genes encoding only one cathepsin B (TgCPB), one cathepsin L (TgCPL), and three cathepsin Cs (TgCPC1, 2, and 3), potentially making them more amenable drug targets. Active recombinant proteases have been expressed for all the T. gondii CPs, simplifying structure-based drug design.
TgCPB and TgCPL have been linked to host cell invasion. Targeting TgCPB with a peptidyl cathepsin B inhibitor or antisense RNA inhibited host cell invasion and in vitro growth and blocked infection in a chick embryo model of toxoplasmosis [142]. TgCPL acts as a maturase for TgCPB [143] and for key adhesins, the microneme proteins MIC2-associated protein (M2AP) and MIC3. TgCPL knockouts were attenuated in virulence in acute infection in mice [134]. Both TgCPB and TgCPL have been localized in the vacuolar compartment, a lysosome-like organelle, where TgCPL has been shown to digest host cytosolic proteins. Most recently, TgCPL has been shown to be important for cyst survival in latent infection [138]. In TgCPL knockout strains or cysts incubated with the vinyl sulfone inhibitor LHVS, degradation of autophagosomes in the vacuolar compartment was inhibited, resulting in abnormal cyst morphology and decreased survival.
The most developed peptidyl inhibitors target TgCPB and/or TgCPL. The crystal structure of TgCPL with morpholinurea-leucyl-homophenyl-vinyl sulfone (LHVS) has been determined, and the inhibitor can block host cell invasion [136] and cyst viability in vitro [138]. K11777 inhibits purified recombinant TgCPB and TgCPL in the nanomolar range and blocks host cell invasion, parasite replication, and viability in a chick embryo model [144]. Unfortunately, neither vinyl sulfone inhibitor is likely to cross the blood-brain barrier to prevent latent infection, so further optimization will be required.
The T. gondii cathepsin Cs are exopeptidases that are also potential drug targets. TgCPC1 was the most highly expressed cathepsin mRNA in tachyzoites; TgCPC3 was only identified in oocysts [137]. Both TgCPC1 and TgCPC2 localize to the dense granules and are secreted into the parasitophorous vacuole, where they degrade peptides. Both TgCPC1 and TgCPC2 were inhibited by Gly-Phe-dimethylketone, reducing parasite intracellular growth and proliferation. The same phenotype was not seen with a TgCPC1 knockout, as TgCPC2 expression was upregulated, suggesting the importance of inhibiting both enzymes [137].
Autophagy is a key process in all eukaryotic cells to remove and recycle misfolded proteins and damaged organelles. Autophagy is likely to be important in T. gondii tachyzoites to survive extracellular stress and for bradyzoites during latent infection [138]. Although only a limited number of autophagy proteins (Atg) are encoded in the T. gondii genome, and there are no classic lysosomes in this organism, autophagosome-like bodies are formed [138,145]. TgCPL appears to play an important role in the degradation of autophagosomes, as knockout or inhibition of TgCPL results in undigested proteins and organelles in the vacuolar compartment, limiting chronic infection in mice.
The clan CA cysteine proteases of T. gondii have been identified as potential vaccine candidates. DNA vaccines containing TgCPB or TgCPL gene sequences, individually or together, produced both humoral and cellular immune responses in BALB/c mice. Immunized mice survived longer after intraperitoneal challenge with tachyzoites, with the most significant effect from the combined TgCPB/TgCPL vaccine [146]. Similar results were seen with a TgCPC1 DNA vaccine [147].

Conclusion
Of the 7 protozoan parasites causing human disease that are covered in this review, only 2 genera (Trypanosoma and Plasmodium) have at least one clan CA CP that is well characterized as a drug target: cruzain in T. cruzi, TbCatL in T. brucei, and falcipain-2 and falcipain-3 in the malaria parasite. These enzymes were validated as drug targets using a variety of inhibitor chemistries.
One common theme is the observation that distinct pathogens employ related CPs to perform similar functions. For example, the process of invading a tissue in the case of extracellular parasites [10, 19, 51, 148] and invading a host cell in the case of intracellular parasites [71, 135,136,140] is highly dependent on clan CA proteases. It is also interesting to note that parasites have evolved different mechanisms to utilize CPs to modulate the immune system of the host. EhCP can directly degrade IgA, IgG, and IL-18 [11-15]. By contrast, CPB in Leishmania sp. modulates host responses by down-regulating protective Th1 immune responses, and in particular IFN-γ production, via degradation of the transcription factor NF-κB and the subsequent inhibition of IL-12 production by infected macrophages [26,35,149].
Beyond the better-studied Trypanosoma and Plasmodium sp., it is important to continue investigating the therapeutic potential of other protozoan CPs. In support of this, essentiality has been suggested or demonstrated for the CPs in many of the species discussed here. Furthermore, the potential for the emergence of other protease targets is great, considering that fewer than 10% of the putative CPs found in the respective genomes have been characterized so far.

Key learning points
• Gaps in our knowledge: each protozoan genome encodes many CP genes, yet fewer than 10 enzymes have been well characterized per parasite, and no more than 2 enzymes per pathogen have had their structures solved by X-ray crystallography.
• Importance: the major functions of CPs shared by these protozoan parasites are host invasion and tissue penetration, virulence, and evasion/modulation of the host immune system.