Analysis of Epichloë festucae small secreted proteins in the interaction with Lolium perenne

Epichloë festucae is an endophyte of the agriculturally important perennial ryegrass. This species systemically colonises the aerial tissues of this host where its growth is tightly regulated thereby maintaining a mutualistic symbiotic interaction. Recent studies have suggested that small secreted proteins, termed effectors, play a vital role in the suppression of host defence responses. To date only a few effectors with important roles in mutualistic interactions have been described. Here we make use of the fully assembled E. festucae genome and EffectorP to generate a suite of 141 effector candidates. These were analysed with respect to their genome location and expression profiles in planta and in several symbiosis-defective mutants. We found an association between effector candidates and a class of transposable elements known as MITEs, but no correlation with other dynamic features of the E. festucae genome, such as transposable element-rich regions. Three effector candidates and a small GPI-anchored protein were chosen for functional analysis based on their high expression in planta compared to in culture and their differential regulation in symbiosis defective E. festucae mutants. All three candidate effector proteins were shown to possess a functional signal peptide and two could be detected in the extracellular medium by western blotting. Localization of the effector candidates in planta suggests that they are not translocated into the plant cell, but rather, are localized in the apoplastic space or are attached to the cell wall. Deletion and overexpression of the effector candidates, as well as the putative GPI-anchored protein, did not affect the plant growth phenotype or restrict growth of E. festucae mutants in planta. These results indicate that these proteins are either not required for the interaction at the observed life stages or that there is redundancy between effectors expressed by E. festucae.


Introduction

The Data
The analyses presented here combine data from a number of earlier studies, as cited in the Methods section of the main paper. Data from each of these studies have been combined in a single file in the data directory of this repository.

Column              Description of data
gene                gene ID
length              protein length (aa)
nr_cysteines        number of cysteines
sp_classification   SignalP classification (non-secreted, secreted, ssp)
ep_classification   EffectorP classification (NA for proteins > 200 aa)
ep_prob             EffectorP probability
apo_classification  ApoplastP classification (NA for non-secreted proteins)
apo_prob            ApoplastP probability
expr_C_RPMK         gene expression in culture
expr_P_RPMK         gene expression in planta
PvC_qval            q-value for differential expression, culture v planta
PvC_L2FD            log2 fold difference in expression, planta v culture
Hep_L2FD            log2 fold difference in expression, HepA deletion v WT
dAT                 distance to nearest AT-rich sequence
MITE_nearest        nearest MITE
MITE_d              distance to nearest MITE
name                name of gene (used in plots)
Eathon_2015         was gene identified as ssp in Eaton 2015?
analysed            was gene analysed by deletion mutant?
sspcolumn           is gene a probable ssp (used in plots)

The key column for our analysis is sp_classification, which identifies small (< 200 aa) secreted proteins (labeled ssp), larger secreted proteins (labeled secreted) and proteins that do not have a classical secretion signal (labeled non-secreted). As expected, only a small proportion of all genes are ssps:
knitr::kable(t(table(all_genes$sp_classification)))

non_secreted  secreted  ssp
7690          536       141

We will often use this classification as a point of comparison, so we can create a version of our data grouped by this column for future use.
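The grouping step itself is not shown above, so what follows is only a minimal base-R sketch of how such a grouped version could be built. The small all_genes table here is an invented stand-in (the real file has many more rows and columns), and the names by_class and class_summary are hypothetical.

```r
# Toy stand-in for the all_genes table (values invented for illustration)
all_genes <- data.frame(
  gene = paste0("g", 1:6),
  length = c(450, 120, 300, 95, 610, 180),
  nr_cysteines = c(4, 8, 6, 7, 3, 10),
  sp_classification = c("non_secreted", "ssp", "secreted",
                        "ssp", "non_secreted", "ssp")
)

# Split the rows by secretion class so later summaries can reuse the groups
by_class <- split(all_genes, all_genes$sp_classification)

# Example summary: number of genes and median protein length per class
class_summary <- data.frame(
  n_genes = sapply(by_class, nrow),
  median_length = sapply(by_class, function(d) median(d$length))
)
print(class_summary)
```

A dplyr group_by() call would serve the same purpose; base R is used here only to keep the sketch free of package dependencies.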

Helper functions
The file utils.R has some helper functions that we will use to compare fitted regression models and plot some results. Here we use source to import these functions.

source("utils.R")

The proteins themselves

Are effectors cysteine-rich?
We expect small secreted proteins to be active within the plant apoplast, where disulfide bridges formed between cysteine residues are known to stabilise protein structures. We thus begin by testing whether putative effectors are more cysteine-rich than other proteins. Specifically, we fit a logistic regression in which secretion class (i.e. non-secreted v secreted v small-secreted) predicts the proportion of amino acid residues that are cysteines.
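As a rough illustration of this kind of model, the sketch below fits a binomial GLM in which each amino acid residue is treated as a trial and a cysteine as a success, then compares it with an intercept-only model by AIC. The data are simulated (class sizes, lengths and cysteine rates are invented), and the object names sim, null_mod and class_mod are hypothetical; the actual analysis may be specified differently.

```r
set.seed(1)

# Simulated stand-in for all_genes (class sizes and cysteine rates are invented)
n_per_class <- c(non_secreted = 300, secreted = 100, ssp = 50)
cys_rate    <- c(non_secreted = 0.01, secreted = 0.02, ssp = 0.05)
sim <- data.frame(
  sp_classification = factor(rep(names(n_per_class), n_per_class)),
  length = sample(80:600, sum(n_per_class), replace = TRUE)
)
sim$nr_cysteines <- rbinom(nrow(sim), size = sim$length,
                           prob = cys_rate[as.character(sim$sp_classification)])

# Binomial GLM: each residue is a trial, a cysteine is a "success"
null_mod  <- glm(cbind(nr_cysteines, length - nr_cysteines) ~ 1,
                 family = binomial, data = sim)
class_mod <- glm(cbind(nr_cysteines, length - nr_cysteines) ~ sp_classification,
                 family = binomial, data = sim)

# Lower AIC indicates the better-fitting model
print(AIC(null_mod, class_mod))
```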
A model with secretion class is a substantially better fit than a null model (every time we show these tables, the model with the lower AIC is the best-fitting one):
Clearly, the median distance between proteins and MITEs is lowest for ssps. If MITEs alter gene expression, we expect them to act in cis, so we will focus in particular on MITEs near the promoter region of a gene (that is, within 2 kbp upstream of a transcription start site). First we compare the proportion of genes with a MITE this close between secretion classes:
A non-significant result means we cannot say that particular MITE families are more or less likely to be near putative effectors or other secreted proteins.
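The comparison of MITE proximity between secretion classes could be sketched along the following lines. This is an illustration on simulated data, not the analysis used in the paper: the column mite_upstream_d (distance in bp to the nearest upstream MITE) is a hypothetical name, and the chi-squared test is one reasonable choice among several.

```r
set.seed(2)

# Simulated stand-in: class labels and distance (bp) to the nearest upstream MITE
genes <- data.frame(
  sp_classification = rep(c("non_secreted", "secreted", "ssp"), c(400, 100, 60)),
  mite_upstream_d = round(rexp(560, rate = 1 / 10000))
)

# Flag genes with a MITE within 2 kbp upstream of the transcription start site
genes$mite_near <- genes$mite_upstream_d <= 2000

# Counts and row-wise proportions of MITE-adjacent genes per secretion class
counts <- table(genes$sp_classification, genes$mite_near)
print(prop.table(counts, margin = 1))

# Chi-squared test of independence between class and MITE proximity
print(chisq.test(counts))
```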

Do SSPs form clusters?
In some fungi, small secreted proteins appear in clusters of presumably co-regulated genes. Given that we have the location of all protein-coding genes in the Fl1 genome, we can test whether the small secreted ones are more likely to occur near to each other than we would expect by chance.
We start by identifying the locations of small secreted proteins using the file protein_data_locations. As the name suggests, this file has some summary data on each protein, including prot_type, which assigns proteins to non-secreted, large-secreted and small-secreted groups.

Nearest-neighbour distance for small-secreted proteins
We want to know if the small-secreted proteins tend to come in 'clumps', or if they are evenly spread out along the genome. To start investigating this, we need to calculate the minimum distance to another SSP for each of our SSPs (i.e. the minimum within-group distance).
Thankfully the location data (the last three columns of the protein data) is in bed format, so we can use bedtools to calculate these distances. First, we need to write out a bed file for only the small secreted proteins:
# sort bed-style records by chromosome, then start, then end
sort_bed <- function(x) {
  x[order(x$chrom, x$start, x$end), ]
}
secreted_loci <- select(
  filter(prot_data, prot_type == "small-secreted"),
  chrom, start, end, gene
)
write.simple.table(sort_bed(secreted_loci), "secreted_locs.bed")
Now we can use bedtools closest to get the nearest neighbour for each protein. Setting the -io flag means we ignore exact matches ('overlaps'), so all distances are to another ssp. We also use cut to remove some extraneous columns from the output.
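For readers without bedtools to hand, the idea of a within-group nearest-neighbour distance can be sketched directly in R. This is an illustrative stand-in, not the command used in the analysis: the function nearest_neighbour and the toy coordinates are invented, and the gap it reports (start of one interval minus end of the other) may differ by a base or so from the distance convention bedtools uses.

```r
# Distance (bp) from each interval to the nearest non-overlapping interval
# on the same chromosome; NA when a chromosome holds a single interval.
nearest_neighbour <- function(bed) {
  dist <- rep(NA_real_, nrow(bed))
  for (i in seq_len(nrow(bed))) {
    same <- bed$chrom == bed$chrom[i]
    same[i] <- FALSE                            # never compare a gene to itself
    gap <- pmax(bed$start[same] - bed$end[i],   # neighbour lies downstream
                bed$start[i] - bed$end[same])   # neighbour lies upstream
    gap <- gap[gap >= 0]                        # drop overlapping intervals
    if (length(gap) > 0) dist[i] <- min(gap)
  }
  dist
}

# Toy loci (coordinates invented for illustration)
toy_loci <- data.frame(
  chrom = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  start = c(100, 1000, 5000, 200, 900),
  end   = c(400, 1500, 6000, 600, 1200),
  gene  = paste0("ssp", 1:5)
)
toy_loci$nn_dist <- nearest_neighbour(toy_loci)
print(toy_loci)
```

The pairwise loop is quadratic in the number of genes per chromosome, which is acceptable for a set of about 141 loci but would be slow for a whole genome.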

Simulating a null distribution
To know if these genes are more clustered than we might expect by chance, we need to simulate a null distribution. We do that by repeating the procedure we used to calculate within-group neighbour distances for random subsets of genes, each the same size as the SSP dataset.
Here I select all the non-SSP proteins and write out the locations of 1,000 random subsets of these genes. . .
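The resampling procedure can be sketched entirely in R on a simulated genome. Everything below is an assumption made for illustration: the toy genome, the helper mean_nn_dist (which assumes genes do not overlap) and the choice of 40 "SSPs" are invented, and the real analysis writes bed files and calls bedtools rather than computing distances in R.

```r
set.seed(3)

# Mean nearest-neighbour gap (bp) within a set of non-overlapping genes
mean_nn_dist <- function(bed) {
  bed <- bed[order(bed$chrom, bed$start), ]
  per_chrom <- lapply(split(bed, bed$chrom), function(d) {
    if (nrow(d) < 2) return(numeric(0))
    gaps <- d$start[-1] - d$end[-nrow(d)]   # gap to the next gene
    pmin(c(gaps, Inf), c(Inf, gaps))        # nearer of next and previous gene
  })
  mean(unlist(per_chrom))
}

# Toy genome: 500 genes of 1 kbp on two chromosomes, 40 flagged as SSPs
n_genes <- 500
genome <- data.frame(chrom = rep(c("chr1", "chr2"), each = n_genes / 2))
genome$start <- ave(round(runif(n_genes, 2000, 20000)), genome$chrom, FUN = cumsum)
genome$end <- genome$start + 1000
genome$is_ssp <- seq_len(n_genes) %in% sample(n_genes, 40)

# Observed statistic, then 1,000 random non-SSP subsets of the same size
observed <- mean_nn_dist(genome[genome$is_ssp, ])
background <- genome[!genome$is_ssp, ]
null_means <- replicate(1000, {
  mean_nn_dist(background[sample(nrow(background), sum(genome$is_ssp)), ])
})

print(observed)
print(quantile(null_means, c(0.025, 0.5, 0.975)))
```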

[Figure: null distribution of mean nearest-neighbour distance; x-axis: mean distance to nearest gene]
In this graph the histogram shows the null distribution and the red line our observed data. So, we find no evidence that SSPs are on average closer to each other than we'd expect by chance. Here is a p-value for the test.
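One common way to turn such a null distribution into a p-value is to count how many simulated means are at least as small as the observed one. The sketch below uses invented numbers in place of the real simulation output, and the "+1" correction is a convention assumed here rather than something stated in the analysis.

```r
set.seed(4)

# Stand-ins for the simulation output (values invented for illustration)
null_means <- rnorm(1000, mean = 60000, sd = 5000)  # mean NN distance per random subset
observed   <- 58000                                 # mean NN distance among SSPs

# One-sided test: how often is a random subset at least as clustered
# (mean distance as small or smaller) as the observed SSP set?
# Adding 1 to numerator and denominator avoids reporting p = 0.
p_value <- (sum(null_means <= observed) + 1) / (length(null_means) + 1)
print(p_value)
```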

Are effectors differentially expressed in planta?
Effectors are often lowly expressed or silenced when fungi are grown in axenic culture, and only highly expressed when they infect a plant. We can use previously published RNAseq data to investigate whether putative effectors are more highly expressed in planta. We use the log2 fold difference in gene expression as a summary statistic for each gene (here positive numbers represent higher expression in planta). So, indeed, putative effectors have higher overall gene expression in planta (mean > 1 unit of log2 fold difference), and a higher proportion of these genes have substantial (> 2) and significant differences in expression.
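The two summaries described here (mean log2 fold difference per class, and the proportion of genes with a substantial and significant increase) could be computed roughly as follows. The data are simulated, and the q-value cut-off of 0.05 is an assumption, since no significance threshold is stated above.

```r
set.seed(5)

# Simulated stand-in for the expression columns (values invented)
expr <- data.frame(
  sp_classification = rep(c("non_secreted", "secreted", "ssp"), c(300, 80, 40)),
  PvC_L2FD = c(rnorm(300, 0, 1.5), rnorm(80, 0.5, 2), rnorm(40, 1.5, 2.5)),
  PvC_qval = runif(420)
)

# Mean log2 fold difference per class (positive = higher in planta)
mean_l2fd <- tapply(expr$PvC_L2FD, expr$sp_classification, mean)

# Proportion of genes with a substantial (> 2) and significant increase;
# the 0.05 q-value threshold is an assumed cut-off
up_in_planta <- expr$PvC_L2FD > 2 & expr$PvC_qval < 0.05
prop_up <- tapply(up_in_planta, expr$sp_classification, mean)

print(round(rbind(mean_l2fd, prop_up), 3))
```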

Are effectors released from repression in HepA deletion strains?
In many species, chromatin state contributes to the regulation of effectors. We can use RNAseq data from a Heterochromatin protein 1 (hepA) knockout strain to investigate whether the absence of this protein releases effectors from repression in culture.
At first glance, there appears to be little difference among gene classes:

How is effector expression altered in symbiosis mutants?
We can recreate figure 2C from the paper, comparing the expression of different gene classes across four different symbiosis mutants in planta.

Plant phenotype analyses.
Finally, we will analyse the plant phenotype data discussed in the main text of the paper. In each case we will compare grasses infected with genetically modified fungi (either deletion or over-expression strains) to those infected by wild-type fungi (here labeled FL1).

Tiller number
We start with data on the number of tillers per plant. Visually, there is little evidence for a difference between the wild type and any of the mutants.
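A visual impression like this can be backed up with a simple count model. The sketch below fits a Poisson GLM with the wild type as the reference level and compares it with an intercept-only model by AIC. It is an illustration only: the tiller counts are simulated with no true strain effect, the strain labels other than FL1 are hypothetical, and the model used for the paper may differ.

```r
set.seed(6)

# Simulated stand-in: 12 plants per strain, no true difference between strains
tillers <- data.frame(
  strain = factor(rep(c("FL1", "deletion", "overexpression"), each = 12)),
  n_tillers = rpois(36, lambda = 8)
)

# Make wild type the reference level so each coefficient is a mutant-vs-FL1 contrast
tillers$strain <- relevel(tillers$strain, ref = "FL1")

# Poisson GLMs for count data, with and without a strain effect
null_mod   <- glm(n_tillers ~ 1, family = poisson, data = tillers)
strain_mod <- glm(n_tillers ~ strain, family = poisson, data = tillers)

print(summary(strain_mod)$coefficients)
print(AIC(null_mod, strain_mod))  # lower AIC indicates the better-fitting model
```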