Long Non-Coding RNA Expression Profiles in Hereditary Haemorrhagic Telangiectasia

Hereditary Haemorrhagic Telangiectasia (HHT) is an autosomal dominantly inherited vascular disease characterized by the presence of mucocutaneous telangiectasia and arteriovenous malformations in visceral organs. HHT is predominantly caused by mutations in ENG and ACVRL1, which both belong to the TGF-β signalling pathway. The exact mechanism of how haploinsufficiency of ENG and ACVRL1 leads to HHT manifestations remains to be identified. As long non-coding RNAs (lncRNAs) are increasingly recognized as key regulators of gene expression and constitute a sizable fraction of the human transcriptome, we wanted to assess whether lncRNAs play a role in the molecular pathogenesis of HHT manifestations. By microarray technology, we profiled lncRNA transcripts from HHT nasal telangiectasial and non-telangiectasial tissue using a paired design. The microarray probes were annotated using the GENCODE v.16 dataset, identifying 4,810 probes mapping to 2,811 lncRNAs. Comparing HHT telangiectasial tissue with HHT non-telangiectasial tissue, we identified 42 lncRNAs that are differentially expressed (q<0.001). Using GREAT, a tool that assumes cis-regulation, we showed that differently expressed lncRNAs are enriched for genomic loci involved in key pathways concerning HHT. Our study identified lncRNAs that are aberrantly expressed in HHT telangiectasia and indicates that lncRNAs may contribute to regulate protein-coding loci in HHT. These results suggest that the lncRNA component of the transcriptome deserves more attention in HHT. A deeper understanding of lncRNAs and their role in telangiectasia formation possesses potential for discovering therapeutic targets in HHT.


Introduction
Hereditary Haemorrhagic Telangiectasia (HHT) is an autosomal dominantly inherited vascular disease characterized by the presence of mucocutaneous telangiectasia and arteriovenous malformations (AVMs) in visceral organs, primarily the lungs, liver and brain. The most common clinical manifestation is spontaneous and recurrent epistaxis [1,2], originating from nasal telangiectasias, which can be difficult to prevent and can lead to severe anaemia. Moreover, pulmonary arteriovenous malformations (PAVMs) are seen in approximately one third of patients and are potentially lethal, due to haemorrhage or shunting of blood through these abnormal blood vessels, which can cause paradoxical embolisms and cerebral abscesses [2,3]. The clinical manifestations of HHT are extremely variable, even between family members, and age-dependent penetrance is present. The reported prevalence in Denmark is around 1 in 6,500 [4]. Around 85% of clinically diagnosed HHT patients [5][6][7] carry a mutation in either ENG (HHT1) or ACVRL1 (HHT2). Patients with mutations in ENG or ACVRL1 are clinically similar, as all reported manifestations are known to occur in both. However, later onset and lower penetrance, less cerebral and pulmonary AVMs, but more liver involvement and a risk of developing pulmonary arterial hypertension, are observed in patients with ACVRL1 mutations. Currently there is no cure for the disease; only symptomatic treatment is possible.
HHT manifestations are thought to result from an imbalance in the process of angiogenesis. Angiogenesis is the development of new capillary blood vessels from pre-existing vessels and is controlled by different cytokines such as vascular endothelial growth factor (VEGF) and transforming growth factor beta 1 (TGFb1). ENG and ACVRL1 encode receptor proteins: Endoglin and ALK1 (Activin A receptor type II-like 1) (SKR3) respectively, which are components of the TGF-b signalling pathway. The TGF-b signalling pathway plays a complex and important role in development and homeostasis of many organs, including the vascular system. In the latter, it is involved in cell proliferation, migration, extracellular matrix formation, vascular smooth muscle cell differentiation and vascular tone. As Endoglin and ALK1 proteins are predominantly expressed in endothelial cells, these are primary cellular targets of the disease. Thus, HHT manifestations are caused by a disturbance in the TGF-b signalling pathway [8][9][10]. The exact mechanism of how haploinsufficiency of ENG and ACVRL1 leads to HHT manifestations remains to be identified.
Our knowledge of the non protein-coding part of the human transcriptome is expanding rapidly, and recently attention has shifted towards the most numerous but still poorly understood group-the long non-coding RNAs (lncRNAs). Long non-coding RNAs (lncRNAs) are defined as eukaryote RNAs longer than 200 nucleotides in length, without protein coding capacity. This arbitrary limit of 200 nucleotides distinguishes lncRNAs from small regulatory RNAs, such as microRNA and short interfering RNA (siRNA). Studies indicate that lncRNAs are generated through pathways similar to that of protein-coding genes, with similar histone-modification profiles, splicing signals, and exon/ intron lengths [11]. LncRNAs have been shown to be involved in major mechanisms of transcriptional and post-transcriptional regulation, such as targeting transcription factors, initiating chromatin remodelling, directing methylation complexes and blocking nearby transcription [12]. LncRNAs appear to control expression of protein-coding genes through both cisand transacting pathways [11]. Moreover, lncRNAs tend to be expressed at lower levels than protein-coding genes and display more tissuespecific expression patterns [11].
Of the many identified lncRNAs, only few have been characterized functionally [13]. A complete catalogue of lncRNAs is not yet available; nonetheless, in 2012 the GENCODE consortium published the most complete set of human lncRNAs to date [11], comprising 9277 manually annotated genes producing 14,880 transcripts. GENCODE annotate lncRNAs according to their localization regarding the nearest known protein transcripts: exonic, intronic, overlapping or intergenic [11].
To identify potential HHT therapeutic targets, further knowledge on how a disturbance of the TGF-b signalling pathway leads to HHT manifestations is needed. As lncRNAs are increasingly recognized as key regulators of gene expression we assessed the lncRNA expression in HHT tissue. Thus, the purpose of this study was to investigate the possible involvement of lncRNAs in the molecular pathogenesis of the telangiectasia formation in HHT patients. This was done by expression profiling of lncRNAs in 40 telangiectasial and 40 non-telangiectasial samples from HHT patients using microarray technology.

Ethics statement
This study was approved by the ethics committee of Southern Denmark (S-20090131), and the patients provided verbal and written informed consent to participate in this study in accordance with the Declaration of Helsinki.

HHT patients
Nasal mucosal biopsy specimens of telangiectasial and nontelangiectasial tissue in pairs from HHT1 (n = 19) and HHT2 (n = 21) patients were used in this study. The nasal mucosa was anaesthetized with gauze impregnated with Lidocain phenylephrinehydrocloride. No infiltration anaesthesia was used. The samples were collected with a Weil Nasal Forceps by an experienced rhinologist and contained either macroscopicly visible telangiectasia or natural mucosa. All patients showed signs of HHT according to the Curaçao Criteria [14] and carried the familial pathogenic mutation in either ENG or ACVRL1. The 80 paired samples represented 14 different HHT1 families and 14 different HHT2 families. The familial mutations were of different types and distributed throughout the two genes (Table 1) [15]. The patients were aged 28-56 years (median age 40) and included 21 women and 19 men. Patients were diagnosed and treated in the national HHT Centre of Denmark in Odense University Hospital.

RNA isolation and quality
Samples were collected in RNAlater Solution (Life Technologies) and stored at 220uC. Total RNA was isolated using the RNeasy Micro Kit according to the RNeasyH Micro Handbook (Qiagen). The quantity of RNA was assessed with the NanoDrop spectrophotometer ND-8000 (NanoDrop Technologies), and RNA quality was assessed by the Agilent 2100 Bioanalyzer (Agilent Technologies).

Microarray labelling, hybridization, and scanning
Sample labelling and array hybridization were performed according to the Two-Color Microarray-Based Gene Expression Analysis-Low Input Quick Amp Labeling-protocol (Agilent Technologies) using the SurePrint G3 Human Gene Expression 8660 microarray format (Agilent Technologies). Samples were labelled with Cy5 and Universal Human Reference RNA (Stratagene) was labelled with Cy3 and used as a common reference on all arrays.

Data pre-processing and annotation of lncRNAs
Agilent Feature Extraction software v. 10.7.3.1 (Agilent technologies) was used to analyse acquired array images. Data were then within-array normalized by Loess normalization method and between-array normalized by Quantile normalization. The normalized values were used to calculate log 2 transformed Cy5/Cy3 ratios. Replicate probes were collapsed calculating the median. Missing expression values were imputed by k-nearest neighbours averaging (k = 10). Date pre-processing was performed using the R (R Core team 2012) package limma [16]. Microarray data were deposited to the Gene Expression Omnibus (GSE53515).
All the 42,164 probes of the Agilent SurePrint G3 array were reannotated using GENCODE v.16 gene annotation database (www.gencodegenes.org) [11]. The genomic coordinates of the probes in the Agilent array were matched to the genomic coordinates of the lncRNAs from the GENCODE v.16, to identify the probes covering lncRNAs. LncRNAs were included if 55 base pairs overlapped with the 60-mer array probes.

Data analysis
Data analyses were performed using the Qlucore Omics Explorer 2.3 software (Qlucore). Differentially expressed lncRNAs comparing telangiectasial and non-telangiectasial tissue were ranked according to statistical significance determined by twogroup comparison (paired t-test). This was done for the groups HHT1 and HHT2 seperately and subsequently for the total group of HHT. Multiple testing was adjusted for by the Benjamini-Hochberg method. Differentially expressed lncRNAs were chosen for further evaluation (q,0.15). Principal component analysis (PCA) and hierarchical clustering were performed in Qlucore Omics Explorer 2.3 to examine whether telangiectasial and nontelangiectasial samples could be separated.

Genomic Regions Enrichment of Annotation
To assess the potential cis-regulatory effect of the differentially expressed lncRNAs in HHT, we used the Genomic Regions Enrichment of Annotations Tool (GREAT) [17]. This program has genomic coordinates as inputs and outputs nearby genes and their ontologies. It analyses the functional significance of cis-regulatory regions identified by localized measurements of DNA binding events across an entire genome.

Correlation analysis
To identify potential trans-regulated transcripts, we assessed the correlation between the expression levels of the 10 highest ranking statistically significantly differentially expressed lncRNAs (HHT total) and every transcript present on the Agilent array by calculating Pearson's correlation coefficient. The transcripts were ranked according to the corrected p-values for the correlation coefficients; this was performed for each lncRNA of interest. Correction for multiple testing was performed using the Bonferoni correction. Manhattan plots were drawn using the Integrative Genome Viewer (IGV) version 2.3.18 [18], based on each of these

Annotation of lncRNAs
All the 42,164 probes of the Agilent SurePrint G3 array were reannotated using GENCODE v.16 gene annotation database (www.gencodegenes.org) [11]. We identified 5,069 probes mapping to 2,939 lncRNAs. These 5,069 probes included both intronic and exonic lncRNA probes. Of these 5,069 probes, 404 mapped to overlapping lncRNAs and mRNAs -of these, 259 probes were excluded as they mapped to overlapping mRNA exons and lncRNA introns -only probes mapping to overlapping mRNA exons and lncRNA exons were included. Further analysis was performed using the 4,810 remaining lncRNA probes, mapping to 2,811 lncRNAs, covering almost 31% of the GENCODE lncRNA dataset ( Figure S1).

Identification of differentially expressed lncRNAs in HHT telangiectasia
We detected 617 statistically significantly differentially expressed lncRNAs (q,0.05, paired t-rest) when comparing telangiectasial and non-telangiectasial samples in the total HHT group (80 paired samples), of which 42 were highly statistically significant (q,0.001). Of the 42 differentially expressed lncRNAs (listed in Table 2), 16 were upregulated and 26 downregulated. None of these 42 lncRNAs are characterized functionally.
Analysis of the HHT1 samples resulted in 70 differentially expressed lncRNAs (q,0.05, paired t-test) when comparing telangiectasial and non-telangiectasial samples, and in HHT2 analysis resulted in 114 differentially expressed lncRNAs (q,0.05, paired t-test). The number of common lncRNAs (q,0.05) in the three groups is rather small as shown in Figure 1.
The complete results of the paired t-tests in HHT and the subgroups HHT1 and HHT2 are provided in Table S1, S2, and S3.  (Figure 2c). No statistically significant difference in expression values between HHT1 telangiectasial tissue and HHT2 telangiectasial tissue was observed. Even so, the data suggest that there is a minor difference as the comparison and grouping of the subgroups introduce more variation in the PCA plots (Figures 2a and 2b compared with Figure 2c).

Principal component analysis (PCA) and hierarchical clustering
Hierarchical clustering: amongst the differentially expressed lncRNAs (q,0.001) in the HHT group both up-and downregulation was observed (Figure 3). Hierarchical clustering for HHT1 and HHT2 is shown in Figure S6 and S7.

Genomic Regions Enrichment of Annotation
HHT1. The differentially expressed lncRNAs with q,0.15 (n = 617) were loaded into GREAT. Of these, 31% map within 50 kilobases (kb) of an annotated gene, and 89% map within 500 kb of a gene ( Figure S2). Gene ontology [19] (www.geneontology.org) analysis, performed in GREAT, of the 617 neighbouring genomic  regions (826 genes) is seen in Table 3. The ontologies of those loci concerned six gene ontology (GO) terms (binomial false discovery rate (FDR) q-value,0.001), of which three were considered key terms as they are central to HHT pathogenesis: blood vessel morphogenesis (GO:0001568, 3747 gene products), blood vessel development (GO:0001568, 3747 gene products) and vasculogenesis (Synonym: Vascular morphogenesis, GO:0001570, 557 gene products). Table 4 (and Tables S4-S6 in File S1) lists the genes involved in the three GO terms and their relative positions. In the GO terms tree ( Figure 4) the close connection and overlap of the three mentioned GO terms can be seen.
HHT2. The differentially expressed lncRNAs with q,0.15 (n = 640) were loaded into GREAT. Of these, 31% map within 50 kilobases (kb) of an annotated gene, and 89% map within 500 kb of a gene ( Figure S3). Gene ontology analysis of the 640 neighbouring genomic regions (887 genes) is seen in Table 5. The ontologies of those loci concerned five GO terms (binomial FDR q-value,0.001), of which especially vasculogenesis was    Table S7 in File S1) lists the genes involved in this GO term and their relative positions.
HHT. The differentially expressed lncRNAs with q,0.05 (n = 617) were loaded into GREAT. A more stringent cut-off of q,0.05 was applied here, as the sample size was larger. Of these lncRNAs, 31% map within 50 kilobases (kb) of an annotated gene, and 90% map within 500 kb of a gene ( Figure S4). Gene ontology analysis of the 617 neighbouring genomic regions (837 genes) is seen in Table 6. Applying a cut-off of q,0.001 resulted in three GO terms: vasculogenesis, lactation and blood vessel morphogenesis. Table 4 (and Tables S8 and S9 in File S1) lists the genes involved in the GO terms 'vasculogenesis' and 'blood vessel morphogenesis' and their relative positions.
By GREAT analysis, 63 lncRNAs were identified, which is equal to 10.2% of the differentially expressed lncRNAs in the HHT group (n = 617, q,0.05). Concordingly, 89.8% of statistically significantly differentially expressed lncRNAs in the HHT group are not part of the three selected statistically significant GO terms in GREAT.

Correlation analysis
To identify potential trans-regulated transcripts, we assessed the correlation between the expression levels of the 10 highest ranking differentially expressed lncRNAs in the Hereditary Haemorrhagic Telangiectasia (HHT) total group and all transcripts present on the Agilent array, by calculating Person's correlation coefficient.
The Manhattan plots drawn on this basis showed that several statistically significant correlations were present for each lncRNA in question (40-3725 correlations, p,0.05 (Bonferoni-corrected)(data not shown)), and that the correlating transcripts were mapping to many different chromosomes ( Figure 5 shows the top three ranking lncRNAs). More than half the correlated transcripts were lncRNAs (data not shown). In the cases of multiple lncRNAs neighbouring a gene you see a clear correlation of foldchange/direction; meaning that if one lncRNA neighbouring a gene is upregulated, in most cases (37/49) so are the other(s) ( Table 4).

Discussion
To our knowledge, this is the first study to assess the regulatory effects of long noncoding RNAs (lncRNAs) in HHT affected tissue. Using microarray technology, we identified lncRNAs that are statistically significantly differentially expressed in HHT telangiectasial tissue compared with HHT non-telangiectasial nasal tissue. Using GREAT, a tool which assumes cis-regulation, we also showed that those lncRNAs are enriched for genomic loci involved in key pathways concerning HHT. The comparison of expression profiles between HHT telangiectasial and HHT non-telangiectasial nasal tissue clearly showed the differential expression of a substantial number of lncRNAs. Principal component analysis (PCA) and hierarchical clustering were able to discriminate between HHT telangiectasial and HHT non-telangiectasial nasal Figure 4. Gene Ontology terms tree. Graphical view of the gene ontology (GO) terms tree showing part of the GO terms in the ontology biological process and their connections. Our three GO terms; blood vessel development, blood vessel development, and vasculogenesis (in red boxes), are the more narrow and specific terms in the tree and overlap by a number of genes. doi:10.1371/journal.pone.0090272.g004 Table 5. Gene ontology (GO) analysis of 640 genomic regions (887 genes) located nearby differentially expressed long non-coding RNAs in the HHT2 group.  tissue, demonstrating an HHT telangiectasial profile. However, gene expression profiling did not allow us to discriminate between HHT1 and HHT2, possibly because of the size of the study and lack of statistical power. Nonetheless, HHT1 and HHT2 had somewhat different groups of differentially expressed lncRNAs with a smaller number of common lncRNAs, which may be due to the minor phenotypical and pathway differences of HHT1 and HHT2.
The lncRNA list used in this study was retrieved from the GENCODE v.16 dataset, which uses a combination of manual annotation, computational analysis and targeted experimental validation and is the largest catalogue of human lncRNAs to date. It was used in order to retrieve optimal lncRNA annotation. To minimize the number of probes mapping only or mostly to mRNAs, we filtered out probes mapping to overlapping mRNA exons and lncRNA introns.
LncRNAs appear to control expression of protein-coding genes through both cisand trans-acting pathways. As cis-regulation is reported to be more pronounced for lncRNAs [11], we chose to use GREAT for further analysis of our data. GREAT is a tool that can be used to relate the lncRNA transcriptome to biology using well-known mRNA function and gene ontology (GO) terms. Using GREAT, three interesting and statistically significant GO terms were the result: blood vessel morphogenesis, blood vessel development, and vasculogenesis. All three were located inside the top 6 resulting GO terms. Blood vessel morphogenesis is defined as: 'The process in which the anatomical structures of blood vessels are generated and organized. The blood vessel is the vasculature carrying blood'. Blood vessel development is defined as: 'The process whose specific outcome is the progression of a blood vessel over time, from its formation to the mature structure'; and vasculogenesis: 'The differentiation of endothelial cells from progenitor cells during blood vessel development, and the de novo formation of blood vessels and tubes'. These GO term results are interesting as HHT manifestations are thought to result from imbalanced angiogenesis. The results may point to a causal effect of lncRNAs regulating mRNAs central to vasculogenesis, angiogenesis, and perhaps even HHT telangiectasia formation.
In most of the cases of multiple lncRNAs neighbouring a gene, a clear correlation of direction was observed. However, a correlation of direction was not present in all cases, which could indicate that a more fine-tuned regulation is being played out. Each GO term involved both up-and downregulated lncRNAs, which may suggest local cis-regulation at multiple sites.
The three GO terms contain repeatedly mentioned genes. Accordingly, that indicates a central group of genes that seems to be cis-regulated by differentially expressed lncRNAs. This central group of genes primarily involves: CAV1, CCM2, FOXF1, FZD4, PRSS23, RASA1, SMO, TIPARP, ZFPM2 and ZMIZ1. Some of these are genes known to cause vascular malformation disorders, such as HHT: CCM2 causes cerebral cavernous malformation type 2 [20] and RASA1 causes capillary malformation-arteriovenous malformation syndrome (CM-AVM) [21].
Another approach in the data analysis was the correlation study. The correlations between the expression levels of the 10 highest ranking differentially expressed lncRNAs in the HHT total group and every transcript present in the Agilent microarray were assessed by calculating Pearson's correlation coefficient. Multiple statistically significant correlations, mapping to many different chromosomes, were present for each lncRNA in question. Hence, the correlation analyses may indicate that multiple trans-regulative mechanisms are present. More than half the correlated transcripts are lncRNAs and, as these tend to be more tissue-specific than mRNAs, it could be that they are all co-regulated. Nevertheless, these results suggest only a certain degree of whole-genome correlation, and this may not be associated to the HHT phenotype at all.
In 2007, two gene expression microarray studies were published describing HHT samples. Thomas et al. [22] studied cultured human umbilical vein endothelial cells from seven newborns carrying the familial HHT mutation, using human umbilical vein endothelial cells from three healthy newborns as controls. Fernandez-Lopez et al. [23] studied blood outgrowth endothelial cells from three HHT patients using blood outgrowth endothelial cells from healthy donors as controls. Neither of the two papers described non-coding RNA, and their raw data have not been made available for re-annotation and data analysis in order to compare with our results.
The strength of our study is that affected HHT tissue was analysed in a paired design with samples from a relatively large number of patients. This resulted in a large number of differently expressed lncRNAs, even though we applied a strict cut-off. GREAT is a useful tool to relate lncRNAs to known biology, but it is limited to cis-regulation. Of the 617 aberrantly expressed lncRNAs in the HHT group, only 63 were identified in relevant GO terms by GREAT, indicating that regulative mechanisms other than cis-regulation may be important. Future studies are needed to validate the results of our study and to contribute knowledge about specific lncRNA functionality and both cisand trans-regulatory mechanisms.
In summary, our study identified lncRNAs that are aberrantly expressed in HHT telangiectasia and indicates that lncRNAs may contribute to regulate protein-coding loci in HHT, suggesting that the lncRNA component of the transcriptome deserves more attention in HHT research. Thus, a deeper understanding of lncRNAs and their role in telangiectasia formation possesses potential for discovering therapeutic targets and for identifying new biomarkers.