Placenta-Enriched LincRNAs MIR503HG and LINC00629 Decrease Migration and Invasion Potential of JEG-3 Cell Line

LINC00629 and MIR503HG are long intergenic non-coding RNAs (lincRNAs) mapped to chromosome X (Xq26), a region enriched for genes associated with human reproduction. Genes highly expressed in normal reproductive tissues and cancers (CT genes) are well known as potential tumor biomarkers. This study aimed to characterize the structure, expression, function and regulation mechanism of the MIR503HG and LINC00629 lincRNAs. According to our data, MIR503HG expression was almost exclusive to placenta, and LINC00629 was highly expressed in placenta and other reproductive tissues. Further analysis, using a panel of cancer cell lines, showed that MIR503HG and LINC00629 were expressed in 50% and 100% of the cancer cell lines, respectively. MIR503HG was expressed predominantly in the nucleus of JEG-3 choriocarcinoma cells. We observed positively correlated expression between MIR503HG and LINC00629, and between the lincRNAs and neighboring miRNAs. Also, both LINC00629 and MIR503HG could be negatively regulated by DNA methylation in an indirect way. Additionally, we identified new transcripts for MIR503HG and LINC00629 that are relatively conserved when compared to other primates. Furthermore, we found that overexpression of the new MIR503HG2 and three-exon LINC00629 isoforms decreased the invasion and migration potential of the JEG-3 tumor cell line. In conclusion, our results suggest that the lincRNAs MIR503HG and LINC00629 impair migration and invasion capacities in an in vitro choriocarcinoma model, indicating a potential role in human reproduction and tumorigenesis. Moreover, the MIR503HG expression pattern found here could indicate a putative new tumor biomarker.


Introduction
There is evidence that the X chromosome contains more genes related to sex and reproduction than would be expected by chance [1]. Specifically, region Xq26 is known to harbor critical genes responsible for placental and normal fetal development, including the PLAC1 gene [2], whose expression is restricted to placenta [3]. Interestingly, PLAC1 protein is present in a variety of tumor types, and its expression was previously associated with the malignant phenotype [4][5]. Until recently, PLAC1 was the only candidate gene within chromosome region Xq26 relevant for normal placental development. Nevertheless, according to the most recent version of the human genome [2], new genes with unknown functions have been identified in this region, such as PHF6, MIR503HG, and LINC00629; the last two were previously known as MGC16121 and CR596471, respectively.
The MIR503HG and LINC00629 genes map between the HPRT1 and PLAC1 loci (UCSC Genome Browser; Feb. 2009, GRCh37/hg19) in opposite orientations and are about 3 kb apart (S1 Fig). Both genes have three exons and CpG islands in their putative promoter regions. They have also been described as long intergenic non-coding RNAs (lincRNAs): RNAs longer than 200 nucleotides that are not translated and are located between protein-coding genes. LincRNAs are involved in the regulation of transcription, processing, and post-transcriptional pathways [6] and, when deregulated, are associated with several types of cancer [7].
Considering the significance of the Xq26 region in embryo development and the similarity of germinal and placental cells to tumor cells [8], we sought to characterize the structure, expression pattern, function, and regulation mechanism of the MIR503HG and LINC00629 genes.

Results
MIR503HG and LINC00629 are highly expressed in reproductive tissues
The MIR503HG and LINC00629 genes are located in the same region as PLAC1, whose expression is restricted to placenta and was recently found in cancer cells [2]. Therefore, we sought to determine the expression levels of MIR503HG and LINC00629 in RNA samples from a commercial normal human tissue panel. We found that MIR503HG expression is almost restricted to the placenta (Fig 1A) and that LINC00629 was also highly expressed in placenta and other reproductive tissues (Fig 1B). RT-qPCR analysis of 18 cancer cell lines revealed that MIR503HG was expressed in 50% (9/18) and LINC00629 in 100% of them; samples with Ct values above 34 were considered not expressed (Fig 1C and 1D, respectively).

MIR503HG and LINC00629 expression can be indirectly regulated by methylation
The presence of CpG islands in the promoter regions of both lincRNAs suggests that DNA methylation could regulate them. To test this possibility, we treated cancer cell lines with low expression of MIR503HG and LINC00629 with the demethylating agent 5-Aza-2'-deoxycytidine (5-Aza-dC).
Treatment with 5-Aza-dC at 5 μM significantly increased MIR503HG expression in the mammary gland tumor cell line (HCC1954) and the colorectal adenocarcinoma cell line (DLD-1). In contrast, LINC00629 expression was increased by treatment only in the DLD-1 cell line (Fig 4). Likewise, the miRNAs miR-424 and miR-503, which map within the MIR503HG gene, also showed increased expression after 5-Aza-dC treatment (Fig 4).
Interestingly, Methylation Sensitive High-Resolution Melting (MS-HRM) revealed that the methylation status of the CpG islands near the putative promoter regions of both genes did not change after treatment. Both control and treated samples were 100% methylated (Fig 5). To confirm that global DNA demethylation was effective after 5-Aza-dC treatment, we digested DNA from treated samples with the MspI and HpaII restriction enzymes. HpaII, an isoschizomer of MspI, is sensitive to DNA methylation within the CCGG recognition site. Comparing DNA band intensities on agarose gels for samples digested with MspI or HpaII, we verified that 5-Aza-dC-treated samples had a lower methylation level than untreated samples (S2 Fig).
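The logic of the MspI/HpaII comparison can be illustrated with a minimal sketch. It assumes the usual simplification that MspI cuts every CCGG site regardless of methylation of the internal cytosine, whereas HpaII does not cut when that cytosine is methylated. The function names, the example sequence and the methylated positions are hypothetical and serve only as illustration; they are not data from this study.

```python
def ccgg_sites(seq):
    """Return the 0-based start index of every CCGG site in seq."""
    s = seq.upper()
    return [i for i in range(len(s) - 3) if s[i:i + 4] == "CCGG"]


def sites_cut(seq, enzyme, methylated_positions=frozenset()):
    """Return the CCGG sites that the given enzyme would cut.

    Simplifying assumption: MspI ignores methylation of the internal
    cytosine (index i + 1 of a site starting at i), while HpaII is
    blocked by it.
    """
    sites = ccgg_sites(seq)
    if enzyme == "MspI":
        return sites
    if enzyme == "HpaII":
        return [i for i in sites if (i + 1) not in methylated_positions]
    raise ValueError("enzyme must be 'MspI' or 'HpaII'")


# Hypothetical sequence with two CCGG sites; the internal C of the
# first site (index 3) is methylated.
example = "AACCGGTTCCGGAA"
msp_cuts = sites_cut(example, "MspI", {3})
hpa_cuts = sites_cut(example, "HpaII", {3})
```

Under these assumptions, a sample that loses methylation gains HpaII-cut sites and its HpaII digestion pattern approaches the MspI pattern, which is the comparison made on the agarose gel.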

MIR503HG and LINC00629 new isoforms
Although the MIR503HG and LINC00629 genes have been recently validated, we found some isoforms that had not been previously reported or deposited in the NCBI RNA reference sequence collection. Regarding the MIR503HG gene, we identified an isoform with a shorter 5' region and a longer 3' region than the MIR503HG sequence currently deposited in RefSeq (Fig 6A). We named this new isoform MIR503HG2. Likewise, LINC00629 presented four other isoforms, two with three exons and two with only two exons. We deposited the LINC00629 isoforms in GenBank: LINC00629A and LINC00629B (2 exons); LINC00629C and LINC00629D (3 exons). In the three-exon isoforms, the first exon mapped to a different position than in the original sequence previously described in RefSeq. We found no transcripts similar to the two-exon isoforms in the same database (Fig 6B). Isoforms with the same number of exons also differ in poly(A) tail length.
The expression profiles of the two- and three-exon LINC00629 isoforms described herein were similar across the normal tissues and cancer cell lines tested. However, the three-exon isoforms were more highly expressed than the two-exon isoforms (S3 Fig).

Conservation and secondary structure
To evaluate the conservation and structure of the new isoforms, we compared them with similar sequences from other species deposited in GenBank (RefSeq). The most similar sequences were found in other primates. For MIR503HG2, the sequences with the highest identity were from Nomascus leucogenys, Papio anubis, Saimiri boliviensis and Callithrix jacchus. When aligned to the human genome, the last exons were apparently the most conserved (Fig 7A). For LINC00629, we evaluated only one two-exon and one three-exon isoform, since the isoforms of each type were very similar, differing only at the 3' end. For both types of isoforms, the most similar sequences were found in Pan paniscus, Pan troglodytes and Gorilla gorilla (Fig 7B).
Analysis of the secondary structure of MIR503HG2 showed that the region corresponding to exon 3 displayed a substructure similar to that of other primates (S4 Fig).

MIR503HG is predominantly found in the nucleus and LINC00629 is evenly spread in the JEG-3 choriocarcinoma cell line
To determine the cellular location of the MIR503HG and LINC00629 RNAs, we extracted RNA from the nucleus and cytoplasm of JEG-3 cells cultured under normal conditions. Fig 8 shows that MIR503HG RNA was mainly found in the nucleus, at a level about 12-fold higher than in the cytoplasm (p<0.001), whereas LINC00629 was equally distributed between the two compartments.

MIR503HG and LINC00629 inhibit migration and invasion in JEG-3 choriocarcinoma cell line
Given that MIR503HG and LINC00629 were expressed at lower levels in the JEG-3 cell line than in normal placental tissue (Fig 1), we used this cell line to overexpress the MIR503HG2 and three-exon LINC00629 isoforms (Fig 9).
We found that overexpression of both lincRNAs reduced the percentage of migrating cells by around 30% (p<0.01) and of invading cells by approximately 40% (p<0.001) (Fig 10A and 10B). However, there was no change in the proportion of cells in S phase in the cell cycle assay (S5 Fig), suggesting that cell proliferation is not affected by these lincRNAs.

Discussion
Despite their distinct functions, placental and germinative cells display some characteristics similar to those of tumor cells [8][9]. These include proliferation, migration, and invasion [10]. Consequently, the study of genes expressed in placenta and germinative cells can be considered a model system to investigate tumorigenic mechanisms [10].
Here we describe the characterization of the MIR503HG and LINC00629 genes. Both genes are described as lincRNAs located in the Xq26 region, which contains other genes related to reproduction and to fetal and placental development [2][3]. To our knowledge, gene expression profiles of both loci in normal tissues and tumor cell lines have not been reported in the literature. According to Sasaki et al. [11] and Cabili et al. [12], most lncRNAs are expressed in a tissue-specific manner and probably regulate specific biological processes in each tissue.
The MIR503HG and LINC00629 genes were more highly expressed in placenta than in other normal tissues. The LINC00629 gene was also expressed in other tissues related to reproduction, such as cervix, ovary and testis. The same expression pattern was observed for miRNAs located near the studied loci (miR-424, miR-450a, miR-450b-5p, miR-503 and miR-542-3p). Takada et al. [13] showed in mice that the expression of miR-503 was restricted to placenta and ovary, supporting the suggestion that the Xq26 region is involved in human reproduction and development. Similarly, C19MC, which is exclusively expressed in placenta, contains miRNAs that are aberrantly expressed in specific human malignancies [14].
The positively correlated expression of the MIR503HG and LINC00629 genes and the neighboring miRNAs suggests that all of them might be regulated simultaneously and that genetic and epigenetic alterations found in tumors could disrupt this control. However, in contrast to our finding, Fiedler et al. [15] reported that MIR503HG repression up-regulated miR-424 in Human Umbilical Vein Endothelial Cells (HUVECs). We suggest that this correlation may be tissue related.
MIR503HG, LINC00629, miR-424 and miR-503 were negatively regulated by DNA methylation, as shown by treatment with the demethylating agent 5-aza-2'-deoxycytidine. A previous study had already demonstrated that methylation regulates miR-503 in non-small cell lung cancer cell lines [16]. Nevertheless, the CpG dinucleotides of the CpG islands located near the MIR503HG and LINC00629 5' regions apparently have no effect on their expression. Therefore, these genes appear to be regulated by demethylation through an indirect route. As observed in CTAs (cancer/testis-associated genes), selective expression in germinal tissues and tumor cells can be regulated by DNA methylation [17]. Thus, although our study concerns non-coding RNAs, both genes seem to be regulated by the same mechanism.
The MIR503HG and LINC00629 loci were predicted from Expressed Sequence Tags (ESTs), which are prone to errors, mostly at their ends [18]. The RACE approach revealed that the LINC00629 locus transcribes two isoforms with two exons. In addition, the three-exon isoforms showed higher expression than the two-exon isoforms, suggesting an important role in processes regulated by the LINC00629 gene.
Even though long non-coding genes are less conserved than coding genes in nucleotide sequence, they present highly conserved secondary structures [19]. The transcripts described herein are relatively conserved when compared to RNA sequences found in other primates. As stated by Necsulea et al. [20], the MIR503HG gene originated at least 370 Myr ago in the tetrapod ancestor and, curiously, its expression pattern changed from predominantly testicular in ancestors to placental in eutherians.
Interestingly, we found that the last exon of the MIR503HG2 transcript has a secondary substructure that is very similar to that of transcripts from Nomascus leucogenys, Callithrix jacchus and Saimiri boliviensis. The presence of a domain that is evolutionarily conserved in both nucleotide sequence and RNA secondary structure suggests that it may represent a functional domain [21] and could be the most important region of these transcripts.
The functions of lncRNAs depend on cellular location, and most lincRNAs are located in the nucleus [22], as we found for MIR503HG in JEG-3 cells. This suggests that it could act by organizing sub-structures, altering the chromatin state or regulating gene expression [23]. On the other hand, LINC00629, which is evenly distributed between nucleus and cytoplasm, may have a different role in global cellular function.
Functional assays demonstrated that overexpression of MIR503HG2 and the three-exon LINC00629 isoforms (LINC00629C and LINC00629D) decreased cell migration and invasion rates, indicating a potential role in tumorigenesis. Moreover, since both lincRNAs are enriched in placenta and located in the Xq26 region, we suggest that they could act as tumor suppressors in choriocarcinoma, which has lost the ability to cease invasion and migration. Nevertheless, Fiedler et al. [15] reported that MIR503HG suppression inhibited migration and proliferation in HUVEC cells, indicating that its action may be influenced by cell context or isoform type.
Here we have characterized the structure, regulation by methylation and function of the MIR503HG and LINC00629 genes. Based on their expression profile and their effects on migration and invasion in a model of choriocarcinoma, our study suggests a potential role for the MIR503HG and LINC00629 genes in tumorigenesis and human reproduction, considering the similarity of normal placenta and germinal tissues to tumors. Additionally, the expression pattern found for MIR503HG could indicate a putative new tumor biomarker.

Materials and Methods
All other cell lines utilized were purchased from ATCC or DSMZ and cultured in the recommended media. Cells were kept at 37°C and 5% CO2.

Plasmid transfection
The lincRNAs in this study were transfected into the JEG-3 cell line using Lipofectamine 2000 reagent (Thermo Scientific, Catalog No. 11668-019) together with 500 ng of vector DNA per well in 24-well plates. As a negative control, we used the empty vector. Experiments were performed 48 hours after transfection. Transfection efficiency and cell viability were evaluated by GFP expression and PI staining, using a FACSCalibur flow cytometer (Becton Dickinson).

Migration and invasion assays
Cell migration was evaluated in 24-well transwell plates (Greiner BioOne, Catalog No. 662638). In the invasion assay, a Matrigel Invasion Chamber (Corning, Catalog No. 354480) replaced the filters. Cells remaining in the upper compartment were removed with a cotton swab, and cells that had migrated to the lower face of the filter were fixed in 4% formaldehyde (in PBS) and stained with 0.5% crystal violet. Cells were counted manually using ImageJ software. All experiments were performed two to three times, independently.

Cell cycle assay
For the cell cycle assay, cells were synchronized by FBS starvation for 24 hours before vector transfection. At 48 hours post-transfection, cells were fixed with ice-cold absolute ethanol overnight, treated with RNase A and PI, and analyzed in a FACSCalibur flow cytometer. All experiments were performed three times, independently.

5-Aza-dC treatment
For 5-Aza-2'-deoxycytidine (Sigma-Aldrich Co., Catalog No. A3656) treatment, 1.3 × 10^5 DLD-1 and HCC1954 cells were seeded in each well of a six-well plate. After 24 hours, 5-Aza-2'-deoxycytidine diluted in DMSO (Sigma-Aldrich Co., Catalog No. D2650) was added to a final concentration of 5 μM. The medium was changed every 24 hours, with fresh drug added each day, for three days. At the end of the experiment, the cells were harvested and viability was analyzed using trypan blue (Life, Catalog No. 15250-061). All experiments were performed three times, independently.

DNA and RNA extraction
RNA and DNA from cells were extracted with Trizol Reagent (Life, Catalog No. 15596-018), according to the manufacturer's protocol.

RT-qPCR analysis
To evaluate the expression profile of the lincRNAs, we used RNA extracted from cancer cell lines and a panel of 20 normal tissues (FirstChoice Human Total RNA Panel Survey, Ambion, Catalog No. AM6000). Reverse transcription was performed with the High Capacity cDNA Reverse Transcription Kit (Thermo Scientific, Catalog No. 4368813), according to the supplier's protocol.
The RT-qPCR analysis was performed using Taqman Gene Expression Assay (Applied Biosystems) or IDT (Integrated DNA Technologies) probes. For the MIR503HG RefSeq isoform we used the assay Hs03681341_m1 (Applied Biosystems), and for both the RefSeq and MIR503HG2 isoforms we used Hs.PT.58.2631940 (Integrated DNA Technologies). For the MIR503HG2-specific isoform, we used a custom assay with the following sequences: primer F: 5' CAG CCT TCC TGA AAG ACC A 3'; primer R: 5' TGT TGA TGT AGT GTT CCT GGG T 3'; and probe: 5' CT CCA GTG G A CGC CTG CAG G 3' (Integrated DNA Technologies). For the LINC00629 gene, we used the assay 186830593 (Applied Biosystems) for the LINC00629A and LINC00629B isoforms and the assay Hs04274538_m1 (Applied Biosystems) for the RefSeq, LINC00629C, and LINC00629D isoforms. Expression levels were normalized to endogenous genes, including GAPDH. All expression data, except those from the demethylating treatment, were analyzed by the formula 2^(-ΔCt), in which the ΔCt value was calculated using the geometric mean of the endogenous genes. For the demethylating treatment experiment, we used 2^(-ΔΔCt), with the same endogenous genes and with untreated samples as the reference sample.
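The two formulas above can be written out as a minimal sketch. It assumes that the geometric mean is taken over the Ct values of the endogenous (reference) genes, which is one reading of the description above; the function names and any Ct values passed to them are hypothetical and are not data from this study.

```python
from math import prod


def geometric_mean(values):
    """Geometric mean of a non-empty list of positive numbers."""
    return prod(values) ** (1.0 / len(values))


def relative_expression(ct_target, ct_references):
    """Return 2^(-dCt), where dCt = Ct(target) - geometric mean of
    the reference-gene Ct values (assumed normalization scheme)."""
    delta_ct = ct_target - geometric_mean(ct_references)
    return 2.0 ** (-delta_ct)


def fold_change(ct_target_treated, ct_refs_treated,
                ct_target_control, ct_refs_control):
    """Return 2^(-ddCt), using the untreated sample as the reference."""
    d_treated = ct_target_treated - geometric_mean(ct_refs_treated)
    d_control = ct_target_control - geometric_mean(ct_refs_control)
    return 2.0 ** (-(d_treated - d_control))
```

For instance, with hypothetical values, a target Ct of 25 in treated cells and 27 in untreated cells, with reference Cts of 20 in both, gives ΔΔCt = -2 and therefore a four-fold increase.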

Nucleus and cytoplasm expression assay
For separate extraction of nuclear and cytoplasmic RNA, we used the PARIS Kit (Thermo Scientific, Catalog No. AM1921) according to the supplier's protocol and used GAPDH RNA (Applied Biosystems, Catalog No. Hs02758991_g1), which is predominantly cytoplasmic, as a control. We normalized expression levels to endogenous 18S rRNA (Applied Biosystems, Catalog No. 4319413E), and expression analysis was performed as described in the previous section.

DNA methylation assay
The DNA obtained from cell lines was subjected to treatment with sodium bisulfite, which is based on the deamination of unmethylated cytosines to uracil while methylated cytosines are preserved, in the presence of NaOH and sodium bisulfite [24]. For this procedure, we used the EpiTect Bisulfite Kit (Qiagen, Catalog No. 59104), according to the manufacturer's instructions.
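The principle of bisulfite conversion can be sketched in silico as follows. The sketch assumes that the converted DNA is read after PCR amplification, so that each uracil appears as thymine; the function names, the example sequence and the methylated positions are hypothetical and only illustrate the principle.

```python
def cpg_positions(seq):
    """Return the 0-based index of the C in every CpG dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Return the sequence as read after bisulfite treatment and PCR.

    Unmethylated C is deaminated to U and amplified as T; a C whose
    0-based index is in methylated_positions is protected and stays C.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


# Hypothetical fragment with two CpG sites (C at indices 1 and 5).
fragment = "ACGTCCGGA"
fully_methylated = bisulfite_convert(fragment, set(cpg_positions(fragment)))
unmethylated = bisulfite_convert(fragment)
```

The two converted sequences differ only at the CpG cytosines, and this difference in C/T content is what allows methylated and unmethylated templates to be distinguished after PCR.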
The Methylation Sensitive High-Resolution Melting (MS-HRM) method allows the methylation percentages of different bisulfite-converted DNA samples to be compared. The method is based on differences in the dissociation (melting) behavior, from double-stranded to single-stranded DNA, of PCR products derived from differentially methylated samples [25].
The sequences obtained by the RACE technique and by PCR of the CpG islands were evaluated using CodonCode Aligner software (CodonCode Corp.) and aligned to the human genome using the BLAT tool [26] (website: http://genome.ucsc.edu/).

Conservation and secondary structure analyses
Nucleotide sequences obtained by the RACE technique were analyzed with the Basic Local Alignment Search Tool (NCBI-BLAST) [27][28][29], searching for similar RNA sequences in the NCBI Reference RNA sequences (RefSeq_rna) [28]. The similar non-human RNA sequences were then compared to the studied genes and further aligned to the human genome with the BLAT tool [26] (website: http://genome.ucsc.edu/). Additionally, we predicted the common RNA structure using the TurboFold algorithm, available in the RNAstructure platform (http://rna.urmc.rochester.edu/RNAstructureWeb/) [29], which presents the common RNA structures with the lowest free energy values.

Statistical Analysis
The expression data derived from cancer cell lines and normal tissues were analyzed with Pearson's correlation coefficient, using the expression values found for each gene and miRNA. For the demethylating treatment experiment, transfected-cell assays and nucleus/cytoplasm expression assay, we used Student's t-test. For migration and invasion assays, we used ANOVA followed by the Bonferroni post-test. All statistical analyses were performed with GraphPad Prism 4 software, and p<0.05 was considered significant.
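As an illustration of the correlation analysis, Pearson's coefficient between two expression vectors can be computed as in the minimal sketch below. The analyses in this study were run in GraphPad Prism, so this code is only a hypothetical re-implementation of the standard formula, and the function name is an assumption.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)
```

Here `x` and `y` would hold, for example, the 2^(-ΔCt) values of a lincRNA and of a neighboring miRNA measured across the same set of samples.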