Reader Comments


Question and revalidating about mRNA dataset

Posted by deepreds on 28 Feb 2007 at 22:31 GMT

Dear Authors,

Thank you for the nice work; it is useful material for fusion gene detection and analysis.

I have a few questions for you.

1. EST dataset? It would be greatly helpful if you published your list of fused ESTs as supplementary material.

2. From the mRNA dataset, I prepared FASTA files and aligned them with the newest versions of BLAT and GMAP. I got consistent results for most of the mRNA dataset, but I found discrepancies for some of the mRNA sequences. Maybe one side of the fusion is composed of repeat sequences. For example, AF118081 mapped to chr11:1975060-1982822 from 0 to 727, and to chr19:41196814-41253812 from 484 to 1374. There is a 243 bp overlap between the two alignments, which indicates that AF118081 cannot meet your criteria for fused sequences.
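The overlap arithmetic above can be sketched as follows. This is only a minimal illustration, assuming zero-based, half-open query intervals as reported in the qStart/qEnd columns of BLAT's PSL output; the function name `query_overlap` is hypothetical and is not part of BLAT or GMAP.

```python
def query_overlap(a_start, a_end, b_start, b_end):
    """Number of query bases shared by two half-open intervals [start, end)."""
    # The shared region runs from the later start to the earlier end;
    # a negative length means the intervals do not overlap at all.
    return max(0, min(a_end, b_end) - max(a_start, b_start))


# AF118081: chr11 alignment covers query 0-727, chr19 alignment covers 484-1374.
overlap = query_overlap(0, 727, 484, 1374)
print(overlap)
```

For the AF118081 coordinates quoted above, this gives 727 - 484 = 243 bases.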

Here are a few examples.

1919 39 0 0 10 24 10 171491 + BC040590 3038 1041 3023 17 78774742 2057649 2231098 21 18,16,3,24,9,8,1,3,4,4,6,7,21,17,5,28,103,470,168,253,790, 1041,1063,1081,1084,1108,1117,1128,1132,1138,1142,1148,1154,1161,1185,1203,1209,1237,1342,1812,1980,2233, 2057649,2057667,2057683,2057689,2057719,2188666,2188674,2188675,2188678,2188683,2188687,2190148,2190156,2190177,2190194,2190199,2227424,2227527,2229079,2229475,2230308,
1346 13 0 0 0 0 1 21104 + BC040590 3038 0 1359 X 154913754 46935765 46958228 2 1238,121, 0,1238, 46935765,46958107,

869 16 0 0 2 5 4 56113 + AF118081 1374 484 1374 19 63811651 41196814 41253812 7 8,4,18,2,17,3,833, 484,492,499,519,521,538,541, 41196814,41196825,41196829,41196847,41196850,41196870,41252979,
696 29 0 0 1 2 3 7037 + AF118081 1374 0 727 11 134452384 1975060 1982822 5 549,80,67,24,5, 0,549,629,698,722, 1975060,1982643,1982724,1982791,1982817,

1104 13 0 0 2 4 4 40420 + BC038772 1678 520 1641 8 146274826 49011736 49053273 7 8,26,41,13,16,118,895, 520,529,558,599,612,628,746, 49011736,49011744,49011770,49032705,49032719,49052258,49052378,
753 26 0 0 2 5 7 84862 + BC038772 1678 0 784 18 76117153 32663075 32748716 10 82,36,143,95,274,109,8,19,10,3, 0,82,118,261,356,630,739,747,768,781, 32663075,32668280,32669194,32678168,32719517,32748562,32748673,32748684,32748703,32748713,

1622 33 0 0 1 3 4 15860 - AK126172 2253 0 1658 22 49691432 28546627 28564142 6 33,13,100,53,747,709, 595,628,641,744,797,1544, 28546627,28546661,28549587,28549687,28562685,28563433,
919 25 0 0 0 0 4 36360 + AK126172 2253 1309 2253 6 170899992 41640795 41678099 5 117,7,13,10,797, 1309,1426,1433,1446,1456, 41640795,41640918,41640926,41677290,41677302,

1129 11 0 0 0 0 3 23630 - BC047770 1902 743 1883 17 78774742 2266090 2290860 4 1033,63,1,43, 19,1052,1115,1116, 2266090,2267128,2267194,2290817,
861 19 0 0 2 2 2 945 + BC047770 1902 0 882 1 247249719 9217519 9219344 5 788,1,16,38,37, 0,788,790,806,845, 9217519,9219250,9219251,9219269,9219307,

371 10 0 0 2 4 2 2274 - AF085869 567 0 385 2 242951149 11329311 11331966 5 7,49,55,1,269, 182,190,239,297,298, 11329311,11329318,11330219,11330274,11331697,
300 1 0 0 0 0 0 0 + AF085869 567 266 567 4 191273063 71951264 71951565 1 301, 266, 71951264,

1183 1 0 0 0 0 0 0 - AK094335 1882 0 1184 13 114142980 51902085 51903269 1 1184, 698, 51902085,
1234 169 0 0 22 47 27 154486 - AK094335 1882 432 1882 2 242951149 71268945 71424834 49 228,177,149,144,4,7,6,2,8,8,10,10,8,10,14,2,5,10,10,10,8,2,3,21,23,21,5,28,4,11,5,11,12,4,2,12,8,22,6,17,7,4,40,262,4,4,16,6,13, 0,228,405,554,698,707,716,722,724,732,740,751,763,771,781,796,799,804,816,827,837,846,851,854,875,901,922,927,958,963,974,979,990,1002,1006,1013,1026,1034,1058,1064,1081,1088,1095,1136,1400,1406,1414,1430,1437, 71268945,71270482,71283089,71307565,71308198,71308202,71308209,71308218,71308221,71308232,71311323,71311333,71311343,71312993,71313007,71313021,71313023,71313031,71313041,71313051,71352698,71352706,71352708,71380846,71380870,71380893,71380917,71380923,71380951,71380955,71380969,71408653,71409495,71409509,71409967,71409969,71409981,71410523,71410545,71410553,71410573,71416719,71416723,71424523,71424785,71424789,71424793,71424815,71424821,

1242 9 0 0 1 1 2 1156 + AB073889 2091 839 2091 10 135374737 104513786 104516193 4 16,1198,25,12, 839,856,2054,2079, 104513786,104513802,104516154,104516181,
913 13 0 0 0 0 1 9498 + AB073889 2091 0 926 8 146274826 32279120 32289544 2 859,67, 0,859, 32279120,32289477,

2394 4 0 0 0 0 18 50354 - Z22957 4281 2 2400 15 100338915 50409928 50462680 19 60,94,199,101,9,228,54,144,114,149,94,249,240,81,73,212,109,87,101, 1881,1941,2035,2234,2335,2344,2572,2626,2770,2884,3033,3127,3376,3616,3697,3770,3982,4091,4178, 50409928,50415951,50419684,50425849,50428306,50430742,50433093,50433359,50439456,50444042,50446519,50449657,50451612,50454792,50454876,50455835,50459113,50459310,50462579,
1811 5 0 0 1 1 0 0 - Z22957 4281 2463 4280 14 106368585 79208346 79210162 2 932,884, 1,934, 79208346,79209278,


1993 0 0 0 0 0 11 8523 + AK096600 2827 834 2827 1 247249719 149766074 149776590 12 133,210,207,258,171,162,90,84,119,109,164,286, 834,967,1177,1384,1642,1813,1975,2065,2149,2268,2377,2541, 149766074,149768449,149769008,149769588,149771501,149773074,149774141,149774699,149774879,149775336,149775829,149776304,
845 21 0 0 5 13 6 3593 - AK096600 2827 0 879 19 63811651 2667902 2672361 12 10,33,1,10,2,10,6,6,3,8,717,60, 1948,1958,1994,1998,2008,2010,2022,2030,2039,2042,2050,2767, 2667902,2667913,2667946,2667947,2668115,2668118,2668128,2668134,2668140,2668145,2668156,2672301,

RE: Question and revalidating about mRNA dataset

peru replied to deepreds on 01 Mar 2007 at 14:44 GMT

Dear deepreds,

Thank you for your interest in our work.

The discrepancies you found in the mRNA data are likely due to the fact that the latest version of GMAP, as far as I can tell from the website, uses the human genome NCBI build 35, whereas we used NCBI build 36. I looked through the cases you provided and found no example where the selection criteria were not met. For example, AF118081 maps to chr11:1975060-1975611 from 0 to 551, and to chr19:41252982-41253812 from 545 to 1374, i.e. there is an overlap, but of only 6 bases (see below for BLAT output).

I'll see what I can do about the EST data.

Cheers,

Per


551 0 0 0 0 0 0 0 + AF118081 1374 0 551 NC_000011 134452384 1975060 1975611 1 551, 0, 1975060,
122 13 0 0 0 0 1 5 - AF118081 1374 1226 1361 NC_000017 78774742 57915553 57915693 2 16,119, 13,29, 57915553,57915574,
105 4 0 0 2 20 2 24 - AF118081 1374 1228 1357 NC_000017 78774742 55859723 55859856 3 12,51,46, 17,36,100, 55859723,55859747,55859810,
33 0 0 0 0 0 0 0 - AF118081 1374 659 692 NC_000017 78774742 53379946 53379979 1 33, 682, 53379946,
827 2 0 0 0 0 1 1 + AF118081 1374 545 1374 NC_000019 63811651 41252982 41253812 2 4,825, 545,549, 41252982,41252987,
126 14 0 0 0 0 1 1 - AF118081 1374 553 693 NC_000002 242951149 47909037 47909178 2 135,5, 681,816, 47909037,47909173,
157 17 0 0 0 0 0 0 + AF118081 1374 549 723 NC_000021 46944323 31705746 31705920 1 174, 549, 31705746,
152 11 0 0 2 12 1 39 + AF118081 1374 549 724 NC_000004 191273063 90912763 90912965 3 70,67,26, 549,629,698, 90912763,90912872,90912939,
42 1 0 0 1 3 1 1 + AF118081 1374 676 722 NC_000006 170899992 45374265 45374309 2 19,24, 676,698, 45374265,45374285,
93 5 0 0 1 15 2 219 - AF118081 1374 566 679 NC_000007 158821424 158201047 158201364 3 45,12,41, 695,740,767, 158201047,158201094,158201323,
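
For readers who want to repeat this check, here is a rough sketch of how the query overlap between two of the BLAT hits above could be computed from the PSL lines. It assumes the standard 21-column PSL layout with whitespace-separated fields; the names `PslHit` and `parse_psl_line` are hypothetical helpers written for this illustration, and the sketch does not reproduce the selection criteria used in the article.

```python
from typing import NamedTuple


class PslHit(NamedTuple):
    q_name: str
    q_start: int  # zero-based start on the query
    q_end: int    # end on the query (exclusive)
    t_name: str
    t_start: int
    t_end: int


def parse_psl_line(line):
    """Pick the query and target coordinates out of one PSL line."""
    fields = line.split()
    if len(fields) != 21:
        raise ValueError("expected 21 PSL columns, got %d" % len(fields))
    # Columns (0-based): 9 qName, 11 qStart, 12 qEnd, 13 tName, 15 tStart, 16 tEnd
    return PslHit(fields[9], int(fields[11]), int(fields[12]),
                  fields[13], int(fields[15]), int(fields[16]))


# The chr11 and chr19 hits for AF118081, copied from the BLAT output above.
chr11 = parse_psl_line(
    "551 0 0 0 0 0 0 0 + AF118081 1374 0 551 NC_000011 134452384 "
    "1975060 1975611 1 551, 0, 1975060,")
chr19 = parse_psl_line(
    "827 2 0 0 0 0 1 1 + AF118081 1374 545 1374 NC_000019 63811651 "
    "41252982 41253812 2 4,825, 545,549, 41252982,41252987,")

# Shared query bases: from the later start to the earlier end, floored at zero.
shared = max(0, min(chr11.q_end, chr19.q_end) - max(chr11.q_start, chr19.q_start))
print(chr11.t_name, chr19.t_name, shared)
```

With these two hits the shared query region is 545 to 551, i.e. the 6 bases mentioned above.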