BGF User's Guide
This is the homepage of BGF, the BGI (Beijing Genomics Institute) Gene
Finding program. BGF is based on DP (Dynamic Programming)
and HSMM (Hidden Semi-Markov Model).
Input
The program takes a DNA sequence in FASTA or
Flat format. You can submit only one sequence at a time.
This is an example:
>SeqExample
GTCAACAATCATGCGGATAAGACGAGTTTAATTTGGTGCCAAAAGGAAGTTGCGGGTCAGAGGGCACCGGATCACAGAGAAAATTAATTGGACTGCTAGTGGATAGGAGTTGCTGACTAGGGGGTGTTTAGATACACGGCTGTAAAGTTTTAGCGTGTCATATCGTATATTATATATTGTATTGTATAGGGTGTTCGGACACTAATAAAAAAACTAACTGTAGAATCCGTCAGTAAACCGCGAGACAGATTTATTAAGTCTAATTAATCCATCATTAGCAAATGTTTACTGTAGCACCGTATTATCAAATCATGGAGCAATTATGCTTAAAAGATTCGTCTCACAAATTAGTCGCGCAATTAGTTATTTTTTACCTATATTTAATACTTCATACAGGTGTTAAACGTTCGATGTGACAGGGTGTAAAATTTTGGGGTGGAATCTAAACAGGGCCTAAAGACGTTTCCTAATTCTTACTCCCTCCGTCCCTAAAAAGACAACCACCTCTCCTAATATAACAAATCTAGACAACCCTCTGTCCAGATTTATGGTACTAAAAGGGGTTACATCCCCTGCTATG
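Because only one sequence can be submitted at a time, it can help to check a FASTA file locally before uploading it. The following is a minimal sketch, not part of BGF itself: the function name read_single_fasta is hypothetical, the accepted alphabet (A, C, G, T, N) is an assumption, and Flat format (a sequence without a header line) is not handled.

```python
def read_single_fasta(text):
    """Return (name, sequence) from FASTA text that holds exactly one record.

    Hypothetical helper for pre-submission checks; not part of BGF.
    """
    name = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                # A second header means a second record.
                raise ValueError("only one sequence may be submitted at a time")
            name = line[1:].strip()
        else:
            if name is None:
                raise ValueError("missing '>' header line")
            chunks.append(line.upper())
    if name is None or not chunks:
        raise ValueError("no sequence found")
    sequence = "".join(chunks)
    # Assumption: only A, C, G, T and N are accepted as bases.
    unexpected = set(sequence) - set("ACGTN")
    if unexpected:
        raise ValueError("unexpected characters: " + "".join(sorted(unexpected)))
    return name, sequence


if __name__ == "__main__":
    name, seq = read_single_fasta(">SeqExample\nGTCAACAATCATGCGG\nATAAGACGAG\n")
    print(name, len(seq))
```

The sequence may be wrapped over several lines; the helper joins them before checking the characters.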
How to run it
Either enter the name of the local file that contains your DNA sequence
in the File upload field, or paste the sequence into the Sequence window.
Then choose Species and press `Submit'.
BGF output
Gene# - predicted gene number, counted from the start of the sequence
S - DNA strand (+ for direct, - for complementary)
Exon# - predicted exon number within the current gene
Type - type of coding sequence:
    Init - first exon (starting with the start codon)
    Intr - internal exon
    Term - last coding segment (ending with the stop codon)
    Sngl - single-exon gene
    Prom - position of transcription start (TATA-box position or cap site)
Start and End - start and end positions of the Type
ORF_S and ORF_E - positions where the first complete codon starts and the last codon ends
Prob - exon probability for the Type
Len - length in bases (End - Start + 1)
For example:
Program    : bgf
Version    : 1.0
Time       : Tue Feb 24 15:52:36 2004
Parameter  : Rice
Sequence   : AF503585 8553
Length     : 8553
GC%        : 42.54%
Total Genes: 3 ( 1 in + strand & 2 in - strand)
Total Exons: 19 ( 16 in + strand & 3 in - strand)

Gene# S Exon# Type   Start       End   ORF_S     ORF_E    Prob    Len
===== = ===== ==== ======= = ======= ======= = ======= ======= ======
    1 -     1 Intr      66 -     209      68 -     208    0.66    144
    1 -     2 Init     504 -     745     506 -     745    0.84    242
    1 -       Prom     890 -                              0.05
    2 +       Prom    1022 -                              0.10
    2 +     1 Init    1129 -    1182    1129 -    1182    0.56     54
    2 +     2 Intr    1925 -    2110    1925 -    2110    0.69    186
    2 +     3 Intr    3104 -    3209    3104 -    3208    0.73    106
    2 +     4 Intr    3268 -    3422    3270 -    3422    0.52    155
    2 +     5 Intr    3547 -    3630    3547 -    3630    0.67     84
    2 +     6 Intr    3704 -    3795    3704 -    3793    0.87     92
    2 +     7 Intr    3935 -    4043    3936 -    4043    0.87    109
    2 +     8 Intr    4150 -    4236    4150 -    4236    0.87     87
    2 +     9 Intr    4359 -    4451    4359 -    4451    0.85     93
    2 +    10 Intr    5350 -    5547    5350 -    5547    0.87    198
    2 +    11 Intr    5687 -    5838    5687 -    5836    0.87    152
    2 +    12 Intr    5930 -    6080    5931 -    6080    0.87    151
    2 +    13 Intr    6181 -    6279    6181 -    6279    0.87     99
    2 +    14 Intr    6365 -    6532    6365 -    6532    0.87    168
    2 +    15 Intr    6830 -    6908    6830 -    6907    0.87     79
    2 +    16 Term    7076 -    7206    7078 -    7206    0.87    131
    2 +       PolA    7634 -                              0.24
    3 -       PolA    7743 -                              0.85
    3 -     1 Sngl    7826 -    8266    7826 -    8266    0.52    441

Predicted protein(s):
>BGF: Gene:1 Exon(s):2 AA:128 Chain- H+T-
MADYHFVYKDVEGASTEWDDIQRRLGNLPPKPEPFKPPAYAPKVDADEQPKSKEWLDEREPDELEDLEDDLDDDRFLEQYRRMRLAELREAAKAAKFGSIVPITGSDFVREVSQAPSDVWVVVFLYKD
>BGF: Gene:2 Exon(s):16 AA:647 Chain+ H+T+
MTDGHLFNNILLGGRAGSNPGQFKVYSGGLAWKRQGGGKTIEIEKSDLTSVTWMKVPRAYQLGVRTKDGLFYKFIGFREQDVSSLTNFMQKNMGLSPDEKQLSVSGQNWGGIDINVTLSIVGNMLTFMVGSKQAFEVSLADVSQTQMQGKTDVLLEFHVDDTTGGNEKDSLMDLSFHVPTSNTQFLGDENRTAAQVLWETIMGVADVDSSEEAVVTFEGIAILTPRGRYSVELHLSFLRLQGQANDFKIQYSSIVRLFLLPKSNNPHTFVVVTLDPPIRKGQTLYPHIVIQFETEAVVERNLALTKEVLAEKYKDRLEESYKGLIHEVFTKVLRGLSGAKVTRPGSFRSCQDGYAVKSSLKAEDGLLYPLEKGFFFLPKPPTLILHEEIEFVEFERHGAGGASISSHYFDLLVKLKNDQEHLFRNIQRSEYHNLFNFINGKHLKIMNLGDGQGATGGVTAVLRDTDDDAVDPHLERIKNQAGDEESDEEDEDFVADKDDSGSPTDDSGGEDSDASESGGEKEKLSKKEASSSKPPVKRKPKGRDEEGSDKRKPKKKKDPNAPKRAMTPFMYFSMAERGNMKNNNPDLPTTEIAKKLGEMWQKMTGEEKQPYIQQSQVDKKRYEKESAVYRGAAAMDVDSGSGGNESD
>BGF: Gene:3 Exon(s):1 AA:146 Chain- H+T+
MEHIPPWTLPPAHRSREVEDEADRDDGEAAVRGAEGRRPQIEEAVVDVRAPPGTTPTPTPARKRTAAASPLGATPAPAPERKGMSAASLPGATPTPTSATERKGTTAASPRGTQSTTPARKGLAVASPPGKPLPTPRRKRNFVAGD
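The result table above can be read back into a script for further processing. The following is a minimal sketch, assuming the whitespace-separated column layout of the example output; the names Feature and parse_bgf_table are hypothetical and not part of BGF. Exon rows carry all ten columns, while rows such as Prom and PolA carry only a position and a probability.

```python
import re
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    """One row of the result table (hypothetical container, not part of BGF)."""
    gene: int
    strand: str
    exon: Optional[int]
    kind: str
    start: int
    end: Optional[int]
    orf_start: Optional[int]
    orf_end: Optional[int]
    prob: float
    length: Optional[int]


# Exon rows: Gene# S Exon# Type Start - End ORF_S - ORF_E Prob Len
EXON_ROW = re.compile(
    r"^\s*(\d+)\s+([+-])\s+(\d+)\s+(Init|Intr|Term|Sngl)"
    r"\s+(\d+)\s+-\s+(\d+)\s+(\d+)\s+-\s+(\d+)\s+(\d+\.\d+)\s+(\d+)\s*$"
)
# Signal rows as they appear in the example: Gene# S Type Position - Prob
SIGNAL_ROW = re.compile(
    r"^\s*(\d+)\s+([+-])\s+(Prom|PolA)\s+(\d+)\s+-\s+(\d+\.\d+)\s*$"
)


def parse_bgf_table(text):
    """Collect table rows; header, separator and protein lines are skipped."""
    features = []
    for line in text.splitlines():
        m = EXON_ROW.match(line)
        if m:
            gene, strand, exon, kind, start, end, orf_s, orf_e, prob, length = m.groups()
            features.append(Feature(int(gene), strand, int(exon), kind,
                                    int(start), int(end), int(orf_s), int(orf_e),
                                    float(prob), int(length)))
            continue
        m = SIGNAL_ROW.match(line)
        if m:
            gene, strand, kind, pos, prob = m.groups()
            features.append(Feature(int(gene), strand, None, kind,
                                    int(pos), None, None, None, float(prob), None))
    return features


if __name__ == "__main__":
    sample = (
        "    1 -     2 Init     504 -     745     506 -     745    0.84    242\n"
        "    1 -       Prom     890 -                              0.05\n"
    )
    for feature in parse_bgf_table(sample):
        print(feature)
```

For exon rows in the example, Len equals End - Start + 1 (for instance 745 - 504 + 1 = 242), which gives a simple consistency check on the parsed values.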
Authors : Jin-song Liu, Zhao Xu
Tutors : Bai-lin Hao, Hui-min Xie, Wei-mou Zheng, Guo-ying Li, Jun Wang
Partners: Lin Fang, Jiao Jin, Lei Gao, Heng Li, Hai-hong Li, Yan Li, Zi-xing Xing, Qi-zhai Li, Shao-gen Gao