Changes between Initial Version and Version 1 of BioPerlRoundTripSecondPass

Timestamp: 2008/02/13 11:51:47
Author: heikki

= !BioPerl Round Trip, Second Pass =

This pass looks into the problems identified in the first pass and solves the major ones where possible.
The tag '''MAJOR''' marks a problem that should be solved if possible; '''minor''' comments are for logging only.

Based on bioperl-live SVN revision 14501.

!BioPerl does not have parsers for these formats:

 * ASN.1
 * !GenBank XML
 * INSD XML

== fasta ==

 * minor: the length of the sequence line can vary (settable with the method Bio::SeqIO::fasta::width())
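
The point above is a formatting difference only: the same residues wrapped at a different column width. A minimal sketch of that idea follows, in plain Python rather than !BioPerl; `format_fasta` and `read_sequence` are hypothetical helper names for illustration and do not correspond to any !BioPerl method.

```python
def format_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Return one FASTA record with sequence lines wrapped at `width` columns.

    The default of 60 is an assumption made for this sketch.
    """
    if width < 1:
        raise ValueError("width must be positive")
    lines = [f">{header}"]
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


def read_sequence(fasta_text: str) -> str:
    """Join the sequence lines of a single-record FASTA string, ignoring wrapping."""
    return "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if not line.startswith(">")
    )
```

Two records written with different widths differ as text but yield the same string from `read_sequence`, which is why this item is logged as minor.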

== embl ==

 * ~~MAJOR: sequence name and accession lost in conversion~~
   * The downloaded sequence file was mysteriously mangled. A new file was uploaded and the parser works.
   * '''Note:''' EMBL format does not have a separate name any more. The primary accession number is now the name on the ID line.
 * MAJOR: OX line for !TaxId is lost
 * minor: only the date itself on the DT (date) line is kept; the release information in parentheses is dropped
{{{
DT   27-FEB-1998 (Rel. 54, Created)
DT   14-NOV-2006 (Rel. 89, Last updated, Version 6)
->
DT   27-FEB-1998
DT   14-NOV-2006
}}}
 * minor: The RC (Reference Comment) lines in the Reference section are ignored.
{{{
RC   revised by [4]
}}}
 * minor: Word wrapping differences in free-text lines, especially in author lists
 * minor: the feature key/value pairs (FT) are not returned in order
 * minor: SQ line does not contain CRC32 value
   * note: there is a method for CRC64 in Bio::SeqIO::swiss::_crc64
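
To make the DT item above concrete, here is a minimal Python sketch of what a lossless round trip of a DT line would have to keep. It is an illustration only, not the !BioPerl parser: the regular expression covers just the two line shapes shown in the example above, and `parse_dt_line` and `format_dt_line` are hypothetical names.

```python
import re

# Matches "DT   27-FEB-1998" with an optional "(...)" annotation after the date.
DT_PATTERN = re.compile(
    r"^DT\s+(?P<date>\d{2}-[A-Z]{3}-\d{4})(?:\s+\((?P<note>.*)\))?\s*$"
)


def parse_dt_line(line: str):
    """Split a DT line into (date, note); note is None when no annotation is present."""
    match = DT_PATTERN.match(line)
    if match is None:
        raise ValueError(f"not a recognised DT line: {line!r}")
    return match.group("date"), match.group("note")


def format_dt_line(date: str, note=None) -> str:
    """Write a DT line, restoring the parenthesised annotation if one was kept."""
    if note:
        return f"DT   {date} ({note})"
    return f"DT   {date}"
```

Keeping the second element of the tuple is what separates the input lines from the truncated output lines in the example: if only the date is stored, `format_dt_line` can reproduce nothing more than `DT   27-FEB-1998`.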

== genbank ==

 * MAJOR: SOURCE line adds full stop to the end of the line (following old !GenBank convention?)
 * minor: the BASE line is not present in recent !GenBank files but is still generated by !BioPerl
 * minor: features are not returned in order

== swiss-prot ==

 * minor: No full stop at the end of the DT lines
 * MAJOR: GN line returns only the values from the key/value pairs, e.g.
{{{
GN   Name=DOF3.7; Synonyms=BBFA, DAG1;...
->
GN   DOF3.7 OR BBFA OR DAG1 ...
}}}
 * minor: OC line word wrapping differences
 * minor: extra spaces at the end of the first RT line when there is more than one of them
 * MAJOR: RX line: DOI key/value pair lost
 * MAJOR: PE (evidence) line returned between CC and DR lines when it should be between DR and KW lines
 * minor: extra space after first FT line
 * minor: FTid sometimes not written on its own line
 * minor: extra space written to the end of the sequence line
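
The GN problem listed above comes down to discarding the keys of the `key=value;` pairs. The following Python sketch shows one way the pairs could be kept so that the line can be written back unchanged. It is not !BioPerl code: `parse_gn_line` and `format_gn_line` are hypothetical names, and the sketch assumes a single, complete GN line without continuation lines or evidence tags.

```python
def parse_gn_line(line: str) -> dict:
    """Parse 'GN   Key=v1, v2; Key2=v3;' into an ordered mapping of key -> list of values."""
    if not line.startswith("GN"):
        raise ValueError(f"not a GN line: {line!r}")
    fields = {}
    for part in line[2:].strip().split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after the final ';'
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {part!r}")
        fields[key.strip()] = [v.strip() for v in value.split(",")]
    return fields


def format_gn_line(fields: dict) -> str:
    """Write the mapping back as a GN line, keeping keys and their original order."""
    parts = [f"{key}={', '.join(values)};" for key, values in fields.items()]
    return "GN   " + " ".join(parts)
```

Because the mapping keeps both keys and values in input order, writing it back gives `Name=...; Synonyms=...;` rather than the flattened `... OR ...` form.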