Changes between Initial Version and Version 1 of BioPerlRoundTripSecondPass

2008/02/13 11:51:47 (17 years ago)



  • BioPerlRoundTripSecondPass

     1= !BioPerl Round Trip, Second Pass = 
     3Looking into problems identified in the first pass. Solving major problems when possible. 
     4The tag '''MAJOR''' marks a problem that should be solved if possible. '''minor''' comments are for logging only. 
     6Based on bioperl-live SVN revision 14501. 
     8!BioPerl does not have parsers for these formats: 
     10 * ASN.1 
     11 * genbank XML 
     12 * INSD XML 
     14== fasta == 
     16 * minor: the length of the sequence line can vary (settable using the method Bio::SeqIO::fasta::width()) 
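Because the line width is only a formatting choice, a round-trip comparison of fasta files can ignore it by comparing the concatenated sequence instead of the raw lines. The sketch below is a Python illustration, not !BioPerl code: `wrap_sequence` and `same_sequence` are hypothetical helper names, and the default width of 60 is a placeholder value.

```python
def wrap_sequence(seq, width=60):
    """Split a sequence string into lines of at most `width` characters."""
    if width < 1:
        raise ValueError("width must be positive")
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def same_sequence(lines_a, lines_b):
    """Compare two wrapped sequences while ignoring where the lines break."""
    return "".join(lines_a) == "".join(lines_b)


seq = "ACGTACGTAC"
print(wrap_sequence(seq, 4))
print(same_sequence(wrap_sequence(seq, 4), wrap_sequence(seq, 6)))
```

With a check like this, two files that differ only in line width compare as equal.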
     18== embl == 
     20 * ~~MAJOR: sequence name and accession lost in conversion~~ 
     21   * The downloaded sequence file was mysteriously mangled. New file uploaded. Parser works. 
     22   * '''Note:''' EMBL format does not have a separate name any more. The primary accession number is now the name on the ID line. 
     23 * MAJOR: OX line for !TaxId is lost 
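     24 * minor: only the date itself on the DT (date) line is kept; the release and version information in parentheses is dropped (original lines first, then the !BioPerl output): 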
     26DT   27-FEB-1998 (Rel. 54, Created) 
     27DT   14-NOV-2006 (Rel. 89, Last updated, Version 6) 
     29DT   27-FEB-1998 
     30DT   14-NOV-2006 
     32 * minor: The RC (Reference Comment) lines in the Reference section are ignored. 
     34RC   revised by [4] 
     36 * minor: word wrapping differences in free text lines, especially in author lists 
     37 * minor: the feature key/value pairs (FT) are not returned in order 
     38 * minor: SQ line does not contain CRC32 value 
     39   * note: there is a method for CRC64 in Bio::SeqIO::swiss::_crc64 
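The DT difference listed above is a loss of annotation rather than a change to the date, so a comparison script could classify it as minor by reducing both lines to the date before comparing. A minimal Python sketch, not part of !BioPerl: `dt_date` is a hypothetical helper, and the pattern assumes dates formatted like the DT examples above.

```python
import re

# "DT", some spacing, then a date such as 27-FEB-1998 at the start of the value.
DT_DATE = re.compile(r"^DT\s+(\d{2}-[A-Z]{3}-\d{4})")


def dt_date(line):
    """Return the date from a DT line, or None if the line is not a DT line."""
    match = DT_DATE.match(line)
    return match.group(1) if match else None


original = "DT   27-FEB-1998 (Rel. 54, Created)"
round_tripped = "DT   27-FEB-1998"
print(dt_date(original) == dt_date(round_tripped))
```

Lines that still differ after this reduction would point to a real change in the date.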
     42== genbank == 
     44 * MAJOR: a full stop is added to the end of the SOURCE line (following an old genbank convention?) 
     45 * minor: the BASE line is absent from recent genbank files but is still generated by bioperl 
     46 * minor: features are not returned in order 
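Feature order is reported as a minor difference for both embl and genbank. One way to confirm that nothing was lost, only reordered, is to compare the features as multisets. The Python sketch below is an illustration only: `same_features` is a hypothetical helper and the (key, location) pairs are made-up data, not taken from a real record.

```python
from collections import Counter


def same_features(features_a, features_b):
    """True if both lists hold the same features, regardless of order."""
    return Counter(features_a) == Counter(features_b)


# Hypothetical (key, location) pairs, not taken from a real record.
original = [("source", "1..100"), ("gene", "10..50"), ("CDS", "10..50")]
round_tripped = [("gene", "10..50"), ("source", "1..100"), ("CDS", "10..50")]
print(same_features(original, round_tripped))
print(original == round_tripped)
```

Using a `Counter` rather than a set keeps duplicate features from being silently merged.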
     48== swiss-prot == 
     50 * minor: No full stop at the end of the DT lines 
     51 * MAJOR: the GN line returns only the values from the key/value pairs, e.g. (original line first, then the !BioPerl output): 
     53GN   Name=DOF3.7; Synonyms=BBFA, DAG1;...    
     55GN   DOF3.7 OR BBFA OR DAG1 ... 
     57 * minor: OC line word wrapping differences 
     58 * minor: extra spaces at the end of the first RT line when there is more than one RT line 
     59 * MAJOR: RX line: the DOI key/value pair is lost 
     60 * MAJOR: PE (evidence) line returned between CC and DR lines when it should be between DR and KW lines 
     61 * minor: extra space after first FT line 
     62 * minor: FTid sometimes not written on its own line 
     63 * minor: extra space written to the end of the sequence line 
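The GN problem above comes down to the difference between keeping the key/value structure and flattening it. The Python sketch below illustrates that difference using only the visible part of the example line. It is not the !BioPerl parser: `parse_gn` and `flatten_gn` are hypothetical names, and the parsing rules (pairs separated by `;`, values separated by `,`) are assumptions based on that example.

```python
def parse_gn(line):
    """Parse a key/value style GN line into a dict of key -> list of values."""
    body = line[2:].strip() if line.startswith("GN") else line.strip()
    fields = {}
    for part in body.split(";"):
        part = part.strip()
        if "=" not in part:
            continue  # skip empty pieces and anything that is not key=value
        key, value = part.split("=", 1)
        fields[key.strip()] = [v.strip() for v in value.split(",") if v.strip()]
    return fields


def flatten_gn(fields):
    """Join all values with ' OR ', discarding the keys (the lossy form)."""
    return " OR ".join(v for values in fields.values() for v in values)


fields = parse_gn("GN   Name=DOF3.7; Synonyms=BBFA, DAG1;")
print(fields)
print(flatten_gn(fields))
```

Once the keys are discarded, the flattened form can no longer tell the name apart from its synonyms, which is why this item is tagged MAJOR.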