= !BioPerl Round Trip, Second Pass =

This pass looks into the problems identified in the first pass and solves the major ones where possible.
The tag '''MAJOR''' marks a problem that should be solved if possible; '''minor''' comments are logged for the record only.

Based on bioperl-live SVN revision 14501.

!BioPerl does not have parsers for these formats:

* ASN.1
* genbank XML
* INSD XML

== fasta ==

* minor: the length of the sequence lines can vary (settable with the method Bio::SeqIO::fasta::width())
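
Because line width is only a formatting choice, a round-trip comparison can normalise it before diffing the input against the output. The following is a minimal sketch in Python, not !BioPerl code; the function name `rewrap_fasta` and the default width of 60 are assumptions made for illustration.

```python
def rewrap_fasta(text, width=60):
    """Re-wrap every sequence in a FASTA string to a fixed line width.

    Hypothetical helper for comparing two FASTA files that differ
    only in how the sequence lines are wrapped.
    """
    if width < 1:
        raise ValueError("width must be positive")
    out = []
    residues = []  # sequence chunks of the record currently being read

    def flush():
        # Join the collected chunks and emit them in width-sized lines.
        seq = "".join(residues)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
        residues.clear()

    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            flush()            # finish the previous record, if any
            out.append(line)   # header lines are kept unchanged
        elif line:
            residues.append(line)
    flush()
    return "\n".join(out) + "\n"


# Two files with the same sequence but different wrapping compare equal.
a = ">seq1 demo\nACGTACGTAC\nGT\n"
b = ">seq1 demo\nACGT\nACGT\nACGT\n"
assert rewrap_fasta(a, 5) == rewrap_fasta(b, 5)
```

Normalising both sides to the same width keeps this minor difference out of the diff, so that it does not hide the MAJOR ones.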

== embl ==

* ~~MAJOR: sequence name and accession lost in conversion~~
  * The downloaded sequence file was mangled for an unknown reason. A new file was uploaded, and the parser works.
  * '''Note:''' The EMBL format no longer has a separate name. The primary accession number is now the name on the ID line.
* MAJOR: the OX line carrying the !TaxId is lost
* minor: only the date itself is kept on the DT (date) lines; the release and version details are dropped
{{{
DT 27-FEB-1998 (Rel. 54, Created)
DT 14-NOV-2006 (Rel. 89, Last updated, Version 6)
->
DT 27-FEB-1998
DT 14-NOV-2006
}}}
* minor: the RC (Reference Comment) lines in the Reference section are ignored
{{{
RC revised by [4]
}}}
* minor: word-wrapping differences in free-text lines, especially in author lists
* minor: the feature key/value pairs (FT) are not returned in their original order
* minor: the SQ line does not contain the CRC32 value
  * note: there is a method for CRC64 in Bio::SeqIO::swiss::_crc64
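
The DT issue above comes down to keeping the parenthesised details alongside the date. The following is a minimal sketch in Python of a parser and writer that round-trip both parts. It is not the !BioPerl implementation; the names `parse_dt` and `format_dt` and the regular expression are assumptions for illustration.

```python
import re

# Date, then an optional parenthesised detail string such as
# "(Rel. 54, Created)". The pattern is an illustrative assumption.
DT_RE = re.compile(r"^DT\s+(\d{2}-[A-Z]{3}-\d{4})(?:\s+\((.*)\))?\s*$")


def parse_dt(line):
    """Split an EMBL DT line into (date, details); details may be None."""
    m = DT_RE.match(line)
    if m is None:
        raise ValueError("not a DT line: %r" % line)
    return m.group(1), m.group(2)


def format_dt(date, details=None):
    """Write a DT line, restoring the details when they were kept."""
    line = "DT   " + date
    if details:
        line += " (" + details + ")"
    return line


original = "DT   14-NOV-2006 (Rel. 89, Last updated, Version 6)"
date, details = parse_dt(original)
assert format_dt(date, details) == original
```

Storing the details as one opaque string is the simplest way to make the line survive a round trip; splitting out release and version numbers would be a separate design decision.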

== genbank ==

* MAJOR: a full stop is added to the end of the SOURCE line (following an old genbank convention?)
* minor: the BASE line is not present in recent genbank files but is still generated by !BioPerl
* minor: features are not returned in their original order

== swiss-prot ==

* minor: no full stop at the end of the DT lines
* MAJOR: the GN line returns only the values from the key/value pairs, e.g.:
{{{
GN Name=DOF3.7; Synonyms=BBFA, DAG1;...
->
GN DOF3.7 OR BBFA OR DAG1 ...
}}}
* minor: OC line word-wrapping differences
* minor: extra spaces at the end of the first RT line when there is more than one RT line
* MAJOR: RX line: the DOI key/value pair is lost
* MAJOR: the PE (evidence) line is written between the CC and DR lines when it should be between the DR and KW lines
* minor: extra space after the first FT line
* minor: FTid sometimes not written on its own line
* minor: extra space written to the end of the sequence line
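
For the GN problem listed above, the keys have to be stored with their values so that the writer can reproduce the key/value form. The following is a minimal sketch in Python, not !BioPerl code. The names `parse_gn` and `format_gn` are hypothetical, and the sketch assumes a single GN line made only of simple `Key=value, value;` groups.

```python
def parse_gn(line):
    """Parse a Swiss-Prot GN line into an ordered {key: [values]} dict.

    Assumes simple 'Key=v1, v2;' groups on one line (illustrative only).
    """
    body = line[2:] if line.startswith("GN") else line
    fields = {}
    for part in body.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition("=")
        fields[key.strip()] = [v.strip() for v in value.split(",")]
    return fields


def format_gn(fields):
    """Write the dict back in key/value form, keeping the key order."""
    parts = ["%s=%s;" % (key, ", ".join(values))
             for key, values in fields.items()]
    return "GN   " + " ".join(parts)


line = "GN   Name=DOF3.7; Synonyms=BBFA, DAG1;"
genes = parse_gn(line)
assert genes == {"Name": ["DOF3.7"], "Synonyms": ["BBFA", "DAG1"]}
assert format_gn(genes) == line
```

Keeping the keys is what distinguishes a name from its synonyms; the flattened `DOF3.7 OR BBFA OR DAG1` form cannot be converted back without guessing.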