= !BioJava =
Tested with BioJava-live SVN revision 4723.
== Fasta Format ==
=== Major Issues ===
None.
=== Minor Issues ===
Sequence case is not preserved. Output line length may differ from the input (the default is 80 characters per line).
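A minimal sketch of a FASTA writer that copies the residues unchanged, so upper and lower case survive, and wraps at a configurable width (80 by default, as noted above). `FastaSketch` and `format` are hypothetical names, not BioJava classes; the sketch only illustrates the two output choices involved.

```java
/** Hypothetical FASTA formatting helper (not part of BioJava). */
final class FastaSketch {

    /** Default line width noted above: 80 characters per line. */
    static final int DEFAULT_WIDTH = 80;

    private FastaSketch() { }

    /** Formats one record using the default width. */
    static String format(String header, String sequence) {
        return format(header, sequence, DEFAULT_WIDTH);
    }

    /**
     * Formats one FASTA record. The residues are copied as given, so their
     * case is preserved; the sequence is wrapped every {@code width} characters.
     */
    static String format(String header, String sequence, int width) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        StringBuilder out = new StringBuilder();
        out.append('>').append(header).append('\n');
        for (int i = 0; i < sequence.length(); i += width) {
            int end = Math.min(i + width, sequence.length());
            out.append(sequence, i, end).append('\n');
        }
        return out.toString();
    }
}
```

For example, `FastaSketch.format("seq1", "ACGTacgtNNACGT", 5)` returns `">seq1\nACGTa\ncgtNN\nACGT\n"`, keeping the lower-case run intact.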
=== Read-Write-Read ===
Succeeds!
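Read-Write-Read presumably means: parse a record, write it out in the same format, then parse that output again. A minimal, format-agnostic sketch of such a check follows; `ReadWriteRead` and `roundTrip` are hypothetical names, and the reader and writer functions stand in for a real format parser and writer (they are not BioJava APIs). This version compares the two parsed objects with `equals`, so cosmetic differences in the written text, such as line width, do not count as failures.

```java
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical round-trip check: read, write, read again, compare. */
final class ReadWriteRead {

    private ReadWriteRead() { }

    static <T> boolean roundTrip(String input,
                                 Function<String, T> reader,
                                 Function<T, String> writer) {
        T first = reader.apply(input);          // read
        String written = writer.apply(first);   // write
        T second;
        try {
            second = reader.apply(written);     // read the output back in
        } catch (RuntimeException e) {
            return false;                       // written output could not be parsed
        }
        return Objects.equals(first, second);   // compare parsed objects, not text
    }
}
```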

== Genbank format ==
=== Major Issues ===
None.
=== Minor Issues ===
* Feature qualifier order is not preserved.
* Because the NCBI Taxonomy is looked up from memory or from the database, minor differences appear if the taxonomy version in use doesn't match the version that was used to construct the record. For example, the common name of Arabidopsis changed from "thale cress" to "mouse-ear cress".
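The page does not say how BioJava stores feature qualifiers internally. As an illustration only, keeping qualifiers as an ordered list of (name, value) pairs, rather than in a hash-based map, lets a writer emit them in the order they were read and also keeps repeated qualifiers such as several `/db_xref` entries. `QualifierOrder` is a hypothetical helper, and it assumes one qualifier per line (multi-line qualifier values are not handled).

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical order-preserving qualifier store (not BioJava's implementation). */
final class QualifierOrder {

    private QualifierOrder() { }

    /** Reads "/name=value" or "/name" lines into (name, value) pairs, in input order. */
    static List<Map.Entry<String, String>> parse(List<String> lines) {
        List<Map.Entry<String, String>> qualifiers = new ArrayList<>();
        for (String line : lines) {
            String q = line.trim();
            if (!q.startsWith("/")) {
                continue; // not a qualifier line
            }
            int eq = q.indexOf('=');
            String name = eq < 0 ? q.substring(1) : q.substring(1, eq);
            // Flag qualifiers such as /pseudo have no value; store null.
            String value = eq < 0 ? null : q.substring(eq + 1);
            qualifiers.add(new AbstractMap.SimpleEntry<>(name, value));
        }
        return qualifiers;
    }

    /** Writes the pairs back out in exactly the order they were read. */
    static List<String> write(List<Map.Entry<String, String>> qualifiers) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : qualifiers) {
            out.add(e.getValue() == null
                    ? "/" + e.getKey()
                    : "/" + e.getKey() + "=" + e.getValue());
        }
        return out;
    }
}
```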
=== Read-Write-Read ===
Success!

== GenbankXML format ==
=== Major Issues ===
Not supported (INSD is).

== INSD Format ==
=== Major Issues ===
None.
=== Minor Issues ===
* BioJava always includes the Reference_position tag. NCBI only includes it when the position is not 1..1:
{{{
<INSDReference_position>1..1</INSDReference_position>
}}}
* There are other examples of this redundancy. I think that as long as this doesn't break the DTD, it doesn't matter.
* Qualifier order is not preserved. I don't think this matters.
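A minimal sketch of the NCBI behaviour described above: write `INSDReference_position` only when the range is not 1..1. `InsdReferencePosition.positionElement` is a hypothetical helper, not a BioJava method.

```java
/** Hypothetical writer helper mirroring NCBI's handling of INSDReference_position. */
final class InsdReferencePosition {

    private InsdReferencePosition() { }

    /**
     * Returns the INSDReference_position element for a reference range,
     * or an empty string for 1..1, which NCBI leaves out.
     */
    static String positionElement(int start, int end) {
        if (start == 1 && end == 1) {
            return "";
        }
        return "<INSDReference_position>" + start + ".." + end
                + "</INSDReference_position>";
    }
}
```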
=== Read-Write-Read ===
Success!

== EMBL Format ==
=== Major Issues ===
None.
=== Minor Issues ===
* The version number in the date (DT) line is not correct on output.
* A doubled XX line is written after the references.
* Feature qualifier order is not preserved.
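A sketch of two of these fixes, assuming the version issue refers to the entry version in the "Last updated" DT line, which in EMBL flat files has the form `DT   dd-MMM-yyyy (Rel. N, Last updated, Version V)`. `EmblSketch` and its methods are hypothetical: `lastUpdatedLine` formats that line from a version number that should be copied through from the entry being written, and `collapseSpacers` drops an `XX` line that directly follows another `XX` line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical EMBL output helpers (not BioJava code). */
final class EmblSketch {

    private EmblSketch() { }

    /** Formats the "Last updated" DT line; version must be the entry's own version. */
    static String lastUpdatedLine(String date, int release, int version) {
        // Locale.ROOT keeps the digits ASCII regardless of the JVM default locale.
        return String.format(Locale.ROOT,
                "DT   %s (Rel. %d, Last updated, Version %d)", date, release, version);
    }

    /** Removes any XX spacer line that immediately follows another XX line. */
    static List<String> collapseSpacers(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            boolean spacer = line.equals("XX");
            boolean previousSpacer = !out.isEmpty() && out.get(out.size() - 1).equals("XX");
            if (spacer && previousSpacer) {
                continue; // skip the duplicate spacer
            }
            out.add(line);
        }
        return out;
    }
}
```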
=== Read-Write-Read ===
Succeeds!

== SwissProt/UniProt ==
=== Major Issues ===
=== Minor Issues ===
* BioSQL cannot store more than one database reference for a single publication, e.g. PubMed ID, MEDLINE ID and DOI.
* We are putting 'and' before the last author.
* We lose the copyright statement.
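UniProt RA lines separate authors with commas and end the list with a semicolon, with no 'and' before the last name. A minimal sketch of that formatting, including wrapping at the 75-character line limit of UniProt flat files; `UniProtAuthors.raLines` is a hypothetical helper, not a BioJava method, and it does not split an author name that is itself longer than a line.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical RA line formatter (not part of BioJava). */
final class UniProtAuthors {

    /** UniProt flat-file lines are at most 75 characters long. */
    static final int MAX_WIDTH = 75;
    private static final String PREFIX = "RA   ";

    private UniProtAuthors() { }

    /**
     * Formats authors as RA lines: names separated by ", ", the list ended by ";",
     * no "and" before the last author, wrapped onto new RA lines at maxWidth.
     */
    static List<String> raLines(List<String> authors, int maxWidth) {
        List<String> lines = new ArrayList<>();
        if (authors.isEmpty()) {
            return lines; // nothing to write
        }
        StringBuilder current = new StringBuilder(PREFIX);
        for (int i = 0; i < authors.size(); i++) {
            boolean last = i == authors.size() - 1;
            String token = authors.get(i) + (last ? ";" : ",");
            boolean lineHasAuthor = current.length() > PREFIX.length();
            int needed = (lineHasAuthor ? 1 : 0) + token.length();
            if (lineHasAuthor && current.length() + needed > maxWidth) {
                lines.add(current.toString()); // start a continuation RA line
                current = new StringBuilder(PREFIX);
                lineHasAuthor = false;
            }
            if (lineHasAuthor) {
                current.append(' ');
            }
            current.append(token);
        }
        lines.add(current.toString());
        return lines;
    }
}
```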
=== Read-Write-Read ===
Fails: the written file cannot be read back in:

{{{
Format_object=org.biojavax.bio.seq.io.UniProtFormat
Accession=Q43385
Id=
Comments=
Parse_block=OS Arabidopsis thaliana (Mouse-ear cress) (Arabidopsis thaliana (L.) Heynh.).OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;OS Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons; rosids;
eurosids II; Brassicales; Brassicaceae; Arabidopsis.OX NCBI_TaxID=3702;
Stack trace follows ....


at org.biojavax.bio.seq.io.UniProtFormat.readRichSequence(UniProtFormat.java:615)
at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
... 3 more
Caused by: java.lang.IllegalArgumentException: NCBI taxonomy names cannot embed new lines - at:74, in name: <Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons; rosids;
eurosids II; Brassicales; Brassicaceae; Arabidopsis>
at org.biojavax.bio.taxa.SimpleNCBITaxonName.<init>(SimpleNCBITaxonName.java:47)
at org.biojavax.bio.taxa.SimpleNCBITaxon.addName(SimpleNCBITaxon.java:148)
at org.biojavax.bio.seq.io.UniProtFormat.readRichSequence(UniProtFormat.java:339)
... 4 more
}}}
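The exception says a taxon name ended up containing a newline, i.e. an organism classification spanning several lines was not joined correctly; the log alone does not show whether the writer or the reader is at fault. As an illustration only, a reader could join OC continuation lines with a space before splitting on `;`, as in this sketch. `OrganismClassification.parseOc` is a hypothetical helper, not BioJava's parser.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical OC (organism classification) line parser. */
final class OrganismClassification {

    private OrganismClassification() { }

    /** Joins OC lines with spaces (never newlines) and splits the result into taxon names. */
    static List<String> parseOc(List<String> lines) {
        StringBuilder joined = new StringBuilder();
        for (String line : lines) {
            if (!line.startsWith("OC")) {
                continue; // only classification lines
            }
            // Drop the two-letter line code and its padding, then join with a space.
            joined.append(' ').append(line.substring(2).trim());
        }
        String text = joined.toString().trim();
        if (text.endsWith(".")) {
            text = text.substring(0, text.length() - 1); // classification ends with '.'
        }
        List<String> names = new ArrayList<>();
        for (String part : text.split(";")) {
            String name = part.trim().replaceAll("\\s+", " ");
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }
}
```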

== UniProtXML ==
=== Major Issues ===
* The namespace and the version are missing from the entry tag.
* The editor list is written even when there are no editors; it should be omitted.
* The reference tag is not correctly constructed.
* When writing UniProt XML, it is illegal to use anything other than Swiss-Prot or TrEMBL as the namespace.
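A sketch of the entry-tag and editor-list points above, assuming BioJava's namespace is what UniProt XML records in the `dataset` attribute of `<entry>` (alongside `created`, `modified` and `version`), and that editors are written as `<person name="..."/>` elements inside `<editorList>`. `UniProtXmlSketch` is hypothetical; it rejects any dataset other than Swiss-Prot or TrEMBL and writes nothing when there are no editors.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical UniProt XML writer fragments (not BioJava code). */
final class UniProtXmlSketch {

    /** The only datasets allowed when writing UniProt XML. */
    static final Set<String> DATASETS = Set.of("Swiss-Prot", "TrEMBL");

    private UniProtXmlSketch() { }

    /** Builds the entry start tag, including the dataset (namespace) and version. */
    static String entryStartTag(String dataset, String created, String modified, int version) {
        if (!DATASETS.contains(dataset)) {
            throw new IllegalArgumentException(
                    "UniProt XML dataset must be Swiss-Prot or TrEMBL, got: " + dataset);
        }
        return "<entry dataset=\"" + dataset + "\" created=\"" + created
                + "\" modified=\"" + modified + "\" version=\"" + version + "\">";
    }

    /** Returns an editorList element, or an empty string when there are no editors. */
    static String editorList(List<String> editors) {
        if (editors.isEmpty()) {
            return ""; // omit the element entirely
        }
        StringBuilder xml = new StringBuilder("<editorList>");
        for (String name : editors) {
            xml.append("<person name=\"").append(escape(name)).append("\"/>");
        }
        return xml.append("</editorList>").toString();
    }

    /** Minimal XML attribute escaping. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```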
=== Minor Issues ===
=== Read-Write-Read ===

== EMBLxml ==
=== Major Issues ===
None.
=== Minor Issues ===
=== Read-Write-Read ===
Success!