Version 2 (modified by markjschreiber, 10 years ago)

--

BioJava

Working with BioJava-live SVN version 4721.

Fasta Format

Major Issues

None

Minor Issues

Sequence case is not preserved. Line length varies (default is 80 cpl). ===Read-Write-Read=== Succeeds!

Genbank format

Major Issues

None.

Minor Issues

Feature qualifier order is not preserved. Because NCBI Taxonomy is referenced from memory or database if the version used doesn't match the version that was used to construct the record then minor differences appear. For example the common name of Arabidopsis changed from thale cress to mouse ear cress. ===Read-Write-Read===

GenbankXML format

Major Issues

Not supported (INSD is).

INSD Format

Major Issues

None.

Minor Issues

* BioJava inlcudes the Reference_position tag. NCBI doesn't unless it is not 1..1

<INSDReference_position>1..1</INSDReference_position>

* There are other examples of this redundancy. I think if this doesn't break the dtd then it doesn't matter. * Qualifiers order is not preserved. I don't think this matters. ===Read-Write-Read===

EMBL Format

Major Issues

No major issues

Minor Issues

  • Version in date is not correct on output.
  • Two XX lines after references.
  • Feature qualifiers out of order.

===Read-Write-Read=== Succeeds!

SwissProt/ Uniprot

Major Issues

Minor Issues

  • BioSQL cannot store more than one database reference for a single publication,

eg Pubmed and medline Id and DOI.

  • We are putting 'and' before the last author.
  • We loose the copyright statement.

===Read-Write-Read===

UniprotXML

Major Issues

  • Missing the namespace and the version from entry tag
  • Don't write editor list if there are no editors
  • Reference tag not correctly constructed.
  • When writing Uniprot XML it would be illegal to use anything other than

Swiss-Prot or TrEMBL as the Namespace

Minor Issues

===Read-Write-Read===

EMBLxml

Major Issues

Can read but produced XML doesn't validate against the xsd

Minor Issues

===Read-Write-Read===