`-e -j -f` Export files

(optional)

The import and export process generates a set of files that are consistent across different assemblies and are therefore useful in comparative analyses, for conversion to BLAST databases, etc. The exported .json files can be used to visualise assembly and annotation statistics.

docker run --rm \
           --name easy-import-operophtera_brumata_v1_core_32_85_1 \
           --link genomehubs-mysql \
           -v ~/demo/genomehubs-import/import/conf:/import/conf \
           -v ~/demo/genomehubs-import/import/data:/import/data \
           -v ~/demo/genomehubs-import/download/data:/import/download \
           -v ~/demo/genomehubs-import/blast/data:/import/blast \
           -e DATABASE=operophtera_brumata_v1_core_32_85_1 \
           -e FLAGS="-e" \
           genomehubs/easy-import:latest

export_json.pl generates three .json files:

  • <assembly_name>.meta.json contains basic metadata for the assembly and basic summary statistics including assembly span and number of gene models.
  • <assembly_name>.assembly-stats.json contains an assembly summary in the format used by github.com/rjchallis/assembly_stats to produce a number of individual and comparative views of several assembly statistics.
  • <assembly_name>.codon-usage.json contains a summary of scaffold, gene, exon, etc. lengths, base composition and codon usage in the format used by github.com/rjchallis/codon_usage to visualise expected and observed codon usage patterns.

docker run --rm \
           --name easy-import-operophtera_brumata_v1_core_32_85_1 \
           --link genomehubs-mysql \
           -v ~/demo/genomehubs-import/import/conf:/import/conf \
           -v ~/demo/genomehubs-import/import/data:/import/data \
           -v ~/demo/genomehubs-import/download/data:/import/download \
           -e DATABASE=operophtera_brumata_v1_core_32_85_1 \
           -e FLAGS="-j" \
           genomehubs/easy-import:latest

Gene models and annotated scaffolds may also be exported in GFF3/EMBL format using `export_features.pl`. This script always exports GFF3 and additionally exports EMBL format, ready for submission to the INSDC, if the required fields (see below) are specified in `[META]`. The resulting EMBL file should be validated with the ENA flat file validator prior to submission.

docker run --rm \
           --name easy-import-operophtera_brumata_v1_core_32_85_1 \
           --link genomehubs-mysql \
           -v ~/demo/genomehubs-import/import/conf:/import/conf \
           -v ~/demo/genomehubs-import/import/data:/import/data \
           -v ~/demo/genomehubs-import/download/data:/import/download \
           -e DATABASE=operophtera_brumata_v1_core_32_85_1 \
           -e FLAGS="-f" \
           genomehubs/easy-import:latest
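
The three commands above differ only in the value of FLAGS and in the extra blast volume mounted for the `-e` step, so they can be wrapped in a loop. The following is a minimal sketch, not part of easy-import: the function name `run_exports` and the `DRY_RUN` variable are hypothetical, and it assumes that mounting the blast volume for the `-j` and `-f` steps is harmless.

```shell
# Sketch: run the three export steps in turn, one container per flag.
# Assumption: mounting the blast volume for every step is harmless; the
# original commands only mount it for the -e step.
# Usage: run_exports <database> [host_base_dir]
run_exports() {
  db="$1"                                    # e.g. operophtera_brumata_v1_core_32_85_1
  base="${2:-$HOME/demo/genomehubs-import}"  # host directory used in the commands above
  for flag in -e -j -f; do
    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and non-empty,
    # so the command is printed instead of executed.
    ${DRY_RUN:+echo} docker run --rm \
      --name "easy-import-$db" \
      --link genomehubs-mysql \
      -v "$base/import/conf:/import/conf" \
      -v "$base/import/data:/import/data" \
      -v "$base/download/data:/import/download" \
      -v "$base/blast/data:/import/blast" \
      -e DATABASE="$db" \
      -e FLAGS="$flag" \
      genomehubs/easy-import:latest || return 1
  done
}
```

Setting `DRY_RUN=1` before calling `run_exports operophtera_brumata_v1_core_32_85_1` prints the three commands for inspection without starting any containers. The loop stops at the first step that fails.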

Configuration options

Additional entries are required in [META] in order to export to EMBL format:

[META]
        ASSEMBLY.BIOPROJECT=PRJEB00000
        ASSEMBLY.LOCUS_TAG=ABC123
        SPECIES.EMBL_DIVISION=INV

Here, the BioProject accession and locus tag must be registered during the submission registration process, and the EMBL division must be one of the taxonomic divisions listed in the EMBL format documentation.
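
Because EMBL export depends on these three entries being present, it can be useful to check the configuration file before running the `-f` step. The following is a minimal sketch, not part of easy-import: the function name `check_embl_meta` is hypothetical, and it assumes the configuration is a plain INI-style file with `KEY=value` lines under a `[META]` header, as in the fragment above.

```shell
# Hypothetical helper: report which EMBL-export keys are missing from [META].
# Usage: check_embl_meta path/to/config.ini
# Exit status is 0 when all three keys have a non-empty value, 1 otherwise.
check_embl_meta() {
  awk '
    BEGIN { n = split("ASSEMBLY.BIOPROJECT ASSEMBLY.LOCUS_TAG SPECIES.EMBL_DIVISION", want, " ") }
    # A section header switches the "inside [META]" state on or off
    /^[ \t]*\[/ { in_meta = ($0 ~ /^[ \t]*\[META\]/); next }
    in_meta {
      eq = index($0, "=")
      if (eq > 1) {
        key = substr($0, 1, eq - 1)
        val = substr($0, eq + 1)
        # Trim surrounding spaces and tabs from both key and value
        gsub(/^[ \t]+|[ \t]+$/, "", key)
        gsub(/^[ \t]+|[ \t]+$/, "", val)
        if (val != "") seen[key] = 1
      }
    }
    END {
      status = 0
      for (i = 1; i <= n; i++)
        if (!(want[i] in seen)) { print "missing: " want[i]; status = 1 }
      exit status
    }
  ' "$1"
}
```

The check only confirms that the keys are present and non-empty. It does not verify that the BioProject accession or locus tag has actually been registered, or that the division code is a valid one.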