
`-e -j -f` Export files

(optional)

Importing and then exporting generates a set of files that are consistent across different assemblies and are therefore useful for comparative analyses, conversion to BLAST databases, etc. The exported ``.json`` files can be used to visualise assembly and annotation statistics.

Run ``export_sequences.pl``:

```
docker run --rm \
           --name easy-import-operophtera_brumata_v1_core_32_85_1 \
           --link genomehubs-mysql \
           -v ~/demo/genomehubs-import/import/conf:/import/conf \
           -v ~/demo/genomehubs-import/import/data:/import/data \
           -v ~/demo/genomehubs-import/download/data:/import/download \
           -v ~/demo/genomehubs-import/blast/data:/import/blast \
           -e DATABASE=operophtera_brumata_v1_core_32_85_1 \
           -e FLAGS="-e" \
           genomehubs/easy-import:latest
```

``export_json.pl`` generates three ``.json`` files:
  - ``<assembly_name>.meta.json`` contains basic metadata for the assembly and basic summary statistics, including assembly span and number of gene models.
  - ``<assembly_name>.assembly-stats.json`` contains an assembly summary in the format used by [github.com/rjchallis/assembly_stats](https://github.com/rjchallis/assembly_stats) to produce a number of individual and comparative views of several assembly statistics.
  - ``<assembly_name>.codon-usage.json`` contains a summary of feature lengths (scaffold, gene, exon, etc.), base composition and codon usage in the format used by [github.com/rjchallis/codon_usage](https://github.com/rjchallis/codon_usage) to visualise expected and observed codon usage patterns.
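
The three file names above follow a fixed pattern, so once ``export_json.pl`` has been run it is straightforward to confirm that each file exists and parses as JSON. The sketch below is illustrative only and is not part of easy-import: the helper names and the `export_dir` argument are assumptions (the directory the files are written to is not stated here, so it is passed in explicitly), and nothing is assumed about the keys inside each file.

```python
import json
from pathlib import Path

# Suffixes of the three files that export_json.pl is described as producing.
JSON_SUFFIXES = ("meta", "assembly-stats", "codon-usage")


def expected_json_files(assembly_name):
    """Return the three expected file names for one assembly."""
    return [f"{assembly_name}.{suffix}.json" for suffix in JSON_SUFFIXES]


def check_json_exports(export_dir, assembly_name):
    """Map each expected file name to 'ok', 'missing' or 'invalid'.

    export_dir is a hypothetical argument: point it at wherever the
    exported files end up on your system.
    """
    status = {}
    for name in expected_json_files(assembly_name):
        path = Path(export_dir) / name
        if not path.is_file():
            status[name] = "missing"
            continue
        try:
            with path.open() as handle:
                json.load(handle)  # only checks that the file is well-formed JSON
            status[name] = "ok"
        except json.JSONDecodeError:
            status[name] = "invalid"
    return status
```

Any status other than `ok` suggests the export should be re-run before the files are passed to the visualisation tools.
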

Run ``export_json.pl``:

```
docker run --rm \
           --name easy-import-operophtera_brumata_v1_core_32_85_1 \
           --link genomehubs-mysql \
           -v ~/demo/genomehubs-import/import/conf:/import/conf \
           -v ~/demo/genomehubs-import/import/data:/import/data \
           -v ~/demo/genomehubs-import/download/data:/import/download \
           -e DATABASE=operophtera_brumata_v1_core_32_85_1 \
           -e FLAGS="-j" \
           genomehubs/easy-import:latest
```

Gene models and annotated scaffolds may also be exported in GFF3/EMBL format using ``export_features.pl``. This script always exports GFF3 and additionally exports EMBL format, ready for submission to the INSDC, if the required fields (see below) are specified in `[META]`. The resulting file should be validated using the [ENA flat file validator](http://www.ebi.ac.uk/ena/software/flat-file-validator) before submission.

Run ``export_features.pl``:

```
docker run --rm \
           --name easy-import-operophtera_brumata_v1_core_32_85_1 \
           --link genomehubs-mysql \
           -v ~/demo/genomehubs-import/import/conf:/import/conf \
           -v ~/demo/genomehubs-import/import/data:/import/data \
           -v ~/demo/genomehubs-import/download/data:/import/download \
           -e DATABASE=operophtera_brumata_v1_core_32_85_1 \
           -e FLAGS="-f" \
           genomehubs/easy-import:latest
```

Configuration options

Additional entries are required in `[META]` in order to export to EMBL format:

```
[META]
        ASSEMBLY.BIOPROJECT=PRJEB00000
        ASSEMBLY.LOCUS_TAG=ABC123
        SPECIES.EMBL_DIVISION=INV
```

The BioProject and locus tag must be registered during the submission registration process, and the EMBL division must be one of the taxonomic divisions listed in the EMBL format [documentation](ftp://ftp.embl.de/pub/databases/embl/doc/usrman.txt).
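
Because EMBL output depends on the three `[META]` entries described above, it can help to check the configuration before running with `-f`. The following is a minimal sketch, not part of easy-import: the function name is hypothetical, and it assumes the configuration is a plain INI-style file with `KEY=VALUE` lines under `[META]` and `#` or `;` comment lines. It only tests that each entry exists and is non-empty; it does not validate the values against INSDC rules.

```python
REQUIRED_EMBL_FIELDS = (
    "ASSEMBLY.BIOPROJECT",
    "ASSEMBLY.LOCUS_TAG",
    "SPECIES.EMBL_DIVISION",
)


def missing_embl_fields(ini_text):
    """Return the required [META] keys that are absent or empty in ini_text."""
    section = None
    meta = {}
    for raw_line in ini_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and (assumed) comment lines
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()  # remember which section we are in
            continue
        if section == "META" and "=" in line:
            key, value = line.split("=", 1)
            meta[key.strip()] = value.strip()
    return [field for field in REQUIRED_EMBL_FIELDS if not meta.get(field)]


example = """
[META]
    ASSEMBLY.BIOPROJECT=PRJEB00000
    ASSEMBLY.LOCUS_TAG=ABC123
"""
print(missing_embl_fields(example))  # ['SPECIES.EMBL_DIVISION']
```

A small hand-written parser is used here rather than Python's `configparser` because that module's defaults lower-case keys and can treat more deeply indented lines as continuations of the previous value, which does not suit indented, upper-case keys like those shown above.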