
`-b -r -c` Import additional annotations

(optional)

Some xrefs can be imported via ``Dbxref`` attributes in a ``.gff`` file; however, several xref types can be represented more richly in the Ensembl database if they are imported directly from program outputs.

[block:code]
{
  "codes": [
    {
      "code": "docker run --rm \\\n           --name easy-import-operophtera_brumata_v1_core_32_85_1 \\\n           --link genomehubs-mysql \\\n           -v ~/demo/genomehubs-import/import/conf:/import/conf \\\n           -v ~/demo/genomehubs-import/import/data:/import/data \\\n           -e DATABASE=operophtera_brumata_v1_core_32_85_1 \\\n           -e FLAGS=\"-b\" \\\n           genomehubs/easy-import:latest",
      "language": "text",
      "name": "import blastp and interproscan results"
    }
  ]
}
[/block]

[block:code]
{
  "codes": [
    {
      "code": "docker run --rm \\\n           --name easy-import-operophtera_brumata_v1_core_32_85_1 \\\n           --link genomehubs-mysql \\\n           -v ~/demo/genomehubs-import/import/conf:/import/conf \\\n           -v ~/demo/genomehubs-import/import/data:/import/data \\\n           -e DATABASE=operophtera_brumata_v1_core_32_85_1 \\\n           -e FLAGS=\"-r\" \\\n           genomehubs/easy-import:latest",
      "language": "text",
      "name": "import repeatmasker results"
    }
  ]
}
[/block]

Summaries of assembly quality based on conserved gene sets, generated with CEGMA and BUSCO, can also be imported into the ``meta`` table of the core database. If present, these values will be exported by the script ``export_json.pl`` during [Step 2.6: Export files](doc:step-26-export-files) for use in summary tables and visualisations.
[block:code]
{
  "codes": [
    {
      "code": "docker run --rm \\\n           --name easy-import-operophtera_brumata_v1_core_32_85_1 \\\n           --link genomehubs-mysql \\\n           -v ~/demo/genomehubs-import/import/conf:/import/conf \\\n           -v ~/demo/genomehubs-import/import/data:/import/data \\\n           -e DATABASE=operophtera_brumata_v1_core_32_85_1 \\\n           -e FLAGS=\"-c\" \\\n           genomehubs/easy-import:latest",
      "language": "text",
      "name": "import cegma/busco results"
    }
  ]
}
[/block]

[block:api-header]
{
  "type": "basic",
  "title": "Example commands"
}
[/block]
To obtain the correct output format, use commands similar to the following:
- blastp vs uniprot
```
parallel -j $NSLOTS --pipe --block 10k --recstart '>' \
    "nice blastp -query - -db /exports/blast_db/uniprot_sprot.fasta -evalue 1e-10 -outfmt '6 std qlen slen stitle btop'"
```

- repeatmasker
```
RepeatMasker -pa $NSLOTS -lib /path/to/repeat.library -dir . -xsmall /path/to/seqfile
```

- interproscan
```
cat $PROTEIN | paste - - | grep -v "\*" | sed 's/\t/\n/g' \
| parallel -j $NSLOTS --pipe --block 100k --recstart '>' \
    "nice interproscan.sh -T /run/shm/ -i - -d $OUTDIR -dp -t p -appl TIGRFAM-13.0,ProDom-2006.1,SMART-6.2,SignalP-EUK-4.0,PrositePatterns-20.97,PRINTS-42.0,SuperFamily-1.75,Gene3d-3.5.0,PfamA-27.0,PrositeProfiles-20.97,Phobius-1.01,TMHMM-2.0c,Coils-2.2 -f TSV"
cat -- $OUTDIR/* > $PROTEIN.interproscan
```
[block:api-header]
{
  "type": "basic",
  "title": "Configuration options"
}
[/block]
- [[FILES]](doc:files-core)
```
[FILES]
    BLASTP =  [ BLASTP  http://download.lepbase.org/current/blastp/Operophtera_brumata_v1_-_proteins.fa.blastp.uniprot_sprot.1e-10.gz ]
    IPRSCAN = [ IPRSCAN http://download.lepbase.org/current/interproscan/Operophtera_brumata_v1_-_proteins.fa.interproscan.gz ]
    REPEATMASKER = [ REPEATMASKER http://download.lepbase.org/current/repeats/Operophtera_brumata_v1_-_scaffolds.fa.out.gz ]
```
  Specify the (remote) locations of the ``BLASTP``, ``IPRSCAN`` and ``REPEATMASKER`` files as appropriate.

- [[XREF]](doc:xref-core)
```
[XREF]
    BLASTP = [ 2000 Uniprot/swissprot/TrEMBL UniProtKB/TrEMBL ]
```
  Set the external database ID for ``BLASTP``. The final value in the array is used when links to the original data source are added to the description.
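
Before running the ``-b`` import, it may be worth checking that the blastp output has the expected column layout. The ``-outfmt '6 std qlen slen stitle btop'`` setting shown under Example commands yields 16 tab-separated columns: the 12 ``std`` columns plus ``qlen``, ``slen``, ``stitle`` and ``btop``. The sketch below is not part of easy-import. ``check_blastp_columns`` is a hypothetical helper name, and this page does not say whether easy-import validates the column count itself.

```shell
#!/usr/bin/env bash
# Sketch only: check_blastp_columns is a hypothetical helper, not an
# easy-import command. It checks that every line of a BLAST tabular file
# has the number of tab-separated columns expected from
#   -outfmt '6 std qlen slen stitle btop'   (12 std columns + 4 = 16).
check_blastp_columns() {
  local file="${1:--}"      # file to check; "-" (the default) reads stdin
  local expected="${2:-16}" # expected number of columns per line

  # Report the first line with a different field count and exit non-zero;
  # exit zero if every line matches.
  awk -F'\t' -v n="$expected" '
    NF != n {
      printf "line %d: %d columns (expected %d)\n", NR, NF, n
      bad = 1
      exit
    }
    END { exit bad }
  ' "$file"
}
```

For a gzipped file such as the one referenced in ``[FILES]``, the check could be run as ``gzip -dc proteins.fa.blastp.gz | check_blastp_columns`` (the file name here is illustrative). A non-zero exit status suggests the file was produced with a different ``-outfmt`` string.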