Step 2.6: Import additional annotations

(optional)

Some xrefs can be imported via ``Dbxref`` attributes in a ``.gff`` file; however, several xref types can be represented more richly in the Ensembl database if they are imported directly from program outputs.

- blastp
```
cd ~/import
perl ../ei/core/import_blastp.pl ../ei/conf/core-import.ini ../ei/conf/core-import-extra.ini
```

- repeatmasker
```
perl ../ei/core/import_repeatmasker.pl ../ei/conf/core-import.ini ../ei/conf/core-import-extra.ini
```

- interproscan
```
perl ../ei/core/import_interproscan.pl ../ei/conf/core-import.ini ../ei/conf/core-import-extra.ini
```

Summaries of assembly quality based on conserved gene sets using [CEGMA]() and [BUSCO]() can also be imported to the ``meta`` table of the core database. If present, these values will be exported by the script ``export_json.pl`` during [Step 2.6: Export files](doc:step-26-export-files) for use in summary tables/visualisation.

```
perl ../ei/core/import_cegma_busco.pl ../ei/conf/core-import.ini ../ei/conf/core-import-extra.ini
```
[block:api-header]
{
  "type": "basic",
  "title": "Example commands"
}
[/block]
To obtain the correct output format, use commands similar to the following:
- blastp vs uniprot
```
parallel -j $NSLOTS --pipe --block 10k --recstart '>' \
    "nice blastp -query - -db /exports/blast_db/uniprot_sprot.fasta -evalue 1e-10 -outfmt '6 std qlen slen stitle btop'"
```

- repeatmasker
```
RepeatMasker -pa $NSLOTS -lib /path/to/repeat.library -dir . -xsmall /path/to/seqfile
```

- interproscan
```
cat $PROTEIN | paste - - | grep -v "\*" | sed 's/\t/\n/g' \
| parallel -j $NSLOTS --pipe --block 100k --recstart '>' \
    "nice interproscan.sh -T /run/shm/ -i - -d $OUTDIR -dp -t p -appl TIGRFAM-13.0,ProDom-2006.1,SMART-6.2,SignalP-EUK-4.0,PrositePatterns-20.97,PRINTS-42.0,SuperFamily-1.75,Gene3d-3.5.0,PfamA-27.0,PrositeProfiles-20.97,Phobius-1.01,TMHMM-2.0c,Coils-2.2 -f TSV"
cat -- $OUTDIR/* > $PROTEIN.interproscan
```
[block:api-header]
{
  "type": "basic",
  "title": "Configuration options"
}
[/block]
- [[FILES]](doc:files-core)
```
[FILES]
    BLASTP =  [ BLASTP  http://download.lepbase.org/current/blastp/Operophtera_brumata_v1_-_proteins.fa.blastp.uniprot_sprot.1e-10.gz ]
    IPRSCAN = [ IPRSCAN http://download.lepbase.org/current/interproscan/Operophtera_brumata_v1_-_proteins.fa.interproscan.gz ]
    REPEATMASKER = [ REPEATMASKER http://download.lepbase.org/current/repeats/Operophtera_brumata_v1_-_scaffolds.fa.out.gz ]
```
  Specify the (remote) locations of the ``BLASTP``, ``IPRSCAN`` and ``REPEATMASKER`` files as appropriate.

- [[XREF]](doc:xref-core)
```
[XREF]
    BLASTP = [ 2000 Uniprot/swissprot/TrEMBL UniProtKB/TrEMBL ]
```
  Set the external db id for ``BLASTP``. The final value in the array is used when links to the original data source are added to the description.
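
Before pointing ``BLASTP`` in ``[FILES]`` at a results file, it may be worth checking that the file matches the format requested by the example blastp command. With ``-outfmt '6 std qlen slen stitle btop'``, each line should have 16 tab-separated columns: the 12 standard columns plus ``qlen``, ``slen``, ``stitle`` and ``btop``. The sketch below is a minimal check; the function name ``check_blastp_columns`` is hypothetical and not part of the import scripts, it assumes an uncompressed file, and it assumes subject titles contain no tab characters.

```shell
# Hypothetical helper, not part of the import scripts.
# Prints the number of lines that do NOT have exactly 16 tab-separated
# columns (12 standard columns + qlen, slen, stitle, btop).
check_blastp_columns() {
    awk -F'\t' 'NF != 16 { bad++ } END { print bad + 0 }' "$1"
}
```

A result of ``0`` suggests that every line has the expected number of columns; for a gzipped file, decompress it first (for example with ``gunzip -c``) and run the check on the decompressed copy.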