`-b -r -c` Import additional annotations
(optional)
Some xrefs can be imported via `Dbxref` attributes in a `.gff` file; however, several xref types can be represented more richly in the Ensembl database if they are imported directly from program outputs.
docker run --rm \
--name easy-import-operophtera_brumata_v1_core_32_85_1 \
--link genomehubs-mysql \
-v ~/demo/genomehubs-import/import/conf:/import/conf \
-v ~/demo/genomehubs-import/import/data:/import/data \
-e DATABASE=operophtera_brumata_v1_core_32_85_1 \
-e FLAGS="-b" \
genomehubs/easy-import:latest
docker run --rm \
--name easy-import-operophtera_brumata_v1_core_32_85_1 \
--link genomehubs-mysql \
-v ~/demo/genomehubs-import/import/conf:/import/conf \
-v ~/demo/genomehubs-import/import/data:/import/data \
-e DATABASE=operophtera_brumata_v1_core_32_85_1 \
-e FLAGS="-r" \
genomehubs/easy-import:latest
Summaries of assembly quality based on conserved gene sets, generated with CEGMA and BUSCO, can also be imported to the `meta` table of the core database. If present, these values will be exported by the script `export_json.pl` during Step 2.6: Export files for use in summary tables/visualisation.
docker run --rm \
--name easy-import-operophtera_brumata_v1_core_32_85_1 \
--link genomehubs-mysql \
-v ~/demo/genomehubs-import/import/conf:/import/conf \
-v ~/demo/genomehubs-import/import/data:/import/data \
-e DATABASE=operophtera_brumata_v1_core_32_85_1 \
-e FLAGS="-c" \
genomehubs/easy-import:latest
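The three invocations above differ only in the value of `FLAGS`, so they can be wrapped in a small helper. The following is a minimal sketch that reuses the container name, link and volume paths shown above; `run_import`, `IMPORT_DIR` and the `DOCKER` override are hypothetical names introduced here for illustration and are not part of easy-import.

```shell
# Assumed values, copied from the docker commands above.
DATABASE=operophtera_brumata_v1_core_32_85_1
IMPORT_DIR=~/demo/genomehubs-import/import

# run_import FLAG: start one easy-import container with the given flag.
# DOCKER is a hypothetical override; setting DOCKER=echo prints the command
# instead of running it, which is handy for checking paths first.
run_import() {
  "${DOCKER:-docker}" run --rm \
    --name "easy-import-$DATABASE" \
    --link genomehubs-mysql \
    -v "$IMPORT_DIR/conf:/import/conf" \
    -v "$IMPORT_DIR/data:/import/data" \
    -e DATABASE="$DATABASE" \
    -e FLAGS="$1" \
    genomehubs/easy-import:latest
}
```

With the function defined, `for flag in -b -r -c; do run_import "$flag" || break; done` would run the three imports in turn and stop at the first failure. Running each flag in its own container mirrors the separate commands above.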
Example commands
To obtain the correct output format, use commands similar to the following:
- blastp vs uniprot
parallel -j $NSLOTS --pipe --block 10k --recstart '>' \
"nice blastp -query - -db /exports/blast_db/uniprot_sprot.fasta -evalue 1e-10 -outfmt '6 std qlen slen stitle btop'"
- repeatmasker
RepeatMasker -pa $NSLOTS -lib /path/to/repeat.library -dir . -xsmall /path/to/seqfile
- interproscan
cat $PROTEIN | paste - - | grep -v "\*" | sed 's/\t/\n/g' \
| parallel -j $NSLOTS --pipe --block 100k --recstart '>' \
"nice interproscan.sh -T /run/shm/ -i - -d $OUTDIR -dp -t p -appl TIGRFAM-13.0,ProDom-2006.1,SMART-6.2,SignalP-EUK-4.0,PrositePatterns-20.97,PRINTS-42.0,SuperFamily-1.75,Gene3d-3.5.0,PfamA-27.0,PrositeProfiles-20.97,Phobius-1.01,TMHMM-2.0c,Coils-2.2 -f TSV"
cat -- $OUTDIR/* > $PROTEIN.interproscan
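The first line of the InterProScan pipeline above removes any protein whose sequence contains a `*` (stop codon) before the file is split into chunks. It relies on each record occupying exactly two lines (header, then sequence), so wrapped FASTA files would need to be unwrapped first. The sketch below shows the effect on a tiny, made-up input; the sequences are illustrative only, and `tr` stands in for `sed 's/\t/\n/g'` purely for portability across sed implementations.

```shell
# Tiny illustrative input (made-up sequences); the second record contains "*".
PROTEIN=$(mktemp)
printf '>prot1\nMKTAYIAK\n>prot2\nMKT*YIAK\n>prot3\nMSSPLLQR\n' > "$PROTEIN"

# paste - -   : join each header/sequence pair onto one tab-separated line
# grep -v '\*': drop every joined line that contains a literal "*"
# tr          : split the surviving lines back into header and sequence
FILTERED=$(paste - - < "$PROTEIN" | grep -v '\*' | tr '\t' '\n')

printf '%s\n' "$FILTERED"
rm -f "$PROTEIN"
```

Because the filter matches `*` anywhere on the joined line, a protein with a trailing `*` is dropped as well, so it may be worth checking how many records survive before starting a long InterProScan run.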
Configuration options
[FILES]
BLASTP = [ BLASTP http://download.lepbase.org/current/blastp/Operophtera_brumata_v1_-_proteins.fa.blastp.uniprot_sprot.1e-10.gz ]
IPRSCAN = [ IPRSCAN http://download.lepbase.org/current/interproscan/Operophtera_brumata_v1_-_proteins.fa.interproscan.gz ]
REPEATMASKER = [ REPEATMASKER http://download.lepbase.org/current/repeats/Operophtera_brumata_v1_-_scaffolds.fa.out.gz ]
Specify the (remote) locations of the `BLASTP`, `IPRSCAN` and `REPEATMASKER` files as appropriate.
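Before starting an import it can be useful to list the locations that the `[FILES]` stanza points to. The following sketch assumes that entries keep the whitespace-separated `KEY = [ TYPE URL ]` layout shown above; `CONF` and `FILE_URLS` are hypothetical names, and the scratch file simply reproduces the example configuration.

```shell
# Reproduce the example configuration in a scratch file (CONF is hypothetical).
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
[FILES]
BLASTP = [ BLASTP http://download.lepbase.org/current/blastp/Operophtera_brumata_v1_-_proteins.fa.blastp.uniprot_sprot.1e-10.gz ]
IPRSCAN = [ IPRSCAN http://download.lepbase.org/current/interproscan/Operophtera_brumata_v1_-_proteins.fa.interproscan.gz ]
REPEATMASKER = [ REPEATMASKER http://download.lepbase.org/current/repeats/Operophtera_brumata_v1_-_scaffolds.fa.out.gz ]
[XREF]
BLASTP = [ 2000 Uniprot/swissprot/TrEMBL UniProtKB/TrEMBL ]
EOF

# Print "KEY<TAB>URL" for each entry inside [FILES] only.
# A stanza header switches tracking on or off; the URL is the fifth field.
FILE_URLS=$(awk '
  $1 ~ /^\[.*\]$/ { in_files = ($1 == "[FILES]"); next }
  in_files && $2 == "=" { print $1 "\t" $5 }
' "$CONF")

printf '%s\n' "$FILE_URLS"
rm -f "$CONF"
```

Each URL in the second column could then be checked with a tool such as `curl -sfI` to confirm that the remote file is reachable.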
[XREF]
BLASTP = [ 2000 Uniprot/swissprot/TrEMBL UniProtKB/TrEMBL ]
Set the external db id for `BLASTP`. The final value in the array will be used when adding links to the original data source to the description.