`-s` Create database and load sequence data
This step consists of two scripts. `import_sequences.pl` must be run to set up a core database and load sequence data. `import_sequence_synonyms.pl` is optional and will be run if you have a list of alternate scaffold/contig names or if alternate names are generated by the first script (see configuration options).
docker run --rm \
--name easy-import-operophtera_brumata_v1_core_32_85_1 \
--link genomehubs-mysql \
-v ~/demo/genomehubs-import/import/conf:/import/conf \
-v ~/demo/genomehubs-import/import/data:/import/data \
-e DATABASE=operophtera_brumata_v1_core_32_85_1 \
-e FLAGS="-s" \
genomehubs/easy-import:latest
Configuration options
[ENSEMBL]
LOCAL = /ensembl
`LOCAL` is the path to the Ensembl repositories on the local host and should be set to the same value as `[WEBSITE] SERVER_ROOT`.
[DATABASE_TEMPLATE]
NAME = bombyx_mori_core_31_84_1
HOST = localhost
PORT = 3306
RO_USER = anonymous
RO_PASS =
Connection details for an existing local (or remote) database that uses the same schema version as the current import. It is used as a data source to ensure that tables containing data that do not change across species are filled consistently.
[DATABASE_CORE]
NAME = operophtera_brumata_v1_core_31_84_1
HOST = localhost
PORT = 3306
RW_USER = importer
RW_PASS = importpassword
RO_USER = anonymous
RO_PASS =
Contains the name and connection parameters for the core database that will be created for the current species/assembly. The numbering after `_core_` should follow the pattern of `[DATABASE_TEMPLATE]`. Connection parameters should be as defined in Step 1.2: Setup database connections.
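The naming rule above can be checked before running the import. The following is a minimal sketch, not part of easy-import: the helper names `core_numbering` and `same_numbering` are hypothetical, and it assumes that "follow the pattern" means the three numbers after `_core_` are identical in both names, as they are in the example configuration. Relax the comparison if only some of the numbers need to agree in your setup.

```python
import re

# Assumed naming convention: <species_or_assembly>_core_<n>_<n>_<n>
CORE_SUFFIX = re.compile(r"_core_(\d+)_(\d+)_(\d+)$")


def core_numbering(db_name):
    """Return the numbers after _core_ as a tuple of ints, or None if absent."""
    match = CORE_SUFFIX.search(db_name)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def same_numbering(template_name, core_name):
    """True when both names carry identical numbering after _core_."""
    template = core_numbering(template_name)
    core = core_numbering(core_name)
    return template is not None and template == core


if __name__ == "__main__":
    print(same_numbering("bombyx_mori_core_31_84_1",
                         "operophtera_brumata_v1_core_31_84_1"))
```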
[DATABASE_TAXONOMY]
NAME = ncbi_taxonomy
HOST = localhost
PORT = 3306
RO_USER = anonymous
RO_PASS =
Connection details for a copy of the (Ensembl format) `ncbi_taxonomy` database, used to fill in the taxonomic hierarchy in the `meta` table during import.
[META]
SPECIES.PRODUCTION_NAME = Operophtera_brumata_v1
SPECIES.SCIENTIFIC_NAME = Operophtera brumata
SPECIES.COMMON_NAME = Winter moth
SPECIES.DISPLAY_NAME = Operophtera brumata v1
SPECIES.DIVISION = EnsemblMetazoa
SPECIES.URL = Operophtera_brumata_v1
SPECIES.TAXONOMY_ID = 472141
SPECIES.ALIAS = [ operophtera_brumata operophtera_brumata_v1 operophtera%20brumata winter%20moth ]
ASSEMBLY.NAME = v1
ASSEMBLY.DATE = 2015-08-11
ASSEMBLY.ACCESSION = GCA_001266575.1
ASSEMBLY.DEFAULT = v1
PROVIDER.NAME = Wageningen University
PROVIDER.URL = http://www.bioinformatics.nl/wintermoth
GENEBUILD.ID = 1
GENEBUILD.START_DATE = 2015-08
GENEBUILD.VERSION = 1
GENEBUILD.METHOD = import
Metadata for the current import. These fields should be edited to suit the current import and are used either during this import pipeline or as a datasource for parts of the Ensembl website.
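To sanity-check an edited `[META]` section before starting an import, the file can be read with a generic INI parser. This is a rough sketch using Python's standard `configparser`. The import pipeline itself is written in Perl and may parse the file differently, so treat the bracketed-list handling as an assumption. The helper names `read_ini` and `list_value` are hypothetical, and the sample text is abbreviated from the example above.

```python
import configparser

# Abbreviated sample of the [META] section shown above.
SAMPLE = """\
[META]
SPECIES.PRODUCTION_NAME = Operophtera_brumata_v1
SPECIES.TAXONOMY_ID = 472141
SPECIES.ALIAS = [ operophtera_brumata operophtera%20brumata ]
"""


def read_ini(text):
    """Parse INI text, keeping key case and leaving '%' characters untouched."""
    # interpolation=None is needed because values such as
    # 'operophtera%20brumata' contain '%', which configparser would
    # otherwise try to interpret as interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve upper-case keys
    parser.read_string(text)
    return parser


def list_value(raw):
    """Split a '[ a b c ]' value into items (assumed whitespace-separated)."""
    raw = raw.strip()
    if raw.startswith("[") and raw.endswith("]"):
        return raw[1:-1].split()
    return [raw] if raw else []


meta = read_ini(SAMPLE)["META"]
aliases = list_value(meta["SPECIES.ALIAS"])

if __name__ == "__main__":
    print(meta["SPECIES.PRODUCTION_NAME"], aliases)
```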
[FILES]
SCAFFOLD = [ fa http://www.bioinformatics.nl/wintermoth/data_files/Obru1.fsa.gz ]
Details of the sequence file(s) to be imported.
- If a `SCAFFOLD` file of type `fa` is provided, then a `CONTIG` file is optional and vice versa.
- `SCAFFOLD` data can also be imported from an `agp` file provided `CONTIG` sequences are provided.
- If no `CONTIG` file is provided, contigs will be imputed from runs of `N` in the `SCAFFOLD` sequence.
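The idea of imputing contigs from runs of `N` can be illustrated with a short sketch. This is not the code used by `import_sequences.pl`: the function name `impute_contigs` is hypothetical, and the sketch assumes that any run of one or more `N` characters separates two contigs, whereas the real script may apply a minimum gap length or other rules.

```python
import re


def impute_contigs(scaffold_seq):
    """Return (start, end, sequence) for each stretch between runs of N.

    Coordinates are 1-based and inclusive, relative to the scaffold.
    """
    contigs = []
    # Each maximal run of non-N characters is treated as one contig.
    for match in re.finditer(r"[^Nn]+", scaffold_seq):
        contigs.append((match.start() + 1, match.end(), match.group()))
    return contigs


if __name__ == "__main__":
    for start, end, seq in impute_contigs("ACGTNNNNGGCC"):
        print(start, end, seq)
```

Leading or trailing runs of `N` produce no contig in this sketch, so a scaffold made up entirely of `N` yields an empty list.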
[MODIFY]
OVERWRITE_DB = 1
TRUNCATE_SEQUENCE_TABLES = 1
- If `OVERWRITE_DB` is set to 1, running this script will cause any existing database with the same `[DATABASE_CORE] NAME` to be dropped and recreated before any data are imported.
- Setting `TRUNCATE_SEQUENCE_TABLES` to 1 will truncate any existing sequence tables before importing.
- If these values are left unset, additional data will be added to an existing database/sequence table, which may have unintended consequences, so proceed with caution.
[SCAFFOLD_NAMES]
HEADER = 1
SCAFFOLD = [ /(.+)/ /scaf_/scaffold/ ]
CONTIG = [ /(.+)/ /ctg_/contig/ ]
- To use a file as a source of scaffold name synonyms for `import_sequence_synonyms.pl`, `[FILES] SCAFFOLD_NAMES` must be set, and the `HEADER` flag may be used to indicate that the file has a header row that should be skipped during import.
- Alternatively, match and replace regular expressions may be defined for `SCAFFOLD` and/or `CONTIG` names to automatically generate a file of synonyms during sequence import.
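To see what a match and replace rule such as `[ /(.+)/ /scaf_/scaffold/ ]` might produce, the sketch below applies one to a few names. It rests on an assumed reading of the rule: the first expression selects the part of the name to keep via its first capture group, and the second expression is a find/replace pair applied to that part. The actual behaviour of the Perl import script may differ. The function name `make_synonym` is hypothetical, and the simple parsing assumes the expressions contain no spaces or extra `/` characters.

```python
import re


def make_synonym(name, rule):
    """Apply a '[ /match/ /find/replacement/ ]' rule to a sequence name.

    Returns the generated synonym, or None if the rule does not match
    or leaves the name unchanged.
    """
    tokens = rule.strip()[1:-1].split()          # drop the [ ] and split
    match_pattern = tokens[0].strip("/")         # e.g. (.+)
    find, replacement = tokens[1].strip("/").split("/")  # e.g. scaf_, scaffold

    found = re.search(match_pattern, name)
    if found is None:
        return None
    synonym = re.sub(find, replacement, found.group(1))
    return synonym if synonym != name else None


if __name__ == "__main__":
    scaffold_rule = "[ /(.+)/ /scaf_/scaffold/ ]"
    for name in ["scaf_12", "scaf_13", "chr1"]:
        print(name, make_synonym(name, scaffold_rule))
```

Under this reading, `scaf_12` would gain the synonym `scaffold12`, while a name that does not contain `scaf_` would gain none.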