This step consists of two scripts:
import_sequences.pl must be run to set up a core database and load sequence data.
import_sequence_synonyms.pl is optional and will be run if you have a list of alternate scaffold/contig names or if alternate names are generated by the first script (see configuration options).
docker run --rm \
  --name easy-import-operophtera_brumata_v1_core_32_85_1 \
  --link genomehubs-mysql \
  -v ~/demo/genomehubs-import/import/conf:/import/conf \
  -v ~/demo/genomehubs-import/import/data:/import/data \
  -e DATABASE=operophtera_brumata_v1_core_32_85_1 \
  -e FLAGS="-s" \
  genomehubs/easy-import:latest
[ENSEMBL]
LOCAL = /ensembl
LOCAL is the path to the Ensembl repositories on localhost and should be set to the same value as [WEBSITE]
[DATABASE_TEMPLATE]
NAME = bombyx_mori_core_31_84_1
HOST = localhost
PORT = 3306
RO_USER = anonymous
RO_PASS =
Connection details for an existing local (or remote) database that uses the same schema version as the current import. It is used as a data source to ensure that tables containing data that does not change across species are filled consistently.
[DATABASE_CORE]
NAME = operophtera_brumata_v1_core_31_84_1
HOST = localhost
PORT = 3306
RW_USER = importer
RW_PASS = importpassword
RO_USER = anonymous
RO_PASS =
Contains the name and connection parameters for the core database that will be created for the current species/assembly. The numbering after _core_ should follow the pattern of [DATABASE_TEMPLATE]. Connection parameters should be as defined in Step 1.2: Setup database connections.
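The naming rule above can be checked before an import is started. The following is a minimal sketch, assuming the configuration is available as an INI-style string that Python's configparser can read, and interpreting "follow the pattern" as "use identical numbering after _core_" (an assumption). The helper names core_suffix and names_follow_template are hypothetical and are not part of easy-import.

```python
import configparser
import re

# Example values copied from the configuration shown in this step.
CONFIG_TEXT = """
[DATABASE_TEMPLATE]
NAME = bombyx_mori_core_31_84_1

[DATABASE_CORE]
NAME = operophtera_brumata_v1_core_31_84_1
"""


def core_suffix(db_name):
    """Return the numbering that follows '_core_' in a database name."""
    match = re.search(r"_core_(\d+_\d+_\d+)$", db_name)
    if match is None:
        raise ValueError(f"no '_core_' numbering found in {db_name!r}")
    return match.group(1)


def names_follow_template(config_text):
    """Compare the '_core_' numbering of the core and template databases.

    Assumption: the two names should carry identical numbering.
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names such as NAME in upper case
    parser.read_string(config_text)
    template = parser["DATABASE_TEMPLATE"]["NAME"]
    core = parser["DATABASE_CORE"]["NAME"]
    return core_suffix(template) == core_suffix(core)


print(names_follow_template(CONFIG_TEXT))
```

A check like this is cheap to run against the configuration file before launching the container, since a mismatch only becomes visible once the import is under way.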
[DATABASE_TAXONOMY]
NAME = ncbi_taxonomy
HOST = localhost
PORT = 3306
RO_USER = anonymous
RO_PASS =
Connection details for a copy of the (Ensembl format) ncbi_taxonomy database, used to fill in the taxonomic hierarchy in the meta table during import.
[META]
SPECIES.PRODUCTION_NAME = Operophtera_brumata_v1
SPECIES.SCIENTIFIC_NAME = Operophtera brumata
SPECIES.COMMON_NAME = Winter moth
SPECIES.DISPLAY_NAME = Operophtera brumata v1
SPECIES.DIVISION = EnsemblMetazoa
SPECIES.URL = Operophtera_brumata_v1
SPECIES.TAXONOMY_ID = 472141
SPECIES.ALIAS = [ operophtera_brumata operophtera_brumata_v1 operophtera%20brumata winter%moth ]
ASSEMBLY.NAME = v1
ASSEMBLY.DATE = 2015-08-11
ASSEMBLY.ACCESSION = GCA_001266575.1
ASSEMBLY.DEFAULT = v1
PROVIDER.NAME = Wageningen University
PROVIDER.URL = http://www.bioinformatics.nl/wintermoth
GENEBUILD.ID = 1
GENEBUILD.START_DATE = 2015-08
GENEBUILD.VERSION = 1
GENEBUILD.METHOD = import
Metadata for the current import. These fields should be edited to suit the current import and are used either during this import pipeline or as a datasource for parts of the Ensembl website.
[FILES]
SCAFFOLD = [ fa http://www.bioinformatics.nl/wintermoth/data_files/Obru1.fsa.gz ]
Details of the sequence file(s) to be imported.
If a SCAFFOLD file of type fa is provided, then a CONTIG file is optional and vice versa.
SCAFFOLD data can also be imported from an
CONTIG sequences are provided.
If no CONTIG file is provided, contigs will be imputed from runs of
[MODIFY]
OVERWRITE_DB = 1
TRUNCATE_SEQUENCE_TABLES = 1
If OVERWRITE_DB is set to 1, running this script will cause any existing database with the same [DATABASE_CORE] NAME to be dropped and recreated before any data are imported.
Setting TRUNCATE_SEQUENCE_TABLES to 1 will truncate any existing sequence tables before importing.
If these values are left unset, additional data will be added to the existing database/sequence tables, which may have unintended consequences, so proceed with caution.
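The effect of the two [MODIFY] options can be summarised as a small decision function. This is only an illustrative sketch of the behaviour described above, not the logic of import_sequences.pl itself; the function name planned_actions and the wording of the returned strings are hypothetical.

```python
def planned_actions(overwrite_db=None, truncate_sequence_tables=None):
    """List what an import run would do for the given [MODIFY] values.

    None stands for an option that is left unset in the configuration.
    """
    actions = []
    if overwrite_db == 1:
        actions.append("drop and recreate the core database")
    if truncate_sequence_tables == 1:
        actions.append("truncate existing sequence tables")
    if not actions:
        # Neither option is set: new data are added to whatever already exists.
        actions.append("append to any existing database and sequence tables")
    actions.append("import sequence data")
    return actions


for step in planned_actions(overwrite_db=1, truncate_sequence_tables=1):
    print(step)
```

Calling planned_actions() with no arguments returns the append case, which is the one the text above advises caution about.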
[SCAFFOLD_NAMES]
HEADER = 1
SCAFFOLD = [ /(.+)/ /scaf_/scaffold/ ]
CONTIG = [ /(.+)/ /ctg_/contig/ ]
- To use a file as a source of scaffold name synonyms for
SCAFFOLD_NAMES must be set, and the HEADER flag may be used to indicate that the file has a header row that should be skipped during import.
- Alternatively, match and replace regular expressions may be defined for CONTIG names to automatically generate a file of synonyms during sequence import.
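To make the match and replace idea concrete, here is a minimal sketch of how a pair such as /(.+)/ and /scaf_/scaffold/ could turn a sequence name into a synonym. It assumes that the first expression captures the part of the name to use and that the second is a find/replace pair applied to the captured text. That reading of the configuration is an assumption, and the functions parse_rule and make_synonym are hypothetical rather than taken from the import scripts.

```python
import re


def parse_rule(rule):
    """Split '/a/' or '/a/b/' into its slash-delimited parts.

    Simplification: escaped slashes inside a pattern are not handled.
    """
    if not (rule.startswith("/") and rule.endswith("/")):
        raise ValueError(f"expected a slash-delimited rule, got {rule!r}")
    return rule[1:-1].split("/")


def make_synonym(name, match_rule, replace_rule):
    """Return a synonym for name, or None if the match rule does not apply."""
    (pattern,) = parse_rule(match_rule)
    find, replacement = parse_rule(replace_rule)
    match = re.search(pattern, name)
    if match is None:
        return None
    captured = match.group(1)  # text selected by the first capture group
    return re.sub(find, replacement, captured)


# Rules copied from the [SCAFFOLD_NAMES] example; the names are made up.
print(make_synonym("scaf_0001", "/(.+)/", "/scaf_/scaffold/"))
print(make_synonym("ctg_12", "/(.+)/", "/ctg_/contig/"))
```

Under these assumptions, a name that does not contain the find pattern yields a synonym identical to the original name, so a real pipeline would likely want to skip such entries.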