Step 1.2: Set up database connections

🚧

Multiple .ini files

All configuration options are stored in .ini files. This ensures reproducibility, as all options must be saved before a script is executed, and avoids the need for numerous command-line flags. Where practical, different scripts use common .ini files, reading only those parameters that are relevant. However, the options for easy import can be conceptually divided into four distinct groups, and it is convenient to keep the options for each of these groups in a separate .ini file:

  • Server/Ensembl instance configuration
  • Database hosting
  • Core import (genome assembly/annotation specific data)
  • Compara import (multiple assembly comparative data)

This means that database connection data are repeated across multiple .ini files, so care must be taken to use the correct template when making changes to the default settings.
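Because the same connection settings appear in more than one file, a quick consistency check can catch a value that was changed in one place only. The following is a minimal sketch, not part of the distributed scripts: the file name core-import.ini, the sample contents, and the helpers load_ini and find_mismatches are all assumptions made for illustration.

```python
import configparser


def load_ini(text):
    """Parse .ini text; interpolation is disabled so values are read literally."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def find_mismatches(configs, section, keys):
    """Return {key: {file_name: value}} for keys whose values differ between files."""
    mismatches = {}
    for key in keys:
        values = {
            name: cfg.get(section, key)
            for name, cfg in configs.items()
            if cfg.has_option(section, key)
        }
        if len(set(values.values())) > 1:
            mismatches[key] = values
    return mismatches


# Hypothetical contents of two .ini files, for illustration only.
SETUP_DB = """
[DATABASE]
DB_HOST = localhost
DB_PORT = 3306
"""
CORE_IMPORT = """
[DATABASE]
DB_HOST = localhost
DB_PORT = 3307
"""

configs = {
    "setup-db.ini": load_ini(SETUP_DB),
    "core-import.ini": load_ini(CORE_IMPORT),  # assumed file name
}
print(find_mismatches(configs, "DATABASE", ["DB_HOST", "DB_PORT"]))
```

In real use the strings would be replaced by the contents of the files being edited; an empty result means the listed keys agree wherever they are present.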

To host an Ensembl mirror with remotely hosted data, at least one local database must be created with write access. To host additional data locally and to allow data import, additional users/databases must be created. These instructions assume that both the webserver and database are on localhost. Use of separate hosts is supported (in which case this script may be run on a different host to the rest of Stage 1) but will require changes to /etc/mysql/my.cnf to allow external connections.

cd ~/ei/em
./setup-databases.sh ../conf/setup-db.ini

Configuration options

setup-db.ini provides the following options:

[DATABASE]
    DB_USER = anonymous
    DB_PASS =

    DB_SESSION_USER = ensrw
    DB_SESSION_PASS = ensrw

    DB_IMPORT_USER = importer
    DB_IMPORT_PASSWORD = importpassword

    DB_ROOT_USER = root
    DB_ROOT_PASSWORD = secretpassword
    DB_PORT = 3306
    DB_HOST = localhost

This section holds the root user connection details and the user names (and passwords) of the database users to be created. DB_USER has SELECT permissions only and will be used as the 'ro' user for the Ensembl instance. DB_SESSION_USER has permissions on the ensembl_accounts database and will be used as the 'rw' user for the Ensembl instance. DB_IMPORT_USER has more extensive permissions on all databases and will be used during Core and Compara Import.

[WEBSITE]
    ENSEMBL_WEBSITE_HOST = localhost

The name of the website host, ENSEMBL_WEBSITE_HOST (on which Step 1.1, etc. are run), is used when setting up the database users. If this is anything other than localhost, changes will be required to /etc/mysql/my.cnf to support external connections.
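The way the [DATABASE] users and ENSEMBL_WEBSITE_HOST combine into MySQL accounts can be pictured with the sketch below. It only builds GRANT statements as strings and does not connect to a server. Only the SELECT privilege for DB_USER is stated in this guide; the other privilege sets, the ROLES table and the grant_statements helper are assumptions for illustration and may differ from what setup-databases.sh actually runs.

```python
# (option name, privileges, scope). Apart from SELECT for DB_USER,
# the privilege sets are ASSUMED for illustration.
ROLES = [
    ("DB_USER", "SELECT", "*.*"),
    ("DB_SESSION_USER", "SELECT, INSERT, UPDATE, DELETE", "`ensembl_accounts`.*"),
    ("DB_IMPORT_USER", "ALL PRIVILEGES", "*.*"),
]


def grant_statements(settings, website_host):
    """Build one GRANT statement per configured user, scoped to the website host."""
    statements = []
    for user_key, privileges, scope in ROLES:
        user = settings[user_key]
        statements.append(
            f"GRANT {privileges} ON {scope} TO '{user}'@'{website_host}';"
        )
    return statements


# Values taken from the default setup-db.ini shown above.
settings = {
    "DB_USER": "anonymous",
    "DB_SESSION_USER": "ensrw",
    "DB_IMPORT_USER": "importer",
}
for statement in grant_statements(settings, "localhost"):
    print(statement)
```

The host part of each account is why a non-localhost ENSEMBL_WEBSITE_HOST matters: an account defined for one host cannot be used to connect from another.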

[DATA_SOURCE]
    ENSEMBL_DB_URL = ftp://ftp.ensembl.org/pub/current_mysql/
    ENSEMBL_DB_REPLACE =
    ENSEMBL_DBS = [ ensembl_accounts ]

    EG_DB_URL = ftp://ftp.ensemblgenomes.org/pub/current/pan_ensembl/mysql/
    EG_DB_REPLACE = 1
    EG_DBS = [ ncbi_taxonomy ensembl_website_84 ]

    SPECIES_DB_URL = ftp://ftp.ensemblgenomes.org/pub/current/metazoa/mysql/
    SPECIES_DB_REPLACE =
    SPECIES_DB_AUTO_EXPAND =
    SPECIES_DBS = [ bombyx_mori_core_31_84_1 ]

    MISC_DB_URL =
    MISC_DB_REPLACE =
    MISC_DBS =

Locations and names of database dumps to fetch and load locally.

  • ENSEMBL_DB_URL - the URL containing the Ensembl database dumps
  • ENSEMBL_DB_REPLACE - a flag to specify whether to overwrite databases that already exist on the DB_HOST
  • ENSEMBL_DBS - a space-separated list of database dump names in square brackets. ensembl_accounts is required; all others are optional
  • Equivalent variables (EG_DB_URL, EG_DB_REPLACE, EG_DBS) may be set to fetch EnsemblGenomes database dumps, and the MISC_DB_URL, MISC_DB_REPLACE and MISC_DBS variables support situations where the required databases are spread across multiple hosts.
  • An additional variable may be set for species databases, SPECIES_DB_AUTO_EXPAND - a space-separated list of database types to use as replacement strings for core, to facilitate downloading multiple database types for each species in SPECIES_DBS
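To make the list syntax and the SPECIES_DB_AUTO_EXPAND substitution concrete, here is a minimal sketch of how the values could be interpreted. It is not the code used by setup-databases.sh: the helpers parse_db_list and expand_species_dbs are hypothetical, the choice to keep the original core name alongside the expanded names is an assumption, and the types variation and otherfeatures are illustrative values that may not exist for every species.

```python
def parse_db_list(value):
    """Turn '[ a b ]' (or 'a b') into ['a', 'b']; an empty value gives []."""
    return value.strip().strip("[]").split()


def expand_species_dbs(dbs, db_types):
    """For each name, keep it and add one copy per type with 'core' swapped out."""
    expanded = []
    for db in dbs:
        expanded.append(db)
        for db_type in db_types:
            candidate = db.replace("_core_", f"_{db_type}_")
            if candidate != db:  # skip names that contain no '_core_' part
                expanded.append(candidate)
    return expanded


species_dbs = parse_db_list("[ bombyx_mori_core_31_84_1 ]")
auto_expand = parse_db_list("variation otherfeatures")  # illustrative types

for name in expand_species_dbs(species_dbs, auto_expand):
    print(name)
```

With an empty SPECIES_DB_AUTO_EXPAND, as in the default file, the list is returned unchanged.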