Step 1.3: git clone ensembl repositories

📘

Reusing this script

Run the update-ensembl-code.sh script again (as Step 3.1) after importing data in Stage 2/Stage 3. To set up a mirror site, continue directly to Step 3.1: Update Ensembl webcode, which gives full details of the additional configuration options.

Running the update-ensembl-code.sh script clones (git clone) or updates (git pull) the set of (mostly Ensembl) repositories required for Core/Compara import and for hosting an Ensembl instance.

cd ~/ei/em
./update-ensembl-code.sh ../conf/setup.ini
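
The clone-or-pull behaviour described above can be sketched roughly as follows. This is a minimal illustration, not code taken from update-ensembl-code.sh: the function name clone_or_pull and the way the repository URL is built from ENSEMBL_URL are assumptions.

```shell
#!/bin/sh
# Hypothetical helper, NOT taken from update-ensembl-code.sh.
# Clone a repository on the first run; fast-forward it on later runs.
clone_or_pull() {
  url=$1      # repository URL
  branch=$2   # branch to check out, e.g. release/84
  dir=$3      # local checkout directory
  if [ -d "$dir/.git" ]; then
    # A checkout already exists: update it without creating merge commits.
    git -C "$dir" pull --ff-only origin "$branch"
  else
    # No checkout yet: clone the requested branch.
    git clone --branch "$branch" "$url" "$dir"
  fi
}

# Example call (assumes ENSEMBL_URL and ENSEMBL_BRANCH hold the setup.ini values):
# clone_or_pull "$ENSEMBL_URL/ensembl-webcode" "$ENSEMBL_BRANCH" ensembl-webcode
```

Because the function only checks for an existing .git directory, calling it a second time is safe, which matches the expectation that the script is run again later in the workflow.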

📘

Testing your setup

With the basic setup complete, you can now jump directly to Step 3.2: Reload Ensembl website to launch a local Ensembl instance.

Configuration options

Two stanzas in setup.ini are relevant at this stage:

[DATABASE]
    DB_HOST = localhost
    DB_PORT = 3306
    DB_USER = anonymous
    DB_PASS =

    DB_SESSION_HOST = localhost
    DB_SESSION_PORT = 3306
    DB_SESSION_USER = ensrw
    DB_SESSION_PASS = ensrw

    DB_FALLBACK_HOST = mysql-eg-publicsql.ebi.ac.uk
    DB_FALLBACK_PORT = 4157
    DB_FALLBACK_USER = anonymous
    DB_FALLBACK_PASS =

    DB_FALLBACK2_HOST = ensembldb.ensembl.org
    DB_FALLBACK2_PORT = 3306
    DB_FALLBACK2_USER = anonymous
    DB_FALLBACK2_PASS =

Four subsections with DB_[*_]HOST, DB_[*_]PORT, DB_[*_]USER and DB_[*_]PASS variables specify connection settings for:

  • DB_HOST etc. - the primary database host with species/multi-species databases.

  • DB_SESSION_HOST etc. - the database holding user-specific session information. This is typically the only database that requires read-write access and therefore a password-protected connection.

  • DB_FALLBACK_HOST etc. - to reduce the amount of locally hosted data, it is often desirable to use alternate sources for some databases. The DB_FALLBACK_HOST host will be queried to find any required databases that are not available on DB_HOST.

  • DB_FALLBACK2_HOST etc. - especially with EnsemblGenomes sites, remote databases may be found on more than one host. The DB_FALLBACK2_HOST host will be queried to find any required databases that are not available on DB_HOST or DB_FALLBACK_HOST.
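
The lookup order (DB_HOST, then DB_FALLBACK_HOST, then DB_FALLBACK2_HOST) can be illustrated with a short sketch that uses the standard mysql command-line client. The function name find_db_host and the host:port argument format are assumptions made for this example, and the anonymous user with an empty password follows the setup.ini values shown above; the Ensembl webcode performs its own lookup internally.

```shell
#!/bin/sh
# Hypothetical sketch of the fallback order; NOT code from the Ensembl webcode.
# Usage: find_db_host DB_NAME HOST:PORT [HOST:PORT ...]
# Prints the first HOST:PORT that has DB_NAME and returns 0; returns 1 otherwise.
find_db_host() {
  db=$1
  shift
  for hostport in "$@"; do
    host=${hostport%:*}
    port=${hostport#*:}
    # -N omits the column header; grep -Fx requires an exact, whole-line match.
    if mysql -h "$host" -P "$port" -u anonymous -N -e "SHOW DATABASES" 2>/dev/null |
        grep -Fxq "$db"; then
      printf '%s\n' "$hostport"
      return 0
    fi
  done
  return 1
}

# Example call (the database name is a placeholder):
# find_db_host some_species_core_31_84_1 \
#   localhost:3306 mysql-eg-publicsql.ebi.ac.uk:4157 ensembldb.ensembl.org:3306
```

Hosts are tried strictly in the order given, so a database present on DB_HOST is never looked up on either fallback host.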


[REPOSITORIES]
    ENSEMBL_URL = https://github.com/Ensembl
    ENSEMBL_BRANCH = release/84

    BIOPERL_URL = https://github.com/bioperl
    BIOPERL_BRANCH = master

    EG_METAZOA_PLUGIN_URL = https://github.com/EnsemblGenomes/eg-web-metazoa
    EG_METAZOA_PLUGIN_BRANCH = release/eg/31
    EG_METAZOA_PLUGIN_PACKAGE = EG::Metazoa

    API_PLUGIN_URL = https://github.com/EnsemblGenomes/ensemblgenomes-api
    API_PLUGIN_BRANCH = release/eg/31
    API_PLUGIN_PACKAGE = EG::API

    EG_COMMON_PLUGIN_URL = https://github.com/EnsemblGenomes/eg-web-common
    EG_COMMON_PLUGIN_BRANCH = release/eg/31
    EG_COMMON_PLUGIN_PACKAGE = EG::Common

    PUBLIC_PLUGINS = [ ]

Connection and branch information for the GitHub repositories to be cloned:

  • ENSEMBL_URL/ENSEMBL_BRANCH and BIOPERL_URL/BIOPERL_BRANCH are always required. Several Ensembl repositories are cloned from ENSEMBL_URL, both for the Ensembl website and to support the import pipeline.
  • Additional plugins can be loaded from any public git repository by specifying <NAME>_PLUGIN_URL, <NAME>_PLUGIN_BRANCH and <NAME>_PLUGIN_PACKAGE as above. Plugins specified in this way are added to ensembl-webcode/conf/Plugins.pm in the order they are listed in the .ini file, so those at the top of the list override functions in plugins further down the list.
  • Plugins from the Ensembl public-plugins repository can be loaded by listing the directory and package name of each plugin in PUBLIC_PLUGINS, as in PUBLIC_PLUGINS = [ ensembl|EnsEMBL::Ensembl genoverse|EnsEMBL::Genoverse ]
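
To make the PUBLIC_PLUGINS format concrete, the sketch below reads a value from a setup.ini-style file and splits each directory|package entry. Both helper names (ini_get and list_public_plugins) are hypothetical and written only for this illustration; they are not part of the easy-import scripts, and the parsing assumes values that contain no "=" character.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; NOT part of the easy-import scripts.

# ini_get FILE SECTION KEY: print the value of KEY within [SECTION].
ini_get() {
  awk -F= -v section="[$2]" -v key="$3" '
    /^\[/ { current = $1; next }        # remember which stanza we are in
    current == section {
      k = $1; gsub(/[ \t]/, "", k)      # key name with indentation removed
      if (k == key) {
        v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v)
        print v; exit
      }
    }' "$1"
}

# list_public_plugins VALUE: print "directory<TAB>package" for each entry.
list_public_plugins() {
  value=${1#\[}      # drop the opening bracket
  value=${value%\]}  # drop the closing bracket
  for entry in $value; do   # entries are separated by whitespace
    printf '%s\t%s\n' "${entry%%|*}" "${entry#*|}"
  done
}

# Demonstration with a throwaway file that mimics the [REPOSITORIES] stanza.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[REPOSITORIES]
    ENSEMBL_BRANCH = release/84
    PUBLIC_PLUGINS = [ ensembl|EnsEMBL::Ensembl genoverse|EnsEMBL::Genoverse ]
EOF
list_public_plugins "$(ini_get "$conf" REPOSITORIES PUBLIC_PLUGINS)"
rm -f "$conf"
```

The demonstration prints one tab-separated line per plugin: the directory first, then the package name. An empty list (PUBLIC_PLUGINS = [ ]) produces no output.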

The remaining section in setup.ini, [DATA_SOURCE], can also be edited at this stage - see Step 3.1: Update Ensembl webcode for details.