Step 2.1: Fetch/summarise assembly/annotation files
Optional?
This step may be considered optional, as the files will be retrieved in subsequent stages if a local copy is not already present in the working directory. However, it is useful to have a local copy of the files and the summary statistics generated by this step when deciding how to process the .gff file and which information to assign to stable_ids, synonyms and descriptions in subsequent steps.
Sequence, annotation and other files can be retrieved from a variety of locations using wget, scp or cp, as appropriate to the location. Compressed files will be unzipped automatically. This ensures that the original file locations can be stored in the .ini file.
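The choice between wget, scp and cp can be pictured with a small shell sketch. It is an illustration only and is not taken from the import scripts: the function names choose_fetch_tool and fetch are hypothetical, and the rule used to recognise each kind of location (a URL scheme for wget, a host:path form for scp, anything else for cp) is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: pick a retrieval tool from the form of the location.
# Assumption: URLs go to wget, host:path goes to scp, anything else is a local path.
choose_fetch_tool() {
    case "$1" in
        http://*|https://*|ftp://*) echo wget ;;
        *:*) echo scp ;;   # e.g. user@host:/data/file.gff.gz
        *) echo cp ;;      # local path (must not contain a colon)
    esac
}

# Hypothetical helper: copy a file into the current directory and unzip it if needed.
fetch() {
    tool=$(choose_fetch_tool "$1")
    file=$(basename "$1")
    case "$tool" in
        wget) wget -q "$1" ;;
        scp)  scp "$1" . ;;
        cp)   cp "$1" . ;;
    esac
    # Compressed files are unzipped so later steps see the plain file.
    case "$file" in
        *.gz) gunzip -f "$file" ;;
    esac
}
```

Calling fetch with a location would leave an uncompressed copy in the current working directory, while the location itself stays unchanged in the configuration.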
Working directory
All scripts in stage 2 assume that data files are present in the current working directory. It is therefore important to cd to the directory in which you want local copies of the files to be created, and to use relative or absolute paths to the scripts and config files.
mkdir ~/import
cd ~/import
perl ../ei/core/summarise_files.pl ../ei/conf/core-import.ini
summarise_files.pl will create a summary subdirectory in the current working directory containing a summary of the attributes associated with each feature type in the .gff file. This is useful when setting options in Step 2.3: Prepare the gff file for import, either to retrieve information from particular attributes or to fix broken files.
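To give a feel for what a per-feature-type attribute summary contains, the sketch below counts how often each attribute key appears for each feature type in a GFF3 file. It is a rough stand-in, not the output format of summarise_files.pl, which is not shown here; the function name summarise_gff_attributes is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: count attribute keys per feature type in a GFF3 file.
# Output columns: feature type, attribute key, number of occurrences.
summarise_gff_attributes() {
    awk -F '\t' '
        # Skip comments, directives and lines without all nine GFF columns.
        /^#/ || NF < 9 { next }
        {
            # Column 9 holds key=value pairs separated by semicolons.
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++) {
                if (split(attrs[i], kv, "=") < 2) continue
                count[$3 "\t" kv[1]]++
            }
        }
        END { for (k in count) print k "\t" count[k] }
    ' "$1" | sort
}
```

Running summarise_gff_attributes on the unzipped .gff file lists, for example, whether every gene feature carries a Name attribute, which helps decide where stable_ids and descriptions should come from.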
Configuration options
Only the [FILES] stanza of core-import.ini is used at this stage.
[FILES]
SCAFFOLD = [ fa http://www.bioinformatics.nl/wintermoth/data_files/Obru1.fsa.gz ]
GFF = [ gff3 http://www.bioinformatics.nl/wintermoth/data_files/Obru_genes.gff.gz ]
PROTEIN = [ fa http://www.bioinformatics.nl/wintermoth/data_files/ObruPep.fasta.gz ]
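Before running the step, it can help to check which entries the [FILES] stanza actually contains. The sketch below lists the key, file type and location of each entry. It assumes the exact spacing shown above (KEY = [ type location ]) and is not part of the import scripts; the function name list_files_stanza is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: list KEY, file type and location from the [FILES] stanza.
# Assumes entries are written exactly as "KEY = [ type location ]".
list_files_stanza() {
    awk '
        # A line starting with "[" opens a stanza; track whether it is [FILES].
        /^\[/ { in_files = ($1 == "[FILES]"); next }
        # Inside [FILES], print key, type and location as tab-separated columns.
        in_files && $2 == "=" && $3 == "[" { print $1 "\t" $4 "\t" $5 }
    ' "$1"
}
```

For example, list_files_stanza ../ei/conf/core-import.ini would print one line per configured file, which makes a mistyped location easier to spot.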