
Step 2.1: Fetch/summarise assembly/annotation files


[block:callout]
{
  "type": "info",
  "title": "Optional?",
  "body": "This step may be considered optional, as the files will be retrieved in subsequent stages if a local copy is not already present in the working directory. However, it is useful to have access to a local copy of the files and the summary statistics generated by this step when determining how to process the ``.gff`` file and which information to assign to stable_ids, synonyms and descriptions during subsequent steps."
}
[/block]
Sequence, annotation and other files can be retrieved from a variety of locations using ``wget``, ``scp`` or ``cp``, as appropriate to the location. Compressed files are unzipped automatically. This means the original file locations can be stored in the ``.ini`` file.
[block:callout]
{
  "type": "warning",
  "title": "Working directory",
  "body": "All scripts in stage 2 assume that data files are present in the current working directory. It is therefore important to ``cd`` to the directory in which you want local copies of the files to be created, and to use relative or absolute paths to the scripts/config files."
}
[/block]
```
mkdir ~/import
cd ~/import
perl ../ei/core/summarise_files.pl ../ei/conf/core-import.ini
```
``summarise_files.pl`` creates a ``summary`` subdirectory in the current working directory, containing a summary of the attributes associated with each feature type in the ``.gff`` file. This is useful when setting options in [Step 2.3: Prepare the gff file for import](doc:step-23-prepare-the-gff-file-for-import), either to retrieve information from particular attributes or to fix broken files.
[block:api-header]
{
  "type": "basic",
  "title": "Configuration options"
}
[/block]
Only the [[FILES]](doc:files-core) stanza of ``core-import.ini`` is used at this stage.
```
[FILES]
  SCAFFOLD = [ fa http://www.bioinformatics.nl/wintermoth/data_files/Obru1.fsa.gz ]
  GFF = [ gff3 http://www.bioinformatics.nl/wintermoth/data_files/Obru_genes.gff.gz ]
  PROTEIN = [ fa http://www.bioinformatics.nl/wintermoth/data_files/ObruPep.fasta.gz ]
```
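To make the retrieval rules above more concrete, the following shell sketch shows one way a location from the ``[FILES]`` stanza could be mapped to ``wget``, ``scp`` or ``cp``, skipping the fetch when a local copy already exists. It is an illustration only and not the logic used by ``summarise_files.pl``: the function name ``fetch_command`` is hypothetical, and the rules for recognising a remote location (a URL scheme for ``wget``, a colon for ``scp``) and a compressed file (a ``.gz`` suffix) are assumptions.

```shell
# Hypothetical helper, for illustration only: prints the command that
# could be used to fetch one location listed in the [FILES] stanza.
fetch_command() {
  location=$1
  name=${location##*/}            # file name without the path or URL prefix
  # A compressed or already-unzipped local copy means nothing to fetch.
  if [ -e "$name" ] || [ -e "${name%.gz}" ]; then
    echo "skip $name"
    return 0
  fi
  case "$location" in
    http://*|https://*|ftp://*) echo "wget $location" ;;
    *:*)                        echo "scp $location ." ;;  # assumed host:path form
    *)                          echo "cp $location ." ;;
  esac
}

fetch_command http://www.bioinformatics.nl/wintermoth/data_files/Obru_genes.gff.gz
```

Run in an empty working directory, the final call prints a ``wget`` command for the GFF file. Once ``Obru_genes.gff`` or ``Obru_genes.gff.gz`` is present in the working directory, it prints ``skip`` instead.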