{"_id":"5739b2caf6877e170078f0f2","__v":7,"user":"573592b84b0ab120000b7d44","category":{"_id":"5735e5d9e4824c3400aa1f23","__v":0,"version":"5735936aafab441700723a53","project":"5735936aafab441700723a50","sync":{"url":"","isSync":false},"reference":false,"createdAt":"2016-05-13T14:34:01.858Z","from_sync":false,"order":9,"slug":"configuration-options-core-import","title":"Configuration Options (Core Import)"},"project":"5735936aafab441700723a50","parentDoc":null,"version":{"_id":"5735936aafab441700723a53","__v":12,"project":"5735936aafab441700723a50","createdAt":"2016-05-13T08:42:18.615Z","releaseDate":"2016-05-13T08:42:18.615Z","categories":["5735936aafab441700723a54","5735a32931a73b1700887c94","5735b55beceb872200abbc6c","5735b56eb667601700d3bd6f","5735b9ba4b0ab120000b7dd4","5735b9c94b0ab120000b7dd5","5735cb131f16241700c8a0f7","5735e5c4e4824c3400aa1f21","5735e5d9e4824c3400aa1f23","5735e5f2ec67f6290013ac72","573ecfe0804f901700a9dfc7","573f276c7eeb8b190094ca7d"],"is_deprecated":false,"is_hidden":false,"is_beta":false,"is_stable":false,"codename":"","version_clean":"1.0.0","version":"1.0"},"updates":[],"next":{"pages":[],"description":""},"createdAt":"2016-05-16T11:45:14.700Z","link_external":false,"link_url":"","githubsync":"","sync_unique":"","hidden":false,"api":{"results":{"codes":[]},"settings":"","auth":"required","params":[],"url":""},"isReference":false,"order":11,"body":"[block:callout]\n{\n  \"type\": \"info\",\n  \"title\": \"Understanding the [GFF] syntax\",\n  \"body\": \"The ``[GFF]`` stanza uses a meta-syntax to set options for [gff parser](https://github.com/rjchallis/gff-parser).  This method of configuration maintains a lot of flexibility in the variations of ``.gff`` file that can be processed, and is particularly useful for [Repairing gff](doc:repairing-gff), but may appear slightly intimidating at first.  
\\n\\nIf you are unsure how to relate this to your ``.gff`` file after reading this documentation, a good place to start is to use the same settings as the example below.  Setting fewer ``EXPECTATION``s for \\\"clean\\\" ``.gff`` will save a little processing time, but these settings will not cause problems for most files.  If your ``.gff`` has characteristics that need more conditions, running the script in [Step 2.3: Prepare the gff file for import](doc:step-23-prepare-the-gff-file-for-import) should give informative error messages that can be compared to the examples (try pasting the error/warning into the search box) to show you what to do.\"\n}\n[/block]\n```\n[GFF]\n  ;  SPLIT = [ ##FASTA GFF CONTIG ]\n  SORT = 1\n  CHUNK = [ change region ]\n  ;  CHUNK = [ separator\t\t### ]\n  CONDITION1 = [ MULTILINE   CDS ]\n  CONDITION1a = [ MULTILINE  five_prime_UTR ]\n  CONDITION1b = [ MULTILINE  three_prime_UTR ]\n  CONDITION2 = [ EXPECTATION cds\t hasSister exon force ]\n  CONDITION3 = [ EXPECTATION cds\t hasParent mrna force ];\n  CONDITION4 = [ EXPECTATION exon\t hasParent mrna force ];\n  CONDITION4a = [ EXPECTATION five_prime_UTR hasParent mrna force ];\n  CONDITION4b = [ EXPECTATION three_prime_UTR  hasParent mrna force ];\n  CONDITION5 = [ EXPECTATION mrna\t hasParent gene force ];\n  CONDITION10 = [ EXPECTATION cds|exon|mrna|three_prime_UTR|five_prime_UTR|gene <=[_start,_end] SELF warn ];\n```\n\n- For files with fasta sequence included at the end, ``SPLIT`` splits the gff file on the specified keyword (``##FASTA``) and assigns the resulting subfiles to the [[FILES]](doc:files-core) handles ``GFF`` and ``CONTIG``.\n- ``SORT`` is a flag that determines whether the file is sorted prior to processing.  
This is a basic sort that groups each sequence region into a contiguous block in the sorted file, allowing the file to be processed in chunks for much faster performance.\n- ``CHUNK`` causes the file to be processed in independent chunks, which is much more efficient than reading the entire file into memory, particularly when there are many validation steps.\n  - For sorted files, specifying ``change region`` will split the file into a separate chunk for each sequence region.\n  - Alternatively, for files with additional formatting rows, the file may be split on specific ``separator``s.\n- Most other keys (e.g. ``CONDITION1``) can have any name and will be used to set validation conditions.\n  - Each feature in a ``.gff`` file should have a unique ID.  Specifying ``MULTILINE`` allows a single feature (an individual CDS, for example) to be defined across multiple lines.\n  - ``EXPECTATION``s can be set for individual feature types (or pipe-separated sets of feature types) and may be of type ``hasParent <type>`` (feature has a parent feature of the named type) or ``hasSister <type>`` (feature shares a parent with a feature of the named type at overlapping coordinates), or one of a set of comparison operators ``<``, ``<=``, ``==``, ``>=``, ``>``.  \n  - For each expectation, the behaviour of the validator can be set to ``ignore``, ``warn``, ``find`` a matching feature, ``make`` a matching feature, ``force`` (``find`` followed by ``make``), or ``die``.","excerpt":"","slug":"gff-core","type":"basic","title":"[GFF]"}
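
To make the ``SPLIT``, ``SORT`` and ``CHUNK = [ change region ]`` options more concrete, here is a minimal Python sketch of the behaviour they describe: split the text at a ``##FASTA`` line, group feature lines by sequence region (column 1), and yield one chunk per region. This is only an illustration under stated assumptions, not the gff-parser implementation; the function names ``split_on_keyword`` and ``chunk_by_region`` and the example records are hypothetical.

```python
from itertools import groupby


def region_of(line):
    """Return the sequence region (column 1) of a tab-separated GFF line."""
    return line.split("\t")[0]


def split_on_keyword(text, keyword="##FASTA"):
    """Split GFF text into (annotation_lines, sequence_lines) at the keyword line."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == keyword:
            return lines[:i], lines[i + 1:]
    return lines, []  # keyword absent: everything is annotation


def chunk_by_region(gff_lines):
    """Sort feature lines by region and yield (region, lines) one region at a time."""
    features = [line for line in gff_lines if line and not line.startswith("#")]
    # list.sort() is stable, so lines keep their original order within a region
    features.sort(key=region_of)
    for region, group in groupby(features, key=region_of):
        yield region, list(group)


# Hypothetical input with two regions and a trailing FASTA section
example = "\n".join([
    "##gff-version 3",
    "contig2\tsrc\tgene\t1\t90\t.\t+\t.\tID=g2",
    "contig1\tsrc\tgene\t5\t80\t.\t+\t.\tID=g1",
    "contig2\tsrc\tmRNA\t1\t90\t.\t+\t.\tID=m2;Parent=g2",
    "##FASTA",
    ">contig1",
    "ACGT",
])

gff_part, fasta_part = split_on_keyword(example)
chunks = dict(chunk_by_region(gff_part))
```

Because each chunk holds a single region, validation can run on one chunk at a time without keeping the whole file in memory, which is the efficiency gain attributed to ``CHUNK``.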
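
As a rough illustration of a ``hasParent`` expectation, the Python sketch below checks that every feature of one type has a ``Parent`` of another type and then applies the ``ignore``, ``warn`` or ``die`` behaviour. It is a simplified sketch, not the gff-parser code: the function names are hypothetical, feature types are compared case-insensitively as an assumption, and the ``find``, ``make`` and ``force`` repair behaviours are not implemented.

```python
import warnings


def parse_attributes(column9):
    """Turn 'ID=x;Parent=y' into {'ID': 'x', 'Parent': 'y'}."""
    return dict(pair.split("=", 1) for pair in column9.split(";") if "=" in pair)


def check_has_parent(gff_lines, child_type, parent_type, behaviour="warn"):
    """Return IDs of child_type features whose Parent is not a parent_type feature."""
    types_by_id = {}
    rows = []
    for line in gff_lines:
        cols = line.split("\t")
        attrs = parse_attributes(cols[8])
        ftype = cols[2].lower()  # assumption: type matching ignores case
        rows.append((ftype, attrs))
        if "ID" in attrs:
            types_by_id[attrs["ID"]] = ftype

    failures = [
        attrs.get("ID")
        for ftype, attrs in rows
        if ftype == child_type.lower()
        and types_by_id.get(attrs.get("Parent")) != parent_type.lower()
    ]

    message = f"{child_type} hasParent {parent_type} failed for: {failures}"
    if failures and behaviour == "die":
        raise ValueError(message)
    if failures and behaviour == "warn":
        warnings.warn(message)
    return failures  # 'ignore' reports nothing and simply returns the list


# Hypothetical records: exon e2 points at a gene instead of an mRNA
lines = [
    "contig1\tsrc\tgene\t1\t90\t.\t+\t.\tID=g1",
    "contig1\tsrc\tmRNA\t1\t90\t.\t+\t.\tID=m1;Parent=g1",
    "contig1\tsrc\texon\t1\t40\t.\t+\t.\tID=e1;Parent=m1",
    "contig1\tsrc\texon\t50\t90\t.\t+\t.\tID=e2;Parent=g1",
]

missing = check_has_parent(lines, "exon", "mrna", behaviour="ignore")
```

In this sketch the behaviour only changes how failures are reported; a repairing behaviour such as ``force`` would additionally need to locate or create the missing parent feature.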