[GFF]
Understanding the [GFF] syntax
The [GFF] stanza uses a meta-syntax to set options for the gff parser. This method of configuration maintains a lot of flexibility in the variations of .gff file that can be processed, and is particularly useful for Repairing gff, but may appear slightly intimidating at first. If you are unsure how to relate this to your .gff file after reading this documentation then a good place to start is by just using the same settings as the example below. Setting fewer EXPECTATIONs for "clean" .gff files will save a little processing time, but for most files these settings will not cause any problems. If your .gff has characteristics that need more conditions then running the script in Step 2.3: Prepare the gff file for import should give informative error messages that can be compared to the examples (try pasting the error/warning into the search box) to show you what to do.
[GFF]
; SPLIT = [ ##FASTA GFF CONTIG ]
SORT = 1
CHUNK = [ change region ]
; CHUNK = [ separator ### ]
CONDITION1 = [ MULTILINE CDS ]
CONDITION1a = [ MULTILINE five_prime_UTR ]
CONDITION1b = [ MULTILINE three_prime_UTR ]
CONDITION2 = [ EXPECTATION cds hasSister exon force ]
CONDITION3 = [ EXPECTATION cds hasParent mrna force ];
CONDITION4 = [ EXPECTATION exon hasParent mrna force ];
CONDITION4a = [ EXPECTATION five_prime_UTR hasParent mrna force ];
CONDITION4b = [ EXPECTATION three_prime_UTR hasParent mrna force ];
CONDITION5 = [ EXPECTATION mrna hasParent gene force ];
CONDITION10 = [ EXPECTATION cds|exon|mrna|three_prime_UTR|five_prime_UTR|gene <=[_start,_end] SELF warn ];
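
As a rough illustration of how the meta-syntax in the stanza above breaks down, the following Python sketch reads a few of the lines into keys and token lists. It is not the parser used by the import scripts; the function name `parse_stanza` is hypothetical, and treating a leading `;` as a comment, ignoring a trailing `;`, and splitting bracketed values on whitespace are all assumptions made for this example.

```python
import re

# A shortened copy of the example stanza, used only for illustration.
STANZA = """\
[GFF]
; SPLIT = [ ##FASTA GFF CONTIG ]
SORT = 1
CHUNK = [ change region ]
CONDITION1 = [ MULTILINE CDS ]
CONDITION3 = [ EXPECTATION cds hasParent mrna force ];
CONDITION10 = [ EXPECTATION cds|exon <=[_start,_end] SELF warn ];
"""

def parse_stanza(text):
    """Return {key: value}; bracketed values become lists of tokens."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, the stanza header and lines commented out with ';'
        # (assumed comment convention).
        if not line or line.startswith(";") or line.startswith("["):
            continue
        # Split on the first '=' only, so operators such as '<=' stay in the value.
        key, _, value = line.partition("=")
        value = value.strip().rstrip(";").strip()
        match = re.fullmatch(r"\[\s*(.*?)\s*\]", value)
        options[key.strip()] = match.group(1).split() if match else value
    return options

options = parse_stanza(STANZA)
for key, value in options.items():
    print(key, value)
```

Read this way, each CONDITION key is just a label, and the first token inside the brackets (MULTILINE or EXPECTATION) selects the kind of validation condition being set.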
- For files with fasta sequence included at the end, SPLIT will split the gff file on the specified keyword (##FASTA) and assign the resulting subfiles to the [FILES] handles GFF and CONTIG.
- SORT is a flag to determine whether the file should be sorted prior to processing. This is a basic sort which will result in each sequence region forming a block in the sorted file, allowing the file to be processed in chunks for much faster performance.
- CHUNK causes the file to be processed in independent chunks, which is much more efficient than reading the entire file into memory, particularly if there are a large number of validation steps.
  - for sorted files, specifying change region will split the file into a separate chunk for each sequence region.
  - alternatively, for files with additional formatting rows, the file may be split on specific separators (e.g. ###).
- Most other keys (e.g. CONDITION1) can have any name and will be used to set validation conditions.
  - Each feature in a .gff file should have a unique ID. Specifying MULTILINE allows individual CDS features, for example, to be defined across multiple lines.
  - EXPECTATIONs can be set for individual feature types (or pipe-separated sets of feature types) and may be of type hasParent <type> (feature has a parent feature of the named type) or hasSister <type> (feature shares a parent with a feature of the named type at overlapping coordinates), or one of a set of comparison operators: <, <=, ==, >=, >.
  - For each expectation, the behaviour of the validator can be set to ignore, warn, find a matching feature, make a matching feature, force (find followed by make), or die.
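
The combination of SORT, CHUNK = [ change region ] and a hasParent expectation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the validator itself: the toy GFF rows are invented, the helper names (`parse_feature`, `chunks_by_region`, `check_has_parent`) are hypothetical, feature types are compared case-insensitively as an assumption, and the sketch only reports missing parents rather than finding or making them as force would.

```python
from itertools import groupby

# Toy GFF3 rows (tab-separated, 9 columns); invented data for illustration.
GFF_LINES = [
    "chr2\tsrc\tgene\t50\t400\t.\t+\t.\tID=gene2",
    "chr1\tsrc\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1",
    "chr1\tsrc\tCDS\t150\t300\t.\t+\t0\tID=cds1;Parent=mrna1",
    "chr2\tsrc\tCDS\t60\t200\t.\t+\t0\tID=cds2;Parent=mrna2",
]

def parse_feature(line):
    """Turn one GFF row into a small dict of the fields used below."""
    cols = line.split("\t")
    attrs = dict(pair.split("=", 1) for pair in cols[8].split(";") if pair)
    return {
        "region": cols[0],
        "type": cols[2].lower(),  # assumption: types match case-insensitively
        "start": int(cols[3]),
        "end": int(cols[4]),
        "id": attrs.get("ID"),
        "parent": attrs.get("Parent"),
    }

def chunks_by_region(lines):
    """Sort rows by sequence region, then yield one chunk per region."""
    features = sorted((parse_feature(l) for l in lines), key=lambda f: f["region"])
    for region, group in groupby(features, key=lambda f: f["region"]):
        yield region, list(group)

def check_has_parent(chunk, child_type, parent_type):
    """Return IDs of child_type features with no parent_type parent in the chunk."""
    by_id = {f["id"]: f for f in chunk}
    missing = []
    for feature in chunk:
        if feature["type"] != child_type:
            continue
        parent = by_id.get(feature["parent"])
        if parent is None or parent["type"] != parent_type:
            missing.append(feature["id"])
    return missing

# Each chunk is validated on its own, so only one region is held at a time.
report = {
    region: check_has_parent(chunk, "cds", "mrna")
    for region, chunk in chunks_by_region(GFF_LINES)
}
print(report)
```

Because every feature and its parent share a sequence region, checking each chunk independently gives the same answer as checking the whole file, which is why sorting first makes chunked processing possible.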