
Referencing gff attributes


Gene, transcript and translation [[_STABLE_IDS]](doc:_stable_ids-core), [[_NAMES]](doc:_names-core) and [[_DESCRIPTIONS]](doc:_descriptions-core) can be set based on any attribute of a feature or related feature within a ``.gff`` file by following specific syntactic conventions in the ``.ini`` file.

Given a basic ``.gff``:
```
scaffold1	.	gene	1389	2804	.	+	.	ID=gene1;Name=Eg00001
scaffold1	.	mRNA	1389	2804	.	+	.	ID=mrna1;Parent=gene1;Name=Eg00001-RA
scaffold1	.	CDS	1389	1571	.	+	0	ID=cds1;Parent=mrna1;Name=Eg00001-PA
scaffold1	.	CDS	1881	2054	.	+	0	ID=cds1;Parent=mrna1;Name=Eg00001-PA
scaffold1	.	CDS	2657	2804	.	+	2	ID=cds1;Parent=mrna1;Name=Eg00001-PA
scaffold1	.	exon	1389	1571	.	+	.	ID=exon1;Parent=mrna1
scaffold1	.	exon	1881	2054	.	+	.	ID=exon1;Parent=mrna1
scaffold1	.	exon	2321	2469	.	+	.	ID=exon1;Parent=mrna1
```

There is no information here for [[_NAMES]](doc:_names-core) (i.e. synonyms) or [[_DESCRIPTIONS]](doc:_descriptions-core), and the [[_STABLE_IDS]](doc:_stable_ids-core) in each case should use the corresponding ``Name`` attribute:
```
[GENE_STABLE_IDS]
  GFF = [ gene->Name /(.+)/ ]
[TRANSCRIPT_STABLE_IDS]
  GFF = [ mRNA->Name /(.+)/ ]
[TRANSLATION_STABLE_IDS]
  GFF = [ CDS->Name /(.+)/ ]
```

## Nested feature types

When a ``.gff`` file is parsed, each gene is processed separately. While processing a gene, the script has access to all nested features of that gene. Similarly, while processing a transcript, the script has access to the parent gene and to the nested features of that transcript, but not to alternate transcripts. Translations are processed at the level of the associated transcript.

- The following are therefore all valid (optionally using [Match and replace](doc:match-and-replace) to extract the same string in each case):
  ```
  [GENE_STABLE_IDS]
      GFF = [ gene->Name /(.+)/ ]
      GFF = [ mRNA->Name /(.+)/ /-RA// ]
      GFF = [ CDS->Name /(.+)/ /-PA// ]
  [TRANSCRIPT_STABLE_IDS]
      GFF = [ mRNA->Name /(.+)/ ]
      GFF = [ gene->Name /(.+)/ /(.+)/$1-RA/ ]
      GFF = [ CDS->Name /(.+)/ /-PA/-RA/ ]
  [TRANSLATION_STABLE_IDS]
      GFF = [ CDS->Name /(.+)/ ]
      GFF = [ gene->Name /(.+)/ /(.+)/$1-PA/ ]
      GFF = [ mRNA->Name /(.+)/ /-RA/-PA/ ]
  ```

## The ``SELF`` keyword

- The keyword ``SELF`` always refers to the current gene/transcript feature.

- The following are equivalent ways of referring to the same ``gene`` attribute:
  ```
  [GENE_STABLE_IDS]
      GFF = [ gene->Name /(.+)/ ]
      GFF = [ SELF->Name /(.+)/ ]
  ```

- Transcripts may have different types, so the following are non-equivalent:
  ```
  [TRANSCRIPT_STABLE_IDS]
      GFF = [ mRNA->Name /(.+)/ ]
      GFF = [ SELF->Name /(.+)/ ]
  ```
  - ``GFF = [ mRNA->Name /(.+)/ ]`` will only return a stable_id for transcripts of type ``mRNA``.
  - ``GFF = [ SELF->Name /(.+)/ ]`` will return a stable_id for any transcript type.
  - See [Processing exceptions](doc:processing-exceptions) for an explanation of how to use this distinction when processing ``.gff`` with multiple transcript types.

- For translations, ``SELF`` refers to the parent transcript:
  ```
  [TRANSLATION_STABLE_IDS]
      GFF = [ CDS->Name /(.+)/ ]
      GFF = [ SELF->Name /(.+)/ /-RA/-PA/ ]
  ```
  To obtain the same ID as the ``CDS`` rule, use [Match and replace](doc:match-and-replace) to substitute ``-PA`` for ``-RA`` (useful for files in which ``CDS`` features lack ``Name`` attributes).
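
The effect of a rule such as ``GFF = [ SELF->Name /(.+)/ /-RA/-PA/ ]`` can be sketched in Python. This is only an illustration, not the script's actual implementation: ``apply_rule`` is a hypothetical helper, and it assumes (as the examples above suggest) that the first pattern captures a value and the optional second pair is a substitution applied to that value.

```python
import re


def apply_rule(value, match, substitution=None):
    """Hypothetical helper mimicking `/match/ /pattern/replacement/`.

    Assumption: the first capture group of `match` is the extracted value,
    and `substitution` (pattern, replacement) is then applied to it once.
    """
    found = re.search(match, value)
    if found is None:
        return None  # no match: the rule yields nothing
    result = found.group(1) if found.groups() else found.group(0)
    if substitution is not None:
        pattern, replacement = substitution
        result = re.sub(pattern, replacement, result, count=1)
    return result


# Name of the parent transcript in the example .gff above
transcript_name = "Eg00001-RA"

# Equivalent of: GFF = [ SELF->Name /(.+)/ /-RA/-PA/ ]
print(apply_rule(transcript_name, r"(.+)", (r"-RA", "-PA")))  # Eg00001-PA
```

With the example file, this gives the same translation ID as reading ``Name`` directly from the ``CDS`` feature.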

## The ``DAUGHTER`` keyword

- The keyword ``DAUGHTER`` refers to the first child of the current feature and is most useful for retrieving gene attributes from a daughter transcript of any type:
  ```
  [GENE_STABLE_IDS]
      GFF = [ mRNA->Name /(.+)/ ]
      GFF = [ DAUGHTER->Name /(.+)/ ]
  ```
  - ``GFF = [ mRNA->Name /(.+)/ ]`` will only return a stable_id for genes with daughter features of type ``mRNA``.
  - ``GFF = [ DAUGHTER->Name /(.+)/ ]`` will return a stable_id for genes with a daughter transcript of any type.
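
The difference between a typed selector and ``DAUGHTER`` can be sketched in Python. This is a simplified model under stated assumptions, not the script's actual code: ``resolve`` and the dictionary layout are hypothetical, and the ``tRNA`` gene below is invented for illustration and does not appear in the example file.

```python
def resolve(children, selector, attribute="Name"):
    """Hypothetical resolver for `<selector>->Name` on a gene's children.

    Assumption: `DAUGHTER` picks the first child whatever its type, while a
    type name such as `mRNA` picks the first child of exactly that type.
    """
    for child in children:
        if selector == "DAUGHTER" or child["type"] == selector:
            return child["attributes"].get(attribute)
    return None  # no suitable child: the rule yields no stable_id


# Invented gene whose only daughter transcript is a tRNA
gene_children = [
    {"type": "tRNA", "attributes": {"Name": "Eg00002-RA"}},
]

print(resolve(gene_children, "mRNA"))      # None
print(resolve(gene_children, "DAUGHTER"))  # Eg00002-RA
```

In this model the ``mRNA`` rule finds nothing for a gene without ``mRNA`` daughters, whereas the ``DAUGHTER`` rule still returns a value.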