General feature format
The general feature format (gene-finding format, generic feature format, GFF) is a file format used for describing genes and other features of DNA, RNA and protein sequences. The filename extension associated with such files is .GFF.
There are two versions of the GFF file format in general use:
- General Feature Format Version 2 (Sanger Institute)
- Generic Feature Format Version 3 (Sequence Ontology Project)
Servers that generate this format:
Server | Example file |
---|---|
UniProt | |
Clients that use this format:
Name | Description | Links |
---|---|---|
GBrowse | GMOD genome viewer | GBrowse |
IGB | Integrated Genome Browser | Integrated Genome Browser |
Jalview | A multiple sequence alignment editor & viewer | Jalview |
STRAP | Underlining sequence features in multiple alignments | |
GFF Versions
GFF Version 2 has a number of deficiencies, notably that it can only represent two-level feature hierarchies and thus cannot handle the three-level hierarchy of gene → transcript → exon. GFF3 addresses this and other deficiencies. For example, it supports arbitrarily many hierarchical levels, and gives specific meanings to certain tags in the attributes field.
The gene transfer format (GTF) is a refinement of GFF Version 2 and is sometimes referred to as GFF2.5.[1]
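The gene → transcript → exon hierarchy mentioned above can be illustrated with a minimal Python sketch. It assumes the usual GFF3 layout of nine tab-separated columns, with the ninth column holding semicolon-separated tag=value attributes, of which ID and Parent are the tags used to link features. The function names, the sample records and the "example" source label are hypothetical and exist only for illustration; escaping of special characters and multi-line features are not handled.

```python
from collections import defaultdict

# The nine tab-separated columns of a GFF3 feature line.
GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_attributes(field):
    """Split a GFF3 attributes column into a tag -> value dictionary."""
    attrs = {}
    for pair in field.strip().split(";"):
        if not pair:
            continue
        tag, _, value = pair.partition("=")
        attrs[tag.strip()] = value.strip()
    return attrs


def parse_gff3_line(line):
    """Parse one feature line into a dictionary keyed by column name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(GFF3_COLUMNS):
        raise ValueError(f"expected 9 columns, found {len(values)}")
    record = dict(zip(GFF3_COLUMNS, values))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    record["attributes"] = parse_attributes(record["attributes"])
    return record


def build_children(records):
    """Map each parent ID to the IDs of the features that name it as Parent."""
    children = defaultdict(list)
    for rec in records:
        parent_field = rec["attributes"].get("Parent")
        if parent_field:
            # A feature may list several parents, separated by commas.
            for parent in parent_field.split(","):
                children[parent].append(rec["attributes"].get("ID"))
    return dict(children)


# Hypothetical sample data: one gene, one transcript, two exons.
EXAMPLE = "\n".join([
    "chr1\texample\tgene\t1000\t9000\t.\t+\t.\tID=gene1",
    "chr1\texample\tmRNA\t1000\t9000\t.\t+\t.\tID=tx1;Parent=gene1",
    "chr1\texample\texon\t1000\t1200\t.\t+\t.\tID=exon1;Parent=tx1",
    "chr1\texample\texon\t3000\t3900\t.\t+\t.\tID=exon2;Parent=tx1",
])

records = [parse_gff3_line(line) for line in EXAMPLE.splitlines()]
children = build_children(records)
```

Because each feature simply points at its parent, the same mapping works for any depth of nesting, which is the property that a two-level scheme lacks.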
Validation
The modENCODE project hosts an online GFF3 validation tool with generous limits of 286.10 MB and 15 million lines.
The Genome Tools software collection contains a gff3validator tool that can be used offline to validate and possibly tidy GFF3 files. An online validation service is also available.
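To give a sense of what such validation involves, the following Python sketch performs a few basic structural checks on GFF3 text. It is not the method used by either tool named above and is no substitute for them: it only assumes nine tab-separated columns, integer coordinates with 1 ≤ start ≤ end, a strand of +, -, . or ?, and a phase of 0, 1, 2 or a dot. The function name and the sample input are hypothetical.

```python
VALID_STRANDS = {"+", "-", ".", "?"}
VALID_PHASES = {"0", "1", "2", "."}


def check_gff3_text(text):
    """Return a list of (line number, message) pairs for basic problems."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Skip blank lines, comments and ## directives.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append((lineno, f"expected 9 columns, found {len(cols)}"))
            continue
        start, end, strand, phase = cols[3], cols[4], cols[6], cols[7]
        try:
            start_pos, end_pos = int(start), int(end)
        except ValueError:
            problems.append((lineno, "start and end must be integers"))
        else:
            if start_pos < 1 or start_pos > end_pos:
                problems.append((lineno, "need 1 <= start <= end"))
        if strand not in VALID_STRANDS:
            problems.append((lineno, f"invalid strand {strand!r}"))
        if phase not in VALID_PHASES:
            problems.append((lineno, f"invalid phase {phase!r}"))
    return problems


# Hypothetical sample: line 3 has start > end, line 4 has a bad phase,
# and line 5 is truncated to five columns.
SAMPLE = "\n".join([
    "##gff-version 3",
    "chr1\texample\tgene\t1000\t9000\t.\t+\t.\tID=gene1",
    "chr1\texample\texon\t1200\t1000\t.\t+\t.\tID=exon1",
    "chr1\texample\tCDS\t1000\t1200\t.\t+\t7\tID=cds1",
    "chr1\texample\texon\t1000\t1200",
])

problems = check_gff3_text(SAMPLE)
for lineno, message in problems:
    print(f"line {lineno}: {message}")
```

A full validator checks considerably more, such as whether every Parent refers to an existing ID and whether feature types are valid Sequence Ontology terms.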