General feature format

The general feature format (GFF; also called gene-finding format or generic feature format) is a file format used for describing genes and other features of DNA, RNA and protein sequences. The filename extension associated with such files is .GFF.

Two versions of the GFF file format are in general use: GFF Version 2 and GFF3.

Servers that generate this format include UniProt.

Clients that use this format:

Name     Description
GBrowse  GMOD genome viewer
IGB      Integrated Genome Browser
Jalview  A multiple sequence alignment editor and viewer
STRAP    Underlines sequence features in multiple alignments

GFF versions

GFF Version 2 has a number of deficiencies, notably that it can only represent two-level feature hierarchies and thus cannot handle the three-level hierarchy of gene → transcript → exon. GFF3 addresses this and other deficiencies. For example, it supports arbitrarily many hierarchical levels, and gives specific meanings to certain tags in the attributes field.
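As a rough illustration of how GFF3 expresses a gene → transcript → exon hierarchy, the Python sketch below links features through the ID and Parent tags of the attributes column. The records, feature IDs and helper names (parse_attributes, build_children, depth) are invented for this example, and the parsing is deliberately simplified: it ignores URL-escaped characters and the other reserved tags.

```python
from collections import defaultdict

# Hypothetical GFF3 records (nine tab-separated columns); all IDs are invented.
GFF3_TEXT = "\n".join([
    "##gff-version 3",
    "chr1\texample\tgene\t1000\t5000\t.\t+\t.\tID=gene1;Name=demo",
    "chr1\texample\tmRNA\t1000\t5000\t.\t+\t.\tID=tx1;Parent=gene1",
    "chr1\texample\texon\t1000\t1200\t.\t+\t.\tID=exon1;Parent=tx1",
    "chr1\texample\texon\t3000\t5000\t.\t+\t.\tID=exon2;Parent=tx1",
])


def parse_attributes(field):
    """Split an attributes column ('key=value;key=value') into a dict."""
    attributes = {}
    for pair in field.strip().split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = value
    return attributes


def build_children(text):
    """Map each feature ID to the IDs of features that name it as Parent."""
    children = defaultdict(list)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        columns = line.split("\t")
        attributes = parse_attributes(columns[8])
        feature_id = attributes.get("ID")
        # A Parent value may list several comma-separated IDs.
        for parent in filter(None, attributes.get("Parent", "").split(",")):
            children[parent].append(feature_id)
    return dict(children)


def depth(feature_id, children):
    """Count the hierarchy levels at and below feature_id."""
    below = (depth(child, children) for child in children.get(feature_id, []))
    return 1 + max(below, default=0)


children = build_children(GFF3_TEXT)
print(children)
print(depth("gene1", children))
```

Because each feature only names its parent, the same mechanism extends to any number of levels without changing the file layout.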

The Gene transfer format (GTF) is a refinement of GFF Version 2 and is sometimes referred to as GFF2.5.[1]

Validation

The modENCODE project hosts an online GFF3 validation tool with generous limits of 286.10 MB and 15 million lines.

The GenomeTools software collection contains a gff3validator tool that can be used offline to validate and, where possible, tidy GFF3 files. An online validation service is also available.
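To give a sense of the kind of rule such validators enforce, the Python sketch below checks a single feature line for a few basic structural requirements: nine tab-separated columns, integer coordinates with 1 ≤ start ≤ end, and a recognised strand symbol. The function name check_gff3_line is hypothetical, and this is only a small subset of what a full validator checks; it is not a substitute for the tools mentioned above.

```python
def check_gff3_line(line):
    """Return a list of problems found in one feature line (empty if none)."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        return [f"expected 9 tab-separated columns, found {len(columns)}"]

    problems = []
    try:
        start, end = int(columns[3]), int(columns[4])
    except ValueError:
        problems.append("start and end must be integers")
    else:
        # Coordinates are 1-based and start may not exceed end.
        if start < 1 or start > end:
            problems.append("start must be >= 1 and <= end")

    if columns[6] not in {"+", "-", ".", "?"}:
        problems.append(f"unexpected strand value {columns[6]!r}")
    return problems


# Hypothetical records: the first is well formed, the second is not.
good = "chr1\texample\tgene\t1000\t5000\t.\t+\t.\tID=gene1"
bad = "chr1\texample\tgene\t5000\t1000\t.\t*\t.\tID=gene1"
print(check_gff3_line(good))
print(check_gff3_line(bad))
```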

References

  1. http://gmod.org/wiki/GFF3