Biological data
Biological data are data or measurements collected from biological sources, which are often stored or exchanged in a digital form. Biological data are commonly stored in files or databases. Examples of biological data are DNA base-pair sequences, and population data used in ecology.
Data File Formats
Each file format has been designed with specific needs and outputs in mind.
- AB1 – Chromatogram files produced by Applied Biosystems DNA sequencing instruments
- ACE – A sequence assembly format
- BAM – A binary, compressed form of the SAM format
- BED – The Browser Extensible Data format is used for describing genes and other features of DNA sequences
- CAF – Common Assembly Format for sequence assembly
- EMBL – The flatfile format used by the EMBL to represent database records for nucleotide and peptide sequences from EMBL databases
- FASTA – The FASTA file format, for sequence data. Files sometimes use the extensions FNA or FAA (FASTA nucleic acid or FASTA amino acid).
- FASTQ – The FASTQ file format, for sequence data with per-base quality scores. Quality scores are sometimes stored instead in a separate QUAL file alongside a FASTA file.
- GenBank – The flatfile format used by the NCBI to represent database records for nucleotide and peptide sequences from the GenBank and RefSeq databases
- GFF – The General feature format is used for describing genes and other features of DNA, RNA and protein sequences
- GTF – The Gene transfer format is used to hold information about gene structure.
- NEXUS – A NEXUS file encodes mixed information about genetic sequence data in a block-structured format.
- NWK – The Newick tree format represents graph-theoretical trees with edge lengths using parentheses and commas, and is useful for holding phylogenetic trees.
- PDB – The format for structures of biomolecules deposited in the Protein Data Bank, also used for exchanging protein and nucleic acid structures.
- PHD – Phred output, from the basecalling software Phred
- SAM – Sequence Alignment/Map format, a text format for sequence alignments, chosen as the format for releasing the alignment results of the 1000 Genomes Project.
- SCF – Staden chromatogram files used to store data from DNA sequencing
- SBML – The Systems Biology Markup Language is used to store computational models of biochemical networks
- SFF – Standard Flowgram Format, used to store flowgram data from 454 sequencing instruments
- Stockholm – The Stockholm format for representing multiple sequence alignments
- Swiss-Prot – The flatfile format used to represent database records for protein sequences from the Swiss-Prot database
- VCF – Variant Call Format, a standard created by the 1000 Genomes Project for listing and annotating sequence variants; the project used it to record its entire collection of human variants (with the exception of approximately 1.6 million variants).
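As an illustration of how simple text formats such as FASTA and FASTQ are laid out, the following is a minimal Python sketch of readers for both. The function names `read_fasta` and `read_fastq` are hypothetical. The FASTQ reader assumes the common unwrapped four-line record layout and Phred+33 ASCII encoding of quality scores; neither function performs the full validation that an established library such as Biopython's `Bio.SeqIO` provides, which is the safer choice for real data.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    A record starts with a '>' line; the lines that follow, up to the next
    '>', are joined into one sequence (sequences may wrap over many lines).
    """
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data before the first '>' header")
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (identifier, sequence, quality scores) from FASTQ lines.

    Assumption: every record is exactly four lines (@header, sequence,
    '+' separator, quality string). Records are split by position rather
    than by looking for '@', because '@' is also a valid quality character.
    Qualities are decoded assuming Phred+33: score = ord(char) - 33.
    """
    rows = [line.rstrip("\r\n") for line in lines if line.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("truncated FASTQ: line count is not a multiple of 4")
    for i in range(0, len(rows), 4):
        header, seq, plus, qual = rows[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record {i // 4 + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"record {i // 4 + 1}: sequence/quality length mismatch")
        yield header[1:], seq, [ord(c) - 33 for c in qual]


if __name__ == "__main__":
    fasta_text = ">seq1 example\nACGT\nACGT\n>seq2\nTTGCA\n"
    print(list(read_fasta(fasta_text.splitlines())))
    # [('seq1 example', 'ACGTACGT'), ('seq2', 'TTGCA')]

    fastq_text = "@read1\nACGT\n+\nII#!\n"
    print(list(read_fastq(fastq_text.splitlines())))
    # [('read1', 'ACGT', [40, 40, 2, 0])]
```

Both readers accept any iterable of lines, so an open file handle can be passed directly instead of a split string.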
Biological Data Sharing
- Genomics data sharing
- TransPLANT data
See also
- Bioinformatics
- Biological database
- Biological model (disambiguation)
- Data modeling
- DNA sequencing
- Data mining