GeneCards
GeneCards is a database of human genes that provides genomic, proteomic, transcriptomic, genetic and functional information on all known and predicted human genes.[1][2][3][4] It is being developed and maintained by the Crown Human Genome Center at the Weizmann Institute of Science. This database aims at providing a quick overview of the current available biomedical information about the searched gene, including the human genes, the encoded proteins, and the relevant diseases.[1][5][6] The GeneCards database provides access to free Web resources about more than 7000 all known human genes that integrated from >90 data resources, such as HGNC, Ensembl, and NCBI. The core gene list is based on approved gene symbols published by the HUGO Gene Nomenclature Committee (HGNC).[7][8] The information are carefully gathered and selected from these databases by the powerful and user-friendly engine. If the search does not return any results, this database will give several suggestions to help users accomplish their searching depended on the type of query, and offer direct links to other databases’ search engine.[1] Over time, the GeneCards database has developed a suite of tools (GeneDecks, GeneLoc, GeneALaCart) that has more specialised capability. Since 1998, the GeneCards database has been widely used by bioinformatics, genomics and medical communities for more than 15 years.[7][8]
History
Since the 1980s, sequence information has become increasingly abundant, many laboratories realized this and started to store these information in the central repositories- the primary database.[9] However, the information provided by the primary sequence databases (lower level database) focus on different aspect. To gather these scattered data, The Weizmann Institute of Science Crown Human Genome Centre developed a database called ‘GeneCards’ in 1997. This database mainly deals with the human genome information, human genes, the encoded proteins’ functions, and the related diseases.[1]
Growth
At first, it only includes two main features: the function to get integrated biomedical information about certain gene in ‘card’ format; and a search engine based on text. Since 1998, by adding new database resources and data types, such as protein expression and gene network, using a more speedy and sophisticated search engine that can get a better performance of results, and a revamped, technologically enabling infrastructure expanding the gene-centric dogma to contain gene-set analyses, the GeneCards Database has evolved significantly from the initial version. . Currently, the version 3 gather information from more than 90 database resources based on a consolidated gene list and this process can be conducted offline. It has developed a set of GeneCards suit, which are focus one more specific purposes. “GeneNote and GeneAnnot for transcriptome analyses, GeneLoc for genomic locations and markers, GeneALaCart for batch queries and GeneDecks for finding functional partners and for gene set distillations.” Nearly every 3 years life cycle, a new planning phase for subsequent revision will start, including implementation, development and semi-automated quality assurance, and deployment. Technologies used include Eclipse, Apache, Perl, XML, PHP, Propel, Java, R and MySQL.[7][8]
Ongoing GeneCards Expansions[7]
Animal models
Tissue proteomics profiling
RNA genes
Gene and protein identifier mapping
Online analytical processing (OLAP)
Availability
GeneCards can be freely accessed by non-profit institution for educational and research purpose at http://www.genecards.org/ and academic mirror sites. Commercial usage requires a license.
GeneCards Suite
GeneDecks
GeneDecks is a novel analysis tool to identify similar or partner genes, which provides a similarity metric by highlighting shared descriptors between genes, based on GeneCards’ unique wealth of combinatorial annotations of human genes.
- Annotation combinatory: Using GeneDecks, one can get a set of similar genes for a particular gene with a selected combinatorial annotation. The summary table result in ranking the different level of similarity between the identified genes and the probe gene.
- Annotation unification: Different data source often offer annotations with heterogeneous naming system. Annotation unification of GeneDecks is based on the similarity in GeneCards gene-content space detection algorithms.
- Partner hunting: In GeneDecks’s Partner Hunter, users give a query gene, and the system seeks similar genes based on combinatorial similarity of weighted attributes.
- Set distillation: In Set distiller, users give a set of genes, and the system ranks attributes by their degree of sharing within a given gene set. Like Partner Hunter, it enables sophisticated investigation of a variety of gene sets, of diverse origins, for discovering and elucidating relevant biological patterns, thus enhancing systematic genomics and systems biology scrutiny.[8][10][11]
GeneALaCart
GeneALaCart is a gene-set-orientated batch-querying engine based on the popular GeneCards database. It allows retrieval of information about multiple genes in a batch query.[7][12]
GeneLoc
The GeneLoc suit member presents an integrated human chromosome map, which is very important for designing a custom-made capture chip, based on data integrated by the GeneLoc algorithm. GeneLoc includes further links to GeneCards, NCBI's Human Genome Sequencing, UniGene, and mapping resources.[7][13]
Usage
Search
Firstly, enter what you want to search into the blank on the homepages. Searching methods include Keywords, Symbol only, Symbol/Alias/Identifier and Symbol/Alias.[5] The default search option is searching by keywords. When you search by keywords, MicroCard and MiniCard are shown. However, when you search by Symbol only, you’ll go directly to GeneCard.[14] Also, you can further your search by clicking on advanced search, where you can choose section, category, GIFtS, Symbol Source and gene sets directly. Sections include Aliases & Descriptions, Disorders, Drugs & Compounds, Expression in Human Tissues, Function, Genomic Location, Genomic Variants, Orthologs, Paralogs, Pathways &Interactions, Protein Domains/Families, Proteins, Publications, Summaries and Transcripts. The default option is searching for all sections.[5] Categories involve Protein-coding, Pseudo genes, RNA genes, Genetic Loci, Gene clusters and Uncategorized. The default option is searching for all categories.[5] GIFtS is the GeneCards Inferred Functionality Scores, which gives objective numbers to show the knowledge level about the functionality of human genes. It includes High, Medium, Low or you can give custom range.[4][15] Symbol Sources include HGNC (HUGO Gene Nomenclature Committee), EntrezGene (gene-centered information at NCBI), Ensembl, GeneCards RNA genes, CroW21 and so on.[5]
Moreover, you can choose to search for all GeneCards or Within gene subset, which would be more specific and with priority.
Secondly,the search result page shows all relevant minicards. Symbol, Description, Category, GIFtS, GC id and Score are displayed on the page.[5]</ref> Click on the plus button for each of the mini-cards if you want to open the minicard. Also, you can click directly on the symbol to see the details of a particular GeneCard.
GeneCards Content[5]
For a particular GeneCard, it is consist of the following contents.
- Header: The header is made up of gene’s symbol, category (i.e. protein-coding), GIFtS(i.e. 74) and GCID(GC19M041837). Different categories have different colors to express: protein-coding, pseudogene, RNA gene, gene cluster, genetic locus, and uncategorized. The background indicates the symbol sources: HGNC Approved Genes, EntrezGene Database, Ensembl Gene Database, or GeneCards Generated Genes.
- Aliases: Aliases, as its name indicates, shows synonyms and aliases of the gene according to diverse sources such as HGNC. The right column displays how the aliases associated with the resources and gives previous GC identifiers.
- Summaries: The left column is the same with the one in the Aliases, which shows the sources. The right column here gives brief summary on gene’s function, localization and effect on phenotype from various sources.
- Genomic Views: In addition to sources, this section gives reference DNA sequence, regulatory elements, epigenetics, chromosome band and genomic location of different sources. The red line on the image indicates the GeneLoc integrated location. In particular, if the GeneLoc integrated location is different from the location in Entrez Gene, it is shown in green; Blue is appeared when the GeneLoc integrated location differs from the location in Ensembl. Addition details can be accessed through the links in the section.
- Proteins: This section presents annotated information of genes, including recommended name, size, subunit, subcellular location and secondary accessions. Also, post-translational modifications, protein expression data, REF SEQ proteins, ENSEMBL proteins, Reactome Protein details, Human Recombinant Protein Products, Gene Ontology, Antibody Products and Assay Products are introduced.
- Protein Domains/Families: This section shows annotated information of protein domains and families.
- Function: The function section describes gene function, including: Human phenotypes, bound Targets, shRNA for human and/or mouse/rat, miRNA Gene Targets, RNAi products, microRNA for human and/or mouse/rat orthologs, Gene Editing, Clones, Cell Lines, Animal models, in situ hybridization assays.
- Pathways & Interactions: This section shows unified GeneCards pathways and interactions that are from different sources. Unified GeneCards pathways are collected into super-pathways, which displays the connection between different pathways. Interaction shows interactant and interaction details.
- Drugs & Compounds: This section connects GeneCards with drugs and compounds. TOCRIS compounds show compound, action and CAS number. DrugBank compound gives compound, synonyms, CAS number (Chemical Abstracts Registry number), type (transporter/target/carrier/enzyme), actions and PubMed IDs. HMDB and Novoseek show the relationships of chemical compounds, which includes compound,synonyms, CAS number and PubMed IDs (articles related to the compound). BitterDB displays compound, CAS number and SMILES (Simplified Molecular Input Line Entry Specification). PharmGKB gives drug/compound and its annotation.
- Transcripts: This section is consist of reference sequence mRNAs, Unigene Cluster and representatice Sequence, miRNA products, inhib.RNA products, Clone products, primer products and additional mRNA sequence. Also, you can gain exon structure from GeneLoc.
- Expression: The left column shows the resources of the data. Expression images and data, similar genes, PCR arrays, primers for human and in situ hybridization assays are included in this section.
- Orthologs: This section gives orthologs for a particular gene from numbers of species. The table displays the corresponding organism, taxonomic classification, gene, description, human similarity, orthology type and details. It’s connected to ENSEMBL Gene Tree and TreeFam Gene Tree.
- Paralogs: This section displays paralogs and pseudogenes for a particular gene.
- Genomic Variants: The genomic variants show the result of NCBI SNPs/Variants, HapMap linkage disequilibrium report, structural variations, human gene mutation database(HGMD), QIAGEN SeqTarget long-range PCR primers in human, mouse &rat and SABiosciences cancer mutation PCR arrays. The table in this section shows SNP ID, Valid, Clinical significance, Chr pos, Sequence for genomic data, AAChg, Type and More for transcription related data, Allele freq, Pop, Total sample and More for Allele Frequencies. For Valid, the different character represents different validation methods. ‘C’ means by-cluster; ‘A’ is by-2hit-2allele; ‘F’ is by-frequency; ‘H’ is by-hapmap and ‘O’ is by-other-pop. Clinical significance can be one of the following: non-pathogenic, pathogenic, drug-response, histocompatibility, probable-non-pathogenic, probable-pathogenic, untested, unknown and other. Type should be one of these: nonsynon, syn, cds, spl, utr, int, exc, loc, stg, ds500, spa, spd, us2k, us5k, PupaSUITE Designations.
- Disorders/Diseases: Shows disorders/diseases associated with the gene.
- Publications: Displays publications associated with the gene.
- External Searches: You can search more information in PubMed, OMIM and NCBI.
- Genome Databases: Other Databases, and specialized Databases.
- Intellectual Property: This section gives patent information and licensable technologies.
- Products
Application
The GeneCards, which presents human genetic information, has now been widely used in biology and biomedical field. For example, S.H. Shah extracted data of early-onset coronary artery disease from GeneCards to identify genes that contributes to the disease. Chromosome 3q13, 1q25 etc. are confirmed to take effects and this paper further discussed the relationship between morbid genes and serum lipoproteins with the help of GeneCard.[16]
Another example is the research on synthetic lethality in cancer. Synthetic lethality appears when mutation in a single gene has no effect on the function of cell while mutation in one more additional gene leads to the death of the cell. This project is aimed to find novel method to treat cancer through blocking lethality of drugs. GeneCards is used when comparing data of a given target gene with all possible genes. In this process, Partner Hunter will calculate the annotation sharing score and give paralogy. Inactivation targets will be extracted after the microarray experiments of resistant and non-resistant neuroblastoma cell lines.[7]
References
- 1 2 3 4 Rebhan M, Chalifa-Caspi V, Prilusky J, Lancet D (April 1997). "GeneCards: integrating information about genes, proteins and diseases". Trends Genet. 13 (4): 163. doi:10.1016/S0168-9525(97)01103-7. PMID 9097728.
- ↑ Rebhan M, Chalifa-Caspi V, Prilusky J, Lancet D (1998). "GeneCards: a novel functional genomics compendium with automated data mining and query reformulation support". Bioinformatics 14 (8): 656–64. doi:10.1093/bioinformatics/14.8.656. PMID 9789091.
- ↑ Safran M, Solomon I, Shmueli O, Lapidot M, Shen-Orr S, Adato A, Ben-Dor U, Esterman N, Rosen N, Peter I, Olender T, Chalifa-Caspi V, Lancet D (November 2002). "GeneCards 2002: towards a complete, object-oriented, human gene compendium". Bioinformatics 18 (11): 1542–3. doi:10.1093/bioinformatics/18.11.1542. PMID 12424129.
- 1 2 Harel A, Inger A, Stelzer G, Strichman-Almashanu L, Dalah I, Safran M, Lancet D (October 2009). "GIFtS: annotation landscape analysis with GeneCards". BMC Bioinformatics 10 (1): 348. doi:10.1186/1471-2105-10-348. PMC 2774327. PMID 19852797.
- 1 2 3 4 5 6 7 "GeneCards". GeneCards. Retrieved 18 Oct 2013.
- ↑ "GeneCards". Retrieved 19 Oct 2013.
- 1 2 3 4 5 6 7 Stelzer G, Dalah I, Iny Stein T, Satanower Y, Rosen N, Nativ N, Oz-Levi D, Olender T, Belinky F, Bahir I, Krug H, Perco P, Mayer B, Kolker E, Safran M and Lancet D (October 2011). "In-silico Human Genomics with GeneCards". Human Genomics 5 (6): 709–717. doi:10.1186/1479-7364-5-6-709. PMC 3525253. PMID 22155609.
- 1 2 3 4 Safran M, Dalah I, Alexander J, Rosen N, Iny Stein T, Shmoish M, Nativ N, Bahir I, Doniger T, Krug H, Sirota-Madi A, Olender T, Golan Y, Stelzer G, Harel A and Lancet D (2010). "GeneCards Version 3: the human gene integrator". Database 2010: baq020. doi:10.1093/database/baq020. PMC 2938269. PMID 20689021.
- ↑ Teresa K. Attwood and David J. Parry-Smith (1999). Introduction to Bioinformatics. Harlow Longman.
- ↑ "GeneDecks". GeneCards. Retrieved 20 Oct 2013.
- ↑ Stelzer G, Inger A, Olender T, Iny-Stein T, Dalah I, Harel A, Safran M, Lancet D (December 2009). "GeneDecks: paralog hunting and gene-set distillation with GeneCards annotation". OMICS 13 (6): 477–87. doi:10.1089/omi.2009.0069. PMID 20001862.
- ↑ "GeneALaCart". GeneCards. Retrieved 20 Oct 2013.
- ↑ "GeneLoc". GeneCards. Retrieved 20 Oct 2013.
- ↑ Sridhar, G. R.; et al. (2006). "Bioinformatics approach to extract information from genes". Int J Diabetes Dev Ctries 26: 149–51. doi:10.4103/0973-3930.33179.
- ↑ Chalifa-Caspi, Vered; et al. (2003). "GeneAnnot: interfacing GeneCards with high-throughput gene expression compendia". Briefings in bioinformatics 4 (4): 349–360. doi:10.1093/bib/4.4.349. PMID 14725348.
- ↑ Shah, S. H.; et al. (2006). "Serum lipids in the GENECARD study of coronary artery disease identify quantitative trait loci and phenotypic subsets on chromosomes 3q and 5q". Annals of Human Genetics 70 (6): 738–748. doi:10.1111/j.1469-1809.2006.00288.x. PMID 17044848.