What's that gene (or protein)? Online resources for exploring functions of genes, transcripts, and proteins.
Hutchins JR.
Abstract
The genomic era has enabled research projects that use approaches including genome-scale screens, microarray analysis, next-generation sequencing, and mass spectrometry-based proteomics to discover genes and proteins involved in biological processes. Such methods generate data sets of gene, transcript, or protein hits that researchers wish to explore to understand their properties and functions and thus their possible roles in biological systems of interest. Recent years have seen a profusion of Internet-based resources to aid this process. This review takes the viewpoint of the curious biologist wishing to explore the properties of protein-coding genes and their products, identified using genome-based technologies. Ten key questions are asked about each hit, addressing functions, phenotypes, expression, evolutionary conservation, disease association, protein structure, interactors, posttranslational modifications, and inhibitors. Answers are provided by presenting the latest publicly available resources, together with methods for hit-specific and data set-wide information retrieval, suited to any genome-based analytical technique and experimental species. The utility of these resources is demonstrated for 20 factors regulating cell proliferation. Results obtained using some of these are discussed in more depth using the p53 tumor suppressor as an example. This flexible and universally applicable approach for characterizing experimental hits helps researchers to maximize the potential of their projects for biological discovery.
FIGURE 1. Generalized workflow for the analysis of DNA, RNA, or protein samples and questions about the hits identified. Nucleic acid or protein samples isolated from the biological material of interest are processed, then analyzed by various methods. Raw analytical data are then matched to entries in public databases, generating a results table listing the genes, transcripts, or proteins (hits) identified. For each of these hits, 10 questions relating to their features, functions, and other properties are shown (blue boxes). Each question is addressed by a section in the text, plus one or more supplemental tables containing examples of hyperlinks to entries in online resources.
FIGURE 2. Approaches for obtaining functional information about experimentally identified gene, transcript, or protein hits. Freely available software tools can be used to obtain information about features and functions of genes, transcripts, or proteins in a results table from multiple sources. Generation of an interaction network shows at a glance the nature of any previously reported interactions between members of a set of hits, each of which can be explored using the resources indicated. Making a hyperlinked results table allows one-click access from each hit directly to relevant pages from a wide range of resources. Creating an annotated results table containing controlled-vocabulary terms or keywords from a range of sources allows hits to be classified and sorted on the basis of these terms. Step-by-step protocols for performing these analyses are presented in the Supplemental Materials.
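The hyperlinked results table described in Figure 2 can be illustrated with a minimal Python sketch. It is not the article's protocol (those are in the Supplemental Materials); it only shows the general idea of expanding each hit into one link per resource. The URL templates, the function names `hyperlinked_rows` and `to_csv`, and the example hits are assumptions chosen for illustration, and the search URL patterns may change over time.

```python
import csv
import io
from urllib.parse import quote_plus

# Illustrative search-URL templates (assumptions, not taken from the article).
# Each template receives the URL-encoded hit name as {query}.
URL_TEMPLATES = {
    "NCBI Gene": "https://www.ncbi.nlm.nih.gov/gene/?term={query}",
    "PubMed": "https://pubmed.ncbi.nlm.nih.gov/?term={query}",
}


def hyperlinked_rows(hits):
    """Return one dict per hit, with one link column per resource."""
    rows = []
    for hit in hits:
        row = {"hit": hit}
        for resource, template in URL_TEMPLATES.items():
            row[resource] = template.format(query=quote_plus(hit))
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["hit", *URL_TEMPLATES], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical results table containing two gene symbols.
rows = hyperlinked_rows(["TP53", "CDK1"])
table = to_csv(rows)
print(table)
```

Keeping the templates in a single dictionary means that adding another resource only requires one new entry, and the same table can be regenerated for any list of hits.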