Research

To facilitate clinical and translational research, the Nagarajan lab is engaged in multiple biomedical informatics projects. These may be grouped broadly into four areas: institutional informatics infrastructure, caBIG™ projects, development of new analytical and visualization tools, and application of informatics tools to the analysis of complex diseases.



1) Institutional Informatics Infrastructure (Biomedical Informatics Core):
Enterprise-wide databases for the management and integration of diverse biomedical information are being or have been developed. These include:
  • Electronic Medical Record:  A research patient data warehouse has been developed that allows users to query, view, and download virtually any inpatient medical information. It currently contains data from a small number of patients (~2,500). Plans are underway to extend this architecture to include the entire BJC HealthCare database (~3 million patients).
  • Clinical Studies Management:  An application, termed ClinPortal, is being developed to allow principal investigators and study coordinators to describe the data capture and content components of a clinical study using controlled vocabularies. ClinPortal then automatically generates the back end database schema as well as the data capture and query forms. Using this tool, data for virtually any clinical study may be managed without IT personnel intervention.
  • Biospecimen Management:  As part of the National Cancer Institute’s cancer Biomedical Informatics Grid (caBIG™) initiative, we have developed caTissue Core as the foundation for biospecimen management by banking facilities. Currently, we are developing caTissue Suite, an application that will combine caTissue Core with caTIES, a natural language processing tool that concept-codes surgical pathology reports, and with the ability to annotate biospecimens using clinical and pathology annotations (e.g. tumor staging, lab values, and treatment protocols).
  • Molecular Data Set Management:  Databases to warehouse and store expression, SNP, and large scale re-sequencing data generated by Washington University investigators have been developed. A Proteomics LIMS project is under development to track and store proteomics experiments.
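
The schema-generation idea behind ClinPortal can be illustrated with a minimal sketch. This is not ClinPortal's code: the field names, the `STUDY_FIELDS` structure, and the helper functions are hypothetical, and SQLite stands in for whatever back end the application actually uses. The sketch only shows how a study description with controlled vocabularies could be turned into a table definition without hand-written SQL.

```python
import sqlite3

# Hypothetical study description: field name -> (SQL type, allowed values or None).
# A list of allowed values plays the role of a controlled vocabulary.
STUDY_FIELDS = {
    "patient_id": ("TEXT", None),
    "visit_date": ("TEXT", None),
    "tumor_stage": ("TEXT", ["I", "II", "III", "IV"]),
    "hemoglobin_g_dl": ("REAL", None),
}


def build_create_table(table_name, fields):
    """Return a CREATE TABLE statement derived from the field description."""
    columns = []
    for name, (sql_type, allowed) in fields.items():
        column = f"{name} {sql_type}"
        if allowed is not None:
            # A controlled vocabulary becomes a CHECK constraint on the column.
            quoted = ", ".join(f"'{value}'" for value in allowed)
            column += f" CHECK ({name} IN ({quoted}))"
        columns.append(column)
    return f"CREATE TABLE {table_name} ({', '.join(columns)})"


def create_study_database(table_name, fields):
    """Create an in-memory database holding the generated study table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(build_create_table(table_name, fields))
    return conn
```

With this design, a value outside the declared vocabulary (for example a tumor stage of 'V') is rejected by the database itself rather than by form-level code.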


2) caBIG™ Projects:
  • To facilitate compatibility reviews by the Vocabularies and Common Data Elements (VCDE) and Architecture (ARCH) workspaces, we have developed and released version 1.0 of the Compatibility Review System (CRS). We are currently developing version 2.0 that will contain tools for compatibility self-checks by developers and reports that determine the true CDE reuse.
  • To annotate probes on microarrays using data available from public biomedical databases (e.g. Entrez Gene, OMIM, GO, and UniGene), we developed caFE. caFE is currently caBIG™ Silver-compatible and is grid-enabled.
  • caTissue Suite: See above.
  • To facilitate clinical and translational research, we are developing caBench-to-Bedside (caB2B). caB2B will be used to query caGrid data services and to analyze the resulting data using caGrid analytical services. In the first year, the scope of caB2B is confined to the query, analysis, and visualization of microarray expression data.
  • GeneConnect: To interlink genomic identifiers. As part of the caBIG™ project, we have developed a genomic identifier mapping service termed GeneConnect. GeneConnect allows users to input one or more commonly used genomic identifiers (e.g. Entrez Gene ID, Ensembl Gene ID, or UniProtKB Accession Number) and returns linked identifiers from other databases (e.g. RefSeq mRNA or Protein ID, or GenBank mRNA/Peptide ID).
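
The kind of identifier mapping GeneConnect performs can be pictured as a search over a graph of cross-references. The sketch below is an assumption about one way such a lookup could work, not a description of GeneConnect's implementation. The `LINKS` table, `build_graph`, and `linked_identifiers` are hypothetical names, and the small link table is illustrative only.

```python
from collections import deque

# Illustrative cross-reference table of (database, identifier) pairs.
# A real service would draw these links from the source databases.
LINKS = [
    (("Entrez Gene", "7157"), ("Ensembl Gene", "ENSG00000141510")),
    (("Entrez Gene", "7157"), ("RefSeq mRNA", "NM_000546")),
    (("RefSeq mRNA", "NM_000546"), ("RefSeq Protein", "NP_000537")),
    (("RefSeq Protein", "NP_000537"), ("UniProtKB", "P04637")),
]


def build_graph(links):
    """Build an undirected adjacency map from pairwise identifier links."""
    graph = {}
    for left, right in links:
        graph.setdefault(left, set()).add(right)
        graph.setdefault(right, set()).add(left)
    return graph


def linked_identifiers(graph, start, target_db):
    """Breadth-first search for identifiers in target_db reachable from start."""
    seen = {start}
    queue = deque([start])
    found = set()
    while queue:
        node = queue.popleft()
        if node[0] == target_db and node != start:
            found.add(node[1])
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return sorted(found)
```

For example, `linked_identifiers(build_graph(LINKS), ("Ensembl Gene", "ENSG00000141510"), "UniProtKB")` follows the chain of links through Entrez Gene and RefSeq to reach the UniProtKB accession, even though no direct Ensembl-to-UniProtKB link is stored.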



3) Novel Analytical and Visualization Tools:  Tools for the analysis of molecular data sets include:
  • Promoter Analysis Pipeline or PAP:  The Promoter Analysis Pipeline (PAP) is a web-based workbench for analyzing the promoter sequences of a set of co-expressed genes in mammals. PAP uses a statistical model based on the overrepresentation of evolutionarily conserved transcription factor binding sites in promoters. The typical input for PAP is a set of co-expressed genes. After PAP looks up these gene identifiers in its database, users may predict the transcription factors that regulate these genes, identify the transcription factor binding sites in these genes, and identify other genes in the genome that are regulated by the same transcription factors. These functions are useful for better understanding transcriptional regulation in mammals.
  • Function Express or FE:  To analyze microarray expression data. FE allows end users to filter, normalize, cluster, and classify gene expression data. Available algorithms include Significance Analysis of Microarrays (SAM), k-means, hierarchical, and SOM clustering, GenePattern (Consensus Clustering, Comparative Marker Selection, and Weighted Voting), and R modules (SVM and PCA). FE will also allow end users to view gene networks created using co-expression, transcription factor binding site, pathway, interaction, and literature-based data.
  • MuV:  To analyze re-sequencing data. We are refining the MuV pipeline to allow users to design primers in high throughput using a web interface, to analyze sequence traces for mixed peaks and indels using multiple analytical tools, and to visualize mutation information in the context of genomic annotation (e.g. exon-intron boundaries, protein domains, and known SNPs).
  • Mascot Viewer or MaV:  To analyze proteomics data. MaV allows users to view a ‘clickable’ gel and to click on relevant spots to view consolidated (i.e. multiple mass spectrometry [MS] runs on one or more machines) MS and MS/MS results.
  • GeneConnect:  To interlink genomic identifiers. As part of the caBIG™ project, we have developed a genomic identifier mapping service termed GeneConnect. GeneConnect allows users to input one or more commonly used genomic identifiers (e.g. Entrez Gene ID, Ensembl Gene ID, or UniProtKB Accession Number) and returns linked identifiers from other databases (e.g. RefSeq mRNA or Protein ID, or GenBank mRNA/Peptide ID).
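
The overrepresentation idea mentioned for the Promoter Analysis Pipeline can be sketched with a standard hypergeometric test. PAP's actual statistical model is not specified here, so this is a swapped-in textbook technique under stated assumptions: the function name and its parameters are hypothetical, and each gene is simply counted as having or lacking a conserved binding site for a given transcription factor.

```python
from math import comb


def overrepresentation_p_value(genome_total, genome_with_site, set_total, set_with_site):
    """Upper-tail hypergeometric probability.

    Probability of seeing at least `set_with_site` genes carrying the binding
    site when `set_total` genes are drawn at random from `genome_total` genes,
    of which `genome_with_site` carry the site.
    """
    upper = min(set_total, genome_with_site)
    tail = sum(
        comb(genome_with_site, k) * comb(genome_total - genome_with_site, set_total - k)
        for k in range(set_with_site, upper + 1)
    )
    return tail / comb(genome_total, set_total)
```

A small p-value suggests that the binding site occurs in the co-expressed gene set more often than chance would predict, which is the kind of signal used to nominate a candidate regulating transcription factor. In practice, a correction for testing many transcription factors at once would also be needed.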


4) Collaborative Projects:  The Nagarajan lab collaborates with multiple investigators not only to develop new computational tools but also to analyze diverse biomedical data sets.
Examples include:
  • GAML-PPG:  The Genomics of AML Program Project Grant consists of a group of investigators who are studying the molecular mechanisms underlying acute myeloid leukemia. Using a variety of ‘omic’ technologies (e.g. microarray expression, aCGH, SNP, methylation, and re-sequencing), these investigators are correlating molecular fingerprints with clinical parameters such as survival and recurrence.
  • NORG:  The Neuro-Oncology Group is studying a variety of tumors (e.g. astrocytomas and oligodendrogliomas) using genomic technologies to identify molecular signatures that may ultimately be used to manage patients with such cancers.