Plant Bioinformatics


The past 15 years have been exciting ones in plant biology. Hundreds of plant genomes have been sequenced, RNA-seq has enabled transcriptome-wide expression profiling, and a proliferation of “-seq”-based methods has permitted protein-protein and protein-DNA interactions to be determined cheaply and in a high-throughput manner. These data sets in turn allow us to generate hypotheses at the click of a mouse. For instance, knowing where and when a gene is expressed can help us narrow down the phenotypic search space when we don’t see a phenotype in a gene mutant under “normal” growth conditions. Coexpression analyses and association networks can provide high-quality candidate genes involved in a biological process of interest. Using Gene Ontology enrichment analysis and pathway visualization tools can help us make sense of our own ‘omics experiments and answer the question “what processes/pathways are being perturbed in our mutant of interest?”

Structure: each of the six weekly hands-on modules consists of a ~2-minute intro, a ~20-minute theory mini-lecture, a 1.5-hour hands-on lab, an optional ~20-minute lab discussion for those experiencing difficulties with the lab, and a ~2-minute summary.
Tools covered [Material updated in June 2022]:
Module 1: GENOMIC DBs / PRECOMPUTED GENE TREES / PROTEIN TOOLS. Araport, TAIR, Gramene, Ensembl Plants Compara, PLAZA; SUBA4 and Cell eFP Browser, 1001 Genomes Browser
Module 2: EXPRESSION TOOLS. eFP Browser / eFP-Seq Browser, Araport, Genevestigator, TraVA DB, NCBI Genome Data Viewer for exploring RNA-seq data for many plant species, MPSS database for small RNAs
Module 3: COEXPRESSION TOOLS. ATTED-II, Expression Angler, AraNet, AtCAST2
Module 4: PROMOTER ANALYSIS. Cistome, MEME, ePlant
Module 5: GO ENRICHMENT ANALYSIS AND PATHWAY VISUALIZATION. AgriGO, AmiGO, Classification SuperViewer, TAIR, g:Profiler, AraCyc, MapMan (optional: Plant Reactome)
Module 6: NETWORK EXPLORATION. Arabidopsis Interactions Viewer 2, ePlant, TF2Network, Virtual Plant, GeneMANIA

What you will learn

Plant Genomic Databases, and useful sites for info about proteins

In this module we’ll be exploring several plant databases including Ensembl Plants, Gramene, PLAZA, SUBA, TAIR and Araport. The information in these databases allows us to easily identify functional regions within gene products, view subcellular localization, find homologs in other species, and even explore pre-computed gene trees to see if our gene of interest has undergone a gene duplication event in another species, all at the click of a mouse!

Expression Analysis

Vast databases of gene expression and nifty visualization tools allow us to explore where and when a gene is expressed. Often this information can be used to help guide a search for a phenotype if we don’t see a phenotype in a gene mutant under “normal” growth conditions. We explore several tools for Arabidopsis data (eFP Browser, Genevestigator, TraVA DB, Araport) along with NCBI’s Genome Data Viewer for RNA-seq data for other plant species. We also examine the MPSS database of small RNAs and degradation products to see if our example gene has any potential microRNA targets.

Coexpression Tools

Being able to group genes by similar patterns of expression across expression data sets using algorithms like WGCNA is a very useful way of organizing the data. Clusters of genes with similar patterns of expression can then be subjected to Gene Ontology term enrichment analysis (see Module 5) or examined to see if they are part of the same pathway. What’s even more powerful is being able to identify genes with similar patterns of expression without doing a single expression profiling experiment, by mining gene expression databases! There are several tools that allow you to do this in many plant species simply by entering a query gene identifier. The genes that are returned are often in the same biological process as the query gene, and thus this “guilt-by-association” paradigm is an excellent tool for hypothesis generation.
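To make the “guilt-by-association” idea concrete, here is a minimal sketch in Python that ranks genes by how closely their expression profile tracks that of a query gene, using the Pearson correlation coefficient. It is only an illustration: the gene names and expression values are made up, the function names are hypothetical, and the coexpression tools covered in the course may use different similarity measures and much larger data sets.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A flat profile has no defined correlation; treat it as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_coexpressed(query, expression):
    """Return (gene, r) pairs for all other genes, most similar first."""
    query_profile = expression[query]
    scores = [
        (gene, pearson(query_profile, profile))
        for gene, profile in expression.items()
        if gene != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical expression values across five samples (not real data).
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with GENE_A
    "GENE_C": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as GENE_A rises
    "GENE_D": [3.0, 1.0, 4.0, 1.0, 5.0],   # only loosely related
}

ranked = rank_coexpressed("GENE_A", expression)
for gene, r in ranked:
    print(f"{gene}\t{r:.3f}")
```

In this toy example, the gene at the top of the list would be the strongest candidate for sharing a biological process with the query gene, while a strongly negative correlation (the last entry) may point to opposing regulation rather than no relationship.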

Sectional Quiz 1

What’s included