gravityboy (gravityboy) wrote,

On Visualizing Biological Data

The following is a brain dump of some of the things I've been thinking about lately.

One of the biggest changes in biology over the past several years has been the incredible deluge of information. Bioinformatics has risen in response, to help us cope with it all. While this has led to some major successes, where it has failed is in imparting a greater understanding of the subject at hand. Biologists still learn primarily from reading papers, the same way we always have. There are massive databases full of wonderful information, but most of it is encoded with minimal or no context, so you're always forced to go back to the papers to understand what the database is actually telling you. In that sense, these databases are fantastic at indexing information but very poor at organizing it in a way that teaches people about the topic. We're still forced to slog through papers for just about everything.

What's striking about this is that the most informative bits in any biological paper I've ever read are encapsulated in the figures. The images themselves, provided you have sufficient background knowledge, show the basic data and give you the most understanding for the smallest investment of time. You can skim an article's abstract, its figures, and its figure legends and gain a fair understanding of the topic before deciding whether or not to go further.

Now, there's a contradiction of sorts here. Many of these figures are generated by computer, usually as an Excel-made graph. The rest are actual photographs of things, such as blots, gels, or stained tissues, which are eventually imported and processed on the computer. The contradiction is that the computer is used intensively to organize this data for publication, yet we have a hard time extracting the essence of those visualizations for indexing in "big picture" sorts of ways. That almost always has to be done by hand (and brain) by the biologist. This, of course, is suboptimal when you have thousands of individual genes.

The fundamental reason for all of this is that biological information depends wholly on context. For example, you can sequence the whole genome, but it's totally unclear which genes will be expressed at any given time unless you have much more information about the cell type, developmental stage, pathologies, and so forth. As far as I can see, all our bioinformatics tools have failed completely at providing any sort of context for their information. A common thing to see is the so-called "wiring diagram" that displays molecular interactions. These diagrams are so full of nodes and edges that they look almost impossible to understand. While there is a great deal of complexity, contextual information gives us a framework for understanding what's actually working. Looking at these diagrams, though, there's no sense of this; what you see instead looks like complexity overrun.
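As a rough illustration of that point, here is a minimal sketch in Python of what it could mean to store context alongside a wiring diagram. It assumes each interaction is annotated with the cell types in which it has been observed. Everything in it is hypothetical and invented for illustration: the gene names, the cell types, and the `Interaction` and `active_edges` names. Real context would also include developmental stage, pathology, and so on.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Interaction:
    """One edge of a wiring diagram, annotated with its context."""
    source: str
    target: str
    # Contexts (here just cell types) in which the interaction was observed.
    contexts: frozenset = field(default_factory=frozenset)


def active_edges(interactions, context):
    """Return only the interactions annotated with the given context."""
    return [i for i in interactions if context in i.contexts]


# Hypothetical toy network; none of these genes or annotations are real data.
NETWORK = [
    Interaction("geneA", "geneB", frozenset({"neuron", "hepatocyte"})),
    Interaction("geneB", "geneC", frozenset({"neuron"})),
    Interaction("geneA", "geneD", frozenset({"hepatocyte"})),
    Interaction("geneD", "geneE", frozenset({"hepatocyte"})),
]

if __name__ == "__main__":
    # Show the much smaller diagram that applies to one cell type.
    for edge in active_edges(NETWORK, "neuron"):
        print(f"{edge.source} -> {edge.target}")
```

Asking for a single context shrinks the diagram to the edges that matter there, which is the kind of framework the full, context-free diagram fails to convey.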

So that presents us with the challenge for the future. What's required of bioinformatics is to index not only the raw data but also its context, and then to present the data to us in a context-dependent manner. I am convinced that the key to presenting data this way is to come up with novel visualization methods, because it's the visualizations in papers that we use today to get the most out of our time. I believe that this problem is tractable and that there is a solution. More than likely we'll need several solutions, and I look forward to seeing them develop.