Subcellular location is an important property of proteins, carefully regulated by the cell machinery. To determine subcellular location on a proteome-wide scale, fluorescence image data is most commonly used, and a classification system is employed to analyse it. These systems assign each protein to one of a small set of predefined location classes (typically the major organelles).
In the past, classification performance was too often evaluated on datasets in which each class was represented by multiple images of the same protein. I will argue that the resulting performance estimates are overly optimistic and that systems evaluated this way generalise poorly to proteins they were not trained on.
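The evaluation concern can be illustrated with a minimal sketch, which is not the protocol used in the work itself. It assumes each image is labelled with the protein it shows and holds out whole proteins, so that no protein contributes images to both the training and the test set. The function name `split_by_protein` and the toy protein identifiers are hypothetical.

```python
import random


def split_by_protein(protein_ids, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) so that no protein appears in both.

    protein_ids holds one entry per image, naming the imaged protein.
    """
    proteins = sorted(set(protein_ids))
    rng = random.Random(seed)
    rng.shuffle(proteins)
    # Hold out whole proteins, never individual images.
    n_test = max(1, round(len(proteins) * test_fraction))
    held_out = set(proteins[:n_test])
    train_idx = [i for i, p in enumerate(protein_ids) if p not in held_out]
    test_idx = [i for i, p in enumerate(protein_ids) if p in held_out]
    return train_idx, test_idx


# Hypothetical toy data: eight images of four proteins.
image_proteins = ["P1", "P1", "P2", "P2", "P3", "P3", "P4", "P4"]
train_idx, test_idx = split_by_protein(image_proteins)
```

Splitting images at random instead would let other images of a test protein appear in the training set, which is the situation the talk argues leads to optimistic estimates.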
In the second part of my talk, I will discuss how classification implies a limited representation of the underlying biology, as proteins are often present in multiple organelles. I will present techniques that go beyond single-location assignment to fractional assignment. These techniques were applied to a large collection of images of fluorescently tagged mouse proteins, which included several proteins for which no location assignment had previously been reported in the literature.
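To make the idea of fractional assignment concrete, here is a minimal sketch under strong simplifying assumptions; it is not the technique presented in the talk. It assumes that an image is summarised by a numeric feature vector, that a protein split between two organelles has features that are a linear mixture of the two pure patterns, and that only two organelles are involved. The function name `mixing_fraction` and the toy vectors are hypothetical.

```python
def mixing_fraction(x, pure_a, pure_b):
    """Least-squares fraction of pattern A in x, clipped to [0, 1].

    Models x as alpha * pure_a + (1 - alpha) * pure_b and solves for alpha.
    """
    diff = [a - b for a, b in zip(pure_a, pure_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("the two pure patterns are identical")
    num = sum((xi - bi) * d for xi, bi, d in zip(x, pure_b, diff))
    return min(1.0, max(0.0, num / denom))


# Hypothetical two-dimensional features for two pure organelle patterns.
pattern_a = [1.0, 0.0]
pattern_b = [0.0, 1.0]
alpha = mixing_fraction([0.7, 0.3], pattern_a, pattern_b)
```

A classifier would report only the dominant pattern for this example, whereas the fractional view keeps the information that a substantial share of the protein sits in the second location.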
This work was performed at Carnegie Mellon University with Prof. Robert F. Murphy and Dr. Tao Peng.