Inferring network structure is a central problem in many fields of science and technology, including communication systems, biology, sociology, and neuroscience. Unfortunately, it is often difficult or impossible to obtain data that directly reveal the underlying network structure, so the network must be inferred from incomplete data. In this talk, we will look at the problem of inferring network structure from “co-occurrence” observations.
These observations identify which network components (e.g., switches and routers in a communication network, or genes in a gene regulatory network) co-occur in a path, but do not indicate the order in which they occur along that path. Without order information, the number of networks consistent with the data grows exponentially with the size of the network (i.e., the number of nodes). Yet the basic engineering and evolutionary principles underlying most networks strongly suggest that not all networks consistent with the observations are equally likely. In particular, nodes that often co-occur are probably closer to each other than nodes that rarely co-occur. This rationale suggests modeling co-occurrence observations as independent realizations of a Markovian random walk on the network, subjected to a random permutation that accounts for the lack of order information.
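As a rough illustration of this generative model, here is a minimal Python sketch under assumptions of our own: the network is represented by an initial-node distribution `pi` and a row-stochastic transition matrix `A` (plain lists), each path has a fixed `length`, and the function name `sample_cooccurrence` is hypothetical. The walk is simulated step by step, and the visited nodes are then shuffled so that the observer sees only which nodes co-occurred, not their order. The sketch does not forbid revisiting a node, which a model of loop-free paths would have to enforce.

```python
import random


def sample_cooccurrence(pi, A, length, rng):
    """Draw one co-occurrence observation from a Markov random-walk model.

    pi     -- initial-node probabilities (list of floats)
    A      -- row-stochastic transition matrix (list of lists)
    length -- number of nodes in the path (a hypothetical fixed length)
    rng    -- a random.Random instance, for reproducibility

    Returns (path, observed): the hidden ordered walk and the same nodes
    after a uniformly random permutation, which is all the observer sees.
    """
    nodes = range(len(pi))
    path = [rng.choices(nodes, weights=pi)[0]]  # start node drawn from pi
    while len(path) < length:
        # next node drawn from the row of A for the current node
        path.append(rng.choices(nodes, weights=A[path[-1]])[0])
    observed = path[:]
    rng.shuffle(observed)  # random permutation hides the order
    return path, observed
```

With a deterministic `pi` and `A` the hidden path is fixed, yet the observed order still depends on the random permutation, which is exactly the information the inference procedure must recover.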
Treating the permutations as missing data allows us to derive an expectation-maximization (EM) algorithm for estimating the random-walk parameters. The model and EM algorithm significantly simplify the problem, but the computational complexity of the reconstruction still grows exponentially with the length of each transmission path, so for networks with long paths the exact E-step may be computationally intractable. We thus propose a polynomial-time Monte Carlo EM algorithm based on importance sampling and derive conditions that ensure its convergence with high probability. Finally, we report simulations, as well as experiments with Internet measurements and biological network inference, that demonstrate the promise of this approach.
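To make the cost of the E-step concrete, the following minimal Python sketch computes, for a single observation, the expected number of i-to-j transitions in the hidden path given the observed set of nodes. It assumes the random walk is parameterized by an initial-node distribution `pi` and a transition matrix `A` (plain lists), and that the nodes in an observation are distinct; all function names are hypothetical. The exact version enumerates every one of the `len(obs)!` orderings. The Monte Carlo version uses self-normalized importance sampling with a sequential proposal of our own choosing: each next node is drawn among the not-yet-placed nodes in proportion to `pi` or to the current row of `A`, so the importance weight reduces to the product of the per-step normalizers. This proposal is only an illustrative choice, not necessarily the one used in the work described here, and the sketch does not implement the convergence conditions mentioned above.

```python
import itertools
import random


def path_weight(order, pi, A):
    """Unnormalized probability that the walk visits the nodes in this order."""
    w = pi[order[0]]
    for a, b in zip(order, order[1:]):
        w *= A[a][b]
    return w


def expected_transitions_exact(obs, pi, A):
    """Exact E-step statistic: enumerates all len(obs)! orderings."""
    counts, total = {}, 0.0
    for order in itertools.permutations(obs):
        w = path_weight(order, pi, A)
        total += w
        for a, b in zip(order, order[1:]):
            counts[(a, b)] = counts.get((a, b), 0.0) + w
    return {k: v / total for k, v in counts.items()}


def sample_order(obs, pi, A, rng):
    """Sequential proposal q: draw each next node among those not yet placed,
    in proportion to pi (first node) or to A[previous] (later nodes).
    Returns the ordering and its importance weight p(order) / q(order)."""
    remaining, order, weight = list(obs), [], 1.0
    while remaining:
        row = A[order[-1]] if order else pi
        probs = [row[j] for j in remaining]
        z = sum(probs)
        if z == 0.0:
            return tuple(order), 0.0  # dead end: every completion has p = 0
        k = rng.choices(range(len(remaining)), weights=probs)[0]
        order.append(remaining.pop(k))
        weight *= z  # p/q is the product of the per-step normalizers
    return tuple(order), weight


def expected_transitions_mc(obs, pi, A, num_samples, rng):
    """Self-normalized importance-sampling estimate of the same statistic,
    costing O(num_samples * len(obs)**2) instead of O(len(obs)!)."""
    counts, total = {}, 0.0
    for _ in range(num_samples):
        order, w = sample_order(obs, pi, A, rng)
        if w == 0.0:
            continue
        total += w
        for a, b in zip(order, order[1:]):
            counts[(a, b)] = counts.get((a, b), 0.0) + w
    return {k: v / total for k, v in counts.items()}
```

In a standard EM iteration for a Markov chain, such expected counts would be summed over all observations and each row normalized to give the updated `A`, with an analogous update of `pi` from the expected start nodes.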
The work reported in this talk was done in collaboration with Prof. Michael Rabbat (McGill University, Canada) and Prof. Robert D. Nowak (University of Wisconsin, USA).