My current research in machine learning focuses on convex models for learning predictive representations. Most existing machine learning methods infer a representation from data in a way that is independent of its subsequent use, e.g., learning a predictive model; this is suboptimal. My research goal is to jointly infer latent representations and learn predictors for massive datasets by combining the two tasks into a single convex optimization problem. Convexity guarantees that a jointly optimal solution to both tasks can be found, and it enables efficient scaling to large application problems. To achieve this goal, my key strategies are: 1) finding appropriate convex relaxations that retain the structure of the data, e.g., semidefinite relaxations; and 2) designing efficient optimization algorithms, such as low-rank approximation.
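As an illustrative sketch only (not the specific formulation of this research program), one standard way to couple representation learning and prediction in a single convex problem is nuclear-norm-regularized multi-task regression: a low-rank weight matrix W factors as W = UV, where U maps inputs to a shared latent representation and V predicts from it, and the nuclear norm is the convex surrogate for rank. The function names below (`svt`, `nuclear_norm_regression`) and all constants are hypothetical choices for the sketch; the solver is plain proximal gradient with singular value thresholding.

```python
import numpy as np

def svt(W, tau):
    # Singular value thresholding: the proximal operator of tau * ||.||_*
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def nuclear_norm_regression(X, Y, lam=1.0, iters=1000):
    """Proximal gradient for the convex problem
        min_W  0.5 * ||X W - Y||_F^2 + lam * ||W||_*
    A low-rank solution W = U V jointly encodes a latent
    representation (U) and a predictor on top of it (V)."""
    d, k = X.shape[1], Y.shape[1]
    W = np.zeros((d, k))
    lr = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(iters):
        grad = X.T @ (X @ W - Y)          # gradient of the smooth loss
        W = svt(W - lr * grad, lr * lam)  # prox step shrinks singular values
    return W

# Toy multi-task data: 10 tasks sharing a rank-2 latent structure
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
W_true = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 10))
Y = X @ W_true + 0.01 * rng.standard_normal((200, 10))
W_hat = nuclear_norm_regression(X, Y, lam=5.0)
```

Because the objective is jointly convex in W, the solver cannot get trapped in the local minima that plague directly optimizing the factors U and V, and the SVD-based prox step is exactly the kind of low-rank structure that efficient large-scale algorithms exploit.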