Personalized medicine entails a vision where biomedical data is used both to generate static individual risk profiles (through personalized genomes) and to track patients' health longitudinally using dynamic molecular and physiological data. Mass spectrometry (MS) plays a critical role in providing quantitative molecular profiles of patients over time, allowing early detection of disease and monitoring of interventions. While MS-based proteomics and metabolomics technologies capture crucial dynamic information on the molecular level, they currently lag behind sequencing-based technologies in throughput, reproducibility and coverage.
Personalized medicine is based on the premise of "biochemical individuality", i.e. the notion that individuals differ in their biochemical make-up and (i) may have different "normal" (healthy) levels of certain analytes and (ii) may react differently to external interventions and perturbations (such as lifestyle changes or medication). To understand the impact of protein and metabolite levels on health and disease at a personal level, we need to be able to track these analytes with mass spectrometry and to interpret their variation in the context of inter-personal and intra-personal variation as well as the personalized genome.
Mass spectrometry-based quantitative proteomics allows researchers to accurately quantify the dynamics of protein abundance and protein activity in biological systems. To increase the quantitative accuracy and the throughput of proteomics methods, we have developed a novel targeted proteomics method called SWATH-MS. It is based on data-independent acquisition (DIA) and aims to complement traditional mass spectrometry-based proteomics techniques such as shotgun and SRM methods. In principle, it allows a complete and permanent recording of all fragment ions of all peptide precursors in a biological sample and can thus potentially combine the advantages of shotgun (high throughput) with those of SRM (high reproducibility and sensitivity).
To analyze the SWATH-MS data, we developed OpenSWATH, automated software that performs targeted data extraction from the SWATH-MS maps. The software performs data extraction, peak picking and feature detection in chromatographic traces, so that a complete SWATH-MS data analysis runs without manual intervention; the only inputs are the raw MS/MS files and a transition library that defines the targeted data extraction. After feature detection, we use the mProphet algorithm for error rate estimation.
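The idea behind targeted data extraction can be illustrated with a minimal sketch. It is not the OpenSWATH implementation: the function names (extract_xic, pick_apex), the 50 ppm tolerance and the toy spectra are assumptions made for illustration only. For each fragment m/z from a transition library, the sketch sums the signal within a tolerance window in every fragment-ion spectrum, which yields one chromatographic trace per transition, and then reports the retention time at which the summed traces peak.

```python
from bisect import bisect_left, bisect_right


def extract_xic(spectra, target_mz, tol_ppm=50.0):
    """Extract an ion chromatogram for one fragment m/z.

    spectra: list of (retention_time, sorted_mz_list, intensity_list).
    Returns a list of (retention_time, summed_intensity) pairs.
    The 50 ppm default is an illustrative assumption.
    """
    delta = target_mz * tol_ppm * 1e-6
    trace = []
    for rt, mzs, intensities in spectra:
        # m/z values are sorted, so the window bounds can be found by bisection
        lo = bisect_left(mzs, target_mz - delta)
        hi = bisect_right(mzs, target_mz + delta)
        trace.append((rt, sum(intensities[lo:hi])))
    return trace


def pick_apex(traces):
    """Return (retention_time, intensity) where the summed traces are highest."""
    summed = [(points[0][0], sum(p[1] for p in points)) for points in zip(*traces)]
    return max(summed, key=lambda point: point[1])


# Toy data (hypothetical): three spectra from one isolation window
spectra = [
    (10.0, [300.1, 500.25, 700.4], [5.0, 10.0, 2.0]),
    (11.0, [300.1, 500.25, 700.4], [5.0, 80.0, 40.0]),
    (12.0, [300.1, 500.25, 700.4], [5.0, 20.0, 9.0]),
]
transitions = [500.25, 700.4]  # hypothetical fragment m/z values from a library

traces = [extract_xic(spectra, mz) for mz in transitions]
apex = pick_apex(traces)
print(apex)
```

A real analysis additionally has to select the correct isolation window for each precursor, align retention times, score candidate peak groups and estimate error rates; none of that is modeled here.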
Using SWATH-MS in conjunction with OpenSWATH, we have quantified over 900 proteins in the pathogen Streptococcus pyogenes in a single LC-MS/MS injection (more than any previous study), allowing us to study the response of the pathogen to human blood plasma in unprecedented detail. We could also quantify over 1900 human proteins in an AP-MS pulldown experiment and identify over 500 high-confidence physical protein-protein interactions of the 14-3-3β scaffold protein, giving us direct insight into the dynamics of a large protein interaction network.
We are interested in studying human variation in healthy subjects in collaboration with other research labs. The hPOP (Human Personalized Omics Profiling) project is designed to study the variance of molecular markers across a large number of participants. Recent advances in high-throughput technologies allow profiling of thousands of analytes within a single experiment. These measurements could potentially be used to diagnose disease early, monitor treatment progression and stratify patient groups to ensure each individual obtains the treatment best suited to their needs. This personalized approach to medicine would include continuous monitoring of thousands of parameters over a whole lifetime. However, to interpret such data, we need a better understanding of the underlying natural variation of these molecular parameters in health and disease. Only if we know the natural ranges of individual analytes, the expected responses to perturbations and the long-term trends in their levels can we draw meaningful conclusions from comprehensive personalized profiling.
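As a simple illustration of what "natural variation" means in practice, the following sketch splits the variation of a single analyte into a within-person and a between-person component. It is a naive estimate under stated assumptions, not the analysis used in the hPOP project: the function name, the participant labels and the measurement values are hypothetical, and the between-person term is simply the variance of the per-person means (which still contains a share of the within-person noise).

```python
from statistics import mean, variance


def variance_components(measurements):
    """Naive split of analyte variation into within- and between-person parts.

    measurements: dict mapping a participant to a list of repeated
    measurements (at least two per participant, at least two participants).
    Returns (within_person_variance, between_person_variance).
    """
    per_person_variance = [variance(values) for values in measurements.values()]
    per_person_mean = [mean(values) for values in measurements.values()]
    within = mean(per_person_variance)   # average spread around each personal mean
    between = variance(per_person_mean)  # spread of the personal means
    return within, between


# Hypothetical repeated measurements of one analyte in three participants
analyte = {
    "p1": [10.0, 11.0, 9.0],
    "p2": [20.0, 21.0, 19.0],
    "p3": [15.0, 15.0, 15.0],
}

within, between = variance_components(analyte)
print(within, between)
```

In this toy example the between-person variance is much larger than the within-person variance, which is the situation in which a personal baseline is more informative than a population-wide reference range.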
Often, biological phenomena cannot be studied or measured directly (or doing so would be too resource-intensive) and researchers need to use in silico simulations to analyze complex phenomena. Well-known examples in computational biology include protein folding and kinetic modelling, where computer simulation-based approaches are used heavily. In mass spectrometry-based proteomics, peptide digestion, chromatographic separation and collision-induced dissociation of charged peptide precursor ions are complex phenomena that can be approached using simulation to provide predictions and insights into error rates during peptide identification and peptide quantification.
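The first of these steps, peptide digestion, is the easiest to simulate. The sketch below assumes the commonly used simplified trypsin rule (cleavage after K or R unless the next residue is P) and an optional number of missed cleavages; the function names and the example sequence are hypothetical and not taken from our software.

```python
def cleavage_sites(protein):
    """Return cut positions for the simplified trypsin rule.

    Cleaves after K or R unless the next residue is P. The start and end
    of the sequence are included as boundaries.
    """
    sites = [0]
    for i in range(len(protein) - 1):
        if protein[i] in "KR" and protein[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(protein))
    return sites


def tryptic_digest(protein, missed_cleavages=0, min_length=1):
    """Return in silico tryptic peptides with up to `missed_cleavages` skipped sites."""
    sites = cleavage_sites(protein)
    peptides = []
    for i in range(len(sites) - 1):
        last = min(i + 1 + missed_cleavages, len(sites) - 1)
        for j in range(i + 1, last + 1):
            peptide = protein[sites[i]:sites[j]]
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


# Hypothetical sequence: the R at position 3 is followed by P and is not cleaved
sequence = "MKRPAAKGGR"
fully_cleaved = tryptic_digest(sequence)
with_one_missed = tryptic_digest(sequence, missed_cleavages=1)
print(fully_cleaved)
print(with_one_missed)
```

Real digestion is less clean than this rule suggests, which is one reason why simulated peptide lists are usually generated with some allowance for missed cleavages.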
Our software, the SRMCollider, models all individual steps of an LC-MS/MS experiment (digestion, chromatographic separation, fragmentation), specifically taking into account the challenges of targeted proteomics, where only a few fragment ions are monitored for each peptide. This allowed us to investigate the question of assay redundancy in SRM and SWATH-MS experiments and to make concrete predictions about assay specificity in a targeted proteomics setting. We have successfully applied these simulations in multiple studies in the Aebersold lab, covering proteomes as diverse as Mycobacterium tuberculosis, Saccharomyces cerevisiae and Homo sapiens.
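The question of assay specificity can be illustrated with a small sketch. It is not the SRMCollider code: the function names, the peptide entries and the window widths are hypothetical, and retention time is ignored. For each transition of a target peptide, the sketch collects the background peptides whose precursor falls into the same precursor isolation window and that have a fragment within the fragment window. It then searches for the smallest set of transitions that no single background peptide matches in full.

```python
from itertools import combinations


def interfering_peptides(target, background, q1_window, q3_window):
    """Map each target fragment m/z to the background peptides colliding with it."""
    hits = {}
    for fragment in target["fragments"]:
        hits[fragment] = set()
        for name, peptide in background.items():
            # The precursor must fall into the same isolation window
            if abs(peptide["precursor"] - target["precursor"]) > q1_window / 2:
                continue
            # At least one fragment must fall into the fragment window
            if any(abs(f - fragment) <= q3_window / 2 for f in peptide["fragments"]):
                hits[fragment].add(name)
    return hits


def minimal_unique_set_size(hits):
    """Smallest number of transitions not fully shared with any one background peptide.

    Returns None if even the complete transition set is shared.
    """
    fragments = list(hits)
    for size in range(1, len(fragments) + 1):
        for combo in combinations(fragments, size):
            if not set.intersection(*(hits[f] for f in combo)):
                return size
    return None


# Hypothetical target assay and background peptides (m/z values in Th)
target = {"precursor": 500.30, "fragments": [400.20, 600.30, 750.40]}
background = {
    "pepA": {"precursor": 500.50, "fragments": [400.25, 600.35, 300.00]},
    "pepB": {"precursor": 510.00, "fragments": [750.45, 400.22]},
}

# Illustrative window widths only: a narrow and a wide precursor window
narrow = interfering_peptides(target, background, q1_window=0.7, q3_window=1.0)
wide = interfering_peptides(target, background, q1_window=25.0, q3_window=1.0)
print(minimal_unique_set_size(narrow), minimal_unique_set_size(wide))
```

With the narrow precursor window, a single transition is already unique in this toy example; with the wide window, a second background peptide enters and two transitions are needed. This is the kind of trade-off such simulations are meant to quantify on a proteome-wide scale.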