Publication Title

Journal of biomolecular techniques : JBT


In untargeted metabolomics experiments, library search engines detect metabolites using several features, including precursor mass, isotopic distribution, retention time, and MS2 fragmentation. Matching acquired MS2 spectra to library spectra is vital because numerous compounds share molecular formulas, resulting in identical precursor measurements and similar retention times. However, many metabolomics experiments are still collected using LC-MS only, and even in LC-MS/MS experiments many precursors lack MS2 spectra due to the stochastic nature of data-dependent acquisition. We observe that when metabolites ionize they can produce unanticipated MS1 features resulting from neutral losses, in-source fragmentation, multimerization, and adducts. Here we present a new approach that leverages these measurements to identify metabolites when MS2 spectra are of low quality or not available. We processed datasets of 75 known standards mixed with whole yeast lysates, stripping them of their MS2 scans to produce a gold-standard MS1-only dataset of a complex metabolome with known targets. For each dataset we determined the proportion of unambiguous annotations (where the correct annotation had a higher score than other potential annotations) and unmistakable annotations (where the correct annotation was the only valid annotation detected). We found that incorporating in-source fragments improved these metrics for both MS1-only datasets (increasing from 60% to 73% unambiguous and from 40% to 65% unmistakable matches) and MS2 datasets (from 79% to 84% unambiguous and from 41% to 60% unmistakable). Unexpectedly, in these data we observed that the MS2 spectra were less useful than in-source fragment data for improving identification accuracy. We believe this is largely because the low-resolution ion trap MS2 spectra collected in this experiment show significant noise, which diminishes spectral match scores and allows other candidates to outscore the correct identifications. We suspect that noise is less likely to affect MS1 peak groups because they are generated from data aggregated across multiple high-resolution MS1 scans.
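
The two evaluation metrics defined above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the score dictionaries, and the example compounds are hypothetical, and it assumes every candidate listed for a feature counts as a valid annotation.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: for each known standard, the candidate
# annotations a search engine returned (name -> match score) and the
# name of the compound that is actually correct.
Annotation = Tuple[Dict[str, float], str]


def classify_annotation(candidate_scores: Dict[str, float], correct: str) -> Dict[str, bool]:
    """Label one annotation as unambiguous and/or unmistakable."""
    if correct not in candidate_scores:
        # The correct compound was not detected at all.
        return {"unambiguous": False, "unmistakable": False}
    correct_score = candidate_scores[correct]
    other_scores = [s for name, s in candidate_scores.items() if name != correct]
    # Unambiguous: the correct annotation outscores every other candidate.
    unambiguous = all(correct_score > s for s in other_scores)
    # Unmistakable: the correct annotation is the only candidate.
    unmistakable = len(other_scores) == 0
    return {"unambiguous": unambiguous, "unmistakable": unmistakable}


def summarize(annotations: List[Annotation]) -> Dict[str, float]:
    """Return the proportion of unambiguous and unmistakable annotations."""
    labels = [classify_annotation(scores, correct) for scores, correct in annotations]
    total = len(labels)
    return {
        "unambiguous": sum(label["unambiguous"] for label in labels) / total,
        "unmistakable": sum(label["unmistakable"] for label in labels) / total,
    }


# Made-up scores for illustration only; these are not data from the study.
example = [
    ({"leucine": 0.9, "isoleucine": 0.7}, "leucine"),  # unambiguous only
    ({"citrate": 0.8}, "citrate"),                     # unambiguous and unmistakable
    ({"glucose": 0.5, "fructose": 0.6}, "glucose"),    # outscored by another candidate
    ({"alanine": 0.4}, "sarcosine"),                   # correct compound not detected
]
summary = summarize(example)
```

Under these definitions every unmistakable annotation is also unambiguous, so the unmistakable proportion can never exceed the unambiguous one, which agrees with the percentages reported above.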


Institute for Systems Biology