We use the spectral entropy and the normalized entropy to measure the complexity of the spectra in the MoNA database and to label low-quality spectra.
When a spectrum has a normalized entropy (see Figure 4) higher than 0.8 and a spectral entropy higher than 3, we label it as a low-quality spectrum. Note: GC-MS spectra are an exception; they are always labeled as clean, regardless of their normalized entropy and spectral entropy.
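The labeling rule can be expressed as a small predicate. This is a minimal sketch, assuming the spectral entropy and normalized entropy have already been computed for each spectrum; the function name `is_low_quality` and the `is_gc_ms` flag are hypothetical and not part of any MoNA API. Both comparisons are strict, matching "higher than" in the rule.

```python
def is_low_quality(spectral_entropy: float,
                   normalized_entropy: float,
                   is_gc_ms: bool = False) -> bool:
    """Hypothetical helper: True if a spectrum should be labeled low quality.

    Both conditions must hold: normalized entropy > 0.8 AND spectral
    entropy > 3. GC-MS spectra are exempt and always treated as clean.
    """
    if is_gc_ms:
        return False
    return normalized_entropy > 0.8 and spectral_entropy > 3.0


print(is_low_quality(3.5, 0.9))                 # both thresholds exceeded
print(is_low_quality(3.5, 0.6))                 # normalized entropy too low
print(is_low_quality(3.5, 0.9, is_gc_ms=True))  # GC-MS exception
```

Requiring both conditions means a spectrum with many evenly distributed peaks is flagged only if it also carries a large absolute amount of information.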
Spectral entropy is a measure of the total information content of an MS/MS spectrum. Many people use the number of peaks to measure spectral complexity; however, spectra with the same number of peaks do not necessarily contain the same amount of information. For example, the three spectra in Figure 1 all have the same number of peaks, but their information content differs. We therefore use spectral entropy to measure the information content of a spectrum more precisely.
Figure 1: Spectra with the same number of peaks but different spectral entropies.
Spectral entropy is analogous to Shannon entropy in information theory and is calculated with the following formula:
Figure 2: Formula for calculating spectral entropy.
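Written out, the formula is the Shannon entropy (natural logarithm) of the peak intensities after they are normalized to sum to 1, as defined by Li et al. (2021). The symbols below are our own notation: x_i is the raw intensity of peak i, I_i its normalized intensity, and n the number of peaks.

```latex
S = -\sum_{i=1}^{n} I_i \ln I_i,
\qquad
I_i = \frac{x_i}{\sum_{j=1}^{n} x_j}
```

With this definition S = 0 for a single-peak spectrum and S = ln(n) when all n peaks have equal intensity, which gives the range used for the normalized entropy.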
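Alternatively, spectral entropy can be viewed as an intensity-weighted count of peaks: a spectrum with n peaks of equal intensity has an entropy of ln(n), whereas a spectrum dominated by a single peak has an entropy close to 0, however many minor peaks it contains.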
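A minimal Python sketch of the calculation, assuming a spectrum is given as a plain list of peak intensities (m/z values are not needed for the entropy itself). The three 4-peak spectra are made-up examples in the spirit of Figure 1, not the spectra shown there. Taking exp(S) turns the entropy into an "effective" number of equally intense peaks, which makes the intensity-weighted peak-count reading concrete.

```python
import math


def spectral_entropy(intensities):
    """Shannon entropy (natural log) of a spectrum's peak intensities.

    Intensities are normalized to sum to 1 first; zero-intensity peaks
    contribute nothing (0 * ln 0 is taken as 0).
    """
    total = sum(intensities)
    if total <= 0:
        raise ValueError("spectrum must have positive total intensity")
    entropy = 0.0
    for x in intensities:
        if x > 0:
            p = x / total
            entropy -= p * math.log(p)
    return entropy


# Hypothetical 4-peak spectra: same peak count, different information content.
spectra = {
    "even": [25, 25, 25, 25],       # intensity spread evenly
    "skewed": [70, 10, 10, 10],     # one major peak, three minor ones
    "dominated": [997, 1, 1, 1],    # one peak carries almost all intensity
}

for name, peaks in spectra.items():
    s = spectral_entropy(peaks)
    # exp(S): effective number of equally intense peaks
    print(f"{name}: S = {s:.3f}, exp(S) = {math.exp(s):.2f}")
```

All three lists have four peaks, yet only the evenly distributed one reaches the maximum entropy ln(4); the dominated spectrum behaves almost like a single-peak spectrum.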
Most small-molecule spectra in the NIST20, MassBank.us and GNPS databases have spectral entropies between 0 and 5 (Figure 3).
Figure 3: Distribution of spectral entropies in the NIST20, MassBank.us and GNPS databases.
Spectral entropy ranges from 0 to ln(peak number), with the maximum reached when all peaks have equal intensity. Dividing the spectral entropy by ln(peak number) therefore normalizes it to the range 0 to 1, which gives the normalized entropy:
Figure 4: Formula for calculating normalized entropy.
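The normalization can be sketched in Python as follows, again assuming a spectrum is a list of peak intensities. Two choices here are our own assumptions, since the text above does not specify them: only peaks with positive intensity are counted in the peak number, and a single-peak spectrum, for which ln(1) = 0 makes the ratio undefined, is returned as 0.0. The function name `normalized_entropy` is illustrative.

```python
import math


def normalized_entropy(intensities):
    """Spectral entropy divided by ln(peak number); result lies in [0, 1].

    Assumptions: only peaks with positive intensity are counted, and a
    single-peak spectrum (ln 1 = 0, ratio undefined) is returned as 0.0.
    """
    peaks = [x for x in intensities if x > 0]
    n = len(peaks)
    if n == 0:
        raise ValueError("spectrum has no peaks with positive intensity")
    if n == 1:
        return 0.0
    total = sum(peaks)
    # Spectral entropy S = -sum(I_i * ln I_i) over normalized intensities I_i
    entropy = -sum((x / total) * math.log(x / total) for x in peaks)
    return entropy / math.log(n)


print(round(normalized_entropy([25, 25, 25, 25]), 3))  # evenly spread peaks
print(round(normalized_entropy([997, 1, 1, 1]), 3))    # one dominant peak
```

Because the peak count is divided out, the normalized entropy describes how evenly intensity is spread, independent of how many peaks there are; this is why the low-quality rule at the top of this section pairs it with the absolute spectral entropy.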
We injected 216 metabolites at 13 different concentrations and acquired spectra at 3 different collision energies, resulting in 9,695 spectra. We divided these spectra into two groups, high quality and low quality; the distribution of normalized entropy in each group is shown in Figure 5.
Figure 5: Distribution of normalized entropies in spectra of different quality.
Yuanyue Li, Tobias Kind, Jacob Folz, Arpana Vaniya, Sajjan Singh Mehta & Oliver Fiehn. Spectral entropy outperforms MS/MS dot product similarity for small-molecule compound identification. Nat Methods 18, 1524–1531 (2021).