Confidence Estimation

One of the primary purposes of mokapot is to assign confidence estimates to PSMs. This task is accomplished by ranking PSMs according to a score and using an appropriate confidence estimation procedure for the type of data. mokapot can provide confidence estimates based on any score, regardless of whether it was the result of a learned Model() instance or provided independently.
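To illustrate what "ranking PSMs according to a score" and estimating confidence can look like, here is a minimal pure-Python sketch of q-value estimation by target-decoy competition. It assumes the (decoys + 1) / targets FDR estimator and caps q-values at 1.0; the function name tdc_qvalues is hypothetical, and this is not presented as mokapot's own implementation.

```python
from typing import List, Sequence


def tdc_qvalues(
    scores: Sequence[float], is_target: Sequence[bool], desc: bool = True
) -> List[float]:
    """Estimate q-values by target-decoy competition (TDC).

    At each score threshold the FDR is estimated as (decoys + 1) / targets;
    a PSM's q-value is the smallest FDR at its threshold or any less
    stringent one. Values are capped at 1.0.
    """
    n = len(scores)
    # Rank PSMs best-first; desc=True means higher scores are better.
    order = sorted(range(n), key=lambda i: scores[i], reverse=desc)

    # Cumulative FDR estimate at each rank position.
    fdrs = [1.0] * n
    targets = decoys = 0
    for pos, idx in enumerate(order):
        if is_target[idx]:
            targets += 1
        else:
            decoys += 1
        if targets:
            fdrs[pos] = (decoys + 1) / targets

    # PSMs with tied scores share one threshold, so every position in a
    # tied block gets the FDR from the end of that block.
    end = n - 1
    while end >= 0:
        start = end
        while start > 0 and scores[order[start - 1]] == scores[order[end]]:
            start -= 1
        for pos in range(start, end + 1):
            fdrs[pos] = fdrs[end]
        end = start - 1

    # q-value: running minimum of the FDR, scanning from worst to best.
    qvalues = [1.0] * n
    running_min = 1.0
    for pos in range(n - 1, -1, -1):
        running_min = min(running_min, fdrs[pos])
        qvalues[order[pos]] = running_min
    return qvalues
```

For example, tdc_qvalues([10, 9, 8, 7, 6], [True, True, False, True, False]) assigns a q-value of 0.5 to the two top-ranked targets. Passing desc=False reverses the ranking for scores where lower is better.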

The following classes store the confidence estimates for a dataset based on the provided score. They provide utilities to access, save, and plot these estimates for the various relevant levels (i.e. PSMs, peptides, and proteins). The LinearConfidence() class is appropriate for most data-dependent acquisition proteomics datasets.

We recommend using the brew() function or the assign_confidence() method to obtain these confidence estimates, rather than initializing the classes below directly.

class mokapot.confidence.GroupedConfidence(psms, scores, desc=True, eval_fdr=0.01)

Perform grouped confidence estimation for a collection of PSMs.

Groups are defined by the LinearPsmDataset. Confidence estimates for each group can be retrieved by using the group name as an attribute, or from the group_confidence_estimates property.
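Conceptually, grouped confidence estimation means estimating q-values separately within each group rather than over the pooled PSMs. The sketch below assumes target-decoy competition with the (decoys + 1) / targets estimator, higher-is-better scores, and no special handling of ties; simple_tdc and grouped_qvalues are hypothetical names, not part of mokapot.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def simple_tdc(scores: Sequence[float], is_target: Sequence[bool]) -> List[float]:
    """q-values by target-decoy competition, FDR = (decoys + 1) / targets.

    Assumes higher scores are better and does not special-case tied scores.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdrs = []
    targets = decoys = 0
    for idx in order:
        if is_target[idx]:
            targets += 1
        else:
            decoys += 1
        fdrs.append((decoys + 1) / targets if targets else 1.0)
    qvalues = [1.0] * len(scores)
    running_min = 1.0  # also caps q-values at 1.0
    for pos in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[pos])
        qvalues[order[pos]] = running_min
    return qvalues


def grouped_qvalues(
    groups: Sequence[Hashable],
    scores: Sequence[float],
    is_target: Sequence[bool],
) -> Dict[Hashable, Dict[int, float]]:
    """Estimate q-values independently within each group.

    Returns {group: {original PSM index: q-value}}.
    """
    members: Dict[Hashable, List[int]] = defaultdict(list)
    for i, group in enumerate(groups):
        members[group].append(i)

    result: Dict[Hashable, Dict[int, float]] = {}
    for group, idxs in members.items():
        q = simple_tdc([scores[i] for i in idxs], [is_target[i] for i in idxs])
        result[group] = dict(zip(idxs, q))
    return result
```

Keying the result by group name mirrors the idea of looking up each group's estimates by name, although the real class returns its own confidence objects rather than plain dictionaries.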

Parameters:

psms (LinearPsmDataset object): A collection of PSMs.
scores (np.ndarray): A vector containing the score of each PSM.
desc (bool): Are higher scores better?
eval_fdr (float): The FDR threshold at which to report performance. This parameter has no effect on the analysis itself, only on logging messages.
rng (int, np.random.Generator, optional): A seed or generator used to break ties, or None to use the default random number generator state.
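The desc and rng parameters together determine how PSMs are ranked: desc sets the direction, and the random state decides the order of PSMs with tied scores. The sketch below shows one way to get reproducible random tie-breaking by shuffling before a stable sort. It swaps in Python's random.Random for np.random.Generator, and rank_psms is a hypothetical name.

```python
import random
from typing import List, Sequence, Union


def rank_psms(
    scores: Sequence[float],
    desc: bool = True,
    rng: Union[int, random.Random, None] = None,
) -> List[int]:
    """Return PSM indices ordered best-first, breaking ties randomly."""
    if not isinstance(rng, random.Random):
        # An int gives a reproducible seed; None seeds from the OS.
        rng = random.Random(rng)
    indices = list(range(len(scores)))
    rng.shuffle(indices)  # put tied PSMs in a random relative order...
    # ...which Python's stable sort keeps for equal keys, even with reverse=True.
    indices.sort(key=lambda i: scores[i], reverse=desc)
    return indices
```

Passing the same integer seed twice yields the same ranking, while rng=None gives a different tie order on each run.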

Attributes:

groups: List
group_confidence_estimates: Dict

Methods

to_txt([dest_dir, file_root, sep, decoys, ...])

Save confidence estimates to delimited text files.

property group_confidence_estimates: A dictionary of the confidence estimates for each group.

property groups: The groups for confidence estimation

to_txt(dest_dir=None, file_root=None, sep='\t', decoys=False, combine=False)

Save confidence estimates to delimited text files.

Parameters:

dest_dir (str or None, optional): The directory in which to save the files. None will use the current working directory.
file_root (str or None, optional): An optional prefix for the confidence estimate files. If combine=True, the suffix is “mokapot.{level}.txt”, where “{level}” indicates the level at which confidence estimation was performed (i.e. PSMs, peptides, proteins). If combine=False (the default), the group value is additionally prepended, yielding the suffix “{group}.mokapot.{level}.txt”.
sep (str, optional): The delimiter to use.
decoys (bool, optional): Save decoy confidence estimates as well?
combine (bool, optional): Should groups be combined into a single file?
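The naming rule described for file_root and combine can be sketched as a small helper that lists the expected output paths. It assumes file_root is joined to the rest of the name with a "." and ignores decoy files; grouped_output_paths is a hypothetical helper, not a mokapot function.

```python
import os
from typing import Hashable, List, Optional, Sequence


def grouped_output_paths(
    levels: Sequence[str],
    groups: Sequence[Hashable],
    dest_dir: Optional[str] = None,
    file_root: Optional[str] = None,
    combine: bool = False,
) -> List[str]:
    """Build file names following the naming scheme described for to_txt().

    Assumption: file_root is joined to the rest of the name with a ".".
    """
    # One file per level when combined; otherwise one per (group, level).
    prefixes = [""] if combine else [f"{group}." for group in groups]
    paths = []
    for prefix in prefixes:
        for level in levels:
            name = f"{prefix}mokapot.{level}.txt"
            if file_root is not None:
                name = f"{file_root}.{name}"
            # dest_dir=None leaves a relative path, i.e. the current directory.
            paths.append(name if dest_dir is None else os.path.join(dest_dir, name))
    return paths
```

With two groups and two levels, combine=False produces four files, whereas combine=True produces one file per level.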

Returns:

list of str: The paths to the saved files.

class mokapot.confidence.LinearConfidence(psms, scores, desc=True, eval_fdr=0.01)

Assign confidence estimates to a set of PSMs.

Estimate q-values and posterior error probabilities (PEPs) for PSMs and peptides when ranked by the provided scores.
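One common way to move from PSM-level to peptide-level estimates, and the approach assumed here, is to keep only the best-scoring PSM for each peptide and then repeat the q-value estimation on that reduced set. The sketch below covers only the deduplication step. The name best_psm_per_peptide is hypothetical, and the source does not state that LinearConfidence works exactly this way.

```python
from typing import Dict, List, Sequence


def best_psm_per_peptide(
    peptides: Sequence[str], scores: Sequence[float], desc: bool = True
) -> List[int]:
    """Return the index of the best-scoring PSM for each peptide.

    Indices are returned in the order each peptide is first seen.
    """
    best: Dict[str, int] = {}
    for i, (peptide, score) in enumerate(zip(peptides, scores)):
        if peptide not in best:
            best[peptide] = i
            continue
        current = scores[best[peptide]]
        # desc=True means higher scores are better.
        if (score > current) if desc else (score < current):
            best[peptide] = i  # updating a key keeps its dict position
    return list(best.values())
```

The surviving PSMs would then be passed through the same target-decoy procedure used at the PSM level.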

Parameters:

psms (LinearPsmDataset object): A collection of PSMs.
scores (np.ndarray): A vector containing the score of each PSM.
desc (bool): Are higher scores better?
eval_fdr (float): The FDR threshold at which to report performance. This parameter has no effect on the analysis itself, only on logging messages.
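Because eval_fdr only changes what is reported, its role can be pictured as counting the target discoveries whose already-computed q-values fall at or below the threshold. The sketch below assumes that interpretation. log_discoveries and its message text are hypothetical and do not reproduce mokapot's actual log output.

```python
import logging
from typing import Sequence

logger = logging.getLogger(__name__)


def log_discoveries(
    qvalues: Sequence[float],
    is_target: Sequence[bool],
    level: str = "PSMs",
    eval_fdr: float = 0.01,
) -> int:
    """Count and log target discoveries at eval_fdr.

    The q-values are only read, never modified, so eval_fdr cannot
    influence the confidence estimates themselves.
    """
    n_accepted = sum(1 for q, t in zip(qvalues, is_target) if t and q <= eval_fdr)
    logger.info("Found %d %s at q<=%g.", n_accepted, level, eval_fdr)
    return n_accepted
```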

Attributes:

levels (list of str): The available levels for confidence estimates.
psms (pandas.DataFrame): Confidence estimates for PSMs in the dataset.
peptides (pandas.DataFrame): Confidence estimates for peptides in the dataset.
proteins (pandas.DataFrame or None): Confidence estimates for proteins in the dataset.
confidence_estimates (Dict[str, pandas.DataFrame]): A dictionary of confidence estimates at each level.
decoy_confidence_estimates (Dict[str, pandas.DataFrame]): A dictionary of confidence estimates for the decoys at each level.

Methods

plot_qvalues([level, threshold, ax])	Plot the cumulative number of discoveries over a range of q-values.
to_flashlfq([out_file])	Save confident peptides for quantification with FlashLFQ.
to_txt([dest_dir, file_root, sep, decoys])	Save confidence estimates to delimited text files.

to_flashlfq(out_file='mokapot.flashlfq.txt')

Save confident peptides for quantification with FlashLFQ.

FlashLFQ is an open-source tool for label-free quantification. For mokapot to save results in a compatible format, a few extra columns are required to be present, which specify the MS data file name, the theoretical peptide monoisotopic mass, the retention time, and the charge for each PSM. If these are not present, saving to the FlashLFQ format is disabled.

Note that protein grouping in the FlashLFQ results will be more accurate if proteins were added for analysis with mokapot.
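The column requirement described above amounts to an all-or-nothing check: FlashLFQ export is only possible when every one of the four extra pieces of information is available. The sketch below expresses that check over a hypothetical mapping from each required field to the column that holds it. The keys and the name flashlfq_enabled are illustrative assumptions, not mokapot's actual parameter or column names.

```python
from typing import Mapping, Optional

# Hypothetical keys for the four extra fields: MS data file name,
# theoretical peptide monoisotopic mass, retention time, and charge.
REQUIRED_FOR_FLASHLFQ = ("filename", "calcmass", "rt", "charge")


def flashlfq_enabled(columns: Mapping[str, Optional[str]]) -> bool:
    """True only if every required field is mapped to a column name."""
    return all(columns.get(key) for key in REQUIRED_FOR_FLASHLFQ)
```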

Parameters:

out_file (str, optional): The output file to write.

Returns:

str: The path to the saved file.

property levels: The available levels for confidence estimates.

plot_qvalues(level='psms', threshold=0.1, ax=None, **kwargs)

Plot the cumulative number of discoveries over a range of q-values.

The available levels can be found using the levels attribute.

Parameters:

level (str, optional): The level of q-values to report.
threshold (float, optional): Indicates the maximum q-value to plot.
ax (matplotlib.pyplot.Axes, optional): The matplotlib Axes on which to plot. If None, the current Axes instance is used.
**kwargs (dict, optional): Arguments passed to matplotlib.pyplot.plot().

Returns:

matplotlib.pyplot.Axes: A matplotlib.axes.Axes with the cumulative number of accepted target PSMs or peptides.

to_txt(dest_dir=None, file_root=None, sep='\t', decoys=False)

Save confidence estimates to delimited text files.

Parameters:

dest_dir (str or None, optional): The directory in which to save the files. None will use the current working directory.
file_root (str or None, optional): An optional prefix for the confidence estimate files. The suffix will always be “mokapot.{level}.txt”, where “{level}” indicates the level at which confidence estimation was performed (i.e. PSMs, peptides, proteins).
sep (str, optional): The delimiter to use.
decoys (bool, optional): Save decoy confidence estimates as well?

Returns:

list of str: The paths to the saved files.

mokapot.confidence.plot_qvalues(qvalues, threshold=0.1, ax=None, **kwargs)

Plot the cumulative number of discoveries over a range of q-values.

Parameters:

qvalues (numpy.ndarray): The q-values to plot.
threshold (float, optional): Indicates the maximum q-value to plot.
ax (matplotlib.pyplot.Axes, optional): The matplotlib Axes on which to plot. If None, the current Axes instance is used.
**kwargs (dict, optional): Arguments passed to matplotlib.axes.Axes.plot().

Returns:

matplotlib.pyplot.Axes: A matplotlib.axes.Axes with the cumulative number of accepted target PSMs or peptides.
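The curve drawn by plot_qvalues, the cumulative number of discoveries at or below each q-value up to threshold, can be computed without matplotlib. The sketch below builds the x and y values under two assumptions: the input contains only target q-values, and tied q-values collapse into a single point. discovery_curve is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def discovery_curve(
    qvalues: Sequence[float], threshold: float = 0.1
) -> Tuple[List[float], List[int]]:
    """Return (q-values, cumulative discovery counts) up to threshold."""
    accepted = sorted(q for q in qvalues if q <= threshold)
    xs: List[float] = []
    ys: List[int] = []
    for count, q in enumerate(accepted, start=1):
        if xs and xs[-1] == q:
            ys[-1] = count  # tied q-values share one point
        else:
            xs.append(q)
            ys.append(count)
    return xs, ys
```

The resulting pair could then be drawn as a step plot, for example with matplotlib's Axes.step(xs, ys, where="post").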