Confidence Estimation

One of the primary purposes of mokapot is to assign confidence estimates to PSMs. This task is accomplished by ranking PSMs according to a score and using a confidence estimation procedure appropriate for the type of data. mokapot can provide confidence estimates based on any score, regardless of whether it was produced by a learned Model() instance or provided independently.
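To make the "rank, then estimate" idea concrete, here is a minimal sketch of q-value estimation by target-decoy competition. It is not mokapot's implementation: the function name tdc_qvalues, the is_target labels, and the conservative (decoys + 1) / targets estimate are assumptions chosen for illustration, and tied scores get no special handling.

```python
def tdc_qvalues(scores, is_target, desc=True):
    """Estimate q-values by target-decoy competition (simplified sketch)."""
    # Rank PSMs from best to worst score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=desc)

    # Walk down the ranking and estimate the FDR at each position as
    # (decoys + 1) / targets, capped at 1.
    fdr = []
    targets = decoys = 0
    for i in order:
        if is_target[i]:
            targets += 1
        else:
            decoys += 1
        fdr.append(min(1.0, (decoys + 1) / max(targets, 1)))

    # A q-value is the smallest FDR at which a PSM would still be accepted,
    # so take the running minimum from the bottom of the ranking upward.
    qvalues = [0.0] * len(scores)
    best = 1.0
    for rank in range(len(order) - 1, -1, -1):
        best = min(best, fdr[rank])
        qvalues[order[rank]] = best
    return qvalues


# Six PSMs, one of which is a decoy hit.
qvals = tdc_qvalues(
    scores=[10, 9, 8, 7, 6, 5],
    is_target=[True, True, True, False, True, True],
)
```

The returned list is in the same order as the input scores, so it can be attached to the PSMs as an extra column.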

The following classes store the confidence estimates for a dataset based on the provided score. They provide utilities to access, save, and plot these estimates for the various relevant levels (i.e. PSMs, peptides, and proteins). The LinearConfidence() class is appropriate for most data-dependent acquisition proteomics datasets.

We recommend using the brew() function or the assign_confidence() method to obtain these confidence estimates, rather than initializing the classes below directly.
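As a rough sketch of the recommended workflow, the helper below obtains confidence estimates through brew() or assign_confidence() instead of constructing the classes directly. The helper name estimate_confidence is hypothetical, and reading the input with mokapot.read_pin() assumes the PSMs are stored in a Percolator PIN file; adapt both to your own data.

```python
def estimate_confidence(pin_file, scores=None, dest_dir=None):
    """Return confidence estimates for the PSMs in a PIN file (sketch)."""
    # Imported here so the helper can be defined without mokapot installed.
    import mokapot

    # Assumption: the PSMs are stored in Percolator's PIN format.
    psms = mokapot.read_pin(pin_file)

    if scores is None:
        # Learn a model and use its scores; brew() returns the confidence
        # estimates together with the fitted models.
        results, _models = mokapot.brew(psms)
    else:
        # Use scores that were computed elsewhere.
        results = psms.assign_confidence(scores=scores)

    # Write one delimited text file per level and hand back the object.
    results.to_txt(dest_dir=dest_dir)
    return results
```

The returned object is one of the classes documented below, so its attributes and methods (for example peptides or plot_qvalues()) are available on it.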

class mokapot.confidence.GroupedConfidence(psms, scores, desc=True, eval_fdr=0.01)

Perform grouped confidence estimation for a collection of PSMs.

Groups are defined by the LinearPsmDataset. Confidence estimates for each group can be retrieved by using the group name as an attribute, or from the group_confidence_estimates property.

Parameters:
psms: LinearPsmDataset object

A collection of PSMs.

scores: np.ndarray

A vector containing the score of each PSM.

desc: bool

Are higher scores better?

eval_fdr: float

The FDR threshold at which to report performance. This parameter has no effect on the analysis itself, only on logging messages.

rng: int, np.random.Generator, optional

A seed or generator used to break ties, or None to use the default random number generator state.

Attributes:
groups: List
group_confidence_estimates: Dict

Methods

to_txt([dest_dir, file_root, sep, decoys, ...])

Save confidence estimates to delimited text files.

property group_confidence_estimates

A dictionary of the confidence estimates for each group.

property groups

The groups for confidence estimation.
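The two access routes described for GroupedConfidence (the group name as an attribute, or the group_confidence_estimates dictionary) can be pictured with a small stand-in class. GroupedResults and the group names below are hypothetical and exist only to illustrate the access pattern; this is not how mokapot is implemented.

```python
class GroupedResults:
    """Hypothetical stand-in that exposes per-group results as attributes."""

    def __init__(self, estimates):
        self._estimates = dict(estimates)

    @property
    def groups(self):
        # The group names, in insertion order.
        return list(self._estimates)

    @property
    def group_confidence_estimates(self):
        # The full mapping of group name to results.
        return self._estimates

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so it acts as a
        # fallback that looks the name up among the groups.
        try:
            return self.__dict__["_estimates"][name]
        except KeyError:
            raise AttributeError(name) from None


results = GroupedResults({"modified": "results A", "unmodified": "results B"})

# Both routes reach the same object.
same = results.modified is results.group_confidence_estimates["modified"]
```

Attribute access only works for group names that are valid Python identifiers; the dictionary works for any group value.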

to_txt(dest_dir=None, file_root=None, sep='\t', decoys=False, combine=False)

Save confidence estimates to delimited text files.

Parameters:
dest_dir: str or None, optional

The directory in which to save the files. None will use the current working directory.

file_root: str or None, optional

An optional prefix for the confidence estimate files. If combine=True, the suffix will be “mokapot.{level}.txt”, where “{level}” indicates the level at which confidence estimation was performed (i.e. PSMs, peptides, proteins). If combine=False (the default), the group value is also prepended, yielding the suffix “{group}.mokapot.{level}.txt”.

sep: str, optional

The delimiter to use.

decoys: bool, optional

Save decoy confidence estimates as well?

combine: bool, optional

Should groups be combined into a single file?

Returns:
list of str

The paths to the saved files.
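The naming rule for file_root and combine can be sketched as a small helper that lists the file names one would expect. The helper expected_paths is hypothetical, and it assumes that the prefix, group, and suffix are joined with periods and that the level strings are lowercase (for example "psms"); check the paths returned by to_txt() for the authoritative names.

```python
import os


def expected_paths(levels, groups, dest_dir=None, file_root=None, combine=False):
    """List the file names that the naming rule above would produce."""
    # One set of files overall when combining, otherwise one set per group.
    prefixes = [None] if combine else [str(g) for g in groups]

    paths = []
    for group in prefixes:
        for level in levels:
            # Assumption: the parts that are present are joined with periods.
            parts = [file_root, group, "mokapot", level, "txt"]
            name = ".".join(p for p in parts if p is not None)
            paths.append(os.path.join(dest_dir or "", name))
    return paths


per_group = expected_paths(["psms", "peptides"], ["a", "b"], file_root="run1")
combined = expected_paths(["psms", "peptides"], ["a", "b"], file_root="run1", combine=True)
```

With combine=True the names reduce to the same pattern that LinearConfidence.to_txt() documents.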

class mokapot.confidence.LinearConfidence(psms, scores, desc=True, eval_fdr=0.01)

Assign confidence estimates to a set of PSMs.

Estimate q-values and posterior error probabilities (PEPs) for PSMs and peptides when ranked by the provided scores.

Parameters:
psms: LinearPsmDataset object

A collection of PSMs.

scores: np.ndarray

A vector containing the score of each PSM.

desc: bool

Are higher scores better?

eval_fdr: float

The FDR threshold at which to report performance. This parameter has no effect on the analysis itself, only on logging messages.

Attributes:
levels: list of str

The available levels for confidence estimates.

psms: pandas.DataFrame

Confidence estimates for PSMs in the dataset.

peptides: pandas.DataFrame

Confidence estimates for peptides in the dataset.

proteins: pandas.DataFrame or None

Confidence estimates for proteins in the dataset.

confidence_estimates: Dict[str, pandas.DataFrame]

A dictionary of confidence estimates at each level.

decoy_confidence_estimates: Dict[str, pandas.DataFrame]

A dictionary of confidence estimates for the decoys at each level.

Methods

plot_qvalues([level, threshold, ax])

Plot the cumulative number of discoveries over a range of q-values.

to_flashlfq([out_file])

Save confident peptides for quantification with FlashLFQ.

to_txt([dest_dir, file_root, sep, decoys])

Save confidence estimates to delimited text files.

to_flashlfq(out_file='mokapot.flashlfq.txt')

Save confident peptides for quantification with FlashLFQ.

FlashLFQ is an open-source tool for label-free quantification. For mokapot to save results in a compatible format, a few extra columns must be present that specify the MS data file name, the theoretical peptide monoisotopic mass, the retention time, and the charge for each PSM. If these are not present, saving to the FlashLFQ format is disabled.

Note that protein grouping in the FlashLFQ results will be more accurate if proteins were added for analysis with mokapot.

Parameters:
out_file: str, optional

The output file to write.

Returns:
str

The path to the saved file.

property levels

The available levels for confidence estimates.

plot_qvalues(level='psms', threshold=0.1, ax=None, **kwargs)

Plot the cumulative number of discoveries over a range of q-values.

The available levels can be found using the levels attribute.

Parameters:
level: str, optional

The level of q-values to report.

threshold: float, optional

Indicates the maximum q-value to plot.

ax: matplotlib.pyplot.Axes, optional

The matplotlib Axes on which to plot. If None, the current Axes instance is used.

**kwargs: dict, optional

Arguments passed to matplotlib.pyplot.plot().

Returns:
matplotlib.pyplot.Axes

A matplotlib.axes.Axes with the cumulative number of accepted target PSMs or peptides.

to_txt(dest_dir=None, file_root=None, sep='\t', decoys=False)

Save confidence estimates to delimited text files.

Parameters:
dest_dir: str or None, optional

The directory in which to save the files. None will use the current working directory.

file_root: str or None, optional

An optional prefix for the confidence estimate files. The suffix will always be “mokapot.{level}.txt”, where “{level}” indicates the level at which confidence estimation was performed (i.e. PSMs, peptides, proteins).

sep: str, optional

The delimiter to use.

decoys: bool, optional

Save decoy confidence estimates as well?

Returns:
list of str

The paths to the saved files.

mokapot.confidence.plot_qvalues(qvalues, threshold=0.1, ax=None, **kwargs)

Plot the cumulative number of discoveries over a range of q-values.

Parameters:
qvalues: numpy.ndarray

The q-values to plot.

threshold: float, optional

Indicates the maximum q-value to plot.

ax: matplotlib.pyplot.Axes, optional

The matplotlib Axes on which to plot. If None, the current Axes instance is used.

**kwargs: dict, optional

Arguments passed to matplotlib.axes.Axes.plot().

Returns:
matplotlib.pyplot.Axes

A matplotlib.axes.Axes with the cumulative number of accepted target PSMs or peptides.
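To show what "the cumulative number of discoveries over a range of q-values" means, the sketch below computes the points of such a curve without any plotting. The function cumulative_discoveries is hypothetical, and treating the threshold as inclusive and merging tied q-values into one step are assumptions; mokapot's own plot may differ in these details.

```python
def cumulative_discoveries(qvalues, threshold=0.1):
    """Return (q-value, number accepted) points for a step curve."""
    # Keep only the q-values inside the plotted range, smallest first.
    kept = sorted(q for q in qvalues if q <= threshold)

    points = []
    for count, q in enumerate(kept, start=1):
        if points and points[-1][0] == q:
            # Tied q-values share one step; keep the larger count.
            points[-1] = (q, count)
        else:
            points.append((q, count))
    return points


curve = cumulative_discoveries([0.01, 0.01, 0.05, 0.2, 0.08], threshold=0.1)
```

Each point says how many discoveries are accepted at that q-value, so the points can be drawn as a step curve on any matplotlib Axes.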