Confidence Estimation
One of the primary purposes of mokapot is to assign confidence estimates to
PSMs. This task is accomplished by ranking PSMs according to a score and using
an appropriate confidence estimation procedure for the type of data. mokapot
can provide confidence estimates based on any score, regardless of whether it was
the result of a learned Model() instance or provided independently.
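As a rough illustration of what ranking PSMs by a score and estimating confidence involves, the sketch below computes q-values by target-decoy competition in plain Python. It is not mokapot's implementation: the function name `tdc_qvalues`, the `(decoys + 1) / targets` FDR estimate, and the toy scores are assumptions chosen for illustration, and tied scores receive no special handling.

```python
def tdc_qvalues(scores, is_target, desc=True):
    """Estimate q-values by target-decoy competition (illustrative sketch)."""
    # Rank PSMs from best to worst score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=desc)

    # FDR estimate at each rank: (decoys + 1) / targets accepted so far.
    fdr = []
    targets = decoys = 0
    for i in order:
        if is_target[i]:
            targets += 1
        else:
            decoys += 1
        fdr.append(min(1.0, (decoys + 1) / max(targets, 1)))

    # A q-value is the smallest FDR at which a PSM would still be accepted,
    # so take the running minimum from the worst rank back to the best.
    qvalues = [0.0] * len(scores)
    running_min = 1.0
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdr[rank])
        qvalues[order[rank]] = running_min
    return qvalues


# Toy data (made up): eight PSMs, two of which are decoys.
scores = [9.1, 8.7, 7.9, 7.5, 6.8, 6.2, 5.0, 4.1]
is_target = [True, True, True, False, True, True, False, True]
qvalues = tdc_qvalues(scores, is_target)
```

The returned list is aligned with the input order, so each q-value can be read next to the PSM it belongs to.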
The following classes store the confidence estimates for a dataset based on the
provided score. They provide utilities to access, save, and plot these
estimates for the various relevant levels (i.e. PSMs, peptides, and proteins).
The LinearConfidence() class is appropriate for most data-dependent
acquisition proteomics datasets.
We recommend using the brew() function or the assign_confidence() method to
obtain these confidence estimates, rather than initializing the classes below
directly.
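The recommended entry points might be used roughly as follows. This is a minimal sketch: the wrapper `estimate_confidence` and its arguments are hypothetical, the input is assumed to be a Percolator tab-delimited (PIN) file readable with `mokapot.read_pin()`, and calling `assign_confidence()` without scores is assumed to fall back to the best-performing single feature.

```python
def estimate_confidence(pin_file, dest_dir=None):
    """Obtain confidence estimates without instantiating the classes directly."""
    # Imported lazily so this sketch can be defined without mokapot installed.
    import mokapot

    # Assumption: `pin_file` is a path to a Percolator PIN file.
    psms = mokapot.read_pin(pin_file)

    # Option 1: learn a model with brew() and rank PSMs by its scores.
    learned, models = mokapot.brew(psms)

    # Option 2: skip model training and rank PSMs by an existing score.
    baseline = psms.assign_confidence()

    # Write the learned-model results to delimited text files.
    paths = learned.to_txt(dest_dir=dest_dir)
    return learned, baseline, paths
```

Calling `estimate_confidence("psms.pin")`, where the file name is a placeholder, would return both sets of confidence estimates together with the paths of the saved files.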
- class mokapot.confidence.GroupedConfidence(psms, scores, desc=True, eval_fdr=0.01)
Perform grouped confidence estimation for a collection of PSMs.
Groups are defined by the LinearPsmDataset. Confidence estimates for each group can be retrieved by using the group name as an attribute, or from the group_confidence_estimates property.
- Parameters:
- psms: LinearPsmDataset object
A collection of PSMs.
- scores: np.ndarray
A vector containing the score of each PSM.
- desc: bool
Are higher scores better?
- eval_fdr: float
The FDR threshold at which to report performance. This parameter has no effect on the analysis itself, only on logging messages.
- rng: int, np.random.Generator, optional
A seed or generator used to break ties, or None to use the default random number generator state.
- Attributes:
- groups: List
- group_confidence_estimates: Dict
Methods
- to_txt([dest_dir, file_root, sep, decoys, ...])
Save confidence estimates to delimited text files.
- property group_confidence_estimates
A dictionary of the confidence estimates for each group.
- property groups
The groups for confidence estimation.
- to_txt(dest_dir=None, file_root=None, sep='\t', decoys=False, combine=False)
Save confidence estimates to delimited text files.
- Parameters:
- dest_dir: str or None, optional
The directory in which to save the files. None will use the current working directory.
- file_root: str or None, optional
An optional prefix for the confidence estimate files. If combine=True, the suffix will be “mokapot.{level}.txt”, where “{level}” indicates the level at which confidence estimation was performed (i.e. PSMs, peptides, proteins). If combine=False (the default), the group value is additionally prepended, yielding the suffix “{group}.mokapot.{level}.txt”.
- sep: str, optional
The delimiter to use.
- decoys: bool, optional
Save decoy confidence estimates as well?
- combine: bool, optional
Should groups be combined into a single file?
- Returns:
- list of str
The paths to the saved files.
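The naming rule described for file_root and combine can be sketched as a small helper. The function `expected_paths` is hypothetical and is not part of mokapot. It assumes that the optional prefix, the group name, and the suffix are joined with dots, which the description above does not state explicitly, and the group names are made up.

```python
import os


def expected_paths(groups, levels, dest_dir=None, file_root=None, combine=False):
    """List the file paths implied by the naming rule (hypothetical helper)."""
    paths = []
    # With combine=True there is one file per level; otherwise one per group and level.
    prefixes = [None] if combine else list(groups)
    for group in prefixes:
        for level in levels:
            # Assumption: prefix, group, and suffix are joined with dots.
            parts = [file_root, group, f"mokapot.{level}.txt"]
            name = ".".join(str(p) for p in parts if p is not None)
            paths.append(os.path.join(dest_dir, name) if dest_dir else name)
    return paths


# Made-up group names and prefix, for illustration only.
per_group = expected_paths(["light", "heavy"], ["psms", "peptides"], file_root="run1")
combined = expected_paths(
    ["light", "heavy"], ["psms", "peptides"], file_root="run1", combine=True
)
```

With these inputs, `per_group` holds four names such as `run1.light.mokapot.psms.txt`, while `combined` holds one name per level.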
- class mokapot.confidence.LinearConfidence(psms, scores, desc=True, eval_fdr=0.01)
Assign confidence estimates to a set of PSMs.
Estimate q-values and posterior error probabilities (PEPs) for PSMs and peptides when ranked by the provided scores.
- Parameters:
- psms: LinearPsmDataset object
A collection of PSMs.
- scores: np.ndarray
A vector containing the score of each PSM.
- desc: bool
Are higher scores better?
- eval_fdr: float
The FDR threshold at which to report performance. This parameter has no effect on the analysis itself, only on logging messages.
- Attributes:
- levels: list of str
The available levels for confidence estimates.
- psms: pandas.DataFrame
Confidence estimates for PSMs in the dataset.
- peptides: pandas.DataFrame
Confidence estimates for peptides in the dataset.
- proteins: pandas.DataFrame or None
Confidence estimates for proteins in the dataset.
- confidence_estimates: Dict[str, pandas.DataFrame]
A dictionary of confidence estimates at each level.
- decoy_confidence_estimates: Dict[str, pandas.DataFrame]
A dictionary of confidence estimates for the decoys at each level.
Methods
- plot_qvalues([level, threshold, ax])
Plot the cumulative number of discoveries over a range of q-values.
- to_flashlfq([out_file])
Save confident peptides for quantification with FlashLFQ.
- to_txt([dest_dir, file_root, sep, decoys])
Save confidence estimates to delimited text files.
- to_flashlfq(out_file='mokapot.flashlfq.txt')
Save confident peptides for quantification with FlashLFQ.
FlashLFQ is an open-source tool for label-free quantification. For mokapot to save results in a compatible format, a few extra columns are required to be present, which specify the MS data file name, the theoretical peptide monoisotopic mass, the retention time, and the charge for each PSM. If these are not present, saving to the FlashLFQ format is disabled.
Note that protein grouping in the FlashLFQ results will be more accurate if proteins were added for analysis with mokapot.
- Parameters:
- out_file: str, optional
The output file to write.
- Returns:
- str
The path to the saved file.
- property levels
The available levels for confidence estimates.
- plot_qvalues(level='psms', threshold=0.1, ax=None, **kwargs)
Plot the cumulative number of discoveries over a range of q-values.
The available levels can be found using the levels attribute.
- Parameters:
- level: str, optional
The level of q-values to report.
- threshold: float, optional
Indicates the maximum q-value to plot.
- ax: matplotlib.pyplot.Axes, optional
The matplotlib Axes on which to plot. If None the current Axes instance is used.
- **kwargs: dict, optional
Arguments passed to matplotlib.pyplot.plot().
- Returns:
- matplotlib.pyplot.Axes
A matplotlib.axes.Axes with the cumulative number of accepted target PSMs or peptides.
- to_txt(dest_dir=None, file_root=None, sep='\t', decoys=False)
Save confidence estimates to delimited text files.
- Parameters:
- dest_dir: str or None, optional
The directory in which to save the files. None will use the current working directory.
- file_root: str or None, optional
An optional prefix for the confidence estimate files. The suffix will always be “mokapot.{level}.txt”, where “{level}” indicates the level at which confidence estimation was performed (i.e. PSMs, peptides, proteins).
- sep: str, optional
The delimiter to use.
- decoys: bool, optional
Save decoy confidence estimates as well?
- Returns:
- list of str
The paths to the saved files.
- mokapot.confidence.plot_qvalues(qvalues, threshold=0.1, ax=None, **kwargs)
Plot the cumulative number of discoveries over a range of q-values.
- Parameters:
- qvalues: numpy.ndarray
The q-values to plot.
- threshold: float, optional
Indicates the maximum q-value to plot.
- ax: matplotlib.pyplot.Axes, optional
The matplotlib Axes on which to plot. If None the current Axes instance is used.
- **kwargs: dict, optional
Arguments passed to matplotlib.axes.Axes.plot().
- Returns:
- matplotlib.pyplot.Axes
A matplotlib.axes.Axes with the cumulative number of accepted target PSMs or peptides.
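To make the plotted quantity concrete, the sketch below computes the points of such a curve in plain Python: for each distinct q-value at or below the threshold, it counts how many discoveries would be accepted. The helper `cumulative_discoveries` and the toy q-values are assumptions for illustration and do not reproduce how mokapot draws the plot.

```python
from bisect import bisect_right


def cumulative_discoveries(qvalues, threshold=0.1):
    """Return (q-value, count) pairs for the curve up to `threshold` (sketch)."""
    ranked = sorted(qvalues)
    # Each distinct q-value at or below the threshold is an x coordinate.
    xs = sorted({q for q in ranked if q <= threshold})
    # The y coordinate is the number of q-values at or below that x.
    return [(q, bisect_right(ranked, q)) for q in xs]


# Toy q-values (made up); the last one exceeds the default threshold of 0.1.
curve = cumulative_discoveries([0.001, 0.001, 0.004, 0.02, 0.02, 0.08, 0.3])
```

The resulting pairs could be passed to a step plot, with q-values on the x axis and the cumulative count on the y axis.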