Python API

The Python API offers the most flexibility when using mokapot. It also makes analyses easier to reproduce because it integrates readily into Jupyter notebooks and Python scripts.

Read PSMs using the read_pin() or read_pepxml() functions for files in the Percolator tab-delimited format or PepXML format, respectively. Once a collection of PSMs has been read, the brew() function will apply the mokapot algorithm to learn models from the PSMs and assign confidence estimates based on their new scores. Alternatively, the assign_confidence() method will assign confidence estimates to PSMs based on the best feature, which is often the primary score from the database search engine.

PSMs that are already represented in a pandas.DataFrame can also be used directly to create a LinearPsmDataset.

Finally, custom machine learning models can be created using the mokapot.model.Model class.
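
The read, re-score, and save workflow described above can be sketched as follows. This is a minimal sketch, not a complete analysis: the function name rescore_pin and its arguments are hypothetical, and it assumes that brew() returns a pair of confidence estimates and fitted models and that to_txt() accepts a dest_dir argument.

```python
def rescore_pin(pin_file, dest_dir="."):
    """Re-score the PSMs in a PIN file and save the confidence estimates.

    `rescore_pin`, `pin_file`, and `dest_dir` are illustrative names only.
    """
    # Imported inside the function so the sketch can be defined even when
    # mokapot is not installed; a real script would import at the top.
    import mokapot

    # Read PSMs from a Percolator tab-delimited (PIN) file.
    psms = mokapot.read_pin(pin_file)

    # Baseline: confidence estimates from the best single feature.
    baseline = psms.assign_confidence()

    # Learn models and assign confidence estimates from the new scores.
    # Assumption: brew() returns (confidence estimates, fitted models).
    results, models = mokapot.brew(psms)

    # Save the confidence estimates as delimited text files.
    results.to_txt(dest_dir=dest_dir)
    return baseline, results, models
```

Calling rescore_pin("psms.pin") on a PIN file of your own (the file name here is a placeholder) would write the result tables to the current directory.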

Functions

Primary Functions

read_pin

Read Percolator input (PIN) tab-delimited files.

read_pepxml

Read PepXML files.

read_fasta

Parse a FASTA file, storing a mapping of peptides and proteins.

brew

Re-score one or more collections of PSMs.

to_txt

Save confidence estimates to delimited text files.

to_flashlfq

Save confident peptides for quantification with FlashLFQ.

Utility Functions

save_model

Save a mokapot.model.Model object to a file.

load_model

Load a saved model for mokapot.

read_percolator

Read a Percolator tab-delimited file.

plot_qvalues

Plot the cumulative number of discoveries over a range of q-values.

make_decoys

Create a FASTA file with decoy sequences.

digest

Digest a protein sequence into its constituent peptides.

Machine Learning Models

Use a model that emulates the linear support vector machine used by Percolator, or create a custom model from anything with a scikit-learn interface.

PercolatorModel

A model that emulates Percolator.

Model

A machine learning model to re-score PSMs.
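
As an illustration of wrapping a scikit-learn estimator in a custom model, consider the sketch below. It is hedged: the function name brew_with_random_forest is hypothetical, the choice of a random forest is arbitrary, and it assumes that Model accepts an estimator as its first argument and that brew() accepts the model as its second.

```python
def brew_with_random_forest(psms):
    """Re-score a collection of PSMs with a random forest (illustrative only).

    `psms` is expected to be a collection of PSMs, such as the object
    returned by mokapot.read_pin().
    """
    # Imported here so the sketch can be defined without the packages
    # installed; a real script would import at the top of the file.
    import mokapot
    from sklearn.ensemble import RandomForestClassifier

    # Any object with a scikit-learn style interface could stand in here.
    estimator = RandomForestClassifier(n_estimators=100)

    # Assumption: Model wraps the estimator so mokapot can train it.
    model = mokapot.Model(estimator)

    # Assumption: brew() takes the model as its second argument and
    # returns (confidence estimates, fitted models).
    results, models = mokapot.brew(psms, model)
    return results, models
```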

Collections of PSMs

PSMs can be parsed from Percolator tab-delimited files, PepXML files, or directly from a pandas.DataFrame.

LinearPsmDataset

Store and analyze a collection of PSMs.

Confidence Estimates

An analysis with mokapot yields two forms of confidence estimates—q-values and posterior error probabilities (PEPs)—at various levels: PSMs, peptides, and optionally, proteins.
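
To make the notion of a q-value concrete, the sketch below estimates q-values from target and decoy scores in plain Python. It is a simplified illustration of target-decoy competition and is not mokapot's implementation: the function name tdc_qvalues is hypothetical, ties between scores are not handled specially, and the false discovery rate is estimated as the plain ratio of decoys to targets.

```python
def tdc_qvalues(scores, is_target):
    """Estimate q-values from scores, where a higher score is better.

    scores:    list of numeric scores, one per PSM.
    is_target: list of booleans, True for target PSMs, False for decoys.
    """
    # Rank PSMs from best to worst score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Estimate the FDR at each rank as (#decoys / #targets) accepted so far.
    targets = decoys = 0
    fdrs = []
    for i in order:
        if is_target[i]:
            targets += 1
        else:
            decoys += 1
        fdrs.append(min(1.0, decoys / max(targets, 1)))

    # The q-value of a PSM is the smallest FDR at which it would be accepted,
    # so take a running minimum from the worst score back to the best.
    qvalues = [0.0] * len(scores)
    running_min = 1.0
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        qvalues[order[rank]] = running_min
    return qvalues


example = tdc_qvalues([10, 9, 8, 7, 6], [True, True, False, True, False])
```

In this toy example the two best-scoring targets receive a q-value of 0, and the target ranked below the first decoy receives a q-value of 1/3.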

LinearConfidence

Assign confidence estimates to a set of PSMs.

Protein Sequences

To calculate protein-level confidence estimates, mokapot needs the original protein sequences and digestion parameters used for the database search. These are created using the mokapot.read_fasta() function, which returns a Proteins object. Proteins objects store the mapping of peptides to the proteins that may have generated them and the mapping of target protein sequences to their corresponding decoys.
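
The idea of mapping peptides to the proteins that may have generated them can be sketched in plain Python. This is an illustration of the concept only, not the Proteins class: the names digest_sketch and map_peptides are hypothetical, and cleaving after every K or R is an assumed digestion rule standing in for whatever parameters the database search used.

```python
import re
from collections import defaultdict


def digest_sketch(sequence, enzyme_regex="[KR]", missed_cleavages=0):
    """Cleave a sequence after each regex match and return the peptides."""
    # Cleavage sites are the start of the sequence, the position after each
    # match, and the end of the sequence.
    sites = [0] + [m.end() for m in re.finditer(enzyme_regex, sequence)]
    if sites[-1] != len(sequence):
        sites.append(len(sequence))

    peptides = set()
    for start in range(len(sites) - 1):
        # Allow up to `missed_cleavages` skipped sites per peptide.
        last = min(start + missed_cleavages + 1, len(sites) - 1)
        for end in range(start + 1, last + 1):
            peptides.add(sequence[sites[start]:sites[end]])
    return peptides


def map_peptides(proteins, **digest_kwargs):
    """Map each peptide to the set of protein names that contain it."""
    mapping = defaultdict(set)
    for name, sequence in proteins.items():
        for peptide in digest_sketch(sequence, **digest_kwargs):
            mapping[peptide].add(name)
    return dict(mapping)


# Two toy proteins that share the peptide "BBR".
peptide_map = map_peptides({"P1": "AAKBBR", "P2": "CCKBBR"})
```

A peptide shared by several proteins, such as "BBR" here, maps to all of them, which is why the original sequences and digestion parameters are needed before protein-level confidence can be estimated.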

Proteins

Store protein sequences.