Python API

The Python API enables maximum flexibility when using mokapot. It also aids in making analyses reproducible by easily integrating into Jupyter notebooks and Python scripts.

Read PSMs using the read_pin() or read_pepxml() functions for files in the Percolator tab-delimited format or PepXML format, respectively. Once a collection of PSMs has been read, the brew() function will apply the mokapot algorithm to learn models from the PSMs and assign confidence estimates based on their new scores. Alternatively, the assign_confidence() method will assign confidence estimates to PSMs based on the best feature, which is often the primary score from the database search engine.
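The workflow described above can be sketched as follows. This is a minimal illustration, not a definitive recipe: the wrapper name `rescore_pin` and its `dest_dir` argument are hypothetical, and it assumes that `brew()` returns a pair of confidence estimates and fitted models and that the returned confidence object has a `to_txt()` method, as in the releases this page describes.

```python
def rescore_pin(pin_file, dest_dir="."):
    """Re-score the PSMs in one PIN file and save the confidence estimates."""
    # Imported lazily so this sketch can be loaded without mokapot installed.
    import mokapot

    # Parse the PSMs from a Percolator tab-delimited (PIN) file.
    psms = mokapot.read_pin(pin_file)

    # Baseline: confidence estimates from the best single feature.
    baseline = psms.assign_confidence()

    # Learn models from the PSMs and assign confidence estimates
    # based on the new scores.
    results, models = mokapot.brew(psms)

    # Write the re-scored confidence estimates to delimited text files.
    results.to_txt(dest_dir=dest_dir)
    return baseline, results, models
```

Calling `rescore_pin("psms.pin")` (a placeholder file name) would return both the baseline and the re-scored estimates, which makes it easy to compare the two.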

PSMs that are already represented in a pandas.DataFrame can instead be used directly to create a LinearPsmDataset.
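As a rough sketch of that route, the function below wraps an existing DataFrame in a LinearPsmDataset. The column names in `COLUMN_ROLES` are hypothetical and must be replaced with the columns in your own data, and the keyword names are assumed to match the constructor in the installed release.

```python
# Hypothetical column names: adapt these to the columns in your DataFrame.
COLUMN_ROLES = {
    "target_column": "is_target",        # boolean: True for target PSMs
    "spectrum_columns": ["scan"],        # columns that identify a spectrum
    "peptide_column": "peptide",         # peptide sequence for each PSM
    "feature_columns": ["search_score", "delta_mass"],  # features to learn from
}


def dataset_from_frame(psm_df, roles=COLUMN_ROLES):
    """Create a LinearPsmDataset from PSMs stored in a pandas.DataFrame."""
    # Imported lazily so this sketch can be loaded without mokapot installed.
    import mokapot

    return mokapot.LinearPsmDataset(psms=psm_df, **roles)
```

The resulting collection can then be used in the same way as one parsed from a file.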

Finally, custom machine learning models can be created using the mokapot.model.Model class.

Functions

Primary Functions

`read_pin`	Read Percolator input (PIN) tab-delimited files.
`read_pepxml`	Read PepXML files.
`read_fasta`	Parse a FASTA file, storing a mapping of peptides and proteins.
`brew`	Re-score one or more collections of PSMs.
`to_txt`	Save confidence estimates to delimited text files.
`to_flashlfq`	Save confident peptides for quantification with FlashLFQ.

Utility Functions

`save_model`	Save a `mokapot.model.Model` object to a file.
`load_model`	Load a saved model for mokapot.
`read_percolator`	Read a Percolator tab-delimited file.
`plot_qvalues`	Plot the cumulative number of discoveries over a range of q-values.
`make_decoys`	Create a FASTA file with decoy sequences.
`digest`	Digest a protein sequence into its constituent peptides.

Machine Learning Models

Use a model that emulates the linear support vector machine used by Percolator, or create a custom model from any estimator with a scikit-learn interface.

`PercolatorModel`	A model that emulates Percolator.
`Model`	A machine learning model to re-score PSMs.
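A custom model might be set up along these lines. This is only a sketch: the random forest and its hyperparameters are arbitrary illustrative choices, the helper names are hypothetical, and it assumes that `Model` accepts a scikit-learn classifier as its first argument and that `brew()` takes the model through a `model` argument.

```python
def build_forest_model(n_trees=100, seed=0):
    """Wrap a scikit-learn classifier in a mokapot Model (illustrative only)."""
    # Imported lazily so this sketch can be loaded without these packages.
    import mokapot
    from sklearn.ensemble import RandomForestClassifier

    # Any classifier with a scikit-learn interface could be substituted here.
    estimator = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    return mokapot.Model(estimator)


def brew_with_custom_model(psms, model=None):
    """Re-score a collection of PSMs with a custom model instead of the default."""
    import mokapot

    if model is None:
        model = build_forest_model()
    return mokapot.brew(psms, model=model)
```

Passing `mokapot.PercolatorModel()` as `model` would instead give the Percolator-like behavior described above.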

Collections of PSMs

PSMs can be parsed from Percolator tab-delimited files, PepXML files, or directly from a pandas.DataFrame.

LinearPsmDataset

Store and analyze a collection of PSMs.

Confidence Estimates

An analysis with mokapot yields two forms of confidence estimates—q-values and posterior error probabilities (PEPs)—at various levels: PSMs, peptides, and optionally, proteins.
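To make the notion of a q-value concrete, here is a simplified, pure-Python sketch of q-value estimation by target-decoy competition. It is not mokapot's implementation: the function name is hypothetical, the false discovery rate is estimated with the conservative (decoys + 1) / targets form, and tied scores are simply ranked in input order.

```python
def tdc_qvalues(scores, is_target):
    """Estimate q-values from scores and target/decoy labels (simplified)."""
    # Rank the PSMs from the best (highest) score to the worst.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Estimate the FDR among everything accepted at each score cutoff.
    fdrs = []
    targets = decoys = 0
    for i in order:
        if is_target[i]:
            targets += 1
        else:
            decoys += 1
        fdrs.append(min(1.0, (decoys + 1) / max(targets, 1)))

    # A q-value is the smallest FDR at which a PSM is still accepted,
    # so take the running minimum from the worst score back to the best.
    qvalues = [0.0] * len(scores)
    running_min = 1.0
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        qvalues[order[rank]] = running_min
    return qvalues


example = tdc_qvalues(
    [10, 9, 8, 7, 6, 5],
    [True, True, True, False, True, True],
)
```

By construction, the q-values never decrease as the score gets worse, so accepting every PSM with a q-value at or below a chosen cutoff yields a set with approximately that estimated FDR.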

LinearConfidence

Assign confidence estimates to a set of PSMs.

Protein Sequences

To calculate protein-level confidence estimates, mokapot needs the original protein sequences and digestion parameters used for the database search. These are created using the mokapot.read_fasta() function, which returns a Proteins object. Proteins objects store the mapping of peptides to the proteins that may have generated them and the mapping of target protein sequences to their corresponding decoys.
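A protein-level analysis might look like the sketch below. The wrapper name is hypothetical, the digestion settings shown (a tryptic `[KR]` cleavage pattern, two missed cleavages, and a `decoy_` prefix) are placeholders that must match the original database search, and the sketch assumes that the collection of PSMs exposes an `add_proteins()` method for attaching the Proteins object.

```python
def rescore_with_proteins(pin_file, fasta_file):
    """Re-score PSMs and include protein-level confidence estimates."""
    # Imported lazily so this sketch can be loaded without mokapot installed.
    import mokapot

    psms = mokapot.read_pin(pin_file)

    # Digest the FASTA file with the same parameters as the database search.
    # These values are placeholders, not recommendations.
    proteins = mokapot.read_fasta(
        fasta_file,
        enzyme="[KR]",
        missed_cleavages=2,
        decoy_prefix="decoy_",
    )

    # Attach the peptide-to-protein mapping (assumed method, see above).
    psms.add_proteins(proteins)

    results, _models = mokapot.brew(psms)
    return results
```

If the digestion parameters differ from those used for the search, peptides may fail to map to their proteins, so it is worth checking them against the search engine's configuration.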

Proteins

Store protein sequences.