Python API
The Python API enables maximum flexibility when using mokapot. It also aids in making analyses reproducible by easily integrating into Jupyter notebooks and Python scripts.
Read PSMs using the read_pin() or read_pepxml() functions for files in the Percolator tab-delimited format or PepXML format, respectively. Once a collection of PSMs has been read, the brew() function will apply the mokapot algorithm to learn models from the PSMs and assign confidence estimates based on their new scores. Alternatively, the assign_confidence() method will assign confidence estimates to PSMs based on the best feature, which is often the primary score from the database search engine.
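As a rough illustration of the workflow described above, the following sketch reads a PIN file, computes a baseline from the best feature, and then re-scores the PSMs. The function name rescore_pin and its arguments pin_path and out_dir are hypothetical, and the exact keyword arguments accepted by mokapot may differ between versions.

```python
def rescore_pin(pin_path, out_dir="."):
    """Re-score PSMs from a Percolator input (PIN) file.

    Hypothetical helper for illustration; pin_path and out_dir are
    placeholders for your own file locations.
    """
    # Imported inside the function so this sketch can be loaded
    # even where mokapot is not installed.
    import mokapot

    # Read the PSMs from a Percolator tab-delimited file.
    psms = mokapot.read_pin(pin_path)

    # Baseline: confidence estimates from the best single feature.
    baseline = psms.assign_confidence()

    # Learn models from the PSMs and assign confidence estimates
    # based on the new scores.
    results, models = mokapot.brew(psms)

    # Save the confidence estimates as tab-delimited text files.
    results.to_txt(dest_dir=out_dir)
    return baseline, results, models
```

Comparing baseline with results is one way to see how much the learned models improve on the search engine score alone.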
PSMs that are already represented in a pandas.DataFrame can also be used directly to create a LinearPsmDataset.
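A minimal sketch of building a LinearPsmDataset from a DataFrame might look like the following. The column names are hypothetical placeholders, as is the helper dataset_from_frame; the constructor arguments shown are assumed from mokapot's documented interface and may vary between versions.

```python
# Hypothetical column names; substitute the ones used in your own table.
TARGET_COLUMN = "is_target"          # boolean: True for target PSMs
SPECTRUM_COLUMNS = ["scan"]          # columns that identify a spectrum
PEPTIDE_COLUMN = "peptide"           # peptide sequence for each PSM
FEATURE_COLUMNS = ["search_score", "delta_mass"]  # features for the model


def dataset_from_frame(df):
    """Wrap a pandas.DataFrame of PSMs in a LinearPsmDataset."""
    # Imported inside the function so this sketch can be loaded
    # even where mokapot is not installed.
    import mokapot

    return mokapot.LinearPsmDataset(
        psms=df,
        target_column=TARGET_COLUMN,
        spectrum_columns=SPECTRUM_COLUMNS,
        peptide_column=PEPTIDE_COLUMN,
        feature_columns=FEATURE_COLUMNS,
    )
```

The resulting dataset can then be passed to brew() in the same way as PSMs read from a file.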
Finally, custom machine learning models can be created using the mokapot.model.Model class.
Functions
Primary Functions
Read Percolator input (PIN) tab-delimited files.
Read PepXML files.
Parse a FASTA file, storing a mapping of peptides and proteins.
Re-score one or more collections of PSMs.
Save confidence estimates to delimited text files.
Save confident peptides for quantification with FlashLFQ.
Utility Functions
Save a |
Load a saved model for mokapot.
Read a Percolator tab-delimited file.
Plot the cumulative number of discoveries over a range of q-values.
Create a FASTA file with decoy sequences.
Digest a protein sequence into its constituent peptides.
Machine Learning Models
Use a model that emulates the linear support vector machine used by Percolator, or create a custom model from anything with a scikit-learn interface.
A model that emulates Percolator.
A machine learning model to re-score PSMs.
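The two options above could be set up roughly as follows. This is a sketch under assumptions: build_models is a hypothetical helper, the random forest is only one example of a scikit-learn estimator, and the default constructor arguments are assumed to be suitable.

```python
def build_models():
    """Create a Percolator-like model and a custom scikit-learn model."""
    # Imported inside the function so this sketch can be loaded
    # even where mokapot or scikit-learn is not installed.
    import mokapot
    from sklearn.ensemble import RandomForestClassifier

    # A model that emulates the linear SVM used by Percolator.
    percolator_like = mokapot.model.PercolatorModel()

    # A custom model wrapping an arbitrary scikit-learn estimator.
    # The random forest here is an illustrative choice only.
    custom = mokapot.model.Model(RandomForestClassifier(n_estimators=100))

    return percolator_like, custom
```

Either model can then be supplied to brew() through its model argument, for example mokapot.brew(psms, model=custom).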
Collections of PSMs
PSMs can be parsed from Percolator tab-delimited files, PepXML files, or directly from a pandas.DataFrame.
Store and analyze a collection of PSMs.
Confidence Estimates
An analysis with mokapot yields two forms of confidence estimates, q-values and posterior error probabilities (PEPs), at various levels: PSMs, peptides, and optionally, proteins.
Assign confidence estimates to a set of PSMs.
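To make the notion of a q-value concrete, here is a simplified, pure-Python sketch of q-value estimation from target and decoy scores. It is an illustration of the general target-decoy idea, not mokapot's actual implementation, and the function name tdc_qvalues is hypothetical.

```python
def tdc_qvalues(scores, is_target):
    """Estimate q-values from target/decoy labels (higher score is better).

    Simplified illustration only: the FDR at each score threshold is
    estimated as (decoys + 1) / targets among everything accepted so far.
    """
    # Rank indices from best to worst score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    targets = decoys = 0
    fdrs = []
    for i in order:
        if is_target[i]:
            targets += 1
        else:
            decoys += 1
        fdrs.append(min(1.0, (decoys + 1) / max(targets, 1)))

    # The q-value is the smallest FDR at this threshold or any more
    # permissive one, so sweep from the worst score to the best.
    qvals = [0.0] * len(scores)
    running_min = 1.0
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        qvals[order[rank]] = running_min
    return qvals


# Eight targets, then one decoy, then two more targets.
example_scores = [11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
example_labels = [True] * 8 + [False] + [True] * 2
example_qvalues = tdc_qvalues(example_scores, example_labels)
```

In this toy example the eight PSMs scoring above the decoy share one q-value, and everything at or below the decoy receives a larger one.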
Protein Sequences
To calculate protein-level confidence estimates, mokapot needs the original protein sequences and digestion parameters used for the database search. These are created using the mokapot.read_fasta() function, which returns a Proteins object. Proteins objects store the mapping of peptides to the proteins that may have generated them and the mapping of target protein sequences to their corresponding decoys.
Store protein sequences.
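A sketch of adding protein-level confidence estimates might look like this. The helper protein_level_confidence and its arguments are hypothetical, and the digestion settings shown (a tryptic cleavage pattern with up to two missed cleavages) are assumptions that should be replaced with the parameters of your own database search.

```python
def protein_level_confidence(pin_path, fasta_path):
    """Re-score PSMs and include protein-level confidence estimates."""
    # Imported inside the function so this sketch can be loaded
    # even where mokapot is not installed.
    import mokapot

    psms = mokapot.read_pin(pin_path)

    # Digestion parameters are assumptions; they should match the
    # settings used for the original database search.
    proteins = mokapot.read_fasta(
        fasta_path,
        enzyme="[KR]",
        missed_cleavages=2,
    )

    # Attach the peptide-to-protein mapping to the PSMs.
    psms.add_proteins(proteins)

    results, models = mokapot.brew(psms)
    return results
```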