Command Line Interface

mokapot version 0.9.2.dev7+g6725bdc.d20230911. Written by William E. Fondrie while in the Department of Genome Sciences at the University of Washington.

Official code website:

More documentation and examples:

usage: mokapot [-h] [-d DEST_DIR] [-w MAX_WORKERS] [-r FILE_ROOT]
               [--proteins PROTEINS] [--decoy_prefix DECOY_PREFIX]
               [--enzyme ENZYME] [--missed_cleavages MISSED_CLEAVAGES]
               [--clip_nterm_methionine] [--min_length MIN_LENGTH]
               [--max_length MAX_LENGTH] [--semi] [--train_fdr TRAIN_FDR]
               [--test_fdr TEST_FDR] [--max_iter MAX_ITER] [--seed SEED]
               [--direction DIRECTION] [--aggregate]
               [--subset_max_train SUBSET_MAX_TRAIN] [--override]
               [--save_models] [--load_models LOAD_MODELS [LOAD_MODELS ...]]
               [--plugin PLUGIN] [--keep_decoys] [--folds FOLDS]
               [--open_modification_bin_size OPEN_MODIFICATION_BIN_SIZE]
               [-v {0,1,2,3}]
               psm_files [psm_files ...]
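For illustration, the usage line can be turned into a concrete invocation. The Python sketch below only assembles an argument list from flags that appear in the usage line. The input names (search.pin, human.fasta) and the output directory (results) are hypothetical placeholders, and the helper build_mokapot_command is not part of mokapot.

```python
import shlex


def build_mokapot_command(psm_files, dest_dir=None, proteins=None, folds=None, seed=None):
    """Assemble a mokapot command line from documented flags (hypothetical helper)."""
    cmd = ["mokapot"]
    if dest_dir is not None:
        cmd += ["--dest_dir", dest_dir]
    if proteins is not None:
        cmd += ["--proteins", proteins]
    if folds is not None:
        cmd += ["--folds", str(folds)]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    # Positional PSM files go last, matching the usage line.
    cmd += list(psm_files)
    return cmd


# Hypothetical inputs; the list could be passed to subprocess.run(cmd, check=True).
cmd = build_mokapot_command(["search.pin"], dest_dir="results", proteins="human.fasta", seed=1)
print(shlex.join(cmd))
# mokapot --dest_dir results --proteins human.fasta --seed 1 search.pin
```

Building a list rather than a single string avoids shell quoting problems when file paths contain spaces.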

Positional Arguments


psm_files

A collection of PSMs in the Percolator tab-delimited or PepXML format.

Named Arguments

-d, --dest_dir

The directory in which to write the result files. Defaults to the current working directory.

-w, --max_workers

The number of processes to use for model training. Note that using more than one worker will result in garbled logging messages.

Default: 1

-r, --file_root

The prefix added to all file names.


--proteins

The FASTA file used for the database search. Using this option enables protein-level confidence estimates using the ‘picked-protein’ approach. Note that the FASTA file must contain both target and decoy sequences. Additionally, verify that the ‘--enzyme’, ‘--missed_cleavages’, ‘--min_length’, ‘--max_length’, ‘--semi’, ‘--clip_nterm_methionine’, and ‘--decoy_prefix’ parameters match your search engine conditions.
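The core idea of picked-protein competition can be sketched as follows, under simplifying assumptions: each protein already has a single score (higher is better), decoys are named by prepending a prefix such as decoy_ to their target's name, and for each target/decoy pair only the higher-scoring member is kept before FDR estimation. This is a minimal illustration, not mokapot's implementation; the function picked_protein is hypothetical.

```python
def picked_protein(scores, decoy_prefix="decoy_"):
    """Keep only the better-scoring member of each target/decoy protein pair.

    scores: dict mapping protein name -> score (higher is better).
    Returns a dict of the surviving proteins and their scores.
    """
    best = {}  # base (target) name -> (surviving name, score)
    for name, score in scores.items():
        # Assumption: a decoy's name is its target's name with the prefix prepended.
        is_decoy = name.startswith(decoy_prefix)
        base = name[len(decoy_prefix):] if is_decoy else name
        if base not in best or score > best[base][1]:
            best[base] = (name, score)
    return {name: score for name, score in best.values()}


scores = {"P1": 5.0, "decoy_P1": 2.0, "P2": 1.0, "decoy_P2": 3.5, "P3": 4.0}
print(picked_protein(scores))
# {'P1': 5.0, 'decoy_P2': 3.5, 'P3': 4.0}
```

Because the pairing relies on names, this sketch also shows why decoy descriptions must match their targets apart from the prefix.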


--decoy_prefix

The prefix used to indicate a decoy protein in the FASTA file. For mokapot to provide accurate confidence estimates, decoy proteins should have the same description as the target proteins they were generated from, but with this string prepended.

Default: “decoy_”


--enzyme

A regular expression defining the enzyme specificity. The cleavage site is interpreted as the end of the match. The default is trypsin, without proline suppression: [KR]

Default: “[KR]”
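The cleavage rule above can be illustrated with a short in-silico digestion sketch. It assumes Python's re semantics, treats the end of each regex match as a cleavage site, and applies limits analogous to the missed-cleavage and peptide-length options. The function digest is hypothetical and ignores semi-enzymatic peptides and N-terminal methionine clipping.

```python
import re


def digest(sequence, enzyme=r"[KR]", missed_cleavages=2, min_length=6, max_length=50):
    """Return the set of peptides produced by cleaving at the end of each regex match."""
    # Cleavage sites are the end positions of every match, plus both termini.
    sites = [0] + [m.end() for m in re.finditer(enzyme, sequence)]
    if sites[-1] != len(sequence):
        sites.append(len(sequence))
    peptides = set()
    for i in range(len(sites) - 1):
        # Span up to `missed_cleavages` internal cleavage sites.
        for j in range(i + 1, min(i + missed_cleavages + 2, len(sites))):
            peptide = sequence[sites[i]:sites[j]]
            if min_length <= len(peptide) <= max_length:
                peptides.add(peptide)
    return peptides


print(sorted(digest("GASPKLLEEGRTTWW", missed_cleavages=1, min_length=1)))
# ['GASPK', 'GASPKLLEEGR', 'LLEEGR', 'LLEEGRTTWW', 'TTWW']
```

Since [KR] has no proline rule, a K or R followed by P would also be treated as a cleavage site here.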


--missed_cleavages

The allowed number of missed cleavages.

Default: 2


--clip_nterm_methionine

Remove methionine residues that occur at the protein N-terminus.

Default: False


--min_length

The minimum peptide length to consider.

Default: 6


--max_length

The maximum peptide length to consider.

Default: 50


--semi

Was a semi-enzymatic digest used to assign PSMs? If so, the protein database will likely contain shared peptides and yield unhelpful protein-level confidence estimates. We do not recommend using this option.

Default: False


--train_fdr

The maximum false discovery rate at which to consider a target PSM as a positive example during model training.

Default: 0.01
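As a rough sketch of how an FDR threshold can pick positive training examples, the snippet below estimates target-decoy q-values and keeps target PSMs whose q-value does not exceed the threshold. It assumes the simple estimate decoys / targets at each score cutoff and distinct scores; mokapot's exact estimator and tie handling may differ, and the function names are hypothetical.

```python
def target_decoy_qvalues(scores, is_target):
    """Estimate q-values by target-decoy competition (higher score is better).

    Assumes distinct scores; ties would need extra handling.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdrs = [0.0] * len(scores)
    targets = decoys = 0
    for i in order:
        if is_target[i]:
            targets += 1
        else:
            decoys += 1
        # Assumed estimator: decoys / targets among PSMs at or above this score.
        fdrs[i] = decoys / max(targets, 1)
    # A q-value is the minimum FDR over all cutoffs that still accept this PSM.
    qvalues = [0.0] * len(scores)
    running_min = float("inf")
    for i in reversed(order):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min
    return qvalues


def training_positives(scores, is_target, train_fdr=0.01):
    """Indices of target PSMs accepted at the training FDR threshold."""
    q = target_decoy_qvalues(scores, is_target)
    return [i for i, t in enumerate(is_target) if t and q[i] <= train_fdr]


print(training_positives([10, 9, 8, 7, 6], [True, True, False, True, True], train_fdr=0.01))
# [0, 1]
```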


--test_fdr

The false discovery rate threshold at which to evaluate the learned models.

Default: 0.01


--max_iter

The number of iterations to use for training.

Default: 10


--seed

An integer to use as the random seed.

Default: 1


--direction

The name of the feature to use as the initial direction for ranking PSMs. By default, mokapot automatically selects the feature that finds the most PSMs below the ‘--train_fdr’ threshold.
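The automatic choice of an initial direction can be sketched as a search over features: rank PSMs by each feature, count the targets accepted at the training FDR, and keep the best. This sketch assumes the estimate decoys / targets and also tries each feature negated, on the assumption that lower values may be better for some features (for example, e-values). The functions and the example feature names are hypothetical.

```python
def targets_at_fdr(scores, is_target, fdr):
    """Count targets at the most permissive cutoff where decoys / targets <= fdr."""
    ranked = sorted(zip(scores, is_target), key=lambda pair: pair[0], reverse=True)
    targets = decoys = best = 0
    for _, target in ranked:
        if target:
            targets += 1
        else:
            decoys += 1
        if targets and decoys / targets <= fdr:
            best = targets
    return best


def best_direction(features, is_target, train_fdr=0.01):
    """Pick the feature (and sign) that accepts the most targets at train_fdr."""
    best_name, best_sign, best_count = None, 1, -1
    for name, values in features.items():
        for sign in (1, -1):  # assumption: try higher-is-better and lower-is-better
            count = targets_at_fdr([sign * v for v in values], is_target, train_fdr)
            if count > best_count:
                best_name, best_sign, best_count = name, sign, count
    return best_name, best_sign, best_count


is_target = [True, True, False, True, False]
features = {
    "score": [5.0, 4.0, 3.0, 2.0, 1.0],
    "evalue": [0.001, 0.01, 0.5, 0.02, 0.9],
}
print(best_direction(features, is_target))
# ('evalue', -1, 3)
```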


--aggregate

If used, PSMs from multiple PIN files will be aggregated and analyzed together. Otherwise, a joint model will be trained, but confidence estimates will be calculated separately for each PIN file. This flag only has an effect when multiple PIN files are provided.

Default: False


--subset_max_train

The maximum number of PSMs to use when training the model for each cross-validation fold. This is useful for very large datasets and is ignored if fewer PSMs are available.
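A minimal sketch of capping the training set for one fold, assuming uniform random sampling without replacement and a seeded generator for reproducibility. How mokapot actually draws the subset is not described here, and subsample_training is a hypothetical helper.

```python
import random


def subsample_training(psm_indices, subset_max_train=None, seed=1):
    """Randomly keep at most `subset_max_train` PSMs for one training fold."""
    psm_indices = list(psm_indices)
    if subset_max_train is None or len(psm_indices) <= subset_max_train:
        # The limit is ignored when no more PSMs than the limit are available.
        return psm_indices
    rng = random.Random(seed)  # seeded so repeated runs pick the same subset
    return sorted(rng.sample(psm_indices, subset_max_train))


subset = subsample_training(range(1000), subset_max_train=100)
print(len(subset))
# 100
```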


--override

Use the learned model even if it performs worse than the best feature.

Default: False


--save_models

Save the models learned by mokapot as pickled Python objects.

Default: False


--load_models

Load previously saved models and skip model training. Note that the number of models must match the value of ‘--folds’.


--plugin

The names of the plugins to use.

Default: []


--keep_decoys

Keep the decoys in the output .txt files.

Default: False


--folds

The number of cross-validation folds to use. PSMs originating from the same mass spectrum are always in the same fold.

Default: 3
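Keeping all PSMs from one spectrum in the same fold amounts to splitting at the spectrum level rather than the PSM level. The sketch below assumes each PSM carries a hashable, sortable spectrum identifier, here a hypothetical (file name, scan number) tuple, shuffles the unique spectra with a seed, and deals them round-robin into folds. It illustrates the grouping constraint only and is not mokapot's splitting code; assign_folds is hypothetical.

```python
import random


def assign_folds(spectrum_ids, folds=3, seed=1):
    """Assign each PSM to a fold so that PSMs from one spectrum share a fold."""
    unique_spectra = sorted(set(spectrum_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_spectra)
    # Deal spectra round-robin into folds, then map each PSM via its spectrum.
    spectrum_to_fold = {spec: i % folds for i, spec in enumerate(unique_spectra)}
    return [spectrum_to_fold[spec] for spec in spectrum_ids]


# Hypothetical PSMs: the first two, and the fourth and fifth, share a spectrum.
spectra = [("a.pin", 1), ("a.pin", 1), ("a.pin", 2), ("a.pin", 3), ("a.pin", 3), ("a.pin", 4)]
print(assign_folds(spectra, folds=3))
```

Splitting by spectrum prevents competing PSMs for the same spectrum from leaking between training and evaluation folds.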


--open_modification_bin_size

This parameter only affects reading PSMs from PepXML files. If specified, modification masses are binned according to this value. The binned mass difference is appended to the end of the peptide and is used when grouping peptides for peptide-level confidence estimation. Use this option for open modification search results. We recommend 0.01 as a good starting point.
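A minimal sketch of the binning idea, assuming the mass difference is rounded to the nearest multiple of the bin size and appended in square brackets. The exact string mokapot appends is not specified here, so the [+x.xx] format and the helper bin_mass_shift are hypothetical.

```python
def bin_mass_shift(peptide, mass_diff, bin_size=0.01, decimals=2):
    """Append a binned mass difference to a peptide sequence (illustrative format)."""
    # Round to the nearest multiple of the bin size.
    binned = round(mass_diff / bin_size) * bin_size
    return f"{peptide}[{binned:+.{decimals}f}]"


print(bin_mass_shift("PEPTIDE", 15.99491))
# PEPTIDE[+15.99]
```

Two PSMs whose mass shifts fall into the same bin, such as 15.9949 and 15.9901 with a 0.01 bin, produce the same string and would therefore be grouped as the same peptide.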

-v, --verbosity

Possible choices: 0, 1, 2, 3

Specify the verbosity of the current process. Each level prints the following messages, including all those at a lower verbosity: 0-errors, 1-warnings, 2-messages, 3-debug info.

Default: 2
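The four verbosity levels line up naturally with Python's standard logging levels. The sketch below assumes that mapping (0 to ERROR, 1 to WARNING, 2 to INFO, 3 to DEBUG) and configures the root logger accordingly; it is an illustration of the level semantics, and configure_logging is a hypothetical helper rather than mokapot's own code.

```python
import logging

# Assumed mapping from the documented verbosity levels to Python logging levels.
VERBOSITY_TO_LEVEL = {
    0: logging.ERROR,    # errors only
    1: logging.WARNING,  # plus warnings
    2: logging.INFO,     # plus messages (the default)
    3: logging.DEBUG,    # plus debug info
}


def configure_logging(verbosity=2):
    """Configure the root logger for a given verbosity level (illustrative)."""
    level = VERBOSITY_TO_LEVEL[verbosity]
    logging.basicConfig(level=level, format="[%(levelname)s] %(message)s")
    # basicConfig is a no-op once handlers exist, so set the level explicitly too.
    logging.getLogger().setLevel(level)
    return level


configure_logging(3)
logging.debug("shown only at verbosity 3")
```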