Command Line Interface

mokapot version 0.9.2.dev7+g6725bdc.d20230911. Written by William E. Fondrie (wfondrie@talus.bio) while in the Department of Genome Sciences at the University of Washington.

Official code website: https://github.com/wfondrie/mokapot

More documentation and examples: https://mokapot.readthedocs.io

usage: mokapot [-h] [-d DEST_DIR] [-w MAX_WORKERS] [-r FILE_ROOT]
               [--proteins PROTEINS] [--decoy_prefix DECOY_PREFIX]
               [--enzyme ENZYME] [--missed_cleavages MISSED_CLEAVAGES]
               [--clip_nterm_methionine] [--min_length MIN_LENGTH]
               [--max_length MAX_LENGTH] [--semi] [--train_fdr TRAIN_FDR]
               [--test_fdr TEST_FDR] [--max_iter MAX_ITER] [--seed SEED]
               [--direction DIRECTION] [--aggregate]
               [--subset_max_train SUBSET_MAX_TRAIN] [--override]
               [--save_models] [--load_models LOAD_MODELS [LOAD_MODELS ...]]
               [--plugin PLUGIN] [--keep_decoys] [--folds FOLDS]
               [--open_modification_bin_size OPEN_MODIFICATION_BIN_SIZE]
               [-v {0,1,2,3}]
               psm_files [psm_files ...]

Positional Arguments

psm_files: A collection of PSMs in the Percolator tab-delimited or PepXML format.
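As a rough illustration of how the positional and named arguments fit together, the following Python sketch assembles a mokapot command line from the flags documented below. The helper name `build_mokapot_command` and the file names (`run1.pin`, `run2.pin`, `db.fasta`, `results`) are hypothetical; the sketch only builds and prints the command and does not run mokapot.

```python
import shlex


def build_mokapot_command(psm_files, dest_dir=None, proteins=None,
                          folds=None, aggregate=False):
    """Assemble an argument list for the mokapot CLI (hypothetical helper)."""
    cmd = ["mokapot"]
    if dest_dir is not None:
        cmd += ["--dest_dir", dest_dir]
    if proteins is not None:
        cmd += ["--proteins", proteins]
    if folds is not None:
        cmd += ["--folds", str(folds)]
    if aggregate:
        cmd.append("--aggregate")
    # Positional PSM files go last, after all named arguments.
    cmd += list(psm_files)
    return cmd


# Hypothetical input files; replace with your own paths.
command = build_mokapot_command(
    ["run1.pin", "run2.pin"],
    dest_dir="results",
    proteins="db.fasta",
    aggregate=True,
)
command_line = shlex.join(command)
print(command_line)
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a system where mokapot is installed.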

Named Arguments

-d, --dest_dir

The directory in which to write the result files. Defaults to the current working directory.

-w, --max_workers

The number of processes to use for model training. Note that using more than one worker will result in garbled logging messages.

Default: 1

-r, --file_root

The prefix added to all file names.

--proteins

The FASTA file used for the database search. Using this option enables protein-level confidence estimates using the ‘picked-protein’ approach. Note that the FASTA file must contain both target and decoy sequences. Additionally, verify that the ‘--enzyme’, ‘--missed_cleavages’, ‘--min_length’, ‘--max_length’, ‘--semi’, ‘--clip_nterm_methionine’, and ‘--decoy_prefix’ parameters match your search engine conditions.

--decoy_prefix

The prefix used to indicate a decoy protein in the FASTA file. For mokapot to provide accurate confidence estimates, decoy proteins should have the same description as the target proteins they were generated from, but with this string prepended.

Default: “decoy_”

--enzyme

A regular expression defining the enzyme specificity. The cleavage site is interpreted as the end of the match. The default is trypsin, without proline suppression: [KR]

Default: “[KR]”
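To show what "the cleavage site is interpreted as the end of the match" means in practice, here is a minimal Python sketch that splits a sequence at the end of every regex match. It is an illustration only, not mokapot's implementation: the function name `digest` and the example sequences are hypothetical, and missed cleavages and length filters are ignored. The pattern `[KR](?!P)` is shown as one possible way to express proline suppression.

```python
import re


def digest(sequence, enzyme_regex="[KR]"):
    """Split a sequence at the end of each regex match (no missed cleavages)."""
    sites = [match.end() for match in re.finditer(enzyme_regex, sequence)]
    bounds = [0] + sites
    # Keep the trailing peptide if the last cleavage site is not the C-terminus.
    if not sites or sites[-1] != len(sequence):
        bounds.append(len(sequence))
    return [sequence[start:stop] for start, stop in zip(bounds, bounds[1:])]


default_peptides = digest("MKTAYRIAK")               # cleaves after every K or R
suppressed_peptides = digest("AKPGR", "[KR](?!P)")   # K before P is not cleaved
print(default_peptides)
print(suppressed_peptides)
```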

--missed_cleavages

The allowed number of missed cleavages.

Default: 2

--clip_nterm_methionine

Remove methionine residues that occur at the protein N-terminus.

Default: False

--min_length

The minimum peptide length to consider.

Default: 6

--max_length

The maximum peptide length to consider.

Default: 50

--semi

Was a semi-enzymatic digest used to assign PSMs? If so, the protein database will likely contain shared peptides and yield unhelpful protein-level confidence estimates. We do not recommend using this option.

Default: False

--train_fdr

The maximum false discovery rate at which to consider a target PSM as a positive example during model training.

Default: 0.01
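To make the role of this threshold concrete, the sketch below uses a simple target-decoy estimate (decoys divided by targets at each score threshold) to compute q-values and then keeps the target PSMs at or below the threshold as positive examples. This is an assumption-laden illustration: the exact estimator mokapot uses may differ, ties between scores are not handled, and the names `estimate_qvalues`, `scores`, and `is_target` are hypothetical.

```python
def estimate_qvalues(scores, is_target):
    """Estimate q-values from scores and target/decoy labels (simplified)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    targets = decoys = 0
    fdrs = []
    for i in order:
        if is_target[i]:
            targets += 1
        else:
            decoys += 1
        # Simple estimate: decoys / targets among PSMs scoring at least this well.
        fdrs.append(decoys / max(targets, 1))
    # The q-value is the smallest FDR at this rank or any lower-scoring rank.
    qvalues = [0.0] * len(scores)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        qvalues[order[rank]] = running_min
    return qvalues


scores = [10.0, 9.0, 8.0, 7.0, 6.0]
is_target = [True, True, False, True, True]
qvalues = estimate_qvalues(scores, is_target)

train_fdr = 0.01
positives = [i for i, (q, t) in enumerate(zip(qvalues, is_target))
             if t and q <= train_fdr]
print(qvalues, positives)
```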

--test_fdr

The false-discovery rate threshold at which to evaluate the learned models.

Default: 0.01

--max_iter

The number of iterations to use for training.

Default: 10

--seed

An integer to use as the random seed.

Default: 1

--direction

The name of the feature to use as the initial direction for ranking PSMs. The default automatically selects the feature that finds the most PSMs below the train_fdr.

--aggregate

If used, PSMs from multiple PIN files will be aggregated and analyzed together. Otherwise, a joint model will be trained, but confidence estimates will be calculated separately for each PIN file. This flag only has an effect when multiple PIN files are provided.

Default: False

--subset_max_train

The maximum number of PSMs to use when training each of the cross-validation folds in the model. This is useful for very large datasets and will be ignored if fewer PSMs are available.

--override

Use the learned model even if it performs worse than the best feature.

Default: False

--save_models

Save the models learned by mokapot as pickled Python objects.

Default: False

--load_models

Load previously saved models and skip model training. Note that the number of models must match the value of --folds.

--plugin

The names of the plugins to use.

Default: []

--keep_decoys

Keep the decoys in the output .txt files.

Default: False

--folds

The number of cross-validation folds to use. PSMs originating from the same mass spectrum are always in the same fold.

Default: 3
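The constraint that PSMs from the same spectrum share a fold can be sketched as a group-aware split: folds are assigned to spectra, and each PSM inherits the fold of its spectrum. The following Python sketch shows the idea under simple assumptions (shuffled round-robin assignment); it is not mokapot's own splitting code, and `assign_folds` and the spectrum identifiers are hypothetical.

```python
import random


def assign_folds(spectrum_ids, folds=3, seed=1):
    """Give every PSM a fold index so that PSMs sharing a spectrum share a fold."""
    unique_spectra = sorted(set(spectrum_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_spectra)
    # Assign folds to spectra round-robin, then look up each PSM's spectrum.
    fold_of = {spec: i % folds for i, spec in enumerate(unique_spectra)}
    return [fold_of[spec] for spec in spectrum_ids]


# Six PSMs from four spectra; "s1" and "s3" each produced two PSMs.
spectrum_ids = ["s1", "s1", "s2", "s3", "s3", "s4"]
psm_folds = assign_folds(spectrum_ids, folds=3, seed=1)
print(psm_folds)
```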

--open_modification_bin_size

This parameter only affects reading PSMs from PepXML files. If specified, modification masses are binned according to the value. The binned mass difference is appended to the end of the peptide and will be used when grouping peptides for peptide-level confidence estimation. Use this option for open modification search results. We recommend 0.01 as a good starting point.
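The effect of binning on peptide grouping can be sketched as follows. This is a hedged illustration rather than mokapot's actual code: rounding to the nearest bin and the `PEPTIDE[+15.9900]` label format are assumptions, and `bin_mass` and `peptide_key` are hypothetical names.

```python
def bin_mass(mass_diff, bin_size=0.01):
    """Snap a mass difference to the nearest multiple of bin_size (assumed rule)."""
    return round(mass_diff / bin_size) * bin_size


def peptide_key(peptide, mass_diff, bin_size=0.01):
    """Append the binned mass difference to the peptide (assumed label format)."""
    return f"{peptide}[{bin_mass(mass_diff, bin_size):+.4f}]"


# Two nearly identical mass shifts fall in the same bin and group together;
# a shift in a different bin yields a different peptide key.
key_a = peptide_key("PEPTIDE", 15.9949)
key_b = peptide_key("PEPTIDE", 15.9912)
key_c = peptide_key("PEPTIDE", 16.0031)
print(key_a, key_b, key_c)
```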

-v, --verbosity

Possible choices: 0, 1, 2, 3

Specify the verbosity of the current process. Each level prints the following messages, including all those at a lower verbosity: 0-errors, 1-warnings, 2-messages, 3-debug info.

Default: 2
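The four verbosity levels line up naturally with Python's standard logging levels. The sketch below shows one plausible mapping; it assumes that "messages" corresponds to `logging.INFO`, and the names `VERBOSITY_TO_LEVEL` and `level_for_verbosity` are hypothetical.

```python
import logging

# Assumed correspondence between -v values and standard logging levels.
VERBOSITY_TO_LEVEL = {
    0: logging.ERROR,    # errors only
    1: logging.WARNING,  # errors and warnings
    2: logging.INFO,     # plus general messages (the default)
    3: logging.DEBUG,    # plus debug info
}


def level_for_verbosity(verbosity=2):
    """Return the logging level for a verbosity of 0, 1, 2, or 3."""
    return VERBOSITY_TO_LEVEL[verbosity]


logger = logging.getLogger("verbosity_demo")
logger.setLevel(level_for_verbosity(1))
# At verbosity 1, warnings are emitted but informational messages are not.
shows_warning = logger.isEnabledFor(logging.WARNING)
shows_info = logger.isEnabledFor(logging.INFO)
```

Because a lower logging level admits every more severe message, each verbosity value includes all messages shown at the values below it.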