Changelog for mokapot

[Unreleased]

[v0.10.1] - 2023-09-11

Breaking changes

  • Mokapot now uses numpy.random.Generator instead of the deprecated numpy.random.RandomState API. New rng arguments have been added to functions and classes that rely on randomness in lieu of setting a global random seed with np.random.seed(). Thanks @sjust-seerbio! (#55)
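
As a rough illustration of what this change means for callers, the sketch below uses only NumPy. The `shuffle_labels` helper is hypothetical and not part of mokapot; it only shows how an explicit seed or `numpy.random.Generator` can travel with a call instead of relying on `np.random.seed()`.

```python
import numpy as np


def shuffle_labels(labels, rng=None):
    """Hypothetical helper: return a shuffled copy of labels.

    rng may be None, an integer seed, or a numpy.random.Generator,
    all of which np.random.default_rng() accepts.
    """
    rng = np.random.default_rng(rng)
    return rng.permutation(labels)


labels = np.arange(10)

# Old style (global state): np.random.seed(42)
# New style: the seed or generator is passed with the call.
first = shuffle_labels(labels, rng=42)
second = shuffle_labels(labels, rng=42)
```

Because both calls receive the same seed, `first` and `second` hold the same permutation, and no global state is touched.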

Changed

  • Added linting with Ruff to tests and pre-commit hooks (along with others)!

Fixed

  • The PepXML reader, which broke due to a Pandas update.

  • Potential bug if lowercase peptide sequences were used and protein-level confidence estimates were enabled.

  • Multiprocessing led to the same training set being used for all splits (#104).

[v0.9.1] - 2022-12-14

Changed

  • Cross-validation classes are now detected by looking for inheritance from the sklearn.model_selection._search.BaseSearchCV class.
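
The kind of check described above can be sketched with scikit-learn alone. The `is_cross_validated` function is a hypothetical name used for illustration; how mokapot performs the check internally is not shown here.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection._search import BaseSearchCV
from sklearn.svm import LinearSVC


def is_cross_validated(estimator):
    """Hypothetical check: True if the estimator is a scikit-learn search CV object."""
    return isinstance(estimator, BaseSearchCV)


plain = LinearSVC()
searched = GridSearchCV(LinearSVC(), param_grid={"C": [0.1, 1, 10]})

plain_is_cv = is_cross_validated(plain)        # False
searched_is_cv = is_cross_validated(searched)  # True
```

Since `RandomizedSearchCV` also inherits from `BaseSearchCV`, an inheritance check of this kind covers it without listing each class by name.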

Fixed

  • Fixed a backward compatibility issue for Python <3.10.

[v0.9.0] - 2022-12-02

Added

  • Support for plugins, allowing mokapot to use new models.

  • Added a custom Docker image with optional dependencies.

Fixed

  • Confidence objects are now picklable.

Changed

  • Updated GitHub Actions.

  • Migrated to a full pyproject.toml setuptools build. Thanks @jspaezp!

[v0.8.3] - 2022-07-20

Fixed

  • Fixed the reported mokapot score when group FDR is used.

[v0.8.2] - 2022-07-18

Added

  • mokapot.Model() objects now record the CV fold that they were fit on. This means that they can be provided to mokapot.brew() in any order and still maintain proper cross-validation bins.

Fixed

  • Resolved an issue where models were required to have an intercept term.

  • The PepXML parser would sometimes try to log-transform features containing zeros, resulting in missing values.

[v0.8.1] - 2022-06-24

Added

  • Support for previously trained models in the brew() function and the CLI using the --load_models argument. Thanks @sambenfredj!

Fixed

  • Using clip_nterm_methionine=True could result in peptides of length min_length-1.

  • Links to example datasets in the documentation.
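
To make the clip_nterm_methionine fix above more concrete, here is a minimal sketch of one way such a bug can be avoided: apply the length filter after the methionine is clipped. The `nterm_peptides` helper is hypothetical and is not mokapot's implementation.

```python
def nterm_peptides(peptide, min_length=6, clip_nterm_methionine=True):
    """Hypothetical helper for a peptide at the protein N-terminus.

    Returns the peptide and, if requested, its methionine-clipped form,
    keeping only sequences that still meet min_length after clipping.
    """
    candidates = [peptide]
    if clip_nterm_methionine and peptide.startswith("M"):
        candidates.append(peptide[1:])

    # Apply the length filter last, so a clipped peptide cannot slip through.
    return [seq for seq in candidates if len(seq) >= min_length]


kept = nterm_peptides("MKTAYI", min_length=6)      # clipped form has 5 residues
longer = nterm_peptides("MKTAYIAK", min_length=6)  # both forms are long enough
```

Here `kept` contains only the unclipped peptide, while `longer` contains both forms.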

[v0.8.0] - 2022-03-11

Thanks to @sambenfredj, @gessulat, @tkschmidt, and @MatthewThe for PR #44, which made these things happen!

Added

  • A new command line argument, --max_workers. This allows the cross-validation folds to be computed in parallel.

  • The PercolatorModel class now has an n_jobs parameter, which controls parallelization of the grid search.

Changed

  • Improved speed by using multiple jobs for grid search by default.

  • Parallelization within mokapot.brew() now uses joblib instead of concurrent.futures.

[v0.7.4] - 2021-09-03

Changed

  • Improved documentation and added warnings for --subset_max_train. Thanks @jspaezp!

[v0.7.3] - 2021-07-20

Fixed

  • Fixed a bug where --keep_decoys did not work with --aggregate, and added tests to cover this case. Thanks @jspaezp!

[v0.7.2] - 2021-07-16

Added

  • --keep_decoys option to the command line interface. Thanks @jspaezp!

  • Notes about setting a random seed to the Python API documentation. (Issue #30)

  • Added more information about peptides that couldn’t be mapped to proteins. (Issue #29)

Fixed

  • Loading a saved model with mokapot.load_model() would fail because of an update to Pandas that introduced a new exception. We’ve updated mokapot accordingly.

Changed

  • Updates to unit tests. Warnings are now treated as errors for system tests.

[v0.7.1] - 2021-03-22

Changed

  • Updated the build to align with PEP 517.

[v0.7.0] - 2021-03-19

Added

  • Support for downstream peptide and protein quantitation with FlashLFQ. This is accomplished through the mokapot.to_flashlfq() function or the to_flashlfq() method of LinearConfidence objects. Note that to support the FlashLFQ format, you’ll need to specify additional columns in read_pin() or use a PepXML input file (read_pepxml()).

  • Added a top-level function for exporting confident PSMs, peptides, and proteins from one or more LinearConfidence objects as a tab-delimited file: mokapot.to_txt().

  • Added a top-level function for reading FASTA files for protein-level confidence estimates: mokapot.read_fasta().

  • Tests accompanying the support for the features above.

  • Added a “mokapot cookbook” to the documentation with helpful code snippets.

Changed

  • Corresponding with support for new formats, the mokapot.read_pin() function and the LinearPsmDataset constructor now have many new optional parameters. These specify the columns containing the metadata needed to write the added formats.

  • Starting mokapot should be slightly faster for Python >= 3.8. We were able to eliminate the runtime call to setuptools, because of the recent addition of importlib.metadata to the standard library, saving a few hundred milliseconds.
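
For readers unfamiliar with importlib.metadata, the sketch below shows how a package version can be looked up with the standard library on Python >= 3.8. The `get_version` wrapper and its fallback value are assumptions for illustration, not mokapot's actual code.

```python
from importlib.metadata import PackageNotFoundError, version


def get_version(package, default="unknown"):
    """Return the installed version of a package, or a default if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return default


mokapot_version = get_version("mokapot")
missing_version = get_version("surely-not-an-installed-package-xyz")
```

The lookup reads the installed distribution's metadata directly, so setuptools does not need to be imported at runtime.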

[v0.6.2] - 2021-03-12

Added

  • Now checks to verify there are no debugging print statements in the code base when linting.

Fixed

  • Removed debugging print statements.

[v0.6.1] - 2021-03-11

Fixed

  • Parsing Percolator tab-delimited files with a “DefaultDirection” line.

  • Label column is now converted to boolean during PIN file parsing. Previously, problems occurred if the Label column was of dtype object.

  • Modifications parsed from PepXML files were indexed incorrectly on the peptide string.

[v0.6.0] - 2021-03-03

Added

  • Support for parsing PSMs from PepXML input files.

  • This changelog.

Fixed

  • Parsing a FASTA file previously failed if an entry was not followed by a sequence. Now, missing sequences are tolerated and a warning is given instead.

  • Assigning confidence failed when the learned model was worse than the best feature and lower scores were better for that feature.

  • Easy access to grouped confidence estimates in the Python API was not working due to a typo.

  • Deprecation warnings from Pandas about the regex argument.

  • Sometimes peptides were removed as shared incorrectly when part of a protein group.
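
The FASTA fix listed above, tolerating entries without a sequence, can be sketched with a small standard-library parser. This `parse_fasta` function is hypothetical and unrelated to mokapot's own reader; in particular, skipping the empty entry rather than keeping it is an assumption made for this example.

```python
import warnings


def parse_fasta(text):
    """Hypothetical minimal FASTA parser.

    Returns a dict mapping each header (without '>') to its sequence.
    Entries without a sequence are skipped with a warning.
    """
    entries = []  # list of (header, list of sequence lines)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            entries.append((line[1:], []))
        elif entries:
            entries[-1][1].append(line)

    proteins = {}
    for header, chunks in entries:
        sequence = "".join(chunks)
        if sequence:
            proteins[header] = sequence
        else:
            warnings.warn(f"No sequence found for FASTA entry '{header}'.")
    return proteins


example = ">prot1\nMKTAYIAK\nQRQISFVK\n>prot2\n>prot3\nGGSLK\n"
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    proteins = parse_fasta(example)
```

In this example, `prot2` has no sequence, so it triggers a single warning and is left out of `proteins`, while the other two entries are parsed normally.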

Changed

  • Refactored and added many new unit and system tests.

  • New pull requests must now improve or maintain test coverage.

  • Improved error messages.