Changelog for mokapot

[Unreleased]

[v0.10.1] - 2023-09-11

Breaking changes

Mokapot now uses numpy.random.Generator instead of the deprecated numpy.random.RandomState API. New rng arguments have been added to functions and classes that rely on randomness in lieu of setting a global random seed with np.random.seed(). Thanks @sjust-seerbio! (#55)
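
As a rough illustration of the pattern described above, the sketch below shows how a function can accept an rng argument instead of relying on np.random.seed(). The shuffle_indices helper is hypothetical and is not part of mokapot. How mokapot itself validates its rng arguments is not covered by this changelog, so treat this as the general NumPy idiom only.

```python
import numpy as np


def shuffle_indices(n_items, rng=None):
    """Return a random permutation of range(n_items).

    Hypothetical helper for illustration; not a mokapot function.
    `rng` may be None, an integer seed, or a numpy.random.Generator.
    """
    # default_rng() returns an existing Generator unchanged and builds
    # a new one from an integer seed (or fresh entropy when rng is None).
    rng = np.random.default_rng(rng)
    return rng.permutation(n_items)


# The same integer seed gives the same permutation on every call.
first = shuffle_indices(10, rng=42)
second = shuffle_indices(10, rng=42)

# A Generator can also be created once and passed around explicitly.
shared = np.random.default_rng(42)
third = shuffle_indices(10, rng=shared)
```

Passing the generator explicitly keeps the random state local to the call, so results do not depend on what other code did to the global seed.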

Changed

Added linting with Ruff to tests and pre-commit hooks (along with others)!

Fixed

The PepXML reader, which broke due to a Pandas update.
Potential bug if lowercase peptide sequences were used and protein-level confidence estimates were enabled.
Multiprocessing led to the same training set being used for all splits (#104).

[v0.9.1] - 2022-12-14

Changed

Cross-validation classes are now detected by looking for inheritance from the sklearn.model_selection._search.BaseSearchCV class.

Fixed

Fixed backward compatibility issue for Python <3.10.

[v0.9.0] - 2022-12-02

Added

Support for plugins, allowing mokapot to use new models.
Added a custom Docker image with optional dependencies.

Fixed

Confidence objects are now picklable.

Changed

Updated GitHub Actions.
Migrated to a full pyproject.toml setuptools build. Thanks @jspaezp!

[v0.8.3] - 2022-07-20

Fixed

Fixed the reported mokapot score when group FDR is used.

[v0.8.2] - 2022-07-18

Added

mokapot.Model() objects now record the CV fold that they were fit on. This means that they can be provided to mokapot.brew() in any order and still maintain proper cross-validation bins.

Fixed

Resolved issue where models were required to have an intercept term.
The PepXML parser would sometimes try to log-transform features containing zeros, resulting in missing values.

[v0.8.1] - 2022-06-24

Added

Support for previously trained models in the brew() function and the CLI using the --load_models argument. Thanks @sambenfredj!

Fixed

Using clip_nterm_methionine=True could result in peptides of length min_length-1.
Links to example datasets in the documentation.
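
To make the clip_nterm_methionine fix above concrete, here is a minimal sketch of the kind of check involved. It assumes the length filter has to be applied to the clipped variant as well as to the original peptide. The clipped_variants function is hypothetical and is not mokapot's implementation.

```python
def clipped_variants(nterm_peptides, min_length):
    """Return N-terminal peptides plus their Met-clipped variants.

    Hypothetical sketch: every peptide returned, clipped or not,
    must be at least min_length residues long.
    """
    kept = set()
    for peptide in nterm_peptides:
        if len(peptide) >= min_length:
            kept.add(peptide)
        # Check the length of the clipped form itself, not the original.
        # Skipping this check is how a peptide of length min_length - 1
        # could slip through.
        if peptide.startswith("M") and len(peptide) - 1 >= min_length:
            kept.add(peptide[1:])
    return kept


strict = clipped_variants(["MKLVTR"], min_length=6)
relaxed = clipped_variants(["MKLVTR"], min_length=5)
```

With min_length=6, the clipped form KLVTR has only five residues and is dropped. With min_length=5, both forms are kept.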

[v0.8.0] - 2022-03-11

Thanks to @sambenfredj, @gessulat, @tkschmidt, and @MatthewThe for PR #44, which made these things happen!

Added

A new command line argument, --max_workers. This allows the cross-validation folds to be computed in parallel.
The PercolatorModel class now has an n_jobs parameter, which controls parallelization of the grid search.

Changed

Improved speed by using multiple jobs for grid search by default.
Parallelization within mokapot.brew() now uses joblib instead of concurrent.futures.

[v0.7.4] - 2021-09-03

Changed

Improved documentation and added warnings for --subset_max_train. Thanks @jspaezp!

[v0.7.3] - 2021-07-20

Fixed

Fixed bug where --keep_decoys did not work with --aggregate. Also added tests to cover this. Thanks @jspaezp!

[v0.7.2] - 2021-07-16

Added

--keep_decoys option to the command line interface. Thanks @jspaezp!
Notes about setting a random seed to the Python API documentation. (Issue #30)
Added more information about peptides that couldn’t be mapped to proteins. (Issue #29)

Fixed

Loading a saved model with mokapot.load_model() would fail because of an update to Pandas that introduced a new exception. We’ve updated mokapot accordingly.

Changed

Updates to unit tests. Warnings are now treated as errors for system tests.

[v0.7.1] - 2021-03-22

Changed

Updated the build to align with PEP 517.

[v0.7.0] - 2021-03-19

Added

Support for downstream peptide and protein quantitation with FlashLFQ. This is accomplished through the mokapot.to_flashlfq() function or the to_flashlfq() method of LinearConfidence objects. Note that to support the FlashLFQ format, you’ll need to specify additional columns in read_pin() or use a PepXML input file (read_pepxml()).
Added a top-level function for exporting confident PSMs, peptides, and proteins from one or more LinearConfidence objects as a tab-delimited file: mokapot.to_txt().
Added a top-level function for reading FASTA files for protein-level confidence estimates: mokapot.read_fasta().
Tests accompanying the support for the features above.
Added a “mokapot cookbook” to the documentation with helpful code snippets.

Changed

Corresponding with support for new formats, the mokapot.read_pin() function and the LinearPsmDataset constructor now have many new optional parameters. These specify the columns containing the metadata needed to write the added formats.
Starting mokapot should be slightly faster for Python >= 3.8. We were able to eliminate the runtime call to setuptools, because of the recent addition of importlib.metadata to the standard library, saving a few hundred milliseconds.
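
The startup improvement above relies on importlib.metadata, which is in the standard library from Python 3.8 onward. The sketch below shows the general way to look up an installed package's version with it. The installed_version helper is a hypothetical name, and the changelog does not show exactly how mokapot performs this lookup.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent.

    Hypothetical helper for illustration; not part of mokapot.
    """
    try:
        # Reads the version from the installed distribution's metadata,
        # without importing setuptools or pkg_resources.
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed mokapot version, or None if it is not installed.
print(installed_version("mokapot"))
```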

[v0.6.2] - 2021-03-12

Added

Now checks to verify there are no debugging print statements in the code base when linting.

Fixed

Removed debugging print statements.

[v0.6.1] - 2021-03-11

Fixed

Parsing Percolator tab-delimited files with a “DefaultDirection” line.
Label column is now converted to boolean during PIN file parsing. Previously, problems occurred if the Label column was of dtype object.
Modifications parsed from pepXML files were indexed incorrectly on the peptide string.

[v0.6.0] - 2021-03-03

Added

Support for parsing PSMs from PepXML input files.
This changelog.

Fixed

Parsing a FASTA file previously failed if an entry was not followed by a sequence. Now, missing sequences are tolerated and a warning is given instead.
When the learned model was worse than the best feature and lower scores were better for that feature, assigning confidence would fail.
Easy access to grouped confidence estimates in the Python API was not working due to a typo.
Deprecation warnings from Pandas about the regex argument.
Sometimes peptides were removed as shared incorrectly when part of a protein group.
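
The FASTA fix in the list above can be pictured with a small, self-contained parser that warns about an entry with no sequence instead of failing. This is only a sketch: parse_fasta and _store are hypothetical names, and skipping the empty entry is an assumption. The changelog says only that missing sequences are tolerated and a warning is given.

```python
import warnings


def _store(entries, header, chunks):
    """Store one entry, warning (rather than raising) if it has no sequence."""
    if header is None:
        return
    sequence = "".join(chunks)
    if sequence:
        entries[header] = sequence
    else:
        # Assumption: an entry without a sequence is skipped with a warning.
        warnings.warn(f"No sequence found for FASTA entry '{header}'; skipping it.")


def parse_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    entries = {}
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous entry.
            _store(entries, header, chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    _store(entries, header, chunks)  # Close the final entry.
    return entries


# protB has a header but no sequence lines.
example = ">protA\nMKLV\nTRAA\n>protB\n>protC\nGGSK\n"
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    proteins = parse_fasta(example)
```

Here proteins contains protA and protC, and caught holds a single warning about protB.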

Changed

Refactored and added many new unit and system tests.
New pull requests must now improve or maintain test coverage.
Improved error messages.