Developing

The Cutadapt source code is on GitHub. Cutadapt is written in Python 3 with some extension modules that are written in Cython.

Development installation

For development, make sure that you install Cython and tox. We also recommend using a virtualenv. This sequence of commands should work:

git clone https://github.com/marcelm/cutadapt.git  # or clone your own fork
cd cutadapt
virtualenv .venv
source .venv/bin/activate
pip install Cython pytest tox pre-commit
pre-commit install
pip install -e .

Then you should be able to run Cutadapt:

cutadapt --version

Remember that you do not need to activate a virtualenv to run binaries in it, so this works even when the environment is not activated:

.venv/bin/cutadapt --version
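
As a quick sanity check from within Python, the version of the installed package can also be queried through the standard library. This is only a minimal sketch: the helper name installed_version is hypothetical and not part of Cutadapt, and it assumes Python 3.8 or later for importlib.metadata.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="cutadapt"):
    """Return the version string of an installed package, or None if it is absent.

    Hypothetical helper for illustration; not part of Cutadapt itself.
    """
    try:
        # Reads the metadata recorded when the package was installed
        return version(package)
    except PackageNotFoundError:
        return None
```

Calling installed_version() inside the activated environment should return a version string after pip install -e . has succeeded, and None if the installation is missing.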

The tests can then be run like this:

pytest

Or with tox, which requires interpreters for all tested Python versions to be installed:

tox

Making a release

A new release is automatically deployed to PyPI whenever a new tag is pushed to the Git repository.

Cutadapt uses setuptools_scm to automatically manage version numbers. This means that the version is not stored in the source code but derived from the most recent Git tag. The following procedure can be used to bump the version and make a new release.

  1. Update CHANGES.rst (version number and list of changes)

  2. Ensure you have no uncommitted changes in the working copy.

  3. Run a git pull.

  4. Run tox, ensuring all tests pass.

  5. Tag the current commit with the version number (there must be a v prefix):

    git tag v0.1
    

    To release a development version, use a dev version number such as v1.17.dev1. Users will not automatically get these unless they use pip install --pre.

  6. Push the tag:

    git push --tags
    
  7. Wait for the GitHub Action to finish and to deploy to PyPI.

  8. The bioconda recipe also needs to be updated, but the bioconda bot will likely do this automatically if you just wait a little while.

    Ensure that the list of dependencies (the requirements: section in the recipe) is in sync with the setup.cfg file.

If something went wrong after a version has already been tagged and published to PyPI, fix the problem and tag a new version. Do not change a version that has already been uploaded.
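
The tag naming rule from step 5 (a v prefix, optionally followed by a dev suffix) can be checked before tagging. The following is a simplified sketch under stated assumptions: parse_release_tag and TAG_PATTERN are hypothetical names, and the pattern covers only the tag shapes mentioned above (such as v0.1 and v1.17.dev1), not every version form that setuptools_scm or PyPI accepts.

```python
import re

# Simplified pattern: "v", a dotted numeric release, and an optional ".devN" suffix.
# Assumption: only the tag shapes shown in the release procedure are covered.
TAG_PATTERN = re.compile(r"v(\d+(?:\.\d+)*)(\.dev\d+)?")


def parse_release_tag(tag):
    """Return (version, is_dev) for a tag such as 'v1.17.dev1'.

    Raises ValueError if the tag lacks the 'v' prefix or has another shape.
    """
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a valid release tag: {tag!r}")
    release, dev_suffix = match.groups()
    # The version is the tag without its "v" prefix
    version = release + (dev_suffix or "")
    return version, dev_suffix is not None
```

For example, parse_release_tag("v1.17.dev1") reports a development version, which users only receive with pip install --pre, while a tag without the v prefix is rejected.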

Contributing

Contributions to Cutadapt are welcome, whether as source code, documentation improvements, or help with responding to issues!

To contribute to Cutadapt development, it is easiest to send in a pull request (PR) on GitHub.

Here are some guidelines for how to do this. They are not strict rules. When in doubt, send in a PR and we will sort it out.

  • Limit a PR to a single topic. Submit multiple PRs if necessary. This way, it is easier to discuss the changes individually, and in case we find that one of them should not go in, the others can still be accepted.
  • For larger changes, consider opening an issue first to plan what you want to do.
  • Include appropriate unit or integration tests. Sometimes, tests are hard to write or don’t make sense. If you think this is the case, just leave the tests out initially and we can discuss whether to add any.
  • Add documentation and a changelog entry if appropriate.

Code style

  • The source code needs to be formatted with black. If you install pre-commit, the formatting will be done for you.
  • There are inconsistencies in the current code base since it’s a few years old already. New code should follow the current rules, however.
  • Using an IDE is beneficial (PyCharm, for example). It helps to catch lots of style issues early (unused imports, spacing etc.).
  • Avoid unnecessary abbreviations for variable names. Code is more often read than written.
  • When writing a help text for a new command-line option, look at the output of cutadapt --help and try to make it look nice and short.
  • In comments and documentation, capitalize FASTQ, BWA, CPU etc.

Ideas/To Do

This is a rather unsorted list of features that would be nice to have, of things that could be improved in the source code, and of possible algorithmic improvements.

  • show average error rate
  • length histogram
  • --detect prints out best guess which of the given adapters is the correct one
  • warn when given adapter sequence contains non-IUPAC characters
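
As an illustration of the last idea, a check for non-IUPAC characters could look roughly like the sketch below. It is not existing Cutadapt code: the names IUPAC_NUCLEOTIDES and non_iupac_characters are hypothetical, and the set assumes the standard IUPAC nucleotide codes including U.

```python
# Standard IUPAC nucleotide codes (assumption: U is accepted alongside T)
IUPAC_NUCLEOTIDES = frozenset("ACGTURYSWKMBDHVN")


def non_iupac_characters(sequence):
    """Return the sorted distinct characters of `sequence` that are not IUPAC codes.

    The comparison is case-insensitive; an empty list means nothing to warn about.
    """
    return sorted(set(sequence.upper()) - IUPAC_NUCLEOTIDES)
```

A caller could emit a warning whenever the returned list is non-empty. Note that the placeholder word ADAPTER itself contains the non-IUPAC letters E and P.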

Specifying adapters

Allow something such as -a ADAP$TER or -a ADAPTER$NNN. This would be a way to specify less strict anchoring.

Allow N{3,10} as in regular expressions (for a variable-length sequence).
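
To make the proposed notation concrete, here is a rough sketch of how such a specification might be split into characters with repeat ranges. It is purely illustrative and does not reflect Cutadapt's actual adapter parser; TOKEN and parse_repeats are hypothetical names.

```python
import re

# One token: a single letter, optionally followed by {n} or {min,max}
TOKEN = re.compile(r"([A-Za-z])(?:\{(\d+)(?:,(\d+))?\})?")


def parse_repeats(spec):
    """Split a spec such as 'ACN{3,10}' into (character, min_count, max_count) tuples."""
    tokens = []
    position = 0
    while position < len(spec):
        match = TOKEN.match(spec, position)
        if match is None:
            raise ValueError(f"unexpected character at position {position} in {spec!r}")
        char, low, high = match.groups()
        # A bare letter counts once; {n} means exactly n; {min,max} gives a range
        min_count = int(low) if low is not None else 1
        max_count = int(high) if high is not None else min_count
        if max_count < min_count:
            raise ValueError(f"invalid range for {char!r} in {spec!r}")
        tokens.append((char, min_count, max_count))
        position = match.end()
    return tokens
```

With this sketch, parse_repeats("ACN{3,10}") yields one entry each for A and C and a third entry stating that N may occur between 3 and 10 times.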

Use parentheses to specify the part of the sequence that should be kept:

  • -a (...)ADAPTER (default)
  • -a (...ADAPTER) (default)
  • -a ADAPTER(...) (default)
  • -a (ADAPTER...) (??)

Or, specify the part that should be removed:

  • -a ...(ADAPTER...)
  • -a ...ADAPTER(...)
  • -a (ADAPTER)...

Available letters for command-line options

  • Lowercase letters: i, k, s, w
  • Uppercase letters: C, D, E, F, H, I, J, K, L, P, R, S, T, V, W
  • Deprecated, could be re-used: c, d, t
  • Planned/reserved: Q (paired-end quality trimming), V (alias for --version)
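
A list like the one above could be recomputed from an argparse parser instead of being maintained by hand. The sketch below is an assumption-laden illustration: unused_short_options is a hypothetical helper, it relies on the private argparse attribute _option_string_actions, and it cannot tell deprecated or reserved letters apart from ones that are actively used.

```python
import argparse
import string


def unused_short_options(parser):
    """Return (lowercase, uppercase) lists of letters not used as short options."""
    # Collect the letter of every registered single-dash, single-letter option.
    # Note: _option_string_actions is a private attribute of ArgumentParser.
    used = {
        option[1]
        for option in parser._option_string_actions
        if len(option) == 2 and option[0] == "-" and option[1] != "-"
    }
    lowercase = sorted(set(string.ascii_lowercase) - used)
    uppercase = sorted(set(string.ascii_uppercase) - used)
    return lowercase, uppercase
```

Passing the parser that defines the command-line interface would return the free letters; the automatically added -h option counts as used.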