Requirements
Optional: Apptainer (formerly Singularity)
CulebrONT is designed to run mainly on an HPC distributed cluster, but a local, single-machine installation is also possible.
Install CulebrONT PyPI package
First, install the CulebrONT python package with pip.
You can install the latest available version:
python3 -m pip install culebrONT
culebrONT --help
Optionally, you can specify a version, for example:
python3 -m pip install culebrONT==4.0.0
culebrONT --help
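If `culebrONT --help` reports "command not found" after a `pip install --user`, the console script usually sits in pip's per-user scripts directory, which may not be on your `PATH`. The sketch below is a hypothetical helper, not part of CulebrONT; it assumes the usual POSIX layout where that directory is `$(python3 -m site --user-base)/bin`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CulebrONT): check whether pip's user
# scripts directory, where "pip install --user" puts console scripts such
# as culebrONT, is on PATH. Assumes the POSIX layout <user-base>/bin.

user_scripts_dir() {
  # "python3 -m site --user-base" prints the per-user installation base
  printf '%s/bin\n' "$(python3 -m site --user-base)"
}

check_user_bin_on_path() {
  local dir
  dir="$(user_scripts_dir)"
  case ":$PATH:" in
    *":$dir:"*)
      echo "OK: $dir is on PATH"
      return 0
      ;;
    *)
      echo "Not on PATH; add to ~/.bashrc: export PATH=\"$dir:\$PATH\""
      return 1
      ;;
  esac
}
```

If `check_user_bin_on_path` returns 1, add the printed line to your `~/.bashrc` and open a new shell before retrying `culebrONT --help`.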
Then follow the rest of this documentation according to the mode you want: local or HPC.
Steps for installation
Install CulebrONT using the culebrONT install command:
Usage: culebrONT install [OPTIONS]
Run installation of tool for HPC cluster
Options:
-m, --mode [slurm|local] Mode for installation [default: slurm]
-e, --env [env-modules|apptainer]
Mode for tools dependencies for slurm: ['env-modules', 'apptainer'], local: ['env-modules', 'apptainer']
[default: apptainer]
--bash_completion / --no-bash_completion
Allow bash completion of culebrONT commands on the bashrc file [default: bash_completion]
Optionally (but recommended), after a local installation, you can check it using a dataset scaled for a single machine.
See the section Check install for details.
Install CulebrONT in local mode with culebrONT install -m local, or on an HPC cluster with culebrONT install -m slurm (the default mode).
If you install CulebrONT in cluster mode, you can choose between Apptainer and the environment modules usually available on HPC systems. In local mode, only Apptainer is available. Make this choice with the option -e, --env [env-modules|apptainer].
If tool dependencies are installed using Apptainer, an image is automatically downloaded and referenced by the pipeline's configuration files. A local-mode installation, without a scheduler, is restricted to this Apptainer image.
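The rules above (two modes, two dependency environments, and Apptainer only in local mode) can be captured in a small helper that prints the matching install command so you can review it before running it. This is a hypothetical sketch, not a CulebrONT feature; it only uses the --mode and --env values shown in the help output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CulebrONT): print the install command
# for a mode/env pair, enforcing "local mode uses apptainer only".
culebront_install_cmd() {
  local mode="${1:-slurm}" env="${2:-apptainer}"
  case "$mode" in
    slurm|local) ;;
    *) echo "error: unknown mode '$mode' (expected slurm or local)" >&2; return 1 ;;
  esac
  case "$env" in
    env-modules|apptainer) ;;
    *) echo "error: unknown env '$env' (expected env-modules or apptainer)" >&2; return 1 ;;
  esac
  if [ "$mode" = local ] && [ "$env" != apptainer ]; then
    echo "error: local mode only supports --env apptainer" >&2
    return 1
  fi
  printf 'culebrONT install --mode %s --env %s\n' "$mode" "$env"
}
```

For example, `culebront_install_cmd local apptainer` prints `culebrONT install --mode local --env apptainer`; with no arguments it falls back to the documented defaults (slurm, apptainer).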
Warning
The Apptainer image is downloaded into the CulebrONT package directory. Be careful: it needs approximately 3.5 GB of free space. If the package was installed from PyPI with the --user flag (without root), it is installed in your HOME.
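Before installing, you can check that the target filesystem has the roughly 3.5 GB the image needs. The function below is a hypothetical sketch, not part of CulebrONT; it assumes POSIX `df -Pk` output, where the fourth column of the second line is the available space in KiB.

```shell
#!/usr/bin/env bash
# Hypothetical pre-install check (not part of CulebrONT): does the
# filesystem holding DIR have at least MIN_GB gigabytes available?
# Uses POSIX "df -Pk" output, where column 4 is the available space in KiB.
has_free_space() {
  local dir="$1" min_gb="$2" avail_kb
  avail_kb="$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')"
  if [ -z "$avail_kb" ]; then
    echo "error: could not read free space for '$dir'" >&2
    return 2
  fi
  # awk handles a fractional threshold such as 3.5; exit 0 means "enough"
  awk -v avail="$avail_kb" -v min="$min_gb" \
    'BEGIN { exit !(avail >= min * 1024 * 1024) }'
}
```

For a --user install, for instance, `has_free_space "$HOME" 3.5` returns 0 when at least 3.5 GB are free.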
Steps for HPC distributed cluster installation
Once CulebrONT is installed on your cluster with culebrONT install -m slurm, as explained above, you need to configure the resources allocated to each tool. CulebrONT relies on Snakemake profiles to ease cluster installation and resource management, so please adapt a few files to your own system architecture.
1. Adapt profile and config.yaml
Once installed, CulebrONT provides default configuration files, which you can modify:
1. Adapt the pre-formatted Snakemake profile to configure your cluster options. See the section 1. Snakemake profiles for details.
2. Adapt the config.yaml file to manage cluster resources such as partition, memory and threads available for each job.
See the section 2. Adapting config.yaml for further details.
2. Adapt tools_path.yaml
As CulebrONT uses many tools, you must install them in one of the two following ways:
culebrONT install --help
culebrONT install --mode slurm --env env-modules
# OR
culebrONT install --mode slurm --env apptainer
If the --env apptainer argument is specified, CulebrONT downloads previously built Apptainer images containing the complete environment needed to run CulebrONT (tools and dependencies).
Adapt the tools_path.yaml file (YAML format) to tell CulebrONT where the different tools are installed on your cluster.
See the section 3. How to configure tools_path.yaml for details.
Check install
To test your CulebrONT installation, a test dataset called Data-Xoo-sub/ is available at https://itrop.ird.fr/culebront_utilities/.
test_install
The test_install command downloads a scaled test dataset, writes a configuration file adapted to it, and suggests a ready-to-run command line.
culebrONT test_install [OPTIONS]
Options
-d, --data_dir <data_dir>
Required. Path where the test data are downloaded and where the config.yaml used to run the test is created.
CulebrONT automatically downloads this dataset into the -d directory with:
culebrONT test_install -d test
In cluster mode, run the suggested command line (adapt it as needed) to perform the tests:
culebrONT run --configfile test/data_test_config.yaml
In local mode, type:
culebrONT run -t 8 -c test/data_test_config.yaml --apptainer-args "--bind $HOME"
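The cluster and local test commands differ only in a few options, so a small helper can print the right one for review before you launch it. This is a hypothetical sketch, not part of CulebrONT; it reuses only the options shown above (--configfile, -t, -c, --apptainer-args) and defaults to the test config path and the 8 threads from the examples.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CulebrONT): print the suggested test
# command for a given mode, using only the options shown above.
culebront_test_cmd() {
  local mode="$1"
  local config="${2:-test/data_test_config.yaml}"
  local threads="${3:-8}"
  case "$mode" in
    slurm)
      printf 'culebrONT run --configfile %s\n' "$config"
      ;;
    local)
      # Bind $HOME into the Apptainer container, as in the local example
      printf 'culebrONT run -t %s -c %s --apptainer-args "--bind %s"\n' \
        "$threads" "$config" "$HOME"
      ;;
    *)
      echo "error: mode must be 'slurm' or 'local'" >&2
      return 1
      ;;
  esac
}
```

On Linux, `culebront_test_cmd local test/data_test_config.yaml "$(nproc)"` would size -t to the machine's core count (`nproc` is from GNU coreutils).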
Advanced installation
1. Snakemake profiles
The Snakemake-Profiles project is an open effort to create configuration profiles for executing Snakemake in various computing environments (job scheduling systems such as Slurm or SGE, Grid middleware, or cloud computing). It is available at https://github.com/Snakemake-Profiles/doc.
To run CulebrONT on an HPC cluster, we take advantage of these profiles.
Here is an example of the Snakemake SLURM profile we used for the French national bioinformatics infrastructure at IFB.
More information about profiles can be found at https://github.com/Snakemake-Profiles/slurm#quickstart.
Preparing the profile’s config.yaml file
Once your basic profile is created, finalize it by modifying culebrONT/culebrONT/default_profile/config.yaml as needed to customize the Snakemake parameters used internally by CulebrONT:
restart-times: 0
jobscript: "slurm-jobscript.sh"
cluster: "slurm-submit.py"
cluster-status: "slurm-status.py"
max-jobs-per-second: 1
max-status-checks-per-second: 10
local-cores: 1
jobs: 200 # edit to limit the number of jobs submitted in parallel
latency-wait: 60000000
use-envmodules: true # set true for either env-modules or apptainer, never both
use-apptainer: false
rerun-incomplete: true
printshellcmds: true
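Since only one of use-envmodules and use-apptainer should be active, a quick check of the profile file can catch a misconfiguration before jobs are submitted. This is a hypothetical sketch, not part of CulebrONT; it assumes the keys start at the beginning of a line, as in the example above, and accepts both true and True.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of CulebrONT): in a profile config.yaml,
# exactly one of use-envmodules / use-apptainer should be true.
# Assumes the keys start at the beginning of a line, as in the example above.
check_profile_env() {
  local file="$1" envmod=0 apptainer=0
  if [ ! -r "$file" ]; then
    echo "error: cannot read '$file'" >&2
    return 2
  fi
  # -i accepts true/True; anything after the value (e.g. a comment) is ignored
  grep -Eiq '^use-envmodules:[[:space:]]*true' "$file" && envmod=1
  grep -Eiq '^use-apptainer:[[:space:]]*true' "$file" && apptainer=1
  if [ $((envmod + apptainer)) -ne 1 ]; then
    echo "error: set exactly one of use-envmodules / use-apptainer to true" >&2
    return 1
  fi
  echo "OK"
}
```

For example: `check_profile_env culebrONT/culebrONT/default_profile/config.yaml`.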
2. Adapting config.yaml
In the config.yaml file, you can manage HPC resources by choosing the partition, memory and threads used by default, or specifically for each rule/tool, depending on your HPC job scheduler.
This file generally belongs to a Snakemake profile (see above).
Warning
If more memory or threads are needed, adapt the content of this file before running on your cluster.
To access the config.yaml file, use:
culebrONT edit_profile
The edited config.yaml file is saved under /home/USER/.config/ in your home directory. A list of CulebrONT rule names can be found in the section Threading rules inside CulebrONT.
Warning
Some rules in config.yaml, such as rule_graph or run_get_versions, use wildcards by default; please don't remove them.
3. How to configure tools_path.yaml
Note
Regarding tool versions, users can choose which versions to use, with modules or with Apptainer. However, the pipeline was validated with specific versions (see the Apptainer .def recipe), so other versions may lead to errors due to parameter changes.
To access the tools_path.yaml file, use:
culebrONT edit_tools
The tools_path.yaml file contains two sections: APPTAINER and ENVMODULES. To fill it in correctly, you have two options:
1. Use only APPTAINER containers: in this case, fill in only this section with the path to the built Apptainer image you want to use. Absolute paths are strongly recommended. See the section How to build Apptainer images for further details.
APPTAINER:
TOOLS : 'INSTALL_PATH/containers/apptainer.culebront_tools.sif'
Warning
To ensure that APPTAINER containers are actually used, make sure the --use-apptainer flag is included in the Snakemake command line.
2. Use only ENVMODULES: in this case, fill in this section with the modules available on your cluster (here is an example):
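Because absolute paths are recommended for the image, it can help to verify the TOOLS entry before a run. The function below is a hypothetical sketch, not part of CulebrONT; it only checks that the path is absolute and points to a readable file, and does not validate the image contents.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of CulebrONT): the TOOLS image path in the
# APPTAINER section should be absolute and point to a readable file.
check_sif_path() {
  local sif="$1"
  case "$sif" in
    /*) ;;
    *)
      echo "error: '$sif' is not an absolute path" >&2
      return 1
      ;;
  esac
  if [ ! -f "$sif" ] || [ ! -r "$sif" ]; then
    echo "error: '$sif' is missing or not readable" >&2
    return 1
  fi
  echo "OK: $sif"
}
```

Call it with the same value you put in tools_path.yaml, with INSTALL_PATH replaced by your actual installation directory; a literal INSTALL_PATH/... is rejected as a relative path.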
R : "r"
QUARTO : "quarto"
QUAST : "quast"
MAUVE : "mauve"
SHASTA : "shasta"
ASSEMBLYTICS : "assemblytics"
MEDAKA : "medaka"
MERQURY: "merqury"
BLOBTOOLS : "blobtools"
Warning
Make sure to specify the --use-envmodules flag in the Snakemake command line for ENVMODULES to be used. More details can be found here: https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#using-environment-modules
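Before a run, you may want to confirm that every module named in the ENVMODULES section exists on your cluster. The sketch below is hypothetical, not part of CulebrONT: `list_env_modules` assumes entries are indented `KEY : value` lines under `ENVMODULES:` and that the section ends at the next unindented line, and `check_env_modules` assumes your site's `module` command supports `is-avail` (Lmod and recent Environment Modules releases do; check yours).

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of CulebrONT).
# list_env_modules FILE: print the module names declared under the
# ENVMODULES: section of tools_path.yaml. Assumes indented "KEY : value"
# entries and that the section ends at the next unindented line.
list_env_modules() {
  awk -v q="'" '
    /^ENVMODULES:/ { in_sec = 1; next }
    /^[^[:space:]#]/ { in_sec = 0 }
    in_sec && /:/ && !/^[[:space:]]*#/ {
      val = $0
      sub(/^[^:]*:[[:space:]]*/, "", val)   # drop the "KEY :" prefix
      gsub(/"/, "", val)                    # drop double quotes
      gsub(q, "", val)                      # drop single quotes
      sub(/[[:space:]]+$/, "", val)         # drop trailing spaces
      if (val != "") print val
    }
  ' "$1"
}

# check_env_modules FILE: report modules that "module is-avail" cannot find.
check_env_modules() {
  local m missing=0
  if ! type module >/dev/null 2>&1; then
    echo "error: 'module' command not found (Environment Modules/Lmod)" >&2
    return 2
  fi
  while IFS= read -r m; do
    if ! module is-avail "$m" </dev/null 2>/dev/null; then
      echo "missing module: $m"
      missing=1
    fi
  done < <(list_env_modules "$1")
  return "$missing"
}
```

With the example section above, `list_env_modules` would print r, quarto, quast and so on, one per line; `check_env_modules` then lists only the names your cluster does not provide.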
And more …
How to build Apptainer images
You can build your own image using the available .def recipes from the culebrONT/culebrONT/containers/ directory.
Warning
Be careful: you need root access to build Apptainer images.
cd culebrONT/culebrONT/containers/
sudo apptainer build apptainer.culebront_tools.sif apptainer.culebront_tools.def
Threading rules inside CulebrONT
The rule names used in the CulebrONT code are listed below. It is recommended to set threads with the Snakemake command line when running on a single machine, or in a cluster configuration file to manage cluster resources through the job scheduler. This list saves users a painful exploration of the CulebrONT Snakefiles.
run_flye
run_canu
run_minimap_for_miniasm
run_miniasm
run_minipolish
run_raven
convert_fastq_to_fasta
run_smartdenovo
run_shasta
run_circlator
tag_circular
tag_circular_to_minipolish
rotate_circular
run_fixstart
index_fasta_to_correction
run_minialign_to_medaka
run_medaka_train
run_medaka_consensus
run_pilon_first_round
run_pilon
run_racon
preparing_fasta_to_quality
run_quast
run_busco
run_diamond
run_minimap2
run_blobtools
run_mummer
run_assemblytics
run_mauve
run_bwa_mem2
run_flagstat
run_merqury
run_benchmark_time
stats_assembly
rule_graph
run_report_snakemake
run_flagstats_stats
run_busco_stats
copy_final_assemblies
report_by_sample
report_about_workflow
ipynb_convert_samples_qmd
ipynb_convert_qmd
edit_quarto
build_book
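On a single machine, per-rule thread counts could be passed with Snakemake's `--set-threads RULE=THREADS ...` option, using the rule names listed above. The helper below is a hypothetical sketch that only validates and formats the pairs; whether `culebrONT run` forwards this option to Snakemake is an assumption, so check `culebrONT run --help` first.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CulebrONT): turn RULE=THREADS pairs,
# using rule names from the list above, into Snakemake's --set-threads
# option. Whether "culebrONT run" forwards this option is an assumption.
set_threads_args() {
  local pair
  if [ "$#" -eq 0 ]; then
    echo "error: give at least one RULE=THREADS pair" >&2
    return 1
  fi
  for pair in "$@"; do
    # Rule names are identifiers; thread counts are positive integers
    if [[ ! $pair =~ ^[A-Za-z0-9_]+=[1-9][0-9]*$ ]]; then
      echo "error: expected RULE=THREADS, got '$pair'" >&2
      return 1
    fi
  done
  printf '%s' '--set-threads'
  printf ' %s' "$@"
  printf '\n'
}
```

For example, `set_threads_args run_flye=16 run_medaka_consensus=8` prints `--set-threads run_flye=16 run_medaka_consensus=8`, which you could append to the local run command if your version passes it through.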