Analysis pipelines with Python: Glossary

Key Points

Basic syntax
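  • Errors are there to help us: the last line of a traceback names the error type and message, and the lines above it show where the error happened.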

Scripts and imports
  • To run a Python program, use python3 program_name.py.
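A minimal sketch of such a script, assuming a hypothetical file program_name.py that adds up the numbers given on the command line. The `if __name__ == "__main__":` guard makes the bottom block run only when the file is executed directly, not when another script imports it.

```python
# program_name.py (hypothetical example script)
import sys


def total(values):
    """Convert command-line strings to floats and add them up."""
    return sum(float(v) for v in values)


def main(argv):
    # argv[0] is the script name; the numbers follow it.
    print(total(argv[1:]))


if __name__ == "__main__":
    main(sys.argv)
```

Running `python3 program_name.py 1 2 3.5` would print 6.5, and another script could `import program_name` to reuse `total()` without triggering `main()`.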

Numpy arrays and lists
  • Lists store a sequence of elements.

  • NumPy arrays allow vector (element-wise) math in Python.
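The difference shows up as soon as you do arithmetic: multiplying a list repeats it, while multiplying a NumPy array multiplies each element. A short illustration with arbitrary values:

```python
import numpy as np

values = [1, 2, 3]

# Multiplying a list repeats the sequence.
repeated = values * 2                 # [1, 2, 3, 1, 2, 3]

# Arithmetic on a NumPy array is applied to every element.
arr = np.array(values)
doubled = arr * 2                     # array([2, 4, 6])
total = arr + np.array([10, 20, 30])  # array([11, 22, 33])
```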

Storing data with dicts
  • Dicts provide key-value storage of information.
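For example, a dict can count how often each base appears in a DNA sequence, using the base as the key and its count as the value (the sequence here is made up):

```python
sequence = "GATTACA"

# Start every key at zero, then update the value stored under each key.
counts = {"A": 0, "C": 0, "G": 0, "T": 0}
for base in sequence:
    counts[base] += 1

print(counts["A"])  # look up one value by its key: 3
print(counts)       # {'A': 3, 'C': 1, 'G': 1, 'T': 2}
```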

Functions and Conditions
  • map() applies a function to every element of an iterable, such as a list.

  • filter() returns only the elements for which a function returns a true value.
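In Python 3, map() and filter() return lazy iterators, so wrap them in list() to see all the results at once:

```python
numbers = [1, 2, 3, 4, 5]

# map(): apply a function to every element.
squares = list(map(lambda x: x ** 2, numbers))       # [1, 4, 9, 16, 25]

# filter(): keep only the elements for which the function returns True.
evens = list(filter(lambda x: x % 2 == 0, numbers))  # [2, 4]
```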

Introduction to parallel computing
  • Pool.map() (from the multiprocessing module) applies a function to every element of an iterable in parallel, using a pool of worker processes.
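Pool.map() has the same shape as the built-in map(), but splits the work across worker processes. A minimal sketch; the worker count of 4 is an arbitrary choice, not a recommendation:

```python
from multiprocessing import Pool


def square(x):
    # Defined at module level so that worker processes can find it.
    return x * x


if __name__ == "__main__":
    # The guard stops worker processes from re-running this block when
    # they import the module (required on platforms that spawn workers).
    with Pool(processes=4) as pool:
        results = pool.map(square, range(10))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

The results come back in the same order as the inputs, even though the workers may finish them out of order.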

Introduction to Snakemake
  • Bash scripts are not an efficient way of storing a workflow.

  • Snakemake is one method of managing a complex computational workflow.

Snakefiles
  • Snakemake follows Python syntax.

  • Rules can have inputs and/or outputs, and a command to be run.
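Declaring inputs and outputs is what lets a workflow tool skip work that is already done, something a plain Bash script cannot do. The sketch below shows the classic make-style timestamp check in plain Python; `needs_update` is a hypothetical helper, and Snakemake's own rerun logic considers more than file timestamps.

```python
import os


def needs_update(inputs, outputs):
    """Return True if a rule with these files would have to run."""
    # Assumption for this sketch: a rule with no declared outputs always runs.
    if not outputs:
        return True
    # Any missing output means the rule must run.
    if not all(os.path.exists(path) for path in outputs):
        return True
    # With no inputs, existing outputs are treated as up to date.
    if not inputs:
        return False
    # Rerun if any input is newer than the oldest output.
    newest_input = max(os.path.getmtime(path) for path in inputs)
    oldest_output = min(os.path.getmtime(path) for path in outputs)
    return newest_input > oldest_output
```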

Wildcards
  • Use {output} to refer to the output of the current rule.

  • Use {input} to refer to the dependencies of the current rule.

  • You can use Python indexing to retrieve individual outputs and inputs (example: {input[0]}).

  • Wildcards can be named (example: {input.file1}).
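These placeholders behave much like Python's str.format(). The sketch below imitates the substitution in plain Python: `FileList` is a hypothetical stand-in for Snakemake's own input and output objects (a list whose entries can also be reached by name), and the file names and commands are made up.

```python
class FileList(list):
    """A list of file names whose entries can also be reached by name."""

    def __init__(self, *files, **named_files):
        super().__init__(list(files) + list(named_files.values()))
        # Expose each named file as an attribute, e.g. .file1
        for name, path in named_files.items():
            setattr(self, name, path)

    def __str__(self):
        # {input} on its own expands to every file, separated by spaces.
        return " ".join(self)


inputs = FileList(file1="books/isles.txt", file2="books/abyss.txt")
outputs = FileList("results/counts.dat")

print("cat {input} > {output}".format(input=inputs, output=outputs))
# cat books/isles.txt books/abyss.txt > results/counts.dat
print("wc -l {input[0]}".format(input=inputs))     # wc -l books/isles.txt
print("head {input.file2}".format(input=inputs))  # head books/abyss.txt
```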

Pattern Rules
  • Use any named wildcard ({some_name}) as a placeholder in targets and dependencies.
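Conceptually, a named wildcard works by matching the requested file name against the target pattern and reusing whatever the placeholder matched to fill in the dependencies. A plain-Python sketch of that idea; the helper names and the dats/ and books/ layout are hypothetical, and Snakemake's real matching handles more cases (such as wildcard constraints).

```python
import re


def pattern_to_regex(pattern):
    """Turn a pattern like 'dats/{book}.dat' into a regex with named groups."""
    # re.split with a capturing group puts the wildcard names at odd indices.
    pieces = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, piece in enumerate(pieces):
        if i % 2 == 0:
            regex += re.escape(piece)    # literal text
        else:
            regex += f"(?P<{piece}>.+)"  # a named wildcard
    return "^" + regex + "$"


def resolve_dependency(target, target_pattern, dependency_pattern):
    """Fill a dependency pattern using the wildcard values found in target."""
    match = re.match(pattern_to_regex(target_pattern), target)
    if match is None:
        raise ValueError(f"{target!r} does not match {target_pattern!r}")
    return dependency_pattern.format(**match.groupdict())


print(resolve_dependency("dats/isles.dat", "dats/{book}.dat", "books/{book}.txt"))
# books/isles.txt
```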

Snakefiles are Python code
  • Snakefiles are Python code.

  • The entire Snakefile is executed whenever you run snakemake.

  • All actual work should be done by rules.
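Because the whole Snakefile runs on every invocation, the top level is a good place for cheap Python such as building a list of targets, while anything slow belongs in a rule. A plain-Python sketch of that kind of top-level code; `dat_targets` and the books/ and dats/ directory layout are hypothetical.

```python
import glob
import os


def dat_targets(book_dir="books"):
    """Return one hypothetical dats/<name>.dat target per <book_dir>/<name>.txt file."""
    paths = sorted(glob.glob(os.path.join(book_dir, "*.txt")))
    names = [os.path.splitext(os.path.basename(path))[0] for path in paths]
    return [f"dats/{name}.dat" for name in names]


# In a Snakefile, a top-level line like this would run every time snakemake starts.
DATS = dat_targets()
```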

Resources and parallelism
  • Use threads to indicate the number of cores used by a rule.

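  • Resources are arbitrary names and values that you choose, so they can be used to limit anything (for example, memory or GPUs).

  • The && operator chains bash commands so that each command runs only if the previous one succeeded.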
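A failing step therefore stops the whole chain. The sketch below demonstrates this from Python with the standard subprocess module; `run_chain` is a hypothetical helper and assumes a POSIX shell that provides the `echo` and `false` commands.

```python
import subprocess


def run_chain(commands):
    """Join shell commands with && and return (exit status, captured stdout)."""
    line = " && ".join(commands)
    result = subprocess.run(line, shell=True, capture_output=True, text=True)
    return result.returncode, result.stdout


print(run_chain(["echo first", "echo second"]))    # (0, 'first\nsecond\n')
print(run_chain(["false", "echo never printed"]))  # (1, '')
```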

Scaling a pipeline across a cluster
  • Snakemake generates and submits its own batch scripts for your scheduler.

  • localrules lists rules that run on the head node, where Snakemake itself is running, instead of being submitted to the cluster.

  • $PATH must be passed to Snakemake rules.

  • nohup <command> & prevents <command> from exiting when you log off.

Final notes
  • Token files can be used to take the place of output files if none are created.

  • snakemake --dag | dot -Tsvg > dag.svg creates a graphic of your workflow.

  • snakemake --gui opens a browser window with your workflow.
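The token-file idea can be sketched in plain Python: after a step that writes no file of its own, an empty token file is created so there is something to declare as the rule's output. `run_with_token` and the paths used with it are hypothetical.

```python
from pathlib import Path


def run_with_token(action, token_path):
    """Run action(), then create an empty token file marking it as done."""
    action()
    token = Path(token_path)
    token.parent.mkdir(parents=True, exist_ok=True)
    # touch() creates the file, or updates its timestamp if it already exists.
    token.touch()
    return token
```

Calling `run_with_token(my_step, "tokens/my_step.done")` would leave behind an empty file whose timestamp records when the step last ran.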
