OCaml linting tools and techniques

Recently (but also 3 years ago), I was interested in finding all catch-all exception handlers in Goblint, which is written in OCaml, in order to prevent “uncatchable” exceptions from being caught and accidentally swallowed. “Uncatchable” exceptions are those which should not be ignored, e.g. Out_of_memory. My first attempt used a Semgrep rule, but it turned out to be too buggy to do the job reliably. Therefore, I sought out code linters for OCaml.
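To make the target concrete, here is a minimal stdlib-only sketch of the kind of handler in question. The function names are hypothetical and not taken from Goblint, and `Out_of_memory` is raised explicitly to simulate the real condition. A variable pattern such as `with e ->` that does not re-raise is just as much a catch-all as `_`.

```ocaml
(* Hypothetical helper with a catch-all handler: meant to turn a
   malformed number into a default, but [_] also catches exceptions
   that should propagate, such as Out_of_memory. *)
let parse_or_default (f : unit -> int) : int =
  try f () with _ -> 0

(* The same helper with a narrow handler: only the expected Failure is
   caught, everything else escapes to the caller. *)
let parse_or_default_narrow (f : unit -> int) : int =
  try f () with Failure _ -> 0

let () =
  (* Both versions handle the intended case: int_of_string raises Failure. *)
  Printf.printf "%d %d\n"
    (parse_or_default (fun () -> int_of_string "oops"))
    (parse_or_default_narrow (fun () -> int_of_string "oops"));
  (* Only the catch-all silently swallows a (simulated) Out_of_memory. *)
  Printf.printf "%d\n" (parse_or_default (fun () -> raise Out_of_memory))
```

Calling `parse_or_default_narrow (fun () -> raise Out_of_memory)` instead lets the exception escape, which is the behaviour a linter should push code towards.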

Tools

The following table summarizes all OCaml linting tools I managed to find: active or dead, general or special-purpose, standalone or Ppx, monolithic or modular. In this post I focus on linting (based on syntax and possibly types) and exclude program analyzers like reanalyze and Salto. Ocamllint and ocp-lint are the most universal attempts at OCaml linting, but both are long dead and no replacement seems to have emerged.

| Tool | Status | Use case | Mode | Structure |
|---|---|---|---|---|
| ocamllint | Archived | General | Ppx | Monolithic |
| ocp-lint/typerex-lint | Inactive | General | Standalone | Modular & extensible |
| camelot | Semiactive | General/teaching | Standalone | Modular |
| zanuda | Active | General | Standalone | Modular |
| bene-gesselint | Inactive | Framework | Ppxlib | Modular & extensible |
| ppx_js_style | Active | Company | Ppxlib | Monolithic |
| base ppx_base_lint | Active | Project | Ppxlib | Monolithic |
| mina ppx_version | Active | Project | Ppxlib | Monolithic |
| less-power ast-check | Active | Teaching | Standalone/Ppxlib | Monolithic |

There are two general execution modes:

  1. Ppx: OCaml AST preprocessors executed by the build system, just like other Ppx rewriters such as those behind @@deriving attributes. These are relatively easy to integrate into modern dune-based workflows.
  2. Standalone: tools executed outside of the usual compilation process. These don’t integrate into modern dune-based workflows because dune has very limited linting support.

There are three general structures to these linters:

  1. Monolithic, where new rules have to be implemented intertwined with the existing ones. This is reasonable for special-purpose linters and allows all checks to be performed in a single AST pass.
  2. Modular (but not extensible), where rules are implemented independently of each other but form a fixed ruleset. Such rules can be more difficult to combine into a single AST pass, which may mean multiple passes in some cases. They are non-extensible because new rules must be integrated into the core tool itself.
  3. Modular and extensible, which has the benefits of the previous structure but also allows custom rules to be added without modifying the tool itself. Thus, such tools feature some sort of plugin system.
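The modular structure can be illustrated with a toy model. This is a sketch under simplifying assumptions: the `expr` type is a hypothetical miniature AST rather than OCaml’s Parsetree, and the `rule` record is not the plugin interface of any tool from the table. Each rule is written independently, yet `lint` still applies all of them in a single traversal; because the rule list is an argument, adding a rule requires no change to `lint` itself.

```ocaml
(* A hypothetical miniature AST standing in for the real Parsetree. *)
type expr =
  | Int of int
  | Ident of string
  | Apply of expr * expr list
  | Try of expr * (string option * expr) list
      (* handler cases; [None] stands for the wildcard pattern [_] *)

(* A rule inspects a single node and returns zero or more messages. *)
type rule = { name : string; check : expr -> string list }

let catch_all_rule =
  { name = "catch-all";
    check =
      (function
        | Try (_, cases) when List.exists (fun (p, _) -> p = None) cases ->
          [ "wildcard exception handler" ]
        | _ -> []) }

let no_obj_magic_rule =
  { name = "obj-magic";
    check = (function Ident "Obj.magic" -> [ "use of Obj.magic" ] | _ -> []) }

(* One pre-order traversal applies every rule at every node. *)
let lint (rules : rule list) (e : expr) : (string * string) list =
  let rec go acc e =
    let acc =
      List.fold_left
        (fun acc r ->
          List.fold_left (fun acc msg -> (r.name, msg) :: acc) acc (r.check e))
        acc rules
    in
    match e with
    | Int _ | Ident _ -> acc
    | Apply (f, args) -> List.fold_left go (go acc f) args
    | Try (body, cases) ->
      List.fold_left (fun acc (_, handler) -> go acc handler) (go acc body) cases
  in
  List.rev (go [] e)
```

For example, `lint [ catch_all_rule; no_obj_magic_rule ] (Try (Apply (Ident "Obj.magic", [ Int 1 ]), [ (None, Int 0) ]))` reports both findings, the handler first because a node is visited before its children.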

Non-Ppxlib tools

The following table provides more details about the non-Ppxlib tools. Notably, some of them support type information in rules, which allows more expressive and accurate checks but also means that they cannot be part of the usual Ppx preprocessing step, since Ppx rewriters run on the untyped Parsetree before type checking.

| Tool | Parsetree traversal | Type support | Typedtree traversal |
|---|---|---|---|
| ocamllint | Ast_mapper | No | - |
| ocp-lint/typerex-lint | Recursion | Yes | TypedtreeIter |
| camelot | Copy of Ast_iterator | No | - |
| zanuda | Ast_iterator | Yes | Tast_iterator |
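A standalone checker in the style of the Ast_iterator-based tools can be sketched with the compiler’s own compiler-libs. This is a minimal sketch, not code from camelot or zanuda; the file name and message are made up. It parses each file given on the command line and reports every guard-less `_` case of a `try ... with`. It deliberately recognizes only `_`, so `with e ->` or `match ... with exception _ ->` would need extra cases, and being Parsetree-only it cannot use types the way a Tast_iterator pass could.

```ocaml
(* Build against compiler-libs, e.g.
   ocamlfind ocamlopt -package compiler-libs.common -linkpkg catchall.ml *)
open Parsetree

(* A case is treated as a catch-all if its pattern is [_] with no guard. *)
let is_catch_all (c : case) : bool =
  match c.pc_lhs.ppat_desc, c.pc_guard with
  | Ppat_any, None -> true
  | _ -> false

(* Collect the locations of all catch-all [try ... with] cases. *)
let find_catch_alls (str : structure) : Location.t list =
  let found = ref [] in
  let check_expr (self : Ast_iterator.iterator) (e : expression) =
    (match e.pexp_desc with
     | Pexp_try (_, cases) ->
       List.iter
         (fun c -> if is_catch_all c then found := c.pc_lhs.ppat_loc :: !found)
         cases
     | _ -> ());
    (* keep descending into subexpressions, including nested trys *)
    Ast_iterator.default_iterator.Ast_iterator.expr self e
  in
  let iterator = { Ast_iterator.default_iterator with Ast_iterator.expr = check_expr } in
  iterator.Ast_iterator.structure iterator str;
  List.rev !found

let () =
  (* Lint every file passed on the command line (argv.(0) is the program). *)
  Array.iteri
    (fun i path ->
      if i > 0 then begin
        let ic = open_in path in
        let lexbuf = Lexing.from_channel ic in
        Location.init lexbuf path;
        let str = Parse.implementation lexbuf in
        close_in ic;
        List.iter
          (fun loc ->
            Format.printf "%a: catch-all exception handler@." Location.print_loc loc)
          (find_catch_alls str)
      end)
    Sys.argv
```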

Ppxlib tools

The following table provides more details about the Ppxlib-based tools. All of these integrate with dune in one way or another. In some cases, different parts of the same linter from the first table work by slightly different means.

| Tool | Dune integration | Ppxlib phase | Traversal | Output |
|---|---|---|---|---|
| bene-gesselint | (lint) | ~impl | iter & Ast_pattern | register_correction |
| ppx_js_style (enforce_cold) | (preprocess) | ~lint_impl | fold | Lint_error |
| ppx_js_style (other) | (preprocess) | ~impl | iter | raise_errorf |
| base ppx_base_lint | (preprocess) | ~impl | iter | raise_errorf |
| mina ppx_version (lint_primitive_uses) | (preprocess) | ~lint_impl | fold/iter | raise_errorf |
| mina ppx_version (lint_version_syntax) | (preprocess) | ~lint_impl | fold | Lint_error/eprintf |
| less-power ast-check | (preprocess) | ~impl | map_with_context | error_extensionf |
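Several of the Ppxlib-based linters in the table combine (preprocess), ~impl, an iter traversal and raise_errorf. A minimal sketch of that combination for the catch-all check follows. The transformation name `catchall_lint_raise` is hypothetical, and the code assumes it is built as a dune library with `(kind ppx_rewriter)` and `(libraries ppxlib)`. Because `Location.raise_errorf` raises, the build stops at the first finding.

```ocaml
open Ppxlib

(* A case is treated as a catch-all if its pattern is [_] with no guard. *)
let is_catch_all (c : case) : bool =
  match c.pc_lhs.ppat_desc, c.pc_guard with
  | Ppat_any, None -> true
  | _ -> false

(* Walk the whole structure; abort with a located error at the first hit. *)
let checker =
  object
    inherit Ast_traverse.iter as super

    method! expression e =
      (match e.pexp_desc with
       | Pexp_try (_, cases) ->
         List.iter
           (fun c ->
             if is_catch_all c then
               Location.raise_errorf ~loc:c.pc_lhs.ppat_loc
                 "catch-all exception handler")
           cases
       | _ -> ());
      super#expression e
  end

(* ~impl must return a structure; a linter returns its input unchanged. *)
let impl (str : structure) : structure =
  checker#structure str;
  str

let () = Driver.register_transformation ~impl "catchall_lint_raise"
```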

There are two ways to integrate a linter with dune:

  1. (preprocess) stanza, which is the usual way to add Ppx preprocessors to the build of a library/executable. This runs unconditionally during the normal build process.
  2. (lint) stanza, which has similar syntax but is undocumented. It doesn’t run during a normal dune build; instead, dune build @lint has to be run explicitly, which is very easy to forget. A rare example of its use exists in dune’s test suite.

Either way, a major inconvenience is that the linter has to be added to every dune library and executable. There is currently no way to define project-wide linters, which is error-prone: one may simply forget to add the linter to a new library.

There are two main Ppxlib phases used for such linters:

  1. ~impl (or ~intf), which is usually used for defining AST transformations. A linter, however, doesn’t actually transform the program but just outputs warnings during such a pass.
  2. ~lint_impl (or ~lint_intf), which runs before any transformations take place. In fact, this phase cannot transform the AST at all; it can only return a list of Lint_errors.
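The ~lint_impl phase can be sketched with an `Ast_traverse.fold` that accumulates one `Lint_error.t` per finding, so a single run can report several warnings. As before, this is a hypothetical rewriter (named `catchall_lint` here) built against ppxlib, not code from any of the listed tools.

```ocaml
open Ppxlib

(* A case is treated as a catch-all if its pattern is [_] with no guard. *)
let is_catch_all (c : case) : bool =
  match c.pc_lhs.ppat_desc, c.pc_guard with
  | Ppat_any, None -> true
  | _ -> false

(* Fold over the AST, prepending one Lint_error per catch-all case. *)
let collector =
  object
    inherit [Lint_error.t list] Ast_traverse.fold as super

    method! expression e acc =
      let acc =
        match e.pexp_desc with
        | Pexp_try (_, cases) ->
          List.fold_left
            (fun acc c ->
              if is_catch_all c then
                Lint_error.of_string c.pc_lhs.ppat_loc "catch-all exception handler"
                :: acc
              else acc)
            acc cases
        | _ -> acc
      in
      super#expression e acc
  end

(* ~lint_impl cannot change the AST; it only returns the warnings. *)
let lint_impl (str : structure) : Lint_error.t list =
  List.rev (collector#structure str [])

let () = Driver.register_transformation ~lint_impl "catchall_lint"
```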

There are various ways in Ppxlib to traverse the AST, and each linter uses one based on what needs to be returned from the phase and how the output is produced. Note that Ppxlib’s context-free (rewriting) rules aren’t suitable for linting as-is: they can only match extension nodes, special functions, custom constants and attribute-annotated nodes. In particular, arbitrary Ast_pattern-based matching is not offered by Ppxlib. This is what bene-gesselint tries to provide as a thin wrapper; however, it doesn’t neatly combine multiple Ast_pattern-matching rules into a single AST pass.
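Ast_pattern itself can still be applied by hand inside an ordinary traversal, which is roughly what such a wrapper automates. In the sketch below (hypothetical names, ppxlib assumed as before), an Ast_pattern recognizes guard-less `_` cases of `try ... with`, and the findings are reported from ~lint_impl with plain eprintf, returning no Lint_errors, i.e. the ad hoc output option listed below.

```ocaml
open Ppxlib

(* Ast_pattern for a guard-less [_ -> rhs] case; [__] captures the rhs,
   which this check ignores. *)
let catch_all_case : (case, expression -> bool, bool) Ast_pattern.t =
  Ast_pattern.(case ~lhs:ppat_any ~guard:none ~rhs:__)

(* Match one case against the pattern; a failed match yields false. *)
let is_catch_all (c : case) : bool =
  Ast_pattern.parse catch_all_case c.pc_lhs.ppat_loc
    ~on_error:(fun () -> false) c (fun _rhs -> true)

(* Only handlers of [try] are checked: [_] in a plain match is fine. *)
let find_catch_alls (str : structure) : Location.t list =
  let found = ref [] in
  let finder =
    object
      inherit Ast_traverse.iter as super

      method! expression e =
        (match e.pexp_desc with
         | Pexp_try (_, cases) ->
           List.iter
             (fun c -> if is_catch_all c then found := c.pc_lhs.ppat_loc :: !found)
             cases
         | _ -> ());
        super#expression e
    end
  in
  finder#structure str;
  List.rev !found

(* Ad hoc output: print to stderr, report nothing through Lint_error. *)
let lint_impl (str : structure) : Lint_error.t list =
  List.iter
    (fun (loc : Location.t) ->
      Printf.eprintf "File %S, line %d: catch-all exception handler\n%!"
        loc.loc_start.pos_fname loc.loc_start.pos_lnum)
    (find_catch_alls str);
  []

let () = Driver.register_transformation ~lint_impl "catchall_lint_eprintf"
```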

There are five means of output for Ppxlib-based linters:

  1. Driver.register_correction, which proposes a code change that can be promoted using dune. This only works during (lint) and must propose a change, so it cannot simply produce a warning.
  2. Lint_error.of_string, which yields a preprocessor warning. These can only be returned from ~lint_impl, but many warnings can be returned from a single run.
  3. Location.raise_errorf, which crashes the preprocessor with an error. Hence, multiple errors cannot be produced from a single linter run. Ppxlib also discourages the use of exceptions for error handling.
  4. Location.error_extensionf, which creates a special error extension node to be put into the AST. Hence, this requires a map traversal, but also allows multiple errors to be returned. Ppxlib recommends this for error handling, at least for usual Ppxlib expanders, derivers and transformers. However, it seems to me that the OCaml compiler will still only print the error from the first error extension node.
  5. eprintf, which simply prints to standard error and is very ad hoc.
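Of these outputs, Location.error_extensionf is the one Ppxlib recommends. A sketch using it from ~impl with an `Ast_traverse.map` follows: the right-hand side of each catch-all handler is replaced by an error extension node, so every finding is embedded in the AST in a single pass (whether the compiler then prints all of them is the caveat noted above). The rewriter name is again hypothetical.

```ocaml
open Ppxlib

(* A case is treated as a catch-all if its pattern is [_] with no guard. *)
let is_catch_all (c : case) : bool =
  match c.pc_lhs.ppat_desc, c.pc_guard with
  | Ppat_any, None -> true
  | _ -> false

let marker =
  object
    inherit Ast_traverse.map as super

    method! expression e =
      (* Rewrite children first, so nested handlers are marked too. *)
      let e = super#expression e in
      match e.pexp_desc with
      | Pexp_try (body, cases) ->
        let mark c =
          if is_catch_all c then
            let loc = c.pc_lhs.ppat_loc in
            (* Note: the original rhs (and anything marked inside it) is dropped. *)
            { c with
              pc_rhs =
                Ast_builder.Default.pexp_extension ~loc
                  (Location.error_extensionf ~loc "catch-all exception handler") }
          else c
        in
        { e with pexp_desc = Pexp_try (body, List.map mark cases) }
      | _ -> e
  end

let impl (str : structure) : structure = marker#structure str

let () = Driver.register_transformation ~impl "catchall_lint_ext"
```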

Ppxlib techniques

Many combinations of dune integration, Ppxlib phase, traversal and output exist, but not all of them are compatible and sensible. Worse yet, some simply don’t even work, either silently or loudly. The following table gives an overview of the reasonable combinations and of those to avoid.

| Dune integration | Ppxlib phase | Traversal | Output | Comment |
|---|---|---|---|---|
| (lint) | ~lint_impl | fold | Lint_error.of_string | Doesn’t work (no output) |
| (lint) | ~impl | iter | Driver.register_correction | Dune-promotable changes |
| (preprocess) | ~lint_impl | fold | Lint_error.of_string | Multiple preprocessor warnings |
| (preprocess) | ~lint_impl | iter | Location.raise_errorf | Single error |
| (preprocess) | ~lint_impl | iter | eprintf | Multiple non-standard warnings |
| (preprocess) | ~impl | map | Location.error_extensionf | Multiple errors |
| (preprocess) | ~impl | iter | Location.raise_errorf | Single error |
| (preprocess) | any | any | Driver.register_correction | Doesn’t work (promotable changes not allowed) |
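For the (lint) row, an ~impl iterator that proposes promotable fixes through `Driver.register_correction` can be sketched as follows. The replacement policy is purely hypothetical: it inserts a case re-raising Out_of_memory and Stack_overflow in front of the wildcard. Per the table, this only has an effect when the rewriter is attached through dune’s (lint) field and dune build @lint is run, after which the proposed change can be promoted.

```ocaml
open Ppxlib

(* A case is treated as a catch-all if its pattern is [_] with no guard. *)
let is_catch_all (c : case) : bool =
  match c.pc_lhs.ppat_desc, c.pc_guard with
  | Ppat_any, None -> true
  | _ -> false

(* Hypothetical fix: the text that replaces the [_] pattern, turning
   [with _ -> rhs] into [with (Out_of_memory | Stack_overflow) as e ->
   raise e | _ -> rhs]. *)
let replacement = "(Out_of_memory | Stack_overflow) as e -> raise e | _"

let proposer =
  object
    inherit Ast_traverse.iter as super

    method! expression e =
      (match e.pexp_desc with
       | Pexp_try (_, cases) ->
         List.iter
           (fun c ->
             if is_catch_all c then
               Driver.register_correction ~loc:c.pc_lhs.ppat_loc ~repl:replacement)
           cases
       | _ -> ());
      super#expression e
  end

(* Corrections are side effects; the AST itself is returned unchanged. *)
let impl (str : structure) : structure =
  proposer#structure str;
  str

let () = Driver.register_transformation ~impl "catchall_fix"
```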

This GitHub repository includes examples of all of these setups in the corresponding subdirectories. See the Cram test run.t files for example outputs.