OCaml linting tools and techniques
Recently (but also 3 years ago), I was interested in finding all catch-all exception handlers in Goblint, which is written in OCaml, in order to prevent “uncatchable” exceptions from being caught and accidentally swallowed. “Uncatchable” exceptions are those which should not be ignored, e.g. Out_of_memory. My first attempt was using a Semgrep rule, but it turned out to be too buggy to reliably do the job. Therefore, I sought out code linters for OCaml.
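To make the problem concrete, here is a minimal sketch of the pattern in question, using only the standard library. The function names risky_default and safe_default are made up for illustration and do not come from Goblint.

```ocaml
(* Catch-all handler: swallows every exception, including ones that
   signal an unrecoverable state such as Out_of_memory. *)
let risky_default f default =
  try f () with _ -> default

(* Specific handler: only the expected exception is handled,
   anything else propagates to the caller. *)
let safe_default f default =
  try f () with Not_found -> default

let () =
  let boom () = raise Out_of_memory in
  (* The catch-all version silently turns Out_of_memory into the default. *)
  assert (risky_default boom 0 = 0);
  (* The specific handler lets Out_of_memory escape. *)
  assert (
    match safe_default boom 0 with
    | _ -> false
    | exception Out_of_memory -> true)
```

A linter rule for this only needs to find try expressions whose handler pattern is a bare wildcard, which is a purely syntactic check.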
Tools
The following table summarizes all OCaml linting tools I managed to find: active or dead, general or special-purpose, standalone or Ppx, monolithic or modular. In this post I focus on linting (based on syntax and possibly types) and exclude program analyzers like reanalyze and Salto. Ocamllint and ocp-lint are the most universal attempts at OCaml linting; however, they’re long dead and no replacement seems to have emerged.
Tool | Status | Use case | Mode | Structure |
---|---|---|---|---|
ocamllint | Archived | General | Ppx | Monolithic |
ocp-lint/typerex-lint | Inactive | General | Standalone | Modular & extensible |
camelot | Semiactive | General/teaching | Standalone | Modular |
zanuda | Active | General | Standalone | Modular |
bene-gesselint | Inactive | Framework | Ppxlib | Modular & extensible |
ppx_js_style | Active | Company | Ppxlib | Monolithic |
base ppx_base_lint | Active | Project | Ppxlib | Monolithic |
mina ppx_version | Active | Project | Ppxlib | Monolithic |
less-power ast-check | Active | Teaching | Standalone/Ppxlib | Monolithic |
There are two general execution modes:
- Ppx, which are OCaml AST preprocessors, executed by the build system similarly to other Ppx-es like @@deriving features. These are relatively easy to integrate into modern dune-based workflows.
- Standalone, which are to be executed outside of the usual compilation process. These don’t integrate into modern dune-based workflows due to very limited linting support in dune.
There are three general structures to these linters:
- Monolithic, where new rules would have to be implemented intertwined with already existing rules. This is reasonable for special-purpose linters and allows all checks to be performed in a single AST pass.
- Modular (but not extensible), where rules are implemented independently from others but form a fixed ruleset. These can be more difficult to combine into a single AST pass and might mean multiple passes in some cases. They are non-extensible because new rules must be integrated into the core tool itself.
- Modular and extensible, which has the benefits from the previous point, but also allows custom rules to be added without modifying the tool itself. Thus, they feature some sort of plugin system.
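The difference between these structures can be sketched with a toy model. This is not how any of the listed tools is implemented: the expr type stands in for the real OCaml AST, and rule, registry, register and lint are hypothetical names chosen for this example.

```ocaml
(* Toy AST standing in for OCaml's Parsetree; all names are hypothetical. *)
type expr =
  | Var of string
  | Try of expr * handler list

and handler =
  | Catch_all of expr (* with _ -> ... *)
  | Catch of string * expr (* with Some_exception -> ... *)

(* A rule is independent of all others: a name plus its own check. *)
type rule = { name : string; check : expr -> string list }

(* The registry is the plugin point: rules are added without editing [lint]. *)
let registry : rule list ref = ref []
let register rule = registry := !registry @ [ rule ]

(* Each rule traverses the AST itself, so n rules mean n passes. *)
let lint e =
  List.concat_map
    (fun r -> List.map (fun msg -> r.name ^ ": " ^ msg) (r.check e))
    !registry

(* A custom rule, defined outside of the "core" above. *)
let rec find_catch_alls = function
  | Var _ -> []
  | Try (body, handlers) ->
    find_catch_alls body
    @ List.concat_map
        (function
          | Catch_all h -> "catch-all handler" :: find_catch_alls h
          | Catch (_, h) -> find_catch_alls h)
        handlers

let () = register { name = "catch_all"; check = find_catch_alls }

let () =
  let program = Try (Var "f", [ Catch_all (Var "default") ]) in
  List.iter print_endline (lint program)
```

A monolithic linter would instead hard-code all checks into one traversal function, which gives a single pass but means every new rule touches that function.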
Non-Ppxlib tools
The following table provides more details about the non-Ppxlib tools. Notably, some support type information in rules, which allows more expressive and accurate checks, but also means that they cannot be part of the usual Ppx preprocessing step, which only sees the untyped Parsetree.
Tool | Parsetree traversal | Type support | Typedtree traversal |
---|---|---|---|
ocamllint | Ast_mapper | No | - |
ocp-lint/typerex-lint | Recursion | Yes | TypedtreeIter |
camelot | Copy of Ast_iterator | No | - |
zanuda | Ast_iterator | Yes | Tast_iterator |
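For reference, a Parsetree traversal with the compiler's own Ast_iterator might look roughly like the following. It is a sketch under the assumption that the code is linked against compiler-libs.common; it is not taken from any of the tools above, and find_catch_alls is a name invented here.

```ocaml
(* Assumes linking against compiler-libs.common. *)
open Parsetree

(* Collect the locations of all unguarded wildcard handlers in try expressions. *)
let find_catch_alls (str : structure) : Location.t list =
  let open Ast_iterator in
  let found = ref [] in
  let expr self (e : expression) =
    (match e.pexp_desc with
     | Pexp_try (_, cases) ->
       List.iter
         (fun c ->
           match c.pc_lhs.ppat_desc, c.pc_guard with
           | Ppat_any, None -> found := c.pc_lhs.ppat_loc :: !found
           | _ -> ())
         cases
     | _ -> ());
    (* Continue into subexpressions using the default traversal. *)
    default_iterator.expr self e
  in
  let iterator = { default_iterator with expr } in
  iterator.structure iterator str;
  List.rev !found

let () =
  let source = "let f g = try g () with _ -> ()" in
  let str = Parse.implementation (Lexing.from_string source) in
  Printf.printf "%d catch-all handler(s)\n" (List.length (find_catch_alls str))
```

A typed check would use Tast_iterator over the Typedtree in the same open-recursion style, but that requires type-checking the file first, which is why such tools run standalone.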
Ppxlib tools
The following table provides more details about the Ppxlib-based tools. All of these integrate with dune in one way or another. In some cases, different parts of the same linter from the first table work by slightly different means.
Tool | Dune integration | Ppxlib phase | Traversal | Output |
---|---|---|---|---|
bene-gesselint | (lint) | ~impl | iter & Ast_pattern | register_correction |
ppx_js_style (enforce_cold) | (preprocess) | ~lint_impl | fold | Lint_error |
ppx_js_style (other) | (preprocess) | ~impl | iter | raise_errorf |
base ppx_base_lint | (preprocess) | ~impl | iter | raise_errorf |
mina ppx_version (lint_primitive_uses) | (preprocess) | ~lint_impl | fold/iter | raise_errorf |
mina ppx_version (lint_version_syntax) | (preprocess) | ~lint_impl | fold | Lint_error/eprintf |
less-power ast-check | (preprocess) | ~impl | map_with_context | error_extensionf |
There are two possible ways of dune integration:
- (preprocess) stanza, which is the usual way to add Ppx preprocessors to the build of a library/executable. This runs unconditionally during the normal build process.
- (lint) stanza, which has similar syntax but is undocumented. This doesn’t run by default in dune, but rather requires dune build @lint to be executed, which is very easy to forget. A rare example of this exists in dune’s test suite.
Either way, a major inconvenience is that the linter has to be added to every dune library and executable. There’s no way right now to define entire-project linters, which is error-prone as one may simply forget to add the linter to a new library.
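As a rough illustration, the two integrations could look like this in a dune file. The linter name my_linter and the library/executable names are hypothetical, and the (lint) form is written on the assumption that it accepts the same (pps ...) specification as (preprocess).

```dune
; Hypothetical Ppx linter package: my_linter

(library
 (name mylib)
 ; Runs on every normal build of this library.
 (preprocess
  (pps my_linter)))

(executable
 (name main)
 ; Runs only when the lint alias is requested: dune build @lint
 (lint
  (pps my_linter)))
```

Note that the field has to be repeated in every library and executable stanza of the project.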
There are two main Ppxlib phases used for such linters:
- ~impl (or ~intf), which is usually used for defining AST transformations. However, linters wouldn’t actually transform the program, but just output warnings during such a pass.
- ~lint_impl (or ~lint_intf), which runs before any transformations take place. In fact, this phase cannot even transform the AST, but only return a list of Lint_errors.
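A minimal sketch of a ~lint_impl linter for catch-all handlers is given below, assuming the plain Driver.register_transformation entry point and a fold traversal. The transformation name catch_all_lint and the message text are made up for this example.

```ocaml
open Ppxlib

(* Fold over the AST, accumulating one Lint_error per unguarded
   wildcard handler found in a try expression. *)
let collect =
  object
    inherit [Driver.Lint_error.t list] Ast_traverse.fold as super

    method! expression e acc =
      (* Visit subexpressions first so nested handlers are found too. *)
      let acc = super#expression e acc in
      match e.pexp_desc with
      | Pexp_try (_, cases) ->
        Stdlib.List.fold_left
          (fun acc case ->
            match case.pc_lhs.ppat_desc, case.pc_guard with
            | Ppat_any, None ->
              Driver.Lint_error.of_string case.pc_lhs.ppat_loc
                "catch-all exception handler"
              :: acc
            | _ -> acc)
          acc cases
      | _ -> acc
  end

(* The lint phase only returns errors; it cannot change the AST. *)
let () =
  Driver.register_transformation "catch_all_lint"
    ~lint_impl:(fun str -> collect#structure str [])
```

Because the phase returns a list, a fold is the natural traversal here: every finding is threaded through the accumulator and reported together.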
There are various ways in Ppxlib to traverse the AST and each linter uses one based on what needs to be returned from the phase and how the output is done. Note that Ppxlib’s context-free (rewriting) rules aren’t suitable for linting as-is: they can only match extension nodes, special functions, custom constants and attribute-annotated nodes. In particular, arbitrary Ast_pattern-based matching is not offered by Ppxlib. This is what bene-gesselint tries to provide as a thin wrapper, however it doesn’t neatly combine multiple Ast_pattern-matching rules into a single AST pass.
There are five means of output for Ppxlib-based linters:
- Driver.register_correction, which proposes a code change that can be promoted using dune. This only works during (lint) and must propose a change, so it cannot simply produce a warning.
- Lint_error.of_string, which yields a preprocessor warning. These can only be returned from ~lint_impl, but many warnings can be returned from a single run.
- Location.raise_errorf, which crashes the preprocessor with an error. Hence, multiple errors cannot be produced from a single linter run. Ppxlib also discourages the use of exceptions for error handling.
- Location.error_extensionf, which creates a special error extension node to be put into the AST. Hence, this requires a map traversal, but also allows multiple errors to be returned. Ppxlib recommends this for error handling, at least for usual Ppxlib expanders, derivers and transformers. However, it seems to me that the OCaml compiler will still only print the error from the first error extension node.
- eprintf, which is just very ad hoc.
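The error extension approach could be sketched as follows, assuming an ~impl transformation with a map traversal. The names is_catch_all, mapper and catch_all_errors are invented for this example, and replacing the whole try expression with the error node is just one possible choice.

```ocaml
open Ppxlib

(* An unguarded wildcard handler: with _ -> ... *)
let is_catch_all (c : case) =
  match c.pc_lhs.ppat_desc, c.pc_guard with
  | Ppat_any, None -> true
  | _ -> false

(* Map over the AST and replace each offending try expression with an
   error extension node; the compiler reports it as an error. *)
let mapper =
  object
    inherit Ast_traverse.map as super

    method! expression e =
      let e = super#expression e in
      match e.pexp_desc with
      | Pexp_try (_, cases) when Stdlib.List.exists is_catch_all cases ->
        let loc = e.pexp_loc in
        Ast_builder.Default.pexp_extension ~loc
          (Location.error_extensionf ~loc "catch-all exception handler")
      | _ -> e
  end

let () =
  Driver.register_transformation "catch_all_errors" ~impl:mapper#structure
```

With Location.raise_errorf ~loc in place of the extension node, an iter traversal would suffice, but the preprocessor would stop at the first finding.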
Ppxlib techniques
Many combinations of dune integration, Ppxlib phase, traversal and output exist, but not all of them are compatible and sensible. Worse yet, some simply don’t even work, either silently or loudly. The following table gives an overview of the reasonable combinations and which to avoid.
Dune integration | Ppxlib phase | Traversal | Output | Comment |
---|---|---|---|---|
(lint) | ~lint_impl | fold | Lint_error.of_string | Doesn’t work (no output) |
(lint) | ~impl | iter | Driver.register_correction | Dune-promotable changes |
(preprocess) | ~lint_impl | fold | Lint_error.of_string | Multiple preprocessor warnings |
(preprocess) | ~lint_impl | iter | Location.raise_errorf | Single error |
(preprocess) | ~lint_impl | iter | eprintf | Multiple non-standard warnings |
(preprocess) | ~impl | map | Location.error_extensionf | Multiple errors |
(preprocess) | ~impl | iter | Location.raise_errorf | Single error |
(preprocess) | * | * | Driver.register_correction | Doesn’t work (promotable changes not allowed) |
This GitHub repository includes examples of all of these setups in the corresponding subdirectories. See the Cram test run.t files for example outputs.