Documentation

Lean.PrettyPrinter.Parenthesizer

The parenthesizer inserts parentheses into a Syntax object where syntactically necessary, usually as an intermediary step between the delaborator and the formatter. While the delaborator outputs structurally well-formed syntax trees that can be re-elaborated without post-processing, this tree structure is lost in the formatter and thus needs to be preserved by proper insertion of parentheses.

The abstract problem & solution #

The Lean 4 grammar is unstructured and extensible with arbitrary new parsers, so in general it is undecidable whether parentheses are necessary or even allowed at any point in the syntax tree. Parentheses for different categories, e.g. terms and levels, might not even have the same structure. In this module, we focus on the correct parenthesization of parsers defined via Lean.Parser.prattParser, which includes both aforementioned built-in categories. Custom parenthesizers can be added for new node kinds, but the data collected in the implementation below might not be appropriate for other parenthesization strategies.

Usages of a parser defined via prattParser in general have the form p prec, where prec is the minimum precedence or binding power. Recall that a Pratt parser greedily runs a leading parser with precedence at least prec (otherwise it fails) followed by zero or more trailing parsers with precedence at least prec; the precedence of a parser is encoded in the call to leadingNode/trailingNode, respectively. Thus we should parenthesize a syntax node stx supposedly produced by p prec if

  1. the leading/any trailing parser involved in stx has precedence < prec (because without parentheses, p prec would not produce all of stx), or
  2. the trailing parser parsing the input to the right of stx, if any, has precedence >= prec (because without parentheses, p prec would have parsed it as well and made it a part of stx). We also check that the two parsers are from the same syntax category.

Note that in case 2, it is also sufficient to parenthesize a parent node as long as the offending parser is still to the right of that node. For example, imagine the tree structure of (f fun x => x) y without parentheses. We need to insert some parentheses between x and y since the lambda body is parsed with precedence 0, while the identifier parser for y has precedence maxPrec. But we need to parenthesize the $ node anyway since the precedence of its RHS (0) again is smaller than that of y. So it's better to only parenthesize the outer node than ending up with (f $ (fun x => x)) y.

Implementation #

We transform the syntax tree and collect the necessary precedence information for that in a single traversal. The traversal is right-to-left to cover case 2. More specifically, for every Pratt parser call, we store as monadic state the precedence of the left-most trailing parser and the minimum precedence of any parser (contPrec/minPrec) in this call, if any, and the precedence of the nested trailing Pratt parser call (trailPrec), if any. If stP is the state resulting from the traversal of a Pratt parser call p prec, and st is the state of the surrounding call, we parenthesize if prec > stP.minPrec (case 1) or if stP.trailPrec <= st.contPrec (case 2).

The traversal can be customized for each [*Parser] parser declaration c (more specifically, each SyntaxNodeKind c) using the [parenthesizer c] attribute. Otherwise, a default parenthesizer will be synthesized from the used parser combinators by recursively replacing them with declarations tagged as [combinator_parenthesizer] for the respective combinator. If a called function does not have a registered combinator parenthesizer and is not reducible, the synthesizer fails. This happens mostly at the Parser.mk decl, which is irreducible, when some parser primitive has not been handled yet.

The traversal over the Syntax object is complicated by the fact that a parser does not produce exactly one syntax node, but an arbitrary (but constant, for each parser) amount that it pushes on top of the parser stack. This amount can even be zero for parsers such as checkWsBefore. Thus we cannot simply pass and return a Syntax object to and from visit. Instead, we use a Syntax.Traverser that allows arbitrary movement and modification inside the syntax tree. Our traversal invariant is that a parser interpreter should stop at the syntax object to the left of all syntax objects its parser produced, except when it is already at the left-most child. This special case is not an issue in practice since if there is another parser to the left that produced zero nodes in this case, it should always do so, so there is no danger of the left-most child being processed multiple times.

Ultimately, most parenthesizers are implemented via three primitives that do all the actual syntax traversal: maybeParenthesize mkParen prec x runs x and afterwards transforms it with mkParen if the above condition for p prec is fulfilled. visitToken advances to the preceding sibling and is used on atoms. visitArgs x executes x on the last child of the current node and then advances to the preceding sibling (of the original current node).

Instances For
    @[inline, reducible]
    Instances For
      @[inline, reducible]
      Instances For
        @[inline, reducible]
        Instances For

          Execute x at the right-most child of the current node, if any, then advance to the left.

          Instances For

            Run x and parenthesize the result using mkParen if necessary. If canJuxtapose is false, we assume the category does not have a token-less juxtaposition syntax a la function application and deactivate rule 2.

            Instances For
              @[extern lean_mk_antiquot_parenthesizer]
              @[extern lean_pretty_printer_parenthesizer_interpret_parser_descr]
              @[implemented_by Lean.PrettyPrinter.Parenthesizer.parenthesizerForKindUnsafe]

              Add necessary parentheses in stx parsed by parser.

              Instances For