Automating Mathematics?

Siddhartha Gadgil

Department of Mathematics

Indian Institute of Science

Bangalore

https://github.com/siddhartha-gadgil/ProvingGround

Intelligence and Games

Tic-tac-toe

Othello

Deep Blue

Alpha Go

Levels of intelligence

Carrying out specific computations and control functions.
Following precise instructions to carry out arbitrary computations, i.e., to be computationally universal.
Learning from experience, by imitation and by experimentation to get better results.
Computers are used today in mathematics as assistants, not collaborators.

Games and Strategies

A game has:
- A state at any time.
- Rules for making moves.
- An outcome: win/loss, margin etc.
A strategy can be based on:
- A policy function, giving (weighted) moves to consider.
- A (relative) value function, telling us how good a state is.
We consider certain sequences of moves, based on policy, and optimize value at the end of the sequence.
In tic-tac-toe we can use simple policies and values.
In Othello, we can refine the obvious value - number of black/white coins, by considering edges and corners.

Chess, Expert systems and Deep Blue

In Chess, a basic value function is obtained by counting pieces and pawns with weights.
For openings, we also have strict policy functions - we only consider a small subset of possible move sequences.
Deep Blue (and chess theory) greatly extend these.
The value and policy functions of Kasparov were far better, but compensated for by Deep Blue being able to consider far more move sequences.
“ Play the opening like a book, the middle game like a magician, and the endgame like a machine. ” – Spielmann
A lesson from machines.

AlphaGo vs Lee Sedol

In Go, the number of legal moves is much larger, causing the number of sequences for a weak policy to grow very fast.
More importantly, it is very hard to describe a good value function.
This makes it far harder for computers.
Yet, in March 2016, a Go playing system AlphaGo defeated 18-time world champion Lee Sedol.

AlphaGo, Learning, Deep Neural Nets

The policy and value functions of AlphaGo are deep neural networks that were trained.
The policy network was trained by learning to predict the next move from games of expert players.
The value network was trained by AlphaGo playing against versions of itself and getting feedback from the outcome.
AlphaGo considered fewer sequences of moves than Deep Blue.
AlphaGo came up with unexpected moves.

Deep Neural Network

The state can be represented by a vector, on which we wish to find good functions.
We first define a new vector in terms of the input vector as a composition of functions from vectors to vectors. Our desired functions are in terms of this new vector.
We can view co-ordinates of vectors as forming layers, with each layer defined as a function of the previous layer.
Each co-ordinate of a layer is a linear combination of (some of the) coordinates of the previous layer composed by a cut-off function.
This can be trained by gradient flow, using the chain rule to backward propagate gradients.

Deep Learning and Science

We are trying to solve an Engineering problem of finding a near-optimal strategy, and the related scientific problem of understanding the strategy used by the brain (say for vision).
A good scientific theory gives a concise description of (properties of) observations, which are good approximations.
However, any reasonably concise description may give a very poor approximation, and near-optimal solutions may be far from unique.
Yet, AlphaGo is not based on a single strategy network, but on two black-box networks and some concrete descriptions.

Theorem Proving:

Enumeration, Computation, Deduction

Goals

Computer-Assisted proof components

Automated Deduction

Limits of Set theory and First-order Logic

Goals

Goal: Use computers to greatly increase our ability to discover and prove mathematical results across areas of mathematics.
Why?
How? Learn to search for, construct and recognize useful mathematical objects.

What?

There are infinitely many prime numbers.
There are arbitrarily long arithmetic progressions all of whose elements are prime numbers.
There are infinitely many natural numbers $n$ such $n$ and $n+2$ are both primes.

Computer-Assisted proof components

Computers have been used in various ways to provide a component of a proof:
- Enumeration,
- Symbolic algebra,
- Exact real number arithmetic,
- Linear programming,
- SAT solvers.
Some examples:
- Four colour theorem,
- Kepler conjecture,
- Boolean Pythagorean triples problem.

Formal proofs in Mathematics.

A formal deduction system should giver rules that describe
- What are well-formed expressions, i.e., represent mathematical objects.
- How to deduce statements from other statements (more generally to make judgements).
- Axioms, which are well-formed propositions that we take to be proved.
In the usual foundations of mathematics, the rules for forming expressions and making deductions are those of First-Order Logic, and the axioms are those of Set Theory.

First-order logic: languages

A first-order language, which describes a domain of discourse (e.g. $\mathbb{N}$) has vocabulary consisting of
- Variables - can be taken to be a fixed countable set.
- Constants (e.g. $0$, $1$).
- Functions (e.g. $+$).
- Relations (e.g. $<$, $=$).
- Special symbols $\Rightarrow$, $\iff$, $\wedge$, $\vee$, $\forall$, $\exists$, ...
We form two kinds of expressions from these, terms (e.g. $1+1$, $n + 3$) and formulas (e.g. $n + 1 < 2$, $4 < 3$).
Terms and formulas may depend on variables.
Terms represent objects in the domain of discourse.
Formulas are either true or false.

Deduction and theories

We can deduce formulas from other formulas using the rules of deduction.
The main deduction rule is Modus Ponens : given $P$ and $P\Rightarrow Q$ we deduce $Q$.
A theory is a language together with a collection of formulas, called axioms in the language.
A formula is deducible in a theory if it can be obtained from the axioms by the rules of deduction.

Universal deducer?

A universal deducer is a computable function (algorithm) which, given a proposition (formula), returns a proof if it is deducible and otherwise returns failure.
By results of Turing, there is no such algorithm.
We can enumerate proofs and check if they prove either the given proposition or its negation. This does not always work as there are statements that are true but not provable.
Practically, we can conclude that there is no best deducer, as any given proof can be found by some deducer but no deducer can find all proofs.

Robbins conjecture

Robbins conjecture was a conjectural characterization of Boolean algebras in terms of associativity and commutativity of $\vee$ and the Robbins equation $\neg(\neg(a\vee b)\vee \neg(a \vee \neg b)) = a$.
This was conjectured in the 1930s, and finally proved in 1996 using the automated theorem prover EQP, which is a Resolution Theorem Prover with Paramodulation.
So far, this seems to be the only major success of deductive theorem provers.
First-order logic theorem provers are, however, used in interactive proof systems (hammer tactics).

Real-life mathematics

A proof in real-life mathematics consists of:
- definitions, axioms, assumptions, notation;
- assertions;
- hints about which assertions, definitions etc. are used in the proof of a given assertion.
- focussing attention on relevant objects and results.
- introducing variables,hypotheses etc. into a scope.
The reader is expected to deduce all assertions based on the hints (or at least believe that he/she can do so).
In particular, there is no objective sense in which a proof is complete or correct.

“ Since the first half of the 20th century mathematics has been presented as a science based on ZFC and ZFC was introduced as a particular theory in Predicate Logic.

“ Therefore someone who wanted to get to the bottom of things in mathematics had a simple road to follow - learn what Predicate Logic is, then learn a particular theory called ZFC, then learn how to translate propositions about a few basic mathematical concepts into formulas of ZFC, and then learn to believe, through examples, that the rest of mathematics can be reduced to these few basic concepts.”

Vladimir Voevodsky

Why better (new) foundations?

In the usual foundations of mathematics, $sin(3)$ and $3(sin)$ are syntactically equally valid, i.e., the usual language of mathematics is an informal language.
Statements and proofs formalized in first-order logic are verbose and opaque.
Propositions and Proofs are not first class, i.e., cannot be arguments or values of functions, so not composable. We instead have patterns of proof (such as induction).
HoTT foundations are fully formal, yet much closer to working mathematics and with a syntax that actually describes valid objects. Further, introducing variables, hypothesis and assumptions as well as inductive proofs, are natural.

“ The roadblock that prevented generations of interested mathematicians and computer scientists from solving the problem of computer verification of mathematical reasoning was the unpreparedness of foundations of mathematics for the requirements of this task.”

“ Formulating mathematical reasoning in a language precise enough for a computer to follow meant using a foundational system of mathematics not as a standard of consistency applied only to establish a few fundamental theorems, but as a tool that can be employed in everyday mathematical work. ”

Vladimir Voevodsky

Martin-LÖf Type theory

Type theoretic Foundations

Terms, Types, Rules

Inductive types

Dependent Types

Propositions as types

Higher-order languages

A language has words and phrases belonging to various syntactic categories.
The grammar specifies rules for forming phrases from words and phrases, based on their syntactic categories.
First order logic is closely modelled on this, with terms analogues of noun phrases, functions and relations analogues of verbs and formulas analogues of sentential phrases.
However, meaningless but valid sentences are avoided by having no verb phrases other than the small number of verbs in the vocabulary.

Higher-order languages.

In a higher order logic we have rules for forming new syntactic categories.
In a simple higher order language, these are formed from other syntactic categories (generics).
In a dependently typed language, syntactic categories are regarded as words/phrases.
Thus, we can form syntactic categories in terms of words/phrases.
We can also form phrases using syntactic categories.

Type theoretic foundations

A Type theory is a type system rich enough to replace Set theory as foundations for mathematics.
Mathematical objects, called terms, have types .
A term $a$ having a type $A$, denoted $a : A$, is analogous to an element $a$ belonging to a set $A$, i.e., $a \in A$.
However the rules for forming terms and types, and for determining whether a term has a type, are purely syntactic.
Nevertheless, the rules for forming types are rich enough that types can play the role of sets - for instance, prime numbers form a type.
Even more remarkably, propositions and proofs can be expressed in terms of types and terms.

Terms, Types, Rules

Mathematical objects are called terms.
Every term has a type , generally unique.
Types are also terms, whose types are universes.
We have rules to introduce terms (including types), individually or in groups, into the context.
Rules also let us make two kinds of judgements:
- that a term $a$ is of type $A$.
- that two terms are equal by definition .
All the rules are syntactic.
Note that terms can be equal without being so by definition.
There is a relation (type family) propositional equality extending definitional equality.

Function types, functions and applications

Given types $A$ and $B$, we can introduce the function type $A \to B$, whose members are functions.
Given $f: A \to B$ and $a : A$, we get a term $f(a) : B$.

We can construct a function $f: A \to B$, $$f(a) := b,$$ by giving an expression $b$ of type $B$ in terms of a variable $a : A$ and other terms in the context.

We can also define functions recursively on inductive types.

Dependent functions and type families

We generalize functions $f : A \to B$ to dependent functions, so that $f(a)$ has a type $B(a)$, depending in general on $a : A$.
More precisely,
- A type family $B: A \to \mathfrak{U}$ is a function with codomain a universe, so all its values are types.
- Given a type family $B: A \to \mathfrak{U}$, we can construct a corresponding type $\prod_{a : A} B(a)$ of dependent functions.
- We can apply $f : \prod_{a : A} B(a)$ to $a : A$, to obtain $f(a) : B(a)$.
Constructions of dependent functions are analogous to those of functions.

Inductive types

An inductive type $T$ is defined by specifying terms (usually functions) that construct members of $T$.


data ℕ : Type where
  zero : ℕ
  succ : ℕ → ℕ

Formally, we are introducing into the context the type $\mathbb{N}$ and two terms $0$ and $succ$.
The type is freely generated by its constructors, allowing recursive and inductive definitions.

Recursive definitions

We can define functions recursively on inductive types, by specifying in all cases.


_+_ : ℕ → ℕ → ℕ
zero + y = y
(succ x) + y = succ (x + y)

Formally, we can introduce recursion functions and apply them to the definition data.

More Inductive types

The types $\mathbb{0}$ has no terms .
The type $\mathbb{1}$ has a single term $*$.
The product type $A \times B$ has terms (corresponding to) pairs $(a, b)$ with $a: A$ and $b : B$.
The sum (disjoint union) type $A \oplus B$ has terms (corresponding to) terms of $A$ and terms of $B$.
For a type family $B: A \to \mathcal{U}$, the dependent pair type $\sum_{a: A} B(a)$ has terms (corresponding to) pairs $(a, b)$ with $a: A$ and $b : B(a)$.

Propositions as types

A type $A$ is inhabited if there is a term $a$ with $a : A$.
By propostion we mean a logical statement that must be true or false.
We represent propositions by types.
If a type $A$ is viewed as a proposition, a term $a : A$ is a proof of (or witness to) $A$.
In particular, a proposition is true if and only if the corresponding type is inhabited.
Note that we must be able to form types representing mathematical propositions by the type formation rules.

Combining propositions

Let $A$ and $B$ be types, regarded as representing propositions.

The proposition $A \Rightarrow B$ is represented by $A \to B$.
The propostion $A\wedge B$ is represented by $A \times B$.
The proposition $A \vee B$ is represented by $A \oplus B$.
The proposition $\neg A$ is represented by $A \to \mathbb{0}$.

Quantifying propositions

A proposition depending on a variable $x : A$ is represented by a type family $P : A \to \mathfrak{U}$.
The proposition $\forall x\in A,\ P(x)$ is $\prod_{x: A} P(x)$.
The proposition $\exists x\in A,\ P(x)$ is $\sum_{x : A} P(x)$.

Identity type family

For a fixed type $A$, propositional equality is given by the identity type family freely generated by reflexivity.


data _==_ {A : Type} : A → A → Type where
  refl : (a : A) → a == a

This is an inductive type family.


symmetry : {A : Type} → {x y : A} → (x == y) → (y == x)
symmetry (refl a) = refl a

trans : {A : Type} → {x y z : A} → (x == y) → (y == z) → (x == z)
trans (refl a) (refl .a) = refl a

However, for fixed $a: A$, $a = a$ is not an inductive type, i.e., it is not suffiicient to define functions on $refl(a)$.

Homotopy type theory: Types as Spaces

Equality, Paths, Homotopy

Levels from dimension

Sets, Propositions

Types as Spaces

We interpret
- Types as spaces.
- Terms of a type as points of the space.
- Functions $A \to B$ as continuous functions $A \to B$.
- For a type $A$ and terms $x, y: A$, the identity type $x = y$ as paths in $A$ from $x$ to $y$.
We do not actually construct spaces, i.e., sets with topology, starting with a type.
Instead we make topological (specifically homotopy theoretic) constructions and prove topological results in type theory.
A practical consequence for type theories is that we get a canonical, provably consistent, type theory.

Equality, Paths, Homotopies

As above, for a type $A$ and $x, y : A$, a term $p : (x = y)$ is interpreted as a path from $x$ to $y$.
Two such paths are equal if there is a path of paths, called a homotopy, between them.

We have similar notions of equality for functions.
Types are equal if the corresponding spaces are homotopy equivalent, as a consequence of the Univalence axiom.

Dimension and levels

We define levels of types, based on a characterization of dimension in homotopy theory.
By definition, there is a unique type at level $-2$ (the lowest), which has a single term.
Inductively, we define the level of a type $A$ to be at most $(n + 1)$ if for $a, b : X$, the type $a = b$ has level at most $n$.
Further, we can truncate a type canonically to an $n$-type.

Sets and mere propositions

A set is a space with all of its components contractible.
A type $A$ is a set if for $x, y: A$ and $p, q: x = y$, we have $p = q$.
A mere proposition is a type which is either empty or all of its elements are equal, i.e., a type at level $-1$.
The concept of mere propostions, as well as propositional truncation, allow consistent mixing of classical logic with the type theoretic form.
For instance, in homotopy type theory the law of excluded middle is usually assumed for mere propositions, but not for all types.

Proving-Ground: Theorem proving by Learning

HoTT implemented in Scala

Reinforcement learning

Representation learning

Natural language processing

Useful theorems and proofs

A theorem with a simple statement but difficult proof is useful.
A theorem used to prove many other theorems is useful.
Particularly for applications, we may have an externally determined notion of a priori usefulness.
With homotopy type theory and backward propagation, all these can be naturally captured in simple learning dynamics.

Reinforcement learning: term-type map

Given an initial distribution $P_0$ on terms, rules for forming terms gives a new distribution which can be recursively defined by a relation of the form $$P = \alpha P_0 + \beta \mu_*(P) + \gamma \theta_*(P \times P) + \dots$$
Given a probability distribution on terms we get one on types by mapping a term to its type (proof distribution).
As (inhabited) types themselves are terms, we get a restricted distribution on them (theorem distribution).
We have a flow of the (entropy of) the proof distribution towards the theorem distribution.
We can backward propagate to get a flow on the distribution on terms.
The distribution on theorems can have other components.

Representation learning

Using proximity and order, words, mathematical objects etc can be represented by vectors which capture many of their relations.

NLP: extracting from human literature.

We aim to extract mathematical objects from the literature, with reasonably high accuracy.
These can be used for learning, both giving terms from which to generate others, and a distribution on types.
This is similar to a translation problem, except the target language has strong restrictions on sentences being meaningful and true.
To extract from natural language, we first use a parser.

The Slice-Ribbon conjecture:

Computer-assisted approaches

Knots: Slice and Ribbon

A knot $K$ is a smoothly embbed circle in $S^3$.
$K$ is unknotted if it bounds a smoothly embedded disc in $S^3$.
$K$ is slice if it bounds a smoothly embedded disc in $B^4$.
$K$ is ribbon if it bounds a slice disc so that $x^2+y^2+z^2$ has (no degenerate critical points and) no local maxima in it.
The slice-ribbon conjecture says that evey slice knot is ribbon.

Workflow

We seek useful results by interaction, working out and the literature.
Working out is a combination of experimentation, computation, deduction and search, with learning.
We seek:
- Consequences of being ribbon - to contradict.
- Constructions of slice knots.
- Modifications of, and relations between, slice discs for a fixed knot.
- Invariants, complexities etc associated to slice discs.
One may similarly analyse, for example, the mean curvature flow.