Stream: math-comp users

Topic: Measuring gains from using small-scale reflection


view this post on Zulip Karl Palmskog (Jan 14 2022 at 11:00):

Is there any obvious way to measure the gains from using small-scale reflection in Coq math proofs (as opposed to using no reflection)? Would one look at things like proof script length, number of steps, proof term size?

(Some context here on why this might be relevant to figure out in a canonical way.)

@anormalform @Someody42 @EremondiJoey @plt_amy Oh we also don't care much at all about computation. We prove theorems using lots of rewrites. We avoid heavy `refl` proofs. Beyond some point pure mathematics stops caring about computational behaviour of definitions. This is why different people use Lean to e.g. Coq.

- Kevin Buzzard (@XenaProject)

view this post on Zulip Pierre-Marie Pédrot (Jan 14 2022 at 11:03):

Proof term size is probably going to blow up, it would be interesting to get some actual data comparing Lean proofs with Coq proofs.

view this post on Zulip Pierre-Marie Pédrot (Jan 14 2022 at 11:04):

OTOH HOL-based systems seem not to suffer from death from a thousand (proof) cuts so YMMV

view this post on Zulip Karl Palmskog (Jan 14 2022 at 11:05):

but they [the HOLs] don't store proofs by default anywhere, and they don't distribute proof objects like Lean Mathlib does (everyone apparently now downloads binary nightly Mathlib builds)

view this post on Zulip Pierre-Marie Pédrot (Jan 14 2022 at 11:07):

Distributing them does not mean that you'll load them in memory, e.g. in Coq we don't load Qed proofs

view this post on Zulip Pierre-Marie Pédrot (Jan 14 2022 at 11:07):

if you factor in the fact that everybody is claiming that hard drives and bandwidth is cheap...

view this post on Zulip Karl Palmskog (Jan 14 2022 at 11:07):

sure, but let's say everyone every day has to download several GB of proof objects (that they don't load into memory), that seems like a significant cost

view this post on Zulip Pierre-Marie Pédrot (Jan 14 2022 at 11:08):

personally I am still waiting for the moment Lean hits the proverbial wall of complexity

view this post on Zulip Pierre-Marie Pédrot (Jan 14 2022 at 11:09):

maybe it will never happen, so far they seem to have been pretty successful

view this post on Zulip Pierre-Marie Pédrot (Jan 14 2022 at 11:09):

(or at least not very vocal about scaling issues)

view this post on Zulip Karl Palmskog (Jan 14 2022 at 11:09):

Lean 3 seems to have hit that, but maybe not Lean 4

view this post on Zulip Pierre-Marie Pédrot (Jan 14 2022 at 11:09):

source?

view this post on Zulip Karl Palmskog (Jan 14 2022 at 11:10):

Another thing is that Lean is slow. I notice that I have an upper bound on my coding speed because it takes a while before Lean updates the goal. And it's really painful when you have to wait 3 seconds at each keystroke. Had it been faster, I could have done some random thing in 10 minutes, but I spent 1h30 instead. And this happens for several reasons: long proofs and long files. If Lean could provide support for more granular recompilation of long proofs, that would be great. By now, many of the large files have been split into smaller pieces, and I am working on some of the remaining ones.

https://leanprover-community.github.io/blog/posts/backstage-with-dillies/

view this post on Zulip Karl Palmskog (Jan 14 2022 at 11:10):

anecdotal, but still

view this post on Zulip Pierre-Marie Pédrot (Jan 14 2022 at 11:10):

3s per keystroke !?

view this post on Zulip Pierre-Marie Pédrot (Jan 14 2022 at 11:11):

I had a beer recently with that person, didn't mention such terrible issues when prompted on the Lean vs Coq flamewar

view this post on Zulip Karl Palmskog (Jan 14 2022 at 11:12):

well, be sure to follow up if you get the chance...

view this post on Zulip Karl Palmskog (Jan 14 2022 at 11:15):

we will see if anyone reports something similar in the upcoming Coq survey

view this post on Zulip Karl Palmskog (Jan 14 2022 at 11:29):

hmm, it might actually be feasible to measure average proof term size in MathComp, like we did for average proof script length, sentence length, etc.

On the other hand, if we compare with Lean Mathlib average proof term size, it may not be seen as a fair comparison, since MathComp proofs are consciously factored to be small (right?)

view this post on Zulip Karl Palmskog (Jan 14 2022 at 11:35):

there are probably a lot of optimizations that can be done in a system that truly embraces proof irrelevance, though

view this post on Zulip Paolo Giarrusso (Jan 14 2022 at 12:56):

they also seem to have better hash-consing for proof terms — they specifically claimed the exponential blow-up with typeclasses doesn’t affect Lean (IIUC, even in Lean 3), and Lean 4’s tabling will likely be a boon.

view this post on Zulip Paolo Giarrusso (Jan 14 2022 at 12:59):

regarding equality, they compile to eliminators and IIUC their simp tactic is basically an autorewrite tactic?

view this post on Zulip Paolo Giarrusso (Jan 14 2022 at 13:00):

so I hope they made it faster…

view this post on Zulip Karl Palmskog (Jan 14 2022 at 13:03):

but the MathComp community (main place of small scale reflection proofs) very seldomly uses type classes (CoqEAL comes to mind), so we are not affected by TC resolution performance...

view this post on Zulip Karl Palmskog (Jan 14 2022 at 13:04):

but this brings up another comparison measure: where is the system spending its time during proofs: handling evars, TC search, unification, type checking, ...

view this post on Zulip Paolo Giarrusso (Jan 14 2022 at 13:04):

agree that’s not relevant for math-comp vs lean but only Coq vs Lean — but it was an example of “different tuning” and it came up because you mentioned term size

view this post on Zulip Paolo Giarrusso (Jan 14 2022 at 13:05):

and the relevant term size might differ between the two, even if less dramatically without TCs

view this post on Zulip Paolo Giarrusso (Jan 14 2022 at 13:06):

_also_, Lean’s TC performance is crucial when comparing Lean’s TC-based mathlib with Coq’s CS-based math-comp

view this post on Zulip Paolo Giarrusso (Jan 14 2022 at 13:06):

we already know that math-lib’s TC design would fare much worse in Coq

view this post on Zulip Karl Palmskog (Jan 14 2022 at 13:07):

well, the main libraries and their systems co-evolve. So if someone was loudly using TCs for something the scale of Mathlib, I think we would have seen more TC resolution improvements in Coq

view this post on Zulip Pierre-Marie Pédrot (Jan 14 2022 at 13:08):

it's hard to make improvement while preserving backwards compat, especially in a system as complex as TCs

view this post on Zulip Pierre-Marie Pédrot (Jan 14 2022 at 13:09):

(the net result today is that we have a bunch of flags that mostly don't work in the TC algo)


Last updated: Feb 08 2023 at 07:02 UTC