Is there any obvious way to measure the gains from using small-scale reflection in Coq math proofs (as opposed to using no reflection)? Would one look at things like proof script length, number of steps, proof term size?

(Some context here on why this might be relevant to figure out in a canonical way.)

@anormalform @Someody42 @EremondiJoey @plt_amy Oh we also don't care much at all about computation. We prove theorems using lots of rewrites. We avoid heavy `refl` proofs. Beyond some point pure mathematics stops caring about computational behaviour of definitions. This is why different people use Lean to e.g. Coq.

- Kevin Buzzard (@XenaProject)Proof term size is probably going to blow up, it would be interesting to get some actual data comparing Lean proofs with Coq proofs.

OTOH HOL-based systems seem not to suffer from death from a thousand (proof) cuts so YMMV

but they [the HOLs] don't store proofs by default anywhere, and they don't distribute proof objects like Lean Mathlib does (everyone apparently now downloads binary nightly Mathlib builds)

Distributing them does not mean that you'll load them in memory, e.g. in Coq we don't load Qed proofs

if you factor in the fact that everybody is claiming that hard drives and bandwidth is cheap...

sure, but let's say everyone every day has to download several GB of proof objects (that they don't load into memory), that seems like a significant cost

personally I am still waiting for the moment Lean hits the proverbial wall of complexity

maybe it will never happen, so far they seem to have been pretty successful

(or at least not very vocal about scaling issues)

Lean 3 seems to have hit that, but maybe not Lean 4

source?

Another thing is that Lean is slow. I notice that I have an upper bound on my coding speed because it takes a while before Lean updates the goal. And it's really painful when you have to wait 3 seconds at each keystroke. Had it been faster, I could have done some random thing in 10 minutes, but I spent 1h30 instead. And this happens for several reasons: long proofs and long files. If Lean could provide support for more granular recompilation of long proofs, that would be great. By now, many of the large files have been split into smaller pieces, and I am working on some of the remaining ones.

https://leanprover-community.github.io/blog/posts/backstage-with-dillies/

anecdotal, but still

3s per keystroke !?

I had a beer recently with that person, didn't mention such terrible issues when prompted on the Lean vs Coq flamewar

well, be sure to follow up if you get the chance...

we will see if anyone reports something similar in the upcoming Coq survey

hmm, it might actually be feasible to measure average proof term size in MathComp, like we did for average proof script length, sentence length, etc.

On the other hand, if we compare with Lean Mathlib average proof term size, it may not be seen as a fair comparison, since MathComp proofs are consciously factored to be small (right?)

there are probably a lot of optimizations that can be done in a system that truly embraces proof irrelevance, though

they also seem to have better hash-consing for proof terms — they specifically claimed the exponential blow-up with typeclasses doesn’t affect Lean (IIUC, even in Lean 3), and Lean 4’s tabling will likely be a boon.

regarding equality, they compile to eliminators and IIUC their `simp`

tactic is basically an `autorewrite`

tactic?

so I hope they made it faster…

but the MathComp community (main place of small scale reflection proofs) very seldomly uses type classes (CoqEAL comes to mind), so we are not affected by TC resolution performance...

but this brings up another comparison measure: where is the system spending its time during proofs: handling evars, TC search, unification, type checking, ...

agree that’s not relevant for math-comp vs lean but only Coq vs Lean — but it was an example of “different tuning” and it came up because you mentioned term size

and the relevant term size might differ between the two, even if less dramatically without TCs

_also_, Lean’s TC performance is crucial when comparing Lean’s TC-based mathlib with Coq’s CS-based math-comp

we already know that math-lib’s TC design would fare much worse in Coq

well, the main libraries and their systems co-evolve. So if someone was loudly using TCs for something the scale of Mathlib, I think we would have seen more TC resolution improvements in Coq

it's hard to make improvement while preserving backwards compat, especially in a system as complex as TCs

(the net result today is that we have a bunch of flags that mostly don't work in the TC algo)

Last updated: Dec 07 2023 at 06:38 UTC