I've discovered a critical source of slowness in metacoq that comes from the ssr rewrite implementation, but I am not sure I fully understand the code so I am publicly asking for help. (@Enrico Tassi maybe?)

The issue is visible in https://github.com/coq/coq/pull/15782.

In `rwprocess_rule`

we compute multirewrite rules by extraction from a proof p : T where T is recursively built out of equalities, products and functions.

indeed

When `T ≡ A × B`

we compute the projections of `p`

, but there is a "fast path" when `p`

is already a pair

but for some reason, we `simpl`

the hell out of it when trying to check whether it's a pair

(in master, we do it even *twice*)

in the PR we only `simpl`

it once and share the result

but even there it's horribly slow because we're normalizing huge proofs

Where are the huge proofs? in `A`

or `B`

????

since it's recursive you pay the price of normalization at every recursive step

it's `p`

we normalize *the proof* not the type

I tried not normalizing and always rebuilding projections but it breaks mathcomp

(line 576 in the PR)

- is it normal to normalize
*proofs*and 2. why does mathcomp care about the shape of the proof?

This line? ` let sr sigma = match EConstr.kind sigma (Tacred.hnf_constr env sigma r) with `

?

yup

(see the PR for a slightly more readable variant)

Maybe I know, but it is very subtle

I am all ears

(this is terrible algorithmically if you have many nested pairs regardless of the cost of simpl)

Imagine `r = (thm1, thm2) : T = (forall x, x = 1) * (forall x, x = 2)`

, then the proof term would be, say, `projT2 r a`

for some `a`

in the goal. That is the easy case. Now, one may have defined `r`

as `Definition r x y := (thm1 x, thm2 y)`

. So `T = forall x y, (x = 1) * (y = 2)`

. In this case the proof term is `projT2 (r a _)`

for unresolved `_`

. But if you reduce you prune the offending `_`

.

I see...

but this is a heuristic, simpl might not be enough to get rid of the dependency

I think the right fix is to "distribute" binders, but it all comes down to the poor man implementation of "rewrite databases"

so, one δ unfolding should be considered enough a long as there are leading products or something?

in some sense you have to reduce all projT

why isn't `lazy`

the right chooice here??

that's one thing I am wondering

(at least in its whd variant)

if the only thing we care about is the head of the term...

yes, weak head call by need

I'll try that to see if it works any better (and is more efficient)

mathcomp seems to be happy with whd-lazy

and is it more efficient? :)

(I hope so)

I can't test right now on the bench because metacoq doesn't compile

but locally it does seem so

(this hotspot was responsible for 90% of the time in some parts)

opam package for metacoq fixed and merged

Last updated: Feb 05 2023 at 20:03 UTC