I've discovered a critical source of slowness in metacoq that comes from the ssr rewrite implementation, but I am not sure I fully understand the code so I am publicly asking for help. (@Enrico Tassi maybe?)
The issue is visible in https://github.com/coq/coq/pull/15782.
rwprocess_rule we compute multirewrite rules by extraction from a proof p : T where T is recursively built out of equalities, products and functions.
T ≡ A × B we compute the projections of
p, but there is a "fast path" when
p is already a pair
but for some reason, we
simpl the hell out of it when trying to check whether it's a pair
(in master, we do it even twice)
in the PR we only
simpl it once and share the result
but even there it's horribly slow because we're normalizing huge proofs
Where are the huge proofs? in
since it's recursive you pay the price of normalization at every recursive step
we normalize the proof not the type
I tried not normalizing and always rebuilding projections but it breaks mathcomp
(line 576 in the PR)
let sr sigma = match EConstr.kind sigma (Tacred.hnf_constr env sigma r) with ?
(see the PR for a slightly more readable variant)
Maybe I know, but it is very subtle
I am all ears
(this is terrible algorithmically if you have many nested pairs regardless of the cost of simpl)
r = (thm1, thm2) : T = (forall x, x = 1) * (forall x, x = 2), then the proof term would be, say,
projT2 r a for some
a in the goal. That is the easy case. Now, one may have defined
Definition r x y := (thm1 x, thm2 y). So
T = forall x y, (x = 1) * (y = 2). In this case the proof term is
projT2 (r a _) for unresolved
_. But if you reduce you prune the offending
but this is a heuristic, simpl might not be enough to get rid of the dependency
I think the right fix is to "distribute" binders, but it all comes down to the poor man implementation of "rewrite databases"
so, one δ unfolding should be considered enough a long as there are leading products or something?
in some sense you have to reduce all projT
lazy the right chooice here??
that's one thing I am wondering
(at least in its whd variant)
if the only thing we care about is the head of the term...
yes, weak head call by need
I'll try that to see if it works any better (and is more efficient)
mathcomp seems to be happy with whd-lazy
and is it more efficient? :)
(I hope so)
I can't test right now on the bench because metacoq doesn't compile
but locally it does seem so
(this hotspot was responsible for 90% of the time in some parts)
opam package for metacoq fixed and merged
Last updated: Dec 07 2023 at 09:01 UTC