I'm trying to extract some proofs (with results in Type, so they don't get erased) to Haskell. I've ran into an issue where the extraction would take unexpectedly long (not sure how long precisely, I ended up killing it after 30 minutes). However, I found that breaking up the proof by moving some of its subgoals into lemmas made the extraction complete in under 20 seconds (this commit). I didn't make the lemmas any more general, just I copied the available assumptions and the subgoal verbatim and made them into a separate proof.

The OCaml extraction takes around 1 second for both cases.

Me reproduction above is not very readable so I wanted to get a smaller example before reporting this on GitHub and also wanted to check here if this is something already known.

I've managed to get a smaller reproduction here (10 s monolithic vs 0.2 s split up, OCaml 0.2 s for both), but it's still too large to pinpoint the issue and requires setting up the project's dependencies to reproduce. I've tried the Coq Bug Minimizer but it doesn't seem to work well for such performance issues (or I haven't figured out how to use it for those).

I have extracted proofs of similar or larger size without such issues, so there seems to be some issue specific to this proof. Perhaps the actual proof term here is very large -- is there a way to measure that? The only unique thing about this proof vs the other ones I've worked with is that this one uses the lia Ltac.

I'm using Coq 8.13.2, running with a later version is not straightforward because of the project's dependencies, but I didn't notice anything relevant in the release notes since then.

Any suggestions on how to narrow this down appreciated!

If you make the aux lemma Defined (ie `Lemma aux ... Defined. Lemma use_aux ... Qed.`

) is it fast or slow?

Is it only a problem with Haskell extraction?

Maybe https://github.com/coq/coq/pull/16206 is related (AFAICT only possible if making the aux lemma Defined is slow)

the issue is about functors which aren't relevant but the changed code is more general

you could try backporting the PR (should be pretty simple) for testing

If you perf like https://github.com/coq/coq/issues/16172#issuecomment-1153238453 do you get anything interesting?

(remember to `--call-graph dwarf`

in perf record)

Thank you for the tips! Changing the lemma to Defined didn't make a difference, but perf was helpful. A lot of time is spent in `kill_dummy`

and `kill_dummy_hd`

. I think this may be caused by having deeply nested let-ins.

Here's the flamegraph.

Have you tried `Unset Extraction Optimize `

?

Jason Gross said:

Have you tried

`Unset Extraction Optimize`

?

It does fix the issue. My smaller reproduction case from above extracts in 0.02 s with optimisations disabled (with optimisations on: 10s monolithic, 0.2 s split up).

I created an issue on GitHub https://github.com/coq/coq/issues/17448

Last updated: Nov 29 2023 at 22:01 UTC