@Maxime Dénès We've run a big computation in Coq with native_compute (36h) and the extracted code with ocamlopt (14h). The code is fairly simple, little use of dependent types. Does the difference of timings seem correct?
You cannot expect
native_compute to run as fast as extracted code. Indeed, it has to handle the case of open terms. For instance, while
fst x is basically free for extracted code, it is quite costly for
native_compute. I am not sure it can incur such a slowdown, but it should not be that far.
I think @Jason Gross did a bit of benchmarking for this situation, and indeed the difference was somewhere in the 2-3x range.
To be sure, I'm not complaining, just trying so see whether the numbers are expected.
for higher-order code, expect even more slowdown: https://github.com/coq/coq/issues/8019
Did you mean dependent types? Because higher-order is not an issue, as far as I know. But computing types is.
Out of curiosity, which ocaml version are you using @Bas Spitters ?
@Bas Spitters Hi ! This not entirely unexpected, but sill in the high range of what I'd expect. Of course, it depends a lot of the nature of the code. Performance overhad can come from the handling of open terms or computing in types, as @Guillaume Melquiond wrote, or even in some cases from explicit polymorphism: sometimes having a lot of arguments can make the use of registers less smart, even when the values are not really useful. Extraction erases some of these arguments.
@Guillaume Melquiond in HO code you'll pay for closure normalization, which doesn't happen in weak-head CBV.
BTW, I am currently trying to get a working POC removing the naked pointer use of accumulators following Xavier Leroy's proposal, and I am quite afraid this will have terrible consequences on perfs.
Pierre-Marie Pédrot said:
in HO code you'll pay for closure normalization, which doesn't happen in weak-head CBV.
If the final results contains closures, then I agree. But, hopefully, that is not what is being computed here, and the result is a just a plain concrete value.
Pierre-Marie Pédrot said:
I am quite afraid this will have terrible consequences on perfs.
Definitely. The OCaml compiler should also be changed so that
Obj.tag is not a C function but a builtin one.
Even with inlining, it will always cost more than a jump in a table.
(The patch I am trying to make work doesn't call Obj.tag but a dedicated C function that makes stronger assumptions.)
Nah, with inlining, the branch will always be predicted correctly, thus making it almost free.
In fact, it will even make the code faster in some cases, because then
fst x will be free again. (No need for pattern matching anymore.)
We're computing a boolean. Basically, doing a proof by reflection.
Last updated: Jun 01 2023 at 11:01 UTC