Stream: Miscellaneous

Topic: native compute


view this post on Zulip Bas Spitters (Mar 25 2021 at 09:02):

@Maxime Dénès We've run a big computation in Coq with native_compute (36h) and the extracted code with ocamlopt (14h). The code is fairly simple, little use of dependent types. Does the difference of timings seem correct?

view this post on Zulip Guillaume Melquiond (Mar 25 2021 at 09:10):

You cannot expect native_compute to run as fast as extracted code. Indeed, it has to handle the case of open terms. For instance, while fst x is basically free for extracted code, it is quite costly for native_compute. I am not sure it can incur such a slowdown, but it should not be that far.

view this post on Zulip Pierre-Marie Pédrot (Mar 25 2021 at 09:12):

I think @Jason Gross did a bit of benchmarking for this situation, and indeed the difference was somewhere in the 2-3x range.

view this post on Zulip Bas Spitters (Mar 25 2021 at 09:14):

To be sure, I'm not complaining, just trying so see whether the numbers are expected.

view this post on Zulip Pierre-Marie Pédrot (Mar 25 2021 at 09:19):

for higher-order code, expect even more slowdown: https://github.com/coq/coq/issues/8019

view this post on Zulip Guillaume Melquiond (Mar 25 2021 at 09:31):

Did you mean dependent types? Because higher-order is not an issue, as far as I know. But computing types is.

view this post on Zulip Enrico Tassi (Mar 25 2021 at 09:53):

Out of curiosity, which ocaml version are you using @Bas Spitters ?

view this post on Zulip Maxime Dénès (Mar 25 2021 at 10:07):

@Bas Spitters Hi ! This not entirely unexpected, but sill in the high range of what I'd expect. Of course, it depends a lot of the nature of the code. Performance overhad can come from the handling of open terms or computing in types, as @Guillaume Melquiond wrote, or even in some cases from explicit polymorphism: sometimes having a lot of arguments can make the use of registers less smart, even when the values are not really useful. Extraction erases some of these arguments.

view this post on Zulip Pierre-Marie Pédrot (Mar 25 2021 at 10:29):

@Guillaume Melquiond in HO code you'll pay for closure normalization, which doesn't happen in weak-head CBV.

view this post on Zulip Pierre-Marie Pédrot (Mar 25 2021 at 10:30):

BTW, I am currently trying to get a working POC removing the naked pointer use of accumulators following Xavier Leroy's proposal, and I am quite afraid this will have terrible consequences on perfs.

view this post on Zulip Guillaume Melquiond (Mar 25 2021 at 10:41):

Pierre-Marie Pédrot said:

in HO code you'll pay for closure normalization, which doesn't happen in weak-head CBV.

If the final results contains closures, then I agree. But, hopefully, that is not what is being computed here, and the result is a just a plain concrete value.

view this post on Zulip Guillaume Melquiond (Mar 25 2021 at 10:43):

Pierre-Marie Pédrot said:

I am quite afraid this will have terrible consequences on perfs.

Definitely. The OCaml compiler should also be changed so that Obj.tag is not a C function but a builtin one.

view this post on Zulip Pierre-Marie Pédrot (Mar 25 2021 at 10:49):

Even with inlining, it will always cost more than a jump in a table.

view this post on Zulip Pierre-Marie Pédrot (Mar 25 2021 at 10:50):

(The patch I am trying to make work doesn't call Obj.tag but a dedicated C function that makes stronger assumptions.)

view this post on Zulip Guillaume Melquiond (Mar 25 2021 at 10:51):

Nah, with inlining, the branch will always be predicted correctly, thus making it almost free.

view this post on Zulip Guillaume Melquiond (Mar 25 2021 at 10:53):

In fact, it will even make the code faster in some cases, because then fst x will be free again. (No need for pattern matching anymore.)

view this post on Zulip Bas Spitters (Mar 25 2021 at 11:07):

4.11.1
We're computing a boolean. Basically, doing a proof by reflection.


Last updated: Aug 19 2022 at 19:03 UTC