Stream: Coq users

Topic: Performance of arbitrary length integer computations in Coq


view this post on Zulip Michael Soegtrop (Mar 21 2024 at 12:29):

I wonder if someone did a comparison of the performance of arbitrary length integer computations in Coq using the various execution methods vs e.g. Python. I have things like computation of large binomial coefficients, prime number search and the like in mind.

view this post on Zulip Karl Palmskog (Mar 21 2024 at 12:39):

didn't Jason do some stuff in this direction for https://github.com/JasonGross/neural-net-coq-interp ? You may want to ping him if that repo has something that looks worthwhile

view this post on Zulip Michael Soegtrop (Mar 21 2024 at 16:18):

I can't find execution performance related things in there, but it is anyway a good idea to ask Jason.

view this post on Zulip Michael Soegtrop (Mar 21 2024 at 16:21):

@Jason Gross : are you aware of a performance comparison for arbitrary length integer computations between Coq (lazy, vm_compute, native) and e.g. Python?

view this post on Zulip Jason Gross (Mar 21 2024 at 17:14):

I am not. NNCI is effectively profiling matrix operations with primitive ints for indexing primitive arrays of floats. Presumably the bottleneck is closures and iteration, it's about 30,000x slower in native compute than in pytorch. (And the vm is another 2x slower.)

view this post on Zulip Jason Gross (Mar 21 2024 at 17:27):

Is there a Coq library that implements big ints using primitive ints? I guess in theory the code in Fiat Crypto could probably be adapted, e.g., by adjusting https://github.com/mit-plv/fiat-crypto/blob/cf73a058f9ea43800c86517b3a86b47000a966fe/src/ArithmeticCPS/Core.v#L303 to remove the reduction modulo s-c, have it double the length when multiplying, add an operation to remove leading 0s, etc. I guess if you really want a fair comparison with C you should extend Coq's primitive integers with add-with-get-carry and sub-with-get-borrow, so and adapt the VM to compile chains of them (e.g. in ripple-carry-adders) into the assembly instructions.

view this post on Zulip Jason Gross (Mar 21 2024 at 17:38):

But if you just want a comparison of operations on Z, take a look at https://github.com/mit-plv/fiat-crypto/blob/cf73a058f9ea43800c86517b3a86b47000a966fe/src/UnsaturatedSolinasHeuristics/Tests.v#L155-L186. I think native compute is maybe 2x faster than the vm? There's roughly comparable but not identical Python code in https://github.com/mit-plv/fiat-crypto/blob/c60b1d2556a72c37f4bc7444204e9ddc0791ce4f/generate_parameters.py, it probably wouldn't be too hard to, e.g., do recursive extraction of the relevant definition and look for differences between the OCaml and Python (the dependency chain is not that deep). I haven't ever directly compared the performance of this Coq and OCaml code though, AFAICR.

view this post on Zulip Jason Gross (Mar 21 2024 at 17:40):

(this by the way is vaguely similar to prime number search)

view this post on Zulip Jason Gross (Mar 21 2024 at 17:43):

I also have the sieve of Eratosthenes at https://github.com/mit-plv/rewriter/blob/1cd64f2598427f8f4ddcbe453fb61bff32fd0d3a/src/Rewriter/Rewriter/Examples/PerfTesting/SieveOfEratosthenes.v#L18, it's pretty standalone and you could use it for profiling. Probably it's bottlenecked by nat recursion though.

view this post on Zulip Pierre Roux (Mar 21 2024 at 20:24):

Jason Gross said:

Is there a Coq library that implements big ints using primitive ints?

Bignums?

view this post on Zulip Michael Soegtrop (Mar 22 2024 at 08:12):

@Jason Gross : many thanks for the detailed answer! I am a bit confused about your discussion of arbitrary length integer implementations in Coq - I would have asked the same question as Pierre. But your details suggest that the implementation of Bignums must be inefficient (I guess by a small constant, and I further guess 2), because some primitives are missing. Is this indeed so?

view this post on Zulip Michael Soegtrop (Mar 22 2024 at 08:12):

If, as it appears, there is no such comparison, I will make one.

view this post on Zulip Jason Gross (Mar 22 2024 at 16:05):

I'm just unfamiliar with bignums and haven't directly looked at it / am not familiar with the implementation. But indeed without any primitive operations in https://github.com/coq/coq/blob/master/theories/Numbers/Cyclic/Abstract/CyclicAxioms.v that take in carry t, and given how much trouble GCC has compiling ripple carry add chains even without any indirection, I'd guess that bignums must indeed be slower by a constant factor over an optimized assembly implementation when adding two very large numbers (as for the factor I'm not quite sure --- how much more expensive is a branch per int63 than a fetch from memory?), though I would guess that if there are any function calls remaining in implementation of the operations, they will dominate the constant factors, and by much more than a factor of 2.

view this post on Zulip Jason Gross (Mar 22 2024 at 16:11):

Looking at the bignums implementation, it seems like there's more constant factors to be gained by replacing tupling by primitive arrays, possibly. (Though the lack of for loops in Coq might make it impossible to realize these gains)


Last updated: Jun 23 2024 at 05:02 UTC