I wonder if someone did a comparison of the performance of arbitrary length integer computations in Coq using the various execution methods vs e.g. Python. I have things like computation of large binomial coefficients, prime number search and the like in mind.

didn't Jason do some stuff in this direction for https://github.com/JasonGross/neural-net-coq-interp ? You may want to ping him if that repo has something that looks worthwhile

I can't find execution performance related things in there, but it is anyway a good idea to ask Jason.

@Jason Gross : are you aware of a performance comparison for arbitrary length integer computations between Coq (lazy, vm_compute, native) and e.g. Python?

I am not. NNCI is effectively profiling matrix operations with primitive ints for indexing primitive arrays of floats. Presumably the bottleneck is closures and iteration, it's about 30,000x slower in native compute than in pytorch. (And the vm is another 2x slower.)

Is there a Coq library that implements big ints using primitive ints? I guess in theory the code in Fiat Crypto could probably be adapted, e.g., by adjusting https://github.com/mit-plv/fiat-crypto/blob/cf73a058f9ea43800c86517b3a86b47000a966fe/src/ArithmeticCPS/Core.v#L303 to remove the reduction modulo s-c, have it double the length when multiplying, add an operation to remove leading 0s, etc. I guess if you really want a fair comparison with C you should extend Coq's primitive integers with add-with-get-carry and sub-with-get-borrow, so and adapt the VM to compile chains of them (e.g. in ripple-carry-adders) into the assembly instructions.

But if you just want a comparison of operations on Z, take a look at https://github.com/mit-plv/fiat-crypto/blob/cf73a058f9ea43800c86517b3a86b47000a966fe/src/UnsaturatedSolinasHeuristics/Tests.v#L155-L186. I think native compute is maybe 2x faster than the vm? There's roughly comparable but not identical Python code in https://github.com/mit-plv/fiat-crypto/blob/c60b1d2556a72c37f4bc7444204e9ddc0791ce4f/generate_parameters.py, it probably wouldn't be too hard to, e.g., do recursive extraction of the relevant definition and look for differences between the OCaml and Python (the dependency chain is not that deep). I haven't ever directly compared the performance of this Coq and OCaml code though, AFAICR.

(this by the way is vaguely similar to prime number search)

I also have the sieve of Eratosthenes at https://github.com/mit-plv/rewriter/blob/1cd64f2598427f8f4ddcbe453fb61bff32fd0d3a/src/Rewriter/Rewriter/Examples/PerfTesting/SieveOfEratosthenes.v#L18, it's pretty standalone and you could use it for profiling. Probably it's bottlenecked by nat recursion though.

Jason Gross said:

Is there a Coq library that implements big ints using primitive ints?

Bignums?

@Jason Gross : many thanks for the detailed answer! I am a bit confused about your discussion of arbitrary length integer implementations in Coq - I would have asked the same question as Pierre. But your details suggest that the implementation of Bignums must be inefficient (I guess by a small constant, and I further guess 2), because some primitives are missing. Is this indeed so?

If, as it appears, there is no such comparison, I will make one.

I'm just unfamiliar with bignums and haven't directly looked at it / am not familiar with the implementation. But indeed without any primitive operations in https://github.com/coq/coq/blob/master/theories/Numbers/Cyclic/Abstract/CyclicAxioms.v that take in `carry t`

, and given how much trouble GCC has compiling ripple carry add chains even without any indirection, I'd guess that bignums must indeed be slower by a constant factor over an optimized assembly implementation when adding two very large numbers (as for the factor I'm not quite sure --- how much more expensive is a branch per int63 than a fetch from memory?), though I would guess that if there are any function calls remaining in implementation of the operations, they will dominate the constant factors, and by much more than a factor of 2.

Looking at the bignums implementation, it seems like there's more constant factors to be gained by replacing tupling by primitive arrays, possibly. (Though the lack of for loops in Coq might make it impossible to realize these gains)

Last updated: Jun 23 2024 at 05:02 UTC