## Stream: Coq users

### Topic: floating point exp(x)

#### Jason Gross (Jul 28 2023 at 06:44):

Does any Coq library (flocq or otherwise) have a computable implementation of `exp(x)` on primitive floats, + proofs about it?

#### Guillaume Melquiond (Jul 28 2023 at 07:58):

Making a computable version of `exp` is easy. What is hard is proving a specification for it that suits your needs. For example, formally proving that a correctly rounded implementation of `exp` is indeed correctly rounded (the best possible specification) is not just hard; it is objectively impossible as of today (short of adding some ad hoc axioms).

#### Guillaume Melquiond (Jul 28 2023 at 07:59):

More to the point, CoqInterval has a version of `exp` working on primitive floats, but it has a very weak specification, which is presumably only of use for CoqInterval. And it is excruciatingly slow. (One of my students is working on writing and proving a better version, closer to the state of the art.)

#### Michael Soegtrop (Jul 28 2023 at 10:23):

@Jason Gross: what is your goal? Do you want to, say, verify double-precision C code, or do you have computational arguments or algorithms you want to run in Coq and prove something that way? I guess what Guillaume wants to say is that the specifications coq-interval provides for its arithmetic are more suitable for the latter than for the former (because that is what it is intended for).

#### Jason Gross (Jul 28 2023 at 16:32):

@Michael Soegtrop a bit of both. I ported an algorithm from pytorch to Coq, and I want to prove the algorithm approximately correct on its domain. The brute force approach (run the algorithm on all 64^2 inputs in the domain, using primitive floats and primitive arrays) takes about 10 minutes in the native compiler, but scales exponentially in one of the parameters of interest. So I want to refine the proof to be a mix of computation and interval reasoning.

Therefore I need the computation of exp to be as fast as possible. I'd like the computation to closely match whatever pytorch uses (though I'm only trying to prove approximate correctness, so differing somewhat is okay), and the equational properties I expect to need are:
- exp(-inf) = 0
- exp(nan) = nan
- exp(x) > 0 for all other x
- exp(a + b) ≈ exp(a) * exp(b)
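Bundled as a Coq record, the spec might look like this (a sketch; the record and field names are hypothetical, not from any existing library, and the last property is left out because it needs an error model to state precisely):

```coq
From Coq Require Import Floats.
Open Scope float_scope.

(* Hypothetical spec for a primitive-float exp, covering the first
   three properties above. *)
Record exp_spec (exp : float -> float) : Prop := {
  exp_neg_infinity : exp neg_infinity = zero;
  exp_of_nan       : is_nan (exp nan) = true;
  exp_positive     : forall x,
      is_nan x = false -> x <> neg_infinity -> (zero < exp x) = true
}.
```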

The only way I use exp is in softmax:
Given a list of finite floats xs, softmax(xs) := let ys := map exp xs in let total := sum of ys in map (fun x => x / total) xs

The theorems I want to prove, given softmax(xs) contains only finite floats, are of the form

1. If I add a constant k to every element of xs, softmax of the result is roughly exp(k) * softmax(xs)
2. Assuming the largest element of xs is at least epsilon bigger than any other element, the index of the largest element of softmax(xs) is the same as the index of the largest element of xs
3. I want a condition on xs (not mentioning exp except in constant factors) which allows me to prove that the largest element of softmax(xs) is larger than the sum of the remaining elements plus a small constant. (I guess the condition is that the largest element of xs is larger than the product of the remaining elements, times ln(the constant).)
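Theorem 2, for instance, could be stated along these lines (a sketch; `softmax`, `argmax`, and the gap hypothesis are hypothetical parameters and definitions, not from any library, and side conditions such as non-emptiness are omitted):

```coq
From Coq Require Import Floats List.
Open Scope float_scope.

Section ArgmaxPreserved.
  Variable softmax : list float -> list float.
  Variable argmax : list float -> nat.  (* index of the largest element *)

  (* Every non-maximal element is at least eps below the maximum. *)
  Definition gap (eps : float) (xs : list float) : Prop :=
    forall i, (i < length xs)%nat -> i <> argmax xs ->
      (nth i xs zero + eps <= nth (argmax xs) xs zero) = true.

  (* Theorem 2: a sufficient gap in the inputs preserves the argmax. *)
  Definition argmax_preserved : Prop :=
    forall eps xs, (zero < eps) = true -> gap eps xs ->
      argmax (softmax xs) = argmax xs.
End ArgmaxPreserved.
```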

#### Jason Gross (Jul 28 2023 at 16:32):

What's the spec of exp in CoqInterval?

#### Michael Soegtrop (Jul 28 2023 at 16:35):

I didn't really look at it. CoqInterval comes with executable interval arithmetic, which should include exp. So you get some interval as a result, which might be a bit wider than the best possible. So far, what I got this way was sufficient for what I wanted to prove. The interval arithmetic is an internal feature, btw., but afaik I am not the only one using it.
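For illustration, the usual public entry point is the `interval` tactic over the reals; a simple bound on exp goes through directly (this assumes coq-interval is installed, and it exercises the tactic rather than the internal executable interval arithmetic mentioned above):

```coq
From Coq Require Import Reals.
From Interval Require Import Tactic.
Open Scope R_scope.

(* CoqInterval computes an enclosure of exp on [-1, 1] and compares
   its upper bound against 3. *)
Goal forall x, -1 <= x <= 1 -> exp x <= 3.
Proof. intros. interval. Qed.
```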

#### Michael Soegtrop (Jul 28 2023 at 16:36):

Btw., I also used CoRN for such things, but recently I have mostly used coq-interval's interval arithmetic.

#### Michael Soegtrop (Jul 28 2023 at 16:44):

And in this, xs is a largish vector?

#### Michael Soegtrop (Jul 28 2023 at 16:49):

Reasoning about the sum in the denominator of the softmax function will be tricky. What is the range of the elements of xs?

#### Michael Soegtrop (Jul 28 2023 at 16:55):

Hmm, possibly we have a different definition of softmax in mind. I thought that softmax for the ith element of a vector xs is exp(xs[i]) / (sum j, exp(xs[j])). With this definition, if you add k to each element, doesn't it just cancel out, since you multiply the numerator and denominator by exp(k)? So why would you get exp(k) * softmax(xs)?
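That definition transcribes onto primitive floats roughly as follows (a sketch, not from any library; `exp` is a parameter, and a left fold stands in for the sum):

```coq
From Coq Require Import Floats List.
Open Scope float_scope.

Section Softmax.
  Variable exp : float -> float.

  (* softmax xs, i-th element = exp xs[i] / sum_j (exp xs[j]) *)
  Definition softmax (xs : list float) : list float :=
    let ys := map exp xs in
    let total := fold_left add ys zero in
    map (fun y => y / total) ys.
End Softmax.
```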

#### Jason Gross (Jul 28 2023 at 17:12):

> Hmm possibly we have a different definition of softmax in mind. I thought that softmax for the ith element of a vector xs is exp(xs[i])/(sum j, exp(xs[j]). With this definition, if you add k to each element, doesn't it just cancel out since you multiply the numerator and denominator by exp(k)? So why would you get exp(k) * softmax(xs).

Ah, you're right! This makes it even easier. Yes, I want to prove that softmax(xs + k) and softmax(xs) are approximately the same, then.
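Spelling out the cancellation over the reals (over floats it only holds approximately, since each exp, the sum, and the division are all rounded):

```latex
\mathrm{softmax}(xs + k)_i
  = \frac{e^{x_i + k}}{\sum_j e^{x_j + k}}
  = \frac{e^k \, e^{x_i}}{e^k \sum_j e^{x_j}}
  = \frac{e^{x_i}}{\sum_j e^{x_j}}
  = \mathrm{softmax}(xs)_i
```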

#### Jason Gross (Jul 28 2023 at 17:14):

> I btw. also used CoRN for such things, but recently more used coq-intervals interval arithmetic.

Does CoRN support primitive floats? (Given that my first small toy example already takes 10 minutes in the native compiler, I'm concerned that trying to do without primitive floats will be too slow.)

#### Jason Gross (Jul 28 2023 at 20:02):

> And in this xs is a largish vector?

xs is a smallish vector (length 2 in my toy example, longer in larger examples), with values that empirically seem to range from about -30 to +30 and empirically seem to be bounded away from 0 by at least about 0.1. Post-softmax, on my toy example, all values are bounded away from 0.5 by at least ±0.002.

#### Jason Gross (Jul 28 2023 at 22:02):

(Btw, my current implementation of exp is translated from Stack Overflow, except that I replaced `fma x y z` with `x * y + z`.)
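The substitution is purely local: in a Horner-style evaluation, each fma step becomes a separate multiply and add, at the cost of one extra rounding per step. A sketch (the coefficients are placeholders, not the ones from the Stack Overflow answer):

```coq
From Coq Require Import Floats.
Open Scope float_scope.

(* Degree-3 Horner evaluation with each fma c r acc rewritten as
   c * r + acc (two roundings instead of one). *)
Definition horner3 (c0 c1 c2 c3 r : float) : float :=
  ((c3 * r + c2) * r + c1) * r + c0.
```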

#### Michael Soegtrop (Jul 29 2023 at 08:04):

> Does CoRN support primitive floats?

I am not sure, but I don't think so. CoRN is about constructive reals, that is, a representation of a real number by an algorithm that computes it to arbitrary precision, plus proofs that this converges. So constructive reals feel a bit like floats, since one can compute with them, but they have most properties one would expect from real numbers; notably, they form a field (which floats do not). Theoretically one could use primitive floats for cases where the precision is sufficient, but it would be complicated. For certain kinds of reasoning, constructive reals are quite useful. If your goal is to prove something about floating-point arithmetic, they are likely not what you want.

> Yes, I want to prove that softmax(xs + k) and softmax(xs) are approximately the same, then.

This is still not 100% clear to me. Do you want to prove that

a) an implementation of softmax using some known implementation of exp based on IEEE 754 compliant basic arithmetic has this property

b) an implementation of softmax using some unknown but IEEE 754 compliant exp function (not sure what this would mean - would have to read the docs) has this property

c) an implementation of softmax using an approximation of exp with known error bounds and using IEEE 754 compliant arithmetic for the rest has this property

If this is about actual neural network implementations, you might want c), since I would expect that typical softmax implementations in actual neural network code (TensorFlow) are not terribly picky about the precision of exp. I guess any "rapidly growing monotone function" would work. It might be just the exponent * factor + some low-order polynomial of the mantissa, kept reasonably continuous at the exponent increments (and I think it is not uncommon to use 2^x to save the exponent factor).

#### Jason Gross (Jul 29 2023 at 08:19):

It is about an actual neural network, so c) is probably fine (with the additional constraint that I'd like the implementation to be efficient when run in the native compiler)

#### Michael Soegtrop (Jul 30 2023 at 08:39):

Then the first step would be to find out what the neural network code you are using (say, TensorFlow) actually computes for softmax. Or, alternatively, come up with your own implementation and test it in the real neural network. If it helps, I can make a suggestion. I would probably use Sollya to compute the coefficients. Are you using FP32, FP16, or fixed-point activation values?

Last updated: Jun 18 2024 at 23:01 UTC