eg https://github.com/coq/coq/actions/runs/6768331316/job/18392554877?pr=18258

It seems https://github.com/coq/coq/blob/dc665b84f824370b40a98cb3b72ad691e18bc5e1/test-suite/coq-makefile/timing/precomputed-time-tests/001-correct-diff-sorting-order/time-of-build-both-real.log.expected is different from the generated file: the "after" total time in the .expected says `20m46.07s`

but in the generated `20m46.06s`

Is there some reason macos python would have changed how its rounding recently?

If I understand you correctly, this is a test for the test infrastructure itself, and it fails because a table created from identical raw numbers is different?

it's a test for the time table generation scripts (make-both-time-files.py)

It runs the script on the pregenerated logs next to the .expected I linked then calls diff

I see. I guess one has to run the script through a debugger. This could be a lot of things, say some optimisation which computes sums of floats in a different order or so (float arithmetic is not associative). Most languages don't have guarantees on bit accurateness of float I/O and evaluation order. To give an example: Common LISP does have guarantees that if you convert a float to text and back in certain formats, you get the binary identical number. I am not aware that Python has anything like this. To be save here, one might have to work with rationals or scaled integers.

Can anyone with appleware reproduce locally? We tried quickly with @Matthieu Sozeau but it seems to work on his machine.

Can you see what Python version is used in CI? But in general, as I said, it is a mistake to assume that such things work reliably with floats unless one takes special action. It might make more sense to rewrite the code to use other numbers which guarantee that this works (I can do this).

python 3.12.0

this passes on my machine (3.11.6) but fails on the macos job:

```
#!/usr/bin/env python3
l=[1.38, 1.85, 1.93, 7.33, 11.15, 13.45, 22.81, 24.71, 27.11, 27.81, 31.00, 33.08, 34.35, 38.19, 39.72, 40.13, 43.34, 49.44, 155.79, 181.77, 202.96, 256.77]
seconds=sum(sorted(l))
partial_seconds=int(100 * (seconds - int(seconds)))
print ('seconds=%f, partial_seconds=%d' % (seconds, partial_seconds))
if partial_seconds != 7:
exit(1)
```

https://docs.python.org/3/library/functions.html#sum

Changed in version 3.12: Summation of floats switched to an algorithm that gives higher accuracy on most builds.

https://github.com/coq/coq/pull/18265

This indeed looks like the root cause. If you want to avoid such issues, you should either use Decimals or compute with integer milliseconds or microseconds.

Should I do a PR?

Last updated: Oct 13 2024 at 01:02 UTC