fiat-parsers stack overflows on compiling files like src/Parsers/GenericRecognizerOptimized
because the ocaml native compiler stack overflows when generating the native files for the .vo file. Was there a recent change to Coq master (within the past month or so) which would explain this change? (For example, does it force early native compilation by default now where it used to not?)
https://github.com/coq/coq/pull/13352 ?
Seems likely, thanks!
I may be hitting the same issue trying to compile graph-theory
on the latest coqorg/coq:dev image:
https://github.com/coq-community/graph-theory/pull/12/checks?check_run_id=1518902073
I have never (consciously) fiddled with any "native-" settings. Is there an issue tracking this?
- COQC theories/open_confluence.v
- Fatal error: exception Stack overflow
- Error: Native compiler exited with status 2
However, I had successful builds less than a week ago and this PR was merged 18 days ago.
There's no issue since it's a conscious change. There was an announcement made here: https://coq.discourse.group/t/about-the-native-compiler-flag-and-the-coq-native-opam-package-to-be-released-before-coq-8-13-beta1/1147
If native-compute doesn't work for your project, you should disable it explicitly.
Maybe the steps to do so, should be added to the above announcement.
Basically, this should amount to adding an explicit -arg -native-compiler -arg no
to your _CoqProject
file.
this seems like a doubtful way (overall approach) to handle incompatibility with native compilation. If I just have a regular Coq library, why would I even risk having users complain about features I can't control? It seems like all library devs will want to add this preemptively.
my view is something like: everyone who installs coq-native
is completely on their own, and can't expect any support from any library devs
in absence of a message to this effect when installing coq-native
, it seems prudent to just disallow native compilation by all library maintainers
I am note sure to understand which is your point of view. Either we consider users who install coq-native
are on their own and then library maintainers do not have anything to do a priori (and not even a posteriori). Or we consider that all the libraries need to support this use case (is there a CI job to that effect?), possibly by disabling native compilation.
Anyway, coq-native
is not installed by default; it requires an explicit action from the user. So, hopefully they know what to expect. (Am I too optimistic? Are there users who install packages they do not understand?)
The confusion seems to come from the fact that coq-native
has been added to the docker images used by many people for CI.
I guess, this is a good opportunity to discover that some libraries do not handle native-compute. This information will be needed anyway on the opam repo with the future split approach.
Here is how it works: user X installs coq
and coq-native
, then tries to install a bunch of libraries, then opens lots of issues complaining about failing native compilation. This is a net negative for library maintainers, who have to deal with the complaints. Why would any library maintainer who is not using some fancy reflection tactic ever want their library to be compiled with native? The announcement and CEP give zero evidence that the general Coq library gets any benefits whatsoever.
personally, I'm going to add -arg -native-compiler -arg no
to _CoqProject
in all my projects, and also advise everyone else to do the same.
Many library can be used by people that do want to use naitve-compute.
So there is no point in disabling it if this works (most cases in the CI of Coq).
the point is exactly as I described: we don't want users to report issues or bother maintainers when native-compute breaks everything, which it can be expected to do a lot per above
By comparison to OCaml, you don't want to forbid compilation with ocamlopt except if you know there is a specific problem (even if yourself are only using ocamlc).
the difference here is that ocamlopt at least seems to work on a majority of projects, and the benefits are known (speedups have been measured and are substantial)
This is also the case, although I agree, the majority is smaller (rather a large majority than almost all).
example selling point of ocamlopt:
O'Caml programs compiled with the native-code compiler run 6 times faster (on an average) than the same programs compiled with the O'Caml bytecode compiler.
can anything similar be said for coq-native on average?
Nope, there is a huge overhead due to native compilation.
@Jason Gross has probably more experience in this area btw.
The overhead is at compile time, not runtime though.
Yes, but you pay for it when using native_compute
anyways since it incurs a small compilation phase.
You need to link stuff, which has a definite cost.
then, the "selling proposition" of allowing native-compute for an average Coq library seems terrible as of right now
I am convinced that with the current infrastructure we're bound to displease everybody.
That's why I think that we should have separate native compilation as soon as possible, so that if you're a user of native compute, things should be transparent for you without incurring a cost on non-users.
my view: allowing native-compute to happen for a Coq library should be completely opt-in by the maintainer regardless of the version of Coq and regardless of how Coq was installed (possibly with some way for expert users to override)
Our point was precisely that by including coq-native
in Docker images, maintainers would discover that their projects are not compatible with native-compute and could disable it (which will also avoid complaints from users if they try the same later on).
I don't see the point of disabling native-compute if your CI tells you that it works with it (which it tells you if you are testing with 8.13 or dev right now and it doesn't fail).
cc @Erik Martin-Dorel BTW
so in other words we become guinea-pigs for native-compute
the reasonable thing to have is a way of opting-in to become a guinea-pig
I don't understand what is so bad in having the default CI setting automatically check compatibility with a standard Coq feature until you explicitly disable it.
I mean the only complaint that I do understand (but it hasn't really been voiced) is that it is not clear enough how to deal with these new failures right now.
per Pierre-Marie above, it has substantial overhead, and nobody has shown it has any benefits for the average project
lots of downsides, no demonstrated upside
I wonder how OCaml project maintainers would react if -flambda was suddenly the default in all their compilers
I don't know what's the plan for multicore but I am slightly afraid it'll indeed be shoved down our OCaml-user throat.
The downsides of the current approach were discussed and indeed they do imply that docker images have to be provided both with native and non native support enabled by default.
As a quick fix tho I would have the default docker image to increase stack size using ulimit
, that should greatly reduce the amount of StackOverflows users find.
Most compatible would indeed to have the default docker image setup with -native no
and provide a coq+native
image for those wanting it.
Karl Palmskog said:
Why would any library maintainer who is not using some fancy reflection tactic ever want their library to be compiled with native?
There is no relation. For example, if you forbid native compilation for Coquelicot (which does not use any native-worth reflection), then you forbid it for CoqInterval (for which it is certainly worth it).
Pierre-Marie Pédrot said:
Yes, but you pay for it when using
native_compute
anyways since it incurs a small compilation phase.
Keep in mind that a user who wants to use native_compute
has a use case for it. For example, I am willing to pay the compilation overhead (a few seconds at most) for the interval
tactic, because I am then running it on some goals for several minutes, if not longer. So, the x6 speedup is certainly worth it.
Pierre Roux said:
The confusion seems to come from the fact that
coq-native
has been added to the docker images used by many people for CI.
This is precisely the issue for me here. I get a "random" stack overflow from one day to the next when trying to build my library using the coq images. Without any further information, I tried to come here to see what's going wrong. Now, even after reading the post on discourse I still have no idea. I'm seriously tempted to just to what @Karl Palmskog suggests and disable it.
But how does such a setting affect users of ones library that have coq-native
installed? Will they still be able to use the library, even though it has not been built for native comutation?
Disabling it was precisely my suggestion in your case.
My point is that it doesn't have to be done if CI doesn't break.
Yes, users will still be able to use the library even if they have coq-native
installed.
By disabling you prevent a compilation failure while they try to install the library via opam BTW (the whole point of making CI test this by default).
However, it forces libraries that depend on yours to also disable native compilation, and so on transitively.
Anyway, my takeaway is this:
so the whole Coq ecosystem is supposed to run into these stack overflows, then figure out for themselves that they need to add the disable command into _CoqProject
?
this is to me terrible distribution of knowledge, most people (nearly everyone) have no idea about the intricate tradeoffs discussed by Guillaume, and are not in a position to make choices for themselves on coq-native
What about providing 2 CI images (one with, and one without coq-native
)?
this would not help unless the one without coq-native
is the default
(which I would be fine with)
Yes, that would be my expectation as well. Something that breaks stuff for many people out there with gains for few should not be the default IMHO.
Can we have an actual estimation of what "many people" means here? For now, I have seen only three complaints, but maybe I missed some.
maybe someone can then start to go through the released
coq opam archive and see how coq-native
works out?
Indeed, currently this only hits people who regularly build against the coqorg/coq:dev
image.
for a new default not known to many people outside the dev team, it seems strange to draw conclusions so early from existence/non-existence of complaints
I think a new proposed default should have the burden of showing that the "average project" is not hindered/bothered by it
Yes, so let us stop discussing about it until we can actually draw some conclusion. This is just pointless.
Guillaume Melquiond said:
Yes, so let us stop discussing about it until we can actually draw some conclusion. This is just pointless.
No this is not pointless, some of us have been working having native supported more widely for a while [including fixing lambda which provides another 20% in some cases] and we had a good understanding; you don't need to have a lot of adivinatory powers as just inspecting all the breakage that this default caused in Coq's CI in enough to predict the wider problems for a large set of libraries if the default was set to native=yes. That seems the wrong choice for Docker.
In fact, all these problems were predicted in the discussion, the choice was to ignore them, but that doesn't mean they don't exists. Karl's point is much valid and this new setup is a disaster for users, and even a lack of respect given the time they will have to spend to figure things out. Many of them won't even know what "Coq native" is.
Going back to Docker, I'd try my ulimit
tweak as first step, if that still creates problems I'm afraid that Docker setup should be similar to OPAM, default image should not force native compilation. In fact, there is very little reason with current coq_makefile support for native to enable native when working on a library, except in a handful of very specific cases; for the rest you just incur in a huge pointless overhead. Dune's native support will already disable native anyways for development mode unless you are producing an package for release.
But don't you want CI (we're talking about CI here) to test what you will ship in your release before you do actually release?
thanks for the summary Emilio, I agree with your analysis and proposed plan
Emilio Jesús Gallego Arias said:
In fact, all these problems were predicted in the discussion
I do not know which discussion you are referring. In the CEP for native compilation, there have been 122 comments. Which of those comments are predicting anything about this Docker issue? As a matter of fact, the word Docker does not even appear in the whole discussion until the very last comment. Nobody predicted there would be an issue with the Docker image. And if you did, you forgot to voice it out.
In fact, Erik even asked if someone would object to changing the Docker image. Why did you not object since you knew it would be such a mess?
Théo Zimmermann said:
But don't you want CI (we're talking about CI here) to test what you will ship in your release before you do actually release?
My reason for CI testing against coq.dev that I want to see compatibility problems one-by-one as they are introduced by the development process and (hopefully) find backwards compatible fixes for them one at a time. That this code will be packaged and shipped is a an afterthought at this point.
Note that the new behavior (the fact that the -native-compiler
option is now propagated to coqc) only deals with Coq 8.13+beta1 and Coq dev.
As mentioned by @Guillaume Melquiond, the crux of this issue is that even if a library is fully orthogonal with native_compute
, it still needs to be precompiled for native as soon as it appears as a dependency of another library that wants to benefit from native_compute
.
Regarding @Enrico Tassi's suggestion:
On the one hand, distributing twice more images for each coq(>=8.13) version does not seem to scale well (since this week, there are now 4 different ocaml flavors for each coq version, each in its own image, as it has been discussed in https://github.com/coq-community/docker-coq/issues/12 and as I need to announce it today-or-tomorrow BTW).
On the other hand, assuming we did that way (having 8 images per coq version), it wouldn't be very satisfactory as a solution, because it would just amount to say: if a given library has an issue with native-compiler, the maintainer just ignores the issue rather than "fixing" it by disabling the option; so in the very end, if users do have the coq-native package installed in their switch, the issue will still be here… and will still need to be fixed "upstream".
All in all, I fully agree with @Théo Zimmermann's remark:
Théo Zimmermann said:
Anyway, my takeaway is this:
- the Discourse announcement is not clear enough on what library maintainers need to do
- native compilation failures could contain advice to disable native-compilation and explain how to do so.
Sorry I'm busy with grading assignments this afternoon, so if some of you have the time to post below the Discourse post (URL updated) a summary or so regarding advice for maintainers to tweak the -native-compiler
flag for native-incompatible libraries, feel free to go ahead!
Two pointers just in case:
(and there's also @Emilio Jesús Gallego Arias's suggestion regarding ulimit
that could be explored)
Should we be using -native-compiler no
or -native-compiler ondemand
?
Regarding the benefits of native_compute
, fiat-crypto has one example where the input program is small, the output is small, but there's a lot of computation being done, and here the native compiler shines giving at least 2x speedup over the VM IIRC. In all other cases I've encountered it's worse than the VM due to compilation or readback overhead.
Assuming I actually want to learn about the native-compute feature, i.e. what it can and can't (yet) do. Where would I start? Also, my impression from this discussion is that while coq-native
is to be "unleashed" onto everyone per default, there seems to be no reasonable expectation for it to actually work even on projects that do not mention native
anywhere. (Nobody has responded that hitting a native-compute stack overflow in an a "pure .v" development is a problem that should be fixed...)
I've split up the discussion on OCaml variants to a separate topic.
Nobody has responded that hitting a native-compute stack overflow in an a "pure .v" development is a problem that should be fixed...
It was expected that these problems would happen, just as it is expected that coqchk will run into issues on some projects. This doesn't mean issues cannot be open and problems cannot be fixed.
The question is, should I expect to be able to compile my "pure .v" development without tiptoeing around deficiencies of native-compute and without locking all (potential) users out of using native-compute before 8.13 is considered release ready. (Presumably, nobody would release a new version of gcc wich in the default configuration stack-overflows on 1% of all projects using it).
Jason Gross said:
Should we be using
-native-compiler no
or-native-compiler ondemand
?
In both cases, no cmxs will be produced. The only difference is when your code contains native_compute
calls, in the first case (no), coq will fallback to vm_compute
, in the second case (ondemand), it will precompute the missing cmxs on the fly.
Guillaume Melquiond said:
Can we have an actual estimation of what "many people" means here? For now, I have seen only three complaints, but maybe I missed some.
For what we know now : in Coq's CI, about one third of the tested projects required explicit disabling of native (so two third compile with native). It is likely that projects tested in the CI are larger than average so the figure in the wild could be much smaller.
Does that mean that when I set -native-compiler ondemand
and people want to use native_compute
on things from those parts of the library that do work with the the native compiler, they can still do so?
nope, the difference between no and ondemand is only for you
(if by "people" you mean "user of your library")
FYI, the difference between coqc -native-compiler no
and coqc -native-compiler ondemand
is summarized in this array taken from the CEP.
so to sum up, if a library does not support (in a coq-native
switch) a whole native precompiling:
coqc -native-compiler no
and coqc -native-compiler ondemand
would be fine and prevent that precompilation;coqc -native-compiler ondemand
might be viewed as a "more expressive choice" (as the native_compute
tactic will still be available with its standard semantics, i.e., no fallback to vm_compute
)coqc -native-compiler no
might be viewed as a "more restricted choice" (as any native_compute
invocation will fallback to vm_compute
with a warning, and no temporary files .coq-native/*.cmxs
will be created)native-compiler
's extra expressivity is likely to be only useful in limited scenarios
and I was also a bit surprised when coq-native remained enabled on the Docker images... I was honestly hoping that now we'd get _flambda_ enabled instead, giving everybody 15-20% speedups
(probably that'd need discussion too for potential regressions)
I'm happy enough building my own images with my own opam package, but I'm sure many Coq users prefer otherwise.
standard Coq feature
@Théo Zimmermann I respect you deeply but I happen to disagree; native-compute should be labeled as experimental. ondemand
has extremely limited use-cases, and was IMHO never a good default; native-compute=yes
has specific use-cases and the recent changes should make them easier to experiment with. native-compute=no
works.
and to be clear, I'm interested in anything that can help performance, including native-compute! And I do hope more people will now be able to try it out!
Paolo Giarrusso said:
and I was also a bit surprised when coq-native remained enabled on the Docker images... I was honestly hoping that now we'd get _flambda_ enabled instead, giving everybody 15-20% speedups
All the Docker images that support flambda are indeed (already) flambda enabled and the default OCaml version is soon going to switch to 4.07, which is one of those versions.
@Erik Martin-Dorel It seems to me that we have received enough blacklash from users already to backtrack on the initial choice to inculde coq-native
in the default image starting with 8.13
It was a bit unfortunate that this was not discussed in the initial CEP as well with the "experimental status" of native-compute, but nobody thought about it at the time.
I also think @Erik Martin-Dorel should consider adding a post-installation warning of some kind for the coq-native
opam package, like "Experimental: compilation of Coq files may now fail unexpectedly."
Ideally, something like the following as well: "Coq file compilation failures after installation of this package should be reported to Coq developers, not to Coq project maintainers"
Well, they should actually be reported to the opam-coq-archive so that native-compute can be disabled by default to those packages.
this to me sounds like you're saying opam archive maintainers should shoulder the burden of fixing problems with native-compute. How are we supposed to do this if they are not using coq_makefile? There is no guarantee that just because we have a package, we can control arbitrary options sent to coqc.
Can't we have 1 more image (for one compiler with has no known shortcoming with native_compute) with coq-native and have it off in all other images (to cause no breakage)?
If we want to lint any set of packages w.r.t. a feature, I think it would be more polite to do it ourselves, running jobs with coq-native installed in our CI, not the user's one.
what I would like to be the case w.r.t. the coq opam archive: a maintainer of a package should have to explicitly opt-in to have his package be built with native-compute. If he doesn't do this, native-compute should not be possible to use with the package for regular opam users (expert users could still override)
this to me is the the only reasonable way to handle a feature that can arbitrarily break compilation/build
I think @Théo Zimmermann still has a point. If a user installs both A and B, and B requires coq-native, then A is recompiled, and it may be the case that A breaks with coq-native around. It is may hence be nice to explicitly turn off native in A's package, even if A is.
We can add a job to the opam archive to test if the package happens to break with native (even if it does not require it). @Karl Palmskog if you agree I can try to add this job. If the package uses coq_makefile, a fix in _CoqProjects may be sufficient. At worse, one can declare that the package conflicts with coq-native (this works for any build system).
@Enrico Tassi it's fine if you add an opam archive job with native-compute if this job does not affect the green/red status of CI
With just my hands, I can't. But maybe @Théo Zimmermann maybe can help.
I can make the job "allowed to fail", but then it would be unnoticed, unless a bot posts a message in the discussion informing about the thing and suggesting the PR author what he can do. You would still see the green light, and could decide to merge even if the package is not "clean" w.r.t. native.
Maybe there are better ways.
let's say that 1/3 of projects break with native-compile, this means we would have CI red 1/3 of the time when it should be green, which almost makes the green/red signal meaningless. I think this would be a big loss.
if the job is "allowed to fail" then it does not show up in the CI box and does not contribute to "being red". It is run, and ignored.
the bot could still do something useful out of its result.
I was proposing the bot to do something like an automatic review/comment based on the result of that specific job.
Or even propose a change. But I don't know at which extent this can be done, Theo knows.
Apparently there are API to put a review comment on a line (eg the bottom of the opam file). The bot could suggest to add conflicts: [ "coq-native" ]
when that job fails. WDYT?
Karl Palmskog said:
what I would like to be the case w.r.t. the coq opam archive: a maintainer of a package should have to explicitly opt-in to have his package be built with native-compute. If he doesn't do this, native-compute should not be possible to use with the package for regular opam users (expert users could still override)
If this was the objective, then the changes to Coq were not needed and the CEP could have been implemented with the strategy proposed for older versions of Coq. It's pretty sad that this position was not expressed during the CEP discussion.
@Enrico Tassi You can already add allow failure jobs. They will be reported with the neutral check (gray square). We are using them (and it works well) for async jobs on the main repo but it is also supported in the opam-coq-archive repo.
I can implement whatever you want in addition (such as suggested changes) but I need a clear feature request for that (tell me exactly what you want).
great, I will do that then.
if you give me some hints, I may give a try to patching coqbot. It's like the message for jason about the failure job, but it should be reported as review, rather than as a regular comment. I'd like to test if one can suggest "a patch".
@Théo Zimmermann can you elaborate on where the coq opam archive is mentioned in the CEP as the place (dumping ground) to handle native compilation issues by users?
Nowhere, but it is written in it that native-compute will become available for all packages without any action from the maintainer, and now you are saying it shouldn't be the case. It would have saved much trouble if you had expressed this opinion earlier.
I really don't care if native-compute becomes somehow available (it would have been available with the Coq opam compiler apprach as well). What I care about is who is going to take the fall and apply fixes when things arbitrarily fail. For example, it only now became apparent that 1/3 of the Coq CI doesn't support native-compute (and fails with an inscrutable error message like "stack overflow"). One would think this had been pertinent information when discussing the CEP
Disabling native-compute is supposed to be an easy step. Patching the opam archive to do it when we know a package fails is just a way of sharing effort between native-compute users. Just because you are a maintainer of the archive and you personally don't want to have to care about it doesn't mean that it's not useful or worthwhile.
but if it is worthwhile, where is the evidence? in this thread, we have had maybe 3-4 projects report some anecdotal benefit, and we have the 1/3 failed figure from Coq CI. Given this state of knowledge, I would still recommend everyone who is a Coq project maintainer to turn it off locally in their projects
I'm not a user of native-compute myself, so I cannot answer this, but it looks to me like there are a number of users in this thread that have repeatedly insisted on how important this feature is. (And they can probably point out to actual evidence of this.)
That it is not working well on all projects is certainly in part due to inherent limitations but also to the lack of sufficient testing so far.
Finally, why again would you still recommend every maintainer to disable it by default since the whole point of the CEP was to make it available by default?! If the objective was different, then it could have been achieved differently!
the recommendation would be based on:
I think it's very strange that Coq project maintainers would be expected to welcome features that do not come with any clear upside, and a large possible downside.
but with that said, this recommendation is based on the current state of knowledge, and so could change if benefits are actually shown in benchmarks, and there is no flood of error reports
but just to be clear, I'm not really interested in seeing the exact benefits for one or two specific projects, I'm interested in seeing figures for many (100+?) relevant projects, i.e., averages.
Given that native_compute
is a tactic you have to use explicitly, I don't see how you would magically see anything for a random project.
It seems that you are completely confused because you expect benefits on average like with flambda
when this has nothing to do with this.
The benefits are only for the users who wish to rely on native-compute (and will thus install coq-native
in their switch).
There are certainly no expected benefits on the build speed of average projects. On the contrary, it makes compilation a bit longer!
But nobody suffers from this if they do not install coq-native
!
That was the whole point of the CEP: let the user decide!
So yes, maybe (probably) it was a mistake to install coq-native
by default in the Docker images.
But this is beyond the current discussion which has shifted to a complete utter nonsense.
Well, let's see the good side: at least now many people (including us, coq devs) are aware of the breakage introduced by native compilation.
Yes, that was one of my point above (until now, it was not sufficiently tested).
I didn't confuse anything. We are talking about why or why not I recommend people to use certain options. If the benefits of including an option outweigh its costs, I recommend it. In this case, I judge that adding -arg -native-compiler -arg no
to _CoqProject
should be recommended, since it means you will not ever have to pay the costs of having failing native-compute builds, and your users will not complain about stack overflows when installing your projects.
most users will not be able to tell what coq-native
means, so they may install it by accident or use a Docker image where it is pre-installed (and I guess Nix Coq builds will have it as well). The option I recommend completely rules out any problems from this.
Karl Palmskog said:
we have had maybe 3-4 projects report some anecdotal benefit
Don't forget that up to now native_compute
was mostly unused because it was terribly inpractical to use (basically, everything had to be recompiled manually with specific options). So you cannot conclude to the uselessness of native_compute
from that observation.
@Pierre Roux this is not about concluding the uselessness of coq-native. It's about the situation for regular Coq project maintainers - I'm arguing that for most of them the new coq-native infrastructure will be a tax imposed from above without any personal benefit. Most will have no idea what coq-native is all about, yet they could see their projects fail in Docker and users claim the project fails compilation. The _CoqProject
option will allow them to avoid all these potential problems.
But then, they may experience complains from coq-native users that they disabled it, possibly for no reason. There seems to be no perfect solution.
it seems to me that coq-native users are generally savvy and highly sophisticated and can figure out workarounds, so it seems far more likely that user complaints will come about coq-native failures rather than absence of coq-native.
I agree that this is in the end a question of: whose interests matter more?
You may also consider that, looking at Coq CI, the projects that seems to fail with native compute are not the random proof project but projects doing complicated things, with for instance very large non opaque terms, those users woul probably also know what they are doing and know at least a bit about native compute. Again, the average user doing regular proofs is very likely not to be impacted (but its proof could be of use for others).
on average, it seems unlikely that the "transitive coq-native compilation benefit" will be realized for a project. The opam coq archive dependency graph is quite shallow - not many projects are used as dependencies. Moreover, I think this is like saying you should take one for the team - I should allow this failure vector so someone else I don't know about will gain.
We already had examples so it is definitely possible, how likely is impossible to know today, as already said. "unlikely" is only your personal opinion.
Hi all, I've just seen your comments − I guess the usefulness of the coq-native packaging was discussed in detail in the CEP, and for both use cases:
coq-native
, the new behavior speeds up their compilation of the coq
package, given the stdlib is not precompiled with native anymore)native_compute
tactic for computationally-intensive reflexive tactics (for these users as well, the situation is significantly improved by the existence of the package)But after installing the package by default in the Docker packaging, I wasn't expecting to get as much backlash, following these failures with the Coq.8.13+beta1 or coq.dev
builds… Anyway I guess what suggested Théo and Enrico looks a very sensible trade-off, very easy to setup and to document:
So I plan to start rebuild these images ≥ 8.13 from now on, but could you please comment on which strategy you'd prefer for older images?
(A) do the same for older versions (8.7 ≤ Coq ≤ 8.12) as for Coq 8.13+ (also leading to five different images, one with OCaml 4.07.1 + coq-native (albeit the "automatic propagation of the -native-compiler yes
flag" especially concerns Coq 8.13+))
(B) keep the current four images as is, i.e., with coq-native
, with the stdlib native-precompiled only, but allowing to native-precompile selected libraries later on, when the item 3 of the CEP is gradually taken into account.
the most depended-on package in the opam archive is coq-mathcomp-ssreflect
. Yet, it is only depended on by 32 packages. coq-ext-lib, a general library, is only used by 6 other packages. The upper bound on number of dependencies (and thus potential for transitive coq-native benefits) look terrible compared to the Java ecosystem, where packages regularly depend on 40+ packages (see this paper: https://www.cs.cornell.edu/~legunsen/pubs/GyoriETAL18EcosystemRTS.pdf)
I think A is safer.
I would also vote for (A)
+1 for (A): I had no crystal ball, but I also expected native-compute
to be 100% opt-in. native-compute support for package foo should probably be a feature request from users of foo who are actually interested.
Paolo Giarrusso said:
+1 for (A): I had no crystal ball, but I also expected
native-compute
to be 100% opt-in. native-compute support for package foo should probably be a feature request from users of foo who are actually interested.
Really? Then once again, the part about changing the default value of the coqc
-native-compiler
option in Coq would not have been necessary. You are the second participant to this CEP to express this opinion now.
BTW, option (B) is the most backward compatible for older Coq versions: since the stdlib used to be pre-compiled.
How can it be less backward compatible? If you have job for 8.9, then A says that that job will continue to download the same image as before, while B does not imply that. IIUC
Théo is right regarding backward compatibility (apart from macOS users). See the opam-repository PR that implements the CEP.
I will repost the URL.
OK, maybe I'm lost, no problem. Then I vote for "the option that has the least chance of breaking things for users not opting in".
Here is the link: https://github.com/ocaml/opam-repository/pull/17690/files#diff-655d9f2dcb2ffb68e9326406cdc3a01a43ba35d4e496a87f624f2063e61ab7dd
you can see that beforehand for Coq 8.12.0, we had this line:
"-native-compiler" {os = "macos"} "no" {os = "macos"}
so beforehand, native-compiler was unconditionally disabled for macos (which was maybe a too abrupt choice BTW, but this is another story),
and enabled otherwise (just for the stdlib for Coq < 8.13).
now:
"-native-compiler" "yes" {coq-native:installed} "no" {!coq-native:installed}
so if coq-native is not installed, the stdlib won't be precompiled for Coq < 8.13.
Thus strategy (A) in my previous comment would indeed lead to a non-backward-compatible change for the Docker images of these old Coq versions.
Personally, I'm very OK with both strategies (A) and (B).
But maybe if we have not a very strong opinion on this last choice, we could also defer this choice for, say, one week (keeping the strategy (B) for the moment and only remove coq-native from Coq 8.13+), then do the announcement that I was planning for the single-switch Docker migration, and maybe applying (A) next week depending on the feedback we might get in the meantime? WDYT?
Non-locality of the depext setup was indeed discussed in the CEP, and it is a problem. Unfortunately if you disable native
in a package locally you get a broken switch in the moment some other package depends on it. The only solution I can see with opam is to actually add a conflicts: coq-native
constraint.
Note that all the breakage we are seeing is due to either a) lack of memory or b) small default stack size, [which can be fixed with ulimit
], so indeed I would expect coq-native users to bump ulimit by quite a bit if they want to use it. I'd conjecture that for a large enough ulimit the whole archive should compile.
Note that native + flambda will give nice speedups but will eat even more stack, so IMHO it is reasonable to keep that in mind. coqc
tries to workaround this by unconditionally passing -O classic
to OCaml, this effectively removes many benefits from flambda but provides a safer setup. Super-specialized users would like to tweak Coq so -O 3
is passed instead, as they could likely reap on the benefits.
By the way flambda provides speedups on the range of 2%-18% perfect, just to mention that not everyone will greatly benefit from it, but indeed quite a few users see a speedup.
Hi Emilio, just to be sure, could you elaborate a bit on what you mean by "Non-locality of the depext setup" ?
Erik Martin-Dorel said:
Hi Emilio, just to be sure, could you elaborate a bit on what you mean by "Non-locality of the depext setup" ?
Non-locality means that a package A
may pull coq-native
, forcing recompilation of unrelated package B
with different options that B
is not aware.
OK thanks! (actually I guess you meant the "non-locality of the depopts setup")
Anyways I'd be pragmatic here as I think we'll have a better setup for 8.14; as @Erik Martin-Dorel is discovering touching anything related to build setup is a hairy issue as I've lived on my on flesh :) sending courage as this is hard and exhausting work :)
Thus we could:
ulimit
, I'd use 4.11 but indeed it is not clear what is the best setupcoq-native
opam is installed about the need to bump ulimit
coq-native
for packages that won't compile even with large ulimit
Erik Martin-Dorel said:
OK thanks! (actually I guess you meant the "non-locality of the depopts setup")
Oh yes, sorry :/
This way coq-native
seems pretty opt-in as not to cause a lot of harm in non-expert users, and we would have a good understanding for the future setup with split packages: basically, packages that have the conflicts would not be a candidate to provide coq-pkg-native
in the split setup.
The split setup would signal a bit better what packages cannot be compiled yet with native.
Note that I have submitted https://github.com/ocaml/dune/issues/4020 which will make this process automatic for packages using dune
but if the split-native approach is ready for 8.14, note anyway that the coq-opam-archive packaging maintenance overhead that is induced by that approach should be solved beforehand…
coq_makefile should not be a lot of problem to handle, but indeed the split setup will still have some hurdle as maintainers will have to enable support explicitly, but I much prefer that as IMHO it is reasonable that package maintainers do test native before enabling it in their packages.
Erik Martin-Dorel said:
but if the split-native approach is ready for 8.14, note anyway that the coq-opam-archive packaging maintenance overhead that is induced by that approach should be solved beforehand…
Not sure I fully understand, please feel free to correct me. Ideally, the migration to a split setup would be opt-in
in the sense that before coq-native
, after you implement the split setup, no package will provide a native setup
but coq
with coq-stdlib-native
then you could update packages as you are able to test them
in a one-by-one basis
the thing is that indeed I think that automatically having all the packages compile with native seems a bit too risky for me
in the sense that if you do that, you are not testing them beforehand
I think a slower but less risky setup is to enable packages as you test them
but indeed how to make the life of packagers easier I dunno, for dune that will be automatic, but other people with opam, I guess they should update from a template? [That's how it works today right?]
I also think that the split-native approach is very nice, but from the opam-coq-archive perspective, this adds many packages to maintain!
(Karl already mentioned that tweaking the existing opam
specs for non-native-compatible packages might be a concern, so it will be a yet more complex task if all native-compatible packages have to be duplicated for the split-native approach!)
This could be solved IMHO just if the generation of the coq-lib-native
packages can be automated.
I'm trying to add a job to the opam archive so that we can test native when people submit packages https://github.com/coq/opam-coq-archive/pull/1504
can someone remind me of a package that fails? (to test the infrastructure)
Erik Martin-Dorel said:
I also think that the split-native approach is very nice, but from the opam-coq-archive perspective, this adds many packages to maintain!
This could be solved IMO just if the generation of thecoq-lib-native
packages can be automated.
Yes indeed, it should be automated. I don't think it would be very hard to update the opam templates to do so, but I am not very familiar. For the dune-based packages they'll know what to do.
Thanks Enrico!
Enrico Tassi said:
I'm trying to add a job to the opam archive so that we can test native when people submit packages https://github.com/coq/opam-coq-archive/pull/1504
can someone remind me of a package that fails? (to test the infrastructure)
I guess there is one of the packages mentioned by @Christian Doczkal?
(but I've just seen you already ping'ed him on GitHub :)
@Emilio Jesús Gallego Arias Regarding the flambda topic and to follow-up a remark by @Paolo Giarrusso, it is true that it's only partly supported in docker-coq (and there is an opened issue here: https://github.com/coq-community/docker-coq/issues/10): we could indeed pass the dedicated option -flambda-opts '-O3 -unbox-closures'
to coq's configure,
but I believe setting these options is not a specific issue to docker-coq but to the opam packaging, and could be promoted as a default… (as on non-flambda swiches, this option would be a no-op!)
Actually note that I had first proposed to do the two changes (coq-native and flambda-opts) in one go in the following PR:
https://github.com/ocaml/opam-repository/pull/16887 (now-closed); but I decided to look after these two enhancements one at a time, all the more as Théo had suggested to open the coq-native CEP to foster the discussion…
So regarding -flambda-opts
do you confirm @Emilio Jesús Gallego Arias that proposing another PR in opam-repository soon doing this, would be the way to go?
cf. https://github.com/ocaml/opam-repository/pull/16887/files#diff-655d9f2dcb2ffb68e9326406cdc3a01a43ba35d4e496a87f624f2063e61ab7dd
finally I know that @Paolo Giarrusso and you have done many benchmarks on "with or without flambda"; but would you advise me to have a look in a particular place if you gathered some of your benchmark infrastructure on that flambda topic, in a given repo? anyway, getting as Emilio mentioned, a speedup of "up to 18%" in the default setting would be great…
@Erik Martin-Dorel thanks for reminding me; enabling flambda in opam packages was indeed the plan, which the coq-native change now unblocks.
FWIW we’ve been saying no to native and yes to flambda for around 6 months throughout our company and CI with 4.07.1-flambda
(with limited deps, since we mostly use iris and template-coq)
These flags are already the default on Coq dune builds, and indeed make sense to pass always, I believed it was the case actually.
Note that these flags are for compilation of Coq itself, and they have no relation to OCaml flags used in native-compute
speedup comes from less allocation in the proof engine I think mostly
so flambda is a kind of mechanized @Pierre-Marie Pédrot
But yeah, flambda is a big pain to use, and with native we have a bit of the same setup now so we will find the same problems as noted in the CEP
yes Emilio, I know that the flambda flags are orthogonal to native-compute (modulo the issue you recall in your last comment); my only question was about the need to update the coq packaging in opam-repository (which does not rely yet on dune)
I thought that was already done
not as far as I know
No, because despite "no relation", flambda + native is measurably slower than the default for some reason.
We've been working around the relevant issues for ~one year, the goal here is to work around them better :-)
yes, but what would be a good benchmark suite for that in your opinion?
Anyway, Emilio is for the change, and my concerns are addressed
because if we focus on native_compute, you know there are two times that can be "benched" (the compilation time and the running time)
ah; I have some benchmark data on github coq issues, I can dig into it again if we adjusted the question.
Paolo Giarrusso said:
No, because despite "no relation", flambda + native is measurably slower than the default for some reason.
Oh indeed, yeah that's the problem with flambda's all-or-nothing approach
Yeah even if we pass -O classic
, flambda is slower compiling OCaml files
for compiling Coq that is not problem, but it is a problem if you have native
Good point Erik, my benchmark were just about "I never write native_compute, why do I pay for it"
(or rather, I just enabled flambda, got slowdowns, and investigated. That's the yak we're still shaving)
yeah , I always disable native except when building binaries for distributions
or of course if I was doing some native_compute proofs, and Dune will explicitly disable it when in dev mode
So given the current state of the discussion, I guess we could just do this in the coq opam-packaging?
pass the -flambda-opts "-O3 -unbox-closures"
option only if OCaml >= 4.07 and coq-native
is not installed…
(having ideally up-to-date benchmarks for Coq 8.13+ and Coq < 8.13 that confim again this is the way to go :)
I'd directly conflict with anything less than 4.07
Oh but that can't be done, sorry, never mind
no indeed, the conditions don't help I think
Coq should conflict with flambda < 4.07
this is ocaml-variants.*
even if you don't pass the options, you'll have trouble
if native is enabled
when < 4.07
so coq-native should conflict with that right?
in >= 4.07 there is no problem with coq-native
other than you pay the price
a bit more price in my experience
Emilio Jesús Gallego Arias said:
Coq should conflict with flambda < 4.07
I see what you mean. That's also the reason why there is no ocaml < 4.07 with flambda in docker-coq
yup, actually the conflict should be of coq-native with flabmda variants < 4.07
in opam
other than that, all the configs are supported
but indeed you pay for what you use
the cost of coq-native + flambda is quite small when compared to the cost of coq-native alone
but still not insignificant, in particular there are some problems with flambda middle-end that impact Coq
and unfortunately as this is a compile-time option, you are stuck with ti
with it
and either one is still a cost you don’t want as default, right?
@Erik Martin-Dorel in any case, this is the other reason we’d want +flambda-native images as default: best performance
well, in my build setup, I always call coqc with coqc -native off
except when releasing
so I don't care too much, it is like dune will only build OCaml byte objects when you call @check
Paolo Giarrusso said:
Erik Martin-Dorel in any case, this is the other reason we’d want +flambda-native images as default: best performance
I know, currently the default is 4.05.0 ...
as the default outside 4.05
Yes.
regarding opam indeed you have no control over what compiler the user installs, so coq-native should conflict with broken compilers
Enrico Tassi said:
I'm trying to add a job to the opam archive so that we can test native when people submit packages https://github.com/coq/opam-coq-archive/pull/1504
can someone remind me of a package that fails? (to test the infrastructure)
I looked at the logs of coq-graph-theory
: it fails with 4.05, but with 4.07 it apparently works, so the failure is not (only) related to coq-native.
with coq makefile you could hack this by having a make dev
call, that would explictly add -native off
and having make do the full build
@Enrico Tassi can you paste the link to the log please?
This is the link given by @Christian Doczkal yesterday https://github.com/coq-community/graph-theory/pull/12/checks?check_run_id=1530975359
Grazie
That's a low ulimit
Question: is there a package known to fail with coq-native on 4.07 ?
@Enrico Tassi have a look at Coq's gitlab file, we already bump ulimit there otherwise flambda wouldn't work
@Enrico Tassi failure is relative here as it usually comes from a) not enough memory b) low ulimit
@Emilio Jesús Gallego Arias to sum up what you said, we should mark coq-native conflicts with +flambda < 4.07 (but keep it available for 4.05 I guess)
so depending on your worker setup you can get different results
I do recall seeing a true native-compiler bug somewhere
but that's rare
This may be a suggestion for the docker images or the actions, we can't ask users of coq community templates to bump the stack
@Erik Martin-Dorel yes, in principle it should be available in 4.05, however if you ask me, and based on my memories of discussions about OCaml bugs, I'd just require 4.07.1 for coq-native as the minimum version
there was this flambda bug, but there were other problems IIRC
@Enrico Tassi I meant for the CI
IIRC the ocaml bug you had mentioned dealt with 4.06+flambda
stack is a big problem for users, as even different distros carry different defaults :S
Maybe we could be "bold" and say that Coq 8.13 recommends 4.07? (and drop 4.05 images, and co)
@Erik Martin-Dorel there were more IIRC
4.05 images without coq native could be very useful to test against lower bounds
and even with coq-native
in principle it should work but I seem to recall there was some other problems
it is basically untested
have the configure script accept 4.05..4.07, but warn, and be happy with >= 4.07 ?
so you would be the beta-tester for 4.05 + native [except for the stdlib]
no idea what you will find
but if we disable coq-native for 4.05.0, we won't be able to have native_compute for coq 8.5 and Coq 8.6 anymore:
* Coq dev: 4.05, 4.07-flambda, 4.10-flambda, 4.11-flambda
* Coq 8.13: 4.05, 4.07-flambda, 4.10-flambda, 4.11-flambda
* Coq 8.12: 4.05, 4.07-flambda, 4.10-flambda, 4.11-flambda
* Coq 8.11: 4.05, 4.07-flambda, 4.10-flambda, 4.11-flambda
* Coq 8.10: 4.05, 4.07-flambda, 4.08-flambda, 4.09-flambda
* Coq 8.9: 4.05, 4.07-flambda, 4.08-flambda, 4.09-flambda
* Coq 8.8: 4.05, 4.07-flambda, 4.08-flambda, 4.09-flambda
* Coq 8.7: 4.05, 4.07-flambda, 4.08-flambda, 4.09-flambda
* Coq 8.6: 4.02
* Coq 8.5: 4.02
* Coq 8.4: 4.02
(this array was taken from https://github.com/coq-community/docker-coq/wiki)
Yeah if you want to support that versions it's fine, just be aware that you will be testing previously untested stuff
so you could find no problems, or a lot of them ;)
I don't think so: as said previously, Coq 8.5 and Coq 8.6 were already precompiled with native
only the stdlib
that's a very small test
I know, but this is precisely the question @Théo Zimmermann raised, regarding backward compatibility
Anyways we should put some pressure to OCaml devs so they fix the bugs that are making OCaml work bad in our use case, or fix our compilation scheme :)
That would make our life much simpler :)
Actually fixing these stack overflows in the OCaml compiler is tedious, but not so complex
(a) is 4.07.0 tested? Emilio always says 4.07.1 (b) how much should we be evolving 8.5 and 8.6?
4.07 was tested by me more extensively
indeed by patching coq to always generate native files
I saw huge memory footrprints here and there
but things seemed to work
provided I increased stack + many GiB of RAM
I added to the next Coq Call the item: "Coq 8.13 require 4.07". From this discussion, we don't lose much by dropping 4.05.
What do we gain tho?
this thread would not have existed, for example .
Umm, I think this thread has little to do with it.
Note that buster is still in 4.05.0
The problem by @Christian Doczkal is that the default image uses 4.05 + native. If you are on 4.07 his package works even with native
So to put all things together, regarding the docker-coq flambda/native packaging, let me recap the plan:
Don't do the same as 1. and 2. for all other Coq versions now, because it would break backward compatibility without any announcement,
but maybe do it in one week;
Submit a PR to mark coq-native conflicts with flambda < 4.07;
Later on, we may submit a PR to enable -flambda-opts "-O3 -unbox-closures"
as soon as the switch is a flambda variant
(BTW Emilio how would you quantify the impact of 6, w.r.t. a "mere" 4.07.1+flambda switch without these two options?)
(post edited; this item added) Add ulimit -a
in the default CI code for informative purpose.
That's a packaging problem, but it has little to do with 4.05
but with enabling native
you could use 4.07 and Christian problem would have been the exact same
on the wrong compile
I'm testing it and it works with 4.07 and coq-native.
if Docker maintainers chose to use a untested setup as the default, what can we do, 4.07.1 is only tested with native in my own workstation, so that's not a so great improvement
I'm testing it and it works with 4.07 and coq-native.
It may work, it may not, for example it could work on 4.07 and fail in 4.08 as the compiler did regress on some internal functions AFAICT
stack usage is different indeed, but not guaranteed
I mean OCaml's stack use
OK, I agree we should test things. But we should decide on what. I'm adding a CI job on the opam archive so that we can detect packages that don't compile with coq-native. Which compiler should I pick?
Good question @Enrico Tassi , that's a hard problem
Consequently docker image will use what is tested.
as far as I know, stack use of OCaml among versions is not stable
there are a few bugs open upstream
so, to my understanding, we know that 4.05 has many problems. Hence my proposal to ditch it.
Emilio Jesús Gallego Arias said:
if Docker maintainers chose to use a untested setup as the default, what can we do, 4.07.1 is only tested with native in my own workstation, so that's not a so great improvement
can you be more specific? was it related to my latest comment recapitulating the plan?
nope @Erik Martin-Dorel , it was an ansewr to enrico
Enrico Tassi said:
The problem by Christian Doczkal is that the default image uses 4.05 + native. If you are on 4.07 his package works even with native
can you show also the URL of the log that succeeds with 4.07 ?
@Enrico Tassi for example 4.08.0 may have [and indeed I think it does] the same problem of excessive memory use
should we ditch it?
memory / stack use for compilation among OCaml versions varies a lot
https://gitlab.com/coq/opam-coq-archive/-/jobs/902857143
(but it is not using docker)
even the workers you get may vary on this, making CI unpredicatable
Emilio Jesús Gallego Arias said:
should we ditch it?
yes
OK thanks Enrico
and OCaml upstream answer was "we don't care, that's not an use case we support"
Enrico Tassi said:
Emilio Jesús Gallego Arias said:
should we ditch it?
yes
Seems a bit weak criteria, in the sense that OCaml version A requires M amount of memory to compile a quite random file, then OCaml version B require M+1 memory to compile that random file, we ditch A ?
What happens if for example version B is more efficient in some other file.
What if my distro has a more restricted stack size?
My view is simple, and in agreement with OCaml devs position on this one: if you wanna use native, get ready to increase memory limits and total memory .
or.... patch the OCaml compiler to fix these memory issues.
I can easily construct a file that will fail in 4.07 too
This is not my point. There are Coq users that don't care which version of OCaml they use, they just want Coq to work. For them, we should chose a good version, and test that version as much as we can. If you wan to compile Coq with a version of OCaml of your choice you can, but in the docker images we provide for testing, and in the binary installers we provide, and in the installation scripts we provide (the platform) we should pick sensible versions.
one gain of requiring >= 4.07: extraction can stop emitting code that is deprecated (https://github.com/coq/coq/issues/11359)
Enrico Tassi said:
This is not my point. There are Coq users that don't care which version of OCaml they use, they just want Coq to work. For them, we should chose a good version, and test that version as much as we can. If you wan to compile Coq with a version of OCaml of your choice you can, but in the docker images we provide for testing, and in the installers we provide (eg the platform) we should pick sensible versions.
I agree, but how this is related to us supporting some versions? We support 4.08 but we don't recommend it.
./configure
could accept but warn. And starting from that we could stop providing images for versions we know don't work well for us to avoid users discover it themselves.
the obvious drumbeat for OCaml version choice is the Coq Platform. Michael settled on 4.07.1 for good reasons.
I personally was not aware of so many (performance/memory) differences in these ocaml compilers. I can only imagine Coq users which don't even know how to code in OCaml.
The example you provide is pretty fragile, you cannot generalize on memory behavior just from one particular file. There are files where 4.05.0 will use less memory. This whole thing is very misguided IMHO.
I was aware, and I warned in the CEP .
But we are derailing here really.
coq-native should be opt-in
for now
I guess I did not understand your warning then
PMP also warned
as of today, even 4.12.0
does require huge amounts of memory in some situations
see the discussion on the OCaml bug tracker
so we cannot fix this by using a particular version
of OCaml, all versions will eat lots of ram and stack with native
that's my understanding
I think PMP and I were pretty clear [and we have been involved in the upstream discussion]
It is great we can try to test a large amount of devs in the CI , I fully support that, and I supported the CEP in the end because even if far from perfect we need to make progress. But unfortunately , unless there is significant work on the OCaml compiler or in our compilation strategy, coq-native will still be quite expensive for the regular user.
Also, we have a way more critical bug which is that the compilation scheme may stop working with newer OCaml versions.
I think it was clear that there were problems. It was not clear to me that 1) docker images would have opt-in by default 2) problems were varying depending on the compiler version and custom settings (like the stack).
Anyway, I think the scenario is clear now.
Yup, that's clear, would be great if we could understand which version is best for native
I think that 4.07 may indeed be better than 4.08-4.10
There is plenty of reports of that problems upstream, see https://github.com/ocaml/ocaml/issues/10072
For what it's worth, I just reran the graph-theory
CI using the mathcom/mathcomp-dev:coq-dev
image, which is based on ocaml-4.07.1+flambda
and I still get a stack overflow: https://github.com/coq-community/graph-theory/runs/1532521296
@Enrico Tassi and @Karl Palmskog If you want to "deprecate" 4.05, this page should be updated: https://coq.inria.fr/opam-using.html
Christian Doczkal said:
For what it's worth, I just reran the
graph-theory
CI using themathcom/mathcomp-dev:coq-dev
image, which is based onocaml-4.07.1+flambda
and I still get a stack overflow: https://github.com/coq-community/graph-theory/runs/1532521296
Hi @Christian Doczkal that's explained because I had only propagated the removal of coq-native
in coqorg/coq:dev
and mathcomp/mathcomp:latest-coq-dev
(manual rebuild), but not mathcomp/mathcomp-dev:coq-dev
yet, sorry… I'll take care of this just now. Thanks for your comment!
(Note: this can be checked thanks to this line:
https://github.com/coq-community/graph-theory/runs/1532521296#step:4:295 )
@Erik Martin-Dorel , the point was that @Enrico Tassi had claimed that the development builds correctly, even with coq-native
enabled, on ocaml-4.07
.
Yes but IIUC, the only difference between the build that fails or not seems related to the default stack size, which could differ in opam-coq-archive CI, and GitHub Container Actions…
As recalled by Emilio with the issue he mentioned from the ocaml bug tracker, this is a typically known bug related to the use of ocamlopt w.r.t. large codes, automatically generated or so.
@Emilio Jesús Gallego Arias BTW do you think it would be useful (without modifying anything for now, but at least for informative purpose) to add the following command somewhere in the default CI scripts?
ulimit -a
Erik Martin-Dorel said:
Hi Christian Doczkal that's explained because I had only propagated the removal of
coq-native
incoqorg/coq:dev
andmathcomp/mathcomp:latest-coq-dev
(manual rebuild), but notmathcomp/mathcomp-dev:coq-dev
yet, sorry… I'll take care of this just now.
FYI @Christian Doczkal the build is now on-going: https://gitlab.com/math-comp/math-comp/-/pipelines/228167390
(so feel free to restart your build involving mathcomp/mathcomp-dev:coq-dev
as soon as this pipeline is completed).
anyway for the next time, this will be automated since I've merged https://github.com/coq-community/docker-coq/pull/21
I guess the number of variables made it hard to identify the culprit with reasonable certainty. I think we should try to find a way to reduce that, at least the number of variables visible by users of standard stuff.
@Erik Martin-Dorel sure, we didn't care so much as of today given that SO where rare, but it cannot hurt. We could even do a check in the native compiler for current stack size, but these issues are kinda tricky in general as there is no agreed upon recipe w.r.t. what the OCaml compiler needs.
Théo Zimmermann said:
Paolo Giarrusso said:
+1 for (A): I had no crystal ball, but I also expected
native-compute
to be 100% opt-in. native-compute support for package foo should probably be a feature request from users of foo who are actually interested.Really? Then once again, the part about changing the default value of the
coqc
-native-compiler
option in Coq would not have been necessary. You are the second participant to this CEP to express this opinion now.
good question, and I've thought about it. Back then, the plan was that native-compute
would be opt-in, but we meant _for users_: this was an experimental feature that impacted you even when you didn't use it. We estimated _some_ pain for maintainers, but it only was creating a second package, and the goal of the discussion was limiting that pain.
my main goal in the design was to get native-compute completely disabled for everybody, since today that's still the correct default (IMNSHO)
also, it's the correct short-term default for me, and the one I've been using for a while (https://github.com/Blaisorblade/opam-overlay/tree/master/packages/coq).
However, I/we anticipated this would make it easier to _enable_ native-compute for interested people, so more people could try it out.
at least I did not anticipate anything like this, but it should absolutely not be surprising that a not-fully-tested feature is not production-quality when tested at scale on millions of lines of code.
TLDR: I suspect this pushback, while unfortunate, is in fact a _success_ of the CEP: it's much easier to try out native-compute and discover if it helps.
I think it's a cool feature, and it's cool "the first 90%" (or the POPL 90%) of the work is in, but the engineering/integration is not trivial, and it has been undersestimated for long. Hopefully now it'll be easier to find and address the bugs that pop up, and exploit it where it already works. It might even count as motivation.
Coming to think of it, not testing (and hence not failing due to) coq-native
in the CI images may actually help people who want to use native compilation: It means that library maintainers will not be encouraged to disable native compilation and then forget about it, locking their users out of native compilation. And people who really want to use native compute with some library can install coq-native
and then tinker around with specific versions of OCaml, custom ulimit
, and whatnot. So thanks @Erik Martin-Dorel for the the quick reaction. CI is green again :smile:
So trying to sum up:
coq-native
packageDo we need to do something about the last point? I see a few potential improvements here:
coq-native
stating that this is an experimental feature and users should expect some failures-native-compiler no
or uninstall coq-native
if the user didn't really intend to run native compilation.coq-native
if they don't want to hear about native compilationAbout the last point, a potential implementation would be to have the CI of the OPam archive complain if any package doesn't contain either a conflict clause with coq-native
or some meta-information telling the package is somewhat expected to be native-able. The CI could suggest to put the conflict clause if the maintainer doesn't know/want to bother about native. It seems that something similar will be needed anyway for the split approach to know which *-native packages to generate or not.
My plan to mitigate 3 is https://github.com/coq/opam-coq-archive/pull/1504, that is to know in advance which packages are problematic for coq-native and tell their upstream authors to either make these package conflict with coq-native or put, upstream, a native-compiler off option.
Of course it would make sense to coordinate the tests we run on the opam archive with the docker images and the platform installers, so that we ship/advertise what we actually test. I did put an item about this at the next Coq call.
I am against anything that would require package maintainers to deal with coq-native
or make any decisions about it. Package maintainers can choose to do so, in order to forestall complaints, but as long as users who install coq-native
are informed of the experimental nature of their setup, I see no point in bothering package maintainers, in particular given that changing the ocaml version and various settings may make a library work with coq-native
even if it doesn't work reliably with different settings.
The plan (to be executed) is to have a bot make a review that suggests a little patch + conflicts: [ "coq-native" ]
when the package you upload is found to not work with coq-native. I'm aware that "work" is not very well defined, hence the coordination I hope to obtain.
Is this plan "too much" from the perspective of a package maintainer publishing on the coq opam archive?
I'm of course not against warning users that install coq-native... on the contrary
Well, I'm relatively relaxed about the bot suggesting a conflicts
clause if the library fails to build with coq-native on all versions of ocaml (or a representative subset suggested by the coq developers) with settings that already try to make things work (i.e., by increasing the stack size). What's less clear to me is whether one should add a conflict clause if the package fails to build on some and succeeds to build on other compilers. In any case, it's good if maintainers won't have to change anything in their repositories, so that compatibility with coq-native is always tested when releasing to the archive. Also, would suggest to only add conflicts for the released
repository. In summary, I suggest the meaning of conflicts: [ "coq-native"]
to be is known to (relibably) fail rather than is not known to reliably work.
Let me add that one really willing to use native with a package that conflicts with coq-native can tell opam to ignore that constraint.
So I'm not super convinced of the meaning you suggest, but well, without deciding what we consider as "supported platforms" it not really obvious to give a meaning to "reliable" either.
Btw, did someone figure out if docker has a constrained stack? (on my PC it's 8K, and it has been like that since I could remember)
It might be due to the GitHub Actions default settings. I haven't looked into that.
I see a few potential improvements here:
- putting an explicit message when installing
coq-native
stating that this is an experimental feature and users should expect some failures- modifying the error message for stack overflows, suggesting to experiment with ulimit or to set
-native-compiler no
or uninstallcoq-native
if the user didn't really intend to run native compilation.
:+1:
@Enrico Tassi As I said, as long as maintainers are not forced to disable (once and for all) native compilation by putting some option into their _CoqProject
file, I'm okay.
@Théo Zimmermann how will native-compile be handled in Nix and the Nix CI? if native conflicts is handled at the opam level, Nix packages will require something separate but analogous, right?
Not sure the conflict is a good idea per-se, in the sense that 99% of failures of packages with coq-native are going to be due to low stack size.
So IMHO we need something that makes people installing coq-native to be aware that they needs lots of RAM and stack
but in principle I wouldn't call that a conflict
This is not so different from some packages already requiring huge amounts of RAM even without native
@Karl Palmskog In Nix, we have never set the -native-compiler
flag explicitly. Since ondemand
is the new default, it means that we preserve backward-compatibility. If there are users that want something similar to coq-native
, it will have to be implemented (it's not the case today).
Emilio Jesús Gallego Arias said:
So IMHO we need something that makes people installing coq-native to be aware that they needs lots of RAM and stack
My problem is that I still don't know what you mean by "a lot".
I've just run unimath with coq-native https://gitlab.com/coq/opam-coq-archive/-/jobs/905112076 on a regular runner.
TTBOMK unimath should be the antichrist for coq-native, but ti works...
How many GB of RAM do you have?
There is no clear definition of "a lot", as it depends on the internals of the OCaml compiler, so it is not even stable by versions. For example some code will stress the register allocation, some other will stress the constant propagation. Large modules tend to be a problem too. I'm not sure why Unimath should be bad for coq-native, likely the opposite.
But indeed code with lots of modules for example tend to make the OCaml compiler go hungry.
But that could change next version; anyways I'm not trying to figure out the heuristic, but my point is that packages that don't work, they fail because of the memory setup, so that is not an intrinsic conflict, but more of a circumstantial one.
As stack size is not even consistent among distros of Linux
I'm not sure why Unimath should be bad for coq-native, likely the opposite.
Shitload amount of huge Defined proofs.
I can't find it again but there was a bench with coq-native on, and unimath was the worst slowdown IIRC.
Maybe proofs there are already indeed filtered as to be manageable?
Could be slow but not take a lot of RAM
YMMV
tho, I'm gonna finish the coqnative
patch for Dune, this way Dune will track ocamlc
invocations including memory and time
so we can get that data precisely
UniMath is a very good outlier on benches because it's literally the one development that consistently goes against all recommended good practices in Coq dev.
In the past 2 days I've been trying to reproduce these native failures users are experienciong and we could foresee... and I'm failing miserably (which is a good news to me, but I'm frustrated).
I tried to run Coq master on 4.07 and 4.09 with and without flambda with coq-native installed on unimath and fiat-parsers... and it works on a worker with 4G ram and 8K stack. What am I doing wrong? (I'm not using docker, which may be a relevant, and I did not test 4.05).
https://github.com/coq/opam-coq-archive/pull/1504
I've since disabled the native compiler on fiat-parsers, I can point you at a commit before that if you like.
Last updated: Jun 08 2023 at 04:01 UTC