Stream: Coq Platform devs & users

Topic: Platform stats


view this post on Zulip Karl Palmskog (Feb 17 2023 at 05:04):

One reason we need to publish continuous statistics about the Coq ecosystem: https://youtu.be/AaE1mV27XxA?t=1060

per latest stats, Isabelle AFP is at something like 3,75M LOC, and Mathlib at something like 1,1M LOC. Yet the perception to someone like Harrison is apparently that they are at parity or beyond what Coq has. My estimate is that the extended MathComp ecosystem is probably close to Mathlib, and Coq ecosystem probably at several million lines above AFP. But of course, you can play around with definitions what is an "archive" of formal proofs.

view this post on Zulip Karl Palmskog (Feb 17 2023 at 05:07):

sources:

view this post on Zulip Guillaume Melquiond (Feb 17 2023 at 10:30):

Yes, thecoq-mathcomp Opam packages (at least those compatible with Coq 8.16) amount to 0.9 Mloc.

view this post on Zulip Michael Soegtrop (Feb 17 2023 at 12:09):

For all explicitly installed packages in Coq Platform 8.16.1 the stats are:

-----------------------------------------------------------------------------------
Language                         files          blank        comment           code
-----------------------------------------------------------------------------------
Coq                               9354         354270         223029        2608675

view this post on Zulip Michael Soegtrop (Feb 17 2023 at 12:12):

There is a script here https://github.com/coq/platform/blob/main/maintainer_scripts/count_lines_of_code.sh which copies all source code of all packages into a subfolder - one can then run cloc or other tools on it.

view this post on Zulip Karl Palmskog (Feb 17 2023 at 12:14):

at least Platform releases are well-defined and for single Coq version. Similar to Coq's CI. For the opam archive, the story is less clear, e.g., one would have to ask opam's inference mechanism to figure out all package claimed to be compatible with a certain Coq version.

view this post on Zulip Karl Palmskog (Feb 17 2023 at 12:15):

if I remember correctly Coq's CI is around 2.5M LOC, but has a lot of overlap with the Platform

view this post on Zulip Michael Soegtrop (Feb 17 2023 at 12:20):

It would be an interesting exercise to try and compile all coq-* packages which claim to be compatible with latest Coq.

view this post on Zulip Michael Soegtrop (Feb 17 2023 at 12:21):

@Karl Palmskog : if you have a suggestion on where to publish it, I can fur sure publish the data for Coq Platform releases.

view this post on Zulip Karl Palmskog (Feb 17 2023 at 12:24):

@Michael Soegtrop how about just including the basic LOC from above (cloc with some small polish) in the release notes for releases? E.g., a section "Statistics" at the bottom here: https://github.com/coq/platform/releases/tag/2022.09.1

view this post on Zulip Karl Palmskog (Feb 17 2023 at 12:25):

then we could link to this directly, and it would also end up on Zenodo if we have something there. To me at least, it would suffice to have LOC for the "latest" package pick.

view this post on Zulip Karl Palmskog (Feb 17 2023 at 12:27):

for example, a machine learning researcher could then write:

We train our model on the Coq Platform 2022.09.1 with the package pick for 8.16.1, which has 2.6M LOC [CITE]

view this post on Zulip Guillaume Melquiond (Feb 17 2023 at 13:24):

Karl Palmskog said:

at least Platform releases are well-defined and for single Coq version. Similar to Coq's CI. For the opam archive, the story is less clear, e.g., one would have to ask opam's inference mechanism to figure out all package claimed to be compatible with a certain Coq version.

You can do the following:

opam list 'coq-*' -s --columns=package --coinstallable-with=coq.8.16.1

view this post on Zulip Guillaume Melquiond (Feb 17 2023 at 13:33):

This gives a total of 10Mloc. But some curating would be needed to get accurate numbers. For example, Flocq is counted more than ten times due to vendoring.

view this post on Zulip Guillaume Melquiond (Feb 17 2023 at 14:50):

I did not know, but cloc is actually able to do some curating on its own, which gives about 4.2Mloc.

view this post on Zulip Karl Palmskog (Feb 17 2023 at 14:53):

OK, so just the opam archive for 8.16.1 seems to have more LOC than AFP by 0.5M LOC or so. But then we have many projects, e.g., from Coq's CI, that are never packaged (or have only been packaged for earlier versions)

view this post on Zulip Guillaume Melquiond (Feb 17 2023 at 14:58):

To be fair, AFP and Mathlib have a rather high bar on entry, so we should not count Coq developments that were never packaged or are no longer supported.

view this post on Zulip Karl Palmskog (Feb 17 2023 at 15:01):

not even if they are in CI? I'm thinking of projects like https://github.com/mit-pdos/perennial

view this post on Zulip Guillaume Melquiond (Feb 17 2023 at 15:09):

This is a good example of why Coq's CI is a poor metric. The repository/branch you point is only compatible with Coq 8.13, opam-wise. So, users will have to go through hoops and loops to install it.

view this post on Zulip Karl Palmskog (Feb 17 2023 at 15:14):

for something like machine learning applications, you can just put some dev time into custom builds on top of existing repos. Whatever you cobble together is your corpus... but for a paper, I would probably split a table with opam vs. CI and have a row with the total. Hard to complain scientifically then.

view this post on Zulip Karl Palmskog (Feb 17 2023 at 15:16):

anyway, these are the claims we are seeing now (https://arxiv.org/pdf/2006.09265.pdf):

We build a non-synthetic dataset from the largest repository of proofs written by human experts in a theorem prover.

view this post on Zulip Théo Zimmermann (Feb 17 2023 at 15:47):

Well, it all depends what you call a "repository".

view this post on Zulip Lasse Blaauwbroek (Feb 18 2023 at 07:51):

Michael Soegtrop said:

It would be an interesting exercise to try and compile all coq-* packages which claim to be compatible with latest Coq.

I semi-regularly do something similar to this for benchmarking purposes with Tactician. I try to calculate the largest self-consistent set of Coq packages. This means that they not only have to be co-installable with Coq like @Guillaume Melquiond did above, but also co-installable with each other. This is actually quite a complicated calculation to make. With some fiddling you can get Opam to do an approximate version of this if you use the Aspcud custom solver. I think such a notion fits nicely within the concept of 'library of mathematics', because everything is compatible with each other.

I did this last for Coq 8.16 a few weeks ago. At that time, I got the following list of packages (the situation has changed a little since then, but not much):

coq-elpi.1.16.0 coq-actuary.2.5 coq-algorand.1.4 coq-approx-models.1.0 coq-async-test.0.1.0 coq-bbv.1.3 coq-bedrock2.0.0.4 coq-bedrock2-compiler.0.0.4 coq-belgames.1.0.0 coq-bertrand.8.12.0 coq-bignums.8.16.0 coq-bits.1.1.0 coq-bonsai.1.0.0 coq-buchberger.8.16.0 coq-category-theory.1.0.0 coq-cds4ltl.1.0.0 coq-ceres.0.4.0 coq-cfml.20220112 coq-cfml-basis.20220112 coq-cfml-stdlib.20220112 coq-cheerios.dev coq-coinduction.1.6 coq-coinduction-examples.1.5 coq-color.1.8.3 coq-compcert.3.11 coq-coq2html.1.3 coq-coqeal.1.1.1 coq-coqoban.8.13.0 coq-coqprime.1.2.0 coq-coqprime-generator.1.1.1 coq-coquelicot.3.3.0 coq-coqutil.0.0.2 coq-corn.8.16.0 coq-cunit.1.0.0 coq-deriving.0.1.0 coq-dijkstra.0.1.0 coq-disel.2.3 coq-disel-examples.2.3 coq-dpdgraph.1.0+8.16 coq-aac-tactics.8.16.0 coq-equations.1.3+8.16 coq-error-handlers.1.2.0 coq-ext-lib.0.11.7 coq-extructures.0.3.1 coq-extructures-instances.dev coq-fcsl-pcm.1.7.0 coq-fiat-crypto.0.0.17 coq-file-sync.0.1.0 coq-flocq.4.1.0 coq-formalv-check_range.1.1.0 coq-formalv-prim63_mathcomp.1.1.0 coq-formalv-time.1.1.0 coq-fourcolor.1.2.5 coq-function-ninjas.1.0.0 coq-gaia-numbers.1.15 coq-gaia-ordinals.1.15 coq-gaia-schutte.1.15 coq-gaia-stern.1.15 coq-gaia-theory-of-sets.1.15 coq-gappa.1.5.2 coq-geometric-algebra.0.8.12 coq-ground.0.01.0 coq-hammer.1.3.2+8.16 coq-hammer-tactics.1.3.2+8.16 coq-haskell.1.1.0 coq-hierarchy-builder.1.4.0 coq-hott.8.16 coq-htt.1.2.0 coq-http.0.2.0 coq-huffman.8.15.0 coq-inconsequential_modus_ponens.1.0.6 coq-inf-seq-ext.dev coq-infotheo.0.5.0 coq-interval.4.6.1 coq-iris.4.0.0 coq-iris-heap-lang.4.0.0 coq-itauto.8.16.0 coq-iterable.1.0.0 coq-itree.5.1.0 coq-itree-extra.5.1.0 coq-itree-io.0.1.0 coq-jmlcoq.8.15.0 coq-jsast.3.0.0 coq-json.0.1.2 coq-kruskal-trees.1.0 coq-libhyps.2.0.6 coq-library-complexity.1.0+8.16 coq-library-undecidability.1.0.1+8.16 coq-linearscan.1.1.0 coq-list-plus.1.1.0 coq-list-string.2.1.2 coq-math-classes.8.15.0 coq-mathcomp-abel.1.2.1 coq-mathcomp-algebra.1.15.0 coq-mathcomp-analysis.0.6.0 coq-mathcomp-apery.1.0.2 coq-mathcomp-bigenough.1.0.1 coq-mathcomp-character.1.15.0 coq-mathcomp-classical.0.6.0 coq-mathcomp-field.1.15.0 coq-mathcomp-fingroup.1.15.0 coq-mathcomp-finmap.1.5.2 coq-mathcomp-multinomials.1.5.5 coq-mathcomp-odd-order.1.14.0 coq-mathcomp-real-closed.1.1.3 coq-mathcomp-solvable.1.15.0 coq-mathcomp-ssreflect.1.15.0 coq-mathcomp-zify.1.2.0+1.12+8.13 coq-matrix.1.0.2 coq-menhirlib.20220210 coq-metacoq.1.1.1+8.16 coq-metacoq-erasure.1.1.1+8.16 coq-metacoq-pcuic.1.1.1+8.16 coq-metacoq-safechecker.1.1.1+8.16 coq-metacoq-template.1.1.1+8.16 coq-metacoq-translations.1.1.1+8.16 coq-metalib.dev coq-min-imports.1.0.2 coq-mmaps.1.0 coq-moment.1.2.1 coq-mtac2.1.4+8.16 coq-ollibs.2.0.3 coq-ott.0.33 coq-paco.4.1.2 coq-paramcoq.1.1.3+coq8.16 coq-parsec.0.1.2 coq-pi-agm.1.2.6 coq-pocklington.8.12.0 coq-poltac.0.8.12 coq-prosa.0.5 coq-qarith-stern-brocot.8.14.0 coq-quantumlib.1.1.0 coq-quickchick.1.6.4 coq-record-update.0.3.0 coq-reduction-effects.0.1.4 coq-regexp-brzozowski.1.0 coq-reglang.1.1.3 coq-relation-algebra.1.7.8 coq-rewriter.0.0.7 coq-riscv.0.0.3 coq-rupicola.0.0.6 coq-semantics.8.14.0 coq-simple-io.1.8.0 coq-smpl.8.16 coq-stalmarck.8.16.0 coq-stalmarck-tactic.8.16.0 coq-stdpp.1.8.0 coq-struct-tact.dev coq-sudoku.8.16.0 coq-tlc.20211215 coq-type-infer.0.1.0 coq-unicoq.1.6+8.16 coq-unimath.20220816 coq-verdi.dev coq-verdi-raft.dev coq-vst.2.11.1 coq-vst-iris.2.11.1 coq-vst-zlist.2.11

I don't know how many lines of code this is (but somebody could easily calculate this I guess). I do know that all these packages compile successfully and have approximately 197k lemmas, where a 'lemma' is defined as any definition that is defined using the Ltac1 proof mode (too many notions of 'define' here...).

view this post on Zulip Lasse Blaauwbroek (Feb 18 2023 at 08:26):

Note that with lines of code you have to be quite careful because many packages share the same repository. So you can easily end up double-counting code (not to mention that some repositories vendor in other projects).

view this post on Zulip Lasse Blaauwbroek (Feb 18 2023 at 10:36):

For full transparency: i do exclude a few packages or package versions for various reasons. This is an evolving list, but currently it looks like this:

let excluded_names =
  ["coq-tactician"; "coq-tactician-stdlib"; "coq-tactician-dummy"
  (* part of Tactician *)
  ; "coq-gaia"
  (* has been split into sub-packages *)
  ; "coq-hierarchy-builder-shim"
  (* empty shim that conflicts with original *)
  ; "coq-compcert-64"; "coq-vst-64"
  (* these are old, coq-compcert and coq-vst are now 64b, while coq-compcert-32 and coq-vst-32 are 32b *)
  ; "coq-compcert-32"; "coq-vst-32"
  (* these have a 99% overlap with coq-compcert and coq-vst, hence we exlude them *)
  ; "gappa"
  (* renamed to coq-gappa *)
  ; "coq-engine-bench"; "coq-engine-bench-lite"; "coq-performance-tests"; "coq-performance-tests-lite"
  (* benchmarks *)
  ; "coq-fiat-crypto-legacy"; "coq-fiat-crypto-legacy-extra"; "coq-fiat-core"; "coq-fiat-crypto-with-bedrock"; "coq-fiat-parsers"
  (* seems to all be part of coq-fiat-crypto *)
  ; "coq-flocq3"
  (* legacy package, replaced by coq-flocq *)
  ; "coq-rewriter-perf-Fast"; "coq-rewriter-perf-Medium"; "coq-rewriter-perf-Slow"; "coq-rewriter-perf-SuperFast"; "coq-rewriter-perf-VerySlow"
  (* Some benchmark *)
  ; "menhirLib"; "menhirSdk"; "menhir"
  (* coq-menhirlib should cover this *)
  ; "coq-serapi"
  (* not a Coq development *)
  ; "coq-goedel"
  (* is now part of coq-hydra-battles *)
  ; "coq-io-list"
  (* is now part of coq-io *)
  ; "coq-metacoq-checker"
  (* replaced by coq-metacoq-safechecker *)
  ]

let exclude_versions =
  [ "coq-albert", "dev"; "coq-formal-topology", "dev"; "coq-sf-plf", "dev" (* too old and unmaintained *)
  ; "coq-fiat-crypto", "0.0.13"; "coq-tree-calculus", "1.0.0" (* currently broken *) ]

view this post on Zulip Michael Soegtrop (Feb 20 2023 at 08:50):

@Lasse Blaauwbroek : thanks, interesting! I would be interested in your exact opam procedures (that is a script) for maintaining Coq Platform. Maybe it would even be possible to make a dynamic pick out of this.

For lines of code counting I am using cloc on the complete set in one call. This does eliminate identically duplicate files, but not slightly modified files. One could for sure do better by looking at say unique 10 line blocks or so.

view this post on Zulip Michael Soegtrop (Feb 20 2023 at 08:52):

A note on your exclusion list: gappa and coq-gappa are different things. gappa is a proof generator written in C++ and coq-gappa is a plugin which handles goals by calling gappa. Usually one installs both. But they share the same source tree.

view this post on Zulip Lasse Blaauwbroek (Feb 20 2023 at 10:13):

Thanks for the note on gappa. I removed it from the exclusion.

The code that makes the calculations is here: https://github.com/coq-tactician/benchmark-system/blob/calculate_benchmark/src/opam_build/package_calculations.ml (Note that it is only present in the calculate_benchmark branch.) You can run this using dune exec opam-build <empty-scratch-directory> (after you've installed the dependencies). Note however that you need to link this with the master version of Opam, because of https://github.com/ocaml/opam/pull/4796 and https://github.com/ocaml/opam/pull/4805. (Opam is being really slow in releasing version 2.2.) The easiest way to do this is to compose the master branch of Opam through Dune. You also need a working version of Aspcud and its solvers. In particular, be careful for this bug: https://github.com/potassco/clasp/issues/85

view this post on Zulip Lasse Blaauwbroek (Feb 20 2023 at 10:22):

(You could probably also rewrite this as a bash script, but I've learned not to write bash scripts longer than ~10 lines. You would still need the latest Opam version though.)

view this post on Zulip Michael Soegtrop (Feb 20 2023 at 16:31):

@Lasse Blaauwbroek : thanks - I will wait until it is supported by release opam then. But I would really appreciate, if you would send me your resulting package list, whenever you create it.

view this post on Zulip Lasse Blaauwbroek (Feb 20 2023 at 17:57):

What do you mean by 'resulting package list' beyond what I pasted above? If you need me to calculate some set, I'll be happy to take a look.

view this post on Zulip Michael Soegtrop (Feb 20 2023 at 18:11):

@Lasse Blaauwbroek : I just meant that whenever you go through this exercise, please keep me posted. The list your gave above is perfect.


Last updated: Mar 28 2024 at 21:01 UTC