for funding and proof engineering purposes, could we compute some measure of the size (most easily, in terms of LOC) of the platform releases? The easiest approach would probably be to use cloc, which has OCaml and Coq support and can be invoked in the root directory of an arbitrarily deep directory structure.
motivation: Paulson and his collaborators often claim that the Isabelle AfP is "the largest repository of mechanised proofs" (e.g., here)
Yes, computing the locs wouldn't be hard. E.g. the script which creates the smoke test kit retrieves the sources of all primary packages via opam. I could use the same script and count locs there.
Give me a few minutes ...
Here you go
source-statistics$ cloc *
9081 text files.
7453 unique files.
2447 files ignored.
github.com/AlDanial/cloc v 1.90 T=13.18 s (505.7 files/s, 195677.7 lines/s)
-----------------------------------------------------------------------------------
Language files blank comment code
-----------------------------------------------------------------------------------
Coq 4602 233957 158330 1738065
OCaml 876 24994 31061 111873
C 351 15523 18845 87204
TeX 81 6197 1946 44689
Bourne Shell 77 2525 2589 17908
Markdown 168 4184 0 16952
C/C++ Header 115 3057 7425 9240
SVG 2 1 1 8539
make 108 1617 973 5103
HTML 10 122 105 4166
C++ 10 550 900 3117
YAML 54 460 208 2867
Assembly 60 362 2562 1641
CSS 12 313 100 1247
Python 17 231 145 1180
Lisp 11 181 112 1018
Nix 31 127 154 820
Bourne Again Shell 24 126 231 633
JavaScript 12 100 232 521
XML 1 3 0 296
Haskell 3 48 5 267
Verilog-SystemVerilog 11 28 0 237
m4 1 36 4 233
Lua 2 25 23 224
Perl 3 47 32 214
JSON 2 0 0 176
awk 3 5 1 93
Dockerfile 5 9 0 68
INI 1 0 0 38
sed 4 1 25 38
Bazel 1 18 0 35
DOS Batch 5 10 0 24
diff 1 3 10 11
C Shell 1 0 0 4
TOML 1 0 3 3
reStructuredText 1 32 127 2
-----------------------------------------------------------------------------------
SUM: 6667 294892 226149 2058746
-----------------------------------------------------------------------------------
I will commit a script which just creates a folder with all sources (in subfolders for each package).
The above is for 2021.09 8.13
This includes all packages which have been explicitly installed via opam (--installed-roots
).
OK, so bottom line:
shouldn't there be more rst if it includes coq?
does it include Coq itself? I thought not.
With 876 OCaml files, I suppose it does.
Ehm, sorry, Coq itself is not included - I filtered for coq-
. So it also doesn't include things like gappa or elpi.
A moment ...
in comparison, Isabelle AfP is currently ~3.1M LOC: https://www.isa-afp.org/statistics.html
Do you know how they compute their numbers of LOC (e.g. do they eliminate comments and blank lines)?
Should I keep CoqIDE, elpi, gappa, menhir?
Théo Zimmermann said:
Do you know how they compute their numbers of LOC (e.g. do they eliminate comments and blank lines)?
don't know unfortunately, but pretty sure they eliminate blank lines at least, https://link.springer.com/chapter/10.1007/978-3-319-20615-8_1
“code” refers to definitions and proofs (including comments but not empty lines)
OK, here is an update which adds coq, coqide, gappa, elpi and menhir:
coq-platform-2021.09$ cloc source-statistics/
19061 text files.
12413 unique files.
8261 files ignored.
github.com/AlDanial/cloc v 1.90 T=16.34 s (664.2 files/s, 201435.4 lines/s)
-----------------------------------------------------------------------------------
Language files blank comment code
-----------------------------------------------------------------------------------
Coq 7117 278890 185598 1962558
OCaml 1939 66361 72064 340015
C 358 15741 19047 89299
TeX 107 7342 2302 50496
Markdown 208 6345 0 24638
reStructuredText 109 14088 17064 21499
Bourne Shell 235 3408 3091 21047
C++ 37 1509 1540 11485
C/C++ Header 139 3382 7757 10562
SVG 2 1 1 8539
make 119 1764 1085 5918
HTML 17 193 105 4405
Python 44 1203 1483 4043
YAML 59 594 279 3825
CSS 24 549 256 2606
Assembly 60 362 2562 1641
Lisp 13 240 156 1242
Nix 61 196 186 1180
m4 4 270 23 1168
diff 3 9 27 984
XML 5 29 7 876
Standard ML 67 285 0 818
NAnt script 1 215 0 712
Bourne Again Shell 25 132 233 681
JavaScript 15 118 264 607
CSV 1 0 0 501
Verilog-SystemVerilog 19 34 0 269
Haskell 3 48 5 267
ANTLR Grammar 33 59 16 225
Lua 2 25 23 224
Perl 3 47 32 214
JSON 2 0 0 176
Dockerfile 6 25 22 104
awk 3 5 1 93
DOS Batch 6 37 12 90
INI 1 0 0 38
sed 4 1 25 38
Bazel 1 18 0 35
C Shell 1 0 0 4
TOML 1 0 3 3
Windows Resource File 1 0 0 1
-----------------------------------------------------------------------------------
SUM: 10855 403525 315269 2573126
-----------------------------------------------------------------------------------
If you want something else, please give my your package name regexp :-)
I guess all in all it should be comparable - this is not all of Coq (e.g. MetaCoq is not yet in).
well, it at least shows that released
for 8.13 is very likely larger than AfP right now (2M LOC platform, then a lot of other stuff)
So, you really want me to open a PR to add .elpi support to cloc...
Perl... maybe not today
@Karl Palmskog : it wouldn't be hard to run cloc over everything in coq-released, it just would take a while.
Indeed I believe that Coq Platform is only a rather small part of it. But then for fairness I think one should only count stuff which compiles with 8.13.
if we wanted to measure "all" of released
for 8.13, an alternative to diving into complex opam search parameters would be to use this as a guide: https://coq-bench.github.io/clean/Linux-x86_64-4.12.0-2.0.8/released/
@Karl Palmskog : this doesn't look accurate. E.g. coq-aac-tactics version 8.13.0 is compatible with Coq 8.13.2, but the table says "NC" - if I interpret it right.
That is because it is marked as incompatible with OCaml 4.12.0.
@Guillaume Melquiond thanks! This looks better: (https://coq-bench.github.io/clean/Linux-x86_64-4.10.2-2.0.6/released/)
@Karl Palmskog: Can one have the contents as e.g. XML?
There seems to be a lot of stuff which stopped at 8.10 ...
CC @Guillaume Claret
So I have this JS snippet to get the list of working packages for a given Coq version:
console.log(JSON.stringify(
[...document.querySelectorAll("td.success a")]
.map(a => a.href.slice(a.baseURI.length).split("/"))
.filter(([coqVersion,,]) => coqVersion === "8.13.2")
.map(([,name, version]) =>
(name + "." + version.slice(0, version.length - ".html".length)).replaceAll("%2B", "~")
),
null, 2
))
where 8.13.2
should be replaced by the Coq version number
it should be run into the JS console of the page https://coq-bench.github.io/clean/Linux-x86_64-4.10.2-2.0.6/released/ for example
I have got the output:
[
"aac-tactics.8.13.0",
"addition-chains.0.4",
"almost-full.8.13.0",
"approx-models.1.0",
"autosubst.1.7",
"bbv.1.2",
"bignums.8.13.0",
"bits.1.1.0",
"buchberger.8.13.0",
"ceres.0.4.0",
"chapar.8.13.0",
"chick-blog.1.0.1",
"coinduction.1.0",
"color.1.8.1",
"comp-dec-modal.1.0",
"compcert.3.8",
"compcert.3.9",
"compcert-32.3.8",
"compcert-32.3.9",
"coq2html.1.0",
"coq2html.1.1",
"coq2html.1.2",
"coqffi.1.0.0~beta5",
"coqffi.1.0.0~beta6",
"coqffi.1.0.0~beta7",
"coqoban.8.13.0",
"coqprime.1.0.6",
"coquelicot.3.1.0",
"coquelicot.3.2.0",
"cunit.1.0.0",
"deriving.0.1.0",
"dijkstra.0.1.0",
"dpdgraph.0.6.9",
"elpi.1.8.1",
"elpi.1.9.0",
"elpi.1.9.1",
"elpi.1.9.2",
"elpi.1.9.3",
"elpi.1.9.4",
"elpi.1.9.5",
"elpi.1.9.6",
"elpi.1.9.7",
"elpi.1.10.0",
"elpi.1.10.1",
"elpi.1.10.2",
"elpi.1.10.3",
"elpi.1.11.0",
"equations.1.2.3~8.13",
"equations.1.2.4~8.13",
"equations.1.3~beta2~8.13",
"equations.1.3~beta~8.13",
"error-handlers.1.0.0",
"error-handlers.1.1.0",
"error-handlers.1.1.1",
"error-handlers.1.2.0",
"ext-lib.0.11.3",
"extructures.0.2.2",
"extructures.0.3.0",
"flocq.3.2.1",
"flocq.3.3.0",
"flocq.3.3.1",
"flocq.3.4.0",
"flocq.3.4.1",
"flocq.3.4.2",
"fourcolor.1.2.3",
"freespec-core.0.3",
"freespec-exec.0.3",
"freespec-ffi.0.3",
"function-ninjas.1.0.0",
"gaia.1.12",
"gappa.1.4.5",
"gappa.1.4.6",
"geometric-algebra.0.8.12",
"goedel.8.12.0",
"goedel.8.13.0",
"graph-theory.0.8",
"graph-theory.0.9",
"hammer.1.3.1~8.13",
"hammer-tactics.1.3.1~8.13",
"hierarchy-builder.1.0.0",
"hierarchy-builder.1.1.0",
"hierarchy-builder-shim.1.1.0",
"high-school-geometry.8.13.0",
"hott.8.13",
"http.0.1.0",
"huffman.8.13.0",
"hydra-battles.0.4",
"infotheo.0.3",
"infotheo.0.3.1",
"infotheo.0.3.2",
"infotheo.0.3.3",
"interval.4.1.0",
"interval.4.1.1",
"interval.4.2.0",
"interval.4.3.0",
"io.2.0.0",
"io.2.1.0",
"io.3.0.0",
"io.3.1.0",
"io.4.0.0",
"io-hello-world.1.2.0",
"io-list.1.0.0",
"io-list.1.1.0",
"io-system.2.1.0",
"io-system.2.3.0",
"io-system.2.4.1",
"io-system-ocaml.2.3.1",
"iris.3.4.0",
"iris-heap-lang.3.4.0",
"iris-string-ident.0.1.0",
"itauto.8.13",
"iterable.1.0.0",
"itree.3.2.0",
"itree.4.0.0",
"itree-io.0.1.0",
"jmlcoq.8.13.0",
"jsast.2.0.0",
"libhyps.1.0.1",
"libhyps.1.0.2",
"libhyps.2.0.1",
"libhyps.2.0.2",
"list-plus.1.0.0",
"list-plus.1.1.0",
"list-string.2.0.0",
"list-string.2.1.0",
"list-string.2.1.1",
"list-string.2.1.2",
"math-classes.8.13.0",
"mathcomp-abel.1.0.0",
"mathcomp-algebra.1.12.0",
"mathcomp-analysis.0.3.5",
"mathcomp-analysis.0.3.6",
"mathcomp-analysis.0.3.7",
"mathcomp-analysis.0.3.8",
"mathcomp-analysis.0.3.9",
"mathcomp-analysis.0.3.10",
"mathcomp-bigenough.1.0.0",
"mathcomp-character.1.12.0",
"mathcomp-dioid.0.1",
"mathcomp-field.1.12.0",
"mathcomp-fingroup.1.12.0",
"mathcomp-finmap.1.5.1",
"mathcomp-multinomials.1.5.4",
"mathcomp-odd-order.1.12.0",
"mathcomp-real-closed.1.1.2",
"mathcomp-solvable.1.12.0",
"mathcomp-ssreflect.1.12.0",
"mathcomp-zify.1.0.0~1.12~8.13",
"menhirlib.20200525",
"menhirlib.20200612",
"menhirlib.20200619",
"menhirlib.20200624",
"menhirlib.20201122",
"menhirlib.20201201",
"menhirlib.20201214",
"menhirlib.20201216",
"menhirlib.20210310",
"menhirlib.20210419",
"metacoq-pcuic.1.0~beta2~8.13",
"metacoq-template.1.0~beta2~8.13",
"metacoq-translations.1.0~beta2~8.13",
"min-imports.1.0.0",
"min-imports.1.0.1",
"min-imports.1.0.2",
"moment.1.1.0",
"moment.1.2.0",
"monae.0.3",
"monae.0.3.1",
"monae.0.3.2",
"monae.0.3.3",
"monae.0.3.4",
"mtac2.1.4~8.13",
"ollibs.2.0.1",
"ordinal.0.5.0",
"paco.4.0.4",
"paco.4.1.0",
"paco.4.1.1",
"paramcoq.1.1.2~coq8.13",
"parsec.0.1.0",
"pi-agm.1.2.6",
"pocklington.8.12.0",
"poltac.0.8.12",
"qarith-stern-brocot.8.13.0",
"quickchick.1.5.0",
"record-update.0.3.0",
"reduction-effects.0.1.2",
"reglang.1.1.2",
"relation-algebra.1.7.5",
"robot.0.1",
"rsa.8.8.0",
"semantics.8.11.1",
"simple-io.0.1",
"simple-io.0.2",
"simple-io.1.0.0",
"simple-io.1.5.0",
"smpl.8.13",
"stalmarck.8.13.0",
"stalmarck-tactic.8.13.0",
"stdpp.1.5.0",
"switch.1.0.5",
"tactician.1.0~beta1~8.13",
"tactician-dummy.1.0~beta1",
"tactician-stdlib.1.0~beta1~8.13",
"tlc.20210316",
"topology.9.0.0",
"tree-calculus.1.0.0",
"type-infer.0.1.0",
"unicoq.1.5~8.13",
"unimath.20210807",
"vst.2.7",
"vst.2.7.1",
"vst.2.8",
"vst-32.2.7.1",
"vst-32.2.8",
"waterproof.1.0.0",
"yalla.2.0.2",
"zorns-lemma.9.0.0"
]
@Guillaume Claret : thanks - works nicely! I am not sure, though, if long term I wouldn't prefer a wget of the HTML page + some XSLT.
@Karl Palmskog : there are still some largish developments missing especially from US universities, say CertiCoq, Kami, Vellum, CertiKOS
yes, tons of large projects are missing for sure (although I would not count CertikOS since it's private and proprietary). I think Awesome Coq has the top public ones.
I didn't follow CertiKOS since I left automotive - didn't Zhong Shao plan to open source it at some point? I thought that was a prerequisite for the deepspec funding.
well, if someone can point to a publicly inspectable CertikOS version, I will withdraw my "private and proprietary" comment above
If you google it, you find e.g. http://flint.cs.yale.edu/certikos/publications/device/ or https://github.com/npe9/certikos. But this doesn't look endorsed if you ask me. The first link is to the flint group webpage, though.
The license is not open source, though (Yale non commercial, I guess similar to CompCert).
the npe9 link is seemingly for Coq 8.4, so that stretches "public" quite a bit.
That's what I meant with "looks not endorsed".
Last updated: Jun 03 2023 at 03:01 UTC