We just released our corpus of Coq code based on MC 1.9.0, including both vernacular source (.v
) and serialized machine-readable versions (as sexps) produced by Coq 8.10 and SerAPI: https://github.com/EngineeringSoftware/math-comp-corpus
The initial use of the corpus was for training our also-just-released tool for Coq lemma name generation: https://github.com/EngineeringSoftware/roosterize -- but we expect wide applicability of the corpus in other machine learning tasks, such as Coq formatting learning/prediction.
Suggestions for additional MC-related projects to include in future versions of the corpus would be very welcome.
Many thanks to @Cyril Cohen for his help with project curation and project porting!
Last updated: Jan 29 2023 at 19:02 UTC