@Cyril Cohen let's say we converted our MathComp subtokenizer from Python to OCaml. Could this tool be something that lives in the MathComp repo? In a sense, it encodes knowledge about MathComp naming conventions and is the only part of Roosterize that is MathComp-specific.
For the benefit of others, some example inputs and outputs for the subtokenizer:
[exp; g; 1; n]
[mul; I; g]
It would be nice, but I have trouble understanding which of
[exp; g; 1; n] and
[expg; 1; n] is the right tokenization.
Well, I guess it is sometimes better to be consistent than to be right, so one would pick one of them.
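To make the ambiguity concrete, here is a minimal sketch of a greedy longest-match subtokenizer. It is not the actual Roosterize code: the function `subtokenize`, the vocabularies `FINE` and `COARSE`, and the input names `expg1n` and `mulIg` (taken to be the concatenation of the tokens shown above) are all assumptions made for illustration. The point is only that the same procedure gives either tokenization depending on whether `expg` is in the vocabulary, so whichever is picked needs to be applied consistently.

```python
def subtokenize(name, vocab):
    """Split `name` left to right, always taking the longest vocabulary
    entry that matches; an unknown character becomes its own token."""
    tokens = []
    i = 0
    while i < len(name):
        # Try the longest candidate first, down to a single character.
        for j in range(len(name), i, -1):
            if name[i:j] in vocab or j == i + 1:
                tokens.append(name[i:j])
                i = j
                break
    return tokens


# Hypothetical vocabularies, not taken from MathComp or Roosterize.
FINE = {"exp", "mul", "g", "n", "I"}
COARSE = FINE | {"expg"}

print(subtokenize("expg1n", FINE))    # ['exp', 'g', '1', 'n']
print(subtokenize("expg1n", COARSE))  # ['expg', '1', 'n']
print(subtokenize("mulIg", FINE))     # ['mul', 'I', 'g']
```

With a table-driven design like this, the MathComp-specific knowledge lives entirely in the vocabulary, which is the part that could plausibly be maintained next to the library itself.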
Last updated: May 31 2023 at 09:01 UTC