@Cyril Cohen let's say we converted our MathComp subtokenizer from Python to OCaml. Could this tool be something that lives in the MathComp repo? In a sense, it encodes knowledge about MathComp naming conventions and is the only part of Roosterize that is MathComp-specific.
For the benefit of others, here are some example inputs and outputs for the subtokenizer:
expg1n, [exp; g; 1; n]
mulIg, [mul; I; g]
It would be nice, but I have trouble understanding which of [exp; g; 1; n] and [expg; 1; n] is the right tokenization.
well, I guess it is sometimes better to be consistent than to be right, so one would pick one of them
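To make the ambiguity concrete, here is a minimal sketch of a greedy longest-match subtokenizer. It is not the Roosterize implementation: the function name `subtokenize` and the two vocabularies `FINE` and `COARSE` are hypothetical, chosen only to show that the choice between the two tokenizations could come down to whether `expg` is listed as a unit.

```python
def subtokenize(name, vocab):
    """Split `name` greedily into the longest pieces found in `vocab`.

    Characters that match no vocabulary entry become one-character tokens.
    This is an illustrative sketch, not the actual Roosterize algorithm.
    """
    tokens = []
    i = 0
    max_len = max(map(len, vocab), default=1)
    while i < len(name):
        # Try the longest candidate first, then shrink.
        for j in range(min(len(name), i + max_len), i, -1):
            if name[i:j] in vocab:
                tokens.append(name[i:j])
                i = j
                break
        else:
            # No vocabulary entry starts here: emit a single character.
            tokens.append(name[i])
            i += 1
    return tokens


# Hypothetical vocabularies (assumptions, not taken from MathComp itself).
FINE = {"exp", "mul", "g", "I", "n", "1"}
COARSE = FINE | {"expg"}

print(subtokenize("expg1n", FINE))    # ['exp', 'g', '1', 'n']
print(subtokenize("expg1n", COARSE))  # ['expg', '1', 'n']
print(subtokenize("mulIg", FINE))     # ['mul', 'I', 'g']
```

Under these assumptions, either answer is reproducible as long as the vocabulary is fixed, which is one way to get the consistency mentioned above.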
Last updated: May 31 2023 at 09:01 UTC