In this WG I would like to discuss whether or not the community finds it acceptable to gather statistics about how Coq and its plugins are used by users. I'm not sure if this can best be discussed in an offline or online setting, but let me start off with this message. The notion of 'gathering statistics' can be very broad, so I'd like to split this up into a number of sub-questions. Feel free to add your own concerns.
- Version information
- An anonymous or non-anonymous user-id
- Information on how often and when certain commands are used, how long they take, and whether or not they succeed
- Information about the context in which commands are used:
+ Anonymous statistics about the local and global context, such as the number of hypotheses, number of available lemmas, number of proof steps executed, and size of the goal.
+ Less anonymous statistics about the local and global context, such as lemma names, the syntax trees of proof states, proof terms generated by tactics.
Let me also add some context on why I want to do this: For my plugin Tactician, we are currently conducting a study with the students of a course we are teaching on how the plugin Is used. For this, we automatically gather user statistics. However, it occurs to me that it would be much more useful to do this on a larger scale :-) Otherwise, there is also a system/paper by Talia Ringer that instruments Coq for statistics gathering. https://dependenttyp.es/pdf/analytics.pdf
Thoughts? Opinions?
My opinion is that the gathering of user data should be strictly on an opt-in basis. Coq being a "standalone offline" application users, in particular those working on privacy-respecting operating systems like Linux, should have the reasonable expectation of "no network activity whatsoever" unless explicitly enabled/triggered by the user.
I'd go farther than that: it's fine to gather statistics on data from our CI or ask users to submit their own data willfully, but I ever I see Coq performing a network call I'd just draw out my flamethrower.
In fairness, let me reassure @Lasse that Pierre-Marie doesn't restrict colorful language to this occasion (tho I'm sure he means it)
I think I'd be fine with stats on a strictly opt-in basis with clear and informed consent. Under those conditions you can probably ask for anything, you'll just pay with fewer downloads.
@Paolo Giarrusso I'm aware :-) And I'm also bringing this subject up with the full knowledge that it might solicit violent reactions. @Pierre-Marie Pédrot Do you mean that you'd like any data-submission to happen outside of Coq, or would it be okay to make a network call after the user has opted in?
Also, if we are talking about opting in, would it be acceptable to have a Coq option like Set Gather Statistics
?
I think it would be much more acceptable to be implemented in an external plugin
Yes, the idea would be to implement it in an external plugin, but using Coq's option mechanism.
so this doesn't require any changes to Coq, right?
No
and you don't need permission for technical reasons, this is basically just an opinion poll?
Yes. Given the sensitive nature of such things, I do not want to do this without having some consensus by the Coq devs/the wider community.
And given that I'm the second person wanting to gather statistics after Talia Ringer, it might make sense to set up some ethics document that serves as a guideline of what is acceptable and unacceptable.
FWIW — (1) I work on proprietary Coq proofs, so you _might_ want to account for this scenario — the last option is surely out of bounds for me :-)
(we're probably a minority)
Yes, proprietary software is indeed a concern. (Note that I would also say no to some of the points I brought up in the first post, I just include them for completeness).
(2) I wonder if ethical guidelines on student (which are light but AFAIK exist, at least in Germany) experiments might be useful in this regard. But it's been 8-10 years since I learned anything on the topic
Yes, I also don't know exactly what the guidelines are in the Netherlands. But in our case, we basically explained to the students that (1) Participation is completely voluntary (2) which kinds of statistics are gathered (3) The study would have no effect on their final grade.
Debian had popcon https://popcon.debian.org/ since a long time. At installation time you opt in (our out). I've always found that data interesting and did opt int ;-)
"statistics" is usually about presenting and inferring facts from public information. What is discussed here seems to be mostly telemetry ("the in situ collection of measurements or other data at remote points and their automatic transmission to receiving equipment for monitoring")
I'm happy to change the terminology to 'telemetry'. I've changed the title of the WG to reflect this.
Popcon is fairly different from having a program screening for the arbitarty input of a user.
I actually opted in for popcon, and I wouldn't mind a similar mechanism for opam packages
The difference is that the scope of popcon and of a telemetry tool for Coq are very different.
Also, a Coq plugin would be hard to control, i.e. if it becomes active by simply requiring it is easy to indirectly spy on people
Pierre-Marie Pédrot said:
Also, a Coq plugin would be hard to control, i.e. if it becomes active by simply requiring it is easy to indirectly spy on people
Yes, this is indeed a problem. What I would suggest to solve this is to enable telemetry using a command external to Coq, that will persist the chosen setting somewhere on the users disk. That way a Required
module cannot enable telemetry without the user knowing.
Yes, this would be preferable also for the reason that telemetry should not affect the way things a processed by coq, and thus, similar to Print, Check, and Search, shouldn't be inserted into scripts.
Last updated: Jun 11 2023 at 00:30 UTC