Stream: CUDW 2020

Topic: WG: User telemetry


view this post on Zulip Lasse (Nov 30 2020 at 18:22):

In this WG I would like to discuss whether or not the community finds it acceptable to gather statistics about how Coq and its plugins are used by users. I'm not sure if this can best be discussed in an offline or online setting, but let me start off with this message. The notion of 'gathering statistics' can be very broad, so I'd like to split this up into a number of sub-questions. Feel free to add your own concerns.

  1. Do we want to allow the gathering of user statistics at all?
  2. If yes, should this be (a) opt-in by user action (b) opt-in by system-request (i.e. forcing the user to make a decision by giving a prompt) (c) opt-out (d) mandatory.
  3. If we are gathering statistics, what kind of statistics are acceptable?

    - Version information
    - An anonymous or non-anonymous user-id
    - Information on how often and when certain commands are used, how long they take, and whether or not they succeed
    - Information about the context in which commands are used:
    + Anonymous statistics about the local and global context, such as the number of hypotheses, number of available lemmas, number of proof steps executed, and size of the goal.
    + Less anonymous statistics about the local and global context, such as lemma names, the syntax trees of proof states, proof terms generated by tactics.

Let me also add some context on why I want to do this: For my plugin Tactician, we are currently conducting a study with the students of a course we are teaching on how the plugin Is used. For this, we automatically gather user statistics. However, it occurs to me that it would be much more useful to do this on a larger scale :-) Otherwise, there is also a system/paper by Talia Ringer that instruments Coq for statistics gathering. https://dependenttyp.es/pdf/analytics.pdf

Thoughts? Opinions?

view this post on Zulip Christian Doczkal (Nov 30 2020 at 18:32):

My opinion is that the gathering of user data should be strictly on an opt-in basis. Coq being a "standalone offline" application users, in particular those working on privacy-respecting operating systems like Linux, should have the reasonable expectation of "no network activity whatsoever" unless explicitly enabled/triggered by the user.

view this post on Zulip Pierre-Marie Pédrot (Nov 30 2020 at 18:36):

I'd go farther than that: it's fine to gather statistics on data from our CI or ask users to submit their own data willfully, but I ever I see Coq performing a network call I'd just draw out my flamethrower.

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 18:44):

In fairness, let me reassure @Lasse that Pierre-Marie doesn't restrict colorful language to this occasion (tho I'm sure he means it)

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 18:46):

I think I'd be fine with stats on a strictly opt-in basis with clear and informed consent. Under those conditions you can probably ask for anything, you'll just pay with fewer downloads.

view this post on Zulip Lasse (Nov 30 2020 at 18:47):

@Paolo Giarrusso I'm aware :-) And I'm also bringing this subject up with the full knowledge that it might solicit violent reactions. @Pierre-Marie Pédrot Do you mean that you'd like any data-submission to happen outside of Coq, or would it be okay to make a network call after the user has opted in?

Also, if we are talking about opting in, would it be acceptable to have a Coq option like Set Gather Statistics?

view this post on Zulip Kenji Maillard (Nov 30 2020 at 18:49):

I think it would be much more acceptable to be implemented in an external plugin

view this post on Zulip Lasse (Nov 30 2020 at 18:49):

Yes, the idea would be to implement it in an external plugin, but using Coq's option mechanism.

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 18:50):

so this doesn't require any changes to Coq, right?

view this post on Zulip Lasse (Nov 30 2020 at 18:50):

No

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 18:51):

and you don't need permission for technical reasons, this is basically just an opinion poll?

view this post on Zulip Lasse (Nov 30 2020 at 18:52):

Yes. Given the sensitive nature of such things, I do not want to do this without having some consensus by the Coq devs/the wider community.

view this post on Zulip Lasse (Nov 30 2020 at 18:53):

And given that I'm the second person wanting to gather statistics after Talia Ringer, it might make sense to set up some ethics document that serves as a guideline of what is acceptable and unacceptable.

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 18:54):

FWIW — (1) I work on proprietary Coq proofs, so you _might_ want to account for this scenario — the last option is surely out of bounds for me :-)

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 18:54):

(we're probably a minority)

view this post on Zulip Lasse (Nov 30 2020 at 18:56):

Yes, proprietary software is indeed a concern. (Note that I would also say no to some of the points I brought up in the first post, I just include them for completeness).

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 18:57):

(2) I wonder if ethical guidelines on student (which are light but AFAIK exist, at least in Germany) experiments might be useful in this regard. But it's been 8-10 years since I learned anything on the topic

view this post on Zulip Lasse (Nov 30 2020 at 19:00):

Yes, I also don't know exactly what the guidelines are in the Netherlands. But in our case, we basically explained to the students that (1) Participation is completely voluntary (2) which kinds of statistics are gathered (3) The study would have no effect on their final grade.

view this post on Zulip Enrico Tassi (Nov 30 2020 at 20:45):

Debian had popcon https://popcon.debian.org/ since a long time. At installation time you opt in (our out). I've always found that data interesting and did opt int ;-)

view this post on Zulip Karl Palmskog (Dec 01 2020 at 07:17):

"statistics" is usually about presenting and inferring facts from public information. What is discussed here seems to be mostly telemetry ("the in situ collection of measurements or other data at remote points and their automatic transmission to receiving equipment for monitoring")

view this post on Zulip Lasse (Dec 01 2020 at 09:01):

I'm happy to change the terminology to 'telemetry'. I've changed the title of the WG to reflect this.

view this post on Zulip Pierre-Marie Pédrot (Dec 01 2020 at 09:09):

Popcon is fairly different from having a program screening for the arbitarty input of a user.

view this post on Zulip Pierre-Marie Pédrot (Dec 01 2020 at 09:09):

I actually opted in for popcon, and I wouldn't mind a similar mechanism for opam packages

view this post on Zulip Pierre-Marie Pédrot (Dec 01 2020 at 09:09):

The difference is that the scope of popcon and of a telemetry tool for Coq are very different.

view this post on Zulip Pierre-Marie Pédrot (Dec 01 2020 at 09:12):

Also, a Coq plugin would be hard to control, i.e. if it becomes active by simply requiring it is easy to indirectly spy on people

view this post on Zulip Lasse (Dec 01 2020 at 09:16):

Pierre-Marie Pédrot said:

Also, a Coq plugin would be hard to control, i.e. if it becomes active by simply requiring it is easy to indirectly spy on people

Yes, this is indeed a problem. What I would suggest to solve this is to enable telemetry using a command external to Coq, that will persist the chosen setting somewhere on the users disk. That way a Required module cannot enable telemetry without the user knowing.

view this post on Zulip Christian Doczkal (Dec 01 2020 at 09:21):

Yes, this would be preferable also for the reason that telemetry should not affect the way things a processed by coq, and thus, similar to Print, Check, and Search, shouldn't be inserted into scripts.


Last updated: Oct 16 2021 at 09:07 UTC