Stream: Miscellaneous

Topic: Coq and Lean on StackOverflow


view this post on Zulip Kevin Buzzard (Sep 13 2021 at 11:12):

I'd be interested in the feelings of the Coq community towards SO. Some of us over in the Lean community had a discussion about SO and Scott Morrison (one of the founders of MathOverflow) was maximally pro the idea of getting more SO questions about Lean, his argument being that this is the place where the CS people go to get their questions answered. Indeed if you talk to young people about e.g. how they learnt python they seem to not even learn it from a book nowadays, they find some web resources (or they have to learn it at university and are given some resources) and then when they get stuck on something they just google and find a good SO answer. However my opinion is that this only works because python is an extremely popular language with a massive user base. For theorem provers I am not personally convinced that the SO model works so well. The idea with the SO model is that someone comes up with a good and coherent question which can trip up beginners, an expert writes a very well-worded answer, everything gets a ton of upvotes, and then google notices and starts redirecting people there. My impression of the Coq and Lean SO questions is that they are typically not at all of this nature, many of them only get one or two upvotes and indeed it is hard to imagine that most of them will be helpful for other users. On the other hand I am active poster in the Lean Zulip and a frequent lurker here, and I see people's questions being answered far more efficiently because places like this are a much better place to have a dialogue, which somehow seems more appropriate for many of the questions these forums get (SO tends to discourage dialogue, it would far rather have coherent question -> brilliantly explained answer -> end). Morrison's argument against this is that google is not indexing the Zulip chats and that this is obviously bad. The Lean community is putting a public copy of all our Zulip chat online and I believe that the Coq community does this too, however I see no real indication that google is noticing this. I am not a computer scientist and I have no experience with trying to learn or teach any other "small" (in the scheme of things) programming languages like ours, but I am not convinced that if someone has a specific question about Coq then SO is the place to go. What I am confused about is whether it should be the place to go...

view this post on Zulip Bas Spitters (Sep 13 2021 at 11:21):

@Kevin Buzzard That's an interesting question. I agree Zulip works well for our community.
For some questions SO could also well, but I haven't seen this done much efficiently (for Coq), I believe.

My cynical remark above was about gaming the system...

view this post on Zulip Kevin Buzzard (Sep 13 2021 at 11:39):

Right! Both this community and the Lean community seem to do it the other way -- a SO question tagged Coq or Lean is flagged on the appropriate Zulip chat, and (this is my impression at least) people then come from Zulip to answer the SO question. I think there would be some chaos if every Zulip question was automatically posted on SO with the relevant tag :-)

view this post on Zulip Bas Spitters (Sep 13 2021 at 11:59):

I'm wondering whether this can be automated, zulip has already excellent integration of github and discourse. Has anyone tried to do this for SO?
I guess @Théo Zimmermann has been working on such issues.
We have the nice aggregator https://coq.pl-a.net/ (from https://coq.inria.fr/community.html)
However, I'm not sure how many people are using that. Is the lean zulip doing any better at the moment?

view this post on Zulip Théo Zimmermann (Sep 13 2021 at 12:01):

@Bas Spitters I haven't read the full discussion yet but I am the one who introduced the Discourse integration and I did the same for SO questions. Didn't you see them frequently pop up in your feed?

view this post on Zulip Théo Zimmermann (Sep 13 2021 at 12:01):

https://coq.zulipchat.com/#narrow/stream/237977-Coq-users/topic/New.20Stack.20Exchange.20question

view this post on Zulip Bas Spitters (Sep 13 2021 at 12:06):

:face_palm: They showed up so often that I muted the stream ...

Thanks for setting that up!

view this post on Zulip Kevin Buzzard (Sep 13 2021 at 12:11):

Aah, this is an unexpected disadvantage :-)

view this post on Zulip Michael Soegtrop (Sep 13 2021 at 12:28):

The absolute numbers of stack overflow questions and projects would be interesting ...

view this post on Zulip Théo Zimmermann (Sep 13 2021 at 12:31):

Personally, I feel that it is a shame to have to dig chat logs to be able to learn about the answer to a common question, so I would tend to be in favor of asking more questions on platforms like SO. OTOH, Zulip is much more welcoming. A possible solution is for experts themselves to notice Frequently Asked Questions on Zulip and transform them into SO question / answers. SO explicitly welcomes self-answered questions. Well-written Coq questions by expert without providing an answer are also useful.

view this post on Zulip Stefan Monnier (Sep 14 2021 at 00:47):

W.r.t. SO it's important to remember that this is a company that makes a living from that data, and (probably as a consequence) does not give easy access to the data, last I checked. IOW whichever answer or question you put in there belongs to SO and it can disappear any time or be difficult to recover. It's technically fairly well organized, but I personally find it awkward to recommend such a site.

view this post on Zulip Théo Zimmermann (Sep 14 2021 at 06:35):

@Stefan Monnier SO questions and answers are automatically licensed under a CC license AFAIR. And I think there is no automatic copyright transfer to SO, so "belong to SO" is a bit of an exaggeration. Furthermore, there are many backups of SO's content, including of course the Internet Archive, but also many "clone" sites that scrap the content from SO and republish it (and given the content's licensing, it's not something that SO can oppose). So while, like with any proprietary website / technology, the platform could be removed at any time, there is no such risk with the content. I haven't had a look at their API, so I cannot challenge your claim on not giving easy access to the data, but I'm quite surprised by it.

view this post on Zulip Karl Palmskog (Sep 14 2021 at 06:39):

hmm, there is always scraping... but if there is indeed some way to get a representation of questions/answers related to Coq, one could set up recurring exports like what is done on the Zulip. Then I would feel more inclined to use SO

view this post on Zulip Karl Palmskog (Sep 14 2021 at 06:42):

from memory, there are lots of software engineering papers that use SO to perform learning, so either there is an API or really good scraping tools

view this post on Zulip Guillaume Melquiond (Sep 14 2021 at 07:03):

@Théo Zimmermann Careful there. Just because data is under a license that precludes any commercial use does not mean that this data is free of charge. Stack Overflow is under no obligation to make it free or even easy to recover. As long as they do not charge more than their administrative/technical expenses, they are entitled to this money.

view this post on Zulip Théo Zimmermann (Sep 14 2021 at 07:05):

I don't understand your reasoning. First, I don't think the NC clause of CC licenses is used for SO contents, so it doesn't preclude commercial use. Furthermore, my point was not that they are not entitled to doing whatever they want with this data but rather than they can't prevent others from backing it up and sharing it further...

view this post on Zulip Guillaume Melquiond (Sep 14 2021 at 07:06):

Obviously they can. They are paying fees for every single byte that leaves their servers. They can ask for compensation.

view this post on Zulip Guillaume Melquiond (Sep 14 2021 at 07:08):

The only thing they cannot is claim copyright on this data, that is, prevent it from appearing elsewhere.

view this post on Zulip Théo Zimmermann (Sep 14 2021 at 07:15):

I am still missing your point. Can they suddenly make users pay for their API? Sure, they can. Can they suddenly make users pay to access the website? Sure, they can. But if they do such things, it will be virtually the same thing as if they had stopped the website entirely. In the meantime, backups would still exist...

view this post on Zulip Karl Palmskog (Sep 14 2021 at 07:25):

yeah, my suggestion was basically that if we can set up a regular community backup of Coq SO questions like we have for Zulip, then we should be able to fully recommend using SO (and even asking and self-answering frequently asked questions). Obviously, this backup needs to have a permissive license.

If SO then start to use some monetization scheme we don't like, we just take "our" SO questions somewhere else.

view this post on Zulip Karl Palmskog (Sep 14 2021 at 07:31):

I completely agree that CC NonCommercial is a terrible license, but it seems they are using CC BY-SA 4.0: https://meta.stackexchange.com/questions/347758/creative-commons-licensing-ui-and-data-updates/

view this post on Zulip Paolo Giarrusso (Sep 14 2021 at 13:54):

As your link points out, they’re now using CC BY-SA 2.5, 3.0 and 4.0 (whichever was active for new content at submission time). That’s fine but it took them a while to get to this (https://meta.stackexchange.com/q/333089/210646). Which seems an extra reason to not fully trust them and have the backup contingencies some proposed.

view this post on Zulip Yannick Forster (Sep 14 2021 at 14:16):

Why does Google not index the public Zulip archives? Or are they indexed, but just don't appear often as answers for search queries? I agree with Théo that for many things a forum-like system like Discourse or SO is better, but now that the effort to make the archives available was made it's a shame that they are in practice not accessible via a Google search

view this post on Zulip Théo Zimmermann (Sep 14 2021 at 14:18):

We've just added the link to the archive on the Coq website a few days ago. Let's see if Google picks them up after a while.

view this post on Zulip Théo Zimmermann (Sep 14 2021 at 14:19):

Google would really pick them up if there started to be many many links to various discussions within these archives, which I don't think will ever happen

view this post on Zulip Karl Palmskog (Sep 14 2021 at 14:23):

right now, people are basically linking directly into the Zulip

view this post on Zulip Théo Zimmermann (Sep 14 2021 at 14:24):

Yes, and I don't see why that would change.

view this post on Zulip Kevin Buzzard (Sep 14 2021 at 16:14):

Yes this is exactly what is happening with the Lean Zulip too -- if there is some discussion on Zulip about a PR or a discussion which turns into a PR then on github people just post links back to the Zulip, which presumably Google can't see.

view this post on Zulip Yannick Forster (Sep 14 2021 at 16:31):

I googled around a bit and in the end, the problem might go away by having this (actively worked on) PR merged into Zulip: https://github.com/zulip/zulip/pull/18532 Then streams can be configured to become web-publicly readable without logging in, which should also mean that they can be indexed by search engines and that the archives become redundant.


Last updated: Aug 14 2022 at 13:02 UTC