I'd be interested in the feelings of the Coq community towards SO. Some of us over in the Lean community had a discussion about SO and Scott Morrison (one of the founders of MathOverflow) was maximally pro the idea of getting more SO questions about Lean, his argument being that this is the place where the CS people go to get their questions answered. Indeed if you talk to young people about e.g. how they learnt python they seem to not even learn it from a book nowadays, they find some web resources (or they have to learn it at university and are given some resources) and then when they get stuck on something they just google and find a good SO answer. However my opinion is that this only works because python is an extremely popular language with a massive user base. For theorem provers I am not personally convinced that the SO model works so well. The idea with the SO model is that someone comes up with a good and coherent question which can trip up beginners, an expert writes a very well-worded answer, everything gets a ton of upvotes, and then google notices and starts redirecting people there. My impression of the Coq and Lean SO questions is that they are typically not at all of this nature, many of them only get one or two upvotes and indeed it is hard to imagine that most of them will be helpful for other users. On the other hand I am active poster in the Lean Zulip and a frequent lurker here, and I see people's questions being answered far more efficiently because places like this are a much better place to have a dialogue, which somehow seems more appropriate for many of the questions these forums get (SO tends to discourage dialogue, it would far rather have coherent question -> brilliantly explained answer -> end). Morrison's argument against this is that google is not indexing the Zulip chats and that this is obviously bad. The Lean community is putting a public copy of all our Zulip chat online and I believe that the Coq community does this too, however I see no real indication that google is noticing this. I am not a computer scientist and I have no experience with trying to learn or teach any other "small" (in the scheme of things) programming languages like ours, but I am not convinced that if someone has a specific question about Coq then SO is the place to go. What I am confused about is whether it should be the place to go...
@Kevin Buzzard That's an interesting question. I agree Zulip works well for our community.
For some questions SO could also well, but I haven't seen this done much efficiently (for Coq), I believe.
My cynical remark above was about gaming the system...
Right! Both this community and the Lean community seem to do it the other way -- a SO question tagged Coq or Lean is flagged on the appropriate Zulip chat, and (this is my impression at least) people then come from Zulip to answer the SO question. I think there would be some chaos if every Zulip question was automatically posted on SO with the relevant tag :-)
I'm wondering whether this can be automated, zulip has already excellent integration of github and discourse. Has anyone tried to do this for SO?
I guess @Théo Zimmermann has been working on such issues.
We have the nice aggregator https://coq.pl-a.net/ (from https://coq.inria.fr/community.html)
However, I'm not sure how many people are using that. Is the lean zulip doing any better at the moment?
@Bas Spitters I haven't read the full discussion yet but I am the one who introduced the Discourse integration and I did the same for SO questions. Didn't you see them frequently pop up in your feed?
:face_palm: They showed up so often that I muted the stream ...
Thanks for setting that up!
Aah, this is an unexpected disadvantage :-)
The absolute numbers of stack overflow questions and projects would be interesting ...
Personally, I feel that it is a shame to have to dig chat logs to be able to learn about the answer to a common question, so I would tend to be in favor of asking more questions on platforms like SO. OTOH, Zulip is much more welcoming. A possible solution is for experts themselves to notice Frequently Asked Questions on Zulip and transform them into SO question / answers. SO explicitly welcomes self-answered questions. Well-written Coq questions by expert without providing an answer are also useful.
W.r.t. SO it's important to remember that this is a company that makes a living from that data, and (probably as a consequence) does not give easy access to the data, last I checked. IOW whichever answer or question you put in there belongs to SO and it can disappear any time or be difficult to recover. It's technically fairly well organized, but I personally find it awkward to recommend such a site.
@Stefan Monnier SO questions and answers are automatically licensed under a CC license AFAIR. And I think there is no automatic copyright transfer to SO, so "belong to SO" is a bit of an exaggeration. Furthermore, there are many backups of SO's content, including of course the Internet Archive, but also many "clone" sites that scrap the content from SO and republish it (and given the content's licensing, it's not something that SO can oppose). So while, like with any proprietary website / technology, the platform could be removed at any time, there is no such risk with the content. I haven't had a look at their API, so I cannot challenge your claim on not giving easy access to the data, but I'm quite surprised by it.
hmm, there is always scraping... but if there is indeed some way to get a representation of questions/answers related to Coq, one could set up recurring exports like what is done on the Zulip. Then I would feel more inclined to use SO
from memory, there are lots of software engineering papers that use SO to perform learning, so either there is an API or really good scraping tools
@Théo Zimmermann Careful there. Just because data is under a license that precludes any commercial use does not mean that this data is free of charge. Stack Overflow is under no obligation to make it free or even easy to recover. As long as they do not charge more than their administrative/technical expenses, they are entitled to this money.
I don't understand your reasoning. First, I don't think the NC clause of CC licenses is used for SO contents, so it doesn't preclude commercial use. Furthermore, my point was not that they are not entitled to doing whatever they want with this data but rather than they can't prevent others from backing it up and sharing it further...
Obviously they can. They are paying fees for every single byte that leaves their servers. They can ask for compensation.
The only thing they cannot is claim copyright on this data, that is, prevent it from appearing elsewhere.
I am still missing your point. Can they suddenly make users pay for their API? Sure, they can. Can they suddenly make users pay to access the website? Sure, they can. But if they do such things, it will be virtually the same thing as if they had stopped the website entirely. In the meantime, backups would still exist...
yeah, my suggestion was basically that if we can set up a regular community backup of Coq SO questions like we have for Zulip, then we should be able to fully recommend using SO (and even asking and self-answering frequently asked questions). Obviously, this backup needs to have a permissive license.
If SO then start to use some monetization scheme we don't like, we just take "our" SO questions somewhere else.
I completely agree that CC NonCommercial is a terrible license, but it seems they are using CC BY-SA 4.0: https://meta.stackexchange.com/questions/347758/creative-commons-licensing-ui-and-data-updates/
As your link points out, they’re now using CC BY-SA 2.5, 3.0 and 4.0 (whichever was active for new content at submission time). That’s fine but it took them a while to get to this (https://meta.stackexchange.com/q/333089/210646). Which seems an extra reason to not fully trust them and have the backup contingencies some proposed.
Why does Google not index the public Zulip archives? Or are they indexed, but just don't appear often as answers for search queries? I agree with Théo that for many things a forum-like system like Discourse or SO is better, but now that the effort to make the archives available was made it's a shame that they are in practice not accessible via a Google search
We've just added the link to the archive on the Coq website a few days ago. Let's see if Google picks them up after a while.
Google would really pick them up if there started to be many many links to various discussions within these archives, which I don't think will ever happen
right now, people are basically linking directly into the Zulip
Yes, and I don't see why that would change.
Yes this is exactly what is happening with the Lean Zulip too -- if there is some discussion on Zulip about a PR or a discussion which turns into a PR then on github people just post links back to the Zulip, which presumably Google can't see.
I googled around a bit and in the end, the problem might go away by having this (actively worked on) PR merged into Zulip: https://github.com/zulip/zulip/pull/18532 Then streams can be configured to become web-publicly readable without logging in, which should also mean that they can be indexed by search engines and that the archives become redundant.
Last updated: Jun 01 2023 at 13:01 UTC