How much memory does the bench machine have? (What's the upper limit of what's acceptable to use when benching?)
tassi@pendulum:~$ free -g
total used free shared buffers cached
Mem: 62 27 35 0 2 20
-/+ buffers/cache: 3 59
Swap: 62 0 61
I'd say up 32G it should work reliably
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31976 jenkins 20 0 18.191g 237760 12500 S 0.0 0.4 35:58.19 java
209 root 20 0 49348 20612 20452 S 0.0 0.0 1:11.57 systemd-jo+
I guess it may be the time to restart jenkins
but I lost my sudo password, so I can't
@Enrico Tassi Thanks! (The dev I'm adding maxes out at 4.2 GB, currently.) We should probably also update Jenkins, so we can get Zulip integration, but, again, probably safer for someone with access to sudo to do this.
IIRC upgrading jenkins is never a pleasure...
Isn't Inria IT staff (the SED) that does this kind of things?
Actually the ci.inria.fr interface lets us enable at ci.inria.fr/coq-qualif jenkins 2.89.4 instead of 1.6, see https://ci.inria.fr/project/coq/cisoftware but I guess one also has to install the slave code on pendulum. No idea if the configuration of jenkins itself can be migrated automatically.
@Jason Gross Would that version have Zulip integration?
Apparently the conf can be migrated "automatically", so it may be worth trying this
Screenshot_20200520-110033.png
Indeed, the configuration can be migrated automatically, but it seems important to be able to go in and restart Jenkins manually / fix things that broke, if anything goes wrong, no?
This is at https://ci.inria.fr/coq/pluginManager/available
Jenkins itself can be upgraded automatically at https://ci.inria.fr/coq/manage
But there is a risk that plugins break, etc, so we should probably record all of the important configuration somewhere so that if it all disappears, it is not too much of a pain to remember what needs to be recreated
At a minimum, the options at https://ci.inria.fr/coq/view/opam/job/benchmark-part-of-the-branch/configure and their default values should be recorded
Okay, here's a recording if we need to recreate it
Screenshot_20200520-110609.png
Screenshot_20200520-110642.png
And the current packages are
coq-performance-tests coq-hott coq-bignums coq-mathcomp-ssreflect coq-mathcomp-fingroup coq-mathcomp-algebra coq-mathcomp-solvable coq-mathcomp-field coq-mathcomp-character coq-mathcomp-odd-order coq-math-classes coq-corn coq-flocq coq-compcert coq-geocoq coq-color coq-coqprime coq-coqutil coq-bedrock2 coq-rewriter coq-fiat-core coq-fiat-parsers coq-fiat-crypto coq-unimath coq-sf-plf coq-coquelicot coq-lambda-rust coq-verdi coq-verdi-raft coq-fourcolor coq-rewriter-perf-SuperFast
Screenshot_20200520-110926.png
Ugh, is Zulip uploading my images at low quality?
(deleted)
images seem to be stored in high quality too, one just needs to press "Open" button
Ah, okay, thanks. It's hard to tell on a phone
Bump. Any update on updating Jenkins? (And finding someone to install jq
on pendulum
?)
I got my credentials back, and I've just installed jq on pendulum.
Thanks! Now that one of the devs has credentials, should I tell Jenkins to upgrade itself? (Do you want to do any backup or anything first?)
Btw, here's a copy-paste of the current configuration: https://gist.github.com/JasonGross/3f9aa67a376c84732bc268ea4da86031
Trying to figure out how to update the -qualif bench. Previously clicking "synchronize" on https://ci.inria.fr/project/coq/cisoftware wasn't updating the benchmark-part-of-the-branch config. Now https://ci.inria.fr/coq-qualif/view/opam/job/benchmark-part-of-the-branch/ gives
com.thoughtworks.xstream.converters.ConversionException: Refusing to unmarshal securityRealm for security reasons; see https://jenkins.io/redirect/class-filter/
---- Debugging information ----
class : hudson.security.LDAPSecurityRealm
required-type : hudson.security.LDAPSecurityRealm
converter-type : hudson.util.XStream2$BlacklistedTypesConverter
path : /hudson/securityRealm
line number : 25
-------------------------------
at hudson.util.XStream2$BlacklistedTypesConverter.unmarshal(XStream2.java:546)
at com.thoughtworks.xstream.core.TreeUnmarshaller.convert(TreeUnmarshaller.java:72)
at com.thoughtworks.xstream.core.AbstractReferenceUnmarshaller.convert(AbstractReferenceUnmarshaller.java:65)
at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:66)
at hudson.util.RobustReflectionConverter.unmarshalField(RobustReflectionConverter.java:390)
at hudson.util.RobustReflectionConverter.doUnmarshal(RobustReflectionConverter.java:328)
Caused: jenkins.util.xstream.CriticalXStreamException: Refusing to unmarshal securityRealm for security reasons; see https://jenkins.io/redirect/class-filter/
---- Debugging information ----
class : hudson.security.LDAPSecurityRealm
required-type : hudson.security.LDAPSecurityRealm
converter-type : hudson.util.XStream2$BlacklistedTypesConverter
path : /hudson/securityRealm
line number : 25
------------------------------- : Refusing to unmarshal securityRealm for security reasons; see https://jenkins.io/redirect/class-filter/
---- Debugging information ----
class : hudson.security.LDAPSecurityRealm
required-type : hudson.security.LDAPSecurityRealm
converter-type : hudson.util.XStream2$BlacklistedTypesConverter
path : /hudson/securityRealm
line number : 25
-------------------------------
message : Refusing to unmarshal securityRealm for security reasons; see https://jenkins.io/redirect/class-filter/
---- Debugging information ----
class : hudson.security.LDAPSecurityRealm
required-type : hudson.security.LDAPSecurityRealm
converter-type : hudson.util.XStream2$BlacklistedTypesConverter
path : /hudson/securityRealm
line number : 25
-------------------------------
cause-exception : com.thoughtworks.xstream.converters.ConversionException
cause-message : Refusing to unmarshal securityRealm for security reasons; see https://jenkins.io/redirect/class-filter/
class : hudson.model.Hudson
required-type : hudson.security.LDAPSecurityRealm
converter-type : hudson.util.RobustReflectionConverter
path : /hudson/securityRealm
line number : 25
version : not available
-------------------------------
at hudson.util.RobustReflectionConverter.doUnmarshal(RobustReflectionConverter.java:353)
at hudson.util.RobustReflectionConverter.unmarshal(RobustReflectionConverter.java:267)
at com.thoughtworks.xstream.core.TreeUnmarshaller.convert(TreeUnmarshaller.java:72)
at com.thoughtworks.xstream.core.AbstractReferenceUnmarshaller.convert(AbstractReferenceUnmarshaller.java:65)
at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:66)
at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:50)
at com.thoughtworks.xstream.core.TreeUnmarshaller.start(TreeUnmarshaller.java:134)
at com.thoughtworks.xstream.core.AbstractTreeMarshallingStrategy.unmarshal(AbstractTreeMarshallingStrategy.java:32)
at com.thoughtworks.xstream.XStream.unmarshal(XStream.java:1189)
at hudson.util.XStream2.unmarshal(XStream2.java:161)
at hudson.util.XStream2.unmarshal(XStream2.java:132)
at com.thoughtworks.xstream.XStream.unmarshal(XStream.java:1173)
at hudson.XmlFile.unmarshal(XmlFile.java:180)
Caused: java.io.IOException: Unable to read /net/www/ci/coq-qualif/config.xml
at hudson.XmlFile.unmarshal(XmlFile.java:183)
at hudson.XmlFile.unmarshal(XmlFile.java:163)
at jenkins.model.Jenkins.loadConfig(Jenkins.java:3118)
at jenkins.model.Jenkins.access$1200(Jenkins.java:320)
at jenkins.model.Jenkins$13.run(Jenkins.java:3219)
at org.jvnet.hudson.reactor.TaskGraphBuilder$TaskImpl.run(TaskGraphBuilder.java:169)
at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:296)
at jenkins.model.Jenkins$5.runTask(Jenkins.java:1133)
at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:214)
at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:117)
at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused: org.jvnet.hudson.reactor.ReactorException
at org.jvnet.hudson.reactor.Reactor.execute(Reactor.java:282)
at jenkins.InitReactorRunner.run(InitReactorRunner.java:50)
at jenkins.model.Jenkins.executeReactor(Jenkins.java:1166)
at jenkins.model.Jenkins.<init>(Jenkins.java:966)
at hudson.model.Hudson.<init>(Hudson.java:85)
at hudson.model.Hudson.<init>(Hudson.java:81)
at hudson.WebAppMain$3.run(WebAppMain.java:233)
Caused: hudson.util.HudsonFailedToLoad
at hudson.WebAppMain$3.run(WebAppMain.java:250)
(In fact, I get the same thing at https://ci.inria.fr/coq-qualif/ )
Any ideas what's going on? (cc @Maxime Dénès @Pierre-Marie Pédrot )
https://issues.jenkins-ci.org/browse/JENKINS-60158
There are a few other hits on google
No idea, sorry.
According to https://www.jenkins.io/blog/2018/03/15/jep-200-lts/ , it looks like this might be a genuine bug in Jenkins, so I've reported it at https://issues.jenkins-ci.org/browse/JENKINS-62571
It looks like if someone has ssh access to the qualif server, we might be able to get Jenkins to start on it by manually starting Jenkins and adding the -Dhudson.remoting.ClassFilter=hudson.security.LDAPSecurityRealm
flag?
Is there anyway to downgrade the -qualif server? I only see versions listed that are newer than the current version.
Pendulum is offline (cf https://github.com/coq/coq/pull/11977#issuecomment-649506156). What happened? / How do we get it back?
The problem seems pretty serious, I don't have access to it. I'm probably going to the office tomorrow, to fix this issue.
@Maxime Dénès were you able to get physical access on the machine?
I'm there right now, investigating
Note that I'm doing some experiments with running the bench on Gitlab, so it could induce some variability on other benchmarks running at the same time
It should now be back online, but again, my experiments could disturb a bit the measurements.
May I have the right to run benchmarks? My GitLab.com ID is pi8027. I need it for PR #14386.
granted
Thanks!
now vst is broken
# COQC compcert/lib/IEEE754_extra.v
# File "./compcert/lib/IEEE754_extra.v", line 35, characters 21-22:
# Error: The reference Z was not found in the current environment.
probably it gets the wrong flocq, not that I know which one is the right one
Would it be possible (and permissible) to have coqbot / the bench script tag me when coq-fiat-crypto-with-bedrock fails on the bench?
@Jason Gross I can do that when I impelement a failed jobs list.
I have some improvements to the bench here that need looking at: https://github.com/coq/coq/pull/16680
Comes with a corresponding coqbot PR too: https://github.com/coq/bot/pull/240
I'm increasing the line stats count to 50 and adding pdiff tables too.
Ali Caglayan said:
Jason Gross I can do that when I impelement a failed jobs list.
Done. Waiting for a few PRs to go before I submit the PR tho.
Thank you!
Last updated: Dec 07 2023 at 18:01 UTC