Stream: Machine learning and automation

Topic: Proving things about neural nets


view this post on Zulip Jason Gross (Aug 04 2023 at 03:48):

Is anyone [else] working on proving things in Coq about neural nets?

view this post on Zulip Michael Soegtrop (Aug 04 2023 at 07:19):

No, but I am working on numerics, including numerical linear algebra, so I am not that far away.

view this post on Zulip Karl Palmskog (Aug 04 2023 at 09:51):

probably insiders know about this work already, but for other people reading this the following might be interesting: http://ace.cs.ohio.edu/~gstewart/mlcert.html

In particular, this "prove[s] generalization bounds [in Coq] on neural networks
trained using TensorFlow on the extended MNIST data set" (https://github.com/OUPL/MLCert)

view this post on Zulip Karl Palmskog (Aug 04 2023 at 09:51):

(I don't work in this area myself, but try to follow it)

view this post on Zulip Karl Palmskog (Aug 04 2023 at 10:25):

the MLCert website is apparently not updated with the latest PLDI paper/code that looks at verification of sampling as used in TensorFlow.

view this post on Zulip Paolo Giarrusso (Aug 04 2023 at 12:37):

http://ace.cs.ohio.edu/~gstewart/mlcert.html seems down?

view this post on Zulip Karl Palmskog (Aug 04 2023 at 12:39):

works for me?

view this post on Zulip Paolo Giarrusso (Aug 04 2023 at 12:43):

Good for you, but tried from multiple VPN locations / browsers / hosts, _all_ fail.

view this post on Zulip Karl Palmskog (Aug 04 2023 at 12:44):

latest archive snapshot: https://web.archive.org/web/20230603202215/http://ace.cs.ohio.edu/~gstewart/mlcert.html

view this post on Zulip Michael Soegtrop (Aug 04 2023 at 12:50):

Afaik Gordon is now with BedRock, so it is natural that his university page was shut down (the complete personal page is gone).

view this post on Zulip Karl Palmskog (Aug 04 2023 at 12:51):

GitHub at least proves more stable than university pages. It used to be that university pages were the most stable

view this post on Zulip Karl Palmskog (Aug 04 2023 at 12:53):

seemingly open-access canonical version of the 2019 paper: https://ojs.aaai.org/index.php/AAAI/article/view/4115

view this post on Zulip Karl Palmskog (Aug 04 2023 at 17:54):

sigh, they must have pulled the ohio.edu website just today, after I posted the URL

view this post on Zulip Jason Gross (Aug 04 2023 at 18:31):

@Michael Soegtrop Where's your codebase / what bits are you working on? Should we de-duplicate work?

I have some code here on arbitrary-rank tensors, indexing wrap-around, reshaping, broadcasting, round-tripping through primitive arrays (so computations can be forced at specific points to avoid duplicated work) and some other basic operations, some work on the fancy slicing that python supports including index wrap-around, slicing by ranges with and without steps, slicing by tensors of indices, and leaving implicit the leading ranks; and some preliminary support for Einstein summation notation. I also have a bunch of proofs of Proper instances as well as some development of heterogenous and dependently-typed Proper instances that I'm using to switch between doing linear algebra on primitive floats vs linear algebra on Flocq floats. (I also have a small development of for loops on uint63 based on proving well-foundedness of ltb, which I'm currently using for sum, mean, var, etc) (I also have a development of ln and exp on primitive floats that currently has no proofs, that I'd be happy to replace with something more proven as long as its as fast)

Some design choices I'm not sure about:

view this post on Zulip Jason Gross (Aug 04 2023 at 18:31):

@Karl Palmskog I was not aware of that, and that looks like it might overlap significantly with what I'm working on currently, thanks!

view this post on Zulip Jason Gross (Aug 04 2023 at 18:31):

MLCert doesn't support softmax, though, it seem?

view this post on Zulip Jason Gross (Aug 04 2023 at 18:33):

Also it's for Coq 8.8 :-/

view this post on Zulip Jason Gross (Aug 04 2023 at 18:33):

Maybe I should try to update it to 8.18 and get it in Coq's CI / opam?

view this post on Zulip Karl Palmskog (Aug 04 2023 at 18:42):

@Jason Gross we have multiple people here on the Zulip from Bedrock where Gordon Stewart currently works. Maybe you could get in touch with Gordon about porting to 8.18, either directly or through them? We'd be happy to host MLCert in Coq-community if you get permission from authors (edit: I found Gordon on the Zulip)

view this post on Zulip Karl Palmskog (Aug 04 2023 at 18:46):

@Gordon Stewart what is the status of MLCert (https://github.com/OUPL/MLCert)? Would you be open to getting it ported (by others) to recent Coq, and possibly having it hosted in Coq-community for future maintenance? See discussion above in this topic by Jason Gross.

view this post on Zulip Karl Palmskog (Aug 04 2023 at 18:50):

@Jason Gross there is also this work in Lean that may be adjacent: https://arxiv.org/abs/1911.00385

We present a formal proof in Lean of probably approximately correct (PAC) learnability of the concept class of decision stumps. This classic result in machine learning theory derives a bound on error probabilities for a simple type of classifier. Though such a proof appears simple on paper, analytic and measure-theoretic subtleties arise when carrying it out fully formally. Our proof is structured so as to separate reasoning about deterministic properties of a learning function from proofs of measurability and analysis of probabilities.

https://github.com/jtristan/stump-learnable

view this post on Zulip Patrick Nicodemus (Aug 04 2023 at 19:18):

With machine learning in particular it seems that the culture in that field is about moving quickly, setting up something that seems to work, publishing a conference paper and moving on to the next idea. The major libraries are all in Python, which is unsuitable for static analysis. It seems that rapid prototyping and exploration are more important than correctness in machine learning. I wonder if this work will be appreciated as valuable within that community, it seems like a hard sell.

Honestly insistence on correctness may be seen as splitting hairs. Many statistical fallacies are quite common and it's difficult to get people to take them seriously, i.e. https://arxiv.org/pdf/1603.05766

view this post on Zulip Patrick Nicodemus (Aug 04 2023 at 19:22):

"The work is only exploratory and used as a guide to see what opportunities are possibly worth investigating, so it doesn't have to be perfect"

view this post on Zulip Jason Gross (Aug 04 2023 at 20:01):

I'm trying to understand the gap between what I'm trying to do and what others have done, and trying to figure out if I should be reusing their code, etc. (Insights / opinions welcome!)
What I'm trying to do:

view this post on Zulip Jason Gross (Aug 04 2023 at 20:03):

It looks like MLCert proves expected(?) generalization bounds on the outputs of training procedures. The theorem (and proof???) seem to be independent of the actual weights?

view this post on Zulip Jason Gross (Aug 04 2023 at 20:09):

It looks like there's code for doing bounds analysis on the actual parameters in NNCert/net.v, but I can't tell if/how it gets run

view this post on Zulip Karl Palmskog (Aug 04 2023 at 20:11):

again not familiar with details, but if I remember correctly this project was partially developed to do machine learning verification (not sure to what extent it got applied): https://github.com/jtassarotti/coq-proba

view this post on Zulip Jason Gross (Aug 04 2023 at 20:13):

I was also looking at the cited "Developing Bug-Free Machine Learning Systems With Formal Mathematics" by Selsam et al in Lean, and it looks like they handwave the difference between floats and reals and axiomatize matrix semantics (which is maybe fine, and was a thing I was considering doing as a first pass, though I'm interested in spending a bit more time trying to get a formalization that actually does floats correctly without assuming false)

view this post on Zulip Karl Palmskog (Aug 04 2023 at 20:15):

do you already know about VCFloat? https://github.com/VeriNum/vcfloat

I think the slogan is that it's like Gappa, but completely inside Coq

view this post on Zulip Gordon Stewart (Aug 04 2023 at 20:16):

MLCert isn't actively maintained but I would be happy to see it ported. My
student Alexander Bagnall was the primary developer. I will try to loop him
in (not sure if he's on zulip yet).

view this post on Zulip Jason Gross (Aug 04 2023 at 20:17):

@Patrick Nicodemus The people working in ML alignment/steerability/safety are interested in formal verification, the main question, I think, is if formal verification is powerful enough (yet) to prove anything interesting at all about NNs.

view this post on Zulip Karl Palmskog (Aug 04 2023 at 20:20):

there is this quite general (but abstract) result about NNs proved in Isabelle/HOL (Expressiveness of Deep Learning): https://link.springer.com/article/10.1007/s10817-018-9481-5

Not sure how relevant it is for practical NN verification.

view this post on Zulip Jason Gross (Aug 04 2023 at 20:33):

@Karl Palmskog I haven't seen vcfloat, thanks! Right now what I have is an expression in terms of Coq's primitive floats. I've managed to prove it equal to the same expression in terms of flocq's binary_floats. My current goal is

  1. prove that for a particular range of inputs, no step of the computation results in nan, inf, neg_inf
  2. split the computation in the middle to decompose the proof. In more detail, my computation can be represented as f(r(x, y), ov(r(x, y)), kv(r(x, y))), and I want to show that (a) certain bounds hold on kv(r(x, y)) as a function of x - y (just by brute force enumeration, at first), and (b) whenever kv(r(x, y)) is within particular bounds, then f is within certain other bounds.

Can vcfloat help with these?

view this post on Zulip Gordon Stewart (Aug 04 2023 at 20:35):

Sorry about the MLCert website going down. I don't work at the university
anymore and didn't receive notice that the site/server were being pulled.

view this post on Zulip Jason Gross (Aug 04 2023 at 20:39):

What's the relationship between vcfloat and coq-interval?

view this post on Zulip Karl Palmskog (Aug 04 2023 at 20:49):

I've only used Interval, but from what I remember from Andrew Appel's explanation to me, Interval is about reasoning about reals directly, i.e., your goals will only mention real numbers (I think the underlying machinery uses Coq's primitive floats). In contrast, VCFloat and Gappa (https://gitlab.inria.fr/gappa/coq) are about establishing connections between floats from Flocq and reals.

view this post on Zulip Karl Palmskog (Aug 04 2023 at 20:50):

VCFloat actually uses significant parts of Interval under the hood, specifically some parts that are not exposed to regular Interval users.

view this post on Zulip Karl Palmskog (Aug 04 2023 at 20:52):

proving absence of nan/inf/neg_inf seems (from short discussion with Andrew) like something VCFloat or Coq-Gappa should be able to do, but I can unfortunately only refer to manual and papers here: https://verinum.org/vcfloat/

view this post on Zulip Michael Soegtrop (Aug 05 2023 at 08:11):

@Jason Gross : unfortunately non of my work on verification of numerical C code is published - it is all about verification of proprietary devices and a correctness proof obviously discloses how the stuff works. But I am available to share experience.

Regarding VCfloat vs gappa: they serve a similar purpose: prove the deviation between an infinite precision and a float (or fixed point) computation. I would say that gappa is substantially more powerful than VCfloat in the sense that it can handle substantially higher complexity in a reasonable time. It might be though that for rather simple goals VCfloat is faster, because it doesn't need the external process. I guess that there are cases where gappa finds tighter bounds and there might also be cases where VCfloat finds tighter bounds (gappa is sometimes 1 LSB off the bound I can prove manually and the way VCfloat works makes me believe it could handle this in some cases). Btw.: you don't have to trust external tools when using gappa - gappa creates a Coq proof script. Gappa is a bit awkward to use though and has some strange restrictions. E.g. the bounds you give must be exactly representable in binary floats of the precision you specify. This means it is a bad idea to give bounds which are negative powers of 10. I am using VCfloat mostly when I have small terms with explicit epsilons in them. For larger terms, say higher order polynomial approximations, I tend to use gappa.

view this post on Zulip Michael Soegtrop (Aug 05 2023 at 08:21):

@Jason Gross : can you point me to the code the two above questions refer to?

For absence of nan/inf for functions with not too many arguments I would use coq-interval to prove ranges in ℝ and then gappa to prove that the computation in floats doesn't deviate too much from the computation in ℝ. For proving deviations of approximations from non elementary functions coq-interval is only usable for uni-variate functions, since it can do Taylor or differential analysis only in one variable. For proving bounds for non overflow questions, it can handle larger numbers of variables - it does bisection then in these variables. This can work well even for quite a few variables (I think the max I had was 6) if you need almost no bisections to prove your goal - that is if your bounds are rather loose. If your bounds are rather tight, obviously the effort of bisection grows exponentially with the number of variables.

view this post on Zulip Karl Palmskog (Aug 05 2023 at 09:27):

fair warning that VCFloat is broken on Interval 4.8.0 (also the VCFloat repo last I checked). This is due to changes in Interval internals in 4.8.0. The latest VCFloat released package has bounds to reflect this.

view this post on Zulip Gordon Stewart (Aug 05 2023 at 19:56):

Just as a bit of context, the main goal of MLCert/NNCert was proving generalization bounds on learning procedures modeled as probabilistic programs. The bounds were statistical in nature, applying some results from learning theory like Hoeffding's inequality. This meant that they depended on the size of the training set and the size of the parameter space of the models being learned but not on specifics of the learning procedure itself. While there are interesting aspects of this approach, I'm not sure how well it scales well to large networks (the Limitations section of the paper explains some of the concerns, such as reliance on iid assumptions and on large training sets relative to model sizes).

We do instantiate the theorems to neural networks (see the NNCert directory). But our method may not be totally suitable for Jason's purpose. In particular, we axiomatize some things (floats, vectors), use dyadic rationals in some other places, and compute training and test error only under extraction (i.e., not in Coq itself via something like native_compute). Also, as far as I can recall right now, we supported only ReLU units, not softmax.

view this post on Zulip Karl Palmskog (Aug 05 2023 at 20:25):

@Gordon Stewart I don't see any mention of license in the MLCert repo except for the dataset.py file. If Jason sees potential for reuse of the project, would you consider licensing it under some open source license so the Coq community could benefit? We give some recommendations here.

view this post on Zulip Gordon Stewart (Aug 05 2023 at 20:28):

Sure, I'd be happy to license it (probably under a permissive license like MIT). Not doing so was an oversight. Right now, the project has no CI and doesn't build with current Coq (as Jason noted), so there are other problems that would also need to be fixed to make it re-usable :-/

view this post on Zulip Karl Palmskog (Aug 05 2023 at 20:32):

sounds good. I'd say the ball is in Jason's court w.r.t. if he wants to spend time updating it then (given what has been said about details on things like relu vs. softmax, etc.).

view this post on Zulip Jason Gross (Aug 06 2023 at 00:15):

@Michael Soegtrop Are gappa / vcfloat reflective? It looks like gappa is not, and I can't imagine this going well on my example, given that it takes cbv 4s just to execute the function on some arbitrarily picked inputs, using primitive floats the entire time. (The vm takes only 0.2 s, and lazy takes about 2s, but I can't imagine Ltac being less than a couple orders magnitude slower than reduction.)

The code for the first question is here and here. Any insight / advice / help is welcome! The code for the second question doesn't really exist yet; I figured I should first figure out how to do simple manipulations like turning abs (x / y - z) < w into abs (x - z * y) < w*y before trying to split the more complicated operation.

or proving bounds for non overflow questions, it can handle larger numbers of variables - it does bisection then in these variables. This can work well even for quite a few variables (I think the max I had was 6) if you need almost no bisections to prove your goal - that is if your bounds are rather loose.

Are these variables like context variables, or variables like "intermediate values used multiple times"? I have (depending on how you want to look at it) two Fin.t 64-valued variables; or 128 bool-valued variables; or 64 float-valued variables that I can probably manually craft ranges on. (I am one-hot-encoding two tokens between 0 and 64 into a 32-dimensional space.) But as far as intermediate values used multiple times, I expect I have at least a couple hundred, possibly as many as tens of thousands. (Intermediates used at least once number at least in the tens of thousands.)

When are bisections needed? If I only need bisection to increase precision, then I expect that I won't need any bisections whatsoever. My numbers never exceed a couple hundred.

view this post on Zulip Jason Gross (Aug 06 2023 at 00:18):

@Karl Palmskog @Gordon Stewart It seems like MLCert doesn't overlap enough with my goals (proving things about concrete NNs in Coq, being able to efficiently evaluate transformers with softmax, etc, and make use of the native compiler) to make code reuse make sense. But I expect I'll at some point be interested in the training procedures as well, so I'm glad to have taken a look at MLCert!

view this post on Zulip Jason Gross (Aug 06 2023 at 00:28):

Karl Palmskog said:

fair warning that VCFloat is broken on Interval 4.8.0 (also the VCFloat repo last I checked). This is due to changes in Interval internals in 4.8.0. The latest VCFloat released package has bounds to reflect this.

I've proposed a fix at https://github.com/VeriNum/vcfloat/pull/17

view this post on Zulip Jason Gross (Aug 06 2023 at 00:29):

@Gordon Stewart Just to make sure, am I right that your NN representation doesn't have support for reusing computed intermediates / for let-binding intermediate values? (I think this is a must-have for transformer skip-connections.)

view this post on Zulip Gordon Stewart (Aug 06 2023 at 10:05):

@Jason Gross Yes, that's right. We didn't support skip connections.

view this post on Zulip Michael Soegtrop (Aug 07 2023 at 07:15):

@Jason Gross : gappa is an early work of Guillaume and written in C++. The tactic creates a gappa input file and calls the gappa executable (a separate opam package) which creates a Coq proof term in text form. Note that gappa itself is a bit more flexible than the tactic but if one sticks to the rules I didn't fins a reason yet to write gappa files manually. For VCFloat I can't tell, but I don't think it is reflexive. As far as I know It mostly works by calling other reflective tactics like field to rearrange the terms to get the epsilon terms to the top. I will have a look at your cases later today. As I said I used VCfloat mostly for more analytic situations, where its way of working can be convenient.

view this post on Zulip Michael Soegtrop (Aug 07 2023 at 08:11):

Regarding the variable count:

For coq-interval: In general (afaik) it is so that if a variable occurs only once in a term, nothing bad can happen in interval arithmetic say as would in x-x, which is obviously 0, but interval arithmetic won't find this. In this sense (again afaik) coq-interval can handle only one variable occurring multiple times and requiring differential analysis to compute sufficiently tight bounds (as in x-x). For the remaining variables, the situation must be solvable by interval bisection, which wouldn't give you much in the x-x case (the error decreases only linearly with the bisection effort), but is can be quite effective.

For gappa: it is used to compute the difference between terms with and without roundoff error. Round off errors are usually uncorrelated even between different usages of the same variable. Say if you compute (ax+b)(c*x+d) for different constants a,b,c,d and variable x, the rounding error of the two float additions will be quite uncorrelated. I don't think it is generally feasible to exploit such dependencies, so I don't think gappa attempts to do this. So far gappa it shouldn't matter if variables occur multiple times. I guess what counts is the number of rounding operations you have in one term. Where the limit is here I can't tell.

view this post on Zulip Michael Soegtrop (Aug 07 2023 at 08:19):

@Guillaume Melquiond : Jason is working on verification of neural network floating point code and is exploring the capabilities of the various proof tools to help with reasoning about floats. We discussed the questions how many variables coq-interval (for proving non overflow) and gappa (for proving round off error) can handle. As far as I understand Jason he has two float variables, but they occur many times in the terms (at least a few 100). Maybe you can review my above comments.

view this post on Zulip Jason Gross (Aug 07 2023 at 17:32):

I don't think "two float variables" is accurate. I have two length-32 float vector variables, each of which can take on 64 distinct values, though I'm currently planning to treat this as 64 float variables with a bound that is the smallest interval containing all 64 possible (concrete, known) values.

view this post on Zulip Jason Gross (Aug 07 2023 at 17:34):

(You could say that I have two "integer index" variables which just index into a float matrix)

view this post on Zulip Jason Gross (Aug 08 2023 at 17:02):

Hm, it seems that vcfloat's reflective expr language doesn't support let bindings?

view this post on Zulip Jason Gross (Aug 08 2023 at 17:28):

And it looks like it also doesn't support things like min, max, cast to integer, explicit tests for nan, etc?

view this post on Zulip Jason Rute (Aug 09 2023 at 17:06):

It is not floating point neural networks as far as I’m aware, but https://github.com/IBM/FormalML has a number of formalizations of theorems in probability theory and machine learning. They formally prove the correctness of various RL algorithms (not sure if any use deep RL), and Dvoretzky's approximation theorem which can be used to prove theorems about stochastic gradient descent. I’m happy to connect you with the authors.

view this post on Zulip Jason Gross (Aug 18 2023 at 20:42):

@Michael Soegtrop @Guillaume Melquiond (and others) If you're interested in taking a look at the concrete goal I'm struggling to prove, I factored out a theorem here saying that my program instantiated on Flocq's binary_floats is equal to the same program instantiated on R up to rounding error. (This file and its dependencies should build fine, but there are some other files in the repo that may run out of stack space or be very slow)

view this post on Zulip Michael Soegtrop (Aug 21 2023 at 10:06):

@Jason Gross : I didn't have time yet to look at it (working on a Coq Platform release), but I will have a look in the next days.

view this post on Zulip Bas Spitters (Sep 05 2023 at 10:04):

AITP is happening this week. Not much information online. Is anyone from the Coq side involved?
http://aitp-conference.org/2023/

view this post on Zulip Patrick Nicodemus (Sep 05 2023 at 20:40):

I don't know if you mean affiliated with Coq/Inria itself or Coq users but Talia Ringer is listed there and she uses Coq.

view this post on Zulip Patrick Nicodemus (Sep 05 2023 at 20:42):

I see that Liao Zhang is giving a talk on inductive logic programming in interactive theorem proving. That could potentially be relevant to Coq by way of Elpi.
I personally think that it would be interesting to try and implement some basic concepts of inductive logic programming in Elpi and see if you can learn patterns in proofs that way. But I don't know very much about inductive logic programming and I don't have enough of a clear idea of what we would get out of this.

view this post on Zulip Patrick Nicodemus (Sep 05 2023 at 20:45):

extended_abstract.pdf
I wrote an extended abstract for AITP 2023 but I didn't submit it as I felt I didn't know enough about ILP to bring the idea to fruition and I didn't have a confident vision of what this would look like in practice, I didn't have a clear enough picture of the end goal.

view this post on Zulip Abhishek Anand (Oct 09 2023 at 23:08):

Bas Spitters said:

AITP is happening this week. Not much information online. Is anyone from the Coq side involved?
http://aitp-conference.org/2023/

where the talks recorded? are the videos available to the public?

view this post on Zulip Bas Spitters (Oct 10 2023 at 09:25):

The slides are available on the website.


Last updated: May 24 2024 at 23:01 UTC