Stream: fiat-crypto

Topic: ECDSA


view this post on Zulip Hannes Mehnert (Jan 19 2021 at 16:18):

hey, I've finally found some time to try using fiat-crypto for ECDSA (starting with NIST P256), as suggested in https://github.com/mit-plv/fiat-crypto/issues/885 I'm now running ./word_by_word_montgomery --static 'p256' '64' '0xffffffff00000000ffffffffffffffffbce6faada7179e84f3b9cac2fc632551' -- while using the P256 curve there takes < 1 minute, this is now running since 20 CPU minutes (still < 40MB memory) without any output. the number (given in hex above) is the group order (n) -- should that be passed as polynomial instead?

Hi, thanks for your very nice work on fiat. We are using it (https://github.com/mirage/fiat) - with (not verfied) inversion, point_double and point_add function to compute ECDH P256. Now I've a...

view this post on Zulip Karl Palmskog (Jan 19 2021 at 16:19):

I think we will have to ping @Jason Gross for this one.

view this post on Zulip Hannes Mehnert (Jan 19 2021 at 16:25):

Thanks Karl -- sorry first time zulip user here -- please tell if I'm doing something wrong.

view this post on Zulip Jason Gross (Jan 19 2021 at 16:26):

I'm looking into it now, I suspect the issue is an asymptotic complexity problem in the hex string parser

view this post on Zulip Bas Spitters (Jan 19 2021 at 16:32):

@Rasmus Holdsbjerg-Larsen and @Benjamin Salling Hvass may also be interested/have insights.

view this post on Zulip Jason Gross (Jan 19 2021 at 16:45):

This is definitely a parsing issue. Time Compute parse_num false "0xffffffff00000000ffffffffffffffffbce6f". take 0.056s, but adding a to the end of the string results in it taking 2.303 s.

view this post on Zulip Hannes Mehnert (Jan 19 2021 at 16:46):

ok, 50 CPU minutes 50 MB memory, still running..

view this post on Zulip Jason Gross (Jan 19 2021 at 16:46):

And adding one more a to the string results in vm_compute stack overflowing...

view this post on Zulip Hannes Mehnert (Jan 19 2021 at 16:54):

Is there an easy way out? E.g. manual conversion of the group order from hex into a polynomial? Or hardcoding the group order number in word_by_word_montgomery.ml and recompile (I'm more familiar with OCaml than Coq...)?

view this post on Zulip Jason Gross (Jan 19 2021 at 17:08):

Ah, I figured it out. The issue is that I had the parser accepting 0x<num>e<num> as a notation for exponentiation, so it was computing 10^0x6faada7179e84f3b9cac2fc632551, which is a very very big number. (This is starting to become a theme, where taking lots of RAM and time is a result of somehow misinterpreting the user's input and treating something as an exponent when it should not be treated as such.) Anyway, I should have a commit fixing this pushed in a couple of minutes. In the meantime, if you write your number in decimal (or as any arithmetic expression where you avoid using e and E in hex numbers), that should work.

view this post on Zulip Jason Gross (Jan 19 2021 at 17:10):

Okay, I've just pushed the commit https://github.com/mit-plv/fiat-crypto/commit/e31a36d5f1b20134e67ccc5339d88f0ff3cb0f86 which should fix the issue (not fully tested)

The parser was previously accepting `e` and `E` as an exponent character in hex strings. This results in enormous exponentiation, so we disallow it. Tracked down from the report at https://coq.zu...

view this post on Zulip Hannes Mehnert (Jan 19 2021 at 17:11):

thanks Jason, I updated, running the make standalone-ocaml again, and will report back results.

view this post on Zulip Hannes Mehnert (Jan 19 2021 at 17:30):

Ok, the C code generation (see invocation on top) finished < 1 second now. Thanks! :)

view this post on Zulip Jason Gross (Jan 20 2021 at 03:15):

Great, I'm glad to have been able to solve this so easily!

view this post on Zulip Hannes Mehnert (Jan 21 2021 at 17:32):

hmm, and while trying out the generated C code, it looks like I'm unable to make good use of it (maybe due to a lack of understanding what representations are expected) -- I get a test vector from NIST (specifying a k and a k ^ -1 -- both as hex numbers). When using zarith - an OCaml multi-precision integer library (based on GMP), I'm able to let k = Z.of_string "0x..." in let kinv = Z.invert k (Z.of_string "0x <the P256 group order>") in assert (kinv = Z.of_string "0x <nist-provided kinv>").

With the generated fiat code, when attempting to do the same (convert hex to binary, apply inverse -- using the inversion_template.c), I get a different output for kinv. Now questions include: should I need to call of_bytes/from_bytes (they seem to not to anything memory-wise (eventually C-type-wise?))? what about to/from_montgomery, should I call this before passing the value to inverse? May it be a big-endian/little-endian representation issue? I have tried many of these variants (to/from_montgomery, do a reverse on the bytes before/after) with no success in getting the NIST provided kinv.

view this post on Zulip Hannes Mehnert (Jan 21 2021 at 19:15):

I looked a bit further,and expected \forall i . inverse (inverse i) = i [with the side condition that 0 <= i < m] -- and this does not hold in my code for some reason.

view this post on Zulip Hannes Mehnert (Jan 21 2021 at 19:16):

I switched back to the p256_64.c (i.e. normal 256 NIST curve generated by fiat), and then instantiate the inversion_template.c as suggested in the ECDSA issue.

view this post on Zulip Jason Gross (Jan 21 2021 at 21:14):

Hannes Mehnert said:

I switched back to the p256_64.c (i.e. normal 256 NIST curve generated by fiat), and then instantiate the inversion_template.c as suggested in the ECDSA issue.

I can't quite tell, does this solve the issue you're having?

view this post on Zulip Jason Gross (Jan 21 2021 at 21:18):

Regarding from/to bytes and from/to montgomery: the from/to bytes are things to be done when you need to get numbers from the wire or send them out. The pre and postconditions of most of the functions should tell you when you need to convert from and to montgomery. e.g., fiat_p256_mul has the post-condition eval (from_montgomery out1) mod m = (eval (from_montgomery arg1) * eval (from_montgomery arg2)) mod m and to_montgomery has the postcondition eval (from_montgomery out1) mod m = eval arg1 mod m so you would need to write something like from_montgomery (fiat_p256_mul(to_montgomery(x), to_montgomery(y))) to operate on non-Montgomery-form numbers.

view this post on Zulip Hannes Mehnert (Jan 21 2021 at 21:21):

No, I have some trouble with inversion, even with the "normal" P-256 prime (and not the group with the prime order I generated). I'll write that up tomorrow in a more concise form (including reproducible example) and put it into the GitHub issue.

thanks for your explanation how to read the pre/postconditions.

view this post on Zulip Hannes Mehnert (Jan 29 2021 at 14:46):

I figured out: (a) inversion requires an array with an additional limb set to 0, i.e. in[LIMBS + 1], in[LIMBS] = 0. (b) a from_montgomery should be preceeded before the call to inverse. with these changes, inverse(inverse x) = x.

I'm still a bit confused that -- using P256 (mod p) -- the result of 2^-1 mod p is not the same according to inverse vs Zarith's Z.invert:
0x800000007fffffff 0x8000000100000000 0xffffffff7fffffff 0xfffffffe80000000 (fiat's inverse)
0x7fffffff80000000 0x8000000000000000 0x80000000 0x0 (the big-endian output from zarith)

the input is 2, and { 0, 0, 0, 0x2 } for fiat (on a 64 bit computer)...

view this post on Zulip Jason Gross (Jan 29 2021 at 15:00):

You I think you need to run from_montgomery(inverse(to_montgomery(2))) to get 2^-1 mod p (Z.invert). Is this what you're running?

view this post on Zulip Hannes Mehnert (Jan 29 2021 at 15:08):

I tried at https://github.com/hannesm/p256-test
at 0: inv_two 7fffffff80000000 inv 7fffffff80000001 mo 8000000300000002
at 1: inv_two 8000000000000000 inv 8000000080000000 mo 800000047ffffffb
at 2: inv_two 80000000 inv 7fffffff mo fffffffc00000000
at 3: inv_two 0 inv 0 mo fffffffc80000005

view this post on Zulip Hannes Mehnert (Jan 29 2021 at 15:10):

(with inv_two being the hardcoded constant, inv before the from_montgomery, and mo the value after the from_montgomery)

view this post on Zulip Hannes Mehnert (Jan 29 2021 at 15:13):

here we go, just messed up the endianness

view this post on Zulip Jason Gross (Jan 29 2021 at 15:16):

{ 0, 0, 0, 0x2 } for fiat

Yeah, according to https://github.com/hannesm/p256-test/blob/63bc99c9530dbeac8301580d41f1fb21b3e9e5fe/p256_64.c#L15, this is 2*(1<<192)

view this post on Zulip Hannes Mehnert (Jan 29 2021 at 15:17):

with https://github.com/hannesm/p256-test/commit/0d8c8b3735f8d74aaae1862080a4ffddd8f93c40 it works, so it is 2^-1 mod p == inverse (to_montgomery(2)) [no from_montgomery needed]

view this post on Zulip Hannes Mehnert (Feb 18 2021 at 14:34):

as an update here, I use the tests from https://github.com/google/wycheproof/ for testing ECDSA signing and ECDH [see https://github.com/mirage/mirage-crypto/pull/101 if interested)

view this post on Zulip Jason Gross (Feb 18 2021 at 14:53):

Thanks for the update! I'm glad to hear P256 and P384 are working smoothly!

Are you doing 32-bit P224 or 64-bit P224 or both? If the failures are only seen in 64-bit P224, I'd expect there's some sort of misalignment of assumptions, since, unlike P256, P384, and P224 32-bit, 64-bit P224 leaves half-a-limb unused. Perhaps there's some presumption somewhere of zeroing it out.

Montgomery reduction is saturated, i.e., the number of bits per limb must equal the bitwidth of the machine. So 64-bit P256 uses 256/64=4 limbs, 32-bit P224 uses 224/32=8 limbs. 64-bit 224 wastes half a limb because 224/64=3.5. When the base-2 log of the leading coefficient of the prime is not a multiple of the bitwidth, generally unsaturated Solinas is used, so that we don't need to carry as often. Unfortunately our Montgomery generation is quartic in the number of limbs (see https://github.com/mit-plv/fiat-crypto/issues/851#issuecomment-658350696), and P521 x32 has ceil(521/32)=17 limbs, so I expect it to take about 16 minutes. (How much time does 64-bit Montgomery generation for P521 take for you? For me it takes about 2 minutes with ./src/ExtractionOCaml/word_by_word_montgomery p521 64 '2^521 - 1'.)

view this post on Zulip Hannes Mehnert (Feb 18 2021 at 14:59):

thanks for your kind answer

Are you doing 32-bit P224 or 64-bit P224 or both?

I develop with 64bit (and run the tests there), though CI (atm 256 & 384 only, since I have not pushed 224/256K1) also runs them on 32bit.

zeroing out

that may very well be the case, thanks for reminding me that there are 32bit unused

view this post on Zulip Hannes Mehnert (Feb 18 2021 at 15:00):

and thanks for the explanation. the word_by_word_montgomery for 64bit and P521 was around 5 minutes on this computer.

view this post on Zulip Jason Gross (Feb 18 2021 at 15:03):

Then I'd expect 32-bit P521 WBW Montgomery to complete within 40 minutes. (I am trying it on my machine currently with ulimit -s unlimited; it stack overflowed after 5 minutes without that.)

view this post on Zulip Bas Spitters (Feb 18 2021 at 15:05):

@Hannes Mehnert It's good to do these test, but given that fiat is provably correct.
What would cause test failures? The specification being uncomplete?

view this post on Zulip Jason Gross (Feb 18 2021 at 15:06):

I suspect it's a mismatch between the specification of the generated code and the implicit specification of the higher-level algorithm it's being plugged into. (Examples: whose responsibility it is to mask off high bits when converting from bytes; whether the representation is little-endian or big-endian; where to convert to/from montgomery)

view this post on Zulip Hannes Mehnert (Feb 18 2021 at 15:07):

An excellent question. There's a thin layer to actually do ECDH&ECDSA on top of the generated C code (plus my usage of calling the C code) -- the generated C code is basically "arithmetics in the group" - while there's still some scalar multiplication etc. to be done to do a full ECDH/ECDSA operation.

view this post on Zulip Bas Spitters (Feb 18 2021 at 15:09):

@Rasmus Holdsbjerg-Larsen and @Benjamin Salling Hvass have started to work on using bedrock2 to generate a little more of the operations. It might be relevant in your case too.

view this post on Zulip Hannes Mehnert (Feb 18 2021 at 15:15):

@Jason Gross indeed using the 32bit P224 code leads to all tests succeeding.

view this post on Zulip Jason Gross (Feb 19 2021 at 13:44):

WBW Montgomery for P521 32-bit took 221m50s on my laptop

view this post on Zulip Jason Gross (Feb 19 2021 at 13:44):

If you want the output for testing, I can send it to you, if that's helpful. It's almost 20,000 loc

view this post on Zulip Hannes Mehnert (Feb 22 2021 at 18:40):

I tried unsaturated_solinas, and figured there's no inversion for them (am I missing something?)


Last updated: Feb 06 2023 at 05:03 UTC