Stream: Coq users

Topic: Performance of `Recursive Extraction`


view this post on Zulip Jason Gross (Jun 11 2022 at 03:17):

I made a change that replaced associative lists with tries and FMaps, and Recursive Extraction shot up from around 20s to around 90--95 minutes. How do I debug this? (cc @Pierre-Marie Pédrot ) (Code here)

view this post on Zulip Ali Caglayan (Jun 11 2022 at 10:48):

Could you do an strace before and after? They should have around the same lines. If they do then we know extraction isn't messing around finding files.

view this post on Zulip Ali Caglayan (Jun 11 2022 at 11:48):

We can also drop some Printfs in extraction to find out what it is spending it's time on.

view this post on Zulip Pierre-Marie Pédrot (Jun 11 2022 at 11:49):

... or perf the extraction process, you'll get a much clearer picture.

view this post on Zulip Pierre-Marie Pédrot (Jun 11 2022 at 11:50):

Usually extraction is slow because of the kind of optimizations it performs, but I haven't checked yet on this one.

view this post on Zulip Jason Gross (Jun 11 2022 at 15:05):

I'll try perfing it shortly, but note that this is with Global Unset Extraction Optimize. Should I just do perf record -g on coqc?

view this post on Zulip Pierre-Marie Pédrot (Jun 11 2022 at 15:18):

Indeed. The fact this happens even with optimization unset makes it even more interesting, also.

view this post on Zulip Pierre-Marie Pédrot (Jun 11 2022 at 15:18):

(I was not able to compile locally your branch with coq master, is it expected?)

view this post on Zulip Pierre-Marie Pédrot (Jun 11 2022 at 15:23):

(ah, no, it was me not realizing this dev had recursive submodules and I didn't update them accordingly)

view this post on Zulip Jason Gross (Jun 11 2022 at 16:02):

The top of perf report on the fast (20s) dev:

Samples: 102K of event 'cpu-clock:uhpppH', Event count (approx.): 25656500000
  Children      Self  Command  Shared Object                       Symbol
+   57.57%     0.49%  coqc     extraction_plugin.cmxs              [.] camlExtraction_plugin__Ocaml__fun_3448
+   41.45%     0.03%  coqc     coqc                                [.] camlPp__prlist_sep_lastsep_606
+   41.43%     0.10%  coqc     coqc                                [.] camlStdlib__list__map_233
+   41.12%     0.02%  coqc     extraction_plugin.cmxs              [.] camlExtraction_plugin__Common__pp_tuple_1590
+   36.24%     0.49%  coqc     coqc                                [.] camlCList__map_802
+   33.40%     3.58%  coqc     coqc                                [.] camlCList__map_loop_795
+   24.25%     0.17%  coqc     extraction_plugin.cmxs              [.] camlExtraction_plugin__Extraction__extract_maybe_term_3305
+   22.30%     0.03%  coqc     extraction_plugin.cmxs              [.] camlExtraction_plugin__Common__pp_ocaml_gen_3427
+   21.79%     0.13%  coqc     extraction_plugin.cmxs              [.] camlExtraction_plugin__Extraction__f_3431
+   21.65%     0.03%  coqc     extraction_plugin.cmxs              [.] camlExtraction_plugin__Ocaml__pp_one_pat_2041
+   21.22%     0.15%  coqc     extraction_plugin.cmxs              [.] camlExtraction_plugin__Common__pp_ocaml_local_3405
+   20.11%     0.00%  coqc     coqc                                [.] camlPp__prvecti_with_sep_647
+   20.04%     0.02%  coqc     coqc                                [.] camlStdlib__array__mapi_175
+   19.97%     0.01%  coqc     coqc                                [.] camlPp__fun_1318
+   19.91%     0.02%  coqc     extraction_plugin.cmxs              [.] camlExtraction_plugin__Ocaml__fun_3715
+   19.24%     2.73%  coqc     extraction_plugin.cmxs              [.] camlExtraction_plugin__Common__clash_3361
+   19.12%     0.47%  coqc     coqc                                [.] camlStdlib__list__fold_left_272
+   18.55%     0.27%  coqc     extraction_plugin.cmxs              [.] camlExtraction_plugin__Extraction__extract_cons_app_3309
+   18.16%     0.00%  coqc     extraction_plugin.cmxs              [.] camlExtraction_plugin__Ocaml__ppl_2955
+   18.14%     0.00%  coqc     extraction_plugin.cmxs              [.] camlExtraction_plugin__Ocaml__prlist_sep_nonempty_2942
+   17.93%     0.00%  coqc     extraction_plugin.cmxs              [.] camlExtraction_plugin__Ocaml__pp_decl_2648
+   17.70%     0.00%  coqc     extraction_plugin.cmxs              [.] camlExtraction_plugin__Ocaml__pp_function_2043
+   17.20%     0.11%  coqc     extraction_plugin.cmxs              [.] camlExtraction_plugin__Extraction__extract_term_3304
+   17.08%     0.00%  coqc     extraction_plugin.cmxs              [.] camlExtraction_plugin__Ocaml__pp_structure_elem_2905
+   17.03%     0.00%  coqc     extraction_plugin.cmxs              [.] camlExtraction_plugin__Ocaml__try_pp_structure_elem_2934
+   16.96%     0.00%  coqc     extraction_plugin.cmxs              [.] camlExtraction_plugin__Ocaml__pp_module_expr_2906
+   14.37%     0.00%  coqc     coqc                                [.] camlFlags__with_modified_ref_inner_288
+   14.21%     0.00%  coqc     coqc                                [.] camlVernacinterp__interp_gen_1036
+   14.16%     2.78%  coqc     coqc                                [.] camlStdlib__map__mem_288

view this post on Zulip Jason Gross (Jun 11 2022 at 16:04):

vs, on the first 60 seconds of the slow dev (recording data for more than that is too much for perf report on my laptop):

Samples: 232K of event 'cpu-clock:uhpppH', Event count (approx.): 58173250000
  Children      Self  Command  Shared Object                       Symbol
+   94.81%     1.09%  coqc     extraction_plugin.cmxs              [.] camlExtraction_plugin__Extraction__extract_type_2708
+   94.81%     0.24%  coqc     extraction_plugin.cmxs              [.] camlExtraction_plugin__Extraction__extract_type_app_2709
+   94.81%     0.21%  coqc     extraction_plugin.cmxs              [.] camlExtraction_plugin__Extraction__fun_4736
+   94.59%     0.54%  coqc     coqc                                [.] camlStdlib__list__fold_right_278
+   92.04%     0.00%  coqc     extraction_plugin.cmxs              [.] camlExtraction_plugin__Extraction__extract_type_cons_2713
+   91.45%     0.00%  coqc     [unknown]                           [.] 0x0000000000000001
+   90.76%     0.00%  coqc     [unknown]                           [.] 0x000000000000000d
+   90.00%     0.00%  coqc     [unknown]                           [.] 0x8d493672087e3b4d
+   57.03%    14.33%  coqc     coqc                                [.] camlConstr__map_with_binders_2478
+   56.35%    10.53%  coqc     coqc                                [.] camlCArray__map_1659
+   55.52%     0.41%  coqc     coqc                                [.] camlStdlib__array__map_154
+   55.25%     0.20%  coqc     coqc                                [.] camlConstr__map_branches_with_binders_2270
+   55.04%     0.64%  coqc     coqc                                [.] camlConstr__map_under_context_with_binders_2260
+   49.55%     0.32%  coqc     coqc                                [.] camlRetyping__sort_family_of_1961
+   38.41%     0.18%  coqc     coqc                                [.] camlRetyping__concl_of_arity_1040
+   23.74%     0.34%  coqc     coqc                                [.] camlReductionops__red_of_state_red_3904
+   19.93%     0.02%  coqc     coqc                                [.] camlReductionops__aux_3127
+   11.28%     0.54%  coqc     coqc                                [.] camlRetyping__type_of_1541
+    8.90%     0.31%  coqc     coqc                                [.] camlEnviron__lookup_constant_key_1446
+    8.89%     0.70%  coqc     coqc                                [.] camlStdlib__array__copy_116
+    8.41%     0.08%  coqc     coqc                                [.] camlRetyping__type_of_global_reference_knowing_parameters_1543+    8.38%     1.17%  coqc     coqc                                [.] caml_array_sub
+    8.19%     0.43%  coqc     extraction_plugin.cmxs              [.] camlExtraction_plugin__Mlutil__expand_2536
+    7.74%     0.28%  coqc     coqc                                [.] camlInductive__type_of_inductive_gen_1677
+    7.32%     0.25%  coqc     coqc                                [.] camlInductive__instantiate_universes_1657
+    6.68%     0.47%  coqc     extraction_plugin.cmxs              [.] camlExtraction_plugin__Extraction__mlt_env_2714
+    6.03%     0.22%  coqc     coqc                                [.] camlInductive__make_1626
+    6.01%     0.14%  coqc     coqc                                [.] camlRetyping__anomaly_on_error_483
+    5.49%     0.56%  coqc     coqc                                [.] camlStdlib__map__find_opt_280

view this post on Zulip Pierre-Marie Pédrot (Jun 11 2022 at 16:07):

what file is this exactly?

view this post on Zulip Jason Gross (Jun 11 2022 at 16:08):

Flame graph for the fast one:
perf-extract-unsaturated-solinas-fast.svg

view this post on Zulip Jason Gross (Jun 11 2022 at 16:09):

what file is this exactly?

src/ExtractionOCaml/unsaturated_solinas.v

Invocation was

COQPATH="$(pwd)/coqprime/src:$(pwd)/rupicola/bedrock2/deps/coqutil/src:$(pwd)/rupicola/src:$(pwd)/rupicola/bedrock2/bedrock2/src:$(pwd)/rupicola/bedrock2/compiler/src:$(pwd)/rupicola/bedrock2/deps/riscv-coq/src:$(pwd)/rewriter/src" command time -f "src/ExtractionOCaml/unsaturated_solinas.ml (real: %e, user: %U, sys: %S, mem: %M ko)" perf record -g -o /tmp/perf-extract-unsaturated-solinas-timeout-60.data timeout 60 "coqc"  -q -w +implicit-core-hint-db,+implicits-in-term,+non-reversible-notation,+deprecated-intros-until-0,+deprecated-focus,+unused-intro-pattern,+variable-collision,+omega-is-deprecated,+deprecated-instantiate-syntax,+non-recursive,unsupported-attributes -w -notation-overridden,-undeclared-scope,-deprecated-hint-rewrite-without-locality,-deprecated-hint-constr,-fragile-hint-constr,-native-compiler-disabled,-ambiguous-paths,-masking-absolute-name  "-w" "-deprecated-native-compiler-option" "-native-compiler" "ondemand" -R src Crypto src/ExtractionOCaml/unsaturated_solinas.v

view this post on Zulip Pierre-Marie Pédrot (Jun 11 2022 at 16:11):

I'm not used to perfing fiat-crypto too much but right now I'm seeing very suspicious stuff in the profile, before reaching that file

view this post on Zulip Pierre-Marie Pédrot (Jun 11 2022 at 16:12):

specifically, Coq passing its time in the generation of inductive schemes

view this post on Zulip Pierre-Marie Pédrot (Jun 11 2022 at 16:16):

in src/Assembly/Symbolic.v to be precise

view this post on Zulip Jason Gross (Jun 11 2022 at 16:36):

Plausibly this is because we have an inductive with almost 30 constructors in Symbolic.v, and decidable equality schemes are cubic last I checked?

view this post on Zulip Pierre-Marie Pédrot (Jun 11 2022 at 16:38):

way worse than cubic if I read correctly

view this post on Zulip Jason Gross (Jun 11 2022 at 17:00):

Which part of Symbolic.v are you seeing as slow? Do you see the same issues in

Require Import Coq.NArith.NArith.
Definition idx := N.
Local Set Decidable Equality Schemes.
Definition symbol := N.

Class OperationSize := operation_size : N.

Section S.
Implicit Type s : OperationSize.
Variant op := old s (_:symbol) | const (_ : Z) | add s | addcarry s | subborrow s | addoverflow s | neg s | shl s | shr s | sar s | rcr s | and s | or s | xor s | slice (lo sz : N) | mul s | set_slice (lo sz : N) | selectznz | iszero (* | ... *)
  | addZ | mulZ | negZ | shlZ | shrZ | andZ | orZ | xorZ | addcarryZ s | subborrowZ s.
End S.

? (All of the scheme generation feels relatively fast to me, though (on the branch but not on master) functor application is kind-of slow.)

view this post on Zulip Jason Gross (Jun 11 2022 at 17:10):

btw, if you want a target that makes just one of the extracted things and its dependencies and no more (sorry for being so late giving this to you), you can run make src/ExtractionOCaml/unsaturated_solinas.ml SKIP_BEDROCK2=1

view this post on Zulip Jason Gross (Jun 11 2022 at 17:17):

Here's the flamegraph for the 60s of the slow extraction:
perf-extract-unsaturated-solinas-slow-timeout-60.svg

view this post on Zulip Jason Gross (Jun 11 2022 at 17:45):

Pierre-Marie Pédrot said:

way worse than cubic if I read correctly

https://github.com/coq/coq/issues/8517 says cubic based on measurements I took a while ago

view this post on Zulip Jason Gross (Jun 11 2022 at 17:46):

Oh, wait, no, that just says that you told me it was cubic

view this post on Zulip Pierre-Marie Pédrot (Jun 11 2022 at 19:40):

There are two different issues at play here with scheme generation.

view this post on Zulip Pierre-Marie Pédrot (Jun 11 2022 at 19:40):

it's essentially cubic in the number of constructors, but that's not the problem here

view this post on Zulip Pierre-Marie Pédrot (Jun 11 2022 at 19:41):

rather, the algorithm to compute the dependencies on other inductive scheme is pretty much arbitrary

view this post on Zulip Pierre-Marie Pédrot (Jun 11 2022 at 19:41):

I suspect that in this case it's just crawling through huge terms, including proofs

view this post on Zulip Jason Gross (Jun 11 2022 at 19:42):

Which inductive scheme generation is slow in Symbolic.v? I don't recall any non-proof non-functor hotspots in that file

view this post on Zulip Pierre-Marie Pédrot (Jun 11 2022 at 19:44):

I interrupted this file after waiting for more than 40 minutes

view this post on Zulip Pierre-Marie Pédrot (Jun 11 2022 at 19:51):

Seems to be

Record symbolic_state := { dag_state :> dag ; symbolic_reg_state :> reg_state ; symbolic_flag_state :> flag_state ; symbolic_mem_state :> mem_state }.

view this post on Zulip Pierre-Marie Pédrot (Jun 11 2022 at 19:55):

I confirm that unsetting the equality scheme generation flag makes this instantatenous

view this post on Zulip Jason Gross (Jun 12 2022 at 01:33):

That is very strange. In https://github.com/mit-plv/fiat-crypto/commit/3f55b840657cc952007744a2c5ba38e25e17469a I added a bunch of functor instantiations to Symbolic.v which bumped the time on my machine from 15s to 25s and from 673MB RAM to 1.3GB RAM. This is on something between Coq 8.11 and 8.15. You can check our CI for timing data of this file across the versions of Coq (the end of each build step displays a table of timing information). For it to be taking 40 minutes on your machine means that something very strange is going on...

view this post on Zulip Jason Gross (Jun 12 2022 at 01:34):

The commit message for https://github.com/mit-plv/fiat-crypto/commit/5342ad973e465642645f25b072cca4051f86291e, btw, includes a table which says how long it should take to build each file

view this post on Zulip Jason Gross (Jun 12 2022 at 15:33):

@Pierre-Marie Pédrot any updates, or insights about why extract_type might be taking so long? (Should I open a Coq issue?)

view this post on Zulip Jason Gross (Jun 12 2022 at 16:57):

@Pierre-Marie Pédrot Here's a reduced example showing that Recursive Extraction is exponential: https://github.com/coq/coq/issues/16172 The hotspot in perf is different, though, not sure what to make of that.

view this post on Zulip Pierre-Marie Pédrot (Jun 12 2022 at 20:21):

wdym? extract_type is the leading issue in you perf profile

view this post on Zulip Pierre-Marie Pédrot (Jun 12 2022 at 20:22):

(I haven't managed to compile fiat-crypto with your patch yet, I can still reproduce the Symbolic.v explosion and it kills my laptop after some time)

view this post on Zulip Gaëtan Gilbert (Jun 12 2022 at 20:34):

per line timings for symbolic.v in https://gitlab.com/coq/coq/-/jobs/2396228741 show nothing interesting
there are no recent ones because fiat-crypto is broken in the bench snce last month


Last updated: Feb 05 2023 at 23:30 UTC