Hi, I started playing around with the profiler in Coq 8.19.0 in RefinedC and I noticed that enabling the profiler can completely change the performance profile of coqc. Is this expected or is there anything one can do about this? As it currently stands, this makes the profiler basically useless for me.
Concretely, I encountered the following example.
Without profiling:
coqc examples/scheduler/src/proofs/fdsched/generated_proof_fds_init.{glob,vo}
Finished transaction in 1.256 secs (1.256u,0.s) (successful)
Finished transaction in 10.321 secs (10.3u,0.02s) (successful)
Finished transaction in 3.149 secs (3.146u,0.002s) (successful)
Finished transaction in 0.381 secs (0.381u,0.s) (successful)
Finished transaction in 1.366 secs (1.364u,0.001s) (successful)
Finished transaction in 0.481 secs (0.481u,0.s) (successful)
Finished transaction in 1.129 secs (1.129u,0.s) (successful)
34.38s 99%
With profiling:
coqc examples/scheduler/src/proofs/fdsched/generated_proof_fds_init.{glob,vo}
Finished transaction in 1.322 secs (1.255u,0.067s) (successful)
Finished transaction in 12.491 secs (12.071u,0.419s) (successful)
Finished transaction in 3.789 secs (3.622u,0.166s) (successful)
Finished transaction in 0.596 secs (0.559u,0.036s) (successful)
Finished transaction in 22.411 secs (18.686u,3.72s) (successful)
Finished transaction in 0.597 secs (0.554u,0.043s) (successful)
Finished transaction in 18.288 secs (15.266u,3.019s) (successful)
80.50s 99%
Note that the 5ht and 7th Time
commands get an order of magnitude slower from profiling.
For reference, this is the file that is compiled: (If someone is interested in replicating this locally, let me know.)
From refinedc.typing Require Import typing.
From refinedc.examples.scheduler.src.fdsched Require Import generated_code.
From refinedc.examples.scheduler.src.fdsched Require Import generated_spec.
From refinedc.examples.scheduler.src.fdsched Require Import message_extra.
From caesium Require Import builtins_specs.
From refinedc.examples.scheduler.src.fdsched Require Import priority_extra.
From refinedc.examples.scheduler.src.fdsched Require Import scheduler_extra.
From refinedc.examples.scheduler.src.fdsched Require Import fdsched_extra.
From refinedc.examples.scheduler.src.fdsched Require Import fdsched_extra.
From refinedc.typing Require Import malloc.
From refinedc.examples.scheduler.src.fdsched Require Import fdsched_extra.
Set Default Proof Using "Type".
(* Generated from [examples/scheduler/src/fdsched.c]. *)
Section proof_fds_init.
Context `{!typeG Σ} `{!globalG Σ}.
Context `{!PacketArrivals}.
Context `{!mallocG Σ}.
(* Typing proof for [fds_init]. *)
Lemma type_fds_init (global_nop_callback global_prio_level_init : loc) :
global_nop_callback ◁ᵥ global_nop_callback @ function_ptr type_of_nop_callback -∗
global_prio_level_init ◁ᵥ global_prio_level_init @ function_ptr type_of_prio_level_init -∗
typed_function (impl_fds_init global_nop_callback global_prio_level_init) type_of_fds_init.
Proof.
Local Open Scope printing_sugar.
start_function "fds_init" ([fds_refn p]) => arg_fds local_prio local_typ.
prepare_parameters (fds_refn p).
split_blocks ((
<[ "#4" :=
∃ priority : nat,
local_typ ◁ₗ uninit (it_layout i32) ∗
local_prio ◁ₗ (priority @ (int (i32))) ∗
arg_fds ◁ₗ (p @ (&own (struct (struct_fd_scheduler) [@{type} uninit (mk_array_layout i32 16) ; (0) @ (int (u64)) ; struct (struct_npfp_scheduler) [@{type} array (layout_of struct_callback) (replicate (Z.to_nat num_msg_types) 0%nat `at_type` callback_t) ; array_p (struct_message_queue) (replicate priority (@nil message_data) `at_type` message_queue) (Z.to_nat num_priorities) ; (replicate (Z.to_nat num_priorities) false) @ (prio_bitmap_t) ] ]))) ∗
⌜priority <= num_priorities⌝
]> $
<[ "#1" :=
∃ i : nat,
local_prio ◁ₗ uninit (it_layout i32) ∗
local_typ ◁ₗ (i @ (int (i32))) ∗
arg_fds ◁ₗ (p @ (&own (struct (struct_fd_scheduler) [@{type} uninit (mk_array_layout i32 16) ; (0) @ (int (u64)) ; struct (struct_npfp_scheduler) [@{type} array_p (layout_of struct_callback) (replicate i 0%nat `at_type` callback_t) (Z.to_nat num_msg_types) ; uninit (mk_array_layout struct_message_queue (Z.to_nat num_priorities)) ; (replicate (Z.to_nat num_priorities) false) @ (prio_bitmap_t) ] ]))) ∗
⌜i <= num_msg_types⌝
]> $
∅
)%I : gmap label (iProp Σ)) (
@nil Prop
).
- Time repeat liRStep; liShow.
all: print_typesystem_goal "fds_init" "#0".
- Time repeat liRStep; liShow.
all: print_typesystem_goal "fds_init" "#4".
- Time repeat liRStep; liShow.
all: print_typesystem_goal "fds_init" "#1".
Unshelve. all: unshelve_sidecond; sidecond_hook; prepare_sideconditions; normalize_and_simpl_goal; try solve_goal; unsolved_sidecond_hook.
+ Time by apply list_subequiv_split; solve_goal.
+ Time have Hprio: priority = (Z.to_nat num_priorities); solve_goal.
+ Time by apply list_subequiv_split; solve_goal.
+ Time have Hi: i = (Z.to_nat num_msg_types); solve_goal.
all: print_sidecondition_goal "fds_init".
Unshelve. all: try done; try apply: inhabitant; print_remaining_shelved_goal "fds_init".
Qed.
End proof_fds_init.
https://github.com/coq/coq/issues/18489 and https://github.com/coq/coq/pull/18483 are probably relevant here
yes, you can try eg env COQ_PROFILE_COMPONENTS=command,parse_command coqc -profile ...
the profile will still have all components in the "subtimes" annotation but only command/parse_command (and process/init) will have full events which AFAICT is the part that gets really slow
then you can look at the "subtimes" for possibly interesting components to re-enable
Thanks, this makes the two runs indeed a lot closer. Is there a good way to visualize the subtimes, e.g. in https://www.speedscope.app/ ? Or do I have to look at the json manually?
Thanks! Quick follow up, is there documentation somewhere what the components mean?
E.g. in my example, I have the following:
Conversion 5.29E+05 us, 41427 calls
intern 1.44E+03 us, 76 calls
interp 1.06E+07 us, 1 call
pretyping 5.06E+06 us, 8644 calls
synterp 2.15 us, 1 call
typeclass search 4.22E+06 us, 547 calls
unfreeze_full_state 5.96 us, 1 call
unification 4.53E+05 us, 61975 calls
I can guess what typeclass search
, unification
and Conversion
are, but what are pretyping
and interp
?
there's no documentation, they're pretty adhoc
you can grep the code of coq for NewProfile.profile "$thestring"
intepreting a user term eg 0 + 0
has 2 phases:
Nat.add Datatypes.0 Datatypes.0
)executing a command is divided in 2 phases (ignoring STM nonsense)
Import M
to get the right notations)Makes sense, thanks for the explanation!
Last updated: Oct 13 2024 at 01:02 UTC