Skip to content

Add HOL Light proof for aarch64 poly_chknorm#942

Merged
mkannwischer merged 2 commits intomainfrom
hol-arm-chknorm
Mar 20, 2026
Merged

Add HOL Light proof for aarch64 poly_chknorm#942
mkannwischer merged 2 commits intomainfrom
hol-arm-chknorm

Conversation

@mkannwischer
Copy link
Copy Markdown
Contributor

@mkannwischer mkannwischer commented Feb 3, 2026

Add a HOL Light functional correctness proof for the aarch64 ML-DSA function poly_chknorm, which checks whether any polynomial coefficient has absolute value >= a given bound.

The correctness proof shows that the assembly computes bitval(?i. i < 256 /\ abs(ival(x i)) >= ival bound)

TODO:

  • Merge 3 missing instructions into s2n-bignum - I opened a PR here
  • CBMC spec and proof

@oqs-bot
Copy link
Copy Markdown
Contributor

oqs-bot commented Feb 3, 2026

CBMC Results (ML-DSA-87)

Full Results (176 proofs)
Proof Status Current Previous Change
**TOTAL** 2586s 2658s -2.7%
sign_verify_internal 315s 337s -7%
polyvecl_pointwise_acc_montgomery_c 258s 270s -4%
mld_attempt_signature_generation 232s 233s -0%
polyvec_matrix_expand 169s 177s -5%
rej_uniform_native 140s 147s -5%
poly_pointwise_montgomery_c 139s 156s -11%
mld_invntt_layer 93s 91s +2%
polyvec_matrix_expand_serial 80s 81s -1%
mld_ct_memcmp 74s 79s -6%
polyveck_decompose 57s 55s +4%
mld_ntt_layer 53s 52s +2%
sign_signature_internal 52s 51s +2%
polymat_permute_bitrev_to_custom 45s 46s -2%
keccak_squeezeblocks_x4 43s 41s +5%
mld_compute_t0_t1_tr_from_sk_components 23s 26s -12%
rej_uniform 21s 23s -9%
fqmul 19s 19s +0%
poly_chknorm_c 19s 20s -5%
poly_uniform_eta_4x 16s 15s +7%
polyeta_unpack 16s 17s -6%
poly_uniform_4x 15s 18s -17%
rej_uniform_c 15s 16s -6%
polyt0_unpack 14s 13s +8%
poly_add 13s 11s +18%
keccakf1600x4_permute_native 12s 12s +0%
mld_ntt_butterfly_block 12s 12s +0%
polyveck_use_hint 12s 11s +9%
keccak_absorb_once_x4 11s 10s +10%
mld_polyvecl_permute_bitrev_to_custom_native 11s 12s -8%
poly_decompose_c 11s 9s +22%
polyvec_matrix_pointwise_montgomery 10s 11s -9%
polyveck_caddq 10s 10s +0%
polyvecl_ntt 10s 9s +11%
sign_pk_from_sk 10s 7s +43%
polyveck_add 9s 11s -18%
polyveck_power2round 9s 8s +12%
keccakf1600x4_extract_bytes 8s 2s +300%
polyveck_pointwise_poly_montgomery 8s 9s -11%
polyveck_reduce 8s 10s -20%
polyz_unpack_c 8s 10s -20%
keccak_absorb 7s 9s -22%
keccakf1600_permute 7s 7s +0%
keccakf1600_permute_native 7s 9s -22%
mld_check_pct 7s 8s -12%
poly_invntt_tomont_c 7s 7s +0%
polyveck_invntt_tomont 7s 10s -30%
polyveck_ntt 7s 8s -12%
polyveck_sub 7s 7s +0%
sign_keypair_internal 7s 7s +0%
mld_compute_pack_z 6s 9s -33%
mld_ct_cmask_nonzero_u32 6s 3s +100%
mld_sample_s1_s2_serial 6s 7s -14%
poly_caddq_c 6s 5s +20%
polyveck_make_hint 6s 8s -25%
shake256_finalize 6s 3s +100%
sign 6s 8s -25%
sign_open 6s 4s +50%
unpack_hints 6s 6s +0%
keccak_finalize 5s 1s +400%
mld_ct_cmask_neg_i32 5s 3s +67%
mld_sample_s1_s2 5s 8s -38%
mld_value_barrier_u32 5s 2s +150%
montgomery_reduce 5s 4s +25%
poly_challenge 5s 3s +67%
poly_uniform_eta 5s 7s -29%
polyveck_chknorm 5s 4s +25%
polyveck_shiftl 5s 7s -29%
polyveck_unpack_eta 5s 4s +25%
polyveck_unpack_t0 5s 5s +0%
polyvecl_pointwise_acc_montgomery 5s 2s +150%
polyvecl_uniform_gamma1 5s 5s +0%
polyvecl_uniform_gamma1_serial 5s 6s -17%
rej_eta_c 5s 4s +25%
sign_verify_extmu 5s 3s +67%
sign_verify_pre_hash_shake256 5s 4s +25%
unpack_pk 5s 2s +150%
intt_native_x86_64 4s 3s +33%
make_hint 4s 5s -20%
mld_ct_cmask_nonzero_u8 4s 3s +33%
mld_ct_get_optblocker_u32 4s 2s +100%
mld_h 4s 5s -20%
mld_value_barrier_i64 4s 2s +100%
ntt_native_x86_64 4s 3s +33%
pack_pk 4s 2s +100%
pack_sig_c_h 4s 3s +33%
poly_caddq 4s 4s +0%
poly_chknorm 4s 4s +0%
poly_decompose_native 4s 4s +0%
poly_invntt_tomont_native 4s 2s +100%
poly_ntt_native 4s 4s +0%
poly_use_hint 4s 3s +33%
polyt0_pack 4s 4s +0%
polyt1_pack 4s 2s +100%
polyvecl_pointwise_acc_montgomery_native 4s 4s +0%
polyvecl_unpack_eta 4s 4s +0%
reduce32 4s 3s +33%
shake128_finalize 4s 3s +33%
shake128_init 4s 3s +33%
shake256_absorb 4s 3s +33%
sign_keypair 4s 6s -33%
sign_signature 4s 4s +0%
sign_signature_extmu 4s 6s -33%
sign_signature_pre_hash_shake256 4s 4s +0%
sign_verify 4s 4s +0%
sign_verify_pre_hash_internal 4s 3s +33%
unpack_sk 4s 5s -20%
caddq 3s 5s -40%
decompose 3s 4s -25%
fqscale 3s 2s +50%
keccak_init 3s 4s -25%
keccak_squeeze 3s 3s +0%
keccakf1600_extract_bytes (big endian) 3s 4s -25%
keccakf1600_xor_bytes 3s 1s +200%
keccakf1600x4_permute 3s 2s +50%
mld_ct_abs_i32 3s 1s +200%
mld_ct_get_optblocker_i64 3s 3s +0%
mld_prepare_domain_separation_prefix 3s 4s -25%
mld_value_barrier_u8 3s 3s +0%
poly_caddq_native 3s 4s -25%
poly_caddq_native_aarch64 3s 2s +50%
poly_chknorm_native 3s 3s +0%
poly_chknorm_native_aarch64 3s - new
poly_decompose 3s 2s +50%
poly_invntt_tomont 3s 3s +0%
poly_pointwise_montgomery_native 3s 3s +0%
poly_power2round 3s 6s -50%
poly_reduce 3s 4s -25%
poly_uniform_gamma1 3s 2s +50%
poly_uniform_gamma1_4x 3s 3s +0%
poly_use_hint_c 3s 3s +0%
polyeta_pack 3s 3s +0%
polyveck_pack_w1 3s 3s +0%
polyvecl_chknorm 3s 6s -50%
polyvecl_pack_eta 3s 4s -25%
polyvecl_permute_bitrev_to_custom 3s 3s +0%
polyvecl_unpack_z 3s 4s -25%
polyw1_pack 3s 2s +50%
polyz_unpack 3s 3s +0%
power2round 3s 3s +0%
rej_eta 3s 2s +50%
rej_eta_native 3s 5s -40%
shake128_absorb 3s 3s +0%
shake256 3s 3s +0%
shake256_init 3s 3s +0%
shake256_squeeze 3s 1s +200%
sign_signature_pre_hash_internal 3s 4s -25%
sys_check_capability 3s 3s +0%
unpack_sig 3s 2s +50%
keccakf1600_xor_bytes (big endian) 2s 2s +0%
keccakf1600x4_xor_bytes 2s 4s -50%
mld_ct_get_optblocker_u8 2s 2s +0%
mld_ct_sel_int32 2s 3s -33%
mld_keccakf1600_extract_bytes 2s 2s +0%
pack_sig_z 2s 3s -33%
pack_sk 2s 2s +0%
poly_make_hint 2s 2s +0%
poly_ntt 2s 3s -33%
poly_ntt_c 2s 3s -33%
poly_pointwise_montgomery 2s 2s +0%
poly_shiftl 2s 4s -50%
poly_sub 2s 3s -33%
poly_uniform 2s 4s -50%
poly_use_hint_native 2s 3s -33%
polyt1_unpack 2s 2s +0%
polyveck_pack_eta 2s 3s -33%
polyz_pack 2s 2s +0%
polyz_unpack_native 2s 3s -33%
shake128_squeeze 2s 3s -33%
shake128x4_absorb_once 2s 3s -33%
shake128x4_squeezeblocks 2s 2s +0%
shake256_release 2s 2s +0%
shake256x4_absorb_once 2s 2s +0%
shake256x4_squeezeblocks 2s 2s +0%
polyveck_pack_t0 1s 4s -75%
shake128_release 1s 2s -50%
use_hint 1s 5s -80%

@oqs-bot
Copy link
Copy Markdown
Contributor

oqs-bot commented Feb 3, 2026

CBMC Results (ML-DSA-44)

Full Results (176 proofs)
Proof Status Current Previous Change
**TOTAL** 1932s 2039s -5.2%
mld_attempt_signature_generation 230s 247s -7%
polyvecl_pointwise_acc_montgomery_c 200s 221s -10%
poly_pointwise_montgomery_c 140s 161s -13%
rej_uniform_native 140s 151s -7%
sign_verify_internal 120s 132s -9%
mld_invntt_layer 87s 88s -1%
mld_ct_memcmp 72s 76s -5%
mld_ntt_layer 52s 56s -7%
keccak_squeezeblocks_x4 43s 44s -2%
sign_signature_internal 33s 35s -6%
polyvec_matrix_expand 27s 27s +0%
poly_chknorm_c 20s 21s -5%
fqmul 19s 19s +0%
rej_uniform 19s 22s -14%
polymat_permute_bitrev_to_custom 17s 16s +6%
mld_compute_t0_t1_tr_from_sk_components 16s 15s +7%
poly_uniform_4x 16s 16s +0%
poly_uniform_eta_4x 16s 18s -11%
polyeta_unpack 16s 17s -6%
polyvec_matrix_expand_serial 15s 11s +36%
polyveck_power2round 14s 11s +27%
mld_ntt_butterfly_block 13s 12s +8%
polyt0_unpack 13s 15s -13%
rej_uniform_c 13s 14s -7%
keccakf1600x4_permute_native 12s 14s -14%
polyvec_matrix_pointwise_montgomery 12s 11s +9%
poly_add 10s 13s -23%
polyz_unpack_c 10s 11s -9%
sign 9s 6s +50%
keccak_absorb_once_x4 8s 10s -20%
mld_polyvecl_permute_bitrev_to_custom_native 8s 7s +14%
keccak_absorb 7s 8s -12%
keccakf1600_permute 7s 8s -12%
keccakf1600_permute_native 7s 8s -12%
mld_check_pct 7s 7s +0%
mld_compute_pack_z 7s 10s -30%
mld_prepare_domain_separation_prefix 7s 4s +75%
polyveck_add 7s 7s +0%
polyveck_sub 7s 4s +75%
polyvecl_permute_bitrev_to_custom 7s 4s +75%
sign_pk_from_sk 7s 5s +40%
keccak_squeeze 6s 3s +100%
mld_value_barrier_u32 6s 1s +500%
poly_invntt_tomont_c 6s 6s +0%
poly_power2round 6s 4s +50%
polyveck_caddq 6s 5s +20%
polyveck_ntt 6s 6s +0%
polyveck_pointwise_poly_montgomery 6s 6s +0%
polyveck_reduce 6s 5s +20%
polyveck_use_hint 6s 7s -14%
polyvecl_pointwise_acc_montgomery_native 6s 3s +100%
sign_verify 6s 4s +50%
poly_caddq_c 5s 6s -17%
poly_ntt_native 5s 3s +67%
poly_uniform_eta 5s 3s +67%
poly_use_hint_c 5s 5s +0%
polyeta_pack 5s 1s +400%
polyveck_chknorm 5s 6s -17%
polyveck_decompose 5s 5s +0%
polyveck_invntt_tomont 5s 5s +0%
polyveck_make_hint 5s 6s -17%
polyveck_pack_t0 5s 4s +25%
polyveck_pack_w1 5s 4s +25%
rej_eta_native 5s 4s +25%
sign_keypair_internal 5s 4s +25%
sign_verify_pre_hash_internal 5s 5s +0%
sys_check_capability 5s 2s +150%
unpack_hints 5s 5s +0%
keccakf1600x4_extract_bytes 4s 4s +0%
mld_h 4s 3s +33%
mld_keccakf1600_extract_bytes 4s 3s +33%
mld_sample_s1_s2_serial 4s 5s -20%
ntt_native_x86_64 4s 3s +33%
poly_caddq 4s 4s +0%
poly_caddq_native_aarch64 4s 4s +0%
poly_challenge 4s 5s -20%
poly_decompose 4s 2s +100%
poly_ntt_c 4s 3s +33%
poly_sub 4s 3s +33%
polyt0_pack 4s 3s +33%
polyt1_pack 4s 5s -20%
polyveck_shiftl 4s 6s -33%
polyvecl_chknorm 4s 5s -20%
polyvecl_ntt 4s 3s +33%
polyvecl_pack_eta 4s 2s +100%
polyvecl_pointwise_acc_montgomery 4s 4s +0%
polyvecl_uniform_gamma1 4s 4s +0%
polyz_pack 4s 4s +0%
polyz_unpack 4s 3s +33%
polyz_unpack_native 4s 2s +100%
rej_eta 4s 2s +100%
shake128x4_absorb_once 4s 2s +100%
shake256_release 4s 4s +0%
sign_signature 4s 4s +0%
sign_signature_pre_hash_shake256 4s 7s -43%
sign_verify_extmu 4s 4s +0%
caddq 3s 3s +0%
fqscale 3s 3s +0%
intt_native_x86_64 3s 4s -25%
keccakf1600x4_permute 3s 3s +0%
make_hint 3s 2s +50%
mld_ct_cmask_neg_i32 3s 2s +50%
mld_ct_get_optblocker_u8 3s 2s +50%
mld_sample_s1_s2 3s 6s -50%
mld_value_barrier_u8 3s 3s +0%
pack_pk 3s 4s -25%
pack_sig_c_h 3s 4s -25%
pack_sk 3s 2s +50%
poly_decompose_c 3s 4s -25%
poly_invntt_tomont 3s 3s +0%
poly_pointwise_montgomery 3s 5s -40%
poly_reduce 3s 3s +0%
poly_uniform_gamma1 3s 2s +50%
poly_uniform_gamma1_4x 3s 2s +50%
poly_use_hint_native 3s 3s +0%
polyt1_unpack 3s 3s +0%
polyveck_unpack_eta 3s 4s -25%
polyvecl_uniform_gamma1_serial 3s 3s +0%
polyvecl_unpack_eta 3s 4s -25%
polyvecl_unpack_z 3s 4s -25%
rej_eta_c 3s 4s -25%
shake128_release 3s 2s +50%
shake256_squeeze 3s 3s +0%
shake256x4_absorb_once 3s 3s +0%
sign_keypair 3s 4s -25%
sign_open 3s 3s +0%
sign_signature_extmu 3s 3s +0%
sign_verify_pre_hash_shake256 3s 4s -25%
unpack_pk 3s 3s +0%
unpack_sk 3s 3s +0%
decompose 2s 2s +0%
keccakf1600_xor_bytes 2s 2s +0%
keccakf1600x4_xor_bytes 2s 2s +0%
mld_ct_abs_i32 2s 2s +0%
mld_ct_cmask_nonzero_u8 2s 3s -33%
mld_ct_get_optblocker_i64 2s 2s +0%
mld_ct_get_optblocker_u32 2s 1s +100%
mld_ct_sel_int32 2s 2s +0%
mld_value_barrier_i64 2s 3s -33%
pack_sig_z 2s 3s -33%
poly_caddq_native 2s 2s +0%
poly_chknorm 2s 2s +0%
poly_chknorm_native 2s 5s -60%
poly_chknorm_native_aarch64 2s - new
poly_decompose_native 2s 3s -33%
poly_invntt_tomont_native 2s 4s -50%
poly_make_hint 2s 6s -67%
poly_ntt 2s 3s -33%
poly_shiftl 2s 4s -50%
poly_uniform 2s 3s -33%
polyveck_pack_eta 2s 3s -33%
polyveck_unpack_t0 2s 2s +0%
polyw1_pack 2s 3s -33%
reduce32 2s 2s +0%
shake128_absorb 2s 2s +0%
shake128_finalize 2s 2s +0%
shake128_init 2s 3s -33%
shake128_squeeze 2s 2s +0%
shake128x4_squeezeblocks 2s 2s +0%
shake256 2s 2s +0%
shake256_finalize 2s 3s -33%
shake256_init 2s 5s -60%
shake256x4_squeezeblocks 2s 2s +0%
sign_signature_pre_hash_internal 2s 4s -50%
unpack_sig 2s 2s +0%
use_hint 2s 4s -50%
keccak_finalize 1s 2s -50%
keccak_init 1s 2s -50%
keccakf1600_extract_bytes (big endian) 1s 2s -50%
keccakf1600_xor_bytes (big endian) 1s 3s -67%
mld_ct_cmask_nonzero_u32 1s 5s -80%
montgomery_reduce 1s 1s +0%
poly_pointwise_montgomery_native 1s 3s -67%
poly_use_hint 1s 6s -83%
power2round 1s 3s -67%
shake256_absorb 1s 2s -50%

@oqs-bot
Copy link
Copy Markdown
Contributor

oqs-bot commented Feb 3, 2026

CBMC Results (ML-DSA-65)

Full Results (176 proofs)
Proof Status Current Previous Change
**TOTAL** 2490s 2521s -1.2%
sign_verify_internal 338s 351s -4%
mld_attempt_signature_generation 273s 279s -2%
polyvecl_pointwise_acc_montgomery_c 181s 195s -7%
poly_pointwise_montgomery_c 156s 165s -5%
rej_uniform_native 147s 149s -1%
polyvec_matrix_expand 123s 128s -4%
mld_invntt_layer 97s 97s +0%
mld_ct_memcmp 76s 78s -3%
polyvec_matrix_expand_serial 66s 70s -6%
mld_ntt_layer 57s 57s +0%
keccak_squeezeblocks_x4 44s 43s +2%
sign_signature_internal 39s 39s +0%
polymat_permute_bitrev_to_custom 29s 32s -9%
mld_compute_t0_t1_tr_from_sk_components 27s 26s +4%
fqmul 21s 20s +5%
poly_chknorm_c 21s 20s +5%
rej_uniform 21s 19s +11%
poly_uniform_eta_4x 18s 15s +20%
poly_uniform_4x 16s 15s +7%
keccakf1600x4_permute_native 15s 13s +15%
mld_ntt_butterfly_block 15s 13s +15%
rej_uniform_c 15s 16s -6%
polyt0_unpack 13s 14s -7%
polyveck_decompose 13s 14s -7%
polyveck_sub 13s 12s +8%
polyvecl_chknorm 13s 11s +18%
poly_add 12s 11s +9%
keccak_absorb_once_x4 11s 12s -8%
polyvec_matrix_pointwise_montgomery 11s 12s -8%
polyveck_add 11s 11s +0%
polyveck_ntt 11s 10s +10%
polyveck_invntt_tomont 10s 9s +11%
polyveck_power2round 10s 11s -9%
keccakf1600_permute_native 9s 7s +29%
sign_pk_from_sk 9s 10s -10%
keccakf1600_permute 8s 7s +14%
mld_check_pct 8s 7s +14%
mld_polyvecl_permute_bitrev_to_custom_native 8s 9s -11%
mld_sample_s1_s2_serial 8s 5s +60%
poly_decompose_c 8s 7s +14%
poly_invntt_tomont_c 8s 8s +0%
polyveck_reduce 8s 6s +33%
polyvecl_ntt 8s 7s +14%
sign_signature_extmu 8s 4s +100%
keccak_absorb 7s 7s +0%
mld_ct_cmask_nonzero_u32 7s 3s +133%
poly_caddq_c 7s 5s +40%
polyveck_caddq 7s 7s +0%
polyveck_pointwise_poly_montgomery 7s 8s -12%
sign 7s 7s +0%
mld_sample_s1_s2 6s 6s +0%
pack_pk 6s 3s +100%
poly_decompose_native 6s 3s +100%
poly_uniform_eta 6s 7s -14%
polyveck_shiftl 6s 6s +0%
polyveck_use_hint 6s 8s -25%
rej_eta_c 6s 4s +50%
unpack_hints 6s 4s +50%
fqscale 5s 3s +67%
mld_compute_pack_z 5s 6s -17%
mld_value_barrier_u32 5s 3s +67%
poly_caddq 5s 4s +25%
poly_caddq_native 5s 2s +150%
poly_challenge 5s 3s +67%
poly_chknorm 5s 3s +67%
poly_chknorm_native 5s 2s +150%
poly_chknorm_native_aarch64 5s - new
poly_use_hint_c 5s 5s +0%
polyt0_pack 5s 6s -17%
polyveck_make_hint 5s 6s -17%
polyvecl_pointwise_acc_montgomery_native 5s 4s +25%
polyvecl_unpack_eta 5s 3s +67%
shake256x4_squeezeblocks 5s 3s +67%
sign_keypair_internal 5s 4s +25%
sign_signature 5s 4s +25%
sign_signature_pre_hash_internal 5s 4s +25%
sign_signature_pre_hash_shake256 5s 4s +25%
sign_verify 5s 5s +0%
sign_verify_extmu 5s 3s +67%
unpack_sk 5s 6s -17%
intt_native_x86_64 4s 2s +100%
montgomery_reduce 4s 3s +33%
pack_sk 4s 3s +33%
poly_decompose 4s 5s -20%
poly_invntt_tomont_native 4s 4s +0%
poly_ntt 4s 3s +33%
poly_power2round 4s 4s +0%
poly_reduce 4s 4s +0%
poly_uniform 4s 6s -33%
poly_uniform_gamma1 4s 2s +100%
poly_uniform_gamma1_4x 4s 4s +0%
poly_use_hint 4s 4s +0%
polyeta_unpack 4s 5s -20%
polyt1_pack 4s 4s +0%
polyt1_unpack 4s 3s +33%
polyveck_chknorm 4s 7s -43%
polyveck_pack_w1 4s 3s +33%
polyveck_unpack_eta 4s 5s -20%
polyveck_unpack_t0 4s 3s +33%
polyvecl_pointwise_acc_montgomery 4s 2s +100%
polyvecl_uniform_gamma1 4s 4s +0%
polyvecl_uniform_gamma1_serial 4s 3s +33%
polyvecl_unpack_z 4s 3s +33%
polyz_unpack_c 4s 3s +33%
shake128_finalize 4s 3s +33%
shake128x4_absorb_once 4s 2s +100%
sign_verify_pre_hash_internal 4s 6s -33%
sign_verify_pre_hash_shake256 4s 5s -20%
sys_check_capability 4s 4s +0%
unpack_sig 4s 2s +100%
decompose 3s 2s +50%
keccak_squeeze 3s 4s -25%
keccakf1600_extract_bytes (big endian) 3s 2s +50%
keccakf1600_xor_bytes 3s 3s +0%
keccakf1600x4_xor_bytes 3s 2s +50%
mld_ct_abs_i32 3s 3s +0%
mld_ct_get_optblocker_i64 3s 2s +50%
mld_value_barrier_i64 3s 2s +50%
ntt_native_x86_64 3s 5s -40%
pack_sig_z 3s 2s +50%
poly_caddq_native_aarch64 3s 4s -25%
poly_make_hint 3s 7s -57%
poly_ntt_native 3s 4s -25%
poly_pointwise_montgomery 3s 3s +0%
polyvecl_pack_eta 3s 4s -25%
polyvecl_permute_bitrev_to_custom 3s 3s +0%
polyw1_pack 3s 2s +50%
polyz_pack 3s 2s +50%
polyz_unpack 3s 2s +50%
polyz_unpack_native 3s 4s -25%
reduce32 3s 1s +200%
rej_eta_native 3s 7s -57%
shake128_absorb 3s 3s +0%
shake128_squeeze 3s 3s +0%
shake128x4_squeezeblocks 3s 1s +200%
shake256_init 3s 2s +50%
shake256_release 3s 1s +200%
sign_open 3s 8s -62%
unpack_pk 3s 4s -25%
caddq 2s 4s -50%
keccak_finalize 2s 2s +0%
keccak_init 2s 3s -33%
keccakf1600_xor_bytes (big endian) 2s 2s +0%
make_hint 2s 2s +0%
mld_ct_cmask_neg_i32 2s 2s +0%
mld_ct_cmask_nonzero_u8 2s 7s -71%
mld_ct_get_optblocker_u32 2s 1s +100%
mld_ct_get_optblocker_u8 2s 3s -33%
mld_ct_sel_int32 2s 1s +100%
mld_h 2s 5s -60%
mld_keccakf1600_extract_bytes 2s 2s +0%
mld_prepare_domain_separation_prefix 2s 5s -60%
pack_sig_c_h 2s 5s -60%
poly_invntt_tomont 2s 3s -33%
poly_ntt_c 2s 4s -50%
poly_pointwise_montgomery_native 2s 5s -60%
poly_shiftl 2s 2s +0%
poly_sub 2s 4s -50%
poly_use_hint_native 2s 4s -50%
polyeta_pack 2s 4s -50%
polyveck_pack_eta 2s 4s -50%
polyveck_pack_t0 2s 4s -50%
power2round 2s 2s +0%
rej_eta 2s 3s -33%
shake128_init 2s 1s +100%
shake128_release 2s 2s +0%
shake256 2s 2s +0%
shake256_absorb 2s 2s +0%
shake256_finalize 2s 2s +0%
shake256x4_absorb_once 2s 1s +100%
sign_keypair 2s 4s -50%
use_hint 2s 3s -33%
keccakf1600x4_extract_bytes 1s 1s +0%
keccakf1600x4_permute 1s 5s -80%
mld_value_barrier_u8 1s 5s -80%
shake256_squeeze 1s 2s -50%

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mac Mini (M1, 2020) benchmarks (opt)

Details
Benchmark suite Current: 84184a3 Previous: 89f34e5 Ratio
ML-DSA-44 keypair 46211 cycles 46624 cycles 0.99
ML-DSA-44 sign 132056 cycles 132647 cycles 1.00
ML-DSA-44 verify 47963 cycles 48192 cycles 1.00
ML-DSA-65 keypair 81007 cycles 81247 cycles 1.00
ML-DSA-65 sign 217400 cycles 216496 cycles 1.00
ML-DSA-65 verify 80317 cycles 80278 cycles 1.00
ML-DSA-87 keypair 132163 cycles 132223 cycles 1.00
ML-DSA-87 sign 278856 cycles 277005 cycles 1.01
ML-DSA-87 verify 130830 cycles 130478 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mac Mini (M1, 2020) benchmarks (no-opt)

Details
Benchmark suite Current: 84184a3 Previous: 89f34e5 Ratio
ML-DSA-44 keypair 114152 cycles 114174 cycles 1.00
ML-DSA-44 sign 417928 cycles 418023 cycles 1.00
ML-DSA-44 verify 122286 cycles 122275 cycles 1.00
ML-DSA-65 keypair 195573 cycles 195504 cycles 1.00
ML-DSA-65 sign 682593 cycles 682483 cycles 1.00
ML-DSA-65 verify 197794 cycles 197795 cycles 1.00
ML-DSA-87 keypair 322638 cycles 322704 cycles 1.00
ML-DSA-87 sign 864734 cycles 864793 cycles 1.00
ML-DSA-87 verify 328631 cycles 328713 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 4th gen (c7i)

Details
Benchmark suite Current: 84184a3 Previous: 89f34e5 Ratio
ML-DSA-44 keypair 34389 cycles 34839 cycles 0.99
ML-DSA-44 sign 120181 cycles 120131 cycles 1.00
ML-DSA-44 verify 38147 cycles 38194 cycles 1.00
ML-DSA-65 keypair 60702 cycles 61212 cycles 0.99
ML-DSA-65 sign 200463 cycles 201568 cycles 0.99
ML-DSA-65 verify 62614 cycles 62713 cycles 1.00
ML-DSA-87 keypair 93596 cycles 93351 cycles 1.00
ML-DSA-87 sign 235406 cycles 235024 cycles 1.00
ML-DSA-87 verify 96030 cycles 95315 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

Details
Benchmark suite Current: 84184a3 Previous: 89f34e5 Ratio
ML-DSA-44 keypair 113189 cycles 113289 cycles 1.00
ML-DSA-44 sign 355737 cycles 355785 cycles 1.00
ML-DSA-44 verify 117778 cycles 117888 cycles 1.00
ML-DSA-65 keypair 196767 cycles 196971 cycles 1.00
ML-DSA-65 sign 589734 cycles 589375 cycles 1.00
ML-DSA-65 verify 194821 cycles 194603 cycles 1.00
ML-DSA-87 keypair 322483 cycles 322481 cycles 1.00
ML-DSA-87 sign 752763 cycles 753123 cycles 1.00
ML-DSA-87 verify 320123 cycles 320184 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 4th gen (c7i) (no-opt)

Details
Benchmark suite Current: 84184a3 Previous: 89f34e5 Ratio
ML-DSA-44 keypair 93672 cycles 93878 cycles 1.00
ML-DSA-44 sign 333154 cycles 333551 cycles 1.00
ML-DSA-44 verify 99800 cycles 99741 cycles 1.00
ML-DSA-65 keypair 160388 cycles 160303 cycles 1.00
ML-DSA-65 sign 546111 cycles 546512 cycles 1.00
ML-DSA-65 verify 161081 cycles 160855 cycles 1.00
ML-DSA-87 keypair 268343 cycles 268335 cycles 1.00
ML-DSA-87 sign 708002 cycles 707679 cycles 1.00
ML-DSA-87 verify 271217 cycles 270536 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 3rd gen (c6a)

Details
Benchmark suite Current: 84184a3 Previous: 89f34e5 Ratio
ML-DSA-44 keypair 69141 cycles 68893 cycles 1.00
ML-DSA-44 sign 187951 cycles 187316 cycles 1.00
ML-DSA-44 verify 69091 cycles 69406 cycles 1.00
ML-DSA-65 keypair 119649 cycles 124392 cycles 0.96
ML-DSA-65 sign 300863 cycles 312609 cycles 0.96
ML-DSA-65 verify 115182 cycles 120400 cycles 0.96
ML-DSA-87 keypair 203659 cycles 203433 cycles 1.00
ML-DSA-87 sign 395464 cycles 394304 cycles 1.00
ML-DSA-87 verify 196582 cycles 195791 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 3rd gen (c6i)

Details
Benchmark suite Current: 84184a3 Previous: 89f34e5 Ratio
ML-DSA-44 keypair 56448 cycles 56217 cycles 1.00
ML-DSA-44 sign 181082 cycles 181507 cycles 1.00
ML-DSA-44 verify 61065 cycles 61155 cycles 1.00
ML-DSA-65 keypair 98597 cycles 98607 cycles 1.00
ML-DSA-65 sign 298704 cycles 298482 cycles 1.00
ML-DSA-65 verify 100600 cycles 100148 cycles 1.00
ML-DSA-87 keypair 152388 cycles 152389 cycles 1.00
ML-DSA-87 sign 354368 cycles 354986 cycles 1.00
ML-DSA-87 verify 153457 cycles 153378 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

Details
Benchmark suite Current: 84184a3 Previous: 89f34e5 Ratio
ML-DSA-44 keypair 212546 cycles 212879 cycles 1.00
ML-DSA-44 sign 759892 cycles 759720 cycles 1.00
ML-DSA-44 verify 228806 cycles 229134 cycles 1.00
ML-DSA-65 keypair 380579 cycles 380893 cycles 1.00
ML-DSA-65 sign 1252543 cycles 1251861 cycles 1.00
ML-DSA-65 verify 371847 cycles 372273 cycles 1.00
ML-DSA-87 keypair 605012 cycles 605972 cycles 1.00
ML-DSA-87 sign 1593926 cycles 1592048 cycles 1.00
ML-DSA-87 verify 618666 cycles 617982 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton4

Details
Benchmark suite Current: 84184a3 Previous: 89f34e5 Ratio
ML-DSA-44 keypair 68119 cycles 68096 cycles 1.00
ML-DSA-44 sign 202199 cycles 201925 cycles 1.00
ML-DSA-44 verify 70872 cycles 70772 cycles 1.00
ML-DSA-65 keypair 121209 cycles 120887 cycles 1.00
ML-DSA-65 sign 331550 cycles 331015 cycles 1.00
ML-DSA-65 verify 117871 cycles 117837 cycles 1.00
ML-DSA-87 keypair 198067 cycles 198825 cycles 1.00
ML-DSA-87 sign 427398 cycles 428194 cycles 1.00
ML-DSA-87 verify 194819 cycles 194532 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 3rd gen (c6a) (no-opt)

Details
Benchmark suite Current: 84184a3 Previous: 89f34e5 Ratio
ML-DSA-44 keypair 135916 cycles 134942 cycles 1.01
ML-DSA-44 sign 528889 cycles 524702 cycles 1.01
ML-DSA-44 verify 148727 cycles 147310 cycles 1.01
ML-DSA-65 keypair 226844 cycles 227219 cycles 1.00
ML-DSA-65 sign 861039 cycles 860381 cycles 1.00
ML-DSA-65 verify 235205 cycles 235001 cycles 1.00
ML-DSA-87 keypair 371241 cycles 370532 cycles 1.00
ML-DSA-87 sign 1079361 cycles 1079591 cycles 1.00
ML-DSA-87 verify 382651 cycles 382944 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 3rd gen (c6i) (no-opt)

Details
Benchmark suite Current: 84184a3 Previous: 89f34e5 Ratio
ML-DSA-44 keypair 157529 cycles 157439 cycles 1.00
ML-DSA-44 sign 549404 cycles 551440 cycles 1.00
ML-DSA-44 verify 169147 cycles 169025 cycles 1.00
ML-DSA-65 keypair 267734 cycles 267570 cycles 1.00
ML-DSA-65 sign 904482 cycles 902715 cycles 1.00
ML-DSA-65 verify 274267 cycles 273820 cycles 1.00
ML-DSA-87 keypair 448185 cycles 447961 cycles 1.00
ML-DSA-87 sign 1157713 cycles 1157180 cycles 1.00
ML-DSA-87 verify 457677 cycles 457743 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 4th gen (c7a)

Details
Benchmark suite Current: 84184a3 Previous: 89f34e5 Ratio
ML-DSA-44 keypair 42276 cycles 42567 cycles 0.99
ML-DSA-44 sign 134443 cycles 135866 cycles 0.99
ML-DSA-44 verify 44649 cycles 44999 cycles 0.99
ML-DSA-65 keypair 73239 cycles 72632 cycles 1.01
ML-DSA-65 sign 214050 cycles 215661 cycles 0.99
ML-DSA-65 verify 73552 cycles 74051 cycles 0.99
ML-DSA-87 keypair 108087 cycles 107869 cycles 1.00
ML-DSA-87 sign 254022 cycles 250927 cycles 1.01
ML-DSA-87 verify 110555 cycles 110359 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton3

Details
Benchmark suite Current: 84184a3 Previous: 89f34e5 Ratio
ML-DSA-44 keypair 72369 cycles 72256 cycles 1.00
ML-DSA-44 sign 212519 cycles 211993 cycles 1.00
ML-DSA-44 verify 75774 cycles 75743 cycles 1.00
ML-DSA-65 keypair 127517 cycles 127453 cycles 1.00
ML-DSA-65 sign 351037 cycles 350294 cycles 1.00
ML-DSA-65 verify 125540 cycles 125370 cycles 1.00
ML-DSA-87 keypair 205316 cycles 208145 cycles 0.99
ML-DSA-87 sign 444258 cycles 448948 cycles 0.99
ML-DSA-87 verify 205369 cycles 205071 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton4 (no-opt)

Details
Benchmark suite Current: 84184a3 Previous: 89f34e5 Ratio
ML-DSA-44 keypair 128375 cycles 128340 cycles 1.00
ML-DSA-44 sign 447673 cycles 447711 cycles 1.00
ML-DSA-44 verify 138185 cycles 144657 cycles 0.96
ML-DSA-65 keypair 220205 cycles 220297 cycles 1.00
ML-DSA-65 sign 727008 cycles 727689 cycles 1.00
ML-DSA-65 verify 222732 cycles 223199 cycles 1.00
ML-DSA-87 keypair 364808 cycles 365155 cycles 1.00
ML-DSA-87 sign 926167 cycles 926147 cycles 1.00
ML-DSA-87 verify 372896 cycles 372805 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 4th gen (c7a) (no-opt)

Details
Benchmark suite Current: 84184a3 Previous: 89f34e5 Ratio
ML-DSA-44 keypair 120509 cycles 120458 cycles 1.00
ML-DSA-44 sign 450352 cycles 446412 cycles 1.01
ML-DSA-44 verify 130307 cycles 131289 cycles 0.99
ML-DSA-65 keypair 206874 cycles 208289 cycles 0.99
ML-DSA-65 sign 731819 cycles 736271 cycles 0.99
ML-DSA-65 verify 211888 cycles 210565 cycles 1.01
ML-DSA-87 keypair 337173 cycles 338185 cycles 1.00
ML-DSA-87 sign 922084 cycles 924493 cycles 1.00
ML-DSA-87 verify 347670 cycles 347387 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton3 (no-opt)

Details
Benchmark suite Current: 84184a3 Previous: 89f34e5 Ratio
ML-DSA-44 keypair 138590 cycles 138529 cycles 1.00
ML-DSA-44 sign 484054 cycles 484089 cycles 1.00
ML-DSA-44 verify 148566 cycles 162338 cycles 0.92
ML-DSA-65 keypair 242217 cycles 242174 cycles 1.00
ML-DSA-65 sign 792994 cycles 792654 cycles 1.00
ML-DSA-65 verify 240732 cycles 241197 cycles 1.00
ML-DSA-87 keypair 395487 cycles 396232 cycles 1.00
ML-DSA-87 sign 1012589 cycles 1012466 cycles 1.00
ML-DSA-87 verify 402663 cycles 402521 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)

Details
Benchmark suite Current: 84184a3 Previous: 89f34e5 Ratio
ML-DSA-44 keypair 276287 cycles 274734 cycles 1.01
ML-DSA-44 sign 814263 cycles 807448 cycles 1.01
ML-DSA-44 verify 279728 cycles 274869 cycles 1.02
ML-DSA-65 keypair 474382 cycles 475069 cycles 1.00
ML-DSA-65 sign 1333382 cycles 1342498 cycles 0.99
ML-DSA-65 verify 456744 cycles 456439 cycles 1.00
ML-DSA-87 keypair 818394 cycles 803841 cycles 1.02
ML-DSA-87 sign 1860251 cycles 1815162 cycles 1.02
ML-DSA-87 verify 794590 cycles 777338 cycles 1.02

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)

Details
Benchmark suite Current: 84184a3 Previous: 89f34e5 Ratio
ML-DSA-44 keypair 464210 cycles 461615 cycles 1.01
ML-DSA-44 sign 2149524 cycles 2134682 cycles 1.01
ML-DSA-44 verify 549563 cycles 546628 cycles 1.01
ML-DSA-65 keypair 779853 cycles 775173 cycles 1.01
ML-DSA-65 sign 3520058 cycles 3500383 cycles 1.01
ML-DSA-65 verify 854219 cycles 849253 cycles 1.01
ML-DSA-87 keypair 1261358 cycles 1250972 cycles 1.01
ML-DSA-87 sign 4376754 cycles 4319185 cycles 1.01
ML-DSA-87 verify 1374196 cycles 1363980 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton2

Details
Benchmark suite Current: 84184a3 Previous: 89f34e5 Ratio
ML-DSA-44 keypair 113823 cycles 113656 cycles 1.00
ML-DSA-44 sign 356235 cycles 356457 cycles 1.00
ML-DSA-44 verify 118535 cycles 118364 cycles 1.00
ML-DSA-65 keypair 196925 cycles 196801 cycles 1.00
ML-DSA-65 sign 589576 cycles 589447 cycles 1.00
ML-DSA-65 verify 195140 cycles 194983 cycles 1.00
ML-DSA-87 keypair 322576 cycles 323521 cycles 1.00
ML-DSA-87 sign 754818 cycles 754035 cycles 1.00
ML-DSA-87 verify 320910 cycles 320450 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton2 (no-opt)

Details
Benchmark suite Current: 84184a3 Previous: 89f34e5 Ratio
ML-DSA-44 keypair 213650 cycles 212904 cycles 1.00
ML-DSA-44 sign 762125 cycles 760682 cycles 1.00
ML-DSA-44 verify 241797 cycles 234616 cycles 1.03
ML-DSA-65 keypair 381406 cycles 381172 cycles 1.00
ML-DSA-65 sign 1252976 cycles 1254269 cycles 1.00
ML-DSA-65 verify 372647 cycles 372133 cycles 1.00
ML-DSA-87 keypair 606845 cycles 604503 cycles 1.00
ML-DSA-87 sign 1594104 cycles 1594713 cycles 1.00
ML-DSA-87 verify 618424 cycles 618687 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Graviton2 (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 84184a3 Previous: 89f34e5 Ratio
ML-DSA-44 verify 241797 cycles 234616 cycles 1.03

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)

Details
Benchmark suite Current: 84184a3 Previous: 89f34e5 Ratio
ML-DSA-44 keypair 828243 cycles 828217 cycles 1.00
ML-DSA-44 sign 3238322 cycles 3235342 cycles 1.00
ML-DSA-44 verify 923861 cycles 923322 cycles 1.00
ML-DSA-65 keypair 1411909 cycles 1413312 cycles 1.00
ML-DSA-65 sign 5348828 cycles 5348716 cycles 1.00
ML-DSA-65 verify 1480373 cycles 1481891 cycles 1.00
ML-DSA-87 keypair 2312272 cycles 2329235 cycles 0.99
ML-DSA-87 sign 6675562 cycles 6715181 cycles 0.99
ML-DSA-87 verify 2418344 cycles 2432179 cycles 0.99

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)

Details
Benchmark suite Current: 84184a3 Previous: 89f34e5 Ratio
ML-DSA-44 keypair 225720 cycles 229083 cycles 0.99
ML-DSA-44 sign 617875 cycles 619262 cycles 1.00
ML-DSA-44 verify 217324 cycles 224163 cycles 0.97
ML-DSA-65 keypair 386040 cycles 398249 cycles 0.97
ML-DSA-65 sign 1009962 cycles 1039922 cycles 0.97
ML-DSA-65 verify 365109 cycles 376934 cycles 0.97
ML-DSA-87 keypair 653786 cycles 679098 cycles 0.96
ML-DSA-87 sign 1381648 cycles 1427372 cycles 0.97
ML-DSA-87 verify 638020 cycles 658798 cycles 0.97

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)

Details
Benchmark suite Current: 84184a3 Previous: 89f34e5 Ratio
ML-DSA-44 keypair 322781 cycles 317946 cycles 1.02
ML-DSA-44 sign 1221717 cycles 1190118 cycles 1.03
ML-DSA-44 verify 345309 cycles 336996 cycles 1.02
ML-DSA-65 keypair 588588 cycles 569032 cycles 1.03
ML-DSA-65 sign 1988219 cycles 1946059 cycles 1.02
ML-DSA-65 verify 554093 cycles 539114 cycles 1.03
ML-DSA-87 keypair 860852 cycles 868159 cycles 0.99
ML-DSA-87 sign 2417632 cycles 2464727 cycles 0.98
ML-DSA-87 verify 887711 cycles 893016 cycles 0.99

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 84184a3 Previous: 89f34e5 Ratio
ML-DSA-65 keypair 588588 cycles 569032 cycles 1.03

This comment was automatically generated by workflow using github-action-benchmark.

@mkannwischer mkannwischer marked this pull request as ready for review February 5, 2026 02:12
@mkannwischer mkannwischer requested a review from a team as a code owner February 5, 2026 02:12
Comment thread proofs/hol_light/aarch64/proofs/mldsa_poly_chknorm.ml Outdated
@jakemas jakemas self-requested a review February 5, 2026 18:59
Copy link
Copy Markdown
Contributor

@hanno-becker hanno-becker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mkannwischer, looks good overall!

One general comment: For this proof it doesn't matter so much because it's fairly simple, but generally it's useful to keep terms folded as long as possible, both for proof speed and for readability. Here, you can avoid unfolding most quantifiers until the very end.

Also, you can consider introducing wrapper definitions for the expressions you encounter during the proof, fold them eagerly during symbolic execution, and then prove lemmas about those wrapper definitions.

To illustrate what I mean, here is an alternative proof script:

(* Expression emerging from the AVX2 code converting bit to 32-bit mask *)
let bit_to_mask32 = new_definition `bit_to_mask32 (b : bool) : 32 word = word_neg (word (bitval b) : 32 word)`;;

(* Expression used for bounds check itself *)
let bd = new_definition `bd (v : int32) (b: int32) : bool =
    (ival (iword (abs (ival v)) : 32 word) >= ival (word_zx (word_zx b : 64 word) : 32 word))`;;

let MAX_VAL_BIT_TO_MASK32 = prove(
  `MAX (val (bit_to_mask32 b0)) (val (bit_to_mask32 b1)) = val (bit_to_mask32 (b0 \/ b1))`,
  REWRITE_TAC[bit_to_mask32] THEN
  BOOL_CASES_TAC `b0:bool` THEN BOOL_CASES_TAC `b1:bool` THEN
  REWRITE_TAC[BITVAL_CLAUSES] THEN CONV_TAC WORD_REDUCE_CONV THEN ARITH_TAC);;  

let BD_SIMP = prove(
  `abs(ival(x : int32)) < &2 pow 31 ==> (bd x b <=> abs (ival x) >= ival b)`,
  REWRITE_TAC[bd] THEN DISCH_TAC THEN
  SUBGOAL_THEN `ival(iword(abs(ival(x:32 word))) : 32 word) = abs(ival x)` SUBST1_TAC THENL
  [MATCH_MP_TAC IVAL_IWORD THEN REWRITE_TAC[DIMINDEX_32] THEN CONV_TAC NUM_REDUCE_CONV THEN
   FIRST_X_ASSUM MP_TAC THEN REWRITE_TAC[INT_ABS_POS] THEN INT_ARITH_TAC; ALL_TAC] THEN
  SUBGOAL_THEN `(word_zx:64 word -> 32 word) ((word_zx:32 word -> 64 word) b) = b` SUBST1_TAC THENL
  [MATCH_MP_TAC WORD_ZX_ZX THEN REWRITE_TAC[DIMINDEX_32; DIMINDEX_64] THEN ARITH_TAC;
   REFL_TAC]);;

let BIT_TO_MASK32_OR = prove(
  `word_or (bit_to_mask32 b0) (bit_to_mask32 b1) = bit_to_mask32 (b0 \/ b1)`,
  REWRITE_TAC[bit_to_mask32] THEN
  BOOL_CASES_TAC `b0:bool` THEN BOOL_CASES_TAC `b1:bool` THEN
  REWRITE_TAC[BITVAL_CLAUSES] THEN CONV_TAC WORD_REDUCE_CONV);;

let MASK32_TO_BIT = prove(
  `(word_zx:32 word -> 64 word) (word_and ((word_zx:64 word -> 32 word)
     ((word_zx:32 word -> 64 word) (word_subword (word (val (bit_to_mask32 b)) : 128 word) (0,32))))
     (word 1)) = word (bitval b) : 64 word`,
  REWRITE_TAC[bit_to_mask32] THEN
  BOOL_CASES_TAC `b:bool` THEN REWRITE_TAC[BITVAL_CLAUSES] THEN
  CONV_TAC WORD_REDUCE_CONV);;

let WORD_JOIN_OR_TYBIT0 = prove(
  `word_or (word_join (a:N word) (b:N word) : (N tybit0) word) (word_join (c:N word) (d:N word)) =
   word_join (word_or a c) (word_or b d)`,
  REWRITE_TAC[WORD_EQ_BITS_ALT; BIT_WORD_OR; BIT_WORD_JOIN; DIMINDEX_TYBIT0] THEN
  X_GEN_TAC `i:num` THEN
  ASM_CASES_TAC `i < 2 * dimindex(:N)` THEN ASM_REWRITE_TAC[] THEN
  ASM_CASES_TAC `i < dimindex(:N)` THEN ASM_REWRITE_TAC[] THEN
  MATCH_MP_TAC(TAUT `p ==> (q <=> p /\ q)`) THEN ASM_ARITH_TAC);;

(* ------------------------------------------------------------------------- *)
(* Core correctness theorem                                                  *)
(* ------------------------------------------------------------------------- *)

let MLDSA_POLY_CHKNORM_CORRECT = prove(
 `!a (x:num->int32) (bound:int32) pc.
        nonoverlapping (word pc, LENGTH mldsa_poly_chknorm_mc) (a, 1024)
        ==> ensures arm
             (\s. aligned_bytes_loaded s (word pc) mldsa_poly_chknorm_mc /\
                  read PC s = word(pc + MLDSA_POLY_CHKNORM_CORE_START) /\
                  C_ARGUMENTS [a; word_zx bound] s /\
                  (!i. i < 256 ==>
                     read(memory :> bytes32(word_add a (word(4 * i)))) s = x i) /\
                  (!i. i < 256 ==> abs(ival(x i)) < &2 pow 31))
             (\s. read PC s = word(pc + MLDSA_POLY_CHKNORM_CORE_END) /\
                  read X0 s = word(bitval(?i. i < 256 /\ abs(ival(x i)) >= ival bound)))
             (MAYCHANGE_REGS_AND_FLAGS_PERMITTED_BY_ABI)`,
  CONV_TAC CHKNORM_LENGTH_SIMPLIFY_CONV THEN
  MAP_EVERY X_GEN_TAC [`a:int64`; `x:num->int32`; `bound:int32`; `pc:num`] THEN
  REWRITE_TAC[MAYCHANGE_REGS_AND_FLAGS_PERMITTED_BY_ABI; C_ARGUMENTS;
              NONOVERLAPPING_CLAUSES (* ); EXISTS_LT_256 *) ] THEN
  DISCH_THEN(REPEAT_TCL CONJUNCTS_THEN ASSUME_TAC) THEN
  (* Expand bounded foralls in precondition to 256 explicit cases *)
  ENSURES_INIT_TAC "s0" THEN
  UNDISCH_TAC `forall i. i < 256 ==> read (memory :> bytes32 (word_add a (word (4 * i)))) s0 = x i` THEN
  CONV_TAC(ONCE_DEPTH_CONV (EXPAND_CASES_CONV THENC ONCE_DEPTH_CONV NUM_MULT_CONV)) THEN REPEAT STRIP_TAC THEN
  (* Merge bytes32 reads into bytes128 reads (64 merges for 256 coefficients) *)
  MP_TAC(end_itlist CONJ (map (fun n -> READ_MEMORY_MERGE_CONV 2
            (subst[mk_small_numeral(16*n),`n:num`]
                  `read (memory :> bytes128(word_add a (word n))) s0`))
            (0--63))) THEN
  ASM_REWRITE_TAC[WORD_ADD_0] THEN
  DISCARD_MATCHING_ASSUMPTIONS [`read (memory :> bytes32 a) s = x`] THEN
  STRIP_TAC THEN
  (* Symbolically execute all instructions until target PC *)
  MAP_UNTIL_TARGET_PC (fun n ->
    ARM_STEPS_TAC MLDSA_POLY_CHKNORM_EXEC [n] THEN
    RULE_ASSUM_TAC(CONV_RULE(TOP_DEPTH_CONV WORD_SIMPLE_SUBWORD_CONV)) THEN 
    RULE_ASSUM_TAC(REWRITE_RULE[WORD_SUBWORD_OR; GSYM bit_to_mask32; WORD_JOIN_OR_TYBIT0; SYM (SPEC_ALL bd); BIT_TO_MASK32_OR; 
      MAX_VAL_BIT_TO_MASK32; MASK32_TO_BIT])) 1 THEN
  
  (* Close the state relation *)
  ENSURES_FINAL_STATE_TAC THEN ASM_REWRITE_TAC[] THEN
  DISCARD_MATCHING_ASSUMPTIONS [`read t s = x`] THEN

  RULE_ASSUM_TAC (CONV_RULE (ONCE_DEPTH_CONV EXPAND_CASES_CONV)) THEN
  REPEAT(FIRST_X_ASSUM(CONJUNCTS_THEN ASSUME_TAC)) THEN
  IMP_REWRITE_TAC [bd_simp] THEN
  POP_ASSUM_LIST (K ALL_TAC) THEN

  (* Convert to ! instead of ? and split *)
  GEN_REWRITE_TAC (BINOP_CONV o ONCE_DEPTH_CONV) [prove (`b = ~ (~ (b : bool))`, REWRITE_TAC[])] THEN
  GEN_REWRITE_TAC TOP_SWEEP_CONV [MESON[] `~(?i. i < n /\ P i) <=> (!i. i < n ==> ~P i)`; DE_MORGAN_THM] THEN
  CONV_TAC (ONCE_DEPTH_CONV EXPAND_CASES_CONV) THEN
  REPEAT AP_TERM_TAC THEN EQ_TAC THEN SIMP_TAC[]);;

@mkannwischer
Copy link
Copy Markdown
Contributor Author

Thank you @mkannwischer, looks good overall!

One general comment: For this proof it doesn't matter so much because it's fairly simple, but generally it's useful to keep terms folded as long as possible, both for proof speed and for readability. Here, you can avoid unfolding most quantifiers until the very end.

Also, you can consider introducing wrapper definitions for the expressions you encounter during the proof, fold them eagerly during symbolic execution, and then prove lemmas about those wrapper definitions.

To illustrate what I mean, here is an alternative proof script:

(* Expression emerging from the AVX2 code converting bit to 32-bit mask *)
let bit_to_mask32 = new_definition `bit_to_mask32 (b : bool) : 32 word = word_neg (word (bitval b) : 32 word)`;;

(* Expression used for bounds check itself *)
let bd = new_definition `bd (v : int32) (b: int32) : bool =
    (ival (iword (abs (ival v)) : 32 word) >= ival (word_zx (word_zx b : 64 word) : 32 word))`;;

let MAX_VAL_BIT_TO_MASK32 = prove(
  `MAX (val (bit_to_mask32 b0)) (val (bit_to_mask32 b1)) = val (bit_to_mask32 (b0 \/ b1))`,
  REWRITE_TAC[bit_to_mask32] THEN
  BOOL_CASES_TAC `b0:bool` THEN BOOL_CASES_TAC `b1:bool` THEN
  REWRITE_TAC[BITVAL_CLAUSES] THEN CONV_TAC WORD_REDUCE_CONV THEN ARITH_TAC);;  

let BD_SIMP = prove(
  `abs(ival(x : int32)) < &2 pow 31 ==> (bd x b <=> abs (ival x) >= ival b)`,
  REWRITE_TAC[bd] THEN DISCH_TAC THEN
  SUBGOAL_THEN `ival(iword(abs(ival(x:32 word))) : 32 word) = abs(ival x)` SUBST1_TAC THENL
  [MATCH_MP_TAC IVAL_IWORD THEN REWRITE_TAC[DIMINDEX_32] THEN CONV_TAC NUM_REDUCE_CONV THEN
   FIRST_X_ASSUM MP_TAC THEN REWRITE_TAC[INT_ABS_POS] THEN INT_ARITH_TAC; ALL_TAC] THEN
  SUBGOAL_THEN `(word_zx:64 word -> 32 word) ((word_zx:32 word -> 64 word) b) = b` SUBST1_TAC THENL
  [MATCH_MP_TAC WORD_ZX_ZX THEN REWRITE_TAC[DIMINDEX_32; DIMINDEX_64] THEN ARITH_TAC;
   REFL_TAC]);;

let BIT_TO_MASK32_OR = prove(
  `word_or (bit_to_mask32 b0) (bit_to_mask32 b1) = bit_to_mask32 (b0 \/ b1)`,
  REWRITE_TAC[bit_to_mask32] THEN
  BOOL_CASES_TAC `b0:bool` THEN BOOL_CASES_TAC `b1:bool` THEN
  REWRITE_TAC[BITVAL_CLAUSES] THEN CONV_TAC WORD_REDUCE_CONV);;

let MASK32_TO_BIT = prove(
  `(word_zx:32 word -> 64 word) (word_and ((word_zx:64 word -> 32 word)
     ((word_zx:32 word -> 64 word) (word_subword (word (val (bit_to_mask32 b)) : 128 word) (0,32))))
     (word 1)) = word (bitval b) : 64 word`,
  REWRITE_TAC[bit_to_mask32] THEN
  BOOL_CASES_TAC `b:bool` THEN REWRITE_TAC[BITVAL_CLAUSES] THEN
  CONV_TAC WORD_REDUCE_CONV);;

let WORD_JOIN_OR_TYBIT0 = prove(
  `word_or (word_join (a:N word) (b:N word) : (N tybit0) word) (word_join (c:N word) (d:N word)) =
   word_join (word_or a c) (word_or b d)`,
  REWRITE_TAC[WORD_EQ_BITS_ALT; BIT_WORD_OR; BIT_WORD_JOIN; DIMINDEX_TYBIT0] THEN
  X_GEN_TAC `i:num` THEN
  ASM_CASES_TAC `i < 2 * dimindex(:N)` THEN ASM_REWRITE_TAC[] THEN
  ASM_CASES_TAC `i < dimindex(:N)` THEN ASM_REWRITE_TAC[] THEN
  MATCH_MP_TAC(TAUT `p ==> (q <=> p /\ q)`) THEN ASM_ARITH_TAC);;

(* ------------------------------------------------------------------------- *)
(* Core correctness theorem                                                  *)
(* ------------------------------------------------------------------------- *)

let MLDSA_POLY_CHKNORM_CORRECT = prove(
 `!a (x:num->int32) (bound:int32) pc.
        nonoverlapping (word pc, LENGTH mldsa_poly_chknorm_mc) (a, 1024)
        ==> ensures arm
             (\s. aligned_bytes_loaded s (word pc) mldsa_poly_chknorm_mc /\
                  read PC s = word(pc + MLDSA_POLY_CHKNORM_CORE_START) /\
                  C_ARGUMENTS [a; word_zx bound] s /\
                  (!i. i < 256 ==>
                     read(memory :> bytes32(word_add a (word(4 * i)))) s = x i) /\
                  (!i. i < 256 ==> abs(ival(x i)) < &2 pow 31))
             (\s. read PC s = word(pc + MLDSA_POLY_CHKNORM_CORE_END) /\
                  read X0 s = word(bitval(?i. i < 256 /\ abs(ival(x i)) >= ival bound)))
             (MAYCHANGE_REGS_AND_FLAGS_PERMITTED_BY_ABI)`,
  CONV_TAC CHKNORM_LENGTH_SIMPLIFY_CONV THEN
  MAP_EVERY X_GEN_TAC [`a:int64`; `x:num->int32`; `bound:int32`; `pc:num`] THEN
  REWRITE_TAC[MAYCHANGE_REGS_AND_FLAGS_PERMITTED_BY_ABI; C_ARGUMENTS;
              NONOVERLAPPING_CLAUSES (* ); EXISTS_LT_256 *) ] THEN
  DISCH_THEN(REPEAT_TCL CONJUNCTS_THEN ASSUME_TAC) THEN
  (* Expand bounded foralls in precondition to 256 explicit cases *)
  ENSURES_INIT_TAC "s0" THEN
  UNDISCH_TAC `forall i. i < 256 ==> read (memory :> bytes32 (word_add a (word (4 * i)))) s0 = x i` THEN
  CONV_TAC(ONCE_DEPTH_CONV (EXPAND_CASES_CONV THENC ONCE_DEPTH_CONV NUM_MULT_CONV)) THEN REPEAT STRIP_TAC THEN
  (* Merge bytes32 reads into bytes128 reads (64 merges for 256 coefficients) *)
  MP_TAC(end_itlist CONJ (map (fun n -> READ_MEMORY_MERGE_CONV 2
            (subst[mk_small_numeral(16*n),`n:num`]
                  `read (memory :> bytes128(word_add a (word n))) s0`))
            (0--63))) THEN
  ASM_REWRITE_TAC[WORD_ADD_0] THEN
  DISCARD_MATCHING_ASSUMPTIONS [`read (memory :> bytes32 a) s = x`] THEN
  STRIP_TAC THEN
  (* Symbolically execute all instructions until target PC *)
  MAP_UNTIL_TARGET_PC (fun n ->
    ARM_STEPS_TAC MLDSA_POLY_CHKNORM_EXEC [n] THEN
    RULE_ASSUM_TAC(CONV_RULE(TOP_DEPTH_CONV WORD_SIMPLE_SUBWORD_CONV)) THEN 
    RULE_ASSUM_TAC(REWRITE_RULE[WORD_SUBWORD_OR; GSYM bit_to_mask32; WORD_JOIN_OR_TYBIT0; SYM (SPEC_ALL bd); BIT_TO_MASK32_OR; 
      MAX_VAL_BIT_TO_MASK32; MASK32_TO_BIT])) 1 THEN
  
  (* Close the state relation *)
  ENSURES_FINAL_STATE_TAC THEN ASM_REWRITE_TAC[] THEN
  DISCARD_MATCHING_ASSUMPTIONS [`read t s = x`] THEN

  RULE_ASSUM_TAC (CONV_RULE (ONCE_DEPTH_CONV EXPAND_CASES_CONV)) THEN
  REPEAT(FIRST_X_ASSUM(CONJUNCTS_THEN ASSUME_TAC)) THEN
  IMP_REWRITE_TAC [bd_simp] THEN
  POP_ASSUM_LIST (K ALL_TAC) THEN

  (* Convert to ! instead of ? and split *)
  GEN_REWRITE_TAC (BINOP_CONV o ONCE_DEPTH_CONV) [prove (`b = ~ (~ (b : bool))`, REWRITE_TAC[])] THEN
  GEN_REWRITE_TAC TOP_SWEEP_CONV [MESON[] `~(?i. i < n /\ P i) <=> (!i. i < n ==> ~P i)`; DE_MORGAN_THM] THEN
  CONV_TAC (ONCE_DEPTH_CONV EXPAND_CASES_CONV) THEN
  REPEAT AP_TERM_TAC THEN EQ_TAC THEN SIMP_TAC[]);;

Thanks!
This particular proof script does not work for me, but I will keep this in mind for my next proof.

Copy link
Copy Markdown
Contributor

@rod-chapman rod-chapman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See 2 comments, arising from partly duplicate PR 987

Comment thread mldsa/src/native/api.h Outdated
Comment thread mldsa/src/poly.c
Add a HOL Light functional correctness proof for the aarch64 ML-DSA
function poly_chknorm, which checks whether any polynomial coefficient
has absolute value >= a given bound.

This commit includes:
- Functional correctness proof showing the assembly computes
  `bitval(?i. i < 256 /\ abs(ival(x i)) >= ival bound)`
- autogen support for generating aarch64 HOL Light assembly
- Update of s2n-bignum

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Comment thread proofs/hol_light/aarch64/proofs/mldsa_poly_chknorm.ml
Copy link
Copy Markdown
Contributor

@hanno-becker hanno-becker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🙏

@mkannwischer mkannwischer merged commit 7c94262 into main Mar 20, 2026
751 of 752 checks passed
@mkannwischer mkannwischer deleted the hol-arm-chknorm branch March 20, 2026 05:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HOL-Light: Prove AArch64 poly_chknorm

4 participants