FROMLIST: wifi: ath12k: fix EAPOL TX failure caused by stale tcl_metadata bits #1419

Open
Yingying Tang (MilanoPipo) wants to merge 24 commits into qualcomm-linux:tech/net/ath from MilanoPipo:ath-eapol-tcl-metadata-fix

Conversation

@MilanoPipo Yingying Tang (MilanoPipo) commented Jun 27, 2026

On WCN7850, after the following sequence:

  1. load ath12k and connect to a non-MLO AP
  2. disconnect and connect to an MLO AP
  3. disconnect and reconnect to the non-MLO AP

the third connection always fails with a 4-Way handshake timeout. The supplicant transmits message 2 of 4 four times in response to AP retries of message 1, but the AP never sees any of them.

ath12k_dp_vdev_tx_attach() composes dp_link_vif->tcl_metadata using |=, but dp_link_vif is embedded in struct ath12k_dp_vif and its slots are reused across vif/peer teardown and setup. Since tcl_metadata is never cleared on detach, vdev_id bits from a previous attach remain set when the same link slot is reused with a different vdev_id. In this specific issue, the same link slot is used for vdev_id 0, then vdev_id 1, then vdev_id 0 again. The OR yields tcl_metadata == 0x9, which encodes vdev_id 1 in the HTT_TCL_META_DATA_VDEV_ID field even though ti.vdev_id is 0. Firmware then routes the EAPOL frame to the wrong vdev and the AP never receives message 2.

Use plain assignment instead of |= so the field is fully recomputed from the current arvif on every attach.
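
The effect of |= on a reused slot can be reproduced with a small standalone sketch. The bit layout below (type in bit 0, an 8-bit vdev_id field starting at bit 3) is an assumption inferred from the 0x9 value quoted above, not copied from the driver headers, and every name in it (link_slot, compose_meta, attach_or, attach_assign) is hypothetical.

```c
#include <stdint.h>

/* Assumed layout, inferred from the 0x9 example: type in bit 0,
 * vdev_id in an 8-bit field starting at bit 3. Not the driver's macros.
 */
#define META_TYPE_VDEV     0x1u
#define META_VDEV_ID_SHIFT 3
#define META_VDEV_ID_MASK  (0xffu << META_VDEV_ID_SHIFT)

/* Hypothetical stand-in for a link slot that survives detach. */
struct link_slot {
	uint32_t tcl_metadata;
};

static uint32_t compose_meta(uint8_t vdev_id)
{
	return META_TYPE_VDEV |
	       (((uint32_t)vdev_id << META_VDEV_ID_SHIFT) & META_VDEV_ID_MASK);
}

/* Old behaviour: bits from earlier attaches accumulate in the slot. */
static void attach_or(struct link_slot *slot, uint8_t vdev_id)
{
	slot->tcl_metadata |= compose_meta(vdev_id);
}

/* Fixed behaviour: the field is recomputed on every attach. */
static void attach_assign(struct link_slot *slot, uint8_t vdev_id)
{
	slot->tcl_metadata = compose_meta(vdev_id);
}

/* Extract the vdev_id the assumed layout encodes. */
static uint8_t meta_vdev_id(uint32_t meta)
{
	return (uint8_t)((meta & META_VDEV_ID_MASK) >> META_VDEV_ID_SHIFT);
}
```

Under these assumptions, replaying vdev_id 0, 1, 0 through attach_or() leaves 0x9 in the slot (vdev_id 1), while the same sequence through attach_assign() ends at 0x1 (vdev_id 0).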

Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c7-00108-QCAHMTSWPL_V1.0_V2.0_SILICONZ_UPSTREAM-3

Fixes: af66c76 ("wifi: ath12k: Refactor ath12k_vif structure")
Link: https://lore.kernel.org/all/20260609-ath12k-fix-eapol-tcl-metadata-v1-1-d47e6f90d4ee@oss.qualcomm.com/
CRs-Fixed: 4589769
Signed-off-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Signed-off-by: Yingying Tang <yingying.tang@oss.qualcomm.com>

Yingying Tang (MilanoPipo) and others added 24 commits May 9, 2026 11:47
A wrong channel survey index was introduced in
ath12k_mac_op_get_survey by [1], which can cause ACS to fail.

The index is decremented before being used, resulting in an
incorrect value when accessing the channel survey data.

Fix the index handling to ensure the correct survey entry is
used and avoid ACS failures.

Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3

Fixes: 4f242b1 ("wifi: ath12k: support get_survey mac op for single wiphy") # [1]
Signed-off-by: Yingying Tang <yingying.tang@oss.qualcomm.com>
Commit [1] introduces dp->reo_cmd_update_rx_queue_list for the purpose
of tracking all pending REO queue flush commands. The helper
ath12k_dp_prepare_reo_update_elem() allocates an element, populates
it with REO queue information, then adds it to the list. The element is
used during the cleanup stage to finally unmap/free the corresponding
REO queue buffer.

In MLO scenarios with more than one link, for non dp_primary_link_only
chips like WCN7850, that helper is called for each link peer. This
results in multiple elements added to the list, all of them pointing
to the same REO queue buffer. Consequently the same buffer gets
unmapped/freed multiple times:

BUG kmalloc-2k (Tainted: G    B   W  O       ): Object already free
-----------------------------------------------------------------------------
Allocated in ath12k_wifi7_dp_rx_assign_reoq+0xce/0x280 [ath12k_wifi7] age=7436 cpu=10 pid=16130
 __kmalloc_noprof
 ath12k_wifi7_dp_rx_assign_reoq
 ath12k_dp_rx_peer_tid_setup
 ath12k_dp_peer_setup
 ath12k_mac_station_add
 ath12k_mac_op_sta_state
 [...]
Freed in ath12k_dp_rx_tid_cleanup.part.0+0x25/0x40 [ath12k] age=1 cpu=27 pid=16137
 kfree
 ath12k_dp_rx_tid_cleanup.part.0
 ath12k_dp_rx_reo_cmd_list_cleanup
 ath12k_dp_cmn_device_deinit
 ath12k_core_stop
 ath12k_core_hw_group_cleanup
 ath12k_pci_remove

Fix this by allowing list addition for primary link only. Note
dp_primary_link_only chips like QCN9274 are not affected by this change,
because that's what they were doing in the first place.

Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3

Fixes: 3bf2e57 ("wifi: ath12k: Add Retry Mechanism for REO RX Queue Update Failures") # [1]
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221011
Signed-off-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Signed-off-by: Yingying Tang <yingying.tang@oss.qualcomm.com>
Add support for 5 GHz channel 177 with a center frequency of 5885 MHz and
Operating Class 125 per IEEE Std 802.11-2024 Table E-4.

Channels 169, 173, and 177 are in the 5.9 GHz band and must be disabled
when the 5.9 GHz service bit is not supported. The 5.9 GHz band is only
permitted for WLAN operation under FCC regulations.

Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3
Link: https://lore.kernel.org/ath12k/20260415063857.2462256-1-yintang@qti.qualcomm.com
Signed-off-by: Yingying Tang <yingying.tang@oss.qualcomm.com>
ath12k_dp_rx_deliver_msdu() currently uses hal_rx_desc_data::peer_id
parsed from mpdu_start descriptor to do peer lookup. However, in an A-MSDU
aggregation scenario, hardware only populates mpdu_start descriptor for
the first sub-msdu, but not the following ones. In that case peer_id could
be invalid, leading to peer lookup failure:

ath12k_wifi7_pci 0000:06:00.0: rx skb 00000000c391c041 len 1532 peer (null) 0 ucast sn 0 eht320 rate_idx 12 vht_nss 2 freq 6105 band 3 flag 0x40d1a fcs-err 0 mic-err 0 amsdu-more 0

As a result pubsta is NULL and parts of ieee80211_rx_status structure are
left uninitialized, which may cause unexpected behavior.

Fix it by switching the normal RX path to use ath12k_skb_rxcb::peer_id
which is parsed from REO ring's rx_mpdu_desc and is always valid.

hal_rx_desc_data::peer_id is still used in
ath12k_wifi7_dp_rx_frag_h_mpdu(), which is safe since A-MSDU
aggregation does not occur for fragmented frames. Similarly,
ath12k_skb_rxcb::peer_id may be overwritten by hal_rx_desc_data::peer_id
in ath12k_wifi7_dp_rx_h_mpdu(), which only handles non-aggregated
multicast/broadcast traffic.

Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3
Link: https://lore.kernel.org/all/20260427-ath12k-fix-peer-id-source-v1-1-b5f701fb8e88@oss.qualcomm.com
Fixes: 11157e0 ("wifi: ath12k: Use ath12k_dp_peer in per packet Tx & Rx paths")
Signed-off-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
HAL_TLV_HDR_LEN was using the wrong bitmask; fix it to cover
bits [21:10]. Also drop HAL_SRNG_TLV_HDR_{TAG,LEN} and use the
generic TLV header bit definitions for TLV32/TLV64 encode/decode
to avoid redundant macros.
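
As a rough illustration of what a length field at bits [21:10] means, the sketch below extracts and writes a 12-bit field at that position in a 32-bit header word. The macro and function names are hypothetical and are not the driver's definitions; only the bit range comes from the text above.

```c
#include <stdint.h>

/* Length field at bits [21:10], the range given in the commit message.
 * Names are illustrative only.
 */
#define TLV_HDR_LEN_SHIFT 10
#define TLV_HDR_LEN_MASK  (0xfffu << TLV_HDR_LEN_SHIFT)

/* Decode the length; bits outside [21:10] do not leak into the result. */
static uint32_t tlv_hdr_len(uint32_t hdr)
{
	return (hdr & TLV_HDR_LEN_MASK) >> TLV_HDR_LEN_SHIFT;
}

/* Encode a length without disturbing the other header bits. */
static uint32_t tlv_hdr_set_len(uint32_t hdr, uint32_t len)
{
	return (hdr & ~TLV_HDR_LEN_MASK) |
	       ((len << TLV_HDR_LEN_SHIFT) & TLV_HDR_LEN_MASK);
}
```

With a mask of exactly this width, a header word that also has bit 22 set still decodes to the intended length, which is the property a wider mask would lose.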

Tested-on: QCC2072 hw1.0 PCI WLAN.COL.1.0.c2-00068-QCACOLSWPL_V1_TO_SILICONZ-1
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3

Fixes: d889913 ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices")
Signed-off-by: Miaoqing Pan <miaoqing.pan@oss.qualcomm.com>
Link: https://lore.kernel.org/linux-wireless/20260509025819.1641630-2-miaoqing.pan@oss.qualcomm.com/
Change TLV decode helpers to return the TLV value pointer and optionally
decode tag/len/usrid via out parameters. This allows reusing the helpers
for DP monitor RX status header TLV parsing and avoids duplicated header
decoding in callers.

No functional change intended.

Tested-on: QCC2072 hw1.0 PCI WLAN.COL.1.0.c2-00068-QCACOLSWPL_V1_TO_SILICONZ-1
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3

Signed-off-by: Miaoqing Pan <miaoqing.pan@oss.qualcomm.com>
Link: https://lore.kernel.org/linux-wireless/20260509025819.1641630-3-miaoqing.pan@oss.qualcomm.com/
… alignment

Wi-Fi 7 monitor RX status TLV parsing needs to decode TLV headers and
advance the pointer with the correct header alignment. Different targets
use different TLV header layouts (32-bit vs 64-bit), but the HAL ops for
dp_mon RX status header decode and header alignment were not populated
for all wifi7 targets.

Add dp_mon RX status TLV header decode callbacks and TLV header alignment
helpers to the wifi7 HAL ops for QCC2072, QCN9274 and WCN7850. Export
helpers to query the required TLV header alignment for 32-bit and 64-bit
TLV headers so the caller can align the TLV walk correctly across targets.

Tested-on: QCC2072 hw1.0 PCI WLAN.COL.1.0.c2-00068-QCACOLSWPL_V1_TO_SILICONZ-1
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3

Signed-off-by: Miaoqing Pan <miaoqing.pan@oss.qualcomm.com>
Link: https://lore.kernel.org/linux-wireless/20260509025819.1641630-4-miaoqing.pan@oss.qualcomm.com/
Wi-Fi 7 monitor status parsing in dp_mon currently assumes a 64-bit TLV
header and directly decodes tag/len/userid from struct hal_tlv_64_hdr.
On chips using a 32-bit TLV header (e.g. QCC2072), this causes monitor RX
status packets to be dropped during TLV parsing.

Introduce HAL helpers to decode TLV header fields (tag/len/userid/value)
for both 32-bit and 64-bit header layouts, without changing the actual TLV
parsing logic.

Tested-on: QCC2072 hw1.0 PCI WLAN.COL.1.0.c2-00068-QCACOLSWPL_V1_TO_SILICONZ-1
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3

Signed-off-by: Miaoqing Pan <miaoqing.pan@oss.qualcomm.com>
Link: https://lore.kernel.org/linux-wireless/20260509025819.1641630-5-miaoqing.pan@oss.qualcomm.com/
Validate the pointer to the next RX monitor TLV more strictly by
ensuring that at least a full TLV header is available within the
status buffer before continuing TLV parsing.

Prevent potential out-of-bounds access when handling malformed
or truncated RX monitor status data.
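
A minimal sketch of such a check, written against byte offsets rather than the driver's pointers. The function name and parameters are hypothetical; hdr_len stands for whichever TLV header size (32-bit or 64-bit) the target uses.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true only if a complete TLV header of hdr_len bytes fits in the
 * status buffer when it starts at next_off. The subtraction is done only
 * after next_off is known to be within the buffer, so it cannot wrap.
 */
static bool tlv_hdr_fits(size_t buf_len, size_t next_off, size_t hdr_len)
{
	return next_off <= buf_len && buf_len - next_off >= hdr_len;
}
```

A TLV walk would stop as soon as this returns false, instead of reading a partial header at the tail of a truncated buffer.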

Tested-on: QCC2072 hw1.0 PCI WLAN.COL.1.0.c2-00068-QCACOLSWPL_V1_TO_SILICONZ-1
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3

Signed-off-by: Miaoqing Pan <miaoqing.pan@oss.qualcomm.com>
Link: https://lore.kernel.org/linux-wireless/20260509025819.1641630-6-miaoqing.pan@oss.qualcomm.com/
…y_tkip_mic()

In ath12k_wifi7_dp_rx_h_verify_tkip_mic(), the call to
ath12k_dp_rx_check_nwifi_hdr_len_valid() may return false when the
NWIFI header length is invalid, causing the function to abort early with
-EINVAL.

When this happens, the error propagates to
ath12k_wifi7_dp_rx_h_defrag(), which clears first_frag by setting it
to NULL. As a result, the corresponding MSDU is no longer referenced
by the defragmentation path and is never freed.

This leads to a memory leak for the affected MSDU on this error path.
Proper cleanup is required to ensure the MSDU is released when header
validation fails during TKIP MIC verification.

Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3

Fixes: 9a0dddf ("wifi: ath12k: Fix invalid data access in ath12k_dp_rx_h_undecap_nwifi")
Signed-off-by: Miaoqing Pan <miaoqing.pan@oss.qualcomm.com>
Link: https://lore.kernel.org/linux-wireless/20260512021108.2031651-1-miaoqing.pan@oss.qualcomm.com/
…ecap_nwifi

In certain cases, hardware might provide packets with a
length greater than the maximum native Wi-Fi header length.
This can lead to accessing and modifying fields in the header
within the ath11k_dp_rx_h_undecap_nwifi() function for the
DP_RX_DECAP_TYPE_NATIVE_WIFI decap type and
potentially result in invalid data access and memory corruption.

Kernel stack is corrupted in: ath11k_dp_rx_h_undecap+0x6b0/0x6b0 [ath11k]
Call trace:
 ath11k_dp_rx_h_mpdu+0x0/0x2e8 [ath11k]
 ath11k_dp_rx_h_mpdu+0x1e0/0x2e8 [ath11k]
 ath11k_dp_rx_wbm_err+0x1e0/0x450 [ath11k]
 ath11k_dp_rx_process_wbm_err+0x2fc/0x460 [ath11k]
 ath11k_dp_service_srng+0x2e0/0x348 [ath11k]

Add a sanity check before processing the SKB to prevent invalid
data access in the undecap native Wi-Fi function for the
DP_RX_DECAP_TYPE_NATIVE_WIFI decap type.

This is adapted from the discussion/patch of the ath12k driver [1].

Tested-on: WCN6855 hw2.1 PCI WLAN.HSP.1.1-04685-QCAHSPSWPL_V1_V2_SILICONZ_IOE-1

Link: https://lore.kernel.org/linux-wireless/20250211090302.4105141-1-tamizh.raja@oss.qualcomm.com/ # [1]
Signed-off-by: Miaoqing Pan <miaoqing.pan@oss.qualcomm.com>
Link: https://lore.kernel.org/linux-wireless/20260512022351.2033155-2-miaoqing.pan@oss.qualcomm.com/
In the WBM error path, while processing TKIP MIC errors, MSDU length
is fetched from the hal_rx_desc's msdu_end. This MSDU length is
directly passed to skb_put() without validation. In stress test
scenarios, the WBM error ring may receive invalid descriptors, which
could lead to an invalid MSDU length.

To fix this, add a check to drop the skb when the calculated MSDU
length is greater than the skb size.

This is adapted from the discussion/patch of the ath12k driver [1].

Tested-on: WCN6855 hw2.1 PCI WLAN.HSP.1.1-04685-QCAHSPSWPL_V1_V2_SILICONZ_IOE-1

Link: https://lore.kernel.org/linux-wireless/20250416021903.3178962-1-nithyanantham.paramasivam@oss.qualcomm.com/ # [1]
Signed-off-by: Miaoqing Pan <miaoqing.pan@oss.qualcomm.com>
Link: https://lore.kernel.org/linux-wireless/20260512022351.2033155-3-miaoqing.pan@oss.qualcomm.com/
For some chipsets, firmware can report HTT_T2H_MSG_TYPE_PEER_MAP2 with
peer_id 0 as a valid value for mapping ath12k_dp_link_peer to
ath12k_dp_peer.

ath12k_dp_peer_find_by_peerid() currently treats peer_id 0 as invalid.
When firmware assigns peer_id 0, peer lookup fails. As a result,
DHCP OFFER packets are dropped in __ieee80211_rx_handle_packet()
because pubsta is NULL.

ath12k_dp_rx_deliver_msdu() <- rx_info->peer_id 0
  ath12k_dp_peer_find_by_peerid -> peer NULL
  ieee80211_rx_napi <- pubsta NULL
    ieee80211_rx_list
      __ieee80211_rx_handle_packet <- pubsta NULL, skb undelivered

The following error in the TX completion path is caused by the same issue:

ath12k_wifi7_pci 0000:04:00.0: dp_tx: failed to find the peer with peer_id 0

The error message is triggered by:
ath12k_wifi7_dp_tx_complete_msdu
  ath12k_dp_link_peer_find_by_peerid <- ts->peer_id 0
    ath12k_dp_peer_find_by_peerid -> peer NULL

ath12k_dp_tx_htt_tx_complete_buf
  ath12k_dp_link_peer_find_by_peerid <- peer_id 0
    ath12k_dp_peer_find_by_peerid -> peer NULL

Fix this by allowing peer_id 0 in ath12k_dp_peer_find_by_peerid() and
rejecting only values >= ATH12K_DP_PEER_ID_INVALID.
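
The change in the validity check can be sketched as follows. PEER_ID_INVALID is a hypothetical placeholder, since the value of ATH12K_DP_PEER_ID_INVALID is not given here, and the exact shape of the old check is an assumption based on the description that peer_id 0 was treated as invalid.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel; stands in for ATH12K_DP_PEER_ID_INVALID. */
#define PEER_ID_INVALID 0x3fffu

/* Assumed old behaviour: peer_id 0 is rejected along with the sentinel. */
static bool peer_id_valid_old(uint16_t peer_id)
{
	return peer_id != 0 && peer_id < PEER_ID_INVALID;
}

/* New behaviour: only values >= the invalid sentinel are rejected. */
static bool peer_id_valid_new(uint16_t peer_id)
{
	return peer_id < PEER_ID_INVALID;
}
```

With the new predicate a firmware-assigned peer_id of 0 reaches the lookup, while out-of-range IDs are still refused.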

Also update peer_id 0 handling in the monitor path:
- Always call ath12k_dp_link_peer_find_by_peerid() in
  ath12k_dp_rx_h_find_link_peer() to fetch the peer, including when
  peer_id is 0.
- Always store peer_id in ppdu_info->peer_id in
  ath12k_wifi7_dp_mon_rx_parse_status_tlv(), including peer_id 0.

Tested-on: QCC2072 hw1.0 PCI WLAN.COL.1.0.c2-00074-QCACOLSWPL_V1_TO_SILICONZ-1
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c7-00108-QCAHMTSWPL_V1.0_V2.0_SILICONZ_UPSTREAM-3
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.6-01243-QCAHKSWPL_SILICONZ-1

Signed-off-by: Hangtian Zhu <hangtian.zhu@oss.qualcomm.com>
Link: https://lore.kernel.org/all/20260512025732.1297849-1-hangtian.zhu@oss.qualcomm.com/
Export irq_can_set_affinity() for loadable drivers that need a runtime
check for IRQ affinity capability.

In hierarchical IRQ setups where the effective irqchip path lacks
.irq_set_affinity(), drivers may need to switch to a fallback policy.
Without this export, module drivers cannot use the core helper and have
to open-code equivalent checks.

Signed-off-by: Hangtian Zhu <hangtian.zhu@oss.qualcomm.com>
Link: https://lore.kernel.org/all/20260519011627.713068-1-hangtian.zhu@oss.qualcomm.com/
…unavailable

Determine threaded NAPI policy from runtime IRQ capability of the DP MSI
IRQ.

If irq_can_set_affinity() reports that affinity cannot be set, enable
threaded NAPI for DP interrupt groups so datapath processing is not
constrained by a single-CPU softirq context.

On RB3Gen2, where IRQ affinity is unavailable in the effective IRQ path,
EHT160 UDP downlink throughput improved from 802 Mbps to 2.58 Gbps after
enabling threaded NAPI.

Tested-on: QCC2072 hw1.0 PCI WLAN.COL.1.0.c2-00074-QCACOLSWPL_V1_TO_SILICONZ-1

Signed-off-by: Hangtian Zhu <hangtian.zhu@oss.qualcomm.com>
Link: https://lore.kernel.org/all/20260519011627.713068-1-hangtian.zhu@oss.qualcomm.com/
… dual-station support

When P2P support is enabled, wpa_supplicant creates a p2p-device
interface by default, which implicitly consumes one vdev. On systems
managed by NetworkManager, this interface cannot be reliably disabled,
leaving only two usable interfaces for user configurations.

Increase num_vdevs to four for QCA6390 hw2.0, WCN6855 hw2.0/hw2.1,
QCA2066 hw2.1, and QCA6698AQ hw2.1 to account for the implicit
p2p-device and enable common concurrency scenarios such as AP + AP + STA.

This change increases interface concurrency in the two-channel scenario
by raising the maximum vdev limit, while keeping other combination rules
unchanged.

Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-05266-QCAHSTSWPLZ_V2_TO_X86-1
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41
Tested-on: WCN6855 hw2.1 PCI WLAN.HSP.1.1-04685-QCAHSPSWPL_V1_V2_SILICONZ_IOE-1
Tested-on: QCA2066 hw2.1 PCI WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.9
Tested-on: QCA6698AQ hw2.1 PCI WLAN.HSP.1.1-04685-QCAHSPSWPL_V1_V2_SILICONZ_IOE-1

Link: https://lore.kernel.org/linux-wireless/20260525020711.2590815-1-wei.zhang@oss.qualcomm.com/
Signed-off-by: Wei Zhang <wei.zhang@oss.qualcomm.com>
…rror paths

ath12k_mac_vdev_create() has three error path issues that leave arvif
in an inconsistent state:

1. When ath12k_wmi_vdev_create() fails, the function returns directly
   without clearing arvif->ar, which was already set before the WMI
   call. Subsequent code checking arvif->ar to determine vdev readiness
   will see a non-NULL value despite no vdev existing in firmware.

2. When ath12k_wmi_send_peer_delete_cmd() fails in err_peer_del, the
   code jumped to err: skipping the DP peer cleanup and vdev rollback,
   leaving num_created_vdevs, vdev maps and arvif list membership live.

3. When ath12k_wait_for_peer_delete_done() fails, the code jumped to
   err_vdev_del: skipping the DP peer cleanup.

Fix by changing the ath12k_wmi_vdev_create() failure to goto err instead
of returning directly, routing both err_peer_del failure paths through
err_dp_peer_del: for proper DP peer and vdev rollback, and consolidating
the arvif state cleanup at err:.

Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3

Fixes: 477cabf ("wifi: ath12k: modify link arvif creation and removal for MLO")
Link: https://lore.kernel.org/linux-wireless/20260512044906.1735821-1-wei.zhang@oss.qualcomm.com/
Signed-off-by: Wei Zhang <wei.zhang@oss.qualcomm.com>
…y link

_ieee80211_set_active_links() calls _ieee80211_link_use_channel() for
each newly-added link and WARN_ON_ONCE()s if it fails. The call uses
assign_on_failure=true, which allows mac80211 to continue despite
driver failures, but when a mac80211-level channel validation fails
(e.g., combinations check, DFS, or no available radio),
drv_assign_vif_chanctx() is never reached. Since ath12k_mac_vdev_create()
is only called from that path, arvif->is_created remains false and
arvif->ar remains NULL for the failed link.

The subsequent drv_change_sta_links() call reaches
ath12k_mac_op_change_sta_links(), which allocates an arsta and sets
ahsta->links_map |= BIT(link_id) for the broken link before checking
whether the link is ready. When the vdev was never created, only
station_add() is skipped, but the link remains in links_map.

Any subsequent operation iterating links_map and dereferencing arvif->ar
without a NULL check will crash. Two observed examples are NULL deref in
ath12k_mac_ml_station_remove() on disconnect and in ath12k_mac_op_set_key()
when wpa_supplicant installs PTK keys.

  BUG: Unable to handle kernel NULL pointer dereference at 0x00000000
  pc : ath12k_mac_station_post_remove+0x40/0xe8 [ath12k]
  Call trace:
   ath12k_mac_station_post_remove+0x40/0xe8 [ath12k]
   ath12k_mac_op_sta_state+0xb60/0x1720 [ath12k]
   drv_sta_state+0x100/0xbd8 [mac80211]
   __sta_info_destroy_part2+0x148/0x178 [mac80211]
   ieee80211_set_disassoc+0x500/0x678 [mac80211]

  BUG: Unable to handle kernel NULL pointer dereference at 0x00000000
  pc : ath12k_mac_op_set_key+0x1f8/0x2c0 [ath12k]
  Call trace:
   ath12k_mac_op_set_key+0x1f8/0x2c0 [ath12k]
   drv_set_key+0x70/0x100 [mac80211]
   ieee80211_key_enable_hw_accel+0x78/0x260 [mac80211]
   ieee80211_add_key+0x16c/0x2ac [mac80211]
   nl80211_new_key+0x138/0x280 [cfg80211]

Fix this by checking arvif->is_created before calling
ath12k_mac_alloc_assign_link_sta(). This prevents the broken link from
entering links_map, so all subsequent operations iterating the bitmap
are protected. The reliability of arvif->is_created across all error
paths is ensured by the preceding patch.

Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3

Fixes: a27fa61 ("wifi: ath12k: support change_sta_links() mac80211 op")
Link: https://lore.kernel.org/linux-wireless/20260512044906.1735821-1-wei.zhang@oss.qualcomm.com/
Signed-off-by: Wei Zhang <wei.zhang@oss.qualcomm.com>
For WCN7850, the MAC buffer ring size was updated to 2048 in
955df16 ("wifi: ath12k: change MAC buffer ring size to 2048")
to increase peak throughput.

However, during RX the throughput can still be observed to drop by
about 30% from its peak value and then recover, and this pattern
repeats.

After increasing the MAC buffer ring size to 4096, the throughput drop
no longer occurs.

Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3

Signed-off-by: Yingying Tang <yingying.tang@oss.qualcomm.com>
Commit [1] introduced a regression causing severely degraded MLO RX
throughput on WCN7850.

On WCN7850, there is only a single ar instance, but MLO uses two
link IDs. ath12k_dp_peer->hw_links[] is indexed using ar->hw_link_id,
which causes both MLO link IDs to be stored at the same index.

As a result, an incorrect link ID is assigned to MSDUs in
ath12k_dp_rx_deliver_msdu(), leading to severe MLO RX throughput loss.

Different chipsets identify the per-MSDU link differently:

  - On QCN9274 / IPQ5332, the host owns multiple ar instances and the
    per-MSDU hw_link_id from the RX descriptor maps cleanly through
    dp_peer->hw_links[hw_link_id] to the IEEE link_id.

  - On single-ar chipsets like WCN7850 / QCC2072, there is only one ar
    instance for both MLO links, so dp_peer->hw_links[] has just one
    valid slot and cannot be used to distinguish the two links. To
    resolve the link, walk dp_peer->link_peers[] and match by
    rxcb->peer_id, which on the link_peer side identifies the link
    peer for the MSDU.

Add a new hw_op set_rx_link_id() so each chipset resolves the link
on the RX fast path using whatever signal it actually has, and let
the op itself decide whether to populate rx_status::link_valid and
rx_status::link_id:

  QCN9274 / IPQ5332 : always derive link_id from
                      dp_peer->hw_links[rxcb->hw_link_id] and set
                      link_valid.
  WCN7850 / QCC2072 : walk the link_peers[] of dp_peer to find the
                      link_peer whose peer_id matches rxcb->peer_id,
                      and set link_valid only when a match is found.
                      Otherwise leave link_valid clear so that
                      mac80211 can fall back to its own link
                      resolution path (via addr2 / deflink).

For WCN7850 / QCC2072, walking dp_peer->link_peers[] is bounded by
the number of links actually populated, so introduce a link_peers_map
bitmap (unsigned long) in struct ath12k_dp_peer that tracks populated
slots and use for_each_set_bit() to iterate. Non-MLO clients hit one
slot, current MLO clients hit two; the full ATH12K_NUM_MAX_LINKS
array is never scanned. The bitmap is maintained with WRITE_ONCE() on
the write side (under dp_hw->peer_lock) paired with READ_ONCE() on
both the lockless RX read side and the write-side RMW for KCSAN
correctness.
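
The bitmap-bounded walk can be sketched in plain C as below. This is a simplified model, not the driver code: the struct, MAX_LINKS and find_link_id() are hypothetical, the slot index is assumed to equal the link ID, and the locking and READ_ONCE()/WRITE_ONCE() annotations described above are left out.

```c
#include <stdint.h>

#define MAX_LINKS    16    /* hypothetical stand-in for ATH12K_NUM_MAX_LINKS */
#define LINK_ID_NONE (-1)

/* Simplified model of a dp_peer: a bitmap of populated slots plus the
 * peer_id stored in each slot.
 */
struct dp_peer_sketch {
	unsigned long link_peers_map;      /* bit n set: slot n is populated */
	uint16_t link_peer_id[MAX_LINKS];
};

/* Visit only populated slots and return the slot whose peer_id matches.
 * The loop ends as soon as no set bits remain, so a peer with one or two
 * links never scans the whole array.
 */
static int find_link_id(const struct dp_peer_sketch *peer, uint16_t peer_id)
{
	unsigned long map = peer->link_peers_map;
	int slot;

	for (slot = 0; slot < MAX_LINKS && map != 0; slot++, map >>= 1) {
		if ((map & 1ul) && peer->link_peer_id[slot] == peer_id)
			return slot;
	}

	return LINK_ID_NONE;   /* caller would leave link_valid clear */
}
```

Returning LINK_ID_NONE corresponds to the case where link_valid is left clear and link resolution falls back to mac80211.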

Also guard the dp_peer dereference in ath12k_mac_peer_cleanup_all()
with a NULL check, since peer->dp_peer can be NULL for self-peers or
peers not yet fully assigned; the pre-existing rcu_assign_pointer()
call there had the same latent issue.

This restores the correct link ID on WCN7850 without changing the
QCN9274 / IPQ5332 data path, which keeps its O(1) hw_links[]
indexing.

Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3

Fixes: 11157e0 ("wifi: ath12k: Use ath12k_dp_peer in per packet Tx & Rx paths") # [1]
Signed-off-by: Yingying Tang <yingying.tang@oss.qualcomm.com>
This reverts commit 8090556.

Call trace:
rcu_note_context_switch+0x4c4/0x508 (P)
__schedule+0xbc/0x1204
schedule+0x34/0x110
schedule_timeout+0x84/0x11c
__mhi_device_get_sync+0x164/0x228 [mhi]
mhi_device_get_sync+0x1c/0x3c [mhi]
ath12k_wifi7_pci_bus_wake_up+0x20/0x2c [ath12k_wifi7]
ath12k_pci_read32+0x58/0x350 [ath12k]
ath12k_pci_clear_dbg_registers+0x28/0xb8 [ath12k]
ath12k_pci_panic_handler+0x20/0x44 [ath12k]
ath12k_core_panic_handler+0x28/0x3c [ath12k]
notifier_call_chain+0x78/0x1c0
atomic_notifier_call_chain+0x3c/0x5c

ath12k_core_panic_handler() is invoked via atomic_notifier_call_chain(),
which runs inside an RCU read-side critical section. The current code calls
ath12k_pci_sw_reset() synchronously from this context, which eventually
reaches mhi_device_get_sync() and schedule_timeout(), triggering a voluntary
context switch within RCU.

Revert change "wifi: ath12k: add panic handler" to avoid this issue.

Tested-on: WLAN.HMT.1.1.c7-00108-QCAHMTSWPL_V1.0_V2.0_SILICONZ_UPSTREAM-3
Link: https://lore.kernel.org/all/20260612032332.2278338-1-yingying.tang@oss.qualcomm.com/
Signed-off-by: Yingying Tang <yingying.tang@oss.qualcomm.com>
Commit afbab6e ("wifi: ath12k: modify ath12k_mac_op_bss_info_changed()
for MLO") replaced the bss_info_changed() callback with vif_cfg_changed()
and link_info_changed() to support Multi-Link Operation (MLO). As a result,
the station power save configuration is no longer correctly applied in
ath12k_mac_bss_info_changed().

Move the handling of 'BSS_CHANGED_PS' into ath12k_mac_op_vif_cfg_changed()
to align with the updated callback structure introduced for MLO, ensuring
proper power-save behavior for station interfaces.

Tested-on: WCN7850 hw2.0 PCI WLAN.IOE_HMT.1.1-00011-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1

Fixes: afbab6e ("wifi: ath12k: modify ath12k_mac_op_bss_info_changed() for MLO")
Signed-off-by: Miaoqing Pan <miaoqing.pan@oss.qualcomm.com>
Link: https://lore.kernel.org/all/20250908015025.1301398-1-miaoqing.pan@oss.qualcomm.com/
Signed-off-by: Daizhuang Bai <daizhuang.bai@oss.qualcomm.com>
…ng_access_begin

In ATH11K_QMI_EVENT_FW_READY, ATH11K_FLAG_REGISTERED is set
unconditionally even when ath11k_core_qmi_firmware_ready() fails.
This leaves the driver in an inconsistent state where
initialization is considered complete although the firmware ready
handling did not finish successfully. During the subsequent SSR,
the driver enters the restart path based on this incorrect state
and dereferences uninitialized srng members, resulting in a NULL
pointer dereference.

Call trace:
  ath11k_hal_srng_access_begin+0xc/0x60 [ath11k] (P)
  ath11k_ce_cleanup_pipes+0x17c/0x180 [ath11k]
  ath11k_core_restart+0x40/0x168 [ath11k]

Fix this by:
- skipping firmware_ready if ATH11K_FLAG_REGISTERED is already set
- setting ATH11K_FLAG_REGISTERED only when firmware_ready succeeds
- setting ATH11K_FLAG_QMI_FAIL and aborting the FW_READY handling
on error
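
The three rules above can be modelled with a small state sketch. Everything here is hypothetical scaffolding: the struct fields stand in for ATH11K_FLAG_REGISTERED and ATH11K_FLAG_QMI_FAIL, and the two stub callbacks stand in for a succeeding and a failing ath11k_core_qmi_firmware_ready().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver flags. */
struct core_state {
	bool registered;   /* models ATH11K_FLAG_REGISTERED */
	bool qmi_fail;     /* models ATH11K_FLAG_QMI_FAIL */
	int ready_calls;   /* how often the firmware-ready step ran */
};

typedef int (*fw_ready_fn)(struct core_state *st);

/* Stub firmware-ready step that succeeds. */
static int fw_ready_ok(struct core_state *st)
{
	st->ready_calls++;
	return 0;
}

/* Stub firmware-ready step that fails. */
static int fw_ready_fail(struct core_state *st)
{
	st->ready_calls++;
	return -1;
}

static int handle_fw_ready(struct core_state *st, fw_ready_fn firmware_ready)
{
	int ret;

	if (st->registered)
		return 0;            /* already registered: skip the step */

	ret = firmware_ready(st);
	if (ret) {
		st->qmi_fail = true; /* record failure, keep 'registered' clear */
		return ret;
	}

	st->registered = true;       /* set only after success */
	return 0;
}
```

In this model a failed firmware-ready step never leaves the state marked as registered, so a later restart path would not act on a half-initialised device.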

Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.2.0.c2-00204-QCAMSLSWPLZ-1

Fixes: 6fe62a8 ("wifi: ath11k: Add cold boot calibration support on WCN6750")
Signed-off-by: Gaole Zhang <gaole.zhang@oss.qualcomm.com>
Link: https://lore.kernel.org/linux-wireless/20260609090609.4041009-1-gaole.zhang@oss.qualcomm.com/
…data bits

On WCN7850, after the following sequence:

  1. load ath12k and connect to a non-MLO AP
  2. disconnect and connect to an MLO AP
  3. disconnect and reconnect to the non-MLO AP

the third connection always fails with a 4-Way handshake timeout. The
supplicant transmits message 2 of 4 four times in response to AP
retries of message 1, but the AP never sees any of them.

ath12k_dp_vdev_tx_attach() composes dp_link_vif->tcl_metadata using |=,
but dp_link_vif is embedded in struct ath12k_dp_vif and its slots are
reused across vif/peer teardown and setup. Since tcl_metadata is never
cleared on detach, vdev_id bits from a previous attach remain set when
the same link slot is reused with a different vdev_id. In this specific
issue, the same link slot is used for vdev_id 0, then vdev_id 1, then
vdev_id 0 again. The OR yields tcl_metadata == 0x9, which encodes
vdev_id 1 in the HTT_TCL_META_DATA_VDEV_ID field even though
ti.vdev_id is 0. Firmware then routes the EAPOL frame to the wrong
vdev and the AP never receives message 2.

Use plain assignment instead of |= so the field is fully recomputed
from the current arvif on every attach.

Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c7-00108-QCAHMTSWPL_V1.0_V2.0_SILICONZ_UPSTREAM-3

Fixes: af66c76 ("wifi: ath12k: Refactor ath12k_vif structure")
Link: https://lore.kernel.org/all/20260609-ath12k-fix-eapol-tcl-metadata-v1-1-d47e6f90d4ee@oss.qualcomm.com/
Signed-off-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Signed-off-by: Yingying Tang <yingying.tang@oss.qualcomm.com>
@qcomlnxci qcomlnxci requested review from a team and miaoqing-quic and removed request for a team June 29, 2026 06:15
@qlijarvis

PR #1419 — validate-patch

PR: #1419

| Verdict | Issues | Detailed Report |
|---|---|---|
| ⚠️ | 0 | Full report |

Final Summary

  1. Lore link present: 19 of 21 FROMLIST commits (90%) — 2 missing
  2. Lore link matches PR commits: Cannot verify without network access — Needs manual verification ⚠️
  3. Upstream patch status: Mixed: 19 commits have lore links (likely in review), 1 commit appears merged (patch 22), 2 commits missing links (patches 01-02), 2 commits unclear (patches 19-20)
  4. PR present in qcom-next: Not checked — Requires repository access ⚠️

🔍 Patch Validation

PR: #1419
Total commits: 24
Verdict: ⚠️ PARTIAL

Summary by Commit Type

| Category | Count | Commits | Status |
|---|---|---|---|
| FROMLIST with lore link | 19 | 03-18, 21, 23-24 | ✅ |
| FROMLIST without lore link | 2 | 01-02 | ❌ |
| Non-FROMLIST with lore link | 1 | 22 | ✅ |
| Non-FROMLIST without lore link | 2 | 19-20 | ⚠️ |

Detailed Commit Analysis

✅ Commits 03-18, 21, 23-24: FROMLIST with lore links

All these commits are properly tagged as FROMLIST and include valid lore.kernel.org links in their commit messages.

Status: ✅ PASS — Proper upstream references

❌ Commits 01-02: FROMLIST without lore links

Patch 01: wifi: ath12k: fix incorrect channel survey index

  • Subject: Marked as FROMLIST
  • Lore link: MISSING
  • Author: Yingying Tang yingying.tang@oss.qualcomm.com
  • Fixes tag: Present (Fixes: 4f242b1)
  • Issue: Commit is tagged FROMLIST but provides no lore.kernel.org link to verify upstream posting

Patch 02: wifi: ath12k: prepare REO update element only for primary link

Status: ❌ FAIL — FROMLIST tag requires lore link

✅ Commit 22: Non-FROMLIST with lore link

Patch 22: wifi: ath12k: Fix missing station power save configuration

Status: ✅ PASS — Proper upstream reference, likely already merged

⚠️ Commits 19-20: Non-FROMLIST without lore links

Patch 19: wifi: ath12k: change MAC buffer ring size to 4096

  • Subject: No FROMLIST prefix, no QCLINUX prefix
  • Lore link: MISSING ⚠️
  • Author: Yingying Tang yingying.tang@oss.qualcomm.com
  • Description: Performance tuning change (increases DP_RX_MAC_BUF_RING_SIZE from 2048 to 4096)
  • Rationale: Addresses throughput drops observed on WCN7850
  • Issue: Unclear if this is vendor-only or should be upstreamed

Patch 20: wifi: ath12k: Fix low MLO RX throughput on WCN7850

  • Subject: No FROMLIST prefix, no QCLINUX prefix
  • Lore link: MISSING ⚠️
  • Author: Yingying Tang yingying.tang@oss.qualcomm.com
  • Description: Fixes MLO RX throughput regression on WCN7850 by adding hw_op set_rx_link_id()
  • Fixes tag: References commit [1] (not in standard Fixes: format)
  • Issue: Significant functional fix without upstream reference

Status: ⚠️ NEEDS CLARIFICATION — Should these be upstreamed?
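
The four statuses above follow from two properties of each commit: whether the subject carries a FROMLIST prefix and whether the message contains a lore link. The sketch below is an assumed reconstruction of that rule, not the validator's actual code; the `Commit` class and `classify` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    subject: str
    message: str  # full commit message body, including trailers


def classify(commit: Commit) -> str:
    """Map (FROMLIST prefix, lore link) to the status used in this report."""
    is_fromlist = commit.subject.startswith("FROMLIST:")
    has_lore = "lore.kernel.org" in commit.message
    if is_fromlist and not has_lore:
        return "FAIL"  # FROMLIST tag requires a lore link
    if not is_fromlist and not has_lore:
        return "NEEDS CLARIFICATION"  # no upstream reference at all
    return "PASS"  # a lore link is present


# Illustrative inputs; the message bodies here are placeholders.
patch_01 = Commit("FROMLIST: wifi: ath12k: fix incorrect channel survey index", "")
patch_19 = Commit("wifi: ath12k: change MAC buffer ring size to 4096", "")
print(classify(patch_01))
print(classify(patch_19))
```

A stricter variant could also treat a QCLINUX prefix as an explicit vendor-only marker, which is the option the recommendations mention for patches 19-20.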

Commit Message Quality

| Check | Status | Notes |
|---|---|---|
| Subject format | | All follow wifi: ath12k: or wifi: ath11k: convention |
| Body rationale | | Most commits explain the problem and solution |
| Fixes tags | | Present where applicable (commits 01-05, 11-12, 17-18, 22-24) |
| Tested-on tags | | Present on all commits |
| Authorship | | Proper author attribution maintained |
| Sign-off chain | | All commits properly signed off |

Diff Integrity

Unable to verify diff integrity against upstream without network access to lore.kernel.org. However, based on commit message analysis:

| Aspect | Status | Notes |
|---|---|---|
| File paths | | All paths are in drivers/net/wireless/ath/ath12k/ or ath11k/ |
| Change scope | | Changes appear focused and minimal |
| Code style | | Follows kernel coding style |

Issues

  1. Critical: Patches 01-02 are marked FROMLIST but lack lore.kernel.org links

    • Cannot verify these were actually posted upstream
    • Cannot validate diff integrity against upstream version
    • Violates FROMLIST tagging convention
  2. Warning: Patches 19-20 lack upstream references

    • Patch 19: Performance tuning change (buffer size increase)
    • Patch 20: Significant MLO RX throughput fix
    • Neither marked as vendor-only (no QCLINUX prefix)
    • Should these be posted upstream?
  3. Minor: Patch 20 uses informal reference format

    • Uses "Commit [1]" instead of proper Fixes: tag format
    • Should use: Fixes: <commit-sha> ("commit subject")
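
A simple pattern check can tell the standard tag apart from an informal reference such as "Commit [1]". This is a sketch under assumptions: the regular expression requires at least 12 hex digits, as the kernel's patch submission guidelines ask for, and the SHA and subject in the example are placeholders, not a real commit.

```python
import re

# Assumed pattern: "Fixes: ", 12 to 40 lowercase hex digits, then ("subject").
FIXES_RE = re.compile(r'^Fixes: [0-9a-f]{12,40} \(".+"\)$')


def is_valid_fixes_tag(line: str) -> bool:
    """Return True when the line matches the standard Fixes: tag shape."""
    return FIXES_RE.match(line) is not None


# Placeholder SHA and subject, for illustration only.
print(is_valid_fixes_tag('Fixes: 0123456789ab ("wifi: ath12k: example subject")'))
print(is_valid_fixes_tag("Commit [1] introduced the regression"))
```

In a kernel tree, `git log -1 --abbrev=12 --format='Fixes: %h ("%s")' <commit>` prints a tag in this shape for a given commit.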

Verdict

⚠️ PARTIAL PASS — Most commits are properly documented, but critical issues exist:

Must fix before merge:

  • Add lore.kernel.org links to patches 01-02, or remove FROMLIST prefix if not posted
  • Clarify upstream status of patches 19-20 (vendor-only or should be upstreamed?)
  • Fix Fixes: tag format in patch 20

Recommendation:

  1. For patches 01-02: Provide lore links or explain why FROMLIST tag is present
  2. For patches 19-20: Either add QCLINUX prefix (if vendor-only) or post upstream and add lore links
  3. Verify patch 22 is actually merged upstream (lore link date suggests it should be)

Final Summary

  1. Lore link present: 19 of 21 FROMLIST commits (90%) — 2 missing
  2. Lore link matches PR commits: Cannot verify without network access — Needs manual verification ⚠️
  3. Upstream patch status: Mixed. 19 FROMLIST commits have lore links (likely in review), 1 commit appears merged (patch 22), 2 commits are missing links (patches 01-02), and 2 commits are unclear (patches 19-20)
  4. PR present in qcom-next: Not checked — Requires repository access ⚠️

@qlijarvis


PR #1419 — checker-log-analyzer

PR: #1419
Checker run: https://github.com/qualcomm-linux/kernel-config/actions/runs/28352483644

| Checker | Result | Summary |
|---|---|---|
| checkpatch | ❌ | 1 WARNING: line length exceeds 75 chars in commit message |
| dt-binding-check | ⏭️ | No DT binding changes |
| dtb-check | ⏭️ | No devicetree changes |
| sparse-check | | Passed |
| check-uapi-headers | | No UAPI header changes |
| check-patch-compliance | | Passed |
| tag-check | N/A | FROMLIST: prefix present |
| qcom-next-check | | FROMLIST: commits only |

Detailed report: Full report

Checker analysis

🤖 CI Checker Analysis (checker-log-analyzer)

PR: #1419 - ath12k/ath11k WiFi driver fixes (24 commits)
Source: https://github.com/qualcomm-linux/kernel-config/actions/runs/28352483644


❌ checkpatch

Root cause: Commit 86c7139 ("FROMLIST: wifi: ath12k: fix EAPOL TX failure caused by stale tcl_metadata bits") contains a "Tested-on:" line that exceeds the recommended 75-character limit for commit message body lines.

Failure details:

WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)
#31: 
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c7-00108-QCAHMTSWPL_V1.0_V2.0_SILICONZ_UPSTREAM-3

86c71394c887455d77ca46cd634ab8274cd63978 total: 0 errors, 1 warnings, 0 checks, 16 lines checked

The "Tested-on:" line as quoted above is 91 characters long, 16 over the 75-character limit.

Fix: Wrap the "Tested-on:" line to keep it under 75 characters. The recommended format is:

Tested-on: WCN7850 hw2.0 PCI
           WLAN.HMT.1.1.c7-00108-QCAHMTSWPL_V1.0_V2.0_SILICONZ_UPSTREAM-3

Or alternatively, use a shorter firmware version identifier if acceptable by the subsystem maintainers.
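
To confirm that a reworded message stays within the limit before pushing, the line lengths can be checked directly. This is a simplified sketch of the length rule only, not checkpatch's own logic; `overlong_lines` is a hypothetical helper and the two strings are the original and wrapped forms quoted above.

```python
LIMIT = 75  # recommended maximum for commit message body lines


def overlong_lines(message: str, limit: int = LIMIT) -> list[tuple[int, int]]:
    """Return (line number, length) for every line longer than the limit."""
    return [
        (number, len(line))
        for number, line in enumerate(message.splitlines(), start=1)
        if len(line) > limit
    ]


original = (
    "Tested-on: WCN7850 hw2.0 PCI "
    "WLAN.HMT.1.1.c7-00108-QCAHMTSWPL_V1.0_V2.0_SILICONZ_UPSTREAM-3"
)
wrapped = (
    "Tested-on: WCN7850 hw2.0 PCI\n"
    "           WLAN.HMT.1.1.c7-00108-QCAHMTSWPL_V1.0_V2.0_SILICONZ_UPSTREAM-3"
)

print(overlong_lines(original))  # non-empty: the single line is over the limit
print(overlong_lines(wrapped))   # empty: both wrapped lines fit
```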

Reproduce locally:

./scripts/checkpatch.pl --strict --summary-file --ignore FILE_PATH_CHANGES --git 4da4618628a6b6152c0f47122930a656fa311955..e9e6c896eab60dfef27cdab6a0b4c9bfc1d419e4

Verdict

One soft blocker to fix. The checkpatch WARNING is a style issue in the commit message (line length). The remaining checkers either passed (sparse, patch compliance) or found no relevant changes (UAPI headers, DT checks). Wrapping or shortening the long "Tested-on:" line before merge resolves the warning.

@qswat-orbit-external


Merge Check Failed: No Component Found

Configuration Error: No component found for branch 'tech/net/ath'.

There is no component associated with the provided branch in Polaris. Please verify the branch configuration.

Branch: tech/net/ath
