[RLC-8] Rebase Custom Changes to rlc-8/4.18.0-553.111.1.el8_10#1015
[RLC-8] Rebase Custom Changes to rlc-8/4.18.0-553.111.1.el8_10#1015ciq-kernel-automation[bot] wants to merge 50 commits intorlc-8/4.18.0-553.111.1.el8_10from
Conversation
Using the kernel crypto API, the SHA3-256 algorithm is used as
conditioning element to replace the LFSR in the Jitter RNG. All other
parts of the Jitter RNG are unchanged.
The application and use of the SHA-3 conditioning operation is identical
to the user space Jitter RNG 3.4.0 by applying the following concept:
- the Jitter RNG initializes a SHA-3 state which acts as the "entropy
pool" when the Jitter RNG is allocated.
- When a new time delta is obtained, it is inserted into the "entropy
pool" with a SHA-3 update operation. Note, this operation in most of
the cases is a simple memcpy() onto the SHA-3 stack.
- To cause a true SHA-3 operation for each time delta operation, a
second SHA-3 operation is performed hashing Jitter RNG status
information. The final message digest is also inserted into the
"entropy pool" with a SHA-3 update operation. Yet, this data is not
considered to provide any entropy, but it shall stir the entropy pool.
- To generate a random number, a SHA-3 final operation is performed to
calculate a message digest followed by an immediate SHA-3 init to
re-initialize the "entropy pool". The obtained message digest is one
block of the Jitter RNG that is returned to the caller.
Mathematically speaking, the random number generated by the Jitter RNG
is:
aux_t = SHA-3(Jitter RNG state data)
Jitter RNG block = SHA-3(time_i || aux_i || time_(i-1) || aux_(i-1) ||
... || time_(i-255) || aux_(i-255))
when assuming that the OSR = 1, i.e. the default value.
This operation implies that the Jitter RNG has an output-blocksize of
256 bits instead of the 64 bits of the LFSR-based Jitter RNG that is
replaced with this patch.
The patch also replaces the varying number of invocations of the
conditioning function with one fixed number of invocations. The use
of the conditioning function consistent with the userspace Jitter RNG
library version 3.4.0.
The code is tested with a system that exhibited the least amount of
entropy generated by the Jitter RNG: the SiFive Unmatched RISC-V
system. The measured entropy rate is well above the heuristically
implied entropy value of 1 bit of entropy per time delta. On all other
tested systems, the measured entropy rate is even higher by orders
of magnitude. The measurement was performed using updated tooling
provided with the user space Jitter RNG library test framework.
The performance of the Jitter RNG with this patch is about en par
with the performance of the Jitter RNG without the patch.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Back-port of commit bb897c5
Author: Stephan Müller <smueller@chronox.de>
Date: Fri Apr 21 08:08:04 2023 +0200
Signed-off-by: Jeremy Allison <jallison@ciq.com>
I.G 9.7.B for FIPS 140-3 specifies that variables temporarily holding
cryptographic information should be zeroized once they are no longer
needed. Accomplish this by using kfree_sensitive for buffers that
previously held the private key.
Signed-off-by: Hailey Mothershead <hailmo@amazon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Back-ported from commit 23e4099
Author: Hailey Mothershead <hailmo@amazon.com>
Date: Mon Apr 15 22:19:15 2024 +0000
Signed-off-by: Jeremy Allison <jallison@ciq.com>
Signed-off-by: Jeremy Allison <jallison@ciq.com>
Signed-off-by: Jeremy Allison <jallison@ciq.com>
The output n bits can receive more than n bits of min entropy, of course,
but the fixed output of the conditioning function can only asymptotically
approach the output size bits of min entropy, not attain that bound.
Random maps will tend to have output collisions, which reduces the
creditable output entropy (that is what SP 800-90B Section 3.1.5.1.2
attempts to bound).
The value "64" is justified in Appendix A.4 of the current 90C draft,
and aligns with NIST's in "epsilon" definition in this document, which is
that a string can be considered "full entropy" if you can bound the min
entropy in each bit of output to at least 1-epsilon, where epsilon is
required to be <= 2^(-32).
Note, this patch causes the Jitter RNG to cut its performance in half in
FIPS mode because the conditioning function of the LFSR produces 64 bits
of entropy in one block. The oversampling requires that additionally 64
bits of entropy are sampled from the noise source. If the conditioner is
changed, such as using SHA-256, the impact of the oversampling is only
one fourth, because for the 256 bit block of the conditioner, only 64
additional bits from the noise source must be sampled.
This patch is derived from the user space jitterentropy-library.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Back-port of upstream commit 908dffa.
Signed-off-by: Jeremy Allison <jallison@ciq.com>
private_key is overwritten with the key parameter passed in by the caller (if present), or alternatively a newly generated private key. However, it is possible that the caller provides a key (or the newly generated key) which is shorter than the previous key. In that scenario, some key material from the previous key would not be overwritten. The easiest solution is to explicitly zeroize the entire private_key array first. Note that this patch slightly changes the behavior of this function: previously, if the ecc_gen_privkey failed, the old private_key would remain. Now, the private_key is always zeroized. This behavior is consistent with the case where params.key is set and ecc_is_key_valid fails. Signed-off-by: Joachim Vandersmissen <git@jvdsn.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Back-port of upstream commit: 73e5984 Signed-off-by: Jeremy Allison <jallison@ciq.com>
key might contain private part of the key, so better use
kfree_sensitive to free it
Signed-off-by: Mahmoud Adam <mngyadam@amazon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Back-port of upstream commit: 9f3fa6b
Signed-off-by: Jeremy Allison <jallison@ciq.com>
…ey() to zeroize keys on exit. Signed-off-by: Jeremy Allison <jallison@ciq.com>
Add workflows for pushes and pull requests. Signed-off-by: Greg Rose <g.v.rose@ciq.com>
LE-2786 Sync kernel-x86_64.config with el86-fips-compliant-8 branch from internal dist-git. Same as shipped src.rpm. Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
We run build checks on pull requests now instead of push Signed-off-by: Jonathan Maple <jmaple@ciq.com>
LE-3770 This github action checks the PR commits for references to upstream linux commits (lines starting with "commit <hash>") and does two things: 1. Checks that this hash exists in the upstream linux kernel history 2. Checks if there are any Fixes: references for the referenced commit in the upstream linux kernel history If either of those are found to be true a comment is added to the PR with the pertinent information. The logic for the check is provided by the check_upstream_commits.py script from kernel-src-tree-tools Signed-off-by: Jonathan Maple <jmaple@ciq.com>
LE-3799 After the build check, perform a kabi check Signed-off-by: Jonathan Maple <jmaple@ciq.com>
The upstream commit check workflow was failing for pull requests originating from forked repositories. The previous implementation incorrectly assumed the pull request branch existed on the base repository. This commit corrects the workflow to ensure the pull request branch is checked out from the correct source repository, while the base branch is fetched from the target repository. Signed-off-by: Jonathan Maple <jmaple@ciq.com>
The process-pull-request workflow was failing for pull requests originating from forked repositories. The previous implementation incorrectly assumed the pull request branch existed on the base repository. This commit corrects the workflow to ensure the pull request branch is checked out from the correct source repository, while the base branch is fetched from the target repository. Signed-off-by: Jonathan Maple <jmaple@ciq.com>
There will be a new PR checker inbound soon this one is just broken so removing it. Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
Simplifies the workflow to use the reusable workflow defined in main branch. This reduces duplication and makes the workflow easier to maintain across multiple branches. The workflow was renamed because it now includes validation over and above just checking for upstream fixes Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
… packets jira LE-1733 bugfix geneve_fixes commit 791b408 Move the vxlan_features_check() call to after we verified the packet is a tunneled VXLAN packet. Without this, tunneled UDP non-VXLAN packets (for ex. GENENVE) might wrongly not get offloaded. In some cases, it worked by chance as GENEVE header is the same size as VXLAN, but it is obviously incorrect. Fixes: e3cfc7e ("net/mlx5e: TX, Add geneve tunnel stateless offload support") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit 791b408) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira VULN-12931 cve CVE-2024-56642 commit-author Kuniyuki Iwashima <kuniyu@amazon.com> commit 6a2fa13 syzkaller reported a use-after-free of UDP kernel socket in cleanup_bearer() without repro. [0][1] When bearer_disable() calls tipc_udp_disable(), cleanup of the UDP kernel socket is deferred by work calling cleanup_bearer(). tipc_net_stop() waits for such works to finish by checking tipc_net(net)->wq_count. However, the work decrements the count too early before releasing the kernel socket, unblocking cleanup_net() and resulting in use-after-free. Let's move the decrement after releasing the socket in cleanup_bearer(). [0]: ref_tracker: net notrefcnt@000000009b3d1faf has 1/1 users at sk_alloc+0x438/0x608 inet_create+0x4c8/0xcb0 __sock_create+0x350/0x6b8 sock_create_kern+0x58/0x78 udp_sock_create4+0x68/0x398 udp_sock_create+0x88/0xc8 tipc_udp_enable+0x5e8/0x848 __tipc_nl_bearer_enable+0x84c/0xed8 tipc_nl_bearer_enable+0x38/0x60 genl_family_rcv_msg_doit+0x170/0x248 genl_rcv_msg+0x400/0x5b0 netlink_rcv_skb+0x1dc/0x398 genl_rcv+0x44/0x68 netlink_unicast+0x678/0x8b0 netlink_sendmsg+0x5e4/0x898 ____sys_sendmsg+0x500/0x830 [1]: BUG: KMSAN: use-after-free in udp_hashslot include/net/udp.h:85 [inline] BUG: KMSAN: use-after-free in udp_lib_unhash+0x3b8/0x930 net/ipv4/udp.c:1979 udp_hashslot include/net/udp.h:85 [inline] udp_lib_unhash+0x3b8/0x930 net/ipv4/udp.c:1979 sk_common_release+0xaf/0x3f0 net/core/sock.c:3820 inet_release+0x1e0/0x260 net/ipv4/af_inet.c:437 inet6_release+0x6f/0xd0 net/ipv6/af_inet6.c:489 __sock_release net/socket.c:658 [inline] sock_release+0xa0/0x210 net/socket.c:686 cleanup_bearer+0x42d/0x4c0 net/tipc/udp_media.c:819 process_one_work kernel/workqueue.c:3229 [inline] process_scheduled_works+0xcaf/0x1c90 kernel/workqueue.c:3310 worker_thread+0xf6c/0x1510 kernel/workqueue.c:3391 kthread+0x531/0x6b0 kernel/kthread.c:389 ret_from_fork+0x60/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:244 Uninit was created at: slab_free_hook mm/slub.c:2269 [inline] slab_free mm/slub.c:4580 [inline] kmem_cache_free+0x207/0xc40 mm/slub.c:4682 net_free net/core/net_namespace.c:454 [inline] cleanup_net+0x16f2/0x19d0 net/core/net_namespace.c:647 process_one_work kernel/workqueue.c:3229 [inline] process_scheduled_works+0xcaf/0x1c90 kernel/workqueue.c:3310 worker_thread+0xf6c/0x1510 kernel/workqueue.c:3391 kthread+0x531/0x6b0 kernel/kthread.c:389 ret_from_fork+0x60/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:244 CPU: 0 UID: 0 PID: 54 Comm: kworker/0:2 Not tainted 6.12.0-rc1-00131-gf66ebf37d69c #7 91723d6f74857f70725e1583cba3cf4adc716cfa Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 Workqueue: events cleanup_bearer Fixes: 26abe14 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241127050512.28438-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> (cherry picked from commit 6a2fa13) Signed-off-by: David Gomez <dgomez@ciq.com>
jira VULN-56026 cve CVE-2025-21927 commit-author Maurizio Lombardi <mlombard@redhat.com> commit ad95bab upstream-diff Removed `nvme_tcp_c2h_term' case from `nvme_tcp_recv_pdu_supported' for the sake of consistency of `nvme_tcp_recv_pdu''s behavior relative to the upstream version, between the cases of proper and improper header. (What could be considered as "`c2h_term' type support" started with 84e0090 commit, not included in `ciqlts9_2''s history, so `nvme_tcp_recv_pdu_supported' in `ciqlts9_2' shouldn't report the `nvme_tcp_c2h_term' type as supported.) nvme_tcp_recv_pdu() doesn't check the validity of the header length. When header digests are enabled, a target might send a packet with an invalid header length (e.g. 255), causing nvme_tcp_verify_hdgst() to access memory outside the allocated area and cause memory corruptions by overwriting it with the calculated digest. Fix this by rejecting packets with an unexpected header length. Fixes: 3f2304f ("nvme-tcp: add NVMe over TCP host driver") Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org> (cherry picked from commit ad95bab) Signed-off-by: Brett Mastbergen <bmastbergen@ciq.com>
jira VULN-65790 cve CVE-2022-49803 commit-author Wang Yufen <wangyufen@huawei.com> commit 064bc73 kmemleak reports this issue: unreferenced object 0xffff8881bac872d0 (size 8): comm "sh", pid 58603, jiffies 4481524462 (age 68.065s) hex dump (first 8 bytes): 04 00 00 00 de ad be ef ........ backtrace: [<00000000c80b8577>] __kmalloc+0x49/0x150 [<000000005292b8c6>] nsim_dev_trap_fa_cookie_write+0xc1/0x210 [netdevsim] [<0000000093d78e77>] full_proxy_write+0xf3/0x180 [<000000005a662c16>] vfs_write+0x1c5/0xaf0 [<000000007aabf84a>] ksys_write+0xed/0x1c0 [<000000005f1d2e47>] do_syscall_64+0x3b/0x90 [<000000006001c6ec>] entry_SYSCALL_64_after_hwframe+0x63/0xcd The issue occurs in the following scenarios: nsim_dev_trap_fa_cookie_write() kmalloc() fa_cookie nsim_dev->fa_cookie = fa_cookie .. nsim_drv_remove() The fa_cookie allocked in nsim_dev_trap_fa_cookie_write() is not freed. To fix, add kfree(nsim_dev->fa_cookie) to nsim_drv_remove(). Fixes: d3cbb90 ("netdevsim: add ACL trap reporting cookie as a metadata") Signed-off-by: Wang Yufen <wangyufen@huawei.com> Cc: Jiri Pirko <jiri@mellanox.com> Link: https://lore.kernel.org/r/1668504625-14698-1-git-send-email-wangyufen@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> (cherry picked from commit 064bc73) Signed-off-by: Brett Mastbergen <bmastbergen@ciq.com>
jira VULN-45766 jira VULN-45767 cve cve-2024-49978 commit-author Willem de Bruijn <willemb@google.com> commit a1e40ac upstream-diff contextual diff is off due to massive reworks. In addition __udpv6_gso_segment_list_csum definition is not included. This was included via "net/gro.h" via 75082e7 which is a bug fix to 4721031 "net: move gro definitions to include/net/gro.h". Since we also do not have that we're just directly including net/ip6_checksum.h to this file. Detect gso fraglist skbs with corrupted geometry (see below) and pass these to skb_segment instead of skb_segment_list, as the first can segment them correctly. Valid SKB_GSO_FRAGLIST skbs - consist of two or more segments - the head_skb holds the protocol headers plus first gso_size - one or more frag_list skbs hold exactly one segment - all but the last must be gso_size Optional datapath hooks such as NAT and BPF (bpf_skb_pull_data) can modify these skbs, breaking these invariants. In extreme cases they pull all data into skb linear. For UDP, this causes a NULL ptr deref in __udpv4_gso_segment_list_csum at udp_hdr(seg->next)->dest. Detect invalid geometry due to pull, by checking head_skb size. Don't just drop, as this may blackhole a destination. Convert to be able to pass to regular skb_segment. Link: https://lore.kernel.org/netdev/20240428142913.18666-1-shiming.cheng@mediatek.com/ Fixes: 9fd1ff5 ("udp: Support UDP fraglist GRO/GSO.") Signed-off-by: Willem de Bruijn <willemb@google.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20241001171752.107580-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> (cherry picked from commit a1e40ac) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira VULN-38750 jira VULN-38751 cve CVE-2024-42281 commit-author Fred Li <dracodingfly@gmail.com> commit fa5ef65 Linearize the skb when downgrading gso_size because it may trigger a BUG_ON() later when the skb is segmented as described in [1,2]. Fixes: 2be7e21 ("bpf: add bpf_skb_adjust_room helper") Signed-off-by: Fred Li <dracodingfly@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/all/20240626065555.35460-2-dracodingfly@gmail.com [1] Link: https://lore.kernel.org/all/668d5cf1ec330_1c18c32947@willemb.c.googlers.com.notmuch [2] Link: https://lore.kernel.org/bpf/20240719024653.77006-1-dracodingfly@gmail.com (cherry picked from commit fa5ef65) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira VULN-156444 jira VULN-156445 cve CVE-2025-38124 commit-author Shiming Cheng <shiming.cheng@mediatek.com> commit 3382a1e Commit a1e40ac ("net: gso: fix udp gso fraglist segmentation after pull from frag_list") detected invalid geometry in frag_list skbs and redirects them from skb_segment_list to more robust skb_segment. But some packets with modified geometry can also hit bugs in that code. We don't know how many such cases exist. Addressing each one by one also requires touching the complex skb_segment code, which risks introducing bugs for other types of skbs. Instead, linearize all these packets that fail the basic invariants on gso fraglist skbs. That is more robust. If only part of the fraglist payload is pulled into head_skb, it will always cause exception when splitting skbs by skb_segment. For detailed call stack information, see below. Valid SKB_GSO_FRAGLIST skbs - consist of two or more segments - the head_skb holds the protocol headers plus first gso_size - one or more frag_list skbs hold exactly one segment - all but the last must be gso_size Optional datapath hooks such as NAT and BPF (bpf_skb_pull_data) can modify fraglist skbs, breaking these invariants. In extreme cases they pull one part of data into skb linear. For UDP, this causes three payloads with lengths of (11,11,10) bytes were pulled tail to become (12,10,10) bytes. The skbs no longer meets the above SKB_GSO_FRAGLIST conditions because payload was pulled into head_skb, it needs to be linearized before pass to regular skb_segment. skb_segment+0xcd0/0xd14 __udp_gso_segment+0x334/0x5f4 udp4_ufo_fragment+0x118/0x15c inet_gso_segment+0x164/0x338 skb_mac_gso_segment+0xc4/0x13c __skb_gso_segment+0xc4/0x124 validate_xmit_skb+0x9c/0x2c0 validate_xmit_skb_list+0x4c/0x80 sch_direct_xmit+0x70/0x404 __dev_queue_xmit+0x64c/0xe5c neigh_resolve_output+0x178/0x1c4 ip_finish_output2+0x37c/0x47c __ip_finish_output+0x194/0x240 ip_finish_output+0x20/0xf4 ip_output+0x100/0x1a0 NF_HOOK+0xc4/0x16c ip_forward+0x314/0x32c ip_rcv+0x90/0x118 __netif_receive_skb+0x74/0x124 process_backlog+0xe8/0x1a4 __napi_poll+0x5c/0x1f8 net_rx_action+0x154/0x314 handle_softirqs+0x154/0x4b8 [118.376811] [C201134] rxq0_pus: [name:bug&]kernel BUG at net/core/skbuff.c:4278! [118.376829] [C201134] rxq0_pus: [name:traps&]Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP [118.470774] [C201134] rxq0_pus: [name:mrdump&]Kernel Offset: 0x178cc00000 from 0xffffffc008000000 [118.470810] [C201134] rxq0_pus: [name:mrdump&]PHYS_OFFSET: 0x40000000 [118.470827] [C201134] rxq0_pus: [name:mrdump&]pstate: 60400005 (nZCv daif +PAN -UAO) [118.470848] [C201134] rxq0_pus: [name:mrdump&]pc : [0xffffffd79598aefc] skb_segment+0xcd0/0xd14 [118.470900] [C201134] rxq0_pus: [name:mrdump&]lr : [0xffffffd79598a5e8] skb_segment+0x3bc/0xd14 [118.470928] [C201134] rxq0_pus: [name:mrdump&]sp : ffffffc008013770 Fixes: a1e40ac ("gso: fix udp gso fraglist segmentation after pull from frag_list") Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit 3382a1e) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
…tead of a two-phase approach jira roc-2673 commit fbf6449 Instead of setting x86_virt_bits to a possibly-correct value and then correcting it later, do all the necessary checks before setting it. At this point, the #VC handler references boot_cpu_data.x86_virt_bits, and in the previous version, it would be triggered by the CPUIDs between the point at which it is set to 48 and when it is set to the correct value. Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Adam Dunlap <acdunlap@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Jacob Xu <jacobhxu@google.com> Link: https://lore.kernel.org/r/20230912002703.3924521-3-acdunlap@google.com Signed-off-by: Ronnie Sahlberg <rsahlberg@ciq.com>
jira roc-2673 commit 3e32552 c->x86_cache_alignment is initialized from c->x86_clflush_size. However, commit fbf6449 moved c->x86_clflush_size initialization to later in boot without moving the c->x86_cache_alignment assignment: fbf6449 ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach") This presumably left c->x86_cache_alignment set to zero for longer than it should be. The result was an oops on 32-bit kernels while accessing a pointer at 0x20. The 0x20 came from accessing a structure member at offset 0x10 (buffer->cpumask) from a ZERO_SIZE_PTR=0x10. kmalloc() can evidently return ZERO_SIZE_PTR when it's given 0 as its alignment requirement. Move the c->x86_cache_alignment initialization to be after c->x86_clflush_size has an actual value. Fixes: fbf6449 ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach") Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20231002220045.1014760-1-dave.hansen@linux.intel.com (cherry picked from commit 3e32552) Signed-off-by: Ronnie Sahlberg <rsahlberg@ciq.com>
jira LE-2183 bug-fix x86/sev-es: Set x86_virt_bits commit-author Paolo Bonzini <pbonzini@redhat.com> commit 9a45819 In commit fbf6449 ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach"), the initialization of c->x86_phys_bits was moved after this_cpu->c_early_init(c). This is incorrect because early_init_amd() expected to be able to reduce the value according to the contents of CPUID leaf 0x8000001f. Fortunately, the bug was negated by init_amd()'s call to early_init_amd(), which does reduce x86_phys_bits in the end. However, this is very late in the boot process and, most notably, the wrong value is used for x86_phys_bits when setting up MTRRs. To fix this, call get_cpu_address_sizes() as soon as X86_FEATURE_CPUID is set/cleared, and c->extended_cpuid_level is retrieved. Fixes: fbf6449 ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20240131230902.1867092-2-pbonzini%40redhat.com (cherry picked from commit 9a45819) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
…sizes() jira LE-2183 bug-fix-prereq x86/sev-es: Set x86_virt_bits commit-author Borislav Petkov (AMD) <bp@alien8.de> commit 95bfb35 Drop 'vp_bits_from_cpuid' as it is not really needed. No functional changes. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240316120706.4352-1-bp@alien8.de (cherry picked from commit 95bfb35) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
… size jira LE-3812 commit-author Long Li <longli@microsoft.com> commit 9e517a8 MANA hardware uses 4k page size. When calculating the page table index, it should use the hardware page size, not the system page size. Cc: stable@vger.kernel.org Fixes: 0266a17 ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter") Signed-off-by: Long Li <longli@microsoft.com> Link: https://patch.msgid.link/1725030993-16213-1-git-send-email-longli@linuxonhyperv.com Signed-off-by: Leon Romanovsky <leon@kernel.org> (cherry picked from commit 9e517a8) Signed-off-by: Shreeya Patel <spatel@ciq.com>
jira VULN-160088 cve CVE-2024-56661 commit-author Eric Dumazet <edumazet@google.com> commit b04d86f syzbot found [1] that after blamed commit, ub->ubsock->sk was NULL when attempting the atomic_dec() : atomic_dec(&tipc_net(sock_net(ub->ubsock->sk))->wq_count); Fix this by caching the tipc_net pointer. [1] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037] CPU: 0 UID: 0 PID: 5896 Comm: kworker/0:3 Not tainted 6.13.0-rc1-next-20241203-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Workqueue: events cleanup_bearer RIP: 0010:read_pnet include/net/net_namespace.h:387 [inline] RIP: 0010:sock_net include/net/sock.h:655 [inline] RIP: 0010:cleanup_bearer+0x1f7/0x280 net/tipc/udp_media.c:820 Code: 18 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 3c f7 99 f6 48 8b 1b 48 83 c3 30 e8 f0 e4 60 00 48 89 d8 48 c1 e8 03 <42> 80 3c 28 00 74 08 48 89 df e8 1a f7 99 f6 49 83 c7 e8 48 8b 1b RSP: 0018:ffffc9000410fb70 EFLAGS: 00010206 RAX: 0000000000000006 RBX: 0000000000000030 RCX: ffff88802fe45a00 RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffc9000410f900 RBP: ffff88807e1f0908 R08: ffffc9000410f907 R09: 1ffff92000821f20 R10: dffffc0000000000 R11: fffff52000821f21 R12: ffff888031d19980 R13: dffffc0000000000 R14: dffffc0000000000 R15: ffff88807e1f0918 FS: 0000000000000000(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000556ca050b000 CR3: 0000000031c0c000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: 6a2fa13 ("tipc: Fix use-after-free of kernel socket in cleanup_bearer().") Reported-by: syzbot+46aa5474f179dacd1a3b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/67508b5f.050a0220.17bd51.0070.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241204170548.4152658-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> (cherry picked from commit b04d86f) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
…/O issuing CPU jira LE-4535 commit-author Long Li <longli@microsoft.com> commit b69ffea When selecting an outgoing channel for I/O, storvsc tries to select a channel with a returning CPU that is not the same as issuing CPU. This worked well in the past, however it doesn't work well when the Hyper-V exposes a large number of channels (up to the number of all CPUs). Use a different CPU for returning channel is not efficient on Hyper-V. Change this behavior by preferring to the channel with the same CPU as the current I/O issuing CPU whenever possible. Tests have shown improvements in newer Hyper-V/Azure environment, and no regression with older Hyper-V/Azure environments. Tested-by: Raheel Abdul Faizy <rabdulfaizy@microsoft.com> Signed-off-by: Long Li <longli@microsoft.com> Message-Id: <1759381530-7414-1-git-send-email-longli@linux.microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> (cherry picked from commit b69ffea) Signed-off-by: Shreeya Patel <spatel@ciq.com>
jira SECO-458 commit-author Alex Williamson <alex.williamson@redhat.com> commit 4453f36 Toggling memory enable is free on bare metal, but potentially expensive in virtualized environments as the device MMIO spaces are added and removed from the VM address space, including DMA mapping of those spaces through the IOMMU where peer-to-peer is supported. Currently memory decode is disabled around sizing each individual BAR, even for SR-IOV BARs while VF Enable is cleared. This can be better optimized for virtual environments by sizing a set of BARs at once, stashing the resulting mask into an array, while only toggling memory enable once. This also naturally improves the SR-IOV path as the caller becomes responsible for any necessary decode disables while sizing BARs, therefore SR-IOV BARs are sized relying only on the VF Enable rather than toggling the PF memory enable in the command register. Link: https://lore.kernel.org/r/20250120182202.1878581-1-alex.williamson@redhat.com Reported-by: Mitchell Augustin <mitchell.augustin@canonical.com> Link: https://lore.kernel.org/r/CAHTA-uYp07FgM6T1OZQKqAdSA5JrZo0ReNEyZgQZub4mDRrV5w@mail.gmail.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Mitchell Augustin <mitchell.augustin@canonical.com> Reviewed-by: Mitchell Augustin <mitchell.augustin@canonical.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> (cherry picked from commit 4453f36) Signed-off-by: Brett Mastbergen <bmastbergen@ciq.com>
jira KERNEL-168 commit-author Alexander Lobakin <aleksander.lobakin@intel.com> commit 080d72f Software-side Tx buffers for storing DMA, frame size, skb pointers etc. are pretty much generic and every driver defines them the same way. The same can be said for software Tx completions -- same napi_consume_skb()s and all that... Add a couple simple wrappers for doing that to stop repeating the old tale at least within the Intel code. Drivers are free to use 'priv' member at the end of the structure. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> (cherry picked from commit 080d72f) Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
jira KERNEL-168 commit-author Alexander Lobakin <aleksander.lobakin@intel.com> commit d9028db upstream-diff | adjusted context due to missing #include <net/libeth/rx.h> introduced in commit 1b1b262 ("idpf: reuse libeth's definitions of parsed ptype structures") part of "convert RX to libeth" patchset. &idpf_tx_buffer is almost identical to the previous generations, as well as the way it's handled. Moreover, relying on dma_unmap_addr() and !!buf->skb instead of explicit defining of buffer's type was never good. Use the newly added libeth helpers to do it properly and reduce the copy-paste around the Tx code. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> (cherry picked from commit d9028db) Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com> Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
jira KERNEL-168 commit-author Alexander Lobakin <aleksander.lobakin@intel.com> commit 3dc95a3 Add a shorthand similar to other net*_subqueue() helpers for resetting the queue by its index w/o obtaining &netdev_tx_queue beforehand manually. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> (cherry picked from commit 3dc95a3) Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
jira KERNEL-168 commit-author Joshua Hay <joshua.a.hay@intel.com> commit 24eb35b upstream-diff | adjusted context in .h file around struct idpf_compl_queue due to missing commit 5a816aa ("idpf: strictly assert cachelines of queue and queue vector structures") Add a mechanism to guard against stashing partial packets into the hash table to make the driver more robust, with more efficient decision making when cleaning. Don't stash partial packets. This can happen when an RE (Report Event) completion is received in flow scheduling mode, or when an out of order RS (Report Status) completion is received. The first buffer with the skb is stashed, but some or all of its frags are not because the stack is out of reserve buffers. This leaves the ring in a weird state since the frags are still on the ring. Use the field libeth_sqe::nr_frags to track the number of fragments/tx_bufs representing the packet. The clean routines check to make sure there are enough reserve buffers on the stack before stashing any part of the packet. If there are not, next_to_clean is left pointing to the first buffer of the packet that failed to be stashed. This leaves the whole packet on the ring, and the next time around, cleaning will start from this packet. An RS completion is still expected for this packet in either case. So instead of being cleaned from the hash table, it will be cleaned from the ring directly. This should all still be fine since the DESC_UNUSED and BUFS_UNUSED will reflect the state of the ring. If we ever fall below the thresholds, the TxQ will still be stopped, giving the completion queue time to catch up. This may lead to stopping the queue more frequently, but it guarantees the Tx ring will always be in a good state. Also, always use the idpf_tx_splitq_clean function to clean descriptors, i.e. use it from clean_buf_ring as well. This way we avoid duplicating the logic and make sure we're using the same reserve buffers guard rail. This does require a switch from the s16 next_to_clean overflow descriptor ring wrap calculation to u16 and the normal ring size check. Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> (cherry picked from commit 24eb35b) Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com> Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
jira KERNEL-168 commit-author Joshua Hay <joshua.a.hay@intel.com> commit 4c69c77 Commit d9028db ("idpf: convert to libeth Tx buffer completion") inadvertently removed code that was necessary for the tx buffer cleaning routine to iterate over all buffers associated with a packet. When a frag is too large for a single data descriptor, it will be split across multiple data descriptors. This means the frag will span multiple buffers in the buffer ring in order to keep the descriptor and buffer ring indexes aligned. The buffer entries in the ring are technically empty and no cleaning actions need to be performed. These empty buffers can precede other frags associated with the same packet. I.e. a single packet on the buffer ring can look like: buf[0]=skb0.frag0 buf[1]=skb0.frag1 buf[2]=empty buf[3]=skb0.frag2 The cleaning routine iterates through these buffers based on a matching completion tag. If the completion tag is not set for buf2, the loop will end prematurely. Frag2 will be left uncleaned and next_to_clean will be left pointing to the end of packet, which will break the cleaning logic for subsequent cleans. This consequently leads to tx timeouts. Assign the empty bufs the same completion tag for the packet to ensure the cleaning routine iterates over all of the buffers associated with the packet. Fixes: d9028db ("idpf: convert to libeth Tx buffer completion") Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Acked-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Madhu chittim <madhu.chittim@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> (cherry picked from commit 4c69c77) Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
jira KERNEL-168 commit-author Joshua Hay <joshua.a.hay@intel.com> commit cb83b55 upstream-diff | 1. adjusted context around idpf_rx_splitq_clean function due to missing - 74d1412 ("idpf: use libeth Rx buffer management for payload buffer") - 6ad5ff6 ("libeth: convert to netmem") 2. adjusted context around struct idpf_tx_queue members and docstring and did not include the libeth_cacheline_set_assert changes due to missing: - 5a816aa ("idpf: strictly assert cachelines of queue and queue vector structures") 3. fix compilation issue (std=89) in idpf_tx_desc_alloc due to for loop var declaration In certain production environments, it is possible for completion tags to collide, meaning N packets with the same completion tag are in flight at the same time. In this environment, any given Tx queue is effectively used to send both slower traffic and higher throughput traffic simultaneously. This is the result of a customer's specific configuration in the device pipeline, the details of which Intel cannot provide. This configuration results in a small number of out-of-order completions, i.e., a small number of packets in flight. The existing guardrails in the driver only protect against a large number of packets in flight. The slower flow completions are delayed which causes the out-of-order completions. The fast flow will continue sending traffic and generating tags. Because tags are generated on the fly, the fast flow eventually uses the same tag for a packet that is still in flight from the slower flow. The driver has no idea which packet it should clean when it processes the completion with that tag, but it will look for the packet on the buffer ring before the hash table. If the slower flow packet completion is processed first, it will end up cleaning the fast flow packet on the ring prematurely. This leaves the descriptor ring in a bad state resulting in a crash or Tx timeout. In summary, generating a tag when a packet is sent can lead to the same tag being associated with multiple packets. This can lead to resource leaks, crashes, and/or Tx timeouts. Before we can replace the tag generation, we need a new mechanism for the send path to know what tag to use next. The driver will allocate and initialize a refillq for each TxQ with all of the possible free tag values. During send, the driver grabs the next free tag from the refillq from next_to_clean. While cleaning the packet, the clean routine posts the tag back to the refillq's next_to_use to indicate that it is now free to use. This mechanism works exactly the same way as the existing Rx refill queues, which post the cleaned buffer IDs back to the buffer queue to be reposted to HW. Since we're using the refillqs for both Rx and Tx now, genericize some of the existing refillq support. Note: the refillqs will not be used yet. This is only demonstrating how they will be used to pass free tags back to the send path. Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> (cherry picked from commit cb83b55) Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
jira KERNEL-168 commit-author Joshua Hay <joshua.a.hay@intel.com> commit f2d18e1 upstream-diff | adjusted context in struct idpf_tx_queue because the order of the fields is different due to missing - 5a816aa ("idpf: strictly assert cachelines of queue and queue vector structures") Track the gap between next_to_use and the last RE index. Set RE again if the gap is large enough to ensure RE bit is set frequently. This is critical before removing the stashing mechanisms because the opportunistic descriptor ring cleaning from the out-of-order completions will go away. Previously the descriptors would be "cleaned" by both the descriptor (RE) completion and the out-of-order completions. Without the latter, we must ensure the RE bit is set more frequently. Otherwise, it's theoretically possible for the descriptor ring next_to_clean to never advance. The previous implementation was dependent on the start of a packet falling on a 64th index in the descriptor ring, which is not guaranteed with large packets. Signed-off-by: Luigi Rizzo <lrizzo@google.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> (cherry picked from commit f2d18e1) Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
jira KERNEL-168 commit-author Joshua Hay <joshua.a.hay@intel.com> commit b61dfa9 upstream-diff | adjusted context in 2 places: - when removing func idpf_tx_dma_map_error due to different memset call that uses the hardcoded struct type; - in func idpf_tx_splitq_frame due to missing expected union idpf_flex_tx_ctx_desc *ctx_desc; both differences were introduced in commit 1a49cf8 ("idpf: add Tx timestamp flows"). Move (and rename) the existing rollback logic to singleq.c since that will be the only consumer. Create a simplified splitq specific rollback function to loop through and unmap tx_bufs based on the completion tag. This is critical before replacing the Tx buffer ring with the buffer pool since the previous rollback indexing will not work to unmap the chained buffers from the pool. Cache the next_to_use index before any portion of the packet is put on the descriptor ring. In case of an error, the rollback will bump tail to the correct next_to_use value. Because the splitq path now supports different types of context descriptors (and potentially multiple in the future), this will take care of rolling back any and all context descriptors encoded on the ring for the erroneous packet. The previous rollback logic was broken for PTP packets since it would not account for the PTP context descriptor. Fixes: 1a49cf8 ("idpf: add Tx timestamp flows") Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> (cherry picked from commit b61dfa9) Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
jira KERNEL-168 commit-author Joshua Hay <joshua.a.hay@intel.com> commit 5f417d5 upstream-diff | adjusted context in: - ifpf_tx_splitq_frame and idpf_tx_clean_bufs; - struct idpf_tx_queue due to missing of some elements in the struct; both due to missing commit - 1a49cf8 ("idpf: add Tx timestamp flows"). and did not include the cacheline assert changes due to missing - 5a816aa ("idpf: strictly assert cachelines of queue and queue vector structures") Replace the TxQ buffer ring with one large pool/array of buffers (only for flow scheduling). This eliminates the tag generation and makes it impossible for a tag to be associated with more than one packet. The completion tag passed to HW through the descriptor is the index into the array. That same completion tag is posted back to the driver in the completion descriptor, and used to index into the array to quickly retrieve the buffer during cleaning. In this way, the tags are treated as a fix sized resource. If all tags are in use, no more packets can be sent on that particular queue (until some are freed up). The tag pool size is 64K since the completion tag width is 16 bits. For each packet, the driver pulls a free tag from the refillq to get the next free buffer index. When cleaning is complete, the tag is posted back to the refillq. A multi-frag packet spans multiple buffers in the driver, therefore it uses multiple buffer indexes/tags from the pool. Each frag pulls from the refillq to get the next free buffer index. These are tracked in a next_buf field that replaces the completion tag field in the buffer struct. This chains the buffers together so that the packet can be cleaned from the starting completion tag taken from the completion descriptor, then from the next_buf field for each subsequent buffer. In case of a dma_mapping_error occurs or the refillq runs out of free buf_ids, the packet will execute the rollback error path. This unmaps any buffers previously mapped for the packet. Since several free buf_ids could have already been pulled from the refillq, we need to restore its original state as well. Otherwise, the buf_ids/tags will be leaked and not used again until the queue is reallocated. Descriptor completions only advance the descriptor ring index to "clean" the descriptors. The packet completions only clean the buffers associated with the given packet completion tag and do not update the descriptor ring index. When operating in queue based scheduling mode, the array still acts as a ring and will only have TxQ descriptor count entries. The tx_bufs are still associated 1:1 with the descriptor ring entries and we can use the conventional indexing mechanisms. Fixes: c2d548c ("idpf: add TX splitq napi poll support") Signed-off-by: Luigi Rizzo <lrizzo@google.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> (cherry picked from commit 5f417d5) Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
jira KERNEL-168 commit-author Joshua Hay <joshua.a.hay@intel.com> commit 0c3f135 upstream-diff | adjusted conflict in idpf_tx_splitq_frame func due to missing 1a49cf8 ("idpf: add Tx timestamp flows"). The Tx refillq logic will cause packets to be silently dropped if there are not enough buffer resources available to send a packet in flow scheduling mode. Instead, determine how many buffers are needed along with number of descriptors. Make sure there are enough of both resources to send the packet, and stop the queue if not. Fixes: 7292af0 ("idpf: fix a race in txq wakeup") Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> (cherry picked from commit 0c3f135) Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
jira KERNEL-168 commit-author Joshua Hay <joshua.a.hay@intel.com> commit 6c4e684 upstream-diff | - adjusted context in .h due to different order in the struct idpf_tx_queue - adjusted context due to missing idpf_tx_read_tstamp func; both are due to missing 1a49cf8 ("idpf: add Tx timestamp flows"). - did not include libeth_cacheline_set_assert for struct idpf_tx_queue due to missing 5a816aa ("idpf: strictly assert cachelines of queue and queue vector structures") With the new Tx buffer management scheme, there is no need for all of the stashing mechanisms, the hash table, the reserve buffer stack, etc. Remove all of that. Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> (cherry picked from commit 6c4e684) Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
|
🤖 Validation Checks In Progress Workflow run: https://github.com/ctrliq/kernel-src-tree/actions/runs/23555971558 |
🔍 Upstream Linux Kernel Commit Check
This is an automated message from the kernel commit checker workflow. |
🔍 Interdiff Analysis
================================================================================
* DELTA DIFFERENCES - code changes that differ between the patches *
================================================================================
--- b/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -152,18 +152,6 @@
return queue - queue->ctrl->queues;
}
-static inline bool nvme_tcp_recv_pdu_supported(enum nvme_tcp_pdu_type type)
-{
- switch (type) {
- case nvme_tcp_c2h_data:
- case nvme_tcp_r2t:
- case nvme_tcp_rsp:
- return true;
- default:
- return false;
- }
-}
-
static inline struct blk_mq_tags *nvme_tcp_tagset(struct nvme_tcp_queue *queue)
{
u32 queue_idx = nvme_tcp_queue_id(queue);
################################################################################
! REJECTED PATCH2 HUNKS - could not be compared; manual review needed !
################################################################################
--- b/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -217,6 +217,19 @@
return queue - queue->ctrl->queues;
}
+static inline bool nvme_tcp_recv_pdu_supported(enum nvme_tcp_pdu_type type)
+{
+ switch (type) {
+ case nvme_tcp_c2h_term:
+ case nvme_tcp_c2h_data:
+ case nvme_tcp_r2t:
+ case nvme_tcp_rsp:
+ return true;
+ default:
+ return false;
+ }
+}
+
/*
* Check if the queue is TLS encrypted
*/
@@ -818,6 +831,16 @@
return 0;
hdr = queue->pdu;
+ if (unlikely(hdr->hlen != sizeof(struct nvme_tcp_rsp_pdu))) {
+ if (!nvme_tcp_recv_pdu_supported(hdr->type))
+ goto unsupported_pdu;
+
+ dev_err(queue->ctrl->ctrl.device,
+ "pdu type %d has unexpected header length (%d)\n",
+ hdr->type, hdr->hlen);
+ return -EPROTO;
+ }
+
if (unlikely(hdr->type == nvme_tcp_c2h_term)) {
/*
* C2HTermReq never includes Header or Data digests.
================================================================================
* CONTEXT DIFFERENCES - surrounding code differences between the patches *
================================================================================
--- b/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -149,6 +149,6 @@
return queue - queue->ctrl->queues;
}
-static inline struct blk_mq_tags *nvme_tcp_tagset(struct nvme_tcp_queue *queue)
-{
- u32 queue_idx = nvme_tcp_queue_id(queue);
+/*
+ * Check if the queue is TLS encrypted
+ */
@@ -674,6 +818,6 @@
return 0;
hdr = queue->pdu;
- if (queue->hdr_digest) {
- ret = nvme_tcp_verify_hdgst(queue, queue->pdu, hdr->hlen);
- if (unlikely(ret))
+ if (unlikely(hdr->type == nvme_tcp_c2h_term)) {
+ /*
+ * C2HTermReq never includes Header or Data digests.
================================================================================
* DELTA DIFFERENCES - code changes that differ between the patches *
================================================================================
--- b/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -11,7 +11,6 @@
*/
#include <linux/skbuff.h>
-#include <net/ip6_checksum.h>
#include <net/udp.h>
#include <net/protocol.h>
#include <net/inet_common.h>
================================================================================
* CONTEXT DIFFERENCES - surrounding code differences between the patches *
================================================================================
--- b/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -273,6 +269,6 @@
if (skb_shinfo(gso_skb)->gso_type & SKB_GSO_FRAGLIST)
return __udp_gso_segment_list(gso_skb, features, is_ipv6);
- mss = skb_shinfo(gso_skb)->gso_size;
- if (gso_skb->len <= sizeof(*uh) + mss)
+ skb_pull(gso_skb, sizeof(*uh));
+
================================================================================
* CONTEXT DIFFERENCES - surrounding code differences between the patches *
================================================================================
--- b/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -273,6 +273,6 @@
bool copy_dtor;
__sum16 check;
__be16 newlen;
- if (skb_shinfo(gso_skb)->gso_type & SKB_GSO_FRAGLIST) {
- /* Detect modified geometry and pass those to skb_segment. */
+ mss = skb_shinfo(gso_skb)->gso_size;
+ if (gso_skb->len <= sizeof(*uh) + mss)
################################################################################
! REJECTED PATCH2 HUNKS - could not be compared; manual review needed !
################################################################################
--- b/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1607,7 +1613,6 @@
cpu_detect(c);
get_cpu_vendor(c);
get_cpu_cap(c);
- get_cpu_address_sizes(c);
setup_force_cpu_cap(X86_FEATURE_CPUID);
cpu_parse_early_param();
================================================================================
* CONTEXT DIFFERENCES - surrounding code differences between the patches *
================================================================================
--- b/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1502,6 +1597,5 @@
get_cpu_vendor(c);
get_cpu_cap(c);
- get_model_name(c); /* RHEL8: get model name for unsupported check */
get_cpu_address_sizes(c);
setup_force_cpu_cap(X86_FEATURE_CPUID);
cpu_parse_early_param();
================================================================================
* CONTEXT DIFFERENCES - surrounding code differences between the patches *
================================================================================
--- b/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1508,5 +1509,4 @@
get_cpu_cap(c);
- get_model_name(c); /* RHEL8: get model name for unsupported check */
setup_force_cpu_cap(X86_FEATURE_CPUID);
cpu_parse_early_param();
@@ -1520,5 +1599,4 @@
} else {
- identify_cpu_without_cpuid(c);
setup_clear_cpu_cap(X86_FEATURE_CPUID);
}
================================================================================
* CONTEXT DIFFERENCES - surrounding code differences between the patches *
================================================================================
--- b/drivers/net/ethernet/microsoft/Kconfig
+++ b/drivers/net/ethernet/microsoft/Kconfig
@@ -20,4 +20,4 @@
depends on PCI_MSI && X86_64
depends on PCI_HYPERV
select AUXILIARY_BUS
- help
+ select PAGE_POOL
================================================================================
* CONTEXT DIFFERENCES - surrounding code differences between the patches *
================================================================================
--- b/drivers/net/ethernet/microsoft/Kconfig
+++ b/drivers/net/ethernet/microsoft/Kconfig
@@ -21,4 +21,4 @@
depends on X86_64 || (ARM64 && !CPU_BIG_ENDIAN && ARM64_4K_PAGES)
depends on PCI_HYPERV
select AUXILIARY_BUS
- help
+ select PAGE_POOL
################################################################################
! REJECTED PATCH2 HUNKS - could not be compared; manual review needed !
################################################################################
--- b/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -217,17 +256,6 @@
struct pci_bus_region region, inverted_region;
const char *res_name = pci_resource_name(dev, res - dev->resource);
- mask = type ? PCI_ROM_ADDRESS_MASK : ~0;
-
- /* No printks while decoding is disabled! */
- if (!dev->mmio_always_on) {
- pci_read_config_word(dev, PCI_COMMAND, &orig_cmd);
- if (orig_cmd & PCI_COMMAND_DECODE_ENABLE) {
- pci_write_config_word(dev, PCI_COMMAND,
- orig_cmd & ~PCI_COMMAND_DECODE_ENABLE);
- }
- }
-
res->name = pci_name(dev);
pci_read_config_dword(dev, pos, &l);
================================================================================
* CONTEXT DIFFERENCES - surrounding code differences between the patches *
================================================================================
--- b/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -740,4 +739,5 @@
struct resource *res;
+ const char *res_name;
struct pci_dev *pdev;
pci_read_config_word(dev, pos + PCI_SRIOV_CTRL, &ctrl);
--- b/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -186,6 +180,7 @@
u64 l64, sz64, mask64;
u16 orig_cmd;
struct pci_bus_region region, inverted_region;
+ const char *res_name = pci_resource_name(dev, res - dev->resource);
mask = type ? PCI_ROM_ADDRESS_MASK : ~0;
================================================================================
* DELTA DIFFERENCES - code changes that differ between the patches *
================================================================================
--- b/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
@@ -1,8 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (C) 2023 Intel Corporation */
-#include <net/libeth/tx.h>
-
#include "idpf.h"
/**
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -1,8 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (C) 2023 Intel Corporation */
-#include <net/libeth/tx.h>
-
#include "idpf.h"
#include "idpf_virtchnl.h"
################################################################################
! REJECTED PATCH2 HUNKS - could not be compared; manual review needed !
################################################################################
--- b/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
@@ -2,6 +2,7 @@
/* Copyright (C) 2023 Intel Corporation */
#include <net/libeth/rx.h>
+#include <net/libeth/tx.h>
#include "idpf.h"
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -2,6 +2,7 @@
/* Copyright (C) 2023 Intel Corporation */
#include <net/libeth/rx.h>
+#include <net/libeth/tx.h>
#include "idpf.h"
#include "idpf_virtchnl.h"
================================================================================
* CONTEXT DIFFERENCES - surrounding code differences between the patches *
================================================================================
--- b/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
@@ -2,4 +1,8 @@
/* Copyright (C) 2023 Intel Corporation */
+#include <net/libeth/rx.h>
+
+#include <net/libeth/tx.h>
+
#include "idpf.h"
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -2,5 +1,9 @@
/* Copyright (C) 2023 Intel Corporation */
+#include <net/libeth/rx.h>
+
+#include <net/libeth/tx.h>
+
#include "idpf.h"
#include "idpf_virtchnl.h"
################################################################################
! REJECTED PATCH2 HUNKS - could not be compared; manual review needed !
################################################################################
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -785,7 +785,7 @@
u32 next_to_use;
u32 next_to_clean;
- u32 num_completions;
+ aligned_u64 num_completions;
__cacheline_group_end_aligned(read_write);
__cacheline_group_begin_aligned(cold);
================================================================================
* CONTEXT DIFFERENCES - surrounding code differences between the patches *
================================================================================
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -971,4 +920,4 @@
u32 num_completions_pending;
};
-/**
+static inline int idpf_q_vector_to_mem(const struct idpf_q_vector *q_vector)
================================================================================
* DELTA DIFFERENCES - code changes that differ between the patches *
================================================================================
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -247,7 +247,6 @@
struct device *dev = tx_q->dev;
struct idpf_sw_queue *refillq;
int err;
- unsigned int i = 0;
err = idpf_tx_buf_alloc_all(tx_q);
if (err)
@@ -282,7 +281,7 @@
goto err_alloc;
}
- for (i = 0; i < refillq->desc_count; i++)
+ for (unsigned int i = 0; i < refillq->desc_count; i++)
refillq->ring[i] =
FIELD_PREP(IDPF_RFL_BI_BUFID_M, i) |
FIELD_PREP(IDPF_RFL_BI_GEN_M,
################################################################################
! REJECTED PATCH2 HUNKS - could not be compared; manual review needed !
################################################################################
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -3551,7 +3630,7 @@
skip_data:
rx_buf->netmem = 0;
- idpf_rx_post_buf_refill(refillq, buf_id);
+ idpf_post_buf_refill(refillq, buf_id);
IDPF_RX_BUMP_NTC(rxq, ntc);
/* skip if it is non EOP desc */
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -622,6 +622,7 @@
* @cleaned_pkts: Number of packets cleaned for the above said case
* @tx_max_bufs: Max buffers that can be transmitted with scatter-gather
* @stash: Tx buffer stash for Flow-based scheduling mode
+ * @refillq: Pointer to refill queue
* @compl_tag_bufid_m: Completion tag buffer id mask
* @compl_tag_cur_gen: Used to keep track of current completion tag generation
* @compl_tag_gen_max: To determine when compl_tag_cur_gen should be reset
@@ -671,6 +672,7 @@
u16 tx_max_bufs;
struct idpf_txq_stash *stash;
+ struct idpf_sw_queue *refillq;
u16 compl_tag_bufid_m;
u16 compl_tag_cur_gen;
@@ -692,7 +694,7 @@
__cacheline_group_end_aligned(cold);
};
libeth_cacheline_set_assert(struct idpf_tx_queue, 64,
- 112 + sizeof(struct u64_stats_sync),
+ 120 + sizeof(struct u64_stats_sync),
24);
/**
================================================================================
* CONTEXT DIFFERENCES - surrounding code differences between the patches *
================================================================================
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -244,4 +246,5 @@
struct device *dev = tx_q->dev;
int err;
+ unsigned int i = 0;
err = idpf_tx_buf_alloc_all(tx_q);
@@ -3358,4 +3417,4 @@
idpf_rx_post_buf_refill(refillq, buf_id);
-
IDPF_RX_BUMP_NTC(rxq, ntc);
+
/* skip if it is non EOP desc */
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -624,6 +619,6 @@
* @cleaned_pkts: Number of packets cleaned for the above said case
* @tx_max_bufs: Max buffers that can be transmitted with scatter-gather
- * @tx_min_pkt_len: Min supported packet length
+ * @stash: Tx buffer stash for Flow-based scheduling mode
* @compl_tag_bufid_m: Completion tag buffer id mask
- * @compl_tag_gen_s: Completion tag generation bit
- * The format of the completion tag will change based on the TXQ
+ * @compl_tag_cur_gen: Used to keep track of current completion tag generation
+ * @compl_tag_gen_max: To determine when compl_tag_cur_gen should be reset
@@ -743,6 +692,2 @@
- u16 tx_max_bufs;
- u16 tx_min_pkt_len;
-
- u16 compl_tag_bufid_m;
- u16 compl_tag_gen_s;
+/**
################################################################################
! REJECTED PATCH2 HUNKS - could not be compared; manual review needed !
################################################################################
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -610,6 +610,8 @@
* @netdev: &net_device corresponding to this queue
* @next_to_use: Next descriptor to use
* @next_to_clean: Next descriptor to clean
+ * @last_re: last descriptor index that RE bit was set
+ * @tx_max_bufs: Max buffers that can be transmitted with scatter-gather
* @cleaned_bytes: Splitq only, TXQ only: When a TX completion is received on
* the TX completion queue, it can be for any TXQ associated
* with that completion queue. This means we can clean up to
@@ -620,7 +622,6 @@
* only once at the end of the cleaning routine.
* @clean_budget: singleq only, queue cleaning budget
* @cleaned_pkts: Number of packets cleaned for the above said case
- * @tx_max_bufs: Max buffers that can be transmitted with scatter-gather
* @stash: Tx buffer stash for Flow-based scheduling mode
* @refillq: Pointer to refill queue
* @compl_tag_bufid_m: Completion tag buffer id mask
@@ -672,7 +675,6 @@
};
u16 cleaned_pkts;
- u16 tx_max_bufs;
struct idpf_txq_stash *stash;
struct idpf_sw_queue *refillq;
================================================================================
* CONTEXT DIFFERENCES - surrounding code differences between the patches *
================================================================================
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -607,6 +607,5 @@
- * @desc_count: Number of descriptors
+ * @netdev: &net_device corresponding to this queue
* @next_to_use: Next descriptor to use
* @next_to_clean: Next descriptor to clean
- * @netdev: &net_device corresponding to this queue
* @cleaned_bytes: Splitq only, TXQ only: When a TX completion is received on
* the TX completion queue, it can be for any TXQ associated
@@ -685,5 +622,5 @@
* @cleaned_pkts: Number of packets cleaned for the above said case
* @tx_max_bufs: Max buffers that can be transmitted with scatter-gather
- * @tx_min_pkt_len: Min supported packet length
+ * @stash: Tx buffer stash for Flow-based scheduling mode
* @refillq: Pointer to refill queue
* @compl_tag_bufid_m: Completion tag buffer id mask
@@ -723,4 +660,4 @@
- u16 desc_count;
+ __cacheline_group_begin_aligned(read_write);
u16 next_to_use;
u16 next_to_clean;
@@ -731,5 +668,5 @@
u16 tx_max_bufs;
- u16 tx_min_pkt_len;
+ struct idpf_txq_stash *stash;
struct idpf_sw_queue *refillq;
================================================================================
* DELTA DIFFERENCES - code changes that differ between the patches *
================================================================================
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -2287,4 +2287,55 @@
/**
+ * idpf_tx_dma_map_error - handle TX DMA map errors
+ * @txq: queue to send buffer on
+ * @skb: send buffer
+ * @first: original first buffer info buffer for packet
+ * @idx: starting point on ring to unwind
+ */
+void idpf_tx_dma_map_error(struct idpf_tx_queue *txq, struct sk_buff *skb,
+ struct idpf_tx_buf *first, u16 idx)
+{
+ struct libeth_sq_napi_stats ss = { };
+ struct libeth_cq_pp cp = {
+ .dev = txq->dev,
+ .ss = &ss,
+ };
+
+ u64_stats_update_begin(&txq->stats_sync);
+ u64_stats_inc(&txq->q_stats.dma_map_errs);
+ u64_stats_update_end(&txq->stats_sync);
+
+ /* clear dma mappings for failed tx_buf map */
+ for (;;) {
+ struct idpf_tx_buf *tx_buf;
+
+ tx_buf = &txq->tx_buf[idx];
+ libeth_tx_complete(tx_buf, &cp);
+ if (tx_buf == first)
+ break;
+ if (idx == 0)
+ idx = txq->desc_count;
+ idx--;
+ }
+
+ if (skb_is_gso(skb)) {
+ union idpf_tx_flex_desc *tx_desc;
+
+ /* If we failed a DMA mapping for a TSO packet, we will have
+ * used one additional descriptor for a context
+ * descriptor. Reset that here.
+ */
+ tx_desc = &txq->flex_tx[idx];
+ memset(tx_desc, 0, sizeof(struct idpf_flex_tx_ctx_desc));
+ if (idx == 0)
+ idx = txq->desc_count;
+ idx--;
+ }
+
+ /* Update tail in case netdev_xmit_more was previously true */
+ idpf_tx_buf_hw_update(txq, idx, false);
+}
+
+/**
* idpf_tx_splitq_bump_ntu - adjust NTU and generation
* @txq: the tx ring to wrap
@@ -2335,35 +2386,4 @@
/**
- * idpf_tx_splitq_pkt_err_unmap - Unmap buffers and bump tail in case of error
- * @txq: Tx queue to unwind
- * @params: pointer to splitq params struct
- * @first: starting buffer for packet to unmap
- */
-static void idpf_tx_splitq_pkt_err_unmap(struct idpf_tx_queue *txq,
- struct idpf_tx_splitq_params *params,
- struct idpf_tx_buf *first)
-{
- struct libeth_sq_napi_stats ss = { };
- struct idpf_tx_buf *tx_buf = first;
- struct libeth_cq_pp cp = {
- .dev = txq->dev,
- .ss = &ss,
- };
- u32 idx = 0;
-
- u64_stats_update_begin(&txq->stats_sync);
- u64_stats_inc(&txq->q_stats.dma_map_errs);
- u64_stats_update_end(&txq->stats_sync);
-
- do {
- libeth_tx_complete(tx_buf, &cp);
- idpf_tx_clean_buf_ring_bump_ntc(txq, idx, tx_buf);
- } while (idpf_tx_buf_compl_tag(tx_buf) == params->compl_tag);
-
- /* Update tail in case netdev_xmit_more was previously true. */
- idpf_tx_buf_hw_update(txq, params->prev_ntu, false);
-}
-
-/**
* idpf_tx_splitq_map - Build the Tx flex descriptor
* @tx_q: queue to send buffer on
################################################################################
! REJECTED PATCH2 HUNKS - could not be compared; manual review needed !
################################################################################
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -2339,57 +2339,6 @@
return count;
}
-/**
- * idpf_tx_dma_map_error - handle TX DMA map errors
- * @txq: queue to send buffer on
- * @skb: send buffer
- * @first: original first buffer info buffer for packet
- * @idx: starting point on ring to unwind
- */
-void idpf_tx_dma_map_error(struct idpf_tx_queue *txq, struct sk_buff *skb,
- struct idpf_tx_buf *first, u16 idx)
-{
- struct libeth_sq_napi_stats ss = { };
- struct libeth_cq_pp cp = {
- .dev = txq->dev,
- .ss = &ss,
- };
-
- u64_stats_update_begin(&txq->stats_sync);
- u64_stats_inc(&txq->q_stats.dma_map_errs);
- u64_stats_update_end(&txq->stats_sync);
-
- /* clear dma mappings for failed tx_buf map */
- for (;;) {
- struct idpf_tx_buf *tx_buf;
-
- tx_buf = &txq->tx_buf[idx];
- libeth_tx_complete(tx_buf, &cp);
- if (tx_buf == first)
- break;
- if (idx == 0)
- idx = txq->desc_count;
- idx--;
- }
-
- if (skb_is_gso(skb)) {
- union idpf_tx_flex_desc *tx_desc;
-
- /* If we failed a DMA mapping for a TSO packet, we will have
- * used one additional descriptor for a context
- * descriptor. Reset that here.
- */
- tx_desc = &txq->flex_tx[idx];
- memset(tx_desc, 0, sizeof(*tx_desc));
- if (idx == 0)
- idx = txq->desc_count;
- idx--;
- }
-
- /* Update tail in case netdev_xmit_more was previously true */
- idpf_tx_buf_hw_update(txq, idx, false);
-}
-
/**
* idpf_tx_splitq_bump_ntu - adjust NTU and generation
* @txq: the tx ring to wrap
@@ -2438,6 +2387,37 @@
return true;
}
+/**
+ * idpf_tx_splitq_pkt_err_unmap - Unmap buffers and bump tail in case of error
+ * @txq: Tx queue to unwind
+ * @params: pointer to splitq params struct
+ * @first: starting buffer for packet to unmap
+ */
+static void idpf_tx_splitq_pkt_err_unmap(struct idpf_tx_queue *txq,
+ struct idpf_tx_splitq_params *params,
+ struct idpf_tx_buf *first)
+{
+ struct libeth_sq_napi_stats ss = { };
+ struct idpf_tx_buf *tx_buf = first;
+ struct libeth_cq_pp cp = {
+ .dev = txq->dev,
+ .ss = &ss,
+ };
+ u32 idx = 0;
+
+ u64_stats_update_begin(&txq->stats_sync);
+ u64_stats_inc(&txq->q_stats.dma_map_errs);
+ u64_stats_update_end(&txq->stats_sync);
+
+ do {
+ libeth_tx_complete(tx_buf, &cp);
+ idpf_tx_clean_buf_ring_bump_ntc(txq, idx, tx_buf);
+ } while (idpf_tx_buf_compl_tag(tx_buf) == params->compl_tag);
+
+ /* Update tail in case netdev_xmit_more was previously true. */
+ idpf_tx_buf_hw_update(txq, params->prev_ntu, false);
+}
+
/**
* idpf_tx_splitq_map - Build the Tx flex descriptor
* @tx_q: queue to send buffer on
@@ -2482,8 +2462,9 @@
for (frag = &skb_shinfo(skb)->frags[0];; frag++) {
unsigned int max_data = IDPF_TX_MAX_DESC_DATA_ALIGNED;
- if (dma_mapping_error(tx_q->dev, dma))
- return idpf_tx_dma_map_error(tx_q, skb, first, i);
+ if (unlikely(dma_mapping_error(tx_q->dev, dma)))
+ return idpf_tx_splitq_pkt_err_unmap(tx_q, params,
+ first);
first->nr_frags++;
idpf_tx_buf_compl_tag(tx_buf) = params->compl_tag;
@@ -2939,7 +2920,9 @@
static netdev_tx_t idpf_tx_splitq_frame(struct sk_buff *skb,
struct idpf_tx_queue *tx_q)
{
- struct idpf_tx_splitq_params tx_params = { };
+ struct idpf_tx_splitq_params tx_params = {
+ .prev_ntu = tx_q->next_to_use,
+ };
union idpf_flex_tx_ctx_desc *ctx_desc;
struct idpf_tx_buf *first;
unsigned int count;
================================================================================
* CONTEXT DIFFERENCES - surrounding code differences between the patches *
================================================================================
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -2326,7 +2380,7 @@
* descriptor. Reset that here.
*/
tx_desc = &txq->flex_tx[idx];
- memset(tx_desc, 0, sizeof(struct idpf_flex_tx_ctx_desc));
+ memset(tx_desc, 0, sizeof(*tx_desc));
if (idx == 0)
idx = txq->desc_count;
idx--;
@@ -2817,4 +2871,5 @@
{
struct idpf_tx_splitq_params tx_params = { };
+ union idpf_flex_tx_ctx_desc *ctx_desc;
struct idpf_tx_buf *first;
unsigned int count;
================================================================================
* DELTA DIFFERENCES - code changes that differ between the patches *
================================================================================
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -1915,11 +1915,8 @@
.napi = budget,
};
- tx_buf = &txq->tx_buf[buf_id];
- if (tx_buf->type == LIBETH_SQE_SKB) {
+ if (tx_buf->type == LIBETH_SQE_SKB)
libeth_tx_complete(tx_buf, &cp);
- idpf_post_buf_refill(txq->refillq, buf_id);
- }
while (idpf_tx_buf_next(tx_buf) != IDPF_TXBUF_NULL) {
buf_id = idpf_tx_buf_next(tx_buf);
################################################################################
! REJECTED PATCH2 HUNKS - could not be compared; manual review needed !
################################################################################
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -1962,6 +1962,7 @@
idpf_tx_buf_compl_tag(tx_buf) != compl_tag))
return false;
+ tx_buf = &txq->tx_buf[buf_id];
if (tx_buf->type == LIBETH_SQE_SKB) {
if (skb_shinfo(tx_buf->skb)->tx_flags & SKBTX_IN_PROGRESS)
idpf_tx_read_tstamp(txq, tx_buf->skb);
@@ -1965,6 +1966,7 @@
idpf_tx_read_tstamp(txq, tx_buf->skb);
libeth_tx_complete(tx_buf, &cp);
+ idpf_post_buf_refill(txq->refillq, buf_id);
}
idpf_tx_clean_buf_ring_bump_ntc(txq, idx, tx_buf);
@@ -2892,6 +2859,7 @@
struct idpf_tx_buf *first;
unsigned int count;
int tso, idx;
+ u32 buf_id;
count = idpf_tx_desc_count_required(tx_q, skb);
if (unlikely(!count))
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -703,6 +710,7 @@
dma_addr_t dma;
struct idpf_q_vector *q_vector;
+ u32 buf_pool_size;
__cacheline_group_end_aligned(cold);
};
libeth_cacheline_set_assert(struct idpf_tx_queue, 64,
@@ -706,7 +714,7 @@
};
libeth_cacheline_set_assert(struct idpf_tx_queue, 64,
120 + sizeof(struct u64_stats_sync),
- 24);
+ 32);
/**
* struct idpf_buf_queue - software structure representing a buffer queue
================================================================================
* CONTEXT DIFFERENCES - surrounding code differences between the patches *
================================================================================
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -1915,8 +1965,12 @@
idpf_tx_buf_compl_tag(tx_buf) != compl_tag))
return false;
- if (tx_buf->type == LIBETH_SQE_SKB)
+ if (tx_buf->type == LIBETH_SQE_SKB) {
+ if (skb_shinfo(tx_buf->skb)->tx_flags & SKBTX_IN_PROGRESS)
+ idpf_tx_read_tstamp(txq, tx_buf->skb);
+
libeth_tx_complete(tx_buf, &cp);
+ }
idpf_tx_clean_buf_ring_bump_ntc(txq, idx, tx_buf);
@@ -2744,4 +2798,4 @@
- struct idpf_flex_tx_ctx_desc *desc;
+ union idpf_flex_tx_ctx_desc *desc;
int i = txq->next_to_use;
txq->tx_buf[i].type = LIBETH_SQE_CTX;
@@ -2802,6 +2856,6 @@
struct idpf_tx_buf *first;
unsigned int count;
- int tso;
+ int tso, idx;
count = idpf_tx_desc_count_required(tx_q, skb);
if (unlikely(!count))
@@ -2840,4 +2962,4 @@
- u64_stats_update_end(&tx_q->stats_sync);
+ idpf_tx_set_tstamp_desc(ctx_desc, idx);
}
/* record the location of the first descriptor for this packet */
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -692,5 +694,9 @@
struct idpf_q_vector *q_vector;
-} ____cacheline_aligned;
+ __cacheline_group_end_aligned(cold);
+};
+libeth_cacheline_set_assert(struct idpf_tx_queue, 64,
+ 120 + sizeof(struct u64_stats_sync),
+ 24);
/**
################################################################################
! REJECTED PATCH2 HUNKS - could not be compared; manual review needed !
################################################################################
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -2909,7 +2926,7 @@
};
union idpf_flex_tx_ctx_desc *ctx_desc;
struct idpf_tx_buf *first;
- unsigned int count;
+ u32 count, buf_count = 1;
int tso, idx;
u32 buf_id;
================================================================================
* CONTEXT DIFFERENCES - surrounding code differences between the patches *
================================================================================
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -2768,7 +2821,8 @@
};
+ union idpf_flex_tx_ctx_desc *ctx_desc;
struct idpf_tx_buf *first;
unsigned int count;
- int tso;
+ int tso, idx;
u32 buf_id;
count = idpf_tx_desc_count_required(tx_q, skb);
================================================================================
* DELTA DIFFERENCES - code changes that differ between the patches *
================================================================================
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -1557,6 +1557,82 @@
wake_up(&vport->sw_marker_wq);
}
+/**
+ * idpf_tx_clean_stashed_bufs - clean bufs that were stored for
+ * out of order completions
+ * @txq: queue to clean
+ * @compl_tag: completion tag of packet to clean (from completion descriptor)
+ * @cleaned: pointer to stats struct to track cleaned packets/bytes
+ * @budget: Used to determine if we are in netpoll
+ */
+static void idpf_tx_clean_stashed_bufs(struct idpf_tx_queue *txq,
+ u16 compl_tag,
+ struct libeth_sq_napi_stats *cleaned,
+ int budget)
+{
+ struct idpf_tx_stash *stash;
+ struct hlist_node *tmp_buf;
+ struct libeth_cq_pp cp = {
+ .dev = txq->dev,
+ .ss = cleaned,
+ .napi = budget,
+ };
+
+ /* Buffer completion */
+ hash_for_each_possible_safe(txq->stash->sched_buf_hash, stash, tmp_buf,
+ hlist, compl_tag) {
+ if (unlikely(idpf_tx_buf_compl_tag(&stash->buf) != compl_tag))
+ continue;
+
+ hash_del(&stash->hlist);
+ libeth_tx_complete(&stash->buf, &cp);
+
+ /* Push shadow buf back onto stack */
+ idpf_buf_lifo_push(&txq->stash->buf_stack, stash);
+ }
+}
+
+/**
+ * idpf_stash_flow_sch_buffers - store buffer parameters info to be freed at a
+ * later time (only relevant for flow scheduling mode)
+ * @txq: Tx queue to clean
+ * @tx_buf: buffer to store
+ */
+static int idpf_stash_flow_sch_buffers(struct idpf_tx_queue *txq,
+ struct idpf_tx_buf *tx_buf)
+{
+ struct idpf_tx_stash *stash;
+
+ if (unlikely(tx_buf->type <= LIBETH_SQE_CTX))
+ return 0;
+
+ stash = idpf_buf_lifo_pop(&txq->stash->buf_stack);
+ if (unlikely(!stash)) {
+ net_err_ratelimited("%s: No out-of-order TX buffers left!\n",
+ netdev_name(txq->netdev));
+
+ return -ENOMEM;
+ }
+
+ /* Store buffer params in shadow buffer */
+ stash->buf.skb = tx_buf->skb;
+ stash->buf.bytes = tx_buf->bytes;
+ stash->buf.packets = tx_buf->packets;
+ stash->buf.type = tx_buf->type;
+ stash->buf.nr_frags = tx_buf->nr_frags;
+ dma_unmap_addr_set(&stash->buf, dma, dma_unmap_addr(tx_buf, dma));
+ dma_unmap_len_set(&stash->buf, len, dma_unmap_len(tx_buf, len));
+ idpf_tx_buf_compl_tag(&stash->buf) = idpf_tx_buf_compl_tag(tx_buf);
+
+ /* Add buffer to buf_hash table to be freed later */
+ hash_add(txq->stash->sched_buf_hash, &stash->hlist,
+ idpf_tx_buf_compl_tag(&stash->buf));
+
+ tx_buf->type = LIBETH_SQE_EMPTY;
+
+ return 0;
+}
+
#define idpf_tx_splitq_clean_bump_ntc(txq, ntc, desc, buf) \
do { \
if (unlikely(++(ntc) == (txq)->desc_count)) { \
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -663,6 +663,7 @@
* @cleaned_pkts: Number of packets cleaned for the above said case
* @tx_min_pkt_len: Min supported packet length
* @refillq: Pointer to refill queue
+ * @compl_tag_bufid_m: Completion tag buffer id mask
* @compl_tag_gen_s: Completion tag generation bit
* The format of the completion tag will change based on the TXQ
* descriptor ring size so that we can maintain roughly the same level
@@ -683,6 +684,9 @@
* --------------------------------
*
* This gives us 8*8160 = 65280 possible unique values.
+ * @compl_tag_cur_gen: Used to keep track of current completion tag generation
+ * @compl_tag_gen_max: To determine when compl_tag_cur_gen should be reset
+ * @stash: Tx buffer stash for Flow-based scheduling mode
* @stats_sync: See struct u64_stats_sync
* @q_stats: See union idpf_tx_queue_stats
* @q_id: Queue id
@@ -724,6 +728,14 @@
u16 tx_min_pkt_len;
struct idpf_sw_queue *refillq;
+ u16 compl_tag_bufid_m;
+ u16 compl_tag_gen_s;
+
+ u16 compl_tag_cur_gen;
+ u16 compl_tag_gen_max;
+
+ struct idpf_txq_stash *stash;
+
struct u64_stats_sync stats_sync;
struct idpf_tx_queue_stats q_stats;
################################################################################
! REJECTED PATCH2 HUNKS - could not be compared; manual review needed !
################################################################################
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -1602,87 +1462,6 @@
spin_unlock_bh(&tx_tstamp_caps->status_lock);
}
-/**
- * idpf_tx_clean_stashed_bufs - clean bufs that were stored for
- * out of order completions
- * @txq: queue to clean
- * @compl_tag: completion tag of packet to clean (from completion descriptor)
- * @cleaned: pointer to stats struct to track cleaned packets/bytes
- * @budget: Used to determine if we are in netpoll
- */
-static void idpf_tx_clean_stashed_bufs(struct idpf_tx_queue *txq,
- u16 compl_tag,
- struct libeth_sq_napi_stats *cleaned,
- int budget)
-{
- struct idpf_tx_stash *stash;
- struct hlist_node *tmp_buf;
- struct libeth_cq_pp cp = {
- .dev = txq->dev,
- .ss = cleaned,
- .napi = budget,
- };
-
- /* Buffer completion */
- hash_for_each_possible_safe(txq->stash->sched_buf_hash, stash, tmp_buf,
- hlist, compl_tag) {
- if (unlikely(idpf_tx_buf_compl_tag(&stash->buf) != compl_tag))
- continue;
-
- hash_del(&stash->hlist);
-
- if (stash->buf.type == LIBETH_SQE_SKB &&
- (skb_shinfo(stash->buf.skb)->tx_flags & SKBTX_IN_PROGRESS))
- idpf_tx_read_tstamp(txq, stash->buf.skb);
-
- libeth_tx_complete(&stash->buf, &cp);
-
- /* Push shadow buf back onto stack */
- idpf_buf_lifo_push(&txq->stash->buf_stack, stash);
- }
-}
-
-/**
- * idpf_stash_flow_sch_buffers - store buffer parameters info to be freed at a
- * later time (only relevant for flow scheduling mode)
- * @txq: Tx queue to clean
- * @tx_buf: buffer to store
- */
-static int idpf_stash_flow_sch_buffers(struct idpf_tx_queue *txq,
- struct idpf_tx_buf *tx_buf)
-{
- struct idpf_tx_stash *stash;
-
- if (unlikely(tx_buf->type <= LIBETH_SQE_CTX))
- return 0;
-
- stash = idpf_buf_lifo_pop(&txq->stash->buf_stack);
- if (unlikely(!stash)) {
- net_err_ratelimited("%s: No out-of-order TX buffers left!\n",
- netdev_name(txq->netdev));
-
- return -ENOMEM;
- }
-
- /* Store buffer params in shadow buffer */
- stash->buf.skb = tx_buf->skb;
- stash->buf.bytes = tx_buf->bytes;
- stash->buf.packets = tx_buf->packets;
- stash->buf.type = tx_buf->type;
- stash->buf.nr_frags = tx_buf->nr_frags;
- dma_unmap_addr_set(&stash->buf, dma, dma_unmap_addr(tx_buf, dma));
- dma_unmap_len_set(&stash->buf, len, dma_unmap_len(tx_buf, len));
- idpf_tx_buf_compl_tag(&stash->buf) = idpf_tx_buf_compl_tag(tx_buf);
-
- /* Add buffer to buf_hash table to be freed later */
- hash_add(txq->stash->sched_buf_hash, &stash->hlist,
- idpf_tx_buf_compl_tag(&stash->buf));
-
- tx_buf->type = LIBETH_SQE_EMPTY;
-
- return 0;
-}
-
#define idpf_tx_splitq_clean_bump_ntc(txq, ntc, desc, buf) \
do { \
if (unlikely(++(ntc) == (txq)->desc_count)) { \
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -598,7 +565,6 @@
* only once at the end of the cleaning routine.
* @clean_budget: singleq only, queue cleaning budget
* @cleaned_pkts: Number of packets cleaned for the above said case
- * @stash: Tx buffer stash for Flow-based scheduling mode
* @refillq: Pointer to refill queue
* @compl_tag_bufid_m: Completion tag buffer id mask
* @compl_tag_cur_gen: Used to keep track of current completion tag generation
@@ -600,9 +566,6 @@
* @cleaned_pkts: Number of packets cleaned for the above said case
* @stash: Tx buffer stash for Flow-based scheduling mode
* @refillq: Pointer to refill queue
- * @compl_tag_bufid_m: Completion tag buffer id mask
- * @compl_tag_cur_gen: Used to keep track of current completion tag generation
- * @compl_tag_gen_max: To determine when compl_tag_cur_gen should be reset
* @cached_tstamp_caps: Tx timestamp capabilities negotiated with the CP
* @tstamp_task: Work that handles Tx timestamp read
* @stats_sync: See struct u64_stats_sync
@@ -633,7 +596,6 @@
u16 desc_count;
u16 tx_min_pkt_len;
- u16 compl_tag_gen_s;
struct net_device *netdev;
__cacheline_group_end_aligned(read_mostly);
@@ -650,7 +612,6 @@
};
u16 cleaned_pkts;
- struct idpf_txq_stash *stash;
struct idpf_sw_queue *refillq;
u16 compl_tag_bufid_m;
@@ -653,10 +614,6 @@
struct idpf_txq_stash *stash;
struct idpf_sw_queue *refillq;
- u16 compl_tag_bufid_m;
- u16 compl_tag_cur_gen;
- u16 compl_tag_gen_max;
-
struct idpf_ptp_vport_tx_tstamp_caps *cached_tstamp_caps;
struct work_struct *tstamp_task;
@@ -674,7 +631,7 @@
__cacheline_group_end_aligned(cold);
};
libeth_cacheline_set_assert(struct idpf_tx_queue, 64,
- 120 + sizeof(struct u64_stats_sync),
+ 104 + sizeof(struct u64_stats_sync),
32);
/**
================================================================================
* CONTEXT DIFFERENCES - surrounding code differences between the patches *
================================================================================
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -3,4 +3,4 @@
-#include "idpf.h"
+#include "idpf_ptp.h"
#include "idpf_virtchnl.h"
struct idpf_tx_stash {
@@ -1725,6 +1770,11 @@
continue;
hash_del(&stash->hlist);
+
+ if (stash->buf.type == LIBETH_SQE_SKB &&
+ (skb_shinfo(stash->buf.skb)->tx_flags & SKBTX_IN_PROGRESS))
+ idpf_tx_read_tstamp(txq, stash->buf.skb);
+
libeth_tx_complete(&stash->buf, &cp);
/* Push shadow buf back onto stack */
--- b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -713,9 +663,7 @@
+ u16 desc_count;
+
+ u16 tx_min_pkt_len;
+ u16 compl_tag_gen_s;
+
+ struct net_device *netdev;
+ __cacheline_group_end_aligned(read_mostly);
- * --------------------------------
- *
- * This gives us 8*8160 = 65280 possible unique values.
- * @compl_tag_cur_gen: Used to keep track of current completion tag generation
- * @compl_tag_gen_max: To determine when compl_tag_cur_gen should be reset
- * @stash: Tx buffer stash for Flow-based scheduling mode
- * @stats_sync: See struct u64_stats_sync
- * @q_stats: See union idpf_tx_queue_stats
- * @q_id: Queue idThis is an automated interdiff check for backported commits. |
JIRA PR Check Results7 commit(s) with issues found: Commit
|
|
❌ Validation checks completed with issues View full results: https://github.com/ctrliq/kernel-src-tree/actions/runs/23555971558 |
|
@PlaidCat Did we update anything manually in this PR? Seems like the code block formatting is not right and I wonder if KernelCI did this. |
Yes I did to paste in the Rebase content, you can also aways go back throught he history here (should GH be in a functional state) |

Summary
This PR has been automatically created after successful completion of all CI stages.
https://ciqinc.atlassian.net/browse/KERNEL-763
Update process (This kernel CentOS base for 4.18.0-553.111.1.el8_10)
rlc-8/4.18.0-553.111.1.el8_10branch fromrocky8_10rlc-8/4.18.0-553.109.1.el8_10into new branch (skipping unneeded code)Rebase Log
Commit Message(s)
Test Results
✅ Build Stage
✅ Boot Verification
✅ Kernel Selftests
Test Comparison
x86_64:
aarch64:
🤖 This PR was automatically generated by GitHub Actions
Run ID: 23551717734