fast-get: fix a recent regression#10639
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes a userspace regression in Zephyr’s fast-get cleanup path where memory-domain partition removal can occur from a different thread than the one that originally fast-get’d the buffer (observed with SRC in DP mode and back-to-back aplay -r 44100).
Changes:
- Change
fast_put()userspace partition removal to use theentry->threadmemory domain instead ofk_current_get(). - Update debug logging to reference
entry->threadinstead of the current thread.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
lgirdwood
left a comment
There was a problem hiding this comment.
LGTM ,can you respond to copilot. Thanks !
zephyr/lib/fast-get.c
Outdated
| #if CONFIG_USERSPACE | ||
| struct k_mem_domain *mdom = k_current_get()->mem_domain_info.mem_domain; | ||
|
|
||
| /* We only get there for large buffers */ |
There was a problem hiding this comment.
The relation between num_partitions and large buffers is not obvious. Could you make the comment clearer?
zephyr/lib/fast-get.c
Outdated
| /* A userspace thread makes the request */ | ||
| if (k_current_get() != entry->thread && | ||
| if (mdom != entry->mdom && | ||
| !fast_get_domain_exists(k_current_get(), ret, |
There was a problem hiding this comment.
I see that the function fast_get_domain_exists iterates through partitions. Why not calling it fast_get_partition_exists?
Recent commits 7d0621e ("fast-get: fix partition leak for multi-thread usage") and 3c757dd ("fast-get: fix crash by freeing buffer before removing partition") introduced a regression when using SRC in DP mode in userspace on PTL and running two aplay -r 44100 back to back. The reason is that SRC isn't calling mod_fast_put() explicitly and is instead relying on automatic clean up, which then happens in a different thread. In fact those two commits weren't needed AFAICS, since fast_put() shouldn't be called directly from userspace, instead only mod_fast_put() should be called, which is a syscall, and therefore fast_put() then will have access to all the memory. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
In fast_put() sram_ptr == entry->sram_ptr, use the former since it's shorter. Also clarify a comment and rename a function to better match its role. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Before dropping the DP thread memory domain, check if any partitions are left still attached to it. If so, warn and remove them. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Until now SRC was relying on the final module clean up to free copied with fast_get() coefficients. This however isn't clean. Fix it and release partitions explicitly. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
| /* We only get there for large buffers */ | ||
| /* | ||
| * We only get there for large buffers, since small buffers with | ||
| * enabled userspace don't create fast-get entries |
There was a problem hiding this comment.
Just to clarify, I think this part is well understood (you can deduce this info just by examining fast_get function itself). No need to further explain it.
The mystery part that I was referring to in my previous comment is this:
mdom->num_partitions > 1
I am not sure if this comment is even related. I am guessing the related comment is below:
/* A userspace thread makes the request */
Am I right? If yes, maybe you could just change it to:
/* if num_partitions > 0, the request comes form a userspace thread */
Or you can combine both comments like this:
/* We only get there for large buffers.
/* Check if the request comes from a userspace thread by examining num_partitions
*/
There was a problem hiding this comment.
@wjablon1 yes, that test is related to the "userspace thread" comment. @softwarecki wanted to replace that test with a K_USER check, then it will be clearer
There was a problem hiding this comment.
Ok, if we already have a plan for this, forget about my comment and remove unnecessary clarification
Recent commits 7d0621e ("fast-get: fix partition leak for multi-thread usage") and 3c757dd ("fast-get: fix crash by freeing buffer before removing partition") introduced a regression when using SRC in DP mode in userspace on PTL and running two aplay -r 44100
back to back. The reason is that SRC isn't calling mod_fast_put() explicitly and is instead relying on automatic clean up, which then happens in a different thread.
In fact those two commits weren't needed AFAICS, since fast_put() shouldn't be called directly from userspace, instead only mod_fast_put() should be called, which is a syscall, and therefore fast_put() then will have access to all the memory.