imageUtilKernels.cu: optimize initAttentionMaskKernel by vc-zju · Pull Request #28 · NVIDIA/TensorRT-Edge-LLM

vc-zju · 2026-01-26T12:03:02Z

What does this PR do?

Type of change: Optimization

Overview:
This PR only changes the loop order in imageUtilKernels.cu:initAttentionMaskKernel. Taking Qwen2.5-VL as an example, this reduces the kernel time from 2.8 ms to 0.4 ms for a 640×640 image input.
Before / After: [profiling screenshots not preserved]
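The PR does not include the kernel source or explain why the new loop order is faster. A common cause of this kind of speedup in CUDA is memory coalescing: when adjacent threads in a warp write adjacent addresses, the warp's store is served by far fewer memory transactions. The host-side C++ model below is a hedged illustration under that assumption, not the code from this PR. It assumes a hypothetical row-major float mask of seqLen × seqLen elements and counts the distinct 128-byte segments one 32-thread warp touches for a single store under each loop order. The helper segmentsTouched, its parameters, and the 128-byte granularity are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical model, NOT the PR's kernel: a row-major float mask of
// seqLen x seqLen elements, where each of a warp's 32 lanes stores one element.
constexpr int kWarpSize = 32;
constexpr std::int64_t kSegmentBytes = 128;  // assumed memory-segment granularity

// Returns how many distinct 128-byte segments one warp-wide store touches.
// columnFastest == true:  lane i writes mask[0][i] (adjacent lanes, adjacent addresses)
// columnFastest == false: lane i writes mask[i][0] (adjacent lanes one full row apart)
std::size_t segmentsTouched(int seqLen, bool columnFastest) {
    std::set<std::int64_t> segments;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        const std::int64_t row = columnFastest ? 0 : lane;
        const std::int64_t col = columnFastest ? lane : 0;
        const std::int64_t byteOffset =
            (row * seqLen + col) * static_cast<std::int64_t>(sizeof(float));
        segments.insert(byteOffset / kSegmentBytes);
    }
    return segments.size();
}
```

For seqLen = 1024, segmentsTouched(1024, true) returns 1 (all 32 floats fit in one 128-byte segment), while segmentsTouched(1024, false) returns 32. In a real kernel the effect also depends on the mask's dtype and shape and on how the grid maps threads to indices, so profiling with nsys, as the PR does, remains the way to confirm it.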

Testing

Profile the kernel with nsys to see the difference.

Make sure you read and follow Contributor guidelines and your commits are signed.
Is this change backward compatible?: Yes
Did you write any new necessary tests?: No, the existing tests are reused.
Did you add or update any necessary documentation?: Not needed.
Did you update Changelog?: No


fans-nv · 2026-01-26T12:13:15Z

Would you mind sharing which platform you collected the performance data on?
And very nice catch on the perf issue; thanks for the contribution.

fans-nv · 2026-01-26T12:14:51Z

Similar to other community PRs, I need to cherry-pick this into the internal branch and then publish the change to main along with the next release.

vc-zju · 2026-01-26T12:29:06Z

Sure, I collected it on Jetson Thor. I can also move this to the internal GitLab if you think we need to.

imageUtilKernels.cu: optimize initAttentionMaskKernel by simply changing loop index

c20ed2f

Signed-off-by: Tiekai Bi <tiekaib@nvidia.com>

vc-zju requested a review from a team January 26, 2026 12:03

fans-nv assigned fans-nv and vc-zju and unassigned fans-nv Jan 26, 2026

fans-nv self-requested a review January 26, 2026 12:11

fans-nv approved these changes Jan 26, 2026


vc-zju wants to merge 1 commit into NVIDIA:main from vc-zju:main
