Skip to content

Add profiling instrumentation for NAM building blocks#219

Merged
sdatkinson merged 7 commits intosdatkinson:main-profilingfrom
jfsantos:feature/profiling
Feb 14, 2026
Merged

Add profiling instrumentation for NAM building blocks#219
sdatkinson merged 7 commits intosdatkinson:main-profilingfrom
jfsantos:feature/profiling

Conversation

@jfsantos
Copy link
Contributor

@jfsantos jfsantos commented Feb 6, 2026

Adds a profiling framework (NAM/profiling.h, NAM/profiling.cpp) with NAM_PROFILE_START()/NAM_PROFILE_ADD() macros and 14 timing categories. Supports both desktop (std::chrono) and ARM Cortex-M7 (DWT cycle counter) backends. Profiling is compile-time gated via -DNAM_PROFILING.

Instruments wavenet _Layer::Process() and _LayerArray::ProcessInner() with per-category timing, and adds profiling reset/print calls to the benchmodel tool.

Developed with support and sponsorship from TONE3000

João Felipe Santos and others added 3 commits February 6, 2026 09:33
Adds a profiling framework (NAM/profiling.h, NAM/profiling.cpp) with
NAM_PROFILE_START()/NAM_PROFILE_ADD() macros and 14 timing categories.
Supports both desktop (std::chrono) and ARM Cortex-M7 (DWT cycle counter)
backends. Profiling is compile-time gated via -DNAM_PROFILING.

Instruments wavenet _Layer::Process() and _LayerArray::ProcessInner()
with per-category timing, and adds profiling reset/print calls to the
benchmodel tool.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Owner

@sdatkinson sdatkinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments & Q's.

The main thing that makes me pause is adding things into the high-level logic like in the Layer process method. If there were a way to do it that's a little more "hidden", then that would be much preferred before merging this into main...

@@ -1,4 +1,5 @@
#include "conv1d.h"
#include "profiling.h"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have to be included? Feels weird to include it if I'm doing e.g. a release build.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, it should at very least be inside an #ifdef. I'll change that.

NAM/profiling.h Outdated
}

uint32_t total() const {
return conv1d + input_mixin + layer1x1 + head1x1 + rechannel + conv1x1 + activation + film + copies + setzero + ringbuf + condition + lstm + other;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels fragile. Is there a way to "register" things or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole implementation was targeted at the Wavenet and I didn't think of generalizing it, only covered anything that will be in A2... but you are right, it would be better if we just kept a dictionary of things we are measuring and reported all of them, just like a logging library.

NAM/profiling.h Outdated
}
};

print_row("Conv1D", t.conv1d);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah...this could be an iterator over the various registered ops?


void nam::wavenet::_Layer::Process(const Eigen::MatrixXf& input, const Eigen::MatrixXf& condition, const int num_frames)
{
NAM_PROFILE_START();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah...I'm not super-hot on things like this sitting in the middle of the code. Is there a way to do this that doesn't add lines like this?

# Also ensure assertions are enabled (NDEBUG is not defined) so tests actually run
set_target_properties(run_tests PROPERTIES COMPILE_OPTIONS "-O0")
# Benchmodel should be built with NAM_PROFILING set
target_compile_definitions(benchmodel PRIVATE NAM_PROFILING)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to make this optional via a command-line flag?

I'd also be interested to see how much overhead the profiler adds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into how to make it a CMake flag. I briefly looked into it, thought it was more complicated than I wanted to deal with, and immediately forgot about it... especially since profiling should only be inserted into profiling tools.

// Conv1x1 stores either a full (out_channels x in_channels) matrix (possibly
// block-diagonal when grouped), or a depthwise weight vector when groups ==
// in_channels == out_channels.
static MemoryResult conv1x1_memory(int in_ch, int out_ch, bool bias, int groups, int M)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this stores only the memory due to weights? Is there any overhead that should be accounted for? It's probably small but I figure I should ask while I'm here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No overhead for small local variables, but if anything is allocating an entire temporary buffer that wasn't accounted for we need to take that into account. Internal buffers are being accounted for (see e.g. line 59)


// Conv1D stores kernel_size weight matrices (each out_ch x in_ch) or depthwise
// vectors, plus a bias vector, a ring buffer, and an output buffer.
static MemoryResult conv1d_memory(int in_ch, int out_ch, int kernel_size, bool bias, int dilation, int groups, int M)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's M?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

M is buffer_size everywhere, thanks for catching it. I'll rename it all across the file.

// Recursive condition_dsp
bool has_condition_dsp = false;
int condition_output_channels = condition_dim;
if (config.find("condition_dsp") != config.end())
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should check for null like #220 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct, thanks for catching it!

jfsantos and others added 4 commits February 12, 2026 11:17
…lementation (sdatkinson#223)

* Make activation apply method pure virtual instead of no-op default

* Fix bugs

* Refactor to throw std::invalid_argument in debug mode, add tests
  The Timings struct hardcoded 14 named fields, requiring manual updates
  to reset(), total(), print_results(), and every call site whenever a
  category was added or removed. Replace with a flat-array registry where
  types are registered at file scope via register_type(), returning an
  integer index for O(1) accumulation in the hot path.

  Also adds NAM_PROFILE_RESTART() macro to replace a raw #ifdef block
  in wavenet.cpp.
Copy link
Owner

@sdatkinson sdatkinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is nice as it is. The trouble with it, like I've said elsewhere, is that I feel uneasy about having this "in main".

What I'm going to do in the meantime is make a main-profiling branch and merge this into there.

It'll be a bit more work to keep the two in sync, but if this really seems to be worth the lines (or I get other feedback from folks saying that they like it) then I'll merge it up to main later.

@sdatkinson sdatkinson changed the base branch from main to main-profiling February 14, 2026 00:54
@sdatkinson sdatkinson merged commit 3e4bc05 into sdatkinson:main-profiling Feb 14, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants