Skip to content

SOLR-18159 Add metrics for system memory#4209

Open
janhoy wants to merge 2 commits intoapache:mainfrom
janhoy:002-physical-memory-metrics
Open

SOLR-18159 Add metrics for system memory#4209
janhoy wants to merge 2 commits intoapache:mainfrom
janhoy:002-physical-memory-metrics

Conversation

@janhoy
Copy link
Contributor

@janhoy janhoy commented Mar 12, 2026

More specifically jvm_system_memory_free_bytes and jjvm_system_memory_total_bytes

https://issues.apache.org/jira/browse/SOLR-18159

Disclaimer, this can been co-written with Claude Code

@github-actions github-actions bot added documentation Improvements or additions to documentation tests cat:metrics labels Mar 12, 2026
@janhoy janhoy requested a review from Copilot March 12, 2026 12:17
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds two new Prometheus gauges under the solr.jvm registry to expose total and free physical (host/container) memory, along with corresponding documentation, tests, and a changelog entry.

Changes:

  • Add jvm_system_memory_bytes and jvm_system_memory_free_bytes gauges to JVM metrics collection.
  • Add/refine tests validating presence of the new metrics (when supported) and behavior when JVM metrics are disabled.
  • Document the new metrics in the Ref Guide and add an unreleased changelog fragment.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File Description
solr/solr-ref-guide/modules/deployment-guide/pages/metrics-reporting.adoc Documents the two new system memory gauges and their availability/behavior.
solr/core/src/test/org/apache/solr/metrics/JvmMetricsTest.java Adds tests for the new metrics and for JVM-metrics-disabled behavior.
solr/core/src/java/org/apache/solr/metrics/OtelRuntimeJvmMetrics.java Registers the two new observable gauges using OperatingSystemMXBean and closes them on shutdown.
changelog/unreleased/SOLR-18159-physical-memory-metrics.yml Adds the unreleased changelog entry for the new metrics.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

@janhoy janhoy marked this pull request as draft March 12, 2026 12:24
- Add imports for ManagementFactory and com.sun.management.OperatingSystemMXBean
  in JvmMetricsTest to fix UnnecessarilyFullyQualified ECJ lint errors
- Restore system property correctly in test finally block instead of hardcoding "true"
- Update comment to remove inaccurate claim about separate graceful-degradation test
- Downgrade "OperatingSystemMXBean not present" log from INFO to DEBUG
- Update ref-guide NOTE to describe API availability without listing JDK vendors
@janhoy janhoy marked this pull request as ready for review March 12, 2026 14:53
@janhoy janhoy requested a review from mlbiscoc March 12, 2026 14:53
systemMemoryTotalGauge =
solrMetricManager.observableLongGauge(
registryName,
"jvm.system.memory",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know the prometheus writer would actually transform these delimited by dots in to underscores. But since we are _ for everything else, then make these underscores.

+ " On Linux with cgroup limits, reflects the container memory limit.",
measurement -> {
long total = extOsMxBean.getTotalMemorySize();
if (total > 0) measurement.record(total);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why the > 0 check? Seems wrong but also could this actually be 0?

"Free (unused) physical memory of the host or container in bytes.",
measurement -> {
long free = extOsMxBean.getFreeMemorySize();
if (free > 0) measurement.record(free);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above but having 0 free memory could be possible?

Comment on lines +119 to +126
if (systemMemoryTotalGauge != null) {
systemMemoryTotalGauge.close();
systemMemoryTotalGauge = null;
}
if (systemMemoryFreeGauge != null) {
systemMemoryFreeGauge.close();
systemMemoryFreeGauge = null;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Close with closeQuietly then set to null in finally might be cleaner.

@@ -0,0 +1,9 @@
title: Add jvm_system_memory_bytes and jvm_system_memory_free_bytes gauges exposing
total and free physical (host or container) memory to the Prometheus metrics endpoint.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its better to just drop Prometheus metrics endpoint. Just metrics is fine since we are not only prometheus anymore.

Comment on lines +81 to +101
systemMemoryTotalGauge =
solrMetricManager.observableLongGauge(
registryName,
"jvm.system.memory",
"Total physical memory of the host or container in bytes."
+ " On Linux with cgroup limits, reflects the container memory limit.",
measurement -> {
long total = extOsMxBean.getTotalMemorySize();
if (total > 0) measurement.record(total);
},
OtelUnit.BYTES);
systemMemoryFreeGauge =
solrMetricManager.observableLongGauge(
registryName,
"jvm.system.memory.free",
"Free (unused) physical memory of the host or container in bytes.",
measurement -> {
long free = extOsMxBean.getFreeMemorySize();
if (free > 0) measurement.record(free);
},
OtelUnit.BYTES);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT to make it just jvm.system.memory as the name and differ by attribute like "state=total/freeortype` because the prefixes are the same.

Or jvm.system.memory -> jvm_system_memory_max? I just want to better have it in the names that this is total without having the suffix total because then it will be confused with counters as total is reserved for that type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cat:metrics documentation Improvements or additions to documentation tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants