SOLR-18159 Add metrics for system memory#4209
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds two new Prometheus gauges under the solr.jvm registry to expose total and free physical (host/container) memory, along with corresponding documentation, tests, and a changelog entry.
Changes:
- Add
jvm_system_memory_bytesandjvm_system_memory_free_bytesgauges to JVM metrics collection. - Add/refine tests validating presence of the new metrics (when supported) and behavior when JVM metrics are disabled.
- Document the new metrics in the Ref Guide and add an unreleased changelog fragment.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| solr/solr-ref-guide/modules/deployment-guide/pages/metrics-reporting.adoc | Documents the two new system memory gauges and their availability/behavior. |
| solr/core/src/test/org/apache/solr/metrics/JvmMetricsTest.java | Adds tests for the new metrics and for JVM-metrics-disabled behavior. |
| solr/core/src/java/org/apache/solr/metrics/OtelRuntimeJvmMetrics.java | Registers the two new observable gauges using OperatingSystemMXBean and closes them on shutdown. |
| changelog/unreleased/SOLR-18159-physical-memory-metrics.yml | Adds the unreleased changelog entry for the new metrics. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
solr/core/src/java/org/apache/solr/metrics/OtelRuntimeJvmMetrics.java
Outdated
Show resolved
Hide resolved
solr/solr-ref-guide/modules/deployment-guide/pages/metrics-reporting.adoc
Outdated
Show resolved
Hide resolved
- Add imports for ManagementFactory and com.sun.management.OperatingSystemMXBean in JvmMetricsTest to fix UnnecessarilyFullyQualified ECJ lint errors - Restore system property correctly in test finally block instead of hardcoding "true" - Update comment to remove inaccurate claim about separate graceful-degradation test - Downgrade "OperatingSystemMXBean not present" log from INFO to DEBUG - Update ref-guide NOTE to describe API availability without listing JDK vendors
| systemMemoryTotalGauge = | ||
| solrMetricManager.observableLongGauge( | ||
| registryName, | ||
| "jvm.system.memory", |
There was a problem hiding this comment.
I didn't know the prometheus writer would actually transform these delimited by dots in to underscores. But since we are _ for everything else, then make these underscores.
| + " On Linux with cgroup limits, reflects the container memory limit.", | ||
| measurement -> { | ||
| long total = extOsMxBean.getTotalMemorySize(); | ||
| if (total > 0) measurement.record(total); |
There was a problem hiding this comment.
Curious why the > 0 check? Seems wrong but also could this actually be 0?
| "Free (unused) physical memory of the host or container in bytes.", | ||
| measurement -> { | ||
| long free = extOsMxBean.getFreeMemorySize(); | ||
| if (free > 0) measurement.record(free); |
There was a problem hiding this comment.
Same comment as above but having 0 free memory could be possible?
| if (systemMemoryTotalGauge != null) { | ||
| systemMemoryTotalGauge.close(); | ||
| systemMemoryTotalGauge = null; | ||
| } | ||
| if (systemMemoryFreeGauge != null) { | ||
| systemMemoryFreeGauge.close(); | ||
| systemMemoryFreeGauge = null; | ||
| } |
There was a problem hiding this comment.
Close with closeQuietly then set to null in finally might be cleaner.
| @@ -0,0 +1,9 @@ | |||
| title: Add jvm_system_memory_bytes and jvm_system_memory_free_bytes gauges exposing | |||
| total and free physical (host or container) memory to the Prometheus metrics endpoint. | |||
There was a problem hiding this comment.
I think its better to just drop Prometheus metrics endpoint. Just metrics is fine since we are not only prometheus anymore.
| systemMemoryTotalGauge = | ||
| solrMetricManager.observableLongGauge( | ||
| registryName, | ||
| "jvm.system.memory", | ||
| "Total physical memory of the host or container in bytes." | ||
| + " On Linux with cgroup limits, reflects the container memory limit.", | ||
| measurement -> { | ||
| long total = extOsMxBean.getTotalMemorySize(); | ||
| if (total > 0) measurement.record(total); | ||
| }, | ||
| OtelUnit.BYTES); | ||
| systemMemoryFreeGauge = | ||
| solrMetricManager.observableLongGauge( | ||
| registryName, | ||
| "jvm.system.memory.free", | ||
| "Free (unused) physical memory of the host or container in bytes.", | ||
| measurement -> { | ||
| long free = extOsMxBean.getFreeMemorySize(); | ||
| if (free > 0) measurement.record(free); | ||
| }, | ||
| OtelUnit.BYTES); |
There was a problem hiding this comment.
WDYT to make it just jvm.system.memory as the name and differ by attribute like "state=total/freeortype` because the prefixes are the same.
Or jvm.system.memory -> jvm_system_memory_max? I just want to better have it in the names that this is total without having the suffix total because then it will be confused with counters as total is reserved for that type.
More specifically
jvm_system_memory_free_bytesandjjvm_system_memory_total_byteshttps://issues.apache.org/jira/browse/SOLR-18159
Disclaimer, this can been co-written with Claude Code