feat(host_metrics source): add temperature metrics collector #25607
base: master
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| The `host_metrics` source can now collect hardware temperature readings via a | ||
| new `temperature` collector. When enabled, it emits `temperature_celsius`, | ||
| `temperature_max_celsius`, and `temperature_critical_celsius` gauges, each | ||
| tagged with the `component` label of the sensor it was read from. | ||
|
|
||
| The collector is opt-in: add `temperature` to the `collectors` list to enable | ||
| it. Components that do not report a given value (for example a missing critical | ||
| threshold) are skipped, and environments without temperature sensors simply | ||
| produce no metrics. | ||
|
|
||
| authors: somaz94 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,66 @@ | ||
| use sysinfo::Components; | ||
| use vector_lib::metric_tags; | ||
|
|
||
| use super::HostMetrics; | ||
|
|
||
| const COMPONENT: &str = "component"; | ||
| const TEMPERATURE_CELSIUS: &str = "temperature_celsius"; | ||
| const TEMPERATURE_MAX_CELSIUS: &str = "temperature_max_celsius"; | ||
| const TEMPERATURE_CRITICAL_CELSIUS: &str = "temperature_critical_celsius"; | ||
|
|
||
| impl HostMetrics { | ||
| pub async fn temperature_metrics(&self, output: &mut super::MetricsBuffer) { | ||
| output.name = "temperature"; | ||
| let components = Components::new_with_refreshed_list(); | ||
|
In containerized host-metrics deployments that mount the host sysfs somewhere like |
||
| for component in &components { | ||
| let label = component.label(); | ||
| let tags = || metric_tags!(COMPONENT => label); | ||
|
Comment on lines +16 to +17
On Linux systems where |
||
| if let Some(temperature) = component.temperature() { | ||
| output.gauge(TEMPERATURE_CELSIUS, temperature as f64, tags()); | ||
| } | ||
| if let Some(max) = component.max() { | ||
| output.gauge(TEMPERATURE_MAX_CELSIUS, max as f64, tags()); | ||
|
Comment on lines +18 to +22
On Linux, |
||
| } | ||
| if let Some(critical) = component.critical() { | ||
| output.gauge(TEMPERATURE_CRITICAL_CELSIUS, critical as f64, tags()); | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| #[cfg(test)] | ||
| mod tests { | ||
| use super::{ | ||
| super::{HostMetrics, HostMetricsConfig, MetricsBuffer, tests::all_gauges}, | ||
| COMPONENT, | ||
| }; | ||
|
|
||
| #[tokio::test] | ||
| async fn generates_temperature_metrics() { | ||
| let mut buffer = MetricsBuffer::new(None); | ||
| HostMetrics::new(HostMetricsConfig::default()) | ||
| .temperature_metrics(&mut buffer) | ||
| .await; | ||
| let metrics = buffer.metrics; | ||
|
|
||
| // Temperature sensors are not exposed in many environments (containers, | ||
| // virtual machines, CI runners), so the component list can legitimately | ||
| // be empty. When metrics are produced, they must all be gauges named | ||
| // `temperature*` and carry the `component` tag. | ||
| assert!(all_gauges(&metrics)); | ||
| for metric in &metrics { | ||
| assert!( | ||
| metric.name().starts_with("temperature"), | ||
| "unexpected metric name: {}", | ||
| metric.name() | ||
| ); | ||
| assert!( | ||
| metric | ||
| .tags() | ||
| .expect("temperature metric is missing tags") | ||
| .contains_key(COMPONENT), | ||
| "temperature metric is missing the `component` tag" | ||
| ); | ||
| } | ||
| } | ||
| } | ||
When a Linux sensor does not expose a kernel `tempN_highest` file, `sysinfo::Component::max()` is computed by comparing successive refreshes of the same `Component`. Recreating `Components` on every `temperature_metrics` call resets that history, so `temperature_max_celsius` becomes the current sample on each scrape rather than the highest observed temperature. Keep the `Components` collection on `HostMetrics` and refresh it between scrapes, or avoid emitting the computed max when no persistent history is available.
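
To make the concern in the review comment above concrete, here is a minimal sketch of the two scrape strategies. It does not use `sysinfo`: `FakeComponent`, `scrape_recreating`, and `scrape_persistent` are hypothetical stand-ins invented for illustration, and the sketch assumes only what the comment describes, namely that the max is derived by comparing successive refreshes of the same object.

```rust
/// Hypothetical stand-in for a sensor whose max is derived from refresh
/// history. This is NOT the `sysinfo` API, only a model of the behavior
/// described in the review comment.
struct FakeComponent {
    current: f32,
    max: f32,
}

impl FakeComponent {
    /// A freshly created component has no history, so max == current.
    fn new(reading: f32) -> Self {
        Self { current: reading, max: reading }
    }

    /// Refreshing the same component keeps the highest value seen so far.
    fn refresh(&mut self, reading: f32) {
        self.current = reading;
        self.max = self.max.max(reading);
    }
}

/// Models the reviewed code: a new component per scrape.
/// Returns (current, max) for each scrape.
fn scrape_recreating(readings: &[f32]) -> Vec<(f32, f32)> {
    readings
        .iter()
        .map(|&r| {
            let c = FakeComponent::new(r);
            (c.current, c.max)
        })
        .collect()
}

/// Models the suggested fix: one long-lived component, refreshed between
/// scrapes. Returns (current, max) for each scrape.
fn scrape_persistent(readings: &[f32]) -> Vec<(f32, f32)> {
    let mut iter = readings.iter().copied();
    let Some(first) = iter.next() else {
        return Vec::new();
    };
    let mut c = FakeComponent::new(first);
    let mut out = vec![(c.current, c.max)];
    for r in iter {
        c.refresh(r);
        out.push((c.current, c.max));
    }
    out
}
```

With readings of 50, 62, and 55 degrees, the recreating variant reports a max of 55 on the third scrape, while the persistent variant still reports 62. Whether the real collector should hold the collection on `HostMetrics` or skip the computed max entirely is the design choice the comment leaves open.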