feat!: enable non-blocking DNS for reqwest#1558
feat!: enable non-blocking DNS for reqwest#1558gh-worker-dd-mergequeue-cf854d[bot] merged 8 commits intomainfrom
Conversation
31fe0f4 to
af8e147
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1558 +/- ##
==========================================
+ Coverage 70.85% 71.24% +0.39%
==========================================
Files 424 423 -1
Lines 61963 62130 +167
==========================================
+ Hits 43903 44267 +364
+ Misses 18060 17863 -197
🚀 New features to boost your workflow:
|
BenchmarksComparisonBenchmark execution time: 2026-02-19 21:07:05 Comparing candidate commit 299c424 in PR branch Found 6 performance improvements and 8 performance regressions! Performance is the same for 43 metrics, 2 unstable metrics. scenario:credit_card/is_card_number/37828224631000521389798
scenario:credit_card/is_card_number/x371413321323331
scenario:credit_card/is_card_number_no_luhn/ 378282246310005
scenario:credit_card/is_card_number_no_luhn/378282246310005
scenario:credit_card/is_card_number_no_luhn/37828224631000521389798
scenario:credit_card/is_card_number_no_luhn/x371413321323331
scenario:normalization/normalize_service/normalize_service/[empty string]
CandidateCandidate benchmark detailsGroup 1
Group 2
Group 3
Group 4
Group 5
Group 6
Group 7
Group 8
Group 9
Group 10
Group 11
Group 12
Group 13
Group 14
Group 15
Group 16
Group 17
Group 18
Group 19
BaselineOmitted due to size. |
af8e147 to
8354f61
Compare
Artifact Size Benchmark Reportaarch64-alpine-linux-musl
aarch64-unknown-linux-gnu
libdatadog-x64-windows
libdatadog-x86-windows
x86_64-alpine-linux-musl
x86_64-unknown-linux-gnu
|
8354f61 to
6389231
Compare
ivoanjo
left a comment
There was a problem hiding this comment.
Nice, this was way easier than I was expecting! I've left you a question in PM, otherwise happy to approve :)
6389231 to
3b3f4e0
Compare
3b3f4e0 to
af2d1c7
Compare
af2d1c7 to
0dca02a
Compare
Clippy Allow Annotation ReportComparing clippy allow annotations between branches:
Summary by Rule
Annotation Counts by File
Annotation Stats by Crate
About This ReportThis report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality. |
ivoanjo
left a comment
There was a problem hiding this comment.
I really like this! Left a few notes :)
| # | ||
| # `rustls-platform-verifier` uses the Windows certificate APIs (Cert*), which live in Crypt32. | ||
| "ws2_32.lib", "advapi32.lib", "userenv.lib", "ntdll.lib", "bcrypt.lib", "ole32.lib", "crypt32.lib" | ||
| "advapi32.lib", | ||
| "bcrypt.lib", | ||
| # `rustls-platform-verifier` uses the Windows certificate APIs (Cert*), which live in Crypt32 | ||
| "crypt32.lib", | ||
| # `hickory-resolver` uses GetAdaptersAddresses, which lives in iphlpapi. | ||
| "iphlpapi.lib", | ||
| "ntdll.lib", | ||
| "ole32.lib", | ||
| "userenv.lib", | ||
| "ws2_32.lib" |
There was a problem hiding this comment.
(Note to other reviewers -- same list of libraries reordered, with only iphlpapi.lib being new)
| PowrProf | ||
| Version) | ||
| Version | ||
| iphlpapi) |
There was a problem hiding this comment.
This part I'm not confident at all in -- maybe worth confirming with @gleocadie that this new dll dependency is fine?
libdd-common/src/lib.rs
Outdated
| let use_hickory = std::env::var("DD_USE_HICKORY_DNS") | ||
| .ok() | ||
| .as_ref() | ||
| .map(|s| matches!(s.trim().to_lowercase().as_str(), "1" | "true" | "yes")) | ||
| .unwrap_or(false); | ||
| builder = builder.hickory_dns(use_hickory); |
There was a problem hiding this comment.
Regardless of the default (I'll comment on that separately), DD_USE_HICKORY_DNS seems to be a bit... too low level?
Maybe flip it around and use DD_FORCE_SYSTEM_DNS/DD_USE_LEGACY_DNS? Or, if not flipping it, DD_USE_BUILTIN_DNS?
libdd-common/src/lib.rs
Outdated
| // DNS resolver: use hickory only when DD_USE_HICKORY_DNS is set to a truthy value | ||
| // (e.g. 1, true, yes). Otherwise use the system resolver. |
There was a problem hiding this comment.
Unless I'm mistaken, only profiling is using the to_reqwest_client_builder. Being that the case, since it's already quite limited subset, I'd go with perhaps using hickory_dns by default, and then dropping a note to every library using libdatadog to perhaps include the env var as a release note in case anyone sees any resolver-related issues?
Also, and I think this would make testing a lot easier -- maybe we could take advantage of the above in an additional way: make the use_hickory a regular argument of this method, and have the env var parsing be part of the caller for now.
This enables easier testing of common AND enables us to evolve a bit how this toggling happens without ever needing to touch this code (which seems nice?)
| /// Client resolves host via hickory when DD_USE_HICKORY_DNS is set. | ||
| /// Uses http://example.com/ so DNS is actually exercised; example.com is reserved by | ||
| /// RFC 2606 for documentation and testing. These tests require network access. | ||
| #[tokio::test] | ||
| #[cfg_attr(miri, ignore)] | ||
| async fn test_hickory_dns_env_enabled() { | ||
| let _guard = EnvGuard::set("DD_USE_HICKORY_DNS", "1"); | ||
| let endpoint = Endpoint::from_slice("http://example.com/"); | ||
|
|
||
| let (builder, url) = endpoint | ||
| .to_reqwest_client_builder() | ||
| .expect("should build client"); | ||
| let client = builder.build().expect("should create client"); | ||
|
|
||
| let response = client | ||
| .get(&url) | ||
| .send() | ||
| .await | ||
| .expect("request should succeed"); | ||
| assert!( | ||
| response.status().is_success() || response.status().is_redirection(), | ||
| "status: {}", | ||
| response.status() | ||
| ); | ||
| } | ||
|
|
||
| /// Client resolves host via system resolver when DD_USE_HICKORY_DNS is unset. | ||
| /// Uses http://example.com/ so DNS is actually exercised; example.com is reserved by | ||
| /// RFC 2606 for documentation and testing. These tests require network access. | ||
| #[tokio::test] | ||
| #[cfg_attr(miri, ignore)] | ||
| async fn test_hickory_dns_env_disabled() { | ||
| let _guard = EnvGuard::remove("DD_USE_HICKORY_DNS"); | ||
| let endpoint = Endpoint::from_slice("http://example.com/"); | ||
|
|
||
| let (builder, url) = endpoint | ||
| .to_reqwest_client_builder() | ||
| .expect("should build client"); | ||
| let client = builder.build().expect("should create client"); | ||
|
|
||
| let response = client | ||
| .get(&url) | ||
| .send() | ||
| .await | ||
| .expect("request should succeed"); | ||
| assert!( | ||
| response.status().is_success() || response.status().is_redirection(), | ||
| "status: {}", | ||
| response.status() | ||
| ); |
There was a problem hiding this comment.
Hmmm... this is quite brittle -- for instance reading the docs:
It sounds like hickory may even be used by default even if the feature isn't enabled...? So there's a non-zero chance this test is not only brittle, but possibly misleading?
Looking at the docs, it looks like the reqwest::Client::builder() has also a dns_resolver parameter where a resolver can be passed in.
What if we allowed the resolver to be injected? E.g. basically reproduce what the match config.hickory_dns is doing in https://docs.rs/reqwest/latest/src/reqwest/async_impl/client.rs.html#421 ?
That way our tests could become:
fn test_without_hickory() {
let resolver = GaiResolver::new();
let builder = to_reqwest_client_builder(resolver);
let response = ...
// maybe somehow assert resolver got called?
}
fn test_with_hickory() {
let resolver = HickoryDnsResolver::default();
let builder = to_reqwest_client_builder(resolver);
let response = ...
// maybe somehow assert resolver got called? perhaps even by setting some config where only hickory can resolve?
}
| // Phase 1: hickory enabled — create client, send request, drop client, count threads. | ||
| let threads_after_hickory = { | ||
| let _guard = EnvGuard::set("DD_USE_HICKORY_DNS", "1"); | ||
| let (builder, url) = endpoint | ||
| .to_reqwest_client_builder() | ||
| .expect("should build client"); | ||
| let client = builder.build().expect("should create client"); | ||
| let _ = client.get(&url).send().await; | ||
| drop(client); | ||
| drop(_guard); | ||
| count_active_threads().expect("count_active_threads not supported on this platform") | ||
| }; | ||
|
|
||
| // Phase 2: system resolver — create client, send request, count threads (client alive). | ||
| let threads_with_system = { | ||
| let _guard = EnvGuard::remove("DD_USE_HICKORY_DNS"); | ||
| let (builder, url) = endpoint | ||
| .to_reqwest_client_builder() | ||
| .expect("should build client"); | ||
| let client = builder.build().expect("should create client"); | ||
| let _ = client.get(&url).send().await; | ||
| let count = count_active_threads() | ||
| .expect("count_active_threads not supported on this platform"); | ||
| drop(_guard); | ||
| count | ||
| }; |
There was a problem hiding this comment.
This test seems to be somewhat... weird/unfair. That is why do we drop the client before we do the check for hickory but not the system dns? Doesn't that mean that if hickory did actually use an extra thread but cleaned it up on drop, we wouldn't see it?
E.g. I think the test should maybe be exactly the same between both variants, the only difference should be the actual toggling on and off, and us observing the different number of threads, not the order of operations.
| assert_eq!( | ||
| threads_with_system, | ||
| threads_after_hickory + 1, | ||
| "system resolver should use one more thread than hickory (hickory: {}, system: {})", | ||
| threads_after_hickory, | ||
| threads_with_system | ||
| ); |
There was a problem hiding this comment.
Is this test... going to be flaky? E.g. I can imagine we're relying on things such as "how long does the background extra thread spawned for dns stay alive" -- or would it stay alive forever? Might be worth doing a quick check that this test is not relying too much on a race.
468316b to
ff80914
Compare
|
✅ Tests 🎉 All green!❄️ No new flaky tests detected 🔗 Commit SHA: 299c424 | Docs | Datadog PR Page | Was this helpful? Give us feedback! |
ff80914 to
c1504f3
Compare
c1504f3 to
71aba36
Compare
ivoanjo
left a comment
There was a problem hiding this comment.
Gave it another pass! Overall I think it's fine although:
- It's weird to me to expose "hickory-dns" anywhere beyond the core
- The way we're enabling/disabling hickory, as well as testing it, is a bit brittle and I think even staying within rust's weird constraints it's possible to further improve the tests
libdd-common-ffi/src/endpoint.rs
Outdated
| }; | ||
| } | ||
|
|
||
| /// Set whether to use the system DNS resolver instead of hickory-dns when building the reqwest |
There was a problem hiding this comment.
I think in general the words "hickory-dns" are an implementation detail that is weird to see showing up outside of the actual configuration of reqwest in common.
That is -- we could as well use any other dns resolver, or even roll out our own, the key distinction between both modes is "using the system resolver or not" ;)
| /// Client resolves host via hickory when use_system_resolver is false. | ||
| /// Uses http://example.com/ so DNS is actually exercised; example.com is reserved by | ||
| /// RFC 2606 for documentation and testing. These tests require network access. | ||
| #[tokio::test] | ||
| #[cfg_attr(miri, ignore)] | ||
| async fn test_hickory_dns_when_system_resolver_disabled() { | ||
| let endpoint = Endpoint::from_slice("http://example.com/").with_system_resolver(false); | ||
|
|
||
| let (builder, url) = endpoint | ||
| .to_reqwest_client_builder() | ||
| .expect("should build client"); | ||
| let client = builder.build().expect("should create client"); | ||
|
|
||
| let response = client | ||
| .get(&url) | ||
| .send() | ||
| .await | ||
| .expect("request should succeed"); | ||
| assert!( | ||
| response.status().is_success() || response.status().is_redirection(), | ||
| "status: {}", | ||
| response.status() | ||
| ); | ||
| } | ||
|
|
||
| /// Client resolves host via system resolver when with_system_resolver(true) is used. | ||
| /// Uses http://example.com/ so DNS is actually exercised; example.com is reserved by | ||
| /// RFC 2606 for documentation and testing. These tests require network access. | ||
| #[tokio::test] | ||
| #[cfg_attr(miri, ignore)] | ||
| async fn test_system_resolver_when_requested() { | ||
| let endpoint = Endpoint::from_slice("http://example.com/").with_system_resolver(true); | ||
|
|
||
| let (builder, url) = endpoint | ||
| .to_reqwest_client_builder() | ||
| .expect("should build client"); | ||
| let client = builder.build().expect("should create client"); | ||
|
|
||
| let response = client | ||
| .get(&url) | ||
| .send() | ||
| .await | ||
| .expect("request should succeed"); | ||
| assert!( | ||
| response.status().is_success() || response.status().is_redirection(), | ||
| "status: {}", | ||
| response.status() | ||
| ); | ||
| } |
There was a problem hiding this comment.
Idk, I'm still not a fan of this test as-is: it can still pass even if hickory is both used for both tests OR never used in either test AND requires network: so double whammy of meh things.
Yes, we have the "saves thread" test below, but I think it's really brittle to rely on it. And, well, if we're relying on it, then these two aren't doing anything and we should just delete them.
(I'd still recommend going with the option of allowing the resolver to be injected directly, and then having helper methods to set hickory/system still, so callers didn't themselves need to do that. That's what I'd do if I was implementing this in Ruby -- although I'd use argument defaults to make it a bit nicer)
There was a problem hiding this comment.
Tried updating the tests.
71aba36 to
83b2e93
Compare
| /// TODO: Even the in-process resolver can lead to long-lived threads that outlast the | ||
| /// runtime (e.g. on OSX the "Grand Central Dispatch" thread). This should be | ||
| /// investigated so we can tighten or simplify the assertions. |
There was a problem hiding this comment.
If the extra threads are on macOS, can we relax the assertion only there? If we use the stricter assertions on Linux, we might be able to catch a "oops" that the relaxed assertions for macOS might let slip.
There was a problem hiding this comment.
It appears to also be linux, I just have not investigated those as carefully. Going to merge this since its an improvement, then keep looking to try to understand the other threads.

What does this PR do?
Motivation
https://github.com/DataDog/dd-source/pull/355564
The non-use of the blocking pool is particularly nice, since it lets us avoid the uncontrolled thread that can break things on forks.
Additional Notes
Anything else we should know when reviewing?
How to test the change?
Describe here in detail how the change can be validated.