Flow eviction and LFT/UFT lifecycle rework#990
Conversation
42acc3e to
7d0fd8b
Compare
This PR reworks the LFT/UFT lifecycle so that inbound/outbound entries are correctly bound by a single lifetime, and so that a slowpath packet tracks all LFTs which were used in its handling. We then associate this list of LFTs with the flow UFT, and the TCP flow entry if created. This allows us to correctly keep LFTs alive so long as a UFT/TCP entry refers to them. This has been a longstanding bug. The new and really useful feature is that this allows us to remove all related and dependent entries when we deliberately evict an LFT to make space for a new flow. An expiry policy for any table can now also specifiy an "eviction priority" for a given flow. Generally speaking most flows are unevictable so long as they are well-behaved, but the main issue is that we need information from elsewhere to *know* when we can make this determination. How we tackle this is by looking at all descendent flows which rely on a given flow. Most will state no preference on the flow lifecycle, but the TCP flow entry can return a non-zero priority when inactive in a given state for too long. Closes #986. Closes #788. Closes #789.
7d0fd8b to
c8bfdd9
Compare
7711592 to
2b7de6d
Compare
d09ad98 to
b1fb3c2
Compare
f37152d to
1df13fe
Compare
|
From what I can tell this appears to have the same throughput on |
rcgoodfellow
left a comment
There was a problem hiding this comment.
Thanks for putting this together @FelixMcFelix. Questions and comments follow.
On the #986 front, while this PR does not introduce the provisional table we discussed, it looks like it does satisfy properties (1) and (2) from that ticket and therefore would satisfy the intent of the ticket. For (1) we only mark connections as eligible for eviction if the last hit is older than the TTL for the connection state they are in. For (2) the priority based eviction seems to cover that, but I have some questions on how it's implemented below.
|
|
||
| /// Operations that the main flow identifier type for a given [`NetworkImpl`] | ||
| /// must support. | ||
| // This should live in `opte::api`, but we don't want to introduce an |
There was a problem hiding this comment.
Let's file a follow up issue for this.
| // should be immune to timer-driven expiry. They are *not* immune to | ||
| // explicit eviction. | ||
| // | ||
| // We don't arrange the UFT as a parent of the TCP flow entry because |
There was a problem hiding this comment.
Just food for thought, I wonder if the three way relationship between UFT/LFT/TFT is more trilateral in nature than parent/child. E.g. we just want to have the three way associations in place through pointers to be able to have policy over all three that may evolve over time and make the parent/child relationship less of a thing.
| Direction::Out => &mut inner.outbound_parents, | ||
| }; | ||
|
|
||
| core::mem::swap(new_parent_slot, &mut parents); |
There was a problem hiding this comment.
What happens if the reaper runs when the parent/child lists are potentially inconsistent between the parent vec swap here and the child addition/removals below? (or other threads that may hit this inconsistent window, consecutive slowpath packets on the same flow?)
| // Ordinarily we will expire flows in these states quickly. If | ||
| // we are under table pressure, we can allow them to be | ||
| // explicitly selected for removal faster. | ||
| a if a > self.incipient_ttl.as_milliseconds() / 2 => { |
There was a problem hiding this comment.
Consider making the 2 a constant with some rationale on why this divisor was selected.
| // quickly, but leave them in place for the ~120s when we are | ||
| // not under pressure. | ||
| a if a > self.incipient_ttl.as_milliseconds() => { | ||
| Some(EvictionPriority::Evictable(NonZeroU16::MIN)) |
There was a problem hiding this comment.
For flows that are beyond the TTL, should the priority reflect how far beyond the TTL they are, something like so the flows furthest beyond their TTL get evicted first? Same question for line 3182 below.
| // Allow established flows to be evicted on the same time scale as | ||
| // UFT/LFT expiry, but if we are not under pressure then we aim to | ||
| // keep the LFTs in place even while the flow is inactive. | ||
| TcpState::Established |
There was a problem hiding this comment.
For the established case, we may want to have this be tunable per OPTE port. Maybe I'm not totally understanding the model, but if we get to a place where a flow exists for a TCP session with a sparse communication pattern that has intervals of silence beyond 60 seconds, the flow will stay in place until the system comes under pressure, then the flow may be evicted in favor of new flows due to high connection churn. In that situation, is the flow dead or does it just encounter a slowpath speed bump? If dead, that seems rather unfortunate and I think we'd want to provide a knob for users to tune FLOW_DEF_TTL for their applications.
| } | ||
| Some((EvictionKey::Evictable(curr_prio, curr_time), ..)) | ||
| if prio > curr_prio | ||
| || (prio == curr_prio && last_hit < curr_time) => |
There was a problem hiding this comment.
Is last hit what we want here, or do we want amount of time beyond TTL, as the TTL can pretty different depending on what state the flow is in? This is related to the question earlier on how we set the eviction priority when looking at the delta relative to TTL in TcpExpiry::eviction_priority.
| /// Return an iterator containing references to all flow entries from other | ||
| /// tables which underpin `self`. | ||
| fn parents(&self) -> impl Iterator<Item = Arc<dyn FlowEntryInfo>> { | ||
| [].into_iter() |
There was a problem hiding this comment.
I feel like this is a bit of a bug magnet if a new impl forgets or does not realize they need to implement this. Consider removing the default and making impls fill in the method?
This PR reworks the LFT/UFT lifecycle so that inbound/outbound entries are correctly bound by a single lifetime, and so that a slowpath packet tracks all LFTs which were used in its handling. We then associate this list of LFTs with the flow UFT, and the TCP flow entry if created.
This allows us to correctly keep LFTs alive so long as a UFT/TCP entry refers to them. This has been a longstanding bug. The new and really useful feature is that this allows us to remove all related and dependent entries when we deliberately evict an LFT to make space for a new flow.
An expiry policy for any table can now also specifiy an "eviction priority" for a given flow. Generally speaking most flows are unevictable so long as they are well-behaved, but the main issue is that we need information from elsewhere to know when we can make this determination. How we tackle this is by looking at all descendent flows which rely on a given flow. Most will state no preference on the flow lifecycle, but the TCP flow entry can return a non-zero priority when inactive in a given state for too long.
Closes #986.
Closes #788.
Closes #789.