Conversation
Created using jj-spr 0.1.0
There was a problem hiding this comment.
license-eye has checked 350 files.
| Valid | Invalid | Ignored | Fixed |
|---|---|---|---|
| 262 | 1 | 87 | 0 |
Click to see the invalid file list
- lib/propolis/src/vsock/poller_stub.rs
Use this command to fix any missing license headers
```bash
docker run -it --rm -v $(pwd):/github/workspace apache/skywalking-eyes header fix
</details>
Created using jj-spr 0.1.0
|
There will be a few follow up commits but I wanted to get this up for some review time while that stuff is in progress. cc @dancrossnyc if you have review cycles. |
Created using jj-spr 0.1.0
| if let Some(idx) = mem.read::<u16>(self.gpa_idx) { | ||
| let ndesc = Wrapping(*idx) - self.cur_avail_idx; | ||
| if ndesc.0 != 0 && ndesc.0 < rsize { | ||
| if ndesc.0 != 0 && ndesc.0 <= rsize { |
There was a problem hiding this comment.
is this a fix for a previous existing bug?
There was a problem hiding this comment.
Yup, turns out if you ask for n descriptors and the guest gives you them you should do the right thing!
| // If the data can fit in the remaining space of the ring buffer copy it | ||
| // in one go. | ||
| if data.len() <= available_len { | ||
| self.buf[head_offset..head_offset + data.len()] |
There was a problem hiding this comment.
nit: a cute little trick @cbiffle showed me for slicing something to offset..len is that you can alternatively write:
| self.buf[head_offset..head_offset + data.len()] | |
| self.buf[head_offset..][...data.len()] |
There was a problem hiding this comment.
Oh that's cute. I assume that compiles down to the same code as the first version?
There was a problem hiding this comment.
Sadly, it's actually slightly worse (per this Godbolt), as the version with two slice indexing expressions generates two unique panic sites (as they have slightly different source locations).
Sigh.
| #[repr(C, packed)] | ||
| #[derive(Copy, Clone, Default, Debug)] | ||
| pub struct VsockPacketHeader { |
There was a problem hiding this comment.
nit, not too important: perhaps we ought to link to someplace like https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-39000010 as a reference for the layout of these headers?
| fwd_cnt: u32, | ||
| } | ||
|
|
||
| impl VsockPacketHeader { |
There was a problem hiding this comment.
this is somewhat coming out of left field and it's not super important, but it occurs to me that we might consider using the zerocopy crate for these structures. obviously, the biggest reason to reach for it is that it would allow us to write directly into a byte buffer or read fields out of that buffer in a structured way without having to copy all the bytes around (which may not really have a huge performance impact here) but it also has a rather neat zerocopy::byteorder module for defining structures of integers with explicit endiannesses which might make some of this code a bit simpler. on the other hand, it might just be a bunch of effort that isn't really worth it here. i dunno.
There was a problem hiding this comment.
I like this suggestion. The need for packed here at the moment is unfortunately, but Zerocopy handles the padding issue. My count is that this struct is 44 bytes long, but aligned to an 8-byte boundary, which means space at the end. Ugh.
| "dropping connect request to unknown mapping"; | ||
| "packet" => ?packet, |
There was a problem hiding this comment.
nit, take it or leave it: perhaps we ought to explicitly log the unknown port here too?
There was a problem hiding this comment.
I am logging the packet in this case since it has both the src and dst ports.
| /// Bytes we've consumed from vbuf (forwarded to socket). | ||
| fwd_cnt: Wrapping<u32>, | ||
| /// The fwd_cnt value we last sent to the guest in a credit update. | ||
| last_fwd_cnt_sent: Wrapping<u32>, | ||
| /// Bytes we've sent to the guest from the socket. | ||
| tx_cnt: Wrapping<u32>, | ||
| /// Guest's buffer allocation. | ||
| peer_buf_alloc: u32, | ||
| /// Bytes the guest has consumed from their buffer. | ||
| peer_fwd_cnt: Wrapping<u32>, |
There was a problem hiding this comment.
maybe worth commentary on some of these explaining why they're wrapping?
lib/propolis/src/hw/virtio/vsock.rs
Outdated
| Arc::new(Self { cid, backend, virtio_state, pci_state }) | ||
| } | ||
|
|
||
| // fn _send_transport_reset(&self) { |
There was a problem hiding this comment.
Did you intend to leave this in, but commented-out?
There was a problem hiding this comment.
I removed it for now. It's needed to support live migration... it can be added back in a follow up PR.
lib/propolis/src/vsock/packet.rs
Outdated
| type VsockSocketType = u16; | ||
|
|
||
| /// Shutdown flags for VIRTIO_VSOCK_OP_SHUTDOWN | ||
| pub const VIRTIO_VSOCK_SHUTDOWN_F_RECEIVE: u32 = 1; |
There was a problem hiding this comment.
For these, consider using bitflags::bitflags! or something similar?
lib/propolis/src/vsock/packet.rs
Outdated
| // The spec states: | ||
| // | ||
| // The upper 32 bits of src_cid and dst_cid are reserved and zeroed. | ||
| u64::from_le(self.dst_cid) & u64::from(u32::MAX) |
There was a problem hiding this comment.
There's a crate called bit_field that can help with this: https://docs.rs/bit_field/latest/bit_field/
You can write something like:
use bit_field::BitField;
pub fn src_cid(&self) -> u64 {
u64::from_le(self.src_cid).get_bits(0..32)
}There was a problem hiding this comment.
done, as propolis already uses it!
lib/propolis/src/vsock/packet.rs
Outdated
| } | ||
|
|
||
| impl VsockPacket { | ||
| pub fn new_reset(guest_cid: u32, src_port: u32, dst_port: u32) -> Self { |
There was a problem hiding this comment.
You appear to set every field in the struct, so using ::default() to create a mut struct and then a change of .set_... feels like it could just be:
pub fn new_reset(guest_cid: u32, src_port: u32, dst_port: u32) -> Self {
VsockPacketHeader {
src_cid: VSOCK_HOST_CID as u32,
dst_cid: guest_cid,
src_port,
dst_port,
len: 0,
socket_type: VIRTIO_VSOCK_TYPE_STREAM,
op: VIRTIO_VSOCK_OP_RST,
buf_alloc: 0,
fwd_cnt: 0,
}
}Or, given the enums above, this could perhaps be:
pub fn new_reset(guest_cid: u32, src_port: u32, dst_port: u32) -> Self {
let dst_cid = guest_cid.into();
let src_cid = HOST_CID.into();
let socket_type = SocketType::Stream as u16;
let op = PacketOp::Reset as u16;
Self {
dst_cid,
src_cid,
src_port,
dst_port,
socket_type,
op,
..Default::default()
}
}For that matter, you could have a (private) "generic" new that takes the op, and delegate the rest to specializations of that:
pub fn new_reset(guest_cid: u32, src_port: u32, dst_port: u32) -> Self {
let dst_cid = guest_cid.into();
let src_cid = HOST_CID.into();
let socket_type = SocketType::Stream as u16;
let op = VsockPacketOp::Reset as u16;
Self {
dst_cid,
src_cid,
src_port,
dst_port,
socket_type,
op,
..Default::default()
}
}And similarly with the other constructors, elsewhere.
Or, you could write a "common" private new and implement everything else in terms of that:
fn new(guest_cid: u32, src_port: u32, dst_port: u32, op: PacketOp) -> Self {
let dst_cid = guest_cid.into();
let src_cid = HOST_CID.into();
let socket_type = SocketType::Stream as u16;
let op = op as u16;
Self {
dst_cid,
src_cid,
src_port,
dst_port,
socket_type,
op,
..Default::default()
}
}
pub fn new_reset(guest_cid: u32, src_port: u32, dst_port: u32) -> Self {
Self::new(guest_cid, src_port, dst_port, PacketOp::Reset)
}
pub fn new_response(guest_cid: u32, src_port: u32, dst_port: u32, buf_alloc: u32) -> Self {
Self {
buf_alloc,
..Self::new(guest_cid, src_port, dst_port, PacketOp::Response)
}
}
pub fn new_rw(
guest_cid: u32,
src_port: u32,
dst_port: u32,
buf_alloc: u32,
fwd_cnt: u32,
data: &[u8],
) -> Self {
let len = data.len() as u32;
Self {
len,
buf_alloc,
fwd_cnt,
..Self::new(guest_cid, src_port, dst_port, PacketOp::ReadWrite)
}
}
pub fn new_credit_udpate(
guest_cid: u32,
src_port: u32,
dst_port: u32,
buf_alloc: u32,
fwd_cnt: u32,
) -> Self {
Self {
buf_alloc,
fwd_cnt,
..Self::new(guest_cid, src_port, dst_port, PacketOp::UpdateCredit)
}
}
pub fn new_shutdown(guest_cid: u32, src_port: u32, dst_port: u32, flags: u32) -> Self {
Self {
flags,
..Self::new(guest_cid, src_port, dst_port, PacketOp::Shutdown)
}
}Or, finally, if new is implemented in terms of explicit initialization of all members, then you can make it const, and make all of the associated constructors similarly const:
| pub fn new_reset(guest_cid: u32, src_port: u32, dst_port: u32) -> Self { | |
| const fn new(guest_cid: u32, src_port: u32, dst_port: u32, op: PacketOp) -> Self { | |
| Self { | |
| dst_cid: guest_cid as u64, | |
| src_cid: HOST_CID as u64, | |
| src_port, | |
| dst_port, | |
| len: 0, | |
| socket_type: SocketType::Stream as u16, | |
| op: op as u16, | |
| flags: 0, | |
| buf_alloc: 0, | |
| fwd_cnt: 0, | |
| } | |
| } | |
| pub const fn new_reset(guest_cid: u32, src_port: u32, dst_port: u32) -> Self { | |
| Self::new(guest_cid, src_port, dst_port, PacketOp::Reset) | |
| } | |
| pub const fn new_response( | |
| guest_cid: u32, | |
| src_port: u32, | |
| dst_port: u32, | |
| buf_alloc: u32, | |
| ) -> Self { | |
| Self { | |
| buf_alloc, | |
| ..Self::new(guest_cid, src_port, dst_port, PacketOp::Response) | |
| } | |
| } | |
| pub const fn new_rw( | |
| guest_cid: u32, | |
| src_port: u32, | |
| dst_port: u32, | |
| buf_alloc: u32, | |
| fwd_cnt: u32, | |
| data: &[u8], | |
| ) -> Self { | |
| let len = data.len() as u32; | |
| Self { | |
| len, | |
| buf_alloc, | |
| fwd_cnt, | |
| ..Self::new(guest_cid, src_port, dst_port, PacketOp::ReadWrite) | |
| } | |
| } | |
| pub const fn new_credit_udpate( | |
| guest_cid: u32, | |
| src_port: u32, | |
| dst_port: u32, | |
| buf_alloc: u32, | |
| fwd_cnt: u32, | |
| ) -> Self { | |
| Self { | |
| buf_alloc, | |
| fwd_cnt, | |
| ..Self::new(guest_cid, src_port, dst_port, PacketOp::UpdateCredit) | |
| } | |
| } | |
| pub const fn new_shutdown(guest_cid: u32, src_port: u32, dst_port: u32, flags: u32) -> Self { | |
| Self { | |
| flags, | |
| ..Self::new(guest_cid, src_port, dst_port, PacketOp::Shutdown) | |
| } | |
| } |
There was a problem hiding this comment.
it didn't work out exactly like this but I think I cleaned it up
lib/propolis/src/vsock/poller.rs
Outdated
| } | ||
| } | ||
|
|
||
| bitflags! { |
There was a problem hiding this comment.
To whatever extent possible, I encourage avoiding direct use of libc. In this case, something form nix might be helpful: https://docs.rs/nix/latest/nix/poll/struct.PollFlags.html
@iximeow and I talked about this, and I of course defer to them; but I'll note that nix is already a dependency of propolis, though indirectly.
There was a problem hiding this comment.
my hesitations were (are?) mostly about nix's permissiveness about actual libc calls, I think using the enums from there make a lot of sense either way (they're just a slightly souped up form of this macro though)
| "pci-virtio-vsock" => { | ||
| // XXX MTZ: add the vsock device | ||
| let config = config::VsockDevice::from_opts(&dev.options)?; | ||
| let bdf = bdf.unwrap(); |
There was a problem hiding this comment.
throw in an error message here if no bdf is specified?
There was a problem hiding this comment.
Okay if we leave it as is? The rest of standalone has this pattern which is why I copied/pasted it.
lib/propolis/src/vsock/poller.rs
Outdated
| } | ||
|
|
||
| /// Set of `PollEvents` that signifies a readable event. | ||
| fn fd_readable() -> PollEvents { |
There was a problem hiding this comment.
Instead of doing something like this, I'd consider implementing an is_readable method on PollEvents.
impl PollEvents {
pub const fn is_readable(self, event: Self) -> bool {
const READABLE: Self = PollEvents::IN | PollEvents::HUP | PollEvents::ERR | PollEvents::PRI;
READABLE.intersects(event)
}
}| cid: u32, | ||
| queues: VsockVq, | ||
| log: Logger, | ||
| port_mappings: IdHashMap<VsockPortMapping>, |
There was a problem hiding this comment.
It's probably not necessary, but I would perhaps consider using an IdOrdMap here to get in on some of that B-Tree cache locality goodness.
There was a problem hiding this comment.
Order doesn't really matter here and we currently will have 1 item in this mapping for propolis-server while propolis-standalone will allow you to define as many as you want. It's probably overkill for what it is already but makes the code nicer to deal with.
|
I haven't gone through |
iximeow
left a comment
There was a problem hiding this comment.
I was just gonna mention the acc_mem thing because I happened to spot it, and then I started reading more, sorry 😁
I very slightly skimmed the rest but mostly focused on the places we're sending data between guest and host, doing unsafe {}, etc.
"changes requested" because of the acc_mem.access() None handling, the packet validation-before-use, and guest-provided cid truncation bits. the rest are generally suggestions that I think would help but I'm not as pressed about one way or the other.
| } | ||
|
|
||
| pub fn write(self, header: &VsockPacketHeader, data: &[u8]) { | ||
| let mem = self.vq.acc_mem.access().expect("mem access for write"); |
There was a problem hiding this comment.
if acc_mem.access() returns None then either the memory maps have been taken away or we've gotten to implementing bus mastering and this device has been blocked from accessing memory (which probably shouldn't panic propolis)
the former shouldn't happen since we require devices and vcpus to be quiesced before tearing things down, and the latter won't currently happen because we just don't do that yet. but at the least ought to be "TODO: cannot access memory?". the expect() is fine to me, just don't want to lose track of where we'll need future work.
it'd be worth taking a look at other acc_mem.access() in this change to make sure error handling makes sense w.r.t above.
(you'll find similar todo's all over in other device code 🙂🙁)
There was a problem hiding this comment.
yeah I took a look through and there's a mix of TODO, unwraps, and Errors in various places 😞.
I added the TODOs
lib/propolis/src/hw/virtio/vsock.rs
Outdated
| if self.rx_chain.is_none() { | ||
| let mem = self.acc_mem.access()?; | ||
| let vq = self.queues.get(VSOCK_RX_QUEUE as usize)?; | ||
| let mut chain = Chain::with_capacity(10); | ||
| vq.pop_avail(&mut chain, &mem)?; | ||
| self.rx_chain = Some(chain); | ||
| } | ||
|
|
||
| let header_size = std::mem::size_of::<VsockPacketHeader>(); | ||
| let available_data_space = self | ||
| .rx_chain | ||
| .as_ref() | ||
| .unwrap() | ||
| .remain_write_bytes() | ||
| .saturating_sub(header_size); | ||
|
|
||
| Some(RxPermit { available_data_space, vq: self }) |
There was a problem hiding this comment.
since the expectation is that self.rx_chain is Some by the time we're making RxPermit, would it make sense to actually hold rx_chain in RxPermit and get there like...
| if self.rx_chain.is_none() { | |
| let mem = self.acc_mem.access()?; | |
| let vq = self.queues.get(VSOCK_RX_QUEUE as usize)?; | |
| let mut chain = Chain::with_capacity(10); | |
| vq.pop_avail(&mut chain, &mem)?; | |
| self.rx_chain = Some(chain); | |
| } | |
| let header_size = std::mem::size_of::<VsockPacketHeader>(); | |
| let available_data_space = self | |
| .rx_chain | |
| .as_ref() | |
| .unwrap() | |
| .remain_write_bytes() | |
| .saturating_sub(header_size); | |
| Some(RxPermit { available_data_space, vq: self }) | |
| let chain = if let Some(chain) = self.rx_chain.take() { | |
| chain | |
| } else { /* self.rx_chain.is_none() */ | |
| let mem = self.acc_mem.access()?; | |
| let vq = self.queues.get(VSOCK_RX_QUEUE as usize)?; | |
| let mut chain = Chain::with_capacity(10); | |
| vq.pop_avail(&mut chain, &mem)?; | |
| chain | |
| }; | |
| Some(RxPermit { vq: self, chain }) |
?
(and available_data_space can be computed as-needed from the provided chain over on RxPermit)
this lets you still keep a &mut VsockVq to know there's at most one outstanding chain, and on RxPermit could stuff the chain back on VsockVq in a drop impl.
barring that, I think you could have a helper on VsockVq that directly returns (&Arc<VirtQueue>, &mut Chain) for RxPermit to hold, so the permit can operate directly on the chain instead of having to expect it can reach through an option
There was a problem hiding this comment.
Ugh, I have made some of these changes but not all of them. There's an unfortunate design issue here where if we move the chain into RxPermit and then have a drop impl the compiler gets grump in the poller code because it can't determine that permit.write() has been used and it thinks self is mutably borrowed twice because of the Drop impl needing it.
Along with your other comment about handling associate_fd and bubbling up errors to the connection handing level, I think it would be best if I refactored a bunch of this. We can keep a lot of state in the VsockProxyConn and process all of the the state in one central function. I played around with this but I think it's more important to land this for R19 before we do additional refactoring here. It's my tech debt that I would like to fix once a few of the other pieces land.
| } | ||
|
|
||
| pub struct PciVirtioSock { | ||
| cid: u32, |
There was a problem hiding this comment.
since the VirtIO spec describes this as a u64 and as you've pointed out the upper half is reserved, i'm super fine with making this cid: u64 and asserting the upper half is zeroed in PciVirtioSock::new with a nod to the spec in that assert.
There was a problem hiding this comment.
I am happy keeping this a u32 since propolis-standalone and server can track ensure that it's a u32 and then we use the u64 on the wire. Unless you feel strongly about this, I will leave it as is?
lib/propolis/src/vsock/packet.rs
Outdated
| // The spec states: | ||
| // | ||
| // The upper 32 bits of src_cid and dst_cid are reserved and zeroed. | ||
| u64::from_le(self.dst_cid) & u64::from(u32::MAX) |
There was a problem hiding this comment.
we should at the very least warn if these reserved bits are set. ideally if there's a way to tell the guest that this packet is invalid and respond with an error, we should do that instead of masking off the upper bits. as-is if a guest addresses a packet to 0x00000001_00000002 we'll treat it the same as a packet addressed to the host when it really is not.
I see Linux does not accept 64-bit CID in sockaddr_vm, but if someone did and their kernel checked for CID == 2 as a way to disallow access to the host, we'd be undermining that check!
There was a problem hiding this comment.
I fixed this in the packet parsing code and it will now return an error if the src_cid or dst_cid contain non zero values in the reserved bits. Additionally we validate that an incoming vsock packet is using the guest assigned cid and otherwise drops the packet as we don't know how to send a RST to an unknown guest.
| libc::port_getn( | ||
| self.port_fd.as_raw_fd(), | ||
| events.as_mut_ptr() as *mut libc::port_event, | ||
| MAX_EVENTS, | ||
| &mut nget, |
There was a problem hiding this comment.
my read of port_get(3c) is that we are saying here that we want to block for at least one event (nget = 1 above), and that we are willing to receive up to 32 (MAX_EVENTS) events. and then nget is set to however many we actually got once there was at least one event to read. is that right? (I consider this "me learning port_getn" not like, needing a comment or anything)
then we std::ptr::null_mut() (! not const ! because this unfortunate wrinkle in libc) to wait without a timeout. and that's fine because this is in a thread we spawn for the poller, where there's no runtime to be driving or other work to do? (having a remark that we don't wait with timeout and that's fine for such and such reasons seems helpful)
There was a problem hiding this comment.
You are correct, and I left a TODO where the timeout would be specified as a follow up commit will likely break out of the loop occasionally to check if there are connections we need to kill.
lib/propolis/src/vsock/poller.rs
Outdated
| for i in 0..nget as usize { | ||
| let event = PortEvent::from_raw(unsafe { | ||
| events[i].assume_init_read() | ||
| }); |
There was a problem hiding this comment.
i'd say this unsafe needs a // Safety: port_getn has promised up to nget events are initialized, but a different way to spell this might be...
| for i in 0..nget as usize { | |
| let event = PortEvent::from_raw(unsafe { | |
| events[i].assume_init_read() | |
| }); | |
| // Safety: port_getn has promised ... etc | |
| let events = unsafe { | |
| std::slice::from_parts( | |
| events.as_ptr() as *const libc::port_event, | |
| nget | |
| ) | |
| }; |
... additionally, this is my own caution, but until we see a perf need I'd also consider zero-filling the whole thing instead of MaybeUninit. likewise, assert!(nget <= events.len()) so that we have a check here that we have not suggested to the OS we might want more events than we should be reading, and that the OS has not suggested we should read more data than we can?
| fd => fd, | ||
| }; | ||
|
|
||
| // Set CLOEXEC on the event port fd |
There was a problem hiding this comment.
why? I don't think we'd ever legitimately exec from propolis..
There was a problem hiding this comment.
This seemed like a reasonable assumption, but I did a quick check for Command::new and it appears in at least the dladm crate. So better to be safe than sorry?
| if libc::fcntl( | ||
| fd, | ||
| libc::F_SETFD, | ||
| libc::fcntl(fd, libc::F_GETFD) | libc::FD_CLOEXEC, |
There was a problem hiding this comment.
I realize this is a bit pedantic but I assume fcntl has to return some error if fd was a totally bogus value. fcntl(2) doesn't say this explicitly and I assume could use a few words about what if The fildes argument is _not_ an open file descriptor?.
but if we do actually need CLOEXEC set for some reason, please check that fcntl(F_GETFD) is not negative, consistent with the promise about in ~RETURN VALUES` there 😁
There was a problem hiding this comment.
Interesting bit from the man page:
F_GETFD
Value of flags defined in <fcntl.h>. The return value
will not be negative.
Not sure if that means an invalid fd will just return an empty set?
|
|
||
| if ret < 0 { | ||
| let err = std::io::Error::last_os_error(); | ||
| if let Some(conn) = self.connections.remove(&key) { |
There was a problem hiding this comment.
if there wasn't a connection and we'd tried to associate_fd, that's pretty weird right? I was going to suggest that we should warn in the else here, but maybe it makes more sense to just return the error upwards and let the caller decide to remove/reset the connection? I think that would be something like trying to get the connection in handle_fd_event, then passing the get'd connection for the read/write handling.
that leaves all the self.connections management at that top level instead of trying to clean up from under ourselves and trying to account for that with re-getting the connection for reads and then writes.
There was a problem hiding this comment.
See the comment above...tl;dr I think I should refactor this to be much nicer post R19.
Created using jj-spr 0.1.0
Created using jj-spr 0.1.0
Created using jj-spr 0.1.0
Fixes: #969