Message ID | 20250211214328.640374-10-kwolf@redhat.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | rust/block: Add minimal block driver bindings | expand |
On 2/11/25 22:43, Kevin Wolf wrote: > +/// A request to a block driver > +pub enum Request { > + Read { offset: u64, len: u64 }, > +} > + Maybe add flags already? > +#[allow(dead_code)] > +pub enum MappingTarget { > + /// The described blocks are unallocated. Reading from them yields zeros. > + Unmapped, > + > + /// The described blocks are stored in a child node. > + Data { > + /// Child node in which the data is stored > + node: (), Make it already a *mut BlockDriverState, or *mut BdrvChild? Or are you worried of irritating the borrow checker? :) > + /// Offset in the child node at which the data is stored > + offset: u64, > + }, > +} > + > +/// A mapping for a number of contiguous guest blocks > +pub struct Mapping { > + /// Offset of the mapped blocks from the perspective of the guest > + pub offset: u64, > + /// Length of the mapping in bytes > + pub len: u64, > + /// Where the data for the described blocks is stored > + pub target: MappingTarget, > +} > + > /// A trait for writing block drivers. > /// > /// Types that implement this trait can be registered as QEMU block drivers using the > @@ -37,6 +72,11 @@ unsafe fn open( > > /// Returns the size of the image in bytes > fn size(&self) -> u64; > + > + /// Returns the mapping for the first part of `req`. If the returned mapping is shorter than > + /// the request, the function can be called again with a shortened request to get the mapping > + /// for the remaining part. > + async fn map(&self, req: &Request) -> io::Result<Mapping>; I am not sure I like the idea of making this the only way to do a read. Maybe you can keep preadv_part in the trait, and add an utility function like: async fn bdrv_co_preadv_with_mappings<F: Future<Output = io::Result<()>>>( mut offset: u64, mut bytes: u64, // IIRC qiov is not changed, but perhaps you still want to require &mut // for it to _write_ memory? qiov: &mut bindings::QEMUIOVector, mut qiov_offset: usize, flags: bindings::BdrvRequestFlags, mut f: impl FnMut(Request) -> F) -> io::Result<()> { while bytes > 0 { let req = Request::Read { offset, len: bytes, flags }; let mapping = f(req).await?; ... } Ok(()) } Then you can implement BochsImage's trait methods as: async fn preadv_part( &self, offset: u64, bytes: u64, qiov: &mut bindings::QEMUIOVector, qiov_offset: usize, flags: bindings::BdrvRequestFlags, ) -> io::Result<()> { bdrv_co_preadv_with_mappings(offset, bytes, qiov, qiov_offset, flags, |req| self.map(req)).await } BochsImage::map() is a private function and the outer bdrv_co_preadv_part() only handles qemu_co_run_future() and converting the io::Result. Paolo
Am 12.02.2025 um 16:05 hat Paolo Bonzini geschrieben: > On 2/11/25 22:43, Kevin Wolf wrote: > > +/// A request to a block driver > > +pub enum Request { > > + Read { offset: u64, len: u64 }, > > +} > > + > > Maybe add flags already? > > +#[allow(dead_code)] > > +pub enum MappingTarget { > > + /// The described blocks are unallocated. Reading from them yields zeros. > > + Unmapped, > > + > > + /// The described blocks are stored in a child node. > > + Data { > > + /// Child node in which the data is stored > > + node: (), > > Make it already a *mut BlockDriverState, or *mut BdrvChild? Or are you worried of > irritating the borrow checker? :) Mostly I just didn't need it yet and was too lazy to think about the details. The right type would probably be Arc<driver::BdrvChild> (which contains the raw *mut pointer). But I haven't thought much about how this plays together with graph changes. > > + /// Offset in the child node at which the data is stored > > + offset: u64, > > + }, > > +} > > + > > +/// A mapping for a number of contiguous guest blocks > > +pub struct Mapping { > > + /// Offset of the mapped blocks from the perspective of the guest > > + pub offset: u64, > > + /// Length of the mapping in bytes > > + pub len: u64, > > + /// Where the data for the described blocks is stored > > + pub target: MappingTarget, > > +} > > + > > /// A trait for writing block drivers. > > /// > > /// Types that implement this trait can be registered as QEMU block drivers using the > > @@ -37,6 +72,11 @@ unsafe fn open( > > /// Returns the size of the image in bytes > > fn size(&self) -> u64; > > + > > + /// Returns the mapping for the first part of `req`. If the returned mapping is shorter than > > + /// the request, the function can be called again with a shortened request to get the mapping > > + /// for the remaining part. > > + async fn map(&self, req: &Request) -> io::Result<Mapping>; > > I am not sure I like the idea of making this the only way to do a read. We'll clearly need a way for drivers to have explicit functions when the blocks aren't just mapped to somewhere else, e.g. with compression or encryption. (My idea for that was that it would be another MappingTarget branch.) But using map() as the primary interface is intentional as it enables a few things, even though this isn't visible in the code yet. Of course, that basically every block driver duplicates the same loop is a reason to unify it, but as you say there are other ways to do that. One thing map() can do in the common case is provide the caller just a list of mappings to a file descriptor and an offset. This may in the future allow some block exports like ublk or fuse to use zero copy operations. You can't do that if the block driver already does an explicit read into a buffer. You can cache mappings outside of the block driver and even of QEMU. Block exports could tell the kernel that it shouldn't even bother with going to userspace if a request can be completed using a cached mapping. Two hosts with shared storage could access the same qcow2 image without one of them sending all data across the network (like we did in KubeSAN with NBD), but it's enough if metadata (= mappings) are synchronised. So I think this is a useful interface to have, even if we don't know which of the possibilities we'll actually make use of. And it should be the primary interface, because if you want to do caching, then cache invalidation must be done properly by block drivers, which is more likely to go wrong if it's just an additional interface that isn't normally used. Kevin
diff --git a/rust/block/src/driver.rs b/rust/block/src/driver.rs index 5c7c46bfa0..2849ef6949 100644 --- a/rust/block/src/driver.rs +++ b/rust/block/src/driver.rs @@ -9,11 +9,46 @@ use crate::{IoBuffer, SizedIoBuffer}; use qemu_api::bindings; +use qemu_api::futures::qemu_co_run_future; +use std::cmp::min; use std::ffi::c_void; use std::io::{self, Error, ErrorKind}; use std::mem::MaybeUninit; use std::ptr; +/// A request to a block driver +pub enum Request { + Read { offset: u64, len: u64 }, +} + +/// The target for a number of guest blocks, e.g. a location in a child node or the information +/// that the described blocks are unmapped. +// FIXME Actually use node +#[allow(dead_code)] +pub enum MappingTarget { + /// The described blocks are unallocated. Reading from them yields zeros. + Unmapped, + + /// The described blocks are stored in a child node. + Data { + /// Child node in which the data is stored + node: (), + + /// Offset in the child node at which the data is stored + offset: u64, + }, +} + +/// A mapping for a number of contiguous guest blocks +pub struct Mapping { + /// Offset of the mapped blocks from the perspective of the guest + pub offset: u64, + /// Length of the mapping in bytes + pub len: u64, + /// Where the data for the described blocks is stored + pub target: MappingTarget, +} + /// A trait for writing block drivers. /// /// Types that implement this trait can be registered as QEMU block drivers using the @@ -37,6 +72,11 @@ unsafe fn open( /// Returns the size of the image in bytes fn size(&self) -> u64; + + /// Returns the mapping for the first part of `req`. If the returned mapping is shorter than + /// the request, the function can be called again with a shortened request to get the mapping + /// for the remaining part. + async fn map(&self, req: &Request) -> io::Result<Mapping>; } /// Represents the connection between a parent and its child node. @@ -161,6 +201,65 @@ pub async fn read_uninit<T: SizedIoBuffer>( } } +#[doc(hidden)] +pub unsafe extern "C" fn bdrv_co_preadv_part<D: BlockDriver>( + bs: *mut bindings::BlockDriverState, + offset: i64, + bytes: i64, + qiov: *mut bindings::QEMUIOVector, + mut qiov_offset: usize, + flags: bindings::BdrvRequestFlags, +) -> std::os::raw::c_int { + let s = unsafe { &mut *((*bs).opaque as *mut D) }; + + let mut offset = offset as u64; + let mut bytes = bytes as u64; + + while bytes > 0 { + let req = Request::Read { offset, len: bytes }; + let mapping = match qemu_co_run_future(s.map(&req)) { + Ok(mapping) => mapping, + Err(e) => { + return e + .raw_os_error() + .unwrap_or(-(bindings::EIO as std::os::raw::c_int)) + } + }; + + let mapping_offset = offset - mapping.offset; + let cur_bytes = min(bytes, mapping.len - mapping_offset); + + match mapping.target { + MappingTarget::Unmapped => unsafe { + bindings::qemu_iovec_memset(qiov, qiov_offset, 0, cur_bytes.try_into().unwrap()); + }, + MappingTarget::Data { + node: _, + offset: target_offset, + } => unsafe { + // TODO Support using child nodes other than bs->file + let ret = bindings::bdrv_co_preadv_part( + (*bs).file, + (target_offset + mapping_offset) as i64, + cur_bytes as i64, + qiov, + qiov_offset, + flags, + ); + if ret < 0 { + return ret; + } + }, + } + + offset += cur_bytes; + qiov_offset += cur_bytes as usize; + bytes -= cur_bytes; + } + + 0 +} + /// Declare a format block driver. This macro is meant to be used at the top level. /// /// `typ` is a type implementing the [`BlockDriver`] trait to handle the image format with the @@ -174,6 +273,7 @@ macro_rules! block_driver { instance_size: ::std::mem::size_of::<$typ>() as i32, bdrv_open: Some($crate::driver::bdrv_open::<$typ>), bdrv_close: Some($crate::driver::bdrv_close::<$typ>), + bdrv_co_preadv_part: Some($crate::driver::bdrv_co_preadv_part::<$typ>), bdrv_child_perm: Some(::qemu_api::bindings::bdrv_default_perms), is_format: true, ..::qemu_api::zeroable::Zeroable::ZERO
This adds a map() function to the BlockDriver trait and makes use of it to implement reading from an image. Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- rust/block/src/driver.rs | 100 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 100 insertions(+)