diff mbox series

[v4,19/25] block_int-common.h: split function pointers in BlockDriver

Message ID 20211025101735.2060852-20-eesposit@redhat.com (mailing list archive)
State New, archived
Headers show
Series block layer: split block APIs in global state and I/O | expand

Commit Message

Emanuele Giuseppe Esposito Oct. 25, 2021, 10:17 a.m. UTC
Similar to the header split, also the function pointers in BlockDriver
can be split in I/O and global state.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/block_int-common.h | 458 ++++++++++++++++---------------
 1 file changed, 237 insertions(+), 221 deletions(-)

Comments

Hanna Czenczek Nov. 15, 2021, noon UTC | #1
On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:
> Similar to the header split, also the function pointers in BlockDriver
> can be split in I/O and global state.
>
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   include/block/block_int-common.h | 458 ++++++++++++++++---------------
>   1 file changed, 237 insertions(+), 221 deletions(-)
>
> diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
> index 79a3d801d2..9857e775fe 100644
> --- a/include/block/block_int-common.h
> +++ b/include/block/block_int-common.h
> @@ -96,6 +96,7 @@ typedef struct BdrvTrackedRequest {
>   
>   
>   struct BlockDriver {
> +    /* Fields initialized in struct definition and never changed. */

I find this a bit difficult to understand.  Perhaps we could be more 
verbose to make it easier to read?  Like

“These fields are initialized when this object is created, and are never 
changed afterwards”

And I’d also add an empty line below to make visually clear that this 
does not describe the single subsequent field (format_name) but the 
whole chunk of fields.

I also wonder if we could substantiate the claim “never changed 
afterwards” by making these fields consts.  Then again, I suppose 
technically none of the fields in BlockDriver is supposed to be mutable 
(except for .list), neither these fields nor the function pointers...  
Yeah, maybe scratch that idea.

>       const char *format_name;
>       int instance_size;

[...]

> +    int coroutine_fn (*bdrv_save_vmstate)(BlockDriverState *bs,
> +                                          QEMUIOVector *qiov,
> +                                          int64_t pos);
> +    int coroutine_fn (*bdrv_load_vmstate)(BlockDriverState *bs,
> +                                          QEMUIOVector *qiov,
> +                                          int64_t pos);

In patch 18, you classified bdrv_co_readv_vmstate() and 
bdrv_co_writev_vmstate() as I/O functions.  They call these methods, 
though, so something doesn’t seem quite right.

Now I also remember you classified the global bdrv_save_vmstate() and 
bdrv_load_vmstate() functions as GS.  I don’t mind either way; AFAIU 
they are I/O functions that are generally called under the BQL.  But the 
classification should be consistent, of course.

> +
> +    int (*bdrv_change_backing_file)(BlockDriverState *bs,
> +        const char *backing_file, const char *backing_fmt);
> +
> +    void (*bdrv_eject)(BlockDriverState *bs, bool eject_flag);
> +    void (*bdrv_lock_medium)(BlockDriverState *bs, bool locked);

Above bdrv_is_inserted(), there’s a comment (“removable device 
specific”) that I think should be duplicated here.

> +
> +    /* TODO Better pass a option string/QDict/QemuOpts to add any rule? */
> +    int (*bdrv_debug_breakpoint)(BlockDriverState *bs, const char *event,
> +        const char *tag);
> +    int (*bdrv_debug_remove_breakpoint)(BlockDriverState *bs,
> +        const char *tag);
> +    int (*bdrv_debug_resume)(BlockDriverState *bs, const char *tag);
> +    bool (*bdrv_debug_is_suspended)(BlockDriverState *bs, const char *tag);
> +    void (*bdrv_refresh_limits)(BlockDriverState *bs, Error **errp);

Originally, there was an empty line before bdrv_refresh_limits(), and 
I’d keep it.

[...]

> +    /**
> +     * Try to get @bs's logical and physical block size.
> +     * On success, store them in @bsz and return zero.
> +     * On failure, return negative errno.
> +     */
> +    int (*bdrv_probe_blocksizes)(BlockDriverState *bs, BlockSizes *bsz);

Originally, there was a comment above this function (“I/O API, even 
though if it's a filter jumps on parent”) that you added in patch 6 and 
that I didn’t understand.  Given this, I’m not unhappy to see it go 
again, but now I wonder about its purpose even more...

> +    /**
> +     * Try to get @bs's geometry (cyls, heads, sectors)
> +     * On success, store them in @geo and return 0.
> +     * On failure return -errno.
> +     * Only drivers that want to override guest geometry implement this
> +     * callback; see hd_geometry_guess().
> +     */
> +    int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);

(Here, too)

[...]

> +    /**
> +     * Register/unregister a buffer for I/O. For example, when the driver is
> +     * interested to know the memory areas that will later be used in iovs, so
> +     * that it can do IOMMU mapping with VFIO etc., in order to get better
> +     * performance. In the case of VFIO drivers, this callback is used to do
> +     * DMA mapping for hot buffers.
> +     */
> +    void (*bdrv_register_buf)(BlockDriverState *bs, void *host, size_t size);
> +    void (*bdrv_unregister_buf)(BlockDriverState *bs, void *host);
> +    QLIST_ENTRY(BlockDriver) list;

This field is interesting because it’s the only mutable field in the 
whole struct.  I think it should be separated from the function pointers 
above and given a comment like “This field is modified only under the 
BQL, and is part of the global state”.

> +
> +    /*
> +     * I/O API functions. These functions are thread-safe.
> +     *
> +     * See include/block/block-io.h for more information about
> +     * the I/O API.
> +     */
> +
> +    int coroutine_fn (*bdrv_co_create)(BlockdevCreateOptions *opts,
> +                                       Error **errp);
> +    int coroutine_fn (*bdrv_co_create_opts)(BlockDriver *drv,
> +                                            const char *filename,
> +                                            QemuOpts *opts,
> +                                            Error **errp);

Now this is really interesting.  Technically I suppose these should work 
in any thread, but trying to do so results in:

$ touch /tmp/iothread-create-test.qcow2
$ ./qemu-system-x86_64 -object iothread,id=iothr0 -qmp stdio <<EOF
{"execute": "qmp_capabilities"}
{"execute":"blockdev-add","arguments":{"node-name":"proto","driver":"file","filename":"/tmp/iothread-create-test.qcow2"}}
{"execute":"x-blockdev-set-iothread","arguments":{"node-name":"proto","iothread":"iothr0"}}
{"execute":"blockdev-create","arguments":{"job-id":"create","options":{"driver":"qcow2","file":"proto","size":0}}}
EOF
{"QMP": {"version": {"qemu": {"micro": 90, "minor": 1, "major": 6}, 
"package": "v6.2.0-rc0-40-gd02d5fe5fb-dirty"}, "capabilities": ["oob"]}}
{"return": {}}
{"return": {}}
{"return": {}}
{"timestamp": {"seconds": 1636973542, "microseconds": 338117}, "event": 
"JOB_STATUS_CHANGE", "data": {"status": "created", "id": "create"}}
{"timestamp": {"seconds": 1636973542, "microseconds": 338197}, "event": 
"JOB_STATUS_CHANGE", "data": {"status": "running", "id": "create"}}
{"return": {}}
qemu: qemu_mutex_unlock_impl: Operation not permitted
[1]    86154 IOT instruction (core dumped)  ./qemu-system-x86_64 -object 
iothread,id=iothr0 -qmp stdio <<<''

So something’s fishy and perhaps we should investigate this...  I mean, 
I can’t really imagine a case where someone would need to run a 
blockdev-create job in an I/O thread, but right now the interface allows 
for it.

And then bdrv_create() is classified as global state, and also 
bdrv_co_create_opts_simple(), which is supposed to be a drop-in function 
for this .bdrv_co_create_opts function.  So that can’t work.

Also, I believe there might have been some functions you classified as 
GS that are called from .create* implementations.  I accepted that, 
given the abort I sketched above.  However, if we classify image 
creation as I/O, then those would need to be re-evaluated. For example, 
qcow2_co_create_opts() calls bdrv_create_file(), which is a GS function.

Some of this issues could be addressed by making .bdrv_co_create_opts a 
GS function and .bdrv_co_create an I/O function.  I believe that would 
be the ideal split, even though as shown above .bdrv_co_create doesn’t 
work in an I/O thread, and then you have the issue of probably all 
format drivers’ .bdrv_co_create implementations calling 
bdrv_open_blockdev_ref(), which is a GS function.

(VMDK even calls blk_new_open(), blk_new_with_bs(), and blk_unref(), 
none of which can ever be I/O functions, I think.)

I believe in practice the best is to for now classify all create-related 
functions as GS functions.  This is supported by the fact that 
qmp_blockdev_create() specifically creates the create job in the main 
context (with a TODO comment) and requires block drivers to error out 
when they encounter a node in a different AioContext.

> +    int coroutine_fn (*bdrv_co_amend)(BlockDriverState *bs,
> +                                      BlockdevAmendOptions *opts,
> +                                      bool force,
> +                                      Error **errp);
> +
>       /* aio */
>       BlockAIOCB *(*bdrv_aio_preadv)(BlockDriverState *bs,
>           int64_t offset, int64_t bytes, QEMUIOVector *qiov,

[...]

> @@ -443,47 +632,20 @@ struct BlockDriver {

Right above here (i.e. line 631), there’s an attribute 
`has_variable_length`.  I believe that should go up to the immutable 
fields section.

>       int64_t (*bdrv_get_allocated_file_size)(BlockDriverState *bs);
>       BlockMeasureInfo *(*bdrv_measure)(QemuOpts *opts, BlockDriverState *in_bs,
>                                         Error **errp);
> -

I’d keep this empty line.

Hanna

>       int coroutine_fn (*bdrv_co_pwritev_compressed)(BlockDriverState *bs,
>           int64_t offset, int64_t bytes, QEMUIOVector *qiov);
>       int coroutine_fn (*bdrv_co_pwritev_compressed_part)(BlockDriverState *bs,
>           int64_t offset, int64_t bytes, QEMUIOVector *qiov,
>           size_t qiov_offset);
Emanuele Giuseppe Esposito Nov. 18, 2021, 12:42 p.m. UTC | #2
On 15/11/2021 13:00, Hanna Reitz wrote:
>> +
>> +    /*
>> +     * I/O API functions. These functions are thread-safe.
>> +     *
>> +     * See include/block/block-io.h for more information about
>> +     * the I/O API.
>> +     */
>> +
>> +    int coroutine_fn (*bdrv_co_create)(BlockdevCreateOptions *opts,
>> +                                       Error **errp);
>> +    int coroutine_fn (*bdrv_co_create_opts)(BlockDriver *drv,
>> +                                            const char *filename,
>> +                                            QemuOpts *opts,
>> +                                            Error **errp);
> 
> Now this is really interesting.  Technically I suppose these should work 
> in any thread, but trying to do so results in:
> 
> $ touch /tmp/iothread-create-test.qcow2
> $ ./qemu-system-x86_64 -object iothread,id=iothr0 -qmp stdio <<EOF
> {"execute": "qmp_capabilities"}
> {"execute":"blockdev-add","arguments":{"node-name":"proto","driver":"file","filename":"/tmp/iothread-create-test.qcow2"}} 
> 
> {"execute":"x-blockdev-set-iothread","arguments":{"node-name":"proto","iothread":"iothr0"}} 
> 
> {"execute":"blockdev-create","arguments":{"job-id":"create","options":{"driver":"qcow2","file":"proto","size":0}}} 
> 
> EOF
> {"QMP": {"version": {"qemu": {"micro": 90, "minor": 1, "major": 6}, 
> "package": "v6.2.0-rc0-40-gd02d5fe5fb-dirty"}, "capabilities": ["oob"]}}
> {"return": {}}
> {"return": {}}
> {"return": {}}
> {"timestamp": {"seconds": 1636973542, "microseconds": 338117}, "event": 
> "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "create"}}
> {"timestamp": {"seconds": 1636973542, "microseconds": 338197}, "event": 
> "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "create"}}
> {"return": {}}
> qemu: qemu_mutex_unlock_impl: Operation not permitted
> [1]    86154 IOT instruction (core dumped)  ./qemu-system-x86_64 -object 
> iothread,id=iothr0 -qmp stdio <<<''
> 
> So something’s fishy and perhaps we should investigate this...  I mean, 
> I can’t really imagine a case where someone would need to run a 
> blockdev-create job in an I/O thread, but right now the interface allows 
> for it.
> 
> And then bdrv_create() is classified as global state, and also 
> bdrv_co_create_opts_simple(), which is supposed to be a drop-in function 
> for this .bdrv_co_create_opts function.  So that can’t work.
> 
> Also, I believe there might have been some functions you classified as 
> GS that are called from .create* implementations.  I accepted that, 
> given the abort I sketched above.  However, if we classify image 
> creation as I/O, then those would need to be re-evaluated. For example, 
> qcow2_co_create_opts() calls bdrv_create_file(), which is a GS function.
> 
> Some of this issues could be addressed by making .bdrv_co_create_opts a 
> GS function and .bdrv_co_create an I/O function.  I believe that would 
> be the ideal split, even though as shown above .bdrv_co_create doesn’t 
> work in an I/O thread, and then you have the issue of probably all 
> format drivers’ .bdrv_co_create implementations calling 
> bdrv_open_blockdev_ref(), which is a GS function.
> 
> (VMDK even calls blk_new_open(), blk_new_with_bs(), and blk_unref(), 
> none of which can ever be I/O functions, I think.)
> 
> I believe in practice the best is to for now classify all create-related 
> functions as GS functions.  This is supported by the fact that 
> qmp_blockdev_create() specifically creates the create job in the main 
> context (with a TODO comment) and requires block drivers to error out 
> when they encounter a node in a different AioContext.
> 

Ok after better reviewing this I agree with you:
- .bdrv_co_create_opts is for sure a GS function. It is called by 
bdrv_create and it is asserted to be under BQL.
- .bdrv_co_create should also be a GS, and the easiest thing to do would 
be to follow the existing TODO and make sure we cannot run it outside 
the main loop. I think that I will put it as GS, and add the BQL 
assertion to blockdev_create_run, so that if for some reasons someone 
tries to do what you did above, will crash because of the assertion, and 
not because of the aiocontext lock missing.

Emanuele
diff mbox series

Patch

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 79a3d801d2..9857e775fe 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -96,6 +96,7 @@  typedef struct BdrvTrackedRequest {
 
 
 struct BlockDriver {
+    /* Fields initialized in struct definition and never changed. */
     const char *format_name;
     int instance_size;
 
@@ -121,23 +122,7 @@  struct BlockDriver {
      * on those children.
      */
     bool is_format;
-    /*
-     * Return true if @to_replace can be replaced by a BDS with the
-     * same data as @bs without it affecting @bs's behavior (that is,
-     * without it being visible to @bs's parents).
-     */
-    bool (*bdrv_recurse_can_replace)(BlockDriverState *bs,
-                                     BlockDriverState *to_replace);
 
-    int (*bdrv_probe)(const uint8_t *buf, int buf_size, const char *filename);
-    int (*bdrv_probe_device)(const char *filename);
-
-    /*
-     * Any driver implementing this callback is expected to be able to handle
-     * NULL file names in its .bdrv_open() implementation.
-     */
-    void (*bdrv_parse_filename)(const char *filename, QDict *options,
-                                Error **errp);
     /*
      * Drivers not implementing bdrv_parse_filename nor bdrv_open should have
      * this field set to true, except ones that are defined only by their
@@ -159,7 +144,66 @@  struct BlockDriver {
      */
     bool supports_backing;
 
-    /* For handling image reopen for split or non-split files */
+    /*
+     * Drivers setting this field must be able to work with just a plain
+     * filename with '<protocol_name>:' as a prefix, and no other options.
+     * Options may be extracted from the filename by implementing
+     * bdrv_parse_filename.
+     */
+    const char *protocol_name;
+
+    /* List of options for creating images, terminated by name == NULL */
+    QemuOptsList *create_opts;
+
+    /* List of options for image amend */
+    QemuOptsList *amend_opts;
+
+    /*
+     * If this driver supports reopening images this contains a
+     * NULL-terminated list of the runtime options that can be
+     * modified. If an option in this list is unspecified during
+     * reopen then it _must_ be reset to its default value or return
+     * an error.
+     */
+    const char *const *mutable_opts;
+
+    /*
+     * Pointer to a NULL-terminated array of names of strong options
+     * that can be specified for bdrv_open(). A strong option is one
+     * that changes the data of a BDS.
+     * If this pointer is NULL, the array is considered empty.
+     * "filename" and "driver" are always considered strong.
+     */
+    const char *const *strong_runtime_opts;
+
+    /*
+     * Global state (GS) API. These functions run under the BQL lock.
+     *
+     * See include/block/block-global-state.h for more information about
+     * the GS API.
+     */
+
+    /*
+     * Return true if @to_replace can be replaced by a BDS with the
+     * same data as @bs without it affecting @bs's behavior (that is,
+     * without it being visible to @bs's parents).
+     */
+    bool (*bdrv_recurse_can_replace)(BlockDriverState *bs,
+                                     BlockDriverState *to_replace);
+
+    int (*bdrv_probe)(const uint8_t *buf, int buf_size, const char *filename);
+    int (*bdrv_probe_device)(const char *filename);
+
+    /*
+     * Any driver implementing this callback is expected to be able to handle
+     * NULL file names in its .bdrv_open() implementation.
+     */
+    void (*bdrv_parse_filename)(const char *filename, QDict *options,
+                                Error **errp);
+
+    /*
+     * For handling image reopen for split or non-split files.
+     */
     int (*bdrv_reopen_prepare)(BDRVReopenState *reopen_state,
                                BlockReopenQueue *queue, Error **errp);
     void (*bdrv_reopen_commit)(BDRVReopenState *reopen_state);
@@ -175,19 +219,6 @@  struct BlockDriver {
                           Error **errp);
     void (*bdrv_close)(BlockDriverState *bs);
 
-
-    int coroutine_fn (*bdrv_co_create)(BlockdevCreateOptions *opts,
-                                       Error **errp);
-    int coroutine_fn (*bdrv_co_create_opts)(BlockDriver *drv,
-                                            const char *filename,
-                                            QemuOpts *opts,
-                                            Error **errp);
-
-    int coroutine_fn (*bdrv_co_amend)(BlockDriverState *bs,
-                                      BlockdevAmendOptions *opts,
-                                      bool force,
-                                      Error **errp);
-
     int (*bdrv_amend_options)(BlockDriverState *bs,
                               QemuOpts *opts,
                               BlockDriverAmendStatusCB *status_cb,
@@ -234,6 +265,182 @@  struct BlockDriver {
      */
     char *(*bdrv_dirname)(BlockDriverState *bs, Error **errp);
 
+    /*
+     * This informs the driver that we are no longer interested in the result
+     * of in-flight requests, so don't waste the time if possible.
+     *
+     * One example usage is to avoid waiting for an nbd target node reconnect
+     * timeout during job-cancel with force=true.
+     */
+    void (*bdrv_cancel_in_flight)(BlockDriverState *bs);
+
+    int (*bdrv_inactivate)(BlockDriverState *bs);
+
+    int (*bdrv_snapshot_create)(BlockDriverState *bs,
+                                QEMUSnapshotInfo *sn_info);
+    int (*bdrv_snapshot_goto)(BlockDriverState *bs,
+                              const char *snapshot_id);
+    int (*bdrv_snapshot_delete)(BlockDriverState *bs,
+                                const char *snapshot_id,
+                                const char *name,
+                                Error **errp);
+    int (*bdrv_snapshot_list)(BlockDriverState *bs,
+                              QEMUSnapshotInfo **psn_info);
+    int (*bdrv_snapshot_load_tmp)(BlockDriverState *bs,
+                                  const char *snapshot_id,
+                                  const char *name,
+                                  Error **errp);
+
+    int coroutine_fn (*bdrv_save_vmstate)(BlockDriverState *bs,
+                                          QEMUIOVector *qiov,
+                                          int64_t pos);
+    int coroutine_fn (*bdrv_load_vmstate)(BlockDriverState *bs,
+                                          QEMUIOVector *qiov,
+                                          int64_t pos);
+
+    int (*bdrv_change_backing_file)(BlockDriverState *bs,
+        const char *backing_file, const char *backing_fmt);
+
+    void (*bdrv_eject)(BlockDriverState *bs, bool eject_flag);
+    void (*bdrv_lock_medium)(BlockDriverState *bs, bool locked);
+
+    /* TODO Better pass a option string/QDict/QemuOpts to add any rule? */
+    int (*bdrv_debug_breakpoint)(BlockDriverState *bs, const char *event,
+        const char *tag);
+    int (*bdrv_debug_remove_breakpoint)(BlockDriverState *bs,
+        const char *tag);
+    int (*bdrv_debug_resume)(BlockDriverState *bs, const char *tag);
+    bool (*bdrv_debug_is_suspended)(BlockDriverState *bs, const char *tag);
+    void (*bdrv_refresh_limits)(BlockDriverState *bs, Error **errp);
+
+    /*
+     * Returns 1 if newly created images are guaranteed to contain only
+     * zeros, 0 otherwise.
+     */
+    int (*bdrv_has_zero_init)(BlockDriverState *bs);
+
+    /*
+     * Remove fd handlers, timers, and other event loop callbacks so the event
+     * loop is no longer in use.  Called with no in-flight requests and in
+     * depth-first traversal order with parents before child nodes.
+     */
+    void (*bdrv_detach_aio_context)(BlockDriverState *bs);
+
+    /*
+     * Add fd handlers, timers, and other event loop callbacks so I/O requests
+     * can be processed again.  Called with no in-flight requests and in
+     * depth-first traversal order with child nodes before parent nodes.
+     */
+    void (*bdrv_attach_aio_context)(BlockDriverState *bs,
+                                    AioContext *new_context);
+
+    /**
+     * Try to get @bs's logical and physical block size.
+     * On success, store them in @bsz and return zero.
+     * On failure, return negative errno.
+     */
+    int (*bdrv_probe_blocksizes)(BlockDriverState *bs, BlockSizes *bsz);
+    /**
+     * Try to get @bs's geometry (cyls, heads, sectors)
+     * On success, store them in @geo and return 0.
+     * On failure return -errno.
+     * Only drivers that want to override guest geometry implement this
+     * callback; see hd_geometry_guess().
+     */
+    int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
+
+    void (*bdrv_add_child)(BlockDriverState *parent, BlockDriverState *child,
+                           Error **errp);
+    void (*bdrv_del_child)(BlockDriverState *parent, BdrvChild *child,
+                           Error **errp);
+
+    /**
+     * Informs the block driver that a permission change is intended. The
+     * driver checks whether the change is permissible and may take other
+     * preparations for the change (e.g. get file system locks). This operation
+     * is always followed either by a call to either .bdrv_set_perm or
+     * .bdrv_abort_perm_update.
+     *
+     * Checks whether the requested set of cumulative permissions in @perm
+     * can be granted for accessing @bs and whether no other users are using
+     * permissions other than those given in @shared (both arguments take
+     * BLK_PERM_* bitmasks).
+     *
+     * If both conditions are met, 0 is returned. Otherwise, -errno is returned
+     * and errp is set to an error describing the conflict.
+     */
+    int (*bdrv_check_perm)(BlockDriverState *bs, uint64_t perm,
+                           uint64_t shared, Error **errp);
+
+    /**
+     * Called to inform the driver that the set of cumulative set of used
+     * permissions for @bs has changed to @perm, and the set of sharable
+     * permission to @shared. The driver can use this to propagate changes to
+     * its children (i.e. request permissions only if a parent actually needs
+     * them).
+     *
+     * This function is only invoked after bdrv_check_perm(), so block drivers
+     * may rely on preparations made in their .bdrv_check_perm implementation.
+     */
+    void (*bdrv_set_perm)(BlockDriverState *bs, uint64_t perm, uint64_t shared);
+
+    /*
+     * Called to inform the driver that after a previous bdrv_check_perm()
+     * call, the permission update is not performed and any preparations made
+     * for it (e.g. taken file locks) need to be undone.
+     *
+     * This function can be called even for nodes that never saw a
+     * bdrv_check_perm() call. It is a no-op then.
+     */
+    void (*bdrv_abort_perm_update)(BlockDriverState *bs);
+
+    /**
+     * Returns in @nperm and @nshared the permissions that the driver for @bs
+     * needs on its child @c, based on the cumulative permissions requested by
+     * the parents in @parent_perm and @parent_shared.
+     *
+     * If @c is NULL, return the permissions for attaching a new child for the
+     * given @child_class and @role.
+     *
+     * If @reopen_queue is non-NULL, don't return the currently needed
+     * permissions, but those that will be needed after applying the
+     * @reopen_queue.
+     */
+     void (*bdrv_child_perm)(BlockDriverState *bs, BdrvChild *c,
+                             BdrvChildRole role,
+                             BlockReopenQueue *reopen_queue,
+                             uint64_t parent_perm, uint64_t parent_shared,
+                             uint64_t *nperm, uint64_t *nshared);
+
+    /**
+     * Register/unregister a buffer for I/O. For example, when the driver is
+     * interested to know the memory areas that will later be used in iovs, so
+     * that it can do IOMMU mapping with VFIO etc., in order to get better
+     * performance. In the case of VFIO drivers, this callback is used to do
+     * DMA mapping for hot buffers.
+     */
+    void (*bdrv_register_buf)(BlockDriverState *bs, void *host, size_t size);
+    void (*bdrv_unregister_buf)(BlockDriverState *bs, void *host);
+    QLIST_ENTRY(BlockDriver) list;
+
+    /*
+     * I/O API functions. These functions are thread-safe.
+     *
+     * See include/block/block-io.h for more information about
+     * the I/O API.
+     */
+
+    int coroutine_fn (*bdrv_co_create)(BlockdevCreateOptions *opts,
+                                       Error **errp);
+    int coroutine_fn (*bdrv_co_create_opts)(BlockDriver *drv,
+                                            const char *filename,
+                                            QemuOpts *opts,
+                                            Error **errp);
+    int coroutine_fn (*bdrv_co_amend)(BlockDriverState *bs,
+                                      BlockdevAmendOptions *opts,
+                                      bool force,
+                                      Error **errp);
+
     /* aio */
     BlockAIOCB *(*bdrv_aio_preadv)(BlockDriverState *bs,
         int64_t offset, int64_t bytes, QEMUIOVector *qiov,
@@ -374,21 +581,11 @@  struct BlockDriver {
         bool want_zero, int64_t offset, int64_t bytes, int64_t *pnum,
         int64_t *map, BlockDriverState **file);
 
-    /*
-     * This informs the driver that we are no longer interested in the result
-     * of in-flight requests, so don't waste the time if possible.
-     *
-     * One example usage is to avoid waiting for an nbd target node reconnect
-     * timeout during job-cancel with force=true.
-     */
-    void (*bdrv_cancel_in_flight)(BlockDriverState *bs);
-
     /*
      * Invalidate any cached meta-data.
      */
     void coroutine_fn (*bdrv_co_invalidate_cache)(BlockDriverState *bs,
                                                   Error **errp);
-    int (*bdrv_inactivate)(BlockDriverState *bs);
 
     /*
      * Flushes all data for all layers by calling bdrv_co_flush for underlying
@@ -414,14 +611,6 @@  struct BlockDriver {
      */
     int coroutine_fn (*bdrv_co_flush_to_os)(BlockDriverState *bs);
 
-    /*
-     * Drivers setting this field must be able to work with just a plain
-     * filename with '<protocol_name>:' as a prefix, and no other options.
-     * Options may be extracted from the filename by implementing
-     * bdrv_parse_filename.
-     */
-    const char *protocol_name;
-
     /*
      * Truncate @bs to @offset bytes using the given @prealloc mode
      * when growing.  Modes other than PREALLOC_MODE_OFF should be
@@ -443,47 +632,20 @@  struct BlockDriver {
     int64_t (*bdrv_get_allocated_file_size)(BlockDriverState *bs);
     BlockMeasureInfo *(*bdrv_measure)(QemuOpts *opts, BlockDriverState *in_bs,
                                       Error **errp);
-
     int coroutine_fn (*bdrv_co_pwritev_compressed)(BlockDriverState *bs,
         int64_t offset, int64_t bytes, QEMUIOVector *qiov);
     int coroutine_fn (*bdrv_co_pwritev_compressed_part)(BlockDriverState *bs,
         int64_t offset, int64_t bytes, QEMUIOVector *qiov,
         size_t qiov_offset);
 
-    int (*bdrv_snapshot_create)(BlockDriverState *bs,
-                                QEMUSnapshotInfo *sn_info);
-    int (*bdrv_snapshot_goto)(BlockDriverState *bs,
-                              const char *snapshot_id);
-    int (*bdrv_snapshot_delete)(BlockDriverState *bs,
-                                const char *snapshot_id,
-                                const char *name,
-                                Error **errp);
-    int (*bdrv_snapshot_list)(BlockDriverState *bs,
-                              QEMUSnapshotInfo **psn_info);
-    int (*bdrv_snapshot_load_tmp)(BlockDriverState *bs,
-                                  const char *snapshot_id,
-                                  const char *name,
-                                  Error **errp);
     int (*bdrv_get_info)(BlockDriverState *bs, BlockDriverInfo *bdi);
 
     ImageInfoSpecific *(*bdrv_get_specific_info)(BlockDriverState *bs,
                                                  Error **errp);
     BlockStatsSpecific *(*bdrv_get_specific_stats)(BlockDriverState *bs);
 
-    int coroutine_fn (*bdrv_save_vmstate)(BlockDriverState *bs,
-                                          QEMUIOVector *qiov,
-                                          int64_t pos);
-    int coroutine_fn (*bdrv_load_vmstate)(BlockDriverState *bs,
-                                          QEMUIOVector *qiov,
-                                          int64_t pos);
-
-    int (*bdrv_change_backing_file)(BlockDriverState *bs,
-        const char *backing_file, const char *backing_fmt);
-
     /* removable device specific */
     bool (*bdrv_is_inserted)(BlockDriverState *bs);
-    void (*bdrv_eject)(BlockDriverState *bs, bool eject_flag);
-    void (*bdrv_lock_medium)(BlockDriverState *bs, bool locked);
 
     /* to control generic scsi devices */
     BlockAIOCB *(*bdrv_aio_ioctl)(BlockDriverState *bs,
@@ -492,21 +654,6 @@  struct BlockDriver {
     int coroutine_fn (*bdrv_co_ioctl)(BlockDriverState *bs,
                                       unsigned long int req, void *buf);
 
-    /* List of options for creating images, terminated by name == NULL */
-    QemuOptsList *create_opts;
-
-    /* List of options for image amend */
-    QemuOptsList *amend_opts;
-
-    /*
-     * If this driver supports reopening images this contains a
-     * NULL-terminated list of the runtime options that can be
-     * modified. If an option in this list is unspecified during
-     * reopen then it _must_ be reset to its default value or return
-     * an error.
-     */
-    const char *const *mutable_opts;
-
     /*
      * Returns 0 for completed check, -errno for internal errors.
      * The check results are stored in result.
@@ -517,58 +664,10 @@  struct BlockDriver {
 
     void (*bdrv_debug_event)(BlockDriverState *bs, BlkdebugEvent event);
 
-    /* TODO Better pass a option string/QDict/QemuOpts to add any rule? */
-    int (*bdrv_debug_breakpoint)(BlockDriverState *bs, const char *event,
-        const char *tag);
-    int (*bdrv_debug_remove_breakpoint)(BlockDriverState *bs,
-        const char *tag);
-    int (*bdrv_debug_resume)(BlockDriverState *bs, const char *tag);
-    bool (*bdrv_debug_is_suspended)(BlockDriverState *bs, const char *tag);
-
-    void (*bdrv_refresh_limits)(BlockDriverState *bs, Error **errp);
-
-    /*
-     * Returns 1 if newly created images are guaranteed to contain only
-     * zeros, 0 otherwise.
-     */
-    int (*bdrv_has_zero_init)(BlockDriverState *bs);
-
-    /*
-     * Remove fd handlers, timers, and other event loop callbacks so the event
-     * loop is no longer in use.  Called with no in-flight requests and in
-     * depth-first traversal order with parents before child nodes.
-     */
-    void (*bdrv_detach_aio_context)(BlockDriverState *bs);
-
-    /*
-     * Add fd handlers, timers, and other event loop callbacks so I/O requests
-     * can be processed again.  Called with no in-flight requests and in
-     * depth-first traversal order with child nodes before parent nodes.
-     */
-    void (*bdrv_attach_aio_context)(BlockDriverState *bs,
-                                    AioContext *new_context);
-
     /* io queue for linux-aio */
     void (*bdrv_io_plug)(BlockDriverState *bs);
     void (*bdrv_io_unplug)(BlockDriverState *bs);
 
-    /**
-     * Try to get @bs's logical and physical block size.
-     * On success, store them in @bsz and return zero.
-     * On failure, return negative errno.
-     */
-    /* I/O API, even though if it's a filter jumps on parent */
-    int (*bdrv_probe_blocksizes)(BlockDriverState *bs, BlockSizes *bsz);
-    /**
-     * Try to get @bs's geometry (cyls, heads, sectors)
-     * On success, store them in @geo and return 0.
-     * On failure return -errno.
-     * Only drivers that want to override guest geometry implement this
-     * callback; see hd_geometry_guess().
-     */
-    /* I/O API, even though if it's a filter jumps on parent */
-    int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
-
     /**
      * bdrv_co_drain_begin is called if implemented in the beginning of a
      * drain operation to drain and stop any internal sources of requests in
@@ -582,69 +681,6 @@  struct BlockDriver {
     void coroutine_fn (*bdrv_co_drain_begin)(BlockDriverState *bs);
     void coroutine_fn (*bdrv_co_drain_end)(BlockDriverState *bs);
 
-    void (*bdrv_add_child)(BlockDriverState *parent, BlockDriverState *child,
-                           Error **errp);
-    void (*bdrv_del_child)(BlockDriverState *parent, BdrvChild *child,
-                           Error **errp);
-
-    /**
-     * Informs the block driver that a permission change is intended. The
-     * driver checks whether the change is permissible and may take other
-     * preparations for the change (e.g. get file system locks). This operation
-     * is always followed either by a call to either .bdrv_set_perm or
-     * .bdrv_abort_perm_update.
-     *
-     * Checks whether the requested set of cumulative permissions in @perm
-     * can be granted for accessing @bs and whether no other users are using
-     * permissions other than those given in @shared (both arguments take
-     * BLK_PERM_* bitmasks).
-     *
-     * If both conditions are met, 0 is returned. Otherwise, -errno is returned
-     * and errp is set to an error describing the conflict.
-     */
-    int (*bdrv_check_perm)(BlockDriverState *bs, uint64_t perm,
-                           uint64_t shared, Error **errp);
-
-    /**
-     * Called to inform the driver that the set of cumulative set of used
-     * permissions for @bs has changed to @perm, and the set of sharable
-     * permission to @shared. The driver can use this to propagate changes to
-     * its children (i.e. request permissions only if a parent actually needs
-     * them).
-     *
-     * This function is only invoked after bdrv_check_perm(), so block drivers
-     * may rely on preparations made in their .bdrv_check_perm implementation.
-     */
-    void (*bdrv_set_perm)(BlockDriverState *bs, uint64_t perm, uint64_t shared);
-
-    /*
-     * Called to inform the driver that after a previous bdrv_check_perm()
-     * call, the permission update is not performed and any preparations made
-     * for it (e.g. taken file locks) need to be undone.
-     *
-     * This function can be called even for nodes that never saw a
-     * bdrv_check_perm() call. It is a no-op then.
-     */
-    void (*bdrv_abort_perm_update)(BlockDriverState *bs);
-
-    /**
-     * Returns in @nperm and @nshared the permissions that the driver for @bs
-     * needs on its child @c, based on the cumulative permissions requested by
-     * the parents in @parent_perm and @parent_shared.
-     *
-     * If @c is NULL, return the permissions for attaching a new child for the
-     * given @child_class and @role.
-     *
-     * If @reopen_queue is non-NULL, don't return the currently needed
-     * permissions, but those that will be needed after applying the
-     * @reopen_queue.
-     */
-     void (*bdrv_child_perm)(BlockDriverState *bs, BdrvChild *c,
-                             BdrvChildRole role,
-                             BlockReopenQueue *reopen_queue,
-                             uint64_t parent_perm, uint64_t parent_shared,
-                             uint64_t *nperm, uint64_t *nshared);
-
     bool (*bdrv_supports_persistent_dirty_bitmap)(BlockDriverState *bs);
     bool (*bdrv_co_can_store_new_dirty_bitmap)(BlockDriverState *bs,
                                                const char *name,
@@ -653,26 +689,6 @@  struct BlockDriver {
     int (*bdrv_co_remove_persistent_dirty_bitmap)(BlockDriverState *bs,
                                                   const char *name,
                                                   Error **errp);
-
-    /**
-     * Register/unregister a buffer for I/O. For example, when the driver is
-     * interested to know the memory areas that will later be used in iovs, so
-     * that it can do IOMMU mapping with VFIO etc., in order to get better
-     * performance. In the case of VFIO drivers, this callback is used to do
-     * DMA mapping for hot buffers.
-     */
-    void (*bdrv_register_buf)(BlockDriverState *bs, void *host, size_t size);
-    void (*bdrv_unregister_buf)(BlockDriverState *bs, void *host);
-    QLIST_ENTRY(BlockDriver) list;
-
-    /*
-     * Pointer to a NULL-terminated array of names of strong options
-     * that can be specified for bdrv_open(). A strong option is one
-     * that changes the data of a BDS.
-     * If this pointer is NULL, the array is considered empty.
-     * "filename" and "driver" are always considered strong.
-     */
-    const char *const *strong_runtime_opts;
 };
 
 static inline bool block_driver_can_compress(BlockDriver *drv)