diff mbox series

[1/4] block: Add bdrv_make_empty()

Message ID 20200428132629.796753-2-mreitz@redhat.com (mailing list archive)
State New, archived
Headers show
Series block: Do not call BlockDriver.bdrv_make_empty() directly | expand

Commit Message

Max Reitz April 28, 2020, 1:26 p.m. UTC
Right now, all users of bdrv_make_empty() call the BlockDriver method
directly.  That is not only bad style, it is also wrong, unless the
caller has a BdrvChild with a WRITE permission.

Introduce bdrv_make_empty() that verifies that it does.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block.h |  1 +
 block.c               | 23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+)

Comments

Eric Blake April 28, 2020, 1:53 p.m. UTC | #1
On 4/28/20 8:26 AM, Max Reitz wrote:
> Right now, all users of bdrv_make_empty() call the BlockDriver method
> directly.  That is not only bad style, it is also wrong, unless the
> caller has a BdrvChild with a WRITE permission.
> 
> Introduce bdrv_make_empty() that verifies that it does.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   include/block/block.h |  1 +
>   block.c               | 23 +++++++++++++++++++++++
>   2 files changed, 24 insertions(+)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index b05995fe9c..d947fb4080 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -351,6 +351,7 @@ BlockMeasureInfo *bdrv_measure(BlockDriver *drv, QemuOpts *opts,
>   void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
>   void bdrv_refresh_limits(BlockDriverState *bs, Error **errp);
>   int bdrv_commit(BlockDriverState *bs);
> +int bdrv_make_empty(BdrvChild *c, Error **errp);

Can we please fix this to take a flags parameter?  I want to make it 
easier for callers to request BDRV_REQ_NO_FALLBACK for distinguishing 
between callers where the image must be made empty (read as all zeroes) 
regardless of time spent, vs. made empty quickly (including if it is 
already all zero) but where the caller is prepared for the operation to 
fail and will write zeroes itself if fast bulk zeroing was not possible.


> +int bdrv_make_empty(BdrvChild *c, Error **errp)
> +{
> +    BlockDriver *drv = c->bs->drv;
> +    int ret;
> +
> +    assert(c->perm & BLK_PERM_WRITE);
> +
> +    if (!drv->bdrv_make_empty) {
> +        error_setg(errp, "%s does not support emptying nodes",
> +                   drv->format_name);
> +        return -ENOTSUP;
> +    }

And here's where we can add some automatic fallbacks, such as 
recognizing if the image already reads as all zeroes.  But those 
optimizations can come in separate patches; for YOUR patch, just getting 
the proper API in place is fine.

> +
> +    ret = drv->bdrv_make_empty(c->bs);
> +    if (ret < 0) {
> +        error_setg_errno(errp, -ret, "Failed to empty %s",
> +                         c->bs->filename);
> +        return ret;
> +    }
> +
> +    return 0;
> +}
> 

Other than a missing flag parameter, this looks fine.
Kevin Wolf April 28, 2020, 2:01 p.m. UTC | #2
Am 28.04.2020 um 15:53 hat Eric Blake geschrieben:
> On 4/28/20 8:26 AM, Max Reitz wrote:
> > Right now, all users of bdrv_make_empty() call the BlockDriver method
> > directly.  That is not only bad style, it is also wrong, unless the
> > caller has a BdrvChild with a WRITE permission.
> > 
> > Introduce bdrv_make_empty() that verifies that it does.
> > 
> > Signed-off-by: Max Reitz <mreitz@redhat.com>
> > ---
> >   include/block/block.h |  1 +
> >   block.c               | 23 +++++++++++++++++++++++
> >   2 files changed, 24 insertions(+)
> > 
> > diff --git a/include/block/block.h b/include/block/block.h
> > index b05995fe9c..d947fb4080 100644
> > --- a/include/block/block.h
> > +++ b/include/block/block.h
> > @@ -351,6 +351,7 @@ BlockMeasureInfo *bdrv_measure(BlockDriver *drv, QemuOpts *opts,
> >   void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
> >   void bdrv_refresh_limits(BlockDriverState *bs, Error **errp);
> >   int bdrv_commit(BlockDriverState *bs);
> > +int bdrv_make_empty(BdrvChild *c, Error **errp);
> 
> Can we please fix this to take a flags parameter?  I want to make it easier
> for callers to request BDRV_REQ_NO_FALLBACK for distinguishing between
> callers where the image must be made empty (read as all zeroes) regardless
> of time spent, vs. made empty quickly (including if it is already all zero)
> but where the caller is prepared for the operation to fail and will write
> zeroes itself if fast bulk zeroing was not possible.

bdrv_make_empty() is not for making an image read as all zeroes, but to
make it fully unallocated so that the backing file becomes visible.

Are you confusing it with bdrv_make_zero(), which is just a wrapper
around bdrv_pwrite_zeroes() and does take flags?

Kevin
Eric Blake April 28, 2020, 2:07 p.m. UTC | #3
On 4/28/20 9:01 AM, Kevin Wolf wrote:

>> Can we please fix this to take a flags parameter?  I want to make it easier
>> for callers to request BDRV_REQ_NO_FALLBACK for distinguishing between
>> callers where the image must be made empty (read as all zeroes) regardless
>> of time spent, vs. made empty quickly (including if it is already all zero)
>> but where the caller is prepared for the operation to fail and will write
>> zeroes itself if fast bulk zeroing was not possible.
> 
> bdrv_make_empty() is not for making an image read as all zeroes, but to
> make it fully unallocated so that the backing file becomes visible.
> 
> Are you confusing it with bdrv_make_zero(), which is just a wrapper
> around bdrv_pwrite_zeroes() and does take flags?

Yes.  Although now I'm wondering if the two should remain separate or 
should just be a single driver callback where flags can include 
BDRV_REQ_ZERO_WRITE to distinguish whether exposing the backing file vs. 
reading as all zeroes is intended, or if that is merging too much.
Kevin Wolf April 28, 2020, 2:16 p.m. UTC | #4
Am 28.04.2020 um 16:07 hat Eric Blake geschrieben:
> On 4/28/20 9:01 AM, Kevin Wolf wrote:
> 
> > > Can we please fix this to take a flags parameter?  I want to make it easier
> > > for callers to request BDRV_REQ_NO_FALLBACK for distinguishing between
> > > callers where the image must be made empty (read as all zeroes) regardless
> > > of time spent, vs. made empty quickly (including if it is already all zero)
> > > but where the caller is prepared for the operation to fail and will write
> > > zeroes itself if fast bulk zeroing was not possible.
> > 
> > bdrv_make_empty() is not for making an image read as all zeroes, but to
> > make it fully unallocated so that the backing file becomes visible.
> > 
> > Are you confusing it with bdrv_make_zero(), which is just a wrapper
> > around bdrv_pwrite_zeroes() and does take flags?
> 
> Yes.  Although now I'm wondering if the two should remain separate or should
> just be a single driver callback where flags can include BDRV_REQ_ZERO_WRITE
> to distinguish whether exposing the backing file vs. reading as all zeroes
> is intended, or if that is merging too much.

I don't think the implementations for both things are too similar, so
you might just end up having two if branches and then two separate
implementations in the block drivers.

If anything, bdrv_make_empty() is more related to discard than
write_zeroes. But we use the discard code for it in qcow2 only as a
fallback because in the most common cases, making an image completely
empty means effectively just creating an entirely new L1 and refcount
table, which is much faster than individually discarding all clusters.

For bdrv_make_zero() I don't see an opportunity for such optimisations,
so I don't really see a reason to have a separate callback. Unless you
do know one?

Kevin
Kevin Wolf April 28, 2020, 2:21 p.m. UTC | #5
Am 28.04.2020 um 15:26 hat Max Reitz geschrieben:
> Right now, all users of bdrv_make_empty() call the BlockDriver method
> directly.  That is not only bad style, it is also wrong, unless the
> caller has a BdrvChild with a WRITE permission.
> 
> Introduce bdrv_make_empty() that verifies that it does.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  include/block/block.h |  1 +
>  block.c               | 23 +++++++++++++++++++++++
>  2 files changed, 24 insertions(+)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index b05995fe9c..d947fb4080 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -351,6 +351,7 @@ BlockMeasureInfo *bdrv_measure(BlockDriver *drv, QemuOpts *opts,
>  void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
>  void bdrv_refresh_limits(BlockDriverState *bs, Error **errp);
>  int bdrv_commit(BlockDriverState *bs);
> +int bdrv_make_empty(BdrvChild *c, Error **errp);
>  int bdrv_change_backing_file(BlockDriverState *bs,
>      const char *backing_file, const char *backing_fmt);
>  void bdrv_register(BlockDriver *bdrv);
> diff --git a/block.c b/block.c
> index 2e3905c99e..b0d5b98617 100644
> --- a/block.c
> +++ b/block.c
> @@ -6791,3 +6791,26 @@ void bdrv_del_child(BlockDriverState *parent_bs, BdrvChild *child, Error **errp)
>  
>      parent_bs->drv->bdrv_del_child(parent_bs, child, errp);
>  }
> +
> +int bdrv_make_empty(BdrvChild *c, Error **errp)
> +{
> +    BlockDriver *drv = c->bs->drv;
> +    int ret;
> +
> +    assert(c->perm & BLK_PERM_WRITE);

If I understand correctly, bdrv_make_empty() is called to drop an
overlay whose content is identical to what it would read from its
backing file (in particular after a commit operation). This means that
the caller promises that the visible content doesn't change.

So should we check BLK_PERM_WRITE_UNCHANGED instead?

Kevin
Eric Blake April 28, 2020, 2:25 p.m. UTC | #6
On 4/28/20 9:16 AM, Kevin Wolf wrote:

>>
>> Yes.  Although now I'm wondering if the two should remain separate or should
>> just be a single driver callback where flags can include BDRV_REQ_ZERO_WRITE
>> to distinguish whether exposing the backing file vs. reading as all zeroes
>> is intended, or if that is merging too much.
> 
> I don't think the implementations for both things are too similar, so
> you might just end up having two if branches and then two separate
> implementations in the block drivers.
> 

Yeah, the more I think about it, the more two callbacks still make 
sense.  .bdrv_make_empty may or may not need a flag, but .bdrv_make_zero 
definitely does (because that's where we want a difference between 
making the entire image zero no matter the delay, vs. only making it all 
zero if it is is fast).

> If anything, bdrv_make_empty() is more related to discard than
> write_zeroes. But we use the discard code for it in qcow2 only as a
> fallback because in the most common cases, making an image completely
> empty means effectively just creating an entirely new L1 and refcount
> table, which is much faster than individually discarding all clusters.
> 
> For bdrv_make_zero() I don't see an opportunity for such optimisations,
> so I don't really see a reason to have a separate callback. Unless you
> do know one?

The optimization I have in mind is adding a qcow2 autoclear bit to track 
when an image is known to read as all zero - at which point 
.bdrv_make_zero instantly returns success.  For raw files, a possible 
optimization is to truncate to size 0 and then back to the full size, 
when it is known that truncation forces read-as-zero.  For NBD, I'm 
still playing with whether adding new 64-bit transactions for a bulk 
zero will be useful, and even if not, maybe special-casing 
NBD_CMD_WRITE_ZEROES with a size of 0 to do a bulk zero, if both server 
and client negotiated that particular meaning.
Max Reitz April 29, 2020, 7:39 a.m. UTC | #7
On 28.04.20 16:21, Kevin Wolf wrote:
> Am 28.04.2020 um 15:26 hat Max Reitz geschrieben:
>> Right now, all users of bdrv_make_empty() call the BlockDriver method
>> directly.  That is not only bad style, it is also wrong, unless the
>> caller has a BdrvChild with a WRITE permission.
>>
>> Introduce bdrv_make_empty() that verifies that it does.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  include/block/block.h |  1 +
>>  block.c               | 23 +++++++++++++++++++++++
>>  2 files changed, 24 insertions(+)
>>
>> diff --git a/include/block/block.h b/include/block/block.h
>> index b05995fe9c..d947fb4080 100644
>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -351,6 +351,7 @@ BlockMeasureInfo *bdrv_measure(BlockDriver *drv, QemuOpts *opts,
>>  void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
>>  void bdrv_refresh_limits(BlockDriverState *bs, Error **errp);
>>  int bdrv_commit(BlockDriverState *bs);
>> +int bdrv_make_empty(BdrvChild *c, Error **errp);
>>  int bdrv_change_backing_file(BlockDriverState *bs,
>>      const char *backing_file, const char *backing_fmt);
>>  void bdrv_register(BlockDriver *bdrv);
>> diff --git a/block.c b/block.c
>> index 2e3905c99e..b0d5b98617 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -6791,3 +6791,26 @@ void bdrv_del_child(BlockDriverState *parent_bs, BdrvChild *child, Error **errp)
>>  
>>      parent_bs->drv->bdrv_del_child(parent_bs, child, errp);
>>  }
>> +
>> +int bdrv_make_empty(BdrvChild *c, Error **errp)
>> +{
>> +    BlockDriver *drv = c->bs->drv;
>> +    int ret;
>> +
>> +    assert(c->perm & BLK_PERM_WRITE);
> 
> If I understand correctly, bdrv_make_empty() is called to drop an
> overlay whose content is identical to what it would read from its
> backing file (in particular after a commit operation). This means that
> the caller promises that the visible content doesn't change.
> 
> So should we check BLK_PERM_WRITE_UNCHANGED instead?

Ah, right.  Yes, that would be better.  (Or to check both, whether any
of them has been taken.)

Max
diff mbox series

Patch

diff --git a/include/block/block.h b/include/block/block.h
index b05995fe9c..d947fb4080 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -351,6 +351,7 @@  BlockMeasureInfo *bdrv_measure(BlockDriver *drv, QemuOpts *opts,
 void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
 void bdrv_refresh_limits(BlockDriverState *bs, Error **errp);
 int bdrv_commit(BlockDriverState *bs);
+int bdrv_make_empty(BdrvChild *c, Error **errp);
 int bdrv_change_backing_file(BlockDriverState *bs,
     const char *backing_file, const char *backing_fmt);
 void bdrv_register(BlockDriver *bdrv);
diff --git a/block.c b/block.c
index 2e3905c99e..b0d5b98617 100644
--- a/block.c
+++ b/block.c
@@ -6791,3 +6791,26 @@  void bdrv_del_child(BlockDriverState *parent_bs, BdrvChild *child, Error **errp)
 
     parent_bs->drv->bdrv_del_child(parent_bs, child, errp);
 }
+
+int bdrv_make_empty(BdrvChild *c, Error **errp)
+{
+    BlockDriver *drv = c->bs->drv;
+    int ret;
+
+    assert(c->perm & BLK_PERM_WRITE);
+
+    if (!drv->bdrv_make_empty) {
+        error_setg(errp, "%s does not support emptying nodes",
+                   drv->format_name);
+        return -ENOTSUP;
+    }
+
+    ret = drv->bdrv_make_empty(c->bs);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Failed to empty %s",
+                         c->bs->filename);
+        return ret;
+    }
+
+    return 0;
+}