Message ID | 20200428132629.796753-2-mreitz@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | block: Do not call BlockDriver.bdrv_make_empty() directly | expand |
On 4/28/20 8:26 AM, Max Reitz wrote: > Right now, all users of bdrv_make_empty() call the BlockDriver method > directly. That is not only bad style, it is also wrong, unless the > caller has a BdrvChild with a WRITE permission. > > Introduce bdrv_make_empty() that verifies that it does. > > Signed-off-by: Max Reitz <mreitz@redhat.com> > --- > include/block/block.h | 1 + > block.c | 23 +++++++++++++++++++++++ > 2 files changed, 24 insertions(+) > > diff --git a/include/block/block.h b/include/block/block.h > index b05995fe9c..d947fb4080 100644 > --- a/include/block/block.h > +++ b/include/block/block.h > @@ -351,6 +351,7 @@ BlockMeasureInfo *bdrv_measure(BlockDriver *drv, QemuOpts *opts, > void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr); > void bdrv_refresh_limits(BlockDriverState *bs, Error **errp); > int bdrv_commit(BlockDriverState *bs); > +int bdrv_make_empty(BdrvChild *c, Error **errp); Can we please fix this to take a flags parameter? I want to make it easier for callers to request BDRV_REQ_NO_FALLBACK for distinguishing between callers where the image must be made empty (read as all zeroes) regardless of time spent, vs. made empty quickly (including if it is already all zero) but where the caller is prepared for the operation to fail and will write zeroes itself if fast bulk zeroing was not possible. > +int bdrv_make_empty(BdrvChild *c, Error **errp) > +{ > + BlockDriver *drv = c->bs->drv; > + int ret; > + > + assert(c->perm & BLK_PERM_WRITE); > + > + if (!drv->bdrv_make_empty) { > + error_setg(errp, "%s does not support emptying nodes", > + drv->format_name); > + return -ENOTSUP; > + } And here's where we can add some automatic fallbacks, such as recognizing if the image already reads as all zeroes. But those optimizations can come in separate patches; for YOUR patch, just getting the proper API in place is fine. > + > + ret = drv->bdrv_make_empty(c->bs); > + if (ret < 0) { > + error_setg_errno(errp, -ret, "Failed to empty %s", > + c->bs->filename); > + return ret; > + } > + > + return 0; > +} > Other than a missing flag parameter, this looks fine.
Am 28.04.2020 um 15:53 hat Eric Blake geschrieben: > On 4/28/20 8:26 AM, Max Reitz wrote: > > Right now, all users of bdrv_make_empty() call the BlockDriver method > > directly. That is not only bad style, it is also wrong, unless the > > caller has a BdrvChild with a WRITE permission. > > > > Introduce bdrv_make_empty() that verifies that it does. > > > > Signed-off-by: Max Reitz <mreitz@redhat.com> > > --- > > include/block/block.h | 1 + > > block.c | 23 +++++++++++++++++++++++ > > 2 files changed, 24 insertions(+) > > > > diff --git a/include/block/block.h b/include/block/block.h > > index b05995fe9c..d947fb4080 100644 > > --- a/include/block/block.h > > +++ b/include/block/block.h > > @@ -351,6 +351,7 @@ BlockMeasureInfo *bdrv_measure(BlockDriver *drv, QemuOpts *opts, > > void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr); > > void bdrv_refresh_limits(BlockDriverState *bs, Error **errp); > > int bdrv_commit(BlockDriverState *bs); > > +int bdrv_make_empty(BdrvChild *c, Error **errp); > > Can we please fix this to take a flags parameter? I want to make it easier > for callers to request BDRV_REQ_NO_FALLBACK for distinguishing between > callers where the image must be made empty (read as all zeroes) regardless > of time spent, vs. made empty quickly (including if it is already all zero) > but where the caller is prepared for the operation to fail and will write > zeroes itself if fast bulk zeroing was not possible. bdrv_make_empty() is not for making an image read as all zeroes, but to make it fully unallocated so that the backing file becomes visible. Are you confusing it with bdrv_make_zero(), which is just a wrapper around bdrv_pwrite_zeroes() and does take flags? Kevin
On 4/28/20 9:01 AM, Kevin Wolf wrote: >> Can we please fix this to take a flags parameter? I want to make it easier >> for callers to request BDRV_REQ_NO_FALLBACK for distinguishing between >> callers where the image must be made empty (read as all zeroes) regardless >> of time spent, vs. made empty quickly (including if it is already all zero) >> but where the caller is prepared for the operation to fail and will write >> zeroes itself if fast bulk zeroing was not possible. > > bdrv_make_empty() is not for making an image read as all zeroes, but to > make it fully unallocated so that the backing file becomes visible. > > Are you confusing it with bdrv_make_zero(), which is just a wrapper > around bdrv_pwrite_zeroes() and does take flags? Yes. Although now I'm wondering if the two should remain separate or should just be a single driver callback where flags can include BDRV_REQ_ZERO_WRITE to distinguish whether exposing the backing file vs. reading as all zeroes is intended, or if that is merging too much.
Am 28.04.2020 um 16:07 hat Eric Blake geschrieben: > On 4/28/20 9:01 AM, Kevin Wolf wrote: > > > > Can we please fix this to take a flags parameter? I want to make it easier > > > for callers to request BDRV_REQ_NO_FALLBACK for distinguishing between > > > callers where the image must be made empty (read as all zeroes) regardless > > > of time spent, vs. made empty quickly (including if it is already all zero) > > > but where the caller is prepared for the operation to fail and will write > > > zeroes itself if fast bulk zeroing was not possible. > > > > bdrv_make_empty() is not for making an image read as all zeroes, but to > > make it fully unallocated so that the backing file becomes visible. > > > > Are you confusing it with bdrv_make_zero(), which is just a wrapper > > around bdrv_pwrite_zeroes() and does take flags? > > Yes. Although now I'm wondering if the two should remain separate or should > just be a single driver callback where flags can include BDRV_REQ_ZERO_WRITE > to distinguish whether exposing the backing file vs. reading as all zeroes > is intended, or if that is merging too much. I don't think the implementations for both things are too similar, so you might just end up having two if branches and then two separate implementations in the block drivers. If anything, bdrv_make_empty() is more related to discard than write_zeroes. But we use the discard code for it in qcow2 only as a fallback because in the most common cases, making an image completely empty means effectively just creating an entirely new L1 and refcount table, which is much faster than individually discarding all clusters. For bdrv_make_zero() I don't see an opportunity for such optimisations, so I don't really see a reason to have a separate callback. Unless you do know one? Kevin
Am 28.04.2020 um 15:26 hat Max Reitz geschrieben: > Right now, all users of bdrv_make_empty() call the BlockDriver method > directly. That is not only bad style, it is also wrong, unless the > caller has a BdrvChild with a WRITE permission. > > Introduce bdrv_make_empty() that verifies that it does. > > Signed-off-by: Max Reitz <mreitz@redhat.com> > --- > include/block/block.h | 1 + > block.c | 23 +++++++++++++++++++++++ > 2 files changed, 24 insertions(+) > > diff --git a/include/block/block.h b/include/block/block.h > index b05995fe9c..d947fb4080 100644 > --- a/include/block/block.h > +++ b/include/block/block.h > @@ -351,6 +351,7 @@ BlockMeasureInfo *bdrv_measure(BlockDriver *drv, QemuOpts *opts, > void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr); > void bdrv_refresh_limits(BlockDriverState *bs, Error **errp); > int bdrv_commit(BlockDriverState *bs); > +int bdrv_make_empty(BdrvChild *c, Error **errp); > int bdrv_change_backing_file(BlockDriverState *bs, > const char *backing_file, const char *backing_fmt); > void bdrv_register(BlockDriver *bdrv); > diff --git a/block.c b/block.c > index 2e3905c99e..b0d5b98617 100644 > --- a/block.c > +++ b/block.c > @@ -6791,3 +6791,26 @@ void bdrv_del_child(BlockDriverState *parent_bs, BdrvChild *child, Error **errp) > > parent_bs->drv->bdrv_del_child(parent_bs, child, errp); > } > + > +int bdrv_make_empty(BdrvChild *c, Error **errp) > +{ > + BlockDriver *drv = c->bs->drv; > + int ret; > + > + assert(c->perm & BLK_PERM_WRITE); If I understand correctly, bdrv_make_empty() is called to drop an overlay whose content is identical to what it would read from its backing file (in particular after a commit operation). This means that the caller promises that the visible content doesn't change. So should we check BLK_PERM_WRITE_UNCHANGED instead? Kevin
On 4/28/20 9:16 AM, Kevin Wolf wrote: >> >> Yes. Although now I'm wondering if the two should remain separate or should >> just be a single driver callback where flags can include BDRV_REQ_ZERO_WRITE >> to distinguish whether exposing the backing file vs. reading as all zeroes >> is intended, or if that is merging too much. > > I don't think the implementations for both things are too similar, so > you might just end up having two if branches and then two separate > implementations in the block drivers. > Yeah, the more I think about it, the more two callbacks still make sense. .bdrv_make_empty may or may not need a flag, but .bdrv_make_zero definitely does (because that's where we want a difference between making the entire image zero no matter the delay, vs. only making it all zero if it is is fast). > If anything, bdrv_make_empty() is more related to discard than > write_zeroes. But we use the discard code for it in qcow2 only as a > fallback because in the most common cases, making an image completely > empty means effectively just creating an entirely new L1 and refcount > table, which is much faster than individually discarding all clusters. > > For bdrv_make_zero() I don't see an opportunity for such optimisations, > so I don't really see a reason to have a separate callback. Unless you > do know one? The optimization I have in mind is adding a qcow2 autoclear bit to track when an image is known to read as all zero - at which point .bdrv_make_zero instantly returns success. For raw files, a possible optimization is to truncate to size 0 and then back to the full size, when it is known that truncation forces read-as-zero. For NBD, I'm still playing with whether adding new 64-bit transactions for a bulk zero will be useful, and even if not, maybe special-casing NBD_CMD_WRITE_ZEROES with a size of 0 to do a bulk zero, if both server and client negotiated that particular meaning.
On 28.04.20 16:21, Kevin Wolf wrote: > Am 28.04.2020 um 15:26 hat Max Reitz geschrieben: >> Right now, all users of bdrv_make_empty() call the BlockDriver method >> directly. That is not only bad style, it is also wrong, unless the >> caller has a BdrvChild with a WRITE permission. >> >> Introduce bdrv_make_empty() that verifies that it does. >> >> Signed-off-by: Max Reitz <mreitz@redhat.com> >> --- >> include/block/block.h | 1 + >> block.c | 23 +++++++++++++++++++++++ >> 2 files changed, 24 insertions(+) >> >> diff --git a/include/block/block.h b/include/block/block.h >> index b05995fe9c..d947fb4080 100644 >> --- a/include/block/block.h >> +++ b/include/block/block.h >> @@ -351,6 +351,7 @@ BlockMeasureInfo *bdrv_measure(BlockDriver *drv, QemuOpts *opts, >> void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr); >> void bdrv_refresh_limits(BlockDriverState *bs, Error **errp); >> int bdrv_commit(BlockDriverState *bs); >> +int bdrv_make_empty(BdrvChild *c, Error **errp); >> int bdrv_change_backing_file(BlockDriverState *bs, >> const char *backing_file, const char *backing_fmt); >> void bdrv_register(BlockDriver *bdrv); >> diff --git a/block.c b/block.c >> index 2e3905c99e..b0d5b98617 100644 >> --- a/block.c >> +++ b/block.c >> @@ -6791,3 +6791,26 @@ void bdrv_del_child(BlockDriverState *parent_bs, BdrvChild *child, Error **errp) >> >> parent_bs->drv->bdrv_del_child(parent_bs, child, errp); >> } >> + >> +int bdrv_make_empty(BdrvChild *c, Error **errp) >> +{ >> + BlockDriver *drv = c->bs->drv; >> + int ret; >> + >> + assert(c->perm & BLK_PERM_WRITE); > > If I understand correctly, bdrv_make_empty() is called to drop an > overlay whose content is identical to what it would read from its > backing file (in particular after a commit operation). This means that > the caller promises that the visible content doesn't change. > > So should we check BLK_PERM_WRITE_UNCHANGED instead? Ah, right. Yes, that would be better. (Or to check both, whether any of them has been taken.) Max
diff --git a/include/block/block.h b/include/block/block.h index b05995fe9c..d947fb4080 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -351,6 +351,7 @@ BlockMeasureInfo *bdrv_measure(BlockDriver *drv, QemuOpts *opts, void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr); void bdrv_refresh_limits(BlockDriverState *bs, Error **errp); int bdrv_commit(BlockDriverState *bs); +int bdrv_make_empty(BdrvChild *c, Error **errp); int bdrv_change_backing_file(BlockDriverState *bs, const char *backing_file, const char *backing_fmt); void bdrv_register(BlockDriver *bdrv); diff --git a/block.c b/block.c index 2e3905c99e..b0d5b98617 100644 --- a/block.c +++ b/block.c @@ -6791,3 +6791,26 @@ void bdrv_del_child(BlockDriverState *parent_bs, BdrvChild *child, Error **errp) parent_bs->drv->bdrv_del_child(parent_bs, child, errp); } + +int bdrv_make_empty(BdrvChild *c, Error **errp) +{ + BlockDriver *drv = c->bs->drv; + int ret; + + assert(c->perm & BLK_PERM_WRITE); + + if (!drv->bdrv_make_empty) { + error_setg(errp, "%s does not support emptying nodes", + drv->format_name); + return -ENOTSUP; + } + + ret = drv->bdrv_make_empty(c->bs); + if (ret < 0) { + error_setg_errno(errp, -ret, "Failed to empty %s", + c->bs->filename); + return ret; + } + + return 0; +}
Right now, all users of bdrv_make_empty() call the BlockDriver method directly. That is not only bad style, it is also wrong, unless the caller has a BdrvChild with a WRITE permission. Introduce bdrv_make_empty() that verifies that it does. Signed-off-by: Max Reitz <mreitz@redhat.com> --- include/block/block.h | 1 + block.c | 23 +++++++++++++++++++++++ 2 files changed, 24 insertions(+)