Message ID | 20250411010732.358817-10-eblake@redhat.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Make blockdev-mirror dest sparse in more cases | expand |
On 11.04.25 04:04, Eric Blake wrote: > The 'want_zero' parameter to raw_co_block_status() was added so that > we can avoid potentially time-consuming lseek(SEEK_DATA) calls > throughout the file (working around poor filesystems that have O(n) > rather than O(1) extent probing). But when it comes to learning if a > file is completely sparse (for example, it was just created), always > claiming that a file is all data without even checking offset 0 breaks > what would otherwise be attempts at useful optimizations for a > known-zero mirror destination. > > Note that this allows file-posix to report a file as completely zero > if it was externally created (such as via 'truncate --size=$n file') > as entirely sparse; however, it does NOT work for files created > internally by blockdev-create. That's because blockdev-create > intentionally does a sequence of truncate(0), truncate(size), > allocate_first_block(), in order to make it possible for gluster on > XFS to probe the sector size for direct I/O (which doesn't work if the > first block is sparse). That will be addressed in a later patch. > > Signed-off-by: Eric Blake <eblake@redhat.com> > --- > block/file-posix.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/block/file-posix.c b/block/file-posix.c > index 56d1972d156..67e83528cf5 100644 > --- a/block/file-posix.c > +++ b/block/file-posix.c > @@ -3217,7 +3217,14 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs, > return ret; > } > > - if (!want_zero) { > + /* > + * If want_zero is clear, then the caller wants speed over > + * accuracy, and the only place where SEEK_DATA should be > + * attempted is at the start of the file to learn if the file has > + * any data at all (anywhere else, just blindly claim the entire > + * file is data). > + */ > + if (!want_zero && offset) { > *pnum = bytes; > *map = offset; > *file = bs; Looks like a hack. So we have bdrv_co_is_zero_fast() which do pass want_zero=false to block-status. But in case of mirror, which want to check the whole disk, we actually want want_zero=true, and detect it by offset=0.. Isn't it better to add a kind of bdrv_is_zero_middle_speed() (which means, don't try to read the data to check, but be free to use suboptimal lseek call or something like this), which will pass want_zero=true, and use it from mirror? Mirror case differs from usage in qcow2 exactly by the fact that we call it only once. Another doubt (really weak): can this one extra lseek be so slow, that mirror becomes worse? Den, is it right, that problems about slow lseek (that we experienced several years ago) were about qcow2-internals, and nothing related to mirror itself? May one lseek call on mirror target break something?
On Mon, Apr 14, 2025 at 08:05:21PM +0300, Vladimir Sementsov-Ogievskiy wrote: > On 11.04.25 04:04, Eric Blake wrote: > > The 'want_zero' parameter to raw_co_block_status() was added so that > > we can avoid potentially time-consuming lseek(SEEK_DATA) calls > > throughout the file (working around poor filesystems that have O(n) > > rather than O(1) extent probing). But when it comes to learning if a > > file is completely sparse (for example, it was just created), always > > claiming that a file is all data without even checking offset 0 breaks > > what would otherwise be attempts at useful optimizations for a > > known-zero mirror destination. > > > > Note that this allows file-posix to report a file as completely zero > > if it was externally created (such as via 'truncate --size=$n file') > > as entirely sparse; however, it does NOT work for files created > > internally by blockdev-create. That's because blockdev-create > > intentionally does a sequence of truncate(0), truncate(size), > > allocate_first_block(), in order to make it possible for gluster on > > XFS to probe the sector size for direct I/O (which doesn't work if the > > first block is sparse). That will be addressed in a later patch. > > > > Signed-off-by: Eric Blake <eblake@redhat.com> > > --- > > block/file-posix.c | 9 ++++++++- > > 1 file changed, 8 insertions(+), 1 deletion(-) > > > > diff --git a/block/file-posix.c b/block/file-posix.c > > index 56d1972d156..67e83528cf5 100644 > > --- a/block/file-posix.c > > +++ b/block/file-posix.c > > @@ -3217,7 +3217,14 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs, > > return ret; > > } > > > > - if (!want_zero) { > > + /* > > + * If want_zero is clear, then the caller wants speed over > > + * accuracy, and the only place where SEEK_DATA should be > > + * attempted is at the start of the file to learn if the file has > > + * any data at all (anywhere else, just blindly claim the entire > > + * file is data). > > + */ > > + if (!want_zero && offset) { > > *pnum = bytes; > > *map = offset; > > *file = bs; > > Looks like a hack. So we have bdrv_co_is_zero_fast() which do pass want_zero=false to block-status. But in case of mirror, which want to check the whole disk, we actually want want_zero=true, and detect it by offset=0.. > > Isn't it better to add a kind of bdrv_is_zero_middle_speed() (which means, don't try to read the data to check, but be free to use suboptimal lseek call or something like this), which will pass want_zero=true, and use it from mirror? Mirror case differs from usage in qcow2 exactly by the fact that we call it only once. Which is exactly why I wrote patch 4/6 turning the want_zero bool into an enum so that we are being more explicit in WHY block status is being requested. > > > Another doubt (really weak): can this one extra lseek be so slow, that mirror becomes worse? > Den, is it right, that problems about slow lseek (that we experienced several years ago) were about qcow2-internals, and nothing related to mirror itself? May one lseek call on mirror target break something? I'm not worried about one or even two lseek on the destination; what I'm trying to avoid is an lseek on the destination for every chunk where the source is doing a write-zeroes request (we're already doing an lseek on the source as part of the background task of the mirror, but foreground writes being split into the original and the mirror are where it really matters that we aren't going slower just because of the mirror).
diff --git a/block/file-posix.c b/block/file-posix.c index 56d1972d156..67e83528cf5 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -3217,7 +3217,14 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs, return ret; } - if (!want_zero) { + /* + * If want_zero is clear, then the caller wants speed over + * accuracy, and the only place where SEEK_DATA should be + * attempted is at the start of the file to learn if the file has + * any data at all (anywhere else, just blindly claim the entire + * file is data). + */ + if (!want_zero && offset) { *pnum = bytes; *map = offset; *file = bs;
The 'want_zero' parameter to raw_co_block_status() was added so that we can avoid potentially time-consuming lseek(SEEK_DATA) calls throughout the file (working around poor filesystems that have O(n) rather than O(1) extent probing). But when it comes to learning if a file is completely sparse (for example, it was just created), always claiming that a file is all data without even checking offset 0 breaks what would otherwise be attempts at useful optimizations for a known-zero mirror destination. Note that this allows file-posix to report a file as completely zero if it was externally created (such as via 'truncate --size=$n file') as entirely sparse; however, it does NOT work for files created internally by blockdev-create. That's because blockdev-create intentionally does a sequence of truncate(0), truncate(size), allocate_first_block(), in order to make it possible for gluster on XFS to probe the sector size for direct I/O (which doesn't work if the first block is sparse). That will be addressed in a later patch. Signed-off-by: Eric Blake <eblake@redhat.com> --- block/file-posix.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)