[5/6] block/nbd: Do not force-cap *pnum

Message ID	20210617155247.442150-6-mreitz@redhat.com (mailing list archive)
State	New, archived
From: Max Reitz <mreitz@redhat.com>
To: qemu-block@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, qemu-devel@nongnu.org, Max Reitz <mreitz@redhat.com>
Subject: [PATCH 5/6] block/nbd: Do not force-cap *pnum
Date: Thu, 17 Jun 2021 17:52:46 +0200
In-Reply-To: <20210617155247.442150-1-mreitz@redhat.com>
Series	block: block-status cache for data regions
[0/6] block: block-status cache for data regions
[1/6] block: Drop BDS comment regarding bdrv_append()
[2/6] block: block-status cache for data regions
[3/6] block/file-posix: Do not force-cap *pnum
[4/6] block/gluster: Do not force-cap *pnum
[5/6] block/nbd: Do not force-cap *pnum
[6/6] block/iscsi: Do not force-cap *pnum

Max Reitz June 17, 2021, 3:52 p.m. UTC

bdrv_co_block_status() does it for us, we do not need to do it here.

The advantage of not capping *pnum is that bdrv_co_block_status() can
cache larger data regions than requested by its caller.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/nbd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Eric Blake June 18, 2021, 8:20 p.m. UTC | #1

On Thu, Jun 17, 2021 at 05:52:46PM +0200, Max Reitz wrote:
> bdrv_co_block_status() does it for us, we do not need to do it here.
> 
> The advantage of not capping *pnum is that bdrv_co_block_status() can
> cache larger data regions than requested by its caller.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/nbd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 616f9ae6c4..930bd234de 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -1702,7 +1702,7 @@ static int coroutine_fn nbd_client_co_block_status(
>          .type = NBD_CMD_BLOCK_STATUS,
>          .from = offset,
>          .len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
> -                   MIN(bytes, s->info.size - offset)),
> +                   s->info.size - offset),
>          .flags = NBD_CMD_FLAG_REQ_ONE,

I'd love to someday get rid of using NBD_CMD_FLAG_REQ_ONE (so the
server can reply with more extents in one go), but that's a bigger
task and unrelated to your block-layer cache.

Vladimir Sementsov-Ogievskiy June 19, 2021, 10:53 a.m. UTC | #2

17.06.2021 18:52, Max Reitz wrote:
> bdrv_co_block_status() does it for us, we do not need to do it here.
> 
> The advantage of not capping *pnum is that bdrv_co_block_status() can
> cache larger data regions than requested by its caller.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/nbd.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 616f9ae6c4..930bd234de 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -1702,7 +1702,7 @@ static int coroutine_fn nbd_client_co_block_status(
>           .type = NBD_CMD_BLOCK_STATUS,
>           .from = offset,
>           .len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
> -                   MIN(bytes, s->info.size - offset)),
> +                   s->info.size - offset),
>           .flags = NBD_CMD_FLAG_REQ_ONE,
>       };
>   
> 

Hmm..

I don't think that this change is correct. In contrast with file-posix, you don't get extra information for free; you just make a larger request. This means that the server will have to do more work.

(Look at blockstatus_to_extents(): it calls bdrv_block_status_above() in a loop.)

For example, assume that the NBD export is a qcow2 image with all clusters allocated. With this change, the NBD server will loop through the whole qcow2 image and load all L2 tables just to return one big allocated extent.

So only the server can decide whether it can add some extra free information beyond the request. Unfortunately, NBD_CMD_FLAG_REQ_ONE doesn't allow that.

Vladimir Sementsov-Ogievskiy June 19, 2021, 11:12 a.m. UTC | #3

18.06.2021 23:20, Eric Blake wrote:
> On Thu, Jun 17, 2021 at 05:52:46PM +0200, Max Reitz wrote:
>> bdrv_co_block_status() does it for us, we do not need to do it here.
>>
>> The advantage of not capping *pnum is that bdrv_co_block_status() can
>> cache larger data regions than requested by its caller.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/nbd.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>
> 
>>
>> diff --git a/block/nbd.c b/block/nbd.c
>> index 616f9ae6c4..930bd234de 100644
>> --- a/block/nbd.c
>> +++ b/block/nbd.c
>> @@ -1702,7 +1702,7 @@ static int coroutine_fn nbd_client_co_block_status(
>>           .type = NBD_CMD_BLOCK_STATUS,
>>           .from = offset,
>>           .len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
>> -                   MIN(bytes, s->info.size - offset)),
>> +                   s->info.size - offset),
>>           .flags = NBD_CMD_FLAG_REQ_ONE,
> 
> I'd love to someday get rid of using NBD_CMD_FLAG_REQ_ONE (so the
> server can reply with more extents in one go), but that's a bigger
> task and unrelated to your block-layer cache.
> 

I think for this to work, the generic block_status should be updated so we can work with several extents in one go.

Max Reitz June 21, 2021, 9:50 a.m. UTC | #4

On 19.06.21 12:53, Vladimir Sementsov-Ogievskiy wrote:
> 17.06.2021 18:52, Max Reitz wrote:
>> bdrv_co_block_status() does it for us, we do not need to do it here.
>>
>> The advantage of not capping *pnum is that bdrv_co_block_status() can
>> cache larger data regions than requested by its caller.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/nbd.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/block/nbd.c b/block/nbd.c
>> index 616f9ae6c4..930bd234de 100644
>> --- a/block/nbd.c
>> +++ b/block/nbd.c
>> @@ -1702,7 +1702,7 @@ static int coroutine_fn nbd_client_co_block_status(
>>           .type = NBD_CMD_BLOCK_STATUS,
>>           .from = offset,
>>           .len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
>> -                   MIN(bytes, s->info.size - offset)),
>> +                   s->info.size - offset),
>>           .flags = NBD_CMD_FLAG_REQ_ONE,
>>       };
>>
>
> Hmm..
>
> I don't think that this change is correct. In contrast with file-posix you 
> don't get extra information for free, you just make a larger request. 
> This means that server will have to do more work.

Oh, oops.  Seems I was blind in my rage to replace this MIN() pattern.

You’re absolutely right.  So this patch should be dropped.

Max

> (look at blockstatus_to_extents, it calls bdrv_block_status_above in a 
> loop).
>
> For example, assume that nbd export is a qcow2 image with all clusters 
> allocated. With this change, nbd server will loop through the whole 
> qcow2 image, load all L2 tables to return big allocated extent.
>
> So, only server can decide, could it add some extra free information 
> to request or not. But unfortunately NBD_CMD_FLAG_REQ_ONE doesn't 
> allow it.
>

Eric Blake June 21, 2021, 6:53 p.m. UTC | #5

On Sat, Jun 19, 2021 at 01:53:24PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > +++ b/block/nbd.c
> > @@ -1702,7 +1702,7 @@ static int coroutine_fn nbd_client_co_block_status(
> >           .type = NBD_CMD_BLOCK_STATUS,
> >           .from = offset,
> >           .len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
> > -                   MIN(bytes, s->info.size - offset)),
> > +                   s->info.size - offset),
> >           .flags = NBD_CMD_FLAG_REQ_ONE,
> >       };
> > 
> 
> Hmm..
> 
> I don't think that this change is correct. In contrast with file-posix you don't get extra information for free, you just make a larger request. This means that server will have to do more work.

Not necessarily.  The fact that we have passed NBD_CMD_FLAG_REQ_ONE
means that the server is still only allowed to give us one extent in
its answer, and that it may not give us information beyond the length
we requested.  You are right that if we lose the REQ_ONE flag we may
end up with the server doing more work to provide us additional extents
that we will then be ignoring because we aren't yet set up for
avoiding REQ_ONE.  Fixing that is a longer-term goal.  But in the
short term, I see no harm in giving a larger length to the server with
REQ_ONE.

> 
> (look at blockstatus_to_extents, it calls bdrv_block_status_above in a loop).
> 
> For example, assume that nbd export is a qcow2 image with all clusters allocated. With this change, nbd server will loop through the whole qcow2 image, load all L2 tables to return big allocated extent.

No, the server is allowed to reply with less length than our request,
and that is particularly true if the server does NOT have free access
to the full length of our request.  In the case of qcow2, since
bdrv_block_status is (by current design) clamped at cluster
boundaries, requesting a 4G length will NOT increase the amount of the
server response any further than the first cluster boundary (that is,
the point where the server no longer has free access to status without
loading another cluster of L2 entries).

> 
> So, only server can decide, could it add some extra free information to request or not. But unfortunately NBD_CMD_FLAG_REQ_ONE doesn't allow it.

What the flag prohibits is the server giving us more information than
the length we requested.  But this patch is increasing our request
length for the case where the server CAN give us more information than
we need locally, in the hope that even though the server can only
reply with one extent, we aren't wasting as many network
back-and-forth trips when a larger request would have worked.

Eric Blake June 21, 2021, 6:54 p.m. UTC | #6

On Mon, Jun 21, 2021 at 11:50:02AM +0200, Max Reitz wrote:
> > I don't think that this change is correct. In contrast with file-posix you
> > don't get extra information for free, you just make a larger request.
> > This means that server will have to do more work.
> 
> Oh, oops.  Seems I was blind in my rage to replace this MIN() pattern.
> 
> You’re absolutely right.  So this patch should be dropped.

I disagree - I think this patch is still correct, as written, _because_
we use the REQ_ONE flag.

Vladimir Sementsov-Ogievskiy June 22, 2021, 9:07 a.m. UTC | #7

21.06.2021 21:53, Eric Blake wrote:
> On Sat, Jun 19, 2021 at 01:53:24PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>> +++ b/block/nbd.c
>>> @@ -1702,7 +1702,7 @@ static int coroutine_fn nbd_client_co_block_status(
>>>            .type = NBD_CMD_BLOCK_STATUS,
>>>            .from = offset,
>>>            .len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
>>> -                   MIN(bytes, s->info.size - offset)),
>>> +                   s->info.size - offset),
>>>            .flags = NBD_CMD_FLAG_REQ_ONE,
>>>        };
>>>
>>
>> Hmm..
>>
>> I don't think that this change is correct. In contrast with file-posix you don't get extra information for free, you just make a larger request. This means that server will have to do more work.
> 
> Not necessarily.  The fact that we have passed NBD_CMD_FLAG_REQ_ONE
> means that the server is still only allowed to give us one extent in
> its answer, and that it may not give us information beyond the length
> we requested.  You are right that if we lose the REQ_ONE flag we may
> result in the server doing more work to provide us additional extents
> that we will then be ignoring because we aren't yet set up for
> avoiding REQ_ONE.  Fixing that is a longer-term goal.  But in the
> short term, I see no harm in giving a larger length to the server with
> REQ_ONE.
> 
>>
>> (look at blockstatus_to_extents, it calls bdrv_block_status_above in a loop).
>>
>> For example, assume that nbd export is a qcow2 image with all clusters allocated. With this change, nbd server will loop through the whole qcow2 image, load all L2 tables to return big allocated extent.
> 
> No, the server is allowed to reply with less length than our request,
> and that is particularly true if the server does NOT have free access
> to the full length of our request.  In the case of qcow2, since
> bdrv_block_status is (by current design) clamped at cluster
> boundaries, requesting a 4G length will NOT increase the amount of the
> server response any further than the first cluster boundary (that is,
> the point where the server no longer has free access to status without
> loading another cluster of L2 entries).

No. It doesn't matter where bdrv_block_status_above is clamped. If the whole disk is allocated, blockstatus_to_extents() in nbd/server.c will loop through the whole requested range and merge all the information into one extent. This doesn't violate NBD_CMD_FLAG_REQ_ONE: we have one extent on output and don't go beyond the requested length. It's valid for the server to try to satisfy as much of the request as possible, and blockstatus_to_extents() currently works this way.

Remember that nbd_extent_array_add() can merge a new extent into the previous one if it has the same type.

> 
>>
>> So, only server can decide, could it add some extra free information to request or not. But unfortunately NBD_CMD_FLAG_REQ_ONE doesn't allow it.
> 
> What the flag prohibits is the server giving us more information than
> the length we requested.  But this patch is increasing our request
> length for the case where the server CAN give us more information than
> we need locally, on the hopes that even though the server can only
> reply with one extent, we aren't wasting as many network
> back-and-forth trips when a larger request would have worked.
>
