[00/11] mirror: cancel nbd reconnect

Message ID 20201118180433.11931-1-vsementsov@virtuozzo.com (mailing list archive)

Message

Vladimir Sementsov-Ogievskiy Nov. 18, 2020, 6:04 p.m. UTC
Hi all!

The problem

Assume we have a mirror job whose NBD target node has reconnect enabled,
and the connection fails. All current requests to the NBD node then wait
for the driver to reconnect, up to the reconnect-delay time specified in
the NBD blockdev options. This timeout may be long; in Virtuozzo, for
example, we use 300 seconds by default.

So, if the user tries to cancel the job at this moment, the job waits for
its in-flight requests to finish, for up to 300 seconds. From the user's
point of view, cancelling the job takes a long time. Bad.

Solution

Let's simply cancel the "waiting for reconnect" state in the in-flight
request coroutines when the mirror (or backup) job is cancelled. Welcome
the series below.

Vladimir Sementsov-Ogievskiy (11):
  block: add new BlockDriver handler: bdrv_cancel_in_flight
  block/nbd: implement .bdrv_cancel_in_flight
  block/raw-format: implement .bdrv_cancel_in_flight handler
  job: add .cancel handler for the driver
  block/mirror: implement .cancel job handler
  iotests/264: fix style
  iotests/264: move to python unittest
  iotests.py: qemu_nbd_popen: remove pid file after use
  iotests/264: add mirror-cancel test-case
  block/backup: implement .cancel job handler
  iotests/264: add backup-cancel test-case

 include/block/block.h         |   3 +
 include/block/block_int.h     |   9 +++
 include/qemu/job.h            |   5 ++
 block/backup.c                |  10 +++
 block/io.c                    |  11 +++
 block/mirror.c                |   9 +++
 block/nbd.c                   |  15 ++++
 block/raw-format.c            |   6 ++
 job.c                         |   3 +
 tests/qemu-iotests/264        | 143 ++++++++++++++++++++++------------
 tests/qemu-iotests/264.out    |  20 ++---
 tests/qemu-iotests/iotests.py |   6 +-
 12 files changed, 173 insertions(+), 67 deletions(-)

Comments

Eric Blake Nov. 18, 2020, 6:19 p.m. UTC | #1
On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
> [...]

Given that we're past -rc2, I think this is enough on the 'new feature'
side to defer into 6.0 rather than trying to claim as a bug-fix needed
for 5.2-rc3.

That said, the summary does make it sound like a worthwhile thing to add.
Vladimir Sementsov-Ogievskiy Nov. 18, 2020, 6:36 p.m. UTC | #2
18.11.2020 21:19, Eric Blake wrote:
> On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
>> [...]
> 
> Given that we're past -rc2, I think this is enough on the 'new feature'
> side to defer into 6.0 rather than trying to claim as a bug-fix needed
> for 5.2-rc3.

Of course.

> 
> That said, the summary does make it sound like a worthwhile thing to add.
>
Vladimir Sementsov-Ogievskiy Dec. 18, 2020, 11:05 a.m. UTC | #3
ping

18.11.2020 21:04, Vladimir Sementsov-Ogievskiy wrote:
> [...]
Vladimir Sementsov-Ogievskiy Jan. 9, 2021, 10:11 a.m. UTC | #4
ping ping

18.12.2020 14:05, Vladimir Sementsov-Ogievskiy wrote:
> ping
> 
> 18.11.2020 21:04, Vladimir Sementsov-Ogievskiy wrote:
>> [...]
Eric Blake Jan. 21, 2021, 2:14 a.m. UTC | #5
On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
> [...]

Because of my question on 4/11, I did not queue the entire series yet.
But 6/11 was trivial enough to queue now.
Vladimir Sementsov-Ogievskiy Jan. 21, 2021, 7:44 p.m. UTC | #6
21.01.2021 05:14, Eric Blake wrote:
> On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
>> [...]
> 
> Because of my question on 4/11, I did not queue the entire series yet.
> But 6/11 was trivial enough to queue now.
> 

Thanks!