Message ID: 20201118180433.11931-1-vsementsov@virtuozzo.com (mailing list archive)
Series: mirror: cancel nbd reconnect
On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
>
> The problem
>
> Assume we have a mirror job with an NBD target node that has reconnect
> enabled. The connection fails, so all current requests to the NBD node
> wait for the driver to reconnect, up to the reconnect-delay time
> specified in the NBD blockdev options. This timeout may be long; in
> Virtuozzo, for example, we use 300 seconds by default.
>
> So, if at this moment the user tries to cancel the job, the job will
> wait up to 300 seconds for its in-flight requests to finish. From the
> user's point of view, cancelling the job takes a long time. Bad.
>
> Solution
>
> Let's just cancel the "waiting for reconnect" in in-flight request
> coroutines on mirror (and backup) cancel. Welcome the series below.

Given that we're past -rc2, I think this is enough on the 'new feature'
side to defer to 6.0 rather than trying to claim it as a bug fix needed
for 5.2-rc3.

That said, the summary does make it sound like a worthwhile thing to add.
18.11.2020 21:19, Eric Blake wrote:
> On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> [...]
>
> Given that we're past -rc2, I think this is enough on the 'new feature'
> side to defer to 6.0 rather than trying to claim it as a bug fix needed
> for 5.2-rc3.

Of course.

> That said, the summary does make it sound like a worthwhile thing to add.
ping

18.11.2020 21:04, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
>
> The problem
>
> Assume we have a mirror job with an NBD target node that has reconnect
> enabled. The connection fails, so all current requests to the NBD node
> wait for the driver to reconnect, up to the reconnect-delay time
> specified in the NBD blockdev options. This timeout may be long; in
> Virtuozzo, for example, we use 300 seconds by default.
>
> So, if at this moment the user tries to cancel the job, the job will
> wait up to 300 seconds for its in-flight requests to finish. From the
> user's point of view, cancelling the job takes a long time. Bad.
>
> Solution
>
> Let's just cancel the "waiting for reconnect" in in-flight request
> coroutines on mirror (and backup) cancel. Welcome the series below.
>
> Vladimir Sementsov-Ogievskiy (11):
>   block: add new BlockDriver handler: bdrv_cancel_in_flight
>   block/nbd: implement .bdrv_cancel_in_flight
>   block/raw-format: implement .bdrv_cancel_in_flight handler
>   job: add .cancel handler for the driver
>   block/mirror: implement .cancel job handler
>   iotests/264: fix style
>   iotests/264: move to python unittest
>   iotests.py: qemu_nbd_popen: remove pid file after use
>   iotests/264: add mirror-cancel test-case
>   block/backup: implement .cancel job handler
>   iotests/264: add backup-cancel test-case
>
>  include/block/block.h         |   3 +
>  include/block/block_int.h     |   9 +++
>  include/qemu/job.h            |   5 ++
>  block/backup.c                |  10 +++
>  block/io.c                    |  11 +++
>  block/mirror.c                |   9 +++
>  block/nbd.c                   |  15 ++++
>  block/raw-format.c            |   6 ++
>  job.c                         |   3 +
>  tests/qemu-iotests/264        | 143 ++++++++++++++++++++++------------
>  tests/qemu-iotests/264.out    |  20 ++---
>  tests/qemu-iotests/iotests.py |   6 +-
>  12 files changed, 173 insertions(+), 67 deletions(-)
ping

18.12.2020 14:05, Vladimir Sementsov-Ogievskiy wrote:
> ping
>
> 18.11.2020 21:04, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> [...]
On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
>
> [...]
>
> Let's just cancel the "waiting for reconnect" in in-flight request
> coroutines on mirror (and backup) cancel. Welcome the series below.

Because of my question on 4/11, I did not queue the entire series yet.
But 6/11 was trivial enough to queue now.
21.01.2021 05:14, Eric Blake wrote:
> On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> [...]
>
> Because of my question on 4/11, I did not queue the entire series yet.
> But 6/11 was trivial enough to queue now.

Thanks!