mbox series

[v5,0/6] block/rbd: migrate to coroutines and add write zeroes support

Message ID 20210702172356.11574-1-idryomov@gmail.com (mailing list archive)
Headers show
Series block/rbd: migrate to coroutines and add write zeroes support | expand

Message

Ilya Dryomov July 2, 2021, 5:23 p.m. UTC
This series migrates the qemu rbd driver from the old aio emulation
to native coroutines and adds write zeroes support which is important
for block operations.

To achieve this we first bump the librbd requirement to the already
outdated luminous release of ceph to get rid of some wrappers and
ifdef'ry in the code.

V4->V5:
 - rebase on Kevin's block branch (https://repo.or.cz/qemu/kevin.git block)
   to resolve conflicts with "block/rbd: Add support for rbd image encryption"
   patch

V3->V4:
 - this patch is now rebased on top of current master
 - Patch 1: just mention librbd, tweak version numbers [Ilya]
 - Patch 3: use rbd_get_size instead of rbd_stat [Ilya]
 - Patch 4: retain comment about using a BH in the callback [Ilya]
 - Patch 5: set BDRV_REQ_NO_FALLBACK and silently ignore BDRV_REQ_MAY_UNMAP [Ilya]

V2->V3:
 - this patch is now rebased on top of current master
 - Patch 1: only use cc.links and not cc.run to not break
   cross-compiling. [Kevin]
   Since Qemu 6.1 its okay to rely on librbd >= 12.x since RHEL-7
   support was dropped [Daniel]
 - Patch 4: dropped
 - Patch 5: store BDS in RBDTask and use bdrv_get_aio_context() [Kevin]

V1->V2:
 - this patch is now rebased on top of current master with Paolos
   upcoming fixes for the meson.build script included:
    - meson: accept either shared or static libraries if --disable-static
    - meson: honor --enable-rbd if cc.links test fails
 - Patch 1: adjusted to meson.build script
 - Patch 2: unchanged
 - Patch 3: new patch
 - Patch 4: do not implement empty detach_aio_context callback [Jason]
 - Patch 5: - fix aio completion cleanup in error case [Jason]
            - return error codes from librbd
 - Patch 6: - add support for thick provisioning [Jason]
            - do not set write zeroes alignment
 - Patch 7: new patch

Peter Lieven (6):
  block/rbd: bump librbd requirement to luminous release
  block/rbd: store object_size in BDRVRBDState
  block/rbd: update s->image_size in qemu_rbd_getlength
  block/rbd: migrate from aio to coroutines
  block/rbd: add write zeroes support
  block/rbd: drop qemu_rbd_refresh_limits

 block/rbd.c | 406 ++++++++++++++++------------------------------------
 meson.build |   7 +-
 2 files changed, 128 insertions(+), 285 deletions(-)

Comments

Ilya Dryomov July 2, 2021, 7:55 p.m. UTC | #1
On Fri, Jul 2, 2021 at 7:24 PM Ilya Dryomov <idryomov@gmail.com> wrote:
>
> This series migrates the qemu rbd driver from the old aio emulation
> to native coroutines and adds write zeroes support which is important
> for block operations.
>
> To achieve this we first bump the librbd requirement to the already
> outdated luminous release of ceph to get rid of some wrappers and
> ifdef'ry in the code.
>
> V4->V5:
>  - rebase on Kevin's block branch (https://repo.or.cz/qemu/kevin.git block)
>    to resolve conflicts with "block/rbd: Add support for rbd image encryption"
>    patch
>
> V3->V4:
>  - this patch is now rebased on top of current master
>  - Patch 1: just mention librbd, tweak version numbers [Ilya]
>  - Patch 3: use rbd_get_size instead of rbd_stat [Ilya]
>  - Patch 4: retain comment about using a BH in the callback [Ilya]
>  - Patch 5: set BDRV_REQ_NO_FALLBACK and silently ignore BDRV_REQ_MAY_UNMAP [Ilya]
>
> V2->V3:
>  - this patch is now rebased on top of current master
>  - Patch 1: only use cc.links and not cc.run to not break
>    cross-compiling. [Kevin]
>    Since Qemu 6.1 its okay to rely on librbd >= 12.x since RHEL-7
>    support was dropped [Daniel]
>  - Patch 4: dropped
>  - Patch 5: store BDS in RBDTask and use bdrv_get_aio_context() [Kevin]
>
> V1->V2:
>  - this patch is now rebased on top of current master with Paolos
>    upcoming fixes for the meson.build script included:
>     - meson: accept either shared or static libraries if --disable-static
>     - meson: honor --enable-rbd if cc.links test fails
>  - Patch 1: adjusted to meson.build script
>  - Patch 2: unchanged
>  - Patch 3: new patch
>  - Patch 4: do not implement empty detach_aio_context callback [Jason]
>  - Patch 5: - fix aio completion cleanup in error case [Jason]
>             - return error codes from librbd
>  - Patch 6: - add support for thick provisioning [Jason]
>             - do not set write zeroes alignment
>  - Patch 7: new patch
>
> Peter Lieven (6):
>   block/rbd: bump librbd requirement to luminous release
>   block/rbd: store object_size in BDRVRBDState
>   block/rbd: update s->image_size in qemu_rbd_getlength
>   block/rbd: migrate from aio to coroutines
>   block/rbd: add write zeroes support
>   block/rbd: drop qemu_rbd_refresh_limits
>
>  block/rbd.c | 406 ++++++++++++++++------------------------------------
>  meson.build |   7 +-
>  2 files changed, 128 insertions(+), 285 deletions(-)
>
> --
> 2.19.2
>

Attempting to suit patchew:

Based-on: <20210627114635.39326-1-oro@il.ibm.com>
([PATCH v2] block/rbd: Add support for rbd image encryption)

Thanks,

                Ilya
Kevin Wolf July 6, 2021, 1:19 p.m. UTC | #2
Am 02.07.2021 um 19:23 hat Ilya Dryomov geschrieben:
> This series migrates the qemu rbd driver from the old aio emulation
> to native coroutines and adds write zeroes support which is important
> for block operations.
> 
> To achieve this we first bump the librbd requirement to the already
> outdated luminous release of ceph to get rid of some wrappers and
> ifdef'ry in the code.

Thanks, applied to the block branch.

I've only had a very quick look at the patches, but I think there is one
suggestion for a cleanup I can make: The qemu_rbd_finish_bh()
indirection is probably unnecessary now because aio_co_wake() is thread
safe.

(Also, if I were the responsible maintainer, I would prefer true/false
rather than 0/1 for bools, but that's minor. :-))

Kevin
Peter Lieven July 6, 2021, 2:55 p.m. UTC | #3
> Am 06.07.2021 um 15:19 schrieb Kevin Wolf <kwolf@redhat.com>:
> 
> Am 02.07.2021 um 19:23 hat Ilya Dryomov geschrieben:
>> This series migrates the qemu rbd driver from the old aio emulation
>> to native coroutines and adds write zeroes support which is important
>> for block operations.
>> 
>> To achieve this we first bump the librbd requirement to the already
>> outdated luminous release of ceph to get rid of some wrappers and
>> ifdef'ry in the code.
> 
> Thanks, applied to the block branch.
> 
> I've only had a very quick look at the patches, but I think there is one
> suggestion for a cleanup I can make: The qemu_rbd_finish_bh()
> indirection is probably unnecessary now because aio_co_wake() is thread
> safe.

But this is new, isn’t it?

We also have this indirection in iscsi and nfs drivers I think.

Does it matter that the completion callback is called from an librbd thread? Will the coroutine continue to run in the right thread?

I will have a decent look after my vacation.

Anyway, Thanks for applying,
Peter

> 
> (Also, if I were the responsible maintainer, I would prefer true/false
> rather than 0/1 for bools, but that's minor. :-))
> 
> Kevin
>
Kevin Wolf July 6, 2021, 3:25 p.m. UTC | #4
Am 06.07.2021 um 16:55 hat Peter Lieven geschrieben:
> > Am 06.07.2021 um 15:19 schrieb Kevin Wolf <kwolf@redhat.com>:
> > 
> > Am 02.07.2021 um 19:23 hat Ilya Dryomov geschrieben:
> >> This series migrates the qemu rbd driver from the old aio emulation
> >> to native coroutines and adds write zeroes support which is important
> >> for block operations.
> >> 
> >> To achieve this we first bump the librbd requirement to the already
> >> outdated luminous release of ceph to get rid of some wrappers and
> >> ifdef'ry in the code.
> > 
> > Thanks, applied to the block branch.
> > 
> > I've only had a very quick look at the patches, but I think there is one
> > suggestion for a cleanup I can make: The qemu_rbd_finish_bh()
> > indirection is probably unnecessary now because aio_co_wake() is thread
> > safe.
> 
> But this is new, isn’t it?

Not sure in what sense you mean. aio_co_wake() has always been thread
safe, as far as I know.

Obviously, the old code didn't use aio_co_wake(), but directly called
some callbacks, so the part that is new is your coroutine conversion
that enables getting rid of the BH.

> We also have this indirection in iscsi and nfs drivers I think.

Indeed, the resulting codes look the same. In iscsi and nfs it doesn't
come from an incomplete converstion to coroutines, but they both used
qemu_coroutine_enter() originally, which resumes the coroutine in the
current thread...

> Does it matter that the completion callback is called from an librbd
> thread? Will the coroutine continue to run in the right thread?

...whereas aio_co_wake() resumes the coroutine in the thread where it
was running before.

(Before commit 5f50be9b5, this would have been buggy from an librbd
thread, but now it should work correctly even for threads that are
neither iothreads nor vcpu threads.)

> I will have a decent look after my vacation.

Sounds good, thanks. Enjoy your vacation!

Kevin
Peter Lieven July 6, 2021, 3:48 p.m. UTC | #5
> Am 06.07.2021 um 17:25 schrieb Kevin Wolf <kwolf@redhat.com>:
> 
> Am 06.07.2021 um 16:55 hat Peter Lieven geschrieben:
>>>> Am 06.07.2021 um 15:19 schrieb Kevin Wolf <kwolf@redhat.com>:
>>> 
>>> Am 02.07.2021 um 19:23 hat Ilya Dryomov geschrieben:
>>>> This series migrates the qemu rbd driver from the old aio emulation
>>>> to native coroutines and adds write zeroes support which is important
>>>> for block operations.
>>>> 
>>>> To achieve this we first bump the librbd requirement to the already
>>>> outdated luminous release of ceph to get rid of some wrappers and
>>>> ifdef'ry in the code.
>>> 
>>> Thanks, applied to the block branch.
>>> 
>>> I've only had a very quick look at the patches, but I think there is one
>>> suggestion for a cleanup I can make: The qemu_rbd_finish_bh()
>>> indirection is probably unnecessary now because aio_co_wake() is thread
>>> safe.
>> 
>> But this is new, isn’t it?
> 
> Not sure in what sense you mean. aio_co_wake() has always been thread
> safe, as far as I know.
> 
> Obviously, the old code didn't use aio_co_wake(), but directly called
> some callbacks, so the part that is new is your coroutine conversion
> that enables getting rid of the BH.
> 
>> We also have this indirection in iscsi and nfs drivers I think.
> 
> Indeed, the resulting codes look the same. In iscsi and nfs it doesn't
> come from an incomplete converstion to coroutines, but they both used
> qemu_coroutine_enter() originally, which resumes the coroutine in the
> current thread...

If I remember correctly this would also serialize requests and thus we used BHs. libnfs and libiscsi are not thread safe as well and they completely run in qemus threads so this wasn’t the original reason.

Thanks for the hints to the relevant commit.

I will send a follow up for rbd/nfs/iscsi in about 2 weeks.

Peter

> 
>> Does it matter that the completion callback is called from an librbd
>> thread? Will the coroutine continue to run in the right thread?
> 
> ...whereas aio_co_wake() resumes the coroutine in the thread where it
> was running before.
> 
> (Before commit 5f50be9b5, this would have been buggy from an librbd
> thread, but now it should work correctly even for threads that are
> neither iothreads nor vcpu threads.)
> 
>> I will have a decent look after my vacation.
> 
> Sounds good, thanks. Enjoy your vacation!
> 
> Kevin
>
Peter Lieven July 7, 2021, 6:13 p.m. UTC | #6
> Am 06.07.2021 um 17:25 schrieb Kevin Wolf <kwolf@redhat.com>:
> 
> Am 06.07.2021 um 16:55 hat Peter Lieven geschrieben:
>>> Am 06.07.2021 um 15:19 schrieb Kevin Wolf <kwolf@redhat.com>:
>>> 
>>> Am 02.07.2021 um 19:23 hat Ilya Dryomov geschrieben:
>>>> This series migrates the qemu rbd driver from the old aio emulation
>>>> to native coroutines and adds write zeroes support which is important
>>>> for block operations.
>>>> 
>>>> To achieve this we first bump the librbd requirement to the already
>>>> outdated luminous release of ceph to get rid of some wrappers and
>>>> ifdef'ry in the code.
>>> 
>>> Thanks, applied to the block branch.
>>> 
>>> I've only had a very quick look at the patches, but I think there is one
>>> suggestion for a cleanup I can make: The qemu_rbd_finish_bh()
>>> indirection is probably unnecessary now because aio_co_wake() is thread
>>> safe.
>> 
>> But this is new, isn’t it?
> 
> Not sure in what sense you mean. aio_co_wake() has always been thread
> safe, as far as I know.
> 
> Obviously, the old code didn't use aio_co_wake(), but directly called
> some callbacks, so the part that is new is your coroutine conversion
> that enables getting rid of the BH.
> 
>> We also have this indirection in iscsi and nfs drivers I think.
> 
> Indeed, the resulting codes look the same. In iscsi and nfs it doesn't
> come from an incomplete converstion to coroutines, but they both used
> qemu_coroutine_enter() originally, which resumes the coroutine in the
> current thread...
> 
>> Does it matter that the completion callback is called from an librbd
>> thread? Will the coroutine continue to run in the right thread?
> 
> ...whereas aio_co_wake() resumes the coroutine in the thread where it
> was running before.
> 
> (Before commit 5f50be9b5, this would have been buggy from an librbd
> thread, but now it should work correctly even for threads that are
> neither iothreads nor vcpu threads.)
> 
>> I will have a decent look after my vacation.
> 
> Sounds good, thanks. Enjoy your vacation!


As I had to fire up my laptop to look into another issue anyway, I have sent two patches for updating MAINTAINERS and to fix the int vs. bool mix for task->complete. As Paolos fix (5f50be9b5) is relatively new and there are maybe other non obvious problems when removing the BH indirection and we are close to soft freeze I would leave the BH removal change for 6.2.

Best,
Peter
Kevin Wolf July 8, 2021, 12:18 p.m. UTC | #7
Am 07.07.2021 um 20:13 hat Peter Lieven geschrieben:
> Am 06.07.2021 um 17:25 schrieb Kevin Wolf <kwolf@redhat.com>:
> > Am 06.07.2021 um 16:55 hat Peter Lieven geschrieben:
> >> I will have a decent look after my vacation.
> > 
> > Sounds good, thanks. Enjoy your vacation!
> 
> As I had to fire up my laptop to look into another issue anyway, I
> have sent two patches for updating MAINTAINERS and to fix the int vs.
> bool mix for task->complete.

I think you need to reevaluate your definition of vacation. ;-)

But thanks anyway.

> As Paolos fix (5f50be9b5) is relatively new and there are maybe other
> non obvious problems when removing the BH indirection and we are close
> to soft freeze I would leave the BH removal change for 6.2.

Sure, code cleanups aren't urgent.

Kevin
Peter Lieven July 8, 2021, 6:23 p.m. UTC | #8
> Am 08.07.2021 um 14:18 schrieb Kevin Wolf <kwolf@redhat.com>:
> 
> Am 07.07.2021 um 20:13 hat Peter Lieven geschrieben:
>>> Am 06.07.2021 um 17:25 schrieb Kevin Wolf <kwolf@redhat.com>:
>>> Am 06.07.2021 um 16:55 hat Peter Lieven geschrieben:
>>>> I will have a decent look after my vacation.
>>> 
>>> Sounds good, thanks. Enjoy your vacation!
>> 
>> As I had to fire up my laptop to look into another issue anyway, I
>> have sent two patches for updating MAINTAINERS and to fix the int vs.
>> bool mix for task->complete.
> 
> I think you need to reevaluate your definition of vacation. ;-)

Lets talk about this when the kids are grown up. Sometimes sending patches can be quite relaxing :-)

> 
> But thanks anyway.
> 
>> As Paolos fix (5f50be9b5) is relatively new and there are maybe other
>> non obvious problems when removing the BH indirection and we are close
>> to soft freeze I would leave the BH removal change for 6.2.
> 
> Sure, code cleanups aren't urgent.

Isn’t the indirection also a slight performance drop?

Peter
Kevin Wolf July 9, 2021, 10:21 a.m. UTC | #9
Am 08.07.2021 um 20:23 hat Peter Lieven geschrieben:
> Am 08.07.2021 um 14:18 schrieb Kevin Wolf <kwolf@redhat.com>:
> > Am 07.07.2021 um 20:13 hat Peter Lieven geschrieben:
> >>> Am 06.07.2021 um 17:25 schrieb Kevin Wolf <kwolf@redhat.com>:
> >>> Am 06.07.2021 um 16:55 hat Peter Lieven geschrieben:
> >>>> I will have a decent look after my vacation.
> >>> 
> >>> Sounds good, thanks. Enjoy your vacation!
> >> 
> >> As I had to fire up my laptop to look into another issue anyway, I
> >> have sent two patches for updating MAINTAINERS and to fix the int vs.
> >> bool mix for task->complete.
> > 
> > I think you need to reevaluate your definition of vacation. ;-)
> 
> Lets talk about this when the kids are grown up. Sometimes sending
> patches can be quite relaxing :-)

Heh, fair enough. :-)

> > But thanks anyway.
> > 
> >> As Paolos fix (5f50be9b5) is relatively new and there are maybe other
> >> non obvious problems when removing the BH indirection and we are close
> >> to soft freeze I would leave the BH removal change for 6.2.
> > 
> > Sure, code cleanups aren't urgent.
> 
> Isn’t the indirection also a slight performance drop?

Yeah, I guess technically it is, though I doubt it's measurable.

Kevin
Peter Lieven Sept. 16, 2021, 12:34 p.m. UTC | #10
Am 09.07.21 um 12:21 schrieb Kevin Wolf:
> Am 08.07.2021 um 20:23 hat Peter Lieven geschrieben:
>> Am 08.07.2021 um 14:18 schrieb Kevin Wolf <kwolf@redhat.com>:
>>> Am 07.07.2021 um 20:13 hat Peter Lieven geschrieben:
>>>>> Am 06.07.2021 um 17:25 schrieb Kevin Wolf <kwolf@redhat.com>:
>>>>> Am 06.07.2021 um 16:55 hat Peter Lieven geschrieben:
>>>>>> I will have a decent look after my vacation.
>>>>> Sounds good, thanks. Enjoy your vacation!
>>>> As I had to fire up my laptop to look into another issue anyway, I
>>>> have sent two patches for updating MAINTAINERS and to fix the int vs.
>>>> bool mix for task->complete.
>>> I think you need to reevaluate your definition of vacation. ;-)
>> Lets talk about this when the kids are grown up. Sometimes sending
>> patches can be quite relaxing :-)
> Heh, fair enough. :-)
>
>>> But thanks anyway.
>>>
>>>> As Paolos fix (5f50be9b5) is relatively new and there are maybe other
>>>> non obvious problems when removing the BH indirection and we are close
>>>> to soft freeze I would leave the BH removal change for 6.2.
>>> Sure, code cleanups aren't urgent.
>> Isn’t the indirection also a slight performance drop?
> Yeah, I guess technically it is, though I doubt it's measurable.


As promised I was trying to remove the indirection through the BH after Qemu 6.1 release.

However, if I remove the BH I run into the following assertion while running some fio tests:


qemu-system-x86_64: ../block/block-backend.c:1197: blk_wait_while_drained: Assertion `blk->in_flight > 0' failed.


Any idea?


This is what I changed:


diff --git a/block/rbd.c b/block/rbd.c
index 3cb24f9981..bc1dbc20f7 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1063,13 +1063,6 @@ static int qemu_rbd_resize(BlockDriverState *bs, uint64_t size)
      return 0;
  }

-static void qemu_rbd_finish_bh(void *opaque)
-{
-    RBDTask *task = opaque;
-    task->complete = true;
-    aio_co_wake(task->co);
-}
-
  /*
   * This is the completion callback function for all rbd aio calls
   * started from qemu_rbd_start_co().
@@ -1083,8 +1076,8 @@ static void qemu_rbd_completion_cb(rbd_completion_t c, RBDTask *task)
  {
      task->ret = rbd_aio_get_return_value(c);
      rbd_aio_release(c);
-    aio_bh_schedule_oneshot(bdrv_get_aio_context(task->bs),
-                            qemu_rbd_finish_bh, task);
+    task->complete = true;
+    aio_co_wake(task->co);
  }


Peter