mbox series

[v3,00/15] rbd journaling feature

Message ID 1564393377-28949-1-git-send-email-dongsheng.yang@easystack.cn (mailing list archive)
Headers show
Series rbd journaling feature | expand

Message

Dongsheng Yang July 29, 2019, 9:42 a.m. UTC
Hi Ilya, Jason and all:
      	As new exclusive-lock is merged, I think we can start this work now.
This is V3, which is rebased against 5.3-rc1.

Testing:
	kernel branch: https://github.com/yangdongsheng/linux/tree/krbd_journaling_v3
	ceph qa branch: https://github.com/yangdongsheng/ceph/tree/krbd_mirror_qa

	(1). A new test added: workunits/rbd/kernel_journal.sh: to test the journal replaying in krbd.
	(2). A new test added: qa/suites/krbd/mirror/, this test krbd journaling with rbd-mirror daemon.

Performance:
        compared with librbd journaling, preformance of krbd journaling looks reasonable.
        -------------------------------------------------------------------------------------
        (1) rbd bench with journaling disabled:         |       IOPS: 114
        -------------------------------------------------------------------------------------
        (2) rbd bench with journaling enabled:          |       IOPS: 55
        -------------------------------------------------------------------------------------
        (3) fio krbd with journaling disabled:          |       IOPS: 118
        -------------------------------------------------------------------------------------
        (4) fio krbd with journaling enabled:           |       IOPS: 57
        -------------------------------------------------------------------------------------

TODO:
	(1). there are some TODOs in comments, such as supporting rbd_journal_max_concurrent_object_sets.
	(2). add debugfs for generic journaling.

	I would like to put this TODO work in next cycle, but focus on making  the current work ready to go.

Changelog:
	-V2
		1. support large event (> 4M)
		2. fix bug in replay in different clients appending
		3. rebase against 5.3-rc1
		4. refactor journaler appending into state machine
        -V1
                1. add test case in qa
                2. address all memleak found in kmemleak
                3. several bug fixes
                4. performance improvement.
        -RFC
                1. error out if there is some unsupported event type in replaying
                2. just one memory copy from bio to msg.
                3. use async IO in journal appending.
                4. no mutex around IO.

Any comments are welcome!!

Dongsheng Yang (15):
  libceph: introduce ceph_extract_encoded_string_kvmalloc
  libceph: introduce a new parameter of workqueue in ceph_osdc_watch()
  libceph: support op append
  libceph: add prefix and suffix in ceph_osd_req_op.extent
  libceph: introduce cls_journaler_client
  libceph: introduce generic journaling
  libceph: journaling: introduce api to replay uncommitted journal
    events
  libceph: journaling: introduce api for journal appending
  libceph: journaling: trim object set when we found there is no client
    refer it
  rbd: introduce completion for each img_request
  rbd: introduce IMG_REQ_NOLOCK flag for image request state
  rbd: introduce rbd_journal_allocate_tag to allocate journal tag for
    rbd client
  rbd: append journal event in image request state machine
  rbd: replay events in journal
  rbd: add support for feature of RBD_FEATURE_JOURNALING

 drivers/block/rbd.c                       |  600 +++++++-
 include/linux/ceph/cls_journaler_client.h |   94 ++
 include/linux/ceph/decode.h               |   21 +-
 include/linux/ceph/journaler.h            |  184 +++
 include/linux/ceph/osd_client.h           |   19 +
 net/ceph/Makefile                         |    3 +-
 net/ceph/cls_journaler_client.c           |  558 ++++++++
 net/ceph/journaler.c                      | 2231 +++++++++++++++++++++++++++++
 net/ceph/osd_client.c                     |   61 +-
 9 files changed, 3759 insertions(+), 12 deletions(-)
 create mode 100644 include/linux/ceph/cls_journaler_client.h
 create mode 100644 include/linux/ceph/journaler.h
 create mode 100644 net/ceph/cls_journaler_client.c
 create mode 100644 net/ceph/journaler.c

Comments

Ilya Dryomov Aug. 18, 2019, 7:28 p.m. UTC | #1
On Mon, Jul 29, 2019 at 11:43 AM Dongsheng Yang
<dongsheng.yang@easystack.cn> wrote:
>
> Hi Ilya, Jason and all:
>         As new exclusive-lock is merged, I think we can start this work now.
> This is V3, which is rebased against 5.3-rc1.
>
> Testing:
>         kernel branch: https://github.com/yangdongsheng/linux/tree/krbd_journaling_v3
>         ceph qa branch: https://github.com/yangdongsheng/ceph/tree/krbd_mirror_qa
>
>         (1). A new test added: workunits/rbd/kernel_journal.sh: to test the journal replaying in krbd.
>         (2). A new test added: qa/suites/krbd/mirror/, this test krbd journaling with rbd-mirror daemon.
>
> Performance:
>         compared with librbd journaling, preformance of krbd journaling looks reasonable.
>         -------------------------------------------------------------------------------------
>         (1) rbd bench with journaling disabled:         |       IOPS: 114
>         -------------------------------------------------------------------------------------
>         (2) rbd bench with journaling enabled:          |       IOPS: 55
>         -------------------------------------------------------------------------------------
>         (3) fio krbd with journaling disabled:          |       IOPS: 118
>         -------------------------------------------------------------------------------------
>         (4) fio krbd with journaling enabled:           |       IOPS: 57
>         -------------------------------------------------------------------------------------
>
> TODO:
>         (1). there are some TODOs in comments, such as supporting rbd_journal_max_concurrent_object_sets.
>         (2). add debugfs for generic journaling.
>
>         I would like to put this TODO work in next cycle, but focus on making  the current work ready to go.
>
> Changelog:
>         -V2
>                 1. support large event (> 4M)
>                 2. fix bug in replay in different clients appending
>                 3. rebase against 5.3-rc1
>                 4. refactor journaler appending into state machine
>         -V1
>                 1. add test case in qa
>                 2. address all memleak found in kmemleak
>                 3. several bug fixes
>                 4. performance improvement.
>         -RFC
>                 1. error out if there is some unsupported event type in replaying
>                 2. just one memory copy from bio to msg.
>                 3. use async IO in journal appending.
>                 4. no mutex around IO.
>
> Any comments are welcome!!
>
> Dongsheng Yang (15):
>   libceph: introduce ceph_extract_encoded_string_kvmalloc
>   libceph: introduce a new parameter of workqueue in ceph_osdc_watch()
>   libceph: support op append
>   libceph: add prefix and suffix in ceph_osd_req_op.extent
>   libceph: introduce cls_journaler_client
>   libceph: introduce generic journaling
>   libceph: journaling: introduce api to replay uncommitted journal
>     events
>   libceph: journaling: introduce api for journal appending
>   libceph: journaling: trim object set when we found there is no client
>     refer it
>   rbd: introduce completion for each img_request
>   rbd: introduce IMG_REQ_NOLOCK flag for image request state
>   rbd: introduce rbd_journal_allocate_tag to allocate journal tag for
>     rbd client
>   rbd: append journal event in image request state machine
>   rbd: replay events in journal
>   rbd: add support for feature of RBD_FEATURE_JOURNALING
>
>  drivers/block/rbd.c                       |  600 +++++++-
>  include/linux/ceph/cls_journaler_client.h |   94 ++
>  include/linux/ceph/decode.h               |   21 +-
>  include/linux/ceph/journaler.h            |  184 +++
>  include/linux/ceph/osd_client.h           |   19 +
>  net/ceph/Makefile                         |    3 +-
>  net/ceph/cls_journaler_client.c           |  558 ++++++++
>  net/ceph/journaler.c                      | 2231 +++++++++++++++++++++++++++++
>  net/ceph/osd_client.c                     |   61 +-
>  9 files changed, 3759 insertions(+), 12 deletions(-)
>  create mode 100644 include/linux/ceph/cls_journaler_client.h
>  create mode 100644 include/linux/ceph/journaler.h
>  create mode 100644 net/ceph/cls_journaler_client.c
>  create mode 100644 net/ceph/journaler.c

Hi Dongsheng,

Some general comments that apply to the whole series:

- comments should look like

  /* comment */

  /*
   * multi-line
   * comment
   */

- placement of braces: a) don't use braces around single statements
  (everywhere) and b) functions should have the opening brace on the
  next line (e.g. rbd_img_need_journal())

- 80 column limit: we aren't very strict about it, but overly long
  lines should be the exception, not the rule

- unnecessary forward declarations: just place the new function above
  the call-site

- sizeof(struct foo) should be sizeof(*foo_ptr)

- integer types: use u{8,16,32,64} instead of uint{8,16,32,64}_t

- static const variables should be defines (e.g. PREAMBLE)

- no need to initialize fields to 0, NULL, etc after kzalloc() or
  similar (e.g. ceph_journaler_open())

Many of these rules are in Documentation/process/coding-style.rst.

Lastly, I would drop replaying for now.  This is a large series and
replaying amounts to at least a quarter of it without actually solving
the problem in its entirety.  Let's try to get appending and trimming
in shape first.

Thanks,

                Ilya
Dongsheng Yang Aug. 26, 2019, 2:53 a.m. UTC | #2
On 08/19/2019 03:28 AM, Ilya Dryomov wrote:
> On Mon, Jul 29, 2019 at 11:43 AM Dongsheng Yang
> <dongsheng.yang@easystack.cn>  wrote:
>> Hi Ilya, Jason and all:
>>          As new exclusive-lock is merged, I think we can start this work now.
>> This is V3, which is rebased against 5.3-rc1.
>>
>> Testing:
>>          kernel branch:https://github.com/yangdongsheng/linux/tree/krbd_journaling_v3
>>          ceph qa branch:https://github.com/yangdongsheng/ceph/tree/krbd_mirror_qa
>>
>>          (1). A new test added: workunits/rbd/kernel_journal.sh: to test the journal replaying in krbd.
>>          (2). A new test added: qa/suites/krbd/mirror/, this test krbd journaling with rbd-mirror daemon.
>>
>> Performance:
>>          compared with librbd journaling, preformance of krbd journaling looks reasonable.
>>          -------------------------------------------------------------------------------------
>>          (1) rbd bench with journaling disabled:         |       IOPS: 114
>>          -------------------------------------------------------------------------------------
>>          (2) rbd bench with journaling enabled:          |       IOPS: 55
>>          -------------------------------------------------------------------------------------
>>          (3) fio krbd with journaling disabled:          |       IOPS: 118
>>          -------------------------------------------------------------------------------------
>>          (4) fio krbd with journaling enabled:           |       IOPS: 57
>>          -------------------------------------------------------------------------------------
>>
>> TODO:
>>          (1). there are some TODOs in comments, such as supporting rbd_journal_max_concurrent_object_sets.
>>          (2). add debugfs for generic journaling.
>>
>>          I would like to put this TODO work in next cycle, but focus on making  the current work ready to go.
>>
>> Changelog:
>>          -V2
>>                  1. support large event (> 4M)
>>                  2. fix bug in replay in different clients appending
>>                  3. rebase against 5.3-rc1
>>                  4. refactor journaler appending into state machine
>>          -V1
>>                  1. add test case in qa
>>                  2. address all memleak found in kmemleak
>>                  3. several bug fixes
>>                  4. performance improvement.
>>          -RFC
>>                  1. error out if there is some unsupported event type in replaying
>>                  2. just one memory copy from bio to msg.
>>                  3. use async IO in journal appending.
>>                  4. no mutex around IO.
>>
>> Any comments are welcome!!
>>
>> Dongsheng Yang (15):
>>    libceph: introduce ceph_extract_encoded_string_kvmalloc
>>    libceph: introduce a new parameter of workqueue in ceph_osdc_watch()
>>    libceph: support op append
>>    libceph: add prefix and suffix in ceph_osd_req_op.extent
>>    libceph: introduce cls_journaler_client
>>    libceph: introduce generic journaling
>>    libceph: journaling: introduce api to replay uncommitted journal
>>      events
>>    libceph: journaling: introduce api for journal appending
>>    libceph: journaling: trim object set when we found there is no client
>>      refer it
>>    rbd: introduce completion for each img_request
>>    rbd: introduce IMG_REQ_NOLOCK flag for image request state
>>    rbd: introduce rbd_journal_allocate_tag to allocate journal tag for
>>      rbd client
>>    rbd: append journal event in image request state machine
>>    rbd: replay events in journal
>>    rbd: add support for feature of RBD_FEATURE_JOURNALING
>>
>>   drivers/block/rbd.c                       |  600 +++++++-
>>   include/linux/ceph/cls_journaler_client.h |   94 ++
>>   include/linux/ceph/decode.h               |   21 +-
>>   include/linux/ceph/journaler.h            |  184 +++
>>   include/linux/ceph/osd_client.h           |   19 +
>>   net/ceph/Makefile                         |    3 +-
>>   net/ceph/cls_journaler_client.c           |  558 ++++++++
>>   net/ceph/journaler.c                      | 2231 +++++++++++++++++++++++++++++
>>   net/ceph/osd_client.c                     |   61 +-
>>   9 files changed, 3759 insertions(+), 12 deletions(-)
>>   create mode 100644 include/linux/ceph/cls_journaler_client.h
>>   create mode 100644 include/linux/ceph/journaler.h
>>   create mode 100644 net/ceph/cls_journaler_client.c
>>   create mode 100644 net/ceph/journaler.c
> Hi Dongsheng,
>
> Some general comments that apply to the whole series:
>
> - comments should look like
>
>    /* comment */
>
>    /*
>     * multi-line
>     * comment
>     */
>
> - placement of braces: a) don't use braces around single statements
>    (everywhere) and b) functions should have the opening brace on the
>    next line (e.g. rbd_img_need_journal())
>
> - 80 column limit: we aren't very strict about it, but overly long
>    lines should be the exception, not the rule
>
> - unnecessary forward declarations: just place the new function above
>    the call-site
>
> - sizeof(struct foo) should be sizeof(*foo_ptr)
>
> - integer types: use u{8,16,32,64} instead of uint{8,16,32,64}_t
>
> - static const variables should be defines (e.g. PREAMBLE)
>
> - no need to initialize fields to 0, NULL, etc after kzalloc() or
>    similar (e.g. ceph_journaler_open())
>
> Many of these rules are in Documentation/process/coding-style.rst.

Hi Ilya,
     Thanx for your suggestion about coding-style above, I will check my 
code again.
> Lastly, I would drop replaying for now.  This is a large series and
> replaying amounts to at least a quarter of it without actually solving
> the problem in its entirety.  Let's try to get appending and trimming
> in shape first.

I would like to keep replaying:
(1) The current replaying at least cover the all events generated from 
kernel rbd. So this feature looks self-consistent
I think.

That means, if you are only using krbd in your usecase, krbd journaling 
would works well. But if we drop replaying, I
am afraid we can't use journaling at all even when we are going to use 
krbd only.

(2) Even if we don't do replaying, we still have to do journal fetching 
and preprocess, to
know is there any uncommitted journal event. That's still a not small work.

(3) About solving the problem entirely, I actually had a plan in my mind.
we can provide an map option to user,
maybe "rbd map IMAGE -o 
journal-replay-helper=/usr/bin/rbd-journal-replay-helper.sh"

If this option is specified, we will use call_usermodehelper() to call 
the helper specified to do replay in journal opening.

else, we can use default replaying which is provided in this patch set.

Thanx
> Thanks,
>
>                  Ilya
>