From patchwork Thu Aug 16 05:59:28 2018
X-Patchwork-Submitter: Dongsheng Yang
X-Patchwork-Id: 10566989
From: Dongsheng Yang
To: idryomov@gmail.com, sage@redhat.com, elder@kernel.org, jdillama@redhat.com
Cc: ceph-devel@vger.kernel.org, dongsheng.yang@easystack.cn
Subject: [RFC PATCH 0/4] rbd journaling feature
Date: Thu, 16 Aug 2018 01:59:28 -0400
Message-Id: <1534399172-27610-1-git-send-email-dongsheng.yang@easystack.cn>
X-Mailing-List: ceph-devel@vger.kernel.org

Hi all,

This patchset implements the journaling feature in kernel rbd, which makes
mirroring in Kubernetes possible. It is an RFC patchset, and it passes
/ceph/ceph/qa/workunits/rbd/rbd_mirror.sh with a small change, as below:

```
[root@atest-guest build]# git diff /ceph/ceph/qa/workunits/rbd/rbd_mirror_helpers.sh
```

This shows that the patchset mirrors data correctly. There are still some
TODOs in the comments, but most of them concern performance improvements,
so I think this is a good time to ask all of you for comments.

If you want to play with it, here is a simple script that mirrors an XFS
filesystem:

```
# Start two local clusters: "remote" and "local"
./mstart.sh remote -k -l --bluestore
./mstart.sh local -k -l --bluestore

# Recreate the rbd pool on both clusters
rados -c ./run/local/ceph.conf rmpool rbd rbd --yes-i-really-really-mean-it
rados -c ./run/remote/ceph.conf rmpool rbd rbd --yes-i-really-really-mean-it
rados -c ./run/local/ceph.conf mkpool rbd
rados -c ./run/remote/ceph.conf mkpool rbd

# Enable per-image mirroring and register each cluster as the other's peer
rbd -c ./run/local/ceph.conf mirror pool enable rbd image
rbd -c ./run/remote/ceph.conf mirror pool enable rbd image
rbd -c ./run/local/ceph.conf mirror pool peer add rbd client.admin@remote
rbd -c ./run/remote/ceph.conf mirror pool peer add rbd client.admin@local
rbd -c ./run/remote/ceph.conf mirror pool info rbd
rbd -c ./run/local/ceph.conf mirror pool info rbd

# Create a journaled image on the local cluster and enable mirroring for it
rbd -c ./run/local/ceph.conf create test --image-feature layering --image-feature exclusive-lock --image-feature journaling -s 100M
rbd -c ./run/local/ceph.conf mirror image enable test
rbd -c ./run/remote/ceph.conf ls

# Write data through the kernel client and record its checksum
rbd -c ./run/local/ceph.conf map test
mkfs.xfs /dev/rbd0
mkdir -p /mnt/rbd0
mount /dev/rbd0 /mnt/rbd0
dd if=/dev/urandom of=/mnt/rbd0/data bs=128K count=1
md5sum /mnt/rbd0/data
sync
data_1=$(md5sum /mnt/rbd0/data | awk '{print $1}')
umount /mnt/rbd0
rbd unmap /dev/rbd0

# Wait until the image shows up on the remote cluster
until rbd -c ./run/remote/ceph.conf ls | grep test; do
	sleep 1
done

# Fail over: demote the local image, then promote the remote one
rbd -c ./run/local/ceph.conf mirror image demote test
sleep 3
rbd -c ./run/remote/ceph.conf mirror image promote test

# Read the data back from the remote cluster and compare checksums
rbd -c ./run/remote/ceph.conf map test
mount /dev/rbd0 /mnt/rbd0
md5sum /mnt/rbd0/data
data_2=$(md5sum /mnt/rbd0/data | awk '{print $1}')
echo data_1: $data_1
echo data_2: $data_2
# md5 digests are strings, so use a string comparison, not (( ))
if [ "$data_1" != "$data_2" ]; then
	echo "failed"
else
	echo "pass"
fi
umount /mnt/rbd0
rbd unmap /dev/rbd0
exit
```

Any comment is welcome!
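
A note on the final check in the script above: md5 digests are hex strings,
so they should be compared with a string test such as `[ "$a" != "$b" ]`,
because `(( ))` evaluates its operands arithmetically. Below is a minimal
sketch of two helpers that could be used for this. The names compare_digests
and wait_for are hypothetical and not part of the patchset. wait_for also
puts an upper bound on the wait for the image to appear on the remote
cluster, which a plain `until` loop does not.

```shell
#!/bin/bash
# Hypothetical helper (not part of the patchset): compare two md5 digests
# as strings and print "pass" or "failed".
compare_digests() {
    local a=$1 b=$2
    if [ "$a" != "$b" ]; then
        echo "failed"
        return 1
    fi
    echo "pass"
}

# Hypothetical helper: run a command up to $1 times, one second apart,
# and give up with a non-zero status if it never succeeds.
wait_for() {
    local tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        "$@" && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

For example, `wait_for 60 sh -c 'rbd -c ./run/remote/ceph.conf ls | grep -q test'`
would give up after roughly a minute instead of looping forever, and
`compare_digests "$data_1" "$data_2"` would replace the final if/else.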

Dongsheng Yang (4):
  libceph: support op append
  libceph: introduce cls_journaler_client
  libceph: introduce generic journaling
  rbd: enable journaling

 drivers/block/rbd.c                       |  478 +++++++++++-
 include/linux/ceph/cls_journaler_client.h |   87 +++
 include/linux/ceph/journaler.h            |  131 ++++
 net/ceph/Makefile                         |    3 +-
 net/ceph/cls_journaler_client.c           |  501 ++++++++++++
 net/ceph/journaler.c                      | 1208 +++++++++++++++++++++++++++++
 net/ceph/osd_client.c                     |   13 +-
 7 files changed, 2409 insertions(+), 12 deletions(-)
 create mode 100644 include/linux/ceph/cls_journaler_client.h
 create mode 100644 include/linux/ceph/journaler.h
 create mode 100644 net/ceph/cls_journaler_client.c
 create mode 100644 net/ceph/journaler.c

diff --git a/qa/workunits/rbd/rbd_mirror_helpers.sh b/qa/workunits/rbd/rbd_mirror_helpers.sh
index e019de5..9d00d3e 100755
--- a/qa/workunits/rbd/rbd_mirror_helpers.sh
+++ b/qa/workunits/rbd/rbd_mirror_helpers.sh
@@ -854,9 +854,9 @@ write_image()
 
     test -n "${size}" || size=4096
 
-    rbd --cluster ${cluster} -p ${pool} bench ${image} --io-type write \
-        --io-size ${size} --io-threads 1 --io-total $((size * count)) \
-        --io-pattern rand
+    rbd --cluster ${cluster} -p ${pool} map ${image}
+    fio --name=test --rw=randwrite --bs=${size} --runtime=60 --ioengine=libaio --iodepth=1 --numjobs=1 --filename=/dev/rbd0 --direct=1 --group_reporting --size $((size * count)) --group_reporting --eta-newline
+    rbd --cluster ${cluster} -p ${pool} unmap ${image}
 }
 
 stress_write_image()