From patchwork Mon Nov 1 02:04:44 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 12595591 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7F515C433EF for ; Mon, 1 Nov 2021 02:05:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5E9FD60FE8 for ; Mon, 1 Nov 2021 02:05:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230346AbhKACHe (ORCPT ); Sun, 31 Oct 2021 22:07:34 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:34211 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230191AbhKACHb (ORCPT ); Sun, 31 Oct 2021 22:07:31 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1635732298; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=lkCExoYHKmmP8kyQttX6GO+S2QrPi5ovhrz89ZkqFPE=; b=MeQs/R/L1zssAzYNzvm8Sr9/skOpKgDTjqPtI5i54zeouyjOoJPDsh9IALooM99OlP8rGZ DWJ8lGtylastDTPQw892xtjN2+xi9nkQ7cfSD1B4EUV6KjcoXFs36gaLgm3ezJ43FNqmIO fnshkQUmnFW+WczDnBtoQu4jDvLN8A4= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-314-KkZnt5_vP1eu1QdyHesKAA-1; Sun, 31 Oct 2021 22:04:56 -0400 X-MC-Unique: KkZnt5_vP1eu1QdyHesKAA-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) 
with ESMTPS id 8430E1023F4D; Mon, 1 Nov 2021 02:04:55 +0000 (UTC) Received: from lxbceph1.gsslab.pek2.redhat.com (unknown [10.72.47.117]) by smtp.corp.redhat.com (Postfix) with ESMTP id 282B55D6CF; Mon, 1 Nov 2021 02:04:52 +0000 (UTC) From: xiubli@redhat.com To: jlayton@kernel.org Cc: idryomov@gmail.com, vshankar@redhat.com, pdonnell@redhat.com, khiremat@redhat.com, ceph-devel@vger.kernel.org, Xiubo Li Subject: [PATCH v4 1/4] Revert "ceph: make client zero partial trailing block on truncate" Date: Mon, 1 Nov 2021 10:04:44 +0800 Message-Id: <20211101020447.75872-2-xiubli@redhat.com> In-Reply-To: <20211101020447.75872-1-xiubli@redhat.com> References: <20211101020447.75872-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Xiubo Li This reverts commit c97968122078ce0380cd8db405b8505a8b0a55d8. --- fs/ceph/file.c | 3 ++- fs/ceph/inode.c | 23 ++--------------------- fs/ceph/super.h | 1 - 3 files changed, 4 insertions(+), 23 deletions(-) diff --git a/fs/ceph/file.c b/fs/ceph/file.c index ee13512b610d..af58be73ce1c 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -2250,7 +2250,8 @@ static void ceph_zero_pagecache_range(struct inode *inode, loff_t offset, ceph_zero_partial_page(inode, offset, length); } -int ceph_zero_partial_object(struct inode *inode, loff_t offset, loff_t *length) +static int ceph_zero_partial_object(struct inode *inode, + loff_t offset, loff_t *length) { struct ceph_inode_info *ci = ceph_inode(inode); struct ceph_fs_client *fsc = ceph_inode_to_client(inode); diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index 5d47b98b61af..9b798690fdc9 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -2393,6 +2393,7 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c cpu_to_le64(round_up(isize, CEPH_FSCRYPT_BLOCK_SIZE)); req->r_fscrypt_file = attr->ia_size; + /* FIXME: client must zero out any partial blocks! 
*/ } else { req->r_args.setattr.size = cpu_to_le64(attr->ia_size); req->r_args.setattr.old_size = cpu_to_le64(isize); @@ -2481,28 +2482,8 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c ceph_mdsc_put_request(req); ceph_free_cap_flush(prealloc_cf); - if (err >= 0 && (mask & (CEPH_SETATTR_SIZE|CEPH_SETATTR_FSCRYPT_FILE))) { + if (err >= 0 && (mask & CEPH_SETATTR_SIZE)) __ceph_do_pending_vmtruncate(inode); - if (mask & CEPH_SETATTR_FSCRYPT_FILE) { - loff_t orig_len, len; - - len = round_up(attr->ia_size, CEPH_FSCRYPT_BLOCK_SIZE) - attr->ia_size; - orig_len = len; - - /* - * FIXME: this is just doing the truncating the last OSD - * object, but for "real" fscrypt support, we need - * to do a RMW with the end of the block zeroed out. - */ - if (len) { - err = ceph_zero_partial_object(inode, attr->ia_size, &len); - /* This had better not be shortened */ - WARN_ONCE(!err && len != orig_len, - "attr->ia_size=%lld orig_len=%lld len=%lld\n", - attr->ia_size, orig_len, len); - } - } - } return err; } diff --git a/fs/ceph/super.h b/fs/ceph/super.h index 6d4a22c6d32d..7f3976b3319d 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -1236,7 +1236,6 @@ extern int ceph_atomic_open(struct inode *dir, struct dentry *dentry, extern int ceph_release(struct inode *inode, struct file *filp); extern void ceph_fill_inline_data(struct inode *inode, struct page *locked_page, char *data, size_t len); -int ceph_zero_partial_object(struct inode *inode, loff_t offset, loff_t *length); /* dir.c */ extern const struct file_operations ceph_dir_fops; From patchwork Mon Nov 1 02:04:45 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 12595593 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 
B2D50C433EF for ; Mon, 1 Nov 2021 02:05:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9BE8B60F24 for ; Mon, 1 Nov 2021 02:05:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230400AbhKACHh (ORCPT ); Sun, 31 Oct 2021 22:07:37 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:45311 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230333AbhKACHf (ORCPT ); Sun, 31 Oct 2021 22:07:35 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1635732302; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=uzof8A1qRInCZYX4ONpNJOEu4IY+psyAJr3vrozCMe8=; b=G/QsuBL/S9YxyyDBJ2NrwKVbi34hWPIGmKpkRUdf28CO1Ic0KLvEAoWEkXIwNmUZTnsDNl lNvHQYRL0fFYHu0KQj37vRQbvizjz5UwLDpmUW7Vetj6GAcqOnTdcWiViT5CgWTq0TZNQq SEvpvHhbEwV5z9YyAyIxu1hKpDhOMP8= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-426-e-Ns7uQBOOC_u7LeBI0pkw-1; Sun, 31 Oct 2021 22:04:59 -0400 X-MC-Unique: e-Ns7uQBOOC_u7LeBI0pkw-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 647A680668E; Mon, 1 Nov 2021 02:04:58 +0000 (UTC) Received: from lxbceph1.gsslab.pek2.redhat.com (unknown [10.72.47.117]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0AC165D6CF; Mon, 1 Nov 2021 02:04:55 +0000 (UTC) From: xiubli@redhat.com To: jlayton@kernel.org Cc: idryomov@gmail.com, vshankar@redhat.com, pdonnell@redhat.com, khiremat@redhat.com, ceph-devel@vger.kernel.org, Xiubo Li Subject: [PATCH 
v4 2/4] ceph: add __ceph_get_caps helper support Date: Mon, 1 Nov 2021 10:04:45 +0800 Message-Id: <20211101020447.75872-3-xiubli@redhat.com> In-Reply-To: <20211101020447.75872-1-xiubli@redhat.com> References: <20211101020447.75872-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Xiubo Li Signed-off-by: Xiubo Li --- fs/ceph/caps.c | 19 +++++++++++++------ fs/ceph/super.h | 2 ++ 2 files changed, 15 insertions(+), 6 deletions(-) diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index d628dcdbf869..4e2a588465c5 100644 --- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -2876,10 +2876,9 @@ int ceph_try_get_caps(struct inode *inode, int need, int want, * due to a small max_size, make sure we check_max_size (and possibly * ask the mds) so we don't get hung up indefinitely. */ -int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got) +int __ceph_get_caps(struct inode *inode, struct ceph_file_info *fi, int need, + int want, loff_t endoff, int *got) { - struct ceph_file_info *fi = filp->private_data; - struct inode *inode = file_inode(filp); struct ceph_inode_info *ci = ceph_inode(inode); struct ceph_fs_client *fsc = ceph_inode_to_client(inode); int ret, _got, flags; @@ -2888,7 +2887,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got if (ret < 0) return ret; - if ((fi->fmode & CEPH_FILE_MODE_WR) && + if (fi && (fi->fmode & CEPH_FILE_MODE_WR) && fi->filp_gen != READ_ONCE(fsc->filp_gen)) return -EBADF; @@ -2896,7 +2895,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got while (true) { flags &= CEPH_FILE_MODE_MASK; - if (atomic_read(&fi->num_locks)) + if (fi && atomic_read(&fi->num_locks)) flags |= CHECK_FILELOCK; _got = 0; ret = try_get_cap_refs(inode, need, want, endoff, @@ -2941,7 +2940,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got continue; } - if 
((fi->fmode & CEPH_FILE_MODE_WR) && + if (fi && (fi->fmode & CEPH_FILE_MODE_WR) && fi->filp_gen != READ_ONCE(fsc->filp_gen)) { if (ret >= 0 && _got) ceph_put_cap_refs(ci, _got); @@ -3004,6 +3003,14 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got return 0; } +int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got) +{ + struct ceph_file_info *fi = filp->private_data; + struct inode *inode = file_inode(filp); + + return __ceph_get_caps(inode, fi, need, want, endoff, got); +} + /* * Take cap refs. Caller must already know we hold at least one ref * on the caps in question or we don't know this is safe. diff --git a/fs/ceph/super.h b/fs/ceph/super.h index 7f3976b3319d..027d5f579ba0 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -1208,6 +1208,8 @@ extern int ceph_encode_dentry_release(void **p, struct dentry *dn, struct inode *dir, int mds, int drop, int unless); +extern int __ceph_get_caps(struct inode *inode, struct ceph_file_info *fi, + int need, int want, loff_t endoff, int *got); extern int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got); extern int ceph_try_get_caps(struct inode *inode, From patchwork Mon Nov 1 02:04:46 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 12595597 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D3938C433FE for ; Mon, 1 Nov 2021 02:05:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B8CAD60F24 for ; Mon, 1 Nov 2021 02:05:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230395AbhKACHi (ORCPT ); Sun, 31 Oct 2021 22:07:38 -0400 Received: from 
us-smtp-delivery-124.mimecast.com ([170.10.133.124]:26756 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230451AbhKACHh (ORCPT ); Sun, 31 Oct 2021 22:07:37 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1635732303; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bZW6H4UnsfaQ9RdkVQk0dgyoL+wdF1UpGJoFzfoSvRg=; b=P5WMfgCc69U7dGgoq/KGFGQzaH+EnuHgVGJwNNjXIsfPIN4CDdB4Fik1OvFRxb+xWBdsIJ Jaz5Y7XcrTPj8xk2+x19SlVeeylESnx68ulxxcF5gCBgWEOzTw6mGtQR7Njw5+m239OC7M j4hRJsstwr9quFn8jolFf2w/elwFbEo= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-68-OqSJ1EcnMJSObMUy4XjKfg-1; Sun, 31 Oct 2021 22:05:02 -0400 X-MC-Unique: OqSJ1EcnMJSObMUy4XjKfg-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 475C28018AC; Mon, 1 Nov 2021 02:05:01 +0000 (UTC) Received: from lxbceph1.gsslab.pek2.redhat.com (unknown [10.72.47.117]) by smtp.corp.redhat.com (Postfix) with ESMTP id E03145D6CF; Mon, 1 Nov 2021 02:04:58 +0000 (UTC) From: xiubli@redhat.com To: jlayton@kernel.org Cc: idryomov@gmail.com, vshankar@redhat.com, pdonnell@redhat.com, khiremat@redhat.com, ceph-devel@vger.kernel.org, Xiubo Li Subject: [PATCH v4 3/4] ceph: add __ceph_sync_read helper support Date: Mon, 1 Nov 2021 10:04:46 +0800 Message-Id: <20211101020447.75872-4-xiubli@redhat.com> In-Reply-To: <20211101020447.75872-1-xiubli@redhat.com> References: <20211101020447.75872-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Precedence: bulk List-ID: X-Mailing-List: 
ceph-devel@vger.kernel.org From: Xiubo Li Signed-off-by: Xiubo Li --- fs/ceph/file.c | 35 +++++++++++++++++++++++------------ fs/ceph/super.h | 2 ++ 2 files changed, 25 insertions(+), 12 deletions(-) diff --git a/fs/ceph/file.c b/fs/ceph/file.c index af58be73ce1c..9ce78c97de9a 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -901,21 +901,18 @@ static inline void fscrypt_adjust_off_and_len(struct inode *inode, u64 *off, u64 * If we get a short result from the OSD, check against i_size; we need to * only return a short read to the caller if we hit EOF. */ -static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to, - int *retry_op) +ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos, + struct iov_iter *to, int *retry_op) { - struct file *file = iocb->ki_filp; - struct inode *inode = file_inode(file); struct ceph_inode_info *ci = ceph_inode(inode); struct ceph_fs_client *fsc = ceph_inode_to_client(inode); struct ceph_osd_client *osdc = &fsc->client->osdc; ssize_t ret; - u64 off = iocb->ki_pos; + u64 off = *ki_pos; u64 len = iov_iter_count(to); u64 i_size; - dout("sync_read on file %p %llu~%u %s\n", file, off, (unsigned)len, - (file->f_flags & O_DIRECT) ? 
"O_DIRECT" : ""); + dout("sync_read on inode %p %llu~%u\n", inode, *ki_pos, (unsigned)len); if (!len) return 0; @@ -1058,14 +1055,14 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to, break; } - if (off > iocb->ki_pos) { + if (off > *ki_pos) { if (off >= i_size) { *retry_op = CHECK_EOF; - ret = i_size - iocb->ki_pos; - iocb->ki_pos = i_size; + ret = i_size - *ki_pos; + *ki_pos = i_size; } else { - ret = off - iocb->ki_pos; - iocb->ki_pos = off; + ret = off - *ki_pos; + *ki_pos = off; } } out: @@ -1073,6 +1070,20 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to, return ret; } +static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to, + int *retry_op) +{ + struct file *file = iocb->ki_filp; + struct inode *inode = file_inode(file); + + dout("sync_read on file %p %llu~%u %s\n", file, iocb->ki_pos, + (unsigned)iov_iter_count(to), + (file->f_flags & O_DIRECT) ? "O_DIRECT" : ""); + + return __ceph_sync_read(inode, &iocb->ki_pos, to, retry_op); + +} + struct ceph_aio_request { struct kiocb *iocb; size_t total_len; diff --git a/fs/ceph/super.h b/fs/ceph/super.h index 027d5f579ba0..57bc952c54e1 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -1235,6 +1235,8 @@ extern int ceph_renew_caps(struct inode *inode, int fmode); extern int ceph_open(struct inode *inode, struct file *file); extern int ceph_atomic_open(struct inode *dir, struct dentry *dentry, struct file *file, unsigned flags, umode_t mode); +extern ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos, + struct iov_iter *to, int *retry_op); extern int ceph_release(struct inode *inode, struct file *filp); extern void ceph_fill_inline_data(struct inode *inode, struct page *locked_page, char *data, size_t len); From patchwork Mon Nov 1 02:04:47 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiubo Li X-Patchwork-Id: 12595599 Return-Path: X-Spam-Checker-Version: SpamAssassin 
3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 969F1C433F5 for ; Mon, 1 Nov 2021 02:05:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8066660F24 for ; Mon, 1 Nov 2021 02:05:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230511AbhKACHm (ORCPT ); Sun, 31 Oct 2021 22:07:42 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:46954 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230451AbhKACHl (ORCPT ); Sun, 31 Oct 2021 22:07:41 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1635732308; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=IZnDX3bIAcTekKdq5uArXp5QY0MKHjpkspAvoqr41GU=; b=LnmyLutqWqQ06amMMVpeV0tYC6eC77IdE5b7M5jJZ2dIzL/bUvveLgXedVQnA93zUvjaX4 u0N36qsJo6Volf3bxeM8BU+wIfoueQesT0dwLNJEaSzjxWtByZz4x+p+g8613+kp00pY2I ZBjekoRSP0ENADQmEQMdmgabojFVL/Q= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-371-UfOb6KrNO0u2rW4v5tklrQ-1; Sun, 31 Oct 2021 22:05:05 -0400 X-MC-Unique: UfOb6KrNO0u2rW4v5tklrQ-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 2D2E6806689; Mon, 1 Nov 2021 02:05:04 +0000 (UTC) Received: from lxbceph1.gsslab.pek2.redhat.com (unknown [10.72.47.117]) by smtp.corp.redhat.com (Postfix) with ESMTP id C366C5D6CF; Mon, 1 Nov 2021 02:05:01 +0000 (UTC) From: 
xiubli@redhat.com To: jlayton@kernel.org Cc: idryomov@gmail.com, vshankar@redhat.com, pdonnell@redhat.com, khiremat@redhat.com, ceph-devel@vger.kernel.org, Xiubo Li Subject: [PATCH v4 4/4] ceph: add truncate size handling support for fscrypt Date: Mon, 1 Nov 2021 10:04:47 +0800 Message-Id: <20211101020447.75872-5-xiubli@redhat.com> In-Reply-To: <20211101020447.75872-1-xiubli@redhat.com> References: <20211101020447.75872-1-xiubli@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Xiubo Li Transfer the encrypted contents of the last block to the MDS along with the truncate request, but only when the new size is smaller than the current size and is not aligned to the fscrypt block size. When the last block lies in a file hole, the truncate request contains only the header. The MDS can fail the truncate if another client or process has already updated the RADOS object that contains the last block. In that case it returns -EAGAIN and the kclient needs to retry. The RMW takes around 50ms, so let it retry up to 20 times for now. Signed-off-by: Xiubo Li --- fs/ceph/caps.c | 2 - fs/ceph/file.c | 10 +- fs/ceph/inode.c | 182 ++++++++++++++++++++++++++++++++++-- fs/ceph/super.h | 3 +- include/linux/ceph/crypto.h | 28 ++++++ 5 files changed, 211 insertions(+), 14 deletions(-) create mode 100644 include/linux/ceph/crypto.h diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index 4e2a588465c5..c9624b059eb0 100644 --- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -1299,8 +1299,6 @@ static void encode_cap_msg(struct ceph_msg *msg, struct cap_msg_args *arg) * fscrypt_auth holds the crypto context (if any). fscrypt_file * tracks the real i_size as an __le64 field (and we use a rounded-up * i_size in * the traditional size field). - * - * FIXME: should we encrypt fscrypt_file field? 
*/ ceph_encode_32(&p, arg->fscrypt_auth_len); ceph_encode_copy(&p, arg->fscrypt_auth, arg->fscrypt_auth_len); diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 9ce78c97de9a..8673a4dc5538 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -902,7 +902,8 @@ static inline void fscrypt_adjust_off_and_len(struct inode *inode, u64 *off, u64 * only return a short read to the caller if we hit EOF. */ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos, - struct iov_iter *to, int *retry_op) + struct iov_iter *to, int *retry_op, + u64 *assert_ver) { struct ceph_inode_info *ci = ceph_inode(inode); struct ceph_fs_client *fsc = ceph_inode_to_client(inode); @@ -978,6 +979,9 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos, req->r_end_latency, len, ret); + /* Grab assert version. It must be non-zero. */ + *assert_ver = req->r_version; + WARN_ON_ONCE(*assert_ver == 0); ceph_osdc_put_request(req); i_size = i_size_read(inode); @@ -1075,12 +1079,14 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to, { struct file *file = iocb->ki_filp; struct inode *inode = file_inode(file); + u64 assert_ver; dout("sync_read on file %p %llu~%u %s\n", file, iocb->ki_pos, (unsigned)iov_iter_count(to), (file->f_flags & O_DIRECT) ? 
"O_DIRECT" : ""); - return __ceph_sync_read(inode, &iocb->ki_pos, to, retry_op); + return __ceph_sync_read(inode, &iocb->ki_pos, to, retry_op, + &assert_ver); } diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index 9b798690fdc9..d84692d6609a 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -21,6 +21,7 @@ #include "cache.h" #include "crypto.h" #include +#include /* * Ceph inode operations @@ -1034,10 +1035,14 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page, pool_ns = old_ns; if (IS_ENCRYPTED(inode) && size && - (iinfo->fscrypt_file_len == sizeof(__le64))) { - size = __le64_to_cpu(*(__le64 *)iinfo->fscrypt_file); - if (info->size != round_up(size, CEPH_FSCRYPT_BLOCK_SIZE)) - pr_warn("size=%llu fscrypt_file=%llu\n", info->size, size); + (iinfo->fscrypt_file_len >= sizeof(__le64))) { + u64 fsize = __le64_to_cpu(*(__le64 *)iinfo->fscrypt_file); + if (fsize) { + size = fsize; + if (info->size != round_up(size, CEPH_FSCRYPT_BLOCK_SIZE)) + pr_warn("size=%llu fscrypt_file=%llu\n", + info->size, size); + } } queue_trunc = ceph_fill_file_size(inode, issued, @@ -2229,6 +2234,129 @@ static const struct inode_operations ceph_encrypted_symlink_iops = { .listxattr = ceph_listxattr, }; +/* + * Transfer the encrypted last block to the MDS and the MDS + * will update the file when truncating a smaller size. + * + * We don't support a PAGE_SIZE that is smaller than the + * CEPH_FSCRYPT_BLOCK_SIZE. 
+ */ +static int fill_fscrypt_truncate(struct inode *inode, + struct ceph_mds_request *req, + struct iattr *attr) +{ + struct ceph_inode_info *ci = ceph_inode(inode); + int boff = attr->ia_size % CEPH_FSCRYPT_BLOCK_SIZE; + loff_t pos, orig_pos = round_down(attr->ia_size, CEPH_FSCRYPT_BLOCK_SIZE); + u64 block = orig_pos >> CEPH_FSCRYPT_BLOCK_SHIFT; + struct ceph_pagelist *pagelist = NULL; + struct kvec iov; + struct iov_iter iter; + struct page *page = NULL; + struct ceph_fscrypt_truncate_size_header header; + int retry_op = 0; + int len = CEPH_FSCRYPT_BLOCK_SIZE; + loff_t i_size = i_size_read(inode); + u64 assert_ver = 0; + int got, ret, issued; + + ret = __ceph_get_caps(inode, NULL, CEPH_CAP_FILE_RD, 0, -1, &got); + if (ret < 0) + return ret; + + dout("%s size %lld -> %lld got cap refs on %s\n", __func__, + i_size, attr->ia_size, ceph_cap_string(got)); + + issued = __ceph_caps_issued(ci, NULL); + + /* Try to write back the dirty page cache */ + if (issued & (CEPH_CAP_FILE_BUFFER)) + filemap_fdatawrite(&inode->i_data); + + page = __page_cache_alloc(GFP_KERNEL); + if (page == NULL) { + ret = -ENOMEM; + goto out; + } + + pagelist = ceph_pagelist_alloc(GFP_KERNEL); + if (!pagelist) { + ret = -ENOMEM; + goto out; + } + + iov.iov_base = kmap_local_page(page); + iov.iov_len = len; + iov_iter_kvec(&iter, READ, &iov, 1, len); + + pos = orig_pos; + ret = __ceph_sync_read(inode, &pos, &iter, &retry_op, &assert_ver); + ceph_put_cap_refs(ci, got); + + /* Insert the header first */ + header.ver = 1; + header.compat = 1; + + /* + * If we hit a hole here, we should just skip filling + * the fscrypt block for the request. Once fscrypt is + * enabled, the file is split into blocks of + * CEPH_FSCRYPT_BLOCK_SIZE bytes, so if there is a + * hole, its size should be a multiple of the block + * size. 
+ */ + if (pos < i_size && ret < len) { + dout("%s hit hole, ppos %lld < size %lld\n", + __func__, pos, i_size); + + header.data_len = cpu_to_le32(8 + 8 + 4); + header.assert_ver = cpu_to_le64(0); + header.file_offset = cpu_to_le64(0); + header.block_size = cpu_to_le32(0); + ret = 0; + } else { + header.data_len = cpu_to_le32(8 + 8 + 4 + CEPH_FSCRYPT_BLOCK_SIZE); + header.assert_ver = cpu_to_le64(assert_ver); + header.file_offset = cpu_to_le64(orig_pos); + header.block_size = cpu_to_le32(CEPH_FSCRYPT_BLOCK_SIZE); + + /* truncate and zero out the extra contents for the last block */ + memset(iov.iov_base + boff, 0, PAGE_SIZE - boff); + + /* encrypt the last block */ + ret = fscrypt_encrypt_block_inplace(inode, page, + CEPH_FSCRYPT_BLOCK_SIZE, + 0, block, + GFP_KERNEL); + if (ret) + goto out; + + } + + /* Insert the header */ + ret = ceph_pagelist_append(pagelist, &header, sizeof(header)); + if (ret) + goto out; + + if (header.block_size) { + /* Append the last block contents to pagelist */ + ret = ceph_pagelist_append(pagelist, iov.iov_base, + CEPH_FSCRYPT_BLOCK_SIZE); + if (ret) + goto out; + } + req->r_pagelist = pagelist; +out: + dout("%s %p size dropping cap refs on %s\n", __func__, + inode, ceph_cap_string(got)); + kunmap_local(iov.iov_base); + if (page) + __free_pages(page, 0); + if (ret && pagelist) + ceph_pagelist_release(pagelist); + return ret; +} + int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *cia) { struct ceph_inode_info *ci = ceph_inode(inode); @@ -2236,12 +2364,15 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c struct ceph_mds_request *req; struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc; struct ceph_cap_flush *prealloc_cf; + loff_t isize = i_size_read(inode); int issued; int release = 0, dirtied = 0; int mask = 0; int err = 0; int inode_dirty_flags = 0; bool lock_snap_rwsem = false; + bool fill_fscrypt; + int truncate_retry = 20; /* The RMW will take around 50ms */ 
prealloc_cf = ceph_alloc_cap_flush(); if (!prealloc_cf) @@ -2254,6 +2385,8 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c return PTR_ERR(req); } +retry: + fill_fscrypt = false; spin_lock(&ci->i_ceph_lock); issued = __ceph_caps_issued(ci, NULL); @@ -2367,10 +2500,27 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c } } if (ia_valid & ATTR_SIZE) { - loff_t isize = i_size_read(inode); - dout("setattr %p size %lld -> %lld\n", inode, isize, attr->ia_size); - if ((issued & CEPH_CAP_FILE_EXCL) && attr->ia_size >= isize) { + /* + * Only when the new size is smaller and not aligned to + * CEPH_FSCRYPT_BLOCK_SIZE is the RMW needed. + */ + if (IS_ENCRYPTED(inode) && attr->ia_size < isize && + (attr->ia_size % CEPH_FSCRYPT_BLOCK_SIZE)) { + mask |= CEPH_SETATTR_SIZE; + release |= CEPH_CAP_FILE_SHARED | CEPH_CAP_FILE_EXCL | + CEPH_CAP_FILE_RD | CEPH_CAP_FILE_WR; + set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags); + mask |= CEPH_SETATTR_FSCRYPT_FILE; + req->r_args.setattr.size = + cpu_to_le64(round_up(attr->ia_size, + CEPH_FSCRYPT_BLOCK_SIZE)); + req->r_args.setattr.old_size = + cpu_to_le64(round_up(isize, + CEPH_FSCRYPT_BLOCK_SIZE)); + req->r_fscrypt_file = attr->ia_size; + fill_fscrypt = true; + } else if ((issued & CEPH_CAP_FILE_EXCL) && attr->ia_size >= isize) { if (attr->ia_size > isize) { i_size_write(inode, attr->ia_size); inode->i_blocks = calc_inode_blocks(attr->ia_size); @@ -2393,7 +2543,6 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c cpu_to_le64(round_up(isize, CEPH_FSCRYPT_BLOCK_SIZE)); req->r_fscrypt_file = attr->ia_size; - /* FIXME: client must zero out any partial blocks! 
*/ } else { req->r_args.setattr.size = cpu_to_le64(attr->ia_size); req->r_args.setattr.old_size = cpu_to_le64(isize); @@ -2465,7 +2614,6 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c if (inode_dirty_flags) __mark_inode_dirty(inode, inode_dirty_flags); - if (mask) { req->r_inode = inode; ihold(inode); @@ -2473,7 +2621,23 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c req->r_args.setattr.mask = cpu_to_le32(mask); req->r_num_caps = 1; req->r_stamp = attr->ia_ctime; + if (fill_fscrypt) { + err = fill_fscrypt_truncate(inode, req, attr); + if (err) + goto out; + } + + /* + * The truncate will return -EAGAIN when someone + * has updated the last block before the MDS holds + * the xlock for the FILE lock. Need to retry it. + */ err = ceph_mdsc_do_request(mdsc, NULL, req); + if (err == -EAGAIN && truncate_retry--) { + dout("setattr %p result=%d (%s locally, %d remote), retry it!\n", + inode, err, ceph_cap_string(dirtied), mask); + goto retry; + } } out: dout("setattr %p result=%d (%s locally, %d remote)\n", inode, err, diff --git a/fs/ceph/super.h b/fs/ceph/super.h index 57bc952c54e1..c8144273ff28 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -1236,7 +1236,8 @@ extern int ceph_open(struct inode *inode, struct file *file); extern int ceph_atomic_open(struct inode *dir, struct dentry *dentry, struct file *file, unsigned flags, umode_t mode); extern ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos, - struct iov_iter *to, int *retry_op); + struct iov_iter *to, int *retry_op, + u64 *assert_ver); extern int ceph_release(struct inode *inode, struct file *filp); extern void ceph_fill_inline_data(struct inode *inode, struct page *locked_page, char *data, size_t len); diff --git a/include/linux/ceph/crypto.h b/include/linux/ceph/crypto.h new file mode 100644 index 000000000000..2b0961902887 --- /dev/null +++ b/include/linux/ceph/crypto.h @@ -0,0 +1,28 @@ +/* SPDX-License-Identifier: GPL-2.0 
*/ +#ifndef _FS_CEPH_CRYPTO_H +#define _FS_CEPH_CRYPTO_H + +#include + +/* + * Header for an encrypted file when truncating the size. This + * will be sent to the MDS, and the MDS will update the encrypted + * last block and then truncate the size. + */ +struct ceph_fscrypt_truncate_size_header { + __u8 ver; + __u8 compat; + + /* + * This is the total size of assert_ver, file_offset and + * block_size if the last block is empty because it lies in + * a file hole. Otherwise add CEPH_FSCRYPT_BLOCK_SIZE. + */ + __le32 data_len; + + __le64 assert_ver; + __le64 file_offset; + __le32 block_size; +} __packed; + +#endif
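Reviewer note (not part of the series): the userspace sketch below may help when checking the arithmetic in patch 4/4. It mirrors the layout of struct ceph_fscrypt_truncate_size_header with fixed-width types and restates three rules from the patch: when a read-modify-write of the last block is needed, which block offset is read, and how data_len is computed. The names trunc_header_sketch, trunc_data_len, trunc_needs_rmw and trunc_block_offset are hypothetical, and the 4096-byte block size is an assumption made for illustration, since the series does not show the definition of CEPH_FSCRYPT_BLOCK_SIZE. Byte-order conversion is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value, for illustration only; the kernel defines the real constant. */
#define SKETCH_FSCRYPT_BLOCK_SIZE 4096u

/* Userspace mirror of struct ceph_fscrypt_truncate_size_header (hypothetical name). */
struct trunc_header_sketch {
	uint8_t  ver;
	uint8_t  compat;
	uint32_t data_len;    /* bytes that follow this field */
	uint64_t assert_ver;  /* object version seen by the read */
	uint64_t file_offset; /* offset of the last block */
	uint32_t block_size;  /* 0 when the last block is a hole */
} __attribute__((packed));

/*
 * data_len covers assert_ver + file_offset + block_size, plus one
 * block of ciphertext when the last block is not a hole.
 */
static uint32_t trunc_data_len(int has_block)
{
	uint32_t fixed = sizeof(uint64_t) + sizeof(uint64_t) + sizeof(uint32_t);

	return has_block ? fixed + SKETCH_FSCRYPT_BLOCK_SIZE : fixed;
}

/* The RMW is needed only when shrinking to a size that is not block aligned. */
static int trunc_needs_rmw(uint64_t old_size, uint64_t new_size)
{
	return new_size < old_size &&
	       (new_size % SKETCH_FSCRYPT_BLOCK_SIZE) != 0;
}

/* Offset of the block that contains the new EOF (round_down in the patch). */
static uint64_t trunc_block_offset(uint64_t new_size)
{
	return new_size - (new_size % SKETCH_FSCRYPT_BLOCK_SIZE);
}
```

With these assumptions, shrinking an 8192-byte file to 5000 bytes needs the RMW on the block at offset 4096, while shrinking it to exactly 4096 bytes does not. The fixed part of data_len is 20 bytes, matching the "8 + 8 + 4" in fill_fscrypt_truncate().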