From patchwork Wed Jul 3 12:44:34 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Yan, Zheng" X-Patchwork-Id: 11029473 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CF9EE138B for ; Wed, 3 Jul 2019 12:44:51 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C0A0120223 for ; Wed, 3 Jul 2019 12:44:51 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B56B7289A3; Wed, 3 Jul 2019 12:44:51 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3E275289A2 for ; Wed, 3 Jul 2019 12:44:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726457AbfGCMou (ORCPT ); Wed, 3 Jul 2019 08:44:50 -0400 Received: from mx1.redhat.com ([209.132.183.28]:48758 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725830AbfGCMou (ORCPT ); Wed, 3 Jul 2019 08:44:50 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id A41A9C18B2C1 for ; Wed, 3 Jul 2019 12:44:49 +0000 (UTC) Received: from zhyan-laptop.redhat.com (ovpn-12-77.pek2.redhat.com [10.72.12.77]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6D3E917B40; Wed, 3 Jul 2019 12:44:47 +0000 (UTC) From: "Yan, Zheng" To: ceph-devel@vger.kernel.org Cc: idryomov@redhat.com, jlayton@redhat.com, "Yan, Zheng" Subject: [PATCH 1/9] libceph: add function that reset client's entity addr Date: Wed, 3 Jul 2019 20:44:34 +0800 Message-Id: <20190703124442.6614-2-zyan@redhat.com> In-Reply-To: <20190703124442.6614-1-zyan@redhat.com> References: <20190703124442.6614-1-zyan@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Wed, 03 Jul 2019 12:44:49 +0000 (UTC) Sender: ceph-devel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This function also re-open connections to OSD/MON, and re-send in-flight OSD requests after re-opening connections to OSD. Signed-off-by: "Yan, Zheng" --- include/linux/ceph/libceph.h | 1 + include/linux/ceph/messenger.h | 1 + include/linux/ceph/mon_client.h | 1 + include/linux/ceph/osd_client.h | 1 + net/ceph/ceph_common.c | 8 ++++++++ net/ceph/messenger.c | 5 +++++ net/ceph/mon_client.c | 7 +++++++ net/ceph/osd_client.c | 16 ++++++++++++++++ 8 files changed, 40 insertions(+) diff --git a/include/linux/ceph/libceph.h b/include/linux/ceph/libceph.h index a3cddf5f0e60..f29959eed025 100644 --- a/include/linux/ceph/libceph.h +++ b/include/linux/ceph/libceph.h @@ -291,6 +291,7 @@ struct ceph_client *ceph_create_client(struct ceph_options *opt, void *private); struct ceph_entity_addr *ceph_client_addr(struct ceph_client *client); u64 ceph_client_gid(struct ceph_client *client); extern void ceph_destroy_client(struct ceph_client *client); +extern void ceph_reset_client_addr(struct ceph_client *client); extern int __ceph_open_session(struct ceph_client *client, unsigned long started); extern int ceph_open_session(struct ceph_client *client); diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h index 23895d178149..c4458dc6a757 100644 --- a/include/linux/ceph/messenger.h +++ b/include/linux/ceph/messenger.h @@ -337,6 +337,7 @@ extern void ceph_msgr_flush(void); extern void ceph_messenger_init(struct ceph_messenger *msgr, struct ceph_entity_addr *myaddr); extern void ceph_messenger_fini(struct ceph_messenger *msgr); +extern void ceph_messenger_reset_nonce(struct ceph_messenger *msgr); extern void ceph_con_init(struct ceph_connection *con, void *private, const struct ceph_connection_operations *ops, diff --git a/include/linux/ceph/mon_client.h b/include/linux/ceph/mon_client.h index b4d134d3312a..dbb8a6959a73 100644 --- a/include/linux/ceph/mon_client.h +++ b/include/linux/ceph/mon_client.h @@ -109,6 +109,7 @@ extern int ceph_monmap_contains(struct ceph_monmap *m, extern int ceph_monc_init(struct ceph_mon_client *monc, struct ceph_client *cl); extern void ceph_monc_stop(struct ceph_mon_client *monc); +extern void ceph_monc_reopen_session(struct ceph_mon_client *monc); enum { CEPH_SUB_MONMAP = 0, diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h index 2294f963dab7..a12b7fc9cfd6 100644 --- a/include/linux/ceph/osd_client.h +++ b/include/linux/ceph/osd_client.h @@ -381,6 +381,7 @@ extern void ceph_osdc_cleanup(void); extern int ceph_osdc_init(struct ceph_osd_client *osdc, struct ceph_client *client); extern void ceph_osdc_stop(struct ceph_osd_client *osdc); +extern void ceph_osdc_reopen_osds(struct ceph_osd_client *osdc); extern void ceph_osdc_handle_reply(struct ceph_osd_client *osdc, struct ceph_msg *msg); diff --git a/net/ceph/ceph_common.c b/net/ceph/ceph_common.c index 1c811c74bfc0..6676f48d615c 100644 --- a/net/ceph/ceph_common.c +++ b/net/ceph/ceph_common.c @@ -694,6 +694,14 @@ void ceph_destroy_client(struct ceph_client *client) } EXPORT_SYMBOL(ceph_destroy_client); +void ceph_reset_client_addr(struct ceph_client *client) +{ + ceph_messenger_reset_nonce(&client->msgr); + ceph_monc_reopen_session(&client->monc); + ceph_osdc_reopen_osds(&client->osdc); +} +EXPORT_SYMBOL(ceph_reset_client_addr); + /* * true if we have the mon map (and have thus joined the cluster) */ diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c index 0473d9a7b1f4..53b9ca4b72e0 100644 --- a/net/ceph/messenger.c +++ b/net/ceph/messenger.c @@ -3030,6 +3030,11 @@ static void con_fault(struct ceph_connection *con) } +void ceph_messenger_reset_nonce(struct ceph_messenger *msgr) +{ + msgr->inst.addr.nonce += 1000000; + encode_my_addr(msgr); +} /* * initialize a new messenger instance diff --git a/net/ceph/mon_client.c b/net/ceph/mon_client.c index 0520bf9825aa..7256c402ebaa 100644 --- a/net/ceph/mon_client.c +++ b/net/ceph/mon_client.c @@ -213,6 +213,13 @@ static void reopen_session(struct ceph_mon_client *monc) __open_session(monc); } +void ceph_monc_reopen_session(struct ceph_mon_client *monc) +{ + mutex_lock(&monc->mutex); + reopen_session(monc); + mutex_unlock(&monc->mutex); +} + static void un_backoff(struct ceph_mon_client *monc) { monc->hunt_mult /= 2; /* reduce by 50% */ diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c index 54170a35ecec..42d370c6f5c4 100644 --- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -5095,6 +5095,22 @@ int ceph_osdc_call(struct ceph_osd_client *osdc, } EXPORT_SYMBOL(ceph_osdc_call); +/* + * reset all osd connections + */ +void ceph_osdc_reopen_osds(struct ceph_osd_client *osdc) +{ + struct rb_node *n; + down_write(&osdc->lock); + for (n = rb_first(&osdc->osds); n; ) { + struct ceph_osd *osd = rb_entry(n, struct ceph_osd, o_node); + n = rb_next(n); + if (!reopen_osd(osd)) + kick_osd_requests(osd); + } + up_write(&osdc->lock); +} + /* * init, shutdown */ From patchwork Wed Jul 3 12:44:35 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Yan, Zheng" X-Patchwork-Id: 11029475 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 94D641398 for ; Wed, 3 Jul 2019 12:44:56 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 86239289A0 for ; Wed, 3 Jul 2019 12:44:56 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7F56A289A3; Wed, 3 Jul 2019 12:44:56 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2CDAB2898E for ; Wed, 3 Jul 2019 12:44:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726550AbfGCMoz (ORCPT ); Wed, 3 Jul 2019 08:44:55 -0400 Received: from mx1.redhat.com ([209.132.183.28]:33470 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725830AbfGCMoz (ORCPT ); Wed, 3 Jul 2019 08:44:55 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 328E92F8BD8 for ; Wed, 3 Jul 2019 12:44:55 +0000 (UTC) Received: from zhyan-laptop.redhat.com (ovpn-12-77.pek2.redhat.com [10.72.12.77]) by smtp.corp.redhat.com (Postfix) with ESMTP id 57CF517B40; Wed, 3 Jul 2019 12:44:50 +0000 (UTC) From: "Yan, Zheng" To: ceph-devel@vger.kernel.org Cc: idryomov@redhat.com, jlayton@redhat.com, "Yan, Zheng" Subject: [PATCH 2/9] libceph: add function that clears osd client's abort_err Date: Wed, 3 Jul 2019 20:44:35 +0800 Message-Id: <20190703124442.6614-3-zyan@redhat.com> In-Reply-To: <20190703124442.6614-1-zyan@redhat.com> References: <20190703124442.6614-1-zyan@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.38]); Wed, 03 Jul 2019 12:44:55 +0000 (UTC) Sender: ceph-devel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Signed-off-by: "Yan, Zheng" --- include/linux/ceph/osd_client.h | 1 + net/ceph/osd_client.c | 8 ++++++++ 2 files changed, 9 insertions(+) diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h index a12b7fc9cfd6..71ddb0a89f4d 100644 --- a/include/linux/ceph/osd_client.h +++ b/include/linux/ceph/osd_client.h @@ -389,6 +389,7 @@ extern void ceph_osdc_handle_map(struct ceph_osd_client *osdc, struct ceph_msg *msg); void ceph_osdc_update_epoch_barrier(struct ceph_osd_client *osdc, u32 eb); void ceph_osdc_abort_requests(struct ceph_osd_client *osdc, int err); +void ceph_osdc_clear_abort_err(struct ceph_osd_client *osdc); extern void osd_req_op_init(struct ceph_osd_request *osd_req, unsigned int which, u16 opcode, u32 flags); diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c index 42d370c6f5c4..d87127e383f2 100644 --- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -2485,6 +2485,14 @@ void ceph_osdc_abort_requests(struct ceph_osd_client *osdc, int err) } EXPORT_SYMBOL(ceph_osdc_abort_requests); +void ceph_osdc_clear_abort_err(struct ceph_osd_client *osdc) +{ + down_write(&osdc->lock); + osdc->abort_err = 0; + up_write(&osdc->lock); +} +EXPORT_SYMBOL(ceph_osdc_clear_abort_err); + static void update_epoch_barrier(struct ceph_osd_client *osdc, u32 eb) { if (likely(eb > osdc->epoch_barrier)) { From patchwork Wed Jul 3 12:44:36 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Yan, Zheng" X-Patchwork-Id: 11029477 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AFA871398 for ; Wed, 3 Jul 2019 12:44:59 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A12D0289A0 for ; Wed, 3 Jul 2019 12:44:59 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 941FA289A2; Wed, 3 Jul 2019 12:44:59 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 40DB32898E for ; Wed, 3 Jul 2019 12:44:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726605AbfGCMo6 (ORCPT ); Wed, 3 Jul 2019 08:44:58 -0400 Received: from mx1.redhat.com ([209.132.183.28]:34504 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725830AbfGCMo6 (ORCPT ); Wed, 3 Jul 2019 08:44:58 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 1381030C31B8 for ; Wed, 3 Jul 2019 12:44:58 +0000 (UTC) Received: from zhyan-laptop.redhat.com (ovpn-12-77.pek2.redhat.com [10.72.12.77]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0B23617B40; Wed, 3 Jul 2019 12:44:55 +0000 (UTC) From: "Yan, Zheng" To: ceph-devel@vger.kernel.org Cc: idryomov@redhat.com, jlayton@redhat.com, "Yan, Zheng" Subject: [PATCH 3/9] ceph: allow closing session in restarting/reconnect state Date: Wed, 3 Jul 2019 20:44:36 +0800 Message-Id: <20190703124442.6614-4-zyan@redhat.com> In-Reply-To: <20190703124442.6614-1-zyan@redhat.com> References: <20190703124442.6614-1-zyan@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.40]); Wed, 03 Jul 2019 12:44:58 +0000 (UTC) Sender: ceph-devel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP CEPH_MDS_SESSION_{RESTARTING,RECONNECTING} are for for mds failover, they are sub-states of CEPH_MDS_SESSION_OPEN. So __close_session() should send close request for session in these two state. Signed-off-by: "Yan, Zheng" --- fs/ceph/mds_client.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h index f7c8603484fe..810fd8689dff 100644 --- a/fs/ceph/mds_client.h +++ b/fs/ceph/mds_client.h @@ -148,9 +148,9 @@ enum { CEPH_MDS_SESSION_OPENING = 2, CEPH_MDS_SESSION_OPEN = 3, CEPH_MDS_SESSION_HUNG = 4, - CEPH_MDS_SESSION_CLOSING = 5, - CEPH_MDS_SESSION_RESTARTING = 6, - CEPH_MDS_SESSION_RECONNECTING = 7, + CEPH_MDS_SESSION_RESTARTING = 5, + CEPH_MDS_SESSION_RECONNECTING = 6, + CEPH_MDS_SESSION_CLOSING = 7, CEPH_MDS_SESSION_REJECTED = 8, }; From patchwork Wed Jul 3 12:44:37 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Yan, Zheng" X-Patchwork-Id: 11029479 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9375213A4 for ; Wed, 3 Jul 2019 12:45:03 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 839C0289A3 for ; Wed, 3 Jul 2019 12:45:03 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 77E6A289A2; Wed, 3 Jul 2019 12:45:03 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9CCA8289A0 for ; Wed, 3 Jul 2019 12:45:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726675AbfGCMpB (ORCPT ); Wed, 3 Jul 2019 08:45:01 -0400 Received: from mx1.redhat.com ([209.132.183.28]:46008 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725830AbfGCMpB (ORCPT ); Wed, 3 Jul 2019 08:45:01 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id CC6AC3086262 for ; Wed, 3 Jul 2019 12:45:00 +0000 (UTC) Received: from zhyan-laptop.redhat.com (ovpn-12-77.pek2.redhat.com [10.72.12.77]) by smtp.corp.redhat.com (Postfix) with ESMTP id BA359834EB; Wed, 3 Jul 2019 12:44:58 +0000 (UTC) From: "Yan, Zheng" To: ceph-devel@vger.kernel.org Cc: idryomov@redhat.com, jlayton@redhat.com, "Yan, Zheng" Subject: [PATCH 4/9] ceph: track and report error of async metadata operation Date: Wed, 3 Jul 2019 20:44:37 +0800 Message-Id: <20190703124442.6614-5-zyan@redhat.com> In-Reply-To: <20190703124442.6614-1-zyan@redhat.com> References: <20190703124442.6614-1-zyan@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.49]); Wed, 03 Jul 2019 12:45:00 +0000 (UTC) Sender: ceph-devel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Use errseq_t to track and report errors of async metadata operations, similar to how kernel handles errors during writeback. If any dirty caps or any unsafe request gets dropped during session eviction, record -EIO in corresponding inode's i_meta_err. The error will be reported by subsequent fsync, Signed-off-by: "Yan, Zheng" --- fs/ceph/caps.c | 24 +++++++++++++++++------- fs/ceph/file.c | 6 ++++-- fs/ceph/inode.c | 2 ++ fs/ceph/mds_client.c | 40 +++++++++++++++++++++++++++------------- fs/ceph/super.h | 4 ++++ 5 files changed, 54 insertions(+), 22 deletions(-) diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index d98dcd976c80..345d73f57b80 100644 --- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -2258,35 +2258,45 @@ static int unsafe_request_wait(struct inode *inode) int ceph_fsync(struct file *file, loff_t start, loff_t end, int datasync) { + struct ceph_file_info *fi = file->private_data; struct inode *inode = file->f_mapping->host; struct ceph_inode_info *ci = ceph_inode(inode); u64 flush_tid; - int ret; + int ret, err; int dirty; dout("fsync %p%s\n", inode, datasync ? " datasync" : ""); ret = file_write_and_wait_range(file, start, end); - if (ret < 0) - goto out; - if (datasync) goto out; dirty = try_flush_caps(inode, &flush_tid); dout("fsync dirty caps are %s\n", ceph_cap_string(dirty)); - ret = unsafe_request_wait(inode); + err = unsafe_request_wait(inode); /* * only wait on non-file metadata writeback (the mds * can recover size and mtime, so we don't need to * wait for that) */ - if (!ret && (dirty & ~CEPH_CAP_ANY_FILE_WR)) { - ret = wait_event_interruptible(ci->i_cap_wq, + if (!err && (dirty & ~CEPH_CAP_ANY_FILE_WR)) { + err = wait_event_interruptible(ci->i_cap_wq, caps_are_flushed(inode, flush_tid)); } + + if (err < 0) + ret = err; + + if (errseq_check(&ci->i_meta_err, READ_ONCE(fi->meta_err))) { + spin_lock(&file->f_lock); + err = errseq_check_and_advance(&ci->i_meta_err, + &fi->meta_err); + spin_unlock(&file->f_lock); + if (err < 0) + ret = err; + } out: dout("fsync %p%s result=%d\n", inode, datasync ? " datasync" : "", ret); return ret; diff --git a/fs/ceph/file.c b/fs/ceph/file.c index aa20a554142b..deffb3010e6a 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -201,6 +201,7 @@ prepare_open_request(struct super_block *sb, int flags, int create_mode) static int ceph_init_file_info(struct inode *inode, struct file *file, int fmode, bool isdir) { + struct ceph_inode_info *ci = ceph_inode(inode); struct ceph_file_info *fi; dout("%s %p %p 0%o (%s)\n", __func__, inode, file, @@ -211,7 +212,7 @@ static int ceph_init_file_info(struct inode *inode, struct file *file, struct ceph_dir_file_info *dfi = kmem_cache_zalloc(ceph_dir_file_cachep, GFP_KERNEL); if (!dfi) { - ceph_put_fmode(ceph_inode(inode), fmode); /* clean up */ + ceph_put_fmode(ci, fmode); /* clean up */ return -ENOMEM; } @@ -222,7 +223,7 @@ static int ceph_init_file_info(struct inode *inode, struct file *file, } else { fi = kmem_cache_zalloc(ceph_file_cachep, GFP_KERNEL); if (!fi) { - ceph_put_fmode(ceph_inode(inode), fmode); /* clean up */ + ceph_put_fmode(ci, fmode); /* clean up */ return -ENOMEM; } @@ -232,6 +233,7 @@ static int ceph_init_file_info(struct inode *inode, struct file *file, fi->fmode = fmode; spin_lock_init(&fi->rw_contexts_lock); INIT_LIST_HEAD(&fi->rw_contexts); + fi->meta_err = errseq_sample(&ci->i_meta_err); return 0; } diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index a565ab124282..fcaa054a9e37 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -515,6 +515,8 @@ struct inode *ceph_alloc_inode(struct super_block *sb) ceph_fscache_inode_init(ci); + ci->i_meta_err = 0; + return &ci->vfs_inode; } diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index f51a7957b3e0..d4f07d3120cb 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -1270,6 +1270,7 @@ static void cleanup_session_requests(struct ceph_mds_client *mdsc, { struct ceph_mds_request *req; struct rb_node *p; + struct ceph_inode_info *ci; dout("cleanup_session_requests mds%d\n", session->s_mds); mutex_lock(&mdsc->mutex); @@ -1278,6 +1279,16 @@ static void cleanup_session_requests(struct ceph_mds_client *mdsc, struct ceph_mds_request, r_unsafe_item); pr_warn_ratelimited(" dropping unsafe request %llu\n", req->r_tid); + if (req->r_target_inode) { + /* dropping unsafe change of inode's attributes */ + ci = ceph_inode(req->r_target_inode); + errseq_set(&ci->i_meta_err, -EIO); + } + if (req->r_unsafe_dir) { + /* dropping unsafe directory operation */ + ci = ceph_inode(req->r_unsafe_dir); + errseq_set(&ci->i_meta_err, -EIO); + } __unregister_request(mdsc, req); } /* zero r_attempts, so kick_requests() will re-send requests */ @@ -1370,7 +1381,7 @@ static int remove_session_caps_cb(struct inode *inode, struct ceph_cap *cap, struct ceph_fs_client *fsc = (struct ceph_fs_client *)arg; struct ceph_inode_info *ci = ceph_inode(inode); LIST_HEAD(to_remove); - bool drop = false; + bool dirty_dropped = false; bool invalidate = false; dout("removing cap %p, ci is %p, inode is %p\n", @@ -1405,7 +1416,7 @@ static int remove_session_caps_cb(struct inode *inode, struct ceph_cap *cap, inode, ceph_ino(inode)); ci->i_dirty_caps = 0; list_del_init(&ci->i_dirty_item); - drop = true; + dirty_dropped = true; } if (!list_empty(&ci->i_flushing_item)) { pr_warn_ratelimited( @@ -1415,10 +1426,22 @@ static int remove_session_caps_cb(struct inode *inode, struct ceph_cap *cap, ci->i_flushing_caps = 0; list_del_init(&ci->i_flushing_item); mdsc->num_cap_flushing--; - drop = true; + dirty_dropped = true; } spin_unlock(&mdsc->cap_dirty_lock); + if (dirty_dropped) { + errseq_set(&ci->i_meta_err, -EIO); + + if (ci->i_wrbuffer_ref_head == 0 && + ci->i_wr_ref == 0 && + ci->i_dirty_caps == 0 && + ci->i_flushing_caps == 0) { + ceph_put_snap_context(ci->i_head_snapc); + ci->i_head_snapc = NULL; + } + } + if (atomic_read(&ci->i_filelock_ref) > 0) { /* make further file lock syscall return -EIO */ ci->i_ceph_flags |= CEPH_I_ERROR_FILELOCK; @@ -1430,15 +1453,6 @@ static int remove_session_caps_cb(struct inode *inode, struct ceph_cap *cap, list_add(&ci->i_prealloc_cap_flush->i_list, &to_remove); ci->i_prealloc_cap_flush = NULL; } - - if (drop && - ci->i_wrbuffer_ref_head == 0 && - ci->i_wr_ref == 0 && - ci->i_dirty_caps == 0 && - ci->i_flushing_caps == 0) { - ceph_put_snap_context(ci->i_head_snapc); - ci->i_head_snapc = NULL; - } } spin_unlock(&ci->i_ceph_lock); while (!list_empty(&to_remove)) { @@ -1452,7 +1466,7 @@ static int remove_session_caps_cb(struct inode *inode, struct ceph_cap *cap, wake_up_all(&ci->i_cap_wq); if (invalidate) ceph_queue_invalidate(inode); - if (drop) + if (dirty_dropped) iput(inode); return 0; } diff --git a/fs/ceph/super.h b/fs/ceph/super.h index 30e9a4e415cc..749326859045 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -395,6 +395,8 @@ struct ceph_inode_info { struct fscache_cookie *fscache; u32 i_fscache_gen; #endif + errseq_t i_meta_err; + struct inode vfs_inode; /* at end */ }; @@ -703,6 +705,8 @@ struct ceph_file_info { spinlock_t rw_contexts_lock; struct list_head rw_contexts; + + errseq_t meta_err; }; struct ceph_dir_file_info { From patchwork Wed Jul 3 12:44:38 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Yan, Zheng" X-Patchwork-Id: 11029481 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4A2A313A4 for ; Wed, 3 Jul 2019 12:45:11 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3AE36285B7 for ; Wed, 3 Jul 2019 12:45:11 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2F64E289A3; Wed, 3 Jul 2019 12:45:11 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 548E5289A0 for ; Wed, 3 Jul 2019 12:45:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726710AbfGCMpJ (ORCPT ); Wed, 3 Jul 2019 08:45:09 -0400 Received: from mx1.redhat.com ([209.132.183.28]:51620 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726678AbfGCMpJ (ORCPT ); Wed, 3 Jul 2019 08:45:09 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id C8C35307D871 for ; Wed, 3 Jul 2019 12:45:08 +0000 (UTC) Received: from zhyan-laptop.redhat.com (ovpn-12-77.pek2.redhat.com [10.72.12.77]) by smtp.corp.redhat.com (Postfix) with ESMTP id 80F75834EB; Wed, 3 Jul 2019 12:45:01 +0000 (UTC) From: "Yan, Zheng" To: ceph-devel@vger.kernel.org Cc: idryomov@redhat.com, jlayton@redhat.com, "Yan, Zheng" Subject: [PATCH 5/9] ceph: pass filp to ceph_get_caps() Date: Wed, 3 Jul 2019 20:44:38 +0800 Message-Id: <20190703124442.6614-6-zyan@redhat.com> In-Reply-To: <20190703124442.6614-1-zyan@redhat.com> References: <20190703124442.6614-1-zyan@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.48]); Wed, 03 Jul 2019 12:45:08 +0000 (UTC) Sender: ceph-devel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Also change several other functions' arguments, no logical changes. This is preparetion for later patch that checks filp error. Signed-off-by: "Yan, Zheng" --- fs/ceph/addr.c | 15 +++++++++------ fs/ceph/caps.c | 32 +++++++++++++++++--------------- fs/ceph/file.c | 35 +++++++++++++++++------------------ fs/ceph/super.h | 6 +++--- 4 files changed, 46 insertions(+), 42 deletions(-) diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index e078cc55b989..9f357c5ce84d 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -323,7 +323,8 @@ static int start_read(struct inode *inode, struct ceph_rw_context *rw_ctx, /* caller of readpages does not hold buffer and read caps * (fadvise, madvise and readahead cases) */ int want = CEPH_CAP_FILE_CACHE; - ret = ceph_try_get_caps(ci, CEPH_CAP_FILE_RD, want, true, &got); + ret = ceph_try_get_caps(inode, CEPH_CAP_FILE_RD, want, + true, &got); if (ret < 0) { dout("start_read %p, error getting cap\n", inode); } else if (!(got & want)) { @@ -1451,7 +1452,8 @@ static vm_fault_t ceph_filemap_fault(struct vm_fault *vmf) want = CEPH_CAP_FILE_CACHE; got = 0; - err = ceph_get_caps(ci, CEPH_CAP_FILE_RD, want, -1, &got, &pinned_page); + err = ceph_get_caps(vma->vm_file, CEPH_CAP_FILE_RD, want, -1, + &got, &pinned_page); if (err < 0) goto out_restore; @@ -1567,7 +1569,7 @@ static vm_fault_t ceph_page_mkwrite(struct vm_fault *vmf) want = CEPH_CAP_FILE_BUFFER; got = 0; - err = ceph_get_caps(ci, CEPH_CAP_FILE_WR, want, off + len, + err = ceph_get_caps(vma->vm_file, CEPH_CAP_FILE_WR, want, off + len, &got, NULL); if (err < 0) goto out_free; @@ -1988,10 +1990,11 @@ static int __ceph_pool_perm_get(struct ceph_inode_info *ci, return err; } -int ceph_pool_perm_check(struct ceph_inode_info *ci, int need) +int ceph_pool_perm_check(struct inode *inode, int need) { - s64 pool; + struct ceph_inode_info *ci = ceph_inode(inode); struct ceph_string *pool_ns; + s64 pool; int ret, flags; if (ci->i_vino.snap != CEPH_NOSNAP) { @@ -2003,7 +2006,7 @@ int ceph_pool_perm_check(struct ceph_inode_info *ci, int need) return 0; } - if (ceph_test_mount_opt(ceph_inode_to_client(&ci->vfs_inode), + if (ceph_test_mount_opt(ceph_inode_to_client(inode), NOPOOLPERM)) return 0; diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index 345d73f57b80..40443955250c 100644 --- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -2567,10 +2567,10 @@ static void __take_cap_refs(struct ceph_inode_info *ci, int got, * * FIXME: how does a 0 return differ from -EAGAIN? */ -static int try_get_cap_refs(struct ceph_inode_info *ci, int need, int want, +static int try_get_cap_refs(struct inode *inode, int need, int want, loff_t endoff, bool nonblock, int *got) { - struct inode *inode = &ci->vfs_inode; + struct ceph_inode_info *ci = ceph_inode(inode); struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc; int ret = 0; int have, implemented; @@ -2738,18 +2738,18 @@ static void check_max_size(struct inode *inode, loff_t endoff) ceph_check_caps(ci, CHECK_CAPS_AUTHONLY, NULL); } -int ceph_try_get_caps(struct ceph_inode_info *ci, int need, int want, +int ceph_try_get_caps(struct inode *inode, int need, int want, bool nonblock, int *got) { int ret; BUG_ON(need & ~CEPH_CAP_FILE_RD); BUG_ON(want & ~(CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO|CEPH_CAP_FILE_SHARED)); - ret = ceph_pool_perm_check(ci, need); + ret = ceph_pool_perm_check(inode, need); if (ret < 0) return ret; - ret = try_get_cap_refs(ci, need, want, 0, nonblock, got); + ret = try_get_cap_refs(inode, need, want, 0, nonblock, got); return ret == -EAGAIN ? 0 : ret; } @@ -2758,21 +2758,23 @@ int ceph_try_get_caps(struct ceph_inode_info *ci, int need, int want, * due to a small max_size, make sure we check_max_size (and possibly * ask the mds) so we don't get hung up indefinitely. */ -int ceph_get_caps(struct ceph_inode_info *ci, int need, int want, +int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got, struct page **pinned_page) { + struct inode *inode = file_inode(filp); + struct ceph_inode_info *ci = ceph_inode(inode); int _got, ret; - ret = ceph_pool_perm_check(ci, need); + ret = ceph_pool_perm_check(inode, need); if (ret < 0) return ret; while (true) { if (endoff > 0) - check_max_size(&ci->vfs_inode, endoff); + check_max_size(inode, endoff); _got = 0; - ret = try_get_cap_refs(ci, need, want, endoff, + ret = try_get_cap_refs(inode, need, want, endoff, false, &_got); if (ret == -EAGAIN) continue; @@ -2780,8 +2782,8 @@ int ceph_get_caps(struct ceph_inode_info *ci, int need, int want, DEFINE_WAIT_FUNC(wait, woken_wake_function); add_wait_queue(&ci->i_cap_wq, &wait); - while (!(ret = try_get_cap_refs(ci, need, want, endoff, - true, &_got))) { + while (!(ret = try_get_cap_refs(inode, need, want, + endoff, true, &_got))) { if (signal_pending(current)) { ret = -ERESTARTSYS; break; @@ -2796,7 +2798,7 @@ int ceph_get_caps(struct ceph_inode_info *ci, int need, int want, if (ret < 0) { if (ret == -ESTALE) { /* session was killed, try renew caps */ - ret = ceph_renew_caps(&ci->vfs_inode); + ret = ceph_renew_caps(inode); if (ret == 0) continue; } @@ -2805,9 +2807,9 @@ int ceph_get_caps(struct ceph_inode_info *ci, int need, int want, if (ci->i_inline_version != CEPH_INLINE_NONE && (_got & (CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO)) && - i_size_read(&ci->vfs_inode) > 0) { + i_size_read(inode) > 0) { struct page *page = - find_get_page(ci->vfs_inode.i_mapping, 0); + find_get_page(inode->i_mapping, 0); if (page) { if (PageUptodate(page)) { *pinned_page = page; @@ -2826,7 +2828,7 @@ int ceph_get_caps(struct ceph_inode_info *ci, int need, int want, * getattr request will bring inline data into * page cache */ - ret = __ceph_do_getattr(&ci->vfs_inode, NULL, + ret = __ceph_do_getattr(inode, NULL, CEPH_STAT_CAP_INLINE_DATA, true); if (ret < 0) diff --git a/fs/ceph/file.c b/fs/ceph/file.c index deffb3010e6a..b868048211ba 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -1262,7 +1262,8 @@ static ssize_t ceph_read_iter(struct kiocb *iocb, struct iov_iter *to) want = CEPH_CAP_FILE_CACHE | CEPH_CAP_FILE_LAZYIO; else want = CEPH_CAP_FILE_CACHE; - ret = ceph_get_caps(ci, CEPH_CAP_FILE_RD, want, -1, &got, &pinned_page); + ret = ceph_get_caps(filp, CEPH_CAP_FILE_RD, want, -1, + &got, &pinned_page); if (ret < 0) return ret; @@ -1459,7 +1460,7 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from) else want = CEPH_CAP_FILE_BUFFER; got = 0; - err = ceph_get_caps(ci, CEPH_CAP_FILE_WR, want, pos + count, + err = ceph_get_caps(file, CEPH_CAP_FILE_WR, want, pos + count, &got, NULL); if (err < 0) goto out; @@ -1783,7 +1784,7 @@ static long ceph_fallocate(struct file *file, int mode, else want = CEPH_CAP_FILE_BUFFER; - ret = ceph_get_caps(ci, CEPH_CAP_FILE_WR, want, endoff, &got, NULL); + ret = ceph_get_caps(file, CEPH_CAP_FILE_WR, want, endoff, &got, NULL); if (ret < 0) goto unlock; @@ -1812,16 +1813,15 @@ static long ceph_fallocate(struct file *file, int mode, * src_ci. Two attempts are made to obtain both caps, and an error is return if * this fails; zero is returned on success. */ -static int get_rd_wr_caps(struct ceph_inode_info *src_ci, - loff_t src_endoff, int *src_got, - struct ceph_inode_info *dst_ci, +static int get_rd_wr_caps(struct file *src_filp, int *src_got, + struct file *dst_filp, loff_t dst_endoff, int *dst_got) { int ret = 0; bool retrying = false; retry_caps: - ret = ceph_get_caps(dst_ci, CEPH_CAP_FILE_WR, CEPH_CAP_FILE_BUFFER, + ret = ceph_get_caps(dst_filp, CEPH_CAP_FILE_WR, CEPH_CAP_FILE_BUFFER, dst_endoff, dst_got, NULL); if (ret < 0) return ret; @@ -1831,24 +1831,24 @@ static int get_rd_wr_caps(struct ceph_inode_info *src_ci, * we would risk a deadlock by using ceph_get_caps. Thus, we'll do some * retry dance instead to try to get both capabilities. */ - ret = ceph_try_get_caps(src_ci, CEPH_CAP_FILE_RD, CEPH_CAP_FILE_SHARED, + ret = ceph_try_get_caps(file_inode(src_filp), + CEPH_CAP_FILE_RD, CEPH_CAP_FILE_SHARED, false, src_got); if (ret <= 0) { /* Start by dropping dst_ci caps and getting src_ci caps */ - ceph_put_cap_refs(dst_ci, *dst_got); + ceph_put_cap_refs(ceph_inode(file_inode(dst_filp)), *dst_got); if (retrying) { if (!ret) /* ceph_try_get_caps masks EAGAIN */ ret = -EAGAIN; return ret; } - ret = ceph_get_caps(src_ci, CEPH_CAP_FILE_RD, - CEPH_CAP_FILE_SHARED, src_endoff, - src_got, NULL); + ret = ceph_get_caps(src_filp, CEPH_CAP_FILE_RD, + CEPH_CAP_FILE_SHARED, -1, src_got, NULL); if (ret < 0) return ret; /*... drop src_ci caps too, and retry */ - ceph_put_cap_refs(src_ci, *src_got); + ceph_put_cap_refs(ceph_inode(file_inode(src_filp)), *src_got); retrying = true; goto retry_caps; } @@ -1962,8 +1962,8 @@ static ssize_t ceph_copy_file_range(struct file *src_file, loff_t src_off, * clients may have dirty data in their caches. And OSDs know nothing * about caps, so they can't safely do the remote object copies. */ - err = get_rd_wr_caps(src_ci, (src_off + len), &src_got, - dst_ci, (dst_off + len), &dst_got); + err = get_rd_wr_caps(src_file, &src_got, + dst_file, (dst_off + len), &dst_got); if (err < 0) { dout("get_rd_wr_caps returned %d\n", err); ret = -EOPNOTSUPP; @@ -2020,9 +2020,8 @@ static ssize_t ceph_copy_file_range(struct file *src_file, loff_t src_off, goto out; } len -= ret; - err = get_rd_wr_caps(src_ci, (src_off + len), - &src_got, dst_ci, - (dst_off + len), &dst_got); + err = get_rd_wr_caps(src_file, &src_got, + dst_file, (dst_off + len), &dst_got); if (err < 0) goto out; err = is_file_size_ok(src_inode, dst_inode, diff --git a/fs/ceph/super.h b/fs/ceph/super.h index 749326859045..56555d811123 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -1063,9 +1063,9 @@ extern int ceph_encode_dentry_release(void **p, struct dentry *dn, struct inode *dir, int mds, int drop, int unless); -extern int ceph_get_caps(struct ceph_inode_info *ci, int need, int want, +extern int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got, struct page **pinned_page); -extern int ceph_try_get_caps(struct ceph_inode_info *ci, +extern int ceph_try_get_caps(struct inode *inode, int need, int want, bool nonblock, int *got); /* for counting open files by mode */ @@ -1076,7 +1076,7 @@ extern void ceph_put_fmode(struct ceph_inode_info *ci, int mode); extern const struct address_space_operations ceph_aops; extern int ceph_mmap(struct file *file, struct vm_area_struct *vma); extern int ceph_uninline_data(struct file *filp, struct page *locked_page); -extern int ceph_pool_perm_check(struct ceph_inode_info *ci, int need); +extern int ceph_pool_perm_check(struct inode *inode, int need); extern void ceph_pool_perm_destroy(struct ceph_mds_client* mdsc); /* file.c */ From patchwork Wed Jul 3 12:44:39 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Yan, Zheng" X-Patchwork-Id: 11029483 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7FC381398 for ; Wed, 3 Jul 2019 12:45:16 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6EAA020223 for ; Wed, 3 Jul 2019 12:45:16 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 62EB928898; Wed, 3 Jul 2019 12:45:16 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E7DBD289A0 for ; Wed, 3 Jul 2019 12:45:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726762AbfGCMpP (ORCPT ); Wed, 3 Jul 2019 08:45:15 -0400 Received: from mx1.redhat.com ([209.132.183.28]:51672 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725830AbfGCMpO (ORCPT ); Wed, 3 Jul 2019 08:45:14 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 27299307D942 for ; Wed, 3 Jul 2019 12:45:14 +0000 (UTC) Received: from zhyan-laptop.redhat.com (ovpn-12-77.pek2.redhat.com [10.72.12.77]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9729417B40; Wed, 3 Jul 2019 12:45:09 +0000 (UTC) From: "Yan, Zheng" To: ceph-devel@vger.kernel.org Cc: idryomov@redhat.com, jlayton@redhat.com, "Yan, Zheng" Subject: [PATCH 6/9] ceph: return -EIO if read/write against filp that lost file locks Date: Wed, 3 Jul 2019 20:44:39 +0800 Message-Id: <20190703124442.6614-7-zyan@redhat.com> In-Reply-To: <20190703124442.6614-1-zyan@redhat.com> References: <20190703124442.6614-1-zyan@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.48]); Wed, 03 Jul 2019 12:45:14 +0000 (UTC) Sender: ceph-devel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP After mds evicts session, file locks get lost sliently. It's not safe to let programs continue to do read/write. Signed-off-by: "Yan, Zheng" --- fs/ceph/caps.c | 28 ++++++++++++++++++++++------ fs/ceph/locks.c | 8 ++++++-- fs/ceph/super.h | 1 + 3 files changed, 29 insertions(+), 8 deletions(-) diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index 40443955250c..416405e03c4c 100644 --- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -2567,8 +2567,13 @@ static void __take_cap_refs(struct ceph_inode_info *ci, int got, * * FIXME: how does a 0 return differ from -EAGAIN? */ +enum { + NON_BLOCKING = 1, + CHECK_FILELOCK = 2, +}; + static int try_get_cap_refs(struct inode *inode, int need, int want, - loff_t endoff, bool nonblock, int *got) + loff_t endoff, int flags, int *got) { struct ceph_inode_info *ci = ceph_inode(inode); struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc; @@ -2583,6 +2588,13 @@ static int try_get_cap_refs(struct inode *inode, int need, int want, again: spin_lock(&ci->i_ceph_lock); + if ((flags & CHECK_FILELOCK) && + (ci->i_ceph_flags & CEPH_I_ERROR_FILELOCK)) { + dout("try_get_cap_refs %p error filelock\n", inode); + ret = -EIO; + goto out_unlock; + } + /* make sure file is actually open */ file_wanted = __ceph_caps_file_wanted(ci); if ((file_wanted & need) != need) { @@ -2644,7 +2656,7 @@ static int try_get_cap_refs(struct inode *inode, int need, int want, * we can not call down_read() when * task isn't in TASK_RUNNING state */ - if (nonblock) { + if (flags & NON_BLOCKING) { ret = -EAGAIN; goto out_unlock; } @@ -2749,7 +2761,8 @@ int ceph_try_get_caps(struct inode *inode, int need, int want, if (ret < 0) return ret; - ret = try_get_cap_refs(inode, need, want, 0, nonblock, got); + ret = try_get_cap_refs(inode, need, want, 0, + (nonblock ? NON_BLOCKING : 0), got); return ret == -EAGAIN ? 0 : ret; } @@ -2761,9 +2774,10 @@ int ceph_try_get_caps(struct inode *inode, int need, int want, int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got, struct page **pinned_page) { + struct ceph_file_info *fi = filp->private_data; struct inode *inode = file_inode(filp); struct ceph_inode_info *ci = ceph_inode(inode); - int _got, ret; + int ret, _got, flags; ret = ceph_pool_perm_check(inode, need); if (ret < 0) @@ -2773,17 +2787,19 @@ int ceph_get_caps(struct file *filp, int need, int want, if (endoff > 0) check_max_size(inode, endoff); + flags = atomic_read(&fi->num_locks) ? CHECK_FILELOCK : 0; _got = 0; ret = try_get_cap_refs(inode, need, want, endoff, - false, &_got); + flags, &_got); if (ret == -EAGAIN) continue; if (!ret) { DEFINE_WAIT_FUNC(wait, woken_wake_function); add_wait_queue(&ci->i_cap_wq, &wait); + flags |= NON_BLOCKING; while (!(ret = try_get_cap_refs(inode, need, want, - endoff, true, &_got))) { + endoff, flags, &_got))) { if (signal_pending(current)) { ret = -ERESTARTSYS; break; diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c index ac9b53b89365..cb216501c959 100644 --- a/fs/ceph/locks.c +++ b/fs/ceph/locks.c @@ -32,14 +32,18 @@ void __init ceph_flock_init(void) static void ceph_fl_copy_lock(struct file_lock *dst, struct file_lock *src) { - struct inode *inode = file_inode(src->fl_file); + struct ceph_file_info *fi = dst->fl_file->private_data; + struct inode *inode = file_inode(dst->fl_file); atomic_inc(&ceph_inode(inode)->i_filelock_ref); + atomic_inc(&fi->num_locks); } static void ceph_fl_release_lock(struct file_lock *fl) { + struct ceph_file_info *fi = fl->fl_file->private_data; struct inode *inode = file_inode(fl->fl_file); struct ceph_inode_info *ci = ceph_inode(inode); + atomic_dec(&fi->num_locks); if (atomic_dec_and_test(&ci->i_filelock_ref)) { /* clear error when all locks are released */ spin_lock(&ci->i_ceph_lock); @@ -73,7 +77,7 @@ static int ceph_lock_message(u8 lock_type, u16 operation, struct inode *inode, * window. Caller function will decrease the counter. */ fl->fl_ops = &ceph_fl_lock_ops; - atomic_inc(&ceph_inode(inode)->i_filelock_ref); + fl->fl_ops->fl_copy_lock(fl, NULL); } if (operation != CEPH_MDS_OP_SETFILELOCK || cmd == CEPH_LOCK_UNLOCK) diff --git a/fs/ceph/super.h b/fs/ceph/super.h index 56555d811123..da33847bdc9a 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -707,6 +707,7 @@ struct ceph_file_info { struct list_head rw_contexts; errseq_t meta_err; + atomic_t num_locks; }; struct ceph_dir_file_info { From patchwork Wed Jul 3 12:44:40 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Yan, Zheng" X-Patchwork-Id: 11029485 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 16E8213A4 for ; Wed, 3 Jul 2019 12:45:19 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 06F7F2898E for ; Wed, 3 Jul 2019 12:45:19 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id EF921289A9; Wed, 3 Jul 2019 12:45:18 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 66E972898E for ; Wed, 3 Jul 2019 12:45:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726811AbfGCMpR (ORCPT ); Wed, 3 Jul 2019 08:45:17 -0400 Received: from mx1.redhat.com ([209.132.183.28]:34428 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725830AbfGCMpR (ORCPT ); Wed, 3 Jul 2019 08:45:17 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 274C588306 for ; Wed, 3 Jul 2019 12:45:17 +0000 (UTC) Received: from zhyan-laptop.redhat.com (ovpn-12-77.pek2.redhat.com [10.72.12.77]) by smtp.corp.redhat.com (Postfix) with ESMTP id E4C3017B40; Wed, 3 Jul 2019 12:45:14 +0000 (UTC) From: "Yan, Zheng" To: ceph-devel@vger.kernel.org Cc: idryomov@redhat.com, jlayton@redhat.com, "Yan, Zheng" Subject: [PATCH 7/9] ceph: add 'force_reconnect' option for remount Date: Wed, 3 Jul 2019 20:44:40 +0800 Message-Id: <20190703124442.6614-8-zyan@redhat.com> In-Reply-To: <20190703124442.6614-1-zyan@redhat.com> References: <20190703124442.6614-1-zyan@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Wed, 03 Jul 2019 12:45:17 +0000 (UTC) Sender: ceph-devel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The option make client reconnect to cluster, using new entity addr. It can be used for recovering from blacklistd. Signed-off-by: "Yan, Zheng" --- fs/ceph/mds_client.c | 16 ++++++++--- fs/ceph/super.c | 60 +++++++++++++++++++++++++++++++++++++++--- fs/ceph/super.h | 1 + net/ceph/ceph_common.c | 30 ++++++++++++--------- 4 files changed, 88 insertions(+), 19 deletions(-) diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index d4f07d3120cb..59172e63a61f 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -1394,9 +1394,12 @@ static int remove_session_caps_cb(struct inode *inode, struct ceph_cap *cap, struct ceph_cap_flush *cf; struct ceph_mds_client *mdsc = fsc->mdsc; - if (ci->i_wrbuffer_ref > 0 && - READ_ONCE(fsc->mount_state) == CEPH_MOUNT_SHUTDOWN) - invalidate = true; + if (READ_ONCE(fsc->mount_state) == CEPH_MOUNT_SHUTDOWN) { + if (inode->i_data.nrpages > 0) + invalidate = true; + if (ci->i_wrbuffer_ref > 0) + mapping_set_error(&inode->i_data, -EIO); + } while (!list_empty(&ci->i_cap_flush_list)) { cf = list_first_entry(&ci->i_cap_flush_list, @@ -4369,7 +4372,12 @@ void ceph_mdsc_force_umount(struct ceph_mds_client *mdsc) session = __ceph_lookup_mds_session(mdsc, mds); if (!session) continue; + + if (session->s_state == CEPH_MDS_SESSION_REJECTED) + __unregister_session(mdsc, session); + __wake_requests(mdsc, &session->s_waiting); mutex_unlock(&mdsc->mutex); + mutex_lock(&session->s_mutex); __close_session(mdsc, session); if (session->s_state == CEPH_MDS_SESSION_CLOSING) { @@ -4378,9 +4386,11 @@ void ceph_mdsc_force_umount(struct ceph_mds_client *mdsc) } mutex_unlock(&session->s_mutex); ceph_put_mds_session(session); + mutex_lock(&mdsc->mutex); kick_requests(mdsc, mds); } + __wake_requests(mdsc, &mdsc->waiting_for_map); mutex_unlock(&mdsc->mutex); } diff --git a/fs/ceph/super.c b/fs/ceph/super.c index 5f0c950ca966..04a9af66419d 100644 --- a/fs/ceph/super.c +++ b/fs/ceph/super.c @@ -169,6 +169,7 @@ enum { Opt_noquotadf, Opt_copyfrom, Opt_nocopyfrom, + Opt_force_reconnect, }; static match_table_t fsopt_tokens = { @@ -210,6 +211,7 @@ static match_table_t fsopt_tokens = { {Opt_noquotadf, "noquotadf"}, {Opt_copyfrom, "copyfrom"}, {Opt_nocopyfrom, "nocopyfrom"}, + {Opt_force_reconnect, "force_reconnect"}, {-1, NULL} }; @@ -373,6 +375,9 @@ static int parse_fsopt_token(char *c, void *private) case Opt_nocopyfrom: fsopt->flags |= CEPH_MOUNT_OPT_NOCOPYFROM; break; + case Opt_force_reconnect: + fsopt->flags |= CEPH_MOUNT_OPT_FORCERECONNCT; + break; #ifdef CONFIG_CEPH_FS_POSIX_ACL case Opt_acl: fsopt->sb_flags |= SB_POSIXACL; @@ -832,10 +837,53 @@ static void ceph_umount_begin(struct super_block *sb) return; } -static int ceph_remount(struct super_block *sb, int *flags, char *data) +static int ceph_remount(struct super_block *sb, int *flags, char *options) { - sync_filesystem(sb); - return 0; + struct ceph_fs_client *fsc = ceph_sb_to_client(sb); + struct ceph_options *opt; + struct ceph_mount_options *fsopt; + int err = 0; + + fsopt = kzalloc(sizeof(*fsopt), GFP_KERNEL); + if (!fsopt) + return -ENOMEM; + + opt = ceph_parse_options(options, NULL, NULL, + parse_fsopt_token, (void *)fsopt); + if (IS_ERR(opt)) { + err = PTR_ERR(opt); + goto out; + } + + if (!(fsopt->flags & CEPH_MOUNT_OPT_FORCERECONNCT)) { + sync_filesystem(sb); + goto out; + } + + ceph_umount_begin(sb); + + /* Make sure all page caches get invalidated. + * see remove_session_caps_cb() */ + flush_workqueue(fsc->inode_wq); + + /* In case that we were blacklisted. This also reset + * all mon/osd connections */ + ceph_reset_client_addr(fsc->client); + + ceph_osdc_clear_abort_err(&fsc->client->osdc); + fsc->mount_state = 0; + + if (sb->s_root) { + err = __ceph_do_getattr(d_inode(sb->s_root), NULL, + CEPH_STAT_CAP_INODE, true); + } else { + err = 0; + } +out: + if (!IS_ERR_OR_NULL(opt)) + ceph_destroy_options(opt); + destroy_mount_options(fsopt); + return err; } static const struct super_operations ceph_super_ops = { @@ -1067,6 +1115,12 @@ static struct dentry *ceph_mount(struct file_system_type *fs_type, goto out_final; } + if (fsopt->flags & CEPH_MOUNT_OPT_FORCERECONNCT) { + pr_err("ceph: force_reconnect option is only for remount\n"); + res = ERR_PTR(-EINVAL); + goto out_final; + } + /* create client (which we may/may not use) */ fsc = create_fs_client(fsopt, opt); if (IS_ERR(fsc)) { diff --git a/fs/ceph/super.h b/fs/ceph/super.h index da33847bdc9a..71ac136b8758 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -31,6 +31,7 @@ #define CEPH_BLOCK_SHIFT 22 /* 4 MB */ #define CEPH_BLOCK (1 << CEPH_BLOCK_SHIFT) +#define CEPH_MOUNT_OPT_FORCERECONNCT (1<<0) /* force reconnect, remount only */ #define CEPH_MOUNT_OPT_DIRSTAT (1<<4) /* `cat dirname` for stats */ #define CEPH_MOUNT_OPT_RBYTES (1<<5) /* dir st_bytes = rbytes */ #define CEPH_MOUNT_OPT_NOASYNCREADDIR (1<<7) /* no dcache readdir */ diff --git a/net/ceph/ceph_common.c b/net/ceph/ceph_common.c index 6676f48d615c..3511782a5b0a 100644 --- a/net/ceph/ceph_common.c +++ b/net/ceph/ceph_common.c @@ -358,13 +358,6 @@ ceph_parse_options(char *options, const char *dev_name, opt = kzalloc(sizeof(*opt), GFP_KERNEL); if (!opt) return ERR_PTR(-ENOMEM); - opt->mon_addr = kcalloc(CEPH_MAX_MON, sizeof(*opt->mon_addr), - GFP_KERNEL); - if (!opt->mon_addr) - goto out; - - dout("parse_options %p options '%s' dev_name '%s'\n", opt, options, - dev_name); /* start with defaults */ opt->flags = CEPH_OPT_DEFAULT; @@ -373,12 +366,23 @@ ceph_parse_options(char *options, const char *dev_name, opt->osd_idle_ttl = CEPH_OSD_IDLE_TTL_DEFAULT; opt->osd_request_timeout = CEPH_OSD_REQUEST_TIMEOUT_DEFAULT; - /* get mon ip(s) */ - /* ip1[:port1][,ip2[:port2]...] */ - err = ceph_parse_ips(dev_name, dev_name_end, opt->mon_addr, - CEPH_MAX_MON, &opt->num_mon); - if (err < 0) - goto out; + dout("parse_options %p options '%s' dev_name '%s'\n", opt, options, + (dev_name ? : "null")); + + if (dev_name) { + opt->mon_addr = kcalloc(CEPH_MAX_MON, sizeof(*opt->mon_addr), + GFP_KERNEL); + if (!opt->mon_addr) + goto out; + + + /* get mon ip(s) */ + /* ip1[:port1][,ip2[:port2]...] */ + err = ceph_parse_ips(dev_name, dev_name_end, opt->mon_addr, + CEPH_MAX_MON, &opt->num_mon); + if (err < 0) + goto out; + } /* parse mount options */ while ((c = strsep(&options, ",")) != NULL) { From patchwork Wed Jul 3 12:44:41 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Yan, Zheng" X-Patchwork-Id: 11029487 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E01BB13A4 for ; Wed, 3 Jul 2019 12:45:23 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D223F285B7 for ; Wed, 3 Jul 2019 12:45:23 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C6F3D289A2; Wed, 3 Jul 2019 12:45:23 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 71A50285B7 for ; Wed, 3 Jul 2019 12:45:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726841AbfGCMpW (ORCPT ); Wed, 3 Jul 2019 08:45:22 -0400 Received: from mx1.redhat.com ([209.132.183.28]:46730 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725830AbfGCMpW (ORCPT ); Wed, 3 Jul 2019 08:45:22 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 545FB80F81 for ; Wed, 3 Jul 2019 12:45:22 +0000 (UTC) Received: from zhyan-laptop.redhat.com (ovpn-12-77.pek2.redhat.com [10.72.12.77]) by smtp.corp.redhat.com (Postfix) with ESMTP id C4B13834EB; Wed, 3 Jul 2019 12:45:17 +0000 (UTC) From: "Yan, Zheng" To: ceph-devel@vger.kernel.org Cc: idryomov@redhat.com, jlayton@redhat.com, "Yan, Zheng" Subject: [PATCH 8/9] ceph: invalidate all write mode filp after reconnect Date: Wed, 3 Jul 2019 20:44:41 +0800 Message-Id: <20190703124442.6614-9-zyan@redhat.com> In-Reply-To: <20190703124442.6614-1-zyan@redhat.com> References: <20190703124442.6614-1-zyan@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.27]); Wed, 03 Jul 2019 12:45:22 +0000 (UTC) Sender: ceph-devel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Signed-off-by: "Yan, Zheng" --- fs/ceph/caps.c | 4 ++++ fs/ceph/file.c | 1 + fs/ceph/super.c | 2 ++ fs/ceph/super.h | 3 +++ 4 files changed, 10 insertions(+) diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index 416405e03c4c..a4717b3c1f89 100644 --- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -2783,6 +2783,10 @@ int ceph_get_caps(struct file *filp, int need, int want, if (ret < 0) return ret; + if ((fi->fmode & CEPH_FILE_MODE_WR) && + fi->filp_gen != READ_ONCE(ceph_inode_to_client(inode)->filp_gen)) + return -EBADF; + while (true) { if (endoff > 0) check_max_size(inode, endoff); diff --git a/fs/ceph/file.c b/fs/ceph/file.c index b868048211ba..77d3ef903cce 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -234,6 +234,7 @@ static int ceph_init_file_info(struct inode *inode, struct file *file, spin_lock_init(&fi->rw_contexts_lock); INIT_LIST_HEAD(&fi->rw_contexts); fi->meta_err = errseq_sample(&ci->i_meta_err); + fi->filp_gen = READ_ONCE(ceph_inode_to_client(inode)->filp_gen); return 0; } diff --git a/fs/ceph/super.c b/fs/ceph/super.c index 04a9af66419d..a00597602810 100644 --- a/fs/ceph/super.c +++ b/fs/ceph/super.c @@ -669,6 +669,7 @@ static struct ceph_fs_client *create_fs_client(struct ceph_mount_options *fsopt, fsc->sb = NULL; fsc->mount_state = CEPH_MOUNT_MOUNTING; + fsc->filp_gen = 1; atomic_long_set(&fsc->writeback_count, 0); @@ -832,6 +833,7 @@ static void ceph_umount_begin(struct super_block *sb) if (!fsc) return; fsc->mount_state = CEPH_MOUNT_SHUTDOWN; + fsc->filp_gen++; // invalidate open files ceph_osdc_abort_requests(&fsc->client->osdc, -EIO); ceph_mdsc_force_umount(fsc->mdsc); return; diff --git a/fs/ceph/super.h b/fs/ceph/super.h index 71ac136b8758..54039f7fb510 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -102,6 +102,8 @@ struct ceph_fs_client { struct ceph_client *client; unsigned long mount_state; + + u32 filp_gen; loff_t max_file_size; struct ceph_mds_client *mdsc; @@ -708,6 +710,7 @@ struct ceph_file_info { struct list_head rw_contexts; errseq_t meta_err; + u32 filp_gen; atomic_t num_locks; }; From patchwork Wed Jul 3 12:44:42 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Yan, Zheng" X-Patchwork-Id: 11029489 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E42171398 for ; Wed, 3 Jul 2019 12:45:30 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D46C7289A1 for ; Wed, 3 Jul 2019 12:45:30 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C8B4B289A5; Wed, 3 Jul 2019 12:45:30 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D9F0C289A1 for ; Wed, 3 Jul 2019 12:45:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726881AbfGCMp3 (ORCPT ); Wed, 3 Jul 2019 08:45:29 -0400 Received: from mx1.redhat.com ([209.132.183.28]:44056 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725830AbfGCMp2 (ORCPT ); Wed, 3 Jul 2019 08:45:28 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id AC39F87629 for ; Wed, 3 Jul 2019 12:45:28 +0000 (UTC) Received: from zhyan-laptop.redhat.com (ovpn-12-77.pek2.redhat.com [10.72.12.77]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1F9918F6D2; Wed, 3 Jul 2019 12:45:22 +0000 (UTC) From: "Yan, Zheng" To: ceph-devel@vger.kernel.org Cc: idryomov@redhat.com, jlayton@redhat.com, "Yan, Zheng" Subject: [PATCH 9/9] ceph: auto reconnect after blacklisted Date: Wed, 3 Jul 2019 20:44:42 +0800 Message-Id: <20190703124442.6614-10-zyan@redhat.com> In-Reply-To: <20190703124442.6614-1-zyan@redhat.com> References: <20190703124442.6614-1-zyan@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.26]); Wed, 03 Jul 2019 12:45:28 +0000 (UTC) Sender: ceph-devel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Make client use osd reply and session message to infer if itself is blacklisted. Client reconnect to cluster using new entity addr if it is blacklisted. Auto reconnect is controlled by recover_session= mount option. Clean mode is enabled by default. In this mode, client drops dirty date and dirty metadata, All writable file handles are invalidated. Read-only file handles continue to work and caches are dropped if necessary. Signed-off-by: "Yan, Zheng" --- fs/ceph/addr.c | 15 +++++++++++---- fs/ceph/file.c | 8 +++++++- fs/ceph/mds_client.c | 37 +++++++++++++++++++++++++++++++++++-- fs/ceph/super.c | 29 +++++++++++++++++++++++++++++ fs/ceph/super.h | 8 +++++++- 5 files changed, 89 insertions(+), 8 deletions(-) diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index 9f357c5ce84d..e3b77aa6b5f1 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -189,8 +189,7 @@ static int ceph_do_readpage(struct file *filp, struct page *page) { struct inode *inode = file_inode(filp); struct ceph_inode_info *ci = ceph_inode(inode); - struct ceph_osd_client *osdc = - &ceph_inode_to_client(inode)->client->osdc; + struct ceph_fs_client *fsc = ceph_inode_to_client(inode); int err = 0; u64 off = page_offset(page); u64 len = PAGE_SIZE; @@ -219,8 +218,8 @@ static int ceph_do_readpage(struct file *filp, struct page *page) dout("readpage inode %p file %p page %p index %lu\n", inode, filp, page, page->index); - err = ceph_osdc_readpages(osdc, ceph_vino(inode), &ci->i_layout, - off, &len, + err = ceph_osdc_readpages(&fsc->client->osdc, ceph_vino(inode), + &ci->i_layout, off, &len, ci->i_truncate_seq, ci->i_truncate_size, &page, 1, 0); if (err == -ENOENT) @@ -228,6 +227,8 @@ static int ceph_do_readpage(struct file *filp, struct page *page) if (err < 0) { SetPageError(page); ceph_fscache_readpage_cancel(inode, page); + if (err == -EBLACKLISTED) + fsc->blacklisted = 1; goto out; } if (err < PAGE_SIZE) @@ -266,6 +267,8 @@ static void finish_read(struct ceph_osd_request *req) int i; dout("finish_read %p req %p rc %d bytes %d\n", inode, req, rc, bytes); + if (rc == -EBLACKLISTED) + ceph_inode_to_client(inode)->blacklisted = 1; /* unlock all pages, zeroing any data we didn't read */ osd_data = osd_req_op_extent_osd_data(req, 0); @@ -641,6 +644,8 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc) end_page_writeback(page); return err; } + if (err == -EBLACKLISTED) + fsc->blacklisted = 1; dout("writepage setting page/mapping error %d %p\n", err, page); SetPageError(page); @@ -721,6 +726,8 @@ static void writepages_finish(struct ceph_osd_request *req) if (rc < 0) { mapping_set_error(mapping, rc); ceph_set_error_write(ci); + if (rc == -EBLACKLISTED) + fsc->blacklisted = 1; } else { ceph_clear_error_write(ci); } diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 77d3ef903cce..fcfae03bf9cf 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -698,7 +698,13 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to, ceph_release_page_vector(pages, num_pages); } - if (ret <= 0 || off >= i_size || !more) + if (ret < 0) { + if (ret == -EBLACKLISTED) + fsc->blacklisted = 1; + break; + } + + if (off >= i_size || !more) break; } diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index 59172e63a61f..ba8909691ef8 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -3032,18 +3032,23 @@ static void handle_forward(struct ceph_mds_client *mdsc, pr_err("mdsc_handle_forward decode error err=%d\n", err); } -static int __decode_and_drop_session_metadata(void **p, void *end) +static int __decode_session_metadata(void **p, void *end, + bool *blacklisted) { /* map */ u32 n; + bool err_str; ceph_decode_32_safe(p, end, n, bad); while (n-- > 0) { u32 len; ceph_decode_32_safe(p, end, len, bad); ceph_decode_need(p, end, len, bad); + err_str = !strncmp(*p, "error_string", len); *p += len; ceph_decode_32_safe(p, end, len, bad); ceph_decode_need(p, end, len, bad); + if (err_str && strnstr(*p, "blacklisted", len)) + *blacklisted = true; *p += len; } return 0; @@ -3067,6 +3072,7 @@ static void handle_session(struct ceph_mds_session *session, u64 seq; unsigned long features = 0; int wake = 0; + bool blacklisted = false; /* decode */ ceph_decode_need(&p, end, sizeof(*h), bad); @@ -3079,7 +3085,7 @@ static void handle_session(struct ceph_mds_session *session, if (msg_version >= 3) { u32 len; /* version >= 2, metadata */ - if (__decode_and_drop_session_metadata(&p, end) < 0) + if (__decode_session_metadata(&p, end, &blacklisted) < 0) goto bad; /* version >= 3, feature bits */ ceph_decode_32_safe(&p, end, len, bad); @@ -3166,6 +3172,8 @@ static void handle_session(struct ceph_mds_session *session, session->s_state = CEPH_MDS_SESSION_REJECTED; cleanup_session_requests(mdsc, session); remove_session_caps(session); + if (blacklisted) + mdsc->fsc->blacklisted = 1; wake = 2; /* for good measure */ break; @@ -4015,7 +4023,30 @@ static void lock_unlock_sessions(struct ceph_mds_client *mdsc) mutex_unlock(&mdsc->mutex); } +void maybe_recover_session(struct ceph_mds_client *mdsc) +{ + struct ceph_fs_client *fsc = mdsc->fsc; + char option[32] = "force_reconnect"; + + if (!ceph_test_mount_opt(fsc, CLEANRECOVER) && + !ceph_test_mount_opt(fsc, BRUTERECOVER)) + return; + + if (READ_ONCE(fsc->mount_state) == CEPH_MOUNT_SHUTDOWN) + return; + if (!READ_ONCE(fsc->blacklisted)) + return; + + if (fsc->last_force_reconnect && + time_before(jiffies, fsc->last_force_reconnect + HZ * 60 * 30)) + return; + + pr_info("auto reconnect after blacklisted\n"); + fsc->last_force_reconnect = jiffies; + fsc->sb->s_op->remount_fs(fsc->sb, NULL, option); + fsc->blacklisted = 0; +} /* * delayed work -- periodically trim expired leases, renew caps with mds @@ -4089,6 +4120,8 @@ static void delayed_work(struct work_struct *work) ceph_trim_snapid_map(mdsc); + maybe_recover_session(mdsc); + schedule_delayed(mdsc); } diff --git a/fs/ceph/super.c b/fs/ceph/super.c index a00597602810..02c96dcfbf83 100644 --- a/fs/ceph/super.c +++ b/fs/ceph/super.c @@ -143,6 +143,7 @@ enum { Opt_snapdirname, Opt_mds_namespace, Opt_fscache_uniq, + Opt_recover_session, Opt_last_string, /* string args above */ Opt_dirstat, @@ -185,6 +186,7 @@ static match_table_t fsopt_tokens = { /* int args above */ {Opt_snapdirname, "snapdirname=%s"}, {Opt_mds_namespace, "mds_namespace=%s"}, + {Opt_recover_session, "recover_session=%s"}, {Opt_fscache_uniq, "fsc=%s"}, /* string args above */ {Opt_dirstat, "dirstat"}, @@ -256,6 +258,25 @@ static int parse_fsopt_token(char *c, void *private) if (!fsopt->mds_namespace) return -ENOMEM; break; + case Opt_recover_session: + if (!strncmp(argstr[0].from, "no", + argstr[0].to-argstr[0].from)) { + fsopt->flags &= ~(CEPH_MOUNT_OPT_CLEANRECOVER | + CEPH_MOUNT_OPT_BRUTERECOVER); + } else if (!strncmp(argstr[0].from, "clean", + argstr[0].to-argstr[0].from)) { + fsopt->flags &= ~CEPH_MOUNT_OPT_BRUTERECOVER; + fsopt->flags |= CEPH_MOUNT_OPT_CLEANRECOVER; + /* not implemented yet + } else if (!strncmp(argstr[0].from, "brute", + argstr[0].to-argstr[0].from)) { + fsopt->flags &= ~CEPH_MOUNT_OPT_CLEANRECOVER; + fsopt->flags |= CEPH_MOUNT_OPT_BRUTERECOVER; + */ + } else { + return -EINVAL; + } + break; case Opt_fscache_uniq: kfree(fsopt->fscache_uniq); fsopt->fscache_uniq = kstrndup(argstr[0].from, @@ -581,6 +602,14 @@ static int ceph_show_options(struct seq_file *m, struct dentry *root) if (fsopt->mds_namespace) seq_show_option(m, "mds_namespace", fsopt->mds_namespace); + + if (fsopt->flags & CEPH_MOUNT_OPT_CLEANRECOVER) + seq_show_option(m, "recover_session", "clean"); + else if (fsopt->flags & CEPH_MOUNT_OPT_BRUTERECOVER) + seq_show_option(m, "recover_session", "brute"); + else + seq_show_option(m, "recover_session", "no"); + if (fsopt->wsize != CEPH_MAX_WRITE_SIZE) seq_printf(m, ",wsize=%d", fsopt->wsize); if (fsopt->rsize != CEPH_MAX_READ_SIZE) diff --git a/fs/ceph/super.h b/fs/ceph/super.h index 54039f7fb510..b2b21f32cb33 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -32,6 +32,8 @@ #define CEPH_BLOCK (1 << CEPH_BLOCK_SHIFT) #define CEPH_MOUNT_OPT_FORCERECONNCT (1<<0) /* force reconnect, remount only */ +#define CEPH_MOUNT_OPT_CLEANRECOVER (1<<1) +#define CEPH_MOUNT_OPT_BRUTERECOVER (1<<2) #define CEPH_MOUNT_OPT_DIRSTAT (1<<4) /* `cat dirname` for stats */ #define CEPH_MOUNT_OPT_RBYTES (1<<5) /* dir st_bytes = rbytes */ #define CEPH_MOUNT_OPT_NOASYNCREADDIR (1<<7) /* no dcache readdir */ @@ -45,7 +47,8 @@ #define CEPH_MOUNT_OPT_DEFAULT \ (CEPH_MOUNT_OPT_DCACHE | \ - CEPH_MOUNT_OPT_NOCOPYFROM) + CEPH_MOUNT_OPT_NOCOPYFROM | \ + CEPH_MOUNT_OPT_CLEANRECOVER) #define ceph_set_mount_opt(fsc, opt) \ (fsc)->mount_options->flags |= CEPH_MOUNT_OPT_##opt; @@ -103,6 +106,9 @@ struct ceph_fs_client { unsigned long mount_state; + unsigned long last_force_reconnect; + int blacklisted; + u32 filp_gen; loff_t max_file_size;