From patchwork Wed Jan 24 10:53:00 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhi Zhang
X-Patchwork-Id: 10182109
From: Zhi Zhang
To: ceph-devel@vger.kernel.org, zyan@redhat.com
Cc: Zhi Zhang
Subject: [PATCH] ceph: try to allocate enough memory for reserved caps
Date: Wed, 24 Jan 2018 18:53:00 +0800
Message-Id: <1516791180-16881-1-git-send-email-zhang.david2011@gmail.com>
X-Mailer: git-send-email 1.8.3.1
X-Mailing-List: ceph-devel@vger.kernel.org

ceph_reserve_caps() may fail to reserve enough caps under high memory
pressure, yet it still records the requested number of caps as if all of
them had been reserved. When the caps are later taken from the
reservation, the count mismatch causes a crash.

Now, when allocating memory for the caps to be reserved fails, try to
trim caps to free memory and then retry the allocation. If the
allocation still fails, return -ENOMEM.

Signed-off-by: Zhi Zhang
---
 fs/ceph/caps.c       | 62 +++++++++++++++++++++++++++++++++++++++++++++-------
 fs/ceph/mds_client.c | 24 ++++++++++++++------
 fs/ceph/mds_client.h |  3 +++
 fs/ceph/super.h      |  2 +-
 4 files changed, 75 insertions(+), 16 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index a14b2c9..c25941b 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -154,13 +154,19 @@ void ceph_adjust_min_caps(struct ceph_mds_client *mdsc, int delta)
 	spin_unlock(&mdsc->caps_list_lock);
 }
 
-void ceph_reserve_caps(struct ceph_mds_client *mdsc,
+/*
+ * Called under mdsc->mutex.
+ */
+int ceph_reserve_caps(struct ceph_mds_client *mdsc,
 		      struct ceph_cap_reservation *ctx, int need)
 {
-	int i;
+	int i, j;
 	struct ceph_cap *cap;
 	int have;
 	int alloc = 0;
+	int max_caps;
+	bool trimmed = false;
+	struct ceph_mds_session *s;
 	LIST_HEAD(newcaps);
 
 	dout("reserve caps ctx=%p need=%d\n", ctx, need);
@@ -179,16 +185,38 @@ void ceph_reserve_caps(struct ceph_mds_client *mdsc,
 	spin_unlock(&mdsc->caps_list_lock);
 
 	for (i = have; i < need; i++) {
+retry:
 		cap = kmem_cache_alloc(ceph_cap_cachep, GFP_NOFS);
-		if (!cap)
-			break;
+		if (!cap) {
+			if (!trimmed) {
+				for (j = 0; j < mdsc->max_sessions; j++) {
+					s = __ceph_lookup_mds_session(mdsc, j);
+					if (!s)
+						continue;
+					mutex_unlock(&mdsc->mutex);
+
+					// trim needed caps to free memory
+					mutex_lock(&s->s_mutex);
+					max_caps = s->s_nr_caps - (need - i);
+					ceph_trim_caps(mdsc, s, max_caps);
+					mutex_unlock(&s->s_mutex);
+
+					ceph_put_mds_session(s);
+					mutex_lock(&mdsc->mutex);
+				}
+				trimmed = true;
+				goto retry;
+			} else {
+				pr_warn("reserve caps ctx=%p ENOMEM "
+					"need=%d got=%d\n",
+					ctx, need, have + alloc);
+				goto out_nomem;
+			}
+		}
 		list_add(&cap->caps_item, &newcaps);
 		alloc++;
 	}
-	/* we didn't manage to reserve as much as we needed */
-	if (have + alloc != need)
-		pr_warn("reserve caps ctx=%p ENOMEM need=%d got=%d\n",
-			ctx, need, have + alloc);
+	BUG_ON(have + alloc != need);
 
 	spin_lock(&mdsc->caps_list_lock);
 	mdsc->caps_total_count += alloc;
@@ -204,6 +232,24 @@ void ceph_reserve_caps(struct ceph_mds_client *mdsc,
 	dout("reserve caps ctx=%p %d = %d used + %d resv + %d avail\n",
 	     ctx, mdsc->caps_total_count, mdsc->caps_use_count,
 	     mdsc->caps_reserve_count, mdsc->caps_avail_count);
+	return 0;
+
+out_nomem:
+	while (!list_empty(&newcaps)) {
+		cap = list_first_entry(&newcaps,
+				struct ceph_cap, caps_item);
+		list_del(&cap->caps_item);
+		kmem_cache_free(ceph_cap_cachep, cap);
+	}
+
+	spin_lock(&mdsc->caps_list_lock);
+	mdsc->caps_avail_count += have;
+	mdsc->caps_reserve_count -= have;
+	BUG_ON(mdsc->caps_total_count != mdsc->caps_use_count +
+					 mdsc->caps_reserve_count +
+					 mdsc->caps_avail_count);
+	spin_unlock(&mdsc->caps_list_lock);
+	return -ENOMEM;
 }
 
 int ceph_unreserve_caps(struct ceph_mds_client *mdsc,
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 1b46825..8d74472 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -604,10 +604,20 @@ static void __register_request(struct ceph_mds_client *mdsc,
 			       struct ceph_mds_request *req,
 			       struct inode *dir)
 {
+	int ret = 0;
+
 	req->r_tid = ++mdsc->last_tid;
-	if (req->r_num_caps)
-		ceph_reserve_caps(mdsc, &req->r_caps_reservation,
-				  req->r_num_caps);
+	if (req->r_num_caps) {
+		ret = ceph_reserve_caps(mdsc, &req->r_caps_reservation,
+					req->r_num_caps);
+		if (ret) {
+			pr_err("__register_request %p "
+			       "failed to reserve caps: %d\n", req, ret);
+			// set req->r_err to fail early from __do_request
+			req->r_err = ret;
+			return;
+		}
+	}
 	dout("__register_request %p tid %lld\n", req, req->r_tid);
 	ceph_mdsc_get_request(req);
 	insert_request(&mdsc->request_tree, req);
@@ -1545,9 +1555,9 @@ static int trim_caps_cb(struct inode *inode, struct ceph_cap *cap, void *arg)
 /*
  * Trim session cap count down to some max number.
  */
-static int trim_caps(struct ceph_mds_client *mdsc,
-		     struct ceph_mds_session *session,
-		     int max_caps)
+int ceph_trim_caps(struct ceph_mds_client *mdsc,
+		   struct ceph_mds_session *session,
+		   int max_caps)
 {
 	int trim_caps = session->s_nr_caps - max_caps;
 
@@ -2773,7 +2783,7 @@ static void handle_session(struct ceph_mds_session *session,
 		break;
 
 	case CEPH_SESSION_RECALL_STATE:
-		trim_caps(mdsc, session, le32_to_cpu(h->max_caps));
+		ceph_trim_caps(mdsc, session, le32_to_cpu(h->max_caps));
 		break;
 
 	case CEPH_SESSION_FLUSHMSG:
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 837ac4b..71e3b78 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -444,4 +444,7 @@ extern void ceph_mdsc_handle_fsmap(struct ceph_mds_client *mdsc,
 extern void ceph_mdsc_open_export_target_sessions(struct ceph_mds_client *mdsc,
 					  struct ceph_mds_session *session);
 
+extern int ceph_trim_caps(struct ceph_mds_client *mdsc,
+			  struct ceph_mds_session *session,
+			  int max_caps);
 #endif
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 2beeec0..e5fee4f 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -648,7 +648,7 @@ static inline int __ceph_caps_wanted(struct ceph_inode_info *ci)
 extern void ceph_caps_init(struct ceph_mds_client *mdsc);
 extern void ceph_caps_finalize(struct ceph_mds_client *mdsc);
 extern void ceph_adjust_min_caps(struct ceph_mds_client *mdsc, int delta);
-extern void ceph_reserve_caps(struct ceph_mds_client *mdsc,
+extern int ceph_reserve_caps(struct ceph_mds_client *mdsc,
 			      struct ceph_cap_reservation *ctx, int need);
 extern int ceph_unreserve_caps(struct ceph_mds_client *mdsc,
 			       struct ceph_cap_reservation *ctx);
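
For readers who want the control flow without the locking details, the
reservation logic in the commit message (allocate, reclaim once on
failure, retry, and roll back with -ENOMEM if the retry also fails) can
be sketched in user space roughly as below. This is only an illustration
under stated assumptions: struct item, try_alloc(), reclaim(),
alloc_budget and reclaim_amount are hypothetical stand-ins, not kernel
code, and malloc() replaces kmem_cache_alloc().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reserved object (not struct ceph_cap). */
struct item {
	struct item *next;
};

/* Simulation knobs (assumptions): how many allocations may still succeed,
 * and how many more become possible after one reclaim pass. */
static int alloc_budget;
static int reclaim_amount;

/* Allocate one item, failing once the simulated budget is exhausted. */
static struct item *try_alloc(void)
{
	if (alloc_budget <= 0)
		return NULL;	/* simulated memory pressure */
	alloc_budget--;
	return malloc(sizeof(struct item));
}

/* Stand-in for trimming caps: makes some further allocations possible. */
static void reclaim(void)
{
	alloc_budget += reclaim_amount;
	reclaim_amount = 0;
}

/* Free every item on the list. */
static void release_items(struct item *head)
{
	while (head) {
		struct item *it = head;

		head = head->next;
		free(it);
	}
}

/* Count the items on the list. */
static int count_items(const struct item *head)
{
	int n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}

/*
 * Reserve exactly `need` items. On success, return 0 and store the list
 * in *out. On failure, free any partial allocation, set *out to NULL and
 * return -ENOMEM, so the caller never sees a short reservation.
 */
static int reserve_items(int need, struct item **out)
{
	struct item *head = NULL;
	bool reclaimed = false;
	int got = 0;

	while (got < need) {
		struct item *it = try_alloc();

		if (!it) {
			if (reclaimed) {
				/* Second failure: roll back and report it. */
				release_items(head);
				*out = NULL;
				return -ENOMEM;
			}
			/* First failure: reclaim once, then retry. */
			reclaim();
			reclaimed = true;
			continue;
		}
		it->next = head;
		head = it;
		got++;
	}
	assert(got == need);	/* plays the role of the BUG_ON() check */
	*out = head;
	return 0;
}
```

The point of the sketch is the invariant: the caller gets either the
full reservation or an error with nothing held, which is what lets the
patch replace the old "got fewer than needed" warning with BUG_ON().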