From patchwork Thu Aug 1 02:17:29 2019
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 11070081
From: Dave Chinner <david@fromorbit.com>
To: linux-xfs@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 01/24] mm: directed shrinker work deferral
Date: Thu, 1 Aug 2019 12:17:29 +1000
Message-Id: <20190801021752.4986-2-david@fromorbit.com>
In-Reply-To:
<20190801021752.4986-1-david@fromorbit.com>
References: <20190801021752.4986-1-david@fromorbit.com>

From: Dave Chinner

Introduce a mechanism for ->count_objects() to indicate to the shrinker
infrastructure that the reclaim context will not allow scanning work to
be done and so the work it decides is necessary needs to be deferred.

This simplifies the code by separating out the accounting of deferred
work from the actual doing of the work, and allows better decisions to
be made by the shrinker control logic on what action it can take.

Signed-off-by: Dave Chinner
---
 include/linux/shrinker.h | 7 +++++++
 mm/vmscan.c              | 8 ++++++++
 2 files changed, 15 insertions(+)

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 9443cafd1969..af78c475fc32 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -31,6 +31,13 @@ struct shrink_control {

 	/* current memcg being shrunk (for memcg aware shrinkers) */
 	struct mem_cgroup *memcg;
+
+	/*
+	 * set by ->count_objects if reclaim context prevents reclaim from
+	 * occurring. This allows the shrinker to immediately defer all the
+	 * work and not even attempt to scan the cache.
+	 */
+	bool will_defer;
 };

 #define SHRINK_STOP (~0UL)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 44df66a98f2a..ae3035fe94bc 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -541,6 +541,13 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 	trace_mm_shrink_slab_start(shrinker, shrinkctl, nr, freeable, delta,
 				   total_scan, priority);

+	/*
+	 * If the shrinker can't run (e.g. due to gfp_mask constraints), then
+	 * defer the work to a context that can scan the cache.
+	 */
+	if (shrinkctl->will_defer)
+		goto done;
+
 	/*
 	 * Normally, we should not scan less than batch_size objects in one
 	 * pass to avoid too frequent shrinker calls, but if the slab has less
@@ -575,6 +582,7 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 		cond_resched();
 	}

+done:
 	if (next_deferred >= scanned)
 		next_deferred -= scanned;
 	else

From patchwork Thu Aug 1 02:17:30 2019
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 11070051
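The deferral mechanism in the patch above can be illustrated with a minimal user-space sketch. The struct, flag value and function below are mocks for illustration only, not the kernel's types: a `->count_objects()` implementation still reports how many objects are freeable, but flags deferral instead of aborting when the reclaim context cannot scan.

```c
#include <stdbool.h>

/* Mock flag value; the real __GFP_FS constant lives in the kernel. */
#define DEMO_GFP_FS 0x1u

/* Trimmed-down stand-in for struct shrink_control with the new field. */
struct demo_shrink_control {
	unsigned int gfp_mask;
	bool will_defer;	/* set by ->count_objects to defer the scan */
};

/*
 * Pattern from the patch: the count side reports the freeable count as
 * usual, but marks the work for deferral when the context (e.g. a
 * GFP_NOFS allocation) cannot safely run the scan.
 */
unsigned long demo_shrink_count(struct demo_shrink_control *sc,
				unsigned long freeable)
{
	if (!(sc->gfp_mask & DEMO_GFP_FS))
		sc->will_defer = true;
	return freeable;
}
```

The scan side then checks the flag and skips straight to the deferred-work accounting, which is what the `goto done;` added to do_shrink_slab() does.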
From: Dave Chinner <david@fromorbit.com>
To: linux-xfs@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 02/24] shrinkers: use will_defer for GFP_NOFS sensitive shrinkers
Date: Thu, 1 Aug 2019 12:17:30 +1000
Message-Id: <20190801021752.4986-3-david@fromorbit.com>
In-Reply-To: <20190801021752.4986-1-david@fromorbit.com>
References: <20190801021752.4986-1-david@fromorbit.com>

From: Dave Chinner

For shrinkers that currently avoid scanning when called under GFP_NOFS
contexts, convert
them to use the new ->will_defer flag rather than
checking and returning errors during scans. This makes it very clear
that these shrinkers are not doing any work because of the context
limitations, not because there is no work that can be done.

Signed-off-by: Dave Chinner
---
 drivers/staging/android/ashmem.c |  8 ++++----
 fs/gfs2/glock.c                  |  5 +++--
 fs/gfs2/quota.c                  |  6 +++---
 fs/nfs/dir.c                     |  6 +++---
 fs/super.c                       |  6 +++---
 fs/xfs/xfs_buf.c                 |  4 ++++
 fs/xfs/xfs_qm.c                  | 11 ++++++++---
 net/sunrpc/auth.c                |  5 ++---
 8 files changed, 30 insertions(+), 21 deletions(-)

diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
index 74d497d39c5a..fd9027dbd28c 100644
--- a/drivers/staging/android/ashmem.c
+++ b/drivers/staging/android/ashmem.c
@@ -438,10 +438,6 @@ ashmem_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 {
 	unsigned long freed = 0;

-	/* We might recurse into filesystem code, so bail out if necessary */
-	if (!(sc->gfp_mask & __GFP_FS))
-		return SHRINK_STOP;
-
 	if (!mutex_trylock(&ashmem_mutex))
 		return -1;

@@ -478,6 +474,10 @@ ashmem_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 static unsigned long
 ashmem_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
 {
+	/* We might recurse into filesystem code, so bail out if necessary */
+	if (!(sc->gfp_mask & __GFP_FS))
+		sc->will_defer = true;
+
 	/*
 	 * note that lru_count is count of pages on the lru, not a count of
 	 * objects on the list.
 	 * This means the scan function needs to return the

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index e23fb8b7b020..08c95172d0e5 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1517,14 +1517,15 @@ static long gfs2_scan_glock_lru(int nr)
 static unsigned long gfs2_glock_shrink_scan(struct shrinker *shrink,
 					    struct shrink_control *sc)
 {
-	if (!(sc->gfp_mask & __GFP_FS))
-		return SHRINK_STOP;
 	return gfs2_scan_glock_lru(sc->nr_to_scan);
 }

 static unsigned long gfs2_glock_shrink_count(struct shrinker *shrink,
 					     struct shrink_control *sc)
 {
+	if (!(sc->gfp_mask & __GFP_FS))
+		sc->will_defer = true;
+
 	return vfs_pressure_ratio(atomic_read(&lru_count));
 }

diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 69c4b77f127b..d35beda906e8 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -166,9 +166,6 @@ static unsigned long gfs2_qd_shrink_scan(struct shrinker *shrink,
 	LIST_HEAD(dispose);
 	unsigned long freed;

-	if (!(sc->gfp_mask & __GFP_FS))
-		return SHRINK_STOP;
-
 	freed = list_lru_shrink_walk(&gfs2_qd_lru, sc,
 				     gfs2_qd_isolate, &dispose);

@@ -180,6 +177,9 @@ static unsigned long gfs2_qd_shrink_scan(struct shrinker *shrink,
 static unsigned long gfs2_qd_shrink_count(struct shrinker *shrink,
 					  struct shrink_control *sc)
 {
+	if (!(sc->gfp_mask & __GFP_FS))
+		sc->will_defer = true;
+
 	return vfs_pressure_ratio(list_lru_shrink_count(&gfs2_qd_lru, sc));
 }

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 8d501093660f..73735ab1d623 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2202,10 +2202,7 @@ unsigned long
 nfs_access_cache_scan(struct shrinker *shrink, struct shrink_control *sc)
 {
 	int nr_to_scan = sc->nr_to_scan;
-	gfp_t gfp_mask = sc->gfp_mask;

-	if ((gfp_mask & GFP_KERNEL) != GFP_KERNEL)
-		return SHRINK_STOP;
 	return nfs_do_access_cache_scan(nr_to_scan);
 }

@@ -2213,6 +2210,9 @@ nfs_access_cache_scan(struct shrinker *shrink, struct shrink_control *sc)
 unsigned long
 nfs_access_cache_count(struct shrinker *shrink, struct shrink_control *sc)
 {
+	if ((sc->gfp_mask & GFP_KERNEL) !=
+	    GFP_KERNEL)
+		sc->will_defer = true;
+
 	return vfs_pressure_ratio(atomic_long_read(&nfs_access_nr_entries));
 }

diff --git a/fs/super.c b/fs/super.c
index 113c58f19425..66dd2af6cfde 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -73,9 +73,6 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
 	 * Deadlock avoidance. We may hold various FS locks, and we don't want
 	 * to recurse into the FS that called us in clear_inode() and friends..
 	 */
-	if (!(sc->gfp_mask & __GFP_FS))
-		return SHRINK_STOP;
-
 	if (!trylock_super(sb))
 		return SHRINK_STOP;

@@ -140,6 +137,9 @@ static unsigned long super_cache_count(struct shrinker *shrink,
 		return 0;
 	smp_rmb();

+	if (!(sc->gfp_mask & __GFP_FS))
+		sc->will_defer = true;
+
 	if (sb->s_op && sb->s_op->nr_cached_objects)
 		total_objects = sb->s_op->nr_cached_objects(sb, sc);

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index ca0849043f54..6e0f76532535 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1680,6 +1680,10 @@ xfs_buftarg_shrink_count(
 {
 	struct xfs_buftarg	*btp = container_of(shrink,
 					struct xfs_buftarg, bt_shrinker);
+
+	if (!(sc->gfp_mask & __GFP_FS))
+		sc->will_defer = true;
+
 	return list_lru_shrink_count(&btp->bt_lru, sc);
 }

diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 5e7a37f0cf84..13c842e8f13b 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -502,9 +502,6 @@ xfs_qm_shrink_scan(
 	unsigned long	freed;
 	int		error;

-	if ((sc->gfp_mask & (__GFP_FS|__GFP_DIRECT_RECLAIM)) != (__GFP_FS|__GFP_DIRECT_RECLAIM))
-		return 0;
-
 	INIT_LIST_HEAD(&isol.buffers);
 	INIT_LIST_HEAD(&isol.dispose);

@@ -534,6 +531,14 @@ xfs_qm_shrink_count(
 	struct xfs_quotainfo	*qi = container_of(shrink,
 					struct xfs_quotainfo, qi_shrinker);

+	/*
+	 * __GFP_DIRECT_RECLAIM is used here to avoid blocking kswapd
+	 */
+	if ((sc->gfp_mask & (__GFP_FS|__GFP_DIRECT_RECLAIM)) !=
+			(__GFP_FS|__GFP_DIRECT_RECLAIM)) {
+		sc->will_defer = true;
+	}
+
 	return list_lru_shrink_count(&qi->qi_lru, sc);
 }

diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
index
cdb05b48de44..6babcbac4a00 100644
--- a/net/sunrpc/auth.c
+++ b/net/sunrpc/auth.c
@@ -527,9 +527,6 @@ static unsigned long
 rpcauth_cache_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 {
-	if ((sc->gfp_mask & GFP_KERNEL) != GFP_KERNEL)
-		return SHRINK_STOP;
-
 	/* nothing left, don't come back */
 	if (list_empty(&cred_unused))
 		return SHRINK_STOP;

@@ -541,6 +538,8 @@ static unsigned long
 rpcauth_cache_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
 {
+	if ((sc->gfp_mask & GFP_KERNEL) != GFP_KERNEL)
+		sc->will_defer = true;
 	return number_cred_unused * sysctl_vfs_cache_pressure / 100;
 }

From patchwork Thu Aug 1 02:17:31 2019
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 11070047
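The conversions above move the context check from the scan side to the count side, so blocked work is accounted rather than silently dropped. The following is a compact user-space model of that bookkeeping (simplified arithmetic and hypothetical numbers; it omits the kernel's clamping and batching): deferred work accumulates across calls that cannot scan and is drained by the next call that can.

```c
#include <stdbool.h>

/* Simplified model of do_shrink_slab()'s deferral bookkeeping. */
struct demo_shrinker {
	long nr_deferred;	/* work carried over from deferred calls */
};

/*
 * One shrinker invocation: "delta" is the work computed for this pass.
 * If the context can't scan, all outstanding work (old + new) is
 * deferred; otherwise the whole backlog is scanned and drains to zero.
 */
long demo_shrink(struct demo_shrinker *s, long delta, bool will_defer)
{
	long total_scan = s->nr_deferred + delta;

	if (will_defer) {
		s->nr_deferred = total_scan;	/* defer all the work */
		return 0;			/* nothing scanned */
	}
	s->nr_deferred = 0;
	return total_scan;			/* objects scanned */
}
```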
From: Dave Chinner <david@fromorbit.com>
To: linux-xfs@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 03/24] mm: factor shrinker work calculations
Date: Thu, 1 Aug 2019 12:17:31 +1000
Message-Id: <20190801021752.4986-4-david@fromorbit.com>
In-Reply-To: <20190801021752.4986-1-david@fromorbit.com>
References: <20190801021752.4986-1-david@fromorbit.com>

From: Dave Chinner

Start to clean up the shrinker code by factoring out the
calculation that determines how much work to do. This separates the
calculation from clamping and other adjustments that are done before the
shrinker work is run. Also convert the calculation for the amount of
work to be done to use 64 bit logic so we don't have to keep jumping
through hoops to keep calculations within 32 bits on 32 bit systems.

Signed-off-by: Dave Chinner
---
 mm/vmscan.c | 74 ++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 47 insertions(+), 27 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index ae3035fe94bc..b7472953b0e6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -464,13 +464,45 @@ EXPORT_SYMBOL(unregister_shrinker);

 #define SHRINK_BATCH 128

+/*
+ * Calculate the number of new objects to scan this time around. Return
+ * the work to be done. If there are freeable objects, return that number in
+ * @freeable_objects.
+ */
+static int64_t shrink_scan_count(struct shrink_control *shrinkctl,
+			    struct shrinker *shrinker, int priority,
+			    int64_t *freeable_objects)
+{
+	uint64_t delta;
+	uint64_t freeable;
+
+	freeable = shrinker->count_objects(shrinker, shrinkctl);
+	if (freeable == 0 || freeable == SHRINK_EMPTY)
+		return freeable;
+
+	if (shrinker->seeks) {
+		delta = freeable >> (priority - 2);
+		do_div(delta, shrinker->seeks);
+	} else {
+		/*
+		 * These objects don't require any IO to create. Trim
+		 * them aggressively under memory pressure to keep
+		 * them from causing refetches in the IO caches.
+		 */
+		delta = freeable / 2;
+	}
+
+	*freeable_objects = freeable;
+	return delta > 0 ?
 delta : 0;
+}
+
 static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 				    struct shrinker *shrinker, int priority)
 {
 	unsigned long freed = 0;
-	unsigned long long delta;
 	long total_scan;
-	long freeable;
+	int64_t freeable_objects = 0;
+	int64_t scan_count;
 	long nr;
 	long new_nr;
 	int nid = shrinkctl->nid;
@@ -481,9 +513,10 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 	if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
 		nid = 0;

-	freeable = shrinker->count_objects(shrinker, shrinkctl);
-	if (freeable == 0 || freeable == SHRINK_EMPTY)
-		return freeable;
+	scan_count = shrink_scan_count(shrinkctl, shrinker, priority,
+					&freeable_objects);
+	if (scan_count == 0 || scan_count == SHRINK_EMPTY)
+		return scan_count;

 	/*
 	 * copy the current shrinker scan count into a local variable
@@ -492,25 +525,11 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 	 */
 	nr = atomic_long_xchg(&shrinker->nr_deferred[nid], 0);

-	total_scan = nr;
-	if (shrinker->seeks) {
-		delta = freeable >> priority;
-		delta *= 4;
-		do_div(delta, shrinker->seeks);
-	} else {
-		/*
-		 * These objects don't require any IO to create. Trim
-		 * them aggressively under memory pressure to keep
-		 * them from causing refetches in the IO caches.
-		 */
-		delta = freeable / 2;
-	}
-
-	total_scan += delta;
+	total_scan = nr + scan_count;
 	if (total_scan < 0) {
 		pr_err("shrink_slab: %pS negative objects to delete nr=%ld\n",
 		       shrinker->scan_objects, total_scan);
-		total_scan = freeable;
+		total_scan = scan_count;
 		next_deferred = nr;
 	} else
 		next_deferred = total_scan;
@@ -527,19 +546,20 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 	 * Hence only allow the shrinker to scan the entire cache when
 	 * a large delta change is calculated directly.
 	 */
-	if (delta < freeable / 4)
-		total_scan = min(total_scan, freeable / 2);
+	if (scan_count < freeable_objects / 4)
+		total_scan = min_t(long, total_scan, freeable_objects / 2);

 	/*
 	 * Avoid risking looping forever due to too large nr value:
 	 * never try to free more than twice the estimate number of
 	 * freeable entries.
 	 */
-	if (total_scan > freeable * 2)
-		total_scan = freeable * 2;
+	if (total_scan > freeable_objects * 2)
+		total_scan = freeable_objects * 2;

 	trace_mm_shrink_slab_start(shrinker, shrinkctl, nr,
-				   freeable, delta, total_scan, priority);
+				   freeable_objects, scan_count,
+				   total_scan, priority);

 	/*
 	 * If the shrinker can't run (e.g. due to gfp_mask constraints), then
@@ -564,7 +584,7 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 	 * possible.
 	 */
 	while (total_scan >= batch_size ||
-	       total_scan >= freeable) {
+	       total_scan >= freeable_objects) {
 		unsigned long ret;
 		unsigned long nr_to_scan = min(batch_size, total_scan);

From patchwork Thu Aug 1 02:17:32 2019
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 11070079
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 04/24] shrinker: defer work only to kswapd
Date: Thu, 1 Aug 2019 12:17:32 +1000
Message-Id: <20190801021752.4986-5-david@fromorbit.com>
In-Reply-To: <20190801021752.4986-1-david@fromorbit.com>
References: <20190801021752.4986-1-david@fromorbit.com>

Right now deferred work is picked up by whatever GFP_KERNEL
context reclaimer that wins the race to empty the node's deferred work counter. However, if there are lots of direct reclaimers, that work might be continually picked up by contexts that can't do any work and so the opportunities to do the work are missed by contexts that could do them. A further problem with the current code is that the deferred work can be picked up by a random direct reclaimer, resulting in that specific process having to do all the deferred reclaim work and hence it can incur extremely long latencies if the reclaim work blocks regularly. This is not good for direct reclaim fairness or for minimising long tail latency events. To avoid these problems, simply limit deferred work to kswapd contexts. We know kswapd is a context that can always do reclaim work, and hence deferring work to kswapd allows the deferred work to be done in the background and not adversely affect any specific process context doing direct reclaim. The advantage of this is that the amount of work to be done in direct reclaim is now bound and predictable - it is entirely based on the cache's freeable objects and the reclaim priority. Hence all direct reclaimers running at the same time should be doing relatively equal amounts of work, thereby reducing the incidence of long tail latencies due to uneven reclaim workloads. Signed-off-by: Dave Chinner --- mm/vmscan.c | 93 ++++++++++++++++++++++++++++------------------------- 1 file changed, 50 insertions(+), 43 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index b7472953b0e6..c583b4efb9bf 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -500,15 +500,15 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl, struct shrinker *shrinker, int priority) { unsigned long freed = 0; - long total_scan; int64_t freeable_objects = 0; int64_t scan_count; - long nr; + int64_t scanned_objects = 0; + int64_t next_deferred = 0; + int64_t deferred_count = 0; long new_nr; int nid = shrinkctl->nid; long batch_size = shrinker->batch ?
shrinker->batch : SHRINK_BATCH; - long scanned = 0, next_deferred; if (!(shrinker->flags & SHRINKER_NUMA_AWARE)) nid = 0; @@ -519,47 +519,53 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl, return scan_count; /* - * copy the current shrinker scan count into a local variable - * and zero it so that other concurrent shrinker invocations - * don't also do this scanning work. + * If kswapd, we take all the deferred work and do it here. We don't let + * direct reclaim do this, because then it means some poor sod is going + * to have to do somebody else's GFP_NOFS reclaim, and it hides the real + * amount of reclaim work from concurrent kswapd operations. Hence we do + * the work in the wrong place, at the wrong time, and it's largely + * unpredictable. + * + * By doing the deferred work only in kswapd, we can schedule the work + * according to the reclaim priority - low priority reclaim will do + * less deferred work, hence we'll do more of the deferred work the more + * desperate we become for free memory. This avoids the need + * to specifically avoid deferred work windup, as low amounts of memory + * pressure won't excessively trim caches anymore. */ - nr = atomic_long_xchg(&shrinker->nr_deferred[nid], 0); + if (current_is_kswapd()) { int64_t deferred_scan; - total_scan = nr + scan_count; - if (total_scan < 0) { - pr_err("shrink_slab: %pS negative objects to delete nr=%ld\n", - shrinker->scan_objects, total_scan); - total_scan = scan_count; - next_deferred = nr; - } else - next_deferred = total_scan; + deferred_count = atomic64_xchg(&shrinker->nr_deferred[nid], 0); - /* - * We need to avoid excessive windup on filesystem shrinkers - * due to large numbers of GFP_NOFS allocations causing the - * shrinkers to return -1 all the time. This results in a large - * nr being built up so when a shrink that can do some work - * comes along it empties the entire cache due to nr >>> - * freeable.
This is bad for sustaining a working set in - * memory. - * - * Hence only allow the shrinker to scan the entire cache when - * a large delta change is calculated directly. - */ - if (scan_count < freeable_objects / 4) - total_scan = min_t(long, total_scan, freeable_objects / 2); + /* we want to scan 5-10% of the deferred work here at minimum */ + deferred_scan = deferred_count; + if (priority) + do_div(deferred_scan, priority); + scan_count += deferred_scan; + + /* + * If there is more deferred work than the number of freeable + * items in the cache, limit the amount of work we will carry + * over to the next kswapd run on this cache. This prevents + * deferred work windup. + */ + if (deferred_count > freeable_objects * 2) + deferred_count = freeable_objects * 2; + + } /* * Avoid risking looping forever due to too large nr value: * never try to free more than twice the estimate number of * freeable entries. */ - if (total_scan > freeable_objects * 2) - total_scan = freeable_objects * 2; + if (scan_count > freeable_objects * 2) + scan_count = freeable_objects * 2; - trace_mm_shrink_slab_start(shrinker, shrinkctl, nr, + trace_mm_shrink_slab_start(shrinker, shrinkctl, deferred_count, freeable_objects, scan_count, - total_scan, priority); + scan_count, priority); /* * If the shrinker can't run (e.g. due to gfp_mask constraints), then @@ -583,10 +589,10 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl, * scanning at high prio and therefore should try to reclaim as much as * possible. 
*/ - while (total_scan >= batch_size || - total_scan >= freeable_objects) { + while (scan_count >= batch_size || + scan_count >= freeable_objects) { unsigned long ret; - unsigned long nr_to_scan = min(batch_size, total_scan); + unsigned long nr_to_scan = min_t(long, batch_size, scan_count); shrinkctl->nr_to_scan = nr_to_scan; shrinkctl->nr_scanned = nr_to_scan; @@ -596,17 +602,17 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl, freed += ret; count_vm_events(SLABS_SCANNED, shrinkctl->nr_scanned); - total_scan -= shrinkctl->nr_scanned; - scanned += shrinkctl->nr_scanned; + scan_count -= shrinkctl->nr_scanned; + scanned_objects += shrinkctl->nr_scanned; cond_resched(); } done: - if (next_deferred >= scanned) - next_deferred -= scanned; - else - next_deferred = 0; + if (deferred_count) + next_deferred = deferred_count - scanned_objects; + else if (scan_count > 0) + next_deferred = scan_count; /* * move the unused scan count back into the shrinker in a * manner that handles concurrent updates. 
If we exhausted the @@ -618,7 +624,8 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl, else new_nr = atomic_long_read(&shrinker->nr_deferred[nid]); - trace_mm_shrink_slab_end(shrinker, nid, freed, nr, new_nr, total_scan); + trace_mm_shrink_slab_end(shrinker, nid, freed, deferred_count, new_nr, + scan_count); return freed; }
From patchwork Thu Aug 1 02:17:33 2019
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 05/24] shrinker: clean up variable types and tracepoints
Date: Thu, 1 Aug 2019 12:17:33 +1000
Message-Id: <20190801021752.4986-6-david@fromorbit.com>
In-Reply-To: <20190801021752.4986-1-david@fromorbit.com>
References: <20190801021752.4986-1-david@fromorbit.com>

The tracepoint information in the shrinker code doesn't make a lot of sense anymore and contains
redundant information as a result of the changes in the patchset. Refine the information passed to the tracepoints so they expose the operation of the shrinkers more precisely and clean up the remaining code and variables in the shrinker code so it all makes sense. Signed-off-by: Dave Chinner --- include/trace/events/vmscan.h | 69 ++++++++++++++++------------------- mm/vmscan.c | 24 +++++------- 2 files changed, 41 insertions(+), 52 deletions(-) diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h index a5ab2973e8dc..110637d9efa5 100644 --- a/include/trace/events/vmscan.h +++ b/include/trace/events/vmscan.h @@ -184,84 +184,77 @@ DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_memcg_softlimit_re TRACE_EVENT(mm_shrink_slab_start, TP_PROTO(struct shrinker *shr, struct shrink_control *sc, - long nr_objects_to_shrink, unsigned long cache_items, - unsigned long long delta, unsigned long total_scan, - int priority), + int64_t deferred_count, int64_t freeable_objects, + int64_t scan_count, int priority), - TP_ARGS(shr, sc, nr_objects_to_shrink, cache_items, delta, total_scan, + TP_ARGS(shr, sc, deferred_count, freeable_objects, scan_count, priority), TP_STRUCT__entry( __field(struct shrinker *, shr) __field(void *, shrink) __field(int, nid) - __field(long, nr_objects_to_shrink) - __field(gfp_t, gfp_flags) - __field(unsigned long, cache_items) - __field(unsigned long long, delta) - __field(unsigned long, total_scan) + __field(int64_t, deferred_count) + __field(int64_t, freeable_objects) + __field(int64_t, scan_count) __field(int, priority) + __field(gfp_t, gfp_flags) ), TP_fast_assign( __entry->shr = shr; __entry->shrink = shr->scan_objects; __entry->nid = sc->nid; - __entry->nr_objects_to_shrink = nr_objects_to_shrink; - __entry->gfp_flags = sc->gfp_mask; - __entry->cache_items = cache_items; - __entry->delta = delta; - __entry->total_scan = total_scan; + __entry->deferred_count = deferred_count; + __entry->freeable_objects =
freeable_objects; + __entry->scan_count = scan_count; __entry->priority = priority; + __entry->gfp_flags = sc->gfp_mask; ), - TP_printk("%pS %p: nid: %d objects to shrink %ld gfp_flags %s cache items %ld delta %lld total_scan %ld priority %d", + TP_printk("%pS %p: nid: %d scan count %lld freeable items %lld deferred count %lld priority %d gfp_flags %s", __entry->shrink, __entry->shr, __entry->nid, - __entry->nr_objects_to_shrink, - show_gfp_flags(__entry->gfp_flags), - __entry->cache_items, - __entry->delta, - __entry->total_scan, - __entry->priority) + __entry->scan_count, + __entry->freeable_objects, + __entry->deferred_count, + __entry->priority, + show_gfp_flags(__entry->gfp_flags)) ); TRACE_EVENT(mm_shrink_slab_end, - TP_PROTO(struct shrinker *shr, int nid, int shrinker_retval, - long unused_scan_cnt, long new_scan_cnt, long total_scan), + TP_PROTO(struct shrinker *shr, int nid, int64_t freed_objects, + int64_t scanned_objects, int64_t deferred_scan), - TP_ARGS(shr, nid, shrinker_retval, unused_scan_cnt, new_scan_cnt, - total_scan), + TP_ARGS(shr, nid, freed_objects, scanned_objects, + deferred_scan), TP_STRUCT__entry( __field(struct shrinker *, shr) __field(int, nid) __field(void *, shrink) - __field(long, unused_scan) - __field(long, new_scan) - __field(int, retval) - __field(long, total_scan) + __field(long long, freed_objects) + __field(long long, scanned_objects) + __field(long long, deferred_scan) ), TP_fast_assign( __entry->shr = shr; __entry->nid = nid; __entry->shrink = shr->scan_objects; - __entry->unused_scan = unused_scan_cnt; - __entry->new_scan = new_scan_cnt; - __entry->retval = shrinker_retval; - __entry->total_scan = total_scan; + __entry->freed_objects = freed_objects; + __entry->scanned_objects = scanned_objects; + __entry->deferred_scan = deferred_scan; ), - TP_printk("%pS %p: nid: %d unused scan count %ld new scan count %ld total_scan %ld last shrinker return val %d", + TP_printk("%pS %p: nid: %d freed objects %lld scanned objects %lld, 
deferred scan %lld", __entry->shrink, __entry->shr, __entry->nid, - __entry->unused_scan, - __entry->new_scan, - __entry->total_scan, - __entry->retval) + __entry->freed_objects, + __entry->scanned_objects, + __entry->deferred_scan) ); TRACE_EVENT(mm_vmscan_lru_isolate, diff --git a/mm/vmscan.c b/mm/vmscan.c index c583b4efb9bf..d5ce26b4d49d 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -505,7 +505,6 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl, int64_t scanned_objects = 0; int64_t next_deferred = 0; int64_t deferred_count = 0; - long new_nr; int nid = shrinkctl->nid; long batch_size = shrinker->batch ? shrinker->batch : SHRINK_BATCH; @@ -564,8 +563,7 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl, scan_count = freeable_objects * 2; trace_mm_shrink_slab_start(shrinker, shrinkctl, deferred_count, - freeable_objects, scan_count, - scan_count, priority); + freeable_objects, scan_count, priority); /* * If the shrinker can't run (e.g. due to gfp_mask constraints), then @@ -609,23 +607,21 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl, } done: + /* + * Calculate the remaining work that we need to defer to kswapd, and + * store it in a manner that handles concurrent updates. If we exhausted + * the scan, there is no need to do an update. + */ if (deferred_count) next_deferred = deferred_count - scanned_objects; else if (scan_count > 0) next_deferred = scan_count; - /* - * move the unused scan count back into the shrinker in a - * manner that handles concurrent updates. If we exhausted the - * scan, there is no need to do an update. 
- */ + if (next_deferred > 0) - new_nr = atomic_long_add_return(next_deferred, - &shrinker->nr_deferred[nid]); - else - new_nr = atomic_long_read(&shrinker->nr_deferred[nid]); + atomic_long_add(next_deferred, &shrinker->nr_deferred[nid]); - trace_mm_shrink_slab_end(shrinker, nid, freed, deferred_count, new_nr, - scan_count); + trace_mm_shrink_slab_end(shrinker, nid, freed, scanned_objects, + next_deferred); return freed; }
From patchwork Thu Aug 1 02:17:34 2019
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 06/24] mm: reclaim_state records pages reclaimed, not slabs
Date: Thu, 1 Aug 2019 12:17:34 +1000
Message-Id: <20190801021752.4986-7-david@fromorbit.com>
In-Reply-To: <20190801021752.4986-1-david@fromorbit.com>
References: <20190801021752.4986-1-david@fromorbit.com>

Name change only, no logic changes.
Signed-off-by: Dave Chinner
---
 fs/inode.c           | 2 +-
 include/linux/swap.h | 5 +++--
 mm/slab.c            | 2 +-
 mm/slob.c            | 2 +-
 mm/slub.c            | 2 +-
 mm/vmscan.c          | 4 ++--
 6 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 0f1e3b563c47..8c70f0643218 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -762,7 +762,7 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
 		else
 			__count_vm_events(PGINODESTEAL, reap);
 		if (current->reclaim_state)
-			current->reclaim_state->reclaimed_slab += reap;
+			current->reclaim_state->reclaimed_pages += reap;
 	}
 	iput(inode);
 	spin_lock(lru_lock);
diff --git a/include/linux/swap.h b/include/linux/swap.h
index de2c67a33b7e..978e6cd5c05a 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -126,10 +126,11 @@ union swap_header {

 /*
  * current->reclaim_state points to one of these when a task is running
- * memory reclaim
+ * memory reclaim. It is typically used by shrinkers to return reclaim
+ * information back to the main vmscan loop.
  */
 struct reclaim_state {
-	unsigned long reclaimed_slab;
+	unsigned long reclaimed_pages;	/* pages freed by shrinkers */
 };

 #ifdef __KERNEL__
diff --git a/mm/slab.c b/mm/slab.c
index 9df370558e5d..abc97e340f6d 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1396,7 +1396,7 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
 	page->mapping = NULL;

 	if (current->reclaim_state)
-		current->reclaim_state->reclaimed_slab += 1 << order;
+		current->reclaim_state->reclaimed_pages += 1 << order;
 	uncharge_slab_page(page, order, cachep);
 	__free_pages(page, order);
 }
diff --git a/mm/slob.c b/mm/slob.c
index 7f421d0ca9ab..c46ce297805e 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -208,7 +208,7 @@ static void *slob_new_pages(gfp_t gfp, int order, int node)
 static void slob_free_pages(void *b, int order)
 {
 	if (current->reclaim_state)
-		current->reclaim_state->reclaimed_slab += 1 << order;
+		current->reclaim_state->reclaimed_pages += 1 << order;
 	free_pages((unsigned long)b, order);
 }
diff --git a/mm/slub.c b/mm/slub.c
index e6c030e47364..a3e4bc62383b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1743,7 +1743,7 @@ static void __free_slab(struct kmem_cache *s, struct page *page)
 	page->mapping = NULL;

 	if (current->reclaim_state)
-		current->reclaim_state->reclaimed_slab += pages;
+		current->reclaim_state->reclaimed_pages += pages;
 	uncharge_slab_page(page, order, s);
 	__free_pages(page, order);
 }
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d5ce26b4d49d..231ddcfcd046 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2765,8 +2765,8 @@ static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 	} while ((memcg = mem_cgroup_iter(root, memcg, &reclaim)));

 	if (reclaim_state) {
-		sc->nr_reclaimed += reclaim_state->reclaimed_slab;
-		reclaim_state->reclaimed_slab = 0;
+		sc->nr_reclaimed += reclaim_state->reclaimed_pages;
+		reclaim_state->reclaimed_pages = 0;
 	}

 	/* Record the subtree's reclaim efficiency */

From patchwork Thu Aug 1 02:17:35 2019
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 07/24] mm: back off direct reclaim on excessive shrinker deferral
Date: Thu, 1 Aug 2019 12:17:35 +1000
Message-Id: <20190801021752.4986-8-david@fromorbit.com>
From: Dave Chinner

When the majority of possible shrinker reclaim work is deferred by
the shrinkers (e.g. due to GFP_NOFS context), and there is more work
deferred than LRU pages were scanned, back off reclaim if there are
large amounts of IO in progress.

This tends to occur when there are inode cache heavy workloads that
have little page cache or application memory pressure on filesystems
like XFS. Inode cache heavy workloads involve lots of IO, so if we
are getting device congestion it is indicative of memory reclaim
running up against an IO throughput limitation. In this situation we
need to throttle direct reclaim as we need to wait for kswapd to get
some of the deferred work done.

However, if there is no device congestion, then the system is
keeping up with both the workload and memory reclaim and so there's
no need to throttle.

Hence we should only back off scanning for a bit if we see this
condition and there is block device congestion present.
Signed-off-by: Dave Chinner
---
 include/linux/swap.h |  2 ++
 mm/vmscan.c          | 30 +++++++++++++++++++++++++++++-
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 978e6cd5c05a..1a3502a9bc1f 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -131,6 +131,8 @@ union swap_header {
  */
 struct reclaim_state {
 	unsigned long reclaimed_pages;	/* pages freed by shrinkers */
+	unsigned long scanned_objects;	/* quantity of work done */
+	unsigned long deferred_objects;	/* work that wasn't done */
 };

 #ifdef __KERNEL__
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 231ddcfcd046..4dc8e333f2c6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -569,8 +569,11 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 	 * If the shrinker can't run (e.g. due to gfp_mask constraints), then
 	 * defer the work to a context that can scan the cache.
 	 */
-	if (shrinkctl->will_defer)
+	if (shrinkctl->will_defer) {
+		if (current->reclaim_state)
+			current->reclaim_state->deferred_objects += scan_count;
 		goto done;
+	}

 	/*
 	 * Normally, we should not scan less than batch_size objects in one
@@ -605,6 +608,8 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 		cond_resched();
 	}

+	if (current->reclaim_state)
+		current->reclaim_state->scanned_objects += scanned_objects;
 done:
 	/*
@@ -2766,7 +2771,30 @@ static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)

 	if (reclaim_state) {
 		sc->nr_reclaimed += reclaim_state->reclaimed_pages;
+
+		/*
+		 * If we are deferring more work than we are actually
+		 * doing in the shrinkers, and we are scanning more
+		 * objects than we are pages, then we have a large amount
+		 * of slab caches we are deferring work to kswapd for.
+		 * We better back off here for a while, otherwise
+		 * we risk priority windup, swap storms and OOM kills
+		 * once we empty the page lists but still can't make
+		 * progress on the shrinker memory.
+		 *
+		 * kswapd won't ever defer work as it's run under a
+		 * GFP_KERNEL context and can always do work.
+		 */
+		if ((reclaim_state->deferred_objects >
+					sc->nr_scanned - nr_scanned) &&
+		    (reclaim_state->deferred_objects >
+					reclaim_state->scanned_objects)) {
+			wait_iff_congested(BLK_RW_ASYNC, HZ/50);
+		}
+
 		reclaim_state->reclaimed_pages = 0;
+		reclaim_state->deferred_objects = 0;
+		reclaim_state->scanned_objects = 0;
 	}

 	/* Record the subtree's reclaim efficiency */

From patchwork Thu Aug 1 02:17:36 2019
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 08/24] mm: kswapd backoff for shrinkers
Date: Thu, 1 Aug 2019 12:17:36 +1000
Message-Id: <20190801021752.4986-9-david@fromorbit.com>

From: Dave Chinner

When kswapd reaches the end of the page LRU and starts hitting dirty
pages, the logic in shrink_node() allows it to
back off and wait for IO to complete, thereby preventing kswapd from
scanning excessively and driving the system into swap thrashing and
OOM conditions.

When we have inode cache heavy workloads on XFS, we have exactly the
same problem with reclaiming inodes. The non-blocking kswapd reclaim
will keep putting pressure onto the inode cache which is unable to
make progress. When the system gets to the point where there are no
pages left on the LRU to free, there is no swap left and there are
no clean inodes that can be freed, it will OOM. This has a specific
signature in the OOM report:

[  110.841987] Mem-Info:
[  110.842816] active_anon:241 inactive_anon:82 isolated_anon:1
                active_file:168 inactive_file:143 isolated_file:0
                unevictable:2621523 dirty:1 writeback:8 unstable:0
                slab_reclaimable:564445 slab_unreclaimable:420046
                mapped:1042 shmem:11 pagetables:6509 bounce:0
                free:77626 free_pcp:2 free_cma:0

In this case, we have about 500-600 pages left in the LRUs, but we
have ~565000 reclaimable slab pages still available for reclaim.
Unfortunately, they are mostly dirty inodes, and so we really need
to be able to throttle kswapd when shrinker progress is limited due
to reaching the dirty end of the LRU...

So, add a flag into the reclaim_state so if the shrinker decides it
needs kswapd to back off and wait for a while (for whatever reason)
it can do so.
Signed-off-by: Dave Chinner
---
 include/linux/swap.h |  1 +
 mm/vmscan.c          | 10 +++++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 1a3502a9bc1f..416680b1bf0c 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -133,6 +133,7 @@ struct reclaim_state {
 	unsigned long reclaimed_pages;	/* pages freed by shrinkers */
 	unsigned long scanned_objects;	/* quantity of work done */
 	unsigned long deferred_objects;	/* work that wasn't done */
+	bool need_backoff;		/* tell kswapd to slow down */
 };

 #ifdef __KERNEL__
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4dc8e333f2c6..029dba76ee5a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2844,8 +2844,16 @@ static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 		 * implies that pages are cycling through the LRU
 		 * faster than they are written so also forcibly stall.
 		 */
-		if (sc->nr.immediate)
+		if (sc->nr.immediate) {
 			congestion_wait(BLK_RW_ASYNC, HZ/10);
+		} else if (reclaim_state && reclaim_state->need_backoff) {
+			/*
+			 * Ditto, but it's a slab cache that is cycling
+			 * through the LRU faster than it is written.
+			 */
+			congestion_wait(BLK_RW_ASYNC, HZ/10);
+			reclaim_state->need_backoff = false;
+		}
 	}

 	/*

From patchwork Thu Aug 1 02:17:37 2019
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 09/24] xfs: don't allow log IO to be throttled
Date: Thu, 1 Aug 2019 12:17:37 +1000
Message-Id: <20190801021752.4986-10-david@fromorbit.com>

From: Dave Chinner

Running metadata intensive workloads, I've been seeing the AIL
pushing getting stuck on
pinned buffers and triggering log forces. The log force is taking a
long time to run because the log IO is getting throttled by
wbt_wait() - the block layer writeback throttle. It's being
throttled because there is a huge amount of metadata writeback going
on which is filling the request queue. IOWs, we have a priority
inversion problem here.

Mark the log IO bios with REQ_IDLE so they don't get throttled by
the block layer writeback throttle. When we are forcing the CIL, we
are likely to need to do tens of log IOs, and they are issued as
fast as they can be built and the IO completed. Hence REQ_IDLE is
appropriate - it's an indication that more IO will follow shortly.

And because we also set REQ_SYNC, the writeback throttle will now
treat log IO the same way it treats direct IO writes - it will not
throttle them at all. Hence we solve the priority inversion problem
caused by the writeback throttle being unable to distinguish between
high priority log IO and background metadata writeback.

Signed-off-by: Dave Chinner
---
 fs/xfs/xfs_log.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 00e9f5c388d3..7bdea629e749 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1723,7 +1723,15 @@ xlog_write_iclog(
 	iclog->ic_bio.bi_iter.bi_sector = log->l_logBBstart + bno;
 	iclog->ic_bio.bi_end_io = xlog_bio_end_io;
 	iclog->ic_bio.bi_private = iclog;
-	iclog->ic_bio.bi_opf = REQ_OP_WRITE | REQ_META | REQ_SYNC | REQ_FUA;
+
+	/*
+	 * We use REQ_SYNC | REQ_IDLE here to tell the block layer there are
+	 * more IOs coming immediately after this one. This prevents the block
+	 * layer writeback throttle from throttling log writes behind
+	 * background metadata writeback and causing priority inversions.
+	 */
+	iclog->ic_bio.bi_opf = REQ_OP_WRITE | REQ_META | REQ_SYNC |
+				REQ_IDLE | REQ_FUA;

 	if (need_flush)
 		iclog->ic_bio.bi_opf |= REQ_PREFLUSH;

From patchwork Thu Aug 1 02:17:38 2019
mail-pf1-f197.google.com with SMTP id e20so44655526pfd.3 for ; Wed, 31 Jul 2019 19:33:28 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=0PxqOxr1HKVjo3540PJp0gfmBwwVWbrNktsrkewzctc=; b=WbBepw63iSMpWkbLEWRAIhQeAqX56dzIHXmO0PvTD6U7HCXfSsmZNfKc/uz00oJZK4 obxAV2s/eg9POsgTgfjLfsKxeYRmqmL9fFl3jBxxOcBFDHythmHP+lp2zb9cbHTavXh+ MTOqOmDNu2Vb+Z6hRdVKvkbb6LVTR2M9rf3qs3G9iddxqGHPRpCaEFjX9yrAwX5izI7q DfkBwfTNLx+O7wyz476rtWHd5F9WdAagChwVQVbbd0GytN4WQLAzyQjSPnX1l3en6XYb 89TdQ+RIoMJNMMF2ZYxOKtARiFgLZXvfg272JfQUojE5tA+r3isNUfFguz4bd0a5yLB9 c1LQ== X-Original-Authentication-Results: mx.google.com; spf=neutral (google.com: 211.29.132.246 is neither permitted nor denied by best guess record for domain of david@fromorbit.com) smtp.mailfrom=david@fromorbit.com X-Gm-Message-State: APjAAAXQCSFk2IqltnHjKTCuKUyAuVjz3uQ+h1NsROqRHBapyf3Yd5ft tXBI3NHnD16nq84IfnHaSNH83wCQWbqmoCWwjYARggG8l2Yqr3YSD/qdi7Fa2L9JX/H7HnUEInU a51H4ybuAGACmiHtHQj0AsZ6dIEqOfydI62LE9LrSCNWIS5QeOh81OdfYY0wNGYs= X-Received: by 2002:a17:902:d917:: with SMTP id c23mr123375638plz.248.1564626807724; Wed, 31 Jul 2019 19:33:27 -0700 (PDT) X-Google-Smtp-Source: APXvYqw8CHZg1TpCenMvHTS6vwopxNbTUTooK+1KKUxtShLY+BSa1WwwrYDc5PjXddXc7WR9F0wx X-Received: by 2002:a17:902:d917:: with SMTP id c23mr123375578plz.248.1564626806492; Wed, 31 Jul 2019 19:33:26 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1564626806; cv=none; d=google.com; s=arc-20160816; b=f06sKW9W1b3r3k1M7QBCWiCv5thgen5eYh0W/JZwex1uHh5ptQGFKV/t5uyySYIBFR /uTrMRR2vzxqQM0Wt1TsgOJigya2f679Mk7UQln//Bcbya9Jd8m1G93pjLtuIouSR8O8 6462OQ6u//kTbmSprOCWD4/2mJwvAFIw1HCr0X5Cv8/EcFC5C7NSSPq84J/gI0mU9iPE txmMbOfxq59VhnbnnUH7XZsLiaQ4+L/ZcUHj9qh4tYgArDrNSKbh6J62tRCbTmJ8x1bq zkJ0V/Y8ngAOwAcSaZii/wEWl1SYcXfQ/Ba1XCWon8o/iXn0lb1J47gij0xkv2ul7b3I +qCQ== ARC-Message-Signature: i=1; a=rsa-sha256; 
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 10/24] xfs: fix missed wakeup on l_flush_wait
Date: Thu, 1 Aug 2019 12:17:38 +1000
Message-Id: <20190801021752.4986-11-david@fromorbit.com>
In-Reply-To: <20190801021752.4986-1-david@fromorbit.com>
References: <20190801021752.4986-1-david@fromorbit.com>

From: Rik van Riel

The code in xlog_wait uses the spinlock to make adding the task to
the wait queue, and setting the task state to UNINTERRUPTIBLE, atomic
with respect to the waker. Doing the wakeup after releasing the
spinlock opens up the following race condition:

    Task 1				Task 2

    add task to wait queue
					wake up task
    set task state to UNINTERRUPTIBLE

This issue was found through code inspection as a result of kworkers
being observed stuck in UNINTERRUPTIBLE state with an empty wait
queue. It is rare and largely unreproducible.

Simply moving the spin_unlock to after the wake_up_all means the
waker can no longer see a task on the waitqueue before that task has
set its state to UNINTERRUPTIBLE.

This bug dates back to the conversion of this code to generic
waitqueue infrastructure from a counting semaphore back in 2008,
which didn't place the wakeups consistently w.r.t. the relevant spin
locks.

[dchinner: Also fix a similar issue in the shutdown path on
xc_commit_wait. Update commit log with more details of the issue.]
Fixes: d748c62367eb ("[XFS] Convert l_flushsema to a sv_t")
Reported-by: Chris Mason
Signed-off-by: Rik van Riel
Signed-off-by: Dave Chinner
---
 fs/xfs/xfs_log.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 7bdea629e749..b78c5e95bbba 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -2630,7 +2630,6 @@ xlog_state_do_callback(
 	int		funcdidcallbacks; /* flag: function did callbacks */
 	int		repeats;	/* for issuing console warnings if
 					 * looping too many times */
-	int		wake = 0;

 	spin_lock(&log->l_icloglock);
 	first_iclog = iclog = log->l_iclog;
@@ -2826,11 +2825,9 @@ xlog_state_do_callback(
 #endif
 	if (log->l_iclog->ic_state & (XLOG_STATE_ACTIVE|XLOG_STATE_IOERROR))
-		wake = 1;
-	spin_unlock(&log->l_icloglock);
-
-	if (wake)
 		wake_up_all(&log->l_flush_wait);
+
+	spin_unlock(&log->l_icloglock);
 }
@@ -3930,7 +3927,9 @@ xfs_log_force_umount(
 	 * item committed callback functions will do this again under lock to
 	 * avoid races.
 	 */
+	spin_lock(&log->l_cilp->xc_push_lock);
 	wake_up_all(&log->l_cilp->xc_commit_wait);
+	spin_unlock(&log->l_cilp->xc_push_lock);
 	xlog_state_do_callback(log, true, NULL);

 #ifdef XFSERRORDEBUG

From patchwork Thu Aug 1 02:17:39 2019
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 11/24] xfs: account for memory freed from metadata buffers
Date: Thu, 1 Aug 2019 12:17:39 +1000
Message-Id: <20190801021752.4986-12-david@fromorbit.com>
In-Reply-To: <20190801021752.4986-1-david@fromorbit.com>
References: <20190801021752.4986-1-david@fromorbit.com>

From: Dave Chinner

The buffer cache shrinker frees more than just the xfs_buf slab
objects - it also frees the pages attached to the buffers. Make sure
the memory reclaim code accounts for this memory being freed
correctly, similar to how the inode shrinker accounts for pages freed
from the page cache due to mapping invalidation.

We also need to make sure that the mm subsystem knows these are
reclaimable objects. We provide the memory reclaim subsystem with a
shrinker to reclaim xfs_bufs, so we should really mark the slab that
way.

We also have a lot of xfs_bufs in a busy system; spread them around
like we do inodes.

Signed-off-by: Dave Chinner
---
 fs/xfs/xfs_buf.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 6e0f76532535..beb816cd54d6 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1667,6 +1667,14 @@ xfs_buftarg_shrink_scan(
 		struct xfs_buf *bp;
 		bp = list_first_entry(&dispose, struct xfs_buf, b_lru);
 		list_del_init(&bp->b_lru);
+
+		/*
+		 * Account for the buffer memory freed here so memory reclaim
+		 * sees this and not just the xfs_buf slab entry being freed.
+		 */
+		if (current->reclaim_state)
+			current->reclaim_state->reclaimed_pages += bp->b_page_count;
+
 		xfs_buf_rele(bp);
 	}

@@ -2057,7 +2065,8 @@ int __init
 xfs_buf_init(void)
 {
 	xfs_buf_zone = kmem_zone_init_flags(sizeof(xfs_buf_t), "xfs_buf",
-				KM_ZONE_HWALIGN, NULL);
+				KM_ZONE_HWALIGN | KM_ZONE_SPREAD | KM_ZONE_RECLAIM,
+				NULL);
 	if (!xfs_buf_zone)
 		goto out;

From patchwork Thu Aug 1 02:17:40 2019
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 12/24] xfs: correctly account for reclaimable slabs
Date: Thu, 1 Aug 2019 12:17:40 +1000
Message-Id: <20190801021752.4986-13-david@fromorbit.com>
In-Reply-To: <20190801021752.4986-1-david@fromorbit.com>
References: <20190801021752.4986-1-david@fromorbit.com>

From: Dave Chinner

The XFS inode item slab is actually reclaimed by inode shrinker
callbacks from the memory reclaim subsystem. These should be marked
as reclaimable so the mm subsystem has the full picture of how much
memory it can actually reclaim from the XFS slab caches.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
---
 fs/xfs/xfs_super.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index f9450235533c..67b59815d0df 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1916,7 +1916,7 @@ xfs_init_zones(void)
 	xfs_ili_zone = kmem_zone_init_flags(sizeof(xfs_inode_log_item_t),
 					"xfs_ili",
-					KM_ZONE_SPREAD, NULL);
+					KM_ZONE_SPREAD | KM_ZONE_RECLAIM, NULL);
 	if (!xfs_ili_zone)
 		goto out_destroy_inode_zone;
 	xfs_icreate_zone = kmem_zone_init(sizeof(struct xfs_icreate_item),

From patchwork Thu Aug 1 02:17:41 2019
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 13/24] xfs: synchronous AIL pushing
Date: Thu, 1 Aug 2019 12:17:41 +1000
Message-Id: <20190801021752.4986-14-david@fromorbit.com>
In-Reply-To: <20190801021752.4986-1-david@fromorbit.com>
References: <20190801021752.4986-1-david@fromorbit.com>

From: Dave Chinner

Provide an interface to push the AIL to a target LSN and wait for
the tail of the log to move past that LSN.
This is used to wait for all items older than a specific LSN to either
be cleaned (written back) or relogged to a higher LSN in the AIL. The
primary use for this is to allow IO-free inode reclaim throttling.

Factor the common AIL deletion code that does all the wakeups into a
helper so we only have one copy of this somewhat tricky code to
interface with all the wakeups necessary when the LSN of the log tail
changes.

Signed-off-by: Dave Chinner
---
 fs/xfs/xfs_inode_item.c | 12 +------
 fs/xfs/xfs_trans_ail.c  | 69 +++++++++++++++++++++++++++++++++--------
 fs/xfs/xfs_trans_priv.h |  6 +++-
 3 files changed, 62 insertions(+), 25 deletions(-)

diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index c9a502eed204..7b942a63e992 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -743,17 +743,7 @@ xfs_iflush_done(
 			xfs_clear_li_failed(blip);
 		}
 	}
-
-	if (mlip_changed) {
-		if (!XFS_FORCED_SHUTDOWN(ailp->ail_mount))
-			xlog_assign_tail_lsn_locked(ailp->ail_mount);
-		if (list_empty(&ailp->ail_head))
-			wake_up_all(&ailp->ail_empty);
-	}
-	spin_unlock(&ailp->ail_lock);
-
-	if (mlip_changed)
-		xfs_log_space_wake(ailp->ail_mount);
+	xfs_ail_delete_finish(ailp, mlip_changed);
 }

 /*
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 6ccfd75d3c24..9e3102179221 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -654,6 +654,37 @@ xfs_ail_push_all(
 	xfs_ail_push(ailp, threshold_lsn);
 }

+/*
+ * Push the AIL to a specific lsn and wait for it to complete.
+ */
+void
+xfs_ail_push_sync(
+	struct xfs_ail		*ailp,
+	xfs_lsn_t		threshold_lsn)
+{
+	struct xfs_log_item	*lip;
+	DEFINE_WAIT(wait);
+
+	spin_lock(&ailp->ail_lock);
+	while ((lip = xfs_ail_min(ailp)) != NULL) {
+		prepare_to_wait(&ailp->ail_push, &wait, TASK_UNINTERRUPTIBLE);
+		if (XFS_FORCED_SHUTDOWN(ailp->ail_mount) ||
+		    XFS_LSN_CMP(threshold_lsn, lip->li_lsn) <= 0)
+			break;
+		/* XXX: cmpxchg? */
+		while (XFS_LSN_CMP(threshold_lsn, ailp->ail_target) > 0)
+			xfs_trans_ail_copy_lsn(ailp, &ailp->ail_target,
+					&threshold_lsn);
+		wake_up_process(ailp->ail_task);
+		spin_unlock(&ailp->ail_lock);
+		schedule();
+		spin_lock(&ailp->ail_lock);
+	}
+	spin_unlock(&ailp->ail_lock);
+
+	finish_wait(&ailp->ail_push, &wait);
+}
+
 /*
  * Push out all items in the AIL immediately and wait until the AIL is empty.
  */
@@ -764,6 +795,28 @@ xfs_ail_delete_one(
 	return mlip == lip;
 }

+void
+xfs_ail_delete_finish(
+	struct xfs_ail		*ailp,
+	bool			do_tail_update) __releases(ailp->ail_lock)
+{
+	struct xfs_mount	*mp = ailp->ail_mount;
+
+	if (!do_tail_update) {
+		spin_unlock(&ailp->ail_lock);
+		return;
+	}
+
+	if (!XFS_FORCED_SHUTDOWN(mp))
+		xlog_assign_tail_lsn_locked(mp);
+
+	wake_up_all(&ailp->ail_push);
+	if (list_empty(&ailp->ail_head))
+		wake_up_all(&ailp->ail_empty);
+	spin_unlock(&ailp->ail_lock);
+	xfs_log_space_wake(mp);
+}
+
 /**
  * Remove a log items from the AIL
  *
@@ -789,10 +842,9 @@ void
 xfs_trans_ail_delete(
 	struct xfs_ail		*ailp,
 	struct xfs_log_item	*lip,
-	int			shutdown_type) __releases(ailp->ail_lock)
+	int			shutdown_type)
 {
 	struct xfs_mount	*mp = ailp->ail_mount;
-	bool			mlip_changed;

 	if (!test_bit(XFS_LI_IN_AIL, &lip->li_flags)) {
 		spin_unlock(&ailp->ail_lock);
@@ -805,17 +857,7 @@ xfs_trans_ail_delete(
 		return;
 	}

-	mlip_changed = xfs_ail_delete_one(ailp, lip);
-	if (mlip_changed) {
-		if (!XFS_FORCED_SHUTDOWN(mp))
-			xlog_assign_tail_lsn_locked(mp);
-		if (list_empty(&ailp->ail_head))
-			wake_up_all(&ailp->ail_empty);
-	}
-
-	spin_unlock(&ailp->ail_lock);
-	if (mlip_changed)
-		xfs_log_space_wake(ailp->ail_mount);
+	xfs_ail_delete_finish(ailp, xfs_ail_delete_one(ailp, lip));
 }

 int
@@ -834,6 +876,7 @@ xfs_trans_ail_init(
 	spin_lock_init(&ailp->ail_lock);
 	INIT_LIST_HEAD(&ailp->ail_buf_list);
 	init_waitqueue_head(&ailp->ail_empty);
+	init_waitqueue_head(&ailp->ail_push);

 	ailp->ail_task = kthread_run(xfsaild, ailp, "xfsaild/%s",
 			ailp->ail_mount->m_fsname);
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index 2e073c1c4614..5ab70b9b896f 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -61,6 +61,7 @@ struct xfs_ail {
 	int			ail_log_flush;
 	struct list_head	ail_buf_list;
 	wait_queue_head_t	ail_empty;
+	wait_queue_head_t	ail_push;
 };

 /*
@@ -92,8 +93,10 @@ xfs_trans_ail_update(
 }

 bool xfs_ail_delete_one(struct xfs_ail *ailp, struct xfs_log_item *lip);
+void xfs_ail_delete_finish(struct xfs_ail *ailp, bool do_tail_update)
+			__releases(ailp->ail_lock);
 void xfs_trans_ail_delete(struct xfs_ail *ailp, struct xfs_log_item *lip,
-			int shutdown_type) __releases(ailp->ail_lock);
+			int shutdown_type);

 static inline void
 xfs_trans_ail_remove(
@@ -111,6 +114,7 @@ xfs_trans_ail_remove(
 }

 void			xfs_ail_push(struct xfs_ail *, xfs_lsn_t);
+void			xfs_ail_push_sync(struct xfs_ail *, xfs_lsn_t);
 void			xfs_ail_push_all(struct xfs_ail *);
 void			xfs_ail_push_all_sync(struct xfs_ail *);
 struct xfs_log_item	*xfs_ail_min(struct xfs_ail *ailp);

From patchwork Thu Aug 1 02:17:42 2019
[205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7CE782845E for ; Thu, 1 Aug 2019 02:20:03 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 65C0E8E0006; Wed, 31 Jul 2019 22:20:02 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 60D2B8E0001; Wed, 31 Jul 2019 22:20:02 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4D62D8E0006; Wed, 31 Jul 2019 22:20:02 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pg1-f198.google.com (mail-pg1-f198.google.com [209.85.215.198]) by kanga.kvack.org (Postfix) with ESMTP id 20CCF8E0001 for ; Wed, 31 Jul 2019 22:20:02 -0400 (EDT) Received: by mail-pg1-f198.google.com with SMTP id p29so35895271pgm.10 for ; Wed, 31 Jul 2019 19:20:02 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=H9r6HEnm/MBGNjDw3DjSsFkwO7PZ7T7uA6vsfNWR2w0=; b=jkMPDVrK9lUbP9N1xfd7iPkPasaSWRnAL6NR/hGYVKpIRTisWtwX8pwM8199uKF03+ vNzESxXmd2V2Oawi7+D222Psh4lUJmUbrXrsjFWr+Q3vKj8yrfOZAvHMwalmY+mOBN4C UEgASvPKHSC5+wO1J5rlNBUK3mKr5RXe6M6P8CXRhdTDG/QeaSXzdn6CNxABH7hkvtQU DbruvNGTWuWu14Aqgb4nPqBZfFI5CwN5SZ1CMw+8JI27OyQ4PIBiKmlha/dHsk7gwAVZ n4nDNE3Dt2Cg7dhydgacMoOcNojAuXahBxo6DUqQMT2JPy2rIDmAKc1Vt9DYr+IiNYNQ POxg== X-Original-Authentication-Results: mx.google.com; spf=neutral (google.com: 211.29.132.246 is neither permitted nor denied by best guess record for domain of david@fromorbit.com) smtp.mailfrom=david@fromorbit.com X-Gm-Message-State: APjAAAXfZRx0B0IZe+P1ulWflwLlK/8n45CEm6lVmmZxkfxI/4Dw4HqL Gi0CCJClxEHagY6P6HGh3EzweXIetrnGqWo9SlyyQevjvmcPSIQ98vCwR1K4D9NCmO8M7D6ez4M 
+KzLuKfh7iWjOUxT+JCCWTBzu3xzj1OYMAVDA9TjiByenN1n8qqjIhaz9rhvIpZs= X-Received: by 2002:aa7:8d88:: with SMTP id i8mr51477713pfr.28.1564626001801; Wed, 31 Jul 2019 19:20:01 -0700 (PDT) X-Google-Smtp-Source: APXvYqwSJfZeG0it6v8Wnczo7BVqMrKuAgazMYli0MFoXJTIC0sjAQCJkz/QVCwjvzYf+LMh48Lh X-Received: by 2002:aa7:8d88:: with SMTP id i8mr51470605pfr.28.1564625882982; Wed, 31 Jul 2019 19:18:02 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1564625882; cv=none; d=google.com; s=arc-20160816; b=GzayZXOv443M2fbf9Q8hfmATXrbaJT4geLjkWEDcU2RmZikjKr2IHa1QIR3js/YhYP 8j4eh7/dD8MBI0tJaswd3Ctnm4Kwduhoxg1JqqgNlIRs8g3acQKzCcnKJ8bp575goXvN z1L7bIet1aShuNP1Gxw49NlCz4TSvT37yjBGbYdyEjac9YwSDcUkYT3jtI2lk5FlSHOh uqBRDekVbPcuBnBE94eiQ2yYt2vJBIPQaL1vufTPrd8VJn/SdvqWNgWHFnam8Wy0JYcJ Uc/V58NALfOXrSB92em1W5qsB6XRovreEiQ2sNzZCD39/2Zzhi6PnN6XPJBXVLngiaIP TPOQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from; bh=H9r6HEnm/MBGNjDw3DjSsFkwO7PZ7T7uA6vsfNWR2w0=; b=bZkvCZV8v8+GRfKlcC/D17eoNjkkiJL5wj7yNkj/7PI9lCpUCFls1U2VtfhuQjeDxG /5GdKPVD7kaQyR6wzmsazP3UR4sdPFUhjY1+FjoD2/mPhqkTxOaI1SUAVkRO6ZyWRyRD SuCuC0ho6IAqdhdOEemkzeMl5BQJN18eolgLzwMwtM1KwLQga1SEGZ7BWfg2j68hjyD1 EsdrBZGPeQ/1K2LfXGZ3VEpUeQXSAbnrVE2AiaNl6IcS/I1WHRHIWAL1sxmJcP/6AOSr mvFhEEvaaVsvbEobq/igCZLlbSOjzrW5KfOJK3X9Fc12EwaV9m9XA0FgFgUJcjV1WdO3 JYtA== ARC-Authentication-Results: i=1; mx.google.com; spf=neutral (google.com: 211.29.132.246 is neither permitted nor denied by best guess record for domain of david@fromorbit.com) smtp.mailfrom=david@fromorbit.com Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au. 
[211.29.132.246]) by mx.google.com with ESMTP id g15si2670695pjv.54.2019.07.31.19.18.02 for ; Wed, 31 Jul 2019 19:18:02 -0700 (PDT) Received-SPF: neutral (google.com: 211.29.132.246 is neither permitted nor denied by best guess record for domain of david@fromorbit.com) client-ip=211.29.132.246; Authentication-Results: mx.google.com; spf=neutral (google.com: 211.29.132.246 is neither permitted nor denied by best guess record for domain of david@fromorbit.com) smtp.mailfrom=david@fromorbit.com Received: from dread.disaster.area (pa49-195-139-63.pa.nsw.optusnet.com.au [49.195.139.63]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id A345643EC85; Thu, 1 Aug 2019 12:17:58 +1000 (AEST) Received: from discord.disaster.area ([192.168.253.110]) by dread.disaster.area with esmtp (Exim 4.92) (envelope-from ) id 1ht0eB-0003b2-6Z; Thu, 01 Aug 2019 12:16:51 +1000 Received: from dave by discord.disaster.area with local (Exim 4.92) (envelope-from ) id 1ht0fH-0001lE-44; Thu, 01 Aug 2019 12:17:59 +1000 From: Dave Chinner To: linux-xfs@vger.kernel.org Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org Subject: [PATCH 14/24] xfs: tail updates only need to occur when LSN changes Date: Thu, 1 Aug 2019 12:17:42 +1000 Message-Id: <20190801021752.4986-15-david@fromorbit.com> X-Mailer: git-send-email 2.22.0 In-Reply-To: <20190801021752.4986-1-david@fromorbit.com> References: <20190801021752.4986-1-david@fromorbit.com> MIME-Version: 1.0 X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=FNpr/6gs c=1 sm=1 tr=0 cx=a_idp_d a=fNT+DnnR6FjB+3sUuX8HHA==:117 a=fNT+DnnR6FjB+3sUuX8HHA==:17 a=jpOVt7BSZ2e4Z31A5e1TngXxSK0=:19 a=FmdZ9Uzk2mMA:10 a=20KFwNOVAAAA:8 a=yAjffBypbmNlQjcpRW8A:9 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP From: Dave Chinner We currently wake anything waiting on the log tail to move whenever the log item at the tail of 
the log is removed. Historically this was fine behaviour because there were very few items at any given LSN. But with delayed logging, there may be thousands of items at any given LSN, and we can't move the tail until they are all gone. Hence if we are removing them in near tail-first order, we might be waking up processes waiting on the tail LSN to change (e.g. log space waiters) repeatedly without them being able to make progress. This also occurs with the new sync push waiters, and can result in thousands of spurious wakeups every second when under heavy direct reclaim pressure. To fix this, check that the tail LSN has actually changed on the AIL before triggering wakeups. This will reduce the number of spurious wakeups when doing bulk AIL removal and make this code much more efficient. XXX: occasionally get a temporary hang in xfs_ail_push_sync() with this change - log force from log worker gets things moving again. Only happens under extreme memory pressure - possibly push racing with a tail update on an empty log. Needs further investigation. Signed-off-by: Dave Chinner --- fs/xfs/xfs_inode_item.c | 18 +++++++++++++----- fs/xfs/xfs_trans_ail.c | 37 ++++++++++++++++++++++++++++--------- fs/xfs/xfs_trans_priv.h | 4 ++-- 3 files changed, 43 insertions(+), 16 deletions(-) diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c index 7b942a63e992..16a7d6f752c9 100644 --- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -731,19 +731,27 @@ xfs_iflush_done( * holding the lock before removing the inode from the AIL. 
*/ if (need_ail) { - bool mlip_changed = false; + xfs_lsn_t tail_lsn = 0; /* this is an opencoded batch version of xfs_trans_ail_delete */ spin_lock(&ailp->ail_lock); list_for_each_entry(blip, &tmp, li_bio_list) { if (INODE_ITEM(blip)->ili_logged && - blip->li_lsn == INODE_ITEM(blip)->ili_flush_lsn) - mlip_changed |= xfs_ail_delete_one(ailp, blip); - else { + blip->li_lsn == INODE_ITEM(blip)->ili_flush_lsn) { + /* + * xfs_ail_delete_finish() only cares about the + * lsn of the first tail item removed, any others + * will be at the same or higher lsn so we just + * ignore them. + */ + xfs_lsn_t lsn = xfs_ail_delete_one(ailp, blip); + if (!tail_lsn && lsn) + tail_lsn = lsn; + } else { xfs_clear_li_failed(blip); } } - xfs_ail_delete_finish(ailp, mlip_changed); + xfs_ail_delete_finish(ailp, tail_lsn); } /* diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index 9e3102179221..00d66175f41a 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -108,17 +108,25 @@ xfs_ail_next( * We need the AIL lock in order to get a coherent read of the lsn of the last * item in the AIL. */ +static xfs_lsn_t +__xfs_ail_min_lsn( + struct xfs_ail *ailp) +{ + struct xfs_log_item *lip = xfs_ail_min(ailp); + + if (lip) + return lip->li_lsn; + return 0; +} + xfs_lsn_t xfs_ail_min_lsn( struct xfs_ail *ailp) { - xfs_lsn_t lsn = 0; - struct xfs_log_item *lip; + xfs_lsn_t lsn; spin_lock(&ailp->ail_lock); - lip = xfs_ail_min(ailp); - if (lip) - lsn = lip->li_lsn; + lsn = __xfs_ail_min_lsn(ailp); spin_unlock(&ailp->ail_lock); return lsn; @@ -779,12 +787,20 @@ xfs_trans_ail_update_bulk( } } -bool +/* + * Delete one log item from the AIL. + * + * If this item was at the tail of the AIL, return the LSN of the log item so + * that we can use it to check if the LSN of the tail of the log has moved + * when finishing up the AIL delete process in xfs_ail_delete_finish(). 
+ */ +xfs_lsn_t xfs_ail_delete_one( struct xfs_ail *ailp, struct xfs_log_item *lip) { struct xfs_log_item *mlip = xfs_ail_min(ailp); + xfs_lsn_t lsn = lip->li_lsn; trace_xfs_ail_delete(lip, mlip->li_lsn, lip->li_lsn); xfs_ail_delete(ailp, lip); @@ -792,17 +808,20 @@ xfs_ail_delete_one( clear_bit(XFS_LI_IN_AIL, &lip->li_flags); lip->li_lsn = 0; - return mlip == lip; + if (mlip == lip) + return lsn; + return 0; } void xfs_ail_delete_finish( struct xfs_ail *ailp, - bool do_tail_update) __releases(ailp->ail_lock) + xfs_lsn_t old_lsn) __releases(ailp->ail_lock) { struct xfs_mount *mp = ailp->ail_mount; - if (!do_tail_update) { + /* if the tail lsn hasn't changed, don't do updates or wakeups. */ + if (!old_lsn || old_lsn == __xfs_ail_min_lsn(ailp)) { spin_unlock(&ailp->ail_lock); return; } diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h index 5ab70b9b896f..db589bb7468d 100644 --- a/fs/xfs/xfs_trans_priv.h +++ b/fs/xfs/xfs_trans_priv.h @@ -92,8 +92,8 @@ xfs_trans_ail_update( xfs_trans_ail_update_bulk(ailp, NULL, &lip, 1, lsn); } -bool xfs_ail_delete_one(struct xfs_ail *ailp, struct xfs_log_item *lip); -void xfs_ail_delete_finish(struct xfs_ail *ailp, bool do_tail_update) +xfs_lsn_t xfs_ail_delete_one(struct xfs_ail *ailp, struct xfs_log_item *lip); +void xfs_ail_delete_finish(struct xfs_ail *ailp, xfs_lsn_t old_lsn) + __releases(ailp->ail_lock); void xfs_trans_ail_delete(struct xfs_ail *ailp, struct xfs_log_item *lip, int shutdown_type);

From patchwork Thu Aug 1 02:17:43 2019
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 11070085
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 15/24] xfs: eagerly free shadow buffers to reduce CIL footprint
Date: Thu, 1 Aug 2019 12:17:43 +1000
Message-Id: <20190801021752.4986-16-david@fromorbit.com>
In-Reply-To: <20190801021752.4986-1-david@fromorbit.com>
References: <20190801021752.4986-1-david@fromorbit.com>
From: Dave Chinner

The CIL can pin a lot of memory and effectively defines the lower free memory boundary of operation for XFS. The way we hang onto log item shadow buffers "just in case" effectively doubles the memory footprint of the CIL for dubious reasons. That is, we hang onto the old shadow buffer in case the next time we log the item it will fit into the shadow buffer and we won't have to allocate a new one. However, we only ever tend to grow dirty objects in the CIL through relogging, so once we've allocated a larger buffer the old buffer we set as a shadow buffer will never get reused, as the amount we log never decreases until the item is clean. And then for buffer items we free the log item and the shadow buffers anyway. Inode items will hold onto their shadow buffer until they are reclaimed - this could double the inode's memory footprint for its lifetime... Hence we should just free the old log item buffer when we replace it with a new shadow buffer rather than storing it for later use. It's not useful, get rid of it as early as possible.

Signed-off-by: Dave Chinner

--- fs/xfs/xfs_log_cil.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c index fa5602d0fd7f..1863a9bdf4a9 100644 --- a/fs/xfs/xfs_log_cil.c +++ b/fs/xfs/xfs_log_cil.c @@ -238,9 +238,7 @@ xfs_cil_prepare_item( /* * If there is no old LV, this is the first time we've seen the item in * this CIL context and so we need to pin it. If we are replacing the - * old_lv, then remove the space it accounts for and make it the shadow - * buffer for later freeing. In both cases we are now switching to the - * shadow buffer, so update the the pointer to it appropriately. + * old_lv, then remove the space it accounts for and free it.
*/ if (!old_lv) { if (lv->lv_item->li_ops->iop_pin) @@ -251,7 +249,8 @@ *diff_len -= old_lv->lv_bytes; *diff_iovecs -= old_lv->lv_niovecs; - lv->lv_item->li_lv_shadow = old_lv; + kmem_free(old_lv); + lv->lv_item->li_lv_shadow = NULL; } /* attach new log vector to log item */

From patchwork Thu Aug 1 02:17:44 2019
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 11069929
From: Dave Chinner
To:
linux-xfs@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 16/24] xfs: Lower CIL flush limit for large logs
Date: Thu, 1 Aug 2019 12:17:44 +1000
Message-Id: <20190801021752.4986-17-david@fromorbit.com>
In-Reply-To: <20190801021752.4986-1-david@fromorbit.com>
References: <20190801021752.4986-1-david@fromorbit.com>

From: Dave Chinner

The current CIL size aggregation limit is 1/8th the log size. This means for large logs we might be aggregating at least 250MB of dirty objects in memory before the CIL is flushed to the journal. With CIL shadow buffers sitting around, this means the CIL is often consuming >500MB of temporary memory that is all allocated under GFP_NOFS conditions. Flushing the CIL can take some time to do if there is other IO ongoing, and can introduce substantial log force latency by itself. It also pins the memory until the objects are in the AIL and can be written back and reclaimed by shrinkers. Hence this threshold also tends to determine the minimum amount of memory XFS can operate in under heavy modification without triggering the OOM killer. Modify the CIL space limit to prevent such huge amounts of pinned metadata from aggregating. We can have 2MB of log IO in flight at once, so limit aggregation to 8x this size (arbitrary).
This has some impact on performance (5-10% decrease on 16-way fsmark) and increases the amount of log traffic (~50% on the same workload), but it is necessary to prevent rampant OOM killing under workloads that modify large amounts of metadata under heavy memory pressure. This was found via trace analysis of AIL behaviour. e.g. insertion from a single CIL flush:

xfs_ail_insert: old lsn 0/0 new lsn 1/3033090 type XFS_LI_INODE flags IN_AIL

$ grep xfs_ail_insert /mnt/scratch/s.t |grep "new lsn 1/3033090" |wc -l
1721823
$

So there were 1.7 million objects inserted into the AIL from this CIL checkpoint, the first at 2323.392108, the last at 2325.667566, which was the end of the trace (i.e. it hadn't finished). Clearly a major problem.

XXX: Need to try bigger sizes to see where the performance/stability boundary lies to see if some of the losses can be regained and log bandwidth increases minimised.

XXX: Ideally this threshold should slide with memory pressure. We can allow large amounts of metadata to build up when there is no memory pressure, but then close the window as memory pressure builds up to reduce the footprint of the CIL until memory pressure passes.

Signed-off-by: Dave Chinner

--- fs/xfs/xfs_log_priv.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h index b880c23cb6e4..87c6191daef7 100644 --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -329,7 +329,8 @@ struct xfs_cil { * enforced to ensure we stay within our maximum checkpoint size bounds. * threshold, yet give us plenty of space for aggregation on large logs.
*/ -#define XLOG_CIL_SPACE_LIMIT(log) (log->l_logsize >> 3) +#define XLOG_CIL_SPACE_LIMIT(log) \ + min_t(int, (log)->l_logsize >> 3, XLOG_TOTAL_REC_SHIFT(log) << 3) /* * ticket grant locks, queues and accounting have their own cachlines From patchwork Thu Aug 1 02:17:45 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 11069943 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7743C186E for ; Thu, 1 Aug 2019 02:18:08 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6A7772844B for ; Thu, 1 Aug 2019 02:18:08 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5DE4B27528; Thu, 1 Aug 2019 02:18:08 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=unavailable version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 813802842A for ; Thu, 1 Aug 2019 02:18:06 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 43CFB8E0007; Wed, 31 Jul 2019 22:18:02 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 3C7FA8E0001; Wed, 31 Jul 2019 22:18:02 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2660E8E0007; Wed, 31 Jul 2019 22:18:02 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pg1-f198.google.com (mail-pg1-f198.google.com [209.85.215.198]) by kanga.kvack.org (Postfix) with ESMTP 
id CD55D8E0003 for ; Wed, 31 Jul 2019 22:18:01 -0400 (EDT) Received: by mail-pg1-f198.google.com with SMTP id h3so44121680pgc.19 for ; Wed, 31 Jul 2019 19:18:01 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=yzrqxqNHmSWXpL3l2S0eDz4sW1FjJzBY6wHraenSjV0=; b=ndDnv/6+waAFH4bbdRD2dZTZL6tSPL8FWFOjLWn7fVeRXwmh0vuy/4YS5+mlsWSGpa jYqpgKdXrSL/sBVvKPDdZoQL13GpK8GOwPLjaS/mzbzFXF0UeyM24drG2DIHNhnh0MFC 12iE+VyiyYZSushbPD9bdg57f6SfTQclMlmZ8Y+g8/7PaRuTWT62l90in7sb02dEUFos NdhvZP+IPaQzBhhHE3BTodO7CzzBfmXb4qcnfeCDuT7o6nLX5Ou19Nu9k0AAZxenn3+k lDQeGjDlEvNMiiGvQIW4RVMUbqrIEHuHGHF99vQjMVczGhxQjPcWIj/tsTw/1k0NoOee GjUg== X-Original-Authentication-Results: mx.google.com; spf=neutral (google.com: 211.29.132.249 is neither permitted nor denied by best guess record for domain of david@fromorbit.com) smtp.mailfrom=david@fromorbit.com X-Gm-Message-State: APjAAAVM5m3FIkor2g6PyxU/LcHa6oPP0SdrhJ3/p2ZoirN0dhihUj5d vxXmb5+QM78Hmzp/YniPdlcVRvtY9jugWxLUpDiH3LBxH+Gd/Jguy7phKsrH+uTe/vyJ7el7d2k FzpSg0SEzMPNANzAEgAoBxLbrCIJZIgSU5+9SlRBH/swLhjrJBrwCwePiAo6oNYM= X-Received: by 2002:a63:d04e:: with SMTP id s14mr111097544pgi.189.1564625881355; Wed, 31 Jul 2019 19:18:01 -0700 (PDT) X-Google-Smtp-Source: APXvYqxIL0OrpNFTLnh1FysVR1dHVnb43uwyuBcQpM3VLKFBCpC3baOVBcU3p89tgHIJWhS1s/OK X-Received: by 2002:a63:d04e:: with SMTP id s14mr111097487pgi.189.1564625880159; Wed, 31 Jul 2019 19:18:00 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1564625880; cv=none; d=google.com; s=arc-20160816; b=HXTXHjw4Z8D263/HU/tf2cl5wl3uxxPYqAQJjziD0HRI9QnZ5ImYO2N+VN2H+hSoib wTltqf1g3OHN7wnGuMc0WnczgTwdok9VMfeTKHEDI/hRXgo4YcLDkyTL45vsKJQ8UW+D SJ/Tl/+bqDEeipzd9YtUitRbsj/00k9Btwf8+P78LEGcw6/rqqK6x0HlcXJqNw775WMi vIJUIK/9h9z2DXQf9Zs6QDR63wYXmZRbej4vTGr3Ue+uD1ynNfdUsOzPbWUJ7e9Ge6tW 
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 17/24] xfs: don't block kswapd in inode reclaim
Date: Thu, 1 Aug 2019 12:17:45 +1000
Message-Id: <20190801021752.4986-18-david@fromorbit.com>
In-Reply-To: <20190801021752.4986-1-david@fromorbit.com>
References: <20190801021752.4986-1-david@fromorbit.com>

From: Dave Chinner

We have a number of reasons for blocking kswapd in XFS inode reclaim, mostly stemming from the fact that memory reclaim has no feedback mechanism to throttle on dirty slab objects that need IO to be reclaimed. As a result, we currently throttle inode reclaim by issuing IO in the reclaim context. The unfortunate side effect is that this can cause long tail latencies in reclaim, which is a problem for some workloads.

Now that the shrinkers finally have a way of telling kswapd to back off, we can start making inode reclaim in XFS non-blocking. The first step is to stop blocking kswapd; so that this doesn't cause immediate serious problems, make sure inode writeback is always underway while kswapd is running.
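The kswapd/direct-reclaim split described above boils down to a small policy decision. The following is an illustrative userspace sketch only — the `SYNC_*` values and the `reclaim_sync_mode()` helper are stand-ins for the logic in `xfs_reclaim_inodes_nr()`, not kernel code:

```c
/*
 * Illustrative sketch only: SYNC_* values and this helper are stand-ins
 * for the decision made in xfs_reclaim_inodes_nr(), not kernel code.
 */
#define SYNC_TRYLOCK	(1 << 0)	/* only ever trylock inodes */
#define SYNC_WAIT	(1 << 1)	/* wait on inode writeback IO */

/*
 * kswapd must never block on inode writeback; it only kicks the AIL and
 * moves on.  Direct reclaim, by contrast, waits on writeback so that it
 * is throttled to the rate at which dirty inodes can be cleaned.
 */
static int reclaim_sync_mode(int is_kswapd)
{
	int sync_mode = SYNC_TRYLOCK;

	if (!is_kswapd)
		sync_mode |= SYNC_WAIT;
	return sync_mode;
}
```

The point of the asymmetry is that kswapd's forward progress matters more than any single reclaim pass, while direct reclaimers are deliberately slowed down to match IO completion rates.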
Signed-off-by: Dave Chinner
---
 fs/xfs/xfs_icache.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 0b0fd10a36d4..2fa2f8dcf86b 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1378,11 +1378,22 @@ xfs_reclaim_inodes_nr(
 	struct xfs_mount	*mp,
 	int			nr_to_scan)
 {
-	/* kick background reclaimer and push the AIL */
+	int			sync_mode = SYNC_TRYLOCK;
+
+	/* kick background reclaimer */
 	xfs_reclaim_work_queue(mp);
-	xfs_ail_push_all(mp->m_ail);

-	return xfs_reclaim_inodes_ag(mp, SYNC_TRYLOCK | SYNC_WAIT, &nr_to_scan);
+	/*
+	 * For kswapd, we kick background inode writeback. For direct
+	 * reclaim, we issue and wait on inode writeback to throttle
+	 * reclaim rates and avoid shouty OOM-death.
+	 */
+	if (current_is_kswapd())
+		xfs_ail_push_all(mp->m_ail);
+	else
+		sync_mode |= SYNC_WAIT;
+
+	return xfs_reclaim_inodes_ag(mp, sync_mode, &nr_to_scan);
 }

 /*

From patchwork Thu Aug 1 02:17:46 2019
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 11069971
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 18/24] xfs: reduce kswapd blocking on inode locking.
Date: Thu, 1 Aug 2019 12:17:46 +1000
Message-Id: <20190801021752.4986-19-david@fromorbit.com>
In-Reply-To: <20190801021752.4986-1-david@fromorbit.com>
References: <20190801021752.4986-1-david@fromorbit.com>

From: Dave Chinner

When doing async inode reclaim, we grab a batch of inodes that we are likely able to reclaim and ignore those that are already flushing. However, when we actually go to reclaim them, the first thing we do is lock the inode. If we are racing with something else reclaiming the inode or flushing it because it is dirty, we block on the inode lock. Hence we can still block kswapd here.

Further, if we flush an inode, we also cluster all the other dirty inodes in that cluster into the same IO, flush-locking them all. However, if the workload is operating on sequential inodes (e.g. created by a tarball extraction), most of these inodes will be sequential in the cache and so land in the same batch we've already grabbed for reclaim scanning. As a result, it is common for all the inodes in the batch to be dirty, and it is common for the first inode flushed to also flush all the other inodes in the reclaim batch. In that case they are now all flush-locked, and we do not want to block on them.

Hence, for async reclaim (SYNC_TRYLOCK), make sure we always use trylock semantics and abort reclaim of an inode as quickly as we can without blocking kswapd.
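The back-off behaviour described above can be modelled in a few lines of plain C. This is an illustrative sketch only — the flag-based `trylock()` and `struct fake_inode` are stand-ins for the kernel's inode lock and flush lock, not XFS code:

```c
#include <stdbool.h>

/*
 * Illustrative model, not kernel code: each "lock" is just a flag, and
 * trylock fails when the flag is already set.  This mirrors the shape of
 * the SYNC_TRYLOCK path added to xfs_reclaim_inode(): async reclaim never
 * blocks, it backs off as soon as either lock is contended.
 */
struct fake_inode {
	bool ilock_held;	/* stands in for XFS_ILOCK_EXCL */
	bool iflock_held;	/* stands in for the inode flush lock */
};

static bool trylock(bool *held)
{
	if (*held)
		return false;
	*held = true;
	return true;
}

/* Returns true if reclaim may proceed; on failure nothing stays locked. */
static bool try_grab_for_reclaim(struct fake_inode *ip)
{
	if (!trylock(&ip->ilock_held))
		return false;			/* ILOCK contended: skip inode */
	if (!trylock(&ip->iflock_held)) {
		ip->ilock_held = false;		/* drop ILOCK before backing off */
		return false;			/* already being flushed: skip */
	}
	return true;
}
```

The key property, mirrored from the patch, is that every failure path releases whatever it took, so an async reclaimer such as kswapd never parks on a contended or flush-locked inode.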
Found via tracing, which showed big batches of repeated lock/unlock runs on inodes that we had just flushed by write clustering during reclaim.

Signed-off-by: Dave Chinner
---
 fs/xfs/xfs_icache.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 2fa2f8dcf86b..e6b9030875b9 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1104,11 +1104,23 @@ xfs_reclaim_inode(
 restart:
 	error = 0;
-	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	if (!xfs_iflock_nowait(ip)) {
-		if (!(sync_mode & SYNC_WAIT))
+	/*
+	 * Don't try to flush the inode if another inode in this cluster has
+	 * already flushed it after we did the initial checks in
+	 * xfs_reclaim_inode_grab().
+	 */
+	if (sync_mode & SYNC_TRYLOCK) {
+		if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL))
 			goto out;
-		xfs_iflock(ip);
+		if (!xfs_iflock_nowait(ip))
+			goto out_unlock;
+	} else {
+		xfs_ilock(ip, XFS_ILOCK_EXCL);
+		if (!xfs_iflock_nowait(ip)) {
+			if (!(sync_mode & SYNC_WAIT))
+				goto out_unlock;
+			xfs_iflock(ip);
+		}
 	}

 	if (XFS_FORCED_SHUTDOWN(ip->i_mount)) {
@@ -1215,9 +1227,10 @@ xfs_reclaim_inode(

 out_ifunlock:
 	xfs_ifunlock(ip);
+out_unlock:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 out:
 	xfs_iflags_clear(ip, XFS_IRECLAIM);
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	/*
 	 * We could return -EAGAIN here to make reclaim rescan the inode tree in
 	 * a short while.
 	 * However, this just burns CPU time scanning the tree

From patchwork Thu Aug 1 02:17:47 2019
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 11070049
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 19/24] xfs: kill background reclaim work
Date: Thu, 1 Aug 2019 12:17:47 +1000
Message-Id: <20190801021752.4986-20-david@fromorbit.com>
In-Reply-To: <20190801021752.4986-1-david@fromorbit.com>
References: <20190801021752.4986-1-david@fromorbit.com>

From: Dave Chinner

This function is now entirely done by kswapd, so we don't need the worker thread to do async reclaim anymore.

Signed-off-by: Dave Chinner
---
 fs/xfs/xfs_icache.c | 44 --------------------------------------------
 fs/xfs/xfs_icache.h |  2 --
 fs/xfs/xfs_mount.c  |  2 --
 fs/xfs/xfs_mount.h  |  2 --
 fs/xfs/xfs_super.c  | 11 +----------
 5 files changed, 1 insertion(+), 60 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index e6b9030875b9..0bd4420a7e16 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -138,44 +138,6 @@ xfs_inode_free(
 	__xfs_inode_free(ip);
 }

-/*
- * Queue a new inode reclaim pass if there are reclaimable inodes and there
- * isn't a reclaim pass already in progress. By default it runs every 5s based
- * on the xfs periodic sync default of 30s. Perhaps this should have it's own
- * tunable, but that can be done if this method proves to be ineffective or too
- * aggressive.
- */
-static void
-xfs_reclaim_work_queue(
-	struct xfs_mount        *mp)
-{
-
-	rcu_read_lock();
-	if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_RECLAIM_TAG)) {
-		queue_delayed_work(mp->m_reclaim_workqueue, &mp->m_reclaim_work,
-			msecs_to_jiffies(xfs_syncd_centisecs / 6 * 10));
-	}
-	rcu_read_unlock();
-}
-
-/*
- * This is a fast pass over the inode cache to try to get reclaim moving on as
- * many inodes as possible in a short period of time. It kicks itself every few
- * seconds, as well as being kicked by the inode cache shrinker when memory
- * goes low. It scans as quickly as possible avoiding locked inodes or those
- * already being flushed, and once done schedules a future pass.
- */
-void
-xfs_reclaim_worker(
-	struct work_struct *work)
-{
-	struct xfs_mount *mp = container_of(to_delayed_work(work),
-					struct xfs_mount, m_reclaim_work);
-
-	xfs_reclaim_inodes(mp, SYNC_TRYLOCK);
-	xfs_reclaim_work_queue(mp);
-}
-
 static void
 xfs_perag_set_reclaim_tag(
 	struct xfs_perag	*pag)
@@ -192,9 +154,6 @@ xfs_perag_set_reclaim_tag(
 			   XFS_ICI_RECLAIM_TAG);
 	spin_unlock(&mp->m_perag_lock);

-	/* schedule periodic background inode reclaim */
-	xfs_reclaim_work_queue(mp);
-
 	trace_xfs_perag_set_reclaim(mp, pag->pag_agno, -1, _RET_IP_);
 }

@@ -1393,9 +1352,6 @@ xfs_reclaim_inodes_nr(
 {
 	int			sync_mode = SYNC_TRYLOCK;

-	/* kick background reclaimer */
-	xfs_reclaim_work_queue(mp);
-
 	/*
 	 * For kswapd, we kick background inode writeback.
 	 * For direct reclaim, we issue and wait on inode writeback to throttle

diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index 48f1fd2bb6ad..4c0d8920cc54 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -49,8 +49,6 @@ int xfs_iget(struct xfs_mount *mp, struct xfs_trans *tp, xfs_ino_t ino,
 struct xfs_inode * xfs_inode_alloc(struct xfs_mount *mp, xfs_ino_t ino);
 void xfs_inode_free(struct xfs_inode *ip);

-void xfs_reclaim_worker(struct work_struct *work);
-
 int xfs_reclaim_inodes(struct xfs_mount *mp, int mode);
 int xfs_reclaim_inodes_count(struct xfs_mount *mp);
 long xfs_reclaim_inodes_nr(struct xfs_mount *mp, int nr_to_scan);

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 322da6909290..a1805021c92f 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -988,7 +988,6 @@ xfs_mountfs(
 	 * qm_unmount_quotas and therefore rely on qm_unmount to release the
 	 * quota inodes.
 	 */
-	cancel_delayed_work_sync(&mp->m_reclaim_work);
 	xfs_reclaim_inodes(mp, SYNC_WAIT);
 	xfs_health_unmount(mp);
 out_log_dealloc:
@@ -1071,7 +1070,6 @@ xfs_unmountfs(
 	 * reclaim just to be sure. We can stop background inode reclaim
 	 * here as well if it is still running.
 	 */
-	cancel_delayed_work_sync(&mp->m_reclaim_work);
 	xfs_reclaim_inodes(mp, SYNC_WAIT);
 	xfs_health_unmount(mp);

diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index fdb60e09a9c5..f0cc952ad527 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -165,7 +165,6 @@ typedef struct xfs_mount {
 	uint			m_chsize;	/* size of next field */
 	atomic_t		m_active_trans;	/* number trans frozen */
 	struct xfs_mru_cache	*m_filestream;  /* per-mount filestream data */
-	struct delayed_work	m_reclaim_work;	/* background inode reclaim */
 	struct delayed_work	m_eofblocks_work; /* background eof blocks
 						     trimming */
 	struct delayed_work	m_cowblocks_work; /* background cow blocks
@@ -182,7 +181,6 @@ typedef struct xfs_mount {
 	struct workqueue_struct	*m_buf_workqueue;
 	struct workqueue_struct	*m_unwritten_workqueue;
 	struct workqueue_struct	*m_cil_workqueue;
-	struct workqueue_struct	*m_reclaim_workqueue;
 	struct workqueue_struct	*m_eofblocks_workqueue;
 	struct workqueue_struct	*m_sync_workqueue;

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 67b59815d0df..09e41c6c1794 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -822,15 +822,10 @@ xfs_init_mount_workqueues(
 	if (!mp->m_cil_workqueue)
 		goto out_destroy_unwritten;

-	mp->m_reclaim_workqueue = alloc_workqueue("xfs-reclaim/%s",
-			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0, mp->m_fsname);
-	if (!mp->m_reclaim_workqueue)
-		goto out_destroy_cil;
-
 	mp->m_eofblocks_workqueue = alloc_workqueue("xfs-eofblocks/%s",
 			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0, mp->m_fsname);
 	if (!mp->m_eofblocks_workqueue)
-		goto out_destroy_reclaim;
+		goto out_destroy_cil;

 	mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s", WQ_FREEZABLE, 0,
 					       mp->m_fsname);
@@ -841,8 +836,6 @@ xfs_init_mount_workqueues(

 out_destroy_eofb:
 	destroy_workqueue(mp->m_eofblocks_workqueue);
-out_destroy_reclaim:
-	destroy_workqueue(mp->m_reclaim_workqueue);
 out_destroy_cil:
 	destroy_workqueue(mp->m_cil_workqueue);
 out_destroy_unwritten:
@@ -859,7 +852,6 @@ xfs_destroy_mount_workqueues(
 {
 	destroy_workqueue(mp->m_sync_workqueue);
 	destroy_workqueue(mp->m_eofblocks_workqueue);
-	destroy_workqueue(mp->m_reclaim_workqueue);
 	destroy_workqueue(mp->m_cil_workqueue);
 	destroy_workqueue(mp->m_unwritten_workqueue);
 	destroy_workqueue(mp->m_buf_workqueue);
@@ -1557,7 +1549,6 @@ xfs_mount_alloc(
 	spin_lock_init(&mp->m_perag_lock);
 	mutex_init(&mp->m_growlock);
 	atomic_set(&mp->m_active_trans, 0);
-	INIT_DELAYED_WORK(&mp->m_reclaim_work, xfs_reclaim_worker);
 	INIT_DELAYED_WORK(&mp->m_eofblocks_work, xfs_eofblocks_worker);
 	INIT_DELAYED_WORK(&mp->m_cowblocks_work, xfs_cowblocks_worker);
 	mp->m_kobj.kobject.kset = xfs_kset;

From patchwork Thu Aug 1 02:17:48 2019
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 11070007
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 20/24] xfs: use AIL pushing for inode reclaim IO
Date: Thu, 1 Aug 2019 12:17:48 +1000
Message-Id: <20190801021752.4986-21-david@fromorbit.com>
In-Reply-To: <20190801021752.4986-1-david@fromorbit.com>
References: <20190801021752.4986-1-david@fromorbit.com>

From: Dave Chinner

Inode reclaim currently issues its own inode IO when it comes
across dirty inodes. This is used to throttle direct reclaim down
to the rate at which we can reclaim dirty inodes. Failure to
throttle in this manner results in the OOM killer being trivial to
trigger even when there is lots of free memory available.

However, having direct reclaimers issue IO causes an amount of
IO thrashing to occur. We can have up to the number of AGs in the
filesystem concurrently issuing IO, plus the AIL pushing thread as
well. This means we can have many competing sources of IO, and they
all end up thrashing and competing for the request slots in the
block device.

Similar to dirty page throttling and the BDI flusher thread, we
can make the AIL pushing thread the sole place we issue inode
writeback from, with everything else waiting for it to make
progress. To do this, reclaim will skip over dirty inodes, but in
doing so will record the lowest LSN of all the dirty inodes it
skips. It will then push the AIL to this LSN and wait for it to
complete that work. In doing so, we block direct reclaim on the
completion of at least one IO, thereby providing some level of
throttling for when we encounter dirty inodes. However, we gain the
ability to scan and reclaim clean inodes in a non-blocking fashion.

This allows us to remove all the per-ag reclaim locking that
avoids excessive direct reclaim, as repeated concurrent direct
reclaim will hit the same dirty inodes and block waiting on the
same IO to complete. Hence direct reclaim will be throttled
directly by the rate at which dirty inodes are cleaned by AIL
pushing, rather than by delays caused by competing IO submissions.
This allows us to remove all the locking that limits direct reclaim
concurrency and greatly simplifies the inode reclaim code now that
it just skips dirty inodes.

Note: this patch by itself isn't completely able to throttle direct
reclaim sufficiently to prevent OOM killer madness.
We can't do that until we change the way we index reclaimable
inodes in the next patch and can feed state back to the mm core
sanely. However, we can't change the way we index reclaimable
inodes until we have IO-less non-blocking reclaim for both direct
reclaim and kswapd reclaim. Catch-22...

Signed-off-by: Dave Chinner
---
 fs/xfs/xfs_icache.c    | 208 ++++++++++++++++-------------------
 fs/xfs/xfs_mount.c     |   4 -
 fs/xfs/xfs_mount.h     |   1 -
 fs/xfs/xfs_trans_ail.c |   4 +-
 4 files changed, 90 insertions(+), 127 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 0bd4420a7e16..4c4c5bc12147 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -22,6 +22,7 @@
 #include "xfs_dquot_item.h"
 #include "xfs_dquot.h"
 #include "xfs_reflink.h"
+#include "xfs_log.h"

 #include

@@ -967,28 +968,42 @@ xfs_inode_ag_iterator_tag(
 }

 /*
- * Grab the inode for reclaim exclusively.
- * Return 0 if we grabbed it, non-zero otherwise.
+ * Grab the inode for reclaim.
+ *
+ * Return false if we aren't going to reclaim it, true if it is a reclaim
+ * candidate.
+ *
+ * If the inode is clean or unreclaimable, return NULLCOMMITLSN to tell the
+ * caller it does not require flushing. Otherwise return the log item lsn of the
+ * inode so the caller can determine its inode flush target. If we get the
+ * clean/dirty state wrong then it will be sorted in xfs_reclaim_inode() once we
+ * have locks held.
  */
-STATIC int
+STATIC bool
 xfs_reclaim_inode_grab(
	struct xfs_inode	*ip,
-	int			flags)
+	int			flags,
+	xfs_lsn_t		*lsn)
 {
	ASSERT(rcu_read_lock_held());

+	*lsn = 0;
+
	/* quick check for stale RCU freed inode */
	if (!ip->i_ino)
-		return 1;
+		return false;

	/*
-	 * If we are asked for non-blocking operation, do unlocked checks to
-	 * see if the inode already is being flushed or in reclaim to avoid
-	 * lock traffic.
+	 * Do unlocked checks to see if the inode already is being flushed or in
+	 * reclaim to avoid lock traffic. If the inode is not clean, return
+	 * its position in the AIL for the caller to push to.
	 */
-	if ((flags & SYNC_TRYLOCK) &&
-	    __xfs_iflags_test(ip, XFS_IFLOCK | XFS_IRECLAIM))
-		return 1;
+	if (!xfs_inode_clean(ip)) {
+		*lsn = ip->i_itemp->ili_item.li_lsn;
+		return false;
+	}
+
+	if (__xfs_iflags_test(ip, XFS_IFLOCK | XFS_IRECLAIM))
+		return false;

	/*
	 * The radix tree lock here protects a thread in xfs_iget from racing
@@ -1005,11 +1020,11 @@ xfs_reclaim_inode_grab(
	    __xfs_iflags_test(ip, XFS_IRECLAIM)) {
		/* not a reclaim candidate. */
		spin_unlock(&ip->i_flags_lock);
-		return 1;
+		return false;
	}
	__xfs_iflags_set(ip, XFS_IRECLAIM);
	spin_unlock(&ip->i_flags_lock);
-	return 0;
+	return true;
 }

 /*
@@ -1050,92 +1065,67 @@ xfs_reclaim_inode_grab(
  *	clean		=> reclaim
  *	dirty, async	=> requeue
  *	dirty, sync	=> flush, wait and reclaim
+ *
+ * Returns true if the inode was reclaimed, false otherwise.
  */
-STATIC int
+STATIC bool
 xfs_reclaim_inode(
	struct xfs_inode	*ip,
	struct xfs_perag	*pag,
-	int			sync_mode)
+	xfs_lsn_t		*lsn)
 {
-	struct xfs_buf		*bp = NULL;
-	xfs_ino_t		ino = ip->i_ino; /* for radix_tree_delete */
-	int			error;
+	xfs_ino_t		ino;
+
+	*lsn = 0;

-restart:
-	error = 0;
	/*
	 * Don't try to flush the inode if another inode in this cluster has
	 * already flushed it after we did the initial checks in
	 * xfs_reclaim_inode_grab().
	 */
-	if (sync_mode & SYNC_TRYLOCK) {
-		if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL))
-			goto out;
-		if (!xfs_iflock_nowait(ip))
-			goto out_unlock;
-	} else {
-		xfs_ilock(ip, XFS_ILOCK_EXCL);
-		if (!xfs_iflock_nowait(ip)) {
-			if (!(sync_mode & SYNC_WAIT))
-				goto out_unlock;
-			xfs_iflock(ip);
-		}
-	}
+	if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL))
+		goto out;
+	if (!xfs_iflock_nowait(ip))
+		goto out_unlock;

+	/* If we are in shutdown, we don't care about blocking. */
	if (XFS_FORCED_SHUTDOWN(ip->i_mount)) {
		xfs_iunpin_wait(ip);
		/* xfs_iflush_abort() drops the flush lock */
		xfs_iflush_abort(ip, false);
		goto reclaim;
	}
-	if (xfs_ipincount(ip)) {
-		if (!(sync_mode & SYNC_WAIT))
-			goto out_ifunlock;
-		xfs_iunpin_wait(ip);
-	}
-	if (xfs_iflags_test(ip, XFS_ISTALE) || xfs_inode_clean(ip)) {
-		xfs_ifunlock(ip);
-		goto reclaim;
-	}

	/*
-	 * Never flush out dirty data during non-blocking reclaim, as it would
-	 * just contend with AIL pushing trying to do the same job.
+	 * If it is pinned, we only want to flush this if there's nothing else
+	 * to be flushed as it requires a log force. Hence we essentially set
+	 * the LSN to flush the entire AIL which will end up triggering a log
+	 * force to unpin this inode, but that will only happen if there are not
+	 * other inodes in the scan that only need writeback.
	 */
-	if (!(sync_mode & SYNC_WAIT))
+	if (xfs_ipincount(ip)) {
+		*lsn = ip->i_itemp->ili_last_lsn;
		goto out_ifunlock;
+	}

	/*
-	 * Now we have an inode that needs flushing.
-	 *
-	 * Note that xfs_iflush will never block on the inode buffer lock, as
-	 * xfs_ifree_cluster() can lock the inode buffer before it locks the
-	 * ip->i_lock, and we are doing the exact opposite here. As a result,
-	 * doing a blocking xfs_imap_to_bp() to get the cluster buffer would
-	 * result in an ABBA deadlock with xfs_ifree_cluster().
-	 *
-	 * As xfs_ifree_cluser() must gather all inodes that are active in the
-	 * cache to mark them stale, if we hit this case we don't actually want
-	 * to do IO here - we want the inode marked stale so we can simply
-	 * reclaim it. Hence if we get an EAGAIN error here, just unlock the
-	 * inode, back off and try again. Hopefully the next pass through will
-	 * see the stale flag set on the inode.
+	 * Dirty inode we didn't catch, skip it.
	 */
-	error = xfs_iflush(ip, &bp);
-	if (error == -EAGAIN) {
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
-		/* backoff longer than in xfs_ifree_cluster */
-		delay(2);
-		goto restart;
+	if (!xfs_inode_clean(ip) && !xfs_iflags_test(ip, XFS_ISTALE)) {
+		*lsn = ip->i_itemp->ili_item.li_lsn;
+		goto out_ifunlock;
	}
-	if (!error) {
-		error = xfs_bwrite(bp);
-		xfs_buf_relse(bp);
-	}

+	/*
+	 * It's clean, we have it locked, we can now drop the flush lock
+	 * and reclaim it.
+	 */
+	xfs_ifunlock(ip);

 reclaim:
	ASSERT(!xfs_isiflocked(ip));
+	ASSERT(xfs_inode_clean(ip) || xfs_iflags_test(ip, XFS_ISTALE));
+	ASSERT(ip->i_ino != 0);

	/*
	 * Because we use RCU freeing we need to ensure the inode always appears
@@ -1148,6 +1138,7 @@ xfs_reclaim_inode(
	 * will see an invalid inode that it can skip.
	 */
	spin_lock(&ip->i_flags_lock);
+	ino = ip->i_ino; /* for radix_tree_delete */
	ip->i_flags = XFS_IRECLAIM;
	ip->i_ino = 0;
	spin_unlock(&ip->i_flags_lock);
@@ -1182,7 +1173,7 @@ xfs_reclaim_inode(
	xfs_iunlock(ip, XFS_ILOCK_EXCL);

	__xfs_inode_free(ip);
-	return error;
+	return true;

 out_ifunlock:
	xfs_ifunlock(ip);
@@ -1190,14 +1181,7 @@ xfs_reclaim_inode(
	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 out:
	xfs_iflags_clear(ip, XFS_IRECLAIM);
-	/*
-	 * We could return -EAGAIN here to make reclaim rescan the inode tree in
-	 * a short while. However, this just burns CPU time scanning the tree
-	 * waiting for IO to complete and the reclaim work never goes back to
-	 * the idle state. Instead, return 0 to let the next scheduled
-	 * background reclaim attempt to reclaim the inode again.
-	 */
-	return 0;
+	return false;
 }

 /*
@@ -1205,39 +1189,28 @@ xfs_reclaim_inode(
  * corrupted, we still want to try to reclaim all the inodes. If we don't,
  * then a shut down during filesystem unmount reclaim walk leak all the
  * unreclaimed inodes.
+ *
+ * Return the number of inodes freed.
  */
 STATIC int
 xfs_reclaim_inodes_ag(
	struct xfs_mount	*mp,
	int			flags,
-	int			*nr_to_scan)
+	int			nr_to_scan)
 {
	struct xfs_perag	*pag;
-	int			error = 0;
-	int			last_error = 0;
	xfs_agnumber_t		ag;
-	int			trylock = flags & SYNC_TRYLOCK;
-	int			skipped;
+	xfs_lsn_t		lsn, lowest_lsn = NULLCOMMITLSN;
+	long			freed = 0;

-restart:
	ag = 0;
-	skipped = 0;
	while ((pag = xfs_perag_get_tag(mp, ag, XFS_ICI_RECLAIM_TAG))) {
		unsigned long	first_index = 0;
		int		done = 0;
		int		nr_found = 0;

		ag = pag->pag_agno + 1;
-
-		if (trylock) {
-			if (!mutex_trylock(&pag->pag_ici_reclaim_lock)) {
-				skipped++;
-				xfs_perag_put(pag);
-				continue;
-			}
-			first_index = pag->pag_ici_reclaim_cursor;
-		} else
-			mutex_lock(&pag->pag_ici_reclaim_lock);
+		first_index = pag->pag_ici_reclaim_cursor;

		do {
			struct xfs_inode *batch[XFS_LOOKUP_BATCH];
@@ -1262,9 +1235,13 @@ xfs_reclaim_inodes_ag(
			for (i = 0; i < nr_found; i++) {
				struct xfs_inode *ip = batch[i];

-				if (done || xfs_reclaim_inode_grab(ip, flags))
+				if (done ||
+				    !xfs_reclaim_inode_grab(ip, flags, &lsn))
					batch[i] = NULL;

+				if (lsn && XFS_LSN_CMP(lsn, lowest_lsn) < 0)
+					lowest_lsn = lsn;
+
				/*
				 * Update the index for the next lookup. Catch
				 * overflows into the next AG range which can
@@ -1293,37 +1270,28 @@ xfs_reclaim_inodes_ag(
			for (i = 0; i < nr_found; i++) {
				if (!batch[i])
					continue;
-				error = xfs_reclaim_inode(batch[i], pag, flags);
-				if (error && last_error != -EFSCORRUPTED)
-					last_error = error;
+				if (xfs_reclaim_inode(batch[i], pag, &lsn))
+					freed++;
+				if (lsn && XFS_LSN_CMP(lsn, lowest_lsn) < 0)
+					lowest_lsn = lsn;
			}

-			*nr_to_scan -= XFS_LOOKUP_BATCH;
-
+			nr_to_scan -= XFS_LOOKUP_BATCH;
			cond_resched();
-
-		} while (nr_found && !done && *nr_to_scan > 0);
+		} while (nr_found && !done && nr_to_scan > 0);

-		if (trylock && !done)
+		if (!done)
			pag->pag_ici_reclaim_cursor = first_index;
		else
			pag->pag_ici_reclaim_cursor = 0;
-		mutex_unlock(&pag->pag_ici_reclaim_lock);
		xfs_perag_put(pag);
	}

-	/*
-	 * if we skipped any AG, and we still have scan count remaining, do
-	 * another pass this time using blocking reclaim semantics (i.e
-	 * waiting on the reclaim locks and ignoring the reclaim cursors). This
-	 * ensure that when we get more reclaimers than AGs we block rather
-	 * than spin trying to execute reclaim.
-	 */
-	if (skipped && (flags & SYNC_WAIT) && *nr_to_scan > 0) {
-		trylock = 0;
-		goto restart;
-	}
-	return last_error;
+	if ((flags & SYNC_WAIT) && lowest_lsn != NULLCOMMITLSN)
+		xfs_ail_push_sync(mp->m_ail, lowest_lsn);
+
+	return freed;
 }

 int
@@ -1331,9 +1299,7 @@ xfs_reclaim_inodes(
	xfs_mount_t	*mp,
	int		mode)
 {
-	int		nr_to_scan = INT_MAX;
-
-	return xfs_reclaim_inodes_ag(mp, mode, &nr_to_scan);
+	return xfs_reclaim_inodes_ag(mp, mode, INT_MAX);
 }

 /*
@@ -1350,7 +1316,7 @@ xfs_reclaim_inodes_nr(
	struct xfs_mount	*mp,
	int			nr_to_scan)
 {
-	int			sync_mode = SYNC_TRYLOCK;
+	int			sync_mode = 0;

	/*
	 * For kswapd, we kick background inode writeback. For direct
@@ -1362,7 +1328,7 @@ xfs_reclaim_inodes_nr(
	else
		sync_mode |= SYNC_WAIT;

-	return xfs_reclaim_inodes_ag(mp, sync_mode, &nr_to_scan);
+	return xfs_reclaim_inodes_ag(mp, sync_mode, nr_to_scan);
 }

 /*
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index a1805021c92f..bcf8f64d1b1f 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -148,7 +148,6 @@ xfs_free_perag(
		ASSERT(atomic_read(&pag->pag_ref) == 0);
		xfs_iunlink_destroy(pag);
		xfs_buf_hash_destroy(pag);
-		mutex_destroy(&pag->pag_ici_reclaim_lock);
		call_rcu(&pag->rcu_head, __xfs_free_perag);
	}
 }
@@ -200,7 +199,6 @@ xfs_initialize_perag(
		pag->pag_agno = index;
		pag->pag_mount = mp;
		spin_lock_init(&pag->pag_ici_lock);
-		mutex_init(&pag->pag_ici_reclaim_lock);
		INIT_RADIX_TREE(&pag->pag_ici_root, GFP_ATOMIC);
		if (xfs_buf_hash_init(pag))
			goto out_free_pag;
@@ -242,7 +240,6 @@ xfs_initialize_perag(
 out_hash_destroy:
	xfs_buf_hash_destroy(pag);
 out_free_pag:
-	mutex_destroy(&pag->pag_ici_reclaim_lock);
	kmem_free(pag);
 out_unwind_new_pags:
	/* unwind any prior newly initialized pags */
@@ -252,7 +249,6 @@ xfs_initialize_perag(
			break;
		xfs_buf_hash_destroy(pag);
		xfs_iunlink_destroy(pag);
-		mutex_destroy(&pag->pag_ici_reclaim_lock);
		kmem_free(pag);
	}
	return error;
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index f0cc952ad527..2049e764faed 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -383,7 +383,6 @@ typedef struct xfs_perag {
	spinlock_t	pag_ici_lock;	/* incore inode cache lock */
	struct radix_tree_root pag_ici_root;	/* incore inode cache root */
	int		pag_ici_reclaimable;	/* reclaimable inodes */
-	struct mutex	pag_ici_reclaim_lock;	/* serialisation point */
	unsigned long	pag_ici_reclaim_cursor;	/* reclaim restart point */

	/* buffer cache index */
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 00d66175f41a..5802139f786b 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -676,8 +676,10 @@ xfs_ail_push_sync(
	spin_lock(&ailp->ail_lock);
	while ((lip = xfs_ail_min(ailp)) != NULL) {
		prepare_to_wait(&ailp->ail_push, &wait, TASK_UNINTERRUPTIBLE);
+		trace_printk("lip lsn 0x%llx thres 0x%llx targ 0x%llx",
+			     lip->li_lsn, threshold_lsn, ailp->ail_target);
		if (XFS_FORCED_SHUTDOWN(ailp->ail_mount) ||
-		    XFS_LSN_CMP(threshold_lsn, lip->li_lsn) <= 0)
+		    XFS_LSN_CMP(threshold_lsn, lip->li_lsn) < 0)
			break;
		/* XXX: cmpxchg? */
		while (XFS_LSN_CMP(threshold_lsn, ailp->ail_target) > 0)

From patchwork Thu Aug 1 02:17:49 2019
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 21/24] xfs: remove mode from xfs_reclaim_inodes()
Date: Thu, 1 Aug 2019 12:17:49 +1000
Message-Id: <20190801021752.4986-22-david@fromorbit.com>
In-Reply-To: <20190801021752.4986-1-david@fromorbit.com>
References: <20190801021752.4986-1-david@fromorbit.com>

From: Dave Chinner

Because it's always SYNC_WAIT now.
Signed-off-by: Dave Chinner
---
 fs/xfs/xfs_icache.c | 7 +++----
 fs/xfs/xfs_icache.h | 2 +-
 fs/xfs/xfs_mount.c  | 4 ++--
 fs/xfs/xfs_super.c  | 3 +--
 4 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 4c4c5bc12147..aaa1f840a86c 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1294,12 +1294,11 @@ xfs_reclaim_inodes_ag(
	return freed;
 }

-int
+void
 xfs_reclaim_inodes(
-	xfs_mount_t	*mp,
-	int		mode)
+	struct xfs_mount	*mp)
 {
-	return xfs_reclaim_inodes_ag(mp, mode, INT_MAX);
+	xfs_reclaim_inodes_ag(mp, SYNC_WAIT, INT_MAX);
 }

 /*
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index 4c0d8920cc54..1c9b9edb2986 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -49,7 +49,7 @@ int xfs_iget(struct xfs_mount *mp, struct xfs_trans *tp, xfs_ino_t ino,
 struct xfs_inode * xfs_inode_alloc(struct xfs_mount *mp, xfs_ino_t ino);
 void xfs_inode_free(struct xfs_inode *ip);

-int xfs_reclaim_inodes(struct xfs_mount *mp, int mode);
+void xfs_reclaim_inodes(struct xfs_mount *mp);
 int xfs_reclaim_inodes_count(struct xfs_mount *mp);
 long xfs_reclaim_inodes_nr(struct xfs_mount *mp, int nr_to_scan);

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index bcf8f64d1b1f..e851b9cfbabd 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -984,7 +984,7 @@ xfs_mountfs(
	 * qm_unmount_quotas and therefore rely on qm_unmount to release the
	 * quota inodes.
	 */
-	xfs_reclaim_inodes(mp, SYNC_WAIT);
+	xfs_reclaim_inodes(mp);
	xfs_health_unmount(mp);
 out_log_dealloc:
	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
@@ -1066,7 +1066,7 @@ xfs_unmountfs(
	 * reclaim just to be sure. We can stop background inode reclaim
	 * here as well if it is still running.
	 */
-	xfs_reclaim_inodes(mp, SYNC_WAIT);
+	xfs_reclaim_inodes(mp);
	xfs_health_unmount(mp);

	xfs_qm_unmount(mp);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 09e41c6c1794..a59d3a21be5c 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1179,8 +1179,7 @@ xfs_quiesce_attr(
	xfs_log_force(mp, XFS_LOG_SYNC);

	/* reclaim inodes to do any IO before the freeze completes */
-	xfs_reclaim_inodes(mp, 0);
-	xfs_reclaim_inodes(mp, SYNC_WAIT);
+	xfs_reclaim_inodes(mp);

	/* Push the superblock and write an unmount record */
	error = xfs_log_sbcount(mp);

From patchwork Thu Aug 1 02:17:50 2019
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 22/24] xfs: track reclaimable inodes using a LRU list
Date: Thu, 1 Aug 2019 12:17:50 +1000
Message-Id: <20190801021752.4986-23-david@fromorbit.com>
In-Reply-To: <20190801021752.4986-1-david@fromorbit.com>
References: <20190801021752.4986-1-david@fromorbit.com>

From: Dave Chinner

Now that we don't do IO from the inode reclaim code, there is no
need to optimise inode scanning
order for optimal IO characteristics. The AIL takes care of that
for us, so now reclaim can focus on selecting the best inodes to
reclaim.

Hence we can change the inode reclaim algorithm to a real LRU and
remove the need to use the radix tree to track and walk inodes under
reclaim. This frees up a radix tree bit and simplifies the code that
marks inodes as reclaim candidates. It also simplifies the reclaim
code - we don't need batching anymore and all the reclaim logic
can be added to the LRU isolation callback.

Further, we get node aware reclaim at the xfs_inode level, which
should help the per-node reclaim code free relevant inodes faster.

We can re-use the VFS inode lru pointers - once the inode has been
reclaimed from the VFS, we can use these pointers ourselves. Hence
we don't need to grow the inode to change the way we index
reclaimable inodes.

Start by adding the list_lru tracking in parallel with the existing
reclaim code. This makes it easier to see the LRU infrastructure
separate to the reclaim algorithm changes. Especially the locking
order, which is ip->i_flags_lock -> list_lru lock.

Signed-off-by: Dave Chinner
---
 fs/xfs/xfs_icache.c | 31 +++++++------------------------
 fs/xfs/xfs_icache.h |  1 -
 fs/xfs/xfs_mount.h  |  1 +
 fs/xfs/xfs_super.c  | 31 ++++++++++++++++++++++++-------
 4 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index aaa1f840a86c..610f643df9f6 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -370,12 +370,11 @@ xfs_iget_cache_hit(

		/*
		 * We need to set XFS_IRECLAIM to prevent xfs_reclaim_inode
-		 * from stomping over us while we recycle the inode. We can't
-		 * clear the radix tree reclaimable tag yet as it requires
-		 * pag_ici_lock to be held exclusive.
+		 * from stomping over us while we recycle the inode. Remove it
+		 * from the LRU straight away so we can re-init the VFS inode.
		 */
		ip->i_flags |= XFS_IRECLAIM;
-
+		list_lru_del(&mp->m_inode_lru, &inode->i_lru);
		spin_unlock(&ip->i_flags_lock);
		rcu_read_unlock();
@@ -390,6 +389,7 @@ xfs_iget_cache_hit(
		spin_lock(&ip->i_flags_lock);
		wake = !!__xfs_iflags_test(ip, XFS_INEW);
		ip->i_flags &= ~(XFS_INEW | XFS_IRECLAIM);
+		list_lru_add(&mp->m_inode_lru, &inode->i_lru);
		if (wake)
			wake_up_bit(&ip->i_flags, __XFS_INEW_BIT);
		ASSERT(ip->i_flags & XFS_IRECLAIMABLE);
@@ -1141,6 +1141,9 @@ xfs_reclaim_inode(
	ino = ip->i_ino; /* for radix_tree_delete */
	ip->i_flags = XFS_IRECLAIM;
	ip->i_ino = 0;
+
+	/* XXX: temporary until lru based reclaim */
+	list_lru_del(&pag->pag_mount->m_inode_lru, &VFS_I(ip)->i_lru);
	spin_unlock(&ip->i_flags_lock);

	xfs_iunlock(ip, XFS_ILOCK_EXCL);
@@ -1330,26 +1333,6 @@ xfs_reclaim_inodes_nr(
	return xfs_reclaim_inodes_ag(mp, sync_mode, nr_to_scan);
 }

-/*
- * Return the number of reclaimable inodes in the filesystem for
- * the shrinker to determine how much to reclaim.
- */
-int
-xfs_reclaim_inodes_count(
-	struct xfs_mount	*mp)
-{
-	struct xfs_perag	*pag;
-	xfs_agnumber_t		ag = 0;
-	int			reclaimable = 0;
-
-	while ((pag = xfs_perag_get_tag(mp, ag, XFS_ICI_RECLAIM_TAG))) {
-		ag = pag->pag_agno + 1;
-		reclaimable += pag->pag_ici_reclaimable;
-		xfs_perag_put(pag);
-	}
-	return reclaimable;
-}
-
 STATIC int
 xfs_inode_match_id(
	struct xfs_inode	*ip,
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index 1c9b9edb2986..0ab08b58cd45 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -50,7 +50,6 @@ struct xfs_inode * xfs_inode_alloc(struct xfs_mount *mp, xfs_ino_t ino);
 void xfs_inode_free(struct xfs_inode *ip);

 void xfs_reclaim_inodes(struct xfs_mount *mp);
-int xfs_reclaim_inodes_count(struct xfs_mount *mp);
 long xfs_reclaim_inodes_nr(struct xfs_mount *mp, int nr_to_scan);

 void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 2049e764faed..4a4ecbc22246 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -75,6 +75,7 @@ typedef struct xfs_mount {
	uint8_t			m_rt_sick;

	struct xfs_ail		*m_ail;		/* fs active log item list */
+	struct list_lru		m_inode_lru;

	struct xfs_sb		m_sb;		/* copy of fs superblock */
	spinlock_t		m_sb_lock;	/* sb counter lock */
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index a59d3a21be5c..b5c4c1b6fd19 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -919,28 +919,30 @@ xfs_fs_destroy_inode(
	struct inode		*inode)
 {
	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_mount	*mp = ip->i_mount;

	trace_xfs_destroy_inode(ip);

	ASSERT(!rwsem_is_locked(&inode->i_rwsem));
-	XFS_STATS_INC(ip->i_mount, vn_rele);
-	XFS_STATS_INC(ip->i_mount, vn_remove);
+	XFS_STATS_INC(mp, vn_rele);
+	XFS_STATS_INC(mp, vn_remove);

	xfs_inactive(ip);

-	if (!XFS_FORCED_SHUTDOWN(ip->i_mount) && ip->i_delayed_blks) {
+	if (!XFS_FORCED_SHUTDOWN(mp) && ip->i_delayed_blks) {
		xfs_check_delalloc(ip, XFS_DATA_FORK);
		xfs_check_delalloc(ip, XFS_COW_FORK);
		ASSERT(0);
	}

-	XFS_STATS_INC(ip->i_mount, vn_reclaim);
+	XFS_STATS_INC(mp, vn_reclaim);

	/*
	 * We should never get here with one of the reclaim flags already set.
	 */
-	ASSERT_ALWAYS(!xfs_iflags_test(ip, XFS_IRECLAIMABLE));
-	ASSERT_ALWAYS(!xfs_iflags_test(ip, XFS_IRECLAIM));
+	spin_lock(&ip->i_flags_lock);
+	ASSERT_ALWAYS(!__xfs_iflags_test(ip, XFS_IRECLAIMABLE));
+	ASSERT_ALWAYS(!__xfs_iflags_test(ip, XFS_IRECLAIM));

	/*
	 * We always use background reclaim here because even if the
@@ -949,6 +951,9 @@ xfs_fs_destroy_inode(
	 * this more efficiently than we can here, so simply let background
	 * reclaim tear down all inodes.
	 */
+	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
+	list_lru_add(&mp->m_inode_lru, &VFS_I(ip)->i_lru);
+	spin_unlock(&ip->i_flags_lock);
	xfs_inode_set_reclaim_tag(ip);
 }

@@ -1541,6 +1546,15 @@ xfs_mount_alloc(
	if (!mp)
		return NULL;

+	/*
+	 * The inode lru needs to be associated with the superblock shrinker,
+	 * and like the rest of the superblock shrinker, it's memcg aware.
+	 */
+	if (list_lru_init_memcg(&mp->m_inode_lru, &sb->s_shrink)) {
+		kfree(mp);
+		return NULL;
+	}
+
	mp->m_super = sb;
	spin_lock_init(&mp->m_sb_lock);
	spin_lock_init(&mp->m_agirotor_lock);
@@ -1748,6 +1762,7 @@ xfs_fs_fill_super(
 out_free_fsname:
	sb->s_fs_info = NULL;
	xfs_free_fsname(mp);
+	list_lru_destroy(&mp->m_inode_lru);
	kfree(mp);
 out:
	return error;
@@ -1780,6 +1795,7 @@ xfs_fs_put_super(

	sb->s_fs_info = NULL;
	xfs_free_fsname(mp);
+	list_lru_destroy(&mp->m_inode_lru);
	kfree(mp);
 }

@@ -1801,7 +1817,8 @@ xfs_fs_nr_cached_objects(
	/* Paranoia: catch incorrect calls during mount setup or teardown */
	if (WARN_ON_ONCE(!sb->s_fs_info))
		return 0;
-	return xfs_reclaim_inodes_count(XFS_M(sb));
+
+	return list_lru_shrink_count(&XFS_M(sb)->m_inode_lru, sc);
 }

 static long

From patchwork Thu Aug 1 02:17:51 2019
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 11070011
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 23/24] xfs: reclaim inodes from the LRU
Date: Thu, 1 Aug 2019 12:17:51 +1000
Message-Id: <20190801021752.4986-24-david@fromorbit.com>
In-Reply-To: <20190801021752.4986-1-david@fromorbit.com>
References: <20190801021752.4986-1-david@fromorbit.com>

From: Dave Chinner

Replace the AG radix tree walking reclaim code with a list_lru walker,
giving us both node-aware and memcg-aware inode reclaim at the XFS
level. This requires adding an inode isolation function to determine
if the inode can be reclaimed, and a list walker to dispose of the
inodes that were isolated.

We want the isolation function to be non-blocking. If we can't
grab an inode then we either skip it or rotate it. If it's clean
then we skip it, if it's dirty then we rotate to give it time to be
cleaned before it is scanned again. This congregates the dirty
inodes at the tail of the LRU, which means that if we start hitting
a majority of dirty inodes either there are lots of unlinked inodes
in the reclaim list or we've reclaimed all the clean inodes and
we've looped back on the dirty inodes. Either way, this is an
indication we should tell kswapd to back off.

The non-blocking isolation function introduces a complexity for the
filesystem shutdown case. When the filesystem is shut down, we want
to free the inode even if it is dirty, and this may require
blocking. We already hold the locks needed to do this blocking, so
what we do is that we leave inodes locked - both the ILOCK and the
flush lock - while they are sitting on the dispose list to be freed
after the LRU walk completes. This allows us to process the
shutdown state outside the LRU walk where we can block safely.

Keep in mind we don't have to care about inode lock order or
blocking with inode locks held here because a) we are using
trylocks, and b) once marked with XFS_IRECLAIM they can't be found
via the LRU and inode cache lookups will abort and retry. Hence
nobody will try to lock them in any other context that might also
be holding other inode locks.

Also convert xfs_reclaim_inodes() to use a LRU walk to free all
the reclaimable inodes in the filesystem.
Signed-off-by: Dave Chinner
---
 fs/xfs/xfs_icache.c | 199 ++++++++++++++++++++++++++++++++++++++------
 fs/xfs/xfs_icache.h |  10 ++-
 fs/xfs/xfs_inode.h  |   8 ++
 fs/xfs/xfs_super.c  |  50 +++++++++--
 4 files changed, 232 insertions(+), 35 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 610f643df9f6..891fe3795c8f 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1195,7 +1195,7 @@ xfs_reclaim_inode(
  *
  * Return the number of inodes freed.
  */
-STATIC int
+int
 xfs_reclaim_inodes_ag(
 	struct xfs_mount	*mp,
 	int			flags,
@@ -1297,40 +1297,185 @@ xfs_reclaim_inodes_ag(
 	return freed;
 }
 
-void
-xfs_reclaim_inodes(
-	struct xfs_mount	*mp)
+enum lru_status
+xfs_inode_reclaim_isolate(
+	struct list_head	*item,
+	struct list_lru_one	*lru,
+	spinlock_t		*lru_lock,
+	void			*arg)
 {
-	xfs_reclaim_inodes_ag(mp, SYNC_WAIT, INT_MAX);
+	struct xfs_ireclaim_args *ra = arg;
+	struct inode		*inode = container_of(item, struct inode, i_lru);
+	struct xfs_inode	*ip = XFS_I(inode);
+	enum lru_status		ret;
+	xfs_lsn_t		lsn = 0;
+
+	/* Careful: inversion of iflags_lock and everything else here */
+	if (!spin_trylock(&ip->i_flags_lock))
+		return LRU_SKIP;
+
+	ret = LRU_ROTATE;
+	if (!xfs_inode_clean(ip) && !__xfs_iflags_test(ip, XFS_ISTALE)) {
+		lsn = ip->i_itemp->ili_item.li_lsn;
+		ra->dirty_skipped++;
+		goto out_unlock_flags;
+	}
+
+	ret = LRU_SKIP;
+	if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL))
+		goto out_unlock_flags;
+
+	if (!__xfs_iflock_nowait(ip)) {
+		lsn = ip->i_itemp->ili_item.li_lsn;
+		ra->dirty_skipped++;
+		goto out_unlock_inode;
+	}
+
+	/* if we are in shutdown, we'll reclaim it even if dirty */
+	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
+		goto reclaim;
+
+	/*
+	 * Now the inode is locked, we can actually determine if it is dirty
+	 * without racing with anything.
+	 */
+	ret = LRU_ROTATE;
+	if (xfs_ipincount(ip)) {
+		ra->dirty_skipped++;
+		goto out_ifunlock;
+	}
+	if (!xfs_inode_clean(ip) && !__xfs_iflags_test(ip, XFS_ISTALE)) {
+		lsn = ip->i_itemp->ili_item.li_lsn;
+		ra->dirty_skipped++;
+		goto out_ifunlock;
+	}
+
+reclaim:
+	/*
+	 * Once we mark the inode with XFS_IRECLAIM, no-one will grab it again.
+	 * RCU lookups will still find the inode, but they'll stop when they set
+	 * the IRECLAIM flag. Hence we can leave the inode locked as we move it
+	 * to the dispose list so we can deal with shutdown cleanup there
+	 * outside the LRU lock context.
+	 */
+	__xfs_iflags_set(ip, XFS_IRECLAIM);
+	list_lru_isolate_move(lru, &inode->i_lru, &ra->freeable);
+	spin_unlock(&ip->i_flags_lock);
+	return LRU_REMOVED;
+
+out_ifunlock:
+	xfs_ifunlock(ip);
+out_unlock_inode:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+out_unlock_flags:
+	spin_unlock(&ip->i_flags_lock);
+
+	if (lsn && XFS_LSN_CMP(lsn, ra->lowest_lsn) < 0)
+		ra->lowest_lsn = lsn;
+	return ret;
 }
 
-/*
- * Scan a certain number of inodes for reclaim.
- *
- * When called we make sure that there is a background (fast) inode reclaim in
- * progress, while we will throttle the speed of reclaim via doing synchronous
- * reclaim of inodes. That means if we come across dirty inodes, we wait for
- * them to be cleaned, which we hope will not be very long due to the
- * background walker having already kicked the IO off on those dirty inodes.
- */
-long
-xfs_reclaim_inodes_nr(
-	struct xfs_mount	*mp,
-	int			nr_to_scan)
+static void
+xfs_dispose_inode(
	struct xfs_inode	*ip)
 {
-	int			sync_mode = 0;
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_perag	*pag;
+	xfs_ino_t		ino;
+
+	ASSERT(xfs_isiflocked(ip));
+	ASSERT(xfs_inode_clean(ip) || xfs_iflags_test(ip, XFS_ISTALE));
+	ASSERT(ip->i_ino != 0);
 
 	/*
-	 * For kswapd, we kick background inode writeback. For direct
-	 * reclaim, we issue and wait on inode writeback to throttle
-	 * reclaim rates and avoid shouty OOM-death.
+	 * Process the shutdown reclaim work we deferred from the LRU isolation
+	 * callback before we go any further.
 	 */
-	if (current_is_kswapd())
-		xfs_ail_push_all(mp->m_ail);
-	else
-		sync_mode |= SYNC_WAIT;
+	if (XFS_FORCED_SHUTDOWN(mp)) {
+		xfs_iunpin_wait(ip);
+		xfs_iflush_abort(ip, false);
+	} else {
+		xfs_ifunlock(ip);
+	}
 
-	return xfs_reclaim_inodes_ag(mp, sync_mode, nr_to_scan);
+	/*
+	 * Because we use RCU freeing we need to ensure the inode always appears
+	 * to be reclaimed with an invalid inode number when in the free state.
+	 * We do this as early as possible under the ILOCK so that
+	 * xfs_iflush_cluster() and xfs_ifree_cluster() can be guaranteed to
+	 * detect races with us here. By doing this, we guarantee that once
+	 * xfs_iflush_cluster() or xfs_ifree_cluster() has locked XFS_ILOCK that
+	 * it will see either a valid inode that will serialise correctly, or it
+	 * will see an invalid inode that it can skip.
+	 */
+	spin_lock(&ip->i_flags_lock);
+	ino = ip->i_ino; /* for radix_tree_delete */
+	ip->i_flags = XFS_IRECLAIM;
+	ip->i_ino = 0;
+	spin_unlock(&ip->i_flags_lock);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+	XFS_STATS_INC(mp, xs_ig_reclaims);
+	/*
+	 * Remove the inode from the per-AG radix tree.
+	 *
+	 * Because radix_tree_delete won't complain even if the item was never
+	 * added to the tree assert that it's been there before to catch
+	 * problems with the inode life time early on.
+	 */
+	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ino));
+	spin_lock(&pag->pag_ici_lock);
+	if (!radix_tree_delete(&pag->pag_ici_root, XFS_INO_TO_AGINO(mp, ino)))
+		ASSERT(0);
+	spin_unlock(&pag->pag_ici_lock);
+	xfs_perag_put(pag);
+
+	/*
+	 * Here we do an (almost) spurious inode lock in order to coordinate
+	 * with inode cache radix tree lookups. This is because the lookup
+	 * can reference the inodes in the cache without taking references.
+	 *
+	 * We make that OK here by ensuring that we wait until the inode is
+	 * unlocked after the lookup before we go ahead and free it.
+	 *
+	 * XXX: need to check this is still true. Not sure it is.
+	 */
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	xfs_qm_dqdetach(ip);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+	__xfs_inode_free(ip);
+}
+
+void
+xfs_dispose_inodes(
+	struct list_head	*freeable)
+{
+	while (!list_empty(freeable)) {
+		struct inode *inode;
+
+		inode = list_first_entry(freeable, struct inode, i_lru);
+		list_del_init(&inode->i_lru);
+
+		xfs_dispose_inode(XFS_I(inode));
+		cond_resched();
+	}
+}
+void
+xfs_reclaim_inodes(
+	struct xfs_mount	*mp)
+{
+	while (list_lru_count(&mp->m_inode_lru)) {
+		struct xfs_ireclaim_args ra;
+
+		INIT_LIST_HEAD(&ra.freeable);
+		ra.lowest_lsn = NULLCOMMITLSN;
+		list_lru_walk(&mp->m_inode_lru, xfs_inode_reclaim_isolate,
+				&ra, LONG_MAX);
+		xfs_dispose_inodes(&ra.freeable);
+		if (ra.lowest_lsn != NULLCOMMITLSN)
+			xfs_ail_push_sync(mp->m_ail, ra.lowest_lsn);
+	}
 }
 
 STATIC int
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index 0ab08b58cd45..dadc69a30f33 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -49,8 +49,16 @@ int xfs_iget(struct xfs_mount *mp, struct xfs_trans *tp, xfs_ino_t ino,
 struct xfs_inode * xfs_inode_alloc(struct xfs_mount *mp, xfs_ino_t ino);
 void xfs_inode_free(struct xfs_inode *ip);
 
+struct xfs_ireclaim_args {
+	struct list_head	freeable;
+	xfs_lsn_t		lowest_lsn;
+	unsigned long		dirty_skipped;
+};
+
+enum lru_status xfs_inode_reclaim_isolate(struct list_head *item,
+		struct list_lru_one *lru, spinlock_t *lru_lock, void *arg);
+void xfs_dispose_inodes(struct list_head *freeable);
 void xfs_reclaim_inodes(struct xfs_mount *mp);
-long xfs_reclaim_inodes_nr(struct xfs_mount *mp, int nr_to_scan);
 
 void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 558173f95a03..463170dc4c02 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -263,6 +263,14 @@ static inline int xfs_isiflocked(struct xfs_inode *ip)
 
 extern void __xfs_iflock(struct xfs_inode *ip);
 
+static inline int
+__xfs_iflock_nowait(struct xfs_inode *ip)
+{
+	if (ip->i_flags & XFS_IFLOCK)
+		return false;
+	ip->i_flags |= XFS_IFLOCK;
+	return true;
+}
+
 static inline int xfs_iflock_nowait(struct xfs_inode *ip)
 {
 	return !xfs_iflags_test_and_set(ip, XFS_IFLOCK);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index b5c4c1b6fd19..e3e898a2896c 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -17,6 +17,7 @@
 #include "xfs_alloc.h"
 #include "xfs_fsops.h"
 #include "xfs_trans.h"
+#include "xfs_trans_priv.h"
 #include "xfs_buf_item.h"
 #include "xfs_log.h"
 #include "xfs_log_priv.h"
@@ -1810,23 +1811,58 @@ xfs_fs_mount(
 }
 
 static long
-xfs_fs_nr_cached_objects(
+xfs_fs_free_cached_objects(
 	struct super_block	*sb,
 	struct shrink_control	*sc)
 {
-	/* Paranoia: catch incorrect calls during mount setup or teardown */
-	if (WARN_ON_ONCE(!sb->s_fs_info))
-		return 0;
+	struct xfs_mount	*mp = XFS_M(sb);
+	struct xfs_ireclaim_args ra;
+	long freed;
 
-	return list_lru_shrink_count(&XFS_M(sb)->m_inode_lru, sc);
+	INIT_LIST_HEAD(&ra.freeable);
+	ra.lowest_lsn = NULLCOMMITLSN;
+	ra.dirty_skipped = 0;
+
+	freed = list_lru_shrink_walk(&mp->m_inode_lru, sc,
+					xfs_inode_reclaim_isolate, &ra);
+	xfs_dispose_inodes(&ra.freeable);
+
+	/*
+	 * Deal with dirty inodes if we skipped any. We will have the LSN of
+	 * the oldest dirty inode in our reclaim args if we skipped any.
+	 *
+	 * For direct reclaim, wait on an AIL push to clean some inodes.
+	 *
+	 * For kswapd, if we skipped too many dirty inodes (i.e. more dirty than
+	 * we freed) then we need kswapd to back off once it's scan has been
+	 * completed. That way it will have some clean inodes once it comes back
+	 * and can make progress, but make sure we have inode cleaning in
+	 * progress....
+	 */
+	if (current_is_kswapd()) {
+		if (ra.dirty_skipped >= freed) {
+			if (current->reclaim_state)
+				current->reclaim_state->need_backoff = true;
+			if (ra.lowest_lsn != NULLCOMMITLSN)
+				xfs_ail_push(mp->m_ail, ra.lowest_lsn);
+		}
+	} else if (ra.lowest_lsn != NULLCOMMITLSN) {
+		xfs_ail_push_sync(mp->m_ail, ra.lowest_lsn);
+	}
+
+	return freed;
 }
 
 static long
-xfs_fs_free_cached_objects(
+xfs_fs_nr_cached_objects(
 	struct super_block	*sb,
 	struct shrink_control	*sc)
 {
-	return xfs_reclaim_inodes_nr(XFS_M(sb), sc->nr_to_scan);
+	/* Paranoia: catch incorrect calls during mount setup or teardown */
+	if (WARN_ON_ONCE(!sb->s_fs_info))
+		return 0;
+
+	return list_lru_shrink_count(&XFS_M(sb)->m_inode_lru, sc);
 }
 
 static const struct super_operations xfs_super_operations = {

From patchwork Thu Aug 1 02:17:52 2019
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 24/24] xfs: remove unused old inode reclaim code
Date: Thu, 1 Aug 2019 12:17:52 +1000
Message-Id: <20190801021752.4986-25-david@fromorbit.com>
In-Reply-To: <20190801021752.4986-1-david@fromorbit.com>
References: <20190801021752.4986-1-david@fromorbit.com>

From: Dave Chinner

We don't use the custom AG radix tree walker, the reclaim
radix tree tag, the reclaimable inode counters, etc., so remove them
all now.

Signed-off-by: Dave Chinner
---
 fs/xfs/xfs_icache.c | 410 +-------------------------------------------
 fs/xfs/xfs_icache.h |   7 +-
 fs/xfs/xfs_mount.h  |   2 -
 fs/xfs/xfs_super.c  |   1 -
 4 files changed, 3 insertions(+), 417 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 891fe3795c8f..ad04de119ac1 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -139,81 +139,6 @@ xfs_inode_free(
 	__xfs_inode_free(ip);
 }
 
-static void
-xfs_perag_set_reclaim_tag(
-	struct xfs_perag	*pag)
-{
-	struct xfs_mount	*mp = pag->pag_mount;
-
-	lockdep_assert_held(&pag->pag_ici_lock);
-	if (pag->pag_ici_reclaimable++)
-		return;
-
-	/* propagate the reclaim tag up into the perag radix tree */
-	spin_lock(&mp->m_perag_lock);
-	radix_tree_tag_set(&mp->m_perag_tree, pag->pag_agno,
-			   XFS_ICI_RECLAIM_TAG);
-	spin_unlock(&mp->m_perag_lock);
-
-	trace_xfs_perag_set_reclaim(mp, pag->pag_agno, -1, _RET_IP_);
-}
-
-static void
-xfs_perag_clear_reclaim_tag(
-	struct xfs_perag	*pag)
-{
-	struct xfs_mount	*mp = pag->pag_mount;
-
-	lockdep_assert_held(&pag->pag_ici_lock);
-	if (--pag->pag_ici_reclaimable)
-		return;
-
-	/* clear the reclaim tag from the perag radix tree */
-	spin_lock(&mp->m_perag_lock);
-	radix_tree_tag_clear(&mp->m_perag_tree, pag->pag_agno,
-			     XFS_ICI_RECLAIM_TAG);
-	spin_unlock(&mp->m_perag_lock);
-	trace_xfs_perag_clear_reclaim(mp, pag->pag_agno, -1, _RET_IP_);
-}
-
-
-/*
- * We set the inode flag atomically with the radix tree tag.
- * Once we get tag lookups on the radix tree, this inode flag
- * can go away.
- */
-void
-xfs_inode_set_reclaim_tag(
-	struct xfs_inode	*ip)
-{
-	struct xfs_mount	*mp = ip->i_mount;
-	struct xfs_perag	*pag;
-
-	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
-	spin_lock(&pag->pag_ici_lock);
-	spin_lock(&ip->i_flags_lock);
-
-	radix_tree_tag_set(&pag->pag_ici_root, XFS_INO_TO_AGINO(mp, ip->i_ino),
-			   XFS_ICI_RECLAIM_TAG);
-	xfs_perag_set_reclaim_tag(pag);
-	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
-
-	spin_unlock(&ip->i_flags_lock);
-	spin_unlock(&pag->pag_ici_lock);
-	xfs_perag_put(pag);
-}
-
-STATIC void
-xfs_inode_clear_reclaim_tag(
-	struct xfs_perag	*pag,
-	xfs_ino_t		ino)
-{
-	radix_tree_tag_clear(&pag->pag_ici_root,
-			     XFS_INO_TO_AGINO(pag->pag_mount, ino),
-			     XFS_ICI_RECLAIM_TAG);
-	xfs_perag_clear_reclaim_tag(pag);
-}
-
 static void
 xfs_inew_wait(
 	struct xfs_inode	*ip)
@@ -397,17 +322,15 @@ xfs_iget_cache_hit(
 		goto out_error;
 	}
 
-	spin_lock(&pag->pag_ici_lock);
-	spin_lock(&ip->i_flags_lock);
 
 	/*
 	 * Clear the per-lifetime state in the inode as we are now
 	 * effectively a new inode and need to return to the initial
 	 * state before reuse occurs.
 	 */
+	spin_lock(&ip->i_flags_lock);
 	ip->i_flags &= ~XFS_IRECLAIM_RESET_FLAGS;
 	ip->i_flags |= XFS_INEW;
-	xfs_inode_clear_reclaim_tag(pag, ip->i_ino);
 	inode->i_state = I_NEW;
 	ip->i_sick = 0;
 	ip->i_checked = 0;
@@ -416,7 +339,6 @@ xfs_iget_cache_hit(
 		init_rwsem(&inode->i_rwsem);
 
 		spin_unlock(&ip->i_flags_lock);
-		spin_unlock(&pag->pag_ici_lock);
 	} else {
 		/* If the VFS inode is being torn down, pause and try again. */
 		if (!igrab(inode)) {
@@ -967,336 +889,6 @@ xfs_inode_ag_iterator_tag(
 	return last_error;
 }
 
-/*
- * Grab the inode for reclaim.
- *
- * Return false if we aren't going to reclaim it, true if it is a reclaim
- * candidate.
- *
- * If the inode is clean or unreclaimable, return NULLCOMMITLSN to tell the
- * caller it does not require flushing. Otherwise return the log item lsn of the
- * inode so the caller can determine it's inode flush target.  If we get the
- * clean/dirty state wrong then it will be sorted in xfs_reclaim_inode() once we
- * have locks held.
- */
-STATIC bool
-xfs_reclaim_inode_grab(
-	struct xfs_inode	*ip,
-	int			flags,
-	xfs_lsn_t		*lsn)
-{
-	ASSERT(rcu_read_lock_held());
-	*lsn = 0;
-
-	/* quick check for stale RCU freed inode */
-	if (!ip->i_ino)
-		return false;
-
-	/*
-	 * Do unlocked checks to see if the inode already is being flushed or in
-	 * reclaim to avoid lock traffic. If the inode is not clean, return the
-	 * it's position in the AIL for the caller to push to.
-	 */
-	if (!xfs_inode_clean(ip)) {
-		*lsn = ip->i_itemp->ili_item.li_lsn;
-		return false;
-	}
-
-	if (__xfs_iflags_test(ip, XFS_IFLOCK | XFS_IRECLAIM))
-		return false;
-
-	/*
-	 * The radix tree lock here protects a thread in xfs_iget from racing
-	 * with us starting reclaim on the inode.  Once we have the
-	 * XFS_IRECLAIM flag set it will not touch us.
-	 *
-	 * Due to RCU lookup, we may find inodes that have been freed and only
-	 * have XFS_IRECLAIM set.  Indeed, we may see reallocated inodes that
-	 * aren't candidates for reclaim at all, so we must check the
-	 * XFS_IRECLAIMABLE is set first before proceeding to reclaim.
-	 */
-	spin_lock(&ip->i_flags_lock);
-	if (!__xfs_iflags_test(ip, XFS_IRECLAIMABLE) ||
-	    __xfs_iflags_test(ip, XFS_IRECLAIM)) {
-		/* not a reclaim candidate. */
-		spin_unlock(&ip->i_flags_lock);
-		return false;
-	}
-	__xfs_iflags_set(ip, XFS_IRECLAIM);
-	spin_unlock(&ip->i_flags_lock);
-	return true;
-}
-
-/*
- * Inodes in different states need to be treated differently. The following
- * table lists the inode states and the reclaim actions necessary:
- *
- *	inode state	     iflush ret		required action
- *	---------------      ----------		---------------
- *	bad			-		reclaim
- *	shutdown		EIO		unpin and reclaim
- *	clean, unpinned		0		reclaim
- *	stale, unpinned		0		reclaim
- *	clean, pinned(*)	0		requeue
- *	stale, pinned		EAGAIN		requeue
- *	dirty, async		-		requeue
- *	dirty, sync		0		reclaim
- *
- * (*) dgc: I don't think the clean, pinned state is possible but it gets
- * handled anyway given the order of checks implemented.
- *
- * Also, because we get the flush lock first, we know that any inode that has
- * been flushed delwri has had the flush completed by the time we check that
- * the inode is clean.
- *
- * Note that because the inode is flushed delayed write by AIL pushing, the
- * flush lock may already be held here and waiting on it can result in very
- * long latencies.  Hence for sync reclaims, where we wait on the flush lock,
- * the caller should push the AIL first before trying to reclaim inodes to
- * minimise the amount of time spent waiting.  For background relaim, we only
- * bother to reclaim clean inodes anyway.
- *
- * Hence the order of actions after gaining the locks should be:
- *	bad		=> reclaim
- *	shutdown	=> unpin and reclaim
- *	pinned, async	=> requeue
- *	pinned, sync	=> unpin
- *	stale		=> reclaim
- *	clean		=> reclaim
- *	dirty, async	=> requeue
- *	dirty, sync	=> flush, wait and reclaim
- *
- * Returns true if the inode was reclaimed, false otherwise.
- */
-STATIC bool
-xfs_reclaim_inode(
-	struct xfs_inode	*ip,
-	struct xfs_perag	*pag,
-	xfs_lsn_t		*lsn)
-{
-	xfs_ino_t		ino;
-
-	*lsn = 0;
-
-	/*
-	 * Don't try to flush the inode if another inode in this cluster has
-	 * already flushed it after we did the initial checks in
-	 * xfs_reclaim_inode_grab().
-	 */
-	if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL))
-		goto out;
-	if (!xfs_iflock_nowait(ip))
-		goto out_unlock;
-
-	/* If we are in shutdown, we don't care about blocking. */
-	if (XFS_FORCED_SHUTDOWN(ip->i_mount)) {
-		xfs_iunpin_wait(ip);
-		/* xfs_iflush_abort() drops the flush lock */
-		xfs_iflush_abort(ip, false);
-		goto reclaim;
-	}
-
-	/*
-	 * If it is pinned, we only want to flush this if there's nothing else
-	 * to be flushed as it requires a log force. Hence we essentially set
-	 * the LSN to flush the entire AIL which will end up triggering a log
-	 * force to unpin this inode, but that will only happen if there are not
-	 * other inodes in the scan that only need writeback.
-	 */
-	if (xfs_ipincount(ip)) {
-		*lsn = ip->i_itemp->ili_last_lsn;
-		goto out_ifunlock;
-	}
-
-	/*
-	 * Dirty inode we didn't catch, skip it.
-	 */
-	if (!xfs_inode_clean(ip) && !xfs_iflags_test(ip, XFS_ISTALE)) {
-		*lsn = ip->i_itemp->ili_item.li_lsn;
-		goto out_ifunlock;
-	}
-
-	/*
-	 * It's clean, we have it locked, we can now drop the flush lock
-	 * and reclaim it.
-	 */
-	xfs_ifunlock(ip);
-
-reclaim:
-	ASSERT(!xfs_isiflocked(ip));
-	ASSERT(xfs_inode_clean(ip) || xfs_iflags_test(ip, XFS_ISTALE));
-	ASSERT(ip->i_ino != 0);
-
-	/*
-	 * Because we use RCU freeing we need to ensure the inode always appears
-	 * to be reclaimed with an invalid inode number when in the free state.
-	 * We do this as early as possible under the ILOCK so that
-	 * xfs_iflush_cluster() and xfs_ifree_cluster() can be guaranteed to
-	 * detect races with us here. By doing this, we guarantee that once
-	 * xfs_iflush_cluster() or xfs_ifree_cluster() has locked XFS_ILOCK that
-	 * it will see either a valid inode that will serialise correctly, or it
-	 * will see an invalid inode that it can skip.
- */ - spin_lock(&ip->i_flags_lock); - ino = ip->i_ino; /* for radix_tree_delete */ - ip->i_flags = XFS_IRECLAIM; - ip->i_ino = 0; - - /* XXX: temporary until lru based reclaim */ - list_lru_del(&pag->pag_mount->m_inode_lru, &VFS_I(ip)->i_lru); - spin_unlock(&ip->i_flags_lock); - - xfs_iunlock(ip, XFS_ILOCK_EXCL); - - XFS_STATS_INC(ip->i_mount, xs_ig_reclaims); - /* - * Remove the inode from the per-AG radix tree. - * - * Because radix_tree_delete won't complain even if the item was never - * added to the tree assert that it's been there before to catch - * problems with the inode life time early on. - */ - spin_lock(&pag->pag_ici_lock); - if (!radix_tree_delete(&pag->pag_ici_root, - XFS_INO_TO_AGINO(ip->i_mount, ino))) - ASSERT(0); - xfs_perag_clear_reclaim_tag(pag); - spin_unlock(&pag->pag_ici_lock); - - /* - * Here we do an (almost) spurious inode lock in order to coordinate - * with inode cache radix tree lookups. This is because the lookup - * can reference the inodes in the cache without taking references. - * - * We make that OK here by ensuring that we wait until the inode is - * unlocked after the lookup before we go ahead and free it. - */ - xfs_ilock(ip, XFS_ILOCK_EXCL); - xfs_qm_dqdetach(ip); - xfs_iunlock(ip, XFS_ILOCK_EXCL); - - __xfs_inode_free(ip); - return true; - -out_ifunlock: - xfs_ifunlock(ip); -out_unlock: - xfs_iunlock(ip, XFS_ILOCK_EXCL); -out: - xfs_iflags_clear(ip, XFS_IRECLAIM); - return false; -} - -/* - * Walk the AGs and reclaim the inodes in them. Even if the filesystem is - * corrupted, we still want to try to reclaim all the inodes. If we don't, - * then a shut down during filesystem unmount reclaim walk leak all the - * unreclaimed inodes. - * - * Return the number of inodes freed. 
- */ -int -xfs_reclaim_inodes_ag( - struct xfs_mount *mp, - int flags, - int nr_to_scan) -{ - struct xfs_perag *pag; - xfs_agnumber_t ag; - xfs_lsn_t lsn, lowest_lsn = NULLCOMMITLSN; - long freed = 0; - - ag = 0; - while ((pag = xfs_perag_get_tag(mp, ag, XFS_ICI_RECLAIM_TAG))) { - unsigned long first_index = 0; - int done = 0; - int nr_found = 0; - - ag = pag->pag_agno + 1; - first_index = pag->pag_ici_reclaim_cursor; - - do { - struct xfs_inode *batch[XFS_LOOKUP_BATCH]; - int i; - - rcu_read_lock(); - nr_found = radix_tree_gang_lookup_tag( - &pag->pag_ici_root, - (void **)batch, first_index, - XFS_LOOKUP_BATCH, - XFS_ICI_RECLAIM_TAG); - if (!nr_found) { - done = 1; - rcu_read_unlock(); - break; - } - - /* - * Grab the inodes before we drop the lock. if we found - * nothing, nr == 0 and the loop will be skipped. - */ - for (i = 0; i < nr_found; i++) { - struct xfs_inode *ip = batch[i]; - - if (done || - !xfs_reclaim_inode_grab(ip, flags, &lsn)) - batch[i] = NULL; - - if (lsn && XFS_LSN_CMP(lsn, lowest_lsn) < 0) - lowest_lsn = lsn; - - /* - * Update the index for the next lookup. Catch - * overflows into the next AG range which can - * occur if we have inodes in the last block of - * the AG and we are currently pointing to the - * last inode. - * - * Because we may see inodes that are from the - * wrong AG due to RCU freeing and - * reallocation, only update the index if it - * lies in this AG. It was a race that lead us - * to see this inode, so another lookup from - * the same index will not find it again. - */ - if (XFS_INO_TO_AGNO(mp, ip->i_ino) != - pag->pag_agno) - continue; - first_index = XFS_INO_TO_AGINO(mp, ip->i_ino + 1); - if (first_index < XFS_INO_TO_AGINO(mp, ip->i_ino)) - done = 1; - } - - /* unlock now we've grabbed the inodes. 
*/ - rcu_read_unlock(); - - for (i = 0; i < nr_found; i++) { - if (!batch[i]) - continue; - if (xfs_reclaim_inode(batch[i], pag, &lsn)) - freed++; - if (lsn && XFS_LSN_CMP(lsn, lowest_lsn) < 0) - lowest_lsn = lsn; - } - - nr_to_scan -= XFS_LOOKUP_BATCH; - cond_resched(); - - } while (nr_found && !done && nr_to_scan > 0); - - if (!done) - pag->pag_ici_reclaim_cursor = first_index; - else - pag->pag_ici_reclaim_cursor = 0; - xfs_perag_put(pag); - } - - if ((flags & SYNC_WAIT) && lowest_lsn != NULLCOMMITLSN) - xfs_ail_push_sync(mp->m_ail, lowest_lsn); - - return freed; -} - enum lru_status xfs_inode_reclaim_isolate( struct list_head *item, diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h index dadc69a30f33..0b4d06691275 100644 --- a/fs/xfs/xfs_icache.h +++ b/fs/xfs/xfs_icache.h @@ -25,9 +25,8 @@ struct xfs_eofblocks { */ #define XFS_ICI_NO_TAG (-1) /* special flag for an untagged lookup in xfs_inode_ag_iterator */ -#define XFS_ICI_RECLAIM_TAG 0 /* inode is to be reclaimed */ -#define XFS_ICI_EOFBLOCKS_TAG 1 /* inode has blocks beyond EOF */ -#define XFS_ICI_COWBLOCKS_TAG 2 /* inode can have cow blocks to gc */ +#define XFS_ICI_EOFBLOCKS_TAG 0 /* inode has blocks beyond EOF */ +#define XFS_ICI_COWBLOCKS_TAG 1 /* inode can have cow blocks to gc */ /* * Flags for xfs_iget() @@ -60,8 +59,6 @@ enum lru_status xfs_inode_reclaim_isolate(struct list_head *item, void xfs_dispose_inodes(struct list_head *freeable); void xfs_reclaim_inodes(struct xfs_mount *mp); -void xfs_inode_set_reclaim_tag(struct xfs_inode *ip); - void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip); void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip); int xfs_icache_free_eofblocks(struct xfs_mount *, struct xfs_eofblocks *); diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index 4a4ecbc22246..ef63357da7af 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -383,8 +383,6 @@ typedef struct xfs_perag { spinlock_t pag_ici_lock; /* incore inode cache lock */ struct radix_tree_root 
pag_ici_root; /* incore inode cache root */ - int pag_ici_reclaimable; /* reclaimable inodes */ - unsigned long pag_ici_reclaim_cursor; /* reclaim restart point */ /* buffer cache index */ spinlock_t pag_buf_lock; /* lock for pag_buf_hash */ diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index e3e898a2896c..0559fb686e9d 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -955,7 +955,6 @@ xfs_fs_destroy_inode( __xfs_iflags_set(ip, XFS_IRECLAIMABLE); list_lru_add(&mp->m_inode_lru, &VFS_I(ip)->i_lru); spin_unlock(&ip->i_flags_lock); - xfs_inode_set_reclaim_tag(ip); } static void