From patchwork Fri Jul 27 16:21:43 2018
X-Patchwork-Submitter: Daniel Drake
X-Patchwork-Id: 10547429
From: Daniel Drake
To: mhocko@kernel.org
Cc: hannes@cmpxchg.org, linux-mm@kvack.org, linux@endlessm.com,
    linux-kernel@vger.kernel.org
Subject: Making direct reclaim fail when thrashing
Date: Fri, 27 Jul 2018 11:21:43 -0500
Message-Id: <20180727162143.26466-1-drake@endlessm.com>
X-Mailer: git-send-email 2.17.1

Split from the thread "[PATCH 0/10] psi: pressure stall information for
CPU, memory, and IO v2", where we were discussing if/how to make the
direct reclaim codepath fail when we are excessively thrashing, so that
the OOM killer might step in. This is potentially desirable when the
thrashing is so bad that the UI stops responding, causing the user to
pull the plug.
On Tue, Jul 17, 2018 at 7:23 AM, Michal Hocko wrote:
> mm/workingset.c allows for tracking when an actual page got evicted.
> workingset_refault tells us whether a given filemap fault is a recent
> refault and activates the page if that is the case. So what you need is
> to note how many refaulted pages we have on the active LRU list. If that
> is a large part of the list and if the inactive list is really small
> then we know we are thrashing. This all sounds much easier than it will
> eventually turn out to be of course but I didn't really get to play with
> this much.

Apologies in advance for any silly mistakes or terrible code that
follows, as I am not familiar with this part of the kernel.

As mentioned in my last mail, knowing whether a page on the active list
was refaulted into place is not trivial, because the eviction
information is lost upon refault (it was stored in the page cache
shadow entry).

Here I'm experimenting with adding another tag to the page cache radix
tree, tagging pages that were activated in the refault path. Then, in
get_scan_count(), I check how many active pages carry that tag, and
also look at the sizes of the active and inactive lists. This takes a
noticeable performance hit (probably due to looping over the whole
active list and taking lots of locks), but I figured it might serve as
one step forward.

The results are not exactly what I would expect. Upon launching 20
processes that each allocate and memset 100 MB of RAM, exhausting all
RAM (with no swap available), the kernel starts thrashing and I get
numbers like:

  get_scan_count lru1 active=422714 inactive=19595 refaulted=0
  get_scan_count lru3 active=832 inactive=757 refaulted=21

There are lots of active anonymous pages (lru1) and none refaulted,
which is perhaps not surprising since there is no swap available, so
they cannot be swapped out. But there are only a few file pages on the
lists (lru3), and only a tiny number of refaulted ones, which doesn't
line up with your suggestion of detecting when a large part of the
active list is made up of refaulted pages.

(A rough sketch of this kind of workload is included after the
sign-off below.)

Any further suggestions appreciated.

Thanks
Daniel
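For reference, a minimal sketch of the kind of workload described above:
20 processes that each allocate and memset 100 MB, run on a machine with
no swap so that RAM is exhausted. This is an illustrative reconstruction,
not the exact test program used:

/*
 * Hypothetical reproduction of the test workload: fork 20 children,
 * each of which allocates ~100 MB, touches every page with memset,
 * and then keeps the memory resident.
 */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	for (int i = 0; i < 20; i++) {
		if (fork() == 0) {
			size_t sz = 100UL * 1024 * 1024;	/* 100 MB */
			char *p = malloc(sz);

			if (p)
				memset(p, 0xaa, sz);	/* fault in every page */
			pause();			/* hold the allocation */
			_exit(0);
		}
	}
	pause();
	return 0;
}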
---
 include/linux/fs.h         |  1 +
 include/linux/radix-tree.h |  2 +-
 mm/filemap.c               |  2 ++
 mm/vmscan.c                | 37 +++++++++++++++++++++++++++++++++++++
 4 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index d85ac9d24bb3..45f94ffd1c67 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -462,6 +462,7 @@ struct block_device {
 #define PAGECACHE_TAG_DIRTY	0
 #define PAGECACHE_TAG_WRITEBACK	1
 #define PAGECACHE_TAG_TOWRITE	2
+#define PAGECACHE_TAG_REFAULTED	3
 
 int mapping_tagged(struct address_space *mapping, int tag);

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index 34149e8b5f73..86eccb71ef7e 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -65,7 +65,7 @@ static inline bool radix_tree_is_internal_node(void *ptr)
 
 /*** radix-tree API starts here ***/
 
-#define RADIX_TREE_MAX_TAGS 3
+#define RADIX_TREE_MAX_TAGS 4
 
 #ifndef RADIX_TREE_MAP_SHIFT
 #define RADIX_TREE_MAP_SHIFT	(CONFIG_BASE_SMALL ? 4 : 6)

diff --git a/mm/filemap.c b/mm/filemap.c
index 250f675dcfb2..9a686570dc75 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -917,6 +917,8 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
 		 */
 		if (!(gfp_mask & __GFP_WRITE) &&
 		    shadow && workingset_refault(shadow)) {
+			radix_tree_tag_set(&mapping->i_pages, page_index(page),
+					   PAGECACHE_TAG_REFAULTED);
 			SetPageActive(page);
 			workingset_activation(page);
 		} else

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 03822f86f288..79bc810b43bb 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2102,6 +2102,30 @@ enum scan_balance {
 	SCAN_FILE,
 };
 
+
+static int count_refaulted(struct lruvec *lruvec, enum lru_list lru) {
+	int nr_refaulted = 0;
+	struct page *page;
+
+	list_for_each_entry(page, &lruvec->lists[lru], lru) {
+		/* Lookup page cache entry from page following the approach
+		 * taken in __set_page_dirty_nobuffers */
+		unsigned long flags;
+		struct address_space *mapping = page_mapping(page);
+		if (!mapping)
+			continue;
+
+		xa_lock_irqsave(&mapping->i_pages, flags);
+		BUG_ON(page_mapping(page) != mapping);
+		nr_refaulted += radix_tree_tag_get(&mapping->i_pages,
+						   page_index(page),
+						   PAGECACHE_TAG_REFAULTED);
+		xa_unlock_irqrestore(&mapping->i_pages, flags);
+	}
+
+	return nr_refaulted;
+}
+
 /*
  * Determine how aggressively the anon and file LRU lists should be
  * scanned.  The relative value of each set of LRU lists is determined
@@ -2270,6 +2294,19 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 		unsigned long size;
 		unsigned long scan;
 
+		if (lru == LRU_ACTIVE_ANON || lru == LRU_ACTIVE_FILE) {
+			int nr_refaulted;
+			unsigned long inactive, active;
+
+			nr_refaulted = count_refaulted(lruvec, lru);
+			active = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
+			inactive = lruvec_lru_size(lruvec, lru - 1,
+						   sc->reclaim_idx);
+			pr_err("get_scan_count lru%d active=%ld inactive=%ld "
+			       "refaulted=%d\n",
+			       lru, active, inactive, nr_refaulted);
+		}
+
 		size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
 		scan = size >> sc->priority;
 		/*
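As an aside, here is a hypothetical sketch of how counts like the ones
printed above could feed the kind of check Michal describes, so that
reclaim could give up and defer to the OOM killer once most of the
active list is there only because of refaults and the inactive list has
collapsed. The helper name and the 1/8 and 1/2 thresholds are invented
for illustration and are not part of the patch:

/*
 * Illustrative kernel-style sketch only, not from the patch above.
 * Returns true when an LRU pair looks like it is thrashing: the
 * inactive list has shrunk to a small fraction of the active list,
 * and most of the active list consists of recently refaulted pages.
 */
static bool lru_looks_thrashing(unsigned long active, unsigned long inactive,
				unsigned long refaulted)
{
	if (!active)
		return false;
	if (inactive > active / 8)	/* inactive list still sizeable */
		return false;
	return refaulted > active / 2;	/* most of active is refaults */
}

Whether direct reclaim should then fail (letting the OOM killer step in)
rather than keep scanning is the policy question this thread is trying
to answer.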