From patchwork Tue Dec 11 14:29:39 2018
From: Vlastimil Babka <vbabka@suse.cz>
To: David Rientjes, Andrea Arcangeli, Mel Gorman
Cc: Michal Hocko, Linus Torvalds, linux-mm@kvack.org, Andrew Morton, Vlastimil Babka
Subject: [RFC 1/3] mm, thp: restore __GFP_NORETRY for madvised thp fault allocations
Date: Tue, 11 Dec 2018 15:29:39 +0100
Message-Id: <20181211142941.20500-2-vbabka@suse.cz>
In-Reply-To: <20181211142941.20500-1-vbabka@suse.cz>
References: <20181211142941.20500-1-vbabka@suse.cz>

Commit 2516035499b9 ("mm, thp: remove __GFP_NORETRY from khugepaged and
madvised allocations") intended to make THP faults in MADV_HUGEPAGE areas
more successful for processes that indicate that they are willing to pay a
higher initial setup cost for long-term THP benefits. In the current page
allocator implementation this means that the allocations will try to use
reclaim and the more costly sync compaction mode if the initial direct
async compaction fails.

However, THP faults also include __GFP_THISNODE, which, combined with
direct reclaim, can result in node-reclaim-like local node thrashing, as
reported by Andrea [1]. While this patch is not a full fix, the first step
is to restore __GFP_NORETRY for madvised THP faults. The expected downside
is potentially worse THP fault success rates for the madvised areas, which
will then have to rely more on khugepaged. For khugepaged itself,
__GFP_NORETRY is not restored, as its activity should be limited enough by
sleeping not to cause noticeable thrashing.

Note that alloc_new_node_page() and new_page() are probably further
candidates, as they handle the migrate_pages(2) and mbind(2) syscalls
respectively, and can thus also allow unprivileged node-reclaim-like
behavior.

The patch also updates the comments in alloc_hugepage_direct_gfpmask(),
because elsewhere compaction during page fault is called direct compaction,
and 'synchronous' refers to the migration mode, which is not used for THP
faults.

[1] https://lkml.kernel.org/m/20180820032204.9591-1-aarcange@redhat.com
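
For illustration, a minimal userspace sketch (not part of this patch) of how
a process marks a range with MADV_HUGEPAGE so that its THP faults take the
madvised path discussed above; the 16MB size, the 2MB alignment and the bare
error handling are only an example:

#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
        size_t len = 16UL << 20;        /* 16MB, a multiple of the 2MB THP size */
        void *buf;

        /* 2MB alignment so the range can be backed by huge pages */
        if (posix_memalign(&buf, 2UL << 20, len))
                return 1;

        /*
         * Sets VM_HUGEPAGE on the VMA, so vma_madvised is true in
         * alloc_hugepage_direct_gfpmask() when the pages fault in.
         */
        if (madvise(buf, len, MADV_HUGEPAGE))
                return 1;

        memset(buf, 0, len);            /* first touch triggers the THP faults */
        return 0;
}
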
Reported-by: Andrea Arcangeli
Signed-off-by: Vlastimil Babka
Cc: Andrea Arcangeli
Cc: David Rientjes
Cc: Mel Gorman
Cc: Michal Hocko
---
 mm/huge_memory.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5da55b38b1b7..c442b12b060c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -633,24 +633,23 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
 {
 	const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE);
 
-	/* Always do synchronous compaction */
+	/* Always try direct compaction */
 	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags))
-		return GFP_TRANSHUGE | (vma_madvised ? 0 : __GFP_NORETRY);
+		return GFP_TRANSHUGE | __GFP_NORETRY;
 
 	/* Kick kcompactd and fail quickly */
 	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags))
 		return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM;
 
-	/* Synchronous compaction if madvised, otherwise kick kcompactd */
+	/* Direct compaction if madvised, otherwise kick kcompactd */
 	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags))
 		return GFP_TRANSHUGE_LIGHT |
-			(vma_madvised ? __GFP_DIRECT_RECLAIM :
+			(vma_madvised ? (__GFP_DIRECT_RECLAIM | __GFP_NORETRY) :
 					__GFP_KSWAPD_RECLAIM);
 
-	/* Only do synchronous compaction if madvised */
+	/* Only do direct compaction if madvised */
 	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags))
-		return GFP_TRANSHUGE_LIGHT |
-			(vma_madvised ? __GFP_DIRECT_RECLAIM : 0);
+		return vma_madvised ? (GFP_TRANSHUGE | __GFP_NORETRY) : GFP_TRANSHUGE_LIGHT;
 
 	return GFP_TRANSHUGE_LIGHT;
 }

From patchwork Tue Dec 11 14:29:40 2018
From: Vlastimil Babka <vbabka@suse.cz>
To: David Rientjes, Andrea Arcangeli, Mel Gorman
Cc: Michal Hocko, Linus Torvalds, linux-mm@kvack.org, Andrew Morton, Vlastimil Babka
Subject: [RFC 2/3] mm, page_alloc: reclaim for __GFP_NORETRY costly requests only when compaction was skipped
Date: Tue, 11 Dec 2018 15:29:40 +0100
Message-Id: <20181211142941.20500-3-vbabka@suse.cz>
In-Reply-To: <20181211142941.20500-1-vbabka@suse.cz>
References: <20181211142941.20500-1-vbabka@suse.cz>

For costly __GFP_NORETRY allocations (including THPs) we first do an
initial compaction attempt and, if that fails, we proceed with reclaim and
another round of compaction, unless compaction was deferred due to earlier
repeated failures. Andrea proposed [1] that we count all compaction
failures as the deferred case in try_to_compact_pages(), but I don't think
that's a good idea in general. Instead, change the __GFP_NORETRY specific
condition so that it only proceeds with further reclaim/compaction when the
initial compaction attempt was skipped due to lack of free base pages.

Note that the original condition probably never worked properly for THPs,
because compaction can only become deferred after a sync compaction
failure, while THP faults only perform async compaction; the exceptions are
khugepaged, which runs infrequently, and madvised faults (until the
previous patch restored __GFP_NORETRY for those), which are not the default
case. Deferring due to async compaction failures should, however, also be
beneficial, and is therefore introduced in the next patch.

Also note that due to how try_to_compact_pages() constructs its return
value from compaction attempts across the whole zonelist, returning
COMPACT_SKIPPED means that compaction was skipped for *all* attempted
zones/nodes, i.e. that all zones/nodes are low on memory at the same
moment. This is probably rare, which would make the resulting 'goto nopage'
very common, e.g. just because a single zone had enough free pages for
compaction to be attempted (and fail) there, while the remaining nodes
could have succeeded after reclaim. However, since THP faults use
__GFP_THISNODE, compaction is also attempted only for a single node, so in
practice there should be no significant loss of information when
constructing the return value, nor a bias towards 'goto nopage' for THP
faults.

[1] https://lkml.kernel.org/r/20181206005425.GB21159@redhat.com
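
To make the changed condition concrete, here is a small standalone program
(not kernel code; the enum below is a reduced, illustrative subset of the
kernel's compact_result values) that prints which first-attempt outcomes
lead to 'goto nopage' under the old and the new rule:

#include <stdbool.h>
#include <stdio.h>

/* Reduced, illustrative subset of the kernel's compact_result values */
enum compact_result { COMPACT_SKIPPED, COMPACT_DEFERRED, COMPACT_COMPLETE };

/* Old condition: bail out (nopage) only when compaction was deferred */
static bool nopage_old(enum compact_result r)
{
        return r == COMPACT_DEFERRED;
}

/* New condition: reclaim and retry compaction only when the first attempt
 * was skipped for lack of free base pages; bail out for any other outcome */
static bool nopage_new(enum compact_result r)
{
        return r != COMPACT_SKIPPED;
}

int main(void)
{
        static const char *names[] = { "SKIPPED", "DEFERRED", "COMPLETE (failed)" };

        for (int r = COMPACT_SKIPPED; r <= COMPACT_COMPLETE; r++)
                printf("%-18s old: %-14s new: %s\n", names[r],
                       nopage_old(r) ? "goto nopage" : "reclaim+retry",
                       nopage_new(r) ? "goto nopage" : "reclaim+retry");
        return 0;
}
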
Suggested-by: Andrea Arcangeli
Signed-off-by: Vlastimil Babka
Cc: Andrea Arcangeli
Cc: David Rientjes
Cc: Mel Gorman
Cc: Michal Hocko
---
 mm/page_alloc.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2ec9cc407216..3d83a6093ada 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4129,14 +4129,14 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 */
 	if (costly_order && (gfp_mask & __GFP_NORETRY)) {
 		/*
-		 * If compaction is deferred for high-order allocations,
-		 * it is because sync compaction recently failed. If
-		 * this is the case and the caller requested a THP
-		 * allocation, we do not want to heavily disrupt the
-		 * system, so we fail the allocation instead of entering
-		 * direct reclaim.
+		 * If compaction was skipped because of insufficient
+		 * free pages, proceed with reclaim and another
+		 * compaction attempt. If it failed for other reasons or
+		 * was deferred, do not reclaim and retry, as we do not
+		 * want to heavily disrupt the system for a costly
+		 * __GFP_NORETRY allocation such as THP.
 		 */
-		if (compact_result == COMPACT_DEFERRED)
+		if (compact_result != COMPACT_SKIPPED)
 			goto nopage;
 
 		/*
From patchwork Tue Dec 11 14:29:41 2018
From: Vlastimil Babka <vbabka@suse.cz>
To: David Rientjes, Andrea Arcangeli, Mel Gorman
Cc: Michal Hocko, Linus Torvalds, linux-mm@kvack.org, Andrew Morton, Vlastimil Babka
Subject: [RFC 3/3] mm, compaction: introduce deferred async compaction
Date: Tue, 11 Dec 2018 15:29:41 +0100
Message-Id: <20181211142941.20500-4-vbabka@suse.cz>
In-Reply-To: <20181211142941.20500-1-vbabka@suse.cz>
References: <20181211142941.20500-1-vbabka@suse.cz>

Deferring compaction happens when compaction fails to fulfill the
allocation request at the given order; a number of the following direct
compaction attempts for the same or higher orders are then skipped, and
with further failures that number grows exponentially, up to 64. The
deferred state is reset e.g. when compaction succeeds.

Until now, deferring compaction is only performed after a sync compaction
failure, and it then also blocks async compaction attempts. The rationale
is that only a failed sync compaction is expected to have fully exhausted
the compaction potential of a zone. However, for THP page faults that use
__GFP_NORETRY, this means only async compaction is attempted and thus
compaction is never deferred there, potentially resulting in pointless
reclaim/compaction attempts in a badly fragmented node.

This patch therefore tracks and checks the deferred status of async
compaction in addition to, and mostly separately from, sync compaction.
This allows deferring THP fault compaction without affecting any sync
pageblock-order compaction. Deferring for sync compaction does, however,
imply deferring for async compaction as well. When the deferred status is
reset, it is reset for both modes.

The expected outcome is less compaction/reclaim activity for failing THP
faults, likely at some expense to the THP fault success rate.
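
For reference, a standalone sketch (not kernel code) of the existing
per-zone deferral bookkeeping that this patch duplicates per compaction
mode; the order checks are omitted and the two counters stand in for the
compact_considered[]/compact_defer_shift[] fields, with a
COMPACT_MAX_DEFER_SHIFT of 6 corresponding to the cap of 64 mentioned
above:

#include <stdbool.h>
#include <stdio.h>

#define COMPACT_MAX_DEFER_SHIFT 6	/* 1 << 6 == 64 */

static unsigned int compact_defer_shift;	/* per zone and, with this patch, per mode */
static unsigned int compact_considered;

/* Called after a failed compaction attempt */
static void defer_compaction(void)
{
        compact_considered = 0;
        if (++compact_defer_shift > COMPACT_MAX_DEFER_SHIFT)
                compact_defer_shift = COMPACT_MAX_DEFER_SHIFT;
}

/* Returns true if the next compaction attempt should be skipped */
static bool compaction_deferred(void)
{
        unsigned long defer_limit = 1UL << compact_defer_shift;

        if (++compact_considered > defer_limit)
                compact_considered = defer_limit;
        return compact_considered < defer_limit;
}

int main(void)
{
        for (int failure = 1; failure <= 8; failure++) {
                int skipped = 0;

                defer_compaction();
                while (compaction_deferred())
                        skipped++;
                /* prints 1, 3, 7, 15, 31, 63, 63, 63 skipped attempts */
                printf("after failure %d: skip %d attempts before trying again\n",
                       failure, skipped);
        }
        return 0;
}
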
Signed-off-by: Vlastimil Babka
Cc: Andrea Arcangeli
Cc: David Rientjes
Cc: Mel Gorman
Cc: Michal Hocko
---
 include/linux/compaction.h        | 10 ++--
 include/linux/mmzone.h            |  6 +--
 include/trace/events/compaction.h | 29 ++++++-----
 mm/compaction.c                   | 80 ++++++++++++++++++-------------
 4 files changed, 71 insertions(+), 54 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 68250a57aace..f1d4dc1deec9 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -100,11 +100,11 @@ extern void reset_isolation_suitable(pg_data_t *pgdat);
 extern enum compact_result compaction_suitable(struct zone *zone, int order,
 		unsigned int alloc_flags, int classzone_idx);
 
-extern void defer_compaction(struct zone *zone, int order);
-extern bool compaction_deferred(struct zone *zone, int order);
+extern void defer_compaction(struct zone *zone, int order, bool sync);
+extern bool compaction_deferred(struct zone *zone, int order, bool sync);
 extern void compaction_defer_reset(struct zone *zone, int order,
 		bool alloc_success);
-extern bool compaction_restarting(struct zone *zone, int order);
+extern bool compaction_restarting(struct zone *zone, int order, bool sync);
 
 /* Compaction has made some progress and retrying makes sense */
 static inline bool compaction_made_progress(enum compact_result result)
@@ -189,11 +189,11 @@ static inline enum compact_result compaction_suitable(struct zone *zone, int order,
 	return COMPACT_SKIPPED;
 }
 
-static inline void defer_compaction(struct zone *zone, int order)
+static inline void defer_compaction(struct zone *zone, int order, bool sync)
 {
 }
 
-static inline bool compaction_deferred(struct zone *zone, int order)
+static inline bool compaction_deferred(struct zone *zone, int order, bool sync)
 {
 	return true;
 }
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 847705a6d0ec..4c59996dd4f9 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -492,9 +492,9 @@ struct zone {
 	 * are skipped before trying again. The number attempted since
 	 * last failure is tracked with compact_considered.
 	 */
-	unsigned int		compact_considered;
-	unsigned int		compact_defer_shift;
-	int			compact_order_failed;
+	unsigned int		compact_considered[2];
+	unsigned int		compact_defer_shift[2];
+	int			compact_order_failed[2];
 #endif
 
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h
index 6074eff3d766..7ef40c76bfed 100644
--- a/include/trace/events/compaction.h
+++ b/include/trace/events/compaction.h
@@ -245,9 +245,9 @@ DEFINE_EVENT(mm_compaction_suitable_template, mm_compaction_suitable,
 
 DECLARE_EVENT_CLASS(mm_compaction_defer_template,
 
-	TP_PROTO(struct zone *zone, int order),
+	TP_PROTO(struct zone *zone, int order, bool sync),
 
-	TP_ARGS(zone, order),
+	TP_ARGS(zone, order, sync),
 
 	TP_STRUCT__entry(
 		__field(int, nid)
@@ -256,45 +256,48 @@ DECLARE_EVENT_CLASS(mm_compaction_defer_template,
 		__field(unsigned int, considered)
 		__field(unsigned int, defer_shift)
 		__field(int, order_failed)
+		__field(bool, sync)
 	),
 
 	TP_fast_assign(
 		__entry->nid = zone_to_nid(zone);
 		__entry->idx = zone_idx(zone);
 		__entry->order = order;
-		__entry->considered = zone->compact_considered;
-		__entry->defer_shift = zone->compact_defer_shift;
-		__entry->order_failed = zone->compact_order_failed;
+		__entry->considered = zone->compact_considered[sync];
+		__entry->defer_shift = zone->compact_defer_shift[sync];
+		__entry->order_failed = zone->compact_order_failed[sync];
+		__entry->sync = sync;
 	),
 
-	TP_printk("node=%d zone=%-8s order=%d order_failed=%d consider=%u limit=%lu",
+	TP_printk("node=%d zone=%-8s order=%d order_failed=%d consider=%u limit=%lu sync=%d",
 		__entry->nid,
 		__print_symbolic(__entry->idx, ZONE_TYPE),
 		__entry->order,
 		__entry->order_failed,
 		__entry->considered,
-		1UL << __entry->defer_shift)
+		1UL << __entry->defer_shift,
+		__entry->sync)
 );
 
 DEFINE_EVENT(mm_compaction_defer_template, mm_compaction_deferred,
 
-	TP_PROTO(struct zone *zone, int order),
+	TP_PROTO(struct zone *zone, int order, bool sync),
 
-	TP_ARGS(zone, order)
+	TP_ARGS(zone, order, sync)
 );
 
 DEFINE_EVENT(mm_compaction_defer_template, mm_compaction_defer_compaction,
 
-	TP_PROTO(struct zone *zone, int order),
+	TP_PROTO(struct zone *zone, int order, bool sync),
 
-	TP_ARGS(zone, order)
+	TP_ARGS(zone, order, sync)
 );
 
 DEFINE_EVENT(mm_compaction_defer_template, mm_compaction_defer_reset,
 
-	TP_PROTO(struct zone *zone, int order),
+	TP_PROTO(struct zone *zone, int order, bool sync),
 
-	TP_ARGS(zone, order)
+	TP_ARGS(zone, order, sync)
 );
 
 #endif
diff --git a/mm/compaction.c b/mm/compaction.c
index 7c607479de4a..cb139b63a754 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -139,36 +139,40 @@ EXPORT_SYMBOL(__ClearPageMovable);
  * allocation success. 1 << compact_defer_limit compactions are skipped up
  * to a limit of 1 << COMPACT_MAX_DEFER_SHIFT
  */
-void defer_compaction(struct zone *zone, int order)
+void defer_compaction(struct zone *zone, int order, bool sync)
 {
-	zone->compact_considered = 0;
-	zone->compact_defer_shift++;
+	zone->compact_considered[sync] = 0;
+	zone->compact_defer_shift[sync]++;
 
-	if (order < zone->compact_order_failed)
-		zone->compact_order_failed = order;
+	if (order < zone->compact_order_failed[sync])
+		zone->compact_order_failed[sync] = order;
 
-	if (zone->compact_defer_shift > COMPACT_MAX_DEFER_SHIFT)
-		zone->compact_defer_shift = COMPACT_MAX_DEFER_SHIFT;
+	if (zone->compact_defer_shift[sync] > COMPACT_MAX_DEFER_SHIFT)
+		zone->compact_defer_shift[sync] = COMPACT_MAX_DEFER_SHIFT;
 
-	trace_mm_compaction_defer_compaction(zone, order);
+	trace_mm_compaction_defer_compaction(zone, order, sync);
+
+	/* deferred sync compaction implies deferred async compaction */
+	if (sync)
+		defer_compaction(zone, order, false);
 }
 
 /* Returns true if compaction should be skipped this time */
-bool compaction_deferred(struct zone *zone, int order)
+bool compaction_deferred(struct zone *zone, int order, bool sync)
 {
-	unsigned long defer_limit = 1UL << zone->compact_defer_shift;
+	unsigned long defer_limit = 1UL << zone->compact_defer_shift[sync];
 
-	if (order < zone->compact_order_failed)
+	if (order < zone->compact_order_failed[sync])
 		return false;
 
 	/* Avoid possible overflow */
-	if (++zone->compact_considered > defer_limit)
-		zone->compact_considered = defer_limit;
+	if (++zone->compact_considered[sync] > defer_limit)
+		zone->compact_considered[sync] = defer_limit;
 
-	if (zone->compact_considered >= defer_limit)
+	if (zone->compact_considered[sync] >= defer_limit)
 		return false;
 
-	trace_mm_compaction_deferred(zone, order);
+	trace_mm_compaction_deferred(zone, order, sync);
 
 	return true;
 }
@@ -181,24 +185,32 @@ bool compaction_deferred(struct zone *zone, int order)
 void compaction_defer_reset(struct zone *zone, int order,
 		bool alloc_success)
 {
-	if (alloc_success) {
-		zone->compact_considered = 0;
-		zone->compact_defer_shift = 0;
-	}
-	if (order >= zone->compact_order_failed)
-		zone->compact_order_failed = order + 1;
+	int sync;
+
+	for (sync = 0; sync <= 1; sync++) {
+		if (alloc_success) {
+			zone->compact_considered[sync] = 0;
+			zone->compact_defer_shift[sync] = 0;
+		}
+		if (order >= zone->compact_order_failed[sync])
+			zone->compact_order_failed[sync] = order + 1;
 
-	trace_mm_compaction_defer_reset(zone, order);
+		trace_mm_compaction_defer_reset(zone, order, sync);
+	}
 }
 
 /* Returns true if restarting compaction after many failures */
-bool compaction_restarting(struct zone *zone, int order)
+bool compaction_restarting(struct zone *zone, int order, bool sync)
 {
-	if (order < zone->compact_order_failed)
+	int defer_shift;
+
+	if (order < zone->compact_order_failed[sync])
 		return false;
 
-	return zone->compact_defer_shift == COMPACT_MAX_DEFER_SHIFT &&
-		zone->compact_considered >= 1UL << zone->compact_defer_shift;
+	defer_shift = zone->compact_defer_shift[sync];
+
+	return defer_shift == COMPACT_MAX_DEFER_SHIFT &&
+		zone->compact_considered[sync] >= 1UL << defer_shift;
 }
 
 /* Returns true if the pageblock should be scanned for pages to isolate. */
@@ -1555,7 +1567,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_control *cc)
 	 * Clear pageblock skip if there were failures recently and compaction
 	 * is about to be retried after being deferred.
 	 */
-	if (compaction_restarting(zone, cc->order))
+	if (compaction_restarting(zone, cc->order, sync))
 		__reset_isolation_suitable(zone);
 
 	/*
@@ -1767,7 +1779,8 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 		enum compact_result status;
 
 		if (prio > MIN_COMPACT_PRIORITY
-					&& compaction_deferred(zone, order)) {
+					&& compaction_deferred(zone, order,
+						prio != COMPACT_PRIO_ASYNC)) {
 			rc = max_t(enum compact_result, COMPACT_DEFERRED, rc);
 			continue;
 		}
@@ -1789,14 +1802,15 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 			break;
 		}
 
-		if (prio != COMPACT_PRIO_ASYNC && (status == COMPACT_COMPLETE ||
-					status == COMPACT_PARTIAL_SKIPPED))
+		if (status == COMPACT_COMPLETE ||
+					status == COMPACT_PARTIAL_SKIPPED)
 			/*
 			 * We think that allocation won't succeed in this zone
 			 * so we defer compaction there. If it ends up
 			 * succeeding after all, it will be reset.
 			 */
-			defer_compaction(zone, order);
+			defer_compaction(zone, order,
+					 prio != COMPACT_PRIO_ASYNC);
 
 		/*
 		 * We might have stopped compacting due to need_resched() in
@@ -1966,7 +1980,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
 		if (!populated_zone(zone))
 			continue;
 
-		if (compaction_deferred(zone, cc.order))
+		if (compaction_deferred(zone, cc.order, true))
 			continue;
 
 		if (compaction_suitable(zone, cc.order, 0, zoneid) !=
@@ -2000,7 +2014,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
 		 * We use sync migration mode here, so we defer like
 		 * sync direct compaction does.
 		 */
-		defer_compaction(zone, cc.order);
+		defer_compaction(zone, cc.order, true);
 	}
 
 	count_compact_events(KCOMPACTD_MIGRATE_SCANNED,