From patchwork Fri Nov 23 11:45:23 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mel Gorman
X-Patchwork-Id: 10695659
From: Mel Gorman
To: Andrew Morton
Cc: Vlastimil Babka, David Rientjes, Andrea Arcangeli, Zi Yan,
 Michal Hocko, LKML, Linux-MM, Mel Gorman
Subject: [PATCH 0/5] Fragmentation avoidance improvements v5
Date: Fri, 23 Nov 2018 11:45:23 +0000
Message-Id: <20181123114528.28802-1-mgorman@techsingularity.net>

There are some big changes due to both Vlastimil's review feedback on v4 and
some oddities spotted while answering his review. In some respects, the
series is slightly less effective but the approach is more consistent and
logical overall. The overhead from the first patch is also lower and stalls
are less harmful in the last patch so overall I think it has much improved.

Changelog since v4
o Clarified changelogs in response to review
o Add a compile-time check on where Normal and DMA32 are (vbabka)
o Restart zone iteration properly in get_page_from_freelist (vbabka)
o Reduce overhead in the page allocation fast path (mel)
o Do not over-boost due to a fragmentation event (vbabka)
o Correct documentation of sysctl (mel)
o Really do not wake kswapd if the calling context forbids it (vbabka,mel)
o Do not shrink slab if boosting watermarks as premature reclaim of slab
  can lead to regressions in IO benchmarks (mel)
o Take zone lock when boosting watermarks if necessary (vbabka)

Changelog since v3
o Rebase to 4.20-rc3
o Remove a stupid warning from the last patch

Changelog since v2
o Drop patch 5 as it was borderline
o Decrease timeout when stalling on fragmentation events

Changelog since v1
o Rebase to v4.20-rc1 for the THP __GFP_THISNODE patch in particular
o Add tracepoint to record fragmentation stall durations
o Add vmstat event to record that a fragmentation stall occurred
o Stalls now alter watermark boosting
o Stalls occur only when the allocation is about to fail

It has been noted before that fragmentation avoidance (aka
anti-fragmentation) is not perfect. Given sufficient time or an adverse
workload, memory gets fragmented and the long-term success of high-order
allocations degrades. This series defines an adverse workload, defines
external fragmentation events (including serious ones) and contains patches
that reduce the level of those fragmentation events.

The workload and its consequences are described in more detail in the
changelogs. However, from patch 1, this is a high-level summary of the
adverse workload. The exact details are found in the mmtests implementation.

The broad details of the workload are as follows:

1. Create an XFS filesystem (not specified in the configuration but done
   as part of the testing for this patch)
2. Start 4 fio threads that write a number of 64K files inefficiently.
   Inefficiently means that files are created on first access and not
   created in advance (fio parameter create_on_open=1) and fallocate is
   not used (fallocate=none). With multiple IO issuers this creates a mix
   of slab and page cache allocations over time. The total size of the
   files is 150% physical memory so that the slabs and page cache pages
   get mixed
3. Warm up a number of fio read-only threads accessing the same files
   created in step 2. This part runs for the same length of time it
   took to create the files. It'll fault back in old data and further
   interleave slab and page cache allocations. As it's now low on
   memory due to step 2, fragmentation occurs as pageblocks get
   stolen.
4. While step 3 is still running, start a process that tries to allocate
   75% of memory as huge pages with a number of threads. The number of
   threads is based on (NR_CPUS_SOCKET - NR_FIO_THREADS)/4 to avoid THP
   threads contending with fio, any other threads or forcing cross-NUMA
   scheduling. Note that the test has not been used on a machine with
   fewer than 8 cores. The benchmark records whether huge pages were
   allocated and what the fault latency was in microseconds
5. Measure the number of events potentially causing external fragmentation,
   the fault latency and the huge page allocation success rate.
6. Cleanup

Overall the series reduces external fragmentation causing events by over 94%
on 1 and 2 socket machines, which in turn impacts high-order allocation
success rates over the long term.
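
For readers who want to reproduce the sizing of the workload, the
arithmetic in steps 2 and 4 can be sketched as below. This is only an
illustration and not the mmtests implementation: the names
(plan_workload, WorkloadPlan), the choice to round down, and the reading
of NR_CPUS_SOCKET as "CPUs per socket" are assumptions.

```python
from dataclasses import dataclass

FILE_SIZE = 64 * 1024  # step 2 writes 64K files


@dataclass
class WorkloadPlan:
    total_file_bytes: int  # step 2: 150% of physical memory
    nr_files: int          # number of 64K files needed to reach that total
    thp_threads: int       # step 4: (NR_CPUS_SOCKET - NR_FIO_THREADS)/4
    thp_target_bytes: int  # step 4: 75% of physical memory


def plan_workload(mem_bytes, cpus_per_socket, fio_threads=4):
    """Derive the workload sizes from memory size and CPU count.

    Rounding down everywhere is an assumption of this sketch.
    """
    total_file_bytes = mem_bytes * 3 // 2
    return WorkloadPlan(
        total_file_bytes=total_file_bytes,
        nr_files=total_file_bytes // FILE_SIZE,
        thp_threads=(cpus_per_socket - fio_threads) // 4,
        thp_target_bytes=mem_bytes * 3 // 4,
    )


if __name__ == "__main__":
    # Hypothetical machine: 16G of memory, 12 CPUs per socket.
    print(plan_workload(16 * 1024**3, 12))
```

With fewer than 8 CPUs per socket the thread count in this sketch drops
to zero, which is consistent with the note in step 4 that the test has
not been used on smaller machines.
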
There are differences in latencies and high-order allocation success rates.
Latencies are a mixed bag as they are vulnerable to exact system state and
whether allocations succeeded so they are treated as a secondary metric.

Patch 1 uses lower zones if they are populated and have free memory
	instead of fragmenting a higher zone. It's special cased to
	handle a Normal->DMA32 fallback with the reasons explained
	in the changelog.

Patches 2-4 boost watermarks temporarily when an external fragmentation
	event occurs. kswapd wakes to reclaim a small amount of old memory
	and then wakes kcompactd on completion to recover the system
	slightly. This introduces some overhead in the slowpath. The level
	of boosting can be tuned or disabled depending on the tolerance
	for fragmentation vs allocation latency.

Patch 5 stalls some movable allocation requests to let kswapd from patch 4
	make some progress. The duration of the stalls is very low but it
	is possible to tune the system to avoid fragmentation events if
	larger stalls can be tolerated.

The bulk of the improvement in fragmentation avoidance is from patches
1-4 but patch 5 can deal with a rare corner case and provides the option
of tuning a system for THP allocation success rates in exchange for
some stalls to control fragmentation.

 Documentation/sysctl/vm.txt   |  44 +++++++
 include/linux/mm.h            |   2 +
 include/linux/mmzone.h        |  14 ++-
 include/linux/vm_event_item.h |   1 +
 include/trace/events/kmem.h   |  21 ++++
 kernel/sysctl.c               |  18 +++
 mm/compaction.c               |   2 +-
 mm/internal.h                 |  15 ++-
 mm/page_alloc.c               | 263 ++++++++++++++++++++++++++++++++++++++----
 mm/vmscan.c                   | 136 ++++++++++++++++++++--
 mm/vmstat.c                   |   1 +
 11 files changed, 473 insertions(+), 44 deletions(-)
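
As a rough illustration of the temporary boosting described for patches
2-4, the following toy model may help. It is not the code in the patches:
the class, its method names, the per-event step and the cap are all
assumptions made for the sketch. It only models the sequence "a
fragmentation event raises the watermark, background reclaim runs, the
boost is dropped and compaction is signalled".

```python
class ZoneModel:
    """Toy model of a zone whose high watermark is temporarily boosted."""

    def __init__(self, high_watermark, boost_step, max_boost):
        self.high_watermark = high_watermark  # pages
        self.boost_step = boost_step          # pages added per event (assumed)
        self.max_boost = max_boost            # cap in pages; 0 disables boosting
        self.boost = 0

    def fragmentation_event(self):
        """Raise the boost by one step, capped at max_boost.

        Returns True if background reclaim should be woken.
        """
        if self.max_boost == 0:
            return False
        self.boost = min(self.boost + self.boost_step, self.max_boost)
        return True

    def effective_watermark(self):
        """Watermark that reclaim works towards while boosted."""
        return self.high_watermark + self.boost

    def reclaim_done(self):
        """Drop the boost once reclaim has finished.

        Returns True if there was a boost, i.e. compaction should now run.
        """
        had_boost = self.boost > 0
        self.boost = 0
        return had_boost


if __name__ == "__main__":
    zone = ZoneModel(high_watermark=1000, boost_step=512, max_boost=1200)
    zone.fragmentation_event()
    print(zone.effective_watermark())
    zone.reclaim_done()
    print(zone.effective_watermark())
```

The cap reflects the "do not over-boost" changelog entry and the zero
case reflects that boosting can be disabled; how the real series
computes either value is described in the individual patch changelogs.
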