[5/5] mm: Stall movable allocations until kswapd progresses during serious external fragmentation event

An event that potentially causes external fragmentation problems has
already been described but there are degrees of severity.  A "serious"
event is defined as one that steals a contiguous range of pages of an order
lower than fragment_stall_order (PAGE_ALLOC_COSTLY_ORDER by default). If a
movable allocation request that is allowed to sleep needs to steal a small
block then it schedules until kswapd makes progress or a timeout passes.
The watermarks are also boosted slightly faster so that kswapd makes
greater effort to reclaim enough pages to avoid the fragmentation event.
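
As an illustration of "schedule until kswapd makes progress or a timeout
passes", the following is a minimal userspace sketch built on a POSIX
condition variable. It is not the code in this patch. The names
reclaim_progress, note_progress and stall_until_progress are hypothetical,
and the kernel would use its own wait queue primitives instead of pthreads.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for "kswapd has made progress" bookkeeping. */
struct reclaim_progress {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	unsigned long generation;	/* bumped whenever the reclaimer frees pages */
};

/* Called by the (hypothetical) reclaimer thread after it frees pages. */
static void note_progress(struct reclaim_progress *p)
{
	pthread_mutex_lock(&p->lock);
	p->generation++;
	pthread_cond_broadcast(&p->cond);
	pthread_mutex_unlock(&p->lock);
}

/*
 * Sleep until the generation moves past 'seen' or timeout_ms elapses.
 * Returns true if progress was observed, false if the wait timed out.
 */
static bool stall_until_progress(struct reclaim_progress *p,
				 unsigned long seen, long timeout_ms)
{
	struct timespec deadline;
	bool progressed;
	int rc = 0;

	assert(timeout_ms >= 0);

	/* pthread_cond_timedwait takes an absolute CLOCK_REALTIME deadline */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&p->lock);
	while (p->generation == seen && rc == 0)
		rc = pthread_cond_timedwait(&p->cond, &p->lock, &deadline);
	progressed = p->generation != seen;
	pthread_mutex_unlock(&p->lock);

	return progressed;
}
```

A caller would sample the generation before deciding to stall and pass it
as 'seen'. The allocation proceeds either way once the call returns, which
mirrors the point that the stall is bounded by the timeout.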

This stall is not guaranteed to avoid serious fragmentation events.
If memory pressure is high enough, the pages freed by kswapd may be
reallocated or the free pages may not be in pageblocks that contain
only movable pages. Furthermore, allocation requests that cannot stall
(e.g. atomic allocations) and unmovable/reclaimable allocations will still
proceed without stalling. The reason is that movable allocations can be
migrated and stalling for kswapd to make progress means that compaction
has targets. Unmovable/reclaimable allocations, on the other hand, do not
benefit from stalling as their pages cannot move.

The worst-case scenario for stalling is a combination of high memory
pressure, where kswapd is having trouble keeping free pages over the
pfmemalloc_reserve, and movable allocations that are fragmenting memory. In
this case, an allocation request may sleep for longer. There are both vmstats
to identify that stalls are happening and a tracepoint to quantify what the
stall durations are. Note that the granularity of the stall detection is
a jiffy so the delay accounting is not precise.
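
To illustrate why tick-granular accounting is imprecise, the sketch below
models a clock that only advances once per tick. It is a simplified model
and not how the tracepoint in this patch computes durations. EXAMPLE_HZ is
an assumed value (real kernels take it from CONFIG_HZ) and the function
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tick rate for the example; one tick is then 4000 microseconds. */
#define EXAMPLE_HZ 250
#define USEC_PER_TICK (1000000ULL / EXAMPLE_HZ)

/* Value a tick-granular counter would hold at time t_us. */
static uint64_t usecs_to_tick(uint64_t t_us)
{
	return t_us / USEC_PER_TICK;
}

/*
 * Duration as seen by code that samples the tick counter before and
 * after a stall, converted back to microseconds.
 */
static uint64_t reported_stall_usecs(uint64_t start_us, uint64_t end_us)
{
	assert(end_us >= start_us);
	return (usecs_to_tick(end_us) - usecs_to_tick(start_us)) * USEC_PER_TICK;
}
```

Under this model, a 3800 microsecond stall that fits inside one tick is
reported as 0, while a 10 microsecond stall that happens to straddle a tick
boundary is reported as 4000 microseconds.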

1-socket Skylake machine
config-global-dhp__workload_thpfioscale XFS (no special madvise)
4 fio threads, 1 THP allocating thread
--------------------------------------

4.20-rc3 extfrag events < order 9:   804694
4.20-rc3+patch:                      408912 (49% reduction)
4.20-rc3+patch1-4:                    18421 (98% reduction)
4.20-rc3+patch1-5:                    16788 (98% reduction)

                                   4.20.0-rc3             4.20.0-rc3
                                   boost-v5r8             stall-v5r8
Amean     fault-base-1      652.71 (   0.00%)      651.40 (   0.20%)
Amean     fault-huge-1      178.93 (   0.00%)      174.49 *   2.48%*

thpfioscale Percentage Faults Huge
                              4.20.0-rc3             4.20.0-rc3
                              boost-v5r8             stall-v5r8
Percentage huge-1        5.12 (   0.00%)        5.56 (   8.77%)

Fragmentation events are further reduced. Note that in previous versions,
they were reduced to negligible levels but the logic has since been corrected
to avoid excessive reclaim and slab shrinkage, which in turn avoids
IO regressions that may not be tolerable.

The latencies and allocation success rates are roughly similar.  Over the
course of 16 minutes, there were 2 stalls due to fragmentation avoidance
for 8 microseconds.
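
For reference, the reduction percentages in the result above can be
reproduced with the helper below. It assumes that each figure is relative
to the unpatched 4.20-rc3 event count and rounded to the nearest integer,
which matches the three figures for this machine. reduction_percent is a
hypothetical helper and not part of the test suite.

```c
#include <stdint.h>

/*
 * Percentage reduction of 'patched' relative to 'baseline', rounded to
 * the nearest integer. Returns 0 if there was no reduction.
 */
static unsigned int reduction_percent(uint64_t baseline, uint64_t patched)
{
	uint64_t diff;

	if (baseline == 0 || patched >= baseline)
		return 0;

	diff = baseline - patched;

	/* Adding baseline / 2 before dividing rounds to nearest. */
	return (unsigned int)((diff * 100 + baseline / 2) / baseline);
}
```

For example, reduction_percent(804694, 408912) gives 49 and
reduction_percent(804694, 16788) gives 98.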

1-socket Skylake machine
global-dhp__workload_thpfioscale-madvhugepage-xfs (MADV_HUGEPAGE)
-----------------------------------------------------------------

4.20-rc3 extfrag events < order 9:  291392
4.20-rc3+patch:                     191187 (34% reduction)
4.20-rc3+patch1-4:                   13464 (95% reduction)
4.20-rc3+patch1-5:                   15089 (95% reduction)

                                   4.20.0-rc3             4.20.0-rc3
                                   boost-v5r8             stall-v5r8
Amean     fault-base-1     1481.67 (   0.00%)        0.00 * 100.00%*
Amean     fault-huge-1     1063.88 (   0.00%)      540.81 *  49.17%*

                              4.20.0-rc3             4.20.0-rc3
                              boost-v5r8             stall-v5r8
Percentage huge-1       83.46 (   0.00%)      100.00 (  19.82%)

The fragmentation events were increased, which is bad, but this is offset
by the fact that THP allocations had a lower latency and a perfect
allocation success rate. There were 102 stalls over the course of 16
minutes for a total stall time of roughly 0.4 seconds.

2-socket Haswell machine
config-global-dhp__workload_thpfioscale XFS (no special madvise)
4 fio threads, 5 THP allocating threads
----------------------------------------------------------------

4.20-rc3 extfrag events < order 9:  215698
4.20-rc3+patch:                     200210 (7% reduction)
4.20-rc3+patch1-4:                   14263 (93% reduction)
4.20-rc3+patch1-5:                   11702 (95% reduction)

                                   4.20.0-rc3             4.20.0-rc3
                                   boost-v5r8             stall-v5r8
Amean     fault-base-5     1306.87 (   0.00%)     1340.96 (  -2.61%)
Amean     fault-huge-5     1348.94 (   0.00%)     2089.44 ( -54.89%)

                              4.20.0-rc3             4.20.0-rc3
                              boost-v5r8             stall-v5r8
Percentage huge-5        7.91 (   0.00%)        2.43 ( -69.26%)

There is a slight reduction in fragmentation events but it's small
enough that it may be due to luck. Unfortunately, both the latencies
and success rates were worse. However, this is highly likely to be due
to luck given that there were just 12 stalls for 76 microseconds. Direct
reclaim was also eliminated but that is likely a coincidence.

2-socket Haswell machine
global-dhp__workload_thpfioscale-madvhugepage-xfs (MADV_HUGEPAGE)
-----------------------------------------------------------------

4.20-rc3 extfrag events < order 9: 166352
4.20-rc3+patch:                    147463 (11% reduction)
4.20-rc3+patch1-4:                  11095 (93% reduction)
4.20-rc3+patch1-5:                  10677 (94% reduction)

thpfioscale Fault Latencies
                                   4.20.0-rc3             4.20.0-rc3
                                   boost-v5r8             stall-v5r8
Amean     fault-base-5     7419.67 (   0.00%)     6853.97 (   7.62%)
Amean     fault-huge-5     3263.80 (   0.00%)     1799.26 *  44.87%*

                              4.20.0-rc3             4.20.0-rc3
                              boost-v5r8             stall-v5r8
Percentage huge-5       87.98 (   0.00%)       98.97 (  12.49%)

The fragmentation events are slightly reduced with the latencies and
allocation success rates much improved.  There were 462 stalls over the
course of 68 minutes with a total stall time of roughly 1.9 seconds.

This patch has a marginal effect on fragmentation rates as it's rare for
the stall logic to actually trigger but the small stalls can be enough for
kswapd to catch up. How much that helps is variable but probably worthwhile
for long-term allocation success rates. It is possible to eliminate
fragmentation events entirely with tuning due to this patch although that
would require careful evaluation to determine if it's worthwhile.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 Documentation/sysctl/vm.txt   |  23 +++++++++
 include/linux/mm.h            |   1 +
 include/linux/mmzone.h        |   2 +
 include/linux/vm_event_item.h |   1 +
 include/trace/events/kmem.h   |  21 +++++++++
 kernel/sysctl.c               |  10 ++++
 mm/internal.h                 |   1 +
 mm/page_alloc.c               | 105 +++++++++++++++++++++++++++++++++++++-----
 mm/vmscan.c                   |   3 +-
 mm/vmstat.c                   |   1 +
 10 files changed, 155 insertions(+), 13 deletions(-)

Message-Id: <20181123114528.28802-6-mgorman@techsingularity.net>
In-Reply-To: <20181123114528.28802-1-mgorman@techsingularity.net>
State: New, archived
From: Mel Gorman <mgorman@techsingularity.net>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>,
    David Rientjes <rientjes@google.com>,
    Andrea Arcangeli <aarcange@redhat.com>,
    Zi Yan <zi.yan@cs.rutgers.edu>,
    Michal Hocko <mhocko@kernel.org>,
    LKML <linux-kernel@vger.kernel.org>,
    Linux-MM <linux-mm@kvack.org>,
    Mel Gorman <mgorman@techsingularity.net>
Subject: [PATCH 5/5] mm: Stall movable allocations until kswapd progresses during serious external fragmentation event
Date: Fri, 23 Nov 2018 11:45:28 +0000

Series: Fragmentation avoidance improvements v5
  [0/5] Fragmentation avoidance improvements v5
  [1/5] mm, page_alloc: Spread allocations across zones before introducing fragmentation
  [2/5] mm: Move zone watermark accesses behind an accessor
  [3/5] mm: Use alloc_flags to record if kswapd can wake
  [4/5] mm: Reclaim small amounts of memory when an external fragmentation event occurs
  [5/5] mm: Stall movable allocations until kswapd progresses during serious external fragmentation e…