From patchwork Thu Nov 30 15:36:53 2023
X-Patchwork-Submitter: Dan Schatzberg
X-Patchwork-Id: 13474635
From: Dan Schatzberg
To: Johannes Weiner, Roman Gushchin, Yosry Ahmed, Huan Yang
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
 linux-mm@kvack.org, Michal Hocko, Shakeel Butt, Muchun Song,
 Andrew Morton, David Hildenbrand, Matthew Wilcox, Huang Ying,
 Kefeng Wang, Peter Xu, "Vishal Moola (Oracle)", Yue Zhao, Hugh Dickins
Subject: [PATCH 0/1] Add swappiness argument to memory.reclaim
Date: Thu, 30 Nov 2023 07:36:53 -0800
Message-Id: <20231130153658.527556-1-schatzberg.dan@gmail.com>

(Sorry for the resend -
forgot to cc the mailing lists)

This patch proposes augmenting the memory.reclaim interface with a
swappiness= argument that overrides the swappiness value for that
instance of proactive reclaim.

Userspace proactive reclaimers use the memory.reclaim interface to
trigger reclaim. The memory.reclaim interface offers no way to affect
the balance of file vs anon during proactive reclaim. The only approach
is to adjust the vm.swappiness setting. However, there are a few reasons
we want to control the balance of file vs anon during proactive reclaim
separately from reactive reclaim:

* Swapout should be limited to manage SSD write endurance. In near-OOM
  situations we are fine with lots of swap-out to avoid OOMs. As these
  are typically rare events, they have relatively little impact on write
  endurance. However, proactive reclaim runs continuously and so its
  impact on SSD write endurance is more significant. Therefore it is
  desirable to control swap-out for proactive reclaim separately from
  reactive reclaim.

* Some userspace OOM killers like systemd-oomd[1] support OOM killing on
  swap exhaustion. This makes sense if the swap exhaustion is triggered
  by reactive reclaim but less so if it is triggered by proactive
  reclaim (e.g. one could see OOMs when free memory is ample but anon is
  just particularly cold). Therefore, it's desirable to have proactive
  reclaim reduce or stop swap-out before the threshold at which OOM
  killing occurs.

In the case of Meta's Senpai proactive reclaimer, we adjust
vm.swappiness before writes to memory.reclaim[2]. This has been in
production for nearly two years and has addressed our need to control
proactive vs reactive reclaim behavior, but it is still not ideal for a
number of reasons:

* vm.swappiness is a global setting; adjusting it can race or interfere
  with other system administration that wishes to control
  vm.swappiness. In our case, we need to disable Senpai before adjusting
  vm.swappiness.
* vm.swappiness is stateful, so a crash or restart of Senpai can leave
  a misconfigured setting. This requires some additional management to
  record the "desired" setting and ensure Senpai always adjusts to it.

With this patch, we avoid these downsides of adjusting vm.swappiness
globally.

Previously, this exact interface addition was proposed by Yosry[3]. In
response, Roman proposed instead an interface to specify precise
file/anon/slab reclaim amounts[4]. More recently, Huan proposed the same
interface[5], and others again questioned whether it was the proper
one.

Previous proposals sought to use this to allow proactive reclaimers to
effectively perform a custom reclaim algorithm by issuing proactive
reclaim with different settings to control file vs anon reclaim (e.g. to
only reclaim anon from some applications). Responses argued that
adjusting swappiness is a poor interface for custom reclaim.

In contrast, I argue in favor of a swappiness setting not as a way to
implement custom reclaim algorithms but rather as a way to bias the
balance of anon vs file to reflect the differences between proactive
and reactive reclaim. In this context, swappiness is the existing
interface for controlling this balance, and this patch simply allows it
to be configured differently for proactive vs reactive reclaim.

Specifying explicit amounts of anon vs file pages to reclaim feels
inappropriate for this purpose. Proactive reclaimers are unaware of the
relative age of file vs anon for a cgroup, which makes it difficult to
manage proactive reclaim of different memory pools. A proactive
reclaimer would need to issue some number of anon reclaim attempts
separately from its file reclaim attempts, which seems brittle given
that it's difficult to observe the impact.
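To make the contrast between the two approaches concrete, here is a
minimal Python sketch of a userspace reclaimer doing it both ways. It is
illustrative only and is not Senpai's code. The helper names, the request
format "<bytes> swappiness=<n>", and the 0..200 range (borrowed from
vm.swappiness) are assumptions made for this example, not something this
cover letter specifies.

```python
import contextlib
from pathlib import Path
from typing import Optional

# Global sysctl file; passed as a parameter below so the sketch can also
# be exercised against ordinary files.
VM_SWAPPINESS = Path("/proc/sys/vm/swappiness")


def reclaim_request(nr_bytes: int, swappiness: Optional[int] = None) -> str:
    """Build the string written to a cgroup's memory.reclaim file."""
    if nr_bytes <= 0:
        raise ValueError("nr_bytes must be positive")
    if swappiness is None:
        return str(nr_bytes)
    # Assumption: the per-request value uses the vm.swappiness range.
    if not 0 <= swappiness <= 200:
        raise ValueError("swappiness must be in 0..200")
    return f"{nr_bytes} swappiness={swappiness}"


@contextlib.contextmanager
def global_swappiness(value: int, sysctl: Path = VM_SWAPPINESS):
    """Workaround: temporarily override the global setting."""
    saved = sysctl.read_text().strip()
    sysctl.write_text(f"{value}\n")
    try:
        yield
    finally:
        # Never runs if the process is killed mid-reclaim, which is the
        # "stateful" hazard; other writers can also race with this.
        sysctl.write_text(f"{saved}\n")


def reclaim_with_sysctl(reclaim_file: Path, nr_bytes: int, swappiness: int,
                        sysctl: Path = VM_SWAPPINESS) -> None:
    """Workaround path: three writes, with global side effects."""
    with global_swappiness(swappiness, sysctl):
        reclaim_file.write_text(reclaim_request(nr_bytes))


def reclaim_with_argument(reclaim_file: Path, nr_bytes: int,
                          swappiness: int) -> None:
    """Proposed path: one write, scoped to this reclaim request."""
    reclaim_file.write_text(reclaim_request(nr_bytes, swappiness))
```

With the per-request argument there is nothing to restore afterwards, so
a crash between writes cannot leave the system with the wrong global
vm.swappiness.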
[1] https://www.freedesktop.org/software/systemd/man/latest/systemd-oomd.service.html
[2] https://github.com/facebookincubator/oomd/blob/main/src/oomd/plugins/Senpai.cpp#L585-L598
[3] https://lore.kernel.org/linux-mm/CAJD7tkbDpyoODveCsnaqBBMZEkDvshXJmNdbk51yKSNgD7aGdg@mail.gmail.com/
[4] https://lore.kernel.org/linux-mm/YoPHtHXzpK51F%2F1Z@carbon/
[5] https://lore.kernel.org/lkml/20231108065818.19932-1-link@vivo.com/

Dan Schatzberg (1):
  mm: add swapiness= arg to memory.reclaim

 include/linux/swap.h |  3 ++-
 mm/memcontrol.c      | 55 +++++++++++++++++++++++++++++++++++---------
 mm/vmscan.c          | 13 +++++++++--
 3 files changed, 57 insertions(+), 14 deletions(-)