From patchwork Mon Jul 29 02:35:29 2024
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 13744203
From: Yafang Shao
To: akpm@linux-foundation.org
Cc: ying.huang@intel.com, mgorman@techsingularity.net, linux-mm@kvack.org, Yafang Shao
Subject: [PATCH v2 0/3] mm: Introduce a new sysctl knob vm.pcp_batch_scale_max
Date: Mon, 29 Jul 2024 10:35:29 +0800
Message-Id: <20240729023532.1555-1-laoar.shao@gmail.com>

Background
==========

In our containerized
environment, we have a specific type of container that runs 18
processes, each consuming approximately 6GB of RSS. These are organized
as separate processes rather than threads because the Python Global
Interpreter Lock (GIL) is a bottleneck in a multi-threaded setup. When
these containers exit, other containers hosted on the same machine
experience significant latency spikes.

Investigation
=============

During my investigation of this issue, I found that the latency spikes
were caused by zone->lock contention. This can be illustrated as
follows:

   CPU A (Freer)                         CPU B (Allocator)
  lock zone->lock
  free pages                             lock zone->lock
  unlock zone->lock
                                         alloc pages
                                         unlock zone->lock

If the Freer holds the zone->lock for an extended period, the Allocator
has to wait, and latency spikes occur.

I also wrote a Python script to reproduce the problem on my test
servers. See the details in patch #3. It is worth noting that the
reproducer is based on the upstream kernel.

Experimenting
=============

The more pages are freed in one batch, the longer the zone->lock is
held, so my attempt involved reducing the batch size. After I restricted
the batch to the smallest size, there were no more complaints about
latency spikes.

However, during my experiment, I found that CONFIG_PCP_BATCH_SCALE_MAX
is hard to use in practice, so I try to improve it in this series.

The Proposal
============

This series encompasses two minor refinements to the PCP high watermark
auto-tuning mechanism, along with the introduction of a new sysctl knob
that serves as a more practical alternative to the previous
configuration method.

Future work
===========

To ultimately mitigate the zone->lock contention issue, several
suggestions have been proposed.
One approach involves dividing large zones into multiple smaller zones,
as suggested by Matthew[0]. Another entails splitting the zone->lock
using a mechanism similar to memory arenas and moving away from relying
solely on zone_id to identify the range of free lists a particular page
belongs to, as suggested by Mel[1]. However, implementing these
solutions is likely to require a longer development effort.

Link: https://lore.kernel.org/linux-mm/ZnTrZ9mcAIRodnjx@casper.infradead.org/ [0]
Link: https://lore.kernel.org/linux-mm/20240705130943.htsyhhhzbcptnkcu@techsingularity.net/ [1]

Changes:
- v1-> v2: Commit log refinement
- v1: mm/page_alloc: Introduce a new sysctl knob vm.pcp_batch_scale_max
  https://lwn.net/Articles/981069/
- mm: Enable setting -1 for vm.percpu_pagelist_high_fraction to set the
  minimum pagelist
  https://lore.kernel.org/linux-mm/20240701142046.6050-1-laoar.shao@gmail.com/

Yafang Shao (3):
  mm/page_alloc: A minor fix to the calculation of pcp->free_count
  mm/page_alloc: Avoid changing pcp->high decaying when adjusting
    CONFIG_PCP_BATCH_SCALE_MAX
  mm/page_alloc: Introduce a new sysctl knob vm.pcp_batch_scale_max

 Documentation/admin-guide/sysctl/vm.rst | 17 +++++++++++
 mm/Kconfig                              | 11 -------
 mm/page_alloc.c                         | 40 ++++++++++++++++++-------
 3 files changed, 47 insertions(+), 21 deletions(-)
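
Illustration (not part of the series): the relation between the batch
scale factor and the amount of work done under one zone->lock hold,
described in the Experimenting section, can be sketched with a little
arithmetic. This is only a rough model under stated assumptions: the
base batch of 63 pages and the scale range 0..6 are assumed values for
illustration, the function name is hypothetical, and the model takes the
largest batch to be the base batch shifted left by the scale factor.

```python
# Rough model: upper bound on pages freed under a single zone->lock hold.
# ASSUMPTIONS (not taken from this series): base batch of 63 pages,
# scale factors 0..6, and max batch == base_batch << scale.

ASSUMED_BASE_BATCH = 63          # assumed per-CPU pageset batch size
ASSUMED_SCALE_RANGE = range(7)   # assumed allowed scale factors: 0..6


def max_pages_per_lock_hold(base_batch: int, scale: int) -> int:
    """Return the assumed upper bound on pages freed per lock hold."""
    if base_batch < 1 or scale < 0:
        raise ValueError("base_batch must be >= 1 and scale must be >= 0")
    return base_batch << scale


if __name__ == "__main__":
    for scale in ASSUMED_SCALE_RANGE:
        pages = max_pages_per_lock_hold(ASSUMED_BASE_BATCH, scale)
        print(f"scale={scale}: up to {pages} pages per lock hold")
```

Under these assumptions, each step down in the scale factor halves the
worst-case number of pages freed while the lock is held, which is why
the smallest setting gives the shortest wait for a concurrent allocator.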