From patchwork Fri Sep 1 06:21:25 2023
X-Patchwork-Submitter: Abel Wu
X-Patchwork-Id: 13372076
From: Abel Wu <wuyun.abel@bytedance.com>
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
    Andrew Morton, Shakeel Butt, Roman Gushchin, Michal Hocko,
    Johannes Weiner, Yosry Ahmed, "Matthew Wilcox (Oracle)", Yu Zhao,
    Kefeng Wang, Abel Wu, Yafang Shao, Kuniyuki Iwashima,
    Martin KaFai Lau, Breno Leitao, Alexander Mikhalitsyn,
    David Howells, Jason Xing
Cc: linux-kernel@vger.kernel.org (open list),
    netdev@vger.kernel.org (open list:NETWORKING [GENERAL]),
    linux-mm@kvack.org (open list:MEMORY MANAGEMENT)
Subject: [RFC PATCH net-next 0/3] sock: Be aware of memcg pressure on alloc
Date: Fri, 1 Sep 2023 14:21:25 +0800
Message-Id: <20230901062141.51972-1-wuyun.abel@bytedance.com>
As a cloud service provider, we encountered a problem in our production
environment during the transition from cgroup v1 to v2 (partly due to the
heavy overhead of accounting socket memory in v1). Say a workload behaves
fine in cgroup v1 with its memcg limit configured to 10GB of memory plus
another 1GB of tcpmem, but performs poorly (or is even OOM-killed) in v2
with an 11GB memory limit due to bursty socket memory usage, since cgroup
v2 has no separate limit for socket memory and largely relies on workloads
doing traffic control themselves.

It is rational for workloads to build some traffic control to better
utilize the resources they bought, but from the kernel's point of view it
is also reasonable to suppress the allocation of socket memory once there
is a shortage of free memory, given that performance degradation is
usually better than failure.

This patchset aims to be more conservative on allocation for
pressure-aware sockets under global and/or memcg pressure, to avoid
further memstall or possibly OOM in such cases. The patchset includes:

  1/3: simple code cleanup, no functional change intended.
  2/3: record the memcg pressure level to enable fine-grained control.
  3/3: throttle allocation for pressure-aware sockets under pressure.

The whole patchset focuses on pressure-aware protocols, and should have
little or no impact on pressure-unaware protocols like UDP.
Tested on an Intel Xeon(R) Platinum 8260, a dual-socket machine containing
2 NUMA nodes each of which has 24C/48T. All benchmarks were run inside a
separate memcg on a clean host.

  baseline: net-next c639a708a0b8
  compare : baseline + patchset

  case             load        baseline(std%)  compare%( std%)
  tbench-loopback  thread-24    1.00 (  0.50)   -0.98 (  0.87)
  tbench-loopback  thread-48    1.00 (  0.76)   -0.29 (  0.92)
  tbench-loopback  thread-72    1.00 (  0.75)   +1.51 (  0.14)
  tbench-loopback  thread-96    1.00 (  4.11)   +1.29 (  3.73)
  tbench-loopback  thread-192   1.00 (  3.52)   +1.44 (  3.30)
  TCP_RR           thread-24    1.00 (  1.87)   -0.87 (  2.40)
  TCP_RR           thread-48    1.00 (  0.92)   -0.22 (  1.61)
  TCP_RR           thread-72    1.00 (  2.35)   +2.42 (  2.27)
  TCP_RR           thread-96    1.00 (  2.66)   -1.37 (  3.02)
  TCP_RR           thread-192   1.00 ( 13.25)   +0.29 ( 11.80)
  TCP_STREAM       thread-24    1.00 (  1.26)   -0.75 (  0.87)
  TCP_STREAM       thread-48    1.00 (  0.29)   -1.55 (  0.14)
  TCP_STREAM       thread-72    1.00 (  0.05)   -1.59 (  0.05)
  TCP_STREAM       thread-96    1.00 (  0.19)   -0.06 (  0.29)
  TCP_STREAM       thread-192   1.00 (  0.23)   -0.01 (  0.28)
  UDP_RR           thread-24    1.00 (  2.27)   +0.33 (  2.82)
  UDP_RR           thread-48    1.00 (  1.25)   -0.30 (  1.21)
  UDP_RR           thread-72    1.00 (  2.54)   +2.99 (  2.34)
  UDP_RR           thread-96    1.00 (  4.76)   +2.49 (  2.19)
  UDP_RR           thread-192   1.00 ( 14.43)   -0.02 ( 12.98)
  UDP_STREAM       thread-24    1.00 (107.41)   -0.48 (106.93)
  UDP_STREAM       thread-48    1.00 (100.85)   +1.38 (100.59)
  UDP_STREAM       thread-72    1.00 (103.43)   +1.40 (103.48)
  UDP_STREAM       thread-96    1.00 ( 99.91)   -0.25 (100.06)
  UDP_STREAM       thread-192   1.00 (109.83)   -3.67 (104.12)

Since patch 3 moves the traversal of the cgroup hierarchy forward for
pressure-aware protocols, which could turn a conditional overhead into a
constant one, tests running inside 5-level-deep cgroups were also
performed.
  case             load        baseline(std%)  compare%( std%)
  tbench-loopback  thread-24    1.00 (  0.59)   +0.68 (  0.09)
  tbench-loopback  thread-48    1.00 (  0.16)   +0.01 (  0.26)
  tbench-loopback  thread-72    1.00 (  0.34)   -0.67 (  0.48)
  tbench-loopback  thread-96    1.00 (  4.40)   -3.27 (  4.84)
  tbench-loopback  thread-192   1.00 (  0.49)   -1.07 (  1.18)
  TCP_RR           thread-24    1.00 (  2.40)   -0.34 (  2.49)
  TCP_RR           thread-48    1.00 (  1.62)   -0.48 (  1.35)
  TCP_RR           thread-72    1.00 (  1.26)   +0.46 (  0.95)
  TCP_RR           thread-96    1.00 (  2.98)   +0.13 (  2.64)
  TCP_RR           thread-192   1.00 ( 13.75)   -0.20 ( 15.42)
  TCP_STREAM       thread-24    1.00 (  0.21)   +0.68 (  1.02)
  TCP_STREAM       thread-48    1.00 (  0.20)   -1.41 (  0.01)
  TCP_STREAM       thread-72    1.00 (  0.09)   -1.23 (  0.19)
  TCP_STREAM       thread-96    1.00 (  0.01)   +0.01 (  0.01)
  TCP_STREAM       thread-192   1.00 (  0.20)   -0.02 (  0.25)
  UDP_RR           thread-24    1.00 (  2.20)   +0.84 ( 17.45)
  UDP_RR           thread-48    1.00 (  1.34)   -0.73 (  1.12)
  UDP_RR           thread-72    1.00 (  2.32)   +0.49 (  2.11)
  UDP_RR           thread-96    1.00 (  2.36)   +0.53 (  2.42)
  UDP_RR           thread-192   1.00 ( 16.34)   -0.67 ( 14.06)
  UDP_STREAM       thread-24    1.00 (106.55)   -0.70 (107.13)
  UDP_STREAM       thread-48    1.00 (105.11)   +1.60 (103.48)
  UDP_STREAM       thread-72    1.00 (100.60)   +1.98 (101.13)
  UDP_STREAM       thread-96    1.00 ( 99.91)   +2.59 (101.04)
  UDP_STREAM       thread-192   1.00 (135.39)   -2.51 (108.00)

As expected, no obvious performance gain or loss was observed.

As for the issue we encountered, this patchset provides better worst-case
behavior: such OOM cases are reduced to some extent, while further
fine-grained traffic control remains something the workloads need to
consider for themselves.

Comments are welcome! Thanks!

Abel Wu (3):
  sock: Code cleanup on __sk_mem_raise_allocated()
  net-memcg: Record pressure level when under pressure
  sock: Throttle pressure-aware sockets under pressure

 include/linux/memcontrol.h | 39 +++++++++++++++++++++++++----
 include/net/sock.h         |  2 +-
 include/net/tcp.h          |  2 +-
 mm/vmpressure.c            |  9 ++++++-
 net/core/sock.c            | 51 +++++++++++++++++++++++++++++---------
 5 files changed, 83 insertions(+), 20 deletions(-)