From patchwork Thu Nov 7 23:56:10 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Joanne Koong
X-Patchwork-Id: 13867334
From: Joanne Koong
To: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org
Cc: shakeel.butt@linux.dev, jefflexu@linux.alibaba.com, josef@toxicpanda.com, linux-mm@kvack.org, bernd.schubert@fastmail.fm, kernel-team@meta.com
Subject: [PATCH v4 2/6] mm: skip reclaiming folios in legacy memcg writeback contexts that may block
Date: Thu, 7 Nov 2024 15:56:10 -0800
Message-ID: <20241107235614.3637221-3-joannelkoong@gmail.com>
X-Mailer: git-send-email 2.43.5
In-Reply-To: <20241107235614.3637221-1-joannelkoong@gmail.com>
References: <20241107235614.3637221-1-joannelkoong@gmail.com>
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Currently in shrink_folio_list(), reclaim for folios under writeback
falls into 3 different cases:
1) Reclaim is encountering an excessive number of folios under
   writeback and this folio has both the writeback and reclaim flags
   set
2) Dirty throttling is enabled (this happens if reclaim through cgroup
   is not enabled, if reclaim through cgroupv2 memcg is enabled, or
   if reclaim is on the root cgroup), or if the folio is not marked for
   immediate reclaim, or if the caller does not have __GFP_FS (or
   __GFP_IO if it's going to swap) set
3) Legacy cgroupv1 encounters a folio that already has the reclaim flag
   set and the caller has __GFP_FS (or __GFP_IO if swap) set

In cases 1) and 2), we activate the folio and skip reclaiming it, while
in case 3), we wait for writeback to finish on the folio and then try
to reclaim the folio again. In case 3, we wait on writeback because
cgroupv1 does not have dirty folio throttling; the wait is a mitigation
against the case where there are too many folios in writeback with
nothing else to reclaim.

The issue is that for filesystems where writeback may block, sub-optimal
workarounds may need to be put in place to avoid a potential deadlock
that may arise from reclaim waiting on writeback. (Even though case 3
above is rare given that legacy cgroupv1 is on its way to being
deprecated, this case still needs to be accounted for).
For example, for FUSE filesystems, a temp page gets allocated per dirty
page and the contents of the dirty page are copied over to the temp page
so that writeback can be immediately cleared on the dirty page in order
to avoid the following deadlock:
* single-threaded FUSE server is in the middle of handling a request that
  needs a memory allocation
* memory allocation triggers direct reclaim
* direct reclaim waits on a folio under writeback (eg falls into case 3
  above) that needs to be written back to the FUSE server
* the FUSE server can't write back the folio since it's stuck in direct
  reclaim

In this commit, if legacy memcg encounters a folio with the reclaim flag
set (eg case 3) and the folio belongs to a mapping that has the
AS_WRITEBACK_MAY_BLOCK flag set, the folio will be activated and skip
reclaim (eg default to behavior in case 2) instead.

This allows for the suboptimal workarounds added to address the
"reclaim wait on writeback" deadlock scenario to be removed.

Signed-off-by: Joanne Koong
---
 mm/vmscan.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 749cdc110c74..e9755cb7211b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1110,6 +1110,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		if (writeback && folio_test_reclaim(folio))
 			stat->nr_congested += nr_pages;
 
+		mapping = folio_mapping(folio);
+
 		/*
 		 * If a folio at the tail of the LRU is under writeback, there
 		 * are three cases to consider.
@@ -1129,8 +1131,9 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		 * 2) Global or new memcg reclaim encounters a folio that is
 		 *    not marked for immediate reclaim, or the caller does not
 		 *    have __GFP_FS (or __GFP_IO if it's simply going to swap,
-		 *    not to fs). In this case mark the folio for immediate
-		 *    reclaim and continue scanning.
+		 *    not to fs), or writebacks in the mapping may block.
+		 *    In this case mark the folio for immediate reclaim and
+		 *    continue scanning.
 		 *
 		 * Require may_enter_fs() because we would wait on fs, which
 		 * may not have submitted I/O yet. And the loop driver might
@@ -1165,7 +1168,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 			/* Case 2 above */
 			} else if (writeback_throttling_sane(sc) ||
 			    !folio_test_reclaim(folio) ||
-			    !may_enter_fs(folio, sc->gfp_mask)) {
+			    !may_enter_fs(folio, sc->gfp_mask) ||
+			    (mapping && mapping_writeback_may_block(mapping))) {
 				/*
 				 * This is slightly racy -
 				 * folio_end_writeback() might have
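
Reviewer note: the case selection touched by this patch can be modelled
in userspace. The sketch below is an illustration only, not kernel code.
The names wb_case, wb_inputs and wb_classify() are hypothetical, each
boolean field stands in for the kernel expression named in its comment,
and the case 1 test is taken from the commit message wording rather
than from mm/vmscan.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Outcome for a folio that reclaim finds under writeback. */
enum wb_case {
	WB_CASE1_SKIP = 1,	/* activate the folio, skip reclaim */
	WB_CASE2_SKIP = 2,	/* activate the folio, skip reclaim */
	WB_CASE3_WAIT = 3,	/* wait for writeback, then retry reclaim */
};

/*
 * Hypothetical flattened view of the inputs. Each field stands in for
 * the kernel expression named in its comment.
 */
struct wb_inputs {
	bool excessive_writeback;	/* case 1 precondition (commit message) */
	bool throttling_sane;		/* writeback_throttling_sane(sc) */
	bool folio_reclaim;		/* folio_test_reclaim(folio) */
	bool may_enter_fs;		/* may_enter_fs(folio, sc->gfp_mask) */
	bool has_mapping;		/* mapping != NULL */
	bool wb_may_block;		/* mapping_writeback_may_block(mapping) */
};

/*
 * Pick the case for one folio under writeback. With patched == false
 * the case 2 test matches the removed lines of the diff; with
 * patched == true it also includes the added mapping check.
 */
static enum wb_case wb_classify(const struct wb_inputs *in, bool patched)
{
	assert(in != NULL);

	if (in->excessive_writeback && in->folio_reclaim)
		return WB_CASE1_SKIP;

	if (in->throttling_sane || !in->folio_reclaim || !in->may_enter_fs ||
	    (patched && in->has_mapping && in->wb_may_block))
		return WB_CASE2_SKIP;

	return WB_CASE3_WAIT;
}
```

In this model, a legacy memcg folio (throttling_sane false) with the
reclaim flag set, a caller that may enter the fs, and a mapping whose
writeback may block lands in case 3 when patched is false and in case 2
when patched is true. A mapping without the flag still lands in case 3
either way.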