From patchwork Thu Feb 1 15:49:01 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chengming Zhou X-Patchwork-Id: 13541307 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 23884C48291 for ; Thu, 1 Feb 2024 15:50:25 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id AFCE16B0099; Thu, 1 Feb 2024 10:50:24 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id A86E46B009A; Thu, 1 Feb 2024 10:50:24 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 926646B009B; Thu, 1 Feb 2024 10:50:24 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 7CC956B0099 for ; Thu, 1 Feb 2024 10:50:24 -0500 (EST) Received: from smtpin11.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 468C740DCF for ; Thu, 1 Feb 2024 15:50:24 +0000 (UTC) X-FDA: 81743671968.11.7DDAF1E Received: from out-171.mta1.migadu.com (out-171.mta1.migadu.com [95.215.58.171]) by imf29.hostedemail.com (Postfix) with ESMTP id 7C06B120025 for ; Thu, 1 Feb 2024 15:50:21 +0000 (UTC) Authentication-Results: imf29.hostedemail.com; dkim=none; dmarc=fail reason="SPF not aligned (relaxed), No valid DKIM" header.from=bytedance.com (policy=quarantine); spf=pass (imf29.hostedemail.com: domain of chengming.zhou@linux.dev designates 95.215.58.171 as permitted sender) smtp.mailfrom=chengming.zhou@linux.dev ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1706802621; a=rsa-sha256; cv=none; b=rM+GxTCZbnJcRbQPBhr1Q5cvbR2XBmm2mwllsEBJbh7lFsS+e6L2+VICoROyLdVkGSsXpc 4tMGY8+pCMIoOUizB4XZHp0iK1ZqjMpd5ZZt6h057VAM/zsGdJF2eFc8co930OwewXGsmx KTC80uAY0MBCJPBwVnB3gPz10wZBBuA= ARC-Authentication-Results: i=1; imf29.hostedemail.com; dkim=none; dmarc=fail reason="SPF not aligned (relaxed), No valid DKIM" header.from=bytedance.com (policy=quarantine); spf=pass (imf29.hostedemail.com: domain of chengming.zhou@linux.dev designates 95.215.58.171 as permitted sender) smtp.mailfrom=chengming.zhou@linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1706802621; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gRXNvrwRwU8rXbqOKQDS9J1LBJUthFhHZOynRdqSwrM=; b=pz7pCuze+X9xGBxkAl3syhHxqM+LCstkp9OjdvzDcubyS23IcaHEi2Xt72IVCjWXn4+4Hz 91r1u0va5SMQHpvygfu2GI2TamC3RuQMXsSfOT7t8UkqLCgnXZKZZzUsS6bsiRqjJK5DUm rHUMovfg+qqG6AGzY3jRbjCxGXofo2Y= X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Chengming Zhou Date: Thu, 01 Feb 2024 15:49:01 +0000 Subject: [PATCH 1/6] mm/zswap: add more comments in shrink_memcg_cb() MIME-Version: 1.0 Message-Id: <20240201-b4-zswap-invalidate-entry-v1-1-56ed496b6e55@bytedance.com> References: <20240201-b4-zswap-invalidate-entry-v1-0-56ed496b6e55@bytedance.com> In-Reply-To: <20240201-b4-zswap-invalidate-entry-v1-0-56ed496b6e55@bytedance.com> To: Nhat Pham , Johannes Weiner , Andrew Morton , Yosry Ahmed Cc: linux-kernel@vger.kernel.org, Yosry Ahmed , Chengming Zhou , Johannes Weiner , linux-mm@kvack.org X-Migadu-Flow: FLOW_OUT X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 7C06B120025 X-Stat-Signature: nt1tb8kbwji9aea3zcyfxddp3k7hh5ra X-Rspamd-Pre-Result: action=add header; module=dmarc; Action set by DMARC X-Rspam-User: X-Rspam: Yes X-HE-Tag: 1706802621-637694 X-HE-Meta: U2FsdGVkX189vmxnWefXEE2BTXNcUbitOBF9P0aAKXCkmvSscPoKiWhKbE6+4krtDEa5MC8LUBAzkgZcftMi1rrf0LhiLdOp/gFxQLZrFVhL2RPmsWM7GwB0qIp5pcFlKHmfpPTe4ta7AkX7g3Ksyb+M886+G5dtLt3AJCxMlSI0e3+RdurM49U4V0OKpUX5cNxQINamshqJ7Zc+cGT3bW51NjBQwtVVamLaXl5ftxhdXXBjg8YaIN3fnf2Ek+Gkq5i7ENW9pX/92jWb9i7TDyCoBh2rqUlZK6nDVqPHudP6jpXzx2b5Sfc/LNcVgG5oaqPnlYYJZilAbRu5mjDUSqDqe9ZSc9kh5jr15GQRP3w6oDlE9sFnLbWefLeJkXEt3pZTUA5uzH9zsrNSuqaSfZl8Ut/9P1tgZMjxuR17ym5LGXy2u1Hf/5uEb1dwW/mYT7aEruX7xtYc/qayBksUkgrrTU7lyQqJxiWX/WmHH/H4SjuO9AbmPlLqyQzPt89Aqe212U7k/CJ1DY2bAbliQZEFsq+9APyyslOCteJu6Kmgp2TKW/3BnqMGeizJOFEAJXUUczzRfQ4Csj2kAlt3BoEeCzc7S0tF4r9tfAD6uilFq4tb0IvEgA8rWKOdU/Ug/v1axFOeaSM3AlgysxRp9d99d3XXRbIbL6FL1IQiynJsNSh/YXf76mNGrcEnETVVDDzB18sMdnM140zAFlSNprX/XFahl8wiDDR6AQ2bB3nCTiBump38nra7YgZhoKwmBQLa+DP8GfvvsK3Tjb5rV/+viwuJlW2ygWvJtZh6hFtk8ZYHEHGhTbUDpaN8LbcsjjdOisH8gmZU0HRjXvD7T8VZIaBdZQHu5cmmuGBEE6OsmImkRuuRFV4+Xanszvl0FZpOD50WPnYmOE1l6LUW0tEcG1jMij/n9eLMIFNdyo8KWh2XLQmThD5nrKdeSi6wrtdqvKgv23q0GDqSzTt N7uflCx9 VncM669EOhHCNrc4v+ecSN1agWTkJXnFPlFNSoZbU+iW7nWCNna5UfylYfgLCZMQd6TwPnluSU+R0qkecEBwQSHVxMTNGC0ai3W2ib34h436Rcwzk45SVjKCDigMeLXo9GaMLLpEPNgPBEf1hKJcwSBZDt8Hr5/DbSxvhzB1i56eqX4Q9CYfmjHOkIq5NrkU4PgxadPnWDM+b/fPYwIo4SrDv6IJaDF42mNh+VHrIFFGH+FJceXIavAq9+LsmdtwQrrYgWIIGHYiAGNC5RL0Nnz7ilVsZ4qOink91QZ7Dk4gb2kRtIqW2fQtXBYb3mR47JH3LwxhqrSKw5OGYNr88ONqHFGCoqH1en017i4udNdk8wBgiWhOGWbZJYSvQ41O0k53OBANPWt+7jc6WO1incPz2Q2Bij7Mjjs8mnXRu7ZU5ewTM3/p0KvQazxBz05QYJ46i8FbN96ecrSiSvG6U0njwmhr3gX425fxEaAxmu1JP7xQ= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Add more comments in shrink_memcg_cb() to describe the deref dance which is implemented to fix race problem between lru writeback and swapoff, and the reason why we rotate the entry at the beginning. Also fix the stale comments in zswap_writeback_entry(), and add more comments to state that we only deref the tree after we get the swapcache reference. Suggested-by: Yosry Ahmed Suggested-by: Johannes Weiner Signed-off-by: Chengming Zhou Signed-off-by: Johannes Weiner Acked-by: Yosry Ahmed Reviewed-by: Nhat Pham --- mm/zswap.c | 43 ++++++++++++++++++++++++++----------------- 1 file changed, 26 insertions(+), 17 deletions(-) diff --git a/mm/zswap.c b/mm/zswap.c index 4aea03285532..735f1a6ef336 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -1207,10 +1207,12 @@ static int zswap_writeback_entry(struct zswap_entry *entry, /* * folio is locked, and the swapcache is now secured against - * concurrent swapping to and from the slot. Verify that the - * swap entry hasn't been invalidated and recycled behind our - * backs (our zswap_entry reference doesn't prevent that), to - * avoid overwriting a new swap folio with old compressed data. + * concurrent swapping to and from the slot, and concurrent + * swapoff so we can safely dereference the zswap tree here. + * Verify that the swap entry hasn't been invalidated and recycled + * behind our backs, to avoid overwriting a new swap folio with + * old compressed data. Only when this is successful can the entry + * be dereferenced. */ tree = swap_zswap_tree(swpentry); spin_lock(&tree->lock); @@ -1263,22 +1265,29 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o int writeback_result; /* - * Rotate the entry to the tail before unlocking the LRU, - * so that in case of an invalidation race concurrent - * reclaimers don't waste their time on it. + * As soon as we drop the LRU lock, the entry can be freed by + * a concurrent invalidation. This means the following: * - * If writeback succeeds, or failure is due to the entry - * being invalidated by the swap subsystem, the invalidation - * will unlink and free it. + * 1. We extract the swp_entry_t to the stack, allowing + * zswap_writeback_entry() to pin the swap entry and + * then validate the zwap entry against that swap entry's + * tree using pointer value comparison. Only when that + * is successful can the entry be dereferenced. * - * Temporary failures, where the same entry should be tried - * again immediately, almost never happen for this shrinker. - * We don't do any trylocking; -ENOMEM comes closest, - * but that's extremely rare and doesn't happen spuriously - * either. Don't bother distinguishing this case. + * 2. Usually, objects are taken off the LRU for reclaim. In + * this case this isn't possible, because if reclaim fails + * for whatever reason, we have no means of knowing if the + * entry is alive to put it back on the LRU. * - * But since they do exist in theory, the entry cannot just - * be unlinked, or we could leak it. Hence, rotate. + * So rotate it before dropping the lock. If the entry is + * written back or invalidated, the free path will unlink + * it. For failures, rotation is the right thing as well. + * + * Temporary failures, where the same entry should be tried + * again immediately, almost never happen for this shrinker. + * We don't do any trylocking; -ENOMEM comes closest, + * but that's extremely rare and doesn't happen spuriously + * either. Don't bother distinguishing this case. */ list_move_tail(item, &l->list);