From patchwork Sat Jul 15 06:31:11 2023
X-Patchwork-Submitter: Amir Goldstein
X-Patchwork-Id: 13314389
From: Amir Goldstein
To: Greg Kroah-Hartman
Cc: Sasha Levin, Leah Rumancik, Chandan Babu R, "Darrick J. Wong",
    linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    stable@vger.kernel.org, Dave Chinner, Dave Chinner
Subject: [PATCH 6.1 1/4] xfs: explicitly specify cpu when forcing inodegc delayed work to run immediately
Date: Sat, 15 Jul 2023 09:31:11 +0300
Message-Id: <20230715063114.1485841-2-amir73il@gmail.com>
In-Reply-To: <20230715063114.1485841-1-amir73il@gmail.com>
References: <20230715063114.1485841-1-amir73il@gmail.com>
X-Mailing-List: linux-xfs@vger.kernel.org

From: "Darrick J. Wong"

commit 03e0add80f4cf3f7393edb574eeb3a89a1db7758 upstream.
I've been noticing odd racing behavior in the inodegc code that could
only be explained by one cpu adding an inode to its inactivation llist
at the same time that another cpu is processing that cpu's llist.
Preemption is disabled between get/put_cpu_ptr, so the only explanation
is scheduler mayhem.  I inserted the following debug code into
xfs_inodegc_worker (see the next patch):

	ASSERT(gc->cpu == smp_processor_id());

This assertion tripped during overnight tests on the arm64 machines, but
curiously not on x86_64.  I think we haven't observed any resource leaks
here because the lockfree list code can handle simultaneous llist_add
and llist_del_all functions operating on the same list.  However, the
whole point of having percpu inodegc lists is to take advantage of warm
memory caches by inactivating inodes on the last processor to touch the
inode.

The incorrect scheduling seems to occur after an inodegc worker is
subjected to mod_delayed_work().  This wraps mod_delayed_work_on with
WORK_CPU_UNBOUND specified as the cpu number.  Unbound allows for
scheduling on any cpu, not necessarily the same one that scheduled the
work.

Because preemption is disabled for as long as we have the gc pointer, I
think it's safe to use current_cpu() (aka smp_processor_id) to queue
the delayed work item on the correct cpu.

Fixes: 7cf2b0f9611b ("xfs: bound maximum wait time for inodegc work")
Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner
Signed-off-by: Amir Goldstein
Acked-by: Darrick J. Wong
---
 fs/xfs/xfs_icache.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index eae7427062cf..536885f8b8a8 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -2052,7 +2052,8 @@ xfs_inodegc_queue(
 		queue_delay = 0;

 	trace_xfs_inodegc_queue(mp, __return_address);
-	mod_delayed_work(mp->m_inodegc_wq, &gc->work, queue_delay);
+	mod_delayed_work_on(current_cpu(), mp->m_inodegc_wq, &gc->work,
+			queue_delay);
 	put_cpu_ptr(gc);

 	if (xfs_inodegc_want_flush_work(ip, items, shrinker_hits)) {
@@ -2096,7 +2097,8 @@ xfs_inodegc_cpu_dead(

 	if (xfs_is_inodegc_enabled(mp)) {
 		trace_xfs_inodegc_queue(mp, __return_address);
-		mod_delayed_work(mp->m_inodegc_wq, &gc->work, 0);
+		mod_delayed_work_on(current_cpu(), mp->m_inodegc_wq, &gc->work,
+				0);
 	}
 	put_cpu_ptr(gc);
 }
From: Amir Goldstein
To: Greg Kroah-Hartman
Cc: Sasha Levin, Leah Rumancik, Chandan Babu R, "Darrick J. Wong",
    linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    stable@vger.kernel.org, Dave Chinner, Dave Chinner
Subject: [PATCH 6.1 2/4] xfs: check that per-cpu inodegc workers actually run on that cpu
Date: Sat, 15 Jul 2023 09:31:12 +0300
Message-Id: <20230715063114.1485841-3-amir73il@gmail.com>
In-Reply-To: <20230715063114.1485841-1-amir73il@gmail.com>
References: <20230715063114.1485841-1-amir73il@gmail.com>
X-Mailing-List: linux-xfs@vger.kernel.org

From: "Darrick J. Wong"

commit b37c4c8339cd394ea6b8b415026603320a185651 upstream.

Now that we've allegedly worked out the problem of the per-cpu inodegc
workers being scheduled on the wrong cpu, let's put in a debugging knob
to let us know if a worker ever gets mis-scheduled again.

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner
Signed-off-by: Amir Goldstein
Acked-by: Darrick J. Wong
---
 fs/xfs/xfs_icache.c | 2 ++
 fs/xfs/xfs_mount.h  | 3 +++
 fs/xfs/xfs_super.c  | 3 +++
 3 files changed, 8 insertions(+)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 536885f8b8a8..7ce262dcabca 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1848,6 +1848,8 @@ xfs_inodegc_worker(
 	struct llist_node	*node = llist_del_all(&gc->list);
 	struct xfs_inode	*ip, *n;

+	ASSERT(gc->cpu == smp_processor_id());
+
 	WRITE_ONCE(gc->items, 0);

 	if (!node)
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 8aca2cc173ac..69ddd5319634 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -66,6 +66,9 @@ struct xfs_inodegc {
 	/* approximate count of inodes in the list */
 	unsigned int		items;
 	unsigned int		shrinker_hits;
+#if defined(DEBUG) || defined(XFS_WARN)
+	unsigned int		cpu;
+#endif
 };

 /*
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index ee4b429a2f2c..4b179526913f 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1084,6 +1084,9 @@ xfs_inodegc_init_percpu(

 	for_each_possible_cpu(cpu) {
 		gc = per_cpu_ptr(mp->m_inodegc, cpu);
+#if defined(DEBUG) || defined(XFS_WARN)
+		gc->cpu = cpu;
+#endif
 		init_llist_head(&gc->list);
 		gc->items = 0;
 		INIT_DELAYED_WORK(&gc->work, xfs_inodegc_worker);
From: Amir Goldstein
To: Greg Kroah-Hartman
Cc: Sasha Levin, Leah Rumancik, Chandan Babu R, "Darrick J. Wong",
    linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    stable@vger.kernel.org, Dave Chinner, Dave Chinner
Subject: [PATCH 6.1 3/4] xfs: disable reaping in fscounters scrub
Date: Sat, 15 Jul 2023 09:31:13 +0300
Message-Id: <20230715063114.1485841-4-amir73il@gmail.com>
In-Reply-To: <20230715063114.1485841-1-amir73il@gmail.com>
References: <20230715063114.1485841-1-amir73il@gmail.com>
X-Mailing-List: linux-xfs@vger.kernel.org

From: "Darrick J. Wong"

commit 2d5f38a31980d7090f5bf91021488dc61a0ba8ee upstream.

The fscounters scrub code doesn't work properly because it cannot
quiesce updates to the percpu counters in the filesystem, hence it
returns false corruption reports.  This has been fixed properly in one
of the online repair patchsets that are under review by replacing the
xchk_disable_reaping calls with an exclusive filesystem freeze.
Disabling background gc isn't sufficient to fix the problem.

In other words, scrub doesn't need to call xfs_inodegc_stop, which is
just as well since it wasn't correct to allow scrub to call
xfs_inodegc_start when something else could be calling xfs_inodegc_stop
(e.g. trying to freeze the filesystem).

Neuter the scrubber for now, and remove the xchk_*_reaping functions.

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner
Signed-off-by: Amir Goldstein
Acked-by: Darrick J. Wong
---
 fs/xfs/scrub/common.c     | 26 --------------------------
 fs/xfs/scrub/common.h     |  2 --
 fs/xfs/scrub/fscounters.c | 13 ++++++-------
 fs/xfs/scrub/scrub.c      |  2 --
 fs/xfs/scrub/scrub.h      |  1 -
 5 files changed, 6 insertions(+), 38 deletions(-)

diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 9bbbf20f401b..e71449658ecc 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -865,29 +865,3 @@ xchk_ilock_inverted(
 	}
 	return -EDEADLOCK;
 }
-
-/* Pause background reaping of resources. */
-void
-xchk_stop_reaping(
-	struct xfs_scrub	*sc)
-{
-	sc->flags |= XCHK_REAPING_DISABLED;
-	xfs_blockgc_stop(sc->mp);
-	xfs_inodegc_stop(sc->mp);
-}
-
-/* Restart background reaping of resources. */
-void
-xchk_start_reaping(
-	struct xfs_scrub	*sc)
-{
-	/*
-	 * Readonly filesystems do not perform inactivation or speculative
-	 * preallocation, so there's no need to restart the workers.
-	 */
-	if (!xfs_is_readonly(sc->mp)) {
-		xfs_inodegc_start(sc->mp);
-		xfs_blockgc_start(sc->mp);
-	}
-	sc->flags &= ~XCHK_REAPING_DISABLED;
-}
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 454145db10e7..2ca80102e704 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -148,7 +148,5 @@ static inline bool xchk_skip_xref(struct xfs_scrub_metadata *sm)

 int xchk_metadata_inode_forks(struct xfs_scrub *sc);
 int xchk_ilock_inverted(struct xfs_inode *ip, uint lock_mode);
-void xchk_stop_reaping(struct xfs_scrub *sc);
-void xchk_start_reaping(struct xfs_scrub *sc);

 #endif	/* __XFS_SCRUB_COMMON_H__ */
diff --git a/fs/xfs/scrub/fscounters.c b/fs/xfs/scrub/fscounters.c
index 6a6f8fe7f87c..88d6961e3886 100644
--- a/fs/xfs/scrub/fscounters.c
+++ b/fs/xfs/scrub/fscounters.c
@@ -128,13 +128,6 @@ xchk_setup_fscounters(
 	if (error)
 		return error;

-	/*
-	 * Pause background reclaim while we're scrubbing to reduce the
-	 * likelihood of background perturbations to the counters throwing off
-	 * our calculations.
-	 */
-	xchk_stop_reaping(sc);
-
 	return xchk_trans_alloc(sc, 0);
 }

@@ -353,6 +346,12 @@ xchk_fscounters(
 	if (fdblocks > mp->m_sb.sb_dblocks)
 		xchk_set_corrupt(sc);

+	/*
+	 * XXX: We can't quiesce percpu counter updates, so exit early.
+	 * This can be re-enabled when we gain exclusive freeze functionality.
+	 */
+	return 0;
+
 	/*
 	 * If ifree exceeds icount by more than the minimum variance then
 	 * something's probably wrong with the counters.
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 2e8e400f10a9..95132490fda5 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -171,8 +171,6 @@ xchk_teardown(
 	}
 	if (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)
 		mnt_drop_write_file(sc->file);
-	if (sc->flags & XCHK_REAPING_DISABLED)
-		xchk_start_reaping(sc);
 	if (sc->buf) {
 		kmem_free(sc->buf);
 		sc->buf = NULL;
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 3de5287e98d8..4cb32c27df10 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -88,7 +88,6 @@ struct xfs_scrub {

 /* XCHK state flags grow up from zero, XREP state flags grown down from 2^31 */
 #define XCHK_TRY_HARDER		(1 << 0)  /* can't get resources, try again */
-#define XCHK_REAPING_DISABLED	(1 << 2)  /* background block reaping paused */
 #define XREP_ALREADY_FIXED	(1 << 31) /* checking our repair work */

 /* Metadata scrubbers */
From: Amir Goldstein
To: Greg Kroah-Hartman
Cc: Sasha Levin, Leah Rumancik, Chandan Babu R, "Darrick J. Wong",
    linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    stable@vger.kernel.org, Dave Chinner, Dave Chinner
Subject: [PATCH 6.1 4/4] xfs: fix xfs_inodegc_stop racing with mod_delayed_work
Date: Sat, 15 Jul 2023 09:31:14 +0300
Message-Id: <20230715063114.1485841-5-amir73il@gmail.com>
In-Reply-To: <20230715063114.1485841-1-amir73il@gmail.com>
References: <20230715063114.1485841-1-amir73il@gmail.com>
X-Mailing-List: linux-xfs@vger.kernel.org

From: "Darrick J. Wong"

commit 2254a7396a0ca6309854948ee1c0a33fa4268cec upstream.
syzbot reported this warning from the faux inodegc shrinker that tries
to kick off inodegc work:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 102 at kernel/workqueue.c:1445 __queue_work+0xd44/0x1120 kernel/workqueue.c:1444
RIP: 0010:__queue_work+0xd44/0x1120 kernel/workqueue.c:1444
Call Trace:
 __queue_delayed_work+0x1c8/0x270 kernel/workqueue.c:1672
 mod_delayed_work_on+0xe1/0x220 kernel/workqueue.c:1746
 xfs_inodegc_shrinker_scan fs/xfs/xfs_icache.c:2212 [inline]
 xfs_inodegc_shrinker_scan+0x250/0x4f0 fs/xfs/xfs_icache.c:2191
 do_shrink_slab+0x428/0xaa0 mm/vmscan.c:853
 shrink_slab+0x175/0x660 mm/vmscan.c:1013
 shrink_one+0x502/0x810 mm/vmscan.c:5343
 shrink_many mm/vmscan.c:5394 [inline]
 lru_gen_shrink_node mm/vmscan.c:5511 [inline]
 shrink_node+0x2064/0x35f0 mm/vmscan.c:6459
 kswapd_shrink_node mm/vmscan.c:7262 [inline]
 balance_pgdat+0xa02/0x1ac0 mm/vmscan.c:7452
 kswapd+0x677/0xd60 mm/vmscan.c:7712
 kthread+0x2e8/0x3a0 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308

This warning corresponds to this code in __queue_work:

	/*
	 * For a draining wq, only works from the same workqueue are
	 * allowed. The __WQ_DESTROYING helps to spot the issue that
	 * queues a new work item to a wq after destroy_workqueue(wq).
	 */
	if (unlikely(wq->flags & (__WQ_DESTROYING | __WQ_DRAINING) &&
		     WARN_ON_ONCE(!is_chained_work(wq))))
		return;

For this to trip, we must have a thread draining the inodegc workqueue
and a second thread trying to queue inodegc work to that workqueue.
This can happen if freezing or a ro remount races with reclaim poking
our faux inodegc shrinker and another thread dropping an unlinked
O_RDONLY file:

Thread 0                   Thread 1                   Thread 2

xfs_inodegc_stop
                           xfs_inodegc_shrinker_scan
                           xfs_is_inodegc_enabled
xfs_clear_inodegc_enabled
xfs_inodegc_queue_all
                                                      xfs_inodegc_queue
                                                      xfs_is_inodegc_enabled
drain_workqueue
                           llist_empty
                           mod_delayed_work_on(..., 0)
                           __queue_work

In other words, everything between the access to inodegc_enabled state
and the decision to poke the inodegc workqueue requires some kind of
coordination to avoid the WQ_DRAINING state.  We could perhaps introduce
a lock here, but we could also try to eliminate WQ_DRAINING from the
picture.

We could replace the drain_workqueue call with a loop that flushes the
workqueue and queues workers as long as there is at least one inode
present in the per-cpu inodegc llists.  We've disabled inodegc at this
point, so we know that the number of queued inodes will eventually hit
zero as long as xfs_inodegc_start cannot reactivate the workers.

There are four callers of xfs_inodegc_start.  Three of them come from
the VFS with s_umount held: filesystem thawing, failed filesystem
freezing, and the rw remount transition.  The fourth caller is mounting
rw (no remount or freezing possible).

There are three callers of xfs_inodegc_stop.  One is unmounting (no
remount or thaw possible).  Two of them come from the VFS with s_umount
held: fs freezing and ro remount transition.

Hence, it is correct to replace the drain_workqueue call with a loop
that drains the inodegc llists.

Fixes: 6191cf3ad59f ("xfs: flush inodegc workqueue tasks before cancel")
Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner
Signed-off-by: Amir Goldstein
Acked-by: Darrick J. Wong
---
 fs/xfs/xfs_icache.c | 32 +++++++++++++++++++++++++++-----
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 7ce262dcabca..d884cba1d707 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -431,18 +431,23 @@ xfs_iget_check_free_state(
 }

 /* Make all pending inactivation work start immediately. */
-static void
+static bool
 xfs_inodegc_queue_all(
 	struct xfs_mount	*mp)
 {
 	struct xfs_inodegc	*gc;
 	int			cpu;
+	bool			ret = false;

 	for_each_online_cpu(cpu) {
 		gc = per_cpu_ptr(mp->m_inodegc, cpu);
-		if (!llist_empty(&gc->list))
+		if (!llist_empty(&gc->list)) {
 			mod_delayed_work_on(cpu, mp->m_inodegc_wq, &gc->work, 0);
+			ret = true;
+		}
 	}
+
+	return ret;
 }

 /*
@@ -1894,24 +1899,41 @@ xfs_inodegc_flush(

 /*
  * Flush all the pending work and then disable the inode inactivation background
- * workers and wait for them to stop.
+ * workers and wait for them to stop.  Caller must hold sb->s_umount to
+ * coordinate changes in the inodegc_enabled state.
  */
 void
 xfs_inodegc_stop(
 	struct xfs_mount	*mp)
 {
+	bool			rerun;
+
 	if (!xfs_clear_inodegc_enabled(mp))
 		return;

+	/*
+	 * Drain all pending inodegc work, including inodes that could be
+	 * queued by racing xfs_inodegc_queue or xfs_inodegc_shrinker_scan
+	 * threads that sample the inodegc state just prior to us clearing it.
+	 * The inodegc flag state prevents new threads from queuing more
+	 * inodes, so we queue pending work items and flush the workqueue until
+	 * all inodegc lists are empty.  IOWs, we cannot use drain_workqueue
+	 * here because it does not allow other unserialized mechanisms to
+	 * reschedule inodegc work while this draining is in progress.
+	 */
 	xfs_inodegc_queue_all(mp);
-	drain_workqueue(mp->m_inodegc_wq);
+	do {
+		flush_workqueue(mp->m_inodegc_wq);
+		rerun = xfs_inodegc_queue_all(mp);
+	} while (rerun);

 	trace_xfs_inodegc_stop(mp, __return_address);
 }

 /*
  * Enable the inode inactivation background workers and schedule deferred inode
- * inactivation work if there is any.
+ * inactivation work if there is any.  Caller must hold sb->s_umount to
+ * coordinate changes in the inodegc_enabled state.
  */
 void
 xfs_inodegc_start(