From patchwork Thu Sep 5 18:21:28 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13792796 Received: from mail-pl1-f172.google.com (mail-pl1-f172.google.com [209.85.214.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9412718859E for ; Thu, 5 Sep 2024 18:22:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560525; cv=none; b=B/mx7aChsMhnW8Zs8GUjiTTZAphdjXQtoUyWebv2tHLC8qMqQj3Bv2BVV8eg0IFS1EWg5yt0kcqtkxFrwdo6WzKAMGv9MCaqaEk8tkp2YqffEnlPN66xHhUyWmGcAdbvTIpPMKpvTOLUDs1DhQsWoaupUUFhwnWrmP7p3v6w5m8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560525; c=relaxed/simple; bh=pfPrTiSwv55fb6SWEnAa3GIdTeUVnZOqfCPfS1nLuwQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=eceJgC1Yl2+GgrOFeZDpKYEIdvtyYb8OTydX6nnlNI95zyJkfXvE4pZ/FY840SOm0+NTPnFhUdndvMj/MH+Bym+zdjCcQmbV/pzaIyyQxpVnD75/A++nD7vS3Z+QfxSIX6KHxwQtojMr2ktyb3vPS7bTmfUHM+aXXaSu3DDapjk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Nt6OCuQr; arc=none smtp.client-ip=209.85.214.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Nt6OCuQr" Received: by mail-pl1-f172.google.com with SMTP id d9443c01a7336-2068acc8b98so10813825ad.3 for ; Thu, 05 Sep 2024 11:22:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725560523; x=1726165323; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Y9xN5bPMgo2WHZh2fICHFeHRggcCb1gTrZfcU7FHyI4=; b=Nt6OCuQruQqrcZvcy+yya1AUH+orsRmA6kPQ9R11f6qdu0LhJXI/1Zyn/91ENDjLo8 xlQbiEWvy9l6wAuTnfsSQtZLklZ2+3hTsQJSoIKaO4xkqOFU33jGyVJcmfHrRATEtA5S ssdJ+Vkp89ah7GeXgRGHofqNfneJACPJyrQO8FRfdrP4CA92kQDoHmFhvgUA1CTfaiTt 1XWVO1DD5BBrzB6vcq95HJpK0cHtHoqxQMPfBkH9q3uWpbfWc6MYmTMbDG4Op5XfwqzW rdyjR9mSCz9AuT5gv0N9BJ4neGztLVABLZdtImEwSNTVZl/gx6Apzb54eVVZL+SrXHt5 z/LA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725560523; x=1726165323; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Y9xN5bPMgo2WHZh2fICHFeHRggcCb1gTrZfcU7FHyI4=; b=oMzAzuCr5Mm0bMwjdd5u5qvqSfXa3eHZ/5fSnnIuHJR1YmN8iqCUYk28611TKfAHVI 4pjqGZXPgobPHdHz3IMFK/ysgVkjLNEBQ57Qo2H7h+gpJavJlk5h0990BUmfy4m7mDZ7 K29DF7D+gwN84XcvKAuttmfcT+Gc4V31mz0WXo3ojVn29luc/SMtX/MATaZ2HoXWzgpM CfG/8iwZP8/QznQEubpWSVHiJu9r8WO0iGlyPcaAa89x4qrBa2g2xAhqZ6HLU+0q6AvO fIzJhodWdxPVXzqVKPawiOnpSxgIaNK/XjBzIVblUS7DSBIsjsMSIXZldaAH9xDu0sS0 Yf3Q== X-Gm-Message-State: AOJu0Yz/KXYE016rtsyP8CDvOemKUD7pPDt8hlpQ3w3Q8LU2rF79kkHG l2VJ4eGngqU003uyrs1RyeNlm3qbIzGVf52axjktOtT97hazuYVhd+jRrDR0 X-Google-Smtp-Source: AGHT+IFAA9xU2PTcqlGCVHC+KPCOecu8nc2XYoy2daipixbC9UqoJF/ih2P3KswXioliWXWaneT/7Q== X-Received: by 2002:a17:902:e544:b0:205:5bfd:13ae with SMTP id d9443c01a7336-2055bfd226cmr212915475ad.35.1725560522596; Thu, 05 Sep 2024 11:22:02 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:2da2:d734:ef56:7ccf]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-206aea684f0sm31374395ad.271.2024.09.05.11.22.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 05 Sep 2024 11:22:02 -0700 (PDT) From: Leah Rumancik To: linux-xfs@vger.kernel.org Cc: amir73il@gmail.com, chandan.babu@oracle.com, Dave Chinner , yangerkun , "Darrick J. Wong" , Christoph Hellwig , Dave Chinner , Leah Rumancik Subject: [PATCH 6.1 CANDIDATE 11/26] xfs: buffer pins need to hold a buffer reference Date: Thu, 5 Sep 2024 11:21:28 -0700 Message-ID: <20240905182144.2691920-12-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.598.g6f2099f65c-goog In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com> References: <20240905182144.2691920-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Dave Chinner [ Upstream commit 89a4bf0dc3857569a77061d3d5ea2ac85f7e13c6 ] When a buffer is unpinned by xfs_buf_item_unpin(), we need to access the buffer after we've dropped the buffer log item reference count. This opens a window where we can have two racing unpins for the buffer item (e.g. shutdown checkpoint context callback processing racing with journal IO iclog completion processing) and both attempt to access the buffer after dropping the BLI reference count. If we are unlucky, the "BLI freed" context wins the race and frees the buffer before the "BLI still active" case checks the buffer pin count. This results in a use after free that can only be triggered in active filesystem shutdown situations. To fix this, we need to ensure that buffer existence extends beyond the BLI reference count checks and until the unpin processing is complete. This implies that a buffer pin operation must also take a buffer reference to ensure that the buffer cannot be freed until the buffer unpin processing is complete. Reported-by: yangerkun Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Signed-off-by: Dave Chinner Signed-off-by: Leah Rumancik --- fs/xfs/xfs_buf_item.c | 88 ++++++++++++++++++++++++++++++++----------- 1 file changed, 65 insertions(+), 23 deletions(-) diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c index df7322ed73fa..023d4e0385dd 100644 --- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -452,10 +452,18 @@ xfs_buf_item_format( * This is called to pin the buffer associated with the buf log item in memory * so it cannot be written out. * - * We also always take a reference to the buffer log item here so that the bli - * is held while the item is pinned in memory. This means that we can - * unconditionally drop the reference count a transaction holds when the - * transaction is completed. + * We take a reference to the buffer log item here so that the BLI life cycle + * extends at least until the buffer is unpinned via xfs_buf_item_unpin() and + * inserted into the AIL. + * + * We also need to take a reference to the buffer itself as the BLI unpin + * processing requires accessing the buffer after the BLI has dropped the final + * BLI reference. See xfs_buf_item_unpin() for an explanation. + * If unpins race to drop the final BLI reference and only the + * BLI owns a reference to the buffer, then the loser of the race can have the + * buffer fgreed from under it (e.g. on shutdown). Taking a buffer reference per + * pin count ensures the life cycle of the buffer extends for as + * long as we hold the buffer pin reference in xfs_buf_item_unpin(). */ STATIC void xfs_buf_item_pin( @@ -470,13 +478,30 @@ xfs_buf_item_pin( trace_xfs_buf_item_pin(bip); + xfs_buf_hold(bip->bli_buf); atomic_inc(&bip->bli_refcount); atomic_inc(&bip->bli_buf->b_pin_count); } /* - * This is called to unpin the buffer associated with the buf log item which - * was previously pinned with a call to xfs_buf_item_pin(). + * This is called to unpin the buffer associated with the buf log item which was + * previously pinned with a call to xfs_buf_item_pin(). We enter this function + * with a buffer pin count, a buffer reference and a BLI reference. + * + * We must drop the BLI reference before we unpin the buffer because the AIL + * doesn't acquire a BLI reference whenever it accesses it. Therefore if the + * refcount drops to zero, the bli could still be AIL resident and the buffer + * submitted for I/O at any point before we return. This can result in IO + * completion freeing the buffer while we are still trying to access it here. + * This race condition can also occur in shutdown situations where we abort and + * unpin buffers from contexts other that journal IO completion. + * + * Hence we have to hold a buffer reference per pin count to ensure that the + * buffer cannot be freed until we have finished processing the unpin operation. + * The reference is taken in xfs_buf_item_pin(), and we must hold it until we + * are done processing the buffer state. In the case of an abort (remove = + * true) then we re-use the current pin reference as the IO reference we hand + * off to IO failure handling. */ STATIC void xfs_buf_item_unpin( @@ -493,24 +518,18 @@ xfs_buf_item_unpin( trace_xfs_buf_item_unpin(bip); - /* - * Drop the bli ref associated with the pin and grab the hold required - * for the I/O simulation failure in the abort case. We have to do this - * before the pin count drops because the AIL doesn't acquire a bli - * reference. Therefore if the refcount drops to zero, the bli could - * still be AIL resident and the buffer submitted for I/O (and freed on - * completion) at any point before we return. This can be removed once - * the AIL properly holds a reference on the bli. - */ freed = atomic_dec_and_test(&bip->bli_refcount); - if (freed && !stale && remove) - xfs_buf_hold(bp); if (atomic_dec_and_test(&bp->b_pin_count)) wake_up_all(&bp->b_waiters); - /* nothing to do but drop the pin count if the bli is active */ - if (!freed) + /* + * Nothing to do but drop the buffer pin reference if the BLI is + * still active. + */ + if (!freed) { + xfs_buf_rele(bp); return; + } if (stale) { ASSERT(bip->bli_flags & XFS_BLI_STALE); @@ -522,6 +541,15 @@ xfs_buf_item_unpin( trace_xfs_buf_item_unpin_stale(bip); + /* + * The buffer has been locked and referenced since it was marked + * stale so we own both lock and reference exclusively here. We + * do not need the pin reference any more, so drop it now so + * that we only have one reference to drop once item completion + * processing is complete. + */ + xfs_buf_rele(bp); + /* * If we get called here because of an IO error, we may or may * not have the item on the AIL. xfs_trans_ail_delete() will @@ -538,16 +566,30 @@ xfs_buf_item_unpin( ASSERT(bp->b_log_item == NULL); } xfs_buf_relse(bp); - } else if (remove) { + return; + } + + if (remove) { /* - * The buffer must be locked and held by the caller to simulate - * an async I/O failure. We acquired the hold for this case - * before the buffer was unpinned. + * We need to simulate an async IO failures here to ensure that + * the correct error completion is run on this buffer. This + * requires a reference to the buffer and for the buffer to be + * locked. We can safely pass ownership of the pin reference to + * the IO to ensure that nothing can free the buffer while we + * wait for the lock and then run the IO failure completion. */ xfs_buf_lock(bp); bp->b_flags |= XBF_ASYNC; xfs_buf_ioend_fail(bp); + return; } + + /* + * BLI has no more active references - it will be moved to the AIL to + * manage the remaining BLI/buffer life cycle. There is nothing left for + * us to do here so drop the pin reference to the buffer. + */ + xfs_buf_rele(bp); } STATIC uint