From patchwork Mon Aug 7 17:00:46 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Lyakas X-Patchwork-Id: 9885843 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 51EF9602CC for ; Mon, 7 Aug 2017 17:00:59 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3FB11286B6 for ; Mon, 7 Aug 2017 17:00:59 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 344DB286BC; Mon, 7 Aug 2017 17:00:59 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3321B286D0 for ; Mon, 7 Aug 2017 17:00:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751876AbdHGRAs (ORCPT ); Mon, 7 Aug 2017 13:00:48 -0400 Received: from mail-vk0-f44.google.com ([209.85.213.44]:36852 "EHLO mail-vk0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751820AbdHGRAr (ORCPT ); Mon, 7 Aug 2017 13:00:47 -0400 Received: by mail-vk0-f44.google.com with SMTP id u133so3959886vke.3 for ; Mon, 07 Aug 2017 10:00:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=zadarastorage-com.20150623.gappssmtp.com; s=20150623; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=nFlwavMTD/eS1HFyzaseQbS4pKJkngRKC1rNgyEDwxE=; b=q8R/SeIJs2MTTHO2OBlnhs/RI39pNXgH6XV2G7ItMfTdLxkFbkLseAsZ7ncEa4eJUO RPo4Luhxo43j6NsR/aGGCu3NPMsYUtO4AHAr/83JLx12Srmyv7bTfHt+MdH7VYFlXS+6 FbudNvtc5npPK0Lc4Nncmr0qYxVN52b5PsjGS5KnwI2bE/TTfHniY0eRQn4q28wXJBUA nolNDH41y/EDAgYrlpaIfLGrUeb6+xfvrcDbYdvOQVsEsiOBJ1TaqS9Qjt7H7poQlRJs Y5fvbDVEb1PO10BGot/jk4yaWseEigBarD9R5WL52C4eqcqHGKVfCOyFnSsumzA7uBG0 E1Jg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=nFlwavMTD/eS1HFyzaseQbS4pKJkngRKC1rNgyEDwxE=; b=gUndSSCI6rZ7xRyE8z3Egdo2JzNXGA3OxLL20gXc2B6y4LDbyVd8aIY/lu3E4kKhx9 1D0VT/4BtO0cxTqQaoKcz2ww0QjotvwKTLmvl8IgczLiUXEbeMkxMRCLQkbFNeMvD1zX xaalFAvV1c6A9s+0K8pXQSVrN80IAw2uGfoplr232wLSfpIDsk+apHq9yImizhY+Vv6p pHq0RwMeksrNVnYaGUHr7D2grNIvtsSdL283V8xPCanH0cWbZJx6z2vTlz9DjzFoWrWL j9V2/J7DjtId/EQSVtKYNlkEkz088CmoL//Vj7AktSFFc+h2wyXqnF9G4ffoqLY7s4f6 X8pA== X-Gm-Message-State: AHYfb5jsOlHt3KIBKbqMuHdykBji1EhcaSycMRiVEZGnGFagNwHQ83F3 Gawt3lsiaiyysECnk5vu45RS9cKEO/FG X-Received: by 10.31.54.13 with SMTP id d13mr790587vka.15.1502125246523; Mon, 07 Aug 2017 10:00:46 -0700 (PDT) MIME-Version: 1.0 Received: by 10.176.5.6 with HTTP; Mon, 7 Aug 2017 10:00:46 -0700 (PDT) In-Reply-To: <20170807151651.GA43920@bfoster.bfoster> References: <8948A7010F7A4A758A580103D54CE381@alyakaslap> <20170807151651.GA43920@bfoster.bfoster> From: Alex Lyakas Date: Mon, 7 Aug 2017 20:00:46 +0300 Message-ID: Subject: Re: [RFC - PATCH] xfs: hold xfs_buf locked between shortform->leaf conversion and the addition of the first attribute To: Brian Foster Cc: linux-xfs@vger.kernel.org, Dave Chinner , Shyam Kaushik , Yair Hershko Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Hello Brian, Thanks for the review. On Mon, Aug 7, 2017 at 6:16 PM, Brian Foster wrote: > On Mon, Aug 07, 2017 at 03:54:03PM +0300, Alex Lyakas wrote: >> The new attribute leaf buffer is not held locked across the transaction roll >> between the shortform->leaf modification and the addition of the new entry. >> As a result, the attribute buffer modification being made is not atomic >> from an operational perspective. >> Hence the AIL push can grab it in the transient state of "just created" >> after the initial transaction is rolled, because the buffer has been >> released. >> This leads to xfs_attr3_leaf_verify() asserting that hdr.count is zero, >> treating this as in-memory corruption, and shutting down the filesystem. >> >> More info at: >> https://www.spinics.net/lists/linux-xfs/msg08778.html >> > > FYI.. I'm not sure how appropriate it is to use URLs in commit log > descriptions. Perhaps somebody else can chime in on that. I will get rid of it. > >> This patch is based on kernel 3.18.19, which is a long-term (although EOL) >> kernel. >> This is the reason it is marked as RFC. >> > > It's probably best to develop against the latest kernel, get the fix > right, then worry about v3.18 for a backport. Unfortunately, for us this is exactly the opposite. We are running kernel 3.18 in production, and that's where we need the fix. Moving to a different kernel is a significant effort for us. We do it from time to time, but it's always to a long-term stable kernel and not to one of the latests. Still, below I provide a patch for 4.13-rc4. > >> Reproducing the issue is achieved by adding "msleep(1000)" after the >> xfs_trans_roll() call. >> From the limited testing, this patch seems to fix the issue. >> Once/if the community approves this patch, we will do a formal testing. >> >> Signed-off-by: Alex Lyakas >> --- > > Note that your patch doesn't apply for me: > > ... > patching file fs/xfs/libxfs/xfs_attr.c > Hunk #1 FAILED at 285. > patch: **** malformed patch at line 70: commits */ > > Perhaps it has been damaged by your mailer? Otherwise, a few initial > comments... Trying a different mailer now. > >> fs/xfs/libxfs/xfs_attr.c | 24 +++++++++++++++++++++--- >> fs/xfs/libxfs/xfs_attr_leaf.c | 5 ++++- >> fs/xfs/libxfs/xfs_attr_leaf.h | 2 +- >> 3 files changed, 26 insertions(+), 5 deletions(-) >> >> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c >> index 353fb42..c0ced12 100644 >> --- a/fs/xfs/libxfs/xfs_attr.c >> +++ b/fs/xfs/libxfs/xfs_attr.c >> @@ -285,20 +285,21 @@ xfs_attr_set( >> >> xfs_trans_ijoin(args.trans, dp, 0); >> >> /* >> * If the attribute list is non-existent or a shortform list, >> * upgrade it to a single-leaf-block attribute list. >> */ >> if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL || >> (dp->i_d.di_aformat == XFS_DINODE_FMT_EXTENTS && >> dp->i_d.di_anextents == 0)) { >> + xfs_buf_t *leaf_bp = NULL; > > Use struct xfs_buf here and throughout. We're attempting to remove uses > of the old typedefs wherever possible. Noted. > >> >> /* >> * Build initial attribute list (if required). >> */ >> if (dp->i_d.di_aformat == XFS_DINODE_FMT_EXTENTS) >> xfs_attr_shortform_create(&args); >> >> /* >> * Try to add the attr to the attribute list in >> * the inode. >> @@ -328,48 +329,65 @@ xfs_attr_set( >> xfs_iunlock(dp, XFS_ILOCK_EXCL); >> >> return error ? error : err2; >> } >> >> /* >> * It won't fit in the shortform, transform to a leaf block. >> * GROT: another possible req'mt for a double-split btree op. >> */ >> xfs_bmap_init(args.flist, args.firstblock); >> - error = xfs_attr_shortform_to_leaf(&args); >> + error = xfs_attr_shortform_to_leaf(&args, &leaf_bp); >> + /* hold the leaf buffer locked, when "args.trans" transaction >> commits */ >> + if (leaf_bp) >> + xfs_trans_bhold(args.trans, leaf_bp); > > Indentation looks off here and throughout. > >> + >> if (!error) { >> error = xfs_bmap_finish(&args.trans, args.flist, >> &committed); >> } >> if (error) { >> ASSERT(committed); >> + if (leaf_bp) >> + xfs_buf_relse(leaf_bp); > > Hmm, I don't think this is right. We don't want to release the buffer > while the transaction still has a reference. Perhaps we should move this > to the "out:" patch after the transaction cancel (and make sure the > pointer is reset at the appropriate place)..? I think I understand what you are saying. After we call xfs_trans_bhold(), we need first the original transaction to commit or to abort. Then we must either rejoin the buffer to a different transaction or release it. I believe the below patch addresses this issue. > >> args.trans = NULL; >> xfs_bmap_cancel(&flist); >> goto out; >> } >> >> /* >> * bmap_finish() may have committed the last trans and started >> * a new one. We need the inode to be in all transactions. >> + * We also need to rejoin the leaf buffer to the new transaction >> + * and prevent it from being unlocked, before we commit it in >> xfs_trans_roll(). >> + * If bmap_finish() did not commit, then we are in the same >> transaction, >> + * and no need to call xfs_trans_bhold() again. >> */ >> - if (committed) >> + if (committed) { >> xfs_trans_ijoin(args.trans, dp, 0); >> + xfs_trans_bjoin(args.trans, leaf_bp); >> + xfs_trans_bhold(args.trans, leaf_bp); > > I don't see why this is necessary. We still have the buffer held and > locked, yes? I believe you are right. We don't care if there was another transaction in-between. Addressed this. > >> + } >> >> /* >> * Commit the leaf transformation. We'll need another (linked) >> * transaction to add the new attribute to the leaf. >> */ >> >> error = xfs_trans_roll(&args.trans, dp); >> - if (error) >> + if (error) { >> + xfs_buf_relse(leaf_bp); > > Same problem as above. > >> goto out; >> + } >> >> + /* rejoin the leaf buffer to the new transaction */ >> + xfs_trans_bjoin(args.trans, leaf_bp); > > This comment should point out that this allows a subsequent read to find > the buffer in the transaction (and avoid a deadlock). Done. > >> } >> >> if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) >> error = xfs_attr_leaf_addname(&args); >> else >> error = xfs_attr_node_addname(&args); >> if (error) >> goto out; >> >> /* >> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c >> index b7cd0a0..2e03c32 100644 >> --- a/fs/xfs/libxfs/xfs_attr_leaf.c >> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c >> @@ -649,23 +649,24 @@ xfs_attr_shortform_getvalue(xfs_da_args_t *args) >> args->valuelen = sfe->valuelen; >> memcpy(args->value, &sfe->nameval[args->namelen], >> args->valuelen); >> return -EEXIST; >> } >> return -ENOATTR; >> } >> >> /* >> * Convert from using the shortform to the leaf. >> + * Upon success, return the leaf buffer. >> */ >> int >> -xfs_attr_shortform_to_leaf(xfs_da_args_t *args) >> +xfs_attr_shortform_to_leaf(xfs_da_args_t *args, xfs_buf_t **bpp) >> { >> xfs_inode_t *dp; >> xfs_attr_shortform_t *sf; >> xfs_attr_sf_entry_t *sfe; >> xfs_da_args_t nargs; >> char *tmpbuffer; >> int error, i, size; >> xfs_dablk_t blkno; >> struct xfs_buf *bp; >> xfs_ifork_t *ifp; >> @@ -733,20 +734,22 @@ xfs_attr_shortform_to_leaf(xfs_da_args_t *args) >> ASSERT(error == -ENOATTR); >> error = xfs_attr3_leaf_add(bp, &nargs); >> ASSERT(error != -ENOSPC); >> if (error) >> goto out; >> sfe = XFS_ATTR_SF_NEXTENTRY(sfe); >> } >> error = 0; >> >> out: >> + if (error == 0) >> + *bpp = bp; > > It looks like the only way error == 0 is where it is set just above, so > we could probably move this before the label. I am a bit confused by this code in xfs_attr_shortform_to_leaf(): ... error = xfs_attr3_leaf_create(args, blkno, &bp); if (error) { error = xfs_da_shrink_inode(args, 0, bp); bp = NULL; if (error) goto out; xfs_idata_realloc(dp, size, XFS_ATTR_FORK); /* try to put */ memcpy(ifp->if_u1.if_data, tmpbuffer, size); /* it back */ goto out; } Here we fail to convert to a leaf. But if xfs_da_shrink_inode() succeeds, we return error==0 back to the caller. But we actually failed to convert. This, however, is not directly related to your comment. > > I'm also wondering whether it might be cleaner to 1.) conditionally > return the buffer when sf->hdr.count == 0 because that is the only case > where the problem occurs and 2.) do the trans_bhold() here in > shortform_to_leaf() when we do actually return it. > > Brian > >> kmem_free(tmpbuffer); >> return error; >> } >> Here is the patch based on kernel 4.13-rc4. int xfs_attr_shortform_bytesfit(struct xfs_inode *dp, int bytes); --- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c index de7b9bd..982e322 100644 --- a/fs/xfs/libxfs/xfs_attr.c +++ b/fs/xfs/libxfs/xfs_attr.c @@ -218,6 +218,7 @@ xfs_fsblock_t firstblock; int rsvd = (flags & ATTR_ROOT) != 0; int error, err2, local; + struct xfs_buf *leaf_bp = NULL; XFS_STATS_INC(mp, xs_attr_set); @@ -327,7 +328,13 @@ * GROT: another possible req'mt for a double-split btree op. */ xfs_defer_init(args.dfops, args.firstblock); - error = xfs_attr_shortform_to_leaf(&args); + error = xfs_attr_shortform_to_leaf(&args, &leaf_bp); + /* + * Prevent the leaf buffer from being unlocked + * when "args.trans" transaction commits. + */ + if (leaf_bp) + xfs_trans_bhold(args.trans, leaf_bp); if (!error) error = xfs_defer_finish(&args.trans, args.dfops, dp); if (error) { @@ -345,6 +352,14 @@ if (error) goto out; + /* + * Rejoin the leaf buffer to the new transaction. + * This allows a subsequent read to find the buffer in the + * transaction (and avoid a deadlock). + */ + xfs_trans_bjoin(args.trans, leaf_bp); + /* Prevent from being released at the end of the function */ + leaf_bp = NULL; } if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) @@ -376,6 +391,8 @@ out: if (args.trans) xfs_trans_cancel(args.trans); + if (leaf_bp) + xfs_buf_relse(leaf_bp); xfs_iunlock(dp, XFS_ILOCK_EXCL); return error; } diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c index c6c15e5..ab73e4b 100644 --- a/fs/xfs/libxfs/xfs_attr_leaf.c +++ b/fs/xfs/libxfs/xfs_attr_leaf.c @@ -740,9 +740,10 @@ STATIC void xfs_attr3_leaf_moveents(struct xfs_da_args *args, /* * Convert from using the shortform to the leaf. + * Upon success, return the leaf buffer. */ int -xfs_attr_shortform_to_leaf(xfs_da_args_t *args) +xfs_attr_shortform_to_leaf(xfs_da_args_t *args, struct xfs_buf **bpp) { xfs_inode_t *dp; xfs_attr_shortform_t *sf; @@ -822,6 +823,7 @@ STATIC void xfs_attr3_leaf_moveents(struct xfs_da_args *args, sfe = XFS_ATTR_SF_NEXTENTRY(sfe); } error = 0; + *bpp = bp; out: kmem_free(tmpbuffer); diff --git a/fs/xfs/libxfs/xfs_attr_leaf.h b/fs/xfs/libxfs/xfs_attr_leaf.h index f7dda0c..7382d4e 100644 --- a/fs/xfs/libxfs/xfs_attr_leaf.h +++ b/fs/xfs/libxfs/xfs_attr_leaf.h @@ -48,7 +48,7 @@ void xfs_attr_shortform_add(struct xfs_da_args *args, int forkoff); int xfs_attr_shortform_lookup(struct xfs_da_args *args); int xfs_attr_shortform_getvalue(struct xfs_da_args *args); -int xfs_attr_shortform_to_leaf(struct xfs_da_args *args); +int xfs_attr_shortform_to_leaf(struct xfs_da_args *args, struct xfs_buf **bpp); int xfs_attr_shortform_remove(struct xfs_da_args *args); int xfs_attr_shortform_allfit(struct xfs_buf *bp, struct xfs_inode *dp);