From patchwork Mon Jun 20 15:55:30 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bob Peterson X-Patchwork-Id: 9188079 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id A5B02601C0 for ; Mon, 20 Jun 2016 16:12:43 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9451C20410 for ; Mon, 20 Jun 2016 16:12:43 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8905427B16; Mon, 20 Jun 2016 16:12:43 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D9A2B20410 for ; Mon, 20 Jun 2016 16:12:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754477AbcFTQEM (ORCPT ); Mon, 20 Jun 2016 12:04:12 -0400 Received: from mx1.redhat.com ([209.132.183.28]:60519 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752232AbcFTQDc (ORCPT ); Mon, 20 Jun 2016 12:03:32 -0400 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id ABB93C0AED25 for ; Mon, 20 Jun 2016 15:55:34 +0000 (UTC) Received: from loki.redhat.com (ovpn-116-103.phx2.redhat.com [10.3.116.103]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u5KFtXJ1002710; Mon, 20 Jun 2016 11:55:34 -0400 From: Bob Peterson To: Cc: cluster-devel@redhat.com Subject: [PATCH 1/4] gfs2: Fix gfs2_lookup_by_inum lock inversion Date: Mon, 20 Jun 2016 10:55:30 -0500 Message-Id: <1466438133-10564-2-git-send-email-rpeterso@redhat.com> In-Reply-To: <1466438133-10564-1-git-send-email-rpeterso@redhat.com> References: <1466438133-10564-1-git-send-email-rpeterso@redhat.com> X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Mon, 20 Jun 2016 15:55:34 +0000 (UTC) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Andreas Gruenbacher The current gfs2_lookup_by_inum takes the glock of a presumed inode identified by block number, verifies that the block is indeed an inode, and then instantiates and reads the new inode via gfs2_inode_lookup. However, instantiating a new inode may block on freeing a previous instance of that inode (__wait_on_freeing_inode), and freeing an inode requires to take the glock already held, leading to lock inversion and deadlock. Fix this by first instantiating the new inode, then verifying that the block is an inode (if required), and then reading in the new inode, all in gfs2_inode_lookup. If the block we are looking for is not an inode, we discard the new inode via iget_failed, which marks inodes as bad and unhashes them. Other tasks waiting on that inode will get back a bad inode back from ilookup or iget_locked; in that case, retry the lookup. Signed-off-by: Andreas Gruenbacher Signed-off-by: Bob Peterson --- fs/gfs2/dir.c | 3 +- fs/gfs2/glock.c | 4 +- fs/gfs2/inode.c | 101 +++++++++++++++++++++++++++++++++++++-------------- fs/gfs2/inode.h | 3 +- fs/gfs2/ops_fstype.c | 3 +- 5 files changed, 81 insertions(+), 33 deletions(-) diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c index 4a01f30..1b02665 100644 --- a/fs/gfs2/dir.c +++ b/fs/gfs2/dir.c @@ -1660,7 +1660,8 @@ struct inode *gfs2_dir_search(struct inode *dir, const struct qstr *name, brelse(bh); if (fail_on_exist) return ERR_PTR(-EEXIST); - inode = gfs2_inode_lookup(dir->i_sb, dtype, addr, formal_ino); + inode = gfs2_inode_lookup(dir->i_sb, dtype, addr, formal_ino, + GFS2_BLKST_FREE /* ignore */); if (!IS_ERR(inode)) GFS2_I(inode)->i_rahead = rahead; return inode; diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index 706fd93..ce46375 100644 --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -576,7 +576,7 @@ static void delete_work_func(struct work_struct *work) struct gfs2_glock *gl = container_of(work, struct gfs2_glock, gl_delete); struct gfs2_sbd *sdp = gl->gl_name.ln_sbd; struct gfs2_inode *ip; - struct inode *inode; + struct inode *inode = NULL; u64 no_addr = gl->gl_name.ln_number; /* If someone's using this glock to create a new dinode, the block must @@ -590,7 +590,7 @@ static void delete_work_func(struct work_struct *work) if (ip) inode = gfs2_ilookup(sdp->sd_vfs, no_addr); - else + if (IS_ERR_OR_NULL(inode)) inode = gfs2_lookup_by_inum(sdp, no_addr, NULL, GFS2_BLKST_UNLINKED); if (inode && !IS_ERR(inode)) { d_prune_aliases(inode); diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c index 21dc784..6d5c6bb 100644 --- a/fs/gfs2/inode.c +++ b/fs/gfs2/inode.c @@ -39,7 +39,33 @@ struct inode *gfs2_ilookup(struct super_block *sb, u64 no_addr) { - return ilookup(sb, (unsigned long)no_addr); + struct inode *inode; + +repeat: + inode = ilookup(sb, no_addr); + if (!inode) + return inode; + if (is_bad_inode(inode)) { + iput(inode); + goto repeat; + } + return inode; +} + +static struct inode *gfs2_iget(struct super_block *sb, u64 no_addr) +{ + struct inode *inode; + +repeat: + inode = iget_locked(sb, no_addr); + if (!inode) + return inode; + if (is_bad_inode(inode)) { + iput(inode); + goto repeat; + } + GFS2_I(inode)->i_no_addr = no_addr; + return inode; } /** @@ -78,26 +104,37 @@ static void gfs2_set_iop(struct inode *inode) /** * gfs2_inode_lookup - Lookup an inode * @sb: The super block - * @no_addr: The inode number * @type: The type of the inode + * @no_addr: The inode number + * @no_formal_ino: The inode generation number + * @blktype: Requested block type (GFS2_BLKST_DINODE or GFS2_BLKST_UNLINKED; + * GFS2_BLKST_FREE do indicate not to verify) + * + * If @type is DT_UNKNOWN, the inode type is fetched from disk. + * + * If @blktype is anything other than GFS2_BLKST_FREE (which is used as a + * placeholder because it doesn't otherwise make sense), the on-disk block type + * is verified to be @blktype. * * Returns: A VFS inode, or an error */ struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned int type, - u64 no_addr, u64 no_formal_ino) + u64 no_addr, u64 no_formal_ino, + unsigned int blktype) { struct inode *inode; struct gfs2_inode *ip; struct gfs2_glock *io_gl = NULL; + struct gfs2_holder i_gh; + bool unlock = false; int error; - inode = iget_locked(sb, (unsigned long)no_addr); + inode = gfs2_iget(sb, no_addr); if (!inode) return ERR_PTR(-ENOMEM); ip = GFS2_I(inode); - ip->i_no_addr = no_addr; if (inode->i_state & I_NEW) { struct gfs2_sbd *sdp = GFS2_SB(inode); @@ -112,10 +149,30 @@ struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned int type, if (unlikely(error)) goto fail_put; + if (type == DT_UNKNOWN || blktype != GFS2_BLKST_FREE) { + /* + * The GL_SKIP flag indicates to skip reading the inode + * block. We read the inode with gfs2_inode_refresh + * after possibly checking the block type. + */ + error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, + GL_SKIP, &i_gh); + if (error) + goto fail_put; + unlock = true; + + if (blktype != GFS2_BLKST_FREE) { + error = gfs2_check_blk_type(sdp, no_addr, + blktype); + if (error) + goto fail_put; + } + } + set_bit(GIF_INVALID, &ip->i_flags); error = gfs2_glock_nq_init(io_gl, LM_ST_SHARED, GL_EXACT, &ip->i_iopen_gh); if (unlikely(error)) - goto fail_iopen; + goto fail_put; ip->i_iopen_gh.gh_gl->gl_object = ip; gfs2_glock_put(io_gl); @@ -134,6 +191,8 @@ struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned int type, unlock_new_inode(inode); } + if (unlock) + gfs2_glock_dq_uninit(&i_gh); return inode; fail_refresh: @@ -141,10 +200,11 @@ fail_refresh: ip->i_iopen_gh.gh_gl->gl_object = NULL; gfs2_glock_dq_wait(&ip->i_iopen_gh); gfs2_holder_uninit(&ip->i_iopen_gh); -fail_iopen: +fail_put: if (io_gl) gfs2_glock_put(io_gl); -fail_put: + if (unlock) + gfs2_glock_dq_uninit(&i_gh); ip->i_gl->gl_object = NULL; fail: iget_failed(inode); @@ -155,23 +215,12 @@ struct inode *gfs2_lookup_by_inum(struct gfs2_sbd *sdp, u64 no_addr, u64 *no_formal_ino, unsigned int blktype) { struct super_block *sb = sdp->sd_vfs; - struct gfs2_holder i_gh; - struct inode *inode = NULL; + struct inode *inode; int error; - /* Must not read in block until block type is verified */ - error = gfs2_glock_nq_num(sdp, no_addr, &gfs2_inode_glops, - LM_ST_EXCLUSIVE, GL_SKIP, &i_gh); - if (error) - return ERR_PTR(error); - - error = gfs2_check_blk_type(sdp, no_addr, blktype); - if (error) - goto fail; - - inode = gfs2_inode_lookup(sb, DT_UNKNOWN, no_addr, 0); + inode = gfs2_inode_lookup(sb, DT_UNKNOWN, no_addr, 0, blktype); if (IS_ERR(inode)) - goto fail; + return inode; /* Two extra checks for NFS only */ if (no_formal_ino) { @@ -182,16 +231,12 @@ struct inode *gfs2_lookup_by_inum(struct gfs2_sbd *sdp, u64 no_addr, error = -EIO; if (GFS2_I(inode)->i_diskflags & GFS2_DIF_SYSTEM) goto fail_iput; - - error = 0; } + return inode; -fail: - gfs2_glock_dq_uninit(&i_gh); - return error ? ERR_PTR(error) : inode; fail_iput: iput(inode); - goto fail; + return ERR_PTR(error); } diff --git a/fs/gfs2/inode.h b/fs/gfs2/inode.h index e1af0d4..443b46c 100644 --- a/fs/gfs2/inode.h +++ b/fs/gfs2/inode.h @@ -94,7 +94,8 @@ err: } extern struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned type, - u64 no_addr, u64 no_formal_ino); + u64 no_addr, u64 no_formal_ino, + unsigned int blktype); extern struct inode *gfs2_lookup_by_inum(struct gfs2_sbd *sdp, u64 no_addr, u64 *no_formal_ino, unsigned int blktype); diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index 4546360..b8f6fc9 100644 --- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -454,7 +454,8 @@ static int gfs2_lookup_root(struct super_block *sb, struct dentry **dptr, struct dentry *dentry; struct inode *inode; - inode = gfs2_inode_lookup(sb, DT_DIR, no_addr, 0); + inode = gfs2_inode_lookup(sb, DT_DIR, no_addr, 0, + GFS2_BLKST_FREE /* ignore */); if (IS_ERR(inode)) { fs_err(sdp, "can't read in %s inode: %ld\n", name, PTR_ERR(inode)); return PTR_ERR(inode);