From patchwork Tue Aug 25 12:05:53 2020
X-Patchwork-Submitter: Jan Kara
X-Patchwork-Id: 11735627
X-Mailing-List: linux-fsdevel@vger.kernel.org
From: Jan Kara
Cc: yebin, Christoph Hellwig, Jens Axboe, Jan Kara, stable@vger.kernel.org
Subject: [PATCH RFC 1/2] fs: Don't invalidate page buffers in block_write_full_page()
Date: Tue, 25 Aug 2020 14:05:53 +0200
Message-Id: <20200825120554.13070-2-jack@suse.cz>
X-Mailer: git-send-email 2.16.4
In-Reply-To: <20200825120554.13070-1-jack@suse.cz>
References: <20200825120554.13070-1-jack@suse.cz>

If block_write_full_page() is called for a page that is beyond the current
inode size, it truncates the page's buffers and returns 0. This logic was
added in 2.5.62 by commit 81eb69062588 ("fix ext3 BUG due to race with
truncate") in the history.git tree to fix a problem with ext3 in
data=ordered mode. That particular problem no longer exists because ext3 is
long gone and ext4 handles ordered data differently. Also, buffers are
normally invalidated by the truncate code, so there is no need to handle
this specially in ->writepage().

This invalidation of page buffers in block_write_full_page() causes
problems for filesystems (e.g. ext4 or ocfs2) when the block device is
shrunk under the filesystem's hands and metadata buffers get discarded
while still being tracked by the journalling layer. Although this is
obviously "not supported", it can cause kernel crashes like:

[ 7986.689400] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 7986.697197] PGD 0 P4D 0
[ 7986.699724] Oops: 0002 [#1] SMP PTI
[ 7986.703200] CPU: 4 PID: 203778 Comm: jbd2/dm-3-8 Kdump: loaded Tainted: G O --------- -  - 4.18.0-147.5.0.5.h126.eulerosv2r9.x86_64 #1
[ 7986.716438] Hardware name: Huawei RH2288H V3/BC11HGSA0, BIOS 1.57 08/11/2015
[ 7986.723462] RIP: 0010:jbd2_journal_grab_journal_head+0x1b/0x40 [jbd2]
...
[ 7986.810150] Call Trace:
[ 7986.812595]  __jbd2_journal_insert_checkpoint+0x23/0x70 [jbd2]
[ 7986.818408]  jbd2_journal_commit_transaction+0x155f/0x1b60 [jbd2]
[ 7986.836467]  kjournald2+0xbd/0x270 [jbd2]

which is not great. The crash happens because bh->b_private is suddenly
NULL although the BH_JBD flag is still set (this is because
block_invalidatepage() cleared the BH_Mapped flag, and a subsequent bh
lookup found the buffer without BH_Mapped set and called
init_page_buffers(), which rewrote bh->b_private).
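For reference, the helper at the top of the oops looks roughly like this
(paraphrased from the 4.18-era fs/jbd2/journal.c, simplified and not a
verbatim copy). With BH_JBD still set but b_private already NULL, the
b_jcount increment is a write through a NULL journal_head pointer, which is
consistent with the faulting address 0000000000000008:

/* Paraphrased sketch, not a verbatim copy of fs/jbd2/journal.c. */
struct journal_head *jbd2_journal_grab_journal_head(struct buffer_head *bh)
{
	struct journal_head *jh = NULL;

	jbd_lock_bh_journal_head(bh);
	if (buffer_jbd(bh)) {		/* BH_JBD is still set... */
		jh = bh2jh(bh);		/* ...but bh->b_private is now NULL */
		jh->b_jcount++;		/* NULL-pointer write at the b_jcount offset */
	}
	jbd_unlock_bh_journal_head(bh);
	return jh;
}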
So just remove the invalidation in block_write_full_page(). Note that the
buffer cache invalidation when the block device changes size is already
careful to avoid similar problems: it uses invalidate_mapping_pages(),
which skips busy buffers (a paraphrased sketch of that check follows the
diff below). So it was only this odd block_write_full_page() behavior that
could tear down bdev buffers under the filesystem's hands.

Reported-by: Ye Bin
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara
---
 fs/buffer.c | 16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 061dd202979d..163c2c0b9aa3 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2771,16 +2771,6 @@ int nobh_writepage(struct page *page, get_block_t *get_block,
 	/* Is the page fully outside i_size? (truncate in progress) */
 	offset = i_size & (PAGE_SIZE-1);
 	if (page->index >= end_index+1 || !offset) {
-		/*
-		 * The page may have dirty, unmapped buffers. For example,
-		 * they may have been added in ext3_writepage(). Make them
-		 * freeable here, so the page does not leak.
-		 */
-#if 0
-		/* Not really sure about this - do we need this ? */
-		if (page->mapping->a_ops->invalidatepage)
-			page->mapping->a_ops->invalidatepage(page, offset);
-#endif
 		unlock_page(page);
 		return 0; /* don't care */
 	}
@@ -2975,12 +2965,6 @@ int block_write_full_page(struct page *page, get_block_t *get_block,
 	/* Is the page fully outside i_size? (truncate in progress) */
 	offset = i_size & (PAGE_SIZE-1);
 	if (page->index >= end_index+1 || !offset) {
-		/*
-		 * The page may have dirty, unmapped buffers. For example,
-		 * they may have been added in ext3_writepage(). Make them
-		 * freeable here, so the page does not leak.
-		 */
-		do_invalidatepage(page, 0, PAGE_SIZE);
 		unlock_page(page);
 		return 0; /* don't care */
 	}
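As noted in the changelog above, invalidate_mapping_pages() is what keeps
the bdev-resize path safe: its per-page helper refuses to touch anything it
cannot cleanly free. A condensed, hypothetical helper capturing that logic
might look as follows (the real code is spread across invalidate_inode_page()
and its callees in mm/truncate.c; this is a sketch, not a verbatim copy):

/* Hypothetical condensed helper, for illustration only. */
static int invalidate_one_bdev_page(struct address_space *mapping,
				    struct page *page)
{
	/* Dirty or in-flight pages are left alone entirely. */
	if (PageDirty(page) || PageWriteback(page))
		return 0;
	if (page_mapped(page))
		return 0;
	/*
	 * try_to_release_page() fails for busy buffers, e.g. buffers still
	 * referenced by jbd2, so their state is preserved.
	 */
	if (page_has_private(page) && !try_to_release_page(page, 0))
		return 0;
	return remove_mapping(mapping, page);
}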
From patchwork Tue Aug 25 12:05:54 2020
X-Patchwork-Submitter: Jan Kara
X-Patchwork-Id: 11735625
X-Mailing-List: linux-fsdevel@vger.kernel.org
From: Jan Kara
Cc: yebin, Christoph Hellwig, Jens Axboe, Jan Kara
Subject: [PATCH RFC 2/2] block: Do not discard buffers under a mounted filesystem
Date: Tue, 25 Aug 2020 14:05:54 +0200
Message-Id: <20200825120554.13070-3-jack@suse.cz>
X-Mailer: git-send-email 2.16.4
In-Reply-To: <20200825120554.13070-1-jack@suse.cz>
References: <20200825120554.13070-1-jack@suse.cz>

Discarding blocks and buffers under a mounted filesystem is hardly anything
an admin wants to do. Usually it will confuse the filesystem, and sometimes
the loss of buffer_head state (including the b_private field) can even
cause crashes like:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
PGD 0 P4D 0
Oops: 0002 [#1] SMP PTI
CPU: 4 PID: 203778 Comm: jbd2/dm-3-8 Kdump: loaded Tainted: G O --------- -  - 4.18.0-147.5.0.5.h126.eulerosv2r9.x86_64 #1
Hardware name: Huawei RH2288H V3/BC11HGSA0, BIOS 1.57 08/11/2015
RIP: 0010:jbd2_journal_grab_journal_head+0x1b/0x40 [jbd2]
...
Call Trace:
 __jbd2_journal_insert_checkpoint+0x23/0x70 [jbd2]
 jbd2_journal_commit_transaction+0x155f/0x1b60 [jbd2]
 kjournald2+0xbd/0x270 [jbd2]

So refuse fallocate(2) and the BLKZEROOUT, BLKDISCARD, and BLKSECDISCARD
ioctls for a block device that has a filesystem mounted.

Reported-by: Ye Bin
Signed-off-by: Jan Kara
---
 block/ioctl.c  | 19 ++++++++++++++++++-
 fs/block_dev.c |  9 +++++++++
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/block/ioctl.c b/block/ioctl.c
index bdb3bbb253d9..0e3a46b0ffc8 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -113,7 +113,7 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
 	uint64_t start, len;
 	struct request_queue *q = bdev_get_queue(bdev);
 	struct address_space *mapping = bdev->bd_inode->i_mapping;
-
+	struct super_block *sb;
 
 	if (!(mode & FMODE_WRITE))
 		return -EBADF;
@@ -134,6 +134,14 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
 	if (start + len > i_size_read(bdev->bd_inode))
 		return -EINVAL;
 
+	/*
+	 * Don't mess with device with mounted filesystem.
+	 */
+	sb = get_super(bdev);
+	if (sb) {
+		drop_super(sb);
+		return -EBUSY;
+	}
 	truncate_inode_pages_range(mapping, start, start + len - 1);
 	return blkdev_issue_discard(bdev, start >> 9, len >> 9, GFP_KERNEL,
 				    flags);
@@ -145,6 +153,7 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
 	uint64_t range[2];
 	struct address_space *mapping;
 	uint64_t start, end, len;
+	struct super_block *sb;
 
 	if (!(mode & FMODE_WRITE))
 		return -EBADF;
@@ -165,6 +174,14 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
 	if (end < start)
 		return -EINVAL;
 
+	/*
+	 * Don't mess with device with mounted filesystem.
+	 */
+	sb = get_super(bdev);
+	if (sb) {
+		drop_super(sb);
+		return -EBUSY;
+	}
 	/* Invalidate the page cache, including dirty pages */
 	mapping = bdev->bd_inode->i_mapping;
 	truncate_inode_pages_range(mapping, start, end);
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 8ae833e00443..5b398eb7c34c 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1973,6 +1973,7 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 	loff_t end = start + len - 1;
 	loff_t isize;
 	int error;
+	struct super_block *sb;
 
 	/* Fail if we don't recognize the flags. */
 	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
@@ -1996,6 +1997,14 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 	if ((start | len) & (bdev_logical_block_size(bdev) - 1))
 		return -EINVAL;
 
+	/*
+	 * Don't mess with device with mounted filesystem.
+	 */
+	sb = get_super(bdev);
+	if (sb) {
+		drop_super(sb);
+		return -EBUSY;
+	}
 	/* Invalidate the page cache, including dirty pages. */
 	mapping = bdev->bd_inode->i_mapping;
 	truncate_inode_pages_range(mapping, start, end);
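For illustration only (not part of this series): with the change above
applied, a userspace discard attempt against a device that still has a
mounted filesystem is expected to fail with EBUSY instead of tearing down
the filesystem's buffers. A minimal check, assuming a hypothetical test
file blkdiscard-test.c and an arbitrary 1 MiB range:

/* Hypothetical test program; build with: gcc -o blkdiscard-test blkdiscard-test.c */
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>		/* BLKDISCARD */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	uint64_t range[2] = { 0, 1024 * 1024 };	/* start, length: first 1 MiB */
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <blockdev>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_WRONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (ioctl(fd, BLKDISCARD, &range) < 0)
		/* Expected on a device with a mounted filesystem: EBUSY */
		fprintf(stderr, "BLKDISCARD: %s\n", strerror(errno));
	else
		printf("BLKDISCARD succeeded\n");
	close(fd);
	return 0;
}

Without a filesystem mounted from the device, the ioctl proceeds as before.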