From patchwork Mon Jul 4 04:34:31 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chandan Rajendra
X-Patchwork-Id: 9211599
From: Chandan Rajendra
To: clm@fb.com, jbacik@fb.com, dsterba@suse.com
Cc: Chandan Rajendra, linux-btrfs@vger.kernel.org
Subject: [PATCH V20 11/19] Btrfs: subpage-blocksize: Prevent writes to an
 extent buffer when PG_writeback flag is set
Date: Mon, 4 Jul 2016 10:04:31 +0530
X-Mailer: git-send-email 2.5.5
In-Reply-To: <1467606879-14181-1-git-send-email-chandan@linux.vnet.ibm.com>
References: <1467606879-14181-1-git-send-email-chandan@linux.vnet.ibm.com>
Message-Id: <1467606879-14181-12-git-send-email-chandan@linux.vnet.ibm.com>
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-btrfs@vger.kernel.org

In non-subpage-blocksize scenario, BTRFS_HEADER_FLAG_WRITTEN flag
prevents Btrfs code from writing into an extent buffer whose pages are
under writeback. This facility isn't sufficient for achieving the same
in subpage-blocksize scenario, since we have more than one extent buffer
mapped to a page.

Hence this patch adds a new flag (i.e. EXTENT_BUFFER_HEAD_WRITEBACK) and
corresponding code to track the writeback status of the page and to
prevent writes to any of the extent buffers mapped to the page while
writeback is going on.

Signed-off-by: Chandan Rajendra
---
 fs/btrfs/ctree.c       |  18 ++++++
 fs/btrfs/extent-tree.c |  10 ++++
 fs/btrfs/extent_io.c   | 150 ++++++++++++++++++++++++++++++++++++++++---------
 fs/btrfs/extent_io.h   |   1 +
 4 files changed, 152 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 4e35a21..0a56d1b 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1541,6 +1541,7 @@ noinline int btrfs_cow_block(struct btrfs_trans_handle *trans,
 		    struct extent_buffer *parent, int parent_slot,
 		    struct extent_buffer **cow_ret)
 {
+	struct extent_buffer_head *ebh = eb_head(buf);
 	u64 search_start;
 	int ret;
 
@@ -1555,6 +1556,14 @@ noinline int btrfs_cow_block(struct btrfs_trans_handle *trans,
 
 	if (!should_cow_block(trans, root, buf)) {
 		trans->dirty = true;
+		if (test_bit(EXTENT_BUFFER_HEAD_WRITEBACK, &ebh->bflags)) {
+			if (parent)
+				btrfs_set_lock_blocking(parent);
+			btrfs_set_lock_blocking(buf);
+			wait_on_bit_io(&ebh->bflags,
+				EXTENT_BUFFER_HEAD_WRITEBACK,
+				TASK_UNINTERRUPTIBLE);
+		}
 		*cow_ret = buf;
 		return 0;
 	}
@@ -2686,6 +2695,7 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root
 		      *root, struct btrfs_key *key, struct btrfs_path *p, int
 		      ins_len, int cow)
 {
+	struct extent_buffer_head *ebh;
 	struct extent_buffer *b;
 	int slot;
 	int ret;
@@ -2790,6 +2800,14 @@ again:
 			 */
 			if (!should_cow_block(trans, root, b)) {
 				trans->dirty = true;
+				ebh = eb_head(b);
+				if (test_bit(EXTENT_BUFFER_HEAD_WRITEBACK,
+						&ebh->bflags)) {
+					btrfs_set_path_blocking(p);
+					wait_on_bit_io(&ebh->bflags,
+						EXTENT_BUFFER_HEAD_WRITEBACK,
+						TASK_UNINTERRUPTIBLE);
+				}
 				goto cow_done;
 			}
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 590d0e7..4ead0ff 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8224,15 +8224,25 @@ static struct extent_buffer *
 btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 		      u64 bytenr, int level)
 {
+	struct extent_buffer_head *ebh;
 	struct extent_buffer *buf;
 
 	buf = btrfs_find_create_tree_block(root, bytenr, root->nodesize);
 	if (IS_ERR(buf))
 		return buf;
 
+	ebh = eb_head(buf);
 	btrfs_set_header_generation(buf, trans->transid);
 	btrfs_set_buffer_lockdep_class(root->root_key.objectid, buf, level);
 	btrfs_tree_lock(buf);
+
+	if (test_bit(EXTENT_BUFFER_HEAD_WRITEBACK,
+			&ebh->bflags)) {
+		btrfs_set_lock_blocking(buf);
+		wait_on_bit_io(&ebh->bflags, EXTENT_BUFFER_HEAD_WRITEBACK,
+			TASK_UNINTERRUPTIBLE);
+	}
+
 	clean_tree_block(trans, root->fs_info, buf);
 	clear_bit(EXTENT_BUFFER_STALE, &buf->ebflags);
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 694d2dc..0bdb27d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3725,6 +3725,52 @@ void wait_on_extent_buffer_writeback(struct extent_buffer *eb)
 		    TASK_UNINTERRUPTIBLE);
 }
 
+static void lock_extent_buffers(struct extent_buffer_head *ebh,
+				struct extent_page_data *epd)
+{
+	struct extent_buffer *locked_eb = NULL;
+	struct extent_buffer *eb;
+again:
+	eb = &ebh->eb;
+	do {
+		if (eb == locked_eb)
+			continue;
+
+		if (!btrfs_try_tree_write_lock(eb))
+			goto backoff;
+
+	} while ((eb = eb->eb_next) != NULL);
+
+	return;
+
+backoff:
+	if (locked_eb && (locked_eb->start > eb->start))
+		btrfs_tree_unlock(locked_eb);
+
+	locked_eb = eb;
+
+	eb = &ebh->eb;
+	while (eb != locked_eb) {
+		btrfs_tree_unlock(eb);
+		eb = eb->eb_next;
+	}
+
+	flush_write_bio(epd);
+
+	btrfs_tree_lock(locked_eb);
+
+	goto again;
+}
+
+static void unlock_extent_buffers(struct extent_buffer_head *ebh)
+{
+	struct extent_buffer *eb = &ebh->eb;
+
+	do {
+		btrfs_tree_unlock(eb);
+	} while ((eb = eb->eb_next) != NULL);
+}
+
 static void lock_extent_buffer_pages(struct extent_buffer_head *ebh,
 				struct extent_page_data *epd)
 {
@@ -3744,21 +3790,17 @@ static void lock_extent_buffer_pages(struct extent_buffer_head *ebh,
 }
 
 static int noinline_for_stack
-lock_extent_buffer_for_io(struct extent_buffer *eb,
+mark_extent_buffer_writeback(struct extent_buffer *eb,
 			  struct btrfs_fs_info *fs_info,
 			  struct extent_page_data *epd)
 {
+	struct extent_buffer_head *ebh = eb_head(eb);
+	struct extent_buffer *cur;
 	int dirty;
 	int ret = 0;
 
-	if (!btrfs_try_tree_write_lock(eb)) {
-		flush_write_bio(epd);
-		btrfs_tree_lock(eb);
-	}
-
 	if (test_bit(EXTENT_BUFFER_WRITEBACK, &eb->ebflags)) {
 		dirty = test_bit(EXTENT_BUFFER_DIRTY, &eb->ebflags);
-		btrfs_tree_unlock(eb);
 		if (!epd->sync_io) {
 			if (!dirty)
 				return 1;
@@ -3766,15 +3808,23 @@ lock_extent_buffer_for_io(struct extent_buffer *eb,
 				return 2;
 		}
 
+		cur = &ebh->eb;
+		do {
+			btrfs_set_lock_blocking(cur);
+		} while ((cur = cur->eb_next) != NULL);
+
 		flush_write_bio(epd);
 		while (1) {
 			wait_on_extent_buffer_writeback(eb);
-			btrfs_tree_lock(eb);
 			if (!test_bit(EXTENT_BUFFER_WRITEBACK, &eb->ebflags))
 				break;
-			btrfs_tree_unlock(eb);
 		}
+
+		cur = &ebh->eb;
+		do {
+			btrfs_clear_lock_blocking(cur);
+		} while ((cur = cur->eb_next) != NULL);
 	}
 
 	/*
@@ -3782,22 +3832,20 @@ lock_extent_buffer_for_io(struct extent_buffer *eb,
 	 * under IO since we can end up having no IO bits set for a short period
 	 * of time.
 	 */
-	spin_lock(&eb_head(eb)->refs_lock);
+	spin_lock(&ebh->refs_lock);
 	if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &eb->ebflags)) {
 		set_bit(EXTENT_BUFFER_WRITEBACK, &eb->ebflags);
-		spin_unlock(&eb_head(eb)->refs_lock);
+		spin_unlock(&ebh->refs_lock);
 		btrfs_set_header_flag(eb, BTRFS_HEADER_FLAG_WRITTEN);
 		__percpu_counter_add(&fs_info->dirty_metadata_bytes,
 				     -eb->len,
 				     fs_info->dirty_metadata_batch);
 		ret = 0;
 	} else {
-		spin_unlock(&eb_head(eb)->refs_lock);
+		spin_unlock(&ebh->refs_lock);
 		ret = 1;
 	}
 
-	btrfs_tree_unlock(eb);
-
 	return ret;
 }
 
@@ -3947,8 +3995,8 @@ static void set_btree_ioerr(struct extent_buffer *eb, struct page *page)
 
 static void end_bio_subpagesize_blocksize_ebh_writepage(struct bio *bio)
 {
-	struct bio_vec *bvec;
 	struct extent_buffer *eb;
+	struct bio_vec *bvec;
 	int i, done;
 
 	bio_for_each_segment_all(bvec, bio, i) {
@@ -3980,6 +4028,15 @@ static void end_bio_subpagesize_blocksize_ebh_writepage(struct bio *bio)
 
 			end_extent_buffer_writeback(eb);
 
+			if (done) {
+				struct extent_buffer_head *ebh = eb_head(eb);
+
+				clear_bit(EXTENT_BUFFER_HEAD_WRITEBACK,
+					&ebh->bflags);
+				smp_mb__after_atomic();
+				wake_up_bit(&ebh->bflags,
+					EXTENT_BUFFER_HEAD_WRITEBACK);
+			}
 		} while ((eb = eb->eb_next) != NULL);
 
 	}
@@ -3989,6 +4046,7 @@ static void end_bio_subpagesize_blocksize_ebh_writepage(struct bio *bio)
 
 static void end_bio_regular_ebh_writepage(struct bio *bio)
 {
+	struct extent_buffer_head *ebh;
 	struct extent_buffer *eb;
 	struct bio_vec *bvec;
 	int i, done;
@@ -3999,7 +4057,9 @@ static void end_bio_regular_ebh_writepage(struct bio *bio)
 		eb = (struct extent_buffer *)page->private;
 		BUG_ON(!eb);
 
-		done = atomic_dec_and_test(&eb_head(eb)->io_bvecs);
+		ebh = eb_head(eb);
+
+		done = atomic_dec_and_test(&ebh->io_bvecs);
 
 		if (bio->bi_error ||
 		    test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->ebflags)) {
@@ -4013,6 +4073,10 @@ static void end_bio_regular_ebh_writepage(struct bio *bio)
 			continue;
 
 		end_extent_buffer_writeback(eb);
+
+		clear_bit(EXTENT_BUFFER_HEAD_WRITEBACK, &ebh->bflags);
+		smp_mb__after_atomic();
+		wake_up_bit(&ebh->bflags, EXTENT_BUFFER_HEAD_WRITEBACK);
 	}
 
 	bio_put(bio);
@@ -4054,8 +4118,14 @@ write_regular_ebh(struct extent_buffer_head *ebh,
 			set_btree_ioerr(eb, p);
 			end_page_writeback(p);
 			if (atomic_sub_and_test(num_pages - i,
-							&eb_head(eb)->io_bvecs))
+							&ebh->io_bvecs)) {
 				end_extent_buffer_writeback(eb);
+				clear_bit(EXTENT_BUFFER_HEAD_WRITEBACK,
+					&ebh->bflags);
+				smp_mb__after_atomic();
+				wake_up_bit(&ebh->bflags,
+					EXTENT_BUFFER_HEAD_WRITEBACK);
+			}
 			ret = -EIO;
 			break;
 		}
@@ -4089,6 +4159,7 @@ static int write_subpagesize_blocksize_ebh(struct extent_buffer_head *ebh,
 	unsigned long i;
 	unsigned long bio_flags = 0;
 	int rw = (epd->sync_io ? WRITE_SYNC : WRITE) | REQ_META;
+	int nr_eb_submitted = 0;
 	int ret = 0, err = 0;
 
 	eb = &ebh->eb;
@@ -4101,7 +4172,7 @@ static int write_subpagesize_blocksize_ebh(struct extent_buffer_head *ebh,
 			continue;
 
 		clear_bit(EXTENT_BUFFER_WRITE_ERR, &eb->ebflags);
-		atomic_inc(&eb_head(eb)->io_bvecs);
+		atomic_inc(&ebh->io_bvecs);
 
 		if (btrfs_header_owner(eb) == BTRFS_TREE_LOG_OBJECTID)
 			bio_flags = EXTENT_BIO_TREE_LOG;
@@ -4119,6 +4190,8 @@ static int write_subpagesize_blocksize_ebh(struct extent_buffer_head *ebh,
 			atomic_dec(&eb_head(eb)->io_bvecs);
 			end_extent_buffer_writeback(eb);
 			err = -EIO;
+		} else {
+			++nr_eb_submitted;
 		}
 	} while ((eb = eb->eb_next) != NULL);
 
@@ -4126,6 +4199,12 @@ static int write_subpagesize_blocksize_ebh(struct extent_buffer_head *ebh,
 		update_nr_written(p, wbc, 1);
 	}
 
+	if (!nr_eb_submitted) {
+		clear_bit(EXTENT_BUFFER_HEAD_WRITEBACK, &ebh->bflags);
+		smp_mb__after_atomic();
+		wake_up_bit(&ebh->bflags, EXTENT_BUFFER_HEAD_WRITEBACK);
+	}
+
 	unlock_page(p);
 
 	return ret;
@@ -4237,24 +4316,31 @@ retry:
 
 			j = 0;
 			ebs_to_write = dirty_ebs = 0;
+
+			lock_extent_buffers(ebh, &epd);
+
+			set_bit(EXTENT_BUFFER_HEAD_WRITEBACK, &ebh->bflags);
+
 			eb = &ebh->eb;
 			do {
 				BUG_ON(j >= BITS_PER_LONG);
 
-				ret = lock_extent_buffer_for_io(eb, fs_info, &epd);
+				ret = mark_extent_buffer_writeback(eb, fs_info,
+								&epd);
 				switch (ret) {
 				case 0:
 					/*
-					  EXTENT_BUFFER_DIRTY was set and we were able to
-					  clear it.
+					  EXTENT_BUFFER_DIRTY was set and we were
+					  able to clear it.
 					*/
 					set_bit(j, &ebs_to_write);
 					break;
 				case 2:
 					/*
-					  EXTENT_BUFFER_DIRTY was set, but we were unable
-					  to clear EXTENT_BUFFER_WRITEBACK that was set
-					  before we got the extent buffer locked.
+					  EXTENT_BUFFER_DIRTY was set, but we were
+					  unable to clear EXTENT_BUFFER_WRITEBACK
+					  that was set before we got the extent
+					  buffer locked.
 					*/
 					set_bit(j, &dirty_ebs);
 				default:
@@ -4268,22 +4354,32 @@ retry:
 
 			ret = 0;
 
+			unlock_extent_buffers(ebh);
+
 			if (!ebs_to_write) {
+				clear_bit(EXTENT_BUFFER_HEAD_WRITEBACK,
+					&ebh->bflags);
+				smp_mb__after_atomic();
+				wake_up_bit(&ebh->bflags,
+					EXTENT_BUFFER_HEAD_WRITEBACK);
 				free_extent_buffer(&ebh->eb);
 				continue;
 			}
 
 			/*
-			  Now that we know that atleast one of the extent buffer
+			  Now that we know that atleast one of the extent buffers
 			  belonging to the extent buffer head must be written to
 			  the disk, lock the extent_buffer_head's pages.
 			 */
 			lock_extent_buffer_pages(ebh, &epd);
 
 			if (ebh->eb.len < PAGE_SIZE) {
-				ret = write_subpagesize_blocksize_ebh(ebh, fs_info, wbc, &epd, ebs_to_write);
+				ret = write_subpagesize_blocksize_ebh(ebh, fs_info,
+								wbc, &epd,
+								ebs_to_write);
 				if (dirty_ebs) {
-					redirty_extent_buffer_pages_for_writepage(&ebh->eb, wbc);
+					redirty_extent_buffer_pages_for_writepage(&ebh->eb,
+										wbc);
 				}
 			} else {
 				ret = write_regular_ebh(ebh, fs_info, wbc, &epd);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 2ea8451..e8e504c 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -35,6 +35,7 @@
 #define EXTENT_BUFFER_HEAD_TREE_REF 0
 #define EXTENT_BUFFER_HEAD_DUMMY 1
 #define EXTENT_BUFFER_HEAD_IN_TREE 2
+#define EXTENT_BUFFER_HEAD_WRITEBACK 3
 
 /* these are bit numbers for test/set bit on extent buffer */
 #define EXTENT_BUFFER_UPTODATE 0