From patchwork Fri Mar 28 21:07:27 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Josef Bacik X-Patchwork-Id: 3907161 Return-Path: X-Original-To: patchwork-linux-btrfs@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork1.web.kernel.org (Postfix) with ESMTP id B761A9F334 for ; Fri, 28 Mar 2014 21:07:39 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id BCAD82034E for ; Fri, 28 Mar 2014 21:07:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id AC27020295 for ; Fri, 28 Mar 2014 21:07:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752557AbaC1VHe (ORCPT ); Fri, 28 Mar 2014 17:07:34 -0400 Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:63461 "EHLO mx0b-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752428AbaC1VHd (ORCPT ); Fri, 28 Mar 2014 17:07:33 -0400 Received: from pps.filterd (m0004060 [127.0.0.1]) by mx0b-00082601.pphosted.com (8.14.5/8.14.5) with SMTP id s2SL15Ta004774 for ; Fri, 28 Mar 2014 14:07:32 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=fb.com; h=from : to : subject : date : message-id : mime-version : content-type; s=facebook; bh=nwhMMGpb6Nig09uhcFJBrjTqITmKCNsWy0krK51ZuOU=; b=pgDsYVDj/nzaPtrynvVcxM8xBuyOI1OstmfTPng7E3ggzsIhr6BxC/OYMFO9Mvaa7yKW C6V3uWO8K37l+xTz8YTPiahdRHvjWLRIZ1Cl3nfA/Jhhg6rkfR8CRTdkXvKSWjuLWqx6 TUw4SEB9s/G22jLEsgY5XN79T6e1hkop+wA= Received: from mail.thefacebook.com (mailwest.thefacebook.com [173.252.71.148]) by mx0b-00082601.pphosted.com with ESMTP id 1jwg6803sy-3 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=OK) for ; Fri, 28 Mar 2014 14:07:32 -0700 Received: from localhost (192.168.57.29) by mail.thefacebook.com (192.168.16.20) with Microsoft SMTP Server (TLS) id 14.3.174.1; Fri, 28 Mar 2014 14:07:29 -0700 From: Josef Bacik To: Subject: [PATCH] Btrfs: don't clear uptodate if the eb is under IO Date: Fri, 28 Mar 2014 17:07:27 -0400 Message-ID: <1396040847-6026-1-git-send-email-jbacik@fb.com> X-Mailer: git-send-email 1.8.3.1 MIME-Version: 1.0 X-Originating-IP: [192.168.57.29] X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:5.11.87, 1.0.14, 0.0.0000 definitions=2014-03-28_07:2014-03-28, 2014-03-28, 1970-01-01 signatures=0 X-Proofpoint-Spam-Details: rule=fb_default_notspam policy=fb_default score=0 kscore.is_bulkscore=5.35740019236286e-07 kscore.compositescore=0 circleOfTrustscore=514.84 compositescore=0.999775624998249 urlsuspect_oldscore=0.999775624998249 suspectscore=3 recipient_domain_to_sender_totalscore=0 phishscore=0 bulkscore=0 kscore.is_spamscore=0 recipient_to_sender_totalscore=0 recipient_domain_to_sender_domain_totalscore=64355 rbsscore=0.999775624998249 spamscore=0 recipient_to_sender_domain_totalscore=0 urlsuspectscore=0.9 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=7.0.1-1402240000 definitions=main-1403280119 X-FB-Internal: deliver Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Spam-Status: No, score=-7.2 required=5.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD,T_DKIM_INVALID,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP So I have an awful exercise script that will run snapshot, balance and send/receive in parallel. This sometimes would crash spectacularly and when it came back up the fs would be completely hosed. Turns out this is because of a bad interaction of balance and send/receive. Send will hold onto its entire path for the whole send, but its blocks could get relocated out from underneath it, and because it doesn't old tree locks theres nothing to keep this from happening. So it will go to read in a slot with an old transid, and we could have re-allocated this block for something else and it could have a completely different transid. But because we think it is invalid we clear uptodate and re-read in the block. If we do this before we actually write out the new block we could write back stale data to the fs, and boom we're screwed. Now we definitely need to fix this disconnect between send and balance, but we really really need to not allow ourselves to accidently read in stale data over new data. So make sure we check if the extent buffer is not under io before clearing uptodate, this will kick back EIO to the caller instead of reading in stale data and keep us from corrupting the fs. Thanks, Signed-off-by: Josef Bacik --- fs/btrfs/disk-io.c | 20 +++++++++++++++++++- fs/btrfs/extent_io.c | 2 +- fs/btrfs/extent_io.h | 1 + fs/btrfs/send.c | 2 ++ fs/btrfs/transaction.h | 2 ++ 5 files changed, 25 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 02ae4d1..716aa18 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -330,6 +330,8 @@ static int verify_parent_transid(struct extent_io_tree *io_tree, { struct extent_state *cached_state = NULL; int ret; + bool need_lock = (current->journal_info == + (void *)BTRFS_SEND_TRANS_STUB); if (!parent_transid || btrfs_header_generation(eb) == parent_transid) return 0; @@ -337,6 +339,11 @@ static int verify_parent_transid(struct extent_io_tree *io_tree, if (atomic) return -EAGAIN; + if (need_lock) { + btrfs_tree_read_lock(eb); + btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK); + } + lock_extent_bits(io_tree, eb->start, eb->start + eb->len - 1, 0, &cached_state); if (extent_buffer_uptodate(eb) && @@ -348,10 +355,21 @@ static int verify_parent_transid(struct extent_io_tree *io_tree, "found %llu\n", eb->start, parent_transid, btrfs_header_generation(eb)); ret = 1; - clear_extent_buffer_uptodate(eb); + + /* + * Things reading via commit roots that don't have normal protection, + * like send, can have a really old block in cache that may point at a + * block that has been free'd and re-allocated. So don't clear uptodate + * if we find an eb that is under IO (dirty/writeback) because we could + * end up reading in the stale data and then writing it back out and + * making everybody very sad. + */ + if (!extent_buffer_under_io(eb)) + clear_extent_buffer_uptodate(eb); out: unlock_extent_cached(io_tree, eb->start, eb->start + eb->len - 1, &cached_state, GFP_NOFS); + btrfs_tree_read_unlock_blocking(eb); return ret; } diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index fd78c84..d35a3ca 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -4315,7 +4315,7 @@ static void __free_extent_buffer(struct extent_buffer *eb) kmem_cache_free(extent_buffer_cache, eb); } -static int extent_buffer_under_io(struct extent_buffer *eb) +int extent_buffer_under_io(struct extent_buffer *eb) { return (atomic_read(&eb->io_pages) || test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags) || diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 58b27e5..c488b45 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -320,6 +320,7 @@ int set_extent_buffer_dirty(struct extent_buffer *eb); int set_extent_buffer_uptodate(struct extent_buffer *eb); int clear_extent_buffer_uptodate(struct extent_buffer *eb); int extent_buffer_uptodate(struct extent_buffer *eb); +int extent_buffer_under_io(struct extent_buffer *eb); int map_private_extent_buffer(struct extent_buffer *eb, unsigned long offset, unsigned long min_len, char **map, unsigned long *map_start, diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 0148999..6f035ee 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -5609,7 +5609,9 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_) NULL); sort_clone_roots = 1; + current->journal_info = (void *)BTRFS_SEND_TRANS_STUB; ret = send_subvol(sctx); + current->journal_info = NULL; if (ret < 0) goto out; diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h index aa014fe..b57b924 100644 --- a/fs/btrfs/transaction.h +++ b/fs/btrfs/transaction.h @@ -79,6 +79,8 @@ struct btrfs_transaction { #define TRANS_EXTWRITERS (__TRANS_USERSPACE | __TRANS_START | \ __TRANS_ATTACH) +#define BTRFS_SEND_TRANS_STUB 1 + struct btrfs_trans_handle { u64 transid; u64 bytes_reserved;