From patchwork Tue Mar 26 00:08:40 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Mason X-Patchwork-Id: 2334021 Return-Path: X-Original-To: patchwork-ceph-devel@patchwork.kernel.org Delivered-To: patchwork-process-083081@patchwork2.kernel.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by patchwork2.kernel.org (Postfix) with ESMTP id 45BDDE00E6 for ; Tue, 26 Mar 2013 00:08:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759156Ab3CZAIp (ORCPT ); Mon, 25 Mar 2013 20:08:45 -0400 Received: from dkim2.fusionio.com ([66.114.96.54]:58167 "EHLO dkim2.fusionio.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758813Ab3CZAIn convert rfc822-to-8bit (ORCPT ); Mon, 25 Mar 2013 20:08:43 -0400 Received: from mx2.fusionio.com (unknown [10.101.1.160]) by dkim2.fusionio.com (Postfix) with ESMTP id DDA159A0645 for ; Mon, 25 Mar 2013 18:08:42 -0600 (MDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=fusionio.com; s=default; t=1364256522; bh=PtQet1fkPbwzgLw5G0XttXR7iKdpT8c2DIiRGe0FbKA=; h=To:From:In-Reply-To:CC:References:Subject:Date; b=qgoNsqnuMx0Z08BOrzhGg1B+7fQs++Lax9N/kvaWx0ZVVF9UQOpVilg0x3h12A8q6 2u0QRUimRZLv4O9bVztvBc5IrXYoKYN+OrI2ICBba7PrUF5myljceYfgfdsEFGvsjd xw/PbO5/zCwkIqUbYKioT0i52Vny/TfpFvHB56Lo= X-ASG-Debug-ID: 1364256522-0421b539a8232b0001-RKzwh1 Received: from mail1.int.fusionio.com (mail1.int.fusionio.com [10.101.1.21]) by mx2.fusionio.com with ESMTP id btF9fhUnmxdf78MH (version=TLSv1 cipher=AES128-SHA bits=128 verify=NO); Mon, 25 Mar 2013 18:08:42 -0600 (MDT) X-Barracuda-Envelope-From: clmason@fusionio.com Received: from localhost (67.247.72.189) by mail.fusionio.com (10.101.1.19) with Microsoft SMTP Server (TLS) id 8.3.83.0; Mon, 25 Mar 2013 18:08:41 -0600 MIME-Version: 1.0 To: Chris Mason , Alexandre Oliva , "linux-btrfs@vger.kernel.org" From: Chris Mason In-Reply-To: <20130322203142.27874.84834@localhost.localdomain> CC: "ceph-devel@vger.kernel.org" References: <20130322180705.27874.96638@localhost.localdomain> <20130322203142.27874.84834@localhost.localdomain> Message-ID: <20130326000840.14340.30047@localhost.localdomain> User-Agent: alot/0.3.4 Subject: Re: corruption of active mmapped files in btrfs snapshots Date: Mon, 25 Mar 2013 20:08:40 -0400 X-ASG-Orig-Subj: Re: corruption of active mmapped files in btrfs snapshots X-Barracuda-Connect: mail1.int.fusionio.com[10.101.1.21] X-Barracuda-Start-Time: 1364256522 X-Barracuda-Encrypted: AES128-SHA X-Barracuda-URL: http://10.101.1.181:8000/cgi-mod/mark.cgi X-Virus-Scanned: by bsmtpd at fusionio.com X-Barracuda-Spam-Score: 3.50 X-Barracuda-Spam-Status: No, SCORE=3.50 using global scores of TAG_LEVEL=1000.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=9.0 tests=BSF_SC0_MV0035, BSF_SC0_SA550, BSF_SC0_SA_TO_FROM_ADDR_MATCH X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.126239 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- 0.50 BSF_SC0_SA_TO_FROM_ADDR_MATCH Sender Address Matches Recipient Address 2.50 BSF_SC0_SA550 Custom Rule SA550 0.50 BSF_SC0_MV0035 Custom Rule BSF_SC0_MV0035 Sender: ceph-devel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Quoting Chris Mason (2013-03-22 16:31:42) > Going through the code here, when I change the test to truncate once in > the very beginning, I still get errors. So, it isn't an interaction > between mmap and truncate. It must be a problem between lzo and mmap. With compression off, we use clear_page_dirty_for_io to create a wall between applications using mmap and our crc code. Once we call clear_page_dirty_for_io, it means we're in the process of writing the page and anyone using mmap must wait (by calling page_mkwrite) before they are allowed to change the page. We use it with compression on as well, but it only ends up protecting the crcs. It gets called after the compression is done, which allows applications to race in and modify the pages while we are compressing them. This patch changes our compression code to call clear_page_dirty_for_io before we compress, and then redirty the pages if the compression fails. Alexandre, many thanks for tracking this down into a well defined use case. -chris --- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index f173c5a..cdee391 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1257,6 +1257,39 @@ int unlock_extent(struct extent_io_tree *tree, u64 start, u64 end) GFP_NOFS); } +int extent_range_clear_dirty_for_io(struct inode *inode, u64 start, u64 end) +{ + unsigned long index = start >> PAGE_CACHE_SHIFT; + unsigned long end_index = end >> PAGE_CACHE_SHIFT; + struct page *page; + + while (index <= end_index) { + page = find_get_page(inode->i_mapping, index); + BUG_ON(!page); /* Pages should be in the extent_io_tree */ + clear_page_dirty_for_io(page); + page_cache_release(page); + index++; + } + return 0; +} + +int extent_range_redirty_for_io(struct inode *inode, u64 start, u64 end) +{ + unsigned long index = start >> PAGE_CACHE_SHIFT; + unsigned long end_index = end >> PAGE_CACHE_SHIFT; + struct page *page; + + while (index <= end_index) { + page = find_get_page(inode->i_mapping, index); + BUG_ON(!page); /* Pages should be in the extent_io_tree */ + account_page_redirty(page); + __set_page_dirty_nobuffers(page); + page_cache_release(page); + index++; + } + return 0; +} + /* * helper function to set both pages and extents in the tree writeback */ diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 6068a19..258c921 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -325,6 +325,8 @@ int map_private_extent_buffer(struct extent_buffer *eb, unsigned long offset, unsigned long *map_len); int extent_range_uptodate(struct extent_io_tree *tree, u64 start, u64 end); +int extent_range_clear_dirty_for_io(struct inode *inode, u64 start, u64 end); +int extent_range_redirty_for_io(struct inode *inode, u64 start, u64 end); int extent_clear_unlock_delalloc(struct inode *inode, struct extent_io_tree *tree, u64 start, u64 end, struct page *locked_page, diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index ca1b767..88d4a18 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -353,6 +353,7 @@ static noinline int compress_file_range(struct inode *inode, int i; int will_compress; int compress_type = root->fs_info->compress_type; + int redirty = 0; /* if this is a small write inside eof, kick off a defrag */ if ((end - start + 1) < 16 * 1024 && @@ -415,6 +416,8 @@ again: if (BTRFS_I(inode)->force_compress) compress_type = BTRFS_I(inode)->force_compress; + extent_range_clear_dirty_for_io(inode, start, end); + redirty = 1; ret = btrfs_compress_pages(compress_type, inode->i_mapping, start, total_compressed, pages, @@ -554,6 +557,8 @@ cleanup_and_bail_uncompressed: __set_page_dirty_nobuffers(locked_page); /* unlocked later on in the async handlers */ } + if (redirty) + extent_range_redirty_for_io(inode, start, end); add_async_extent(async_cow, start, end - start + 1, 0, NULL, 0, BTRFS_COMPRESS_NONE); *num_added += 1;