From patchwork Sat Jul 30 16:50:01 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Theodore Ts'o
X-Patchwork-Id: 1022542
Date: Sat, 30 Jul 2011 12:50:01 -0400
From: "Ted Ts'o"
To: Fyodor Ustinov
Cc: ceph-devel@vger.kernel.org
Subject: Re: Kernel 3.0.0 + ext4 + ceph == ...
Message-ID: <20110730165001.GI7361@thunk.org>
References: <4E33D101.1050504@ufm.su> <9BF9E529-C532-4A94-8362-93C2D1B778DB@mit.edu> <4E3432FC.9030204@ufm.su>
In-Reply-To: <4E3432FC.9030204@ufm.su>

On Sat, Jul 30, 2011 at 07:36:12PM +0300, Fyodor Ustinov wrote:
> As it is written in subject - 3.0.0 release.
>
> It's Ubuntu 11.04 with custom kernel

Right, sorry, I missed that. And just to be clear, this wasn't an -rc
kernel but 3.0 final, right?

Hmm, looking through recent commits which will shortly be merged into
3.1, this one (appended below) leaps out, but I'm not sure it's the
cause. How full was your disk at the end of this exercise?

I haven't looked at Ceph in quite a while. As I recall, it was
primarily doing Direct I/O writes, correct? Or does it use buffered
I/O? And does it use the new "punch" operation (fallocate() with
FALLOC_FL_PUNCH_HOLE) to release blocks from the middle of a file?
Ext4 added punch support in 3.0, and there are some bug fixes that are
going into 3.1, but I don't think there were any that would lead to the
failure mode you are seeing.

- Ted

commit 7132de744ba76930d13033061018ddd7e3e8cd91
Author: Maxim Patlasov
Date:   Sun Jul 10 19:37:48 2011 -0400

    ext4: fix i_blocks/quota accounting when extent insertion fails

    The current implementation of ext4_free_blocks() always calls
    dquot_free_block(). This looks quite sensible in most cases: the
    blocks to be freed are associated with the inode and were accounted
    in quota and i_blocks some time ago. However, there is a case where
    the blocks to be freed have not yet been accounted by the time
    ext4_free_blocks() is called:

    1. delalloc is on, and write_begin pre-allocated some space in quota;
    2. write-back happens, and ext4 allocates some blocks in
       ext4_ext_map_blocks();
    3. ext4_ext_map_blocks() then gets an error (e.g. ENOSPC) from
       ext4_ext_insert_extent() and calls ext4_free_blocks().

    In this scenario, ext4_free_blocks() calls dquot_free_block(), which
    in turn decrements i_blocks for blocks that were not yet accounted
    (due to delalloc).

    After a clean umount, e2fsck reports something like:

    > Inode 21, i_blocks is 5080, should be 5128.  Fix?

    because i_blocks was erroneously decremented as explained above.

    The patch fixes the problem by passing the new flag
    EXT4_FREE_BLOCKS_NO_QUOT_UPDATE to ext4_free_blocks(), to request
    that the dquot_free_block() call be skipped.
    Signed-off-by: Maxim Patlasov
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org
---
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 49d2cea..d13f3b5 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -526,6 +526,7 @@ struct ext4_new_group_data {
 #define EXT4_FREE_BLOCKS_METADATA	0x0001
 #define EXT4_FREE_BLOCKS_FORGET		0x0002
 #define EXT4_FREE_BLOCKS_VALIDATED	0x0004
+#define EXT4_FREE_BLOCKS_NO_QUOT_UPDATE	0x0008
 
 /*
  * ioctl commands
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 31ae5fb..a862138 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3565,12 +3565,14 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
 
 	err = ext4_ext_insert_extent(handle, inode, path, &newex, flags);
 	if (err) {
+		int fb_flags = flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE ?
+			EXT4_FREE_BLOCKS_NO_QUOT_UPDATE : 0;
 		/* free data blocks we just allocated */
 		/* not a good idea to call discard here directly,
 		 * but otherwise we'd need to call it every free() */
 		ext4_discard_preallocations(inode);
 		ext4_free_blocks(handle, inode, NULL, ext4_ext_pblock(&newex),
-				 ext4_ext_get_actual_len(&newex), 0);
+				 ext4_ext_get_actual_len(&newex), fb_flags);
 		goto out2;
 	}
 
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 389386b..1900ec7 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4637,7 +4637,7 @@ do_more:
 	}
 	ext4_mark_super_dirty(sb);
 error_return:
-	if (freed)
+	if (freed && !(flags & EXT4_FREE_BLOCKS_NO_QUOT_UPDATE))
 		dquot_free_block(inode, freed);
 	brelse(bitmap_bh);
 	ext4_std_error(sb, err);