Message ID | 1459492512-31435-11-git-send-email-quwenruo@cn.fujitsu.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Apr 01, 2016 at 02:35:01PM +0800, Qu Wenruo wrote: > @@ -5815,6 +5817,23 @@ out_fail: > } > if (delalloc_lock) > mutex_unlock(&BTRFS_I(inode)->delalloc_mutex); > + /* > + * The number of metadata bytes is calculated by the difference > + * between outstanding_extents and reserved_extents. Sometimes though > + * reserve_metadata_bytes() fails to reserve the wanted metadata bytes, > + * indeed it has already done some work to reclaim metadata space, hence > + * both outstanding_extents and reserved_extents would have changed and > + * the bytes we try to reserve would also has changed(may be smaller). > + * So here we try to reserve again. This is much useful for online > + * dedupe, which will easily eat almost all meta space. > + * > + * XXX: Indeed here 3 is arbitrarily choosed, it's a good workaround for > + * online dedupe, later we should find a better method to avoid dedupe > + * enospc issue. > + */ > + if (unlikely(ret == -ENOSPC && loops++ < 3)) > + goto again; > + This does not seem right and needs to be addressed properly before I consider adding the patchset to for-next. I don't have idea how to fix it. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Sterba wrote on 2016/05/17 15:20 +0200: > On Fri, Apr 01, 2016 at 02:35:01PM +0800, Qu Wenruo wrote: >> @@ -5815,6 +5817,23 @@ out_fail: >> } >> if (delalloc_lock) >> mutex_unlock(&BTRFS_I(inode)->delalloc_mutex); >> + /* >> + * The number of metadata bytes is calculated by the difference >> + * between outstanding_extents and reserved_extents. Sometimes though >> + * reserve_metadata_bytes() fails to reserve the wanted metadata bytes, >> + * indeed it has already done some work to reclaim metadata space, hence >> + * both outstanding_extents and reserved_extents would have changed and >> + * the bytes we try to reserve would also has changed(may be smaller). >> + * So here we try to reserve again. This is much useful for online >> + * dedupe, which will easily eat almost all meta space. >> + * >> + * XXX: Indeed here 3 is arbitrarily choosed, it's a good workaround for >> + * online dedupe, later we should find a better method to avoid dedupe >> + * enospc issue. >> + */ >> + if (unlikely(ret == -ENOSPC && loops++ < 3)) >> + goto again; >> + > > This does not seem right and needs to be addressed properly before I > consider adding the patchset to for-next. I don't have idea how to fix > it. > > Right, the code is bad, just as Josef pointed out. In fact this behavior really hides a lot of ENOSPC problem and shows quite a lot original metadata reserve limitation of current code. (Mostly hidden by the default 128M max extent size) We are actively debugging and testing new root fix for the problem. Thanks, Qu -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 17, 2016 at 03:20:16PM +0200, David Sterba wrote: > On Fri, Apr 01, 2016 at 02:35:01PM +0800, Qu Wenruo wrote: > > @@ -5815,6 +5817,23 @@ out_fail: > > } > > if (delalloc_lock) > > mutex_unlock(&BTRFS_I(inode)->delalloc_mutex); > > + /* > > + * The number of metadata bytes is calculated by the difference > > + * between outstanding_extents and reserved_extents. Sometimes though > > + * reserve_metadata_bytes() fails to reserve the wanted metadata bytes, > > + * indeed it has already done some work to reclaim metadata space, hence > > + * both outstanding_extents and reserved_extents would have changed and > > + * the bytes we try to reserve would also has changed(may be smaller). > > + * So here we try to reserve again. This is much useful for online > > + * dedupe, which will easily eat almost all meta space. > > + * > > + * XXX: Indeed here 3 is arbitrarily choosed, it's a good workaround for > > + * online dedupe, later we should find a better method to avoid dedupe > > + * enospc issue. > > + */ > > + if (unlikely(ret == -ENOSPC && loops++ < 3)) > > + goto again; > > + > > This does not seem right and needs to be addressed properly before I > consider adding the patchset to for-next. I don't have idea how to fix > it. Agreed, and this sort of issue is a reason why I strongly feel we don't want to merge this series piecemeal until we know that after everything is complete, we can end up with a fully baked in-band dedupe implementation. Luckily Qu says he's on it so if he posts a workable fix here my whole point can become moot. Until then though this is exactly the type of 'fix later' coding we need to be avoiding. --Mark -- Mark Fasheh -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index dabd721..016d2ec 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2421,7 +2421,7 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans, * a new extent is revered, then deleted * in one tran, and inc/dec get merged to 0. * - * In this case, we need to remove its dedup + * In this case, we need to remove its dedupe * hash. */ btrfs_dedupe_del(trans, fs_info, node->bytenr); @@ -5675,6 +5675,7 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes) bool delalloc_lock = true; u64 to_free = 0; unsigned dropped; + int loops = 0; /* If we are a free space inode we need to not flush since we will be in * the middle of a transaction commit. We also don't need the delalloc @@ -5690,11 +5691,12 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes) btrfs_transaction_in_commit(root->fs_info)) schedule_timeout(1); + num_bytes = ALIGN(num_bytes, root->sectorsize); + +again: if (delalloc_lock) mutex_lock(&BTRFS_I(inode)->delalloc_mutex); - num_bytes = ALIGN(num_bytes, root->sectorsize); - spin_lock(&BTRFS_I(inode)->lock); nr_extents = (unsigned)div64_u64(num_bytes + BTRFS_MAX_EXTENT_SIZE - 1, @@ -5815,6 +5817,23 @@ out_fail: } if (delalloc_lock) mutex_unlock(&BTRFS_I(inode)->delalloc_mutex); + /* + * The number of metadata bytes is calculated by the difference + * between outstanding_extents and reserved_extents. Sometimes though + * reserve_metadata_bytes() fails to reserve the wanted metadata bytes, + * indeed it has already done some work to reclaim metadata space, hence + * both outstanding_extents and reserved_extents would have changed and + * the bytes we try to reserve would also has changed(may be smaller). + * So here we try to reserve again. This is much useful for online + * dedupe, which will easily eat almost all meta space. + * + * XXX: Indeed here 3 is arbitrarily choosed, it's a good workaround for + * online dedupe, later we should find a better method to avoid dedupe + * enospc issue. + */ + if (unlikely(ret == -ENOSPC && loops++ < 3)) + goto again; + return ret; }