diff mbox

[v5] qgroup: Retry after commit on getting EDQUOT

Message ID 20170327172957.2518-1-rgoldwyn@suse.de (mailing list archive)
State New, archived
Headers show

Commit Message

Goldwyn Rodrigues March 27, 2017, 5:29 p.m. UTC
From: Goldwyn Rodrigues <rgoldwyn@suse.com>

We are facing the same problem with EDQUOT which was experienced with
ENOSPC. Not sure if we require a full ticketing system such as ENOSPC, but
here is a quick fix, which may be too big a hammer.

Quotas are reserved during the start of an operation, incrementing
qg->reserved. However, it is written to disk in a commit_transaction
which could take as long as commit_interval. In the meantime there
could be deletions which are not accounted for because deletions are
accounted for only while committed (free_refroot). So, when we get
a EDQUOT flush the data to disk and try again.

This fixes fstests btrfs/139.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
Changes since v1:
 - Changed start_delalloc_roots() to start_delalloc_inode() to target
   the root in question only to reduce the amount of flush to be done.
 - Added wait_ordered_extents().

Changes since v2:
  - Revised patch header
  - removed comment on combining conditions
  - removed test case, to be done in fstests

Changes sinve v3:
  - testcase reinstated
  - return value checks

Changes since v4:
 - removed testcase since btrfs/139 got incorporated in fstests
 - return statements corrected

 fs/btrfs/qgroup.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

Comments

David Sterba March 27, 2017, 5:36 p.m. UTC | #1
On Mon, Mar 27, 2017 at 12:29:57PM -0500, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> We are facing the same problem with EDQUOT which was experienced with
> ENOSPC. Not sure if we require a full ticketing system such as ENOSPC, but
> here is a quick fix, which may be too big a hammer.
> 
> Quotas are reserved during the start of an operation, incrementing
> qg->reserved. However, it is written to disk in a commit_transaction
> which could take as long as commit_interval. In the meantime there
> could be deletions which are not accounted for because deletions are
> accounted for only while committed (free_refroot). So, when we get
> a EDQUOT flush the data to disk and try again.
> 
> This fixes fstests btrfs/139.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
> Changes since v1:
>  - Changed start_delalloc_roots() to start_delalloc_inode() to target
>    the root in question only to reduce the amount of flush to be done.
>  - Added wait_ordered_extents().
> 
> Changes since v2:
>   - Revised patch header
>   - removed comment on combining conditions
>   - removed test case, to be done in fstests
> 
> Changes sinve v3:
>   - testcase reinstated
>   - return value checks
> 
> Changes since v4:
>  - removed testcase since btrfs/139 got incorporated in fstests

The point was to keep the test inside the changelog as well.

>  - return statements corrected
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Goldwyn Rodrigues March 27, 2017, 6:13 p.m. UTC | #2
On 03/27/2017 12:36 PM, David Sterba wrote:
> On Mon, Mar 27, 2017 at 12:29:57PM -0500, Goldwyn Rodrigues wrote:
>> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>>
>> We are facing the same problem with EDQUOT which was experienced with
>> ENOSPC. Not sure if we require a full ticketing system such as ENOSPC, but
>> here is a quick fix, which may be too big a hammer.
>>
>> Quotas are reserved during the start of an operation, incrementing
>> qg->reserved. However, it is written to disk in a commit_transaction
>> which could take as long as commit_interval. In the meantime there
>> could be deletions which are not accounted for because deletions are
>> accounted for only while committed (free_refroot). So, when we get
>> a EDQUOT flush the data to disk and try again.
>>
>> This fixes fstests btrfs/139.
>>
>> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
>> ---
>> Changes since v1:
>>  - Changed start_delalloc_roots() to start_delalloc_inode() to target
>>    the root in question only to reduce the amount of flush to be done.
>>  - Added wait_ordered_extents().
>>
>> Changes since v2:
>>   - Revised patch header
>>   - removed comment on combining conditions
>>   - removed test case, to be done in fstests
>>
>> Changes sinve v3:
>>   - testcase reinstated
>>   - return value checks
>>
>> Changes since v4:
>>  - removed testcase since btrfs/139 got incorporated in fstests
> 
> The point was to keep the test inside the changelog as well.


Yes, that was before we had btrfs/139 in fstest. I have put it in the
patch header. However, if you really want the test script back, I can
put it there. Let me know.
David Sterba March 28, 2017, noon UTC | #3
On Mon, Mar 27, 2017 at 01:13:46PM -0500, Goldwyn Rodrigues wrote:
> 
> 
> On 03/27/2017 12:36 PM, David Sterba wrote:
> > On Mon, Mar 27, 2017 at 12:29:57PM -0500, Goldwyn Rodrigues wrote:
> >> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> >>
> >> We are facing the same problem with EDQUOT which was experienced with
> >> ENOSPC. Not sure if we require a full ticketing system such as ENOSPC, but
> >> here is a quick fix, which may be too big a hammer.
> >>
> >> Quotas are reserved during the start of an operation, incrementing
> >> qg->reserved. However, it is written to disk in a commit_transaction
> >> which could take as long as commit_interval. In the meantime there
> >> could be deletions which are not accounted for because deletions are
> >> accounted for only while committed (free_refroot). So, when we get
> >> a EDQUOT flush the data to disk and try again.
> >>
> >> This fixes fstests btrfs/139.
> >>
> >> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> >> ---
> >> Changes since v1:
> >>  - Changed start_delalloc_roots() to start_delalloc_inode() to target
> >>    the root in question only to reduce the amount of flush to be done.
> >>  - Added wait_ordered_extents().
> >>
> >> Changes since v2:
> >>   - Revised patch header
> >>   - removed comment on combining conditions
> >>   - removed test case, to be done in fstests
> >>
> >> Changes sinve v3:
> >>   - testcase reinstated
> >>   - return value checks
> >>
> >> Changes since v4:
> >>  - removed testcase since btrfs/139 got incorporated in fstests
> > 
> > The point was to keep the test inside the changelog as well.
> 
> 
> Yes, that was before we had btrfs/139 in fstest. I have put it in the
> patch header. However, if you really want the test script back, I can
> put it there. Let me know.

I think the test steps are worth keeping in the changelog, even if
there's a fstest.  I'll copy it from v4, patch added to 4.12 queue.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo May 24, 2017, 3:44 a.m. UTC | #4
At 03/28/2017 02:13 AM, Goldwyn Rodrigues wrote:
> 
> 
> On 03/27/2017 12:36 PM, David Sterba wrote:
>> On Mon, Mar 27, 2017 at 12:29:57PM -0500, Goldwyn Rodrigues wrote:
>>> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>>>
>>> We are facing the same problem with EDQUOT which was experienced with
>>> ENOSPC. Not sure if we require a full ticketing system such as ENOSPC, but
>>> here is a quick fix, which may be too big a hammer.
>>>
>>> Quotas are reserved during the start of an operation, incrementing
>>> qg->reserved. However, it is written to disk in a commit_transaction
>>> which could take as long as commit_interval. In the meantime there
>>> could be deletions which are not accounted for because deletions are
>>> accounted for only while committed (free_refroot). So, when we get
>>> a EDQUOT flush the data to disk and try again.
>>>
>>> This fixes fstests btrfs/139.

This patch is causing hang for inode_cache mount option.

Which can be easily triggered by btrfs/042 with inode_cache.

The callback trace will be:
Call Trace:
  __schedule+0x374/0xaf0
  schedule+0x3d/0x90
  wait_for_commit+0x4a/0x80 [btrfs]
  ? wake_atomic_t_function+0x60/0x60
  btrfs_commit_transaction+0xe0/0xa10 [btrfs]
  ? start_transaction+0xad/0x510 [btrfs]
  qgroup_reserve+0x1f0/0x350 [btrfs]
  btrfs_qgroup_reserve_data+0xf8/0x2f0 [btrfs]
  ? _raw_spin_unlock+0x27/0x40
  btrfs_check_data_free_space+0x6d/0xb0 [btrfs]
  btrfs_delalloc_reserve_space+0x25/0x70 [btrfs]
  btrfs_save_ino_cache+0x402/0x650 [btrfs]
  commit_fs_roots+0xb7/0x170 [btrfs]
  btrfs_commit_transaction+0x425/0xa10 [btrfs]
  qgroup_reserve+0x1f0/0x350 [btrfs]
  btrfs_qgroup_reserve_data+0xf8/0x2f0 [btrfs]
  ? _raw_spin_unlock+0x27/0x40
  btrfs_check_data_free_space+0x6d/0xb0 [btrfs]
  btrfs_delalloc_reserve_space+0x25/0x70 [btrfs]
  btrfs_direct_IO+0x1c5/0x3b0 [btrfs]
  generic_file_direct_write+0xab/0x150
  btrfs_file_write_iter+0x243/0x530 [btrfs]
  __vfs_write+0xc9/0x120
  vfs_write+0xcb/0x1f0
  SyS_pwrite64+0x79/0x90
  entry_SYSCALL_64_fastpath+0x18/0xad

We're calling btrfs_commit_transaction() inside btrfs_commit_transaction().

Which will definitely hang the system.

Any idea to fix it?

Thanks,
Qu

>>>
>>> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
>>> ---
>>> Changes since v1:
>>>   - Changed start_delalloc_roots() to start_delalloc_inode() to target
>>>     the root in question only to reduce the amount of flush to be done.
>>>   - Added wait_ordered_extents().
>>>
>>> Changes since v2:
>>>    - Revised patch header
>>>    - removed comment on combining conditions
>>>    - removed test case, to be done in fstests
>>>
>>> Changes sinve v3:
>>>    - testcase reinstated
>>>    - return value checks
>>>
>>> Changes since v4:
>>>   - removed testcase since btrfs/139 got incorporated in fstests
>>
>> The point was to keep the test inside the changelog as well.
> 
> 
> Yes, that was before we had btrfs/139 in fstest. I have put it in the
> patch header. However, if you really want the test script back, I can
> put it there. Let me know.
> 
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index a5da750..341c594 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2367,6 +2367,7 @@  static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes, bool enforce)
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	u64 ref_root = root->root_key.objectid;
 	int ret = 0;
+	int retried = 0;
 	struct ulist_node *unode;
 	struct ulist_iterator uiter;
 
@@ -2375,7 +2376,7 @@  static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes, bool enforce)
 
 	if (num_bytes == 0)
 		return 0;
-
+retry:
 	spin_lock(&fs_info->qgroup_lock);
 	quota_root = fs_info->quota_root;
 	if (!quota_root)
@@ -2402,6 +2403,27 @@  static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes, bool enforce)
 		qg = unode_aux_to_qgroup(unode);
 
 		if (enforce && !qgroup_check_limits(qg, num_bytes)) {
+			/*
+			 * Commit the tree and retry, since we may have
+			 * deletions which would free up space.
+			 */
+			if (!retried && qg->reserved > 0) {
+				struct btrfs_trans_handle *trans;
+				spin_unlock(&fs_info->qgroup_lock);
+				ret = btrfs_start_delalloc_inodes(root, 0);
+				if (ret)
+					return ret;
+				btrfs_wait_ordered_extents(root, -1, 0,
+						(u64)-1);
+				trans = btrfs_join_transaction(root);
+				if (IS_ERR(trans))
+					return PTR_ERR(trans);
+				ret = btrfs_commit_transaction(trans);
+				if (ret)
+					return ret;
+				retried++;
+				goto retry;
+			}
 			ret = -EDQUOT;
 			goto out;
 		}