diff mbox

btrfs: introduce tickets_id to determine whether asynchronous metadata reclaim work makes progress

Message ID 20160902025846.4133-1-wangxg.fnst@cn.fujitsu.com (mailing list archive)
State Accepted
Headers show

Commit Message

Xiaoguang Wang Sept. 2, 2016, 2:58 a.m. UTC
In btrfs_async_reclaim_metadata_space(), we use ticket's address to
determine whether asynchronous metadata reclaim work is making progress.

	ticket = list_first_entry(&space_info->tickets,
				  struct reserve_ticket, list);
	if (last_ticket == ticket) {
		flush_state++;
	} else {
		last_ticket = ticket;
		flush_state = FLUSH_DELAYED_ITEMS_NR;
		if (commit_cycles)
			commit_cycles--;
	}

But indeed it's wrong, we should not rely on local variable's address to
do this check, because addresses may be same. In my test environment, I
dd one 168MB file in a 256MB fs, found that for this file, every time
wait_reserve_ticket() called, local variable ticket's address is same,

For above codes, assume a previous ticket's address is addrA, last_ticket
is addrA. Btrfs_async_reclaim_metadata_space() finished this ticket and
wake up it, then another ticket is added, but with the same address addrA,
now last_ticket will be same to current ticket, then current ticket's flush
work will start from current flush_state, not initial FLUSH_DELAYED_ITEMS_NR,
which may result in some enospc issues(I have seen this in my test machine).

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
 fs/btrfs/ctree.h       |  1 +
 fs/btrfs/extent-tree.c | 11 ++++++-----
 2 files changed, 7 insertions(+), 5 deletions(-)

Comments

Josef Bacik Sept. 2, 2016, 1:28 p.m. UTC | #1
On 09/01/2016 10:58 PM, Wang Xiaoguang wrote:
> In btrfs_async_reclaim_metadata_space(), we use ticket's address to
> determine whether asynchronous metadata reclaim work is making progress.
>
> 	ticket = list_first_entry(&space_info->tickets,
> 				  struct reserve_ticket, list);
> 	if (last_ticket == ticket) {
> 		flush_state++;
> 	} else {
> 		last_ticket = ticket;
> 		flush_state = FLUSH_DELAYED_ITEMS_NR;
> 		if (commit_cycles)
> 			commit_cycles--;
> 	}
>
> But indeed it's wrong, we should not rely on local variable's address to
> do this check, because addresses may be same. In my test environment, I
> dd one 168MB file in a 256MB fs, found that for this file, every time
> wait_reserve_ticket() called, local variable ticket's address is same,
>
> For above codes, assume a previous ticket's address is addrA, last_ticket
> is addrA. Btrfs_async_reclaim_metadata_space() finished this ticket and
> wake up it, then another ticket is added, but with the same address addrA,
> now last_ticket will be same to current ticket, then current ticket's flush
> work will start from current flush_state, not initial FLUSH_DELAYED_ITEMS_NR,
> which may result in some enospc issues(I have seen this in my test machine).
>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>

We want to track the progress of the individual tickets, not whether or not we 
make progress on the space_info, so instead store the global ticket id in 
space_info and store the individual ticket_id in the ticket itself, and use that 
as the last_tickets_id.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Xiaoguang Wang Sept. 6, 2016, 10:18 a.m. UTC | #2
hello,

On 09/02/2016 09:28 PM, Josef Bacik wrote:
> On 09/01/2016 10:58 PM, Wang Xiaoguang wrote:
>> In btrfs_async_reclaim_metadata_space(), we use ticket's address to
>> determine whether asynchronous metadata reclaim work is making progress.
>>
>>     ticket = list_first_entry(&space_info->tickets,
>>                   struct reserve_ticket, list);
>>     if (last_ticket == ticket) {
>>         flush_state++;
>>     } else {
>>         last_ticket = ticket;
>>         flush_state = FLUSH_DELAYED_ITEMS_NR;
>>         if (commit_cycles)
>>             commit_cycles--;
>>     }
>>
>> But indeed it's wrong, we should not rely on local variable's address to
>> do this check, because addresses may be same. In my test environment, I
>> dd one 168MB file in a 256MB fs, found that for this file, every time
>> wait_reserve_ticket() called, local variable ticket's address is same,
>>
>> For above codes, assume a previous ticket's address is addrA, 
>> last_ticket
>> is addrA. Btrfs_async_reclaim_metadata_space() finished this ticket and
>> wake up it, then another ticket is added, but with the same address 
>> addrA,
>> now last_ticket will be same to current ticket, then current ticket's 
>> flush
>> work will start from current flush_state, not initial 
>> FLUSH_DELAYED_ITEMS_NR,
>> which may result in some enospc issues(I have seen this in my test 
>> machine).
>>
>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>
> We want to track the progress of the individual tickets, not whether 
> or not we make progress on the space_info, so instead store the global 
> ticket id in space_info and store the individual ticket_id in the 
> ticket itself, and use that as the last_tickets_id.  Thanks,
Sorry for being late.

In btrfs_async_reclaim_metadata_space(), there is codes:
if (flush_state > COMMIT_TRANS) {
             commit_cycles++;
             if (commit_cycles > 2) {
                 wake_all_tickets(&space_info->tickets);
                 space_info->flush = 0;
             } else {
                 flush_state = FLUSH_DELAYED_ITEMS_NR;
             }
}
 From above codes, if it can not satisfy one current ticket, it'll 
discard all current
queued tickets. So I think your original codes is to track whether or not we
make progress on the space_info, and this patch can fix the issue which 
is described
in commit message.
Also I'm not sure whether I have understood your design completely. If 
I'm wrong, sorry.

Regards,
Xiaoguang Wang

>
> Josef
>
>



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Josef Bacik Sept. 6, 2016, 12:52 p.m. UTC | #3
On 09/06/2016 06:18 AM, Wang Xiaoguang wrote:
> hello,
>
> On 09/02/2016 09:28 PM, Josef Bacik wrote:
>> On 09/01/2016 10:58 PM, Wang Xiaoguang wrote:
>>> In btrfs_async_reclaim_metadata_space(), we use ticket's address to
>>> determine whether asynchronous metadata reclaim work is making progress.
>>>
>>>     ticket = list_first_entry(&space_info->tickets,
>>>                   struct reserve_ticket, list);
>>>     if (last_ticket == ticket) {
>>>         flush_state++;
>>>     } else {
>>>         last_ticket = ticket;
>>>         flush_state = FLUSH_DELAYED_ITEMS_NR;
>>>         if (commit_cycles)
>>>             commit_cycles--;
>>>     }
>>>
>>> But indeed it's wrong, we should not rely on local variable's address to
>>> do this check, because addresses may be same. In my test environment, I
>>> dd one 168MB file in a 256MB fs, found that for this file, every time
>>> wait_reserve_ticket() called, local variable ticket's address is same,
>>>
>>> For above codes, assume a previous ticket's address is addrA, last_ticket
>>> is addrA. Btrfs_async_reclaim_metadata_space() finished this ticket and
>>> wake up it, then another ticket is added, but with the same address addrA,
>>> now last_ticket will be same to current ticket, then current ticket's flush
>>> work will start from current flush_state, not initial FLUSH_DELAYED_ITEMS_NR,
>>> which may result in some enospc issues(I have seen this in my test machine).
>>>
>>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>>
>> We want to track the progress of the individual tickets, not whether or not we
>> make progress on the space_info, so instead store the global ticket id in
>> space_info and store the individual ticket_id in the ticket itself, and use
>> that as the last_tickets_id.  Thanks,
> Sorry for being late.
>
> In btrfs_async_reclaim_metadata_space(), there is codes:
> if (flush_state > COMMIT_TRANS) {
>             commit_cycles++;
>             if (commit_cycles > 2) {
>                 wake_all_tickets(&space_info->tickets);
>                 space_info->flush = 0;
>             } else {
>                 flush_state = FLUSH_DELAYED_ITEMS_NR;
>             }
> }
> From above codes, if it can not satisfy one current ticket, it'll discard all
> current
> queued tickets. So I think your original codes is to track whether or not we
> make progress on the space_info, and this patch can fix the issue which is
> described
> in commit message.
> Also I'm not sure whether I have understood your design completely. If I'm
> wrong, sorry.

Sorry I misread your patch, I thought you were doing space_info->ticket_id++ 
when you added a ticket to the list, not when you removed it.  You can add

Reviewed-by: Josef Bacik <jbacik@fb.com>

Thanks,

Josef

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index eff3993..33fe035 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -427,6 +427,7 @@  struct btrfs_space_info {
 	struct list_head ro_bgs;
 	struct list_head priority_tickets;
 	struct list_head tickets;
+	u64 tickets_id;
 
 	struct rw_semaphore groups_sem;
 	/* for block groups in our same type */
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8c8a4d1..2f9a18f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4966,12 +4966,12 @@  static void wake_all_tickets(struct list_head *head)
  */
 static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
 {
-	struct reserve_ticket *last_ticket = NULL;
 	struct btrfs_fs_info *fs_info;
 	struct btrfs_space_info *space_info;
 	u64 to_reclaim;
 	int flush_state;
 	int commit_cycles = 0;
+	u64 last_tickets_id;
 
 	fs_info = container_of(work, struct btrfs_fs_info, async_reclaim_work);
 	space_info = __find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA);
@@ -4984,8 +4984,7 @@  static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
 		spin_unlock(&space_info->lock);
 		return;
 	}
-	last_ticket = list_first_entry(&space_info->tickets,
-				       struct reserve_ticket, list);
+	last_tickets_id = space_info->tickets_id;
 	spin_unlock(&space_info->lock);
 
 	flush_state = FLUSH_DELAYED_ITEMS_NR;
@@ -5005,10 +5004,10 @@  static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
 							      space_info);
 		ticket = list_first_entry(&space_info->tickets,
 					  struct reserve_ticket, list);
-		if (last_ticket == ticket) {
+		if (last_tickets_id == space_info->tickets_id) {
 			flush_state++;
 		} else {
-			last_ticket = ticket;
+			last_tickets_id = space_info->tickets_id;
 			flush_state = FLUSH_DELAYED_ITEMS_NR;
 			if (commit_cycles)
 				commit_cycles--;
@@ -5384,6 +5383,7 @@  again:
 			list_del_init(&ticket->list);
 			num_bytes -= ticket->bytes;
 			ticket->bytes = 0;
+			space_info->tickets_id++;
 			wake_up(&ticket->wait);
 		} else {
 			ticket->bytes -= num_bytes;
@@ -5426,6 +5426,7 @@  again:
 			num_bytes -= ticket->bytes;
 			space_info->bytes_may_use += ticket->bytes;
 			ticket->bytes = 0;
+			space_info->tickets_id++;
 			wake_up(&ticket->wait);
 		} else {
 			trace_btrfs_space_reservation(fs_info, "space_info",