diff mbox series

[v4,03/15] btrfs: Handle pending/pinned chunks before blockgroup relocation during device shrink

Message ID 20190327122418.24027-4-nborisov@suse.com (mailing list archive)
State New, archived
Headers show
Series FITRIM improvement | expand

Commit Message

Nikolay Borisov March 27, 2019, 12:24 p.m. UTC
During device shrink pinned/pending chunks (i.e those which have been
deleted/created respectively, in the current transaction and haven't
touched disk) need to be accounted when doing device shrink. Presently
this happens after the main relocation loop in btrfs_shrink_device,
which could lead to making another go in the body of the function.

Since there is no hard requirement to perform pinned/pending chunks
handling after the relocation loop, move the code before it. This leads
to simplifying the code flow around - i.e no need to use 'goto again'.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/volumes.c | 57 +++++++++++++++++++---------------------------
 1 file changed, 24 insertions(+), 33 deletions(-)

Comments

David Sterba April 1, 2019, 6:26 p.m. UTC | #1
On Wed, Mar 27, 2019 at 02:24:06PM +0200, Nikolay Borisov wrote:
> During device shrink pinned/pending chunks (i.e those which have been
> deleted/created respectively, in the current transaction and haven't
> touched disk) need to be accounted when doing device shrink. Presently
> this happens after the main relocation loop in btrfs_shrink_device,
> which could lead to making another go in the body of the function.
> 
> Since there is no hard requirement to perform pinned/pending chunks
> handling after the relocation loop, move the code before it. This leads
> to simplifying the code flow around - i.e no need to use 'goto again'.

On the other hand this starts 2 transactions unconditionally, previously
it was 1 for the final change and 1 if there were pending chunks. This
should be mentioned or explained why this is needed, otherwise the code
looks equivalent to the original version. In this case some guidance in
the changelog could shorten the time to understand the change, I've been
starting at it for half an hour.
Nikolay Borisov April 2, 2019, 5:55 a.m. UTC | #2
On 1.04.19 г. 21:26 ч., David Sterba wrote:
> On Wed, Mar 27, 2019 at 02:24:06PM +0200, Nikolay Borisov wrote:
>> During device shrink pinned/pending chunks (i.e those which have been
>> deleted/created respectively, in the current transaction and haven't
>> touched disk) need to be accounted when doing device shrink. Presently
>> this happens after the main relocation loop in btrfs_shrink_device,
>> which could lead to making another go in the body of the function.
>>
>> Since there is no hard requirement to perform pinned/pending chunks
>> handling after the relocation loop, move the code before it. This leads
>> to simplifying the code flow around - i.e no need to use 'goto again'.
> 
> On the other hand this starts 2 transactions unconditionally, previously
> it was 1 for the final change and 1 if there were pending chunks. This
> should be mentioned or explained why this is needed, otherwise the code
> looks equivalent to the original version. In this case some guidance in
> the changelog could shorten the time to understand the change, I've been
> starting at it for half an hour.
> 

Valid point, how about adding the following sentence at the end of the
changelog :

A notable side effect of this change is that modification of the
device's size requires a transaction to be started and committed before
the relocation loop starts. This is necessary to ensure that relocation
process sees the shrunk device size.
David Sterba April 2, 2019, 3:52 p.m. UTC | #3
On Tue, Apr 02, 2019 at 08:55:32AM +0300, Nikolay Borisov wrote:
> 
> 
> On 1.04.19 г. 21:26 ч., David Sterba wrote:
> > On Wed, Mar 27, 2019 at 02:24:06PM +0200, Nikolay Borisov wrote:
> >> During device shrink pinned/pending chunks (i.e those which have been
> >> deleted/created respectively, in the current transaction and haven't
> >> touched disk) need to be accounted when doing device shrink. Presently
> >> this happens after the main relocation loop in btrfs_shrink_device,
> >> which could lead to making another go in the body of the function.
> >>
> >> Since there is no hard requirement to perform pinned/pending chunks
> >> handling after the relocation loop, move the code before it. This leads
> >> to simplifying the code flow around - i.e no need to use 'goto again'.
> > 
> > On the other hand this starts 2 transactions unconditionally, previously
> > it was 1 for the final change and 1 if there were pending chunks. This
> > should be mentioned or explained why this is needed, otherwise the code
> > looks equivalent to the original version. In this case some guidance in
> > the changelog could shorten the time to understand the change, I've been
> > starting at it for half an hour.
> > 
> 
> Valid point, how about adding the following sentence at the end of the
> changelog :
> 
> A notable side effect of this change is that modification of the
> device's size requires a transaction to be started and committed before
> the relocation loop starts. This is necessary to ensure that relocation
> process sees the shrunk device size.

Thanks, patch updated.
diff mbox series

Patch

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 66f4032dba13..256f7c5476bc 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4722,15 +4722,16 @@  int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 	int slot;
 	int failed = 0;
 	bool retried = false;
-	bool checked_pending_chunks = false;
 	struct extent_buffer *l;
 	struct btrfs_key key;
 	struct btrfs_super_block *super_copy = fs_info->super_copy;
 	u64 old_total = btrfs_super_total_bytes(super_copy);
 	u64 old_size = btrfs_device_get_total_bytes(device);
 	u64 diff;
+	u64 start;
 
-	new_size = round_down(new_size, fs_info->sectorsize);
+	start = round_down(new_size, fs_info->sectorsize);
+	new_size = start;
 	diff = round_down(old_size - new_size, fs_info->sectorsize);
 
 	if (test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state))
@@ -4742,6 +4743,12 @@  int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 
 	path->reada = READA_BACK;
 
+	trans = btrfs_start_transaction(root, 0);
+	if (IS_ERR(trans)) {
+		btrfs_free_path(path);
+		return PTR_ERR(trans);
+	}
+
 	mutex_lock(&fs_info->chunk_mutex);
 
 	btrfs_device_set_total_bytes(device, new_size);
@@ -4749,7 +4756,21 @@  int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 		device->fs_devices->total_rw_bytes -= diff;
 		atomic64_sub(diff, &fs_info->free_chunk_space);
 	}
-	mutex_unlock(&fs_info->chunk_mutex);
+
+	/*
+	 * Once the device's size has been set to the new size, ensure all
+	 * in-memory chunks are synced to disk so that the loop below sees them
+	 * and relocates them accordingly.
+	 */
+	if (contains_pending_extent(trans->transaction, device, &start, diff)) {
+		mutex_unlock(&fs_info->chunk_mutex);
+		ret = btrfs_commit_transaction(trans);
+		if (ret)
+			goto done;
+	} else {
+		mutex_unlock(&fs_info->chunk_mutex);
+		btrfs_end_transaction(trans);
+	}
 
 again:
 	key.objectid = device->devid;
@@ -4840,36 +4861,6 @@  int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 	}
 
 	mutex_lock(&fs_info->chunk_mutex);
-
-	/*
-	 * We checked in the above loop all device extents that were already in
-	 * the device tree. However before we have updated the device's
-	 * total_bytes to the new size, we might have had chunk allocations that
-	 * have not complete yet (new block groups attached to transaction
-	 * handles), and therefore their device extents were not yet in the
-	 * device tree and we missed them in the loop above. So if we have any
-	 * pending chunk using a device extent that overlaps the device range
-	 * that we can not use anymore, commit the current transaction and
-	 * repeat the search on the device tree - this way we guarantee we will
-	 * not have chunks using device extents that end beyond 'new_size'.
-	 */
-	if (!checked_pending_chunks) {
-		u64 start = new_size;
-		u64 len = old_size - new_size;
-
-		if (contains_pending_extent(trans->transaction, device,
-					    &start, len)) {
-			mutex_unlock(&fs_info->chunk_mutex);
-			checked_pending_chunks = true;
-			failed = 0;
-			retried = false;
-			ret = btrfs_commit_transaction(trans);
-			if (ret)
-				goto done;
-			goto again;
-		}
-	}
-
 	btrfs_device_set_disk_total_bytes(device, new_size);
 	if (list_empty(&device->post_commit_list))
 		list_add_tail(&device->post_commit_list,