[1/7] btrfs: make btrfs_issue_discard return bytes discarded

Message ID 1434375680-4115-1-git-send-email-jeffm@suse.com
State Accepted

Commit Message

Jeff Mahoney June 15, 2015, 1:41 p.m. UTC
From: Jeff Mahoney <jeffm@suse.com>

Initially this will just be the length argument passed to it,
but the following patches will adjust that to reflect re-alignment
and skipped blocks.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
---
 fs/btrfs/extent-tree.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

Comments

Jeff Mahoney June 15, 2015, 1:45 p.m. UTC | #1
Apologies for the out-of-order post. git send-email caught ^C
at just the right time and sent them without my confirmation. That's
also why the map->lookup casting patch got mixed in here.

The automatic block group removal patch introduced some regressions
in how discards are handled.

1/ FITRIM only iterates over block groups on disk - removed block groups
   won't be trimmed.
2/ Clearing the dirty bit from extents in removed block groups means that
   those extents won't be discarded when the block group is removed.
3/ More of a UI wart: We don't wait on block groups to be removed during
   read-only remount or fs umount. This results in block groups that
   /should/ have been discarded on thin provisioned storage hanging around
   until the file system is mounted read-write again.

The following patches address these problems by:
1/ Iterating over block groups on disk and then iterating over free space.
   This is consistent with how other file systems handle FITRIM.
2/ Putting removed block groups on a list so that they are automatically
   discarded during btrfs_finish_extent_commit after transaction commit.
   Note: This may still leave undiscarded space on disk if the system
   crashes after transaction commit but before discard. The file system
   itself will be completely consistent, but the user will need to trim
   manually.
3/ Simple: We call btrfs_delete_unused_bgs explicitly during ro-remount
   and umount.
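
To make item 2 a little more concrete, here is a rough userspace model of
the idea: removed block groups are only queued while the transaction is
open, and the discards are issued once the commit has finished. This is a
sketch, not the kernel code. struct pending_bg, struct txn,
queue_removed_bg(), fake_discard() and finish_commit() are all hypothetical
names made up for illustration, and the discard itself is a stub that
always succeeds.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a removed block group awaiting discard. */
struct pending_bg {
	uint64_t start;
	uint64_t len;
	struct pending_bg *next;
};

/* Hypothetical per-transaction state: the removed block groups. */
struct txn {
	struct pending_bg *deleted_bgs;
};

/* Queue a removed block group; nothing is discarded at this point. */
static int queue_removed_bg(struct txn *t, uint64_t start, uint64_t len)
{
	struct pending_bg *bg = malloc(sizeof(*bg));

	if (!bg)
		return -1;
	bg->start = start;
	bg->len = len;
	bg->next = t->deleted_bgs;
	t->deleted_bgs = bg;
	return 0;
}

/* Stub for the real discard: always succeeds and reports len bytes. */
static int fake_discard(uint64_t start, uint64_t len, uint64_t *discarded)
{
	assert(discarded != NULL);
	(void)start;
	*discarded = len;
	return 0;
}

/* Run after commit: drain the list and return the total bytes discarded. */
static uint64_t finish_commit(struct txn *t)
{
	uint64_t total = 0;

	while (t->deleted_bgs) {
		struct pending_bg *bg = t->deleted_bgs;
		uint64_t bytes = 0;

		t->deleted_bgs = bg->next;
		if (!fake_discard(bg->start, bg->len, &bytes))
			total += bytes;
		free(bg);
	}
	return total;
}
```

In this model a crash between the commit and finish_commit() loses the
queued entries, which corresponds to the caveat in the note under item 2:
the space stays undiscarded until the user trims manually.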

Changelog:
v1->v2
- -odiscard
  - Fix ordering to ensure that we don't discard extents freed in an
    uncommitted transaction.
- FITRIM
  - Don't start a transaction so the entire run is transactionless
  - The loop can be interrupted while waiting on the chunk mutex and
    after the discard has completed.
  - The only lock held for the duration is the device_list_mutex.  The
    chunk mutex is taken per loop iteration so normal operations should
    continue while we're running, even on large file systems.

v2->v3
- -odiscard
 - Factor out get/put block_group->trimming to ensure that cleanup always
   happens at the last reference drop.
 - Cleanup the free space cache on the last reference drop.
 - Use list_move instead of list_add in case of multiple adds.  We still
   issue a warning but we shouldn't fall over.
 - Explicitly delete unused block groups in close_ctree and ro-remount.
- FITRIM
 - Cleaned up pointer tricks that abused &NULL->member.
 - Take the commit_root_sem across loop iteration to protect against
   transaction commit moving the commit root.

v3->v4
- new: skip superblocks during discard
- new: btrfs_issue_discard passes back discarded_bytes on success

v4->v5
- new: check/align/warn on unaligned starting offset in btrfs_issue_discard
- new: handle case where superblock begins before the range on <4k filesystems
- no changes to other patches

Thanks,

-Jeff
--
Jeff Mahoney
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Filipe Manana June 17, 2015, 9:53 a.m. UTC | #2
On Mon, Jun 15, 2015 at 2:41 PM,  <jeffm@suse.com> wrote:
> From: Jeff Mahoney <jeffm@suse.com>
>
> Initially this will just be the length argument passed to it,
> but the following patches will adjust that to reflect re-alignment
> and skipped blocks.
>
> Signed-off-by: Jeff Mahoney <jeffm@suse.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Tested-by: Filipe Manana <fdmanana@suse.com>

> ---
>  fs/btrfs/extent-tree.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 0ec3acd..da1145d 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -1884,10 +1884,17 @@ static int remove_extent_backref(struct btrfs_trans_handle *trans,
>         return ret;
>  }
>
> -static int btrfs_issue_discard(struct block_device *bdev,
> -                               u64 start, u64 len)
> +static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
> +                              u64 *discarded_bytes)
>  {
> -       return blkdev_issue_discard(bdev, start >> 9, len >> 9, GFP_NOFS, 0);
> +       int ret = 0;
> +
> +       *discarded_bytes = 0;
> +       ret = blkdev_issue_discard(bdev, start >> 9, len >> 9, GFP_NOFS, 0);
> +       if (!ret)
> +               *discarded_bytes = len;
> +
> +       return ret;
>  }
>
>  int btrfs_discard_extent(struct btrfs_root *root, u64 bytenr,
> @@ -1908,14 +1915,16 @@ int btrfs_discard_extent(struct btrfs_root *root, u64 bytenr,
>
>
>                 for (i = 0; i < bbio->num_stripes; i++, stripe++) {
> +                       u64 bytes;
>                         if (!stripe->dev->can_discard)
>                                 continue;
>
>                         ret = btrfs_issue_discard(stripe->dev->bdev,
>                                                   stripe->physical,
> -                                                 stripe->length);
> +                                                 stripe->length,
> +                                                 &bytes);
>                         if (!ret)
> -                               discarded_bytes += stripe->length;
> +                               discarded_bytes += bytes;
>                         else if (ret != -EOPNOTSUPP)
>                                 break; /* Logic errors or -ENOMEM, or -EIO but I don't know how that could happen JDM */
>
> --
> 2.4.3
>

Patch

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 0ec3acd..da1145d 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1884,10 +1884,17 @@  static int remove_extent_backref(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
-static int btrfs_issue_discard(struct block_device *bdev,
-				u64 start, u64 len)
+static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
+			       u64 *discarded_bytes)
 {
-	return blkdev_issue_discard(bdev, start >> 9, len >> 9, GFP_NOFS, 0);
+	int ret = 0;
+
+	*discarded_bytes = 0;
+	ret = blkdev_issue_discard(bdev, start >> 9, len >> 9, GFP_NOFS, 0);
+	if (!ret)
+		*discarded_bytes = len;
+
+	return ret;
 }
 
 int btrfs_discard_extent(struct btrfs_root *root, u64 bytenr,
@@ -1908,14 +1915,16 @@  int btrfs_discard_extent(struct btrfs_root *root, u64 bytenr,
 
 
 		for (i = 0; i < bbio->num_stripes; i++, stripe++) {
+			u64 bytes;
 			if (!stripe->dev->can_discard)
 				continue;
 
 			ret = btrfs_issue_discard(stripe->dev->bdev,
 						  stripe->physical,
-						  stripe->length);
+						  stripe->length,
+						  &bytes);
 			if (!ret)
-				discarded_bytes += stripe->length;
+				discarded_bytes += bytes;
 			else if (ret != -EOPNOTSUPP)
 				break; /* Logic errors or -ENOMEM, or -EIO but I don't know how that could happen JDM */