
[v2] btrfs: cleanup btrfs_async_submit_limit to return the final limit value

Message ID: 20171102060353.7103-1-anand.jain@oracle.com

Commit Message

Anand Jain Nov. 2, 2017, 6:03 a.m. UTC
We throttle the write process during async submission: it waits until
the number of in-flight async submits falls below 2/3 of the limit
obtained from btrfs_async_submit_limit(), and is then woken up so it
can make progress.
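
Roughly, the pattern looks like this (a sketch; the names approximate
the btrfs code of this era rather than quoting it verbatim):

	/* Submit side: block once too many async submits are in flight. */
	atomic_inc(&fs_info->nr_async_submits);
	if (atomic_read(&fs_info->nr_async_submits) > limit)
		wait_event(fs_info->async_submit_wait,
			   atomic_read(&fs_info->nr_async_submits) <= limit);

	/* Completion side (run_one_async_done): wake the waiter once the
	 * count drops below 2/3 of the limit. */
	if (atomic_dec_return(&fs_info->nr_async_submits) < limit * 2 / 3 &&
	    waitqueue_active(&fs_info->async_submit_wait))
		wake_up(&fs_info->async_submit_wait);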

In general the device/transport queue depth is 256, and
btrfs_async_submit_limit() returns 256 per device, which was originally
introduced by [1]. But 256 at the device level covers all types of IO
(read/write, sync/async), so it was possible for async writes to occupy
all of the 256 slots; the later patch [2] therefore took only 2/3 of
256, which seemed to work well.

 [1]
 cb03c743c648
 Btrfs: Change the congestion functions to meter the number of async submits as well

 [2]
 4854ddd0ed0a
 Btrfs: Wait for kernel threads to make progress during async submission

This is a cleanup patch with no functional changes. Since we only ever
use 2/3 of the limit (256), btrfs_async_submit_limit() can return the
final value, 170, directly.
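
As an aside (not part of the patch), the integer arithmetic behind the
new constant:

	unsigned long limit = 1;	/* min(thread_pool_size, open_devices) */

	/* before: return 256 * limit, callers then took 2/3 of it */
	unsigned long before = 256 * limit * 2 / 3;	/* 170 for limit == 1 */

	/* after: return the final value directly */
	unsigned long after = 170 * limit;		/* also 170 */

	/* The two agree exactly for limit == 1; for limit > 1 integer
	 * rounding can differ by a slot or two (341 vs 340 for
	 * limit == 2), which does not matter in practice. */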

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
IMO:
1. If the pdflush issue is fixed, we should go back to the bdi
congestion method, as the block layer is more appropriate and accurate
in telling when the device is congested. A device queue depth of 256 is
very generic. (A rough sketch of such a check follows this list.)
2. Consider RAID1 across devices of different speeds (an SSD and an
iSCSI LUN): I am not sure whether this approach would throttle the FS
layer IO performance to the speed of the slowest device, and I wonder
how to reliably test it.
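
For point 1, a congestion-based check might look roughly like this (a
sketch only, assuming the bdi_write_congested() helper available in
kernels of this era):

	struct backing_dev_info *bdi = device->bdev->bd_bdi;

	/* Let the block layer report congestion instead of guessing
	 * from a generic queue depth of 256. */
	if (bdi_write_congested(bdi)) {
		/* back off: requeue the work instead of submitting more IO */
	}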

 fs/btrfs/disk-io.c | 6 ++++--
 fs/btrfs/volumes.c | 1 -
 2 files changed, 4 insertions(+), 3 deletions(-)

Comments

David Sterba Nov. 6, 2017, 3:34 p.m. UTC | #1
On Thu, Nov 02, 2017 at 02:03:53PM +0800, Anand Jain wrote:
> [...]
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index dfdab849037b..12702e292007 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -861,7 +861,10 @@ unsigned long btrfs_async_submit_limit(struct btrfs_fs_info *info)
>  	unsigned long limit = min_t(unsigned long,
>  				    info->thread_pool_size,
>  				    info->fs_devices->open_devices);
> -	return 256 * limit;
> +	/*
> +	 * limit:170 is computed as 2/3 * 256.
> +	 */
> +	return 170 * limit;

Please keep it opencoded: the constant will be calculated by the
compiler, but for code clarity it's better written as 2 / 3, which is
self-documenting, so you can drop the comment.
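
For illustration, the opencoded form would be something like this (a
sketch of the suggestion, not committed code):

unsigned long btrfs_async_submit_limit(struct btrfs_fs_info *info)
{
	unsigned long limit = min_t(unsigned long,
				    info->thread_pool_size,
				    info->fs_devices->open_devices);

	/* 2/3 of the per-device queue depth of 256; the compiler folds
	 * this into a single constant (170). */
	return 256 * 2 / 3 * limit;
}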
David Sterba Nov. 6, 2017, 3:38 p.m. UTC | #2
On Thu, Nov 02, 2017 at 02:03:53PM +0800, Anand Jain wrote:
> [...]
> IMO:
> 1. If the pdflush issue is fixed, we should go back to the bdi
> congestion method, as the block layer is more appropriate and accurate
> in telling when the device is congested. A device queue depth of 256
> is very generic.
> 2. Consider RAID1 across devices of different speeds (an SSD and an
> iSCSI LUN): I am not sure whether this approach would throttle the FS
> layer IO performance to the speed of the slowest device, and I wonder
> how to reliably test it.

The referenced commits are from 2008 and there have been many changes
in queue flushing etc. since then, so we might need to revisit the
current behaviour completely. Using the congestion API is desired, but
we also need to keep the IO behaviour (or make it better, of course).
In that case I'd suggest small steps so we can catch possible
regressions.
Anand Jain Nov. 7, 2017, 2:22 a.m. UTC | #3
>> 1. If the pdflush issue is fixed, we should go back to the bdi
>> congestion method, as the block layer is more appropriate and accurate
>> in telling when the device is congested. A device queue depth of 256
>> is very generic.
>> 2. Consider RAID1 across devices of different speeds (an SSD and an
>> iSCSI LUN): I am not sure whether this approach would throttle the FS
>> layer IO performance to the speed of the slowest device, and I wonder
>> how to reliably test it.
> 
> The referenced commits are from 2008 and there have been many changes
> in queue flushing etc. since then, so we might need to revisit the
> current behaviour completely. Using the congestion API is desired, but
> we also need to keep the IO behaviour (or make it better, of course).
> In that case I'd suggest small steps so we can catch possible
> regressions.

  OK, will try. It's still confusing to me.

Thanks, Anand


Patch

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index dfdab849037b..12702e292007 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -861,7 +861,10 @@ unsigned long btrfs_async_submit_limit(struct btrfs_fs_info *info)
 	unsigned long limit = min_t(unsigned long,
 				    info->thread_pool_size,
 				    info->fs_devices->open_devices);
-	return 256 * limit;
+	/*
+	 * limit:170 is computed as 2/3 * 256.
+	 */
+	return 170 * limit;
 }
 
 static void run_one_async_start(struct btrfs_work *work)
@@ -887,7 +890,6 @@ static void run_one_async_done(struct btrfs_work *work)
 	fs_info = async->fs_info;
 
 	limit = btrfs_async_submit_limit(fs_info);
-	limit = limit * 2 / 3;
 
 	/*
 	 * atomic_dec_return implies a barrier for waitqueue_active
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d1d8aa226bff..8044790c5de6 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -382,7 +382,6 @@ static noinline void run_scheduled_bios(struct btrfs_device *device)
 
 	bdi = device->bdev->bd_bdi;
 	limit = btrfs_async_submit_limit(fs_info);
-	limit = limit * 2 / 3;
 
 loop:
 	spin_lock(&device->io_lock);