
[v2] loop: Add PF_LESS_THROTTLE to block/loop device thread.

Message ID 87wpazh3rl.fsf@notabene.neil.brown.name (mailing list archive)
State New, archived

Commit Message

NeilBrown April 5, 2017, 4:33 a.m. UTC
When a filesystem is mounted from a loop device, writes are
throttled by balance_dirty_pages() twice: once when writing
to the filesystem and once when the loop_handle_cmd() writes
to the backing file.  This double-throttling can trigger
positive feedback loops that create significant delays.  The
throttling at the lower level is seen by the upper level as
a slow device, so it throttles extra hard.

The PF_LESS_THROTTLE flag was created to handle exactly this
circumstance, though with an NFS filesystem mounted from a
local NFS server.  It reduces the throttling on the lower
layer so that it can proceed largely unthrottled.
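
For context, a simplified sketch of how the flag is consumed
(paraphrased from mm/page-writeback.c of this era; the exact
arithmetic in the kernel may differ):

/*
 * Sketch: tasks flagged PF_LESS_THROTTLE (like realtime tasks) are
 * given roughly 25% more dirty headroom before balance_dirty_pages()
 * makes them wait, so the lower layer keeps making progress while
 * ordinary writers are being throttled.
 */
static unsigned long task_dirty_limit_sketch(struct task_struct *tsk,
					     unsigned long dirty)
{
	if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk))
		dirty += dirty / 4;	/* +25% headroom */
	return dirty;
}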

To demonstrate this, create a filesystem on a loop device
and write (e.g. with dd) several large files which combine
to consume significantly more than the limit set by
/proc/sys/vm/dirty_ratio or dirty_bytes.  Measure the total
time taken.
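
A minimal userspace sketch of that write load (a hypothetical helper,
not part of the patch; dd in a shell loop works just as well):

/*
 * Write NFILES large files so their combined size comfortably exceeds
 * the dirty_ratio/dirty_bytes limit; run under "time" against the
 * mounted filesystem, e.g.  time ./writeload /mnt/loopfs
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NFILES	200
#define FSIZE	(64UL << 20)		/* 64 MiB per file */

int main(int argc, char **argv)
{
	static char buf[1 << 20];	/* 1 MiB write unit */
	char path[4096];
	unsigned long i, off;
	int fd;

	memset(buf, 'x', sizeof(buf));
	for (i = 0; i < NFILES; i++) {
		snprintf(path, sizeof(path), "%s/f%lu",
			 argc > 1 ? argv[1] : ".", i);
		fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
		if (fd < 0) {
			perror(path);
			exit(1);
		}
		for (off = 0; off < FSIZE; off += sizeof(buf))
			if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
				perror("write");
				exit(1);
			}
		close(fd);
	}
	return 0;
}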

When I do this directly on a device (no loop device) the
total time for several runs (mkfs, mount, write 200 files,
umount) is fairly stable: 28-35 seconds.
When I do this over a loop device the times are much worse
and less stable: 52-460 seconds, with half below 100 seconds
and half above.
When I apply this patch, the times become stable again,
though not as fast as the no-loop-back case: 53-72 seconds.

There may be room for further improvement as the total overhead still
seems too high, but this is a big improvement.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---

I moved where the flag is set, thanks to a suggestion from
Ming Lei.
I've preserved the *-by: tags I was offered despite the code
being different, as the concept is identical.

Thanks,
NeilBrown


 drivers/block/loop.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Ming Lei April 5, 2017, 5:05 a.m. UTC | #1
On Wed, Apr 5, 2017 at 12:33 PM, NeilBrown <neilb@suse.com> wrote:
>
> When a filesystem is mounted from a loop device, writes are
> throttled by balance_dirty_pages() twice: once when writing
> to the filesystem and once when the loop_handle_cmd() writes
> to the backing file.  This double-throttling can trigger
> positive feedback loops that create significant delays.  The
> throttling at the lower level is seen by the upper level as
> a slow device, so it throttles extra hard.
>
> The PF_LESS_THROTTLE flag was created to handle exactly this
> circumstance, though with an NFS filesystem mounted from a
> local NFS server.  It reduces the throttling on the lower
> layer so that it can proceed largely unthrottled.
>
> To demonstrate this, create a filesystem on a loop device
> and write (e.g. with dd) several large files which combine
> to consume significantly more than the limit set by
> /proc/sys/vm/dirty_ratio or dirty_bytes.  Measure the total
> time taken.
>
> When I do this directly on a device (no loop device) the
> total time for several runs (mkfs, mount, write 200 files,
> umount) is fairly stable: 28-35 seconds.
> When I do this over a loop device the times are much worse
> and less stable: 52-460 seconds, with half below 100 seconds
> and half above.
> When I apply this patch, the times become stable again,
> though not as fast as the no-loop-back case: 53-72 seconds.
>
> There may be room for further improvement as the total overhead still
> seems too high, but this is a big improvement.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Signed-off-by: NeilBrown <neilb@suse.com>

Reviewed-by: Ming Lei <tom.leiming@gmail.com>

> ---
>
> I moved where the flag is set, thanks to a suggestion from
> Ming Lei.
> I've preserved the *-by: tags I was offered despite the code
> being different, as the concept is identical.
>
> Thanks,
> NeilBrown
>
>
>  drivers/block/loop.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 0ecb6461ed81..44b3506fd086 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -852,6 +852,7 @@ static int loop_prepare_queue(struct loop_device *lo)
>         if (IS_ERR(lo->worker_task))
>                 return -ENOMEM;
>         set_user_nice(lo->worker_task, MIN_NICE);
> +       lo->worker_task->flags |= PF_LESS_THROTTLE;
>         return 0;
>  }
>
> --
> 2.12.2
>
Michal Hocko April 5, 2017, 7:19 a.m. UTC | #2
On Wed 05-04-17 14:33:50, NeilBrown wrote:
> 
> When a filesystem is mounted from a loop device, writes are
> throttled by balance_dirty_pages() twice: once when writing
> to the filesystem and once when the loop_handle_cmd() writes
> to the backing file.  This double-throttling can trigger
> positive feedback loops that create significant delays.  The
> throttling at the lower level is seen by the upper level as
> a slow device, so it throttles extra hard.
> 
> The PF_LESS_THROTTLE flag was created to handle exactly this
> circumstance, though with an NFS filesystem mounted from a
> local NFS server.  It reduces the throttling on the lower
> layer so that it can proceed largely unthrottled.
> 
> To demonstrate this, create a filesystem on a loop device
> and write (e.g. with dd) several large files which combine
> to consume significantly more than the limit set by
> /proc/sys/vm/dirty_ratio or dirty_bytes.  Measure the total
> time taken.
> 
> When I do this directly on a device (no loop device) the
> total time for several runs (mkfs, mount, write 200 files,
> umount) is fairly stable: 28-35 seconds.
> When I do this over a loop device the times are much worse
> and less stable: 52-460 seconds, with half below 100 seconds
> and half above.
> When I apply this patch, the times become stable again,
> though not as fast as the no-loop-back case: 53-72 seconds.
> 
> There may be room for further improvement as the total overhead still
> seems too high, but this is a big improvement.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
> 
> I moved where the flag is set, thanks to a suggestion from
> Ming Lei.
> I've preserved the *-by: tags I was offered despite the code
> being different, as the concept is identical.
> 
> Thanks,
> NeilBrown
> 
> 
>  drivers/block/loop.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 0ecb6461ed81..44b3506fd086 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -852,6 +852,7 @@ static int loop_prepare_queue(struct loop_device *lo)
>  	if (IS_ERR(lo->worker_task))
>  		return -ENOMEM;
>  	set_user_nice(lo->worker_task, MIN_NICE);
> +	lo->worker_task->flags |= PF_LESS_THROTTLE;
>  	return 0;

As mentioned elsewhere, PF flags should be updated only on the current
task; otherwise there is a potential read-modify-write race. Is this
safe? The code runs concurrently with the worker thread.
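
One race-free way to address this (a sketch of a possible follow-up,
not necessarily the version that was eventually merged) is to let the
worker set the flag on itself, so that only current ever touches
task->flags:

/*
 * Sketch: wrap kthread_worker_fn() so PF_LESS_THROTTLE is set from
 * inside the worker thread.  task->flags is a non-atomic
 * read-modify-write field, so only the task itself should update it.
 */
static int loop_kthread_worker_fn(void *worker_ptr)
{
	current->flags |= PF_LESS_THROTTLE;	/* runs in worker context */
	return kthread_worker_fn(worker_ptr);
}

static int loop_prepare_queue(struct loop_device *lo)
{
	kthread_init_worker(&lo->worker);
	lo->worker_task = kthread_run(loop_kthread_worker_fn,
			&lo->worker, "loop%d", lo->lo_number);
	if (IS_ERR(lo->worker_task))
		return -ENOMEM;
	set_user_nice(lo->worker_task, MIN_NICE);
	return 0;
}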

Patch

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0ecb6461ed81..44b3506fd086 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -852,6 +852,7 @@ static int loop_prepare_queue(struct loop_device *lo)
 	if (IS_ERR(lo->worker_task))
 		return -ENOMEM;
 	set_user_nice(lo->worker_task, MIN_NICE);
+	lo->worker_task->flags |= PF_LESS_THROTTLE;
 	return 0;
 }