diff mbox

[1/1] block: ignore flush requests when storage is clean

Message ID 1466435958-308-1-git-send-email-den@openvz.org (mailing list archive)
State New, archived
Headers show

Commit Message

Denis V. Lunev June 20, 2016, 3:19 p.m. UTC
From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>

Some guests (win2008 server for example) do a lot of unnecessary
flushing when underlying media has not changed. This adds additional
overhead on host when calling fsync/fdatasync.

This change introduces a dirty flag in BlockDriverState which is set
in bdrv_set_dirty and is checked in bdrv_co_flush. This allows us to
avoid unnessesary flushing when storage is clean.

The problem with excessive flushing was found by a performance test
which does parallel directory tree creation (from 2 processes).
Results improved from 0.424 loops/sec to 0.432 loops/sec.
Each loop creates 10^3 directories with 10 files in each.

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
---
 block.c                   | 1 +
 block/dirty-bitmap.c      | 3 +++
 block/io.c                | 6 ++++++
 include/block/block_int.h | 1 +
 4 files changed, 11 insertions(+)

Comments

Eric Blake June 20, 2016, 10:33 p.m. UTC | #1
On 06/20/2016 09:19 AM, Denis V. Lunev wrote:
> From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
> 
> Some guests (win2008 server for example) do a lot of unnecessary
> flushing when underlying media has not changed. This adds additional
> overhead on host when calling fsync/fdatasync.
> 
> This change introduces a dirty flag in BlockDriverState which is set
> in bdrv_set_dirty and is checked in bdrv_co_flush. This allows us to
> avoid unnessesary flushing when storage is clean.

s/unnessesary/unnecessary/

> 
> The problem with excessive flushing was found by a performance test
> which does parallel directory tree creation (from 2 processes).
> Results improved from 0.424 loops/sec to 0.432 loops/sec.
> Each loop creates 10^3 directories with 10 files in each.
> 
> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Fam Zheng <famz@redhat.com>
> ---
>  block.c                   | 1 +
>  block/dirty-bitmap.c      | 3 +++
>  block/io.c                | 6 ++++++
>  include/block/block_int.h | 1 +
>  4 files changed, 11 insertions(+)

Otherwise seems reasonable, but I'll let others with more experience on
flush semantics chime in.
Paolo Bonzini June 21, 2016, 7:32 a.m. UTC | #2
On 20/06/2016 17:19, Denis V. Lunev wrote:
> +    /* Check if storage is actually dirty before flushing to disk */
> +    if (!bs->dirty) {
> +        goto flush_parent;
> +    }
> +    bs->dirty = false;
> +

This should be cleared after the flush is complete.  If you have

    write begin
    write end
    flush #1 begin
    flush #2 begin

Then the second flush must only return after the first has finished.

Paolo
Denis V. Lunev June 21, 2016, 7:41 a.m. UTC | #3
On 06/21/2016 10:32 AM, Paolo Bonzini wrote:
>
> On 20/06/2016 17:19, Denis V. Lunev wrote:
>> +    /* Check if storage is actually dirty before flushing to disk */
>> +    if (!bs->dirty) {
>> +        goto flush_parent;
>> +    }
>> +    bs->dirty = false;
>> +
> This should be cleared after the flush is complete.  If you have
>
>      write begin
>      write end
>      flush #1 begin
>      flush #2 begin
>
> Then the second flush must only return after the first has finished.
>
> Paolo
Really interesting point, I have missed it. Though this case
is exactly one which we want to optimize. 2nd flush is
unnecessary and should not be sent, BUT you perfectly
correct it must return later than the first to the guest.

Have to rework. Nice catch!

Den
Kevin Wolf June 21, 2016, 7:45 a.m. UTC | #4
Am 21.06.2016 um 09:32 hat Paolo Bonzini geschrieben:
> 
> 
> On 20/06/2016 17:19, Denis V. Lunev wrote:
> > +    /* Check if storage is actually dirty before flushing to disk */
> > +    if (!bs->dirty) {
> > +        goto flush_parent;
> > +    }
> > +    bs->dirty = false;
> > +
> 
> This should be cleared after the flush is complete.  If you have
> 
>     write begin
>     write end
>     flush #1 begin
>     flush #2 begin
> 
> Then the second flush must only return after the first has finished.

I think clearing bs->dirty after the flush completion wouldn't
necessarily be right either if there are concurrent writes in flight, as
only completed writes are guaranteed to be flushed by it.

Kevin
Denis V. Lunev June 21, 2016, 8:17 a.m. UTC | #5
On 06/21/2016 10:45 AM, Kevin Wolf wrote:
> Am 21.06.2016 um 09:32 hat Paolo Bonzini geschrieben:
>>
>> On 20/06/2016 17:19, Denis V. Lunev wrote:
>>> +    /* Check if storage is actually dirty before flushing to disk */
>>> +    if (!bs->dirty) {
>>> +        goto flush_parent;
>>> +    }
>>> +    bs->dirty = false;
>>> +
>> This should be cleared after the flush is complete.  If you have
>>
>>      write begin
>>      write end
>>      flush #1 begin
>>      flush #2 begin
>>
>> Then the second flush must only return after the first has finished.
> I think clearing bs->dirty after the flush completion wouldn't
> necessarily be right either if there are concurrent writes in flight, as
> only completed writes are guaranteed to be flushed by it.
>
> Kevin
this is not a problem if flush 2 will return after flush 1.
This will mean that all writes prior to both flushes
will land to the disk.

Keeping this in mind dirty should be cleared before
flush operation start.

Den
Kevin Wolf June 21, 2016, 8:57 a.m. UTC | #6
Am 21.06.2016 um 10:17 hat Denis V. Lunev geschrieben:
> On 06/21/2016 10:45 AM, Kevin Wolf wrote:
> >Am 21.06.2016 um 09:32 hat Paolo Bonzini geschrieben:
> >>
> >>On 20/06/2016 17:19, Denis V. Lunev wrote:
> >>>+    /* Check if storage is actually dirty before flushing to disk */
> >>>+    if (!bs->dirty) {
> >>>+        goto flush_parent;
> >>>+    }
> >>>+    bs->dirty = false;
> >>>+
> >>This should be cleared after the flush is complete.  If you have
> >>
> >>     write begin
> >>     write end
> >>     flush #1 begin
> >>     flush #2 begin
> >>
> >>Then the second flush must only return after the first has finished.
> >I think clearing bs->dirty after the flush completion wouldn't
> >necessarily be right either if there are concurrent writes in flight, as
> >only completed writes are guaranteed to be flushed by it.
> >
> >Kevin
> this is not a problem if flush 2 will return after flush 1.
> This will mean that all writes prior to both flushes
> will land to the disk.

We need to be careful in cases like this:

start + complete write #1
start flush #1
start + complete write #2
start flush #2
complete flush #1
complete flush #2

Here letting flush #2 wait for flush #1 isn't enough because that one
only guarantees to flush write #1, but not write #2. However, flush #2
is required to flush write #2, too.

> Keeping this in mind dirty should be cleared before
> flush operation start.

Yes, if you do it like this, so that flush #2 ends up being an actual
flush instead of just waiting for flush #1, we're good.

Kevin
diff mbox

Patch

diff --git a/block.c b/block.c
index b331eb9..1a4e154 100644
--- a/block.c
+++ b/block.c
@@ -2591,6 +2591,7 @@  int bdrv_truncate(BlockDriverState *bs, int64_t offset)
         bdrv_dirty_bitmap_truncate(bs);
         bdrv_parent_cb_resize(bs);
     }
+    bs->dirty = true; /* file node sync is needed after truncate */
     return ret;
 }
 
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 4902ca5..ad208ad 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -370,6 +370,9 @@  void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
         }
         hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
     }
+
+    /* Set global block driver dirty flag */
+    bs->dirty = true;
 }
 
 /**
diff --git a/block/io.c b/block/io.c
index ebdb9d8..ef372d2 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2234,6 +2234,12 @@  int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
         }
     }
 
+    /* Check if storage is actually dirty before flushing to disk */
+    if (!bs->dirty) {
+        goto flush_parent;
+    }
+    bs->dirty = false;
+
     /* But don't actually force it to the disk with cache=unsafe */
     if (bs->open_flags & BDRV_O_NO_FLUSH) {
         goto flush_parent;
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 688c6be..a62943b 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -417,6 +417,7 @@  struct BlockDriverState {
     int sg;        /* if true, the device is a /dev/sg* */
     int copy_on_read; /* if true, copy read backing sectors into image
                          note this is a reference count */
+    bool dirty;
     bool probed;
 
     BlockDriver *drv; /* NULL means no media */