diff mbox

[2/6] writeback: wb_start_writeback() should use WB_SYNC_ALL for WB_REASON_SYNC

Message ID 1458669320-6819-3-git-send-email-axboe@fb.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jens Axboe March 22, 2016, 5:55 p.m. UTC
If you call sync, the initial call to wakeup_flusher_threads() ends up
calling wb_start_writeback() with reason=WB_REASON_SYNC, but
wb_start_writeback() always uses WB_SYNC_NONE as the writeback mode.
Ensure that we use WB_SYNC_ALL for a sync operation.

Signed-off-by: Jens Axboe <axboe@fb.com>
---
 fs/fs-writeback.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Dave Chinner March 22, 2016, 9:34 p.m. UTC | #1
On Tue, Mar 22, 2016 at 11:55:16AM -0600, Jens Axboe wrote:
> If you call sync, the initial call to wakeup_flusher_threads() ends up
> calling wb_start_writeback() with reason=WB_REASON_SYNC, but
> wb_start_writeback() always uses WB_SYNC_NONE as the writeback mode.
> Ensure that we use WB_SYNC_ALL for a sync operation.

This seems wrong to me. We want background write to happen as
quickly as possible and /not block/ when we first kick sync.

The latter blocking passes of sync use WB_SYNC_ALL to ensure that we
block waiting for all remaining IO to be issued and waited on, but
the background writeback doesn't need to do this.

Cheers,

Dave.
Jens Axboe March 22, 2016, 9:40 p.m. UTC | #2
On 03/22/2016 03:34 PM, Dave Chinner wrote:
> On Tue, Mar 22, 2016 at 11:55:16AM -0600, Jens Axboe wrote:
>> If you call sync, the initial call to wakeup_flusher_threads() ends up
>> calling wb_start_writeback() with reason=WB_REASON_SYNC, but
>> wb_start_writeback() always uses WB_SYNC_NONE as the writeback mode.
>> Ensure that we use WB_SYNC_ALL for a sync operation.
>
> This seems wrong to me. We want background write to happen as
> quickly as possible and /not block/ when we first kick sync.

It's not going to block. wakeup_flusher_threads() async queues writeback 
work through wb_start_writeback().

> The latter blocking passes of sync use WB_SYNC_ALL to ensure that we
> block waiting for all remaining IO to be issued and waited on, but
> the background writeback doesn't need to do this.

That's fine, they can get to wait on the previously issued IO, which was 
properly submitted with WB_SYNC_ALL.

Maybe I'm missing your point?
Jens Axboe March 22, 2016, 9:51 p.m. UTC | #3
On 03/22/2016 03:40 PM, Jens Axboe wrote:
> On 03/22/2016 03:34 PM, Dave Chinner wrote:
>> On Tue, Mar 22, 2016 at 11:55:16AM -0600, Jens Axboe wrote:
>>> If you call sync, the initial call to wakeup_flusher_threads() ends up
>>> calling wb_start_writeback() with reason=WB_REASON_SYNC, but
>>> wb_start_writeback() always uses WB_SYNC_NONE as the writeback mode.
>>> Ensure that we use WB_SYNC_ALL for a sync operation.
>>
>> This seems wrong to me. We want background write to happen as
>> quickly as possible and /not block/ when we first kick sync.
>
> It's not going to block. wakeup_flusher_threads() async queues writeback
> work through wb_start_writeback().

For block here, you mean the async work ending up doing 
wait_on_page_writeback() because we're doing WB_SYNC_ALL instead of 
WB_SYNC_NONE?

And if so:

>> The latter blocking passes of sync use WB_SYNC_ALL to ensure that we
>> block waiting for all remaining IO to be issued and waited on, but
>> the background writeback doesn't need to do this.

why not have it do that?
Dave Chinner March 22, 2016, 10:04 p.m. UTC | #4
On Tue, Mar 22, 2016 at 03:40:28PM -0600, Jens Axboe wrote:
> On 03/22/2016 03:34 PM, Dave Chinner wrote:
> >On Tue, Mar 22, 2016 at 11:55:16AM -0600, Jens Axboe wrote:
> >>If you call sync, the initial call to wakeup_flusher_threads() ends up
> >>calling wb_start_writeback() with reason=WB_REASON_SYNC, but
> >>wb_start_writeback() always uses WB_SYNC_NONE as the writeback mode.
> >>Ensure that we use WB_SYNC_ALL for a sync operation.
> >
> >This seems wrong to me. We want background write to happen as
> >quickly as possible and /not block/ when we first kick sync.
> 
> It's not going to block. wakeup_flusher_threads() async queues
> writeback work through wb_start_writeback().

The flusher threads block, not the initial wakeup. e.g. they will
now block waiting for data writeback to complete before writing the
inode. i.e. this code in __writeback_single_inode() is now triggered
by the background flusher:

        /*
         * Make sure to wait on the data before writing out the metadata.
         * This is important for filesystems that modify metadata on data
         * I/O completion. We don't do it for sync(2) writeback because it has a
         * separate, external IO completion path and ->sync_fs for guaranteeing
         * inode metadata is written back correctly.
         */
        if (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync) {
                int err = filemap_fdatawait(mapping);
                if (ret == 0)
                        ret = err;
        }

It also changes the writeback chunk size in write_cache_pages(), so
instead of doing a bit of writeback from all dirty inodes, it tries
to write everything from each inode in turn (see
writeback_chunk_size()) which will further exacerbate the wait
above.

> >The latter blocking passes of sync use WB_SYNC_ALL to ensure that we
> >block waiting for all remaining IO to be issued and waited on, but
> >the background writeback doesn't need to do this.
> 
> That's fine, they can get to wait on the previously issued IO, which
> was properly submitted with WB_SYNC_ALL.
> 
> Maybe I'm missing your point?

Making the background flusher block and wait for data makes it
completely ineffective in speeding up sync() processing...

Cheers,

Dave.
Jens Axboe March 22, 2016, 10:07 p.m. UTC | #5
On 03/22/2016 04:04 PM, Dave Chinner wrote:
> On Tue, Mar 22, 2016 at 03:40:28PM -0600, Jens Axboe wrote:
>> On 03/22/2016 03:34 PM, Dave Chinner wrote:
>>> On Tue, Mar 22, 2016 at 11:55:16AM -0600, Jens Axboe wrote:
>>>> If you call sync, the initial call to wakeup_flusher_threads() ends up
>>>> calling wb_start_writeback() with reason=WB_REASON_SYNC, but
>>>> wb_start_writeback() always uses WB_SYNC_NONE as the writeback mode.
>>>> Ensure that we use WB_SYNC_ALL for a sync operation.
>>>
>>> This seems wrong to me. We want background write to happen as
>>> quickly as possible and /not block/ when we first kick sync.
>>
>> It's not going to block. wakeup_flusher_threads() async queues
>> writeback work through wb_start_writeback().
>
> The flusher threads block, not the initial wakeup. e.g. they will
> now block waiting for data writeback to complete before writing the
> inode. i.e. this code in __writeback_single_inode() is now triggered
> by the background flusher:
>
>          /*
>           * Make sure to wait on the data before writing out the metadata.
>           * This is important for filesystems that modify metadata on data
>           * I/O completion. We don't do it for sync(2) writeback because it has a
>           * separate, external IO completion path and ->sync_fs for guaranteeing
>           * inode metadata is written back correctly.
>           */
>          if (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync) {
>                  int err = filemap_fdatawait(mapping);
>                  if (ret == 0)
>                          ret = err;
>          }

Yeah, that's not ideal, for this case we'd really like something that 
WB_SYNC_ALL_NOWAIT...

> It also changes the writeback chunk size in write_cache_pages(), so
> instead of doing a bit of writeback from all dirty inodes, it tries
> to write everything from each inode in turn (see
> writeback_chunk_size()) which will further exacerbate the wait
> above.

But that part is fine, if it wasn't for the waiting.

>>> The latter blocking passes of sync use WB_SYNC_ALL to ensure that we
>>> block waiting for all remaining IO to be issued and waited on, but
>>> the background writeback doesn't need to do this.
>>
>> That's fine, they can get to wait on the previously issued IO, which
>> was properly submitted with WB_SYNC_ALL.
>>
>> Maybe I'm missing your point?
>
> Making the background flusher block and wait for data makes it
> completely ineffective in speeding up sync() processing...

Agree, we should not wait on the pages individually, we want them 
submitted and then waited on. And I suppose it's no differently than 
handling the normal buffered write from an application, which then gets 
waited on with fsync() or similar. So I can drop this patch, it'll work 
fine without it.
diff mbox

Patch

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 5c46ed9f3e14..97a9e9987134 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -936,7 +936,10 @@  void wb_start_writeback(struct bdi_writeback *wb, long nr_pages,
 		return;
 	}
 
-	work->sync_mode	= WB_SYNC_NONE;
+	if (reason == WB_REASON_SYNC)
+		work->sync_mode	= WB_SYNC_ALL;
+	else
+		work->sync_mode	= WB_SYNC_NONE;
 	work->nr_pages	= nr_pages;
 	work->range_cyclic = range_cyclic;
 	work->reason	= reason;