diff mbox

mm,writeback: Don't use memory reserves for wb_start_writeback

Message ID 20160314160900.GC11400@dhcp22.suse.cz (mailing list archive)
State New, archived
Headers show

Commit Message

Michal Hocko March 14, 2016, 4:09 p.m. UTC
On Sun 13-03-16 23:22:23, Tetsuo Handa wrote:
[...]

I am not familiar with the writeback code so I might be missing
something essential here but why are we even queueing more and more
work without checking there has been enough already scheduled or in
progress.

Something as simple as:

> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 5c46ed9..21450c7 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -929,7 +929,8 @@ void wb_start_writeback(struct bdi_writeback *wb, long nr_pages,
>  	 * This is WB_SYNC_NONE writeback, so if allocation fails just
>  	 * wakeup the thread for old dirty data writeback
>  	 */
> -	work = kzalloc(sizeof(*work), GFP_ATOMIC);
> +	work = kzalloc(sizeof(*work),
> +		       GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);

Well, I guess you are right that this doesn't sound like a context
which really needs access to memory reserves and GFP_ATOMIC would more
used for what can be achieved by GFP_NOWAIT now. Using __GFP_NOMEMALLOC
would be needed regardless as you pointed out already because this might
be called from the page reclaim context. So if the above simple hack
or other explicit limit cannot be done then __GFP_NOMEMALLOC is an
absolute minimum.

>  	if (!work) {
>  		trace_writeback_nowork(wb);
>  		wb_wakeup(wb);
> -- 
> 1.8.3.1

Comments

Tejun Heo March 16, 2016, 8:46 p.m. UTC | #1
Hello,

(cc'ing Jan)

On Mon, Mar 14, 2016 at 05:09:00PM +0100, Michal Hocko wrote:
> On Sun 13-03-16 23:22:23, Tetsuo Handa wrote:
> [...]
> 
> I am not familiar with the writeback code so I might be missing
> something essential here but why are we even queueing more and more
> work without checking there has been enough already scheduled or in
> progress.
>
> Something as simple as:
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 6915c950e6e8..aa52e23ac280 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -887,7 +887,7 @@ void wb_start_writeback(struct bdi_writeback *wb, long nr_pages,
>  {
>  	struct wb_writeback_work *work;
>  
> -	if (!wb_has_dirty_io(wb))
> +	if (!wb_has_dirty_io(wb) || writeback_in_progress(wb))
>  		return;

I'm not sure this would be safe.  It shouldn't harm correctness as
wb_start_writeback() isn't used in sync case but this might change
flush behavior in various ways.  Dropping GFP_ATOMIC as suggested by
Tetsuo is likely better.

Thanks.
Jan Kara March 18, 2016, 1:11 p.m. UTC | #2
On Wed 16-03-16 13:46:17, Tejun Heo wrote:
> Hello,
> 
> (cc'ing Jan)
> 
> On Mon, Mar 14, 2016 at 05:09:00PM +0100, Michal Hocko wrote:
> > On Sun 13-03-16 23:22:23, Tetsuo Handa wrote:
> > [...]
> > 
> > I am not familiar with the writeback code so I might be missing
> > something essential here but why are we even queueing more and more
> > work without checking there has been enough already scheduled or in
> > progress.
> >
> > Something as simple as:
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 6915c950e6e8..aa52e23ac280 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -887,7 +887,7 @@ void wb_start_writeback(struct bdi_writeback *wb, long nr_pages,
> >  {
> >  	struct wb_writeback_work *work;
> >  
> > -	if (!wb_has_dirty_io(wb))
> > +	if (!wb_has_dirty_io(wb) || writeback_in_progress(wb))
> >  		return;
> 
> I'm not sure this would be safe.  It shouldn't harm correctness as
> wb_start_writeback() isn't used in sync case but this might change
> flush behavior in various ways.  Dropping GFP_ATOMIC as suggested by
> Tetsuo is likely better.

Yes, there can be different requests for different numbers of pages to be
written and you don't want to discard a request to clean 4000 pages just
because a writeback of 10 pages is just running. As Tejun says, this is not
a hard requirement but in general it would be unexpected for the users of
the api...

								Honza
Michal Hocko March 18, 2016, 1:34 p.m. UTC | #3
On Fri 18-03-16 14:11:36, Jan Kara wrote:
> On Wed 16-03-16 13:46:17, Tejun Heo wrote:
> > Hello,
> > 
> > (cc'ing Jan)
> > 
> > On Mon, Mar 14, 2016 at 05:09:00PM +0100, Michal Hocko wrote:
> > > On Sun 13-03-16 23:22:23, Tetsuo Handa wrote:
> > > [...]
> > > 
> > > I am not familiar with the writeback code so I might be missing
> > > something essential here but why are we even queueing more and more
> > > work without checking there has been enough already scheduled or in
> > > progress.
> > >
> > > Something as simple as:
> > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > > index 6915c950e6e8..aa52e23ac280 100644
> > > --- a/fs/fs-writeback.c
> > > +++ b/fs/fs-writeback.c
> > > @@ -887,7 +887,7 @@ void wb_start_writeback(struct bdi_writeback *wb, long nr_pages,
> > >  {
> > >  	struct wb_writeback_work *work;
> > >  
> > > -	if (!wb_has_dirty_io(wb))
> > > +	if (!wb_has_dirty_io(wb) || writeback_in_progress(wb))
> > >  		return;
> > 
> > I'm not sure this would be safe.  It shouldn't harm correctness as
> > wb_start_writeback() isn't used in sync case but this might change
> > flush behavior in various ways.  Dropping GFP_ATOMIC as suggested by
> > Tetsuo is likely better.
> 
> Yes, there can be different requests for different numbers of pages to be
> written and you don't want to discard a request to clean 4000 pages just
> because a writeback of 10 pages is just running. As Tejun says, this is not
> a hard requirement but in general it would be unexpected for the users of
> the api...

Thanks for the clarification. Then the proper fix indeed is to add
__GFP_NOMEMALLOC.
diff mbox

Patch

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 6915c950e6e8..aa52e23ac280 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -887,7 +887,7 @@  void wb_start_writeback(struct bdi_writeback *wb, long nr_pages,
 {
 	struct wb_writeback_work *work;
 
-	if (!wb_has_dirty_io(wb))
+	if (!wb_has_dirty_io(wb) || writeback_in_progress(wb))
 		return;
 
 	/*