Message ID | 156378816804.1087.8607636317907921438.stgit@buzz (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] mm/filemap: don't initiate writeback if mapping has no dirty pages | expand |
(cc linux-fsdevel and Jan) On Mon, 22 Jul 2019 12:36:08 +0300 Konstantin Khlebnikov <khlebnikov@yandex-team.ru> wrote: > Functions like filemap_write_and_wait_range() should do nothing if inode > has no dirty pages or pages currently under writeback. But they anyway > construct struct writeback_control and this does some atomic operations > if CONFIG_CGROUP_WRITEBACK=y - on fast path it locks inode->i_lock and > updates state of writeback ownership, on slow path might be more work. > Current this path is safely avoided only when inode mapping has no pages. > > For example generic_file_read_iter() calls filemap_write_and_wait_range() > at each O_DIRECT read - pretty hot path. > > This patch skips starting new writeback if mapping has no dirty tags set. > If writeback is already in progress filemap_write_and_wait_range() will > wait for it. > > ... > > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -408,7 +408,8 @@ int __filemap_fdatawrite_range(struct address_space *mapping, loff_t start, > .range_end = end, > }; > > - if (!mapping_cap_writeback_dirty(mapping)) > + if (!mapping_cap_writeback_dirty(mapping) || > + !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) > return 0; > > wbc_attach_fdatawrite_inode(&wbc, mapping->host); How does this play with tagged_writepages? We assume that no tagging has been performed by any __filemap_fdatawrite_range() caller?
On 23.07.2019 3:52, Andrew Morton wrote: > > (cc linux-fsdevel and Jan) > > On Mon, 22 Jul 2019 12:36:08 +0300 Konstantin Khlebnikov <khlebnikov@yandex-team.ru> wrote: > >> Functions like filemap_write_and_wait_range() should do nothing if inode >> has no dirty pages or pages currently under writeback. But they anyway >> construct struct writeback_control and this does some atomic operations >> if CONFIG_CGROUP_WRITEBACK=y - on fast path it locks inode->i_lock and >> updates state of writeback ownership, on slow path might be more work. >> Current this path is safely avoided only when inode mapping has no pages. >> >> For example generic_file_read_iter() calls filemap_write_and_wait_range() >> at each O_DIRECT read - pretty hot path. >> >> This patch skips starting new writeback if mapping has no dirty tags set. >> If writeback is already in progress filemap_write_and_wait_range() will >> wait for it. >> >> ... >> >> --- a/mm/filemap.c >> +++ b/mm/filemap.c >> @@ -408,7 +408,8 @@ int __filemap_fdatawrite_range(struct address_space *mapping, loff_t start, >> .range_end = end, >> }; >> >> - if (!mapping_cap_writeback_dirty(mapping)) >> + if (!mapping_cap_writeback_dirty(mapping) || >> + !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) >> return 0; >> >> wbc_attach_fdatawrite_inode(&wbc, mapping->host); > > How does this play with tagged_writepages? We assume that no tagging > has been performed by any __filemap_fdatawrite_range() caller? > Checking also PAGECACHE_TAG_TOWRITE is cheap but seems redundant. To-write tags are supposed to be a subset of dirty tags: to-write is set only when dirty is set and cleared after starting writeback. Special case set_page_writeback_keepwrite() which does not clear to-write should be for dirty page thus dirty tag is not going to be cleared either. Ext4 calls it after redirty_page_for_writepage() XFS even without clear_page_dirty_for_io() Anyway to-write tag without dirty tag or at clear page is confusing.
On Tue 23-07-19 11:16:51, Konstantin Khlebnikov wrote: > On 23.07.2019 3:52, Andrew Morton wrote: > > > > (cc linux-fsdevel and Jan) Thanks for CC Andrew. > > On Mon, 22 Jul 2019 12:36:08 +0300 Konstantin Khlebnikov <khlebnikov@yandex-team.ru> wrote: > > > > > Functions like filemap_write_and_wait_range() should do nothing if inode > > > has no dirty pages or pages currently under writeback. But they anyway > > > construct struct writeback_control and this does some atomic operations > > > if CONFIG_CGROUP_WRITEBACK=y - on fast path it locks inode->i_lock and > > > updates state of writeback ownership, on slow path might be more work. > > > Current this path is safely avoided only when inode mapping has no pages. > > > > > > For example generic_file_read_iter() calls filemap_write_and_wait_range() > > > at each O_DIRECT read - pretty hot path. Yes, but in common case mapping_needs_writeback() is false for files you do direct IO to (exactly the case with no pages in the mapping). So you shouldn't see the overhead at all. So which case you really care about? > > > This patch skips starting new writeback if mapping has no dirty tags set. > > > If writeback is already in progress filemap_write_and_wait_range() will > > > wait for it. > > > > > > ... > > > > > > --- a/mm/filemap.c > > > +++ b/mm/filemap.c > > > @@ -408,7 +408,8 @@ int __filemap_fdatawrite_range(struct address_space *mapping, loff_t start, > > > .range_end = end, > > > }; > > > - if (!mapping_cap_writeback_dirty(mapping)) > > > + if (!mapping_cap_writeback_dirty(mapping) || > > > + !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) > > > return 0; > > > wbc_attach_fdatawrite_inode(&wbc, mapping->host); > > > > How does this play with tagged_writepages? We assume that no tagging > > has been performed by any __filemap_fdatawrite_range() caller? > > > > Checking also PAGECACHE_TAG_TOWRITE is cheap but seems redundant. > > To-write tags are supposed to be a subset of dirty tags: > to-write is set only when dirty is set and cleared after starting writeback. > > Special case set_page_writeback_keepwrite() which does not clear to-write > should be for dirty page thus dirty tag is not going to be cleared either. > Ext4 calls it after redirty_page_for_writepage() > XFS even without clear_page_dirty_for_io() > > Anyway to-write tag without dirty tag or at clear page is confusing. Yeah, TOWRITE tag is intended to be internal to writepages logic so your patch is fine in that regard. Overall the patch looks good to me so I'm just wondering a bit about the motivation... Honza
On 30.07.2019 17:14, Jan Kara wrote: > On Tue 23-07-19 11:16:51, Konstantin Khlebnikov wrote: >> On 23.07.2019 3:52, Andrew Morton wrote: >>> >>> (cc linux-fsdevel and Jan) > > Thanks for CC Andrew. > >>> On Mon, 22 Jul 2019 12:36:08 +0300 Konstantin Khlebnikov <khlebnikov@yandex-team.ru> wrote: >>> >>>> Functions like filemap_write_and_wait_range() should do nothing if inode >>>> has no dirty pages or pages currently under writeback. But they anyway >>>> construct struct writeback_control and this does some atomic operations >>>> if CONFIG_CGROUP_WRITEBACK=y - on fast path it locks inode->i_lock and >>>> updates state of writeback ownership, on slow path might be more work. >>>> Current this path is safely avoided only when inode mapping has no pages. >>>> >>>> For example generic_file_read_iter() calls filemap_write_and_wait_range() >>>> at each O_DIRECT read - pretty hot path. > > Yes, but in common case mapping_needs_writeback() is false for files you do > direct IO to (exactly the case with no pages in the mapping). So you > shouldn't see the overhead at all. So which case you really care about? > >>>> This patch skips starting new writeback if mapping has no dirty tags set. >>>> If writeback is already in progress filemap_write_and_wait_range() will >>>> wait for it. >>>> >>>> ... >>>> >>>> --- a/mm/filemap.c >>>> +++ b/mm/filemap.c >>>> @@ -408,7 +408,8 @@ int __filemap_fdatawrite_range(struct address_space *mapping, loff_t start, >>>> .range_end = end, >>>> }; >>>> - if (!mapping_cap_writeback_dirty(mapping)) >>>> + if (!mapping_cap_writeback_dirty(mapping) || >>>> + !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) >>>> return 0; >>>> wbc_attach_fdatawrite_inode(&wbc, mapping->host); >>> >>> How does this play with tagged_writepages? We assume that no tagging >>> has been performed by any __filemap_fdatawrite_range() caller? >>> >> >> Checking also PAGECACHE_TAG_TOWRITE is cheap but seems redundant. >> >> To-write tags are supposed to be a subset of dirty tags: >> to-write is set only when dirty is set and cleared after starting writeback. >> >> Special case set_page_writeback_keepwrite() which does not clear to-write >> should be for dirty page thus dirty tag is not going to be cleared either. >> Ext4 calls it after redirty_page_for_writepage() >> XFS even without clear_page_dirty_for_io() >> >> Anyway to-write tag without dirty tag or at clear page is confusing. > > Yeah, TOWRITE tag is intended to be internal to writepages logic so your > patch is fine in that regard. Overall the patch looks good to me so I'm > just wondering a bit about the motivation... In our case file mixes cached pages and O_DIRECT read. Kind of database were index header is memory mapped while the rest data read via O_DIRECT. I suppose for sharing index between multiple instances. On this path we also hit this bug: https://lore.kernel.org/lkml/156355839560.2063.5265687291430814589.stgit@buzz/ so that's why I've started looking into this code.
On Tue 30-07-19 17:57:18, Konstantin Khlebnikov wrote: > On 30.07.2019 17:14, Jan Kara wrote: > > On Tue 23-07-19 11:16:51, Konstantin Khlebnikov wrote: > > > On 23.07.2019 3:52, Andrew Morton wrote: > > > > > > > > (cc linux-fsdevel and Jan) > > > > Thanks for CC Andrew. > > > > > > On Mon, 22 Jul 2019 12:36:08 +0300 Konstantin Khlebnikov <khlebnikov@yandex-team.ru> wrote: > > > > > > > > > Functions like filemap_write_and_wait_range() should do nothing if inode > > > > > has no dirty pages or pages currently under writeback. But they anyway > > > > > construct struct writeback_control and this does some atomic operations > > > > > if CONFIG_CGROUP_WRITEBACK=y - on fast path it locks inode->i_lock and > > > > > updates state of writeback ownership, on slow path might be more work. > > > > > Current this path is safely avoided only when inode mapping has no pages. > > > > > > > > > > For example generic_file_read_iter() calls filemap_write_and_wait_range() > > > > > at each O_DIRECT read - pretty hot path. > > > > Yes, but in common case mapping_needs_writeback() is false for files you do > > direct IO to (exactly the case with no pages in the mapping). So you > > shouldn't see the overhead at all. So which case you really care about? > > > > > > > This patch skips starting new writeback if mapping has no dirty tags set. > > > > > If writeback is already in progress filemap_write_and_wait_range() will > > > > > wait for it. > > > > > > > > > > ... > > > > > > > > > > --- a/mm/filemap.c > > > > > +++ b/mm/filemap.c > > > > > @@ -408,7 +408,8 @@ int __filemap_fdatawrite_range(struct address_space *mapping, loff_t start, > > > > > .range_end = end, > > > > > }; > > > > > - if (!mapping_cap_writeback_dirty(mapping)) > > > > > + if (!mapping_cap_writeback_dirty(mapping) || > > > > > + !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) > > > > > return 0; > > > > > wbc_attach_fdatawrite_inode(&wbc, mapping->host); > > > > > > > > How does this play with tagged_writepages? We assume that no tagging > > > > has been performed by any __filemap_fdatawrite_range() caller? > > > > > > > > > > Checking also PAGECACHE_TAG_TOWRITE is cheap but seems redundant. > > > > > > To-write tags are supposed to be a subset of dirty tags: > > > to-write is set only when dirty is set and cleared after starting writeback. > > > > > > Special case set_page_writeback_keepwrite() which does not clear to-write > > > should be for dirty page thus dirty tag is not going to be cleared either. > > > Ext4 calls it after redirty_page_for_writepage() > > > XFS even without clear_page_dirty_for_io() > > > > > > Anyway to-write tag without dirty tag or at clear page is confusing. > > > > Yeah, TOWRITE tag is intended to be internal to writepages logic so your > > patch is fine in that regard. Overall the patch looks good to me so I'm > > just wondering a bit about the motivation... > > In our case file mixes cached pages and O_DIRECT read. Kind of database > were index header is memory mapped while the rest data read via O_DIRECT. > I suppose for sharing index between multiple instances. OK, that has always been a bit problematic but you're not the first one to have such design ;). So feel free to add: Reviewed-by: Jan Kara <jack@suse.cz> to your patch. > On this path we also hit this bug: > https://lore.kernel.org/lkml/156355839560.2063.5265687291430814589.stgit@buzz/ > so that's why I've started looking into this code. I see. OK. Honza
On 30.07.2019 18:48, Jan Kara wrote: > On Tue 30-07-19 17:57:18, Konstantin Khlebnikov wrote: >> On 30.07.2019 17:14, Jan Kara wrote: >>> On Tue 23-07-19 11:16:51, Konstantin Khlebnikov wrote: >>>> On 23.07.2019 3:52, Andrew Morton wrote: >>>>> >>>>> (cc linux-fsdevel and Jan) >>> >>> Thanks for CC Andrew. >>> >>>>> On Mon, 22 Jul 2019 12:36:08 +0300 Konstantin Khlebnikov <khlebnikov@yandex-team.ru> wrote: >>>>> >>>>>> Functions like filemap_write_and_wait_range() should do nothing if inode >>>>>> has no dirty pages or pages currently under writeback. But they anyway >>>>>> construct struct writeback_control and this does some atomic operations >>>>>> if CONFIG_CGROUP_WRITEBACK=y - on fast path it locks inode->i_lock and >>>>>> updates state of writeback ownership, on slow path might be more work. >>>>>> Current this path is safely avoided only when inode mapping has no pages. >>>>>> >>>>>> For example generic_file_read_iter() calls filemap_write_and_wait_range() >>>>>> at each O_DIRECT read - pretty hot path. >>> >>> Yes, but in common case mapping_needs_writeback() is false for files you do >>> direct IO to (exactly the case with no pages in the mapping). So you >>> shouldn't see the overhead at all. So which case you really care about? >>> >>>>>> This patch skips starting new writeback if mapping has no dirty tags set. >>>>>> If writeback is already in progress filemap_write_and_wait_range() will >>>>>> wait for it. >>>>>> >>>>>> ... >>>>>> >>>>>> --- a/mm/filemap.c >>>>>> +++ b/mm/filemap.c >>>>>> @@ -408,7 +408,8 @@ int __filemap_fdatawrite_range(struct address_space *mapping, loff_t start, >>>>>> .range_end = end, >>>>>> }; >>>>>> - if (!mapping_cap_writeback_dirty(mapping)) >>>>>> + if (!mapping_cap_writeback_dirty(mapping) || >>>>>> + !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) >>>>>> return 0; >>>>>> wbc_attach_fdatawrite_inode(&wbc, mapping->host); >>>>> >>>>> How does this play with tagged_writepages? We assume that no tagging >>>>> has been performed by any __filemap_fdatawrite_range() caller? >>>>> >>>> >>>> Checking also PAGECACHE_TAG_TOWRITE is cheap but seems redundant. >>>> >>>> To-write tags are supposed to be a subset of dirty tags: >>>> to-write is set only when dirty is set and cleared after starting writeback. >>>> >>>> Special case set_page_writeback_keepwrite() which does not clear to-write >>>> should be for dirty page thus dirty tag is not going to be cleared either. >>>> Ext4 calls it after redirty_page_for_writepage() >>>> XFS even without clear_page_dirty_for_io() >>>> >>>> Anyway to-write tag without dirty tag or at clear page is confusing. >>> >>> Yeah, TOWRITE tag is intended to be internal to writepages logic so your >>> patch is fine in that regard. Overall the patch looks good to me so I'm >>> just wondering a bit about the motivation... >> >> In our case file mixes cached pages and O_DIRECT read. Kind of database >> were index header is memory mapped while the rest data read via O_DIRECT. >> I suppose for sharing index between multiple instances. > > OK, that has always been a bit problematic but you're not the first one to > have such design ;). So feel free to add: > > Reviewed-by: Jan Kara <jack@suse.cz> > > to your patch. Thanks. O_DIRECT has long history of misunderstandings =) It looks some cases are still not documented. My favourite: O_DIRECT write into hole goes into cache, at least for ext4. > >> On this path we also hit this bug: >> https://lore.kernel.org/lkml/156355839560.2063.5265687291430814589.stgit@buzz/ >> so that's why I've started looking into this code. > > I see. OK. > > Honza >
diff --git a/mm/filemap.c b/mm/filemap.c index d0cf700bf201..d9572593e5c7 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -408,7 +408,8 @@ int __filemap_fdatawrite_range(struct address_space *mapping, loff_t start, .range_end = end, }; - if (!mapping_cap_writeback_dirty(mapping)) + if (!mapping_cap_writeback_dirty(mapping) || + !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) return 0; wbc_attach_fdatawrite_inode(&wbc, mapping->host);
Functions like filemap_write_and_wait_range() should do nothing if inode has no dirty pages or pages currently under writeback. But they anyway construct struct writeback_control and this does some atomic operations if CONFIG_CGROUP_WRITEBACK=y - on fast path it locks inode->i_lock and updates state of writeback ownership, on slow path might be more work. Current this path is safely avoided only when inode mapping has no pages. For example generic_file_read_iter() calls filemap_write_and_wait_range() at each O_DIRECT read - pretty hot path. This patch skips starting new writeback if mapping has no dirty tags set. If writeback is already in progress filemap_write_and_wait_range() will wait for it. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> --- mm/filemap.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)