Message ID | 20160915115523.29737-18-kirill.shutemov@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu 15-09-16 14:54:59, Kirill A. Shutemov wrote: > We writeback whole huge page a time. This is one of the things I don't understand. Firstly I didn't see where changes of writeback like this would happen (maybe they come later). Secondly I'm not sure why e.g. writeback should behave atomically wrt huge pages. Is this because radix-tree multiorder entry tracks dirtiness for us at that granularity? BTW, can you also explain why do we need multiorder entries? What do they solve for us? I'm sorry for these basic questions but I'd just like to understand how is this supposed to work... Honza > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > --- > mm/filemap.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/mm/filemap.c b/mm/filemap.c > index 05b42d3e5ed8..53da93156e60 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -372,9 +372,14 @@ static int __filemap_fdatawait_range(struct address_space *mapping, > if (page->index > end) > continue; > > + page = compound_head(page); > wait_on_page_writeback(page); > if (TestClearPageError(page)) > ret = -EIO; > + if (PageTransHuge(page)) { > + index = page->index + HPAGE_PMD_NR; > + i += index - pvec.pages[i]->index - 1; > + } > } > pagevec_release(&pvec); > cond_resched(); > -- > 2.9.3 > >
On Thu, Oct 13, 2016 at 11:44:41AM +0200, Jan Kara wrote: > On Thu 15-09-16 14:54:59, Kirill A. Shutemov wrote: > > We writeback whole huge page a time. > > This is one of the things I don't understand. Firstly I didn't see where > changes of writeback like this would happen (maybe they come later). > Secondly I'm not sure why e.g. writeback should behave atomically wrt huge > pages. Is this because radix-tree multiorder entry tracks dirtiness for us > at that granularity? We track dirty/writeback on per-compound pages: meaning we have one dirty/writeback flag for whole compound page, not on every individual 4k subpage. The same story for radix-tree tags. > BTW, can you also explain why do we need multiorder entries? What do > they solve for us? It helps us having coherent view on tags in radix-tree: no matter which index we refer from the range huge page covers we will get the same answer on which tags set. > I'm sorry for these basic questions but I'd just like to understand how is > this supposed to work... > > Honza > > > > > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > > --- > > mm/filemap.c | 5 +++++ > > 1 file changed, 5 insertions(+) > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > index 05b42d3e5ed8..53da93156e60 100644 > > --- a/mm/filemap.c > > +++ b/mm/filemap.c > > @@ -372,9 +372,14 @@ static int __filemap_fdatawait_range(struct address_space *mapping, > > if (page->index > end) > > continue; > > > > + page = compound_head(page); > > wait_on_page_writeback(page); > > if (TestClearPageError(page)) > > ret = -EIO; > > + if (PageTransHuge(page)) { > > + index = page->index + HPAGE_PMD_NR; > > + i += index - pvec.pages[i]->index - 1; > > + } > > } > > pagevec_release(&pvec); > > cond_resched(); > > -- > > 2.9.3 > > > > > -- > Jan Kara <jack@suse.com> > SUSE Labs, CR > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
On Thu 13-10-16 15:08:44, Kirill A. Shutemov wrote: > On Thu, Oct 13, 2016 at 11:44:41AM +0200, Jan Kara wrote: > > On Thu 15-09-16 14:54:59, Kirill A. Shutemov wrote: > > > We writeback whole huge page a time. > > > > This is one of the things I don't understand. Firstly I didn't see where > > changes of writeback like this would happen (maybe they come later). > > Secondly I'm not sure why e.g. writeback should behave atomically wrt huge > > pages. Is this because radix-tree multiorder entry tracks dirtiness for us > > at that granularity? > > We track dirty/writeback on per-compound pages: meaning we have one > dirty/writeback flag for whole compound page, not on every individual > 4k subpage. The same story for radix-tree tags. > > > BTW, can you also explain why do we need multiorder entries? What do > > they solve for us? > > It helps us having coherent view on tags in radix-tree: no matter which > index we refer from the range huge page covers we will get the same > answer on which tags set. OK, understand that. But why do we need a coherent view? For which purposes exactly do we care that it is not just a bunch of 4k pages that happen to be physically contiguous and thus can be mapped in one PMD? Honza
On Thu, Oct 13, 2016 at 03:18:02PM +0200, Jan Kara wrote: > On Thu 13-10-16 15:08:44, Kirill A. Shutemov wrote: > > On Thu, Oct 13, 2016 at 11:44:41AM +0200, Jan Kara wrote: > > > On Thu 15-09-16 14:54:59, Kirill A. Shutemov wrote: > > > > We writeback whole huge page a time. > > > > > > This is one of the things I don't understand. Firstly I didn't see where > > > changes of writeback like this would happen (maybe they come later). > > > Secondly I'm not sure why e.g. writeback should behave atomically wrt huge > > > pages. Is this because radix-tree multiorder entry tracks dirtiness for us > > > at that granularity? > > > > We track dirty/writeback on per-compound pages: meaning we have one > > dirty/writeback flag for whole compound page, not on every individual > > 4k subpage. The same story for radix-tree tags. > > > > > BTW, can you also explain why do we need multiorder entries? What do > > > they solve for us? > > > > It helps us having coherent view on tags in radix-tree: no matter which > > index we refer from the range huge page covers we will get the same > > answer on which tags set. > > OK, understand that. But why do we need a coherent view? For which purposes > exactly do we care that it is not just a bunch of 4k pages that happen to > be physically contiguous and thus can be mapped in one PMD? My understanding is that things like PageDirty() should be handled on the same granularity as PAGECACHE_TAG_DIRTY, otherwise things can go horribly wrong...
On Mon 24-10-16 14:36:25, Kirill A. Shutemov wrote: > On Thu, Oct 13, 2016 at 03:18:02PM +0200, Jan Kara wrote: > > On Thu 13-10-16 15:08:44, Kirill A. Shutemov wrote: > > > On Thu, Oct 13, 2016 at 11:44:41AM +0200, Jan Kara wrote: > > > > On Thu 15-09-16 14:54:59, Kirill A. Shutemov wrote: > > > > > We writeback whole huge page a time. > > > > > > > > This is one of the things I don't understand. Firstly I didn't see where > > > > changes of writeback like this would happen (maybe they come later). > > > > Secondly I'm not sure why e.g. writeback should behave atomically wrt huge > > > > pages. Is this because radix-tree multiorder entry tracks dirtiness for us > > > > at that granularity? > > > > > > We track dirty/writeback on per-compound pages: meaning we have one > > > dirty/writeback flag for whole compound page, not on every individual > > > 4k subpage. The same story for radix-tree tags. > > > > > > > BTW, can you also explain why do we need multiorder entries? What do > > > > they solve for us? > > > > > > It helps us having coherent view on tags in radix-tree: no matter which > > > index we refer from the range huge page covers we will get the same > > > answer on which tags set. > > > > OK, understand that. But why do we need a coherent view? For which purposes > > exactly do we care that it is not just a bunch of 4k pages that happen to > > be physically contiguous and thus can be mapped in one PMD? > > My understanding is that things like PageDirty() should be handled on the > same granularity as PAGECACHE_TAG_DIRTY, otherwise things can go horribly > wrong... Yeah, I agree with that. My question was rather aiming in the direction: Why don't we keep PageDirty and PAGECACHE_TAG_DIRTY on a page granularity? Why do we push all this to happen only in the head page? In your coverletter of the latest version (BTW thanks for expanding explanations there) you write: - head page (the first subpage) on LRU represents whole huge page; - head page's flags represent state of whole huge page (with few exceptions); - mm can't migrate subpages of the compound page individually; So the fact that flags of a head page represent flags of each individual page is the decision that I'm questioning, at least for PageDirty and PageWriteback flags. I'm asking because frankly, I don't like the series much. IMHO too many places need to know about huge pages and things will get broken frequently. And from filesystem POV I don't really see why a filesystem should care about huge pages *at all*. Sure functions allocating pages into page cache need to care, sure functions mapping pages into page tables need to care. But nobody else should need to be aware we are playing some huge page games... At least that is my idea how things ought to work ;) Your solution seems to go more towards the direction where we have two different sizes of pages in the system and everyone has to cope with it. But I'd also note that you go only half way there - e.g. page lookup functions still work with subpages, some places still use PAGE_SIZE & page->index, ... - so the result is a strange mix. So what are the reasons for having pages forming a huge page bound so tightly? Honza
diff --git a/mm/filemap.c b/mm/filemap.c index 05b42d3e5ed8..53da93156e60 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -372,9 +372,14 @@ static int __filemap_fdatawait_range(struct address_space *mapping, if (page->index > end) continue; + page = compound_head(page); wait_on_page_writeback(page); if (TestClearPageError(page)) ret = -EIO; + if (PageTransHuge(page)) { + index = page->index + HPAGE_PMD_NR; + i += index - pvec.pages[i]->index - 1; + } } pagevec_release(&pvec); cond_resched();
We writeback whole huge page a time. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> --- mm/filemap.c | 5 +++++ 1 file changed, 5 insertions(+)