Message ID | alpine.LSU.2.11.1510291137430.3369@eggly.anvils (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 10/29/2015 08:43 PM, Hugh Dickins wrote: > Patch "mm: migrate dirty page without clear_page_dirty_for_io etc", > presently staged in mmotm and linux-next, simplifies the migration of > a PageDirty pagecache page: one stat needs moving from zone to zone > and that's about all. > > It's convenient and safest for it to shift the PageDirty bit from old > page to new, just before updating the zone stats: before copying data > and marking the new PageUptodate. This is all done while both pages > are isolated and locked, just as before; and just as before, there's > a moment when the new page is visible in the radix_tree, but not yet > PageUptodate. What's new is that it may now be briefly visible as > PageDirty before it is PageUptodate. > > When I scoured the tree to see if this could cause a problem anywhere, > the only places I found were in two similar functions __r4w_get_page(): > which look up a page with find_get_page() (not using page lock), then > claim it's uptodate if it's PageDirty or PageWriteback or PageUptodate. > > I'm not sure whether that was right before, but now it might be wrong > (on rare occasions): only claim the page is uptodate if PageUptodate. > Or perhaps the page in question could never be migratable anyway? > Hi Sir Hugh I'm sorry, I admit the code is clear as mud, but your patch below is wrong. The *uptodate return from __r4w_get_page is not really "up-to-date" at all actually it means: "do we need to read the page from storage" writable/dirty pages we do not read from storage but use the newest data in memory. r4w means read-for-write which is when we need to bring in the full stripe to re-calculate raid5/6 . (when only the partial stripe is written) The scenario below of: "briefly visible as PageDirty before it is PageUptodate" is fine in this case because in both cases we do not need to read the page. Thanks for looking Boaz > Signed-off-by: Hugh Dickins <hughd@google.com> This patch is not correct! > --- > > fs/exofs/inode.c | 5 +---- > fs/nfs/objlayout/objio_osd.c | 5 +---- > 2 files changed, 2 insertions(+), 8 deletions(-) > > --- 4.3-next/fs/exofs/inode.c 2015-08-30 11:34:09.000000000 -0700 > +++ linux/fs/exofs/inode.c 2015-10-28 16:55:18.795554294 -0700 > @@ -592,10 +592,7 @@ static struct page *__r4w_get_page(void > } > unlock_page(page); > } > - if (PageDirty(page) || PageWriteback(page)) > - *uptodate = true; > - else > - *uptodate = PageUptodate(page); > + *uptodate = PageUptodate(page); > EXOFS_DBGMSG2("index=0x%lx uptodate=%d\n", index, *uptodate); > return page; > } else { > --- 4.3-next/fs/nfs/objlayout/objio_osd.c 2015-10-21 18:35:07.620645439 -0700 > +++ linux/fs/nfs/objlayout/objio_osd.c 2015-10-28 16:53:55.083686639 -0700 > @@ -476,10 +476,7 @@ static struct page *__r4w_get_page(void > } > unlock_page(page); > } > - if (PageDirty(page) || PageWriteback(page)) > - *uptodate = true; > - else > - *uptodate = PageUptodate(page); > + *uptodate = PageUptodate(page); > dprintk("%s: index=0x%lx uptodate=%d\n", __func__, index, *uptodate); > return page; > } > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, 1 Nov 2015, Boaz Harrosh wrote: > On 10/29/2015 08:43 PM, Hugh Dickins wrote: > > Patch "mm: migrate dirty page without clear_page_dirty_for_io etc", > > presently staged in mmotm and linux-next, simplifies the migration of > > a PageDirty pagecache page: one stat needs moving from zone to zone > > and that's about all. > > > > It's convenient and safest for it to shift the PageDirty bit from old > > page to new, just before updating the zone stats: before copying data > > and marking the new PageUptodate. This is all done while both pages > > are isolated and locked, just as before; and just as before, there's > > a moment when the new page is visible in the radix_tree, but not yet > > PageUptodate. What's new is that it may now be briefly visible as > > PageDirty before it is PageUptodate. > > > > When I scoured the tree to see if this could cause a problem anywhere, > > the only places I found were in two similar functions __r4w_get_page(): > > which look up a page with find_get_page() (not using page lock), then > > claim it's uptodate if it's PageDirty or PageWriteback or PageUptodate. > > > > I'm not sure whether that was right before, but now it might be wrong > > (on rare occasions): only claim the page is uptodate if PageUptodate. > > Or perhaps the page in question could never be migratable anyway? > > > > Hi Sir Hugh Hi Boaz - please pardon my informality :) > > I'm sorry, I admit the code is clear as mud, but your patch below is wrong. > > The *uptodate return from __r4w_get_page is not really "up-to-date" at all > actually it means: "do we need to read the page from storage" writable/dirty pages > we do not read from storage but use the newest data in memory. > > r4w means read-for-write which is when we need to bring in the full stripe to > re-calculate raid5/6 . (when only the partial stripe is written) Yes, that's what I understood from the code too, and how PageUptodate is usually used: it allows the caller to bypass the overhead of locking the page, rechecking PageUptodate, and reading it in if still necessary. > > The scenario below of: "briefly visible as PageDirty before it is PageUptodate" > is fine in this case because in both cases we do not need to read the page. But when do you think you have a PageDirty (or PageWriteback) page which is not PageUptodate? We do not ClearPageUptodate when a page is modified. PageUptodate normally remains set for as long as that page remains caching that offset of the file. I think it's true to say that PageUptodate is only cleared when an error, or sometimes an invalidation, occurs (or of course when the page is freed for reuse). I was going to suggest that you check through the places which ClearPageUptodate, but that is rather a confusing exercise, since I think the majority of them are actually redundant - pages don't come from the allocator with PageUptodate set, and a filesystem would already be in trouble if it set PageUptodate before the page was initialized (usually by reading its data in from disk). So I think those ClearPageUptodates on read error are redundant; though I'm not daring to remove them (and they have no bearing on this patch at hand). > > Thanks for looking > Boaz > > > Signed-off-by: Hugh Dickins <hughd@google.com> > > This patch is not correct! I think you have actually confirmed that the patch is correct: why bother to test PageDirty or PageWriteback when PageUptodate already tells you what you need? Or do these filesystems do something unusual with PageUptodate when PageDirty is set? I didn't find it. Thanks, Hugh > > > --- > > > > fs/exofs/inode.c | 5 +---- > > fs/nfs/objlayout/objio_osd.c | 5 +---- > > 2 files changed, 2 insertions(+), 8 deletions(-) > > > > --- 4.3-next/fs/exofs/inode.c 2015-08-30 11:34:09.000000000 -0700 > > +++ linux/fs/exofs/inode.c 2015-10-28 16:55:18.795554294 -0700 > > @@ -592,10 +592,7 @@ static struct page *__r4w_get_page(void > > } > > unlock_page(page); > > } > > - if (PageDirty(page) || PageWriteback(page)) > > - *uptodate = true; > > - else > > - *uptodate = PageUptodate(page); > > + *uptodate = PageUptodate(page); > > EXOFS_DBGMSG2("index=0x%lx uptodate=%d\n", index, *uptodate); > > return page; > > } else { > > --- 4.3-next/fs/nfs/objlayout/objio_osd.c 2015-10-21 18:35:07.620645439 -0700 > > +++ linux/fs/nfs/objlayout/objio_osd.c 2015-10-28 16:53:55.083686639 -0700 > > @@ -476,10 +476,7 @@ static struct page *__r4w_get_page(void > > } > > unlock_page(page); > > } > > - if (PageDirty(page) || PageWriteback(page)) > > - *uptodate = true; > > - else > > - *uptodate = PageUptodate(page); > > + *uptodate = PageUptodate(page); > > dprintk("%s: index=0x%lx uptodate=%d\n", __func__, index, *uptodate); > > return page; > > } > > > > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11/02/2015 01:39 AM, Hugh Dickins wrote: <> >> This patch is not correct! > > I think you have actually confirmed that the patch is correct: > why bother to test PageDirty or PageWriteback when PageUptodate > already tells you what you need? > > Or do these filesystems do something unusual with PageUptodate > when PageDirty is set? I didn't find it. > This is kind of delicate stuff. It took me a while to get it right when I did it. I don't remember all the details. But consider this option: exofs_write_begin on a full PAGE_CACHE_SIZE, the page is instantiated new in page-cache is that PageUptodate(page) then? I thought not. (exofs does not set that) Now that page I do not want to read in. The latest data is in memory. (Same when this page is in writeback, dirty-bit is cleared) So for sure if page is dirty or writeback then we surly do not need a read. only if not then we need to consider the PageUptodate(page) state. Do you think the code is actually wrong as is? BTW: Very similar code is in fs/nfs/objlayout/objio_osd.c::__r4w_get_page > Thanks, > Hugh > <> Thanks Boaz -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2 Nov 2015, Boaz Harrosh wrote: > On 11/02/2015 01:39 AM, Hugh Dickins wrote: > <> > >> This patch is not correct! > > > > I think you have actually confirmed that the patch is correct: > > why bother to test PageDirty or PageWriteback when PageUptodate > > already tells you what you need? > > > > Or do these filesystems do something unusual with PageUptodate > > when PageDirty is set? I didn't find it. > > > > This is kind of delicate stuff. It took me a while to get it right > when I did it. I don't remember all the details. > > But consider this option: Thanks, yes, it helps to have a concrete example in front of us. > > exofs_write_begin on a full PAGE_CACHE_SIZE, the page is instantiated > new in page-cache is that PageUptodate(page) then? I thought not. Right, PageUptodate must not be set until the page has been filled with the correct data. Nor is PageDirty or PageWriteback set at this point, actually. Once page is filled with the correct data, either exofs_write_end() (which uses simple_write_end()) or (internally) exofs_commit_chunk() is called. > (exofs does not set that) It's simple_write_end() or exofs_commit_chunk() which SetPageUptodate in this case. And after that each calls set_page_dirty(), which does the SetPageDirty, before unlocking the page which was supplied locked by exofs_write_begin(). So I don't see where the page is PageDirty without being PageUptodate. > > Now that page I do not want to read in. The latest data is in memory. > (Same when this page is in writeback, dirty-bit is cleared) Understood, but that's what PageUptodate is for. (Quite what happens if there's a write error is not so clear: I think that typically PageError gets set and PageUptodate cleared, to read back in from disk what's actually there - but lose the data we wanted to write; but I can understand different filesystems making different choices there, and didn't study exofs's choice.) > > So for sure if page is dirty or writeback then we surly do not need a read. > only if not then we need to consider the PageUptodate(page) state. PageUptodate is the proper flag to check, to ask if the page contains the correct data: there is no need to consider Dirty or Writeback. > > Do you think the code is actually wrong as is? Not that I know of: just a little too complicated and confusing. But becomes slightly wrong if my simplification to page migration goes through, since that introduces an instant when PageDirty is set before the new page contains the correct data and is marked Uptodate. Hence my patch. > > BTW: Very similar code is in fs/nfs/objlayout/objio_osd.c::__r4w_get_page Indeed, the patch makes the same adjustment to that code too. > > > Thanks, > > Hugh > > > <> > > Thanks > Boaz -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11/03/2015 04:49 AM, Hugh Dickins wrote: > On Mon, 2 Nov 2015, Boaz Harrosh wrote: >> On 11/02/2015 01:39 AM, Hugh Dickins wrote: >> <> >>>> This patch is not correct! >>> >>> I think you have actually confirmed that the patch is correct: >>> why bother to test PageDirty or PageWriteback when PageUptodate >>> already tells you what you need? >>> >>> Or do these filesystems do something unusual with PageUptodate >>> when PageDirty is set? I didn't find it. >>> >> >> This is kind of delicate stuff. It took me a while to get it right >> when I did it. I don't remember all the details. >> >> But consider this option: > > Thanks, yes, it helps to have a concrete example in front of us. > >> >> exofs_write_begin on a full PAGE_CACHE_SIZE, the page is instantiated >> new in page-cache is that PageUptodate(page) then? I thought not. > > Right, PageUptodate must not be set until the page has been filled with > the correct data. Nor is PageDirty or PageWriteback set at this point, > actually. > > Once page is filled with the correct data, either exofs_write_end() > (which uses simple_write_end()) or (internally) exofs_commit_chunk() > is called. > >> (exofs does not set that) > > It's simple_write_end() or exofs_commit_chunk() which SetPageUptodate > in this case. And after that each calls set_page_dirty(), which does > the SetPageDirty, before unlocking the page which was supplied locked > by exofs_write_begin(). > > So I don't see where the page is PageDirty without being PageUptodate. > >> >> Now that page I do not want to read in. The latest data is in memory. >> (Same when this page is in writeback, dirty-bit is cleared) > > Understood, but that's what PageUptodate is for. > > (Quite what happens if there's a write error is not so clear: I think > that typically PageError gets set and PageUptodate cleared, to read > back in from disk what's actually there - but lose the data we wanted > to write; but I can understand different filesystems making different > choices there, and didn't study exofs's choice.) > >> >> So for sure if page is dirty or writeback then we surly do not need a read. >> only if not then we need to consider the PageUptodate(page) state. > > PageUptodate is the proper flag to check, to ask if the page contains > the correct data: there is no need to consider Dirty or Writeback. > >> >> Do you think the code is actually wrong as is? > > Not that I know of: just a little too complicated and confusing. > > But becomes slightly wrong if my simplification to page migration > goes through, since that introduces an instant when PageDirty is set > before the new page contains the correct data and is marked Uptodate. > Hence my patch. > >> >> BTW: Very similar code is in fs/nfs/objlayout/objio_osd.c::__r4w_get_page > > Indeed, the patch makes the same adjustment to that code too. > OK thanks. Let me setup and test your patch. On top of 4.3 is good? I'll send you a tested-by once I'm done. Boaz >> >>> Thanks, >>> Hugh >>> >> <> >> >> Thanks >> Boaz > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 3 Nov 2015, Boaz Harrosh wrote: > On 11/03/2015 04:49 AM, Hugh Dickins wrote: > > On Mon, 2 Nov 2015, Boaz Harrosh wrote: > >> > >> Do you think the code is actually wrong as is? > > > > Not that I know of: just a little too complicated and confusing. > > > > But becomes slightly wrong if my simplification to page migration > > goes through, since that introduces an instant when PageDirty is set > > before the new page contains the correct data and is marked Uptodate. > > Hence my patch. > > > >> > >> BTW: Very similar code is in fs/nfs/objlayout/objio_osd.c::__r4w_get_page > > > > Indeed, the patch makes the same adjustment to that code too. > > > > OK thanks. Let me setup and test your patch. On top of 4.3 is good? > I'll send you a tested-by once I'm done. Great, thanks Boaz, that will help a lot. On top of 4.3 very good. (Of course, that will not test the rare page migration race I shall introduce later; but it will test that you're doing the right thing with this change at your end, when you will be safe against the race.) Hugh -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
--- 4.3-next/fs/exofs/inode.c 2015-08-30 11:34:09.000000000 -0700 +++ linux/fs/exofs/inode.c 2015-10-28 16:55:18.795554294 -0700 @@ -592,10 +592,7 @@ static struct page *__r4w_get_page(void } unlock_page(page); } - if (PageDirty(page) || PageWriteback(page)) - *uptodate = true; - else - *uptodate = PageUptodate(page); + *uptodate = PageUptodate(page); EXOFS_DBGMSG2("index=0x%lx uptodate=%d\n", index, *uptodate); return page; } else { --- 4.3-next/fs/nfs/objlayout/objio_osd.c 2015-10-21 18:35:07.620645439 -0700 +++ linux/fs/nfs/objlayout/objio_osd.c 2015-10-28 16:53:55.083686639 -0700 @@ -476,10 +476,7 @@ static struct page *__r4w_get_page(void } unlock_page(page); } - if (PageDirty(page) || PageWriteback(page)) - *uptodate = true; - else - *uptodate = PageUptodate(page); + *uptodate = PageUptodate(page); dprintk("%s: index=0x%lx uptodate=%d\n", __func__, index, *uptodate); return page; }
Patch "mm: migrate dirty page without clear_page_dirty_for_io etc", presently staged in mmotm and linux-next, simplifies the migration of a PageDirty pagecache page: one stat needs moving from zone to zone and that's about all. It's convenient and safest for it to shift the PageDirty bit from old page to new, just before updating the zone stats: before copying data and marking the new PageUptodate. This is all done while both pages are isolated and locked, just as before; and just as before, there's a moment when the new page is visible in the radix_tree, but not yet PageUptodate. What's new is that it may now be briefly visible as PageDirty before it is PageUptodate. When I scoured the tree to see if this could cause a problem anywhere, the only places I found were in two similar functions __r4w_get_page(): which look up a page with find_get_page() (not using page lock), then claim it's uptodate if it's PageDirty or PageWriteback or PageUptodate. I'm not sure whether that was right before, but now it might be wrong (on rare occasions): only claim the page is uptodate if PageUptodate. Or perhaps the page in question could never be migratable anyway? Signed-off-by: Hugh Dickins <hughd@google.com> --- fs/exofs/inode.c | 5 +---- fs/nfs/objlayout/objio_osd.c | 5 +---- 2 files changed, 2 insertions(+), 8 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html