diff mbox

osd fs: __r4w_get_page rely on PageUptodate for uptodate

Message ID alpine.LSU.2.11.1510291137430.3369@eggly.anvils (mailing list archive)
State New, archived
Headers show

Commit Message

Hugh Dickins Oct. 29, 2015, 6:43 p.m. UTC
Patch "mm: migrate dirty page without clear_page_dirty_for_io etc",
presently staged in mmotm and linux-next, simplifies the migration of
a PageDirty pagecache page: one stat needs moving from zone to zone
and that's about all.

It's convenient and safest for it to shift the PageDirty bit from old
page to new, just before updating the zone stats: before copying data
and marking the new PageUptodate.  This is all done while both pages
are isolated and locked, just as before; and just as before, there's
a moment when the new page is visible in the radix_tree, but not yet
PageUptodate.  What's new is that it may now be briefly visible as
PageDirty before it is PageUptodate.

When I scoured the tree to see if this could cause a problem anywhere,
the only places I found were in two similar functions __r4w_get_page():
which look up a page with find_get_page() (not using page lock), then
claim it's uptodate if it's PageDirty or PageWriteback or PageUptodate.

I'm not sure whether that was right before, but now it might be wrong
(on rare occasions): only claim the page is uptodate if PageUptodate.
Or perhaps the page in question could never be migratable anyway?

Signed-off-by: Hugh Dickins <hughd@google.com>
---

 fs/exofs/inode.c             |    5 +----
 fs/nfs/objlayout/objio_osd.c |    5 +----
 2 files changed, 2 insertions(+), 8 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Boaz Harrosh Nov. 1, 2015, 10 a.m. UTC | #1
On 10/29/2015 08:43 PM, Hugh Dickins wrote:
> Patch "mm: migrate dirty page without clear_page_dirty_for_io etc",
> presently staged in mmotm and linux-next, simplifies the migration of
> a PageDirty pagecache page: one stat needs moving from zone to zone
> and that's about all.
> 
> It's convenient and safest for it to shift the PageDirty bit from old
> page to new, just before updating the zone stats: before copying data
> and marking the new PageUptodate.  This is all done while both pages
> are isolated and locked, just as before; and just as before, there's
> a moment when the new page is visible in the radix_tree, but not yet
> PageUptodate.  What's new is that it may now be briefly visible as
> PageDirty before it is PageUptodate.
> 
> When I scoured the tree to see if this could cause a problem anywhere,
> the only places I found were in two similar functions __r4w_get_page():
> which look up a page with find_get_page() (not using page lock), then
> claim it's uptodate if it's PageDirty or PageWriteback or PageUptodate.
> 
> I'm not sure whether that was right before, but now it might be wrong
> (on rare occasions): only claim the page is uptodate if PageUptodate.
> Or perhaps the page in question could never be migratable anyway?
> 

Hi Sir Hugh

I'm sorry, I admit the code is clear as mud, but your patch below is wrong.

The *uptodate return from __r4w_get_page is not really "up-to-date" at all
actually it means: "do we need to read the page from storage" writable/dirty pages
we do not read from storage but use the newest data in memory.

r4w means read-for-write which is when we need to bring in the full stripe to
re-calculate raid5/6 . (when only the partial stripe is written)

The scenario below of: "briefly visible as PageDirty before it is PageUptodate"
is fine in this case because in both cases we do not need to read the page.

Thanks for looking
Boaz

> Signed-off-by: Hugh Dickins <hughd@google.com>

This patch is not correct!

> ---
> 
>  fs/exofs/inode.c             |    5 +----
>  fs/nfs/objlayout/objio_osd.c |    5 +----
>  2 files changed, 2 insertions(+), 8 deletions(-)
> 
> --- 4.3-next/fs/exofs/inode.c	2015-08-30 11:34:09.000000000 -0700
> +++ linux/fs/exofs/inode.c	2015-10-28 16:55:18.795554294 -0700
> @@ -592,10 +592,7 @@ static struct page *__r4w_get_page(void
>  			}
>  			unlock_page(page);
>  		}
> -		if (PageDirty(page) || PageWriteback(page))
> -			*uptodate = true;
> -		else
> -			*uptodate = PageUptodate(page);
> +		*uptodate = PageUptodate(page);
>  		EXOFS_DBGMSG2("index=0x%lx uptodate=%d\n", index, *uptodate);
>  		return page;
>  	} else {
> --- 4.3-next/fs/nfs/objlayout/objio_osd.c	2015-10-21 18:35:07.620645439 -0700
> +++ linux/fs/nfs/objlayout/objio_osd.c	2015-10-28 16:53:55.083686639 -0700
> @@ -476,10 +476,7 @@ static struct page *__r4w_get_page(void
>  		}
>  		unlock_page(page);
>  	}
> -	if (PageDirty(page) || PageWriteback(page))
> -		*uptodate = true;
> -	else
> -		*uptodate = PageUptodate(page);
> +	*uptodate = PageUptodate(page);
>  	dprintk("%s: index=0x%lx uptodate=%d\n", __func__, index, *uptodate);
>  	return page;
>  }
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hugh Dickins Nov. 1, 2015, 11:39 p.m. UTC | #2
On Sun, 1 Nov 2015, Boaz Harrosh wrote:
> On 10/29/2015 08:43 PM, Hugh Dickins wrote:
> > Patch "mm: migrate dirty page without clear_page_dirty_for_io etc",
> > presently staged in mmotm and linux-next, simplifies the migration of
> > a PageDirty pagecache page: one stat needs moving from zone to zone
> > and that's about all.
> > 
> > It's convenient and safest for it to shift the PageDirty bit from old
> > page to new, just before updating the zone stats: before copying data
> > and marking the new PageUptodate.  This is all done while both pages
> > are isolated and locked, just as before; and just as before, there's
> > a moment when the new page is visible in the radix_tree, but not yet
> > PageUptodate.  What's new is that it may now be briefly visible as
> > PageDirty before it is PageUptodate.
> > 
> > When I scoured the tree to see if this could cause a problem anywhere,
> > the only places I found were in two similar functions __r4w_get_page():
> > which look up a page with find_get_page() (not using page lock), then
> > claim it's uptodate if it's PageDirty or PageWriteback or PageUptodate.
> > 
> > I'm not sure whether that was right before, but now it might be wrong
> > (on rare occasions): only claim the page is uptodate if PageUptodate.
> > Or perhaps the page in question could never be migratable anyway?
> > 
> 
> Hi Sir Hugh

Hi Boaz - please pardon my informality :)

> 
> I'm sorry, I admit the code is clear as mud, but your patch below is wrong.
> 
> The *uptodate return from __r4w_get_page is not really "up-to-date" at all
> actually it means: "do we need to read the page from storage" writable/dirty pages
> we do not read from storage but use the newest data in memory.
> 
> r4w means read-for-write which is when we need to bring in the full stripe to
> re-calculate raid5/6 . (when only the partial stripe is written)

Yes, that's what I understood from the code too, and how PageUptodate
is usually used: it allows the caller to bypass the overhead of locking
the page, rechecking PageUptodate, and reading it in if still necessary.

> 
> The scenario below of: "briefly visible as PageDirty before it is PageUptodate"
> is fine in this case because in both cases we do not need to read the page.

But when do you think you have a PageDirty (or PageWriteback) page which
is not PageUptodate?  We do not ClearPageUptodate when a page is modified.

PageUptodate normally remains set for as long as that page remains caching
that offset of the file.  I think it's true to say that PageUptodate is
only cleared when an error, or sometimes an invalidation, occurs (or of
course when the page is freed for reuse).

I was going to suggest that you check through the places which
ClearPageUptodate, but that is rather a confusing exercise, since I think
the majority of them are actually redundant - pages don't come from the
allocator with PageUptodate set, and a filesystem would already be in
trouble if it set PageUptodate before the page was initialized (usually
by reading its data in from disk).  So I think those ClearPageUptodates
on read error are redundant; though I'm not daring to remove them
(and they have no bearing on this patch at hand).

> 
> Thanks for looking
> Boaz
> 
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> 
> This patch is not correct!

I think you have actually confirmed that the patch is correct:
why bother to test PageDirty or PageWriteback when PageUptodate
already tells you what you need?

Or do these filesystems do something unusual with PageUptodate
when PageDirty is set?  I didn't find it.

Thanks,
Hugh

> 
> > ---
> > 
> >  fs/exofs/inode.c             |    5 +----
> >  fs/nfs/objlayout/objio_osd.c |    5 +----
> >  2 files changed, 2 insertions(+), 8 deletions(-)
> > 
> > --- 4.3-next/fs/exofs/inode.c	2015-08-30 11:34:09.000000000 -0700
> > +++ linux/fs/exofs/inode.c	2015-10-28 16:55:18.795554294 -0700
> > @@ -592,10 +592,7 @@ static struct page *__r4w_get_page(void
> >  			}
> >  			unlock_page(page);
> >  		}
> > -		if (PageDirty(page) || PageWriteback(page))
> > -			*uptodate = true;
> > -		else
> > -			*uptodate = PageUptodate(page);
> > +		*uptodate = PageUptodate(page);
> >  		EXOFS_DBGMSG2("index=0x%lx uptodate=%d\n", index, *uptodate);
> >  		return page;
> >  	} else {
> > --- 4.3-next/fs/nfs/objlayout/objio_osd.c	2015-10-21 18:35:07.620645439 -0700
> > +++ linux/fs/nfs/objlayout/objio_osd.c	2015-10-28 16:53:55.083686639 -0700
> > @@ -476,10 +476,7 @@ static struct page *__r4w_get_page(void
> >  		}
> >  		unlock_page(page);
> >  	}
> > -	if (PageDirty(page) || PageWriteback(page))
> > -		*uptodate = true;
> > -	else
> > -		*uptodate = PageUptodate(page);
> > +	*uptodate = PageUptodate(page);
> >  	dprintk("%s: index=0x%lx uptodate=%d\n", __func__, index, *uptodate);
> >  	return page;
> >  }
> > 
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Boaz Harrosh Nov. 2, 2015, 11:05 a.m. UTC | #3
On 11/02/2015 01:39 AM, Hugh Dickins wrote:
<>
>> This patch is not correct!
> 
> I think you have actually confirmed that the patch is correct:
> why bother to test PageDirty or PageWriteback when PageUptodate
> already tells you what you need?
> 
> Or do these filesystems do something unusual with PageUptodate
> when PageDirty is set?  I didn't find it.
> 

This is kind of delicate stuff. It took me a while to get it right
when I did it. I don't remember all the details.

But consider this option:

exofs_write_begin on a full PAGE_CACHE_SIZE, the page is instantiated
new in page-cache is that PageUptodate(page) then? I thought not.
(exofs does not set that)

Now that page I do not want to read in. The latest data is in memory.
(Same when this page is in writeback, dirty-bit is cleared)

So for sure if page is dirty or writeback then we surly do not need a read.
only if not then we need to consider the  PageUptodate(page) state.

Do you think the code is actually wrong as is?

BTW: Very similar code is in fs/nfs/objlayout/objio_osd.c::__r4w_get_page

> Thanks,
> Hugh
> 
<>

Thanks
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hugh Dickins Nov. 3, 2015, 2:49 a.m. UTC | #4
On Mon, 2 Nov 2015, Boaz Harrosh wrote:
> On 11/02/2015 01:39 AM, Hugh Dickins wrote:
> <>
> >> This patch is not correct!
> > 
> > I think you have actually confirmed that the patch is correct:
> > why bother to test PageDirty or PageWriteback when PageUptodate
> > already tells you what you need?
> > 
> > Or do these filesystems do something unusual with PageUptodate
> > when PageDirty is set?  I didn't find it.
> > 
> 
> This is kind of delicate stuff. It took me a while to get it right
> when I did it. I don't remember all the details.
> 
> But consider this option:

Thanks, yes, it helps to have a concrete example in front of us.

> 
> exofs_write_begin on a full PAGE_CACHE_SIZE, the page is instantiated
> new in page-cache is that PageUptodate(page) then? I thought not.

Right, PageUptodate must not be set until the page has been filled with
the correct data.  Nor is PageDirty or PageWriteback set at this point,
actually.

Once page is filled with the correct data, either exofs_write_end()
(which uses simple_write_end()) or (internally) exofs_commit_chunk()
is called.

> (exofs does not set that)

It's simple_write_end() or exofs_commit_chunk() which SetPageUptodate
in this case.  And after that each calls set_page_dirty(), which does
the SetPageDirty, before unlocking the page which was supplied locked
by exofs_write_begin().

So I don't see where the page is PageDirty without being PageUptodate.

> 
> Now that page I do not want to read in. The latest data is in memory.
> (Same when this page is in writeback, dirty-bit is cleared)

Understood, but that's what PageUptodate is for.

(Quite what happens if there's a write error is not so clear: I think
that typically PageError gets set and PageUptodate cleared, to read
back in from disk what's actually there - but lose the data we wanted
to write; but I can understand different filesystems making different
choices there, and didn't study exofs's choice.)

> 
> So for sure if page is dirty or writeback then we surly do not need a read.
> only if not then we need to consider the  PageUptodate(page) state.

PageUptodate is the proper flag to check, to ask if the page contains
the correct data: there is no need to consider Dirty or Writeback.

> 
> Do you think the code is actually wrong as is?

Not that I know of: just a little too complicated and confusing.

But becomes slightly wrong if my simplification to page migration
goes through, since that introduces an instant when PageDirty is set
before the new page contains the correct data and is marked Uptodate.
Hence my patch.

> 
> BTW: Very similar code is in fs/nfs/objlayout/objio_osd.c::__r4w_get_page

Indeed, the patch makes the same adjustment to that code too.

> 
> > Thanks,
> > Hugh
> > 
> <>
> 
> Thanks
> Boaz
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Boaz Harrosh Nov. 3, 2015, 9:24 a.m. UTC | #5
On 11/03/2015 04:49 AM, Hugh Dickins wrote:
> On Mon, 2 Nov 2015, Boaz Harrosh wrote:
>> On 11/02/2015 01:39 AM, Hugh Dickins wrote:
>> <>
>>>> This patch is not correct!
>>>
>>> I think you have actually confirmed that the patch is correct:
>>> why bother to test PageDirty or PageWriteback when PageUptodate
>>> already tells you what you need?
>>>
>>> Or do these filesystems do something unusual with PageUptodate
>>> when PageDirty is set?  I didn't find it.
>>>
>>
>> This is kind of delicate stuff. It took me a while to get it right
>> when I did it. I don't remember all the details.
>>
>> But consider this option:
> 
> Thanks, yes, it helps to have a concrete example in front of us.
> 
>>
>> exofs_write_begin on a full PAGE_CACHE_SIZE, the page is instantiated
>> new in page-cache is that PageUptodate(page) then? I thought not.
> 
> Right, PageUptodate must not be set until the page has been filled with
> the correct data.  Nor is PageDirty or PageWriteback set at this point,
> actually.
> 
> Once page is filled with the correct data, either exofs_write_end()
> (which uses simple_write_end()) or (internally) exofs_commit_chunk()
> is called.
> 
>> (exofs does not set that)
> 
> It's simple_write_end() or exofs_commit_chunk() which SetPageUptodate
> in this case.  And after that each calls set_page_dirty(), which does
> the SetPageDirty, before unlocking the page which was supplied locked
> by exofs_write_begin().
> 
> So I don't see where the page is PageDirty without being PageUptodate.
> 
>>
>> Now that page I do not want to read in. The latest data is in memory.
>> (Same when this page is in writeback, dirty-bit is cleared)
> 
> Understood, but that's what PageUptodate is for.
> 
> (Quite what happens if there's a write error is not so clear: I think
> that typically PageError gets set and PageUptodate cleared, to read
> back in from disk what's actually there - but lose the data we wanted
> to write; but I can understand different filesystems making different
> choices there, and didn't study exofs's choice.)
> 
>>
>> So for sure if page is dirty or writeback then we surly do not need a read.
>> only if not then we need to consider the  PageUptodate(page) state.
> 
> PageUptodate is the proper flag to check, to ask if the page contains
> the correct data: there is no need to consider Dirty or Writeback.
> 
>>
>> Do you think the code is actually wrong as is?
> 
> Not that I know of: just a little too complicated and confusing.
> 
> But becomes slightly wrong if my simplification to page migration
> goes through, since that introduces an instant when PageDirty is set
> before the new page contains the correct data and is marked Uptodate.
> Hence my patch.
> 
>>
>> BTW: Very similar code is in fs/nfs/objlayout/objio_osd.c::__r4w_get_page
> 
> Indeed, the patch makes the same adjustment to that code too.
> 

OK thanks. Let me setup and test your patch. On top of 4.3 is good?
I'll send you a tested-by once I'm done.

Boaz

>>
>>> Thanks,
>>> Hugh
>>>
>> <>
>>
>> Thanks
>> Boaz
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hugh Dickins Nov. 3, 2015, 3:39 p.m. UTC | #6
On Tue, 3 Nov 2015, Boaz Harrosh wrote:
> On 11/03/2015 04:49 AM, Hugh Dickins wrote:
> > On Mon, 2 Nov 2015, Boaz Harrosh wrote:
> >>
> >> Do you think the code is actually wrong as is?
> > 
> > Not that I know of: just a little too complicated and confusing.
> > 
> > But becomes slightly wrong if my simplification to page migration
> > goes through, since that introduces an instant when PageDirty is set
> > before the new page contains the correct data and is marked Uptodate.
> > Hence my patch.
> > 
> >>
> >> BTW: Very similar code is in fs/nfs/objlayout/objio_osd.c::__r4w_get_page
> > 
> > Indeed, the patch makes the same adjustment to that code too.
> > 
> 
> OK thanks. Let me setup and test your patch. On top of 4.3 is good?
> I'll send you a tested-by once I'm done.

Great, thanks Boaz, that will help a lot.  On top of 4.3 very good.

(Of course, that will not test the rare page migration race I shall
introduce later; but it will test that you're doing the right thing
with this change at your end, when you will be safe against the race.)

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- 4.3-next/fs/exofs/inode.c	2015-08-30 11:34:09.000000000 -0700
+++ linux/fs/exofs/inode.c	2015-10-28 16:55:18.795554294 -0700
@@ -592,10 +592,7 @@  static struct page *__r4w_get_page(void
 			}
 			unlock_page(page);
 		}
-		if (PageDirty(page) || PageWriteback(page))
-			*uptodate = true;
-		else
-			*uptodate = PageUptodate(page);
+		*uptodate = PageUptodate(page);
 		EXOFS_DBGMSG2("index=0x%lx uptodate=%d\n", index, *uptodate);
 		return page;
 	} else {
--- 4.3-next/fs/nfs/objlayout/objio_osd.c	2015-10-21 18:35:07.620645439 -0700
+++ linux/fs/nfs/objlayout/objio_osd.c	2015-10-28 16:53:55.083686639 -0700
@@ -476,10 +476,7 @@  static struct page *__r4w_get_page(void
 		}
 		unlock_page(page);
 	}
-	if (PageDirty(page) || PageWriteback(page))
-		*uptodate = true;
-	else
-		*uptodate = PageUptodate(page);
+	*uptodate = PageUptodate(page);
 	dprintk("%s: index=0x%lx uptodate=%d\n", __func__, index, *uptodate);
 	return page;
 }