fix page_count in ->iomap_migrate_page()
diff mbox series

Message ID 1544766961-3492-1-git-send-email-openzhangj@gmail.com
State New
Headers show
Series
  • fix page_count in ->iomap_migrate_page()
Related show

Commit Message

zhangjun Dec. 14, 2018, 5:56 a.m. UTC
IOMAP uses PG_private a little different with buffer_head based
filesystem.
It uses it as marker and when set, the page counter is not incremented,
migrate_page_move_mapping() assumes that PG_private indicates a counter
of +1.
so, we have to pass a extra count of -1 to migrate_page_move_mapping()
if the flag is set.

Signed-off-by: zhangjun <openzhangj@gmail.com>
---
 fs/iomap.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Richard Weinberger Dec. 14, 2018, 11:25 a.m. UTC | #1
[CC'ing authors of the code plus mm folks]

Am Freitag, 14. Dezember 2018, 06:56:01 CET schrieb zhangjun:
> IOMAP uses PG_private a little different with buffer_head based
> filesystem.
> It uses it as marker and when set, the page counter is not incremented,
> migrate_page_move_mapping() assumes that PG_private indicates a counter
> of +1.
> so, we have to pass a extra count of -1 to migrate_page_move_mapping()
> if the flag is set.
> 
> Signed-off-by: zhangjun <openzhangj@gmail.com>
> ---
>  fs/iomap.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 64ce240..352e58a 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -544,8 +544,17 @@ iomap_migrate_page(struct address_space *mapping, struct page *newpage,
>  		struct page *page, enum migrate_mode mode)
>  {
>  	int ret;
> +	int extra_count = 0;
>  
> -	ret = migrate_page_move_mapping(mapping, newpage, page, NULL, mode, 0);
> +	/*
> +	 * IOMAP uses PG_private as marker and does not raise the page counter.
> +	 * migrate_page_move_mapping() expects a incremented counter if PG_private
> +	 * is set. Therefore pass -1 as extra_count for this case.
> +	 */
> +	if (page_has_private(page))
> +		extra_count = -1;
> +	ret = migrate_page_move_mapping(mapping, newpage, page,
> +		       NULL, mode, extra_count);
>  	if (ret != MIGRATEPAGE_SUCCESS)
>  		return ret;

This is the third place which needs this workaround.
UBIFS, F2FS, and now iomap.

I agree with Dave that nobody can assume that PG_private implies an additional
page reference.
But page migration does that. Including parts of the write back code.

Thanks,
//richard
Gao Xiang Dec. 14, 2018, 12:26 p.m. UTC | #2
Hi Richard,

On 2018/12/14 19:25, Richard Weinberger wrote:
> This is the third place which needs this workaround.
> UBIFS, F2FS, and now iomap.
> 
> I agree with Dave that nobody can assume that PG_private implies an additional
> page reference.
> But page migration does that. Including parts of the write back code.

It seems that it's clearly documented in
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/mm.h#n780

 * A pagecache page contains an opaque `private' member, which belongs to the
 * page's address_space. Usually, this is the address of a circular list of
 * the page's disk buffers. PG_private must be set to tell the VM to call
 * into the filesystem to release these pages.
 *
 * A page may belong to an inode's memory mapping. In this case, page->mapping
 * is the pointer to the inode, and page->index is the file offset of the page,
 * in units of PAGE_SIZE.
 *
 * If pagecache pages are not associated with an inode, they are said to be
 * anonymous pages. These may become associated with the swapcache, and in that
 * case PG_swapcache is set, and page->private is an offset into the swapcache.
 *
 * In either case (swapcache or inode backed), the pagecache itself holds one
 * reference to the page. Setting PG_private should also increment the
 * refcount. The each user mapping also has a reference to the page.

and when I looked into that, I found
https://lore.kernel.org/lkml/3CB3CA93.D141680B@zip.com.au/


Thanks,
Gao Xiang
Richard Weinberger Dec. 14, 2018, 1:35 p.m. UTC | #3
Am Freitag, 14. Dezember 2018, 13:26:28 CET schrieb Gao Xiang:
> Hi Richard,
> 
> On 2018/12/14 19:25, Richard Weinberger wrote:
> > This is the third place which needs this workaround.
> > UBIFS, F2FS, and now iomap.
> > 
> > I agree with Dave that nobody can assume that PG_private implies an additional
> > page reference.
> > But page migration does that. Including parts of the write back code.
> 
> It seems that it's clearly documented in
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/mm.h#n780
> 
>  * A pagecache page contains an opaque `private' member, which belongs to the
>  * page's address_space. Usually, this is the address of a circular list of
>  * the page's disk buffers. PG_private must be set to tell the VM to call
>  * into the filesystem to release these pages.
>  *
>  * A page may belong to an inode's memory mapping. In this case, page->mapping
>  * is the pointer to the inode, and page->index is the file offset of the page,
>  * in units of PAGE_SIZE.
>  *
>  * If pagecache pages are not associated with an inode, they are said to be
>  * anonymous pages. These may become associated with the swapcache, and in that
>  * case PG_swapcache is set, and page->private is an offset into the swapcache.
>  *
>  * In either case (swapcache or inode backed), the pagecache itself holds one
>  * reference to the page. Setting PG_private should also increment the
>  * refcount. The each user mapping also has a reference to the page.
> 
> and when I looked into that, I found
> https://lore.kernel.org/lkml/3CB3CA93.D141680B@zip.com.au/

Hmm, in case of UBIFS it seems easy. We can add a get/put_page() around setting/clearing
the flag.
I did that now and so far none of my tests exploded.

Artem, do you remember why UBIFS never raised the page counter when setting PG_private?

Thanks,
//richard
Gao Xiang Dec. 14, 2018, 1:55 p.m. UTC | #4
Hi Richard,

On 2018/12/14 21:35, Richard Weinberger wrote:
> Hmm, in case of UBIFS it seems easy. We can add a get/put_page() around setting/clearing
> the flag.
> I did that now and so far none of my tests exploded.

Yes, many existed codes are based on this restriction in order to be freeable race-free.
and that's it since PG_Private was once introduced at first by Andrew Morton in 2002
for many Linux versions....and it's not bad I think... :)

Thanks,
Gao Xiang
Gao Xiang Dec. 15, 2018, 4:26 a.m. UTC | #5
On 2018/12/14 13:56, zhangjun wrote:
> IOMAP uses PG_private a little different with buffer_head based
> filesystem.
> It uses it as marker and when set, the page counter is not incremented,
> migrate_page_move_mapping() assumes that PG_private indicates a counter
> of +1.
> so, we have to pass a extra count of -1 to migrate_page_move_mapping()
> if the flag is set.
> 
> Signed-off-by: zhangjun <openzhangj@gmail.com>
> ---

I found that it fixed in
https://patchwork.kernel.org/patch/10684835/
and has been merged in
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=61c6de667263184125d5ca75e894fcad632b0dd3

It seems it has been corrected by Piotr.

Thanks,
Gao Xiang
Christoph Hellwig Dec. 15, 2018, 10:51 a.m. UTC | #6
FYI, for iomap we got a patch to just increment the page count when
setting the private data, and it finally got merged into mainline after
a while.

Not that it totally makes sense to me, but it is what it is.  It would
just be nice if set_page_private took care of it and we had a
clear_page_private to undo it, making the whole scheme at lot more
obvious.
Richard Weinberger Dec. 15, 2018, 11:17 a.m. UTC | #7
Am Samstag, 15. Dezember 2018, 11:51:12 CET schrieb Christoph Hellwig:
> FYI, for iomap we got a patch to just increment the page count when
> setting the private data, and it finally got merged into mainline after
> a while.
> 
> Not that it totally makes sense to me, but it is what it is.  It would
> just be nice if set_page_private took care of it and we had a
> clear_page_private to undo it, making the whole scheme at lot more
> obvious.

Yeah, UBIFS will go the same route. I have already a patch prepared
which increments the page count when UBIFS sets PG_private.

Thanks,
//richard

Patch
diff mbox series

diff --git a/fs/iomap.c b/fs/iomap.c
index 64ce240..352e58a 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -544,8 +544,17 @@  iomap_migrate_page(struct address_space *mapping, struct page *newpage,
 		struct page *page, enum migrate_mode mode)
 {
 	int ret;
+	int extra_count = 0;
 
-	ret = migrate_page_move_mapping(mapping, newpage, page, NULL, mode, 0);
+	/*
+	 * IOMAP uses PG_private as marker and does not raise the page counter.
+	 * migrate_page_move_mapping() expects a incremented counter if PG_private
+	 * is set. Therefore pass -1 as extra_count for this case.
+	 */
+	if (page_has_private(page))
+		extra_count = -1;
+	ret = migrate_page_move_mapping(mapping, newpage, page,
+		       NULL, mode, extra_count);
 	if (ret != MIGRATEPAGE_SUCCESS)
 		return ret;