Message ID | 1544766961-3492-1-git-send-email-openzhangj@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fix page_count in ->iomap_migrate_page() | expand |
[CC'ing authors of the code plus mm folks] Am Freitag, 14. Dezember 2018, 06:56:01 CET schrieb zhangjun: > IOMAP uses PG_private a little different with buffer_head based > filesystem. > It uses it as marker and when set, the page counter is not incremented, > migrate_page_move_mapping() assumes that PG_private indicates a counter > of +1. > so, we have to pass a extra count of -1 to migrate_page_move_mapping() > if the flag is set. > > Signed-off-by: zhangjun <openzhangj@gmail.com> > --- > fs/iomap.c | 11 ++++++++++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/fs/iomap.c b/fs/iomap.c > index 64ce240..352e58a 100644 > --- a/fs/iomap.c > +++ b/fs/iomap.c > @@ -544,8 +544,17 @@ iomap_migrate_page(struct address_space *mapping, struct page *newpage, > struct page *page, enum migrate_mode mode) > { > int ret; > + int extra_count = 0; > > - ret = migrate_page_move_mapping(mapping, newpage, page, NULL, mode, 0); > + /* > + * IOMAP uses PG_private as marker and does not raise the page counter. > + * migrate_page_move_mapping() expects a incremented counter if PG_private > + * is set. Therefore pass -1 as extra_count for this case. > + */ > + if (page_has_private(page)) > + extra_count = -1; > + ret = migrate_page_move_mapping(mapping, newpage, page, > + NULL, mode, extra_count); > if (ret != MIGRATEPAGE_SUCCESS) > return ret; This is the third place which needs this workaround. UBIFS, F2FS, and now iomap. I agree with Dave that nobody can assume that PG_private implies an additional page reference. But page migration does that. Including parts of the write back code. Thanks, //richard
Hi Richard, On 2018/12/14 19:25, Richard Weinberger wrote: > This is the third place which needs this workaround. > UBIFS, F2FS, and now iomap. > > I agree with Dave that nobody can assume that PG_private implies an additional > page reference. > But page migration does that. Including parts of the write back code. It seems that it's clearly documented in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/mm.h#n780 * A pagecache page contains an opaque `private' member, which belongs to the * page's address_space. Usually, this is the address of a circular list of * the page's disk buffers. PG_private must be set to tell the VM to call * into the filesystem to release these pages. * * A page may belong to an inode's memory mapping. In this case, page->mapping * is the pointer to the inode, and page->index is the file offset of the page, * in units of PAGE_SIZE. * * If pagecache pages are not associated with an inode, they are said to be * anonymous pages. These may become associated with the swapcache, and in that * case PG_swapcache is set, and page->private is an offset into the swapcache. * * In either case (swapcache or inode backed), the pagecache itself holds one * reference to the page. Setting PG_private should also increment the * refcount. The each user mapping also has a reference to the page. and when I looked into that, I found https://lore.kernel.org/lkml/3CB3CA93.D141680B@zip.com.au/ Thanks, Gao Xiang
Am Freitag, 14. Dezember 2018, 13:26:28 CET schrieb Gao Xiang: > Hi Richard, > > On 2018/12/14 19:25, Richard Weinberger wrote: > > This is the third place which needs this workaround. > > UBIFS, F2FS, and now iomap. > > > > I agree with Dave that nobody can assume that PG_private implies an additional > > page reference. > > But page migration does that. Including parts of the write back code. > > It seems that it's clearly documented in > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/mm.h#n780 > > * A pagecache page contains an opaque `private' member, which belongs to the > * page's address_space. Usually, this is the address of a circular list of > * the page's disk buffers. PG_private must be set to tell the VM to call > * into the filesystem to release these pages. > * > * A page may belong to an inode's memory mapping. In this case, page->mapping > * is the pointer to the inode, and page->index is the file offset of the page, > * in units of PAGE_SIZE. > * > * If pagecache pages are not associated with an inode, they are said to be > * anonymous pages. These may become associated with the swapcache, and in that > * case PG_swapcache is set, and page->private is an offset into the swapcache. > * > * In either case (swapcache or inode backed), the pagecache itself holds one > * reference to the page. Setting PG_private should also increment the > * refcount. The each user mapping also has a reference to the page. > > and when I looked into that, I found > https://lore.kernel.org/lkml/3CB3CA93.D141680B@zip.com.au/ Hmm, in case of UBIFS it seems easy. We can add a get/put_page() around setting/clearing the flag. I did that now and so far none of my tests exploded. Artem, do you remember why UBIFS never raised the page counter when setting PG_private? Thanks, //richard
Hi Richard, On 2018/12/14 21:35, Richard Weinberger wrote: > Hmm, in case of UBIFS it seems easy. We can add a get/put_page() around setting/clearing > the flag. > I did that now and so far none of my tests exploded. Yes, many existed codes are based on this restriction in order to be freeable race-free. and that's it since PG_Private was once introduced at first by Andrew Morton in 2002 for many Linux versions....and it's not bad I think... :) Thanks, Gao Xiang
On 2018/12/14 13:56, zhangjun wrote: > IOMAP uses PG_private a little different with buffer_head based > filesystem. > It uses it as marker and when set, the page counter is not incremented, > migrate_page_move_mapping() assumes that PG_private indicates a counter > of +1. > so, we have to pass a extra count of -1 to migrate_page_move_mapping() > if the flag is set. > > Signed-off-by: zhangjun <openzhangj@gmail.com> > --- I found that it fixed in https://patchwork.kernel.org/patch/10684835/ and has been merged in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=61c6de667263184125d5ca75e894fcad632b0dd3 It seems it has been corrected by Piotr. Thanks, Gao Xiang
FYI, for iomap we got a patch to just increment the page count when setting the private data, and it finally got merged into mainline after a while. Not that it totally makes sense to me, but it is what it is. It would just be nice if set_page_private took care of it and we had a clear_page_private to undo it, making the whole scheme at lot more obvious.
Am Samstag, 15. Dezember 2018, 11:51:12 CET schrieb Christoph Hellwig: > FYI, for iomap we got a patch to just increment the page count when > setting the private data, and it finally got merged into mainline after > a while. > > Not that it totally makes sense to me, but it is what it is. It would > just be nice if set_page_private took care of it and we had a > clear_page_private to undo it, making the whole scheme at lot more > obvious. Yeah, UBIFS will go the same route. I have already a patch prepared which increments the page count when UBIFS sets PG_private. Thanks, //richard
diff --git a/fs/iomap.c b/fs/iomap.c index 64ce240..352e58a 100644 --- a/fs/iomap.c +++ b/fs/iomap.c @@ -544,8 +544,17 @@ iomap_migrate_page(struct address_space *mapping, struct page *newpage, struct page *page, enum migrate_mode mode) { int ret; + int extra_count = 0; - ret = migrate_page_move_mapping(mapping, newpage, page, NULL, mode, 0); + /* + * IOMAP uses PG_private as marker and does not raise the page counter. + * migrate_page_move_mapping() expects a incremented counter if PG_private + * is set. Therefore pass -1 as extra_count for this case. + */ + if (page_has_private(page)) + extra_count = -1; + ret = migrate_page_move_mapping(mapping, newpage, page, + NULL, mode, extra_count); if (ret != MIGRATEPAGE_SUCCESS) return ret;
IOMAP uses PG_private a little different with buffer_head based filesystem. It uses it as marker and when set, the page counter is not incremented, migrate_page_move_mapping() assumes that PG_private indicates a counter of +1. so, we have to pass a extra count of -1 to migrate_page_move_mapping() if the flag is set. Signed-off-by: zhangjun <openzhangj@gmail.com> --- fs/iomap.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)