Message ID | 20211009093724.10539-4-linyunsheng@huawei.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | some optimization for page pool | expand |
On 10/9/21 02:37, Yunsheng Lin wrote: > Introduce __get_page() and __put_page() to operate on the > base page or head of a compound page for the cases when a > page is known to be a base page or head of a compound page. Hi, I wonder if you are aware of a much larger, 137-patch seriesto do that: folio/pageset [1]? The naming you are proposing here does not really improve clarity. There is nothing about __get_page() that makes it clear that it's meant only for head/base pages, while get_page() tail pages as well. And the well-known and widely used get_page() and put_page() get their meaning shifted. This area is hard to get right, and that's why there have been 15 versions, and a lot of contention associated with [1]. If you have an alternate approach, I think it would be better in its own separate series, with a cover letter that, at a minimum, explains how it compares to folios/pagesets. So in case it's not clear, I'd like to request that you drop this one patch from your series. [1] https://lore.kernel.org/r/YSPwmNNuuQhXNToQ@casper.infradead.org thanks,
On Sat, Oct 09, 2021 at 12:49:29PM -0700, John Hubbard wrote: > On 10/9/21 02:37, Yunsheng Lin wrote: > > Introduce __get_page() and __put_page() to operate on the > > base page or head of a compound page for the cases when a > > page is known to be a base page or head of a compound page. > > Hi, > > I wonder if you are aware of a much larger, 137-patch seriesto do that: > folio/pageset [1]? > > The naming you are proposing here does not really improve clarity. There > is nothing about __get_page() that makes it clear that it's meant only > for head/base pages, while get_page() tail pages as well. And the > well-known and widely used get_page() and put_page() get their meaning > shifted. > > This area is hard to get right, and that's why there have been 15 > versions, and a lot of contention associated with [1]. If you have an > alternate approach, I think it would be better in its own separate > series, with a cover letter that, at a minimum, explains how it compares > to folios/pagesets. I wasn't initially sure whether network pagepools should be part of struct folio or should be their own separate type. At this point, I think they should be a folio. But that's all kind of irrelevant until Linus decides whether he's going to take the folio patchset or not. Feel free to let him know your opinion when the inevitable argument blows up again around the next pull request.
On 2021/10/10 4:15, Matthew Wilcox wrote: > On Sat, Oct 09, 2021 at 12:49:29PM -0700, John Hubbard wrote: >> On 10/9/21 02:37, Yunsheng Lin wrote: >>> Introduce __get_page() and __put_page() to operate on the >>> base page or head of a compound page for the cases when a >>> page is known to be a base page or head of a compound page. >> >> Hi, >> >> I wonder if you are aware of a much larger, 137-patch seriesto do that: >> folio/pageset [1]? >> >> The naming you are proposing here does not really improve clarity. There >> is nothing about __get_page() that makes it clear that it's meant only >> for head/base pages, while get_page() tail pages as well. And the >> well-known and widely used get_page() and put_page() get their meaning >> shifted. >> >> This area is hard to get right, and that's why there have been 15 >> versions, and a lot of contention associated with [1]. If you have an >> alternate approach, I think it would be better in its own separate >> series, with a cover letter that, at a minimum, explains how it compares >> to folios/pagesets. As I was not familiar enough with mm, so I tried the semantic of __page_frag_cache_drain(), which expects a base page or the head page of a compound page too. I suppose we may need to put a BUG_ON() to catch the case of user passing a tail page accidentally, which is a run time error and has run time overhead? And adding a new type like folio will allow the compiler to catch the error without any overhead? > > I wasn't initially sure whether network pagepools should be part of > struct folio or should be their own separate type. At this point, I Actually only a few driver are using page pool now, and others are mostly using page allocator directly, see page_frag_alloc_align() and skb_page_frag_refill(), only changing the page pool does not seems helpful here, maybe the whole network stack should be using a new type like folio, as the netstask does not need to deal with tail page directly? And it seems virt_to_page() is one of the things need handling when netstack is changed to use a new type like folio? >
On 09/10/2021 21.49, John Hubbard wrote: > So in case it's not clear, I'd like to request that you drop this one > patch from your series. In my opinion as page_pool maintainer, you should also drop patch 4/4 from this series. I like the first two patches, and they should be resend and can be applied without too much further discussion. --Jesper
On Mon, Oct 11, 2021 at 02:25:08PM +0200, Jesper Dangaard Brouer wrote: > > > On 09/10/2021 21.49, John Hubbard wrote: > > So in case it's not clear, I'd like to request that you drop this one > > patch from your series. > > In my opinion as page_pool maintainer, you should also drop patch 4/4 from > this series. > > I like the first two patches, and they should be resend and can be applied > without too much further discussion. +1 That's what I hinted on the previous version. The patches right now go way beyond the spec of page pool. We are starting to change core networking functions and imho we need a lot more people involved in this discussion, than the ones participating already. As a general note and the reason I am so hesitant, is that we are starting to violate layers here (at least in my opinion). When the recycling was added, my main concern was to keep the network stack unaware (apart from the skb bit). Now suddenly we need to teach frag_ref/unref internal page pool counters and that doesn't feel right. We first need to prove the race can actually happen, before starting to change things. Regards /Ilias > > --Jesper >
On 2021/10/11 20:29, Ilias Apalodimas wrote: > On Mon, Oct 11, 2021 at 02:25:08PM +0200, Jesper Dangaard Brouer wrote: >> >> >> On 09/10/2021 21.49, John Hubbard wrote: >>> So in case it's not clear, I'd like to request that you drop this one >>> patch from your series. >> >> In my opinion as page_pool maintainer, you should also drop patch 4/4 from >> this series. >> >> I like the first two patches, and they should be resend and can be applied >> without too much further discussion. > > +1 Ok, it seems there is a lot of contention about how to avoid calling compound_head() now. Will send out the uncontroversial one first. > That's what I hinted on the previous version. The patches right now go way > beyond the spec of page pool. We are starting to change core networking > functions and imho we need a lot more people involved in this discussion, > than the ones participating already. > > As a general note and the reason I am so hesitant, is that we are starting > to violate layers here (at least in my opinion). When the recycling was > added, my main concern was to keep the network stack unaware (apart from > the skb bit). Now suddenly we need to teach frag_ref/unref internal page Maybe the skb recycle bit is a clever way to avoid dealing with the network stack directly. But that bit might also introduce or hide some problem, like the data race as pointed out by Alexander, and the odd using of page pool in mlx5 driver. > pool counters and that doesn't feel right. We first need to prove the race > can actually happen, before starting to change things. As the network stack is adding a lot of performance improvement, such as sockmap for BPF, which may cause problem for them, will dig more to prove that. > > Regards > /Ilias >> >> --Jesper >> > . >
On Tue, Oct 12, 2021 at 03:38:15PM +0800, Yunsheng Lin wrote: > On 2021/10/11 20:29, Ilias Apalodimas wrote: > > On Mon, Oct 11, 2021 at 02:25:08PM +0200, Jesper Dangaard Brouer wrote: > >> > >> > >> On 09/10/2021 21.49, John Hubbard wrote: > >>> So in case it's not clear, I'd like to request that you drop this one > >>> patch from your series. > >> > >> In my opinion as page_pool maintainer, you should also drop patch 4/4 from > >> this series. > >> > >> I like the first two patches, and they should be resend and can be applied > >> without too much further discussion. > > > > +1 > > Ok, it seems there is a lot of contention about how to avoid calling > compound_head() now. > IMHO compound head is not that heavy. So you could keep the get/put page calls as-is and worry about micro optimizations later, especially since it's intersecting with folio changes atm. > Will send out the uncontroversial one first. > Thanks! > > That's what I hinted on the previous version. The patches right now go way > > beyond the spec of page pool. We are starting to change core networking > > functions and imho we need a lot more people involved in this discussion, > > than the ones participating already. > > > > As a general note and the reason I am so hesitant, is that we are starting > > to violate layers here (at least in my opinion). When the recycling was > > added, my main concern was to keep the network stack unaware (apart from > > the skb bit). Now suddenly we need to teach frag_ref/unref internal page > > Maybe the skb recycle bit is a clever way to avoid dealing with the network > stack directly. > > But that bit might also introduce or hide some problem, like the data race > as pointed out by Alexander, and the odd using of page pool in mlx5 driver. Yea. I was always wondering if unmaping the buffers and let the network stack deal with them eventually would be a good idea (on those special cases). There's an obvious disadvantage (which imho is terrible) in this approach. Any future functions that we add in the core networking code, will need to keep that in mindxi, and unmap some random driver memory if they start playing tricks with the skb and their fragments. IOW I think this is very fragile. > > > pool counters and that doesn't feel right. We first need to prove the race > > can actually happen, before starting to change things. > > As the network stack is adding a lot of performance improvement, such as > sockmap for BPF, which may cause problem for them, will dig more to prove > that. > Ok that's something we need to look at. Are those buffers freed eventually by skb_free_head() etc? Regards /Ilias
diff --git a/include/linux/mm.h b/include/linux/mm.h index 73a52aba448f..5683313c3e9d 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -902,7 +902,7 @@ static inline struct page *virt_to_head_page(const void *x) return compound_head(page); } -void __put_page(struct page *page); +void __put_single_or_compound_page(struct page *page); void put_pages_list(struct list_head *pages); @@ -1203,9 +1203,8 @@ static inline bool is_pci_p2pdma_page(const struct page *page) #define page_ref_zero_or_close_to_overflow(page) \ ((unsigned int) page_ref_count(page) + 127u <= 127u) -static inline void get_page(struct page *page) +static inline void __get_page(struct page *page) { - page = compound_head(page); /* * Getting a normal page or the head of a compound page * requires to already have an elevated page->_refcount. @@ -1214,6 +1213,11 @@ static inline void get_page(struct page *page) page_ref_inc(page); } +static inline void get_page(struct page *page) +{ + __get_page(compound_head(page)); +} + bool __must_check try_grab_page(struct page *page, unsigned int flags); struct page *try_grab_compound_head(struct page *page, int refs, unsigned int flags); @@ -1228,10 +1232,8 @@ static inline __must_check bool try_get_page(struct page *page) return true; } -static inline void put_page(struct page *page) +static inline void __put_page(struct page *page) { - page = compound_head(page); - /* * For devmap managed pages we need to catch refcount transition from * 2 to 1, when refcount reach one it means the page is free and we @@ -1244,7 +1246,12 @@ static inline void put_page(struct page *page) } if (put_page_testzero(page)) - __put_page(page); + __put_single_or_compound_page(page); +} + +static inline void put_page(struct page *page) +{ + __put_page(compound_head(page)); } /* diff --git a/mm/swap.c b/mm/swap.c index af3cad4e5378..565cbde1caea 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -111,7 +111,7 @@ static void __put_compound_page(struct page *page) destroy_compound_page(page); } -void __put_page(struct page *page) +void __put_single_or_compound_page(struct page *page) { if (is_zone_device_page(page)) { put_dev_pagemap(page->pgmap); @@ -128,7 +128,7 @@ void __put_page(struct page *page) else __put_single_page(page); } -EXPORT_SYMBOL(__put_page); +EXPORT_SYMBOL(__put_single_or_compound_page); /** * put_pages_list() - release a list of pages @@ -1153,7 +1153,7 @@ void put_devmap_managed_page(struct page *page) if (count == 1) free_devmap_managed_page(page); else if (!count) - __put_page(page); + __put_single_or_compound_page(page); } EXPORT_SYMBOL(put_devmap_managed_page); #endif
Introduce __get_page() and __put_page() to operate on the base page or head of a compound page for the cases when a page is known to be a base page or head of a compound page. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> --- include/linux/mm.h | 21 ++++++++++++++------- mm/swap.c | 6 +++--- 2 files changed, 17 insertions(+), 10 deletions(-)