Message ID | 20221207223731.32784-1-sidhartha.kumar@oracle.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [mm-unstable] mm: clarify folio_set_compound_order() zero support | expand |
On 12/7/22 14:37, Sidhartha Kumar wrote: > Document hugetlb's use of a zero compound order so support for zero > orders is not removed from folio_set_compound_order(). > > Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> > Suggested-by: Mike Kravetz <mike.kravetz@oracle.com> > Suggested-by: Muchun Song <songmuchun@bytedance.com> > --- > This can be folded into f2b67a51d0ef6871d4fb0c3e8199f278112bd108 > mm: add folio dtor and order setter functions > > include/linux/mm.h | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 443d496949a8..cd8508d728f1 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -999,9 +999,16 @@ static inline void set_compound_order(struct page *page, unsigned int order) > #endif > } > > +/* > + * folio_set_compound_order is generally passed a non-zero order to > + * initialize a large folio. However, hugetlb code abuses this by > + * passing in zero when 'dissolving' a large folio. > + */ Wouldn't it be better to instead just create a new function for that case, such as: dissolve_large_folio() ? > static inline void folio_set_compound_order(struct folio *folio, > unsigned int order) > { > + VM_BUG_ON_FOLIO(!folio_test_large(folio), folio); > + > folio->_folio_order = order; > #ifdef CONFIG_64BIT > folio->_folio_nr_pages = order ? 1U << order : 0; thanks,
On 12/7/22 4:38 PM, John Hubbard wrote: > On 12/7/22 14:37, Sidhartha Kumar wrote: >> Document hugetlb's use of a zero compound order so support for zero >> orders is not removed from folio_set_compound_order(). >> >> Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> >> Suggested-by: Mike Kravetz <mike.kravetz@oracle.com> >> Suggested-by: Muchun Song <songmuchun@bytedance.com> >> --- >> This can be folded into f2b67a51d0ef6871d4fb0c3e8199f278112bd108 >> mm: add folio dtor and order setter functions >> >> include/linux/mm.h | 7 +++++++ >> 1 file changed, 7 insertions(+) >> >> diff --git a/include/linux/mm.h b/include/linux/mm.h >> index 443d496949a8..cd8508d728f1 100644 >> --- a/include/linux/mm.h >> +++ b/include/linux/mm.h >> @@ -999,9 +999,16 @@ static inline void set_compound_order(struct page >> *page, unsigned int order) >> #endif >> } >> +/* >> + * folio_set_compound_order is generally passed a non-zero order to >> + * initialize a large folio. However, hugetlb code abuses this by >> + * passing in zero when 'dissolving' a large folio. >> + */ > > Wouldn't it be better to instead just create a new function for that > case, such as: > > dissolve_large_folio() > Prior to the folio conversion, the helper function __destroy_compound_gigantic_page() did: set_compound_order(page, 0); #ifdef CONFIG_64BIT page[1].compound_nr = 0; #endif as part of dissolving the page. My goal for this patch was to create a function that would encapsulate that segment of code with a single call of folio_set_compound_order(folio, 0). set_compound_order() does not set compound_nr to 0 when 0 is passed in to the order argument so explicitly setting it is required. I don't think a separate dissolve_large_folio() function for the hugetlb case is needed as __destroy_compound_gigantic_folio() is pretty concise as it is. > ? > >> static inline void folio_set_compound_order(struct folio *folio, >> unsigned int order) >> { >> + VM_BUG_ON_FOLIO(!folio_test_large(folio), folio); >> + >> folio->_folio_order = order; >> #ifdef CONFIG_64BIT >> folio->_folio_nr_pages = order ? 1U << order : 0; > > thanks,
On 12/7/22 17:42, Sidhartha Kumar wrote: >> Wouldn't it be better to instead just create a new function for that >> case, such as: >> >> dissolve_large_folio() >> > > Prior to the folio conversion, the helper function __destroy_compound_gigantic_page() did: > > set_compound_order(page, 0); > #ifdef CONFIG_64BIT > page[1].compound_nr = 0; > #endif > > as part of dissolving the page. My goal for this patch was to create a function that would encapsulate that segment of code with a single call of folio_set_compound_order(folio, 0). set_compound_order() does not set compound_nr to 0 when 0 is passed in to the order argument so explicitly setting it is required. I don't think a separate dissolve_large_folio() function for the hugetlb case is needed as __destroy_compound_gigantic_folio() is pretty concise as it is. > Instead of "this is abusing function X()" comments, we should prefer well-named functions that do something understandable. And you can get that by noticing that folio_set_compound_order() collapses down to nearly nothing in the special "order 0" case. So just inline that code directly into __destroy_compound_gigantic_folio(), taking a moment to fill in and consolidate the CONFIG_64BIT missing parts in mm.h. And now you can get rid of this cruft and "abuse" comment, and instead just end up with two simple lines of code that are crystal clear--as they should be, in a "__destroy" function. Like this: diff --git a/include/linux/mm.h b/include/linux/mm.h index 105878936485..cf227ed00945 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1754,6 +1754,7 @@ static inline void set_page_links(struct page *page, enum zone_type zone, #endif } +#ifdef CONFIG_64BIT /** * folio_nr_pages - The number of pages in the folio. * @folio: The folio. @@ -1764,13 +1765,32 @@ static inline long folio_nr_pages(struct folio *folio) { if (!folio_test_large(folio)) return 1; -#ifdef CONFIG_64BIT return folio->_folio_nr_pages; +} + +static inline void folio_set_nr_pages(struct folio *folio, long nr_pages) +{ + folio->_folio_nr_pages = nr_pages; +} #else +/** + * folio_nr_pages - The number of pages in the folio. + * @folio: The folio. + * + * Return: A positive power of two. + */ +static inline long folio_nr_pages(struct folio *folio) +{ + if (!folio_test_large(folio)) + return 1; return 1L << folio->_folio_order; -#endif } +static inline void folio_set_nr_pages(struct folio *folio, long nr_pages) +{ +} +#endif + /** * folio_next - Move to the next physical folio. * @folio: The folio we're currently operating on. diff --git a/mm/hugetlb.c b/mm/hugetlb.c index e3500c087893..b507a98063e6 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1344,7 +1344,8 @@ static void __destroy_compound_gigantic_folio(struct folio *folio, set_page_refcounted(p); } - folio_set_compound_order(folio, 0); + folio->_folio_order = 0; + folio_set_nr_pages(folio, 0); __folio_clear_head(folio); } Yes? thanks,
On Thu, Dec 8, 2022 at 10:27 AM John Hubbard <jhubbard@nvidia.com> wrote: > > On 12/7/22 17:42, Sidhartha Kumar wrote: > >> Wouldn't it be better to instead just create a new function for that > >> case, such as: > >> > >> dissolve_large_folio() > >> > > > > Prior to the folio conversion, the helper function __destroy_compound_gigantic_page() did: > > > > set_compound_order(page, 0); > > #ifdef CONFIG_64BIT > > page[1].compound_nr = 0; > > #endif > > > > as part of dissolving the page. My goal for this patch was to create a function that would encapsulate that segment of code with a single call of folio_set_compound_order(folio, 0). set_compound_order() does not set compound_nr to 0 when 0 is passed in to the order argument so explicitly setting it is required. I don't think a separate dissolve_large_folio() function for the hugetlb case is needed as __destroy_compound_gigantic_folio() is pretty concise as it is. > > > > Instead of "this is abusing function X()" comments, we should prefer > well-named functions that do something understandable. And you can get > that by noticing that folio_set_compound_order() collapses down to > nearly nothing in the special "order 0" case. So just inline that code > directly into __destroy_compound_gigantic_folio(), taking a moment to > fill in and consolidate the CONFIG_64BIT missing parts in mm.h. > > And now you can get rid of this cruft and "abuse" comment, and instead > just end up with two simple lines of code that are crystal clear--as > they should be, in a "__destroy" function. Like this: > > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 105878936485..cf227ed00945 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1754,6 +1754,7 @@ static inline void set_page_links(struct page *page, enum zone_type zone, > #endif > } > > +#ifdef CONFIG_64BIT > /** > * folio_nr_pages - The number of pages in the folio. > * @folio: The folio. > @@ -1764,13 +1765,32 @@ static inline long folio_nr_pages(struct folio *folio) > { > if (!folio_test_large(folio)) > return 1; > -#ifdef CONFIG_64BIT > return folio->_folio_nr_pages; > +} > + > +static inline void folio_set_nr_pages(struct folio *folio, long nr_pages) I like this approach and helper name since it is more consistent with folio_nr_pages. > +{ > + folio->_folio_nr_pages = nr_pages; > +} > #else > +/** > + * folio_nr_pages - The number of pages in the folio. > + * @folio: The folio. > + * > + * Return: A positive power of two. > + */ > +static inline long folio_nr_pages(struct folio *folio) > +{ > + if (!folio_test_large(folio)) > + return 1; > return 1L << folio->_folio_order; > -#endif > } > > +static inline void folio_set_nr_pages(struct folio *folio, long nr_pages) > +{ > +} > +#endif > + > /** > * folio_next - Move to the next physical folio. > * @folio: The folio we're currently operating on. > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index e3500c087893..b507a98063e6 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -1344,7 +1344,8 @@ static void __destroy_compound_gigantic_folio(struct folio *folio, > set_page_refcounted(p); > } > > - folio_set_compound_order(folio, 0); > + folio->_folio_order = 0; I suggest not touch _folio_order directly, we can introduce another helper like folio_sert_order to set -> _folio_order pairing with folio_order. Thanks. > + folio_set_nr_pages(folio, 0); > __folio_clear_head(folio); > } > > > Yes? > > thanks, > -- > John Hubbard > NVIDIA
On 12/7/22 6:27 PM, John Hubbard wrote: > On 12/7/22 17:42, Sidhartha Kumar wrote: >>> Wouldn't it be better to instead just create a new function for that >>> case, such as: >>> >>> dissolve_large_folio() >>> >> >> Prior to the folio conversion, the helper function >> __destroy_compound_gigantic_page() did: >> >> set_compound_order(page, 0); >> #ifdef CONFIG_64BIT >> page[1].compound_nr = 0; >> #endif >> >> as part of dissolving the page. My goal for this patch was to create a >> function that would encapsulate that segment of code with a single >> call of folio_set_compound_order(folio, 0). set_compound_order() does >> not set compound_nr to 0 when 0 is passed in to the order argument so >> explicitly setting it is required. I don't think a separate >> dissolve_large_folio() function for the hugetlb case is needed as >> __destroy_compound_gigantic_folio() is pretty concise as it is. >> > > Instead of "this is abusing function X()" comments, we should prefer > well-named functions that do something understandable. And you can get > that by noticing that folio_set_compound_order() collapses down to > nearly nothing in the special "order 0" case. So just inline that code > directly into __destroy_compound_gigantic_folio(), taking a moment to > fill in and consolidate the CONFIG_64BIT missing parts in mm.h. > > And now you can get rid of this cruft and "abuse" comment, and instead > just end up with two simple lines of code that are crystal clear--as > they should be, in a "__destroy" function. Like this: > > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 105878936485..cf227ed00945 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1754,6 +1754,7 @@ static inline void set_page_links(struct page > *page, enum zone_type zone, > #endif > } > > +#ifdef CONFIG_64BIT > /** > * folio_nr_pages - The number of pages in the folio. > * @folio: The folio. > @@ -1764,13 +1765,32 @@ static inline long folio_nr_pages(struct folio > *folio) > { > if (!folio_test_large(folio)) > return 1; > -#ifdef CONFIG_64BIT > return folio->_folio_nr_pages; > +} > + > +static inline void folio_set_nr_pages(struct folio *folio, long nr_pages) > +{ > + folio->_folio_nr_pages = nr_pages; > +} > #else > +/** > + * folio_nr_pages - The number of pages in the folio. > + * @folio: The folio. > + * > + * Return: A positive power of two. > + */ > +static inline long folio_nr_pages(struct folio *folio) > +{ > + if (!folio_test_large(folio)) > + return 1; > return 1L << folio->_folio_order; > -#endif > } > > +static inline void folio_set_nr_pages(struct folio *folio, long nr_pages) > +{ > +} > +#endif > + > /** > * folio_next - Move to the next physical folio. > * @folio: The folio we're currently operating on. > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index e3500c087893..b507a98063e6 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -1344,7 +1344,8 @@ static void > __destroy_compound_gigantic_folio(struct folio *folio, > set_page_refcounted(p); > } > > - folio_set_compound_order(folio, 0); > + folio->_folio_order = 0; > + folio_set_nr_pages(folio, 0); > __folio_clear_head(folio); > } > > > Yes? This works for me, I will take this approach along with Muchun's feedback about a wrapper function so as not to touch _folio_order directly and send out a new version. One question I have is if I should then get rid of folio_set_compound_order() as hugetlb is the only compound page user I've converted to folios so far and its use can be replaced by the suggested folio_set_nr_pages() and folio_set_order(). Hugetlb also has one has one call to folio_set_compound_order() with a non-zero order, should I replace this with a call to folio_set_order() and folio_set_nr_pages() as well, or keep folio_set_compound_order() and remove zero order support and the comment. Please let me know which approach you would prefer. Thanks, Sidhartha Kumar > thanks,
On 12/08/22 10:06, Sidhartha Kumar wrote: > On 12/7/22 6:27 PM, John Hubbard wrote: > > On 12/7/22 17:42, Sidhartha Kumar wrote: > > > > Wouldn't it be better to instead just create a new function for that > > > > case, such as: > > > > > > > > dissolve_large_folio() > > > > > > > > > > Prior to the folio conversion, the helper function > > > __destroy_compound_gigantic_page() did: > > > > > > set_compound_order(page, 0); > > > #ifdef CONFIG_64BIT > > > page[1].compound_nr = 0; > > > #endif > > > > > > as part of dissolving the page. My goal for this patch was to create > > > a function that would encapsulate that segment of code with a single > > > call of folio_set_compound_order(folio, 0). set_compound_order() > > > does not set compound_nr to 0 when 0 is passed in to the order > > > argument so explicitly setting it is required. I don't think a > > > separate dissolve_large_folio() function for the hugetlb case is > > > needed as __destroy_compound_gigantic_folio() is pretty concise as > > > it is. > > > > > > > Instead of "this is abusing function X()" comments, we should prefer > > well-named functions that do something understandable. And you can get > > that by noticing that folio_set_compound_order() collapses down to > > nearly nothing in the special "order 0" case. So just inline that code > > directly into __destroy_compound_gigantic_folio(), taking a moment to > > fill in and consolidate the CONFIG_64BIT missing parts in mm.h. > > > > And now you can get rid of this cruft and "abuse" comment, and instead > > just end up with two simple lines of code that are crystal clear--as > > they should be, in a "__destroy" function. Like this: > > > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h > > index 105878936485..cf227ed00945 100644 > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -1754,6 +1754,7 @@ static inline void set_page_links(struct page > > *page, enum zone_type zone, > > #endif > > } > > > > +#ifdef CONFIG_64BIT > > /** > > * folio_nr_pages - The number of pages in the folio. > > * @folio: The folio. > > @@ -1764,13 +1765,32 @@ static inline long folio_nr_pages(struct folio > > *folio) > > { > > if (!folio_test_large(folio)) > > return 1; > > -#ifdef CONFIG_64BIT > > return folio->_folio_nr_pages; > > +} > > + > > +static inline void folio_set_nr_pages(struct folio *folio, long nr_pages) > > +{ > > + folio->_folio_nr_pages = nr_pages; > > +} > > #else > > +/** > > + * folio_nr_pages - The number of pages in the folio. > > + * @folio: The folio. > > + * > > + * Return: A positive power of two. > > + */ > > +static inline long folio_nr_pages(struct folio *folio) > > +{ > > + if (!folio_test_large(folio)) > > + return 1; > > return 1L << folio->_folio_order; > > -#endif > > } > > > > +static inline void folio_set_nr_pages(struct folio *folio, long nr_pages) > > +{ > > +} > > +#endif > > + > > /** > > * folio_next - Move to the next physical folio. > > * @folio: The folio we're currently operating on. > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > > index e3500c087893..b507a98063e6 100644 > > --- a/mm/hugetlb.c > > +++ b/mm/hugetlb.c > > @@ -1344,7 +1344,8 @@ static void > > __destroy_compound_gigantic_folio(struct folio *folio, > > set_page_refcounted(p); > > } > > > > - folio_set_compound_order(folio, 0); > > + folio->_folio_order = 0; > > + folio_set_nr_pages(folio, 0); > > __folio_clear_head(folio); > > } > > > > > > Yes? > > This works for me, I will take this approach along with Muchun's feedback > about a wrapper function so as not to touch _folio_order directly and send > out a new version. > > One question I have is if I should then get rid of > folio_set_compound_order() as hugetlb is the only compound page user I've > converted to folios so far and its use can be replaced by the suggested > folio_set_nr_pages() and folio_set_order(). > > Hugetlb also has one has one call to folio_set_compound_order() with a > non-zero order, should I replace this with a call to folio_set_order() and > folio_set_nr_pages() as well, or keep folio_set_compound_order() and remove > zero order support and the comment. Please let me know which approach you > would prefer. My opinion ... which is often wrong :) I agree 100% with John about these being clear and understandable routines, as well as Muchun's comment about the need for an interface for setting _folio_order instead of accessing directly. However, I do have some concern about two independent interfaces for setting _folio_order and _folio_nr_pages. These two fields really do need to be "in sync" and having two interfaces to modify independently may lead to errors. Any thoughts Matthew? You introduced folios as well as the somewhat redundant compound_nr/_folio_nr_pages fields. Here is another thought. Since this discussion started with a question about "Can/should you really pass a zero order to folio_set_compound_order?", what about a separate "folio_clear_order" interface? We would then have: folio_set_order() /* passed non-zero order when creating large folio */ folio_clear_order() /* zero order is implied */ Note that I did drop 'compound' from the names. The more I think about the folio direction we really should not use compound in the same. Both of these could wrap a renamed version of Sidhartha's routine folio_set_compound_order(). In this way the two fields are automatically kept in sync. Again, just my thoughts/opinion.
On Thu, Dec 08, 2022 at 10:06:07AM -0800, Sidhartha Kumar wrote: > On 12/7/22 6:27 PM, John Hubbard wrote: > > On 12/7/22 17:42, Sidhartha Kumar wrote: > > > > Wouldn't it be better to instead just create a new function for that > > > > case, such as: > > > > > > > > dissolve_large_folio() > > > > > > > > > > Prior to the folio conversion, the helper function > > > __destroy_compound_gigantic_page() did: > > > > > > set_compound_order(page, 0); > > > #ifdef CONFIG_64BIT > > > page[1].compound_nr = 0; > > > #endif > > > > > > as part of dissolving the page. My goal for this patch was to create > > > a function that would encapsulate that segment of code with a single > > > call of folio_set_compound_order(folio, 0). set_compound_order() > > > does not set compound_nr to 0 when 0 is passed in to the order > > > argument so explicitly setting it is required. I don't think a > > > separate dissolve_large_folio() function for the hugetlb case is > > > needed as __destroy_compound_gigantic_folio() is pretty concise as > > > it is. > > > > > > > Instead of "this is abusing function X()" comments, we should prefer > > well-named functions that do something understandable. And you can get > > that by noticing that folio_set_compound_order() collapses down to > > nearly nothing in the special "order 0" case. So just inline that code > > directly into __destroy_compound_gigantic_folio(), taking a moment to > > fill in and consolidate the CONFIG_64BIT missing parts in mm.h. > > > > And now you can get rid of this cruft and "abuse" comment, and instead > > just end up with two simple lines of code that are crystal clear--as > > they should be, in a "__destroy" function. Like this: > > > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h > > index 105878936485..cf227ed00945 100644 > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -1754,6 +1754,7 @@ static inline void set_page_links(struct page > > *page, enum zone_type zone, > > #endif > > } > > > > +#ifdef CONFIG_64BIT > > /** > > * folio_nr_pages - The number of pages in the folio. > > * @folio: The folio. > > @@ -1764,13 +1765,32 @@ static inline long folio_nr_pages(struct folio > > *folio) > > { > > if (!folio_test_large(folio)) > > return 1; > > -#ifdef CONFIG_64BIT > > return folio->_folio_nr_pages; > > +} > > + > > +static inline void folio_set_nr_pages(struct folio *folio, long nr_pages) > > +{ > > + folio->_folio_nr_pages = nr_pages; > > +} > > #else > > +/** > > + * folio_nr_pages - The number of pages in the folio. > > + * @folio: The folio. > > + * > > + * Return: A positive power of two. > > + */ > > +static inline long folio_nr_pages(struct folio *folio) > > +{ > > + if (!folio_test_large(folio)) > > + return 1; > > return 1L << folio->_folio_order; > > -#endif > > } > > > > +static inline void folio_set_nr_pages(struct folio *folio, long nr_pages) > > +{ > > +} > > +#endif > > + > > /** > > * folio_next - Move to the next physical folio. > > * @folio: The folio we're currently operating on. > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > > index e3500c087893..b507a98063e6 100644 > > --- a/mm/hugetlb.c > > +++ b/mm/hugetlb.c > > @@ -1344,7 +1344,8 @@ static void > > __destroy_compound_gigantic_folio(struct folio *folio, > > set_page_refcounted(p); > > } > > > > - folio_set_compound_order(folio, 0); > > + folio->_folio_order = 0; > > + folio_set_nr_pages(folio, 0); > > __folio_clear_head(folio); > > } > > > > > > Yes? > > This works for me, I will take this approach along with Muchun's feedback > about a wrapper function so as not to touch _folio_order directly and send > out a new version. > > One question I have is if I should then get rid of > folio_set_compound_order() as hugetlb is the only compound page user I've > converted to folios so far and its use can be replaced by the suggested > folio_set_nr_pages() and folio_set_order(). > > Hugetlb also has one has one call to folio_set_compound_order() with a > non-zero order, should I replace this with a call to folio_set_order() and > folio_set_nr_pages() as well, or keep folio_set_compound_order() and remove > zero order support and the comment. Please let me know which approach you > would prefer. None of the above! Whatever we're calling this function *it does not belong* in mm.h. Anything outside the MM calling it is going to be a disaster -- can you imagine what will happen if a filesystem or device driver is handed a folio and decides "Oh, I'll just change the size of this folio"? It is an attractive nuisance and should be confined to mm/internal.h *at best*. Equally, we *must not have* separate folio_set_order() and folio_set_nr_pages(). These are the same thing! They must be kept in sync! If we are to have a folio_set_order() instead of open-coding it, then it should also update nr_pages. So, given that this is now an internal-to-mm, if not internal-to-hugetlb function, I see no reason that it should not handle the case of 0. I haven't studied what hugetlb_dissolve does, or why it can't use the standard split_folio(), but I'm sure there's a good reason.
On 12/8/22 11:33, Matthew Wilcox wrote: > None of the above! > > Whatever we're calling this function *it does not belong* in mm.h. > Anything outside the MM calling it is going to be a disaster -- can you > imagine what will happen if a filesystem or device driver is handed a > folio and decides "Oh, I'll just change the size of this folio"? It is > an attractive nuisance and should be confined to mm/internal.h *at best*. > > Equally, we *must not have* separate folio_set_order() and > folio_set_nr_pages(). These are the same thing! They must be kept > in sync! If we are to have a folio_set_order() instead of open-coding > it, then it should also update nr_pages. > > So, given that this is now an internal-to-mm, if not internal-to-hugetlb > function, I see no reason that it should not handle the case of 0. > I haven't studied what hugetlb_dissolve does, or why it can't use the > standard split_folio(), but I'm sure there's a good reason. Sure, perhaps I overreacted to the "abuse of this function" comment, and the whole thing could be fixed up with a clean name, hiding it from the public mm.h API, and...removing comments about abuse! haha :) thanks,
On 12/08/22 19:33, Matthew Wilcox wrote: > On Thu, Dec 08, 2022 at 10:06:07AM -0800, Sidhartha Kumar wrote: > > On 12/7/22 6:27 PM, John Hubbard wrote: > > > On 12/7/22 17:42, Sidhartha Kumar wrote: > > This works for me, I will take this approach along with Muchun's feedback > > about a wrapper function so as not to touch _folio_order directly and send > > out a new version. > > > > One question I have is if I should then get rid of > > folio_set_compound_order() as hugetlb is the only compound page user I've > > converted to folios so far and its use can be replaced by the suggested > > folio_set_nr_pages() and folio_set_order(). > > > > Hugetlb also has one has one call to folio_set_compound_order() with a > > non-zero order, should I replace this with a call to folio_set_order() and > > folio_set_nr_pages() as well, or keep folio_set_compound_order() and remove > > zero order support and the comment. Please let me know which approach you > > would prefer. > > None of the above! > > Whatever we're calling this function *it does not belong* in mm.h. > Anything outside the MM calling it is going to be a disaster -- can you > imagine what will happen if a filesystem or device driver is handed a > folio and decides "Oh, I'll just change the size of this folio"? It is > an attractive nuisance and should be confined to mm/internal.h *at best*. I suspect it was placed in mm.h as it is the 'folio version' of set_compound_order which resides in mm.h. But, no need to repeat that unfortunate placement. > > Equally, we *must not have* separate folio_set_order() and > folio_set_nr_pages(). These are the same thing! They must be kept > in sync! If we are to have a folio_set_order() instead of open-coding > it, then it should also update nr_pages. Ok. Agree. > So, given that this is now an internal-to-mm, if not internal-to-hugetlb > function, I see no reason that it should not handle the case of 0. > I haven't studied what hugetlb_dissolve does, or why it can't use the > standard split_folio(), but I'm sure there's a good reason. The hugetlb code is changing the compound page/folio it created from a set of individual pages back to individual pages so they can be returned to the low level allocator. Somewhat like what page_alloc/page_free do. split_folio is overkill. split_page would be a closer match. It makes perfect sense to put the function in mm internal.h. Thanks,
On 12/8/22 12:01 PM, Mike Kravetz wrote: > On 12/08/22 19:33, Matthew Wilcox wrote: >> On Thu, Dec 08, 2022 at 10:06:07AM -0800, Sidhartha Kumar wrote: >>> On 12/7/22 6:27 PM, John Hubbard wrote: >>>> On 12/7/22 17:42, Sidhartha Kumar wrote: >>> This works for me, I will take this approach along with Muchun's feedback >>> about a wrapper function so as not to touch _folio_order directly and send >>> out a new version. >>> >>> One question I have is if I should then get rid of >>> folio_set_compound_order() as hugetlb is the only compound page user I've >>> converted to folios so far and its use can be replaced by the suggested >>> folio_set_nr_pages() and folio_set_order(). >>> >>> Hugetlb also has one has one call to folio_set_compound_order() with a >>> non-zero order, should I replace this with a call to folio_set_order() and >>> folio_set_nr_pages() as well, or keep folio_set_compound_order() and remove >>> zero order support and the comment. Please let me know which approach you >>> would prefer. >> >> None of the above! >> >> Whatever we're calling this function *it does not belong* in mm.h. >> Anything outside the MM calling it is going to be a disaster -- can you >> imagine what will happen if a filesystem or device driver is handed a >> folio and decides "Oh, I'll just change the size of this folio"? It is >> an attractive nuisance and should be confined to mm/internal.h *at best*. > > I suspect it was placed in mm.h as it is the 'folio version' of > set_compound_order which resides in mm.h. But, no need to repeat that > unfortunate placement. > >> >> Equally, we *must not have* separate folio_set_order() and >> folio_set_nr_pages(). These are the same thing! They must be kept >> in sync! If we are to have a folio_set_order() instead of open-coding >> it, then it should also update nr_pages. > > Ok. Agree. > >> So, given that this is now an internal-to-mm, if not internal-to-hugetlb >> function, I see no reason that it should not handle the case of 0. >> I haven't studied what hugetlb_dissolve does, or why it can't use the >> standard split_folio(), but I'm sure there's a good reason. > > The hugetlb code is changing the compound page/folio it created from a set of > individual pages back to individual pages so they can be returned to the > low level allocator. Somewhat like what page_alloc/page_free do. split_folio > is overkill. split_page would be a closer match. > > It makes perfect sense to put the function in mm internal.h. > Thanks John, Mike, Matthew, and Muchun for the feedback. To summarize this discussion and outline the next version of this patch, the changes I'll make include: 1) change the name of folio_set_compound_order() to folio_set_order() 2) change the placement of this function from mm.h to mm/internal.h 3) folio_set_order() will set both _folio_order and _folio_nr_pages and handle the zero order case correctly. 4) remove the comment about hugetlb's specific use for zero orders 5) improve the style of folio_set_order() by removing ifdefs from inside the function to doing #ifdef CONFIG_64BIT static inline void folio_set_order(struct folio *folio, unsigned int order) { VM_BUG_ON_FOLIO(!folio_test_large(folio), folio); folio->_folio_order = order; folio->_folio_nr_pages = order ? 1U << order : 0; } #else static inline void folio_set_order(struct folio *folio, unsigned int order) { VM_BUG_ON_FOLIO(!folio_test_large(folio), folio); folio->_folio_order = order; } #endif Please let me know if I missing something. Thanks, Sidhartha Kumar > Thanks,
On 12/8/22 13:58, Sidhartha Kumar wrote: > Thanks John, Mike, Matthew, and Muchun for the feedback. > > To summarize this discussion and outline the next version of this patch, the changes I'll make include: > > 1) change the name of folio_set_compound_order() to folio_set_order() > 2) change the placement of this function from mm.h to mm/internal.h > 3) folio_set_order() will set both _folio_order and _folio_nr_pages and handle the zero order case correctly. > 4) remove the comment about hugetlb's specific use for zero orders > 5) improve the style of folio_set_order() by removing ifdefs from inside the function to doing > > #ifdef CONFIG_64BIT > static inline void folio_set_order(struct folio *folio, > unsigned int order) > { > VM_BUG_ON_FOLIO(!folio_test_large(folio), folio); Sounds good, except for this part: why is a function named folio_set_order() BUG-ing on a non-large folio? The naming is still wrong, perhaps? > > folio->_folio_order = order; > folio->_folio_nr_pages = order ? 1U << order : 0; > } > #else > static inline void folio_set_order(struct folio *folio, > unsigned int order) > { > VM_BUG_ON_FOLIO(!folio_test_large(folio), folio); > > folio->_folio_order = order; > } > #endif > > Please let me know if I missing something. > Thanks, > Sidhartha Kumar >> Thanks, > thanks,
On Thu, Dec 08, 2022 at 01:58:20PM -0800, Sidhartha Kumar wrote: > 5) improve the style of folio_set_order() by removing ifdefs from inside the > function to doing > > #ifdef CONFIG_64BIT > static inline void folio_set_order(struct folio *folio, > unsigned int order) > { > VM_BUG_ON_FOLIO(!folio_test_large(folio), folio); > > folio->_folio_order = order; > folio->_folio_nr_pages = order ? 1U << order : 0; > } > #else > static inline void folio_set_order(struct folio *folio, > unsigned int order) > { > VM_BUG_ON_FOLIO(!folio_test_large(folio), folio); > > folio->_folio_order = order; > } > #endif While we usually prefer to put ifdefs outside the function, I don't think that's justified in this case. I'd rather see a comment inside the function like: static inline void folio_set_order(struct folio *folio, unsigned int order) { VM_BUG_ON_FOLIO(!folio_test_large(folio), folio); folio->_folio_order = order; #ifdef CONFIG_64BIT /* * When hugetlb dissolves a folio, we need to clear the tail * page, rather than setting nr_pages to 1. */ folio->_folio_nr_pages = order ? 1U << order : 0; #endif }
On 12/8/22 2:01 PM, John Hubbard wrote: > On 12/8/22 13:58, Sidhartha Kumar wrote: >> Thanks John, Mike, Matthew, and Muchun for the feedback. >> >> To summarize this discussion and outline the next version of this >> patch, the changes I'll make include: >> >> 1) change the name of folio_set_compound_order() to folio_set_order() >> 2) change the placement of this function from mm.h to mm/internal.h >> 3) folio_set_order() will set both _folio_order and _folio_nr_pages >> and handle the zero order case correctly. >> 4) remove the comment about hugetlb's specific use for zero orders >> 5) improve the style of folio_set_order() by removing ifdefs from >> inside the function to doing >> >> #ifdef CONFIG_64BIT >> static inline void folio_set_order(struct folio *folio, >> unsigned int order) >> { >> VM_BUG_ON_FOLIO(!folio_test_large(folio), folio); > > Sounds good, except for this part: why is a function named > folio_set_order() BUG-ing on a non-large folio? The naming > is still wrong, perhaps? > This is because the _folio_nr_pages and _folio_order fields are part of the first tail page in the folio. folio_test_large returns if the folio is larger than one page which would be required for setting the fields. >> >> folio->_folio_order = order; >> folio->_folio_nr_pages = order ? 1U << order : 0; >> } >> #else >> static inline void folio_set_order(struct folio *folio, >> unsigned int order) >> { >> VM_BUG_ON_FOLIO(!folio_test_large(folio), folio); >> >> folio->_folio_order = order; >> } >> #endif >> >> Please let me know if I missing something. >> Thanks, >> Sidhartha Kumar >>> Thanks, >> > > thanks,
On 12/8/22 14:12, Sidhartha Kumar wrote: > On 12/8/22 2:01 PM, John Hubbard wrote: >> On 12/8/22 13:58, Sidhartha Kumar wrote: >>> Thanks John, Mike, Matthew, and Muchun for the feedback. >>> >>> To summarize this discussion and outline the next version of this patch, the changes I'll make include: >>> >>> 1) change the name of folio_set_compound_order() to folio_set_order() >>> 2) change the placement of this function from mm.h to mm/internal.h >>> 3) folio_set_order() will set both _folio_order and _folio_nr_pages and handle the zero order case correctly. >>> 4) remove the comment about hugetlb's specific use for zero orders >>> 5) improve the style of folio_set_order() by removing ifdefs from inside the function to doing >>> >>> #ifdef CONFIG_64BIT >>> static inline void folio_set_order(struct folio *folio, >>> unsigned int order) >>> { >>> VM_BUG_ON_FOLIO(!folio_test_large(folio), folio); >> >> Sounds good, except for this part: why is a function named >> folio_set_order() BUG-ing on a non-large folio? The naming >> is still wrong, perhaps? >> > > This is because the _folio_nr_pages and _folio_order fields are part of the first tail page in the folio. folio_test_large returns if the folio is larger than one page which would be required for setting the fields. OK, but then as I said, the name is wrong. One can either: a) handle the non-large case, or b) rename the function to indicate that it only works on large folios. thanks,
On 12/8/22 2:14 PM, John Hubbard wrote: > On 12/8/22 14:12, Sidhartha Kumar wrote: >> On 12/8/22 2:01 PM, John Hubbard wrote: >>> On 12/8/22 13:58, Sidhartha Kumar wrote: >>>> Thanks John, Mike, Matthew, and Muchun for the feedback. >>>> >>>> To summarize this discussion and outline the next version of this >>>> patch, the changes I'll make include: >>>> >>>> 1) change the name of folio_set_compound_order() to folio_set_order() >>>> 2) change the placement of this function from mm.h to mm/internal.h >>>> 3) folio_set_order() will set both _folio_order and _folio_nr_pages >>>> and handle the zero order case correctly. >>>> 4) remove the comment about hugetlb's specific use for zero orders >>>> 5) improve the style of folio_set_order() by removing ifdefs from >>>> inside the function to doing >>>> >>>> #ifdef CONFIG_64BIT >>>> static inline void folio_set_order(struct folio *folio, >>>> unsigned int order) >>>> { >>>> VM_BUG_ON_FOLIO(!folio_test_large(folio), folio); >>> >>> Sounds good, except for this part: why is a function named >>> folio_set_order() BUG-ing on a non-large folio? The naming >>> is still wrong, perhaps? >>> >> >> This is because the _folio_nr_pages and _folio_order fields are part >> of the first tail page in the folio. folio_test_large returns if the >> folio is larger than one page which would be required for setting the >> fields. > > OK, but then as I said, the name is wrong. One can either: > > a) handle the non-large case, or > > b) rename the function to indicate that it only works on large folios. > Discussed here[1], the BUG_ON line seemed more appropriate over if (!folio_test_large(folio)) return; as the misuse would not be silent. I think I would be against renaming the function as I don't see any large folio specific function names for other accessors of tail page fields. Would both the BUG_ON and return on non-large folio be included then? [1]: https://lore.kernel.org/linux-mm/20221129225039.82257-1-sidhartha.kumar@oracle.com/T/#m98cf80bb21ae533b7385f2e363c602e2c9e2802d > > thanks,
On 12/8/22 14:33, Sidhartha Kumar wrote: > On 12/8/22 2:14 PM, John Hubbard wrote: >> On 12/8/22 14:12, Sidhartha Kumar wrote: >>> On 12/8/22 2:01 PM, John Hubbard wrote: >>>> On 12/8/22 13:58, Sidhartha Kumar wrote: >>>>> Thanks John, Mike, Matthew, and Muchun for the feedback. >>>>> >>>>> To summarize this discussion and outline the next version of this patch, the changes I'll make include: >>>>> >>>>> 1) change the name of folio_set_compound_order() to folio_set_order() >>>>> 2) change the placement of this function from mm.h to mm/internal.h >>>>> 3) folio_set_order() will set both _folio_order and _folio_nr_pages and handle the zero order case correctly. >>>>> 4) remove the comment about hugetlb's specific use for zero orders >>>>> 5) improve the style of folio_set_order() by removing ifdefs from inside the function to doing >>>>> >>>>> #ifdef CONFIG_64BIT >>>>> static inline void folio_set_order(struct folio *folio, >>>>> unsigned int order) >>>>> { >>>>> VM_BUG_ON_FOLIO(!folio_test_large(folio), folio); >>>> >>>> Sounds good, except for this part: why is a function named >>>> folio_set_order() BUG-ing on a non-large folio? The naming >>>> is still wrong, perhaps? >>>> >>> >>> This is because the _folio_nr_pages and _folio_order fields are part of the first tail page in the folio. folio_test_large returns if the folio is larger than one page which would be required for setting the fields. >> >> OK, but then as I said, the name is wrong. One can either: >> >> a) handle the non-large case, or >> >> b) rename the function to indicate that it only works on large folios. >> > > Discussed here[1], the BUG_ON line seemed more appropriate over > > if (!folio_test_large(folio)) > return; > > as the misuse would not be silent. I think I would be against renaming the function as I don't see any large folio specific function names for other accessors of tail page fields. Would both the BUG_ON and return on non-large folio be included then? Actually, if you want the "misuse to not be silent", today's guidelines would point more toward WARN and return, instead of BUG, btw. I don't think that a survey of existing names is really the final word on what to name this. Names should be accurate, even if other names are less so. How about something like: large_folio_set_order() ? > > > [1]: https://lore.kernel.org/linux-mm/20221129225039.82257-1-sidhartha.kumar@oracle.com/T/#m98cf80bb21ae533b7385f2e363c602e2c9e2802d >> >> thanks, > > thanks,
> On Dec 9, 2022, at 06:39, John Hubbard <jhubbard@nvidia.com> wrote: > > On 12/8/22 14:33, Sidhartha Kumar wrote: >> On 12/8/22 2:14 PM, John Hubbard wrote: >>> On 12/8/22 14:12, Sidhartha Kumar wrote: >>>> On 12/8/22 2:01 PM, John Hubbard wrote: >>>>> On 12/8/22 13:58, Sidhartha Kumar wrote: >>>>>> Thanks John, Mike, Matthew, and Muchun for the feedback. >>>>>> >>>>>> To summarize this discussion and outline the next version of this patch, the changes I'll make include: >>>>>> >>>>>> 1) change the name of folio_set_compound_order() to folio_set_order() >>>>>> 2) change the placement of this function from mm.h to mm/internal.h >>>>>> 3) folio_set_order() will set both _folio_order and _folio_nr_pages and handle the zero order case correctly. >>>>>> 4) remove the comment about hugetlb's specific use for zero orders >>>>>> 5) improve the style of folio_set_order() by removing ifdefs from inside the function to doing >>>>>> >>>>>> #ifdef CONFIG_64BIT >>>>>> static inline void folio_set_order(struct folio *folio, >>>>>> unsigned int order) >>>>>> { >>>>>> VM_BUG_ON_FOLIO(!folio_test_large(folio), folio); >>>>> >>>>> Sounds good, except for this part: why is a function named >>>>> folio_set_order() BUG-ing on a non-large folio? The naming >>>>> is still wrong, perhaps? >>>>> >>>> >>>> This is because the _folio_nr_pages and _folio_order fields are part of the first tail page in the folio. folio_test_large returns if the folio is larger than one page which would be required for setting the fields. >>> >>> OK, but then as I said, the name is wrong. One can either: >>> >>> a) handle the non-large case, or >>> >>> b) rename the function to indicate that it only works on large folios. >>> >> Discussed here[1], the BUG_ON line seemed more appropriate over >> if (!folio_test_large(folio)) >> return; >> as the misuse would not be silent. I think I would be against renaming the function as I don't see any large folio specific function names for other accessors of tail page fields. Would both the BUG_ON and return on non-large folio be included then? > > Actually, if you want the "misuse to not be silent", today's guidelines > would point more toward WARN and return, instead of BUG, btw. From you advise, I think we can remove VM_BUG_ON and handle non-zero order page, something like: static inline void folio_set_order(struct folio *folio, unsigned int order) { if (!folio_test_large(folio)) { WARN_ON(order); return; } folio->_folio_order = order; #ifdef CONFIG_64BIT folio->_folio_nr_pages = order ? 1U << order : 0; #endif } In this case, 1) we can handle both non-zero and zero (folio_order() works as well for this case) order page. 2) it can prevent OOB for non-large folio and warn unexpected users. 3) Do not BUG. 4) No need to rename folio_set_order. What do you think? Thanks. > > I don't think that a survey of existing names is really the final word on what > to name this. Names should be accurate, even if other names are less so. How > about something like: > > large_folio_set_order() > > ? > >> [1]: https://lore.kernel.org/linux-mm/20221129225039.82257-1-sidhartha.kumar@oracle.com/T/#m98cf80bb21ae533b7385f2e363c602e2c9e2802d >>> >>> thanks, > > thanks, > -- > John Hubbard > NVIDIA
On 12/9/22 06:27, Muchun Song wrote: > From you advise, I think we can remove VM_BUG_ON and handle non-zero > order page, something like: Yes, and thanks for summarizing all the individual feedback into a proposed solution. If we go this route, then I'd suggest a little note above the function, such as: /* * For non-large folios, this will have no effect, other than possibly * generating a warning, if the caller attempts to set a non-zero folio order * for a non-large folio. */ > static inline void folio_set_order(struct folio *folio, > unsigned int order) > { > if (!folio_test_large(folio)) { > WARN_ON(order); Better make that a WARN_ON_ONCE(), to avoid taking the machine down with excessive warnings in the log. > return; > } > > folio->_folio_order = order; > #ifdef CONFIG_64BIT > folio->_folio_nr_pages = order ? 1U << order : 0; > #endif > } > > In this case, > > 1) we can handle both non-zero and zero (folio_order() works as well > for this case) order page. > 2) it can prevent OOB for non-large folio and warn unexpected users. > 3) Do not BUG. > 4) No need to rename folio_set_order. > > What do you think? If the new behavior is OK with everyone, it seems good to me. thanks,
On 12/9/22 13:10, John Hubbard wrote: > On 12/9/22 06:27, Muchun Song wrote: >> From you advise, I think we can remove VM_BUG_ON and handle non-zero >> order page, something like: > > Yes, and thanks for summarizing all the individual feedback into a > proposed solution. > > If we go this route, then I'd suggest a little note above the function, > such as: > > /* > * For non-large folios, this will have no effect, other than possibly > * generating a warning, if the caller attempts to set a non-zero folio order > * for a non-large folio. > */ > >> static inline void folio_set_order(struct folio *folio, >> unsigned int order) >> { >> if (!folio_test_large(folio)) { >> WARN_ON(order); Although, on second thought...I'm still a little confused about why keeping the same name is so important? A very direct approach that has more accurate naming (and therefore no need for a strange comment explaining the behavior) would be: static inline void large_folio_set_order(struct folio *folio, unsigned int order) { if (WARN_ON_ONCE(!folio_test_large(folio))) return; folio->_folio_order = order; #ifdef CONFIG_64BIT folio->_folio_nr_pages = order ? 1U << order : 0; #endif } thanks,
> On Dec 10, 2022, at 05:20, John Hubbard <jhubbard@nvidia.com> wrote: > > On 12/9/22 13:10, John Hubbard wrote: >> On 12/9/22 06:27, Muchun Song wrote: >>> From you advise, I think we can remove VM_BUG_ON and handle non-zero >>> order page, something like: >> Yes, and thanks for summarizing all the individual feedback into a >> proposed solution. >> If we go this route, then I'd suggest a little note above the function, >> such as: >> /* >> * For non-large folios, this will have no effect, other than possibly >> * generating a warning, if the caller attempts to set a non-zero folio order >> * for a non-large folio. >> */ >>> static inline void folio_set_order(struct folio *folio, >>> unsigned int order) >>> { >>> if (!folio_test_large(folio)) { >>> WARN_ON(order); > > Although, on second thought...I'm still a little confused about why > keeping the same name is so important? Just my personal preference. I like its simplicity. I'm not against large_folio_set_order, but suggest folio_set_order. Thanks. > > A very direct approach that has more accurate naming (and therefore no > need for a strange comment explaining the behavior) would be: > > > static inline void large_folio_set_order(struct folio *folio, > unsigned int order) > { > if (WARN_ON_ONCE(!folio_test_large(folio))) > return; > > folio->_folio_order = order; > #ifdef CONFIG_64BIT > folio->_folio_nr_pages = order ? 1U << order : 0; > #endif > } > > > thanks, > -- > John Hubbard > NVIDIA
diff --git a/include/linux/mm.h b/include/linux/mm.h index 443d496949a8..cd8508d728f1 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -999,9 +999,16 @@ static inline void set_compound_order(struct page *page, unsigned int order) #endif } +/* + * folio_set_compound_order is generally passed a non-zero order to + * initialize a large folio. However, hugetlb code abuses this by + * passing in zero when 'dissolving' a large folio. + */ static inline void folio_set_compound_order(struct folio *folio, unsigned int order) { + VM_BUG_ON_FOLIO(!folio_test_large(folio), folio); + folio->_folio_order = order; #ifdef CONFIG_64BIT folio->_folio_nr_pages = order ? 1U << order : 0;
Document hugetlb's use of a zero compound order so support for zero orders is not removed from folio_set_compound_order(). Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Suggested-by: Mike Kravetz <mike.kravetz@oracle.com> Suggested-by: Muchun Song <songmuchun@bytedance.com> --- This can be folded into f2b67a51d0ef6871d4fb0c3e8199f278112bd108 mm: add folio dtor and order setter functions include/linux/mm.h | 7 +++++++ 1 file changed, 7 insertions(+)