[v4] mm: Override mTHP "enabled" defaults at kernel cmdline

Message ID 20240814020247.67297-1-21cnbao@gmail.com (mailing list archive)
State New

Commit Message

Barry Song Aug. 14, 2024, 2:02 a.m. UTC
From: Ryan Roberts <ryan.roberts@arm.com>

Add thp_anon= cmdline parameter to allow specifying the default
enablement of each supported anon THP size. The parameter accepts the
following format and can be provided multiple times to configure each
size:

thp_anon=<size>,<size>[KMG]:<value>;<size>-<size>[KMG]:<value>

An example:

thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never

See Documentation/admin-guide/mm/transhuge.rst for more details.

Configuring the defaults at boot time is useful to allow early user
space to take advantage of mTHP before it's been configured through
sysfs.
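
As an illustration of the format only, here is a minimal userspace sketch
of how such a string could be split into per-policy order bitmasks. It is
not the code in mm/huge_memory.c: parse_size() is a simplified stand-in
for memparse(), the struct and all function names are hypothetical,
4 KiB base pages are assumed, and orders are not checked against
THP_ORDERS_ALL_ANON. Rejecting sizes that are not a power of two is a
choice made by the sketch.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB base pages */

/* One bitmask per policy; bit N set means order N uses that policy. */
struct thp_anon_masks {
	unsigned long always, inherit, madvise;
};

/* Simplified stand-in for memparse(): number plus optional K/M/G suffix. */
static unsigned long parse_size(const char *s)
{
	char *end;
	unsigned long v = strtoul(s, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		v <<= 10;	/* fall through */
	case 'M': case 'm':
		v <<= 10;	/* fall through */
	case 'K': case 'k':
		v <<= 10;
	}
	return v;
}

/* Bytes -> order, or -1 unless size is a power of two of at least a page. */
static int size_to_order(unsigned long size)
{
	int order = 0;

	if (size < (1UL << SKETCH_PAGE_SHIFT) || (size & (size - 1)))
		return -1;
	while ((1UL << (order + SKETCH_PAGE_SHIFT)) < size)
		order++;
	return order;
}

/* Cut s at the first sep; return the text after it, or NULL if absent. */
static char *split(char *s, char sep)
{
	char *p = strchr(s, sep);

	if (p)
		*p++ = '\0';
	return p;
}

/* Parse one thp_anon= value; 0 on success, -1 on malformed input. */
static int parse_thp_anon(const char *arg, struct thp_anon_masks *m)
{
	char buf[256], *group, *next_group, *item, *next_item;

	if (strlen(arg) >= sizeof(buf))
		return -1;
	strcpy(buf, arg);

	for (group = buf; group; group = next_group) {
		unsigned long *target = NULL;
		char *policy;

		next_group = split(group, ';');
		policy = split(group, ':');
		if (!policy)
			return -1;
		if (!strcmp(policy, "always"))
			target = &m->always;
		else if (!strcmp(policy, "inherit"))
			target = &m->inherit;
		else if (!strcmp(policy, "madvise"))
			target = &m->madvise;
		else if (strcmp(policy, "never"))
			return -1;

		for (item = group; item; item = next_item) {
			char *last;
			int start, end, order;

			next_item = split(item, ',');
			last = split(item, '-');
			start = size_to_order(parse_size(item));
			end = last ? size_to_order(parse_size(last)) : start;
			if (start < 0 || end < 0 || start > end)
				return -1;

			for (order = start; order <= end; order++) {
				/* An order sits in at most one policy mask. */
				m->always &= ~(1UL << order);
				m->inherit &= ~(1UL << order);
				m->madvise &= ~(1UL << order);
				if (target)
					*target |= 1UL << order;
			}
		}
	}
	return 0;
}
```

With the example string above and 4 KiB pages, the sketch ends up with
always = 0x1c (orders 2-4), inherit = 0xa0 (orders 5 and 7) and
madvise = 0x40 (order 6); the 1M-2M orders are cleared from all three
masks, which corresponds to "never".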

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Co-developed-by: Barry Song <v-songbaohua@oppo.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
 -v4:
 * Use bitmap APIs to set and clear bits. Thanks very much to David
   for the comment!

 .../admin-guide/kernel-parameters.txt         |  9 ++
 Documentation/admin-guide/mm/transhuge.rst    | 37 +++++--
 mm/huge_memory.c                              | 96 ++++++++++++++++++-
 3 files changed, 134 insertions(+), 8 deletions(-)

Comments

Baolin Wang Aug. 14, 2024, 7:53 a.m. UTC | #1
On 2024/8/14 10:02, Barry Song wrote:
> From: Ryan Roberts <ryan.roberts@arm.com>
> 
> Add thp_anon= cmdline parameter to allow specifying the default
> enablement of each supported anon THP size. The parameter accepts the
> following format and can be provided multiple times to configure each
> size:
> 
> thp_anon=<size>,<size>[KMG]:<value>;<size>-<size>[KMG]:<value>
> 
> An example:
> 
> thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never
> 
> See Documentation/admin-guide/mm/transhuge.rst for more details.
> 
> Configuring the defaults at boot time is useful to allow early user
> space to take advantage of mTHP before it's been configured through
> sysfs.
> 
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> Co-developed-by: Barry Song <v-songbaohua@oppo.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>

LGTM. Feel free to add:
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>

Just a small nit as below.

> ---
>   -v4:
>   * use bitmap APIs to set and clear bits. thanks very much for
>     David's comment!
> 
>   .../admin-guide/kernel-parameters.txt         |  9 ++
>   Documentation/admin-guide/mm/transhuge.rst    | 37 +++++--
>   mm/huge_memory.c                              | 96 ++++++++++++++++++-
>   3 files changed, 134 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index f0057bac20fb..d0d141d50638 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -6629,6 +6629,15 @@
>   			<deci-seconds>: poll all this frequency
>   			0: no polling (default)
>   
> +	thp_anon=	[KNL]
> +			Format: <size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>
> +			state is one of "always", "madvise", "never" or "inherit".
> +			Can be used to control the default behavior of the
> +			system with respect to anonymous transparent hugepages.
> +			Can be used multiple times for multiple anon THP sizes.
> +			See Documentation/admin-guide/mm/transhuge.rst for more
> +			details.
> +
>   	threadirqs	[KNL,EARLY]
>   			Force threading of all interrupt handlers except those
>   			marked explicitly IRQF_NO_THREAD.
> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> index 7072469de8a8..528e1a19d63f 100644
> --- a/Documentation/admin-guide/mm/transhuge.rst
> +++ b/Documentation/admin-guide/mm/transhuge.rst
> @@ -284,13 +284,36 @@ that THP is shared. Exceeding the number would block the collapse::
>   
>   A higher value may increase memory footprint for some workloads.
>   
> -Boot parameter
> -==============
> -
> -You can change the sysfs boot time defaults of Transparent Hugepage
> -Support by passing the parameter ``transparent_hugepage=always`` or
> -``transparent_hugepage=madvise`` or ``transparent_hugepage=never``
> -to the kernel command line.
> +Boot parameters
> +===============
> +
> +You can change the sysfs boot time default for the top-level "enabled"
> +control by passing the parameter ``transparent_hugepage=always`` or
> +``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` to the
> +kernel command line.
> +
> +Alternatively, each supported anonymous THP size can be controlled by
> +passing ``thp_anon=<size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>``,
> +where ``<size>`` is the THP size and ``<state>`` is one of ``always``,
> +``madvise``, ``never`` or ``inherit``.
> +
> +For example, the following will set 16K, 32K, 64K THP to ``always``,
> +set 128K, 512K to ``inherit``, set 256K to ``madvise`` and 1M, 2M
> +to ``never``::
> +
> +	thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never
> +
> +``thp_anon=`` may be specified multiple times to configure all THP sizes as
> +required. If ``thp_anon=`` is specified at least once, any anon THP sizes
> +not explicitly configured on the command line are implicitly set to
> +``never``.
> +
> +``transparent_hugepage`` setting only affects the global toggle. If
> +``thp_anon`` is not specified, PMD_ORDER THP will default to ``inherit``.
> +However, if a valid ``thp_anon`` setting is provided by the user, the
> +PMD_ORDER THP policy will be overridden. If the policy for PMD_ORDER
> +is not defined within a valid ``thp_anon``, its policy will default to
> +``never``.
>   
>   Hugepages in tmpfs/shmem
>   ========================
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 1a12c011e2df..c5f4e97b49de 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -81,6 +81,7 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
>   unsigned long huge_anon_orders_always __read_mostly;
>   unsigned long huge_anon_orders_madvise __read_mostly;
>   unsigned long huge_anon_orders_inherit __read_mostly;
> +static bool anon_orders_configured;
>   
>   unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>   					 unsigned long vm_flags,
> @@ -737,7 +738,10 @@ static int __init hugepage_init_sysfs(struct kobject **hugepage_kobj)
>   	 * disable all other sizes. powerpc's PMD_ORDER isn't a compile-time
>   	 * constant so we have to do this here.
>   	 */
> -	huge_anon_orders_inherit = BIT(PMD_ORDER);
> +	if (!anon_orders_configured) {
> +		huge_anon_orders_inherit = BIT(PMD_ORDER);
> +		anon_orders_configured = true;
> +	}
>   
>   	*hugepage_kobj = kobject_create_and_add("transparent_hugepage", mm_kobj);
>   	if (unlikely(!*hugepage_kobj)) {
> @@ -922,6 +926,96 @@ static int __init setup_transparent_hugepage(char *str)
>   }
>   __setup("transparent_hugepage=", setup_transparent_hugepage);
>   
> +static inline int get_order_from_str(const char *size_str)
> +{
> +	unsigned long size;
> +	char *endptr;
> +	int order;
> +
> +	size = memparse(size_str, &endptr);
> +	order = fls(size >> PAGE_SHIFT) - 1;

Nit: using get_order() seems more robust?
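
To make the difference concrete, here is a small userspace sketch,
assuming 4 KiB pages, that mirrors the two computations. The names
fls_sketch(), order_v4() and get_order_sketch() are hypothetical;
get_order_sketch() follows the arithmetic of the generic non-constant
get_order() path (decrement, shift by PAGE_SHIFT, take fls), which is an
assumption about the helper rather than a copy of it.

```c
#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB base pages */

/* Position of the highest set bit, counting from 1; 0 when x == 0. */
static int fls_sketch(unsigned long x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* The expression used in v4: rounds down, and yields -1 below one page. */
static int order_v4(unsigned long size)
{
	return fls_sketch(size >> SKETCH_PAGE_SHIFT) - 1;
}

/* get_order()-style arithmetic: rounds up. size must be non-zero. */
static int get_order_sketch(unsigned long size)
{
	size--;
	size >>= SKETCH_PAGE_SHIFT;
	return fls_sketch(size);
}
```

For a power-of-two size such as 16K both return order 2. For 15K the v4
expression rounds down to order 1 while the get_order()-style arithmetic
rounds up to order 2, and for 3K the v4 expression yields -1, which would
then be used as a shift count in the following check.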

> +	if ((1 << order) & ~THP_ORDERS_ALL_ANON) {
> +		pr_err("invalid size %s(order %d) in thp_anon boot parameter\n",
> +			size_str, order);
> +		return -EINVAL;
> +	}
> +
> +	return order;
> +}
[snip]
Barry Song Aug. 14, 2024, 8:09 a.m. UTC | #2
On Wed, Aug 14, 2024 at 7:53 PM Baolin Wang <baolin.wang@linux.alibaba.com> wrote:
>
>
>
> On 2024/8/14 10:02, Barry Song wrote:
> > From: Ryan Roberts <ryan.roberts@arm.com>
> >
> > Add thp_anon= cmdline parameter to allow specifying the default
> > enablement of each supported anon THP size. The parameter accepts the
> > following format and can be provided multiple times to configure each
> > size:
> >
> > thp_anon=<size>,<size>[KMG]:<value>;<size>-<size>[KMG]:<value>
> >
> > An example:
> >
> > thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never
> >
> > See Documentation/admin-guide/mm/transhuge.rst for more details.
> >
> > Configuring the defaults at boot time is useful to allow early user
> > space to take advantage of mTHP before it's been configured through
> > sysfs.
> >
> > Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> > Co-developed-by: Barry Song <v-songbaohua@oppo.com>
> > Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>
> LGTM. Feel free to add:
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>

Thanks, Baolin!

> Just a small nit as below.
>
> > ---
> >   -v4:
> >   * use bitmap APIs to set and clear bits. thanks very much for
> >     David's comment!
> >
> >   .../admin-guide/kernel-parameters.txt         |  9 ++
> >   Documentation/admin-guide/mm/transhuge.rst    | 37 +++++--
> >   mm/huge_memory.c                              | 96 ++++++++++++++++++-
> >   3 files changed, 134 insertions(+), 8 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index f0057bac20fb..d0d141d50638 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -6629,6 +6629,15 @@
> >                       <deci-seconds>: poll all this frequency
> >                       0: no polling (default)
> >   
> > +     thp_anon=       [KNL]
> > +                     Format: <size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>
> > +                     state is one of "always", "madvise", "never" or "inherit".
> > +                     Can be used to control the default behavior of the
> > +                     system with respect to anonymous transparent hugepages.
> > +                     Can be used multiple times for multiple anon THP sizes.
> > +                     See Documentation/admin-guide/mm/transhuge.rst for more
> > +                     details.
> > +
> >       threadirqs      [KNL,EARLY]
> >                       Force threading of all interrupt handlers except those
> >                       marked explicitly IRQF_NO_THREAD.
> > diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> > index 7072469de8a8..528e1a19d63f 100644
> > --- a/Documentation/admin-guide/mm/transhuge.rst
> > +++ b/Documentation/admin-guide/mm/transhuge.rst
> > @@ -284,13 +284,36 @@ that THP is shared. Exceeding the number would block the collapse::
> >   
> >   A higher value may increase memory footprint for some workloads.
> >   
> > -Boot parameter
> > -==============
> > -
> > -You can change the sysfs boot time defaults of Transparent Hugepage
> > -Support by passing the parameter ``transparent_hugepage=always`` or
> > -``transparent_hugepage=madvise`` or ``transparent_hugepage=never``
> > -to the kernel command line.
> > +Boot parameters
> > +===============
> > +
> > +You can change the sysfs boot time default for the top-level "enabled"
> > +control by passing the parameter ``transparent_hugepage=always`` or
> > +``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` to the
> > +kernel command line.
> > +
> > +Alternatively, each supported anonymous THP size can be controlled by
> > +passing ``thp_anon=<size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>``,
> > +where ``<size>`` is the THP size and ``<state>`` is one of ``always``,
> > +``madvise``, ``never`` or ``inherit``.
> > +
> > +For example, the following will set 16K, 32K, 64K THP to ``always``,
> > +set 128K, 512K to ``inherit``, set 256K to ``madvise`` and 1M, 2M
> > +to ``never``::
> > +
> > +     thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never
> > +
> > +``thp_anon=`` may be specified multiple times to configure all THP sizes as
> > +required. If ``thp_anon=`` is specified at least once, any anon THP sizes
> > +not explicitly configured on the command line are implicitly set to
> > +``never``.
> > +
> > +``transparent_hugepage`` setting only affects the global toggle. If
> > +``thp_anon`` is not specified, PMD_ORDER THP will default to ``inherit``.
> > +However, if a valid ``thp_anon`` setting is provided by the user, the
> > +PMD_ORDER THP policy will be overridden. If the policy for PMD_ORDER
> > +is not defined within a valid ``thp_anon``, its policy will default to
> > +``never``.
> >   
> >   Hugepages in tmpfs/shmem
> >   ========================
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 1a12c011e2df..c5f4e97b49de 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -81,6 +81,7 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
> >   unsigned long huge_anon_orders_always __read_mostly;
> >   unsigned long huge_anon_orders_madvise __read_mostly;
> >   unsigned long huge_anon_orders_inherit __read_mostly;
> > +static bool anon_orders_configured;
> >   
> >   unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
> >                                        unsigned long vm_flags,
> > @@ -737,7 +738,10 @@ static int __init hugepage_init_sysfs(struct kobject **hugepage_kobj)
> >        * disable all other sizes. powerpc's PMD_ORDER isn't a compile-time
> >        * constant so we have to do this here.
> >        */
> > -     huge_anon_orders_inherit = BIT(PMD_ORDER);
> > +     if (!anon_orders_configured) {
> > +             huge_anon_orders_inherit = BIT(PMD_ORDER);
> > +             anon_orders_configured = true;
> > +     }
> >   
> >       *hugepage_kobj = kobject_create_and_add("transparent_hugepage", mm_kobj);
> >       if (unlikely(!*hugepage_kobj)) {
> > @@ -922,6 +926,96 @@ static int __init setup_transparent_hugepage(char *str)
> >   }
> >   __setup("transparent_hugepage=", setup_transparent_hugepage);
> >   
> > +static inline int get_order_from_str(const char *size_str)
> > +{
> > +     unsigned long size;
> > +     char *endptr;
> > +     int order;
> > +
> > +     size = memparse(size_str, &endptr);
> > +     order = fls(size >> PAGE_SHIFT) - 1;
>
> Nit: using get_order() seems more robust?

Yes. I agree get_order() is better:

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c5f4e97b49de..0f398d0dbaad 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -933,7 +933,7 @@ static inline int get_order_from_str(const char *size_str)
 	int order;
 
 	size = memparse(size_str, &endptr);
-	order = fls(size >> PAGE_SHIFT) - 1;
+	order = get_order(size);
 	if ((1 << order) & ~THP_ORDERS_ALL_ANON) {
 		pr_err("invalid size %s(order %d) in thp_anon boot parameter\n",
 			size_str, order);
>
> > +     if ((1 << order) & ~THP_ORDERS_ALL_ANON) {
> > +             pr_err("invalid size %s(order %d) in thp_anon boot parameter\n",
> > +                     size_str, order);
> > +             return -EINVAL;
> > +     }
> > +
> > +     return order;
> > +}
> [snip]
David Hildenbrand Aug. 14, 2024, 8:18 a.m. UTC | #3
On 14.08.24 04:02, Barry Song wrote:
> From: Ryan Roberts <ryan.roberts@arm.com>
> 
> Add thp_anon= cmdline parameter to allow specifying the default
> enablement of each supported anon THP size. The parameter accepts the
> following format and can be provided multiple times to configure each
> size:
> 
> thp_anon=<size>,<size>[KMG]:<value>;<size>-<size>[KMG]:<value>
> 
> An example:
> 
> thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never
> 
> See Documentation/admin-guide/mm/transhuge.rst for more details.
> 
> Configuring the defaults at boot time is useful to allow early user
> space to take advantage of mTHP before it's been configured through
> sysfs.
> 
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> Co-developed-by: Barry Song <v-songbaohua@oppo.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> ---
>   -v4:
>   * use bitmap APIs to set and clear bits. thanks very much for
>     David's comment!
> 
>   .../admin-guide/kernel-parameters.txt         |  9 ++
>   Documentation/admin-guide/mm/transhuge.rst    | 37 +++++--
>   mm/huge_memory.c                              | 96 ++++++++++++++++++-
>   3 files changed, 134 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index f0057bac20fb..d0d141d50638 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -6629,6 +6629,15 @@
>   			<deci-seconds>: poll all this frequency
>   			0: no polling (default)
>   
> +	thp_anon=	[KNL]
> +			Format: <size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>
> +			state is one of "always", "madvise", "never" or "inherit".
> +			Can be used to control the default behavior of the
> +			system with respect to anonymous transparent hugepages.
> +			Can be used multiple times for multiple anon THP sizes.
> +			See Documentation/admin-guide/mm/transhuge.rst for more
> +			details.
> +
>   	threadirqs	[KNL,EARLY]
>   			Force threading of all interrupt handlers except those
>   			marked explicitly IRQF_NO_THREAD.
> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> index 7072469de8a8..528e1a19d63f 100644
> --- a/Documentation/admin-guide/mm/transhuge.rst
> +++ b/Documentation/admin-guide/mm/transhuge.rst
> @@ -284,13 +284,36 @@ that THP is shared. Exceeding the number would block the collapse::
>   
>   A higher value may increase memory footprint for some workloads.
>   
> -Boot parameter
> -==============
> -
> -You can change the sysfs boot time defaults of Transparent Hugepage
> -Support by passing the parameter ``transparent_hugepage=always`` or
> -``transparent_hugepage=madvise`` or ``transparent_hugepage=never``
> -to the kernel command line.
> +Boot parameters
> +===============
> +
> +You can change the sysfs boot time default for the top-level "enabled"
> +control by passing the parameter ``transparent_hugepage=always`` or
> +``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` to the
> +kernel command line.
> +
> +Alternatively, each supported anonymous THP size can be controlled by
> +passing ``thp_anon=<size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>``,
> +where ``<size>`` is the THP size and ``<state>`` is one of ``always``,
> +``madvise``, ``never`` or ``inherit``.
> +
> +For example, the following will set 16K, 32K, 64K THP to ``always``,
> +set 128K, 512K to ``inherit``, set 256K to ``madvise`` and 1M, 2M
> +to ``never``::
> +
> +	thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never
> +
> +``thp_anon=`` may be specified multiple times to configure all THP sizes as
> +required. If ``thp_anon=`` is specified at least once, any anon THP sizes
> +not explicitly configured on the command line are implicitly set to
> +``never``.
> +
> +``transparent_hugepage`` setting only affects the global toggle. If
> +``thp_anon`` is not specified, PMD_ORDER THP will default to ``inherit``.
> +However, if a valid ``thp_anon`` setting is provided by the user, the
> +PMD_ORDER THP policy will be overridden. If the policy for PMD_ORDER
> +is not defined within a valid ``thp_anon``, its policy will default to
> +``never``.
>   
>   Hugepages in tmpfs/shmem
>   ========================
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 1a12c011e2df..c5f4e97b49de 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -81,6 +81,7 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
>   unsigned long huge_anon_orders_always __read_mostly;
>   unsigned long huge_anon_orders_madvise __read_mostly;
>   unsigned long huge_anon_orders_inherit __read_mostly;
> +static bool anon_orders_configured;
>   
>   unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>   					 unsigned long vm_flags,
> @@ -737,7 +738,10 @@ static int __init hugepage_init_sysfs(struct kobject **hugepage_kobj)
>   	 * disable all other sizes. powerpc's PMD_ORDER isn't a compile-time
>   	 * constant so we have to do this here.
>   	 */
> -	huge_anon_orders_inherit = BIT(PMD_ORDER);
> +	if (!anon_orders_configured) {
> +		huge_anon_orders_inherit = BIT(PMD_ORDER);
> +		anon_orders_configured = true;
> +	}
>   
>   	*hugepage_kobj = kobject_create_and_add("transparent_hugepage", mm_kobj);
>   	if (unlikely(!*hugepage_kobj)) {
> @@ -922,6 +926,96 @@ static int __init setup_transparent_hugepage(char *str)
>   }
>   __setup("transparent_hugepage=", setup_transparent_hugepage);
>   
> +static inline int get_order_from_str(const char *size_str)
> +{
> +	unsigned long size;
> +	char *endptr;
> +	int order;
> +
> +	size = memparse(size_str, &endptr);

Do we have to also test if is_power_of_2(), and refuse if not? For 
example, what if someone would pass 3K, would the existing check catch it?

> +	order = fls(size >> PAGE_SHIFT) - 1;

Is this a fancy way of writing

order = log2(size >> PAGE_SHIFT);

? :)

Anyhow, if get_order() wraps that, all good.

> +	if ((1 << order) & ~THP_ORDERS_ALL_ANON) {
> +		pr_err("invalid size %s(order %d) in thp_anon boot parameter\n",
> +			size_str, order);
> +		return -EINVAL;
> +	}
> +
> +	return order;
> +}

Apart from that, nothing jumped out at me.
Barry Song Aug. 14, 2024, 8:54 a.m. UTC | #4
On Wed, Aug 14, 2024 at 8:18 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 14.08.24 04:02, Barry Song wrote:
> > From: Ryan Roberts <ryan.roberts@arm.com>
> >
> > Add thp_anon= cmdline parameter to allow specifying the default
> > enablement of each supported anon THP size. The parameter accepts the
> > following format and can be provided multiple times to configure each
> > size:
> >
> > thp_anon=<size>,<size>[KMG]:<value>;<size>-<size>[KMG]:<value>
> >
> > An example:
> >
> > thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never
> >
> > See Documentation/admin-guide/mm/transhuge.rst for more details.
> >
> > Configuring the defaults at boot time is useful to allow early user
> > space to take advantage of mTHP before it's been configured through
> > sysfs.
> >
> > Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> > Co-developed-by: Barry Song <v-songbaohua@oppo.com>
> > Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > ---
> >   -v4:
> >   * use bitmap APIs to set and clear bits. thanks very much for
> >     David's comment!
> >
> >   .../admin-guide/kernel-parameters.txt         |  9 ++
> >   Documentation/admin-guide/mm/transhuge.rst    | 37 +++++--
> >   mm/huge_memory.c                              | 96 ++++++++++++++++++-
> >   3 files changed, 134 insertions(+), 8 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index f0057bac20fb..d0d141d50638 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -6629,6 +6629,15 @@
> >                       <deci-seconds>: poll all this frequency
> >                       0: no polling (default)
> >  
> > +     thp_anon=       [KNL]
> > +                     Format: <size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>
> > +                     state is one of "always", "madvise", "never" or "inherit".
> > +                     Can be used to control the default behavior of the
> > +                     system with respect to anonymous transparent hugepages.
> > +                     Can be used multiple times for multiple anon THP sizes.
> > +                     See Documentation/admin-guide/mm/transhuge.rst for more
> > +                     details.
> > +
> >       threadirqs      [KNL,EARLY]
> >                       Force threading of all interrupt handlers except those
> >                       marked explicitly IRQF_NO_THREAD.
> > diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> > index 7072469de8a8..528e1a19d63f 100644
> > --- a/Documentation/admin-guide/mm/transhuge.rst
> > +++ b/Documentation/admin-guide/mm/transhuge.rst
> > @@ -284,13 +284,36 @@ that THP is shared. Exceeding the number would block the collapse::
> >  
> >   A higher value may increase memory footprint for some workloads.
> >  
> > -Boot parameter
> > -==============
> > -
> > -You can change the sysfs boot time defaults of Transparent Hugepage
> > -Support by passing the parameter ``transparent_hugepage=always`` or
> > -``transparent_hugepage=madvise`` or ``transparent_hugepage=never``
> > -to the kernel command line.
> > +Boot parameters
> > +===============
> > +
> > +You can change the sysfs boot time default for the top-level "enabled"
> > +control by passing the parameter ``transparent_hugepage=always`` or
> > +``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` to the
> > +kernel command line.
> > +
> > +Alternatively, each supported anonymous THP size can be controlled by
> > +passing ``thp_anon=<size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>``,
> > +where ``<size>`` is the THP size and ``<state>`` is one of ``always``,
> > +``madvise``, ``never`` or ``inherit``.
> > +
> > +For example, the following will set 16K, 32K, 64K THP to ``always``,
> > +set 128K, 512K to ``inherit``, set 256K to ``madvise`` and 1M, 2M
> > +to ``never``::
> > +
> > +     thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never
> > +
> > +``thp_anon=`` may be specified multiple times to configure all THP sizes as
> > +required. If ``thp_anon=`` is specified at least once, any anon THP sizes
> > +not explicitly configured on the command line are implicitly set to
> > +``never``.
> > +
> > +``transparent_hugepage`` setting only affects the global toggle. If
> > +``thp_anon`` is not specified, PMD_ORDER THP will default to ``inherit``.
> > +However, if a valid ``thp_anon`` setting is provided by the user, the
> > +PMD_ORDER THP policy will be overridden. If the policy for PMD_ORDER
> > +is not defined within a valid ``thp_anon``, its policy will default to
> > +``never``.
> >  
> >   Hugepages in tmpfs/shmem
> >   ========================
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 1a12c011e2df..c5f4e97b49de 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -81,6 +81,7 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
> >   unsigned long huge_anon_orders_always __read_mostly;
> >   unsigned long huge_anon_orders_madvise __read_mostly;
> >   unsigned long huge_anon_orders_inherit __read_mostly;
> > +static bool anon_orders_configured;
> >  
> >   unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
> >                                        unsigned long vm_flags,
> > @@ -737,7 +738,10 @@ static int __init hugepage_init_sysfs(struct kobject **hugepage_kobj)
> >        * disable all other sizes. powerpc's PMD_ORDER isn't a compile-time
> >        * constant so we have to do this here.
> >        */
> > -     huge_anon_orders_inherit = BIT(PMD_ORDER);
> > +     if (!anon_orders_configured) {
> > +             huge_anon_orders_inherit = BIT(PMD_ORDER);
> > +             anon_orders_configured = true;
> > +     }
> >  
> >       *hugepage_kobj = kobject_create_and_add("transparent_hugepage", mm_kobj);
> >       if (unlikely(!*hugepage_kobj)) {
> > @@ -922,6 +926,96 @@ static int __init setup_transparent_hugepage(char *str)
> >   }
> >   __setup("transparent_hugepage=", setup_transparent_hugepage);
> >  
> > +static inline int get_order_from_str(const char *size_str)
> > +{
> > +     unsigned long size;
> > +     char *endptr;
> > +     int order;
> > +
> > +     size = memparse(size_str, &endptr);
>
> Do we have to also test if is_power_of_2(), and refuse if not? For
> example, what if someone would pass 3K, would the existing check catch it?

No, the existing check can't catch it.

I passed thp_anon=15K-64K:always, then I got 16K enabled:

/ # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/enabled
[always] inherit madvise never

I can actually check that by:

static inline int get_order_from_str(const char *size_str)
{
	unsigned long size;
	char *endptr;
	int order;

	size = memparse(size_str, &endptr);

	if (!is_power_of_2(size >> PAGE_SHIFT))
		goto err;
	order = get_order(size);
	if ((1 << order) & ~THP_ORDERS_ALL_ANON)
		goto err;

	return order;
err:
	pr_err("invalid size %s in thp_anon boot parameter\n", size_str);
	return -EINVAL;
}
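
For anyone who wants to try the checks outside the kernel, the function
above can be approximated in userspace as follows. This is only a sketch:
memparse_sketch() is a simplified stand-in for memparse(), 4 KiB pages
and PMD_ORDER = 9 are assumed, and SKETCH_ORDERS_ALL_ANON assumes that
THP_ORDERS_ALL_ANON covers orders 2 through PMD_ORDER. All *_sketch names
are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB base pages */
#define SKETCH_PMD_ORDER  9	/* assumption: 2 MiB PMD mappings */
/* Assumed shape of THP_ORDERS_ALL_ANON: orders 2..PMD_ORDER. */
#define SKETCH_ORDERS_ALL_ANON \
	(((1UL << (SKETCH_PMD_ORDER + 1)) - 1) & ~((1UL << 0) | (1UL << 1)))

/* Simplified stand-in for memparse(): number plus optional K/M/G suffix. */
static unsigned long memparse_sketch(const char *s)
{
	char *end;
	unsigned long v = strtoul(s, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		v <<= 10;	/* fall through */
	case 'M': case 'm':
		v <<= 10;	/* fall through */
	case 'K': case 'k':
		v <<= 10;
	}
	return v;
}

static int is_power_of_2_sketch(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* get_order()-style arithmetic: rounds up. size must be non-zero. */
static int get_order_sketch(unsigned long size)
{
	int order = 0;

	size = (size - 1) >> SKETCH_PAGE_SHIFT;
	while (size) {
		order++;
		size >>= 1;
	}
	return order;
}

/* Same sequence of checks as the proposed get_order_from_str(). */
static int get_order_from_str_sketch(const char *size_str)
{
	unsigned long size = memparse_sketch(size_str);
	int order;

	if (!is_power_of_2_sketch(size >> SKETCH_PAGE_SHIFT))
		goto err;
	order = get_order_sketch(size);
	if ((1UL << order) & ~SKETCH_ORDERS_ALL_ANON)
		goto err;

	return order;
err:
	fprintf(stderr, "invalid size %s in thp_anon boot parameter\n", size_str);
	return -EINVAL;
}
```

Under these assumptions "16K" maps to order 2 and "2M" to order 9, while
"3K", "15K", "4K" (order 0) and "4M" (order 10) are all rejected with
-EINVAL.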

>
> > +     order = fls(size >> PAGE_SHIFT) - 1;
>
> Is this a fancy way of writing
>
> order = log2(size >> PAGE_SHIFT);
>
> ? :)

I think ilog2() is implemented using fls()?

>
> Anyhow, if get_order() wraps that, all good.

I guess it doesn't check power of 2?

>
> > +     if ((1 << order) & ~THP_ORDERS_ALL_ANON) {
> > +             pr_err("invalid size %s(order %d) in thp_anon boot parameter\n",
> > +                     size_str, order);
> > +             return -EINVAL;
> > +     }
> > +
> > +     return order;
> > +}
>
> Apart from that, nothing jumped at me.

Please take a look at the new get_order_from_str() before I
send v5 :-)

>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry
Barry Song Aug. 14, 2024, 10:46 p.m. UTC | #5
On Wed, Aug 14, 2024 at 2:03 PM Barry Song <21cnbao@gmail.com> wrote:
>
> From: Ryan Roberts <ryan.roberts@arm.com>
>
> Add thp_anon= cmdline parameter to allow specifying the default
> enablement of each supported anon THP size. The parameter accepts the
> following format and can be provided multiple times to configure each
> size:
>
> thp_anon=<size>,<size>[KMG]:<value>;<size>-<size>[KMG]:<value>
>
> An example:
>
> thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never
>
> See Documentation/admin-guide/mm/transhuge.rst for more details.
>
> Configuring the defaults at boot time is useful to allow early user
> space to take advantage of mTHP before it's been configured through
> sysfs.
>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> Co-developed-by: Barry Song <v-songbaohua@oppo.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> ---

Hi Andrew,

I saw you have pulled v4. Thanks!

Can you please squash the below changes suggested by Baolin and David?

From af42aa80f45d89798027e44a8711f7737e08b115 Mon Sep 17 00:00:00 2001
From: Barry Song <v-songbaohua@oppo.com>
Date: Thu, 15 Aug 2024 10:34:16 +1200
Subject: [PATCH] mm: use get_order() and check size with is_power_of_2()

According to Baolin, using get_order() is more robust.
According to David, it is also better to filter out invalid
sizes such as 3KB or 15KB.

Suggested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
 mm/huge_memory.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 01beda16aece..d6dade8ac5f6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -952,14 +952,17 @@ static inline int get_order_from_str(const char *size_str)
 	int order;
 
 	size = memparse(size_str, &endptr);
-	order = fls(size >> PAGE_SHIFT) - 1;
-	if ((1 << order) & ~THP_ORDERS_ALL_ANON) {
-		pr_err("invalid size %s(order %d) in thp_anon boot parameter\n",
-			size_str, order);
-		return -EINVAL;
-	}
+
+	if (!is_power_of_2(size >> PAGE_SHIFT))
+		goto err;
+	order = get_order(size);
+	if ((1 << order) & ~THP_ORDERS_ALL_ANON)
+		goto err;
 
 	return order;
+err:
+	pr_err("invalid size %s in thp_anon boot parameter\n", size_str);
+	return -EINVAL;
 }
 
 static char str_dup[PAGE_SIZE] __meminitdata;
David Hildenbrand Aug. 15, 2024, 10:26 a.m. UTC | #6
>>> +static inline int get_order_from_str(const char *size_str)
>>> +{
>>> +     unsigned long size;
>>> +     char *endptr;
>>> +     int order;
>>> +
>>> +     size = memparse(size_str, &endptr);
>>
>> Do we have to also test if is_power_of_2(), and refuse if not? For
>> example, what if someone would pass 3K, would the existing check catch it?
> 
> no, the existing check can't catch it.
> 
> I passed thp_anon=15K-64K:always, then I got 16K enabled:
> 
> / # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/enabled
> [always] inherit madvise never
> 

Okay, so we should document then that start/end of the range must be 
valid THP sizes.

> I can actually check that by:
> 
> static inline int get_order_from_str(const char *size_str)
> {
> 	unsigned long size;
> 	char *endptr;
> 	int order;
> 
> 	size = memparse(size_str, &endptr);
> 
> 	if (!is_power_of_2(size >> PAGE_SHIFT))

No need for the shift.

if (!is_power_of_2(size))

That is likely even more correct, because it also rejects a value that
is not page-aligned, such as

16385 (16K + 1)

> 		goto err;
> 	order = get_order(size);
> 	if ((1 << order) & ~THP_ORDERS_ALL_ANON)
> 		goto err;
> 
> 	return order;
> err:
> 	pr_err("invalid size %s in thp_anon boot parameter\n", size_str);
> 	return -EINVAL;
> }
> 
>>
>>> +     order = fls(size >> PAGE_SHIFT) - 1;
>>
>> Is this a fancy way of writing
>>
>> order = log2(size >> PAGE_SHIFT);
>>
>> ? :)
> 
> I think ilog2 is implemented by fls ?

Yes, so we should have used that instead. But get_order()
is even better.

> 
>>
>> Anyhow, if get_order() wraps that, all good.
> 
> I guess it doesn't check power of 2?
> 
>>
>>> +     if ((1 << order) & ~THP_ORDERS_ALL_ANON) {
>>> +             pr_err("invalid size %s(order %d) in thp_anon boot parameter\n",
>>> +                     size_str, order);
>>> +             return -EINVAL;
>>> +     }
>>> +
>>> +     return order;
>>> +}
>>
>> Apart from that, nothing jumped at me.
> 
> Please take a look at the new get_order_from_str() before I
> send v5 :-)

Besides the shift for is_power_of_2(), LGTM, thanks!
Barry Song Aug. 15, 2024, 11:50 p.m. UTC | #7
On Thu, Aug 15, 2024 at 10:26 PM David Hildenbrand <david@redhat.com> wrote:
>
> >>> +static inline int get_order_from_str(const char *size_str)
> >>> +{
> >>> +     unsigned long size;
> >>> +     char *endptr;
> >>> +     int order;
> >>> +
> >>> +     size = memparse(size_str, &endptr);
> >>
> >> Do we have to also test if is_power_of_2(), and refuse if not? For
> >> example, what if someone would pass 3K, would the existing check catch it?
> >
> > no, the existing check can't catch it.
> >
> > I passed thp_anon=15K-64K:always, then I got 16K enabled:
> >
> > / # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/enabled
> > [always] inherit madvise never
> >
>
> Okay, so we should document then that start/end of the range must be
> valid THP sizes.

Ack

>
> > I can actually check that by:
> >
> > static inline int get_order_from_str(const char *size_str)
> > {
> >       unsigned long size;
> >       char *endptr;
> >       int order;
> >
> >       size = memparse(size_str, &endptr);
> >
> >       if (!is_power_of_2(size >> PAGE_SHIFT))
>
> No need for the shift.
>
> if (!is_power_of_2(size))
>
> Is likely even more correct if someone would manage to pass something
> stupid like
>
> 16385 (16K + 1)

Ack

>
> >               goto err;
> >       order = get_order(size);
> >       if ((1 << order) & ~THP_ORDERS_ALL_ANON)
> >               goto err;
> >
> >       return order;
> > err:
> >       pr_err("invalid size %s in thp_anon boot parameter\n", size_str);
> >       return -EINVAL;
> > }
> >
> >>
> >>> +     order = fls(size >> PAGE_SHIFT) - 1;
> >>
> >> Is this a fancy way of writing
> >>
> >> order = log2(size >> PAGE_SHIFT);
> >>
> >> ? :)
> >
> > I think ilog2 is implemented by fls ?
>
> Yes, so we should have used that instead. But get_order()
> is even better.
>
> >
> >>
> >> Anyhow, if get_order() wraps that, all good.
> >
> > I guess it doesn't check power of 2?
> >
> >>
> >>> +     if ((1 << order) & ~THP_ORDERS_ALL_ANON) {
> >>> +             pr_err("invalid size %s(order %d) in thp_anon boot parameter\n",
> >>> +                     size_str, order);
> >>> +             return -EINVAL;
> >>> +     }
> >>> +
> >>> +     return order;
> >>> +}
> >>
> >> Apart from that, nothing jumped at me.
> >
> > Please take a look at the new get_order_from_str() before I
> > send v5 :-)
>
> Besides the shift for is_power_of_2(), LGTM, thanks!

Thanks, David!

Hi Andrew,

Apologies for sending another squash request. If you'd
prefer me to send a new v5 that includes all the changes,
please let me know.


Don't shift the size before the is_power_of_2() check, so that the
check also rejects invalid sizes like 16K+1. Also, document that the
size must be a valid THP size.

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 15404f06eefd..4468851b6ecb 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -294,8 +294,9 @@ kernel command line.
 
 Alternatively, each supported anonymous THP size can be controlled by
 passing ``thp_anon=<size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>``,
-where ``<size>`` is the THP size and ``<state>`` is one of ``always``,
-``madvise``, ``never`` or ``inherit``.
+where ``<size>`` is the THP size (a power-of-2 multiple of PAGE_SIZE that is
+also a supported anonymous THP size) and ``<state>`` is one of ``always``,
+``madvise``, ``never`` or ``inherit``.
 
 For example, the following will set 16K, 32K, 64K THP to ``always``,
 set 128K, 512K to ``inherit``, set 256K to ``madvise`` and 1M, 2M
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d6dade8ac5f6..903b47f2b2db 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -953,7 +953,7 @@ static inline int get_order_from_str(const char *size_str)
 
 	size = memparse(size_str, &endptr);
 
-	if (!is_power_of_2(size >> PAGE_SHIFT))
+	if (!is_power_of_2(size))
 		goto err;
 	order = get_order(size);
 	if ((1 << order) & ~THP_ORDERS_ALL_ANON)
>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry
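
To illustrate why dropping the shift matters, here is a small userspace sketch. It is a model under stated assumptions and not the kernel code: PAGE_SHIFT = 12, PMD_ORDER = 9, THP_ORDERS_ALL_ANON modeled as "orders 2..PMD_ORDER", and the helper names (including order_shifted_check() and order_unshifted_check()) are hypothetical stand-ins.

```c
#include <errno.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB base pages */
#define PMD_ORDER  9	/* assumption: 2 MiB PMD */
/* Modeled on the kernel macro: orders 2..PMD_ORDER are valid for anon THP. */
#define THP_ORDERS_ALL_ANON (((1UL << (PMD_ORDER + 1)) - 1) & ~0x3UL)

/* Stand-in for is_power_of_2(). */
static int is_power_of_2_sketch(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Stand-in for get_order(): smallest order that covers size (rounds up). */
static int get_order_sketch(unsigned long size)
{
	unsigned long pages = (size - 1) >> PAGE_SHIFT;
	int order = 0;

	while (pages) {
		order++;
		pages >>= 1;
	}
	return order;
}

/* Shared tail: convert to an order and validate it against the mask. */
static int checked_order(unsigned long size)
{
	int order = get_order_sketch(size);

	if ((1UL << order) & ~THP_ORDERS_ALL_ANON)
		return -EINVAL;
	return order;
}

/* Power-of-2 test on the page count: the sub-page remainder is lost. */
int order_shifted_check(unsigned long size)
{
	if (!is_power_of_2_sketch(size >> PAGE_SHIFT))
		return -EINVAL;
	return checked_order(size);
}

/* Power-of-2 test on the byte size: any stray low bit is caught. */
int order_unshifted_check(unsigned long size)
{
	if (!is_power_of_2_sketch(size))
		return -EINVAL;
	return checked_order(size);
}
```

In this model, 16385 (16K + 1) passes the shifted check because 16385 >> 12 is 4, and the round-up in get_order() then yields order 3 (32K). The unshifted check returns -EINVAL for 16385 and still returns 2 for an exact 16K.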
David Hildenbrand Aug. 16, 2024, 9:33 a.m. UTC | #8
On 16.08.24 01:50, Barry Song wrote:
> On Thu, Aug 15, 2024 at 10:26 PM David Hildenbrand <david@redhat.com> wrote:
>>
>>>>> +static inline int get_order_from_str(const char *size_str)
>>>>> +{
>>>>> +     unsigned long size;
>>>>> +     char *endptr;
>>>>> +     int order;
>>>>> +
>>>>> +     size = memparse(size_str, &endptr);
>>>>
>>>> Do we have to also test if is_power_of_2(), and refuse if not? For
>>>> example, what if someone would pass 3K, would the existing check catch it?
>>>
>>> no, the existing check can't catch it.
>>>
>>> I passed thp_anon=15K-64K:always, then I got 16K enabled:
>>>
>>> / # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/enabled
>>> [always] inherit madvise never
>>>
>>
>> Okay, so we should document then that start/end of the range must be
>> valid THP sizes.
> 
> Ack
> 
>>
>>> I can actually check that by:
>>>
>>> static inline int get_order_from_str(const char *size_str)
>>> {
>>>        unsigned long size;
>>>        char *endptr;
>>>        int order;
>>>
>>>        size = memparse(size_str, &endptr);
>>>
>>>        if (!is_power_of_2(size >> PAGE_SHIFT))
>>
>> No need for the shift.
>>
>> if (!is_power_of_2(size))
>>
>> Is likely even more correct if someone would manage to pass something
>> stupid like
>>
>> 16385 (16K + 1)
> 
> Ack
> 
>>
>>>                goto err;
>>>        order = get_order(size);
>>>        if ((1 << order) & ~THP_ORDERS_ALL_ANON)
>>>                goto err;
>>>
>>>        return order;
>>> err:
>>>        pr_err("invalid size %s in thp_anon boot parameter\n", size_str);
>>>        return -EINVAL;
>>> }
>>>
>>>>
>>>>> +     order = fls(size >> PAGE_SHIFT) - 1;
>>>>
>>>> Is this a fancy way of writing
>>>>
>>>> order = log2(size >> PAGE_SHIFT);
>>>>
>>>> ? :)
>>>
>>> I think ilog2 is implemented by fls ?
>>
>> Yes, so we should have used that instead. But get_order()
>> is even better.
>>
>>>
>>>>
>>>> Anyhow, if get_order() wraps that, all good.
>>>
>>> I guess it doesn't check power of 2?
>>>
>>>>
>>>>> +     if ((1 << order) & ~THP_ORDERS_ALL_ANON) {
>>>>> +             pr_err("invalid size %s(order %d) in thp_anon boot parameter\n",
>>>>> +                     size_str, order);
>>>>> +             return -EINVAL;
>>>>> +     }
>>>>> +
>>>>> +     return order;
>>>>> +}
>>>>
>>>> Apart from that, nothing jumped at me.
>>>
>>> Please take a look at the new get_order_from_str() before I
>>> send v5 :-)
>>
>> Besides the shift for is_power_of_2(), LGTM, thanks!
> 
> Thanks, David!
> 
> Hi Andrew,
> 
> Apologies for sending another squash request. If you'd
> prefer me to send a new v5 that includes all the changes,
> please let me know.
> 
> 
> Don't shift the size, as it can still detect invalid sizes
> like 16K+1. Also, document that the size must be a valid THP
> size.
> 
> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> index 15404f06eefd..4468851b6ecb 100644
> --- a/Documentation/admin-guide/mm/transhuge.rst
> +++ b/Documentation/admin-guide/mm/transhuge.rst
> @@ -294,8 +294,9 @@ kernel command line.
>   
>   Alternatively, each supported anonymous THP size can be controlled by
>   passing ``thp_anon=<size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>``,
> -where ``<size>`` is the THP size and ``<state>`` is one of ``always``,
> -``madvise``, ``never`` or ``inherit``.
> +where ``<size>`` is the THP size (must be a power of 2 of PAGE_SIZE and
> +supported anonymous THP)  and ``<state>`` is one of ``always``, ``madvise``,
> +``never`` or ``inherit``.
>   
>   For example, the following will set 16K, 32K, 64K THP to ``always``,
>   set 128K, 512K to ``inherit``, set 256K to ``madvise`` and 1M, 2M
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index d6dade8ac5f6..903b47f2b2db 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -953,7 +953,7 @@ static inline int get_order_from_str(const char *size_str)
>   
>   	size = memparse(size_str, &endptr);
>   
> -	if (!is_power_of_2(size >> PAGE_SHIFT))
> +	if (!is_power_of_2(size))
>   		goto err;


Reading your documentation above, do we also want to test "if (size < 
PAGE_SIZE)", or is that implicitly covered? (likely not I assume?)

I assume it's implicitly covered: if we pass "1k", it would be mapped
to "4k" (order-0) and that is not a valid mTHP size, right?

I would appreciate a quick v5, just so I can see the final result more
easily :)
Barry Song Aug. 16, 2024, 9:47 a.m. UTC | #9
On Fri, Aug 16, 2024 at 9:33 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 16.08.24 01:50, Barry Song wrote:
> > On Thu, Aug 15, 2024 at 10:26 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >>>>> +static inline int get_order_from_str(const char *size_str)
> >>>>> +{
> >>>>> +     unsigned long size;
> >>>>> +     char *endptr;
> >>>>> +     int order;
> >>>>> +
> >>>>> +     size = memparse(size_str, &endptr);
> >>>>
> >>>> Do we have to also test if is_power_of_2(), and refuse if not? For
> >>>> example, what if someone would pass 3K, would the existing check catch it?
> >>>
> >>> no, the existing check can't catch it.
> >>>
> >>> I passed thp_anon=15K-64K:always, then I got 16K enabled:
> >>>
> >>> / # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/enabled
> >>> [always] inherit madvise never
> >>>
> >>
> >> Okay, so we should document then that start/end of the range must be
> >> valid THP sizes.
> >
> > Ack
> >
> >>
> >>> I can actually check that by:
> >>>
> >>> static inline int get_order_from_str(const char *size_str)
> >>> {
> >>>        unsigned long size;
> >>>        char *endptr;
> >>>        int order;
> >>>
> >>>        size = memparse(size_str, &endptr);
> >>>
> >>>        if (!is_power_of_2(size >> PAGE_SHIFT))
> >>
> >> No need for the shift.
> >>
> >> if (!is_power_of_2(size))
> >>
> >> Is likely even more correct if someone would manage to pass something
> >> stupid like
> >>
> >> 16385 (16K + 1)
> >
> > Ack
> >
> >>
> >>>                goto err;
> >>>        order = get_order(size);
> >>>        if ((1 << order) & ~THP_ORDERS_ALL_ANON)
> >>>                goto err;
> >>>
> >>>        return order;
> >>> err:
> >>>        pr_err("invalid size %s in thp_anon boot parameter\n", size_str);
> >>>        return -EINVAL;
> >>> }
> >>>
> >>>>
> >>>>> +     order = fls(size >> PAGE_SHIFT) - 1;
> >>>>
> >>>> Is this a fancy way of writing
> >>>>
> >>>> order = log2(size >> PAGE_SHIFT);
> >>>>
> >>>> ? :)
> >>>
> >>> I think ilog2 is implemented by fls ?
> >>
> >> Yes, so we should have used that instead. But get_order()
> >> is even better.
> >>
> >>>
> >>>>
> >>>> Anyhow, if get_order() wraps that, all good.
> >>>
> >>> I guess it doesn't check power of 2?
> >>>
> >>>>
> >>>>> +     if ((1 << order) & ~THP_ORDERS_ALL_ANON) {
> >>>>> +             pr_err("invalid size %s(order %d) in thp_anon boot parameter\n",
> >>>>> +                     size_str, order);
> >>>>> +             return -EINVAL;
> >>>>> +     }
> >>>>> +
> >>>>> +     return order;
> >>>>> +}
> >>>>
> >>>> Apart from that, nothing jumped at me.
> >>>
> >>> Please take a look at the new get_order_from_str() before I
> >>> send v5 :-)
> >>
> >> Besides the shift for is_power_of_2(), LGTM, thanks!
> >
> > Thanks, David!
> >
> > Hi Andrew,
> >
> > Apologies for sending another squash request. If you'd
> > prefer me to send a new v5 that includes all the changes,
> > please let me know.
> >
> >
> > Don't shift the size, as it can still detect invalid sizes
> > like 16K+1. Also, document that the size must be a valid THP
> > size.
> >
> > diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> > index 15404f06eefd..4468851b6ecb 100644
> > --- a/Documentation/admin-guide/mm/transhuge.rst
> > +++ b/Documentation/admin-guide/mm/transhuge.rst
> > @@ -294,8 +294,9 @@ kernel command line.
> >
> >   Alternatively, each supported anonymous THP size can be controlled by
> >   passing ``thp_anon=<size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>``,
> > -where ``<size>`` is the THP size and ``<state>`` is one of ``always``,
> > -``madvise``, ``never`` or ``inherit``.
> > +where ``<size>`` is the THP size (must be a power of 2 of PAGE_SIZE and
> > +supported anonymous THP)  and ``<state>`` is one of ``always``, ``madvise``,
> > +``never`` or ``inherit``.
> >
> >   For example, the following will set 16K, 32K, 64K THP to ``always``,
> >   set 128K, 512K to ``inherit``, set 256K to ``madvise`` and 1M, 2M
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index d6dade8ac5f6..903b47f2b2db 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -953,7 +953,7 @@ static inline int get_order_from_str(const char *size_str)
> >
> >       size = memparse(size_str, &endptr);
> >
> > -     if (!is_power_of_2(size >> PAGE_SHIFT))
> > +     if (!is_power_of_2(size))
> >               goto err;
>
>
> Reading your documentation above, do we also want to test "if (size <
> PAGE_SIZE)", or is that implicitly covered? (likely not I assume?)

It is implicitly covered, because we also check that the order is
valid: a size smaller than PAGE_SIZE cannot map to a valid anon THP
order, so it is rejected.

static inline int get_order_from_str(const char *size_str)
{
        unsigned long size;
        char *endptr;
        int order;

        size = memparse(size_str, &endptr);

        if (!is_power_of_2(size))
                goto err;
        order = get_order(size);
        if ((1 << order) & ~THP_ORDERS_ALL_ANON)
                goto err;

        return order;
err:

        pr_err("invalid size %s in thp_anon boot parameter\n", size_str);
        return -EINVAL;
}
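
A userspace sketch of how the order mask covers the boundary cases, assuming the unshifted is_power_of_2(size) check from the squash request. PAGE_SHIFT = 12, PMD_ORDER = 9 and the "orders 2..PMD_ORDER" model of THP_ORDERS_ALL_ANON are assumptions, and order_from_size() is a hypothetical stand-in that takes an already parsed size.

```c
#include <errno.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB base pages */
#define PMD_ORDER  9	/* assumption: 2 MiB PMD */
/* Modeled on the kernel macro: orders 2..PMD_ORDER are valid for anon THP. */
#define THP_ORDERS_ALL_ANON (((1UL << (PMD_ORDER + 1)) - 1) & ~0x3UL)

/* Stand-in for get_order(): smallest order that covers size (rounds up). */
static int get_order_sketch(unsigned long size)
{
	unsigned long pages = (size - 1) >> PAGE_SHIFT;
	int order = 0;

	while (pages) {
		order++;
		pages >>= 1;
	}
	return order;
}

/* Model of get_order_from_str() after memparse() has produced size. */
int order_from_size(unsigned long size)
{
	int order;

	/* Rejects 0 and anything that is not an exact power of two. */
	if (size == 0 || (size & (size - 1)))
		return -EINVAL;

	/* Sizes up to PAGE_SIZE map to order 0, 2 * PAGE_SIZE to order 1. */
	order = get_order_sketch(size);

	/* Orders 0 and 1, and anything above PMD_ORDER, are not in the mask. */
	if ((1UL << order) & ~THP_ORDERS_ALL_ANON)
		return -EINVAL;

	return order;
}
```

With these assumptions, 1K and 4K both map to order 0 and are rejected, 8K maps to order 1 and is rejected, 16K and 2M are accepted as orders 2 and 9, and 4M (order 10) is rejected. No separate "size < PAGE_SIZE" test is needed in this model.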

>
> I assume it's implicitly covered: if we pass "1k" , it would be mapped
> to "4k" (order-0) and that is not a valid mTHP size, right?
>
> I would appreciate a quick v5, just so can see the final result more
> easily :)

sure.

>
> --
> Cheers,
>
> David / dhildenb
>

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f0057bac20fb..d0d141d50638 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6629,6 +6629,15 @@ 
 			<deci-seconds>: poll all this frequency
 			0: no polling (default)
 
+	thp_anon=	[KNL]
+			Format: <size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>
+			state is one of "always", "madvise", "never" or "inherit".
+			Can be used to control the default behavior of the
+			system with respect to anonymous transparent hugepages.
+			Can be used multiple times for multiple anon THP sizes.
+			See Documentation/admin-guide/mm/transhuge.rst for more
+			details.
+
 	threadirqs	[KNL,EARLY]
 			Force threading of all interrupt handlers except those
 			marked explicitly IRQF_NO_THREAD.
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 7072469de8a8..528e1a19d63f 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -284,13 +284,36 @@  that THP is shared. Exceeding the number would block the collapse::
 
 A higher value may increase memory footprint for some workloads.
 
-Boot parameter
-==============
-
-You can change the sysfs boot time defaults of Transparent Hugepage
-Support by passing the parameter ``transparent_hugepage=always`` or
-``transparent_hugepage=madvise`` or ``transparent_hugepage=never``
-to the kernel command line.
+Boot parameters
+===============
+
+You can change the sysfs boot time default for the top-level "enabled"
+control by passing the parameter ``transparent_hugepage=always`` or
+``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` to the
+kernel command line.
+
+Alternatively, each supported anonymous THP size can be controlled by
+passing ``thp_anon=<size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>``,
+where ``<size>`` is the THP size and ``<state>`` is one of ``always``,
+``madvise``, ``never`` or ``inherit``.
+
+For example, the following will set 16K, 32K, 64K THP to ``always``,
+set 128K, 512K to ``inherit``, set 256K to ``madvise`` and 1M, 2M
+to ``never``::
+
+	thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never
+
+``thp_anon=`` may be specified multiple times to configure all THP sizes as
+required. If ``thp_anon=`` is specified at least once, any anon THP sizes
+not explicitly configured on the command line are implicitly set to
+``never``.
+
+``transparent_hugepage`` setting only affects the global toggle. If
+``thp_anon`` is not specified, PMD_ORDER THP will default to ``inherit``.
+However, if a valid ``thp_anon`` setting is provided by the user, the
+PMD_ORDER THP policy will be overridden. If the policy for PMD_ORDER
+is not defined within a valid ``thp_anon``, its policy will default to
+``never``.
 
 Hugepages in tmpfs/shmem
 ========================
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1a12c011e2df..c5f4e97b49de 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -81,6 +81,7 @@  unsigned long huge_zero_pfn __read_mostly = ~0UL;
 unsigned long huge_anon_orders_always __read_mostly;
 unsigned long huge_anon_orders_madvise __read_mostly;
 unsigned long huge_anon_orders_inherit __read_mostly;
+static bool anon_orders_configured;
 
 unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 					 unsigned long vm_flags,
@@ -737,7 +738,10 @@  static int __init hugepage_init_sysfs(struct kobject **hugepage_kobj)
 	 * disable all other sizes. powerpc's PMD_ORDER isn't a compile-time
 	 * constant so we have to do this here.
 	 */
-	huge_anon_orders_inherit = BIT(PMD_ORDER);
+	if (!anon_orders_configured) {
+		huge_anon_orders_inherit = BIT(PMD_ORDER);
+		anon_orders_configured = true;
+	}
 
 	*hugepage_kobj = kobject_create_and_add("transparent_hugepage", mm_kobj);
 	if (unlikely(!*hugepage_kobj)) {
@@ -922,6 +926,96 @@  static int __init setup_transparent_hugepage(char *str)
 }
 __setup("transparent_hugepage=", setup_transparent_hugepage);
 
+static inline int get_order_from_str(const char *size_str)
+{
+	unsigned long size;
+	char *endptr;
+	int order;
+
+	size = memparse(size_str, &endptr);
+	order = fls(size >> PAGE_SHIFT) - 1;
+	if ((1 << order) & ~THP_ORDERS_ALL_ANON) {
+		pr_err("invalid size %s(order %d) in thp_anon boot parameter\n",
+			size_str, order);
+		return -EINVAL;
+	}
+
+	return order;
+}
+
+static char str_dup[PAGE_SIZE] __meminitdata;
+static int __init setup_thp_anon(char *str)
+{
+	char *token, *range, *policy, *subtoken;
+	unsigned long always, inherit, madvise;
+	char *start_size, *end_size;
+	int start, end;
+	char *p;
+
+	if (!str || strlen(str) + 1 > PAGE_SIZE)
+		goto err;
+	strcpy(str_dup, str);
+
+	always = huge_anon_orders_always;
+	madvise = huge_anon_orders_madvise;
+	inherit = huge_anon_orders_inherit;
+	p = str_dup;
+	while ((token = strsep(&p, ";")) != NULL) {
+		range = strsep(&token, ":");
+		policy = token;
+
+		if (!policy)
+			goto err;
+
+		while ((subtoken = strsep(&range, ",")) != NULL) {
+			if (strchr(subtoken, '-')) {
+				start_size = strsep(&subtoken, "-");
+				end_size = subtoken;
+
+				start = get_order_from_str(start_size);
+				end = get_order_from_str(end_size);
+			} else {
+				start = end = get_order_from_str(subtoken);
+			}
+
+			if (start < 0 || end < 0 || start > end)
+				goto err;
+
+			if (!strcmp(policy, "always")) {
+				bitmap_set(&always, start, end - start + 1);
+				bitmap_clear(&inherit, start, end - start + 1);
+				bitmap_clear(&madvise, start, end - start + 1);
+			} else if (!strcmp(policy, "madvise")) {
+				bitmap_set(&madvise, start, end - start + 1);
+				bitmap_clear(&inherit, start, end - start + 1);
+				bitmap_clear(&always, start, end - start + 1);
+			} else if (!strcmp(policy, "inherit")) {
+				bitmap_set(&inherit, start, end - start + 1);
+				bitmap_clear(&madvise, start, end - start + 1);
+				bitmap_clear(&always, start, end - start + 1);
+			} else if (!strcmp(policy, "never")) {
+				bitmap_clear(&inherit, start, end - start + 1);
+				bitmap_clear(&madvise, start, end - start + 1);
+				bitmap_clear(&always, start, end - start + 1);
+			} else {
+				pr_err("invalid policy %s in thp_anon boot parameter\n", policy);
+				goto err;
+			}
+		}
+	}
+
+	huge_anon_orders_always = always;
+	huge_anon_orders_madvise = madvise;
+	huge_anon_orders_inherit = inherit;
+	anon_orders_configured = true;
+	return 1;
+
+err:
+	pr_warn("thp_anon=%s: cannot parse, ignored\n", str);
+	return 0;
+}
+__setup("thp_anon=", setup_thp_anon);
+
 pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 {
 	if (likely(vma->vm_flags & VM_WRITE))
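
For readers who want to experiment with the thp_anon= syntax outside the kernel, the following is a rough userspace sketch of the parsing scheme in setup_thp_anon(). It is not the kernel code: parse_size() stands in for memparse(), next_field() stands in for strsep() with a single delimiter, plain mask arithmetic replaces bitmap_set()/bitmap_clear(), the size check uses the unshifted power-of-2 test discussed in the comments, and PAGE_SHIFT = 12 and PMD_ORDER = 9 are assumptions. All names introduced here are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB base pages */
#define PMD_ORDER  9	/* assumption: 2 MiB PMD */
/* Modeled on the kernel macro: orders 2..PMD_ORDER are valid for anon THP. */
#define THP_ORDERS_ALL_ANON (((1UL << (PMD_ORDER + 1)) - 1) & ~0x3UL)

struct thp_masks {
	unsigned long always, madvise, inherit;
};

/* Stand-in for memparse(): a number with an optional K, M or G suffix. */
static unsigned long parse_size(const char *s)
{
	char *end;
	unsigned long v = strtoul(s, &end, 0);

	if (*end == 'K' || *end == 'k')
		v <<= 10;
	else if (*end == 'M' || *end == 'm')
		v <<= 20;
	else if (*end == 'G' || *end == 'g')
		v <<= 30;
	return v;
}

/* Size string to order, or -EINVAL if it is not a valid anon THP size. */
static int size_to_order(const char *s)
{
	unsigned long size = parse_size(s), pages;
	int order = 0;

	if (size == 0 || (size & (size - 1)))
		return -EINVAL;
	for (pages = (size - 1) >> PAGE_SHIFT; pages; pages >>= 1)
		order++;	/* same result as get_order() */
	if ((1UL << order) & ~THP_ORDERS_ALL_ANON)
		return -EINVAL;
	return order;
}

/* Stand-in for strsep() with one delimiter character. */
static char *next_field(char **sp, char delim)
{
	char *s = *sp, *d;

	if (!s)
		return NULL;
	d = strchr(s, delim);
	if (d)
		*d++ = '\0';
	*sp = d;
	return s;
}

/* Parse one thp_anon= value; masks are updated only if all of it is valid. */
int parse_thp_anon(const char *arg, struct thp_masks *m)
{
	char buf[256], *p = buf, *token, *sub;
	struct thp_masks t = *m;

	if (strlen(arg) >= sizeof(buf))
		return -EINVAL;
	strcpy(buf, arg);

	while ((token = next_field(&p, ';')) != NULL) {
		char *range = next_field(&token, ':');
		char *policy = token;

		if (!policy)
			return -EINVAL;
		while ((sub = next_field(&range, ',')) != NULL) {
			unsigned long bits;
			int start, end;

			if (strchr(sub, '-')) {
				start = size_to_order(next_field(&sub, '-'));
				end = size_to_order(sub);
			} else {
				start = end = size_to_order(sub);
			}
			if (start < 0 || end < 0 || start > end)
				return -EINVAL;

			/* Clear the range everywhere, then set one policy. */
			bits = ((1UL << (end - start + 1)) - 1) << start;
			t.always &= ~bits;
			t.madvise &= ~bits;
			t.inherit &= ~bits;
			if (!strcmp(policy, "always"))
				t.always |= bits;
			else if (!strcmp(policy, "madvise"))
				t.madvise |= bits;
			else if (!strcmp(policy, "inherit"))
				t.inherit |= bits;
			else if (strcmp(policy, "never"))
				return -EINVAL;
		}
	}
	*m = t;
	return 0;
}
```

Starting from all-zero masks, the example string from the commit message (16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never) gives always = 0x1c (orders 2 to 4), madvise = 0x40 (order 6) and inherit = 0xa0 (orders 5 and 7) in this sketch, with orders 8 and 9 clear in every mask, which corresponds to "never".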