mm/vmscan: when the swappiness is set to 0, memory swapping should be prohibited during the global reclaim process

Message ID CAN2Y7hxDdATNfb=R5J1as3pqA1RsP8c8LubC4QxojK5cJS9Q9w@mail.gmail.com (mailing list archive)
State New

Commit Message

ying chen Feb. 27, 2025, 2:34 p.m. UTC
When we use zram as swap disks, global reclaim may cause the memory in some
cgroups with memory.swappiness set to 0 to be swapped into zram. This memory
won't be swapped back immediately after the free memory increases. Instead,
it will continue to occupy the zram space, which may result in no available
zram space for the cgroups with swapping enabled. Therefore, I think that
when the vm.swappiness is set to 0, global reclaim should also refrain
from memory swapping, just like these cgroups.

Signed-off-by: yc1082463 <yc1082463@gmail.com>
---
 mm/vmscan.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

        }
--
2.34.1

Comments

Joshua Hahn Feb. 27, 2025, 3:54 p.m. UTC | #1
On Thu, 27 Feb 2025 22:34:51 +0800 ying chen <yc1082463@gmail.com> wrote:

Hi Ying,

I hope you are having a great day! I wanted to share a few thoughts:

Currently, when the system is under a lot of memory pressure and is
facing OOMs, global reclaim can create space for the system and prevent
going out of memory by swapping, even when swappiness is 0. If this patch
removes that check, it would mean that global reclaim can no longer
"bypass" the swappiness == 0 condition.

I am also CCing Johannes, who is the original author of this section [1],
who clarified in the patch that swappiness == 0 has different meanings for
global reclaim and memory cgroup reclaim.  

> When we use zram as swap disks, global reclaim may cause the memory in some
> cgroups with memory.swappiness set to 0 to be swapped into zram. This memory
> won't be swapped back immediately after the free memory increases. Instead,
> it will continue to occupy the zram space, which may result in no available
> zram space for the cgroups with swapping enabled. Therefore, I think that

IMHO, even with zram, we would still want to allow the system
to reclaim memory & swap out, in case we are facing imminent OOMs. Even if
the memory isn't immediately swapped back in when we are able to manage the
memory spike and see free memory, I imagine that we might not even be able
to manage the spike if we prevent global reclaim from swapping.

These are just some thoughts that I had about the patch. However, my
understanding of zram and reclaim is limited; please feel free to
correct me if you see anything that I am not understanding correctly.

Thank you for your time, have a great day!
Joshua

[1] https://lore.kernel.org/linux-mm/1355767957-4913-4-git-send-email-hannes@cmpxchg.org/

> when the vm.swappiness is set to 0, global reclaim should also refrain
> from memory swapping, just like these cgroups.
> 
> Signed-off-by: yc1082463 <yc1082463@gmail.com>
> ---
>  mm/vmscan.c | 9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c767d71c43d7..bdbb0fc03412 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2426,14 +2426,7 @@ static void get_scan_count(struct lruvec
> *lruvec, struct scan_control *sc,
>                 goto out;
>         }
> 
> -       /*
> -        * Global reclaim will swap to prevent OOM even with no
> -        * swappiness, but memcg users want to use this knob to
> -        * disable swapping for individual groups completely when
> -        * using the memory controller's swap limit feature would be
> -        * too expensive.
> -        */
> -       if (cgroup_reclaim(sc) && !swappiness) {
> +       if (!swappiness) {
>                 scan_balance = SCAN_FILE;
>                 goto out;
>         }
> --
> 2.34.1

Sent using hkml (https://github.com/sjp38/hackermail)
Johannes Weiner Feb. 27, 2025, 4:19 p.m. UTC | #2
Hello,

On Thu, Feb 27, 2025 at 07:54:27AM -0800, Joshua Hahn wrote:
> On Thu, 27 Feb 2025 22:34:51 +0800 ying chen <yc1082463@gmail.com> wrote:

> Previously, when the system is under a lot of memory pressure and is
> facing OOMs, global reclaim can create space for the system and prevent
> going out of memory by swapping, even when swappiness is 0. If this patch
> removes that check, it would mean that global reclaim can no longer
> "bypass" the swappiness == 0 condition.
> 
> I am also CCing Johannes, who is the original author of this section [1],
> who clarified in the patch that swappiness == 0 has different meanings for
> global reclaim and memory cgroup reclaim.

Yes. It's been the behavior for decades that swappiness is merely a
preference, and that the VM *will* swap to avert OOM. You would break
users making this change.

If you want to hard-exempt cgroups, set memory.swap.max=0.

[ Yes, it's inconsistent. But it's really cgroup_reclaim() that is the
  oddball in this. Also for historical reasons... ]
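
A minimal sketch of the memory.swap.max=0 suggestion above, assuming a cgroup v2 hierarchy. The helper names and the cgroup directory passed in are hypothetical; only the control file name memory.swap.max comes from the thread.

```c
#include <stdio.h>

/*
 * Write a value string to a control file.
 * Returns 0 on success, -1 on failure.
 */
static int write_cgroup_value(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputs(value, f) == EOF) {
		fclose(f);
		return -1;
	}
	/* fclose() flushes the buffer, so a rejected write shows up here. */
	return fclose(f) == 0 ? 0 : -1;
}

/*
 * Build "<cgroup_dir>/memory.swap.max" and write "0" to it.
 * cgroup_dir is supplied by the caller (hypothetical path).
 */
static int disable_swap_for_cgroup(const char *cgroup_dir)
{
	char path[4096];
	int n = snprintf(path, sizeof(path), "%s/memory.swap.max", cgroup_dir);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	return write_cgroup_value(path, "0\n");
}
```

A caller would pass the directory of the group to exempt, for example disable_swap_for_cgroup("/sys/fs/cgroup/no-swap-group"), where the group name is made up for illustration.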

> > when the vm.swappiness is set to 0, global reclaim should also refrain
> > from memory swapping, just like these cgroups.
> > 
> > Signed-off-by: yc1082463 <yc1082463@gmail.com>

Nacked-by: Johannes Weiner <hannes@cmpxchg.org>
Shakeel Butt Feb. 27, 2025, 7:12 p.m. UTC | #3
On Thu, Feb 27, 2025 at 10:34:51PM +0800, ying chen wrote:
> When we use zram as swap disks, global reclaim may cause the memory in some
> cgroups with memory.swappiness set to 0 to be swapped into zram. This memory
> won't be swapped back immediately after the free memory increases. Instead,
> it will continue to occupy the zram space, which may result in no available
> zram space for the cgroups with swapping enabled. Therefore, I think that
> when the vm.swappiness is set to 0, global reclaim should also refrain
> from memory swapping, just like these cgroups.
> 
> Signed-off-by: yc1082463 <yc1082463@gmail.com>

It seems like you are still on memcg-v1. What is stopping you from moving to
memcg-v2 and using memory.swap.max = 0?
ying chen Feb. 28, 2025, 2:48 a.m. UTC | #4
Yes, I'm still using memcg-v1. But it's too expensive for us to
migrate the production environment to memcg-v2.

On Fri, Feb 28, 2025 at 3:12 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Thu, Feb 27, 2025 at 10:34:51PM +0800, ying chen wrote:
> > When we use zram as swap disks, global reclaim may cause the memory in some
> > cgroups with memory.swappiness set to 0 to be swapped into zram. This memory
> > won't be swapped back immediately after the free memory increases. Instead,
> > it will continue to occupy the zram space, which may result in no available
> > zram space for the cgroups with swapping enabled. Therefore, I think that
> > when the vm.swappiness is set to 0, global reclaim should also refrain
> > from memory swapping, just like these cgroups.
> >
> > Signed-off-by: yc1082463 <yc1082463@gmail.com>
>
> It seems like you are still on memcg-v1. What is stopping you from moving to
> memcg-v2 and using memory.swap.max = 0?
>
ying chen Feb. 28, 2025, 3:16 a.m. UTC | #5
We only create a zram disk of limited size for the cgroups that are allowed
to swap. If part of this swap space is occupied by other cgroups that are not
allowed to swap, the cgroups that are allowed to swap may not have enough swap
space available.
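
To check how full the zram swap device actually is, one option is to read /proc/swaps, which lists each swap area with its size and usage in KiB. The sketch below is only an illustration; the struct and function names are hypothetical and not part of the patch.

```c
#include <stdio.h>

struct swap_usage {
	char name[256];
	unsigned long size_kib;
	unsigned long used_kib;
};

/*
 * Parse one data line of /proc/swaps (Filename Type Size Used Priority).
 * Returns 0 on success, -1 if the line does not match (e.g. the header).
 */
static int parse_swaps_line(const char *line, struct swap_usage *out)
{
	char type[32];

	if (sscanf(line, "%255s %31s %lu %lu", out->name, type,
		   &out->size_kib, &out->used_kib) != 4)
		return -1;
	return 0;
}

/* Print used/total KiB for every swap area listed in the given file. */
static int print_swap_usage(const char *path)
{
	FILE *f = fopen(path, "r");
	char line[512];
	struct swap_usage u;

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (parse_swaps_line(line, &u) == 0)
			printf("%s: %lu / %lu KiB used\n",
			       u.name, u.used_kib, u.size_kib);
	}
	fclose(f);
	return 0;
}
```

Calling print_swap_usage("/proc/swaps") on the affected host would show whether the zram device is close to full. It does not attribute the usage to individual cgroups.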

On Thu, Feb 27, 2025 at 11:54 PM Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
>
> On Thu, 27 Feb 2025 22:34:51 +0800 ying chen <yc1082463@gmail.com> wrote:
>
> Hi Ying,
>
> I hope you are having a great day! I wanted to share a few thoughts:
>
> Previously, when the system is under a lot of memory pressure and is
> facing OOMs, global reclaim can create space for the system and prevent
> going out of memory by swapping, even when swappiness is 0. If this patch
> removes that check, it would mean that global reclaim can no longer
> "bypass" the swappiness == 0 condition.
>
> I am also CCing Johannes, who is the original author of this section [1],
> who clarified in the patch that swappiness == 0 has different meanings for
> global reclaim and memory cgroup reclaim.
>
> > When we use zram as swap disks, global reclaim may cause the memory in some
> > cgroups with memory.swappiness set to 0 to be swapped into zram. This memory
> > won't be swapped back immediately after the free memory increases. Instead,
> > it will continue to occupy the zram space, which may result in no available
> > zram space for the cgroups with swapping enabled. Therefore, I think that
>
> IMHO, I think that even with zram, we would still want to allow the system
> to reclaim memory & swap out, in case we are facing imminent OOMs. Even if
> the memory isn't immediately swapped back in when we are able to manage the
> memory spike and see free memory, I imagine that we might not even be able
> to manage the spike if we prevent global reclaim from swapping.
>
> These are just some thoughts that I had about the patch. However, my
> understanding of zram and reclaim is limited; please feel free to
> correct me if you see anything that I am not understanding correctly.
>
> Thank you for your time, have a great day!
> Joshua
>
> [1] https://lore.kernel.org/linux-mm/1355767957-4913-4-git-send-email-hannes@cmpxchg.org/
>
> > when the vm.swappiness is set to 0, global reclaim should also refrain
> > from memory swapping, just like these cgroups.
> >
> > Signed-off-by: yc1082463 <yc1082463@gmail.com>
> > ---
> >  mm/vmscan.c | 9 +--------
> >  1 file changed, 1 insertion(+), 8 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index c767d71c43d7..bdbb0fc03412 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2426,14 +2426,7 @@ static void get_scan_count(struct lruvec
> > *lruvec, struct scan_control *sc,
> >                 goto out;
> >         }
> >
> > -       /*
> > -        * Global reclaim will swap to prevent OOM even with no
> > -        * swappiness, but memcg users want to use this knob to
> > -        * disable swapping for individual groups completely when
> > -        * using the memory controller's swap limit feature would be
> > -        * too expensive.
> > -        */
> > -       if (cgroup_reclaim(sc) && !swappiness) {
> > +       if (!swappiness) {
> >                 scan_balance = SCAN_FILE;
> >                 goto out;
> >         }
> > --
> > 2.34.1
>
> Sent using hkml (https://github.com/sjp38/hackermail)
>
ying chen Feb. 28, 2025, 3:18 a.m. UTC | #6
Got it. Thank you very much.

On Fri, Feb 28, 2025 at 12:19 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> Hello,
>
> On Thu, Feb 27, 2025 at 07:54:27AM -0800, Joshua Hahn wrote:
> > On Thu, 27 Feb 2025 22:34:51 +0800 ying chen <yc1082463@gmail.com> wrote:
>
> > Previously, when the system is under a lot of memory pressure and is
> > facing OOMs, global reclaim can create space for the system and prevent
> > going out of memory by swapping, even when swappiness is 0. If this patch
> > removes that check, it would mean that global reclaim can no longer
> > "bypass" the swappiness == 0 condition.
> >
> > I am also CCing Johannes, who is the original author of this section [1],
> > who clarified in the patch that swappiness == 0 has different meanings for
> > global reclaim and memory cgroup reclaim.
>
> Yes. It's been the behavior for decades that swappiness is merely a
> preference, and that the VM *will* swap to avert OOM. You would break
> users making this change.
>
> If you want to hard-exempt cgroups, set memory.swap.max=0.
>
> [ Yes, it's inconsistent. But it's really cgroup_reclaim() that is the
>   oddball in this. Also for historical reasons... ]
>
> > > when the vm.swappiness is set to 0, global reclaim should also refrain
> > > from memory swapping, just like these cgroups.
> > >
> > > Signed-off-by: yc1082463 <yc1082463@gmail.com>
>
> Nacked-by: Johannes Weiner <hannes@cmpxchg.org>
Yafang Shao Feb. 28, 2025, 3:21 a.m. UTC | #7
On Fri, Feb 28, 2025 at 12:19 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> Hello,
>
> On Thu, Feb 27, 2025 at 07:54:27AM -0800, Joshua Hahn wrote:
> > On Thu, 27 Feb 2025 22:34:51 +0800 ying chen <yc1082463@gmail.com> wrote:
>
> > Previously, when the system is under a lot of memory pressure and is
> > facing OOMs, global reclaim can create space for the system and prevent
> > going out of memory by swapping, even when swappiness is 0. If this patch
> > removes that check, it would mean that global reclaim can no longer
> > "bypass" the swappiness == 0 condition.
> >
> > I am also CCing Johannes, who is the original author of this section [1],
> > who clarified in the patch that swappiness == 0 has different meanings for
> > global reclaim and memory cgroup reclaim.
>
> Yes. It's been the behavior for decades that swappiness is merely a
> preference, and that the VM *will* swap to avert OOM. You would break
> users making this change.

Hello Johannes,

How about introducing a new value, vm.swappiness=-1, to disable
swapping for global reclaim?

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 76378bc257e3..4c22352c331c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2387,13 +2387,19 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
        }

        /*
-        * Global reclaim will swap to prevent OOM even with no
-        * swappiness, but memcg users want to use this knob to
-        * disable swapping for individual groups completely when
-        * using the memory controller's swap limit feature would be
-        * too expensive.
+        * swappiness > 0:
+        *   Swapping is enabled for both global reclaim and memcg reclaim.
+        *
+        * swappiness = 0:
+        *   Swapping is completely disabled for individual groups when using
+        *   the memory controller's swap limit feature would be too costly.
+        *
+        * swappiness = -1:
+        *   Swapping is disabled for both global reclaim and memcg reclaim.
+        *   This is useful when you want to enable swapping for certain
+        *   memory cgroups while disabling it for others.
         */
-       if (cgroup_reclaim(sc) && !swappiness) {
+       if ((cgroup_reclaim(sc) && !swappiness) || swappiness == -1) {
                scan_balance = SCAN_FILE;
                goto out;
        }


Other parts of the code will also need to be updated to accommodate
this new swappiness value.
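
For comparison, the three variants of this check discussed in the thread can be modeled as plain userspace predicates. This is a sketch only: the function names are hypothetical, and each one models just the "force SCAN_FILE" test, not the rest of get_scan_count(). A false result means this particular check does not rule out anon scanning, not that swapping will necessarily happen.

```c
#include <stdbool.h>

/* Current check: only cgroup (memcg) reclaim treats 0 as "never swap". */
static bool force_file_only_current(bool cgroup_reclaim, int swappiness)
{
	return cgroup_reclaim && swappiness == 0;
}

/* Original patch in this thread: 0 disables swap for global reclaim too. */
static bool force_file_only_patched(bool cgroup_reclaim, int swappiness)
{
	(void)cgroup_reclaim;	/* reclaim type no longer matters */
	return swappiness == 0;
}

/* Proposal above: keep the meaning of 0, add -1 as a hard "no swap". */
static bool force_file_only_minus_one(bool cgroup_reclaim, int swappiness)
{
	return (cgroup_reclaim && swappiness == 0) || swappiness == -1;
}
```

With global reclaim and swappiness 0, only the patched variant returns true, which is the behavior change that was objected to. The -1 variant leaves every existing value unchanged and adds one new case.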

>
> If you want to hard-exempt cgroups, set memory.swap.max=0.

This does not apply to the root memcg.

>
> [ Yes, it's inconsistent. But it's really cgroup_reclaim() that is the
>   oddball in this. Also for historical reasons... ]
>
> > > when the vm.swappiness is set to 0, global reclaim should also refrain
> > > from memory swapping, just like these cgroups.
> > >
> > > Signed-off-by: yc1082463 <yc1082463@gmail.com>
>
> Nacked-by: Johannes Weiner <hannes@cmpxchg.org>
>
Michal Hocko March 3, 2025, 12:07 p.m. UTC | #8
On Thu 27-02-25 22:34:51, ying chen wrote:
> When we use zram as swap disks, global reclaim may cause the memory in some
> cgroups with memory.swappiness set to 0 to be swapped into zram. This memory
> won't be swapped back immediately after the free memory increases. Instead,
> it will continue to occupy the zram space, which may result in no available
> zram space for the cgroups with swapping enabled. Therefore, I think that
> when the vm.swappiness is set to 0, global reclaim should also refrain
> from memory swapping, just like these cgroups.

You are changing well-established and well-understood semantics while working
around a problem that is not really clear to me. If the zram space is
limited, then you should be using swap limits to control who can swap
out, no?

> Signed-off-by: yc1082463 <yc1082463@gmail.com>
> ---
>  mm/vmscan.c | 9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c767d71c43d7..bdbb0fc03412 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2426,14 +2426,7 @@ static void get_scan_count(struct lruvec
> *lruvec, struct scan_control *sc,
>                 goto out;
>         }
> 
> -       /*
> -        * Global reclaim will swap to prevent OOM even with no
> -        * swappiness, but memcg users want to use this knob to
> -        * disable swapping for individual groups completely when
> -        * using the memory controller's swap limit feature would be
> -        * too expensive.
> -        */
> -       if (cgroup_reclaim(sc) && !swappiness) {
> +       if (!swappiness) {
>                 scan_balance = SCAN_FILE;
>                 goto out;
>         }
> --
> 2.34.1

Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c767d71c43d7..bdbb0fc03412 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2426,14 +2426,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
                goto out;
        }

-       /*
-        * Global reclaim will swap to prevent OOM even with no
-        * swappiness, but memcg users want to use this knob to
-        * disable swapping for individual groups completely when
-        * using the memory controller's swap limit feature would be
-        * too expensive.
-        */
-       if (cgroup_reclaim(sc) && !swappiness) {
+       if (!swappiness) {
                scan_balance = SCAN_FILE;
                goto out;