diff mbox

mm: avoid bothering interrupted task when charge memcg in softirq

Message ID 1531557122-12540-1-git-send-email-laoar.shao@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Yafang Shao July 14, 2018, 8:32 a.m. UTC
try_charge maybe executed in packet receive path, which is in interrupt
context.
In this situation, the 'current' is the interrupted task, which may has
no relation to the rx softirq, So it is nonsense to use 'current'.

Avoid bothering the interrupted if page_counter_try_charge failes.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 mm/memcontrol.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Shakeel Butt July 14, 2018, 3:38 p.m. UTC | #1
On Sat, Jul 14, 2018 at 1:32 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> try_charge maybe executed in packet receive path, which is in interrupt
> context.
> In this situation, the 'current' is the interrupted task, which may has
> no relation to the rx softirq, So it is nonsense to use 'current'.
>

Have you actually seen this occurring? I am not very familiar with the
network code but I can think of two ways try_charge() can be called
from network code. Either through kmem charging or through
mem_cgroup_charge_skmem() and both locations correctly handle
interrupt context.

Shakeel
Yafang Shao July 15, 2018, 2:10 a.m. UTC | #2
On Sat, Jul 14, 2018 at 11:38 PM, Shakeel Butt <shakeelb@google.com> wrote:
> On Sat, Jul 14, 2018 at 1:32 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>>
>> try_charge maybe executed in packet receive path, which is in interrupt
>> context.
>> In this situation, the 'current' is the interrupted task, which may has
>> no relation to the rx softirq, So it is nonsense to use 'current'.
>>
>
> Have you actually seen this occurring?

Hi Shakeel,

I'm trying to produce this issue, but haven't find it occur yet.

> I am not very familiar with the
> network code but I can think of two ways try_charge() can be called
> from network code. Either through kmem charging or through
> mem_cgroup_charge_skmem() and both locations correctly handle
> interrupt context.
>

Why do you say that mem_cgroup_charge_skmem() correctly hanle
interrupt context ?

Let me show you why mem_cgroup_charge_skmem isn't hanling interrupt
context correctly.

mem_cgroup_charge_skmem() is calling  try_charge() twice.
The first one is with GFP_NOWAIT as the gfp_mask, and the second one
is with  (GFP_NOWAIT |  __GFP_NOFAIL) as the gfp_mask.

If page_counter_try_charge() failes at the first time, -ENOMEM is returned.
Then mem_cgroup_charge_skmem() will call try_charge() once more with
__GFP_NOFAIL set, and this time if If page_counter_try_charge() failes
again the '
force' label in  try_charge() will be executed and 0 is returned.

No matter what, the 'current' will be used and touched, that is
meaning mem_cgroup_charge_skmem() isn't hanling the interrupt context
correctly.

Pls. let me know if I miss something.


Thanks
Yafang
Yafang Shao July 15, 2018, 2:25 a.m. UTC | #3
On Sun, Jul 15, 2018 at 10:10 AM, Yafang Shao <laoar.shao@gmail.com> wrote:
> On Sat, Jul 14, 2018 at 11:38 PM, Shakeel Butt <shakeelb@google.com> wrote:
>> On Sat, Jul 14, 2018 at 1:32 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>>>
>>> try_charge maybe executed in packet receive path, which is in interrupt
>>> context.
>>> In this situation, the 'current' is the interrupted task, which may has
>>> no relation to the rx softirq, So it is nonsense to use 'current'.
>>>
>>
>> Have you actually seen this occurring?
>
> Hi Shakeel,
>
> I'm trying to produce this issue, but haven't find it occur yet.
>
>> I am not very familiar with the
>> network code but I can think of two ways try_charge() can be called
>> from network code. Either through kmem charging or through
>> mem_cgroup_charge_skmem() and both locations correctly handle
>> interrupt context.
>>
>
> Why do you say that mem_cgroup_charge_skmem() correctly hanle
> interrupt context ?
>
> Let me show you why mem_cgroup_charge_skmem isn't hanling interrupt
> context correctly.
>
> mem_cgroup_charge_skmem() is calling  try_charge() twice.
> The first one is with GFP_NOWAIT as the gfp_mask, and the second one
> is with  (GFP_NOWAIT |  __GFP_NOFAIL) as the gfp_mask.
>
> If page_counter_try_charge() failes at the first time, -ENOMEM is returned.
> Then mem_cgroup_charge_skmem() will call try_charge() once more with
> __GFP_NOFAIL set, and this time if If page_counter_try_charge() failes
> again the '
> force' label in  try_charge() will be executed and 0 is returned.
>
> No matter what, the 'current' will be used and touched, that is
> meaning mem_cgroup_charge_skmem() isn't hanling the interrupt context
> correctly.
>
> Pls. let me know if I miss something.
>
>

Maybe bellow change is better,
@@ -2123,6 +2123,9 @@ static int try_charge(struct mem_cgroup *memcg,
gfp_t gfp_mask,
                goto retry;
        }

+       if (!gfpflags_allow_blocking(gfp_mask))
+               goto nomem;
+
        /*
         * Unlike in global OOM situations, memcg is not in a physical
         * memory shortage.  Allow dying and OOM-killed tasks to
@@ -2146,9 +2149,6 @@ static int try_charge(struct mem_cgroup *memcg,
gfp_t gfp_mask,
        if (unlikely(task_in_memcg_oom(current)))
                goto nomem;

-       if (!gfpflags_allow_blocking(gfp_mask))
-               goto nomem;

Thanks
Yafang
Shakeel Butt July 15, 2018, 4:25 a.m. UTC | #4
On Sat, Jul 14, 2018 at 7:10 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Sat, Jul 14, 2018 at 11:38 PM, Shakeel Butt <shakeelb@google.com> wrote:
> > On Sat, Jul 14, 2018 at 1:32 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >>
> >> try_charge maybe executed in packet receive path, which is in interrupt
> >> context.
> >> In this situation, the 'current' is the interrupted task, which may has
> >> no relation to the rx softirq, So it is nonsense to use 'current'.
> >>
> >
> > Have you actually seen this occurring?
>
> Hi Shakeel,
>
> I'm trying to produce this issue, but haven't find it occur yet.
>
> > I am not very familiar with the
> > network code but I can think of two ways try_charge() can be called
> > from network code. Either through kmem charging or through
> > mem_cgroup_charge_skmem() and both locations correctly handle
> > interrupt context.
> >
>
> Why do you say that mem_cgroup_charge_skmem() correctly hanle
> interrupt context ?
>
> Let me show you why mem_cgroup_charge_skmem isn't hanling interrupt
> context correctly.
>
> mem_cgroup_charge_skmem() is calling  try_charge() twice.
> The first one is with GFP_NOWAIT as the gfp_mask, and the second one
> is with  (GFP_NOWAIT |  __GFP_NOFAIL) as the gfp_mask.
>
> If page_counter_try_charge() failes at the first time, -ENOMEM is returned.
> Then mem_cgroup_charge_skmem() will call try_charge() once more with
> __GFP_NOFAIL set, and this time if If page_counter_try_charge() failes
> again the '
> force' label in  try_charge() will be executed and 0 is returned.
>
> No matter what, the 'current' will be used and touched, that is
> meaning mem_cgroup_charge_skmem() isn't hanling the interrupt context
> correctly.
>

Hi Yafang,

If you check mem_cgroup_charge_skmem(), the memcg passed is not
'current' but is from the sock object i.e. sk->sk_memcg for which the
network buffer is allocated for.

If the network buffers is allocated through kmem interface, the
charging is bypassed altogether (through memcg_kmem_bypass()) for
interrupt context.

regards,
Shakeel
Yafang Shao July 15, 2018, 5:25 a.m. UTC | #5
On Sun, Jul 15, 2018 at 12:25 PM, Shakeel Butt <shakeelb@google.com> wrote:
> On Sat, Jul 14, 2018 at 7:10 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>>
>> On Sat, Jul 14, 2018 at 11:38 PM, Shakeel Butt <shakeelb@google.com> wrote:
>> > On Sat, Jul 14, 2018 at 1:32 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>> >>
>> >> try_charge maybe executed in packet receive path, which is in interrupt
>> >> context.
>> >> In this situation, the 'current' is the interrupted task, which may has
>> >> no relation to the rx softirq, So it is nonsense to use 'current'.
>> >>
>> >
>> > Have you actually seen this occurring?
>>
>> Hi Shakeel,
>>
>> I'm trying to produce this issue, but haven't find it occur yet.
>>
>> > I am not very familiar with the
>> > network code but I can think of two ways try_charge() can be called
>> > from network code. Either through kmem charging or through
>> > mem_cgroup_charge_skmem() and both locations correctly handle
>> > interrupt context.
>> >
>>
>> Why do you say that mem_cgroup_charge_skmem() correctly hanle
>> interrupt context ?
>>
>> Let me show you why mem_cgroup_charge_skmem isn't hanling interrupt
>> context correctly.
>>
>> mem_cgroup_charge_skmem() is calling  try_charge() twice.
>> The first one is with GFP_NOWAIT as the gfp_mask, and the second one
>> is with  (GFP_NOWAIT |  __GFP_NOFAIL) as the gfp_mask.
>>
>> If page_counter_try_charge() failes at the first time, -ENOMEM is returned.
>> Then mem_cgroup_charge_skmem() will call try_charge() once more with
>> __GFP_NOFAIL set, and this time if If page_counter_try_charge() failes
>> again the '
>> force' label in  try_charge() will be executed and 0 is returned.
>>
>> No matter what, the 'current' will be used and touched, that is
>> meaning mem_cgroup_charge_skmem() isn't hanling the interrupt context
>> correctly.
>>
>
> Hi Yafang,
>
> If you check mem_cgroup_charge_skmem(), the memcg passed is not
> 'current' but is from the sock object i.e. sk->sk_memcg for which the
> network buffer is allocated for.
>

That's correct, the memcg if from the sock object.
But the point is, in this situation why 'current' is used in try_charge() ?
As 'current' is not related with the memcg, which is just a interrupted task.


> If the network buffers is allocated through kmem interface, the
> charging is bypassed altogether (through memcg_kmem_bypass()) for
> interrupt context.
>

Yes.

Thanks
Yafang
Shakeel Butt July 15, 2018, 6:34 a.m. UTC | #6
On Sat, Jul 14, 2018 at 10:26 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Sun, Jul 15, 2018 at 12:25 PM, Shakeel Butt <shakeelb@google.com> wrote:
> > On Sat, Jul 14, 2018 at 7:10 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >>
> >> On Sat, Jul 14, 2018 at 11:38 PM, Shakeel Butt <shakeelb@google.com> wrote:
> >> > On Sat, Jul 14, 2018 at 1:32 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >> >>
> >> >> try_charge maybe executed in packet receive path, which is in interrupt
> >> >> context.
> >> >> In this situation, the 'current' is the interrupted task, which may has
> >> >> no relation to the rx softirq, So it is nonsense to use 'current'.
> >> >>
> >> >
> >> > Have you actually seen this occurring?
> >>
> >> Hi Shakeel,
> >>
> >> I'm trying to produce this issue, but haven't find it occur yet.
> >>
> >> > I am not very familiar with the
> >> > network code but I can think of two ways try_charge() can be called
> >> > from network code. Either through kmem charging or through
> >> > mem_cgroup_charge_skmem() and both locations correctly handle
> >> > interrupt context.
> >> >
> >>
> >> Why do you say that mem_cgroup_charge_skmem() correctly hanle
> >> interrupt context ?
> >>
> >> Let me show you why mem_cgroup_charge_skmem isn't hanling interrupt
> >> context correctly.
> >>
> >> mem_cgroup_charge_skmem() is calling  try_charge() twice.
> >> The first one is with GFP_NOWAIT as the gfp_mask, and the second one
> >> is with  (GFP_NOWAIT |  __GFP_NOFAIL) as the gfp_mask.
> >>
> >> If page_counter_try_charge() failes at the first time, -ENOMEM is returned.
> >> Then mem_cgroup_charge_skmem() will call try_charge() once more with
> >> __GFP_NOFAIL set, and this time if If page_counter_try_charge() failes
> >> again the '
> >> force' label in  try_charge() will be executed and 0 is returned.
> >>
> >> No matter what, the 'current' will be used and touched, that is
> >> meaning mem_cgroup_charge_skmem() isn't hanling the interrupt context
> >> correctly.
> >>
> >
> > Hi Yafang,
> >
> > If you check mem_cgroup_charge_skmem(), the memcg passed is not
> > 'current' but is from the sock object i.e. sk->sk_memcg for which the
> > network buffer is allocated for.
> >
>
> That's correct, the memcg if from the sock object.
> But the point is, in this situation why 'current' is used in try_charge() ?
> As 'current' is not related with the memcg, which is just a interrupted task.
>

Hmm so you mean the behavior of memcg charging in the interrupt
context depends on the state of the interrupted task. As you have
noted, mem_cgroup_charge_skmem() tries charging again with
__GFP_NOFAIL and the charge succeeds. Basically the memcg charging by
mem_cgroup_charge_skmem() will always succeed irrespective of the
state of the interrupted task. However mem_cgroup_charge_skmem() can
return true if the interrupted task was exiting or a fatal signal is
pending or oom victim or reclaiming memory. Can you please explain why
this is bad?

Shakeel
Yafang Shao July 15, 2018, 8:01 a.m. UTC | #7
On Sun, Jul 15, 2018 at 2:34 PM, Shakeel Butt <shakeelb@google.com> wrote:
> On Sat, Jul 14, 2018 at 10:26 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>>
>> On Sun, Jul 15, 2018 at 12:25 PM, Shakeel Butt <shakeelb@google.com> wrote:
>> > On Sat, Jul 14, 2018 at 7:10 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>> >>
>> >> On Sat, Jul 14, 2018 at 11:38 PM, Shakeel Butt <shakeelb@google.com> wrote:
>> >> > On Sat, Jul 14, 2018 at 1:32 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>> >> >>
>> >> >> try_charge maybe executed in packet receive path, which is in interrupt
>> >> >> context.
>> >> >> In this situation, the 'current' is the interrupted task, which may has
>> >> >> no relation to the rx softirq, So it is nonsense to use 'current'.
>> >> >>
>> >> >
>> >> > Have you actually seen this occurring?
>> >>
>> >> Hi Shakeel,
>> >>
>> >> I'm trying to produce this issue, but haven't find it occur yet.
>> >>
>> >> > I am not very familiar with the
>> >> > network code but I can think of two ways try_charge() can be called
>> >> > from network code. Either through kmem charging or through
>> >> > mem_cgroup_charge_skmem() and both locations correctly handle
>> >> > interrupt context.
>> >> >
>> >>
>> >> Why do you say that mem_cgroup_charge_skmem() correctly hanle
>> >> interrupt context ?
>> >>
>> >> Let me show you why mem_cgroup_charge_skmem isn't hanling interrupt
>> >> context correctly.
>> >>
>> >> mem_cgroup_charge_skmem() is calling  try_charge() twice.
>> >> The first one is with GFP_NOWAIT as the gfp_mask, and the second one
>> >> is with  (GFP_NOWAIT |  __GFP_NOFAIL) as the gfp_mask.
>> >>
>> >> If page_counter_try_charge() failes at the first time, -ENOMEM is returned.
>> >> Then mem_cgroup_charge_skmem() will call try_charge() once more with
>> >> __GFP_NOFAIL set, and this time if If page_counter_try_charge() failes
>> >> again the '
>> >> force' label in  try_charge() will be executed and 0 is returned.
>> >>
>> >> No matter what, the 'current' will be used and touched, that is
>> >> meaning mem_cgroup_charge_skmem() isn't hanling the interrupt context
>> >> correctly.
>> >>
>> >
>> > Hi Yafang,
>> >
>> > If you check mem_cgroup_charge_skmem(), the memcg passed is not
>> > 'current' but is from the sock object i.e. sk->sk_memcg for which the
>> > network buffer is allocated for.
>> >
>>
>> That's correct, the memcg if from the sock object.
>> But the point is, in this situation why 'current' is used in try_charge() ?
>> As 'current' is not related with the memcg, which is just a interrupted task.
>>
>
> Hmm so you mean the behavior of memcg charging in the interrupt
> context depends on the state of the interrupted task.

Yes.

> As you have
> noted, mem_cgroup_charge_skmem() tries charging again with
> __GFP_NOFAIL and the charge succeeds. Basically the memcg charging by
> mem_cgroup_charge_skmem() will always succeed irrespective of the
> state of the interrupted task. However mem_cgroup_charge_skmem() can
> return true if the interrupted task was exiting or a fatal signal is
> pending or oom victim or reclaiming memory. Can you please explain why
> this is bad?
>

Let me show you the possible issues cause by this behavoir.
1.  In mem_cgroup_oom(), some  members in 'current' is set.
     That means an innocent task will be in  task_in_memcg_oom state.
     But this task may be in a different memcg, I mean the memcg of
the 'current' may be differenct with the sk->sk_memcg.
     Then when this innocent 'current' do try_charge it will hit  "if
(unlikely(task_in_memcg_oom(current)))" and  -ENOMEM is returned,
While there're maybe some free memory (or some memory could be freed )
in the memcg of the innocent 'task'.

2.  If the interrupted task was exiting or a fatal signal is  pending
or oom victim,
     it will directly goto force and 0 is returned, and then
mem_cgroup_charge_skmem() will return true.
     But mem_cgroup_charge_skmem() maybe need to try the second time
and return false.

That are all unexpected behavoir.

At least we must judge that whether the memcg of 'current' is same
with sk->sk_memcg if we still want to use current here.

Thanks
Yafang
Shakeel Butt July 15, 2018, 3:04 p.m. UTC | #8
On Sun, Jul 15, 2018 at 1:02 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Sun, Jul 15, 2018 at 2:34 PM, Shakeel Butt <shakeelb@google.com> wrote:
> > On Sat, Jul 14, 2018 at 10:26 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >>
> >> On Sun, Jul 15, 2018 at 12:25 PM, Shakeel Butt <shakeelb@google.com> wrote:
> >> > On Sat, Jul 14, 2018 at 7:10 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >> >>
> >> >> On Sat, Jul 14, 2018 at 11:38 PM, Shakeel Butt <shakeelb@google.com> wrote:
> >> >> > On Sat, Jul 14, 2018 at 1:32 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >> >> >>
> >> >> >> try_charge maybe executed in packet receive path, which is in interrupt
> >> >> >> context.
> >> >> >> In this situation, the 'current' is the interrupted task, which may has
> >> >> >> no relation to the rx softirq, So it is nonsense to use 'current'.
> >> >> >>
> >> >> >
> >> >> > Have you actually seen this occurring?
> >> >>
> >> >> Hi Shakeel,
> >> >>
> >> >> I'm trying to produce this issue, but haven't find it occur yet.
> >> >>
> >> >> > I am not very familiar with the
> >> >> > network code but I can think of two ways try_charge() can be called
> >> >> > from network code. Either through kmem charging or through
> >> >> > mem_cgroup_charge_skmem() and both locations correctly handle
> >> >> > interrupt context.
> >> >> >
> >> >>
> >> >> Why do you say that mem_cgroup_charge_skmem() correctly hanle
> >> >> interrupt context ?
> >> >>
> >> >> Let me show you why mem_cgroup_charge_skmem isn't hanling interrupt
> >> >> context correctly.
> >> >>
> >> >> mem_cgroup_charge_skmem() is calling  try_charge() twice.
> >> >> The first one is with GFP_NOWAIT as the gfp_mask, and the second one
> >> >> is with  (GFP_NOWAIT |  __GFP_NOFAIL) as the gfp_mask.
> >> >>
> >> >> If page_counter_try_charge() failes at the first time, -ENOMEM is returned.
> >> >> Then mem_cgroup_charge_skmem() will call try_charge() once more with
> >> >> __GFP_NOFAIL set, and this time if If page_counter_try_charge() failes
> >> >> again the '
> >> >> force' label in  try_charge() will be executed and 0 is returned.
> >> >>
> >> >> No matter what, the 'current' will be used and touched, that is
> >> >> meaning mem_cgroup_charge_skmem() isn't hanling the interrupt context
> >> >> correctly.
> >> >>
> >> >
> >> > Hi Yafang,
> >> >
> >> > If you check mem_cgroup_charge_skmem(), the memcg passed is not
> >> > 'current' but is from the sock object i.e. sk->sk_memcg for which the
> >> > network buffer is allocated for.
> >> >
> >>
> >> That's correct, the memcg if from the sock object.
> >> But the point is, in this situation why 'current' is used in try_charge() ?
> >> As 'current' is not related with the memcg, which is just a interrupted task.
> >>
> >
> > Hmm so you mean the behavior of memcg charging in the interrupt
> > context depends on the state of the interrupted task.
>
> Yes.
>
> > As you have
> > noted, mem_cgroup_charge_skmem() tries charging again with
> > __GFP_NOFAIL and the charge succeeds. Basically the memcg charging by
> > mem_cgroup_charge_skmem() will always succeed irrespective of the
> > state of the interrupted task. However mem_cgroup_charge_skmem() can
> > return true if the interrupted task was exiting or a fatal signal is
> > pending or oom victim or reclaiming memory. Can you please explain why
> > this is bad?
> >
>
> Let me show you the possible issues cause by this behavoir.
> 1.  In mem_cgroup_oom(), some  members in 'current' is set.
>      That means an innocent task will be in  task_in_memcg_oom state.
>      But this task may be in a different memcg, I mean the memcg of
> the 'current' may be differenct with the sk->sk_memcg.
>      Then when this innocent 'current' do try_charge it will hit  "if
> (unlikely(task_in_memcg_oom(current)))" and  -ENOMEM is returned,
> While there're maybe some free memory (or some memory could be freed )
> in the memcg of the innocent 'task'.
>

No memory will be freed as try_charge() is in interrupt context.

> 2.  If the interrupted task was exiting or a fatal signal is  pending
> or oom victim,
>      it will directly goto force and 0 is returned, and then
> mem_cgroup_charge_skmem() will return true.
>      But mem_cgroup_charge_skmem() maybe need to try the second time
> and return false.
>
> That are all unexpected behavoir.
>

Yes, this is inconsistent behavior. Can you explain how this will
affect network traffic? Basically mem_cgroup_charge_skmem() was
supposed to return false but sometime based on the interrupted task,
mem_cgroup_charge_skmem() returns true. How is this behavior bad for
network traffic?

Please note that I am not against this patch. I just want that the
motivation/reason behind it is very clear.

thanks,
Shakeel
Yafang Shao July 16, 2018, 1:49 a.m. UTC | #9
On Sun, Jul 15, 2018 at 11:04 PM, Shakeel Butt <shakeelb@google.com> wrote:
> On Sun, Jul 15, 2018 at 1:02 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>>
>> On Sun, Jul 15, 2018 at 2:34 PM, Shakeel Butt <shakeelb@google.com> wrote:
>> > On Sat, Jul 14, 2018 at 10:26 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>> >>
>> >> On Sun, Jul 15, 2018 at 12:25 PM, Shakeel Butt <shakeelb@google.com> wrote:
>> >> > On Sat, Jul 14, 2018 at 7:10 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>> >> >>
>> >> >> On Sat, Jul 14, 2018 at 11:38 PM, Shakeel Butt <shakeelb@google.com> wrote:
>> >> >> > On Sat, Jul 14, 2018 at 1:32 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>> >> >> >>
>> >> >> >> try_charge maybe executed in packet receive path, which is in interrupt
>> >> >> >> context.
>> >> >> >> In this situation, the 'current' is the interrupted task, which may has
>> >> >> >> no relation to the rx softirq, So it is nonsense to use 'current'.
>> >> >> >>
>> >> >> >
>> >> >> > Have you actually seen this occurring?
>> >> >>
>> >> >> Hi Shakeel,
>> >> >>
>> >> >> I'm trying to produce this issue, but haven't find it occur yet.
>> >> >>
>> >> >> > I am not very familiar with the
>> >> >> > network code but I can think of two ways try_charge() can be called
>> >> >> > from network code. Either through kmem charging or through
>> >> >> > mem_cgroup_charge_skmem() and both locations correctly handle
>> >> >> > interrupt context.
>> >> >> >
>> >> >>
>> >> >> Why do you say that mem_cgroup_charge_skmem() correctly hanle
>> >> >> interrupt context ?
>> >> >>
>> >> >> Let me show you why mem_cgroup_charge_skmem isn't hanling interrupt
>> >> >> context correctly.
>> >> >>
>> >> >> mem_cgroup_charge_skmem() is calling  try_charge() twice.
>> >> >> The first one is with GFP_NOWAIT as the gfp_mask, and the second one
>> >> >> is with  (GFP_NOWAIT |  __GFP_NOFAIL) as the gfp_mask.
>> >> >>
>> >> >> If page_counter_try_charge() failes at the first time, -ENOMEM is returned.
>> >> >> Then mem_cgroup_charge_skmem() will call try_charge() once more with
>> >> >> __GFP_NOFAIL set, and this time if If page_counter_try_charge() failes
>> >> >> again the '
>> >> >> force' label in  try_charge() will be executed and 0 is returned.
>> >> >>
>> >> >> No matter what, the 'current' will be used and touched, that is
>> >> >> meaning mem_cgroup_charge_skmem() isn't hanling the interrupt context
>> >> >> correctly.
>> >> >>
>> >> >
>> >> > Hi Yafang,
>> >> >
>> >> > If you check mem_cgroup_charge_skmem(), the memcg passed is not
>> >> > 'current' but is from the sock object i.e. sk->sk_memcg for which the
>> >> > network buffer is allocated for.
>> >> >
>> >>
>> >> That's correct, the memcg if from the sock object.
>> >> But the point is, in this situation why 'current' is used in try_charge() ?
>> >> As 'current' is not related with the memcg, which is just a interrupted task.
>> >>
>> >
>> > Hmm so you mean the behavior of memcg charging in the interrupt
>> > context depends on the state of the interrupted task.
>>
>> Yes.
>>
>> > As you have
>> > noted, mem_cgroup_charge_skmem() tries charging again with
>> > __GFP_NOFAIL and the charge succeeds. Basically the memcg charging by
>> > mem_cgroup_charge_skmem() will always succeed irrespective of the
>> > state of the interrupted task. However mem_cgroup_charge_skmem() can
>> > return true if the interrupted task was exiting or a fatal signal is
>> > pending or oom victim or reclaiming memory. Can you please explain why
>> > this is bad?
>> >
>>
>> Let me show you the possible issues cause by this behavoir.
>> 1.  In mem_cgroup_oom(), some  members in 'current' is set.
>>      That means an innocent task will be in  task_in_memcg_oom state.
>>      But this task may be in a different memcg, I mean the memcg of
>> the 'current' may be differenct with the sk->sk_memcg.
>>      Then when this innocent 'current' do try_charge it will hit  "if
>> (unlikely(task_in_memcg_oom(current)))" and  -ENOMEM is returned,
>> While there're maybe some free memory (or some memory could be freed )
>> in the memcg of the innocent 'task'.
>>
>
> No memory will be freed as try_charge() is in interrupt context.
>

I mean when this interrupted 'current' is running, that's in process context.
In process context it should call try_to_free_mem_cgroup_pages() to
free some memory,
but it will hit "if (unlikely(task_in_memcg_oom(current)))"  before as
it is set in the interrupt context.

That's an obviously issue. Do you understand ?

>> 2.  If the interrupted task was exiting or a fatal signal is  pending
>> or oom victim,
>>      it will directly goto force and 0 is returned, and then
>> mem_cgroup_charge_skmem() will return true.
>>      But mem_cgroup_charge_skmem() maybe need to try the second time
>> and return false.
>>
>> That are all unexpected behavoir.
>>
>
> Yes, this is inconsistent behavior. Can you explain how this will
> affect network traffic? Basically mem_cgroup_charge_skmem() was
> supposed to return false but sometime based on the interrupted task,
> mem_cgroup_charge_skmem() returns true. How is this behavior bad for
> network traffic?
>

You could see the funtion  __sk_mem_raise_allocated().
If mem_cgroup_charge_skmem() return false, it will goto
suppress_allocation and uncharge skmem,
while when mem_cgroup_charge_skmem() return true, it will charge skmem
sucessfully.

The consequence behavior is  sk_rmem_schedule may fail while it should sucess.
And then it will call  tcp_prune_queue() and tcp collapse may take a long time.

> Please note that I am not against this patch. I just want that the
> motivation/reason behind it is very clear.
>

As I have explained before, there're two motivations.
One is the random interrupted task may fail to charge when it is
scheduled to run.
The other one is it may take long time to receive this skb.

Thanks
Yafang
Shakeel Butt July 16, 2018, 3:09 a.m. UTC | #10
On Sun, Jul 15, 2018 at 6:50 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Sun, Jul 15, 2018 at 11:04 PM, Shakeel Butt <shakeelb@google.com> wrote:
> > On Sun, Jul 15, 2018 at 1:02 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >>
> >> On Sun, Jul 15, 2018 at 2:34 PM, Shakeel Butt <shakeelb@google.com> wrote:
> >> > On Sat, Jul 14, 2018 at 10:26 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >> >>
> >> >> On Sun, Jul 15, 2018 at 12:25 PM, Shakeel Butt <shakeelb@google.com> wrote:
> >> >> > On Sat, Jul 14, 2018 at 7:10 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >> >> >>
> >> >> >> On Sat, Jul 14, 2018 at 11:38 PM, Shakeel Butt <shakeelb@google.com> wrote:
> >> >> >> > On Sat, Jul 14, 2018 at 1:32 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >> >> >> >>
> >> >> >> >> try_charge maybe executed in packet receive path, which is in interrupt
> >> >> >> >> context.
> >> >> >> >> In this situation, the 'current' is the interrupted task, which may has
> >> >> >> >> no relation to the rx softirq, So it is nonsense to use 'current'.
> >> >> >> >>
> >> >> >> >
> >> >> >> > Have you actually seen this occurring?
> >> >> >>
> >> >> >> Hi Shakeel,
> >> >> >>
> >> >> >> I'm trying to produce this issue, but haven't find it occur yet.
> >> >> >>
> >> >> >> > I am not very familiar with the
> >> >> >> > network code but I can think of two ways try_charge() can be called
> >> >> >> > from network code. Either through kmem charging or through
> >> >> >> > mem_cgroup_charge_skmem() and both locations correctly handle
> >> >> >> > interrupt context.
> >> >> >> >
> >> >> >>
> >> >> >> Why do you say that mem_cgroup_charge_skmem() correctly hanle
> >> >> >> interrupt context ?
> >> >> >>
> >> >> >> Let me show you why mem_cgroup_charge_skmem isn't hanling interrupt
> >> >> >> context correctly.
> >> >> >>
> >> >> >> mem_cgroup_charge_skmem() is calling  try_charge() twice.
> >> >> >> The first one is with GFP_NOWAIT as the gfp_mask, and the second one
> >> >> >> is with  (GFP_NOWAIT |  __GFP_NOFAIL) as the gfp_mask.
> >> >> >>
> >> >> >> If page_counter_try_charge() failes at the first time, -ENOMEM is returned.
> >> >> >> Then mem_cgroup_charge_skmem() will call try_charge() once more with
> >> >> >> __GFP_NOFAIL set, and this time if If page_counter_try_charge() failes
> >> >> >> again the '
> >> >> >> force' label in  try_charge() will be executed and 0 is returned.
> >> >> >>
> >> >> >> No matter what, the 'current' will be used and touched, that is
> >> >> >> meaning mem_cgroup_charge_skmem() isn't hanling the interrupt context
> >> >> >> correctly.
> >> >> >>
> >> >> >
> >> >> > Hi Yafang,
> >> >> >
> >> >> > If you check mem_cgroup_charge_skmem(), the memcg passed is not
> >> >> > 'current' but is from the sock object i.e. sk->sk_memcg for which the
> >> >> > network buffer is allocated for.
> >> >> >
> >> >>
> >> >> That's correct, the memcg if from the sock object.
> >> >> But the point is, in this situation why 'current' is used in try_charge() ?
> >> >> As 'current' is not related with the memcg, which is just a interrupted task.
> >> >>
> >> >
> >> > Hmm so you mean the behavior of memcg charging in the interrupt
> >> > context depends on the state of the interrupted task.
> >>
> >> Yes.
> >>
> >> > As you have
> >> > noted, mem_cgroup_charge_skmem() tries charging again with
> >> > __GFP_NOFAIL and the charge succeeds. Basically the memcg charging by
> >> > mem_cgroup_charge_skmem() will always succeed irrespective of the
> >> > state of the interrupted task. However mem_cgroup_charge_skmem() can
> >> > return true if the interrupted task was exiting or a fatal signal is
> >> > pending or oom victim or reclaiming memory. Can you please explain why
> >> > this is bad?
> >> >
> >>
> >> Let me show you the possible issues cause by this behavoir.
> >> 1.  In mem_cgroup_oom(), some  members in 'current' is set.
> >>      That means an innocent task will be in  task_in_memcg_oom state.
> >>      But this task may be in a different memcg, I mean the memcg of
> >> the 'current' may be differenct with the sk->sk_memcg.
> >>      Then when this innocent 'current' do try_charge it will hit  "if
> >> (unlikely(task_in_memcg_oom(current)))" and  -ENOMEM is returned,
> >> While there're maybe some free memory (or some memory could be freed )
> >> in the memcg of the innocent 'task'.
> >>
> >
> > No memory will be freed as try_charge() is in interrupt context.
> >
>
> I mean when this interrupted 'current' is running, that's in process context.
> In process context it should call try_to_free_mem_cgroup_pages() to
> free some memory,
> but it will hit "if (unlikely(task_in_memcg_oom(current)))"  before as
> it is set in the interrupt context.
>
> That's an obviously issue. Do you understand ?
>

Not really. I couldn't find where current->memcg_in_oom can be set in
the interrupt context.

> >> 2.  If the interrupted task was exiting or a fatal signal is  pending
> >> or oom victim,
> >>      it will directly goto force and 0 is returned, and then
> >> mem_cgroup_charge_skmem() will return true.
> >>      But mem_cgroup_charge_skmem() maybe need to try the second time
> >> and return false.
> >>
> >> That are all unexpected behavoir.
> >>
> >
> > Yes, this is inconsistent behavior. Can you explain how this will
> > affect network traffic? Basically mem_cgroup_charge_skmem() was
> > supposed to return false but sometime based on the interrupted task,
> > mem_cgroup_charge_skmem() returns true. How is this behavior bad for
> > network traffic?
> >
>
> You could see the funtion  __sk_mem_raise_allocated().
> If mem_cgroup_charge_skmem() return false, it will goto
> suppress_allocation and uncharge skmem,
> while when mem_cgroup_charge_skmem() return true, it will charge skmem
> sucessfully.
>
> The consequence behavior is  sk_rmem_schedule may fail while it should sucess.
> And then it will call  tcp_prune_queue() and tcp collapse may take a long time.
>

Is that a good thing or bad? From what I understand with your change
if charge fails, sk_rmem_schedule will always fail. However without
your change the interrupted task's state might help sk_rmem_schedule
to pass. I am all for consistent behavior but I wanted to make sure if
that is what you are aiming for.

Anyways, from what I remember Facebook is using the cgroup-v2's tcpmem
accounting. Johannes or Roman can shed some light if they have
observed this issue in production and might have opinion on how to
solve it.

thanks,
Shakeel
Yafang Shao July 16, 2018, 3:38 a.m. UTC | #11
On Mon, Jul 16, 2018 at 11:09 AM, Shakeel Butt <shakeelb@google.com> wrote:
> On Sun, Jul 15, 2018 at 6:50 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>>
>> On Sun, Jul 15, 2018 at 11:04 PM, Shakeel Butt <shakeelb@google.com> wrote:
>> > On Sun, Jul 15, 2018 at 1:02 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>> >>
>> >> On Sun, Jul 15, 2018 at 2:34 PM, Shakeel Butt <shakeelb@google.com> wrote:
>> >> > On Sat, Jul 14, 2018 at 10:26 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>> >> >>
>> >> >> On Sun, Jul 15, 2018 at 12:25 PM, Shakeel Butt <shakeelb@google.com> wrote:
>> >> >> > On Sat, Jul 14, 2018 at 7:10 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>> >> >> >>
>> >> >> >> On Sat, Jul 14, 2018 at 11:38 PM, Shakeel Butt <shakeelb@google.com> wrote:
>> >> >> >> > On Sat, Jul 14, 2018 at 1:32 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>> >> >> >> >>
>> >> >> >> >> try_charge maybe executed in packet receive path, which is in interrupt
>> >> >> >> >> context.
>> >> >> >> >> In this situation, the 'current' is the interrupted task, which may has
>> >> >> >> >> no relation to the rx softirq, So it is nonsense to use 'current'.
>> >> >> >> >>
>> >> >> >> >
>> >> >> >> > Have you actually seen this occurring?
>> >> >> >>
>> >> >> >> Hi Shakeel,
>> >> >> >>
>> >> >> >> I'm trying to produce this issue, but haven't find it occur yet.
>> >> >> >>
>> >> >> >> > I am not very familiar with the
>> >> >> >> > network code but I can think of two ways try_charge() can be called
>> >> >> >> > from network code. Either through kmem charging or through
>> >> >> >> > mem_cgroup_charge_skmem() and both locations correctly handle
>> >> >> >> > interrupt context.
>> >> >> >> >
>> >> >> >>
>> >> >> >> Why do you say that mem_cgroup_charge_skmem() correctly hanle
>> >> >> >> interrupt context ?
>> >> >> >>
>> >> >> >> Let me show you why mem_cgroup_charge_skmem isn't hanling interrupt
>> >> >> >> context correctly.
>> >> >> >>
>> >> >> >> mem_cgroup_charge_skmem() is calling  try_charge() twice.
>> >> >> >> The first one is with GFP_NOWAIT as the gfp_mask, and the second one
>> >> >> >> is with  (GFP_NOWAIT |  __GFP_NOFAIL) as the gfp_mask.
>> >> >> >>
>> >> >> >> If page_counter_try_charge() failes at the first time, -ENOMEM is returned.
>> >> >> >> Then mem_cgroup_charge_skmem() will call try_charge() once more with
>> >> >> >> __GFP_NOFAIL set, and this time if If page_counter_try_charge() failes
>> >> >> >> again the '
>> >> >> >> force' label in  try_charge() will be executed and 0 is returned.
>> >> >> >>
>> >> >> >> No matter what, the 'current' will be used and touched, that is
>> >> >> >> meaning mem_cgroup_charge_skmem() isn't hanling the interrupt context
>> >> >> >> correctly.
>> >> >> >>
>> >> >> >
>> >> >> > Hi Yafang,
>> >> >> >
>> >> >> > If you check mem_cgroup_charge_skmem(), the memcg passed is not
>> >> >> > 'current' but is from the sock object i.e. sk->sk_memcg for which the
>> >> >> > network buffer is allocated for.
>> >> >> >
>> >> >>
>> >> >> That's correct, the memcg if from the sock object.
>> >> >> But the point is, in this situation why 'current' is used in try_charge() ?
>> >> >> As 'current' is not related with the memcg, which is just a interrupted task.
>> >> >>
>> >> >
>> >> > Hmm so you mean the behavior of memcg charging in the interrupt
>> >> > context depends on the state of the interrupted task.
>> >>
>> >> Yes.
>> >>
>> >> > As you have
>> >> > noted, mem_cgroup_charge_skmem() tries charging again with
>> >> > __GFP_NOFAIL and the charge succeeds. Basically the memcg charging by
>> >> > mem_cgroup_charge_skmem() will always succeed irrespective of the
>> >> > state of the interrupted task. However mem_cgroup_charge_skmem() can
>> >> > return true if the interrupted task was exiting or a fatal signal is
>> >> > pending or oom victim or reclaiming memory. Can you please explain why
>> >> > this is bad?
>> >> >
>> >>
>> >> Let me show you the possible issues cause by this behavoir.
>> >> 1.  In mem_cgroup_oom(), some  members in 'current' is set.
>> >>      That means an innocent task will be in  task_in_memcg_oom state.
>> >>      But this task may be in a different memcg, I mean the memcg of
>> >> the 'current' may be differenct with the sk->sk_memcg.
>> >>      Then when this innocent 'current' do try_charge it will hit  "if
>> >> (unlikely(task_in_memcg_oom(current)))" and  -ENOMEM is returned,
>> >> While there're maybe some free memory (or some memory could be freed )
>> >> in the memcg of the innocent 'task'.
>> >>
>> >
>> > No memory will be freed as try_charge() is in interrupt context.
>> >
>>
>> I mean when this interrupted 'current' is running, that's in process context.
>> In process context it should call try_to_free_mem_cgroup_pages() to
>> free some memory,
>> but it will hit "if (unlikely(task_in_memcg_oom(current)))"  before as
>> it is set in the interrupt context.
>>
>> That's an obviously issue. Do you understand ?
>>
>
> Not really. I couldn't find where current->memcg_in_oom can be set in
> the interrupt context.
>

You are right. current->memcg_in_oom can't be set in the interrupt context.

>> >> 2.  If the interrupted task was exiting or a fatal signal is  pending
>> >> or oom victim,
>> >>      it will directly goto force and 0 is returned, and then
>> >> mem_cgroup_charge_skmem() will return true.
>> >>      But mem_cgroup_charge_skmem() maybe need to try the second time
>> >> and return false.
>> >>
>> >> That are all unexpected behavoir.
>> >>
>> >
>> > Yes, this is inconsistent behavior. Can you explain how this will
>> > affect network traffic? Basically mem_cgroup_charge_skmem() was
>> > supposed to return false but sometime based on the interrupted task,
>> > mem_cgroup_charge_skmem() returns true. How is this behavior bad for
>> > network traffic?
>> >
>>
>> You could see the funtion  __sk_mem_raise_allocated().
>> If mem_cgroup_charge_skmem() return false, it will goto
>> suppress_allocation and uncharge skmem,
>> while when mem_cgroup_charge_skmem() return true, it will charge skmem
>> sucessfully.
>>
>> The consequence behavior is  sk_rmem_schedule may fail while it should sucess.
>> And then it will call  tcp_prune_queue() and tcp collapse may take a long time.
>>
>
> Is that a good thing or bad?
> From what I understand with your change
> if charge fails, sk_rmem_schedule will always fail. However without
> your change the interrupted task's state might help sk_rmem_schedule
> to pass. I am all for consistent behavior but I wanted to make sure if
> that is what you are aiming for.
>

Yes, with this change it will always fail. Without this change it may
sucess depends on the  interrupted task's state.
My previous statement makes some mistake.

I have no clear idea it is bad or good. That's why I'm trying to
produce the issue now.
But I think that we should avoid this unexpected behavior due to state
of the random interrupted task.

> Anyways, from what I remember Facebook is using the cgroup-v2's tcpmem
> accounting. Johannes or Roman can shed some light if they have
> observed this issue in production and might have opinion on how to
> solve it.
>
> thanks,
> Shakeel
Michal Hocko July 16, 2018, 7:58 a.m. UTC | #12
On Sat 14-07-18 16:32:02, Yafang Shao wrote:
> try_charge maybe executed in packet receive path, which is in interrupt
> context.
> In this situation, the 'current' is the interrupted task, which may has
> no relation to the rx softirq, So it is nonsense to use 'current'.
> 
> Avoid bothering the interrupted if page_counter_try_charge failes.

I agree with Shakeel that this changelog asks for more information about
"why it matters". Small inconsistencies should be tolerable because the
state we rely on is so rarely set that it shouldn't make a visible
difference in practice.

> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  mm/memcontrol.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 68ef266..13f95db 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2123,6 +2123,9 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  		goto retry;
>  	}
>  
> +	if (in_softirq())
> +		goto nomem;
> +

If anything would it make more sense to use in_interrupt() to be more
bullet proof for future?

>  	/*
>  	 * Unlike in global OOM situations, memcg is not in a physical
>  	 * memory shortage.  Allow dying and OOM-killed tasks to
> -- 
> 1.8.3.1
Yafang Shao July 16, 2018, 9:45 a.m. UTC | #13
On Mon, Jul 16, 2018 at 3:58 PM, Michal Hocko <mhocko@kernel.org> wrote:
> On Sat 14-07-18 16:32:02, Yafang Shao wrote:
>> try_charge maybe executed in packet receive path, which is in interrupt
>> context.
>> In this situation, the 'current' is the interrupted task, which may has
>> no relation to the rx softirq, So it is nonsense to use 'current'.
>>
>> Avoid bothering the interrupted if page_counter_try_charge failes.
>
> I agree with Shakeel that this changelog asks for more information about
> "why it matters". Small inconsistencies should be tolerable because the
> state we rely on is so rarely set that it shouldn't make a visible
> difference in practice.
>

HI Michal,

No, it can make a visible difference in pratice.
The difference is in __sk_mem_raise_allocated().

Without this patch, if the random interrupted task is oom victim or
fatal signal pending or exiting, the charge will success anyway. That
means the cgroup limit doesn't work in this situation.

With this patch, in the same situation the charged memory will be
uncharged as it hits the memcg limit.

That is okay if the memcg of the interrupted task is same with the
sk->sk_memcg,  but it may not okay if they are difference.

I'm trying to prove it, but seems it's very hard to produce this issue.

>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
>> ---
>>  mm/memcontrol.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 68ef266..13f95db 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -2123,6 +2123,9 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
>>               goto retry;
>>       }
>>
>> +     if (in_softirq())
>> +             goto nomem;
>> +
>
> If anything would it make more sense to use in_interrupt() to be more
> bullet proof for future?
>
>>       /*
>>        * Unlike in global OOM situations, memcg is not in a physical
>>        * memory shortage.  Allow dying and OOM-killed tasks to
>> --
>> 1.8.3.1
>
> --
> Michal Hocko
> SUSE Labs
Michal Hocko July 16, 2018, 11:08 a.m. UTC | #14
On Mon 16-07-18 17:45:06, Yafang Shao wrote:
> On Mon, Jul 16, 2018 at 3:58 PM, Michal Hocko <mhocko@kernel.org> wrote:
> > On Sat 14-07-18 16:32:02, Yafang Shao wrote:
> >> try_charge maybe executed in packet receive path, which is in interrupt
> >> context.
> >> In this situation, the 'current' is the interrupted task, which may has
> >> no relation to the rx softirq, So it is nonsense to use 'current'.
> >>
> >> Avoid bothering the interrupted if page_counter_try_charge failes.
> >
> > I agree with Shakeel that this changelog asks for more information about
> > "why it matters". Small inconsistencies should be tolerable because the
> > state we rely on is so rarely set that it shouldn't make a visible
> > difference in practice.
> >
> 
> HI Michal,
> 
> No, it can make a visible difference in pratice.
> The difference is in __sk_mem_raise_allocated().
> 
> Without this patch, if the random interrupted task is oom victim or
> fatal signal pending or exiting, the charge will success anyway. That
> means the cgroup limit doesn't work in this situation.
> 
> With this patch, in the same situation the charged memory will be
> uncharged as it hits the memcg limit.
> 
> That is okay if the memcg of the interrupted task is same with the
> sk->sk_memcg,  but it may not okay if they are difference.
> 
> I'm trying to prove it, but seems it's very hard to produce this issue.

So it is possible that this is so marginal that it doesn't make any
_practical_ impact after all.
diff mbox

Patch

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 68ef266..13f95db 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2123,6 +2123,9 @@  static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		goto retry;
 	}
 
+	if (in_softirq())
+		goto nomem;
+
 	/*
 	 * Unlike in global OOM situations, memcg is not in a physical
 	 * memory shortage.  Allow dying and OOM-killed tasks to