A deadlock when system does not have sufficient memory

Message ID 53FFF2FB.20706@oracle.com (mailing list archive)
State New, archived
Commit Message

Junxiao Bi Aug. 29, 2014, 3:26 a.m. UTC
On 08/28/2014 04:16 PM, Xue jiufei wrote:
> Hi Junxiao,
> On 2014/8/25 9:50, Junxiao Bi wrote:
>> Hi Jiufei,
>>
>> Maybe you can consider using the PF_FSTRANS flag: set it before
>> allocating memory with the GFP_KERNEL flag and clear it after the
>> allocation. Then check this flag in ocfs2 when trying to free pages
>> during direct memory reclaim. See, for example, upstream commit
>> 5cf02d09b50b1ee1c2d536c9cf64af5a7d433f56 (nfs: skip commit in
>> releasepage if we're freeing memory for fs-related reasons).
>>
>> Thanks,
>> Junxiao.
>>
> Thank you very much for your suggestion. But in our situation,
> o2net_wq is evicting an inode during direct memory reclaim, which cannot
> return an error or do nothing, because the vfs would call destroy_inode
> after evict while we have not dropped the inode lock yet.
How about checking the flag in the vfs like this? You can then set the
PF_FSTRANS flag in o2net_wq context where the GFP_NOFS flag can't be set.


commit 8d27fdec5ce234d2f02e4582d340d231396b92af
Author: Junxiao Bi <junxiao.bi@oracle.com>
Date:   Fri Aug 29 11:05:25 2014 +0800

    super: stop shrinker for processes with PF_FSTRANS flag

    For some cluster filesystems, like ocfs2, it may be impossible
    to set GFP_NOFS for some memory allocations, because the
    allocation happens in common network code such as sock_alloc().
    In this case, the shrinker will call back into the fs and cause
    a deadlock when available memory is insufficient.

    Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>



Thanks,
Junxiao.

> 
> Thanks
> Xuejiufei
> 
>> On 08/22/2014 04:30 PM, Xue jiufei wrote:
>>> On 2014/8/20 11:57, Xue jiufei wrote:
>>>> Hi all,
>>>> We found that a deadlock can occur when the system does not have
>>>> sufficient memory. Here's the situation:
>>>>             N1                                      N2
>>>>                                              send message to N1
>>>>       o2net_wq(kworker)
>>>> receives the message and calls the
>>>> corresponding handler. The handler may
>>>> need to allocate memory (with GFP_NOFS or GFP_KERNEL),
>>>> but free memory is below the min
>>>> watermark. So it wakes up kswapd to reclaim memory
>>>> and may itself also call
>>>> __alloc_pages_direct_reclaim(), trying to
>>>> free some pages.
>>>>
>>>> It tries to free the ocfs2 inode
>>>> cache and calls ocfs2_drop_lock()->dlmunlock()
>>>> to drop the inode lock, sending an unlock message to
>>>> the master, say N2. When the reply comes, sc_rx_work is
>>>> queued and waits for o2net_wq to handle it. However,
>>>> o2net_wq is still handling the previous message, so it
>>>> cannot process the reply. It will wait on
>>>> o2net_nsw_completed() in o2net_send_message_vec()
>>>> forever.
>>>> The kswapd thread encounters the same situation.
>>>>
>>>>
>>>> Is there any advice on how to solve this deadlock?
>>>> And how likely is kmalloc to fail with ENOMEM when the GFP_ATOMIC flag is used?
>>>>
>>>> Thanks.
>>>>
>>> To avoid this deadlock, we want to allocate memory with the GFP_ATOMIC
>>> flag in all handlers and return ENOMEM to the peer on failure. The peer
>>> will resend the message, and o2net_wq can handle other messages in the
>>> meantime. However, this cannot solve every case. For example, if
>>> o2net_wq is processing sc_connect_work, which calls sock_alloc_inode()
>>> to allocate a socket_alloc with the GFP_KERNEL flag, and memory is
>>> insufficient, it enters the reclaim path and triggers the same
>>> deadlock. We cannot change this allocation flag.
>>> We have no idea how to handle it. Are there any better ideas?
>>> Thanks very much.
>>> xuejiufei
>>>> _______________________________________________
>>>> Ocfs2-devel mailing list
>>>> Ocfs2-devel@oss.oracle.com
>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
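The scenario described in the original report can be sketched as a small user-space model. This is only an illustration under stated assumptions, not ocfs2 code: `model_wq`, `model_queue_work()`, `model_run_one()` and `model_handle_message()` are hypothetical names, and the queue is assumed to have a single worker that cannot start a new item while a handler is still running. It shows why the reply work can never run once the handler blocks waiting for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_QUEUE_CAP 8

/* Hypothetical work types, loosely modelled on the thread's description. */
enum model_work { MODEL_WORK_HANDLE_MSG, MODEL_WORK_RX_REPLY };

/* A single-worker queue: at most one handler runs at a time. */
struct model_wq {
    enum model_work pending[MODEL_QUEUE_CAP];
    size_t head;
    size_t tail;
    bool busy;            /* the only worker is inside a handler */
    bool reply_processed; /* set once the reply work has run */
};

static void model_queue_work(struct model_wq *wq, enum model_work w)
{
    assert(wq->tail < MODEL_QUEUE_CAP);
    wq->pending[wq->tail++] = w;
}

/* The worker picks up the next item only when it is not inside a handler. */
static bool model_run_one(struct model_wq *wq)
{
    if (wq->busy || wq->head == wq->tail)
        return false;
    if (wq->pending[wq->head++] == MODEL_WORK_RX_REPLY)
        wq->reply_processed = true;
    return true;
}

/*
 * Models a message handler whose allocation falls into direct reclaim.
 * If reclaim enters the filesystem, an unlock message is sent and the
 * handler waits for the reply, which is queued on the same busy queue.
 * Returns true if the handler can finish, false if it is stuck forever.
 */
static bool model_handle_message(struct model_wq *wq, bool reclaim_enters_fs)
{
    wq->busy = true;
    if (reclaim_enters_fs) {
        model_queue_work(wq, MODEL_WORK_RX_REPLY); /* reply from N2 arrives */
        model_run_one(wq);                         /* refused: worker is busy */
        if (!wq->reply_processed)
            return false;                          /* handler never returns */
    }
    wq->busy = false;
    return true;
}
```

In this model the handler only completes when reclaim stays out of the filesystem, which is the property the proposals in this thread try to guarantee.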

Comments

Xue jiufei Aug. 29, 2014, 7:22 a.m. UTC | #1
On 2014/8/29 11:26, Junxiao Bi wrote:
> How about checking the flag in vfs like this? And you can set PF_FSTRANS
> flag in o2net_wq context where GFP_NOFS flag can't be set.
> 
> 
> commit 8d27fdec5ce234d2f02e4582d340d231396b92af
> Author: Junxiao Bi <junxiao.bi@oracle.com>
> Date:   Fri Aug 29 11:05:25 2014 +0800
> 
>     super: stop shrinker for processes with PF_FSTRANS flag
> 
>     For some cluster fs, like ocfs2, it may be impossible to
>     set GFP_NOFS for some memory allocation, as the allocation
>     is in network common code, like sock_alloc() and in this
>     case, the shrinker will call back into the fs and cause
>     deadlock when available memory is not enough.
> 
>     Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
> 
> diff --git a/fs/super.c b/fs/super.c
> index b9a214d..c4a8dc1 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -71,6 +71,9 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
>         if (!(sc->gfp_mask & __GFP_FS))
>                 return SHRINK_STOP;
> 
> +       if (current->flags & PF_FSTRANS)
> +               return SHRINK_STOP;
> +
>         if (!grab_super_passive(sb))
>                 return SHRINK_STOP;
> 
> 
> Thanks,
> Junxiao.
> 
Yes, this patch resolves our problem. Thanks a lot.
Have you sent this patch to the fs-devel list?
Junxiao Bi Aug. 29, 2014, 7:30 a.m. UTC | #2
On 08/29/2014 03:22 PM, Xue jiufei wrote:
> Yes, this patch resolves our problem. Thanks a lot.
> Have you sent this patch to the fs-devel list?
No. Could you send it along with your ocfs2 fix? I think it is more
convincing together with the ocfs2 deadlock case. I will watch the thread
for any concerns about it.

Thanks,
Junxiao.
Patch

diff --git a/fs/super.c b/fs/super.c
index b9a214d..c4a8dc1 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -71,6 +71,9 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
        if (!(sc->gfp_mask & __GFP_FS))
                return SHRINK_STOP;

+       if (current->flags & PF_FSTRANS)
+               return SHRINK_STOP;
+
        if (!grab_super_passive(sb))
                return SHRINK_STOP;
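
To make the intended usage concrete, here is a minimal user-space sketch of how the added check could interact with a caller that marks itself before an allocation it cannot make GFP_NOFS. It is a model under stated assumptions, not kernel code: every `model_`/`MODEL_` name is hypothetical, the constant values are illustrative stand-ins for the kernel's `__GFP_FS`, `PF_FSTRANS` and `SHRINK_STOP`, and the save/restore helpers are only one plausible way for o2net_wq to set and clear the flag.

```c
#include <assert.h>

/* Illustrative stand-ins; the real definitions live in kernel headers. */
#define MODEL_GFP_FS      0x80u
#define MODEL_PF_FSTRANS  0x00020000u
#define MODEL_SHRINK_STOP (~0UL)

/* Stand-in for the per-task flags reached through "current". */
struct model_task {
    unsigned int flags;
};

/* Stand-in for the fields of struct shrink_control used here. */
struct model_shrink_control {
    unsigned int gfp_mask;
    unsigned long nr_to_scan;
};

/* Mirrors the order of the first two checks in the patched function. */
static unsigned long model_super_cache_scan(const struct model_task *cur,
                                            const struct model_shrink_control *sc)
{
    if (!(sc->gfp_mask & MODEL_GFP_FS))
        return MODEL_SHRINK_STOP;

    /* The check added by the patch: the task asked reclaim to stay out. */
    if (cur->flags & MODEL_PF_FSTRANS)
        return MODEL_SHRINK_STOP;

    /* Pretend every requested object was freed. */
    return sc->nr_to_scan;
}

/* Set the flag and return its previous state so nesting stays correct. */
static unsigned int model_fstrans_enter(struct model_task *t)
{
    unsigned int old = t->flags & MODEL_PF_FSTRANS;

    t->flags |= MODEL_PF_FSTRANS;
    return old;
}

/* Restore the flag to the state returned by model_fstrans_enter(). */
static void model_fstrans_exit(struct model_task *t, unsigned int old)
{
    assert((old & ~MODEL_PF_FSTRANS) == 0);
    t->flags = (t->flags & ~MODEL_PF_FSTRANS) | old;
}
```

A caller would wrap the allocation it cannot change between `model_fstrans_enter()` and `model_fstrans_exit()`; while the flag is set, the modelled shrinker returns early even though the allocation mask still allows filesystem reclaim.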