Message ID | 53FFF2FB.20706@oracle.com (mailing list archive) |
---|---|
State | New, archived |
On 2014/8/29 11:26, Junxiao Bi wrote:
> On 08/28/2014 04:16 PM, Xue jiufei wrote:
>> Hi Junxiao,
>> On 2014/8/25 9:50, Junxiao Bi wrote:
>>> Hi Jiufei,
>>>
>>> Maybe you can consider using the PF_FSTRANS flag: set this flag before
>>> allocating memory with the GFP_KERNEL flag and unset it after the
>>> allocation. Then check this flag in ocfs2 when trying to free some pages
>>> during direct memory reclaim. See an example from upstream commit
>>> 5cf02d09b50b1ee1c2d536c9cf64af5a7d433f56 (nfs: skip commit in
>>> releasepage if we're freeing memory for fs-related reasons).
>>>
>>> Thanks,
>>> Junxiao.
>>>
>> Thank you very much for your suggestion. But in our situation,
>> o2net_wq is evicting an inode during direct memory reclaim, which cannot
>> return an error or do nothing, because vfs would call destroy_inode after
>> evict, but we haven't dropped the inode lock yet.
> How about checking the flag in vfs like this? And you can set the PF_FSTRANS
> flag in o2net_wq context where the GFP_NOFS flag can't be set.
>
>
> commit 8d27fdec5ce234d2f02e4582d340d231396b92af
> Author: Junxiao Bi <junxiao.bi@oracle.com>
> Date:   Fri Aug 29 11:05:25 2014 +0800
>
>     super: stop shrinker for processes with PF_FSTRANS flag
>
>     For some cluster fs, like ocfs2, it may be impossible to
>     set GFP_NOFS for some memory allocation, as the allocation
>     is in network common code, like sock_alloc() and in this
>     case, the shrinker will call back into the fs and cause
>     deadlock when available memory is not enough.
>
>     Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
>
> diff --git a/fs/super.c b/fs/super.c
> index b9a214d..c4a8dc1 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -71,6 +71,9 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
>  	if (!(sc->gfp_mask & __GFP_FS))
>  		return SHRINK_STOP;
>
> +	if (current->flags & PF_FSTRANS)
> +		return SHRINK_STOP;
> +
>  	if (!grab_super_passive(sb))
>  		return SHRINK_STOP;
>
>
> Thanks,
> Junxiao.
>
Yes, this patch can resolve our problem. Thanks a lot.
Have you sent this patch to the fs-devel list?
>>
>> Thanks
>> Xuejiufei
>>
>>> On 08/22/2014 04:30 PM, Xue jiufei wrote:
>>>> On 2014/8/20 11:57, Xue jiufei wrote:
>>>>> Hi all,
>>>>> We found there may exist a deadlock when the system does not have
>>>>> sufficient memory. Here's the situation:
>>>>>
>>>>>    N1                                        N2
>>>>>                                        send message to N1
>>>>> o2net_wq(kworker)
>>>>> receiving message and call corresponding
>>>>> handler to handle this message. It may
>>>>> need to alloc some memory (use GFP_NOFS or GFP_KERNEL),
>>>>> but there's no sufficient memory, lower than
>>>>> min watermark. So it wakes up kswapd to reclaim memory
>>>>> and itself may also call
>>>>> __alloc_pages_direct_reclaim(), trying to
>>>>> free some pages.
>>>>>
>>>>> It tries to free ocfs2 inode
>>>>> cache and calls ocfs2_drop_lock()->dlmunlock()
>>>>> to drop the inode lock, sending an unlock message to the master,
>>>>> say N2. When the reply comes, sc_rx_work is queued and
>>>>> waits for o2net_wq to handle this work. However,
>>>>> o2net_wq is still handling the last message, so it cannot
>>>>> process the reply message. It will wait for
>>>>> o2net_nsw_completed() in o2net_send_message_vec()
>>>>> forever.
>>>>> The kswapd thread encounters the same situation.
>>>>>
>>>>>
>>>>> So is there any advice to solve this deadlock?
>>>>> And what is the probability that kmalloc returns ENOMEM when using the
>>>>> GFP_ATOMIC flag?
>>>>>
>>>>> Thanks.
>>>>>
>>>> To avoid this deadlock, we want to alloc memory with the GFP_ATOMIC flag
>>>> in all handlers and return ENOMEM to the peer when it fails. The peer will
>>>> try to resend the message, and o2net_wq can handle other messages.
>>>> However, it cannot solve all problems. For example, if o2net_wq is
>>>> processing sc_connect_work, which would call sock_alloc_inode() to alloc
>>>> socket_alloc with the GFP_KERNEL flag when memory is insufficient and
>>>> enter the reclaim process, it also triggers the deadlock. We cannot
>>>> change this alloc flag.
>>>> We have no idea how to handle it. Are there any better ideas?
>>>> Thanks very much.
>>>> xuejiufei
>>>>> _______________________________________________
>>>>> Ocfs2-devel mailing list
>>>>> Ocfs2-devel@oss.oracle.com
>>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
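The dependency cycle in the quoted report can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the type `single_worker_queue` and the helpers `queue_work_item()`, `worker_can_run_pending()` and `handler_wait_for_reply()` are hypothetical names invented for illustration, not o2net or workqueue APIs. The model assumes a queue with exactly one worker, which cannot start a pending item until the handler it is currently running has returned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in o2net. */
enum work_kind { WORK_HANDLE_MESSAGE, WORK_HANDLE_REPLY };

#define MAX_PENDING 8

struct single_worker_queue {
    bool busy;                            /* the only worker is inside a handler */
    enum work_kind pending[MAX_PENDING];  /* work waiting for that worker */
    size_t npending;
};

/* Queue a work item; returns false if the model's queue is full. */
static bool queue_work_item(struct single_worker_queue *wq, enum work_kind kind)
{
    assert(wq != NULL);
    if (wq->npending == MAX_PENDING)
        return false;
    wq->pending[wq->npending++] = kind;
    return true;
}

/* Pending work only runs once the current handler has returned. */
static bool worker_can_run_pending(const struct single_worker_queue *wq)
{
    return !wq->busy && wq->npending > 0;
}

/*
 * Model of a thread that sends an unlock message from reclaim and then
 * blocks until the reply has been processed by the worker.
 * on_worker: the waiting thread is the queue's own worker.
 * Returns true if the reply can ever be processed, false if the wait
 * can never finish (the deadlock from the report).
 */
static bool handler_wait_for_reply(struct single_worker_queue *wq, bool on_worker)
{
    if (on_worker)
        wq->busy = true;                     /* worker is stuck in its handler */
    queue_work_item(wq, WORK_HANDLE_REPLY);  /* reply arrives and is queued */
    return worker_can_run_pending(wq);
}
```

In this model a wait issued from the worker itself can never complete, and once the worker is stuck, a wait from any other thread (the kswapd case in the report) stalls behind it as well.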
On 08/29/2014 03:22 PM, Xue jiufei wrote:
> On 2014/8/29 11:26, Junxiao Bi wrote:
>> How about checking the flag in vfs like this? And you can set the PF_FSTRANS
>> flag in o2net_wq context where the GFP_NOFS flag can't be set.
>>
> Yes, this patch can resolve our problem. Thanks a lot.
> Have you sent this patch to the fs-devel list?
No. Could you send it along with your ocfs2 fix? I think it will be more
convincing together with the ocfs2 deadlock case. I will watch the thread in
case there are any concerns about it.

Thanks,
Junxiao.
diff --git a/fs/super.c b/fs/super.c
index b9a214d..c4a8dc1 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -71,6 +71,9 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
 	if (!(sc->gfp_mask & __GFP_FS))
 		return SHRINK_STOP;
 
+	if (current->flags & PF_FSTRANS)
+		return SHRINK_STOP;
+
 	if (!grab_super_passive(sb))
 		return SHRINK_STOP;
 
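
To make the effect of the patch concrete, here is a minimal userspace sketch of the check order it introduces, together with the set-then-restore pattern suggested earlier in the thread. Everything here is an assumption made for illustration: `struct model_task`, the `MODEL_*` constants and the `model_*` helpers are hypothetical stand-ins with arbitrary bit values, not the kernel's definitions, and the real `super_cache_scan()` does considerably more work after these checks.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only; the real kernel constants differ. */
#define MODEL_GFP_FS      0x1u    /* stands in for __GFP_FS */
#define MODEL_PF_FSTRANS  0x2u    /* stands in for PF_FSTRANS */
#define MODEL_SHRINK_STOP (~0UL)  /* stands in for SHRINK_STOP */

/* Hypothetical stand-in for the per-task flags behind `current->flags`. */
struct model_task {
    unsigned int flags;
};

/* Set the flag and return the previous flags so the caller can restore them. */
static unsigned int model_enter_fstrans(struct model_task *t)
{
    unsigned int old;

    assert(t != NULL);
    old = t->flags;
    t->flags |= MODEL_PF_FSTRANS;
    return old;
}

/* Put only the PF_FSTRANS bit back to its saved state (safe when nested). */
static void model_leave_fstrans(struct model_task *t, unsigned int old)
{
    assert(t != NULL);
    t->flags = (t->flags & ~MODEL_PF_FSTRANS) | (old & MODEL_PF_FSTRANS);
}

/* Mirrors the order of the early checks in the patched super_cache_scan(). */
static unsigned long model_super_cache_scan(const struct model_task *task,
                                            unsigned int gfp_mask,
                                            unsigned long nr_freeable)
{
    if (!(gfp_mask & MODEL_GFP_FS))
        return MODEL_SHRINK_STOP;   /* existing check: caller forbids fs reclaim */

    if (task->flags & MODEL_PF_FSTRANS)
        return MODEL_SHRINK_STOP;   /* new check: task opted out via its flag */

    return nr_freeable;             /* otherwise the model "frees" everything */
}
```

The point of the second check is that a task can opt out of filesystem reclaim even when the allocation itself is made with a mask that allows it, which is the situation described for allocations inside common network code. Restoring the saved bit rather than clearing it unconditionally keeps the helper correct if the flag was already set by an outer caller.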