Message ID | 20230109151631.24923-3-mgorman@techsingularity.net (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Discard __GFP_ATOMIC | expand |
On Mon 09-01-23 15:16:26, Mel Gorman wrote: > RT tasks are allowed to dip below the min reserve but ALLOC_HARDER is > typically combined with ALLOC_MIN_RESERVE so RT tasks are a little > unusual. While there is some justification for allowing RT tasks > access to memory reserves, there is a strong chance that a RT task > that is also under memory pressure is at risk of missing deadlines > anyway. Relax how much reserves an RT task can access by treating > it the same as __GFP_HIGH allocations. TBH, I would much rather drop the RT special casing here. As you say if a RT task need to dip into memory reserves it is either already too late because the execution is already under RT constrains or this is init phase where the reclaim is not a problem yet. I have tried to trace down this special case and only found a patch from Robert Love from 2003 which says: : - Let real-time tasks dip further into the reserves than usual in : __alloc_pages(). There are a lot of ways to special case this. This : patch just cuts z->pages_low in half, before doing the incremental min : thing, for real-time tasks. I do not do anything in the low memory slow : path. We can be a _lot_ more aggressive if we want. Right now, we just : give real-time tasks a little help. This doesn't really explain why this is needed. We are really great at preserving a behavior and cementing it for future generations. Maybe we should just drop it and see if something breaks. We would get some reasoning at least finally. So I am not opposed to the patch per se but I would much rather see this branch go away. If you want me I can condense the above into a changelog and send a patch (either on top of this one or replacing it). WDYT? > Signed-off-by: Mel Gorman <mgorman@techsingularity.net> > Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> > --- > mm/page_alloc.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 244c1e675dc8..0040b4e00913 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -4847,7 +4847,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask) > */ > alloc_flags &= ~ALLOC_CPUSET; > } else if (unlikely(rt_task(current)) && in_task()) > - alloc_flags |= ALLOC_HARDER; > + alloc_flags |= ALLOC_MIN_RESERVE; > > alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, alloc_flags); > > -- > 2.35.3
On Wed, Jan 11, 2023 at 04:27:29PM +0100, Michal Hocko wrote: > On Mon 09-01-23 15:16:26, Mel Gorman wrote: > > RT tasks are allowed to dip below the min reserve but ALLOC_HARDER is > > typically combined with ALLOC_MIN_RESERVE so RT tasks are a little > > unusual. While there is some justification for allowing RT tasks > > access to memory reserves, there is a strong chance that a RT task > > that is also under memory pressure is at risk of missing deadlines > > anyway. Relax how much reserves an RT task can access by treating > > it the same as __GFP_HIGH allocations. > > TBH, I would much rather drop the RT special casing here. As you say if > a RT task need to dip into memory reserves it is either already too late > because the execution is already under RT constrains or this is init > phase where the reclaim is not a problem yet. > I completely agree. I included it in the changelog because I was tempted to delete it now. I'm wary that the series will result in some allocation failure bug reports and so played it cautious. Hard realtime tasks should be locking down resources in advance. Even a soft-realtime task like audio or video live decoding which cannot jitter should be allocating both memory and any disk space required up-front before the recording starts instead of relying on reserves. At best, reserve access will only delay the problem by a very short interval. > I have tried to trace down this special case and only found a patch from > Robert Love from 2003 which says: > : - Let real-time tasks dip further into the reserves than usual in > : __alloc_pages(). There are a lot of ways to special case this. This > : patch just cuts z->pages_low in half, before doing the incremental min > : thing, for real-time tasks. I do not do anything in the low memory slow > : path. We can be a _lot_ more aggressive if we want. Right now, we just > : give real-time tasks a little help. > > This doesn't really explain why this is needed. > No, it does not but I'm not willing to complain either. 20 years ago, it might have been completely reasonable. > We are really great at preserving a behavior and cementing it for > future generations. Maybe we should just drop it and see if something > breaks. We would get some reasoning at least finally. > > So I am not opposed to the patch per se but I would much rather see this > branch go away. If you want me I can condense the above into a changelog > and send a patch (either on top of this one or replacing it). WDYT? > I agree with you but given the risk of bisections hitting this series, would you be opposed to delaying the removal by 1 kernel release? That way bisections for failures will hit 6.3 and a single commit or at least just a report against 6.3. That would mitigate the risk of a full revert of the series. I can add a note to the changelog mentioning the expected removal so git blame will also highlight it. > > Signed-off-by: Mel Gorman <mgorman@techsingularity.net> > > Acked-by: Vlastimil Babka <vbabka@suse.cz> > > Acked-by: Michal Hocko <mhocko@suse.com> > Thanks.
On Thu 12-01-23 09:36:23, Mel Gorman wrote: [...] > I agree with you but given the risk of bisections hitting this series, > would you be opposed to delaying the removal by 1 kernel release? That > way bisections for failures will hit 6.3 and a single commit or at least > just a report against 6.3. That would mitigate the risk of a full revert > of the series. I can add a note to the changelog mentioning the expected > removal so git blame will also highlight it. Sure. I will post the removal on top of your series and put myself into the "wait for regression chair".
On Thu, Jan 12, 2023 at 10:47:24AM +0100, Michal Hocko wrote: > On Thu 12-01-23 09:36:23, Mel Gorman wrote: > [...] > > I agree with you but given the risk of bisections hitting this series, > > would you be opposed to delaying the removal by 1 kernel release? That > > way bisections for failures will hit 6.3 and a single commit or at least > > just a report against 6.3. That would mitigate the risk of a full revert > > of the series. I can add a note to the changelog mentioning the expected > > removal so git blame will also highlight it. > > Sure. I will post the removal on top of your series and put myself into > the "wait for regression chair". > I'm happy to sit in the same chair and send the patch. If there is an example of an RT task actually caring about memory reserves, I'd like to determine if it's a real problem or a badly designed RT application.
From: Mel Gorman > Sent: 12 January 2023 09:36 ... > Hard realtime tasks should be locking down resources in advance. Even a > soft-realtime task like audio or video live decoding which cannot jitter > should be allocating both memory and any disk space required up-front > before the recording starts instead of relying on reserves. At best, > reserve access will only delay the problem by a very short interval. Or, at least, ensuring the system isn't memory limited. The biggest effect on RT task latency/jitter (on a normal kernel) is hardware interrupt and softint code 'stealing' the cpu. The main 'culprit' being ethernet receive. Unfortunately if you are doing RTP audio (UDP data) you absolutely need the ethernet receive to run. When the softint code decides to drop back to a normal priority kernel worker thread packets get lost. (I've been running 10000 RTP streams - with 10k receive UDP sockets.) So I doubt avoiding sleeps in kmalloc() is going to make a significant difference. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Fri, Jan 13, 2023 at 09:04:50AM +0000, David Laight wrote: > From: Mel Gorman > > Sent: 12 January 2023 09:36 > ... > > Hard realtime tasks should be locking down resources in advance. Even a > > soft-realtime task like audio or video live decoding which cannot jitter > > should be allocating both memory and any disk space required up-front > > before the recording starts instead of relying on reserves. At best, > > reserve access will only delay the problem by a very short interval. > > Or, at least, ensuring the system isn't memory limited. > Added to changelog. > The biggest effect on RT task latency/jitter (on a normal kernel) > is hardware interrupt and softint code 'stealing' the cpu. > The main 'culprit' being ethernet receive. > > Unfortunately if you are doing RTP audio (UDP data) you absolutely > need the ethernet receive to run. When the softint code decides > to drop back to a normal priority kernel worker thread packets > get lost. > Yes, although this is a fundamental problem for background networking processing in general IIUC that is independent of mm/. ksoftirqd may be getting stalled behind a higher priority, a realtime task, a task that has built a credit due to sleep time under GENTLE_FAIR_SLEEPERS relative to ksoftirqd etc. As a normal low-priority CPU hog may be at the same priority as ksoftirqd, it can use enough of the scheduler slice for the runqueue to cause an indirect priority inversion if a high priority task is sleeping waiting on network traffic it needs ASAP that's stalled on ksoftirqd. I didn't check for other examples but the only one I'm aware of that boosts ksoftirq priority is during rcutorture (see rcutorture_booster_init). A quick glance doesn't show any possibility of boosting ksoftirqd priority temporarily if dealing with NET_TX_SOFTIRQ although it might be an interesting idea if it was demonstrated with a realistic (or at least semi-realistic) test case that priority inversion can occur due to background RX processing. It's not even PREEMPT_RT-specific, I suspect it's a general problem. I don't think this series makes a difference because if any of the critical tasks are depending on the memory reserves, they're already in serious trouble. > (I've been running 10000 RTP streams - with 10k receive UDP sockets.) > min_reserve access isn't going to fix any stalls with that volume of streams :D > So I doubt avoiding sleeps in kmalloc() is going to make a > significant difference. > Agreed.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 244c1e675dc8..0040b4e00913 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4847,7 +4847,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask) */ alloc_flags &= ~ALLOC_CPUSET; } else if (unlikely(rt_task(current)) && in_task()) - alloc_flags |= ALLOC_HARDER; + alloc_flags |= ALLOC_MIN_RESERVE; alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, alloc_flags);