Message ID | CAPv3WKftqsEXbdU-geAcUKXBSskhA0V72N61a1a+5DfahLK_Dg@mail.gmail.com (mailing list archive)
---|---
State | New, archived
On Thu, Jun 02, 2016 at 07:48:38AM +0200, Marcin Wojtas wrote:
> Hi Will,
>
> I think I found the right trace. The following one-liner fixes the issue
> from v4.2-rc1 up to and including v4.4:
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -294,7 +294,7 @@ static inline bool early_page_uninitialised(unsigned long pfn)
>
>  static inline bool early_page_nid_uninitialised(unsigned long pfn, int nid)
>  {
> -	return false;
> +	return true;
>  }
>

How does that make a difference in v4.4, since commit
974a786e63c96a2401a78ddba926f34c128474f1 removed the only caller of
early_page_nid_uninitialised()? It further doesn't make sense if deferred
memory initialisation is not enabled, as the pages will always be
initialised.

> From what I understood, order-0 allocations now keep no reserve at all.

Watermarks should still be preserved; zone_watermark_ok() is still there.
What might change is the size of the reserves for high-order atomic
allocations only. Fragmentation shouldn't be a factor. I'm missing some
major part of the picture.
Hi Mel,

2016-06-02 15:52 GMT+02:00 Mel Gorman <mgorman@techsingularity.net>:
> On Thu, Jun 02, 2016 at 07:48:38AM +0200, Marcin Wojtas wrote:
>> Hi Will,
>>
>> I think I found the right trace. The following one-liner fixes the issue
>> from v4.2-rc1 up to and including v4.4:
>>
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -294,7 +294,7 @@ static inline bool early_page_uninitialised(unsigned long pfn)
>>
>>  static inline bool early_page_nid_uninitialised(unsigned long pfn, int nid)
>>  {
>> -	return false;
>> +	return true;
>>  }
>>
>
> How does that make a difference in v4.4 since commit
> 974a786e63c96a2401a78ddba926f34c128474f1 removed the only caller of
> early_page_nid_uninitialised()? It further doesn't make sense if deferred
> memory initialisation is not enabled as the pages will always be
> initialised.
>

Right, it should be "up to and including v4.3". Your changes were merged in
v4.4-rc1 and indeed deferred initialization plays no role from then on, but
the behavior remained identical.

>> From what I understood, order-0 allocations now keep no reserve at all.
>
> Watermarks should still be preserved. zone_watermark_ok is still there.
> What might change is the size of reserves for high-order atomic
> allocations only. Fragmentation shouldn't be a factor. I'm missing some
> major part of the picture.
>

I CC'ed you on the last email, as I found out you authored the relevant
patches - please see the problem description:
https://lkml.org/lkml/2016/5/30/1056

Anyway, when using a v4.4.8 baseline, after reverting the patches below:

97a16fc - mm, page_alloc: only enforce watermarks for order-0 allocations
0aaa29a - mm, page_alloc: reserve pageblocks for high-order atomic
          allocations on demand
974a786 - mm, page_alloc: remove MIGRATE_RESERVE

+ adding the early_page_nid_uninitialised() modification, I stop receiving
page alloc failure dumps like this one: http://pastebin.com/FhRW5DsF.
Performance in my test also looks very similar.

I'd like to understand this phenomenon and check whether it's possible to
avoid such page-alloc-failure hiccups in a nice way. Once the dumps finish,
the kernel remains stable afterwards, but is such behavior expected and
intended?

What interested me in the above-mentioned patches is that the last-resort
migration on page alloc failure ('retry_reserve') was removed from
rmqueue() in patch:

974a786 - mm, page_alloc: remove MIGRATE_RESERVE

A section of the next commit's log (0aaa29a - mm, page_alloc: reserve
pageblocks for high-order atomic allocations on demand) also caught my
attention - it begins with the words: "The reserved pageblocks can not be
used for order-0 allocations." This is why I understood that for this kind
of allocation there is no reserve kept and we need to count on successful
reclaim. However, under big stress it seems that the mechanism may not be
sufficient. Am I interpreting it correctly?

For the record: the newest kernel in which I was able to reproduce the
dumps was v4.6: http://pastebin.com/ekDdACn5. I've just checked v4.7-rc1,
which comprises a lot of changes in mm (mainly yours), and I'm wondering
whether there may be a spot fix or rather a series of improvements. I'm
looking forward to your opinion and would be grateful for any advice.

Best regards,
Marcin
On Thu, Jun 02, 2016 at 09:01:55PM +0200, Marcin Wojtas wrote:
> >> From what I understood, order-0 allocations now keep no reserve at all.
> >
> > Watermarks should still be preserved. zone_watermark_ok is still there.
> > What might change is the size of reserves for high-order atomic
> > allocations only. Fragmentation shouldn't be a factor. I'm missing some
> > major part of the picture.
> >
>
> I CC'ed you on the last email, as I found out you authored the relevant
> patches - please see the problem description:
> https://lkml.org/lkml/2016/5/30/1056
>
> Anyway, when using a v4.4.8 baseline, after reverting the patches below:
>
> 97a16fc - mm, page_alloc: only enforce watermarks for order-0 allocations
> 0aaa29a - mm, page_alloc: reserve pageblocks for high-order atomic
>           allocations on demand
> 974a786 - mm, page_alloc: remove MIGRATE_RESERVE
>
> + adding the early_page_nid_uninitialised() modification
>

The early_page check is wrong because of the check itself rather than the
function, so that was the bug there.

> I stop receiving page alloc failure dumps like this one:
> http://pastebin.com/FhRW5DsF. Performance in my test also looks very
> similar. I'd like to understand this phenomenon and check whether it's
> possible to avoid such page-alloc-failure hiccups in a nice way. Once the
> dumps finish, the kernel remains stable afterwards, but is such behavior
> expected and intended?
>

Looking at the pastebin, the page allocation failure appears to be partially
due to CMA. If the free_cma pages are subtracted from the free pages, then
the total is very close to the low watermark. I suspect kswapd was already
active but had not acted in time to prevent the first allocation failure.
The impact of MIGRATE_RESERVE was to give kswapd a larger window to do work
in, but that is a coincidence. Relying on it for an order-0 allocation would
fragment that area, which in your particular case may not matter but
actually violates what MIGRATE_RESERVE was for.

> For the record: the newest kernel in which I was able to reproduce the
> dumps was v4.6: http://pastebin.com/ekDdACn5. I've just checked v4.7-rc1,
> which comprises a lot of changes in mm (mainly yours), and I'm wondering
> whether there may be a spot fix or rather a series of improvements. I'm
> looking forward to your opinion and would be grateful for any advice.
>

I don't believe we want to reintroduce the reserve to cope with CMA. One
option would be to widen the gap between the low and min watermarks by the
size of the CMA region. The effect would be to wake kswapd earlier, which
matters considering that the context of the failing allocation was
GFP_ATOMIC.

The GFP_ATOMIC itself is interesting. If I'm reading this correctly,
scsi_get_cmd_from_req() was called from scsi_prep() where it was passing in
GFP_ATOMIC, but in the page allocation failure, __GFP_ATOMIC is not set. It
would be worth chasing down whether the allocation site really was
GFP_ATOMIC and, if so, isolating what stripped that flag and seeing if it
was a mistake.
Hi Mel,

2016-06-03 11:53 GMT+02:00 Mel Gorman <mgorman@techsingularity.net>:
> On Thu, Jun 02, 2016 at 09:01:55PM +0200, Marcin Wojtas wrote:
>> Anyway, when using a v4.4.8 baseline, after reverting the patches below:
>>
>> 97a16fc - mm, page_alloc: only enforce watermarks for order-0 allocations
>> 0aaa29a - mm, page_alloc: reserve pageblocks for high-order atomic
>>           allocations on demand
>> 974a786 - mm, page_alloc: remove MIGRATE_RESERVE
>>
>> + adding the early_page_nid_uninitialised() modification
>>
>
> The early_page check is wrong because of the check itself rather than
> the function, so that was the bug there.

Regardless of whether it was reasonable to do this check here, the behavior
for all archs other than x86 was changed silently because of 7e18adb4f80b
("mm: meminit: initialise remaining struct pages in parallel with kswapd"),
so I'd consider it a bug as well.

>> I stop receiving page alloc failure dumps like this one:
>> http://pastebin.com/FhRW5DsF. Performance in my test also looks very
>> similar. I'd like to understand this phenomenon and check whether it's
>> possible to avoid such page-alloc-failure hiccups in a nice way. Once
>> the dumps finish, the kernel remains stable afterwards, but is such
>> behavior expected and intended?
>>
>
> Looking at the pastebin, the page allocation failure appears to be
> partially due to CMA. If the free_cma pages are subtracted from the free
> pages, then the total is very close to the low watermark. I suspect
> kswapd was already active but had not acted in time to prevent the first
> allocation failure. The impact of MIGRATE_RESERVE was to give kswapd a
> larger window to do work in, but that is a coincidence. Relying on it for
> an order-0 allocation would fragment that area, which in your particular
> case may not matter but actually violates what MIGRATE_RESERVE was for.

Indeed, it's a very fragile problem and seems to depend on coincidences -
e.g., contrary to buildroot, under Ubuntu the same test doesn't end up
dumping the failure information, so I suspect some timings are satisfied
because, for example, more services run in the background. The free_cma
count is very close to the overall free pages here, but usually (especially
in older kernels, e.g. v4.4.8: http://pastebin.com/FhRW5DsF) the gap is
much bigger. This may show that the root cause has varied over time.

>> For the record: the newest kernel in which I was able to reproduce the
>> dumps was v4.6: http://pastebin.com/ekDdACn5. I've just checked
>> v4.7-rc1, which comprises a lot of changes in mm (mainly yours), and
>> I'm wondering whether there may be a spot fix or rather a series of
>> improvements. I'm looking forward to your opinion and would be grateful
>> for any advice.
>>
>
> I don't believe we want to reintroduce the reserve to cope with CMA. One
> option would be to widen the gap between the low and min watermarks by
> the size of the CMA region. The effect would be to wake kswapd earlier,
> which matters considering that the context of the failing allocation was
> GFP_ATOMIC.

Of course my intention is not to reintroduce anything that's gone forever,
but just to find a way to overcome the current issues. Do you mean
increasing the CMA size? At the very beginning I played with the CMA size
(even increased it from 16M to 96M), but it didn't help. Do you think there
is any other way to trigger kswapd earlier?

> The GFP_ATOMIC itself is interesting. If I'm reading this correctly,
> scsi_get_cmd_from_req() was called from scsi_prep() where it was passing
> in GFP_ATOMIC, but in the page allocation failure, __GFP_ATOMIC is not
> set. It would be worth chasing down whether the allocation site really
> was GFP_ATOMIC and, if so, isolating what stripped that flag and seeing
> if it was a mistake.

Printing the flags was introduced recently and I didn't check them (apart
from playing with GFP_NOWARN in various places) in older kernels. Thanks
for this observation, I'll try to track this down.

Best regards,
Marcin
On Fri, Jun 03, 2016 at 01:57:06PM +0200, Marcin Wojtas wrote:
> >> For the record: the newest kernel in which I was able to reproduce the
> >> dumps was v4.6: http://pastebin.com/ekDdACn5. I've just checked
> >> v4.7-rc1, which comprises a lot of changes in mm (mainly yours), and
> >> I'm wondering whether there may be a spot fix or rather a series of
> >> improvements. I'm looking forward to your opinion and would be
> >> grateful for any advice.
> >>
> >
> > I don't believe we want to reintroduce the reserve to cope with CMA. One
> > option would be to widen the gap between the low and min watermarks by
> > the size of the CMA region. The effect would be to wake kswapd earlier,
> > which matters considering that the context of the failing allocation
> > was GFP_ATOMIC.
>
> Of course my intention is not to reintroduce anything that's gone
> forever, but just to find a way to overcome the current issues. Do you
> mean increasing the CMA size?

No. There is a gap between the low and min watermarks. At the low point,
kswapd is woken up, and at the min point allocation requests either enter
direct reclaim or fail if they are atomic. What I'm suggesting is that you
adjust the low watermark and add the size of the CMA area to it so that
kswapd is woken earlier. The watermarks are calculated in
__setup_per_zone_wmarks().
Hi Mel,

2016-06-03 14:36 GMT+02:00 Mel Gorman <mgorman@techsingularity.net>:
> On Fri, Jun 03, 2016 at 01:57:06PM +0200, Marcin Wojtas wrote:
>> Of course my intention is not to reintroduce anything that's gone
>> forever, but just to find a way to overcome the current issues. Do you
>> mean increasing the CMA size?
>
> No. There is a gap between the low and min watermarks. At the low point,
> kswapd is woken up, and at the min point allocation requests either enter
> direct reclaim or fail if they are atomic. What I'm suggesting is that
> you adjust the low watermark and add the size of the CMA area to it so
> that kswapd is woken earlier. The watermarks are calculated in
> __setup_per_zone_wmarks().
>

I printed all zones' settings, whose watermarks are configured within
__setup_per_zone_wmarks(). There are three zones - DMA, Normal and
Movable - and only the first one's watermarks have non-zero values.
Increasing the DMA min watermark didn't help. I also played with
increasing /proc/sys/vm/min_free_kbytes from ~2560 to 16000
(__setup_per_zone_wmarks() recalculates the watermarks after that) - no
effect either.

Best regards,
Marcin
On Tue, Jun 07, 2016 at 07:36:57PM +0200, Marcin Wojtas wrote:
> 2016-06-03 14:36 GMT+02:00 Mel Gorman <mgorman@techsingularity.net>:
> > No. There is a gap between the low and min watermarks. At the low
> > point, kswapd is woken up, and at the min point allocation requests
> > either enter direct reclaim or fail if they are atomic. What I'm
> > suggesting is that you adjust the low watermark and add the size of
> > the CMA area to it so that kswapd is woken earlier. The watermarks are
> > calculated in __setup_per_zone_wmarks().
> >
>
> I printed all zones' settings, whose watermarks are configured within
> __setup_per_zone_wmarks(). There are three zones - DMA, Normal and
> Movable - and only the first one's watermarks have non-zero values.
> Increasing the DMA min watermark didn't help. I also played with
> increasing

Patch?

Did you establish why GFP_ATOMIC (assuming that's the failing site) had not
specified __GFP_ATOMIC at the time of the allocation failure?
Hi Mel,

My last email got cut in half.

2016-06-08 12:09 GMT+02:00 Mel Gorman <mgorman@techsingularity.net>:
> On Tue, Jun 07, 2016 at 07:36:57PM +0200, Marcin Wojtas wrote:
>> I printed all zones' settings, whose watermarks are configured within
>> __setup_per_zone_wmarks(). There are three zones - DMA, Normal and
>> Movable - and only the first one's watermarks have non-zero values.
>> Increasing the DMA min watermark didn't help. I also played with
>> increasing
>
> Patch?

I played with increasing min_free_kbytes from ~2600 to 16000. It resulted
in shifting the watermark levels in __setup_per_zone_wmarks(), however only
for the DMA zone; Normal and Movable remained at 0. No progress with
avoiding the page alloc failures - the gap between 'free' and 'free_cma'
was huge, so I don't think CMA itself is the root cause.

> Did you establish why GFP_ATOMIC (assuming that's the failing site) had
> not specified __GFP_ATOMIC at the time of the allocation failure?
>

Yes. It happens in new_slab() in the following line:

    return allocate_slab(s, flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node);

I added "| GFP_ATOMIC" and in that case I got the same dumps but with one
more bit set in gfp_mask, so I don't think it's the issue.

The latest patches in v4.7-rc1 seem to boost page alloc performance enough
to avoid the problems observed between v4.2 and v4.6. Hence, before
rebasing from v4.4 to another LTS (>v4.7) in the future, we decided as a
workaround to return to using MIGRATE_RESERVE + the fix for
early_page_nid_uninitialised(). Operation now seems stable on all our SoCs
during the tests.

Best regards,
Marcin
Hi Mel,

Thanks for posting the patch. I tested it on v4.4.8. Although
"mode:0x2284020" shows that __GFP_ATOMIC is no longer stripped, the issue
remains: http://pastebin.com/DmezUJSc

Best regards,
Marcin

2016-06-09 20:13 GMT+02:00 Marcin Wojtas <mw@semihalf.com>:
> Hi Mel,
>
> My last email got cut in half.
>
> 2016-06-08 12:09 GMT+02:00 Mel Gorman <mgorman@techsingularity.net>:
>> On Tue, Jun 07, 2016 at 07:36:57PM +0200, Marcin Wojtas wrote:
>>> I printed all zones' settings, whose watermarks are configured within
>>> __setup_per_zone_wmarks(). There are three zones - DMA, Normal and
>>> Movable - and only the first one's watermarks have non-zero values.
>>> Increasing the DMA min watermark didn't help. I also played with
>>> increasing
>>
>> Patch?
>
> I played with increasing min_free_kbytes from ~2600 to 16000. It
> resulted in shifting the watermark levels in __setup_per_zone_wmarks(),
> however only for the DMA zone; Normal and Movable remained at 0. No
> progress with avoiding the page alloc failures - the gap between 'free'
> and 'free_cma' was huge, so I don't think CMA itself is the root cause.
>
>> Did you establish why GFP_ATOMIC (assuming that's the failing site) had
>> not specified __GFP_ATOMIC at the time of the allocation failure?
>
> Yes. It happens in new_slab() in the following line:
>
>     return allocate_slab(s, flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node);
>
> I added "| GFP_ATOMIC" and in that case I got the same dumps but with
> one more bit set in gfp_mask, so I don't think it's the issue.
>
> The latest patches in v4.7-rc1 seem to boost page alloc performance
> enough to avoid the problems observed between v4.2 and v4.6. Hence,
> before rebasing from v4.4 to another LTS (>v4.7) in the future, we
> decided as a workaround to return to using MIGRATE_RESERVE + the fix for
> early_page_nid_uninitialised(). Operation now seems stable on all our
> SoCs during the tests.
>
> Best regards,
> Marcin
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -294,7 +294,7 @@ static inline bool early_page_uninitialised(unsigned long pfn)
 
 static inline bool early_page_nid_uninitialised(unsigned long pfn, int nid)
 {
-	return false;
+	return true;
 }