Message ID | 20200114002904.h9bSo%akpm@linux-foundation.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [01/11] mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations | expand |
On Mon, Jan 13, 2020 at 4:29 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > From: Vlastimil Babka <vbabka@suse.cz> > Subject: mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations I absolutely _detest_ how we've had this pattern of trying to change the THP logic in a late -rc game. Considering how unsuccessful some of the earlier attempts have been, late -rc kernels really are *not* the time to make changes like this. I'm going to take it this time, but (a) I'm going to revert *immediately* if anybody even peeps about the new behavior (b) I want this to stop. Andrew - send THP "tweaks" during the merge window, and not like some late-rc "fixes", when there are absolutely _zero_ performance numbers or anything else critical associated with the patch. IOW, there is a decided lack of "bugfix" here. And if there actually _were_ performance numbers, they should have been in the explanation. So stop randomly messing with the THP allocation code, since it's _known_ to be fragile with subtle side effects. Linus
On 1/14/20 3:16 AM, Linus Torvalds wrote: > On Mon, Jan 13, 2020 at 4:29 PM Andrew Morton <akpm@linux-foundation.org> wrote: >> >> From: Vlastimil Babka <vbabka@suse.cz> >> Subject: mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations > > I absolutely _detest_ how we've had this pattern of trying to change > the THP logic in a late -rc game. > > Considering how unsuccessful some of the earlier attempts have been, > late -rc kernels really are *not* the time to make changes like this. Agreed, myself would have been much more comfortable in rc1. > I'm going to take it this time, but Thanks. > (a) I'm going to revert *immediately* if anybody even peeps about the > new behavior > > (b) I want this to stop. Andrew - send THP "tweaks" during the merge > window, and not like some late-rc "fixes", when there are absolutely > _zero_ performance numbers or anything else critical associated with > the patch. > > IOW, there is a decided lack of "bugfix" here. And if there actually > _were_ performance numbers, they should have been in the explanation. FWIW Mel did some tests [1] which showed some improvement to THP success rate and no significant extra swapping that was feared. We didn't get confirmation from Andrea about his real life workloads though. [1] https://lore.kernel.org/linux-mm/20191113112042.GG28938@suse.de/ > So stop randomly messing with the THP allocation code, since it's > _known_ to be fragile with subtle side effects. > > Linus >
On Tue 14-01-20 09:05:57, Vlastimil Babka wrote: > On 1/14/20 3:16 AM, Linus Torvalds wrote: > > On Mon, Jan 13, 2020 at 4:29 PM Andrew Morton <akpm@linux-foundation.org> wrote: > >> > >> From: Vlastimil Babka <vbabka@suse.cz> > >> Subject: mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations > > > > I absolutely _detest_ how we've had this pattern of trying to change > > the THP logic in a late -rc game. > > > > Considering how unsuccessful some of the earlier attempts have been, > > late -rc kernels really are *not* the time to make changes like this. > > Agreed, myself would have been much more comfortable in rc1. Yeah I do agree. The patch is a result of a regression I have reported back in early October (http://lkml.kernel.org/r/20191001083743.GC15624@dhcp22.suse.cz). The workload is quite trivial albeit artificial which doesn't make it super urgent. Still something to have addressed though. It has been mostly ignored for quite some time until Vlastimil came with the patch which improved the situation while not causing any other obvious problems (http://lkml.kernel.org/r/20191029151549.GO31513@dhcp22.suse.cz). It has been sitting in the mmotm tree since Oct 29 and in linux-next around a similar time without any reports. 5.5-rc1 (Dec 8) would have been a good time to target but I understand that a more time in linux-next wouldn't hurt and targeting 5.6-rc1 should be sufficient.
--- a/mm/mempolicy.c~mm-thp-tweak-reclaim-compaction-effort-of-local-only-and-all-node-allocations +++ a/mm/mempolicy.c @@ -2148,18 +2148,22 @@ alloc_pages_vma(gfp_t gfp, int order, st nmask = policy_nodemask(gfp, pol); if (!nmask || node_isset(hpage_node, *nmask)) { mpol_cond_put(pol); + /* + * First, try to allocate THP only on local node, but + * don't reclaim unnecessarily, just compact. + */ page = __alloc_pages_node(hpage_node, - gfp | __GFP_THISNODE, order); + gfp | __GFP_THISNODE | __GFP_NORETRY, order); /* * If hugepage allocations are configured to always * synchronous compact or the vma has been madvised * to prefer hugepage backing, retry allowing remote - * memory as well. + * memory with both reclaim and compact as well. */ if (!page && (gfp & __GFP_DIRECT_RECLAIM)) page = __alloc_pages_node(hpage_node, - gfp | __GFP_NORETRY, order); + gfp, order); goto out; } --- a/mm/page_alloc.c~mm-thp-tweak-reclaim-compaction-effort-of-local-only-and-all-node-allocations +++ a/mm/page_alloc.c @@ -4476,8 +4476,11 @@ retry_cpuset: if (page) goto got_pg; - if (order >= pageblock_order && (gfp_mask & __GFP_IO) && - !(gfp_mask & __GFP_RETRY_MAYFAIL)) { + /* + * Checks for costly allocations with __GFP_NORETRY, which + * includes some THP page fault allocations + */ + if (costly_order && (gfp_mask & __GFP_NORETRY)) { /* * If allocating entire pageblock(s) and compaction * failed because all zones are below low watermarks @@ -4498,23 +4501,6 @@ retry_cpuset: if (compact_result == COMPACT_SKIPPED || compact_result == COMPACT_DEFERRED) goto nopage; - } - - /* - * Checks for costly allocations with __GFP_NORETRY, which - * includes THP page fault allocations - */ - if (costly_order && (gfp_mask & __GFP_NORETRY)) { - /* - * If compaction is deferred for high-order allocations, - * it is because sync compaction recently failed. If - * this is the case and the caller requested a THP - * allocation, we do not want to heavily disrupt the - * system, so we fail the allocation instead of entering - * direct reclaim. - */ - if (compact_result == COMPACT_DEFERRED) - goto nopage; /* * Looks like reclaim/compaction is worth trying, but