Message ID | 3576e3520c044beb2a81860aecb2d4f597089300.1682521303.git.baolin.wang@linux.alibaba.com (mailing list archive) |
---|---|
State | New |
Series | Revert "Revert "mm/compaction: fix set skip in fast_find_migrateblock"" |
On 4/26/23 17:03, Baolin Wang wrote:
> This reverts commit 95e7a450b8190673675836bfef236262ceff084a.
>
> When I tested thpscale with the v6.3 kernel, I found that compaction
> efficiency had regressed badly compared to the v6.2-rc1 kernel. See the
> numbers below:
>                     v6.2-rc1            v6.3
> Percentage huge-3   81.35 (   0.00%)    32.97 ( -59.47%)
> Percentage huge-5   89.92 (   0.00%)    41.70 ( -53.63%)
> Percentage huge-7   92.41 (   0.00%)    34.08 ( -63.12%)
> Percentage huge-12  90.29 (   0.00%)    41.10 ( -54.49%)
> Percentage huge-18  82.38 (   0.00%)    41.24 ( -49.95%)
> Percentage huge-24  80.34 (   0.00%)    35.99 ( -55.20%)
> Percentage huge-30  88.90 (   0.00%)    44.20 ( -50.28%)
> Percentage huge-32  90.69 (   0.00%)    79.57 ( -12.25%)
>
> Ops Compaction stalls       113790.00    207099.00
> Ops Compaction success       33983.00     19488.00
> Ops Compaction failures      79807.00    187611.00
> Ops Compaction efficiency       29.86         9.41
>
> After some investigation, I found that commit 95e7a450b819
> ("Revert mm/compaction: fix set skip in fast_find_migrateblock") caused
> the regression. That commit reverted commit 7efc3b726103 ("mm/compaction:
> fix set skip in fast_find_migrateblock") to fix a CPU stalling issue, which
> was caused by compaction getting stuck repeating fast_find_migrateblock().
>
> The compaction stalling issue is now addressed by commit cfccd2e63e7e
> ("mm, compaction: finish pageblocks on complete migration failure"). So

IIRC at that time I was pointing out some scenarios that could make the
problem appear even after that commit, and we wanted to revisit that
when Mel is back.

> we should revert the temporary fix in commit 95e7a450b819, since the
> fast pfn found by fast_find_migrateblock() really can help to isolate
> some migratable pages.

So thanks for the reminder, but we should make sure the fix is complete
before removing the workaround.

> After reverting the commit, the regression is gone.
>
>                     v6.2-rc1            v6.3                v6.3_patched
> Percentage huge-3   81.35 (   0.00%)    32.97 ( -59.47%)    87.78 (   7.90%)
> Percentage huge-5   89.92 (   0.00%)    41.70 ( -53.63%)    89.68 (  -0.27%)
> Percentage huge-7   92.41 (   0.00%)    34.08 ( -63.12%)    85.89 (  -7.05%)
> Percentage huge-12  90.29 (   0.00%)    41.10 ( -54.49%)    94.10 (   4.22%)
> Percentage huge-18  82.38 (   0.00%)    41.24 ( -49.95%)    85.06 (   3.25%)
> Percentage huge-24  80.34 (   0.00%)    35.99 ( -55.20%)    84.38 (   5.02%)
> Percentage huge-30  88.90 (   0.00%)    44.20 ( -50.28%)    95.54 (   7.48%)
> Percentage huge-32  90.69 (   0.00%)    79.57 ( -12.25%)    92.30 (   1.78%)
>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  mm/compaction.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 33650541bebc..567c8d41d01e 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1860,7 +1860,6 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
>  					pfn = cc->zone->zone_start_pfn;
>  				cc->fast_search_fail = 0;
>  				found_block = true;
> -				set_pageblock_skip(freepage);
>  				break;
>  			}
>  		}
On Wed, Apr 26, 2023 at 05:10:14PM +0200, Vlastimil Babka wrote:
> On 4/26/23 17:03, Baolin Wang wrote:
> > This reverts commit 95e7a450b8190673675836bfef236262ceff084a.
> >
> > When I tested thpscale with the v6.3 kernel, I found that compaction
> > efficiency had regressed badly compared to the v6.2-rc1 kernel. See the
> > numbers below:
> >                     v6.2-rc1            v6.3
> > Percentage huge-3   81.35 (   0.00%)    32.97 ( -59.47%)
> > Percentage huge-5   89.92 (   0.00%)    41.70 ( -53.63%)
> > Percentage huge-7   92.41 (   0.00%)    34.08 ( -63.12%)
> > Percentage huge-12  90.29 (   0.00%)    41.10 ( -54.49%)
> > Percentage huge-18  82.38 (   0.00%)    41.24 ( -49.95%)
> > Percentage huge-24  80.34 (   0.00%)    35.99 ( -55.20%)
> > Percentage huge-30  88.90 (   0.00%)    44.20 ( -50.28%)
> > Percentage huge-32  90.69 (   0.00%)    79.57 ( -12.25%)
> >
> > Ops Compaction stalls       113790.00    207099.00
> > Ops Compaction success       33983.00     19488.00
> > Ops Compaction failures      79807.00    187611.00
> > Ops Compaction efficiency       29.86         9.41
> >
> > After some investigation, I found that commit 95e7a450b819
> > ("Revert mm/compaction: fix set skip in fast_find_migrateblock") caused
> > the regression. That commit reverted commit 7efc3b726103 ("mm/compaction:
> > fix set skip in fast_find_migrateblock") to fix a CPU stalling issue, which
> > was caused by compaction getting stuck repeating fast_find_migrateblock().
> >
> > The compaction stalling issue is now addressed by commit cfccd2e63e7e
> > ("mm, compaction: finish pageblocks on complete migration failure"). So
>
> IIRC at that time I was pointing out some scenarios that could make the
> problem appear even after that commit, and we wanted to revisit that
> when Mel is back.
>

Yes, I've prototyped the fix against 6.3-rc7 and the revert is at the
end, but the revert on its own has the potential for causing problems. The
series needs to be rebased, retested and posted. What I last tested
should show up shortly at

https://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git/ mm-follupfastmigrate-v1r1
On 4/26/2023 11:33 PM, Mel Gorman wrote:
> On Wed, Apr 26, 2023 at 05:10:14PM +0200, Vlastimil Babka wrote:
>> On 4/26/23 17:03, Baolin Wang wrote:
>>> This reverts commit 95e7a450b8190673675836bfef236262ceff084a.
>>>
>>> When I tested thpscale with the v6.3 kernel, I found that compaction
>>> efficiency had regressed badly compared to the v6.2-rc1 kernel. See the
>>> numbers below:
>>>                     v6.2-rc1            v6.3
>>> Percentage huge-3   81.35 (   0.00%)    32.97 ( -59.47%)
>>> Percentage huge-5   89.92 (   0.00%)    41.70 ( -53.63%)
>>> Percentage huge-7   92.41 (   0.00%)    34.08 ( -63.12%)
>>> Percentage huge-12  90.29 (   0.00%)    41.10 ( -54.49%)
>>> Percentage huge-18  82.38 (   0.00%)    41.24 ( -49.95%)
>>> Percentage huge-24  80.34 (   0.00%)    35.99 ( -55.20%)
>>> Percentage huge-30  88.90 (   0.00%)    44.20 ( -50.28%)
>>> Percentage huge-32  90.69 (   0.00%)    79.57 ( -12.25%)
>>>
>>> Ops Compaction stalls       113790.00    207099.00
>>> Ops Compaction success       33983.00     19488.00
>>> Ops Compaction failures      79807.00    187611.00
>>> Ops Compaction efficiency       29.86         9.41
>>>
>>> After some investigation, I found that commit 95e7a450b819
>>> ("Revert mm/compaction: fix set skip in fast_find_migrateblock") caused
>>> the regression. That commit reverted commit 7efc3b726103 ("mm/compaction:
>>> fix set skip in fast_find_migrateblock") to fix a CPU stalling issue, which
>>> was caused by compaction getting stuck repeating fast_find_migrateblock().
>>>
>>> The compaction stalling issue is now addressed by commit cfccd2e63e7e
>>> ("mm, compaction: finish pageblocks on complete migration failure"). So
>>
>> IIRC at that time I was pointing out some scenarios that could make the
>> problem appear even after that commit, and we wanted to revisit that
>> when Mel is back.

Ah, I missed that. I will check the previous discussion.

> Yes, I've prototyped the fix against 6.3-rc7 and the revert is at the
> end, but the revert on its own has the potential for causing problems. The
> series needs to be rebased, retested and posted. What I last tested
> should show up shortly at
>
> https://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git/ mm-follupfastmigrate-v1r1

Thanks.
diff --git a/mm/compaction.c b/mm/compaction.c
index 33650541bebc..567c8d41d01e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1860,7 +1860,6 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
 					pfn = cc->zone->zone_start_pfn;
 				cc->fast_search_fail = 0;
 				found_block = true;
-				set_pageblock_skip(freepage);
 				break;
 			}
 		}
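
As a rough illustration of why the one removed line can matter, here is a toy userspace model. It is not kernel code: `toy_block`, `toy_fast_find()` and `toy_scan()` are invented names, and the model simply assumes that the scanner refuses to isolate pages from any block whose skip bit is set. Under that assumption, flagging the block at the moment the fast search picks it leaves nothing for the scanner to isolate from it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a pageblock; not a kernel structure. */
struct toy_block {
	bool skip;       /* models the pageblock skip bit */
	int migratable;  /* pages a scanner could isolate here */
};

/*
 * Toy fast search: return the index of the first block whose skip
 * bit is clear, or -1 if there is none. When mark_skip is true the
 * chosen block is flagged before returning, which is the role the
 * removed set_pageblock_skip() call plays in this model.
 */
int toy_fast_find(struct toy_block *blocks, size_t n, bool mark_skip)
{
	assert(blocks != NULL);
	for (size_t i = 0; i < n; i++) {
		if (blocks[i].skip)
			continue;
		if (mark_skip)
			blocks[i].skip = true;
		return (int)i;
	}
	return -1;
}

/* Toy scanner that honors the skip bit (an assumption of this model). */
int toy_scan(const struct toy_block *block)
{
	assert(block != NULL);
	return block->skip ? 0 : block->migratable;
}
```

With `mark_skip` set, a block holding 32 migratable pages is found but `toy_scan()` returns 0 for it; with `mark_skip` clear, the same block yields 32. The model says nothing about the CPU stall that the workaround was introduced to avoid, which is the concern raised in the replies.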
This reverts commit 95e7a450b8190673675836bfef236262ceff084a.

When I tested thpscale with the v6.3 kernel, I found that compaction
efficiency had regressed badly compared to the v6.2-rc1 kernel. See the
numbers below:
                    v6.2-rc1            v6.3
Percentage huge-3   81.35 (   0.00%)    32.97 ( -59.47%)
Percentage huge-5   89.92 (   0.00%)    41.70 ( -53.63%)
Percentage huge-7   92.41 (   0.00%)    34.08 ( -63.12%)
Percentage huge-12  90.29 (   0.00%)    41.10 ( -54.49%)
Percentage huge-18  82.38 (   0.00%)    41.24 ( -49.95%)
Percentage huge-24  80.34 (   0.00%)    35.99 ( -55.20%)
Percentage huge-30  88.90 (   0.00%)    44.20 ( -50.28%)
Percentage huge-32  90.69 (   0.00%)    79.57 ( -12.25%)

Ops Compaction stalls       113790.00    207099.00
Ops Compaction success       33983.00     19488.00
Ops Compaction failures      79807.00    187611.00
Ops Compaction efficiency       29.86         9.41

After some investigation, I found that commit 95e7a450b819
("Revert mm/compaction: fix set skip in fast_find_migrateblock") caused
the regression. That commit reverted commit 7efc3b726103 ("mm/compaction:
fix set skip in fast_find_migrateblock") to fix a CPU stalling issue, which
was caused by compaction getting stuck repeating fast_find_migrateblock().

The compaction stalling issue is now addressed by commit cfccd2e63e7e
("mm, compaction: finish pageblocks on complete migration failure"). So
we should revert the temporary fix in commit 95e7a450b819, since the
fast pfn found by fast_find_migrateblock() really can help to isolate
some migratable pages.

After reverting the commit, the regression is gone.

                    v6.2-rc1            v6.3                v6.3_patched
Percentage huge-3   81.35 (   0.00%)    32.97 ( -59.47%)    87.78 (   7.90%)
Percentage huge-5   89.92 (   0.00%)    41.70 ( -53.63%)    89.68 (  -0.27%)
Percentage huge-7   92.41 (   0.00%)    34.08 ( -63.12%)    85.89 (  -7.05%)
Percentage huge-12  90.29 (   0.00%)    41.10 ( -54.49%)    94.10 (   4.22%)
Percentage huge-18  82.38 (   0.00%)    41.24 ( -49.95%)    85.06 (   3.25%)
Percentage huge-24  80.34 (   0.00%)    35.99 ( -55.20%)    84.38 (   5.02%)
Percentage huge-30  88.90 (   0.00%)    44.20 ( -50.28%)    95.54 (   7.48%)
Percentage huge-32  90.69 (   0.00%)    79.57 ( -12.25%)    92.30 (   1.78%)

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/compaction.c | 1 -
 1 file changed, 1 deletion(-)
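
For readers checking the tables, the derived columns can be reproduced with simple arithmetic. The sketch below assumes two definitions that are consistent with the posted figures but are not stated in the thread: "Compaction efficiency" as successes per stall in percent, and the parenthesised values as the relative change against the v6.2-rc1 baseline. The function names are made up for this example.

```c
#include <assert.h>

/* Assumed definition: successful compactions per stall, in percent. */
double compaction_efficiency(double success, double stalls)
{
	assert(stalls > 0.0);
	return 100.0 * success / stalls;
}

/* Relative change of value against baseline, in percent. */
double percent_change(double baseline, double value)
{
	assert(baseline != 0.0);
	return 100.0 * (value - baseline) / baseline;
}
```

For example, `compaction_efficiency(33983.0, 113790.0)` gives roughly 29.86 and `compaction_efficiency(19488.0, 207099.0)` roughly 9.41, matching the "Ops Compaction efficiency" row, while `percent_change(81.35, 32.97)` gives roughly -59.47, matching the huge-3 row for v6.3. In both kernels the success and failure counts sum to the stall count.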