Message ID | a24a86fbae09711e61dc4424aa7aebff718e9995.1678703534.git.baolin.wang@linux.alibaba.com (mailing list archive) |
---|---|
State | New |
Series | [1/2] mm: compaction: consider the number of scanning compound pages in isolate fail path | expand |
On 03/13/23 18:37, Baolin Wang wrote:
> When trying to isolate a migratable pageblock, it can contain several
> normal pages or several hugetlb pages (e.g. CONT-PTE 64K hugetlb on arm64)
> in a pageblock. That means we may hold the lru lock of a normal page to
> continue to isolate the next hugetlb page by isolate_or_dissolve_huge_page()
> in the same migratable pageblock.
>
> However in the isolate_or_dissolve_huge_page(), it may allocate a new hugetlb
> page and dissolve the old one by alloc_and_dissolve_hugetlb_folio() if the
> hugetlb's refcount is zero. That means we can still enter the direct compaction
> path to allocate a new hugetlb page under the current lru lock, which
> may cause possible deadlock.
>
> To avoid this possible deadlock, we should release the lru lock when trying
> to isolate a hugetlb page. Moreover it does not make sense to take the lru
> lock to isolate a hugetlb, which is not in the lru list.
>
> Fixes: 369fa227c219 ("mm: make alloc_contig_range handle free hugetlb pages")
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  mm/compaction.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index c9d9ad958e2a..ac8ff152421a 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c

Thanks!

I suspect holding the lru lock when calling isolate_or_dissolve_huge_page was
not considered. However, I wonder if this can really happen in practice?

Before the code below, there is this:

	/*
	 * Periodically drop the lock (if held) regardless of its
	 * contention, to give chance to IRQs. Abort completely if
	 * a fatal signal is pending.
	 */
	if (!(low_pfn % COMPACT_CLUSTER_MAX)) {
		if (locked) {
			unlock_page_lruvec_irqrestore(locked, flags);
			locked = NULL;
		}
		...
	}

It would seem that the pfn of a hugetlb page would always be a multiple of
COMPACT_CLUSTER_MAX so we would drop the lock. However, I am not sure if
that is ALWAYS true and would prefer something like the code you suggested.

Did you actually see this deadlock in practice?
On Mon, 13 Mar 2023 10:08:38 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> I suspect holding the lru lock when calling isolate_or_dissolve_huge_page was
> not considered. However, I wonder if this can really happen in practice?
>
> Before the code below, there is this:
>
> 	/*
> 	 * Periodically drop the lock (if held) regardless of its
> 	 * contention, to give chance to IRQs. Abort completely if
> 	 * a fatal signal is pending.
> 	 */
> 	if (!(low_pfn % COMPACT_CLUSTER_MAX)) {
> 		if (locked) {
> 			unlock_page_lruvec_irqrestore(locked, flags);
> 			locked = NULL;
> 		}
> 		...
> 	}
>
> It would seem that the pfn of a hugetlb page would always be a multiple of
> COMPACT_CLUSTER_MAX so we would drop the lock. However, I am not sure if
> that is ALWAYS true and would prefer something like the code you suggested.
>
> Did you actually see this deadlock in practice?

Presumably the lack of lockdep reports about this tells us something?
On 3/14/2023 1:08 AM, Mike Kravetz wrote:
> On 03/13/23 18:37, Baolin Wang wrote:
>> When trying to isolate a migratable pageblock, it can contain several
>> normal pages or several hugetlb pages (e.g. CONT-PTE 64K hugetlb on arm64)
>> in a pageblock. That means we may hold the lru lock of a normal page to
>> continue to isolate the next hugetlb page by isolate_or_dissolve_huge_page()
>> in the same migratable pageblock.
>>
>> However in the isolate_or_dissolve_huge_page(), it may allocate a new hugetlb
>> page and dissolve the old one by alloc_and_dissolve_hugetlb_folio() if the
>> hugetlb's refcount is zero. That means we can still enter the direct compaction
>> path to allocate a new hugetlb page under the current lru lock, which
>> may cause possible deadlock.
>>
>> To avoid this possible deadlock, we should release the lru lock when trying
>> to isolate a hugetlb page. Moreover it does not make sense to take the lru
>> lock to isolate a hugetlb, which is not in the lru list.
>>
>> Fixes: 369fa227c219 ("mm: make alloc_contig_range handle free hugetlb pages")
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>  mm/compaction.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index c9d9ad958e2a..ac8ff152421a 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>
> Thanks!
>
> I suspect holding the lru lock when calling isolate_or_dissolve_huge_page was
> not considered. However, I wonder if this can really happen in practice?
>
> Before the code below, there is this:
>
> 	/*
> 	 * Periodically drop the lock (if held) regardless of its
> 	 * contention, to give chance to IRQs. Abort completely if
> 	 * a fatal signal is pending.
> 	 */
> 	if (!(low_pfn % COMPACT_CLUSTER_MAX)) {
> 		if (locked) {
> 			unlock_page_lruvec_irqrestore(locked, flags);
> 			locked = NULL;
> 		}
> 		...
> 	}
>
> It would seem that the pfn of a hugetlb page would always be a multiple of
> COMPACT_CLUSTER_MAX so we would drop the lock. However, I am not sure if
> that is ALWAYS true and would prefer something like the code you suggested.

Well, this is not always true. Consider a CONT-PTE hugetlb page on ARM,
which contains 16 contiguous normal pages.

> Did you actually see this deadlock in practice?

I have not seen this issue in practice so far, but from code inspection I
think it can be triggered when trying to isolate a CONT-PTE hugetlb page.
On 03/14/23 12:11, Baolin Wang wrote:
> On 3/14/2023 1:08 AM, Mike Kravetz wrote:
> > On 03/13/23 18:37, Baolin Wang wrote:
> >
> > It would seem that the pfn of a hugetlb page would always be a multiple of
> > COMPACT_CLUSTER_MAX so we would drop the lock. However, I am not sure if
> > that is ALWAYS true and would prefer something like the code you suggested.
>
> Well, this is not always true, suppose the CONT-PTE hugetlb on ARM arch,
> which contains 16 contiguous normal pages.
>

Right. I keep forgetting about the CONT-* page sizes on arm :(

In any case, I think explicitly dropping the lock as you have done is a
good idea.

Feel free to add,

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
On 3/15/2023 1:27 AM, Mike Kravetz wrote:
> On 03/14/23 12:11, Baolin Wang wrote:
>> On 3/14/2023 1:08 AM, Mike Kravetz wrote:
>>> On 03/13/23 18:37, Baolin Wang wrote:
>>>
>>> It would seem that the pfn of a hugetlb page would always be a multiple of
>>> COMPACT_CLUSTER_MAX so we would drop the lock. However, I am not sure if
>>> that is ALWAYS true and would prefer something like the code you suggested.
>>
>> Well, this is not always true, suppose the CONT-PTE hugetlb on ARM arch,
>> which contains 16 contiguous normal pages.
>>
>
> Right. I keep forgetting about the CONT-* page sizes on arm :(
>
> In any case, I think explicitly dropping the lock as you have done is a
> good idea.
>
> Feel free to add,
>
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

Thanks for reviewing.
On 3/13/23 11:37, Baolin Wang wrote:
> When trying to isolate a migratable pageblock, it can contain several
> normal pages or several hugetlb pages (e.g. CONT-PTE 64K hugetlb on arm64)
> in a pageblock. That means we may hold the lru lock of a normal page to
> continue to isolate the next hugetlb page by isolate_or_dissolve_huge_page()
> in the same migratable pageblock.
>
> However in the isolate_or_dissolve_huge_page(), it may allocate a new hugetlb
> page and dissolve the old one by alloc_and_dissolve_hugetlb_folio() if the
> hugetlb's refcount is zero. That means we can still enter the direct compaction
> path to allocate a new hugetlb page under the current lru lock, which
> may cause possible deadlock.
>
> To avoid this possible deadlock, we should release the lru lock when trying
> to isolate a hugetlb page. Moreover it does not make sense to take the lru
> lock to isolate a hugetlb, which is not in the lru list.
>
> Fixes: 369fa227c219 ("mm: make alloc_contig_range handle free hugetlb pages")
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Thanks!

> ---
>  mm/compaction.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index c9d9ad958e2a..ac8ff152421a 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -893,6 +893,11 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  		}
>  
>  		if (PageHuge(page) && cc->alloc_contig) {
> +			if (locked) {
> +				unlock_page_lruvec_irqrestore(locked, flags);
> +				locked = NULL;
> +			}
> +
>  			ret = isolate_or_dissolve_huge_page(page, &cc->migratepages);
>  
>  			/*
diff --git a/mm/compaction.c b/mm/compaction.c
index c9d9ad958e2a..ac8ff152421a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -893,6 +893,11 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		}
 
 		if (PageHuge(page) && cc->alloc_contig) {
+			if (locked) {
+				unlock_page_lruvec_irqrestore(locked, flags);
+				locked = NULL;
+			}
+
 			ret = isolate_or_dissolve_huge_page(page, &cc->migratepages);
 
 			/*
When trying to isolate a migratable pageblock, it can contain several
normal pages or several hugetlb pages (e.g. CONT-PTE 64K hugetlb on arm64)
in a pageblock. That means we may hold the lru lock of a normal page to
continue to isolate the next hugetlb page by isolate_or_dissolve_huge_page()
in the same migratable pageblock.

However in the isolate_or_dissolve_huge_page(), it may allocate a new hugetlb
page and dissolve the old one by alloc_and_dissolve_hugetlb_folio() if the
hugetlb's refcount is zero. That means we can still enter the direct compaction
path to allocate a new hugetlb page under the current lru lock, which
may cause possible deadlock.

To avoid this possible deadlock, we should release the lru lock when trying
to isolate a hugetlb page. Moreover it does not make sense to take the lru
lock to isolate a hugetlb, which is not in the lru list.

Fixes: 369fa227c219 ("mm: make alloc_contig_range handle free hugetlb pages")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/compaction.c | 5 +++++
 1 file changed, 5 insertions(+)