Message ID | 20210104065843.5658-4-songmuchun@bytedance.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one | expand |
On 1/3/21 10:58 PM, Muchun Song wrote: > When dissolve_free_huge_page() races with __free_huge_page(), we can > do a retry. Because the race window is small. In general, I agree that the race window is small. However, worst case would be if the freeing of the page is put on a work queue. Is it acceptable to keep retrying in that case? In addition, the 'Free some vmemmap' series may slow the free_huge_page path even more. In these worst case scenarios, I am not sure we want to just spin retrying.
On Tue, Jan 5, 2021 at 9:33 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: > > On 1/3/21 10:58 PM, Muchun Song wrote: > > When dissolve_free_huge_page() races with __free_huge_page(), we can > > do a retry. Because the race window is small. > > In general, I agree that the race window is small. However, worst case > would be if the freeing of the page is put on a work queue. Is it acceptable > to keep retrying in that case? In addition, the 'Free some vmemmap' series > may slow the free_huge_page path even more. I also consider the 'Free some vmemmap' series case. In my next version series, I will flush the work before dissolve_free_huge_page returns when encountering this race. So the retry is acceptable. Right? Thanks. > > In these worst case scenarios, I am not sure we want to just spin retrying. > > -- > Mike Kravetz > > > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com> > > --- > > mm/hugetlb.c | 16 +++++++++++----- > > 1 file changed, 11 insertions(+), 5 deletions(-) > > > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > > index 72608008f8b4..db00ae375d2a 100644 > > --- a/mm/hugetlb.c > > +++ b/mm/hugetlb.c > > @@ -1763,10 +1763,11 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed, > > * nothing for in-use hugepages and non-hugepages. > > * This function returns values like below: > > * > > - * -EBUSY: failed to dissolved free hugepages or the hugepage is in-use > > - * (allocated or reserved.) > > - * 0: successfully dissolved free hugepages or the page is not a > > - * hugepage (considered as already dissolved) > > + * -EAGAIN: race with __free_huge_page() and can do a retry > > + * -EBUSY: failed to dissolved free hugepages or the hugepage is in-use > > + * (allocated or reserved.) > > + * 0: successfully dissolved free hugepages or the page is not a > > + * hugepage (considered as already dissolved) > > */ > > int dissolve_free_huge_page(struct page *page) > > { > > @@ -1815,8 +1816,10 @@ int dissolve_free_huge_page(struct page *page) > > * We should make sure that the page is already on the free list > > * when it is dissolved. > > */ > > - if (unlikely(!PageHugeFreed(head))) > > + if (unlikely(!PageHugeFreed(head))) { > > + rc = -EAGAIN; > > goto out; > > + } > > > > /* > > * Move PageHWPoison flag from head page to the raw error page, > > @@ -1857,7 +1860,10 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn) > > > > for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) { > > page = pfn_to_page(pfn); > > +retry: > > rc = dissolve_free_huge_page(page); > > + if (rc == -EAGAIN) > > + goto retry; > > if (rc) > > break; > > } > >
On Tue, Jan 5, 2021 at 11:14 AM Muchun Song <songmuchun@bytedance.com> wrote: > > On Tue, Jan 5, 2021 at 9:33 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: > > > > On 1/3/21 10:58 PM, Muchun Song wrote: > > > When dissolve_free_huge_page() races with __free_huge_page(), we can > > > do a retry. Because the race window is small. > > > > In general, I agree that the race window is small. However, worst case > > would be if the freeing of the page is put on a work queue. Is it acceptable > > to keep retrying in that case? In addition, the 'Free some vmemmap' series > > may slow the free_huge_page path even more. > > I also consider the 'Free some vmemmap' series case. In my next > version series, I will flush the work before dissolve_free_huge_page > returns when encountering this race. So the retry is acceptable. > Right? Hi Mike, How about flushing the @free_hpage_work when encountering this race? > > Thanks. > > > > > In these worst case scenarios, I am not sure we want to just spin retrying. > > > > -- > > Mike Kravetz > > > > > > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com> > > > --- > > > mm/hugetlb.c | 16 +++++++++++----- > > > 1 file changed, 11 insertions(+), 5 deletions(-) > > > > > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > > > index 72608008f8b4..db00ae375d2a 100644 > > > --- a/mm/hugetlb.c > > > +++ b/mm/hugetlb.c > > > @@ -1763,10 +1763,11 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed, > > > * nothing for in-use hugepages and non-hugepages. > > > * This function returns values like below: > > > * > > > - * -EBUSY: failed to dissolved free hugepages or the hugepage is in-use > > > - * (allocated or reserved.) > > > - * 0: successfully dissolved free hugepages or the page is not a > > > - * hugepage (considered as already dissolved) > > > + * -EAGAIN: race with __free_huge_page() and can do a retry > > > + * -EBUSY: failed to dissolved free hugepages or the hugepage is in-use > > > + * (allocated or reserved.) > > > + * 0: successfully dissolved free hugepages or the page is not a > > > + * hugepage (considered as already dissolved) > > > */ > > > int dissolve_free_huge_page(struct page *page) > > > { > > > @@ -1815,8 +1816,10 @@ int dissolve_free_huge_page(struct page *page) > > > * We should make sure that the page is already on the free list > > > * when it is dissolved. > > > */ > > > - if (unlikely(!PageHugeFreed(head))) > > > + if (unlikely(!PageHugeFreed(head))) { > > > + rc = -EAGAIN; > > > goto out; > > > + } > > > > > > /* > > > * Move PageHWPoison flag from head page to the raw error page, > > > @@ -1857,7 +1860,10 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn) > > > > > > for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) { > > > page = pfn_to_page(pfn); > > > +retry: > > > rc = dissolve_free_huge_page(page); > > > + if (rc == -EAGAIN) > > > + goto retry; > > > if (rc) > > > break; > > > } > > >
On Mon, Jan 04, 2021 at 02:58:41PM +0800, Muchun Song wrote: > When dissolve_free_huge_page() races with __free_huge_page(), we can > do a retry. Because the race window is small. > > Signed-off-by: Muchun Song <songmuchun@bytedance.com> > --- > mm/hugetlb.c | 16 +++++++++++----- > 1 file changed, 11 insertions(+), 5 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 72608008f8b4..db00ae375d2a 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -1763,10 +1763,11 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed, > * nothing for in-use hugepages and non-hugepages. > * This function returns values like below: > * > - * -EBUSY: failed to dissolved free hugepages or the hugepage is in-use > - * (allocated or reserved.) > - * 0: successfully dissolved free hugepages or the page is not a > - * hugepage (considered as already dissolved) > + * -EAGAIN: race with __free_huge_page() and can do a retry > + * -EBUSY: failed to dissolved free hugepages or the hugepage is in-use > + * (allocated or reserved.) > + * 0: successfully dissolved free hugepages or the page is not a > + * hugepage (considered as already dissolved) > */ > int dissolve_free_huge_page(struct page *page) > { > @@ -1815,8 +1816,10 @@ int dissolve_free_huge_page(struct page *page) > * We should make sure that the page is already on the free list > * when it is dissolved. > */ > - if (unlikely(!PageHugeFreed(head))) > + if (unlikely(!PageHugeFreed(head))) { > + rc = -EAGAIN; > goto out; > + } If dissolve_free_huge_page() races with __free_huge_page() and we detect it with this check, the hugepage is expected to be freed or dissolved by __free_huge_page(), so is it enough just to return with rc = 0 without retrying? Thanks, Naoya Horiguchi > > /* > * Move PageHWPoison flag from head page to the raw error page, > @@ -1857,7 +1860,10 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn) > > for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) { > page = pfn_to_page(pfn); > +retry: > rc = dissolve_free_huge_page(page); > + if (rc == -EAGAIN) > + goto retry; > if (rc) > break; > } > -- > 2.11.0 >
On Tue, Jan 5, 2021 at 2:38 PM HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote: > > On Mon, Jan 04, 2021 at 02:58:41PM +0800, Muchun Song wrote: > > When dissolve_free_huge_page() races with __free_huge_page(), we can > > do a retry. Because the race window is small. > > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com> > > --- > > mm/hugetlb.c | 16 +++++++++++----- > > 1 file changed, 11 insertions(+), 5 deletions(-) > > > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > > index 72608008f8b4..db00ae375d2a 100644 > > --- a/mm/hugetlb.c > > +++ b/mm/hugetlb.c > > @@ -1763,10 +1763,11 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed, > > * nothing for in-use hugepages and non-hugepages. > > * This function returns values like below: > > * > > - * -EBUSY: failed to dissolved free hugepages or the hugepage is in-use > > - * (allocated or reserved.) > > - * 0: successfully dissolved free hugepages or the page is not a > > - * hugepage (considered as already dissolved) > > + * -EAGAIN: race with __free_huge_page() and can do a retry > > + * -EBUSY: failed to dissolved free hugepages or the hugepage is in-use > > + * (allocated or reserved.) > > + * 0: successfully dissolved free hugepages or the page is not a > > + * hugepage (considered as already dissolved) > > */ > > int dissolve_free_huge_page(struct page *page) > > { > > @@ -1815,8 +1816,10 @@ int dissolve_free_huge_page(struct page *page) > > * We should make sure that the page is already on the free list > > * when it is dissolved. > > */ > > - if (unlikely(!PageHugeFreed(head))) > > + if (unlikely(!PageHugeFreed(head))) { > > + rc = -EAGAIN; > > goto out; > > + } > > If dissolve_free_huge_page() races with __free_huge_page() and we detect > it with this check, the hugepage is expected to be freed or dissolved by > __free_huge_page(), so is it enough just to return with rc = 0 without retrying? The dissolve_free_huge_page() aims to free the page to the buddy allocator not the hugepage pool. So it is not enough just to return 0, right? Or did you mean that we set the page temporary here and let the __free_huge_page do the freeing later for us? Thanks. > > Thanks, > Naoya Horiguchi > > > > > /* > > * Move PageHWPoison flag from head page to the raw error page, > > @@ -1857,7 +1860,10 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn) > > > > for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) { > > page = pfn_to_page(pfn); > > +retry: > > rc = dissolve_free_huge_page(page); > > + if (rc == -EAGAIN) > > + goto retry; > > if (rc) > > break; > > } > > -- > > 2.11.0 > >
On Tue, Jan 05, 2021 at 03:10:35PM +0800, Muchun Song wrote: > On Tue, Jan 5, 2021 at 2:38 PM HORIGUCHI NAOYA(堀口 直也) > <naoya.horiguchi@nec.com> wrote: > > > > On Mon, Jan 04, 2021 at 02:58:41PM +0800, Muchun Song wrote: > > > When dissolve_free_huge_page() races with __free_huge_page(), we can > > > do a retry. Because the race window is small. > > > > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com> > > > --- > > > mm/hugetlb.c | 16 +++++++++++----- > > > 1 file changed, 11 insertions(+), 5 deletions(-) > > > > > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > > > index 72608008f8b4..db00ae375d2a 100644 > > > --- a/mm/hugetlb.c > > > +++ b/mm/hugetlb.c > > > @@ -1763,10 +1763,11 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed, > > > * nothing for in-use hugepages and non-hugepages. > > > * This function returns values like below: > > > * > > > - * -EBUSY: failed to dissolved free hugepages or the hugepage is in-use > > > - * (allocated or reserved.) > > > - * 0: successfully dissolved free hugepages or the page is not a > > > - * hugepage (considered as already dissolved) > > > + * -EAGAIN: race with __free_huge_page() and can do a retry > > > + * -EBUSY: failed to dissolved free hugepages or the hugepage is in-use > > > + * (allocated or reserved.) > > > + * 0: successfully dissolved free hugepages or the page is not a > > > + * hugepage (considered as already dissolved) > > > */ > > > int dissolve_free_huge_page(struct page *page) > > > { > > > @@ -1815,8 +1816,10 @@ int dissolve_free_huge_page(struct page *page) > > > * We should make sure that the page is already on the free list > > > * when it is dissolved. > > > */ > > > - if (unlikely(!PageHugeFreed(head))) > > > + if (unlikely(!PageHugeFreed(head))) { > > > + rc = -EAGAIN; > > > goto out; > > > + } > > > > If dissolve_free_huge_page() races with __free_huge_page() and we detect > > it with this check, the hugepage is expected to be freed or dissolved by > > __free_huge_page(), so is it enough just to return with rc = 0 without retrying? > > The dissolve_free_huge_page() aims to free the page to the buddy > allocator not the hugepage pool. So it is not enough just to return 0, > right? Or did you mean that we set the page temporary here and > let the __free_huge_page do the freeing later for us? Thanks. Ah, OK. You're right. Thank you for the answer (and finding/fixing a few bugs). - Naoya
On 1/4/21 7:46 PM, Muchun Song wrote: > On Tue, Jan 5, 2021 at 11:14 AM Muchun Song <songmuchun@bytedance.com> wrote: >> >> On Tue, Jan 5, 2021 at 9:33 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: >>> >>> On 1/3/21 10:58 PM, Muchun Song wrote: >>>> When dissolve_free_huge_page() races with __free_huge_page(), we can >>>> do a retry. Because the race window is small. >>> >>> In general, I agree that the race window is small. However, worst case >>> would be if the freeing of the page is put on a work queue. Is it acceptable >>> to keep retrying in that case? In addition, the 'Free some vmemmap' series >>> may slow the free_huge_page path even more. >> >> I also consider the 'Free some vmemmap' series case. In my next >> version series, I will flush the work before dissolve_free_huge_page >> returns when encountering this race. So the retry is acceptable. >> Right? > > Hi Mike, > > How about flushing the @free_hpage_work when > encountering this race? Flushing should be fine. The more I think about it, the more I think spinning to retry is not going to be an issue. Why? As you mentioned the race window is quite small. It just makes me a bit nervous to retry in a tight loop.
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 72608008f8b4..db00ae375d2a 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1763,10 +1763,11 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed, * nothing for in-use hugepages and non-hugepages. * This function returns values like below: * - * -EBUSY: failed to dissolved free hugepages or the hugepage is in-use - * (allocated or reserved.) - * 0: successfully dissolved free hugepages or the page is not a - * hugepage (considered as already dissolved) + * -EAGAIN: race with __free_huge_page() and can do a retry + * -EBUSY: failed to dissolved free hugepages or the hugepage is in-use + * (allocated or reserved.) + * 0: successfully dissolved free hugepages or the page is not a + * hugepage (considered as already dissolved) */ int dissolve_free_huge_page(struct page *page) { @@ -1815,8 +1816,10 @@ int dissolve_free_huge_page(struct page *page) * We should make sure that the page is already on the free list * when it is dissolved. */ - if (unlikely(!PageHugeFreed(head))) + if (unlikely(!PageHugeFreed(head))) { + rc = -EAGAIN; goto out; + } /* * Move PageHWPoison flag from head page to the raw error page, @@ -1857,7 +1860,10 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn) for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) { page = pfn_to_page(pfn); +retry: rc = dissolve_free_huge_page(page); + if (rc == -EAGAIN) + goto retry; if (rc) break; }
When dissolve_free_huge_page() races with __free_huge_page(), we can do a retry. Because the race window is small. Signed-off-by: Muchun Song <songmuchun@bytedance.com> --- mm/hugetlb.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-)