Message ID | 20230125015738.912924-1-zokeefe@google.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount | expand |
On Tue, Jan 24, 2023 at 5:58 PM Zach O'Keefe <zokeefe@google.com> wrote: > > During collapse, in a few places we check to see if a given small page > has any unaccounted references. If the refcount on the page doesn't > match our expectations, it must be there is an unknown user concurrently > interested in the page, and so it's not safe to move the contents > elsewhere. However, the unaccounted pins are likely an ephemeral state. > > In such a situation, make MADV_COLLAPSE set EAGAIN errno, indicating that > collapse may succeed on retry. The page may be DMA pinned (for example, pin_user_pages()), it is not worth retrying for such pages. But it may also not be worth optimizing for this case at this point. So the patch looks good to me. Reviewed-by: Yang Shi <shy828301@gmail.com> > > Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") > Reported-by: Hugh Dickins <hughd@google.com> > Signed-off-by: Zach O'Keefe <zokeefe@google.com> > > --- > mm/khugepaged.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > index e23619bfecc4..fa38cae240b9 100644 > --- a/mm/khugepaged.c > +++ b/mm/khugepaged.c > @@ -2712,6 +2712,7 @@ static int madvise_collapse_errno(enum scan_result r) > case SCAN_CGROUP_CHARGE_FAIL: > return -EBUSY; > /* Resource temporary unavailable - trying again might succeed */ > + case SCAN_PAGE_COUNT: > case SCAN_PAGE_LOCK: > case SCAN_PAGE_LRU: > case SCAN_DEL_PAGE_LRU: > -- > 2.39.1.405.gd4c25cc71f-goog >
On Wed, Jan 25, 2023 at 10:06 AM Yang Shi <shy828301@gmail.com> wrote: > > On Tue, Jan 24, 2023 at 5:58 PM Zach O'Keefe <zokeefe@google.com> wrote: > > > > During collapse, in a few places we check to see if a given small page > > has any unaccounted references. If the refcount on the page doesn't > > match our expectations, it must be there is an unknown user concurrently > > interested in the page, and so it's not safe to move the contents > > elsewhere. However, the unaccounted pins are likely an ephemeral state. > > > > In such a situation, make MADV_COLLAPSE set EAGAIN errno, indicating that > > collapse may succeed on retry. > > The page may be DMA pinned (for example, pin_user_pages()), it is not > worth retrying for such pages. But it may also not be worth optimizing > for this case at this point. > > So the patch looks good to me. Reviewed-by: Yang Shi <shy828301@gmail.com> Thanks as always, Yang, and good point about DMA pinning. As you mentioned, I don't know if it's worth considering that too much right now, as it's unlikely these two uses (MADV_COLLAPSE and DMA pining) would be used together. We can revisit if necessary later if it's an issue, but for now, I think it's a win that MADV_COLLAPSE (+ a bounded userspace retry loop based off erno) is more likely to succeed.
On Tue, 24 Jan 2023, Zach O'Keefe wrote: > During collapse, in a few places we check to see if a given small page > has any unaccounted references. If the refcount on the page doesn't > match our expectations, it must be there is an unknown user concurrently > interested in the page, and so it's not safe to move the contents > elsewhere. However, the unaccounted pins are likely an ephemeral state. > > In such a situation, make MADV_COLLAPSE set EAGAIN errno, indicating that > collapse may succeed on retry. > > Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") > Reported-by: Hugh Dickins <hughd@google.com> > Signed-off-by: Zach O'Keefe <zokeefe@google.com> This was Reviewed-by: Yang Shi <shy828301@gmail.com> and now I'll give it a nudge with Acked-by: Hugh Dickins <hughd@google.com> since it hasn't appeared in mm-unstable or linux-next yet: I think its Cc:stable sibling 2/2, already in 6.2-rc, got all the attention. Thanks! Hugh > > --- > mm/khugepaged.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > index e23619bfecc4..fa38cae240b9 100644 > --- a/mm/khugepaged.c > +++ b/mm/khugepaged.c > @@ -2712,6 +2712,7 @@ static int madvise_collapse_errno(enum scan_result r) > case SCAN_CGROUP_CHARGE_FAIL: > return -EBUSY; > /* Resource temporary unavailable - trying again might succeed */ > + case SCAN_PAGE_COUNT: > case SCAN_PAGE_LOCK: > case SCAN_PAGE_LRU: > case SCAN_DEL_PAGE_LRU: > -- > 2.39.1.405.gd4c25cc71f-goog
On Wed, 8 Feb 2023 21:09:04 -0800 (PST) Hugh Dickins <hughd@google.com> wrote: > On Tue, 24 Jan 2023, Zach O'Keefe wrote: > > > During collapse, in a few places we check to see if a given small page > > has any unaccounted references. If the refcount on the page doesn't > > match our expectations, it must be there is an unknown user concurrently > > interested in the page, and so it's not safe to move the contents > > elsewhere. However, the unaccounted pins are likely an ephemeral state. > > > > In such a situation, make MADV_COLLAPSE set EAGAIN errno, indicating that > > collapse may succeed on retry. > > > > Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") > > Reported-by: Hugh Dickins <hughd@google.com> > > Signed-off-by: Zach O'Keefe <zokeefe@google.com> > > This was > Reviewed-by: Yang Shi <shy828301@gmail.com> > and now I'll give it a nudge with > Acked-by: Hugh Dickins <hughd@google.com> > since it hasn't appeared in mm-unstable or linux-next yet: Buildbot failed on [2/2] so I skipped the whole series in expectation of a v2 series, which didn't happen. Instead, Zach trickily sent what was [2/2] as a standalone patch. So [1/2] got lost. Sigh, poor me. Thanks, I'll merge [1/2] into mm-hotfixes. > I think its Cc:stable sibling 2/2, already in 6.2-rc, got all the attention. I'm not seeing anything in the [1/2] changelog which indicates that a backport is needed. IOW, # cat .signature When fixing a bug, please describe the end-user visible effects of that bug.
On Thu, 9 Feb 2023, Andrew Morton wrote: > > Thanks, I'll merge [1/2] into mm-hotfixes. Great, thanks. > > I'm not seeing anything in the [1/2] changelog which indicates that a > backport is needed. IOW, Correct: it's just changing the errno for some racy cases from "you're wrong, don't bother me again" to "it might be worth having another go": not fixing an instability, as 2/2 was. > > # cat .signature > When fixing a bug, please describe the end-user visible effects of that bug. If whatever's being run by the end-user is coded to try again on -EAGAIN, then the end-user will less often see occasional unexplained failures. Hugh
On Thu, 9 Feb 2023 13:50:30 -0800 (PST) Hugh Dickins <hughd@google.com> wrote: > > > > > I'm not seeing anything in the [1/2] changelog which indicates that a > > backport is needed. IOW, > > Correct: it's just changing the errno for some racy cases from "you're > wrong, don't bother me again" to "it might be worth having another go": > not fixing an instability, as 2/2 was. > > > > > # cat .signature > > When fixing a bug, please describe the end-user visible effects of that bug. > > If whatever's being run by the end-user is coded to try again on -EAGAIN, > then the end-user will less often see occasional unexplained failures. > OK, thanks. I redid the changelog's final paragraph thusly: : In this situation, MADV_COLLAPSE returns -EINVAL when it should return : -EAGAIN. This could cause userspace to conclude that the syscall failed, : when it in fact could succeed by retrying.
On Thu, Feb 9, 2023 at 2:12 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > On Thu, 9 Feb 2023 13:50:30 -0800 (PST) Hugh Dickins <hughd@google.com> wrote: > > > > > > > > > I'm not seeing anything in the [1/2] changelog which indicates that a > > > backport is needed. IOW, > > > > Correct: it's just changing the errno for some racy cases from "you're > > wrong, don't bother me again" to "it might be worth having another go": > > not fixing an instability, as 2/2 was. > > > > > > > > # cat .signature > > > When fixing a bug, please describe the end-user visible effects of that bug. > > > > If whatever's being run by the end-user is coded to try again on -EAGAIN, > > then the end-user will less often see occasional unexplained failures. > > > > OK, thanks. I redid the changelog's final paragraph thusly: > > : In this situation, MADV_COLLAPSE returns -EINVAL when it should return > : -EAGAIN. This could cause userspace to conclude that the syscall failed, > : when it in fact could succeed by retrying. > This looks good to me. Thanks Andrew! Also thanks Hugh for being on the lookout for this patch -- I hastily read through my emails regarding which patches were merged where and had assumed this merged with 2/2. Also, apologies about the confusing v1 [1/2] and v2 [2/2] fiasco; in hindsight that probably wasn't the most decipherable thing to do :) Best, Zach
diff --git a/mm/khugepaged.c b/mm/khugepaged.c index e23619bfecc4..fa38cae240b9 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -2712,6 +2712,7 @@ static int madvise_collapse_errno(enum scan_result r) case SCAN_CGROUP_CHARGE_FAIL: return -EBUSY; /* Resource temporary unavailable - trying again might succeed */ + case SCAN_PAGE_COUNT: case SCAN_PAGE_LOCK: case SCAN_PAGE_LRU: case SCAN_DEL_PAGE_LRU:
During collapse, in a few places we check to see if a given small page has any unaccounted references. If the refcount on the page doesn't match our expectations, it must be there is an unknown user concurrently interested in the page, and so it's not safe to move the contents elsewhere. However, the unaccounted pins are likely an ephemeral state. In such a situation, make MADV_COLLAPSE set EAGAIN errno, indicating that collapse may succeed on retry. Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Zach O'Keefe <zokeefe@google.com> --- mm/khugepaged.c | 1 + 1 file changed, 1 insertion(+)