diff mbox series

mm/page_alloc: Correct return value of populated elements if bulk array is populated

Message ID 20210628150219.GC3840@techsingularity.net (mailing list archive)
State New
Series mm/page_alloc: Correct return value of populated elements if bulk array is populated

Commit Message

Mel Gorman June 28, 2021, 3:02 p.m. UTC
Dave Jones reported the following

	This made it into 5.13 final, and completely breaks NFSD for me
	(Serving tcp v3 mounts).  Existing mounts on clients hang, as do
	new mounts from new clients.  Rebooting the server back to rc7
	everything recovers.

The commit b3b64ebd3822 ("mm/page_alloc: do bulk array bounds check after
checking populated elements") returns 0 instead of the number of populated
elements if the array is already populated, which callers interpret as an
allocation failure. Dave reported that this patch fixes his problem, and it
also passed a test running dbench over NFS.

Fixes: b3b64ebd3822 ("mm/page_alloc: do bulk array bounds check after checking populated elements")
Reported-and-tested-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org> [5.13+]
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Geert Uytterhoeven June 29, 2021, 4:59 p.m. UTC | #1
Hi Mel,

On Mon, Jun 28, 2021 at 5:29 PM Mel Gorman <mgorman@techsingularity.net> wrote:
> Dave Jones reported the following
>
>         This made it into 5.13 final, and completely breaks NFSD for me
>         (Serving tcp v3 mounts).  Existing mounts on clients hang, as do
>         new mounts from new clients.  Rebooting the server back to rc7
>         everything recovers.
>
> The commit b3b64ebd3822 ("mm/page_alloc: do bulk array bounds check after
> checking populated elements") returns the wrong value if the array is
> already populated which is interpreted as an allocation failure. Dave
> reported this fixes his problem and it also passed a test running dbench
> over NFS.
>
> Fixes: b3b64ebd3822 ("mm/page_alloc: do bulk array bounds check after checking populated elements")
> Reported-and-tested-by: Dave Jones <davej@codemonkey.org.uk>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> Cc: <stable@vger.kernel.org> [5.13+]

I saw similar failures as Mike Galbraith when doing s2idle or s2ram
on some boards with some configs:

    Freezing of tasks failed after 20.004 seconds (1 tasks refusing to freeze, wq_busy=0):
    task:NFSv4 callback  state:S stack:    0 pid:  280 ppid:     2 flags:0x00000000
    [<c094b634>] (__schedule) from [<c094b8d0>] (schedule+0xc0/0x110)
    [<c094b8d0>] (schedule) from [<c094faec>] (schedule_timeout+0xc8/0x108)
    [<c094faec>] (schedule_timeout) from [<c092e0a0>] (svc_recv+0x108/0xa30)
    [<c092e0a0>] (svc_recv) from [<c04c5990>] (nfs4_callback_svc+0x6c/0x84)
    [<c04c5990>] (nfs4_callback_svc) from [<c0244ddc>] (kthread+0x128/0x138)
    [<c0244ddc>] (kthread) from [<c0200114>] (ret_from_fork+0x14/0x20)

I've bisected it (twice, as I couldn't believe the result) to the
same commit, which helped me find the fix.

After cherry-picking commit 66d9282523b32281 ("mm/page_alloc: Correct
return value of populated elements if bulk array is populated"),
the problem went away.

Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ef2265f86b91..04220581579c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5058,7 +5058,7 @@  unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 
 	/* Already populated array? */
 	if (unlikely(page_array && nr_pages - nr_populated == 0))
-		return 0;
+		return nr_populated;
 
 	/* Use the single page allocator for one page. */
 	if (nr_pages - nr_populated == 1)
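
The caller-visible effect of the hunk above can be sketched in userspace. This is only a toy model under stated assumptions, not the kernel code: `bulk_model()`, `attempts_until_full()`, `slot_t` and the `fixed` flag are hypothetical names invented for illustration, and the retry-until-full caller is an assumed usage pattern suggested by the reports in this thread, not something copied from the tree.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a page pointer; NULL means "slot not populated". */
typedef void *slot_t;

/*
 * Hypothetical model of the early-exit path touched by the patch.
 * It ignores the list variant and the real allocator entirely.
 * 'fixed' selects the post-patch return value.
 */
static unsigned long bulk_model(int nr_pages, slot_t *page_array, int fixed)
{
	static int dummy;	/* address used as a fake "page" */
	int nr_populated = 0;

	assert(nr_pages >= 0);

	/* Skip slots that the caller has already populated. */
	while (page_array && nr_populated < nr_pages && page_array[nr_populated])
		nr_populated++;

	/* Already populated array? */
	if (page_array && nr_pages - nr_populated == 0)
		return fixed ? (unsigned long)nr_populated : 0;

	/* Pretend every remaining slot is allocated successfully. */
	for (; page_array && nr_populated < nr_pages; nr_populated++)
		page_array[nr_populated] = &dummy;

	return (unsigned long)nr_populated;
}

/*
 * Hypothetical caller that retries until the return value equals the
 * request. Returns the attempt number that succeeded, or -1 if none
 * of the max_attempts tries did.
 */
static int attempts_until_full(int nr_pages, slot_t *page_array,
			       int fixed, int max_attempts)
{
	for (int i = 1; i <= max_attempts; i++)
		if (bulk_model(nr_pages, page_array, fixed) ==
		    (unsigned long)nr_pages)
			return i;
	return -1;
}
```

With an array that is already full, the pre-patch model (`fixed == 0`) returns 0 on every call, so a caller of this shape never sees its request satisfied. The post-patch model returns `nr_pages` on the first call.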