diff mbox series

mm/page_alloc: do bulk array bounds check after checking populated elements

Message ID 20210618125102.GU30378@techsingularity.net (mailing list archive)
State New
Headers show
Series mm/page_alloc: do bulk array bounds check after checking populated elements | expand

Commit Message

Mel Gorman June 18, 2021, 12:51 p.m. UTC
Dan Carpenter reported the following

  The patch 0f87d9d30f21: "mm/page_alloc: add an array-based interface
  to the bulk page allocator" from Apr 29, 2021, leads to the following
  static checker warning:

        mm/page_alloc.c:5338 __alloc_pages_bulk()
        warn: potentially one past the end of array 'page_array[nr_populated]'

The problem can occur if an array is passed in that is fully populated. That
potentially ends up allocating a single page and storing it past the end of
the array. This patch returns 0 if the array is fully populated.

Fixes: 0f87d9d30f21 ("mm/page_alloc: add an array-based interface to the bulk page allocator")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mel Gorman <mgorman@techsinguliarity.net>
---
 mm/page_alloc.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Dave Jones June 28, 2021, 4:27 a.m. UTC | #1
On Fri, Jun 18, 2021 at 01:51:02PM +0100, Mel Gorman wrote:
 > Dan Carpenter reported the following
 > 
 >   The patch 0f87d9d30f21: "mm/page_alloc: add an array-based interface
 >   to the bulk page allocator" from Apr 29, 2021, leads to the following
 >   static checker warning:
 > 
 >         mm/page_alloc.c:5338 __alloc_pages_bulk()
 >         warn: potentially one past the end of array 'page_array[nr_populated]'
 > 
 > The problem can occur if an array is passed in that is fully populated. That
 > potentially ends up allocating a single page and storing it past the end of
 > the array. This patch returns 0 if the array is fully populated.
 > 
 > Fixes: 0f87d9d30f21 ("mm/page_alloc: add an array-based interface to the bulk page allocator")
 > Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
 > Signed-off-by: Mel Gorman <mgorman@techsinguliarity.net>
 > ---
 >  mm/page_alloc.c | 4 ++++
 >  1 file changed, 4 insertions(+)
 > 
 > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
 > index 7124bb00219d..ef2265f86b91 100644
 > --- a/mm/page_alloc.c
 > +++ b/mm/page_alloc.c
 > @@ -5056,6 +5056,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 >  	while (page_array && nr_populated < nr_pages && page_array[nr_populated])
 >  		nr_populated++;
 >  
 > +	/* Already populated array? */
 > +	if (unlikely(page_array && nr_pages - nr_populated == 0))
 > +		return 0;
 > +
 >  	/* Use the single page allocator for one page. */
 >  	if (nr_pages - nr_populated == 1)
 >  		goto failed;


This made it into 5.13 final, and completely breaks NFSD for me (Serving tcp v3 mounts).
Existing mounts on clients hang, as do new mounts from new clients.
Rebooting the server back to rc7 everything recovers.  Bisect lands on
this commit.

$ git bisect start
# good: [13311e74253fe64329390df80bed3f07314ddd61] Linux 5.13-rc7
git bisect good 13311e74253fe64329390df80bed3f07314ddd61
# bad: [b665c68f11192e7b31e9b793f31c78d80558da07] restart watchdog if oopsing
git bisect bad b665c68f11192e7b31e9b793f31c78d80558da07
# good: [b960e0147451915b5d4cd208b7abd3b07ceaf1a2] Merge tag 'for-linus-5.13b-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
git bisect good b960e0147451915b5d4cd208b7abd3b07ceaf1a2
# bad: [7ce32ac6fb2fc73584b567c73ae0c47528954ec6] Merge branch 'akpm' (patches from Andrew)
git bisect bad 7ce32ac6fb2fc73584b567c73ae0c47528954ec6
# good: [5fa54346caf67b4b1b10b1f390316ae466da4d53] kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()
git bisect good 5fa54346caf67b4b1b10b1f390316ae466da4d53
# bad: [72a461adbe88acf6a8cc5dba7720cf94d7056154] mailmap: add Marek's other e-mail address and identity without diacritics
git bisect bad 72a461adbe88acf6a8cc5dba7720cf94d7056154
# good: [ea6d0630100b285f059d0a8d8e86f38a46407536] mm/hwpoison: do not lock page again when me_huge_page() successfully recovers
git bisect good ea6d0630100b285f059d0a8d8e86f38a46407536
# bad: [b3b64ebd38225d8032b5db42938d969b602040c2] mm/page_alloc: do bulk array bounds check after checking populated elements
git bisect bad b3b64ebd38225d8032b5db42938d969b602040c2
# good: [b08e50dd64489e3997029d204f761cb57a3762d2] mm/page_alloc: __alloc_pages_bulk(): do bounds check before accessing array
git bisect good b08e50dd64489e3997029d204f761cb57a3762d2


	Dave
Dave Jones June 28, 2021, 4:59 a.m. UTC | #2
On Mon, Jun 28, 2021 at 12:27:59AM -0400, Dave Jones wrote:
 > On Fri, Jun 18, 2021 at 01:51:02PM +0100, Mel Gorman wrote:
 >  > Dan Carpenter reported the following
 >  > 
 >  >   The patch 0f87d9d30f21: "mm/page_alloc: add an array-based interface
 >  >   to the bulk page allocator" from Apr 29, 2021, leads to the following
 >  >   static checker warning:
 >  > 
 >  >         mm/page_alloc.c:5338 __alloc_pages_bulk()
 >  >         warn: potentially one past the end of array 'page_array[nr_populated]'
 >  > 
 >  > The problem can occur if an array is passed in that is fully populated. That
 >  > potentially ends up allocating a single page and storing it past the end of
 >  > the array. This patch returns 0 if the array is fully populated.
 >  > 
 >  > Fixes: 0f87d9d30f21 ("mm/page_alloc: add an array-based interface to the bulk page allocator")
 >  > Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
 >  > Signed-off-by: Mel Gorman <mgorman@techsinguliarity.net>
 >  > ---
 >  >  mm/page_alloc.c | 4 ++++
 >  >  1 file changed, 4 insertions(+)
 >  > 
 >  > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
 >  > index 7124bb00219d..ef2265f86b91 100644
 >  > --- a/mm/page_alloc.c
 >  > +++ b/mm/page_alloc.c
 >  > @@ -5056,6 +5056,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 >  >  	while (page_array && nr_populated < nr_pages && page_array[nr_populated])
 >  >  		nr_populated++;
 >  >  
 >  > +	/* Already populated array? */
 >  > +	if (unlikely(page_array && nr_pages - nr_populated == 0))
 >  > +		return 0;
 >  > +
 >  >  	/* Use the single page allocator for one page. */
 >  >  	if (nr_pages - nr_populated == 1)
 >  >  		goto failed;
 > 
 > 
 > This made it into 5.13 final, and completely breaks NFSD for me (Serving tcp v3 mounts).
 > Existing mounts on clients hang, as do new mounts from new clients.
 > Rebooting the server back to rc7 everything recovers.  Bisect lands on
 > this commit.

replacing the return 0 with a warn gets me this:

[   15.127686] WARNING: CPU: 1 PID: 1673 at mm/page_alloc.c:5060 __alloc_pages_bulk+0x6a/0x560
[   15.129286] CPU: 1 PID: 1673 Comm: lockd Not tainted 5.13.0-firewall+ #15
[   15.129290] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Q3XXG4-P, BIOS 5.6.5 06/30/2018
[   15.129292] RIP: 0010:__alloc_pages_bulk+0x6a/0x560
[   15.129301] Code: 00 48 83 38 00 0f 84 69 04 00 00 48 83 c0 08 31 db eb 0f 48 83 c0 08 48 83 78 f8 00 0f 84 72 03 00 00 83 c3 01 41 39 df 75 e9 <0f> 0b 8b 05 0e 71 32 01 8b 6c 24 14 23 2d 10 71 32 01 85 c0 0f 85
[   15.129303] RSP: 0018:ffffb920031e7e10 EFLAGS: 00010246
[   15.129306] RAX: ffff9fff4ef502f0 RBX: 0000000000000004 RCX: 0000000000000004
[   15.129308] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000cc0
[   15.129309] RBP: 0000000000000004 R08: 0000000000000000 R09: ffff9fff4ef502d0
[   15.129311] R10: 0000000000000000 R11: 0000000000000002 R12: ffff9fff42186240
[   15.129312] R13: 7fffffffffffffff R14: 0000000000000004 R15: 0000000000000004
[   15.129317] FS:  0000000000000000(0000) GS:ffffa00057480000(0000) knlGS:0000000000000000
[   15.129319] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   15.129321] CR2: 00007f0bf731c000 CR3: 000000007840b002 CR4: 00000000001706e0
[   15.129323] Call Trace:
[   15.129327]  svc_recv+0x7a/0x820
[   15.129333]  ? __schedule+0x358/0x700
[   15.129338]  ? nlmsvc_retry_blocked+0x1b/0x280
[   15.156135]  ? grace_ender+0x20/0x20
[   15.157546]  lockd+0x83/0x140
[   15.157564]  kthread+0x116/0x140
[   15.157569]  ? __kthread_create_on_node+0x180/0x180
[   15.157574]  ret_from_fork+0x22/0x30
[   15.157579] ---[ end trace 0854dec2f1b25912 ]---
Mel Gorman June 28, 2021, 11:53 a.m. UTC | #3
On Mon, Jun 28, 2021 at 12:27:59AM -0400, Dave Jones wrote:
> On Fri, Jun 18, 2021 at 01:51:02PM +0100, Mel Gorman wrote:
>  > Dan Carpenter reported the following
>  > 
>  >   The patch 0f87d9d30f21: "mm/page_alloc: add an array-based interface
>  >   to the bulk page allocator" from Apr 29, 2021, leads to the following
>  >   static checker warning:
>  > 
>  >         mm/page_alloc.c:5338 __alloc_pages_bulk()
>  >         warn: potentially one past the end of array 'page_array[nr_populated]'
>  > 
>  > The problem can occur if an array is passed in that is fully populated. That
>  > potentially ends up allocating a single page and storing it past the end of
>  > the array. This patch returns 0 if the array is fully populated.
>  > 
>  > Fixes: 0f87d9d30f21 ("mm/page_alloc: add an array-based interface to the bulk page allocator")
>  > Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
>  > Signed-off-by: Mel Gorman <mgorman@techsinguliarity.net>
>  > ---
>  >  mm/page_alloc.c | 4 ++++
>  >  1 file changed, 4 insertions(+)
>  > 
>  > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>  > index 7124bb00219d..ef2265f86b91 100644
>  > --- a/mm/page_alloc.c
>  > +++ b/mm/page_alloc.c
>  > @@ -5056,6 +5056,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>  >  	while (page_array && nr_populated < nr_pages && page_array[nr_populated])
>  >  		nr_populated++;
>  >  
>  > +	/* Already populated array? */
>  > +	if (unlikely(page_array && nr_pages - nr_populated == 0))
>  > +		return 0;
>  > +
>  >  	/* Use the single page allocator for one page. */
>  >  	if (nr_pages - nr_populated == 1)
>  >  		goto failed;
> 
> 
> This made it into 5.13 final, and completely breaks NFSD for me (Serving tcp v3 mounts).
> Existing mounts on clients hang, as do new mounts from new clients.
> Rebooting the server back to rc7 everything recovers.  Bisect lands on
> this commit.
> 

Thanks Dave, can you try this?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ef2265f86b91..04220581579c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5058,7 +5058,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 
 	/* Already populated array? */
 	if (unlikely(page_array && nr_pages - nr_populated == 0))
-		return 0;
+		return nr_populated;
 
 	/* Use the single page allocator for one page. */
 	if (nr_pages - nr_populated == 1)
Dave Jones June 28, 2021, 2:48 p.m. UTC | #4
On Mon, Jun 28, 2021 at 12:53:23PM +0100, Mel Gorman wrote:

 > > This made it into 5.13 final, and completely breaks NFSD for me (Serving tcp v3 mounts).
 > > Existing mounts on clients hang, as do new mounts from new clients.
 > > Rebooting the server back to rc7 everything recovers.  Bisect lands on
 > > this commit.
 > 
 > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
 > index ef2265f86b91..04220581579c 100644
 > --- a/mm/page_alloc.c
 > +++ b/mm/page_alloc.c
 > @@ -5058,7 +5058,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 >  
 >  	/* Already populated array? */
 >  	if (unlikely(page_array && nr_pages - nr_populated == 0))
 > -		return 0;
 > +		return nr_populated;

Yep, this works.

	Dave
Mel Gorman June 28, 2021, 3:03 p.m. UTC | #5
On Mon, Jun 28, 2021 at 10:48:05AM -0400, Dave Jones wrote:
> On Mon, Jun 28, 2021 at 12:53:23PM +0100, Mel Gorman wrote:
> 
>  > > This made it into 5.13 final, and completely breaks NFSD for me (Serving tcp v3 mounts).
>  > > Existing mounts on clients hang, as do new mounts from new clients.
>  > > Rebooting the server back to rc7 everything recovers.  Bisect lands on
>  > > this commit.
>  > 
>  > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>  > index ef2265f86b91..04220581579c 100644
>  > --- a/mm/page_alloc.c
>  > +++ b/mm/page_alloc.c
>  > @@ -5058,7 +5058,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>  >  
>  >  	/* Already populated array? */
>  >  	if (unlikely(page_array && nr_pages - nr_populated == 0))
>  > -		return 0;
>  > +		return nr_populated;
> 
> Yep, this works.
> 

Thanks Dave, it passed a dbench test over NFS locally as well so I sent
a proper version of the patch. Hopefully it'll be picked up relatively
quickly and appear in a 5.13.1 release.
diff mbox series

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7124bb00219d..ef2265f86b91 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5056,6 +5056,10 @@  unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	while (page_array && nr_populated < nr_pages && page_array[nr_populated])
 		nr_populated++;
 
+	/* Already populated array? */
+	if (unlikely(page_array && nr_pages - nr_populated == 0))
+		return 0;
+
 	/* Use the single page allocator for one page. */
 	if (nr_pages - nr_populated == 1)
 		goto failed;