[Bug,87891] New: kernel BUG at mm/slab.c:2625!

Message ID	20141111170243.c24ce5fdb5efaf0814071847@linux-foundation.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <dri-devel-bounces@lists.freedesktop.org> Date: Tue, 11 Nov 2014 17:02:43 -0800 From: Andrew Morton <akpm@linux-foundation.org> To: Luke Dashjr <luke@dashjr.org> Subject: Re: [Bug 87891] New: kernel BUG at mm/slab.c:2625! Message-Id: <20141111170243.c24ce5fdb5efaf0814071847@linux-foundation.org> In-Reply-To: <201411120054.04651.luke@dashjr.org> References: <bug-87891-27@https.bugzilla.kernel.org/> <alpine.DEB.2.11.1411111833220.8762@gentwo.org> <20141111164913.3616531c21c91499871c46de@linux-foundation.org> <201411120054.04651.luke@dashjr.org> Mime-Version: 1.0 Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>, bugzilla-daemon@bugzilla.kernel.org, linux-mm@kvack.org, Mel Gorman <mel@csn.ul.ie>, Ming Lei <ming.lei@canonical.com>, Johannes Weiner <hannes@cmpxchg.org>, luke-jr+linuxbugs@utopios.org, dri-devel@lists.freedesktop.org, Pekka Enberg <penberg@kernel.org>, David Rientjes <rientjes@google.com>, Joonsoo Kim <iamjoonsoo.kim@lge.com>, Christoph Lameter <cl@linux.com> Precedence: list Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Andrew Morton Nov. 12, 2014, 1:02 a.m. UTC

On Wed, 12 Nov 2014 00:54:01 +0000 Luke Dashjr <luke@dashjr.org> wrote:

> On Wednesday, November 12, 2014 12:49:13 AM Andrew Morton wrote:
> > But anyway - Luke, please attach your .config to
> > https://bugzilla.kernel.org/show_bug.cgi?id=87891?
> 
> Done: https://bugzilla.kernel.org/attachment.cgi?id=157381
> 

OK, thanks.  No CONFIG_HIGHMEM of course.  I'm stumped.

It might just have been a random memory bitflip or other corruption of
course.  Is it repeatable at all?

If it is, please add the below and retest?

Joonsoo Kim Nov. 12, 2014, 1:22 a.m. UTC | #1

On Tue, Nov 11, 2014 at 05:02:43PM -0800, Andrew Morton wrote:
> On Wed, 12 Nov 2014 00:54:01 +0000 Luke Dashjr <luke@dashjr.org> wrote:
> 
> > On Wednesday, November 12, 2014 12:49:13 AM Andrew Morton wrote:
> > > But anyway - Luke, please attach your .config to
> > > https://bugzilla.kernel.org/show_bug.cgi?id=87891?
> > 
> > Done: https://bugzilla.kernel.org/attachment.cgi?id=157381
> > 
> 
> OK, thanks.  No CONFIG_HIGHMEM of course.  I'm stumped.

Hello, Andrew.

I think that the cause is GFP_HIGHMEM.
GFP_HIGHMEM is always defined regardless CONFIG_HIGHMEM.
Please look at the do_huge_pmd_anonymous_page().
It calls alloc_hugepage_vma() and then alloc_pages_vma() is called
with alloc_hugepage_gfpmask(). This gfpmask includes GFP_TRANSHUGE
and then GFP_HIGHUSER_MOVABLE.

Thanks.

Andrew Morton Nov. 12, 2014, 1:44 a.m. UTC | #2

On Wed, 12 Nov 2014 10:22:45 +0900 Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:

> On Tue, Nov 11, 2014 at 05:02:43PM -0800, Andrew Morton wrote:
> > On Wed, 12 Nov 2014 00:54:01 +0000 Luke Dashjr <luke@dashjr.org> wrote:
> > 
> > > On Wednesday, November 12, 2014 12:49:13 AM Andrew Morton wrote:
> > > > But anyway - Luke, please attach your .config to
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=87891?
> > > 
> > > Done: https://bugzilla.kernel.org/attachment.cgi?id=157381
> > > 
> > 
> > OK, thanks.  No CONFIG_HIGHMEM of course.  I'm stumped.
> 
> Hello, Andrew.
> 
> I think that the cause is GFP_HIGHMEM.
> GFP_HIGHMEM is always defined regardless CONFIG_HIGHMEM.
> Please look at the do_huge_pmd_anonymous_page().
> It calls alloc_hugepage_vma() and then alloc_pages_vma() is called
> with alloc_hugepage_gfpmask(). This gfpmask includes GFP_TRANSHUGE
> and then GFP_HIGHUSER_MOVABLE.

OK.

So where's the bug?  I'm inclined to say that it's in ttm.  It's taking
a gfp_mask which means "this is the allocation attempt which we are
attempting to satisfy" and uses that for its own allocation.

But ttm has no business using that gfp_mask for its own allocation
attempt.  If anything it should use something like, err,

	GFP_KERNEL & ~__GFP_IO & ~__GFP_FS | __GFP_HIGH

although as I mentioned earlier, it would be better to avoid allocation
altogether.

Poor ttm guys - this is a bit of a trap we set for them.

Joonsoo Kim Nov. 12, 2014, 2:13 a.m. UTC | #3

On Tue, Nov 11, 2014 at 05:44:12PM -0800, Andrew Morton wrote:
> On Wed, 12 Nov 2014 10:22:45 +0900 Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> 
> > On Tue, Nov 11, 2014 at 05:02:43PM -0800, Andrew Morton wrote:
> > > On Wed, 12 Nov 2014 00:54:01 +0000 Luke Dashjr <luke@dashjr.org> wrote:
> > > 
> > > > On Wednesday, November 12, 2014 12:49:13 AM Andrew Morton wrote:
> > > > > But anyway - Luke, please attach your .config to
> > > > > https://bugzilla.kernel.org/show_bug.cgi?id=87891?
> > > > 
> > > > Done: https://bugzilla.kernel.org/attachment.cgi?id=157381
> > > > 
> > > 
> > > OK, thanks.  No CONFIG_HIGHMEM of course.  I'm stumped.
> > 
> > Hello, Andrew.
> > 
> > I think that the cause is GFP_HIGHMEM.
> > GFP_HIGHMEM is always defined regardless CONFIG_HIGHMEM.
> > Please look at the do_huge_pmd_anonymous_page().
> > It calls alloc_hugepage_vma() and then alloc_pages_vma() is called
> > with alloc_hugepage_gfpmask(). This gfpmask includes GFP_TRANSHUGE
> > and then GFP_HIGHUSER_MOVABLE.
> 
> OK.
> 
> So where's the bug?  I'm inclined to say that it's in ttm.  It's taking

I agree that.

> a gfp_mask which means "this is the allocation attempt which we are
> attempting to satisfy" and uses that for its own allocation.
> 
> But ttm has no business using that gfp_mask for its own allocation
> attempt.  If anything it should use something like, err,
> 
> 	GFP_KERNEL & ~__GFP_IO & ~__GFP_FS | __GFP_HIGH
> 
> although as I mentioned earlier, it would be better to avoid allocation
> altogether.

Yes, avoiding would be the best.

If not possible, introducing new common helper for changing shrinker
control's gfp to valid allocation gfp is better than just open code.

Thanks.

> 
> Poor ttm guys - this is a bit of a trap we set for them.
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

Tetsuo Handa Nov. 12, 2014, 4:08 a.m. UTC | #4

Andrew Morton wrote:
> Poor ttm guys - this is a bit of a trap we set for them.

Commit a91576d7916f6cce (\"drm/ttm: Pass GFP flags in order to avoid deadlock.\")
changed to use sc->gfp_mask rather than GFP_KERNEL.

-       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *),
-                       GFP_KERNEL);
+       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp);

But this bug is caused by sc->gfp_mask containing some flags which are not
in GFP_KERNEL, right? Then, I think

-       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp);
+       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp & GFP_KERNEL);

would hide this bug.

But I think we should use GFP_ATOMIC (or drop __GFP_WAIT flag) for
two reasons when __alloc_pages_nodemask() is called from shrinker functions.

(1) Stack usage by __alloc_pages_nodemask() is large. If we unlimitedly allow
    recursive __alloc_pages_nodemask() calls, kernel stack could overflow
    under extreme memory pressure.

(2) Some shrinker functions are using sleepable locks which could make kswapd
    sleep for unpredictable duration. If kswapd is unexpectedly blocked inside
    shrinker functions and somebody is expecting that kswapd is running for
    reclaiming memory, it is a memory allocation deadlock.

Speak of ttm module, commit 22e71691fd54c637 (\"drm/ttm: Use mutex_trylock() to
avoid deadlock inside shrinker functions.\") prevents unlimited recursive
__alloc_pages_nodemask() calls.

Andrew Morton Nov. 12, 2014, 4:38 a.m. UTC | #5

On Wed, 12 Nov 2014 13:08:55 +0900 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote:

> Andrew Morton wrote:
> > Poor ttm guys - this is a bit of a trap we set for them.
> 
> Commit a91576d7916f6cce (\"drm/ttm: Pass GFP flags in order to avoid deadlock.\")
> changed to use sc->gfp_mask rather than GFP_KERNEL.
> 
> -       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *),
> -                       GFP_KERNEL);
> +       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp);
> 
> But this bug is caused by sc->gfp_mask containing some flags which are not
> in GFP_KERNEL, right? Then, I think
> 
> -       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp);
> +       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp & GFP_KERNEL);
> 
> would hide this bug.
> 
> But I think we should use GFP_ATOMIC (or drop __GFP_WAIT flag)

Well no - ttm_page_pool_free() should stop calling kmalloc altogether. 
Just do

	struct page *pages_to_free[16];

and rework the code to free 16 pages at a time.  Easy.

Apart from all the other things we're discussing here, it should do
this because kmalloc() isn't very reliable within a shrinker.


> for
> two reasons when __alloc_pages_nodemask() is called from shrinker functions.
> 
> (1) Stack usage by __alloc_pages_nodemask() is large. If we unlimitedly allow
>     recursive __alloc_pages_nodemask() calls, kernel stack could overflow
>     under extreme memory pressure.
> 
> (2) Some shrinker functions are using sleepable locks which could make kswapd
>     sleep for unpredictable duration. If kswapd is unexpectedly blocked inside
>     shrinker functions and somebody is expecting that kswapd is running for
>     reclaiming memory, it is a memory allocation deadlock.
> 
> Speak of ttm module, commit 22e71691fd54c637 (\"drm/ttm: Use mutex_trylock() to
> avoid deadlock inside shrinker functions.\") prevents unlimited recursive
> __alloc_pages_nodemask() calls.

Yes, there are such problems.

Shrinkers do all sorts of surprising things - some of the filesystem
ones do disk writes!  And these involve all sorts of locking and memory
allocations.  But they won't be directly using scan_control.gfp_mask. 
They may be using open-coded __GFP_NOFS for the allocations.  The
complicated ones pass the IO over to kernel threads and wait for them
to complete, which addresses the stack consumption concerns (at least).

[Bug,87891] New: kernel BUG at mm/slab.c:2625!

Commit Message

Comments

Patch