mm: page_alloc: document kmemleak's non-blockable __GFP_NOFAIL case
diff mbox series

Message ID 1562964544-59519-1-git-send-email-yang.shi@linux.alibaba.com
State New
Headers show
Series
  • mm: page_alloc: document kmemleak's non-blockable __GFP_NOFAIL case
Related show

Commit Message

Yang Shi July 12, 2019, 8:49 p.m. UTC
When running ltp's oom test with kmemleak enabled, the below warning was
triggerred since kernel detects __GFP_NOFAIL & ~__GFP_DIRECT_RECLAIM is
passed in:

WARNING: CPU: 105 PID: 2138 at mm/page_alloc.c:4608 __alloc_pages_nodemask+0x1c31/0x1d50
Modules linked in: loop dax_pmem dax_pmem_core
ip_tables x_tables xfs virtio_net net_failover virtio_blk failover
ata_generic virtio_pci virtio_ring virtio libata
CPU: 105 PID: 2138 Comm: oom01 Not tainted 5.2.0-next-20190710+ #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
RIP: 0010:__alloc_pages_nodemask+0x1c31/0x1d50
...
 kmemleak_alloc+0x4e/0xb0
 kmem_cache_alloc+0x2a7/0x3e0
 ? __kmalloc+0x1d6/0x470
 ? ___might_sleep+0x9c/0x170
 ? mempool_alloc+0x2b0/0x2b0
 mempool_alloc_slab+0x2d/0x40
 mempool_alloc+0x118/0x2b0
 ? __kasan_check_read+0x11/0x20
 ? mempool_resize+0x390/0x390
 ? lock_downgrade+0x3c0/0x3c0
 bio_alloc_bioset+0x19d/0x350
 ? __swap_duplicate+0x161/0x240
 ? bvec_alloc+0x1b0/0x1b0
 ? do_raw_spin_unlock+0xa8/0x140
 ? _raw_spin_unlock+0x27/0x40
 get_swap_bio+0x80/0x230
 ? __x64_sys_madvise+0x50/0x50
 ? end_swap_bio_read+0x310/0x310
 ? __kasan_check_read+0x11/0x20
 ? check_chain_key+0x24e/0x300
 ? bdev_write_page+0x55/0x130
 __swap_writepage+0x5ff/0xb20

The mempool_alloc_slab() clears __GFP_DIRECT_RECLAIM, kmemleak has
__GFP_NOFAIL set all the time due to commit
d9570ee3bd1d4f20ce63485f5ef05663866fe6c0 ("kmemleak: allow to coexist
with fault injection").

The fault-injection would not try to fail slab or page allocation if
__GFP_NOFAIL is used and that commit tries to turn off fault injection
for kmemleak allocation.  Although __GFP_NOFAIL doesn't guarantee no
failure for all the cases (i.e. non-blockable allocation may fail), it
still makes sense to the most cases.  Kmemleak is also a debugging tool,
so it sounds not worth changing the behavior.

It also meaks sense to keep the warning, so just document the special
case in the comment.

Cc: Michal Hocko <mhocko@suse.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
 mm/page_alloc.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

David Rientjes July 13, 2019, 7:39 p.m. UTC | #1
On Sat, 13 Jul 2019, Yang Shi wrote:

> When running ltp's oom test with kmemleak enabled, the below warning was
> triggerred since kernel detects __GFP_NOFAIL & ~__GFP_DIRECT_RECLAIM is
> passed in:
> 
> WARNING: CPU: 105 PID: 2138 at mm/page_alloc.c:4608 __alloc_pages_nodemask+0x1c31/0x1d50
> Modules linked in: loop dax_pmem dax_pmem_core
> ip_tables x_tables xfs virtio_net net_failover virtio_blk failover
> ata_generic virtio_pci virtio_ring virtio libata
> CPU: 105 PID: 2138 Comm: oom01 Not tainted 5.2.0-next-20190710+ #7
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> RIP: 0010:__alloc_pages_nodemask+0x1c31/0x1d50
> ...
>  kmemleak_alloc+0x4e/0xb0
>  kmem_cache_alloc+0x2a7/0x3e0
>  ? __kmalloc+0x1d6/0x470
>  ? ___might_sleep+0x9c/0x170
>  ? mempool_alloc+0x2b0/0x2b0
>  mempool_alloc_slab+0x2d/0x40
>  mempool_alloc+0x118/0x2b0
>  ? __kasan_check_read+0x11/0x20
>  ? mempool_resize+0x390/0x390
>  ? lock_downgrade+0x3c0/0x3c0
>  bio_alloc_bioset+0x19d/0x350
>  ? __swap_duplicate+0x161/0x240
>  ? bvec_alloc+0x1b0/0x1b0
>  ? do_raw_spin_unlock+0xa8/0x140
>  ? _raw_spin_unlock+0x27/0x40
>  get_swap_bio+0x80/0x230
>  ? __x64_sys_madvise+0x50/0x50
>  ? end_swap_bio_read+0x310/0x310
>  ? __kasan_check_read+0x11/0x20
>  ? check_chain_key+0x24e/0x300
>  ? bdev_write_page+0x55/0x130
>  __swap_writepage+0x5ff/0xb20
> 
> The mempool_alloc_slab() clears __GFP_DIRECT_RECLAIM, kmemleak has
> __GFP_NOFAIL set all the time due to commit
> d9570ee3bd1d4f20ce63485f5ef05663866fe6c0 ("kmemleak: allow to coexist
> with fault injection").
> 

It only clears __GFP_DIRECT_RECLAIM provisionally to see if the allocation 
would immediately succeed before falling back to the elements in the 
mempool.  If that fails, and the mempool is empty, mempool_alloc() 
attempts the allocation with __GFP_DIRECT_RECLAIM.  So for the problem 
described here, I think what we really want is this:

diff --git a/mm/mempool.c b/mm/mempool.c
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -386,7 +386,7 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
 	gfp_mask |= __GFP_NORETRY;	/* don't loop in __alloc_pages */
 	gfp_mask |= __GFP_NOWARN;	/* failures are OK */
 
-	gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM|__GFP_IO);
+	gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM|__GFP_IO|__GFP_NOFAIL);
 
 repeat_alloc:
 
But bio_alloc_bioset() plays with gfp_mask itself: are we sure that it 
isn't the one clearing __GFP_DIRECT_RECLAIM itself before falling back to 
saved_gfp?

In other words do we also want this?

diff --git a/block/bio.c b/block/bio.c
--- a/block/bio.c
+++ b/block/bio.c
@@ -462,16 +462,16 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, unsigned int nr_iovecs,
 		 * We solve this, and guarantee forward progress, with a rescuer
 		 * workqueue per bio_set. If we go to allocate and there are
 		 * bios on current->bio_list, we first try the allocation
-		 * without __GFP_DIRECT_RECLAIM; if that fails, we punt those
-		 * bios we would be blocking to the rescuer workqueue before
-		 * we retry with the original gfp_flags.
+		 * without __GFP_DIRECT_RECLAIM or __GFP_NOFAIL; if that fails,
+		 * we punt those bios we would be blocking to the rescuer
+		 * workqueue before we retry with the original gfp_flags.
 		 */
-
 		if (current->bio_list &&
 		    (!bio_list_empty(&current->bio_list[0]) ||
 		     !bio_list_empty(&current->bio_list[1])) &&
 		    bs->rescue_workqueue)
-			gfp_mask &= ~__GFP_DIRECT_RECLAIM;
+			gfp_mask &= ~(__GFP_DIRECT_RECLAIM |
+				      __GFP_NOFAIL);
 
 		p = mempool_alloc(&bs->bio_pool, gfp_mask);
 		if (!p && gfp_mask != saved_gfp) {
Matthew Wilcox July 13, 2019, 9:25 p.m. UTC | #2
On Sat, Jul 13, 2019 at 04:49:04AM +0800, Yang Shi wrote:
> When running ltp's oom test with kmemleak enabled, the below warning was
> triggerred since kernel detects __GFP_NOFAIL & ~__GFP_DIRECT_RECLAIM is
> passed in:

There are lots of places where kmemleak will call kmalloc with
__GFP_NOFAIL and ~__GFP_DIRECT_RECLAIM (including the XArray code, which
is how I know about it).  It needs to be fixed to allow its internal
allocations to fail and return failure of the original allocation as
a consequence.
Yang Shi July 15, 2019, 3:43 a.m. UTC | #3
On 7/13/19 12:39 PM, David Rientjes wrote:
> On Sat, 13 Jul 2019, Yang Shi wrote:
>
>> When running ltp's oom test with kmemleak enabled, the below warning was
>> triggerred since kernel detects __GFP_NOFAIL & ~__GFP_DIRECT_RECLAIM is
>> passed in:
>>
>> WARNING: CPU: 105 PID: 2138 at mm/page_alloc.c:4608 __alloc_pages_nodemask+0x1c31/0x1d50
>> Modules linked in: loop dax_pmem dax_pmem_core
>> ip_tables x_tables xfs virtio_net net_failover virtio_blk failover
>> ata_generic virtio_pci virtio_ring virtio libata
>> CPU: 105 PID: 2138 Comm: oom01 Not tainted 5.2.0-next-20190710+ #7
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
>> RIP: 0010:__alloc_pages_nodemask+0x1c31/0x1d50
>> ...
>>   kmemleak_alloc+0x4e/0xb0
>>   kmem_cache_alloc+0x2a7/0x3e0
>>   ? __kmalloc+0x1d6/0x470
>>   ? ___might_sleep+0x9c/0x170
>>   ? mempool_alloc+0x2b0/0x2b0
>>   mempool_alloc_slab+0x2d/0x40
>>   mempool_alloc+0x118/0x2b0
>>   ? __kasan_check_read+0x11/0x20
>>   ? mempool_resize+0x390/0x390
>>   ? lock_downgrade+0x3c0/0x3c0
>>   bio_alloc_bioset+0x19d/0x350
>>   ? __swap_duplicate+0x161/0x240
>>   ? bvec_alloc+0x1b0/0x1b0
>>   ? do_raw_spin_unlock+0xa8/0x140
>>   ? _raw_spin_unlock+0x27/0x40
>>   get_swap_bio+0x80/0x230
>>   ? __x64_sys_madvise+0x50/0x50
>>   ? end_swap_bio_read+0x310/0x310
>>   ? __kasan_check_read+0x11/0x20
>>   ? check_chain_key+0x24e/0x300
>>   ? bdev_write_page+0x55/0x130
>>   __swap_writepage+0x5ff/0xb20
>>
>> The mempool_alloc_slab() clears __GFP_DIRECT_RECLAIM, kmemleak has
>> __GFP_NOFAIL set all the time due to commit
>> d9570ee3bd1d4f20ce63485f5ef05663866fe6c0 ("kmemleak: allow to coexist
>> with fault injection").
>>
> It only clears __GFP_DIRECT_RECLAIM provisionally to see if the allocation
> would immediately succeed before falling back to the elements in the
> mempool.  If that fails, and the mempool is empty, mempool_alloc()
> attempts the allocation with __GFP_DIRECT_RECLAIM.  So for the problem
> described here, I think what we really want is this:
>
> diff --git a/mm/mempool.c b/mm/mempool.c
> --- a/mm/mempool.c
> +++ b/mm/mempool.c
> @@ -386,7 +386,7 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
>   	gfp_mask |= __GFP_NORETRY;	/* don't loop in __alloc_pages */
>   	gfp_mask |= __GFP_NOWARN;	/* failures are OK */
>   
> -	gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM|__GFP_IO);
> +	gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM|__GFP_IO|__GFP_NOFAIL);
>   
>   repeat_alloc:
>   
> But bio_alloc_bioset() plays with gfp_mask itself: are we sure that it
> isn't the one clearing __GFP_DIRECT_RECLAIM itself before falling back to
> saved_gfp?
>
> In other words do we also want this?
>
> diff --git a/block/bio.c b/block/bio.c
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -462,16 +462,16 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, unsigned int nr_iovecs,
>   		 * We solve this, and guarantee forward progress, with a rescuer
>   		 * workqueue per bio_set. If we go to allocate and there are
>   		 * bios on current->bio_list, we first try the allocation
> -		 * without __GFP_DIRECT_RECLAIM; if that fails, we punt those
> -		 * bios we would be blocking to the rescuer workqueue before
> -		 * we retry with the original gfp_flags.
> +		 * without __GFP_DIRECT_RECLAIM or __GFP_NOFAIL; if that fails,
> +		 * we punt those bios we would be blocking to the rescuer
> +		 * workqueue before we retry with the original gfp_flags.
>   		 */
> -
>   		if (current->bio_list &&
>   		    (!bio_list_empty(&current->bio_list[0]) ||
>   		     !bio_list_empty(&current->bio_list[1])) &&
>   		    bs->rescue_workqueue)
> -			gfp_mask &= ~__GFP_DIRECT_RECLAIM;
> +			gfp_mask &= ~(__GFP_DIRECT_RECLAIM |
> +				      __GFP_NOFAIL);
>   
>   		p = mempool_alloc(&bs->bio_pool, gfp_mask);
>   		if (!p && gfp_mask != saved_gfp) {

I don't think it will make any difference by removing __GFP_NOFAIL 
outside kmemleak. The problem is the commit 
d9570ee3bd1d4f20ce63485f5ef05663866fe6c0 ("kmemleak: allow to coexist 
with fault injection") makes __GFP_NOFAIL is set for kmemleak always in 
order to turn off fault-injection for kmemleak.

As long as kmemleak is called in ~__GFP_DIRECT_RECLAIM path, the warning 
might be hit.

And since kmemleak is just a debugging tool, so IMHO I don't think this 
is worth fixing, so I came up with the patch to document it.
Yang Shi July 15, 2019, 3:47 a.m. UTC | #4
On 7/13/19 2:25 PM, Matthew Wilcox wrote:
> On Sat, Jul 13, 2019 at 04:49:04AM +0800, Yang Shi wrote:
>> When running ltp's oom test with kmemleak enabled, the below warning was
>> triggerred since kernel detects __GFP_NOFAIL & ~__GFP_DIRECT_RECLAIM is
>> passed in:
> There are lots of places where kmemleak will call kmalloc with
> __GFP_NOFAIL and ~__GFP_DIRECT_RECLAIM (including the XArray code, which
> is how I know about it).  It needs to be fixed to allow its internal
> allocations to fail and return failure of the original allocation as
> a consequence.

Do you mean kmemleak internal allocation? It would fail even though 
__GFP_NOFAIL is passed in if GFP_NOWAIT is specified. Currently buddy 
allocator will not retry if the allocation is non-blockable.
Matthew Wilcox July 15, 2019, 1:06 p.m. UTC | #5
On Sun, Jul 14, 2019 at 08:47:07PM -0700, Yang Shi wrote:
> 
> 
> On 7/13/19 2:25 PM, Matthew Wilcox wrote:
> > On Sat, Jul 13, 2019 at 04:49:04AM +0800, Yang Shi wrote:
> > > When running ltp's oom test with kmemleak enabled, the below warning was
> > > triggerred since kernel detects __GFP_NOFAIL & ~__GFP_DIRECT_RECLAIM is
> > > passed in:
> > There are lots of places where kmemleak will call kmalloc with
> > __GFP_NOFAIL and ~__GFP_DIRECT_RECLAIM (including the XArray code, which
> > is how I know about it).  It needs to be fixed to allow its internal
> > allocations to fail and return failure of the original allocation as
> > a consequence.
> 
> Do you mean kmemleak internal allocation? It would fail even though
> __GFP_NOFAIL is passed in if GFP_NOWAIT is specified. Currently buddy
> allocator will not retry if the allocation is non-blockable.

Actually it sets off a warning.  Which is the right response from the
core mm code because specifying __GFP_NOFAIL and __GFP_NOWAIT makes no
sense.
Michal Hocko July 15, 2019, 1:17 p.m. UTC | #6
On Sat 13-07-19 04:49:04, Yang Shi wrote:
> When running ltp's oom test with kmemleak enabled, the below warning was
> triggerred since kernel detects __GFP_NOFAIL & ~__GFP_DIRECT_RECLAIM is
> passed in:

kmemleak is broken and this is a long term issue. I thought that
Catalin had something to address this.

While this patch only adds a comment and discourages future changes of
the warning which is fine and probably something that we should do,
kmemleak really should be fixed sooner than later.
Michal Hocko July 15, 2019, 1:18 p.m. UTC | #7
On Sat 13-07-19 12:39:16, David Rientjes wrote:
> On Sat, 13 Jul 2019, Yang Shi wrote:
> 
> > When running ltp's oom test with kmemleak enabled, the below warning was
> > triggerred since kernel detects __GFP_NOFAIL & ~__GFP_DIRECT_RECLAIM is
> > passed in:
> > 
> > WARNING: CPU: 105 PID: 2138 at mm/page_alloc.c:4608 __alloc_pages_nodemask+0x1c31/0x1d50
> > Modules linked in: loop dax_pmem dax_pmem_core
> > ip_tables x_tables xfs virtio_net net_failover virtio_blk failover
> > ata_generic virtio_pci virtio_ring virtio libata
> > CPU: 105 PID: 2138 Comm: oom01 Not tainted 5.2.0-next-20190710+ #7
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> > RIP: 0010:__alloc_pages_nodemask+0x1c31/0x1d50
> > ...
> >  kmemleak_alloc+0x4e/0xb0
> >  kmem_cache_alloc+0x2a7/0x3e0
> >  ? __kmalloc+0x1d6/0x470
> >  ? ___might_sleep+0x9c/0x170
> >  ? mempool_alloc+0x2b0/0x2b0
> >  mempool_alloc_slab+0x2d/0x40
> >  mempool_alloc+0x118/0x2b0
> >  ? __kasan_check_read+0x11/0x20
> >  ? mempool_resize+0x390/0x390
> >  ? lock_downgrade+0x3c0/0x3c0
> >  bio_alloc_bioset+0x19d/0x350
> >  ? __swap_duplicate+0x161/0x240
> >  ? bvec_alloc+0x1b0/0x1b0
> >  ? do_raw_spin_unlock+0xa8/0x140
> >  ? _raw_spin_unlock+0x27/0x40
> >  get_swap_bio+0x80/0x230
> >  ? __x64_sys_madvise+0x50/0x50
> >  ? end_swap_bio_read+0x310/0x310
> >  ? __kasan_check_read+0x11/0x20
> >  ? check_chain_key+0x24e/0x300
> >  ? bdev_write_page+0x55/0x130
> >  __swap_writepage+0x5ff/0xb20
> > 
> > The mempool_alloc_slab() clears __GFP_DIRECT_RECLAIM, kmemleak has
> > __GFP_NOFAIL set all the time due to commit
> > d9570ee3bd1d4f20ce63485f5ef05663866fe6c0 ("kmemleak: allow to coexist
> > with fault injection").
> > 
> 
> It only clears __GFP_DIRECT_RECLAIM provisionally to see if the allocation 
> would immediately succeed before falling back to the elements in the 
> mempool.  If that fails, and the mempool is empty, mempool_alloc() 
> attempts the allocation with __GFP_DIRECT_RECLAIM.  So for the problem 
> described here, I think what we really want is this:
> 
> diff --git a/mm/mempool.c b/mm/mempool.c
> --- a/mm/mempool.c
> +++ b/mm/mempool.c
> @@ -386,7 +386,7 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
>  	gfp_mask |= __GFP_NORETRY;	/* don't loop in __alloc_pages */
>  	gfp_mask |= __GFP_NOWARN;	/* failures are OK */
>  
> -	gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM|__GFP_IO);
> +	gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM|__GFP_IO|__GFP_NOFAIL);
>  
>  repeat_alloc:

No, I do not think we should make mempool allocator more complex for
something that is an implementation problem the kmemleak.
Catalin Marinas July 15, 2019, 3:01 p.m. UTC | #8
On 15 Jul 2019, at 08:17, Michal Hocko <mhocko@kernel.org> wrote:
> On Sat 13-07-19 04:49:04, Yang Shi wrote:
>> When running ltp's oom test with kmemleak enabled, the below warning was
>> triggerred since kernel detects __GFP_NOFAIL & ~__GFP_DIRECT_RECLAIM is
>> passed in:
> 
> kmemleak is broken and this is a long term issue. I thought that
> Catalin had something to address this.

What needs to be done in the short term is revert commit d9570ee3bd1d4f20ce63485f5ef05663866fe6c0. Longer term the solution is to embed kmemleak metadata into the slab so that we don’t have the situation where the primary slab allocation success but the kmemleak metadata fails. 

I’m on holiday for one more week with just a phone to reply from but feel free to revert the above commit. I’ll follow up with a better solution. 

Catalin
Qian Cai July 15, 2019, 3:18 p.m. UTC | #9
On Mon, 2019-07-15 at 10:01 -0500, Catalin Marinas wrote:
> On 15 Jul 2019, at 08:17, Michal Hocko <mhocko@kernel.org> wrote:
> > On Sat 13-07-19 04:49:04, Yang Shi wrote:
> > > When running ltp's oom test with kmemleak enabled, the below warning was
> > > triggerred since kernel detects __GFP_NOFAIL & ~__GFP_DIRECT_RECLAIM is
> > > passed in:
> > 
> > kmemleak is broken and this is a long term issue. I thought that
> > Catalin had something to address this.
> 
> What needs to be done in the short term is revert commit
> d9570ee3bd1d4f20ce63485f5ef05663866fe6c0. Longer term the solution is to embed
> kmemleak metadata into the slab so that we don’t have the situation where the
> primary slab allocation success but the kmemleak metadata fails. 
> 
> I’m on holiday for one more week with just a phone to reply from but feel free
> to revert the above commit. I’ll follow up with a better solution. 

Well, the reverting will only make the situation worst for the kmemleak under
memory pressure. In the meantime, if someone wants to push for the mempool
solution with tunable pool sizes along with the reverting, that could be an
improvement.

https://lore.kernel.org/linux-mm/20190328145917.GC10283@arrakis.emea.arm.com/
Yang Shi July 15, 2019, 4:58 p.m. UTC | #10
On 7/15/19 8:18 AM, Qian Cai wrote:
> On Mon, 2019-07-15 at 10:01 -0500, Catalin Marinas wrote:
>> On 15 Jul 2019, at 08:17, Michal Hocko <mhocko@kernel.org> wrote:
>>> On Sat 13-07-19 04:49:04, Yang Shi wrote:
>>>> When running ltp's oom test with kmemleak enabled, the below warning was
>>>> triggerred since kernel detects __GFP_NOFAIL & ~__GFP_DIRECT_RECLAIM is
>>>> passed in:
>>> kmemleak is broken and this is a long term issue. I thought that
>>> Catalin had something to address this.
>> What needs to be done in the short term is revert commit
>> d9570ee3bd1d4f20ce63485f5ef05663866fe6c0. Longer term the solution is to embed
>> kmemleak metadata into the slab so that we don’t have the situation where the
>> primary slab allocation success but the kmemleak metadata fails.
>>
>> I’m on holiday for one more week with just a phone to reply from but feel free
>> to revert the above commit. I’ll follow up with a better solution.
> Well, the reverting will only make the situation worst for the kmemleak under
> memory pressure. In the meantime, if someone wants to push for the mempool

I think this is expected by reverting that commit since kmemleak 
metadata could fail. But, it could fail too even though that commit is 
not reverted if the context is non-blockable.

> solution with tunable pool sizes along with the reverting, that could be an
> improvement.
>
> https://lore.kernel.org/linux-mm/20190328145917.GC10283@arrakis.emea.arm.com/
Yang Shi July 15, 2019, 5 p.m. UTC | #11
On 7/15/19 6:06 AM, Matthew Wilcox wrote:
> On Sun, Jul 14, 2019 at 08:47:07PM -0700, Yang Shi wrote:
>>
>> On 7/13/19 2:25 PM, Matthew Wilcox wrote:
>>> On Sat, Jul 13, 2019 at 04:49:04AM +0800, Yang Shi wrote:
>>>> When running ltp's oom test with kmemleak enabled, the below warning was
>>>> triggerred since kernel detects __GFP_NOFAIL & ~__GFP_DIRECT_RECLAIM is
>>>> passed in:
>>> There are lots of places where kmemleak will call kmalloc with
>>> __GFP_NOFAIL and ~__GFP_DIRECT_RECLAIM (including the XArray code, which
>>> is how I know about it).  It needs to be fixed to allow its internal
>>> allocations to fail and return failure of the original allocation as
>>> a consequence.
>> Do you mean kmemleak internal allocation? It would fail even though
>> __GFP_NOFAIL is passed in if GFP_NOWAIT is specified. Currently buddy
>> allocator will not retry if the allocation is non-blockable.
> Actually it sets off a warning.  Which is the right response from the
> core mm code because specifying __GFP_NOFAIL and __GFP_NOWAIT makes no
> sense.

Yes, this is what I meant. Kmemleak did a trick to fool fault-injection 
by passing in __GFP_NOFAIL, but it doesn't make sense for non-blockable 
allocation.
Yang Shi July 16, 2019, 5:38 p.m. UTC | #12
On 7/15/19 8:01 AM, Catalin Marinas wrote:
> On 15 Jul 2019, at 08:17, Michal Hocko <mhocko@kernel.org> wrote:
>> On Sat 13-07-19 04:49:04, Yang Shi wrote:
>>> When running ltp's oom test with kmemleak enabled, the below warning was
>>> triggerred since kernel detects __GFP_NOFAIL & ~__GFP_DIRECT_RECLAIM is
>>> passed in:
>> kmemleak is broken and this is a long term issue. I thought that
>> Catalin had something to address this.
> What needs to be done in the short term is revert commit d9570ee3bd1d4f20ce63485f5ef05663866fe6c0. Longer term the solution is to embed kmemleak metadata into the slab so that we don’t have the situation where the primary slab allocation success but the kmemleak metadata fails.
>
> I’m on holiday for one more week with just a phone to reply from but feel free to revert the above commit. I’ll follow up with a better solution.

Thanks, I'm going to submit a new patch to revert that commit.

Yang

>
> Catalin
Andrew Morton July 25, 2019, 2:48 a.m. UTC | #13
On Sat, 13 Jul 2019 04:49:04 +0800 Yang Shi <yang.shi@linux.alibaba.com> wrote:

> When running ltp's oom test with kmemleak enabled, the below warning was
> triggerred since kernel detects __GFP_NOFAIL & ~__GFP_DIRECT_RECLAIM is
> passed in:
> 
> ...
>
> The mempool_alloc_slab() clears __GFP_DIRECT_RECLAIM, kmemleak has
> __GFP_NOFAIL set all the time due to commit
> d9570ee3bd1d4f20ce63485f5ef05663866fe6c0 ("kmemleak: allow to coexist
> with fault injection").
> 
> The fault-injection would not try to fail slab or page allocation if
> __GFP_NOFAIL is used and that commit tries to turn off fault injection
> for kmemleak allocation.  Although __GFP_NOFAIL doesn't guarantee no
> failure for all the cases (i.e. non-blockable allocation may fail), it
> still makes sense to the most cases.  Kmemleak is also a debugging tool,
> so it sounds not worth changing the behavior.
> 
> It also meaks sense to keep the warning, so just document the special
> case in the comment.
> 
> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4531,8 +4531,14 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
>  	 */
>  	if (gfp_mask & __GFP_NOFAIL) {
>  		/*
> -		 * All existing users of the __GFP_NOFAIL are blockable, so warn
> -		 * of any new users that actually require GFP_NOWAIT
> +		 * The users of the __GFP_NOFAIL are expected be blockable,
> +		 * and this is true for the most cases except for kmemleak.
> +		 * The kmemleak pass in __GFP_NOFAIL to skip fault injection,
> +		 * however kmemleak may allocate object at some non-blockable
> +		 * context to trigger this warning.
> +		 *
> +		 * Keep this warning since it is still useful for the most
> +		 * normal cases.
>  		 */

Comment has rather a lot of typos.  I'd normally fix them but I think
I'll duck this patch until the kmemleak situation is addressed, so we
can add a kmemleakless long-term comment, if desired.
Yang Shi July 25, 2019, 5:21 p.m. UTC | #14
On 7/24/19 7:48 PM, Andrew Morton wrote:
> On Sat, 13 Jul 2019 04:49:04 +0800 Yang Shi <yang.shi@linux.alibaba.com> wrote:
>
>> When running ltp's oom test with kmemleak enabled, the below warning was
>> triggerred since kernel detects __GFP_NOFAIL & ~__GFP_DIRECT_RECLAIM is
>> passed in:
>>
>> ...
>>
>> The mempool_alloc_slab() clears __GFP_DIRECT_RECLAIM, kmemleak has
>> __GFP_NOFAIL set all the time due to commit
>> d9570ee3bd1d4f20ce63485f5ef05663866fe6c0 ("kmemleak: allow to coexist
>> with fault injection").
>>
>> The fault-injection would not try to fail slab or page allocation if
>> __GFP_NOFAIL is used and that commit tries to turn off fault injection
>> for kmemleak allocation.  Although __GFP_NOFAIL doesn't guarantee no
>> failure for all the cases (i.e. non-blockable allocation may fail), it
>> still makes sense to the most cases.  Kmemleak is also a debugging tool,
>> so it sounds not worth changing the behavior.
>>
>> It also meaks sense to keep the warning, so just document the special
>> case in the comment.
>>
>> ...
>>
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -4531,8 +4531,14 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
>>   	 */
>>   	if (gfp_mask & __GFP_NOFAIL) {
>>   		/*
>> -		 * All existing users of the __GFP_NOFAIL are blockable, so warn
>> -		 * of any new users that actually require GFP_NOWAIT
>> +		 * The users of the __GFP_NOFAIL are expected be blockable,
>> +		 * and this is true for the most cases except for kmemleak.
>> +		 * The kmemleak pass in __GFP_NOFAIL to skip fault injection,
>> +		 * however kmemleak may allocate object at some non-blockable
>> +		 * context to trigger this warning.
>> +		 *
>> +		 * Keep this warning since it is still useful for the most
>> +		 * normal cases.
>>   		 */
> Comment has rather a lot of typos.  I'd normally fix them but I think
> I'll duck this patch until the kmemleak situation is addressed, so we
> can add a kmemleakless long-term comment, if desired.

Actually, this has been replaced by reverting the problematic commit. 
And, the patch has been in -mm tree. Please see: 
revert-kmemleak-allow-to-coexist-with-fault-injection.patch

I think we would like to have this merged in 5.3-rc1 or rc2?

Patch
diff mbox series

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d66bc8a..cac6efb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4531,8 +4531,14 @@  bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
 	 */
 	if (gfp_mask & __GFP_NOFAIL) {
 		/*
-		 * All existing users of the __GFP_NOFAIL are blockable, so warn
-		 * of any new users that actually require GFP_NOWAIT
+		 * The users of the __GFP_NOFAIL are expected be blockable,
+		 * and this is true for the most cases except for kmemleak.
+		 * The kmemleak pass in __GFP_NOFAIL to skip fault injection,
+		 * however kmemleak may allocate object at some non-blockable
+		 * context to trigger this warning.
+		 *
+		 * Keep this warning since it is still useful for the most
+		 * normal cases.
 		 */
 		if (WARN_ON_ONCE(!can_direct_reclaim))
 			goto fail;