kmemleak: don't use __GFP_NOFAIL
diff mbox

Message ID f054219d-6daa-68b1-0c60-0acd9ad8c5ab@i-love.sakura.ne.jp
State New
Headers show

Commit Message

Tetsuo Handa May 28, 2018, 1:05 p.m. UTC
From f0b7f6c2146f693fec6706bf9e3c34687c73f21a Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Mon, 28 May 2018 21:49:51 +0900
Subject: [PATCH] kmemleak: don't use __GFP_NOFAIL

Commit d9570ee3bd1d4f20 ("kmemleak: allow to coexist with fault injection")
added __GFP_NOFAIL to gfp_kmemleak_mask() macro because memory allocation
fault injection trivially disables kmemleak.

But since !__GFP_DIRECT_RECLAIM && __GFP_NOFAIL memory allocation is not
supported, Mathieu Malaterre is observing warning messages upon
mempool_alloc(gfp_msk & ~__GFP_DIRECT_RECLAIM) allocation request.

[  269.039118] NIP [c020e8f8] __alloc_pages_nodemask+0xa88/0xfec
[  269.039124] LR [c020e2e0] __alloc_pages_nodemask+0x470/0xfec
[  269.039128] Call Trace:
[  269.039136] [dde3b750] [c020e2e0]  __alloc_pages_nodemask+0x470/0xfec (unreliable)
[  269.039146] [dde3b820] [c0288c14] new_slab+0x53c/0x970
[  269.039155] [dde3b880] [c028b61c] ___slab_alloc.constprop.23+0x28c/0x468
[  269.039163] [dde3b920] [c028c754] kmem_cache_alloc+0x290/0x3dc
[  269.039177] [dde3b990] [c02a6030] create_object+0x50/0x3d0
[  269.039185] [dde3b9e0] [c028c7a8] kmem_cache_alloc+0x2e4/0x3dc
[  269.039193] [dde3ba50] [c0200f88] mempool_alloc+0x7c/0x164
[  269.039205] [dde3bab0] [c03e33c0] bio_alloc_bioset+0x130/0x298
[  269.039216] [dde3baf0] [c0278694] get_swap_bio+0x34/0xe8
[  269.039223] [dde3bb30] [c0278fb4] __swap_writepage+0x22c/0x644
[  269.039237] [dde3bbb0] [c022528c] pageout.isra.13+0x238/0x52c
[  269.039246] [dde3bc10] [c02288a0] shrink_page_list+0x9d4/0x1768
[  269.039254] [dde3bcb0] [c022a264] shrink_inactive_list+0x2c4/0xa34
[  269.039262] [dde3bd40] [c022b454] shrink_node_memcg+0x344/0xe34
[  269.039270] [dde3bde0] [c022c068] shrink_node+0x124/0x73c
[  269.039277] [dde3be50] [c022d78c] kswapd+0x318/0xb2c
[  269.039291] [dde3bf10] [c008e264] kthread+0x138/0x1f0
[  269.039300] [dde3bf40] [c001b2e4] ret_from_kernel_thread+0x5c/0x64

Since the intent of adding __GFP_NOFAIL is not to disable kmemleak by
failing the N'th allocation request, it should be possible to workaround
it by simply retrying N'th allocation request. Thus, this patch changes
callers of gfp_kmemleak_mask() macro to retry for several times.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Mathieu Malaterre <malat@debian.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Michal Hocko <mhocko@suse.com>
---
 mm/kmemleak.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

Comments

Michal Hocko May 28, 2018, 1:24 p.m. UTC | #1
I've found the previous report [1] finally. Adding Chunyu Hu to the CC
list. The report which triggered this one is [2]

[1] http://lkml.kernel.org/r/1524243513-29118-1-git-send-email-chuhu@redhat.com
[2] http://lkml.kernel.org/r/CA+7wUswp_Sr=hHqi1bwRZ3FE2wY5ozZWZ8Z1BgrFnSAmijUKjA@mail.gmail.com

I am not really familiar with the kmemleak code but the expectation that
you can make a forward progress in an unknown allocation context seems
broken to me. Why kmemleak cannot pre-allocate a pool of object_cache
and refill it from a reasonably strong contexts (e.g. in a sleepable
context)?

On Mon 28-05-18 22:05:21, Tetsuo Handa wrote:
> >From f0b7f6c2146f693fec6706bf9e3c34687c73f21a Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Mon, 28 May 2018 21:49:51 +0900
> Subject: [PATCH] kmemleak: don't use __GFP_NOFAIL
> 
> Commit d9570ee3bd1d4f20 ("kmemleak: allow to coexist with fault injection")
> added __GFP_NOFAIL to gfp_kmemleak_mask() macro because memory allocation
> fault injection trivially disables kmemleak.
> 
> But since !__GFP_DIRECT_RECLAIM && __GFP_NOFAIL memory allocation is not
> supported, Mathieu Malaterre is observing warning messages upon
> mempool_alloc(gfp_msk & ~__GFP_DIRECT_RECLAIM) allocation request.
> 
> [  269.039118] NIP [c020e8f8] __alloc_pages_nodemask+0xa88/0xfec
> [  269.039124] LR [c020e2e0] __alloc_pages_nodemask+0x470/0xfec
> [  269.039128] Call Trace:
> [  269.039136] [dde3b750] [c020e2e0]  __alloc_pages_nodemask+0x470/0xfec (unreliable)
> [  269.039146] [dde3b820] [c0288c14] new_slab+0x53c/0x970
> [  269.039155] [dde3b880] [c028b61c] ___slab_alloc.constprop.23+0x28c/0x468
> [  269.039163] [dde3b920] [c028c754] kmem_cache_alloc+0x290/0x3dc
> [  269.039177] [dde3b990] [c02a6030] create_object+0x50/0x3d0
> [  269.039185] [dde3b9e0] [c028c7a8] kmem_cache_alloc+0x2e4/0x3dc
> [  269.039193] [dde3ba50] [c0200f88] mempool_alloc+0x7c/0x164
> [  269.039205] [dde3bab0] [c03e33c0] bio_alloc_bioset+0x130/0x298
> [  269.039216] [dde3baf0] [c0278694] get_swap_bio+0x34/0xe8
> [  269.039223] [dde3bb30] [c0278fb4] __swap_writepage+0x22c/0x644
> [  269.039237] [dde3bbb0] [c022528c] pageout.isra.13+0x238/0x52c
> [  269.039246] [dde3bc10] [c02288a0] shrink_page_list+0x9d4/0x1768
> [  269.039254] [dde3bcb0] [c022a264] shrink_inactive_list+0x2c4/0xa34
> [  269.039262] [dde3bd40] [c022b454] shrink_node_memcg+0x344/0xe34
> [  269.039270] [dde3bde0] [c022c068] shrink_node+0x124/0x73c
> [  269.039277] [dde3be50] [c022d78c] kswapd+0x318/0xb2c
> [  269.039291] [dde3bf10] [c008e264] kthread+0x138/0x1f0
> [  269.039300] [dde3bf40] [c001b2e4] ret_from_kernel_thread+0x5c/0x64
> 
> Since the intent of adding __GFP_NOFAIL is not to disable kmemleak by
> failing the N'th allocation request, it should be possible to workaround
> it by simply retrying N'th allocation request. Thus, this patch changes
> callers of gfp_kmemleak_mask() macro to retry for several times.
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Reported-by: Mathieu Malaterre <malat@debian.org>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Michal Hocko <mhocko@suse.com>
> ---
>  mm/kmemleak.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> index 9a085d5..973998b 100644
> --- a/mm/kmemleak.c
> +++ b/mm/kmemleak.c
> @@ -126,7 +126,7 @@
>  /* GFP bitmask for kmemleak internal allocations */
>  #define gfp_kmemleak_mask(gfp)	(((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \
>  				 __GFP_NORETRY | __GFP_NOMEMALLOC | \
> -				 __GFP_NOWARN | __GFP_NOFAIL)
> +				 __GFP_NOWARN)
>  
>  /* scanning area inside a memory block */
>  struct kmemleak_scan_area {
> @@ -548,10 +548,12 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size,
>  					     int min_count, gfp_t gfp)
>  {
>  	unsigned long flags;
> -	struct kmemleak_object *object, *parent;
> +	struct kmemleak_object *object = NULL, *parent;
>  	struct rb_node **link, *rb_parent;
> +	unsigned int i;
>  
> -	object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
> +	for (i = 0; i < 10 && !object; i++)
> +		object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
>  	if (!object) {
>  		pr_warn("Cannot allocate a kmemleak_object structure\n");
>  		kmemleak_disable();
> @@ -763,7 +765,8 @@ static void add_scan_area(unsigned long ptr, size_t size, gfp_t gfp)
>  {
>  	unsigned long flags;
>  	struct kmemleak_object *object;
> -	struct kmemleak_scan_area *area;
> +	struct kmemleak_scan_area *area = NULL;
> +	unsigned int i;
>  
>  	object = find_and_get_object(ptr, 1);
>  	if (!object) {
> @@ -772,7 +775,9 @@ static void add_scan_area(unsigned long ptr, size_t size, gfp_t gfp)
>  		return;
>  	}
>  
> -	area = kmem_cache_alloc(scan_area_cache, gfp_kmemleak_mask(gfp));
> +	for (i = 0; i < 10 && !area; i++)
> +		area = kmem_cache_alloc(scan_area_cache,
> +					gfp_kmemleak_mask(gfp));
>  	if (!area) {
>  		pr_warn("Cannot allocate a scan area\n");
>  		goto out;
> -- 
> 1.8.3.1
> 
>
Tetsuo Handa May 28, 2018, 9:05 p.m. UTC | #2
Michal Hocko wrote:
> I've found the previous report [1] finally. Adding Chunyu Hu to the CC
> list. The report which triggered this one is [2]
> 
> [1] http://lkml.kernel.org/r/1524243513-29118-1-git-send-email-chuhu@redhat.com
> [2] http://lkml.kernel.org/r/CA+7wUswp_Sr=hHqi1bwRZ3FE2wY5ozZWZ8Z1BgrFnSAmijUKjA@mail.gmail.com
> 
> I am not really familiar with the kmemleak code but the expectation that
> you can make a forward progress in an unknown allocation context seems
> broken to me. Why kmemleak cannot pre-allocate a pool of object_cache
> and refill it from a reasonably strong contexts (e.g. in a sleepable
> context)?

Or, we can undo the original allocation if the kmemleak allocation failed?

kmalloc(size, gfp) {
  ptr = do_kmalloc(size, gfp);
  if (ptr) {
    object = do_kmalloc(size, gfp_kmemleak_mask(gfp));
    if (!object) {
      kfree(ptr);
      return NULL;
    }
    // Store information of ptr into object.
  }
  return ptr;
}
Chunyu Hu May 29, 2018, 1:27 p.m. UTC | #3
----- Original Message -----
> From: "Tetsuo Handa" <penguin-kernel@I-love.SAKURA.ne.jp>
> To: mhocko@suse.com
> Cc: malat@debian.org, dvyukov@google.com, linux-mm@kvack.org, "catalin marinas" <catalin.marinas@arm.com>,
> chuhu@redhat.com
> Sent: Tuesday, May 29, 2018 5:05:45 AM
> Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
> 
> Michal Hocko wrote:
> > I've found the previous report [1] finally. Adding Chunyu Hu to the CC
> > list. The report which triggered this one is [2]
> > 
> > [1]
> > http://lkml.kernel.org/r/1524243513-29118-1-git-send-email-chuhu@redhat.com
> > [2]
> > http://lkml.kernel.org/r/CA+7wUswp_Sr=hHqi1bwRZ3FE2wY5ozZWZ8Z1BgrFnSAmijUKjA@mail.gmail.com
> > 
> > I am not really familiar with the kmemleak code but the expectation that
> > you can make a forward progress in an unknown allocation context seems
> > broken to me. Why kmemleak cannot pre-allocate a pool of object_cache
> > and refill it from a reasonably strong contexts (e.g. in a sleepable
> > context)?
> 
> Or, we can undo the original allocation if the kmemleak allocation failed?

If so, you are making kmemleak a fault injection trigger. But the original
purpose for adding GFP_NOFAIL[2] is just for making kmemleak avoid fault injection.
(discussion in [1])

I'm trying with per task way for fault injection, and did some tries. In my 
try, I removed this from NOFAIL kmemleak and kmemleak survived with the per
task fault injection helper (disable/enable of task). Maybe I can send another
RFC for the api. 

> 
> kmalloc(size, gfp) {
>   ptr = do_kmalloc(size, gfp);
>   if (ptr) {
>     object = do_kmalloc(size, gfp_kmemleak_mask(gfp));
>     if (!object) {
>       kfree(ptr);
>       return NULL;
>     }
>     // Store information of ptr into object.
>   }
>   return ptr;
> }
>
Tetsuo Handa May 29, 2018, 1:46 p.m. UTC | #4
On 2018/05/29 22:27, Chunyu Hu wrote:
>>> I am not really familiar with the kmemleak code but the expectation that
>>> you can make a forward progress in an unknown allocation context seems
>>> broken to me. Why kmemleak cannot pre-allocate a pool of object_cache
>>> and refill it from a reasonably strong contexts (e.g. in a sleepable
>>> context)?
>>
>> Or, we can undo the original allocation if the kmemleak allocation failed?
> 
> If so, you are making kmemleak a fault injection trigger. But the original
> purpose for adding GFP_NOFAIL[2] is just for making kmemleak avoid fault injection.
> (discussion in [1])

I don't think that applying fault injection to kmemleak allocations is bad
(except that fault injection messages might be noisy).

> 
> I'm trying with per task way for fault injection, and did some tries. In my 
> try, I removed this from NOFAIL kmemleak and kmemleak survived with the per
> task fault injection helper (disable/enable of task). Maybe I can send another
> RFC for the api. 

You could carry __GFP_NO_FAULT_INJECTION using per "struct task_struct" flag.

But I think that undoing the original allocation if the kmemleak allocation failed
has an advantage that it does not disable kmemleak when the system is under memory
pressure (i.e. about to invoke the OOM killer); allowing us to test memory pressure
conditions.
Chunyu Hu May 30, 2018, 9:35 a.m. UTC | #5
----- Original Message -----
> From: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>
> To: "Chunyu Hu" <chuhu@redhat.com>
> Cc: mhocko@suse.com, malat@debian.org, dvyukov@google.com, linux-mm@kvack.org, "catalin marinas"
> <catalin.marinas@arm.com>
> Sent: Tuesday, May 29, 2018 9:46:59 PM
> Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
> 
> On 2018/05/29 22:27, Chunyu Hu wrote:
> >>> I am not really familiar with the kmemleak code but the expectation that
> >>> you can make a forward progress in an unknown allocation context seems
> >>> broken to me. Why kmemleak cannot pre-allocate a pool of object_cache
> >>> and refill it from a reasonably strong contexts (e.g. in a sleepable
> >>> context)?
> >>
> >> Or, we can undo the original allocation if the kmemleak allocation failed?
> > 
> > If so, you are making kmemleak a fault injection trigger. But the original
> > purpose for adding GFP_NOFAIL[2] is just for making kmemleak avoid fault
> > injection.
> > (discussion in [1])
> 
> I don't think that applying fault injection to kmemleak allocations is bad
> (except that fault injection messages might be noisy).

Maybe we provide a way for user decide to apply fault inject to kmemleak or not 
is better, by adding another sys file in /sys/kernel/debug/failslab and the
fail_page_alloc.

> 
> > 
> > I'm trying with per task way for fault injection, and did some tries. In my
> > try, I removed this from NOFAIL kmemleak and kmemleak survived with the per
> > task fault injection helper (disable/enable of task). Maybe I can send
> > another
> > RFC for the api.
> 
> You could carry __GFP_NO_FAULT_INJECTION using per "struct task_struct" flag.

Thanks for this suggestion. 

I'm trying to reuse the make_it_fail field in task for fault injection. As adding
an extra memory alloc flag is not thought so good,  I think adding task flag
is either? 

> 
> But I think that undoing the original allocation if the kmemleak allocation
> failed
> has an advantage that it does not disable kmemleak when the system is under
> memory
> pressure (i.e. about to invoke the OOM killer); allowing us to test memory
> pressure
> conditions.

There should be benefit, this is redefining kmemleak's principle, currently it
won't affect other allocation directly, but if we free the other user's mem
alloc without thinking about the context seems also risk. 

> 
>
Michal Hocko May 30, 2018, 10:46 a.m. UTC | #6
On Wed 30-05-18 05:35:37, Chunyu Hu wrote:
[...]
> I'm trying to reuse the make_it_fail field in task for fault injection. As adding
> an extra memory alloc flag is not thought so good,  I think adding task flag
> is either? 

Yeah, task flag will be reduced to KMEMLEAK enabled configurations
without an additional maint. overhead. Anyway, you should really think
about how to guarantee trackability for atomic allocation requests. You
cannot simply assume that GFP_NOWAIT will succeed. I guess you really
want to have a pre-populated pool of objects for those requests. The
obvious question is how to balance such a pool. It ain't easy to track
memory by allocating more memory...

Patch
diff mbox

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 9a085d5..973998b 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -126,7 +126,7 @@ 
 /* GFP bitmask for kmemleak internal allocations */
 #define gfp_kmemleak_mask(gfp)	(((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \
 				 __GFP_NORETRY | __GFP_NOMEMALLOC | \
-				 __GFP_NOWARN | __GFP_NOFAIL)
+				 __GFP_NOWARN)
 
 /* scanning area inside a memory block */
 struct kmemleak_scan_area {
@@ -548,10 +548,12 @@  static struct kmemleak_object *create_object(unsigned long ptr, size_t size,
 					     int min_count, gfp_t gfp)
 {
 	unsigned long flags;
-	struct kmemleak_object *object, *parent;
+	struct kmemleak_object *object = NULL, *parent;
 	struct rb_node **link, *rb_parent;
+	unsigned int i;
 
-	object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
+	for (i = 0; i < 10 && !object; i++)
+		object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
 	if (!object) {
 		pr_warn("Cannot allocate a kmemleak_object structure\n");
 		kmemleak_disable();
@@ -763,7 +765,8 @@  static void add_scan_area(unsigned long ptr, size_t size, gfp_t gfp)
 {
 	unsigned long flags;
 	struct kmemleak_object *object;
-	struct kmemleak_scan_area *area;
+	struct kmemleak_scan_area *area = NULL;
+	unsigned int i;
 
 	object = find_and_get_object(ptr, 1);
 	if (!object) {
@@ -772,7 +775,9 @@  static void add_scan_area(unsigned long ptr, size_t size, gfp_t gfp)
 		return;
 	}
 
-	area = kmem_cache_alloc(scan_area_cache, gfp_kmemleak_mask(gfp));
+	for (i = 0; i < 10 && !area; i++)
+		area = kmem_cache_alloc(scan_area_cache,
+					gfp_kmemleak_mask(gfp));
 	if (!area) {
 		pr_warn("Cannot allocate a scan area\n");
 		goto out;