
kmemleak: don't use __GFP_NOFAIL

Message ID 2074740225.5769475.1527763882580.JavaMail.zimbra@redhat.com (mailing list archive)
State New, archived

Commit Message

Chunyu Hu May 31, 2018, 10:51 a.m. UTC
----- Original Message -----
> From: "Michal Hocko" <mhocko@suse.com>
> To: "Chunyu Hu" <chuhu@redhat.com>
> Cc: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org, dvyukov@google.com, linux-mm@kvack.org,
> "catalin marinas" <catalin.marinas@arm.com>, "Akinobu Mita" <akinobu.mita@gmail.com>
> Sent: Wednesday, May 30, 2018 8:38:26 PM
> Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
> 
> On Wed 30-05-18 07:42:59, Chunyu Hu wrote:
> > 
> > ----- Original Message -----
> > > From: "Michal Hocko" <mhocko@suse.com>
> > > To: "Chunyu Hu" <chuhu@redhat.com>
> > > Cc: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>,
> > > malat@debian.org, dvyukov@google.com, linux-mm@kvack.org,
> > > "catalin marinas" <catalin.marinas@arm.com>
> > > Sent: Wednesday, May 30, 2018 6:46:37 PM
> > > Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
> > > 
> > > On Wed 30-05-18 05:35:37, Chunyu Hu wrote:
> > > [...]
> > > > I'm trying to reuse the make_it_fail field in task for fault injection.
> > > > As adding an extra memory alloc flag is not thought so good, I think
> > > > adding task flag is either?
> > > 
> > > Yeah, task flag will be reduced to KMEMLEAK enabled configurations
> > > without an additional maint. overhead. Anyway, you should really think
> > > about how to guarantee trackability for atomic allocation requests. You
> > > cannot simply assume that GFP_NOWAIT will succeed. I guess you really
> > 
> > Sure. While I'm using task->make_it_fail, I'm still in the direction of
> > making kmemleak avoid fault inject with task flag instead of page alloc
> > flag.
> > 
> > > want to have a pre-populated pool of objects for those requests. The
> > > obvious question is how to balance such a pool. It ain't easy to track
> > > memory by allocating more memory...
> > 
> > This solution is going to make kmemleak trace really nofail. We can think
> > later.
> > 
> > while I'm thinking about if fault inject can be disabled via flag in task.
> > 
> > Actually, I'm doing something like below, the disable_fault_inject() is
> > just setting a flag in task->make_it_fail. But this will depend on if
> > fault injection accept a change like this. CCing Akinobu
> 
> You still seem to be missing my point I am afraid (or I am ;). So say
> that you want to track a GFP_NOWAIT allocation request. So create_object
> will get called with that gfp mask and no matter what you try here your
> tracking object will be allocated in a weak allocation context as well
> and disable kmemleak. So it only takes a more heavy memory pressure and
> the tracing is gone...

Michal,

Thank you for the good suggestion. You mean that GFP_NOWAIT can still make
create_object() fail, and as a result kmemleak disables itself. So it is not
very useful, just like the current __GFP_NOFAIL usage in create_object().
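To make the failure mode concrete, here is a minimal userspace sketch, assuming a hypothetical fault-injection counter (fail_on_call) and malloc() in place of kmem_cache_alloc(). The tracking metadata is allocated in the same non-blocking context as the caller's own request, so a single failed metadata allocation switches tracking off for good. None of these names are kmemleak's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical fault-injection knob: fail the Nth weak allocation
 * (0 means never fail). */
static unsigned long weak_calls;
static unsigned long fail_on_call;

static bool tracker_enabled = true; /* stands in for kmemleak being on */
static unsigned int tracked;        /* blocks that got metadata */

/* A weak (GFP_NOWAIT-like) allocation: it never sleeps or reclaims, so it
 * simply fails when a failure is injected. */
static void *alloc_weak(size_t size)
{
	if (++weak_calls == fail_on_call)
		return NULL;
	return malloc(size);
}

struct meta {
	void *ptr;
	size_t size;
};

/* The caller's allocation plus its tracking metadata, both in the same weak
 * context, roughly what create_object() sees for a GFP_NOWAIT request. */
static void *user_alloc_nowait(size_t size)
{
	void *p = alloc_weak(size);
	struct meta *m;

	if (!p || !tracker_enabled)
		return p;
	m = alloc_weak(sizeof(*m));
	if (!m) {
		tracker_enabled = false; /* like kmemleak_disable(): permanent */
		return p;
	}
	m->ptr = p;
	m->size = size;
	tracked++;
	free(m); /* a real tracker would keep m in its index */
	return p;
}
```

Once tracker_enabled is cleared nothing sets it again, which is why falling back to another weak allocation cannot guarantee that tracking survives memory pressure.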

We discussed this in the first thread, and at that time you suggested disabling
fault injection while kmemleak is working, using a per-task flag, so I was stuck
on that approach. Now you have given a better suggestion: pre-allocate an urgent
pool of kmemleak objects. After thinking about it for a while, I got your point;
it is a good way to make kmemleak tolerate light allocation failures. Catalin
also mentioned an option with a similar idea: using the early_log array as the
urgent pool.
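For comparison, a statically sized reserve in the spirit of the early_log suggestion might look like the sketch below. Taking an object needs no allocator call at all, which is what would make it usable from any context; the trade-off is a capacity fixed at build time. RESERVE_SIZE, struct object and the reserve_*() helpers are hypothetical names for illustration, not kmemleak code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RESERVE_SIZE 4 /* hypothetical capacity */

struct object {
	unsigned long ptr;
	size_t size;
};

/* Storage is static, so using the reserve never calls an allocator. */
static struct object reserve[RESERVE_SIZE];
static struct object *free_stack[RESERVE_SIZE];
static size_t free_top;

static void reserve_init(void)
{
	for (size_t i = 0; i < RESERVE_SIZE; i++)
		free_stack[i] = &reserve[i];
	free_top = RESERVE_SIZE;
}

/* Returns NULL once the reserve is exhausted; the caller must still cope. */
static struct object *reserve_get(void)
{
	return free_top ? free_stack[--free_top] : NULL;
}

/* True if o came from the reserve rather than from the normal allocator. */
static bool reserve_owns(const struct object *o)
{
	uintptr_t p = (uintptr_t)o;

	return p >= (uintptr_t)&reserve[0] &&
	       p < (uintptr_t)&reserve[RESERVE_SIZE];
}

/* Reserve objects go back on the free stack instead of being freed. */
static void reserve_put(struct object *o)
{
	assert(reserve_owns(o) && free_top < RESERVE_SIZE);
	free_stack[free_top++] = o;
}
```

Because freed objects must be returned to the reserve rather than to the slab cache, a real implementation would need a check like reserve_owns() on the free path.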

Based on your suggestions, I tried to draft this; how does it look to you? It
adds another, stronger allocation mask and an extra thread that fills the pool,
adding one object every 100 ms up to a maximum of 1M objects. If the first
kmem_cache_alloc() fails, an object is taken from the pool instead.


 
    object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
    if (!object) {
+       object = kmemleak_get_pool_object();
+       pr_info("total=%u", total);
+   }
+   if (!object) {
        pr_warn("Cannot allocate a kmemleak_object structure\n");
        kmemleak_disable();
        return NULL;
@@ -1872,8 +1957,10 @@ static ssize_t kmemleak_write(struct file *file, const char __user *user_buf,
        kmemleak_stack_scan = 0;
    else if (strncmp(buf, "scan=on", 7) == 0)
        start_scan_thread();
-   else if (strncmp(buf, "scan=off", 8) == 0)
+   else if (strncmp(buf, "scan=off", 8) == 0) {
        stop_scan_thread();
+       stop_pool_thread();
+   }
    else if (strncmp(buf, "scan=", 5) == 0) {
        unsigned long secs;
 
@@ -1929,6 +2016,7 @@ static void __kmemleak_do_cleanup(void)
 static void kmemleak_do_cleanup(struct work_struct *work)
 {
    stop_scan_thread();
+   stop_pool_thread();
 
    mutex_lock(&scan_mutex);
    /*
@@ -2114,6 +2202,7 @@ static int __init kmemleak_late_init(void)
        pr_warn("Failed to create the debugfs kmemleak file\n");
    mutex_lock(&scan_mutex);
    start_scan_thread();
+   start_pool_thread();
    mutex_unlock(&scan_mutex);
 
    pr_info("Kernel memory leak detector initialized\n");                           
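stop_pool_thread() is called from both the scan=off and the cleanup paths above, and it empties the pool by deleting and freeing each entry. Doing that safely requires reading the next entry before the current one is freed, which is what the kernel's list_for_each_entry_safe() provides; plain list_for_each_entry() would read the freed entry's next pointer. The userspace sketch below copies a minimal intrusive list (an assumption for illustration, not the kernel's <linux/list.h>) to show the pattern; struct pool_obj and pool_drain() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal userspace imitation of the kernel's intrusive doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

struct pool_obj {
	unsigned long ptr;
	struct list_head object_list;
};

/* Drain every pooled object. 'tmp' holds the successor before the current
 * entry is freed, mirroring list_for_each_entry_safe(). */
static unsigned int pool_drain(struct list_head *pool)
{
	struct list_head *pos, *tmp;
	unsigned int freed = 0;

	for (pos = pool->next, tmp = pos->next; pos != pool;
	     pos = tmp, tmp = pos->next) {
		list_del(pos);
		free(container_of(pos, struct pool_obj, object_list));
		freed++;
	}
	return freed;
}
```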



> --
> Michal Hocko
> SUSE Labs
>

Comments

Michal Hocko May 31, 2018, 11:35 a.m. UTC | #1
On Thu 31-05-18 06:51:22, Chunyu Hu wrote:
> [...]
> Basing on your suggestions, I tried to draft this, what does it look to you? 
> another strong alloc mask and an extra thread for fill the pool, which containts
> 1M objects in a frequency of 100 ms. If first kmem_cache_alloc failed, then
> get a object from the pool. 

I am not really familiar with kmemleak code base to judge the
implementation. Could you be more specific about the highlevel design
please? Who is the producer and how does it sync with consumers?
Chunyu Hu May 31, 2018, 12:28 p.m. UTC | #2
----- Original Message -----
> From: "Michal Hocko" <mhocko@suse.com>
> To: "Chunyu Hu" <chuhu@redhat.com>
> Cc: "Tetsuo Handa" <penguin-kernel@i-love.sakura.ne.jp>, malat@debian.org, dvyukov@google.com, linux-mm@kvack.org,
> "catalin marinas" <catalin.marinas@arm.com>, "Akinobu Mita" <akinobu.mita@gmail.com>
> Sent: Thursday, May 31, 2018 7:35:08 PM
> Subject: Re: [PATCH] kmemleak: don't use __GFP_NOFAIL
> 
> On Thu 31-05-18 06:51:22, Chunyu Hu wrote:
> > [...]
> 
> I am not really familiar with kmemleak code base to judge the
> implementation. Could you be more specific about the highlevel design
> please? Who is the producer and how does it sync with consumers?

OK. 

To describe it better: kmemleak_object is the metadata object for kmemleak
tracking. Each time kmem_cache_alloc() (or another allocator) succeeds, a
second kmem_cache_alloc() is called in create_object() to get a
kmemleak_object, and this must succeed. Otherwise kmemleak loses track of a
memory block that could contain pointers to other objects, which would produce
too many false positives. So kmemleak chooses to disable itself when such an
allocation fails.

With fault injection this becomes a problem, because kmemleak easily disables
itself as soon as a failure is injected. Memory allocation can also happen in
IRQ context, so the following kmemleak_alloc() cannot use a very strong
(blockable) allocation. So we can prepare a dynamic kmemleak_object pool. The
design is rather straightforward: maintain a list of kmemleak_object.

The producer is a new kernel thread. Every 100 ms it allocates one
kmemleak_object (which contains its own list member, so it is easy to link)
with a strong allocation mask (it can sleep and reclaim) and adds it to
pool_object_list. The maximum length of the list is 1024*1024 (1M).

  [pool_thread (producer)]
    pool_object_list<-->kmemleak_object<-->kmemleak_object...<-->...
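A single-threaded sketch of the producer step described above, assuming malloc() in place of the blocking kmem_cache_alloc() and a plain loop in place of the 100 ms kthread timer. The small pool_max is only for illustration (the draft uses 1024 * 1024); struct node, pool_refill_tick() and pool_drain() are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
	struct node *next;
};

static struct node *pool_head;
static unsigned int pool_total;
static unsigned int pool_max = 3; /* illustration only */

/* One refill period: add at most one object, never exceeding pool_max.
 * Returns 1 if an object was added, 0 if the pool was full, -1 on failure. */
static int pool_refill_tick(void)
{
	struct node *n;

	if (pool_total >= pool_max)
		return 0;
	n = malloc(sizeof(*n));
	if (!n)
		return -1;
	n->next = pool_head;
	pool_head = n;
	pool_total++;
	return 1;
}

/* Free everything in the pool, e.g. when the producer is stopped. */
static void pool_drain(void)
{
	while (pool_head) {
		struct node *next = pool_head->next;

		free(pool_head);
		pool_head = next;
	}
	pool_total = 0;
}
```

Because only one object is added per period, the refill rate bounds recovery: at one object per 100 ms, filling the 1024 * 1024 cap from empty takes 104,857.6 s, roughly 29 hours. That is one concrete form of the balancing question raised earlier in the thread.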

The consumer is create_object(). It takes an object from the list when the
first, weak allocation fails.

  [task doing memory alloc (consumer)]
    kmem_cache_alloc()
        create_object() 
           kmem_cache_alloc()
             (fail ?)--Yes ---> (get kmemleak_object from the pool_object_list)
                     |_ No ---> got kmemleak_object
                  [insert kmemleak_object to rb tree]
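The consumer path in the diagram can be sketched in userspace as follows, assuming a hypothetical weak_alloc_fails switch that plays the role of an injected failure and a plain singly linked list for the pool. Locking is left out here (in the draft it is kmemleak_object_lock), and none of these names are kmemleak's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct kobj {
	struct kobj *next;  /* link used only while the object is pooled */
	unsigned long ptr;  /* address of the tracked block */
};

static struct kobj *pool_head;
static bool weak_alloc_fails;       /* hypothetical injected failure */
static bool tracking_enabled = true;

static void pool_put(struct kobj *o)
{
	o->next = pool_head;
	pool_head = o;
}

static struct kobj *pool_take(void)
{
	struct kobj *o = pool_head;

	if (o)
		pool_head = o->next;
	return o;
}

/* Weak allocation first, then the pre-filled pool, then give up. */
static struct kobj *create_object_sketch(unsigned long ptr)
{
	struct kobj *o = weak_alloc_fails ? NULL : malloc(sizeof(*o));

	if (!o)
		o = pool_take();
	if (!o) {
		tracking_enabled = false; /* like kmemleak_disable() */
		return NULL;
	}
	o->ptr = ptr;
	return o; /* the real code would insert o into the rbtree here */
}
```

The pool only postpones the disable path: once it is empty, the behaviour is the same as without it.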

The consumer and producer are synchronized with the spinlock
kmemleak_object_lock (which could perhaps be renamed pool_object_lock).

  [spin lock]
  kmemleak_object_lock

I hope I described it clearly...
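A userspace sketch of the producer/consumer synchronization, assuming POSIX threads and a C11 atomic_flag spinlock as stand-ins for the kthread and kmemleak_object_lock. Every list update and every change to the counter happen under the same lock, so the count always matches the list. ITEMS, pool_put(), pool_get() and run_demo() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

#define ITEMS 10000 /* objects handed from producer to consumer */

struct node {
	struct node *next;
};

/* One lock protects both the list and the counter. */
static atomic_flag pool_lock = ATOMIC_FLAG_INIT;
static struct node *pool_head;
static unsigned int pool_total;

static void pool_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&pool_lock,
						  memory_order_acquire))
		; /* busy-wait, as a spinlock does */
}

static void pool_lock_release(void)
{
	atomic_flag_clear_explicit(&pool_lock, memory_order_release);
}

static void pool_put(struct node *n)
{
	pool_lock_acquire();
	n->next = pool_head;
	pool_head = n;
	pool_total++;
	pool_lock_release();
}

static struct node *pool_get(void)
{
	struct node *n;

	pool_lock_acquire();
	n = pool_head;
	if (n) {
		pool_head = n->next;
		pool_total--;
	}
	pool_lock_release();
	return n;
}

static void *producer(void *arg)
{
	(void)arg;
	for (int i = 0; i < ITEMS; i++) {
		struct node *n = malloc(sizeof(*n));

		if (!n)
			abort();
		pool_put(n);
	}
	return NULL;
}

static void *consumer(void *arg)
{
	unsigned int *taken = arg;

	while (*taken < ITEMS) {
		struct node *n = pool_get();

		if (!n) {
			sched_yield(); /* pool empty: let the producer run */
			continue;
		}
		free(n);
		(*taken)++;
	}
	return NULL;
}

/* Runs one producer and one consumer; returns how many objects were taken. */
static unsigned int run_demo(void)
{
	pthread_t p, c;
	unsigned int taken = 0;

	if (pthread_create(&p, NULL, producer, NULL) != 0 ||
	    pthread_create(&c, NULL, consumer, &taken) != 0)
		abort();
	pthread_join(p, NULL);
	pthread_join(c, NULL);
	return taken;
}
```

In the kernel the consumer can run in IRQ context, which is why the draft takes the lock with spin_lock_irqsave(): if an interrupt on the same CPU tried to take the lock while the producer held it with interrupts enabled, it would spin forever. Userspace has no equivalent concern, so a plain spinlock is enough for the sketch.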

Patch

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 9a085d5..7163489 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -128,6 +128,10 @@ 
                 __GFP_NORETRY | __GFP_NOMEMALLOC | \
                 __GFP_NOWARN | __GFP_NOFAIL)
 
+#define gfp_kmemleak_mask_strong() (__GFP_NOMEMALLOC | \
+                __GFP_NOWARN | __GFP_RECLAIM | __GFP_NOFAIL)
+
+
 /* scanning area inside a memory block */
 struct kmemleak_scan_area {
    struct hlist_node node;
@@ -299,6 +303,83 @@  struct early_log {
    kmemleak_disable();     \
 } while (0)
 
+static DEFINE_SPINLOCK(kmemleak_object_lock);
+static LIST_HEAD(pool_object_list);
+static unsigned int volatile total;
+static unsigned int pool_object_max = 1024 * 1024;
+static struct task_struct *pool_thread;
+
+static struct kmemleak_object* kmemleak_pool_fill(void)
+{
+   struct kmemleak_object *object = NULL;
+   unsigned long flags;
+
+   object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask_strong());
+   spin_lock_irqsave(&kmemleak_object_lock, flags);
+   if (object) {
+       list_add(&object->object_list, &pool_object_list);
+       total++;
+   }
+   spin_unlock_irqrestore(&kmemleak_object_lock, flags);
+   return object;
+}
+
+static struct kmemleak_object* kmemleak_get_pool_object(void)
+{
+   struct kmemleak_object *object = NULL;
+   unsigned long flags;
+
+   spin_lock_irqsave(&kmemleak_object_lock, flags);
+   if (!list_empty(&pool_object_list)) {
+       object = list_first_entry(&pool_object_list,struct kmemleak_object,
+               object_list);
+       list_del(&object->object_list);
+       total--;
+   }
+   spin_unlock_irqrestore(&kmemleak_object_lock, flags);
+   return object;
+}
+
+static int kmemleak_pool_thread(void *nothing)
+{
+   struct kmemleak_object *object = NULL;
+   while (!kthread_should_stop()) {
+       if (READ_ONCE(total) < pool_object_max) {
+           object = kmemleak_pool_fill();
+           WARN_ON(!object);
+       }
+       schedule_timeout_interruptible(msecs_to_jiffies(100));
+   }
+   return 0;
+}
+
+static void start_pool_thread(void)
+{
+   if (pool_thread)
+       return;
+   pool_thread = kthread_run(kmemleak_pool_thread, NULL, "kmemleak_pool");
+   if (IS_ERR(pool_thread)) {
+       pr_warn("Failed to create the pool thread\n");
+       pool_thread = NULL;
+   }
+}
+static void stop_pool_thread(void)
+{
+   struct kmemleak_object *object, *tmp;
+   unsigned long flags;
+   if (pool_thread) {
+       kthread_stop(pool_thread);
+       pool_thread = NULL;
+   }
+   spin_lock_irqsave(&kmemleak_object_lock, flags);
+   list_for_each_entry_safe(object, tmp, &pool_object_list, object_list) {
+       list_del(&object->object_list);
+       kmem_cache_free(object_cache, object);
+   }
+   spin_unlock_irqrestore(&kmemleak_object_lock, flags);
+}
+
 /*
  * Printing of the objects hex dump to the seq file. The number of lines to be
  * printed is limited to HEX_MAX_LINES to prevent seq file spamming. The
@@ -553,6 +634,10 @@  static struct kmemleak_object *create_object(unsigned long ptr, size_t size,