
[5/6] lightnvm: pblk: free up mempool allocation for erases correctly

Message ID 20171001132555.GA5763@hercules.tuxera.com (mailing list archive)
State New, archived

Commit Message

Rakesh Pandit Oct. 1, 2017, 1:25 p.m. UTC
While separating the read and erase mempools in 22da65a1b,
pblk_g_rq_cache was used twice to set aside memory for both erase and
read requests.  Because the same kmem cache backs both mempools, a
single call to kmem_cache_destroy wouldn't deallocate everything.
Repeatedly loading and unloading the pblk module would eventually
leak memory.

The fix is to use a truly separate kmem cache for erase requests and
track it appropriately.

Fixes: 22da65a1b ("lightnvm: pblk: decouple read/erase mempools")
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
---
 drivers/lightnvm/pblk-init.c | 16 ++++++++++++++--
 drivers/lightnvm/pblk.h      |  1 +
 2 files changed, 15 insertions(+), 2 deletions(-)
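
For reference, a minimal sketch of the allocation pattern under
discussion, using the stock slab/mempool API.  The names and sizes
below are illustrative only, not the actual pblk code; the teardown
side of this sketch comes up in the discussion below.

#include <linux/slab.h>
#include <linux/mempool.h>

static struct kmem_cache *shared_cache;
static mempool_t *r_pool, *e_pool;

/* Setup as in 22da65a1b: both request pools built on one cache. */
static int example_init(void)
{
        shared_cache = kmem_cache_create("example_rq", 128, 0, 0, NULL);
        if (!shared_cache)
                return -ENOMEM;

        r_pool = mempool_create_slab_pool(16, shared_cache); /* reads */
        e_pool = mempool_create_slab_pool(16, shared_cache); /* erases */
        if (!r_pool || !e_pool) {
                mempool_destroy(r_pool);        /* NULL-safe */
                mempool_destroy(e_pool);
                kmem_cache_destroy(shared_cache);
                return -ENOMEM;
        }
        return 0;
}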

Comments

Javier González Oct. 2, 2017, 12:09 p.m. UTC | #1
> On 1 Oct 2017, at 15.25, Rakesh Pandit <rakesh@tuxera.com> wrote:
> 
> While separating the read and erase mempools in 22da65a1b,
> pblk_g_rq_cache was used twice to set aside memory for both erase
> and read requests.  Because the same kmem cache backs both mempools,
> a single call to kmem_cache_destroy wouldn't deallocate everything.
> Repeatedly loading and unloading the pblk module would eventually
> leak memory.
> 
> The fix is to use a truly separate kmem cache for erase requests and
> track it appropriately.
> 
> Fixes: 22da65a1b ("lightnvm: pblk: decouple read/erase mempools")
> Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
> 

I'm not sure I follow this logic. I assume you're thinking of the
refcount on the kmem_cache. During cache creation, all is good; if a
different cache creation fails, destruction is guaranteed, since the
refcount is 0. On teardown (pblk_core_free), we destroy the mempools
associated with the caches. In this case, the refcount goes to 0 too,
as we destroy the 2 mempools. So I don't see where the leak can
happen. Am I missing something?
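
Schematically, with the illustrative names from the sketch after the
commit message (this shows the standard slab/mempool teardown
pairing, not the actual pblk code):

static void example_teardown(void)
{
        /* Each destroy returns the pool's elements to its cache. */
        mempool_destroy(e_pool);
        mempool_destroy(r_pool);

        /*
         * Nothing is outstanding at this point, so one destroy call
         * on the shared backing cache frees all of its slabs.
         */
        kmem_cache_destroy(shared_cache);
}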

In any case, Jens reported some bugs on the mempools, where we did
not guarantee forward progress. Here you can find the original
discussion and the mempool audit [1]. It would be good if you
reviewed these.

[1] https://www.spinics.net/lists/kernel/msg2602274.html
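
The contract at stake there, schematically (illustrative only):

/*
 * mempool_alloc() with GFP_KERNEL sleeps for a reserved element
 * instead of failing, so forward progress requires that every
 * borrowed element is eventually returned by a completion path
 * that itself needs no new allocation.
 */
static void example_io(mempool_t *pool)
{
        struct nvm_rq *rqd = mempool_alloc(pool, GFP_KERNEL);

        /* ... submit rqd and wait for its completion ... */

        mempool_free(rqd, pool);
}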

Thanks,
Javier
Rakesh Pandit Oct. 2, 2017, 12:25 p.m. UTC | #2
On Mon, Oct 02, 2017 at 02:09:35PM +0200, Javier González wrote:
> > On 1 Oct 2017, at 15.25, Rakesh Pandit <rakesh@tuxera.com> wrote:
> > 
> > While separating the read and erase mempools in 22da65a1b,
> > pblk_g_rq_cache was used twice to set aside memory for both erase
> > and read requests.  Because the same kmem cache backs both
> > mempools, a single call to kmem_cache_destroy wouldn't deallocate
> > everything.  Repeatedly loading and unloading the pblk module
> > would eventually leak memory.
> > 
> > The fix is to use a truly separate kmem cache for erase requests
> > and track it appropriately.
> > 
> > Fixes: 22da65a1b ("lightnvm: pblk: decouple read/erase mempools")
> > Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
> > 
> 
> I'm not sure I follow this logic. I assume you're thinking of the
> refcount on the kmem_cache. During cache creation, all is good; if a
> different cache creation fails, destruction is guaranteed, since the
> refcount is 0. On teardown (pblk_core_free), we destroy the mempools
> associated with the caches. In this case, the refcount goes to 0 too,
> as we destroy the 2 mempools. So I don't see where the leak can
> happen. Am I missing something?
> 
> In any case, Jens reported some bugs on the mempools, where we did
> not guarantee forward progress. Here you can find the original
> discussion and the mempool audit [1]. It would be good if you
> reviewed these.
> 
> [1] https://www.spinics.net/lists/kernel/msg2602274.html
> 

Thanks, yes, it makes sense to follow up in the patch thread.  I will
respond to the above questions there later today.
Rakesh Pandit Oct. 2, 2017, 5:18 p.m. UTC | #3
On Mon, Oct 02, 2017 at 03:25:10PM +0300, Rakesh Pandit wrote:
> On Mon, Oct 02, 2017 at 02:09:35PM +0200, Javier González wrote:
> > > On 1 Oct 2017, at 15.25, Rakesh Pandit <rakesh@tuxera.com> wrote:
> > > 
> > > While separating the read and erase mempools in 22da65a1b,
> > > pblk_g_rq_cache was used twice to set aside memory for both
> > > erase and read requests.  Because the same kmem cache backs both
> > > mempools, a single call to kmem_cache_destroy wouldn't
> > > deallocate everything.  Repeatedly loading and unloading the
> > > pblk module would eventually leak memory.
> > > 
> > > The fix is to use a truly separate kmem cache for erase requests
> > > and track it appropriately.
> > > 
> > > Fixes: 22da65a1b ("lightnvm: pblk: decouple read/erase mempools")
> > > Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
> > > 
> > 
> > I'm not sure I follow this logic. I assume you're thinking of the
> > refcount on the kmem_cache. During cache creation, all is good; if
> > a different cache creation fails, destruction is guaranteed, since
> > the refcount is 0. On teardown (pblk_core_free), we destroy the
> > mempools associated with the caches. In this case, the refcount
> > goes to 0 too, as we destroy the 2 mempools. So I don't see where
> > the leak can happen. Am I missing something?
> > 
> > In any case, Jens reported some bugs on the mempools, where we did
> > not guarantee forward progress. Here you can find the original
> > discussion and the mempool audit [1]. It would be good if you
> > reviewed these.
> > 
> > [1] https://www.spinics.net/lists/kernel/msg2602274.html
> > 
> 
> Thanks, yes, it makes sense to follow up in the patch thread.  I
> will respond to the above questions there later today.
>

I wasn't reasoning about it correctly, and I was also looking at test
results from an incorrectly instrumented debug build.

I went through the series you pointed to, and it all looks okay to me
now.

Please drop this patch.

Regards,
Javier González Oct. 3, 2017, 6:42 a.m. UTC | #4
> On 2 Oct 2017, at 19.18, Rakesh Pandit <rakesh@tuxera.com> wrote:
> 
> On Mon, Oct 02, 2017 at 03:25:10PM +0300, Rakesh Pandit wrote:
>> On Mon, Oct 02, 2017 at 02:09:35PM +0200, Javier González wrote:
>>>> On 1 Oct 2017, at 15.25, Rakesh Pandit <rakesh@tuxera.com> wrote:
>>>> 
>>>> While separating the read and erase mempools in 22da65a1b,
>>>> pblk_g_rq_cache was used twice to set aside memory for both erase
>>>> and read requests.  Because the same kmem cache backs both
>>>> mempools, a single call to kmem_cache_destroy wouldn't deallocate
>>>> everything.  Repeatedly loading and unloading the pblk module
>>>> would eventually leak memory.
>>>> 
>>>> The fix is to use a truly separate kmem cache for erase requests
>>>> and track it appropriately.
>>>> 
>>>> Fixes: 22da65a1b ("lightnvm: pblk: decouple read/erase mempools")
>>>> Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
>>> 
>>> I'm not sure I follow this logic. I assume you're thinking of the
>>> refcount on the kmem_cache. During cache creation, all is good; if
>>> a different cache creation fails, destruction is guaranteed, since
>>> the refcount is 0. On teardown (pblk_core_free), we destroy the
>>> mempools associated with the caches. In this case, the refcount
>>> goes to 0 too, as we destroy the 2 mempools. So I don't see where
>>> the leak can happen. Am I missing something?
>>> 
>>> In any case, Jens reported some bugs on the mempools, where we did
>>> not guarantee forward progress. Here you can find the original
>>> discussion and the mempool audit [1]. It would be good if you
>>> reviewed these.
>>> 
>>> [1] https://www.spinics.net/lists/kernel/msg2602274.html
>> 
>> Thanks, yes, it makes sense to follow up in the patch thread.  I
>> will respond to the above questions there later today.
> 
> I wasn't reasoning about it correctly, and I was also looking at
> test results from an incorrectly instrumented debug build.
> 
> I went through the series you pointed to, and it all looks okay to
> me now.
> 
> Please drop this patch.
> 

Cool.

Javier

Patch

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 9d9adcf..519e5cf 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -21,7 +21,7 @@ 
 #include "pblk.h"
 
 static struct kmem_cache *pblk_ws_cache, *pblk_rec_cache, *pblk_g_rq_cache,
-				*pblk_w_rq_cache;
+				*pblk_w_rq_cache, *pblk_e_rq_cache;
 static DECLARE_RWSEM(pblk_lock);
 struct bio_set *pblk_bio_set;
 
@@ -206,12 +206,23 @@  static int pblk_init_global_caches(struct pblk *pblk)
 		return -ENOMEM;
 	}
 
+	pblk_e_rq_cache = kmem_cache_create("pblk_e_rq", pblk_e_rq_size,
+				0, 0, NULL);
+	if (!pblk_e_rq_cache) {
+		kmem_cache_destroy(pblk_ws_cache);
+		kmem_cache_destroy(pblk_rec_cache);
+		kmem_cache_destroy(pblk_g_rq_cache);
+		up_write(&pblk_lock);
+		return -ENOMEM;
+	}
+
 	pblk_w_rq_cache = kmem_cache_create("pblk_w_rq", pblk_w_rq_size,
 				0, 0, NULL);
 	if (!pblk_w_rq_cache) {
 		kmem_cache_destroy(pblk_ws_cache);
 		kmem_cache_destroy(pblk_rec_cache);
 		kmem_cache_destroy(pblk_g_rq_cache);
+		kmem_cache_destroy(pblk_e_rq_cache);
 		up_write(&pblk_lock);
 		return -ENOMEM;
 	}
@@ -252,7 +263,7 @@  static int pblk_core_init(struct pblk *pblk)
 		goto free_rec_pool;
 
 	pblk->e_rq_pool = mempool_create_slab_pool(geo->nr_luns,
-							pblk_g_rq_cache);
+							pblk_e_rq_cache);
 	if (!pblk->e_rq_pool)
 		goto free_r_rq_pool;
 
@@ -327,6 +338,7 @@  static void pblk_core_free(struct pblk *pblk)
 	kmem_cache_destroy(pblk_ws_cache);
 	kmem_cache_destroy(pblk_rec_cache);
 	kmem_cache_destroy(pblk_g_rq_cache);
+	kmem_cache_destroy(pblk_e_rq_cache);
 	kmem_cache_destroy(pblk_w_rq_cache);
 }
 
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index fcac246..03834d1 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -651,6 +651,7 @@  struct pblk_line_ws {
 
 #define pblk_g_rq_size (sizeof(struct nvm_rq) + sizeof(struct pblk_g_ctx))
 #define pblk_w_rq_size (sizeof(struct nvm_rq) + sizeof(struct pblk_c_ctx))
+#define pblk_e_rq_size pblk_g_rq_size
 
 /*
  * pblk ring buffer operations