diff mbox series

slab: Error out on duplicate cache names when DEBUG_VM=y

Message ID 20240804212839.685795-1-pedro.falcato@gmail.com (mailing list archive)
State New
Headers show
Series slab: Error out on duplicate cache names when DEBUG_VM=y | expand

Commit Message

Pedro Falcato Aug. 4, 2024, 9:28 p.m. UTC
Duplicate slab cache names can create havoc for userspace tooling that
expects slab cache names to be unique. This is a reasonable expectation.

Link: https://lore.kernel.org/linux-fsdevel/2d1d053da1cafb3e7940c4f25952da4f0af34e38.1722293276.git.osandov@fb.com/
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
---
 mm/slab_common.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

Comments

Pedro Falcato Aug. 5, 2024, 10:24 a.m. UTC | #1
On Sun, Aug 4, 2024 at 10:28 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
>
> Duplicate slab cache names can create havoc for userspace tooling that
> expects slab cache names to be unique. This is a reasonable expectation.

For completeness, I just had a look at duplicate cache names around
the kernel using
git grep -Eoh "kmem_cache_create.*\"," | grep -Eo \".*\" | uniq -d
(which seems to be correct)

which results in the following patch (on top of torvalds/linux.git
master, so file_lock_cache hasn't been fixed yet)

This patch being so small is what leads me to believe that erroring
out here is safe. Of course, no one knows what the out-of-tree modules
do.

diff --git a/drivers/scsi/snic/snic_main.c b/drivers/scsi/snic/snic_main.c
index cc824dcfe7d..abc78320c66 100644
--- a/drivers/scsi/snic/snic_main.c
+++ b/drivers/scsi/snic/snic_main.c
@@ -873,7 +873,7 @@ snic_global_data_init(void)
       snic_glob->req_cache[SNIC_REQ_CACHE_MAX_SGL] = cachep;

       len = sizeof(struct snic_host_req);
-       cachep = kmem_cache_create("snic_req_maxsgl", len, SNIC_SG_DESC_ALIGN,
+       cachep = kmem_cache_create("snic_req_tm", len, SNIC_SG_DESC_ALIGN,
                                  SLAB_HWCACHE_ALIGN, NULL);
       if (!cachep) {
               SNIC_ERR("Failed to create snic tm req slab\n");
diff --git a/fs/locks.c b/fs/locks.c
index 9afb16e0683..e45cad40f8b 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2984,7 +2984,7 @@ static int __init filelock_init(void)
       filelock_cache = kmem_cache_create("file_lock_cache",
                       sizeof(struct file_lock), 0, SLAB_PANIC, NULL);

-       filelease_cache = kmem_cache_create("file_lock_cache",
+       filelease_cache = kmem_cache_create("file_lease_cache",
                       sizeof(struct file_lease), 0, SLAB_PANIC, NULL);

       for_each_possible_cpu(i) {
Vlastimil Babka Aug. 5, 2024, 10:38 a.m. UTC | #2
On 8/5/24 12:24, Pedro Falcato wrote:
> On Sun, Aug 4, 2024 at 10:28 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
>>
>> Duplicate slab cache names can create havoc for userspace tooling that
>> expects slab cache names to be unique. This is a reasonable expectation.
> 
> For completeness, I just had a look at duplicate cache names around
> the kernel using
> git grep -Eoh "kmem_cache_create.*\"," | grep -Eo \".*\" | uniq -d
> (which seems to be correct)
> 
> which results in the following patch (on top of torvalds/linux.git
> master, so file_lock_cache hasn't been fixed yet)
> 
> This patch being so small is what leads me to believe that erroring
> out here is safe. Of course, no one knows what the out-of-tree modules
> do.

What about module unload/reload with a SLAB_TYPESAFE_BY_RCU cache that will
delay its freeing. Soon also if there are kfree_rcu()'s in flight. And the
zombie cache can stay also permamently around if it fails to be destroy
because some objects were not freed.

In all these cases the cache's refcount should be 0 at that point, so
minimally the check should ignore those. But I would also rather make it be
a warning anyway, at least for a few releases.

> diff --git a/drivers/scsi/snic/snic_main.c b/drivers/scsi/snic/snic_main.c
> index cc824dcfe7d..abc78320c66 100644
> --- a/drivers/scsi/snic/snic_main.c
> +++ b/drivers/scsi/snic/snic_main.c
> @@ -873,7 +873,7 @@ snic_global_data_init(void)
>        snic_glob->req_cache[SNIC_REQ_CACHE_MAX_SGL] = cachep;
> 
>        len = sizeof(struct snic_host_req);
> -       cachep = kmem_cache_create("snic_req_maxsgl", len, SNIC_SG_DESC_ALIGN,
> +       cachep = kmem_cache_create("snic_req_tm", len, SNIC_SG_DESC_ALIGN,
>                                   SLAB_HWCACHE_ALIGN, NULL);
>        if (!cachep) {
>                SNIC_ERR("Failed to create snic tm req slab\n");
> diff --git a/fs/locks.c b/fs/locks.c
> index 9afb16e0683..e45cad40f8b 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -2984,7 +2984,7 @@ static int __init filelock_init(void)
>        filelock_cache = kmem_cache_create("file_lock_cache",
>                        sizeof(struct file_lock), 0, SLAB_PANIC, NULL);
> 
> -       filelease_cache = kmem_cache_create("file_lock_cache",
> +       filelease_cache = kmem_cache_create("file_lease_cache",
>                        sizeof(struct file_lease), 0, SLAB_PANIC, NULL);
> 
>        for_each_possible_cpu(i) {
Mateusz Guzik Aug. 5, 2024, 11:24 a.m. UTC | #3
On Mon, Aug 05, 2024 at 12:38:29PM +0200, Vlastimil Babka wrote:
> What about module unload/reload with a SLAB_TYPESAFE_BY_RCU cache that will
> delay its freeing. Soon also if there are kfree_rcu()'s in flight. And the
> zombie cache can stay also permamently around if it fails to be destroy
> because some objects were not freed.
> 

It should be an invariant that the cache is fully whacked by the time
kmem_cache_destroy returns, at worst with the exception of when leaked
items are encountered (but even then it should be renamed to something
indicating it is defunct).

Suppose a cache could not have been destroyed and was left as is, then
the offending module was loaded again -- now you got two with the same
name, which is not that great either.

I find myself quite surprised that kmem_cache_destroy can return even if
cache destruction is still pending.

This was added in 657dc2f97220 ("slab: remove synchronous rcu_barrier()
call in memcg cache release path"), citing batching benefits for
frequent kmem cache creation/destruction.

I believe the very notion of doing that *frequently* is b0rked and any
code doing it should be patched to stop regardless.

Even so, if there are any benefits to the committed patch, it perhaps
can be augmented so that the kmem_cache_destroy caller can wait for the
entire thing to finish (e.g., with a completion).
diff mbox series

Patch

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 40b582a014b..c4f31c887f9 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -88,6 +88,19 @@  unsigned int kmem_cache_size(struct kmem_cache *s)
 EXPORT_SYMBOL(kmem_cache_size);
 
 #ifdef CONFIG_DEBUG_VM
+
+static bool kmem_cache_is_duplicate_name(const char *name)
+{
+	struct kmem_cache *s;
+
+	list_for_each_entry(s, &slab_caches, list) {
+		if (!strcmp(s->name, name))
+			return true;
+	}
+
+	return false;
+}
+
 static int kmem_cache_sanity_check(const char *name, unsigned int size)
 {
 	if (!name || in_interrupt() || size > KMALLOC_MAX_SIZE) {
@@ -95,6 +108,12 @@  static int kmem_cache_sanity_check(const char *name, unsigned int size)
 		return -EINVAL;
 	}
 
+	if (kmem_cache_is_duplicate_name(name)) {
+		/* Duplicate names will confuse slabtop, et al */
+		pr_err("%s: name %s already exists as a cache\n", __func__, name);
+		return -EINVAL;
+	}
+
 	WARN_ON(strchr(name, ' '));	/* It confuses parsers */
 	return 0;
 }