Message ID | 20230626123252.73dbc139@endymion.delvare (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | module: print module name on refcount error | expand |
On Mon 26-06-23 12:32:52, Jean Delvare wrote: > If module_put() triggers a refcount error, include the culprit > module name in the warning message, to easy further investigation of > the issue. > > Signed-off-by: Jean Delvare <jdelvare@suse.de> > Suggested-by: Michal Hocko <mhocko@suse.com> > Cc: Luis Chamberlain <mcgrof@kernel.org> > --- > kernel/module/main.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > --- linux-6.3.orig/kernel/module/main.c > +++ linux-6.3/kernel/module/main.c > @@ -850,7 +850,9 @@ void module_put(struct module *module) > if (module) { > preempt_disable(); > ret = atomic_dec_if_positive(&module->refcnt); > - WARN_ON(ret < 0); /* Failed to put refcount */ > + WARN(ret < 0, > + KERN_WARNING "Failed to put refcount for module %s\n", > + module->name); Would it make sense to also print the refcnt here? In our internal bug report it has turned out that this was an overflow (put missing) rather than an underflow (too many put calls). Seeing the value could give a clue about that. We had to configure panic_on_warn to capture a dump to learn more which is rather impractical. Other than that the module information on its own is an improvement because one knows where to start looking or to reduce the tracing data collected. In any case Acked-by: Michal Hocko <mhocko@suse.com> Thanks! > trace_module_put(module, _RET_IP_); > preempt_enable(); > } > > > -- > Jean Delvare > SUSE L3 Support
On Mon, Jun 26, 2023 at 12:32:52PM +0200, Jean Delvare wrote: > If module_put() triggers a refcount error, include the culprit > module name in the warning message, to easy further investigation of > the issue. > > Signed-off-by: Jean Delvare <jdelvare@suse.de> > Suggested-by: Michal Hocko <mhocko@suse.com> > Cc: Luis Chamberlain <mcgrof@kernel.org> > --- > kernel/module/main.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > --- linux-6.3.orig/kernel/module/main.c > +++ linux-6.3/kernel/module/main.c > @@ -850,7 +850,9 @@ void module_put(struct module *module) > if (module) { > preempt_disable(); > ret = atomic_dec_if_positive(&module->refcnt); > - WARN_ON(ret < 0); /* Failed to put refcount */ > + WARN(ret < 0, > + KERN_WARNING "Failed to put refcount for module %s\n", > + module->name); > trace_module_put(module, _RET_IP_); > preempt_enable(); > } > The mod struct ends up actually being allocated, we first read the ELF passed by userspace and we end up allocating space for struct module when reading the ELF section ".gnu.linkonce.this_module". We cache the ELF section index in info->index.mod, we finally copy the module into the allocated space with move_module(). In linux-next code this is much more clear now. What prevents us from racing to free the module and thus invalidating the name? For instance the system call to delete_module() could hammer and so have tons of threads racing try_stop_module(), eventually one of them could win and free_module() would kick in gear. What prevents code from racing the free with a random module_put() called by some other piece of code? I realize this may implicate even the existing code seems racy. Luis
Hi Luis, On Fri, 30 Jun 2023 16:05:33 -0700, Luis Chamberlain wrote: > On Mon, Jun 26, 2023 at 12:32:52PM +0200, Jean Delvare wrote: > > If module_put() triggers a refcount error, include the culprit > > module name in the warning message, to easy further investigation of > > the issue. > > > > Signed-off-by: Jean Delvare <jdelvare@suse.de> > > Suggested-by: Michal Hocko <mhocko@suse.com> > > Cc: Luis Chamberlain <mcgrof@kernel.org> > > --- > > kernel/module/main.c | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > --- linux-6.3.orig/kernel/module/main.c > > +++ linux-6.3/kernel/module/main.c > > @@ -850,7 +850,9 @@ void module_put(struct module *module) > > if (module) { > > preempt_disable(); > > ret = atomic_dec_if_positive(&module->refcnt); > > - WARN_ON(ret < 0); /* Failed to put refcount */ > > + WARN(ret < 0, > > + KERN_WARNING "Failed to put refcount for module %s\n", > > + module->name); > > trace_module_put(module, _RET_IP_); > > preempt_enable(); > > } > > > > The mod struct ends up actually being allocated, we first read the ELF > passed by userspace and we end up allocating space for struct module > when reading the ELF section ".gnu.linkonce.this_module". We cache > the ELF section index in info->index.mod, we finally copy the module > into the allocated space with move_module(). > > In linux-next code this is much more clear now. > > What prevents us from racing to free the module and thus invalidating > the name? > > For instance the system call to delete_module() could hammer and > so have tons of threads racing try_stop_module(), eventually one of > them could win and free_module() would kick in gear. > > What prevents code from racing the free with a random module_put() > called by some other piece of code? > > I realize this may implicate even the existing code seems racy. You are the maintainer so I'll trust your expertise, but this is how I understand it: if we hit this WARN, this means reference counting is screwed. If this is an underflow, we still have a reference to the module while refcnt is zero, meaning the module could be removed at any time. This is inherent to the issue we are reporting, and not related to the proposed change. The name is just one field of struct module, refcnt is in the very same situation already. So the whole piece of code is best effort reporting and assumes (both before and after my proposed change) that nobody attempted to unload the module yet.
On Sat, 1 Jul 2023 17:57:27 +0200, Jean Delvare wrote: > On Fri, 30 Jun 2023 16:05:33 -0700, Luis Chamberlain wrote: > > On Mon, Jun 26, 2023 at 12:32:52PM +0200, Jean Delvare wrote: > > > If module_put() triggers a refcount error, include the culprit > > > module name in the warning message, to easy further investigation of > > > the issue. > > > > > > Signed-off-by: Jean Delvare <jdelvare@suse.de> > > > Suggested-by: Michal Hocko <mhocko@suse.com> > > > Cc: Luis Chamberlain <mcgrof@kernel.org> > > > --- > > > kernel/module/main.c | 4 +++- > > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > > > --- linux-6.3.orig/kernel/module/main.c > > > +++ linux-6.3/kernel/module/main.c > > > @@ -850,7 +850,9 @@ void module_put(struct module *module) > > > if (module) { > > > preempt_disable(); > > > ret = atomic_dec_if_positive(&module->refcnt); > > > - WARN_ON(ret < 0); /* Failed to put refcount */ > > > + WARN(ret < 0, > > > + KERN_WARNING "Failed to put refcount for module %s\n", > > > + module->name); > > > trace_module_put(module, _RET_IP_); > > > preempt_enable(); > > > } > > > > The mod struct ends up actually being allocated, we first read the ELF > > passed by userspace and we end up allocating space for struct module > > when reading the ELF section ".gnu.linkonce.this_module". We cache > > the ELF section index in info->index.mod, we finally copy the module > > into the allocated space with move_module(). > > > > In linux-next code this is much more clear now. > > > > What prevents us from racing to free the module and thus invalidating > > the name? > > > > For instance the system call to delete_module() could hammer and > > so have tons of threads racing try_stop_module(), eventually one of > > them could win and free_module() would kick in gear. > > > > What prevents code from racing the free with a random module_put() > > called by some other piece of code? > > > > I realize this may implicate even the existing code seems racy. > > You are the maintainer so I'll trust your expertise, but this is how I > understand it: if we hit this WARN, this means reference counting is > screwed. If this is an underflow, we still have a reference to the > module while refcnt is zero, meaning the module could be removed at any > time. This is inherent to the issue we are reporting, and not related > to the proposed change. The name is just one field of struct module, > refcnt is in the very same situation already. > > So the whole piece of code is best effort reporting and assumes (both > before and after my proposed change) that nobody attempted to unload > the module yet. I thought some more about it and one potential problem with my proposed change is if the module has indeed already been freed and the memory already reused for a different purpose. We are in trouble already (we just called atomic_dec_if_positive on a random memory location) but the WARN message could become very messy if the memory where module.name used to reside no longer contains any string terminator (binary zero). So we probably want to play it safe and add a length limitation when printing the module name. Something like: WARN(ret < 0, KERN_WARNING "Failed to put refcount for module %.*s\n", (int)MODULE_NAME_LEN, module->name);
On Fri 30-06-23 16:05:33, Luis Chamberlain wrote: [...] > What prevents code from racing the free with a random module_put() > called by some other piece of code? Wouldn't be ref count a garbage already? How can you race when freeing if module_put fail?
Hi Michal, On Wed, 28 Jun 2023 12:30:35 +0200, Michal Hocko wrote: > On Mon 26-06-23 12:32:52, Jean Delvare wrote: > > If module_put() triggers a refcount error, include the culprit > > module name in the warning message, to easy further investigation of > > the issue. > > > > Signed-off-by: Jean Delvare <jdelvare@suse.de> > > Suggested-by: Michal Hocko <mhocko@suse.com> > > Cc: Luis Chamberlain <mcgrof@kernel.org> > > --- > > kernel/module/main.c | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > --- linux-6.3.orig/kernel/module/main.c > > +++ linux-6.3/kernel/module/main.c > > @@ -850,7 +850,9 @@ void module_put(struct module *module) > > if (module) { > > preempt_disable(); > > ret = atomic_dec_if_positive(&module->refcnt); > > - WARN_ON(ret < 0); /* Failed to put refcount */ > > + WARN(ret < 0, > > + KERN_WARNING "Failed to put refcount for module %s\n", > > + module->name); > > Would it make sense to also print the refcnt here? In our internal bug > report it has turned out that this was an overflow (put missing) rather > than an underflow (too many put calls). Seeing the value could give a > clue about that. We had to configure panic_on_warn to capture a dump to > learn more which is rather impractical. Well, other calls to module_put() or try_module_get() could happen in parallel, so at the time we print refcnt, its value could be different from the one which triggered the WARN. Additionally, catching an overflow in module_put() is counterintuitive, it only works by accident because the counter gets to negative values. If we really want to reliably report overflows as such then we should add a dedicated WARN to try_module_get(). Doesn't look trivial though. With my proposed implementation, I don't think it's necessary to turn on panic_on_warn to debug further. Once you know which module is culprit, enabling tracing for this specific module should give you all the details you need to figure out what's going on.
On Tue 04-07-23 14:43:12, Jean Delvare wrote: > Hi Michal, > > On Wed, 28 Jun 2023 12:30:35 +0200, Michal Hocko wrote: > > On Mon 26-06-23 12:32:52, Jean Delvare wrote: > > > If module_put() triggers a refcount error, include the culprit > > > module name in the warning message, to easy further investigation of > > > the issue. > > > > > > Signed-off-by: Jean Delvare <jdelvare@suse.de> > > > Suggested-by: Michal Hocko <mhocko@suse.com> > > > Cc: Luis Chamberlain <mcgrof@kernel.org> > > > --- > > > kernel/module/main.c | 4 +++- > > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > > > --- linux-6.3.orig/kernel/module/main.c > > > +++ linux-6.3/kernel/module/main.c > > > @@ -850,7 +850,9 @@ void module_put(struct module *module) > > > if (module) { > > > preempt_disable(); > > > ret = atomic_dec_if_positive(&module->refcnt); > > > - WARN_ON(ret < 0); /* Failed to put refcount */ > > > + WARN(ret < 0, > > > + KERN_WARNING "Failed to put refcount for module %s\n", > > > + module->name); > > > > Would it make sense to also print the refcnt here? In our internal bug > > report it has turned out that this was an overflow (put missing) rather > > than an underflow (too many put calls). Seeing the value could give a > > clue about that. We had to configure panic_on_warn to capture a dump to > > learn more which is rather impractical. > > Well, other calls to module_put() or try_module_get() could happen in > parallel, so at the time we print refcnt, its value could be different > from the one which triggered the WARN. Racess with module_put should be impossible because all of them should fail, right? Races with put are possible but we do not need an exact value to tell the difference between over and underflow, no?
On Tue, 4 Jul 2023 15:05:33 +0200, Michal Hocko wrote: > On Tue 04-07-23 14:43:12, Jean Delvare wrote: > > On Wed, 28 Jun 2023 12:30:35 +0200, Michal Hocko wrote: > > > Would it make sense to also print the refcnt here? In our internal bug > > > report it has turned out that this was an overflow (put missing) rather > > > than an underflow (too many put calls). Seeing the value could give a > > > clue about that. We had to configure panic_on_warn to capture a dump to > > > learn more which is rather impractical. > > > > Well, other calls to module_put() or try_module_get() could happen in > > parallel, so at the time we print refcnt, its value could be different > > from the one which triggered the WARN. > > Racess with module_put should be impossible because all of them should > fail, right? Most probably yes, but after taking a deeper look at the code, I wouldn't swear. For example delete_module() will decrement refcnt and increment it again if the module can't actually be removed. This could get refcnt to positive again briefly, at which point another module_put() could succeed. > Races with put are possible but we do not need an exact > value to tell the difference between over and underflow, no? Indeed not. But my other points still stand. Plus, if you really want to know the refcnt value, it's already visible in /sys/module/*/refcnt and lsmod.
On Mon, Jul 03, 2023 at 03:47:22PM +0200, Michal Hocko wrote: > On Fri 30-06-23 16:05:33, Luis Chamberlain wrote: > [...] > > What prevents code from racing the free with a random module_put() > > called by some other piece of code? > > Wouldn't be ref count a garbage already? How can you race when freeing > if module_put fail? It could yes, ie, so this risks at all being junk. So best IMHO is to tidy up all the get / puts and add respective tests to fix all this mess with proper messages as needed. My cursory review of the refcnt stuf is I see some races possible. While I'd be happy to help debugging aids, adding accesses to random memory for a string seems more risk prone. Luis
On Fri 07-07-23 11:56:49, Luis Chamberlain wrote: > On Mon, Jul 03, 2023 at 03:47:22PM +0200, Michal Hocko wrote: > > On Fri 30-06-23 16:05:33, Luis Chamberlain wrote: > > [...] > > > What prevents code from racing the free with a random module_put() > > > called by some other piece of code? > > > > Wouldn't be ref count a garbage already? How can you race when freeing > > if module_put fail? > > It could yes, ie, so this risks at all being junk. Could you be more specific please? I still do not see a scenario where module string name would be junk while refcount itself would be a valid memory. > So best IMHO is > to tidy up all the get / puts and add respective tests to fix all > this mess with proper messages as needed. My cursory review of the > refcnt stuf is I see some races possible. It would likely be better to use refcount_t instead of atomic_t. > While I'd be happy to help debugging aids, adding accesses to random > memory for a string seems more risk prone. If there is really a scenario when module could be unloaded leaving dangling struct module behind then we have a real problem as this is exported to userspace IIRC. Not to mention module_get/put calls modifying memory (UAF).
On Mon, Jul 10, 2023 at 07:43:01AM +0200, Michal Hocko wrote: > On Fri 07-07-23 11:56:49, Luis Chamberlain wrote: > > On Mon, Jul 03, 2023 at 03:47:22PM +0200, Michal Hocko wrote: > > > On Fri 30-06-23 16:05:33, Luis Chamberlain wrote: > > > [...] > > > > What prevents code from racing the free with a random module_put() > > > > called by some other piece of code? > > > > > > Wouldn't be ref count a garbage already? How can you race when freeing > > > if module_put fail? > > > > It could yes, ie, so this risks at all being junk. > > Could you be more specific please? I still do not see a scenario where > module string name would be junk while refcount itself would be a valid > memory. That is true, but if refcount is invalid so will the memory for the string. > > So best IMHO is > > to tidy up all the get / puts and add respective tests to fix all > > this mess with proper messages as needed. My cursory review of the > > refcnt stuf is I see some races possible. > > It would likely be better to use refcount_t instead of atomic_t. Patches welcomed. > > While I'd be happy to help debugging aids, adding accesses to random > > memory for a string seems more risk prone. > > If there is really a scenario when module could be unloaded leaving > dangling struct module behind then we have a real problem as this is > exported to userspace IIRC. Not to mention module_get/put calls > modifying memory (UAF). That doesn't mean issues could not exist, given its all protected under privileged execution. All I'm suggesting is I look at this code and don't trust it, and think it could use some love. The selftests for kmod could be used to stress test but also stress-ng now also has module load and unloading so if there are races we can likely exploit them with either the kmod selftest or stress-ng module loading. Luis
Hi Luis, Michal, On Wed, 26 Jul 2023 13:59:06 -0700, Luis Chamberlain wrote: > On Mon, Jul 10, 2023 at 07:43:01AM +0200, Michal Hocko wrote: > > On Fri 07-07-23 11:56:49, Luis Chamberlain wrote: > > > On Mon, Jul 03, 2023 at 03:47:22PM +0200, Michal Hocko wrote: > > > > On Fri 30-06-23 16:05:33, Luis Chamberlain wrote: > > > > [...] > > > > > What prevents code from racing the free with a random module_put() > > > > > called by some other piece of code? > > > > > > > > Wouldn't be ref count a garbage already? How can you race when freeing > > > > if module_put fail? > > > > > > It could yes, ie, so this risks at all being junk. > > > > Could you be more specific please? I still do not see a scenario where > > module string name would be junk while refcount itself would be a valid > > memory. > > That is true, but if refcount is invalid so will the memory for the > string. This isn't how I read the code, and this is exactly the reason why I submitted this patch in the first place. As far as I can see, there are 3 possibilities: 1* The refcount is correct, everything is fine. 2* The refcount is wrong (we are trying to put a ref which was never taken), however the module wasn't unloaded yet, so the module name is still readable. 3* The refcount is wrong and the module has already been unloaded. The memory may have been reused already, so the module name can't be read. My patch is only useful in case 2. Although it doesn't cover all cases, I think it is relevant because unloading modules is something you rarely do in production, so if the refcount goes wrong, we will almost always be in case 2. That being said, if you don't like my proposal for whatever reason, or prefer addressing the issue in a different way, no problem at all. > > It would likely be better to use refcount_t instead of atomic_t. > > Patches welcomed. Michal, do I understand correctly that this would prevent the case our customer had (too many gets), but won't make a difference for actual too-many-puts situations?
On Mon 28-08-23 14:18:30, Jean Delvare wrote: [...] > > > It would likely be better to use refcount_t instead of atomic_t. > > > > Patches welcomed. > > Michal, do I understand correctly that this would prevent the case our > customer had (too many gets), but won't make a difference for actual > too-many-puts situations? yes, refcount_t cannot protect from too-many-puts because there is not real way to protect from those AFAICS. At a certain moment you just drop to 0 and lose your object and all following that is a UAF. But I do not think this is actually the interesting case at all.
--- linux-6.3.orig/kernel/module/main.c +++ linux-6.3/kernel/module/main.c @@ -850,7 +850,9 @@ void module_put(struct module *module) if (module) { preempt_disable(); ret = atomic_dec_if_positive(&module->refcnt); - WARN_ON(ret < 0); /* Failed to put refcount */ + WARN(ret < 0, + KERN_WARNING "Failed to put refcount for module %s\n", + module->name); trace_module_put(module, _RET_IP_); preempt_enable(); }
If module_put() triggers a refcount error, include the culprit module name in the warning message, to easy further investigation of the issue. Signed-off-by: Jean Delvare <jdelvare@suse.de> Suggested-by: Michal Hocko <mhocko@suse.com> Cc: Luis Chamberlain <mcgrof@kernel.org> --- kernel/module/main.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)