Message ID | ZvEFQAWVgWNd9j7e@gondor.apana.org.au (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Herbert Xu |
Headers | show |
Series | hwrng: core - Add WARN_ON for buggy read return values | expand |
On Mon Sep 23, 2024 at 9:05 AM EEST, Herbert Xu wrote: > Dear TPM maintainers: There's really only just me (for past 10 years). Maybe that should be updatred. > > Please have a look at the tpm hwrng driver because it appears to > be returning a length longer than the buffer length that we gave > it. In particular, tpm2 appears to be the culprit (though I didn't > really check tpm1 at all so it could also be buggy). > > The following patch hopefully should confirm that this is indeed > caused by TPM and not some other HWRNG driver. > > ---8<--- > If a buggy driver returns a length that is longer than the size > of the buffer provided to it, then this may lead to a buffer overread > in the caller. > > Stop this by adding a check for it in the hwrng core. > > Reported-by: Guangwu Zhang <guazhang@redhat.com> > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> > > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c > index 57c51efa5613..018316f54621 100644 > --- a/drivers/char/hw_random/core.c > +++ b/drivers/char/hw_random/core.c > @@ -181,8 +181,15 @@ static inline int rng_get_data(struct hwrng *rng, u8 *buffer, size_t size, > int present; > > BUG_ON(!mutex_is_locked(&reading_mutex)); > - if (rng->read) > - return rng->read(rng, (void *)buffer, size, wait); > + if (rng->read) { > + int err; > + > + err = rng->read(rng, buffer, size, wait); > + if (WARN_ON_ONCE(err > 0 && err > size)) Are you sure you want to use WARN_ON_ONCE here instead of pr_warn_once()? I.e. is it worth of taking down the whole kernel? > + err = size; > + > + return err; > + } > > if (rng->data_present) > present = rng->data_present(rng, wait); BR, Jarkko
On Mon Sep 23, 2024 at 10:52 AM EEST, Jarkko Sakkinen wrote: > On Mon Sep 23, 2024 at 9:05 AM EEST, Herbert Xu wrote: > > Dear TPM maintainers: > > There's really only just me (for past 10 years). Maybe that should be > updatred. > > > > > Please have a look at the tpm hwrng driver because it appears to > > be returning a length longer than the buffer length that we gave > > it. In particular, tpm2 appears to be the culprit (though I didn't > > really check tpm1 at all so it could also be buggy). > > > > The following patch hopefully should confirm that this is indeed > > caused by TPM and not some other HWRNG driver. > > > > > > > > ---8<--- > > If a buggy driver returns a length that is longer than the size > > of the buffer provided to it, then this may lead to a buffer overread > > in the caller. > > > > Stop this by adding a check for it in the hwrng core. > > > > Reported-by: Guangwu Zhang <guazhang@redhat.com> > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> > > > > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c > > index 57c51efa5613..018316f54621 100644 > > --- a/drivers/char/hw_random/core.c > > +++ b/drivers/char/hw_random/core.c > > @@ -181,8 +181,15 @@ static inline int rng_get_data(struct hwrng *rng, u8 *buffer, size_t size, > > int present; > > > > BUG_ON(!mutex_is_locked(&reading_mutex)); > > - if (rng->read) > > - return rng->read(rng, (void *)buffer, size, wait); > > + if (rng->read) { > > + int err; > > + > > + err = rng->read(rng, buffer, size, wait); > > + if (WARN_ON_ONCE(err > 0 && err > size)) > > Are you sure you want to use WARN_ON_ONCE here instead of > pr_warn_once()? I.e. is it worth of taking down the whole > kernel? I looked at tpm2_get_random() and it is pretty inefficient (not same as saying it has a bug). I'd love to reimplement it. We would be better of it would pull random let's say with 256 byte or 512 byte chunks and cache that internal to tpm_chip. Then the requests would be served from that pre-fetched pool and not do round-trip to TPM every single time. This would improve overall kernel performance given the reduced number of wait states. I wonder if anyone knows what would be a good fetch size that will work on most TPM2 chips? BR, Jarkko
On Mon Sep 23, 2024 at 11:07 AM EEST, Jarkko Sakkinen wrote: > On Mon Sep 23, 2024 at 10:52 AM EEST, Jarkko Sakkinen wrote: > > On Mon Sep 23, 2024 at 9:05 AM EEST, Herbert Xu wrote: > > > Dear TPM maintainers: > > > > There's really only just me (for past 10 years). Maybe that should be > > updatred. > > > > > > > > Please have a look at the tpm hwrng driver because it appears to > > > be returning a length longer than the buffer length that we gave > > > it. In particular, tpm2 appears to be the culprit (though I didn't > > > really check tpm1 at all so it could also be buggy). > > > > > > The following patch hopefully should confirm that this is indeed > > > caused by TPM and not some other HWRNG driver. > > > > > > > > > > > > > > ---8<--- > > > If a buggy driver returns a length that is longer than the size > > > of the buffer provided to it, then this may lead to a buffer overread > > > in the caller. > > > > > > Stop this by adding a check for it in the hwrng core. > > > > > > Reported-by: Guangwu Zhang <guazhang@redhat.com> > > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> > > > > > > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c > > > index 57c51efa5613..018316f54621 100644 > > > --- a/drivers/char/hw_random/core.c > > > +++ b/drivers/char/hw_random/core.c > > > @@ -181,8 +181,15 @@ static inline int rng_get_data(struct hwrng *rng, u8 *buffer, size_t size, > > > int present; > > > > > > BUG_ON(!mutex_is_locked(&reading_mutex)); > > > - if (rng->read) > > > - return rng->read(rng, (void *)buffer, size, wait); > > > + if (rng->read) { > > > + int err; > > > + > > > + err = rng->read(rng, buffer, size, wait); > > > + if (WARN_ON_ONCE(err > 0 && err > size)) > > > > Are you sure you want to use WARN_ON_ONCE here instead of > > pr_warn_once()? I.e. is it worth of taking down the whole > > kernel? > > I looked at tpm2_get_random() and it is pretty inefficient (not same > as saying it has a bug). I'd love to reimplement it. > > We would be better of it would pull random let's say with 256 byte > or 512 byte chunks and cache that internal to tpm_chip. Then the > requests would be served from that pre-fetched pool and not do > round-trip to TPM every single time. > > This would improve overall kernel performance given the reduced > number of wait states. I wonder if anyone knows what would be a > good fetch size that will work on most TPM2 chips? Herbert, I'm going to go to gym but I could help you to debug the issue you're seeing with a patch from tpm2_get_random(). We need to fix that one first ofc (as you never should build on top of broken code). Once back from gym and grocery shopping I'll bake a debugging patch. BR, Jarkko
On Mon, Sep 23, 2024 at 10:52:49AM +0300, Jarkko Sakkinen wrote: > On Mon Sep 23, 2024 at 9:05 AM EEST, Herbert Xu wrote: > > Dear TPM maintainers: > > There's really only just me (for past 10 years). Maybe that should be > updatred. :) > > BUG_ON(!mutex_is_locked(&reading_mutex)); > > - if (rng->read) > > - return rng->read(rng, (void *)buffer, size, wait); > > + if (rng->read) { > > + int err; > > + > > + err = rng->read(rng, buffer, size, wait); > > + if (WARN_ON_ONCE(err > 0 && err > size)) > > Are you sure you want to use WARN_ON_ONCE here instead of > pr_warn_once()? I.e. is it worth of taking down the whole > kernel? Absolutely. If this triggers it's a serious kernel bug and we should gather as much information as possible. pr_warn_once is not the same thing as WARN_ON_ONCE in terms of what it prints. If people want to turn WARNs into BUGs, then they've only got themselves to blame when the kernel goes down. On the other hand perhaps they *do* want this to panic and we should hand it to them. Thanks,
On Mon Sep 23, 2024 at 12:26 PM EEST, Herbert Xu wrote: > > + > > > + err = rng->read(rng, buffer, size, wait); > > > + if (WARN_ON_ONCE(err > 0 && err > size)) > > > > Are you sure you want to use WARN_ON_ONCE here instead of > > pr_warn_once()? I.e. is it worth of taking down the whole > > kernel? > > Absolutely. If this triggers it's a serious kernel bug and we > should gather as much information as possible. pr_warn_once is > not the same thing as WARN_ON_ONCE in terms of what it prints. Personally I allow the use of WARN only as the last resort. If you need stack printout you can always use dump_stack(). > > If people want to turn WARNs into BUGs, then they've only got > themselves to blame when the kernel goes down. On the other > hand perhaps they *do* want this to panic and we should hand > it to them. Actually when you turn on "panic_on_warn" the user expectation is and should be that the sites where WARN is used have been hand picked with consideration so that panic happens for a reason. This has also been denoted repeatedly by Greg: https://lore.kernel.org/linux-cve-announce/2024061828-CVE-2024-36975-6719@gregkh/ I should check this somewhere but actually these days a wrongly chosen WARN() might lead to CVE entry. That fix was by me but I never created the CVE. Greg, did we have something under Documentation/ that would fully address the use of WARN? BR, Jarkko
On Mon Sep 23, 2024 at 5:31 PM EEST, Jarkko Sakkinen wrote: > Greg, did we have something under Documentation/ that would fully > address the use of WARN? ... personally I think that even if there was a separate document, this should be somehow addressed already in SubmittingPatches so that it can't be possibly missed by anyone. BR, Jarkko
On Mon, Sep 23, 2024 at 05:31:47PM +0300, Jarkko Sakkinen wrote: > On Mon Sep 23, 2024 at 12:26 PM EEST, Herbert Xu wrote: > > > + > > > > + err = rng->read(rng, buffer, size, wait); > > > > + if (WARN_ON_ONCE(err > 0 && err > size)) > > > > > > Are you sure you want to use WARN_ON_ONCE here instead of > > > pr_warn_once()? I.e. is it worth of taking down the whole > > > kernel? > > > > Absolutely. If this triggers it's a serious kernel bug and we > > should gather as much information as possible. pr_warn_once is > > not the same thing as WARN_ON_ONCE in terms of what it prints. > > Personally I allow the use of WARN only as the last resort. > > If you need stack printout you can always use dump_stack(). > > > > > If people want to turn WARNs into BUGs, then they've only got > > themselves to blame when the kernel goes down. On the other > > hand perhaps they *do* want this to panic and we should hand > > it to them. > > Actually when you turn on "panic_on_warn" the user expectation is and > should be that the sites where WARN is used have been hand picked with > consideration so that panic happens for a reason. > > This has also been denoted repeatedly by Greg: > > https://lore.kernel.org/linux-cve-announce/2024061828-CVE-2024-36975-6719@gregkh/ > > I should check this somewhere but actually these days a wrongly chosen > WARN() might lead to CVE entry. That fix was by me but I never created > the CVE. > > Greg, did we have something under Documentation/ that would fully > address the use of WARN? Please see: https://www.kernel.org/doc/html/latest/process/deprecated.html#bug-and-bug-on which describes that. We should make it more explicit that any WARN() or WARN_ON() calls that can be hit by user interactions somehow, will end up getting a CVE id when we fix it up to not do so. thanks, greg k-h
On Mon Sep 23, 2024 at 5:48 PM EEST, Greg KH wrote: > On Mon, Sep 23, 2024 at 05:31:47PM +0300, Jarkko Sakkinen wrote: > > On Mon Sep 23, 2024 at 12:26 PM EEST, Herbert Xu wrote: > > > > + > > > > > + err = rng->read(rng, buffer, size, wait); > > > > > + if (WARN_ON_ONCE(err > 0 && err > size)) > > > > > > > > Are you sure you want to use WARN_ON_ONCE here instead of > > > > pr_warn_once()? I.e. is it worth of taking down the whole > > > > kernel? > > > > > > Absolutely. If this triggers it's a serious kernel bug and we > > > should gather as much information as possible. pr_warn_once is > > > not the same thing as WARN_ON_ONCE in terms of what it prints. > > > > Personally I allow the use of WARN only as the last resort. > > > > If you need stack printout you can always use dump_stack(). > > > > > > > > If people want to turn WARNs into BUGs, then they've only got > > > themselves to blame when the kernel goes down. On the other > > > hand perhaps they *do* want this to panic and we should hand > > > it to them. > > > > Actually when you turn on "panic_on_warn" the user expectation is and > > should be that the sites where WARN is used have been hand picked with > > consideration so that panic happens for a reason. > > > > This has also been denoted repeatedly by Greg: > > > > https://lore.kernel.org/linux-cve-announce/2024061828-CVE-2024-36975-6719@gregkh/ > > > > I should check this somewhere but actually these days a wrongly chosen > > WARN() might lead to CVE entry. That fix was by me but I never created > > the CVE. > > > > Greg, did we have something under Documentation/ that would fully > > address the use of WARN? > > Please see: > https://www.kernel.org/doc/html/latest/process/deprecated.html#bug-and-bug-on > which describes that. We should make it more explicit that any WARN() > or WARN_ON() calls that can be hit by user interactions somehow, will > end up getting a CVE id when we fix it up to not do so. I bookmarked this thanks :-) Herbert, I'll do comprehensive testing tmrw by adding some invariants to tpm2_get_random(). I'd really love to reimplement it because the current implementation frankly sucks (and it's by me) but yep, we nee to fix it first and foremost. > > thanks, > > greg k-h BR, Jarkko
On Mon, Sep 23, 2024 at 04:48:27PM +0200, Greg KH wrote: > > Please see: > https://www.kernel.org/doc/html/latest/process/deprecated.html#bug-and-bug-on > which describes that. We should make it more explicit that any WARN() > or WARN_ON() calls that can be hit by user interactions somehow, will > end up getting a CVE id when we fix it up to not do so. If the aformentioned WARN_ON hits, then the driver has probabaly already done a buffer overrun so it's a CVE anyway. Cheers,
On Tue Sep 24, 2024 at 1:32 AM EEST, Herbert Xu wrote: > On Mon, Sep 23, 2024 at 04:48:27PM +0200, Greg KH wrote: > > > > Please see: > > https://www.kernel.org/doc/html/latest/process/deprecated.html#bug-and-bug-on > > which describes that. We should make it more explicit that any WARN() > > or WARN_ON() calls that can be hit by user interactions somehow, will > > end up getting a CVE id when we fix it up to not do so. > > If the aformentioned WARN_ON hits, then the driver has probabaly > already done a buffer overrun so it's a CVE anyway. We'll see I finally got into testing this. Sorry for latencies, I'm switching jobs and unfortunately German Post Office lost my priority mail containing contracts (sent them from Finland to Berlin) so have been signing, scanning etc. the whole day :-) My last week in the current job, and next week is the first in the new job, so this week is a bit bumpy. BR, Jarkko
On Tue Sep 24, 2024 at 7:05 PM EEST, Jarkko Sakkinen wrote: > On Tue Sep 24, 2024 at 1:32 AM EEST, Herbert Xu wrote: > > On Mon, Sep 23, 2024 at 04:48:27PM +0200, Greg KH wrote: > > > > > > Please see: > > > https://www.kernel.org/doc/html/latest/process/deprecated.html#bug-and-bug-on > > > which describes that. We should make it more explicit that any WARN() > > > or WARN_ON() calls that can be hit by user interactions somehow, will > > > end up getting a CVE id when we fix it up to not do so. > > > > If the aformentioned WARN_ON hits, then the driver has probabaly > > already done a buffer overrun so it's a CVE anyway. > > We'll see I finally got into testing this. Sorry for latencies, I'm > switching jobs and unfortunately German Post Office lost my priority > mail containing contracts (sent them from Finland to Berlin) so have > been signing, scanning etc. the whole day :-) My last week in the > current job, and next week is the first in the new job, so this > week is a bit bumpy. I get nothing with this: diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c index aba024cbe7c5..856a8356d971 100644 --- a/drivers/char/tpm/tpm2-cmd.c +++ b/drivers/char/tpm/tpm2-cmd.c @@ -341,12 +341,15 @@ int tpm2_get_random(struct tpm_chip *chip, u8 *dest, size_t max) dest_ptr += recd; total += recd; + + WARN_ON(num_bytes < recd); num_bytes -= recd; } while (retries-- && total < max); tpm_buf_destroy(&buf); tpm2_end_auth_session(chip); + WARN_ON(total > max); return total ? total : -EIO; out: tpm_buf_destroy(&buf); [WARN_ON()'s here are only for the temporary diff] Call stack: 1. tpm2_get_random(): https://elixir.bootlin.com/linux/v6.11-rc7/source/drivers/char/tpm/tpm2-cmd.c#L281 2. tpm_get_random(): https://elixir.bootlin.com/linux/v6.11-rc7/source/drivers/char/tpm/tpm-interface.c#L430 3. tpm_hwrng_read(): https://elixir.bootlin.com/linux/v6.11-rc7/source/drivers/char/tpm/tpm-chip.c#L524 Everything seems to have also appropriate range checks. Without any traces that would provide more information I don't see the smoking gun. BR, Jarkko
On Tue, Sep 24, 2024 at 08:43:00PM +0300, Jarkko Sakkinen wrote: > > Without any traces that would provide more information I don't see > the smoking gun. I haven't confirmed that it's definitely the tpm2 driver, it's just based on the backtrace. Hopefully my patch will confirm it one way or the other. Here is the backtrace: [ 100.784159] vmd 0000:c2:00.5: Bound to PCI domain 10002 [ 100.786209] Monitor-Mwait will be used to enter C-1 state [ 100.786225] Monitor-Mwait will be used to enter C-2 state [ 100.786244] ACPI: \_SB_.SCK0.C000: Found 2 idle states [ 100.823093] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0 [ 100.823636] ACPI: button: Power Button [PWRF] [ 100.905756] ERST: Error Record Serialization Table (ERST) support is initialized. [ 100.905858] pstore: Using crash dump compression: deflate [ 100.905861] pstore: Registered erst as persistent store backend [ 100.907044] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled [ 100.908305] 00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A [ 100.926608] 00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A [ 100.942953] Non-volatile memory driver v1.3 [ 100.947908] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0x1B, rev-id 22) [ 101.226913] ACPI: bus type drm_connector registered [ 101.229708] alg: ecdh-nist-p256 (ecdh-nist-p256-generic) is disabled due to FIPS [ 101.229745] tpm tpm0: crypto ecdh allocation failed [ 101.236311] tpm tpm0: A TPM error (708) occurred start auth session [ 101.238797] ================================================================== [ 101.238800] BUG: KASAN: slab-out-of-bounds in blake2s_update+0x135/0x2b0 [ 101.238808] Read of size 44 at addr ff11000167334d98 by task hwrng/318 [ 101.238811] [ 101.238813] CPU: 26 UID: 0 PID: 318 Comm: hwrng Not tainted 6.11.0-0.rc5.22.el10.x86_64+debug #1 [ 101.238818] Hardware name: Supermicro SSG-110P-NTR10-EI018/X12SPO-NTF, BIOS 1.3 05/20/2022 [ 101.238820] Call Trace: [ 101.238823] <TASK> [ 101.238826] dump_stack_lvl+0x6f/0xb0 [ 101.238833] ? blake2s_update+0x135/0x2b0 [ 101.238836] print_address_description.constprop.0+0x88/0x330 [ 101.238843] ? blake2s_update+0x135/0x2b0 [ 101.238847] print_report+0x108/0x209 [ 101.238851] ? blake2s_update+0x135/0x2b0 [ 101.238855] ? __virt_addr_valid+0x20b/0x440 [ 101.238859] ? blake2s_update+0x135/0x2b0 [ 101.238863] kasan_report+0xa8/0xe0 [ 101.238868] ? blake2s_update+0x135/0x2b0 [ 101.238874] kasan_check_range+0x10f/0x1f0 [ 101.238879] __asan_memcpy+0x23/0x60 [ 101.238883] blake2s_update+0x135/0x2b0 [ 101.238887] add_hwgenerator_randomness+0x3d/0xe0 [ 101.238895] hwrng_fillfn+0x144/0x270 [ 101.238900] ? __pfx_hwrng_fillfn+0x10/0x10 [ 101.238904] kthread+0x2d2/0x3a0 [ 101.238908] ? __pfx_kthread+0x10/0x10 [ 101.238912] ret_from_fork+0x31/0x70 [ 101.238917] ? __pfx_kthread+0x10/0x10 [ 101.238920] ret_from_fork_asm+0x1a/0x30 [ 101.238929] </TASK> [ 101.238931] [ 101.238932] Allocated by task 1: [ 101.238934] kasan_save_stack+0x30/0x50 [ 101.238937] kasan_save_track+0x14/0x30 [ 101.238940] __kasan_kmalloc+0x8f/0xa0 [ 101.238942] __kmalloc_noprof+0x1fe/0x410 [ 101.238947] kobj_map+0x7e/0x6d0 [ 101.238951] cdev_add+0x92/0x180 [ 101.238954] tty_cdev_add+0x17a/0x280 [ 101.238957] tty_register_device_attr+0x401/0x740 [ 101.238960] tty_register_driver+0x381/0x6f0 [ 101.238963] vty_init+0x2c1/0x2f0 [ 101.238967] tty_init+0x13b/0x150 [ 101.238970] do_one_initcall+0x11c/0x5c0 [ 101.238975] do_initcalls+0x1b4/0x1f0 [ 101.238980] kernel_init_freeable+0x4ae/0x520 [ 101.238984] kernel_init+0x1c/0x150 [ 101.238988] ret_from_fork+0x31/0x70 [ 101.238991] ret_from_fork_asm+0x1a/0x30 [ 101.238994] [ 101.238995] The buggy address belongs to the object at ff11000167334d80 [ 101.238995] which belongs to the cache kmalloc-64 of size 64 [ 101.238998] The buggy address is located 24 bytes inside of [ 101.238998] allocated 56-byte region [ff11000167334d80, ff11000167334db8) [ 101.239002] [ 101.239003] The buggy address belongs to the physical page: [ 101.239004] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x167334 [ 101.239008] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff) [ 101.239012] page_type: 0xfdffffff(slab) [ 101.239016] raw: 0017ffffc0000000 ff1100010003c8c0 dead000000000122 0000000000000000 [ 101.239019] raw: 0000000000000000 0000000000200020 00000001fdffffff 0000000000000000 [ 101.239021] page dumped because: kasan: bad access detected [ 101.239023] [ 101.239024] Memory state around the buggy address: [ 101.239025] ff11000167334c80: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc [ 101.239028] ff11000167334d00: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc [ 101.239030] >ff11000167334d80: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc [ 101.239031] ^ [ 101.239033] ff11000167334e00: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc [ 101.239035] ff11000167334e80: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc [ 101.239037] ================================================================== [ 101.383067] rdac: device handler registered [ 101.383412] hp_sw: device handler registered [ 101.383415] emc: device handler registered [ 101.383879] alua: device handler registered [ 101.391255] xhci_hcd 0000:00:14.0: xHCI Host Controller [ 101.391892] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 1 [ 101.393706] xhci_hcd 0000:00:14.0: hcc params 0x200077c1 hci version 0x100 quirks 0x0000000000009810 [ 101.399646] xhci_hcd 0000:00:14.0: xHCI Host Controller [ 101.400136] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 2 [ 101.400163] xhci_hcd 0000:00:14.0: Host supports USB 3.0 SuperSpeed [ 101.400818] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 6.11 [ 101.400823] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1 [ 101.400826] usb usb1: Product: xHCI Host Controller [ 101.400829] usb usb1: Manufacturer: Linux 6.11.0-0.rc5.22.el10.x86_64+debug xhci-hcd [ 101.400832] usb usb1: SerialNumber: 0000:00:14.0 [ 101.403055] hub 1-0:1.0: USB hub found [ 101.403222] hub 1-0:1.0: 16 ports detected [ 101.657974] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 6.11 [ 101.657982] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1 [ 101.657986] usb usb2: Product: xHCI Host Controller [ 101.657990] usb usb2: Manufacturer: Linux 6.11.0-0.rc5.22.el10.x86_64+debug xhci-hcd [ 101.657993] usb usb2: SerialNumber: 0000:00:14.0 [ 101.660659] hub 2-0:1.0: USB hub found [ 101.660882] hub 2-0:1.0: 10 ports detected {code} Thanks,
On Fri, 2024-09-27 at 08:42 +0800, Herbert Xu wrote: > On Tue, Sep 24, 2024 at 08:43:00PM +0300, Jarkko Sakkinen wrote: > > > > Without any traces that would provide more information I don't see > > the smoking gun. > > I haven't confirmed that it's definitely the tpm2 driver, it's just > based on the backtrace. Hopefully my patch will confirm it one way > or the other. Here is the backtrace: Agreed. > > [ 100.784159] vmd 0000:c2:00.5: Bound to PCI domain 10002 > [ 100.786209] Monitor-Mwait will be used to enter C-1 state > [ 100.786225] Monitor-Mwait will be used to enter C-2 state > [ 100.786244] ACPI: \_SB_.SCK0.C000: Found 2 idle states > [ 100.823093] input: Power Button as > /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0 > [ 100.823636] ACPI: button: Power Button [PWRF] > [ 100.905756] ERST: Error Record Serialization Table (ERST) support > is initialized. > [ 100.905858] pstore: Using crash dump compression: deflate > [ 100.905861] pstore: Registered erst as persistent store backend > [ 100.907044] Serial: 8250/16550 driver, 4 ports, IRQ sharing > enabled > [ 100.908305] 00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = > 115200) is a 16550A > [ 100.926608] 00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = > 115200) is a 16550A > [ 100.942953] Non-volatile memory driver v1.3 > [ 100.947908] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0x1B, rev-id > 22) > [ 101.226913] ACPI: bus type drm_connector registered > [ 101.229708] alg: ecdh-nist-p256 (ecdh-nist-p256-generic) is > disabled due to FIPS > [ 101.229745] tpm tpm0: crypto ecdh allocation failed > [ 101.236311] tpm tpm0: A TPM error (708) occurred start auth I guess it is TPM2_StartAuthSession returning TPM_RC_VALUE. Probably James should look into this as the bus encryption code is clearly tripping here. I'm on second week on a new job so cannot promise any bandwidth yet this week. Earliest next week... BR, Jarkko
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index 57c51efa5613..018316f54621 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -181,8 +181,15 @@ static inline int rng_get_data(struct hwrng *rng, u8 *buffer, size_t size, int present; BUG_ON(!mutex_is_locked(&reading_mutex)); - if (rng->read) - return rng->read(rng, (void *)buffer, size, wait); + if (rng->read) { + int err; + + err = rng->read(rng, buffer, size, wait); + if (WARN_ON_ONCE(err > 0 && err > size)) + err = size; + + return err; + } if (rng->data_present) present = rng->data_present(rng, wait);
Dear TPM maintainers: Please have a look at the tpm hwrng driver because it appears to be returning a length longer than the buffer length that we gave it. In particular, tpm2 appears to be the culprit (though I didn't really check tpm1 at all so it could also be buggy). The following patch hopefully should confirm that this is indeed caused by TPM and not some other HWRNG driver. ---8<--- If a buggy driver returns a length that is longer than the size of the buffer provided to it, then this may lead to a buffer overread in the caller. Stop this by adding a check for it in the hwrng core. Reported-by: Guangwu Zhang <guazhang@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>