Message ID | 20250317-tegra-v1-1-78474efc0386@debian.org (mailing list archive) |
---|---|
State | New |
Series | spi: tegra210-quad: Improve reset and tx failures |
On Mon, Mar 17, 2025 at 08:44:01AM -0700, Breno Leitao wrote:
> My UEFI machines with tegra210-quad consistently report "device reset
> failed". Investigation showed this isn't an actual failure
> - __device_reset() returns -ENOENT because ACPI has no "*_RST" method.

That's not the case, it's returning an error because there is no reset
controller discoverable via any mechanism. There's no specific handling
for ACPI here. It's also not clear that this is a false positive, the
driver did indeed fail to reset the device and especially for the error
handling case that seems like relevant information.

At the very least the changelog should be clarified.
Hello Mark,

On Mon, Mar 17, 2025 at 04:45:31PM +0000, Mark Brown wrote:
> On Mon, Mar 17, 2025 at 08:44:01AM -0700, Breno Leitao wrote:
> > My UEFI machines with tegra210-quad consistently report "device reset
> > failed". Investigation showed this isn't an actual failure
> > - __device_reset() returns -ENOENT because ACPI has no "*_RST" method.
>
> That's not the case, it's returning an error because there is no reset
> controller discoverable via any mechanism.

Sorry, I am not very familiar with this subsystem, but I chased down
__device_reset(), and I found the return was coming from:

	int __device_reset(struct device *dev, bool optional)
	{
		acpi_handle handle = ACPI_HANDLE(dev);

		if (handle) {
			if (!acpi_has_method(handle, "_RST"))
				return optional ? 0 : -ENOENT;

> There's no specific handling for ACPI here.

Do you mean no _RST method as stated above?

> It's also not clear that this is a false positive, the
> driver did indeed fail to reset the device and especially for the error
> handling case that seems like relevant information.

If the driver failed to reset the device, then device_reset_optional()
will return an error code, but it will not return an error code if
the _RST method is not found, right?

Sorry if I am misunderstanding the code here.

> At the very least the changelog should be clarified.

What would you add to the changelog to make this clear?

Thanks for the quick review!
--breno
On Mon, Mar 17, 2025 at 09:56:43AM -0700, Breno Leitao wrote:
> Hello Mark,
>
> On Mon, Mar 17, 2025 at 04:45:31PM +0000, Mark Brown wrote:
> > On Mon, Mar 17, 2025 at 08:44:01AM -0700, Breno Leitao wrote:
> > > My UEFI machines with tegra210-quad consistently report "device reset
> > > failed". Investigation showed this isn't an actual failure
> > > - __device_reset() returns -ENOENT because ACPI has no "*_RST" method.

> > That's not the case, it's returning an error because there is no reset
> > controller discoverable via any mechanism.

> Sorry, I was not very familiar with this subsystem, but I chase down
> __device_reset(), and I found the return was coming from:

> 	int __device_reset(struct device *dev, bool optional)
> 	{
> 		acpi_handle handle = ACPI_HANDLE(dev);
> 		if (handle) {
> 			if (!acpi_has_method(handle, "_RST"))
> 				return optional ? 0 : -ENOENT;

> > There's no specific handling for ACPI here.

> Do you mean no _RST method as stated above?

That's only happening in the case where the device has an ACPI handle,
the SPI driver has no idea why the reset API failed to look up a reset
controller. Your change is to the SPI driver, not the reset framework.

> > It's also not clear that this is a false positive, the
> > driver did indeed fail to reset the device and especially for the error
> > handling case that seems like relevant information.

> If the driver failed to reset the device, then device_reset_optional()
> it will return an error code, but it will not return an error code if
> the RST method is not found, right?

> Sorry, if I am mis-understading the code here.

Clearly if no reset controller is available then the driver will have
been unable to reset the hardware. That seems like something it
actually wanted to do, especially in the error handling case - it's a
lot less likely that we'll recover things without the reset happening.
During probe it's possibly not so urgent but at other times it seems
more relevant.

> > At the very least the changelog should be clarified.

> What would you add to the changelog to make this clear?

For starters the mention of ACPI is irrelevant to what the SPI driver is
doing. This sounds like a change specific to ACPI but it affects all
users.
Hello Mark,

On Mon, Mar 17, 2025 at 05:24:24PM +0000, Mark Brown wrote:
> > > There's no specific handling for ACPI here.
>
> > Do you mean no _RST method as stated above?
>
> That's only happening in the case where the device has an ACPI handle,
> the SPI driver has no idea why the reset API failed to look up a reset
> controller. Your change is to the SPI driver, not the reset framework.
>
> > > It's also not clear that this is a false positive, the
> > > driver did indeed fail to reset the device and especially for the error
> > > handling case that seems like relevant information.
>
> > If the driver failed to reset the device, then device_reset_optional()
> > it will return an error code, but it will not return an error code if
> > the RST method is not found, right?
>
> > Sorry, if I am mis-understading the code here.
>
> Clearly if no reset controller is available then the driver will have
> been unable to reset the hardware. That seems like something it
> actually wanted to do, especially in the error handling case - it's a
> lot less likely that we'll recover things without the reset happening.
> During probe it's possibly not so urgent but at other times it seems
> more relevant.

Thanks for your answer!

Let me back up and explain how I understand this issue, and my possibly
wrong assumptions:

1) The SPI controller is reset by the driver in a few cases:
   a) At probe time
   b) On the transfer path (when a transfer to the controller times out)

2) On the machines I have, I understand that the controller failed to
   reset in both cases:
   a) At boot time, with "tegra-qspi NVDA1513:00: device reset failed"
   b) At error handling, with the messages below:

	tegra-qspi NVDA1513:00: QSPI Transfer failed with timeout: 0
	spi_master spi0: failed to transfer one message from queue
	spi_master spi0: noqueue transfer failed
	WARNING: CPU: 1 PID: 1221 at drivers/spi/spi-tegra210-quad.c:1120 tegra_qspi_transfer_one_message+0x780/0x918 [spi_tegra210_quad]

      Full log at: https://paste.debian.net/1363773/
   c) I don't see "device reset failed" in the transfer case, but the
      device doesn't recover either.

3) These devices fail to reset at probe because there is no ACPI method
   for resetting them (aka "_RST" methods), thus device_reset() will
   return -ENOENT.

4) Not being able to reset the device seems to be the root cause of the
   WARNING flood I am seeing.

My assumptions, now:

1) This controller doesn't have an _RST ACPI method by design.

2) It is OK to not have reset methods (!?)

3) There are two helpers to reset the device: device_reset_optional()
   and device_reset().
   a) For device_reset(), the helper will fail if the device doesn't
      reset, thus, for ACPI systems, the _RST method needs to exist and
      return success, otherwise it will return an errno.
   b) device_reset_optional() only fails if the reset fails (either in
      ACPI or not), but doesn't fail (i.e. returns 0) if reset methods
      (aka _RST in ACPI) are not available.
   c) Given assumption #1, device_reset_optional() is more appropriate
      given that this method does not exist anyway.
   d) This should be a no-op for systems that have proper reset methods.

Thanks for helping me with this issue,
--breno
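For readers following the device_reset() versus device_reset_optional() distinction above, here is a minimal user-space sketch of the ACPI branch quoted earlier in the thread. It is not kernel code: struct mock_device, its fields, and mock_device_reset() are hypothetical stand-ins invented for illustration, and the sketch assumes that no reset controller is described by any other mechanism either.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device; NOT the kernel's struct device. */
struct mock_device {
	bool has_acpi_handle;	/* models ACPI_HANDLE(dev) != NULL */
	bool has_rst_method;	/* models acpi_has_method(handle, "_RST") */
	int rst_result;		/* what evaluating _RST would report */
};

/*
 * Models only the branch quoted in the thread: a missing _RST method is
 * an error for the mandatory variant and silently ignored for the
 * optional one. A reset that exists but fails is reported either way.
 */
static int mock_device_reset(const struct mock_device *dev, bool optional)
{
	assert(dev != NULL);

	if (dev->has_acpi_handle) {
		if (!dev->has_rst_method)
			return optional ? 0 : -ENOENT;
		return dev->rst_result;
	}

	/* Assumption: no reset controller is described elsewhere either. */
	return optional ? 0 : -ENOENT;
}
```

Note that in the "optional" case a return value of 0 does not mean the hardware was reset, only that nothing was available to reset it with, which is the point Mark raises about the error handling path.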
On Tue, Mar 18, 2025 at 03:36:54AM -0700, Breno Leitao wrote:
> My assumptions, now:
> 1) This controller doesn't have _RST ACPI method by design.
> 2) It is OK to not have reset methods (!?)

Well, that's not clear to me. It seems likely to work a lot of the time
on probe but I don't know how well it handles a warm reboot for example.
Like I say the error handling case seems more likely to be at least less
effective without a reset controller so it'd be worth logging. In the
DT the reset controller is a required property which suggests the driver
might be assuming it's got the hardware into a known state.
On Tue, Mar 18, 2025 at 12:48:13PM +0000, Mark Brown wrote:
> Well, that's not clear to me. It seems likely to work a lot of the time
> on probe but I don't know how well it handles a warm reboot for example.
> Like I say the error handling case seems more likely to be at least less
> effective without a reset controller so it'd be worth logging. In the
> DT the reset controller is a required property which suggests the driver
> might be assuming it's got the hardware into a known state.

Makes sense. Another question: for platforms like this one that don't
have the device reset methods, what can we do to stop the bleed?

Basically every message that is sent to the SPI controller will fail,
which will trigger the device_reset() which is a no-op, but the device
will continue to be online. Should we disable the device after some
point?

Regarding this patchset, I understand that patch #1 is not ideal as
discussed above, what about patches 2 and 3?

Thanks
--breno
On Tue, Mar 18, 2025 at 10:02:47AM -0700, Breno Leitao wrote:
> Makes sense. Another question, for platforms like this one that doesn't
> have the device reset methods, what can we do to stop the bleed?

> Basically every message that is sent to the SPI controller will fail,
> which will trigger the device_reset() which is a no-op, but the device
> will continue to be online. Should we disable the device after some
> point?

The SPI controller is only going to be doing something because some
driver for an attached SPI device is trying to do something. Presumably
whatever driver that is won't be having a good time and can hopefully
figure something out, though given that SPI is simple and not
hotpluggable this isn't really something that comes up a lot in
production so I'd be unsurprised to see things just keep on retrying.
I'd expect to see any substantial error handling in the driver for the
device rather than in the controller.

Obviously there's something wrong with the device description here which
is upsetting the controller driver.

> Regarding this patchset, I understand that patch #1 is not ideal as
> discussed above, what about patch 2 and 3?

If I didn't say anything they're probably fine.
On Tue, Mar 18, 2025 at 05:34:55PM +0000, Mark Brown wrote:
> On Tue, Mar 18, 2025 at 10:02:47AM -0700, Breno Leitao wrote:
>
> > Makes sense. Another question, for platforms like this one that doesn't
> > have the device reset methods, what can we do to stop the bleed?
>
> > Basically every message that is sent to the SPI controller will fail,
> > which will trigger the device_reset() which is a no-op, but the device
> > will continue to be online. Should we disable the device after some
> > point?
>
> The SPI controller is only going to be doing something because some
> driver for an attached SPI device is trying to do something. Presumably
> whatever driver that is won't be having a good time and can hopefully
> figure something out, though given that SPI is simple and not
> hotpluggable this isn't really something that comes up a lot in
> production so I'd be unsurprised to see things just keep on retrying.
> I'd expect to see any substantial error handling in the driver for the
> device rather than in the controller.

Good point. In my specific case, this is coming from tpm_tis,
which is not aware that the device is totally dead, and continues to ask
for random numbers:

	tegra_qspi_transfer_one_message
	__spi_pump_transfer_message
	__spi_sync
	spi_sync
	tpm_tis_spi_transfer
	tpm_tis_spi_read_bytes
	tpm_tis_request_locality
	tpm_chip_start
	tpm_try_get_ops
	tpm_find_get_ops
	tpm_get_random
	tpm_hwrng_read
	hwrng_fillfn
	kthread
	ret_from_fork

Looking at tpm_tis, it seems it doesn't care if the SPI controller is
dead, and just forwards the requests, which never complete. Adding Arnd
to see if he has any idea about this.

Arnd,

Summary of the problem: tpm_tis is trying to read random numbers
through a dead SPI controller. That causes an endless stream of warnings
in the kernel log, given that the controller is WARNing on timeouts
(which is being fixed in one of the patches in this patchset).

Question: Should tpm_tis be aware that the underlying SPI controller is
dead, and eventually get unplugged?

> Obviously there's something wrong with the device description here which
> is upsetting the controller driver.
>
> > Regarding this patchset, I understand that patch #1 is not ideal as
> > discussed above, what about patch 2 and 3?
>
> If I didn't say anything they're probably fine.

Do you want me to resend those two separately, or is this thread
enough?

Thanks again,
--breno
On Tue, Mar 18, 2025 at 11:29:26AM -0700, Breno Leitao wrote: > On Tue, Mar 18, 2025 at 05:34:55PM +0000, Mark Brown wrote: > > On Tue, Mar 18, 2025 at 10:02:47AM -0700, Breno Leitao wrote: > > > > > Makes sense. Another question, for platforms like this one that doesn't > > > have the device reset methods, what can we do to stop the bleed? > > > > > Basically every message that is sent to the SPI controller will fail, > > > which will trigger the device_reet() which is a no-op, but the device > > > will continue to be online. Should we disable the device after some > > > point? > > > > The SPI controller is only going to be doing something because some > > driver for an attached SPI device is trying to do something. Presumably > > whatever driver that is won't be having a good time and can hopefully > > figure something out, though given that SPI is simple and not > > hotpluggable this isn't really something that comes up a lot in > > production so I'd be unsurprised to see things just keep on retrying. > > I'd expect to see any substantial error handling in the driver for the > > device rather than in the controller. > > Good point. In my specific case, this is coming from tpm_tis, > which is not aware that the device is totally dead, and continues to ask > for random numbers: > > tegra_qspi_transfer_one_message > __spi_pump_transfer_message > __spi_sync > spi_sync > tpm_tis_spi_transfer > tpm_tis_spi_read_bytes > tpm_tis_request_locality > tpm_chip_start > tpm_try_get_ops > tpm_find_get_ops > tpm_get_random > tpm_hwrng_read > hwrng_fillfn > kthread > ret_from_fork > > Looking at tpm_tis, it seems it doesn't care if the the SPI is dead, and > just forward through the requests, which never complete. Adding Arnd to > see if he has any idea about this. > > Arnd, > > Summary of the proiblem: tpm_tis is trying to read random numbers > through a dead SPI controller. 
That causes infinite amounts of warnings > on the kernel, given that the controller is WARNing on time outs (which > is being fixed in one of the patches in this patchset). > > Question: Should tpm_tis be aware that the underneath SPI controller is > dead, and eventually get unplugged? Adding Arnd to the email.
On Tue, Mar 18, 2025 at 11:29:26AM -0700, Breno Leitao wrote:
> On Tue, Mar 18, 2025 at 05:34:55PM +0000, Mark Brown wrote:
> > On Tue, Mar 18, 2025 at 10:02:47AM -0700, Breno Leitao wrote:

> > > Regarding this patchset, I understand that patch #1 is not ideal as
> > > discussed above, what about patch 2 and 3?

> > If I didn't say anything they're probably fine.

> Do you want me to resend those two separately, or, is this thread
> enough?

Please resend. I think I was anticipating a new version of this patch
with a clarified changelog and some rework to tone down the logging
that's generated similar to the other patches rather than just silently
ignoring the lack of a reset controller.
On Tue, Mar 18, 2025, at 19:32, Breno Leitao wrote:
> On Tue, Mar 18, 2025 at 11:29:26AM -0700, Breno Leitao wrote:
>> On Tue, Mar 18, 2025 at 05:34:55PM +0000, Mark Brown wrote:
>>
>> Summary of the problem: tpm_tis is trying to read random numbers
>> through a dead SPI controller. That causes infinite amounts of warnings
>> on the kernel, given that the controller is WARNing on time outs (which
>> is being fixed in one of the patches in this patchset).
>>
>> Question: Should tpm_tis be aware that the underneath SPI controller is
>> dead, and eventually get unplugged?
>
> Adding Arnd to the email.

Hi Breno,

That does sound like the easiest answer: if the spi controller driver
knows that it needs a reset but there is no reset controller, shutting
itself down and removing its child devices seems like the least
offensive action.

No idea if there are other spi controllers that do something like this.

Arnd
On Tue, Mar 18, 2025 at 06:35:18PM +0000, Mark Brown wrote:
> > Do you want me to resend those two separately, or, is this thread
> > enough?
>
> Please resend. I think I was anticipating a new version of this patch
> with a clarified changelog and some rework to tone down the logging
> that's generated similar to the other patches rather than just silently
> ignoring the lack of a reset controller.

Sorry, I am more than happy to change it the way you prefer, but the
"device reset failed" warnings are already printed only once.

Here are the calls to device_reset(), all of them paired with
`dev_warn_once()`:

	if (device_reset(tqspi->dev) < 0)
		dev_warn_once(tqspi->dev, "device reset failed\n");

and

	/* Reset controller if timeout happens */
	if (device_reset(tqspi->dev) < 0)
		dev_warn_once(tqspi->dev, "device reset failed\n");

So, this one is not very noisy. Should I change anything?

On the other hand, I see some other messages that are very noisy, being
displayed for every message that fails to go through. They are:

	spi_master spi0: failed to transfer one message from queue
	spi_master spi0: noqueue transfer failed

I will rate limit those as well.

Thanks for your direction,
--breno
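To illustrate the difference between a warn-once message and a rate-limited one, here is a small user-space sketch of windowed rate limiting. It is only a model of the general idea (allow a burst of messages per time window, suppress the rest), not the kernel's implementation; struct mock_ratelimit and mock_ratelimit_ok() are hypothetical names, and time is passed in as an arbitrary tick count so the example stays deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limiter state; NOT the kernel's struct ratelimit_state. */
struct mock_ratelimit {
	long interval;	/* window length, in arbitrary ticks */
	int burst;	/* messages allowed per window */
	long begin;	/* tick at which the current window started */
	int printed;	/* messages emitted in the current window */
};

/* Returns true if a message may be emitted at tick 'now'. */
static bool mock_ratelimit_ok(struct mock_ratelimit *rs, long now)
{
	assert(rs != NULL);

	/* Start a fresh window once the previous one has expired. */
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	return false;	/* over budget for this window: suppress */
}
```

Unlike a warn-once message, a limiter of this shape keeps reporting that failures are still happening, just at a bounded rate, which may be preferable for the per-transfer errors listed above.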
On Tue, Mar 18, 2025 at 08:00:05PM +0100, Arnd Bergmann wrote:
> That does sound like the easiest answer: if the spi controller driver
> knows that it needs a reset but there is no reset controller, shutting
> itself down and removing its child devices seems like the least
> offensive action.

In that case it's probably better to just refuse to probe in the first
place without the reset controller. Given that the device isn't working
at all it seems like the hardware description is broken anyway...

> No idea if there are other spi controllers that do something like this.

I'm really not thrilled about adding runtime error handling at that
level in the controller - it'll start to get into policy stuff if anyone
does something clever and realistically it's all broken hardware
description or very severe physical failure type stuff.
On Tue, Mar 18, 2025 at 12:08:56PM -0700, Breno Leitao wrote: > Sorry, I am more than happy to change it the way you prefer, but, the > warnings coming from "device reset failed" are already printed once: > Here are the instances of calls to device_reset(), all of them with > `dev_warn_once()`: Oh, in that case I guess just drop it (or based on Arnd's suggestion change the one in probe() to be fatal).
On Tue, Mar 18, 2025, at 20:13, Mark Brown wrote:
> On Tue, Mar 18, 2025 at 08:00:05PM +0100, Arnd Bergmann wrote:
>
>> That does sound like the easiest answer: if the spi controller driver
>> knows that it needs a reset but there is no reset controller, shutting
>> itself down and removing its child devices seems like the least
>> offensive action.
>
> In that case it's probably better to just refuse to probe in the first
> place without the reset controller. Given that the device isn't working
> at all it seems like the hardware description is broken anyway...

Right, I see now that it's doing a rather silly

	if (device_reset(tqspi->dev) < 0)
		dev_warn_once(tqspi->dev, "device reset failed\n");

after which it just continues instead of returning the error from the
probe function. This is also broken when the reset controller driver
has not been loaded yet and it should do an -EPROBE_DEFER.

In case of a broken ACPI table, this would simply fail the
probe() with an error, which seems like a sensible behavior.

Arnd
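The behaviour Arnd describes, returning the reset error from probe instead of warning and carrying on, can be sketched in user space as follows. This is an illustration only: mock_qspi_probe(), the mock_reset_fn callback type, and the three reset stubs are hypothetical names, and MOCK_EPROBE_DEFER simply mirrors the kernel's EPROBE_DEFER value so the example builds outside the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Mirrors the kernel's EPROBE_DEFER; defined here for user space. */
#define MOCK_EPROBE_DEFER 517

/* Hypothetical callback standing in for device_reset(dev). */
typedef int (*mock_reset_fn)(void);

/*
 * Sketch of a probe path that treats the reset as mandatory: any
 * failure, including a deferral request, is handed back to the caller
 * instead of being logged and ignored.
 */
static int mock_qspi_probe(mock_reset_fn reset)
{
	int ret;

	assert(reset != NULL);

	ret = reset();
	if (ret < 0)
		return ret;	/* covers -ENOENT and -EPROBE_DEFER alike */

	/* ... remaining controller setup would go here ... */
	return 0;
}

/* Three illustrative outcomes of the reset call. */
static int reset_ok(void)        { return 0; }
static int reset_missing(void)   { return -ENOENT; }
static int reset_not_ready(void) { return -MOCK_EPROBE_DEFER; }
```

In an actual driver the error return would more likely go through the kernel's dev_err_probe() helper, which also avoids logging noise for deferrals; whether that is the right change for this driver is what the rest of the thread discusses.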
Hello Arnd, Thierry, Jonathan, Sowjanya,

On Tue, Mar 18, 2025 at 09:07:28PM +0100, Arnd Bergmann wrote:
> On Tue, Mar 18, 2025, at 20:13, Mark Brown wrote:
> > On Tue, Mar 18, 2025 at 08:00:05PM +0100, Arnd Bergmann wrote:
> >
> >> That does sound like the easiest answer: if the spi controller driver
> >> knows that it needs a reset but there is no reset controller, shutting
> >> itself down and removing its child devices seems like the least
> >> offensive action.
> >
> > In that case it's probably better to just refuse to probe in the first
> > place without the reset controller. Given that the device isn't working
> > at all it seems like the hardware description is broken anyway...
>
> Right, I see now that it's doing a rather silly
>
> 	if (device_reset(tqspi->dev) < 0)
> 		dev_warn_once(tqspi->dev, "device reset failed\n");
>
> after which it just continues instead of returning the error from the
> probe function.

This would be another option, and I would be happy to update this patch
with this suggestion.

This patch was attempting to address the issue the other way around:
I was expecting the reset methods to be optional, and therefore switched
the driver to the optional variant of device_reset().

It appears that on certain UEFI machine types, the ACPI firmware doesn't
implement the _RST methods, and device_reset() will *always* fail. It's
unclear whether this is due to a broken ACPI table or if it was
intentionally designed this way.

Tagging the driver maintainers (Thierry, Jonathan, Sowjanya) who might
have a better understanding of the design in such cases.

> This is also broken when
> the reset controller driver has not been loaded yet and it
> should do an -EPROBE_DEFER.
>
> In case of a broken ACPI table, this would simply fail the
> probe() with an error, which seems like a sensible behavior.

Do we agree that the device reset methods MUST always exist (on both DT
and UEFI hosts)?

Anyway, from my naive view, we should either:

1) Mark the reset as required, and fail the probe, if device_reset()
   must have methods available. (Arnd's suggestion)

2) Mark device_reset as optional if device reset is optional (as the
   current situation suggests).

   a) If the requirements are different for DT and UEFI, then should we
      create a "device_reset_optional_on_acpi_but_not_DT()" helper to
      handle such cases(!?)

Thanks for the discussion,
--breno
On Tue, Mar 18, 2025 at 09:07:28PM +0100, Arnd Bergmann wrote:
> On Tue, Mar 18, 2025, at 20:13, Mark Brown wrote:

> > In that case it's probably more just refuse to probe in the first case
> > without the reset controller. Given that the device isn't working at
> > all it seems like the hardware description is broken anyway...

> Right, I see now that it's doing a rather silly
>
> 	if (device_reset(tqspi->dev) < 0)
> 		dev_warn_once(tqspi->dev, "device reset failed\n");

> after which it just continues instead of propagating returning
> the error from the probe function. This is also broken when
> the reset controller driver has not been loaded yet and it
> should do an -EPROBE_DEFER.

Modulo the probe deferral it does make a degree of sense in the probe
function since there's a reasonable chance things are in a reset state
by virtue of never having been touched since power on, you do see things
like this as a transition measure.

> In case of a broken ACPI table, this would simply fail the
> probe() with an error, which seems like a sensible behavior.

Yes. If we need to support these ACPI tables the driver will need to
learn how to get the hardware back into default state itself, assuming
that's possible and there's no FIFO clearing issues or anything.
On Wed, Mar 19, 2025 at 03:09:57AM -0700, Breno Leitao wrote: > Hello Arnd, Thierry, Jonathan, Sowjanya, > > On Tue, Mar 18, 2025 at 09:07:28PM +0100, Arnd Bergmann wrote: > > On Tue, Mar 18, 2025, at 20:13, Mark Brown wrote: > > > On Tue, Mar 18, 2025 at 08:00:05PM +0100, Arnd Bergmann wrote: > > > > > >> That does sound like the easiest answer: if the spi controller driver > > >> knows that it needs a reset but there is no reset controller, shutting > > >> itself down and removing its child devices seems like the least > > >> offensive action. > > > > > > In that case it's probably more just refuse to probe in the first case > > > without the reset controller. Given that the device isn't working at > > > all it seems like the hardware description is broken anyway... > > > > Right, I see now that it's doing a rather silly > > > > if (device_reset(tqspi->dev) < 0) > > dev_warn_once(tqspi->dev, "device reset failed\n"); > > > > after which it just continues instead of propagating returning > > the error from the probe function. > > This would be another option, and I would be happy to update this patch > with this suggestion. > > This patch was attempting to address the issue the other way around, > where I was expecting that the reset methods are optional, thus > marking the device_reset() function as optional. > > It appears that on certain UEFI machine types, the ACPI firmware doesn't > implement the _RST methods, and device_reset() will *always* fail. It's > unclear whether this is due to a broken ACPI table or if it was > intentionally designed this way. > > Tagging the driver maintainer (Thierry, Jonathan, Sowjanya) who might > have a better understanding of the design in such cases. Can you specify what device this is and what software you've been running (including firmware, L4T release, etc.)? I can try to find out if this is a known issue, or if it's even intended to be this way. 
> > This is also broken when > > the reset controller driver has not been loaded yet and it > > should do an -EPROBE_DEFER. > > > > In case of a broken ACPI table, this would simply fail the > > probe() with an error, which seems like a sensible behavior. > > Do we agree that the device reset methods MUST always exist (on both DT > and UEFI hosts)? > > Anyway, from my naive view, we should: > > 1) Mark as required, and fail the probe, if this device_reset() must > have available methods. (Arnd's suggestion) > > 2) Mark device_reset as optional if device reset is optional (as the > current situation suggest). > > a) If the requirements are different for DT and UEFI, then should we > create a "device_reset_optional_on_acpi_but_not_DT()" helper to > handle such cases(!?) I'm not very familiar with the ACPI side of things, but my recollection is that essentially ACPI talks to BPMP in the background, much the same way that we do using the BPMP driver if booted with a DT. I wouldn't expect there to be any functional differences, so the lack of _RST for this controller seems strange. Again, if you can provide a bit more information about the set up, I can try to find out more. Thanks, Thierry
Hello Thierry, On Wed, Mar 19, 2025 at 07:26:52PM +0100, Thierry Reding wrote: > On Wed, Mar 19, 2025 at 03:09:57AM -0700, Breno Leitao wrote: > > Hello Arnd, Thierry, Jonathan, Sowjanya, > > > > On Tue, Mar 18, 2025 at 09:07:28PM +0100, Arnd Bergmann wrote: > > > On Tue, Mar 18, 2025, at 20:13, Mark Brown wrote: > > > > On Tue, Mar 18, 2025 at 08:00:05PM +0100, Arnd Bergmann wrote: > > > > > > > >> That does sound like the easiest answer: if the spi controller driver > > > >> knows that it needs a reset but there is no reset controller, shutting > > > >> itself down and removing its child devices seems like the least > > > >> offensive action. > > > > > > > > In that case it's probably more just refuse to probe in the first case > > > > without the reset controller. Given that the device isn't working at > > > > all it seems like the hardware description is broken anyway... > > > > > > Right, I see now that it's doing a rather silly > > > > > > if (device_reset(tqspi->dev) < 0) > > > dev_warn_once(tqspi->dev, "device reset failed\n"); > > > > > > after which it just continues instead of propagating returning > > > the error from the probe function. > > > > This would be another option, and I would be happy to update this patch > > with this suggestion. > > > > This patch was attempting to address the issue the other way around, > > where I was expecting that the reset methods are optional, thus > > marking the device_reset() function as optional. > > > > It appears that on certain UEFI machine types, the ACPI firmware doesn't > > implement the _RST methods, and device_reset() will *always* fail. It's > > unclear whether this is due to a broken ACPI table or if it was > > intentionally designed this way. > > > > Tagging the driver maintainer (Thierry, Jonathan, Sowjanya) who might > > have a better understanding of the design in such cases. > > Can you specify what device this is and what software you've been > running (including firmware, L4T release, etc.)? 
> I can try to find out if this is a known issue, or if it's even
> intended to be this way.

This is running on a NVIDIA Grace arm64 host. Here are a few details
I collected, from ACPI tables and dmesg. If this is not enough, would
you mind helping me to find how to get the data you are looking for?

DSDT Table:

 * Original Table Header:
 *     Signature        "DSDT"
 *     Length           0x00001EC9 (7881)
 *     Revision         0x02
 *     Checksum         0x9B
 *     OEM ID           "NVIDIA"
 *     OEM Table ID     "TH500"
 *     OEM Revision     0x00000001 (1)
 *     Compiler ID      "INTL"
 *     Compiler Version 0x20220331 (539099953)
....
	Device (QSP1)
	{
	    Name (_HID, "NVDA1513")  // _HID: Hardware ID
	    Name (_UID, 0x01)  // _UID: Unique ID
	    Name (_CCA, 0x01)  // _CCA: Cache Coherency Attribute
	    Name (_STA, 0x0F)  // _STA: Status
	    Name (_CRS, ResourceTemplate ()  // _CRS: Current Resource Settings
	    {
	        Memory32Fixed (ReadWrite,
	            0x03250000,         // Address Base
	            0x00010000,         // Address Length
	            )
	        Interrupt (ResourceConsumer, Level, ActiveHigh, Exclusive, ,, )
	        {
	            0x0000003A,
	        }
	    })
	    Name (_DSD, Package (0x02)  // _DSD: Device-Specific Data
	    {
	        ToUUID ("daffd814-6eba-4d8c-8a91-bc9bbf4aa301") /* Device Properties for _DSD */,
	        Package (0x01)
	        {
	            Package (0x02)
	            {
	                "spi-max-frequency",
	                0x00989680
	            }
	        }
	    })
	}

dmesg:

# dmesg | grep -i ACPI
[ 0.000000] efi: RTPROP=0x3c5409e398 ACPI 2.0=0x3c44c7b018 SMBIOS 3.0=0x3c54095618 TPMFinalLog=0x3c42c00000 MEMATTR=0x3c4d60b018 ESRT=0x3c4fa6fb98 TPMEventLog=0x3c42907018 RNG=0x3c44c7aa98 MEMRESERVE=0x3c42904e98
[ 0.000000] ACPI: Early table checksum verification disabled
[ 0.000000] ACPI: RSDP 0x0000003C44C7B018 000024 (v02 NVIDIA)
[ 0.000000] ACPI: XSDT 0x0000003C44C7B098 00012C (v01 NVIDIA A M I 00000001 AMI 00000001)
[ 0.000000] ACPI: FACP 0x0000003C44C7AD98 000114 (v06 NVIDIA A M I 00000001 ARMH 00010000)
[ 0.000000] ACPI: DSDT 0x0000003C44C74018 001EC9 (v02 NVIDIA TH500 00000001 INTL 20220331)
[ 0.000000] ACPI: FIDT 0x0000003C44C7BF18 00009C (v01 ALASKA A M I 01072009 AMI 00010013)
[ 0.000000] ACPI: SSDT 0x0000003C44C7BB18 00015E (v02 ALASKA PRMOPREG 00001000
INTL 20220331) [ 0.000000] ACPI: SPMI 0x0000003C44C7BD98 000041 (v05 ALASKA A M I 00000000 AMI. 00000000) [ 0.000000] ACPI: FPDT 0x0000003C44C7BE18 000034 (v01 ALASKA T241c1 00000001 AMI 00000001) [ 0.000000] ACPI: PRMT 0x0000003C44C7BE98 00003C (v00 ALASKA A M I 00000001 AMI 00000001) [ 0.000000] ACPI: SDEI 0x0000003C44C7A018 000024 (v01 NVIDIA A M I 00000001 NVDA 00000001) [ 0.000000] ACPI: HEST 0x0000003C44C78018 001054 (v01 NVIDIA A M I 00000001 NVDA 00000001) [ 0.000000] ACPI: BERT 0x0000003C44C7AF18 000030 (v01 NVIDIA A M I 00000001 NVDA 00000001) [ 0.000000] ACPI: EINJ 0x0000003C44C7AB18 000170 (v01 NVIDIA A M I 00000001 NVDA 00000001) [ 0.000000] ACPI: ERST 0x0000003C44C7A098 000290 (v01 NVIDIA A M I 00000001 NVDA 00000001) [ 0.000000] ACPI: GTDT 0x0000003C44C7A498 000084 (v03 NVIDIA A M I 00000001 ARMH 00010000) [ 0.000000] ACPI: APIC 0x0000003C44C76018 001778 (v06 NVIDIA A M I 00000001 ARMH 00010000) [ 0.000000] ACPI: PPTT 0x0000003C44C71018 0020F8 (v03 NVIDIA A M I 00000001 ARMH 00010000) [ 0.000000] ACPI: SSDT 0x0000003C42BA9018 006CA7 (v02 NVIDIA A M I 00000001 ARMH 00010000) [ 0.000000] ACPI: SPCR 0x0000003C44C7AF98 000050 (v02 NVIDIA A M I 00000001 ARMH 00010000) [ 0.000000] ACPI: SSDT 0x0000003C44C7A998 0000CC (v02 NVIDIA A M I 00000001 INTL 20220331) [ 0.000000] ACPI: SSDT 0x0000003C44C7A598 0000C0 (v02 NVIDIA TH500 00000001 INTL 20220331) [ 0.000000] ACPI: SSDT 0x0000003C42BA7018 00190C (v02 NVIDIA BPMP_S0 00000001 INTL 20220331) [ 0.000000] ACPI: MCFG 0x0000003C44C7A698 00007C (v01 NVIDIA A M I 00000001 ARMH 00010000) [ 0.000000] ACPI: SSDT 0x0000003C44C70018 0003AE (v02 NVIDIA A M I 00000001 ARMH 00010000) [ 0.000000] ACPI: SSDT 0x0000003C44C70418 0003AE (v02 NVIDIA A M I 00000001 ARMH 00010000) [ 0.000000] ACPI: SSDT 0x0000003C44C70818 0003AF (v02 NVIDIA A M I 00000001 ARMH 00010000) [ 0.000000] ACPI: SSDT 0x0000003C42BA6018 0003AF (v02 NVIDIA A M I 00000001 ARMH 00010000) [ 0.000000] ACPI: SSDT 0x0000003C42BA6418 0003AF (v02 NVIDIA A M I 
00000001 ARMH 00010000) [ 0.000000] ACPI: IORT 0x0000003C42BA5018 00072B (v06 NVIDIA A M I 00000001 ARMH 00010000) [ 0.000000] ACPI: APMT 0x0000003C42BA5E98 00013C (v00 ALASKA A M I 00000001 AMI 00000001) [ 0.000000] ACPI: MPAM 0x0000003C42BA6E98 000084 (v01 NVIDIA A M I 00000001 ARMH 00010000) [ 0.000000] ACPI: SLIT 0x0000003C42BA6F98 00002D (v01 ALASKA A M I 00000001 AMI 00000001) [ 0.000000] ACPI: SRAT 0x0000003C42BA6818 000574 (v03 NVIDIA A M I 00000001 ARMH 00010000) [ 0.000000] ACPI: HMAT 0x0000003C44C70E98 0000A6 (v02 ALASKA A M I 00000001 AMI 00000001) [ 0.000000] ACPI: WSMT 0x0000003C44C70F98 000028 (v01 ALASKA A M I 00000001 AMI 00000001) [ 0.000000] ACPI: TPM2 0x0000003C44C7A918 00004C (v04 ALASKA A M I 00000001 AMI 00000000) [ 0.000000] ACPI: SPCR: console: pl011,mmio32,0xc280000,115200 [ 0.000000] ACPI: Use ACPI SPCR as default console: Yes [ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x80000000-0x3c7fffffff] [ 0.000000] psci: probing for conduit method from ACPI. [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x20000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x40000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x50000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x60000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x70000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x80000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x90000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0xa0000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0xb0000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0xe0000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x100000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x110000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x120000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x130000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x140000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x150000 -> 
Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x160000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x170000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x180000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x190000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x1a0000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x1c0000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x1d0000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x1e0000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x1f0000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x200000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x210000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x220000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x230000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x240000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x250000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x260000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x270000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x280000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x290000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x2a0000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x2b0000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x2c0000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x2d0000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x2e0000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x2f0000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x300000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x310000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x320000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x330000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x340000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x350000 -> Node 0 [ 
0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x360000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x370000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x380000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x3a0000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x3b0000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x3c0000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x3d0000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x3e0000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x3f0000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x400000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x410000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x420000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x430000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x440000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x460000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x480000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x490000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x4a0000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x4b0000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x4c0000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x4d0000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x4e0000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x4f0000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x500000 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x510000 -> Node 0 [ 0.000320] ACPI: Core revision 20240827 [ 0.420784] ACPI: Added _OSI(Module Device) [ 0.420785] ACPI: Added _OSI(Processor Device) [ 0.420785] ACPI: Added _OSI(3.0 _SCP Extensions) [ 0.420786] ACPI: Added _OSI(Processor Aggregator Device) [ 0.421939] ACPI: 11 ACPI AML tables successfully acquired and loaded [ 0.422284] ACPI: Interpreter enabled [ 0.422286] ACPI: Using GIC for interrupt routing [ 
0.422296] ACPI: MCFG table detected, 5 entries [ 0.422732] ACPI: IORT: SMMU-v3[11000000] Mapped to Proximity domain 0 [ 0.422746] ACPI: IORT: SMMU-v3[12000000] Mapped to Proximity domain 0 [ 0.422757] ACPI: IORT: SMMU-v3[15000000] Mapped to Proximity domain 0 [ 0.422767] ACPI: IORT: SMMU-v3[16000000] Mapped to Proximity domain 0 [ 0.422780] ACPI: IORT: SMMU-v3[5000000] Mapped to Proximity domain 0 [ 0.425981] ACPI: CPU0 has been hot-added [ 0.426000] ACPI: CPU1 has been hot-added [ 0.426017] ACPI: CPU2 has been hot-added [ 0.426034] ACPI: CPU3 has been hot-added [ 0.426050] ACPI: CPU4 has been hot-added [ 0.426066] ACPI: CPU5 has been hot-added [ 0.426083] ACPI: CPU6 has been hot-added [ 0.426099] ACPI: CPU7 has been hot-added [ 0.426117] ACPI: CPU8 has been hot-added [ 0.426132] ACPI: CPU9 has been hot-added [ 0.426149] ACPI: CPU10 has been hot-added [ 0.426164] ACPI: CPU11 has been hot-added [ 0.426181] ACPI: CPU12 has been hot-added [ 0.426196] ACPI: CPU13 has been hot-added [ 0.426212] ACPI: CPU14 has been hot-added [ 0.426228] ACPI: CPU15 has been hot-added [ 0.426245] ACPI: CPU16 has been hot-added [ 0.426262] ACPI: CPU17 has been hot-added [ 0.426278] ACPI: CPU18 has been hot-added [ 0.426295] ACPI: CPU19 has been hot-added [ 0.426310] ACPI: CPU20 has been hot-added [ 0.426327] ACPI: CPU21 has been hot-added [ 0.426342] ACPI: CPU22 has been hot-added [ 0.426358] ACPI: CPU23 has been hot-added [ 0.426374] ACPI: CPU24 has been hot-added [ 0.426390] ACPI: CPU25 has been hot-added [ 0.426406] ACPI: CPU26 has been hot-added [ 0.426421] ACPI: CPU27 has been hot-added [ 0.426438] ACPI: CPU28 has been hot-added [ 0.426453] ACPI: CPU29 has been hot-added [ 0.426470] ACPI: CPU30 has been hot-added [ 0.426485] ACPI: CPU31 has been hot-added [ 0.426502] ACPI: CPU32 has been hot-added [ 0.426517] ACPI: CPU33 has been hot-added [ 0.426533] ACPI: CPU34 has been hot-added [ 0.426549] ACPI: CPU35 has been hot-added [ 0.426571] ACPI: CPU36 has been hot-added [ 0.426587] ACPI: 
CPU37 has been hot-added [ 0.426602] ACPI: CPU38 has been hot-added [ 0.426620] ACPI: CPU39 has been hot-added [ 0.426640] ACPI: CPU40 has been hot-added [ 0.426657] ACPI: CPU41 has been hot-added [ 0.426672] ACPI: CPU42 has been hot-added [ 0.426688] ACPI: CPU43 has been hot-added [ 0.426706] ACPI: CPU44 has been hot-added [ 0.426721] ACPI: CPU45 has been hot-added [ 0.426737] ACPI: CPU46 has been hot-added [ 0.426752] ACPI: CPU47 has been hot-added [ 0.426769] ACPI: CPU48 has been hot-added [ 0.426785] ACPI: CPU49 has been hot-added [ 0.426802] ACPI: CPU50 has been hot-added [ 0.426818] ACPI: CPU51 has been hot-added [ 0.426834] ACPI: CPU52 has been hot-added [ 0.426849] ACPI: CPU53 has been hot-added [ 0.426865] ACPI: CPU54 has been hot-added [ 0.426882] ACPI: CPU55 has been hot-added [ 0.426898] ACPI: CPU56 has been hot-added [ 0.426914] ACPI: CPU57 has been hot-added [ 0.426929] ACPI: CPU58 has been hot-added [ 0.426945] ACPI: CPU59 has been hot-added [ 0.426961] ACPI: CPU60 has been hot-added [ 0.426978] ACPI: CPU61 has been hot-added [ 0.426994] ACPI: CPU62 has been hot-added [ 0.427009] ACPI: CPU63 has been hot-added [ 0.427025] ACPI: CPU64 has been hot-added [ 0.427042] ACPI: CPU65 has been hot-added [ 0.427058] ACPI: CPU66 has been hot-added [ 0.427073] ACPI: CPU67 has been hot-added [ 0.427090] ACPI: CPU68 has been hot-added [ 0.427105] ACPI: CPU69 has been hot-added [ 0.427121] ACPI: CPU70 has been hot-added [ 0.427137] ACPI: CPU71 has been hot-added [ 0.427213] ACPI: PCI Root Bridge [PCI2] (domain 0002 [bus 00-ff]) [ 0.427217] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3] [ 0.427237] acpi PNP0A08:00: _OSC: platform does not support [PME AER DPC] [ 0.427257] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug PCIeCapability LTR] [ 0.427372] acpi PNP0A08:00: ECAM area [mem 0x610010000000-0x61001fffffff] reserved by PNP0C02:01 [ 0.427383] acpi PNP0A08:00: ECAM at [mem 0x610010000000-0x61001fffffff] for [bus 
00-ff] [ 0.427389] ACPI: Remapped I/O 0x0000610020000000 to [io 0x0000-0xffff window] [ 0.427887] ACPI: PCI Root Bridge [PCI3] (domain 0003 [bus 00-ff]) [ 0.427888] acpi PNP0A08:01: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3] [ 0.427903] acpi PNP0A08:01: _OSC: platform does not support [PME AER DPC] [ 0.427921] acpi PNP0A08:01: _OSC: OS now controls [PCIeHotplug PCIeCapability LTR] [ 0.428032] acpi PNP0A08:01: ECAM area [mem 0x618010000000-0x61801fffffff] reserved by PNP0C02:02 [ 0.428039] acpi PNP0A08:01: ECAM at [mem 0x618010000000-0x61801fffffff] for [bus 00-ff] [ 0.428043] ACPI: Remapped I/O 0x0000618020000000 to [io 0x10000-0x1ffff window] [ 0.428358] ACPI: PCI Root Bridge [PCI6] (domain 0006 [bus 00-ff]) [ 0.428359] acpi PNP0A08:02: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3] [ 0.428370] acpi PNP0A08:02: _OSC: platform does not support [PME AER DPC] [ 0.428388] acpi PNP0A08:02: _OSC: OS now controls [PCIeHotplug PCIeCapability LTR] [ 0.428498] acpi PNP0A08:02: ECAM area [mem 0x630010000000-0x63001fffffff] reserved by PNP0C02:03 [ 0.428505] acpi PNP0A08:02: ECAM at [mem 0x630010000000-0x63001fffffff] for [bus 00-ff] [ 0.428508] ACPI: Remapped I/O 0x0000630020000000 to [io 0x20000-0x2ffff window] [ 0.428907] ACPI: PCI Root Bridge [PCI7] (domain 0007 [bus 00-ff]) [ 0.428908] acpi PNP0A08:03: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3] [ 0.428920] acpi PNP0A08:03: _OSC: platform does not support [PME AER DPC] [ 0.428937] acpi PNP0A08:03: _OSC: OS now controls [PCIeHotplug PCIeCapability LTR] [ 0.429049] acpi PNP0A08:03: ECAM area [mem 0x640010000000-0x64001fffffff] reserved by PNP0C02:04 [ 0.429054] acpi PNP0A08:03: ECAM at [mem 0x640010000000-0x64001fffffff] for [bus 00-ff] [ 0.429057] ACPI: Remapped I/O 0x0000640020000000 to [io 0x30000-0x3ffff window] [ 0.429354] ACPI: PCI Root Bridge [PCI8] (domain 0008 [bus 00-ff]) [ 0.429355] acpi PNP0A08:04: _OSC: OS supports 
[ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3] [ 0.429365] acpi PNP0A08:04: _OSC: platform does not support [PME AER DPC] [ 0.429382] acpi PNP0A08:04: _OSC: OS now controls [PCIeHotplug PCIeCapability LTR] [ 0.429496] acpi PNP0A08:04: ECAM area [mem 0x650010000000-0x65001fffffff] reserved by PNP0C02:05 [ 0.429501] acpi PNP0A08:04: ECAM at [mem 0x650010000000-0x65001fffffff] for [bus 00-ff] [ 0.429504] ACPI: Remapped I/O 0x0000650020000000 to [io 0x40000-0x4ffff window] [ 0.431221] ACPI: bus type USB registered [ 0.432569] pnp: PnP ACPI init [ 0.432766] pnp: PnP ACPI: found 5 devices [ 0.482111] ACPI: thermal: Thermal Zone [TZ00] (43 C) [ 0.482392] ACPI: thermal: Thermal Zone [TZ01] (41 C) [ 0.482663] ACPI: thermal: Thermal Zone [TZ02] (41 C) [ 0.482937] ACPI: thermal: Thermal Zone [TZ03] (43 C) [ 0.483204] ACPI: thermal: Thermal Zone [TZ04] (40 C) [ 0.483473] ACPI: thermal: Thermal Zone [TZ05] (40 C) [ 0.483740] ACPI: thermal: Thermal Zone [TZ06] (40 C) [ 0.484007] ACPI: thermal: Thermal Zone [TZ07] (37 C) [ 0.484272] ACPI: thermal: Thermal Zone [TZ08] (32 C) [ 0.484558] ACPI: thermal: Thermal Zone [TZ09] (43 C) [ 0.484825] ACPI: thermal: Thermal Zone [TZ0A] (31 C) [ 0.485092] ACPI: thermal: Thermal Zone [TZ0B] (32 C) [ 0.485139] ACPI: thermal: Thermal Zone [TZL0] (57 C) [ 0.485257] ACPI GTDT: found 1 SBSA generic Watchdog(s). [ 0.835253] power_meter ACPI000D:01: Found ACPI power meter. [ 0.835269] power_meter ACPI000D:01: hwmon_device_register() is deprecated. Please convert the driver to use hwmon_device_register_with_info(). [ 0.835301] power_meter ACPI000D:02: Found ACPI power meter. [ 0.835309] power_meter ACPI000D:02: hwmon_device_register() is deprecated. Please convert the driver to use hwmon_device_register_with_info(). [ 0.835325] power_meter ACPI000D:03: Found ACPI power meter. [ 0.835332] power_meter ACPI000D:03: hwmon_device_register() is deprecated. Please convert the driver to use hwmon_device_register_with_info(). 
[ 1.667708] Modules linked in: spi_tegra210_quad acpi_power_meter loop efivarfs autofs4
[ 2.451653] ACPI: bus type drm_connector registered

Thanks for helping us here,
Breno
On Wed, Mar 19, 2025 at 11:53:53AM -0700, Breno Leitao wrote:
> Hello Thierry,
>
> On Wed, Mar 19, 2025 at 07:26:52PM +0100, Thierry Reding wrote:
> > On Wed, Mar 19, 2025 at 03:09:57AM -0700, Breno Leitao wrote:
> > > Hello Arnd, Thierry, Jonathan, Sowjanya,
> > >
> > > On Tue, Mar 18, 2025 at 09:07:28PM +0100, Arnd Bergmann wrote:
> > > > On Tue, Mar 18, 2025, at 20:13, Mark Brown wrote:
> > > > > On Tue, Mar 18, 2025 at 08:00:05PM +0100, Arnd Bergmann wrote:
> > > > >
> > > > >> That does sound like the easiest answer: if the spi controller driver
> > > > >> knows that it needs a reset but there is no reset controller, shutting
> > > > >> itself down and removing its child devices seems like the least
> > > > >> offensive action.
> > > > >
> > > > > In that case it's probably more just refuse to probe in the first case
> > > > > without the reset controller. Given that the device isn't working at
> > > > > all it seems like the hardware description is broken anyway...
> > > >
> > > > Right, I see now that it's doing a rather silly
> > > >
> > > >     if (device_reset(tqspi->dev) < 0)
> > > >             dev_warn_once(tqspi->dev, "device reset failed\n");
> > > >
> > > > after which it just continues instead of propagating returning
> > > > the error from the probe function.
> > >
> > > This would be another option, and I would be happy to update this patch
> > > with this suggestion.
> > >
> > > This patch was attempting to address the issue the other way around,
> > > where I was expecting that the reset methods are optional, thus
> > > marking the device_reset() function as optional.
> > >
> > > It appears that on certain UEFI machine types, the ACPI firmware doesn't
> > > implement the _RST methods, and device_reset() will *always* fail. It's
> > > unclear whether this is due to a broken ACPI table or if it was
> > > intentionally designed this way.
> > >
> > > Tagging the driver maintainer (Thierry, Jonathan, Sowjanya) who might
> > > have a better understanding of the design in such cases.
> >
> > Can you specify what device this is and what software you've been
> > running (including firmware, L4T release, etc.)? I can try to find out
> > if this is a known issue, or if it's even intended to be this way.
>
> This is running on a NVIDIA Grace arm64 host.
>
> Here are a few details I collected, from ACPI tables and dmesg. If this
> is not enough, would you mind helping me to find how to get the data you
> are looking for?
>
> DSDT Table:
>
>  * Original Table Header:
>  *     Signature        "DSDT"
>  *     Length           0x00001EC9 (7881)
>  *     Revision         0x02
>  *     Checksum         0x9B
>  *     OEM ID           "NVIDIA"
>  *     OEM Table ID     "TH500"
>  *     OEM Revision     0x00000001 (1)
>  *     Compiler ID      "INTL"
>  *     Compiler Version 0x20220331 (539099953)
>
> ....
> Device (QSP1)
> {
>     Name (_HID, "NVDA1513")  // _HID: Hardware ID
>     Name (_UID, 0x01)  // _UID: Unique ID
>     Name (_CCA, 0x01)  // _CCA: Cache Coherency Attribute
>     Name (_STA, 0x0F)  // _STA: Status
>     Name (_CRS, ResourceTemplate ()  // _CRS: Current Resource Settings
>     {
>         Memory32Fixed (ReadWrite,
>             0x03250000,         // Address Base
>             0x00010000,         // Address Length
>             )
>         Interrupt (ResourceConsumer, Level, ActiveHigh, Exclusive, ,, )
>         {
>             0x0000003A,
>         }
>     })
>     Name (_DSD, Package (0x02)  // _DSD: Device-Specific Data
>     {
>         ToUUID ("daffd814-6eba-4d8c-8a91-bc9bbf4aa301") /* Device Properties for _DSD */,
>         Package (0x01)
>         {
>             Package (0x02)
>             {
>                 "spi-max-frequency",
>                 0x00989680
>             }
>         }
>     })
> }
>
> dmesg:
>
> # dmesg | grep -i ACPI
> [ 0.000000] efi: RTPROP=0x3c5409e398 ACPI 2.0=0x3c44c7b018 SMBIOS 3.0=0x3c54095618 TPMFinalLog=0x3c42c00000 MEMATTR=0x3c4d60b018 ESRT=0x3c4fa6fb98 TPMEventLog=0x3c42907018 RNG=0x3c44c7aa98 MEMRESERVE=0x3c42904e98
> [ 0.000000] ACPI: Early table checksum verification disabled
> [ 0.000000] ACPI: RSDP 0x0000003C44C7B018 000024 (v02 NVIDIA)
> [ 0.000000] ACPI: XSDT 0x0000003C44C7B098 00012C
(v01 NVIDIA A M I 00000001 AMI 00000001) > [ 0.000000] ACPI: FACP 0x0000003C44C7AD98 000114 (v06 NVIDIA A M I 00000001 ARMH 00010000) > [ 0.000000] ACPI: DSDT 0x0000003C44C74018 001EC9 (v02 NVIDIA TH500 00000001 INTL 20220331) > [ 0.000000] ACPI: FIDT 0x0000003C44C7BF18 00009C (v01 ALASKA A M I 01072009 AMI 00010013) > [ 0.000000] ACPI: SSDT 0x0000003C44C7BB18 00015E (v02 ALASKA PRMOPREG 00001000 INTL 20220331) > [ 0.000000] ACPI: SPMI 0x0000003C44C7BD98 000041 (v05 ALASKA A M I 00000000 AMI. 00000000) > [ 0.000000] ACPI: FPDT 0x0000003C44C7BE18 000034 (v01 ALASKA T241c1 00000001 AMI 00000001) > [ 0.000000] ACPI: PRMT 0x0000003C44C7BE98 00003C (v00 ALASKA A M I 00000001 AMI 00000001) > [ 0.000000] ACPI: SDEI 0x0000003C44C7A018 000024 (v01 NVIDIA A M I 00000001 NVDA 00000001) > [ 0.000000] ACPI: HEST 0x0000003C44C78018 001054 (v01 NVIDIA A M I 00000001 NVDA 00000001) > [ 0.000000] ACPI: BERT 0x0000003C44C7AF18 000030 (v01 NVIDIA A M I 00000001 NVDA 00000001) > [ 0.000000] ACPI: EINJ 0x0000003C44C7AB18 000170 (v01 NVIDIA A M I 00000001 NVDA 00000001) > [ 0.000000] ACPI: ERST 0x0000003C44C7A098 000290 (v01 NVIDIA A M I 00000001 NVDA 00000001) > [ 0.000000] ACPI: GTDT 0x0000003C44C7A498 000084 (v03 NVIDIA A M I 00000001 ARMH 00010000) > [ 0.000000] ACPI: APIC 0x0000003C44C76018 001778 (v06 NVIDIA A M I 00000001 ARMH 00010000) > [ 0.000000] ACPI: PPTT 0x0000003C44C71018 0020F8 (v03 NVIDIA A M I 00000001 ARMH 00010000) > [ 0.000000] ACPI: SSDT 0x0000003C42BA9018 006CA7 (v02 NVIDIA A M I 00000001 ARMH 00010000) > [ 0.000000] ACPI: SPCR 0x0000003C44C7AF98 000050 (v02 NVIDIA A M I 00000001 ARMH 00010000) > [ 0.000000] ACPI: SSDT 0x0000003C44C7A998 0000CC (v02 NVIDIA A M I 00000001 INTL 20220331) > [ 0.000000] ACPI: SSDT 0x0000003C44C7A598 0000C0 (v02 NVIDIA TH500 00000001 INTL 20220331) > [ 0.000000] ACPI: SSDT 0x0000003C42BA7018 00190C (v02 NVIDIA BPMP_S0 00000001 INTL 20220331) > [ 0.000000] ACPI: MCFG 0x0000003C44C7A698 00007C (v01 NVIDIA A M I 00000001 ARMH 00010000) > 
[ 0.000000] ACPI: SSDT 0x0000003C44C70018 0003AE (v02 NVIDIA A M I 00000001 ARMH 00010000) > [ 0.000000] ACPI: SSDT 0x0000003C44C70418 0003AE (v02 NVIDIA A M I 00000001 ARMH 00010000) > [ 0.000000] ACPI: SSDT 0x0000003C44C70818 0003AF (v02 NVIDIA A M I 00000001 ARMH 00010000) > [ 0.000000] ACPI: SSDT 0x0000003C42BA6018 0003AF (v02 NVIDIA A M I 00000001 ARMH 00010000) > [ 0.000000] ACPI: SSDT 0x0000003C42BA6418 0003AF (v02 NVIDIA A M I 00000001 ARMH 00010000) > [ 0.000000] ACPI: IORT 0x0000003C42BA5018 00072B (v06 NVIDIA A M I 00000001 ARMH 00010000) > [ 0.000000] ACPI: APMT 0x0000003C42BA5E98 00013C (v00 ALASKA A M I 00000001 AMI 00000001) > [ 0.000000] ACPI: MPAM 0x0000003C42BA6E98 000084 (v01 NVIDIA A M I 00000001 ARMH 00010000) > [ 0.000000] ACPI: SLIT 0x0000003C42BA6F98 00002D (v01 ALASKA A M I 00000001 AMI 00000001) > [ 0.000000] ACPI: SRAT 0x0000003C42BA6818 000574 (v03 NVIDIA A M I 00000001 ARMH 00010000) > [ 0.000000] ACPI: HMAT 0x0000003C44C70E98 0000A6 (v02 ALASKA A M I 00000001 AMI 00000001) > [ 0.000000] ACPI: WSMT 0x0000003C44C70F98 000028 (v01 ALASKA A M I 00000001 AMI 00000001) > [ 0.000000] ACPI: TPM2 0x0000003C44C7A918 00004C (v04 ALASKA A M I 00000001 AMI 00000000) > [ 0.000000] ACPI: SPCR: console: pl011,mmio32,0xc280000,115200 > [ 0.000000] ACPI: Use ACPI SPCR as default console: Yes > [ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x80000000-0x3c7fffffff] > [ 0.000000] psci: probing for conduit method from ACPI. 
> [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x20000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x40000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x50000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x60000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x70000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x80000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x90000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0xa0000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0xb0000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0xe0000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x100000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x110000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x120000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x130000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x140000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x150000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x160000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x170000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x180000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x190000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x1a0000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x1c0000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x1d0000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x1e0000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x1f0000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x200000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x210000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x220000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x230000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x240000 -> Node 0 > [ 0.000000] ACPI: NUMA: 
SRAT: PXM 0 -> MPIDR 0x250000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x260000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x270000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x280000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x290000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x2a0000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x2b0000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x2c0000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x2d0000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x2e0000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x2f0000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x300000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x310000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x320000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x330000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x340000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x350000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x360000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x370000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x380000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x3a0000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x3b0000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x3c0000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x3d0000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x3e0000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x3f0000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x400000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x410000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x420000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x430000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> 
MPIDR 0x440000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x460000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x480000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x490000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x4a0000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x4b0000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x4c0000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x4d0000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x4e0000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x4f0000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x500000 -> Node 0 > [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x510000 -> Node 0 > [ 0.000320] ACPI: Core revision 20240827 > [ 0.420784] ACPI: Added _OSI(Module Device) > [ 0.420785] ACPI: Added _OSI(Processor Device) > [ 0.420785] ACPI: Added _OSI(3.0 _SCP Extensions) > [ 0.420786] ACPI: Added _OSI(Processor Aggregator Device) > [ 0.421939] ACPI: 11 ACPI AML tables successfully acquired and loaded > [ 0.422284] ACPI: Interpreter enabled > [ 0.422286] ACPI: Using GIC for interrupt routing > [ 0.422296] ACPI: MCFG table detected, 5 entries > [ 0.422732] ACPI: IORT: SMMU-v3[11000000] Mapped to Proximity domain 0 > [ 0.422746] ACPI: IORT: SMMU-v3[12000000] Mapped to Proximity domain 0 > [ 0.422757] ACPI: IORT: SMMU-v3[15000000] Mapped to Proximity domain 0 > [ 0.422767] ACPI: IORT: SMMU-v3[16000000] Mapped to Proximity domain 0 > [ 0.422780] ACPI: IORT: SMMU-v3[5000000] Mapped to Proximity domain 0 > [ 0.425981] ACPI: CPU0 has been hot-added > [ 0.426000] ACPI: CPU1 has been hot-added > [ 0.426017] ACPI: CPU2 has been hot-added > [ 0.426034] ACPI: CPU3 has been hot-added > [ 0.426050] ACPI: CPU4 has been hot-added > [ 0.426066] ACPI: CPU5 has been hot-added > [ 0.426083] ACPI: CPU6 has been hot-added > [ 0.426099] ACPI: CPU7 has been hot-added > [ 0.426117] ACPI: CPU8 has been hot-added > [ 0.426132] ACPI: 
CPU9 has been hot-added > [ 0.426149] ACPI: CPU10 has been hot-added > [ 0.426164] ACPI: CPU11 has been hot-added > [ 0.426181] ACPI: CPU12 has been hot-added > [ 0.426196] ACPI: CPU13 has been hot-added > [ 0.426212] ACPI: CPU14 has been hot-added > [ 0.426228] ACPI: CPU15 has been hot-added > [ 0.426245] ACPI: CPU16 has been hot-added > [ 0.426262] ACPI: CPU17 has been hot-added > [ 0.426278] ACPI: CPU18 has been hot-added > [ 0.426295] ACPI: CPU19 has been hot-added > [ 0.426310] ACPI: CPU20 has been hot-added > [ 0.426327] ACPI: CPU21 has been hot-added > [ 0.426342] ACPI: CPU22 has been hot-added > [ 0.426358] ACPI: CPU23 has been hot-added > [ 0.426374] ACPI: CPU24 has been hot-added > [ 0.426390] ACPI: CPU25 has been hot-added > [ 0.426406] ACPI: CPU26 has been hot-added > [ 0.426421] ACPI: CPU27 has been hot-added > [ 0.426438] ACPI: CPU28 has been hot-added > [ 0.426453] ACPI: CPU29 has been hot-added > [ 0.426470] ACPI: CPU30 has been hot-added > [ 0.426485] ACPI: CPU31 has been hot-added > [ 0.426502] ACPI: CPU32 has been hot-added > [ 0.426517] ACPI: CPU33 has been hot-added > [ 0.426533] ACPI: CPU34 has been hot-added > [ 0.426549] ACPI: CPU35 has been hot-added > [ 0.426571] ACPI: CPU36 has been hot-added > [ 0.426587] ACPI: CPU37 has been hot-added > [ 0.426602] ACPI: CPU38 has been hot-added > [ 0.426620] ACPI: CPU39 has been hot-added > [ 0.426640] ACPI: CPU40 has been hot-added > [ 0.426657] ACPI: CPU41 has been hot-added > [ 0.426672] ACPI: CPU42 has been hot-added > [ 0.426688] ACPI: CPU43 has been hot-added > [ 0.426706] ACPI: CPU44 has been hot-added > [ 0.426721] ACPI: CPU45 has been hot-added > [ 0.426737] ACPI: CPU46 has been hot-added > [ 0.426752] ACPI: CPU47 has been hot-added > [ 0.426769] ACPI: CPU48 has been hot-added > [ 0.426785] ACPI: CPU49 has been hot-added > [ 0.426802] ACPI: CPU50 has been hot-added > [ 0.426818] ACPI: CPU51 has been hot-added > [ 0.426834] ACPI: CPU52 has been hot-added > [ 0.426849] ACPI: CPU53 has been 
hot-added > [ 0.426865] ACPI: CPU54 has been hot-added > [ 0.426882] ACPI: CPU55 has been hot-added > [ 0.426898] ACPI: CPU56 has been hot-added > [ 0.426914] ACPI: CPU57 has been hot-added > [ 0.426929] ACPI: CPU58 has been hot-added > [ 0.426945] ACPI: CPU59 has been hot-added > [ 0.426961] ACPI: CPU60 has been hot-added > [ 0.426978] ACPI: CPU61 has been hot-added > [ 0.426994] ACPI: CPU62 has been hot-added > [ 0.427009] ACPI: CPU63 has been hot-added > [ 0.427025] ACPI: CPU64 has been hot-added > [ 0.427042] ACPI: CPU65 has been hot-added > [ 0.427058] ACPI: CPU66 has been hot-added > [ 0.427073] ACPI: CPU67 has been hot-added > [ 0.427090] ACPI: CPU68 has been hot-added > [ 0.427105] ACPI: CPU69 has been hot-added > [ 0.427121] ACPI: CPU70 has been hot-added > [ 0.427137] ACPI: CPU71 has been hot-added > [ 0.427213] ACPI: PCI Root Bridge [PCI2] (domain 0002 [bus 00-ff]) > [ 0.427217] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3] > [ 0.427237] acpi PNP0A08:00: _OSC: platform does not support [PME AER DPC] > [ 0.427257] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug PCIeCapability LTR] > [ 0.427372] acpi PNP0A08:00: ECAM area [mem 0x610010000000-0x61001fffffff] reserved by PNP0C02:01 > [ 0.427383] acpi PNP0A08:00: ECAM at [mem 0x610010000000-0x61001fffffff] for [bus 00-ff] > [ 0.427389] ACPI: Remapped I/O 0x0000610020000000 to [io 0x0000-0xffff window] > [ 0.427887] ACPI: PCI Root Bridge [PCI3] (domain 0003 [bus 00-ff]) > [ 0.427888] acpi PNP0A08:01: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3] > [ 0.427903] acpi PNP0A08:01: _OSC: platform does not support [PME AER DPC] > [ 0.427921] acpi PNP0A08:01: _OSC: OS now controls [PCIeHotplug PCIeCapability LTR] > [ 0.428032] acpi PNP0A08:01: ECAM area [mem 0x618010000000-0x61801fffffff] reserved by PNP0C02:02 > [ 0.428039] acpi PNP0A08:01: ECAM at [mem 0x618010000000-0x61801fffffff] for [bus 00-ff] > [ 0.428043] ACPI: Remapped I/O 
0x0000618020000000 to [io 0x10000-0x1ffff window]
> [ 0.428358] ACPI: PCI Root Bridge [PCI6] (domain 0006 [bus 00-ff])
> [ 0.428359] acpi PNP0A08:02: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
> [ 0.428370] acpi PNP0A08:02: _OSC: platform does not support [PME AER DPC]
> [ 0.428388] acpi PNP0A08:02: _OSC: OS now controls [PCIeHotplug PCIeCapability LTR]
> [ 0.428498] acpi PNP0A08:02: ECAM area [mem 0x630010000000-0x63001fffffff] reserved by PNP0C02:03
> [ 0.428505] acpi PNP0A08:02: ECAM at [mem 0x630010000000-0x63001fffffff] for [bus 00-ff]
> [ 0.428508] ACPI: Remapped I/O 0x0000630020000000 to [io 0x20000-0x2ffff window]
> [ 0.428907] ACPI: PCI Root Bridge [PCI7] (domain 0007 [bus 00-ff])
> [ 0.428908] acpi PNP0A08:03: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
> [ 0.428920] acpi PNP0A08:03: _OSC: platform does not support [PME AER DPC]
> [ 0.428937] acpi PNP0A08:03: _OSC: OS now controls [PCIeHotplug PCIeCapability LTR]
> [ 0.429049] acpi PNP0A08:03: ECAM area [mem 0x640010000000-0x64001fffffff] reserved by PNP0C02:04
> [ 0.429054] acpi PNP0A08:03: ECAM at [mem 0x640010000000-0x64001fffffff] for [bus 00-ff]
> [ 0.429057] ACPI: Remapped I/O 0x0000640020000000 to [io 0x30000-0x3ffff window]
> [ 0.429354] ACPI: PCI Root Bridge [PCI8] (domain 0008 [bus 00-ff])
> [ 0.429355] acpi PNP0A08:04: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
> [ 0.429365] acpi PNP0A08:04: _OSC: platform does not support [PME AER DPC]
> [ 0.429382] acpi PNP0A08:04: _OSC: OS now controls [PCIeHotplug PCIeCapability LTR]
> [ 0.429496] acpi PNP0A08:04: ECAM area [mem 0x650010000000-0x65001fffffff] reserved by PNP0C02:05
> [ 0.429501] acpi PNP0A08:04: ECAM at [mem 0x650010000000-0x65001fffffff] for [bus 00-ff]
> [ 0.429504] ACPI: Remapped I/O 0x0000650020000000 to [io 0x40000-0x4ffff window]
> [ 0.431221] ACPI: bus type USB registered
> [ 0.432569] pnp: PnP ACPI init
> [ 0.432766] pnp: PnP ACPI: found 5 devices
> [ 0.482111] ACPI: thermal: Thermal Zone [TZ00] (43 C)
> [ 0.482392] ACPI: thermal: Thermal Zone [TZ01] (41 C)
> [ 0.482663] ACPI: thermal: Thermal Zone [TZ02] (41 C)
> [ 0.482937] ACPI: thermal: Thermal Zone [TZ03] (43 C)
> [ 0.483204] ACPI: thermal: Thermal Zone [TZ04] (40 C)
> [ 0.483473] ACPI: thermal: Thermal Zone [TZ05] (40 C)
> [ 0.483740] ACPI: thermal: Thermal Zone [TZ06] (40 C)
> [ 0.484007] ACPI: thermal: Thermal Zone [TZ07] (37 C)
> [ 0.484272] ACPI: thermal: Thermal Zone [TZ08] (32 C)
> [ 0.484558] ACPI: thermal: Thermal Zone [TZ09] (43 C)
> [ 0.484825] ACPI: thermal: Thermal Zone [TZ0A] (31 C)
> [ 0.485092] ACPI: thermal: Thermal Zone [TZ0B] (32 C)
> [ 0.485139] ACPI: thermal: Thermal Zone [TZL0] (57 C)
> [ 0.485257] ACPI GTDT: found 1 SBSA generic Watchdog(s).
> [ 0.835253] power_meter ACPI000D:01: Found ACPI power meter.
> [ 0.835269] power_meter ACPI000D:01: hwmon_device_register() is deprecated. Please convert the driver to use hwmon_device_register_with_info().
> [ 0.835301] power_meter ACPI000D:02: Found ACPI power meter.
> [ 0.835309] power_meter ACPI000D:02: hwmon_device_register() is deprecated. Please convert the driver to use hwmon_device_register_with_info().
> [ 0.835325] power_meter ACPI000D:03: Found ACPI power meter.
> [ 0.835332] power_meter ACPI000D:03: hwmon_device_register() is deprecated. Please convert the driver to use hwmon_device_register_with_info().
> [ 1.667708] Modules linked in: spi_tegra210_quad acpi_power_meter loop efivarfs autofs4
> [ 2.451653] ACPI: bus type drm_connector registered
>
> Thanks for helping us here,

Can you maybe help clarify at what point you start seeing errors induced by the recovery mechanism? Is this happening immediately on boot? Or does something go wrong later down the line and then you start seeing recovery issues?

I wonder if there's an issue with hand-off from the bootloader/firmware, or if there's a bug in the kernel that causes the failure and subsequent error messages due to the missing _RST.

Thierry
Hello Thierry,

On Fri, Mar 21, 2025 at 01:40:44PM +0100, Thierry Reding wrote:
> Can you maybe help clarify at what point you start seeing errors induced
> by the recovery mechanism?

This happens after a while. Something happens to QSPI, and the warnings and "device reset failed" messages start going haywire.

Most of the machines are fine, but some get into this situation.

Thanks
--breno
On Fri, Mar 21, 2025 at 09:28:34AM -0700, Breno Leitao wrote:
> Hello Thierry,
>
> On Fri, Mar 21, 2025 at 01:40:44PM +0100, Thierry Reding wrote:
> > Can you maybe help clarify at what point you start seeing errors induced
> > by the recovery mechanism?
>
> This is after a while. Something happen to QSPI and the warnings and
> device reset failed start going haywire.
>
> Most of the machines are fine, but, some get into this situation.

Is it always the same devices, or does it happen randomly?

Thierry
On Mon, Mar 24, 2025 at 02:17:11PM +0100, Thierry Reding wrote:
> On Fri, Mar 21, 2025 at 09:28:34AM -0700, Breno Leitao wrote:
> > Hello Thierry,
> >
> > On Fri, Mar 21, 2025 at 01:40:44PM +0100, Thierry Reding wrote:
> > > Can you maybe help clarify at what point you start seeing errors induced
> > > by the recovery mechanism?
> >
> > This is after a while. Something happen to QSPI and the warnings and
> > device reset failed start going haywire.
> >
> > Most of the machines are fine, but, some get into this situation.
>
> Is it always the same devices, or does it happen randomly?

We have already seen this on two different, unrelated machines.

I want to come back to how the driver should behave. We probably want to decide which behaviour we expect from the driver. The options are (IMO):

1) The reset handlers are NOT optional and the device should fail to probe.

2) The reset handlers ARE optional, and we should mark them as such.

Can you shed some light on which behaviour we want to implement?

From what I am hearing, we are more inclined towards 2). Is this correct?

Thanks for helping us to figure out this issue,
--breno
On Tue, Mar 25, 2025 at 09:56:10AM -0700, Breno Leitao wrote:
> On Mon, Mar 24, 2025 at 02:17:11PM +0100, Thierry Reding wrote:
> > On Fri, Mar 21, 2025 at 09:28:34AM -0700, Breno Leitao wrote:
> > > Hello Thierry,
> > >
> > > On Fri, Mar 21, 2025 at 01:40:44PM +0100, Thierry Reding wrote:
> > > > Can you maybe help clarify at what point you start seeing errors induced
> > > > by the recovery mechanism?
> > >
> > > This is after a while. Something happen to QSPI and the warnings and
> > > device reset failed start going haywire.
> > >
> > > Most of the machines are fine, but, some get into this situation.
> >
> > Is it always the same devices, or does it happen randomly?
>
> We got this in two different and unrelated machines, already.
>
> I want to come back to how the driver should behave. We probably want to
> distinguish what is the correct behaviour we expect from the driver,
> they are (IMO):
>
> 1) The reset handlers are NOT optional and the device should fail to
> probe.
>
> 2) The reset handlers ARE optional, and we should mark them as such.
>
> Can you shed some light on what is the right behaviour we want to
> implement?
>
> From what I am hearing, we are more inclined towards 2). Is this
> correct?

Yes, I think 2) is what I'd be inclined towards. _RST is clearly not available for at least certain firmware releases, so they are de-facto optional. Even if they are ever implemented, it'd be wise to keep supporting the case where they are not available, so treating them as optional is the right way to go.

Thierry
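For illustration only, the difference between options 1) and 2) at probe time might be sketched in plain user-space C as follows. Nothing here is kernel code: `enum reset_policy` and `model_probe()` are hypothetical names, and the choice to still propagate a real reset failure under option 2) is an assumption of this sketch (the patch under discussion only warns in that case).

```c
#include <errno.h>

/* Hypothetical policies matching options 1) and 2) from the thread. */
enum reset_policy {
	RESET_REQUIRED,	/* option 1: a missing reset fails the probe */
	RESET_OPTIONAL,	/* option 2: a missing reset is tolerated */
};

/*
 * reset_ret models the outcome of looking up and running the reset:
 *   0        reset ran and succeeded
 *   -ENOENT  no reset method/controller was found
 *   other    the reset exists but failed
 *
 * Returns what a probe function would return under the given policy.
 * Assumption: real failures are propagated under both policies.
 */
int model_probe(enum reset_policy policy, int reset_ret)
{
	if (reset_ret == -ENOENT && policy == RESET_OPTIONAL)
		return 0;	/* nothing to reset, carry on */

	return reset_ret;
}
```

Under this model, firmware without _RST only changes the outcome for `RESET_REQUIRED`; a reset that exists but fails looks the same under both policies.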
diff --git a/drivers/spi/spi-tegra210-quad.c b/drivers/spi/spi-tegra210-quad.c
index 08e49a8768943..9027f995a6669 100644
--- a/drivers/spi/spi-tegra210-quad.c
+++ b/drivers/spi/spi-tegra210-quad.c
@@ -999,7 +999,7 @@ static void tegra_qspi_handle_error(struct tegra_qspi *tqspi)
 	dev_err(tqspi->dev, "error in transfer, fifo status 0x%08x\n", tqspi->status_reg);
 	tegra_qspi_dump_regs(tqspi);
 	tegra_qspi_flush_fifos(tqspi, true);
-	if (device_reset(tqspi->dev) < 0)
+	if (device_reset_optional(tqspi->dev) < 0)
 		dev_warn_once(tqspi->dev, "device reset failed\n");
 }
 
@@ -1149,7 +1149,7 @@ static int tegra_qspi_combined_seq_xfer(struct tegra_qspi *tqspi,
 		}
 
 		/* Reset controller if timeout happens */
-		if (device_reset(tqspi->dev) < 0)
+		if (device_reset_optional(tqspi->dev) < 0)
 			dev_warn_once(tqspi->dev, "device reset failed\n");
 		ret = -EIO;
 
@@ -1606,7 +1606,7 @@ static int tegra_qspi_probe(struct platform_device *pdev)
 		goto exit_pm_disable;
 	}
 
-	if (device_reset(tqspi->dev) < 0)
+	if (device_reset_optional(tqspi->dev) < 0)
 		dev_warn_once(tqspi->dev, "device reset failed\n");
 
 	tqspi->def_command1_reg = QSPI_M_S | QSPI_CS_SW_HW | QSPI_CS_SW_VAL;
My UEFI machines with tegra210-quad consistently report "device reset failed". Investigation showed this isn't an actual failure - __device_reset() returns -ENOENT because ACPI has no "*_RST" method.

Replace device_reset() with device_reset_optional() to prevent errors when the reset method doesn't exist. With this change, the function only fails if the actual device reset operation fails when called.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 drivers/spi/spi-tegra210-quad.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
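The behaviour the commit message relies on can be modelled in a few lines of user-space C. This is a sketch, not the kernel implementation: it mirrors only the `optional ? 0 : -ENOENT` branch of __device_reset() quoted earlier in the thread, and `struct model_dev` and `model_device_reset()` are hypothetical names introduced for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device; not a kernel structure. */
struct model_dev {
	bool has_rst;	/* firmware exposes a reset method such as _RST */
	int rst_result;	/* what the reset returns when it actually runs */
};

/*
 * Models the quoted branch: when no reset method is found, a required
 * reset reports -ENOENT while an optional one reports success. When a
 * method exists, its result is returned unchanged in both cases.
 */
int model_device_reset(const struct model_dev *dev, bool optional)
{
	if (!dev->has_rst)
		return optional ? 0 : -ENOENT;

	return dev->rst_result;
}
```

With `optional` set, a device without a reset method no longer reaches the "device reset failed" warning, while a reset that exists and fails is still reported.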