diff mbox series

iommu/arm-smmu: Demote error messages to debug in shutdown callback

Message ID 20200327132852.10352-1-saiprakash.ranjan@codeaurora.org (mailing list archive)
State New, archived
Headers show
Series iommu/arm-smmu: Demote error messages to debug in shutdown callback | expand

Commit Message

Sai Prakash Ranjan March 27, 2020, 1:28 p.m. UTC
Currently on reboot/shutdown, the following messages are
displayed on the console as error messages before the
system reboots/shutdown.

On SC7180:

  arm-smmu 15000000.iommu: removing device with active domains!
  arm-smmu 5040000.iommu: removing device with active domains!

Demote the log level to debug since it does not offer much
help in identifying/fixing any issue as the system is anyways
going down and reduce spamming the kernel log.

Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
---
 drivers/iommu/arm-smmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Robin Murphy March 27, 2020, 2:12 p.m. UTC | #1
On 2020-03-27 1:28 pm, Sai Prakash Ranjan wrote:
> Currently on reboot/shutdown, the following messages are
> displayed on the console as error messages before the
> system reboots/shutdown.
> 
> On SC7180:
> 
>    arm-smmu 15000000.iommu: removing device with active domains!
>    arm-smmu 5040000.iommu: removing device with active domains!
> 
> Demote the log level to debug since it does not offer much
> help in identifying/fixing any issue as the system is anyways
> going down and reduce spamming the kernel log.

I've gone back and forth on this pretty much ever since we added the 
shutdown hook - on the other hand, if any devices *are* still running in 
those domains at this point, then once we turn off the SMMU and let 
those IOVAs go out on the bus as physical addresses, all manner of 
weirdness may ensue. Thus there is an argument for *some* indication 
that this may happen, although IMO it could be downgraded to at least 
dev_warn().

Robin.

> Reported-by: Douglas Anderson <dianders@chromium.org>
> Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
> ---
>   drivers/iommu/arm-smmu.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 16c4b87af42b..0a865e32054a 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -2250,7 +2250,7 @@ static int arm_smmu_device_remove(struct platform_device *pdev)
>   		return -ENODEV;
>   
>   	if (!bitmap_empty(smmu->context_map, ARM_SMMU_MAX_CBS))
> -		dev_err(&pdev->dev, "removing device with active domains!\n");
> +		dev_dbg(&pdev->dev, "removing device with active domains!\n");
>   
>   	arm_smmu_bus_init(NULL);
>   	iommu_device_unregister(&smmu->iommu);
>
Sai Prakash Ranjan March 27, 2020, 3:09 p.m. UTC | #2
Hi Robin,

Thanks for taking a look at this.

On 2020-03-27 19:42, Robin Murphy wrote:
> On 2020-03-27 1:28 pm, Sai Prakash Ranjan wrote:
>> Currently on reboot/shutdown, the following messages are
>> displayed on the console as error messages before the
>> system reboots/shutdown.
>> 
>> On SC7180:
>> 
>>    arm-smmu 15000000.iommu: removing device with active domains!
>>    arm-smmu 5040000.iommu: removing device with active domains!
>> 
>> Demote the log level to debug since it does not offer much
>> help in identifying/fixing any issue as the system is anyways
>> going down and reduce spamming the kernel log.
> 
> I've gone back and forth on this pretty much ever since we added the
> shutdown hook - on the other hand, if any devices *are* still running
> in those domains at this point, then once we turn off the SMMU and let
> those IOVAs go out on the bus as physical addresses, all manner of
> weirdness may ensue. Thus there is an argument for *some* indication
> that this may happen, although IMO it could be downgraded to at least
> dev_warn().
> 

Any pointers to the weirdness here after SMMU is turned off?
Because if we look at the call sites, device_shutdown is called
from kernel_restart_prepare or kernel_shutdown_prepare which would
mean system is going down anyways, so do we really care about these
error messages or warnings from SMMU?

  arm_smmu_device_shutdown
   platform_drv_shutdown
    device_shutdown
     kernel_restart_prepare
      kernel_restart


Thanks,
Sai
Rob Clark March 27, 2020, 4:17 p.m. UTC | #3
On Fri, Mar 27, 2020 at 8:10 AM Sai Prakash Ranjan
<saiprakash.ranjan@codeaurora.org> wrote:
>
> Hi Robin,
>
> Thanks for taking a look at this.
>
> On 2020-03-27 19:42, Robin Murphy wrote:
> > On 2020-03-27 1:28 pm, Sai Prakash Ranjan wrote:
> >> Currently on reboot/shutdown, the following messages are
> >> displayed on the console as error messages before the
> >> system reboots/shutdown.
> >>
> >> On SC7180:
> >>
> >>    arm-smmu 15000000.iommu: removing device with active domains!
> >>    arm-smmu 5040000.iommu: removing device with active domains!
> >>
> >> Demote the log level to debug since it does not offer much
> >> help in identifying/fixing any issue as the system is anyways
> >> going down and reduce spamming the kernel log.
> >
> > I've gone back and forth on this pretty much ever since we added the
> > shutdown hook - on the other hand, if any devices *are* still running
> > in those domains at this point, then once we turn off the SMMU and let
> > those IOVAs go out on the bus as physical addresses, all manner of
> > weirdness may ensue. Thus there is an argument for *some* indication
> > that this may happen, although IMO it could be downgraded to at least
> > dev_warn().
> >
>
> Any pointers to the weirdness here after SMMU is turned off?
> Because if we look at the call sites, device_shutdown is called
> from kernel_restart_prepare or kernel_shutdown_prepare which would
> mean system is going down anyways, so do we really care about these
> error messages or warnings from SMMU?
>
>   arm_smmu_device_shutdown
>    platform_drv_shutdown
>     device_shutdown
>      kernel_restart_prepare
>       kernel_restart
>

I'd guess that drm/msm is not detaching all of it's UNMANAGED domains
in shutdown.  Although *presumably* the device_link stuff would
prevent the SMMU from shutting down while gpu/display is still active?
 If not I think we have bigger problems.

I hadn't really noticed the error msgs before, not sure if that is
just because the screen is off by the time they happen or if they are
a new warning..

BR,
-R
Sai Prakash Ranjan March 27, 2020, 6:18 p.m. UTC | #4
Hi Rob,

On 2020-03-27 21:47, Rob Clark wrote:
> On Fri, Mar 27, 2020 at 8:10 AM Sai Prakash Ranjan
> <saiprakash.ranjan@codeaurora.org> wrote:
>> 
>> Hi Robin,
>> 
>> Thanks for taking a look at this.
>> 
>> On 2020-03-27 19:42, Robin Murphy wrote:
>> > On 2020-03-27 1:28 pm, Sai Prakash Ranjan wrote:
>> >> Currently on reboot/shutdown, the following messages are
>> >> displayed on the console as error messages before the
>> >> system reboots/shutdown.
>> >>
>> >> On SC7180:
>> >>
>> >>    arm-smmu 15000000.iommu: removing device with active domains!
>> >>    arm-smmu 5040000.iommu: removing device with active domains!
>> >>
>> >> Demote the log level to debug since it does not offer much
>> >> help in identifying/fixing any issue as the system is anyways
>> >> going down and reduce spamming the kernel log.
>> >
>> > I've gone back and forth on this pretty much ever since we added the
>> > shutdown hook - on the other hand, if any devices *are* still running
>> > in those domains at this point, then once we turn off the SMMU and let
>> > those IOVAs go out on the bus as physical addresses, all manner of
>> > weirdness may ensue. Thus there is an argument for *some* indication
>> > that this may happen, although IMO it could be downgraded to at least
>> > dev_warn().
>> >
>> 
>> Any pointers to the weirdness here after SMMU is turned off?
>> Because if we look at the call sites, device_shutdown is called
>> from kernel_restart_prepare or kernel_shutdown_prepare which would
>> mean system is going down anyways, so do we really care about these
>> error messages or warnings from SMMU?
>> 
>>   arm_smmu_device_shutdown
>>    platform_drv_shutdown
>>     device_shutdown
>>      kernel_restart_prepare
>>       kernel_restart
>> 
> 
> I'd guess that drm/msm is not detaching all of it's UNMANAGED domains
> in shutdown.  Although *presumably* the device_link stuff would
> prevent the SMMU from shutting down while gpu/display is still active?
>  If not I think we have bigger problems.
> 

Where is the shutdown callback in drm/msm? I don't see any...

Thanks,
Sai
Robin Murphy March 27, 2020, 7:02 p.m. UTC | #5
On 2020-03-27 3:09 pm, Sai Prakash Ranjan wrote:
> Hi Robin,
> 
> Thanks for taking a look at this.
> 
> On 2020-03-27 19:42, Robin Murphy wrote:
>> On 2020-03-27 1:28 pm, Sai Prakash Ranjan wrote:
>>> Currently on reboot/shutdown, the following messages are
>>> displayed on the console as error messages before the
>>> system reboots/shutdown.
>>>
>>> On SC7180:
>>>
>>>    arm-smmu 15000000.iommu: removing device with active domains!
>>>    arm-smmu 5040000.iommu: removing device with active domains!
>>>
>>> Demote the log level to debug since it does not offer much
>>> help in identifying/fixing any issue as the system is anyways
>>> going down and reduce spamming the kernel log.
>>
>> I've gone back and forth on this pretty much ever since we added the
>> shutdown hook - on the other hand, if any devices *are* still running
>> in those domains at this point, then once we turn off the SMMU and let
>> those IOVAs go out on the bus as physical addresses, all manner of
>> weirdness may ensue. Thus there is an argument for *some* indication
>> that this may happen, although IMO it could be downgraded to at least
>> dev_warn().
>>
> 
> Any pointers to the weirdness here after SMMU is turned off?
> Because if we look at the call sites, device_shutdown is called
> from kernel_restart_prepare or kernel_shutdown_prepare which would
> mean system is going down anyways, so do we really care about these
> error messages or warnings from SMMU?
> 
>   arm_smmu_device_shutdown
>    platform_drv_shutdown
>     device_shutdown
>      kernel_restart_prepare
>       kernel_restart

Imagine your network driver doesn't implement a .shutdown method (so the 
hardware is still active regardless of device links), happens to have an 
Rx buffer or descriptor ring DMA-mapped at an IOVA that looks like the 
physical address of the memory containing some part of the kernel text 
lower down that call stack, and the MAC receives a broadcast IP packet 
at about the point arm_smmu_device_shutdown() is returning. Enjoy 
debugging that ;)

And if coincidental memory corruption seems too far-fetched for your 
liking, other fun alternatives might include "display tries to scan out 
from powered-off device, deadlocks interconnect and prevents anything 
else making progress", or "access to TZC-protected physical address 
triggers interrupt and over-eager Secure firmware resets system before 
orderly poweroff has a chance to finish".

Of course the fact that in practice we'll *always* see the warning 
because there's no way to tear down the default DMA domains, and even if 
all devices *have* been nicely quiesced there's no way to tell, is 
certainly less than ideal. Like I say, it's not entirely clear-cut 
either way...

Robin.
Sai Prakash Ranjan March 28, 2020, 7:35 a.m. UTC | #6
Hi Robin,

On 2020-03-28 00:32, Robin Murphy wrote:
> On 2020-03-27 3:09 pm, Sai Prakash Ranjan wrote:
> 
> Imagine your network driver doesn't implement a .shutdown method (so
> the hardware is still active regardless of device links), happens to
> have an Rx buffer or descriptor ring DMA-mapped at an IOVA that looks
> like the physical address of the memory containing some part of the
> kernel text lower down that call stack, and the MAC receives a
> broadcast IP packet at about the point arm_smmu_device_shutdown() is
> returning. Enjoy debugging that ;)
> 
> And if coincidental memory corruption seems too far-fetched for your
> liking, other fun alternatives might include "display tries to scan
> out from powered-off device, deadlocks interconnect and prevents
> anything else making progress", or "access to TZC-protected physical
> address triggers interrupt and over-eager Secure firmware resets
> system before orderly poweroff has a chance to finish".
> 
> Of course the fact that in practice we'll *always* see the warning
> because there's no way to tear down the default DMA domains, and even
> if all devices *have* been nicely quiesced there's no way to tell, is
> certainly less than ideal. Like I say, it's not entirely clear-cut
> either way...
> 

Thanks for these examples, good to know these scenarios in case we come 
across these.
However, if we see these error/warning messages appear everytime then 
what will be
the credibility of these messages? We will just ignore these messages 
when
these issues you mention actually appears because we see them everytime 
on
reboot or shutdown. So doesn't it make sense to enable these only when
we are debugging? We could argue that how will we know the issue could 
be related
to SMMU, but that's the case even now.

The reason why this came up was that, we had a NOC(interconnect) error 
which does
have a logging atleast in QCOM platforms from the secure side(it prints 
these on the console)
after the SMMU err messages and there was a confusion if it was related 
to these messages.
However, NOC error messages did identify the issue with the USB and it 
was solved later.
So these SMMU err/warning messages could be misleading like the above 
case almost everytime.

The probability of the issues you mentioned occuring is very less than 
the actual reboot,
shutdown scenarios, for ex: we run reboot stress test for thousands of 
times and these messages
don't add anything special in those cases when any issue occurs because 
they are seen
everytime.

Thanks,
Sai
Doug Anderson March 30, 2020, 6:24 p.m. UTC | #7
Hi,

On Sat, Mar 28, 2020 at 12:35 AM Sai Prakash Ranjan
<saiprakash.ranjan@codeaurora.org> wrote:
>
> > Of course the fact that in practice we'll *always* see the warning
> > because there's no way to tear down the default DMA domains, and even
> > if all devices *have* been nicely quiesced there's no way to tell, is
> > certainly less than ideal. Like I say, it's not entirely clear-cut
> > either way...
> >
>
> Thanks for these examples, good to know these scenarios in case we come
> across these.
> However, if we see these error/warning messages appear everytime then
> what will be
> the credibility of these messages? We will just ignore these messages
> when
> these issues you mention actually appears because we see them everytime
> on
> reboot or shutdown.

I would agree that if these messages are expected to be seen every
time, there's no way to fix them, and they're not indicative of any
problem then something should be done.  Seeing something printed at
"dev_error" level with an exclamation point (!) at the end makes me
feel like this is something that needs immediate action on my part.

If we really can't do better but feel that the messages need to be
there, at least make them dev_info and less scary like:

  arm-smmu 15000000.iommu: turning off; DMA should be quiesced before now

...that would still give you a hint in the logs that if you saw a DMA
transaction after the message that it was a bug but also wouldn't
sound scary to someone who wasn't seeing any other problems.

-Doug
Sai Prakash Ranjan March 31, 2020, 7:36 a.m. UTC | #8
Hi,

On 2020-03-30 23:54, Doug Anderson wrote:
> Hi,
> 
> On Sat, Mar 28, 2020 at 12:35 AM Sai Prakash Ranjan
> <saiprakash.ranjan@codeaurora.org> wrote:
>> 
>> > Of course the fact that in practice we'll *always* see the warning
>> > because there's no way to tear down the default DMA domains, and even
>> > if all devices *have* been nicely quiesced there's no way to tell, is
>> > certainly less than ideal. Like I say, it's not entirely clear-cut
>> > either way...
>> >
>> 
>> Thanks for these examples, good to know these scenarios in case we 
>> come
>> across these.
>> However, if we see these error/warning messages appear everytime then
>> what will be
>> the credibility of these messages? We will just ignore these messages
>> when
>> these issues you mention actually appears because we see them 
>> everytime
>> on
>> reboot or shutdown.
> 
> I would agree that if these messages are expected to be seen every
> time, there's no way to fix them, and they're not indicative of any
> problem then something should be done.  Seeing something printed at
> "dev_error" level with an exclamation point (!) at the end makes me
> feel like this is something that needs immediate action on my part.
> 
> If we really can't do better but feel that the messages need to be
> there, at least make them dev_info and less scary like:
> 
>   arm-smmu 15000000.iommu: turning off; DMA should be quiesced before 
> now
> 
> ...that would still give you a hint in the logs that if you saw a DMA
> transaction after the message that it was a bug but also wouldn't
> sound scary to someone who wasn't seeing any other problems.
> 

We can do this if Robin is OK?

-Sai
Will Deacon March 31, 2020, 7:44 a.m. UTC | #9
On Tue, Mar 31, 2020 at 01:06:11PM +0530, Sai Prakash Ranjan wrote:
> On 2020-03-30 23:54, Doug Anderson wrote:
> > On Sat, Mar 28, 2020 at 12:35 AM Sai Prakash Ranjan
> > <saiprakash.ranjan@codeaurora.org> wrote:
> > > 
> > > > Of course the fact that in practice we'll *always* see the warning
> > > > because there's no way to tear down the default DMA domains, and even
> > > > if all devices *have* been nicely quiesced there's no way to tell, is
> > > > certainly less than ideal. Like I say, it's not entirely clear-cut
> > > > either way...
> > > >
> > > 
> > > Thanks for these examples, good to know these scenarios in case we
> > > come
> > > across these.
> > > However, if we see these error/warning messages appear everytime then
> > > what will be
> > > the credibility of these messages? We will just ignore these messages
> > > when
> > > these issues you mention actually appears because we see them
> > > everytime
> > > on
> > > reboot or shutdown.
> > 
> > I would agree that if these messages are expected to be seen every
> > time, there's no way to fix them, and they're not indicative of any
> > problem then something should be done.  Seeing something printed at
> > "dev_error" level with an exclamation point (!) at the end makes me
> > feel like this is something that needs immediate action on my part.
> > 
> > If we really can't do better but feel that the messages need to be
> > there, at least make them dev_info and less scary like:
> > 
> >   arm-smmu 15000000.iommu: turning off; DMA should be quiesced before
> > now
> > 
> > ...that would still give you a hint in the logs that if you saw a DMA
> > transaction after the message that it was a bug but also wouldn't
> > sound scary to someone who wasn't seeing any other problems.
> > 
> 
> We can do this if Robin is OK?

It would be nice if you could figure out which domains are still live when
the SMMU is being shut down in your case and verify that it *is* infact
benign before we start making the message more friendly. As Robin said
earlier, rogue DMA is a real nightmare to debug.

Will
Sai Prakash Ranjan March 31, 2020, 7:53 a.m. UTC | #10
Hi Will,

On 2020-03-31 13:14, Will Deacon wrote:
> On Tue, Mar 31, 2020 at 01:06:11PM +0530, Sai Prakash Ranjan wrote:
>> On 2020-03-30 23:54, Doug Anderson wrote:
>> > On Sat, Mar 28, 2020 at 12:35 AM Sai Prakash Ranjan
>> > <saiprakash.ranjan@codeaurora.org> wrote:
>> > >
>> > > > Of course the fact that in practice we'll *always* see the warning
>> > > > because there's no way to tear down the default DMA domains, and even
>> > > > if all devices *have* been nicely quiesced there's no way to tell, is
>> > > > certainly less than ideal. Like I say, it's not entirely clear-cut
>> > > > either way...
>> > > >
>> > >
>> > > Thanks for these examples, good to know these scenarios in case we
>> > > come
>> > > across these.
>> > > However, if we see these error/warning messages appear everytime then
>> > > what will be
>> > > the credibility of these messages? We will just ignore these messages
>> > > when
>> > > these issues you mention actually appears because we see them
>> > > everytime
>> > > on
>> > > reboot or shutdown.
>> >
>> > I would agree that if these messages are expected to be seen every
>> > time, there's no way to fix them, and they're not indicative of any
>> > problem then something should be done.  Seeing something printed at
>> > "dev_error" level with an exclamation point (!) at the end makes me
>> > feel like this is something that needs immediate action on my part.
>> >
>> > If we really can't do better but feel that the messages need to be
>> > there, at least make them dev_info and less scary like:
>> >
>> >   arm-smmu 15000000.iommu: turning off; DMA should be quiesced before
>> > now
>> >
>> > ...that would still give you a hint in the logs that if you saw a DMA
>> > transaction after the message that it was a bug but also wouldn't
>> > sound scary to someone who wasn't seeing any other problems.
>> >
>> 
>> We can do this if Robin is OK?
> 
> It would be nice if you could figure out which domains are still live 
> when
> the SMMU is being shut down in your case and verify that it *is* infact
> benign before we start making the message more friendly. As Robin said
> earlier, rogue DMA is a real nightmare to debug.
> 

I could see this error message for all the clients of apps_smmu.
I checked manually enabling bypass and removing iommus dt property
for each client of apps_smmu.

Thanks,
Sai
Doug Anderson April 22, 2020, 7:49 p.m. UTC | #11
Hi,

On Tue, Mar 31, 2020 at 12:53 AM Sai Prakash Ranjan
<saiprakash.ranjan@codeaurora.org> wrote:
>
> Hi Will,
>
> On 2020-03-31 13:14, Will Deacon wrote:
> > On Tue, Mar 31, 2020 at 01:06:11PM +0530, Sai Prakash Ranjan wrote:
> >> On 2020-03-30 23:54, Doug Anderson wrote:
> >> > On Sat, Mar 28, 2020 at 12:35 AM Sai Prakash Ranjan
> >> > <saiprakash.ranjan@codeaurora.org> wrote:
> >> > >
> >> > > > Of course the fact that in practice we'll *always* see the warning
> >> > > > because there's no way to tear down the default DMA domains, and even
> >> > > > if all devices *have* been nicely quiesced there's no way to tell, is
> >> > > > certainly less than ideal. Like I say, it's not entirely clear-cut
> >> > > > either way...
> >> > > >
> >> > >
> >> > > Thanks for these examples, good to know these scenarios in case we
> >> > > come
> >> > > across these.
> >> > > However, if we see these error/warning messages appear everytime then
> >> > > what will be
> >> > > the credibility of these messages? We will just ignore these messages
> >> > > when
> >> > > these issues you mention actually appears because we see them
> >> > > everytime
> >> > > on
> >> > > reboot or shutdown.
> >> >
> >> > I would agree that if these messages are expected to be seen every
> >> > time, there's no way to fix them, and they're not indicative of any
> >> > problem then something should be done.  Seeing something printed at
> >> > "dev_error" level with an exclamation point (!) at the end makes me
> >> > feel like this is something that needs immediate action on my part.
> >> >
> >> > If we really can't do better but feel that the messages need to be
> >> > there, at least make them dev_info and less scary like:
> >> >
> >> >   arm-smmu 15000000.iommu: turning off; DMA should be quiesced before
> >> > now
> >> >
> >> > ...that would still give you a hint in the logs that if you saw a DMA
> >> > transaction after the message that it was a bug but also wouldn't
> >> > sound scary to someone who wasn't seeing any other problems.
> >> >
> >>
> >> We can do this if Robin is OK?
> >
> > It would be nice if you could figure out which domains are still live
> > when
> > the SMMU is being shut down in your case and verify that it *is* infact
> > benign before we start making the message more friendly. As Robin said
> > earlier, rogue DMA is a real nightmare to debug.
> >
>
> I could see this error message for all the clients of apps_smmu.
> I checked manually enabling bypass and removing iommus dt property
> for each client of apps_smmu.

Any update on the status here?  If I'm reading the conversation above,
Robin said: "we'll *always* see the warning because there's no way to
tear down the default DMA domains, and even if all devices *have* been
nicely quiesced there's no way to tell".  Did I understand that
properly?  If so, it seems like it's fully expected to see this
message on every reboot and it doesn't necessarily signify anything
bad.

-Doug
Sai Prakash Ranjan April 23, 2020, 8:17 a.m. UTC | #12
Hi,

On 2020-04-23 01:19, Doug Anderson wrote:
> Hi,
> 
> On Tue, Mar 31, 2020 at 12:53 AM Sai Prakash Ranjan
> <saiprakash.ranjan@codeaurora.org> wrote:
>> 
>> Hi Will,
>> 
>> On 2020-03-31 13:14, Will Deacon wrote:
>> > On Tue, Mar 31, 2020 at 01:06:11PM +0530, Sai Prakash Ranjan wrote:
>> >> On 2020-03-30 23:54, Doug Anderson wrote:
>> >> > On Sat, Mar 28, 2020 at 12:35 AM Sai Prakash Ranjan
>> >> > <saiprakash.ranjan@codeaurora.org> wrote:
>> >> > >
>> >> > > > Of course the fact that in practice we'll *always* see the warning
>> >> > > > because there's no way to tear down the default DMA domains, and even
>> >> > > > if all devices *have* been nicely quiesced there's no way to tell, is
>> >> > > > certainly less than ideal. Like I say, it's not entirely clear-cut
>> >> > > > either way...
>> >> > > >
>> >> > >
>> >> > > Thanks for these examples, good to know these scenarios in case we
>> >> > > come
>> >> > > across these.
>> >> > > However, if we see these error/warning messages appear everytime then
>> >> > > what will be
>> >> > > the credibility of these messages? We will just ignore these messages
>> >> > > when
>> >> > > these issues you mention actually appears because we see them
>> >> > > everytime
>> >> > > on
>> >> > > reboot or shutdown.
>> >> >
>> >> > I would agree that if these messages are expected to be seen every
>> >> > time, there's no way to fix them, and they're not indicative of any
>> >> > problem then something should be done.  Seeing something printed at
>> >> > "dev_error" level with an exclamation point (!) at the end makes me
>> >> > feel like this is something that needs immediate action on my part.
>> >> >
>> >> > If we really can't do better but feel that the messages need to be
>> >> > there, at least make them dev_info and less scary like:
>> >> >
>> >> >   arm-smmu 15000000.iommu: turning off; DMA should be quiesced before
>> >> > now
>> >> >
>> >> > ...that would still give you a hint in the logs that if you saw a DMA
>> >> > transaction after the message that it was a bug but also wouldn't
>> >> > sound scary to someone who wasn't seeing any other problems.
>> >> >
>> >>
>> >> We can do this if Robin is OK?
>> >
>> > It would be nice if you could figure out which domains are still live
>> > when
>> > the SMMU is being shut down in your case and verify that it *is* infact
>> > benign before we start making the message more friendly. As Robin said
>> > earlier, rogue DMA is a real nightmare to debug.
>> >
>> 
>> I could see this error message for all the clients of apps_smmu.
>> I checked manually enabling bypass and removing iommus dt property
>> for each client of apps_smmu.
> 
> Any update on the status here?  If I'm reading the conversation above,
> Robin said: "we'll *always* see the warning because there's no way to
> tear down the default DMA domains, and even if all devices *have* been
> nicely quiesced there's no way to tell".  Did I understand that
> properly?  If so, it seems like it's fully expected to see this
> message on every reboot and it doesn't necessarily signify anything
> bad.
> 

Understanding is the same, waiting for Will and Robin to check if its OK
to make the message more friendly.

-Sai
Robin Murphy April 23, 2020, 9:28 a.m. UTC | #13
On 2020-04-23 9:17 am, Sai Prakash Ranjan wrote:
[...]
>> Any update on the status here?  If I'm reading the conversation above,
>> Robin said: "we'll *always* see the warning because there's no way to
>> tear down the default DMA domains, and even if all devices *have* been
>> nicely quiesced there's no way to tell".  Did I understand that
>> properly?  If so, it seems like it's fully expected to see this
>> message on every reboot and it doesn't necessarily signify anything
>> bad.
>>
> 
> Understanding is the same, waiting for Will and Robin to check if its OK
> to make the message more friendly.

The way I see it, we essentially just want *something* visible that will 
correlate with any misbehaviour that *might* result from turning off a 
possibly-live context. How about simply "disabling translation", at 
dev_warn or dev_info level?

Robin.
Sai Prakash Ranjan April 23, 2020, 9:41 a.m. UTC | #14
On 2020-04-23 14:58, Robin Murphy wrote:
> On 2020-04-23 9:17 am, Sai Prakash Ranjan wrote:
> [...]
>>> Any update on the status here?  If I'm reading the conversation 
>>> above,
>>> Robin said: "we'll *always* see the warning because there's no way to
>>> tear down the default DMA domains, and even if all devices *have* 
>>> been
>>> nicely quiesced there's no way to tell".  Did I understand that
>>> properly?  If so, it seems like it's fully expected to see this
>>> message on every reboot and it doesn't necessarily signify anything
>>> bad.
>>> 
>> 
>> Understanding is the same, waiting for Will and Robin to check if its 
>> OK
>> to make the message more friendly.
> 
> The way I see it, we essentially just want *something* visible that
> will correlate with any misbehaviour that *might* result from turning
> off a possibly-live context. How about simply "disabling translation",
> at dev_warn or dev_info level?
> 


Sounds good, I'll go with disabling translation with dev_info.

Thanks,
Sai
diff mbox series

Patch

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 16c4b87af42b..0a865e32054a 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -2250,7 +2250,7 @@  static int arm_smmu_device_remove(struct platform_device *pdev)
 		return -ENODEV;
 
 	if (!bitmap_empty(smmu->context_map, ARM_SMMU_MAX_CBS))
-		dev_err(&pdev->dev, "removing device with active domains!\n");
+		dev_dbg(&pdev->dev, "removing device with active domains!\n");
 
 	arm_smmu_bus_init(NULL);
 	iommu_device_unregister(&smmu->iommu);