[1/2] iommu: add support for drivers that manage iommu explicitly

Message ID 20190702202631.32148-2-robdclark@gmail.com (mailing list archive)
State New, archived
Series iommu: handle drivers that manage iommu directly

Commit Message

Rob Clark July 2, 2019, 8:26 p.m. UTC
From: Rob Clark <robdclark@chromium.org>

Avoid attaching any non-driver managed domain if the driver indicates
that it manages the iommu directly.

This solves a couple problems that drm/msm + arm-smmu has with the iommu
framework:

1) In some cases the bootloader takes the iommu out of bypass and
   enables the display.  This is in particular a problem on the aarch64
   laptops that exist these days, and modern snapdragon android devices.
   (Older devices also enabled the display in bootloader but did not
   take the iommu out of bypass.)  Attaching a DMA or IDENTITY domain
   while scanout is active, before the driver has a chance to intervene,
   makes things go *boom*

2) We are currently blocked on landing support for GPU per-context
   pagetables because of the domain attached before driver's ->probe()
   is called.

This solves both problems.

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/iommu/iommu.c  | 11 +++++++++++
 include/linux/device.h |  3 ++-
 2 files changed, 13 insertions(+), 1 deletion(-)

Comments

Robin Murphy July 3, 2019, 12:42 p.m. UTC | #1
On 02/07/2019 21:26, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
> 
> Avoid attaching any non-driver managed domain if the driver indicates
> that it manages the iommu directly.
> 
> This solves a couple problems that drm/msm + arm-smmu has with the iommu
> framework:
> 
> 1) In some cases the bootloader takes the iommu out of bypass and
>     enables the display.  This is in particular a problem on the aarch64
>     laptops that exist these days, and modern snapdragon android devices.
>     (Older devices also enabled the display in bootloader but did not
>     take the iommu out of bypass.)  Attaching a DMA or IDENTITY domain
>     while scanout is active, before the driver has a chance to intervene,
>     makes things go *boom*

In the general case, we have to assume that things already went boom 
long ago, as soon as the IOMMU itself was probed and reset. By the time 
we get to the point of binding of a client driver, also assume that the 
IOMMU is already powered off and stopping traffic because the RPM device 
links aren't in place yet and it believes itself unused.

> 2) We are currently blocked on landing support for GPU per-context
>     pagetables because of the domain attached before driver's ->probe()
>     is called.

I'm getting a little fed up of explaining that that problem is specific 
to the current behaviour of one particular IOMMU driver and trying to 
work around it anywhere other than in that driver is at best an 
unreliable hack.

> This solves both problems.

For a very, very specific value of "solve"... ;)

> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
>   drivers/iommu/iommu.c  | 11 +++++++++++
>   include/linux/device.h |  3 ++-
>   2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 0c674d80c37f..efa0957f9772 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1573,6 +1573,17 @@ static int __iommu_attach_device(struct iommu_domain *domain,
>   	    domain->ops->is_attach_deferred(domain, dev))
>   		return 0;
>   
> +	/*
> +	 * If driver is going to manage iommu directly, then avoid
> +	 * attaching any non driver managed domain.  There could
> +	 * be already active dma underway (ie. scanout in case of
> +	 * bootloader enabled display), and interfering with that
> +	 * will make things go *boom*
> +	 */
> +	if ((domain->type != IOMMU_DOMAIN_UNMANAGED) &&
> +	    dev->driver && dev->driver->driver_manages_iommu)
> +		return 0;

This leaving things half-hanging is really ugly, but more than that it 
assumes that allocating a default domain in the first place isn't 
disruptive - I'm not 100% sure that's *always* the case today, and it's 
definitely likely to change in future as part of improving the current 
request_dm_for_dev() mechanism. As it happens, those proposed changes 
would not only break this idea, but make it redundant, since they're 
about forcing the default domain type to passthrough on a per-device 
basis, which leads to an equivalent end result to this patch, but in a 
cleaner and more robust manner.

Robin.

> +
>   	if (unlikely(domain->ops->attach_dev == NULL))
>   		return -ENODEV;
>   
> diff --git a/include/linux/device.h b/include/linux/device.h
> index e138baabe01e..d98aa4d3c8c3 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -282,7 +282,8 @@ struct device_driver {
>   	struct module		*owner;
>   	const char		*mod_name;	/* used for built-in modules */
>   
> -	bool suppress_bind_attrs;	/* disables bind/unbind via sysfs */
> +	bool suppress_bind_attrs:1;	/* disables bind/unbind via sysfs */
> +	bool driver_manages_iommu:1;	/* driver manages IOMMU explicitly */
>   	enum probe_type probe_type;
>   
>   	const struct of_device_id	*of_match_table;
>
Rob Clark July 3, 2019, 2:18 p.m. UTC | #2
On Wed, Jul 3, 2019 at 5:42 AM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 02/07/2019 21:26, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > Avoid attaching any non-driver managed domain if the driver indicates
> > that it manages the iommu directly.
> >
> > This solves a couple problems that drm/msm + arm-smmu has with the iommu
> > framework:
> >
> > 1) In some cases the bootloader takes the iommu out of bypass and
> >     enables the display.  This is in particular a problem on the aarch64
> >     laptops that exist these days, and modern snapdragon android devices.
> >     (Older devices also enabled the display in bootloader but did not
> >     take the iommu out of bypass.)  Attaching a DMA or IDENTITY domain
> >     while scanout is active, before the driver has a chance to intervene,
> >     makes things go *boom*
>
> In the general case, we have to assume that things already went boom
> long ago, as soon as the IOMMU itself was probed and reset. By the time
> we get to the point of binding of a client driver, also assume that the
> IOMMU is already powered off and stopping traffic because the RPM device
> links aren't in place yet and it believes itself unused.

you are correct that this is only part of what is needed to get things
working.  We also need Bjorn's patch set to inherit SMR and CB config
during init:

https://www.spinics.net/lists/arm-kernel/msg732246.html

>
> > 2) We are currently blocked on landing support for GPU per-context
> >     pagetables because of the domain attached before driver's ->probe()
> >     is called.
>
> I'm getting a little fed up of explaining that that problem is specific
> to the current behaviour of one particular IOMMU driver and trying to
> work around it anywhere other than in that driver is at best an
> unreliable hack.

Perhaps the GPU part of the problem.  The display part is not.
However I'm fine to move the "don't actually attach" part into
arm-smmu if that is preferred.  The next person to hit the same
problem on a different iommu could certainly move the check or copy it
into their iommu driver.

> > This solves both problems.
>
> For a very, very specific value of "solve"... ;)

well, "solve" == "it boots fine and doesn't explode"..

I'm certainly happy to entertain alternative suggestions, but these
are real problems that need solutions.

BR,
-R

> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > ---
> >   drivers/iommu/iommu.c  | 11 +++++++++++
> >   include/linux/device.h |  3 ++-
> >   2 files changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 0c674d80c37f..efa0957f9772 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -1573,6 +1573,17 @@ static int __iommu_attach_device(struct iommu_domain *domain,
> >           domain->ops->is_attach_deferred(domain, dev))
> >               return 0;
> >
> > +     /*
> > +      * If driver is going to manage iommu directly, then avoid
> > +      * attaching any non driver managed domain.  There could
> > +      * be already active dma underway (ie. scanout in case of
> > +      * bootloader enabled display), and interfering with that
> > +      * will make things go *boom*
> > +      */
> > +     if ((domain->type != IOMMU_DOMAIN_UNMANAGED) &&
> > +         dev->driver && dev->driver->driver_manages_iommu)
> > +             return 0;
>
> This leaving things half-hanging is really ugly, but more than that it
> assumes that allocating a default domain in the first place isn't
> disruptive - I'm not 100% sure that's *always* the case today, and it's
> definitely likely to change in future as part of improving the current
> request_dm_for_dev() mechanism. As it happens, those proposed changes
> would not only break this idea, but make it redundant, since they're
> about forcing the default domain type to passthrough on a per-device
> basis, which leads to an equivalent end result to this patch, but in a
> cleaner and more robust manner.
>
> Robin.
>
> > +
> >       if (unlikely(domain->ops->attach_dev == NULL))
> >               return -ENODEV;
> >
> > diff --git a/include/linux/device.h b/include/linux/device.h
> > index e138baabe01e..d98aa4d3c8c3 100644
> > --- a/include/linux/device.h
> > +++ b/include/linux/device.h
> > @@ -282,7 +282,8 @@ struct device_driver {
> >       struct module           *owner;
> >       const char              *mod_name;      /* used for built-in modules */
> >
> > -     bool suppress_bind_attrs;       /* disables bind/unbind via sysfs */
> > +     bool suppress_bind_attrs:1;     /* disables bind/unbind via sysfs */
> > +     bool driver_manages_iommu:1;    /* driver manages IOMMU explicitly */
> >       enum probe_type probe_type;
> >
> >       const struct of_device_id       *of_match_table;
> >
Joerg Roedel July 4, 2019, 8:20 a.m. UTC | #3
Hi Rob,

On Tue, Jul 02, 2019 at 01:26:18PM -0700, Rob Clark wrote:
> 1) In some cases the bootloader takes the iommu out of bypass and
>    enables the display.  This is in particular a problem on the aarch64
>    laptops that exist these days, and modern snapdragon android devices.
>    (Older devices also enabled the display in bootloader but did not
>    take the iommu out of bypass.)  Attaching a DMA or IDENTITY domain
>    while scanout is active, before the driver has a chance to intervene,
>    makes things go *boom*

Just to make sure I get this right: The bootloader initializes the SMMU
and creates non-identity mappings for the GPU? And when the SMMU driver
in Linux takes over, this breaks display output?

> +	/*
> +	 * If driver is going to manage iommu directly, then avoid
> +	 * attaching any non driver managed domain.  There could
> +	 * be already active dma underway (ie. scanout in case of
> +	 * bootloader enabled display), and interfering with that
> +	 * will make things go *boom*
> +	 */
> +	if ((domain->type != IOMMU_DOMAIN_UNMANAGED) &&
> +	    dev->driver && dev->driver->driver_manages_iommu)
> +		return 0;
> +

When the default domain is attached, there is usually no driver attached
yet. I think this needs to be communicated by the firmware to Linux and
the code should check against that.

> -	bool suppress_bind_attrs;	/* disables bind/unbind via sysfs */
> +	bool suppress_bind_attrs:1;	/* disables bind/unbind via sysfs */
> +	bool driver_manages_iommu:1;	/* driver manages IOMMU explicitly */

How does this field get set?



	Joerg
Rob Clark July 4, 2019, 1:51 p.m. UTC | #4
On Thu, Jul 4, 2019 at 1:20 AM Joerg Roedel <joro@8bytes.org> wrote:
>
> Hi Rob,
>
> On Tue, Jul 02, 2019 at 01:26:18PM -0700, Rob Clark wrote:
> > 1) In some cases the bootloader takes the iommu out of bypass and
> >    enables the display.  This is in particular a problem on the aarch64
> >    laptops that exist these days, and modern snapdragon android devices.
> >    (Older devices also enabled the display in bootloader but did not
> >    take the iommu out of bypass.)  Attaching a DMA or IDENTITY domain
> >    while scanout is active, before the driver has a chance to intervene,
> >    makes things go *boom*
>
> Just to make sure I get this right: The bootloader initializes the SMMU
> and creates non-identity mappings for the GPU? And when the SMMU driver
> in Linux takes over, this breaks display output?

correct

> > +     /*
> > +      * If driver is going to manage iommu directly, then avoid
> > +      * attaching any non driver managed domain.  There could
> > +      * be already active dma underway (ie. scanout in case of
> > +      * bootloader enabled display), and interfering with that
> > +      * will make things go *boom*
> > +      */
> > +     if ((domain->type != IOMMU_DOMAIN_UNMANAGED) &&
> > +         dev->driver && dev->driver->driver_manages_iommu)
> > +             return 0;
> > +
>
> When the default domain is attached, there is usually no driver attached
> yet. I think this needs to be communicated by the firmware to Linux and
> the code should check against that.

At least for the OF case, the attach happens in of_dma_configure(), which
is called from really_probe(), so there is normally a driver.  There are
a few exceptional cases, where drivers call of_dma_configure() on
their own sub-device without a driver attached (hence the need to
check if dev->driver is NULL).

I'm also interested in the ACPI case eventually... the aarch64
"windows" laptops do have ACPI.  But for now we are booting with DT
since there is quite a lot of work before we get to the point of using
ACPI.  (In particular, under Windows, device power management is done
through a Platform Extension Plugin (PEP), but so far Linux has no such
mechanism.)

We really don't have control of the firmware.  But when arm-smmu is
probed it can read back the hw state and figure out what is going on
(with an RFC series[1] from Bjorn which was posted earlier), so we
don't really need to depend on the firmware.

> > -     bool suppress_bind_attrs;       /* disables bind/unbind via sysfs */
> > +     bool suppress_bind_attrs:1;     /* disables bind/unbind via sysfs */
> > +     bool driver_manages_iommu:1;    /* driver manages IOMMU explicitly */
>
> How does this field get set?


It is set in the driver in the second patch[2] in this series.

BR,
-R

[1] https://www.spinics.net/lists/arm-kernel/msg732246.html
[2] https://patchwork.freedesktop.org/patch/315291/

Patch

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 0c674d80c37f..efa0957f9772 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1573,6 +1573,17 @@  static int __iommu_attach_device(struct iommu_domain *domain,
 	    domain->ops->is_attach_deferred(domain, dev))
 		return 0;
 
+	/*
+	 * If driver is going to manage iommu directly, then avoid
+	 * attaching any non driver managed domain.  There could
+	 * be already active dma underway (ie. scanout in case of
+	 * bootloader enabled display), and interfering with that
+	 * will make things go *boom*
+	 */
+	if ((domain->type != IOMMU_DOMAIN_UNMANAGED) &&
+	    dev->driver && dev->driver->driver_manages_iommu)
+		return 0;
+
 	if (unlikely(domain->ops->attach_dev == NULL))
 		return -ENODEV;
 
diff --git a/include/linux/device.h b/include/linux/device.h
index e138baabe01e..d98aa4d3c8c3 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -282,7 +282,8 @@  struct device_driver {
 	struct module		*owner;
 	const char		*mod_name;	/* used for built-in modules */
 
-	bool suppress_bind_attrs;	/* disables bind/unbind via sysfs */
+	bool suppress_bind_attrs:1;	/* disables bind/unbind via sysfs */
+	bool driver_manages_iommu:1;	/* driver manages IOMMU explicitly */
 	enum probe_type probe_type;
 
 	const struct of_device_id	*of_match_table;
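
Editorial note: the check added to __iommu_attach_device() can be hard to
reason about from the diff alone, so here is a minimal userspace sketch of
the decision it makes. The model_* types and model_skip_attach() are
hypothetical stand-ins invented for illustration; they are not the kernel's
struct device, struct device_driver or struct iommu_domain, and the sketch
only mirrors the condition in the patch, not the surrounding attach logic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; NOT the real definitions. */
enum model_domain_type {
	MODEL_DOMAIN_IDENTITY,
	MODEL_DOMAIN_DMA,
	MODEL_DOMAIN_UNMANAGED,	/* domain allocated and managed by the driver */
};

struct model_driver {
	/* One-bit flags, as in the patched struct device_driver. */
	bool suppress_bind_attrs:1;
	bool driver_manages_iommu:1;
};

struct model_device {
	const struct model_driver *driver;	/* NULL until a driver is bound */
};

struct model_domain {
	enum model_domain_type type;
};

/*
 * Mirrors the condition from the patch: skip the attach (the patch returns 0
 * early) only when the domain is not driver-managed AND a driver is already
 * bound AND that driver has declared that it manages the IOMMU itself.
 */
static bool model_skip_attach(const struct model_domain *domain,
			      const struct model_device *dev)
{
	return domain->type != MODEL_DOMAIN_UNMANAGED &&
	       dev->driver != NULL &&
	       dev->driver->driver_manages_iommu;
}
```

Under this model, a DMA or IDENTITY domain is skipped only for a device whose
bound driver sets the flag, and an UNMANAGED domain is always attached. A
device with no driver bound yet is never skipped, which is the case Joerg
raises in comment #3: the check only helps if the attach really does happen
after the driver is bound, as Rob describes for the OF path.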