diff mbox series

[RFC,v3,5/6] dt-bindings: of: Add restricted DMA pool

Message ID 20210106034124.30560-6-tientzu@chromium.org (mailing list archive)
State Superseded
Headers show
Series Restricted DMA | expand

Commit Message

Claire Chang Jan. 6, 2021, 3:41 a.m. UTC
Introduce the new compatible string, restricted-dma-pool, for restricted
DMA. One can specify the address and length of the restricted DMA memory
region by restricted-dma-pool in the device tree.

Signed-off-by: Claire Chang <tientzu@chromium.org>
---
 .../reserved-memory/reserved-memory.txt       | 24 +++++++++++++++++++
 1 file changed, 24 insertions(+)

Comments

Konrad Rzeszutek Wilk Jan. 6, 2021, 6:57 p.m. UTC | #1
On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
> Introduce the new compatible string, restricted-dma-pool, for restricted
> DMA. One can specify the address and length of the restricted DMA memory
> region by restricted-dma-pool in the device tree.
> 
> Signed-off-by: Claire Chang <tientzu@chromium.org>
> ---
>  .../reserved-memory/reserved-memory.txt       | 24 +++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> index e8d3096d922c..44975e2a1fd2 100644
> --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> @@ -51,6 +51,20 @@ compatible (optional) - standard definition
>            used as a shared pool of DMA buffers for a set of devices. It can
>            be used by an operating system to instantiate the necessary pool
>            management subsystem if necessary.
> +        - restricted-dma-pool: This indicates a region of memory meant to be
> +          used as a pool of restricted DMA buffers for a set of devices. The
> +          memory region would be the only region accessible to those devices.
> +          When using this, the no-map and reusable properties must not be set,
> +          so the operating system can create a virtual mapping that will be used
> +          for synchronization. The main purpose for restricted DMA is to
> +          mitigate the lack of DMA access control on systems without an IOMMU,
> +          which could result in the DMA accessing the system memory at
> +          unexpected times and/or unexpected addresses, possibly leading to data
> +          leakage or corruption. The feature on its own provides a basic level
> +          of protection against the DMA overwriting buffer contents at
> +          unexpected times. However, to protect against general data leakage and
> +          system memory corruption, the system needs to provide way to restrict
> +          the DMA to a predefined memory region.

Heya!

I think I am missing something obvious here so please bear with my
questions:

 - This code adds the means of having the SWIOTLB pool tied to a specific
   memory correct?

 - Nothing stops the physical device from bypassing the SWIOTLB buffer.
   That is if an errant device screwed up the length or DMA address, the
   SWIOTLB would gladly do what the device told it do?

 - This has to be combined with SWIOTLB-force-ish to always use the
   bounce buffer, otherwise you could still do DMA without using
   SWIOTLB (by not hitting the criteria for needing to use SWIOTLB)?
Claire Chang Jan. 7, 2021, 5:39 p.m. UTC | #2
On Thu, Jan 7, 2021 at 2:58 AM Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
>
> On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
> > Introduce the new compatible string, restricted-dma-pool, for restricted
> > DMA. One can specify the address and length of the restricted DMA memory
> > region by restricted-dma-pool in the device tree.
> >
> > Signed-off-by: Claire Chang <tientzu@chromium.org>
> > ---
> >  .../reserved-memory/reserved-memory.txt       | 24 +++++++++++++++++++
> >  1 file changed, 24 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > index e8d3096d922c..44975e2a1fd2 100644
> > --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > @@ -51,6 +51,20 @@ compatible (optional) - standard definition
> >            used as a shared pool of DMA buffers for a set of devices. It can
> >            be used by an operating system to instantiate the necessary pool
> >            management subsystem if necessary.
> > +        - restricted-dma-pool: This indicates a region of memory meant to be
> > +          used as a pool of restricted DMA buffers for a set of devices. The
> > +          memory region would be the only region accessible to those devices.
> > +          When using this, the no-map and reusable properties must not be set,
> > +          so the operating system can create a virtual mapping that will be used
> > +          for synchronization. The main purpose for restricted DMA is to
> > +          mitigate the lack of DMA access control on systems without an IOMMU,
> > +          which could result in the DMA accessing the system memory at
> > +          unexpected times and/or unexpected addresses, possibly leading to data
> > +          leakage or corruption. The feature on its own provides a basic level
> > +          of protection against the DMA overwriting buffer contents at
> > +          unexpected times. However, to protect against general data leakage and
> > +          system memory corruption, the system needs to provide way to restrict
> > +          the DMA to a predefined memory region.
>
> Heya!
>
> I think I am missing something obvious here so please bear with my
> questions:
>
>  - This code adds the means of having the SWIOTLB pool tied to a specific
>    memory correct?

It doesn't affect the existing SWIOTLB. It just utilizes the existing SWIOTLB
code to create another DMA pool tied to a specific memory region for a given set
of devices. It bounces the streaming DMA (map/unmap) in and out of that region
and does the memory allocation (dma_direct_alloc) from the same region.

>
>
>  - Nothing stops the physical device from bypassing the SWIOTLB buffer.
>    That is if an errant device screwed up the length or DMA address, the
>    SWIOTLB would gladly do what the device told it do?

So the system needs to provide a way to lock down the memory access, e.g. MPU.

>
>  - This has to be combined with SWIOTLB-force-ish to always use the
>    bounce buffer, otherwise you could still do DMA without using
>    SWIOTLB (by not hitting the criteria for needing to use SWIOTLB)?

Since restricted DMA is for the devices that are not behind an IOMMU, I change
the criteria
`if (unlikely(swiotlb_force == SWIOTLB_FORCE))`
to
`if (unlikely(swiotlb_force == SWIOTLB_FORCE) || dev->dma_io_tlb_mem)`
in dma_direct_map_page().

Also, even if SWIOTLB=force, the restricted DMA pool is preferred if available
(get_io_tlb_mem in https://lore.kernel.org/patchwork/patch/1360995/).

Thanks!
Konrad Rzeszutek Wilk Jan. 7, 2021, 6 p.m. UTC | #3
On Fri, Jan 08, 2021 at 01:39:43AM +0800, Claire Chang wrote:
> On Thu, Jan 7, 2021 at 2:58 AM Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> >
> > On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
> > > Introduce the new compatible string, restricted-dma-pool, for restricted
> > > DMA. One can specify the address and length of the restricted DMA memory
> > > region by restricted-dma-pool in the device tree.
> > >
> > > Signed-off-by: Claire Chang <tientzu@chromium.org>
> > > ---
> > >  .../reserved-memory/reserved-memory.txt       | 24 +++++++++++++++++++
> > >  1 file changed, 24 insertions(+)
> > >
> > > diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > index e8d3096d922c..44975e2a1fd2 100644
> > > --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > @@ -51,6 +51,20 @@ compatible (optional) - standard definition
> > >            used as a shared pool of DMA buffers for a set of devices. It can
> > >            be used by an operating system to instantiate the necessary pool
> > >            management subsystem if necessary.
> > > +        - restricted-dma-pool: This indicates a region of memory meant to be
> > > +          used as a pool of restricted DMA buffers for a set of devices. The
> > > +          memory region would be the only region accessible to those devices.
> > > +          When using this, the no-map and reusable properties must not be set,
> > > +          so the operating system can create a virtual mapping that will be used
> > > +          for synchronization. The main purpose for restricted DMA is to
> > > +          mitigate the lack of DMA access control on systems without an IOMMU,
> > > +          which could result in the DMA accessing the system memory at
> > > +          unexpected times and/or unexpected addresses, possibly leading to data
> > > +          leakage or corruption. The feature on its own provides a basic level
> > > +          of protection against the DMA overwriting buffer contents at
> > > +          unexpected times. However, to protect against general data leakage and
> > > +          system memory corruption, the system needs to provide way to restrict
> > > +          the DMA to a predefined memory region.
> >
> > Heya!
> >
> > I think I am missing something obvious here so please bear with my
> > questions:
> >
> >  - This code adds the means of having the SWIOTLB pool tied to a specific
> >    memory correct?
> 
> It doesn't affect the existing SWIOTLB. It just utilizes the existing SWIOTLB
> code to create another DMA pool tied to a specific memory region for a given set
> of devices. It bounces the streaming DMA (map/unmap) in and out of that region
> and does the memory allocation (dma_direct_alloc) from the same region.

Right, so why can't it follow the same mechanism that Xen SWIOTLB does - which
had exactly the same problem (needed special handling on the pool) - and do
a similar code?

> 
> >
> >
> >  - Nothing stops the physical device from bypassing the SWIOTLB buffer.
> >    That is if an errant device screwed up the length or DMA address, the
> >    SWIOTLB would gladly do what the device told it do?
> 
> So the system needs to provide a way to lock down the memory access, e.g. MPU.

OK! Would it be prudent to have this in the description above perhaps?
> 
> >
> >  - This has to be combined with SWIOTLB-force-ish to always use the
> >    bounce buffer, otherwise you could still do DMA without using
> >    SWIOTLB (by not hitting the criteria for needing to use SWIOTLB)?
> 
> Since restricted DMA is for the devices that are not behind an IOMMU, I change
> the criteria
> `if (unlikely(swiotlb_force == SWIOTLB_FORCE))`
> to
> `if (unlikely(swiotlb_force == SWIOTLB_FORCE) || dev->dma_io_tlb_mem)`
> in dma_direct_map_page().
> 
> Also, even if SWIOTLB=force, the restricted DMA pool is preferred if available
> (get_io_tlb_mem in https://lore.kernel.org/patchwork/patch/1360995/).
> 
> Thanks!
Florian Fainelli Jan. 7, 2021, 6:14 p.m. UTC | #4
On 1/7/21 10:00 AM, Konrad Rzeszutek Wilk wrote:
>>>
>>>
>>>  - Nothing stops the physical device from bypassing the SWIOTLB buffer.
>>>    That is if an errant device screwed up the length or DMA address, the
>>>    SWIOTLB would gladly do what the device told it do?
>>
>> So the system needs to provide a way to lock down the memory access, e.g. MPU.
> 
> OK! Would it be prudent to have this in the description above perhaps?

Yes this is something that must be documented as a requirement for the
restricted DMA pool users, otherwise attempting to do restricted DMA
pool is no different than say, using a device private CMA region.
Without the enforcement, this is just a best effort.
Claire Chang Jan. 12, 2021, 7:47 a.m. UTC | #5
On Fri, Jan 8, 2021 at 2:15 AM Florian Fainelli <f.fainelli@gmail.com> wrote:
>
> On 1/7/21 10:00 AM, Konrad Rzeszutek Wilk wrote:
> >>>
> >>>
> >>>  - Nothing stops the physical device from bypassing the SWIOTLB buffer.
> >>>    That is if an errant device screwed up the length or DMA address, the
> >>>    SWIOTLB would gladly do what the device told it do?
> >>
> >> So the system needs to provide a way to lock down the memory access, e.g. MPU.
> >
> > OK! Would it be prudent to have this in the description above perhaps?
>
> Yes this is something that must be documented as a requirement for the
> restricted DMA pool users, otherwise attempting to do restricted DMA
> pool is no different than say, using a device private CMA region.
> Without the enforcement, this is just a best effort.

Will add in the next version.

> --
> Florian
Rob Herring Jan. 20, 2021, 4:53 p.m. UTC | #6
On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
> Introduce the new compatible string, restricted-dma-pool, for restricted
> DMA. One can specify the address and length of the restricted DMA memory
> region by restricted-dma-pool in the device tree.

If this goes into DT, I think we should be able to use dma-ranges for 
this purpose instead. Normally, 'dma-ranges' is for physical bus 
restrictions, but there's no reason it can't be used for policy or to 
express restrictions the firmware has enabled.

> Signed-off-by: Claire Chang <tientzu@chromium.org>
> ---
>  .../reserved-memory/reserved-memory.txt       | 24 +++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> index e8d3096d922c..44975e2a1fd2 100644
> --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> @@ -51,6 +51,20 @@ compatible (optional) - standard definition
>            used as a shared pool of DMA buffers for a set of devices. It can
>            be used by an operating system to instantiate the necessary pool
>            management subsystem if necessary.
> +        - restricted-dma-pool: This indicates a region of memory meant to be
> +          used as a pool of restricted DMA buffers for a set of devices. The
> +          memory region would be the only region accessible to those devices.
> +          When using this, the no-map and reusable properties must not be set,
> +          so the operating system can create a virtual mapping that will be used
> +          for synchronization. The main purpose for restricted DMA is to
> +          mitigate the lack of DMA access control on systems without an IOMMU,
> +          which could result in the DMA accessing the system memory at
> +          unexpected times and/or unexpected addresses, possibly leading to data
> +          leakage or corruption. The feature on its own provides a basic level
> +          of protection against the DMA overwriting buffer contents at
> +          unexpected times. However, to protect against general data leakage and
> +          system memory corruption, the system needs to provide way to restrict
> +          the DMA to a predefined memory region.
>          - vendor specific string in the form <vendor>,[<device>-]<usage>
>  no-map (optional) - empty property
>      - Indicates the operating system must not create a virtual mapping
> @@ -120,6 +134,11 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
>  			compatible = "acme,multimedia-memory";
>  			reg = <0x77000000 0x4000000>;
>  		};
> +
> +		restricted_dma_mem_reserved: restricted_dma_mem_reserved {
> +			compatible = "restricted-dma-pool";
> +			reg = <0x50000000 0x400000>;
> +		};
>  	};
>  
>  	/* ... */
> @@ -138,4 +157,9 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
>  		memory-region = <&multimedia_reserved>;
>  		/* ... */
>  	};
> +
> +	pcie_device: pcie_device@0,0 {
> +		memory-region = <&restricted_dma_mem_reserved>;

PCI hosts often have inbound window configurations that limit the 
address range and translate PCI to bus addresses. Those windows happen 
to be configured by dma-ranges. In any case, wouldn't you want to put 
the configuration in the PCI host node? Is there a usecase of 
restricting one PCIe device and not another? 

Rob
Robin Murphy Jan. 20, 2021, 5:30 p.m. UTC | #7
On 2021-01-20 16:53, Rob Herring wrote:
> On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
>> Introduce the new compatible string, restricted-dma-pool, for restricted
>> DMA. One can specify the address and length of the restricted DMA memory
>> region by restricted-dma-pool in the device tree.
> 
> If this goes into DT, I think we should be able to use dma-ranges for
> this purpose instead. Normally, 'dma-ranges' is for physical bus
> restrictions, but there's no reason it can't be used for policy or to
> express restrictions the firmware has enabled.

There would still need to be some way to tell SWIOTLB to pick up the 
corresponding chunk of memory and to prevent the kernel from using it 
for anything else, though.

>> Signed-off-by: Claire Chang <tientzu@chromium.org>
>> ---
>>   .../reserved-memory/reserved-memory.txt       | 24 +++++++++++++++++++
>>   1 file changed, 24 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
>> index e8d3096d922c..44975e2a1fd2 100644
>> --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
>> +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
>> @@ -51,6 +51,20 @@ compatible (optional) - standard definition
>>             used as a shared pool of DMA buffers for a set of devices. It can
>>             be used by an operating system to instantiate the necessary pool
>>             management subsystem if necessary.
>> +        - restricted-dma-pool: This indicates a region of memory meant to be
>> +          used as a pool of restricted DMA buffers for a set of devices. The
>> +          memory region would be the only region accessible to those devices.
>> +          When using this, the no-map and reusable properties must not be set,
>> +          so the operating system can create a virtual mapping that will be used
>> +          for synchronization. The main purpose for restricted DMA is to
>> +          mitigate the lack of DMA access control on systems without an IOMMU,
>> +          which could result in the DMA accessing the system memory at
>> +          unexpected times and/or unexpected addresses, possibly leading to data
>> +          leakage or corruption. The feature on its own provides a basic level
>> +          of protection against the DMA overwriting buffer contents at
>> +          unexpected times. However, to protect against general data leakage and
>> +          system memory corruption, the system needs to provide way to restrict
>> +          the DMA to a predefined memory region.
>>           - vendor specific string in the form <vendor>,[<device>-]<usage>
>>   no-map (optional) - empty property
>>       - Indicates the operating system must not create a virtual mapping
>> @@ -120,6 +134,11 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
>>   			compatible = "acme,multimedia-memory";
>>   			reg = <0x77000000 0x4000000>;
>>   		};
>> +
>> +		restricted_dma_mem_reserved: restricted_dma_mem_reserved {
>> +			compatible = "restricted-dma-pool";
>> +			reg = <0x50000000 0x400000>;
>> +		};
>>   	};
>>   
>>   	/* ... */
>> @@ -138,4 +157,9 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
>>   		memory-region = <&multimedia_reserved>;
>>   		/* ... */
>>   	};
>> +
>> +	pcie_device: pcie_device@0,0 {
>> +		memory-region = <&restricted_dma_mem_reserved>;
> 
> PCI hosts often have inbound window configurations that limit the
> address range and translate PCI to bus addresses. Those windows happen
> to be configured by dma-ranges. In any case, wouldn't you want to put
> the configuration in the PCI host node? Is there a usecase of
> restricting one PCIe device and not another?

The general design seems to accommodate devices having their own pools 
such that they can't even snoop on each others' transient DMA data. If 
the interconnect had a way of wiring up, say, PCI RIDs to AMBA NSAIDs, 
then in principle you could certainly apply that to PCI endpoints too 
(presumably you'd also disallow them from peer-to-peer transactions at 
the PCI level too).

Robin.
Rob Herring Jan. 20, 2021, 9:31 p.m. UTC | #8
On Wed, Jan 20, 2021 at 11:30 AM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2021-01-20 16:53, Rob Herring wrote:
> > On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
> >> Introduce the new compatible string, restricted-dma-pool, for restricted
> >> DMA. One can specify the address and length of the restricted DMA memory
> >> region by restricted-dma-pool in the device tree.
> >
> > If this goes into DT, I think we should be able to use dma-ranges for
> > this purpose instead. Normally, 'dma-ranges' is for physical bus
> > restrictions, but there's no reason it can't be used for policy or to
> > express restrictions the firmware has enabled.
>
> There would still need to be some way to tell SWIOTLB to pick up the
> corresponding chunk of memory and to prevent the kernel from using it
> for anything else, though.

Don't we already have that problem if dma-ranges had a very small
range? We just get lucky because the restriction is generally much
more RAM than needed.

In any case, wouldn't finding all the dma-ranges do this? We're
already walking the tree to find the max DMA address now.

> >> Signed-off-by: Claire Chang <tientzu@chromium.org>
> >> ---
> >>   .../reserved-memory/reserved-memory.txt       | 24 +++++++++++++++++++
> >>   1 file changed, 24 insertions(+)
> >>
> >> diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> >> index e8d3096d922c..44975e2a1fd2 100644
> >> --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> >> +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> >> @@ -51,6 +51,20 @@ compatible (optional) - standard definition
> >>             used as a shared pool of DMA buffers for a set of devices. It can
> >>             be used by an operating system to instantiate the necessary pool
> >>             management subsystem if necessary.
> >> +        - restricted-dma-pool: This indicates a region of memory meant to be
> >> +          used as a pool of restricted DMA buffers for a set of devices. The
> >> +          memory region would be the only region accessible to those devices.
> >> +          When using this, the no-map and reusable properties must not be set,
> >> +          so the operating system can create a virtual mapping that will be used
> >> +          for synchronization. The main purpose for restricted DMA is to
> >> +          mitigate the lack of DMA access control on systems without an IOMMU,
> >> +          which could result in the DMA accessing the system memory at
> >> +          unexpected times and/or unexpected addresses, possibly leading to data
> >> +          leakage or corruption. The feature on its own provides a basic level
> >> +          of protection against the DMA overwriting buffer contents at
> >> +          unexpected times. However, to protect against general data leakage and
> >> +          system memory corruption, the system needs to provide way to restrict
> >> +          the DMA to a predefined memory region.
> >>           - vendor specific string in the form <vendor>,[<device>-]<usage>
> >>   no-map (optional) - empty property
> >>       - Indicates the operating system must not create a virtual mapping
> >> @@ -120,6 +134,11 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
> >>                      compatible = "acme,multimedia-memory";
> >>                      reg = <0x77000000 0x4000000>;
> >>              };
> >> +
> >> +            restricted_dma_mem_reserved: restricted_dma_mem_reserved {
> >> +                    compatible = "restricted-dma-pool";
> >> +                    reg = <0x50000000 0x400000>;
> >> +            };
> >>      };
> >>
> >>      /* ... */
> >> @@ -138,4 +157,9 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
> >>              memory-region = <&multimedia_reserved>;
> >>              /* ... */
> >>      };
> >> +
> >> +    pcie_device: pcie_device@0,0 {
> >> +            memory-region = <&restricted_dma_mem_reserved>;
> >
> > PCI hosts often have inbound window configurations that limit the
> > address range and translate PCI to bus addresses. Those windows happen
> > to be configured by dma-ranges. In any case, wouldn't you want to put
> > the configuration in the PCI host node? Is there a usecase of
> > restricting one PCIe device and not another?
>
> The general design seems to accommodate devices having their own pools
> such that they can't even snoop on each others' transient DMA data. If
> the interconnect had a way of wiring up, say, PCI RIDs to AMBA NSAIDs,
> then in principle you could certainly apply that to PCI endpoints too
> (presumably you'd also disallow them from peer-to-peer transactions at
> the PCI level too).

At least for PCI, I think we can handle this. We have the BDF in the
3rd address cell in dma-ranges. The Openfirmware spec says those are 0
in the case of ranges. It doesn't talk about dma-ranges though. But I
think we could extend it to allow for BDF. Though typically with PCIe
every device is behind its own bridge and each bridge node can have a
dma-ranges.

Rob
Robin Murphy Jan. 21, 2021, 1:09 a.m. UTC | #9
On 2021-01-20 21:31, Rob Herring wrote:
> On Wed, Jan 20, 2021 at 11:30 AM Robin Murphy <robin.murphy@arm.com> wrote:
>>
>> On 2021-01-20 16:53, Rob Herring wrote:
>>> On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
>>>> Introduce the new compatible string, restricted-dma-pool, for restricted
>>>> DMA. One can specify the address and length of the restricted DMA memory
>>>> region by restricted-dma-pool in the device tree.
>>>
>>> If this goes into DT, I think we should be able to use dma-ranges for
>>> this purpose instead. Normally, 'dma-ranges' is for physical bus
>>> restrictions, but there's no reason it can't be used for policy or to
>>> express restrictions the firmware has enabled.
>>
>> There would still need to be some way to tell SWIOTLB to pick up the
>> corresponding chunk of memory and to prevent the kernel from using it
>> for anything else, though.
> 
> Don't we already have that problem if dma-ranges had a very small
> range? We just get lucky because the restriction is generally much
> more RAM than needed.

Not really - if a device has a naturally tiny addressing capability that 
doesn't even cover ZONE_DMA32 where the regular SWIOTLB buffer will be 
allocated then it's unlikely to work well, but that's just crap system 
design. Yes, memory pressure in ZONE_DMA{32} is particularly problematic 
for such limited devices, but it's irrelevant to the issue at hand here.

What we have here is a device that's not allowed to see *kernel* memory 
at all. It's been artificially constrained to a particular region by a 
TZASC or similar, and the only data which should ever be placed in that 
region is data intended for that device to see. That way if it tries to 
go rogue it physically can't start slurping data intended for other 
devices or not mapped for DMA at all. The bouncing is an important part 
of this - I forget the title off-hand but there was an interesting paper 
a few years ago which demonstrated that even with an IOMMU, streaming 
DMA of in-place buffers could reveal enough adjacent data from the same 
page to mount an attack on the system. Memory pressure should be 
immaterial since the size of each bounce pool carveout will presumably 
be tuned for the needs of the given device.

> In any case, wouldn't finding all the dma-ranges do this? We're
> already walking the tree to find the max DMA address now.

If all you can see are two "dma-ranges" properties, how do you propose 
to tell that one means "this is the extent of what I can address, please 
set my masks and dma-range-map accordingly and try to allocate things 
where I can reach them" while the other means "take this output range 
away from the page allocator and hook it up as my dedicated bounce pool, 
because it is Serious Security Time"? Especially since getting that 
choice wrong either way would be a Bad Thing.

Robin.

>>>> Signed-off-by: Claire Chang <tientzu@chromium.org>
>>>> ---
>>>>    .../reserved-memory/reserved-memory.txt       | 24 +++++++++++++++++++
>>>>    1 file changed, 24 insertions(+)
>>>>
>>>> diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
>>>> index e8d3096d922c..44975e2a1fd2 100644
>>>> --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
>>>> +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
>>>> @@ -51,6 +51,20 @@ compatible (optional) - standard definition
>>>>              used as a shared pool of DMA buffers for a set of devices. It can
>>>>              be used by an operating system to instantiate the necessary pool
>>>>              management subsystem if necessary.
>>>> +        - restricted-dma-pool: This indicates a region of memory meant to be
>>>> +          used as a pool of restricted DMA buffers for a set of devices. The
>>>> +          memory region would be the only region accessible to those devices.
>>>> +          When using this, the no-map and reusable properties must not be set,
>>>> +          so the operating system can create a virtual mapping that will be used
>>>> +          for synchronization. The main purpose for restricted DMA is to
>>>> +          mitigate the lack of DMA access control on systems without an IOMMU,
>>>> +          which could result in the DMA accessing the system memory at
>>>> +          unexpected times and/or unexpected addresses, possibly leading to data
>>>> +          leakage or corruption. The feature on its own provides a basic level
>>>> +          of protection against the DMA overwriting buffer contents at
>>>> +          unexpected times. However, to protect against general data leakage and
>>>> +          system memory corruption, the system needs to provide way to restrict
>>>> +          the DMA to a predefined memory region.
>>>>            - vendor specific string in the form <vendor>,[<device>-]<usage>
>>>>    no-map (optional) - empty property
>>>>        - Indicates the operating system must not create a virtual mapping
>>>> @@ -120,6 +134,11 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
>>>>                       compatible = "acme,multimedia-memory";
>>>>                       reg = <0x77000000 0x4000000>;
>>>>               };
>>>> +
>>>> +            restricted_dma_mem_reserved: restricted_dma_mem_reserved {
>>>> +                    compatible = "restricted-dma-pool";
>>>> +                    reg = <0x50000000 0x400000>;
>>>> +            };
>>>>       };
>>>>
>>>>       /* ... */
>>>> @@ -138,4 +157,9 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
>>>>               memory-region = <&multimedia_reserved>;
>>>>               /* ... */
>>>>       };
>>>> +
>>>> +    pcie_device: pcie_device@0,0 {
>>>> +            memory-region = <&restricted_dma_mem_reserved>;
>>>
>>> PCI hosts often have inbound window configurations that limit the
>>> address range and translate PCI to bus addresses. Those windows happen
>>> to be configured by dma-ranges. In any case, wouldn't you want to put
>>> the configuration in the PCI host node? Is there a usecase of
>>> restricting one PCIe device and not another?
>>
>> The general design seems to accommodate devices having their own pools
>> such that they can't even snoop on each others' transient DMA data. If
>> the interconnect had a way of wiring up, say, PCI RIDs to AMBA NSAIDs,
>> then in principle you could certainly apply that to PCI endpoints too
>> (presumably you'd also disallow them from peer-to-peer transactions at
>> the PCI level too).
> 
> At least for PCI, I think we can handle this. We have the BDF in the
> 3rd address cell in dma-ranges. The Openfirmware spec says those are 0
> in the case of ranges. It doesn't talk about dma-ranges though. But I
> think we could extend it to allow for BDF. Though typically with PCIe
> every device is behind its own bridge and each bridge node can have a
> dma-ranges.
> 
> Rob
>
Rob Herring Jan. 21, 2021, 3:48 p.m. UTC | #10
On Wed, Jan 20, 2021 at 7:10 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2021-01-20 21:31, Rob Herring wrote:
> > On Wed, Jan 20, 2021 at 11:30 AM Robin Murphy <robin.murphy@arm.com> wrote:
> >>
> >> On 2021-01-20 16:53, Rob Herring wrote:
> >>> On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
> >>>> Introduce the new compatible string, restricted-dma-pool, for restricted
> >>>> DMA. One can specify the address and length of the restricted DMA memory
> >>>> region by restricted-dma-pool in the device tree.
> >>>
> >>> If this goes into DT, I think we should be able to use dma-ranges for
> >>> this purpose instead. Normally, 'dma-ranges' is for physical bus
> >>> restrictions, but there's no reason it can't be used for policy or to
> >>> express restrictions the firmware has enabled.
> >>
> >> There would still need to be some way to tell SWIOTLB to pick up the
> >> corresponding chunk of memory and to prevent the kernel from using it
> >> for anything else, though.
> >
> > Don't we already have that problem if dma-ranges had a very small
> > range? We just get lucky because the restriction is generally much
> > more RAM than needed.
>
> Not really - if a device has a naturally tiny addressing capability that
> doesn't even cover ZONE_DMA32 where the regular SWIOTLB buffer will be
> allocated then it's unlikely to work well, but that's just crap system
> design. Yes, memory pressure in ZONE_DMA{32} is particularly problematic
> for such limited devices, but it's irrelevant to the issue at hand here.

Yesterday's crap system design is today's security feature. Couldn't
this feature make crap system design work better?

> What we have here is a device that's not allowed to see *kernel* memory
> at all. It's been artificially constrained to a particular region by a
> TZASC or similar, and the only data which should ever be placed in that

May have been constrained, but that's entirely optional.

In the optional case where the setup is entirely up to the OS, I don't
think this belongs in the DT at all. Perhaps that should be solved
first.

> region is data intended for that device to see. That way if it tries to
> go rogue it physically can't start slurping data intended for other
> devices or not mapped for DMA at all. The bouncing is an important part
> of this - I forget the title off-hand but there was an interesting paper
> a few years ago which demonstrated that even with an IOMMU, streaming
> DMA of in-place buffers could reveal enough adjacent data from the same
> page to mount an attack on the system. Memory pressure should be
> immaterial since the size of each bounce pool carveout will presumably
> be tuned for the needs of the given device.
>
> > In any case, wouldn't finding all the dma-ranges do this? We're
> > already walking the tree to find the max DMA address now.
>
> If all you can see are two "dma-ranges" properties, how do you propose
> to tell that one means "this is the extent of what I can address, please
> set my masks and dma-range-map accordingly and try to allocate things
> where I can reach them" while the other means "take this output range
> away from the page allocator and hook it up as my dedicated bounce pool,
> because it is Serious Security Time"? Especially since getting that
> choice wrong either way would be a Bad Thing.

Either we have some heuristic based on the size or we add some hint.
The point is let's build on what we already have for defining DMA
accessible memory in DT rather than some parallel mechanism.

Rob
Robin Murphy Jan. 21, 2021, 5:29 p.m. UTC | #11
On 2021-01-21 15:48, Rob Herring wrote:
> On Wed, Jan 20, 2021 at 7:10 PM Robin Murphy <robin.murphy@arm.com>
> wrote:
>> 
>> On 2021-01-20 21:31, Rob Herring wrote:
>>> On Wed, Jan 20, 2021 at 11:30 AM Robin Murphy
>>> <robin.murphy@arm.com> wrote:
>>>> 
>>>> On 2021-01-20 16:53, Rob Herring wrote:
>>>>> On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang
>>>>> wrote:
>>>>>> Introduce the new compatible string, restricted-dma-pool,
>>>>>> for restricted DMA. One can specify the address and length
>>>>>> of the restricted DMA memory region by restricted-dma-pool
>>>>>> in the device tree.
>>>>> 
>>>>> If this goes into DT, I think we should be able to use
>>>>> dma-ranges for this purpose instead. Normally, 'dma-ranges'
>>>>> is for physical bus restrictions, but there's no reason it
>>>>> can't be used for policy or to express restrictions the
>>>>> firmware has enabled.
>>>> 
>>>> There would still need to be some way to tell SWIOTLB to pick
>>>> up the corresponding chunk of memory and to prevent the kernel
>>>> from using it for anything else, though.
>>> 
>>> Don't we already have that problem if dma-ranges had a very
>>> small range? We just get lucky because the restriction is
>>> generally much more RAM than needed.
>> 
>> Not really - if a device has a naturally tiny addressing capability
>> that doesn't even cover ZONE_DMA32 where the regular SWIOTLB buffer
>> will be allocated then it's unlikely to work well, but that's just
>> crap system design. Yes, memory pressure in ZONE_DMA{32} is
>> particularly problematic for such limited devices, but it's
>> irrelevant to the issue at hand here.
> 
> Yesterday's crap system design is today's security feature. Couldn't 
> this feature make crap system design work better?

Indeed! Say you bring out your shiny new "Strawberry Flan 4" machine
with all the latest connectivity, but tragically its PCIe can only
address 25% of the RAM. So you decide to support deploying it in two
configurations: one where it runs normally for best performance, and
another "secure" one where it dedicates that quarter of RAM as a 
restricted DMA pool for any PCIe devices - that way, even if that hotel 
projector you plug in turns out to be a rogue Thunderbolt endpoint, it 
can never snarf your private keys off your eMMC out of the page cache.

(Yes, is is the thinnest of strawmen, but it sets the scene for the 
point you raised...)

...which is that in both cases the dma-ranges will still be identical. 
So how is the kernel going to know whether to steal that whole area from 
memblock before anything else can allocate from it, or not?

I don't disagree that even in Claire's original intended case it would 
be semantically correct to describe the hardware-firewalled region with 
dma-ranges. It just turns out not to be necessary, and you're already 
arguing for not adding anything in DT that doesn't need to be.

>> What we have here is a device that's not allowed to see *kernel*
>> memory at all. It's been artificially constrained to a particular
>> region by a TZASC or similar, and the only data which should ever
>> be placed in that
> 
> May have been constrained, but that's entirely optional.
> 
> In the optional case where the setup is entirely up to the OS, I
> don't think this belongs in the DT at all. Perhaps that should be
> solved first.

Yes! Let's definitely consider that case! Say you don't have any 
security or physical limitations but want to use a bounce pool for some 
device anyway because reasons (perhaps copying streaming DMA data to a 
better guaranteed alignment gives an overall performance win). Now the 
*only* relevant thing to communicate to the kernel is to, ahem, reserve 
a large chunk of memory, and use it for this special purpose. Isn't that 
literally what reserved-memory bindings are for?

>> region is data intended for that device to see. That way if it
>> tries to go rogue it physically can't start slurping data intended
>> for other devices or not mapped for DMA at all. The bouncing is an
>> important part of this - I forget the title off-hand but there was
>> an interesting paper a few years ago which demonstrated that even
>> with an IOMMU, streaming DMA of in-place buffers could reveal
>> enough adjacent data from the same page to mount an attack on the
>> system. Memory pressure should be immaterial since the size of each
>> bounce pool carveout will presumably be tuned for the needs of the
>> given device.
>> 
>>> In any case, wouldn't finding all the dma-ranges do this? We're 
>>> already walking the tree to find the max DMA address now.
>> 
>> If all you can see are two "dma-ranges" properties, how do you
>> propose to tell that one means "this is the extent of what I can
>> address, please set my masks and dma-range-map accordingly and try
>> to allocate things where I can reach them" while the other means
>> "take this output range away from the page allocator and hook it up
>> as my dedicated bounce pool, because it is Serious Security Time"?
>> Especially since getting that choice wrong either way would be a
>> Bad Thing.
> 
> Either we have some heuristic based on the size or we add some hint. 
> The point is let's build on what we already have for defining DMA 
> accessible memory in DT rather than some parallel mechanism.

The point I'm trying to bang home is that it's really not about the DMA 
accessibility, it's about the purpose of the memory itself. Even when 
DMA accessibility *is* relevant it's already implied by that purpose, 
from the point of view of the implementation. The only difference it 
might make is to the end user if they want to ascertain whether the 
presence of such a pool represents protection against an untrusted 
device or just some DMA optimisation tweak.

Robin.
diff mbox series

Patch

diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
index e8d3096d922c..44975e2a1fd2 100644
--- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
+++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
@@ -51,6 +51,20 @@  compatible (optional) - standard definition
           used as a shared pool of DMA buffers for a set of devices. It can
           be used by an operating system to instantiate the necessary pool
           management subsystem if necessary.
+        - restricted-dma-pool: This indicates a region of memory meant to be
+          used as a pool of restricted DMA buffers for a set of devices. The
+          memory region would be the only region accessible to those devices.
+          When using this, the no-map and reusable properties must not be set,
+          so the operating system can create a virtual mapping that will be used
+          for synchronization. The main purpose for restricted DMA is to
+          mitigate the lack of DMA access control on systems without an IOMMU,
+          which could result in the DMA accessing the system memory at
+          unexpected times and/or unexpected addresses, possibly leading to data
+          leakage or corruption. The feature on its own provides a basic level
+          of protection against the DMA overwriting buffer contents at
+          unexpected times. However, to protect against general data leakage and
+          system memory corruption, the system needs to provide way to restrict
+          the DMA to a predefined memory region.
         - vendor specific string in the form <vendor>,[<device>-]<usage>
 no-map (optional) - empty property
     - Indicates the operating system must not create a virtual mapping
@@ -120,6 +134,11 @@  one for multimedia processing (named multimedia-memory@77000000, 64MiB).
 			compatible = "acme,multimedia-memory";
 			reg = <0x77000000 0x4000000>;
 		};
+
+		restricted_dma_mem_reserved: restricted_dma_mem_reserved {
+			compatible = "restricted-dma-pool";
+			reg = <0x50000000 0x400000>;
+		};
 	};
 
 	/* ... */
@@ -138,4 +157,9 @@  one for multimedia processing (named multimedia-memory@77000000, 64MiB).
 		memory-region = <&multimedia_reserved>;
 		/* ... */
 	};
+
+	pcie_device: pcie_device@0,0 {
+		memory-region = <&restricted_dma_mem_reserved>;
+		/* ... */
+	};
 };