[RFC v2 0/2] dmapool: Mitigate device-controllable mem. corruption

Message ID: 20241119205529.3871048-1-bjohannesmeyer@gmail.com

Message

Brian Johannesmeyer Nov. 19, 2024, 8:55 p.m. UTC
We discovered a security-related issue in the DMA pool allocator.

V1 of our RFC was submitted to the Linux kernel security team. They
recommended submitting it to the relevant subsystem maintainers and the
hardening mailing list instead, as they did not consider this an explicit
security issue. Their rationale was that Linux implicitly assumes hardware
can be trusted.

**Threat model**. While Linux drivers typically trust their hardware, some
drivers may not operate under this assumption. This threat model therefore
assumes a malicious peripheral device capable of corrupting DMA data in
order to exploit the kernel. In this scenario, the device manipulates
kernel-initialized data (similar to the attack described in the Thunderclap
paper [0]) to achieve arbitrary kernel memory corruption.

**DMA pool background**. A DMA pool aims to reduce the overhead of DMA
allocations by creating a large DMA buffer --- the "pool" --- from which
smaller buffers are allocated as needed. Fundamentally, a DMA pool
functions like a heap: it is a structure composed of linked memory
"blocks", which, in this context, are DMA buffers. When a driver employs a
DMA pool, it grants the device access not only to these blocks but also to
the pointers linking them.
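
For reference, the relevant pieces of the current allocator look roughly
like the sketch below (simplified from mm/dmapool.c, not verbatim; field
names may differ slightly between kernel versions). The key point is that
`struct dma_block` is stored at the start of each free block, i.e., inside
the DMA-mapped memory, so the device can rewrite `next_block`:
```
/* Simplified sketch of the existing layout in mm/dmapool.c (not verbatim). */
struct dma_block {
	struct dma_block *next_block;	/* free-list link, lives IN the DMA buffer */
	dma_addr_t dma;			/* bus address of this block */
};

/* Roughly how an allocation pops the free list: the pool head follows a
 * pointer that resides in device-writable memory. */
static struct dma_block *pool_block_pop(struct dma_pool *pool)
{
	struct dma_block *block = pool->next_block;

	if (block) {
		pool->next_block = block->next_block;	/* device-controllable */
		pool->nr_active++;
	}
	return block;
}
```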

**Vulnerability**. Similar to traditional heap corruption vulnerabilities
--- where a malicious program corrupts heap metadata to, e.g., hijack
control flow --- a malicious device may corrupt DMA pool metadata. Such
corruption can trivially lead to arbitrary kernel memory corruption in any
driver that uses the allocator. Because the DMA pool API is used
extensively, the issue is not confined to a single instance: every user of
the API is potentially vulnerable. An exploit proceeds with the following
steps (a rough code sketch follows the list):

1. The DMA `pool` initializes its list of blocks, then points to the first
block.
2. The malicious device overwrites the first 8 bytes of the first block ---
which contain its `next_block` pointer --- with an arbitrary kernel address,
`kernel_addr`.
3. The driver makes its first call to `dma_pool_alloc()`, after which the
pool should point to the second block. Instead, it points to
`kernel_addr`.
4. The driver again calls `dma_pool_alloc()`, which incorrectly returns
`kernel_addr`. Therefore, anytime the driver writes to this "block", it may
corrupt sensitive kernel data.
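
For concreteness, a hypothetical driver-side sequence along these lines is
sketched below. The device handle, pool parameters, and `kernel_addr` are
placeholders, the initial alloc/free pair simply primes the pool's free
list, and the device-side DMA write of step 2 appears only as a comment:
```
struct dma_pool *pool = dma_pool_create("example", dev, 64, 64, 0);
dma_addr_t dma;

/* Step 1: the first allocation maps a page and links its blocks into the
 * free list; freeing the buffer puts block 1 back at the head, so
 * pool->next_block points into device-writable memory. */
void *buf = dma_pool_alloc(pool, GFP_KERNEL, &dma);
dma_pool_free(pool, buf, dma);

/* Step 2 (device side): the device DMA-writes 8 bytes at the start of
 * block 1, replacing its next_block pointer with kernel_addr. */

/* Step 3: this allocation hands out block 1 and leaves
 * pool->next_block == kernel_addr instead of block 2. */
buf = dma_pool_alloc(pool, GFP_KERNEL, &dma);

/* Step 4: this allocation returns kernel_addr itself; any write the
 * driver makes through it corrupts arbitrary kernel memory. */
void *corrupted = dma_pool_alloc(pool, GFP_KERNEL, &dma);
```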

I have a PDF document that illustrates how these steps work. Please let me
know if you would like me to share it with you.

**Proposed mitigation**. To mitigate the corruption of DMA pool metadata
(i.e., the pointers linking the blocks), the metadata should be moved into
non-DMA memory, ensuring it cannot be altered by a device. I have included
a patch series that implements this change. Since I am not deeply familiar
with the DMA pool internals, I would appreciate any feedback on the
patches. I have tested the patches with the `DMAPOOL_TEST` test and my own
basic unit tests that ensure the DMA pool allocator is not vulnerable.
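
Roughly, the patches move the block descriptors out of the DMA buffers; a
simplified sketch of the resulting layout is below (not the exact code from
the series; in particular the `vaddr` member name is illustrative):
```
/*
 * Simplified sketch of the patched layout (not the exact code from the
 * series): the block descriptors live in ordinary, non-DMA kernel memory,
 * and each one carries a pointer back to its data block.
 */
struct dma_block {
	struct dma_block *next_block;	/* free-list link, no longer device-visible */
	dma_addr_t dma;			/* bus address of the data block */
	void *vaddr;			/* kernel virtual address of the data block */
};

/*
 * dma_pool_alloc() still pops pool->next_block, but now returns
 * block->vaddr instead of the descriptor itself; dma_pool_free() looks
 * the descriptor up from the returned address. A device write to the
 * data block can no longer redirect the free list.
 */
```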

**Performance**. I evaluated the patch set's performance by running the
`DMAPOOL_TEST` test with `DMAPOOL_DEBUG` enabled and with/without the
patches applied. Here is its output *without* the patches applied:
```
dmapool test: size:16   align:16   blocks:8192 time:3194110
dmapool test: size:64   align:64   blocks:8192 time:4730440
dmapool test: size:256  align:256  blocks:8192 time:5489630
dmapool test: size:1024 align:1024 blocks:2048 time:517150
dmapool test: size:4096 align:4096 blocks:1024 time:399616
dmapool test: size:68   align:32   blocks:8192 time:6156527
```

And here is its output *with* the patches applied:
```
dmapool test: size:16   align:16   blocks:8192 time:3541031
dmapool test: size:64   align:64   blocks:8192 time:4227262
dmapool test: size:256  align:256  blocks:8192 time:4890273
dmapool test: size:1024 align:1024 blocks:2048 time:515775
dmapool test: size:4096 align:4096 blocks:1024 time:523096
dmapool test: size:68   align:32   blocks:8192 time:3450830
```

Based on my interpretation of the output, the patch set does not appear to
negatively impact performance. In fact, it shows performance improvements
in several of the tests (sizes 64, 256, 1024, and 68), most notably for
size 68.

I speculate that these performance gains may be due to improved spatial
locality of the `next_block` pointers. With the patches applied, the
`next_block` pointers are consistently spaced 24 bytes apart, matching the
new size of `struct dma_block`. Previously, the spacing between
`next_block` pointers depended on the block size, so for 1024-byte blocks,
the pointers were spaced 1024 bytes apart. However, I am still unsure why
the performance improvement for 68-byte blocks is so significant.
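
A back-of-the-envelope comparison (assuming the patched metadata is kept as
a dense array of 24-byte `struct dma_block` entries, which is what the
spacing above suggests):
```
/*
 * Free-list walk, 64-byte cache lines assumed:
 *
 *   before: stride between links = block size
 *           (1024-byte blocks -> one link per 16 cache lines)
 *   after:  stride between links = sizeof(struct dma_block) = 24 bytes
 *           (two or three links per cache line, independent of block size)
 *
 * Illustrative only; the measured effect also depends on DMAPOOL_DEBUG's
 * extra checks and on the test's access pattern.
 */
```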

[0] Link: https://www.csl.sri.com/~neumann/ndss-iommu.pdf

Brian Johannesmeyer (2):
  dmapool: Move pool metadata into non-DMA memory
  dmapool: Use pool_find_block() in pool_block_err()

 mm/dmapool.c | 96 ++++++++++++++++++++++++++++++++++------------------
 1 file changed, 63 insertions(+), 33 deletions(-)

Comments

Greg KH Nov. 19, 2024, 10:14 p.m. UTC | #1
On Tue, Nov 19, 2024 at 09:55:27PM +0100, Brian Johannesmeyer wrote:
> We discovered a security-related issue in the DMA pool allocator.
> 
> V1 of our RFC was submitted to the Linux kernel security team. They
> recommended submitting it to the relevant subsystem maintainers and the
> hardening mailing list instead, as they did not consider this an explicit
> security issue. Their rationale was that Linux implicitly assumes hardware
> can be trusted.
> 
> **Threat model**. While Linux drivers typically trust their hardware, some
> drivers may not operate under this assumption. This threat model therefore
> assumes a malicious peripheral device capable of corrupting DMA data in
> order to exploit the kernel. In this scenario, the device manipulates
> kernel-initialized data (similar to the attack described in the Thunderclap
> paper [0]) to achieve arbitrary kernel memory corruption.
> 
> **DMA pool background**. A DMA pool aims to reduce the overhead of DMA
> allocations by creating a large DMA buffer --- the "pool" --- from which
> smaller buffers are allocated as needed. Fundamentally, a DMA pool
> functions like a heap: it is a structure composed of linked memory
> "blocks", which, in this context, are DMA buffers. When a driver employs a
> DMA pool, it grants the device access not only to these blocks but also to
> the pointers linking them.
> 
> **Vulnerability**. Similar to traditional heap corruption vulnerabilities
> --- where a malicious program corrupts heap metadata to, e.g., hijack
> control flow --- a malicious device may corrupt DMA pool metadata. Such
> corruption can trivially lead to arbitrary kernel memory corruption in any
> driver that uses the allocator. Because the DMA pool API is used
> extensively, the issue is not confined to a single instance: every user of
> the API is potentially vulnerable. An exploit proceeds with the following
> steps:
> 
> 1. The DMA `pool` initializes its list of blocks, then points to the first
> block.
> 2. The malicious device overwrites the first 8 bytes of the first block ---
> which contain its `next_block` pointer --- with an arbitrary kernel address,
> `kernel_addr`.
> 3. The driver makes its first call to `dma_pool_alloc()`, after which the
> pool should point to the second block. Instead, it points to
> `kernel_addr`.
> 4. The driver again calls `dma_pool_alloc()`, which incorrectly returns
> `kernel_addr`. Therefore, anytime the driver writes to this "block", it may
> corrupt sensitive kernel data.
> 
> I have a PDF document that illustrates how these steps work. Please let me
> know if you would like me to share it with you.

I know I said it privately, but I'll say it here in public, very cool
finding, this is nice work!

> **Proposed mitigation**. To mitigate the corruption of DMA pool metadata
> (i.e., the pointers linking the blocks), the metadata should be moved into
> non-DMA memory, ensuring it cannot be altered by a device. I have included
> a patch series that implements this change. Since I am not deeply familiar
> with the DMA pool internals, I would appreciate any feedback on the
> patches. I have tested the patches with the `DMAPOOL_TEST` test and my own
> basic unit tests that ensure the DMA pool allocator is not vulnerable.
> 
> **Performance**. I evaluated the patch set's performance by running the
> `DMAPOOL_TEST` test with `DMAPOOL_DEBUG` enabled and with/without the
> patches applied. Here is its output *without* the patches applied:
> ```
> dmapool test: size:16   align:16   blocks:8192 time:3194110
> dmapool test: size:64   align:64   blocks:8192 time:4730440
> dmapool test: size:256  align:256  blocks:8192 time:5489630
> dmapool test: size:1024 align:1024 blocks:2048 time:517150
> dmapool test: size:4096 align:4096 blocks:1024 time:399616
> dmapool test: size:68   align:32   blocks:8192 time:6156527
> ```
> 
> And here is its output *with* the patches applied:
> ```
> dmapool test: size:16   align:16   blocks:8192 time:3541031
> dmapool test: size:64   align:64   blocks:8192 time:4227262
> dmapool test: size:256  align:256  blocks:8192 time:4890273
> dmapool test: size:1024 align:1024 blocks:2048 time:515775
> dmapool test: size:4096 align:4096 blocks:1024 time:523096
> dmapool test: size:68   align:32   blocks:8192 time:3450830
> ```

You had mentioned that the size:68 numbers were going to be re-run. Has
that happened, and is it really that much of a boost for that size? Or
are these the original numbers?

thanks,

greg k-h
Brian Johannesmeyer Nov. 19, 2024, 10:22 p.m. UTC | #2
On Tue, Nov 19, 2024 at 3:15 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> I know I said it privately, but I'll say it here in public, very cool
> finding, this is nice work!

Thanks! I appreciate your earlier feedback as well.

> You had mentioned that the size:68 numbers were going to be re-run, has
> that happened and this really is that much of a boost to that size?  Or
> is this the original numbers?

I re-ran the test, and the numbers are consistent across multiple
runs. I’m also surprised by how significant the improvement is for the
68-byte block size.

Thanks,

Brian Johannesmeyer