mbox series

[RFC,0/2] dma-pool: allow user to disable atomic pool

Message ID 20210624052010.5676-1-bhe@redhat.com (mailing list archive)
Headers show
Series dma-pool: allow user to disable atomic pool | expand

Message

Baoquan He June 24, 2021, 5:20 a.m. UTC
On x86_64, when crash triggered, an OOM can always be observed in kdump
kernel as below:

~~~~~~~~~
 DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
 swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-0.rc5.20210611git929d931f2b40.42.fc35.x86_64 #1
 Hardware name: Dell Inc. PowerEdge R910/0P658H, BIOS 2.12.0 06/04/2018
 Call Trace:
  dump_stack+0x7f/0xa1
  warn_alloc.cold+0x72/0xd6
  ? _raw_spin_unlock_irq+0x24/0x40
  ? __alloc_pages_direct_compact+0x90/0x1b0
  __alloc_pages_slowpath.constprop.0+0xf29/0xf50
  ? __cond_resched+0x16/0x50
  ? prepare_alloc_pages.constprop.0+0x19d/0x1b0
  __alloc_pages+0x24d/0x2c0
  ? __dma_atomic_pool_init+0x93/0x93
  alloc_page_interleave+0x13/0xb0
  atomic_pool_expand+0x118/0x210
  ? __dma_atomic_pool_init+0x93/0x93
  __dma_atomic_pool_init+0x45/0x93
  dma_atomic_pool_init+0xdb/0x176
  do_one_initcall+0x67/0x320
  ? rcu_read_lock_sched_held+0x3f/0x80
  kernel_init_freeable+0x290/0x2dc
  ? rest_init+0x24f/0x24f
  kernel_init+0xa/0x111
  ret_from_fork+0x22/0x30
 Mem-Info:
 ......
 DMA: failed to allocate 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocation
 DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
~~~~~~~~~~~

This OOM can be noticed since commit f1d4d47c5851 ("x86/setup: Always
reserve the first 1M of RAM") is merged. The root cause is there's no
available memory in DMA zone in kdump kernel after commit f1d4d47c5851.

In the current atomic pool implementation, there are three atomic mem
pools: atomic_pool_kernel, atomic_pool_dma, atomic_pool_dma32, initialized
with flag GFP_KERNEL, GFP_KERNEL|GFP_DMA, GFP_KERNEL|GFP_DMA32. On x86_64,
normal kernel can allocate all three of them. While, kdump kernel can't
satisfy atomic_pool_dma initialization because there's only low-1M present
for DMA zone, and locked in commit f1d4d47c5851 so that the low-1M won't be
put in buddy allocator.

The atomic pool is generic, and enabled always no matter if
coherent_pool is specify in kernel cmdline or not. Seems the always enabling
of atomic pool is required by AMD MEM ENCRYPTION if CONFIG_DMA_DIRECT_REMAP
is not set, even though the system is non-AMD cpu, or non-x86 ARCHes.
AFAIK, SME requires swiotlb by default. Not sure if atomic has to be
provided, can we disable it in some cases, e.g in kdump kernel?

In this RFC patch, I change the current coherent_pool kernel parameter
to make it allow user to disable atomic pool if not needed with
coherent_pool=0.

If enabling atomic pool is mandatory for SME, maybe we can adjust and
add kernel parameter like, coherent_pool= to specify which pool is
needed, coherent_pool_size= to specify the initialization size: 
coherent_pool= (bit0:kernel, bit1: dma,  bit2:dma32, 
coherent_pool_size= size (range from 128K to 4M).

Any comment or suggestion is appreciated.

Baoquan He (2):
  docs: kernel-parameters: Update to reflect the current default size of
    atomic pool
  dma-pool: allow user to disable atomic pool

 Documentation/admin-guide/kernel-parameters.txt | 4 +++-
 kernel/dma/pool.c                               | 7 +++++--
 2 files changed, 8 insertions(+), 3 deletions(-)

Comments

Christoph Hellwig June 24, 2021, 7:40 a.m. UTC | #1
So reduce the amount allocated.  But the pool is needed for proper
operation on systems with memory encryption.  And please add the right
maintainer or at least mailing list for the code you're touching next
time.
Baoquan He June 24, 2021, 9:29 a.m. UTC | #2
On 06/24/21 at 08:40am, Christoph Hellwig wrote:
> So reduce the amount allocated.  But the pool is needed for proper
> operation on systems with memory encryption.  And please add the right
> maintainer or at least mailing list for the code you're touching next
> time.

Oh, I thoutht it's memory issue only, should have run
./scripts/get_maintainer.pl. sorry.

About reducing the amount allocated, it may not help. Because on x86_64,
kdump kernel doesn't put any page of memory into buddy allocator of DMA
zone. Means it will defenitely OOM for atomic_pool_dma initialization.

Wondering in which case or on which device the atomic pool is needed on
AMD system with mem encrytion enabled. As we can see, the OOM will
happen too in kdump kernel on Intel system, even though it's not
necessary.

Thanks
Baoquan
Robin Murphy June 24, 2021, 10:47 a.m. UTC | #3
On 2021-06-24 10:29, Baoquan He wrote:
> On 06/24/21 at 08:40am, Christoph Hellwig wrote:
>> So reduce the amount allocated.  But the pool is needed for proper
>> operation on systems with memory encryption.  And please add the right
>> maintainer or at least mailing list for the code you're touching next
>> time.
> 
> Oh, I thoutht it's memory issue only, should have run
> ./scripts/get_maintainer.pl. sorry.
> 
> About reducing the amount allocated, it may not help. Because on x86_64,
> kdump kernel doesn't put any page of memory into buddy allocator of DMA
> zone. Means it will defenitely OOM for atomic_pool_dma initialization.
> 
> Wondering in which case or on which device the atomic pool is needed on
> AMD system with mem encrytion enabled. As we can see, the OOM will
> happen too in kdump kernel on Intel system, even though it's not
> necessary.

Hmm, I think the Kconfig reshuffle has actually left a slight wrinkle 
here. For DMA_DIRECT_REMAP=y we can assume an atomic pool is always 
needed, since that was the original behaviour anyway. However the 
implications of AMD_MEM_ENCRYPT=y are different - even if support is 
enabled, it still should only be relevant if mem_encrypt_active(), so it 
probably does make sense to have an additional runtime gate on that.

 From a quick scan, use of dma_alloc_from_pool() already depends on 
force_dma_unencrypted() so that's probably fine already, but I think 
we'd need a bit of extra protection around dma_free_from_pool() to 
prevent gen_pool_has_addr() dereferencing NULL if the pools are 
uninitialised, even with your proposed patch as it is. Presumably 
nothing actually called dma_direct_free() when you tested this?

Robin.
Christoph Hellwig June 24, 2021, 12:10 p.m. UTC | #4
On Thu, Jun 24, 2021 at 11:47:31AM +0100, Robin Murphy wrote:
> Hmm, I think the Kconfig reshuffle has actually left a slight wrinkle here.
> For DMA_DIRECT_REMAP=y we can assume an atomic pool is always needed, since
> that was the original behaviour anyway. However the implications of
> AMD_MEM_ENCRYPT=y are different - even if support is enabled, it still
> should only be relevant if mem_encrypt_active(), so it probably does make
> sense to have an additional runtime gate on that.

Yeah, a check for that would probably be useful.
Baoquan He Aug. 5, 2021, 6:54 a.m. UTC | #5
On 06/24/21 at 11:47am, Robin Murphy wrote:
> On 2021-06-24 10:29, Baoquan He wrote:
> > On 06/24/21 at 08:40am, Christoph Hellwig wrote:
> > > So reduce the amount allocated.  But the pool is needed for proper
> > > operation on systems with memory encryption.  And please add the right
> > > maintainer or at least mailing list for the code you're touching next
> > > time.
> > 
> > Oh, I thoutht it's memory issue only, should have run
> > ./scripts/get_maintainer.pl. sorry.
> > 
> > About reducing the amount allocated, it may not help. Because on x86_64,
> > kdump kernel doesn't put any page of memory into buddy allocator of DMA
> > zone. Means it will defenitely OOM for atomic_pool_dma initialization.
> > 
> > Wondering in which case or on which device the atomic pool is needed on
> > AMD system with mem encrytion enabled. As we can see, the OOM will
> > happen too in kdump kernel on Intel system, even though it's not
> > necessary.

Sorry for very late response, and thank both for your comments.

> 
> Hmm, I think the Kconfig reshuffle has actually left a slight wrinkle here.
> For DMA_DIRECT_REMAP=y we can assume an atomic pool is always needed, since
> that was the original behaviour anyway. However the implications of
> AMD_MEM_ENCRYPT=y are different - even if support is enabled, it still
> should only be relevant if mem_encrypt_active(), so it probably does make
> sense to have an additional runtime gate on that.

> 
> From a quick scan, use of dma_alloc_from_pool() already depends on
> force_dma_unencrypted() so that's probably fine already, but I think we'd
> need a bit of extra protection around dma_free_from_pool() to prevent
> gen_pool_has_addr() dereferencing NULL if the pools are uninitialised, even
> with your proposed patch as it is. Presumably nothing actually called
> dma_direct_free() when you tested this?

Yes, enforcing the conditional check of force_dma_unencrypted() around
dma_free_from_pool sounds reasonable, just as we have done in
dma_alloc_from_pool().

I have tested this patchset on normal x86_64 systems and one amd system
with SME support, disabling atomic pool can fix the issue that there's no
managed pages in dma zone then requesting page from dma zone will cause
allocation failure. And even disabling atomic pool in 1st kernel didn't
cause any problem on one AMD EPYC system which supports SME. I am not
expert of DMA area, wondering how atomic pool is supposed to do in
SME/SEV system. 

Besides, even though atomic pool is disabled, slub page for allocation
of dma-kmalloc also triggers page allocation failure. So I change to
take another way to fix them, please check v2 post. The atomic pool
disabling an be a good to have change.

Thanks
Baoquan
Tom Lendacky Aug. 10, 2021, 8:52 p.m. UTC | #6
On 8/5/21 1:54 AM, Baoquan He wrote:
> On 06/24/21 at 11:47am, Robin Murphy wrote:
>> On 2021-06-24 10:29, Baoquan He wrote:
>>> On 06/24/21 at 08:40am, Christoph Hellwig wrote:
>>>> So reduce the amount allocated.  But the pool is needed for proper
>>>> operation on systems with memory encryption.  And please add the right
>>>> maintainer or at least mailing list for the code you're touching next
>>>> time.
>>>
>>> Oh, I thoutht it's memory issue only, should have run
>>> ./scripts/get_maintainer.pl. sorry.
>>>
>>> About reducing the amount allocated, it may not help. Because on x86_64,
>>> kdump kernel doesn't put any page of memory into buddy allocator of DMA
>>> zone. Means it will defenitely OOM for atomic_pool_dma initialization.
>>>
>>> Wondering in which case or on which device the atomic pool is needed on
>>> AMD system with mem encrytion enabled. As we can see, the OOM will
>>> happen too in kdump kernel on Intel system, even though it's not
>>> necessary.
> 
> Sorry for very late response, and thank both for your comments.
> 
>>
>> Hmm, I think the Kconfig reshuffle has actually left a slight wrinkle here.
>> For DMA_DIRECT_REMAP=y we can assume an atomic pool is always needed, since
>> that was the original behaviour anyway. However the implications of
>> AMD_MEM_ENCRYPT=y are different - even if support is enabled, it still
>> should only be relevant if mem_encrypt_active(), so it probably does make
>> sense to have an additional runtime gate on that.
> 
>>
>> From a quick scan, use of dma_alloc_from_pool() already depends on
>> force_dma_unencrypted() so that's probably fine already, but I think we'd
>> need a bit of extra protection around dma_free_from_pool() to prevent
>> gen_pool_has_addr() dereferencing NULL if the pools are uninitialised, even
>> with your proposed patch as it is. Presumably nothing actually called
>> dma_direct_free() when you tested this?
> 
> Yes, enforcing the conditional check of force_dma_unencrypted() around
> dma_free_from_pool sounds reasonable, just as we have done in
> dma_alloc_from_pool().
> 
> I have tested this patchset on normal x86_64 systems and one amd system
> with SME support, disabling atomic pool can fix the issue that there's no
> managed pages in dma zone then requesting page from dma zone will cause
> allocation failure. And even disabling atomic pool in 1st kernel didn't
> cause any problem on one AMD EPYC system which supports SME. I am not
> expert of DMA area, wondering how atomic pool is supposed to do in
> SME/SEV system. 

I think the atomic pool is used by the NVMe driver. My understanding is
that driver will do a dma_alloc_coherent() from interrupt context, so it
needs to use GFP_ATOMIC. The pool was created because dma_alloc_coherent()
would perform a set_memory_decrypted() call, which can sleep. The pool
eliminates that issue (David can correct me if I got that wrong).

Thanks,
Tom

> 
> Besides, even though atomic pool is disabled, slub page for allocation
> of dma-kmalloc also triggers page allocation failure. So I change to
> take another way to fix them, please check v2 post. The atomic pool
> disabling an be a good to have change.
> 
> Thanks
> Baoquan
>
Baoquan He Aug. 11, 2021, 2:23 a.m. UTC | #7
On 08/10/21 at 03:52pm, Tom Lendacky wrote:
> On 8/5/21 1:54 AM, Baoquan He wrote:
> > On 06/24/21 at 11:47am, Robin Murphy wrote:
> >> On 2021-06-24 10:29, Baoquan He wrote:
> >>> On 06/24/21 at 08:40am, Christoph Hellwig wrote:
> >>>> So reduce the amount allocated.  But the pool is needed for proper
> >>>> operation on systems with memory encryption.  And please add the right
> >>>> maintainer or at least mailing list for the code you're touching next
> >>>> time.
> >>>
> >>> Oh, I thoutht it's memory issue only, should have run
> >>> ./scripts/get_maintainer.pl. sorry.
> >>>
> >>> About reducing the amount allocated, it may not help. Because on x86_64,
> >>> kdump kernel doesn't put any page of memory into buddy allocator of DMA
> >>> zone. Means it will defenitely OOM for atomic_pool_dma initialization.
> >>>
> >>> Wondering in which case or on which device the atomic pool is needed on
> >>> AMD system with mem encrytion enabled. As we can see, the OOM will
> >>> happen too in kdump kernel on Intel system, even though it's not
> >>> necessary.
> > 
> > Sorry for very late response, and thank both for your comments.
> > 
> >>
> >> Hmm, I think the Kconfig reshuffle has actually left a slight wrinkle here.
> >> For DMA_DIRECT_REMAP=y we can assume an atomic pool is always needed, since
> >> that was the original behaviour anyway. However the implications of
> >> AMD_MEM_ENCRYPT=y are different - even if support is enabled, it still
> >> should only be relevant if mem_encrypt_active(), so it probably does make
> >> sense to have an additional runtime gate on that.
> > 
> >>
> >> From a quick scan, use of dma_alloc_from_pool() already depends on
> >> force_dma_unencrypted() so that's probably fine already, but I think we'd
> >> need a bit of extra protection around dma_free_from_pool() to prevent
> >> gen_pool_has_addr() dereferencing NULL if the pools are uninitialised, even
> >> with your proposed patch as it is. Presumably nothing actually called
> >> dma_direct_free() when you tested this?
> > 
> > Yes, enforcing the conditional check of force_dma_unencrypted() around
> > dma_free_from_pool sounds reasonable, just as we have done in
> > dma_alloc_from_pool().
> > 
> > I have tested this patchset on normal x86_64 systems and one amd system
> > with SME support, disabling atomic pool can fix the issue that there's no
> > managed pages in dma zone then requesting page from dma zone will cause
> > allocation failure. And even disabling atomic pool in 1st kernel didn't
> > cause any problem on one AMD EPYC system which supports SME. I am not
> > expert of DMA area, wondering how atomic pool is supposed to do in
> > SME/SEV system. 
> 
> I think the atomic pool is used by the NVMe driver. My understanding is
> that driver will do a dma_alloc_coherent() from interrupt context, so it
> needs to use GFP_ATOMIC. The pool was created because dma_alloc_coherent()
> would perform a set_memory_decrypted() call, which can sleep. The pool
> eliminates that issue (David can correct me if I got that wrong).

Thanks for the information.

While from the code comment around which atomic pool is requested,
on amd system it's used to satisfy decrypting memory because that
can't block. Combined with your saying, it's because NVMe driver
need decrypted memory on AMD system? 

void *dma_direct_alloc(struct device *dev, size_t size,
                dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
{
	......
        /*
         * Remapping or decrypting memory may block. If either is required and
         * we can't block, allocate the memory from the atomic pools.
         */
        if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
            !gfpflags_allow_blocking(gfp) &&
            (force_dma_unencrypted(dev) ||
             (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev))))
                return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
	......
}

Looking at the those related commits, the below one from David tells 
that atomic dma pool is used when device require non-blocking and
unencrypted buffer. When I checked the system I borrowed, it's AMD EYPC
and SME is enabled. And it has many pci devices, as you can see, its 'ls
pci' outputs 113 lines. But disabling the three atomic pools didn't
trigger any error on that AMD system. Does it mean only specific devices
need this atomic pool in SME/SEV enabling case? Should we add more
details in document or code comment to make clear this? 

commit 82fef0ad811fb5976cf36ccc3d2c3bc0195dfb72
Author: David Rientjes <rientjes@google.com>
Date:   Tue Apr 14 17:05:01 2020 -0700

    x86/mm: unencrypted non-blocking DMA allocations use coherent pools
    
    When CONFIG_AMD_MEM_ENCRYPT is enabled and a device requires unencrypted
    DMA, all non-blocking allocations must originate from the atomic DMA
    coherent pools.
    
    Select CONFIG_DMA_COHERENT_POOL for CONFIG_AMD_MEM_ENCRYPT.
    
    Signed-off-by: David Rientjes <rientjes@google.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1d6104ea8af0..2bf2222819d3 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1520,6 +1520,7 @@ config X86_CPA_STATISTICS
 config AMD_MEM_ENCRYPT
        bool "AMD Secure Memory Encryption (SME) support"
        depends on X86_64 && CPU_SUP_AMD
+       select DMA_COHERENT_POOL
        select DYNAMIC_PHYSICAL_MASK
        select ARCH_USE_MEMREMAP_PROT
        select ARCH_HAS_FORCE_DMA_UNENCRYPTED


[~]# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              32
On-line CPU(s) list: 0-31
Thread(s) per core:  2
Core(s) per socket:  16
Socket(s):           1
NUMA node(s):        4
Vendor ID:           AuthenticAMD
BIOS Vendor ID:      AMD
CPU family:          23
Model:               1
Model name:          AMD EPYC 7351P 16-Core Processor
BIOS Model name:     AMD EPYC 7351P 16-Core Processor               
Stepping:            2
CPU MHz:             2395.439
BogoMIPS:            4790.87
Virtualization:      AMD-V
L1d cache:           32K
L1i cache:           64K
L2 cache:            512K
L3 cache:            8192K
......

[~]# lspci| wc -l
113

> 
> 
> > 
> > Besides, even though atomic pool is disabled, slub page for allocation
> > of dma-kmalloc also triggers page allocation failure. So I change to
> > take another way to fix them, please check v2 post. The atomic pool
> > disabling an be a good to have change.
> > 
> > Thanks
> > Baoquan
> > 
>
Christoph Hellwig Aug. 11, 2021, 5:52 a.m. UTC | #8
On Tue, Aug 10, 2021 at 03:52:25PM -0500, Tom Lendacky via iommu wrote:
> I think the atomic pool is used by the NVMe driver. My understanding is
> that driver will do a dma_alloc_coherent() from interrupt context, so it
> needs to use GFP_ATOMIC. The pool was created because dma_alloc_coherent()
> would perform a set_memory_decrypted() call, which can sleep. The pool
> eliminates that issue (David can correct me if I got that wrong).

Not just the NVMe driver.  We have plenty of drivers doing that, just
do a quick grep for dma_alloc_* dma_poll_alloc, dma_pool_zalloc with
GFP_ATOMIC (and that won't even find multi-line strings).
Tom Lendacky Aug. 11, 2021, 1:46 p.m. UTC | #9
On 8/10/21 9:23 PM, Baoquan He wrote:
> On 08/10/21 at 03:52pm, Tom Lendacky wrote:
>> On 8/5/21 1:54 AM, Baoquan He wrote:
>>> On 06/24/21 at 11:47am, Robin Murphy wrote:
>>>> On 2021-06-24 10:29, Baoquan He wrote:
>>>>> On 06/24/21 at 08:40am, Christoph Hellwig wrote:

...

> Looking at the those related commits, the below one from David tells 
> that atomic dma pool is used when device require non-blocking and
> unencrypted buffer. When I checked the system I borrowed, it's AMD EYPC
> and SME is enabled. And it has many pci devices, as you can see, its 'ls
> pci' outputs 113 lines. But disabling the three atomic pools didn't
> trigger any error on that AMD system. Does it mean only specific devices
> need this atomic pool in SME/SEV enabling case? Should we add more
> details in document or code comment to make clear this? 

It very well could be just the devices being used. Under SME (bare metal),
if a device supports 64-bit DMA, then bounce buffers aren't used and the
DMA can be performed directly to encrypted memory, so there is no need to
issue a set_memory_decrypted() call, so I would assume it likely isn't
using the pool.

Under SEV, however, all DMA has to go through guest un-encrypted memory.
If you pass through a device that does dma_alloc_coherent() calls with
GFP_ATOMIC, then the pool will be needed.

Thanks,
Tom