Message ID | 1446363977-23656-3-git-send-email-jungseoklee85@gmail.com
---|---
State | New, archived
On Sun, 1 Nov 2015, Jungseok Lee wrote:

> There is no room to adjust 'atom_size' now when a generic percpu area
> is used. It would be redundant to write down an architecture-specific
> setup_per_cpu_areas() in order to only change the 'atom_size'. Thus,
> this patch adds a new definition, PERCPU_ATOM_SIZE, which is PAGE_SIZE
> by default. The value could be updated if needed by architecture.

What is atom_size? Why would you want a different allocation size here?
The percpu area is virtually mapped regardless, so you will have
contiguous addresses even without atom_size.
On Mon, Nov 02, 2015 at 10:10:23AM -0600, Christoph Lameter wrote:
> On Sun, 1 Nov 2015, Jungseok Lee wrote:
>
> > There is no room to adjust 'atom_size' now when a generic percpu area
> > is used. It would be redundant to write down an architecture-specific
> > setup_per_cpu_areas() in order to only change the 'atom_size'. Thus,
> > this patch adds a new definition, PERCPU_ATOM_SIZE, which is PAGE_SIZE
> > by default. The value could be updated if needed by architecture.
>
> What is atom_size? Why would you want a different allocation size here?
> The percpu area is virtually mapped regardless, so you will have
> contiguous addresses even without atom_size.

I haven't looked at patch 3/3 in detail but I'm pretty sure I'll NAK
the approach (and the definition of PERCPU_ATOM_SIZE), therefore
rendering this patch unnecessary. IIUC, this is used to enforce some
alignment of the per-CPU IRQ stack to be able to check whether the
current stack is process or IRQ on exception entry. But there are
other, less intrusive ways to achieve the same (e.g. x86).
On Mon, 2 Nov 2015, Catalin Marinas wrote:

> I haven't looked at patch 3/3 in detail but I'm pretty sure I'll NAK
> the approach (and the definition of PERCPU_ATOM_SIZE), therefore
> rendering this patch unnecessary. IIUC, this is used to enforce some
> alignment of the per-CPU IRQ stack to be able to check whether the
> current stack is process or IRQ on exception entry. But there are
> other, less intrusive ways to achieve the same (e.g. x86).

The percpu allocator allows the specification of alignment requirements.
On Mon, Nov 02, 2015 at 10:48:17AM -0600, Christoph Lameter wrote:
> On Mon, 2 Nov 2015, Catalin Marinas wrote:
> > I haven't looked at patch 3/3 in detail but I'm pretty sure I'll NAK
> > the approach (and the definition of PERCPU_ATOM_SIZE), therefore
> > rendering this patch unnecessary. IIUC, this is used to enforce some
> > alignment of the per-CPU IRQ stack to be able to check whether the
> > current stack is process or IRQ on exception entry. But there are
> > other, less intrusive ways to achieve the same (e.g. x86).
>
> The percpu allocator allows the specification of alignment requirements.

Patch 3/3 does something like this:

  DEFINE_PER_CPU(char [IRQ_STACK_SIZE], irq_stacks) __aligned(IRQ_STACK_SIZE)

where IRQ_STACK_SIZE > PAGE_SIZE. AFAICT, setup_per_cpu_areas() doesn't
guarantee alignment greater than PAGE_SIZE.
On Mon, 2 Nov 2015, Catalin Marinas wrote:

> On Mon, Nov 02, 2015 at 10:48:17AM -0600, Christoph Lameter wrote:
> > On Mon, 2 Nov 2015, Catalin Marinas wrote:
> > > I haven't looked at patch 3/3 in detail but I'm pretty sure I'll NAK
> > > the approach (and the definition of PERCPU_ATOM_SIZE), therefore
> > > rendering this patch unnecessary. IIUC, this is used to enforce some
> > > alignment of the per-CPU IRQ stack to be able to check whether the
> > > current stack is process or IRQ on exception entry. But there are
> > > other, less intrusive ways to achieve the same (e.g. x86).
> >
> > The percpu allocator allows the specification of alignment requirements.
>
> Patch 3/3 does something like this:
>
>   DEFINE_PER_CPU(char [IRQ_STACK_SIZE], irq_stacks) __aligned(IRQ_STACK_SIZE)
>
> where IRQ_STACK_SIZE > PAGE_SIZE. AFAICT, setup_per_cpu_areas() doesn't
> guarantee alignment greater than PAGE_SIZE.

And we cannot use percpu_alloc() instead? Aligning the whole of the
percpu area because one allocation requires it?
On Mon, Nov 02, 2015 at 12:11:33PM -0600, Christoph Lameter wrote:
> On Mon, 2 Nov 2015, Catalin Marinas wrote:
> > On Mon, Nov 02, 2015 at 10:48:17AM -0600, Christoph Lameter wrote:
> > > On Mon, 2 Nov 2015, Catalin Marinas wrote:
> > > > I haven't looked at patch 3/3 in detail but I'm pretty sure I'll NAK
> > > > the approach (and the definition of PERCPU_ATOM_SIZE), therefore
> > > > rendering this patch unnecessary. IIUC, this is used to enforce some
> > > > alignment of the per-CPU IRQ stack to be able to check whether the
> > > > current stack is process or IRQ on exception entry. But there are
> > > > other, less intrusive ways to achieve the same (e.g. x86).
> > >
> > > The percpu allocator allows the specification of alignment requirements.
> >
> > Patch 3/3 does something like this:
> >
> >   DEFINE_PER_CPU(char [IRQ_STACK_SIZE], irq_stacks) __aligned(IRQ_STACK_SIZE)
> >
> > where IRQ_STACK_SIZE > PAGE_SIZE. AFAICT, setup_per_cpu_areas() doesn't
> > guarantee alignment greater than PAGE_SIZE.
>
> And we cannot use percpu_alloc() instead? Aligning the whole of the
> percpu area because one allocation requires it?

I haven't tried, but it seems that pcpu_alloc() has a WARN() when
align > PAGE_SIZE, and it would fail.

As I said in a previous reply, I don't think this patch is necessary,
mainly because I don't particularly like the logic for detecting IRQ
stack re-entrance based on the stack pointer alignment.
On Nov 3, 2015, at 1:22 AM, Catalin Marinas wrote:

Hi Catalin,

> On Mon, Nov 02, 2015 at 10:10:23AM -0600, Christoph Lameter wrote:
>> On Sun, 1 Nov 2015, Jungseok Lee wrote:
>>
>>> There is no room to adjust 'atom_size' now when a generic percpu area
>>> is used. It would be redundant to write down an architecture-specific
>>> setup_per_cpu_areas() in order to only change the 'atom_size'. Thus,
>>> this patch adds a new definition, PERCPU_ATOM_SIZE, which is PAGE_SIZE
>>> by default. The value could be updated if needed by architecture.
>>
>> What is atom_size? Why would you want a different allocation size here?
>> The percpu area is virtually mapped regardless, so you will have
>> contiguous addresses even without atom_size.
>
> I haven't looked at patch 3/3 in detail but I'm pretty sure I'll NAK
> the approach (and the definition of PERCPU_ATOM_SIZE), therefore
> rendering this patch unnecessary. IIUC, this is used to enforce some
> alignment of the per-CPU IRQ stack to be able to check whether the
> current stack is process or IRQ on exception entry. But there are
> other, less intrusive ways to achieve the same (e.g. x86).

First of all, thanks for the clarification! That is why I chose the
word 'doubtable' in the cover letter. I will give up this approach.

I've been paranoid about the "another pointer read" you mentioned in
[1] for over a week. This wrong idea is my conclusion with respect to
your feedback; I think I've failed to follow you here.

Most ideas came from the x86 implementation when I started this work.
v2 [2] might be close to the x86 approach. At that time, a count-based
method was used for the IRQ re-entrance check, but the count was
considered a redundant variable since we have preempt_count. As a
result, the top-bit comparison idea, which is the origin of this
IRQ_STACK_SIZE alignment, took over the re-entrance check.

Like x86, if we pick up the count method, we could achieve the goal
without this unnecessary alignment. What is your opinion?
I copy and paste the x86 code (arch/x86/entry/entry_64.S) for
convenience. It has a comment on why the redundancy is allowed.

----8<----
.macro interrupt func
	cld
	ALLOC_PT_GPREGS_ON_STACK
	SAVE_C_REGS
	SAVE_EXTRA_REGS

	testb	$3, CS(%rsp)
	jz	1f

	/*
	 * IRQ from user mode. Switch to kernel gsbase and inform context
	 * tracking that we're in kernel mode.
	 */
	SWAPGS
#ifdef CONFIG_CONTEXT_TRACKING
	call enter_from_user_mode
#endif

1:
	/*
	 * Save previous stack pointer, optionally switch to interrupt stack.
	 * irq_count is used to check if a CPU is already on an interrupt stack
	 * or not. While this is essentially redundant with preempt_count it is
	 * a little cheaper to use a separate counter in the PDA (short of
	 * moving irq_enter into assembly, which would be too much work)
	 */
	movq	%rsp, %rdi
	incl	PER_CPU_VAR(irq_count)
	cmovzq	PER_CPU_VAR(irq_stack_ptr), %rsp
	pushq	%rdi
	/* We entered an interrupt context - irqs are off: */
	TRACE_IRQS_OFF

	call	\func	/* rdi points to pt_regs */
.endm

/*
 * The interrupt stubs push (~vector+0x80) onto the stack and
 * then jump to common_interrupt.
 */
	.p2align CONFIG_X86_L1_CACHE_SHIFT
common_interrupt:
	ASM_CLAC
	addq	$-0x80, (%rsp)		/* Adjust vector to [-256, -1] range */
	interrupt do_IRQ
----8<----

Additionally, I've been thinking of do_softirq_own_stack(), which is
another comment of yours [3]. Recently, I've realized there is a
possibility that I misunderstood your intention. Did you mean that the
irq_handler hook is not enough? Should do_softirq_own_stack() be
implemented together? If so, this is another failure of mine; it
perfectly makes sense. I hope these are the last two pieces of this
interesting feature.

Thanks again!

Best Regards
Jungseok Lee

[1] https://lkml.org/lkml/2015/10/19/596
[2] http://article.gmane.org/gmane.linux.kernel/2037257
[3] http://article.gmane.org/gmane.linux.kernel/2041877
On Nov 3, 2015, at 1:10 AM, Christoph Lameter wrote:

Dear Christoph,

> On Sun, 1 Nov 2015, Jungseok Lee wrote:
>
>> There is no room to adjust 'atom_size' now when a generic percpu area
>> is used. It would be redundant to write down an architecture-specific
>> setup_per_cpu_areas() in order to only change the 'atom_size'. Thus,
>> this patch adds a new definition, PERCPU_ATOM_SIZE, which is PAGE_SIZE
>> by default. The value could be updated if needed by architecture.
>
> What is atom_size? Why would you want a different allocation size here?
> The percpu area is virtually mapped regardless, so you will have
> contiguous addresses even without atom_size.

I think Catalin has already written down a perfect explanation.

I'd like memory with an alignment greater than PAGE_SIZE, but
__per_cpu_offset[] is PAGE_SIZE-aligned under the generic
setup_per_cpu_areas(). That is, secondary cores cannot get that kind
of space.

Thanks for taking a look at this doubtable change!

Best Regards
Jungseok Lee
Hi Jungseok,

On 03/11/15 13:49, Jungseok Lee wrote:
> Additionally, I've been thinking of do_softirq_own_stack(), which is
> another comment of yours [3]. Recently, I've realized there is a
> possibility that I misunderstood your intention. Did you mean that the
> irq_handler hook is not enough? Should do_softirq_own_stack() be
> implemented together?

I've been putting together a version to illustrate this; I aim to post
it before the end of this week...

Thanks,

James
On Nov 4, 2015, at 2:58 AM, James Morse wrote:

> Hi Jungseok,

Hi James,

> On 03/11/15 13:49, Jungseok Lee wrote:
>> Additionally, I've been thinking of do_softirq_own_stack(), which is
>> another comment of yours [3]. Recently, I've realized there is a
>> possibility that I misunderstood your intention. Did you mean that the
>> irq_handler hook is not enough? Should do_softirq_own_stack() be
>> implemented together?
>
> I've been putting together a version to illustrate this; I aim to post
> it before the end of this week...

It sounds great! I will wait for your version.

Best Regards
Jungseok Lee
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 4bc6daf..57a2f16 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -18,6 +18,10 @@
 #define PERCPU_MODULE_RESERVE		0
 #endif
 
+#ifndef PERCPU_ATOM_SIZE
+#define PERCPU_ATOM_SIZE		PAGE_SIZE
+#endif
+
 /* minimum unit size, also is the maximum supported allocation size */
 #define PCPU_MIN_UNIT_SIZE		PFN_ALIGN(32 << 10)
 
diff --git a/mm/percpu.c b/mm/percpu.c
index a63b4d8..cd1e0ec 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -2201,8 +2201,8 @@ void __init setup_per_cpu_areas(void)
 	 * what the legacy allocator did.
 	 */
 	rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
-				    PERCPU_DYNAMIC_RESERVE, PAGE_SIZE, NULL,
-				    pcpu_dfl_fc_alloc, pcpu_dfl_fc_free);
+				    PERCPU_DYNAMIC_RESERVE, PERCPU_ATOM_SIZE,
+				    NULL, pcpu_dfl_fc_alloc, pcpu_dfl_fc_free);
 	if (rc < 0)
 		panic("Failed to initialize percpu areas.");
 
@@ -2231,7 +2231,7 @@ void __init setup_per_cpu_areas(void)
 
 	ai = pcpu_alloc_alloc_info(1, 1);
 	fc = memblock_virt_alloc_from_nopanic(unit_size,
-					      PAGE_SIZE,
+					      PERCPU_ATOM_SIZE,
 					      __pa(MAX_DMA_ADDRESS));
 	if (!ai || !fc)
 		panic("Failed to allocate memory for percpu areas.");
There is no room to adjust 'atom_size' now when the generic percpu area
setup is used. It would be redundant to write an architecture-specific
setup_per_cpu_areas() only to change 'atom_size'. Thus, this patch adds
a new definition, PERCPU_ATOM_SIZE, which is PAGE_SIZE by default. The
value can be overridden by an architecture if needed.

Signed-off-by: Jungseok Lee <jungseoklee85@gmail.com>
---
 include/linux/percpu.h | 4 ++++
 mm/percpu.c            | 6 +++---
 2 files changed, 7 insertions(+), 3 deletions(-)