
[v6,2/3] percpu: add PERCPU_ATOM_SIZE for a generic percpu area setup

Message ID 1446363977-23656-3-git-send-email-jungseoklee85@gmail.com (mailing list archive)
State New, archived

Commit Message

Jungseok Lee Nov. 1, 2015, 7:46 a.m. UTC
There is no room to adjust 'atom_size' when the generic percpu area
setup is used, and it would be redundant to write an
architecture-specific setup_per_cpu_areas() only to change 'atom_size'.
Thus, this patch adds a new definition, PERCPU_ATOM_SIZE, which
defaults to PAGE_SIZE. An architecture can override the value if needed.

Signed-off-by: Jungseok Lee <jungseoklee85@gmail.com>
---
 include/linux/percpu.h | 4 ++++
 mm/percpu.c            | 6 +++---
 2 files changed, 7 insertions(+), 3 deletions(-)

Comments

Christoph Lameter (Ampere) Nov. 2, 2015, 4:10 p.m. UTC | #1
On Sun, 1 Nov 2015, Jungseok Lee wrote:

> There is no room to adjust 'atom_size' when the generic percpu area
> setup is used, and it would be redundant to write an
> architecture-specific setup_per_cpu_areas() only to change 'atom_size'.
> Thus, this patch adds a new definition, PERCPU_ATOM_SIZE, which
> defaults to PAGE_SIZE. An architecture can override the value if needed.

What is atom_size? Why would you want a different allocation size here?
The percpu area is virtually mapped regardless, so you will have
contiguous addresses even without atom_size.
Catalin Marinas Nov. 2, 2015, 4:22 p.m. UTC | #2
On Mon, Nov 02, 2015 at 10:10:23AM -0600, Christoph Lameter wrote:
> On Sun, 1 Nov 2015, Jungseok Lee wrote:
> 
> > There is no room to adjust 'atom_size' when the generic percpu area
> > setup is used, and it would be redundant to write an
> > architecture-specific setup_per_cpu_areas() only to change 'atom_size'.
> > Thus, this patch adds a new definition, PERCPU_ATOM_SIZE, which
> > defaults to PAGE_SIZE. An architecture can override the value if needed.
> 
> What is atom_size? Why would you want a different allocation size here?
> The percpu area is virtually mapped regardless, so you will have
> contiguous addresses even without atom_size.

I haven't looked at the patch 3/3 in detail but I'm pretty sure I'll NAK
the approach (and the definition of PERCPU_ATOM_SIZE), therefore
rendering this patch unnecessary. IIUC, this is used to enforce some
alignment of the per-CPU IRQ stack to be able to check whether the
current stack is process or IRQ on exception entry. But there are other,
less intrusive ways to achieve the same (e.g. x86).
Christoph Lameter (Ampere) Nov. 2, 2015, 4:48 p.m. UTC | #3
On Mon, 2 Nov 2015, Catalin Marinas wrote:

> I haven't looked at the patch 3/3 in detail but I'm pretty sure I'll NAK
> the approach (and the definition of PERCPU_ATOM_SIZE), therefore
> rendering this patch unnecessary. IIUC, this is used to enforce some
> alignment of the per-CPU IRQ stack to be able to check whether the
> current stack is process or IRQ on exception entry. But there are other,
> less intrusive ways to achieve the same (e.g. x86).

The percpu allocator allows the specification of alignment requirements.
Catalin Marinas Nov. 2, 2015, 5:35 p.m. UTC | #4
On Mon, Nov 02, 2015 at 10:48:17AM -0600, Christoph Lameter wrote:
> On Mon, 2 Nov 2015, Catalin Marinas wrote:
> > I haven't looked at the patch 3/3 in detail but I'm pretty sure I'll NAK
> > the approach (and the definition of PERCPU_ATOM_SIZE), therefore
> > rendering this patch unnecessary. IIUC, this is used to enforce some
> > alignment of the per-CPU IRQ stack to be able to check whether the
> > current stack is process or IRQ on exception entry. But there are other,
> > less intrusive ways to achieve the same (e.g. x86).
> 
> The percpu allocator allows the specification of alignment requirements.

Patch 3/3 does something like this:

DEFINE_PER_CPU(char [IRQ_STACK_SIZE], irq_stacks) __aligned(IRQ_STACK_SIZE)

where IRQ_STACK_SIZE > PAGE_SIZE. AFAICT, setup_per_cpu_areas() doesn't
guarantee alignment greater than PAGE_SIZE.
Christoph Lameter (Ampere) Nov. 2, 2015, 6:11 p.m. UTC | #5
On Mon, 2 Nov 2015, Catalin Marinas wrote:

> On Mon, Nov 02, 2015 at 10:48:17AM -0600, Christoph Lameter wrote:
> > On Mon, 2 Nov 2015, Catalin Marinas wrote:
> > > I haven't looked at the patch 3/3 in detail but I'm pretty sure I'll NAK
> > > the approach (and the definition of PERCPU_ATOM_SIZE), therefore
> > > rendering this patch unnecessary. IIUC, this is used to enforce some
> > > alignment of the per-CPU IRQ stack to be able to check whether the
> > > current stack is process or IRQ on exception entry. But there are other,
> > > less intrusive ways to achieve the same (e.g. x86).
> >
> > The percpu allocator allows the specification of alignment requirements.
>
> Patch 3/3 does something like this:
>
> DEFINE_PER_CPU(char [IRQ_STACK_SIZE], irq_stacks) __aligned(IRQ_STACK_SIZE)
>
> where IRQ_STACK_SIZE > PAGE_SIZE. AFAICT, setup_per_cpu_areas() doesn't
> guarantee alignment greater than PAGE_SIZE.

And we cannot use percpu_alloc() instead? Aligning the whole of the percpu
area because one allocation requires it?
Catalin Marinas Nov. 2, 2015, 6:31 p.m. UTC | #6
On Mon, Nov 02, 2015 at 12:11:33PM -0600, Christoph Lameter wrote:
> On Mon, 2 Nov 2015, Catalin Marinas wrote:
> 
> > On Mon, Nov 02, 2015 at 10:48:17AM -0600, Christoph Lameter wrote:
> > > On Mon, 2 Nov 2015, Catalin Marinas wrote:
> > > > I haven't looked at the patch 3/3 in detail but I'm pretty sure I'll NAK
> > > > the approach (and the definition of PERCPU_ATOM_SIZE), therefore
> > > > rendering this patch unnecessary. IIUC, this is used to enforce some
> > > > alignment of the per-CPU IRQ stack to be able to check whether the
> > > > current stack is process or IRQ on exception entry. But there are other,
> > > > less intrusive ways to achieve the same (e.g. x86).
> > >
> > > The percpu allocator allows the specification of alignment requirements.
> >
> > Patch 3/3 does something like this:
> >
> > DEFINE_PER_CPU(char [IRQ_STACK_SIZE], irq_stacks) __aligned(IRQ_STACK_SIZE)
> >
> > where IRQ_STACK_SIZE > PAGE_SIZE. AFAICT, setup_per_cpu_areas() doesn't
> > guarantee alignment greater than PAGE_SIZE.
> 
> And we cannot use percpu_alloc() instead? Aligning the whole of the percpu
> area because one allocation requires it?

I haven't tried but it seems that pcpu_alloc() has a WARN() when align >
PAGE_SIZE and it would fail.

As I said in a previous reply, I don't think this patch is necessary,
mainly because I don't particularly like the logic for detecting the IRQ
stack re-entrance based on the stack pointer alignment.
Jungseok Lee Nov. 3, 2015, 1:49 p.m. UTC | #7
On Nov 3, 2015, at 1:22 AM, Catalin Marinas wrote:

Hi Catalin,

> On Mon, Nov 02, 2015 at 10:10:23AM -0600, Christoph Lameter wrote:
>> On Sun, 1 Nov 2015, Jungseok Lee wrote:
>> 
>>> There is no room to adjust 'atom_size' when the generic percpu area
>>> setup is used, and it would be redundant to write an
>>> architecture-specific setup_per_cpu_areas() only to change 'atom_size'.
>>> Thus, this patch adds a new definition, PERCPU_ATOM_SIZE, which
>>> defaults to PAGE_SIZE. An architecture can override the value if needed.
>> 
>> What is atom_size? Why would you want a different allocation size here?
>> The percpu area is virtually mapped regardless, so you will have
>> contiguous addresses even without atom_size.
> 
> I haven't looked at the patch 3/3 in detail but I'm pretty sure I'll NAK
> the approach (and the definition of PERCPU_ATOM_SIZE), therefore
> rendering this patch unnecessary. IIUC, this is used to enforce some
> alignment of the per-CPU IRQ stack to be able to check whether the
> current stack is process or IRQ on exception entry. But there are other,
> less intrusive ways to achieve the same (e.g. x86).

First of all, thanks for the clarification!

That is why I chose the word 'doubtable' in the cover letter. I will
give up this approach. I've been paranoid about the "another pointer
read" you mentioned [1] for over a week; this wrong idea was my
conclusion from that feedback, so I think I failed to follow you here.

Most of the ideas came from the x86 implementation when I started this
work; v2 [2] might be the closest to the x86 approach. At that time, a
count-based method was used for the IRQ re-entrance check, but the
count was considered a redundant variable since we already have
preempt_count. As a result, the top-bit comparison idea, which is the
origin of this IRQ_STACK_SIZE alignment, took over the re-entrance
check. Like x86, if we pick the count method again, we could achieve
the goal without this unnecessary alignment. What is your opinion?

I copy and paste the x86 code (arch/x86/entry/entry_64.S) below for
convenience. It has a comment on why the redundancy is allowed.

----8<----
        .macro interrupt func
        cld
        ALLOC_PT_GPREGS_ON_STACK
        SAVE_C_REGS
        SAVE_EXTRA_REGS

        testb   $3, CS(%rsp)
        jz      1f

        /*
         * IRQ from user mode.  Switch to kernel gsbase and inform context
         * tracking that we're in kernel mode.
         */
        SWAPGS
#ifdef CONFIG_CONTEXT_TRACKING
        call enter_from_user_mode
#endif

1:
        /*
         * Save previous stack pointer, optionally switch to interrupt stack.
         * irq_count is used to check if a CPU is already on an interrupt stack
         * or not. While this is essentially redundant with preempt_count it is
         * a little cheaper to use a separate counter in the PDA (short of
         * moving irq_enter into assembly, which would be too much work)
         */
        movq    %rsp, %rdi
        incl    PER_CPU_VAR(irq_count)
        cmovzq  PER_CPU_VAR(irq_stack_ptr), %rsp
        pushq   %rdi
        /* We entered an interrupt context - irqs are off: */
        TRACE_IRQS_OFF

        call    \func   /* rdi points to pt_regs */
        .endm

        /*
         * The interrupt stubs push (~vector+0x80) onto the stack and
         * then jump to common_interrupt.
         */
        .p2align CONFIG_X86_L1_CACHE_SHIFT
common_interrupt:
        ASM_CLAC
        addq    $-0x80, (%rsp)                  /* Adjust vector to [-256, -1] range */
        interrupt do_IRQ
----8<----

Additionally, I've been thinking about do_softirq_own_stack(), which
was another of your comments [3]. Recently, I've realized there is a
possibility that I misunderstood your intention. Did you mean that the
irq_handler hook is not enough? Should do_softirq_own_stack() be
implemented together? If so, this is another failure of mine... It
makes perfect sense.

I hope these are the last two pieces of this interesting feature.

Thanks again!

Best Regards
Jungseok Lee

[1] https://lkml.org/lkml/2015/10/19/596
[2] http://article.gmane.org/gmane.linux.kernel/2037257
[3] http://article.gmane.org/gmane.linux.kernel/2041877
Jungseok Lee Nov. 3, 2015, 2:11 p.m. UTC | #8
On Nov 3, 2015, at 1:10 AM, Christoph Lameter wrote:

Dear Christoph,

> On Sun, 1 Nov 2015, Jungseok Lee wrote:
> 
>> There is no room to adjust 'atom_size' when the generic percpu area
>> setup is used, and it would be redundant to write an
>> architecture-specific setup_per_cpu_areas() only to change 'atom_size'.
>> Thus, this patch adds a new definition, PERCPU_ATOM_SIZE, which
>> defaults to PAGE_SIZE. An architecture can override the value if needed.
> 
> What is atom_size? Why would you want a different allocation size here?
> The percpu area is virtually mapped regardless, so you will have
> contiguous addresses even without atom_size.

I think Catalin has already written a perfect explanation. I'd like to
get memory with an alignment greater than PAGE_SIZE, but
__per_cpu_offset[] is only PAGE_SIZE-aligned under the generic
setup_per_cpu_areas(). That is, secondary cores cannot get that kind of
space.

Thanks for taking a look at this doubtable change!

Best Regards
Jungseok Lee
James Morse Nov. 3, 2015, 5:58 p.m. UTC | #9
Hi Jungseok,

On 03/11/15 13:49, Jungseok Lee wrote:
> Additionally, I've been thinking about do_softirq_own_stack(), which
> was another of your comments [3]. Recently, I've realized there is a
> possibility that I misunderstood your intention. Did you mean that the
> irq_handler hook is not enough? Should do_softirq_own_stack() be
> implemented together?

I've been putting together a version to illustrate this, I aim to post it
before the end of this week...


Thanks,

James
Jungseok Lee Nov. 4, 2015, 1:35 p.m. UTC | #10
On Nov 4, 2015, at 2:58 AM, James Morse wrote:
> Hi Jungseok,

Hi James,

> On 03/11/15 13:49, Jungseok Lee wrote:
>> Additionally, I've been thinking about do_softirq_own_stack(), which
>> was another of your comments [3]. Recently, I've realized there is a
>> possibility that I misunderstood your intention. Did you mean that the
>> irq_handler hook is not enough? Should do_softirq_own_stack() be
>> implemented together?
> 
> I've been putting together a version to illustrate this, I aim to post it
> before the end of this week…

It sounds great! I will wait for your version.

Best Regards
Jungseok Lee

Patch

diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 4bc6daf..57a2f16 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -18,6 +18,10 @@ 
 #define PERCPU_MODULE_RESERVE		0
 #endif
 
+#ifndef PERCPU_ATOM_SIZE
+#define PERCPU_ATOM_SIZE		PAGE_SIZE
+#endif
+
 /* minimum unit size, also is the maximum supported allocation size */
 #define PCPU_MIN_UNIT_SIZE		PFN_ALIGN(32 << 10)
 
diff --git a/mm/percpu.c b/mm/percpu.c
index a63b4d8..cd1e0ec 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -2201,8 +2201,8 @@  void __init setup_per_cpu_areas(void)
 	 * what the legacy allocator did.
 	 */
 	rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
-				    PERCPU_DYNAMIC_RESERVE, PAGE_SIZE, NULL,
-				    pcpu_dfl_fc_alloc, pcpu_dfl_fc_free);
+				    PERCPU_DYNAMIC_RESERVE, PERCPU_ATOM_SIZE,
+				    NULL, pcpu_dfl_fc_alloc, pcpu_dfl_fc_free);
 	if (rc < 0)
 		panic("Failed to initialize percpu areas.");
 
@@ -2231,7 +2231,7 @@  void __init setup_per_cpu_areas(void)
 
 	ai = pcpu_alloc_alloc_info(1, 1);
 	fc = memblock_virt_alloc_from_nopanic(unit_size,
-					      PAGE_SIZE,
+					      PERCPU_ATOM_SIZE,
 					      __pa(MAX_DMA_ADDRESS));
 	if (!ai || !fc)
 		panic("Failed to allocate memory for percpu areas.");