[RFC] ARM: mm: disable kmap_high_get() for SMP

Message ID 1362372667-953-1-git-send-email-iamjoonsoo.kim@lge.com (mailing list archive)
State New, archived

Commit Message

Joonsoo Kim March 4, 2013, 4:51 a.m. UTC
On SMP, enabling kmap_high_get() serializes users of kmap_atomic(),
because kmap_high_get() takes the global kmap_lock.
That is undesirable, so turn off this optimization for SMP.

Cc: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Comments

Nicolas Pitre March 5, 2013, 9:36 a.m. UTC | #1
On Mon, 4 Mar 2013, Joonsoo Kim wrote:

> With SMP and enabling kmap_high_get(), it makes users of kmap_atomic()
> sequential ordered, because kmap_high_get() use global kmap_lock().
> It is not welcome situation, so turn off this optimization for SMP.

I'm not sure I understand the problem.

The lock taken by kmap_high_get() is released right away before that 
function returns and therefore this is not actually serializing 
anything.


Nicolas
Joonsoo Kim March 7, 2013, 8:12 a.m. UTC | #2
Hello, Nicolas.

On Tue, Mar 05, 2013 at 05:36:12PM +0800, Nicolas Pitre wrote:
> On Mon, 4 Mar 2013, Joonsoo Kim wrote:
> 
> > With SMP and enabling kmap_high_get(), it makes users of kmap_atomic()
> > sequential ordered, because kmap_high_get() use global kmap_lock().
> > It is not welcome situation, so turn off this optimization for SMP.
> 
> I'm not sure I understand the problem.
> 
> The lock taken by kmap_high_get() is released right away before that 
> function returns and therefore this is not actually serializing 
> anything.

Yes, you have understood what I meant.
Sorry for the poor explanation.

Here are the reasons why I sent this patch with the RFC tag.

With more CPUs, performance degradation is possible even though
the lock in kmap_high_get() is held only for a very short time.

Also, kmap has at most 512 entries (512 * 4K = 2M), and some mobile
devices have 2G of memory (more than 1G of highmem), so the probability
of finding a matching entry is roughly 1/512 or less. This probability
decreases further on devices with more memory. So I think the time
wasted looking for a matching entry exceeds the time saved.
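
For reference, the arithmetic behind this estimate can be written out
as a small C sketch. It assumes 4K pages, a 512-entry kmap window and
uniformly distributed page accesses; the names PAGE_SIZE_BYTES,
PKMAP_ENTRIES, pkmap_window_bytes() and highmem_pages_per_slot() are
made up for illustration and are not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: 4K pages and a 512-entry kmap window. */
#define PAGE_SIZE_BYTES 4096u
#define PKMAP_ENTRIES   512u

/* Total memory that the kmap window can map at any one time. */
static uint64_t pkmap_window_bytes(void)
{
    return (uint64_t)PKMAP_ENTRIES * PAGE_SIZE_BYTES;
}

/*
 * Number of highmem pages competing for each kmap slot.  If accesses
 * were uniform, the chance that a given highmem page is already
 * mapped would be about 1 / (this value).
 */
static uint64_t highmem_pages_per_slot(uint64_t highmem_bytes)
{
    assert(highmem_bytes % PAGE_SIZE_BYTES == 0);
    return (highmem_bytes / PAGE_SIZE_BYTES) / PKMAP_ENTRIES;
}
```

With 1G of highmem, highmem_pages_per_slot() gives 512, which is where
the 1/512 figure comes from. Real workloads are not uniform, so the
actual hit rate may differ considerably.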

This is just my humble opinion, so please let me know what I am missing.

Thanks.

> 
> 
> Nicolas
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
Nicolas Pitre March 7, 2013, 9:36 a.m. UTC | #3
On Thu, 7 Mar 2013, Joonsoo Kim wrote:

> Hello, Nicolas.
> 
> On Tue, Mar 05, 2013 at 05:36:12PM +0800, Nicolas Pitre wrote:
> > On Mon, 4 Mar 2013, Joonsoo Kim wrote:
> > 
> > > With SMP and enabling kmap_high_get(), it makes users of kmap_atomic()
> > > sequential ordered, because kmap_high_get() use global kmap_lock().
> > > It is not welcome situation, so turn off this optimization for SMP.
> > 
> > I'm not sure I understand the problem.
> > 
> > The lock taken by kmap_high_get() is released right away before that 
> > function returns and therefore this is not actually serializing 
> > anything.
> 
> Yes, you understand what I want to say correctly.
> Sorry for bad explanation.
> 
> Following is reasons why I send this patch with RFC tag.
> 
> If we have more cpus, performance degration is possible although
> it is very short time to holding the lock in kmap_high_get().
> 
> And kmap has maximum 512 entries(512 * 4K = 2M) and some mobile devices 
> has 2G memory(highmem 1G>), so probability for finding matched entry
> is approximately < 1/512. This probability can be more decreasing
> for device which have more memory. So I think that waste time to find
> matched entry is more than saved time.
> 
> Above is my humble opinion, so please let me know what I am missing.

Please look at the kmap_high_get() code again.  It performs no 
searching at all.  What it does is:

- lock the kmap array against concurrent changes

- if the given page is not highmem, unlock and return NULL

- otherwise increment that page reference count, unlock, and return the 
  mapped address for that page.

There is almost zero cost to this function, independently of the number 
of kmap entries, whereas it does save much bigger costs elsewhere when 
it is successful.
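
The three steps above can be sketched as a small userspace model.
This is not the kernel source: every model_* name is hypothetical,
a C11 atomic_flag stands in for the kmap lock, and the NULL case is
modelled as "the page has no existing kmap mapping", which is an
assumption about what the real function checks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MODEL_LAST_PKMAP 512

/* Hypothetical stand-in for struct page: only tracks its kmap slot. */
struct model_page {
    int slot;   /* index into the kmap window, or -1 when unmapped */
};

static atomic_flag model_kmap_lock = ATOMIC_FLAG_INIT;
static int model_pkmap_count[MODEL_LAST_PKMAP];
static char model_pkmap_window[MODEL_LAST_PKMAP][16];

static void model_lock(void)
{
    while (atomic_flag_test_and_set(&model_kmap_lock))
        ;   /* spin until the lock is free */
}

static void model_unlock(void)
{
    atomic_flag_clear(&model_kmap_lock);
}

/* Pin an existing mapping if there is one; never create a new one. */
static void *model_kmap_high_get(struct model_page *page)
{
    void *vaddr = NULL;

    model_lock();
    if (page->slot >= 0) {
        /* A mapped slot must already hold at least one reference. */
        assert(model_pkmap_count[page->slot] >= 1);
        model_pkmap_count[page->slot]++;
        vaddr = model_pkmap_window[page->slot];
    }
    model_unlock();
    return vaddr;
}
```

In this model the lock is held only across one comparison and one
increment, and the work done does not depend on MODEL_LAST_PKMAP.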


Nicolas
Joonsoo Kim March 7, 2013, 10:35 a.m. UTC | #4
2013/3/7 Nicolas Pitre <nicolas.pitre@linaro.org>:
> On Thu, 7 Mar 2013, Joonsoo Kim wrote:
>
>> Hello, Nicolas.
>>
>> On Tue, Mar 05, 2013 at 05:36:12PM +0800, Nicolas Pitre wrote:
>> > On Mon, 4 Mar 2013, Joonsoo Kim wrote:
>> >
>> > > With SMP and enabling kmap_high_get(), it makes users of kmap_atomic()
>> > > sequential ordered, because kmap_high_get() use global kmap_lock().
>> > > It is not welcome situation, so turn off this optimization for SMP.
>> >
>> > I'm not sure I understand the problem.
>> >
>> > The lock taken by kmap_high_get() is released right away before that
>> > function returns and therefore this is not actually serializing
>> > anything.
>>
>> Yes, you understand what I want to say correctly.
>> Sorry for bad explanation.
>>
>> Following is reasons why I send this patch with RFC tag.
>>
>> If we have more cpus, performance degration is possible although
>> it is very short time to holding the lock in kmap_high_get().
>>
>> And kmap has maximum 512 entries(512 * 4K = 2M) and some mobile devices
>> has 2G memory(highmem 1G>), so probability for finding matched entry
>> is approximately < 1/512. This probability can be more decreasing
>> for device which have more memory. So I think that waste time to find
>> matched entry is more than saved time.
>>
>> Above is my humble opinion, so please let me know what I am missing.
>
> Please look at the kmap_high_get() code again.  It performs no
> searching at all.  What it does is:

If the page is not highmem, it may already be filtered out in
kmap_atomic(), so we only need to consider highmem pages.

For a highmem page, it does perform a search.
kmap_high_get() calls page_address().
page_address() hashes the PA and iterates over the list for that hash value.
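
The kind of lookup described here can be sketched as a hashed bucket
search. This is only a userspace model under assumptions: the model_*
names, the bucket count and the hash constant are all hypothetical and
are not taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_HASH_BITS 7
#define MODEL_BUCKETS   (1u << MODEL_HASH_BITS)

/* One page -> virtual address association, chained per bucket. */
struct model_map {
    const void *page;
    void *vaddr;
    struct model_map *next;
};

static struct model_map *model_buckets[MODEL_BUCKETS];

/* Multiplicative hash of the page pointer; the constant is arbitrary. */
static unsigned model_hash(const void *page)
{
    uint64_t h = (uint64_t)(uintptr_t)page * 0x9E3779B97F4A7C15ull;
    return (unsigned)(h >> (64 - MODEL_HASH_BITS));
}

/* Record a mapping at the head of its bucket's list. */
static void model_set_page_address(struct model_map *m)
{
    unsigned b = model_hash(m->page);

    m->next = model_buckets[b];
    model_buckets[b] = m;
}

/* Walk one bucket's list; cost depends on that list's length only. */
static void *model_page_address(const void *page)
{
    struct model_map *m;

    for (m = model_buckets[model_hash(page)]; m != NULL; m = m->next)
        if (m->page == page)
            return m->vaddr;
    return NULL;
}
```

The search cost in such a scheme is the length of one bucket's list,
so it stays small as long as mapped pages spread across the buckets.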

Another advantage of disabling ARCH_NEEDS_KMAP_HIGH_GET is
that kmap() and kunmap() work without disabling IRQs.

Thanks.

> - lock the kmap array against concurrent changes
>
> - if the given page is not highmem, unlock and return NULL
>
> - otherwise increment that page reference count, unlock, and return the
>   mapped address for that page.
>
> There is almost zero cost to this function, independently of the number
> of kmap entries, whereas it does save much bigger costs elsewhere when
> it is successful.
>
>
> Nicolas
Joonsoo Kim March 19, 2013, 5:05 a.m. UTC | #5
On Thu, Mar 07, 2013 at 07:35:51PM +0900, JoonSoo Kim wrote:
> 2013/3/7 Nicolas Pitre <nicolas.pitre@linaro.org>:
> > On Thu, 7 Mar 2013, Joonsoo Kim wrote:
> >
> >> Hello, Nicolas.
> >>
> >> On Tue, Mar 05, 2013 at 05:36:12PM +0800, Nicolas Pitre wrote:
> >> > On Mon, 4 Mar 2013, Joonsoo Kim wrote:
> >> >
> >> > > With SMP and enabling kmap_high_get(), it makes users of kmap_atomic()
> >> > > sequential ordered, because kmap_high_get() use global kmap_lock().
> >> > > It is not welcome situation, so turn off this optimization for SMP.
> >> >
> >> > I'm not sure I understand the problem.
> >> >
> >> > The lock taken by kmap_high_get() is released right away before that
> >> > function returns and therefore this is not actually serializing
> >> > anything.
> >>
> >> Yes, you understand what I want to say correctly.
> >> Sorry for bad explanation.
> >>
> >> Following is reasons why I send this patch with RFC tag.
> >>
> >> If we have more cpus, performance degration is possible although
> >> it is very short time to holding the lock in kmap_high_get().
> >>
> >> And kmap has maximum 512 entries(512 * 4K = 2M) and some mobile devices
> >> has 2G memory(highmem 1G>), so probability for finding matched entry
> >> is approximately < 1/512. This probability can be more decreasing
> >> for device which have more memory. So I think that waste time to find
> >> matched entry is more than saved time.
> >>
> >> Above is my humble opinion, so please let me know what I am missing.
> >
> > Please look at the kmap_high_get() code again.  It performs no
> > searching at all.  What it does is:
> 
> If page is not highmem, it may be already filtered in kmap_atomic().
> So we only consider highmem page.
> 
> For highmem page, it perform searching.
> In kmap_high_get(), page_address() is called.
> In page_address(), it hash PA and iterate a list for this hashed value.
> 
> And another advantage of disabling ARCH_NEEDS_KMAP_HIGH_GET is
> that kmap(), kunmap() works without irq disabled.
> 
> Thanks.

Hello, Nicolas.

Just to confirm: you don't agree with this, right?

Thanks.

> 
> > - lock the kmap array against concurrent changes
> >
> > - if the given page is not highmem, unlock and return NULL
> >
> > - otherwise increment that page reference count, unlock, and return the
> >   mapped address for that page.
> >
> > There is almost zero cost to this function, independently of the number
> > of kmap entries, whereas it does save much bigger costs elsewhere when
> > it is successful.
> >
> >
> > Nicolas
Nicolas Pitre March 19, 2013, 12:40 p.m. UTC | #6
On Tue, 19 Mar 2013, Joonsoo Kim wrote:

> On Thu, Mar 07, 2013 at 07:35:51PM +0900, JoonSoo Kim wrote:
> > 2013/3/7 Nicolas Pitre <nicolas.pitre@linaro.org>:
> > > On Thu, 7 Mar 2013, Joonsoo Kim wrote:
> > >
> > >> Hello, Nicolas.
> > >>
> > >> On Tue, Mar 05, 2013 at 05:36:12PM +0800, Nicolas Pitre wrote:
> > >> > On Mon, 4 Mar 2013, Joonsoo Kim wrote:
> > >> >
> > >> > > With SMP and enabling kmap_high_get(), it makes users of kmap_atomic()
> > >> > > sequential ordered, because kmap_high_get() use global kmap_lock().
> > >> > > It is not welcome situation, so turn off this optimization for SMP.
> > >> >
> > >> > I'm not sure I understand the problem.
> > >> >
> > >> > The lock taken by kmap_high_get() is released right away before that
> > >> > function returns and therefore this is not actually serializing
> > >> > anything.
> > >>
> > >> Yes, you understand what I want to say correctly.
> > >> Sorry for bad explanation.
> > >>
> > >> Following is reasons why I send this patch with RFC tag.
> > >>
> > >> If we have more cpus, performance degration is possible although
> > >> it is very short time to holding the lock in kmap_high_get().
> > >>
> > >> And kmap has maximum 512 entries(512 * 4K = 2M) and some mobile devices
> > >> has 2G memory(highmem 1G>), so probability for finding matched entry
> > >> is approximately < 1/512. This probability can be more decreasing
> > >> for device which have more memory. So I think that waste time to find
> > >> matched entry is more than saved time.
> > >>
> > >> Above is my humble opinion, so please let me know what I am missing.
> > >
> > > Please look at the kmap_high_get() code again.  It performs no
> > > searching at all.  What it does is:
> > 
> > If page is not highmem, it may be already filtered in kmap_atomic().
> > So we only consider highmem page.
> > 
> > For highmem page, it perform searching.
> > In kmap_high_get(), page_address() is called.
> > In page_address(), it hash PA and iterate a list for this hashed value.
> > 
> > And another advantage of disabling ARCH_NEEDS_KMAP_HIGH_GET is
> > that kmap(), kunmap() works without irq disabled.
> > 
> > Thanks.
> 
> Hello, Nicolas.
> 
> For just confirm, you don't agree with this, right?

Right, I don't agree.  I don't believe the savings you claim are bigger 
than the advantages of this functionality.



Nicolas
Patch

diff --git a/arch/arm/include/asm/highmem.h b/arch/arm/include/asm/highmem.h
index 8c5e828..82fea0f 100644
--- a/arch/arm/include/asm/highmem.h
+++ b/arch/arm/include/asm/highmem.h
@@ -26,15 +26,13 @@  extern void kunmap_high(struct page *page);
  * The reason for kmap_high_get() is to ensure that the currently kmap'd
  * page usage count does not decrease to zero while we're using its
  * existing virtual mapping in an atomic context.  With a VIVT cache this
- * is essential to do, but with a VIPT cache this is only an optimization
- * so not to pay the price of establishing a second mapping if an existing
- * one can be used.  However, on platforms without hardware TLB maintenance
- * broadcast, we simply cannot use ARCH_NEEDS_KMAP_HIGH_GET at all since
- * the locking involved must also disable IRQs which is incompatible with
- * the IPI mechanism used by global TLB operations.
+ * is essential to do, but with a VIPT cache this is only an optimization.
+ * With SMP and enabling kmap_high_get(), it makes users of kmap_atomic()
+ * sequential ordered, because kmap_high_get() use global kmap_lock().
+ * It is not welcome situation, so turn off this optimization for SMP.
  */
 #define ARCH_NEEDS_KMAP_HIGH_GET
-#if defined(CONFIG_SMP) && defined(CONFIG_CPU_TLB_V6)
+#if defined(CONFIG_SMP)
 #undef ARCH_NEEDS_KMAP_HIGH_GET
 #if defined(CONFIG_HIGHMEM) && defined(CONFIG_CPU_CACHE_VIVT)
 #error "The sum of features in your kernel config cannot be supported together"