Message ID | 1362372667-953-1-git-send-email-iamjoonsoo.kim@lge.com (mailing list archive) |
---|---|
State | New, archived |
On Mon, 4 Mar 2013, Joonsoo Kim wrote:

> With SMP and enabling kmap_high_get(), it makes users of kmap_atomic()
> sequential ordered, because kmap_high_get() use global kmap_lock().
> It is not welcome situation, so turn off this optimization for SMP.

I'm not sure I understand the problem.

The lock taken by kmap_high_get() is released right away before that
function returns and therefore this is not actually serializing
anything.

Nicolas
Hello, Nicolas.

On Tue, Mar 05, 2013 at 05:36:12PM +0800, Nicolas Pitre wrote:
> On Mon, 4 Mar 2013, Joonsoo Kim wrote:
>
> > With SMP and enabling kmap_high_get(), it makes users of kmap_atomic()
> > sequential ordered, because kmap_high_get() use global kmap_lock().
> > It is not welcome situation, so turn off this optimization for SMP.
>
> I'm not sure I understand the problem.
>
> The lock taken by kmap_high_get() is released right away before that
> function returns and therefore this is not actually serializing
> anything.

Yes, you have understood my point correctly.
Sorry for the poor explanation.

These are the reasons why I sent this patch with the RFC tag.

With more CPUs, performance degradation is possible even though
the lock in kmap_high_get() is held only for a very short time.

Also, kmap has at most 512 entries (512 * 4K = 2M), and some mobile
devices have 2G of memory (with 1G or more of highmem), so the
probability of finding a matching entry is roughly 1/512 or less.
This probability decreases further on devices with more memory. So I
think the time wasted looking for a matching entry outweighs the time
saved.

The above is my humble opinion, so please let me know what I am missing.

Thanks.

>
>
> Nicolas
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
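The arithmetic in the message above can be written out as a small sketch. The 512-entry and 4K figures come from the message; the function names are hypothetical, and the resulting 1/512 figure is only an upper bound on how much of highmem can be mapped at one time. It says nothing about the hit rate of a real workload, which depends on access locality.

```c
#include <assert.h>

/* Figures quoted in the message: 512 kmap entries, 4K pages. */
#define KMAP_ENTRIES	512UL
#define PAGE_BYTES	4096UL

/* Bytes of highmem that can be mapped through kmap at any one time. */
static unsigned long kmap_window_bytes(void)
{
	return KMAP_ENTRIES * PAGE_BYTES;
}

/*
 * Number of highmem pages per kmap slot for a given highmem size.
 * If this returns N, at most 1/N of highmem pages can be mapped at once.
 */
static unsigned long highmem_pages_per_slot(unsigned long highmem_bytes)
{
	return (highmem_bytes / PAGE_BYTES) / KMAP_ENTRIES;
}

/* Restates the numbers from the message: a 2M window, and 1/512 for 1G. */
static void check_message_numbers(void)
{
	assert(kmap_window_bytes() == 2UL * 1024 * 1024);
	assert(highmem_pages_per_slot(1UL << 30) == 512);
}
```

With 2G of highmem the same calculation gives 1024 pages per slot, which is the "more memory, lower probability" point made in the message.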
On Thu, 7 Mar 2013, Joonsoo Kim wrote:

> Yes, you understand what I want to say correctly.
> Sorry for bad explanation.
>
> Following is reasons why I send this patch with RFC tag.
>
> If we have more cpus, performance degradation is possible although
> it is very short time to holding the lock in kmap_high_get().
>
> And kmap has maximum 512 entries(512 * 4K = 2M) and some mobile devices
> has 2G memory(highmem 1G>), so probability for finding matched entry
> is approximately < 1/512. This probability can be more decreasing
> for device which have more memory. So I think that waste time to find
> matched entry is more than saved time.
>
> Above is my humble opinion, so please let me know what I am missing.

Please look at the kmap_high_get() code again. It performs no
searching at all. What it does is:

- lock the kmap array against concurrent changes

- if the given page is not highmem, unlock and return NULL

- otherwise increment that page reference count, unlock, and return the
  mapped address for that page.

There is almost zero cost to this function, independently of the number
of kmap entries, whereas it does save much bigger costs elsewhere when
it is successful.

Nicolas
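The three steps listed above can be sketched as a user-space model. This is not the kernel source: `struct model_page`, its `slot` field, and `model_kmap_high_get()` are hypothetical names, the C11 `atomic_flag` spinlock stands in for the kernel's kmap lock (which also disables IRQs when ARCH_NEEDS_KMAP_HIGH_GET is defined), and the page's mapping is assumed to be stored directly in the page, so the model takes no position on how the kernel finds it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define LAST_PKMAP 512	/* number of kmap slots, as quoted in the thread */

/* Hypothetical stand-in for struct page: only what this sketch needs. */
struct model_page {
	int slot;	/* index of the page's kmap slot, or -1 if unmapped */
};

static atomic_flag kmap_lock = ATOMIC_FLAG_INIT;	/* models the global lock */
static int pkmap_count[LAST_PKMAP];			/* per-slot usage count */
static char pkmap_area[LAST_PKMAP];			/* stand-in for mapped addresses */

static void lock_kmap(void)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&kmap_lock, memory_order_acquire))
		;
}

static void unlock_kmap(void)
{
	atomic_flag_clear_explicit(&kmap_lock, memory_order_release);
}

/* Pin an existing mapping if there is one; never create a new mapping. */
static void *model_kmap_high_get(struct model_page *page)
{
	void *vaddr = NULL;

	lock_kmap();
	if (page->slot >= 0) {
		/* A mapped page must already have at least one user. */
		assert(pkmap_count[page->slot] >= 1);
		pkmap_count[page->slot]++;
		vaddr = &pkmap_area[page->slot];
	}
	unlock_kmap();
	return vaddr;
}
```

The lock is dropped before the function returns in both branches, which is the point made earlier in the thread: callers contend on the lock briefly but do not hold it while using the mapping.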
2013/3/7 Nicolas Pitre <nicolas.pitre@linaro.org>:
> Please look at the kmap_high_get() code again. It performs no
> searching at all. What it does is:

If the page is not highmem, it may already have been filtered out in
kmap_atomic(), so we only need to consider highmem pages.

For a highmem page, it does perform a search.
kmap_high_get() calls page_address().
page_address() hashes the PA and iterates over the list for that
hash value.

Another advantage of disabling ARCH_NEEDS_KMAP_HIGH_GET is
that kmap() and kunmap() then work without disabling IRQs.

Thanks.

> - lock the kmap array against concurrent changes
>
> - if the given page is not highmem, unlock and return NULL
>
> - otherwise increment that page reference count, unlock, and return the
>   mapped address for that page.
>
> There is almost zero cost to this function, independently of the number
> of kmap entries, whereas it does save much bigger costs elsewhere when
> it is successful.
>
> Nicolas
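The lookup described in the reply above (hash, then walk the list for that hash value) can be sketched as a small user-space model. Everything here is an assumption made for illustration: the names loosely echo the kernel's page-address hash table but are not copied from it, the 128-bucket size and the multiplicative hash are stand-ins, and `struct model_page` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PA_HASH_ORDER	7			/* assumed: 2^7 = 128 buckets */
#define PA_HASH_SIZE	(1u << PA_HASH_ORDER)

struct model_page { int id; };			/* hypothetical stand-in for struct page */

/* One entry per currently mapped highmem page. */
struct page_address_map {
	struct model_page *page;
	void *virtual_addr;
	struct page_address_map *next;		/* chain within one bucket */
};

static struct page_address_map *page_address_htable[PA_HASH_SIZE];

/* Multiplicative pointer hash; keeps the top PA_HASH_ORDER bits. */
static unsigned int hash_page(const struct model_page *page)
{
	uint64_t v = (uint64_t)(uintptr_t)page * 0x9E3779B97F4A7C15ull;

	return (unsigned int)(v >> (64 - PA_HASH_ORDER));
}

/* Record that 'page' is mapped at 'vaddr', using caller-provided storage. */
static void set_model_page_address(struct page_address_map *pam,
				   struct model_page *page, void *vaddr)
{
	unsigned int h = hash_page(page);

	pam->page = page;
	pam->virtual_addr = vaddr;
	pam->next = page_address_htable[h];
	page_address_htable[h] = pam;
}

/* Walk one bucket's chain; returns NULL if the page is not mapped. */
static void *model_page_address(const struct model_page *page)
{
	struct page_address_map *pam;

	for (pam = page_address_htable[hash_page(page)]; pam; pam = pam->next)
		if (pam->page == page)
			return pam->virtual_addr;
	return NULL;
}
```

Under these assumptions, at most 512 mapped pages spread over 128 buckets give an average chain of about 4 entries, so the walk is a search but a bounded one. How that cost compares with setting up a new mapping is the point the thread disagrees on, and the sketch does not settle it.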
On Thu, Mar 07, 2013 at 07:35:51PM +0900, JoonSoo Kim wrote:
> 2013/3/7 Nicolas Pitre <nicolas.pitre@linaro.org>:
> > Please look at the kmap_high_get() code again. It performs no
> > searching at all. What it does is:
>
> If page is not highmem, it may be already filtered in kmap_atomic().
> So we only consider highmem page.
>
> For highmem page, it perform searching.
> In kmap_high_get(), page_address() is called.
> In page_address(), it hash PA and iterate a list for this hashed value.
>
> And another advantage of disabling ARCH_NEEDS_KMAP_HIGH_GET is
> that kmap(), kunmap() works without irq disabled.
>
> Thanks.

Hello, Nicolas.

Just to confirm: you don't agree with this, right?

Thanks.
On Tue, 19 Mar 2013, Joonsoo Kim wrote:

> On Thu, Mar 07, 2013 at 07:35:51PM +0900, JoonSoo Kim wrote:
> > If page is not highmem, it may be already filtered in kmap_atomic().
> > So we only consider highmem page.
> >
> > For highmem page, it perform searching.
> > In kmap_high_get(), page_address() is called.
> > In page_address(), it hash PA and iterate a list for this hashed value.
> >
> > And another advantage of disabling ARCH_NEEDS_KMAP_HIGH_GET is
> > that kmap(), kunmap() works without irq disabled.
> >
> > Thanks.
>
> Hello, Nicolas.
>
> For just confirm, you don't agree with this, right?

Right, I don't agree.

I don't believe the savings you claim are bigger than the advantages
from this functionality.

Nicolas
diff --git a/arch/arm/include/asm/highmem.h b/arch/arm/include/asm/highmem.h
index 8c5e828..82fea0f 100644
--- a/arch/arm/include/asm/highmem.h
+++ b/arch/arm/include/asm/highmem.h
@@ -26,15 +26,13 @@ extern void kunmap_high(struct page *page);
  * The reason for kmap_high_get() is to ensure that the currently kmap'd
  * page usage count does not decrease to zero while we're using its
  * existing virtual mapping in an atomic context. With a VIVT cache this
- * is essential to do, but with a VIPT cache this is only an optimization
- * so not to pay the price of establishing a second mapping if an existing
- * one can be used. However, on platforms without hardware TLB maintenance
- * broadcast, we simply cannot use ARCH_NEEDS_KMAP_HIGH_GET at all since
- * the locking involved must also disable IRQs which is incompatible with
- * the IPI mechanism used by global TLB operations.
+ * is essential to do, but with a VIPT cache this is only an optimization.
+ * With SMP and enabling kmap_high_get(), it makes users of kmap_atomic()
+ * sequential ordered, because kmap_high_get() use global kmap_lock().
+ * It is not welcome situation, so turn off this optimization for SMP.
  */
 #define ARCH_NEEDS_KMAP_HIGH_GET
-#if defined(CONFIG_SMP) && defined(CONFIG_CPU_TLB_V6)
+#if defined(CONFIG_SMP)
 #undef ARCH_NEEDS_KMAP_HIGH_GET
 #if defined(CONFIG_HIGHMEM) && defined(CONFIG_CPU_CACHE_VIVT)
 #error "The sum of features in your kernel config cannot be supported together"
On SMP, enabling kmap_high_get() serializes users of kmap_atomic(),
because kmap_high_get() takes the global kmap_lock. This is
undesirable, so turn off this optimization for SMP.

Cc: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>