[RFC,v4,0/6] arm64: tlb: add support for TTL feature

Message ID 20200324134534.1570-1-yezhenyu2@huawei.com (mailing list archive)

Message

Zhenyu Ye March 24, 2020, 1:45 p.m. UTC
In order to reduce the cost of TLB invalidation, the ARMv8.4 TTL
feature allows TLBIs to be issued with a level hint, allowing for
quicker invalidation.  This series provides support for this feature.

Patches 1 and 2 were provided by Marc in his NV series[1]; they
detect the TTL feature and add the __tlbi_level interface.

See the patches for details. Thanks.

[1] https://lore.kernel.org/linux-arm-kernel/20200211174938.27809-1-maz@kernel.org/


ChangeList:
v1:
add support for TTL feature in arm64.

v2:
build the patch on Marc's NV series[1].

v3:
use vma->vm_flags to replace mm->context.flags.

v4:
add Marc's patches into my series.


Marc Zyngier (2):
  arm64: Detect the ARMv8.4 TTL feature
  arm64: Add level-hinted TLB invalidation helper

Zhenyu Ye (4):
  arm64: Add level-hinted TLB invalidation helper to tlbi_user
  mm: Add page table level flags to vm_flags
  arm64: tlb: Use translation level hint in vm_flags
  mm: Set VM_LEVEL flags in some tlb_flush functions

 arch/arm64/include/asm/cpucaps.h  |  3 +-
 arch/arm64/include/asm/mmu.h      |  2 +
 arch/arm64/include/asm/sysreg.h   |  1 +
 arch/arm64/include/asm/tlb.h      | 12 +++++
 arch/arm64/include/asm/tlbflush.h | 74 ++++++++++++++++++++++++++++---
 arch/arm64/kernel/cpufeature.c    | 11 +++++
 arch/arm64/mm/hugetlbpage.c       |  4 +-
 arch/arm64/mm/mmu.c               | 14 ++++++
 include/asm-generic/pgtable.h     | 16 ++++++-
 include/linux/mm.h                | 10 +++++
 include/trace/events/mmflags.h    | 15 ++++++-
 mm/huge_memory.c                  |  8 +++-
 12 files changed, 157 insertions(+), 13 deletions(-)

Comments

Peter Zijlstra March 24, 2020, 3:01 p.m. UTC | #1
On Tue, Mar 24, 2020 at 09:45:28PM +0800, Zhenyu Ye wrote:
> In order to reduce the cost of TLB invalidation, the ARMv8.4 TTL
> feature allows TLBIs to be issued with a level hint, allowing for
> quicker invalidation.  This series provides support for this feature.
> 
> Patches 1 and 2 were provided by Marc in his NV series[1]; they
> detect the TTL feature and add the __tlbi_level interface.

I really hate how it makes vma->vm_flags more important for tlbi.
Zhenyu Ye March 25, 2020, 4:49 a.m. UTC | #2
Hi Peter,

On 2020/3/24 23:01, Peter Zijlstra wrote:
> On Tue, Mar 24, 2020 at 09:45:28PM +0800, Zhenyu Ye wrote:
>> In order to reduce the cost of TLB invalidation, the ARMv8.4 TTL
>> feature allows TLBIs to be issued with a level hint, allowing for
>> quicker invalidation.  This series provides support for this feature.
>>
>> Patches 1 and 2 were provided by Marc in his NV series[1]; they
>> detect the TTL feature and add the __tlbi_level interface.
> 
> I really hate how it makes vma->vm_flags more important for tlbi.
> 

Thanks for your review.

The TLBI interfaces only have two parameters: vma and addr. If we
avoid using vma->vm_flags, we may have to add a parameter to some of
these interfaces (such as flush_tlb_range), which are common to all
architectures.

I'm not sure if this is feasible, because this feature is only
supported by ARM64 currently.


Thanks,
Zhenyu
Peter Zijlstra March 25, 2020, 1:32 p.m. UTC | #3
On Wed, Mar 25, 2020 at 12:49:45PM +0800, Zhenyu Ye wrote:
> Hi Peter,
> 
> On 2020/3/24 23:01, Peter Zijlstra wrote:
> > On Tue, Mar 24, 2020 at 09:45:28PM +0800, Zhenyu Ye wrote:
> >> In order to reduce the cost of TLB invalidation, the ARMv8.4 TTL
> >> feature allows TLBIs to be issued with a level hint, allowing for
> >> quicker invalidation.  This series provides support for this feature.
> >>
> >> Patches 1 and 2 were provided by Marc in his NV series[1]; they
> >> detect the TTL feature and add the __tlbi_level interface.
> > 
> > I really hate how it makes vma->vm_flags more important for tlbi.
> > 
> 
> Thanks for your review.
> 
> The TLBI interfaces only have two parameters: vma and addr. If we
> avoid using vma->vm_flags, we may have to add a parameter to some of
> these interfaces (such as flush_tlb_range), which are common to all
> architectures.
> 
> I'm not sure if this is feasible, because this feature is only
> supported by ARM64 currently.

Power (p9-radix) also has level-dependent invalidation instructions, so
at the very least you can hook them up as well.
James Morse March 25, 2020, 4:15 p.m. UTC | #4
Hi Zhenyu,

On 3/24/20 1:45 PM, Zhenyu Ye wrote:
> In order to reduce the cost of TLB invalidation, the ARMv8.4 TTL
> feature allows TLBIs to be issued with a level hint, allowing for
> quicker invalidation.  This series provides support for this feature.
> 
> Patches 1 and 2 were provided by Marc in his NV series[1]; they
> detect the TTL feature and add the __tlbi_level interface.

How does this interact with THP?
(I don't see anything on that in the series.)

With THP, there is no one answer to the size of mapping in a VMA.
This is a problem because the Arm ARM says, under "Translation table
level hints" in D5.10.2 of DDI0487E.a:
| If an incorrect value for the entry being invalidated by the
| instruction is specified in the TTL field, then no entries are
| required by the architecture to be invalidated from the TLB.

If we get it wrong, no TLB maintenance occurs!

Unless THP leaves its fingerprints on the vma, I think you can only do
this for VMA types that THP can't mess with. (see
transparent_hugepage_enabled())


Thanks,

James
Peter Zijlstra March 25, 2020, 4:41 p.m. UTC | #5
On Wed, Mar 25, 2020 at 04:15:58PM +0000, James Morse wrote:
> Hi Zhenyu,
> 
> On 3/24/20 1:45 PM, Zhenyu Ye wrote:
> > In order to reduce the cost of TLB invalidation, the ARMv8.4 TTL
> > feature allows TLBIs to be issued with a level hint, allowing for
> > quicker invalidation.  This series provides support for this feature.
> > 
> > Patches 1 and 2 were provided by Marc in his NV series[1]; they
> > detect the TTL feature and add the __tlbi_level interface.
> 
> How does this interact with THP?
> (I don't see anything on that in the series.)
> 
> With THP, there is no one answer to the size of mapping in a VMA.
> This is a problem because the Arm ARM says, under "Translation table
> level hints" in D5.10.2 of DDI0487E.a:
> | If an incorrect value for the entry being invalidated by the
> | instruction is specified in the TTL field, then no entries are
> | required by the architecture to be invalidated from the TLB.
> 
> If we get it wrong, no TLB maintenance occurs!
> 
> Unless THP leaves its fingerprints on the vma, I think you can only do
> this for VMA types that THP can't mess with. (see
> transparent_hugepage_enabled())

The conventional way to deal with that is to issue the TLBI for all
possible sizes.

Power9 has all this, please look there.
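
A minimal sketch of the "issue the TLBI for all possible sizes" idea,
written as plain user-space C rather than kernel code.  The callback
type, the bitmask convention and both function names are hypothetical
and only illustrate the control flow: when the mapping size is
ambiguous (for example a THP region that may hold either level-2 block
entries or level-3 page entries with a 4K granule), one level-hinted
invalidation is issued per candidate level instead of guessing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback standing in for one level-hinted invalidation. */
typedef void (*tlbi_level_fn)(uint64_t addr, unsigned int level, void *ctx);

/*
 * Invalidate addr once for every level the mapping might live at.
 * possible_levels is a bitmask: bit N set means "could be a level-N entry".
 * Only levels 1..3 are considered.  Returns the number of invalidations
 * that were issued.
 */
static unsigned int tlbi_all_possible_levels(uint64_t addr,
					     unsigned int possible_levels,
					     tlbi_level_fn issue, void *ctx)
{
	unsigned int level, issued = 0;

	assert(issue != NULL);

	for (level = 1; level <= 3; level++) {
		if (possible_levels & (1u << level)) {
			issue(addr, level, ctx);
			issued++;
		}
	}
	return issued;
}

/* Example callback: record which levels were requested in a bitmask. */
static void record_level(uint64_t addr, unsigned int level, void *ctx)
{
	(void)addr;
	*(unsigned int *)ctx |= 1u << level;
}
```

The trade-off is extra invalidation instructions in the ambiguous case
in exchange for never relying on a hint that might be wrong.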
Zhenyu Ye March 26, 2020, 6:45 a.m. UTC | #6
Hi James,

On 2020/3/26 0:15, James Morse wrote:
> Hi Zhenyu,
> 
> On 3/24/20 1:45 PM, Zhenyu Ye wrote:
>> In order to reduce the cost of TLB invalidation, the ARMv8.4 TTL
>> feature allows TLBIs to be issued with a level hint, allowing for
>> quicker invalidation.  This series provides support for this feature.
>>
>> Patches 1 and 2 were provided by Marc in his NV series[1]; they
>> detect the TTL feature and add the __tlbi_level interface.
> 
> How does this interact with THP?
> (I don't see anything on that in the series.)
> 
> With THP, there is no one answer to the size of mapping in a VMA.
> This is a problem because the Arm ARM says, under "Translation table
> level hints" in D5.10.2 of DDI0487E.a:
> | If an incorrect value for the entry being invalidated by the
> | instruction is specified in the TTL field, then no entries are
> | required by the architecture to be invalidated from the TLB.
> 
> If we get it wrong, no TLB maintenance occurs!
> 

Thanks for your review.  With THP, we should update the TTL value
after pages are collapsed or merged.  If we are not sure what it should
be, we can set it to 0 to avoid the "no TLB maintenance occurs" problem.
Table D5-53 in DDI0487E.a says:
| when TTL[1:0] is 0b00:
|   This value is reserved, and hardware should treat this as if TTL[3:2] is 0b00
| when TTL[3:2] is 0b00:
|   Hardware must assume that the entry can be from any level.
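
The encoding described above can be sketched in user-space C as
follows.  This is an illustration, not the code from the series: the
function and macro names are made up, and the field position (bits
[47:44] of the TLBI-by-VA operand) and the granule values in TTL[3:2]
are assumptions taken from the Arm ARM rather than from this thread.
An unknown level produces an all-zero TTL field, which is the "any
level" case quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the TTL hint occupies bits [47:44] of the TLBI operand. */
#define TTL_SHIFT	44
#define TTL_MASK	(UINT64_C(0xf) << TTL_SHIFT)

/* TTL[3:2]: translation granule (assumed values, see lead-in). */
enum ttl_granule {
	TTL_TG_ANY = 0,		/* no information, entry may be at any level */
	TTL_TG_4K  = 1,
	TTL_TG_16K = 2,
	TTL_TG_64K = 3,
};

/*
 * Fold a level hint into a TLBI operand.  level is 1..3 when the caller
 * knows the level of the entry, and 0 when it does not.  With level 0 the
 * whole TTL field is cleared, so hardware must assume any level.
 */
static uint64_t tlbi_operand_with_ttl(uint64_t operand, enum ttl_granule tg,
				      unsigned int level)
{
	uint64_t ttl = 0;

	assert(level <= 3);

	if (level != 0 && tg != TTL_TG_ANY)
		ttl = ((uint64_t)tg << 2) | level;	/* TTL[3:2] | TTL[1:0] */

	return (operand & ~TTL_MASK) | (ttl << TTL_SHIFT);
}
```

Clearing the field before inserting the hint means a stale value in
those operand bits can never be mistaken for a level.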

> Unless THP leaves its fingerprints on the vma, I think you can only do
> this for VMA types that THP can't mess with. (see
> transparent_hugepage_enabled())
> 

I will try to pass struct mmu_gather to the TLBI interfaces; it has
enough information to track the TLB entry's level.  See the next
version of the patches!
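
One way this could look, as a rough user-space sketch: the struct below
is a simplified stand-in modelled on the cleared_* bookkeeping bits in
struct mmu_gather, and gather_level_hint() is a hypothetical name.  It
assumes a layout where PTEs are level 3, PMDs level 2 and PUDs level 1.
A hint is returned only when entries of exactly one kind were cleared;
anything else falls back to 0 (no hint).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the bookkeeping bits in struct mmu_gather. */
struct gather_sketch {
	bool cleared_ptes;	/* level-3 entries were cleared */
	bool cleared_pmds;	/* level-2 entries were cleared */
	bool cleared_puds;	/* level-1 entries were cleared */
};

/*
 * Return a level hint (1..3) only when the gather touched entries of
 * exactly one kind.  Return 0 (no hint) when nothing was cleared or when
 * several levels are mixed, so a wrong hint is never produced.
 */
static unsigned int gather_level_hint(const struct gather_sketch *g)
{
	unsigned int kinds;

	assert(g != NULL);

	kinds = g->cleared_ptes + g->cleared_pmds + g->cleared_puds;
	if (kinds != 1)
		return 0;

	if (g->cleared_ptes)
		return 3;
	if (g->cleared_pmds)
		return 2;
	return 1;
}
```

Because the gather records what was actually cleared, this avoids
depending on a per-VMA flag that THP could make stale.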


Thanks,
Zhenyu
Zhenyu Ye March 26, 2020, 7:15 a.m. UTC | #7
On 2020/3/25 21:32, Peter Zijlstra wrote:
> On Wed, Mar 25, 2020 at 12:49:45PM +0800, Zhenyu Ye wrote:
>> Hi Peter,
>>
>> On 2020/3/24 23:01, Peter Zijlstra wrote:
>>> On Tue, Mar 24, 2020 at 09:45:28PM +0800, Zhenyu Ye wrote:
>>>> In order to reduce the cost of TLB invalidation, the ARMv8.4 TTL
>>>> feature allows TLBIs to be issued with a level hint, allowing for
>>>> quicker invalidation.  This series provides support for this feature.
>>>>
>>>> Patches 1 and 2 were provided by Marc in his NV series[1]; they
>>>> detect the TTL feature and add the __tlbi_level interface.
>>>
>>> I really hate how it makes vma->vm_flags more important for tlbi.
>>>
>>
>> Thanks for your review.
>>
>> The TLBI interfaces only have two parameters: vma and addr. If we
>> avoid using vma->vm_flags, we may have to add a parameter to some of
>> these interfaces (such as flush_tlb_range), which are common to all
>> architectures.
>>
>> I'm not sure if this is feasible, because this feature is only
>> supported by ARM64 currently.
> 
> Power (p9-radix) also has level-dependent invalidation instructions, so
> at the very least you can hook them up as well.

Thanks, I will push my next version soon.

Zhenyu