
[1/2] mm/tlb: fix fullmm semantics

Message ID 20231228084642.1765-2-jszhang@kernel.org (mailing list archive)
State New, archived
Series riscv: tlb: avoid tlb flushing on exit & execve

Commit Message

Jisheng Zhang Dec. 28, 2023, 8:46 a.m. UTC
From: Nadav Amit <namit@vmware.com>

fullmm in mmu_gather is supposed to indicate that the mm is torn down
(e.g., on process exit) and can therefore allow certain optimizations.
However, tlb_finish_mmu() sets fullmm, when in fact it wants to say that
the TLB should be fully flushed.

Change tlb_finish_mmu() to set need_flush_all and check this flag in
tlb_flush_mmu_tlbonly() when deciding whether a flush is needed.

At the same time, bring the arm64 fullmm on process exit optimization back.

Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: x86@kernel.org
---
 arch/arm64/include/asm/tlb.h | 5 ++++-
 include/asm-generic/tlb.h    | 2 +-
 mm/mmu_gather.c              | 2 +-
 3 files changed, 6 insertions(+), 3 deletions(-)
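
For context, the two flags the patch distinguishes are both bitfields in
struct mmu_gather. A rough, abridged sketch of their intended semantics
(field layout simplified, most members omitted; the comments paraphrase
the commit message rather than the upstream source):

/* Abridged sketch of struct mmu_gather from include/asm-generic/tlb.h */
struct mmu_gather {
	struct mm_struct	*mm;
	unsigned long		start;
	unsigned long		end;

	/*
	 * The whole address space is being torn down (exit/execve);
	 * per-page invalidation can be relaxed because the mm will
	 * never run again.
	 */
	unsigned int		fullmm : 1;

	/*
	 * The TLB must be flushed in full now, even though the mm may
	 * stay in use afterwards.
	 */
	unsigned int		need_flush_all : 1;
};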

Comments

Nadav Amit Dec. 30, 2023, 9:54 a.m. UTC | #1
> On Dec 28, 2023, at 10:46 AM, Jisheng Zhang <jszhang@kernel.org> wrote:
> 
> From: Nadav Amit <namit@vmware.com>
> 
> fullmm in mmu_gather is supposed to indicate that the mm is torn down
> (e.g., on process exit) and can therefore allow certain optimizations.
> However, tlb_finish_mmu() sets fullmm, when in fact it wants to say that
> the TLB should be fully flushed.
> 
> Change tlb_finish_mmu() to set need_flush_all and check this flag in
> tlb_flush_mmu_tlbonly() when deciding whether a flush is needed.
> 
> At the same time, bring the arm64 fullmm on process exit optimization back.
> 
> Signed-off-by: Nadav Amit <namit@vmware.com>
> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Will Deacon <will@kernel.org>
> Cc: Yu Zhao <yuzhao@google.com>
> Cc: Nick Piggin <npiggin@gmail.com>
> Cc: x86@kernel.org
> ---
> arch/arm64/include/asm/tlb.h | 5 ++++-
> include/asm-generic/tlb.h    | 2 +-
> mm/mmu_gather.c              | 2 +-
> 3 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
> index 846c563689a8..6164c5f3b78f 100644
> --- a/arch/arm64/include/asm/tlb.h
> +++ b/arch/arm64/include/asm/tlb.h
> @@ -62,7 +62,10 @@ static inline void tlb_flush(struct mmu_gather *tlb)
> 	 * invalidating the walk-cache, since the ASID allocator won't
> 	 * reallocate our ASID without invalidating the entire TLB.
> 	 */
> -	if (tlb->fullmm) {
> +	if (tlb->fullmm)
> +		return;
> +
> +	if (tlb->need_flush_all) {
> 		if (!last_level)
> 			flush_tlb_mm(tlb->mm);
> 		return;
> 

Thanks for pulling my patch out of the abyss, but the chunk above
did not come from my old patch.

My knowledge of arm64 is a bit limited, but the code does not seem
to match the comment, so if it is correct (which I strongly doubt),
the comment should be updated.

[1] https://lore.kernel.org/all/20210131001132.3368247-2-namit@vmware.com/
Jisheng Zhang Jan. 2, 2024, 2:41 a.m. UTC | #2
On Sat, Dec 30, 2023 at 11:54:02AM +0200, Nadav Amit wrote:
> 
> 
> > On Dec 28, 2023, at 10:46 AM, Jisheng Zhang <jszhang@kernel.org> wrote:
> > 
> > From: Nadav Amit <namit@vmware.com>
> > 
> > fullmm in mmu_gather is supposed to indicate that the mm is torn down
> > (e.g., on process exit) and can therefore allow certain optimizations.
> > However, tlb_finish_mmu() sets fullmm, when in fact it wants to say that
> > the TLB should be fully flushed.
> > 
> > Change tlb_finish_mmu() to set need_flush_all and check this flag in
> > tlb_flush_mmu_tlbonly() when deciding whether a flush is needed.
> > 
> > At the same time, bring the arm64 fullmm on process exit optimization back.
> > 
> > Signed-off-by: Nadav Amit <namit@vmware.com>
> > Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> > Cc: Andrea Arcangeli <aarcange@redhat.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Andy Lutomirski <luto@kernel.org>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Yu Zhao <yuzhao@google.com>
> > Cc: Nick Piggin <npiggin@gmail.com>
> > Cc: x86@kernel.org
> > ---
> > arch/arm64/include/asm/tlb.h | 5 ++++-
> > include/asm-generic/tlb.h    | 2 +-
> > mm/mmu_gather.c              | 2 +-
> > 3 files changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
> > index 846c563689a8..6164c5f3b78f 100644
> > --- a/arch/arm64/include/asm/tlb.h
> > +++ b/arch/arm64/include/asm/tlb.h
> > @@ -62,7 +62,10 @@ static inline void tlb_flush(struct mmu_gather *tlb)
> > 	 * invalidating the walk-cache, since the ASID allocator won't
> > 	 * reallocate our ASID without invalidating the entire TLB.
> > 	 */
> > -	if (tlb->fullmm) {
> > +	if (tlb->fullmm)
> > +		return;
> > +
> > +	if (tlb->need_flush_all) {
> > 		if (!last_level)
> > 			flush_tlb_mm(tlb->mm);
> > 		return;
> > 
> 
> Thanks for pulling my patch out of the abyss, but the chunk above
> did not come from my old patch.

I stated this in the cover letter msg ;) IMHO, current arm64 uses fullmm as
need_flush_all, so I think we need at least the need_flush_all line.

I'd like to see comments from arm64 experts.

> 
> My knowledge of arm64 is a bit limited, but the code does not seem
> to match the comment, so if it is correct (which I strongly doubt),
> the comment should be updated.

Will do, if the above change is accepted by the arm64 maintainers.

> 
> [1] https://lore.kernel.org/all/20210131001132.3368247-2-namit@vmware.com/
> 
Will Deacon Jan. 3, 2024, 5:50 p.m. UTC | #3
On Thu, Dec 28, 2023 at 04:46:41PM +0800, Jisheng Zhang wrote:
> From: Nadav Amit <namit@vmware.com>
> 
> fullmm in mmu_gather is supposed to indicate that the mm is torn down
> (e.g., on process exit) and can therefore allow certain optimizations.
> However, tlb_finish_mmu() sets fullmm, when in fact it wants to say that
> the TLB should be fully flushed.
> 
> Change tlb_finish_mmu() to set need_flush_all and check this flag in
> tlb_flush_mmu_tlbonly() when deciding whether a flush is needed.
> 
> At the same time, bring the arm64 fullmm on process exit optimization back.
> 
> Signed-off-by: Nadav Amit <namit@vmware.com>
> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Will Deacon <will@kernel.org>
> Cc: Yu Zhao <yuzhao@google.com>
> Cc: Nick Piggin <npiggin@gmail.com>
> Cc: x86@kernel.org
> ---
>  arch/arm64/include/asm/tlb.h | 5 ++++-
>  include/asm-generic/tlb.h    | 2 +-
>  mm/mmu_gather.c              | 2 +-
>  3 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
> index 846c563689a8..6164c5f3b78f 100644
> --- a/arch/arm64/include/asm/tlb.h
> +++ b/arch/arm64/include/asm/tlb.h
> @@ -62,7 +62,10 @@ static inline void tlb_flush(struct mmu_gather *tlb)
>  	 * invalidating the walk-cache, since the ASID allocator won't
>  	 * reallocate our ASID without invalidating the entire TLB.
>  	 */
> -	if (tlb->fullmm) {
> +	if (tlb->fullmm)
> +		return;
> +
> +	if (tlb->need_flush_all) {
>  		if (!last_level)
>  			flush_tlb_mm(tlb->mm);
>  		return;

Why isn't the 'last_level' check sufficient here? In other words, when do
we perform a !last_level invalidation with 'fullmm' set outside of teardown?

Will
Will Deacon Jan. 3, 2024, 5:57 p.m. UTC | #4
On Wed, Jan 03, 2024 at 05:50:01PM +0000, Will Deacon wrote:
> On Thu, Dec 28, 2023 at 04:46:41PM +0800, Jisheng Zhang wrote:
> > From: Nadav Amit <namit@vmware.com>
> > 
> > fullmm in mmu_gather is supposed to indicate that the mm is torn down
> > (e.g., on process exit) and can therefore allow certain optimizations.
> > However, tlb_finish_mmu() sets fullmm, when in fact it wants to say that
> > the TLB should be fully flushed.
> > 
> > Change tlb_finish_mmu() to set need_flush_all and check this flag in
> > tlb_flush_mmu_tlbonly() when deciding whether a flush is needed.
> > 
> > At the same time, bring the arm64 fullmm on process exit optimization back.
> > 
> > Signed-off-by: Nadav Amit <namit@vmware.com>
> > Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> > Cc: Andrea Arcangeli <aarcange@redhat.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Andy Lutomirski <luto@kernel.org>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Yu Zhao <yuzhao@google.com>
> > Cc: Nick Piggin <npiggin@gmail.com>
> > Cc: x86@kernel.org
> > ---
> >  arch/arm64/include/asm/tlb.h | 5 ++++-
> >  include/asm-generic/tlb.h    | 2 +-
> >  mm/mmu_gather.c              | 2 +-
> >  3 files changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
> > index 846c563689a8..6164c5f3b78f 100644
> > --- a/arch/arm64/include/asm/tlb.h
> > +++ b/arch/arm64/include/asm/tlb.h
> > @@ -62,7 +62,10 @@ static inline void tlb_flush(struct mmu_gather *tlb)
> >  	 * invalidating the walk-cache, since the ASID allocator won't
> >  	 * reallocate our ASID without invalidating the entire TLB.
> >  	 */
> > -	if (tlb->fullmm) {
> > +	if (tlb->fullmm)
> > +		return;
> > +
> > +	if (tlb->need_flush_all) {
> >  		if (!last_level)
> >  			flush_tlb_mm(tlb->mm);
> >  		return;
> 
> Why isn't the 'last_level' check sufficient here? In other words, when do
> we perform a !last_level invalidation with 'fullmm' set outside of teardown?

Sorry, logic inversion typo there. I should've said:

  When do we perform a last_level invalidation with 'fullmm' set outside
  of teardown?

I remember this used to be the case for OOM ages ago, but 687cb0884a71
("mm, oom_reaper: gather each vma to prevent leaking TLB entry") sorted
that out.

I'm not against making this clearer and/or more robust; I'm just trying
to understand whether this is fixing a bug (as implied by the subject)
or not.

Will
Catalin Marinas Jan. 3, 2024, 6:05 p.m. UTC | #5
On Thu, Dec 28, 2023 at 04:46:41PM +0800, Jisheng Zhang wrote:
> diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
> index 846c563689a8..6164c5f3b78f 100644
> --- a/arch/arm64/include/asm/tlb.h
> +++ b/arch/arm64/include/asm/tlb.h
> @@ -62,7 +62,10 @@ static inline void tlb_flush(struct mmu_gather *tlb)
>  	 * invalidating the walk-cache, since the ASID allocator won't
>  	 * reallocate our ASID without invalidating the entire TLB.
>  	 */
> -	if (tlb->fullmm) {
> +	if (tlb->fullmm)
> +		return;
> +
> +	if (tlb->need_flush_all) {
>  		if (!last_level)
>  			flush_tlb_mm(tlb->mm);
>  		return;

I don't think that's correct. IIRC, commit f270ab88fdf2 ("arm64: tlb:
Adjust stride and type of TLBI according to mmu_gather") explicitly
added the !last_level check to invalidate the walk cache (correspondence
between the VA and the page table page rather than the full VA->PA
translation).

> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index 129a3a759976..f2d46357bcbb 100644
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -452,7 +452,7 @@ static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
>  	 * these bits.
>  	 */
>  	if (!(tlb->freed_tables || tlb->cleared_ptes || tlb->cleared_pmds ||
> -	      tlb->cleared_puds || tlb->cleared_p4ds))
> +	      tlb->cleared_puds || tlb->cleared_p4ds || tlb->need_flush_all))
>  		return;
>  
>  	tlb_flush(tlb);
> diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
> index 4f559f4ddd21..79298bac3481 100644
> --- a/mm/mmu_gather.c
> +++ b/mm/mmu_gather.c
> @@ -384,7 +384,7 @@ void tlb_finish_mmu(struct mmu_gather *tlb)
>  		 * On x86 non-fullmm doesn't yield significant difference
>  		 * against fullmm.
>  		 */
> -		tlb->fullmm = 1;
> +		tlb->need_flush_all = 1;
>  		__tlb_reset_range(tlb);
>  		tlb->freed_tables = 1;
>  	}

The optimisation here was added about a year later in commit
7a30df49f63a ("mm: mmu_gather: remove __tlb_reset_range() for force
flush"). Do we still need to keep freed_tables = 1 here? I'd say only
__tlb_reset_range() is needed.
Dave Hansen Jan. 3, 2024, 8:26 p.m. UTC | #6
On 1/3/24 10:05, Catalin Marinas wrote:
>> --- a/mm/mmu_gather.c
>> +++ b/mm/mmu_gather.c
>> @@ -384,7 +384,7 @@ void tlb_finish_mmu(struct mmu_gather *tlb)
>>  		 * On x86 non-fullmm doesn't yield significant difference
>>  		 * against fullmm.
>>  		 */
>> -		tlb->fullmm = 1;
>> +		tlb->need_flush_all = 1;
>>  		__tlb_reset_range(tlb);
>>  		tlb->freed_tables = 1;
>>  	}
> The optimisation here was added about a year later in commit
> 7a30df49f63a ("mm: mmu_gather: remove __tlb_reset_range() for force
> flush"). Do we still need to keep freed_tables = 1 here? I'd say only
> __tlb_reset_range().

I think the __tlb_reset_range() can be dangerous if it clears
->freed_tables.  On x86 at least, it might lead to skipping the TLB IPI
for CPUs that are in lazy TLB mode.  When those wake back up they might
start using the freed page tables.

Logically, this little hunk of code is just trying to turn the 'tlb'
from a ranged flush into a full one.  Ideally, just setting
'need_flush_all = 1' would be enough.

Is __tlb_reset_range() doing anything useful for other architectures?  I
think it's just destroying perfectly valid metadata on x86. ;)
Catalin Marinas Jan. 3, 2024, 9:54 p.m. UTC | #7
On Wed, Jan 03, 2024 at 12:26:29PM -0800, Dave Hansen wrote:
> On 1/3/24 10:05, Catalin Marinas wrote:
> >> --- a/mm/mmu_gather.c
> >> +++ b/mm/mmu_gather.c
> >> @@ -384,7 +384,7 @@ void tlb_finish_mmu(struct mmu_gather *tlb)
> >>  		 * On x86 non-fullmm doesn't yield significant difference
> >>  		 * against fullmm.
> >>  		 */
> >> -		tlb->fullmm = 1;
> >> +		tlb->need_flush_all = 1;
> >>  		__tlb_reset_range(tlb);
> >>  		tlb->freed_tables = 1;
> >>  	}
> > The optimisation here was added about a year later in commit
> > 7a30df49f63a ("mm: mmu_gather: remove __tlb_reset_range() for force
> > flush"). Do we still need to keep freed_tables = 1 here? I'd say only
> > __tlb_reset_range().
> 
> I think the __tlb_reset_range() can be dangerous if it clears
> ->freed_tables.  On x86 at least, it might lead to skipping the TLB IPI
> for CPUs that are in lazy TLB mode.  When those wake back up they might
> start using the freed page tables.

You are right; I did not realise freed_tables is reset in
__tlb_reset_range().
Nadav Amit Jan. 4, 2024, 1:26 p.m. UTC | #8
> On Jan 2, 2024, at 4:41 AM, Jisheng Zhang <jszhang@kernel.org> wrote:
> 
> On Sat, Dec 30, 2023 at 11:54:02AM +0200, Nadav Amit wrote:
> 
>> 
>> My knowledge of arm64 is a bit limited, but the code does not seem
>> to match the comment, so if it is correct (which I strongly doubt),
>> the comment should be updated.
> 
> will do if the above change is accepted by arm64

Jisheng, I expected somebody with arm64 knowledge to point it out, and
maybe I am wrong, but I really don't understand something about the
correctness; could you please explain?

In the following code:

--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -62,7 +62,10 @@ static inline void tlb_flush(struct mmu_gather *tlb)
	 * invalidating the walk-cache, since the ASID allocator won't
	 * reallocate our ASID without invalidating the entire TLB.
	 */
-	if (tlb->fullmm) {
+	if (tlb->fullmm)
+		return;

You skip the flush if fullmm is set. But if page tables are freed, you may
want to flush immediately rather than wait for the ASID to be freed, to
avoid speculative page walks; such walks caused a mess on x86, at least.

No?
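
For reference, the behaviour change can be reconstructed from the patch hunk
(a simplified sketch; last_level is assumed to be !tlb->freed_tables, as in
the current arm64 tlb_flush()):

	/* Before: teardown still invalidates the walk-cache if tables were freed. */
	if (tlb->fullmm) {
		if (!last_level)
			flush_tlb_mm(tlb->mm);
		return;
	}

	/*
	 * After: teardown returns with no flush at all; only the new
	 * need_flush_all path keeps the walk-cache invalidation.
	 */
	if (tlb->fullmm)
		return;

	if (tlb->need_flush_all) {
		if (!last_level)
			flush_tlb_mm(tlb->mm);
		return;
	}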
Will Deacon Jan. 4, 2024, 2:40 p.m. UTC | #9
On Thu, Jan 04, 2024 at 03:26:43PM +0200, Nadav Amit wrote:
> 
> 
> > On Jan 2, 2024, at 4:41 AM, Jisheng Zhang <jszhang@kernel.org> wrote:
> > 
> > On Sat, Dec 30, 2023 at 11:54:02AM +0200, Nadav Amit wrote:
> > 
> >> 
> >> My knowledge of arm64 is a bit limited, but the code does not seem
> >> to match the comment, so if it is correct (which I strongly doubt),
> >> the comment should be updated.
> > 
> > will do if the above change is accepted by arm64
> 
> Jisheng, I expected somebody with arm64 knowledge to point it out, and
> maybe I am wrong, but I really don’t understand something about the
> correctness, if you can please explain.
> 
> In the following code:
> 
> --- a/arch/arm64/include/asm/tlb.h
> +++ b/arch/arm64/include/asm/tlb.h
> @@ -62,7 +62,10 @@ static inline void tlb_flush(struct mmu_gather *tlb)
> 	 * invalidating the walk-cache, since the ASID allocator won't
> 	 * reallocate our ASID without invalidating the entire TLB.
> 	 */
> -	if (tlb->fullmm) {
> +	if (tlb->fullmm)
> +		return;
> 
> You skip flush if fullmm is on. But if page-tables are freed, you may
> want to flush immediately and not wait for ASID to be freed to avoid
> speculative page walks; these walks at least on x86 caused a mess.
> 
> No?

I think Catalin made the same observation here:

https://lore.kernel.org/r/ZZWh4c3ZUtadFqD1@arm.com

and it does indeed look broken.

Will

Patch

diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index 846c563689a8..6164c5f3b78f 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -62,7 +62,10 @@  static inline void tlb_flush(struct mmu_gather *tlb)
 	 * invalidating the walk-cache, since the ASID allocator won't
 	 * reallocate our ASID without invalidating the entire TLB.
 	 */
-	if (tlb->fullmm) {
+	if (tlb->fullmm)
+		return;
+
+	if (tlb->need_flush_all) {
 		if (!last_level)
 			flush_tlb_mm(tlb->mm);
 		return;
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 129a3a759976..f2d46357bcbb 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -452,7 +452,7 @@  static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
 	 * these bits.
 	 */
 	if (!(tlb->freed_tables || tlb->cleared_ptes || tlb->cleared_pmds ||
-	      tlb->cleared_puds || tlb->cleared_p4ds))
+	      tlb->cleared_puds || tlb->cleared_p4ds || tlb->need_flush_all))
 		return;
 
 	tlb_flush(tlb);
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 4f559f4ddd21..79298bac3481 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -384,7 +384,7 @@  void tlb_finish_mmu(struct mmu_gather *tlb)
 		 * On x86 non-fullmm doesn't yield significant difference
 		 * against fullmm.
 		 */
-		tlb->fullmm = 1;
+		tlb->need_flush_all = 1;
 		__tlb_reset_range(tlb);
 		tlb->freed_tables = 1;
 	}