
percpu: clean up all mappings when pcpu_map_pages() fails

Message ID: 20240311194346.2291333-1-yosryahmed@google.com (mailing list archive)
State: New
Series: percpu: clean up all mappings when pcpu_map_pages() fails

Commit Message

Yosry Ahmed March 11, 2024, 7:43 p.m. UTC
In pcpu_map_pages(), if __pcpu_map_pages() fails on a CPU, we call
__pcpu_unmap_pages() to clean up mappings on all CPUs where mappings
were created, but not on the CPU where __pcpu_map_pages() fails.

__pcpu_map_pages() and __pcpu_unmap_pages() are wrappers around
vmap_pages_range_noflush() and vunmap_range_noflush(). All other callers
of vmap_pages_range_noflush() call vunmap_range_noflush() when mapping
fails, except pcpu_map_pages(). The reason could be that partial
mappings may be left behind from a failed mapping attempt.

Call __pcpu_unmap_pages() for the failed CPU as well in
pcpu_map_pages().

This was found by code inspection, no failures or bugs were observed.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---

Perhaps the reason __pcpu_unmap_pages() is not currently being called
for the failed CPU is that the size and alignment requirements make sure
we never leave any partial mappings behind? I have no idea. Nonetheless,
I think we want this change as that could be fragile, and is
inconsistent with other callers.

---
 mm/percpu-vm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Yosry Ahmed March 19, 2024, 8:08 p.m. UTC | #1
On Mon, Mar 11, 2024 at 12:43 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> In pcpu_map_pages(), if __pcpu_map_pages() fails on a CPU, we call
> __pcpu_unmap_pages() to clean up mappings on all CPUs where mappings
> were created, but not on the CPU where __pcpu_map_pages() fails.
>
> __pcpu_map_pages() and __pcpu_unmap_pages() are wrappers around
> vmap_pages_range_noflush() and vunmap_range_noflush(). All other callers
> of vmap_pages_range_noflush() call vunmap_range_noflush() when mapping
> fails, except pcpu_map_pages(). The reason could be that partial
> mappings may be left behind from a failed mapping attempt.
>
> Call __pcpu_unmap_pages() for the failed CPU as well in
> pcpu_map_pages().
>
> This was found by code inspection, no failures or bugs were observed.
>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>

Any thoughts about this change? Should I resend next week after the
merge window?

> ---
>
> Perhaps the reason __pcpu_unmap_pages() is not currently being called
> for the failed CPU is that the size and alignment requirements make sure
> we never leave any partial mappings behind? I have no idea. Nonetheless,
> I think we want this change as that could be fragile, and is
> inconsistent with other callers.
>
> ---
>  mm/percpu-vm.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
> index 2054c9213c433..cd69caf6aa8d8 100644
> --- a/mm/percpu-vm.c
> +++ b/mm/percpu-vm.c
> @@ -231,10 +231,10 @@ static int pcpu_map_pages(struct pcpu_chunk *chunk,
>         return 0;
>  err:
>         for_each_possible_cpu(tcpu) {
> -               if (tcpu == cpu)
> -                       break;
>                 __pcpu_unmap_pages(pcpu_chunk_addr(chunk, tcpu, page_start),
>                                    page_end - page_start);
> +               if (tcpu == cpu)
> +                       break;
>         }
>         pcpu_post_unmap_tlb_flush(chunk, page_start, page_end);
>         return err;
> --
> 2.44.0.278.ge034bb2e1d-goog
>
Dennis Zhou March 19, 2024, 8:32 p.m. UTC | #2
Hi Yosry,

On Tue, Mar 19, 2024 at 01:08:26PM -0700, Yosry Ahmed wrote:
> On Mon, Mar 11, 2024 at 12:43 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > In pcpu_map_pages(), if __pcpu_map_pages() fails on a CPU, we call
> > __pcpu_unmap_pages() to clean up mappings on all CPUs where mappings
> > were created, but not on the CPU where __pcpu_map_pages() fails.
> >
> > __pcpu_map_pages() and __pcpu_unmap_pages() are wrappers around
> > vmap_pages_range_noflush() and vunmap_range_noflush(). All other callers
> > of vmap_pages_range_noflush() call vunmap_range_noflush() when mapping
> > fails, except pcpu_map_pages(). The reason could be that partial
> > mappings may be left behind from a failed mapping attempt.
> >
> > Call __pcpu_unmap_pages() for the failed CPU as well in
> > pcpu_map_pages().
> >
> > This was found by code inspection, no failures or bugs were observed.
> >
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> 
> Any thoughts about this change? Should I resend next week after the
> merge window?
> 

Sorry for the delay.

I'm looking at the code from mm/kmsan/hooks.c kmsan_ioremap_page_range().
It seems like __vunmap_range_noflush() is called on error for
successfully mapped pages similar to how it's being done in percpu-vm.c.

I haven't read in depth the expectations of vmap_pages_range_noflush()
but on first glance it doesn't seem like percpu is operating out of the
ordinary?

Thanks,
Dennis

> > ---
> >
> > Perhaps the reason __pcpu_unmap_pages() is not currently being called
> > for the failed CPU is that the size and alignment requirements make sure
> > we never leave any partial mappings behind? I have no idea. Nonetheless,
> > I think we want this change as that could be fragile, and is
> > inconsistent with other callers.
> >
> > ---
> >  mm/percpu-vm.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
> > index 2054c9213c433..cd69caf6aa8d8 100644
> > --- a/mm/percpu-vm.c
> > +++ b/mm/percpu-vm.c
> > @@ -231,10 +231,10 @@ static int pcpu_map_pages(struct pcpu_chunk *chunk,
> >         return 0;
> >  err:
> >         for_each_possible_cpu(tcpu) {
> > -               if (tcpu == cpu)
> > -                       break;
> >                 __pcpu_unmap_pages(pcpu_chunk_addr(chunk, tcpu, page_start),
> >                                    page_end - page_start);
> > +               if (tcpu == cpu)
> > +                       break;
> >         }
> >         pcpu_post_unmap_tlb_flush(chunk, page_start, page_end);
> >         return err;
> > --
> > 2.44.0.278.ge034bb2e1d-goog
> >
Yosry Ahmed March 19, 2024, 8:49 p.m. UTC | #3
On Tue, Mar 19, 2024 at 1:32 PM Dennis Zhou <dennis@kernel.org> wrote:
>
> Hi Yosry,
>
> On Tue, Mar 19, 2024 at 01:08:26PM -0700, Yosry Ahmed wrote:
> > On Mon, Mar 11, 2024 at 12:43 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> > >
> > > In pcpu_map_pages(), if __pcpu_map_pages() fails on a CPU, we call
> > > __pcpu_unmap_pages() to clean up mappings on all CPUs where mappings
> > > were created, but not on the CPU where __pcpu_map_pages() fails.
> > >
> > > __pcpu_map_pages() and __pcpu_unmap_pages() are wrappers around
> > > vmap_pages_range_noflush() and vunmap_range_noflush(). All other callers
> > > of vmap_pages_range_noflush() call vunmap_range_noflush() when mapping
> > > fails, except pcpu_map_pages(). The reason could be that partial
> > > mappings may be left behind from a failed mapping attempt.
> > >
> > > Call __pcpu_unmap_pages() for the failed CPU as well in
> > > pcpu_map_pages().
> > >
> > > This was found by code inspection, no failures or bugs were observed.
> > >
> > > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> >
> > Any thoughts about this change? Should I resend next week after the
> > merge window?
> >
>
> Sorry for the delay.
>
> I'm looking at the code from mm/kmsan/hooks.c kmsan_ioremap_page_range().
> It seems like __vunmap_range_noflush() is called on error for
> successfully mapped pages similar to how it's being done in percpu-vm.c.

You picked an unconventional example to compare against :)

>
> I haven't read in depth the expectations of vmap_pages_range_noflush()
> but on first glance it doesn't seem like percpu is operating out of the
> ordinary?

I was looking at vm_map_ram(), vmap(), and __vmalloc_area_node(). They
all call vmap_pages_range()-> vmap_pages_range_noflush().

When vmap_pages_range() fails:
- vm_map_ram() calls
vm_unmap_ram()->free_unmap_vmap_area()->vunmap_range_noflush().
- vmap() calls vunmap()->remove_vm_area()->free_unmap_vmap_area()->
vunmap_range_noflush().
- __vmalloc_area_node() calls
vfree()->remove_vm_area()->free_unmap_vmap_area()->
vunmap_range_noflush().

I think it is needed to clean up any leftover partial mappings. I am
not sure about the kmsan example though.

Adding vmalloc reviewers here as well.
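For reference, the pattern those callers follow can be sketched as simplified kernel-style pseudocode (not a real kernel excerpt; `map_or_clean` is a hypothetical name): when the noflush map fails, the whole range is handed to vunmap_range_noflush() so any partially installed PTEs are removed.

```c
/* Simplified sketch of the common caller pattern (pseudocode). */
static void *map_or_clean(unsigned long addr, unsigned long end,
			  struct page **pages)
{
	if (vmap_pages_range_noflush(addr, end, PAGE_KERNEL, pages,
				     PAGE_SHIFT)) {
		/* Tear down the whole range, covering any partial
		 * mappings left by the failed attempt. */
		vunmap_range_noflush(addr, end);
		return NULL;
	}
	flush_cache_vmap(addr, end);
	return (void *)addr;
}
```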
>
> Thanks,
> Dennis
>
> > > ---
> > >
> > > Perhaps the reason __pcpu_unmap_pages() is not currently being called
> > > for the failed CPU is that the size and alignment requirements make sure
> > > we never leave any partial mappings behind? I have no idea. Nonetheless,
> > > I think we want this change as that could be fragile, and is
> > > inconsistent with other callers.
> > >
> > > ---
> > >  mm/percpu-vm.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
> > > index 2054c9213c433..cd69caf6aa8d8 100644
> > > --- a/mm/percpu-vm.c
> > > +++ b/mm/percpu-vm.c
> > > @@ -231,10 +231,10 @@ static int pcpu_map_pages(struct pcpu_chunk *chunk,
> > >         return 0;
> > >  err:
> > >         for_each_possible_cpu(tcpu) {
> > > -               if (tcpu == cpu)
> > > -                       break;
> > >                 __pcpu_unmap_pages(pcpu_chunk_addr(chunk, tcpu, page_start),
> > >                                    page_end - page_start);
> > > +               if (tcpu == cpu)
> > > +                       break;
> > >         }
> > >         pcpu_post_unmap_tlb_flush(chunk, page_start, page_end);
> > >         return err;
> > > --
> > > 2.44.0.278.ge034bb2e1d-goog
> > >
Dennis Zhou March 21, 2024, 4:57 p.m. UTC | #4
On Tue, Mar 19, 2024 at 01:49:17PM -0700, Yosry Ahmed wrote:
> On Tue, Mar 19, 2024 at 1:32 PM Dennis Zhou <dennis@kernel.org> wrote:
> >
> > Hi Yosry,
> >
> > On Tue, Mar 19, 2024 at 01:08:26PM -0700, Yosry Ahmed wrote:
> > > On Mon, Mar 11, 2024 at 12:43 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> > > >
> > > > In pcpu_map_pages(), if __pcpu_map_pages() fails on a CPU, we call
> > > > __pcpu_unmap_pages() to clean up mappings on all CPUs where mappings
> > > > were created, but not on the CPU where __pcpu_map_pages() fails.
> > > >
> > > > __pcpu_map_pages() and __pcpu_unmap_pages() are wrappers around
> > > > vmap_pages_range_noflush() and vunmap_range_noflush(). All other callers
> > > > of vmap_pages_range_noflush() call vunmap_range_noflush() when mapping
> > > > fails, except pcpu_map_pages(). The reason could be that partial
> > > > mappings may be left behind from a failed mapping attempt.
> > > >
> > > > Call __pcpu_unmap_pages() for the failed CPU as well in
> > > > pcpu_map_pages().
> > > >
> > > > This was found by code inspection, no failures or bugs were observed.
> > > >
> > > > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > >
> > > Any thoughts about this change? Should I resend next week after the
> > > merge window?
> > >
> >
> > Sorry for the delay.
> >
> > I'm looking at the code from mm/kmsan/hooks.c kmsan_ioremap_page_range().
> > It seems like __vunmap_range_noflush() is called on error for
> > successfully mapped pages similar to how it's being done in percpu-vm.c.
> 
> You picked an unconventional example to compare against :)
> 
> >
> > I haven't read in depth the expectations of vmap_pages_range_noflush()
> > but on first glance it doesn't seem like percpu is operating out of the
> > ordinary?
> 
> I was looking at vm_map_ram(), vmap(), and __vmalloc_area_node(). They
> all call vmap_pages_range()-> vmap_pages_range_noflush().
> 
> When vmap_pages_range() fails:
> - vm_map_ram() calls
> vm_unmap_ram()->free_unmap_vmap_area()->vunmap_range_noflush().
> - vmap() calls vunmap()->remove_vm_area()->free_unmap_vmap_area()->
> vunmap_range_noflush().
> - __vmalloc_area_node() calls
> vfree()->remove_vm_area()->free_unmap_vmap_area()->
> vunmap_range_noflush().
> 

Okay, so I had a moment to read it more closely. If we're mapping more
than one page and one of the latter pages fails, then we could leave
some mappings installed.

@Andrew, I think this makes sense. Would you please be able to pick this
up? I'm not running a tree this window. I will try to send out the percpu
hotplug changes I've been forward porting for a while now and try to get
that in a branch for-6.10.

Acked-by: Dennis Zhou <dennis@kernel.org>

> I think it is needed to clean up any leftover partial mappings. I am
> not sure about the kmsan example though.
> 

Yeah the kmsan example seems like it could be wrong for the same reason,
but I haven't inspected that more closely.

Thanks,
Dennis

> Adding vmalloc reviewers here as well.
> >
> >
> > > > ---
> > > >
> > > > Perhaps the reason __pcpu_unmap_pages() is not currently being called
> > > > for the failed CPU is that the size and alignment requirements make sure
> > > > we never leave any partial mappings behind? I have no idea. Nonetheless,
> > > > I think we want this change as that could be fragile, and is
> > > > inconsistent with other callers.
> > > >
> > > > ---
> > > >  mm/percpu-vm.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
> > > > index 2054c9213c433..cd69caf6aa8d8 100644
> > > > --- a/mm/percpu-vm.c
> > > > +++ b/mm/percpu-vm.c
> > > > @@ -231,10 +231,10 @@ static int pcpu_map_pages(struct pcpu_chunk *chunk,
> > > >         return 0;
> > > >  err:
> > > >         for_each_possible_cpu(tcpu) {
> > > > -               if (tcpu == cpu)
> > > > -                       break;
> > > >                 __pcpu_unmap_pages(pcpu_chunk_addr(chunk, tcpu, page_start),
> > > >                                    page_end - page_start);
> > > > +               if (tcpu == cpu)
> > > > +                       break;
> > > >         }
> > > >         pcpu_post_unmap_tlb_flush(chunk, page_start, page_end);
> > > >         return err;
> > > > --
> > > > 2.44.0.278.ge034bb2e1d-goog
> > > >

Patch

diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
index 2054c9213c433..cd69caf6aa8d8 100644
--- a/mm/percpu-vm.c
+++ b/mm/percpu-vm.c
@@ -231,10 +231,10 @@  static int pcpu_map_pages(struct pcpu_chunk *chunk,
 	return 0;
 err:
 	for_each_possible_cpu(tcpu) {
-		if (tcpu == cpu)
-			break;
 		__pcpu_unmap_pages(pcpu_chunk_addr(chunk, tcpu, page_start),
 				   page_end - page_start);
+		if (tcpu == cpu)
+			break;
 	}
 	pcpu_post_unmap_tlb_flush(chunk, page_start, page_end);
 	return err;