Message ID | 20240311194346.2291333-1-yosryahmed@google.com (mailing list archive)
---|---
State | New
Series | percpu: clean up all mappings when pcpu_map_pages() fails
On Mon, Mar 11, 2024 at 12:43 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> In pcpu_map_pages(), if __pcpu_map_pages() fails on a CPU, we call
> __pcpu_unmap_pages() to clean up mappings on all CPUs where mappings
> were created, but not on the CPU where __pcpu_map_pages() fails.
>
> __pcpu_map_pages() and __pcpu_unmap_pages() are wrappers around
> vmap_pages_range_noflush() and vunmap_range_noflush(). All other callers
> of vmap_pages_range_noflush() call vunmap_range_noflush() when mapping
> fails, except pcpu_map_pages(). The reason could be that partial
> mappings may be left behind from a failed mapping attempt.
>
> Call __pcpu_unmap_pages() for the failed CPU as well in
> pcpu_map_pages().
>
> This was found by code inspection, no failures or bugs were observed.
>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>

Any thoughts about this change? Should I resend next week after the
merge window?

> ---
>
> Perhaps the reason __pcpu_unmap_pages() is not currently being called
> for the failed CPU is that the size and alignment requirements make sure
> we never leave any partial mappings behind? I have no idea. Nonetheless,
> I think we want this change as that could be fragile, and is
> inconsistent with other callers.
>
> ---
>  mm/percpu-vm.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
> index 2054c9213c433..cd69caf6aa8d8 100644
> --- a/mm/percpu-vm.c
> +++ b/mm/percpu-vm.c
> @@ -231,10 +231,10 @@ static int pcpu_map_pages(struct pcpu_chunk *chunk,
>  		return 0;
>  err:
>  	for_each_possible_cpu(tcpu) {
> -		if (tcpu == cpu)
> -			break;
>  		__pcpu_unmap_pages(pcpu_chunk_addr(chunk, tcpu, page_start),
>  				   page_end - page_start);
> +		if (tcpu == cpu)
> +			break;
>  	}
>  	pcpu_post_unmap_tlb_flush(chunk, page_start, page_end);
>  	return err;
> --
> 2.44.0.278.ge034bb2e1d-goog
>
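For quick reference, here are the two error-path loops condensed from the
diff above; the only behavioral change is that the unmap now also runs on
the CPU whose mapping failed:

	/* Before: the failing CPU (tcpu == cpu) is skipped, so any
	 * partial mapping __pcpu_map_pages() left on it is never
	 * torn down. */
	for_each_possible_cpu(tcpu) {
		if (tcpu == cpu)
			break;
		__pcpu_unmap_pages(pcpu_chunk_addr(chunk, tcpu, page_start),
				   page_end - page_start);
	}

	/* After: unmap on every CPU up to and including the failing
	 * one, then stop. */
	for_each_possible_cpu(tcpu) {
		__pcpu_unmap_pages(pcpu_chunk_addr(chunk, tcpu, page_start),
				   page_end - page_start);
		if (tcpu == cpu)
			break;
	}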
Hi Yosry,

On Tue, Mar 19, 2024 at 01:08:26PM -0700, Yosry Ahmed wrote:
> On Mon, Mar 11, 2024 at 12:43 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > In pcpu_map_pages(), if __pcpu_map_pages() fails on a CPU, we call
> > __pcpu_unmap_pages() to clean up mappings on all CPUs where mappings
> > were created, but not on the CPU where __pcpu_map_pages() fails.
> >
[...]
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
>
> Any thoughts about this change? Should I resend next week after the
> merge window?
>

Sorry for the delay.

I'm looking at the code from mm/kmsan/hooks.c kmsan_ioremap_page_range().
It seems like __vunmap_range_noflush() is called on error for
successfully mapped pages, similar to how it's being done in percpu-vm.c.

I haven't read the expectations of vmap_pages_range_noflush() in depth,
but on first glance it doesn't seem like percpu is operating out of the
ordinary?

Thanks,
Dennis

[...]
On Tue, Mar 19, 2024 at 1:32 PM Dennis Zhou <dennis@kernel.org> wrote:
>
> Hi Yosry,
>
[...]
>
> Sorry for the delay.
>
> I'm looking at the code from mm/kmsan/hooks.c kmsan_ioremap_page_range().
> It seems like __vunmap_range_noflush() is called on error for
> successfully mapped pages, similar to how it's being done in percpu-vm.c.

You picked an unconventional example to compare against :)

> I haven't read the expectations of vmap_pages_range_noflush() in depth,
> but on first glance it doesn't seem like percpu is operating out of the
> ordinary?

I was looking at vm_map_ram(), vmap(), and __vmalloc_area_node(). They
all call vmap_pages_range() -> vmap_pages_range_noflush().

When vmap_pages_range() fails:
- vm_map_ram() calls
  vm_unmap_ram() -> free_unmap_vmap_area() -> vunmap_range_noflush().
- vmap() calls
  vunmap() -> remove_vm_area() -> free_unmap_vmap_area() ->
  vunmap_range_noflush().
- __vmalloc_area_node() calls
  vfree() -> remove_vm_area() -> free_unmap_vmap_area() ->
  vunmap_range_noflush().

I think this cleanup is needed to remove any leftover partial mappings.
I am not sure about the kmsan example, though.

Adding the vmalloc reviewers here as well.

> Thanks,
> Dennis
>
[...]
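The common shape of those three error paths, condensed into a minimal
sketch (map_or_cleanup() is a hypothetical helper written for
illustration; only the two _noflush() functions are the real kernel APIs
discussed in this thread):

	/*
	 * If the mapping fails partway through, unmap the entire
	 * attempted range, since vmap_pages_range_noflush() may have
	 * installed some PTEs before failing.
	 */
	static int map_or_cleanup(unsigned long addr, unsigned long size,
				  struct page **pages)
	{
		int err = vmap_pages_range_noflush(addr, addr + size,
						   PAGE_KERNEL, pages,
						   PAGE_SHIFT);

		if (err)
			vunmap_range_noflush(addr, addr + size);
		return err;
	}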
On Tue, Mar 19, 2024 at 01:49:17PM -0700, Yosry Ahmed wrote:
> On Tue, Mar 19, 2024 at 1:32 PM Dennis Zhou <dennis@kernel.org> wrote:
[...]
> I was looking at vm_map_ram(), vmap(), and __vmalloc_area_node(). They
> all call vmap_pages_range() -> vmap_pages_range_noflush().
>
> When vmap_pages_range() fails:
> - vm_map_ram() calls
>   vm_unmap_ram() -> free_unmap_vmap_area() -> vunmap_range_noflush().
> - vmap() calls
>   vunmap() -> remove_vm_area() -> free_unmap_vmap_area() ->
>   vunmap_range_noflush().
> - __vmalloc_area_node() calls
>   vfree() -> remove_vm_area() -> free_unmap_vmap_area() ->
>   vunmap_range_noflush().
>

Okay, so I had a moment to read it more closely. If we're mapping more
than one page and one of the latter pages fails, then we could leave
some mappings installed.

@Andrew, I think this makes sense. Would you please be able to pick this
up? I'm not running a tree this window. I will try to send out the
percpu hotplug changes I've been forward-porting for a while now and try
to get those into a for-6.10 branch.

Acked-by: Dennis Zhou <dennis@kernel.org>

> I think this cleanup is needed to remove any leftover partial mappings.
> I am not sure about the kmsan example, though.
>

Yeah, the kmsan example seems like it could be wrong for the same
reason, but I haven't inspected it more closely.

Thanks,
Dennis

> Adding the vmalloc reviewers here as well.
>
[...]
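To make the partial-mapping scenario concrete, a toy model (map_one() is
a hypothetical stand-in for the real page-table mapping step, not a
kernel function):

	/*
	 * Mapping nr pages can fail at page i > 0, leaving pages
	 * 0..i-1 mapped.  A caller that skips cleanup for the failed
	 * range leaks those mappings.
	 */
	static int map_many(unsigned long addr, struct page **pages, int nr)
	{
		int i, err;

		for (i = 0; i < nr; i++) {
			err = map_one(addr + i * PAGE_SIZE, pages[i]);
			if (err)
				return err;	/* pages 0..i-1 remain mapped */
		}
		return 0;
	}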
diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
index 2054c9213c433..cd69caf6aa8d8 100644
--- a/mm/percpu-vm.c
+++ b/mm/percpu-vm.c
@@ -231,10 +231,10 @@ static int pcpu_map_pages(struct pcpu_chunk *chunk,
 		return 0;
 err:
 	for_each_possible_cpu(tcpu) {
-		if (tcpu == cpu)
-			break;
 		__pcpu_unmap_pages(pcpu_chunk_addr(chunk, tcpu, page_start),
 				   page_end - page_start);
+		if (tcpu == cpu)
+			break;
 	}
 	pcpu_post_unmap_tlb_flush(chunk, page_start, page_end);
 	return err;
In pcpu_map_pages(), if __pcpu_map_pages() fails on a CPU, we call
__pcpu_unmap_pages() to clean up mappings on all CPUs where mappings
were created, but not on the CPU where __pcpu_map_pages() fails.

__pcpu_map_pages() and __pcpu_unmap_pages() are wrappers around
vmap_pages_range_noflush() and vunmap_range_noflush(). All other callers
of vmap_pages_range_noflush() call vunmap_range_noflush() when mapping
fails, except pcpu_map_pages(). The reason could be that partial
mappings may be left behind from a failed mapping attempt.

Call __pcpu_unmap_pages() for the failed CPU as well in
pcpu_map_pages().

This was found by code inspection; no failures or bugs were observed.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>

---

Perhaps the reason __pcpu_unmap_pages() is not currently being called
for the failed CPU is that the size and alignment requirements make sure
we never leave any partial mappings behind? I have no idea. Nonetheless,
I think we want this change, as the current behavior could be fragile
and is inconsistent with other callers.

---
 mm/percpu-vm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)