Message ID | 20240906024201.1214712-2-wangkefeng.wang@huawei.com |
---|---|
State | New |
Series | mm: hwpoison: two more poison recovery |
On 9/5/2024 7:42 PM, Kefeng Wang wrote:
> [...]
> Fix it by using copy_mc_user_highpage() to handle this case and return
> VM_FAULT_HWPOISON for cow fault.
> [...]

Thanks for catching it!

Reviewed-by: Jane Chu <jane.chu@oracle.com>

-jane
On 2024/9/6 10:42, Kefeng Wang wrote:
> [...]
> -	copy_user_highpage(vmf->cow_page, vmf->page, vmf->address, vma);
> +	if (copy_mc_user_highpage(vmf->cow_page, vmf->page, vmf->address, vma)) {
> +		ret = VM_FAULT_HWPOISON;
> +		goto uncharge_out;
> +	}

When copy_mc_user_highpage() fails, vmf->page is still locked and we still hold the extra refcount on it. So shouldn't we call unlock_page(vmf->page) and put_page(vmf->page) before the goto uncharge_out?

Thanks.
On 2024/9/10 9:58, Miaohe Lin wrote:
> On 2024/9/6 10:42, Kefeng Wang wrote:
>> [...]
>> -	copy_user_highpage(vmf->cow_page, vmf->page, vmf->address, vma);
>> +	if (copy_mc_user_highpage(vmf->cow_page, vmf->page, vmf->address, vma)) {
>> +		ret = VM_FAULT_HWPOISON;
>> +		goto uncharge_out;
>> +	}
>
> When copy_mc_user_highpage fails, we should have vmf->page locked and hold the extra refcnt
> of vmf->page. So we should call unlock_page(vmf->page) and put_page(vmf->page) before goto
> uncharge_out?

Right, for upstream we need more handling of vmf->page; will fix, thanks.
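The review point is about unwinding order: at the copy step the source page (vmf->page) is locked and carries an extra reference, so jumping straight to uncharge_out would leak both. The userspace model below sketches the invariant that the source page must be unlocked and put on every path, success or failure. It is not the follow-up patch; `struct fake_src_page`, `struct fake_cow_page`, `fake_unlock_page()`, `fake_put_page()` and the `FAULT_*` codes are hypothetical stand-ins, and poison is simulated with a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the source page (vmf->page) during a COW fault. */
struct fake_src_page {
	bool locked;
	int refcount;
	bool poisoned; /* simulated hardware poison */
};

/* Hypothetical model of the newly allocated COW page. */
struct fake_cow_page {
	bool installed; /* mapped into the page table */
	bool released;  /* dropped on the error path */
};

/* Hypothetical fault codes loosely modelled on vm_fault_t. */
enum fake_fault { FAULT_OK = 0, FAULT_HWPOISON = 1 };

static void fake_unlock_page(struct fake_src_page *p)
{
	assert(p->locked);
	p->locked = false;
}

static void fake_put_page(struct fake_src_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

static enum fake_fault cow_fault(struct fake_src_page *src,
				 struct fake_cow_page *cow)
{
	enum fake_fault ret = FAULT_OK;

	/* Models __do_fault(): the source page comes back locked and pinned. */
	src->locked = true;
	src->refcount++;

	/* Models copy_mc_user_highpage() failing on a poisoned source. */
	if (src->poisoned)
		ret = FAULT_HWPOISON;
	else
		cow->installed = true; /* models a good copy plus finish_fault() */

	/* Drop the source page on every path, including the poison path. */
	fake_unlock_page(src);
	fake_put_page(src);

	if (ret != FAULT_OK)
		goto uncharge_out;
	return ret;

uncharge_out:
	cow->released = true; /* models releasing the unused COW page */
	return ret;
}
```

After either call, the source page should be unlocked and back at its original refcount; only the COW page's fate differs between the two outcomes.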
```diff
diff --git a/mm/memory.c b/mm/memory.c
index 42674c0748cb..d310c073a1b3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5089,7 +5089,10 @@ static vm_fault_t do_cow_fault(struct vm_fault *vmf)
 	if (ret & VM_FAULT_DONE_COW)
 		return ret;
 
-	copy_user_highpage(vmf->cow_page, vmf->page, vmf->address, vma);
+	if (copy_mc_user_highpage(vmf->cow_page, vmf->page, vmf->address, vma)) {
+		ret = VM_FAULT_HWPOISON;
+		goto uncharge_out;
+	}
 	__folio_mark_uptodate(folio);
 
 	ret |= finish_fault(vmf);
```
Like commit a873dfe1032a ("mm, hwpoison: try to recover from copy-on write faults"), there is another path which could crash because it does not have recovery code where poison is consumed by the kernel: do_cow_fault(). The crash call trace below was captured on an old kernel, but the same crash can also happen on the latest mainline code:

```
CPU: 7 PID: 3248 Comm: mpi Kdump: loaded Tainted: G OE 5.10.0 #1
pc : copy_page+0xc/0xbc
lr : copy_user_highpage+0x50/0x9c
Call trace:
 copy_page+0xc/0xbc
 do_cow_fault+0x118/0x2bc
 do_fault+0x40/0x1a4
 handle_pte_fault+0x154/0x230
 __handle_mm_fault+0x1a8/0x38c
 handle_mm_fault+0xf0/0x250
 do_page_fault+0x184/0x454
 do_translation_fault+0xac/0xd4
 do_mem_abort+0x44/0xbc
```

Fix it by using copy_mc_user_highpage() to handle this case and return VM_FAULT_HWPOISON for the COW fault.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>

---
 mm/memory.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
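The fix relies on the difference between a plain page copy, which has no way to report that it consumed poisoned memory, and copy_mc_user_highpage(), which returns nonzero when the copy fails so the fault handler can return VM_FAULT_HWPOISON instead of crashing. The userspace sketch below models only that contract. `struct fake_page`, `mc_copy_page()`, `cow_fault()`, `FAKE_PAGE_SIZE` and the `FAULT_*` codes are hypothetical, and poison is simulated with a flag rather than a real machine check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FAKE_PAGE_SIZE 64 /* hypothetical; real pages are PAGE_SIZE bytes */

/* Hypothetical stand-in for struct page: contents plus a simulated poison flag. */
struct fake_page {
	bool poisoned;
	unsigned char data[FAKE_PAGE_SIZE];
};

/* Hypothetical fault codes loosely modelled on vm_fault_t. */
enum fake_fault { FAULT_OK = 0, FAULT_HWPOISON = 1 };

/*
 * Models the machine-check-safe copy contract: return 0 on success, or a
 * nonzero count of bytes not copied when the source is poisoned. A real
 * implementation stops at the poisoned location; this model refuses to
 * read a poisoned source at all.
 */
static size_t mc_copy_page(struct fake_page *to, const struct fake_page *from)
{
	assert(to != NULL && from != NULL);
	if (from->poisoned)
		return FAKE_PAGE_SIZE;
	memcpy(to->data, from->data, FAKE_PAGE_SIZE);
	return 0;
}

/* Models the patched do_cow_fault(): turn a failed copy into FAULT_HWPOISON. */
static enum fake_fault cow_fault(struct fake_page *cow,
				 const struct fake_page *src)
{
	if (mc_copy_page(cow, src))
		return FAULT_HWPOISON;
	return FAULT_OK;
}
```

Calling cow_fault() with a clean source copies the data and returns FAULT_OK; with a poisoned source it leaves the destination untouched and returns FAULT_HWPOISON, which is the behaviour the patch wants from do_cow_fault() in place of a kernel crash.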