Message ID | 20250214035932.3414337-1-aik@amd.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Herbert Xu |
Headers | show |
Series | KVM: SVM: snp_alloc_firmware_pages: memory leak | expand |
On 2/13/25 21:59, Alexey Kardashevskiy wrote: > Failure to rmpupdate leads to page(s) leak, fix that. > > Signed-off-by: Alexey Kardashevskiy <aik@amd.com> > --- > drivers/crypto/ccp/sev-dev.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c > index 2e87ca0e292a..0b5f8ab657c5 100644 > --- a/drivers/crypto/ccp/sev-dev.c > +++ b/drivers/crypto/ccp/sev-dev.c > @@ -443,8 +443,10 @@ static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order) > return page; > > paddr = __pa((unsigned long)page_address(page)); > - if (rmp_mark_pages_firmware(paddr, npages, false)) > + if (rmp_mark_pages_firmware(paddr, npages, false)) { > + __free_pages(page, order); I'm not sure we can do this. On error, rmp_mark_pages_firmware() attempts to cleanup and restore any pages that were marked firmware. But snp_reclaim_pages() will leak pages that it can't restore and we don't pass back any info to the caller of rmp_mark_pages_firmware() to let it know what pages are truly available to free. Thanks, Tom > return NULL; > + } > > return page; > }
On 15/2/25 01:53, Tom Lendacky wrote: > On 2/13/25 21:59, Alexey Kardashevskiy wrote: >> Failure to rmpupdate leads to page(s) leak, fix that. >> >> Signed-off-by: Alexey Kardashevskiy <aik@amd.com> >> --- >> drivers/crypto/ccp/sev-dev.c | 4 +++- >> 1 file changed, 3 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c >> index 2e87ca0e292a..0b5f8ab657c5 100644 >> --- a/drivers/crypto/ccp/sev-dev.c >> +++ b/drivers/crypto/ccp/sev-dev.c >> @@ -443,8 +443,10 @@ static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order) >> return page; >> >> paddr = __pa((unsigned long)page_address(page)); >> - if (rmp_mark_pages_firmware(paddr, npages, false)) >> + if (rmp_mark_pages_firmware(paddr, npages, false)) { >> + __free_pages(page, order); > > I'm not sure we can do this. On error, rmp_mark_pages_firmware() attempts > to cleanup and restore any pages that were marked firmware. But > snp_reclaim_pages() will leak pages that it can't restore and we don't > pass back any info to the caller of rmp_mark_pages_firmware() to let it > know what pages are truly available to free. oh right. But there is snp_leaked_pages_list which __snp_alloc_firmware_pages() could look at. Or just replace __free_pages() above with: snp_leak_pages(__page_to_pfn(page), 1 << order) so memory leak leaves traces in dmesg, at least? > > Thanks, > Tom > >> return NULL; >> + } >> >> return page; >> }
On 2/17/25 19:24, Alexey Kardashevskiy wrote: > On 15/2/25 01:53, Tom Lendacky wrote: >> On 2/13/25 21:59, Alexey Kardashevskiy wrote: >>> Failure to rmpupdate leads to page(s) leak, fix that. >>> >>> Signed-off-by: Alexey Kardashevskiy <aik@amd.com> >>> --- >>> drivers/crypto/ccp/sev-dev.c | 4 +++- >>> 1 file changed, 3 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c >>> index 2e87ca0e292a..0b5f8ab657c5 100644 >>> --- a/drivers/crypto/ccp/sev-dev.c >>> +++ b/drivers/crypto/ccp/sev-dev.c >>> @@ -443,8 +443,10 @@ static struct page >>> *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order) >>> return page; >>> paddr = __pa((unsigned long)page_address(page)); >>> - if (rmp_mark_pages_firmware(paddr, npages, false)) >>> + if (rmp_mark_pages_firmware(paddr, npages, false)) { >>> + __free_pages(page, order); >> >> I'm not sure we can do this. On error, rmp_mark_pages_firmware() attempts >> to cleanup and restore any pages that were marked firmware. But >> snp_reclaim_pages() will leak pages that it can't restore and we don't >> pass back any info to the caller of rmp_mark_pages_firmware() to let it >> know what pages are truly available to free. > > oh right. But there is snp_leaked_pages_list which > __snp_alloc_firmware_pages() could look at. > > Or just replace __free_pages() above with: > > snp_leak_pages(__page_to_pfn(page), 1 << order) > > so memory leak leaves traces in dmesg, at least? I haven't looked too closely at the error path, but it might make sense to have rmp_mark_pages_firmware() leak all the pages vs trying to do any cleanup. Also, have snp_reclaim_pages() leak all the pages on any single reclaim page error, because it looks like, in general, that the pages are never free'd if any single page fails to be reclaimed, but only the page that failed and then the remaining pages gets leaked via snp_leak_pages(). Except in sev_ioctl_do_snp_platform_status(), where __free_pages() is called if rmp_mark_pages_firmware() fails, which doesn't seem right. And I'm not sure what to do if rmp_mark_pages_firmware() fails in snp_prep_cmd_buf(), since __sev_do_cmd_locked() is using pre-allocated buffers. But it looks like if snp_prep_cmd_buf() fails or snp_reclaim_cmd_buf() fails, then the buffer usage indicator is never released and commands will just fail at some point... But those buffers are allocated using devm_get_free_pages(), so nothing good would happen if the ccp module is unloaded and those pages are freed in the wrong state. Thanks, Tom > > >> >> Thanks, >> Tom >> >>> return NULL; >>> + } >>> return page; >>> } >
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c index 2e87ca0e292a..0b5f8ab657c5 100644 --- a/drivers/crypto/ccp/sev-dev.c +++ b/drivers/crypto/ccp/sev-dev.c @@ -443,8 +443,10 @@ static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order) return page; paddr = __pa((unsigned long)page_address(page)); - if (rmp_mark_pages_firmware(paddr, npages, false)) + if (rmp_mark_pages_firmware(paddr, npages, false)) { + __free_pages(page, order); return NULL; + } return page; }
Failure to rmpupdate leads to page(s) leak, fix that. Signed-off-by: Alexey Kardashevskiy <aik@amd.com> --- drivers/crypto/ccp/sev-dev.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)