Message ID | 20240529140739.1387692-3-edgar.iglesias@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | xen: Support grant mappings | expand |
Hi all, Following this commit, a test which install Debian in a guest with OVMF as firmware started to fail. QEMU exit with an error when GRUB is running on the freshly installed Debian (I don't know if GRUB is starting Linux or not). The error is: Bad ram offset ffffffffffffffff Some logs: http://logs.test-lab.xenproject.org/osstest/logs/186611/test-amd64-amd64-xl-qemuu-ovmf-amd64/info.html Any idea? Something is trying to do something with the address "-1" when it shouldn't? Cheers, Anthony On Wed, May 29, 2024 at 04:07:33PM +0200, Edgar E. Iglesias wrote: > From: "Edgar E. Iglesias" <edgar.iglesias@amd.com> > > When invalidating memory ranges, if we happen to hit the first > entry in a bucket we were never unmapping it. This was harmless > for foreign mappings but now that we're looking to reuse the > mapcache for transient grant mappings, we must unmap entries > when invalidated. > > Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com> > Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> > --- > hw/xen/xen-mapcache.c | 11 ++++++++--- > 1 file changed, 8 insertions(+), 3 deletions(-) > > diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c > index bc860f4373..ec95445696 100644 > --- a/hw/xen/xen-mapcache.c > +++ b/hw/xen/xen-mapcache.c > @@ -491,18 +491,23 @@ static void xen_invalidate_map_cache_entry_unlocked(MapCache *mc, > return; > } > entry->lock--; > - if (entry->lock > 0 || pentry == NULL) { > + if (entry->lock > 0) { > return; > } > > - pentry->next = entry->next; > ram_block_notify_remove(entry->vaddr_base, entry->size, entry->size); > if (munmap(entry->vaddr_base, entry->size) != 0) { > perror("unmap fails"); > exit(-1); > } > + > g_free(entry->valid_mapping); > - g_free(entry); > + if (pentry) { > + pentry->next = entry->next; > + g_free(entry); > + } else { > + memset(entry, 0, sizeof *entry); > + } > } > > typedef struct XenMapCacheData { > -- > 2.40.1 > >
On Mon, Jul 1, 2024 at 2:55 PM Anthony PERARD <anthony.perard@vates.tech> wrote: > Hi all, > > Following this commit, a test which install Debian in a guest with OVMF > as firmware started to fail. QEMU exit with an error when GRUB is > running on the freshly installed Debian (I don't know if GRUB is > starting Linux or not). > The error is: > Bad ram offset ffffffffffffffff > > Some logs: > > http://logs.test-lab.xenproject.org/osstest/logs/186611/test-amd64-amd64-xl-qemuu-ovmf-amd64/info.html > > Any idea? Something is trying to do something with the address "-1" when > it shouldn't? > > Hi Anothny, Yes, it looks like something is calling qemu_get_ram_block() on something that isn't mapped. One possible path is in qemu_ram_block_from_host() but there may be others. The following patch may help. Any chance you could try to get a backtrace from QEMU when it failed? diff --git a/system/physmem.c b/system/physmem.c index 33d09f7571..2669c4dbbb 100644 --- a/system/physmem.c +++ b/system/physmem.c @@ -2277,6 +2277,9 @@ RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset, ram_addr_t ram_addr; RCU_READ_LOCK_GUARD(); ram_addr = xen_ram_addr_from_mapcache(ptr); + if (ram_addr == RAM_ADDR_INVALID) { + return NULL; + } block = qemu_get_ram_block(ram_addr); if (block) { *offset = ram_addr - block->offset; > Cheers, > > Anthony > > On Wed, May 29, 2024 at 04:07:33PM +0200, Edgar E. Iglesias wrote: > > From: "Edgar E. Iglesias" <edgar.iglesias@amd.com> > > > > When invalidating memory ranges, if we happen to hit the first > > entry in a bucket we were never unmapping it. This was harmless > > for foreign mappings but now that we're looking to reuse the > > mapcache for transient grant mappings, we must unmap entries > > when invalidated. > > > > Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com> > > Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> > > --- > > hw/xen/xen-mapcache.c | 11 ++++++++--- > > 1 file changed, 8 insertions(+), 3 deletions(-) > > > > diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c > > index bc860f4373..ec95445696 100644 > > --- a/hw/xen/xen-mapcache.c > > +++ b/hw/xen/xen-mapcache.c > > @@ -491,18 +491,23 @@ static void > xen_invalidate_map_cache_entry_unlocked(MapCache *mc, > > return; > > } > > entry->lock--; > > - if (entry->lock > 0 || pentry == NULL) { > > + if (entry->lock > 0) { > > return; > > } > > > > - pentry->next = entry->next; > > ram_block_notify_remove(entry->vaddr_base, entry->size, > entry->size); > > if (munmap(entry->vaddr_base, entry->size) != 0) { > > perror("unmap fails"); > > exit(-1); > > } > > + > > g_free(entry->valid_mapping); > > - g_free(entry); > > + if (pentry) { > > + pentry->next = entry->next; > > + g_free(entry); > > + } else { > > + memset(entry, 0, sizeof *entry); > > + } > > } > > > > typedef struct XenMapCacheData { > > -- > > 2.40.1 > > > > > -- > > Anthony Perard | Vates XCP-ng Developer > > XCP-ng & Xen Orchestra - Vates solutions > > web: https://vates.tech >
On Mon, Jul 1, 2024 at 3:58 PM Edgar E. Iglesias <edgar.iglesias@gmail.com> wrote: > On Mon, Jul 1, 2024 at 2:55 PM Anthony PERARD <anthony.perard@vates.tech> > wrote: > >> Hi all, >> >> Following this commit, a test which install Debian in a guest with OVMF >> as firmware started to fail. QEMU exit with an error when GRUB is >> running on the freshly installed Debian (I don't know if GRUB is >> starting Linux or not). >> The error is: >> Bad ram offset ffffffffffffffff >> >> Some logs: >> >> http://logs.test-lab.xenproject.org/osstest/logs/186611/test-amd64-amd64-xl-qemuu-ovmf-amd64/info.html >> >> Any idea? Something is trying to do something with the address "-1" when >> it shouldn't? >> >> > Hi Anothny, > > Yes, it looks like something is calling qemu_get_ram_block() on something > that isn't mapped. > One possible path is in qemu_ram_block_from_host() but there may be others. > > The following patch may help. > Any chance you could try to get a backtrace from QEMU when it failed? > > diff --git a/system/physmem.c b/system/physmem.c > index 33d09f7571..2669c4dbbb 100644 > --- a/system/physmem.c > +++ b/system/physmem.c > @@ -2277,6 +2277,9 @@ RAMBlock *qemu_ram_block_from_host(void *ptr, bool > round_offset, > ram_addr_t ram_addr; > RCU_READ_LOCK_GUARD(); > ram_addr = xen_ram_addr_from_mapcache(ptr); > + if (ram_addr == RAM_ADDR_INVALID) { > + return NULL; > + } > block = qemu_get_ram_block(ram_addr); > if (block) { > *offset = ram_addr - block->offset; > > > One more thing, regarding this specific patch. I don't think we should clear the entire entry, the next field should be kept, otherwise we'll disconnect following mappings that will never be found again. IIUC, this could very well be causing the problem you see. Does the following make sense? diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c index 5f23b0adbe..e9df53c19d 100644 --- a/hw/xen/xen-mapcache.c +++ b/hw/xen/xen-mapcache.c @@ -597,7 +597,14 @@ static void xen_invalidate_map_cache_entry_unlocked(MapCache *mc, pentry->next = entry->next; g_free(entry); } else { - memset(entry, 0, sizeof *entry); + /* Invalidate mapping. */ + entry->paddr_index = 0; + entry->vaddr_base = NULL; + entry->size = 0; + g_free(entry->valid_mapping); + entry->valid_mapping = NULL; + entry->flags = 0; + /* Keep entry->next pointing to the rest of the list. */ } } > > >> Cheers, >> >> Anthony >> >> On Wed, May 29, 2024 at 04:07:33PM +0200, Edgar E. Iglesias wrote: >> > From: "Edgar E. Iglesias" <edgar.iglesias@amd.com> >> > >> > When invalidating memory ranges, if we happen to hit the first >> > entry in a bucket we were never unmapping it. This was harmless >> > for foreign mappings but now that we're looking to reuse the >> > mapcache for transient grant mappings, we must unmap entries >> > when invalidated. >> > >> > Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com> >> > Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> >> > --- >> > hw/xen/xen-mapcache.c | 11 ++++++++--- >> > 1 file changed, 8 insertions(+), 3 deletions(-) >> > >> > diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c >> > index bc860f4373..ec95445696 100644 >> > --- a/hw/xen/xen-mapcache.c >> > +++ b/hw/xen/xen-mapcache.c >> > @@ -491,18 +491,23 @@ static void >> xen_invalidate_map_cache_entry_unlocked(MapCache *mc, >> > return; >> > } >> > entry->lock--; >> > - if (entry->lock > 0 || pentry == NULL) { >> > + if (entry->lock > 0) { >> > return; >> > } >> > >> > - pentry->next = entry->next; >> > ram_block_notify_remove(entry->vaddr_base, entry->size, >> entry->size); >> > if (munmap(entry->vaddr_base, entry->size) != 0) { >> > perror("unmap fails"); >> > exit(-1); >> > } >> > + >> > g_free(entry->valid_mapping); >> > - g_free(entry); >> > + if (pentry) { >> > + pentry->next = entry->next; >> > + g_free(entry); >> > + } else { >> > + memset(entry, 0, sizeof *entry); >> > + } >> > } >> > >> > typedef struct XenMapCacheData { >> > -- >> > 2.40.1 >> > >> > >> -- >> >> Anthony Perard | Vates XCP-ng Developer >> >> XCP-ng & Xen Orchestra - Vates solutions >> >> web: https://vates.tech >> >
On Mon, Jul 1, 2024 at 4:30 PM Edgar E. Iglesias <edgar.iglesias@gmail.com> wrote: > > > On Mon, Jul 1, 2024 at 3:58 PM Edgar E. Iglesias <edgar.iglesias@gmail.com> > wrote: > >> On Mon, Jul 1, 2024 at 2:55 PM Anthony PERARD <anthony.perard@vates.tech> >> wrote: >> >>> Hi all, >>> >>> Following this commit, a test which install Debian in a guest with OVMF >>> as firmware started to fail. QEMU exit with an error when GRUB is >>> running on the freshly installed Debian (I don't know if GRUB is >>> starting Linux or not). >>> The error is: >>> Bad ram offset ffffffffffffffff >>> >>> Some logs: >>> >>> http://logs.test-lab.xenproject.org/osstest/logs/186611/test-amd64-amd64-xl-qemuu-ovmf-amd64/info.html >>> >>> Any idea? Something is trying to do something with the address "-1" when >>> it shouldn't? >>> >>> >> Hi Anothny, >> >> Yes, it looks like something is calling qemu_get_ram_block() on something >> that isn't mapped. >> One possible path is in qemu_ram_block_from_host() but there may be >> others. >> >> The following patch may help. >> Any chance you could try to get a backtrace from QEMU when it failed? >> >> diff --git a/system/physmem.c b/system/physmem.c >> index 33d09f7571..2669c4dbbb 100644 >> --- a/system/physmem.c >> +++ b/system/physmem.c >> @@ -2277,6 +2277,9 @@ RAMBlock *qemu_ram_block_from_host(void *ptr, bool >> round_offset, >> ram_addr_t ram_addr; >> RCU_READ_LOCK_GUARD(); >> ram_addr = xen_ram_addr_from_mapcache(ptr); >> + if (ram_addr == RAM_ADDR_INVALID) { >> + return NULL; >> + } >> block = qemu_get_ram_block(ram_addr); >> if (block) { >> *offset = ram_addr - block->offset; >> >> >> > One more thing, regarding this specific patch. I don't think we should > clear the > entire entry, the next field should be kept, otherwise we'll disconnect > following > mappings that will never be found again. IIUC, this could very well be > causing the problem you see. > > Does the following make sense? > > diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c > index 5f23b0adbe..e9df53c19d 100644 > --- a/hw/xen/xen-mapcache.c > +++ b/hw/xen/xen-mapcache.c > @@ -597,7 +597,14 @@ static void > xen_invalidate_map_cache_entry_unlocked(MapCache *mc, > pentry->next = entry->next; > g_free(entry); > } else { > - memset(entry, 0, sizeof *entry); > + /* Invalidate mapping. */ > + entry->paddr_index = 0; > + entry->vaddr_base = NULL; > + entry->size = 0; > + g_free(entry->valid_mapping); > + entry->valid_mapping = NULL; > + entry->flags = 0; > + /* Keep entry->next pointing to the rest of the list. */ > } > } > > > And here without double-freeing entry->valid_mapping: diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c index 5f23b0adbe..667807b3b6 100644 --- a/hw/xen/xen-mapcache.c +++ b/hw/xen/xen-mapcache.c @@ -597,7 +597,13 @@ static void xen_invalidate_map_cache_entry_unlocked(MapCache *mc, pentry->next = entry->next; g_free(entry); } else { - memset(entry, 0, sizeof *entry); + /* Invalidate mapping. */ + entry->paddr_index = 0; + entry->vaddr_base = NULL; + entry->size = 0; + entry->valid_mapping = NULL; + entry->flags = 0; + /* Keep entry->next pointing to the rest of the list. */ } }
On Mon, Jul 01, 2024 at 04:34:53PM +0200, Edgar E. Iglesias wrote: > On Mon, Jul 1, 2024 at 4:30 PM Edgar E. Iglesias <edgar.iglesias@gmail.com> > wrote: > > On Mon, Jul 1, 2024 at 3:58 PM Edgar E. Iglesias <edgar.iglesias@gmail.com> > > wrote: > >> Any chance you could try to get a backtrace from QEMU when it failed? Here it is: #3 0x00007fa8762f4472 in __GI_abort () at ./stdlib/abort.c:79 save_stage = 1 act = {__sigaction_handler = {sa_handler = 0x20, sa_sigaction = 0x20}, sa_mask = {__val = {94603440166168, 18446744073709551615, 94603406369640, 0, 0, 94603406346720, 94603440166168, 140361486774256, 0, 140361486773376, 94603401285536, 140361496232688, 94603440166096, 140361486773456, 94603401289576, 140360849280256}}, sa_flags = -1804462896, sa_restorer = 0x748f2d40} #4 0x0000560a92230f0d in qemu_get_ram_block (addr=18446744073709551615) at ../system/physmem.c:801 block = 0x0 #5 0x0000560a922350ab in qemu_ram_block_from_host (ptr=0x7fa84e8fcd00, round_offset=false, offset=0x7fa8748f2de8) at ../system/physmem.c:2280 ram_addr = 18446744073709551615 _rcu_read_auto = 0x1 block = 0x0 host = 0x7fa84e8fcd00 "" _rcu_read_auto = 0x7fa8751f8288 #6 0x0000560a92229669 in memory_region_from_host (ptr=0x7fa84e8fcd00, offset=0x7fa8748f2de8) at ../system/memory.c:2440 block = 0x0 #7 0x0000560a92237418 in address_space_unmap (as=0x560a94b05a20, buffer=0x7fa84e8fcd00, len=32768, is_write=true, access_len=32768) at ../system/physmem.c:3246 mr = 0x0 addr1 = 0 __PRETTY_FUNCTION__ = "address_space_unmap" #8 0x0000560a91fd6cd3 in dma_memory_unmap (as=0x560a94b05a20, buffer=0x7fa84e8fcd00, len=32768, dir=DMA_DIRECTION_FROM_DEVICE, access_len=32768) at /root/build/qemu/include/sysemu/dma.h:236 #9 0x0000560a91fd763d in dma_blk_unmap (dbs=0x560a94d87400) at ../system/dma-helpers.c:93 i = 1 #10 0x0000560a91fd76e6 in dma_complete (dbs=0x560a94d87400, ret=0) at ../system/dma-helpers.c:105 __PRETTY_FUNCTION__ = "dma_complete" #11 0x0000560a91fd781c in dma_blk_cb (opaque=0x560a94d87400, ret=0) at ../system/dma-helpers.c:129 dbs = 0x560a94d87400 ctx = 0x560a9448da90 cur_addr = 0 cur_len = 0 mem = 0x0 __PRETTY_FUNCTION__ = "dma_blk_cb" #12 0x0000560a9232e974 in blk_aio_complete (acb=0x560a9448d5f0) at ../block/block-backend.c:1555 #13 0x0000560a9232ebd1 in blk_aio_read_entry (opaque=0x560a9448d5f0) at ../block/block-backend.c:1610 acb = 0x560a9448d5f0 rwco = 0x560a9448d618 qiov = 0x560a94d87460 __PRETTY_FUNCTION__ = "blk_aio_read_entry" > > One more thing, regarding this specific patch. I don't think we should > > clear the > > entire entry, the next field should be kept, otherwise we'll disconnect > > following > > mappings that will never be found again. IIUC, this could very well be > > causing the problem you see. > > > > Does the following make sense? > > > And here without double-freeing entry->valid_mapping: > > diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c > index 5f23b0adbe..667807b3b6 100644 > --- a/hw/xen/xen-mapcache.c > +++ b/hw/xen/xen-mapcache.c > @@ -597,7 +597,13 @@ static void > xen_invalidate_map_cache_entry_unlocked(MapCache *mc, > pentry->next = entry->next; > g_free(entry); > } else { > - memset(entry, 0, sizeof *entry); > + /* Invalidate mapping. */ > + entry->paddr_index = 0; > + entry->vaddr_base = NULL; > + entry->size = 0; > + entry->valid_mapping = NULL; > + entry->flags = 0; > + /* Keep entry->next pointing to the rest of the list. */ > } > } I've tried this patch, and that fix the issue I've seen. I'll run more tests on it, just in case, but there's no reason that would break something else. Cheers,
On Mon, Jul 1, 2024 at 6:21 PM Anthony PERARD <anthony.perard@vates.tech> wrote: > On Mon, Jul 01, 2024 at 04:34:53PM +0200, Edgar E. Iglesias wrote: > > On Mon, Jul 1, 2024 at 4:30 PM Edgar E. Iglesias < > edgar.iglesias@gmail.com> > > wrote: > > > On Mon, Jul 1, 2024 at 3:58 PM Edgar E. Iglesias < > edgar.iglesias@gmail.com> > > > wrote: > > >> Any chance you could try to get a backtrace from QEMU when it failed? > > Here it is: > > > #3 0x00007fa8762f4472 in __GI_abort () at ./stdlib/abort.c:79 > save_stage = 1 > act = {__sigaction_handler = {sa_handler = 0x20, sa_sigaction = > 0x20}, sa_mask = {__val = {94603440166168, 18446744073709551615, > 94603406369640, 0, 0, 94603406346720, 94603440166168, 140361486774256, 0, > 140361486773376, 94603401285536, 140361496232688, 94603440166096, > 140361486773456, 94603401289576, 140360849280256}}, sa_flags = -1804462896, > sa_restorer = 0x748f2d40} > #4 0x0000560a92230f0d in qemu_get_ram_block (addr=18446744073709551615) > at ../system/physmem.c:801 > block = 0x0 > #5 0x0000560a922350ab in qemu_ram_block_from_host (ptr=0x7fa84e8fcd00, > round_offset=false, offset=0x7fa8748f2de8) at ../system/physmem.c:2280 > ram_addr = 18446744073709551615 > _rcu_read_auto = 0x1 > block = 0x0 > host = 0x7fa84e8fcd00 "" > _rcu_read_auto = 0x7fa8751f8288 > #6 0x0000560a92229669 in memory_region_from_host (ptr=0x7fa84e8fcd00, > offset=0x7fa8748f2de8) at ../system/memory.c:2440 > block = 0x0 > #7 0x0000560a92237418 in address_space_unmap (as=0x560a94b05a20, > buffer=0x7fa84e8fcd00, len=32768, is_write=true, access_len=32768) at > ../system/physmem.c:3246 > mr = 0x0 > addr1 = 0 > __PRETTY_FUNCTION__ = "address_space_unmap" > #8 0x0000560a91fd6cd3 in dma_memory_unmap (as=0x560a94b05a20, > buffer=0x7fa84e8fcd00, len=32768, dir=DMA_DIRECTION_FROM_DEVICE, > access_len=32768) at /root/build/qemu/include/sysemu/dma.h:236 > #9 0x0000560a91fd763d in dma_blk_unmap (dbs=0x560a94d87400) at > ../system/dma-helpers.c:93 > i = 1 > #10 0x0000560a91fd76e6 in dma_complete (dbs=0x560a94d87400, ret=0) at > ../system/dma-helpers.c:105 > __PRETTY_FUNCTION__ = "dma_complete" > #11 0x0000560a91fd781c in dma_blk_cb (opaque=0x560a94d87400, ret=0) at > ../system/dma-helpers.c:129 > dbs = 0x560a94d87400 > ctx = 0x560a9448da90 > cur_addr = 0 > cur_len = 0 > mem = 0x0 > __PRETTY_FUNCTION__ = "dma_blk_cb" > #12 0x0000560a9232e974 in blk_aio_complete (acb=0x560a9448d5f0) at > ../block/block-backend.c:1555 > #13 0x0000560a9232ebd1 in blk_aio_read_entry (opaque=0x560a9448d5f0) at > ../block/block-backend.c:1610 > acb = 0x560a9448d5f0 > rwco = 0x560a9448d618 > qiov = 0x560a94d87460 > __PRETTY_FUNCTION__ = "blk_aio_read_entry" > > > > One more thing, regarding this specific patch. I don't think we should > > > clear the > > > entire entry, the next field should be kept, otherwise we'll disconnect > > > following > > > mappings that will never be found again. IIUC, this could very well be > > > causing the problem you see. > > > > > > Does the following make sense? > > > > > And here without double-freeing entry->valid_mapping: > > > > diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c > > index 5f23b0adbe..667807b3b6 100644 > > --- a/hw/xen/xen-mapcache.c > > +++ b/hw/xen/xen-mapcache.c > > @@ -597,7 +597,13 @@ static void > > xen_invalidate_map_cache_entry_unlocked(MapCache *mc, > > pentry->next = entry->next; > > g_free(entry); > > } else { > > - memset(entry, 0, sizeof *entry); > > + /* Invalidate mapping. */ > > + entry->paddr_index = 0; > > + entry->vaddr_base = NULL; > > + entry->size = 0; > > + entry->valid_mapping = NULL; > > + entry->flags = 0; > > + /* Keep entry->next pointing to the rest of the list. */ > > } > > } > > I've tried this patch, and that fix the issue I've seen. I'll run more > tests on it, just in case, but there's no reason that would break > something else. > > Cheers, > Thanks Anthony, I'll send out a proper patch tomorrow. Cheers, Edgar
diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c index bc860f4373..ec95445696 100644 --- a/hw/xen/xen-mapcache.c +++ b/hw/xen/xen-mapcache.c @@ -491,18 +491,23 @@ static void xen_invalidate_map_cache_entry_unlocked(MapCache *mc, return; } entry->lock--; - if (entry->lock > 0 || pentry == NULL) { + if (entry->lock > 0) { return; } - pentry->next = entry->next; ram_block_notify_remove(entry->vaddr_base, entry->size, entry->size); if (munmap(entry->vaddr_base, entry->size) != 0) { perror("unmap fails"); exit(-1); } + g_free(entry->valid_mapping); - g_free(entry); + if (pentry) { + pentry->next = entry->next; + g_free(entry); + } else { + memset(entry, 0, sizeof *entry); + } } typedef struct XenMapCacheData {