Message ID | 20090615162815.4830.38216.stgit@host.lart (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tuesday 16 June 2009 00:29:17 Alex Williamson wrote: > The PCI code already knows about option ROMs, so we just need to > mmap some space for it, load it with a copy of the contents, and > complete the plubming to the generic code. > > With this a Linux guest can get access to the ROM contents via > /sys/bus/pci/devices/<dev>/rom. This might also enable the BIOS > to execute ROMs by loading them dynamically from the device > rather than hoping they all fit into RAM. > > Signed-off-by: Alex Williamson <alex.williamson@hp.com> Hi Alex The patch looks fine. One question: if guest write to the ROM, I think the guest would be killed for QEmu would receive a SIGSEGV? I am not sure if it's too severe...
On Thu, 2009-06-18 at 13:30 +0800, Yang, Sheng wrote: > On Tuesday 16 June 2009 00:29:17 Alex Williamson wrote: > > The PCI code already knows about option ROMs, so we just need to > > mmap some space for it, load it with a copy of the contents, and > > complete the plubming to the generic code. > > > > With this a Linux guest can get access to the ROM contents via > > /sys/bus/pci/devices/<dev>/rom. This might also enable the BIOS > > to execute ROMs by loading them dynamically from the device > > rather than hoping they all fit into RAM. > > > The patch looks fine. One question: if guest write to the ROM, I think the > guest would be killed for QEmu would receive a SIGSEGV? I am not sure if it's > too severe... Hi Sheng, Good thought. I tested this with a simple program that mmaps the ROM address from /dev/mem and tries to write to it (using setpci to enable the ROM). The results are a little surprising. On the host, writing to the ROM causes an NMI and the system dies. On the KVM guest, the write is happily discarded, neither segfaulting from the mprotect nor affecting the contents of the ROM. So it seems that something above my mprotect is discarding the write, and if we did hit it, a qemu segfault isn't that far from what happens on bare metal. The one oddity I noticed is that even when the enable bit is clear, the guest can read the ROM. I don't know that this is actually illegal, vs returning zeros or ones though. It seems like maybe the generic PCI code isn't tracking the enable bit. I think that's an independent problem from this patch though. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday 19 June 2009 00:28:41 Alex Williamson wrote: > On Thu, 2009-06-18 at 13:30 +0800, Yang, Sheng wrote: > > On Tuesday 16 June 2009 00:29:17 Alex Williamson wrote: > > > The PCI code already knows about option ROMs, so we just need to > > > mmap some space for it, load it with a copy of the contents, and > > > complete the plubming to the generic code. > > > > > > With this a Linux guest can get access to the ROM contents via > > > /sys/bus/pci/devices/<dev>/rom. This might also enable the BIOS > > > to execute ROMs by loading them dynamically from the device > > > rather than hoping they all fit into RAM. > > > > The patch looks fine. One question: if guest write to the ROM, I think > > the guest would be killed for QEmu would receive a SIGSEGV? I am not sure > > if it's too severe... > > Hi Sheng, > > Good thought. I tested this with a simple program that mmaps the ROM > address from /dev/mem and tries to write to it (using setpci to enable > the ROM). The results are a little surprising. On the host, writing to > the ROM causes an NMI and the system dies. On the KVM guest, the write > is happily discarded, neither segfaulting from the mprotect nor > affecting the contents of the ROM. So it seems that something above my > mprotect is discarding the write, and if we did hit it, a qemu segfault > isn't that far from what happens on bare metal. > Oh, that's good. :) > The one oddity I noticed is that even when the enable bit is clear, the > guest can read the ROM. I don't know that this is actually illegal, vs > returning zeros or ones though. It seems like maybe the generic PCI > code isn't tracking the enable bit. I think that's an independent > problem from this patch though. Thanks, That should be fine. I've taken a look at code, seems Linux kernel set enable_bit when someone begin to read rom, and copy rom to buffer, then unmap the rom. So the rom can be read when enable bit clear.
On Fri, 2009-06-19 at 15:27 +0800, Yang, Sheng wrote: > On Friday 19 June 2009 00:28:41 Alex Williamson wrote: > > The one oddity I noticed is that even when the enable bit is clear, the > > guest can read the ROM. I don't know that this is actually illegal, vs > > returning zeros or ones though. It seems like maybe the generic PCI > > code isn't tracking the enable bit. I think that's an independent > > problem from this patch though. Thanks, > > That should be fine. I've taken a look at code, seems Linux kernel set > enable_bit when someone begin to read rom, and copy rom to buffer, then unmap > the rom. So the rom can be read when enable bit clear. For this testing, I used an mmap of the ROM address though, so the kernel caching shouldn't have been involved. It looks to me like the problem is that the map function provided via pci_register_io_region() only knows how to create mappings, not tear them down. I think maybe pci_update_mappings() should still call the map_func when new_addr is -1 to let the io space drive shutdown the mapping. As it is, once we setup the mapping, it lives until something else happens to overlap it, regardless of the state of the PCI BAR. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday 19 June 2009 21:44:40 Alex Williamson wrote: > On Fri, 2009-06-19 at 15:27 +0800, Yang, Sheng wrote: > > On Friday 19 June 2009 00:28:41 Alex Williamson wrote: > > > The one oddity I noticed is that even when the enable bit is clear, the > > > guest can read the ROM. I don't know that this is actually illegal, vs > > > returning zeros or ones though. It seems like maybe the generic PCI > > > code isn't tracking the enable bit. I think that's an independent > > > problem from this patch though. Thanks, > > > > That should be fine. I've taken a look at code, seems Linux kernel set > > enable_bit when someone begin to read rom, and copy rom to buffer, then > > unmap the rom. So the rom can be read when enable bit clear. > > For this testing, I used an mmap of the ROM address though, so the > kernel caching shouldn't have been involved. It looks to me like the > problem is that the map function provided via pci_register_io_region() > only knows how to create mappings, not tear them down. I think maybe > pci_update_mappings() should still call the map_func when new_addr is -1 > to let the io space drive shutdown the mapping. As it is, once we setup > the mapping, it lives until something else happens to overlap it, > regardless of the state of the PCI BAR. Thanks, I think it may not necessary to tear them down, for the bar mapping won't change IIUR. And you are accessing the sysfs file, right? In the Linux kernel, IIRC, pci_create_sysfs_dev_files() create sysfs file, and hook the read to pci_read_rom(), which called pci_map_rom(), which would call pci_enable_rom(), and write the enable_rom bit to the rom_base_reg. So that the rom can be read regardless of enable_rom bit state - and . But I also found something interested. The write hook of file, pci_write_rom() seems won't cause NMI(and seems you need write a char rather than 0 to enable the accessing?). So why NMI happen in host?...
On 06/15/2009 07:29 PM, Alex Williamson wrote: > The PCI code already knows about option ROMs, so we just need to > mmap some space for it, load it with a copy of the contents, and > complete the plubming to the generic code. > > With this a Linux guest can get access to the ROM contents via > /sys/bus/pci/devices/<dev>/rom. This might also enable the BIOS > to execute ROMs by loading them dynamically from the device > rather than hoping they all fit into RAM. > Applied, thanks.
On Mon, 2009-06-22 at 13:32 +0800, Yang, Sheng wrote: > On Friday 19 June 2009 21:44:40 Alex Williamson wrote: > > On Fri, 2009-06-19 at 15:27 +0800, Yang, Sheng wrote: > > > On Friday 19 June 2009 00:28:41 Alex Williamson wrote: > > > > The one oddity I noticed is that even when the enable bit is clear, the > > > > guest can read the ROM. I don't know that this is actually illegal, vs > > > > returning zeros or ones though. It seems like maybe the generic PCI > > > > code isn't tracking the enable bit. I think that's an independent > > > > problem from this patch though. Thanks, > > > > > > That should be fine. I've taken a look at code, seems Linux kernel set > > > enable_bit when someone begin to read rom, and copy rom to buffer, then > > > unmap the rom. So the rom can be read when enable bit clear. > > > > For this testing, I used an mmap of the ROM address though, so the > > kernel caching shouldn't have been involved. It looks to me like the > > problem is that the map function provided via pci_register_io_region() > > only knows how to create mappings, not tear them down. I think maybe > > pci_update_mappings() should still call the map_func when new_addr is -1 > > to let the io space drive shutdown the mapping. As it is, once we setup > > the mapping, it lives until something else happens to overlap it, > > regardless of the state of the PCI BAR. Thanks, > > I think it may not necessary to tear them down, for the bar mapping won't > change IIUR. We can't guarantee that, the OS can move them if it understands the resources available to the PCI bus. It typically doesn't move them though. > And you are accessing the sysfs file, right? In the Linux kernel, IIRC, > pci_create_sysfs_dev_files() create sysfs file, and hook the read to > pci_read_rom(), which called pci_map_rom(), which would call pci_enable_rom(), > and write the enable_rom bit to the rom_base_reg. So that the rom can be read > regardless of enable_rom bit state - and . > > But I also found something interested. The write hook of file, pci_write_rom() > seems won't cause NMI(and seems you need write a char rather than 0 to enable > the accessing?). So why NMI happen in host?... As I mentioned, I'm not using the /sys files to write to the ROM precisely because the rom write function is only to enable/disable the ROM BAR. I'm using setpci to manually enable the ROM, then I use the test program below to mmap the ROM address from /dev/mem, read part of it, try to write the first few bytes, then read it back. You'll need to change the hard coded address if you want to test yourself. Obviously don't do it on a system in use by others since it will likely take it down. Thanks, Alex #include <errno.h> #include <fcntl.h> #include <stdio.h> #include <string.h> #include <unistd.h> #include <sys/mman.h> #include <sys/types.h> #include <sys/stat.h> #define DEV_MEM "/dev/mem" #define ROM_ADDR 0xe6300000 int main(void) { unsigned char *map; int i, fd = open(DEV_MEM, O_RDWR); if (fd == -1) { printf("Failed to open /dev/mem: %s\n", strerror(errno)); return -1; } map = mmap(NULL, getpagesize(), PROT_READ|PROT_WRITE, MAP_SHARED, fd, ROM_ADDR); if (map == MAP_FAILED) { printf("Failed to mmap /dev/mem: %s\n", strerror(errno)); close(fd); return -1; } for (i = 0; i < 64;) { printf("%02x", map[i++]); if (i % 16 == 0) printf("\n"); else if (i % 4 == 0) printf(" "); else printf(" "); } printf("Writing..."); map[0] = 0xba; map[1] = 0xdb; map[2] = 0xad; map[3] = 0xc0; map[4] = 0xff; map[5] = 0xee; printf("done\n"); for (i = 0; i < 64;) { printf("%02x", map[i++]); if (i % 16 == 0) printf("\n"); else if (i % 4 == 0) printf(" "); else printf(" "); } munmap(map, getpagesize()); close(fd); return 0; } -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tuesday 23 June 2009 00:09:28 Alex Williamson wrote: > On Mon, 2009-06-22 at 13:32 +0800, Yang, Sheng wrote: > > On Friday 19 June 2009 21:44:40 Alex Williamson wrote: > > > On Fri, 2009-06-19 at 15:27 +0800, Yang, Sheng wrote: > > > > On Friday 19 June 2009 00:28:41 Alex Williamson wrote: > > > > > The one oddity I noticed is that even when the enable bit is clear, > > > > > the guest can read the ROM. I don't know that this is actually > > > > > illegal, vs returning zeros or ones though. It seems like maybe > > > > > the generic PCI code isn't tracking the enable bit. I think that's > > > > > an independent problem from this patch though. Thanks, > > > > > > > > That should be fine. I've taken a look at code, seems Linux kernel > > > > set enable_bit when someone begin to read rom, and copy rom to > > > > buffer, then unmap the rom. So the rom can be read when enable bit > > > > clear. > > > > > > For this testing, I used an mmap of the ROM address though, so the > > > kernel caching shouldn't have been involved. It looks to me like the > > > problem is that the map function provided via pci_register_io_region() > > > only knows how to create mappings, not tear them down. I think maybe > > > pci_update_mappings() should still call the map_func when new_addr is > > > -1 to let the io space drive shutdown the mapping. As it is, once we > > > setup the mapping, it lives until something else happens to overlap it, > > > regardless of the state of the PCI BAR. Thanks, > > > > I think it may not necessary to tear them down, for the bar mapping won't > > change IIUR. > > We can't guarantee that, the OS can move them if it understands the > resources available to the PCI bus. It typically doesn't move them > though. > > > And you are accessing the sysfs file, right? In the Linux kernel, IIRC, > > pci_create_sysfs_dev_files() create sysfs file, and hook the read to > > pci_read_rom(), which called pci_map_rom(), which would call > > pci_enable_rom(), and write the enable_rom bit to the rom_base_reg. So > > that the rom can be read regardless of enable_rom bit state - and . > > > > But I also found something interested. The write hook of file, > > pci_write_rom() seems won't cause NMI(and seems you need write a char > > rather than 0 to enable the accessing?). So why NMI happen in host?... > > As I mentioned, I'm not using the /sys files to write to the ROM > precisely because the rom write function is only to enable/disable the > ROM BAR. I'm using setpci to manually enable the ROM, then I use the > test program below to mmap the ROM address from /dev/mem, read part of > it, try to write the first few bytes, then read it back. You'll need to > change the hard coded address if you want to test yourself. Obviously > don't do it on a system in use by others since it will likely take it > down. Thanks, Oh, yes. Sorry for completely miss the method... Yeah, by this method, the ROM shouldn't present to guest. And you are right, the PCI mapping is in only one direction. I think we can fix it in QEmu upstream.
diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 65920d0..dfcd670 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -286,8 +286,8 @@ static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address, /* Continue to program the card */ } - if ((address >= 0x10 && address <= 0x24) || address == 0x34 || - address == 0x3c || address == 0x3d || + if ((address >= 0x10 && address <= 0x24) || address == 0x30 || + address == 0x34 || address == 0x3c || address == 0x3d || pci_access_cap_config(d, address, len)) { /* used for update-mappings (BAR emulation) */ pci_default_write_config(d, address, val, len); @@ -322,8 +322,8 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address, AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev); if (address < 0x4 || (pci_dev->need_emulate_cmd && address == 0x4) || - (address >= 0x10 && address <= 0x24) || address == 0x34 || - address == 0x3c || address == 0x3d || + (address >= 0x10 && address <= 0x24) || address == 0x30 || + address == 0x34 || address == 0x3c || address == 0x3d || pci_access_cap_config(d, address, len)) { val = pci_default_read_config(d, address, len); DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n", @@ -384,11 +384,20 @@ static int assigned_dev_register_regions(PCIRegion *io_regions, /* map physical memory */ pci_dev->v_addrs[i].e_physbase = cur_region->base_addr; - pci_dev->v_addrs[i].u.r_virtbase = - mmap(NULL, - (cur_region->size + 0xFFF) & 0xFFFFF000, - PROT_WRITE | PROT_READ, MAP_SHARED, - cur_region->resource_fd, (off_t) 0); + if (i == PCI_ROM_SLOT) { + pci_dev->v_addrs[i].u.r_virtbase = + mmap(NULL, + (cur_region->size + 0xFFF) & 0xFFFFF000, + PROT_WRITE | PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, + 0, (off_t) 0); + + } else { + pci_dev->v_addrs[i].u.r_virtbase = + mmap(NULL, + (cur_region->size + 0xFFF) & 0xFFFFF000, + PROT_WRITE | PROT_READ, MAP_SHARED, + cur_region->resource_fd, (off_t) 0); + } if (pci_dev->v_addrs[i].u.r_virtbase == MAP_FAILED) { pci_dev->v_addrs[i].u.r_virtbase = NULL; @@ -397,6 +406,14 @@ static int assigned_dev_register_regions(PCIRegion *io_regions, (uint32_t) (cur_region->base_addr)); return -1; } + + if (i == PCI_ROM_SLOT) { + memset(pci_dev->v_addrs[i].u.r_virtbase, 0, + (cur_region->size + 0xFFF) & 0xFFFFF000); + mprotect(pci_dev->v_addrs[PCI_ROM_SLOT].u.r_virtbase, + (cur_region->size + 0xFFF) & 0xFFFFF000, PROT_READ); + } + pci_dev->v_addrs[i].r_size = cur_region->size; pci_dev->v_addrs[i].e_size = 0; @@ -468,7 +485,7 @@ again: return 1; } - for (r = 0; r < MAX_IO_REGIONS; r++) { + for (r = 0; r < PCI_NUM_REGIONS; r++) { if (fscanf(f, "%lli %lli %lli\n", &start, &end, &flags) != 3) break; @@ -480,11 +497,13 @@ again: continue; if (flags & IORESOURCE_MEM) { flags &= ~IORESOURCE_IO; - snprintf(name, sizeof(name), "%sresource%d", dir, r); - fd = open(name, O_RDWR); - if (fd == -1) - continue; /* probably ROM */ - rp->resource_fd = fd; + if (r != PCI_ROM_SLOT) { + snprintf(name, sizeof(name), "%sresource%d", dir, r); + fd = open(name, O_RDWR); + if (fd == -1) + continue; + rp->resource_fd = fd; + } } else flags &= ~IORESOURCE_PREFETCH; @@ -1390,6 +1409,17 @@ ram_addr_t assigned_dev_load_option_roms(ram_addr_t rom_base_offset) continue; } + /* Copy ROM contents into the space backing the ROM BAR */ + if (adev->assigned_dev->v_addrs[PCI_ROM_SLOT].r_size >= size && + adev->assigned_dev->v_addrs[PCI_ROM_SLOT].u.r_virtbase) { + mprotect(adev->assigned_dev->v_addrs[PCI_ROM_SLOT].u.r_virtbase, + size, PROT_READ | PROT_WRITE); + memcpy(adev->assigned_dev->v_addrs[PCI_ROM_SLOT].u.r_virtbase, + buf, size); + mprotect(adev->assigned_dev->v_addrs[PCI_ROM_SLOT].u.r_virtbase, + size, PROT_READ); + } + /* Scan the buffer for suitable ROMs and increase the offset */ offset += scan_option_rom(adev->assigned_dev->dev.devfn, buf, offset); diff --git a/hw/device-assignment.h b/hw/device-assignment.h index c691e11..713f9b7 100644 --- a/hw/device-assignment.h +++ b/hw/device-assignment.h @@ -36,9 +36,6 @@ /* From include/linux/pci.h in the kernel sources */ #define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07)) -/* The number of BARs in the config space header */ -#define MAX_IO_REGIONS (6) - typedef struct { int type; /* Memory or port I/O */ int valid; @@ -53,7 +50,7 @@ typedef struct { uint16_t region_number; /* number of active regions */ /* Port I/O or MMIO Regions */ - PCIRegion regions[MAX_IO_REGIONS]; + PCIRegion regions[PCI_NUM_REGIONS]; int config_fd; } PCIDevRegions;
The PCI code already knows about option ROMs, so we just need to mmap some space for it, load it with a copy of the contents, and complete the plubming to the generic code. With this a Linux guest can get access to the ROM contents via /sys/bus/pci/devices/<dev>/rom. This might also enable the BIOS to execute ROMs by loading them dynamically from the device rather than hoping they all fit into RAM. Signed-off-by: Alex Williamson <alex.williamson@hp.com> --- hw/device-assignment.c | 60 ++++++++++++++++++++++++++++++++++++------------ hw/device-assignment.h | 5 +--- 2 files changed, 46 insertions(+), 19 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html