diff mbox

kvm: device-assignment: Add PCI option ROM support

Message ID 20090615162815.4830.38216.stgit@host.lart (mailing list archive)
State New, archived
Headers show

Commit Message

Alex Williamson June 15, 2009, 4:29 p.m. UTC
The PCI code already knows about option ROMs, so we just need to
mmap some space for it, load it with a copy of the contents, and
complete the plubming to the generic code.

With this a Linux guest can get access to the ROM contents via
/sys/bus/pci/devices/<dev>/rom.  This might also enable the BIOS
to execute ROMs by loading them dynamically from the device
rather than hoping they all fit into RAM.

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
---

 hw/device-assignment.c |   60 ++++++++++++++++++++++++++++++++++++------------
 hw/device-assignment.h |    5 +---
 2 files changed, 46 insertions(+), 19 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Yang, Sheng June 18, 2009, 5:30 a.m. UTC | #1
On Tuesday 16 June 2009 00:29:17 Alex Williamson wrote:
> The PCI code already knows about option ROMs, so we just need to
> mmap some space for it, load it with a copy of the contents, and
> complete the plubming to the generic code.
>
> With this a Linux guest can get access to the ROM contents via
> /sys/bus/pci/devices/<dev>/rom.  This might also enable the BIOS
> to execute ROMs by loading them dynamically from the device
> rather than hoping they all fit into RAM.
>
> Signed-off-by: Alex Williamson <alex.williamson@hp.com>

Hi Alex

The patch looks fine. One question: if guest write to the ROM, I think the 
guest would be killed for QEmu would receive a SIGSEGV? I am not sure if it's 
too severe...
Alex Williamson June 18, 2009, 4:28 p.m. UTC | #2
On Thu, 2009-06-18 at 13:30 +0800, Yang, Sheng wrote:
> On Tuesday 16 June 2009 00:29:17 Alex Williamson wrote:
> > The PCI code already knows about option ROMs, so we just need to
> > mmap some space for it, load it with a copy of the contents, and
> > complete the plubming to the generic code.
> >
> > With this a Linux guest can get access to the ROM contents via
> > /sys/bus/pci/devices/<dev>/rom.  This might also enable the BIOS
> > to execute ROMs by loading them dynamically from the device
> > rather than hoping they all fit into RAM.
> >
> The patch looks fine. One question: if guest write to the ROM, I think the 
> guest would be killed for QEmu would receive a SIGSEGV? I am not sure if it's 
> too severe...

Hi Sheng,

Good thought.  I tested this with a simple program that mmaps the ROM
address from /dev/mem and tries to write to it (using setpci to enable
the ROM).  The results are a little surprising.  On the host, writing to
the ROM causes an NMI and the system dies.  On the KVM guest, the write
is happily discarded, neither segfaulting from the mprotect nor
affecting the contents of the ROM.  So it seems that something above my
mprotect is discarding the write, and if we did hit it, a qemu segfault
isn't that far from what happens on bare metal.

The one oddity I noticed is that even when the enable bit is clear, the
guest can read the ROM.  I don't know that this is actually illegal, vs
returning zeros or ones though.  It seems like maybe the generic PCI
code isn't tracking the enable bit.  I think that's an independent
problem from this patch though.  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yang, Sheng June 19, 2009, 7:27 a.m. UTC | #3
On Friday 19 June 2009 00:28:41 Alex Williamson wrote:
> On Thu, 2009-06-18 at 13:30 +0800, Yang, Sheng wrote:
> > On Tuesday 16 June 2009 00:29:17 Alex Williamson wrote:
> > > The PCI code already knows about option ROMs, so we just need to
> > > mmap some space for it, load it with a copy of the contents, and
> > > complete the plubming to the generic code.
> > >
> > > With this a Linux guest can get access to the ROM contents via
> > > /sys/bus/pci/devices/<dev>/rom.  This might also enable the BIOS
> > > to execute ROMs by loading them dynamically from the device
> > > rather than hoping they all fit into RAM.
> >
> > The patch looks fine. One question: if guest write to the ROM, I think
> > the guest would be killed for QEmu would receive a SIGSEGV? I am not sure
> > if it's too severe...
>
> Hi Sheng,
>
> Good thought.  I tested this with a simple program that mmaps the ROM
> address from /dev/mem and tries to write to it (using setpci to enable
> the ROM).  The results are a little surprising.  On the host, writing to
> the ROM causes an NMI and the system dies.  On the KVM guest, the write
> is happily discarded, neither segfaulting from the mprotect nor
> affecting the contents of the ROM.  So it seems that something above my
> mprotect is discarding the write, and if we did hit it, a qemu segfault
> isn't that far from what happens on bare metal.
>
Oh, that's good. :)

> The one oddity I noticed is that even when the enable bit is clear, the
> guest can read the ROM.  I don't know that this is actually illegal, vs
> returning zeros or ones though.  It seems like maybe the generic PCI
> code isn't tracking the enable bit.  I think that's an independent
> problem from this patch though.  Thanks,

That should be fine. I've taken a look at code, seems Linux kernel set 
enable_bit when someone begin to read rom, and copy rom to buffer, then unmap 
the rom. So the rom can be read when enable bit clear.
Alex Williamson June 19, 2009, 1:44 p.m. UTC | #4
On Fri, 2009-06-19 at 15:27 +0800, Yang, Sheng wrote:
> On Friday 19 June 2009 00:28:41 Alex Williamson wrote:
> > The one oddity I noticed is that even when the enable bit is clear, the
> > guest can read the ROM.  I don't know that this is actually illegal, vs
> > returning zeros or ones though.  It seems like maybe the generic PCI
> > code isn't tracking the enable bit.  I think that's an independent
> > problem from this patch though.  Thanks,
> 
> That should be fine. I've taken a look at code, seems Linux kernel set 
> enable_bit when someone begin to read rom, and copy rom to buffer, then unmap 
> the rom. So the rom can be read when enable bit clear.

For this testing, I used an mmap of the ROM address though, so the
kernel caching shouldn't have been involved.  It looks to me like the
problem is that the map function provided via pci_register_io_region()
only knows how to create mappings, not tear them down.  I think maybe
pci_update_mappings() should still call the map_func when new_addr is -1
to let the io space drive shutdown the mapping.  As it is, once we setup
the mapping, it lives until something else happens to overlap it,
regardless of the state of the PCI BAR.  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yang, Sheng June 22, 2009, 5:32 a.m. UTC | #5
On Friday 19 June 2009 21:44:40 Alex Williamson wrote:
> On Fri, 2009-06-19 at 15:27 +0800, Yang, Sheng wrote:
> > On Friday 19 June 2009 00:28:41 Alex Williamson wrote:
> > > The one oddity I noticed is that even when the enable bit is clear, the
> > > guest can read the ROM.  I don't know that this is actually illegal, vs
> > > returning zeros or ones though.  It seems like maybe the generic PCI
> > > code isn't tracking the enable bit.  I think that's an independent
> > > problem from this patch though.  Thanks,
> >
> > That should be fine. I've taken a look at code, seems Linux kernel set
> > enable_bit when someone begin to read rom, and copy rom to buffer, then
> > unmap the rom. So the rom can be read when enable bit clear.
>
> For this testing, I used an mmap of the ROM address though, so the
> kernel caching shouldn't have been involved.  It looks to me like the
> problem is that the map function provided via pci_register_io_region()
> only knows how to create mappings, not tear them down.  I think maybe
> pci_update_mappings() should still call the map_func when new_addr is -1
> to let the io space drive shutdown the mapping.  As it is, once we setup
> the mapping, it lives until something else happens to overlap it,
> regardless of the state of the PCI BAR.  Thanks,

I think it may not necessary to tear them down, for the bar mapping won't 
change IIUR.

And you are accessing the sysfs file, right? In the Linux kernel, IIRC, 
pci_create_sysfs_dev_files() create sysfs file, and hook the read to 
pci_read_rom(), which called pci_map_rom(), which would call pci_enable_rom(), 
and write the enable_rom bit to the rom_base_reg. So that the rom can be read 
regardless of enable_rom bit state - and .

But I also found something interested. The write hook of file, pci_write_rom() 
seems won't cause NMI(and seems you need write a char rather than 0 to enable 
the accessing?). So why NMI happen in host?...
Avi Kivity June 22, 2009, 8:38 a.m. UTC | #6
On 06/15/2009 07:29 PM, Alex Williamson wrote:
> The PCI code already knows about option ROMs, so we just need to
> mmap some space for it, load it with a copy of the contents, and
> complete the plubming to the generic code.
>
> With this a Linux guest can get access to the ROM contents via
> /sys/bus/pci/devices/<dev>/rom.  This might also enable the BIOS
> to execute ROMs by loading them dynamically from the device
> rather than hoping they all fit into RAM.
>    

Applied, thanks.
Alex Williamson June 22, 2009, 4:09 p.m. UTC | #7
On Mon, 2009-06-22 at 13:32 +0800, Yang, Sheng wrote:
> On Friday 19 June 2009 21:44:40 Alex Williamson wrote:
> > On Fri, 2009-06-19 at 15:27 +0800, Yang, Sheng wrote:
> > > On Friday 19 June 2009 00:28:41 Alex Williamson wrote:
> > > > The one oddity I noticed is that even when the enable bit is clear, the
> > > > guest can read the ROM.  I don't know that this is actually illegal, vs
> > > > returning zeros or ones though.  It seems like maybe the generic PCI
> > > > code isn't tracking the enable bit.  I think that's an independent
> > > > problem from this patch though.  Thanks,
> > >
> > > That should be fine. I've taken a look at code, seems Linux kernel set
> > > enable_bit when someone begin to read rom, and copy rom to buffer, then
> > > unmap the rom. So the rom can be read when enable bit clear.
> >
> > For this testing, I used an mmap of the ROM address though, so the
> > kernel caching shouldn't have been involved.  It looks to me like the
> > problem is that the map function provided via pci_register_io_region()
> > only knows how to create mappings, not tear them down.  I think maybe
> > pci_update_mappings() should still call the map_func when new_addr is -1
> > to let the io space drive shutdown the mapping.  As it is, once we setup
> > the mapping, it lives until something else happens to overlap it,
> > regardless of the state of the PCI BAR.  Thanks,
> 
> I think it may not necessary to tear them down, for the bar mapping won't 
> change IIUR.

We can't guarantee that, the OS can move them if it understands the
resources available to the PCI bus.  It typically doesn't move them
though.

> And you are accessing the sysfs file, right? In the Linux kernel, IIRC, 
> pci_create_sysfs_dev_files() create sysfs file, and hook the read to 
> pci_read_rom(), which called pci_map_rom(), which would call pci_enable_rom(), 
> and write the enable_rom bit to the rom_base_reg. So that the rom can be read 
> regardless of enable_rom bit state - and .
> 
> But I also found something interested. The write hook of file, pci_write_rom() 
> seems won't cause NMI(and seems you need write a char rather than 0 to enable 
> the accessing?). So why NMI happen in host?...

As I mentioned, I'm not using the /sys files to write to the ROM
precisely because the rom write function is only to enable/disable the
ROM BAR.  I'm using setpci to manually enable the ROM, then I use the
test program below to mmap the ROM address from /dev/mem, read part of
it, try to write the first few bytes, then read it back.  You'll need to
change the hard coded address if you want to test yourself.  Obviously
don't do it on a system in use by others since it will likely take it
down.  Thanks,

Alex

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>

#define DEV_MEM		"/dev/mem"
#define ROM_ADDR	0xe6300000

int main(void)
{
	unsigned char *map;
	int i, fd = open(DEV_MEM, O_RDWR);

	if (fd == -1) {
		printf("Failed to open /dev/mem: %s\n", strerror(errno));
		return -1;
	}

	map = mmap(NULL, getpagesize(), PROT_READ|PROT_WRITE,
		   MAP_SHARED, fd, ROM_ADDR);

	if (map == MAP_FAILED) {
		printf("Failed to mmap /dev/mem: %s\n", strerror(errno));
		close(fd);
		return -1;
	}

	for (i = 0; i < 64;) {
		printf("%02x", map[i++]);
		if (i % 16 == 0)
			printf("\n");
		else if (i % 4 == 0)
			printf("  ");
		else
			printf(" ");
	}

	printf("Writing...");
	map[0] = 0xba;
	map[1] = 0xdb;
	map[2] = 0xad;
	map[3] = 0xc0;
	map[4] = 0xff;
	map[5] = 0xee;
	printf("done\n");

	for (i = 0; i < 64;) {
		printf("%02x", map[i++]);
		if (i % 16 == 0)
			printf("\n");
		else if (i % 4 == 0)
			printf("  ");
		else
			printf(" ");
	}

	munmap(map, getpagesize());
	close(fd);
	return 0;
}


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yang, Sheng June 23, 2009, 1:25 a.m. UTC | #8
On Tuesday 23 June 2009 00:09:28 Alex Williamson wrote:
> On Mon, 2009-06-22 at 13:32 +0800, Yang, Sheng wrote:
> > On Friday 19 June 2009 21:44:40 Alex Williamson wrote:
> > > On Fri, 2009-06-19 at 15:27 +0800, Yang, Sheng wrote:
> > > > On Friday 19 June 2009 00:28:41 Alex Williamson wrote:
> > > > > The one oddity I noticed is that even when the enable bit is clear,
> > > > > the guest can read the ROM.  I don't know that this is actually
> > > > > illegal, vs returning zeros or ones though.  It seems like maybe
> > > > > the generic PCI code isn't tracking the enable bit.  I think that's
> > > > > an independent problem from this patch though.  Thanks,
> > > >
> > > > That should be fine. I've taken a look at code, seems Linux kernel
> > > > set enable_bit when someone begin to read rom, and copy rom to
> > > > buffer, then unmap the rom. So the rom can be read when enable bit
> > > > clear.
> > >
> > > For this testing, I used an mmap of the ROM address though, so the
> > > kernel caching shouldn't have been involved.  It looks to me like the
> > > problem is that the map function provided via pci_register_io_region()
> > > only knows how to create mappings, not tear them down.  I think maybe
> > > pci_update_mappings() should still call the map_func when new_addr is
> > > -1 to let the io space drive shutdown the mapping.  As it is, once we
> > > setup the mapping, it lives until something else happens to overlap it,
> > > regardless of the state of the PCI BAR.  Thanks,
> >
> > I think it may not necessary to tear them down, for the bar mapping won't
> > change IIUR.
>
> We can't guarantee that, the OS can move them if it understands the
> resources available to the PCI bus.  It typically doesn't move them
> though.
>
> > And you are accessing the sysfs file, right? In the Linux kernel, IIRC,
> > pci_create_sysfs_dev_files() create sysfs file, and hook the read to
> > pci_read_rom(), which called pci_map_rom(), which would call
> > pci_enable_rom(), and write the enable_rom bit to the rom_base_reg. So
> > that the rom can be read regardless of enable_rom bit state - and .
> >
> > But I also found something interested. The write hook of file,
> > pci_write_rom() seems won't cause NMI(and seems you need write a char
> > rather than 0 to enable the accessing?). So why NMI happen in host?...
>
> As I mentioned, I'm not using the /sys files to write to the ROM
> precisely because the rom write function is only to enable/disable the
> ROM BAR.  I'm using setpci to manually enable the ROM, then I use the
> test program below to mmap the ROM address from /dev/mem, read part of
> it, try to write the first few bytes, then read it back.  You'll need to
> change the hard coded address if you want to test yourself.  Obviously
> don't do it on a system in use by others since it will likely take it
> down.  Thanks,

Oh, yes. Sorry for completely miss the method... Yeah, by this method, the ROM 
shouldn't present to guest. And you are right, the PCI mapping is in only one 
direction. I think we can fix it in QEmu upstream.
diff mbox

Patch

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 65920d0..dfcd670 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -286,8 +286,8 @@  static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address,
         /* Continue to program the card */
     }
 
-    if ((address >= 0x10 && address <= 0x24) || address == 0x34 ||
-        address == 0x3c || address == 0x3d ||
+    if ((address >= 0x10 && address <= 0x24) || address == 0x30 ||
+        address == 0x34 || address == 0x3c || address == 0x3d ||
         pci_access_cap_config(d, address, len)) {
         /* used for update-mappings (BAR emulation) */
         pci_default_write_config(d, address, val, len);
@@ -322,8 +322,8 @@  static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address,
     AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
 
     if (address < 0x4 || (pci_dev->need_emulate_cmd && address == 0x4) ||
-	(address >= 0x10 && address <= 0x24) || address == 0x34 ||
-        address == 0x3c || address == 0x3d ||
+	(address >= 0x10 && address <= 0x24) || address == 0x30 ||
+        address == 0x34 || address == 0x3c || address == 0x3d ||
         pci_access_cap_config(d, address, len)) {
         val = pci_default_read_config(d, address, len);
         DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
@@ -384,11 +384,20 @@  static int assigned_dev_register_regions(PCIRegion *io_regions,
 
             /* map physical memory */
             pci_dev->v_addrs[i].e_physbase = cur_region->base_addr;
-            pci_dev->v_addrs[i].u.r_virtbase =
-                mmap(NULL,
-                     (cur_region->size + 0xFFF) & 0xFFFFF000,
-                     PROT_WRITE | PROT_READ, MAP_SHARED,
-                     cur_region->resource_fd, (off_t) 0);
+            if (i == PCI_ROM_SLOT) {
+                pci_dev->v_addrs[i].u.r_virtbase =
+                    mmap(NULL,
+                         (cur_region->size + 0xFFF) & 0xFFFFF000,
+                         PROT_WRITE | PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE,
+                         0, (off_t) 0);
+
+            } else {
+                pci_dev->v_addrs[i].u.r_virtbase =
+                    mmap(NULL,
+                         (cur_region->size + 0xFFF) & 0xFFFFF000,
+                         PROT_WRITE | PROT_READ, MAP_SHARED,
+                         cur_region->resource_fd, (off_t) 0);
+            }
 
             if (pci_dev->v_addrs[i].u.r_virtbase == MAP_FAILED) {
                 pci_dev->v_addrs[i].u.r_virtbase = NULL;
@@ -397,6 +406,14 @@  static int assigned_dev_register_regions(PCIRegion *io_regions,
                         (uint32_t) (cur_region->base_addr));
                 return -1;
             }
+
+            if (i == PCI_ROM_SLOT) {
+                memset(pci_dev->v_addrs[i].u.r_virtbase, 0,
+                       (cur_region->size + 0xFFF) & 0xFFFFF000);
+                mprotect(pci_dev->v_addrs[PCI_ROM_SLOT].u.r_virtbase,
+                         (cur_region->size + 0xFFF) & 0xFFFFF000, PROT_READ);
+            }
+
             pci_dev->v_addrs[i].r_size = cur_region->size;
             pci_dev->v_addrs[i].e_size = 0;
 
@@ -468,7 +485,7 @@  again:
         return 1;
     }
 
-    for (r = 0; r < MAX_IO_REGIONS; r++) {
+    for (r = 0; r < PCI_NUM_REGIONS; r++) {
 	if (fscanf(f, "%lli %lli %lli\n", &start, &end, &flags) != 3)
 	    break; 
 
@@ -480,11 +497,13 @@  again:
             continue;
         if (flags & IORESOURCE_MEM) {
             flags &= ~IORESOURCE_IO;
-	    snprintf(name, sizeof(name), "%sresource%d", dir, r);
-            fd = open(name, O_RDWR);
-            if (fd == -1)
-                continue;       /* probably ROM */
-            rp->resource_fd = fd;
+            if (r != PCI_ROM_SLOT) {
+                snprintf(name, sizeof(name), "%sresource%d", dir, r);
+                fd = open(name, O_RDWR);
+                if (fd == -1)
+                    continue;
+                rp->resource_fd = fd;
+            }
         } else
             flags &= ~IORESOURCE_PREFETCH;
 
@@ -1390,6 +1409,17 @@  ram_addr_t assigned_dev_load_option_roms(ram_addr_t rom_base_offset)
             continue;
         }
 
+        /* Copy ROM contents into the space backing the ROM BAR */
+        if (adev->assigned_dev->v_addrs[PCI_ROM_SLOT].r_size >= size &&
+            adev->assigned_dev->v_addrs[PCI_ROM_SLOT].u.r_virtbase) {
+            mprotect(adev->assigned_dev->v_addrs[PCI_ROM_SLOT].u.r_virtbase,
+                     size, PROT_READ | PROT_WRITE);
+            memcpy(adev->assigned_dev->v_addrs[PCI_ROM_SLOT].u.r_virtbase,
+                   buf, size);
+            mprotect(adev->assigned_dev->v_addrs[PCI_ROM_SLOT].u.r_virtbase,
+                     size, PROT_READ);
+        }
+
         /* Scan the buffer for suitable ROMs and increase the offset */
         offset += scan_option_rom(adev->assigned_dev->dev.devfn, buf, offset);
 
diff --git a/hw/device-assignment.h b/hw/device-assignment.h
index c691e11..713f9b7 100644
--- a/hw/device-assignment.h
+++ b/hw/device-assignment.h
@@ -36,9 +36,6 @@ 
 /* From include/linux/pci.h in the kernel sources */
 #define PCI_DEVFN(slot, func)   ((((slot) & 0x1f) << 3) | ((func) & 0x07))
 
-/* The number of BARs in the config space header */
-#define MAX_IO_REGIONS (6)
-
 typedef struct {
     int type;           /* Memory or port I/O */
     int valid;
@@ -53,7 +50,7 @@  typedef struct {
     uint16_t region_number; /* number of active regions */
 
     /* Port I/O or MMIO Regions */
-    PCIRegion regions[MAX_IO_REGIONS];
+    PCIRegion regions[PCI_NUM_REGIONS];
     int config_fd;
 } PCIDevRegions;