From patchwork Wed Apr 1 15:43:28 2009
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Cam Macdonell
X-Patchwork-Id: 15712
Received: from vger.kernel.org (vger.kernel.org [209.132.176.167]) by demeter.kernel.org (8.14.2/8.14.2) with ESMTP id n31FhZew022460 for ; Wed, 1 Apr 2009 15:43:35 GMT
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756394AbZDAPne (ORCPT ); Wed, 1 Apr 2009 11:43:34 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1756290AbZDAPne (ORCPT ); Wed, 1 Apr 2009 11:43:34 -0400
Received: from fleet.cs.ualberta.ca ([129.128.22.22]:50475 "EHLO fleet.cs.ualberta.ca" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754949AbZDAPnd (ORCPT ); Wed, 1 Apr 2009 11:43:33 -0400
Received: from localhost.localdomain (st-brides.cs.ualberta.ca [129.128.23.21]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by smtp-auth.cs.ualberta.ca (Postfix) with ESMTP id AF02928005; Wed, 1 Apr 2009 09:43:30 -0600 (MDT)
From: Cam Macdonell
To: kvm@vger.kernel.org
Cc: Cam Macdonell
Subject: [PATCH] Add shared memory PCI device that shares a memory object betweens VMs
Date: Wed, 1 Apr 2009 09:43:28 -0600
Message-Id: <1238600608-9120-1-git-send-email-cam@cs.ualberta.ca>
X-Mailer: git-send-email 1.6.0.6
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org

This patch supports sharing memory between VMs and between the host/VM.
It's a first cut and comments are encouraged. The goal is to support
simple Inter-VM communication with zero-copy access to shared memory.
The patch adds the switch -ivshmem (short for Inter-VM shared memory)
that is used as follows "-ivshmem file,size". The shared memory object
named 'file' will be created/opened and mapped onto a PCI memory device
with size 'size'.

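For the host/VM case, a host-side process could reach the same object
with the calls the patch itself uses (shm_open, ftruncate, mmap). The
following is only a sketch under assumptions: the helper name
ivshmem_host_map is hypothetical and not part of the patch, the object
name must carry the leading '/' that ivshmem_init prepends, and the size
is given in MB as with the -ivshmem switch. Unlike the device code it
lets the kernel pick the mapping address instead of using MAP_FIXED.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): map the POSIX shared
 * memory object `name` (e.g. "/file" for "-ivshmem file,size") into the
 * calling process. `size_mb` is in megabytes, matching the switch.
 * Returns the mapping and stores its length in *len_out, or returns
 * NULL on failure.
 */
static void *ivshmem_host_map(const char *name, size_t size_mb, size_t *len_out)
{
    size_t len = size_mb * 1024 * 1024;
    void *ptr;
    int fd;

    /* Same flags and mode as the shm_open() call in pci_ivshmem_init(). */
    fd = shm_open(name, O_CREAT | O_RDWR, S_IRWXU);
    if (fd < 0) {
        perror("shm_open");
        return NULL;
    }

    /* Size the object; keeping the same size preserves existing data. */
    if (ftruncate(fd, (off_t)len) < 0) {
        perror("ftruncate");
        close(fd);
        return NULL;
    }

    /* MAP_SHARED so that writes are visible to every other mapper. */
    ptr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (ptr == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }

    *len_out = len;
    return ptr;
}
```

A caller would pass something like ivshmem_host_map("/file", 16, &len)
and munmap() the result when done. On older glibc versions shm_open
lives in librt, so linking may need -lrt.
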
The PCI device has two BARs, BAR0 for registers and BAR1 for the memory
region that maps the file above. The memory region can be mmapped into
userspace on the guest (or read and written if you want). The register
region will eventually be used to support interrupts which are
communicated via unix domain sockets, but I need some tips on how to do
this using a qemu character device.

Also, feel free to suggest a better name if you have one.

Thanks,
Cam

---
 qemu/Makefile.target |    2 +
 qemu/hw/ivshmem.c    |  363 ++++++++++++++++++++++++++++++++++++++++++++++++++
 qemu/hw/pc.c         |    6 +
 qemu/hw/pc.h         |    3 +
 qemu/qemu-options.hx |   10 ++
 qemu/sysemu.h        |    7 +
 qemu/vl.c            |   12 ++
 7 files changed, 403 insertions(+), 0 deletions(-)
 create mode 100644 qemu/hw/ivshmem.c

diff --git a/qemu/Makefile.target b/qemu/Makefile.target
index 6eed853..167db55 100644
--- a/qemu/Makefile.target
+++ b/qemu/Makefile.target
@@ -640,6 +640,8 @@ OBJS += e1000.o
 # Serial mouse
 OBJS += msmouse.o
+# Inter-VM PCI shared memory
+OBJS += ivshmem.o
 ifeq ($(USE_KVM_DEVICE_ASSIGNMENT), 1)
 OBJS+= device-assignment.o
diff --git a/qemu/hw/ivshmem.c b/qemu/hw/ivshmem.c
new file mode 100644
index 0000000..27db95f
--- /dev/null
+++ b/qemu/hw/ivshmem.c
@@ -0,0 +1,363 @@
+/*
+ * Inter-VM Shared Memory PCI device.
+ *
+ * Author:
+ *   Cam Macdonell
+ *
+ * Based On: cirrus_vga.c and rtl8139.c
+ *
+ * This code is licensed under the GNU GPL v2.
+ */
+
+#include "hw.h"
+#include "console.h"
+#include "pc.h"
+#include "pci.h"
+#include "sysemu.h"
+
+#include "qemu-common.h"
+#include
+
+#define PCI_COMMAND_IOACCESS                0x0001
+#define PCI_COMMAND_MEMACCESS               0x0002
+#define PCI_COMMAND_BUSMASTER               0x0004
+
+//#define DEBUG_IVSHMEM
+
+#ifdef DEBUG_IVSHMEM
+#define IVSHMEM_DPRINTF(fmt, args...)        \
+    do {printf("IVSHMEM: " fmt, ##args); } while (0)
+#else
+#define IVSHMEM_DPRINTF(fmt, args...)
+#endif
+
+typedef struct IVShmemState {
+    uint16_t intrmask;
+    uint16_t intrstatus;
+    uint8_t *ivshmem_ptr;
+    unsigned long ivshmem_offset;
+    unsigned int ivshmem_size;
+    unsigned long bios_offset;
+    unsigned int bios_size;
+    target_phys_addr_t base_ctrl;
+    int it_shift;
+    PCIDevice *pci_dev;
+    unsigned long map_addr;
+    unsigned long map_end;
+    int ivshmem_mmio_io_addr;
+} IVShmemState;
+
+typedef struct PCI_IVShmemState {
+    PCIDevice dev;
+    IVShmemState ivshmem_state;
+} PCI_IVShmemState;
+
+typedef struct IVShmemDesc {
+    char name[1024];
+    int size;
+} IVShmemDesc;
+
+
+/* registers for the Inter-VM shared memory device */
+enum ivshmem_registers {
+    IntrMask = 0,
+    IntrStatus = 16
+};
+
+static int num_ivshmem_devices = 0;
+static IVShmemDesc ivshmem_desc;
+
+static void ivshmem_map(PCIDevice *pci_dev, int region_num,
+                    uint32_t addr, uint32_t size, int type)
+{
+    PCI_IVShmemState *d = (PCI_IVShmemState *)pci_dev;
+    IVShmemState *s = &d->ivshmem_state;
+
+    IVSHMEM_DPRINTF("addr = %u size = %u\n", addr, size);
+    cpu_register_physical_memory(addr, s->ivshmem_size, s->ivshmem_offset);
+
+}
+
+void ivshmem_init(const char * optarg) {
+
+    char * temp;
+    int size;
+
+    num_ivshmem_devices++;
+
+    /* currently we only support 1 device */
+    if (num_ivshmem_devices > MAX_IVSHMEM_DEVICES) {
+        return;
+    }
+
+    temp = strdup(optarg);
+    snprintf(ivshmem_desc.name, 1024, "/%s", strsep(&temp,","));
+    size = atol(temp);
+    if ( size == -1) {
+        ivshmem_desc.size = TARGET_PAGE_SIZE;
+    } else {
+        ivshmem_desc.size = size*1024*1024;
+    }
+    IVSHMEM_DPRINTF("optarg is %s, name is %s, size is %d\n", optarg,
+                                        ivshmem_desc.name,
+                                        ivshmem_desc.size);
+}
+
+int ivshmem_get_size(void) {
+    return ivshmem_desc.size;
+}
+
+/* accessing registers - based on rtl8139 */
+static void ivshmem_update_irq(IVShmemState *s)
+{
+    int isr;
+    isr = (s->intrstatus & s->intrmask) & 0xffff;
+
+    /* don't print ISR resets */
+    if (isr) {
+        IVSHMEM_DPRINTF("Set IRQ to %d (%04x %04x)\n",
+           isr ? 1 : 0, s->intrstatus, s->intrmask);
+    }
+
+    qemu_set_irq(s->pci_dev->irq[0], (isr != 0));
+}
+
+static void ivshmem_mmio_map(PCIDevice *pci_dev, int region_num,
+                       uint32_t addr, uint32_t size, int type)
+{
+    PCI_IVShmemState *d = (PCI_IVShmemState *)pci_dev;
+    IVShmemState *s = &d->ivshmem_state;
+
+    cpu_register_physical_memory(addr + 0, 0x100, s->ivshmem_mmio_io_addr);
+}
+
+static void ivshmem_IntrMask_write(IVShmemState *s, uint32_t val)
+{
+    IVSHMEM_DPRINTF("IntrMask write(w) val = 0x%04x\n", val);
+
+    s->intrmask = val;
+
+    ivshmem_update_irq(s);
+}
+
+static uint32_t ivshmem_IntrMask_read(IVShmemState *s)
+{
+    uint32_t ret = s->intrmask;
+
+    IVSHMEM_DPRINTF("intrmask read(w) val = 0x%04x\n", ret);
+
+    return ret;
+}
+
+static void ivshmem_IntrStatus_write(IVShmemState *s, uint32_t val)
+{
+    IVSHMEM_DPRINTF("IntrStatus write(w) val = 0x%04x\n", val);
+
+    s->intrstatus = val;
+
+    ivshmem_update_irq(s);
+    return;
+}
+
+static uint32_t ivshmem_IntrStatus_read(IVShmemState *s)
+{
+    uint32_t ret = s->intrstatus;
+
+    /* reading ISR clears all interrupts */
+    s->intrstatus = 0;
+
+    ivshmem_update_irq(s);
+
+    return ret;
+}
+
+static void ivshmem_io_writew(void *opaque, uint8_t addr, uint32_t val)
+{
+    IVShmemState *s = opaque;
+
+    IVSHMEM_DPRINTF("writing 0x%x to 0x%lx\n", addr, (unsigned long) opaque);
+
+    addr &= 0xfe;
+
+    switch (addr)
+    {
+        case IntrMask:
+            ivshmem_IntrMask_write(s, val);
+            break;
+
+        case IntrStatus:
+            ivshmem_IntrStatus_write(s, val);
+            break;
+
+        default:
+            IVSHMEM_DPRINTF("why are we writing 0x%x\n", addr);
+    }
+}
+
+static void ivshmem_io_writel(void *opaque, uint8_t addr, uint32_t val)
+{
+    IVSHMEM_DPRINTF("We shouldn't be writing longs\n");
+}
+
+static void ivshmem_io_writeb(void *opaque, uint8_t addr, uint32_t val)
+{
+    IVSHMEM_DPRINTF("We shouldn't be writing bytes\n");
+}
+
+static uint32_t ivshmem_io_readw(void *opaque, uint8_t addr)
+{
+
+    IVShmemState *s = opaque;
+    uint32_t ret;
+
+    switch (addr)
+    {
+        case IntrMask:
+            ret = ivshmem_IntrMask_read(s);
+            break;
+
+        case IntrStatus:
+            ret = ivshmem_IntrStatus_read(s);
+            break;
+        default:
+            IVSHMEM_DPRINTF("why are we reading 0x%x\n", addr);
+            ret = 0;
+    }
+
+    return ret;
+}
+
+static uint32_t ivshmem_io_readl(void *opaque, uint8_t addr)
+{
+    IVSHMEM_DPRINTF("We shouldn't be reading longs\n");
+    return 0;
+}
+
+static uint32_t ivshmem_io_readb(void *opaque, uint8_t addr)
+{
+    IVSHMEM_DPRINTF("We shouldn't be reading bytes\n");
+
+    return 0;
+}
+
+static void ivshmem_mmio_writeb(void *opaque,
+                                target_phys_addr_t addr, uint32_t val)
+{
+    ivshmem_io_writeb(opaque, addr & 0xFF, val);
+}
+
+static void ivshmem_mmio_writew(void *opaque,
+                                target_phys_addr_t addr, uint32_t val)
+{
+    ivshmem_io_writew(opaque, addr & 0xFF, val);
+}
+
+static void ivshmem_mmio_writel(void *opaque,
+                                target_phys_addr_t addr, uint32_t val)
+{
+    ivshmem_io_writel(opaque, addr & 0xFF, val);
+}
+
+static uint32_t ivshmem_mmio_readb(void *opaque, target_phys_addr_t addr)
+{
+    return ivshmem_io_readb(opaque, addr & 0xFF);
+}
+
+static uint32_t ivshmem_mmio_readw(void *opaque, target_phys_addr_t addr)
+{
+    uint32_t val = ivshmem_io_readw(opaque, addr & 0xFF);
+    return val;
+}
+
+static uint32_t ivshmem_mmio_readl(void *opaque, target_phys_addr_t addr)
+{
+    uint32_t val = ivshmem_io_readl(opaque, addr & 0xFF);
+    return val;
+}
+
+static CPUReadMemoryFunc *ivshmem_mmio_read[3] = {
+    ivshmem_mmio_readb,
+    ivshmem_mmio_readw,
+    ivshmem_mmio_readl,
+};
+
+static CPUWriteMemoryFunc *ivshmem_mmio_write[3] = {
+    ivshmem_mmio_writeb,
+    ivshmem_mmio_writew,
+    ivshmem_mmio_writel,
+};
+
+int pci_ivshmem_init(PCIBus *bus, uint8_t *phys_ram_base)
+{
+    PCI_IVShmemState *d;
+    IVShmemState *s;
+    uint8_t *pci_conf;
+    int ivshmem_fd;
+
+    IVSHMEM_DPRINTF("shared file is %s\n", ivshmem_desc.name);
+    d = (PCI_IVShmemState *)pci_register_device(bus, "kvm_ivshmem",
+                                           sizeof(PCI_IVShmemState),
+                                           -1, NULL, NULL);
+    if (!d) {
+        return -1;
+    }
+
+    s = &d->ivshmem_state;
+
+    /* allocate shared memory RAM */
+    s->ivshmem_offset = qemu_ram_alloc(ivshmem_desc.size);
+    IVSHMEM_DPRINTF("size is = %d\n", ivshmem_desc.size);
+    IVSHMEM_DPRINTF("ivshmem ram offset = %ld\n", s->ivshmem_offset);
+
+    s->ivshmem_ptr = phys_ram_base + s->ivshmem_offset;
+
+    s->pci_dev = &d->dev;
+    s->ivshmem_size = ivshmem_desc.size;
+
+    pci_conf = d->dev.config;
+    pci_conf[0x00] = 0xf4; // Qumranet vendor ID 0x5002
+    pci_conf[0x01] = 0x1a;
+    pci_conf[0x02] = 0x10;
+    pci_conf[0x03] = 0x11;
+    pci_conf[0x04] = PCI_COMMAND_IOACCESS | PCI_COMMAND_MEMACCESS;
+    pci_conf[0x0a] = 0x00; // RAM controller
+    pci_conf[0x0b] = 0x05;
+    pci_conf[0x0e] = 0x00; // header_type
+
+    pci_conf[PCI_INTERRUPT_PIN] = 1; // we are going to support interrupts
+
+    /* XXX: ivshmem_desc.size must be a power of two */
+
+    s->ivshmem_mmio_io_addr = cpu_register_io_memory(0, ivshmem_mmio_read,
+                                    ivshmem_mmio_write, s);
+
+    /* region for registers*/
+    pci_register_io_region(&d->dev, 0, 0x100,
+                           PCI_ADDRESS_SPACE_MEM, ivshmem_mmio_map);
+
+    /* region for shared memory */
+    pci_register_io_region(&d->dev, 1, ivshmem_desc.size,
+                           PCI_ADDRESS_SPACE_MEM, ivshmem_map);
+
+    /* open shared memory file */
+    if ((ivshmem_fd = shm_open(ivshmem_desc.name, O_CREAT|O_RDWR, S_IRWXU)) < 0)
+    {
+        fprintf(stderr, "kvm_ivshmem: could not open shared file\n");
+        exit(-1);
+    }
+
+    ftruncate(ivshmem_fd, ivshmem_desc.size);
+
+    /* mmap onto PCI device's memory */
+    if (mmap(s->ivshmem_ptr, ivshmem_desc.size, PROT_READ|PROT_WRITE,
+                        MAP_SHARED|MAP_FIXED, ivshmem_fd, 0) == MAP_FAILED)
+    {
+        fprintf(stderr, "kvm_ivshmem: could not mmap shared file\n");
+        exit(-1);
+    }
+
+    IVSHMEM_DPRINTF("shared object mapped to 0x%p\n", s->ivshmem_ptr);
+
+    return 0;
+}
+
diff --git a/qemu/hw/pc.c b/qemu/hw/pc.c
index d4a4320..34cd1ba 100644
--- a/qemu/hw/pc.c
+++ b/qemu/hw/pc.c
@@ -64,6 +64,8 @@ static PITState *pit;
 static IOAPICState *ioapic;
 static PCIDevice *i440fx_state;
+extern int ivshmem_enabled;
+
 static void ioport80_write(void *opaque, uint32_t addr, uint32_t data)
 {
 }
@@ -1038,6 +1040,10 @@ vga_bios_error:
         }
     }
+    if (pci_enabled && ivshmem_enabled) {
+        pci_ivshmem_init(pci_bus, phys_ram_base);
+    }
+
     rtc_state = rtc_init(0x70, i8259[8], 2000);
     qemu_register_boot_set(pc_boot_set, rtc_state);
diff --git a/qemu/hw/pc.h b/qemu/hw/pc.h
index 85319ea..0158ef3 100644
--- a/qemu/hw/pc.h
+++ b/qemu/hw/pc.h
@@ -190,4 +190,7 @@ void isa_ne2000_init(int base, qemu_irq irq, NICInfo *nd);
 void extboot_init(BlockDriverState *bs, int cmd);
+/* ivshmem.c */
+int pci_ivshmem_init(PCIBus *bus, uint8_t *phys_ram_base);
+
 #endif
diff --git a/qemu/qemu-options.hx b/qemu/qemu-options.hx
index bb4c8e6..84c7af2 100644
--- a/qemu/qemu-options.hx
+++ b/qemu/qemu-options.hx
@@ -1201,6 +1201,16 @@ The default device is @code{vc} in graphical mode and @code{stdio} in
 non graphical mode.
 ETEXI
+DEF("ivshmem", HAS_ARG, QEMU_OPTION_ivshmem, \
+    "-ivshmem name,size creates or opens a shared file 'name' of size \
+    'size' (in MB) and exposes it as a PCI device in the guest\n")
+STEXI
+@item -ivshmem @var{file},@var{size}
+Creates a POSIX shared file named @var{file} of size @var{size} and creates a
+PCI device of the same size that maps the shared file into the device for guests
+to access. The created file on the host is located in /dev/shm/
+ETEXI
+
 DEF("pidfile", HAS_ARG, QEMU_OPTION_pidfile, \
     "-pidfile file write PID to 'file'\n")
 STEXI
diff --git a/qemu/sysemu.h b/qemu/sysemu.h
index d765465..ed34b5a 100644
--- a/qemu/sysemu.h
+++ b/qemu/sysemu.h
@@ -215,6 +215,13 @@ extern CharDriverState *parallel_hds[MAX_PARALLEL_PORTS];
 extern CharDriverState *virtcon_hds[MAX_VIRTIO_CONSOLES];
+/* inter-VM shared memory devices */
+
+#define MAX_IVSHMEM_DEVICES 1
+
+void ivshmem_init(const char * optarg);
+int ivshmem_get_size(void);
+
 #define TFR(expr) do { if ((expr) != -1) break; } while (errno == EINTR)
 #ifdef NEED_CPU_H
diff --git a/qemu/vl.c b/qemu/vl.c
index b3da7ad..e0a08fb 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -219,6 +219,7 @@ static int rtc_date_offset = -1; /* -1 means no change */
 int cirrus_vga_enabled = 1;
 int std_vga_enabled = 0;
 int vmsvga_enabled = 0;
+int ivshmem_enabled = 0;
 #ifdef TARGET_SPARC
 int graphic_width = 1024;
 int graphic_height = 768;
@@ -236,6 +237,7 @@ int no_quit = 0;
 CharDriverState *serial_hds[MAX_SERIAL_PORTS];
 CharDriverState *parallel_hds[MAX_PARALLEL_PORTS];
 CharDriverState *virtcon_hds[MAX_VIRTIO_CONSOLES];
+const char * ivshmem_device;
 #ifdef TARGET_I386
 int win2k_install_hack = 0;
 int rtc_td_hack = 0;
@@ -4522,6 +4524,7 @@ int main(int argc, char **argv, char **envp)
     cyls = heads = secs = 0;
     translation = BIOS_ATA_TRANSLATION_AUTO;
     monitor_device = "vc:80Cx24C";
+    ivshmem_device = NULL;
     serial_devices[0] = "vc:80Cx24C";
     for(i = 1; i < MAX_SERIAL_PORTS; i++)
@@ -4944,6 +4947,10 @@ int main(int argc, char **argv, char **envp)
                 parallel_devices[parallel_device_index] = optarg;
                 parallel_device_index++;
                 break;
+            case QEMU_OPTION_ivshmem:
+                ivshmem_device = optarg;
+                ivshmem_enabled = 1;
+                break;
             case QEMU_OPTION_loadvm:
                 loadvm = optarg;
                 break;
@@ -5416,6 +5423,11 @@ int main(int argc, char **argv, char **envp)
         }
     }
+    if (ivshmem_enabled) {
+        ivshmem_init(ivshmem_device);
+        phys_ram_size += ivshmem_get_size();
+    }
+
     phys_ram_base = qemu_alloc_physram(phys_ram_size);
     if (!phys_ram_base) {
         fprintf(stderr, "Could not allocate physical memory\n");
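
A note on the "file,size" argument handed to ivshmem_init() above: the
patch splits it with strsep() and atol(), which does not handle a
missing comma or a non-numeric size. A more defensive parse might look
like the sketch below. It is only an illustration, not code from the
patch: struct ivshmem_opt and ivshmem_parse_opt are hypothetical names,
and the power-of-two check simply mirrors the XXX comment in
pci_ivshmem_init().

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type; the patch keeps this in IVShmemDesc. */
struct ivshmem_opt {
    char name[1024]; /* shm object name, with leading '/' */
    size_t size;     /* size in bytes */
};

/*
 * Parse "file,size" (size in MB) into *out.
 * Returns 0 on success and -1 if the argument is malformed.
 */
static int ivshmem_parse_opt(const char *optarg, struct ivshmem_opt *out)
{
    const char *comma = strchr(optarg, ',');
    size_t name_len;
    unsigned long mb;
    char *end;

    /* Require a non-empty name followed by a comma. */
    if (comma == NULL || comma == optarg) {
        return -1;
    }
    name_len = (size_t)(comma - optarg);
    if (name_len + 2 > sizeof(out->name)) { /* '/' + name + NUL */
        return -1;
    }

    /* The size must start with a digit, so "-1" and "" are rejected. */
    if (comma[1] < '0' || comma[1] > '9') {
        return -1;
    }
    errno = 0;
    mb = strtoul(comma + 1, &end, 10);
    if (errno != 0 || *end != '\0' || mb == 0) {
        return -1;
    }
    /* The device comment says the size must be a power of two. */
    if ((mb & (mb - 1)) != 0) {
        return -1;
    }
    /* Guard the MB-to-bytes multiplication against overflow. */
    if (mb > (size_t)-1 / (1024 * 1024)) {
        return -1;
    }

    /* Only fill *out once everything has been validated. */
    out->name[0] = '/';
    memcpy(out->name + 1, optarg, name_len);
    out->name[name_len + 1] = '\0';
    out->size = (size_t)mb * 1024 * 1024;
    return 0;
}
```

With this, "foo,16" yields the name "/foo" and a size of 16 MB in bytes,
while "foo", ",4", "foo,3" and "foo,-1" are all rejected.
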