Message ID | 20110824222510.GC14835@dancer.ca.sandia.gov (mailing list archive)
---|---
State | New, archived
On 24.08.2011, at 17:25, David Evensky wrote:

> This patch adds a PCI device that provides PCI device memory to the
> guest. This memory in the guest exists as a shared memory segment in
> the host. This is similar to the memory sharing capability of Nahanni
> (ivshmem) available in QEMU. In this case, the shared memory segment
> is exposed as a PCI BAR only.
>
> A new command line argument is added as:
> --shmem pci:0xc8000000:16MB:handle=/newmem:create
>
> which will set the PCI BAR at 0xc8000000, the shared memory segment
> and the region pointed to by the BAR will be 16MB. On the host side
> the shm_open handle will be '/newmem', and the kvm tool will create
> the shared segment, set its size, and initialize it. If the size,
> handle, or create flag are absent, they will default to 16MB,
> handle=/kvm_shmem, and create will be false. The address family,
> 'pci:' is also optional as it is the only address family currently
> supported. Only a single --shmem is supported at this time.

Did you have a look at ivshmem? It does that today, but also gives you
an IRQ line so the guests can poke each other. For something as simple
as this, I don't see why we'd need two competing implementations.

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Aug 24, 2011 at 10:27:18PM -0500, Alexander Graf wrote:
>
> On 24.08.2011, at 17:25, David Evensky wrote:
>
> >
> > This patch adds a PCI device that provides PCI device memory to the
> > guest. This memory in the guest exists as a shared memory segment in
> > the host. This is similar to the memory sharing capability of Nahanni
> > (ivshmem) available in QEMU. In this case, the shared memory segment
> > is exposed as a PCI BAR only.
> >
> > A new command line argument is added as:
> > --shmem pci:0xc8000000:16MB:handle=/newmem:create
> >
> > which will set the PCI BAR at 0xc8000000, the shared memory segment
> > and the region pointed to by the BAR will be 16MB. On the host side
> > the shm_open handle will be '/newmem', and the kvm tool will create
> > the shared segment, set its size, and initialize it. If the size,
> > handle, or create flag are absent, they will default to 16MB,
> > handle=/kvm_shmem, and create will be false. The address family,
> > 'pci:' is also optional as it is the only address family currently
> > supported. Only a single --shmem is supported at this time.
>
> Did you have a look at ivshmem? It does that today, but also gives you
> an IRQ line so the guests can poke each other. For something as simple
> as this, I don't see why we'd need two competing implementations.

Isn't ivshmem in QEMU? If so, then I don't think there is any
competition. How do you feel that these are competing?

\dae

> Alex
On 24.08.2011, at 23:49, David Evensky wrote:

> On Wed, Aug 24, 2011 at 10:27:18PM -0500, Alexander Graf wrote:
>>
>> On 24.08.2011, at 17:25, David Evensky wrote:
>>
>>> This patch adds a PCI device that provides PCI device memory to the
>>> guest. This memory in the guest exists as a shared memory segment in
>>> the host. This is similar to the memory sharing capability of Nahanni
>>> (ivshmem) available in QEMU. In this case, the shared memory segment
>>> is exposed as a PCI BAR only.
>>>
>>> A new command line argument is added as:
>>> --shmem pci:0xc8000000:16MB:handle=/newmem:create
>>>
>>> which will set the PCI BAR at 0xc8000000, the shared memory segment
>>> and the region pointed to by the BAR will be 16MB. On the host side
>>> the shm_open handle will be '/newmem', and the kvm tool will create
>>> the shared segment, set its size, and initialize it. If the size,
>>> handle, or create flag are absent, they will default to 16MB,
>>> handle=/kvm_shmem, and create will be false. The address family,
>>> 'pci:' is also optional as it is the only address family currently
>>> supported. Only a single --shmem is supported at this time.
>>
>> Did you have a look at ivshmem? It does that today, but also gives you
>> an IRQ line so the guests can poke each other. For something as simple
>> as this, I don't see why we'd need two competing implementations.
>
> Isn't ivshmem in QEMU? If so, then I don't think there is any
> competition. How do you feel that these are competing?

Well, it means that inside the guest you will have two different
devices depending on whether you're using QEMU or kvm-tool. I don't see
the point in exposing different devices to the guest just because of
NIH. Why should a guest care which device emulation framework you're
using?

Alex
On Wed, 2011-08-24 at 21:49 -0700, David Evensky wrote:
> On Wed, Aug 24, 2011 at 10:27:18PM -0500, Alexander Graf wrote:
> >
> > On 24.08.2011, at 17:25, David Evensky wrote:
> >
> > > This patch adds a PCI device that provides PCI device memory to the
> > > guest. This memory in the guest exists as a shared memory segment in
> > > the host. This is similar to the memory sharing capability of Nahanni
> > > (ivshmem) available in QEMU. In this case, the shared memory segment
> > > is exposed as a PCI BAR only.
> > >
> > > A new command line argument is added as:
> > > --shmem pci:0xc8000000:16MB:handle=/newmem:create
> > >
> > > which will set the PCI BAR at 0xc8000000, the shared memory segment
> > > and the region pointed to by the BAR will be 16MB. On the host side
> > > the shm_open handle will be '/newmem', and the kvm tool will create
> > > the shared segment, set its size, and initialize it. If the size,
> > > handle, or create flag are absent, they will default to 16MB,
> > > handle=/kvm_shmem, and create will be false. The address family,
> > > 'pci:' is also optional as it is the only address family currently
> > > supported. Only a single --shmem is supported at this time.
> >
> > Did you have a look at ivshmem? It does that today, but also gives
> > you an IRQ line so the guests can poke each other. For something as
> > simple as this, I don't see why we'd need two competing
> > implementations.
>
> Isn't ivshmem in QEMU? If so, then I don't think there is any
> competition. How do you feel that these are competing?

It's obviously not competing. One thing you might want to consider is
making the guest interface compatible with ivshmem. Is there any reason
we shouldn't do that? I don't consider that a requirement, just nice to
have.

Pekka
On Wed, 2011-08-24 at 23:52 -0500, Alexander Graf wrote:
> > Isn't ivshmem in QEMU? If so, then I don't think there is any
> > competition. How do you feel that these are competing?
>
> Well, it means that inside the guest you will have two different
> devices depending on whether you're using QEMU or kvm-tool. I don't
> see the point in exposing different devices to the guest just because
> of NIH. Why should a guest care which device emulation framework
> you're using?

It's a pretty special-purpose device that requires user configuration,
so I don't consider QEMU compatibility to be mandatory. It'd be nice to
have but not something to bend over backwards for.

Pekka
On 8/25/11 8:22 AM, Alexander Graf wrote:
>
> On 25.08.2011, at 00:11, Pekka Enberg wrote:
>
>> On Wed, 2011-08-24 at 23:52 -0500, Alexander Graf wrote:
>>>> Isn't ivshmem in QEMU? If so, then I don't think there is any
>>>> competition. How do you feel that these are competing?
>>>
>>> Well, it means that inside the guest you will have two different
>>> devices depending on whether you're using QEMU or kvm-tool. I don't
>>> see the point in exposing different devices to the guest just
>>> because of NIH. Why should a guest care which device emulation
>>> framework you're using?
>>
>> It's a pretty special-purpose device that requires user configuration,
>> so I don't consider QEMU compatibility to be mandatory. It'd be nice
>> to have but not something to bend over backwards for.
>
> Well, the nice thing is that you would get the guest side for free:
>
> http://gitorious.org/nahanni/guest-code/blobs/master/kernel_module/uio/uio_ivshmem.c
>
> You also didn't invent your own virtio protocol, no? :)

No, because virtio drivers are in the Linux kernel proper. Is ivshmem
in the kernel tree or planned to be merged at some point?

Pekka
On 25.08.2011, at 00:37, Pekka Enberg wrote:

> On 8/25/11 8:22 AM, Alexander Graf wrote:
>>
>> On 25.08.2011, at 00:11, Pekka Enberg wrote:
>>
>>> On Wed, 2011-08-24 at 23:52 -0500, Alexander Graf wrote:
>>>>> Isn't ivshmem in QEMU? If so, then I don't think there is any
>>>>> competition. How do you feel that these are competing?
>>>>
>>>> Well, it means that inside the guest you will have two different
>>>> devices depending on whether you're using QEMU or kvm-tool. I don't
>>>> see the point in exposing different devices to the guest just
>>>> because of NIH. Why should a guest care which device emulation
>>>> framework you're using?
>>>
>>> It's a pretty special-purpose device that requires user
>>> configuration, so I don't consider QEMU compatibility to be
>>> mandatory. It'd be nice to have but not something to bend over
>>> backwards for.
>>
>> Well, the nice thing is that you would get the guest side for free:
>>
>> http://gitorious.org/nahanni/guest-code/blobs/master/kernel_module/uio/uio_ivshmem.c
>>
>> You also didn't invent your own virtio protocol, no? :)
>
> No, because virtio drivers are in the Linux kernel proper. Is ivshmem
> in the kernel tree or planned to be merged at some point?

*shrug* Let's ask Cam.

Alex
On 08/25/2011 01:25 AM, David Evensky wrote:
>  #define PCI_DEVICE_ID_VIRTIO_BLN	0x1005
>  #define PCI_DEVICE_ID_VIRTIO_P9	0x1009
>  #define PCI_DEVICE_ID_VESA		0x2000
> +#define PCI_DEVICE_ID_PCI_SHMEM	0x0001
>
>  #define PCI_VENDOR_ID_REDHAT_QUMRANET		0x1af4
> +#define PCI_VENDOR_ID_PCI_SHMEM		0x0001
>  #define PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET	0x1af4

Please use a real-life vendor ID from http://www.pcidatabase.com. If
you're following an existing spec, you should pick the vendor ID
matching the device you're emulating. If not, as seems to be the case
here, you need your own, or permission from an existing owner of a
vendor ID.
On Thu, Aug 25, 2011 at 08:06:34AM +0300, Pekka Enberg wrote:
> On Wed, 2011-08-24 at 21:49 -0700, David Evensky wrote:
> > On Wed, Aug 24, 2011 at 10:27:18PM -0500, Alexander Graf wrote:
> > >
> > > On 24.08.2011, at 17:25, David Evensky wrote:
> > >
> > > > This patch adds a PCI device that provides PCI device memory to
> > > > the guest. This memory in the guest exists as a shared memory
> > > > segment in the host. This is similar to the memory sharing
> > > > capability of Nahanni (ivshmem) available in QEMU. In this case,
> > > > the shared memory segment is exposed as a PCI BAR only.
> > > >
> > > > A new command line argument is added as:
> > > > --shmem pci:0xc8000000:16MB:handle=/newmem:create
> > > >
> > > > which will set the PCI BAR at 0xc8000000, the shared memory
> > > > segment and the region pointed to by the BAR will be 16MB. On the
> > > > host side the shm_open handle will be '/newmem', and the kvm tool
> > > > will create the shared segment, set its size, and initialize it.
> > > > If the size, handle, or create flag are absent, they will default
> > > > to 16MB, handle=/kvm_shmem, and create will be false. The address
> > > > family, 'pci:' is also optional as it is the only address family
> > > > currently supported. Only a single --shmem is supported at this
> > > > time.
> > >
> > > Did you have a look at ivshmem? It does that today, but also gives
> > > you an IRQ line so the guests can poke each other. For something as
> > > simple as this, I don't see why we'd need two competing
> > > implementations.
> >
> > Isn't ivshmem in QEMU? If so, then I don't think there is any
> > competition. How do you feel that these are competing?
>
> It's obviously not competing. One thing you might want to consider is
> making the guest interface compatible with ivshmem. Is there any
> reason we shouldn't do that? I don't consider that a requirement, just
> nice to have.

I think it depends on what the goal is. For us, just having a hunk of
memory shared between the host and guests that the guests can ioremap
provides a lot. Having the rest of ivshmem's guest interface I don't
think would impact our use above, but I haven't tested things with QEMU
to verify that.

\dae

> Pekka
I don't know if there is a PCI card that only provides a region of
memory. I'm not really trying to provide emulation for a known piece of
hardware, so I picked values that weren't being used since there didn't
appear to be an 'unknown'. I'll ask around.

\dae

On Thu, Aug 25, 2011 at 08:41:43AM +0300, Avi Kivity wrote:
> On 08/25/2011 01:25 AM, David Evensky wrote:
> >  #define PCI_DEVICE_ID_VIRTIO_BLN	0x1005
> >  #define PCI_DEVICE_ID_VIRTIO_P9	0x1009
> >  #define PCI_DEVICE_ID_VESA		0x2000
> > +#define PCI_DEVICE_ID_PCI_SHMEM	0x0001
> >
> >  #define PCI_VENDOR_ID_REDHAT_QUMRANET		0x1af4
> > +#define PCI_VENDOR_ID_PCI_SHMEM		0x0001
> >  #define PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET	0x1af4
>
> Please use a real-life vendor ID from http://www.pcidatabase.com.
> If you're following an existing spec, you should pick the vendor ID
> matching the device you're emulating. If not, as seems to be the
> case here, you need your own, or permission from an existing owner
> of a vendor ID.
>
> --
> I have a truly marvellous patch that fixes the bug which this
> signature is too narrow to contain.
On Thu, Aug 25, 2011 at 1:25 AM, David Evensky <evensky@sandia.gov> wrote:
> +	if (*next == '\0')
> +		p = next;
> +	else
> +		p = next + 1;
> +	/* parse out size */
> +	base = 10;
> +	if (strcasestr(p, "0x"))
> +		base = 16;
> +	size = strtoll(p, &next, base);
> +	if (next == p && size == 0) {
> +		pr_info("shmem: no size specified, using default.");
> +		size = default_size;
> +	}
> +	/* look for [KMGkmg][Bb]*  uses base 2. */
> +	int skip_B = 0;
> +	if (strspn(next, "KMGkmg")) {	/* might have a prefix */
> +		if (*(next + 1) == 'B' || *(next + 1) == 'b')
> +			skip_B = 1;
> +		switch (*next) {
> +		case 'K':
> +		case 'k':
> +			size = size << KB_SHIFT;
> +			break;
> +		case 'M':
> +		case 'm':
> +			size = size << MB_SHIFT;
> +			break;
> +		case 'G':
> +		case 'g':
> +			size = size << GB_SHIFT;
> +			break;
> +		default:
> +			die("shmem: bug in detecting size prefix.");
> +			break;
> +		}

There's some nice code in perf to parse sizes like this. We could just
steal that.

> +inline void fill_mem(void *buf, size_t buf_size, char *fill, size_t fill_len)
> +{
> +	size_t i;
> +
> +	if (fill_len == 1) {
> +		memset(buf, fill[0], buf_size);
> +	} else {
> +		if (buf_size > fill_len) {
> +			for (i = 0; i < buf_size - fill_len; i += fill_len)
> +				memcpy(((char *)buf) + i, fill, fill_len);
> +			memcpy(buf + i, fill, buf_size - i);
> +		} else {
> +			memcpy(buf, fill, buf_size);
> +		}
> +	}
> +}

Can we do a memset_pattern4() type of interface instead? I think it's
mostly pointless to try to support arbitrary-length 'fill'.
On Thu, Aug 25, 2011 at 09:02:56AM +0300, Pekka Enberg wrote:
> On Thu, Aug 25, 2011 at 1:25 AM, David Evensky <evensky@sandia.gov> wrote:
> > +	if (*next == '\0')
> > +		p = next;
> > +	else
> > +		p = next + 1;
> > +	/* parse out size */
> > +	base = 10;
> > +	if (strcasestr(p, "0x"))
> > +		base = 16;
> > +	size = strtoll(p, &next, base);
> > +	if (next == p && size == 0) {
> > +		pr_info("shmem: no size specified, using default.");
> > +		size = default_size;
> > +	}
> > +	/* look for [KMGkmg][Bb]*  uses base 2. */
> > +	int skip_B = 0;
> > +	if (strspn(next, "KMGkmg")) {	/* might have a prefix */
> > +		if (*(next + 1) == 'B' || *(next + 1) == 'b')
> > +			skip_B = 1;
> > +		switch (*next) {
> > +		case 'K':
> > +		case 'k':
> > +			size = size << KB_SHIFT;
> > +			break;
> > +		case 'M':
> > +		case 'm':
> > +			size = size << MB_SHIFT;
> > +			break;
> > +		case 'G':
> > +		case 'g':
> > +			size = size << GB_SHIFT;
> > +			break;
> > +		default:
> > +			die("shmem: bug in detecting size prefix.");
> > +			break;
> > +		}
>
> There's some nice code in perf to parse sizes like this. We could just
> steal that.

That sounds good to me.

> > +inline void fill_mem(void *buf, size_t buf_size, char *fill, size_t fill_len)
> > +{
> > +	size_t i;
> > +
> > +	if (fill_len == 1) {
> > +		memset(buf, fill[0], buf_size);
> > +	} else {
> > +		if (buf_size > fill_len) {
> > +			for (i = 0; i < buf_size - fill_len; i += fill_len)
> > +				memcpy(((char *)buf) + i, fill, fill_len);
> > +			memcpy(buf + i, fill, buf_size - i);
> > +		} else {
> > +			memcpy(buf, fill, buf_size);
> > +		}
> > +	}
> > +}
>
> Can we do a memset_pattern4() type of interface instead? I think it's
> mostly pointless to try to support arbitrary-length 'fill'.

Yeah, I can see how the arbitrary fill thing might be too cute. It
certainly isn't necessary.
On Thu, Aug 25, 2011 at 1:54 PM, Pekka Enberg <penberg@kernel.org> wrote:
> On 8/25/11 8:34 AM, Asias He wrote:
>
> Hi, David
>
> On Thu, Aug 25, 2011 at 6:25 AM, David Evensky <evensky@sandia.gov> wrote:
>>
>> This patch adds a PCI device that provides PCI device memory to the
>> guest. This memory in the guest exists as a shared memory segment in
>> the host. This is similar to the memory sharing capability of Nahanni
>> (ivshmem) available in QEMU. In this case, the shared memory segment
>> is exposed as a PCI BAR only.
>>
>> A new command line argument is added as:
>> --shmem pci:0xc8000000:16MB:handle=/newmem:create
>>
>> which will set the PCI BAR at 0xc8000000, the shared memory segment
>> and the region pointed to by the BAR will be 16MB. On the host side
>> the shm_open handle will be '/newmem', and the kvm tool will create
>> the shared segment, set its size, and initialize it. If the size,
>> handle, or create flag are absent, they will default to 16MB,
>> handle=/kvm_shmem, and create will be false.
>
> I think it's better to use a default BAR address if user does not
> specify one as well. This way,
>
> ./kvm --shmem
>
> will work with default values with zero configuration.
>
> Does that sort of thing make sense here? It's a special purpose device
> and the guest is expected to ioremap() the memory so it needs to
> know the BAR.

I mean a default bar address for the --shmem device. Yes, the guest
needs to know this address, but even if we specify the address at the
command line the guest still does not know this address, no? So having
a default bar address does no harm.

>> The address family,
>> 'pci:' is also optional as it is the only address family currently
>> supported. Only a single --shmem is supported at this time.
>
> So, let's drop the 'pci:' prefix.
>
> That means the user interface will change if someone adds new address
> families. So we should keep the prefix, no?

We can have a more flexible option format which does not depend on the
order of args, e.g.:

--shmem bar=0xc8000000,size=16MB,handle=/newmem,ops=create,type=pci

If the user does not specify sub-args, just use the defaults.

--
Asias He
On 8/25/11 9:30 AM, Asias He wrote:
> On Thu, Aug 25, 2011 at 1:54 PM, Pekka Enberg <penberg@kernel.org> wrote:
>> On 8/25/11 8:34 AM, Asias He wrote:
>>
>> Hi, David
>>
>> On Thu, Aug 25, 2011 at 6:25 AM, David Evensky <evensky@sandia.gov> wrote:
>>>
>>> This patch adds a PCI device that provides PCI device memory to the
>>> guest. This memory in the guest exists as a shared memory segment in
>>> the host. This is similar to the memory sharing capability of Nahanni
>>> (ivshmem) available in QEMU. In this case, the shared memory segment
>>> is exposed as a PCI BAR only.
>>>
>>> A new command line argument is added as:
>>> --shmem pci:0xc8000000:16MB:handle=/newmem:create
>>>
>>> which will set the PCI BAR at 0xc8000000, the shared memory segment
>>> and the region pointed to by the BAR will be 16MB. On the host side
>>> the shm_open handle will be '/newmem', and the kvm tool will create
>>> the shared segment, set its size, and initialize it. If the size,
>>> handle, or create flag are absent, they will default to 16MB,
>>> handle=/kvm_shmem, and create will be false.
>>
>> I think it's better to use a default BAR address if user does not
>> specify one as well. This way,
>>
>> ./kvm --shmem
>>
>> will work with default values with zero configuration.
>>
>> Does that sort of thing make sense here? It's a special purpose device
>> and the guest is expected to ioremap() the memory so it needs to
>> know the BAR.
>
> I mean a default bar address for the --shmem device. Yes, the guest
> needs to know this address, but even if we specify the address at the
> command line the guest still does not know this address, no? So having
> a default bar address does no harm.

How does the user discover what the default BAR is? Which default BAR
should we use? I don't think a default BAR adds much value here.

>>> The address family,
>>> 'pci:' is also optional as it is the only address family currently
>>> supported. Only a single --shmem is supported at this time.
>>
>> So, let's drop the 'pci:' prefix.
>>
>> That means the user interface will change if someone adds new address
>> families. So we should keep the prefix, no?
>
> We can have a more flexible option format which does not depend on the
> order of args, e.g.:
>
> --shmem bar=0xc8000000,size=16MB,handle=/newmem,ops=create,type=pci
>
> If the user does not specify sub-args, just use the defaults.

Sure, makes sense.

Pekka
On Thu, Aug 25, 2011 at 3:02 PM, Pekka Enberg <penberg@kernel.org> wrote:
> On 8/25/11 9:30 AM, Asias He wrote:
>> On Thu, Aug 25, 2011 at 1:54 PM, Pekka Enberg <penberg@kernel.org> wrote:
>>> On 8/25/11 8:34 AM, Asias He wrote:
>>>
>>> Hi, David
>>>
>>> On Thu, Aug 25, 2011 at 6:25 AM, David Evensky <evensky@sandia.gov> wrote:
>>>> This patch adds a PCI device that provides PCI device memory to the
>>>> guest. This memory in the guest exists as a shared memory segment in
>>>> the host. This is similar to the memory sharing capability of Nahanni
>>>> (ivshmem) available in QEMU. In this case, the shared memory segment
>>>> is exposed as a PCI BAR only.
>>>>
>>>> A new command line argument is added as:
>>>> --shmem pci:0xc8000000:16MB:handle=/newmem:create
>>>>
>>>> which will set the PCI BAR at 0xc8000000, the shared memory segment
>>>> and the region pointed to by the BAR will be 16MB. On the host side
>>>> the shm_open handle will be '/newmem', and the kvm tool will create
>>>> the shared segment, set its size, and initialize it. If the size,
>>>> handle, or create flag are absent, they will default to 16MB,
>>>> handle=/kvm_shmem, and create will be false.
>>>
>>> I think it's better to use a default BAR address if user does not
>>> specify one as well. This way,
>>>
>>> ./kvm --shmem
>>>
>>> will work with default values with zero configuration.
>>>
>>> Does that sort of thing make sense here? It's a special purpose device
>>> and the guest is expected to ioremap() the memory so it needs to
>>> know the BAR.
>>
>> I mean a default bar address for the --shmem device. Yes, the guest
>> needs to know this address, but even if we specify the address at the
>> command line the guest still does not know this address, no? So having
>> a default bar address does no harm.
>
> How does the user discover what the default BAR is? Which default BAR
> should we use? I don't think a default BAR adds much value here.

1. Print it on startup, like we do for --name:

   # kvm run -k ./bzImage -m 448 -c 4 --name guest-26676 --shmem bar=0xc8000000

or

2. kvm stat --shmem

David has chosen a default BAR already:

   #define SHMEM_DEFAULT_ADDR	(0xc8000000)

>>>> The address family,
>>>> 'pci:' is also optional as it is the only address family currently
>>>> supported. Only a single --shmem is supported at this time.
>>>
>>> So, let's drop the 'pci:' prefix.
>>>
>>> That means the user interface will change if someone adds new address
>>> families. So we should keep the prefix, no?
>>
>> We can have a more flexible option format which does not depend on the
>> order of args, e.g.:
>>
>> --shmem bar=0xc8000000,size=16MB,handle=/newmem,ops=create,type=pci
>>
>> If the user does not specify sub-args, just use the defaults.
>
> Sure, makes sense.
>
> Pekka
On 8/25/11 10:20 AM, Asias He wrote:
> On Thu, Aug 25, 2011 at 3:02 PM, Pekka Enberg <penberg@kernel.org> wrote:
>> On 8/25/11 9:30 AM, Asias He wrote:
>>> On Thu, Aug 25, 2011 at 1:54 PM, Pekka Enberg <penberg@kernel.org> wrote:
>>>> On 8/25/11 8:34 AM, Asias He wrote:
>>>>
>>>> Hi, David
>>>>
>>>> On Thu, Aug 25, 2011 at 6:25 AM, David Evensky <evensky@sandia.gov> wrote:
>>>>> This patch adds a PCI device that provides PCI device memory to the
>>>>> guest. This memory in the guest exists as a shared memory segment in
>>>>> the host. This is similar to the memory sharing capability of Nahanni
>>>>> (ivshmem) available in QEMU. In this case, the shared memory segment
>>>>> is exposed as a PCI BAR only.
>>>>>
>>>>> A new command line argument is added as:
>>>>> --shmem pci:0xc8000000:16MB:handle=/newmem:create
>>>>>
>>>>> which will set the PCI BAR at 0xc8000000, the shared memory segment
>>>>> and the region pointed to by the BAR will be 16MB. On the host side
>>>>> the shm_open handle will be '/newmem', and the kvm tool will create
>>>>> the shared segment, set its size, and initialize it. If the size,
>>>>> handle, or create flag are absent, they will default to 16MB,
>>>>> handle=/kvm_shmem, and create will be false.
>>>>
>>>> I think it's better to use a default BAR address if user does not
>>>> specify one as well. This way,
>>>>
>>>> ./kvm --shmem
>>>>
>>>> will work with default values with zero configuration.
>>>>
>>>> Does that sort of thing make sense here? It's a special purpose device
>>>> and the guest is expected to ioremap() the memory so it needs to
>>>> know the BAR.
>>>
>>> I mean a default bar address for the --shmem device. Yes, the guest
>>> needs to know this address, but even if we specify the address at the
>>> command line the guest still does not know this address, no? So having
>>> a default bar address does no harm.
>>
>> How does the user discover what the default BAR is? Which default BAR
>> should we use? I don't think a default BAR adds much value here.
>
> 1. Print it on startup, like we do for --name:
> # kvm run -k ./bzImage -m 448 -c 4 --name guest-26676 --shmem bar=0xc8000000
>
> or
>
> 2. kvm stat --shmem
>
> David has chosen a default BAR already:
> #define SHMEM_DEFAULT_ADDR	(0xc8000000)

OK. Makes sense.

Pekka
On Thu, Aug 25, 2011 at 6:06 AM, Pekka Enberg <penberg@kernel.org> wrote:
> On Wed, 2011-08-24 at 21:49 -0700, David Evensky wrote:
>> On Wed, Aug 24, 2011 at 10:27:18PM -0500, Alexander Graf wrote:
>>>
>>> On 24.08.2011, at 17:25, David Evensky wrote:
>>>
>>>> This patch adds a PCI device that provides PCI device memory to the
>>>> guest. This memory in the guest exists as a shared memory segment in
>>>> the host. This is similar to the memory sharing capability of Nahanni
>>>> (ivshmem) available in QEMU. In this case, the shared memory segment
>>>> is exposed as a PCI BAR only.
>>>>
>>>> A new command line argument is added as:
>>>> --shmem pci:0xc8000000:16MB:handle=/newmem:create
>>>>
>>>> which will set the PCI BAR at 0xc8000000, the shared memory segment
>>>> and the region pointed to by the BAR will be 16MB. On the host side
>>>> the shm_open handle will be '/newmem', and the kvm tool will create
>>>> the shared segment, set its size, and initialize it. If the size,
>>>> handle, or create flag are absent, they will default to 16MB,
>>>> handle=/kvm_shmem, and create will be false. The address family,
>>>> 'pci:' is also optional as it is the only address family currently
>>>> supported. Only a single --shmem is supported at this time.
>>>
>>> Did you have a look at ivshmem? It does that today, but also gives
>>> you an IRQ line so the guests can poke each other. For something as
>>> simple as this, I don't see why we'd need two competing
>>> implementations.
>>
>> Isn't ivshmem in QEMU? If so, then I don't think there is any
>> competition. How do you feel that these are competing?
>
> It's obviously not competing. One thing you might want to consider is
> making the guest interface compatible with ivshmem. Is there any reason
> we shouldn't do that? I don't consider that a requirement, just nice to
> have.

The point of implementing the same interface as ivshmem is that users
don't need to rejig guests or applications in order to switch between
hypervisors. A different interface also prevents same-to-same
benchmarks.

There is little benefit to creating another virtual device interface
when a perfectly good one already exists. The question should be: how
is this shmem device different and better than ivshmem? If there is
no justification then implement the ivshmem interface.

Stefan
Hi Stefan, On Thu, Aug 25, 2011 at 1:31 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote: >> It's obviously not competing. One thing you might want to consider is >> making the guest interface compatible with ivshmem. Is there any reason >> we shouldn't do that? I don't consider that a requirement, just nice to >> have. > > The point of implementing the same interface as ivshmem is that users > don't need to rejig guests or applications in order to switch between > hypervisors. A different interface also prevents same-to-same > benchmarks. > > There is little benefit to creating another virtual device interface > when a perfectly good one already exists. The question should be: how > is this shmem device different and better than ivshmem? If there is > no justification then implement the ivshmem interface. So which interface are we actually talking about? Userspace/kernel in the guest or hypervisor/guest kernel? Either way, while it would be nice to share the interface, it's not a *requirement* for tools/kvm unless ivshmem is specified in the virtio spec or the driver is in mainline Linux. We don't intend to require people to implement non-standard and non-Linux QEMU interfaces. OTOH, ivshmem would make the PCI ID problem go away. David, Sasha, thoughts? Pekka
On Thu, Aug 25, 2011 at 11:37 AM, Pekka Enberg <penberg@kernel.org> wrote: > Hi Stefan, > > On Thu, Aug 25, 2011 at 1:31 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote: >>> It's obviously not competing. One thing you might want to consider is >>> making the guest interface compatible with ivshmem. Is there any reason >>> we shouldn't do that? I don't consider that a requirement, just nice to >>> have. >> >> The point of implementing the same interface as ivshmem is that users >> don't need to rejig guests or applications in order to switch between >> hypervisors. A different interface also prevents same-to-same >> benchmarks. >> >> There is little benefit to creating another virtual device interface >> when a perfectly good one already exists. The question should be: how >> is this shmem device different and better than ivshmem? If there is >> no justification then implement the ivshmem interface. > > So which interface are we actually taking about? Userspace/kernel in the > guest or hypervisor/guest kernel? The hardware interface. Same PCI BAR layout and semantics. > Either way, while it would be nice to share the interface but it's not a > *requirement* for tools/kvm unless ivshmem is specified in the virtio > spec or the driver is in mainline Linux. We don't intend to require people > to implement non-standard and non-Linux QEMU interfaces. OTOH, > ivshmem would make the PCI ID problem go away. Introducing yet another non-standard and non-Linux interface doesn't help though. If there is no significant improvement over ivshmem then it makes sense to let ivshmem gain critical mass and more users instead of fragmenting the space. Stefan
On Thu, Aug 25, 2011 at 1:59 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote: > Introducing yet another non-standard and non-Linux interface doesn't > help though. If there is no significant improvement over ivshmem then > it makes sense to let ivshmem gain critical mass and more users > instead of fragmenting the space. Look, I'm not going to require QEMU compatibility from tools/kvm contributors. If you guys really feel that strongly about the interface, then either - Get Rusty's "virtio spec pixie pee" for ivshmem - Get the Linux driver merged to linux-next - Help out David and Sasha to change interface But don't ask me to block clean code from inclusion to tools/kvm because it doesn't have a QEMU-capable interface. Pekka
On Thu, 2011-08-25 at 11:59 +0100, Stefan Hajnoczi wrote: > On Thu, Aug 25, 2011 at 11:37 AM, Pekka Enberg <penberg@kernel.org> wrote: > > Hi Stefan, > > > > On Thu, Aug 25, 2011 at 1:31 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote: > >>> It's obviously not competing. One thing you might want to consider is > >>> making the guest interface compatible with ivshmem. Is there any reason > >>> we shouldn't do that? I don't consider that a requirement, just nice to > >>> have. > >> > >> The point of implementing the same interface as ivshmem is that users > >> don't need to rejig guests or applications in order to switch between > >> hypervisors. A different interface also prevents same-to-same > >> benchmarks. > >> > >> There is little benefit to creating another virtual device interface > >> when a perfectly good one already exists. The question should be: how > >> is this shmem device different and better than ivshmem? If there is > >> no justification then implement the ivshmem interface. > > > > So which interface are we actually taking about? Userspace/kernel in the > > guest or hypervisor/guest kernel? > > The hardware interface. Same PCI BAR layout and semantics. > > > Either way, while it would be nice to share the interface but it's not a > > *requirement* for tools/kvm unless ivshmem is specified in the virtio > > spec or the driver is in mainline Linux. We don't intend to require people > > to implement non-standard and non-Linux QEMU interfaces. OTOH, > > ivshmem would make the PCI ID problem go away. > > Introducing yet another non-standard and non-Linux interface doesn't > help though. If there is no significant improvement over ivshmem then > it makes sense to let ivshmem gain critical mass and more users > instead of fragmenting the space. 
I support doing it ivshmem-compatible, though it doesn't have to be a requirement right now (that is, use this patch as a base and build it towards ivshmem - which shouldn't be an issue since this patch provides the PCI+SHM parts which are required by ivshmem anyway). ivshmem is a good, documented, stable interface backed by a lot of research and testing. Looking at the spec it's obvious that Cam had KVM in mind when designing it, and that's exactly what we want to have in the KVM tool. David, did you have any plans to extend it to become ivshmem-compatible? If not, would turning it into such break any code that depends on it horribly?
On 08/25/2011 02:15 PM, Pekka Enberg wrote: > On Thu, Aug 25, 2011 at 1:59 PM, Stefan Hajnoczi<stefanha@gmail.com> wrote: > > Introducing yet another non-standard and non-Linux interface doesn't > > help though. If there is no significant improvement over ivshmem then > > it makes sense to let ivshmem gain critical mass and more users > > instead of fragmenting the space. > > Look, I'm not going to require QEMU compatibility from tools/kvm > contributors. If you guys really feel that strongly about the > interface, then either > > - Get Rusty's "virtio spec pixie pee" for ivshmem It's not a virtio device (doesn't do dma). It does have a spec in qemu.git/docs/specs. > - Get the Linux driver merged to linux-next ivshmem uses uio, so it doesn't need an in-kernel driver, IIRC. Map your BAR from sysfs and go. > - Help out David and Sasha to change interface > > But don't ask me to block clean code from inclusion to tools/kvm > because it doesn't have a QEMU-capable interface. A lot of thought has gone into the design and implementation of ivshmem. But don't let that stop you from merging clean code.
On Thu, Aug 25, 2011 at 2:30 PM, Avi Kivity <avi@redhat.com> wrote: > On 08/25/2011 02:15 PM, Pekka Enberg wrote: >> >> On Thu, Aug 25, 2011 at 1:59 PM, Stefan Hajnoczi<stefanha@gmail.com> >> wrote: >> > Introducing yet another non-standard and non-Linux interface doesn't >> > help though. If there is no significant improvement over ivshmem then >> > it makes sense to let ivshmem gain critical mass and more users >> > instead of fragmenting the space. >> >> Look, I'm not going to require QEMU compatibility from tools/kvm >> contributors. If you guys really feel that strongly about the >> interface, >> then either >> >> - Get Rusty's "virtio spec pixie pee" for ivshmem > > It's not a virtio device (doesn't do dma). It does have a spec in > qemu.git/docs/specs. > >> - Get the Linux driver merged to linux-next > > ivshmem uses uio, so it doesn't need an in-kernel driver, IIRC. Map your > BAR from sysfs and go. Right. On Thu, Aug 25, 2011 at 2:30 PM, Avi Kivity <avi@redhat.com> wrote: >> - Help out David and Sasha to change interface >> >> But don't ask me to block clean code from inclusion to tools/kvm >> because it doesn't have a QEMU-capable interface. > > A lot of thought has gone into the design and implementation of ivshmem. > But don't let that stop you from merging clean code. Thanks, I won't. If you or other KVM folks want to have a say what goes into tools/kvm, I'm happy to send you a pull request against kvm.git. Anyway, Sasha thinks ivshmem is the way to go and that's good enough for me. Pekka
On Thu, 2011-08-25 at 14:30 +0300, Avi Kivity wrote: > On 08/25/2011 02:15 PM, Pekka Enberg wrote: > > On Thu, Aug 25, 2011 at 1:59 PM, Stefan Hajnoczi<stefanha@gmail.com> wrote: > > > Introducing yet another non-standard and non-Linux interface doesn't > > > help though. If there is no significant improvement over ivshmem then > > > it makes sense to let ivshmem gain critical mass and more users > > > instead of fragmenting the space. > > > > Look, I'm not going to require QEMU compatibility from tools/kvm > > contributors. If you guys really feel that strongly about the > > interface, then either > > > > - Get Rusty's "virtio spec pixie pee" for ivshmem > > It's not a virtio device (doesn't do dma). It does have a spec in > qemu.git/docs/specs. Please note that the spec you have in /docs/specs is different from what Cam has in his git tree (https://gitorious.org/nahanni/guest-code/blobs/master/device_spec.txt ). If we are going to add it to KVM tool maybe it's a good time to move it out of the QEMU tree and make it less QEMU specific? > > > - Get the Linux driver merged to linux-next > > ivshmem uses uio, so it doesn't need an in-kernel driver, IIRC. Map > your BAR from sysfs and go. > > > - Help out David and Sasha to change interface > > > > But don't ask me to block clean code from inclusion to tools/kvm > > because it doesn't have a QEMU-capable interface. > > A lot of thought has gone into the design and implementation of > ivshmem. But don't let that stop you from merging clean code. There's a big difference between requiring it to be ivshmem compatible because ivshmem is good and requiring it to be ivshmem compatible because that's what QEMU is doing. Looking at the comments in this thread I would have expected to see many more comments regarding the technical supremacy of ivshmem over a simple shared memory block instead of the argument that KVM tool has to conform to QEMU standards.
On 08/25/2011 02:38 PM, Pekka Enberg wrote: > If you or other KVM folks want to have a say what goes into tools/kvm, > I'm happy to send you a pull request against kvm.git. Thanks, but I have my hands full already. I'll stop offering unwanted advice as well. > Anyway, Sasha thinks ivshmem is the way to go and that's good enough for me. > Great.
On 08/25/2011 02:38 PM, Pekka Enberg wrote: >> If you or other KVM folks want to have a say what goes into tools/kvm, >> I'm happy to send you a pull request against kvm.git. On Thu, Aug 25, 2011 at 2:51 PM, Avi Kivity <avi@redhat.com> wrote: > Thanks, but I have my hands full already. I'll stop offering unwanted > advice as well. Your advice has never been unwanted nor do I imagine it to ever be. Pekka
Adding in the rest of what ivshmem does shouldn't affect our use, *I think*. I hadn't intended this to do everything that ivshmem does, but I can see how that would be useful. It would be cool if it could grow into that. Our requirements for the driver in kvm tool are that another program on the host can create a shared segment (anonymous, non-file backed) with a specified handle, size, and contents; that this segment is available to the guest at boot time at a specified address; and that no driver will change the contents of the memory except under direct user action. Also, when the guest goes away the shared memory segment shouldn't be affected (e.g. contents changed). Finally, we cannot change the lightweight nature of kvm tool. This is the feature of ivshmem that I need to check today. I did some testing a month ago, but it wasn't detailed enough to check this out. \dae
I've tested ivshmem with the latest git pull (had minor trouble building on debian sid, vnc and an unused var, but trivial to work around). QEMU's -device ivshmem,size=16,shm=/kvm_shmem seems to function as my proposed --shmem pci:0xfd000000:16M:handle=/kvm_shmem except that I can't specify the BAR. I am able to read what I'm given, 0xfd000000, from lspci -vvv; but for our application we need to be able to specify the address on the command line. If folks are open, I would like to request this feature in ivshmem. It would be cool to test our application with QEMU, even if we can't use it in production. I didn't check the case where QEMU must create the shared segment from scratch, etc., so I didn't test what differences there are with my proposed 'create' flag, but I did look at the ivshmem source and it looks like it does the right thing. (Makes me want to steal code to make mine better :-)) \dae On Thu, Aug 25, 2011 at 08:08:06AM -0700, David Evensky wrote: > Adding in the rest of what ivshmem does shouldn't affect our use, *I > think*. I hadn't intended this to do everything that ivshmem does, > but I can see how that would be useful. It would be cool if it could > grow into that. [...]
On 08/26/2011 12:00 AM, David Evensky wrote: > I've tested ivshmem with the latest git pull (had minor trouble > building on debian sid, vnc and unused var, but trivial to work > around). > > QEMU's -device ivshmem,size=16,shm=/kvm_shmem > > seems to function as my proposed > > --shmem pci:0xfd000000:16M:handle=/kvm_shmem > > except that I can't specify the BAR. I am able to read what > I'm given, 0xfd000000, from lspci -vvv; but for our application > we need to be able to specify the address on the command line. > > If folks are open, I would like to request this feature in the > ivshmem. It's not really possible. Qemu does not lay out the BARs, the guest does (specifically the bios). You might be able to re-arrange the layout after the guest boots. Why do you need the BAR at a specific physical address? > It would be cool to test our application with QEMU, > even if we can't use it in production. Why can't you use qemu in production?
On 08/24/2011 05:25 PM, David Evensky wrote: > > > This patch adds a PCI device that provides PCI device memory to the > guest. This memory in the guest exists as a shared memory segment in > the host. This is similar memory sharing capability of Nahanni > (ivshmem) available in QEMU. In this case, the shared memory segment > is exposed as a PCI BAR only. > > A new command line argument is added as: > --shmem pci:0xc8000000:16MB:handle=/newmem:create > > diff -uprN -X linux-kvm/Documentation/dontdiff linux-kvm/tools/kvm/include/kvm/pci-shmem.h linux-kvm_pci_shmem/tools/kvm/include/kvm/pci-shmem.h > --- linux-kvm/tools/kvm/include/kvm/pci-shmem.h 1969-12-31 16:00:00.000000000 -0800 > +++ linux-kvm_pci_shmem/tools/kvm/include/kvm/pci-shmem.h 2011-08-13 15:43:01.067953711 -0700 > @@ -0,0 +1,13 @@ > +#ifndef KVM__PCI_SHMEM_H > +#define KVM__PCI_SHMEM_H > + > +#include<linux/types.h> > +#include<linux/list.h> > + > +struct kvm; > +struct shmem_info; > + > +int pci_shmem__init(struct kvm *self); > +int pci_shmem__register_mem(struct shmem_info *si); > + > +#endif > diff -uprN -X linux-kvm/Documentation/dontdiff linux-kvm/tools/kvm/include/kvm/virtio-pci-dev.h linux-kvm_pci_shmem/tools/kvm/include/kvm/virtio-pci-dev.h > --- linux-kvm/tools/kvm/include/kvm/virtio-pci-dev.h 2011-08-09 15:38:48.760120973 -0700 > +++ linux-kvm_pci_shmem/tools/kvm/include/kvm/virtio-pci-dev.h 2011-08-18 10:06:12.171539230 -0700 > @@ -15,10 +15,13 @@ > #define PCI_DEVICE_ID_VIRTIO_BLN 0x1005 > #define PCI_DEVICE_ID_VIRTIO_P9 0x1009 > #define PCI_DEVICE_ID_VESA 0x2000 > +#define PCI_DEVICE_ID_PCI_SHMEM 0x0001 > > #define PCI_VENDOR_ID_REDHAT_QUMRANET 0x1af4 > +#define PCI_VENDOR_ID_PCI_SHMEM 0x0001 > #define PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET 0x1af4 FYI, that's not a valid vendor and device ID. Perhaps the RH folks would be willing to reserve a portion of the device ID space in their vendor ID for ya'll to play around with. 
Regards, Anthony Liguori
On Thu, Aug 25, 2011 at 04:35:29PM -0500, Anthony Liguori wrote: > .... > >--- linux-kvm/tools/kvm/include/kvm/virtio-pci-dev.h 2011-08-09 15:38:48.760120973 -0700 > >+++ linux-kvm_pci_shmem/tools/kvm/include/kvm/virtio-pci-dev.h 2011-08-18 10:06:12.171539230 -0700 > >@@ -15,10 +15,13 @@ > > #define PCI_DEVICE_ID_VIRTIO_BLN 0x1005 > > #define PCI_DEVICE_ID_VIRTIO_P9 0x1009 > > #define PCI_DEVICE_ID_VESA 0x2000 > >+#define PCI_DEVICE_ID_PCI_SHMEM 0x0001 > > > > #define PCI_VENDOR_ID_REDHAT_QUMRANET 0x1af4 > >+#define PCI_VENDOR_ID_PCI_SHMEM 0x0001 > > #define PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET 0x1af4 > > FYI, that's not a valid vendor and device ID. > > Perhaps the RH folks would be willing to reserve a portion of the > device ID space in their vendor ID for ya'll to play around with. That would be cool! I've started asking around some folks at my place to see if we have such a thing; but so far, I've heard nothing.
I need to specify the physical address because I need to ioremap the memory during boot. I think the production issue is a memory limitation. We certainly do use QEMU a lot, but for this the kvm tool is a better fit. \dae On Fri, Aug 26, 2011 at 12:11:03AM +0300, Avi Kivity wrote: > It's not really possible. Qemu does not lay out the BARs, the guest > does (specifically the bios). You might be able to re-arrange the > layout after the guest boots. > > Why do you need the BAR at a specific physical address? > > Why can't you use qemu in production?
Just FYI, one issue that I found with exposing host memory regions as a PCI BAR (including via a very old version of the ivshmem driver... haven't tried a newer one) is that x86's pci_mmap_page_range doesn't want to set up a write-back cacheable mapping of a BAR. It may not matter for your requirements, but the uncached access reduced guest<->host bandwidth via the shared memory driver by a lot. If you need the physical address to be fixed, you might be better off by reserving a memory region in the e820 map rather than a PCI BAR, since BARs can move around. On Thu, Aug 25, 2011 at 8:08 AM, David Evensky <evensky@dancer.ca.sandia.gov> wrote: > Adding in the rest of what ivshmem does shouldn't affect our use, *I > think*. I hadn't intended this to do everything that ivshmem does, > but I can see how that would be useful. It would be cool if it could > grow into that. [...]
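Eric's e820 suggestion can be prototyped without any new device code: on x86, the guest kernel's memmap= boot parameter marks a physical range as reserved so nothing else claims it, and the guest can then map that range directly. The address and size here match David's example BAR and are illustrative:

```
# guest kernel command line: reserve 16M of RAM at 0xfd000000
memmap=16M$0xfd000000
```

Note the `$` typically needs escaping (e.g. `memmap=16M\$0xfd000000`) when the line passes through a bootloader config such as GRUB's, since `$` is otherwise interpreted by its parser.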
Thanks. My initial version did use the E820 map (thus the reason I
want to have an 'address family'), but it was suggested that PCI would
be a better way to go. When I get the rest of the project going, I
will certainly test against that. I am going to have to do a LOT of
ioremap's so that might be the bottleneck. That said, I don't think it
will end up as an issue.

\dae

On Thu, Aug 25, 2011 at 03:08:52PM -0700, Eric Northup wrote:
> Just FYI, one issue that I found with exposing host memory regions as
> a PCI BAR (including via a very old version of the ivshmem driver...
> haven't tried a newer one) is that x86's pci_mmap_page_range doesn't
> want to set up a write-back cacheable mapping of a BAR.
>
> It may not matter for your requirements, but the uncached access
> reduced guest<->host bandwidth via the shared memory driver by a lot.
>
> If you need the physical address to be fixed, you might be better off
> by reserving a memory region in the e820 map rather than a PCI BAR,
> since BARs can move around.
>
> On Thu, Aug 25, 2011 at 8:08 AM, David Evensky
> <evensky@dancer.ca.sandia.gov> wrote:
> >
> > Adding in the rest of what ivshmem does shouldn't affect our use, *I
> > think*. I hadn't intended this to do everything that ivshmem does,
> > but I can see how that would be useful. It would be cool if it could
> > grow into that.
> >
> > Our requirements for the driver in kvm tool are that another program
> > on the host can create a shared segment (anonymous, non-file backed)
> > with a specified handle, size, and contents. That this segment is
> > available to the guest at boot time at a specified address and that no
> > driver will change the contents of the memory except under direct user
> > action. Also, when the guest goes away the shared memory segment
> > shouldn't be affected (e.g. contents changed). Finally, we cannot
> > change the lightweight nature of kvm tool.
> >
> > This is the feature of ivshmem that I need to check today. I did some
> > testing a month ago, but it wasn't detailed enough to check this out.
> >
> > \dae
> >
> > On Thu, Aug 25, 2011 at 02:25:48PM +0300, Sasha Levin wrote:
> > > On Thu, 2011-08-25 at 11:59 +0100, Stefan Hajnoczi wrote:
> > > > On Thu, Aug 25, 2011 at 11:37 AM, Pekka Enberg <penberg@kernel.org> wrote:
> > > > > Hi Stefan,
> > > > >
> > > > > On Thu, Aug 25, 2011 at 1:31 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > > > >>> It's obviously not competing. One thing you might want to consider is
> > > > >>> making the guest interface compatible with ivshmem. Is there any reason
> > > > >>> we shouldn't do that? I don't consider that a requirement, just nice to
> > > > >>> have.
> > > > >>
> > > > >> The point of implementing the same interface as ivshmem is that users
> > > > >> don't need to rejig guests or applications in order to switch between
> > > > >> hypervisors. A different interface also prevents same-to-same
> > > > >> benchmarks.
> > > > >>
> > > > >> There is little benefit to creating another virtual device interface
> > > > >> when a perfectly good one already exists. The question should be: how
> > > > >> is this shmem device different and better than ivshmem? If there is
> > > > >> no justification then implement the ivshmem interface.
> > > > >
> > > > > So which interface are we actually talking about? Userspace/kernel in the
> > > > > guest or hypervisor/guest kernel?
> > > >
> > > > The hardware interface. Same PCI BAR layout and semantics.
> > > >
> > > > > Either way, while it would be nice to share the interface, it's not a
> > > > > *requirement* for tools/kvm unless ivshmem is specified in the virtio
> > > > > spec or the driver is in mainline Linux. We don't intend to require people
> > > > > to implement non-standard and non-Linux QEMU interfaces. OTOH,
> > > > > ivshmem would make the PCI ID problem go away.
> > > >
> > > > Introducing yet another non-standard and non-Linux interface doesn't
> > > > help though. If there is no significant improvement over ivshmem then
> > > > it makes sense to let ivshmem gain critical mass and more users
> > > > instead of fragmenting the space.
> > >
> > > I support doing it ivshmem-compatible, though it doesn't have to be a
> > > requirement right now (that is, use this patch as a base and build it
> > > towards ivshmem - which shouldn't be an issue since this patch provides
> > > the PCI+SHM parts which are required by ivshmem anyway).
> > >
> > > ivshmem is a good, documented, stable interface backed by a lot of
> > > research and testing behind it. Looking at the spec it's obvious that
> > > Cam had KVM in mind when designing it and that's exactly what we want to
> > > have in the KVM tool.
> > >
> > > David, did you have any plans to extend it to become ivshmem-compatible?
> > > If not, would turning it into such break any code that depends on it
> > > horribly?
> > >
> > > --
> > >
> > > Sasha.
> > >
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2011-08-25 at 16:35 -0500, Anthony Liguori wrote:
> On 08/24/2011 05:25 PM, David Evensky wrote:
> >
> > This patch adds a PCI device that provides PCI device memory to the
> > guest. This memory in the guest exists as a shared memory segment in
> > the host. This is similar to the memory sharing capability of Nahanni
> > (ivshmem) available in QEMU. In this case, the shared memory segment
> > is exposed as a PCI BAR only.
> >
> > A new command line argument is added as:
> > --shmem pci:0xc8000000:16MB:handle=/newmem:create
> >
> > diff -uprN -X linux-kvm/Documentation/dontdiff linux-kvm/tools/kvm/include/kvm/pci-shmem.h linux-kvm_pci_shmem/tools/kvm/include/kvm/pci-shmem.h
> > --- linux-kvm/tools/kvm/include/kvm/pci-shmem.h	1969-12-31 16:00:00.000000000 -0800
> > +++ linux-kvm_pci_shmem/tools/kvm/include/kvm/pci-shmem.h	2011-08-13 15:43:01.067953711 -0700
> > @@ -0,0 +1,13 @@
> > +#ifndef KVM__PCI_SHMEM_H
> > +#define KVM__PCI_SHMEM_H
> > +
> > +#include <linux/types.h>
> > +#include <linux/list.h>
> > +
> > +struct kvm;
> > +struct shmem_info;
> > +
> > +int pci_shmem__init(struct kvm *self);
> > +int pci_shmem__register_mem(struct shmem_info *si);
> > +
> > +#endif
> > diff -uprN -X linux-kvm/Documentation/dontdiff linux-kvm/tools/kvm/include/kvm/virtio-pci-dev.h linux-kvm_pci_shmem/tools/kvm/include/kvm/virtio-pci-dev.h
> > --- linux-kvm/tools/kvm/include/kvm/virtio-pci-dev.h	2011-08-09 15:38:48.760120973 -0700
> > +++ linux-kvm_pci_shmem/tools/kvm/include/kvm/virtio-pci-dev.h	2011-08-18 10:06:12.171539230 -0700
> > @@ -15,10 +15,13 @@
> >  #define PCI_DEVICE_ID_VIRTIO_BLN	0x1005
> >  #define PCI_DEVICE_ID_VIRTIO_P9	0x1009
> >  #define PCI_DEVICE_ID_VESA		0x2000
> > +#define PCI_DEVICE_ID_PCI_SHMEM	0x0001
> >
> >  #define PCI_VENDOR_ID_REDHAT_QUMRANET	0x1af4
> > +#define PCI_VENDOR_ID_PCI_SHMEM	0x0001
> >  #define PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET	0x1af4
>
> FYI, that's not a valid vendor and device ID.
>
> Perhaps the RH folks would be willing to reserve a portion of the device
> ID space in their vendor ID for ya'll to play around with.

I'm working on a patch on top of David's patch to turn it into an
ivshmem device. Once it's ready we would use the same vendor/device IDs.
On Thu, 2011-08-25 at 08:08 -0700, David Evensky wrote:
> Adding in the rest of what ivshmem does shouldn't affect our use, *I
> think*. I hadn't intended this to do everything that ivshmem does,
> but I can see how that would be useful. It would be cool if it could
> grow into that.

David,

I've added most of ivshmem on top of your driver (still working on fully
understanding the client-server protocol).

The changes that might affect your use have been simple:
 * The shared memory BAR is now 2 instead of 0.
 * Vendor and device IDs changed.
 * The device now has MSI-X capability in the header and supporting code
   to run it.

If these points won't affect your use I think there shouldn't be any
other issues.
Sasha,

That is wonderful. It sounds like it should be OK, and I will be happy
to test.

\dae

On Fri, Aug 26, 2011 at 09:33:58AM +0300, Sasha Levin wrote:
> On Thu, 2011-08-25 at 08:08 -0700, David Evensky wrote:
> > Adding in the rest of what ivshmem does shouldn't affect our use, *I
> > think*. I hadn't intended this to do everything that ivshmem does,
> > but I can see how that would be useful. It would be cool if it could
> > grow into that.
>
> David,
>
> I've added most of ivshmem on top of your driver (still working on fully
> understanding the client-server protocol).
>
> The changes that might affect your use have been simple:
> * The shared memory BAR is now 2 instead of 0.
> * Vendor and device IDs changed.
> * The device now has MSI-X capability in the header and supporting code
>   to run it.
>
> If these points won't affect your use I think there shouldn't be any
> other issues.
>
> --
>
> Sasha.
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/26/2011 01:03 AM, David Evensky wrote:
> I need to specify the physical address because I need to ioremap the
> memory during boot.

Did you consider pci_ioremap_bar()?

> The production issue I think is a memory limitation. We certainly do
> use QEMU a lot; but for this the kvm tool is a better fit.

What is this memory limitation?
On Sun, Aug 28, 2011 at 10:34:45AM +0300, Avi Kivity wrote:
> On 08/26/2011 01:03 AM, David Evensky wrote:
> > I need to specify the physical address because I need to ioremap the
> > memory during boot.
>
> Did you consider pci_ioremap_bar()?

No, the code needs a physical memory address, not a PCI device. I
suppose we could do something special for PCI devices, but that wasn't
our intent. That said, this was also one of those things you learn
every day. :-)

> > The production issue I think is a memory limitation. We certainly do
> > use QEMU a lot; but for this the kvm tool is a better fit.
>
> What is this memory limitation?

It isn't a hard and fast number; just trying to maximize the number of
guests we can have.

> --
> I have a truly marvellous patch that fixes the bug which this
> signature is too narrow to contain.
diff -uprN -X linux-kvm/Documentation/dontdiff linux-kvm/tools/kvm/builtin-run.c linux-kvm_pci_shmem/tools/kvm/builtin-run.c
--- linux-kvm/tools/kvm/builtin-run.c	2011-08-24 10:21:22.342077674 -0700
+++ linux-kvm_pci_shmem/tools/kvm/builtin-run.c	2011-08-24 14:17:33.190451297 -0700
@@ -28,6 +28,8 @@
 #include "kvm/sdl.h"
 #include "kvm/vnc.h"
 #include "kvm/guest_compat.h"
+#include "shmem-util.h"
+#include "kvm/pci-shmem.h"
 
 #include <linux/types.h>
 
@@ -52,6 +54,8 @@
 #define DEFAULT_SCRIPT		"none"
 
 #define MB_SHIFT		(20)
+#define KB_SHIFT		(10)
+#define GB_SHIFT		(30)
 #define MIN_RAM_SIZE_MB		(64ULL)
 #define MIN_RAM_SIZE_BYTE	(MIN_RAM_SIZE_MB << MB_SHIFT)
 
@@ -151,6 +155,130 @@ static int virtio_9p_rootdir_parser(cons
 	return 0;
 }
 
+static int shmem_parser(const struct option *opt, const char *arg, int unset)
+{
+	const uint64_t default_size = SHMEM_DEFAULT_SIZE;
+	const uint64_t default_phys_addr = SHMEM_DEFAULT_ADDR;
+	const char *default_handle = SHMEM_DEFAULT_HANDLE;
+	enum { PCI, UNK } addr_type = PCI;
+	uint64_t phys_addr;
+	uint64_t size;
+	char *handle = NULL;
+	int create = 0;
+	const char *p = arg;
+	char *next;
+	int base = 10;
+	int verbose = 0;
+
+	const int skip_pci = strlen("pci:");
+	if (verbose)
+		pr_info("shmem_parser(%p,%s,%d)", opt, arg, unset);
+	/* parse out optional addr family */
+	if (strcasestr(p, "pci:")) {
+		p += skip_pci;
+		addr_type = PCI;
+	} else if (strcasestr(p, "mem:")) {
+		die("I can't add to E820 map yet.\n");
+	}
+	/* parse out physical addr */
+	base = 10;
+	if (strcasestr(p, "0x"))
+		base = 16;
+	phys_addr = strtoll(p, &next, base);
+	if (next == p && phys_addr == 0) {
+		pr_info("shmem: no physical addr specified, using default.");
+		phys_addr = default_phys_addr;
+	}
+	if (*next != ':' && *next != '\0')
+		die("shmem: unexpected chars after phys addr.\n");
+	if (*next == '\0')
+		p = next;
+	else
+		p = next + 1;
+	/* parse out size */
+	base = 10;
+	if (strcasestr(p, "0x"))
+		base = 16;
+	size = strtoll(p, &next, base);
+	if (next == p && size == 0) {
+		pr_info("shmem: no size specified, using default.");
+		size = default_size;
+	}
+	/* look for [KMGkmg][Bb]* uses base 2. */
+	int skip_B = 0;
+	if (strspn(next, "KMGkmg")) {	/* might have a prefix */
+		if (*(next + 1) == 'B' || *(next + 1) == 'b')
+			skip_B = 1;
+		switch (*next) {
+		case 'K':
+		case 'k':
+			size = size << KB_SHIFT;
+			break;
+		case 'M':
+		case 'm':
+			size = size << MB_SHIFT;
+			break;
+		case 'G':
+		case 'g':
+			size = size << GB_SHIFT;
+			break;
+		default:
+			die("shmem: bug in detecting size prefix.");
+			break;
+		}
+		next += 1 + skip_B;
+	}
+	if (*next != ':' && *next != '\0') {
+		die("shmem: unexpected chars after phys size. <%c><%c>\n",
+		    *next, *p);
+	}
+	if (*next == '\0')
+		p = next;
+	else
+		p = next + 1;
+	/* parse out optional shmem handle */
+	const int skip_handle = strlen("handle=");
+	next = strcasestr(p, "handle=");
+	if (*p && next) {
+		if (p != next)
+			die("unexpected chars before handle\n");
+		p += skip_handle;
+		next = strchrnul(p, ':');
+		if (next - p) {
+			handle = malloc(next - p + 1);
+			strncpy(handle, p, next - p);
+			handle[next - p] = '\0';	/* just in case. */
+		}
+		if (*next == '\0')
+			p = next;
+		else
+			p = next + 1;
+	}
+	/* parse optional create flag to see if we should create shm seg. */
+	if (*p && strcasestr(p, "create")) {
+		create = 1;
+		p += strlen("create");
+	}
+	if (*p != '\0')
+		die("shmem: unexpected trailing chars\n");
+	if (handle == NULL) {
+		handle = malloc(strlen(default_handle) + 1);
+		strcpy(handle, default_handle);
+	}
+	if (verbose) {
+		pr_info("shmem: phys_addr = %lx", phys_addr);
+		pr_info("shmem: size = %lx", size);
+		pr_info("shmem: handle = %s", handle);
+		pr_info("shmem: create = %d", create);
+	}
+	struct shmem_info *si = malloc(sizeof(struct shmem_info));
+	si->phys_addr = phys_addr;
+	si->size = size;
+	si->handle = handle;
+	si->create = create;
+	pci_shmem__register_mem(si);	/* ownership of si, etc. passed on. */
+	return 0;
+}
 
 static const struct option options[] = {
 	OPT_GROUP("Basic options:"),
@@ -158,6 +286,10 @@ static const struct option options[] = {
 			"A name for the guest"),
 	OPT_INTEGER('c', "cpus", &nrcpus, "Number of CPUs"),
 	OPT_U64('m', "mem", &ram_size, "Virtual machine memory size in MiB."),
+	OPT_CALLBACK('\0', "shmem", NULL,
+		     "[pci:]<addr>:<size>[:handle=<handle>][:create]",
+		     "Share host shmem with guest via pci device",
+		     shmem_parser),
 	OPT_CALLBACK('d', "disk", NULL, "image or rootfs_dir", "Disk image or rootfs directory", img_name_parser),
 	OPT_BOOLEAN('\0', "balloon", &balloon, "Enable virtio balloon"),
 	OPT_BOOLEAN('\0', "vnc", &vnc, "Enable VNC framebuffer"),
@@ -695,6 +827,8 @@ int kvm_cmd_run(int argc, const char **a
 
 	kbd__init(kvm);
 
+	pci_shmem__init(kvm);
+
 	if (vnc || sdl)
 		fb = vesa__init(kvm);
 
diff -uprN -X linux-kvm/Documentation/dontdiff linux-kvm/tools/kvm/hw/pci-shmem.c linux-kvm_pci_shmem/tools/kvm/hw/pci-shmem.c
--- linux-kvm/tools/kvm/hw/pci-shmem.c	1969-12-31 16:00:00.000000000 -0800
+++ linux-kvm_pci_shmem/tools/kvm/hw/pci-shmem.c	2011-08-24 14:18:09.227234167 -0700
@@ -0,0 +1,59 @@
+#include "shmem-util.h"
+#include "kvm/pci-shmem.h"
+#include "kvm/virtio-pci-dev.h"
+#include "kvm/irq.h"
+#include "kvm/kvm.h"
+#include "kvm/pci.h"
+#include "kvm/util.h"
+
+static struct pci_device_header pci_shmem_pci_device = {
+	.vendor_id = PCI_VENDOR_ID_PCI_SHMEM,
+	.device_id = PCI_DEVICE_ID_PCI_SHMEM,
+	.header_type = PCI_HEADER_TYPE_NORMAL,
+	.class = 0xFF0000,	/* misc pci device */
+};
+
+static struct shmem_info *shmem_region;
+
+int pci_shmem__register_mem(struct shmem_info *si)
+{
+	if (shmem_region == NULL) {
+		shmem_region = si;
+	} else {
+		pr_warning("only single shmem currently avail. ignoring.\n");
+		free(si);
+	}
+	return 0;
+}
+
+int pci_shmem__init(struct kvm *kvm)
+{
+	u8 dev, line, pin;
+	char *mem;
+	int verbose = 0;
+	if (irq__register_device(PCI_DEVICE_ID_PCI_SHMEM, &dev, &pin, &line)
+	    < 0)
+		return 0;
+	/* ignore irq stuff, just want bus info for now. */
+	/* pci_shmem_pci_device.irq_pin = pin; */
+	/* pci_shmem_pci_device.irq_line = line; */
+	if (shmem_region == 0) {
+		if (verbose == 1)
+			pr_warning
+			    ("pci_shmem_init: memory region not registered\n");
+		return 0;
+	}
+	pci_shmem_pci_device.bar[0] =
+	    shmem_region->phys_addr | PCI_BASE_ADDRESS_SPACE_MEMORY;
+	pci_shmem_pci_device.bar_size[0] = shmem_region->size;
+	pci__register(&pci_shmem_pci_device, dev);
+
+	mem =
+	    setup_shmem(shmem_region->handle, shmem_region->size,
+			shmem_region->create, verbose);
+	if (mem == NULL)
+		return 0;
+	kvm__register_mem(kvm, shmem_region->phys_addr, shmem_region->size,
+			  mem);
+	return 1;
+}
diff -uprN -X linux-kvm/Documentation/dontdiff linux-kvm/tools/kvm/include/kvm/pci-shmem.h linux-kvm_pci_shmem/tools/kvm/include/kvm/pci-shmem.h
--- linux-kvm/tools/kvm/include/kvm/pci-shmem.h	1969-12-31 16:00:00.000000000 -0800
+++ linux-kvm_pci_shmem/tools/kvm/include/kvm/pci-shmem.h	2011-08-13 15:43:01.067953711 -0700
@@ -0,0 +1,13 @@
+#ifndef KVM__PCI_SHMEM_H
+#define KVM__PCI_SHMEM_H
+
+#include <linux/types.h>
+#include <linux/list.h>
+
+struct kvm;
+struct shmem_info;
+
+int pci_shmem__init(struct kvm *self);
+int pci_shmem__register_mem(struct shmem_info *si);
+
+#endif
diff -uprN -X linux-kvm/Documentation/dontdiff linux-kvm/tools/kvm/include/kvm/virtio-pci-dev.h linux-kvm_pci_shmem/tools/kvm/include/kvm/virtio-pci-dev.h
--- linux-kvm/tools/kvm/include/kvm/virtio-pci-dev.h	2011-08-09 15:38:48.760120973 -0700
+++ linux-kvm_pci_shmem/tools/kvm/include/kvm/virtio-pci-dev.h	2011-08-18 10:06:12.171539230 -0700
@@ -15,10 +15,13 @@
 #define PCI_DEVICE_ID_VIRTIO_BLN	0x1005
 #define PCI_DEVICE_ID_VIRTIO_P9		0x1009
 #define PCI_DEVICE_ID_VESA		0x2000
+#define PCI_DEVICE_ID_PCI_SHMEM		0x0001
 
 #define PCI_VENDOR_ID_REDHAT_QUMRANET	0x1af4
+#define PCI_VENDOR_ID_PCI_SHMEM		0x0001
 #define PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET	0x1af4
 
 #define PCI_SUBSYSTEM_ID_VESA		0x0004
+#define PCI_SUBSYSTEM_ID_PCI_SHMEM	0x0001
 
 #endif /* VIRTIO_PCI_DEV_H_ */
diff -uprN -X linux-kvm/Documentation/dontdiff linux-kvm/tools/kvm/include/shmem-util.h linux-kvm_pci_shmem/tools/kvm/include/shmem-util.h
--- linux-kvm/tools/kvm/include/shmem-util.h	1969-12-31 16:00:00.000000000 -0800
+++ linux-kvm_pci_shmem/tools/kvm/include/shmem-util.h	2011-08-24 14:15:30.271780710 -0700
@@ -0,0 +1,27 @@
+#ifndef SHMEM_UTIL_H
+#define SHMEM_UTIL_H
+
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <string.h>
+#include <stdint.h>
+#include <stdlib.h>
+
+#define SHMEM_DEFAULT_SIZE (16 << MB_SHIFT)
+#define SHMEM_DEFAULT_ADDR (0xc8000000)
+#define SHMEM_DEFAULT_HANDLE "/kvm_shmem"
+struct shmem_info {
+	uint64_t phys_addr;
+	uint64_t size;
+	char *handle;
+	int create;
+};
+
+inline void *setup_shmem(const char *key, size_t len, int creating,
+			 int verbose);
+inline void fill_mem(void *buf, size_t buf_size, char *fill, size_t fill_len);
+
+#endif /* SHMEM_UTIL_H */
diff -uprN -X linux-kvm/Documentation/dontdiff linux-kvm/tools/kvm/Makefile linux-kvm_pci_shmem/tools/kvm/Makefile
--- linux-kvm/tools/kvm/Makefile	2011-08-24 10:21:22.342077676 -0700
+++ linux-kvm_pci_shmem/tools/kvm/Makefile	2011-08-24 13:55:37.442003451 -0700
@@ -77,10 +77,12 @@ OBJS	+= threadpool.o
 OBJS	+= util/parse-options.o
 OBJS	+= util/rbtree-interval.o
 OBJS	+= util/strbuf.o
+OBJS	+= util/shmem-util.o
 OBJS	+= virtio/9p.o
 OBJS	+= virtio/9p-pdu.o
 OBJS	+= hw/vesa.o
 OBJS	+= hw/i8042.o
+OBJS	+= hw/pci-shmem.o
 
 FLAGS_BFD := $(CFLAGS) -lbfd
 has_bfd := $(call try-cc,$(SOURCE_BFD),$(FLAGS_BFD))
diff -uprN -X linux-kvm/Documentation/dontdiff linux-kvm/tools/kvm/util/shmem-util.c linux-kvm_pci_shmem/tools/kvm/util/shmem-util.c
--- linux-kvm/tools/kvm/util/shmem-util.c	1969-12-31 16:00:00.000000000 -0800
+++ linux-kvm_pci_shmem/tools/kvm/util/shmem-util.c	2011-08-24 14:19:31.801027897 -0700
@@ -0,0 +1,64 @@
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <string.h>
+#include "shmem-util.h"
+
+inline void *setup_shmem(const char *key, size_t len, int creating, int verbose)
+{
+	int fd;
+	int rtn;
+	void *mem;
+	int flag = O_RDWR;
+	if (creating)
+		flag |= O_CREAT;
+	fd = shm_open(key, flag, S_IRUSR | S_IWUSR);
+	if (fd == -1) {
+		fprintf(stderr, "Failed to open shared memory file %s\n", key);
+		fflush(stderr);
+		return NULL;
+	}
+	if (creating) {
+		if (verbose)
+			fprintf(stderr, "Truncating file.\n");
+		rtn = ftruncate(fd, (off_t) len);
+		if (rtn == -1) {
+			fprintf(stderr, "Can't ftruncate(fd,%ld)\n", len);
+			fflush(stderr);
+		}
+	}
+	mem = mmap(NULL, len,
+		   PROT_READ | PROT_WRITE, MAP_SHARED | MAP_NORESERVE, fd, 0);
+	close(fd);
+	if (mem == MAP_FAILED) {
+		fprintf(stderr, "Failed to mmap shared memory file\n");
+		fflush(stderr);
+		return NULL;
+	}
+	if (creating) {
+		int fill_bytes = 0xae0dae0d;
+		if (verbose)
+			fprintf(stderr, "Filling buffer\n");
+		fill_mem(mem, len, (char *)&fill_bytes, 4);
+	}
+	return mem;
+}
+
+inline void fill_mem(void *buf, size_t buf_size, char *fill, size_t fill_len)
+{
+	size_t i;
+
+	if (fill_len == 1) {
+		memset(buf, fill[0], buf_size);
+	} else {
+		if (buf_size > fill_len) {
+			for (i = 0; i < buf_size - fill_len; i += fill_len)
+				memcpy(((char *)buf) + i, fill, fill_len);
+			memcpy(buf + i, fill, buf_size - i);
+		} else {
+			memcpy(buf, fill, buf_size);
+		}
+	}
+}
This patch adds a PCI device that provides PCI device memory to the
guest. This memory in the guest exists as a shared memory segment in
the host. This is similar to the memory sharing capability of Nahanni
(ivshmem) available in QEMU. In this case, the shared memory segment
is exposed as a PCI BAR only.

A new command line argument is added as:

    --shmem pci:0xc8000000:16MB:handle=/newmem:create

which will set the PCI BAR at 0xc8000000, and the shared memory
segment and the region pointed to by the BAR will be 16MB. On the host
side the shm_open handle will be '/newmem', and the kvm tool will
create the shared segment, set its size, and initialize it. If the
size, handle, or create flag are absent, they will default to 16MB,
handle=/kvm_shmem, and create will be false. The address family,
'pci:', is also optional as it is the only address family currently
supported. Only a single --shmem is supported at this time.

Signed-off-by: David Evensky <evensky@sandia.gov>