mbox series

[RFC,0/3] VirtIO RDMA

Message ID 20190411110157.14252-1-yuval.shaia@oracle.com (mailing list archive)
Headers show
Series VirtIO RDMA | expand

Message

Yuval Shaia April 11, 2019, 11:01 a.m. UTC
Data center backends use more and more RDMA or RoCE devices and more and
more software runs in virtualized environment.
There is a need for a standard to enable RDMA/RoCE on Virtual Machines.

Virtio is the optimal solution since is the de-facto para-virtualizaton
technology and also because the Virtio specification
allows Hardware Vendors to support Virtio protocol natively in order to
achieve bare metal performance.

This RFC is an effort to addresses challenges in defining the RDMA/RoCE
Virtio Specification and a look forward on possible implementation
techniques.

Open issues/Todo list:
List is huge, this is only start point of the project.
Anyway, here is one example of item in the list:
- Multi VirtQ: Every QP has two rings and every CQ has one. This means that
  in order to support for example 32K QPs we will need 64K VirtQ. Not sure
  that this is reasonable so one option is to have one for all and
  multiplex the traffic on it. This is not good approach as by design it
  introducing an optional starvation. Another approach would be multi
  queues and round-robin (for example) between them.

Expectations from this posting:
In general, any comment is welcome, starting from hey, drop this as it is a
very bad idea, to yeah, go ahead, we really want it.
Idea here is that since it is not a minor effort i first want to know if
there is some sort interest in the community for such device.

The scope of the implementation is limited to probing the device and doing
some basic ibverbs commands. Data-path is not yet implemented. So with this
one can expect only that driver is (partialy) loaded and basic queries and
resource allocation is done.

One note regarding the patchset.
I know it is not standard to collaps patches from several repos as i did
here (qemu and linux) but decided to do it anyway so the whole picture can
be seen.

patch 1: virtio-net: Move some virtio-net-pci decl to include/hw/virtio
	This is a prelimenary patch just as a hack so i will not need to
	impelement new netdev
patch 2: hw/virtio-rdma: VirtIO rdma device
	The implementation of the device
patch 3: RDMA/virtio-rdma: VirtIO rdma driver
	The device driver

Comments

Cornelia Huck April 11, 2019, 5:02 p.m. UTC | #1
On Thu, 11 Apr 2019 14:01:54 +0300
Yuval Shaia <yuval.shaia@oracle.com> wrote:

> Data center backends use more and more RDMA or RoCE devices and more and
> more software runs in virtualized environment.
> There is a need for a standard to enable RDMA/RoCE on Virtual Machines.
> 
> Virtio is the optimal solution since is the de-facto para-virtualizaton
> technology and also because the Virtio specification
> allows Hardware Vendors to support Virtio protocol natively in order to
> achieve bare metal performance.
> 
> This RFC is an effort to addresses challenges in defining the RDMA/RoCE
> Virtio Specification and a look forward on possible implementation
> techniques.
> 
> Open issues/Todo list:
> List is huge, this is only start point of the project.
> Anyway, here is one example of item in the list:
> - Multi VirtQ: Every QP has two rings and every CQ has one. This means that
>   in order to support for example 32K QPs we will need 64K VirtQ. Not sure
>   that this is reasonable so one option is to have one for all and
>   multiplex the traffic on it. This is not good approach as by design it
>   introducing an optional starvation. Another approach would be multi
>   queues and round-robin (for example) between them.
> 
> Expectations from this posting:
> In general, any comment is welcome, starting from hey, drop this as it is a
> very bad idea, to yeah, go ahead, we really want it.
> Idea here is that since it is not a minor effort i first want to know if
> there is some sort interest in the community for such device.

My first reaction is: Sounds sensible, but it would be good to have a
spec for this :)

You'll need a spec if you want this to go forward anyway, so at least a
sketch would be good to answer questions such as how many virtqueues
you use for which purpose, what is actually put on the virtqueues,
whether there are negotiable features, and what the expectations for
the device and the driver are. It also makes it easier to understand
how this is supposed to work in practice.

If folks agree that this sounds useful, the next step would be to
reserve an id for the device type.

> 
> The scope of the implementation is limited to probing the device and doing
> some basic ibverbs commands. Data-path is not yet implemented. So with this
> one can expect only that driver is (partialy) loaded and basic queries and
> resource allocation is done.
> 
> One note regarding the patchset.
> I know it is not standard to collaps patches from several repos as i did
> here (qemu and linux) but decided to do it anyway so the whole picture can
> be seen.
> 
> patch 1: virtio-net: Move some virtio-net-pci decl to include/hw/virtio
> 	This is a prelimenary patch just as a hack so i will not need to
> 	impelement new netdev
> patch 2: hw/virtio-rdma: VirtIO rdma device
> 	The implementation of the device
> patch 3: RDMA/virtio-rdma: VirtIO rdma driver
> 	The device driver
>
Jason Gunthorpe April 11, 2019, 5:24 p.m. UTC | #2
On Thu, Apr 11, 2019 at 07:02:15PM +0200, Cornelia Huck wrote:
> On Thu, 11 Apr 2019 14:01:54 +0300
> Yuval Shaia <yuval.shaia@oracle.com> wrote:
> 
> > Data center backends use more and more RDMA or RoCE devices and more and
> > more software runs in virtualized environment.
> > There is a need for a standard to enable RDMA/RoCE on Virtual Machines.
> > 
> > Virtio is the optimal solution since is the de-facto para-virtualizaton
> > technology and also because the Virtio specification
> > allows Hardware Vendors to support Virtio protocol natively in order to
> > achieve bare metal performance.
> > 
> > This RFC is an effort to addresses challenges in defining the RDMA/RoCE
> > Virtio Specification and a look forward on possible implementation
> > techniques.
> > 
> > Open issues/Todo list:
> > List is huge, this is only start point of the project.
> > Anyway, here is one example of item in the list:
> > - Multi VirtQ: Every QP has two rings and every CQ has one. This means that
> >   in order to support for example 32K QPs we will need 64K VirtQ. Not sure
> >   that this is reasonable so one option is to have one for all and
> >   multiplex the traffic on it. This is not good approach as by design it
> >   introducing an optional starvation. Another approach would be multi
> >   queues and round-robin (for example) between them.
> > 
> > Expectations from this posting:
> > In general, any comment is welcome, starting from hey, drop this as it is a
> > very bad idea, to yeah, go ahead, we really want it.
> > Idea here is that since it is not a minor effort i first want to know if
> > there is some sort interest in the community for such device.
> 
> My first reaction is: Sounds sensible, but it would be good to have a
> spec for this :)

I'm unclear why you'd want to have a virtio queue for anything other
that some kind of command channel.

I'm not sure a QP or CQ benefits from this??

Jason
Yuval Shaia April 11, 2019, 5:34 p.m. UTC | #3
On Thu, Apr 11, 2019 at 05:24:08PM +0000, Jason Gunthorpe wrote:
> On Thu, Apr 11, 2019 at 07:02:15PM +0200, Cornelia Huck wrote:
> > On Thu, 11 Apr 2019 14:01:54 +0300
> > Yuval Shaia <yuval.shaia@oracle.com> wrote:
> > 
> > > Data center backends use more and more RDMA or RoCE devices and more and
> > > more software runs in virtualized environment.
> > > There is a need for a standard to enable RDMA/RoCE on Virtual Machines.
> > > 
> > > Virtio is the optimal solution since is the de-facto para-virtualizaton
> > > technology and also because the Virtio specification
> > > allows Hardware Vendors to support Virtio protocol natively in order to
> > > achieve bare metal performance.
> > > 
> > > This RFC is an effort to addresses challenges in defining the RDMA/RoCE
> > > Virtio Specification and a look forward on possible implementation
> > > techniques.
> > > 
> > > Open issues/Todo list:
> > > List is huge, this is only start point of the project.
> > > Anyway, here is one example of item in the list:
> > > - Multi VirtQ: Every QP has two rings and every CQ has one. This means that
> > >   in order to support for example 32K QPs we will need 64K VirtQ. Not sure
> > >   that this is reasonable so one option is to have one for all and
> > >   multiplex the traffic on it. This is not good approach as by design it
> > >   introducing an optional starvation. Another approach would be multi
> > >   queues and round-robin (for example) between them.
> > > 
> > > Expectations from this posting:
> > > In general, any comment is welcome, starting from hey, drop this as it is a
> > > very bad idea, to yeah, go ahead, we really want it.
> > > Idea here is that since it is not a minor effort i first want to know if
> > > there is some sort interest in the community for such device.
> > 
> > My first reaction is: Sounds sensible, but it would be good to have a
> > spec for this :)
> 
> I'm unclear why you'd want to have a virtio queue for anything other
> that some kind of command channel.
> 
> I'm not sure a QP or CQ benefits from this??

Virtqueue is a standard mechanism to pass data from guest to host. By
saying that - it really sounds like QP send and recv rings. So my thought
is to use a standard way for rings. As i've learned this is how it is used
by other virtio devices ex virtio-net.

> 
> Jason
Jason Gunthorpe April 11, 2019, 5:40 p.m. UTC | #4
On Thu, Apr 11, 2019 at 08:34:20PM +0300, Yuval Shaia wrote:
> On Thu, Apr 11, 2019 at 05:24:08PM +0000, Jason Gunthorpe wrote:
> > On Thu, Apr 11, 2019 at 07:02:15PM +0200, Cornelia Huck wrote:
> > > On Thu, 11 Apr 2019 14:01:54 +0300
> > > Yuval Shaia <yuval.shaia@oracle.com> wrote:
> > > 
> > > > Data center backends use more and more RDMA or RoCE devices and more and
> > > > more software runs in virtualized environment.
> > > > There is a need for a standard to enable RDMA/RoCE on Virtual Machines.
> > > > 
> > > > Virtio is the optimal solution since is the de-facto para-virtualizaton
> > > > technology and also because the Virtio specification
> > > > allows Hardware Vendors to support Virtio protocol natively in order to
> > > > achieve bare metal performance.
> > > > 
> > > > This RFC is an effort to addresses challenges in defining the RDMA/RoCE
> > > > Virtio Specification and a look forward on possible implementation
> > > > techniques.
> > > > 
> > > > Open issues/Todo list:
> > > > List is huge, this is only start point of the project.
> > > > Anyway, here is one example of item in the list:
> > > > - Multi VirtQ: Every QP has two rings and every CQ has one. This means that
> > > >   in order to support for example 32K QPs we will need 64K VirtQ. Not sure
> > > >   that this is reasonable so one option is to have one for all and
> > > >   multiplex the traffic on it. This is not good approach as by design it
> > > >   introducing an optional starvation. Another approach would be multi
> > > >   queues and round-robin (for example) between them.
> > > > 
> > > > Expectations from this posting:
> > > > In general, any comment is welcome, starting from hey, drop this as it is a
> > > > very bad idea, to yeah, go ahead, we really want it.
> > > > Idea here is that since it is not a minor effort i first want to know if
> > > > there is some sort interest in the community for such device.
> > > 
> > > My first reaction is: Sounds sensible, but it would be good to have a
> > > spec for this :)
> > 
> > I'm unclear why you'd want to have a virtio queue for anything other
> > that some kind of command channel.
> > 
> > I'm not sure a QP or CQ benefits from this??
> 
> Virtqueue is a standard mechanism to pass data from guest to host. By
> saying that - it really sounds like QP send and recv rings. So my thought
> is to use a standard way for rings. As i've learned this is how it is used
> by other virtio devices ex virtio-net.

I doubt you can use virtio queues from userspace securely? Usually
needs a dedicated page for each user space process

Jason
Yuval Shaia April 11, 2019, 5:41 p.m. UTC | #5
On Thu, Apr 11, 2019 at 08:34:20PM +0300, Yuval Shaia wrote:
> On Thu, Apr 11, 2019 at 05:24:08PM +0000, Jason Gunthorpe wrote:
> > On Thu, Apr 11, 2019 at 07:02:15PM +0200, Cornelia Huck wrote:
> > > On Thu, 11 Apr 2019 14:01:54 +0300
> > > Yuval Shaia <yuval.shaia@oracle.com> wrote:
> > > 
> > > > Data center backends use more and more RDMA or RoCE devices and more and
> > > > more software runs in virtualized environment.
> > > > There is a need for a standard to enable RDMA/RoCE on Virtual Machines.
> > > > 
> > > > Virtio is the optimal solution since is the de-facto para-virtualizaton
> > > > technology and also because the Virtio specification
> > > > allows Hardware Vendors to support Virtio protocol natively in order to
> > > > achieve bare metal performance.
> > > > 
> > > > This RFC is an effort to addresses challenges in defining the RDMA/RoCE
> > > > Virtio Specification and a look forward on possible implementation
> > > > techniques.
> > > > 
> > > > Open issues/Todo list:
> > > > List is huge, this is only start point of the project.
> > > > Anyway, here is one example of item in the list:
> > > > - Multi VirtQ: Every QP has two rings and every CQ has one. This means that
> > > >   in order to support for example 32K QPs we will need 64K VirtQ. Not sure
> > > >   that this is reasonable so one option is to have one for all and
> > > >   multiplex the traffic on it. This is not good approach as by design it
> > > >   introducing an optional starvation. Another approach would be multi
> > > >   queues and round-robin (for example) between them.
> > > > 
> > > > Expectations from this posting:
> > > > In general, any comment is welcome, starting from hey, drop this as it is a
> > > > very bad idea, to yeah, go ahead, we really want it.
> > > > Idea here is that since it is not a minor effort i first want to know if
> > > > there is some sort interest in the community for such device.
> > > 
> > > My first reaction is: Sounds sensible, but it would be good to have a
> > > spec for this :)
> > 
> > I'm unclear why you'd want to have a virtio queue for anything other
> > that some kind of command channel.
> > 
> > I'm not sure a QP or CQ benefits from this??
> 
> Virtqueue is a standard mechanism to pass data from guest to host. By

And vice versa (CQ?)

> saying that - it really sounds like QP send and recv rings. So my thought
> is to use a standard way for rings. As i've learned this is how it is used
> by other virtio devices ex virtio-net.
> 
> > 
> > Jason
>
Devesh Sharma April 12, 2019, 9:51 a.m. UTC | #6
On Thu, Apr 11, 2019 at 11:11 PM Yuval Shaia <yuval.shaia@oracle.com> wrote:
>
> On Thu, Apr 11, 2019 at 08:34:20PM +0300, Yuval Shaia wrote:
> > On Thu, Apr 11, 2019 at 05:24:08PM +0000, Jason Gunthorpe wrote:
> > > On Thu, Apr 11, 2019 at 07:02:15PM +0200, Cornelia Huck wrote:
> > > > On Thu, 11 Apr 2019 14:01:54 +0300
> > > > Yuval Shaia <yuval.shaia@oracle.com> wrote:
> > > >
> > > > > Data center backends use more and more RDMA or RoCE devices and more and
> > > > > more software runs in virtualized environment.
> > > > > There is a need for a standard to enable RDMA/RoCE on Virtual Machines.
> > > > >
> > > > > Virtio is the optimal solution since is the de-facto para-virtualizaton
> > > > > technology and also because the Virtio specification
> > > > > allows Hardware Vendors to support Virtio protocol natively in order to
> > > > > achieve bare metal performance.
> > > > >
> > > > > This RFC is an effort to addresses challenges in defining the RDMA/RoCE
> > > > > Virtio Specification and a look forward on possible implementation
> > > > > techniques.
> > > > >
> > > > > Open issues/Todo list:
> > > > > List is huge, this is only start point of the project.
> > > > > Anyway, here is one example of item in the list:
> > > > > - Multi VirtQ: Every QP has two rings and every CQ has one. This means that
> > > > >   in order to support for example 32K QPs we will need 64K VirtQ. Not sure
> > > > >   that this is reasonable so one option is to have one for all and
> > > > >   multiplex the traffic on it. This is not good approach as by design it
> > > > >   introducing an optional starvation. Another approach would be multi
> > > > >   queues and round-robin (for example) between them.
> > > > >
> > > > > Expectations from this posting:
> > > > > In general, any comment is welcome, starting from hey, drop this as it is a
> > > > > very bad idea, to yeah, go ahead, we really want it.
> > > > > Idea here is that since it is not a minor effort i first want to know if
> > > > > there is some sort interest in the community for such device.
> > > >
> > > > My first reaction is: Sounds sensible, but it would be good to have a
> > > > spec for this :)
> > >
> > > I'm unclear why you'd want to have a virtio queue for anything other
> > > that some kind of command channel.
> > >
> > > I'm not sure a QP or CQ benefits from this??
> >
> > Virtqueue is a standard mechanism to pass data from guest to host. By
>
> And vice versa (CQ?)
>
> > saying that - it really sounds like QP send and recv rings. So my thought
> > is to use a standard way for rings. As i've learned this is how it is used
> > by other virtio devices ex virtio-net.
> >
> > >
> > > Jason
> >
I would like to ask more basic question, how virtio queue will glue
with actual h/w qps? I may be to naive though.

-Regards
Devesh
Yuval Shaia April 15, 2019, 10:04 a.m. UTC | #7
On Thu, Apr 11, 2019 at 05:40:26PM +0000, Jason Gunthorpe wrote:
> On Thu, Apr 11, 2019 at 08:34:20PM +0300, Yuval Shaia wrote:
> > On Thu, Apr 11, 2019 at 05:24:08PM +0000, Jason Gunthorpe wrote:
> > > On Thu, Apr 11, 2019 at 07:02:15PM +0200, Cornelia Huck wrote:
> > > > On Thu, 11 Apr 2019 14:01:54 +0300
> > > > Yuval Shaia <yuval.shaia@oracle.com> wrote:
> > > > 
> > > > > Data center backends use more and more RDMA or RoCE devices and more and
> > > > > more software runs in virtualized environment.
> > > > > There is a need for a standard to enable RDMA/RoCE on Virtual Machines.
> > > > > 
> > > > > Virtio is the optimal solution since is the de-facto para-virtualizaton
> > > > > technology and also because the Virtio specification
> > > > > allows Hardware Vendors to support Virtio protocol natively in order to
> > > > > achieve bare metal performance.
> > > > > 
> > > > > This RFC is an effort to addresses challenges in defining the RDMA/RoCE
> > > > > Virtio Specification and a look forward on possible implementation
> > > > > techniques.
> > > > > 
> > > > > Open issues/Todo list:
> > > > > List is huge, this is only start point of the project.
> > > > > Anyway, here is one example of item in the list:
> > > > > - Multi VirtQ: Every QP has two rings and every CQ has one. This means that
> > > > >   in order to support for example 32K QPs we will need 64K VirtQ. Not sure
> > > > >   that this is reasonable so one option is to have one for all and
> > > > >   multiplex the traffic on it. This is not good approach as by design it
> > > > >   introducing an optional starvation. Another approach would be multi
> > > > >   queues and round-robin (for example) between them.
> > > > > 
> > > > > Expectations from this posting:
> > > > > In general, any comment is welcome, starting from hey, drop this as it is a
> > > > > very bad idea, to yeah, go ahead, we really want it.
> > > > > Idea here is that since it is not a minor effort i first want to know if
> > > > > there is some sort interest in the community for such device.
> > > > 
> > > > My first reaction is: Sounds sensible, but it would be good to have a
> > > > spec for this :)
> > > 
> > > I'm unclear why you'd want to have a virtio queue for anything other
> > > that some kind of command channel.
> > > 
> > > I'm not sure a QP or CQ benefits from this??
> > 
> > Virtqueue is a standard mechanism to pass data from guest to host. By
> > saying that - it really sounds like QP send and recv rings. So my thought
> > is to use a standard way for rings. As i've learned this is how it is used
> > by other virtio devices ex virtio-net.
> 
> I doubt you can use virtio queues from userspace securely? Usually
> needs a dedicated page for each user space process

Not yet started to do any work on datapath, i guess you are right but need
further work on this area.
Thanks for raising the concern.

As i said, there are many open issues at this stage.

> 
> Jason
Yuval Shaia April 15, 2019, 10:27 a.m. UTC | #8
On Fri, Apr 12, 2019 at 03:21:56PM +0530, Devesh Sharma wrote:
> On Thu, Apr 11, 2019 at 11:11 PM Yuval Shaia <yuval.shaia@oracle.com> wrote:
> >
> > On Thu, Apr 11, 2019 at 08:34:20PM +0300, Yuval Shaia wrote:
> > > On Thu, Apr 11, 2019 at 05:24:08PM +0000, Jason Gunthorpe wrote:
> > > > On Thu, Apr 11, 2019 at 07:02:15PM +0200, Cornelia Huck wrote:
> > > > > On Thu, 11 Apr 2019 14:01:54 +0300
> > > > > Yuval Shaia <yuval.shaia@oracle.com> wrote:
> > > > >
> > > > > > Data center backends use more and more RDMA or RoCE devices and more and
> > > > > > more software runs in virtualized environment.
> > > > > > There is a need for a standard to enable RDMA/RoCE on Virtual Machines.
> > > > > >
> > > > > > Virtio is the optimal solution since is the de-facto para-virtualizaton
> > > > > > technology and also because the Virtio specification
> > > > > > allows Hardware Vendors to support Virtio protocol natively in order to
> > > > > > achieve bare metal performance.
> > > > > >
> > > > > > This RFC is an effort to addresses challenges in defining the RDMA/RoCE
> > > > > > Virtio Specification and a look forward on possible implementation
> > > > > > techniques.
> > > > > >
> > > > > > Open issues/Todo list:
> > > > > > List is huge, this is only start point of the project.
> > > > > > Anyway, here is one example of item in the list:
> > > > > > - Multi VirtQ: Every QP has two rings and every CQ has one. This means that
> > > > > >   in order to support for example 32K QPs we will need 64K VirtQ. Not sure
> > > > > >   that this is reasonable so one option is to have one for all and
> > > > > >   multiplex the traffic on it. This is not good approach as by design it
> > > > > >   introducing an optional starvation. Another approach would be multi
> > > > > >   queues and round-robin (for example) between them.
> > > > > >
> > > > > > Expectations from this posting:
> > > > > > In general, any comment is welcome, starting from hey, drop this as it is a
> > > > > > very bad idea, to yeah, go ahead, we really want it.
> > > > > > Idea here is that since it is not a minor effort i first want to know if
> > > > > > there is some sort interest in the community for such device.
> > > > >
> > > > > My first reaction is: Sounds sensible, but it would be good to have a
> > > > > spec for this :)
> > > >
> > > > I'm unclear why you'd want to have a virtio queue for anything other
> > > > that some kind of command channel.
> > > >
> > > > I'm not sure a QP or CQ benefits from this??
> > >
> > > Virtqueue is a standard mechanism to pass data from guest to host. By
> >
> > And vice versa (CQ?)
> >
> > > saying that - it really sounds like QP send and recv rings. So my thought
> > > is to use a standard way for rings. As i've learned this is how it is used
> > > by other virtio devices ex virtio-net.
> > >
> > > >
> > > > Jason
> > >
> I would like to ask more basic question, how virtio queue will glue
> with actual h/w qps? I may be to naive though.

Have to admit - I have no idea.
This work is based on emulated device so i'm my case - the emulated device
is creating the virtqueue. I guess that HW device will create a QP and
expose a virtqueue interface to it.
The same driver should serve both the SW and HW devices.

One of the objectives of this RFC is to collaborate an effort and
implementation notes/ideas from HW vendors.

> 
> -Regards
> Devesh
Yuval Shaia April 15, 2019, 10:35 a.m. UTC | #9
On Thu, Apr 11, 2019 at 07:02:15PM +0200, Cornelia Huck wrote:
> On Thu, 11 Apr 2019 14:01:54 +0300
> Yuval Shaia <yuval.shaia@oracle.com> wrote:
> 
> > Data center backends use more and more RDMA or RoCE devices and more and
> > more software runs in virtualized environment.
> > There is a need for a standard to enable RDMA/RoCE on Virtual Machines.
> > 
> > Virtio is the optimal solution since is the de-facto para-virtualizaton
> > technology and also because the Virtio specification
> > allows Hardware Vendors to support Virtio protocol natively in order to
> > achieve bare metal performance.
> > 
> > This RFC is an effort to addresses challenges in defining the RDMA/RoCE
> > Virtio Specification and a look forward on possible implementation
> > techniques.
> > 
> > Open issues/Todo list:
> > List is huge, this is only start point of the project.
> > Anyway, here is one example of item in the list:
> > - Multi VirtQ: Every QP has two rings and every CQ has one. This means that
> >   in order to support for example 32K QPs we will need 64K VirtQ. Not sure
> >   that this is reasonable so one option is to have one for all and
> >   multiplex the traffic on it. This is not good approach as by design it
> >   introducing an optional starvation. Another approach would be multi
> >   queues and round-robin (for example) between them.
> > 
> > Expectations from this posting:
> > In general, any comment is welcome, starting from hey, drop this as it is a
> > very bad idea, to yeah, go ahead, we really want it.
> > Idea here is that since it is not a minor effort i first want to know if
> > there is some sort interest in the community for such device.
> 
> My first reaction is: Sounds sensible, but it would be good to have a
> spec for this :)
> 
> You'll need a spec if you want this to go forward anyway, so at least a
> sketch would be good to answer questions such as how many virtqueues
> you use for which purpose, what is actually put on the virtqueues,
> whether there are negotiable features, and what the expectations for
> the device and the driver are. It also makes it easier to understand
> how this is supposed to work in practice.
> 
> If folks agree that this sounds useful, the next step would be to
> reserve an id for the device type.

Thanks for the tips, will sure do that, it is that first i wanted to make
sure there is a use case here.

Waiting for any feedback from the community.

> 
> > 
> > The scope of the implementation is limited to probing the device and doing
> > some basic ibverbs commands. Data-path is not yet implemented. So with this
> > one can expect only that driver is (partialy) loaded and basic queries and
> > resource allocation is done.
> > 
> > One note regarding the patchset.
> > I know it is not standard to collaps patches from several repos as i did
> > here (qemu and linux) but decided to do it anyway so the whole picture can
> > be seen.
> > 
> > patch 1: virtio-net: Move some virtio-net-pci decl to include/hw/virtio
> > 	This is a prelimenary patch just as a hack so i will not need to
> > 	impelement new netdev
> > patch 2: hw/virtio-rdma: VirtIO rdma device
> > 	The implementation of the device
> > patch 3: RDMA/virtio-rdma: VirtIO rdma driver
> > 	The device driver
> > 
>
Hannes Reinecke April 19, 2019, 11:16 a.m. UTC | #10
On 4/15/19 12:35 PM, Yuval Shaia wrote:
> On Thu, Apr 11, 2019 at 07:02:15PM +0200, Cornelia Huck wrote:
>> On Thu, 11 Apr 2019 14:01:54 +0300
>> Yuval Shaia <yuval.shaia@oracle.com> wrote:
>>
>>> Data center backends use more and more RDMA or RoCE devices and more and
>>> more software runs in virtualized environment.
>>> There is a need for a standard to enable RDMA/RoCE on Virtual Machines.
>>>
>>> Virtio is the optimal solution since is the de-facto para-virtualizaton
>>> technology and also because the Virtio specification
>>> allows Hardware Vendors to support Virtio protocol natively in order to
>>> achieve bare metal performance.
>>>
>>> This RFC is an effort to addresses challenges in defining the RDMA/RoCE
>>> Virtio Specification and a look forward on possible implementation
>>> techniques.
>>>
>>> Open issues/Todo list:
>>> List is huge, this is only start point of the project.
>>> Anyway, here is one example of item in the list:
>>> - Multi VirtQ: Every QP has two rings and every CQ has one. This means that
>>>    in order to support for example 32K QPs we will need 64K VirtQ. Not sure
>>>    that this is reasonable so one option is to have one for all and
>>>    multiplex the traffic on it. This is not good approach as by design it
>>>    introducing an optional starvation. Another approach would be multi
>>>    queues and round-robin (for example) between them.
>>>
Typically there will be a one-to-one mapping between QPs and CPUs (on 
the guest). So while one would need to be prepared to support quite some 
QPs, the expectation is that the actual number of QPs used will be 
rather low.
In a similar vein, multiplexing QPs would be defeating the purpose, as 
the overall idea was to have _independent_ QPs to enhance parallelism.

>>> Expectations from this posting:
>>> In general, any comment is welcome, starting from hey, drop this as it is a
>>> very bad idea, to yeah, go ahead, we really want it.
>>> Idea here is that since it is not a minor effort i first want to know if
>>> there is some sort interest in the community for such device.
>>
>> My first reaction is: Sounds sensible, but it would be good to have a
>> spec for this :)
>>
>> You'll need a spec if you want this to go forward anyway, so at least a
>> sketch would be good to answer questions such as how many virtqueues
>> you use for which purpose, what is actually put on the virtqueues,
>> whether there are negotiable features, and what the expectations for
>> the device and the driver are. It also makes it easier to understand
>> how this is supposed to work in practice.
>>
>> If folks agree that this sounds useful, the next step would be to
>> reserve an id for the device type.
> 
> Thanks for the tips, will sure do that, it is that first i wanted to make
> sure there is a use case here.
> 
> Waiting for any feedback from the community.
> 
I really do like the ides; in fact, it saved me from coding a similar 
thing myself :-)

However, I'm still curious about the overall intent of this driver. 
Where would the I/O be routed _to_ ?
It's nice that we have a virtualized driver, but this driver is
intended to do I/O (even if it doesn't _do_ any I/O ATM :-)
And this I/O needs to be send to (and possibly received from)
something.

So what exactly is this something?
An existing piece of HW on the host?
If so, wouldn't it be more efficient to use vfio, either by using SR-IOV 
or by using virtio-mdev?

Another guest?
If so, how would we route the I/O from one guest to the other?
Shared memory? Implementing a full-blown RDMA switch in qemu?

Oh, and I would _love_ to have a discussion about this at KVM Forum.
Maybe I'll manage to whip up guest-to-guest RDMA connection using 
ivshmem ... let's see.

Cheers,

Hannes
Leon Romanovsky April 22, 2019, 6 a.m. UTC | #11
On Fri, Apr 19, 2019 at 01:16:06PM +0200, Hannes Reinecke wrote:
> On 4/15/19 12:35 PM, Yuval Shaia wrote:
> > On Thu, Apr 11, 2019 at 07:02:15PM +0200, Cornelia Huck wrote:
> > > On Thu, 11 Apr 2019 14:01:54 +0300
> > > Yuval Shaia <yuval.shaia@oracle.com> wrote:
> > >
> > > > Data center backends use more and more RDMA or RoCE devices and more and
> > > > more software runs in virtualized environment.
> > > > There is a need for a standard to enable RDMA/RoCE on Virtual Machines.
> > > >
> > > > Virtio is the optimal solution since is the de-facto para-virtualizaton
> > > > technology and also because the Virtio specification
> > > > allows Hardware Vendors to support Virtio protocol natively in order to
> > > > achieve bare metal performance.
> > > >
> > > > This RFC is an effort to addresses challenges in defining the RDMA/RoCE
> > > > Virtio Specification and a look forward on possible implementation
> > > > techniques.
> > > >
> > > > Open issues/Todo list:
> > > > List is huge, this is only start point of the project.
> > > > Anyway, here is one example of item in the list:
> > > > - Multi VirtQ: Every QP has two rings and every CQ has one. This means that
> > > >    in order to support for example 32K QPs we will need 64K VirtQ. Not sure
> > > >    that this is reasonable so one option is to have one for all and
> > > >    multiplex the traffic on it. This is not good approach as by design it
> > > >    introducing an optional starvation. Another approach would be multi
> > > >    queues and round-robin (for example) between them.
> > > >
> Typically there will be a one-to-one mapping between QPs and CPUs (on the
> guest). So while one would need to be prepared to support quite some QPs,
> the expectation is that the actual number of QPs used will be rather low.
> In a similar vein, multiplexing QPs would be defeating the purpose, as the
> overall idea was to have _independent_ QPs to enhance parallelism.
>
> > > > Expectations from this posting:
> > > > In general, any comment is welcome, starting from hey, drop this as it is a
> > > > very bad idea, to yeah, go ahead, we really want it.
> > > > Idea here is that since it is not a minor effort i first want to know if
> > > > there is some sort interest in the community for such device.
> > >
> > > My first reaction is: Sounds sensible, but it would be good to have a
> > > spec for this :)
> > >
> > > You'll need a spec if you want this to go forward anyway, so at least a
> > > sketch would be good to answer questions such as how many virtqueues
> > > you use for which purpose, what is actually put on the virtqueues,
> > > whether there are negotiable features, and what the expectations for
> > > the device and the driver are. It also makes it easier to understand
> > > how this is supposed to work in practice.
> > >
> > > If folks agree that this sounds useful, the next step would be to
> > > reserve an id for the device type.
> >
> > Thanks for the tips, will sure do that, it is that first i wanted to make
> > sure there is a use case here.
> >
> > Waiting for any feedback from the community.
> >
> I really do like the ides; in fact, it saved me from coding a similar thing
> myself :-)
>
> However, I'm still curious about the overall intent of this driver. Where
> would the I/O be routed _to_ ?
> It's nice that we have a virtualized driver, but this driver is
> intended to do I/O (even if it doesn't _do_ any I/O ATM :-)
> And this I/O needs to be send to (and possibly received from)
> something.
>
> So what exactly is this something?
> An existing piece of HW on the host?
> If so, wouldn't it be more efficient to use vfio, either by using SR-IOV or
> by using virtio-mdev?
>
> Another guest?
> If so, how would we route the I/O from one guest to the other?
> Shared memory? Implementing a full-blown RDMA switch in qemu?
>
> Oh, and I would _love_ to have a discussion about this at KVM Forum.
> Maybe I'll manage to whip up guest-to-guest RDMA connection using ivshmem
> ... let's see.

Following success in previous years to transfer ideas into code,
we started to prepare RDMA miniconference in LPC 2019, which will
be co-located with Kernel Summit and networking track.

I'm confident that such broad audience of kernel developers
will be good fit for such discussion.

Previous years:
2016: https://www.spinics.net/lists/linux-rdma/msg43074.html
2017: https://lwn.net/Articles/734163/
2018: It was so full in audience and intensive that I failed to
summarize it :(

Thanks

>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke            Teamlead Storage & Networking
> hare@suse.de                              +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N??rnberg
> GF: Felix Imend??rffer, Mary Higgins, Sri Rasiah
> HRB 21284 (AG N??rnberg)
Jason Gunthorpe April 22, 2019, 4:45 p.m. UTC | #12
On Fri, Apr 19, 2019 at 01:16:06PM +0200, Hannes Reinecke wrote:
> On 4/15/19 12:35 PM, Yuval Shaia wrote:
> > On Thu, Apr 11, 2019 at 07:02:15PM +0200, Cornelia Huck wrote:
> > > On Thu, 11 Apr 2019 14:01:54 +0300
> > > Yuval Shaia <yuval.shaia@oracle.com> wrote:
> > > 
> > > > Data center backends use more and more RDMA or RoCE devices and more and
> > > > more software runs in virtualized environment.
> > > > There is a need for a standard to enable RDMA/RoCE on Virtual Machines.
> > > > 
> > > > Virtio is the optimal solution since is the de-facto para-virtualizaton
> > > > technology and also because the Virtio specification
> > > > allows Hardware Vendors to support Virtio protocol natively in order to
> > > > achieve bare metal performance.
> > > > 
> > > > This RFC is an effort to addresses challenges in defining the RDMA/RoCE
> > > > Virtio Specification and a look forward on possible implementation
> > > > techniques.
> > > > 
> > > > Open issues/Todo list:
> > > > List is huge, this is only start point of the project.
> > > > Anyway, here is one example of item in the list:
> > > > - Multi VirtQ: Every QP has two rings and every CQ has one. This means that
> > > >    in order to support for example 32K QPs we will need 64K VirtQ. Not sure
> > > >    that this is reasonable so one option is to have one for all and
> > > >    multiplex the traffic on it. This is not good approach as by design it
> > > >    introducing an optional starvation. Another approach would be multi
> > > >    queues and round-robin (for example) between them.
> > > > 
> Typically there will be a one-to-one mapping between QPs and CPUs (on the
> guest). 

Er we are really overloading words here.. The typical expectation is
that a 'RDMA QP' will have thousands and thousands of instances on a
system.

Most likely I think mapping 1:1 a virtio queue to a 'RDMA QP, CQ, SRQ,
etc' is a bad idea...

> However, I'm still curious about the overall intent of this driver. Where
> would the I/O be routed _to_ ?
> It's nice that we have a virtualized driver, but this driver is
> intended to do I/O (even if it doesn't _do_ any I/O ATM :-)
> And this I/O needs to be send to (and possibly received from)
> something.

As yet I have never heard of public RDMA HW that could be coupled to a
virtio scheme. All HW defines their own queue ring buffer formats
without standardization.

> If so, wouldn't it be more efficient to use vfio, either by using SR-IOV or
> by using virtio-mdev?

Using PCI pass through means the guest has to have drivers for the
device. A generic, perhaps slower, virtio path has some appeal in some
cases.

> If so, how would we route the I/O from one guest to the other?
> Shared memory? Implementing a full-blown RDMA switch in qemu?

RoCE rides over the existing ethernet switching layer quemu plugs
into

So if you built a shared memory, local host only, virtio-rdma then
you'd probably run through the ethernet switch upon connection
establishment to match the participating VMs.

Jason
Yuval Shaia April 30, 2019, 12:16 p.m. UTC | #13
On Fri, Apr 19, 2019 at 01:16:06PM +0200, Hannes Reinecke wrote:
> On 4/15/19 12:35 PM, Yuval Shaia wrote:
> > On Thu, Apr 11, 2019 at 07:02:15PM +0200, Cornelia Huck wrote:
> > > On Thu, 11 Apr 2019 14:01:54 +0300
> > > Yuval Shaia <yuval.shaia@oracle.com> wrote:
> > > 
> > > > Data center backends use more and more RDMA or RoCE devices and more and
> > > > more software runs in virtualized environment.
> > > > There is a need for a standard to enable RDMA/RoCE on Virtual Machines.
> > > > 
> > > > Virtio is the optimal solution since is the de-facto para-virtualizaton
> > > > technology and also because the Virtio specification
> > > > allows Hardware Vendors to support Virtio protocol natively in order to
> > > > achieve bare metal performance.
> > > > 
> > > > This RFC is an effort to addresses challenges in defining the RDMA/RoCE
> > > > Virtio Specification and a look forward on possible implementation
> > > > techniques.
> > > > 
> > > > Open issues/Todo list:
> > > > List is huge, this is only start point of the project.
> > > > Anyway, here is one example of item in the list:
> > > > - Multi VirtQ: Every QP has two rings and every CQ has one. This means that
> > > >    in order to support for example 32K QPs we will need 64K VirtQ. Not sure
> > > >    that this is reasonable so one option is to have one for all and
> > > >    multiplex the traffic on it. This is not good approach as by design it
> > > >    introducing an optional starvation. Another approach would be multi
> > > >    queues and round-robin (for example) between them.
> > > > 
> Typically there will be a one-to-one mapping between QPs and CPUs (on the
> guest). So while one would need to be prepared to support quite some QPs,
> the expectation is that the actual number of QPs used will be rather low.
> In a similar vein, multiplexing QPs would be defeating the purpose, as the
> overall idea was to have _independent_ QPs to enhance parallelism.

Since Jason already addresses the issue then i'll skip it.

> 
> > > > Expectations from this posting:
> > > > In general, any comment is welcome, starting from hey, drop this as it is a
> > > > very bad idea, to yeah, go ahead, we really want it.
> > > > Idea here is that since it is not a minor effort i first want to know if
> > > > there is some sort interest in the community for such device.
> > > 
> > > My first reaction is: Sounds sensible, but it would be good to have a
> > > spec for this :)
> > > 
> > > You'll need a spec if you want this to go forward anyway, so at least a
> > > sketch would be good to answer questions such as how many virtqueues
> > > you use for which purpose, what is actually put on the virtqueues,
> > > whether there are negotiable features, and what the expectations for
> > > the device and the driver are. It also makes it easier to understand
> > > how this is supposed to work in practice.
> > > 
> > > If folks agree that this sounds useful, the next step would be to
> > > reserve an id for the device type.
> > 
> > Thanks for the tips, will sure do that, it is that first i wanted to make
> > sure there is a use case here.
> > 
> > Waiting for any feedback from the community.
> > 
> I really do like the ides; in fact, it saved me from coding a similar thing
> myself :-)

Isn't it the great thing with open source :-)

> 
> However, I'm still curious about the overall intent of this driver. Where
> would the I/O be routed _to_ ?
> It's nice that we have a virtualized driver, but this driver is
> intended to do I/O (even if it doesn't _do_ any I/O ATM :-)
> And this I/O needs to be send to (and possibly received from)
> something.

Idea is to have a virtio-rdma device emulation (patch #2) on host that will
relay the traffic to the real HW on host.

It will be good to have design that will allow Virtio-HW to be plugged to
the host and use the same driver. In this case the emulated device would
not be needed - the driver will "attach" to the Virtqueue exposed by the
virtio-HW instead of the emulated RDMA device.

I don't know of any public virtio-rdma HW.

> 
> So what exactly is this something?
> An existing piece of HW on the host?
> If so, wouldn't it be more efficient to use vfio, either by using SR-IOV or
> by using virtio-mdev?

vfio needs to be implemented by every HW vendor where this approach is a
generic one that is not depended on the HW.

SV-IOV has it's limitations.

And with virtio-mdev, sorry but do not know, can you elaborate more?

> 
> Another guest?

No

> If so, how would we route the I/O from one guest to the other?
> Shared memory? Implementing a full-blown RDMA switch in qemu?
> 
> Oh, and I would _love_ to have a discussion about this at KVM Forum.
> Maybe I'll manage to whip up guest-to-guest RDMA connection using ivshmem
> ... let's see.

Well, I've posted a proposal for a talk, lets see if it'll be accepted.

> 
> Cheers,
> 
> Hannes
> -- 
> Dr. Hannes Reinecke            Teamlead Storage & Networking
> hare@suse.de                              +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
> HRB 21284 (AG Nürnberg)
Yuval Shaia April 30, 2019, 5:13 p.m. UTC | #14
On Mon, Apr 22, 2019 at 01:45:27PM -0300, Jason Gunthorpe wrote:
> On Fri, Apr 19, 2019 at 01:16:06PM +0200, Hannes Reinecke wrote:
> > On 4/15/19 12:35 PM, Yuval Shaia wrote:
> > > On Thu, Apr 11, 2019 at 07:02:15PM +0200, Cornelia Huck wrote:
> > > > On Thu, 11 Apr 2019 14:01:54 +0300
> > > > Yuval Shaia <yuval.shaia@oracle.com> wrote:
> > > > 
> > > > > Data center backends use more and more RDMA or RoCE devices and more and
> > > > > more software runs in virtualized environment.
> > > > > There is a need for a standard to enable RDMA/RoCE on Virtual Machines.
> > > > > 
> > > > > Virtio is the optimal solution since is the de-facto para-virtualizaton
> > > > > technology and also because the Virtio specification
> > > > > allows Hardware Vendors to support Virtio protocol natively in order to
> > > > > achieve bare metal performance.
> > > > > 
> > > > > This RFC is an effort to addresses challenges in defining the RDMA/RoCE
> > > > > Virtio Specification and a look forward on possible implementation
> > > > > techniques.
> > > > > 
> > > > > Open issues/Todo list:
> > > > > List is huge, this is only start point of the project.
> > > > > Anyway, here is one example of item in the list:
> > > > > - Multi VirtQ: Every QP has two rings and every CQ has one. This means that
> > > > >    in order to support for example 32K QPs we will need 64K VirtQ. Not sure
> > > > >    that this is reasonable so one option is to have one for all and
> > > > >    multiplex the traffic on it. This is not good approach as by design it
> > > > >    introducing an optional starvation. Another approach would be multi
> > > > >    queues and round-robin (for example) between them.
> > > > > 
> > Typically there will be a one-to-one mapping between QPs and CPUs (on the
> > guest). 
> 
> Er we are really overloading words here.. The typical expectation is
> that a 'RDMA QP' will have thousands and thousands of instances on a
> system.
> 
> Most likely I think mapping 1:1 a virtio queue to a 'RDMA QP, CQ, SRQ,
> etc' is a bad idea...

We have three options, no virtqueue for QP, 1 to 1 or multiplexing. What
would be your vote on that?
I think you are for option #1, right? but in this case there is actually no
use of having a virtio-driver, isn't it?

> 
> > However, I'm still curious about the overall intent of this driver. Where
> > would the I/O be routed _to_ ?
> > It's nice that we have a virtualized driver, but this driver is
> > intended to do I/O (even if it doesn't _do_ any I/O ATM :-)
> > And this I/O needs to be send to (and possibly received from)
> > something.
> 
> As yet I have never heard of public RDMA HW that could be coupled to a
> virtio scheme. All HW defines their own queue ring buffer formats
> without standardization.

With virtio it is the time to have a standard, do you agree?

> 
> > If so, wouldn't it be more efficient to use vfio, either by using SR-IOV or
> > by using virtio-mdev?
> 
> Using PCI pass through means the guest has to have drivers for the
> device. A generic, perhaps slower, virtio path has some appeal in some
> cases.

From experience we have with other emulated device the gap is getting lower
as the message size getting higher. So for example with message of size 2M
the emulated device gives close to line rate performances.

> 
> > If so, how would we route the I/O from one guest to the other?
> > Shared memory? Implementing a full-blown RDMA switch in qemu?
> 
> RoCE rides over the existing ethernet switching layer quemu plugs
> into
> 
> So if you built a shared memory, local host only, virtio-rdma then
> you'd probably run through the ethernet switch upon connection
> establishment to match the participating VMs.

Or you may use an enhanced rxe device, which bypass the Ethernet and
perform fast copy, as backend device for the virtio-rdma emulated device.

> 
> Jason
Yuval Shaia April 30, 2019, 5:16 p.m. UTC | #15
On Mon, Apr 22, 2019 at 09:00:34AM +0300, Leon Romanovsky wrote:
> On Fri, Apr 19, 2019 at 01:16:06PM +0200, Hannes Reinecke wrote:
> > On 4/15/19 12:35 PM, Yuval Shaia wrote:
> > > On Thu, Apr 11, 2019 at 07:02:15PM +0200, Cornelia Huck wrote:
> > > > On Thu, 11 Apr 2019 14:01:54 +0300
> > > > Yuval Shaia <yuval.shaia@oracle.com> wrote:
> > > >
> > > > > Data center backends use more and more RDMA or RoCE devices and more and
> > > > > more software runs in virtualized environment.
> > > > > There is a need for a standard to enable RDMA/RoCE on Virtual Machines.
> > > > >
> > > > > Virtio is the optimal solution since is the de-facto para-virtualizaton
> > > > > technology and also because the Virtio specification
> > > > > allows Hardware Vendors to support Virtio protocol natively in order to
> > > > > achieve bare metal performance.
> > > > >
> > > > > This RFC is an effort to addresses challenges in defining the RDMA/RoCE
> > > > > Virtio Specification and a look forward on possible implementation
> > > > > techniques.
> > > > >
> > > > > Open issues/Todo list:
> > > > > List is huge, this is only start point of the project.
> > > > > Anyway, here is one example of item in the list:
> > > > > - Multi VirtQ: Every QP has two rings and every CQ has one. This means that
> > > > >    in order to support for example 32K QPs we will need 64K VirtQ. Not sure
> > > > >    that this is reasonable so one option is to have one for all and
> > > > >    multiplex the traffic on it. This is not good approach as by design it
> > > > >    introducing an optional starvation. Another approach would be multi
> > > > >    queues and round-robin (for example) between them.
> > > > >
> > Typically there will be a one-to-one mapping between QPs and CPUs (on the
> > guest). So while one would need to be prepared to support quite some QPs,
> > the expectation is that the actual number of QPs used will be rather low.
> > In a similar vein, multiplexing QPs would be defeating the purpose, as the
> > overall idea was to have _independent_ QPs to enhance parallelism.
> >
> > > > > Expectations from this posting:
> > > > > In general, any comment is welcome, starting from hey, drop this as it is a
> > > > > very bad idea, to yeah, go ahead, we really want it.
> > > > > Idea here is that since it is not a minor effort i first want to know if
> > > > > there is some sort interest in the community for such device.
> > > >
> > > > My first reaction is: Sounds sensible, but it would be good to have a
> > > > spec for this :)
> > > >
> > > > You'll need a spec if you want this to go forward anyway, so at least a
> > > > sketch would be good to answer questions such as how many virtqueues
> > > > you use for which purpose, what is actually put on the virtqueues,
> > > > whether there are negotiable features, and what the expectations for
> > > > the device and the driver are. It also makes it easier to understand
> > > > how this is supposed to work in practice.
> > > >
> > > > If folks agree that this sounds useful, the next step would be to
> > > > reserve an id for the device type.
> > >
> > > Thanks for the tips, will sure do that, it is that first i wanted to make
> > > sure there is a use case here.
> > >
> > > Waiting for any feedback from the community.
> > >
> > I really do like the ides; in fact, it saved me from coding a similar thing
> > myself :-)
> >
> > However, I'm still curious about the overall intent of this driver. Where
> > would the I/O be routed _to_ ?
> > It's nice that we have a virtualized driver, but this driver is
> > intended to do I/O (even if it doesn't _do_ any I/O ATM :-)
> > And this I/O needs to be send to (and possibly received from)
> > something.
> >
> > So what exactly is this something?
> > An existing piece of HW on the host?
> > If so, wouldn't it be more efficient to use vfio, either by using SR-IOV or
> > by using virtio-mdev?
> >
> > Another guest?
> > If so, how would we route the I/O from one guest to the other?
> > Shared memory? Implementing a full-blown RDMA switch in qemu?
> >
> > Oh, and I would _love_ to have a discussion about this at KVM Forum.
> > Maybe I'll manage to whip up guest-to-guest RDMA connection using ivshmem
> > ... let's see.
> 
> Following success in previous years to transfer ideas into code,
> we started to prepare RDMA miniconference in LPC 2019, which will
> be co-located with Kernel Summit and networking track.
> 
> I'm confident that such broad audience of kernel developers
> will be good fit for such discussion.

Just posted a proposal for a talk at Linux Plumbers.

> 
> Previous years:
> 2016: https://www.spinics.net/lists/linux-rdma/msg43074.html
> 2017: https://lwn.net/Articles/734163/
> 2018: It was so full in audience and intensive that I failed to
> summarize it :(
> 
> Thanks
> 
> >
> > Cheers,
> >
> > Hannes
> > --
> > Dr. Hannes Reinecke            Teamlead Storage & Networking
> > hare@suse.de                              +49 911 74053 688
> > SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N??rnberg
> > GF: Felix Imend??rffer, Mary Higgins, Sri Rasiah
> > HRB 21284 (AG N??rnberg)
Jason Gunthorpe May 7, 2019, 7:43 p.m. UTC | #16
On Tue, Apr 30, 2019 at 08:13:54PM +0300, Yuval Shaia wrote:
> On Mon, Apr 22, 2019 at 01:45:27PM -0300, Jason Gunthorpe wrote:
> > On Fri, Apr 19, 2019 at 01:16:06PM +0200, Hannes Reinecke wrote:
> > > On 4/15/19 12:35 PM, Yuval Shaia wrote:
> > > > On Thu, Apr 11, 2019 at 07:02:15PM +0200, Cornelia Huck wrote:
> > > > > On Thu, 11 Apr 2019 14:01:54 +0300
> > > > > Yuval Shaia <yuval.shaia@oracle.com> wrote:
> > > > > 
> > > > > > Data center backends use more and more RDMA or RoCE devices and more and
> > > > > > more software runs in virtualized environment.
> > > > > > There is a need for a standard to enable RDMA/RoCE on Virtual Machines.
> > > > > > 
> > > > > > Virtio is the optimal solution since is the de-facto para-virtualizaton
> > > > > > technology and also because the Virtio specification
> > > > > > allows Hardware Vendors to support Virtio protocol natively in order to
> > > > > > achieve bare metal performance.
> > > > > > 
> > > > > > This RFC is an effort to addresses challenges in defining the RDMA/RoCE
> > > > > > Virtio Specification and a look forward on possible implementation
> > > > > > techniques.
> > > > > > 
> > > > > > Open issues/Todo list:
> > > > > > List is huge, this is only start point of the project.
> > > > > > Anyway, here is one example of item in the list:
> > > > > > - Multi VirtQ: Every QP has two rings and every CQ has one. This means that
> > > > > >    in order to support for example 32K QPs we will need 64K VirtQ. Not sure
> > > > > >    that this is reasonable so one option is to have one for all and
> > > > > >    multiplex the traffic on it. This is not good approach as by design it
> > > > > >    introducing an optional starvation. Another approach would be multi
> > > > > >    queues and round-robin (for example) between them.
> > > > > > 
> > > Typically there will be a one-to-one mapping between QPs and CPUs (on the
> > > guest). 
> > 
> > Er we are really overloading words here.. The typical expectation is
> > that a 'RDMA QP' will have thousands and thousands of instances on a
> > system.
> > 
> > Most likely I think mapping 1:1 a virtio queue to a 'RDMA QP, CQ, SRQ,
> > etc' is a bad idea...
> 
> We have three options, no virtqueue for QP, 1 to 1 or multiplexing. What
> would be your vote on that?
> I think you are for option #1, right? but in this case there is actually no
> use of having a virtio-driver, isn't it?

The virtio driver is supposed to be a standard, like a hardware
standard, for doing the operation.

It doesn't mean that every single element under the driver needs to
use the virtio format QP.

Jason