mbox series

[0/7] Introduce vdpa management tool

Message ID 20201112064005.349268-1-parav@nvidia.com (mailing list archive)
Headers show
Series Introduce vdpa management tool | expand

Message

Parav Pandit Nov. 12, 2020, 6:39 a.m. UTC
This patchset covers user requirements for managing existing vdpa devices,
using a tool and its internal design notes for kernel drivers.

Background and user requirements:
----------------------------------
(1) Currently VDPA device is created by driver when driver is loaded.
However, user should have a choice when to create or not create a vdpa device
for the underlying parent device.

For example, mlx5 PCI VF and subfunction device supports multiple classes of
device such netdev, vdpa, rdma. Howevever it is not required to always created
vdpa device for such device.

(2) In another use case, a device may support creating one or multiple vdpa
device of same or different class such as net and block.
Creating vdpa devices at driver load time further limits this use case.

(3) A user should be able to monitor and query vdpa queue level or device level
statistics for a given vdpa device.

(4) A user should be able to query what class of vdpa devices are supported
by its parent device.

(5) A user should be able to view supported features and negotiated features
of the vdpa device.

(6) A user should be able to create a vdpa device in vendor agnostic manner
using single tool.

Hence, it is required to have a tool through which user can create one or more
vdpa devices from a parent device which addresses above user requirements.

Example devices:
----------------
 +-----------+ +-----------+ +---------+ +--------+ +-----------+ 
 |vdpa dev 0 | |vdpa dev 1 | |rdma dev | |netdev  | |vdpa dev 3 |
 |type=net   | |type=block | |mlx5_0   | |ens3f0  | |type=net   |
 +----+------+ +-----+-----+ +----+----+ +-----+--+ +----+------+
      |              |            |            |         |
      |              |            |            |         |
 +----+-----+        |       +----+----+       |    +----+----+
 |  mlx5    +--------+       |mlx5     +-------+    |mlx5     |
 |pci vf 2  |                |pci vf 4 |            |pci sf 8 |
 |03:00:2   |                |03:00.4  |            |mlx5_sf.8|
 +----+-----+                +----+----+            +----+----+
      |                           |                      |
      |                      +----+-----+                |
      +----------------------+mlx5      +----------------+
                             |pci pf 0  |
                             |03:00.0   |
                             +----------+

vdpa tool:
----------
vdpa tool is a tool to create, delete vdpa devices from a parent device. It is a
tool that enables user to query statistics, features and may be more attributes
in future.

vdpa tool command draft:
------------------------
(a) List parent devices which supports creating vdpa devices.
It also shows which class types supported by this parent device.
In below command example two parent devices support vdpa device creation.
First is PCI VF whose bdf is 03.00:2.
Second is PCI VF whose name is 03:00.4.
Third is PCI SF whose name is mlx5_core.sf.8

$ vdpa parentdev list
vdpasim
  supported_classes
    net
pci/0000:03.00:3
  supported_classes
    net block
pci/0000:03.00:4
  supported_classes
    net block
auxiliary/mlx5_core.sf.8
  supported_classes
    net

(b) Now add a vdpa device of networking class and show the device.
$ vdpa dev add parentdev pci/0000:03.00:2 type net name foo0 $ vdpa dev show foo0
foo0: parentdev pci/0000:03.00:2 type network parentdev vdpasim vendor_id 0 max_vqs 2 max_vq_size 256

(c) Show features of a vdpa device
$ vdpa dev features show foo0
supported
  iommu platform
  version 1

(d) Dump vdpa device statistics
$ vdpa dev stats show foo0
kickdoorbells 10
wqes 100

(e) Now delete a vdpa device previously created.
$ vdpa dev del foo0

vdpa tool support in this patchset:
-----------------------------------
vdpa tool is created to create, delete and query vdpa devices.
examples:
Show vdpa parent device that supports creating, deleting vdpa devices.

$ vdpa parentdev show
vdpasim:
  supported_classes
    net

$ vdpa parentdev show -jp
{
    "show": {
       "vdpasim": {
          "supported_classes": {
             "net"
        }
    }
}

Create a vdpa device of type networking named as "foo2" from the parent device vdpasim:

$ vdpa dev add parentdev vdpasim type net name foo2

Show the newly created vdpa device by its name:
$ vdpa dev show foo2
foo2: type network parentdev vdpasim vendor_id 0 max_vqs 2 max_vq_size 256

$ vdpa dev show foo2 -jp
{
    "dev": {
        "foo2": {
            "type": "network",
            "parentdev": "vdpasim",
            "vendor_id": 0,
            "max_vqs": 2,
            "max_vq_size": 256
        }
    }
}

Delete the vdpa device after its use:
$ vdpa dev del foo2

vdpa tool support by kernel:
----------------------------
vdpa tool user interface will be supported by existing vdpa kernel framework,
i.e. drivers/vdpa/vdpa.c It services user command through a netlink interface.

Each parent device registers supported callback operations with vdpa subsystem
through which vdpa device(s) can be managed.

FAQs:
-----
1. Where does userspace vdpa tool reside which users can use?
Ans: vdpa tool can possibly reside in iproute2 [1] as it enables user to
create vdpa net devices.

2. Why not create and delete vdpa device using sysfs/configfs?
Ans:
(a) A device creation may involve passing one or more attributes.
Passing multiple attributes and returning error code and more verbose
information for invalid attributes cannot be handled by sysfs/configfs.

(b) netlink framework is rich that enables user space and kernel driver to
provide nested attributes.

(c) Exposing device specific file under sysfs without net namespace
awareness exposes details to multiple containers. Instead exposing
attributes via a netlink socket secures the communication channel with kernel.

(d) netlink socket interface enables to run syscaller kernel tests.

3. Why not use ioctl() interface?
Ans: ioctl() interface replicates the necessary plumbing which already
exists through netlink socket.

4. What happens when one or more user created vdpa devices exist for a
parent PCI VF or SF and such parent device is removed?
Ans: All user created vdpa devices are removed that belong to a parent.

[1] git://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git

Next steps:
-----------
(a) Post this patchset and iproute2/vdpa inclusion, remaining two drivers
will be coverted to support vdpa tool instead of creating unmanaged default
device on driver load.
(b) More net specific parameters such as mac, mtu will be added.
(c) Features bits get and set interface will be added.

Parav Pandit (7):
  vdpa: Add missing comment for virtqueue count
  vdpa: Use simpler version of ida allocation
  vdpa: Extend routine to accept vdpa device name
  vdpa: Define vdpa parent device, ops and a netlink interface
  vdpa: Enable a user to add and delete a vdpa device
  vdpa: Enable user to query vdpa device info
  vdpa/vdpa_sim: Enable user to create vdpasim net devices

 drivers/vdpa/Kconfig              |   1 +
 drivers/vdpa/ifcvf/ifcvf_main.c   |   2 +-
 drivers/vdpa/mlx5/net/mlx5_vnet.c |   2 +-
 drivers/vdpa/vdpa.c               | 511 +++++++++++++++++++++++++++++-
 drivers/vdpa/vdpa_sim/vdpa_sim.c  |  81 ++++-
 include/linux/vdpa.h              |  46 ++-
 include/uapi/linux/vdpa.h         |  41 +++
 7 files changed, 660 insertions(+), 24 deletions(-)
 create mode 100644 include/uapi/linux/vdpa.h

Comments

Stefan Hajnoczi Nov. 16, 2020, 9:41 a.m. UTC | #1
Great! A few questions and comments:

How are configuration parameters passed in during device creation
(e.g. MAC address, number of queues)?

Can configuration parameters be changed at runtime (e.g. link up/down)?

Does the configuration parameter interface distinguish between
standard and vendor-specific parameters? Are they namespaced to
prevent naming collisions?

How are software-only parent drivers supported? It's kind of a shame
to modprobe unconditionally if they won't be used. Does vdpatool have
some way of requesting loading a parent driver? That way software
drivers can be loaded on demand.

What is the benefit of making it part of iproute2? If there is not a
significant advantage like sharing code, then I suggest using a
separate repository and package so vdpatool can be installed
separately (e.g. even on AF_VSOCK-only guests without Ethernet).

Stefan
Jakub Kicinski Nov. 16, 2020, 10:23 p.m. UTC | #2
On Thu, 12 Nov 2020 08:39:58 +0200 Parav Pandit wrote:
> FAQs:
> -----
> 1. Where does userspace vdpa tool reside which users can use?
> Ans: vdpa tool can possibly reside in iproute2 [1] as it enables user to
> create vdpa net devices.
> 
> 2. Why not create and delete vdpa device using sysfs/configfs?
> Ans:

> 3. Why not use ioctl() interface?

Obviously I'm gonna ask you - why can't you use devlink?

> Next steps:
> -----------
> (a) Post this patchset and iproute2/vdpa inclusion, remaining two drivers
> will be coverted to support vdpa tool instead of creating unmanaged default
> device on driver load.
> (b) More net specific parameters such as mac, mtu will be added.

How does MAC and MTU belong in this new VDPA thing?
Parav Pandit Nov. 17, 2020, 7:41 p.m. UTC | #3
> From: Stefan Hajnoczi <stefanha@gmail.com>
> Sent: Monday, November 16, 2020 3:11 PM
> Great! A few questions and comments:
> 
> How are configuration parameters passed in during device creation (e.g.
> MAC address, number of queues)?
During device creation time more parameters to be added.
> 
> Can configuration parameters be changed at runtime (e.g. link up/down)?
> 
For representor eswitch based devices, it is usually controlled through it.
For others, I haven't thought about it. If the device supports it, I believe so.
If multiple vpda devices are created over single VF/PF/SF, virtualizing the link for up/down (not just changing the vdpa config bits) can be a challenge.

> Does the configuration parameter interface distinguish between standard
> and vendor-specific parameters? Are they namespaced to prevent naming
> collisions?
Do you have an example of vendor specific parameters?
Since this tool exposes virtio compliant vdpa devices, I didn't consider any vendor specific params.

> 
> How are software-only parent drivers supported? It's kind of a shame to
> modprobe unconditionally if they won't be used. Does vdpatool have some
> way of requesting loading a parent driver? That way software drivers can be
> loaded on demand.
Well, since each parent or management device registers for it, and their type is same, there isn't a way right not to auto load the module.
This will require user to learn what type of vendor device driver to be loaded, which kinds of defeats the purpose.

> 
> What is the benefit of making it part of iproute2? If there is not a significant
> advantage like sharing code, then I suggest using a separate repository and
> package so vdpatool can be installed separately (e.g. even on AF_VSOCK-
> only guests without Ethernet).
Given that vdpa tool intents to create network specific devices, iproute2 seems a better fit than a own repository.
It mainly uses libmnl.

> 
> Stefan
Parav Pandit Nov. 17, 2020, 7:51 p.m. UTC | #4
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Tuesday, November 17, 2020 3:53 AM
> 
> On Thu, 12 Nov 2020 08:39:58 +0200 Parav Pandit wrote:
> > FAQs:
> > -----
> > 1. Where does userspace vdpa tool reside which users can use?
> > Ans: vdpa tool can possibly reside in iproute2 [1] as it enables user
> > to create vdpa net devices.
> >
> > 2. Why not create and delete vdpa device using sysfs/configfs?
> > Ans:
> 
> > 3. Why not use ioctl() interface?
> 
> Obviously I'm gonna ask you - why can't you use devlink?
> 
This was considered.
However it seems that extending devlink for vdpa specific stats, devices, config sounds overloading devlink beyond its defined scope.

> > Next steps:
> > -----------
> > (a) Post this patchset and iproute2/vdpa inclusion, remaining two
> > drivers will be coverted to support vdpa tool instead of creating
> > unmanaged default device on driver load.
> > (b) More net specific parameters such as mac, mtu will be added.
> 
> How does MAC and MTU belong in this new VDPA thing?
MAC only make sense when user wants to run VF/SF Netdev and vdpa together with different mac address.
Otherwise existing devlink well defined API to have one MAC per function is fine.
Same for MTU, if queues of vdpa vs VF/SF Netdev queues wants have different MTU it make sense to add configure per vdpa device.
Jason Wang Nov. 27, 2020, 3:53 a.m. UTC | #5
On 2020/11/12 下午2:39, Parav Pandit wrote:
> This patchset covers user requirements for managing existing vdpa devices,
> using a tool and its internal design notes for kernel drivers.
>
> Background and user requirements:
> ----------------------------------
> (1) Currently VDPA device is created by driver when driver is loaded.
> However, user should have a choice when to create or not create a vdpa device
> for the underlying parent device.
>
> For example, mlx5 PCI VF and subfunction device supports multiple classes of
> device such netdev, vdpa, rdma. Howevever it is not required to always created
> vdpa device for such device.
>
> (2) In another use case, a device may support creating one or multiple vdpa
> device of same or different class such as net and block.
> Creating vdpa devices at driver load time further limits this use case.
>
> (3) A user should be able to monitor and query vdpa queue level or device level
> statistics for a given vdpa device.
>
> (4) A user should be able to query what class of vdpa devices are supported
> by its parent device.
>
> (5) A user should be able to view supported features and negotiated features
> of the vdpa device.
>
> (6) A user should be able to create a vdpa device in vendor agnostic manner
> using single tool.
>
> Hence, it is required to have a tool through which user can create one or more
> vdpa devices from a parent device which addresses above user requirements.
>
> Example devices:
> ----------------
>   +-----------+ +-----------+ +---------+ +--------+ +-----------+
>   |vdpa dev 0 | |vdpa dev 1 | |rdma dev | |netdev  | |vdpa dev 3 |
>   |type=net   | |type=block | |mlx5_0   | |ens3f0  | |type=net   |
>   +----+------+ +-----+-----+ +----+----+ +-----+--+ +----+------+
>        |              |            |            |         |
>        |              |            |            |         |
>   +----+-----+        |       +----+----+       |    +----+----+
>   |  mlx5    +--------+       |mlx5     +-------+    |mlx5     |
>   |pci vf 2  |                |pci vf 4 |            |pci sf 8 |
>   |03:00:2   |                |03:00.4  |            |mlx5_sf.8|
>   +----+-----+                +----+----+            +----+----+
>        |                           |                      |
>        |                      +----+-----+                |
>        +----------------------+mlx5      +----------------+
>                               |pci pf 0  |
>                               |03:00.0   |
>                               +----------+
>
> vdpa tool:
> ----------
> vdpa tool is a tool to create, delete vdpa devices from a parent device. It is a
> tool that enables user to query statistics, features and may be more attributes
> in future.
>
> vdpa tool command draft:
> ------------------------
> (a) List parent devices which supports creating vdpa devices.
> It also shows which class types supported by this parent device.
> In below command example two parent devices support vdpa device creation.
> First is PCI VF whose bdf is 03.00:2.
> Second is PCI VF whose name is 03:00.4.
> Third is PCI SF whose name is mlx5_core.sf.8
>
> $ vdpa parentdev list
> vdpasim
>    supported_classes
>      net
> pci/0000:03.00:3
>    supported_classes
>      net block
> pci/0000:03.00:4
>    supported_classes
>      net block
> auxiliary/mlx5_core.sf.8
>    supported_classes
>      net
>
> (b) Now add a vdpa device of networking class and show the device.
> $ vdpa dev add parentdev pci/0000:03.00:2 type net name foo0 $ vdpa dev show foo0
> foo0: parentdev pci/0000:03.00:2 type network parentdev vdpasim vendor_id 0 max_vqs 2 max_vq_size 256
>
> (c) Show features of a vdpa device
> $ vdpa dev features show foo0
> supported
>    iommu platform
>    version 1
>
> (d) Dump vdpa device statistics
> $ vdpa dev stats show foo0
> kickdoorbells 10
> wqes 100
>
> (e) Now delete a vdpa device previously created.
> $ vdpa dev del foo0
>
> vdpa tool support in this patchset:
> -----------------------------------
> vdpa tool is created to create, delete and query vdpa devices.
> examples:
> Show vdpa parent device that supports creating, deleting vdpa devices.
>
> $ vdpa parentdev show
> vdpasim:
>    supported_classes
>      net
>
> $ vdpa parentdev show -jp
> {
>      "show": {
>         "vdpasim": {
>            "supported_classes": {
>               "net"
>          }
>      }
> }
>
> Create a vdpa device of type networking named as "foo2" from the parent device vdpasim:
>
> $ vdpa dev add parentdev vdpasim type net name foo2
>
> Show the newly created vdpa device by its name:
> $ vdpa dev show foo2
> foo2: type network parentdev vdpasim vendor_id 0 max_vqs 2 max_vq_size 256
>
> $ vdpa dev show foo2 -jp
> {
>      "dev": {
>          "foo2": {
>              "type": "network",
>              "parentdev": "vdpasim",
>              "vendor_id": 0,
>              "max_vqs": 2,
>              "max_vq_size": 256
>          }
>      }
> }
>
> Delete the vdpa device after its use:
> $ vdpa dev del foo2
>
> vdpa tool support by kernel:
> ----------------------------
> vdpa tool user interface will be supported by existing vdpa kernel framework,
> i.e. drivers/vdpa/vdpa.c It services user command through a netlink interface.
>
> Each parent device registers supported callback operations with vdpa subsystem
> through which vdpa device(s) can be managed.
>
> FAQs:
> -----
> 1. Where does userspace vdpa tool reside which users can use?
> Ans: vdpa tool can possibly reside in iproute2 [1] as it enables user to
> create vdpa net devices.
>
> 2. Why not create and delete vdpa device using sysfs/configfs?
> Ans:
> (a) A device creation may involve passing one or more attributes.
> Passing multiple attributes and returning error code and more verbose
> information for invalid attributes cannot be handled by sysfs/configfs.
>
> (b) netlink framework is rich that enables user space and kernel driver to
> provide nested attributes.
>
> (c) Exposing device specific file under sysfs without net namespace
> awareness exposes details to multiple containers. Instead exposing
> attributes via a netlink socket secures the communication channel with kernel.
>
> (d) netlink socket interface enables to run syscaller kernel tests.
>
> 3. Why not use ioctl() interface?
> Ans: ioctl() interface replicates the necessary plumbing which already
> exists through netlink socket.
>
> 4. What happens when one or more user created vdpa devices exist for a
> parent PCI VF or SF and such parent device is removed?
> Ans: All user created vdpa devices are removed that belong to a parent.
>
> [1] git://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git
>
> Next steps:
> -----------
> (a) Post this patchset and iproute2/vdpa inclusion, remaining two drivers
> will be coverted to support vdpa tool instead of creating unmanaged default
> device on driver load.
> (b) More net specific parameters such as mac, mtu will be added.
> (c) Features bits get and set interface will be added.


Adding Yong Ji for sharing some thoughts from the view of userspace vDPA 
device.

Thanks
Jason Wang Nov. 30, 2020, 3:36 a.m. UTC | #6
On 2020/11/27 下午1:52, Yongji Xie wrote:
> On Fri, Nov 27, 2020 at 11:53 AM Jason Wang <jasowang@redhat.com 
> <mailto:jasowang@redhat.com>> wrote:
>
>
>     On 2020/11/12 下午2:39, Parav Pandit wrote:
>     > This patchset covers user requirements for managing existing
>     vdpa devices,
>     > using a tool and its internal design notes for kernel drivers.
>     >
>     > Background and user requirements:
>     > ----------------------------------
>     > (1) Currently VDPA device is created by driver when driver is
>     loaded.
>     > However, user should have a choice when to create or not create
>     a vdpa device
>     > for the underlying parent device.
>     >
>     > For example, mlx5 PCI VF and subfunction device supports
>     multiple classes of
>     > device such netdev, vdpa, rdma. Howevever it is not required to
>     always created
>     > vdpa device for such device.
>     >
>     > (2) In another use case, a device may support creating one or
>     multiple vdpa
>     > device of same or different class such as net and block.
>     > Creating vdpa devices at driver load time further limits this
>     use case.
>     >
>     > (3) A user should be able to monitor and query vdpa queue level
>     or device level
>     > statistics for a given vdpa device.
>     >
>     > (4) A user should be able to query what class of vdpa devices
>     are supported
>     > by its parent device.
>     >
>     > (5) A user should be able to view supported features and
>     negotiated features
>     > of the vdpa device.
>     >
>     > (6) A user should be able to create a vdpa device in vendor
>     agnostic manner
>     > using single tool.
>     >
>     > Hence, it is required to have a tool through which user can
>     create one or more
>     > vdpa devices from a parent device which addresses above user
>     requirements.
>     >
>     > Example devices:
>     > ----------------
>     >   +-----------+ +-----------+ +---------+ +--------+ +-----------+
>     >   |vdpa dev 0 | |vdpa dev 1 | |rdma dev | |netdev  | |vdpa dev 3 |
>     >   |type=net   | |type=block | |mlx5_0   | |ens3f0  | |type=net   |
>     >   +----+------+ +-----+-----+ +----+----+ +-----+--+ +----+------+
>     >        |              |            |            |    |
>     >        |              |            |            |    |
>     >   +----+-----+        |       +----+----+       | +----+----+
>     >   |  mlx5    +--------+       |mlx5     +-------+ |mlx5     |
>     >   |pci vf 2  |                |pci vf 4 | |pci sf 8 |
>     >   |03:00:2   |                |03:00.4  | |mlx5_sf.8|
>     >   +----+-----+                +----+----+ +----+----+
>     >        |                           |   |
>     >        |                      +----+-----+   |
>     >        +----------------------+mlx5 +----------------+
>     >                               |pci pf 0  |
>     >                               |03:00.0   |
>     >                               +----------+
>     >
>     > vdpa tool:
>     > ----------
>     > vdpa tool is a tool to create, delete vdpa devices from a parent
>     device. It is a
>     > tool that enables user to query statistics, features and may be
>     more attributes
>     > in future.
>     >
>     > vdpa tool command draft:
>     > ------------------------
>     > (a) List parent devices which supports creating vdpa devices.
>     > It also shows which class types supported by this parent device.
>     > In below command example two parent devices support vdpa device
>     creation.
>     > First is PCI VF whose bdf is 03.00:2.
>     > Second is PCI VF whose name is 03:00.4.
>     > Third is PCI SF whose name is mlx5_core.sf.8
>     >
>     > $ vdpa parentdev list
>     > vdpasim
>     >    supported_classes
>     >      net
>     > pci/0000:03.00:3
>     >    supported_classes
>     >      net block
>     > pci/0000:03.00:4
>     >    supported_classes
>     >      net block
>     > auxiliary/mlx5_core.sf.8
>     >    supported_classes
>     >      net
>     >
>     > (b) Now add a vdpa device of networking class and show the device.
>     > $ vdpa dev add parentdev pci/0000:03.00:2 type net name foo0 $
>     vdpa dev show foo0
>     > foo0: parentdev pci/0000:03.00:2 type network parentdev vdpasim
>     vendor_id 0 max_vqs 2 max_vq_size 256
>     >
>     > (c) Show features of a vdpa device
>     > $ vdpa dev features show foo0
>     > supported
>     >    iommu platform
>     >    version 1
>     >
>     > (d) Dump vdpa device statistics
>     > $ vdpa dev stats show foo0
>     > kickdoorbells 10
>     > wqes 100
>     >
>     > (e) Now delete a vdpa device previously created.
>     > $ vdpa dev del foo0
>     >
>     > vdpa tool support in this patchset:
>     > -----------------------------------
>     > vdpa tool is created to create, delete and query vdpa devices.
>     > examples:
>     > Show vdpa parent device that supports creating, deleting vdpa
>     devices.
>     >
>     > $ vdpa parentdev show
>     > vdpasim:
>     >    supported_classes
>     >      net
>     >
>     > $ vdpa parentdev show -jp
>     > {
>     >      "show": {
>     >         "vdpasim": {
>     >            "supported_classes": {
>     >               "net"
>     >          }
>     >      }
>     > }
>     >
>     > Create a vdpa device of type networking named as "foo2" from the
>     parent device vdpasim:
>     >
>     > $ vdpa dev add parentdev vdpasim type net name foo2
>     >
>     > Show the newly created vdpa device by its name:
>     > $ vdpa dev show foo2
>     > foo2: type network parentdev vdpasim vendor_id 0 max_vqs 2
>     max_vq_size 256
>     >
>     > $ vdpa dev show foo2 -jp
>     > {
>     >      "dev": {
>     >          "foo2": {
>     >              "type": "network",
>     >              "parentdev": "vdpasim",
>     >              "vendor_id": 0,
>     >              "max_vqs": 2,
>     >              "max_vq_size": 256
>     >          }
>     >      }
>     > }
>     >
>     > Delete the vdpa device after its use:
>     > $ vdpa dev del foo2
>     >
>     > vdpa tool support by kernel:
>     > ----------------------------
>     > vdpa tool user interface will be supported by existing vdpa
>     kernel framework,
>     > i.e. drivers/vdpa/vdpa.c It services user command through a
>     netlink interface.
>     >
>     > Each parent device registers supported callback operations with
>     vdpa subsystem
>     > through which vdpa device(s) can be managed.
>     >
>     > FAQs:
>     > -----
>     > 1. Where does userspace vdpa tool reside which users can use?
>     > Ans: vdpa tool can possibly reside in iproute2 [1] as it enables
>     user to
>     > create vdpa net devices.
>     >
>     > 2. Why not create and delete vdpa device using sysfs/configfs?
>     > Ans:
>     > (a) A device creation may involve passing one or more attributes.
>     > Passing multiple attributes and returning error code and more
>     verbose
>     > information for invalid attributes cannot be handled by
>     sysfs/configfs.
>     >
>     > (b) netlink framework is rich that enables user space and kernel
>     driver to
>     > provide nested attributes.
>     >
>     > (c) Exposing device specific file under sysfs without net namespace
>     > awareness exposes details to multiple containers. Instead exposing
>     > attributes via a netlink socket secures the communication
>     channel with kernel.
>     >
>     > (d) netlink socket interface enables to run syscaller kernel tests.
>     >
>     > 3. Why not use ioctl() interface?
>     > Ans: ioctl() interface replicates the necessary plumbing which
>     already
>     > exists through netlink socket.
>     >
>     > 4. What happens when one or more user created vdpa devices exist
>     for a
>     > parent PCI VF or SF and such parent device is removed?
>     > Ans: All user created vdpa devices are removed that belong to a
>     parent.
>     >
>     > [1]
>     git://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git
>     <http://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git>
>     >
>     > Next steps:
>     > -----------
>     > (a) Post this patchset and iproute2/vdpa inclusion, remaining
>     two drivers
>     > will be coverted to support vdpa tool instead of creating
>     unmanaged default
>     > device on driver load.
>     > (b) More net specific parameters such as mac, mtu will be added.
>     > (c) Features bits get and set interface will be added.
>
>
>     Adding Yong Ji for sharing some thoughts from the view of
>     userspace vDPA
>     device.
>
>
> Thanks for adding me, Jason!
>
> Now I'm working on a v2 patchset for VDUSE (vDPA Device in Userspace) 
> [1]. This tool is very useful for the vduse device. So I'm considering 
> integrating this into my v2 patchset. But there is one problem:
>
> In this tool, vdpa device config action and enable action are combined 
> into one netlink msg: VDPA_CMD_DEV_NEW. But in vduse case, it needs to 
> be splitted because a chardev should be created and opened by a 
> userspace process before we enable the vdpa device (call 
> vdpa_register_device()).
>
> So I'd like to know whether it's possible (or have some plans) to add 
> two new netlink msgs something like: VDPA_CMD_DEV_ENABLE and 
> VDPA_CMD_DEV_DISABLE to make the config path more flexible.
>

Actually, we've discussed such intermediate step in some early 
discussion. It looks to me VDUSE could be one of the users of this.

Or I wonder whether we can switch to use anonymous inode(fd) for VDUSE 
then fetching it via an VDUSE_GET_DEVICE_FD ioctl?

Thanks


> Thanks,
> Yongji
>
> [1] https://www.spinics.net/lists/linux-mm/msg231576.html
Yongji Xie Nov. 30, 2020, 7:07 a.m. UTC | #7
On Mon, Nov 30, 2020 at 11:36 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2020/11/27 下午1:52, Yongji Xie wrote:
> > On Fri, Nov 27, 2020 at 11:53 AM Jason Wang <jasowang@redhat.com
> > <mailto:jasowang@redhat.com>> wrote:
> >
> >
> >     On 2020/11/12 下午2:39, Parav Pandit wrote:
> >     > This patchset covers user requirements for managing existing
> >     vdpa devices,
> >     > using a tool and its internal design notes for kernel drivers.
> >     >
> >     > Background and user requirements:
> >     > ----------------------------------
> >     > (1) Currently VDPA device is created by driver when driver is
> >     loaded.
> >     > However, user should have a choice when to create or not create
> >     a vdpa device
> >     > for the underlying parent device.
> >     >
> >     > For example, mlx5 PCI VF and subfunction device supports
> >     multiple classes of
> >     > device such netdev, vdpa, rdma. Howevever it is not required to
> >     always created
> >     > vdpa device for such device.
> >     >
> >     > (2) In another use case, a device may support creating one or
> >     multiple vdpa
> >     > device of same or different class such as net and block.
> >     > Creating vdpa devices at driver load time further limits this
> >     use case.
> >     >
> >     > (3) A user should be able to monitor and query vdpa queue level
> >     or device level
> >     > statistics for a given vdpa device.
> >     >
> >     > (4) A user should be able to query what class of vdpa devices
> >     are supported
> >     > by its parent device.
> >     >
> >     > (5) A user should be able to view supported features and
> >     negotiated features
> >     > of the vdpa device.
> >     >
> >     > (6) A user should be able to create a vdpa device in vendor
> >     agnostic manner
> >     > using single tool.
> >     >
> >     > Hence, it is required to have a tool through which user can
> >     create one or more
> >     > vdpa devices from a parent device which addresses above user
> >     requirements.
> >     >
> >     > Example devices:
> >     > ----------------
> >     >   +-----------+ +-----------+ +---------+ +--------+ +-----------+
> >     >   |vdpa dev 0 | |vdpa dev 1 | |rdma dev | |netdev  | |vdpa dev 3 |
> >     >   |type=net   | |type=block | |mlx5_0   | |ens3f0  | |type=net   |
> >     >   +----+------+ +-----+-----+ +----+----+ +-----+--+ +----+------+
> >     >        |              |            |            |    |
> >     >        |              |            |            |    |
> >     >   +----+-----+        |       +----+----+       | +----+----+
> >     >   |  mlx5    +--------+       |mlx5     +-------+ |mlx5     |
> >     >   |pci vf 2  |                |pci vf 4 | |pci sf 8 |
> >     >   |03:00:2   |                |03:00.4  | |mlx5_sf.8|
> >     >   +----+-----+                +----+----+ +----+----+
> >     >        |                           |   |
> >     >        |                      +----+-----+   |
> >     >        +----------------------+mlx5 +----------------+
> >     >                               |pci pf 0  |
> >     >                               |03:00.0   |
> >     >                               +----------+
> >     >
> >     > vdpa tool:
> >     > ----------
> >     > vdpa tool is a tool to create, delete vdpa devices from a parent
> >     device. It is a
> >     > tool that enables user to query statistics, features and may be
> >     more attributes
> >     > in future.
> >     >
> >     > vdpa tool command draft:
> >     > ------------------------
> >     > (a) List parent devices which supports creating vdpa devices.
> >     > It also shows which class types supported by this parent device.
> >     > In below command example two parent devices support vdpa device
> >     creation.
> >     > First is PCI VF whose bdf is 03.00:2.
> >     > Second is PCI VF whose name is 03:00.4.
> >     > Third is PCI SF whose name is mlx5_core.sf.8
> >     >
> >     > $ vdpa parentdev list
> >     > vdpasim
> >     >    supported_classes
> >     >      net
> >     > pci/0000:03.00:3
> >     >    supported_classes
> >     >      net block
> >     > pci/0000:03.00:4
> >     >    supported_classes
> >     >      net block
> >     > auxiliary/mlx5_core.sf.8
> >     >    supported_classes
> >     >      net
> >     >
> >     > (b) Now add a vdpa device of networking class and show the device.
> >     > $ vdpa dev add parentdev pci/0000:03.00:2 type net name foo0 $
> >     vdpa dev show foo0
> >     > foo0: parentdev pci/0000:03.00:2 type network parentdev vdpasim
> >     vendor_id 0 max_vqs 2 max_vq_size 256
> >     >
> >     > (c) Show features of a vdpa device
> >     > $ vdpa dev features show foo0
> >     > supported
> >     >    iommu platform
> >     >    version 1
> >     >
> >     > (d) Dump vdpa device statistics
> >     > $ vdpa dev stats show foo0
> >     > kickdoorbells 10
> >     > wqes 100
> >     >
> >     > (e) Now delete a vdpa device previously created.
> >     > $ vdpa dev del foo0
> >     >
> >     > vdpa tool support in this patchset:
> >     > -----------------------------------
> >     > vdpa tool is created to create, delete and query vdpa devices.
> >     > examples:
> >     > Show vdpa parent device that supports creating, deleting vdpa
> >     devices.
> >     >
> >     > $ vdpa parentdev show
> >     > vdpasim:
> >     >    supported_classes
> >     >      net
> >     >
> >     > $ vdpa parentdev show -jp
> >     > {
> >     >      "show": {
> >     >         "vdpasim": {
> >     >            "supported_classes": {
> >     >               "net"
> >     >          }
> >     >      }
> >     > }
> >     >
> >     > Create a vdpa device of type networking named as "foo2" from the
> >     parent device vdpasim:
> >     >
> >     > $ vdpa dev add parentdev vdpasim type net name foo2
> >     >
> >     > Show the newly created vdpa device by its name:
> >     > $ vdpa dev show foo2
> >     > foo2: type network parentdev vdpasim vendor_id 0 max_vqs 2
> >     max_vq_size 256
> >     >
> >     > $ vdpa dev show foo2 -jp
> >     > {
> >     >      "dev": {
> >     >          "foo2": {
> >     >              "type": "network",
> >     >              "parentdev": "vdpasim",
> >     >              "vendor_id": 0,
> >     >              "max_vqs": 2,
> >     >              "max_vq_size": 256
> >     >          }
> >     >      }
> >     > }
> >     >
> >     > Delete the vdpa device after its use:
> >     > $ vdpa dev del foo2
> >     >
> >     > vdpa tool support by kernel:
> >     > ----------------------------
> >     > vdpa tool user interface will be supported by existing vdpa
> >     kernel framework,
> >     > i.e. drivers/vdpa/vdpa.c It services user command through a
> >     netlink interface.
> >     >
> >     > Each parent device registers supported callback operations with
> >     vdpa subsystem
> >     > through which vdpa device(s) can be managed.
> >     >
> >     > FAQs:
> >     > -----
> >     > 1. Where does userspace vdpa tool reside which users can use?
> >     > Ans: vdpa tool can possibly reside in iproute2 [1] as it enables
> >     user to
> >     > create vdpa net devices.
> >     >
> >     > 2. Why not create and delete vdpa device using sysfs/configfs?
> >     > Ans:
> >     > (a) A device creation may involve passing one or more attributes.
> >     > Passing multiple attributes and returning error code and more
> >     verbose
> >     > information for invalid attributes cannot be handled by
> >     sysfs/configfs.
> >     >
> >     > (b) netlink framework is rich that enables user space and kernel
> >     driver to
> >     > provide nested attributes.
> >     >
> >     > (c) Exposing device specific file under sysfs without net namespace
> >     > awareness exposes details to multiple containers. Instead exposing
> >     > attributes via a netlink socket secures the communication
> >     channel with kernel.
> >     >
> >     > (d) netlink socket interface enables to run syscaller kernel tests.
> >     >
> >     > 3. Why not use ioctl() interface?
> >     > Ans: ioctl() interface replicates the necessary plumbing which
> >     already
> >     > exists through netlink socket.
> >     >
> >     > 4. What happens when one or more user created vdpa devices exist
> >     for a
> >     > parent PCI VF or SF and such parent device is removed?
> >     > Ans: All user created vdpa devices are removed that belong to a
> >     parent.
> >     >
> >     > [1]
> >     git://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git
> >     <http://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git>
> >     >
> >     > Next steps:
> >     > -----------
> >     > (a) Post this patchset and iproute2/vdpa inclusion, remaining
> >     two drivers
> >     > will be coverted to support vdpa tool instead of creating
> >     unmanaged default
> >     > device on driver load.
> >     > (b) More net specific parameters such as mac, mtu will be added.
> >     > (c) Features bits get and set interface will be added.
> >
> >
> >     Adding Yong Ji for sharing some thoughts from the view of
> >     userspace vDPA
> >     device.
> >
> >
> > Thanks for adding me, Jason!
> >
> > Now I'm working on a v2 patchset for VDUSE (vDPA Device in Userspace)
> > [1]. This tool is very useful for the vduse device. So I'm considering
> > integrating this into my v2 patchset. But there is one problem:
> >
> > In this tool, vdpa device config action and enable action are combined
> > into one netlink msg: VDPA_CMD_DEV_NEW. But in vduse case, it needs to
> > be splitted because a chardev should be created and opened by a
> > userspace process before we enable the vdpa device (call
> > vdpa_register_device()).
> >
> > So I'd like to know whether it's possible (or have some plans) to add
> > two new netlink msgs something like: VDPA_CMD_DEV_ENABLE and
> > VDPA_CMD_DEV_DISABLE to make the config path more flexible.
> >
>
> Actually, we've discussed such intermediate step in some early
> discussion. It looks to me VDUSE could be one of the users of this.
>
> Or I wonder whether we can switch to use anonymous inode(fd) for VDUSE
> then fetching it via an VDUSE_GET_DEVICE_FD ioctl?
>

Yes, we can. Actually the current implementation in VDUSE is like
this.  But seems like this is still a intermediate step. The fd should
be binded to a name or something else which need to be configured
before.

Thanks,
Yongji
Jason Wang Dec. 1, 2020, 6:25 a.m. UTC | #8
On 2020/11/30 下午3:07, Yongji Xie wrote:
>>> Thanks for adding me, Jason!
>>>
>>> Now I'm working on a v2 patchset for VDUSE (vDPA Device in Userspace)
>>> [1]. This tool is very useful for the vduse device. So I'm considering
>>> integrating this into my v2 patchset. But there is one problem:
>>>
>>> In this tool, vdpa device config action and enable action are combined
>>> into one netlink msg: VDPA_CMD_DEV_NEW. But in vduse case, it needs to
>>> be splitted because a chardev should be created and opened by a
>>> userspace process before we enable the vdpa device (call
>>> vdpa_register_device()).
>>>
>>> So I'd like to know whether it's possible (or have some plans) to add
>>> two new netlink msgs something like: VDPA_CMD_DEV_ENABLE and
>>> VDPA_CMD_DEV_DISABLE to make the config path more flexible.
>>>
>> Actually, we've discussed such intermediate step in some early
>> discussion. It looks to me VDUSE could be one of the users of this.
>>
>> Or I wonder whether we can switch to use anonymous inode(fd) for VDUSE
>> then fetching it via an VDUSE_GET_DEVICE_FD ioctl?
>>
> Yes, we can. Actually the current implementation in VDUSE is like
> this.  But seems like this is still a intermediate step. The fd should
> be binded to a name or something else which need to be configured
> before.


The name could be specified via the netlink. It looks to me the real 
issue is that until the device is connected with a userspace, it can't 
be used. So we also need to fail the enabling if it doesn't opened.

Thanks


>
> Thanks,
> Yongji
>
Yongji Xie Dec. 1, 2020, 9:55 a.m. UTC | #9
On Tue, Dec 1, 2020 at 2:25 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2020/11/30 下午3:07, Yongji Xie wrote:
> >>> Thanks for adding me, Jason!
> >>>
> >>> Now I'm working on a v2 patchset for VDUSE (vDPA Device in Userspace)
> >>> [1]. This tool is very useful for the vduse device. So I'm considering
> >>> integrating this into my v2 patchset. But there is one problem:
> >>>
> >>> In this tool, vdpa device config action and enable action are combined
> >>> into one netlink msg: VDPA_CMD_DEV_NEW. But in vduse case, it needs to
> >>> be splitted because a chardev should be created and opened by a
> >>> userspace process before we enable the vdpa device (call
> >>> vdpa_register_device()).
> >>>
> >>> So I'd like to know whether it's possible (or have some plans) to add
> >>> two new netlink msgs something like: VDPA_CMD_DEV_ENABLE and
> >>> VDPA_CMD_DEV_DISABLE to make the config path more flexible.
> >>>
> >> Actually, we've discussed such intermediate step in some early
> >> discussion. It looks to me VDUSE could be one of the users of this.
> >>
> >> Or I wonder whether we can switch to use anonymous inode(fd) for VDUSE
> >> then fetching it via an VDUSE_GET_DEVICE_FD ioctl?
> >>
> > Yes, we can. Actually the current implementation in VDUSE is like
> > this.  But seems like this is still a intermediate step. The fd should
> > be binded to a name or something else which need to be configured
> > before.
>
>
> The name could be specified via the netlink. It looks to me the real
> issue is that until the device is connected with a userspace, it can't
> be used. So we also need to fail the enabling if it doesn't opened.
>

Yes, that's true. So you mean we can firstly try to fetch the fd
binded to a name/vduse_id via an VDUSE_GET_DEVICE_FD, then use the
name/vduse_id as a attribute to create vdpa device? It looks fine to
me.

Thanks,
Yongji
Parav Pandit Dec. 1, 2020, 11:32 a.m. UTC | #10
> From: Yongji Xie <xieyongji@bytedance.com>
> Sent: Tuesday, December 1, 2020 3:26 PM
> 
> On Tue, Dec 1, 2020 at 2:25 PM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > On 2020/11/30 下午3:07, Yongji Xie wrote:
> > >>> Thanks for adding me, Jason!
> > >>>
> > >>> Now I'm working on a v2 patchset for VDUSE (vDPA Device in
> > >>> Userspace) [1]. This tool is very useful for the vduse device. So
> > >>> I'm considering integrating this into my v2 patchset. But there is
> > >>> one problem:
> > >>>
> > >>> In this tool, vdpa device config action and enable action are
> > >>> combined into one netlink msg: VDPA_CMD_DEV_NEW. But in vduse
> > >>> case, it needs to be splitted because a chardev should be created
> > >>> and opened by a userspace process before we enable the vdpa device
> > >>> (call vdpa_register_device()).
> > >>>
> > >>> So I'd like to know whether it's possible (or have some plans) to
> > >>> add two new netlink msgs something like: VDPA_CMD_DEV_ENABLE
> and
> > >>> VDPA_CMD_DEV_DISABLE to make the config path more flexible.
> > >>>
> > >> Actually, we've discussed such intermediate step in some early
> > >> discussion. It looks to me VDUSE could be one of the users of this.
> > >>
> > >> Or I wonder whether we can switch to use anonymous inode(fd) for
> > >> VDUSE then fetching it via an VDUSE_GET_DEVICE_FD ioctl?
> > >>
> > > Yes, we can. Actually the current implementation in VDUSE is like
> > > this.  But seems like this is still a intermediate step. The fd
> > > should be binded to a name or something else which need to be
> > > configured before.
> >
> >
> > The name could be specified via the netlink. It looks to me the real
> > issue is that until the device is connected with a userspace, it can't
> > be used. So we also need to fail the enabling if it doesn't opened.
> >
> 
> Yes, that's true. So you mean we can firstly try to fetch the fd binded to a
> name/vduse_id via an VDUSE_GET_DEVICE_FD, then use the
> name/vduse_id as a attribute to create vdpa device? It looks fine to me.

I probably do not well understand. I tried reading patch [1] and few things do not look correct as below.
Creating the vdpa device on the bus device and destroying the device from the workqueue seems unnecessary and racy.

It seems vduse driver needs 
This is something should be done as part of the vdpa dev add command, instead of connecting two sides separately and ensuring race free access to it.

So VDUSE_DEV_START and VDUSE_DEV_STOP should possibly be avoided.

$ vdpa dev add parentdev vduse_mgmtdev type net name foo2

When above command is executed it creates necessary vdpa device foo2 on the bus.
When user binds foo2 device with the vduse driver, in the probe(), it creates respective char device to access it from user space.
Depending on which driver foo2 device is bound it, it can be used, either via (a) existing vhost stack  or (b) some vdpa Netdev driver? (not sure its current state), or (c) vduse user space.

This will have sane model to me without races unless I am missing something fundamental here.
This way there are not two ways to create vdpa devices from user space.
Consumers can be of different types (vhost, vduse etc) of the bus device as above mentioned.

[1] https://www.spinics.net/lists/linux-mm/msg231581.html
Yongji Xie Dec. 1, 2020, 2:18 p.m. UTC | #11
On Tue, Dec 1, 2020 at 7:32 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Yongji Xie <xieyongji@bytedance.com>
> > Sent: Tuesday, December 1, 2020 3:26 PM
> >
> > On Tue, Dec 1, 2020 at 2:25 PM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > >
> > > On 2020/11/30 下午3:07, Yongji Xie wrote:
> > > >>> Thanks for adding me, Jason!
> > > >>>
> > > >>> Now I'm working on a v2 patchset for VDUSE (vDPA Device in
> > > >>> Userspace) [1]. This tool is very useful for the vduse device. So
> > > >>> I'm considering integrating this into my v2 patchset. But there is
> > > >>> one problem:
> > > >>>
> > > >>> In this tool, vdpa device config action and enable action are
> > > >>> combined into one netlink msg: VDPA_CMD_DEV_NEW. But in vduse
> > > >>> case, it needs to be splitted because a chardev should be created
> > > >>> and opened by a userspace process before we enable the vdpa device
> > > >>> (call vdpa_register_device()).
> > > >>>
> > > >>> So I'd like to know whether it's possible (or have some plans) to
> > > >>> add two new netlink msgs something like: VDPA_CMD_DEV_ENABLE
> > and
> > > >>> VDPA_CMD_DEV_DISABLE to make the config path more flexible.
> > > >>>
> > > >> Actually, we've discussed such intermediate step in some early
> > > >> discussion. It looks to me VDUSE could be one of the users of this.
> > > >>
> > > >> Or I wonder whether we can switch to use anonymous inode(fd) for
> > > >> VDUSE then fetching it via an VDUSE_GET_DEVICE_FD ioctl?
> > > >>
> > > > Yes, we can. Actually the current implementation in VDUSE is like
> > > > this.  But seems like this is still a intermediate step. The fd
> > > > should be binded to a name or something else which need to be
> > > > configured before.
> > >
> > >
> > > The name could be specified via the netlink. It looks to me the real
> > > issue is that until the device is connected with a userspace, it can't
> > > be used. So we also need to fail the enabling if it doesn't opened.
> > >
> >
> > Yes, that's true. So you mean we can firstly try to fetch the fd binded to a
> > name/vduse_id via an VDUSE_GET_DEVICE_FD, then use the
> > name/vduse_id as a attribute to create vdpa device? It looks fine to me.
>
> I probably do not well understand. I tried reading patch [1] and few things do not look correct as below.
> Creating the vdpa device on the bus device and destroying the device from the workqueue seems unnecessary and racy.
>
> It seems vduse driver needs
> This is something should be done as part of the vdpa dev add command, instead of connecting two sides separately and ensuring race free access to it.
>
> So VDUSE_DEV_START and VDUSE_DEV_STOP should possibly be avoided.
>

Yes, we can avoid these two ioctls with the help of the management tool.

> $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
>
> When above command is executed it creates necessary vdpa device foo2 on the bus.
> When user binds foo2 device with the vduse driver, in the probe(), it creates respective char device to access it from user space.

But vduse driver is not a vdpa bus driver. It works like vdpasim
driver, but offloads the data plane and control plane to a user space
process.

Thanks,
Yongji
Parav Pandit Dec. 1, 2020, 3:58 p.m. UTC | #12
> From: Yongji Xie <xieyongji@bytedance.com>
> Sent: Tuesday, December 1, 2020 7:49 PM
> 
> On Tue, Dec 1, 2020 at 7:32 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> >
> > > From: Yongji Xie <xieyongji@bytedance.com>
> > > Sent: Tuesday, December 1, 2020 3:26 PM
> > >
> > > On Tue, Dec 1, 2020 at 2:25 PM Jason Wang <jasowang@redhat.com>
> wrote:
> > > >
> > > >
> > > > On 2020/11/30 下午3:07, Yongji Xie wrote:
> > > > >>> Thanks for adding me, Jason!
> > > > >>>
> > > > >>> Now I'm working on a v2 patchset for VDUSE (vDPA Device in
> > > > >>> Userspace) [1]. This tool is very useful for the vduse device.
> > > > >>> So I'm considering integrating this into my v2 patchset. But
> > > > >>> there is one problem:
> > > > >>>
> > > > >>> In this tool, vdpa device config action and enable action are
> > > > >>> combined into one netlink msg: VDPA_CMD_DEV_NEW. But in
> vduse
> > > > >>> case, it needs to be splitted because a chardev should be
> > > > >>> created and opened by a userspace process before we enable the
> > > > >>> vdpa device (call vdpa_register_device()).
> > > > >>>
> > > > >>> So I'd like to know whether it's possible (or have some plans)
> > > > >>> to add two new netlink msgs something like:
> > > > >>> VDPA_CMD_DEV_ENABLE
> > > and
> > > > >>> VDPA_CMD_DEV_DISABLE to make the config path more flexible.
> > > > >>>
> > > > >> Actually, we've discussed such intermediate step in some early
> > > > >> discussion. It looks to me VDUSE could be one of the users of this.
> > > > >>
> > > > >> Or I wonder whether we can switch to use anonymous inode(fd)
> > > > >> for VDUSE then fetching it via an VDUSE_GET_DEVICE_FD ioctl?
> > > > >>
> > > > > Yes, we can. Actually the current implementation in VDUSE is
> > > > > like this.  But seems like this is still a intermediate step.
> > > > > The fd should be binded to a name or something else which need
> > > > > to be configured before.
> > > >
> > > >
> > > > The name could be specified via the netlink. It looks to me the
> > > > real issue is that until the device is connected with a userspace,
> > > > it can't be used. So we also need to fail the enabling if it doesn't
> opened.
> > > >
> > >
> > > Yes, that's true. So you mean we can firstly try to fetch the fd
> > > binded to a name/vduse_id via an VDUSE_GET_DEVICE_FD, then use the
> > > name/vduse_id as a attribute to create vdpa device? It looks fine to me.
> >
> > I probably do not well understand. I tried reading patch [1] and few things
> do not look correct as below.
> > Creating the vdpa device on the bus device and destroying the device from
> the workqueue seems unnecessary and racy.
> >
> > It seems vduse driver needs
> > This is something should be done as part of the vdpa dev add command,
> instead of connecting two sides separately and ensuring race free access to
> it.
> >
> > So VDUSE_DEV_START and VDUSE_DEV_STOP should possibly be avoided.
> >
> 
> Yes, we can avoid these two ioctls with the help of the management tool.
> 
> > $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
> >
> > When above command is executed it creates necessary vdpa device foo2
> on the bus.
> > When user binds foo2 device with the vduse driver, in the probe(), it
> creates respective char device to access it from user space.
>
I see. So vduse cannot work with any existing vdpa devices like ifc, mlx5 or netdevsim.
It has its own implementation similar to fuse with its own backend of choice.
More below.

> But vduse driver is not a vdpa bus driver. It works like vdpasim driver, but
> offloads the data plane and control plane to a user space process.

In that case to draw parallel lines,

1. netdevsim:
(a) create resources in kernel sw
(b) datapath simulates in kernel

2. ifc + mlx5 vdpa dev:
(a) creates resource in hw
(b) data path is in hw

3. vduse:
(a) creates resources in userspace sw
(b) data path is in user space.
hence creates data path resources for user space.
So char device is created, removed as result of vdpa device creation.

For example,
$ vdpa dev add parentdev vduse_mgmtdev type net name foo2

Above command will create char device for user space.

Similar command for ifc/mlx5 would have created similar channel for rest of the config commands in hw.
vduse channel = char device, eventfd etc.
ifc/mlx5 hw channel = bar, irq, command interface etc
Netdev sim channel = sw direct calls

Does it make sense?
Yongji Xie Dec. 2, 2020, 3:29 a.m. UTC | #13
On Tue, Dec 1, 2020 at 11:59 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Yongji Xie <xieyongji@bytedance.com>
> > Sent: Tuesday, December 1, 2020 7:49 PM
> >
> > On Tue, Dec 1, 2020 at 7:32 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > >
> > > > From: Yongji Xie <xieyongji@bytedance.com>
> > > > Sent: Tuesday, December 1, 2020 3:26 PM
> > > >
> > > > On Tue, Dec 1, 2020 at 2:25 PM Jason Wang <jasowang@redhat.com>
> > wrote:
> > > > >
> > > > >
> > > > > On 2020/11/30 下午3:07, Yongji Xie wrote:
> > > > > >>> Thanks for adding me, Jason!
> > > > > >>>
> > > > > >>> Now I'm working on a v2 patchset for VDUSE (vDPA Device in
> > > > > >>> Userspace) [1]. This tool is very useful for the vduse device.
> > > > > >>> So I'm considering integrating this into my v2 patchset. But
> > > > > >>> there is one problem:
> > > > > >>>
> > > > > >>> In this tool, vdpa device config action and enable action are
> > > > > >>> combined into one netlink msg: VDPA_CMD_DEV_NEW. But in
> > vduse
> > > > > >>> case, it needs to be splitted because a chardev should be
> > > > > >>> created and opened by a userspace process before we enable the
> > > > > >>> vdpa device (call vdpa_register_device()).
> > > > > >>>
> > > > > >>> So I'd like to know whether it's possible (or have some plans)
> > > > > >>> to add two new netlink msgs something like:
> > > > > >>> VDPA_CMD_DEV_ENABLE
> > > > and
> > > > > >>> VDPA_CMD_DEV_DISABLE to make the config path more flexible.
> > > > > >>>
> > > > > >> Actually, we've discussed such intermediate step in some early
> > > > > >> discussion. It looks to me VDUSE could be one of the users of this.
> > > > > >>
> > > > > >> Or I wonder whether we can switch to use anonymous inode(fd)
> > > > > >> for VDUSE then fetching it via an VDUSE_GET_DEVICE_FD ioctl?
> > > > > >>
> > > > > > Yes, we can. Actually the current implementation in VDUSE is
> > > > > > like this.  But seems like this is still a intermediate step.
> > > > > > The fd should be binded to a name or something else which need
> > > > > > to be configured before.
> > > > >
> > > > >
> > > > > The name could be specified via the netlink. It looks to me the
> > > > > real issue is that until the device is connected with a userspace,
> > > > > it can't be used. So we also need to fail the enabling if it doesn't
> > opened.
> > > > >
> > > >
> > > > Yes, that's true. So you mean we can firstly try to fetch the fd
> > > > binded to a name/vduse_id via an VDUSE_GET_DEVICE_FD, then use the
> > > > name/vduse_id as a attribute to create vdpa device? It looks fine to me.
> > >
> > > I probably do not well understand. I tried reading patch [1] and few things
> > do not look correct as below.
> > > Creating the vdpa device on the bus device and destroying the device from
> > the workqueue seems unnecessary and racy.
> > >
> > > It seems vduse driver needs
> > > This is something should be done as part of the vdpa dev add command,
> > instead of connecting two sides separately and ensuring race free access to
> > it.
> > >
> > > So VDUSE_DEV_START and VDUSE_DEV_STOP should possibly be avoided.
> > >
> >
> > Yes, we can avoid these two ioctls with the help of the management tool.
> >
> > > $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
> > >
> > > When above command is executed it creates necessary vdpa device foo2
> > on the bus.
> > > When user binds foo2 device with the vduse driver, in the probe(), it
> > creates respective char device to access it from user space.
> >
> I see. So vduse cannot work with any existing vdpa devices like ifc, mlx5 or netdevsim.
> It has its own implementation similar to fuse with its own backend of choice.
> More below.
>
> > But vduse driver is not a vdpa bus driver. It works like vdpasim driver, but
> > offloads the data plane and control plane to a user space process.
>
> In that case to draw parallel lines,
>
> 1. netdevsim:
> (a) create resources in kernel sw
> (b) datapath simulates in kernel
>
> 2. ifc + mlx5 vdpa dev:
> (a) creates resource in hw
> (b) data path is in hw
>
> 3. vduse:
> (a) creates resources in userspace sw
> (b) data path is in user space.
> hence creates data path resources for user space.
> So char device is created, removed as result of vdpa device creation.
>
> For example,
> $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
>
> Above command will create char device for user space.
>
> Similar command for ifc/mlx5 would have created similar channel for rest of the config commands in hw.
> vduse channel = char device, eventfd etc.
> ifc/mlx5 hw channel = bar, irq, command interface etc
> Netdev sim channel = sw direct calls
>
> Does it make sense?

In my understanding, to make vdpa work, we need a backend (datapath
resources) and a frontend (a vdpa device attached to a vdpa bus). In
the above example, it looks like we use the command "vdpa dev add ..."
 to create a backend, so do we need another command to create a
frontend?

Thanks,
Yongji
Parav Pandit Dec. 2, 2020, 4:53 a.m. UTC | #14
> From: Yongji Xie <xieyongji@bytedance.com>
> Sent: Wednesday, December 2, 2020 9:00 AM
> 
> On Tue, Dec 1, 2020 at 11:59 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> >
> > > From: Yongji Xie <xieyongji@bytedance.com>
> > > Sent: Tuesday, December 1, 2020 7:49 PM
> > >
> > > On Tue, Dec 1, 2020 at 7:32 PM Parav Pandit <parav@nvidia.com> wrote:
> > > >
> > > >
> > > >
> > > > > From: Yongji Xie <xieyongji@bytedance.com>
> > > > > Sent: Tuesday, December 1, 2020 3:26 PM
> > > > >
> > > > > On Tue, Dec 1, 2020 at 2:25 PM Jason Wang <jasowang@redhat.com>
> > > wrote:
> > > > > >
> > > > > >
> > > > > > On 2020/11/30 下午3:07, Yongji Xie wrote:
> > > > > > >>> Thanks for adding me, Jason!
> > > > > > >>>
> > > > > > >>> Now I'm working on a v2 patchset for VDUSE (vDPA Device in
> > > > > > >>> Userspace) [1]. This tool is very useful for the vduse device.
> > > > > > >>> So I'm considering integrating this into my v2 patchset.
> > > > > > >>> But there is one problem:
> > > > > > >>>
> > > > > > >>> In this tool, vdpa device config action and enable action
> > > > > > >>> are combined into one netlink msg: VDPA_CMD_DEV_NEW. But
> > > > > > >>> in
> > > vduse
> > > > > > >>> case, it needs to be splitted because a chardev should be
> > > > > > >>> created and opened by a userspace process before we enable
> > > > > > >>> the vdpa device (call vdpa_register_device()).
> > > > > > >>>
> > > > > > >>> So I'd like to know whether it's possible (or have some
> > > > > > >>> plans) to add two new netlink msgs something like:
> > > > > > >>> VDPA_CMD_DEV_ENABLE
> > > > > and
> > > > > > >>> VDPA_CMD_DEV_DISABLE to make the config path more flexible.
> > > > > > >>>
> > > > > > >> Actually, we've discussed such intermediate step in some
> > > > > > >> early discussion. It looks to me VDUSE could be one of the users of
> this.
> > > > > > >>
> > > > > > >> Or I wonder whether we can switch to use anonymous
> > > > > > >> inode(fd) for VDUSE then fetching it via an VDUSE_GET_DEVICE_FD
> ioctl?
> > > > > > >>
> > > > > > > Yes, we can. Actually the current implementation in VDUSE is
> > > > > > > like this.  But seems like this is still a intermediate step.
> > > > > > > The fd should be binded to a name or something else which
> > > > > > > need to be configured before.
> > > > > >
> > > > > >
> > > > > > The name could be specified via the netlink. It looks to me
> > > > > > the real issue is that until the device is connected with a
> > > > > > userspace, it can't be used. So we also need to fail the
> > > > > > enabling if it doesn't
> > > opened.
> > > > > >
> > > > >
> > > > > Yes, that's true. So you mean we can firstly try to fetch the fd
> > > > > binded to a name/vduse_id via an VDUSE_GET_DEVICE_FD, then use
> > > > > the name/vduse_id as a attribute to create vdpa device? It looks fine to
> me.
> > > >
> > > > I probably do not well understand. I tried reading patch [1] and
> > > > few things
> > > do not look correct as below.
> > > > Creating the vdpa device on the bus device and destroying the
> > > > device from
> > > the workqueue seems unnecessary and racy.
> > > >
> > > > It seems vduse driver needs
> > > > This is something should be done as part of the vdpa dev add
> > > > command,
> > > instead of connecting two sides separately and ensuring race free
> > > access to it.
> > > >
> > > > So VDUSE_DEV_START and VDUSE_DEV_STOP should possibly be avoided.
> > > >
> > >
> > > Yes, we can avoid these two ioctls with the help of the management tool.
> > >
> > > > $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
> > > >
> > > > When above command is executed it creates necessary vdpa device
> > > > foo2
> > > on the bus.
> > > > When user binds foo2 device with the vduse driver, in the probe(),
> > > > it
> > > creates respective char device to access it from user space.
> > >
> > I see. So vduse cannot work with any existing vdpa devices like ifc, mlx5 or
> netdevsim.
> > It has its own implementation similar to fuse with its own backend of choice.
> > More below.
> >
> > > But vduse driver is not a vdpa bus driver. It works like vdpasim
> > > driver, but offloads the data plane and control plane to a user space process.
> >
> > In that case to draw parallel lines,
> >
> > 1. netdevsim:
> > (a) create resources in kernel sw
> > (b) datapath simulates in kernel
> >
> > 2. ifc + mlx5 vdpa dev:
> > (a) creates resource in hw
> > (b) data path is in hw
> >
> > 3. vduse:
> > (a) creates resources in userspace sw
> > (b) data path is in user space.
> > hence creates data path resources for user space.
> > So char device is created, removed as result of vdpa device creation.
> >
> > For example,
> > $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
> >
> > Above command will create char device for user space.
> >
> > Similar command for ifc/mlx5 would have created similar channel for rest of
> the config commands in hw.
> > vduse channel = char device, eventfd etc.
> > ifc/mlx5 hw channel = bar, irq, command interface etc Netdev sim
> > channel = sw direct calls
> >
> > Does it make sense?
> 
> In my understanding, to make vdpa work, we need a backend (datapath
> resources) and a frontend (a vdpa device attached to a vdpa bus). In the above
> example, it looks like we use the command "vdpa dev add ..."
>  to create a backend, so do we need another command to create a frontend?
> 
For block device there is certainly some backend to process the IOs.
Sometimes backend to be setup first, before its front end is exposed.
"vdpa dev add" is the front end command who connects to the backend (implicitly) for network device.

vhost->vdpa_block_device->backend_io_processor (usr,hw,kernel).

And it needs a way to connect to backend when explicitly specified during creation time.
Something like,
$ vdpa dev add parentdev vdpa_vduse type block name foo3 handle <uuid>
In above example some vendor device specific unique handle is passed based on backend setup in hardware/user space.

In below 3 examples, vdpa block simulator is connecting to backend block or file.

$ vdpa dev add parentdev vdpa_blocksim type block name foo4 blockdev /dev/zero

$ vdpa dev add parentdev vdpa_blocksim type block name foo5 blockdev /dev/sda2 size=100M offset=10M

$ vdpa dev add parentdev vdpa_block filebackend_sim type block name foo6 file /root/file_backend.txt

Or may be backend connects to the created vdpa device is bound to the driver.
Can vduse attach to the created vdpa block device through the char device and establish the channel to receive IOs, and to setup the block config space?

> Thanks,
> Yongji
Jason Wang Dec. 2, 2020, 5:48 a.m. UTC | #15
On 2020/12/1 下午5:55, Yongji Xie wrote:
> On Tue, Dec 1, 2020 at 2:25 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2020/11/30 下午3:07, Yongji Xie wrote:
>>>>> Thanks for adding me, Jason!
>>>>>
>>>>> Now I'm working on a v2 patchset for VDUSE (vDPA Device in Userspace)
>>>>> [1]. This tool is very useful for the vduse device. So I'm considering
>>>>> integrating this into my v2 patchset. But there is one problem:
>>>>>
>>>>> In this tool, vdpa device config action and enable action are combined
>>>>> into one netlink msg: VDPA_CMD_DEV_NEW. But in vduse case, it needs to
>>>>> be splitted because a chardev should be created and opened by a
>>>>> userspace process before we enable the vdpa device (call
>>>>> vdpa_register_device()).
>>>>>
>>>>> So I'd like to know whether it's possible (or have some plans) to add
>>>>> two new netlink msgs something like: VDPA_CMD_DEV_ENABLE and
>>>>> VDPA_CMD_DEV_DISABLE to make the config path more flexible.
>>>>>
>>>> Actually, we've discussed such intermediate step in some early
>>>> discussion. It looks to me VDUSE could be one of the users of this.
>>>>
>>>> Or I wonder whether we can switch to use anonymous inode(fd) for VDUSE
>>>> then fetching it via an VDUSE_GET_DEVICE_FD ioctl?
>>>>
>>> Yes, we can. Actually the current implementation in VDUSE is like
>>> this.  But seems like this is still a intermediate step. The fd should
>>> be binded to a name or something else which need to be configured
>>> before.
>>
>> The name could be specified via the netlink. It looks to me the real
>> issue is that until the device is connected with a userspace, it can't
>> be used. So we also need to fail the enabling if it doesn't opened.
>>
> Yes, that's true. So you mean we can firstly try to fetch the fd
> binded to a name/vduse_id via an VDUSE_GET_DEVICE_FD, then use the
> name/vduse_id as a attribute to create vdpa device? It looks fine to
> me.


Yes, something like this. The anonymous fd will be created during 
dev_add() and the fd will be carried in the msg to userspace.

Thanks


>
> Thanks,
> Yongji
>
Jason Wang Dec. 2, 2020, 5:51 a.m. UTC | #16
On 2020/12/2 下午12:53, Parav Pandit wrote:
>
>> From: Yongji Xie <xieyongji@bytedance.com>
>> Sent: Wednesday, December 2, 2020 9:00 AM
>>
>> On Tue, Dec 1, 2020 at 11:59 PM Parav Pandit <parav@nvidia.com> wrote:
>>>
>>>
>>>> From: Yongji Xie <xieyongji@bytedance.com>
>>>> Sent: Tuesday, December 1, 2020 7:49 PM
>>>>
>>>> On Tue, Dec 1, 2020 at 7:32 PM Parav Pandit <parav@nvidia.com> wrote:
>>>>>
>>>>>
>>>>>> From: Yongji Xie <xieyongji@bytedance.com>
>>>>>> Sent: Tuesday, December 1, 2020 3:26 PM
>>>>>>
>>>>>> On Tue, Dec 1, 2020 at 2:25 PM Jason Wang <jasowang@redhat.com>
>>>> wrote:
>>>>>>>
>>>>>>> On 2020/11/30 下午3:07, Yongji Xie wrote:
>>>>>>>>>> Thanks for adding me, Jason!
>>>>>>>>>>
>>>>>>>>>> Now I'm working on a v2 patchset for VDUSE (vDPA Device in
>>>>>>>>>> Userspace) [1]. This tool is very useful for the vduse device.
>>>>>>>>>> So I'm considering integrating this into my v2 patchset.
>>>>>>>>>> But there is one problem:
>>>>>>>>>>
>>>>>>>>>> In this tool, vdpa device config action and enable action
>>>>>>>>>> are combined into one netlink msg: VDPA_CMD_DEV_NEW. But
>>>>>>>>>> in
>>>> vduse
>>>>>>>>>> case, it needs to be splitted because a chardev should be
>>>>>>>>>> created and opened by a userspace process before we enable
>>>>>>>>>> the vdpa device (call vdpa_register_device()).
>>>>>>>>>>
>>>>>>>>>> So I'd like to know whether it's possible (or have some
>>>>>>>>>> plans) to add two new netlink msgs something like:
>>>>>>>>>> VDPA_CMD_DEV_ENABLE
>>>>>> and
>>>>>>>>>> VDPA_CMD_DEV_DISABLE to make the config path more flexible.
>>>>>>>>>>
>>>>>>>>> Actually, we've discussed such intermediate step in some
>>>>>>>>> early discussion. It looks to me VDUSE could be one of the users of
>> this.
>>>>>>>>> Or I wonder whether we can switch to use anonymous
>>>>>>>>> inode(fd) for VDUSE then fetching it via an VDUSE_GET_DEVICE_FD
>> ioctl?
>>>>>>>> Yes, we can. Actually the current implementation in VDUSE is
>>>>>>>> like this.  But seems like this is still a intermediate step.
>>>>>>>> The fd should be binded to a name or something else which
>>>>>>>> need to be configured before.
>>>>>>>
>>>>>>> The name could be specified via the netlink. It looks to me
>>>>>>> the real issue is that until the device is connected with a
>>>>>>> userspace, it can't be used. So we also need to fail the
>>>>>>> enabling if it doesn't
>>>> opened.
>>>>>> Yes, that's true. So you mean we can firstly try to fetch the fd
>>>>>> binded to a name/vduse_id via an VDUSE_GET_DEVICE_FD, then use
>>>>>> the name/vduse_id as a attribute to create vdpa device? It looks fine to
>> me.
>>>>> I probably do not well understand. I tried reading patch [1] and
>>>>> few things
>>>> do not look correct as below.
>>>>> Creating the vdpa device on the bus device and destroying the
>>>>> device from
>>>> the workqueue seems unnecessary and racy.
>>>>> It seems vduse driver needs
>>>>> This is something should be done as part of the vdpa dev add
>>>>> command,
>>>> instead of connecting two sides separately and ensuring race free
>>>> access to it.
>>>>> So VDUSE_DEV_START and VDUSE_DEV_STOP should possibly be avoided.
>>>>>
>>>> Yes, we can avoid these two ioctls with the help of the management tool.
>>>>
>>>>> $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
>>>>>
>>>>> When above command is executed it creates necessary vdpa device
>>>>> foo2
>>>> on the bus.
>>>>> When user binds foo2 device with the vduse driver, in the probe(),
>>>>> it
>>>> creates respective char device to access it from user space.
>>>>
>>> I see. So vduse cannot work with any existing vdpa devices like ifc, mlx5 or
>> netdevsim.
>>> It has its own implementation similar to fuse with its own backend of choice.
>>> More below.
>>>
>>>> But vduse driver is not a vdpa bus driver. It works like vdpasim
>>>> driver, but offloads the data plane and control plane to a user space process.
>>> In that case to draw parallel lines,
>>>
>>> 1. netdevsim:
>>> (a) create resources in kernel sw
>>> (b) datapath simulates in kernel
>>>
>>> 2. ifc + mlx5 vdpa dev:
>>> (a) creates resource in hw
>>> (b) data path is in hw
>>>
>>> 3. vduse:
>>> (a) creates resources in userspace sw
>>> (b) data path is in user space.
>>> hence creates data path resources for user space.
>>> So char device is created, removed as result of vdpa device creation.
>>>
>>> For example,
>>> $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
>>>
>>> Above command will create char device for user space.
>>>
>>> Similar command for ifc/mlx5 would have created similar channel for rest of
>> the config commands in hw.
>>> vduse channel = char device, eventfd etc.
>>> ifc/mlx5 hw channel = bar, irq, command interface etc Netdev sim
>>> channel = sw direct calls
>>>
>>> Does it make sense?
>> In my understanding, to make vdpa work, we need a backend (datapath
>> resources) and a frontend (a vdpa device attached to a vdpa bus). In the above
>> example, it looks like we use the command "vdpa dev add ..."
>>   to create a backend, so do we need another command to create a frontend?
>>
> For block device there is certainly some backend to process the IOs.
> Sometimes backend to be setup first, before its front end is exposed.
> "vdpa dev add" is the front end command who connects to the backend (implicitly) for network device.
>
> vhost->vdpa_block_device->backend_io_processor (usr,hw,kernel).
>
> And it needs a way to connect to backend when explicitly specified during creation time.
> Something like,
> $ vdpa dev add parentdev vdpa_vduse type block name foo3 handle <uuid>
> In above example some vendor device specific unique handle is passed based on backend setup in hardware/user space.
>
> In below 3 examples, vdpa block simulator is connecting to backend block or file.
>
> $ vdpa dev add parentdev vdpa_blocksim type block name foo4 blockdev /dev/zero
>
> $ vdpa dev add parentdev vdpa_blocksim type block name foo5 blockdev /dev/sda2 size=100M offset=10M
>
> $ vdpa dev add parentdev vdpa_block filebackend_sim type block name foo6 file /root/file_backend.txt
>
> Or may be backend connects to the created vdpa device is bound to the driver.
> Can vduse attach to the created vdpa block device through the char device and establish the channel to receive IOs, and to setup the block config space?


I think it can work.

Another thing I wonder it that, do we consider more than one VDUSE 
parentdev(or management dev)? This allows us to have separated devices 
implemented via different processes.

If yes, VDUSE ioctl needs to be extended to register/unregister parentdev.

Thanks


>
>> Thanks,
>> Yongji
Parav Pandit Dec. 2, 2020, 6:24 a.m. UTC | #17
> From: Jason Wang <jasowang@redhat.com>
> Sent: Wednesday, December 2, 2020 11:21 AM
> 
> On 2020/12/2 下午12:53, Parav Pandit wrote:
> >
> >> From: Yongji Xie <xieyongji@bytedance.com>
> >> Sent: Wednesday, December 2, 2020 9:00 AM
> >>
> >> On Tue, Dec 1, 2020 at 11:59 PM Parav Pandit <parav@nvidia.com> wrote:
> >>>
> >>>
> >>>> From: Yongji Xie <xieyongji@bytedance.com>
> >>>> Sent: Tuesday, December 1, 2020 7:49 PM
> >>>>
> >>>> On Tue, Dec 1, 2020 at 7:32 PM Parav Pandit <parav@nvidia.com>
> wrote:
> >>>>>
> >>>>>
> >>>>>> From: Yongji Xie <xieyongji@bytedance.com>
> >>>>>> Sent: Tuesday, December 1, 2020 3:26 PM
> >>>>>>
> >>>>>> On Tue, Dec 1, 2020 at 2:25 PM Jason Wang
> <jasowang@redhat.com>
> >>>> wrote:
> >>>>>>>
> >>>>>>> On 2020/11/30 下午3:07, Yongji Xie wrote:
> >>>>>>>>>> Thanks for adding me, Jason!
> >>>>>>>>>>
> >>>>>>>>>> Now I'm working on a v2 patchset for VDUSE (vDPA Device in
> >>>>>>>>>> Userspace) [1]. This tool is very useful for the vduse device.
> >>>>>>>>>> So I'm considering integrating this into my v2 patchset.
> >>>>>>>>>> But there is one problem:
> >>>>>>>>>>
> >>>>>>>>>> In this tool, vdpa device config action and enable action are
> >>>>>>>>>> combined into one netlink msg: VDPA_CMD_DEV_NEW. But in
> >>>> vduse
> >>>>>>>>>> case, it needs to be splitted because a chardev should be
> >>>>>>>>>> created and opened by a userspace process before we enable
> >>>>>>>>>> the vdpa device (call vdpa_register_device()).
> >>>>>>>>>>
> >>>>>>>>>> So I'd like to know whether it's possible (or have some
> >>>>>>>>>> plans) to add two new netlink msgs something like:
> >>>>>>>>>> VDPA_CMD_DEV_ENABLE
> >>>>>> and
> >>>>>>>>>> VDPA_CMD_DEV_DISABLE to make the config path more
> flexible.
> >>>>>>>>>>
> >>>>>>>>> Actually, we've discussed such intermediate step in some early
> >>>>>>>>> discussion. It looks to me VDUSE could be one of the users of
> >> this.
> >>>>>>>>> Or I wonder whether we can switch to use anonymous
> >>>>>>>>> inode(fd) for VDUSE then fetching it via an
> >>>>>>>>> VDUSE_GET_DEVICE_FD
> >> ioctl?
> >>>>>>>> Yes, we can. Actually the current implementation in VDUSE is
> >>>>>>>> like this.  But seems like this is still a intermediate step.
> >>>>>>>> The fd should be binded to a name or something else which need
> >>>>>>>> to be configured before.
> >>>>>>>
> >>>>>>> The name could be specified via the netlink. It looks to me the
> >>>>>>> real issue is that until the device is connected with a
> >>>>>>> userspace, it can't be used. So we also need to fail the
> >>>>>>> enabling if it doesn't
> >>>> opened.
> >>>>>> Yes, that's true. So you mean we can firstly try to fetch the fd
> >>>>>> binded to a name/vduse_id via an VDUSE_GET_DEVICE_FD, then
> use
> >>>>>> the name/vduse_id as a attribute to create vdpa device? It looks
> >>>>>> fine to
> >> me.
> >>>>> I probably do not well understand. I tried reading patch [1] and
> >>>>> few things
> >>>> do not look correct as below.
> >>>>> Creating the vdpa device on the bus device and destroying the
> >>>>> device from
> >>>> the workqueue seems unnecessary and racy.
> >>>>> It seems vduse driver needs
> >>>>> This is something should be done as part of the vdpa dev add
> >>>>> command,
> >>>> instead of connecting two sides separately and ensuring race free
> >>>> access to it.
> >>>>> So VDUSE_DEV_START and VDUSE_DEV_STOP should possibly be
> avoided.
> >>>>>
> >>>> Yes, we can avoid these two ioctls with the help of the management
> tool.
> >>>>
> >>>>> $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
> >>>>>
> >>>>> When above command is executed it creates necessary vdpa device
> >>>>> foo2
> >>>> on the bus.
> >>>>> When user binds foo2 device with the vduse driver, in the probe(),
> >>>>> it
> >>>> creates respective char device to access it from user space.
> >>>>
> >>> I see. So vduse cannot work with any existing vdpa devices like ifc,
> >>> mlx5 or
> >> netdevsim.
> >>> It has its own implementation similar to fuse with its own backend of
> choice.
> >>> More below.
> >>>
> >>>> But vduse driver is not a vdpa bus driver. It works like vdpasim
> >>>> driver, but offloads the data plane and control plane to a user space
> process.
> >>> In that case to draw parallel lines,
> >>>
> >>> 1. netdevsim:
> >>> (a) create resources in kernel sw
> >>> (b) datapath simulates in kernel
> >>>
> >>> 2. ifc + mlx5 vdpa dev:
> >>> (a) creates resource in hw
> >>> (b) data path is in hw
> >>>
> >>> 3. vduse:
> >>> (a) creates resources in userspace sw
> >>> (b) data path is in user space.
> >>> hence creates data path resources for user space.
> >>> So char device is created, removed as result of vdpa device creation.
> >>>
> >>> For example,
> >>> $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
> >>>
> >>> Above command will create char device for user space.
> >>>
> >>> Similar command for ifc/mlx5 would have created similar channel for
> >>> rest of
> >> the config commands in hw.
> >>> vduse channel = char device, eventfd etc.
> >>> ifc/mlx5 hw channel = bar, irq, command interface etc Netdev sim
> >>> channel = sw direct calls
> >>>
> >>> Does it make sense?
> >> In my understanding, to make vdpa work, we need a backend (datapath
> >> resources) and a frontend (a vdpa device attached to a vdpa bus). In
> >> the above example, it looks like we use the command "vdpa dev add ..."
> >>   to create a backend, so do we need another command to create a
> frontend?
> >>
> > For block device there is certainly some backend to process the IOs.
> > Sometimes backend to be setup first, before its front end is exposed.
> > "vdpa dev add" is the front end command who connects to the backend
> (implicitly) for network device.
> >
> > vhost->vdpa_block_device->backend_io_processor (usr,hw,kernel).
> >
> > And it needs a way to connect to backend when explicitly specified during
> creation time.
> > Something like,
> > $ vdpa dev add parentdev vdpa_vduse type block name foo3 handle
> <uuid>
> > In above example some vendor device specific unique handle is passed
> based on backend setup in hardware/user space.
> >
> > In below 3 examples, vdpa block simulator is connecting to backend block
> or file.
> >
> > $ vdpa dev add parentdev vdpa_blocksim type block name foo4 blockdev
> > /dev/zero
> >
> > $ vdpa dev add parentdev vdpa_blocksim type block name foo5 blockdev
> > /dev/sda2 size=100M offset=10M
> >
> > $ vdpa dev add parentdev vdpa_block filebackend_sim type block name
> > foo6 file /root/file_backend.txt
> >
> > Or may be backend connects to the created vdpa device is bound to the
> driver.
> > Can vduse attach to the created vdpa block device through the char device
> and establish the channel to receive IOs, and to setup the block config space?
> 
> 
> I think it can work.
> 
> Another thing I wonder it that, do we consider more than one VDUSE
> parentdev(or management dev)? This allows us to have separated devices
> implemented via different processes.
Multiple parentdev should be possible per one driver. for example mlx5_vdpa.ko will create multiple parent dev, one for each PCI VFs, SFs.
vdpa dev add can certainly use one parent/mgmt dev to create multiple vdpa devices.
Not sure why do we need to create multiple parent dev for that.
I guess there is just one parent/mgmt. dev for VDUSE. What will each mgmtdev do differently?
Demux of IOs, events will be per individual char dev level?

> 
> If yes, VDUSE ioctl needs to be extended to register/unregister parentdev.
> 
> Thanks
> 
> 
> >
> >> Thanks,
> >> Yongji
Jason Wang Dec. 2, 2020, 7:55 a.m. UTC | #18
On 2020/12/2 下午2:24, Parav Pandit wrote:
>
>> From: Jason Wang <jasowang@redhat.com>
>> Sent: Wednesday, December 2, 2020 11:21 AM
>>
>> On 2020/12/2 下午12:53, Parav Pandit wrote:
>>>> From: Yongji Xie <xieyongji@bytedance.com>
>>>> Sent: Wednesday, December 2, 2020 9:00 AM
>>>>
>>>> On Tue, Dec 1, 2020 at 11:59 PM Parav Pandit <parav@nvidia.com> wrote:
>>>>>
>>>>>> From: Yongji Xie <xieyongji@bytedance.com>
>>>>>> Sent: Tuesday, December 1, 2020 7:49 PM
>>>>>>
>>>>>> On Tue, Dec 1, 2020 at 7:32 PM Parav Pandit <parav@nvidia.com>
>> wrote:
>>>>>>>
>>>>>>>> From: Yongji Xie <xieyongji@bytedance.com>
>>>>>>>> Sent: Tuesday, December 1, 2020 3:26 PM
>>>>>>>>
>>>>>>>> On Tue, Dec 1, 2020 at 2:25 PM Jason Wang
>> <jasowang@redhat.com>
>>>>>> wrote:
>>>>>>>>> On 2020/11/30 下午3:07, Yongji Xie wrote:
>>>>>>>>>>>> Thanks for adding me, Jason!
>>>>>>>>>>>>
>>>>>>>>>>>> Now I'm working on a v2 patchset for VDUSE (vDPA Device in
>>>>>>>>>>>> Userspace) [1]. This tool is very useful for the vduse device.
>>>>>>>>>>>> So I'm considering integrating this into my v2 patchset.
>>>>>>>>>>>> But there is one problem:
>>>>>>>>>>>>
>>>>>>>>>>>> In this tool, vdpa device config action and enable action are
>>>>>>>>>>>> combined into one netlink msg: VDPA_CMD_DEV_NEW. But in
>>>>>> vduse
>>>>>>>>>>>> case, it needs to be splitted because a chardev should be
>>>>>>>>>>>> created and opened by a userspace process before we enable
>>>>>>>>>>>> the vdpa device (call vdpa_register_device()).
>>>>>>>>>>>>
>>>>>>>>>>>> So I'd like to know whether it's possible (or have some
>>>>>>>>>>>> plans) to add two new netlink msgs something like:
>>>>>>>>>>>> VDPA_CMD_DEV_ENABLE
>>>>>>>> and
>>>>>>>>>>>> VDPA_CMD_DEV_DISABLE to make the config path more
>> flexible.
>>>>>>>>>>> Actually, we've discussed such intermediate step in some early
>>>>>>>>>>> discussion. It looks to me VDUSE could be one of the users of
>>>> this.
>>>>>>>>>>> Or I wonder whether we can switch to use anonymous
>>>>>>>>>>> inode(fd) for VDUSE then fetching it via an
>>>>>>>>>>> VDUSE_GET_DEVICE_FD
>>>> ioctl?
>>>>>>>>>> Yes, we can. Actually the current implementation in VDUSE is
>>>>>>>>>> like this.  But seems like this is still a intermediate step.
>>>>>>>>>> The fd should be binded to a name or something else which need
>>>>>>>>>> to be configured before.
>>>>>>>>> The name could be specified via the netlink. It looks to me the
>>>>>>>>> real issue is that until the device is connected with a
>>>>>>>>> userspace, it can't be used. So we also need to fail the
>>>>>>>>> enabling if it doesn't
>>>>>> opened.
>>>>>>>> Yes, that's true. So you mean we can firstly try to fetch the fd
>>>>>>>> binded to a name/vduse_id via an VDUSE_GET_DEVICE_FD, then
>> use
>>>>>>>> the name/vduse_id as a attribute to create vdpa device? It looks
>>>>>>>> fine to
>>>> me.
>>>>>>> I probably do not well understand. I tried reading patch [1] and
>>>>>>> few things
>>>>>> do not look correct as below.
>>>>>>> Creating the vdpa device on the bus device and destroying the
>>>>>>> device from
>>>>>> the workqueue seems unnecessary and racy.
>>>>>>> It seems vduse driver needs
>>>>>>> This is something should be done as part of the vdpa dev add
>>>>>>> command,
>>>>>> instead of connecting two sides separately and ensuring race free
>>>>>> access to it.
>>>>>>> So VDUSE_DEV_START and VDUSE_DEV_STOP should possibly be
>> avoided.
>>>>>> Yes, we can avoid these two ioctls with the help of the management
>> tool.
>>>>>>> $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
>>>>>>>
>>>>>>> When above command is executed it creates necessary vdpa device
>>>>>>> foo2
>>>>>> on the bus.
>>>>>>> When user binds foo2 device with the vduse driver, in the probe(),
>>>>>>> it
>>>>>> creates respective char device to access it from user space.
>>>>>>
>>>>> I see. So vduse cannot work with any existing vdpa devices like ifc,
>>>>> mlx5 or
>>>> netdevsim.
>>>>> It has its own implementation similar to fuse with its own backend of
>> choice.
>>>>> More below.
>>>>>
>>>>>> But vduse driver is not a vdpa bus driver. It works like vdpasim
>>>>>> driver, but offloads the data plane and control plane to a user space
>> process.
>>>>> In that case to draw parallel lines,
>>>>>
>>>>> 1. netdevsim:
>>>>> (a) create resources in kernel sw
>>>>> (b) datapath simulates in kernel
>>>>>
>>>>> 2. ifc + mlx5 vdpa dev:
>>>>> (a) creates resource in hw
>>>>> (b) data path is in hw
>>>>>
>>>>> 3. vduse:
>>>>> (a) creates resources in userspace sw
>>>>> (b) data path is in user space.
>>>>> hence creates data path resources for user space.
>>>>> So char device is created, removed as result of vdpa device creation.
>>>>>
>>>>> For example,
>>>>> $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
>>>>>
>>>>> Above command will create char device for user space.
>>>>>
>>>>> Similar command for ifc/mlx5 would have created similar channel for
>>>>> rest of
>>>> the config commands in hw.
>>>>> vduse channel = char device, eventfd etc.
>>>>> ifc/mlx5 hw channel = bar, irq, command interface etc Netdev sim
>>>>> channel = sw direct calls
>>>>>
>>>>> Does it make sense?
>>>> In my understanding, to make vdpa work, we need a backend (datapath
>>>> resources) and a frontend (a vdpa device attached to a vdpa bus). In
>>>> the above example, it looks like we use the command "vdpa dev add ..."
>>>>    to create a backend, so do we need another command to create a
>> frontend?
>>> For block device there is certainly some backend to process the IOs.
>>> Sometimes backend to be setup first, before its front end is exposed.
>>> "vdpa dev add" is the front end command who connects to the backend
>> (implicitly) for network device.
>>> vhost->vdpa_block_device->backend_io_processor (usr,hw,kernel).
>>>
>>> And it needs a way to connect to backend when explicitly specified during
>> creation time.
>>> Something like,
>>> $ vdpa dev add parentdev vdpa_vduse type block name foo3 handle
>> <uuid>
>>> In above example some vendor device specific unique handle is passed
>> based on backend setup in hardware/user space.
>>> In below 3 examples, vdpa block simulator is connecting to backend block
>> or file.
>>> $ vdpa dev add parentdev vdpa_blocksim type block name foo4 blockdev
>>> /dev/zero
>>>
>>> $ vdpa dev add parentdev vdpa_blocksim type block name foo5 blockdev
>>> /dev/sda2 size=100M offset=10M
>>>
>>> $ vdpa dev add parentdev vdpa_block filebackend_sim type block name
>>> foo6 file /root/file_backend.txt
>>>
>>> Or may be backend connects to the created vdpa device is bound to the
>> driver.
>>> Can vduse attach to the created vdpa block device through the char device
>> and establish the channel to receive IOs, and to setup the block config space?
>>
>>
>> I think it can work.
>>
>> Another thing I wonder it that, do we consider more than one VDUSE
>> parentdev(or management dev)? This allows us to have separated devices
>> implemented via different processes.
> Multiple parentdev should be possible per one driver. for example mlx5_vdpa.ko will create multiple parent dev, one for each PCI VFs, SFs.
> vdpa dev add can certainly use one parent/mgmt dev to create multiple vdpa devices.
> Not sure why do we need to create multiple parent dev for that.
> I guess there is just one parent/mgmt. dev for VDUSE. What will each mgmtdev do differently?
> Demux of IOs, events will be per individual char dev level?


It could be something like how it works for different hardware vendors. 
E.g IFCVF and mlx5 will register different parentdevs. For userspace, we 
need to allow different software vendors to manage their instances 
individually.

Thanks


>
>> If yes, VDUSE ioctl needs to be extended to register/unregister parentdev.
>>
>> Thanks
>>
>>
>>>> Thanks,
>>>> Yongji
Yongji Xie Dec. 2, 2020, 9:21 a.m. UTC | #19
On Wed, Dec 2, 2020 at 12:53 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Yongji Xie <xieyongji@bytedance.com>
> > Sent: Wednesday, December 2, 2020 9:00 AM
> >
> > On Tue, Dec 1, 2020 at 11:59 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > >
> > > > From: Yongji Xie <xieyongji@bytedance.com>
> > > > Sent: Tuesday, December 1, 2020 7:49 PM
> > > >
> > > > On Tue, Dec 1, 2020 at 7:32 PM Parav Pandit <parav@nvidia.com> wrote:
> > > > >
> > > > >
> > > > >
> > > > > > From: Yongji Xie <xieyongji@bytedance.com>
> > > > > > Sent: Tuesday, December 1, 2020 3:26 PM
> > > > > >
> > > > > > On Tue, Dec 1, 2020 at 2:25 PM Jason Wang <jasowang@redhat.com>
> > > > wrote:
> > > > > > >
> > > > > > >
> > > > > > > On 2020/11/30 下午3:07, Yongji Xie wrote:
> > > > > > > >>> Thanks for adding me, Jason!
> > > > > > > >>>
> > > > > > > >>> Now I'm working on a v2 patchset for VDUSE (vDPA Device in
> > > > > > > >>> Userspace) [1]. This tool is very useful for the vduse device.
> > > > > > > >>> So I'm considering integrating this into my v2 patchset.
> > > > > > > >>> But there is one problem:
> > > > > > > >>>
> > > > > > > >>> In this tool, vdpa device config action and enable action
> > > > > > > >>> are combined into one netlink msg: VDPA_CMD_DEV_NEW. But
> > > > > > > >>> in
> > > > vduse
> > > > > > > >>> case, it needs to be splitted because a chardev should be
> > > > > > > >>> created and opened by a userspace process before we enable
> > > > > > > >>> the vdpa device (call vdpa_register_device()).
> > > > > > > >>>
> > > > > > > >>> So I'd like to know whether it's possible (or have some
> > > > > > > >>> plans) to add two new netlink msgs something like:
> > > > > > > >>> VDPA_CMD_DEV_ENABLE
> > > > > > and
> > > > > > > >>> VDPA_CMD_DEV_DISABLE to make the config path more flexible.
> > > > > > > >>>
> > > > > > > >> Actually, we've discussed such intermediate step in some
> > > > > > > >> early discussion. It looks to me VDUSE could be one of the users of
> > this.
> > > > > > > >>
> > > > > > > >> Or I wonder whether we can switch to use anonymous
> > > > > > > >> inode(fd) for VDUSE then fetching it via an VDUSE_GET_DEVICE_FD
> > ioctl?
> > > > > > > >>
> > > > > > > > Yes, we can. Actually the current implementation in VDUSE is
> > > > > > > > like this.  But seems like this is still a intermediate step.
> > > > > > > > The fd should be binded to a name or something else which
> > > > > > > > need to be configured before.
> > > > > > >
> > > > > > >
> > > > > > > The name could be specified via the netlink. It looks to me
> > > > > > > the real issue is that until the device is connected with a
> > > > > > > userspace, it can't be used. So we also need to fail the
> > > > > > > enabling if it doesn't
> > > > opened.
> > > > > > >
> > > > > >
> > > > > > Yes, that's true. So you mean we can firstly try to fetch the fd
> > > > > > binded to a name/vduse_id via an VDUSE_GET_DEVICE_FD, then use
> > > > > > the name/vduse_id as a attribute to create vdpa device? It looks fine to
> > me.
> > > > >
> > > > > I probably do not well understand. I tried reading patch [1] and
> > > > > few things
> > > > do not look correct as below.
> > > > > Creating the vdpa device on the bus device and destroying the
> > > > > device from
> > > > the workqueue seems unnecessary and racy.
> > > > >
> > > > > It seems vduse driver needs
> > > > > This is something should be done as part of the vdpa dev add
> > > > > command,
> > > > instead of connecting two sides separately and ensuring race free
> > > > access to it.
> > > > >
> > > > > So VDUSE_DEV_START and VDUSE_DEV_STOP should possibly be avoided.
> > > > >
> > > >
> > > > Yes, we can avoid these two ioctls with the help of the management tool.
> > > >
> > > > > $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
> > > > >
> > > > > When above command is executed it creates necessary vdpa device
> > > > > foo2
> > > > on the bus.
> > > > > When user binds foo2 device with the vduse driver, in the probe(),
> > > > > it
> > > > creates respective char device to access it from user space.
> > > >
> > > I see. So vduse cannot work with any existing vdpa devices like ifc, mlx5 or
> > netdevsim.
> > > It has its own implementation similar to fuse with its own backend of choice.
> > > More below.
> > >
> > > > But vduse driver is not a vdpa bus driver. It works like vdpasim
> > > > driver, but offloads the data plane and control plane to a user space process.
> > >
> > > In that case to draw parallel lines,
> > >
> > > 1. netdevsim:
> > > (a) create resources in kernel sw
> > > (b) datapath simulates in kernel
> > >
> > > 2. ifc + mlx5 vdpa dev:
> > > (a) creates resource in hw
> > > (b) data path is in hw
> > >
> > > 3. vduse:
> > > (a) creates resources in userspace sw
> > > (b) data path is in user space.
> > > hence creates data path resources for user space.
> > > So char device is created, removed as result of vdpa device creation.
> > >
> > > For example,
> > > $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
> > >
> > > Above command will create char device for user space.
> > >
> > > Similar command for ifc/mlx5 would have created similar channel for rest of
> > the config commands in hw.
> > > vduse channel = char device, eventfd etc.
> > > ifc/mlx5 hw channel = bar, irq, command interface etc Netdev sim
> > > channel = sw direct calls
> > >
> > > Does it make sense?
> >
> > In my understanding, to make vdpa work, we need a backend (datapath
> > resources) and a frontend (a vdpa device attached to a vdpa bus). In the above
> > example, it looks like we use the command "vdpa dev add ..."
> >  to create a backend, so do we need another command to create a frontend?
> >
> For block device there is certainly some backend to process the IOs.
> Sometimes backend to be setup first, before its front end is exposed.

Yes, the backend need to be setup firstly, this is vendor device
specific, not vdpa specific.

> "vdpa dev add" is the front end command who connects to the backend (implicitly) for network device.
>
> vhost->vdpa_block_device->backend_io_processor (usr,hw,kernel).
>
> And it needs a way to connect to backend when explicitly specified during creation time.
> Something like,
> $ vdpa dev add parentdev vdpa_vduse type block name foo3 handle <uuid>
> In above example some vendor device specific unique handle is passed based on backend setup in hardware/user space.
>

Yes, we can work like this. After we setup a backend through an
anonymous inode(fd) from /dev/vduse, we can get a unique handle. Then
use it to create a frontend which will connect to the specific
backend.

> In below 3 examples, vdpa block simulator is connecting to backend block or file.
>
> $ vdpa dev add parentdev vdpa_blocksim type block name foo4 blockdev /dev/zero
>
> $ vdpa dev add parentdev vdpa_blocksim type block name foo5 blockdev /dev/sda2 size=100M offset=10M
>
> $ vdpa dev add parentdev vdpa_block filebackend_sim type block name foo6 file /root/file_backend.txt
>
> Or may be backend connects to the created vdpa device is bound to the driver.
> Can vduse attach to the created vdpa block device through the char device and establish the channel to receive IOs, and to setup the block config space?
>

How to create the vdpa block device? If we use the command "vdpa dev
add..", the command will hang there until a vduse process attaches to
the vdpa block device.

Thanks,
Yongji
Yongji Xie Dec. 2, 2020, 9:27 a.m. UTC | #20
On Wed, Dec 2, 2020 at 1:51 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2020/12/2 下午12:53, Parav Pandit wrote:
> >
> >> From: Yongji Xie <xieyongji@bytedance.com>
> >> Sent: Wednesday, December 2, 2020 9:00 AM
> >>
> >> On Tue, Dec 1, 2020 at 11:59 PM Parav Pandit <parav@nvidia.com> wrote:
> >>>
> >>>
> >>>> From: Yongji Xie <xieyongji@bytedance.com>
> >>>> Sent: Tuesday, December 1, 2020 7:49 PM
> >>>>
> >>>> On Tue, Dec 1, 2020 at 7:32 PM Parav Pandit <parav@nvidia.com> wrote:
> >>>>>
> >>>>>
> >>>>>> From: Yongji Xie <xieyongji@bytedance.com>
> >>>>>> Sent: Tuesday, December 1, 2020 3:26 PM
> >>>>>>
> >>>>>> On Tue, Dec 1, 2020 at 2:25 PM Jason Wang <jasowang@redhat.com>
> >>>> wrote:
> >>>>>>>
> >>>>>>> On 2020/11/30 下午3:07, Yongji Xie wrote:
> >>>>>>>>>> Thanks for adding me, Jason!
> >>>>>>>>>>
> >>>>>>>>>> Now I'm working on a v2 patchset for VDUSE (vDPA Device in
> >>>>>>>>>> Userspace) [1]. This tool is very useful for the vduse device.
> >>>>>>>>>> So I'm considering integrating this into my v2 patchset.
> >>>>>>>>>> But there is one problem:
> >>>>>>>>>>
> >>>>>>>>>> In this tool, vdpa device config action and enable action
> >>>>>>>>>> are combined into one netlink msg: VDPA_CMD_DEV_NEW. But
> >>>>>>>>>> in
> >>>> vduse
> >>>>>>>>>> case, it needs to be splitted because a chardev should be
> >>>>>>>>>> created and opened by a userspace process before we enable
> >>>>>>>>>> the vdpa device (call vdpa_register_device()).
> >>>>>>>>>>
> >>>>>>>>>> So I'd like to know whether it's possible (or have some
> >>>>>>>>>> plans) to add two new netlink msgs something like:
> >>>>>>>>>> VDPA_CMD_DEV_ENABLE
> >>>>>> and
> >>>>>>>>>> VDPA_CMD_DEV_DISABLE to make the config path more flexible.
> >>>>>>>>>>
> >>>>>>>>> Actually, we've discussed such intermediate step in some
> >>>>>>>>> early discussion. It looks to me VDUSE could be one of the users of
> >> this.
> >>>>>>>>> Or I wonder whether we can switch to use anonymous
> >>>>>>>>> inode(fd) for VDUSE then fetching it via an VDUSE_GET_DEVICE_FD
> >> ioctl?
> >>>>>>>> Yes, we can. Actually the current implementation in VDUSE is
> >>>>>>>> like this.  But seems like this is still a intermediate step.
> >>>>>>>> The fd should be binded to a name or something else which
> >>>>>>>> need to be configured before.
> >>>>>>>
> >>>>>>> The name could be specified via the netlink. It looks to me
> >>>>>>> the real issue is that until the device is connected with a
> >>>>>>> userspace, it can't be used. So we also need to fail the
> >>>>>>> enabling if it doesn't
> >>>> opened.
> >>>>>> Yes, that's true. So you mean we can firstly try to fetch the fd
> >>>>>> binded to a name/vduse_id via an VDUSE_GET_DEVICE_FD, then use
> >>>>>> the name/vduse_id as a attribute to create vdpa device? It looks fine to
> >> me.
> >>>>> I probably do not well understand. I tried reading patch [1] and
> >>>>> few things
> >>>> do not look correct as below.
> >>>>> Creating the vdpa device on the bus device and destroying the
> >>>>> device from
> >>>> the workqueue seems unnecessary and racy.
> >>>>> It seems vduse driver needs
> >>>>> This is something should be done as part of the vdpa dev add
> >>>>> command,
> >>>> instead of connecting two sides separately and ensuring race free
> >>>> access to it.
> >>>>> So VDUSE_DEV_START and VDUSE_DEV_STOP should possibly be avoided.
> >>>>>
> >>>> Yes, we can avoid these two ioctls with the help of the management tool.
> >>>>
> >>>>> $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
> >>>>>
> >>>>> When above command is executed it creates necessary vdpa device
> >>>>> foo2
> >>>> on the bus.
> >>>>> When user binds foo2 device with the vduse driver, in the probe(),
> >>>>> it
> >>>> creates respective char device to access it from user space.
> >>>>
> >>> I see. So vduse cannot work with any existing vdpa devices like ifc, mlx5 or
> >> netdevsim.
> >>> It has its own implementation similar to fuse with its own backend of choice.
> >>> More below.
> >>>
> >>>> But vduse driver is not a vdpa bus driver. It works like vdpasim
> >>>> driver, but offloads the data plane and control plane to a user space process.
> >>> In that case to draw parallel lines,
> >>>
> >>> 1. netdevsim:
> >>> (a) create resources in kernel sw
> >>> (b) datapath simulates in kernel
> >>>
> >>> 2. ifc + mlx5 vdpa dev:
> >>> (a) creates resource in hw
> >>> (b) data path is in hw
> >>>
> >>> 3. vduse:
> >>> (a) creates resources in userspace sw
> >>> (b) data path is in user space.
> >>> hence creates data path resources for user space.
> >>> So char device is created, removed as result of vdpa device creation.
> >>>
> >>> For example,
> >>> $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
> >>>
> >>> Above command will create char device for user space.
> >>>
> >>> Similar command for ifc/mlx5 would have created similar channel for rest of
> >> the config commands in hw.
> >>> vduse channel = char device, eventfd etc.
> >>> ifc/mlx5 hw channel = bar, irq, command interface etc Netdev sim
> >>> channel = sw direct calls
> >>>
> >>> Does it make sense?
> >> In my understanding, to make vdpa work, we need a backend (datapath
> >> resources) and a frontend (a vdpa device attached to a vdpa bus). In the above
> >> example, it looks like we use the command "vdpa dev add ..."
> >>   to create a backend, so do we need another command to create a frontend?
> >>
> > For block device there is certainly some backend to process the IOs.
> > Sometimes backend to be setup first, before its front end is exposed.
> > "vdpa dev add" is the front end command who connects to the backend (implicitly) for network device.
> >
> > vhost->vdpa_block_device->backend_io_processor (usr,hw,kernel).
> >
> > And it needs a way to connect to backend when explicitly specified during creation time.
> > Something like,
> > $ vdpa dev add parentdev vdpa_vduse type block name foo3 handle <uuid>
> > In above example some vendor device specific unique handle is passed based on backend setup in hardware/user space.
> >
> > In below 3 examples, vdpa block simulator is connecting to backend block or file.
> >
> > $ vdpa dev add parentdev vdpa_blocksim type block name foo4 blockdev /dev/zero
> >
> > $ vdpa dev add parentdev vdpa_blocksim type block name foo5 blockdev /dev/sda2 size=100M offset=10M
> >
> > $ vdpa dev add parentdev vdpa_block filebackend_sim type block name foo6 file /root/file_backend.txt
> >
> > Or may be backend connects to the created vdpa device is bound to the driver.
> > Can vduse attach to the created vdpa block device through the char device and establish the channel to receive IOs, and to setup the block config space?
>
>
> I think it can work.
>
> Another thing I wonder it that, do we consider more than one VDUSE
> parentdev(or management dev)? This allows us to have separated devices
> implemented via different processes.
>
> If yes, VDUSE ioctl needs to be extended to register/unregister parentdev.
>

Yes, we need to extend the ioctl to support that. Now we only have one
parentdev represented by /dev/vduse.

Thanks,
Yongji
Parav Pandit Dec. 2, 2020, 11:13 a.m. UTC | #21
> From: Yongji Xie <xieyongji@bytedance.com>
> Sent: Wednesday, December 2, 2020 2:52 PM
> 
> On Wed, Dec 2, 2020 at 12:53 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> >
> > > From: Yongji Xie <xieyongji@bytedance.com>
> > > Sent: Wednesday, December 2, 2020 9:00 AM
> > >
> > > On Tue, Dec 1, 2020 at 11:59 PM Parav Pandit <parav@nvidia.com> wrote:
> > > >
> > > >
> > > >
> > > > > From: Yongji Xie <xieyongji@bytedance.com>
> > > > > Sent: Tuesday, December 1, 2020 7:49 PM
> > > > >
> > > > > On Tue, Dec 1, 2020 at 7:32 PM Parav Pandit <parav@nvidia.com>
> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > > From: Yongji Xie <xieyongji@bytedance.com>
> > > > > > > Sent: Tuesday, December 1, 2020 3:26 PM
> > > > > > >
> > > > > > > On Tue, Dec 1, 2020 at 2:25 PM Jason Wang
> > > > > > > <jasowang@redhat.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > On 2020/11/30 下午3:07, Yongji Xie wrote:
> > > > > > > > >>> Thanks for adding me, Jason!
> > > > > > > > >>>
> > > > > > > > >>> Now I'm working on a v2 patchset for VDUSE (vDPA
> > > > > > > > >>> Device in
> > > > > > > > >>> Userspace) [1]. This tool is very useful for the vduse device.
> > > > > > > > >>> So I'm considering integrating this into my v2 patchset.
> > > > > > > > >>> But there is one problem:
> > > > > > > > >>>
> > > > > > > > >>> In this tool, vdpa device config action and enable
> > > > > > > > >>> action are combined into one netlink msg:
> > > > > > > > >>> VDPA_CMD_DEV_NEW. But in
> > > > > vduse
> > > > > > > > >>> case, it needs to be splitted because a chardev should
> > > > > > > > >>> be created and opened by a userspace process before we
> > > > > > > > >>> enable the vdpa device (call vdpa_register_device()).
> > > > > > > > >>>
> > > > > > > > >>> So I'd like to know whether it's possible (or have
> > > > > > > > >>> some
> > > > > > > > >>> plans) to add two new netlink msgs something like:
> > > > > > > > >>> VDPA_CMD_DEV_ENABLE
> > > > > > > and
> > > > > > > > >>> VDPA_CMD_DEV_DISABLE to make the config path more
> flexible.
> > > > > > > > >>>
> > > > > > > > >> Actually, we've discussed such intermediate step in
> > > > > > > > >> some early discussion. It looks to me VDUSE could be
> > > > > > > > >> one of the users of
> > > this.
> > > > > > > > >>
> > > > > > > > >> Or I wonder whether we can switch to use anonymous
> > > > > > > > >> inode(fd) for VDUSE then fetching it via an
> > > > > > > > >> VDUSE_GET_DEVICE_FD
> > > ioctl?
> > > > > > > > >>
> > > > > > > > > Yes, we can. Actually the current implementation in
> > > > > > > > > VDUSE is like this.  But seems like this is still a intermediate
> step.
> > > > > > > > > The fd should be binded to a name or something else
> > > > > > > > > which need to be configured before.
> > > > > > > >
> > > > > > > >
> > > > > > > > The name could be specified via the netlink. It looks to
> > > > > > > > me the real issue is that until the device is connected
> > > > > > > > with a userspace, it can't be used. So we also need to
> > > > > > > > fail the enabling if it doesn't
> > > > > opened.
> > > > > > > >
> > > > > > >
> > > > > > > Yes, that's true. So you mean we can firstly try to fetch
> > > > > > > the fd binded to a name/vduse_id via an VDUSE_GET_DEVICE_FD,
> > > > > > > then use the name/vduse_id as a attribute to create vdpa
> > > > > > > device? It looks fine to
> > > me.
> > > > > >
> > > > > > I probably do not well understand. I tried reading patch [1]
> > > > > > and few things
> > > > > do not look correct as below.
> > > > > > Creating the vdpa device on the bus device and destroying the
> > > > > > device from
> > > > > the workqueue seems unnecessary and racy.
> > > > > >
> > > > > > It seems vduse driver needs
> > > > > > This is something should be done as part of the vdpa dev add
> > > > > > command,
> > > > > instead of connecting two sides separately and ensuring race
> > > > > free access to it.
> > > > > >
> > > > > > So VDUSE_DEV_START and VDUSE_DEV_STOP should possibly be
> avoided.
> > > > > >
> > > > >
> > > > > Yes, we can avoid these two ioctls with the help of the management
> tool.
> > > > >
> > > > > > $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
> > > > > >
> > > > > > When above command is executed it creates necessary vdpa
> > > > > > device
> > > > > > foo2
> > > > > on the bus.
> > > > > > When user binds foo2 device with the vduse driver, in the
> > > > > > probe(), it
> > > > > creates respective char device to access it from user space.
> > > > >
> > > > I see. So vduse cannot work with any existing vdpa devices like
> > > > ifc, mlx5 or
> > > netdevsim.
> > > > It has its own implementation similar to fuse with its own backend of
> choice.
> > > > More below.
> > > >
> > > > > But vduse driver is not a vdpa bus driver. It works like vdpasim
> > > > > driver, but offloads the data plane and control plane to a user space
> process.
> > > >
> > > > In that case to draw parallel lines,
> > > >
> > > > 1. netdevsim:
> > > > (a) create resources in kernel sw
> > > > (b) datapath simulates in kernel
> > > >
> > > > 2. ifc + mlx5 vdpa dev:
> > > > (a) creates resource in hw
> > > > (b) data path is in hw
> > > >
> > > > 3. vduse:
> > > > (a) creates resources in userspace sw
> > > > (b) data path is in user space.
> > > > hence creates data path resources for user space.
> > > > So char device is created, removed as result of vdpa device creation.
> > > >
> > > > For example,
> > > > $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
> > > >
> > > > Above command will create char device for user space.
> > > >
> > > > Similar command for ifc/mlx5 would have created similar channel
> > > > for rest of
> > > the config commands in hw.
> > > > vduse channel = char device, eventfd etc.
> > > > ifc/mlx5 hw channel = bar, irq, command interface etc Netdev sim
> > > > channel = sw direct calls
> > > >
> > > > Does it make sense?
> > >
> > > In my understanding, to make vdpa work, we need a backend (datapath
> > > resources) and a frontend (a vdpa device attached to a vdpa bus). In
> > > the above example, it looks like we use the command "vdpa dev add ..."
> > >  to create a backend, so do we need another command to create a
> frontend?
> > >
> > For block device there is certainly some backend to process the IOs.
> > Sometimes backend to be setup first, before its front end is exposed.
> 
> Yes, the backend need to be setup firstly, this is vendor device specific, not
> vdpa specific.
> 
> > "vdpa dev add" is the front end command who connects to the backend
> (implicitly) for network device.
> >
> > vhost->vdpa_block_device->backend_io_processor (usr,hw,kernel).
> >
> > And it needs a way to connect to backend when explicitly specified during
> creation time.
> > Something like,
> > $ vdpa dev add parentdev vdpa_vduse type block name foo3 handle
> <uuid>
> > In above example some vendor device specific unique handle is passed
> based on backend setup in hardware/user space.
> >
> 
> Yes, we can work like this. After we setup a backend through an anonymous
> inode(fd) from /dev/vduse, we can get a unique handle. Then use it to
> create a frontend which will connect to the specific backend.

I do not fully understand the inode. But I assume this is some unique handle say uuid or something that both sides backend and vdpa device understand.
It cannot be some kernel internal handle expose to user space.

> 
> > In below 3 examples, vdpa block simulator is connecting to backend block
> or file.
> >
> > $ vdpa dev add parentdev vdpa_blocksim type block name foo4 blockdev
> > /dev/zero
> >
> > $ vdpa dev add parentdev vdpa_blocksim type block name foo5 blockdev
> > /dev/sda2 size=100M offset=10M
> >
> > $ vdpa dev add parentdev vdpa_block filebackend_sim type block name
> > foo6 file /root/file_backend.txt
> >
> > Or may be backend connects to the created vdpa device is bound to the
> driver.
> > Can vduse attach to the created vdpa block device through the char device
> and establish the channel to receive IOs, and to setup the block config space?
> >
> 
> How to create the vdpa block device? If we use the command "vdpa dev
> add..", the command will hang there until a vduse process attaches to the
> vdpa block device.
I was suggesting that vdpa device is created, but it doesn’t have backend attached to it.
It is attached to the backend when ioctl() side does enough setup. This state is handled internally the vduse driver.

But the above method of preparing backend looks more sane.

Regardless of which method is preferred, vduse driver must need a state to detach the vdpa bus device queues etc from the user space.
This is needed because user space process can terminate anytime resulting in detaching dpa bus device in_use by the vhost side.
Yongji Xie Dec. 2, 2020, 1:18 p.m. UTC | #22
On Wed, Dec 2, 2020 at 7:13 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Yongji Xie <xieyongji@bytedance.com>
> > Sent: Wednesday, December 2, 2020 2:52 PM
> >
> > On Wed, Dec 2, 2020 at 12:53 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > >
> > > > From: Yongji Xie <xieyongji@bytedance.com>
> > > > Sent: Wednesday, December 2, 2020 9:00 AM
> > > >
> > > > On Tue, Dec 1, 2020 at 11:59 PM Parav Pandit <parav@nvidia.com> wrote:
> > > > >
> > > > >
> > > > >
> > > > > > From: Yongji Xie <xieyongji@bytedance.com>
> > > > > > Sent: Tuesday, December 1, 2020 7:49 PM
> > > > > >
> > > > > > On Tue, Dec 1, 2020 at 7:32 PM Parav Pandit <parav@nvidia.com>
> > wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > From: Yongji Xie <xieyongji@bytedance.com>
> > > > > > > > Sent: Tuesday, December 1, 2020 3:26 PM
> > > > > > > >
> > > > > > > > On Tue, Dec 1, 2020 at 2:25 PM Jason Wang
> > > > > > > > <jasowang@redhat.com>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On 2020/11/30 下午3:07, Yongji Xie wrote:
> > > > > > > > > >>> Thanks for adding me, Jason!
> > > > > > > > > >>>
> > > > > > > > > >>> Now I'm working on a v2 patchset for VDUSE (vDPA
> > > > > > > > > >>> Device in
> > > > > > > > > >>> Userspace) [1]. This tool is very useful for the vduse device.
> > > > > > > > > >>> So I'm considering integrating this into my v2 patchset.
> > > > > > > > > >>> But there is one problem:
> > > > > > > > > >>>
> > > > > > > > > >>> In this tool, vdpa device config action and enable
> > > > > > > > > >>> action are combined into one netlink msg:
> > > > > > > > > >>> VDPA_CMD_DEV_NEW. But in
> > > > > > vduse
> > > > > > > > > >>> case, it needs to be splitted because a chardev should
> > > > > > > > > >>> be created and opened by a userspace process before we
> > > > > > > > > >>> enable the vdpa device (call vdpa_register_device()).
> > > > > > > > > >>>
> > > > > > > > > >>> So I'd like to know whether it's possible (or have
> > > > > > > > > >>> some
> > > > > > > > > >>> plans) to add two new netlink msgs something like:
> > > > > > > > > >>> VDPA_CMD_DEV_ENABLE
> > > > > > > > and
> > > > > > > > > >>> VDPA_CMD_DEV_DISABLE to make the config path more
> > flexible.
> > > > > > > > > >>>
> > > > > > > > > >> Actually, we've discussed such intermediate step in
> > > > > > > > > >> some early discussion. It looks to me VDUSE could be
> > > > > > > > > >> one of the users of
> > > > this.
> > > > > > > > > >>
> > > > > > > > > >> Or I wonder whether we can switch to use anonymous
> > > > > > > > > >> inode(fd) for VDUSE then fetching it via an
> > > > > > > > > >> VDUSE_GET_DEVICE_FD
> > > > ioctl?
> > > > > > > > > >>
> > > > > > > > > > Yes, we can. Actually the current implementation in
> > > > > > > > > > VDUSE is like this.  But seems like this is still a intermediate
> > step.
> > > > > > > > > > The fd should be binded to a name or something else
> > > > > > > > > > which need to be configured before.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > The name could be specified via the netlink. It looks to
> > > > > > > > > me the real issue is that until the device is connected
> > > > > > > > > with a userspace, it can't be used. So we also need to
> > > > > > > > > fail the enabling if it doesn't
> > > > > > opened.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Yes, that's true. So you mean we can firstly try to fetch
> > > > > > > > the fd binded to a name/vduse_id via an VDUSE_GET_DEVICE_FD,
> > > > > > > > then use the name/vduse_id as a attribute to create vdpa
> > > > > > > > device? It looks fine to
> > > > me.
> > > > > > >
> > > > > > > I probably do not well understand. I tried reading patch [1]
> > > > > > > and few things
> > > > > > do not look correct as below.
> > > > > > > Creating the vdpa device on the bus device and destroying the
> > > > > > > device from
> > > > > > the workqueue seems unnecessary and racy.
> > > > > > >
> > > > > > > It seems vduse driver needs
> > > > > > > This is something should be done as part of the vdpa dev add
> > > > > > > command,
> > > > > > instead of connecting two sides separately and ensuring race
> > > > > > free access to it.
> > > > > > >
> > > > > > > So VDUSE_DEV_START and VDUSE_DEV_STOP should possibly be
> > avoided.
> > > > > > >
> > > > > >
> > > > > > Yes, we can avoid these two ioctls with the help of the management
> > tool.
> > > > > >
> > > > > > > $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
> > > > > > >
> > > > > > > When above command is executed it creates necessary vdpa
> > > > > > > device
> > > > > > > foo2
> > > > > > on the bus.
> > > > > > > When user binds foo2 device with the vduse driver, in the
> > > > > > > probe(), it
> > > > > > creates respective char device to access it from user space.
> > > > > >
> > > > > I see. So vduse cannot work with any existing vdpa devices like
> > > > > ifc, mlx5 or
> > > > netdevsim.
> > > > > It has its own implementation similar to fuse with its own backend of
> > choice.
> > > > > More below.
> > > > >
> > > > > > But vduse driver is not a vdpa bus driver. It works like vdpasim
> > > > > > driver, but offloads the data plane and control plane to a user space
> > process.
> > > > >
> > > > > In that case to draw parallel lines,
> > > > >
> > > > > 1. netdevsim:
> > > > > (a) create resources in kernel sw
> > > > > (b) datapath simulates in kernel
> > > > >
> > > > > 2. ifc + mlx5 vdpa dev:
> > > > > (a) creates resource in hw
> > > > > (b) data path is in hw
> > > > >
> > > > > 3. vduse:
> > > > > (a) creates resources in userspace sw
> > > > > (b) data path is in user space.
> > > > > hence creates data path resources for user space.
> > > > > So char device is created, removed as result of vdpa device creation.
> > > > >
> > > > > For example,
> > > > > $ vdpa dev add parentdev vduse_mgmtdev type net name foo2
> > > > >
> > > > > Above command will create char device for user space.
> > > > >
> > > > > Similar command for ifc/mlx5 would have created similar channel
> > > > > for rest of
> > > > the config commands in hw.
> > > > > vduse channel = char device, eventfd etc.
> > > > > ifc/mlx5 hw channel = bar, irq, command interface etc Netdev sim
> > > > > channel = sw direct calls
> > > > >
> > > > > Does it make sense?
> > > >
> > > > In my understanding, to make vdpa work, we need a backend (datapath
> > > > resources) and a frontend (a vdpa device attached to a vdpa bus). In
> > > > the above example, it looks like we use the command "vdpa dev add ..."
> > > >  to create a backend, so do we need another command to create a
> > frontend?
> > > >
> > > For block device there is certainly some backend to process the IOs.
> > > Sometimes backend to be setup first, before its front end is exposed.
> >
> > Yes, the backend need to be setup firstly, this is vendor device specific, not
> > vdpa specific.
> >
> > > "vdpa dev add" is the front end command who connects to the backend
> > (implicitly) for network device.
> > >
> > > vhost->vdpa_block_device->backend_io_processor (usr,hw,kernel).
> > >
> > > And it needs a way to connect to backend when explicitly specified during
> > creation time.
> > > Something like,
> > > $ vdpa dev add parentdev vdpa_vduse type block name foo3 handle
> > <uuid>
> > > In above example some vendor device specific unique handle is passed
> > based on backend setup in hardware/user space.
> > >
> >
> > Yes, we can work like this. After we setup a backend through an anonymous
> > inode(fd) from /dev/vduse, we can get a unique handle. Then use it to
> > create a frontend which will connect to the specific backend.
>
> I do not fully understand the inode. But I assume this is some unique handle say uuid or something that both sides backend and vdpa device understand.
> It cannot be some kernel internal handle expose to user space.
>

Yes, the unique handle should be a user-defined stuff.

> >
> > > In below 3 examples, vdpa block simulator is connecting to backend block
> > or file.
> > >
> > > $ vdpa dev add parentdev vdpa_blocksim type block name foo4 blockdev
> > > /dev/zero
> > >
> > > $ vdpa dev add parentdev vdpa_blocksim type block name foo5 blockdev
> > > /dev/sda2 size=100M offset=10M
> > >
> > > $ vdpa dev add parentdev vdpa_block filebackend_sim type block name
> > > foo6 file /root/file_backend.txt
> > >
> > > Or may be backend connects to the created vdpa device is bound to the
> > driver.
> > > Can vduse attach to the created vdpa block device through the char device
> > and establish the channel to receive IOs, and to setup the block config space?
> > >
> >
> > How to create the vdpa block device? If we use the command "vdpa dev
> > add..", the command will hang there until a vduse process attaches to the
> > vdpa block device.
> I was suggesting that vdpa device is created, but it doesn’t have backend attached to it.
> It is attached to the backend when ioctl() side does enough setup. This state is handled internally the vduse driver.
>
> But the above method of preparing backend looks more sane.
>
> Regardless of which method is preferred, vduse driver must need a state to detach the vdpa bus device queues etc from the user space.
> This is needed because user space process can terminate anytime resulting in detaching dpa bus device in_use by the vhost side.

I think the vdpa device should only be detached by the command "vdpa
dev del...". The vduse driver can support reconnecting when user space
process is terminated.

Thanks,
Yongji
David Ahern Dec. 8, 2020, 10:47 p.m. UTC | #23
On 11/26/20 8:53 PM, Jason Wang wrote:
> 1. Where does userspace vdpa tool reside which users can use?
> Ans: vdpa tool can possibly reside in iproute2 [1] as it enables user to
> create vdpa net devices.

iproute2 package is fine with us, but there are some expectations:
syntax, command options and documentation need to be consistent with
other iproute2 commands (this thread suggests it will be but just being
clear), and it needs to re-use code as much as possible (e.g., json
functions). If there is overlap with other tools (devlink, dcb, etc),
you should refactor into common code used by all. Petr Machata has done
this quite a bit for dcb and is a good example to follow.
Michael S. Tsirkin Dec. 16, 2020, 9:13 a.m. UTC | #24
On Tue, Nov 17, 2020 at 07:51:56PM +0000, Parav Pandit wrote:
> 
> 
> > From: Jakub Kicinski <kuba@kernel.org>
> > Sent: Tuesday, November 17, 2020 3:53 AM
> > 
> > On Thu, 12 Nov 2020 08:39:58 +0200 Parav Pandit wrote:
> > > FAQs:
> > > -----
> > > 1. Where does userspace vdpa tool reside which users can use?
> > > Ans: vdpa tool can possibly reside in iproute2 [1] as it enables user
> > > to create vdpa net devices.
> > >
> > > 2. Why not create and delete vdpa device using sysfs/configfs?
> > > Ans:
> > 
> > > 3. Why not use ioctl() interface?
> > 
> > Obviously I'm gonna ask you - why can't you use devlink?
> > 
> This was considered.
> However it seems that extending devlink for vdpa specific stats, devices, config sounds overloading devlink beyond its defined scope.

kuba what's your thinking here? Should I merge this as is?

> > > Next steps:
> > > -----------
> > > (a) Post this patchset and iproute2/vdpa inclusion, remaining two
> > > drivers will be coverted to support vdpa tool instead of creating
> > > unmanaged default device on driver load.
> > > (b) More net specific parameters such as mac, mtu will be added.
> > 
> > How does MAC and MTU belong in this new VDPA thing?
> MAC only make sense when user wants to run VF/SF Netdev and vdpa together with different mac address.
> Otherwise existing devlink well defined API to have one MAC per function is fine.
> Same for MTU, if queues of vdpa vs VF/SF Netdev queues wants have different MTU it make sense to add configure per vdpa device.
Michael S. Tsirkin Dec. 16, 2020, 9:16 a.m. UTC | #25
On Thu, Nov 12, 2020 at 08:39:58AM +0200, Parav Pandit wrote:
> This patchset covers user requirements for managing existing vdpa devices,
> using a tool and its internal design notes for kernel drivers.


I applied bugfix patches 1 and 2.
Others conflict with vdpa sim block support, pls rebase.


> Background and user requirements:
> ----------------------------------
> (1) Currently VDPA device is created by driver when driver is loaded.
> However, user should have a choice when to create or not create a vdpa device
> for the underlying parent device.
> 
> For example, mlx5 PCI VF and subfunction device supports multiple classes of
> device such netdev, vdpa, rdma. Howevever it is not required to always created
> vdpa device for such device.
> 
> (2) In another use case, a device may support creating one or multiple vdpa
> device of same or different class such as net and block.
> Creating vdpa devices at driver load time further limits this use case.
> 
> (3) A user should be able to monitor and query vdpa queue level or device level
> statistics for a given vdpa device.
> 
> (4) A user should be able to query what class of vdpa devices are supported
> by its parent device.
> 
> (5) A user should be able to view supported features and negotiated features
> of the vdpa device.
> 
> (6) A user should be able to create a vdpa device in vendor agnostic manner
> using single tool.
> 
> Hence, it is required to have a tool through which user can create one or more
> vdpa devices from a parent device which addresses above user requirements.
> 
> Example devices:
> ----------------
>  +-----------+ +-----------+ +---------+ +--------+ +-----------+ 
>  |vdpa dev 0 | |vdpa dev 1 | |rdma dev | |netdev  | |vdpa dev 3 |
>  |type=net   | |type=block | |mlx5_0   | |ens3f0  | |type=net   |
>  +----+------+ +-----+-----+ +----+----+ +-----+--+ +----+------+
>       |              |            |            |         |
>       |              |            |            |         |
>  +----+-----+        |       +----+----+       |    +----+----+
>  |  mlx5    +--------+       |mlx5     +-------+    |mlx5     |
>  |pci vf 2  |                |pci vf 4 |            |pci sf 8 |
>  |03:00:2   |                |03:00.4  |            |mlx5_sf.8|
>  +----+-----+                +----+----+            +----+----+
>       |                           |                      |
>       |                      +----+-----+                |
>       +----------------------+mlx5      +----------------+
>                              |pci pf 0  |
>                              |03:00.0   |
>                              +----------+
> 
> vdpa tool:
> ----------
> vdpa tool is a tool to create, delete vdpa devices from a parent device. It is a
> tool that enables user to query statistics, features and may be more attributes
> in future.
> 
> vdpa tool command draft:
> ------------------------
> (a) List parent devices which supports creating vdpa devices.
> It also shows which class types supported by this parent device.
> In below command example two parent devices support vdpa device creation.
> First is PCI VF whose bdf is 03.00:2.
> Second is PCI VF whose name is 03:00.4.
> Third is PCI SF whose name is mlx5_core.sf.8
> 
> $ vdpa parentdev list
> vdpasim
>   supported_classes
>     net
> pci/0000:03.00:3
>   supported_classes
>     net block
> pci/0000:03.00:4
>   supported_classes
>     net block
> auxiliary/mlx5_core.sf.8
>   supported_classes
>     net
> 
> (b) Now add a vdpa device of networking class and show the device.
> $ vdpa dev add parentdev pci/0000:03.00:2 type net name foo0 $ vdpa dev show foo0
> foo0: parentdev pci/0000:03.00:2 type network parentdev vdpasim vendor_id 0 max_vqs 2 max_vq_size 256
> 
> (c) Show features of a vdpa device
> $ vdpa dev features show foo0
> supported
>   iommu platform
>   version 1
> 
> (d) Dump vdpa device statistics
> $ vdpa dev stats show foo0
> kickdoorbells 10
> wqes 100
> 
> (e) Now delete a vdpa device previously created.
> $ vdpa dev del foo0
> 
> vdpa tool support in this patchset:
> -----------------------------------
> vdpa tool is created to create, delete and query vdpa devices.
> examples:
> Show vdpa parent device that supports creating, deleting vdpa devices.
> 
> $ vdpa parentdev show
> vdpasim:
>   supported_classes
>     net
> 
> $ vdpa parentdev show -jp
> {
>     "show": {
>        "vdpasim": {
>           "supported_classes": {
>              "net"
>         }
>     }
> }
> 
> Create a vdpa device of type networking named as "foo2" from the parent device vdpasim:
> 
> $ vdpa dev add parentdev vdpasim type net name foo2
> 
> Show the newly created vdpa device by its name:
> $ vdpa dev show foo2
> foo2: type network parentdev vdpasim vendor_id 0 max_vqs 2 max_vq_size 256
> 
> $ vdpa dev show foo2 -jp
> {
>     "dev": {
>         "foo2": {
>             "type": "network",
>             "parentdev": "vdpasim",
>             "vendor_id": 0,
>             "max_vqs": 2,
>             "max_vq_size": 256
>         }
>     }
> }
> 
> Delete the vdpa device after its use:
> $ vdpa dev del foo2
> 
> vdpa tool support by kernel:
> ----------------------------
> vdpa tool user interface will be supported by existing vdpa kernel framework,
> i.e. drivers/vdpa/vdpa.c It services user command through a netlink interface.
> 
> Each parent device registers supported callback operations with vdpa subsystem
> through which vdpa device(s) can be managed.
> 
> FAQs:
> -----
> 1. Where does userspace vdpa tool reside which users can use?
> Ans: vdpa tool can possibly reside in iproute2 [1] as it enables user to
> create vdpa net devices.
> 
> 2. Why not create and delete vdpa device using sysfs/configfs?
> Ans:
> (a) A device creation may involve passing one or more attributes.
> Passing multiple attributes and returning error code and more verbose
> information for invalid attributes cannot be handled by sysfs/configfs.
> 
> (b) netlink framework is rich that enables user space and kernel driver to
> provide nested attributes.
> 
> (c) Exposing device specific file under sysfs without net namespace
> awareness exposes details to multiple containers. Instead exposing
> attributes via a netlink socket secures the communication channel with kernel.
> 
> (d) netlink socket interface enables to run syscaller kernel tests.
> 
> 3. Why not use ioctl() interface?
> Ans: ioctl() interface replicates the necessary plumbing which already
> exists through netlink socket.
> 
> 4. What happens when one or more user created vdpa devices exist for a
> parent PCI VF or SF and such parent device is removed?
> Ans: All user created vdpa devices are removed that belong to a parent.
> 
> [1] git://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git
> 
> Next steps:
> -----------
> (a) Post this patchset and iproute2/vdpa inclusion, remaining two drivers
> will be coverted to support vdpa tool instead of creating unmanaged default
> device on driver load.
> (b) More net specific parameters such as mac, mtu will be added.
> (c) Features bits get and set interface will be added.
> 
> Parav Pandit (7):
>   vdpa: Add missing comment for virtqueue count
>   vdpa: Use simpler version of ida allocation
>   vdpa: Extend routine to accept vdpa device name
>   vdpa: Define vdpa parent device, ops and a netlink interface
>   vdpa: Enable a user to add and delete a vdpa device
>   vdpa: Enable user to query vdpa device info
>   vdpa/vdpa_sim: Enable user to create vdpasim net devices
> 
>  drivers/vdpa/Kconfig              |   1 +
>  drivers/vdpa/ifcvf/ifcvf_main.c   |   2 +-
>  drivers/vdpa/mlx5/net/mlx5_vnet.c |   2 +-
>  drivers/vdpa/vdpa.c               | 511 +++++++++++++++++++++++++++++-
>  drivers/vdpa/vdpa_sim/vdpa_sim.c  |  81 ++++-
>  include/linux/vdpa.h              |  46 ++-
>  include/uapi/linux/vdpa.h         |  41 +++
>  7 files changed, 660 insertions(+), 24 deletions(-)
>  create mode 100644 include/uapi/linux/vdpa.h
> 
> -- 
> 2.26.2
Parav Pandit Dec. 16, 2020, 4:54 p.m. UTC | #26
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Wednesday, December 16, 2020 9:36 PM
> 
> On Wed, 16 Dec 2020 04:13:51 -0500 Michael S. Tsirkin wrote:
> > > > > 3. Why not use ioctl() interface?
> > > >
> > > > Obviously I'm gonna ask you - why can't you use devlink?
> > > >
> > > This was considered.
> > > However it seems that extending devlink for vdpa specific stats, devices,
> config sounds overloading devlink beyond its defined scope.
> >
> > kuba what's your thinking here? Should I merge this as is?
> 
> No objections from me if people familiar with VDPA like it.

I was too occupied with the recent work on subfunction series.
I wanted to change the "parentdev" to "mgmtdev" to make it little more clear for vdpa management tool to see vdpa mgmt device and operate on it.
What do you think? Should I revise v2 or its late?
Michael S. Tsirkin Dec. 16, 2020, 7:57 p.m. UTC | #27
On Wed, Dec 16, 2020 at 04:54:37PM +0000, Parav Pandit wrote:
> > From: Jakub Kicinski <kuba@kernel.org>
> > Sent: Wednesday, December 16, 2020 9:36 PM
> > 
> > On Wed, 16 Dec 2020 04:13:51 -0500 Michael S. Tsirkin wrote:
> > > > > > 3. Why not use ioctl() interface?
> > > > >
> > > > > Obviously I'm gonna ask you - why can't you use devlink?
> > > > >
> > > > This was considered.
> > > > However it seems that extending devlink for vdpa specific stats, devices,
> > config sounds overloading devlink beyond its defined scope.
> > >
> > > kuba what's your thinking here? Should I merge this as is?
> > 
> > No objections from me if people familiar with VDPA like it.
> 
> I was too occupied with the recent work on subfunction series.
> I wanted to change the "parentdev" to "mgmtdev" to make it little more clear for vdpa management tool to see vdpa mgmt device and operate on it.
> What do you think? Should I revise v2 or its late?

I need a rebase anyway, so sure.
Parav Pandit Dec. 17, 2020, 12:13 p.m. UTC | #28
> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Thursday, December 17, 2020 1:28 AM
> 
> On Wed, Dec 16, 2020 at 04:54:37PM +0000, Parav Pandit wrote:
> > > From: Jakub Kicinski <kuba@kernel.org>
> > > Sent: Wednesday, December 16, 2020 9:36 PM
> > >
> > > On Wed, 16 Dec 2020 04:13:51 -0500 Michael S. Tsirkin wrote:
> > > > > > > 3. Why not use ioctl() interface?
> > > > > >
> > > > > > Obviously I'm gonna ask you - why can't you use devlink?
> > > > > >
> > > > > This was considered.
> > > > > However it seems that extending devlink for vdpa specific stats,
> devices,
> > > config sounds overloading devlink beyond its defined scope.
> > > >
> > > > kuba what's your thinking here? Should I merge this as is?
> > >
> > > No objections from me if people familiar with VDPA like it.
> >
> > I was too occupied with the recent work on subfunction series.
> > I wanted to change the "parentdev" to "mgmtdev" to make it little more
> clear for vdpa management tool to see vdpa mgmt device and operate on it.
> > What do you think? Should I revise v2 or its late?
> 
> I need a rebase anyway, so sure.
ok. Thanks.
Parav Pandit Jan. 19, 2021, 4:21 a.m. UTC | #29
Hi David,

> From: David Ahern <dsahern@gmail.com>
> Sent: Wednesday, December 9, 2020 4:17 AM
> 
> On 11/26/20 8:53 PM, Jason Wang wrote:
> > 1. Where does userspace vdpa tool reside which users can use?
> > Ans: vdpa tool can possibly reside in iproute2 [1] as it enables user
> > to create vdpa net devices.
> 
> iproute2 package is fine with us, but there are some expectations:
> syntax, command options and documentation need to be consistent with
> other iproute2 commands (this thread suggests it will be but just being clear),
> and it needs to re-use code as much as possible (e.g., json functions). If there
> is overlap with other tools (devlink, dcb, etc), you should refactor into
> common code used by all. Petr Machata has done this quite a bit for dcb and
> is a good example to follow.

Sorry for my late reply. I missed your message until yesterday.
Thanks for the ack and inputs.
Yes, I migrated the iproute2/vdpa to use now uses the common code introduced by dcb tool.
Waiting for kernel side to finish.