diff mbox series

[v3,24/37] backends/iommufd: Introduce the iommufd object

Message ID 20231026103104.1686921-25-zhenzhong.duan@intel.com (mailing list archive)
State New, archived
Headers show
Series vfio: Adopt iommufd | expand

Commit Message

Duan, Zhenzhong Oct. 26, 2023, 10:30 a.m. UTC
From: Eric Auger <eric.auger@redhat.com>

Introduce an iommufd object which allows the interaction
with the host /dev/iommu device.

The /dev/iommu can have been already pre-opened outside of qemu,
in which case the fd can be passed directly along with the
iommufd object:

This allows the iommufd object to be shared accross several
subsystems (VFIO, VDPA, ...). For example, libvirt would open
the /dev/iommu once.

If no fd is passed along with the iommufd object, the /dev/iommu
is opened by the qemu code.

The CONFIG_IOMMUFD option must be set to compile this new object.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 MAINTAINERS              |   7 +
 qapi/qom.json            |  20 +++
 include/sysemu/iommufd.h |  46 +++++++
 backends/iommufd-stub.c  |  59 +++++++++
 backends/iommufd.c       | 268 +++++++++++++++++++++++++++++++++++++++
 backends/Kconfig         |   4 +
 backends/meson.build     |   5 +
 backends/trace-events    |  12 ++
 qemu-options.hx          |  13 ++
 9 files changed, 434 insertions(+)
 create mode 100644 include/sysemu/iommufd.h
 create mode 100644 backends/iommufd-stub.c
 create mode 100644 backends/iommufd.c

Comments

Markus Armbruster Oct. 26, 2023, 1:27 p.m. UTC | #1
Zhenzhong Duan <zhenzhong.duan@intel.com> writes:

> From: Eric Auger <eric.auger@redhat.com>
>
> Introduce an iommufd object which allows the interaction
> with the host /dev/iommu device.
>
> The /dev/iommu can have been already pre-opened outside of qemu,
> in which case the fd can be passed directly along with the
> iommufd object:
>
> This allows the iommufd object to be shared accross several
> subsystems (VFIO, VDPA, ...). For example, libvirt would open
> the /dev/iommu once.
>
> If no fd is passed along with the iommufd object, the /dev/iommu
> is opened by the qemu code.
>
> The CONFIG_IOMMUFD option must be set to compile this new object.
>
> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> ---
>  MAINTAINERS              |   7 +
>  qapi/qom.json            |  20 +++
>  include/sysemu/iommufd.h |  46 +++++++
>  backends/iommufd-stub.c  |  59 +++++++++
>  backends/iommufd.c       | 268 +++++++++++++++++++++++++++++++++++++++
>  backends/Kconfig         |   4 +
>  backends/meson.build     |   5 +
>  backends/trace-events    |  12 ++
>  qemu-options.hx          |  13 ++
>  9 files changed, 434 insertions(+)
>  create mode 100644 include/sysemu/iommufd.h
>  create mode 100644 backends/iommufd-stub.c
>  create mode 100644 backends/iommufd.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 7f9912baa0..7aa57ab16f 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2109,6 +2109,13 @@ F: hw/vfio/ap.c
>  F: docs/system/s390x/vfio-ap.rst
>  L: qemu-s390x@nongnu.org
>  
> +iommufd
> +M: Yi Liu <yi.l.liu@intel.com>
> +M: Eric Auger <eric.auger@redhat.com>
> +S: Supported
> +F: backends/iommufd.c
> +F: include/sysemu/iommufd.h
> +
>  vhost
>  M: Michael S. Tsirkin <mst@redhat.com>
>  S: Supported
> diff --git a/qapi/qom.json b/qapi/qom.json
> index c53ef978ff..ef0b50f107 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -794,6 +794,22 @@
>  { 'struct': 'VfioUserServerProperties',
>    'data': { 'socket': 'SocketAddress', 'device': 'str' } }
>  
> +##
> +# @IOMMUFDProperties:
> +#
> +# Properties for iommufd objects.
> +#
> +# @fd: file descriptor name previously passed via 'getfd' command,
> +#     which represents a pre-opened /dev/iommu. This allows the

Two spaces between sentences for consistency, please.

> +#     iommufd object to be shared accross several subsystems
> +#     (VFIO, VDPA, ...) and file descriptor to be shared with

Comma before "and file".

Either "the file descriptor", or "file descriptors".

> +#     other process, e.g: DPDK.

e.g.

Alternatively "such as DPDK."

> +#
> +# Since: 8.2
> +##
> +{ 'struct': 'IOMMUFDProperties',
> +        'data': { '*fd': 'str' } }

@fd is optional.  How does the iommufd object behave when @fd is absent?

> +
>  ##
>  # @RngProperties:
>  #
> @@ -934,6 +950,8 @@
   ##
   # @ObjectType:

None of the enum members are documented.  I'm not asking you to fix that
now.

   #
   # Features:
   #
   # @unstable: Member @x-remote-object is experimental.
   #
   # Since: 6.0
   ##
   { 'enum': 'ObjectType',
     'data': [
       [...]
>      'input-barrier',
>      { 'name': 'input-linux',
>        'if': 'CONFIG_LINUX' },
> +    { 'name': 'iommufd',
> +      'if': 'CONFIG_IOMMUFD' },

Should struct IOMMUFDProperties also have 'if': 'CONFIG_IOMMUFD'?

>      'iothread',
>      'main-loop',
>      { 'name': 'memory-backend-epc',
> @@ -1003,6 +1021,8 @@
>        'input-barrier':              'InputBarrierProperties',
>        'input-linux':                { 'type': 'InputLinuxProperties',
>                                        'if': 'CONFIG_LINUX' },
> +      'iommufd':                    { 'type': 'IOMMUFDProperties',
> +                                      'if': 'CONFIG_IOMMUFD' },
>        'iothread':                   'IothreadProperties',
>        'main-loop':                  'MainLoopProperties',
>        'memory-backend-epc':         { 'type': 'MemoryBackendEpcProperties',

[...]
Duan, Zhenzhong Oct. 27, 2023, 7:50 a.m. UTC | #2
>-----Original Message-----
>From: Markus Armbruster <armbru@redhat.com>
>Sent: Thursday, October 26, 2023 9:28 PM
>Subject: Re: [PATCH v3 24/37] backends/iommufd: Introduce the iommufd object
>
>Zhenzhong Duan <zhenzhong.duan@intel.com> writes:
>
>> From: Eric Auger <eric.auger@redhat.com>
>>
>> Introduce an iommufd object which allows the interaction
>> with the host /dev/iommu device.
>>
>> The /dev/iommu can have been already pre-opened outside of qemu,
>> in which case the fd can be passed directly along with the
>> iommufd object:
>>
>> This allows the iommufd object to be shared accross several
>> subsystems (VFIO, VDPA, ...). For example, libvirt would open
>> the /dev/iommu once.
>>
>> If no fd is passed along with the iommufd object, the /dev/iommu
>> is opened by the qemu code.
>>
>> The CONFIG_IOMMUFD option must be set to compile this new object.
>>
>> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>> ---
>>  MAINTAINERS              |   7 +
>>  qapi/qom.json            |  20 +++
>>  include/sysemu/iommufd.h |  46 +++++++
>>  backends/iommufd-stub.c  |  59 +++++++++
>>  backends/iommufd.c       | 268 +++++++++++++++++++++++++++++++++++++++
>>  backends/Kconfig         |   4 +
>>  backends/meson.build     |   5 +
>>  backends/trace-events    |  12 ++
>>  qemu-options.hx          |  13 ++
>>  9 files changed, 434 insertions(+)
>>  create mode 100644 include/sysemu/iommufd.h
>>  create mode 100644 backends/iommufd-stub.c
>>  create mode 100644 backends/iommufd.c
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 7f9912baa0..7aa57ab16f 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -2109,6 +2109,13 @@ F: hw/vfio/ap.c
>>  F: docs/system/s390x/vfio-ap.rst
>>  L: qemu-s390x@nongnu.org
>>
>> +iommufd
>> +M: Yi Liu <yi.l.liu@intel.com>
>> +M: Eric Auger <eric.auger@redhat.com>
>> +S: Supported
>> +F: backends/iommufd.c
>> +F: include/sysemu/iommufd.h
>> +
>>  vhost
>>  M: Michael S. Tsirkin <mst@redhat.com>
>>  S: Supported
>> diff --git a/qapi/qom.json b/qapi/qom.json
>> index c53ef978ff..ef0b50f107 100644
>> --- a/qapi/qom.json
>> +++ b/qapi/qom.json
>> @@ -794,6 +794,22 @@
>>  { 'struct': 'VfioUserServerProperties',
>>    'data': { 'socket': 'SocketAddress', 'device': 'str' } }
>>
>> +##
>> +# @IOMMUFDProperties:
>> +#
>> +# Properties for iommufd objects.
>> +#
>> +# @fd: file descriptor name previously passed via 'getfd' command,
>> +#     which represents a pre-opened /dev/iommu. This allows the
>
>Two spaces between sentences for consistency, please.

Presume you mean "  'data': { '*fd': 'str' } }" line, not above line.

>
>> +#     iommufd object to be shared accross several subsystems
>> +#     (VFIO, VDPA, ...) and file descriptor to be shared with
>
>Comma before "and file".
>
>Either "the file descriptor", or "file descriptors".
>
>> +#     other process, e.g: DPDK.
>
>e.g.
>
>Alternatively "such as DPDK."

Will fix.

>
>> +#
>> +# Since: 8.2
>> +##
>> +{ 'struct': 'IOMMUFDProperties',
>> +        'data': { '*fd': 'str' } }
>
>@fd is optional.  How does the iommufd object behave when @fd is absent?

If no fd is passed along with the iommufd object, the /dev/iommu
is opened by the qemu code. Let me know if this also needs to be documented.

>
>> +
>>  ##
>>  # @RngProperties:
>>  #
>> @@ -934,6 +950,8 @@
>   ##
>   # @ObjectType:
>
>None of the enum members are documented.  I'm not asking you to fix that
>now.
>
>   #
>   # Features:
>   #
>   # @unstable: Member @x-remote-object is experimental.
>   #
>   # Since: 6.0
>   ##
>   { 'enum': 'ObjectType',
>     'data': [
>       [...]
>>      'input-barrier',
>>      { 'name': 'input-linux',
>>        'if': 'CONFIG_LINUX' },
>> +    { 'name': 'iommufd',
>> +      'if': 'CONFIG_IOMMUFD' },
>
>Should struct IOMMUFDProperties also have 'if': 'CONFIG_IOMMUFD'?

Yes, thanks for point out, will fix.

BRs.
Zhenzhong
Markus Armbruster Oct. 27, 2023, 8:30 a.m. UTC | #3
"Duan, Zhenzhong" <zhenzhong.duan@intel.com> writes:

>>-----Original Message-----
>> From: Markus Armbruster <armbru@redhat.com>
>> Sent: Thursday, October 26, 2023 9:28 PM
>> Subject: Re: [PATCH v3 24/37] backends/iommufd: Introduce the iommufd object
>>
>> Zhenzhong Duan <zhenzhong.duan@intel.com> writes:
>>
>>> From: Eric Auger <eric.auger@redhat.com>
>>>
>>> Introduce an iommufd object which allows the interaction
>>> with the host /dev/iommu device.
>>>
>>> The /dev/iommu can have been already pre-opened outside of qemu,
>>> in which case the fd can be passed directly along with the
>>> iommufd object:
>>>
>>> This allows the iommufd object to be shared accross several
>>> subsystems (VFIO, VDPA, ...). For example, libvirt would open
>>> the /dev/iommu once.
>>>
>>> If no fd is passed along with the iommufd object, the /dev/iommu
>>> is opened by the qemu code.
>>>
>>> The CONFIG_IOMMUFD option must be set to compile this new object.
>>>
>>> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>>> ---
>>>  MAINTAINERS              |   7 +
>>>  qapi/qom.json            |  20 +++
>>>  include/sysemu/iommufd.h |  46 +++++++
>>>  backends/iommufd-stub.c  |  59 +++++++++
>>>  backends/iommufd.c       | 268 +++++++++++++++++++++++++++++++++++++++
>>>  backends/Kconfig         |   4 +
>>>  backends/meson.build     |   5 +
>>>  backends/trace-events    |  12 ++
>>>  qemu-options.hx          |  13 ++
>>>  9 files changed, 434 insertions(+)
>>>  create mode 100644 include/sysemu/iommufd.h
>>>  create mode 100644 backends/iommufd-stub.c
>>>  create mode 100644 backends/iommufd.c
>>>
>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>> index 7f9912baa0..7aa57ab16f 100644
>>> --- a/MAINTAINERS
>>> +++ b/MAINTAINERS
>>> @@ -2109,6 +2109,13 @@ F: hw/vfio/ap.c
>>>  F: docs/system/s390x/vfio-ap.rst
>>>  L: qemu-s390x@nongnu.org
>>>
>>> +iommufd
>>> +M: Yi Liu <yi.l.liu@intel.com>
>>> +M: Eric Auger <eric.auger@redhat.com>
>>> +S: Supported
>>> +F: backends/iommufd.c
>>> +F: include/sysemu/iommufd.h
>>> +
>>>  vhost
>>>  M: Michael S. Tsirkin <mst@redhat.com>
>>>  S: Supported
>>> diff --git a/qapi/qom.json b/qapi/qom.json
>>> index c53ef978ff..ef0b50f107 100644
>>> --- a/qapi/qom.json
>>> +++ b/qapi/qom.json
>>> @@ -794,6 +794,22 @@
>>>  { 'struct': 'VfioUserServerProperties',
>>>    'data': { 'socket': 'SocketAddress', 'device': 'str' } }
>>>
>>> +##
>>> +# @IOMMUFDProperties:
>>> +#
>>> +# Properties for iommufd objects.
>>> +#
>>> +# @fd: file descriptor name previously passed via 'getfd' command,
>>> +#     which represents a pre-opened /dev/iommu. This allows the
>>
>> Two spaces between sentences for consistency, please.
>
> Presume you mean "  'data': { '*fd': 'str' } }" line, not above line.

I'd like you to format the doc comment like this:


     # @fd: file descriptor name previously passed via 'getfd' command,
     #     which represents a pre-opened /dev/iommu.  This allows the

Note the two spaces after the period in "/dev/iommu.  This allows".

>>
>>> +#     iommufd object to be shared accross several subsystems
>>> +#     (VFIO, VDPA, ...) and file descriptor to be shared with
>>
>> Comma before "and file".
>>
>> Either "the file descriptor", or "file descriptors".
>>
>>> +#     other process, e.g: DPDK.
>>
>> e.g.
>>
>> Alternatively "such as DPDK."
>
> Will fix.
>
>>
>>> +#
>>> +# Since: 8.2
>>> +##
>>> +{ 'struct': 'IOMMUFDProperties',
>>> +        'data': { '*fd': 'str' } }
>>
>> @fd is optional.  How does the iommufd object behave when @fd is absent?
>
> If no fd is passed along with the iommufd object, the /dev/iommu
> is opened by the qemu code. Let me know if this also needs to be documented.

When something is optional, we pretty much always need to document
behavior when it's absent.

Ideally, behavior when absent is identical to behavior when present with
a certain default value.  Then documenting the default value suffices.
We commonly do this like (default: VALUE).

>>> +
>>>  ##
>>>  # @RngProperties:
>>>  #
>>> @@ -934,6 +950,8 @@
>>   ##
>>   # @ObjectType:
>>
>> None of the enum members are documented.  I'm not asking you to fix that
>> now.
>>
>>   #
>>   # Features:
>>   #
>>   # @unstable: Member @x-remote-object is experimental.
>>   #
>>   # Since: 6.0
>>   ##
>>   { 'enum': 'ObjectType',
>>     'data': [
>>       [...]
>>>      'input-barrier',
>>>      { 'name': 'input-linux',
>>>        'if': 'CONFIG_LINUX' },
>>> +    { 'name': 'iommufd',
>>> +      'if': 'CONFIG_IOMMUFD' },
>>
>> Should struct IOMMUFDProperties also have 'if': 'CONFIG_IOMMUFD'?
>
> Yes, thanks for point out, will fix.

You're welcome!
Duan, Zhenzhong Oct. 27, 2023, 9:41 a.m. UTC | #4
>-----Original Message-----
>From: Markus Armbruster <armbru@redhat.com>
>Sent: Friday, October 27, 2023 4:31 PM
>Subject: Re: [PATCH v3 24/37] backends/iommufd: Introduce the iommufd object
>
>"Duan, Zhenzhong" <zhenzhong.duan@intel.com> writes:
>
>>>-----Original Message-----
>>> From: Markus Armbruster <armbru@redhat.com>
>>> Sent: Thursday, October 26, 2023 9:28 PM
>>> Subject: Re: [PATCH v3 24/37] backends/iommufd: Introduce the iommufd
>object
>>>
>>> Zhenzhong Duan <zhenzhong.duan@intel.com> writes:
>>>
>>>> From: Eric Auger <eric.auger@redhat.com>
>>>>
>>>> Introduce an iommufd object which allows the interaction
>>>> with the host /dev/iommu device.
>>>>
>>>> The /dev/iommu can have been already pre-opened outside of qemu,
>>>> in which case the fd can be passed directly along with the
>>>> iommufd object:
>>>>
>>>> This allows the iommufd object to be shared accross several
>>>> subsystems (VFIO, VDPA, ...). For example, libvirt would open
>>>> the /dev/iommu once.
>>>>
>>>> If no fd is passed along with the iommufd object, the /dev/iommu
>>>> is opened by the qemu code.
>>>>
>>>> The CONFIG_IOMMUFD option must be set to compile this new object.
>>>>
>>>> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>>>> ---
>>>>  MAINTAINERS              |   7 +
>>>>  qapi/qom.json            |  20 +++
>>>>  include/sysemu/iommufd.h |  46 +++++++
>>>>  backends/iommufd-stub.c  |  59 +++++++++
>>>>  backends/iommufd.c       | 268
>+++++++++++++++++++++++++++++++++++++++
>>>>  backends/Kconfig         |   4 +
>>>>  backends/meson.build     |   5 +
>>>>  backends/trace-events    |  12 ++
>>>>  qemu-options.hx          |  13 ++
>>>>  9 files changed, 434 insertions(+)
>>>>  create mode 100644 include/sysemu/iommufd.h
>>>>  create mode 100644 backends/iommufd-stub.c
>>>>  create mode 100644 backends/iommufd.c
>>>>
>>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>>> index 7f9912baa0..7aa57ab16f 100644
>>>> --- a/MAINTAINERS
>>>> +++ b/MAINTAINERS
>>>> @@ -2109,6 +2109,13 @@ F: hw/vfio/ap.c
>>>>  F: docs/system/s390x/vfio-ap.rst
>>>>  L: qemu-s390x@nongnu.org
>>>>
>>>> +iommufd
>>>> +M: Yi Liu <yi.l.liu@intel.com>
>>>> +M: Eric Auger <eric.auger@redhat.com>
>>>> +S: Supported
>>>> +F: backends/iommufd.c
>>>> +F: include/sysemu/iommufd.h
>>>> +
>>>>  vhost
>>>>  M: Michael S. Tsirkin <mst@redhat.com>
>>>>  S: Supported
>>>> diff --git a/qapi/qom.json b/qapi/qom.json
>>>> index c53ef978ff..ef0b50f107 100644
>>>> --- a/qapi/qom.json
>>>> +++ b/qapi/qom.json
>>>> @@ -794,6 +794,22 @@
>>>>  { 'struct': 'VfioUserServerProperties',
>>>>    'data': { 'socket': 'SocketAddress', 'device': 'str' } }
>>>>
>>>> +##
>>>> +# @IOMMUFDProperties:
>>>> +#
>>>> +# Properties for iommufd objects.
>>>> +#
>>>> +# @fd: file descriptor name previously passed via 'getfd' command,
>>>> +#     which represents a pre-opened /dev/iommu. This allows the
>>>
>>> Two spaces between sentences for consistency, please.
>>
>> Presume you mean "  'data': { '*fd': 'str' } }" line, not above line.
>
>I'd like you to format the doc comment like this:
>
>
>     # @fd: file descriptor name previously passed via 'getfd' command,
>     #     which represents a pre-opened /dev/iommu.  This allows the
>
>Note the two spaces after the period in "/dev/iommu.  This allows".

Understood.

>
>>>
>>>> +#     iommufd object to be shared accross several subsystems
>>>> +#     (VFIO, VDPA, ...) and file descriptor to be shared with
>>>
>>> Comma before "and file".
>>>
>>> Either "the file descriptor", or "file descriptors".
>>>
>>>> +#     other process, e.g: DPDK.
>>>
>>> e.g.
>>>
>>> Alternatively "such as DPDK."
>>
>> Will fix.
>>
>>>
>>>> +#
>>>> +# Since: 8.2
>>>> +##
>>>> +{ 'struct': 'IOMMUFDProperties',
>>>> +        'data': { '*fd': 'str' } }
>>>
>>> @fd is optional.  How does the iommufd object behave when @fd is absent?
>>
>> If no fd is passed along with the iommufd object, the /dev/iommu
>> is opened by the qemu code. Let me know if this also needs to be documented.
>
>When something is optional, we pretty much always need to document
>behavior when it's absent.
>
>Ideally, behavior when absent is identical to behavior when present with
>a certain default value.  Then documenting the default value suffices.
>We commonly do this like (default: VALUE).

Got it.

Thanks
Zhenzhong
diff mbox series

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index 7f9912baa0..7aa57ab16f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2109,6 +2109,13 @@  F: hw/vfio/ap.c
 F: docs/system/s390x/vfio-ap.rst
 L: qemu-s390x@nongnu.org
 
+iommufd
+M: Yi Liu <yi.l.liu@intel.com>
+M: Eric Auger <eric.auger@redhat.com>
+S: Supported
+F: backends/iommufd.c
+F: include/sysemu/iommufd.h
+
 vhost
 M: Michael S. Tsirkin <mst@redhat.com>
 S: Supported
diff --git a/qapi/qom.json b/qapi/qom.json
index c53ef978ff..ef0b50f107 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -794,6 +794,22 @@ 
 { 'struct': 'VfioUserServerProperties',
   'data': { 'socket': 'SocketAddress', 'device': 'str' } }
 
+##
+# @IOMMUFDProperties:
+#
+# Properties for iommufd objects.
+#
+# @fd: file descriptor name previously passed via 'getfd' command,
+#     which represents a pre-opened /dev/iommu. This allows the
+#     iommufd object to be shared accross several subsystems
+#     (VFIO, VDPA, ...) and file descriptor to be shared with
+#     other process, e.g: DPDK.
+#
+# Since: 8.2
+##
+{ 'struct': 'IOMMUFDProperties',
+        'data': { '*fd': 'str' } }
+
 ##
 # @RngProperties:
 #
@@ -934,6 +950,8 @@ 
     'input-barrier',
     { 'name': 'input-linux',
       'if': 'CONFIG_LINUX' },
+    { 'name': 'iommufd',
+      'if': 'CONFIG_IOMMUFD' },
     'iothread',
     'main-loop',
     { 'name': 'memory-backend-epc',
@@ -1003,6 +1021,8 @@ 
       'input-barrier':              'InputBarrierProperties',
       'input-linux':                { 'type': 'InputLinuxProperties',
                                       'if': 'CONFIG_LINUX' },
+      'iommufd':                    { 'type': 'IOMMUFDProperties',
+                                      'if': 'CONFIG_IOMMUFD' },
       'iothread':                   'IothreadProperties',
       'main-loop':                  'MainLoopProperties',
       'memory-backend-epc':         { 'type': 'MemoryBackendEpcProperties',
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
new file mode 100644
index 0000000000..f0e5c7eeb8
--- /dev/null
+++ b/include/sysemu/iommufd.h
@@ -0,0 +1,46 @@ 
+#ifndef SYSEMU_IOMMUFD_H
+#define SYSEMU_IOMMUFD_H
+
+#include "qom/object.h"
+#include "qemu/thread.h"
+#include "exec/hwaddr.h"
+#include "exec/cpu-common.h"
+
+#define TYPE_IOMMUFD_BACKEND "iommufd"
+OBJECT_DECLARE_TYPE(IOMMUFDBackend, IOMMUFDBackendClass,
+                    IOMMUFD_BACKEND)
+#define IOMMUFD_BACKEND(obj) \
+    OBJECT_CHECK(IOMMUFDBackend, (obj), TYPE_IOMMUFD_BACKEND)
+#define IOMMUFD_BACKEND_GET_CLASS(obj) \
+    OBJECT_GET_CLASS(IOMMUFDBackendClass, (obj), TYPE_IOMMUFD_BACKEND)
+#define IOMMUFD_BACKEND_CLASS(klass) \
+    OBJECT_CLASS_CHECK(IOMMUFDBackendClass, (klass), TYPE_IOMMUFD_BACKEND)
+struct IOMMUFDBackendClass {
+    ObjectClass parent_class;
+};
+
+struct IOMMUFDBackend {
+    Object parent;
+
+    /*< protected >*/
+    int fd;            /* /dev/iommu file descriptor */
+    bool owned;        /* is the /dev/iommu opened internally */
+    QemuMutex lock;
+    uint32_t users;
+
+    /*< public >*/
+};
+
+int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp);
+void iommufd_backend_disconnect(IOMMUFDBackend *be);
+
+int iommufd_backend_get_ioas(IOMMUFDBackend *be, uint32_t *ioas_id);
+void iommufd_backend_put_ioas(IOMMUFDBackend *be, uint32_t ioas_id);
+void iommufd_backend_free_id(int fd, uint32_t id);
+int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
+                            ram_addr_t size, void *vaddr, bool readonly);
+int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
+                              hwaddr iova, ram_addr_t size);
+int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
+                               uint32_t pt_id, uint32_t *out_hwpt);
+#endif
diff --git a/backends/iommufd-stub.c b/backends/iommufd-stub.c
new file mode 100644
index 0000000000..02ac844c17
--- /dev/null
+++ b/backends/iommufd-stub.c
@@ -0,0 +1,59 @@ 
+/*
+ * iommufd container backend stub
+ *
+ * Copyright (C) 2023 Intel Corporation.
+ * Copyright Red Hat, Inc. 2023
+ *
+ * Authors: Yi Liu <yi.l.liu@intel.com>
+ *          Eric Auger <eric.auger@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/iommufd.h"
+#include "qemu/error-report.h"
+
+int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp)
+{
+    return 0;
+}
+void iommufd_backend_disconnect(IOMMUFDBackend *be)
+{
+}
+void iommufd_backend_free_id(int fd, uint32_t id)
+{
+}
+int iommufd_backend_get_ioas(IOMMUFDBackend *be, uint32_t *ioas_id)
+{
+    return 0;
+}
+void iommufd_backend_put_ioas(IOMMUFDBackend *be, uint32_t ioas_id)
+{
+}
+int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
+                            ram_addr_t size, void *vaddr, bool readonly)
+{
+    return 0;
+}
+int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
+                              hwaddr iova, ram_addr_t size)
+{
+    return 0;
+}
+int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
+                               uint32_t pt_id, uint32_t *out_hwpt)
+{
+    return 0;
+}
diff --git a/backends/iommufd.c b/backends/iommufd.c
new file mode 100644
index 0000000000..3f0ed37847
--- /dev/null
+++ b/backends/iommufd.c
@@ -0,0 +1,268 @@ 
+/*
+ * iommufd container backend
+ *
+ * Copyright (C) 2023 Intel Corporation.
+ * Copyright Red Hat, Inc. 2023
+ *
+ * Authors: Yi Liu <yi.l.liu@intel.com>
+ *          Eric Auger <eric.auger@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/iommufd.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qerror.h"
+#include "qemu/module.h"
+#include "qom/object_interfaces.h"
+#include "qemu/error-report.h"
+#include "monitor/monitor.h"
+#include "trace.h"
+#include <sys/ioctl.h>
+#include <linux/iommufd.h>
+
+static void iommufd_backend_init(Object *obj)
+{
+    IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
+
+    be->fd = -1;
+    be->users = 0;
+    be->owned = true;
+    qemu_mutex_init(&be->lock);
+}
+
+static void iommufd_backend_finalize(Object *obj)
+{
+    IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
+
+    if (be->owned) {
+        close(be->fd);
+        be->fd = -1;
+    }
+}
+
+static void iommufd_backend_set_fd(Object *obj, const char *str, Error **errp)
+{
+    IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
+    int fd = -1;
+
+    fd = monitor_fd_param(monitor_cur(), str, errp);
+    if (fd == -1) {
+        error_prepend(errp, "Could not parse remote object fd %s:", str);
+        return;
+    }
+    qemu_mutex_lock(&be->lock);
+    be->fd = fd;
+    be->owned = false;
+    qemu_mutex_unlock(&be->lock);
+    trace_iommu_backend_set_fd(be->fd);
+}
+
+static void iommufd_backend_class_init(ObjectClass *oc, void *data)
+{
+    object_class_property_add_str(oc, "fd", NULL, iommufd_backend_set_fd);
+}
+
+int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp)
+{
+    int fd, ret = 0;
+
+    qemu_mutex_lock(&be->lock);
+    if (be->users == UINT32_MAX) {
+        error_setg(errp, "too many connections");
+        ret = -E2BIG;
+        goto out;
+    }
+    if (be->owned && !be->users) {
+        fd = qemu_open_old("/dev/iommu", O_RDWR);
+        if (fd < 0) {
+            error_setg_errno(errp, errno, "/dev/iommu opening failed");
+            ret = fd;
+            goto out;
+        }
+        be->fd = fd;
+    }
+    be->users++;
+out:
+    trace_iommufd_backend_connect(be->fd, be->owned,
+                                  be->users, ret);
+    qemu_mutex_unlock(&be->lock);
+    return ret;
+}
+
+void iommufd_backend_disconnect(IOMMUFDBackend *be)
+{
+    qemu_mutex_lock(&be->lock);
+    if (!be->users) {
+        goto out;
+    }
+    be->users--;
+    if (!be->users && be->owned) {
+        close(be->fd);
+        be->fd = -1;
+    }
+out:
+    trace_iommufd_backend_disconnect(be->fd, be->users);
+    qemu_mutex_unlock(&be->lock);
+}
+
+static int iommufd_backend_alloc_ioas(int fd, uint32_t *ioas_id)
+{
+    int ret;
+    struct iommu_ioas_alloc alloc_data  = {
+        .size = sizeof(alloc_data),
+        .flags = 0,
+    };
+
+    ret = ioctl(fd, IOMMU_IOAS_ALLOC, &alloc_data);
+    if (ret) {
+        error_report("Failed to allocate ioas %m");
+    }
+
+    *ioas_id = alloc_data.out_ioas_id;
+    trace_iommufd_backend_alloc_ioas(fd, *ioas_id, ret);
+
+    return ret;
+}
+
+void iommufd_backend_free_id(int fd, uint32_t id)
+{
+    int ret;
+    struct iommu_destroy des = {
+        .size = sizeof(des),
+        .id = id,
+    };
+
+    ret = ioctl(fd, IOMMU_DESTROY, &des);
+    trace_iommufd_backend_free_id(fd, id, ret);
+    if (ret) {
+        error_report("Failed to free id: %u %m", id);
+    }
+}
+
+int iommufd_backend_get_ioas(IOMMUFDBackend *be, uint32_t *ioas_id)
+{
+    int ret;
+
+    ret = iommufd_backend_alloc_ioas(be->fd, ioas_id);
+    trace_iommufd_backend_get_ioas(be->fd, *ioas_id, ret);
+    return ret;
+}
+
+void iommufd_backend_put_ioas(IOMMUFDBackend *be, uint32_t ioas_id)
+{
+    iommufd_backend_free_id(be->fd, ioas_id);
+    trace_iommufd_backend_put_ioas(be->fd, ioas_id);
+}
+
+int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
+                            ram_addr_t size, void *vaddr, bool readonly)
+{
+    int ret;
+    struct iommu_ioas_map map = {
+        .size = sizeof(map),
+        .flags = IOMMU_IOAS_MAP_READABLE |
+                 IOMMU_IOAS_MAP_FIXED_IOVA,
+        .ioas_id = ioas_id,
+        .__reserved = 0,
+        .user_va = (uintptr_t)vaddr,
+        .iova = iova,
+        .length = size,
+    };
+
+    if (!readonly) {
+        map.flags |= IOMMU_IOAS_MAP_WRITEABLE;
+    }
+
+    ret = ioctl(be->fd, IOMMU_IOAS_MAP, &map);
+    trace_iommufd_backend_map_dma(be->fd, ioas_id, iova, size,
+                                  vaddr, readonly, ret);
+    if (ret) {
+        error_report("IOMMU_IOAS_MAP failed: %m");
+    }
+    return !ret ? 0 : -errno;
+}
+
+int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
+                              hwaddr iova, ram_addr_t size)
+{
+    int ret;
+    struct iommu_ioas_unmap unmap = {
+        .size = sizeof(unmap),
+        .ioas_id = ioas_id,
+        .iova = iova,
+        .length = size,
+    };
+
+    ret = ioctl(be->fd, IOMMU_IOAS_UNMAP, &unmap);
+    trace_iommufd_backend_unmap_dma(be->fd, ioas_id, iova, size, ret);
+    /*
+     * TODO: IOMMUFD doesn't support mapping PCI BARs for now.
+     * It's not a problem if there is no p2p dma, relax it here
+     * and avoid many noisy trigger from vIOMMU side.
+     */
+    if (ret && errno == ENOENT) {
+        ret = 0;
+    }
+    if (ret) {
+        error_report("IOMMU_IOAS_UNMAP failed: %m");
+    }
+    return !ret ? 0 : -errno;
+}
+
+int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
+                               uint32_t pt_id, uint32_t *out_hwpt)
+{
+    int ret;
+    struct iommu_hwpt_alloc alloc_hwpt = {
+        .size = sizeof(struct iommu_hwpt_alloc),
+        .flags = 0,
+        .dev_id = dev_id,
+        .pt_id = pt_id,
+        .__reserved = 0,
+    };
+
+    ret = ioctl(iommufd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
+    trace_iommufd_backend_alloc_hwpt(iommufd, dev_id, pt_id,
+                                     alloc_hwpt.out_hwpt_id, ret);
+
+    if (ret) {
+        error_report("IOMMU_HWPT_ALLOC failed: %m");
+    } else {
+        *out_hwpt = alloc_hwpt.out_hwpt_id;
+    }
+    return !ret ? 0 : -errno;
+}
+
+static const TypeInfo iommufd_backend_info = {
+    .name = TYPE_IOMMUFD_BACKEND,
+    .parent = TYPE_OBJECT,
+    .instance_size = sizeof(IOMMUFDBackend),
+    .instance_init = iommufd_backend_init,
+    .instance_finalize = iommufd_backend_finalize,
+    .class_size = sizeof(IOMMUFDBackendClass),
+    .class_init = iommufd_backend_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_USER_CREATABLE },
+        { }
+    }
+};
+
+static void register_types(void)
+{
+    type_register_static(&iommufd_backend_info);
+}
+
+type_init(register_types);
diff --git a/backends/Kconfig b/backends/Kconfig
index f35abc1609..2cb23f62fa 100644
--- a/backends/Kconfig
+++ b/backends/Kconfig
@@ -1 +1,5 @@ 
 source tpm/Kconfig
+
+config IOMMUFD
+    bool
+    depends on VFIO
diff --git a/backends/meson.build b/backends/meson.build
index 914c7c4afb..05ac57ff15 100644
--- a/backends/meson.build
+++ b/backends/meson.build
@@ -20,6 +20,11 @@  if have_vhost_user
   system_ss.add(when: 'CONFIG_VIRTIO', if_true: files('vhost-user.c'))
 endif
 system_ss.add(when: 'CONFIG_VIRTIO_CRYPTO', if_true: files('cryptodev-vhost.c'))
+if have_iommufd
+  system_ss.add(files('iommufd.c'))
+else
+  system_ss.add(files('iommufd-stub.c'))
+endif
 if have_vhost_user_crypto
   system_ss.add(when: 'CONFIG_VIRTIO_CRYPTO', if_true: files('cryptodev-vhost-user.c'))
 endif
diff --git a/backends/trace-events b/backends/trace-events
index 652eb76a57..e5f828bca2 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -5,3 +5,15 @@  dbus_vmstate_pre_save(void)
 dbus_vmstate_post_load(int version_id) "version_id: %d"
 dbus_vmstate_loading(const char *id) "id: %s"
 dbus_vmstate_saving(const char *id) "id: %s"
+
+# iommufd.c
+iommufd_backend_connect(int fd, bool owned, uint32_t users, int ret) "fd=%d owned=%d users=%d (%d)"
+iommufd_backend_disconnect(int fd, uint32_t users) "fd=%d users=%d"
+iommu_backend_set_fd(int fd) "pre-opened /dev/iommu fd=%d"
+iommufd_backend_get_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d ioas=%d (%d)"
+iommufd_backend_put_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
+iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, void *vaddr, bool readonly, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" addr=%p readonly=%d (%d)"
+iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
+iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d ioas=%d (%d)"
+iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
+iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u out_hwpt=%u (%d)"
diff --git a/qemu-options.hx b/qemu-options.hx
index e26230bac5..ddfaddf8ce 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -5210,6 +5210,19 @@  SRST
 
         The ``share`` boolean option is on by default with memfd.
 
+#ifdef CONFIG_IOMMUFD
+    ``-object iommufd,id=id[,fd=fd]``
+        Creates an iommufd backend which allows control of DMA mapping
+        through the /dev/iommu device.
+
+        The ``id`` parameter is a unique ID which frontends (such as
+        vfio-pci of vdpa) will use to connect with the iommufd backend.
+
+        The ``fd`` parameter is an optional pre-opened file descriptor
+        resulting from /dev/iommu opening. Usually the iommufd is shared
+        across all subsystems, bringing the benefit of centralized
+        reference counting.
+#endif
     ``-object rng-builtin,id=id``
         Creates a random number generator backend which obtains entropy
         from QEMU builtin functions. The ``id`` parameter is a unique ID