
[v6,1/5] docs/migration: add qatzip compression feature

Message ID 20240715221245.12435-2-yichen.wang@bytedance.com (mailing list archive)
State New, archived
Series Implement QATzip compression method

Commit Message

Yichen Wang July 15, 2024, 10:12 p.m. UTC
From: Yuan Liu <yuan1.liu@intel.com>

Add an introduction to the Intel QATzip compression method.

Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Yichen Wang <yichen.wang@bytedance.com>
---
 docs/devel/migration/features.rst           |   1 +
 docs/devel/migration/qatzip-compression.rst | 251 ++++++++++++++++++++
 2 files changed, 252 insertions(+)
 create mode 100644 docs/devel/migration/qatzip-compression.rst

Comments

Yuan Liu July 16, 2024, 2:34 a.m. UTC | #1
> -----Original Message-----
> From: Yichen Wang <yichen.wang@bytedance.com>
> Sent: Tuesday, July 16, 2024 6:13 AM
> To: Peter Xu <peterx@redhat.com>; Fabiano Rosas <farosas@suse.de>; Paolo
> Bonzini <pbonzini@redhat.com>; Daniel P. Berrangé <berrange@redhat.com>;
> Eduardo Habkost <eduardo@habkost.net>; Marc-André Lureau
> <marcandre.lureau@redhat.com>; Thomas Huth <thuth@redhat.com>; Philippe
> Mathieu-Daudé <philmd@linaro.org>; Eric Blake <eblake@redhat.com>; Markus
> Armbruster <armbru@redhat.com>; Laurent Vivier <lvivier@redhat.com>; qemu-
> devel@nongnu.org
> Cc: Hao Xiang <hao.xiang@linux.dev>; Liu, Yuan1 <yuan1.liu@intel.com>;
> Zou, Nanhai <nanhai.zou@intel.com>; Ho-Ren (Jack) Chuang
> <horenchuang@bytedance.com>; Wang, Yichen <yichen.wang@bytedance.com>
> Subject: [PATCH v6 1/5] docs/migration: add qatzip compression feature
> 
> From: Yuan Liu <yuan1.liu@intel.com>
> 
> add Intel QATzip compression method introduction
> 
> Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
> Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Reviewed-by: Yichen Wang <yichen.wang@bytedance.com>
> ---
>  docs/devel/migration/features.rst           |   1 +
>  docs/devel/migration/qatzip-compression.rst | 251 ++++++++++++++++++++
>  2 files changed, 252 insertions(+)
>  create mode 100644 docs/devel/migration/qatzip-compression.rst
> 
> diff --git a/docs/devel/migration/features.rst
> b/docs/devel/migration/features.rst
> index 58f8fd9e16..8f431d52f9 100644
> --- a/docs/devel/migration/features.rst
> +++ b/docs/devel/migration/features.rst
> @@ -14,3 +14,4 @@ Migration has plenty of features to support different
> use cases.
>     CPR
>     qpl-compression
>     uadk-compression
> +   qatzip-compression
> diff --git a/docs/devel/migration/qatzip-compression.rst
> b/docs/devel/migration/qatzip-compression.rst
> new file mode 100644
> index 0000000000..72fa3e2826
> --- /dev/null
> +++ b/docs/devel/migration/qatzip-compression.rst
> @@ -0,0 +1,251 @@
> +==================
> +QATzip Compression
> +==================
> +In scenarios with limited network bandwidth, the ``QATzip`` solution can
> help
> +users save a lot of host CPU resources by accelerating compression and
> +decompression through the Intel QuickAssist Technology(``QAT``) hardware.

Hi Yichen

Thanks for adding the Performance Testing with QATzip part. I wonder if we
can remove that part and directly add the following content instead.

Here, we use a typical limited-bandwidth example to illustrate the advantages
of QATzip. Users who are interested in QATzip still need to verify the
performance themselves.

+The following test was conducted using 8 multifd channels and 10Gbps network
+bandwidth. The results show that, compared to zstd, ``QATzip`` significantly
+saves CPU resources on the sender and reduces migration time. Compared to the
+uncompressed solution, ``QATzip`` greatly improves the dirty page processing
+capability, indicated by the Pages per Second metric, and also reduces the
+total migration time.
+
+::
+
+   VM Configuration: 16 vCPU and 64G memory
+   VM Workload: all vCPUs are idle and 54G memory is filled with Silesia data.
+   QAT Devices: 4
+   |-----------|--------|---------|----------|----------|------|------|
+   |8 Channels |Total   |down     |throughput|pages per | send | recv |
+   |           |time(ms)|time(ms) |(mbps)    |second    | cpu %| cpu% |
+   |-----------|--------|---------|----------|----------|------|------|
+   |qatzip     |   16630|       28|     10467|   2940235|   160|   360|
+   |-----------|--------|---------|----------|----------|------|------|
+   |zstd       |   20165|       24|      8579|   2391465|   810|   340|
+   |-----------|--------|---------|----------|----------|------|------|
+   |none       |   46063|       40|     10848|    330240|    45|    85|
+   |-----------|--------|---------|----------|----------|------|------|


> +``QATzip`` is a user space library which builds on top of the Intel
> QuickAssist
> +Technology user space library, to provide extended accelerated
> compression and
> +decompression services.
> +
> +For more ``QATzip`` introduction, please refer to `QATzip Introduction
> +<https://github.com/intel/QATzip?tab=readme-ov-file#introductionl>`_
> +
> +QATzip Compression Framework
> +============================
> +
> +::
> +
> +  +----------------+
> +  | MultiFd Thread |
> +  +-------+--------+
> +          |
> +          | compress/decompress
> +  +-------+--------+
> +  | QATzip library |
> +  +-------+--------+
> +          |
> +  +-------+--------+
> +  |  QAT library   |
> +  +-------+--------+
> +          |         user space
> +  --------+---------------------
> +          |         kernel space
> +   +------+-------+
> +   |  QAT  Driver |
> +   +------+-------+
> +          |
> +   +------+-------+
> +   | QAT Devices  |
> +   +--------------+
> +
> +
> +QATzip Installation
> +-------------------
> +
> +The ``QATzip`` installation package has been integrated into some Linux
> +distributions and can be installed directly. For example, the Ubuntu
> Server
> +24.04 LTS system can be installed using below command
> +
> +.. code-block:: shell
> +
> +   #apt search qatzip
> +   libqatzip-dev/noble 1.2.0-0ubuntu3 amd64
> +     Intel QuickAssist user space library development files
> +
> +   libqatzip3/noble 1.2.0-0ubuntu3 amd64
> +     Intel QuickAssist user space library
> +
> +   qatzip/noble,now 1.2.0-0ubuntu3 amd64 [installed]
> +     Compression user-space tool for Intel QuickAssist Technology
> +
> +   #sudo apt install libqatzip-dev libqatzip3 qatzip
> +
> +If your system does not support the ``QATzip`` installation package, you
> can
> +use the source code to build and install, please refer to `QATzip source
> code installation
> +<https://github.com/intel/QATzip?tab=readme-ov-file#build-intel-
> quickassist-technology-driver>`_
> +
> +QAT Hardware Deployment
> +-----------------------
> +
> +``QAT`` supports physical functions(PFs) and virtual functions(VFs) for
> +deployment, and users can configure ``QAT`` resources for migration
> according
> +to actual needs. For more details about ``QAT`` deployment, please refer
> to
> +`Intel QuickAssist Technology Documentation
> +<https://intel.github.io/quickassist/index.html>`_
> +
> +For more ``QAT`` hardware introduction, please refer to `intel-quick-
> assist-technology-overview
> +<https://www.intel.com/content/www/us/en/architecture-and-
> technology/intel-quick-assist-technology-overview.html>`_
> +
> +How To Use QATzip Compression
> +=============================
> +
> +1 - Install ``QATzip`` library
> +
> +2 - Build ``QEMU`` with ``--enable-qatzip`` parameter
> +
> +  E.g. configure --target-list=x86_64-softmmu --enable-kvm ``--enable-
> qatzip``
> +
> +3 - Set ``migrate_set_parameter multifd-compression qatzip``
> +
> +4 - Set ``migrate_set_parameter multifd-qatzip-level comp_level``, the
> default
> +comp_level value is 1, and it supports levels from 1 to 9
> +
> +
> +Performance Testing with QATzip
> +===============================
> +
> +Testing environment is being set as below:
> +
> +VM configuration:16 vCPU, 64G memory;
> +
> +VM Workload: all vCPUs are idle and 54G memory is filled with Silesia
> data;
> +
> +QAT Devices: 4;
> +
> +Sender migration parameters:
> +
> +.. code-block:: shell
> +
> +    migrate_set_capability multifd on
> +    migrate_set_parameter multifd-channels 2/4/8
> +    migrate_set_parameter max-bandwidth 1G/10G
> +    migrate_set_parameter multifd-compression qatzip/zstd
> +
> +Receiver migration parameters:
> +
> +.. code-block:: shell
> +
> +    migrate_set_capability multifd on
> +    migrate_set_parameter multifd-channels 2
> +    migrate_set_parameter multifd-compression qatzip/zstd
> +
> +max-bandwidth: 1 GBps (Gbytes/sec)
> +
> +.. code-block:: text
> +
> +    |-----------|--------|---------|----------|------|------|
> +    |2 Channels |Total   |down     |throughput| send | recv |
> +    |           |time(ms)|time(ms) |(mbps)    | cpu %| cpu% |
> +    |-----------|--------|---------|----------|------|------|
> +    |qatzip     |   21607|       77|      8051|    88|   125|
> +    |-----------|--------|---------|----------|------|------|
> +    |zstd       |   78351|       96|      2199|   204|    80|
> +    |-----------|--------|---------|----------|------|------|
> +
> +    |-----------|--------|---------|----------|------|------|
> +    |4 Channels |Total   |down     |throughput| send | recv |
> +    |           |time(ms)|time(ms) |(mbps)    | cpu %| cpu% |
> +    |-----------|--------|---------|----------|------|------|
> +    |qatzip     |   20336|       25|      8557|   110|   190|
> +    |-----------|--------|---------|----------|------|------|
> +    |zstd       |   39324|       31|      4389|   406|   160|
> +    |-----------|--------|---------|----------|------|------|
> +
> +    |-----------|--------|---------|----------|------|------|
> +    |8 Channels |Total   |down     |throughput| send | recv |
> +    |           |time(ms)|time(ms) |(mbps)    | cpu %| cpu% |
> +    |-----------|--------|---------|----------|------|------|
> +    |qatzip     |   20208|       22|      8613|   125|   300|
> +    |-----------|--------|---------|----------|------|------|
> +    |zstd       |   20515|       22|      8438|   800|   340|
> +    |-----------|--------|---------|----------|------|------|
> +
> +max-bandwidth: 10 GBps (Gbytes/sec)
> +
> +.. code-block:: text
> +
> +    |-----------|--------|---------|----------|------|------|
> +    |2 Channels |Total   |down     |throughput| send | recv |
> +    |           |time(ms)|time(ms) |(mbps)    | cpu %| cpu% |
> +    |-----------|--------|---------|----------|------|------|
> +    |qatzip     |   22450|       77|      7748|    80|   125|
> +    |-----------|--------|---------|----------|------|------|
> +    |zstd       |   78339|       76|      2199|   204|    80|
> +    |-----------|--------|---------|----------|------|------|
> +
> +    |-----------|--------|---------|----------|------|------|
> +    |4 Channels |Total   |down     |throughput| send | recv |
> +    |           |time(ms)|time(ms) |(mbps)    | cpu %| cpu% |
> +    |-----------|--------|---------|----------|------|------|
> +    |qatzip     |   13017|       24|     13401|   180|   285|
> +    |-----------|--------|---------|----------|------|------|
> +    |zstd       |   39466|       21|      4373|   406|   160|
> +    |-----------|--------|---------|----------|------|------|
> +
> +    |-----------|--------|---------|----------|------|------|
> +    |8 Channels |Total   |down     |throughput| send | recv |
> +    |           |time(ms)|time(ms) |(mbps)    | cpu %| cpu% |
> +    |-----------|--------|---------|----------|------|------|
> +    |qatzip     |   10255|       22|     17037|   280|   590|
> +    |-----------|--------|---------|----------|------|------|
> +    |zstd       |   20126|       77|      8595|   810|   340|
> +    |-----------|--------|---------|----------|------|------|
> +
> +max-bandwidth: 1.25 GBps (Gbytes/sec)
> +
> +.. code-block:: text
> +
> +    |-----------|--------|---------|----------|----------|------|------|
> +    |8 Channels |Total   |down     |throughput|pages per | send | recv |
> +    |           |time(ms)|time(ms) |(mbps)    |second    | cpu %| cpu% |
> +    |-----------|--------|---------|----------|----------|------|------|
> +    |qatzip     |   16630|       28|     10467|   2940235|   160|   360|
> +    |-----------|--------|---------|----------|----------|------|------|
> +    |zstd       |   20165|       24|      8579|   2391465|   810|   340|
> +    |-----------|--------|---------|----------|----------|------|------|
> +    |none       |   46063|       40|     10848|    330240|    45|    85|
> +    |-----------|--------|---------|----------|----------|------|------|
> +
> +If the user has enabled compression in live migration, using QAT can save
> the
> +host CPU resources.
> +
> +When compression is enabled, the bottleneck of migration is usually the
> +compression throughput on the sender side, since CPU decompression
> throughput
> +is higher than compression, some reference data
> +https://github.com/inikep/lzbench, so more CPU resources need to be
> allocated
> +to the sender side.
> +
> +Summary:
> +
> +1. In the 1GBps case, QAT only uses 88% CPU utilization to reach 1GBps,
> but
> +   ZSTD needs 800%.
> +
> +2. In the 10Gbps case, QAT uses 180% CPU utilization to reach 10GBps. but
> ZSTD
> +   still cannot reach 10Gbps even if it uses 810%.
> +
> +3. The QAT decompression CPU utilization is higher than compression and
> ZSTD,
> +   because:
> +
> +   a. When using QAT compression, the data needs to be copied to the QAT
> memory
> +   (for DMA operations), and the same for decompression. However,
> +   do_user_addr_fault will be triggered during decompression because the
> QAT
> +   decompressed data is copied to the VM address space for the first
> time, in
> +   addition, both compression and decompression are processed by QAT and
> do not
> +   consume CPU resources, so the CPU utilization of the receiver is
> slightly
> +   higher than the sender.
> +
> +   b. Since zstd decompression decompresses data directly into the VM
> address
> +   space, there is one less memory copy than QAT, so the CPU utilization
> on the
> +   receiver is better than QAT. For the 1GBps case, the receiver CPU
> +   utilization is 125%, and the memory copy occupies ~80% of CPU
> utilization.
> +
> +How To Choose Between QATzip and QPL
> +====================================
> +Starting from Intel 4th Gen Intel Xeon Scalable processors, codenamed
> Sapphire
> +Rapids processor(``SPR``), it supports multiple build-in accelerators
> including
> +``QAT`` and ``IAA``, the former can accelerate ``QATzip``, and the latter
> is
> +used to accelerate ``QPL``.
> +
> +Here are some suggestions:
> +
> +1 - If your live migration scenario is limited network bandwidth and
> ``QAT``
> +hardware resources exceed ``IAA``, then use the ``QATzip`` method, which
> +can save a lot of host CPU resources for compression.
> +
> +2 - If your system cannot support shared virtual memory(SVM) technology,
> please
> +use ``QATzip`` method because ``QPL`` performance is not good without SVM
> +support.
> +
> +3 - For other scenarios, please use the ``QPL`` method first.
> --
> Yichen Wang
Peter Xu July 16, 2024, 2:50 p.m. UTC | #2
On Tue, Jul 16, 2024 at 02:34:07AM +0000, Liu, Yuan1 wrote:
> > -----Original Message-----
> > From: Yichen Wang <yichen.wang@bytedance.com>
> > Sent: Tuesday, July 16, 2024 6:13 AM
> > To: Peter Xu <peterx@redhat.com>; Fabiano Rosas <farosas@suse.de>; Paolo
> > Bonzini <pbonzini@redhat.com>; Daniel P. Berrangé <berrange@redhat.com>;
> > Eduardo Habkost <eduardo@habkost.net>; Marc-André Lureau
> > <marcandre.lureau@redhat.com>; Thomas Huth <thuth@redhat.com>; Philippe
> > Mathieu-Daudé <philmd@linaro.org>; Eric Blake <eblake@redhat.com>; Markus
> > Armbruster <armbru@redhat.com>; Laurent Vivier <lvivier@redhat.com>; qemu-
> > devel@nongnu.org
> > Cc: Hao Xiang <hao.xiang@linux.dev>; Liu, Yuan1 <yuan1.liu@intel.com>;
> > Zou, Nanhai <nanhai.zou@intel.com>; Ho-Ren (Jack) Chuang
> > <horenchuang@bytedance.com>; Wang, Yichen <yichen.wang@bytedance.com>
> > Subject: [PATCH v6 1/5] docs/migration: add qatzip compression feature
> > 
> > From: Yuan Liu <yuan1.liu@intel.com>
> > 
> > add Intel QATzip compression method introduction
> > 
> > Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
> > Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
> > Reviewed-by: Peter Xu <peterx@redhat.com>
> > Reviewed-by: Yichen Wang <yichen.wang@bytedance.com>
> > ---
> >  docs/devel/migration/features.rst           |   1 +
> >  docs/devel/migration/qatzip-compression.rst | 251 ++++++++++++++++++++
> >  2 files changed, 252 insertions(+)
> >  create mode 100644 docs/devel/migration/qatzip-compression.rst
> > 
> > diff --git a/docs/devel/migration/features.rst
> > b/docs/devel/migration/features.rst
> > index 58f8fd9e16..8f431d52f9 100644
> > --- a/docs/devel/migration/features.rst
> > +++ b/docs/devel/migration/features.rst
> > @@ -14,3 +14,4 @@ Migration has plenty of features to support different
> > use cases.
> >     CPR
> >     qpl-compression
> >     uadk-compression
> > +   qatzip-compression
> > diff --git a/docs/devel/migration/qatzip-compression.rst
> > b/docs/devel/migration/qatzip-compression.rst
> > new file mode 100644
> > index 0000000000..72fa3e2826
> > --- /dev/null
> > +++ b/docs/devel/migration/qatzip-compression.rst
> > @@ -0,0 +1,251 @@
> > +==================
> > +QATzip Compression
> > +==================
> > +In scenarios with limited network bandwidth, the ``QATzip`` solution can
> > help
> > +users save a lot of host CPU resources by accelerating compression and
> > +decompression through the Intel QuickAssist Technology(``QAT``) hardware.
> 
> Hi Yichen
> 
> Thanks for adding the part of Performance Testing with QATzip, I wonder if we
> can remove Performance Testing with QATzip part and directly add the following
> content. 
> 
> Here, we use a typical example of limited bandwidth to illustrate the advantages
> of QATzip. If the user is interested in qatzip, he still needs to verify the performance
> by himself.
> 
> +The following test was conducted using 8 multifd channels and 10Gbps network
> +bandwidth. The results show that, compared to zstd, ``QATzip`` significantly
> +saves CPU resources on the sender and reduces migration time. Compared to the
> +uncompressed solution, ``QATzip`` greatly improves the dirty page processing
> +capability, indicated by the Pages per Second metric, and also reduces the
> +total migration time.
> +
> +::
> +
> +   VM Configuration: 16 vCPU and 64G memory
> +   VM Workload: all vCPUs are idle and 54G memory is filled with Silesia data.
> +   QAT Devices: 4
> +   |-----------|--------|---------|----------|----------|------|------|
> +   |8 Channels |Total   |down     |throughput|pages per | send | recv |
> +   |           |time(ms)|time(ms) |(mbps)    |second    | cpu %| cpu% |
> +   |-----------|--------|---------|----------|----------|------|------|
> +   |qatzip     |   16630|       28|     10467|   2940235|   160|   360|
> +   |-----------|--------|---------|----------|----------|------|------|
> +   |zstd       |   20165|       24|      8579|   2391465|   810|   340|
> +   |-----------|--------|---------|----------|----------|------|------|
> +   |none       |   46063|       40|     10848|    330240|    45|    85|
> +   |-----------|--------|---------|----------|----------|------|------|

Yes, this looks much simpler and better.  The 10GBps test isn't that useful
anyway, especially with the no-compression numbers absent.  I didn't mention
it when looking previously, but it would be better to clarify the numbers.

Yuan, thanks so much for reviewing all the relevant patches.  It's very
helpful to us.

Patch

diff --git a/docs/devel/migration/features.rst b/docs/devel/migration/features.rst
index 58f8fd9e16..8f431d52f9 100644
--- a/docs/devel/migration/features.rst
+++ b/docs/devel/migration/features.rst
@@ -14,3 +14,4 @@  Migration has plenty of features to support different use cases.
    CPR
    qpl-compression
    uadk-compression
+   qatzip-compression
diff --git a/docs/devel/migration/qatzip-compression.rst b/docs/devel/migration/qatzip-compression.rst
new file mode 100644
index 0000000000..72fa3e2826
--- /dev/null
+++ b/docs/devel/migration/qatzip-compression.rst
@@ -0,0 +1,251 @@ 
+==================
+QATzip Compression
+==================
+In scenarios with limited network bandwidth, the ``QATzip`` solution can help
+users save significant host CPU resources by accelerating compression and
+decompression through the Intel QuickAssist Technology (``QAT``) hardware.
+
+``QATzip`` is a user space library built on top of the Intel QuickAssist
+Technology user space library to provide extended accelerated compression and
+decompression services.
+
+For an introduction to ``QATzip``, please refer to `QATzip Introduction
+<https://github.com/intel/QATzip?tab=readme-ov-file#introductionl>`_
+
+QATzip Compression Framework
+============================
+
+::
+
+  +----------------+
+  | MultiFd Thread |
+  +-------+--------+
+          |
+          | compress/decompress
+  +-------+--------+
+  | QATzip library |
+  +-------+--------+
+          |
+  +-------+--------+
+  |  QAT library   |
+  +-------+--------+
+          |         user space
+  --------+---------------------
+          |         kernel space
+   +------+-------+
+   |  QAT  Driver |
+   +------+-------+
+          |
+   +------+-------+
+   | QAT Devices  |
+   +--------------+
+
+
+QATzip Installation
+-------------------
+
+The ``QATzip`` installation package has been integrated into some Linux
+distributions and can be installed directly. For example, on Ubuntu Server
+24.04 LTS it can be installed with the commands below:
+
+.. code-block:: shell
+
+   #apt search qatzip
+   libqatzip-dev/noble 1.2.0-0ubuntu3 amd64
+     Intel QuickAssist user space library development files
+
+   libqatzip3/noble 1.2.0-0ubuntu3 amd64
+     Intel QuickAssist user space library
+
+   qatzip/noble,now 1.2.0-0ubuntu3 amd64 [installed]
+     Compression user-space tool for Intel QuickAssist Technology
+
+   #sudo apt install libqatzip-dev libqatzip3 qatzip
+
+If your distribution does not provide a ``QATzip`` package, you can build and
+install it from source; please refer to `QATzip source code installation
+<https://github.com/intel/QATzip?tab=readme-ov-file#build-intel-quickassist-technology-driver>`_
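+
+A typical from-source build is sketched below. It assumes the Intel QAT driver
+is already installed and that ``ICP_ROOT`` points to its source tree; the exact
+configure options can vary between releases, so treat the linked guide as
+authoritative.
+
+.. code-block:: shell
+
+   git clone https://github.com/intel/QATzip.git
+   cd QATzip
+   ./autogen.sh
+   ./configure --with-ICP_ROOT=$ICP_ROOT
+   make -j
+   sudo make install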
+
+QAT Hardware Deployment
+-----------------------
+
+``QAT`` supports physical functions (PFs) and virtual functions (VFs) for
+deployment, and users can configure ``QAT`` resources for migration according
+to their actual needs. For more details about ``QAT`` deployment, please refer to
+`Intel QuickAssist Technology Documentation
+<https://intel.github.io/quickassist/index.html>`_
+
+For more information about the ``QAT`` hardware, please refer to `intel-quick-assist-technology-overview
+<https://www.intel.com/content/www/us/en/architecture-and-technology/intel-quick-assist-technology-overview.html>`_
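+
+As an illustration only (the PCI address and VF count below are placeholders),
+VFs can be enabled through the standard SR-IOV sysfs interface, and the
+out-of-tree QAT driver package additionally provides ``adf_ctl`` to inspect
+device state:
+
+.. code-block:: shell
+
+   # Enable 16 VFs on a QAT PF; replace 0000:6b:00.0 with your PF address
+   echo 16 > /sys/bus/pci/devices/0000:6b:00.0/sriov_numvfs
+
+   # With the out-of-tree QAT driver installed, list devices and their state
+   adf_ctl status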
+
+How To Use QATzip Compression
+=============================
+
+1 - Install the ``QATzip`` library
+
+2 - Build ``QEMU`` with the ``--enable-qatzip`` parameter
+
+  E.g. configure --target-list=x86_64-softmmu --enable-kvm ``--enable-qatzip``
+
+3 - Set ``migrate_set_parameter multifd-compression qatzip``
+
+4 - Set ``migrate_set_parameter multifd-qatzip-level comp_level``; the default
+comp_level value is 1, and levels 1 to 9 are supported. A complete command
+sequence is sketched below.
+
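+The following minimal end-to-end example (illustrative only; the TCP port and
+destination address are placeholders) shows the HMP commands for a migration
+with 8 multifd channels:
+
+.. code-block:: shell
+
+   # Destination QEMU, started with "-incoming tcp:0:4444"
+   migrate_set_capability multifd on
+   migrate_set_parameter multifd-channels 8
+   migrate_set_parameter multifd-compression qatzip
+
+   # Source QEMU
+   migrate_set_capability multifd on
+   migrate_set_parameter multifd-channels 8
+   migrate_set_parameter multifd-compression qatzip
+   migrate_set_parameter multifd-qatzip-level 1
+   migrate -d tcp:<destination-ip>:4444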
+
+Performance Testing with QATzip
+===============================
+
+The testing environment is set up as follows:
+
+VM configuration: 16 vCPU, 64G memory;
+
+VM Workload: all vCPUs are idle and 54G memory is filled with Silesia data;
+
+QAT Devices: 4;
+
+Sender migration parameters:
+
+.. code-block:: shell
+
+    migrate_set_capability multifd on
+    migrate_set_parameter multifd-channels 2/4/8
+    migrate_set_parameter max-bandwidth 1G/10G
+    migrate_set_parameter multifd-compression qatzip/zstd
+
+Receiver migration parameters:
+
+.. code-block:: shell
+
+    migrate_set_capability multifd on
+    migrate_set_parameter multifd-channels 2
+    migrate_set_parameter multifd-compression qatzip/zstd
+
+max-bandwidth: 1 GBps (Gbytes/sec)
+
+.. code-block:: text
+
+    |-----------|--------|---------|----------|------|------|
+    |2 Channels |Total   |down     |throughput| send | recv |
+    |           |time(ms)|time(ms) |(mbps)    | cpu %| cpu% |
+    |-----------|--------|---------|----------|------|------|
+    |qatzip     |   21607|       77|      8051|    88|   125|
+    |-----------|--------|---------|----------|------|------|
+    |zstd       |   78351|       96|      2199|   204|    80|
+    |-----------|--------|---------|----------|------|------|
+
+    |-----------|--------|---------|----------|------|------|
+    |4 Channels |Total   |down     |throughput| send | recv |
+    |           |time(ms)|time(ms) |(mbps)    | cpu %| cpu% |
+    |-----------|--------|---------|----------|------|------|
+    |qatzip     |   20336|       25|      8557|   110|   190|
+    |-----------|--------|---------|----------|------|------|
+    |zstd       |   39324|       31|      4389|   406|   160|
+    |-----------|--------|---------|----------|------|------|
+
+    |-----------|--------|---------|----------|------|------|
+    |8 Channels |Total   |down     |throughput| send | recv |
+    |           |time(ms)|time(ms) |(mbps)    | cpu %| cpu% |
+    |-----------|--------|---------|----------|------|------|
+    |qatzip     |   20208|       22|      8613|   125|   300|
+    |-----------|--------|---------|----------|------|------|
+    |zstd       |   20515|       22|      8438|   800|   340|
+    |-----------|--------|---------|----------|------|------|
+
+max-bandwidth: 10 GBps (Gbytes/sec)
+
+.. code-block:: text
+
+    |-----------|--------|---------|----------|------|------|
+    |2 Channels |Total   |down     |throughput| send | recv |
+    |           |time(ms)|time(ms) |(mbps)    | cpu %| cpu% |
+    |-----------|--------|---------|----------|------|------|
+    |qatzip     |   22450|       77|      7748|    80|   125|
+    |-----------|--------|---------|----------|------|------|
+    |zstd       |   78339|       76|      2199|   204|    80|
+    |-----------|--------|---------|----------|------|------|
+
+    |-----------|--------|---------|----------|------|------|
+    |4 Channels |Total   |down     |throughput| send | recv |
+    |           |time(ms)|time(ms) |(mbps)    | cpu %| cpu% |
+    |-----------|--------|---------|----------|------|------|
+    |qatzip     |   13017|       24|     13401|   180|   285|
+    |-----------|--------|---------|----------|------|------|
+    |zstd       |   39466|       21|      4373|   406|   160|
+    |-----------|--------|---------|----------|------|------|
+
+    |-----------|--------|---------|----------|------|------|
+    |8 Channels |Total   |down     |throughput| send | recv |
+    |           |time(ms)|time(ms) |(mbps)    | cpu %| cpu% |
+    |-----------|--------|---------|----------|------|------|
+    |qatzip     |   10255|       22|     17037|   280|   590|
+    |-----------|--------|---------|----------|------|------|
+    |zstd       |   20126|       77|      8595|   810|   340|
+    |-----------|--------|---------|----------|------|------|
+
+max-bandwidth: 1.25 GBps (Gbytes/sec)
+
+.. code-block:: text
+
+    |-----------|--------|---------|----------|----------|------|------|
+    |8 Channels |Total   |down     |throughput|pages per | send | recv |
+    |           |time(ms)|time(ms) |(mbps)    |second    | cpu %| cpu% |
+    |-----------|--------|---------|----------|----------|------|------|
+    |qatzip     |   16630|       28|     10467|   2940235|   160|   360|
+    |-----------|--------|---------|----------|----------|------|------|
+    |zstd       |   20165|       24|      8579|   2391465|   810|   340|
+    |-----------|--------|---------|----------|----------|------|------|
+    |none       |   46063|       40|     10848|    330240|    45|    85|
+    |-----------|--------|---------|----------|----------|------|------|
+
+If compression is enabled for live migration, using QAT can save host CPU
+resources.
+
+When compression is enabled, the migration bottleneck is usually the
+compression throughput on the sender side, because CPU decompression
+throughput is higher than compression throughput (see the reference data at
+https://github.com/inikep/lzbench), so more CPU resources need to be
+allocated to the sender side.
+
+Summary:
+
+1. In the 1 GBps case, QAT uses only 88% CPU utilization to reach 1 GBps,
+   while ZSTD needs 800%.
+
+2. In the 10 GBps case, QAT uses 180% CPU utilization to exceed 10 Gbps of
+   throughput, but ZSTD still cannot reach 10 Gbps even though it uses 810%.
+
+3. The CPU utilization of QAT decompression is higher than that of QAT
+   compression and of ZSTD decompression, because:
+
+   a. When using QAT, the data must be copied into QAT memory (for DMA
+   operations) for both compression and decompression. On top of that,
+   do_user_addr_fault is triggered during decompression because the
+   decompressed data is copied into the VM address space for the first time.
+   Since the compression and decompression work itself is offloaded to QAT and
+   does not consume CPU resources, these copies and faults dominate, so the CPU
+   utilization of the receiver is slightly higher than that of the sender.
+
+   b. Since zstd decompresses data directly into the VM address space, it needs
+   one less memory copy than QAT, so its receiver CPU utilization is lower than
+   QAT's. In the 1 GBps case, the QAT receiver CPU utilization is 125%, and the
+   memory copy accounts for ~80% of that utilization.
+
+How To Choose Between QATzip and QPL
+====================================
+Starting with the 4th Gen Intel Xeon Scalable processors, codenamed Sapphire
+Rapids (``SPR``), multiple built-in accelerators are available, including
+``QAT`` and ``IAA``; the former accelerates ``QATzip``, and the latter is
+used to accelerate ``QPL``.
+
+Here are some suggestions:
+
+1 - If your live migration scenario is limited by network bandwidth and your
+``QAT`` hardware resources exceed your ``IAA`` resources, use the ``QATzip``
+method, which can save significant host CPU resources on compression.
+
+2 - If your system does not support shared virtual memory (SVM) technology,
+use the ``QATzip`` method, because ``QPL`` performance suffers without SVM
+support.
+
+3 - For other scenarios, prefer the ``QPL`` method.