mbox series

[v1,0/4] Fix a bug in VGIC ITS tables' save/restore

Message ID 20241105193422.1094875-1-jingzhangos@google.com (mailing list archive)
Headers show
Series Fix a bug in VGIC ITS tables' save/restore | expand

Message

Jing Zhang Nov. 5, 2024, 7:34 p.m. UTC
This patch series addresses a critical issue in the VGIC ITS tables'
save/restore mechanism, accompanied by a comprehensive selftest for bug
reproduction and verification.

The identified bug manifests as a failure in VM suspend/resume operations.
The root cause lies in the repeated suspend attempts often required for
successful VM suspension, coupled with concurrent device interrupt registration
and freeing. This concurrency leads to inconsistencies in ITS mappings before
the save operation, potentially leaving orphaned Device Translation Entries
(DTEs) and Interrupt Translation Entries (ITEs) in the respective tables.

During the subsequent restore operation, encountering these orphaned entries
can result in two error scenarios:
* EINVAL Error: If an orphaned entry lacks a corresponding collection ID, the
  restore operation fails with an EINVAL error.
* Mapping Corruption: If an orphaned entry possesses a valid collection ID, the
  restore operation may succeed but with incorrect or lost mappings,
  compromising system integrity.

The provided selftest facilitates the reproduction of both error scenarios:
* EINVAL Reproduction: Execute ./vgic_its_tables without any options.
* Mapping Corruption Reproduction: Execute ./vgic_its_tables -s
  The -s option enforces identical collection IDs for all mappings.
* A workaround within the selftest involves clearing the tables before the save
  operation using the command ./vgic_its_tables -c. With this, we can run the
  the selftest successfully on host w/o the fix.

The core issue stems from the static linked list implementation of DTEs/ITEs,
requiring a full table scan to locate the list head during restoration. This
scan increases the likelihood of encountering orphaned entries.  To rectify
this, the patch series introduces a dummy head to the list, enabling immediate
access to the list head and bypassing the scan. This optimization not only
resolves the bug but also significantly enhances restore performance,
particularly in edge cases where valid entries reside at the end of the table.

Result from the test demonstrates a remarkable 1000x performance improvement in
such edge cases. For instance, with a single L2 device table (64KB) and 8192
mappings (one event per device at the table's end), the restore time is reduced
from 6 seconds to 6 milliseconds.

Importantly, these modifications maintain compatibility with the existing ITS
TABLE ABI REV0.
The table entry was a valid DTE/ITE, or an orphaned DTE/ITE, or an entry of 0.
The dummy entry added in this patch series presents a fourth kind, which is an
invalid entry w/ an offset field pointing to the first valid entry in the table.
The dummy head entry is always the first entry in the table if it exists.

An alternative solution, proposed in patch series [1], involves clearing
DTEs/ITEs during MAPD/DISCARD commands. While this approach requires fewer code
changes, it lacks the performance benefits offered by the dummy head solution
presented in this patch series.

---

* v1:
  - Based on v6.12-rc6

[1] https://lore.kernel.org/linux-arm-kernel/20240704142319.728-1-jiangkunkun@huawei.com

---

Jing Zhang (4):
  KVM: selftests: aarch64: Test VGIC ITS tables save/restore
  KVM: arm64: vgic-its: Add a dummy DTE/ITE if necessary in ITS tables
    save operation
  KVM: arm64: vgic-its: Return device/event id instead of offset in ITS
    tables restore
  KVM: arm64: vgic-its: Utilize the dummy entry in ITS tables restoring

 arch/arm64/kvm/vgic/vgic-its.c                | 154 +++--
 arch/arm64/kvm/vgic/vgic.h                    |   6 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/aarch64/vgic_its_tables.c   | 562 ++++++++++++++++++
 .../kvm/include/aarch64/gic_v3_its.h          |   3 +-
 .../testing/selftests/kvm/include/kvm_util.h  |   4 +-
 .../selftests/kvm/lib/aarch64/gic_v3_its.c    |  24 +-
 7 files changed, 713 insertions(+), 41 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/aarch64/vgic_its_tables.c


base-commit: 59b723cd2adbac2a34fc8e12c74ae26ae45bf230

Comments

Oliver Upton Nov. 5, 2024, 9:33 p.m. UTC | #1
Hi Jing,

On Tue, Nov 05, 2024 at 11:34:18AM -0800, Jing Zhang wrote:
> The core issue stems from the static linked list implementation of DTEs/ITEs,
> requiring a full table scan to locate the list head during restoration. This
> scan increases the likelihood of encountering orphaned entries.  To rectify
> this, the patch series introduces a dummy head to the list, enabling immediate
> access to the list head and bypassing the scan. This optimization not only
> resolves the bug but also significantly enhances restore performance,
> particularly in edge cases where valid entries reside at the end of the table.

I think we need a more targeted fix (i.e. Kunkun's patch) to stop the
bleeding + backport it to stable.

Then we can have a separate discussion about improving the save/restore
performance with your approach.
Jing Zhang Nov. 5, 2024, 9:56 p.m. UTC | #2
On Tue, Nov 5, 2024 at 1:33 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> Hi Jing,
>
> On Tue, Nov 05, 2024 at 11:34:18AM -0800, Jing Zhang wrote:
> > The core issue stems from the static linked list implementation of DTEs/ITEs,
> > requiring a full table scan to locate the list head during restoration. This
> > scan increases the likelihood of encountering orphaned entries.  To rectify
> > this, the patch series introduces a dummy head to the list, enabling immediate
> > access to the list head and bypassing the scan. This optimization not only
> > resolves the bug but also significantly enhances restore performance,
> > particularly in edge cases where valid entries reside at the end of the table.
>
> I think we need a more targeted fix (i.e. Kunkun's patch) to stop the
> bleeding + backport it to stable.
>
> Then we can have a separate discussion about improving the save/restore
> performance with your approach.

Yes, I'll respin Kunkun's patch soon. This patch series has the
selftest which we can use for verification.

>
> --
> Thanks,
> Oliver

Thanks,
Jing
Oliver Upton Nov. 5, 2024, 9:58 p.m. UTC | #3
On Tue, Nov 05, 2024 at 01:56:14PM -0800, Jing Zhang wrote:
> On Tue, Nov 5, 2024 at 1:33 PM Oliver Upton <oliver.upton@linux.dev> wrote:
> >
> > Hi Jing,
> >
> > On Tue, Nov 05, 2024 at 11:34:18AM -0800, Jing Zhang wrote:
> > > The core issue stems from the static linked list implementation of DTEs/ITEs,
> > > requiring a full table scan to locate the list head during restoration. This
> > > scan increases the likelihood of encountering orphaned entries.  To rectify
> > > this, the patch series introduces a dummy head to the list, enabling immediate
> > > access to the list head and bypassing the scan. This optimization not only
> > > resolves the bug but also significantly enhances restore performance,
> > > particularly in edge cases where valid entries reside at the end of the table.
> >
> > I think we need a more targeted fix (i.e. Kunkun's patch) to stop the
> > bleeding + backport it to stable.
> >
> > Then we can have a separate discussion about improving the save/restore
> > performance with your approach.
> 
> Yes, I'll respin Kunkun's patch soon. This patch series has the
> selftest which we can use for verification.

Right -- go ahead and include your selftest as part of that respin, I'd
definitely like to have some test coverage for this whole save/restore mess.