[GIT,PULL] KVM/arm64 fixes for 6.3, part #2

Message ID ZBPZ4D9MIsaCNDh6@thinky-boi (mailing list archive)
State New, archived

Pull-request

git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git tags/kvmarm-fixes-6.3-2

Message

Oliver Upton March 17, 2023, 3:09 a.m. UTC
Hi Paolo,

Another week, another set of fixes for KVM/arm64.

Description can be found in the tag, but the teardown race when walking
host page tables is particularly nasty and currently causing problems
for folks. The fix is quite simple: disable interrupts when walking
host page tables, as the thread must be IPI'ed before the table memory
can actually be freed.
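
To make the reasoning above concrete, here is a minimal single-threaded
model of the rule "the freeing side must IPI the walker before the table
memory goes away". It is only a sketch: every name in it (toy_cpu,
toy_table, toy_try_free, toy_walk) is hypothetical and none of it is
kernel code. The racing teardown is simulated in the middle of the walk
instead of running on another CPU.

```c
#include <stdbool.h>

/* Hypothetical model of the walker CPU; not a kernel structure. */
struct toy_cpu {
    bool irqs_masked;   /* true while the walker has interrupts disabled */
    bool ipi_pending;   /* an IPI arrived while masked and is still queued */
};

/* Hypothetical stand-in for a page table page. */
struct toy_table {
    bool freed;
    int  entry;
};

static void toy_irq_disable(struct toy_cpu *cpu)
{
    cpu->irqs_masked = true;
}

static void toy_irq_enable(struct toy_cpu *cpu)
{
    cpu->irqs_masked = false;
    cpu->ipi_pending = false;   /* any queued IPI is delivered now */
}

/*
 * Freeing side: the walker must take an IPI before the table can be freed.
 * Returns true if the table was freed, false if the IPI is still pending
 * (real code would wait here instead of returning).
 */
static bool toy_try_free(struct toy_cpu *walker, struct toy_table *t)
{
    if (walker->irqs_masked) {
        walker->ipi_pending = true;
        return false;
    }
    t->freed = true;
    return true;
}

/*
 * Walker side: read one entry with interrupts masked. A teardown attempt
 * is injected mid-walk to show that it cannot complete. Returns the entry,
 * or -1 if the table had already been freed.
 */
static int toy_walk(struct toy_cpu *cpu, struct toy_table *t, bool *free_blocked)
{
    int value;

    toy_irq_disable(cpu);
    *free_blocked = !toy_try_free(cpu, t);  /* simulated racing teardown */
    value = t->freed ? -1 : t->entry;
    toy_irq_enable(cpu);

    return value;
}
```

In this model the injected teardown is held off for the whole walk, and a
later toy_try_free() succeeds once the walker has re-enabled interrupts.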

Please pull,

Oliver

The following changes since commit 47053904e18282af4525a02e3e0f519f014fc7f9:

  KVM: arm64: timers: Convert per-vcpu virtual offset to a global value (2023-03-11 02:00:40 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git tags/kvmarm-fixes-6.3-2

for you to fetch changes up to 8c2e8ac8ad4be68409e806ce1cc78fc7a04539f3:

  KVM: arm64: Check for kvm_vma_mte_allowed in the critical section (2023-03-16 23:42:56 +0000)

----------------------------------------------------------------
KVM/arm64 fixes for 6.3, part #2

Fixes for a rather interesting set of bugs relating to the MMU:

 - Read the MMU notifier seq before dropping the mmap lock to guard
   against reading a potentially stale VMA

 - Disable interrupts when walking user page tables to protect against
   the page table being freed

 - Read the MTE permissions for the VMA within the mmap lock critical
   section, avoiding the use of a potentially stale VMA pointer
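
The first and third items above share one pattern: snapshot a sequence
count and everything needed from the VMA while the lock is held, then
retry if the count has moved by the time the result is used. A minimal
userspace sketch of that pattern follows. All names (toy_mm, toy_vma,
fault_snapshot, handle_fault) are hypothetical, the locking is only
marked in comments, and the invalidation is injected by hand; this is
not the code from the series.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are kernel types. */
struct toy_vma {
    unsigned long start, end;
    bool mte_allowed;
};

struct toy_mm {
    unsigned long invalidate_seq;   /* bumped on every invalidation */
    struct toy_vma vma;
};

struct fault_snapshot {
    unsigned long seq;
    bool mte_allowed;
};

/* Copy everything derived from the VMA while the "lock" is held. */
static struct fault_snapshot snapshot_under_lock(const struct toy_mm *mm)
{
    struct fault_snapshot s;

    /* the lock would be taken here */
    s.seq = mm->invalidate_seq;
    s.mte_allowed = mm->vma.mte_allowed;
    /* the lock would be dropped here; mm->vma is off limits afterwards */

    return s;
}

/* Simulated invalidation: bump the count and change the VMA. */
static void invalidate(struct toy_mm *mm, bool new_mte)
{
    mm->invalidate_seq++;
    mm->vma.mte_allowed = new_mte;
}

static bool snapshot_is_stale(const struct toy_mm *mm,
                              const struct fault_snapshot *s)
{
    return mm->invalidate_seq != s->seq;
}

/*
 * Handle one fault, retrying while the snapshot is stale.
 * races_to_inject simulates that many invalidations landing between the
 * snapshot and its use. Returns the number of attempts taken.
 */
static int handle_fault(struct toy_mm *mm, int races_to_inject, bool *mapped_mte)
{
    int attempts = 0;

    for (;;) {
        struct fault_snapshot s = snapshot_under_lock(mm);

        attempts++;
        if (races_to_inject > 0) {
            races_to_inject--;
            invalidate(mm, !mm->vma.mte_allowed);
        }
        if (snapshot_is_stale(mm, &s))
            continue;   /* the VMA may have changed; start over */

        *mapped_mte = s.mte_allowed;    /* use only the snapshot */
        return attempts;
    }
}
```

The point of the design is that nothing reads the VMA after the lock is
dropped: the decision uses only values copied inside the critical
section, and the sequence check rejects those copies if they went stale.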

Additionally, some fixes targeting the vPMU:

 - Return the sum of the current perf event value and PMC snapshot for
   reads from userspace

 - Don't save the value of guest writes to PMCR_EL0.{C,P}, which could
   otherwise lead to userspace erroneously resetting the vPMU during VM
   save/restore
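
A small sketch of the two vPMU rules above, assuming a simplified
counter model. The struct and function names are hypothetical and
counter-width masking is left out; only the PMCR_EL0 bit positions
(E at bit 0, P at bit 1, C at bit 2) are taken from the architecture.

```c
#include <stdbool.h>
#include <stdint.h>

/* PMCR_EL0 bit positions as defined by the Arm architecture. */
#define PMCR_E (UINT64_C(1) << 0)   /* enable */
#define PMCR_P (UINT64_C(1) << 1)   /* event counter reset */
#define PMCR_C (UINT64_C(1) << 2)   /* cycle counter reset */

/* Hypothetical model of one virtual counter. */
struct toy_pmc {
    uint64_t saved;         /* snapshot held in the vCPU register file */
    uint64_t perf_count;    /* count accumulated by the backing perf event */
    bool     perf_active;   /* whether a perf event currently backs the counter */
};

/* Value reported to a reader: snapshot plus whatever perf has counted. */
static uint64_t pmc_read(const struct toy_pmc *pmc)
{
    uint64_t value = pmc->saved;

    if (pmc->perf_active)
        value += pmc->perf_count;

    return value;
}

/*
 * Value to retain from a guest write to PMCR_EL0: C and P request a
 * one-off reset, so they are acted on but not stored.
 */
static uint64_t pmcr_value_to_save(uint64_t guest_write)
{
    return guest_write & ~(PMCR_C | PMCR_P);
}
```

If C or P were stored, a later save and restore of the register by
userspace would replay the reset request, which is the failure mode the
second item describes.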

----------------------------------------------------------------
David Matlack (1):
      KVM: arm64: Retry fault if vma_lookup() results become invalid

Marc Zyngier (2):
      KVM: arm64: Disable interrupts while walking userspace PTs
      KVM: arm64: Check for kvm_vma_mte_allowed in the critical section

Reiji Watanabe (2):
      KVM: arm64: PMU: Fix GET_ONE_REG for vPMC regs to return the current value
      KVM: arm64: PMU: Don't save PMCR_EL0.{C,P} for the vCPU

 arch/arm64/kvm/mmu.c      | 99 ++++++++++++++++++++++++++++++-----------------
 arch/arm64/kvm/pmu-emul.c |  3 +-
 arch/arm64/kvm/sys_regs.c | 21 +++++++++-
 3 files changed, 85 insertions(+), 38 deletions(-)

Comments

Oliver Upton March 24, 2023, 6:16 p.m. UTC | #1
Paolo,

Pinging this PR in case you missed it. The issues around host page table
walks are particularly urgent, as the race on host page table teardown has
been reproduced on some setups.

--
Thanks,
Oliver

Marc Zyngier March 27, 2023, 11:39 a.m. UTC | #2
On Fri, 24 Mar 2023 18:16:30 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> Paolo,
> 
> Pinging this PR in case you missed it, the issues around host page table walks
> are particularly urgent as the race on host page table teardown has been
> reproduced on some setups.

If we don't hear from Paolo shortly, I suggest we route this via the
arm64 tree.

Thanks,

	M.

Paolo Bonzini March 27, 2023, 1:59 p.m. UTC | #3
On 3/27/23 13:39, Marc Zyngier wrote:
>> Paolo,
>>
>> Pinging this PR in case you missed it, the issues around host page table walks
>> are particularly urgent as the race on host page table teardown has been
>> reproduced on some setups.
> If we don't hear from Paolo shortly, I suggest we route this via the
> arm64 tree.

It missed the pull request I sent on March 17th by a few hours.  I have 
queued it now and will send it to Linus later today.

Paolo
Marc Zyngier March 27, 2023, 2:15 p.m. UTC | #4
On Mon, 27 Mar 2023 14:59:19 +0100,
Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 3/27/23 13:39, Marc Zyngier wrote:
> >> Paolo,
> >> 
> >> Pinging this PR in case you missed it, the issues around host page table walks
> >> are particularly urgent as the race on host page table teardown has been
> >> reproduced on some setups.
> > If we don't hear from Paolo shortly, I suggest we route this via the
> > arm64 tree.
> 
> It missed the pull request I sent on March 17th by a few hours.  I
> have queued it now and will send it to Linus later today.

Maybe you could help us here and state what your schedule is when it
comes to sending these pull requests? It would certainly help
coordinate and avoid wasting 10+ days to get things merged.

I appreciate that you don't need nor want to wait for us to send
something to Linus, but if we know when the train is departing, we can
make sure we're standing on the platform early enough.

Thanks,

	M.
Paolo Bonzini March 27, 2023, 2:22 p.m. UTC | #5
On Mon, Mar 27, 2023 at 4:15 PM Marc Zyngier <maz@kernel.org> wrote:
> > It missed the pull request I sent on March 17th by a few hours.  I
> > have queued it now and will send it to Linus later today.
>
> Maybe you could help us here and state what your schedule is when it
> comes to sending these pull requests? It would certainly help
> coordinate and avoid wasting 10+ days to get things merged.
>
> I appreciate that you don't need nor want to wait for us to send
> something to Linus, but if we know when the train is departing, we can
> make sure we're standing on the platform early enough.

In general, sending the pull request to me on or before Thursday will
work best, though I have no problem sending stuff to Linus on Sunday
morning (so that architectures don't need to time their request too
carefully).

These weeks my Friday afternoon was more free than usual due to all
meetings with American people moving one hour earlier, and that
translated into different family plans for Friday itself and over the
weekend.

Paolo