[PATCHv2,0/2] kvm/x86: vhost task creation failure handling

Message ID	20250226213844.3826821-1-kbusch@meta.com (mailing list archive)
Headers	show Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BAEA925D542 for <kvm@vger.kernel.org>; Wed, 26 Feb 2025 21:39:32 +0000 (UTC) From: Keith Busch <kbusch@meta.com> To: <pbonzini@redhat.com>, <seanjc@google.com>, <kvm@vger.kernel.org> CC: <x86@kernel.org>, <virtualization@lists.linux.dev>, <linux-kernel@vger.kernel.org>, Keith Busch <kbusch@kernel.org> Subject: [PATCHv2 0/2] kvm/x86: vhost task creation failure handling Date: Wed, 26 Feb 2025 13:38:42 -0800 Message-ID: <20250226213844.3826821-1-kbusch@meta.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain
Series	kvm/x86: vhost task creation failure handling \| expand [PATCHv2,0/2] kvm/x86: vhost task creation failure handling [PATCHv2,1/2] vhost: return task creation error instead of NULL [PATCHv2,2/2] kvm: retry nx_huge_page_recovery_thread creation

Message ID

20250226213844.3826821-1-kbusch@meta.com (mailing list archive)

Headers

From: Keith Busch <kbusch@meta.com>
To: <pbonzini@redhat.com>, <seanjc@google.com>, <kvm@vger.kernel.org>
CC: <x86@kernel.org>, <virtualization@lists.linux.dev>,
        <linux-kernel@vger.kernel.org>, Keith Busch <kbusch@kernel.org>
Subject: [PATCHv2 0/2] kvm/x86: vhost task creation failure handling
Date: Wed, 26 Feb 2025 13:38:42 -0800
Message-ID: <20250226213844.3826821-1-kbusch@meta.com>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain

Series

kvm/x86: vhost task creation failure handling | expand

Message

Keith Busch Feb. 26, 2025, 9:38 p.m. UTC

From: Keith Busch <kbusch@kernel.org>

The suggestion from Sean appears to be successful, so sending out a new
version for consideration.

Background:

The crosvm VMM might send signals to its threads that have entered
KVM_RUN. The signal specifically is SIGRTRMIN from here:

  https://github.com/google/crosvm/blob/main/src/crosvm/sys/linux/vcpu.rs#L651

If this happens to occur when the huge page recovery is trying to create
its vhost task, that will fail with ERESTARTNOINTR. Once this happens,
all KVM_RUN calls will fail with ENOMEM despite memory not being the
problem.

This series propogates the error up so we can distinguish that from the
current defaulting to ENOMEM and replaces the call_once since we need to
be able to call it repeatedly due to this condition.

Changes from the v1 (prefixed as an "RFC", really) patch:

  Instead of using a VM-wide mutex, update the call_once pattern to
  complete only if what it calls is successful (from Sean).

Keith Busch (1):
  vhost: return task creation error instead of NULL

Sean Christopherson (1):
  kvm: retry nx_huge_page_recovery_thread creation

 arch/x86/kvm/mmu/mmu.c    | 12 +++++-------
 drivers/vhost/vhost.c     |  2 +-
 include/linux/call_once.h | 16 +++++++++++-----
 kernel/vhost_task.c       |  4 ++--
 4 files changed, 19 insertions(+), 15 deletions(-)

Comments

Lei Yang Feb. 27, 2025, 10:21 a.m. UTC | #1

Hi Keith

There are some error messages from qemu output when I tested this
series of patches with the virtio-net regression test. It can
reproduced by boot up a guest with vhost device after applied your
patches.
Error messages:
Qemu output:
qemu-kvm: -netdev {"id": "idoejzv8", "type": "tap", "vhost": true,
"vhostfd": "16", "fd": "10"}: vhost_set_owner failed: Cannot allocate
memory
qemu-kvm: -netdev {"id": "idoejzv8", "type": "tap", "vhost": true,
"vhostfd": "16", "fd": "10"}: vhost-net requested but could not be
initialized

My tests based on this commit:
# git log -1
commit dd83757f6e686a2188997cb58b5975f744bb7786 (HEAD -> master,
origin/master, origin/HEAD)
Merge: 102c16a1f9a9 eb54d2695b57
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Feb 26 16:55:30 2025 -0800

    Merge tag 'bcachefs-2025-02-26' of git://evilpiepirate.org/bcachefs

Thanks
Lei



On Thu, Feb 27, 2025 at 5:40 AM Keith Busch <kbusch@meta.com> wrote:
>
> From: Keith Busch <kbusch@kernel.org>
>
> The suggestion from Sean appears to be successful, so sending out a new
> version for consideration.
>
> Background:
>
> The crosvm VMM might send signals to its threads that have entered
> KVM_RUN. The signal specifically is SIGRTRMIN from here:
>
>   https://github.com/google/crosvm/blob/main/src/crosvm/sys/linux/vcpu.rs#L651
>
> If this happens to occur when the huge page recovery is trying to create
> its vhost task, that will fail with ERESTARTNOINTR. Once this happens,
> all KVM_RUN calls will fail with ENOMEM despite memory not being the
> problem.
>
> This series propogates the error up so we can distinguish that from the
> current defaulting to ENOMEM and replaces the call_once since we need to
> be able to call it repeatedly due to this condition.
>
> Changes from the v1 (prefixed as an "RFC", really) patch:
>
>   Instead of using a VM-wide mutex, update the call_once pattern to
>   complete only if what it calls is successful (from Sean).
>
> Keith Busch (1):
>   vhost: return task creation error instead of NULL
>
> Sean Christopherson (1):
>   kvm: retry nx_huge_page_recovery_thread creation
>
>  arch/x86/kvm/mmu/mmu.c    | 12 +++++-------
>  drivers/vhost/vhost.c     |  2 +-
>  include/linux/call_once.h | 16 +++++++++++-----
>  kernel/vhost_task.c       |  4 ++--
>  4 files changed, 19 insertions(+), 15 deletions(-)
>
> --
> 2.43.5
>
>

Keith Busch Feb. 27, 2025, 3:09 p.m. UTC | #2

On Thu, Feb 27, 2025 at 06:21:34PM +0800, Lei Yang wrote:
> Hi Keith
> 
> There are some error messages from qemu output when I tested this
> series of patches with the virtio-net regression test. It can
> reproduced by boot up a guest with vhost device after applied your
> patches.
> Error messages:
> Qemu output:
> qemu-kvm: -netdev {"id": "idoejzv8", "type": "tap", "vhost": true,
> "vhostfd": "16", "fd": "10"}: vhost_set_owner failed: Cannot allocate
> memory
> qemu-kvm: -netdev {"id": "idoejzv8", "type": "tap", "vhost": true,
> "vhostfd": "16", "fd": "10"}: vhost-net requested but could not be
> initialized

*facepalm*

I accidently left the "!" in the condition:

@@ -666,7 +666,7 @@ static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
 
 	vtsk = vhost_task_create(vhost_run_work_list, vhost_worker_killed,
 				 worker, name);
-	if (!vtsk)
+	if (!IS_ERR(vtsk))
 		goto free_worker;