From patchwork Sat Feb 8 10:53:18 2025
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 13966373
From: Yan Zhao
To: pbonzini@redhat.com, seanjc@google.com
Cc: rick.p.edgecombe@intel.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Yan Zhao
Subject: [PATCH] KVM: selftests: Wait mprotect_ro_done before write to RO in mmu_stress_test
Date: Sat, 8 Feb 2025 18:53:18 +0800
Message-ID: <20250208105318.16861-1-yan.y.zhao@intel.com>
X-Mailer: git-send-email 2.43.2
X-Mailing-List: kvm@vger.kernel.org

In the read-only mprotect() phase of mmu_stress_test, ensure that
mprotect(PROT_READ) has completed before the guest starts writing to the
read-only memory.

Without waiting for mprotect_ro_done before the guest starts writing in
stage 3 (the stage for read-only mprotect()), the host's stage 3 assertion
can fail if mprotect_ro_done is set to true in the window between the guest
finishing its writes to all GPAs and its execution of GUEST_SYNC(3). This
scenario occurs easily when there are hundreds of vCPUs.

 CPU 0                  CPU 1 guest              CPU 1 host
                                                 enter stage 3's 1st loop
                        //in stage 3
                        write all GPAs
                        @rip 0x4025f0

 mprotect(PROT_READ)
 mprotect_ro_done=true
                        GUEST_SYNC(3)
                                                 r=0, continue stage 3's 1st loop

                        //in stage 4
                        write GPA
                        @rip 0x402635
                                                 -EFAULT, jump out stage 3's 1st loop
                                                 enter stage 3's 2nd loop
                        write GPA
                        @rip 0x402635
                                                 -EFAULT, continue stage 3's 2nd loop
                                                 guest rip += 3

The test then fails and reports "Unhandled exception '0xe' at guest RIP
'0x402638'", because the next valid guest rip address is 0x402639, i.e. the
"(mem) = val" in vcpu_arch_put_guest() is compiled into a mov instruction of
length 4. Even if it were compiled into a mov instruction of length 3, the
guest's subsequent execution of GUEST_SYNC(4) could still cause the host's
stage 3 assertion to fail.

Signed-off-by: Yan Zhao
---
 tools/testing/selftests/kvm/mmu_stress_test.c | 25 +++++++++++--------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/tools/testing/selftests/kvm/mmu_stress_test.c b/tools/testing/selftests/kvm/mmu_stress_test.c
index d9c76b4c0d88..a2ccf705bb2a 100644
--- a/tools/testing/selftests/kvm/mmu_stress_test.c
+++ b/tools/testing/selftests/kvm/mmu_stress_test.c
@@ -35,11 +35,16 @@ static void guest_code(uint64_t start_gpa, uint64_t end_gpa, uint64_t stride)
 	GUEST_SYNC(2);

 	/*
-	 * Write to the region while mprotect(PROT_READ) is underway. Keep
-	 * looping until the memory is guaranteed to be read-only, otherwise
-	 * vCPUs may complete their writes and advance to the next stage
-	 * prematurely.
-	 *
+	 * mprotect(PROT_READ) is underway. Keep looping until the memory is
+	 * guaranteed to be read-only, otherwise vCPUs may complete their
+	 * writes and advance to the next stage prematurely.
+	 */
+	do {
+		;
+	} while (!READ_ONCE(mprotect_ro_done));
+
+	/*
+	 * Write to the region after mprotect(PROT_READ) is done.
 	 * For architectures that support skipping the faulting instruction,
 	 * generate the store via inline assembly to ensure the exact length
 	 * of the instruction is known and stable (vcpu_arch_put_guest() on
@@ -47,16 +52,14 @@ static void guest_code(uint64_t start_gpa, uint64_t end_gpa, uint64_t stride)
 	 * is low in this case). For x86, hand-code the exact opcode so that
 	 * there is no room for variability in the generated instruction.
 	 */
-	do {
-		for (gpa = start_gpa; gpa < end_gpa; gpa += stride)
+	for (gpa = start_gpa; gpa < end_gpa; gpa += stride)
 #ifdef __x86_64__
-			asm volatile(".byte 0x48,0x89,0x00" :: "a"(gpa) : "memory"); /* mov %rax, (%rax) */
+		asm volatile(".byte 0x48,0x89,0x00" :: "a"(gpa) : "memory"); /* mov %rax, (%rax) */
 #elif defined(__aarch64__)
-			asm volatile("str %0, [%0]" :: "r" (gpa) : "memory");
+		asm volatile("str %0, [%0]" :: "r" (gpa) : "memory");
 #else
-			vcpu_arch_put_guest(*((volatile uint64_t *)gpa), gpa);
+		vcpu_arch_put_guest(*((volatile uint64_t *)gpa), gpa);
 #endif
-	} while (!READ_ONCE(mprotect_ro_done));

 	/*
 	 * Only architectures that write the entire range can explicitly sync,
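
For context on the "guest rip += 3" fixup mentioned above, here is a minimal
sketch of how a host-side handler can skip the known 3-byte hand-coded store
after _vcpu_run() fails with EFAULT. This is an illustrative approximation,
not the selftest's verbatim code; the helper name skip_faulting_store() is
made up here. It shows why a 4-byte store emitted by vcpu_arch_put_guest()
breaks the skip (RIP lands at 0x402638 instead of the next valid 0x402639).

#include "kvm_util.h"

/*
 * Illustrative only: advance the vCPU past an x86 store that is known to be
 * exactly 3 bytes long (the hand-coded "mov %rax, (%rax)"). If the store
 * were a 4-byte instruction, RIP would end up mid-instruction.
 */
static void skip_faulting_store(struct kvm_vcpu *vcpu)
{
	struct kvm_regs regs;

	vcpu_regs_get(vcpu, &regs);
	regs.rip += 3;	/* length of the hand-coded x86 store */
	vcpu_regs_set(vcpu, &regs);
}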