From patchwork Tue Jul 23 01:05:39 2024
X-Patchwork-Submitter: Don Porter
X-Patchwork-Id: 13739261
From: Don Porter <porter@cs.unc.edu>
To: qemu-devel@nongnu.org
Cc: dave@treblig.org, peter.maydell@linaro.org, nadav.amit@gmail.com,
    richard.henderson@linaro.org, philmd@linaro.org, berrange@redhat.com,
    Don Porter <porter@cs.unc.edu>
Subject: [PATCH v4 1/7] Code motion: expose some TCG definitions for page
 table walk consolidation.
Date: Mon, 22 Jul 2024 21:05:39 -0400
Message-Id: <20240723010545.3648706-2-porter@cs.unc.edu>
In-Reply-To: <20240723010545.3648706-1-porter@cs.unc.edu>
References: <20240723010545.3648706-1-porter@cs.unc.edu>

Signed-off-by: Don Porter <porter@cs.unc.edu>
---
 include/hw/core/sysemu-cpu-ops.h     |  6 +++++
 target/i386/cpu.h                    |  5 ++--
 target/i386/helper.c                 | 36 +++++++++++++++++++++++++++
 target/i386/tcg/helper-tcg.h         | 32 ++++++++++++++++++++++++
 target/i386/tcg/seg_helper.c         | 36 ---------------------------
 target/i386/tcg/sysemu/excp_helper.c | 37 +---------------------------
 6 files changed, 77 insertions(+), 75 deletions(-)

diff --git a/include/hw/core/sysemu-cpu-ops.h b/include/hw/core/sysemu-cpu-ops.h
index 24d003fe04..4c94e51267 100644
--- a/include/hw/core/sysemu-cpu-ops.h
+++ b/include/hw/core/sysemu-cpu-ops.h
@@ -12,6 +12,12 @@
 
 #include "hw/core/cpu.h"
 
+typedef enum TranslateFaultStage2 {
+    S2_NONE,
+    S2_GPA,
+    S2_GPT,
+} TranslateFaultStage2;
+
 /*
  * struct SysemuCPUOps: System operations specific to a CPU class
  */
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 1e121acef5..d899644cb8 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -21,6 +21,7 @@
 #define I386_CPU_H
 
 #include "sysemu/tcg.h"
+#include "hw/core/sysemu-cpu-ops.h"
 #include "cpu-qom.h"
 #include "kvm/hyperv-proto.h"
 #include "exec/cpu-defs.h"
@@ -2362,6 +2363,7 @@ void host_cpuid(uint32_t function, uint32_t count,
 bool cpu_has_x2apic_feature(CPUX86State *env);
 
 /* helper.c */
+int get_pg_mode(CPUX86State *env);
 void x86_cpu_set_a20(X86CPU *cpu, int a20_state);
 void cpu_sync_avx_hflag(CPUX86State *env);
 
@@ -2540,9 +2542,6 @@ static inline bool cpu_vmx_maybe_enabled(CPUX86State *env)
            ((env->cr[4] & CR4_VMXE_MASK) || (env->hflags & HF_SMM_MASK));
 }
 
-/* excp_helper.c */
-int get_pg_mode(CPUX86State *env);
-
 /* fpu_helper.c */
 void update_fp_status(CPUX86State *env);
 void update_mxcsr_status(CPUX86State *env);
diff --git a/target/i386/helper.c b/target/i386/helper.c
index 01a268a30b..9cb6e51426 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -721,3 +721,39 @@ void x86_stq_phys(CPUState *cs, hwaddr addr, uint64_t val)
     address_space_stq(as, addr, val, attrs, NULL);
 }
 #endif
+
+int get_pg_mode(CPUX86State *env)
+{
+    int pg_mode = 0;
+    if (!(env->cr[0] & CR0_PG_MASK)) {
+        return 0;
+    }
+    if (env->cr[0] & CR0_WP_MASK) {
+        pg_mode |= PG_MODE_WP;
+    }
+    if (env->cr[4] & CR4_PAE_MASK) {
+        pg_mode |= PG_MODE_PAE;
+        if (env->efer & MSR_EFER_NXE) {
+            pg_mode |= PG_MODE_NXE;
+        }
+    }
+    if (env->cr[4] & CR4_PSE_MASK) {
+        pg_mode |= PG_MODE_PSE;
+    }
+    if (env->cr[4] & CR4_SMEP_MASK) {
+        pg_mode |= PG_MODE_SMEP;
+    }
+    if (env->hflags & HF_LMA_MASK) {
+        pg_mode |= PG_MODE_LMA;
+        if (env->cr[4] & CR4_PKE_MASK) {
+            pg_mode |= PG_MODE_PKE;
+        }
+        if (env->cr[4] & CR4_PKS_MASK) {
+            pg_mode |= PG_MODE_PKS;
+        }
+        if (env->cr[4] & CR4_LA57_MASK) {
+            pg_mode |= PG_MODE_LA57;
+        }
+    }
+    return pg_mode;
+}
diff --git a/target/i386/tcg/helper-tcg.h b/target/i386/tcg/helper-tcg.h
index 15d6c6f8b4..1cbeab9161 100644
--- a/target/i386/tcg/helper-tcg.h
+++ b/target/i386/tcg/helper-tcg.h
@@ -92,6 +92,38 @@ extern const uint8_t parity_table[256];
 /* misc_helper.c */
 void cpu_load_eflags(CPUX86State *env, int eflags, int update_mask);
 
+/* sysemu/excp_helper.c */
+typedef struct TranslateFault {
+    int exception_index;
+    int error_code;
+    target_ulong cr2;
+    TranslateFaultStage2 stage2;
+} TranslateFault;
+
+typedef struct PTETranslate {
+    CPUX86State *env;
+    TranslateFault *err;
+    int ptw_idx;
+    void *haddr;
+    hwaddr gaddr;
+} PTETranslate;
+
+bool ptw_setl_slow(const PTETranslate *in, uint32_t old, uint32_t new);
+
+static inline bool ptw_setl(const PTETranslate *in, uint32_t old, uint32_t set)
+{
+    if (set & ~old) {
+        uint32_t new = old | set;
+        if (likely(in->haddr)) {
+            old = cpu_to_le32(old);
+            new = cpu_to_le32(new);
+            return qatomic_cmpxchg((uint32_t *)in->haddr, old, new) == old;
+        }
+        return ptw_setl_slow(in, old, new);
+    }
+    return true;
+}
+
 /* sysemu/svm_helper.c */
 #ifndef CONFIG_USER_ONLY
 G_NORETURN void cpu_vmexit(CPUX86State *nenv, uint32_t exit_code,
diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c
index aac092a356..90f01180d9 100644
--- a/target/i386/tcg/seg_helper.c
+++ b/target/i386/tcg/seg_helper.c
@@ -92,42 +92,6 @@ static uint32_t popl(StackAccess *sa)
     return ret;
 }
 
-int get_pg_mode(CPUX86State *env)
-{
-    int pg_mode = 0;
-    if (!(env->cr[0] & CR0_PG_MASK)) {
-        return 0;
-    }
-    if (env->cr[0] & CR0_WP_MASK) {
-        pg_mode |= PG_MODE_WP;
-    }
-    if (env->cr[4] & CR4_PAE_MASK) {
-        pg_mode |= PG_MODE_PAE;
-        if (env->efer & MSR_EFER_NXE) {
-            pg_mode |= PG_MODE_NXE;
-        }
-    }
-    if (env->cr[4] & CR4_PSE_MASK) {
-        pg_mode |= PG_MODE_PSE;
-    }
-    if (env->cr[4] & CR4_SMEP_MASK) {
-        pg_mode |= PG_MODE_SMEP;
-    }
-    if (env->hflags & HF_LMA_MASK) {
-        pg_mode |= PG_MODE_LMA;
-        if (env->cr[4] & CR4_PKE_MASK) {
-            pg_mode |= PG_MODE_PKE;
-        }
-        if (env->cr[4] & CR4_PKS_MASK) {
-            pg_mode |= PG_MODE_PKS;
-        }
-        if (env->cr[4] & CR4_LA57_MASK) {
-            pg_mode |= PG_MODE_LA57;
-        }
-    }
-    return pg_mode;
-}
-
 /* return non zero if error */
 static inline int load_segment_ra(CPUX86State *env, uint32_t *e1_ptr,
                                   uint32_t *e2_ptr, int selector,
diff --git a/target/i386/tcg/sysemu/excp_helper.c b/target/i386/tcg/sysemu/excp_helper.c
index 8fb05b1f53..3ebb67d65b 100644
--- a/target/i386/tcg/sysemu/excp_helper.c
+++ b/target/i386/tcg/sysemu/excp_helper.c
@@ -39,27 +39,6 @@ typedef struct TranslateResult {
     int page_size;
 } TranslateResult;
 
-typedef enum TranslateFaultStage2 {
-    S2_NONE,
-    S2_GPA,
-    S2_GPT,
-} TranslateFaultStage2;
-
-typedef struct TranslateFault {
-    int exception_index;
-    int error_code;
-    target_ulong cr2;
-    TranslateFaultStage2 stage2;
-} TranslateFault;
-
-typedef struct PTETranslate {
-    CPUX86State *env;
-    TranslateFault *err;
-    int ptw_idx;
-    void *haddr;
-    hwaddr gaddr;
-} PTETranslate;
-
 static bool ptw_translate(PTETranslate *inout, hwaddr addr, uint64_t ra)
 {
     CPUTLBEntryFull *full;
@@ -104,7 +83,7 @@ static inline uint64_t ptw_ldq(const PTETranslate *in, uint64_t ra)
  * even 64-bit ones, because PG_PRESENT_MASK, PG_ACCESSED_MASK and
  * PG_DIRTY_MASK are all in the low 32 bits.
  */
-static bool ptw_setl_slow(const PTETranslate *in, uint32_t old, uint32_t new)
+bool ptw_setl_slow(const PTETranslate *in, uint32_t old, uint32_t new)
 {
     uint32_t cmp;
 
@@ -118,20 +97,6 @@ static bool ptw_setl_slow(const PTETranslate *in, uint32_t old, uint32_t new)
     return cmp == old;
 }
 
-static inline bool ptw_setl(const PTETranslate *in, uint32_t old, uint32_t set)
-{
-    if (set & ~old) {
-        uint32_t new = old | set;
-        if (likely(in->haddr)) {
-            old = cpu_to_le32(old);
-            new = cpu_to_le32(new);
-            return qatomic_cmpxchg((uint32_t *)in->haddr, old, new) == old;
-        }
-        return ptw_setl_slow(in, old, new);
-    }
-    return true;
-}
-
 static bool mmu_translate(CPUX86State *env, const TranslateParams *in,
                           TranslateResult *out, TranslateFault *err,
                           uint64_t ra)

From patchwork Tue Jul 23 01:05:40 2024
X-Patchwork-Submitter: Don Porter
X-Patchwork-Id: 13739264
From: Don Porter <porter@cs.unc.edu>
To: qemu-devel@nongnu.org
Cc: dave@treblig.org, peter.maydell@linaro.org, nadav.amit@gmail.com,
    richard.henderson@linaro.org, philmd@linaro.org, berrange@redhat.com,
    Don Porter <porter@cs.unc.edu>
Subject: [PATCH v4 2/7] Import vmcs12 definition from Linux/KVM
Date: Mon, 22 Jul 2024 21:05:40 -0400
Message-Id: <20240723010545.3648706-3-porter@cs.unc.edu>
In-Reply-To: <20240723010545.3648706-1-porter@cs.unc.edu>
References: <20240723010545.3648706-1-porter@cs.unc.edu>
Signed-off-by: Don Porter <porter@cs.unc.edu>
---
 target/i386/kvm/vmcs12.h | 213 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 213 insertions(+)
 create mode 100644 target/i386/kvm/vmcs12.h

diff --git a/target/i386/kvm/vmcs12.h b/target/i386/kvm/vmcs12.h
new file mode 100644
index 0000000000..c7b139f4db
--- /dev/null
+++ b/target/i386/kvm/vmcs12.h
@@ -0,0 +1,213 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef QEMU_KVM_X86_VMX_VMCS12_H
+#define QEMU_KVM_X86_VMX_VMCS12_H
+
+#include
+
+/* XXX: Stolen from Linux with light edits, for now */
+
+typedef uint64_t u64;
+typedef uint32_t u32;
+typedef uint16_t u16;
+
+/*
+ * struct vmcs12 describes the state that our guest hypervisor (L1) keeps for a
+ * single nested guest (L2), hence the name vmcs12. Any VMX implementation has
+ * a VMCS structure, and vmcs12 is our emulated VMX's VMCS. This structure is
+ * stored in guest memory specified by VMPTRLD, but is opaque to the guest,
+ * which must access it using VMREAD/VMWRITE/VMCLEAR instructions.
+ * More than one of these structures may exist, if L1 runs multiple L2 guests.
+ * nested_vmx_run() will use the data here to build the vmcs02: a VMCS for the
+ * underlying hardware which will be used to run L2.
+ * This structure is packed to ensure that its layout is identical across
+ * machines (necessary for live migration).
+ *
+ * IMPORTANT: Changing the layout of existing fields in this structure
+ * will break save/restore compatibility with older kvm releases. When
+ * adding new fields, either use space in the reserved padding* arrays
+ * or add the new fields to the end of the structure.
+ */
+typedef u64 natural_width;
+
+struct vmcs_hdr {
+    u32 revision_id : 31;
+    u32 shadow_vmcs : 1;
+};
+
+struct __attribute__ ((__packed__)) vmcs12 {
+    /*
+     * According to the Intel spec, a VMCS region must start with the
+     * following two fields. Then follow implementation-specific data.
+     */
+    struct vmcs_hdr hdr;
+    u32 abort;
+
+    u32 launch_state; /* set to 0 by VMCLEAR, to 1 by VMLAUNCH */
+    u32 padding[7];   /* room for future expansion */
+
+    u64 io_bitmap_a;
+    u64 io_bitmap_b;
+    u64 msr_bitmap;
+    u64 vm_exit_msr_store_addr;
+    u64 vm_exit_msr_load_addr;
+    u64 vm_entry_msr_load_addr;
+    u64 tsc_offset;
+    u64 virtual_apic_page_addr;
+    u64 apic_access_addr;
+    u64 posted_intr_desc_addr;
+    u64 ept_pointer;
+    u64 eoi_exit_bitmap0;
+    u64 eoi_exit_bitmap1;
+    u64 eoi_exit_bitmap2;
+    u64 eoi_exit_bitmap3;
+    u64 xss_exit_bitmap;
+    u64 guest_physical_address;
+    u64 vmcs_link_pointer;
+    u64 guest_ia32_debugctl;
+    u64 guest_ia32_pat;
+    u64 guest_ia32_efer;
+    u64 guest_ia32_perf_global_ctrl;
+    u64 guest_pdptr0;
+    u64 guest_pdptr1;
+    u64 guest_pdptr2;
+    u64 guest_pdptr3;
+    u64 guest_bndcfgs;
+    u64 host_ia32_pat;
+    u64 host_ia32_efer;
+    u64 host_ia32_perf_global_ctrl;
+    u64 vmread_bitmap;
+    u64 vmwrite_bitmap;
+    u64 vm_function_control;
+    u64 eptp_list_address;
+    u64 pml_address;
+    u64 encls_exiting_bitmap;
+    u64 tsc_multiplier;
+    u64 padding64[1]; /* room for future expansion */
+    /*
+     * To allow migration of L1 (complete with its L2 guests) between
+     * machines of different natural widths (32 or 64 bit), we cannot have
+     * unsigned long fields with no explicit size. We use u64 (aliased
+     * natural_width) instead. Luckily, x86 is little-endian.
+     */
+    natural_width cr0_guest_host_mask;
+    natural_width cr4_guest_host_mask;
+    natural_width cr0_read_shadow;
+    natural_width cr4_read_shadow;
+    /* Last remnants of cr3_target_value[0-3]. */
+    natural_width dead_space[4];
+    natural_width exit_qualification;
+    natural_width guest_linear_address;
+    natural_width guest_cr0;
+    natural_width guest_cr3;
+    natural_width guest_cr4;
+    natural_width guest_es_base;
+    natural_width guest_cs_base;
+    natural_width guest_ss_base;
+    natural_width guest_ds_base;
+    natural_width guest_fs_base;
+    natural_width guest_gs_base;
+    natural_width guest_ldtr_base;
+    natural_width guest_tr_base;
+    natural_width guest_gdtr_base;
+    natural_width guest_idtr_base;
+    natural_width guest_dr7;
+    natural_width guest_rsp;
+    natural_width guest_rip;
+    natural_width guest_rflags;
+    natural_width guest_pending_dbg_exceptions;
+    natural_width guest_sysenter_esp;
+    natural_width guest_sysenter_eip;
+    natural_width host_cr0;
+    natural_width host_cr3;
+    natural_width host_cr4;
+    natural_width host_fs_base;
+    natural_width host_gs_base;
+    natural_width host_tr_base;
+    natural_width host_gdtr_base;
+    natural_width host_idtr_base;
+    natural_width host_ia32_sysenter_esp;
+    natural_width host_ia32_sysenter_eip;
+    natural_width host_rsp;
+    natural_width host_rip;
+    natural_width paddingl[8]; /* room for future expansion */
+    u32 pin_based_vm_exec_control;
+    u32 cpu_based_vm_exec_control;
+    u32 exception_bitmap;
+    u32 page_fault_error_code_mask;
+    u32 page_fault_error_code_match;
+    u32 cr3_target_count;
+    u32 vm_exit_controls;
+    u32 vm_exit_msr_store_count;
+    u32 vm_exit_msr_load_count;
+    u32 vm_entry_controls;
+    u32 vm_entry_msr_load_count;
+    u32 vm_entry_intr_info_field;
+    u32 vm_entry_exception_error_code;
+    u32 vm_entry_instruction_len;
+    u32 tpr_threshold;
+    u32 secondary_vm_exec_control;
+    u32 vm_instruction_error;
+    u32 vm_exit_reason;
+    u32 vm_exit_intr_info;
+    u32 vm_exit_intr_error_code;
+    u32 idt_vectoring_info_field;
+    u32 idt_vectoring_error_code;
+    u32 vm_exit_instruction_len;
+    u32 vmx_instruction_info;
+    u32 guest_es_limit;
+    u32 guest_cs_limit;
+    u32 guest_ss_limit;
+    u32 guest_ds_limit;
+    u32 guest_fs_limit;
+    u32 guest_gs_limit;
+    u32 guest_ldtr_limit;
+    u32 guest_tr_limit;
+    u32 guest_gdtr_limit;
+    u32 guest_idtr_limit;
+    u32 guest_es_ar_bytes;
+    u32 guest_cs_ar_bytes;
+    u32 guest_ss_ar_bytes;
+    u32 guest_ds_ar_bytes;
+    u32 guest_fs_ar_bytes;
+    u32 guest_gs_ar_bytes;
+    u32 guest_ldtr_ar_bytes;
+    u32 guest_tr_ar_bytes;
+    u32 guest_interruptibility_info;
+    u32 guest_activity_state;
+    u32 guest_sysenter_cs;
+    u32 host_ia32_sysenter_cs;
+    u32 vmx_preemption_timer_value;
+    u32 padding32[7]; /* room for future expansion */
+    u16 virtual_processor_id;
+    u16 posted_intr_nv;
+    u16 guest_es_selector;
+    u16 guest_cs_selector;
+    u16 guest_ss_selector;
+    u16 guest_ds_selector;
+    u16 guest_fs_selector;
+    u16 guest_gs_selector;
+    u16 guest_ldtr_selector;
+    u16 guest_tr_selector;
+    u16 guest_intr_status;
+    u16 host_es_selector;
+    u16 host_cs_selector;
+    u16 host_ss_selector;
+    u16 host_ds_selector;
+    u16 host_fs_selector;
+    u16 host_gs_selector;
+    u16 host_tr_selector;
+    u16 guest_pml_index;
+};
+
+/*
+ * VMCS12_REVISION is an arbitrary id that should be changed if the content or
+ * layout of struct vmcs12 is changed. MSR_IA32_VMX_BASIC returns this id, and
+ * VMPTRLD verifies that the VMCS region that L1 is loading contains this id.
+ *
+ * IMPORTANT: Changing this value will break save/restore compatibility with
+ * older kvm releases.
+ */
+#define VMCS12_REVISION 0x11e57ed0
+
+#endif

From patchwork Tue Jul 23 01:05:41 2024
X-Patchwork-Submitter: Don Porter
X-Patchwork-Id: 13739265
From: Don Porter <porter@cs.unc.edu>
To: qemu-devel@nongnu.org
Cc: dave@treblig.org, peter.maydell@linaro.org, nadav.amit@gmail.com,
    richard.henderson@linaro.org, philmd@linaro.org, berrange@redhat.com,
    Don Porter <porter@cs.unc.edu>
Subject: [PATCH v4 3/7] Add an "info pg" command that prints the current page
 tables
Date: Mon, 22 Jul 2024 21:05:41 -0400
Message-Id: <20240723010545.3648706-4-porter@cs.unc.edu>
In-Reply-To: <20240723010545.3648706-1-porter@cs.unc.edu>
References: <20240723010545.3648706-1-porter@cs.unc.edu>

The new "info pg" monitor command prints the current page table,
including virtual address ranges, flag bits, and snippets of physical
page numbers. Completely filled regions of the page table with
compatible flags are "folded", with the result that the complete
output for a freshly booted x86-64 Linux VM can fit in a single
terminal window.
The output looks like this:

Info pg for CPU 0
VPN range Entry Flags Physical page
[7f0000000-7f0000000] PML4[0fe] ---DA--UWP
[7f28c0000-7f28fffff] PDP[0a3] ---DA--UWP
[7f28c4600-7f28c47ff] PDE[023] ---DA--UWP
[7f28c4655-7f28c4656] PTE[055-056] X--D---U-P 0000007f14-0000007f15
[7f28c465b-7f28c465b] PTE[05b] ----A--U-P 0000001cfc
...
[ff8000000-ff8000000] PML4[1ff] ---DA--UWP
[ffff80000-ffffbffff] PDP[1fe] ---DA---WP
[ffff81000-ffff81dff] PDE[008-00e] -GSDA---WP 0000001000-0000001dff
[ffffc0000-fffffffff] PDP[1ff] ---DA--UWP
[ffffff400-ffffff5ff] PDE[1fa] ---DA--UWP
[ffffff5fb-ffffff5fc] PTE[1fb-1fc] XG-DACT-WP 00000fec00 00000fee00
[ffffff600-ffffff7ff] PDE[1fb] ---DA--UWP
[ffffff600-ffffff600] PTE[000] -G-DA--U-P 0000001467

This draws heavy inspiration from Austin Clements' original patch.

For nested paging, it does a recursive walk:

Info pg for CPU 0, guest mode
VPN range Entry Flags Physical page(s)
[008000000-00fffffff] PML4[001] -------UWP
[008000000-00803ffff] PDP[000] -------UWP
[008000800-0080009ff] PDE[004] -------UWP
[008000800-00800085b] PTE[000-05b] -------U-P 000000076a-00000007c5
[008000a00-008000bff] PDE[005] -------UWP
[008000a00-008000a0f] PTE[000-00f] -------U-P 000000075a-0000000769
[008003e00-008003fff] PDE[01f] -------UWP
[008003fa8-008003fb7] PTE[1a8-1b7] --------WP 000000054b-000000055a
[008003fc0-008003fcf] PTE[1c0-1cf] --------WP 000000053b-000000054a
[008003fd8-008003fe7] PTE[1d8-1e7] --------WP 000000052b-000000053a
[008003ff0-008003fff] PTE[1f0-1ff] --------WP 000000051b-000000052a
[008004000-008004fff] PDE[020-027] -------UWP
[008004000-008004fff] PTE[000-1ff] --------WP 0000000000-0000000fff

Info pg for CPU 0, host mode
VPN range Entry Flags Physical page(s)
[000000000-007ffffff] PML4[000] ----XWR
[000000000-00003ffff] PDP[000] ----XWR
[000000000-0000001ff] PDE[000] ----XWR
[000000001-000000005] PTE[001-005] ----XWR 0000001b24-0000001b28
[000000006-000000006] PTE[006] ----XWR 0000001ab6
[000000007-000000007] PTE[007] ----XWR 00000012ab
[000000008-00000000e] PTE[008-00e] ----XWR 0000001b29-0000001b2f
...

This also adds a generic page table walker, which other monitor and
execution commands will be migrated to in subsequent patches.

Finally, this patch adds some fields to the x86 architectural state
that are necessary to implement nested page table walks, but are not
standardized across accelerator back-ends. Because I cannot test on
all of these accelerators, the goal of this patch set is to detect if
the accelerator and page table format can be walked, and fail
gracefully if not.

Signed-off-by: Don Porter <porter@cs.unc.edu>
---
 hmp-commands-info.hx              |  13 +
 hw/core/cpu-sysemu.c              | 168 ++++-
 hw/core/machine-qmp-cmds.c        | 243 ++++++
 include/hw/core/cpu.h             |  78 ++-
 include/hw/core/sysemu-cpu-ops.h  | 138 ++++-
 include/monitor/hmp-target.h      |   1 +
 qapi/machine.json                 |  17 +
 system/memory_mapping.c           |   2 +-
 target/i386/arch_memory_mapping.c | 994 +++++++++++++++++++++++++++++-
 target/i386/cpu.c                 |  23 +-
 target/i386/cpu.h                 |  58 +-
 target/i386/kvm/kvm.c             |  68 ++
 target/i386/monitor.c             | 175 ++++++
 13 files changed, 1967 insertions(+), 11 deletions(-)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index c59cd6637b..8f178193e3 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -242,6 +242,19 @@ SRST
     Show memory tree.
 ERST
 
+    {
+        .name       = "pg",
+        .args_type  = "",
+        .params     = "",
+        .help       = "show the page table",
+        .cmd_info_hrt = qmp_x_query_pg,
+    },
+
+SRST
+  ``info pg``
+    Show the active page table.
+ERST + #if defined(CONFIG_TCG) { .name = "jit", diff --git a/hw/core/cpu-sysemu.c b/hw/core/cpu-sysemu.c index 2a9a2a4eb5..9d05e3e363 100644 --- a/hw/core/cpu-sysemu.c +++ b/hw/core/cpu-sysemu.c @@ -23,12 +23,12 @@ #include "exec/tswap.h" #include "hw/core/sysemu-cpu-ops.h" -bool cpu_paging_enabled(const CPUState *cpu) +bool cpu_paging_enabled(const CPUState *cpu, int mmu_idx) { CPUClass *cc = CPU_GET_CLASS(cpu); if (cc->sysemu_ops->get_paging_enabled) { - return cc->sysemu_ops->get_paging_enabled(cpu); + return cc->sysemu_ops->get_paging_enabled(cpu, mmu_idx); } return false; @@ -142,3 +142,167 @@ GuestPanicInformation *cpu_get_crash_info(CPUState *cpu) } return res; } + +/** + * for_each_pte_recursive - recursive helper function + * + * @cs - CPU state + * @fn(cs, data, pte, vaddr, height) - User-provided function to call on each + * pte. + * * @cs - pass through cs + * * @data - user-provided, opaque pointer + * * @pte - current pte + * * @height - height in the tree of pte + * * @layout- The layout of the radix tree + * @data - user-provided, opaque pointer, passed to fn() + * @visit_interior_nodes - if true, call fn() on page table entries in + * interior nodes. If false, only call fn() on page + * table entries in leaves. + * @visit_not_present - if true, call fn() on entries that are not present. + * if false, visit only present entries. + * @visit_malformed - if true, call fn() on entries that are malformed (e.g., + * bad reserved bits. Even if true, will not follow + * a child pointer to another node. + * @node - The physical address of the current page table radix tree node + * @vaddr_in - The virtual address bits translated in walking the page + * table to node + * @height - The height of the node in the radix tree + * @layout- The layout of the radix tree + * @mmu_idx - Which level of the mmu we are interested in: + * 0 == user mode, 1 == nested page table + * Note that MMU_*_IDX macros are not consistent across + * architectures. 
+ * + * height starts at the max and counts down. + * In a 4 level x86 page table, pml4e is level 4, pdpe is level 3, + * pde is level 2, and pte is level 1 + * + * Returns true on success, false on error. + */ +static bool +for_each_pte_recursive(CPUState *cs, qemu_page_walker_for_each fn, void *data, + bool visit_interior_nodes, bool visit_not_present, + bool visit_malformed, hwaddr node, vaddr vaddr_in, + int height, const PageTableLayout *layout, + int mmu_idx) +{ + int i; + CPUClass *cc = cs->cc; + const struct SysemuCPUOps *ops = cc->sysemu_ops; + + assert(height > 0); + int ptes_per_node = layout->entries_per_node[height]; + + for (i = 0; i < ptes_per_node; i++) { + DecodedPTE pt_entry; + + memset(&pt_entry, 0, sizeof(pt_entry)); + + /* + * For now, let's assume we don't enumerate a page table except + * in debug mode, so the access type should be irrelevant + */ + if (!ops->get_pte(cs, node, i, height, &pt_entry, vaddr_in, true, + mmu_idx, false, MMU_DATA_LOAD, NULL, NULL, NULL)) { + /* Fail if we can't read the PTE */ + return false; + } + + if (!pt_entry.reserved_bits_ok && !visit_malformed) { + continue; + } + + if (pt_entry.present || visit_not_present) { + + if (!pt_entry.present || pt_entry.leaf) { + if (fn(cs, data, &pt_entry, height, i, mmu_idx, layout)) { + /* Error */ + return false; + } + } else { /* Non-leaf */ + if (visit_interior_nodes) { + if (fn(cs, data, &pt_entry, height, i, mmu_idx, layout)) { + /* Error */ + return false; + } + } + assert(height > 1); + + if (pt_entry.reserved_bits_ok) { + + if (!for_each_pte_recursive(cs, fn, data, + visit_interior_nodes, + visit_not_present, + visit_malformed, + pt_entry.child, + pt_entry.bits_translated, + height - 1, layout, + mmu_idx)) { + return false; + } + } + } + } + } + + return true; +} + +/** + * for_each_pte - iterate over a page table, and + * call fn on each entry + * + * @cs - CPU state + * @fn(cs, data, pte, height, offset, layout) - User-provided function to call + * on each pte. 
+ * * @cs - pass through cs + * * @data - user-provided, opaque pointer + * * @pte - current pte, decoded + * * @height - height in the tree of pte + * * @offset - offset within the page table node + * * @layout - The layout of the radix tree + * @data - opaque pointer; passed through to fn + * @visit_interior_nodes - if true, call fn() on interior entries in + * page table; if false, visit only leaf entries. + * @visit_not_present - if true, call fn() on entries that are not present. + * if false, visit only present entries. + * @visit_malformed - if true, call fn() on entries that are malformed (e.g., + * bad reserved bits). Even if true, will not follow + * a child pointer to another node. + * @mmu_idx - Which level of the mmu we are interested in: + * 0 == user mode, 1 == nested page table + * Note that MMU_*_IDX macros are not consistent across + * architectures. + * + * Returns true on success, false on error. + * + */ +bool for_each_pte(CPUState *cs, qemu_page_walker_for_each fn, void *data, + bool visit_interior_nodes, bool visit_not_present, + bool visit_malformed, int mmu_idx) +{ + vaddr vaddr = 0; + hwaddr root; + CPUClass *cc = cs->cc; + const PageTableLayout *layout; + + if (!cpu_paging_enabled(cs, mmu_idx)) { + /* paging is disabled */ + return true; + } + + if (!cc->sysemu_ops->page_table_root) { + return false; + } + assert(cc->sysemu_ops->get_pte); + + root = cc->sysemu_ops->page_table_root(cs, &layout, mmu_idx); + + assert(layout->height > 1); + + /* Recursively call a helper to walk the page table */ + return for_each_pte_recursive(cs, fn, data, visit_interior_nodes, + visit_not_present, visit_malformed, root, + vaddr, layout->height, layout, mmu_idx); + +} diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c index 130217da8f..0e17750969 100644 --- a/hw/core/machine-qmp-cmds.c +++ b/hw/core/machine-qmp-cmds.c @@ -10,6 +10,7 @@ #include "qemu/osdep.h" #include "hw/acpi/vmgenid.h" #include "hw/boards.h" +#include 
"hw/core/sysemu-cpu-ops.h" #include "hw/intc/intc.h" #include "hw/mem/memory-device.h" #include "qapi/error.h" @@ -406,3 +407,245 @@ GuidInfo *qmp_query_vm_generation_id(Error **errp) info->guid = qemu_uuid_unparse_strdup(&vms->guid); return info; } + +/* Assume only called on present entries */ +int compressing_iterator(CPUState *cs, void *data, DecodedPTE *pte, + int height, int offset, int mmu_idx, + const PageTableLayout *layout) +{ + struct mem_print_state *state = (struct mem_print_state *) data; + hwaddr paddr = pte->child; + uint64_t size = pte->leaf_page_size; + bool start_new_run = false, flush = false; + bool is_leaf = pte->leaf; + + int entries_per_node = layout->entries_per_node[height]; + + + /* Prot of current pte */ + int prot = pte->prot; + + /* If there is a prior run, first try to extend it. */ + if (state->start_height != 0) { + + /* + * If we aren't flushing interior nodes, raise the start height. + * We don't need to detect non-compressible interior nodes. + */ + if (!state->flush_interior && state->start_height < height) { + state->start_height = height; + state->vstart[height] = pte->bits_translated; + state->vend[height] = pte->bits_translated; + assert(pte->leaf_page_size != -1); + state->pg_size[height] = pte->leaf_page_size; + state->prot[height] = prot; + if (offset == 0) { + state->last_offset[height] = entries_per_node - 1; + } else { + state->last_offset[height] = offset - 1; + } + } + + /* Detect when we are walking down the "left edge" of a range */ + if (state->vstart[height] == -1 + && (height + 1) <= state->start_height + && state->vstart[height + 1] == pte->bits_translated) { + + state->vstart[height] = pte->bits_translated; + assert(pte->leaf_page_size != -1); + state->pg_size[height] = pte->leaf_page_size; + state->vend[height] = pte->bits_translated; + state->prot[height] = prot; + state->offset[height] = offset; + state->last_offset[height] = offset; + + if (is_leaf) { + state->pstart = paddr; + state->pend = paddr; + 
state->leaf_height = height; + } + + /* Detect contiguous entries at same level */ + } else if (state->vstart[height] != -1 + && state->start_height >= height + && state->prot[height] == prot + && (state->last_offset[height] + 1) % entries_per_node + == offset + && (!is_leaf + || !state->require_physical_contiguity + || state->pend + size == paddr)) { + + + /* + * If there are entries at the levels below, make sure we + * completed them. We only compress interior nodes + * without holes in the mappings. + */ + if (height != 1) { + for (int i = height - 1; i >= 1; i--) { + int entries = layout->entries_per_node[i]; + + /* Stop if we hit large pages before level 1 */ + if (state->vstart[i] == -1) { + break; + } + + if (state->last_offset[i] + 1 != entries) { + flush = true; + start_new_run = true; + break; + } + } + } + + + if (!flush) { + + /* We can compress these entries */ + state->prot[height] = prot; + state->vend[height] = pte->bits_translated; + state->last_offset[height] = offset; + + /* Only update the physical range on leaves */ + if (is_leaf) { + state->pend = paddr; + } + } + /* Let PTEs accumulate... */ + } else { + flush = true; + } + + if (flush) { + /* + * We hit discontiguous permissions or pages. + * Print the old entries, then start accumulating again + * + * Some clients only want the flusher called on a leaf. + * Check that too. + * + * We can infer whether the accumulated range includes a + * leaf based on whether pstart is -1. 
+ */ + if (state->flush_interior || (state->pstart != -1)) { + if (state->flusher(cs, state)) { + start_new_run = true; + } + } else { + start_new_run = true; + } + } + } else { + start_new_run = true; + } + + if (start_new_run) { + /* start a new run with this PTE */ + for (int i = state->start_height; i > 0; i--) { + if (state->vstart[i] != -1) { + state->prot[i] = 0; + state->last_offset[i] = 0; + state->vstart[i] = -1; + state->pg_size[height] = -1; + } + } + state->pstart = -1; + state->leaf_height = -1; + state->vstart[height] = pte->bits_translated; + state->vend[height] = pte->bits_translated; + state->pg_size[height] = pte->leaf_page_size; + state->prot[height] = prot; + state->offset[height] = offset; + state->last_offset[height] = offset; + if (is_leaf) { + state->pstart = paddr; + state->pend = paddr; + state->leaf_height = height; + } + state->start_height = height; + } + + return 0; +} + +static +void query_page_helper(GString *buf, CPUState *cpu, int mmu_idx, bool nested) +{ + CPUClass *cc = cpu->cc; + struct mem_print_state state; + + if (!cc->sysemu_ops->mon_init_page_table_iterator(cpu, buf, mmu_idx, + &state)) { + g_string_append_printf(buf, + "Unable to initialize page table iterator\n"); + return; + } + + if (nested) { + if (mmu_idx == 0) { + g_string_append_printf(buf, "Info pg for CPU %d, guest mode\n", + cpu->cpu_index); + } else if (mmu_idx == 1) { + g_string_append_printf(buf, "Info pg for CPU %d, host mode\n", + cpu->cpu_index); + } else { + g_assert_not_reached(); + } + } else { + g_string_append_printf(buf, "Info pg for CPU %d\n", cpu->cpu_index); + } + + state.flush_interior = true; + state.require_physical_contiguity = true; + state.flusher = cc->sysemu_ops->mon_flush_page_print_state; + + cc->sysemu_ops->mon_info_pg_print_header(&state); + + /* + * We must visit interior entries to get the hierarchy, but + * can skip not present mappings + */ + for_each_pte(cpu, &compressing_iterator, &state, true, false, true, + mmu_idx); + + /* 
Print last entry, if one present */ + cc->sysemu_ops->mon_flush_page_print_state(cpu, &state); +} + +HumanReadableText *qmp_x_query_pg(Error **errp) +{ + + g_autoptr(GString) buf = g_string_new(""); + + CPUState *cpu; + CPU_FOREACH(cpu) { + bool nested; + + cpu_synchronize_state(cpu); + + if (!cpu_paging_enabled(cpu, 0)) { + continue; + } + + nested = cpu_paging_enabled(cpu, 1); + + CPUClass *cc = cpu->cc; + + if (!cc->sysemu_ops->page_table_root) { + g_string_append_printf(buf, "Info pg unsupported on this ISA\n"); + break; + } + + assert(cc->sysemu_ops->mon_init_page_table_iterator); + assert(cc->sysemu_ops->mon_info_pg_print_header); + assert(cc->sysemu_ops->mon_flush_page_print_state); + + query_page_helper(buf, cpu, 0, nested); + + if (nested) { + query_page_helper(buf, cpu, 1, nested); + } + } + + return human_readable_text_from_str(buf); +} diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h index d946161717..c70d31433d 100644 --- a/include/hw/core/cpu.h +++ b/include/hw/core/cpu.h @@ -605,10 +605,11 @@ extern bool mttcg_enabled; /** * cpu_paging_enabled: * @cpu: The CPU whose state is to be inspected. + * @mmu_idx: 0 == traditional paging, 1 == nested paging * * Returns: %true if paging is enabled, %false otherwise. */ -bool cpu_paging_enabled(const CPUState *cpu); +bool cpu_paging_enabled(const CPUState *cpu, int mmu_idx); /** * cpu_get_memory_mapping: @@ -671,9 +672,82 @@ int cpu_write_elf32_qemunote(WriteCoreDumpFunction f, CPUState *cpu, * Caller is responsible for freeing the data. 
*/ GuestPanicInformation *cpu_get_crash_info(CPUState *cpu); - #endif /* !CONFIG_USER_ONLY */ +/* Maximum supported page table height - currently x86 at 5 */ +#define MAX_HEIGHT 5 + +typedef struct PageTableLayout { + int height; /* Height of the page table */ + int entries_per_node[MAX_HEIGHT + 1]; +} PageTableLayout; + +typedef struct DecodedPTE { + int prot; /* Always populated, arch-specific, decoded flags */ + bool present; + bool leaf; /* Only valid if present */ + bool reserved_bits_ok; + bool user_read_ok; + bool user_write_ok; + bool user_exec_ok; + bool super_read_ok; + bool super_write_ok; + bool super_exec_ok; + bool dirty; + hwaddr child; /* Only valid if present and !leaf */ + uint64_t leaf_page_size; /* Only valid if present and leaf */ + uint64_t nested_page_size; /* + * If nested paging, the page size of the host + * page storing the data, versus the size of the + * guest page frame in leaf_page_size + */ + vaddr bits_translated; /* + * The virtual address bits translated in walking + * the page table to node[i]. + */ + hwaddr pte_addr; /* (guest) physical address of the PTE */ + hwaddr pte_host_addr; /* (host) physical address of the PTE */ + uint64_t pte_contents; /* Raw contents of the pte */ +} DecodedPTE; + +typedef int (*qemu_page_walker_for_each)(CPUState *cs, void *data, + DecodedPTE *pte, + int height, int offset, int mmu_idx, + const PageTableLayout *layout); + +/** + * for_each_pte - iterate over a page table, and + * call fn on each entry + * + * @cs - CPU state + * @fn(cs, data, pte, height, offset, layout) - User-provided function to call + * on each pte. 
+ * * @cs - pass through cs + * * @data - user-provided, opaque pointer + * * @pte - current pte, decoded + * * @height - height in the tree of pte + * * @offset - offset within the page table node + * * @layout - pointer to a PageTableLayout for this tree + * @data - opaque pointer; passed through to fn + * @visit_interior_nodes - if true, call fn() on interior entries in + * page table; if false, visit only leaf entries. + * @visit_not_present - if true, call fn() on entries that are not present. + * if false, visit only present entries. + * @visit_malformed - if true, call fn() on entries that are malformed (e.g., + * bad reserved bits). Even if true, will not follow + * a child pointer to another node. + * @mmu_idx - Which level of the mmu we are interested in: + * 0 == user mode, 1 == nested page table + * Note that MMU_*_IDX macros are not consistent across + * architectures. + * + * Returns true on success, false on error. + * + * We assume all callers of this function are in debug mode, and do not + * want to synthesize, say, a user-mode load, on each page in the address + * space. + */ +bool for_each_pte(CPUState *cs, qemu_page_walker_for_each fn, void *data, + bool visit_interior_nodes, bool visit_not_present, + bool visit_malformed, int mmu_idx); + /** * CPUDumpFlags: * @CPU_DUMP_CODE: diff --git a/include/hw/core/sysemu-cpu-ops.h b/include/hw/core/sysemu-cpu-ops.h index 4c94e51267..d0e939def8 100644 --- a/include/hw/core/sysemu-cpu-ops.h +++ b/include/hw/core/sysemu-cpu-ops.h @@ -12,6 +12,43 @@ #include "hw/core/cpu.h" +/* + * struct mem_print_state: Used by qmp in walking page tables. + */ +struct mem_print_state { + GString *buf; + CPUArchState *env; + int vaw, paw; /* VA and PA width in characters */ + int max_height; + int mmu_idx; /* 0 == user mode, 1 == nested page table */ + bool (*flusher)(CPUState *cs, struct mem_print_state *state); + bool flush_interior; /* If false, only call flusher() on leaves */ + bool require_physical_contiguity; + /* + * The height at which we started accumulating ranges, i.e., the + * next height we need to print once we hit the end of a + * contiguous range. 
+ */ + int start_height; + int leaf_height; /* The height at which we found a leaf, or -1 */ + /* + * For compressing contiguous ranges, track the + * start and end of the range + */ + hwaddr vstart[MAX_HEIGHT + 1]; /* Starting virt. addr. of open pte range */ + hwaddr vend[MAX_HEIGHT + 1]; /* Ending virtual address of open pte range */ + hwaddr pstart; /* Starting physical address of open pte range */ + hwaddr pend; /* Ending physical address of open pte range */ + + /* PTE protection flags current root->leaf path */ + uint64_t prot[MAX_HEIGHT + 1]; + + /* Page size (leaf) or address range covered (non-leaf). */ + uint64_t pg_size[MAX_HEIGHT + 1]; + int offset[MAX_HEIGHT + 1]; /* PTE range starting offsets */ + int last_offset[MAX_HEIGHT + 1]; /* PTE range ending offsets */ +}; + typedef enum TranslateFaultStage2 { S2_NONE, S2_GPA, @@ -30,7 +67,7 @@ typedef struct SysemuCPUOps { /** * @get_paging_enabled: Callback for inquiring whether paging is enabled. */ - bool (*get_paging_enabled)(const CPUState *cpu); + bool (*get_paging_enabled)(const CPUState *cpu, int mmu_idx); /** * @get_phys_page_debug: Callback for obtaining a physical address. */ @@ -93,6 +130,105 @@ typedef struct SysemuCPUOps { */ const VMStateDescription *legacy_vmsd; + /** + * page_table_root - Given a CPUState, return the physical address + * of the current page table root, as well as + * setting a pointer to a PageTableLayout. + * + * @cs - CPU state + * @layout - a pointer to a PageTableLayout structure, which stores + * the page table tree geometry. + * @mmu_idx - Which level of the mmu we are interested in: + * 0 == user mode, 1 == nested page table + * Note that MMU_*_IDX macros are not consistent across + * architectures. + * + * Returns a hardware address on success. Should not fail (i.e., + * caller is responsible to ensure that a page table is actually + * present, or that, with nested paging, there is a nested table + * present). + * + * Do not free layout. 
+ */ + hwaddr (*page_table_root)(CPUState *cs, const PageTableLayout **layout, + int mmu_idx); + + /** + * get_pte - Copy and decode the contents of the page table entry at + * node[i] into pt_entry. + * + * @cs - CPU state + * @node - physical address of the current page table node + * @i - index (in page table entries, not bytes) of the page table + * entry, within node + * @height - height of node within the tree (leaves are 1, not 0) + * @pt_entry - Pointer to a DecodedPTE, stores the contents of the page + * table entry + * @vaddr_parent - The virtual address bits already translated in + * walking the page table to node. Optional: only + * used if vaddr_pte is set. + * @debug - If true, do not update softmmu state (if applicable) to reflect + * the page table walk. + * @mmu_idx - Which level of the mmu we are interested in: + * 0 == user mode, 1 == nested page table + * Note that MMU_*_IDX macros are not consistent across + * architectures. + * @user_access - For non-debug accesses, true if a user mode access, false + * if supervisor mode access. Used to determine faults. + * @access_type - For non-debug accesses, what type of access is driving the + * lookup. Used to determine faults. + * @error_code - Optional integer pointer, to store error reason on failure + * @fault_addr - Optional vaddr pointer, to store the faulting address on a + * recursive page walk for the pte. Otherwise, caller is + * expected to determine if this pte access would fault. + * @nested_fault - Optional pointer, to differentiate causes of nested + * faults. Set to true if there is a fault recurring on a + * nested page table. + * + * Returns true on success, false on failure. This should only fail if a + * page table entry cannot be read because the address of node is not a + * valid (guest) physical address. Otherwise, we capture errors like bad + * reserved flags in the DecodedPTE entry and let the caller decide how to + * handle it. 
+ */ + + bool (*get_pte)(CPUState *cs, hwaddr node, int i, int height, + DecodedPTE *pt_entry, vaddr vaddr_parent, bool debug, + int mmu_idx, bool user_access, + const MMUAccessType access_type, int *error_code, + vaddr *fault_addr, TranslateFaultStage2 *nested_fault); + + /** + * @mon_init_page_table_iterator: Callback to configure a page table + * iterator for use by a monitor function. + * Returns true on success, false if not supported (e.g., paging disabled + * or not implemented on this CPU). + * + * @mmu_idx - Which level of the mmu we are interested in: + * 0 == user mode, 1 == nested page table + * Note that MMU_*_IDX macros are not consistent across + * architectures. + */ + bool (*mon_init_page_table_iterator)(CPUState *cpu, GString *buf, + int mmu_idx, + struct mem_print_state *state); + + /** + * @mon_info_pg_print_header: Prints the header line for 'info pg'. + */ + void (*mon_info_pg_print_header)(struct mem_print_state *state); + + /** + * @mon_flush_page_print_state: For 'info pg', prints the last + * entry that was visited by the compressing_iterator, if one is present. 
+ */ + bool (*mon_flush_page_print_state)(CPUState *cs, + struct mem_print_state *state); + } SysemuCPUOps; +int compressing_iterator(CPUState *cs, void *data, DecodedPTE *pte, + int height, int offset, int mmu_idx, + const PageTableLayout *layout); + #endif /* SYSEMU_CPU_OPS_H */ diff --git a/include/monitor/hmp-target.h b/include/monitor/hmp-target.h index b679aaebbf..9af72ea58d 100644 --- a/include/monitor/hmp-target.h +++ b/include/monitor/hmp-target.h @@ -50,6 +50,7 @@ CPUState *mon_get_cpu(Monitor *mon); void hmp_info_mem(Monitor *mon, const QDict *qdict); void hmp_info_tlb(Monitor *mon, const QDict *qdict); void hmp_mce(Monitor *mon, const QDict *qdict); +void hmp_info_pg(Monitor *mon, const QDict *qdict); void hmp_info_local_apic(Monitor *mon, const QDict *qdict); void hmp_info_sev(Monitor *mon, const QDict *qdict); void hmp_info_sgx(Monitor *mon, const QDict *qdict); diff --git a/qapi/machine.json b/qapi/machine.json index f9ea6b3e97..2a259588dc 100644 --- a/qapi/machine.json +++ b/qapi/machine.json @@ -1771,6 +1771,23 @@ 'if': 'CONFIG_TCG', 'features': [ 'unstable' ] } +## +# @x-query-pg: +# +# Query current page tables +# +# Features: +# +# @unstable: This command is meant for debugging. +# +# Returns: Compressed summary of page tables. 
+# +# Since: 9.0 +## +{ 'command': 'x-query-pg', + 'returns': 'HumanReadableText', + 'features': [ 'unstable' ] } + ## # @x-query-ramblock: # diff --git a/system/memory_mapping.c b/system/memory_mapping.c index 6f884c5b90..78499ae607 100644 --- a/system/memory_mapping.c +++ b/system/memory_mapping.c @@ -296,7 +296,7 @@ static CPUState *find_paging_enabled_cpu(void) CPUState *cpu; CPU_FOREACH(cpu) { - if (cpu_paging_enabled(cpu)) { + if (cpu_paging_enabled(cpu, 0)) { return cpu; } } diff --git a/target/i386/arch_memory_mapping.c b/target/i386/arch_memory_mapping.c index d1ff659128..ef29e4b42f 100644 --- a/target/i386/arch_memory_mapping.c +++ b/target/i386/arch_memory_mapping.c @@ -14,6 +14,997 @@ #include "qemu/osdep.h" #include "cpu.h" #include "sysemu/memory_mapping.h" +#include "exec/cpu_ldst.h" +#include "tcg/helper-tcg.h" + +#define PML4_ADDR_MASK 0xffffffffff000ULL /* selects bits 51:12 */ + +const PageTableLayout x86_lma57_layout = { .height = 5, + .entries_per_node = {0, 512, 512, 512, 512, 512} +}; + +const PageTableLayout x86_lma48_layout = { .height = 4, + .entries_per_node = {0, 512, 512, 512, 512, 0} +}; + +const PageTableLayout x86_pae32_layout = { .height = 3, + .entries_per_node = {0, 512, 512, 4, 0, 0} +}; + +const PageTableLayout x86_ia32_layout = { .height = 2, + .entries_per_node = {0, 1024, 1024, 0, 0, 0} +}; + +/** + * x86_page_table_root - Given a CPUState, return the physical address + * of the current page table root, as well as + * setting a pointer to a PageTableLayout. + * + * @cs - CPU state + * @layout - a pointer to a pointer to a PageTableLayout structure, + * into which is written a pointer to the page table tree + * geometry. + * @mmu_idx - Which level of the mmu we are interested in: + * 0 == user mode, 1 == nested page table + * Note that MMU_*_IDX macros are not consistent across + * architectures. + * + * Returns a hardware address on success. 
Should not fail (i.e., + * caller is responsible to ensure that a page table is actually + * present, or that, with nested paging, there is a nested + * table present). + * + * Do not free *layout. + */ +hwaddr +x86_page_table_root(CPUState *cs, const PageTableLayout **layout, + int mmu_idx) +{ + X86CPU *cpu = X86_CPU(cs); + CPUX86State *env = &cpu->env; + /* + * DEP 5/15/24: Some original page table walking code sets the a20 + * mask as a 32 bit integer and checks it on each level of the + * page table walk; some only checks it against the final result. + * For 64 bits, I think we need to sign extend in the common case + * it is not set (and returns -1), or we will lose bits. + */ + hwaddr root = 0; + int pg_mode; + int64_t a20_mask; + + assert(cpu_paging_enabled(cs, mmu_idx)); + a20_mask = x86_get_a20_mask(env); + + switch (mmu_idx) { + case 0: + root = env->cr[3]; + pg_mode = get_pg_mode(env); + + if (pg_mode & PG_MODE_PAE) { +#ifdef TARGET_X86_64 + if (pg_mode & PG_MODE_LMA) { + if (pg_mode & PG_MODE_LA57) { + *layout = &x86_lma57_layout; + } else { + *layout = &x86_lma48_layout; + } + return (root & PML4_ADDR_MASK) & a20_mask; + } else +#endif + { + *layout = &x86_pae32_layout; + return (root & ~0x1f) & a20_mask; + } + } else { + assert(mmu_idx != 1); + *layout = &x86_ia32_layout; + return (root & ~0xfff) & a20_mask; + } + break; + case 1: + assert(env->vm_state_valid); + root = env->nested_pg_root; + switch (env->nested_pg_height) { + case 4: + *layout = &x86_lma48_layout; + break; + case 5: + *layout = &x86_lma57_layout; + break; + default: + g_assert_not_reached(); + } + return (root & PML4_ADDR_MASK) & a20_mask; + default: + g_assert_not_reached(); + } + + g_assert_not_reached(); + return 0; +} + +/* + * Given a CPU state and height, return the number of bits + * to shift right/left in going from virtual to PTE index + * and vice versa, as well as the number of useful bits. 
+ */ +static void _mmu_decode_va_parameters(CPUState *cs, int height, + int *shift, int *width) +{ + X86CPU *cpu = X86_CPU(cs); + CPUX86State *env = &cpu->env; + int _shift = 0; + int _width = 0; + bool pae_enabled = env->cr[4] & CR4_PAE_MASK; + + switch (height) { + case 5: + _shift = 48; + _width = 9; + break; + case 4: + _shift = 39; + _width = 9; + break; + case 3: + _shift = 30; + _width = 9; + break; + case 2: + /* 64 bit page tables shift from 30->21 bits here */ + if (pae_enabled) { + _shift = 21; + _width = 9; + } else { + /* 32 bit page tables shift from 32->22 bits */ + _shift = 22; + _width = 10; + } + break; + case 1: + _shift = 12; + if (pae_enabled) { + _width = 9; + } else { + _width = 10; + } + + break; + default: + g_assert_not_reached(); + } + + if (shift) { + *shift = _shift; + } + + if (width) { + *width = _width; + } +} + +/** + * x86_virtual_to_pte_index - Given a virtual address and height in + * the page table radix tree, return the index that should be + * used to look up the next page table entry (pte) in + * translating an address. + * + * @cs - CPU state + * @vaddr - The virtual address to translate + * @height - height of node within the tree (leaves are 1, not 0). + * + * Example: In 32-bit x86 page tables, the virtual address is split + * into 10 bits at height 2, 10 bits at height 1, and 12 offset bits. + * So a call with VA and height 2 would return the first 10 bits of va, + * right shifted by 22. + */ +int x86_virtual_to_pte_index(CPUState *cs, vaddr vaddr_in, int height) +{ + int shift = 0; + int width = 0; + int mask = 0; + + _mmu_decode_va_parameters(cs, height, &shift, &width); + + mask = (1 << width) - 1; + + return (vaddr_in >> shift) & mask; +} + +/** + * x86_get_pte - Copy and decode the contents of the page table entry at + * node[i] into pt_entry. 
+ * + * @cs - CPU state + * @node - physical address of the current page table node + * @i - index (in page table entries, not bytes) of the page table + * entry, within node + * @height - height of node within the tree (leaves are 1, not 0) + * @pt_entry - Pointer to a DecodedPTE, stores the contents of the page table + * entry + * @vaddr_parent - The virtual address bits already translated in walking the + * page table to node. Optional: only used if vaddr_pte is set. + * @debug - If true, do not update softmmu state (if applicable) to reflect + * the page table walk. + * @mmu_idx - Which level of the mmu we are interested in: 0 == user + * mode, 1 == nested page table. Note that MMU_*_IDX macros + * are not consistent across architectures. + * @user_access - For non-debug accesses, is this a user or supervisor-mode + * access. Used to determine faults. + * @access_type - For non-debug accesses, what type of access is driving the + * lookup. Used to determine faults. + * @error_code - Optional integer pointer, to store error reason on failure + * @fault_addr - Optional vaddr pointer, to store the faulting address on a + * recursive page walk for the pte. Otherwise, caller is expected + * to determine if this pte access would fault. + * @nested_fault - Optional pointer, to differentiate causes of nested faults. + * Set to true if there is a fault recurring on a nested page + * table. + * + * Returns true on success, false on failure. This should only fail if a page + * table entry cannot be read because the address of node is not a valid (guest) + * physical address. Otherwise, we capture errors like bad reserved flags in + * the DecodedPTE entry and let the caller decide how to handle it. 
+ */ +bool +x86_get_pte(CPUState *cs, hwaddr node, int i, int height, DecodedPTE *pt_entry, + vaddr vaddr_parent, bool debug, int mmu_idx, bool user_access, + const MMUAccessType access_type, int *error_code, + vaddr *fault_addr, TranslateFaultStage2 *nested_fault) +{ + CPUX86State *env = cpu_env(cs); + int32_t a20_mask = x86_get_a20_mask(env); + hwaddr pte = 0; + uint64_t pte_contents = 0; + hwaddr pte_host_addr = 0; + uint64_t unused = 0; /* We always call probe_access in non-fault mode */ + bool use_stage2 = env->hflags & HF_GUEST_MASK; + int pte_width = 4; + uint64_t leaf_mask = 0; + int pg_mode = get_pg_mode(env); + bool pae_enabled = !!(pg_mode & PG_MODE_PAE); + bool long_mode = !!(pg_mode & PG_MODE_LMA); +#ifdef CONFIG_TCG + void *pte_internal_pointer = NULL; +#endif + + pt_entry->reserved_bits_ok = false; + + if (env->hflags & HF_LMA_MASK) { + /* 64 bit */ + pte_width = 8; + } + + pte = (node + (i * pte_width)) & a20_mask; + + if (debug) { + + /* Recur on nested paging */ + if (mmu_idx == 0 && use_stage2) { + + bool ok = x86_ptw_translate(cs, pte, &pte_host_addr, debug, 1, + user_access, access_type, NULL, + error_code, fault_addr, NULL, NULL, + NULL); + if (!ok) { + if (nested_fault) { + *nested_fault = S2_GPT; + } + return false; + } + } else { + pte_host_addr = pte; + } + } else { +#ifdef CONFIG_TCG + CPUTLBEntryFull *full; + int flags = probe_access_full(env, pte, 0, MMU_DATA_STORE, + MMU_NESTED_IDX, true, + &pte_internal_pointer, &full, + unused); + + if (unlikely(flags & TLB_INVALID_MASK)) { + if (nested_fault) { + *nested_fault = S2_GPT; + } + if (error_code) { + *error_code = env->error_code; + } + if (fault_addr) { + *fault_addr = pte; + } + return false; + } + + pte_host_addr = full->phys_addr; + /* probe_access_full() drops the offset bits; we need to re-add them */ + pte_host_addr += i * pte_width; + /* + * But don't re-add to pte_internal_pointer, which overlaps with + * pte_host_addr... 
+ */ +#else + /* Any non-TCG use case should be read-only */ + g_assert_not_reached(); +#endif + } +#ifdef CONFIG_TCG + /* + * TCG needs to set the accessed bit on the PTE; it does this in a + * compare-and-swap loop. + */ + reread_pte: +#endif + + /* Read the PTE contents */ + if (likely(pte_host_addr)) { + if (long_mode) { + pte_contents = address_space_ldq(cs->as, pte_host_addr, + MEMTXATTRS_UNSPECIFIED, NULL); + } else { + pte_contents = address_space_ldl(cs->as, pte_host_addr, + MEMTXATTRS_UNSPECIFIED, NULL); + } + } else { + pte_contents = long_mode ? + cpu_ldq_mmuidx_ra(env, pte, MMU_PHYS_IDX, unused) : + cpu_ldl_mmuidx_ra(env, pte, MMU_PHYS_IDX, unused); + } + + /* Deserialize flag bits, different by mmu index */ + if (mmu_idx == 0 || + (mmu_idx == 1 && env->vm_state_valid && env->nested_pg_format == 1)) + { + pt_entry->present = pte_contents & PG_PRESENT_MASK; + + if (pt_entry->present) { + bool nx_enabled = !!(pg_mode & PG_MODE_NXE); + bool smep_enabled = !!(pg_mode & PG_MODE_SMEP); + + pt_entry->super_read_ok = true; + if (pg_mode & PG_MODE_WP) { + pt_entry->super_write_ok = !!(pte_contents & PG_RW_MASK); + } else { + pt_entry->super_write_ok = true; + } + + if (nx_enabled) { + if (smep_enabled) { + pt_entry->super_exec_ok = !(pte_contents & PG_USER_MASK); + } else { + pt_entry->super_exec_ok = !(pte_contents & PG_NX_MASK); + } + pt_entry->user_exec_ok = !(pte_contents & PG_NX_MASK); + } else { + pt_entry->super_exec_ok = true; + pt_entry->user_exec_ok = !!(pte_contents & PG_USER_MASK); + } + + if (pte_contents & PG_USER_MASK) { + pt_entry->user_read_ok = true; + pt_entry->user_write_ok = !!(pte_contents & PG_RW_MASK); + } + + pt_entry->dirty = !!(pte_contents & PG_DIRTY_MASK); + } + + pt_entry->prot = pte_contents & (PG_USER_MASK | PG_RW_MASK | + PG_PRESENT_MASK); + + + + /* In 32-bit mode without PAE, we need to check the PSE flag in cr4 */ + if (long_mode || pae_enabled || pg_mode & PG_MODE_PSE) { + leaf_mask = PG_PSE_MASK; + } + + } else if (mmu_idx 
== 1) { + uint64_t mask = PG_EPT_PRESENT_MASK; + /* + * One could arguably check whether the CPU is in supervisor mode + * here. At least for debugging functions, one probably only wants + * an entry treated as not-present if it is not present in all modes, + * not just the current guest ring. OTOH, TCG may want this semantic. + */ + if (env->enable_mode_based_access_control) { + mask |= PG_EPT_X_USER_MASK; + } + pt_entry->present = !!(pte_contents & mask); + if (pt_entry->present) { + pt_entry->super_read_ok = pt_entry->user_read_ok + = !!(pte_contents & PG_EPT_R_MASK); + + pt_entry->super_exec_ok = !!(pte_contents & PG_EPT_X_SUPER_MASK); + if (env->enable_mode_based_access_control) { + pt_entry->user_exec_ok = !!(pte_contents & PG_EPT_X_USER_MASK); + } else { + pt_entry->user_exec_ok = pt_entry->super_exec_ok; + } + + pt_entry->dirty = !!(pte_contents & PG_DIRTY_MASK); + } + pt_entry->prot = pte_contents & (PG_EPT_PRESENT_MASK | + PG_EPT_X_USER_MASK); + leaf_mask = PG_EPT_PSE_MASK; + } else { + g_assert_not_reached(); + } + + if (pt_entry->present) { + pt_entry->leaf = (height == 1 || + pte_contents & leaf_mask); + + /* Sanity checks */ + if (pt_entry->leaf) { + switch (height) { +#ifdef TARGET_X86_64 + case 5: + /* No leaves at level 5 in EPT */ + assert(mmu_idx == 0); + assert(pae_enabled); + assert(env->cr[4] & CR4_LA57_MASK); + assert(env->hflags & HF_LMA_MASK); + break; + case 4: + /* No leaves at level 4 in EPT */ + assert(mmu_idx == 0); + assert(pae_enabled); + assert(env->hflags & HF_LMA_MASK); + break; +#endif + case 3: + if (mmu_idx == 0) { + assert(pae_enabled); + } + break; + } + } + + switch (height) { +#ifdef TARGET_X86_64 + case 5: + /* assert(pae_enabled); */ + /* Fall through */ + case 4: + /* assert(pae_enabled); */ + /* Fall through */ +#endif + case 3: + assert(pae_enabled); +#ifdef TARGET_X86_64 + if (env->hflags & HF_LMA_MASK) { + if (pt_entry->leaf) { + /* Select bits 30--51 */ + pt_entry->child = (pte_contents & 0xfffffc0000000); + } else 
{ + pt_entry->child = (pte_contents & PG_ADDRESS_MASK) + & a20_mask; + } + } else +#endif + { + pt_entry->child = (pte_contents & ~0xfff) & a20_mask; + } + break; + case 2: + if (pt_entry->leaf) { + if (pae_enabled) { + /* Select bits 21--51 */ + pt_entry->child = (pte_contents & 0xfffffffe00000); + } else { + /* + * 4 MB page: + * bits 39:32 are bits 20:13 of the PDE + * bits 31:22 are bits 31:22 of the PDE + */ + hwaddr high_paddr = (hwaddr)(pte_contents & 0x1fe000) << 19; + pt_entry->child = (pte_contents & ~0x3fffff) | high_paddr; + } + break; + } + /* else fall through */ + case 1: + if (pae_enabled || mmu_idx == 1) { + pt_entry->child = (pte_contents & PG_ADDRESS_MASK) + & a20_mask; + } else { + pt_entry->child = (pte_contents & ~0xfff) & a20_mask; + } + break; + default: + g_assert_not_reached(); + } + + /* Check reserved bits */ + uint64_t rsvd_mask = ~MAKE_64BIT_MASK(0, env_archcpu(env)->phys_bits); + rsvd_mask &= PG_ADDRESS_MASK; + + if (mmu_idx == 0 + || (mmu_idx == 1 && env->vm_state_valid && + env->nested_pg_format == 1)) { + + if (!(env->efer & MSR_EFER_NXE) + || !long_mode) { + rsvd_mask |= PG_NX_MASK; + } + if (height > 3) { + rsvd_mask |= PG_PSE_MASK; + } + if (!long_mode) { + if (pae_enabled) { + rsvd_mask |= PG_HI_USER_MASK; + } else if (!pae_enabled && height == 2 && pt_entry->leaf) { + rsvd_mask = 0x200000; + } else { + rsvd_mask = 0; + } + } + + /* If PAT is not supported, the PAT bit is reserved */ + if (!(env->features[FEAT_1_EDX] & CPUID_PAT)) { + rsvd_mask |= PG_PSE_PAT_MASK; + } + + } else if (mmu_idx == 1) { + assert(env->nested_pg_format == 0); + /* All EPT formats reserve bits 51..max phys address. 
*/ + rsvd_mask &= 0xffffffffff000; + + if (pt_entry->leaf) { + /* Leaves reserve irrelevant low-bits of the phys addr */ + if (height == 3) { + rsvd_mask |= 0x3ffff000; + } else if (height == 2) { + rsvd_mask |= 0x1ff000; + } + } else { + /* non-leaves should have bits 7:3 clear */ + rsvd_mask |= 0xf8; + } + } else { + g_assert_not_reached(); + } + + if (pte_contents & rsvd_mask) { + pt_entry->reserved_bits_ok = false; + } else { + pt_entry->reserved_bits_ok = true; + } + + /* In non-read-only case, set accessed bits */ + if (!debug) { +#ifdef CONFIG_TCG + TranslateFault err; + PTETranslate pte_trans = { + .gaddr = pte_host_addr, + .haddr = pte_internal_pointer, + .env = env, + .err = &err, + .ptw_idx = MMU_PHYS_IDX, /* We already recurred */ + }; + + /* If this is a leaf and a store, set the dirty bit too */ + if (mmu_idx == 0 || (mmu_idx == 1 && env->nested_pg_format == 1)) { + uint32_t set = PG_ACCESSED_MASK; + if (pt_entry->leaf && access_type == MMU_DATA_STORE) { + set |= PG_DIRTY_MASK; + } + if (!ptw_setl(&pte_trans, pte_contents, set)) { + goto reread_pte; + } + } else if (mmu_idx == 1) { + assert(env->nested_pg_format == 0); + if (env->enable_ept_accessed_dirty) { + uint32_t set = PG_EPT_ACCESSED_MASK; + if (pt_entry->leaf && access_type == MMU_DATA_STORE) { + set |= PG_EPT_DIRTY_MASK; + } + if (!ptw_setl(&pte_trans, pte_contents, set)) { + goto reread_pte; + } + } + } else { + g_assert_not_reached(); + } +#else + g_assert_not_reached(); +#endif + } + } + + /* + * We always report the relevant leaf page size so that + * consumers know the virtual addresses range translated by this entry. 
+ */ + + /* Decode the child node's hw address */ + switch (height) { +#ifdef TARGET_X86_64 + case 5: + assert(env->cr[4] & CR4_LA57_MASK); + pt_entry->leaf_page_size = 1ULL << 48; + break; + case 4: + assert(env->hflags & HF_LMA_MASK); + pt_entry->leaf_page_size = 1ULL << 39; + break; +#endif + case 3: + pt_entry->leaf_page_size = 1 << 30; + break; + case 2: + if (pae_enabled || mmu_idx == 1) { + pt_entry->leaf_page_size = 1 << 21; + } else { + pt_entry->leaf_page_size = 1 << 22; + } + break; + case 1: + pt_entry->leaf_page_size = 4096; + break; + default: + g_assert_not_reached(); + } + + int shift = 0; + _mmu_decode_va_parameters(cs, height, &shift, NULL); + pt_entry->bits_translated = vaddr_parent | ((i & 0x1ffULL) << shift); + pt_entry->pte_addr = pte; + pt_entry->pte_host_addr = (hwaddr) pte_host_addr; + pt_entry->pte_contents = pte_contents; + + return true; +} + +bool x86_ptw_translate(CPUState *cs, vaddr vaddress, hwaddr *hpa, + bool debug, int mmu_idx, bool user_access, + const MMUAccessType access_type, uint64_t *page_size, + int *error_code, hwaddr *fault_addr, + TranslateFaultStage2 *nested_fault, int *prot, + bool *dirty) +{ + CPUX86State *env = cpu_env(cs); + const PageTableLayout *layout; + hwaddr pt_node = x86_page_table_root(cs, &layout, mmu_idx); + DecodedPTE pt_entry; + hwaddr offset = 0; + hwaddr real_hpa = 0; + uint64_t real_page_size; + + vaddr bits_translated = 0; + int pg_mode = get_pg_mode(env); + bool use_stage2 = env->hflags & HF_GUEST_MASK; + + /* + * As we iterate on the page table, accumulate allowed operations, for + * a possible TLB refill (e.g., TCG). Note that we follow the TCG softmmu + * code in applying protection keys here; my reading is that one needs to + * flush the TLB on any operation that changes a relevant key, which is + * beyond this code's purview... 
+ */ + bool user_read_ok = true, user_write_ok = true, user_exec_ok = true; + bool super_read_ok = true, super_write_ok = true, super_exec_ok = true; + + /* Initialize the error code to 0 */ + if (error_code) { + *error_code = 0; + } + + /* Ensure nested_fault is initialized properly */ + if (nested_fault) { + *nested_fault = S2_NONE; + } + + int i = layout->height; + do { + int index = x86_virtual_to_pte_index(cs, vaddress, i); + + memset(&pt_entry, 0, sizeof(pt_entry)); + + if (!x86_get_pte(cs, pt_node, index, i, &pt_entry, bits_translated, + debug, mmu_idx, user_access, access_type, error_code, + fault_addr, nested_fault)) { + return false; + } + + if (!pt_entry.present) { + if (error_code) { + /* Set the P bit to zero */ + *error_code &= ~PG_ERROR_P_MASK; + if (user_access) { + *error_code |= PG_ERROR_U_MASK; + } + if (access_type == MMU_DATA_STORE) { + *error_code |= PG_ERROR_W_MASK; + } else if (access_type == MMU_INST_FETCH) { + if (pg_mode & PG_MODE_SMEP + || (pg_mode & PG_MODE_NXE + && pg_mode & PG_MODE_PAE)) { + *error_code |= PG_ERROR_I_D_MASK; + } + } + } + goto fault_out; + } + + /* Always check reserved bits */ + if (!pt_entry.reserved_bits_ok) { + if (error_code) { + *error_code |= PG_ERROR_RSVD_MASK; + } + goto fault_out; + } + + /* Always accumulate the permissions on the page table walk. 
*/ + user_read_ok &= pt_entry.user_read_ok; + user_write_ok &= pt_entry.user_write_ok; + user_exec_ok &= pt_entry.user_exec_ok; + super_read_ok &= pt_entry.super_read_ok; + super_write_ok &= pt_entry.super_write_ok; + super_exec_ok &= pt_entry.super_exec_ok; + + /* If we are not in debug mode, check permissions before recurring */ + if (!debug) { + if (user_access) { + switch (access_type) { + case MMU_DATA_LOAD: + if (!pt_entry.user_read_ok) { + if (error_code) { + *error_code |= PG_ERROR_U_MASK; + /* We can only set the P bit on a leaf */ + if (pt_entry.leaf) { + *error_code |= PG_ERROR_P_MASK; + } + } + goto fault_out; + } + break; + case MMU_DATA_STORE: + if (!pt_entry.user_write_ok) { + if (error_code) { + *error_code |= PG_ERROR_W_MASK | PG_ERROR_U_MASK; + /* We can only set the P bit on a leaf */ + if (pt_entry.leaf) { + *error_code |= PG_ERROR_P_MASK; + } + } + goto fault_out; + } + break; + case MMU_INST_FETCH: + if (!pt_entry.user_exec_ok) { + if (error_code) { + *error_code |= PG_ERROR_U_MASK; + if (pg_mode & PG_MODE_SMEP + || (pg_mode & PG_MODE_NXE + && pg_mode & PG_MODE_PAE)) { + *error_code |= PG_ERROR_I_D_MASK; + /* We can only set the P bit on a leaf */ + if (pt_entry.leaf) { + *error_code |= PG_ERROR_P_MASK; + } + } + } + goto fault_out; + } + break; + default: + g_assert_not_reached(); + } + } else { + switch (access_type) { + case MMU_DATA_LOAD: + if (!pt_entry.super_read_ok) { + if (error_code && pt_entry.leaf) { + /* Not a distinct super+r mask */ + *error_code |= PG_ERROR_P_MASK; + } + goto fault_out; + } + break; + case MMU_DATA_STORE: + if (!pt_entry.super_write_ok) { + if (error_code) { + *error_code |= PG_ERROR_P_MASK | PG_ERROR_W_MASK; + /* We can only set the P bit on a leaf */ + if (pt_entry.leaf) { + *error_code |= PG_ERROR_P_MASK; + } + + } + goto fault_out; + } + break; + case MMU_INST_FETCH: + if (!pt_entry.super_exec_ok) { + if (error_code) { + /* We can only set the P bit on a leaf */ + if (pt_entry.leaf) { + *error_code |= 
PG_ERROR_P_MASK; + } + if (pg_mode & PG_MODE_SMEP + || (pg_mode & PG_MODE_NXE + && pg_mode & PG_MODE_PAE)) { + *error_code |= PG_ERROR_I_D_MASK; + } + + } + goto fault_out; + } + break; + default: + g_assert_not_reached(); + } + } + } + + /* Check if we have hit a leaf. Won't happen (yet) at heights > 3. */ + if (pt_entry.leaf) { + assert(i < 4); + break; + } + + /* Move to the child node */ + assert(i > 1); + pt_node = pt_entry.child; + bits_translated |= pt_entry.bits_translated; + i--; + } while (i > 0); + + assert(pt_entry.leaf); + + /* Some x86 protection checks are leaf-specific */ + + /* Apply MPK at end, only on non-nested page tables */ + if (mmu_idx == 0) { + /* MPK */ + uint32_t pkr; + + /* Is this a user-mode mapping? */ + if (user_read_ok) { + pkr = pg_mode & PG_MODE_PKE ? env->pkru : 0; + } else { + pkr = pg_mode & PG_MODE_PKS ? env->pkrs : 0; + } + + if (pkr) { + uint32_t pk = (pt_entry.pte_contents & PG_PKRU_MASK) + >> PG_PKRU_BIT; + /* + * Follow the TCG pattern here of applying these bits + * to the protection, which may be fed to the TLB. + * My reading is that it is not safe to cache this across + * changes to these registers... 
+ */ + uint32_t pkr_ad = (pkr >> pk * 2) & 1; + uint32_t pkr_wd = (pkr >> pk * 2) & 2; + + if (pkr_ad) { + super_read_ok = false; + user_read_ok = false; + super_write_ok = false; + user_write_ok = false; + + if (!debug) { + if (access_type == MMU_DATA_LOAD + || access_type == MMU_DATA_STORE) { + if (error_code) { + *error_code |= PG_ERROR_PK_MASK | PG_ERROR_P_MASK; + if (user_access) { + *error_code |= PG_ERROR_U_MASK; + } + } + goto fault_out; + + } + } + } + + if (pkr_wd) { + user_write_ok = false; + if (pg_mode & PG_MODE_WP) { + super_write_ok = false; + } + if (!debug) { + if (access_type == MMU_DATA_STORE + && (user_access || pg_mode & PG_MODE_WP)) { + if (error_code) { + *error_code |= PG_ERROR_PK_MASK | PG_ERROR_P_MASK; + if (user_access) { + *error_code |= PG_ERROR_U_MASK; + } + } + goto fault_out; + } + } + } + } + } + + real_page_size = pt_entry.leaf_page_size; + /* Add offset bits back to hpa */ + offset = vaddress & (pt_entry.leaf_page_size - 1); + real_hpa = pt_entry.child | offset; + + /* + * In the event of nested paging, we need to recur one last time on the + * child address to resolve the host address. Also, if the nested page + * size is larger use that for a TLB consumer. Recursion with the offset + * bits added in should do the right thing if the nested page sizes differ. 
+ */ + + if (mmu_idx == 0 && use_stage2) { + vaddr gpa = pt_entry.child | offset; + uint64_t nested_page_size = 0; + + if (error_code) { + assert(*error_code == 0); + } + + if (!x86_ptw_translate(cs, gpa, &real_hpa, + debug, 1, user_access, access_type, + &nested_page_size, error_code, fault_addr, + nested_fault, prot, NULL)) { + if (nested_fault) { + *nested_fault = S2_GPA; + } + return false; + } + + if (real_page_size < nested_page_size) { + real_page_size = nested_page_size; + } + } + + if (hpa) { + *hpa = real_hpa; + } + + if (page_size) { + *page_size = real_page_size; + } + + if (prot) { + *prot = 0; + if (user_access) { + if (user_read_ok) { + *prot |= PAGE_READ; + } + if (user_write_ok) { + *prot |= PAGE_WRITE; + } + if (user_exec_ok) { + *prot |= PAGE_EXEC; + } + } else { + if (super_read_ok) { + *prot |= PAGE_READ; + } + if (super_write_ok) { + *prot |= PAGE_WRITE; + } + if (super_exec_ok) { + *prot |= PAGE_EXEC; + } + } + } + + if (dirty) { + *dirty = pt_entry.dirty; + } + + return true; + + fault_out: + if (fault_addr) { + *fault_addr = vaddress; + } + return false; + +} /* PAE Paging or IA-32e Paging */ static void walk_pte(MemoryMappingList *list, AddressSpace *as, @@ -273,7 +1264,7 @@ bool x86_cpu_get_memory_mapping(CPUState *cs, MemoryMappingList *list, CPUX86State *env = &cpu->env; int32_t a20_mask; - if (!cpu_paging_enabled(cs)) { + if (!cpu_paging_enabled(cs, 0)) { /* paging is disabled */ return true; } @@ -313,4 +1304,3 @@ bool x86_cpu_get_memory_mapping(CPUState *cs, MemoryMappingList *list, return true; } - diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 4688d140c2..ec419e0ef0 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -8050,11 +8050,25 @@ static int64_t x86_cpu_get_arch_id(CPUState *cs) } #if !defined(CONFIG_USER_ONLY) -static bool x86_cpu_get_paging_enabled(const CPUState *cs) +static bool x86_cpu_get_paging_enabled(const CPUState *cs, int mmu_idx) { X86CPU *cpu = X86_CPU(cs); - return cpu->env.cr[0] & CR0_PG_MASK; + 
if (mmu_idx == 0) { + return cpu->env.cr[0] & CR0_PG_MASK; + } else if (mmu_idx == 1) { + if (cpu->env.hflags & HF_GUEST_MASK) { + if (!cpu->env.vm_state_valid) { + warn_report_once("Attempt to query virtualization state on an " + "unsupported accelerator. This operation will " + "not work properly on this configuration."); + return false; + } + + return cpu->env.nested_paging; + } + } + return false; } #endif /* !CONFIG_USER_ONLY */ @@ -8369,6 +8383,11 @@ static const struct SysemuCPUOps i386_sysemu_ops = { .write_elf32_qemunote = x86_cpu_write_elf32_qemunote, .write_elf64_qemunote = x86_cpu_write_elf64_qemunote, .legacy_vmsd = &vmstate_x86_cpu, + .page_table_root = &x86_page_table_root, + .get_pte = &x86_get_pte, + .mon_init_page_table_iterator = &x86_mon_init_page_table_iterator, + .mon_info_pg_print_header = &x86_mon_info_pg_print_header, + .mon_flush_page_print_state = &x86_mon_flush_print_pg_state, }; #endif diff --git a/target/i386/cpu.h b/target/i386/cpu.h index d899644cb8..4e5877f41d 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -303,6 +303,7 @@ typedef enum X86Seg { #define DR_RESERVED_MASK 0xffffffff00000000ULL +/* Regular x86 Page Bits */ #define PG_PRESENT_BIT 0 #define PG_RW_BIT 1 #define PG_USER_BIT 2 @@ -340,6 +341,28 @@ typedef enum X86Seg { #define PG_ERROR_I_D_MASK 0x10 #define PG_ERROR_PK_MASK 0x20 +/* EPT Bits */ +#define PG_EPT_R_BIT 0 +#define PG_EPT_W_BIT 1 +#define PG_EPT_X_SUPER_BIT 2 +#define PG_EPT_PSE_BIT 7 +#define PG_EPT_ACCESSED_BIT 8 +#define PG_EPT_DIRTY_BIT 9 /* Only set on leaves */ +#define PG_EPT_X_USER_BIT 10 + +#define PG_EPT_R_MASK (1 << PG_EPT_R_BIT) +#define PG_EPT_W_MASK (1 << PG_EPT_W_BIT) +#define PG_EPT_X_SUPER_MASK (1 << PG_EPT_X_SUPER_BIT) +#define PG_EPT_PSE_MASK (1 << PG_EPT_PSE_BIT) +#define PG_EPT_ACCESSED_MASK (1 << PG_EPT_ACCESSED_BIT) +#define PG_EPT_DIRTY_MASK (1 << PG_EPT_DIRTY_BIT) +#define PG_EPT_X_USER_MASK (1 << PG_EPT_X_USER_BIT) + +/* EPT_X_USER_BIT only checked if vm mode based controls 
enabled */ +#define PG_EPT_PRESENT_MASK (PG_EPT_R_MASK | PG_EPT_W_MASK \ + | PG_EPT_X_SUPER_MASK) + + #define PG_MODE_PAE (1 << 0) #define PG_MODE_LMA (1 << 1) #define PG_MODE_NXE (1 << 2) @@ -1167,6 +1190,7 @@ uint64_t x86_cpu_get_supported_feature_word(X86CPU *cpu, FeatureWord w); #define VMX_SECONDARY_EXEC_RDSEED_EXITING 0x00010000 #define VMX_SECONDARY_EXEC_ENABLE_PML 0x00020000 #define VMX_SECONDARY_EXEC_XSAVES 0x00100000 +#define VMX_SECONDARY_EXEC_ENABLE_MODE_BASED_EXC 0x00400000 #define VMX_SECONDARY_EXEC_TSC_SCALING 0x02000000 #define VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE 0x04000000 @@ -1862,6 +1886,19 @@ typedef struct CPUArchState { }; /* break/watchpoints for dr[0..3] */ int old_exception; /* exception in flight */ + /* Genericized architectural state for virtualization. Work in progress */ + bool nested_paging; /* Nested or extended hardware paging enabled */ + bool vm_state_valid; /* Not all accelerators sync nested_cr3 */ + bool enable_ept_accessed_dirty; + bool enable_mode_based_access_control; + uint8_t nested_pg_height; + uint8_t nested_pg_format; /* 0 == Intel EPT, 1 == AMD NPT */ + uint64_t nested_pg_root; + /* End generic architectural state for virtualization */ + + /**** accelerator specific virtualization state *****/ + + /* TCG */ uint64_t vm_vmcb; uint64_t tsc_offset; uint64_t intercept; @@ -2212,8 +2249,28 @@ int x86_cpu_write_elf64_qemunote(WriteCoreDumpFunction f, CPUState *cpu, int x86_cpu_write_elf32_qemunote(WriteCoreDumpFunction f, CPUState *cpu, DumpState *s); +int get_pg_mode(CPUX86State *env); +hwaddr x86_page_table_root(CPUState *cs, const PageTableLayout **layout, + int mmu_idx); +bool x86_get_pte(CPUState *cs, hwaddr node, int i, int height, + DecodedPTE *pt_entry, vaddr vaddr_parent, bool debug, + int mmu_idx, bool user_access, const MMUAccessType access_type, + int *error_code, hwaddr *fault_addr, + TranslateFaultStage2 *nested_fault); +int x86_virtual_to_pte_index(CPUState *cs, vaddr vaddr_in, int height); bool 
x86_cpu_get_memory_mapping(CPUState *cpu, MemoryMappingList *list, Error **errp); +bool x86_mon_init_page_table_iterator(CPUState *cpu, GString *buf, int mmu_idx, + struct mem_print_state *state); +void x86_mon_info_pg_print_header(struct mem_print_state *state); +bool x86_mon_flush_print_pg_state(CPUState *cs, struct mem_print_state *state); +bool x86_ptw_translate(CPUState *cs, vaddr vaddress, hwaddr *hpa, + bool debug, int mmu_idx, bool user_access, + const MMUAccessType access_type, uint64_t *page_size, + int *error_code, hwaddr *fault_addr, + TranslateFaultStage2 *nested_fault, int *prot, bool * + dirty); + void x86_cpu_dump_state(CPUState *cs, FILE *f, int flags); @@ -2363,7 +2420,6 @@ void host_cpuid(uint32_t function, uint32_t count, bool cpu_has_x2apic_feature(CPUX86State *env); /* helper.c */ -int get_pg_mode(CPUX86State *env); void x86_cpu_set_a20(X86CPU *cpu, int a20_state); void cpu_sync_avx_hflag(CPUX86State *env); diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index becca2efa5..a81e1eac87 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -65,6 +65,8 @@ #include "exec/memattrs.h" #include "trace.h" +#include "vmcs12.h" + #include CONFIG_DEVICES //#define DEBUG_KVM @@ -4676,6 +4678,72 @@ static int kvm_get_nested_state(X86CPU *cpu) env->hflags &= ~HF_GUEST_MASK; } + + if (env->hflags & HF_GUEST_MASK) { + + /* Extract the EPTP value from vmcs12 structure, store in arch state */ + if (env->nested_state->format == KVM_STATE_NESTED_FORMAT_VMX) { + struct vmcs12 *vmcs = + (struct vmcs12 *) env->nested_state->data.vmx->vmcs12; + + assert(vmcs->hdr.revision_id == VMCS12_REVISION); + + /* See if EPT is enabled */ + if (vmcs->secondary_vm_exec_control & + VMX_SECONDARY_EXEC_ENABLE_EPT) { + env->nested_paging = true; + env->nested_pg_format = 0; + + /* Decode the ept pointer following SDM 24.6.11 */ + /* The height of the tree is encoded in bits 5:3, height -1 */ + uint8_t height = (uint8_t) (vmcs->ept_pointer >> 3); + height &= 7; + 
height++; + env->nested_pg_height = height; + /* The accessed/dirty flag is in bit 6 of the EPTP*/ + env->enable_ept_accessed_dirty = + !!(vmcs->ept_pointer & (1 << 6)); + + /* Mask out low 12 bits, bits beyond physical addr width */ + uint64_t phys_mask = MAKE_64BIT_MASK(0, cpu->phys_bits); + phys_mask &= ~0xfff; + env->nested_pg_root = vmcs->ept_pointer & phys_mask; + + if (vmcs->secondary_vm_exec_control & + VMX_SECONDARY_EXEC_ENABLE_MODE_BASED_EXC) { + env->enable_mode_based_access_control = true; + } else { + env->enable_mode_based_access_control = false; + } + } else { + env->nested_paging = false; + env->enable_mode_based_access_control = false; + } + env->vm_state_valid = true; + } else if (env->nested_state->format == KVM_STATE_NESTED_FORMAT_SVM) { + struct vmcb *vmcb = (struct vmcb *) env->nested_state->data.svm; + + /* See if nested paging is enabled */ + if (vmcb->control.nested_ctl & SVM_NPT_ENABLED) { + env->nested_paging = true; + env->nested_pg_format = 1; + env->nested_pg_root = vmcb->control.nested_cr3; + + env->nested_pg_height = env->cr[4] & CR4_LA57_MASK ? 
+ 5 : 4; + + env->enable_ept_accessed_dirty = false; + env->enable_mode_based_access_control = false; + + } else { + env->nested_paging = false; + env->enable_mode_based_access_control = false; + } + env->vm_state_valid = true; + } + } + + /* Keep HF2_GIF_MASK set on !SVM as x86_cpu_pending_interrupt() needs it */ if (cpu_has_svm(env)) { if (env->nested_state->flags & KVM_STATE_NESTED_GIF_SET) { diff --git a/target/i386/monitor.c b/target/i386/monitor.c index 2d766b2637..8ef92e7c42 100644 --- a/target/i386/monitor.c +++ b/target/i386/monitor.c @@ -32,6 +32,181 @@ #include "qapi/qapi-commands-misc-target.h" #include "qapi/qapi-commands-misc.h" +/********************* x86 specific hooks for printing page table stuff ****/ + +const char *names[7] = {(char *)NULL, "PTE", "PDE", "PDP", "PML4", "PML5", + (char *)NULL}; +static char *pg_bits(CPUState *cs, hwaddr ent, int mmu_idx) +{ + static char buf[32]; + CPUX86State *env = cpu_env(cs); + + if (mmu_idx == 0 + || (mmu_idx == 1 + && env->vm_state_valid && env->nested_pg_format == 1)) { + snprintf(buf, 32, "%c%c%c%c%c%c%c%c%c%c", + ent & PG_NX_MASK ? 'X' : '-', + ent & PG_GLOBAL_MASK ? 'G' : '-', + ent & PG_PSE_MASK ? 'S' : '-', + ent & PG_DIRTY_MASK ? 'D' : '-', + ent & PG_ACCESSED_MASK ? 'A' : '-', + ent & PG_PCD_MASK ? 'C' : '-', + ent & PG_PWT_MASK ? 'T' : '-', + ent & PG_USER_MASK ? 'U' : '-', + ent & PG_RW_MASK ? 'W' : '-', + ent & PG_PRESENT_MASK ? 'P' : '-'); + } else if (mmu_idx == 1) { + bool accessed = false; + bool dirty = false; + X86CPU *cpu = X86_CPU(cs); + + if (cpu->env.enable_ept_accessed_dirty) { + accessed = !!(ent & PG_EPT_ACCESSED_MASK); + dirty = !!(ent & PG_EPT_DIRTY_MASK); + } + + snprintf(buf, 32, "%c%c%c%c%c%c%c ", + ent & PG_EPT_X_USER_MASK ? 'U' : '-', + dirty ? 'D' : '-', + accessed ? 'A' : '-', + ent & PG_EPT_PSE_MASK ? 'S' : '-', + ent & PG_EPT_X_SUPER_MASK ? 'X' : '-', + ent & PG_EPT_W_MASK ? 'W' : '-', + ent & PG_EPT_R_MASK ? 
'R' : '-'); + } else { + g_assert_not_reached(); + } + return buf; +} + +bool x86_mon_init_page_table_iterator(CPUState *cs, GString *buf, int mmu_idx, + struct mem_print_state *state) +{ + X86CPU *cpu = X86_CPU(cs); + CPUX86State *env = &cpu->env; + + state->env = env; + state->buf = buf; + state->mmu_idx = mmu_idx; + state->flush_interior = false; + state->require_physical_contiguity = false; + + for (int i = 0; i < MAX_HEIGHT; i++) { + state->vstart[i] = -1; + state->last_offset[i] = 0; + } + state->start_height = 0; + + if (!(env->cr[0] & CR0_PG_MASK)) { + g_string_append_printf(buf, "PG disabled\n"); + return false; + } + + /* set va and pa width */ + if (env->cr[4] & CR4_PAE_MASK) { + state->paw = 13; +#ifdef TARGET_X86_64 + if (env->hflags & HF_LMA_MASK) { + if (env->cr[4] & CR4_LA57_MASK) { + state->vaw = 15; + state->max_height = 5; + } else { + state->vaw = 12; + state->max_height = 4; + } + } else +#endif + { + state->vaw = 8; + state->max_height = 3; + } + } else { + state->max_height = 2; + state->vaw = 8; + state->paw = 8; + } + + return true; +} + +void x86_mon_info_pg_print_header(struct mem_print_state *state) +{ + /* Header line */ + g_string_append_printf(state->buf, "%-*s %-13s %-10s %*s%s\n", + 3 + 2 * (state->vaw - 3), "VPN range", + "Entry", "Flags", + 2 * (state->max_height - 1), "", + "Physical page(s)"); +} + + +static void pg_print(CPUState *cs, GString *out_buf, uint64_t pt_ent, + vaddr vaddr_s, vaddr vaddr_l, + hwaddr paddr_s, hwaddr paddr_l, + int offset_s, int offset_l, + int height, int max_height, int vaw, int paw, + uint64_t page_size, bool is_leaf, int mmu_idx) + +{ + g_autoptr(GString) buf = g_string_new(""); + + /* VFN range */ + g_string_append_printf(buf, "%*s[%0*"PRIx64"-%0*"PRIx64"] ", + (max_height - height) * 2, "", + vaw - 3, vaddr_s >> 12, + vaw - 3, (vaddr_l + page_size - 1) >> 12); + + /* Slot */ + if (vaddr_s == vaddr_l) { + g_string_append_printf(buf, "%4s[%03x] ", + names[height], offset_s); + } else { + 
g_string_append_printf(buf, "%4s[%03x-%03x]", + names[height], offset_s, offset_l); + } + + /* Flags */ + g_string_append_printf(buf, " %s", pg_bits(cs, pt_ent, mmu_idx)); + + + /* Range-compressed PFN's */ + if (is_leaf) { + if (vaddr_s == vaddr_l) { + g_string_append_printf(buf, " %0*"PRIx64, + paw - 3, (uint64_t)paddr_s >> 12); + } else { + g_string_append_printf(buf, " %0*"PRIx64"-%0*"PRIx64, + paw - 3, (uint64_t)paddr_s >> 12, + paw - 3, (uint64_t)paddr_l >> 12); + } + } + + /* Trim line to fit screen */ + g_string_truncate(buf, 79); + + g_string_append_printf(out_buf, "%s\n", buf->str); +} + +/* Returns true if it emitted anything */ +bool x86_mon_flush_print_pg_state(CPUState *cs, struct mem_print_state *state) +{ + bool ret = false; + for (int i = state->start_height; i > 0; i--) { + if (state->vstart[i] == -1) { + break; + } + ret = true; + pg_print(cs, state->buf, state->prot[i], + state->vstart[i], state->vend[i], + state->pstart, state->pend, + state->offset[i], state->last_offset[i], + i, state->max_height, state->vaw, state->paw, + state->pg_size[i], i == state->leaf_height, state->mmu_idx); + } + + return ret; +} + /* Perform linear address sign extension */ static hwaddr addr_canonical(CPUArchState *env, hwaddr addr) { From patchwork Tue Jul 23 01:05:42 2024 X-Patchwork-Submitter: Don Porter X-Patchwork-Id: 13739263
From: Don Porter To: qemu-devel@nongnu.org Cc: dave@treblig.org, peter.maydell@linaro.org, nadav.amit@gmail.com, richard.henderson@linaro.org, philmd@linaro.org, berrange@redhat.com, Don Porter Subject: [PATCH v4 4/7] Convert 'info tlb' to use generic iterator. 
Date: Mon, 22 Jul 2024 21:05:42 -0400 Message-Id: <20240723010545.3648706-5-porter@cs.unc.edu> In-Reply-To: <20240723010545.3648706-1-porter@cs.unc.edu> References: <20240723010545.3648706-1-porter@cs.unc.edu> If the guest is using nested page tables, change the output format slightly, to first show guest virtual to guest physical, then guest physical to host physical, as below: (qemu) info tlb Info guest TLB (guest virtual to guest physical): 0000008000800000: 000000000076a000 -------U-P 0000008000801000: 000000000076b000 -------U-P 0000008000802000: 000000000076c000 -------U-P 0000008000803000: 000000000076d000 -------U-P [...] 0000008004ffd000: 0000000000ffd000 --------WP 0000008004ffe000: 0000000000ffe000 --------WP 0000008004fff000: 0000000000fff000 --------WP Info host TLB, (guest physical to host physical): 0000000000001000: 0000000001b20000 ----XWR 0000000000002000: 0000000001b21000 ----XWR 0000000000003000: 0000000001b22000 ----XWR 0000000000004000: 0000000001b23000 ----XWR 0000000000005000: 0000000001b24000 ----XWR [...] 
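The two-stage lookup this output visualizes — resolve a guest-virtual address through the guest's tables, then resolve the resulting guest-physical address through the nested tables, reporting the larger of the two page sizes to any TLB consumer — can be sketched as below. This is illustrative Python, not QEMU code: the dict-based page tables and the addresses (borrowed from the sample output above) are invented for the example.

```python
def translate(tables, addr):
    """Resolve addr through a toy page table: {page_base: (target_base, size)}.

    Assumes each target_base is aligned to its page size.
    """
    for base, (target, size) in tables.items():
        if base <= addr < base + size:
            # Keep the in-page offset bits, swap in the target frame.
            return target | (addr - base), size
    return None, 0  # not mapped


def two_stage(gva, guest_tables, nested_tables):
    """Guest-virtual -> guest-physical -> host-physical, as in nested paging."""
    gpa, s1_size = translate(guest_tables, gva)
    if gpa is None:
        return None, 0
    hpa, s2_size = translate(nested_tables, gpa)
    if hpa is None:
        return None, 0
    # Mirror the rule from patch 1/7: if the nested page size is larger,
    # report that size for the TLB consumer.
    return hpa, max(s1_size, s2_size)
```

For example, with a 4 KiB page at each stage, `two_stage(0x8000800123, {0x8000800000: (0x76a000, 0x1000)}, {0x76a000: (0x1b20000, 0x1000)})` resolves to `(0x1b20123, 0x1000)`, matching the first line of each table in the sample output.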
Signed-off-by: Don Porter
---
 include/hw/core/sysemu-cpu-ops.h |   7 +
 target/i386/cpu.c                |   1 +
 target/i386/cpu.h                |   2 +
 target/i386/monitor.c            | 233 +++++++++----------------------
 4 files changed, 75 insertions(+), 168 deletions(-)

diff --git a/include/hw/core/sysemu-cpu-ops.h b/include/hw/core/sysemu-cpu-ops.h
index d0e939def8..083df4717c 100644
--- a/include/hw/core/sysemu-cpu-ops.h
+++ b/include/hw/core/sysemu-cpu-ops.h
@@ -225,6 +225,13 @@ typedef struct SysemuCPUOps {
     bool (*mon_flush_page_print_state)(CPUState *cs,
                                        struct mem_print_state *state);
 
+    /**
+     * @mon_print_pte: Hook called by the monitor to print a page
+     * table entry at address addr, with contents pte.
+     */
+    void (*mon_print_pte) (CPUState *cs, GString *buf, hwaddr addr,
+                           hwaddr pte, uint64_t prot, int mmu_idx);
+
 } SysemuCPUOps;
 
 int compressing_iterator(CPUState *cs, void *data, DecodedPTE *pte,
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index ec419e0ef0..030198497a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -8388,6 +8388,7 @@ static const struct SysemuCPUOps i386_sysemu_ops = {
     .mon_init_page_table_iterator = &x86_mon_init_page_table_iterator,
     .mon_info_pg_print_header = &x86_mon_info_pg_print_header,
     .mon_flush_page_print_state = &x86_mon_flush_print_pg_state,
+    .mon_print_pte = &x86_mon_print_pte,
 };
 #endif
 
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 4e5877f41d..413c743c1a 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2264,6 +2264,8 @@ bool x86_mon_init_page_table_iterator(CPUState *cpu, GString *buf,
                                       int mmu_idx,
                                       struct mem_print_state *state);
 void x86_mon_info_pg_print_header(struct mem_print_state *state);
 bool x86_mon_flush_print_pg_state(CPUState *cs, struct mem_print_state *state);
+void x86_mon_print_pte(CPUState *cs, GString *out_buf, hwaddr addr,
+                       hwaddr child, uint64_t prot, int mmu_idx);
 bool x86_ptw_translate(CPUState *cs, vaddr vaddress, hwaddr *hpa,
                        bool debug, int mmu_idx, bool user_access,
                        const MMUAccessType access_type,
                        uint64_t *page_size,
diff --git a/target/i386/monitor.c b/target/i386/monitor.c
index 8ef92e7c42..d88347684b 100644
--- a/target/i386/monitor.c
+++ b/target/i386/monitor.c
@@ -224,201 +224,98 @@ static hwaddr addr_canonical(CPUArchState *env, hwaddr addr)
     return addr;
 }
 
-static void print_pte(Monitor *mon, CPUArchState *env, hwaddr addr,
-                      hwaddr pte, hwaddr mask)
+void x86_mon_print_pte(CPUState *cs, GString *out_buf, hwaddr addr,
+                       hwaddr child, uint64_t prot, int mmu_idx)
 {
+    CPUX86State *env = cpu_env(cs);
+    g_autoptr(GString) buf = g_string_new("");
+
     addr = addr_canonical(env, addr);
 
-    monitor_printf(mon, HWADDR_FMT_plx ": " HWADDR_FMT_plx
-                   " %c%c%c%c%c%c%c%c%c\n",
-                   addr,
-                   pte & mask,
-                   pte & PG_NX_MASK ? 'X' : '-',
-                   pte & PG_GLOBAL_MASK ? 'G' : '-',
-                   pte & PG_PSE_MASK ? 'P' : '-',
-                   pte & PG_DIRTY_MASK ? 'D' : '-',
-                   pte & PG_ACCESSED_MASK ? 'A' : '-',
-                   pte & PG_PCD_MASK ? 'C' : '-',
-                   pte & PG_PWT_MASK ? 'T' : '-',
-                   pte & PG_USER_MASK ? 'U' : '-',
-                   pte & PG_RW_MASK ? 'W' : '-');
-}
+    g_string_append_printf(buf, HWADDR_FMT_plx ": " HWADDR_FMT_plx " ",
+                           addr, child);
 
-static void tlb_info_32(Monitor *mon, CPUArchState *env)
-{
-    unsigned int l1, l2;
-    uint32_t pgd, pde, pte;
+    g_string_append_printf(buf, " %s", pg_bits(cs, prot, mmu_idx));
 
-    pgd = env->cr[3] & ~0xfff;
-    for(l1 = 0; l1 < 1024; l1++) {
-        cpu_physical_memory_read(pgd + l1 * 4, &pde, 4);
-        pde = le32_to_cpu(pde);
-        if (pde & PG_PRESENT_MASK) {
-            if ((pde & PG_PSE_MASK) && (env->cr[4] & CR4_PSE_MASK)) {
-                /* 4M pages */
-                print_pte(mon, env, (l1 << 22), pde, ~((1 << 21) - 1));
-            } else {
-                for(l2 = 0; l2 < 1024; l2++) {
-                    cpu_physical_memory_read((pde & ~0xfff) + l2 * 4, &pte, 4);
-                    pte = le32_to_cpu(pte);
-                    if (pte & PG_PRESENT_MASK) {
-                        print_pte(mon, env, (l1 << 22) + (l2 << 12),
-                                  pte & ~PG_PSE_MASK,
-                                  ~0xfff);
-                    }
-                }
-            }
-        }
-    }
-}
-
-static void tlb_info_pae32(Monitor *mon, CPUArchState *env)
-{
-    unsigned int l1, l2, l3;
-    uint64_t pdpe, pde, pte;
-    uint64_t pdp_addr, pd_addr, pt_addr;
+    /* Trim line to fit screen */
+    g_string_truncate(buf, 79);
 
-    pdp_addr = env->cr[3] & ~0x1f;
-    for (l1 = 0; l1 < 4; l1++) {
-        cpu_physical_memory_read(pdp_addr + l1 * 8, &pdpe, 8);
-        pdpe = le64_to_cpu(pdpe);
-        if (pdpe & PG_PRESENT_MASK) {
-            pd_addr = pdpe & 0x3fffffffff000ULL;
-            for (l2 = 0; l2 < 512; l2++) {
-                cpu_physical_memory_read(pd_addr + l2 * 8, &pde, 8);
-                pde = le64_to_cpu(pde);
-                if (pde & PG_PRESENT_MASK) {
-                    if (pde & PG_PSE_MASK) {
-                        /* 2M pages with PAE, CR4.PSE is ignored */
-                        print_pte(mon, env, (l1 << 30) + (l2 << 21), pde,
-                                  ~((hwaddr)(1 << 20) - 1));
-                    } else {
-                        pt_addr = pde & 0x3fffffffff000ULL;
-                        for (l3 = 0; l3 < 512; l3++) {
-                            cpu_physical_memory_read(pt_addr + l3 * 8, &pte, 8);
-                            pte = le64_to_cpu(pte);
-                            if (pte & PG_PRESENT_MASK) {
-                                print_pte(mon, env, (l1 << 30) + (l2 << 21)
-                                          + (l3 << 12),
-                                          pte & ~PG_PSE_MASK,
-                                          ~(hwaddr)0xfff);
-                            }
-                        }
-                    }
-                }
-            }
-        }
-    }
+    g_string_append_printf(out_buf, "%s\n", buf->str);
 }
 
-#ifdef TARGET_X86_64
-static void tlb_info_la48(Monitor *mon, CPUArchState *env,
-                          uint64_t l0, uint64_t pml4_addr)
+static
+int mem_print_tlb(CPUState *cs, void *data, DecodedPTE *pte, int height,
+                  int offset, int mmu_idx, const PageTableLayout *layout)
 {
-    uint64_t l1, l2, l3, l4;
-    uint64_t pml4e, pdpe, pde, pte;
-    uint64_t pdp_addr, pd_addr, pt_addr;
-
-    for (l1 = 0; l1 < 512; l1++) {
-        cpu_physical_memory_read(pml4_addr + l1 * 8, &pml4e, 8);
-        pml4e = le64_to_cpu(pml4e);
-        if (!(pml4e & PG_PRESENT_MASK)) {
-            continue;
-        }
-
-        pdp_addr = pml4e & 0x3fffffffff000ULL;
-        for (l2 = 0; l2 < 512; l2++) {
-            cpu_physical_memory_read(pdp_addr + l2 * 8, &pdpe, 8);
-            pdpe = le64_to_cpu(pdpe);
-            if (!(pdpe & PG_PRESENT_MASK)) {
-                continue;
-            }
+    struct mem_print_state *state = (struct mem_print_state *) data;
+    CPUClass *cc = CPU_GET_CLASS(cs);
 
-            if (pdpe & PG_PSE_MASK) {
-                /* 1G pages, CR4.PSE is ignored */
-                print_pte(mon, env, (l0 << 48) + (l1 << 39) + (l2 << 30),
-                          pdpe, 0x3ffffc0000000ULL);
-                continue;
-            }
-
-            pd_addr = pdpe & 0x3fffffffff000ULL;
-            for (l3 = 0; l3 < 512; l3++) {
-                cpu_physical_memory_read(pd_addr + l3 * 8, &pde, 8);
-                pde = le64_to_cpu(pde);
-                if (!(pde & PG_PRESENT_MASK)) {
-                    continue;
-                }
+    cc->sysemu_ops->mon_print_pte(cs, state->buf, pte->bits_translated,
+                                  pte->child, pte->prot, mmu_idx);
 
-                if (pde & PG_PSE_MASK) {
-                    /* 2M pages, CR4.PSE is ignored */
-                    print_pte(mon, env, (l0 << 48) + (l1 << 39) + (l2 << 30) +
-                              (l3 << 21), pde, 0x3ffffffe00000ULL);
-                    continue;
-                }
-
-                pt_addr = pde & 0x3fffffffff000ULL;
-                for (l4 = 0; l4 < 512; l4++) {
-                    cpu_physical_memory_read(pt_addr
-                                             + l4 * 8,
-                                             &pte, 8);
-                    pte = le64_to_cpu(pte);
-                    if (pte & PG_PRESENT_MASK) {
-                        print_pte(mon, env, (l0 << 48) + (l1 << 39) +
-                                  (l2 << 30) + (l3 << 21) + (l4 << 12),
-                                  pte & ~PG_PSE_MASK, 0x3fffffffff000ULL);
-                    }
-                }
-            }
-        }
-    }
+    return 0;
 }
 
-static void tlb_info_la57(Monitor *mon, CPUArchState *env)
+static
+void helper_hmp_info_tlb(CPUState *cs, Monitor *mon, int mmu_idx)
 {
-    uint64_t l0;
-    uint64_t pml5e;
-    uint64_t pml5_addr;
+    struct mem_print_state state;
+    g_autoptr(GString) buf = g_string_new("");
+    CPUClass *cc = CPU_GET_CLASS(cs);
 
-    pml5_addr = env->cr[3] & 0x3fffffffff000ULL;
-    for (l0 = 0; l0 < 512; l0++) {
-        cpu_physical_memory_read(pml5_addr + l0 * 8, &pml5e, 8);
-        pml5e = le64_to_cpu(pml5e);
-        if (pml5e & PG_PRESENT_MASK) {
-            tlb_info_la48(mon, env, l0, pml5e & 0x3fffffffff000ULL);
-        }
+    if (!cc->sysemu_ops->mon_init_page_table_iterator(cs, buf, mmu_idx,
+                                                      &state)) {
+        monitor_printf(mon, "Unable to initialize page table iterator\n");
+        return;
     }
+
+    /**
+     * 'info tlb' visits only leaf PTEs marked present.
+     * It does not check other protection bits.
+     */
+    for_each_pte(cs, &mem_print_tlb, &state, false, false, false, mmu_idx);
+
+    monitor_printf(mon, "%s", buf->str);
 }
-#endif /* TARGET_X86_64 */
 
 void hmp_info_tlb(Monitor *mon, const QDict *qdict)
 {
-    CPUArchState *env;
+    CPUState *cs = mon_get_cpu(mon);
+    bool nested;
 
-    env = mon_get_cpu_env(mon);
-    if (!env) {
-        monitor_printf(mon, "No CPU available\n");
+    if (!cs) {
+        monitor_printf(mon, "Unable to get CPUState. Internal error\n");
         return;
     }
 
-    if (!(env->cr[0] & CR0_PG_MASK)) {
+    if (!cpu_paging_enabled(cs, 0)) {
         monitor_printf(mon, "PG disabled\n");
         return;
     }
-    if (env->cr[4] & CR4_PAE_MASK) {
-#ifdef TARGET_X86_64
-        if (env->hflags & HF_LMA_MASK) {
-            if (env->cr[4] & CR4_LA57_MASK) {
-                tlb_info_la57(mon, env);
-            } else {
-                tlb_info_la48(mon, env, 0, env->cr[3] & 0x3fffffffff000ULL);
-            }
-        } else
-#endif
-        {
-            tlb_info_pae32(mon, env);
-        }
-    } else {
-        tlb_info_32(mon, env);
+
+    CPUClass *cc = CPU_GET_CLASS(cs);
+
+    if (!cc->sysemu_ops->mon_print_pte
+        || !cc->sysemu_ops->mon_init_page_table_iterator) {
+        monitor_printf(mon, "Info tlb unsupported on this ISA\n");
+        return;
+    }
+
+    nested = cpu_paging_enabled(cs, 1);
+
+    if (nested) {
+        monitor_printf(mon,
+                       "Info guest TLB (guest virtual to guest physical):\n");
+    }
+
+    helper_hmp_info_tlb(cs, mon, 0);
+
+    if (nested) {
+        monitor_printf(mon,
+                       "Info host TLB, (guest physical to host physical):\n");
+
+        helper_hmp_info_tlb(cs, mon, 1);
    }
 }

From patchwork Tue Jul 23 01:05:43 2024
X-Patchwork-Submitter: Don Porter
X-Patchwork-Id: 13739259
From: Don Porter
To: qemu-devel@nongnu.org
Cc: dave@treblig.org, peter.maydell@linaro.org, nadav.amit@gmail.com,
    richard.henderson@linaro.org, philmd@linaro.org, berrange@redhat.com,
    Don Porter
Subject: [PATCH v4 5/7] Convert 'info mem' to use generic iterator
Date: Mon, 22 Jul 2024 21:05:43 -0400
Message-Id: <20240723010545.3648706-6-porter@cs.unc.edu>
In-Reply-To: <20240723010545.3648706-1-porter@cs.unc.edu>
References: <20240723010545.3648706-1-porter@cs.unc.edu>
In the case of nested paging, change the output slightly to show
both the guest's and host's view. For example:

(qemu) info mem
Info guest mem (guest virtual to guest physical mappings):
0000008000800000-000000800085c000 000000000005c000 ur-
0000008000a00000-0000008000a10000 0000000000010000 ur-
0000008003fa8000-0000008003fb8000 0000000000010000 -rw
0000008003fc0000-0000008003fd0000 0000000000010000 -rw
0000008003fd8000-0000008003fe8000 0000000000010000 -rw
0000008003ff0000-0000008005000000 0000000001010000 -rw
Info host mem (guest physical to host physical mappings):
0000000000001000-000000000000f000 000000000000e000 -xwr
00000000000b8000-00000000000b9000 0000000000001000 -xwr
0000000000100000-0000000000108000 0000000000008000 -xwr
0000000000200000-00000000007c6000 00000000005c6000 -xwr

Signed-off-by: Don Porter
---
 include/hw/core/sysemu-cpu-ops.h |   6 +
 target/i386/cpu.c                |   1 +
 target/i386/cpu.h                |   1 +
 target/i386/monitor.c            | 387 +++++++------------------------
 4 files changed, 95 insertions(+), 300 deletions(-)

diff --git a/include/hw/core/sysemu-cpu-ops.h b/include/hw/core/sysemu-cpu-ops.h
index 083df4717c..f8b71fb60d 100644
--- a/include/hw/core/sysemu-cpu-ops.h
+++ b/include/hw/core/sysemu-cpu-ops.h
@@ -232,6 +232,12 @@ typedef struct SysemuCPUOps {
     void (*mon_print_pte) (CPUState *cs, GString *buf, hwaddr addr,
                            hwaddr pte, uint64_t prot, int mmu_idx);
 
+    /**
+     * @mon_print_mem: Hook called by the monitor to print a range
+     * of memory mappings in 'info mem'
+     */
+    bool (*mon_print_mem)(CPUState *cs, struct mem_print_state *state);
+
 } SysemuCPUOps;
 
 int compressing_iterator(CPUState *cs, void *data, DecodedPTE *pte,
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 030198497a..f9ca2cddd3 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -8389,6 +8389,7 @@ static const struct SysemuCPUOps i386_sysemu_ops = {
     .mon_info_pg_print_header = &x86_mon_info_pg_print_header,
     .mon_flush_page_print_state = &x86_mon_flush_print_pg_state,
     .mon_print_pte = &x86_mon_print_pte,
+    .mon_print_mem = &x86_mon_print_mem,
 };
 #endif
 
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 413c743c1a..da565bb7da 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2266,6 +2266,7 @@ void x86_mon_info_pg_print_header(struct mem_print_state *state);
 bool x86_mon_flush_print_pg_state(CPUState *cs, struct mem_print_state *state);
 void x86_mon_print_pte(CPUState *cs, GString *out_buf, hwaddr addr,
                        hwaddr child, uint64_t prot, int mmu_idx);
+bool x86_mon_print_mem(CPUState *cs, struct mem_print_state *state);
 bool x86_ptw_translate(CPUState *cs, vaddr vaddress, hwaddr *hpa,
                        bool debug, int mmu_idx, bool user_access,
                        const MMUAccessType access_type,
                        uint64_t *page_size,
diff --git a/target/i386/monitor.c b/target/i386/monitor.c
index d88347684b..318f9b7ca2 100644
--- a/target/i386/monitor.c
+++ b/target/i386/monitor.c
@@ -319,331 +319,118 @@ void hmp_info_tlb(Monitor *mon, const QDict *qdict)
     }
 }
 
-static void mem_print(Monitor *mon, CPUArchState *env,
-                      hwaddr *pstart, int *plast_prot,
-                      hwaddr end, int prot)
+bool x86_mon_print_mem(CPUState *cs, struct mem_print_state *state)
 {
-    int prot1;
-    prot1 = *plast_prot;
-    if (prot != prot1) {
-        if (*pstart != -1) {
-            monitor_printf(mon, HWADDR_FMT_plx "-" HWADDR_FMT_plx " "
-                           HWADDR_FMT_plx " %c%c%c\n",
-                           addr_canonical(env, *pstart),
-                           addr_canonical(env, end),
-                           addr_canonical(env, end - *pstart),
-                           prot1 & PG_USER_MASK ? 'u' : '-',
-                           'r',
-                           prot1 & PG_RW_MASK ? 'w' : '-');
-        }
-        if (prot != 0)
-            *pstart = end;
-        else
-            *pstart = -1;
-        *plast_prot = prot;
-    }
-}
+    CPUArchState *env = state->env;
+    int i = 0;
 
-static void mem_info_32(Monitor *mon, CPUArchState *env)
-{
-    unsigned int l1, l2;
-    int prot, last_prot;
-    uint32_t pgd, pde, pte;
-    hwaddr start, end;
-
-    pgd = env->cr[3] & ~0xfff;
-    last_prot = 0;
-    start = -1;
-    for(l1 = 0; l1 < 1024; l1++) {
-        cpu_physical_memory_read(pgd + l1 * 4, &pde, 4);
-        pde = le32_to_cpu(pde);
-        end = l1 << 22;
-        if (pde & PG_PRESENT_MASK) {
-            if ((pde & PG_PSE_MASK) && (env->cr[4] & CR4_PSE_MASK)) {
-                prot = pde & (PG_USER_MASK | PG_RW_MASK | PG_PRESENT_MASK);
-                mem_print(mon, env, &start, &last_prot, end, prot);
-            } else {
-                for(l2 = 0; l2 < 1024; l2++) {
-                    cpu_physical_memory_read((pde & ~0xfff) + l2 * 4, &pte, 4);
-                    pte = le32_to_cpu(pte);
-                    end = (l1 << 22) + (l2 << 12);
-                    if (pte & PG_PRESENT_MASK) {
-                        prot = pte & pde &
-                            (PG_USER_MASK | PG_RW_MASK | PG_PRESENT_MASK);
-                    } else {
-                        prot = 0;
-                    }
-                    mem_print(mon, env, &start, &last_prot, end, prot);
-                }
-            }
-        } else {
-            prot = 0;
-            mem_print(mon, env, &start, &last_prot, end, prot);
+    /* We need to figure out the lowest populated level */
+    for ( ; i < state->max_height; i++) {
+        if (state->vstart[i] != -1) {
+            break;
         }
     }
-    /* Flush last range */
-    mem_print(mon, env, &start, &last_prot, (hwaddr)1 << 32, 0);
-}
 
-static void mem_info_pae32(Monitor *mon, CPUArchState *env)
-{
-    unsigned int l1, l2, l3;
-    int prot, last_prot;
-    uint64_t pdpe, pde, pte;
-    uint64_t pdp_addr, pd_addr, pt_addr;
-    hwaddr start, end;
-
-    pdp_addr = env->cr[3] & ~0x1f;
-    last_prot = 0;
-    start = -1;
-    for (l1 = 0; l1 < 4; l1++) {
-        cpu_physical_memory_read(pdp_addr + l1 * 8, &pdpe, 8);
-        pdpe = le64_to_cpu(pdpe);
-        end = l1 << 30;
-        if (pdpe & PG_PRESENT_MASK) {
-            pd_addr = pdpe & 0x3fffffffff000ULL;
-            for (l2 = 0; l2 < 512; l2++) {
-                cpu_physical_memory_read(pd_addr + l2 * 8, &pde, 8);
-                pde = le64_to_cpu(pde);
-                end = (l1 << 30) + (l2 << 21);
-                if (pde & PG_PRESENT_MASK) {
-                    if (pde & PG_PSE_MASK) {
-                        prot = pde & (PG_USER_MASK | PG_RW_MASK |
-                                      PG_PRESENT_MASK);
-                        mem_print(mon, env, &start, &last_prot, end, prot);
-                    } else {
-                        pt_addr = pde & 0x3fffffffff000ULL;
-                        for (l3 = 0; l3 < 512; l3++) {
-                            cpu_physical_memory_read(pt_addr + l3 * 8, &pte, 8);
-                            pte = le64_to_cpu(pte);
-                            end = (l1 << 30) + (l2 << 21) + (l3 << 12);
-                            if (pte & PG_PRESENT_MASK) {
-                                prot = pte & pde & (PG_USER_MASK | PG_RW_MASK |
-                                                    PG_PRESENT_MASK);
-                            } else {
-                                prot = 0;
-                            }
-                            mem_print(mon, env, &start, &last_prot, end, prot);
-                        }
-                    }
-                } else {
-                    prot = 0;
-                    mem_print(mon, env, &start, &last_prot, end, prot);
-                }
-            }
-        } else {
-            prot = 0;
-            mem_print(mon, env, &start, &last_prot, end, prot);
-        }
+    hwaddr vstart = state->vstart[i];
+    hwaddr end = state->vend[i] + state->pg_size[i];
+    int prot = state->prot[i];
+
+    if (state->mmu_idx == 0
+        || (state->mmu_idx == 1 && env->vm_state_valid
+            && env->nested_pg_format == 1)){
+
+        g_string_append_printf(state->buf, HWADDR_FMT_plx "-" HWADDR_FMT_plx " "
+                               HWADDR_FMT_plx " %c%c%c\n",
+                               addr_canonical(env, vstart),
+                               addr_canonical(env, end),
+                               addr_canonical(env, end - vstart),
+                               prot & PG_USER_MASK ? 'u' : '-',
+                               'r',
+                               prot & PG_RW_MASK ? 'w' : '-');
+        return true;
+    } else if (state->mmu_idx == 1) {
+        g_string_append_printf(state->buf, HWADDR_FMT_plx "-" HWADDR_FMT_plx " "
+                               HWADDR_FMT_plx " %c%c%c%c\n",
+                               addr_canonical(env, vstart),
+                               addr_canonical(env, end),
+                               addr_canonical(env, end - vstart),
+                               prot & PG_EPT_X_USER_MASK ? 'u' : '-',
+                               prot & PG_EPT_X_SUPER_MASK ? 'x' : '-',
+                               prot & PG_EPT_W_MASK ? 'w' : '-',
+                               prot & PG_EPT_R_MASK ? 'r' : '-');
+
+        return true;
+    } else {
+        return false;
     }
-    /* Flush last range */
-    mem_print(mon, env, &start, &last_prot, (hwaddr)1 << 32, 0);
-}
 
-#ifdef TARGET_X86_64
-static void mem_info_la48(Monitor *mon, CPUArchState *env)
-{
-    int prot, last_prot;
-    uint64_t l1, l2, l3, l4;
-    uint64_t pml4e, pdpe, pde, pte;
-    uint64_t pml4_addr, pdp_addr, pd_addr, pt_addr, start, end;
-
-    pml4_addr = env->cr[3] & 0x3fffffffff000ULL;
-    last_prot = 0;
-    start = -1;
-    for (l1 = 0; l1 < 512; l1++) {
-        cpu_physical_memory_read(pml4_addr + l1 * 8, &pml4e, 8);
-        pml4e = le64_to_cpu(pml4e);
-        end = l1 << 39;
-        if (pml4e & PG_PRESENT_MASK) {
-            pdp_addr = pml4e & 0x3fffffffff000ULL;
-            for (l2 = 0; l2 < 512; l2++) {
-                cpu_physical_memory_read(pdp_addr + l2 * 8, &pdpe, 8);
-                pdpe = le64_to_cpu(pdpe);
-                end = (l1 << 39) + (l2 << 30);
-                if (pdpe & PG_PRESENT_MASK) {
-                    if (pdpe & PG_PSE_MASK) {
-                        prot = pdpe & (PG_USER_MASK | PG_RW_MASK |
-                                       PG_PRESENT_MASK);
-                        prot &= pml4e;
-                        mem_print(mon, env, &start, &last_prot, end, prot);
-                    } else {
-                        pd_addr = pdpe & 0x3fffffffff000ULL;
-                        for (l3 = 0; l3 < 512; l3++) {
-                            cpu_physical_memory_read(pd_addr + l3 * 8, &pde, 8);
-                            pde = le64_to_cpu(pde);
-                            end = (l1 << 39) + (l2 << 30) + (l3 << 21);
-                            if (pde & PG_PRESENT_MASK) {
-                                if (pde & PG_PSE_MASK) {
-                                    prot = pde & (PG_USER_MASK | PG_RW_MASK |
-                                                  PG_PRESENT_MASK);
-                                    prot &= pml4e & pdpe;
-                                    mem_print(mon, env, &start,
-                                              &last_prot, end, prot);
-                                } else {
-                                    pt_addr = pde & 0x3fffffffff000ULL;
-                                    for (l4 = 0; l4 < 512; l4++) {
-                                        cpu_physical_memory_read(pt_addr
-                                                                 + l4 * 8,
-                                                                 &pte, 8);
-                                        pte = le64_to_cpu(pte);
-                                        end = (l1 << 39) + (l2 << 30) +
-                                            (l3 << 21) + (l4 << 12);
-                                        if (pte & PG_PRESENT_MASK) {
-                                            prot = pte & (PG_USER_MASK | PG_RW_MASK |
-                                                          PG_PRESENT_MASK);
-                                            prot &= pml4e & pdpe & pde;
-                                        } else {
-                                            prot = 0;
-                                        }
-                                        mem_print(mon, env, &start,
-                                                  &last_prot, end, prot);
-                                    }
-                                }
-                            } else {
-                                prot = 0;
-                                mem_print(mon, env, &start,
-                                          &last_prot, end, prot);
-                            }
-                        }
-                    }
-                } else {
-                    prot = 0;
-                    mem_print(mon, env, &start, &last_prot, end, prot);
-                }
-            }
-        } else {
-            prot = 0;
-            mem_print(mon, env, &start, &last_prot, end, prot);
-        }
-    }
-    /* Flush last range */
-    mem_print(mon, env, &start, &last_prot, (hwaddr)1 << 48, 0);
 }
 
-static void mem_info_la57(Monitor *mon, CPUArchState *env)
+static
+void helper_hmp_info_mem(CPUState *cs, Monitor *mon, int mmu_idx)
 {
-    int prot, last_prot;
-    uint64_t l0, l1, l2, l3, l4;
-    uint64_t pml5e, pml4e, pdpe, pde, pte;
-    uint64_t pml5_addr, pml4_addr, pdp_addr, pd_addr, pt_addr, start, end;
-
-    pml5_addr = env->cr[3] & 0x3fffffffff000ULL;
-    last_prot = 0;
-    start = -1;
-    for (l0 = 0; l0 < 512; l0++) {
-        cpu_physical_memory_read(pml5_addr + l0 * 8, &pml5e, 8);
-        pml5e = le64_to_cpu(pml5e);
-        end = l0 << 48;
-        if (!(pml5e & PG_PRESENT_MASK)) {
-            prot = 0;
-            mem_print(mon, env, &start, &last_prot, end, prot);
-            continue;
-        }
+    struct mem_print_state state;
+    g_autoptr(GString) buf = g_string_new("");
 
-        pml4_addr = pml5e & 0x3fffffffff000ULL;
-        for (l1 = 0; l1 < 512; l1++) {
-            cpu_physical_memory_read(pml4_addr + l1 * 8, &pml4e, 8);
-            pml4e = le64_to_cpu(pml4e);
-            end = (l0 << 48) + (l1 << 39);
-            if (!(pml4e & PG_PRESENT_MASK)) {
-                prot = 0;
-                mem_print(mon, env, &start, &last_prot, end, prot);
-                continue;
-            }
+    CPUClass *cc = CPU_GET_CLASS(cs);
 
-            pdp_addr = pml4e & 0x3fffffffff000ULL;
-            for (l2 = 0; l2 < 512; l2++) {
-                cpu_physical_memory_read(pdp_addr + l2 * 8, &pdpe, 8);
-                pdpe = le64_to_cpu(pdpe);
-                end = (l0 << 48) + (l1 << 39) + (l2 << 30);
-                if (pdpe & PG_PRESENT_MASK) {
-                    prot = 0;
-                    mem_print(mon, env, &start, &last_prot, end, prot);
-                    continue;
-                }
-
-                if (pdpe & PG_PSE_MASK) {
-                    prot = pdpe & (PG_USER_MASK | PG_RW_MASK |
-                                   PG_PRESENT_MASK);
-                    prot &= pml5e & pml4e;
-                    mem_print(mon, env, &start, &last_prot, end, prot);
-                    continue;
-                }
-
-                pd_addr = pdpe & 0x3fffffffff000ULL;
-                for (l3 = 0; l3 < 512; l3++) {
-                    cpu_physical_memory_read(pd_addr + l3 * 8, &pde, 8);
-                    pde = le64_to_cpu(pde);
-                    end = (l0 << 48) + (l1 << 39) + (l2 << 30) + (l3 << 21);
-                    if (pde & PG_PRESENT_MASK) {
-                        prot = 0;
-                        mem_print(mon, env, &start, &last_prot, end, prot);
-                        continue;
-                    }
-
-                    if (pde & PG_PSE_MASK) {
-                        prot = pde & (PG_USER_MASK | PG_RW_MASK |
-                                      PG_PRESENT_MASK);
-                        prot &= pml5e & pml4e & pdpe;
-                        mem_print(mon, env, &start, &last_prot, end, prot);
-                        continue;
-                    }
-
-                    pt_addr = pde & 0x3fffffffff000ULL;
-                    for (l4 = 0; l4 < 512; l4++) {
-                        cpu_physical_memory_read(pt_addr + l4 * 8, &pte, 8);
-                        pte = le64_to_cpu(pte);
-                        end = (l0 << 48) + (l1 << 39) + (l2 << 30) +
-                            (l3 << 21) + (l4 << 12);
-                        if (pte & PG_PRESENT_MASK) {
-                            prot = pte & (PG_USER_MASK | PG_RW_MASK |
-                                          PG_PRESENT_MASK);
-                            prot &= pml5e & pml4e & pdpe & pde;
-                        } else {
-                            prot = 0;
-                        }
-                        mem_print(mon, env, &start, &last_prot, end, prot);
-                    }
-                }
-            }
-        }
+    if (!cc->sysemu_ops->mon_init_page_table_iterator(cs, buf, mmu_idx,
+                                                      &state)) {
+        monitor_printf(mon, "Unable to initialize page table iterator\n");
+        return;
     }
-    /* Flush last range */
-    mem_print(mon, env, &start, &last_prot, (hwaddr)1 << 57, 0);
+
+    state.flusher = cc->sysemu_ops->mon_print_mem;
+
+    /**
+     * We must visit interior entries to update prot
+     */
+    for_each_pte(cs, &compressing_iterator, &state, true, false, false,
+                 mmu_idx);
+
+    /* Flush the last entry, if needed */
+    cc->sysemu_ops->mon_print_mem(cs, &state);
+
+    monitor_printf(mon, "%s", buf->str);
 }
-#endif /* TARGET_X86_64 */
 
 void hmp_info_mem(Monitor *mon, const QDict *qdict)
 {
-    CPUArchState *env;
+    CPUState *cs = mon_get_cpu(mon);
+    bool nested;
 
-    env = mon_get_cpu_env(mon);
-    if (!env) {
-        monitor_printf(mon, "No CPU available\n");
+    if (!cs) {
+        monitor_printf(mon, "Unable to get CPUState. Internal error\n");
         return;
     }
 
-    if (!(env->cr[0] & CR0_PG_MASK)) {
+    if (!cpu_paging_enabled(cs, 0)) {
         monitor_printf(mon, "PG disabled\n");
         return;
     }
-    if (env->cr[4] & CR4_PAE_MASK) {
-#ifdef TARGET_X86_64
-        if (env->hflags & HF_LMA_MASK) {
-            if (env->cr[4] & CR4_LA57_MASK) {
-                mem_info_la57(mon, env);
-            } else {
-                mem_info_la48(mon, env);
-            }
-        } else
-#endif
-        {
-            mem_info_pae32(mon, env);
-        }
-    } else {
-        mem_info_32(mon, env);
+
+    CPUClass *cc = CPU_GET_CLASS(cs);
+
+    if (!cc->sysemu_ops->mon_print_mem
+        || !cc->sysemu_ops->mon_init_page_table_iterator) {
+        monitor_printf(mon, "Info tlb unsupported on this ISA\n");
+    }
+
+    nested = cpu_paging_enabled(cs, 1);
+
+    if (nested) {
+        monitor_printf(mon,
+            "Info guest mem (guest virtual to guest physical mappings):\n");
+    }
+
+    helper_hmp_info_mem(cs, mon, 0);
+
+    if (nested) {
+        monitor_printf(mon,
+            "Info host mem (guest physical to host physical mappings):\n");
+
+        helper_hmp_info_mem(cs, mon, 1);
     }
 }

From patchwork Tue Jul 23 01:05:44 2024
X-Patchwork-Submitter: Don Porter
X-Patchwork-Id: 13739266
From: Don Porter
To: qemu-devel@nongnu.org
Cc: dave@treblig.org, peter.maydell@linaro.org, nadav.amit@gmail.com,
    richard.henderson@linaro.org, philmd@linaro.org, berrange@redhat.com,
    Don Porter
Subject: [PATCH v4 6/7] Convert x86_cpu_get_memory_mapping() to use generic iterators
Date: Mon, 22 Jul 2024 21:05:44 -0400
Message-Id: <20240723010545.3648706-7-porter@cs.unc.edu>
In-Reply-To: <20240723010545.3648706-1-porter@cs.unc.edu>
References: <20240723010545.3648706-1-porter@cs.unc.edu>

Signed-off-by: Don Porter
---
 target/i386/arch_memory_mapping.c | 305 ++----------------------------
 1 file changed, 19 insertions(+), 286 deletions(-)

diff --git a/target/i386/arch_memory_mapping.c b/target/i386/arch_memory_mapping.c
index ef29e4b42f..bb97443f0f 100644
--- a/target/i386/arch_memory_mapping.c
+++ b/target/i386/arch_memory_mapping.c
@@ -1006,301 +1006,34 @@ bool x86_ptw_translate(CPUState *cs, vaddr vaddress, hwaddr *hpa,
 }
 
-/* PAE Paging or IA-32e Paging */
-static void walk_pte(MemoryMappingList *list, AddressSpace *as,
-                     hwaddr pte_start_addr,
-                     int32_t a20_mask, target_ulong start_line_addr)
-{
-    hwaddr pte_addr, start_paddr;
-    uint64_t pte;
-    target_ulong start_vaddr;
-    int i;
-
-    for (i = 0; i < 512; i++) {
-        pte_addr = (pte_start_addr + i * 8) & a20_mask;
-        pte = address_space_ldq(as, pte_addr, MEMTXATTRS_UNSPECIFIED, NULL);
-        if (!(pte & PG_PRESENT_MASK)) {
-            /* not present */
-            continue;
-        }
-
-        start_paddr = (pte & ~0xfff) & ~(0x1ULL << 63);
-        if (cpu_physical_memory_is_io(start_paddr)) {
-            /* I/O region */
-            continue;
-        }
-
-        start_vaddr = start_line_addr | ((i & 0x1ff) << 12);
-        memory_mapping_list_add_merge_sorted(list, start_paddr,
-                                             start_vaddr, 1 << 12);
-    }
-}
-
-/* 32-bit Paging */
-static void walk_pte2(MemoryMappingList *list, AddressSpace *as,
-                      hwaddr pte_start_addr, int32_t a20_mask,
-                      target_ulong start_line_addr)
-{
-    hwaddr pte_addr, start_paddr;
-    uint32_t pte;
-    target_ulong start_vaddr;
-    int i;
-
-    for (i = 0; i < 1024; i++) {
-        pte_addr = (pte_start_addr + i * 4) & a20_mask;
-        pte = address_space_ldl(as, pte_addr, MEMTXATTRS_UNSPECIFIED, NULL);
-        if (!(pte & PG_PRESENT_MASK)) {
-            /* not present */
-            continue;
-        }
-
-        start_paddr = pte & ~0xfff;
-        if (cpu_physical_memory_is_io(start_paddr)) {
-            /* I/O region */
-            continue;
-        }
-
-        start_vaddr = start_line_addr | ((i & 0x3ff) << 12);
-        memory_mapping_list_add_merge_sorted(list, start_paddr,
-                                             start_vaddr, 1 << 12);
-    }
-}
-
-/* PAE Paging or IA-32e Paging */
-#define PLM4_ADDR_MASK 0xffffffffff000ULL /* selects bits 51:12 */
-
-static void walk_pde(MemoryMappingList *list, AddressSpace *as,
-                     hwaddr pde_start_addr,
-                     int32_t a20_mask, target_ulong start_line_addr)
-{
-    hwaddr pde_addr, pte_start_addr, start_paddr;
-    uint64_t pde;
-    target_ulong line_addr, start_vaddr;
-    int i;
-
-    for (i = 0; i < 512; i++) {
-        pde_addr = (pde_start_addr + i * 8) & a20_mask;
-        pde = address_space_ldq(as, pde_addr, MEMTXATTRS_UNSPECIFIED, NULL);
-        if (!(pde & PG_PRESENT_MASK)) {
-            /* not present */
-            continue;
-        }
-
-        line_addr = start_line_addr | ((i & 0x1ff) << 21);
-        if (pde & PG_PSE_MASK) {
-            /* 2 MB page */
-            start_paddr = (pde & ~0x1fffff) & ~(0x1ULL << 63);
-            if (cpu_physical_memory_is_io(start_paddr)) {
-                /* I/O region */
-                continue;
-            }
-            start_vaddr = line_addr;
-            memory_mapping_list_add_merge_sorted(list, start_paddr,
-                                                 start_vaddr, 1 << 21);
-            continue;
-        }
-
-        pte_start_addr = (pde & PLM4_ADDR_MASK) & a20_mask;
-        walk_pte(list, as, pte_start_addr, a20_mask, line_addr);
-    }
-}
-
-/* 32-bit Paging */
-static void walk_pde2(MemoryMappingList *list, AddressSpace *as,
-                      hwaddr pde_start_addr, int32_t a20_mask,
-                      bool pse)
-{
-    hwaddr pde_addr, pte_start_addr, start_paddr, high_paddr;
-    uint32_t pde;
-    target_ulong line_addr, start_vaddr;
-    int i;
-
-    for (i = 0; i < 1024; i++) {
-        pde_addr = (pde_start_addr + i * 4) & a20_mask;
-        pde = address_space_ldl(as, pde_addr, MEMTXATTRS_UNSPECIFIED, NULL);
-        if (!(pde & PG_PRESENT_MASK)) {
-            /* not present */
-            continue;
-        }
-
-        line_addr = (((unsigned int)i & 0x3ff) << 22);
-        if ((pde & PG_PSE_MASK) && pse) {
-            /*
-             * 4 MB page:
-             * bits 39:32 are bits 20:13 of the PDE
-             * bit3 31:22 are bits 31:22 of the PDE
-             */
-            high_paddr = ((hwaddr)(pde & 0x1fe000) << 19);
-            start_paddr = (pde & ~0x3fffff) | high_paddr;
-            if (cpu_physical_memory_is_io(start_paddr)) {
-                /* I/O region */
-                continue;
-            }
-            start_vaddr = line_addr;
-            memory_mapping_list_add_merge_sorted(list, start_paddr,
-                                                 start_vaddr, 1 << 22);
-            continue;
-        }
-
-        pte_start_addr = (pde & ~0xfff) & a20_mask;
-        walk_pte2(list, as, pte_start_addr, a20_mask, line_addr);
-    }
-}
-
-/* PAE Paging */
-static void walk_pdpe2(MemoryMappingList *list, AddressSpace *as,
-                       hwaddr
pdpe_start_addr, int32_t a20_mask) -{ - hwaddr pdpe_addr, pde_start_addr; - uint64_t pdpe; - target_ulong line_addr; - int i; - - for (i = 0; i < 4; i++) { - pdpe_addr = (pdpe_start_addr + i * 8) & a20_mask; - pdpe = address_space_ldq(as, pdpe_addr, MEMTXATTRS_UNSPECIFIED, NULL); - if (!(pdpe & PG_PRESENT_MASK)) { - /* not present */ - continue; - } - - line_addr = (((unsigned int)i & 0x3) << 30); - pde_start_addr = (pdpe & ~0xfff) & a20_mask; - walk_pde(list, as, pde_start_addr, a20_mask, line_addr); - } -} +struct memory_mapping_data { + MemoryMappingList *list; +}; -#ifdef TARGET_X86_64 -/* IA-32e Paging */ -static void walk_pdpe(MemoryMappingList *list, AddressSpace *as, - hwaddr pdpe_start_addr, int32_t a20_mask, - target_ulong start_line_addr) +static int add_memory_mapping_to_list(CPUState *cs, void *data, DecodedPTE *pte, + int height, int offset, int mmu_idx, + const PageTableLayout *layout) { - hwaddr pdpe_addr, pde_start_addr, start_paddr; - uint64_t pdpe; - target_ulong line_addr, start_vaddr; - int i; - - for (i = 0; i < 512; i++) { - pdpe_addr = (pdpe_start_addr + i * 8) & a20_mask; - pdpe = address_space_ldq(as, pdpe_addr, MEMTXATTRS_UNSPECIFIED, NULL); - if (!(pdpe & PG_PRESENT_MASK)) { - /* not present */ - continue; - } - - line_addr = start_line_addr | ((i & 0x1ffULL) << 30); - if (pdpe & PG_PSE_MASK) { - /* 1 GB page */ - start_paddr = (pdpe & ~0x3fffffff) & ~(0x1ULL << 63); - if (cpu_physical_memory_is_io(start_paddr)) { - /* I/O region */ - continue; - } - start_vaddr = line_addr; - memory_mapping_list_add_merge_sorted(list, start_paddr, - start_vaddr, 1 << 30); - continue; - } - - pde_start_addr = (pdpe & PLM4_ADDR_MASK) & a20_mask; - walk_pde(list, as, pde_start_addr, a20_mask, line_addr); - } -} + struct memory_mapping_data *mm_data = (struct memory_mapping_data *) data; -/* IA-32e Paging */ -static void walk_pml4e(MemoryMappingList *list, AddressSpace *as, - hwaddr pml4e_start_addr, int32_t a20_mask, - target_ulong start_line_addr) -{ - 
hwaddr pml4e_addr, pdpe_start_addr; - uint64_t pml4e; - target_ulong line_addr; - int i; - - for (i = 0; i < 512; i++) { - pml4e_addr = (pml4e_start_addr + i * 8) & a20_mask; - pml4e = address_space_ldq(as, pml4e_addr, MEMTXATTRS_UNSPECIFIED, - NULL); - if (!(pml4e & PG_PRESENT_MASK)) { - /* not present */ - continue; - } + /* In the case of nested paging, give the real, host-physical mapping. */ + hwaddr start_paddr = pte->pte_host_addr; + size_t pg_size = pte->leaf_page_size; - line_addr = start_line_addr | ((i & 0x1ffULL) << 39); - pdpe_start_addr = (pml4e & PLM4_ADDR_MASK) & a20_mask; - walk_pdpe(list, as, pdpe_start_addr, a20_mask, line_addr); + /* This hook skips mappings for the I/O region */ + if (cpu_physical_memory_is_io(start_paddr)) { + /* I/O region */ + return 0; } -} - -static void walk_pml5e(MemoryMappingList *list, AddressSpace *as, - hwaddr pml5e_start_addr, int32_t a20_mask) -{ - hwaddr pml5e_addr, pml4e_start_addr; - uint64_t pml5e; - target_ulong line_addr; - int i; - - for (i = 0; i < 512; i++) { - pml5e_addr = (pml5e_start_addr + i * 8) & a20_mask; - pml5e = address_space_ldq(as, pml5e_addr, MEMTXATTRS_UNSPECIFIED, - NULL); - if (!(pml5e & PG_PRESENT_MASK)) { - /* not present */ - continue; - } - line_addr = (0x7fULL << 57) | ((i & 0x1ffULL) << 48); - pml4e_start_addr = (pml5e & PLM4_ADDR_MASK) & a20_mask; - walk_pml4e(list, as, pml4e_start_addr, a20_mask, line_addr); - } + memory_mapping_list_add_merge_sorted(mm_data->list, start_paddr, + pte->bits_translated, pg_size); + return 0; } -#endif bool x86_cpu_get_memory_mapping(CPUState *cs, MemoryMappingList *list, Error **errp) { - X86CPU *cpu = X86_CPU(cs); - CPUX86State *env = &cpu->env; - int32_t a20_mask; - - if (!cpu_paging_enabled(cs, 0)) { - /* paging is disabled */ - return true; - } - - a20_mask = x86_get_a20_mask(env); - if (env->cr[4] & CR4_PAE_MASK) { -#ifdef TARGET_X86_64 - if (env->hflags & HF_LMA_MASK) { - if (env->cr[4] & CR4_LA57_MASK) { - hwaddr pml5e_addr; - - pml5e_addr = 
(env->cr[3] & PLM4_ADDR_MASK) & a20_mask; - walk_pml5e(list, cs->as, pml5e_addr, a20_mask); - } else { - hwaddr pml4e_addr; - - pml4e_addr = (env->cr[3] & PLM4_ADDR_MASK) & a20_mask; - walk_pml4e(list, cs->as, pml4e_addr, a20_mask, - 0xffffULL << 48); - } - } else -#endif - { - hwaddr pdpe_addr; - - pdpe_addr = (env->cr[3] & ~0x1f) & a20_mask; - walk_pdpe2(list, cs->as, pdpe_addr, a20_mask); - } - } else { - hwaddr pde_addr; - bool pse; - - pde_addr = (env->cr[3] & ~0xfff) & a20_mask; - pse = !!(env->cr[4] & CR4_PSE_MASK); - walk_pde2(list, cs->as, pde_addr, a20_mask, pse); - } - - return true; + return for_each_pte(cs, &add_memory_mapping_to_list, list, false, false, + false, 0); } From patchwork Tue Jul 23 01:05:45 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Don Porter X-Patchwork-Id: 13739260 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 92660C3DA5D for ; Tue, 23 Jul 2024 01:07:07 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sW3yl-0002Gj-Lm; Mon, 22 Jul 2024 21:06:11 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sW3yk-0002Dp-Ow for qemu-devel@nongnu.org; Mon, 22 Jul 2024 21:06:10 -0400 Received: from mail-qt1-x82d.google.com ([2607:f8b0:4864:20::82d]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1sW3yi-0008Qp-HW for qemu-devel@nongnu.org; Mon, 22 Jul 2024 21:06:10 -0400 Received: by mail-qt1-x82d.google.com with SMTP id 
From: Don Porter <porter@cs.unc.edu>
To: qemu-devel@nongnu.org
Cc: dave@treblig.org, peter.maydell@linaro.org, nadav.amit@gmail.com, richard.henderson@linaro.org, philmd@linaro.org, berrange@redhat.com, Don Porter
Subject: [PATCH v4 7/7] Convert x86_mmu_translate() to use common code.
Date: Mon, 22 Jul 2024 21:05:45 -0400
Message-Id: <20240723010545.3648706-8-porter@cs.unc.edu>
In-Reply-To: <20240723010545.3648706-1-porter@cs.unc.edu>
References: <20240723010545.3648706-1-porter@cs.unc.edu>

Signed-off-by: Don Porter <porter@cs.unc.edu>
---
 target/i386/tcg/helper-tcg.h         |   2 +-
 target/i386/tcg/sysemu/excp_helper.c | 417 ++-------------------------
 2 files changed, 30 insertions(+), 389 deletions(-)

diff --git a/target/i386/tcg/helper-tcg.h b/target/i386/tcg/helper-tcg.h
index 1cbeab9161..8f63280359 100644
--- a/target/i386/tcg/helper-tcg.h
+++ b/target/i386/tcg/helper-tcg.h
@@ -96,7 +96,7 @@ void cpu_load_eflags(CPUX86State *env, int eflags, int update_mask);
 typedef struct TranslateFault {
     int exception_index;
     int error_code;
-    target_ulong cr2;
+    uint64_t cr2;
     TranslateFaultStage2 stage2;
 } TranslateFault;
 
diff --git a/target/i386/tcg/sysemu/excp_helper.c b/target/i386/tcg/sysemu/excp_helper.c
index 3ebb67d65b..37c33fdfb1 100644
--- a/target/i386/tcg/sysemu/excp_helper.c
+++ b/target/i386/tcg/sysemu/excp_helper.c
@@ -36,32 +36,9 @@ typedef struct TranslateParams {
 typedef struct TranslateResult {
     hwaddr paddr;
     int prot;
-    int page_size;
+    uint64_t page_size;
 } TranslateResult;
 
-static bool ptw_translate(PTETranslate *inout, hwaddr addr, uint64_t ra)
-{
-    CPUTLBEntryFull *full;
-    int flags;
-
-    inout->gaddr = addr;
-    flags = probe_access_full(inout->env, addr, 0, MMU_DATA_STORE,
-                              inout->ptw_idx, true, &inout->haddr, &full, ra);
-
-    if (unlikely(flags & TLB_INVALID_MASK)) {
-        TranslateFault *err = inout->err;
-
-        assert(inout->ptw_idx == MMU_NESTED_IDX);
-        *err = (TranslateFault){
-            .error_code = inout->env->error_code,
-            .cr2 = addr,
-            .stage2 = S2_GPT,
-        };
-        return false;
-    }
-    return true;
-}
-
 static inline uint32_t ptw_ldl(const PTETranslate *in, uint64_t ra)
 {
     if (likely(in->haddr)) {
@@ -102,371 +79,33 @@ static bool mmu_translate(CPUX86State *env, const TranslateParams *in,
                           uint64_t ra)
 {
     const target_ulong addr = in->addr;
-    const int pg_mode = in->pg_mode;
-    const bool is_user = is_mmu_index_user(in->mmu_idx);
-    const MMUAccessType access_type = in->access_type;
-    uint64_t ptep, pte, rsvd_mask;
-    PTETranslate pte_trans = {
-        .env = env,
-        .err = err,
-        .ptw_idx = in->ptw_idx,
-    };
-    hwaddr pte_addr, paddr;
-    uint32_t pkr;
-    int page_size;
-    int error_code;
-
- restart_all:
-    rsvd_mask = ~MAKE_64BIT_MASK(0, env_archcpu(env)->phys_bits);
-    rsvd_mask &= PG_ADDRESS_MASK;
-    if (!(pg_mode & PG_MODE_NXE)) {
-        rsvd_mask |= PG_NX_MASK;
-    }
-
-    if (pg_mode & PG_MODE_PAE) {
-#ifdef TARGET_X86_64
-        if (pg_mode & PG_MODE_LMA) {
-            if (pg_mode & PG_MODE_LA57) {
-                /*
-                 * Page table level 5
-                 */
-                pte_addr = (in->cr3 & ~0xfff) + (((addr >> 48) & 0x1ff) << 3);
-                if (!ptw_translate(&pte_trans, pte_addr, ra)) {
-                    return false;
-                }
-    restart_5:
-                pte = ptw_ldq(&pte_trans, ra);
-                if (!(pte & PG_PRESENT_MASK)) {
-                    goto do_fault;
-                }
-                if (pte & (rsvd_mask | PG_PSE_MASK)) {
-                    goto do_fault_rsvd;
-                }
-                if (!ptw_setl(&pte_trans, pte, PG_ACCESSED_MASK)) {
-                    goto restart_5;
-                }
-                ptep = pte ^ PG_NX_MASK;
-            } else {
-                pte = in->cr3;
-                ptep = PG_NX_MASK | PG_USER_MASK | PG_RW_MASK;
-            }
-
-            /*
-             * Page table level 4
-             */
-            pte_addr = (pte & PG_ADDRESS_MASK) + (((addr >> 39) & 0x1ff) << 3);
-            if (!ptw_translate(&pte_trans, pte_addr, ra)) {
-                return false;
-            }
-    restart_4:
-            pte = ptw_ldq(&pte_trans, ra);
-            if (!(pte & PG_PRESENT_MASK)) {
-                goto do_fault;
-            }
-            if (pte & (rsvd_mask | PG_PSE_MASK)) {
-                goto do_fault_rsvd;
-            }
-            if (!ptw_setl(&pte_trans, pte, PG_ACCESSED_MASK)) {
-                goto restart_4;
-            }
-            ptep &= pte ^ PG_NX_MASK;
-
-            /*
-             * Page table level 3
-             */
-            pte_addr = (pte & PG_ADDRESS_MASK) + (((addr >> 30) & 0x1ff) << 3);
-            if (!ptw_translate(&pte_trans, pte_addr, ra)) {
-                return false;
-            }
-    restart_3_lma:
-            pte = ptw_ldq(&pte_trans, ra);
-            if (!(pte & PG_PRESENT_MASK)) {
-                goto do_fault;
-            }
-            if (pte & rsvd_mask) {
-                goto do_fault_rsvd;
-            }
-            if (!ptw_setl(&pte_trans, pte, PG_ACCESSED_MASK)) {
-                goto restart_3_lma;
-            }
-            ptep &= pte ^ PG_NX_MASK;
-            if (pte & PG_PSE_MASK) {
-                /* 1 GB page */
-                page_size = 1024 * 1024 * 1024;
-                goto do_check_protect;
-            }
-        } else
-#endif
-        {
-            /*
-             * Page table level 3
-             */
-            pte_addr = (in->cr3 & 0xffffffe0ULL) + ((addr >> 27) & 0x18);
-            if (!ptw_translate(&pte_trans, pte_addr, ra)) {
-                return false;
-            }
-            rsvd_mask |= PG_HI_USER_MASK;
-    restart_3_nolma:
-            pte = ptw_ldq(&pte_trans, ra);
-            if (!(pte & PG_PRESENT_MASK)) {
-                goto do_fault;
-            }
-            if (pte & (rsvd_mask | PG_NX_MASK)) {
-                goto do_fault_rsvd;
-            }
-            if (!ptw_setl(&pte_trans, pte, PG_ACCESSED_MASK)) {
-                goto restart_3_nolma;
-            }
-            ptep = PG_NX_MASK | PG_USER_MASK | PG_RW_MASK;
-        }
-
-        /*
-         * Page table level 2
-         */
-        pte_addr = (pte & PG_ADDRESS_MASK) + (((addr >> 21) & 0x1ff) << 3);
-        if (!ptw_translate(&pte_trans, pte_addr, ra)) {
-            return false;
-        }
-    restart_2_pae:
-        pte = ptw_ldq(&pte_trans, ra);
-        if (!(pte & PG_PRESENT_MASK)) {
-            goto do_fault;
-        }
-        if (pte & rsvd_mask) {
-            goto do_fault_rsvd;
-        }
-        if (pte & PG_PSE_MASK) {
-            /* 2 MB page */
-            page_size = 2048 * 1024;
-            ptep &= pte ^ PG_NX_MASK;
-            goto do_check_protect;
-        }
-        if (!ptw_setl(&pte_trans, pte, PG_ACCESSED_MASK)) {
-            goto restart_2_pae;
-        }
-        ptep &= pte ^ PG_NX_MASK;
-
-        /*
-         * Page table level 1
-         */
-        pte_addr = (pte & PG_ADDRESS_MASK) + (((addr >> 12) & 0x1ff) << 3);
-        if (!ptw_translate(&pte_trans, pte_addr, ra)) {
-            return false;
-        }
-        pte = ptw_ldq(&pte_trans, ra);
-        if (!(pte & PG_PRESENT_MASK)) {
-            goto do_fault;
-        }
-        if (pte & rsvd_mask) {
-            goto do_fault_rsvd;
-        }
-        /* combine pde and pte nx, user and rw protections */
-        ptep &= pte ^ PG_NX_MASK;
-        page_size = 4096;
-    } else {
-        /*
-         * Page table level 2
-         */
-        pte_addr = (in->cr3 & 0xfffff000ULL) + ((addr >> 20) & 0xffc);
-        if (!ptw_translate(&pte_trans, pte_addr, ra)) {
-            return false;
-        }
-    restart_2_nopae:
-        pte = ptw_ldl(&pte_trans, ra);
-        if (!(pte & PG_PRESENT_MASK)) {
-            goto do_fault;
-        }
-        ptep = pte | PG_NX_MASK;
-
-        /* if PSE bit is set, then we use a 4MB page */
-        if ((pte & PG_PSE_MASK) && (pg_mode & PG_MODE_PSE)) {
-            page_size = 4096 * 1024;
-            /*
-             * Bits 20-13 provide bits 39-32 of the address, bit 21 is reserved.
-             * Leave bits 20-13 in place for setting accessed/dirty bits below.
-             */
-            pte = (uint32_t)pte | ((pte & 0x1fe000LL) << (32 - 13));
-            rsvd_mask = 0x200000;
-            goto do_check_protect_pse36;
-        }
-        if (!ptw_setl(&pte_trans, pte, PG_ACCESSED_MASK)) {
-            goto restart_2_nopae;
-        }
-
-        /*
-         * Page table level 1
-         */
-        pte_addr = (pte & ~0xfffu) + ((addr >> 10) & 0xffc);
-        if (!ptw_translate(&pte_trans, pte_addr, ra)) {
-            return false;
-        }
-        pte = ptw_ldl(&pte_trans, ra);
-        if (!(pte & PG_PRESENT_MASK)) {
-            goto do_fault;
-        }
-        /* combine pde and pte user and rw protections */
-        ptep &= pte | PG_NX_MASK;
-        page_size = 4096;
-        rsvd_mask = 0;
-    }
-
-do_check_protect:
-    rsvd_mask |= (page_size - 1) & PG_ADDRESS_MASK & ~PG_PSE_PAT_MASK;
-do_check_protect_pse36:
-    if (pte & rsvd_mask) {
-        goto do_fault_rsvd;
-    }
-    ptep ^= PG_NX_MASK;
-
-    /* can the page can be put in the TLB?  prot will tell us */
-    if (is_user && !(ptep & PG_USER_MASK)) {
-        goto do_fault_protect;
-    }
-
-    int prot = 0;
-    if (!is_mmu_index_smap(in->mmu_idx) || !(ptep & PG_USER_MASK)) {
-        prot |= PAGE_READ;
-        if ((ptep & PG_RW_MASK) || !(is_user || (pg_mode & PG_MODE_WP))) {
-            prot |= PAGE_WRITE;
-        }
-    }
-    if (!(ptep & PG_NX_MASK) &&
-        (is_user ||
-         !((pg_mode & PG_MODE_SMEP) && (ptep & PG_USER_MASK)))) {
-        prot |= PAGE_EXEC;
-    }
-
-    if (ptep & PG_USER_MASK) {
-        pkr = pg_mode & PG_MODE_PKE ? env->pkru : 0;
-    } else {
-        pkr = pg_mode & PG_MODE_PKS ? env->pkrs : 0;
-    }
-    if (pkr) {
-        uint32_t pk = (pte & PG_PKRU_MASK) >> PG_PKRU_BIT;
-        uint32_t pkr_ad = (pkr >> pk * 2) & 1;
-        uint32_t pkr_wd = (pkr >> pk * 2) & 2;
-        uint32_t pkr_prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
-
-        if (pkr_ad) {
-            pkr_prot &= ~(PAGE_READ | PAGE_WRITE);
-        } else if (pkr_wd && (is_user || (pg_mode & PG_MODE_WP))) {
-            pkr_prot &= ~PAGE_WRITE;
-        }
-        if ((pkr_prot & (1 << access_type)) == 0) {
-            goto do_fault_pk_protect;
-        }
-        prot &= pkr_prot;
-    }
-
-    if ((prot & (1 << access_type)) == 0) {
-        goto do_fault_protect;
-    }
-
-    /* yes, it can! */
-    {
-        uint32_t set = PG_ACCESSED_MASK;
-        if (access_type == MMU_DATA_STORE) {
-            set |= PG_DIRTY_MASK;
-        } else if (!(pte & PG_DIRTY_MASK)) {
-            /*
-             * Only set write access if already dirty...
-             * otherwise wait for dirty access.
-             */
-            prot &= ~PAGE_WRITE;
-        }
-        if (!ptw_setl(&pte_trans, pte, set)) {
-            /*
-             * We can arrive here from any of 3 levels and 2 formats.
-             * The only safe thing is to restart the entire lookup.
-             */
-            goto restart_all;
-        }
+    hwaddr paddr;
+    CPUState *cs = env_cpu(env);
+    bool dirty = false;
+
+    bool ok = x86_ptw_translate(cs, addr, &paddr, false,
+                                in->ptw_idx == MMU_NESTED_IDX ? 1 : 0,
+                                is_mmu_index_user(in->mmu_idx), in->access_type,
+                                &out->page_size, &err->error_code,
+                                (hwaddr *) &err->cr2, &err->stage2, &out->prot,
+                                &dirty);
+    if (!ok) {
+
+        err->exception_index = EXCP0E_PAGE;
+        return false;
     }
 
-    /* merge offset within page */
-    paddr = (pte & PG_ADDRESS_MASK & ~(page_size - 1)) | (addr & (page_size - 1));
-
     /*
-     * Note that NPT is walked (for both paging structures and final guest
-     * addresses) using the address with the A20 bit set.
+     * Only set write access if already dirty...
+     * otherwise wait for dirty access.
      */
-    if (in->ptw_idx == MMU_NESTED_IDX) {
-        CPUTLBEntryFull *full;
-        int flags, nested_page_size;
-
-        flags = probe_access_full(env, paddr, 0, access_type,
-                                  MMU_NESTED_IDX, true,
-                                  &pte_trans.haddr, &full, 0);
-        if (unlikely(flags & TLB_INVALID_MASK)) {
-            *err = (TranslateFault){
-                .error_code = env->error_code,
-                .cr2 = paddr,
-                .stage2 = S2_GPA,
-            };
-            return false;
-        }
-
-        /* Merge stage1 & stage2 protection bits. */
-        prot &= full->prot;
-
-        /* Re-verify resulting protection. */
-        if ((prot & (1 << access_type)) == 0) {
-            goto do_fault_protect;
-        }
-
-        /* Merge stage1 & stage2 addresses to final physical address. */
-        nested_page_size = 1 << full->lg_page_size;
-        paddr = (full->phys_addr & ~(nested_page_size - 1))
-              | (paddr & (nested_page_size - 1));
-
-        /*
-         * Use the larger of stage1 & stage2 page sizes, so that
-         * invalidation works.
-         */
-        if (nested_page_size > page_size) {
-            page_size = nested_page_size;
-        }
+    if (in->access_type != MMU_DATA_STORE && !dirty) {
+        out->prot &= ~PAGE_WRITE;
     }
 
     out->paddr = paddr & x86_get_a20_mask(env);
-    out->prot = prot;
-    out->page_size = page_size;
-    return true;
 
- do_fault_rsvd:
-    error_code = PG_ERROR_RSVD_MASK;
-    goto do_fault_cont;
- do_fault_protect:
-    error_code = PG_ERROR_P_MASK;
-    goto do_fault_cont;
- do_fault_pk_protect:
-    assert(access_type != MMU_INST_FETCH);
-    error_code = PG_ERROR_PK_MASK | PG_ERROR_P_MASK;
-    goto do_fault_cont;
- do_fault:
-    error_code = 0;
- do_fault_cont:
-    if (is_user) {
-        error_code |= PG_ERROR_U_MASK;
-    }
-    switch (access_type) {
-    case MMU_DATA_LOAD:
-        break;
-    case MMU_DATA_STORE:
-        error_code |= PG_ERROR_W_MASK;
-        break;
-    case MMU_INST_FETCH:
-        if (pg_mode & (PG_MODE_NXE | PG_MODE_SMEP)) {
-            error_code |= PG_ERROR_I_D_MASK;
-        }
-        break;
-    }
-    *err = (TranslateFault){
-        .exception_index = EXCP0E_PAGE,
-        .error_code = error_code,
-        .cr2 = addr,
-    };
-    return false;
+    return true;
 }
 
 static G_NORETURN void raise_stage2(CPUX86State *env, TranslateFault *err,
@@ -491,10 +130,11 @@ static G_NORETURN void raise_stage2(CPUX86State *env, TranslateFault *err,
     cpu_vmexit(env, SVM_EXIT_NPF, exit_info_1, retaddr);
 }
 
-static bool get_physical_address(CPUX86State *env, vaddr addr,
-                                 MMUAccessType access_type, int mmu_idx,
-                                 TranslateResult *out, TranslateFault *err,
-                                 uint64_t ra)
+static
+bool x86_cpu_get_physical_address(CPUX86State *env, vaddr addr,
+                                  MMUAccessType access_type, int mmu_idx,
+                                  TranslateResult *out,
+                                  TranslateFault *err, uint64_t ra)
 {
     TranslateParams in;
     bool use_stage2 = env->hflags2 & HF2_NPT_MASK;
@@ -511,7 +151,8 @@ static bool get_physical_address(CPUX86State *env, vaddr addr,
         in.cr3 = env->nested_cr3;
         in.pg_mode = env->nested_pg_mode;
         in.mmu_idx =
-            env->nested_pg_mode & PG_MODE_LMA ? MMU_USER64_IDX : MMU_USER32_IDX;
+            env->nested_pg_mode & PG_MODE_LMA ?
+            MMU_USER64_IDX : MMU_USER32_IDX;
         in.ptw_idx = MMU_PHYS_IDX;
 
         if (!mmu_translate(env, &in, out, err, ra)) {
@@ -565,8 +206,8 @@ bool x86_cpu_tlb_fill(CPUState *cs, vaddr addr, int size,
     TranslateResult out;
     TranslateFault err;
 
-    if (get_physical_address(env, addr, access_type, mmu_idx, &out, &err,
-                             retaddr)) {
+    if (x86_cpu_get_physical_address(env, addr, access_type, mmu_idx, &out,
+                                     &err, retaddr)) {
        /*
         * Even if 4MB pages, we map only one 4KB page in the cache to
        * avoid filling it too fast.