From patchwork Tue Oct 22 10:56:02 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Alex_Benn=C3=A9e?= X-Patchwork-Id: 13845547 Received: from mail-ed1-f53.google.com (mail-ed1-f53.google.com [209.85.208.53]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E8BB419C543 for ; Tue, 22 Oct 2024 10:56:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.208.53 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729594590; cv=none; b=rd9Dq5L8BtNEXRObyqW9lMaluveQshpddBJHY7t0ekQGmSVLR+DY44gh/q8XvNAAmxJ5jecXDdng/eEOm4jN8VRNIB9LW5FvyqUYMsSQFyRpAtP57i0UzaH6NnHRfhi9HEVCU67J2lI/kRVznn1yzmk7xYmZn56uRtd1nSyfFv4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729594590; c=relaxed/simple; bh=k4DUf5ld0t+Rj337VJF2t4M25SBJ9Xw6VHcscakA8eI=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version:Content-Type; b=qLhswaFaLwsh5eTsUoG3gUEVpZz6xslsoKNkbkUCHUmmI/MiB6guk5GWVXoyfnWt3agV62rO2cTabFYC+3RxjNQm1KYrw3fysqkbUgz14LceRFgZAx3qrOUIEPY1cPSeVtGDoPpMljRlNv71y0hGrqf9+LE6xULeE1LxqPdQD48= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linaro.org; spf=pass smtp.mailfrom=linaro.org; dkim=pass (2048-bit key) header.d=linaro.org header.i=@linaro.org header.b=p+ENd5Vt; arc=none smtp.client-ip=209.85.208.53 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linaro.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linaro.org header.i=@linaro.org header.b="p+ENd5Vt" Received: by mail-ed1-f53.google.com with SMTP id 4fb4d7f45d1cf-5c9362c26d8so11257442a12.1 for ; Tue, 22 Oct 2024 03:56:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1729594586; x=1730199386; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=CnSO5MY5LW+ONfwtsvf8QoavFQHBZ1D8g95DfEWZbMI=; b=p+ENd5VtXICbWO6JW0bCZ4tbpeq5mAKAsa/ypnza2ffgkBGTD0KPdALhEC8YcvF6fD jyIBW5t2mfd9aPSWtzBrx4GdQOAJltPH/D10aocPDQt4KddOKLXFps2JW6BpBJKZNbNy atNDALJ/+Adw5UHZM5ZkTX0NNLW8gonouF3vFprpylIQDX37zVJfWl4f9+tbxHviOO4f zHapx5OJC1ErhsCu6Hr0PbE7OP6T+liNML+uBhR9BF1C4OLqXTbvDgMOJvpl3FRG//Db 2tb4otUL+SmiuGAOxklV1arM2Zd8pyfslpfwrr25DB/wZW+6HZ8KML8PUMBKPn67N0G2 R7rQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729594586; x=1730199386; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=CnSO5MY5LW+ONfwtsvf8QoavFQHBZ1D8g95DfEWZbMI=; b=EpXcCiOh7cT1edyNFiFe5s+s2tTyc9c2xHqhKObjyRqFp83MyOlk8r8eAetXx6gUBw X9Qe7xCB4zHgl8kpkDOWC/7m9vjZ4fz8D+eU5rck4lnyKOoLIv/gRdBmatx9CIlZHrk1 5lTSjT9+jOD9KlFWu6K1SZZeFxBtMHIYxsk6epYw/W/Ba3eO6/FlTZV59TwFSQEKKwrF Hk8Vt1AZ4RpyxkTME9td84miIEbP3XmjB5vE09ttMtQySpQgeTu9U0zGpJew1NO3P9vX pXH8umwVuook7ztKZRxMOVdsa7PRzP9Rrlyz4dUuJIkOTy19mbdpvwn1PygGG9zsul81 hFnQ== X-Forwarded-Encrypted: i=1; AJvYcCWMZkKOuZ8eQN2ttD6NOdxCmpLcziBx1316ACnvLK/5p09TaUORo7shIrfHblpJ3vmKuws=@vger.kernel.org X-Gm-Message-State: AOJu0YzKvHB0KUNhx1uOxJsnigbprh4tmLsauBSZk7uLtzkQkN5Wga7w oRO7O/behx5LlKq2MGc5E2ztDLlLqKNEhG61KPjXnoqBCRD9v5D7bEZny8PjSXk= X-Google-Smtp-Source: AGHT+IE/K5pr2g0U8BMqLhyIWw7NwWY4cgjvbPeZV6ZYZQj/JnrnlV+snPJRyVIL9w1vqQwf5cuM+A== X-Received: by 2002:a17:907:944e:b0:a9a:3f9c:f89d with SMTP id a640c23a62f3a-a9aaa5ee850mr282522466b.34.1729594586066; Tue, 22 Oct 2024 03:56:26 -0700 (PDT) Received: from draig.lan ([85.9.250.243]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a9a91559c06sm324381166b.138.2024.10.22.03.56.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 22 Oct 2024 03:56:19 -0700 (PDT) Received: from draig.lan (localhost [IPv6:::1]) by draig.lan (Postfix) with ESMTP id 5CB715F9D0; Tue, 22 Oct 2024 11:56:15 +0100 (BST) From: =?utf-8?q?Alex_Benn=C3=A9e?= To: qemu-devel@nongnu.org Cc: Beraldo Leal , Laurent Vivier , Wainer dos Santos Moschetta , Mahmoud Mandour , Jiaxun Yang , =?utf-8?q?Alex_Benn=C3=A9e?= , Yanan Wang , Thomas Huth , John Snow , =?utf-8?q?Marc-Andr=C3=A9_Lureau?= , qemu-arm@nongnu.org, =?utf-8?q?Daniel_P=2E_Berrang=C3=A9?= , Eduardo Habkost , devel@lists.libvirt.org, Cleber Rosa , kvm@vger.kernel.org, =?utf-8?q?Philippe_Mat?= =?utf-8?q?hieu-Daud=C3=A9?= , Alexandre Iooss , Peter Maydell , Richard Henderson , Riku Voipio , Zhao Liu , Marcelo Tosatti , "Edgar E. Iglesias" , Marcel Apfelbaum , Pierrick Bouvier , Paolo Bonzini Subject: [PATCH v2 08/20] accel/tcg: add tracepoints for cpu_loop_exit_atomic Date: Tue, 22 Oct 2024 11:56:02 +0100 Message-Id: <20241022105614.839199-9-alex.bennee@linaro.org> X-Mailer: git-send-email 2.39.5 In-Reply-To: <20241022105614.839199-1-alex.bennee@linaro.org> References: <20241022105614.839199-1-alex.bennee@linaro.org> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 We try to avoid using cpu_loop_exit_atomic as it brings in an all-core sync point. However on some cpu/kernel/benchmark combinations it is starting to show up in the performance profile. To make it easier to see whats going on add tracepoints for the slow path so we can see what is triggering the wait. It seems for a modern CPU it can be quite a bit, for example: ./qemu-system-aarch64 \ -machine type=virt,virtualization=on,pflash0=rom,pflash1=efivars,gic-version=max \ -smp 4 \ -accel tcg \ -device virtio-net-pci,netdev=unet \ -device virtio-scsi-pci \ -device scsi-hd,drive=hd \ -netdev user,id=unet,hostfwd=tcp::2222-:22 \ -blockdev driver=raw,node-name=hd,file.driver=host_device,file.filename=/dev/zen-ssd2/trixie-arm64,discard=unmap \ -serial mon:stdio \ -blockdev node-name=rom,driver=file,filename=(pwd)/pc-bios/edk2-aarch64-code.fd,read-only=true \ -blockdev node-name=efivars,driver=file,filename=$HOME/images/qemu-arm64-efivars \ -m 8192 \ -object memory-backend-memfd,id=mem,size=8G,share=on \ -kernel /home/alex/lsrc/linux.git/builds/arm64/arch/arm64/boot/Image -append "root=/dev/sda2 console=ttyAMA0 systemd.unit=benchmark-stress-ng.service" \ -display none -d trace:load_atom\*_fallback,trace:store_atom\*_fallback With: -cpu neoverse-v1,pauth-impdef=on => 2203343 With: -cpu cortex-a76 => 0 Signed-off-by: Alex Bennée Reviewed-by: Richard Henderson Reviewed-by: Pierrick Bouvier --- v2 - 0x prefixes for ra as per checkpatch --- accel/tcg/user-exec.c | 2 +- accel/tcg/ldst_atomicity.c.inc | 9 +++++++++ accel/tcg/trace-events | 12 ++++++++++++ 3 files changed, 22 insertions(+), 1 deletion(-) diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c index 51b2c16dbe..aa8af52cc3 100644 --- a/accel/tcg/user-exec.c +++ b/accel/tcg/user-exec.c @@ -29,7 +29,7 @@ #include "exec/page-protection.h" #include "exec/helper-proto.h" #include "qemu/atomic128.h" -#include "trace/trace-root.h" +#include "trace.h" #include "tcg/tcg-ldst.h" #include "internal-common.h" #include "internal-target.h" diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc index 134da3c1da..c735add261 100644 --- a/accel/tcg/ldst_atomicity.c.inc +++ b/accel/tcg/ldst_atomicity.c.inc @@ -168,6 +168,7 @@ static uint64_t load_atomic8_or_exit(CPUState *cpu, uintptr_t ra, void *pv) #endif /* Ultimate fallback: re-execute in serial context. */ + trace_load_atom8_or_exit_fallback(ra); cpu_loop_exit_atomic(cpu, ra); } @@ -212,6 +213,7 @@ static Int128 load_atomic16_or_exit(CPUState *cpu, uintptr_t ra, void *pv) } /* Ultimate fallback: re-execute in serial context. */ + trace_load_atom16_or_exit_fallback(ra); cpu_loop_exit_atomic(cpu, ra); } @@ -519,6 +521,7 @@ static uint64_t load_atom_8(CPUState *cpu, uintptr_t ra, if (HAVE_al8) { return load_atom_extract_al8x2(pv); } + trace_load_atom8_fallback(memop, ra); cpu_loop_exit_atomic(cpu, ra); default: g_assert_not_reached(); @@ -563,6 +566,7 @@ static Int128 load_atom_16(CPUState *cpu, uintptr_t ra, break; case MO_64: if (!HAVE_al8) { + trace_load_atom16_fallback(memop, ra); cpu_loop_exit_atomic(cpu, ra); } a = load_atomic8(pv); @@ -570,6 +574,7 @@ static Int128 load_atom_16(CPUState *cpu, uintptr_t ra, break; case -MO_64: if (!HAVE_al8) { + trace_load_atom16_fallback(memop, ra); cpu_loop_exit_atomic(cpu, ra); } a = load_atom_extract_al8x2(pv); @@ -897,6 +902,7 @@ static void store_atom_2(CPUState *cpu, uintptr_t ra, g_assert_not_reached(); } + trace_store_atom2_fallback(memop, ra); cpu_loop_exit_atomic(cpu, ra); } @@ -961,6 +967,7 @@ static void store_atom_4(CPUState *cpu, uintptr_t ra, return; } } + trace_store_atom4_fallback(memop, ra); cpu_loop_exit_atomic(cpu, ra); default: g_assert_not_reached(); @@ -1029,6 +1036,7 @@ static void store_atom_8(CPUState *cpu, uintptr_t ra, default: g_assert_not_reached(); } + trace_store_atom8_fallback(memop, ra); cpu_loop_exit_atomic(cpu, ra); } @@ -1107,5 +1115,6 @@ static void store_atom_16(CPUState *cpu, uintptr_t ra, default: g_assert_not_reached(); } + trace_store_atom16_fallback(memop, ra); cpu_loop_exit_atomic(cpu, ra); } diff --git a/accel/tcg/trace-events b/accel/tcg/trace-events index 4e9b450520..14f638810c 100644 --- a/accel/tcg/trace-events +++ b/accel/tcg/trace-events @@ -12,3 +12,15 @@ memory_notdirty_set_dirty(uint64_t vaddr) "0x%" PRIx64 # translate-all.c translate_block(void *tb, uintptr_t pc, const void *tb_code) "tb:%p, pc:0x%"PRIxPTR", tb_code:%p" + +# ldst_atomicity +load_atom2_fallback(uint32_t memop, uintptr_t ra) "mop:0x%"PRIx32", ra:0x%"PRIxPTR"" +load_atom4_fallback(uint32_t memop, uintptr_t ra) "mop:0x%"PRIx32", ra:0x%"PRIxPTR"" +load_atom8_or_exit_fallback(uintptr_t ra) "ra:0x%"PRIxPTR"" +load_atom8_fallback(uint32_t memop, uintptr_t ra) "mop:0x%"PRIx32", ra:0x%"PRIxPTR"" +load_atom16_fallback(uint32_t memop, uintptr_t ra) "mop:0x%"PRIx32", ra:0x%"PRIxPTR"" +load_atom16_or_exit_fallback(uintptr_t ra) "ra:0x%"PRIxPTR"" +store_atom2_fallback(uint32_t memop, uintptr_t ra) "mop:0x%"PRIx32", ra:0x%"PRIxPTR"" +store_atom4_fallback(uint32_t memop, uintptr_t ra) "mop:0x%"PRIx32", ra:0x%"PRIxPTR"" +store_atom8_fallback(uint32_t memop, uintptr_t ra) "mop:0x%"PRIx32", ra:0x%"PRIxPTR"" +store_atom16_fallback(uint32_t memop, uintptr_t ra) "mop:0x%"PRIx32", ra:0x%"PRIxPTR""