From: Alexander Graf
Subject: [PATCH 12/15] tracing: Recover trace buffers from kexec handover
Date: Wed, 13 Dec 2023 00:04:49 +0000
Message-ID: <20231213000452.88295-13-graf@amazon.com>
In-Reply-To: <20231213000452.88295-1-graf@amazon.com>

When kexec handover (KHO) is in place, we now know the location of all
buffers that previously backed the ftrace rings. With this patch applied,
ftrace reassembles any new trace buffer that carries the same name as a
previous one from the same data pages that the previous buffer used. That
way, a buffer that was in place before kexec becomes readable again after
kexec as soon as it is initialized with the same name.
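
To make the lookup and the pairing rules that the diff below relies on easier to follow, here is a minimal user-space sketch. It assumes a simplified struct kho_mem: the patch only reads its .addr field, and the .len field is an assumption. The helper names kho_cpu_path(), kho_nr_pages(), kho_bpage() and kho_data_page() are hypothetical and do not exist in the kernel. The sketch shows how a buffer name and CPU number select a device-tree node, and how that node's "mem" array of (buffer_page, data page) pairs decides how many pages the revived buffer gets.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Simplified, hypothetical stand-in for the series' struct kho_mem.
 * The patch only reads .addr; .len is an assumption for illustration.
 */
struct kho_mem {
	uint64_t addr;
	uint64_t len;
};

/* User-space mirror of the patch's struct trace_kho_cpu. */
struct trace_kho_cpu {
	const struct kho_mem *mem;
	uint32_t nr_mems;
};

/*
 * Build the per-CPU FDT path that trace_kho_read_cpu() looks up.
 * The CPU number is formatted in hex ("%x"), so CPU 10 maps to "cpua".
 * Returns 0 on success, -1 if the path does not fit in buf.
 */
static int kho_cpu_path(char *buf, size_t len, const char *name, int cpu)
{
	int n = snprintf(buf, len, "/ftrace/%s/buffer/cpu%x", name, cpu);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Mirror of the reuse decision in __ring_buffer_alloc() and
 * trace_rb_cpu_prepare(): "mem" must hold (buffer_page, data page)
 * pairs, and the patch only reuses it with more than four entries.
 * Returns the number of ring-buffer pages to allocate, or 0 to fall
 * back to a freshly allocated buffer.
 */
static long kho_nr_pages(const struct trace_kho_cpu *kho)
{
	if (kho->nr_mems & 1)		/* odd count: the pairing is broken */
		return 0;
	if (kho->nr_mems <= 4)		/* the patch requires more than 4 entries */
		return 0;
	return kho->nr_mems / 2;	/* one ring-buffer page per pair */
}

/* Pair i: buffer_page metadata at mem[2 * i], data page at mem[2 * i + 1]. */
static const struct kho_mem *kho_bpage(const struct trace_kho_cpu *kho, long i)
{
	assert(i >= 0 && 2 * i + 1 < (long)kho->nr_mems);
	return &kho->mem[2 * i];
}

static const struct kho_mem *kho_data_page(const struct trace_kho_cpu *kho, long i)
{
	assert(i >= 0 && 2 * i + 1 < (long)kho->nr_mems);
	return &kho->mem[2 * i + 1];
}
```

For example, a node with six "mem" entries yields three ring-buffer pages, and kho_cpu_path(buf, sizeof(buf), "trace", 10) produces "/ftrace/trace/buffer/cpua". Keeping metadata and data page interleaved means a single running index walks both arrays in step, which is how trace_kho_replace_buffers() consumes kho->mem[i++] twice per list entry.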
Signed-off-by: Alexander Graf
---
 kernel/trace/ring_buffer.c | 173 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 171 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 691d1236eeb1..f3d07cb90762 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -575,6 +575,28 @@ struct ring_buffer_iter {
 	int				missed_events;
 };
 
+struct trace_kho_cpu {
+	const struct kho_mem *mem;
+	uint32_t nr_mems;
+};
+
+#ifdef CONFIG_FTRACE_KHO
+static int trace_kho_replace_buffers(struct ring_buffer_per_cpu *cpu_buffer,
+				     struct trace_kho_cpu *kho);
+static int trace_kho_read_cpu(const char *name, int cpu, struct trace_kho_cpu *kho);
+#else
+static int trace_kho_replace_buffers(struct ring_buffer_per_cpu *cpu_buffer,
+				     struct trace_kho_cpu *kho)
+{
+	return -EINVAL;
+}
+
+static int trace_kho_read_cpu(const char *name, int cpu, struct trace_kho_cpu *kho)
+{
+	return -EINVAL;
+}
+#endif
+
 #ifdef RB_TIME_32
 
 /*
@@ -1807,10 +1829,12 @@ struct trace_buffer *__ring_buffer_alloc(const char *name,
 					unsigned long size, unsigned flags,
 					struct lock_class_key *key)
 {
+	int cpu = raw_smp_processor_id();
+	struct trace_kho_cpu kho = {};
 	struct trace_buffer *buffer;
+	bool use_kho = false;
 	long nr_pages;
 	int bsize;
-	int cpu;
 	int ret;
 
 	/* keep it in its own cache line */
@@ -1823,6 +1847,12 @@ struct trace_buffer *__ring_buffer_alloc(const char *name,
 		goto fail_free_buffer;
 
 	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
+	if (!trace_kho_read_cpu(name, cpu, &kho) && kho.nr_mems > 4) {
+		nr_pages = kho.nr_mems / 2;
+		use_kho = true;
+		pr_debug("Using kho for buffer '%s' on CPU [%03d]", name, cpu);
+	}
+
 	buffer->flags = flags;
 	buffer->clock = trace_clock_local;
 	buffer->reader_lock_key = key;
@@ -1843,12 +1873,14 @@ struct trace_buffer *__ring_buffer_alloc(const char *name,
 	if (!buffer->buffers)
 		goto fail_free_cpumask;
 
-	cpu = raw_smp_processor_id();
 	cpumask_set_cpu(cpu, buffer->cpumask);
 	buffer->buffers[cpu] = rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
 	if (!buffer->buffers[cpu])
 		goto fail_free_buffers;
 
+	if (use_kho && trace_kho_replace_buffers(buffer->buffers[cpu], &kho))
+		pr_warn("Could not revive all previous trace data");
+
 	ret = cpuhp_state_add_instance(CPUHP_TRACE_RB_PREPARE, &buffer->node);
 	if (ret < 0)
 		goto fail_free_buffers;
@@ -5886,7 +5918,9 @@ EXPORT_SYMBOL_GPL(ring_buffer_read_page);
  */
 int trace_rb_cpu_prepare(unsigned int cpu, struct hlist_node *node)
 {
+	struct trace_kho_cpu kho = {};
 	struct trace_buffer *buffer;
+	bool use_kho = false;
 	long nr_pages_same;
 	int cpu_i;
 	unsigned long nr_pages;
@@ -5910,6 +5944,12 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct hlist_node *node)
 	/* allocate minimum pages, user can later expand it */
 	if (!nr_pages_same)
 		nr_pages = 2;
+
+	if (!trace_kho_read_cpu(buffer->name, cpu, &kho) && kho.nr_mems > 4) {
+		nr_pages = kho.nr_mems / 2;
+		use_kho = true;
+	}
+
 	buffer->buffers[cpu] =
 		rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
 	if (!buffer->buffers[cpu]) {
@@ -5917,12 +5957,141 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct hlist_node *node)
 		     cpu);
 		return -ENOMEM;
 	}
+
+	if (use_kho && trace_kho_replace_buffers(buffer->buffers[cpu], &kho))
+		pr_warn("Could not revive all previous trace data");
+
 	smp_wmb();
 	cpumask_set_cpu(cpu, buffer->cpumask);
 	return 0;
 }
 
 #ifdef CONFIG_FTRACE_KHO
+static int trace_kho_replace_buffers(struct ring_buffer_per_cpu *cpu_buffer,
+				     struct trace_kho_cpu *kho)
+{
+	bool first_loop = true;
+	struct list_head *tmp;
+	int err = 0;
+	int i = 0;
+
+	if (kho->nr_mems != cpu_buffer->nr_pages * 2)
+		return -EINVAL;
+
+	for (tmp = rb_list_head(cpu_buffer->pages);
+	     tmp != rb_list_head(cpu_buffer->pages) || first_loop;
+	     tmp = rb_list_head(tmp->next), first_loop = false) {
+		struct buffer_page *bpage = (struct buffer_page *)tmp;
+		const struct kho_mem *mem_bpage = &kho->mem[i++];
+		const struct kho_mem *mem_page = &kho->mem[i++];
+		const uint64_t rb_page_head = 1;
+		struct buffer_page *old_bpage;
+		void *old_page;
+
+		old_bpage = __va(mem_bpage->addr);
+		if (!bpage)
+			goto out;
+
+		if ((ulong)old_bpage->list.next & rb_page_head) {
+			struct list_head *new_lhead;
+			struct buffer_page *new_head;
+
+			new_lhead = rb_list_head(bpage->list.next);
+			new_head = (struct buffer_page *)new_lhead;
+
+			/* Assume the buffer is completely full */
+			cpu_buffer->tail_page = bpage;
+			cpu_buffer->commit_page = bpage;
+			/* Set the head pointers to what they were before */
+			cpu_buffer->head_page->list.prev->next = (struct list_head *)
+				((ulong)cpu_buffer->head_page->list.prev->next & ~rb_page_head);
+			cpu_buffer->head_page = new_head;
+			bpage->list.next = (struct list_head *)((ulong)new_lhead | rb_page_head);
+		}
+
+		if (rb_page_entries(old_bpage) || rb_page_write(old_bpage)) {
+			/*
+			 * We want to recycle the pre-kho page, it contains
+			 * trace data. To do so, we unreserve it and swap the
+			 * current data page with the pre-kho one
+			 */
+			old_page = kho_claim_mem(mem_page);
+
+			/* Recycle the old page, it contains data */
+			free_page((ulong)bpage->page);
+			bpage->page = old_page;
+
+			bpage->write = old_bpage->write;
+			bpage->entries = old_bpage->entries;
+			bpage->real_end = old_bpage->real_end;
+
+			local_inc(&cpu_buffer->pages_touched);
+		} else {
+			kho_return_mem(mem_page);
+		}
+
+		kho_return_mem(mem_bpage);
+	}
+
+out:
+	return err;
+}
+
+static int trace_kho_read_cpu(const char *name, int cpu,
+			      struct trace_kho_cpu *kho)
+{
+	void *fdt = kho_get_fdt();
+	int mem_len;
+	int err = 0;
+	char *path;
+	int off;
+
+	if (!fdt)
+		return -ENOENT;
+
+	if (!kho)
+		return -EINVAL;
+
+	path = kasprintf(GFP_KERNEL, "/ftrace/%s/buffer/cpu%x", name, cpu);
+	if (!path)
+		return -ENOMEM;
+
+	pr_debug("Trying to revive trace buffer '%s'", path);
+
+	off = fdt_path_offset(fdt, path);
+	if (off < 0) {
+		pr_debug("Could not find '%s' in DT", path);
+		err = -ENOENT;
+		goto out;
+	}
+
+	err = fdt_node_check_compatible(fdt, off, "ftrace,cpu-v1");
+	if (err) {
+		pr_warn("Node '%s' has invalid compatible", path);
+		err = -EINVAL;
+		goto out;
+	}
+
+	kho->mem = fdt_getprop(fdt, off, "mem", &mem_len);
+	if (!kho->mem) {
+		pr_warn("Node '%s' has invalid mem property", path);
+		err = -EINVAL;
+		goto out;
+	}
+
+	kho->nr_mems = mem_len / sizeof(*kho->mem);
+
+	/* Should follow "bpage 0, page 0, bpage 1, page 1, ..." pattern */
+	if ((kho->nr_mems & 1)) {
+		err = -EINVAL;
+		goto out;
+	}
+
+out:
+	kfree(path);
+	return err;
+}
+
 static int trace_kho_write_cpu(void *fdt, struct trace_buffer *buffer, int cpu)
 {
 	int i = 0;