From patchwork Wed Oct 2 16:07:22 2024
X-Patchwork-Submitter: Andrey Ryabinin
X-Patchwork-Id: 13820019
From: Andrey Ryabinin
To: linux-kernel@vger.kernel.org
Cc: Alexander Graf, James Gowans, Mike Rapoport, Andrew Morton, linux-mm@kvack.org, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, "H. Peter Anvin", Eric Biederman, kexec@lists.infradead.org, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-trace-kernel@vger.kernel.org, valesini@yandex-team.com, Andrey Ryabinin
Subject: [RFC PATCH 7/7] trace: migrate trace buffers across kexec
Date: Wed, 2 Oct 2024 18:07:22 +0200
Message-ID: <20241002160722.20025-8-arbn@yandex-team.com>
X-Mailer: git-send-email 2.45.2
In-Reply-To: <20241002160722.20025-1-arbn@yandex-team.com>
References: <20241002160722.20025-1-arbn@yandex-team.com>
MIME-Version: 1.0
This is a demonstration of kstate's ability to migrate something more complex across kexec than the simple structure in the lib/test_kstate.c module. Here we migrate the tracing_on/current_trace state and the contents of the trace buffers to the new kernel.

The 'global_trace_state' describes the 'tracing_on' and 'current_trace' states. The 'trace_buffer' kstate field in 'global_trace_state' points to 'kstate_trace_buffer', which describes the state of the ring buffers.

The code in kstate_rb_[save/restore]() saves and restores the list of buffer pages. It turned out to be somewhat hacky and ugly, partially because kstate currently can't migrate slab data. Because of that we have to save/restore the positions of commit_page/reader_page/etc. in the list of pages. We could probably teach kstate to migrate slab pages, preserving contents at the same address, which would make it easier to migrate lists like the ring buffer list in trace: we would then need to save/restore only a pointer.
Signed-off-by: Andrey Ryabinin
---
 include/linux/kstate.h     |   4 +
 kernel/trace/ring_buffer.c | 189 +++++++++++++++++++++++++++++++++++++
 kernel/trace/trace.c       |  81 ++++++++++++++++
 3 files changed, 274 insertions(+)

diff --git a/include/linux/kstate.h b/include/linux/kstate.h
index 2ddbe41a1f171..ae807a75a02f8 100644
--- a/include/linux/kstate.h
+++ b/include/linux/kstate.h
@@ -32,6 +32,10 @@ enum kstate_ids {
 	KSTATE_PAGE_ID,
 	KSTATE_RSVD_MEM_ID,
 	KSTATE_TEST_ID,
+	KSTATE_TRACE_ID,
+	KSTATE_TRACE_BUFFER_ID,
+	KSTATE_TRACE_RING_BUFFER_ID,
+	KSTATE_TRACE_BUFFER_PAGE_ID,
 	KSTATE_LAST_ID = -1,
 };

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 77dc0b25140e6..9a8692d7d960c 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 #include	/* for self test */
 #include
 #include
@@ -1467,6 +1468,194 @@ static void rb_tail_page_update(struct ring_buffer_per_cpu *cpu_buffer,
 	}
 }

+#ifdef CONFIG_KSTATE
+static int kstate_bpage_save(void *mig_stream, void *obj, const struct kstate_field *field)
+{
+	struct buffer_page *bpage = obj;
+
+	kstate_register_page(virt_to_page(bpage->page), bpage->order);
+	return 0;
+
+}
+struct kstate_description kstate_buffer_page = {
+	.name = "buffer_page",
+	.id = KSTATE_TRACE_BUFFER_PAGE_ID,
+	.fields = (const struct kstate_field[]) {
+		KSTATE_SIMPLE(write, struct buffer_page),
+		KSTATE_SIMPLE(read, struct buffer_page),
+		KSTATE_SIMPLE(entries, struct buffer_page),
+		KSTATE_SIMPLE(real_end, struct buffer_page),
+		KSTATE_SIMPLE(order, struct buffer_page),
+		KSTATE_SIMPLE(page, struct buffer_page),
+		{
+			.name = "buffer_page",
+			.flags = KS_CUSTOM,
+			.save = kstate_bpage_save,
+			.size = (sizeof(struct buffer_page)),
+		},
+		KSTATE_END_OF_LIST(),
+	},
+};
+
+static void restore_pages_positions(void **mig_stream,
+				    struct ring_buffer_per_cpu *cpu_buffer)
+{
+	struct list_head *tmp;
+	struct list_head *head = rb_list_head(cpu_buffer->pages);
+	unsigned long commit_page_nr, reader_page_nr,
+		      head_page_nr, tail_page_nr;
+	int i = 0;
+
+	commit_page_nr = kstate_get_ulong(mig_stream);
+	reader_page_nr = kstate_get_ulong(mig_stream);
+	head_page_nr = kstate_get_ulong(mig_stream);
+	tail_page_nr = kstate_get_ulong(mig_stream);
+
+	for (tmp = head;;) {
+		struct buffer_page *page = (struct buffer_page *)tmp;
+
+		if (commit_page_nr == i)
+			cpu_buffer->commit_page = page;
+		if (reader_page_nr == i)
+			cpu_buffer->reader_page = page;
+		if (head_page_nr == i)
+			cpu_buffer->head_page = page;
+		if (tail_page_nr == i)
+			cpu_buffer->tail_page = page;
+		i++;
+		tmp = rb_list_head(tmp->next);
+		if (tmp == head)
+			break;
+	}
+}
+
+static int kstate_rb_restore(void *mig_stream, void *obj,
+			     const struct kstate_field *field)
+{
+	struct ring_buffer_per_cpu *cpu_buffer = obj;
+	LIST_HEAD(pages);
+	void *stream_start = mig_stream;
+	struct buffer_page *page;
+	struct list_head *tmp;
+	struct list_head *head = rb_list_head(cpu_buffer->pages);
+	int i = 0;
+
+	while (kstate_get_byte(&mig_stream)) {
+		int j = 0;
+		bool page_exists = false;
+
+		for (tmp = rb_list_head(head->next); tmp != head;
+		     tmp = rb_list_head(tmp->next)) {
+			if (j == i) {
+				page_exists = true;
+				page = (struct buffer_page *)tmp;
+				break;
+			}
+			j++;
+		}
+		if (!page_exists) {
+			struct buffer_page *bpage;
+
+			bpage = kzalloc_node(ALIGN(sizeof(*bpage),
+						   cache_line_size()), GFP_KERNEL,
+					     cpu_to_node(cpu_buffer->cpu));
+			list_add(&bpage->list, &pages);
+			page = bpage;
+		}
+		mig_stream = restore_kstate((struct kstate_entry *)mig_stream,
+					    i++, field->ksd, page);
+	}
+
+	restore_pages_positions(&mig_stream, cpu_buffer);
+
+	return mig_stream - stream_start;
+}
+
+static int kstate_rb_save(void *mig_stream, void *obj,
+			  const struct kstate_field *field)
+{
+	struct ring_buffer_per_cpu *cpu_buffer = obj;
+	struct list_head *tmp;
+	struct list_head *head = rb_list_head(cpu_buffer->pages);
+	void *stream_start = mig_stream;
+	unsigned long commit_page_nr, reader_page_nr,
+		      head_page_nr, tail_page_nr;
+	int i = 0;
+
+
+	for (tmp = head;;) {
+		struct buffer_page *page = (struct buffer_page *)tmp;
+
+		mig_stream = kstate_save_byte(mig_stream, 1);
+		mig_stream = save_kstate(mig_stream, i, field->ksd, page);
+
+		if (cpu_buffer->commit_page == page)
+			commit_page_nr = i;
+		if (cpu_buffer->reader_page == page)
+			reader_page_nr = i;
+		if (cpu_buffer->head_page == page)
+			head_page_nr = i;
+		if (cpu_buffer->tail_page == page)
+			tail_page_nr = i;
+		i++;
+		tmp = rb_list_head(tmp->next);
+		if (tmp == head)
+			break;
+	}
+
+	mig_stream = kstate_save_byte(mig_stream, 0);
+
+	/* save pages positions */
+	mig_stream = kstate_save_ulong(mig_stream, commit_page_nr);
+	mig_stream = kstate_save_ulong(mig_stream, reader_page_nr);
+	mig_stream = kstate_save_ulong(mig_stream, head_page_nr);
+	mig_stream = kstate_save_ulong(mig_stream, tail_page_nr);
+
+	return mig_stream - stream_start;
+}
+
+struct kstate_description kstate_ring_buffer_per_cpu = {
+	.name = "ring_buffer_per_cpu",
+	.id = KSTATE_TRACE_RING_BUFFER_ID,
+	.state_list = LIST_HEAD_INIT(kstate_ring_buffer_per_cpu.state_list),
+	.fields = (const struct kstate_field[]) {
+		KSTATE_SIMPLE(entries, struct ring_buffer_per_cpu),
+		KSTATE_SIMPLE(entries_bytes, struct ring_buffer_per_cpu),
+		{
+			.name = "buffer_pages",
+			.flags = KS_CUSTOM,
+			.size = (sizeof(struct ring_buffer_per_cpu)),
+			.ksd = &kstate_buffer_page,
+			.save = kstate_rb_save,
+			.restore = kstate_rb_restore,
+		},
+		KSTATE_END_OF_LIST(),
+	},
+};
+
+static int nr_ring_buffers(void)
+{
+	return nr_cpu_ids;
+}
+
+struct kstate_description kstate_trace_buffer = {
+	.name = "trace_buffer",
+	.id = KSTATE_TRACE_BUFFER_ID,
+	.state_list = LIST_HEAD_INIT(kstate_trace_buffer.state_list),
+	.fields = (const struct kstate_field[]) {
+		{
+			.name = "ring_buffers",
+			.flags = KS_STRUCT|KS_POINTER|KS_ARRAY_OF_POINTER,
+			.size = (sizeof(struct ring_buffer_per_cpu *)),
+			.offset = offsetof(struct trace_buffer, buffers),
+			.count = nr_ring_buffers,
+			.ksd = &kstate_ring_buffer_per_cpu,
+		},
+		KSTATE_END_OF_LIST(),
+	}
+};
+#endif
+
 static void rb_check_bpage(struct ring_buffer_per_cpu *cpu_buffer,
 			   struct buffer_page *bpage)
 {
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c01375adc4714..bb07d716beab4 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -10621,6 +10622,84 @@ __init static void enable_instances(void)
 	}
 }

+#ifdef CONFIG_KSTATE
+static int cur_trace_save(void *mig_stream, void *obj,
+			  const struct kstate_field *field)
+{
+	struct trace_array *tr = obj;
+
+	return strscpy(mig_stream, tr->current_trace->name, 100) + 1;
+}
+
+static int cur_trace_restore(void *mig_stream, void *obj,
+			     const struct kstate_field *field)
+{
+	struct trace_array *tr = obj;
+
+	tracing_set_tracer(tr, mig_stream);
+	return strlen(mig_stream) + 1;
+}
+
+static int tracing_on_save(void *mig_stream, void *obj,
+			   const struct kstate_field *field)
+{
+	struct trace_array *tr = obj;
+
+	*(u8 *)mig_stream = (u8)tracer_tracing_is_on(tr);
+	return sizeof(u8);
+
+}
+
+static int tracing_on_restore(void *mig_stream, void *obj,
+			      const struct kstate_field *field)
+{
+	struct trace_array *tr = obj;
+	u8 on = *(u8 *)mig_stream;
+
+	if (on)
+		tracer_tracing_on(tr);
+	else
+		tracer_tracing_off(tr);
+
+	return sizeof(on);
+}
+
+extern struct kstate_description kstate_trace_buffer;
+
+struct kstate_description global_trace_state = {
+	.name = "trace_state",
+	.id = KSTATE_TRACE_ID,
+	.version_id = 1,
+	.state_list = LIST_HEAD_INIT(global_trace_state.state_list),
+	.fields = (const struct kstate_field[]) {
+		{
+			.name = "tracing_on",
+			.flags = KS_CUSTOM,
+			.version_id = 0,
+			.size = sizeof(struct trace_array),
+			.save = tracing_on_save,
+			.restore = tracing_on_restore,
+		},
+		{
+			.name = "current_trace",
+			.flags = KS_CUSTOM,
+			.version_id = 0,
+			.size = sizeof(struct trace_array),
+			.save = cur_trace_save,
+			.restore = cur_trace_restore,
+
+		},
+		{
+			.name = "trace_buffer",
+			.flags = KS_STRUCT|KS_POINTER,
+			.offset = offsetof(struct trace_array, array_buffer.buffer),
+			.ksd = &kstate_trace_buffer,
+		},
+		KSTATE_END_OF_LIST()
+	},
+};
+#endif
+
 __init static int tracer_alloc_buffers(void)
 {
 	int ring_buf_size;
@@ -10848,6 +10927,8 @@ __init static int late_trace_init(void)
 	tracing_set_default_clock();
 	clear_boot_tracer();

+	kstate_register(&global_trace_state, &global_trace);
+
 	return 0;
 }