From patchwork Thu Apr 11 01:25:42 2024
X-Patchwork-Submitter: Steven Rostedt
X-Patchwork-Id: 13625289
Message-ID: <20240411012904.237435058@goodmis.org>
Date: Wed, 10 Apr 2024 21:25:42 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton, Vincent Donnefort, Joel Fernandes, Daniel Bristot de Oliveira, Ingo Molnar, Peter Zijlstra, suleiman@google.com, Thomas Gleixner, Vineeth Pillai, Youssef Esmat, Beau Belgrave, Alexander Graf, Baoquan He, Borislav Petkov, "Paul E. McKenney", David Howells
Subject: [PATCH v2 01/11] ring-buffer: Allow mapped field to be set without mapping
References: <20240411012541.285904543@goodmis.org>

From: "Steven Rostedt (Google)"

In preparation for having the ring buffer mapped to a dedicated location, which will have the same restrictions as user space memory mapped buffers, allow it to use the "mapped" field of the ring_buffer_per_cpu structure without having the user space meta page mapping.

When this starts using the mapped field, it will need to handle adding a user space mapping (and removing it) from a ring buffer that is using a dedicated memory range.
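As a reading aid only, the intended semantics can be sketched as follows (a minimal sketch with hypothetical sketch_* names, not code from this patch): "mapped" acts as a count of reasons the buffer must honor user-mapping restrictions, and the meta page is touched only when user space has actually mapped one.

	/*
	 * Illustrative sketch: "mapped" as a counter rather than a flag.
	 * A buffer backed by a dedicated memory range holds one reference
	 * without ever allocating the user space meta page; a later mmap()
	 * from user space adds a second reference and the meta page.
	 */
	struct sketch_cpu_buffer {
		int mapped;		/* > 0: user-mapping restrictions apply */
		void *meta_page;	/* NULL until user space maps the buffer */
	};

	static void sketch_update_meta_page(struct sketch_cpu_buffer *cpu_buffer)
	{
		if (!cpu_buffer->meta_page)	/* no user space mapping yet */
			return;
		/* ... update reader/commit information for user space ... */
	}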
Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ring_buffer.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 793ecc454039..44b1d5f1a99a 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -5223,6 +5223,9 @@ static void rb_update_meta_page(struct ring_buffer_per_cpu *cpu_buffer) { struct trace_buffer_meta *meta = cpu_buffer->meta_page; + if (!meta) + return; + meta->reader.read = cpu_buffer->reader_page->read; meta->reader.id = cpu_buffer->reader_page->id; meta->reader.lost_events = cpu_buffer->lost_events; @@ -6167,7 +6170,7 @@ rb_get_mapped_buffer(struct trace_buffer *buffer, int cpu) mutex_lock(&cpu_buffer->mapping_lock); - if (!cpu_buffer->mapped) { + if (!cpu_buffer->mapped || !cpu_buffer->meta_page) { mutex_unlock(&cpu_buffer->mapping_lock); return ERR_PTR(-ENODEV); } @@ -6345,12 +6348,13 @@ int ring_buffer_map(struct trace_buffer *buffer, int cpu, */ raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags); rb_setup_ids_meta_page(cpu_buffer, subbuf_ids); + raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); err = __rb_map_vma(cpu_buffer, vma); if (!err) { raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags); - cpu_buffer->mapped = 1; + cpu_buffer->mapped++; raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); } else { kfree(cpu_buffer->subbuf_ids); @@ -6388,7 +6392,7 @@ int ring_buffer_unmap(struct trace_buffer *buffer, int cpu) mutex_lock(&buffer->mutex); raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags); - cpu_buffer->mapped = 0; + cpu_buffer->mapped--; raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); From patchwork Thu Apr 11 01:25:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steven Rostedt X-Patchwork-Id: 13625288 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5A7B98BE7; Thu, 11 Apr 2024 01:26:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712798786; cv=none; b=fG5Xnt30JwgNJUX/3JTUg07vrFk8+dDMPHozpqzElDT7nxYF9xiCRn+HrwW78IhHH2DGgNp+LML/TBWO3H+CMspr58idrouzwX1vRr4dc3fANQHU9IVQK8F8S3kCeHbquvQfUkmyPjrrhNB0OoLBi8xqxqWrah+340RDdf9vbng= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712798786; c=relaxed/simple; bh=JSFoWL0wLn5bnqm8TQNaSNWoiwj6FIEabvNz0w9yDqw=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=NDPJp9OWaDmoBkU7GR/ajHbTKYLgvzJ1xkDmgGNhfx3DEDqMszbOcso4nwVJjXdH2Hmhd5pIW8WY9F+UOLuM0fwIqTu48gbeR0tgUCKKivlF1tdCZ8PCL2Ndn9BVNLIIv03QG35e4WaKYtwt5+RfD3pEmXQqDWJLCa8JCM0nipc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1F4DAC433B2; Thu, 11 Apr 2024 01:26:26 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.97) (envelope-from ) id 1rujFQ-00000000nUy-2F80; Wed, 10 Apr 2024 21:29:04 -0400 Message-ID: <20240411012904.393749229@goodmis.org> User-Agent: quilt/0.67 Date: Wed, 10 Apr 2024 21:25:43 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , 
Andrew Morton , Vincent Donnefort , Joel Fernandes , Daniel Bristot de Oliveira , Ingo Molnar , Peter Zijlstra , suleiman@google.com, Thomas Gleixner , Vineeth Pillai , Youssef Esmat , Beau Belgrave , Alexander Graf , Baoquan He , Borislav Petkov , "Paul E. McKenney" , David Howells Subject: [PATCH v2 02/11] ring-buffer: Add ring_buffer_alloc_range() References: <20240411012541.285904543@goodmis.org> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Steven Rostedt (Google)" In preparation to allowing the trace ring buffer to be allocated in a range of memory that is persistent across reboots, add ring_buffer_alloc_range(). It takes a contiguous range of memory and will split it up evening for the per CPU ring buffers. If there's not enough memory to handle all CPUs with the minimum size, it will fail to allocate the ring buffer. Signed-off-by: Steven Rostedt (Google) --- include/linux/ring_buffer.h | 17 +++ kernel/trace/ring_buffer.c | 222 ++++++++++++++++++++++++++++++------ 2 files changed, 204 insertions(+), 35 deletions(-) diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h index 96d2140b471e..a50b0223b1d3 100644 --- a/include/linux/ring_buffer.h +++ b/include/linux/ring_buffer.h @@ -89,6 +89,11 @@ void ring_buffer_discard_commit(struct trace_buffer *buffer, struct trace_buffer * __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *key); +struct trace_buffer *__ring_buffer_alloc_range(unsigned long size, unsigned flags, + int order, unsigned long start, + unsigned long range_size, + struct lock_class_key *key); + /* * Because the ring buffer is generic, if other users of the ring buffer get * traced by ftrace, it can produce lockdep warnings. We need to keep each @@ -100,6 +105,18 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k __ring_buffer_alloc((size), (flags), &__key); \ }) +/* + * Because the ring buffer is generic, if other users of the ring buffer get + * traced by ftrace, it can produce lockdep warnings. We need to keep each + * ring buffer's lock class separate. + */ +#define ring_buffer_alloc_range(size, flags, order, start, range_size) \ +({ \ + static struct lock_class_key __key; \ + __ring_buffer_alloc_range((size), (flags), (order), (start), \ + (range_size), &__key); \ +}) + typedef bool (*ring_buffer_cond_fn)(void *data); int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full, ring_buffer_cond_fn cond, void *data); diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 44b1d5f1a99a..a6d22a77f128 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -42,6 +42,9 @@ static void update_pages_handler(struct work_struct *work); +struct ring_buffer_meta { +}; + /* * The ring buffer header is special. We must manually up keep it. 
*/ @@ -340,7 +343,8 @@ struct buffer_page { local_t entries; /* entries on this page */ unsigned long real_end; /* real end of data */ unsigned order; /* order of the page */ - u32 id; /* ID for external mapping */ + u32 id:30; /* ID for external mapping */ + u32 range:1; /* Mapped via a range */ struct buffer_data_page *page; /* Actual data page */ }; @@ -371,7 +375,9 @@ static __always_inline unsigned int rb_page_commit(struct buffer_page *bpage) static void free_buffer_page(struct buffer_page *bpage) { - free_pages((unsigned long)bpage->page, bpage->order); + /* Range pages are not to be freed */ + if (!bpage->range) + free_pages((unsigned long)bpage->page, bpage->order); kfree(bpage); } @@ -521,6 +527,9 @@ struct trace_buffer { struct rb_irq_work irq_work; bool time_stamp_abs; + unsigned long range_addr_start; + unsigned long range_addr_end; + unsigned int subbuf_size; unsigned int subbuf_order; unsigned int max_data_size; @@ -1483,9 +1492,67 @@ static void rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer) } } +/* + * Take an address, add the meta data size as well as the array of + * array subbuffer indexes, then align it to a subbuffer size. + */ +static unsigned long +rb_range_align_subbuf(unsigned long addr, int subbuf_size, int nr_subbufs) +{ + addr += sizeof(struct ring_buffer_meta) + + sizeof(int) * nr_subbufs; + return ALIGN(addr, subbuf_size); +} + +/* + * Return a specific sub-buffer for a given @cpu defined by @idx. + */ +static void *rb_range_buffer(struct trace_buffer *buffer, int cpu, int nr_pages, int idx) +{ + unsigned long ptr; + int subbuf_size = buffer->subbuf_size + BUF_PAGE_HDR_SIZE; + int nr_subbufs; + + /* Include the reader page */ + nr_subbufs = nr_pages + 1; + + /* + * The first chunk may not be subbuffer aligned, where as + * the rest of the chunks are. + */ + ptr = buffer->range_addr_start; + ptr = rb_range_align_subbuf(ptr, subbuf_size, nr_subbufs); + if (cpu) { + unsigned long p; + + ptr += subbuf_size * nr_subbufs; + + /* Save the beginning of this CPU chunk */ + p = ptr; + + ptr = rb_range_align_subbuf(ptr, subbuf_size, nr_subbufs); + + if (cpu > 1) { + unsigned long size; + + ptr += subbuf_size * nr_subbufs; + + /* Now all chunks after this are the same size */ + size = ptr - p; + ptr += size * (cpu - 2); + + ptr = rb_range_align_subbuf(ptr, subbuf_size, nr_subbufs); + } + } + if (ptr + subbuf_size * nr_subbufs > buffer->range_addr_end) + return NULL; + return (void *)ptr; +} + static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer, long nr_pages, struct list_head *pages) { + struct trace_buffer *buffer = cpu_buffer->buffer; struct buffer_page *bpage, *tmp; bool user_thread = current->mm != NULL; gfp_t mflags; @@ -1522,6 +1589,7 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer, set_current_oom_origin(); for (i = 0; i < nr_pages; i++) { struct page *page; + int cpu = cpu_buffer->cpu; bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()), mflags, cpu_to_node(cpu_buffer->cpu)); @@ -1530,14 +1598,22 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer, rb_check_bpage(cpu_buffer, bpage); - list_add(&bpage->list, pages); + list_add_tail(&bpage->list, pages); - page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu), - mflags | __GFP_COMP | __GFP_ZERO, - cpu_buffer->buffer->subbuf_order); - if (!page) - goto free_pages; - bpage->page = page_address(page); + if (buffer->range_addr_start) { + /* A range was given. 
Use that for the buffer page */ + bpage->page = rb_range_buffer(buffer, cpu, nr_pages, i + 1); + if (!bpage->page) + goto free_pages; + bpage->range = 1; + } else { + page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu), + mflags | __GFP_COMP | __GFP_ZERO, + cpu_buffer->buffer->subbuf_order); + if (!page) + goto free_pages; + bpage->page = page_address(page); + } bpage->order = cpu_buffer->buffer->subbuf_order; rb_init_page(bpage->page); @@ -1619,11 +1695,19 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu) cpu_buffer->reader_page = bpage; - page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL | __GFP_COMP | __GFP_ZERO, - cpu_buffer->buffer->subbuf_order); - if (!page) - goto fail_free_reader; - bpage->page = page_address(page); + if (buffer->range_addr_start) { + bpage->page = rb_range_buffer(buffer, cpu, nr_pages, 0); + if (!bpage->page) + goto fail_free_reader; + bpage->range = 1; + } else { + page = alloc_pages_node(cpu_to_node(cpu), + GFP_KERNEL | __GFP_COMP | __GFP_ZERO, + cpu_buffer->buffer->subbuf_order); + if (!page) + goto fail_free_reader; + bpage->page = page_address(page); + } rb_init_page(bpage->page); INIT_LIST_HEAD(&cpu_buffer->reader_page->list); @@ -1674,22 +1758,14 @@ static void rb_free_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer) kfree(cpu_buffer); } -/** - * __ring_buffer_alloc - allocate a new ring_buffer - * @size: the size in bytes per cpu that is needed. - * @flags: attributes to set for the ring buffer. - * @key: ring buffer reader_lock_key. - * - * Currently the only flag that is available is the RB_FL_OVERWRITE - * flag. This flag means that the buffer will overwrite old data - * when the buffer wraps. If this flag is not set, the buffer will - * drop data when the tail hits the head. - */ -struct trace_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags, - struct lock_class_key *key) +static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags, + int order, unsigned long start, + unsigned long end, + struct lock_class_key *key) { struct trace_buffer *buffer; long nr_pages; + int subbuf_size; int bsize; int cpu; int ret; @@ -1703,14 +1779,13 @@ struct trace_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags, if (!zalloc_cpumask_var(&buffer->cpumask, GFP_KERNEL)) goto fail_free_buffer; - /* Default buffer page size - one system page */ - buffer->subbuf_order = 0; - buffer->subbuf_size = PAGE_SIZE - BUF_PAGE_HDR_SIZE; + buffer->subbuf_order = order; + subbuf_size = (PAGE_SIZE << order); + buffer->subbuf_size = subbuf_size - BUF_PAGE_HDR_SIZE; /* Max payload is buffer page size - header (8bytes) */ buffer->max_data_size = buffer->subbuf_size - (sizeof(u32) * 2); - nr_pages = DIV_ROUND_UP(size, buffer->subbuf_size); buffer->flags = flags; buffer->clock = trace_clock_local; buffer->reader_lock_key = key; @@ -1718,10 +1793,6 @@ struct trace_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags, init_irq_work(&buffer->irq_work.work, rb_wake_up_waiters); init_waitqueue_head(&buffer->irq_work.waiters); - /* need at least two pages */ - if (nr_pages < 2) - nr_pages = 2; - buffer->cpus = nr_cpu_ids; bsize = sizeof(void *) * nr_cpu_ids; @@ -1730,6 +1801,46 @@ struct trace_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags, if (!buffer->buffers) goto fail_free_cpumask; + /* If start/end are specified, then that overrides size */ + if (start && end) { + unsigned long ptr; + int n; + + size = end - start; + size = size / nr_cpu_ids; + nr_pages = (size - sizeof(struct 
ring_buffer_meta)) / + (subbuf_size + sizeof(int)); + /* Need at least two pages plus the reader page */ + if (nr_pages < 3) + goto fail_free_buffers; + + again: + /* Make sure that the size fits aligned */ + for (n = 0, ptr = start; n < nr_cpu_ids; n++) { + ptr += sizeof(struct ring_buffer_meta) + + sizeof(int) * nr_pages; + ptr = ALIGN(ptr, subbuf_size); + ptr += subbuf_size * nr_pages; + } + if (ptr > end) { + if (nr_pages <= 3) + goto fail_free_buffers; + nr_pages--; + goto again; + } + + /* nr_pages should not count the reader page */ + nr_pages--; + buffer->range_addr_start = start; + buffer->range_addr_end = end; + } else { + + /* need at least two pages */ + nr_pages = DIV_ROUND_UP(size, buffer->subbuf_size); + if (nr_pages < 2) + nr_pages = 2; + } + cpu = raw_smp_processor_id(); cpumask_set_cpu(cpu, buffer->cpumask); buffer->buffers[cpu] = rb_allocate_cpu_buffer(buffer, nr_pages, cpu); @@ -1758,8 +1869,49 @@ struct trace_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags, kfree(buffer); return NULL; } + +/** + * __ring_buffer_alloc - allocate a new ring_buffer + * @size: the size in bytes per cpu that is needed. + * @flags: attributes to set for the ring buffer. + * @key: ring buffer reader_lock_key. + * + * Currently the only flag that is available is the RB_FL_OVERWRITE + * flag. This flag means that the buffer will overwrite old data + * when the buffer wraps. If this flag is not set, the buffer will + * drop data when the tail hits the head. + */ +struct trace_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags, + struct lock_class_key *key) +{ + /* Default buffer page size - one system page */ + return alloc_buffer(size, flags, 0, 0, 0,key); + +} EXPORT_SYMBOL_GPL(__ring_buffer_alloc); +/** + * __ring_buffer_alloc_range - allocate a new ring_buffer from existing memory + * @size: the size in bytes per cpu that is needed. + * @flags: attributes to set for the ring buffer. + * @start: start of allocated range + * @range_size: size of allocated range + * @order: sub-buffer order + * @key: ring buffer reader_lock_key. + * + * Currently the only flag that is available is the RB_FL_OVERWRITE + * flag. This flag means that the buffer will overwrite old data + * when the buffer wraps. If this flag is not set, the buffer will + * drop data when the tail hits the head. + */ +struct trace_buffer *__ring_buffer_alloc_range(unsigned long size, unsigned flags, + int order, unsigned long start, + unsigned long range_size, + struct lock_class_key *key) +{ + return alloc_buffer(size, flags, order, start, start + range_size, key); +} + /** * ring_buffer_free - free a ring buffer. * @buffer: the buffer to free. 
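To make the layout used by ring_buffer_alloc_range() easier to follow, here is a rough model of the sizing loop in alloc_buffer() above (the sketch_* helpers are illustrative, not kernel API): each CPU chunk holds a ring_buffer_meta plus an index array, followed by its sub-buffers aligned to the sub-buffer size, and the per-CPU sub-buffer count is reduced until every chunk fits in the given range.

	/*
	 * Rough model of the range sizing (assumes the struct ring_buffer_meta
	 * and ALIGN() used above; nr_subbufs includes the reader page, matching
	 * the loop in alloc_buffer()).
	 */
	static unsigned long sketch_chunk_end(unsigned long ptr, int subbuf_size,
					      int nr_subbufs)
	{
		ptr += sizeof(struct ring_buffer_meta) + sizeof(int) * nr_subbufs;
		ptr = ALIGN(ptr, subbuf_size);		/* sub-buffers start aligned */
		return ptr + (unsigned long)subbuf_size * nr_subbufs;
	}

	static int sketch_fit_subbufs(unsigned long start, unsigned long end,
				      int subbuf_size, int nr_cpus)
	{
		int nr_subbufs = ((end - start) / nr_cpus) / subbuf_size;

		for (; nr_subbufs >= 3; nr_subbufs--) {	/* two pages + reader page */
			unsigned long ptr = start;
			int cpu;

			for (cpu = 0; cpu < nr_cpus; cpu++)
				ptr = sketch_chunk_end(ptr, subbuf_size, nr_subbufs);
			if (ptr <= end)
				return nr_subbufs - 1;	/* exclude the reader page */
		}
		return -1;				/* range too small */
	}

Since both this model and the kernel loop simply decrement the count until all per-CPU chunks fit, they should settle on the same number of data pages for a given range and sub-buffer size.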
From patchwork Thu Apr 11 01:25:44 2024
X-Patchwork-Submitter: Steven Rostedt
X-Patchwork-Id: 13625291
Message-ID: <20240411012904.554737831@goodmis.org>
Date: Wed, 10 Apr 2024 21:25:44 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton, Vincent Donnefort, Joel Fernandes, Daniel Bristot de Oliveira, Ingo Molnar, Peter Zijlstra, suleiman@google.com, Thomas Gleixner, Vineeth Pillai, Youssef Esmat, Beau Belgrave, Alexander Graf, Baoquan He, Borislav Petkov, "Paul E. McKenney", David Howells
Subject: [PATCH v2 03/11] ring-buffer: Add ring_buffer_meta data
References: <20240411012541.285904543@goodmis.org>

From: "Steven Rostedt (Google)"

Populate the ring_buffer_meta array. It holds the pointer to the head_buffer (next to read), the commit_buffer (next to write), the size of the sub-buffers, the number of sub-buffers, and an array that keeps track of the order of the sub-buffers. This information will be stored in the persistent memory to help reconstruct the ring buffer after a reboot.
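The meta data above is enough for a post-boot reader to locate every sub-buffer. A minimal sketch of that lookup (illustrative only; it mirrors the rb_subbufs_from_meta()/rb_range_buffer() helpers added below, assuming the ALIGN() macro and the field layout described in this patch):

	static void *sketch_logical_subbuf(struct ring_buffer_meta *meta, int logical)
	{
		unsigned long ptr = (unsigned long)meta;

		if (logical >= meta->nr_subbufs)
			return NULL;

		/* Skip the meta data and its index array, then align */
		ptr += sizeof(*meta) + sizeof(int) * meta->nr_subbufs;
		ptr = ALIGN(ptr, meta->subbuf_size);

		/* buffers[] maps the logical order to the physical slot */
		return (void *)(ptr + (unsigned long)meta->subbuf_size *
				meta->buffers[logical]);
	}

Logical index 0 is the reader page; the head_buffer and commit_buffer pointers stored in the meta then let a later boot resume reading where the previous one left off.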
Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ring_buffer.c | 208 ++++++++++++++++++++++++++++++++----- 1 file changed, 183 insertions(+), 25 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index a6d22a77f128..db8762faef77 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -43,6 +43,11 @@ static void update_pages_handler(struct work_struct *work); struct ring_buffer_meta { + unsigned long head_buffer; + unsigned long commit_buffer; + __u32 subbuf_size; + __u32 nr_subbufs; + int buffers[]; }; /* @@ -498,6 +503,7 @@ struct ring_buffer_per_cpu { struct mutex mapping_lock; unsigned long *subbuf_ids; /* ID to subbuf VA */ struct trace_buffer_meta *meta_page; + struct ring_buffer_meta *ring_meta; /* ring buffer pages to update, > 0 to add, < 0 to remove */ long nr_pages_to_update; @@ -1258,6 +1264,11 @@ static void rb_head_page_activate(struct ring_buffer_per_cpu *cpu_buffer) * Set the previous list pointer to have the HEAD flag. */ rb_set_list_to_head(head->list.prev); + + if (cpu_buffer->ring_meta) { + struct ring_buffer_meta *meta = cpu_buffer->ring_meta; + meta->head_buffer = (unsigned long)head->page; + } } static void rb_list_head_clear(struct list_head *list) @@ -1505,50 +1516,125 @@ rb_range_align_subbuf(unsigned long addr, int subbuf_size, int nr_subbufs) } /* - * Return a specific sub-buffer for a given @cpu defined by @idx. + * Return the ring_buffer_meta for a given @cpu. */ -static void *rb_range_buffer(struct trace_buffer *buffer, int cpu, int nr_pages, int idx) +static void *rb_range_meta(struct trace_buffer *buffer, int nr_pages, int cpu) { - unsigned long ptr; int subbuf_size = buffer->subbuf_size + BUF_PAGE_HDR_SIZE; + unsigned long ptr = buffer->range_addr_start; + struct ring_buffer_meta *meta; int nr_subbufs; - /* Include the reader page */ - nr_subbufs = nr_pages + 1; + if (!ptr) + return NULL; + + /* When nr_pages passed in is zero, the first meta has already been initialized */ + if (!nr_pages) { + meta = (struct ring_buffer_meta *)ptr; + nr_subbufs = meta->nr_subbufs; + } else { + meta = NULL; + /* Include the reader page */ + nr_subbufs = nr_pages + 1; + } /* * The first chunk may not be subbuffer aligned, where as * the rest of the chunks are. */ - ptr = buffer->range_addr_start; - ptr = rb_range_align_subbuf(ptr, subbuf_size, nr_subbufs); if (cpu) { - unsigned long p; - - ptr += subbuf_size * nr_subbufs; - - /* Save the beginning of this CPU chunk */ - p = ptr; - ptr = rb_range_align_subbuf(ptr, subbuf_size, nr_subbufs); + ptr += subbuf_size * nr_subbufs; if (cpu > 1) { unsigned long size; + unsigned long p; + /* Save the beginning of this CPU chunk */ + p = ptr; + ptr = rb_range_align_subbuf(ptr, subbuf_size, nr_subbufs); ptr += subbuf_size * nr_subbufs; /* Now all chunks after this are the same size */ size = ptr - p; ptr += size * (cpu - 2); - - ptr = rb_range_align_subbuf(ptr, subbuf_size, nr_subbufs); } } - if (ptr + subbuf_size * nr_subbufs > buffer->range_addr_end) + return (void *)ptr; +} + +static void *rb_subbufs_from_meta(struct ring_buffer_meta *meta) +{ + int subbuf_size = meta->subbuf_size; + unsigned long ptr; + + ptr = (unsigned long)meta; + ptr = rb_range_align_subbuf(ptr, subbuf_size, meta->nr_subbufs); + + return (void *)ptr; +} + +/* + * Return a specific sub-buffer for a given @cpu defined by @idx. 
+ */ +static void *rb_range_buffer(struct ring_buffer_per_cpu *cpu_buffer, int idx) +{ + struct ring_buffer_meta *meta; + unsigned long ptr; + int subbuf_size; + + meta = rb_range_meta(cpu_buffer->buffer, 0, cpu_buffer->cpu); + if (!meta) + return NULL; + + if (WARN_ON_ONCE(idx >= meta->nr_subbufs)) return NULL; + + subbuf_size = meta->subbuf_size; + + /* Map this buffer to the order that's in meta->buffers[] */ + idx = meta->buffers[idx]; + + ptr = (unsigned long)rb_subbufs_from_meta(meta); + + ptr += subbuf_size * idx; + if (ptr + subbuf_size > cpu_buffer->buffer->range_addr_end) + return NULL; + return (void *)ptr; } +static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages) +{ + struct ring_buffer_meta *meta; + void *subbuf; + int cpu; + int i; + + for (cpu = 0; cpu < nr_cpu_ids; cpu++) { + meta = rb_range_meta(buffer, nr_pages, cpu); + + meta->nr_subbufs = nr_pages + 1; + meta->subbuf_size = PAGE_SIZE; + + subbuf = rb_subbufs_from_meta(meta); + + /* + * The buffers[] array holds the order of the sub-buffers + * that are after the meta data. The sub-buffers may + * be swapped out when read and inserted into a different + * location of the ring buffer. Although their addresses + * remain the same, the buffers[] array contains the + * index into the sub-buffers holding their actual order. + */ + for (i = 0; i < meta->nr_subbufs; i++) { + meta->buffers[i] = i; + rb_init_page(subbuf); + subbuf += meta->subbuf_size; + } + } +} + static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer, long nr_pages, struct list_head *pages) { @@ -1589,7 +1675,6 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer, set_current_oom_origin(); for (i = 0; i < nr_pages; i++) { struct page *page; - int cpu = cpu_buffer->cpu; bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()), mflags, cpu_to_node(cpu_buffer->cpu)); @@ -1602,10 +1687,11 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer, if (buffer->range_addr_start) { /* A range was given. Use that for the buffer page */ - bpage->page = rb_range_buffer(buffer, cpu, nr_pages, i + 1); + bpage->page = rb_range_buffer(cpu_buffer, i + 1); if (!bpage->page) goto free_pages; bpage->range = 1; + bpage->id = i + 1; } else { page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu), mflags | __GFP_COMP | __GFP_ZERO, @@ -1613,9 +1699,9 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer, if (!page) goto free_pages; bpage->page = page_address(page); + rb_init_page(bpage->page); } bpage->order = cpu_buffer->buffer->subbuf_order; - rb_init_page(bpage->page); if (user_thread && fatal_signal_pending(current)) goto free_pages; @@ -1696,7 +1782,13 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu) cpu_buffer->reader_page = bpage; if (buffer->range_addr_start) { - bpage->page = rb_range_buffer(buffer, cpu, nr_pages, 0); + /* + * Range mapped buffers have the same restrictions as memory + * mapped ones do. 
+ */ + cpu_buffer->mapped = 1; + cpu_buffer->ring_meta = rb_range_meta(buffer, nr_pages, cpu); + bpage->page = rb_range_buffer(cpu_buffer, 0); if (!bpage->page) goto fail_free_reader; bpage->range = 1; @@ -1707,8 +1799,8 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu) if (!page) goto fail_free_reader; bpage->page = page_address(page); + rb_init_page(bpage->page); } - rb_init_page(bpage->page); INIT_LIST_HEAD(&cpu_buffer->reader_page->list); INIT_LIST_HEAD(&cpu_buffer->new_pages); @@ -1722,6 +1814,10 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu) cpu_buffer->tail_page = cpu_buffer->commit_page = cpu_buffer->head_page; rb_head_page_activate(cpu_buffer); + if (cpu_buffer->ring_meta) { + struct ring_buffer_meta *meta = cpu_buffer->ring_meta; + meta->commit_buffer = meta->head_buffer; + } return cpu_buffer; @@ -1833,6 +1929,8 @@ static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags, nr_pages--; buffer->range_addr_start = start; buffer->range_addr_end = end; + + rb_range_meta_init(buffer, nr_pages); } else { /* need at least two pages */ @@ -2517,6 +2615,52 @@ static void rb_inc_iter(struct ring_buffer_iter *iter) iter->next_event = 0; } +/* Return the index into the sub-buffers for a given sub-buffer */ +static int rb_meta_subbuf_idx(struct ring_buffer_meta *meta, void *subbuf) +{ + void *subbuf_array; + + subbuf_array = (void *)meta + sizeof(int) * meta->nr_subbufs; + subbuf_array = (void *)ALIGN((unsigned long)subbuf_array, meta->subbuf_size); + return (subbuf - subbuf_array) / meta->subbuf_size; +} + +static void rb_update_meta_head(struct ring_buffer_per_cpu *cpu_buffer, + struct buffer_page *next_page) +{ + struct ring_buffer_meta *meta = cpu_buffer->ring_meta; + unsigned long old_head = (unsigned long)next_page->page; + unsigned long new_head; + + rb_inc_page(&next_page); + new_head = (unsigned long)next_page->page; + + /* + * Only move it forward once, if something else came in and + * moved it forward, then we don't want to touch it. + */ + (void)cmpxchg(&meta->head_buffer, old_head, new_head); +} + +static void rb_update_meta_reader(struct ring_buffer_per_cpu *cpu_buffer, + struct buffer_page *reader) +{ + struct ring_buffer_meta *meta = cpu_buffer->ring_meta; + void *old_reader = cpu_buffer->reader_page->page; + void *new_reader = reader->page; + int id; + + id = reader->id; + cpu_buffer->reader_page->id = id; + reader->id = 0; + + meta->buffers[0] = rb_meta_subbuf_idx(meta, new_reader); + meta->buffers[id] = rb_meta_subbuf_idx(meta, old_reader); + + /* The head pointer is the one after the reader */ + rb_update_meta_head(cpu_buffer, reader); +} + /* * rb_handle_head_page - writer hit the head page * @@ -2566,6 +2710,8 @@ rb_handle_head_page(struct ring_buffer_per_cpu *cpu_buffer, local_sub(rb_page_commit(next_page), &cpu_buffer->entries_bytes); local_inc(&cpu_buffer->pages_lost); + if (cpu_buffer->ring_meta) + rb_update_meta_head(cpu_buffer, next_page); /* * The entries will be zeroed out when we move the * tail page. 
@@ -3127,6 +3273,10 @@ rb_set_commit_to_write(struct ring_buffer_per_cpu *cpu_buffer) local_set(&cpu_buffer->commit_page->page->commit, rb_page_write(cpu_buffer->commit_page)); rb_inc_page(&cpu_buffer->commit_page); + if (cpu_buffer->ring_meta) { + struct ring_buffer_meta *meta = cpu_buffer->ring_meta; + meta->commit_buffer = (unsigned long)cpu_buffer->commit_page->page; + } /* add barrier to keep gcc from optimizing too much */ barrier(); } @@ -4744,6 +4894,9 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer) if (!ret) goto spin; + if (cpu_buffer->ring_meta) + rb_update_meta_reader(cpu_buffer, reader); + /* * Yay! We succeeded in replacing the page. * @@ -5434,11 +5587,16 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer) cpu_buffer->lost_events = 0; cpu_buffer->last_overrun = 0; - if (cpu_buffer->mapped) - rb_update_meta_page(cpu_buffer); - rb_head_page_activate(cpu_buffer); cpu_buffer->pages_removed = 0; + + if (cpu_buffer->mapped) { + rb_update_meta_page(cpu_buffer); + if (cpu_buffer->ring_meta) { + struct ring_buffer_meta *meta = cpu_buffer->ring_meta; + meta->commit_buffer = meta->head_buffer; + } + } } /* Must have disabled the cpu buffer then done a synchronize_rcu */ From patchwork Thu Apr 11 01:25:45 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steven Rostedt X-Patchwork-Id: 13625292 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0231B626DF; Thu, 11 Apr 2024 01:26:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712798787; cv=none; b=bcjKh0hFGm4A+fPG+djB8Y2aoxCSv2sV1zBOoWaJKFueWqvgpTyKb0kDcfwm/sfLGqq35xQAiczcOpSTplTVX7cwylDCnD65Lx+DANzMxB4wAbRfbP9N2Ff1tldrAic4KYH9BOaAQD8JXN6oypvy1RISDjxMt6YYln9DRSbUfGU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712798787; c=relaxed/simple; bh=j7R15Vswkf2LNk7KzUJnfJqxLnSVENe3zwazAuIc2Ic=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=nUxPesWPR4YsNS4+gR1rPe8f4GZxK7Wv6MYT8bZ10U0XLZ5qoQipDeMo881wqQoOrrp++JP+GVQp2G9yWkz1IjREMXUqPTmg6loTtL820bnor6fANws5eeyvSWtJVgmmKktrfzpU5gaVd/603EDM+a2OHT6CSpCaRjjmh/hduyg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 81018C43142; Thu, 11 Apr 2024 01:26:26 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.97) (envelope-from ) id 1rujFQ-00000000nVw-3Zls; Wed, 10 Apr 2024 21:29:04 -0400 Message-ID: <20240411012904.715156108@goodmis.org> User-Agent: quilt/0.67 Date: Wed, 10 Apr 2024 21:25:45 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Vincent Donnefort , Joel Fernandes , Daniel Bristot de Oliveira , Ingo Molnar , Peter Zijlstra , suleiman@google.com, Thomas Gleixner , Vineeth Pillai , Youssef Esmat , Beau Belgrave , Alexander Graf , Baoquan He , Borislav Petkov , "Paul E. 
McKenney" , David Howells Subject: [PATCH v2 04/11] tracing: Implement creating an instance based on a given memory region References: <20240411012541.285904543@goodmis.org> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Steven Rostedt (Google)" Allow for creating a new instance by passing in an address and size to map the ring buffer for the instance to. This will allow features like a pstore memory mapped region to be used for an tracing instance ring buffer that can be retrieved from one boot to the next. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace.c | 50 +++++++++++++++++++++++++++++++++++--------- kernel/trace/trace.h | 3 +++ 2 files changed, 43 insertions(+), 10 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 83194bf7b1df..31067de977fc 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4921,6 +4921,11 @@ static int tracing_open(struct inode *inode, struct file *file) static bool trace_ok_for_array(struct tracer *t, struct trace_array *tr) { +#ifdef CONFIG_TRACER_SNAPSHOT + /* arrays with mapped buffer range do not have snapshots */ + if (tr->range_addr_start && t->use_max_tr) + return false; +#endif return (tr->flags & TRACE_ARRAY_FL_GLOBAL) || t->allow_instances; } @@ -8673,11 +8678,13 @@ tracing_init_tracefs_percpu(struct trace_array *tr, long cpu) tr, cpu, &tracing_entries_fops); #ifdef CONFIG_TRACER_SNAPSHOT - trace_create_cpu_file("snapshot", TRACE_MODE_WRITE, d_cpu, - tr, cpu, &snapshot_fops); + if (!tr->range_addr_start) { + trace_create_cpu_file("snapshot", TRACE_MODE_WRITE, d_cpu, + tr, cpu, &snapshot_fops); - trace_create_cpu_file("snapshot_raw", TRACE_MODE_READ, d_cpu, - tr, cpu, &snapshot_raw_fops); + trace_create_cpu_file("snapshot_raw", TRACE_MODE_READ, d_cpu, + tr, cpu, &snapshot_raw_fops); + } #endif } @@ -9214,7 +9221,18 @@ allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, int size buf->tr = tr; - buf->buffer = ring_buffer_alloc(size, rb_flags); + if (tr->range_addr_start && tr->range_addr_size) { + buf->buffer = ring_buffer_alloc_range(size, rb_flags, 0, + tr->range_addr_start, + tr->range_addr_size); + /* + * This is basically the same as a mapped buffer, + * with the same restrictions. + */ + tr->mapped++; + } else { + buf->buffer = ring_buffer_alloc(size, rb_flags); + } if (!buf->buffer) return -ENOMEM; @@ -9251,6 +9269,10 @@ static int allocate_trace_buffers(struct trace_array *tr, int size) return ret; #ifdef CONFIG_TRACER_MAX_TRACE + /* Fix mapped buffer trace arrays do not have snapshot buffers */ + if (tr->range_addr_start) + return 0; + ret = allocate_trace_buffer(tr, &tr->max_buffer, allocate_snapshot ? 
size : 1); if (MEM_FAIL(ret, "Failed to allocate trace buffer\n")) { @@ -9351,7 +9373,9 @@ static int trace_array_create_dir(struct trace_array *tr) } static struct trace_array * -trace_array_create_systems(const char *name, const char *systems) +trace_array_create_systems(const char *name, const char *systems, + unsigned long range_addr_start, + unsigned long range_addr_size) { struct trace_array *tr; int ret; @@ -9377,6 +9401,10 @@ trace_array_create_systems(const char *name, const char *systems) goto out_free_tr; } + /* Only for boot up memory mapped ring buffers */ + tr->range_addr_start = range_addr_start; + tr->range_addr_size = range_addr_size; + tr->trace_flags = global_trace.trace_flags & ~ZEROED_TRACE_FLAGS; cpumask_copy(tr->tracing_cpumask, cpu_all_mask); @@ -9434,7 +9462,7 @@ trace_array_create_systems(const char *name, const char *systems) static struct trace_array *trace_array_create(const char *name) { - return trace_array_create_systems(name, NULL); + return trace_array_create_systems(name, NULL, 0, 0); } static int instance_mkdir(const char *name) @@ -9488,7 +9516,7 @@ struct trace_array *trace_array_get_by_name(const char *name, const char *system goto out_unlock; } - tr = trace_array_create_systems(name, systems); + tr = trace_array_create_systems(name, systems, 0, 0); if (IS_ERR(tr)) tr = NULL; @@ -9681,8 +9709,10 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer) MEM_FAIL(1, "Could not allocate function filter files"); #ifdef CONFIG_TRACER_SNAPSHOT - trace_create_file("snapshot", TRACE_MODE_WRITE, d_tracer, - tr, &snapshot_fops); + if (!tr->range_addr_start) { + trace_create_file("snapshot", TRACE_MODE_WRITE, d_tracer, + tr, &snapshot_fops); + } #endif trace_create_file("error_log", TRACE_MODE_WRITE, d_tracer, diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 749a182dab48..d22d7c3b770a 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -338,6 +338,9 @@ struct trace_array { unsigned int snapshot; unsigned int mapped; unsigned long max_latency; + /* The below is for memory mapped ring buffer */ + unsigned long range_addr_start; + unsigned long range_addr_size; #ifdef CONFIG_FSNOTIFY struct dentry *d_max_latency; struct work_struct fsnotify_work; From patchwork Thu Apr 11 01:25:46 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steven Rostedt X-Patchwork-Id: 13625293 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 10D8F629E6; Thu, 11 Apr 2024 01:26:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712798787; cv=none; b=eKKZfxzTy5j/xer5CfwX3uBy4uYHwWpXRxV5vwYlwYJURozzbBhIcfKOyq7QjAgsBj5FVfC5nNJSm7uIZMW+BEX9vdYzsWeOiniKqPPjoXscHlec0oIq1BEIk7DZQrh31tYR4PAaZlitl4umn8OQO6wqwJ4qObmTn2iZpr0BJ70= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712798787; c=relaxed/simple; bh=viBLZ0M/vmoMhCS+p6zv4rpqOP5As3oDZWwPYYlYQTg=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=WXmHmkU1NA+MEP6lXDkRLNnYkxpWLpQmUEbX+eEn9B5VcuF9Y62Nlvn6+R6OTKZ0rrYvaH9UqViWKcCvRbA/zBOdf8Cd/j/8h+4UQnK2AonPPJ35ylROKx92QGhxA7hwaFqJ10Xwr/jD0/uCuA81eOwQm2AZiNVXpZSAPPn6miI= ARC-Authentication-Results: i=1; 
smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id BE941C41674; Thu, 11 Apr 2024 01:26:26 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.97) (envelope-from ) id 1rujFR-00000000nWQ-04Fy; Wed, 10 Apr 2024 21:29:05 -0400 Message-ID: <20240411012904.873466915@goodmis.org> User-Agent: quilt/0.67 Date: Wed, 10 Apr 2024 21:25:46 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Vincent Donnefort , Joel Fernandes , Daniel Bristot de Oliveira , Ingo Molnar , Peter Zijlstra , suleiman@google.com, Thomas Gleixner , Vineeth Pillai , Youssef Esmat , Beau Belgrave , Alexander Graf , Baoquan He , Borislav Petkov , "Paul E. McKenney" , David Howells Subject: [PATCH v2 05/11] ring-buffer: Add output of ring buffer meta page References: <20240411012541.285904543@goodmis.org> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Steven Rostedt (Google)" Add a buffer_meta per-cpu file for the trace instance that is mapped to boot memory. This shows the current meta-data and can be used by user space tools to record off the current mappings to help reconstruct the ring buffer after a reboot. It does not expose any virtual addresses, just indexes into the sub-buffer pages. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ring_buffer.c | 77 ++++++++++++++++++++++++++++++++++++++ kernel/trace/trace.c | 30 ++++++++++++++- kernel/trace/trace.h | 2 + 3 files changed, 107 insertions(+), 2 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index db8762faef77..ee3a5c6966e2 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -32,6 +32,8 @@ #include #include +#include "trace.h" + /* * The "absolute" timestamp in the buffer is only 59 bits. 
* If a clock has the 5 MSBs set, it needs to be saved and @@ -1635,6 +1637,81 @@ static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages) } } +static void *rbm_start(struct seq_file *m, loff_t *pos) +{ + struct ring_buffer_per_cpu *cpu_buffer = m->private; + struct ring_buffer_meta *meta = cpu_buffer->ring_meta; + unsigned long val; + + if (!meta) + return NULL; + + if (*pos > meta->nr_subbufs) + return NULL; + + val = *pos; + val++; + + return (void *)val; +} + +static void *rbm_next(struct seq_file *m, void *v, loff_t *pos) +{ + (*pos)++; + + return rbm_start(m, pos); +} + +static int rb_meta_subbuf_idx(struct ring_buffer_meta *meta, void *subbuf); + +static int rbm_show(struct seq_file *m, void *v) +{ + struct ring_buffer_per_cpu *cpu_buffer = m->private; + struct ring_buffer_meta *meta = cpu_buffer->ring_meta; + unsigned long val = (unsigned long)v; + + if (val == 1) { + seq_printf(m, "head_buffer: %d\n", + rb_meta_subbuf_idx(meta, (void *)meta->head_buffer)); + seq_printf(m, "commit_buffer: %d\n", + rb_meta_subbuf_idx(meta, (void *)meta->commit_buffer)); + seq_printf(m, "subbuf_size: %d\n", meta->subbuf_size); + seq_printf(m, "nr_subbufs: %d\n", meta->nr_subbufs); + return 0; + } + + val -= 2; + seq_printf(m, "buffer[%ld]: %d\n", val, meta->buffers[val]); + + return 0; +} + +static void rbm_stop(struct seq_file *m, void *p) +{ +} + +static const struct seq_operations rb_meta_seq_ops = { + .start = rbm_start, + .next = rbm_next, + .show = rbm_show, + .stop = rbm_stop, +}; + +int ring_buffer_meta_seq_init(struct file *file, struct trace_buffer *buffer, int cpu) +{ + struct seq_file *m; + int ret; + + ret = seq_open(file, &rb_meta_seq_ops); + if (ret) + return ret; + + m = file->private_data; + m->private = buffer->buffers[cpu]; + + return 0; +} + static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer, long nr_pages, struct list_head *pages) { diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 31067de977fc..b2b16be6e4ec 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -5018,7 +5018,7 @@ static int show_traces_open(struct inode *inode, struct file *file) return 0; } -static int show_traces_release(struct inode *inode, struct file *file) +static int tracing_seq_release(struct inode *inode, struct file *file) { struct trace_array *tr = inode->i_private; @@ -5059,7 +5059,7 @@ static const struct file_operations show_traces_fops = { .open = show_traces_open, .read = seq_read, .llseek = seq_lseek, - .release = show_traces_release, + .release = tracing_seq_release, }; static ssize_t @@ -6860,6 +6860,22 @@ tracing_total_entries_read(struct file *filp, char __user *ubuf, return simple_read_from_buffer(ubuf, cnt, ppos, buf, r); } +static int tracing_buffer_meta_open(struct inode *inode, struct file *filp) +{ + struct trace_array *tr = inode->i_private; + int cpu = tracing_get_cpu(inode); + int ret; + + ret = tracing_check_open_get_tr(tr); + if (ret) + return ret; + + ret = ring_buffer_meta_seq_init(filp, tr->array_buffer.buffer, cpu); + if (ret < 0) + __trace_array_put(tr); + return ret; +} + static ssize_t tracing_free_buffer_write(struct file *filp, const char __user *ubuf, size_t cnt, loff_t *ppos) @@ -7436,6 +7452,13 @@ static const struct file_operations tracing_entries_fops = { .release = tracing_release_generic_tr, }; +static const struct file_operations tracing_buffer_meta_fops = { + .open = tracing_buffer_meta_open, + .read = seq_read, + .llseek = seq_lseek, + .release = tracing_seq_release, +}; + static const struct 
file_operations tracing_total_entries_fops = { .open = tracing_open_generic_tr, .read = tracing_total_entries_read, @@ -8677,6 +8700,9 @@ tracing_init_tracefs_percpu(struct trace_array *tr, long cpu) trace_create_cpu_file("buffer_size_kb", TRACE_MODE_READ, d_cpu, tr, cpu, &tracing_entries_fops); + if (tr->range_addr_start) + trace_create_cpu_file("buffer_meta", TRACE_MODE_READ, d_cpu, + tr, cpu, &tracing_buffer_meta_fops); #ifdef CONFIG_TRACER_SNAPSHOT if (!tr->range_addr_start) { trace_create_cpu_file("snapshot", TRACE_MODE_WRITE, d_cpu, diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index d22d7c3b770a..ccff4891c2ac 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -644,6 +644,8 @@ trace_buffer_lock_reserve(struct trace_buffer *buffer, unsigned long len, unsigned int trace_ctx); +int ring_buffer_meta_seq_init(struct file *file, struct trace_buffer *buffer, int cpu); + struct trace_entry *tracing_get_trace_entry(struct trace_array *tr, struct trace_array_cpu *data); From patchwork Thu Apr 11 01:25:47 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steven Rostedt X-Patchwork-Id: 13625296 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5E4C4657B5; Thu, 11 Apr 2024 01:26:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712798787; cv=none; b=SyMM1eofXq38nzN0v1SjW5B+aa4weIoGnNvRfyWAOVJL3IBDLVLSueIyHII8c6OI5ad0wYTl16A3EagvYvk0iYBv0v4WJD3Nr5VRWQgMjpnQtZ3oUn4MQDRzSKPDKlM0Udo/VpHjvpB2QWsFi0W9Xk+Y9T58EjAvlZDsKq5sIcg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712798787; c=relaxed/simple; bh=miliMOXMBcECmaqRdnKPy+H70NlyrIEKCPPyaMLRHpo=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=VI/O8ss0b9zyUXJLW/4PYlm8XOWL3zuBXbuvZITwVkFVTpA2fV2Xe6w+VVFsLQeEoktZ5tOQ0oMgs80jGhLoy5jN1qpoBk6EuO09iY0AUxgCKZkBqG7OAEfBKNKgK63Rf1x6qI2YwXNSLx8q+PENGOqojlwAV799gOuwS0Garq4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id C2C47C41679; Thu, 11 Apr 2024 01:26:26 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.97) (envelope-from ) id 1rujFR-00000000nWu-0j33; Wed, 10 Apr 2024 21:29:05 -0400 Message-ID: <20240411012905.035116528@goodmis.org> User-Agent: quilt/0.67 Date: Wed, 10 Apr 2024 21:25:47 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Vincent Donnefort , Joel Fernandes , Daniel Bristot de Oliveira , Ingo Molnar , Peter Zijlstra , suleiman@google.com, Thomas Gleixner , Vineeth Pillai , Youssef Esmat , Beau Belgrave , Alexander Graf , Baoquan He , Borislav Petkov , "Paul E. McKenney" , David Howells Subject: [PATCH v2 06/11] ring-buffer: Add test if range of boot buffer is valid References: <20240411012541.285904543@goodmis.org> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Steven Rostedt (Google)" Add a test against the ring buffer memory range to see if it has valid data. 
The ring_buffer_meta structure is given a new field called "first_buffer" which holds the address of the first sub-buffer. This is used to both determine if the other fields are valid as well as finding the offset between the old addresses of the sub-buffer from the previous boot to the new addresses of the current boot. Since the values for nr_subbufs and subbuf_size is to be the same, check if the values in the meta page match the values calculated. Take the range of the first_buffer and the total size of all the buffers and make sure the saved head_buffer and commit_buffer fall in the range. Iterate through all the sub-buffers to make sure that the values in the sub-buffer "commit" field (the field that holds the amount of data on the sub-buffer) is within the end of the sub-buffer. Also check the index array to make sure that all the indexes are within nr_subbufs. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ring_buffer.c | 143 ++++++++++++++++++++++++++++++++++--- 1 file changed, 135 insertions(+), 8 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index ee3a5c6966e2..d05b60fd5695 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -45,6 +45,7 @@ static void update_pages_handler(struct work_struct *work); struct ring_buffer_meta { + unsigned long first_buffer; unsigned long head_buffer; unsigned long commit_buffer; __u32 subbuf_size; @@ -1606,21 +1607,103 @@ static void *rb_range_buffer(struct ring_buffer_per_cpu *cpu_buffer, int idx) return (void *)ptr; } +/* + * See if the existing memory contains valid ring buffer data. + * As the previous kernel must be the same as this kernel, all + * the calculations (size of buffers and number of buffers) + * must be the same. + */ +static bool rb_meta_valid(struct ring_buffer_meta *meta, int cpu, + struct trace_buffer *buffer, int nr_pages) +{ + int subbuf_size = PAGE_SIZE; + struct buffer_data_page *subbuf; + unsigned long buffers_start; + unsigned long buffers_end; + int i; + + /* The subbuffer's size and number of subbuffers must match */ + if (meta->subbuf_size != subbuf_size || + meta->nr_subbufs != nr_pages + 1) { + pr_info("Ring buffer boot meta [%d] mismatch of subbuf_size/nr_pages\n", cpu); + return false; + } + + buffers_start = meta->first_buffer; + buffers_end = meta->first_buffer + (subbuf_size * meta->nr_subbufs); + + /* Is the head and commit buffers within the range of buffers? */ + if (meta->head_buffer < buffers_start || + meta->head_buffer >= buffers_end) { + pr_info("Ring buffer boot meta [%d] head buffer out of range\n", cpu); + return false; + } + + if (meta->commit_buffer < buffers_start || + meta->commit_buffer >= buffers_end) { + pr_info("Ring buffer boot meta [%d] commit buffer out of range\n", cpu); + return false; + } + + subbuf = rb_subbufs_from_meta(meta); + + /* Is the meta buffers and the subbufs themselves have correct data? 
*/ + for (i = 0; i < meta->nr_subbufs; i++) { + if (meta->buffers[i] < 0 || + meta->buffers[i] >= meta->nr_subbufs) { + pr_info("Ring buffer boot meta [%d] array out of range\n", cpu); + return false; + } + + if ((unsigned)local_read(&subbuf->commit) > subbuf_size) { + pr_info("Ring buffer boot meta [%d] buffer invalid commit\n", cpu); + return false; + } + + subbuf = (void *)subbuf + subbuf_size; + } + + pr_info("Ring buffer meta is from previous boot!\n"); + return true; +} + static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages) { struct ring_buffer_meta *meta; + unsigned long delta; void *subbuf; int cpu; int i; for (cpu = 0; cpu < nr_cpu_ids; cpu++) { + void *next_meta; + meta = rb_range_meta(buffer, nr_pages, cpu); + if (rb_meta_valid(meta, cpu, buffer, nr_pages)) { + /* Make the mappings match the current address */ + subbuf = rb_subbufs_from_meta(meta); + delta = (unsigned long)subbuf - meta->first_buffer; + meta->first_buffer += delta; + meta->head_buffer += delta; + meta->commit_buffer += delta; + continue; + } + + if (cpu < nr_cpu_ids - 1) + next_meta = rb_range_meta(buffer, nr_pages, cpu + 1); + else + next_meta = (void *)buffer->range_addr_end; + + memset(meta, 0, next_meta - (void *)meta); + meta->nr_subbufs = nr_pages + 1; meta->subbuf_size = PAGE_SIZE; subbuf = rb_subbufs_from_meta(meta); + meta->first_buffer = (unsigned long)subbuf; + /* * The buffers[] array holds the order of the sub-buffers * that are after the meta data. The sub-buffers may @@ -1712,10 +1795,26 @@ int ring_buffer_meta_seq_init(struct file *file, struct trace_buffer *buffer, in return 0; } +/* Map the buffer_pages to the previous head and commit pages */ +static void rb_meta_buffer_update(struct ring_buffer_per_cpu *cpu_buffer, + struct buffer_page *bpage) +{ + struct ring_buffer_meta *meta = cpu_buffer->ring_meta; + + if (meta->head_buffer == (unsigned long)bpage->page) + cpu_buffer->head_page = bpage; + + if (meta->commit_buffer == (unsigned long)bpage->page) { + cpu_buffer->commit_page = bpage; + cpu_buffer->tail_page = bpage; + } +} + static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer, long nr_pages, struct list_head *pages) { struct trace_buffer *buffer = cpu_buffer->buffer; + struct ring_buffer_meta *meta = NULL; struct buffer_page *bpage, *tmp; bool user_thread = current->mm != NULL; gfp_t mflags; @@ -1750,6 +1849,10 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer, */ if (user_thread) set_current_oom_origin(); + + if (buffer->range_addr_start) + meta = rb_range_meta(buffer, nr_pages, cpu_buffer->cpu); + for (i = 0; i < nr_pages; i++) { struct page *page; @@ -1762,11 +1865,14 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer, list_add_tail(&bpage->list, pages); - if (buffer->range_addr_start) { + if (meta) { /* A range was given. 
Use that for the buffer page */ bpage->page = rb_range_buffer(cpu_buffer, i + 1); if (!bpage->page) goto free_pages; + /* If this is valid from a previous boot */ + if (meta->head_buffer) + rb_meta_buffer_update(cpu_buffer, bpage); bpage->range = 1; bpage->id = i + 1; } else { @@ -1828,6 +1934,7 @@ static struct ring_buffer_per_cpu * rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu) { struct ring_buffer_per_cpu *cpu_buffer; + struct ring_buffer_meta *meta; struct buffer_page *bpage; struct page *page; int ret; @@ -1868,6 +1975,8 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu) bpage->page = rb_range_buffer(cpu_buffer, 0); if (!bpage->page) goto fail_free_reader; + if (cpu_buffer->ring_meta->head_buffer) + rb_meta_buffer_update(cpu_buffer, bpage); bpage->range = 1; } else { page = alloc_pages_node(cpu_to_node(cpu), @@ -1886,14 +1995,32 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu) if (ret < 0) goto fail_free_reader; - cpu_buffer->head_page - = list_entry(cpu_buffer->pages, struct buffer_page, list); - cpu_buffer->tail_page = cpu_buffer->commit_page = cpu_buffer->head_page; + /* If the boot meta was valid then this has already been updated */ + meta = cpu_buffer->ring_meta; + if (!meta || !meta->head_buffer || + !cpu_buffer->head_page || !cpu_buffer->commit_page || !cpu_buffer->tail_page) { + if (meta && meta->head_buffer && + (cpu_buffer->head_page || cpu_buffer->commit_page || cpu_buffer->tail_page)) { + pr_warn("Ring buffer meta buffers not all mapped\n"); + if (!cpu_buffer->head_page) + pr_warn(" Missing head_page\n"); + if (!cpu_buffer->commit_page) + pr_warn(" Missing commit_page\n"); + if (!cpu_buffer->tail_page) + pr_warn(" Missing tail_page\n"); + } - rb_head_page_activate(cpu_buffer); - if (cpu_buffer->ring_meta) { - struct ring_buffer_meta *meta = cpu_buffer->ring_meta; - meta->commit_buffer = meta->head_buffer; + cpu_buffer->head_page + = list_entry(cpu_buffer->pages, struct buffer_page, list); + cpu_buffer->tail_page = cpu_buffer->commit_page = cpu_buffer->head_page; + + rb_head_page_activate(cpu_buffer); + + if (cpu_buffer->ring_meta) + meta->commit_buffer = meta->head_buffer; + } else { + /* The valid meta buffer still needs to activate the head page */ + rb_head_page_activate(cpu_buffer); } return cpu_buffer; From patchwork Thu Apr 11 01:25:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steven Rostedt X-Patchwork-Id: 13625294 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3AF1E64CC0; Thu, 11 Apr 2024 01:26:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712798787; cv=none; b=i5EPhwDX737BncE5cPxLtbvpMJZ8aMuhr/A8a0FezEjy21nXXNRihUyHtpLHJcvmjNxQuzsi76+uJILhpheNPIzKhlSLLIiDXYdTyVD2OIm93ZswD0TTTBR8vIDXc5wSfPs9wNQkuT5g92C2RIwuzj49xU9piOxMskmwt0mYEwE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712798787; c=relaxed/simple; bh=JSQojgpzWXRR0Az6t0VzaaXRtYk8WJbAdEauGbfshYc=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; 
b=dJLGmL0xT5y1C7EeA6Ob7Rj5TlrsRXtOG+kanEnoSem8Vte2vGv43OuqIYLOmv+Bj83iVxelIwoeJl+jbIbazDG4ntszrt7M5V9fzFb66BYcvv3pBHCcLAJdfNnmHvNlosIm1qX6YPuaZ038h76t8fogdaTpspnDMzJlTANH0t4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id ECD02C32787; Thu, 11 Apr 2024 01:26:26 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.97) (envelope-from ) id 1rujFR-00000000nXO-1Oke; Wed, 10 Apr 2024 21:29:05 -0400 Message-ID: <20240411012905.194026306@goodmis.org> User-Agent: quilt/0.67 Date: Wed, 10 Apr 2024 21:25:48 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Vincent Donnefort , Joel Fernandes , Daniel Bristot de Oliveira , Ingo Molnar , Peter Zijlstra , suleiman@google.com, Thomas Gleixner , Vineeth Pillai , Youssef Esmat , Beau Belgrave , Alexander Graf , Baoquan He , Borislav Petkov , "Paul E. McKenney" , David Howells Subject: [PATCH v2 07/11] ring-buffer: Validate boot range memory events References: <20240411012541.285904543@goodmis.org> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Steven Rostedt (Google)" Make sure all the events in each of the sub-buffers that were mapped in a memory region are valid. This moves the code that walks the buffers for time-stamp validation out of the CONFIG_RING_BUFFER_VALIDATE_TIME_DELTAS ifdef block and is used to validate the content. Only the ring buffer event meta data is checked and not the data load. This also has a second purpose. The buffer_page structure that points to the data sub-buffers has accounting that keeps track of the number of events that are on the sub-buffer. This updates that counter as well. That counter is used in reading the buffer and knowing if the ring buffer is empty or not. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ring_buffer.c | 190 ++++++++++++++++++++++++++++++++----- 1 file changed, 166 insertions(+), 24 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index d05b60fd5695..6f19e7612472 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -1663,10 +1663,170 @@ static bool rb_meta_valid(struct ring_buffer_meta *meta, int cpu, subbuf = (void *)subbuf + subbuf_size; } - pr_info("Ring buffer meta is from previous boot!\n"); return true; } +static int rb_meta_subbuf_idx(struct ring_buffer_meta *meta, void *subbuf); + +#ifdef CONFIG_RING_BUFFER_VALIDATE_TIME_DELTAS +static DEFINE_PER_CPU(atomic_t, checking); +static atomic_t ts_dump; + +#define buffer_warn_return(fmt, ...) \ + do { \ + /* If another report is happening, ignore this one */ \ + if (atomic_inc_return(&ts_dump) != 1) { \ + atomic_dec(&ts_dump); \ + goto out; \ + } \ + atomic_inc(&cpu_buffer->record_disabled); \ + pr_warn(fmt, ##__VA_ARGS__); \ + dump_buffer_page(bpage, info, tail); \ + atomic_dec(&ts_dump); \ + /* There's some cases in boot up that this can happen */ \ + if (WARN_ON_ONCE(system_state != SYSTEM_BOOTING)) \ + /* Do not re-enable checking */ \ + return; \ + } while (0) +#else +#define buffer_warn_return(fmt, ...) 
do { } while (0) +#endif + +static int rb_read_data_buffer(struct buffer_data_page *dpage, int tail, int cpu, + unsigned long long *timestamp, bool warn) +{ + struct ring_buffer_event *event; + u64 ts, delta; + int events = 0; + int e; + + ts = dpage->time_stamp; + + for (e = 0; e < tail; e += rb_event_length(event)) { + + event = (struct ring_buffer_event *)(dpage->data + e); + + switch (event->type_len) { + + case RINGBUF_TYPE_TIME_EXTEND: + delta = rb_event_time_stamp(event); + ts += delta; + break; + + case RINGBUF_TYPE_TIME_STAMP: + delta = rb_event_time_stamp(event); + if (warn && delta < ts) { + buffer_warn_return("[CPU: %d]ABSOLUTE TIME WENT BACKWARDS: last ts: %lld absolute ts: %lld\n", + cpu, ts, delta); + } + ts = delta; + break; + + case RINGBUF_TYPE_PADDING: + if (event->time_delta == 1) + break; + fallthrough; + case RINGBUF_TYPE_DATA: + events++; + ts += event->time_delta; + break; + + default: + return -1; + } + } + *timestamp = ts; + return events; +} + +static int rb_validate_buffer(struct buffer_data_page *dpage, int cpu) +{ + unsigned long long ts; + int tail; + + tail = local_read(&dpage->commit); + return rb_read_data_buffer(dpage, tail, cpu, &ts, false); +} + +/* If the meta data has been validated, now validate the events */ +static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer) +{ + struct ring_buffer_meta *meta = cpu_buffer->ring_meta; + struct buffer_page *head_page; + unsigned long entry_bytes = 0; + unsigned long entries = 0; + int ret; + int i; + + if (!meta || !meta->head_buffer) + return; + + /* Do the reader page first */ + ret = rb_validate_buffer(cpu_buffer->reader_page->page, cpu_buffer->cpu); + if (ret < 0) { + printk("INVALID READER PAGE\n"); + goto invalid; + } + entries += ret; + entry_bytes += local_read(&cpu_buffer->reader_page->page->commit); + local_set(&cpu_buffer->reader_page->entries, ret); + + head_page = cpu_buffer->head_page; + + /* If both the head and commit are on the reader_page then we are done. 
*/ + if (head_page == cpu_buffer->reader_page && + head_page == cpu_buffer->commit_page) + goto done; + + /* Iterate until finding the commit page */ + for (i = 0; i < meta->nr_subbufs + 1; i++, rb_inc_page(&head_page)) { + + /* Reader page has already been done */ + if (head_page == cpu_buffer->reader_page) + continue; + + ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu); + if (ret < 0) { + pr_info("Ring buffer meta [%d] invalid buffer page\n", + cpu_buffer->cpu); + goto invalid; + } + entries += ret; + entry_bytes += local_read(&head_page->page->commit); + local_set(&cpu_buffer->head_page->entries, ret); + + if (head_page == cpu_buffer->commit_page) + break; + } + + if (head_page != cpu_buffer->commit_page) { + pr_info("Ring buffer meta [%d] commit page not found\n", + cpu_buffer->cpu); + goto invalid; + } + done: + local_set(&cpu_buffer->entries, entries); + local_set(&cpu_buffer->entries_bytes, entry_bytes); + + pr_info("Ring buffer meta [%d] is from previous boot!\n", cpu_buffer->cpu); + return; + + invalid: + /* The content of the buffers are invalid, reset the meta data */ + meta->head_buffer = 0; + meta->commit_buffer = 0; + + /* Reset the reader page */ + local_set(&cpu_buffer->reader_page->entries, 0); + local_set(&cpu_buffer->reader_page->page->commit, 0); + + /* Reset all the subbuffers */ + for (i = 0; i < meta->nr_subbufs - 1; i++, rb_inc_page(&head_page)) { + local_set(&head_page->entries, 0); + local_set(&head_page->page->commit, 0); + } +} + static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages) { struct ring_buffer_meta *meta; @@ -1745,8 +1905,6 @@ static void *rbm_next(struct seq_file *m, void *v, loff_t *pos) return rbm_start(m, pos); } -static int rb_meta_subbuf_idx(struct ring_buffer_meta *meta, void *subbuf); - static int rbm_show(struct seq_file *m, void *v) { struct ring_buffer_per_cpu *cpu_buffer = m->private; @@ -1995,6 +2153,8 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu) if (ret < 0) goto fail_free_reader; + rb_meta_validate_events(cpu_buffer); + /* If the boot meta was valid then this has already been updated */ meta = cpu_buffer->ring_meta; if (!meta || !meta->head_buffer || @@ -3899,26 +4059,6 @@ static void dump_buffer_page(struct buffer_data_page *bpage, pr_warn("expected end:0x%lx last event actually ended at:0x%x\n", tail, e); } -static DEFINE_PER_CPU(atomic_t, checking); -static atomic_t ts_dump; - -#define buffer_warn_return(fmt, ...) \ - do { \ - /* If another report is happening, ignore this one */ \ - if (atomic_inc_return(&ts_dump) != 1) { \ - atomic_dec(&ts_dump); \ - goto out; \ - } \ - atomic_inc(&cpu_buffer->record_disabled); \ - pr_warn(fmt, ##__VA_ARGS__); \ - dump_buffer_page(bpage, info, tail); \ - atomic_dec(&ts_dump); \ - /* There's some cases in boot up that this can happen */ \ - if (WARN_ON_ONCE(system_state != SYSTEM_BOOTING)) \ - /* Do not re-enable checking */ \ - return; \ - } while (0) - /* * Check if the current event time stamp matches the deltas on * the buffer page. 
@@ -3957,8 +4097,9 @@ static void check_buffer(struct ring_buffer_per_cpu *cpu_buffer, if (atomic_inc_return(this_cpu_ptr(&checking)) != 1) goto out; - ts = bpage->time_stamp; + ret = rb_read_data_buffer(bpage, tail, &ts, cpu_buffer->cpu, true); +#if 0 for (e = 0; e < tail; e += rb_event_length(event)) { event = (struct ring_buffer_event *)(bpage->data + e); @@ -3992,6 +4133,7 @@ static void check_buffer(struct ring_buffer_per_cpu *cpu_buffer, RB_WARN_ON(cpu_buffer, 1); } } +#endif if ((full && ts > info->ts) || (!full && ts + info->delta != info->ts)) { buffer_warn_return("[CPU: %d]TIME DOES NOT MATCH expected:%lld actual:%lld delta:%lld before:%lld after:%lld%s context:%s\n", From patchwork Thu Apr 11 01:25:49 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steven Rostedt X-Patchwork-Id: 13625295 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5AB5A657AD; Thu, 11 Apr 2024 01:26:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712798787; cv=none; b=EIxRlaHqFYlN1benZ87uO7fDVAw50JqgjYaBRk5Rkd1/Pnm4ZYGg9sj3WVwb+vDhedc7WweDkJ4wu7igE3RxnzzCvMJxxLkmTFjYswdwHd+sZEyQxGqXwr4g2e3ZcjRTgRok3DyBa9Vljn466wztTYc85fnuJIjDnoHgl2O+Eg4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712798787; c=relaxed/simple; bh=2BB6TvBEG9JxTSuiXtOY5mfN3oEafelle7GnLEa5JII=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=uZE/pxR9RWHzcGo44JLXPN1youoEbPsrnPK+t7edNeyWIAZoBXrlB2Ni+FLCclOUrlZPGjFFW84TedfFt/YOvIkkN698zuEQfZPi26CCR7LvFuPtm/LLDN5A9cASRGQ4cSYm8rreQoCIZGd7jt//FB9B4F5/L4TgZOve8nLfF+E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2B84CC3278E; Thu, 11 Apr 2024 01:26:27 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.97) (envelope-from ) id 1rujFR-00000000nXu-24h2; Wed, 10 Apr 2024 21:29:05 -0400 Message-ID: <20240411012905.354430601@goodmis.org> User-Agent: quilt/0.67 Date: Wed, 10 Apr 2024 21:25:49 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Vincent Donnefort , Joel Fernandes , Daniel Bristot de Oliveira , Ingo Molnar , Peter Zijlstra , suleiman@google.com, Thomas Gleixner , Vineeth Pillai , Youssef Esmat , Beau Belgrave , Alexander Graf , Baoquan He , Borislav Petkov , "Paul E. McKenney" , David Howells Subject: [PATCH v2 08/11] ring-buffer: Save text and data locations in mapped meta data References: <20240411012541.285904543@goodmis.org> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Steven Rostedt (Google)" When a ring buffer is mapped to a specific address, save the address of a text function and some data. This will be used to determine the delta between the last boot and the current boot for pointers to functions as well as to data. 
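As a rough illustration of this scheme (a minimal user-space sketch only; struct boot_meta, reference_func and the other names below are invented for the example and are not the kernel implementation), the meta data records the address of one known function and one known data symbol, and on the next boot the difference between where those symbols are now and where they were saved gives the text and data deltas:

/* Illustrative sketch only -- not the kernel code. */
#include <stdio.h>

struct boot_meta {			/* stand-in for the persistent meta data */
	unsigned long text_addr;	/* address of a reference function when saved */
	unsigned long data_addr;	/* address of a reference data symbol when saved */
};

static int reference_func(void) { return 0; }
static char reference_data[] = "ref";

/* Compute how far text and data moved since the meta data was written,
 * then record the current addresses for the next boot. */
static void compute_deltas(struct boot_meta *meta, long *text_delta, long *data_delta)
{
	unsigned long this_text = (unsigned long)reference_func;
	unsigned long this_data = (unsigned long)reference_data;

	*text_delta = (long)(this_text - meta->text_addr);
	*data_delta = (long)(this_data - meta->data_addr);

	meta->text_addr = this_text;
	meta->data_addr = this_data;
}

int main(void)
{
	/* Pretend the previous boot saw both symbols 256 MB lower. */
	struct boot_meta meta = {
		.text_addr = (unsigned long)reference_func - 0x10000000UL,
		.data_addr = (unsigned long)reference_data - 0x10000000UL,
	};
	long text_delta, data_delta;

	compute_deltas(&meta, &text_delta, &data_delta);
	printf("text delta: %ld\ndata delta: %ld\n", text_delta, data_delta);
	return 0;
}

A negative delta (such as the -268435456, i.e. -256 MB, shown by last_boot_info in a later patch) simply means the symbols landed lower in the current boot than in the previous one.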
Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ring_buffer.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 6f19e7612472..c68695ae2749 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -45,6 +45,8 @@ static void update_pages_handler(struct work_struct *work); struct ring_buffer_meta { + unsigned long text_addr; + unsigned long data_addr; unsigned long first_buffer; unsigned long head_buffer; unsigned long commit_buffer; @@ -539,6 +541,9 @@ struct trace_buffer { unsigned long range_addr_start; unsigned long range_addr_end; + long last_text_delta; + long last_data_delta; + unsigned int subbuf_size; unsigned int subbuf_order; unsigned int max_data_size; @@ -1827,10 +1832,15 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer) } } +/* Used to calculate data delta */ +static char rb_data_ptr[] __initdata = ""; + static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages) { struct ring_buffer_meta *meta; unsigned long delta; + unsigned long this_text = (unsigned long)rb_range_meta_init; + unsigned long this_data = (unsigned long)rb_data_ptr; void *subbuf; int cpu; int i; @@ -1847,6 +1857,10 @@ static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages) meta->first_buffer += delta; meta->head_buffer += delta; meta->commit_buffer += delta; + buffer->last_text_delta = this_text - meta->text_addr; + buffer->last_data_delta = this_data - meta->data_addr; + meta->text_addr = this_text; + meta->data_addr = this_data; continue; } @@ -1863,6 +1877,8 @@ static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages) subbuf = rb_subbufs_from_meta(meta); meta->first_buffer = (unsigned long)subbuf; + meta->text_addr = this_text; + meta->data_addr = this_data; /* * The buffers[] array holds the order of the sub-buffers From patchwork Thu Apr 11 01:25:50 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steven Rostedt X-Patchwork-Id: 13625298 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B536169944; Thu, 11 Apr 2024 01:26:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712798787; cv=none; b=HOBAQy5zmO7lSp51uRKeJ4n5yPKNByeCwVpCPutrpW4mldEaBkUdFdRht3cW+3cxdmK92F1zMw+D8YhXoHm2GwN/ONCV2/3qhLDcFgcMG/0mh8G7WTfP+I7ugGENj+l+OrEBQYkKhG7BYD7+b7GQ0f+zPefScwwa269TxmNxjsY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712798787; c=relaxed/simple; bh=tHHmtzEzbAqYDUTjwAOyNj1XGNAlZ5qKZwUkxVRso18=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=KmNNjuFJLkZLyzSkPgkIElpder1u9ywdGgJa6iz4mG1R54jgnl+K68HBcyw4/5vUkXRPOqws5q4fIiwD0lD2wnuqfIgpkreJHRTODRJOJZej3Wmicy7MsFcRn2QJk+AZoCZIW5NfTbo039vsafWimZ5FJqploHI05cUD/hDALiQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 674EEC32795; Thu, 11 Apr 2024 01:26:27 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.97) (envelope-from ) id 1rujFR-00000000nYP-2ixV; Wed, 10 Apr 2024 21:29:05 -0400 Message-ID: 
<20240411012905.513883918@goodmis.org> User-Agent: quilt/0.67 Date: Wed, 10 Apr 2024 21:25:50 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Vincent Donnefort , Joel Fernandes , Daniel Bristot de Oliveira , Ingo Molnar , Peter Zijlstra , suleiman@google.com, Thomas Gleixner , Vineeth Pillai , Youssef Esmat , Beau Belgrave , Alexander Graf , Baoquan He , Borislav Petkov , "Paul E. McKenney" , David Howells Subject: [PATCH v2 09/11] tracing/ring-buffer: Add last_boot_info file to boot instance References: <20240411012541.285904543@goodmis.org> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Steven Rostedt (Google)" If an instance is mapped to memory on boot up, create a new file called "last_boot_info" that will hold information that can be used to properly parse the raw data in the ring buffer. It will export the delta of the text and data addresses from what they were on the last boot. It does not expose actual addresses (unless you knew what the actual address was from the last boot). The output will look like: # cat last_boot_info text delta: -268435456 data delta: -268435456 The text and data are kept separate in case they are ever made different. Signed-off-by: Steven Rostedt (Google) --- include/linux/ring_buffer.h | 3 +++ kernel/trace/ring_buffer.c | 23 ++++++++++++++++++ kernel/trace/trace.c | 47 ++++++++++++++++++++++++++++++++++++- kernel/trace/trace.h | 2 ++ 4 files changed, 74 insertions(+), 1 deletion(-) diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h index a50b0223b1d3..55de3798a9b9 100644 --- a/include/linux/ring_buffer.h +++ b/include/linux/ring_buffer.h @@ -94,6 +94,9 @@ struct trace_buffer *__ring_buffer_alloc_range(unsigned long size, unsigned flag unsigned long range_size, struct lock_class_key *key); +bool ring_buffer_last_boot_delta(struct trace_buffer *buffer, long *text, + long *data); + /* * Because the ring buffer is generic, if other users of the ring buffer get * traced by ftrace, it can produce lockdep warnings. We need to keep each diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index c68695ae2749..e189f500ac32 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -2390,6 +2390,29 @@ struct trace_buffer *__ring_buffer_alloc_range(unsigned long size, unsigned flag return alloc_buffer(size, flags, order, start, start + range_size, key); } +/** + * ring_buffer_last_boot_delta - return the delta offset from last boot + * @buffer: The buffer to return the delta from + * @text: Return text delta + * @data: Return data delta + * + * Returns: true if the delta is non-zero + */ +bool ring_buffer_last_boot_delta(struct trace_buffer *buffer, long *text, + long *data) +{ + if (!buffer) + return false; + + if (!buffer->last_text_delta) + return false; + + *text = buffer->last_text_delta; + *data = buffer->last_data_delta; + + return true; +} + /** * ring_buffer_free - free a ring buffer. * @buffer: the buffer to free.
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index b2b16be6e4ec..33ef5311fa39 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -6041,6 +6041,18 @@ ssize_t tracing_resize_ring_buffer(struct trace_array *tr, return ret; } +static void update_last_data(struct trace_array *tr) +{ + if (!tr->text_delta && !tr->data_delta) + return; + + /* Clear old data */ + tracing_reset_online_cpus(&tr->array_buffer); + + /* Using current data now */ + tr->text_delta = 0; + tr->data_delta = 0; +} /** * tracing_update_buffers - used by tracing facility to expand ring buffers @@ -6058,6 +6070,9 @@ int tracing_update_buffers(struct trace_array *tr) int ret = 0; mutex_lock(&trace_types_lock); + + update_last_data(tr); + if (!tr->ring_buffer_expanded) ret = __tracing_resize_ring_buffer(tr, trace_buf_size, RING_BUFFER_ALL_CPUS); @@ -6113,6 +6128,8 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf) mutex_lock(&trace_types_lock); + update_last_data(tr); + if (!tr->ring_buffer_expanded) { ret = __tracing_resize_ring_buffer(tr, trace_buf_size, RING_BUFFER_ALL_CPUS); @@ -6860,6 +6877,21 @@ tracing_total_entries_read(struct file *filp, char __user *ubuf, return simple_read_from_buffer(ubuf, cnt, ppos, buf, r); } +static ssize_t +tracing_last_boot_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos) +{ + struct trace_array *tr = filp->private_data; + struct seq_buf seq; + char buf[64]; + + seq_buf_init(&seq, buf, 64); + + seq_buf_printf(&seq, "text delta:\t%ld\n", tr->text_delta); + seq_buf_printf(&seq, "data delta:\t%ld\n", tr->data_delta); + + return simple_read_from_buffer(ubuf, cnt, ppos, buf, seq_buf_used(&seq)); +} + static int tracing_buffer_meta_open(struct inode *inode, struct file *filp) { struct trace_array *tr = inode->i_private; @@ -7499,6 +7531,13 @@ static const struct file_operations trace_time_stamp_mode_fops = { .release = tracing_single_release_tr, }; +static const struct file_operations last_boot_fops = { + .open = tracing_open_generic_tr, + .read = tracing_last_boot_read, + .llseek = generic_file_llseek, + .release = tracing_release_generic_tr, +}; + #ifdef CONFIG_TRACER_SNAPSHOT static const struct file_operations snapshot_fops = { .open = tracing_snapshot_open, @@ -9251,6 +9290,9 @@ allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, int size buf->buffer = ring_buffer_alloc_range(size, rb_flags, 0, tr->range_addr_start, tr->range_addr_size); + + ring_buffer_last_boot_delta(buf->buffer, + &tr->text_delta, &tr->data_delta); /* * This is basically the same as a mapped buffer, * with the same restrictions. 
@@ -9735,7 +9777,10 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer) MEM_FAIL(1, "Could not allocate function filter files"); #ifdef CONFIG_TRACER_SNAPSHOT - if (!tr->range_addr_start) { + if (tr->range_addr_start) { + trace_create_file("last_boot_info", TRACE_MODE_READ, d_tracer, + tr, &last_boot_fops); + } else { trace_create_file("snapshot", TRACE_MODE_WRITE, d_tracer, tr, &snapshot_fops); } diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index ccff4891c2ac..653ed9027554 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -341,6 +341,8 @@ struct trace_array { /* The below is for memory mapped ring buffer */ unsigned long range_addr_start; unsigned long range_addr_size; + long text_delta; + long data_delta; #ifdef CONFIG_FSNOTIFY struct dentry *d_max_latency; struct work_struct fsnotify_work; From patchwork Thu Apr 11 01:25:51 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steven Rostedt X-Patchwork-Id: 13625297 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B532769314; Thu, 11 Apr 2024 01:26:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712798787; cv=none; b=Mo2oEqnXZ8tukVHUHf8jf/T8SdNg5cHikzLtlnBkXb54Uo0rLEWzdmJJCFpxIW89N2dHM9JiayxLFMtgdUJgP6rIkXpREPTXcZfuAzQUFiUWzGG0smMxXX44AkYvz9TpECj+/NFgCFvfYLMcu2MbzX8vEdDMNodBdE+qlF6QqpA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712798787; c=relaxed/simple; bh=Ertzit/1teBML8ZsgAn3OsCPk/nGsMK+MB0FCFsLJMA=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=IT1x/wX1tdwtMlG7VD7J7ABXmvfcsUSI/9fEvr+cCefJ3hiLIgpUqyPp4c44nMKFZpJgpRf2hJWyttkvWbpBpr2YgduErXj87YM/yUNR2wL6fzrE8DznDFVa9pRvBS30Iyj57mEgkkX/smP4VpXGW5PCYdoGDNwVP5WG0S6ModI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 67857C4163B; Thu, 11 Apr 2024 01:26:27 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.97) (envelope-from ) id 1rujFR-00000000nYu-3O3p; Wed, 10 Apr 2024 21:29:05 -0400 Message-ID: <20240411012905.668885748@goodmis.org> User-Agent: quilt/0.67 Date: Wed, 10 Apr 2024 21:25:51 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Vincent Donnefort , Joel Fernandes , Daniel Bristot de Oliveira , Ingo Molnar , Peter Zijlstra , suleiman@google.com, Thomas Gleixner , Vineeth Pillai , Youssef Esmat , Beau Belgrave , Alexander Graf , Baoquan He , Borislav Petkov , "Paul E. McKenney" , David Howells Subject: [PATCH v2 10/11] tracing: Handle old buffer mappings for event strings and functions References: <20240411012541.285904543@goodmis.org> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Steven Rostedt (Google)" Use the saved text_delta and data_delta of a persistent memory mapped ring buffer that was saved from a previous boot, and use the delta in the trace event print output so that strings and functions show up normally. 
That is, for an event like trace_kmalloc() that prints the callsite via "%pS", if it used the address saved in the ring buffer it will not match the function that was saved in the previous boot if the kernel remaps itself between boots. For RCU events that point to saved static strings where only the address of the string is saved in the ring buffer, it too will be adjusted to point to where the string is on the current boot. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace.c | 42 +++++++++++++++++++++++++++++++++++++++--- 1 file changed, 39 insertions(+), 3 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 33ef5311fa39..bac5db67cf15 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -3671,8 +3671,11 @@ static void test_can_verify(void) void trace_check_vprintf(struct trace_iterator *iter, const char *fmt, va_list ap) { + long text_delta = iter->tr->text_delta; + long data_delta = iter->tr->data_delta; const char *p = fmt; const char *str; + bool good; int i, j; if (WARN_ON_ONCE(!fmt)) @@ -3691,7 +3694,10 @@ void trace_check_vprintf(struct trace_iterator *iter, const char *fmt, j = 0; - /* We only care about %s and variants */ + /* + * We only care about %s and variants + * as well as %p[sS] if delta is non-zero + */ for (i = 0; p[i]; i++) { if (i + 1 >= iter->fmt_size) { /* @@ -3720,6 +3726,11 @@ void trace_check_vprintf(struct trace_iterator *iter, const char *fmt, } if (p[i+j] == 's') break; + + if (text_delta && p[i+1] == 'p' && + ((p[i+2] == 's' || p[i+2] == 'S'))) + break; + star = false; } j = 0; @@ -3733,6 +3744,24 @@ void trace_check_vprintf(struct trace_iterator *iter, const char *fmt, iter->fmt[i] = '\0'; trace_seq_vprintf(&iter->seq, iter->fmt, ap); + /* Add delta to %pS pointers */ + if (p[i+1] == 'p') { + unsigned long addr; + char fmt[4]; + + fmt[0] = '%'; + fmt[1] = 'p'; + fmt[2] = p[i+2]; /* Either %ps or %pS */ + fmt[3] = '\0'; + + addr = va_arg(ap, unsigned long); + addr += text_delta; + trace_seq_printf(&iter->seq, fmt, (void *)addr); + + p += i + 3; + continue; + } + /* * If iter->seq is full, the above call no longer guarantees * that ap is in sync with fmt processing, and further calls @@ -3751,6 +3780,14 @@ void trace_check_vprintf(struct trace_iterator *iter, const char *fmt, /* The ap now points to the string data of the %s */ str = va_arg(ap, const char *); + good = trace_safe_str(iter, str, star, len); + + /* Could be from the last boot */ + if (data_delta && !good) { + str += data_delta; + good = trace_safe_str(iter, str, star, len); + } + /* * If you hit this warning, it is likely that the * trace event in question used %s on a string that @@ -3760,8 +3797,7 @@ void trace_check_vprintf(struct trace_iterator *iter, const char *fmt, * instead. See samples/trace_events/trace-events-sample.h * for reference. 
*/ - if (WARN_ONCE(!trace_safe_str(iter, str, star, len), - "fmt: '%s' current_buffer: '%s'", + if (WARN_ONCE(!good, "fmt: '%s' current_buffer: '%s'", fmt, seq_buf_str(&iter->seq.seq))) { int ret; From patchwork Thu Apr 11 01:25:52 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steven Rostedt X-Patchwork-Id: 13625299 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 214E969D31; Thu, 11 Apr 2024 01:26:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712798788; cv=none; b=MQx65/s/1ZtBIQdKMnu37gCx9s34w29b70KOMR0JPheoqJZ5SU9lE7wlbAu5AmgpDn7YhaPhqE8ZxT616HmuNUrUt1DtXxXDFZJ/SZlkLmi561fIclE2ELoSNub4Hro9OeU5tdKOW7JnONLx2iR3kjJ3m1LKdI9ecJ/DVxiVxDM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712798788; c=relaxed/simple; bh=IE1IK2OKn7/pafSQ0ncO+HLzuopsT1q+/33iYZS7wTg=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=nYEiH7bqDkmi89z1AEDMBS5Ho34ySH9sphhp9CJQyu94F+Pb5QT+y096ijZZJIfpzsCWA9wAdeT8twgVOmjoRkXJs0LAd+ZcyePm69EJPey54Rni3sJPfuFS7mWGEtJ0z2WEy/hqkS2FVU4MCNTQ2O2zsgdI7drKUbwgNnrLA78= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 864EEC32796; Thu, 11 Apr 2024 01:26:27 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.97) (envelope-from ) id 1rujFR-00000000nZP-43Mh; Wed, 10 Apr 2024 21:29:05 -0400 Message-ID: <20240411012905.828215645@goodmis.org> User-Agent: quilt/0.67 Date: Wed, 10 Apr 2024 21:25:52 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Vincent Donnefort , Joel Fernandes , Daniel Bristot de Oliveira , Ingo Molnar , Peter Zijlstra , suleiman@google.com, Thomas Gleixner , Vineeth Pillai , Youssef Esmat , Beau Belgrave , Alexander Graf , Baoquan He , Borislav Petkov , "Paul E. McKenney" , David Howells Subject: [PATCH v2 11/11] tracing: Update function tracing output for previous boot buffer References: <20240411012541.285904543@goodmis.org> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Steven Rostedt (Google)" For a persistent ring buffer that is saved across boots, if function tracing was performed in the previous boot, it only saves the address of the functions and uses "%pS" to print their names. But the current boot, those functions may be in different locations. The persistent meta-data saves the text delta between the two boots and can be used to find the address of the saved function of where it is located in the current boot. 
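A minimal user-space sketch of that adjustment (illustrative only; print_symbol_at() below merely stands in for the kernel's "%pS" symbol resolution, and the numbers are made up). It is the same shift that print_fn_trace() applies to field->ip and field->parent_ip in the patch below:

/* Illustrative sketch only -- not the kernel code. */
#include <stdio.h>

/* Stand-in for resolving an address with "%pS". */
static void print_symbol_at(unsigned long ip)
{
	printf("function at %#lx\n", ip);
}

/* An instruction pointer recorded by the previous boot is only meaningful in
 * that boot's layout; shift it by the saved text delta before resolving it. */
static void print_fn_from_last_boot(unsigned long saved_ip, long text_delta)
{
	print_symbol_at(saved_ip + (unsigned long)text_delta);
}

int main(void)
{
	unsigned long saved_ip = 0xc1234560UL;	/* made-up address from the previous boot */
	long text_delta = -268435456;		/* e.g. the value reported by last_boot_info */

	print_fn_from_last_boot(saved_ip, text_delta);
	return 0;
}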
Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace_output.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c index d8b302d01083..b9d2c64c0648 100644 --- a/kernel/trace/trace_output.c +++ b/kernel/trace/trace_output.c @@ -990,8 +990,11 @@ enum print_line_t trace_nop_print(struct trace_iterator *iter, int flags, } static void print_fn_trace(struct trace_seq *s, unsigned long ip, - unsigned long parent_ip, int flags) + unsigned long parent_ip, long delta, int flags) { + ip += delta; + parent_ip += delta; + seq_print_ip_sym(s, ip, flags); if ((flags & TRACE_ITER_PRINT_PARENT) && parent_ip) { @@ -1009,7 +1012,7 @@ static enum print_line_t trace_fn_trace(struct trace_iterator *iter, int flags, trace_assign_type(field, iter->ent); - print_fn_trace(s, field->ip, field->parent_ip, flags); + print_fn_trace(s, field->ip, field->parent_ip, iter->tr->text_delta, flags); trace_seq_putc(s, '\n'); return trace_handle_return(s); @@ -1674,7 +1677,7 @@ trace_func_repeats_print(struct trace_iterator *iter, int flags, trace_assign_type(field, iter->ent); - print_fn_trace(s, field->ip, field->parent_ip, flags); + print_fn_trace(s, field->ip, field->parent_ip, iter->tr->text_delta, flags); trace_seq_printf(s, " (repeats: %u, last_ts:", field->count); trace_print_time(s, iter, iter->ts - FUNC_REPEATS_GET_DELTA_TS(field));