From patchwork Mon Feb 24 12:13:43 2025
X-Patchwork-Submitter: Vincent Donnefort <vdonnefort@google.com>
X-Patchwork-Id: 13988018
From: Vincent Donnefort <vdonnefort@google.com>
Date: Mon, 24 Feb 2025 12:13:43 +0000
Subject: [PATCH 01/11] ring-buffer: Introduce ring-buffer remote
Message-ID: <20250224121353.98697-2-vdonnefort@google.com>
In-Reply-To: <20250224121353.98697-1-vdonnefort@google.com>
References: <20250224121353.98697-1-vdonnefort@google.com>
To: rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com,
    linux-trace-kernel@vger.kernel.org, maz@kernel.org, oliver.upton@linux.dev,
    joey.gouly@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com
Cc: kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
    jstultz@google.com, qperret@google.com, will@kernel.org,
    kernel-team@android.com, linux-kernel@vger.kernel.org,
    Vincent Donnefort <vdonnefort@google.com>

A ring-buffer remote is an entity outside of the kernel (most likely a
firmware or a hypervisor) capable of writing events into a ring-buffer
following the same format as the tracefs ring-buffer.

To set up the ring-buffer on the kernel side, a description of the pages
(struct trace_page_desc) is necessary. A callback (get_reader_page) must
also be provided; it is called whenever the reader is done with the
current reader page. The remote is expected to keep the meta-page
updated.

Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
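---

To give an idea of how the pieces fit together, below is a rough sketch of
a remote client. It is not part of this patch; everything prefixed with
my_ is invented for illustration, and only struct ring_buffer_remote,
struct trace_page_desc and ring_buffer_remote() come from this series:

  #include <linux/ring_buffer.h>

  /*
   * Hypothetical transport: asks the remote writer to publish a new
   * reader page for @cpu and to refresh the meta-page (e.g. a hypercall
   * or an SMC). Declared here only to make the sketch self-contained.
   */
  int my_hyp_swap_reader_page(int cpu);

  static int my_remote_get_reader_page(int cpu)
  {
  	return my_hyp_swap_reader_page(cpu);
  }

  static struct ring_buffer_remote my_remote = {
  	.get_reader_page	= my_remote_get_reader_page,
  	/* .reset is optional and may be left NULL */
  };

  /* @pdesc describes the meta-page and data pages shared by the remote */
  static struct trace_buffer *my_remote_buffer_create(struct trace_page_desc *pdesc)
  {
  	my_remote.pdesc = pdesc;

  	/* The resulting buffer is read-only: the remote is the only writer */
  	return ring_buffer_remote(&my_remote);
  }
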
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 17fbb7855295..2a1330a65edb 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -248,4 +248,59 @@ int ring_buffer_map(struct trace_buffer *buffer, int cpu,
 		    struct vm_area_struct *vma);
 int ring_buffer_unmap(struct trace_buffer *buffer, int cpu);
 int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu);
+
+#define meta_pages_lost(__meta) \
+	((__meta)->Reserved1)
+#define meta_pages_touched(__meta) \
+	((__meta)->Reserved2)
+
+struct rb_page_desc {
+	unsigned int	cpu;
+	unsigned int	nr_page_va;	/* excludes the meta page */
+	unsigned long	meta_va;
+	unsigned long	page_va[];
+};
+
+struct trace_page_desc {
+	size_t		struct_len;
+	unsigned int	nr_cpus;
+	char		__data[];	/* list of rb_page_desc */
+};
+
+static inline
+struct rb_page_desc *__next_rb_page_desc(struct rb_page_desc *pdesc)
+{
+	size_t len = struct_size(pdesc, page_va, pdesc->nr_page_va);
+
+	return (struct rb_page_desc *)((void *)pdesc + len);
+}
+
+static inline
+struct rb_page_desc *__first_rb_page_desc(struct trace_page_desc *trace_pdesc)
+{
+	return (struct rb_page_desc *)(&trace_pdesc->__data[0]);
+}
+
+#define for_each_rb_page_desc(__pdesc, __cpu, __trace_pdesc)		\
+	for (__pdesc = __first_rb_page_desc(__trace_pdesc), __cpu = 0;	\
+	     __cpu < (__trace_pdesc)->nr_cpus;				\
+	     __cpu++, __pdesc = __next_rb_page_desc(__pdesc))
+
+struct ring_buffer_remote {
+	struct trace_page_desc	*pdesc;
+	int (*get_reader_page)(int cpu);
+	int (*reset)(int cpu);
+};
+
+int ring_buffer_poll_remote(struct trace_buffer *buffer, int cpu);
+
+struct trace_buffer *
+__ring_buffer_alloc_remote(struct ring_buffer_remote *remote,
+			   struct lock_class_key *key);
+
+#define ring_buffer_remote(remote)			\
+({							\
+	static struct lock_class_key __key;		\
+	__ring_buffer_alloc_remote(remote, &__key);	\
+})
 #endif /* _LINUX_RING_BUFFER_H */
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index bb6089c2951e..c27516a384a8 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -519,6 +519,8 @@ struct ring_buffer_per_cpu {
 	struct trace_buffer_meta	*meta_page;
 	struct ring_buffer_meta		*ring_meta;
 
+	struct ring_buffer_remote	*remote;
+
 	/* ring buffer pages to update, > 0 to add, < 0 to remove */
 	long				nr_pages_to_update;
 	struct list_head		new_pages; /* new pages to add */
@@ -541,6 +543,8 @@ struct trace_buffer {
 
 	struct ring_buffer_per_cpu	**buffers;
 
+	struct ring_buffer_remote	*remote;
+
 	struct hlist_node		node;
 	u64				(*clock)(void);
 
@@ -2155,6 +2159,41 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 	return -ENOMEM;
 }
 
+static struct rb_page_desc *rb_page_desc(struct trace_page_desc *trace_pdesc, int cpu)
+{
+	struct rb_page_desc *pdesc, *end;
+	size_t len;
+	int i;
+
+	if (!trace_pdesc)
+		return NULL;
+
+	if (cpu >= trace_pdesc->nr_cpus)
+		return NULL;
+
+	end = (struct rb_page_desc *)((void *)trace_pdesc + trace_pdesc->struct_len);
+	pdesc = __first_rb_page_desc(trace_pdesc);
+	len = struct_size(pdesc, page_va, pdesc->nr_page_va);
+	pdesc = (struct rb_page_desc *)((void *)pdesc + (len * cpu));
+
+	if (pdesc < end && pdesc->cpu == cpu)
+		return pdesc;
+
+	/* Missing CPUs, need to do a linear search */
+	for_each_rb_page_desc(pdesc, i, trace_pdesc) {
+		if (pdesc->cpu == cpu)
+			return pdesc;
+	}
+
+	return NULL;
+}
+
+static void *rb_page_desc_page(struct rb_page_desc *pdesc, int page_id)
+{
+	return page_id >= pdesc->nr_page_va ? NULL : (void *)pdesc->page_va[page_id];
+}
+
+
 static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 			     unsigned long nr_pages)
 {
@@ -2215,6 +2254,31 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
 
 	cpu_buffer->reader_page = bpage;
 
+	if (buffer->remote) {
+		struct rb_page_desc *pdesc = rb_page_desc(buffer->remote->pdesc, cpu);
+
+		if (!pdesc)
+			goto fail_free_reader;
+
+		cpu_buffer->remote = buffer->remote;
+		cpu_buffer->meta_page = (struct trace_buffer_meta *)(void *)pdesc->meta_va;
+		cpu_buffer->subbuf_ids = pdesc->page_va;
+		cpu_buffer->nr_pages = pdesc->nr_page_va - 1;
+		atomic_inc(&cpu_buffer->record_disabled);
+		atomic_inc(&cpu_buffer->resize_disabled);
+
+		bpage->page = rb_page_desc_page(pdesc,
+						cpu_buffer->meta_page->reader.id);
+		if (!bpage->page)
+			goto fail_free_reader;
+		/*
+		 * The meta-page can only describe which of the ring-buffer pages
+		 * is the reader. There is no need to init the rest of the
+		 * ring-buffer.
+		 */
+		return cpu_buffer;
+	}
+
 	if (buffer->range_addr_start) {
 		/*
 		 * Range mapped buffers have the same restrictions as memory
@@ -2292,6 +2356,10 @@ static void rb_free_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
 
 	irq_work_sync(&cpu_buffer->irq_work.work);
 
+	/* Remote ring-buffer: we do not own the data pages */
+	if (cpu_buffer->remote)
+		cpu_buffer->reader_page->page = NULL;
+
 	free_buffer_page(cpu_buffer->reader_page);
 
 	if (head) {
@@ -2313,7 +2381,8 @@ static void rb_free_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
 static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
 					 int order, unsigned long start,
 					 unsigned long end,
-					 struct lock_class_key *key)
+					 struct lock_class_key *key,
+					 struct ring_buffer_remote *remote)
 {
 	struct trace_buffer *buffer;
 	long nr_pages;
@@ -2341,6 +2410,11 @@ static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
 	buffer->flags = flags;
 	buffer->clock = trace_clock_local;
 	buffer->reader_lock_key = key;
+	if (remote) {
+		buffer->remote = remote;
+		/* The writer is remote. This ring-buffer is read-only */
+		atomic_inc(&buffer->record_disabled);
+	}
 
 	init_irq_work(&buffer->irq_work.work, rb_wake_up_waiters);
 	init_waitqueue_head(&buffer->irq_work.waiters);
@@ -2447,7 +2521,7 @@ struct trace_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 					 struct lock_class_key *key)
 {
 	/* Default buffer page size - one system page */
-	return alloc_buffer(size, flags, 0, 0, 0,key);
+	return alloc_buffer(size, flags, 0, 0, 0, key, NULL);
 
 }
 EXPORT_SYMBOL_GPL(__ring_buffer_alloc);
@@ -2471,7 +2545,18 @@ struct trace_buffer *__ring_buffer_alloc_range(unsigned long size, unsigned flag
 					   unsigned long range_size,
 					   struct lock_class_key *key)
 {
-	return alloc_buffer(size, flags, order, start, start + range_size, key);
+	return alloc_buffer(size, flags, order, start, start + range_size, key, NULL);
+}
+
+/**
+ * __ring_buffer_alloc_remote - allocate a new ring_buffer from a remote
+ * @remote: Contains a description of the ring-buffer pages and remote callbacks.
+ * @key: ring buffer reader_lock_key.
+ */
+struct trace_buffer *__ring_buffer_alloc_remote(struct ring_buffer_remote *remote,
+						struct lock_class_key *key)
+{
+	return alloc_buffer(0, 0, 0, 0, 0, key, remote);
 }
 
 /**
@@ -5225,8 +5310,56 @@ rb_update_iter_read_stamp(struct ring_buffer_iter *iter,
 	}
 }
 
+static bool rb_read_remote_meta_page(struct ring_buffer_per_cpu *cpu_buffer)
+{
+	local_set(&cpu_buffer->entries, READ_ONCE(cpu_buffer->meta_page->entries));
+	local_set(&cpu_buffer->overrun, READ_ONCE(cpu_buffer->meta_page->overrun));
+	local_set(&cpu_buffer->pages_touched, READ_ONCE(meta_pages_touched(cpu_buffer->meta_page)));
+	local_set(&cpu_buffer->pages_lost, READ_ONCE(meta_pages_lost(cpu_buffer->meta_page)));
+	/*
+	 * No need to get the "read" field, it can be tracked here as any
+	 * reader will have to go through a ring_buffer_per_cpu.
+	 */
+
+	return rb_num_of_entries(cpu_buffer);
+}
+
 static struct buffer_page *
-rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
+__rb_get_reader_page_from_remote(struct ring_buffer_per_cpu *cpu_buffer)
+{
+	u32 prev_reader;
+
+	if (!rb_read_remote_meta_page(cpu_buffer))
+		return NULL;
+
+	/* More to read on the reader page */
+	if (cpu_buffer->reader_page->read < rb_page_size(cpu_buffer->reader_page)) {
+		if (!cpu_buffer->reader_page->read)
+			cpu_buffer->read_stamp = cpu_buffer->reader_page->page->time_stamp;
+		return cpu_buffer->reader_page;
+	}
+
+	prev_reader = cpu_buffer->meta_page->reader.id;
+
+	WARN_ON(cpu_buffer->remote->get_reader_page(cpu_buffer->cpu));
+	/* nr_pages doesn't include the reader page */
+	if (WARN_ON(cpu_buffer->meta_page->reader.id > cpu_buffer->nr_pages))
+		return NULL;
+
+	cpu_buffer->reader_page->page =
+		(void *)cpu_buffer->subbuf_ids[cpu_buffer->meta_page->reader.id];
+	cpu_buffer->reader_page->id = cpu_buffer->meta_page->reader.id;
+	cpu_buffer->reader_page->read = 0;
+	cpu_buffer->read_stamp = cpu_buffer->reader_page->page->time_stamp;
+	cpu_buffer->lost_events = cpu_buffer->meta_page->reader.lost_events;
+
+	WARN_ON(prev_reader == cpu_buffer->meta_page->reader.id);
+
+	return rb_page_size(cpu_buffer->reader_page) ? cpu_buffer->reader_page : NULL;
+}
+
+static struct buffer_page *
+__rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
 {
 	struct buffer_page *reader = NULL;
 	unsigned long bsize = READ_ONCE(cpu_buffer->buffer->subbuf_size);
@@ -5397,6 +5530,13 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
 	return reader;
 }
 
+static struct buffer_page *
+rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
+{
+	return cpu_buffer->remote ? __rb_get_reader_page_from_remote(cpu_buffer) :
+				    __rb_get_reader_page(cpu_buffer);
+}
+
 static void rb_advance_reader(struct ring_buffer_per_cpu *cpu_buffer)
 {
 	struct ring_buffer_event *event;
@@ -5801,7 +5941,7 @@ ring_buffer_read_prepare(struct trace_buffer *buffer, int cpu, gfp_t flags)
 	struct ring_buffer_per_cpu *cpu_buffer;
 	struct ring_buffer_iter *iter;
 
-	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+	if (!cpumask_test_cpu(cpu, buffer->cpumask) || buffer->remote)
 		return NULL;
 
 	iter = kzalloc(sizeof(*iter), flags);
@@ -5971,6 +6111,23 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
 {
 	struct buffer_page *page;
 
+	if (cpu_buffer->remote) {
+		if (!cpu_buffer->remote->reset)
+			return;
+
+		cpu_buffer->remote->reset(cpu_buffer->cpu);
+		rb_read_remote_meta_page(cpu_buffer);
+
+		/* Read-related values, not covered by the meta-page */
+		local_set(&cpu_buffer->pages_read, 0);
+		cpu_buffer->read = 0;
+		cpu_buffer->read_bytes = 0;
+		cpu_buffer->last_overrun = 0;
+		cpu_buffer->reader_page->read = 0;
+
+		return;
+	}
+
 	rb_head_page_deactivate(cpu_buffer);
 
 	cpu_buffer->head_page
@@ -6218,6 +6375,49 @@ bool ring_buffer_empty_cpu(struct trace_buffer *buffer, int cpu)
 }
 EXPORT_SYMBOL_GPL(ring_buffer_empty_cpu);
 
+int ring_buffer_poll_remote(struct trace_buffer *buffer, int cpu)
+{
+	struct ring_buffer_per_cpu *cpu_buffer;
+	unsigned long flags;
+
+	if (cpu != RING_BUFFER_ALL_CPUS) {
+		if (!cpumask_test_cpu(cpu, buffer->cpumask))
+			return -EINVAL;
+
+		cpu_buffer = buffer->buffers[cpu];
+
+		raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+		if (rb_read_remote_meta_page(cpu_buffer))
+			rb_wakeups(buffer, cpu_buffer);
+		raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+
+		return 0;
+	}
+
+	/*
+	 * Make sure all the ring buffers are up to date before we start
+	 * reading them.
+	 */
+	for_each_buffer_cpu(buffer, cpu) {
+		cpu_buffer = buffer->buffers[cpu];
+
+		raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+		rb_read_remote_meta_page(buffer->buffers[cpu]);
+		raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+	}
+
+	for_each_buffer_cpu(buffer, cpu) {
+		cpu_buffer = buffer->buffers[cpu];
+
+		raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+		if (rb_num_of_entries(cpu_buffer))
+			rb_wakeups(buffer, buffer->buffers[cpu]);
+		raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+	}
+
+	return 0;
+}
+
 #ifdef CONFIG_RING_BUFFER_ALLOW_SWAP
 /**
  * ring_buffer_swap_cpu - swap a CPU buffer between two ring buffers
@@ -6469,6 +6669,7 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
 	unsigned int commit;
 	unsigned int read;
 	u64 save_timestamp;
+	bool force_memcpy;
 	int ret = -1;
 
 	if (!cpumask_test_cpu(cpu, buffer->cpumask))
@@ -6506,6 +6707,8 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
 	/* Check if any events were dropped */
 	missed_events = cpu_buffer->lost_events;
 
+	force_memcpy = cpu_buffer->mapped || cpu_buffer->remote;
+
 	/*
 	 * If this page has been partially read or
 	 * if len is not big enough to read the rest of the page or
@@ -6515,7 +6718,7 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
 	 */
 	if (read || (len < (commit - read)) ||
 	    cpu_buffer->reader_page == cpu_buffer->commit_page ||
-	    cpu_buffer->mapped) {
+	    force_memcpy) {
 		struct buffer_data_page *rpage = cpu_buffer->reader_page->page;
 		unsigned int rpos = read;
 		unsigned int pos = 0;
@@ -7097,7 +7300,7 @@ int ring_buffer_map(struct trace_buffer *buffer, int cpu,
 	unsigned long flags, *subbuf_ids;
 	int err = 0;
 
-	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+	if (!cpumask_test_cpu(cpu, buffer->cpumask) || buffer->remote)
 		return -EINVAL;
 
 	cpu_buffer = buffer->buffers[cpu];
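
ring_buffer_poll_remote() has no caller in this patch: the client is
expected to invoke it periodically so the remote meta-pages are re-read
and waiting readers are woken up. A minimal sketch of such a poller
follows; the my_ worker wiring is invented for illustration, only
ring_buffer_poll_remote() and RING_BUFFER_ALL_CPUS come from this series:

  #include <linux/jiffies.h>
  #include <linux/ring_buffer.h>
  #include <linux/workqueue.h>

  static struct trace_buffer *my_buffer;	/* from ring_buffer_remote() */
  static struct delayed_work my_poll_work;

  static void my_poll_fn(struct work_struct *work)
  {
  	/* Refresh every meta-page and wake up readers if entries arrived */
  	ring_buffer_poll_remote(my_buffer, RING_BUFFER_ALL_CPUS);
  	schedule_delayed_work(&my_poll_work, msecs_to_jiffies(100));
  }

  static void my_poll_start(void)
  {
  	INIT_DELAYED_WORK(&my_poll_work, my_poll_fn);
  	schedule_delayed_work(&my_poll_work, msecs_to_jiffies(100));
  }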