diff mbox series

[PATCH/RFC] tracing: make trace_marker{,_raw} stream-like

Message ID 20210909115734.3818711-1-john@metanate.com (mailing list archive)
State Superseded
Delegated to: Steven Rostedt
Headers show
Series [PATCH/RFC] tracing: make trace_marker{,_raw} stream-like | expand

Commit Message

John Keeping Sept. 9, 2021, 11:57 a.m. UTC
The tracing marker files are write-only streams with no meaningful
concept of file position.  Using stream_open() to mark them as
stream-link indicates this and has the added advantage that a single
file descriptor can now be used from multiple threads without contention
thanks to clearing FMODE_ATOMIC_POS.

Note that this has the potential to break existing userspace by since
both lseek(2) and pwrite(2) will now return ESPIPE when previously lseek
would have updated the stored offset and pwrite would have appended to
the trace.  A survey of libtracefs and several other projects found to
use trace_marker(_raw) [1][2][3] suggests that everyone limits
themselves to calling write(2) and close(2) on these file descriptors so
there is a good chance this will go unnoticed and the benefits of
reduced overhead and lock contention seem worth the risk.

[1] https://github.com/google/perfetto
[2] https://github.com/intel/media-driver/
[3] https://w1.fi/cgit/hostap/

Signed-off-by: John Keeping <john@metanate.com>
---
I'm marking this as RFC because it breaks the Prime Directive of Linux
development, as explained above.  But I'm hoping this is one of the
cases where we get away with it because this is a huge improvement for
multi-threaded programs when doing the simple thing of opening a single
trace_marker FD at startup and just writing to it from any thread.

After writing this, I realised that an alternative solution to my
original problem would have been to use pwrite instead of write!  But I
still think fixing this at the source would be better.

 kernel/trace/trace.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

Comments

John Keeping Dec. 7, 2021, 10:31 a.m. UTC | #1
Hi Steve,

On Thu, Sep 09, 2021 at 08:35:29AM -0400, Steven Rostedt wrote:
> On Thu,  9 Sep 2021 12:57:34 +0100
> John Keeping <john@metanate.com> wrote:
> 
> > The tracing marker files are write-only streams with no meaningful
> > concept of file position.  Using stream_open() to mark them as
> > stream-link indicates this and has the added advantage that a single
> > file descriptor can now be used from multiple threads without contention
> > thanks to clearing FMODE_ATOMIC_POS.
> > 
> > Note that this has the potential to break existing userspace by since
> > both lseek(2) and pwrite(2) will now return ESPIPE when previously lseek
> > would have updated the stored offset and pwrite would have appended to
> > the trace.  A survey of libtracefs and several other projects found to
> > use trace_marker(_raw) [1][2][3] suggests that everyone limits
> > themselves to calling write(2) and close(2) on these file descriptors so
> > there is a good chance this will go unnoticed and the benefits of
> > reduced overhead and lock contention seem worth the risk.
> > 
> > [1] https://github.com/google/perfetto
> > [2] https://github.com/intel/media-driver/
> > [3] https://w1.fi/cgit/hostap/
> > 
> > Signed-off-by: John Keeping <john@metanate.com>
> > ---
> > I'm marking this as RFC because it breaks the Prime Directive of Linux
> > development, as explained above.  But I'm hoping this is one of the
> 
> The "Prime Directive of Linux development" is the tree falling in the
> forest approach. If you break user space API but there's no user space
> application around to notice the break, did you really break it? The answer
> is "No".
> 
> > cases where we get away with it because this is a huge improvement for
> > multi-threaded programs when doing the simple thing of opening a single
> > trace_marker FD at startup and just writing to it from any thread.
> 
> I like the idea too.
> 
> > 
> > After writing this, I realised that an alternative solution to my
> > original problem would have been to use pwrite instead of write!  But I
> > still think fixing this at the source would be better.
> 
> I'm fine with adding this. But I'm going to add it after the merge window
> for the next release (5.16).
> 
> If someone complains that it broke their application, we may need to revert
> it, but I doubt that will happen.

Were you expecting more input from me on this?  The above sounded like
"will be added for 5.16" but I don't see this change in v5.16-rc4 and
the patch is still marked as "New" in patchwork [1]

[1] https://patchwork.kernel.org/project/linux-trace-devel/patch/20210909115734.3818711-1-john@metanate.com/


Thanks,
John
Steven Rostedt Dec. 7, 2021, 2:03 p.m. UTC | #2
On Tue, 7 Dec 2021 10:31:15 +0000
John Keeping <john@metanate.com> wrote:

> Were you expecting more input from me on this?  The above sounded like
> "will be added for 5.16" but I don't see this change in v5.16-rc4 and
> the patch is still marked as "New" in patchwork [1]
> 
> [1] https://patchwork.kernel.org/project/linux-trace-devel/patch/20210909115734.3818711-1-john@metanate.com/

And that explains why I didn't add it :-(

The linux-trace-devel is mainly for user space tools, although some kernel
patches end up there too. Any patch that hits my inbox without Cc'ing
linux-trace-devel falls into my internal patchwork. I look at that one
first for kernel patches to add to my queue, and then I'll look at that
public one. I forgot about this conversation, and because the title of that
patch has RFC in it, I thought there would be another patch coming. I don't
add RFC patches generally, and forgot about this conversation.

Can you resend the patch to me, but remove the linux-trace-devel mailing
list (but still include linux-kernel).

Thanks!

-- Steve
diff mbox series

Patch

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2dbf797aa133..251679846a1b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4834,6 +4834,12 @@  int tracing_open_generic_tr(struct inode *inode, struct file *filp)
 	return 0;
 }
 
+static int tracing_mark_open(struct inode *inode, struct file *filp)
+{
+	stream_open(inode, filp);
+	return tracing_open_generic_tr(inode, filp);
+}
+
 static int tracing_release(struct inode *inode, struct file *file)
 {
 	struct trace_array *tr = inode->i_private;
@@ -7101,9 +7107,6 @@  tracing_mark_write(struct file *filp, const char __user *ubuf,
 	if (tt)
 		event_triggers_post_call(tr->trace_marker_file, tt);
 
-	if (written > 0)
-		*fpos += written;
-
 	return written;
 }
 
@@ -7162,9 +7165,6 @@  tracing_mark_raw_write(struct file *filp, const char __user *ubuf,
 
 	__buffer_unlock_commit(buffer, event);
 
-	if (written > 0)
-		*fpos += written;
-
 	return written;
 }
 
@@ -7564,16 +7564,14 @@  static const struct file_operations tracing_free_buffer_fops = {
 };
 
 static const struct file_operations tracing_mark_fops = {
-	.open		= tracing_open_generic_tr,
+	.open		= tracing_mark_open,
 	.write		= tracing_mark_write,
-	.llseek		= generic_file_llseek,
 	.release	= tracing_release_generic_tr,
 };
 
 static const struct file_operations tracing_mark_raw_fops = {
-	.open		= tracing_open_generic_tr,
+	.open		= tracing_mark_open,
 	.write		= tracing_mark_raw_write,
-	.llseek		= generic_file_llseek,
 	.release	= tracing_release_generic_tr,
 };