Message ID | 20231214221803.1a923e10@gandalf.local.home (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | ring-buffer: Do not try to put back write_stamp | expand |
On Thu, 14 Dec 2023 22:18:03 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > From: "Steven Rostedt (Google)" <rostedt@goodmis.org> > > If an update to an event is interrupted by another event between the time > the initial event allocated its buffer and where it wrote to the > write_stamp, the code try to reset the write stamp back to the what it had > just overwritten. It knows that it was overwritten via checking the > before_stamp, and if it didn't match what it wrote to the before_stamp > before it allocated its space, it knows it was overwritten. > > To put back the write_stamp, it uses the before_stamp it read. The problem > here is that by writing the before_stamp to the write_stamp it makes the > two equal again, which means that the write_stamp can be considered valid > as the last timestamp written to the ring buffer. But this is not > necessarily true. The event that interrupted the event could have been > interrupted in a way that it was interrupted as well, and can end up > leaving with an invalid write_stamp. But if this happens and returns to > this context that uses the before_stamp to update the write_stamp again, > it can possibly incorrectly make it valid, causing later events to have in > correct time stamps. > > As it is OK to leave this function with an invalid write_stamp (one that > doesn't match the before_stamp), there's no reason to try to make it valid > again in this case. If this race happens, then just leave with the invalid > write_stamp and the next event to come along will just add a absolute > timestamp and validate everything again. > > Bonus points: This gets rid of another cmpxchg64! > With this patch and https://lore.kernel.org/linux-trace-kernel/20231211114420.36dde01b@gandalf.local.home I remove two critical uses of cmpxchg64, leaving only two other use cases. Both of which are pretty much optional! One is used in the discard path, which I plan on getting rid of anyway, and if the cmpxchg64 fails there, it just doesn't discard the event but converts it to padding. The other location is in another slow path were it tries to get a more accurate timestamp. If that cmpxchg64 fails, it just uses the timestamp of the previous event that interrupted it. Which is what the code use to do before all this timestamp magic was done. That is, I can still get rid of the 32-bit cmpxchg64 logic and instead just replace it with: #ifdef CONFIG_ARCH_DOES_NOT_HAVE_CMPXCHG64 /* I made this up, but you get the gist */ static bool rb_time_cmpxchg(rb_time_t *t, u64 expect, u64 set) { return false; } #else static bool rb_time_cmpxchg(rb_time_t *t, u64 expect, u64 set) { return local64_try_cmpxchg(&t->time, &expect, set); } #endif And that makes things so much easier! Again, after this and the other patch applied, the cmpxcg64 is just a nice-to-have and not required for the ring buffer to work. -- Steve
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 9fdbd08af72f..378ff9e53fe6 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -3418,12 +3418,14 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer, } if (likely(tail == w)) { - u64 save_before; - /* Nothing interrupted us between A and C */ /*D*/ rb_time_set(&cpu_buffer->write_stamp, info->ts); - barrier(); - /*E*/ rb_time_read(&cpu_buffer->before_stamp, &save_before); + /* + * If something came in between C and D, the write stamp + * may now not be in sync. But that's fine as the before_stamp + * will be different and then next event will just be forced + * to use an absolute timestamp. + */ if (likely(!(info->add_timestamp & (RB_ADD_STAMP_FORCE | RB_ADD_STAMP_ABSOLUTE)))) /* This did not interrupt any time update */ @@ -3431,23 +3433,7 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer, else /* Just use full timestamp for interrupting event */ info->delta = info->ts; - barrier(); check_buffer(cpu_buffer, info, tail); - if (unlikely(info->ts != save_before)) { - /* SLOW PATH - Interrupted between C and E */ - - rb_time_read(&cpu_buffer->write_stamp, &info->after); - - /* Write stamp must only go forward */ - if (save_before > info->after) { - /* - * We do not care about the result, only that - * it gets updated atomically. - */ - (void)rb_time_cmpxchg(&cpu_buffer->write_stamp, - info->after, save_before); - } - } } else { u64 ts; /* SLOW PATH - Interrupted between A and C */