Message ID | 20211216173511.10390-1-beaub@linux.microsoft.com (mailing list archive) |
---|---|
Series | user_events: Enable user processes to create and write to trace events |
* Beau Belgrave | 2021-12-16 09:34:59 [-0800]:

>The typical scenario is on process start to mmap user_events_status. Processes
>then register the events they plan to use via the REG ioctl. The ioctl reads
>and updates the passed in user_reg struct. The status_index of the struct is
>used to know the byte in the status page to check for that event. The
>write_index of the struct is used to describe that event when writing out to
>the fd that was used for the ioctl call. The data must always include this
>index first when writing out data for an event. Data can be written either by
>write() or by writev().

Hey Beau, a little bit late to the party. A few questions from my side: What
are the exact weak points of USDT compared to User Events that stand in the
way of further extending USDT (in a non-compatible way, sure, just as a
different approach!)? The nice thing about USDT is that I can search for all
possible probes of the system via "find / | readelf | ". Since they are listed
in a dedicated ELF section (.note.stapsdt), they are visible & transparent. I
can also map a hierarchy/structure in Executable/DSO via clever choice of
names. The big disadvantage of USDT is the lack of type information, but from
an explicit-registration point of view, they are nice.

Or in other words: why not extend the USDT approach? Why not

	u32 val = 23;
	const char *garbage = "tracestring";

	DYNAMIC_TRACE_PROBE2("foo:bar", val, u32, garbage, cstring);

Sure, the argument names, here "val" and "garbage", should also be saved. I
also like the "just one additional header to the project to get things
running" (#include "sdt.h"). Sure, a DYNAMIC_TRACE_IS_ACTIVE("foo:bar") would
be great. But in fact we have never needed that in the past.

hgn
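The write path in the quoted description (write_index first, then the event payload, sent with write() or writev()) can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not code from the series: `write_event` is a hypothetical helper, the index is assumed to be a 32-bit value, and the fd is assumed to be the one the REG ioctl was issued on.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch series): send one event as
 * [write_index][payload] in a single writev() call.
 * Assumption: write_index is the 32-bit value the REG ioctl stored in
 * the user_reg struct, and fd is the fd that ioctl was issued on.
 */
static ssize_t write_event(int fd, uint32_t write_index,
                           const void *payload, size_t len)
{
    struct iovec io[2];

    io[0].iov_base = &write_index;      /* the index must come first */
    io[0].iov_len = sizeof(write_index);
    io[1].iov_base = (void *)payload;   /* event data follows the index */
    io[1].iov_len = len;

    /* Returns the number of bytes written, or -1 with errno set. */
    return writev(fd, io, 2);
}
```

Using writev() here avoids copying the index and payload into one contiguous buffer; a single write() of a pre-packed buffer would be the equivalent alternative mentioned in the description.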
On Mon, Apr 18, 2022 at 10:43:29PM +0200, Hagen Paul Pfeifer wrote:
> * Beau Belgrave | 2021-12-16 09:34:59 [-0800]:
>
> >The typical scenario is on process start to mmap user_events_status. Processes
> >then register the events they plan to use via the REG ioctl. The ioctl reads
> >and updates the passed in user_reg struct. The status_index of the struct is
> >used to know the byte in the status page to check for that event. The
> >write_index of the struct is used to describe that event when writing out to
> >the fd that was used for the ioctl call. The data must always include this
> >index first when writing out data for an event. Data can be written either by
> >write() or by writev().
>
> Hey Beau, a little bit late to the party. A few questions from my side: What
> are the exact weak points of USDT compared to User Events that stand in the
> way of further extending USDT (in a non-compatible way, sure, just as a
> different approach!)? The nice thing about USDT is that I can search for all
> possible probes of the system via "find / | readelf | ". Since they are listed
> in a dedicated ELF section (.note.stapsdt), they are visible & transparent. I
> can also map a hierarchy/structure in Executable/DSO via clever choice of
> names. The big disadvantage of USDT is the lack of type information, but from
> an explicit-registration point of view, they are nice.
>
> Or in other words: why not extend the USDT approach? Why not
>
> 	u32 val = 23;
> 	const char *garbage = "tracestring";
>
> 	DYNAMIC_TRACE_PROBE2("foo:bar", val, u32, garbage, cstring);
>

We actually tried some USDT extension methods early on, by extending the
.note.stapsdt sections and seeing how far we could get our definitions into
that form.

There are a few problems when running in a heavily containerized/cgroup
environment, even if you can get our formats into stapsdt.

It costs a lot to traverse every ELF file on the machine to find all the notes.
When profiling or tracing many containers, each cgroup's mount space must be
entered and then tracked.

Since these files are in different locations, they each need a separate probe
definition, because the definitions/patches are tied to the location of the
binary to patch. As new cgroups come online, we would have to keep track of
each new binary location and find probes that match their location. This
becomes really hard to manage if, for example, we just want to always enable a
specific event regardless of where it is on the filesystem.

Events are limited to a max of 2^16, so having many duplicate events in the
system might start to approach that limit on high-core machines with many
small cgroup isolations.

We run programs that are built on interpreted or JIT'd code (C#, JavaScript,
etc.). These don't have great places to put a stap definition, since they
aren't ELF files. I've seen approaches where temporary ELF files are
generated; however, this costs a lot. Now we have even more temporary files to
go patch, meaning more events and more probe definitions (many of them in our
case would be duplicates of the others).

Our production environments are locked down heavily with both SELinux and IPE
enabled. This prevents us from patching user-mode code on the fly; the typical
perf probe calls fail here.

We typically want to know what events are available to us with very little
overhead. With programs registering to an already well-known location
(trace_events, tracefs), I can easily see all the user events on the system by
just doing ls on /sys/kernel/tracing/events/user_events. I can also see all
their data formats and easily enable hist and filtering, since these formats
are known to the kernel.

In our testing, uprobes are much more costly to the running program than the
write syscall.

For managed code, as in Java, code moves around and is not always in a static
location. The probe locations can change, etc.
Calling from a managed location into a native one has performance implications
as well when using a dynamic/temp ELF stub approach.

We are actively using user_events to solve these problems in our environments,
which have previously seen high overheads to achieve the same results.

Many times we cannot afford to miss any events, so live scanning for new ELF
files doesn't work for us, as the programs and cgroups are short-lived.

>
> Sure, the argument names, here "val" and "garbage", should also be saved. I
> also like the "just one additional header to the project to get things
> running" (#include "sdt.h"). Sure, a DYNAMIC_TRACE_IS_ACTIVE("foo:bar") would
> be great. But in fact we have never needed that in the past.
>
>
> hgn

Thanks,
-Beau
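The discovery step mentioned in the reply (doing ls on /sys/kernel/tracing/events/user_events) could be done from a program along these lines. This is a minimal sketch, assuming the usual tracefs layout in which each event is a subdirectory while control files such as "enable" are regular files; `list_events` is a hypothetical helper name, not something defined by the series.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: print the name of every event under events_dir
 * and return how many were found, or -1 if the directory cannot be
 * opened. Assumption: each event is a subdirectory of events_dir.
 */
static int list_events(const char *events_dir)
{
    DIR *dir = opendir(events_dir);
    struct dirent *ent;
    struct stat st;
    int count = 0;

    if (!dir)
        return -1;

    while ((ent = readdir(dir)) != NULL) {
        if (!strcmp(ent->d_name, ".") || !strcmp(ent->d_name, ".."))
            continue;
        /* Only subdirectories are counted; plain files are skipped. */
        if (fstatat(dirfd(dir), ent->d_name, &st, 0) != 0 ||
            !S_ISDIR(st.st_mode))
            continue;
        printf("%s\n", ent->d_name);
        count++;
    }

    closedir(dir);
    return count;
}
```

A caller would pass "/sys/kernel/tracing/events/user_events" as `events_dir`; no ELF files or mount namespaces need to be visited, which is the low-overhead property the reply argues for.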