Message ID | 20210121212718.2441-9-andrew.cooper3@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Implement support for external IPT monitoring | expand |
Andrew Cooper writes ("[PATCH v7 08/10] tools/misc: Add xen-vmtrace tool"): > From: Michał Leszczyński <michal.leszczynski@cert.pl> ... > + if ( signal(SIGINT, int_handler) == SIG_ERR ) > + err(1, "Failed to register signal handler\n"); How bad is it if this signal handler is not effective ? > + if ( xc_vmtrace_disable(xch, domid, vcpu) ) > + perror("xc_vmtrace_disable()"); I guess the tracing will remain on, pointlessly, which has a perf impact but nothing else ? How is it possible for the user to clean this up ? Also: at the very least, you need to trap SIGTERM SIGHUP SIGPIPE. It would be good to exit with the right signal by re-raising it. > +static volatile int interrupted = 0; sig_atomic_t Ian.
On 22/01/2021 15:33, Ian Jackson wrote: > Andrew Cooper writes ("[PATCH v7 08/10] tools/misc: Add xen-vmtrace tool"): >> From: Michał Leszczyński <michal.leszczynski@cert.pl> > ... >> + if ( signal(SIGINT, int_handler) == SIG_ERR ) >> + err(1, "Failed to register signal handler\n"); > How bad is it if this signal handler is not effective ? I believe far less so now that I've fixed up everything to use a (fixed) XENMEM_acquire_resource, so Xen doesn't crash if this process dies in the wrong order WRT the domain shutting down. But I would have to defer to Michał on that. >> + if ( xc_vmtrace_disable(xch, domid, vcpu) ) >> + perror("xc_vmtrace_disable()"); > I guess the tracing will remain on, pointlessly, which has a perf > impact but nothing else ? The perf hit is substantial, but it is safe to leave enabled. > How is it possible for the user to clean this up ? For now, enable/disable can only fail with -EINVAL for calls made in the wrong context, so a failure here is benign in practice. I specifically didn't opt for reference counting the enable/disable calls, because there cannot (usefully) be two users of this interface. > > Also: at the very least, you need to trap SIGTERM SIGHUP SIGPIPE. > > It would be good to exit with the right signal by re-raising it. This is example code, not a production utility. Anything more production-wise using this needs to account for the fact that Intel Processor Trace can't pause on a full buffer. (It ought to be able to on forthcoming hardware, but this facility isn't available yet.) The use-cases thus far are always "small delta of execution between introspection events", using a massive buffer as the mitigation for hardware wrapping. No amount of additional code here can prevent stream corruption problems with the buffer wrapping. As a result, it is kept as simple as possible as a demonstration of how to use the API. ~Andrew
Andrew Cooper writes ("Re: [PATCH v7 08/10] tools/misc: Add xen-vmtrace tool"): > On 22/01/2021 15:33, Ian Jackson wrote: > > Andrew Cooper writes ("[PATCH v7 08/10] tools/misc: Add xen-vmtrace tool"): > >> + if ( xc_vmtrace_disable(xch, domid, vcpu) ) > >> + perror("xc_vmtrace_disable()"); > > I guess the tracing will remain on, pointlessly, which has a perf > > impact but nothing else ? > > The perf hit is substantial, but it is safe to leave enabled. And how does one clean up after an unclean exit of this program ? > > How is it possible for the user to clean this up ? > > For now, enable/disable can only fail with -EINVAL for calls made in the > wrong context, so a failure here is benign in practice. > > I specifically didn't opt for reference counting the enable/disable > calls, because there cannot (usefully) be two users of this interface. Right. > > Also: at the very least, you need to trap SIGTERM SIGHUP SIGPIPE. > > > > It would be good to exit with the right signal by re-raising it. > > This is example code, not a production utility. Perhaps the utility could print some kind of health warning about it possibly leaving this perf-impacting feature enabled, and how to clean up ? Ian.
On 26/01/2021 11:59, Ian Jackson wrote: >>> Also: at the very least, you need to trap SIGTERM SIGHUP SIGPIPE. >>> >>> It would be good to exit with the right signal by re-raising it. >> This is example code, not a production utility. > Perhaps the utility could print some kind of health warning about it > possibly leaving this perf-impacting feature enabled, and how to clean > up ? Why? The theoretical fallout here is not conceptually different to libxl or qemu segfaulting, or any of the myriad other random utilities we have. Printing "Warning - this program, just like everything else in the Xen tree, might in exceptional circumstances segfault and leave the domain in a weird state" is obvious, and doesn't need stating. The domain is stuffed. `xl destroy` may or may not make the problem go away. ~Andrew
Andrew Cooper writes ("Re: [PATCH v7 08/10] tools/misc: Add xen-vmtrace tool"): > On 26/01/2021 11:59, Ian Jackson wrote: > >[Andrew:] > >> This is example code, not a production utility. > > Perhaps the utility could print some kind of health warning about it > > possibly leaving this perf-impacting feature enabled, and how to clean > > up ? > > Why? The theoretical fallout here is not conceptually different to > libxl or qemu segfaulting, or any of the myriad other random utilities > we have. > > Printing "Warning - this program, just like everything else in the Xen > tree, might in exceptional circumstances segfault and leave the domain > in a weird state" is obvious, and doesn't need stating. > > The domain is stuffed. `xl destroy` may or may not make the problem go away. Firstly, I don't agree with this pessimistic analysis of our current tooling. Secondly, I would consider many such behaviours bugs; certainly we have bugs but we shouldn't introduce more of them. You are justifying the poor robustness of this tool on the grounds that it's "example code, not a production utility". But we are shipping it to bin/ and there is nothing telling anyone that trying to use it (perhaps wrapped in something of their own devising) is a bad idea. Either this is code users might be expected to run in production in which we need to make it at least have a minimal level of engineering robustness (which is perhaps too difficult at this stage), or we need to communicate to our users that it's a programming example, not a useful utility. Note that *even if it is a programming example*, we should highlight its most important deficiencies. Otherwise it is a hazardously misleading example. I hope that makes sense. Thanks, Ian.
On 26/01/2021 13:32, Ian Jackson wrote: > Andrew Cooper writes ("Re: [PATCH v7 08/10] tools/misc: Add xen-vmtrace tool"): >> On 26/01/2021 11:59, Ian Jackson wrote: >>> [Andrew:] >>>> This is example code, not a production utility. >>> Perhaps the utility could print some kind of health warning about it >>> possibly leaving this perf-impacting feature enabled, and how to clean >>> up ? >> Why? The theoretical fallout here is not conceptually different to >> libxl or qemu segfaulting, or any of the myriad other random utilities >> we have. >> >> Printing "Warning - this program, just like everything else in the Xen >> tree, might in exceptional circumstances segfault and leave the domain >> in a weird state" is obvious, and doesn't need stating. >> >> The domain is stuffed. `xl destroy` may or may not make the problem go away. > Firstly, I don't agree with this pessimistic analysis of our current > tooling. Secondly, I would consider many such behaviours bugs; > certainly we have bugs but we shouldn't introduce more of them. > > You are justifying the poor robustness of this tool on the grounds > that it's "example code, not a production utility". > > But we are shipping it to bin/ and there is nothing telling anyone > that trying to use it (perhaps wrapped in something of their own > devising) is a bad idea. > > Either this is code users might be expected to run in production in > which we need to make it at least have a minimal level of engineering > robustness (which is perhaps too difficult at this stage), or we need > to communicate to our users that it's a programming example, not a > useful utility. > > Note that *even if it is a programming example*, we should highlight > its most important deficiencies. Otherwise it is a hazardously > misleading example. > > I hope that makes sense. First of all - I'm not the author of this code. I'm merely the person who did the latest round of cleanup, and sent the series. This code is a damn sight more robust than the other utilities, because in the case that something does go wrong, the domain will still function correctly. Almost everything else, qemu included, will leave the VM totally broken, as it becomes paused waiting on a request which has escaped into the ether. I also don't feel that now is an appropriate time to be insisting that the goalpost's move when it comes to submissions into tools/, particularly as you were happy (well - didn't comment on) with this pattern back in v3. What exact colour do you want this bikeshed? Anything non-trivial is going to miss 4.15. ~Andrew
diff --git a/tools/misc/.gitignore b/tools/misc/.gitignore index b2c3b21f57..ce6f937d0c 100644 --- a/tools/misc/.gitignore +++ b/tools/misc/.gitignore @@ -1,3 +1,4 @@ xen-access xen-memshare xen-ucode +xen-vmtrace diff --git a/tools/misc/Makefile b/tools/misc/Makefile index 912c5d4f0e..c5017e6c70 100644 --- a/tools/misc/Makefile +++ b/tools/misc/Makefile @@ -25,6 +25,7 @@ INSTALL_SBIN-$(CONFIG_X86) += xen-lowmemd INSTALL_SBIN-$(CONFIG_X86) += xen-memshare INSTALL_SBIN-$(CONFIG_X86) += xen-mfndump INSTALL_SBIN-$(CONFIG_X86) += xen-ucode +INSTALL_SBIN-$(CONFIG_X86) += xen-vmtrace INSTALL_SBIN += xencov INSTALL_SBIN += xenhypfs INSTALL_SBIN += xenlockprof @@ -90,6 +91,9 @@ xen-hvmcrash: xen-hvmcrash.o xen-memshare: xen-memshare.o $(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS) +xen-vmtrace: xen-vmtrace.o + $(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(LDLIBS_libxenforeignmemory) $(APPEND_LDFLAGS) + xenperf: xenperf.o $(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS) diff --git a/tools/misc/xen-vmtrace.c b/tools/misc/xen-vmtrace.c new file mode 100644 index 0000000000..47fea871cf --- /dev/null +++ b/tools/misc/xen-vmtrace.c @@ -0,0 +1,154 @@ +/****************************************************************************** + * tools/vmtrace.c + * + * Demonstrative tool for collecting Intel Processor Trace data from Xen. + * Could be used to externally monitor a given vCPU in given DomU. + * + * Copyright (C) 2020 by CERT Polska - NASK PIB + * + * Authors: Michał Leszczyński, michal.leszczynski@cert.pl + * Date: June, 2020 + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; under version 2 of the License. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; If not, see <http://www.gnu.org/licenses/>. + */ + +#include <err.h> +#include <errno.h> +#include <signal.h> +#include <stdio.h> +#include <stdlib.h> +#include <sys/mman.h> + +#include <xenctrl.h> +#include <xenforeignmemory.h> + +#define MSR_RTIT_CTL 0x00000570 +#define RTIT_CTL_OS (1 << 2) +#define RTIT_CTL_USR (1 << 3) +#define RTIT_CTL_BRANCH_EN (1 << 13) + +static volatile int interrupted = 0; + +static xc_interface *xch; +static xenforeignmemory_handle *fh; + +void int_handler(int signum) +{ + interrupted = 1; +} + +int main(int argc, char **argv) +{ + uint32_t domid, vcpu; + int rc, exit = 1; + size_t size; + char *buf = NULL; + xenforeignmemory_resource_handle *fres = NULL; + uint64_t last_offset = 0; + + if ( signal(SIGINT, int_handler) == SIG_ERR ) + err(1, "Failed to register signal handler\n"); + + if ( argc != 3 ) + { + fprintf(stderr, "Usage: %s <domid> <vcpu_id>\n", argv[0]); + fprintf(stderr, "It's recommended to redirect thisprogram's output to file\n"); + fprintf(stderr, "or to pipe it's output to xxd or other program.\n"); + return 1; + } + + domid = atoi(argv[1]); + vcpu = atoi(argv[2]); + + xch = xc_interface_open(NULL, NULL, 0); + fh = xenforeignmemory_open(NULL, 0); + + if ( !xch ) + err(1, "xc_interface_open()"); + if ( !fh ) + err(1, "xenforeignmemory_open()"); + + rc = xenforeignmemory_resource_size( + fh, domid, XENMEM_resource_vmtrace_buf, vcpu, &size); + if ( rc ) + err(1, "xenforeignmemory_resource_size()"); + + fres = xenforeignmemory_map_resource( + fh, domid, XENMEM_resource_vmtrace_buf, vcpu, + 0, size >> XC_PAGE_SHIFT, (void **)&buf, PROT_READ, 0); + if ( !fres ) + err(1, "xenforeignmemory_map_resource()"); + + if ( xc_vmtrace_set_option( + xch, domid, vcpu, MSR_RTIT_CTL, + RTIT_CTL_BRANCH_EN | RTIT_CTL_USR | RTIT_CTL_OS) ) + { + perror("xc_vmtrace_set_option()"); + goto out; + } + + if ( xc_vmtrace_enable(xch, domid, vcpu) ) + { + perror("xc_vmtrace_enable()"); + goto out; + } + + while ( !interrupted ) + { + xc_dominfo_t dominfo; + uint64_t offset; + + if ( xc_vmtrace_output_position(xch, domid, vcpu, &offset) ) + { + perror("xc_vmtrace_output_position()"); + goto out; + } + + if ( offset > last_offset ) + fwrite(buf + last_offset, offset - last_offset, 1, stdout); + else if ( offset < last_offset ) + { + /* buffer wrapped */ + fwrite(buf + last_offset, size - last_offset, 1, stdout); + fwrite(buf, offset, 1, stdout); + } + + last_offset = offset; + usleep(1000 * 100); + + if ( xc_domain_getinfo(xch, domid, 1, &dominfo) != 1 || + dominfo.domid != domid || dominfo.shutdown ) + break; + } + + exit = 0; + + out: + if ( xc_vmtrace_disable(xch, domid, vcpu) ) + perror("xc_vmtrace_disable()"); + + if ( fres && xenforeignmemory_unmap_resource(fh, fres) ) + perror("xenforeignmemory_unmap_resource()"); + + return exit; +} + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */