diff mbox series

[4/8] KVM: selftests: Copy printf.c to KVM selftests

Message ID 20230301053425.3880773-5-aaronlewis@google.com (mailing list archive)
State New, archived
Headers show
Series Add printf and formatted asserts in the guest | expand

Commit Message

Aaron Lewis March 1, 2023, 5:34 a.m. UTC
Add a local version of vsprintf() for the guest to use.

The file printf.c was lifted from arch/x86/boot/printf.c.

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
---
 tools/testing/selftests/kvm/lib/printf.c | 307 +++++++++++++++++++++++
 1 file changed, 307 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/lib/printf.c

Comments

Sean Christopherson March 23, 2023, 10:04 p.m. UTC | #1
On Wed, Mar 01, 2023, Aaron Lewis wrote:
> Add a local version of vsprintf() for the guest to use.
> 
> The file printf.c was lifted from arch/x86/boot/printf.c.

Is there by any shance a version of 
> +/*
> + * Oh, it's a waste of space, but oh-so-yummy for debugging.  This
> + * version of printf() does not include 64-bit support.  "Live with

But selftests are 64-bit only, at least on x86.

> +static char *number(char *str, long num, int base, int size, int precision,
> +		    int type)

Do we actually need a custom number()?  I.e. can we sub in a libc equivalent?
That would reduce the craziness of this file by more than a few degrees.
Aaron Lewis April 17, 2023, 3:59 p.m. UTC | #2
On Thu, Mar 23, 2023, Sean Christopherson wrote:
> On Wed, Mar 01, 2023, Aaron Lewis wrote:
> > Add a local version of vsprintf() for the guest to use.
> > 
> > The file printf.c was lifted from arch/x86/boot/printf.c.
> 
> Is there by any shance a version of 
> > +/*
> > + * Oh, it's a waste of space, but oh-so-yummy for debugging.  This
> > + * version of printf() does not include 64-bit support.  "Live with
> 
> But selftests are 64-bit only, at least on x86.
> 

I think that's a legacy comment.  AFAICT this code supports 64-bit values.
I'll remove the comment to avoid confusion.

> > +static char *number(char *str, long num, int base, int size, int precision,
> > +		    int type)
> 
> Do we actually need a custom number()?  I.e. can we sub in a libc equivalent?
> That would reduce the craziness of this file by more than a few degrees.
> 

Yeah, I think we need it.  One of the biggest problems I'm trying to avoid
here is the use of LIBC in a guest.  Using it always seems to end poorly
because guests generally don't set up AVX-512 or a TLS segmet, nor should
they have to.  Calling into LIBC seems to require both of them too often,
so it seems like it's better to just avoid it.

Also, in working on v2 I upgraded vsprintf() to vsnprintf() which required
custom changes to number().
Sean Christopherson April 18, 2023, 3:03 p.m. UTC | #3
On Mon, Apr 17, 2023, Aaron Lewis wrote:
> On Thu, Mar 23, 2023, Sean Christopherson wrote:
> > > +static char *number(char *str, long num, int base, int size, int precision,
> > > +		    int type)
> > 
> > Do we actually need a custom number()?  I.e. can we sub in a libc equivalent?
> > That would reduce the craziness of this file by more than a few degrees.
> 
> Yeah, I think we need it.  One of the biggest problems I'm trying to avoid
> here is the use of LIBC in a guest.  Using it always seems to end poorly
> because guests generally don't set up AVX-512 or a TLS segmet, nor should
> they have to.  Calling into LIBC seems to require both of them too often,
> so it seems like it's better to just avoid it.

True, we'd probably end up in a world of hurt.

I was going to suggest copy+pasting from a different source, e.g. musl, in the
hopes of reducing the crazy by a degree, but after looking at the musl source,
that's a terrible idea :-)

And copying from the kernel has the advantage of keeping any bugs/quirks that
users may be expecting and/or relying on.
Andrew Jones April 18, 2023, 4:06 p.m. UTC | #4
On Tue, Apr 18, 2023 at 08:03:53AM -0700, Sean Christopherson wrote:
> On Mon, Apr 17, 2023, Aaron Lewis wrote:
> > On Thu, Mar 23, 2023, Sean Christopherson wrote:
> > > > +static char *number(char *str, long num, int base, int size, int precision,
> > > > +		    int type)
> > > 
> > > Do we actually need a custom number()?  I.e. can we sub in a libc equivalent?
> > > That would reduce the craziness of this file by more than a few degrees.
> > 
> > Yeah, I think we need it.  One of the biggest problems I'm trying to avoid
> > here is the use of LIBC in a guest.  Using it always seems to end poorly
> > because guests generally don't set up AVX-512 or a TLS segmet, nor should
> > they have to.  Calling into LIBC seems to require both of them too often,
> > so it seems like it's better to just avoid it.
> 
> True, we'd probably end up in a world of hurt.
> 
> I was going to suggest copy+pasting from a different source, e.g. musl, in the
> hopes of reducing the crazy by a degree, but after looking at the musl source,
> that's a terrible idea :-)
> 
> And copying from the kernel has the advantage of keeping any bugs/quirks that
> users may be expecting and/or relying on.

What about trying to use tools/include/nolibc/? Maybe we could provide our
own tools/include/nolibc/arch-*.h files where the my_syscall* macros get
implemented with ucalls, and then the ucalls would implement the syscalls,
possibly just forwarding the parameters to real syscalls. We can implement
copy_from/to_guest() functions to deal with pointer parameters.

Thanks,
drew
Sean Christopherson April 20, 2023, 5:50 p.m. UTC | #5
On Tue, Apr 18, 2023, Andrew Jones wrote:
> On Tue, Apr 18, 2023 at 08:03:53AM -0700, Sean Christopherson wrote:
> > On Mon, Apr 17, 2023, Aaron Lewis wrote:
> > > On Thu, Mar 23, 2023, Sean Christopherson wrote:
> > > > > +static char *number(char *str, long num, int base, int size, int precision,
> > > > > +		    int type)
> > > > 
> > > > Do we actually need a custom number()?  I.e. can we sub in a libc equivalent?
> > > > That would reduce the craziness of this file by more than a few degrees.
> > > 
> > > Yeah, I think we need it.  One of the biggest problems I'm trying to avoid
> > > here is the use of LIBC in a guest.  Using it always seems to end poorly
> > > because guests generally don't set up AVX-512 or a TLS segmet, nor should
> > > they have to.  Calling into LIBC seems to require both of them too often,
> > > so it seems like it's better to just avoid it.
> > 
> > True, we'd probably end up in a world of hurt.
> > 
> > I was going to suggest copy+pasting from a different source, e.g. musl, in the
> > hopes of reducing the crazy by a degree, but after looking at the musl source,
> > that's a terrible idea :-)
> > 
> > And copying from the kernel has the advantage of keeping any bugs/quirks that
> > users may be expecting and/or relying on.
> 
> What about trying to use tools/include/nolibc/? Maybe we could provide our
> own tools/include/nolibc/arch-*.h files where the my_syscall* macros get
> implemented with ucalls, and then the ucalls would implement the syscalls,
> possibly just forwarding the parameters to real syscalls. We can implement
> copy_from/to_guest() functions to deal with pointer parameters.

Hmm, I was going to say that pulling in nolibc would conflict with the host side's
need for an actual libc, but I think we could solve that conundrum by putting
ucall_fmt() in a dedicated file and compiling it separately, a la string_override.c.

However, I don't think we'd want to override my_syscall to do a ucall.  If I'm
reading the code correctly, that would trigger a ucall after every escape sequence,
which isn't what we want, expecially for a GUEST_ASSERT.

That's solvable by having my_syscall3() be a memcpy() to the buffer provided by
KVM's guest-side vsprintf(), but that still leaves the question of whether or not
taking a dependency on nolibc.h would be a net positive.

E.g. pulling in nolibc.h as-is would well and truly put ucall_fmt.c (or whatever
it's called) into its own world, e.g. it would end up with different typedefs for
all basic types.  Those shouldn't cause problems, but it'd be a weird setup.  And
I don't think we can rule out the possibility of the nolibc dependency causing
subtle problems, e.g. what will happen when linking due to both nolibc and
string_override.c defining globally visible mem{cmp,cpy,set}() functions.

Another minor issue is that nolibc's vfprintf() handles a subset of escapes compared
to the kernel's vsprintf().  That probably won't be a big deal in practice, but it's
again a potential maintenance concern for us in the future.

I'm definitely torn.  As much as I dislike the idea of copy+pasting mode code into
KVM selftests, I think pulling in nolibc would bring its own set of problems.

My vote is probably to copy+paste, at least for the initial implementation.  Being
able to do the equivalent of printf() in the guest would be a huge improvement for
debugging and triaging selftests, i.e. is something I would like to see landed
fairly quickly.  Copy+pasting seems like it gives us the fastest path forward,
e.g. doesn't risk getting bogged down with weird linker errors and whatnot.
Andrew Jones April 21, 2023, 6:03 a.m. UTC | #6
On Thu, Apr 20, 2023 at 10:50:07AM -0700, Sean Christopherson wrote:
> On Tue, Apr 18, 2023, Andrew Jones wrote:
> > On Tue, Apr 18, 2023 at 08:03:53AM -0700, Sean Christopherson wrote:
> > > On Mon, Apr 17, 2023, Aaron Lewis wrote:
> > > > On Thu, Mar 23, 2023, Sean Christopherson wrote:
> > > > > > +static char *number(char *str, long num, int base, int size, int precision,
> > > > > > +		    int type)
> > > > > 
> > > > > Do we actually need a custom number()?  I.e. can we sub in a libc equivalent?
> > > > > That would reduce the craziness of this file by more than a few degrees.
> > > > 
> > > > Yeah, I think we need it.  One of the biggest problems I'm trying to avoid
> > > > here is the use of LIBC in a guest.  Using it always seems to end poorly
> > > > because guests generally don't set up AVX-512 or a TLS segmet, nor should
> > > > they have to.  Calling into LIBC seems to require both of them too often,
> > > > so it seems like it's better to just avoid it.
> > > 
> > > True, we'd probably end up in a world of hurt.
> > > 
> > > I was going to suggest copy+pasting from a different source, e.g. musl, in the
> > > hopes of reducing the crazy by a degree, but after looking at the musl source,
> > > that's a terrible idea :-)
> > > 
> > > And copying from the kernel has the advantage of keeping any bugs/quirks that
> > > users may be expecting and/or relying on.
> > 
> > What about trying to use tools/include/nolibc/? Maybe we could provide our
> > own tools/include/nolibc/arch-*.h files where the my_syscall* macros get
> > implemented with ucalls, and then the ucalls would implement the syscalls,
> > possibly just forwarding the parameters to real syscalls. We can implement
> > copy_from/to_guest() functions to deal with pointer parameters.
> 
> Hmm, I was going to say that pulling in nolibc would conflict with the host side's
> need for an actual libc, but I think we could solve that conundrum by putting
> ucall_fmt() in a dedicated file and compiling it separately, a la string_override.c.
> 
> However, I don't think we'd want to override my_syscall to do a ucall.  If I'm
> reading the code correctly, that would trigger a ucall after every escape sequence,
> which isn't what we want, expecially for a GUEST_ASSERT.
> 
> That's solvable by having my_syscall3() be a memcpy() to the buffer provided by
> KVM's guest-side vsprintf(), but that still leaves the question of whether or not
> taking a dependency on nolibc.h would be a net positive.
> 
> E.g. pulling in nolibc.h as-is would well and truly put ucall_fmt.c (or whatever
> it's called) into its own world, e.g. it would end up with different typedefs for
> all basic types.  Those shouldn't cause problems, but it'd be a weird setup.  And
> I don't think we can rule out the possibility of the nolibc dependency causing
> subtle problems, e.g. what will happen when linking due to both nolibc and
> string_override.c defining globally visible mem{cmp,cpy,set}() functions.
> 
> Another minor issue is that nolibc's vfprintf() handles a subset of escapes compared
> to the kernel's vsprintf().  That probably won't be a big deal in practice, but it's
> again a potential maintenance concern for us in the future.
> 
> I'm definitely torn.  As much as I dislike the idea of copy+pasting mode code into
> KVM selftests, I think pulling in nolibc would bring its own set of problems.
> 
> My vote is probably to copy+paste, at least for the initial implementation.  Being
> able to do the equivalent of printf() in the guest would be a huge improvement for
> debugging and triaging selftests, i.e. is something I would like to see landed
> fairly quickly.

Fully agree.

> Copy+pasting seems like it gives us the fastest path forward,
> e.g. doesn't risk getting bogged down with weird linker errors and whatnot.

Works for me.

Thanks,
drew
diff mbox series

Patch

diff --git a/tools/testing/selftests/kvm/lib/printf.c b/tools/testing/selftests/kvm/lib/printf.c
new file mode 100644
index 000000000000..1237beeb9540
--- /dev/null
+++ b/tools/testing/selftests/kvm/lib/printf.c
@@ -0,0 +1,307 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/* -*- linux-c -*- ------------------------------------------------------- *
+ *
+ *   Copyright (C) 1991, 1992 Linus Torvalds
+ *   Copyright 2007 rPath, Inc. - All Rights Reserved
+ *
+ * ----------------------------------------------------------------------- */
+
+/*
+ * Oh, it's a waste of space, but oh-so-yummy for debugging.  This
+ * version of printf() does not include 64-bit support.  "Live with
+ * it."
+ *
+ */
+
+#include "boot.h"
+
+static int skip_atoi(const char **s)
+{
+	int i = 0;
+
+	while (isdigit(**s))
+		i = i * 10 + *((*s)++) - '0';
+	return i;
+}
+
+#define ZEROPAD	1		/* pad with zero */
+#define SIGN	2		/* unsigned/signed long */
+#define PLUS	4		/* show plus */
+#define SPACE	8		/* space if plus */
+#define LEFT	16		/* left justified */
+#define SMALL	32		/* Must be 32 == 0x20 */
+#define SPECIAL	64		/* 0x */
+
+#define __do_div(n, base) ({ \
+int __res; \
+__res = ((unsigned long) n) % (unsigned) base; \
+n = ((unsigned long) n) / (unsigned) base; \
+__res; })
+
+static char *number(char *str, long num, int base, int size, int precision,
+		    int type)
+{
+	/* we are called with base 8, 10 or 16, only, thus don't need "G..."  */
+	static const char digits[16] = "0123456789ABCDEF"; /* "GHIJKLMNOPQRSTUVWXYZ"; */
+
+	char tmp[66];
+	char c, sign, locase;
+	int i;
+
+	/* locase = 0 or 0x20. ORing digits or letters with 'locase'
+	 * produces same digits or (maybe lowercased) letters */
+	locase = (type & SMALL);
+	if (type & LEFT)
+		type &= ~ZEROPAD;
+	if (base < 2 || base > 16)
+		return NULL;
+	c = (type & ZEROPAD) ? '0' : ' ';
+	sign = 0;
+	if (type & SIGN) {
+		if (num < 0) {
+			sign = '-';
+			num = -num;
+			size--;
+		} else if (type & PLUS) {
+			sign = '+';
+			size--;
+		} else if (type & SPACE) {
+			sign = ' ';
+			size--;
+		}
+	}
+	if (type & SPECIAL) {
+		if (base == 16)
+			size -= 2;
+		else if (base == 8)
+			size--;
+	}
+	i = 0;
+	if (num == 0)
+		tmp[i++] = '0';
+	else
+		while (num != 0)
+			tmp[i++] = (digits[__do_div(num, base)] | locase);
+	if (i > precision)
+		precision = i;
+	size -= precision;
+	if (!(type & (ZEROPAD + LEFT)))
+		while (size-- > 0)
+			*str++ = ' ';
+	if (sign)
+		*str++ = sign;
+	if (type & SPECIAL) {
+		if (base == 8)
+			*str++ = '0';
+		else if (base == 16) {
+			*str++ = '0';
+			*str++ = ('X' | locase);
+		}
+	}
+	if (!(type & LEFT))
+		while (size-- > 0)
+			*str++ = c;
+	while (i < precision--)
+		*str++ = '0';
+	while (i-- > 0)
+		*str++ = tmp[i];
+	while (size-- > 0)
+		*str++ = ' ';
+	return str;
+}
+
+int vsprintf(char *buf, const char *fmt, va_list args)
+{
+	int len;
+	unsigned long num;
+	int i, base;
+	char *str;
+	const char *s;
+
+	int flags;		/* flags to number() */
+
+	int field_width;	/* width of output field */
+	int precision;		/* min. # of digits for integers; max
+				   number of chars for from string */
+	int qualifier;		/* 'h', 'l', or 'L' for integer fields */
+
+	for (str = buf; *fmt; ++fmt) {
+		if (*fmt != '%') {
+			*str++ = *fmt;
+			continue;
+		}
+
+		/* process flags */
+		flags = 0;
+	      repeat:
+		++fmt;		/* this also skips first '%' */
+		switch (*fmt) {
+		case '-':
+			flags |= LEFT;
+			goto repeat;
+		case '+':
+			flags |= PLUS;
+			goto repeat;
+		case ' ':
+			flags |= SPACE;
+			goto repeat;
+		case '#':
+			flags |= SPECIAL;
+			goto repeat;
+		case '0':
+			flags |= ZEROPAD;
+			goto repeat;
+		}
+
+		/* get field width */
+		field_width = -1;
+		if (isdigit(*fmt))
+			field_width = skip_atoi(&fmt);
+		else if (*fmt == '*') {
+			++fmt;
+			/* it's the next argument */
+			field_width = va_arg(args, int);
+			if (field_width < 0) {
+				field_width = -field_width;
+				flags |= LEFT;
+			}
+		}
+
+		/* get the precision */
+		precision = -1;
+		if (*fmt == '.') {
+			++fmt;
+			if (isdigit(*fmt))
+				precision = skip_atoi(&fmt);
+			else if (*fmt == '*') {
+				++fmt;
+				/* it's the next argument */
+				precision = va_arg(args, int);
+			}
+			if (precision < 0)
+				precision = 0;
+		}
+
+		/* get the conversion qualifier */
+		qualifier = -1;
+		if (*fmt == 'h' || *fmt == 'l' || *fmt == 'L') {
+			qualifier = *fmt;
+			++fmt;
+		}
+
+		/* default base */
+		base = 10;
+
+		switch (*fmt) {
+		case 'c':
+			if (!(flags & LEFT))
+				while (--field_width > 0)
+					*str++ = ' ';
+			*str++ = (unsigned char)va_arg(args, int);
+			while (--field_width > 0)
+				*str++ = ' ';
+			continue;
+
+		case 's':
+			s = va_arg(args, char *);
+			len = strnlen(s, precision);
+
+			if (!(flags & LEFT))
+				while (len < field_width--)
+					*str++ = ' ';
+			for (i = 0; i < len; ++i)
+				*str++ = *s++;
+			while (len < field_width--)
+				*str++ = ' ';
+			continue;
+
+		case 'p':
+			if (field_width == -1) {
+				field_width = 2 * sizeof(void *);
+				flags |= ZEROPAD;
+			}
+			str = number(str,
+				     (unsigned long)va_arg(args, void *), 16,
+				     field_width, precision, flags);
+			continue;
+
+		case 'n':
+			if (qualifier == 'l') {
+				long *ip = va_arg(args, long *);
+				*ip = (str - buf);
+			} else {
+				int *ip = va_arg(args, int *);
+				*ip = (str - buf);
+			}
+			continue;
+
+		case '%':
+			*str++ = '%';
+			continue;
+
+			/* integer number formats - set up the flags and "break" */
+		case 'o':
+			base = 8;
+			break;
+
+		case 'x':
+			flags |= SMALL;
+		case 'X':
+			base = 16;
+			break;
+
+		case 'd':
+		case 'i':
+			flags |= SIGN;
+		case 'u':
+			break;
+
+		default:
+			*str++ = '%';
+			if (*fmt)
+				*str++ = *fmt;
+			else
+				--fmt;
+			continue;
+		}
+		if (qualifier == 'l')
+			num = va_arg(args, unsigned long);
+		else if (qualifier == 'h') {
+			num = (unsigned short)va_arg(args, int);
+			if (flags & SIGN)
+				num = (short)num;
+		} else if (flags & SIGN)
+			num = va_arg(args, int);
+		else
+			num = va_arg(args, unsigned int);
+		str = number(str, num, base, field_width, precision, flags);
+	}
+	*str = '\0';
+	return str - buf;
+}
+
+int sprintf(char *buf, const char *fmt, ...)
+{
+	va_list args;
+	int i;
+
+	va_start(args, fmt);
+	i = vsprintf(buf, fmt, args);
+	va_end(args);
+	return i;
+}
+
+int printf(const char *fmt, ...)
+{
+	char printf_buf[1024];
+	va_list args;
+	int printed;
+
+	va_start(args, fmt);
+	printed = vsprintf(printf_buf, fmt, args);
+	va_end(args);
+
+	puts(printf_buf);
+
+	return printed;
+}