diff mbox series

[1/4] xen/dmalloc: Introduce dmalloc() APIs

Message ID 20201223163442.8840-2-andrew.cooper3@citrix.com (mailing list archive)
State New
Headers show
Series xen: domain-tracked allocations, and fault injection | expand

Commit Message

Andrew Cooper Dec. 23, 2020, 4:34 p.m. UTC
Wrappers for xmalloc() and friends, which track allocations tied to a specific
domain.

Check for any leaked memory at domain destruction time.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Tamas K Lengyel <tamas@tklengyel.com>

RFC:
 * This probably wants to be less fatal in release builds
 * In an ideal world, we'd also want to count the total number of bytes
   allocated from the xmalloc heap, which would be interesting to print in the
   'q' debugkey.  However, that data is fairly invasive to obtain.
 * More complicated logic could track the origins of each allocation, and be
   able to identify which one(s) leaked.
---
 xen/common/Makefile       |  1 +
 xen/common/dmalloc.c      | 19 +++++++++++++++++++
 xen/common/domain.c       |  6 ++++++
 xen/include/xen/dmalloc.h | 29 +++++++++++++++++++++++++++++
 xen/include/xen/sched.h   |  2 ++
 5 files changed, 57 insertions(+)
 create mode 100644 xen/common/dmalloc.c
 create mode 100644 xen/include/xen/dmalloc.h

Comments

Jan Beulich Jan. 5, 2021, 3:56 p.m. UTC | #1
On 23.12.2020 17:34, Andrew Cooper wrote:
> RFC:
>  * This probably wants to be less fatal in release builds

I'm not even convinced this wants to be a panic() in debug builds.

>  * In an ideal world, we'd also want to count the total number of bytes
>    allocated from the xmalloc heap, which would be interesting to print in the
>    'q' debugkey.  However, that data is fairly invasive to obtain.

Unless we used an xmem_pool rather than the new interface being
a thin layer around xmalloc(). (This would then also provide
better locality of the allocations, i.e. different domains
wouldn't share allocations from the same page.) And even without
doing so, adding a function to retrieve the actual size
shouldn't be all that difficult - internally xmalloc_tlsf.c
knows the size, after all, for e.g. xrealloc() to work right.

> --- /dev/null
> +++ b/xen/include/xen/dmalloc.h
> @@ -0,0 +1,29 @@
> +#ifndef XEN_DMALLOC_H
> +#define XEN_DMALLOC_H
> +
> +#include <xen/types.h>
> +
> +struct domain;
> +
> +#define dzalloc_array(d, _type, _num)                                   \

While I realize I'll get bashed again, the inconsistency of using
(or not) leading underscores is too odd to not comment upon. I
don't see what use they are here, irrespective of my general view
on the topic.

> +    ((_type *)_dzalloc_array(d, sizeof(_type), __alignof__(_type), _num))
> +
> +
> +void dfree(struct domain *d, void *ptr);

May I ask to avoid double blank lines?

> +#define DFREE(d, p)                             \
> +    do {                                        \
> +        dfree(d, p);                            \
> +        (p) = NULL;                             \
> +    } while ( 0 )
> +
> +
> +void *_dzalloc(struct domain *d, size_t size, size_t align);
> +
> +static inline void *_dzalloc_array(struct domain *d, size_t size,
> +                                   size_t align, size_t num)
> +{
> +    return _dzalloc(d, size * num, align);

No protection at all against the multiplication overflowing?

Jan
Jan Beulich Jan. 5, 2021, 4:01 p.m. UTC | #2
On 23.12.2020 17:34, Andrew Cooper wrote:
> --- /dev/null
> +++ b/xen/common/dmalloc.c
> @@ -0,0 +1,19 @@
> +#include <xen/dmalloc.h>
> +#include <xen/sched.h>
> +#include <xen/xmalloc.h>
> +
> +void dfree(struct domain *d, void *ptr)
> +{
> +    atomic_dec(&d->dalloc_heap);
> +    xfree(ptr);
> +}
> +
> +void *_dzalloc(struct domain *d, size_t size, size_t align)
> +{
> +    void *ptr = _xmalloc(size, align);
> +
> +    if ( ptr )
> +        atomic_inc(&d->dalloc_heap);

While this is properly conditional, the freeing side also needs to
tolerate NULL coming in (noticed only while looking at patch 2).
I also wonder whether ZERO_BLOCK_PTR wants special casing.

Jan
Andrew Cooper Jan. 13, 2021, 11:16 p.m. UTC | #3
On 05/01/2021 15:56, Jan Beulich wrote:
> On 23.12.2020 17:34, Andrew Cooper wrote:
>> RFC:
>>  * This probably wants to be less fatal in release builds
> I'm not even convinced this wants to be a panic() in debug builds.

Any memory leak spotted by this is an XSA, except in the narrow case of
being due exclusively to a weird and non-default order of hypercalls.

It absolutely needs to be something fatal in debug builds, for avoid
going unnoticed by testing.  Patch 4 is my proposed solution for this,
which will hopefully prevent bugs from escaping staging.

For release builds, a real memory leak is less bad behaviour than taking
the host down, but it certainly shouldn't shouldn't go unnoticed.

>>  * In an ideal world, we'd also want to count the total number of bytes
>>    allocated from the xmalloc heap, which would be interesting to print in the
>>    'q' debugkey.  However, that data is fairly invasive to obtain.
> Unless we used an xmem_pool rather than the new interface being
> a thin layer around xmalloc(). (This would then also provide
> better locality of the allocations, i.e. different domains
> wouldn't share allocations from the same page.)

I'm not sure if using a separate memory pool is a clear cut improvement.

For an attacker which has a corruption primitive, a single shared pool
will reduce the chance of the exploit being stable across different
systems.  Improved locality of allocations is an advantage from the
attackers point of view, but the proper way to combat that is with a
real randomised heap allocator.

Sharing within the same page isn't an issue, so long as we respect a
cache line minimum allocation size.

More importantly however, until we know exactly how much memory we're
talking about here, switching to a per-domain pool might be a massive
waste.  The xmalloc() allocations are in the noise compared to RAM, and
might only be a small multiple of the pool metadata to begin with.

> And even without doing so, adding a function to retrieve the actual size
> shouldn't be all that difficult - internally xmalloc_tlsf.c
> knows the size, after all, for e.g. xrealloc() to work right.

Yeah - the internals of the pool can calculate this.  I was considering
doing just this, but wasn't sure how exposing an API for this would go down.

For maximum flexibility, it would be my preferred way forward.

>> --- /dev/null
>> +++ b/xen/include/xen/dmalloc.h
>> @@ -0,0 +1,29 @@
>> +#ifndef XEN_DMALLOC_H
>> +#define XEN_DMALLOC_H
>> +
>> +#include <xen/types.h>
>> +
>> +struct domain;
>> +
>> +#define dzalloc_array(d, _type, _num)                                   \
> While I realize I'll get bashed again, the inconsistency of using
> (or not) leading underscores is too odd to not comment upon. I
> don't see what use they are here, irrespective of my general view
> on the topic.
>
>> +    ((_type *)_dzalloc_array(d, sizeof(_type), __alignof__(_type), _num))
>> +
>> +
>> +void dfree(struct domain *d, void *ptr);
> May I ask to avoid double blank lines?

Both of these were just copied from xmalloc.h without any real thought. 
What I did forget was to plaster this series with RFC.

>> +#define DFREE(d, p)                             \
>> +    do {                                        \
>> +        dfree(d, p);                            \
>> +        (p) = NULL;                             \
>> +    } while ( 0 )
>> +
>> +
>> +void *_dzalloc(struct domain *d, size_t size, size_t align);
>> +
>> +static inline void *_dzalloc_array(struct domain *d, size_t size,
>> +                                   size_t align, size_t num)
>> +{
>> +    return _dzalloc(d, size * num, align);
> No protection at all against the multiplication overflowing?

Well - xmalloc doesn't have any either.  But yes - both want some
compiler-aided overflow detection, rather than messing around with
arbitrary limits pretending to be an overflow check.

~Andrew
Jan Beulich Jan. 14, 2021, 10:14 a.m. UTC | #4
On 14.01.2021 00:16, Andrew Cooper wrote:
> On 05/01/2021 15:56, Jan Beulich wrote:
>> On 23.12.2020 17:34, Andrew Cooper wrote:
>>> RFC:
>>>  * This probably wants to be less fatal in release builds
>> I'm not even convinced this wants to be a panic() in debug builds.
> 
> Any memory leak spotted by this is an XSA, except in the narrow case of
> being due exclusively to a weird and non-default order of hypercalls.
> 
> It absolutely needs to be something fatal in debug builds, for avoid
> going unnoticed by testing.

This is a perfectly valid position to take, but I'm afraid once
again isn't the only possible one. Since I do routinely look at
logs, I'm personally in favor of avoiding crashing the host
even for debug builds. The more that I've had pretty bad fallout
from crashes in the past, due to (afaict) bugs in a file system
driver in Linux that persisted over a longer period of time.

>  Patch 4 is my proposed solution for this,
> which will hopefully prevent bugs from escaping staging.
> 
> For release builds, a real memory leak is less bad behaviour than taking
> the host down, but it certainly shouldn't shouldn't go unnoticed.

Of course - it absolutely needs logging.

>>>  * In an ideal world, we'd also want to count the total number of bytes
>>>    allocated from the xmalloc heap, which would be interesting to print in the
>>>    'q' debugkey.  However, that data is fairly invasive to obtain.
>> Unless we used an xmem_pool rather than the new interface being
>> a thin layer around xmalloc(). (This would then also provide
>> better locality of the allocations, i.e. different domains
>> wouldn't share allocations from the same page.)
> 
> I'm not sure if using a separate memory pool is a clear cut improvement.

Neither am I, but it's an option to at least consider.

> For an attacker which has a corruption primitive, a single shared pool
> will reduce the chance of the exploit being stable across different
> systems.  Improved locality of allocations is an advantage from the
> attackers point of view, but the proper way to combat that is with a
> real randomised heap allocator.
> 
> Sharing within the same page isn't an issue, so long as we respect a
> cache line minimum allocation size.
> 
> More importantly however, until we know exactly how much memory we're
> talking about here, switching to a per-domain pool might be a massive
> waste.  The xmalloc() allocations are in the noise compared to RAM, and
> might only be a small multiple of the pool metadata to begin with.
> 
>> And even without doing so, adding a function to retrieve the actual size
>> shouldn't be all that difficult - internally xmalloc_tlsf.c
>> knows the size, after all, for e.g. xrealloc() to work right.
> 
> Yeah - the internals of the pool can calculate this.  I was considering
> doing just this, but wasn't sure how exposing an API for this would go down.
> 
> For maximum flexibility, it would be my preferred way forward.

I don't seen an issue with exposing such an API, so long as it's
made clear what purposes we want to permit it to be used for.

>>> +#define DFREE(d, p)                             \
>>> +    do {                                        \
>>> +        dfree(d, p);                            \
>>> +        (p) = NULL;                             \
>>> +    } while ( 0 )
>>> +
>>> +
>>> +void *_dzalloc(struct domain *d, size_t size, size_t align);
>>> +
>>> +static inline void *_dzalloc_array(struct domain *d, size_t size,
>>> +                                   size_t align, size_t num)
>>> +{
>>> +    return _dzalloc(d, size * num, align);
>> No protection at all against the multiplication overflowing?
> 
> Well - xmalloc doesn't have any either.  But yes - both want some
> compiler-aided overflow detection, rather than messing around with
> arbitrary limits pretending to be an overflow check.

You did suggest this previously, for xmalloc(). And I did look
into doing so, but either ran into some issue or simply didn't
see the point, considering the code we already have. Yes, the
use of UINT_MAX may seem somewhat arbitrary. We could make this
less arbitrary by deriving from MAX_ORDER instead, or by adding
a suitable BUILD_BUG_ON(UINT_MAX < ...). After all there's no
point even just trying allocations exceeding MAX_ORDER.

The reason for my comment was that the functions here are
specifically _not_ a simple copy of their xmalloc()
counterparts.

Jan
Andrew Cooper Jan. 14, 2021, 3:30 p.m. UTC | #5
On 14/01/2021 10:14, Jan Beulich wrote:
> On 14.01.2021 00:16, Andrew Cooper wrote:
>> On 05/01/2021 15:56, Jan Beulich wrote:
>>> On 23.12.2020 17:34, Andrew Cooper wrote:
>>>> RFC:
>>>>  * This probably wants to be less fatal in release builds
>>> I'm not even convinced this wants to be a panic() in debug builds.
>> Any memory leak spotted by this is an XSA, except in the narrow case of
>> being due exclusively to a weird and non-default order of hypercalls.
>>
>> It absolutely needs to be something fatal in debug builds, for avoid
>> going unnoticed by testing.
> This is a perfectly valid position to take, but I'm afraid once
> again isn't the only possible one. Since I do routinely look at
> logs, I'm personally in favor of avoiding crashing the host
> even for debug builds. The more that I've had pretty bad fallout
> from crashes in the past, due to (afaict) bugs in a file system
> driver in Linux that persisted over a longer period of time.

By that logic, we should replace most assertions with printk()s.  The
only reason I didn't use assert in this patch is because the
backtrace/etc is totally useless (we're in an RCU callback).

No-one, not even you, reviews all the logs from OSSTest.  It's sometimes
hard enough to get anyone to look at the logs even when there are emails
saying "$X looks broken".

The point at which we can spot the resource leak is too late to leave a
zombie domain around (we've already removed the domain from the
domlist), and a printk() is not an acceptably robust way of signalling
the problem.  Any change in the wording will render detection useless,
and some testing scenarios (stress in particular) will manage to wrap
the console ring in short order which risks hiding all traces of the
problem.

We're talking about something which will only occur on staging for
ill-reviewed patches, along with a (hopefully) reliable test developers
can use themselves, *and* in due course, this test being run pre-push
when we sort out the Gitlab CI.

The bottom line is that making this fatal is the only way we have of
getting people to pay attention to the severity of the issue, and I
don't think it reasonable to hamper automated testing's ability to spot
these issues, simply because you believe you're better than average at
reading your log files.

~Andrew
diff mbox series

Patch

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 7a4e652b57..c5d9c23fd1 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -5,6 +5,7 @@  obj-$(CONFIG_CORE_PARKING) += core_parking.o
 obj-y += cpu.o
 obj-$(CONFIG_DEBUG_TRACE) += debugtrace.o
 obj-$(CONFIG_HAS_DEVICE_TREE) += device_tree.o
+obj-y += dmalloc.o
 obj-y += domain.o
 obj-y += event_2l.o
 obj-y += event_channel.o
diff --git a/xen/common/dmalloc.c b/xen/common/dmalloc.c
new file mode 100644
index 0000000000..e3a0e546c2
--- /dev/null
+++ b/xen/common/dmalloc.c
@@ -0,0 +1,19 @@ 
+#include <xen/dmalloc.h>
+#include <xen/sched.h>
+#include <xen/xmalloc.h>
+
+void dfree(struct domain *d, void *ptr)
+{
+    atomic_dec(&d->dalloc_heap);
+    xfree(ptr);
+}
+
+void *_dzalloc(struct domain *d, size_t size, size_t align)
+{
+    void *ptr = _xmalloc(size, align);
+
+    if ( ptr )
+        atomic_inc(&d->dalloc_heap);
+
+    return ptr;
+}
diff --git a/xen/common/domain.c b/xen/common/domain.c
index d151be3f36..1db1c0e70a 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -332,6 +332,8 @@  static int domain_teardown(struct domain *d)
  */
 static void _domain_destroy(struct domain *d)
 {
+    int outstanding;
+
     BUG_ON(!d->is_dying);
     BUG_ON(atomic_read(&d->refcnt) != DOMAIN_DESTROYED);
 
@@ -347,6 +349,10 @@  static void _domain_destroy(struct domain *d)
 
     lock_profile_deregister_struct(LOCKPROF_TYPE_PERDOM, d);
 
+    outstanding = atomic_read(&d->dalloc_heap);
+    if ( outstanding )
+        panic("%pd has %d outstanding heap allocations\n", d, outstanding);
+
     free_domain_struct(d);
 }
 
diff --git a/xen/include/xen/dmalloc.h b/xen/include/xen/dmalloc.h
new file mode 100644
index 0000000000..a90cf0259c
--- /dev/null
+++ b/xen/include/xen/dmalloc.h
@@ -0,0 +1,29 @@ 
+#ifndef XEN_DMALLOC_H
+#define XEN_DMALLOC_H
+
+#include <xen/types.h>
+
+struct domain;
+
+#define dzalloc_array(d, _type, _num)                                   \
+    ((_type *)_dzalloc_array(d, sizeof(_type), __alignof__(_type), _num))
+
+
+void dfree(struct domain *d, void *ptr);
+
+#define DFREE(d, p)                             \
+    do {                                        \
+        dfree(d, p);                            \
+        (p) = NULL;                             \
+    } while ( 0 )
+
+
+void *_dzalloc(struct domain *d, size_t size, size_t align);
+
+static inline void *_dzalloc_array(struct domain *d, size_t size,
+                                   size_t align, size_t num)
+{
+    return _dzalloc(d, size * num, align);
+}
+
+#endif /* XEN_DMALLOC_H */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 3e46384a3c..8ed8b55a1e 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -349,6 +349,8 @@  struct domain
     atomic_t         shr_pages;         /* shared pages */
     atomic_t         paged_pages;       /* paged-out pages */
 
+    atomic_t         dalloc_heap;       /* Number of xmalloc-like allocations. */
+
     /* Scheduling. */
     void            *sched_priv;    /* scheduler-specific data */
     struct sched_unit *sched_unit_list;