diff mbox series

[v9,1/8] xen/common: introduce a new framework for save/restore of 'domain' context

Message ID 20200924131030.1876-2-paul@xen.org (mailing list archive)
State Superseded

Commit Message

Paul Durrant Sept. 24, 2020, 1:10 p.m. UTC
To allow enlightened HVM guests (i.e. those that have PV drivers) to be
migrated without their co-operation it will be necessary to transfer 'PV'
state such as event channel state, grant entry state, etc.

Currently there is a framework (entered via the hvm_save/load() functions)
that allows a domain's 'HVM' (architectural) state to be transferred but
'PV' state is also common with pure PV guests and so this framework is not
really suitable.

This patch adds the new public header and low level implementation of a new
common framework, entered via the domain_save/load() functions. Subsequent
patches will introduce other parts of the framework, and code that will
make use of it within the current version of the libxc migration stream.

This patch also marks the HVM-only framework as deprecated in favour of the
new framework.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Julien Grall <julien@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Wei Liu <wl@xen.org>
Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>

v7:
 - Add an option to domain_load_end() to ignore unconsumed data, which will
   be needed by a subsequent patch
 - Kept acks since the modification is very small

v4:
 - Addressed further comments from Jan

v3:
 - Addressed comments from Julien and Jan
 - Save handlers no longer need to state entry length up-front
 - Save handlers expected to deal with multiple instances internally
 - Entries are now auto-padded to 8 byte boundary

v2:
 - Allow multi-stage save/load to avoid the need to double-buffer
 - Get rid of the masks and add an 'ignore' flag instead
 - Create copy function union to preserve const save buffer
 - Deprecate HVM-only framework
---
 xen/common/Makefile                    |   1 +
 xen/common/save.c                      | 315 +++++++++++++++++++++++++
 xen/include/public/arch-arm/hvm/save.h |   5 +
 xen/include/public/arch-x86/hvm/save.h |   5 +
 xen/include/public/save.h              |  89 +++++++
 xen/include/xen/save.h                 | 170 +++++++++++++
 6 files changed, 585 insertions(+)
 create mode 100644 xen/common/save.c
 create mode 100644 xen/include/public/save.h
 create mode 100644 xen/include/xen/save.h

Comments

Andrew Cooper Oct. 2, 2020, 9:20 p.m. UTC | #1
On 24/09/2020 14:10, Paul Durrant wrote:
> diff --git a/xen/common/save.c b/xen/common/save.c
> new file mode 100644
> index 0000000000..841c4d0e4e
> --- /dev/null
> +++ b/xen/common/save.c
> @@ -0,0 +1,315 @@
> +/*
> + * save.c: Save and restore PV guest state common to all domain types.

This description will be stale by the time your work is complete.

> +int domain_save_data(struct domain_context *c, const void *src, size_t len)
> +{
> +    int rc = c->ops.save->append(c->priv, src, len);
> +
> +    if ( !rc )
> +        c->len += len;
> +
> +    return rc;
> +}
> +
> +#define DOMAIN_SAVE_ALIGN 8

This is part of the stream ABI.

> +
> +int domain_save_end(struct domain_context *c)
> +{
> +    struct domain *d = c->domain;
> +    size_t len = ROUNDUP(c->len, DOMAIN_SAVE_ALIGN) - c->len; /* padding */

DOMAIN_SAVE_ALIGN - (c->len & (DOMAIN_SAVE_ALIGN - 1))

isn't vulnerable to overflow.

> +    int rc;
> +
> +    if ( len )
> +    {
> +        static const uint8_t pad[DOMAIN_SAVE_ALIGN] = {};
> +
> +        rc = domain_save_data(c, pad, len);
> +
> +        if ( rc )
> +            return rc;
> +    }
> +    ASSERT(IS_ALIGNED(c->len, DOMAIN_SAVE_ALIGN));
> +
> +    if ( c->name )
> +        gdprintk(XENLOG_INFO, "%pd save: %s[%u] +%zu (-%zu)\n", d, c->name,
> +                 c->desc.instance, c->len, len);

IMO, this is unhelpful to print out.  It also appears to be the only use
of the c->name field.

It also creates obscure and hard to follow logic based on dry_run.

> diff --git a/xen/include/public/save.h b/xen/include/public/save.h
> new file mode 100644
> index 0000000000..551dbbddb8
> --- /dev/null
> +++ b/xen/include/public/save.h
> @@ -0,0 +1,89 @@
> +/*
> + * save.h
> + *
> + * Structure definitions for common PV/HVM domain state that is held by
> + * Xen and must be saved along with the domain's memory.
> + *
> + * Copyright Amazon.com Inc. or its affiliates.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to
> + * deal in the Software without restriction, including without limitation the
> + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
> + * sell copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef XEN_PUBLIC_SAVE_H
> +#define XEN_PUBLIC_SAVE_H
> +
> +#if defined(__XEN__) || defined(__XEN_TOOLS__)
> +
> +#include "xen.h"
> +
> +/* Entry data is preceded by a descriptor */
> +struct domain_save_descriptor {
> +    uint16_t typecode;
> +
> +    /*
> +     * Instance number of the entry (since there may be multiple of some
> +     * types of entries).
> +     */
> +    uint16_t instance;
> +
> +    /* Entry length not including this descriptor */
> +    uint32_t length;
> +};
> +
> +/*
> + * Each entry has a type associated with it. DECLARE_DOMAIN_SAVE_TYPE
> + * binds these things together, although it is not intended that the
> + * resulting type is ever instantiated.
> + */
> +#define DECLARE_DOMAIN_SAVE_TYPE(_x, _code, _type) \
> +    struct DOMAIN_SAVE_TYPE_##_x { char c[_code]; _type t; };
> +
> +#define DOMAIN_SAVE_CODE(_x) \
> +    (sizeof(((struct DOMAIN_SAVE_TYPE_##_x *)0)->c))
> +#define DOMAIN_SAVE_TYPE(_x) \
> +    typeof(((struct DOMAIN_SAVE_TYPE_##_x *)0)->t)

I realise this is going to make me very unpopular, but NACK.

This is straight up obfuscation with no redeeming properties.  I know
you've copied it from the existing HVMCONTEXT infrastructure, but it is
obnoxious to use there (particularly in the domain builder) and not an
example wanting copying.

Furthermore, the code will be simpler and easier to follow without it.

Secondly, and more importantly, I do not see anything in docs/specs/
describing the binary format of this stream,  and I'm going to insist
that one appears, ahead of this patch in the series.

In doing so, you're hopefully going to discover the bug with the older
HVMCONTEXT stream which makes the version field fairly pointless (more
below).

It should describe how to forward compatibly extend the stream, and
under what circumstances the version number can/should change.  It also
needs to describe the alignment and extending rules which ...

> +
> +/*
> + * All entries will be zero-padded to the next 64-bit boundary when saved,
> + * so there is no need to include trailing pad fields in structure
> + * definitions.
> + * When loading, entries will be zero-extended if the load handler reads
> + * beyond the length specified in the descriptor.
> + */

... shouldn't be this.

The current zero extending property was an emergency hack to fix an ABI
breakage which had gone unnoticed for a couple of releases.  The work to
implement it created several very hard to debug breakages in Xen.

A properly designed stream shouldn't need auto-extending behaviour, and
the legibility of the code is improved by not having it.

It is a trick which can stay up your sleeve for an emergency, in the
hope you'll never have to use it.

> +
> +/* Terminating entry */
> +struct domain_save_end {};
> +DECLARE_DOMAIN_SAVE_TYPE(END, 0, struct domain_save_end);
> +
> +#define DOMAIN_SAVE_MAGIC   0x53415645
> +#define DOMAIN_SAVE_VERSION 0x00000001
> +
> +/* Initial entry */
> +struct domain_save_header {
> +    uint32_t magic;                /* Must be DOMAIN_SAVE_MAGIC */
> +    uint16_t xen_major, xen_minor; /* Xen version */
> +    uint32_t version;              /* Save format version */
> +};
> +DECLARE_DOMAIN_SAVE_TYPE(HEADER, 1, struct domain_save_header);

The layout problem with the stream is the fact that this header doesn't
come first.

In the eventual future where uint16_t won't be sufficient for instance,
and uint32_t might not be sufficient for len, the version number is
going to have to be bumped, in order to change the descriptor layout.


Overall, this patch needs to be a minimum of two.  First a written
document which is the authoritative stream ABI, and the second which is
this implementation.  The header describing the stream format should not
be substantively different from xg_sr_stream_format.h

~Andrew

P.S. Another good reason for having extremely simple header files is for
the poor soul trying to write a Go/Rust/other binding for this in some
likely not-too-distant future.
Andrew Cooper Oct. 2, 2020, 10 p.m. UTC | #2
On 24/09/2020 14:10, Paul Durrant wrote:
> +/*
> + * The 'dry_run' flag indicates that the caller of domain_save() (see below)
> + * is not trying to actually acquire the data, only the size of the data.
> + * The save handler can therefore limit work to only that which is necessary
> + * to call domain_save_data() the correct number of times with accurate values
> + * for 'len'.
> + */
> +typedef int (*domain_save_handler)(const struct domain *d,
> +                                   struct domain_context *c,
> +                                   bool dry_run);

Sorry - missed this the first time around.  This cannot take a const domain.

Doing so prevents putting (amongst other things), event channel details
into the stream, because you won't be able to take the domain's event
lock, and having the domain paused isn't good enough protection.

Removing this const will reduce the churn in subsequent patches somewhat.

~Andrew
Wei Liu Oct. 3, 2020, 2:33 p.m. UTC | #3
On Fri, Oct 02, 2020 at 10:20:18PM +0100, Andrew Cooper wrote:
[...]
> P.S. Another good reason for having extremely simple header files is for
> the poor soul trying to write a Go/Rust/other binding for this in some
> likely not-too-distant future.

For Rust the header is going to be generated by a tool called bindgen.
It doesn't like nested macros, so I would be all for a simpler C header
file if we can help it.

Wei.
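
As an illustration of the simpler header being asked for here, the following is a minimal sketch of a macro-free layout plus a tiny stream walker. All names (`SAVE_CODE_*`, `save_descriptor`, `count_entries`) are hypothetical and are not taken from the patch or from any later revision. It also assumes that the recorded `length` already covers any trailing zero pad, which is how the v9 save path appears to behave but is not something a spec confirms.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical plain constants standing in for DECLARE_DOMAIN_SAVE_TYPE(). */
enum {
    SAVE_CODE_END    = 0,
    SAVE_CODE_HEADER = 1,
};

/* Same field layout as struct domain_save_descriptor in the patch. */
struct save_descriptor {
    uint16_t typecode;
    uint16_t instance;
    uint32_t length;   /* payload bytes following this descriptor */
};

/*
 * Walk a buffer of descriptor + payload entries.
 * Returns the number of entries up to and including END, or -1 if the
 * buffer is truncated or does not start with a HEADER entry.
 * Assumption: 'length' already includes padding, so entries are adjacent.
 */
static int count_entries(const uint8_t *buf, size_t size)
{
    size_t off = 0;
    int n = 0;

    for ( ;; )
    {
        struct save_descriptor d;

        if ( size - off < sizeof(d) )
            return -1;                 /* no room for a descriptor */
        memcpy(&d, buf + off, sizeof(d));
        off += sizeof(d);

        if ( size - off < d.length )
            return -1;                 /* payload runs past the buffer */
        if ( n == 0 && d.typecode != SAVE_CODE_HEADER )
            return -1;                 /* header must come first */

        off += d.length;
        n++;

        if ( d.typecode == SAVE_CODE_END )
            return n;
    }
}
```

With only fixed-width fields and an enum, a binding generator has nothing to expand, and the type codes are visible as ordinary integer constants.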
Paul Durrant Oct. 5, 2020, 8:03 a.m. UTC | #4
> -----Original Message-----
> From: Andrew Cooper <andrew.cooper3@citrix.com>
> Sent: 02 October 2020 22:20
> To: Paul Durrant <paul@xen.org>; xen-devel@lists.xenproject.org
> Cc: Paul Durrant <pdurrant@amazon.com>; Julien Grall <julien@xen.org>; Jan Beulich
> <jbeulich@suse.com>; George Dunlap <george.dunlap@citrix.com>; Ian Jackson
> <ian.jackson@eu.citrix.com>; Stefano Stabellini <sstabellini@kernel.org>; Wei Liu <wl@xen.org>;
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>; Roger Pau Monné <roger.pau@citrix.com>
> Subject: Re: [PATCH v9 1/8] xen/common: introduce a new framework for save/restore of 'domain' context
> 
> On 24/09/2020 14:10, Paul Durrant wrote:
> > diff --git a/xen/common/save.c b/xen/common/save.c
> > new file mode 100644
> > index 0000000000..841c4d0e4e
> > --- /dev/null
> > +++ b/xen/common/save.c
> > @@ -0,0 +1,315 @@
> > +/*
> > + * save.c: Save and restore PV guest state common to all domain types.
> 
> This description will be stale by the time your work is complete.
> 

True now, I'll just drop the 'PV'

> > +int domain_save_data(struct domain_context *c, const void *src, size_t len)
> > +{
> > +    int rc = c->ops.save->append(c->priv, src, len);
> > +
> > +    if ( !rc )
> > +        c->len += len;
> > +
> > +    return rc;
> > +}
> > +
> > +#define DOMAIN_SAVE_ALIGN 8
> 
> This is part of the stream ABI.
> 

And what's actually the problem with defining it here?

> > +
> > +int domain_save_end(struct domain_context *c)
> > +{
> > +    struct domain *d = c->domain;
> > +    size_t len = ROUNDUP(c->len, DOMAIN_SAVE_ALIGN) - c->len; /* padding */
> 
> DOMAIN_SAVE_ALIGN - (c->len & (DOMAIN_SAVE_ALIGN - 1))
> 
> isn't vulnerable to overflow.
> 

...and significantly uglier code. What's actually wrong with what I wrote?

> > +    int rc;
> > +
> > +    if ( len )
> > +    {
> > +        static const uint8_t pad[DOMAIN_SAVE_ALIGN] = {};
> > +
> > +        rc = domain_save_data(c, pad, len);
> > +
> > +        if ( rc )
> > +            return rc;
> > +    }
> > +    ASSERT(IS_ALIGNED(c->len, DOMAIN_SAVE_ALIGN));
> > +
> > +    if ( c->name )
> > +        gdprintk(XENLOG_INFO, "%pd save: %s[%u] +%zu (-%zu)\n", d, c->name,
> > +                 c->desc.instance, c->len, len);
> 
> IMO, this is unhelpful to print out.  It also appears to be the only use
> of the c->name field.
> 
> It also creates obscure and hard to follow logic based on dry_run.
> 

I'll drop it to debug. I personally find it helpful and would prefer to keep it.

> > diff --git a/xen/include/public/save.h b/xen/include/public/save.h
> > new file mode 100644
> > index 0000000000..551dbbddb8
> > --- /dev/null
> > +++ b/xen/include/public/save.h
> > @@ -0,0 +1,89 @@
> > +/*
> > + * save.h
> > + *
> > + * Structure definitions for common PV/HVM domain state that is held by
> > + * Xen and must be saved along with the domain's memory.
> > + *
> > + * Copyright Amazon.com Inc. or its affiliates.
> > + *
> > + * Permission is hereby granted, free of charge, to any person obtaining a copy
> > + * of this software and associated documentation files (the "Software"), to
> > + * deal in the Software without restriction, including without limitation the
> > + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
> > + * sell copies of the Software, and to permit persons to whom the Software is
> > + * furnished to do so, subject to the following conditions:
> > + *
> > + * The above copyright notice and this permission notice shall be included in
> > + * all copies or substantial portions of the Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> > + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> > + * DEALINGS IN THE SOFTWARE.
> > + */
> > +
> > +#ifndef XEN_PUBLIC_SAVE_H
> > +#define XEN_PUBLIC_SAVE_H
> > +
> > +#if defined(__XEN__) || defined(__XEN_TOOLS__)
> > +
> > +#include "xen.h"
> > +
> > +/* Entry data is preceded by a descriptor */
> > +struct domain_save_descriptor {
> > +    uint16_t typecode;
> > +
> > +    /*
> > +     * Instance number of the entry (since there may be multiple of some
> > +     * types of entries).
> > +     */
> > +    uint16_t instance;
> > +
> > +    /* Entry length not including this descriptor */
> > +    uint32_t length;
> > +};
> > +
> > +/*
> > + * Each entry has a type associated with it. DECLARE_DOMAIN_SAVE_TYPE
> > + * binds these things together, although it is not intended that the
> > + * resulting type is ever instantiated.
> > + */
> > +#define DECLARE_DOMAIN_SAVE_TYPE(_x, _code, _type) \
> > +    struct DOMAIN_SAVE_TYPE_##_x { char c[_code]; _type t; };
> > +
> > +#define DOMAIN_SAVE_CODE(_x) \
> > +    (sizeof(((struct DOMAIN_SAVE_TYPE_##_x *)0)->c))
> > +#define DOMAIN_SAVE_TYPE(_x) \
> > +    typeof(((struct DOMAIN_SAVE_TYPE_##_x *)0)->t)
> 
> I realise this is going to make me very unpopular, but NACK.
> 
> This is straight up obfuscation with no redeeming properties.  I know
> you've copied it from the existing HVMCONTEXT infrastructure, but it is
> obnoxious to use there (particularly in the domain builder) and not an
> example wanting copying.
> 
> Furthermore, the code will be simpler and easier to follow without it.
> 

OK, I can drop it if you so vehemently object.

> Secondly, and more importantly, I do not see anything in docs/specs/
> describing the binary format of this stream,  and I'm going to insist
> that one appears, ahead of this patch in the series.
> 

I can certainly put something there if you wish.

> In doing so, you're hopefully going to discover the bug with the older
> HVMCONTEXT stream which makes the version field fairly pointless (more
> below).
> 
> It should describe how to forward compatibly extend the stream, and
> under what circumstances the version number can/should change.  It also
> needs to describe the alignment and extending rules which ...
> 
> > +
> > +/*
> > + * All entries will be zero-padded to the next 64-bit boundary when saved,
> > + * so there is no need to include trailing pad fields in structure
> > + * definitions.
> > + * When loading, entries will be zero-extended if the load handler reads
> > + * beyond the length specified in the descriptor.
> > + */
> 
> ... shouldn't be this.
> 

Auto-padding was explicitly requested by Julien and extending (with zeroes or otherwise) is the necessary corollary (since the save handlers are not explicitly padding to the alignment boundary).

> The current zero extending property was an emergency hack to fix an ABI
> breakage which had gone unnoticed for a couple of releases.  The work to
> implement it created several very hard to debug breakages in Xen.
> 
> A properly designed stream shouldn't need auto-extending behaviour, and
> the legibility of the code is improved by not having it.
> 
> It is a trick which can stay up your sleeve for an emergency, in the
> hope you'll never have to use it.
> 

The zero-extending here is different; it does not form part of the record. It is merely there to make sure the alignment constraint is met.

> > +
> > +/* Terminating entry */
> > +struct domain_save_end {};
> > +DECLARE_DOMAIN_SAVE_TYPE(END, 0, struct domain_save_end);
> > +
> > +#define DOMAIN_SAVE_MAGIC   0x53415645
> > +#define DOMAIN_SAVE_VERSION 0x00000001
> > +
> > +/* Initial entry */
> > +struct domain_save_header {
> > +    uint32_t magic;                /* Must be DOMAIN_SAVE_MAGIC */
> > +    uint16_t xen_major, xen_minor; /* Xen version */
> > +    uint32_t version;              /* Save format version */
> > +};
> > +DECLARE_DOMAIN_SAVE_TYPE(HEADER, 1, struct domain_save_header);
> 
> The layout problem with the stream is the fact that this header doesn't
> come first.
> 

? It most certainly does come first, as is evident from the load and save functions. But I will add a document that states it, as requested.

> In the eventual future where uint16_t won't be sufficient for instance,
> and uint32_t might not be sufficient for len, the version number is
> going to have to be bumped, in order to change the descriptor layout.
> 
> 
> Overall, this patch needs to be a minimum of two.  First a written
> document which is the authoritative stream ABI, and the second which is
> this implementation.  The header describing the stream format should not
> be substantively different from xg_sr_stream_format.h
> 

Ok.

> ~Andrew
> 
> P.S. Another good reason for having extremely simple header files is for
> the poor soul trying to write a Go/Rust/other binding for this in some
> likely not-too-distant future.

Fine. I'm happy to drop the macro/type magic if no-one feels it is necessary.

  Paul
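
The load-side zero-extension discussed in this reply can be modelled outside Xen. The sketch below is only an approximation of what domain_load_data() does in the patch: `struct entry_reader` and `entry_read()` are hypothetical names, and a memory buffer stands in for the `read` op. It shows that a handler asking for more bytes than the descriptor advertises gets the remainder filled with zeroes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for one entry being loaded. */
struct entry_reader {
    const uint8_t *data;  /* entry payload */
    size_t length;        /* length from the descriptor */
    size_t consumed;      /* bytes handed out so far */
};

/*
 * Copy up to 'len' bytes of payload into 'dst'. If the entry runs out
 * first, the tail of 'dst' is zero-filled, mirroring the "zero extend if
 * the entry is exhausted" step in the patch.
 */
static void entry_read(struct entry_reader *r, void *dst, size_t len)
{
    size_t avail = r->length - r->consumed;
    size_t copy_len = len < avail ? len : avail;

    memcpy(dst, r->data + r->consumed, copy_len);
    r->consumed += copy_len;

    if ( len > copy_len )
        memset((uint8_t *)dst + copy_len, 0, len - copy_len);
}
```

Reading 8 bytes from a 4-byte entry therefore yields the 4 payload bytes followed by 4 zero bytes, and `consumed` stops at the descriptor length.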
Jan Beulich Oct. 13, 2020, 11:44 a.m. UTC | #5
On 05.10.2020 10:03, Paul Durrant wrote:
>> From: Andrew Cooper <andrew.cooper3@citrix.com>
>> Sent: 02 October 2020 22:20
>>
>> On 24/09/2020 14:10, Paul Durrant wrote:
>>> +int domain_save_end(struct domain_context *c)
>>> +{
>>> +    struct domain *d = c->domain;
>>> +    size_t len = ROUNDUP(c->len, DOMAIN_SAVE_ALIGN) - c->len; /* padding */
>>
>> DOMAIN_SAVE_ALIGN - (c->len & (DOMAIN_SAVE_ALIGN - 1))
>>
>> isn't vulnerable to overflow.
>>
> 
> ...and significantly uglier code. What's actually wrong with what I wrote?

I don't think there's anything "wrong" or "vulnerable" here, but
I still can see Andrew's point. The "vulnerable" aspect applies
only in the (highly hypothetical I think) cases of either
sizeof(size_t) < sizeof(int) or size_t being a signed type, afaict.
But since it's easy (and imo not "significantly uglier") to write
code that is free of any wrapping or overflowing behavior, I
think it is sensible to actually write it that way.

Jan
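
The padding arithmetic debated above is easy to check in isolation. The following is a minimal sketch, assuming ROUNDUP() has the usual add-then-mask definition and that the alignment is a power of two; the function names are hypothetical. It compares the patch's form, the expression exactly as written in the review comment, and a mask-only form that performs no addition.

```c
#include <stddef.h>

#define SAVE_ALIGN 8u /* mirrors DOMAIN_SAVE_ALIGN; must be a power of two */

/* Form used in the patch: round up, then subtract the original length. */
static size_t pad_roundup(size_t len)
{
    size_t rounded = (len + SAVE_ALIGN - 1) & ~(size_t)(SAVE_ALIGN - 1);

    return rounded - len;
}

/* Expression as literally written in the review comment. */
static size_t pad_literal(size_t len)
{
    return SAVE_ALIGN - (len & (SAVE_ALIGN - 1));
}

/* Mask-only form: negate, then keep the low bits. */
static size_t pad_mask(size_t len)
{
    return (0 - len) & (SAVE_ALIGN - 1);
}
```

For an unaligned length such as 13 all three return 3. For an already aligned length such as 16 the literal form returns 8 rather than 0, so it would need an extra mask before it could replace the ROUNDUP() version. With an unsigned size_t the round-up form still produces the correct pad when the addition wraps, which is consistent with the observation that nothing is strictly "vulnerable" here.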

Patch

diff --git a/xen/common/Makefile b/xen/common/Makefile
index b3b60a1ba2..3e6f21714a 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -37,6 +37,7 @@  obj-y += radix-tree.o
 obj-y += rbtree.o
 obj-y += rcupdate.o
 obj-y += rwlock.o
+obj-y += save.o
 obj-y += shutdown.o
 obj-y += softirq.o
 obj-y += sort.o
diff --git a/xen/common/save.c b/xen/common/save.c
new file mode 100644
index 0000000000..841c4d0e4e
--- /dev/null
+++ b/xen/common/save.c
@@ -0,0 +1,315 @@ 
+/*
+ * save.c: Save and restore PV guest state common to all domain types.
+ *
+ * Copyright Amazon.com Inc. or its affiliates.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/compile.h>
+#include <xen/save.h>
+
+struct domain_context {
+    struct domain *domain;
+    const char *name; /* for logging purposes */
+    struct domain_save_descriptor desc;
+    size_t len; /* for internal accounting */
+    union {
+        const struct domain_save_ops *save;
+        const struct domain_load_ops *load;
+    } ops;
+    void *priv;
+};
+
+static struct {
+    const char *name;
+    domain_save_handler save;
+    domain_load_handler load;
+} handlers[DOMAIN_SAVE_CODE_MAX + 1];
+
+void __init domain_register_save_type(unsigned int typecode,
+                                      const char *name,
+                                      domain_save_handler save,
+                                      domain_load_handler load)
+{
+    BUG_ON(typecode >= ARRAY_SIZE(handlers));
+
+    ASSERT(!handlers[typecode].save);
+    ASSERT(!handlers[typecode].load);
+
+    handlers[typecode].name = name;
+    handlers[typecode].save = save;
+    handlers[typecode].load = load;
+}
+
+int domain_save_begin(struct domain_context *c, unsigned int typecode,
+                      unsigned int instance)
+{
+    int rc;
+
+    if ( typecode != c->desc.typecode )
+    {
+        ASSERT_UNREACHABLE();
+        return -EINVAL;
+    }
+    ASSERT(!c->desc.length); /* Should always be zero during domain_save() */
+    ASSERT(!c->len); /* Verify domain_save_end() was called */
+
+    rc = c->ops.save->begin(c->priv, &c->desc);
+    if ( rc )
+        return rc;
+
+    return 0;
+}
+
+int domain_save_data(struct domain_context *c, const void *src, size_t len)
+{
+    int rc = c->ops.save->append(c->priv, src, len);
+
+    if ( !rc )
+        c->len += len;
+
+    return rc;
+}
+
+#define DOMAIN_SAVE_ALIGN 8
+
+int domain_save_end(struct domain_context *c)
+{
+    struct domain *d = c->domain;
+    size_t len = ROUNDUP(c->len, DOMAIN_SAVE_ALIGN) - c->len; /* padding */
+    int rc;
+
+    if ( len )
+    {
+        static const uint8_t pad[DOMAIN_SAVE_ALIGN] = {};
+
+        rc = domain_save_data(c, pad, len);
+
+        if ( rc )
+            return rc;
+    }
+    ASSERT(IS_ALIGNED(c->len, DOMAIN_SAVE_ALIGN));
+
+    if ( c->name )
+        gdprintk(XENLOG_INFO, "%pd save: %s[%u] +%zu (-%zu)\n", d, c->name,
+                 c->desc.instance, c->len, len);
+
+    rc = c->ops.save->end(c->priv, c->len);
+    c->len = 0;
+
+    return rc;
+}
+
+int domain_save(struct domain *d, const struct domain_save_ops *ops,
+                void *priv, bool dry_run)
+{
+    struct domain_context c = {
+        .domain = d,
+        .ops.save = ops,
+        .priv = priv,
+    };
+    static const struct domain_save_header h = {
+        .magic = DOMAIN_SAVE_MAGIC,
+        .xen_major = XEN_VERSION,
+        .xen_minor = XEN_SUBVERSION,
+        .version = DOMAIN_SAVE_VERSION,
+    };
+    const struct domain_save_end e = {};
+    unsigned int i;
+    int rc;
+
+    ASSERT(d != current->domain);
+    domain_pause(d);
+
+    c.name = !dry_run ? "HEADER" : NULL;
+    c.desc.typecode = DOMAIN_SAVE_CODE(HEADER);
+
+    rc = DOMAIN_SAVE_ENTRY(HEADER, &c, 0, &h, sizeof(h));
+    if ( rc )
+        goto out;
+
+    for ( i = 0; i < ARRAY_SIZE(handlers); i++ )
+    {
+        domain_save_handler save = handlers[i].save;
+
+        if ( !save )
+            continue;
+
+        c.name = !dry_run ? handlers[i].name : NULL;
+        memset(&c.desc, 0, sizeof(c.desc));
+        c.desc.typecode = i;
+
+        rc = save(d, &c, dry_run);
+        if ( rc )
+            goto out;
+    }
+
+    c.name = !dry_run ? "END" : NULL;
+    memset(&c.desc, 0, sizeof(c.desc));
+    c.desc.typecode = DOMAIN_SAVE_CODE(END);
+
+    rc = DOMAIN_SAVE_ENTRY(END, &c, 0, &e, sizeof(e));
+
+ out:
+    domain_unpause(d);
+
+    return rc;
+}
+
+int domain_load_begin(struct domain_context *c, unsigned int typecode,
+                      unsigned int *instance)
+{
+    if ( typecode != c->desc.typecode )
+    {
+        ASSERT_UNREACHABLE();
+        return -EINVAL;
+    }
+
+    ASSERT(!c->len); /* Verify domain_load_end() was called */
+
+    *instance = c->desc.instance;
+
+    return 0;
+}
+
+int domain_load_data(struct domain_context *c, void *dst, size_t len)
+{
+    size_t copy_len = min_t(size_t, len, c->desc.length - c->len);
+    int rc;
+
+    c->len += copy_len;
+    ASSERT(c->len <= c->desc.length);
+
+    rc = copy_len ? c->ops.load->read(c->priv, dst, copy_len) : 0;
+    if ( rc )
+        return rc;
+
+    /* Zero extend if the entry is exhausted */
+    len -= copy_len;
+    if ( len )
+    {
+        dst += copy_len;
+        memset(dst, 0, len);
+    }
+
+    return 0;
+}
+
+int domain_load_end(struct domain_context *c, bool ignore_data)
+{
+    struct domain *d = c->domain;
+    size_t len = c->desc.length - c->len;
+
+    while ( c->len != c->desc.length ) /* unconsumed data or pad */
+    {
+        uint8_t pad;
+        int rc = domain_load_data(c, &pad, sizeof(pad));
+
+        if ( rc )
+            return rc;
+
+        if ( !ignore_data && pad )
+            return -EINVAL;
+    }
+
+    ASSERT(c->name);
+    gdprintk(XENLOG_INFO, "%pd load: %s[%u] +%zu (-%zu)\n", d, c->name,
+             c->desc.instance, c->len, len);
+
+    c->len = 0;
+
+    return 0;
+}
+
+int domain_load(struct domain *d, const struct domain_load_ops *ops,
+                void *priv)
+{
+    struct domain_context c = {
+        .domain = d,
+        .ops.load = ops,
+        .priv = priv,
+    };
+    unsigned int instance;
+    struct domain_save_header h;
+    int rc;
+
+    ASSERT(d != current->domain);
+
+    rc = c.ops.load->read(c.priv, &c.desc, sizeof(c.desc));
+    if ( rc )
+        return rc;
+
+    c.name = "HEADER";
+
+    rc = DOMAIN_LOAD_ENTRY(HEADER, &c, &instance, &h, sizeof(h));
+    if ( rc )
+        return rc;
+
+    if ( instance || h.magic != DOMAIN_SAVE_MAGIC ||
+         h.version != DOMAIN_SAVE_VERSION )
+        return -EINVAL;
+
+    domain_pause(d);
+
+    for (;;)
+    {
+        unsigned int i;
+        domain_load_handler load;
+
+        rc = c.ops.load->read(c.priv, &c.desc, sizeof(c.desc));
+        if ( rc )
+            break; /* leave via domain_unpause() below */
+
+        rc = -EINVAL;
+
+        if ( c.desc.typecode == DOMAIN_SAVE_CODE(END) )
+        {
+            struct domain_save_end e;
+
+            c.name = "END";
+
+            rc = DOMAIN_LOAD_ENTRY(END, &c, &instance, &e, sizeof(e));
+
+            if ( instance )
+                rc = -EINVAL; /* fall through to domain_unpause() */
+
+            break;
+        }
+
+        i = c.desc.typecode;
+        if ( i >= ARRAY_SIZE(handlers) )
+            break;
+
+        c.name = handlers[i].name;
+        load = handlers[i].load;
+
+        rc = load ? load(d, &c) : -EOPNOTSUPP;
+        if ( rc )
+            break;
+    }
+
+    domain_unpause(d);
+
+    return rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h
index 75b8e65bcb..d5b0c15203 100644
--- a/xen/include/public/arch-arm/hvm/save.h
+++ b/xen/include/public/arch-arm/hvm/save.h
@@ -26,6 +26,11 @@ 
 #ifndef __XEN_PUBLIC_HVM_SAVE_ARM_H__
 #define __XEN_PUBLIC_HVM_SAVE_ARM_H__
 
+/*
+ * Further use of HVM state is deprecated. New state records should only
+ * be added to the domain state header: public/save.h
+ */
+
 #endif
 
 /*
diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
index 773a380bc2..e61e2dbcd7 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -648,6 +648,11 @@  struct hvm_msr {
  */
 #define HVM_SAVE_CODE_MAX 20
 
+/*
+ * Further use of HVM state is deprecated. New state records should only
+ * be added to the domain state header: public/save.h
+ */
+
 #endif /* __XEN_PUBLIC_HVM_SAVE_X86_H__ */
 
 /*
diff --git a/xen/include/public/save.h b/xen/include/public/save.h
new file mode 100644
index 0000000000..551dbbddb8
--- /dev/null
+++ b/xen/include/public/save.h
@@ -0,0 +1,89 @@ 
+/*
+ * save.h
+ *
+ * Structure definitions for common PV/HVM domain state that is held by
+ * Xen and must be saved along with the domain's memory.
+ *
+ * Copyright Amazon.com Inc. or its affiliates.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef XEN_PUBLIC_SAVE_H
+#define XEN_PUBLIC_SAVE_H
+
+#if defined(__XEN__) || defined(__XEN_TOOLS__)
+
+#include "xen.h"
+
+/* Entry data is preceded by a descriptor */
+struct domain_save_descriptor {
+    uint16_t typecode;
+
+    /*
+     * Instance number of the entry (since there may be multiple of some
+     * types of entries).
+     */
+    uint16_t instance;
+
+    /* Entry length not including this descriptor */
+    uint32_t length;
+};
+
+/*
+ * Each entry has a type associated with it. DECLARE_DOMAIN_SAVE_TYPE
+ * binds these things together, although it is not intended that the
+ * resulting type is ever instantiated.
+ */
+#define DECLARE_DOMAIN_SAVE_TYPE(_x, _code, _type) \
+    struct DOMAIN_SAVE_TYPE_##_x { char c[_code]; _type t; };
+
+#define DOMAIN_SAVE_CODE(_x) \
+    (sizeof(((struct DOMAIN_SAVE_TYPE_##_x *)0)->c))
+#define DOMAIN_SAVE_TYPE(_x) \
+    typeof(((struct DOMAIN_SAVE_TYPE_##_x *)0)->t)
+
+/*
+ * All entries will be zero-padded to the next 64-bit boundary when saved,
+ * so there is no need to include trailing pad fields in structure
+ * definitions.
+ * When loading, entries will be zero-extended if the load handler reads
+ * beyond the length specified in the descriptor.
+ */
+
+/* Terminating entry */
+struct domain_save_end {};
+DECLARE_DOMAIN_SAVE_TYPE(END, 0, struct domain_save_end);
+
+#define DOMAIN_SAVE_MAGIC   0x53415645
+#define DOMAIN_SAVE_VERSION 0x00000001
+
+/* Initial entry */
+struct domain_save_header {
+    uint32_t magic;                /* Must be DOMAIN_SAVE_MAGIC */
+    uint16_t xen_major, xen_minor; /* Xen version */
+    uint32_t version;              /* Save format version */
+};
+DECLARE_DOMAIN_SAVE_TYPE(HEADER, 1, struct domain_save_header);
+
+#define DOMAIN_SAVE_CODE_MAX 1
+
+#endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
+
+#endif /* XEN_PUBLIC_SAVE_H */
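A minimal userspace sketch of the record layout that public/save.h describes, for illustration only. The descriptor, the header structure and the DECLARE_DOMAIN_SAVE_TYPE/DOMAIN_SAVE_CODE macros are copied from the hunk above (the macro's trailing semicolon is left to the caller here). padded_len() and entry_size() are hypothetical helpers that are not part of the patch; they only show the arithmetic implied by the "zero-padded to the next 64-bit boundary" comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copied from the patch so that the sketch compiles on its own. */
struct domain_save_descriptor {
    uint16_t typecode;
    uint16_t instance;
    uint32_t length;   /* entry length, not including this descriptor */
};

/* The type code is encoded as the size of the 'c' array. */
#define DECLARE_DOMAIN_SAVE_TYPE(_x, _code, _type) \
    struct DOMAIN_SAVE_TYPE_##_x { char c[_code]; _type t; }

#define DOMAIN_SAVE_CODE(_x) \
    (sizeof(((struct DOMAIN_SAVE_TYPE_##_x *)0)->c))

struct domain_save_header {
    uint32_t magic;
    uint16_t xen_major, xen_minor;
    uint32_t version;
};
DECLARE_DOMAIN_SAVE_TYPE(HEADER, 1, struct domain_save_header);

/* The descriptor is 8 bytes, so it never disturbs 64-bit alignment. */
static_assert(sizeof(struct domain_save_descriptor) == 8,
              "descriptor is expected to be 8 bytes");
static_assert(DOMAIN_SAVE_CODE(HEADER) == 1, "HEADER has type code 1");

/* Hypothetical helper: round a payload length up to a multiple of 8. */
static size_t padded_len(size_t len)
{
    return (len + 7) & ~(size_t)7;
}

/* Hypothetical helper: bytes one entry occupies in the saved stream. */
static size_t entry_size(size_t len)
{
    return sizeof(struct domain_save_descriptor) + padded_len(len);
}
```

Under these assumptions a HEADER entry carries a 12 byte payload, is padded to 16 bytes and occupies 24 bytes including its descriptor, while descriptor.length still records 12.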
diff --git a/xen/include/xen/save.h b/xen/include/xen/save.h
new file mode 100644
index 0000000000..e631a2e85e
--- /dev/null
+++ b/xen/include/xen/save.h
@@ -0,0 +1,170 @@ 
+/*
+ * save.h: support routines for save/restore
+ *
+ * Copyright Amazon.com Inc. or its affiliates.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef XEN_SAVE_H
+#define XEN_SAVE_H
+
+#include <xen/init.h>
+#include <xen/sched.h>
+#include <xen/types.h>
+
+#include <public/save.h>
+
+struct domain_context;
+
+int domain_save_begin(struct domain_context *c, unsigned int typecode,
+                      unsigned int instance);
+
+#define DOMAIN_SAVE_BEGIN(x, c, i) \
+    domain_save_begin((c), DOMAIN_SAVE_CODE(x), (i))
+
+int domain_save_data(struct domain_context *c, const void *data, size_t len);
+int domain_save_end(struct domain_context *c);
+
+static inline int domain_save_entry(struct domain_context *c,
+                                    unsigned int typecode,
+                                    unsigned int instance, const void *src,
+                                    size_t len)
+{
+    int rc;
+
+    rc = domain_save_begin(c, typecode, instance);
+    if ( rc )
+        return rc;
+
+    rc = domain_save_data(c, src, len);
+    if ( rc )
+        return rc;
+
+    return domain_save_end(c);
+}
+
+#define DOMAIN_SAVE_ENTRY(x, c, i, s, l) \
+    domain_save_entry((c), DOMAIN_SAVE_CODE(x), (i), (s), (l))
+
+int domain_load_begin(struct domain_context *c, unsigned int typecode,
+                      unsigned int *instance);
+
+#define DOMAIN_LOAD_BEGIN(x, c, i) \
+    domain_load_begin((c), DOMAIN_SAVE_CODE(x), (i))
+
+int domain_load_data(struct domain_context *c, void *data, size_t len);
+int domain_load_end(struct domain_context *c, bool ignore_data);
+
+static inline int domain_load_entry(struct domain_context *c,
+                                    unsigned int typecode,
+                                    unsigned int *instance, void *dst,
+                                    size_t len)
+{
+    int rc;
+
+    rc = domain_load_begin(c, typecode, instance);
+    if ( rc )
+        return rc;
+
+    rc = domain_load_data(c, dst, len);
+    if ( rc )
+        return rc;
+
+    return domain_load_end(c, false);
+}
+
+#define DOMAIN_LOAD_ENTRY(x, c, i, d, l) \
+    domain_load_entry((c), DOMAIN_SAVE_CODE(x), (i), (d), (l))
+
+/*
+ * The 'dry_run' flag indicates that the caller of domain_save() (see below)
+ * is not trying to actually acquire the data, only the size of the data.
+ * The save handler can therefore limit work to only that which is necessary
+ * to call domain_save_data() the correct number of times with accurate values
+ * for 'len'.
+ */
+typedef int (*domain_save_handler)(const struct domain *d,
+                                   struct domain_context *c,
+                                   bool dry_run);
+typedef int (*domain_load_handler)(struct domain *d,
+                                   struct domain_context *c);
+
+void domain_register_save_type(unsigned int typecode, const char *name,
+                               domain_save_handler save,
+                               domain_load_handler load);
+
+/*
+ * Register save and load handlers.
+ *
+ * Save handlers will be invoked in an order which copes with any inter-
+ * entry dependencies. For now this means that HEADER will come first and
+ * END will come last, all others being invoked in order of 'typecode'.
+ *
+ * Load handlers will be invoked in the order of entries present in the
+ * buffer.
+ */
+#define DOMAIN_REGISTER_SAVE_LOAD(x, s, l)                    \
+    static int __init __domain_register_##x##_save_load(void) \
+    {                                                         \
+        domain_register_save_type(                            \
+            DOMAIN_SAVE_CODE(x),                              \
+            #x,                                               \
+            &(s),                                             \
+            &(l));                                            \
+                                                              \
+        return 0;                                             \
+    }                                                         \
+    __initcall(__domain_register_##x##_save_load);
+
+/* Callback functions */
+struct domain_save_ops {
+    /*
+     * Begin a new entry with the given descriptor (only typecode and
+     * instance are valid).
+     */
+    int (*begin)(void *priv, const struct domain_save_descriptor *desc);
+    /* Append data/padding to the buffer */
+    int (*append)(void *priv, const void *data, size_t len);
+    /*
+     * Complete the entry by updating the descriptor with the total
+     * length of the appended data (not including padding).
+     */
+    int (*end)(void *priv, size_t len);
+};
+
+struct domain_load_ops {
+    /* Read data/padding from the buffer */
+    int (*read)(void *priv, void *data, size_t len);
+};
+
+/*
+ * Entry points:
+ *
+ * ops:     These are callback functions provided by the caller that will
+ *          be used to write to (in the save case) or read from (in the
+ *          load case) the context buffer. See above for more detail.
+ * priv:    This is a pointer that will be passed to the ops callbacks to
+ *          allow them to identify the context buffer and the current state
+ *          of the save or load operation.
+ * dry_run: If this is set then the caller of domain_save() is only trying
+ *          to acquire the total size of the data, not the data itself.
+ *          In this case the caller may supply different ops to avoid doing
+ *          unnecessary work.
+ */
+int domain_save(struct domain *d, const struct domain_save_ops *ops,
+                void *priv, bool dry_run);
+int domain_load(struct domain *d, const struct domain_load_ops *ops,
+                void *priv);
+
+#endif /* XEN_SAVE_H */
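A rough sketch of how a caller might implement the save-side callbacks, assuming a flat in-memory buffer. struct domain_save_ops and the descriptor are copied from the patch; struct buf_ctx, the buf_*() functions and emulate_entry() are hypothetical names invented for this illustration. emulate_entry() is only a userspace stand-in for the call sequence the framework is described as making (begin, append data, append padding, end); it is not the implementation in this series. Leaving buf NULL models the dry_run case, where only the total size is wanted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copied from the patch so that the sketch compiles on its own. */
struct domain_save_descriptor {
    uint16_t typecode;
    uint16_t instance;
    uint32_t length;
};

struct domain_save_ops {
    int (*begin)(void *priv, const struct domain_save_descriptor *desc);
    int (*append)(void *priv, const void *data, size_t len);
    int (*end)(void *priv, size_t len);
};

/* Hypothetical caller state: a flat buffer plus a write cursor. */
struct buf_ctx {
    uint8_t *buf;     /* NULL for a dry run: only sizes are tracked */
    size_t size;      /* capacity of buf (ignored when buf is NULL) */
    size_t cur;       /* offset of the next byte to write */
    size_t desc_off;  /* offset of the descriptor of the open entry */
};

static int buf_append(void *priv, const void *data, size_t len)
{
    struct buf_ctx *b = priv;

    if ( b->buf )
    {
        if ( len > b->size - b->cur )
            return -1; /* out of space */
        memcpy(b->buf + b->cur, data, len);
    }
    b->cur += len;
    return 0;
}

static int buf_begin(void *priv, const struct domain_save_descriptor *desc)
{
    struct buf_ctx *b = priv;

    b->desc_off = b->cur; /* remember where to patch the length later */
    return buf_append(priv, desc, sizeof(*desc));
}

static int buf_end(void *priv, size_t len)
{
    struct buf_ctx *b = priv;
    struct domain_save_descriptor desc;

    assert(b->desc_off + sizeof(desc) <= b->cur);
    if ( !b->buf )
        return 0; /* dry run: nothing to patch */
    memcpy(&desc, b->buf + b->desc_off, sizeof(desc));
    desc.length = (uint32_t)len; /* payload length, excluding padding */
    memcpy(b->buf + b->desc_off, &desc, sizeof(desc));
    return 0;
}

static const struct domain_save_ops buf_save_ops = {
    .begin = buf_begin, .append = buf_append, .end = buf_end,
};

/* Stand-in for the framework: emit one entry, zero-padded to 8 bytes. */
static int emulate_entry(const struct domain_save_ops *ops, void *priv,
                         uint16_t typecode, uint16_t instance,
                         const void *data, size_t len)
{
    static const uint8_t zeroes[8];
    struct domain_save_descriptor desc = {
        .typecode = typecode, .instance = instance,
    };
    size_t pad = (8 - (len & 7)) & 7;
    int rc = ops->begin(priv, &desc);

    if ( !rc )
        rc = ops->append(priv, data, len);
    if ( !rc && pad )
        rc = ops->append(priv, zeroes, pad);
    return rc ? rc : ops->end(priv, len);
}
```

A caller following this pattern could run once with buf set to NULL to learn the required size, allocate that much, and then run again with a real buffer. The patch also allows different ops to be supplied for the dry run; sharing one set of callbacks here just keeps the sketch short.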