
[for-4.17,v3,15/15] tools/ocaml/libs/xc: fix use of uninitialized memory in shadow_allocation_get

Message ID 94f93ee61a4d0bd2fac3f5a753cb935962be20bb.1667920496.git.edvin.torok@citrix.com (mailing list archive)
State New, archived
Series OCaml fixes for Xen 4.17

Commit Message

Edwin Török Nov. 8, 2022, 3:34 p.m. UTC
It was noticed in 2013 that shadow allocation sometimes returns the
wrong value, which got worked around by adding a limit to the shadow
multiplier of 1000 and ignoring the value from Xen in that case
to avoid a shadow multiplier causing a VM to request 6PB of memory for
example:
https://github.com/xapi-project/xen-api/pull/1215/commits/be55a8c30b41d1cd7596fc100ab1cfd3539f74eb

However that is just a workaround, and I've just reproduced this by
killing a VM mid migration, which resulted in a shadow multiplier of
629.42, rendering the VM unbootable even after a host reboot.

The real bug is in Xen: when a VM is dying it will return '0' for paging
op domctls and log a message at info level
'Ignoring paging op on dying domain', which leaves the 'mb' parameter
uninitialized upon return from the domctl.

The binding also doesn't initialize the 'c->mb' parameter (it is meant
to be used only when setting, not when querying the allocation),
which results in the VM getting a shadow allocation (and thus multiplier)
set based on whatever value happened to be on the stack at the time.

Explicitly initialize the value passed to the domctl to 0, detect the case
where Xen wrote nothing (shadow allocation still 0), and raise an exception.
The exception will cause xenopsd to skip setting the shadow multiplier.
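
The failure mode and the fix can be reproduced outside Xen with a small C
sketch. Everything in it is hypothetical: mock_shadow_get stands in for the
domctl path and is not a Xen or libxc function, the value 8 is arbitrary, and
the 'stale' argument simulates leftover stack contents (reading a genuinely
uninitialized variable would be undefined behaviour, so the sketch seeds a
known value instead).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the paging-op domctl; not a real Xen/libxc call.
 * For a dying domain it reports success (0) without writing *mb, which is
 * the x86 behaviour described in this commit message. */
static int mock_shadow_get(int domain_is_dying, unsigned int *mb)
{
    assert(mb != NULL);
    if (domain_is_dying)
        return 0;   /* "Ignoring paging op on dying domain" */
    *mb = 8;        /* arbitrary allocation in MB for a live domain */
    return 0;
}

/* Pre-patch pattern: the out-parameter keeps whatever value it had before
 * the call. 'stale' simulates leftover stack contents. */
static unsigned int get_unchecked(int domain_is_dying, unsigned int stale)
{
    unsigned int c_mb = stale;

    (void)mock_shadow_get(domain_is_dying, &c_mb);
    return c_mb;
}

/* Patched pattern: start from 0 and treat 0 after a successful call as
 * "nothing was written". Returns 0 on success, -1 on failure. */
static int get_checked(int domain_is_dying, unsigned int *out)
{
    unsigned int c_mb = 0;
    int ret = mock_shadow_get(domain_is_dying, &c_mb);

    if (ret != 0 || c_mb == 0)
        return -1;
    *out = c_mb;
    return 0;
}
```

With a dying domain, get_unchecked(1, 12345) hands back the seeded 12345
unchanged, while get_checked(1, &mb) fails instead of reporting a made-up
allocation.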

Note that the behaviour of Xen here is inconsistent between x86 and ARM:
ARM would return EINVAL when it gets a paging op on a dying domain,
and x86-64 would return 0 with possibly uninitialized data.
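
As a rough illustration of that difference, the sketch below models both
behaviours and a caller that ends up with the same result on either. The
enum, mock_paging_op and query_allocation are hypothetical names invented
for this example, not hypervisor or libxc code, and mapping the "nothing
written" case to -EINVAL is a choice made only for this sketch (the patch
itself raises an OCaml exception).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical architecture selector, for illustration only. */
enum mock_arch { MOCK_ARM, MOCK_X86 };

/* Hypothetical model of the two behaviours described above. */
static int mock_paging_op(enum mock_arch arch, int dying, unsigned int *mb)
{
    assert(mb != NULL);
    if (dying)
        return arch == MOCK_ARM ? -EINVAL : 0; /* x86: "success", *mb untouched */
    *mb = 8;                                   /* arbitrary allocation in MB */
    return 0;
}

/* Defensive caller: an explicit error is passed through, and a zero
 * allocation after a reported success is also turned into an error. */
static int query_allocation(enum mock_arch arch, int dying, unsigned int *out)
{
    unsigned int mb = 0;
    int ret = mock_paging_op(arch, dying, &mb);

    if (ret != 0)
        return ret;
    if (mb == 0)
        return -EINVAL;
    *out = mb;
    return 0;
}
```

Under these assumptions a dying domain yields -EINVAL from query_allocation
on both models, so the caller no longer depends on which architecture's
behaviour it is talking to.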

It might be desirable to change the x86 path in the hypervisor to return
EINVAL, although that would require more testing in case it breaks
something.
But the bindings should be defensive anyway against bugs like this.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
---
Reason for inclusion in 4.17:
- fixes a long-standing (>9y old) bug that is still happening today

Changes since v2:
- new in v3
---
 tools/ocaml/libs/xc/xenctrl_stubs.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Edwin Török Nov. 9, 2022, 1:52 p.m. UTC | #1
On 8 Nov 2022, at 15:34, Edwin Török <edvin.torok@citrix.com> wrote:
> 
> It was noticed in 2013 that shadow allocation sometimes returns the
> wrong value, which got worked around by adding a limit to the shadow
> multiplier of 1000 and ignoring the value from Xen in that case
> to avoid a shadow multiplier causing a VM to request 6PB of memory for
> example:
> https://github.com/xapi-project/xen-api/pull/1215/commits/be55a8c30b41d1cd7596fc100ab1cfd3539f74eb
> 
> However that is just a workaround, and I've just reproduced this by
> killing a VM mid migration, which resulted in a shadow multiplier of
> 629.42, rendering the VM unbootable even after a host reboot.


After some more discussion it looks like this is getting fixed in the hypervisor, so this workaround wouldn't be needed.
It might be best to hold off on this patch until the domctl discussion is settled at:
https://lore.kernel.org/xen-devel/20221108113850.61619-1-roger.pau@citrix.com/


Best regards,
--Edwin


Patch

diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
index e2d897581f..9681a74e40 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -1019,7 +1019,7 @@  CAMLprim value stub_shadow_allocation_get(value xch, value domid)
 {
     CAMLparam2(xch, domid);
     CAMLlocal1(mb);
-    unsigned int c_mb;
+    unsigned int c_mb = 0;
     int ret;
 
     caml_enter_blocking_section();
@@ -1029,6 +1029,9 @@  CAMLprim value stub_shadow_allocation_get(value xch, value domid)
     caml_leave_blocking_section();
     if (ret != 0)
         failwith_xc(_H(xch));
+    if ( !c_mb )
+        caml_failwith("domctl returned uninitialized data for shadow "
+                      "allocation, dying domain?");
 
     mb = Val_int(c_mb);
     CAMLreturn(mb);