[v2,13/13] SUPPORT.md: write down restriction of 32-bit tool stacks

Message ID ddff8b28-274d-d7fe-4ba9-0772859b7a72@suse.com (mailing list archive)
State New
Series x86: more or less log-dirty related improvements

Commit Message

Jan Beulich July 5, 2021, 3:18 p.m. UTC
Let's try to avoid giving the impression that 32-bit tool stacks are as
capable as 64-bit ones.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Wording adjustments as per review discussion.

Comments

Julien Grall July 14, 2021, 6:16 p.m. UTC | #1
Hi Jan,

On 05/07/2021 16:18, Jan Beulich wrote:
> Let's try to avoid giving the impression that 32-bit tool stacks are as
> capable as 64-bit ones.

Would you be able to provide a few examples of the known issues in the 
commit message? This would be helpful for anyone to understand why we 
decided to drop the support.

At least on Arm, we tried to design the hypercall ABI in such a way that 
it should be possible to use a 32-bit toolstack.

That said, I am not aware of anyone using the 32-bit ABI on a 64-bit Arm 
hypervisor. So dropping the support should be fine.

Cheers,

> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2: Wording adjustments as per review discussion.
> 
> --- a/SUPPORT.md
> +++ b/SUPPORT.md
> @@ -131,6 +131,12 @@ ARM only has one guest type at the momen
>   
>   ## Toolstack
>   
> +While 32-bit builds of the tool stack are generally supported, restrictions
> +apply in particular when running on top of a 64-bit hypervisor.  For example,
> +very large guests aren't expected to be manageable in this case.  This includes
> +guests giving the appearance of being large, by altering their own memory
> +layouts.
> +
>   ### xl
>   
>       Status: Supported
>
Jan Beulich July 15, 2021, 6:38 a.m. UTC | #2
On 14.07.2021 20:16, Julien Grall wrote:
> On 05/07/2021 16:18, Jan Beulich wrote:
>> Let's try to avoid giving the impression that 32-bit tool stacks are as
>> capable as 64-bit ones.
> 
> Would you be able to provide a few examples of the known issues in the 
> commit message? This would be helpful for anyone to understand why we 
> decided to drop the support.

Not sure how useful this is going to be. This would be pointing at the
declarations / definitions of various tool stack internal variables or
structure fields. Which also is why ...

> At least on Arm, we tried to design the hypercall ABI in such a way that 
> it should be possible to use a 32-bit toolstack.

... keeping the ABI tidy in this regard didn't help at all (albeit it
of course was a prereq to writing a tool stack that would be capable).

Jan
Julien Grall July 15, 2021, 9:05 a.m. UTC | #3
Hi Jan,

On 15/07/2021 07:38, Jan Beulich wrote:
> On 14.07.2021 20:16, Julien Grall wrote:
>> On 05/07/2021 16:18, Jan Beulich wrote:
>>> Let's try to avoid giving the impression that 32-bit tool stacks are as
>>> capable as 64-bit ones.
>>
>> Would you be able to provide a few examples of the known issues in the
>> commit message? This would be helpful for anyone to understand why we
>> decided to drop the support.
> 
> Not sure how useful this is going to be.

It would at least be useful to me, so I can make an informed decision. I 
suspect it would also be for anyone reading it in the future. It is 
rather frustrating to find a commit message with barely any rationale and 
no-one remembering why this was done...

I vaguely recall a discussion about 64-bit hypercall ([1]). I assume the 
decision to drop support is related to it, but I have no way to prove it 
from the commit message.

It is also not clear why adding the restriction is the way to go...

> This would be pointing at the
> declarations / definitions of various tool stack internal variables or
> structure fields. Which also is why ...

... is this because such issues are too widespread in libxc/libxl to fix 
in the long term?

> 
>> At least on Arm, we tried to design the hypercall ABI in such a way that
>> it should be possible to use a 32-bit toolstack.
> 
> ... keeping the ABI tidy in this regard didn't help at all (albeit it
> of course was a prereq to writing a tool stack that would be capable).
> 
> Jan
> 

[1] 
https://lore.kernel.org/xen-devel/71b8a4f1-9c18-36e7-56b1-3f1b1dabddd6@suse.com/
Jan Beulich July 15, 2021, 11:36 a.m. UTC | #4
On 15.07.2021 11:05, Julien Grall wrote:
> On 15/07/2021 07:38, Jan Beulich wrote:
>> On 14.07.2021 20:16, Julien Grall wrote:
>>> On 05/07/2021 16:18, Jan Beulich wrote:
>>>> Let's try to avoid giving the impression that 32-bit tool stacks are as
>>>> capable as 64-bit ones.
>>>
>>> Would you be able to provide a few examples of the known issues in the
>>> commit message? This would be helpful for anyone to understand why we
>>> decided to drop the support.
>>
>> Not sure how useful this is going to be.
> 
> It would at least be useful to me, so I can make an informed decision. I 
> suspect it would also be for anyone reading it in the future. It is 
> rather frustrating to find a commit message with barely any rationale and 
> no-one remembering why this was done...

Well, I've added "There are a number of cases there where 32-bit
types are used to hold e.g. frame numbers." Not sure whether you
consider this sufficient.

Problematic code may be primarily in areas Arm doesn't
care about (yet), like PCI pass-through or migration. But see e.g.
- xc_map_foreign_range()'s "mfn" and "size" parameters,
- xc_maximum_ram_page()'s "max_mfn" parameter,
- libxl_dom.c:hvm_build_set_params()'s "store_mfn" and "console_mfn"
  parameters,
- xs_introduce_domain()'s "mfn" parameter,
and quite a few more in particular in libxenguest.
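
To illustrate the kind of truncation this list is about, here is a minimal
sketch. It is not code from libxc or libxl: ulong32_t, frame_via_32bit_param(),
frame_fits_32bit() and frame_sketch_check() are hypothetical names, uint32_t
stands in for "unsigned long" as a 32-bit build would see it, and 4 KiB pages
are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, so a frame number is an address shifted by 12. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical stand-in for "unsigned long" in a 32-bit tool stack build. */
typedef uint32_t ulong32_t;

/*
 * Hypothetical helper: the value a callee ends up with after a frame
 * number has been passed through a 32-bit wide parameter.
 */
static uint64_t frame_via_32bit_param(uint64_t frame)
{
    ulong32_t narrowed = (ulong32_t)frame; /* upper 32 bits are dropped */

    return narrowed;
}

/* Returns 1 if the frame number survives the 32-bit parameter unchanged. */
static int frame_fits_32bit(uint64_t frame)
{
    return frame_via_32bit_param(frame) == frame;
}

/* Self-check: a frame at 1 TiB fits, a frame at 32 TiB does not. */
static void frame_sketch_check(void)
{
    uint64_t below = (UINT64_C(1) << 40) >> SKETCH_PAGE_SHIFT; /* 1 TiB */
    uint64_t above = (UINT64_C(1) << 45) >> SKETCH_PAGE_SHIFT; /* 32 TiB */

    assert(frame_fits_32bit(below));
    assert(!frame_fits_32bit(above));
}
```

Under these assumptions any frame at or above 16 TiB (frame number 2^32) is
silently aliased to a lower frame, which is why a guest merely appearing to
be large is already enough to hit the problem.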

And then there are also subtle oddities like xc_set_mem_access_multi()
having

    xen_mem_access_op_t mao =
    {
        .op       = XENMEM_access_op_set_access_multi,
        .domid    = domain_id,
        .access   = XENMEM_access_default + 1, /* Invalid value */
        .pfn      = ~0UL, /* Invalid GFN */
        .nr       = nr,
    };

Clearly ~0UL won't have the intended effect even for 32-bit guests,
when the field is uint64_t (we get away here because for whatever
reason the hypervisor doesn't check that the field indeed is ~0UL).
But I wouldn't be surprised to find uses where there would be a
difference. One of the main aspects certainly is ...

> I vaguely recall a discussion about 64-bit hypercall ([1]). I assume the 
> decision to drop support is related to it, but I have no way to prove it 
> from the commit message.

... this. Some XENMEM_* may return 64-bit values, yet the hypercall
interface is limited to "long" return types. Not even the multicall
approach taken to work around the restriction to "int" would help
here for x86-32, as struct multicall_entry also uses xen_ulong_t
for its "result" field.
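
A small sketch of why the return path is a problem, under stated
assumptions: ret32_t is a hypothetical stand-in for a 32-bit wide result
slot (a "long" return value or a xen_ulong_t field as an x86-32 caller sees
it), and the function names are illustrative only, not a real interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit wide result slot on x86-32. */
typedef uint32_t ret32_t;

/* What is left of a 64-bit value once it has gone through that slot. */
static ret32_t result_as_seen_by_32bit_caller(uint64_t hypervisor_value)
{
    return (ret32_t)hypervisor_value; /* upper 32 bits are lost */
}

/* Returns 1 if the caller receives exactly what the hypervisor computed. */
static int result_is_lossless(uint64_t hypervisor_value)
{
    return result_as_seen_by_32bit_caller(hypervisor_value) == hypervisor_value;
}

/* Self-check: small values pass through, values above 32 bits do not. */
static void result_sketch_check(void)
{
    assert(result_is_lossless(UINT64_C(0x000FFFFF)));
    assert(!result_is_lossless(UINT64_C(0x123456789)));
    assert(result_as_seen_by_32bit_caller(UINT64_C(0x123456789)) ==
           UINT32_C(0x23456789));
}
```

The caller has no way to tell the truncated value from a genuine small one,
so the loss cannot be detected after the fact.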

> It is also not clear why adding the restriction is the way to go...
> 
>> This would be pointing at the
>> declarations / definitions of various tool stack internal variables or
>> structure fields. Which also is why ...
> 
> ... is this because such issues are too widespread in libxc/libxl to fix 
> in the long term?

Fixing is an option, but until it gets fixed (if anyone really cared
to do so), spelling out the restriction looks to be an appropriate
step to me (or else I wouldn't have followed the request and created
this patch). Once suitably audited, fixed, and tested, I wouldn't see
a reason not to remove the restriction again.

Jan
Julien Grall July 16, 2021, 7:50 a.m. UTC | #5
Hi Jan,

On 15/07/2021 12:36, Jan Beulich wrote:
> On 15.07.2021 11:05, Julien Grall wrote:
>> On 15/07/2021 07:38, Jan Beulich wrote:
>>> On 14.07.2021 20:16, Julien Grall wrote:
>>>> On 05/07/2021 16:18, Jan Beulich wrote:
>>>>> Let's try to avoid giving the impression that 32-bit tool stacks are as
>>>>> capable as 64-bit ones.
>>>>
>>>> Would you be able to provide a few examples of the known issues in the
>>>> commit message? This would be helpful for anyone to understand why we
>>>> decided to drop the support.
>>>
>>> Not sure how useful this is going to be.
>>
>> It would at least be useful to me, so I can make an informed decision. I
>> suspect it would also be for anyone reading it in the future. It is
>> rather frustrating to find a commit message with barely any rationale and
>> no-one remembering why this was done...
> 
> Well, I've added "There are a number of cases there where 32-bit
> types are used to hold e.g. frame numbers." Not sure whether you
> consider this sufficient.

That's good enough for me in the commit message.

> 
> Problematic code may be primarily in areas Arm doesn't
> care about (yet), like PCI pass-through or migration. But see e.g.
> - xc_map_foreign_range()'s "mfn" and "size" parameters,
> - xc_maximum_ram_page()'s "max_mfn" parameter,
> - libxl_dom.c:hvm_build_set_params()'s "store_mfn" and "console_mfn"
>    parameters,
> - xs_introduce_domain()'s "mfn" parameter,
> and quite a few more in particular in libxenguest.

That's quite a few :/. Thanks for listing them; it is useful to have 
them logged on the ML.

> 
> And then there are also subtle oddities like xc_set_mem_access_multi()
> having
> 
>      xen_mem_access_op_t mao =
>      {
>          .op       = XENMEM_access_op_set_access_multi,
>          .domid    = domain_id,
>          .access   = XENMEM_access_default + 1, /* Invalid value */
>          .pfn      = ~0UL, /* Invalid GFN */
>          .nr       = nr,
>      };
>
> Clearly ~0UL won't have the intended effect even for 32-bit guests,
> when the field is uint64_t (we get away here because for whatever
> reason the hypervisor doesn't check that the field indeed is ~0UL).
> But I wouldn't be surprised to find uses where there would be a
> difference. One of the main aspects certainly is ...

Whoops :). One topic that came up on my series to drop the M2P helpers 
on Arm is the lack of a suitable define for invalid GFN.

However, at first look, it didn't seem easy to introduce 
because the GFN is sometimes stored in a 64-bit type and other times in 
xen_ulong_t.
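
The width problem can be sketched as follows. This is an illustration only:
struct sketch_op, SKETCH_INVALID_GFN and the helper names are hypothetical
and not part of any Xen header, and uint32_t is used to model "unsigned
long" in a 32-bit build so that the example behaves the same on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical structure with a 64-bit frame field, like the pfn field above. */
struct sketch_op {
    uint64_t pfn;
};

/* Hypothetical width-independent marker: all 64 bits set. */
#define SKETCH_INVALID_GFN (~UINT64_C(0))

/* What "~0UL" turns into when unsigned long is only 32 bits wide. */
static struct sketch_op op_from_32bit_all_ones(void)
{
    uint32_t all_ones_32 = UINT32_MAX;            /* models ~0UL on a 32-bit build */
    struct sketch_op op = { .pfn = all_ones_32 }; /* zero-extended to 64 bits */

    return op;
}

/* A check as strict as the comment "Invalid GFN" would suggest. */
static int is_invalid_gfn(const struct sketch_op *op)
{
    return op->pfn == SKETCH_INVALID_GFN;
}

/* Self-check: only the 64-bit marker is recognised as invalid. */
static void gfn_sketch_check(void)
{
    struct sketch_op narrow = op_from_32bit_all_ones();
    struct sketch_op wide = { .pfn = SKETCH_INVALID_GFN };

    assert(narrow.pfn == UINT64_C(0x00000000FFFFFFFF));
    assert(!is_invalid_gfn(&narrow));
    assert(is_invalid_gfn(&wide));
}
```

A consumer comparing against the full 64-bit value would not recognise the
32-bit variant, which is the kind of difference a shared define would have
to account for.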

> 
>> I vaguely recall a discussion about 64-bit hypercall ([1]). I assume the
>> decision to drop support is related to it, but I have no way to prove it
>> from the commit message.
> 
> ... this. Some XENMEM_* may return 64-bit values, yet the hypercall
> interface is limited to "long" return types. Not even the multicall
> approach taken to work around the restriction to "int" would help
> here for x86-32, as struct multicall_entry also uses xen_ulong_t
> for its "result" field.
> 
>> It is also not clear why adding the restriction is the way to go...
>>
>>> This would be pointing at the
>>> declarations / definitions of various tool stack internal variables or
>>> structure fields. Which also is why ...
>>
>> ... is this because such issues are too widespread in libxc/libxl to fix
>> in the long term?
> 
> Fixing is an option, but until it gets fixed (if anyone really cared
> to do so), spelling out the restriction looks to be an appropriate
> step to me (or else I wouldn't have followed the request and created
> this patch). Once suitably audited, fixed, and tested, I wouldn't see
> a reason not to remove the restriction again.

Agreed. I was mostly wondering whether this was a matter of a couple of 
patches and could be restricted to maybe libxl (IOW a toolstack based on 
libxc may not be affected). But from what you wrote, the issue is quite 
widespread.

Anyway, this is enough to convince me that dropping support (until it is 
fixed) for a 32-bit toolstack on a 64-bit hypervisor is fine.

Cheers,

Patch

--- a/SUPPORT.md
+++ b/SUPPORT.md
@@ -131,6 +131,12 @@  ARM only has one guest type at the momen
 
 ## Toolstack
 
+While 32-bit builds of the tool stack are generally supported, restrictions
+apply in particular when running on top of a 64-bit hypervisor.  For example,
+very large guests aren't expected to be manageable in this case.  This includes
+guests giving the appearance of being large, by altering their own memory
+layouts.
+
 ### xl
 
     Status: Supported