diff mbox series

Xen 4.18 pvshim console issue (with patch)

Message ID ZS-1wVlZjdAJyUz6@mail.soc.lip6.fr (mailing list archive)
State New, archived
Headers show
Series Xen 4.18 pvshim console issue (with patch) | expand

Commit Message

Manuel Bouyer Oct. 18, 2023, 10:38 a.m. UTC
Hello,
With Xen 4.18, a PV domain running under pvshim doesn't get console input.
This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is
hardwired to 0), so console_input_domain() will never select that domain
as input.

The attached patch fixes it by translating 0 to the real domain id for
pvshim, but there may be a better way to do this.

Comments

Manuel Bouyer Oct. 18, 2023, 10:43 a.m. UTC | #1
On Wed, Oct 18, 2023 at 12:38:57PM +0200, Manuel Bouyer wrote:
> Hello,
> With Xen 4.18, a PV domain running under pvshim doesn't get console input.
> This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is
> hardwired to 0), so console_input_domain() will never select that domain
> as input.
> 
> The attached patch fixes it by translating 0 to the real domain id for
> pvshim, but there may be a better way to do this.

Small improvement, print the real domain ID instead of DOM0
Andrew Cooper Oct. 18, 2023, 10:44 a.m. UTC | #2
On 18/10/2023 11:38 am, Manuel Bouyer wrote:
> Hello,
> With Xen 4.18, a PV domain running under pvshim doesn't get console input.
> This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is
> hardwired to 0), so console_input_domain() will never select that domain
> as input.
> 
> The attached patch fixes it by translating 0 to the real domain id for
> pvshim, but there may be a better way to do this.
> 

Thankyou for the report.

First, CC'ing Henry as 4.18 release manager.

There have been changes in how this works recently in 4.18, notably c/s
c2581c58bec96.

However, it's not obvious whether this worked in 4.17 or not.  i.e.
whether it's a regression in 4.18, or whether it's been broken since PV
Shim was introduced.

~Andrew
Manuel Bouyer Oct. 18, 2023, 11:20 a.m. UTC | #3
On Wed, Oct 18, 2023 at 11:44:22AM +0100, Andrew Cooper wrote:
> On 18/10/2023 11:38 am, Manuel Bouyer wrote:
> > Hello,
> > With Xen 4.18, a PV domain running under pvshim doesn't get console input.
> > This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is
> > hardwired to 0), so console_input_domain() will never select that domain
> > as input.
> > 
> > The attached patch fixes it by translating 0 to the real domain id for
> > pvshim, but there may be a better way to do this.
> > 
> 
> Thankyou for the report.
> 
> First, CC'ing Henry as 4.18 release manager.
> 
> There have been changes in how this works recently in 4.18, notably c/s
> c2581c58bec96.

Yes, it looks like this one introduced the problem.
Before this, we would switch console_rx to 1 without checking if
domain (console_rx - 1) exists, and console_rx == 1 is a special case
in __serial_rx()

> 
> However, it's not obvious whether this worked in 4.17 or not.  i.e.
> whether it's a regression in 4.18, or whether it's been broken since PV
> Shim was introduced.

I don't know for 4.16 or 4.17 but I can tell that it's working in 4.15
Andrew Cooper Oct. 18, 2023, 11:23 a.m. UTC | #4
On 18/10/2023 12:20 pm, Manuel Bouyer wrote:
> On Wed, Oct 18, 2023 at 11:44:22AM +0100, Andrew Cooper wrote:
>> On 18/10/2023 11:38 am, Manuel Bouyer wrote:
>>> Hello,
>>> With Xen 4.18, a PV domain running under pvshim doesn't get console input.
>>> This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is
>>> hardwired to 0), so console_input_domain() will never select that domain
>>> as input.
>>>
>>> The attached patch fixes it by translating 0 to the real domain id for
>>> pvshim, but there may be a better way to do this.
>>>
>> Thankyou for the report.
>>
>> First, CC'ing Henry as 4.18 release manager.
>>
>> There have been changes in how this works recently in 4.18, notably c/s
>> c2581c58bec96.
> Yes, it looks like this one introduced the problem.
> Before this, we would switch console_rx to 1 without checking if
> domain (console_rx - 1) exists, and console_rx == 1 is a special case
> in __serial_rx()
>
>> However, it's not obvious whether this worked in 4.17 or not.  i.e.
>> whether it's a regression in 4.18, or whether it's been broken since PV
>> Shim was introduced.
> I don't know for 4.16 or 4.17 but I can tell that it's working in 4.15
>

That commit is new in 4.18, so Henry - this is a release
critical/blocker owing to it being a regression vs 4.17.

I'll try and put some brainpower towards it when I get some other 4.18
work sorted.

~Andrew
Henry Wang Oct. 18, 2023, 11:24 a.m. UTC | #5
Hi Andrew,

> On Oct 18, 2023, at 19:23, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> 
> On 18/10/2023 12:20 pm, Manuel Bouyer wrote:
>> On Wed, Oct 18, 2023 at 11:44:22AM +0100, Andrew Cooper wrote:
>>> On 18/10/2023 11:38 am, Manuel Bouyer wrote:
>>>> Hello,
>>>> With Xen 4.18, a PV domain running under pvshim doesn't get console input.
>>>> This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is
>>>> hardwired to 0), so console_input_domain() will never select that domain
>>>> as input.
>>>> 
>>>> The attached patch fixes it by translating 0 to the real domain id for
>>>> pvshim, but there may be a better way to do this.
>>>> 
>>> Thankyou for the report.
>>> 
>>> First, CC'ing Henry as 4.18 release manager.
>>> 
>>> There have been changes in how this works recently in 4.18, notably c/s
>>> c2581c58bec96.
>> Yes, it looks like this one introduced the problem.
>> Before this, we would switch console_rx to 1 without checking if
>> domain (console_rx - 1) exists, and console_rx == 1 is a special case
>> in __serial_rx()
>> 
>>> However, it's not obvious whether this worked in 4.17 or not.  i.e.
>>> whether it's a regression in 4.18, or whether it's been broken since PV
>>> Shim was introduced.
>> I don't know for 4.16 or 4.17 but I can tell that it's working in 4.15
>> 
> 
> That commit is new in 4.18, so Henry - this is a release
> critical/blocker owing to it being a regression vs 4.17.

Noted down.

Out of curiosity, do we have maintainers for that specific driver? It would
be good to looping them in.

Kind regards,
Henry

> 
> I'll try and put some brainpower towards it when I get some other 4.18
> work sorted.
> 
> ~Andrew
Jan Beulich Oct. 18, 2023, 1:24 p.m. UTC | #6
On 18.10.2023 13:20, Manuel Bouyer wrote:
> On Wed, Oct 18, 2023 at 11:44:22AM +0100, Andrew Cooper wrote:
>> On 18/10/2023 11:38 am, Manuel Bouyer wrote:
>>> Hello,
>>> With Xen 4.18, a PV domain running under pvshim doesn't get console input.
>>> This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is
>>> hardwired to 0), so console_input_domain() will never select that domain
>>> as input.
>>>
>>> The attached patch fixes it by translating 0 to the real domain id for
>>> pvshim, but there may be a better way to do this.
>>>
>>
>> Thankyou for the report.
>>
>> First, CC'ing Henry as 4.18 release manager.
>>
>> There have been changes in how this works recently in 4.18, notably c/s
>> c2581c58bec96.
> 
> Yes, it looks like this one introduced the problem.
> Before this, we would switch console_rx to 1 without checking if
> domain (console_rx - 1) exists, and console_rx == 1 is a special case
> in __serial_rx()
> 
>>
>> However, it's not obvious whether this worked in 4.17 or not.  i.e.
>> whether it's a regression in 4.18, or whether it's been broken since PV
>> Shim was introduced.
> 
> I don't know for 4.16 or 4.17 but I can tell that it's working in 4.15

From looking at the code, it doesn't look like it would: There
switch_serial_input() toggles console_rx between 0 and 1 afaict, and
console_input_domain() handles value 0 as "Xen" (like in 4.18), while
otherwise it calls rcu_lock_domain_by_id(console_rx - 1) (i.e. trying
to get hold of Dom0's domain structure, not Dom1's).

Jan
Jan Beulich Oct. 18, 2023, 1:29 p.m. UTC | #7
On 18.10.2023 12:38, Manuel Bouyer wrote:
> Hello,
> With Xen 4.18, a PV domain running under pvshim doesn't get console input.
> This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is
> hardwired to 0), so console_input_domain() will never select that domain
> as input.
> 
> The attached patch fixes it by translating 0 to the real domain id for
> pvshim, but there may be a better way to do this.

My primary observation with the patch is that it presumably won't build for
other than x86. There are also indentation and other style issues, no S-o-b,
and no description. But I wonder whether a different approach doesn't want
taking: Wouldn't it help if max_init_domid was 1 in the shim case, with no
need for any other changes?

Also Cc-ing Michal as the author of the (possibly) offending patch.

Jan
Manuel Bouyer Oct. 18, 2023, 1:36 p.m. UTC | #8
On Wed, Oct 18, 2023 at 03:24:22PM +0200, Jan Beulich wrote:
> On 18.10.2023 13:20, Manuel Bouyer wrote:
> > On Wed, Oct 18, 2023 at 11:44:22AM +0100, Andrew Cooper wrote:
> >> On 18/10/2023 11:38 am, Manuel Bouyer wrote:
> >>> Hello,
> >>> With Xen 4.18, a PV domain running under pvshim doesn't get console input.
> >>> This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is
> >>> hardwired to 0), so console_input_domain() will never select that domain
> >>> as input.
> >>>
> >>> The attached patch fixes it by translating 0 to the real domain id for
> >>> pvshim, but there may be a better way to do this.
> >>>
> >>
> >> Thankyou for the report.
> >>
> >> First, CC'ing Henry as 4.18 release manager.
> >>
> >> There have been changes in how this works recently in 4.18, notably c/s
> >> c2581c58bec96.
> > 
> > Yes, it looks like this one introduced the problem.
> > Before this, we would switch console_rx to 1 without checking if
> > domain (console_rx - 1) exists, and console_rx == 1 is a special case
> > in __serial_rx()
> > 
> >>
> >> However, it's not obvious whether this worked in 4.17 or not.  i.e.
> >> whether it's a regression in 4.18, or whether it's been broken since PV
> >> Shim was introduced.
> > 
> > I don't know for 4.16 or 4.17 but I can tell that it's working in 4.15
> 
> >From looking at the code, it doesn't look like it would: There
> switch_serial_input() toggles console_rx between 0 and 1 afaict, and
> console_input_domain() handles value 0 as "Xen" (like in 4.18), while
> otherwise it calls rcu_lock_domain_by_id(console_rx - 1) (i.e. trying
> to get hold of Dom0's domain structure, not Dom1's).

Well, I have a 4.15.5 installation in production and I can tell you that
with PV+PVSHIM, the console is working, for sure.

AFAIK console_input_domain() is called only on ARM, from
vpl011_write_data_xen(). It's never called for x86.
Manuel Bouyer Oct. 18, 2023, 1:40 p.m. UTC | #9
On Wed, Oct 18, 2023 at 03:29:08PM +0200, Jan Beulich wrote:
> On 18.10.2023 12:38, Manuel Bouyer wrote:
> > Hello,
> > With Xen 4.18, a PV domain running under pvshim doesn't get console input.
> > This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is
> > hardwired to 0), so console_input_domain() will never select that domain
> > as input.
> > 
> > The attached patch fixes it by translating 0 to the real domain id for
> > pvshim, but there may be a better way to do this.
> 
> My primary observation with the patch is that it presumably won't build for
> other than x86.

It's possible, I don't have other platform to test

> There are also indentation and other style issues, no S-o-b,
> and no description.

I'll let whoever commit it deal with this

> But I wonder whether a different approach doesn't want
> taking: Wouldn't it help if max_init_domid was 1 in the shim case, with no
> need for any other changes?

max_init_domid=1 won't help, because we're using the real domid from the
real hypervisor here. So it can be arbitrary large (in my tests I did a
printk here, and the domid was matching the id from 'xl list')
Jan Beulich Oct. 18, 2023, 1:51 p.m. UTC | #10
On 18.10.2023 15:36, Manuel Bouyer wrote:
> On Wed, Oct 18, 2023 at 03:24:22PM +0200, Jan Beulich wrote:
>> On 18.10.2023 13:20, Manuel Bouyer wrote:
>>> On Wed, Oct 18, 2023 at 11:44:22AM +0100, Andrew Cooper wrote:
>>>> On 18/10/2023 11:38 am, Manuel Bouyer wrote:
>>>>> Hello,
>>>>> With Xen 4.18, a PV domain running under pvshim doesn't get console input.
>>>>> This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is
>>>>> hardwired to 0), so console_input_domain() will never select that domain
>>>>> as input.
>>>>>
>>>>> The attached patch fixes it by translating 0 to the real domain id for
>>>>> pvshim, but there may be a better way to do this.
>>>>>
>>>>
>>>> Thankyou for the report.
>>>>
>>>> First, CC'ing Henry as 4.18 release manager.
>>>>
>>>> There have been changes in how this works recently in 4.18, notably c/s
>>>> c2581c58bec96.
>>>
>>> Yes, it looks like this one introduced the problem.
>>> Before this, we would switch console_rx to 1 without checking if
>>> domain (console_rx - 1) exists, and console_rx == 1 is a special case
>>> in __serial_rx()
>>>
>>>>
>>>> However, it's not obvious whether this worked in 4.17 or not.  i.e.
>>>> whether it's a regression in 4.18, or whether it's been broken since PV
>>>> Shim was introduced.
>>>
>>> I don't know for 4.16 or 4.17 but I can tell that it's working in 4.15
>>
>> >From looking at the code, it doesn't look like it would: There
>> switch_serial_input() toggles console_rx between 0 and 1 afaict, and
>> console_input_domain() handles value 0 as "Xen" (like in 4.18), while
>> otherwise it calls rcu_lock_domain_by_id(console_rx - 1) (i.e. trying
>> to get hold of Dom0's domain structure, not Dom1's).
> 
> Well, I have a 4.15.5 installation in production and I can tell you that
> with PV+PVSHIM, the console is working, for sure.
> 
> AFAIK console_input_domain() is called only on ARM, from
> vpl011_write_data_xen(). It's never called for x86.

Oh, indeed. I took your patch touching the function as meaning it's used
on x86. This then means my earlier suggestion won't work, as we need
console_rx to have the value 1 for input to be accepted from Dom1. Which
in turn means your change to switch_serial_input(), suitably cleaned up,
is then likely the best we can do, despite me not really liking the shim
special casing.

As to cleaning up - besides the build, indentation, and style issues I
think you want to drop the "&& pv_shim" part of the condition (as
get_initial_domain_id() takes care of that already), and ideally you'd
also move the new "domid" variable into the more narrow scope. If you
don't feel like providing a proper (updated) patch, I'm happy to take
over, but to retain indication of your initial work I'd need you to
permit me to add your S-o-b (alongside mine).

Jan
Manuel Bouyer Oct. 18, 2023, 1:56 p.m. UTC | #11
On Wed, Oct 18, 2023 at 03:51:54PM +0200, Jan Beulich wrote:
> On 18.10.2023 15:36, Manuel Bouyer wrote:
> > On Wed, Oct 18, 2023 at 03:24:22PM +0200, Jan Beulich wrote:
> >> On 18.10.2023 13:20, Manuel Bouyer wrote:
> >>> On Wed, Oct 18, 2023 at 11:44:22AM +0100, Andrew Cooper wrote:
> >>>> On 18/10/2023 11:38 am, Manuel Bouyer wrote:
> >>>>> Hello,
> >>>>> With Xen 4.18, a PV domain running under pvshim doesn't get console input.
> >>>>> This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is
> >>>>> hardwired to 0), so console_input_domain() will never select that domain
> >>>>> as input.
> >>>>>
> >>>>> The attached patch fixes it by translating 0 to the real domain id for
> >>>>> pvshim, but there may be a better way to do this.
> >>>>>
> >>>>
> >>>> Thankyou for the report.
> >>>>
> >>>> First, CC'ing Henry as 4.18 release manager.
> >>>>
> >>>> There have been changes in how this works recently in 4.18, notably c/s
> >>>> c2581c58bec96.
> >>>
> >>> Yes, it looks like this one introduced the problem.
> >>> Before this, we would switch console_rx to 1 without checking if
> >>> domain (console_rx - 1) exists, and console_rx == 1 is a special case
> >>> in __serial_rx()
> >>>
> >>>>
> >>>> However, it's not obvious whether this worked in 4.17 or not.  i.e.
> >>>> whether it's a regression in 4.18, or whether it's been broken since PV
> >>>> Shim was introduced.
> >>>
> >>> I don't know for 4.16 or 4.17 but I can tell that it's working in 4.15
> >>
> >> >From looking at the code, it doesn't look like it would: There
> >> switch_serial_input() toggles console_rx between 0 and 1 afaict, and
> >> console_input_domain() handles value 0 as "Xen" (like in 4.18), while
> >> otherwise it calls rcu_lock_domain_by_id(console_rx - 1) (i.e. trying
> >> to get hold of Dom0's domain structure, not Dom1's).
> > 
> > Well, I have a 4.15.5 installation in production and I can tell you that
> > with PV+PVSHIM, the console is working, for sure.
> > 
> > AFAIK console_input_domain() is called only on ARM, from
> > vpl011_write_data_xen(). It's never called for x86.
> 
> Oh, indeed. I took your patch touching the function as meaning it's used
> on x86. This then means my earlier suggestion won't work, as we need
> console_rx to have the value 1 for input to be accepted from Dom1. Which
> in turn means your change to switch_serial_input(), suitably cleaned up,
> is then likely the best we can do, despite me not really liking the shim
> special casing.
> 
> As to cleaning up - besides the build, indentation, and style issues I
> think you want to drop the "&& pv_shim" part of the condition (as
> get_initial_domain_id() takes care of that already), and ideally you'd
> also move the new "domid" variable into the more narrow scope. If you
> don't feel like providing a proper (updated) patch, I'm happy to take
> over, but to retain indication of your initial work I'd need you to
> permit me to add your S-o-b (alongside mine).

No problem, please do !
Michal Orzel Oct. 18, 2023, 2:18 p.m. UTC | #12
Hi,

On 18/10/2023 15:29, Jan Beulich wrote:
> 
> 
> On 18.10.2023 12:38, Manuel Bouyer wrote:
>> Hello,
>> With Xen 4.18, a PV domain running under pvshim doesn't get console input.
>> This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is
>> hardwired to 0), so console_input_domain() will never select that domain
>> as input.
>>
>> The attached patch fixes it by translating 0 to the real domain id for
>> pvshim, but there may be a better way to do this.
> 
> My primary observation with the patch is that it presumably won't build for
> other than x86. There are also indentation and other style issues, no S-o-b,
> and no description. But I wonder whether a different approach doesn't want
> taking: Wouldn't it help if max_init_domid was 1 in the shim case, with no
> need for any other changes?
> 
> Also Cc-ing Michal as the author of the (possibly) offending patch.
What if we set max_init_domid in pvshim case to the value returned by get_initial_domain_id()
in create_dom0()? The drawback is that we would waste some time in a loop if the value is large.

~Michal
Manuel Bouyer Oct. 18, 2023, 2:37 p.m. UTC | #13
On Wed, Oct 18, 2023 at 04:18:26PM +0200, Michal Orzel wrote:
> Hi,
> 
> On 18/10/2023 15:29, Jan Beulich wrote:
> > 
> > 
> > On 18.10.2023 12:38, Manuel Bouyer wrote:
> >> Hello,
> >> With Xen 4.18, a PV domain running under pvshim doesn't get console input.
> >> This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is
> >> hardwired to 0), so console_input_domain() will never select that domain
> >> as input.
> >>
> >> The attached patch fixes it by translating 0 to the real domain id for
> >> pvshim, but there may be a better way to do this.
> > 
> > My primary observation with the patch is that it presumably won't build for
> > other than x86. There are also indentation and other style issues, no S-o-b,
> > and no description. But I wonder whether a different approach doesn't want
> > taking: Wouldn't it help if max_init_domid was 1 in the shim case, with no
> > need for any other changes?
> > 
> > Also Cc-ing Michal as the author of the (possibly) offending patch.
> What if we set max_init_domid in pvshim case to the value returned by get_initial_domain_id()
> in create_dom0()? The drawback is that we would waste some time in a loop if the value is large.

I considered this too, but max_init_domid is a constant on x86; it looked
like a more intrusive change.
Jan Beulich Oct. 18, 2023, 2:51 p.m. UTC | #14
On 18.10.2023 16:18, Michal Orzel wrote:
> On 18/10/2023 15:29, Jan Beulich wrote:
>> On 18.10.2023 12:38, Manuel Bouyer wrote:
>>> Hello,
>>> With Xen 4.18, a PV domain running under pvshim doesn't get console input.
>>> This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is
>>> hardwired to 0), so console_input_domain() will never select that domain
>>> as input.
>>>
>>> The attached patch fixes it by translating 0 to the real domain id for
>>> pvshim, but there may be a better way to do this.
>>
>> My primary observation with the patch is that it presumably won't build for
>> other than x86. There are also indentation and other style issues, no S-o-b,
>> and no description. But I wonder whether a different approach doesn't want
>> taking: Wouldn't it help if max_init_domid was 1 in the shim case, with no
>> need for any other changes?
>>
>> Also Cc-ing Michal as the author of the (possibly) offending patch.
> What if we set max_init_domid in pvshim case to the value returned by get_initial_domain_id()
> in create_dom0()? The drawback is that we would waste some time in a loop if the value is large.

I'd like to avoid specifically that.

Jan
diff mbox series

Patch

--- drivers/char/console.c.orig	2023-10-18 12:24:57.221891748 +0200
+++ drivers/char/console.c	2023-10-18 12:30:26.072844802 +0200
@@ -478,14 +478,20 @@ 
 /* Make sure to rcu_unlock_domain after use */
 struct domain *console_input_domain(void)
 {
+    domid_t domid;
     if ( console_rx == 0 )
             return NULL;
-    return rcu_lock_domain_by_id(console_rx - 1);
+    if (console_rx == 1 && pv_shim)
+        domid = get_initial_domain_id();
+    else
+	domid = console_rx - 1;
+    return rcu_lock_domain_by_id(domid);
 }
 
 static void switch_serial_input(void)
 {
     unsigned int next_rx = console_rx;
+    domid_t domid;
 
     /*
      * Rotate among Xen, dom0 and boot-time created domUs while skipping
@@ -502,7 +508,11 @@ 
             break;
         }
 
-        d = rcu_lock_domain_by_id(next_rx - 1);
+	if (next_rx == 1 && pv_shim)
+	    domid = get_initial_domain_id();
+	else
+	    domid = next_rx - 1;
+        d = rcu_lock_domain_by_id(domid);
         if ( d )
         {
             rcu_unlock_domain(d);