diff mbox series

[v2,1/2] xen/x86: io_apic: Introduce a command line option to skip timer check

Message ID 20231211122322.15815-2-julien@xen.org (mailing list archive)
State New
Headers show
Series xen/x86: Optimize timer_irq_works() | expand

Commit Message

Julien Grall Dec. 11, 2023, 12:23 p.m. UTC
From: Julien Grall <jgrall@amazon.com>

Currently, Xen will spend ~100ms to check if the timer works. If the
Admin knows their platform have a working timer, then it would be
handy to be able to bypass the check.

Introduce a command line option 'pit-irq-works' for this purpose.

Signed-off-by: Julien Grall <jgrall@amazon.com>

---

Changelog since v1:
    - Rename the command line option. I went with pit-irq-works rather
      than timer-irq-works because Roger thought it would be better suited
    - Rework the command line description
---
 docs/misc/xen-command-line.pandoc | 11 +++++++++++
 xen/arch/x86/io_apic.c            | 11 +++++++++++
 2 files changed, 22 insertions(+)

Comments

Jan Beulich Dec. 14, 2023, 10:10 a.m. UTC | #1
On 11.12.2023 13:23, Julien Grall wrote:
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -2535,6 +2535,17 @@ pages) must also be specified via the tbuf_size parameter.
>  ### tickle_one_idle_cpu
>  > `= <boolean>`
>  
> +### pit-irq-works (x86)
> +> `=<boolean>`
> +
> +> Default: `false`
> +
> +Disables the code which tests for broken timer IRQ sources. Enabling
> +this option will reduce boot time on HW where the timer works properly.
> +
> +If the system is unstable when enabling the option, then it means you
> +may have a broken HW and therefore the testing cannot be be skipped.
> +
>  ### timer_slop
>  > `= <integer>`

With the rename this now needs to move up to retain sorting.

> --- a/xen/arch/x86/io_apic.c
> +++ b/xen/arch/x86/io_apic.c
> @@ -57,6 +57,14 @@ bool __initdata ioapic_ack_forced;
>  int __read_mostly nr_ioapic_entries[MAX_IO_APICS];
>  int __read_mostly nr_ioapics;
>  
> +/*
> + * The logic to check if the timer is working is expensive. So allow
> + * the admin to bypass it if they know their platform doesn't have
> + * a buggy timer.
> + */
> +static bool __initdata pit_irq_works;
> +boolean_param("pit-irq-works", pit_irq_works);
> +
>  /*
>   * Rough estimation of how many shared IRQs there are, can
>   * be changed anytime.
> @@ -1502,6 +1510,9 @@ static int __init timer_irq_works(void)
>  {
>      unsigned long t1, flags;
>  
> +    if ( pit_irq_works )
> +        return 1;

When the check is placed here, what exactly use of the option means is
system dependent. I consider this somewhat risky, so I'd prefer if the
check was put on the "normal" path in check_timer(). That way it'll
affect only the one case which we can generally consider "known good",
but not the cases where the virtual wire setups are being probed. I.e.

    if (pin1 != -1) {
        /*
         * Ok, does IRQ0 through the IOAPIC work?
         */
        unmask_IO_APIC_irq(irq_to_desc(0));
        if (pit_irq_works || timer_irq_works()) {
            local_irq_restore(flags);
            return;
        }

Plus this way changes to the various fallback paths can also be done
without needing to consider users who might be making use of the new
option.

Jan
Julien Grall Dec. 14, 2023, 10:14 a.m. UTC | #2
Hi,

On 14/12/2023 10:10, Jan Beulich wrote:
> On 11.12.2023 13:23, Julien Grall wrote:
>> --- a/docs/misc/xen-command-line.pandoc
>> +++ b/docs/misc/xen-command-line.pandoc
>> @@ -2535,6 +2535,17 @@ pages) must also be specified via the tbuf_size parameter.
>>   ### tickle_one_idle_cpu
>>   > `= <boolean>`
>>   
>> +### pit-irq-works (x86)
>> +> `=<boolean>`
>> +
>> +> Default: `false`
>> +
>> +Disables the code which tests for broken timer IRQ sources. Enabling
>> +this option will reduce boot time on HW where the timer works properly.
>> +
>> +If the system is unstable when enabling the option, then it means you
>> +may have a broken HW and therefore the testing cannot be be skipped.
>> +
>>   ### timer_slop
>>   > `= <integer>`
> 
> With the rename this now needs to move up to retain sorting.

Ok.

> 
>> --- a/xen/arch/x86/io_apic.c
>> +++ b/xen/arch/x86/io_apic.c
>> @@ -57,6 +57,14 @@ bool __initdata ioapic_ack_forced;
>>   int __read_mostly nr_ioapic_entries[MAX_IO_APICS];
>>   int __read_mostly nr_ioapics;
>>   
>> +/*
>> + * The logic to check if the timer is working is expensive. So allow
>> + * the admin to bypass it if they know their platform doesn't have
>> + * a buggy timer.
>> + */
>> +static bool __initdata pit_irq_works;
>> +boolean_param("pit-irq-works", pit_irq_works);
>> +
>>   /*
>>    * Rough estimation of how many shared IRQs there are, can
>>    * be changed anytime.
>> @@ -1502,6 +1510,9 @@ static int __init timer_irq_works(void)
>>   {
>>       unsigned long t1, flags;
>>   
>> +    if ( pit_irq_works )
>> +        return 1;
> 
> When the check is placed here, what exactly use of the option means is
> system dependent. I consider this somewhat risky, so I'd prefer if the
> check was put on the "normal" path in check_timer(). That way it'll
> affect only the one case which we can generally consider "known good",
> but not the cases where the virtual wire setups are being probed. I.e.

I am not against restricting when we allow skipping the timer check. But 
in that case, I wonder why Linux is doing it differently?

After all, this code is heavily borrowed from Linux. So shouldn't we 
follow what they are doing?

Cheers,
Jan Beulich Dec. 14, 2023, 10:35 a.m. UTC | #3
On 14.12.2023 11:14, Julien Grall wrote:
> On 14/12/2023 10:10, Jan Beulich wrote:
>> On 11.12.2023 13:23, Julien Grall wrote:
>>> --- a/xen/arch/x86/io_apic.c
>>> +++ b/xen/arch/x86/io_apic.c
>>> @@ -57,6 +57,14 @@ bool __initdata ioapic_ack_forced;
>>>   int __read_mostly nr_ioapic_entries[MAX_IO_APICS];
>>>   int __read_mostly nr_ioapics;
>>>   
>>> +/*
>>> + * The logic to check if the timer is working is expensive. So allow
>>> + * the admin to bypass it if they know their platform doesn't have
>>> + * a buggy timer.
>>> + */
>>> +static bool __initdata pit_irq_works;
>>> +boolean_param("pit-irq-works", pit_irq_works);
>>> +
>>>   /*
>>>    * Rough estimation of how many shared IRQs there are, can
>>>    * be changed anytime.
>>> @@ -1502,6 +1510,9 @@ static int __init timer_irq_works(void)
>>>   {
>>>       unsigned long t1, flags;
>>>   
>>> +    if ( pit_irq_works )
>>> +        return 1;
>>
>> When the check is placed here, what exactly use of the option means is
>> system dependent. I consider this somewhat risky, so I'd prefer if the
>> check was put on the "normal" path in check_timer(). That way it'll
>> affect only the one case which we can generally consider "known good",
>> but not the cases where the virtual wire setups are being probed. I.e.
> 
> I am not against restricting when we allow skipping the timer check. But 
> in that case, I wonder why Linux is doing it differently?

Sadly Linux'es git history doesn't go back far enough (begins only at past
2.6.11), so I can't (easily) find the patch (and description) for the x86-64
change. The later i386 change is justified mainly by paravirt needs, so
isn't applicable here. I wouldn't therefore exclude that my point above
wasn't even taken into consideration. Furthermore their command line option
is "no_timer_check", which to me firmly says "don't check" without regard to
whether the source (PIT) is actually okay. That's different with the option
name you (imo validly) chose.

Jan
Julien Grall Jan. 2, 2024, 7:09 p.m. UTC | #4
Hi Jan,

On 14/12/2023 10:35, Jan Beulich wrote:
> On 14.12.2023 11:14, Julien Grall wrote:
>> On 14/12/2023 10:10, Jan Beulich wrote:
>>> On 11.12.2023 13:23, Julien Grall wrote:
>>>> --- a/xen/arch/x86/io_apic.c
>>>> +++ b/xen/arch/x86/io_apic.c
>>>> @@ -57,6 +57,14 @@ bool __initdata ioapic_ack_forced;
>>>>    int __read_mostly nr_ioapic_entries[MAX_IO_APICS];
>>>>    int __read_mostly nr_ioapics;
>>>>    
>>>> +/*
>>>> + * The logic to check if the timer is working is expensive. So allow
>>>> + * the admin to bypass it if they know their platform doesn't have
>>>> + * a buggy timer.
>>>> + */
>>>> +static bool __initdata pit_irq_works;
>>>> +boolean_param("pit-irq-works", pit_irq_works);
>>>> +
>>>>    /*
>>>>     * Rough estimation of how many shared IRQs there are, can
>>>>     * be changed anytime.
>>>> @@ -1502,6 +1510,9 @@ static int __init timer_irq_works(void)
>>>>    {
>>>>        unsigned long t1, flags;
>>>>    
>>>> +    if ( pit_irq_works )
>>>> +        return 1;
>>>
>>> When the check is placed here, what exactly use of the option means is
>>> system dependent. I consider this somewhat risky, so I'd prefer if the
>>> check was put on the "normal" path in check_timer(). That way it'll
>>> affect only the one case which we can generally consider "known good",
>>> but not the cases where the virtual wire setups are being probed. I.e.

By "known good", do you mean the following:

diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index c89fbed8d675..c39d39ee951a 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -1960,7 +1959,8 @@ static void __init check_timer(void)
           * Ok, does IRQ0 through the IOAPIC work?
           */
          unmask_IO_APIC_irq(irq_to_desc(0));
-        if (timer_irq_works()) {
+        if (pit_irq_works || timer_irq_works()) {
+            printk("====== pirq_irq_works %d =====\n", pit_irq_works);
              local_irq_restore(flags);
              return;
          }

>>
>> I am not against restricting when we allow skipping the timer check. But
>> in that case, I wonder why Linux is doing it differently?
> 
> Sadly Linux'es git history doesn't go back far enough (begins only at past
> 2.6.11), so I can't (easily) find the patch (and description) for the x86-64
> change. The later i386 change is justified mainly by paravirt needs, so
> isn't applicable here. I wouldn't therefore exclude that my point above
> wasn't even taken into consideration. Furthermore their command line option
> is "no_timer_check", which to me firmly says "don't check" without regard to
> whether the source (PIT) is actually okay. That's different with the option
> name you (imo validly) chose.

Just to note that the name was suggested by Roger. I have to admit that 
I didn't check if this made sense for the existing placement.

Anyway, I tested the change on the HW where I wanted to skip the timer 
check. And I can confirm this is still skipping the timer check.

So I will send a new version with the diff above and some updated comments.

Cheers,
Jan Beulich Jan. 4, 2024, 8:54 a.m. UTC | #5
On 02.01.2024 20:09, Julien Grall wrote:
> Hi Jan,
> 
> On 14/12/2023 10:35, Jan Beulich wrote:
>> On 14.12.2023 11:14, Julien Grall wrote:
>>> On 14/12/2023 10:10, Jan Beulich wrote:
>>>> On 11.12.2023 13:23, Julien Grall wrote:
>>>>> --- a/xen/arch/x86/io_apic.c
>>>>> +++ b/xen/arch/x86/io_apic.c
>>>>> @@ -57,6 +57,14 @@ bool __initdata ioapic_ack_forced;
>>>>>    int __read_mostly nr_ioapic_entries[MAX_IO_APICS];
>>>>>    int __read_mostly nr_ioapics;
>>>>>    
>>>>> +/*
>>>>> + * The logic to check if the timer is working is expensive. So allow
>>>>> + * the admin to bypass it if they know their platform doesn't have
>>>>> + * a buggy timer.
>>>>> + */
>>>>> +static bool __initdata pit_irq_works;
>>>>> +boolean_param("pit-irq-works", pit_irq_works);
>>>>> +
>>>>>    /*
>>>>>     * Rough estimation of how many shared IRQs there are, can
>>>>>     * be changed anytime.
>>>>> @@ -1502,6 +1510,9 @@ static int __init timer_irq_works(void)
>>>>>    {
>>>>>        unsigned long t1, flags;
>>>>>    
>>>>> +    if ( pit_irq_works )
>>>>> +        return 1;
>>>>
>>>> When the check is placed here, what exactly use of the option means is
>>>> system dependent. I consider this somewhat risky, so I'd prefer if the
>>>> check was put on the "normal" path in check_timer(). That way it'll
>>>> affect only the one case which we can generally consider "known good",
>>>> but not the cases where the virtual wire setups are being probed. I.e.
> 
> By "known good", do you mean the following:
> 
> diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
> index c89fbed8d675..c39d39ee951a 100644
> --- a/xen/arch/x86/io_apic.c
> +++ b/xen/arch/x86/io_apic.c
> @@ -1960,7 +1959,8 @@ static void __init check_timer(void)
>            * Ok, does IRQ0 through the IOAPIC work?
>            */
>           unmask_IO_APIC_irq(irq_to_desc(0));
> -        if (timer_irq_works()) {
> +        if (pit_irq_works || timer_irq_works()) {
> +            printk("====== pirq_irq_works %d =====\n", pit_irq_works);
>               local_irq_restore(flags);
>               return;
>           }

Yes.

>>> I am not against restricting when we allow skipping the timer check. But
>>> in that case, I wonder why Linux is doing it differently?
>>
>> Sadly Linux'es git history doesn't go back far enough (begins only at past
>> 2.6.11), so I can't (easily) find the patch (and description) for the x86-64
>> change. The later i386 change is justified mainly by paravirt needs, so
>> isn't applicable here. I wouldn't therefore exclude that my point above
>> wasn't even taken into consideration. Furthermore their command line option
>> is "no_timer_check", which to me firmly says "don't check" without regard to
>> whether the source (PIT) is actually okay. That's different with the option
>> name you (imo validly) chose.
> 
> Just to note that the name was suggested by Roger. I have to admit that 
> I didn't check if this made sense for the existing placement.

Roger, thoughts?

Jan

> Anyway, I tested the change on the HW where I wanted to skip the timer 
> check. And I can confirm this is still skipping the timer check.
> 
> So I will send a new version with the diff above and some updated comments.
> 
> Cheers,
>
Roger Pau Monne Jan. 15, 2024, 12:42 p.m. UTC | #6
On Thu, Jan 04, 2024 at 09:54:30AM +0100, Jan Beulich wrote:
> On 02.01.2024 20:09, Julien Grall wrote:
> > Hi Jan,
> > 
> > On 14/12/2023 10:35, Jan Beulich wrote:
> >> On 14.12.2023 11:14, Julien Grall wrote:
> >>> On 14/12/2023 10:10, Jan Beulich wrote:
> >>>> On 11.12.2023 13:23, Julien Grall wrote:
> >>>>> --- a/xen/arch/x86/io_apic.c
> >>>>> +++ b/xen/arch/x86/io_apic.c
> >>>>> @@ -57,6 +57,14 @@ bool __initdata ioapic_ack_forced;
> >>>>>    int __read_mostly nr_ioapic_entries[MAX_IO_APICS];
> >>>>>    int __read_mostly nr_ioapics;
> >>>>>    
> >>>>> +/*
> >>>>> + * The logic to check if the timer is working is expensive. So allow
> >>>>> + * the admin to bypass it if they know their platform doesn't have
> >>>>> + * a buggy timer.
> >>>>> + */
> >>>>> +static bool __initdata pit_irq_works;
> >>>>> +boolean_param("pit-irq-works", pit_irq_works);
> >>>>> +
> >>>>>    /*
> >>>>>     * Rough estimation of how many shared IRQs there are, can
> >>>>>     * be changed anytime.
> >>>>> @@ -1502,6 +1510,9 @@ static int __init timer_irq_works(void)
> >>>>>    {
> >>>>>        unsigned long t1, flags;
> >>>>>    
> >>>>> +    if ( pit_irq_works )
> >>>>> +        return 1;
> >>>>
> >>>> When the check is placed here, what exactly use of the option means is
> >>>> system dependent. I consider this somewhat risky, so I'd prefer if the
> >>>> check was put on the "normal" path in check_timer(). That way it'll
> >>>> affect only the one case which we can generally consider "known good",
> >>>> but not the cases where the virtual wire setups are being probed. I.e.
> > 
> > By "known good", do you mean the following:
> > 
> > diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
> > index c89fbed8d675..c39d39ee951a 100644
> > --- a/xen/arch/x86/io_apic.c
> > +++ b/xen/arch/x86/io_apic.c
> > @@ -1960,7 +1959,8 @@ static void __init check_timer(void)
> >            * Ok, does IRQ0 through the IOAPIC work?
> >            */
> >           unmask_IO_APIC_irq(irq_to_desc(0));
> > -        if (timer_irq_works()) {
> > +        if (pit_irq_works || timer_irq_works()) {
> > +            printk("====== pirq_irq_works %d =====\n", pit_irq_works);
> >               local_irq_restore(flags);
> >               return;
> >           }
> 
> Yes.
> 
> >>> I am not against restricting when we allow skipping the timer check. But
> >>> in that case, I wonder why Linux is doing it differently?
> >>
> >> Sadly Linux'es git history doesn't go back far enough (begins only at past
> >> 2.6.11), so I can't (easily) find the patch (and description) for the x86-64
> >> change. The later i386 change is justified mainly by paravirt needs, so
> >> isn't applicable here. I wouldn't therefore exclude that my point above
> >> wasn't even taken into consideration. Furthermore their command line option
> >> is "no_timer_check", which to me firmly says "don't check" without regard to
> >> whether the source (PIT) is actually okay. That's different with the option
> >> name you (imo validly) chose.
> > 
> > Just to note that the name was suggested by Roger. I have to admit that 
> > I didn't check if this made sense for the existing placement.
> 
> Roger, thoughts?

Right, with the usage of HPET in legacy replacement mode we are no
longer exclusively testing the PIT, so might make sense to use a more
generic name, timer-irq-works or some such.

Thanks, Roger.
diff mbox series

Patch

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 8e65f8bd18bf..c382b061b302 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2535,6 +2535,17 @@  pages) must also be specified via the tbuf_size parameter.
 ### tickle_one_idle_cpu
 > `= <boolean>`
 
+### pit-irq-works (x86)
+> `=<boolean>`
+
+> Default: `false`
+
+Disables the code which tests for broken timer IRQ sources. Enabling
+this option will reduce boot time on HW where the timer works properly.
+
+If the system is unstable when enabling the option, then it means you
+may have a broken HW and therefore the testing cannot be be skipped.
+
 ### timer_slop
 > `= <integer>`
 
diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index d11c880544e6..238b6c1c2837 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -57,6 +57,14 @@  bool __initdata ioapic_ack_forced;
 int __read_mostly nr_ioapic_entries[MAX_IO_APICS];
 int __read_mostly nr_ioapics;
 
+/*
+ * The logic to check if the timer is working is expensive. So allow
+ * the admin to bypass it if they know their platform doesn't have
+ * a buggy timer.
+ */
+static bool __initdata pit_irq_works;
+boolean_param("pit-irq-works", pit_irq_works);
+
 /*
  * Rough estimation of how many shared IRQs there are, can
  * be changed anytime.
@@ -1502,6 +1510,9 @@  static int __init timer_irq_works(void)
 {
     unsigned long t1, flags;
 
+    if ( pit_irq_works )
+        return 1;
+
     t1 = ACCESS_ONCE(pit0_ticks);
 
     local_save_flags(flags);