diff mbox series

[2/5] x86/ucode: Fold early_update_cache() into microcode_init_cache()

Message ID 20230327194126.3573997-3-andrew.cooper3@citrix.com (mailing list archive)
State New, archived
Headers show
Series x86/ucode: Fixes to bootstrap_map() handling | expand

Commit Message

Andrew Cooper March 27, 2023, 7:41 p.m. UTC
It is not valid to retain a bootstrap_map() across returning back to
__start_xen(), but various pointers get stashed across calls.

Begin to address this by folding early_update_cache() into it's single caller,
rearranging the exit path to always invalidate the mapping.

No practical change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Sergey Dyasli <sergey.dyasli@citrix.com>
---
 xen/arch/x86/cpu/microcode/core.c | 54 +++++++++++++++++--------------
 1 file changed, 29 insertions(+), 25 deletions(-)

Comments

Jan Beulich March 28, 2023, 1:51 p.m. UTC | #1
On 27.03.2023 21:41, Andrew Cooper wrote:
> It is not valid to retain a bootstrap_map() across returning back to
> __start_xen(), but various pointers get stashed across calls.

It's error prone, yes, but "not valid" isn't really true imo: As long as
nothing calls bootstrap_map(NULL) all mappings will remain as they are.

> --- a/xen/arch/x86/cpu/microcode/core.c
> +++ b/xen/arch/x86/cpu/microcode/core.c
> @@ -755,47 +755,51 @@ int microcode_update_one(void)
>      return microcode_update_cpu(NULL);
>  }
>  
> -static int __init early_update_cache(const void *data, size_t len)
> +int __init microcode_init_cache(unsigned long *module_map,
> +                                const struct multiboot_info *mbi)
>  {
>      int rc = 0;
>      struct microcode_patch *patch;
> +    struct ucode_mod_blob blob = {};
>  
> -    if ( !data )
> -        return -ENOMEM;

This is lost afaict. To be in sync with earlier code ) think you want to ...

> +    if ( ucode_scan )
> +        /* Need to rescan the modules because they might have been relocated */
> +        microcode_scan_module(module_map, mbi);
> +
> +    if ( ucode_mod.mod_end )
> +    {
> +        blob.data = bootstrap_map(&ucode_mod);

... check here instead of ...

> +        blob.size = ucode_mod.mod_end;
> +    }
> +    else if ( ucode_blob.size )
> +    {
> +        blob = ucode_blob;
> +    }

(nit: unnecessary braces)

> -    patch = parse_blob(data, len);
> +    if ( !blob.data )
> +        return 0;

... here, making the "return 0" the "else" to the earlier if/else-if.

Alternatively, if you think the -ENOMEM isn't sensible, I'm happy to
consider respective justification for its removal.

Jan
Andrew Cooper March 28, 2023, 3:07 p.m. UTC | #2
On 28/03/2023 2:51 pm, Jan Beulich wrote:
> On 27.03.2023 21:41, Andrew Cooper wrote:
>> It is not valid to retain a bootstrap_map() across returning back to
>> __start_xen(), but various pointers get stashed across calls.
> It's error prone, yes, but "not valid" isn't really true imo: As long as
> nothing calls bootstrap_map(NULL) all mappings will remain as they are.

And how is this code supposed to know whether it's stashed pointer is
any good or not?

This is precisely "not valid" means.  It doesn't mean "it definitely
doesn't work", but very much does mean "can't rely on it working".

c/s dc380df12ac which introduced microcode_init_cache() demonstrates the
problem by having to call scan module a second time to refresh the
cached pointer in ucode_blob.data.

The code we have today really is buggy, and is having to go out of its
way to not trip over.

>> --- a/xen/arch/x86/cpu/microcode/core.c
>> +++ b/xen/arch/x86/cpu/microcode/core.c
>> @@ -755,47 +755,51 @@ int microcode_update_one(void)
>>      return microcode_update_cpu(NULL);
>>  }
>>  
>> -static int __init early_update_cache(const void *data, size_t len)
>> +int __init microcode_init_cache(unsigned long *module_map,
>> +                                const struct multiboot_info *mbi)
>>  {
>>      int rc = 0;
>>      struct microcode_patch *patch;
>> +    struct ucode_mod_blob blob = {};
>>  
>> -    if ( !data )
>> -        return -ENOMEM;
> This is lost afaict. To be in sync with earlier code ) think you want to ...
>
>> +    if ( ucode_scan )
>> +        /* Need to rescan the modules because they might have been relocated */
>> +        microcode_scan_module(module_map, mbi);
>> +
>> +    if ( ucode_mod.mod_end )
>> +    {
>> +        blob.data = bootstrap_map(&ucode_mod);
> ... check here instead of ...
>
>> +        blob.size = ucode_mod.mod_end;
>> +    }
>> +    else if ( ucode_blob.size )
>> +    {
>> +        blob = ucode_blob;
>> +    }
> (nit: unnecessary braces)
>
>> -    patch = parse_blob(data, len);
>> +    if ( !blob.data )
>> +        return 0;
> ... here, making the "return 0" the "else" to the earlier if/else-if.
>
> Alternatively, if you think the -ENOMEM isn't sensible, I'm happy to
> consider respective justification for its removal.

I'm a little on the fence here.  Nothing checks the return value at all,
and nothing printed a failure previously either.

Right now, the early boostrap map limit is 1G-16M, so this really does
fail with a large enough initrd to scan.  There is a corner case where
the second pass can succeed where the first one failed, but this is also
fixed in the general case when thread 1 loads microcode (and updates the
BSP too.)

I'm also not convinced that early bootstrap mapping is safe if the
bootloader places Xen in [16M, 1G) but if that goes wrong, it will go
wrong before we get to microcode loading.

Overall, I think that bootstrap_map() itself should warn (and ideally
provide a backtrace) if it fails to map, because one way or another it's
an issue wanting fixing.  Individual callers can't really do anything
useful.

~Andrew
Jan Beulich March 28, 2023, 3:57 p.m. UTC | #3
On 28.03.2023 17:07, Andrew Cooper wrote:
> On 28/03/2023 2:51 pm, Jan Beulich wrote:
>> On 27.03.2023 21:41, Andrew Cooper wrote:
>>> It is not valid to retain a bootstrap_map() across returning back to
>>> __start_xen(), but various pointers get stashed across calls.
>> It's error prone, yes, but "not valid" isn't really true imo: As long as
>> nothing calls bootstrap_map(NULL) all mappings will remain as they are.
> 
> And how is this code supposed to know whether it's stashed pointer is
> any good or not?
> 
> This is precisely "not valid" means.  It doesn't mean "it definitely
> doesn't work", but very much does mean "can't rely on it working".

Hmm, not my understanding of "not valid".

> c/s dc380df12ac which introduced microcode_init_cache() demonstrates the
> problem by having to call scan module a second time to refresh the
> cached pointer in ucode_blob.data.
> 
> The code we have today really is buggy, and is having to go out of its
> way to not trip over.

Right - the code is fragile. And it's okay calling it so.

>>> --- a/xen/arch/x86/cpu/microcode/core.c
>>> +++ b/xen/arch/x86/cpu/microcode/core.c
>>> @@ -755,47 +755,51 @@ int microcode_update_one(void)
>>>      return microcode_update_cpu(NULL);
>>>  }
>>>  
>>> -static int __init early_update_cache(const void *data, size_t len)
>>> +int __init microcode_init_cache(unsigned long *module_map,
>>> +                                const struct multiboot_info *mbi)
>>>  {
>>>      int rc = 0;
>>>      struct microcode_patch *patch;
>>> +    struct ucode_mod_blob blob = {};
>>>  
>>> -    if ( !data )
>>> -        return -ENOMEM;
>> This is lost afaict. To be in sync with earlier code ) think you want to ...
>>
>>> +    if ( ucode_scan )
>>> +        /* Need to rescan the modules because they might have been relocated */
>>> +        microcode_scan_module(module_map, mbi);
>>> +
>>> +    if ( ucode_mod.mod_end )
>>> +    {
>>> +        blob.data = bootstrap_map(&ucode_mod);
>> ... check here instead of ...
>>
>>> +        blob.size = ucode_mod.mod_end;
>>> +    }
>>> +    else if ( ucode_blob.size )
>>> +    {
>>> +        blob = ucode_blob;
>>> +    }
>> (nit: unnecessary braces)
>>
>>> -    patch = parse_blob(data, len);
>>> +    if ( !blob.data )
>>> +        return 0;
>> ... here, making the "return 0" the "else" to the earlier if/else-if.
>>
>> Alternatively, if you think the -ENOMEM isn't sensible, I'm happy to
>> consider respective justification for its removal.
> 
> I'm a little on the fence here.  Nothing checks the return value at all,
> and nothing printed a failure previously either.
> 
> Right now, the early boostrap map limit is 1G-16M, so this really does
> fail with a large enough initrd to scan.  There is a corner case where
> the second pass can succeed where the first one failed, but this is also
> fixed in the general case when thread 1 loads microcode (and updates the
> BSP too.)

As said - with the removal justified in the description (e.g. by an
abridged version of the above), I'm okay.

> I'm also not convinced that early bootstrap mapping is safe if the
> bootloader places Xen in [16M, 1G) but if that goes wrong, it will go
> wrong before we get to microcode loading.

Hmm, yes, this could be a concern on systems with very little memory
below 4G.

Jan

> Overall, I think that bootstrap_map() itself should warn (and ideally
> provide a backtrace) if it fails to map, because one way or another it's
> an issue wanting fixing.  Individual callers can't really do anything
> useful.
> 
> ~Andrew
Jan Beulich March 29, 2023, 7:41 a.m. UTC | #4
On 28.03.2023 17:07, Andrew Cooper wrote:
> On 28/03/2023 2:51 pm, Jan Beulich wrote:
>> On 27.03.2023 21:41, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/cpu/microcode/core.c
>>> +++ b/xen/arch/x86/cpu/microcode/core.c
>>> @@ -755,47 +755,51 @@ int microcode_update_one(void)
>>>      return microcode_update_cpu(NULL);
>>>  }
>>>  
>>> -static int __init early_update_cache(const void *data, size_t len)
>>> +int __init microcode_init_cache(unsigned long *module_map,
>>> +                                const struct multiboot_info *mbi)
>>>  {
>>>      int rc = 0;
>>>      struct microcode_patch *patch;
>>> +    struct ucode_mod_blob blob = {};
>>>  
>>> -    if ( !data )
>>> -        return -ENOMEM;
>> This is lost afaict. To be in sync with earlier code ) think you want to ...
>>
>>> +    if ( ucode_scan )
>>> +        /* Need to rescan the modules because they might have been relocated */
>>> +        microcode_scan_module(module_map, mbi);
>>> +
>>> +    if ( ucode_mod.mod_end )
>>> +    {
>>> +        blob.data = bootstrap_map(&ucode_mod);
>> ... check here instead of ...
>>
>>> +        blob.size = ucode_mod.mod_end;
>>> +    }
>>> +    else if ( ucode_blob.size )
>>> +    {
>>> +        blob = ucode_blob;
>>> +    }
>> (nit: unnecessary braces)
>>
>>> -    patch = parse_blob(data, len);
>>> +    if ( !blob.data )
>>> +        return 0;
>> ... here, making the "return 0" the "else" to the earlier if/else-if.
>>
>> Alternatively, if you think the -ENOMEM isn't sensible, I'm happy to
>> consider respective justification for its removal.
> 
> I'm a little on the fence here.  Nothing checks the return value at all,
> and nothing printed a failure previously either.

So how about switching both microcode_init_cache() and, considering
the similar aspect in patch 3, early_microcode_init() to return void?
There's hardly any use the caller can make of the return value; if an
error message was to be logged, it likely would better be logged
closer to the source of the error.

Jan
George Dunlap March 29, 2023, 9:13 a.m. UTC | #5
On Tue, Mar 28, 2023 at 4:58 PM Jan Beulich <jbeulich@suse.com> wrote:

> On 28.03.2023 17:07, Andrew Cooper wrote:
> > On 28/03/2023 2:51 pm, Jan Beulich wrote:
> >> On 27.03.2023 21:41, Andrew Cooper wrote:
> >>> It is not valid to retain a bootstrap_map() across returning back to
> >>> __start_xen(), but various pointers get stashed across calls.
> >> It's error prone, yes, but "not valid" isn't really true imo: As long as
> >> nothing calls bootstrap_map(NULL) all mappings will remain as they are.
> >
> > And how is this code supposed to know whether it's stashed pointer is
> > any good or not?
> >
> > This is precisely "not valid" means.  It doesn't mean "it definitely
> > doesn't work", but very much does mean "can't rely on it working".
>
> Hmm, not my understanding of "not valid".
>

A "valid" approach or algorithm is one which can be relied on.  If an
approach or algorithm may sometimes work or may sometimes not work, it's
not valid.

That said:

* "Not valid" is kind of vague: it tells you think it's "bad", but not
how.  (Even "racy" or "risky" or "error-prone" are more descriptive,
because it prompts you for the types of problems to think about.) It's
usually better to state exactly what problems might happen if you do X,
rather than simply saying X is "not valid".

* It's usually better to propose specific alternate wording, rather than
arguing about whether a given wording is... er, valid or not.

 -George
Jan Beulich March 29, 2023, 9:26 a.m. UTC | #6
On 29.03.2023 11:13, George Dunlap wrote:
> On Tue, Mar 28, 2023 at 4:58 PM Jan Beulich <jbeulich@suse.com> wrote:
> 
>> On 28.03.2023 17:07, Andrew Cooper wrote:
>>> On 28/03/2023 2:51 pm, Jan Beulich wrote:
>>>> On 27.03.2023 21:41, Andrew Cooper wrote:
>>>>> It is not valid to retain a bootstrap_map() across returning back to
>>>>> __start_xen(), but various pointers get stashed across calls.
>>>> It's error prone, yes, but "not valid" isn't really true imo: As long as
>>>> nothing calls bootstrap_map(NULL) all mappings will remain as they are.
>>>
>>> And how is this code supposed to know whether it's stashed pointer is
>>> any good or not?
>>>
>>> This is precisely "not valid" means.  It doesn't mean "it definitely
>>> doesn't work", but very much does mean "can't rely on it working".
>>
>> Hmm, not my understanding of "not valid".
>>
> 
> A "valid" approach or algorithm is one which can be relied on.  If an
> approach or algorithm may sometimes work or may sometimes not work, it's
> not valid.

Interesting. Still not my understanding of "not valid", not the least
because "may work" again depends on further aspects. In the case here,
carefully placed (or avoided) bootstrap_map(NULL) means the logic, which
has served us well for years, does always work - so long as you apply
sufficient care. Over time that "sufficient care" has arguably turned
into "overly much care", and hence Andrew's rework here is definitely an
improvement. All I dislike is to call things worse than they really are.
(Another thing would be if "sufficient care" wasn't applied at some
point, and if as a result we actually had active breakage somewhere.)

> That said:
> 
> * "Not valid" is kind of vague: it tells you think it's "bad", but not
> how.  (Even "racy" or "risky" or "error-prone" are more descriptive,
> because it prompts you for the types of problems to think about.) It's
> usually better to state exactly what problems might happen if you do X,
> rather than simply saying X is "not valid".
> 
> * It's usually better to propose specific alternate wording, rather than
> arguing about whether a given wording is... er, valid or not.

Which I did (without explicitly saying that this is an alternative wording
proposal) by starting with "It's error prone, ...". Similarly (iirc) in
replies elsewhere on this series' thread.

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c
index 11c471cc3f83..3d23e3ed7ee4 100644
--- a/xen/arch/x86/cpu/microcode/core.c
+++ b/xen/arch/x86/cpu/microcode/core.c
@@ -755,47 +755,51 @@  int microcode_update_one(void)
     return microcode_update_cpu(NULL);
 }
 
-static int __init early_update_cache(const void *data, size_t len)
+int __init microcode_init_cache(unsigned long *module_map,
+                                const struct multiboot_info *mbi)
 {
     int rc = 0;
     struct microcode_patch *patch;
+    struct ucode_mod_blob blob = {};
 
-    if ( !data )
-        return -ENOMEM;
+    if ( ucode_scan )
+        /* Need to rescan the modules because they might have been relocated */
+        microcode_scan_module(module_map, mbi);
+
+    if ( ucode_mod.mod_end )
+    {
+        blob.data = bootstrap_map(&ucode_mod);
+        blob.size = ucode_mod.mod_end;
+    }
+    else if ( ucode_blob.size )
+    {
+        blob = ucode_blob;
+    }
 
-    patch = parse_blob(data, len);
+    if ( !blob.data )
+        return 0;
+
+    patch = parse_blob(blob.data, blob.size);
     if ( IS_ERR(patch) )
     {
-        printk(XENLOG_WARNING "Parsing microcode blob error %ld\n",
-               PTR_ERR(patch));
-        return PTR_ERR(patch);
+        rc = PTR_ERR(patch);
+        printk(XENLOG_WARNING "Parsing microcode blob error %d\n", rc);
+        goto out;
     }
 
     if ( !patch )
-        return -ENOENT;
+    {
+        rc = -ENOENT;
+        goto out;
+    }
 
     spin_lock(&microcode_mutex);
     rc = microcode_update_cache(patch);
     spin_unlock(&microcode_mutex);
     ASSERT(rc);
 
-    return rc;
-}
-
-int __init microcode_init_cache(unsigned long *module_map,
-                                const struct multiboot_info *mbi)
-{
-    int rc = 0;
-
-    if ( ucode_scan )
-        /* Need to rescan the modules because they might have been relocated */
-        microcode_scan_module(module_map, mbi);
-
-    if ( ucode_mod.mod_end )
-        rc = early_update_cache(bootstrap_map(&ucode_mod),
-                                ucode_mod.mod_end);
-    else if ( ucode_blob.size )
-        rc = early_update_cache(ucode_blob.data, ucode_blob.size);
+ out:
+    bootstrap_map(NULL);
 
     return rc;
 }