Message ID | 20230327194126.3573997-3-andrew.cooper3@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86/ucode: Fixes to bootstrap_map() handling | expand |
On 27.03.2023 21:41, Andrew Cooper wrote: > It is not valid to retain a bootstrap_map() across returning back to > __start_xen(), but various pointers get stashed across calls. It's error prone, yes, but "not valid" isn't really true imo: As long as nothing calls bootstrap_map(NULL) all mappings will remain as they are. > --- a/xen/arch/x86/cpu/microcode/core.c > +++ b/xen/arch/x86/cpu/microcode/core.c > @@ -755,47 +755,51 @@ int microcode_update_one(void) > return microcode_update_cpu(NULL); > } > > -static int __init early_update_cache(const void *data, size_t len) > +int __init microcode_init_cache(unsigned long *module_map, > + const struct multiboot_info *mbi) > { > int rc = 0; > struct microcode_patch *patch; > + struct ucode_mod_blob blob = {}; > > - if ( !data ) > - return -ENOMEM; This is lost afaict. To be in sync with earlier code ) think you want to ... > + if ( ucode_scan ) > + /* Need to rescan the modules because they might have been relocated */ > + microcode_scan_module(module_map, mbi); > + > + if ( ucode_mod.mod_end ) > + { > + blob.data = bootstrap_map(&ucode_mod); ... check here instead of ... > + blob.size = ucode_mod.mod_end; > + } > + else if ( ucode_blob.size ) > + { > + blob = ucode_blob; > + } (nit: unnecessary braces) > - patch = parse_blob(data, len); > + if ( !blob.data ) > + return 0; ... here, making the "return 0" the "else" to the earlier if/else-if. Alternatively, if you think the -ENOMEM isn't sensible, I'm happy to consider respective justification for its removal. Jan
On 28/03/2023 2:51 pm, Jan Beulich wrote: > On 27.03.2023 21:41, Andrew Cooper wrote: >> It is not valid to retain a bootstrap_map() across returning back to >> __start_xen(), but various pointers get stashed across calls. > It's error prone, yes, but "not valid" isn't really true imo: As long as > nothing calls bootstrap_map(NULL) all mappings will remain as they are. And how is this code supposed to know whether it's stashed pointer is any good or not? This is precisely "not valid" means. It doesn't mean "it definitely doesn't work", but very much does mean "can't rely on it working". c/s dc380df12ac which introduced microcode_init_cache() demonstrates the problem by having to call scan module a second time to refresh the cached pointer in ucode_blob.data. The code we have today really is buggy, and is having to go out of its way to not trip over. >> --- a/xen/arch/x86/cpu/microcode/core.c >> +++ b/xen/arch/x86/cpu/microcode/core.c >> @@ -755,47 +755,51 @@ int microcode_update_one(void) >> return microcode_update_cpu(NULL); >> } >> >> -static int __init early_update_cache(const void *data, size_t len) >> +int __init microcode_init_cache(unsigned long *module_map, >> + const struct multiboot_info *mbi) >> { >> int rc = 0; >> struct microcode_patch *patch; >> + struct ucode_mod_blob blob = {}; >> >> - if ( !data ) >> - return -ENOMEM; > This is lost afaict. To be in sync with earlier code ) think you want to ... > >> + if ( ucode_scan ) >> + /* Need to rescan the modules because they might have been relocated */ >> + microcode_scan_module(module_map, mbi); >> + >> + if ( ucode_mod.mod_end ) >> + { >> + blob.data = bootstrap_map(&ucode_mod); > ... check here instead of ... > >> + blob.size = ucode_mod.mod_end; >> + } >> + else if ( ucode_blob.size ) >> + { >> + blob = ucode_blob; >> + } > (nit: unnecessary braces) > >> - patch = parse_blob(data, len); >> + if ( !blob.data ) >> + return 0; > ... here, making the "return 0" the "else" to the earlier if/else-if. > > Alternatively, if you think the -ENOMEM isn't sensible, I'm happy to > consider respective justification for its removal. I'm a little on the fence here. Nothing checks the return value at all, and nothing printed a failure previously either. Right now, the early boostrap map limit is 1G-16M, so this really does fail with a large enough initrd to scan. There is a corner case where the second pass can succeed where the first one failed, but this is also fixed in the general case when thread 1 loads microcode (and updates the BSP too.) I'm also not convinced that early bootstrap mapping is safe if the bootloader places Xen in [16M, 1G) but if that goes wrong, it will go wrong before we get to microcode loading. Overall, I think that bootstrap_map() itself should warn (and ideally provide a backtrace) if it fails to map, because one way or another it's an issue wanting fixing. Individual callers can't really do anything useful. ~Andrew
On 28.03.2023 17:07, Andrew Cooper wrote: > On 28/03/2023 2:51 pm, Jan Beulich wrote: >> On 27.03.2023 21:41, Andrew Cooper wrote: >>> It is not valid to retain a bootstrap_map() across returning back to >>> __start_xen(), but various pointers get stashed across calls. >> It's error prone, yes, but "not valid" isn't really true imo: As long as >> nothing calls bootstrap_map(NULL) all mappings will remain as they are. > > And how is this code supposed to know whether it's stashed pointer is > any good or not? > > This is precisely "not valid" means. It doesn't mean "it definitely > doesn't work", but very much does mean "can't rely on it working". Hmm, not my understanding of "not valid". > c/s dc380df12ac which introduced microcode_init_cache() demonstrates the > problem by having to call scan module a second time to refresh the > cached pointer in ucode_blob.data. > > The code we have today really is buggy, and is having to go out of its > way to not trip over. Right - the code is fragile. And it's okay calling it so. >>> --- a/xen/arch/x86/cpu/microcode/core.c >>> +++ b/xen/arch/x86/cpu/microcode/core.c >>> @@ -755,47 +755,51 @@ int microcode_update_one(void) >>> return microcode_update_cpu(NULL); >>> } >>> >>> -static int __init early_update_cache(const void *data, size_t len) >>> +int __init microcode_init_cache(unsigned long *module_map, >>> + const struct multiboot_info *mbi) >>> { >>> int rc = 0; >>> struct microcode_patch *patch; >>> + struct ucode_mod_blob blob = {}; >>> >>> - if ( !data ) >>> - return -ENOMEM; >> This is lost afaict. To be in sync with earlier code ) think you want to ... >> >>> + if ( ucode_scan ) >>> + /* Need to rescan the modules because they might have been relocated */ >>> + microcode_scan_module(module_map, mbi); >>> + >>> + if ( ucode_mod.mod_end ) >>> + { >>> + blob.data = bootstrap_map(&ucode_mod); >> ... check here instead of ... >> >>> + blob.size = ucode_mod.mod_end; >>> + } >>> + else if ( ucode_blob.size ) >>> + { >>> + blob = ucode_blob; >>> + } >> (nit: unnecessary braces) >> >>> - patch = parse_blob(data, len); >>> + if ( !blob.data ) >>> + return 0; >> ... here, making the "return 0" the "else" to the earlier if/else-if. >> >> Alternatively, if you think the -ENOMEM isn't sensible, I'm happy to >> consider respective justification for its removal. > > I'm a little on the fence here. Nothing checks the return value at all, > and nothing printed a failure previously either. > > Right now, the early boostrap map limit is 1G-16M, so this really does > fail with a large enough initrd to scan. There is a corner case where > the second pass can succeed where the first one failed, but this is also > fixed in the general case when thread 1 loads microcode (and updates the > BSP too.) As said - with the removal justified in the description (e.g. by an abridged version of the above), I'm okay. > I'm also not convinced that early bootstrap mapping is safe if the > bootloader places Xen in [16M, 1G) but if that goes wrong, it will go > wrong before we get to microcode loading. Hmm, yes, this could be a concern on systems with very little memory below 4G. Jan > Overall, I think that bootstrap_map() itself should warn (and ideally > provide a backtrace) if it fails to map, because one way or another it's > an issue wanting fixing. Individual callers can't really do anything > useful. > > ~Andrew
On 28.03.2023 17:07, Andrew Cooper wrote: > On 28/03/2023 2:51 pm, Jan Beulich wrote: >> On 27.03.2023 21:41, Andrew Cooper wrote: >>> --- a/xen/arch/x86/cpu/microcode/core.c >>> +++ b/xen/arch/x86/cpu/microcode/core.c >>> @@ -755,47 +755,51 @@ int microcode_update_one(void) >>> return microcode_update_cpu(NULL); >>> } >>> >>> -static int __init early_update_cache(const void *data, size_t len) >>> +int __init microcode_init_cache(unsigned long *module_map, >>> + const struct multiboot_info *mbi) >>> { >>> int rc = 0; >>> struct microcode_patch *patch; >>> + struct ucode_mod_blob blob = {}; >>> >>> - if ( !data ) >>> - return -ENOMEM; >> This is lost afaict. To be in sync with earlier code ) think you want to ... >> >>> + if ( ucode_scan ) >>> + /* Need to rescan the modules because they might have been relocated */ >>> + microcode_scan_module(module_map, mbi); >>> + >>> + if ( ucode_mod.mod_end ) >>> + { >>> + blob.data = bootstrap_map(&ucode_mod); >> ... check here instead of ... >> >>> + blob.size = ucode_mod.mod_end; >>> + } >>> + else if ( ucode_blob.size ) >>> + { >>> + blob = ucode_blob; >>> + } >> (nit: unnecessary braces) >> >>> - patch = parse_blob(data, len); >>> + if ( !blob.data ) >>> + return 0; >> ... here, making the "return 0" the "else" to the earlier if/else-if. >> >> Alternatively, if you think the -ENOMEM isn't sensible, I'm happy to >> consider respective justification for its removal. > > I'm a little on the fence here. Nothing checks the return value at all, > and nothing printed a failure previously either. So how about switching both microcode_init_cache() and, considering the similar aspect in patch 3, early_microcode_init() to return void? There's hardly any use the caller can make of the return value; if an error message was to be logged, it likely would better be logged closer to the source of the error. Jan
On Tue, Mar 28, 2023 at 4:58 PM Jan Beulich <jbeulich@suse.com> wrote: > On 28.03.2023 17:07, Andrew Cooper wrote: > > On 28/03/2023 2:51 pm, Jan Beulich wrote: > >> On 27.03.2023 21:41, Andrew Cooper wrote: > >>> It is not valid to retain a bootstrap_map() across returning back to > >>> __start_xen(), but various pointers get stashed across calls. > >> It's error prone, yes, but "not valid" isn't really true imo: As long as > >> nothing calls bootstrap_map(NULL) all mappings will remain as they are. > > > > And how is this code supposed to know whether it's stashed pointer is > > any good or not? > > > > This is precisely "not valid" means. It doesn't mean "it definitely > > doesn't work", but very much does mean "can't rely on it working". > > Hmm, not my understanding of "not valid". > A "valid" approach or algorithm is one which can be relied on. If an approach or algorithm may sometimes work or may sometimes not work, it's not valid. That said: * "Not valid" is kind of vague: it tells you think it's "bad", but not how. (Even "racy" or "risky" or "error-prone" are more descriptive, because it prompts you for the types of problems to think about.) It's usually better to state exactly what problems might happen if you do X, rather than simply saying X is "not valid". * It's usually better to propose specific alternate wording, rather than arguing about whether a given wording is... er, valid or not. -George
On 29.03.2023 11:13, George Dunlap wrote: > On Tue, Mar 28, 2023 at 4:58 PM Jan Beulich <jbeulich@suse.com> wrote: > >> On 28.03.2023 17:07, Andrew Cooper wrote: >>> On 28/03/2023 2:51 pm, Jan Beulich wrote: >>>> On 27.03.2023 21:41, Andrew Cooper wrote: >>>>> It is not valid to retain a bootstrap_map() across returning back to >>>>> __start_xen(), but various pointers get stashed across calls. >>>> It's error prone, yes, but "not valid" isn't really true imo: As long as >>>> nothing calls bootstrap_map(NULL) all mappings will remain as they are. >>> >>> And how is this code supposed to know whether it's stashed pointer is >>> any good or not? >>> >>> This is precisely "not valid" means. It doesn't mean "it definitely >>> doesn't work", but very much does mean "can't rely on it working". >> >> Hmm, not my understanding of "not valid". >> > > A "valid" approach or algorithm is one which can be relied on. If an > approach or algorithm may sometimes work or may sometimes not work, it's > not valid. Interesting. Still not my understanding of "not valid", not the least because "may work" again depends on further aspects. In the case here, carefully placed (or avoided) bootstrap_map(NULL) means the logic, which has served us well for years, does always work - so long as you apply sufficient care. Over time that "sufficient care" has arguably turned into "overly much care", and hence Andrew's rework here is definitely an improvement. All I dislike is to call things worse than they really are. (Another thing would be if "sufficient care" wasn't applied at some point, and if as a result we actually had active breakage somewhere.) > That said: > > * "Not valid" is kind of vague: it tells you think it's "bad", but not > how. (Even "racy" or "risky" or "error-prone" are more descriptive, > because it prompts you for the types of problems to think about.) It's > usually better to state exactly what problems might happen if you do X, > rather than simply saying X is "not valid". > > * It's usually better to propose specific alternate wording, rather than > arguing about whether a given wording is... er, valid or not. Which I did (without explicitly saying that this is an alternative wording proposal) by starting with "It's error prone, ...". Similarly (iirc) in replies elsewhere on this series' thread. Jan
diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c index 11c471cc3f83..3d23e3ed7ee4 100644 --- a/xen/arch/x86/cpu/microcode/core.c +++ b/xen/arch/x86/cpu/microcode/core.c @@ -755,47 +755,51 @@ int microcode_update_one(void) return microcode_update_cpu(NULL); } -static int __init early_update_cache(const void *data, size_t len) +int __init microcode_init_cache(unsigned long *module_map, + const struct multiboot_info *mbi) { int rc = 0; struct microcode_patch *patch; + struct ucode_mod_blob blob = {}; - if ( !data ) - return -ENOMEM; + if ( ucode_scan ) + /* Need to rescan the modules because they might have been relocated */ + microcode_scan_module(module_map, mbi); + + if ( ucode_mod.mod_end ) + { + blob.data = bootstrap_map(&ucode_mod); + blob.size = ucode_mod.mod_end; + } + else if ( ucode_blob.size ) + { + blob = ucode_blob; + } - patch = parse_blob(data, len); + if ( !blob.data ) + return 0; + + patch = parse_blob(blob.data, blob.size); if ( IS_ERR(patch) ) { - printk(XENLOG_WARNING "Parsing microcode blob error %ld\n", - PTR_ERR(patch)); - return PTR_ERR(patch); + rc = PTR_ERR(patch); + printk(XENLOG_WARNING "Parsing microcode blob error %d\n", rc); + goto out; } if ( !patch ) - return -ENOENT; + { + rc = -ENOENT; + goto out; + } spin_lock(µcode_mutex); rc = microcode_update_cache(patch); spin_unlock(µcode_mutex); ASSERT(rc); - return rc; -} - -int __init microcode_init_cache(unsigned long *module_map, - const struct multiboot_info *mbi) -{ - int rc = 0; - - if ( ucode_scan ) - /* Need to rescan the modules because they might have been relocated */ - microcode_scan_module(module_map, mbi); - - if ( ucode_mod.mod_end ) - rc = early_update_cache(bootstrap_map(&ucode_mod), - ucode_mod.mod_end); - else if ( ucode_blob.size ) - rc = early_update_cache(ucode_blob.data, ucode_blob.size); + out: + bootstrap_map(NULL); return rc; }
It is not valid to retain a bootstrap_map() across returning back to __start_xen(), but various pointers get stashed across calls. Begin to address this by folding early_update_cache() into it's single caller, rearranging the exit path to always invalidate the mapping. No practical change. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> --- CC: Jan Beulich <JBeulich@suse.com> CC: Roger Pau Monné <roger.pau@citrix.com> CC: Wei Liu <wl@xen.org> CC: Sergey Dyasli <sergey.dyasli@citrix.com> --- xen/arch/x86/cpu/microcode/core.c | 54 +++++++++++++++++-------------- 1 file changed, 29 insertions(+), 25 deletions(-)