Message ID | 1554492364-21684-1-git-send-email-andrew.cooper3@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86/livepatch: Fail the build if duplicate symbols exist | expand |
On Fri, Apr 05, 2019 at 08:26:04PM +0100, Andrew Cooper wrote: > From: Ross Lagerwall <ross.lagerwall@citrix.com> > > The binary diffing algorithm used by xen-livepatch depends on having unique > symbols. > > Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > --- > CC: Jan Beulich <JBeulich@suse.com> > CC: Wei Liu <wei.liu2@citrix.com> > CC: Roger Pau Monné <roger.pau@citrix.com> > CC: Stefano Stabellini <sstabellini@kernel.org> > CC: Julien Grall <julien.grall@arm.com> > CC: Ross Lagerwall <ross.lagerwall@citrix.com> > CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > CC: Norbert Manthey <nmanthey@amazon.de> > > This patch has been part of the XenServer patchqueue for ages, and should have > been upstreamed sooner. It obviously can't be applied while the "block > speculation" issue is outstanding. > --- > xen/arch/x86/Makefile | 1 + > xen/tools/symbols.c | 11 +++++++++-- > 2 files changed, 10 insertions(+), 2 deletions(-) > > diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile > index ef09939..d4ae46b 100644 > --- a/xen/arch/x86/Makefile > +++ b/xen/arch/x86/Makefile > @@ -98,6 +98,7 @@ endif > > syms-warn-dup-y := --warn-dup > syms-warn-dup-$(CONFIG_SUPPRESS_DUPLICATE_SYMBOL_WARNINGS) := > +syms-warn-dup-$(CONFIG_LIVEPATCH) := --error-dup > > $(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32 > ./boot/mkelf32 $(notes_phdrs) $(TARGET)-syms $(TARGET) $(XEN_IMG_OFFSET) \ > diff --git a/xen/tools/symbols.c b/xen/tools/symbols.c > index 05139d1..9f9e2c9 100644 > --- a/xen/tools/symbols.c > +++ b/xen/tools/symbols.c > @@ -599,7 +599,7 @@ static int compare_name(const void *p1, const void *p2) > int main(int argc, char **argv) > { > unsigned int i; > - bool unsorted = false, warn_dup = false; > + bool unsorted = false, warn_dup = false, error_dup = false, found_dup = false; > > if (argc >= 2) { > for (i = 1; i < argc; i++) { > @@ -619,6 +619,8 @@ int main(int argc, char **argv) > sort_by_name = 1; > else if (strcmp(argv[i], "--warn-dup") == 0) > warn_dup = true; > + else if (strcmp(argv[i], "--error-dup") == 0) > + warn_dup = error_dup = true; > else if (strcmp(argv[i], "--xensyms") == 0) > map_only = true; > else > @@ -634,14 +636,19 @@ int main(int argc, char **argv) > for (i = 1; i < table_cnt; ++i) > if (strcmp(SYMBOL_NAME(table + i - 1), > SYMBOL_NAME(table + i)) == 0 && > - table[i - 1].addr != table[i].addr) > + table[i - 1].addr != table[i].addr) { > fprintf(stderr, > "Duplicate symbol '%s' (%llx != %llx)\n", > SYMBOL_NAME(table + i), > table[i].addr, table[i - 1].addr); > + found_dup = true; > + } > unsorted = true; > } > > + if (error_dup && found_dup) > + exit(1); > + > if (unsorted) > qsort(table, table_cnt, sizeof(*table), compare_value); > > -- > 2.1.4 >
>>> On 05.04.19 at 23:02, <konrad.wilk@oracle.com> wrote: > On Fri, Apr 05, 2019 at 08:26:04PM +0100, Andrew Cooper wrote: >> From: Ross Lagerwall <ross.lagerwall@citrix.com> >> >> The binary diffing algorithm used by xen-livepatch depends on having unique >> symbols. Question: Is this an inherent requirement, or an effect of the current implementation? It would seem to me that at least in the common case (binary didn't changer overly heavily) it ought to be possible to re-associate the duplicates with their correct origins. And by making the build fail (including in cases like the one we have right now), we're painting ourselves into the corner of having to "fix" code which isn't actually buggy. Was the option of altering (amending) the symbol name instead of causing failure considered? Again, in the common case this shouldn't have overly bad implications: Typically the order of symbols wouldn't change between builds, so tagging the duplicates with a sequence number might be good enough. And even if the order changed, as long as both instances in fact derive from the same inline function, all would still be fine afaict. Granted there's some heuristic underlying this, in that we then would hope for there not to be actually differing functions with the same source file and symbol names. >> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> As for the time being this is hopefully going to do (once, as you say, the currently present issue has been addressed): Acked-by: Jan Beulich <jbeulich@suse.com> Jan
On 08/04/2019 10:52, Jan Beulich wrote: >>>> On 05.04.19 at 23:02, <konrad.wilk@oracle.com> wrote: >> On Fri, Apr 05, 2019 at 08:26:04PM +0100, Andrew Cooper wrote: >>> From: Ross Lagerwall <ross.lagerwall@citrix.com> >>> >>> The binary diffing algorithm used by xen-livepatch depends on having unique >>> symbols. > Question: Is this an inherent requirement, or an effect of the > current implementation? It would seem to me that at least in the > common case (binary didn't changer overly heavily) it ought to > be possible to re-associate the duplicates with their correct > origins. And by making the build fail (including in cases like the > one we have right now), we're painting ourselves into the > corner of having to "fix" code which isn't actually buggy. It is inherent. The tool needs a unique way to identify each block of code, so it can work out if the binary has changed by more than just relocations. How do you propose disambiguating ambiguous symbols after the fact? (This is semi rhetorical, because even if there is a possible way, there is no plausible way it is the sensible option to take.) > > Was the option of altering (amending) the symbol name instead > of causing failure considered? Amending how? We've already got one bodge to prepend the filename to static symbols. The problem here is that we have two domctl.c's which both out-of-line the functions, and the compiler-adjustment of .isra.0 is deterministic based on the function parameters. > Again, in the common case this > shouldn't have overly bad implications: Typically the order of > symbols wouldn't change between builds, so tagging the > duplicates with a sequence number might be good enough. This is painting ourselves into a corner with LTO. Outside of LTO, the link order of files doesn't change, but the order of symbols in a file changes a fair bit. Also, this reasoning depends on people only making trivial changes via livepatch, which, as it turns out, isn't the case in practice. > And even if the order changed, as long as both instances in fact > derive from the same inline function, all would still be fine afaict. How do you propose that we determine this? > Granted there's some heuristic underlying this, in that we then > would hope for there not to be actually differing functions with > the same source file and symbol names. The more assumptions we start making, the more subtly this will go wrong in the future. > >>> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> >>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> >> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > As for the time being this is hopefully going to do (once, as you > say, the currently present issue has been addressed): > Acked-by: Jan Beulich <jbeulich@suse.com> Thanks, ~Andrew
>>> On 08.04.19 at 12:11, <andrew.cooper3@citrix.com> wrote: > On 08/04/2019 10:52, Jan Beulich wrote: >>>>> On 05.04.19 at 23:02, <konrad.wilk@oracle.com> wrote: >>> On Fri, Apr 05, 2019 at 08:26:04PM +0100, Andrew Cooper wrote: >>>> From: Ross Lagerwall <ross.lagerwall@citrix.com> >>>> >>>> The binary diffing algorithm used by xen-livepatch depends on having unique >>>> symbols. >> Question: Is this an inherent requirement, or an effect of the >> current implementation? It would seem to me that at least in the >> common case (binary didn't changer overly heavily) it ought to >> be possible to re-associate the duplicates with their correct >> origins. And by making the build fail (including in cases like the >> one we have right now), we're painting ourselves into the >> corner of having to "fix" code which isn't actually buggy. > > It is inherent. The tool needs a unique way to identify each block of > code, so it can work out if the binary has changed by more than just > relocations. > > How do you propose disambiguating ambiguous symbols after the fact? > > (This is semi rhetorical, because even if there is a possible way, there > is no plausible way it is the sensible option to take.) I don't understand why you consider it impossible for this to be a sensible option. First of all, from the very beginning of live-patching's existence I've been questioning it having got made dependent on the internal symbol table we generate, for there being a chance of it being broken in some way. As to disambiguating: Other nearby local symbols could surely help. This might be a problem with -ffunction-sections, but other than I thought we don't appear to be using it - at least I couldn't get grep to produce a hit in xen/, config/, and the tree root. Finally we likely could avoid the ambiguities altogether, e.g. by making the compiler emit (relative) directories when populating STT_FILE symbols. We already qualify multiply compiled source files by an object file name; I don't see why we couldn't also suitably qualify singly compiled files (and perhaps there's even a compiler option, such that we wouldn't need to issue .file assembler directives ourselves). Another possibility to avoid (false) ambiguities could be to make the non-inline expansions of intended-to-be-inline functions link-once (which by implication means non-static, and hence necessarily uniquely named). >> Was the option of altering (amending) the symbol name instead >> of causing failure considered? > > Amending how? We've already got one bodge to prepend the filename to > static symbols. Right, in an attempt to disambiguate them. If the file names included (relative) paths, this would have been enough. Since they don't (for now at least), a sequence number might help. >> Again, in the common case this >> shouldn't have overly bad implications: Typically the order of >> symbols wouldn't change between builds, so tagging the >> duplicates with a sequence number might be good enough. > > This is painting ourselves into a corner with LTO. Hmm, yes, it all would depend on the amount of churn between builds. > Also, this reasoning depends on people only making trivial changes via > livepatch, which, as it turns out, isn't the case in practice. > >> And even if the order changed, as long as both instances in fact >> derive from the same inline function, all would still be fine afaict. > > How do you propose that we determine this? We build with debug info, so it ought to be possible in principle to know that both have the same source origin. >> Granted there's some heuristic underlying this, in that we then >> would hope for there not to be actually differing functions with >> the same source file and symbol names. > > The more assumptions we start making, the more subtly this will go wrong > in the future. Yes, I agree - that's the main downside. But failing a build is a pretty meaningful downside as well, the more when people may not build with LIVEPATCH=y before they submit. (FTR, personally I wouldn't consider it appropriate to revert in such a case, as an issue here has nothing to do with the actual change, but only with how we post-process data.) And even if they did, compiler (version) differences may make them not notice any issue. For this last reason even automatic testing (osstest or else) may end up assuming all is fine when it's not. Jan
On 08/04/2019 11:53, Jan Beulich wrote: >>>> On 08.04.19 at 12:11, <andrew.cooper3@citrix.com> wrote: >> On 08/04/2019 10:52, Jan Beulich wrote: >>>>>> On 05.04.19 at 23:02, <konrad.wilk@oracle.com> wrote: >>>> On Fri, Apr 05, 2019 at 08:26:04PM +0100, Andrew Cooper wrote: >>>>> From: Ross Lagerwall <ross.lagerwall@citrix.com> >>>>> >>>>> The binary diffing algorithm used by xen-livepatch depends on having unique >>>>> symbols. >>> Question: Is this an inherent requirement, or an effect of the >>> current implementation? It would seem to me that at least in the >>> common case (binary didn't changer overly heavily) it ought to >>> be possible to re-associate the duplicates with their correct >>> origins. And by making the build fail (including in cases like the >>> one we have right now), we're painting ourselves into the >>> corner of having to "fix" code which isn't actually buggy. >> It is inherent. The tool needs a unique way to identify each block of >> code, so it can work out if the binary has changed by more than just >> relocations. >> >> How do you propose disambiguating ambiguous symbols after the fact? >> >> (This is semi rhetorical, because even if there is a possible way, there >> is no plausible way it is the sensible option to take.) > I don't understand why you consider it impossible for this to be a > sensible option. Because there is no problem which is best solved by throwing away the information you've got, and then trying to reconstruct it at a later point using divination. > First of all, from the very beginning of live-patching's existence I've > been questioning it having got made dependent on the internal > symbol table we generate, for there being a chance of it being > broken in some way. Xen uses the same diffing mechanism as kpatch. This was chosen for a number reasons, but first and foremost, because Ross and Konrad did the work which automatically gives them a 9/10'th choice on how to solve the various issues at hand, and "reusing an existing tool" is a very good reason. You are welcome to invent a brand new way of performing binary diffing, but on top of needing to be at least as functional as the current solution, you will need to justify why deviating from the existing standard tooling is a good thing for the Xen community to be doing. Until then, the current solution depends on having unique symbols. > As to disambiguating: Other nearby local symbols could surely > help. This might be a problem with -ffunction-sections, but other > than I thought we don't appear to be using it - at least I couldn't > get grep to produce a hit in xen/, config/, and the tree root. -ffunction-sections is injected by the livepatch-buildtools build logic. It doesn't live in the main xen.git build. > Finally we likely could avoid the ambiguities altogether, e.g. by > making the compiler emit (relative) directories when populating > STT_FILE symbols. We already qualify multiply compiled source > files by an object file name; I don't see why we couldn't also > suitably qualify singly compiled files (and perhaps there's even > a compiler option, such that we wouldn't need to issue .file > assembler directives ourselves). > > Another possibility to avoid (false) ambiguities could be to make > the non-inline expansions of intended-to-be-inline functions > link-once (which by implication means non-static, and hence > necessarily uniquely named). I repeat my previous questions of "how do you propose to make this happen", with a side implication of "without inventing increasingly contorted logic to try and overcome problems which have been solved in more simple ways by others". >> Also, this reasoning depends on people only making trivial changes via >> livepatch, which, as it turns out, isn't the case in practice. >> >>> And even if the order changed, as long as both instances in fact >>> derive from the same inline function, all would still be fine afaict. >> How do you propose that we determine this? > We build with debug info, so it ought to be possible in principle > to know that both have the same source origin. That doesn't make it safe. On release builds with LIVEPATCH enabled, it is domctl.c#is_pv_domain.isra.0 which is the duplicated symbol, and being separate translation units, the transformation into .isra.0 may be different. > >>> Granted there's some heuristic underlying this, in that we then >>> would hope for there not to be actually differing functions with >>> the same source file and symbol names. >> The more assumptions we start making, the more subtly this will go wrong >> in the future. > Yes, I agree - that's the main downside. But failing a build is > a pretty meaningful downside as well, the more when people > may not build with LIVEPATCH=y before they submit. This is what CI is for, and why we also have things like RANDCONFIG. But for people who want livepatching to work, failing the build is the obvious choice. The alternative is finding out that you've got customers with systems running the broken build of Xen which you can't generate a livepatch for. > (FTR, personally I wouldn't consider it appropriate to revert in such > a case, as an issue here has nothing to do with the actual > change, but only with how we post-process data.) And even > if they did, compiler (version) differences may make them not > notice any issue. For this last reason even automatic testing > (osstest or else) may end up assuming all is fine when it's not. No amount of automation is going to be able to perfectly determine that, for all current and future potential compilers, the resulting binary is fine. But what we have here is a construct which breaks every version of GCC I've tried it with, and general CI definitely will pick this aspect up. ~Andrew
>>> On 11.04.19 at 21:51, <andrew.cooper3@citrix.com> wrote: > On 08/04/2019 11:53, Jan Beulich wrote: >>>>> On 08.04.19 at 12:11, <andrew.cooper3@citrix.com> wrote: >>> On 08/04/2019 10:52, Jan Beulich wrote: >>>>>>> On 05.04.19 at 23:02, <konrad.wilk@oracle.com> wrote: >>>>> On Fri, Apr 05, 2019 at 08:26:04PM +0100, Andrew Cooper wrote: >>>>>> From: Ross Lagerwall <ross.lagerwall@citrix.com> >>>>>> >>>>>> The binary diffing algorithm used by xen-livepatch depends on having unique >>>>>> symbols. >>>> Question: Is this an inherent requirement, or an effect of the >>>> current implementation? It would seem to me that at least in the >>>> common case (binary didn't changer overly heavily) it ought to >>>> be possible to re-associate the duplicates with their correct >>>> origins. And by making the build fail (including in cases like the >>>> one we have right now), we're painting ourselves into the >>>> corner of having to "fix" code which isn't actually buggy. >>> It is inherent. The tool needs a unique way to identify each block of >>> code, so it can work out if the binary has changed by more than just >>> relocations. >>> >>> How do you propose disambiguating ambiguous symbols after the fact? >>> >>> (This is semi rhetorical, because even if there is a possible way, there >>> is no plausible way it is the sensible option to take.) >> I don't understand why you consider it impossible for this to be a >> sensible option. > > Because there is no problem which is best solved by throwing away the > information you've got, and then trying to reconstruct it at a later > point using divination. What information did I suggest to throw away? To me it doesn't sound impossible to produce a diff of two binaries if both have the same number of instances of an otherwise identically named symbol. For live patching purposes the assumption has to be anyway that it's not the entire executable which changes, but just limited regions thereof. >> First of all, from the very beginning of live-patching's existence I've >> been questioning it having got made dependent on the internal >> symbol table we generate, for there being a chance of it being >> broken in some way. > > Xen uses the same diffing mechanism as kpatch. This was chosen for a > number reasons, but first and foremost, because Ross and Konrad did the > work which automatically gives them a 9/10'th choice on how to solve the > various issues at hand, and "reusing an existing tool" is a very good > reason. > > You are welcome to invent a brand new way of performing binary diffing, > but on top of needing to be at least as functional as the current > solution, you will need to justify why deviating from the existing > standard tooling is a good thing for the Xen community to be doing. > > Until then, the current solution depends on having unique symbols. Okay, I can accept this as an argument, and I certainly don't plan on writing a replacement (let alone justifying why I did). >> As to disambiguating: Other nearby local symbols could surely >> help. This might be a problem with -ffunction-sections, but other >> than I thought we don't appear to be using it - at least I couldn't >> get grep to produce a hit in xen/, config/, and the tree root. > > -ffunction-sections is injected by the livepatch-buildtools build > logic. It doesn't live in the main xen.git build. I see - not very fortunate, but at least I wasn't entirely mis-remembering that it gets used. >>>> And even if the order changed, as long as both instances in fact >>>> derive from the same inline function, all would still be fine afaict. >>> How do you propose that we determine this? >> We build with debug info, so it ought to be possible in principle >> to know that both have the same source origin. > > That doesn't make it safe. On release builds with LIVEPATCH enabled, it > is domctl.c#is_pv_domain.isra.0 which is the duplicated symbol, and > being separate translation units, the transformation into .isra.0 may be > different. Oh, right - good point. Jan
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile index ef09939..d4ae46b 100644 --- a/xen/arch/x86/Makefile +++ b/xen/arch/x86/Makefile @@ -98,6 +98,7 @@ endif syms-warn-dup-y := --warn-dup syms-warn-dup-$(CONFIG_SUPPRESS_DUPLICATE_SYMBOL_WARNINGS) := +syms-warn-dup-$(CONFIG_LIVEPATCH) := --error-dup $(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32 ./boot/mkelf32 $(notes_phdrs) $(TARGET)-syms $(TARGET) $(XEN_IMG_OFFSET) \ diff --git a/xen/tools/symbols.c b/xen/tools/symbols.c index 05139d1..9f9e2c9 100644 --- a/xen/tools/symbols.c +++ b/xen/tools/symbols.c @@ -599,7 +599,7 @@ static int compare_name(const void *p1, const void *p2) int main(int argc, char **argv) { unsigned int i; - bool unsorted = false, warn_dup = false; + bool unsorted = false, warn_dup = false, error_dup = false, found_dup = false; if (argc >= 2) { for (i = 1; i < argc; i++) { @@ -619,6 +619,8 @@ int main(int argc, char **argv) sort_by_name = 1; else if (strcmp(argv[i], "--warn-dup") == 0) warn_dup = true; + else if (strcmp(argv[i], "--error-dup") == 0) + warn_dup = error_dup = true; else if (strcmp(argv[i], "--xensyms") == 0) map_only = true; else @@ -634,14 +636,19 @@ int main(int argc, char **argv) for (i = 1; i < table_cnt; ++i) if (strcmp(SYMBOL_NAME(table + i - 1), SYMBOL_NAME(table + i)) == 0 && - table[i - 1].addr != table[i].addr) + table[i - 1].addr != table[i].addr) { fprintf(stderr, "Duplicate symbol '%s' (%llx != %llx)\n", SYMBOL_NAME(table + i), table[i].addr, table[i - 1].addr); + found_dup = true; + } unsorted = true; } + if (error_dup && found_dup) + exit(1); + if (unsorted) qsort(table, table_cnt, sizeof(*table), compare_value);