Message ID | 20200501095843.25401-3-ardb@kernel.org (mailing list archive) |
---|---|
State | RFC, archived |
Series | ACPI/IORT: rework num_ids off-by-one quirk |
On 2020-05-01 10:58 am, Ard Biesheuvel wrote:
> The ID mapping table structure of the IORT table describes the size of
> a range using a num_ids field carrying the number of IDs in the region
> minus one. This has been misinterpreted in the past in the parsing code,
> and firmware is known to have shipped where this results in an ambiguity,
> where regions that should be adjacent have an overlap of one value.
>
> So let's work around this by detecting this case specifically: when
> resolving an ID translation, allow one that matches right at the end of
> a multi-ID region to be superseded by a subsequent one.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  drivers/acpi/arm64/iort.c | 23 +++++++++++++++-----
>  1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index 98be18266a73..d826dd9dc4c5 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -316,10 +316,19 @@ static int iort_id_map(struct acpi_iort_id_mapping *map, u8 type, u32 rid_in,
>  	}
>
>  	if (rid_in < map->input_base ||
> -	    (rid_in >= map->input_base + map->id_count))
> +	    (rid_in > map->input_base + map->id_count))
>  		return -ENXIO;
>
>  	*rid_out = map->output_base + (rid_in - map->input_base);
> +
> +	/*
> +	 * Due to confusion regarding the meaning of the id_count field (which
> +	 * carries the number of IDs *minus 1*), we may have to disregard this
> +	 * match if it is at the end of the range, and overlaps with the start
> +	 * of another one.
> +	 */
> +	if (map->id_count > 0 && rid_in == map->input_base + map->id_count)
> +		return -EAGAIN;
>  	return 0;
>  }
>
> @@ -404,7 +413,8 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
>  	/* Parse the ID mapping tree to find specified node type */
>  	while (node) {
>  		struct acpi_iort_id_mapping *map;
> -		int i, index;
> +		int i, index, rc = 0;
> +		u32 out_ref = 0, map_id = id;
>
>  		if (IORT_TYPE_MASK(node->type) & type_mask) {
>  			if (id_out)
> @@ -438,15 +448,18 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
>  			if (i == index)
>  				continue;
>
> -			if (!iort_id_map(map, node->type, id, &id))
> +			rc = iort_id_map(map, node->type, map_id, &id);
> +			if (!rc)
>  				break;

This needs a big FW_BUG splat in the case where it did find an overlap.

Ideally we'd also enforce that the other half of must be the first entry
of another range, but perhaps we're into diminishing returns by that point.

If we silently fix things up, then people will continue to write broken
tables without even realising, new OSes will have to implement the same
mechanism because vendors will have little interest in changing things
that have worked "correctly" with Linux for years, and we've effectively
achieved a de-facto redefinition of the spec. Making our end of the
interface robust is obviously desirable, but there still needs to be
*some* incentive for the folks on the other end to get it right.

Robin.

> +			if (rc == -EAGAIN)
> +				out_ref = map->output_reference;
>  		}
>
> -		if (i == node->mapping_count)
> +		if (i == node->mapping_count && rc != -EAGAIN)
>  			goto fail_map;
>
>  		node = ACPI_ADD_PTR(struct acpi_iort_node, iort_table,
> -				    map->output_reference);
> +				    rc ? out_ref : map->output_reference);
>  	}
>
>  fail_map:
>
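A minimal userspace sketch of the patched range check may help when following the discussion. It is an illustration, not the kernel code: struct id_mapping, id_map(), resolve(), example() and the table values are hypothetical stand-ins, and the caller loop is simplified from the one in iort_node_map_id().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the struct acpi_iort_id_mapping fields used here. */
struct id_mapping {
	uint32_t input_base;	/* first input ID of the range */
	uint32_t id_count;	/* number of IDs in the range, minus one */
	uint32_t output_base;	/* output ID that input_base maps to */
};

/*
 * Mirrors the patched range check: 0 for a firm match, -EAGAIN for a match
 * on the last ID of a multi-ID range, -ENXIO when rid_in is out of range.
 */
int id_map(const struct id_mapping *map, uint32_t rid_in, uint32_t *rid_out)
{
	if (rid_in < map->input_base ||
	    rid_in > map->input_base + map->id_count)
		return -ENXIO;

	*rid_out = map->output_base + (rid_in - map->input_base);

	if (map->id_count > 0 && rid_in == map->input_base + map->id_count)
		return -EAGAIN;
	return 0;
}

/* Simplified caller: a tentative (-EAGAIN) match stands only if no firm match follows. */
int resolve(const struct id_mapping *maps, int n, uint32_t rid_in,
	    uint32_t *rid_out)
{
	int tentative = 0;

	for (int i = 0; i < n; i++) {
		int rc = id_map(&maps[i], rid_in, rid_out);

		if (rc == 0)
			return 0;
		if (rc == -EAGAIN)
			tentative = 1;
	}
	return tentative ? 0 : -ENXIO;
}

/* Made-up table in the broken style: id_count holds the number of IDs. */
void example(void)
{
	const struct id_mapping maps[] = {
		{ 0x000, 0x100, 0x1000 },	/* meant to cover 0x000..0x0ff */
		{ 0x100, 0x100, 0x8000 },	/* meant to cover 0x100..0x1ff */
	};
	uint32_t out = 0;

	assert(resolve(maps, 2, 0x0ff, &out) == 0 && out == 0x10ff);
	assert(resolve(maps, 2, 0x100, &out) == 0 && out == 0x8000);
	assert(resolve(maps, 2, 0x300, &out) == -ENXIO);
}
```

Input ID 0x100 is the interesting case: read per the spec, the first entry covers it as its last ID, but the second entry starts there, so the tentative match gives way to the second entry.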
On Fri, 1 May 2020 at 12:55, Robin Murphy <robin.murphy@arm.com> wrote: > > On 2020-05-01 10:58 am, Ard Biesheuvel wrote: > > The ID mapping table structure of the IORT table describes the size of > > a range using a num_ids field carrying the number of IDs in the region > > minus one. This has been misinterpreted in the past in the parsing code, > > and firmware is known to have shipped where this results in an ambiguity, > > where regions that should be adjacent have an overlap of one value. > > > > So let's work around this by detecting this case specifically: when > > resolving an ID translation, allow one that matches right at the end of > > a multi-ID region to be superseded by a subsequent one. > > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > > --- > > drivers/acpi/arm64/iort.c | 23 +++++++++++++++----- > > 1 file changed, 18 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c > > index 98be18266a73..d826dd9dc4c5 100644 > > --- a/drivers/acpi/arm64/iort.c > > +++ b/drivers/acpi/arm64/iort.c > > @@ -316,10 +316,19 @@ static int iort_id_map(struct acpi_iort_id_mapping *map, u8 type, u32 rid_in, > > } > > > > if (rid_in < map->input_base || > > - (rid_in >= map->input_base + map->id_count)) > > + (rid_in > map->input_base + map->id_count)) > > return -ENXIO; > > > > *rid_out = map->output_base + (rid_in - map->input_base); > > + > > + /* > > + * Due to confusion regarding the meaning of the id_count field (which > > + * carries the number of IDs *minus 1*), we may have to disregard this > > + * match if it is at the end of the range, and overlaps with the start > > + * of another one. 
> > + */ > > + if (map->id_count > 0 && rid_in == map->input_base + map->id_count) > > + return -EAGAIN; > > return 0; > > } > > > > @@ -404,7 +413,8 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node, > > /* Parse the ID mapping tree to find specified node type */ > > while (node) { > > struct acpi_iort_id_mapping *map; > > - int i, index; > > + int i, index, rc = 0; > > + u32 out_ref = 0, map_id = id; > > > > if (IORT_TYPE_MASK(node->type) & type_mask) { > > if (id_out) > > @@ -438,15 +448,18 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node, > > if (i == index) > > continue; > > > > - if (!iort_id_map(map, node->type, id, &id)) > > + rc = iort_id_map(map, node->type, map_id, &id); > > + if (!rc) > > break; > > This needs a big FW_BUG splat in the case where it did find an overlap. Sure, although we did help create the problem in the first place. > Ideally we'd also enforce that the other half of must be the first entry > of another range, but perhaps we're into diminishing returns by that point. > That would mean the regions overlap regardless of whether you interpret num_ids correctly or not, which means we'll be doing validation of general well-formedness of the table rather than providing a workaround for this particular issue. I think the fact that we got it wrong initially justifies treating the off-by-one case specially, but beyond that, we should make it robust without being pedantic imo. > If we silently fix things up, then people will continue to write broken > tables without even realising, new OSes will have to implement the same > mechanism because vendors will have little interest in changing things > that have worked "correctly" with Linux for years, and we've effectively > achieved a de-facto redefinition of the spec. Making our end of the > interface robust is obviously desirable, but there still needs to be > *some* incentive for the folks on the other end to get it right. > Agreed. 
But at least we'll be able to detect it and flag it in the
general case, rather than having a special case for D05/06 only
(although I suppose splitting the output ranges like those platforms
do is rather unusual)
On 2020-05-01 12:41 pm, Ard Biesheuvel wrote: > On Fri, 1 May 2020 at 12:55, Robin Murphy <robin.murphy@arm.com> wrote: >> >> On 2020-05-01 10:58 am, Ard Biesheuvel wrote: >>> The ID mapping table structure of the IORT table describes the size of >>> a range using a num_ids field carrying the number of IDs in the region >>> minus one. This has been misinterpreted in the past in the parsing code, >>> and firmware is known to have shipped where this results in an ambiguity, >>> where regions that should be adjacent have an overlap of one value. >>> >>> So let's work around this by detecting this case specifically: when >>> resolving an ID translation, allow one that matches right at the end of >>> a multi-ID region to be superseded by a subsequent one. >>> >>> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> >>> --- >>> drivers/acpi/arm64/iort.c | 23 +++++++++++++++----- >>> 1 file changed, 18 insertions(+), 5 deletions(-) >>> >>> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c >>> index 98be18266a73..d826dd9dc4c5 100644 >>> --- a/drivers/acpi/arm64/iort.c >>> +++ b/drivers/acpi/arm64/iort.c >>> @@ -316,10 +316,19 @@ static int iort_id_map(struct acpi_iort_id_mapping *map, u8 type, u32 rid_in, >>> } >>> >>> if (rid_in < map->input_base || >>> - (rid_in >= map->input_base + map->id_count)) >>> + (rid_in > map->input_base + map->id_count)) >>> return -ENXIO; >>> >>> *rid_out = map->output_base + (rid_in - map->input_base); >>> + >>> + /* >>> + * Due to confusion regarding the meaning of the id_count field (which >>> + * carries the number of IDs *minus 1*), we may have to disregard this >>> + * match if it is at the end of the range, and overlaps with the start >>> + * of another one. 
>>> + */ >>> + if (map->id_count > 0 && rid_in == map->input_base + map->id_count) >>> + return -EAGAIN; >>> return 0; >>> } >>> >>> @@ -404,7 +413,8 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node, >>> /* Parse the ID mapping tree to find specified node type */ >>> while (node) { >>> struct acpi_iort_id_mapping *map; >>> - int i, index; >>> + int i, index, rc = 0; >>> + u32 out_ref = 0, map_id = id; >>> >>> if (IORT_TYPE_MASK(node->type) & type_mask) { >>> if (id_out) >>> @@ -438,15 +448,18 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node, >>> if (i == index) >>> continue; >>> >>> - if (!iort_id_map(map, node->type, id, &id)) >>> + rc = iort_id_map(map, node->type, map_id, &id); >>> + if (!rc) >>> break; >> >> This needs a big FW_BUG splat in the case where it did find an overlap. > > Sure, although we did help create the problem in the first place. > >> Ideally we'd also enforce that the other half of must be the first entry >> of another range, but perhaps we're into diminishing returns by that point. >> > > That would mean the regions overlap regardless of whether you > interpret num_ids correctly or not, which means we'll be doing > validation of general well-formedness of the table rather than > providing a workaround for this particular issue. The point was to limit any change in behaviour to the specific case that we need to work around. Otherwise a table that was entirely malformed rather than just off-by-one on the sizes might go from happening-to-work to not working, or vice versa; the diminishing return is in how much we care about that. > I think the fact that we got it wrong initially justifies treating the > off-by-one case specially, but beyond that, we should make it robust > without being pedantic imo. 
As the #1 search engine hit for "Linux is not a firmware validation
suite", I can reassure you that we're on the same page in that regard ;)

>> If we silently fix things up, then people will continue to write broken
>> tables without even realising, new OSes will have to implement the same
>> mechanism because vendors will have little interest in changing things
>> that have worked "correctly" with Linux for years, and we've effectively
>> achieved a de-facto redefinition of the spec. Making our end of the
>> interface robust is obviously desirable, but there still needs to be
>> *some* incentive for the folks on the other end to get it right.
>>
>
> Agreed. But at least we'll be able to detect it and flag it in the
> general case, rather than having a special case for D05/06 only
> (although I suppose splitting the output ranges like those platforms
> do is rather unusual)

Yup, in principle the fixed quirk list gives a nice reassuring sense of
"we'll work around these early platforms and everyone from now on will
get it right", but whether reality plays out that way is another matter
entirely...

Robin.
On Fri, 1 May 2020 at 14:31, Robin Murphy <robin.murphy@arm.com> wrote: > > On 2020-05-01 12:41 pm, Ard Biesheuvel wrote: > > On Fri, 1 May 2020 at 12:55, Robin Murphy <robin.murphy@arm.com> wrote: > >> > >> On 2020-05-01 10:58 am, Ard Biesheuvel wrote: > >>> The ID mapping table structure of the IORT table describes the size of > >>> a range using a num_ids field carrying the number of IDs in the region > >>> minus one. This has been misinterpreted in the past in the parsing code, > >>> and firmware is known to have shipped where this results in an ambiguity, > >>> where regions that should be adjacent have an overlap of one value. > >>> > >>> So let's work around this by detecting this case specifically: when > >>> resolving an ID translation, allow one that matches right at the end of > >>> a multi-ID region to be superseded by a subsequent one. > >>> > >>> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > >>> --- > >>> drivers/acpi/arm64/iort.c | 23 +++++++++++++++----- > >>> 1 file changed, 18 insertions(+), 5 deletions(-) > >>> > >>> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c > >>> index 98be18266a73..d826dd9dc4c5 100644 > >>> --- a/drivers/acpi/arm64/iort.c > >>> +++ b/drivers/acpi/arm64/iort.c > >>> @@ -316,10 +316,19 @@ static int iort_id_map(struct acpi_iort_id_mapping *map, u8 type, u32 rid_in, > >>> } > >>> > >>> if (rid_in < map->input_base || > >>> - (rid_in >= map->input_base + map->id_count)) > >>> + (rid_in > map->input_base + map->id_count)) > >>> return -ENXIO; > >>> > >>> *rid_out = map->output_base + (rid_in - map->input_base); > >>> + > >>> + /* > >>> + * Due to confusion regarding the meaning of the id_count field (which > >>> + * carries the number of IDs *minus 1*), we may have to disregard this > >>> + * match if it is at the end of the range, and overlaps with the start > >>> + * of another one. 
> >>> + */ > >>> + if (map->id_count > 0 && rid_in == map->input_base + map->id_count) > >>> + return -EAGAIN; > >>> return 0; > >>> } > >>> > >>> @@ -404,7 +413,8 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node, > >>> /* Parse the ID mapping tree to find specified node type */ > >>> while (node) { > >>> struct acpi_iort_id_mapping *map; > >>> - int i, index; > >>> + int i, index, rc = 0; > >>> + u32 out_ref = 0, map_id = id; > >>> > >>> if (IORT_TYPE_MASK(node->type) & type_mask) { > >>> if (id_out) > >>> @@ -438,15 +448,18 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node, > >>> if (i == index) > >>> continue; > >>> > >>> - if (!iort_id_map(map, node->type, id, &id)) > >>> + rc = iort_id_map(map, node->type, map_id, &id); > >>> + if (!rc) > >>> break; > >> > >> This needs a big FW_BUG splat in the case where it did find an overlap. > > > > Sure, although we did help create the problem in the first place. > > > >> Ideally we'd also enforce that the other half of must be the first entry > >> of another range, but perhaps we're into diminishing returns by that point. > >> > > > > That would mean the regions overlap regardless of whether you > > interpret num_ids correctly or not, which means we'll be doing > > validation of general well-formedness of the table rather than > > providing a workaround for this particular issue. > > The point was to limit any change in behaviour to the specific case that > we need to work around. Otherwise a table that was entirely malformed > rather than just off-by-one on the sizes might go from happening-to-work > to not working, or vice versa; the diminishing return is in how much we > care about that. > I see. I think it is quite unlikely that a working system with overlapping ID ranges would work, and suddenly fail horribly when the exact point of overlap is shifted by 1. But yeah, I see your point. 
> > I think the fact that we got it wrong initially justifies treating the
> > off-by-one case specially, but beyond that, we should make it robust
> > without being pedantic imo.
>
> As the #1 search engine hit for "Linux is not a firmware validation
> suite", I can reassure you that we're on the same page in that regard ;)
>

Good :-)

> >> If we silently fix things up, then people will continue to write broken
> >> tables without even realising, new OSes will have to implement the same
> >> mechanism because vendors will have little interest in changing things
> >> that have worked "correctly" with Linux for years, and we've effectively
> >> achieved a de-facto redefinition of the spec. Making our end of the
> >> interface robust is obviously desirable, but there still needs to be
> >> *some* incentive for the folks on the other end to get it right.
> >>
> >
> > Agreed. But at least we'll be able to detect it and flag it in the
> > general case, rather than having a special case for D05/06 only
> > (although I suppose splitting the output ranges like those platforms
> > do is rather unusual)
>
> Yup, in principle the fixed quirk list gives a nice reassuring sense of
> "we'll work around these early platforms and everyone from now on will
> get it right", but whether reality plays out that way is another matter
> entirely...

The reason I am looking into this is that I think the fix should go to
stable, given that the current situation makes it impossible to write
firmware that works with older and newer kernels.

Lorenzo said he wouldn't mind taking the current version with ACPI OEM
ID matching back to -stable, but it's another quirk list to manage,
which I would prefer to avoid.

But I don't care deeply either way, to be honest, as long as we can
get something backported so compliant firmware is not being penalized
anymore.
On Fri, May 01, 2020 at 03:10:59PM +0200, Ard Biesheuvel wrote:

[...]

> > >> If we silently fix things up, then people will continue to write broken
> > >> tables without even realising, new OSes will have to implement the same
> > >> mechanism because vendors will have little interest in changing things
> > >> that have worked "correctly" with Linux for years, and we've effectively
> > >> achieved a de-facto redefinition of the spec. Making our end of the
> > >> interface robust is obviously desirable, but there still needs to be
> > >> *some* incentive for the folks on the other end to get it right.
> > >>
> > >
> > > Agreed. But at least we'll be able to detect it and flag it in the
> > > general case, rather than having a special case for D05/06 only
> > > (although I suppose splitting the output ranges like those platforms
> > > do is rather unusual)
> >
> > Yup, in principle the fixed quirk list gives a nice reassuring sense of
> > "we'll work around these early platforms and everyone from now on will
> > get it right", but whether reality plays out that way is another matter
> > entirely...
>
> The reason I am looking into this is that I think the fix should go to
> stable, given that the current situation makes it impossible to write
> firmware that works with older and newer kernels.

Yes. If we do remove the quirk the sooner we do it the better to
reduce the stable patches.

> Lorenzo said he wouldn't mind taking the current version with ACPI OEM
> ID matching back to -stable, but it's another quirk list to manage,
> which I would prefer to avoid.
>
> But I don't care deeply either way, to be honest, as long as we can
> get something backported so compliant firmware is not being penalized
> anymore.

Question: if we remove the iort_workaround_oem_info stuff but we *do*
keep the existing apply_id_count_workaround flag and we set it by going
through all the mappings at boot time and detect if any of these
off-by-one conditions apply - would the resulting code be any simpler ?

The global flag would apply (as it does now) to _all_ mappings but it is
very likely that if the off-by-one firmware bug is present it applies to
the IORT table as a whole rather than a single mapping entry.

Lorenzo
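The boot-time scan proposed here could look roughly like the sketch below. It is only a guess at the shape of such a pass: struct id_range, table_has_off_by_one() and the sample tables are hypothetical, and a real implementation would presumably walk the mappings of each IORT node separately and store the result in the existing apply_id_count_workaround flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two struct acpi_iort_id_mapping fields needed. */
struct id_range {
	uint32_t input_base;
	uint32_t id_count;	/* number of IDs minus one, if firmware follows the spec */
};

/*
 * Returns true if the inclusive end of any multi-ID range coincides with
 * the start of another range, i.e. two ranges overlap by exactly one ID.
 */
bool table_has_off_by_one(const struct id_range *r, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		uint32_t last = r[i].input_base + r[i].id_count;

		if (r[i].id_count == 0)
			continue;	/* same condition as id_count > 0 in the patch */

		for (size_t j = 0; j < n; j++) {
			if (j != i && r[j].input_base == last)
				return true;
		}
	}
	return false;
}

void example(void)
{
	/* Made-up tables: two adjacent 256-ID ranges, written both ways. */
	const struct id_range compliant[] = { { 0x000, 0x0ff }, { 0x100, 0x0ff } };
	const struct id_range broken[]    = { { 0x000, 0x100 }, { 0x100, 0x100 } };

	assert(!table_has_off_by_one(compliant, 2));
	assert(table_has_off_by_one(broken, 2));
}
```

The sketch checks nothing else about the table, so on its own it cannot tell the off-by-one pattern apart from a table that is malformed in some other way.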
On 2020-05-01 2:10 pm, Ard Biesheuvel wrote: > On Fri, 1 May 2020 at 14:31, Robin Murphy <robin.murphy@arm.com> wrote: >> >> On 2020-05-01 12:41 pm, Ard Biesheuvel wrote: >>> On Fri, 1 May 2020 at 12:55, Robin Murphy <robin.murphy@arm.com> wrote: >>>> >>>> On 2020-05-01 10:58 am, Ard Biesheuvel wrote: >>>>> The ID mapping table structure of the IORT table describes the size of >>>>> a range using a num_ids field carrying the number of IDs in the region >>>>> minus one. This has been misinterpreted in the past in the parsing code, >>>>> and firmware is known to have shipped where this results in an ambiguity, >>>>> where regions that should be adjacent have an overlap of one value. >>>>> >>>>> So let's work around this by detecting this case specifically: when >>>>> resolving an ID translation, allow one that matches right at the end of >>>>> a multi-ID region to be superseded by a subsequent one. >>>>> >>>>> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> >>>>> --- >>>>> drivers/acpi/arm64/iort.c | 23 +++++++++++++++----- >>>>> 1 file changed, 18 insertions(+), 5 deletions(-) >>>>> >>>>> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c >>>>> index 98be18266a73..d826dd9dc4c5 100644 >>>>> --- a/drivers/acpi/arm64/iort.c >>>>> +++ b/drivers/acpi/arm64/iort.c >>>>> @@ -316,10 +316,19 @@ static int iort_id_map(struct acpi_iort_id_mapping *map, u8 type, u32 rid_in, >>>>> } >>>>> >>>>> if (rid_in < map->input_base || >>>>> - (rid_in >= map->input_base + map->id_count)) >>>>> + (rid_in > map->input_base + map->id_count)) >>>>> return -ENXIO; >>>>> >>>>> *rid_out = map->output_base + (rid_in - map->input_base); >>>>> + >>>>> + /* >>>>> + * Due to confusion regarding the meaning of the id_count field (which >>>>> + * carries the number of IDs *minus 1*), we may have to disregard this >>>>> + * match if it is at the end of the range, and overlaps with the start >>>>> + * of another one. 
>>>>> + */ >>>>> + if (map->id_count > 0 && rid_in == map->input_base + map->id_count) >>>>> + return -EAGAIN; >>>>> return 0; >>>>> } >>>>> >>>>> @@ -404,7 +413,8 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node, >>>>> /* Parse the ID mapping tree to find specified node type */ >>>>> while (node) { >>>>> struct acpi_iort_id_mapping *map; >>>>> - int i, index; >>>>> + int i, index, rc = 0; >>>>> + u32 out_ref = 0, map_id = id; >>>>> >>>>> if (IORT_TYPE_MASK(node->type) & type_mask) { >>>>> if (id_out) >>>>> @@ -438,15 +448,18 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node, >>>>> if (i == index) >>>>> continue; >>>>> >>>>> - if (!iort_id_map(map, node->type, id, &id)) >>>>> + rc = iort_id_map(map, node->type, map_id, &id); >>>>> + if (!rc) >>>>> break; >>>> >>>> This needs a big FW_BUG splat in the case where it did find an overlap. >>> >>> Sure, although we did help create the problem in the first place. >>> >>>> Ideally we'd also enforce that the other half of must be the first entry >>>> of another range, but perhaps we're into diminishing returns by that point. >>>> >>> >>> That would mean the regions overlap regardless of whether you >>> interpret num_ids correctly or not, which means we'll be doing >>> validation of general well-formedness of the table rather than >>> providing a workaround for this particular issue. >> >> The point was to limit any change in behaviour to the specific case that >> we need to work around. Otherwise a table that was entirely malformed >> rather than just off-by-one on the sizes might go from happening-to-work >> to not working, or vice versa; the diminishing return is in how much we >> care about that. >> > > I see. I think it is quite unlikely that a working system with > overlapping ID ranges would work, and suddenly fail horribly when the > exact point of overlap is shifted by 1. But yeah, I see your point. 
Say that due to a copy-paste error or some other development artefact,
the same correctly-sized input range is described twice, but the second
copy has the wrong output base. Unless the IORT implementation is wacky
enough to process mappings in reverse order it will have worked out OK,
until suddenly the highest input ID starts falling through to the
spurious broken mapping instead.

The match quirk implicitly encodes the exact nature of the ambiguity
known to be present in the given table, so can be confident in fixing it
up quietly. The heuristic doesn't have that luxury, so is wise to keep
its scope as narrow as possible, and warn the user when it does choose
to second-guess something on the off-chance that doing so actually makes
the situation worse.

Robin.
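The duplicated-range scenario can be reproduced with a small userspace sketch. Everything in it is hypothetical (the struct, the helper names, the table values): first_match() models a parser that reads id_count per the spec and stops at the first hit, while with_heuristic() models the fall-through behaviour of the proposed patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant struct acpi_iort_id_mapping fields. */
struct id_mapping {
	uint32_t input_base;
	uint32_t id_count;	/* number of IDs minus one */
	uint32_t output_base;
};

static int in_range(const struct id_mapping *m, uint32_t rid)
{
	return rid >= m->input_base && rid <= m->input_base + m->id_count;
}

static int at_end(const struct id_mapping *m, uint32_t rid)
{
	return m->id_count > 0 && rid == m->input_base + m->id_count;
}

/* The first matching entry wins. Returns the index used, or -1. */
int first_match(const struct id_mapping *maps, int n, uint32_t rid)
{
	for (int i = 0; i < n; i++) {
		if (in_range(&maps[i], rid))
			return i;
	}
	return -1;
}

/* An end-of-range match can be superseded by any later match. */
int with_heuristic(const struct id_mapping *maps, int n, uint32_t rid)
{
	int used = -1;

	for (int i = 0; i < n; i++) {
		if (!in_range(&maps[i], rid))
			continue;
		used = i;
		if (!at_end(&maps[i], rid))
			break;	/* firm match, stop looking */
	}
	return used;
}

void example(void)
{
	const struct id_mapping maps[] = {
		{ 0x000, 0x0ff, 0x1000 },	/* correct entry */
		{ 0x000, 0x0ff, 0x9000 },	/* stray duplicate, wrong output base */
	};

	assert(first_match(maps, 2, 0x0fe) == 0);
	assert(first_match(maps, 2, 0x0ff) == 0);
	assert(with_heuristic(maps, 2, 0x0fe) == 0);
	assert(with_heuristic(maps, 2, 0x0ff) == 1);	/* highest ID falls through */
}
```

Only the highest input ID changes owner, which is the kind of quiet behaviour change the narrower check and the warning are meant to guard against.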
On Fri, 1 May 2020 at 16:13, Robin Murphy <robin.murphy@arm.com> wrote: > > On 2020-05-01 2:10 pm, Ard Biesheuvel wrote: > > On Fri, 1 May 2020 at 14:31, Robin Murphy <robin.murphy@arm.com> wrote: > >> > >> On 2020-05-01 12:41 pm, Ard Biesheuvel wrote: > >>> On Fri, 1 May 2020 at 12:55, Robin Murphy <robin.murphy@arm.com> wrote: > >>>> > >>>> On 2020-05-01 10:58 am, Ard Biesheuvel wrote: > >>>>> The ID mapping table structure of the IORT table describes the size of > >>>>> a range using a num_ids field carrying the number of IDs in the region > >>>>> minus one. This has been misinterpreted in the past in the parsing code, > >>>>> and firmware is known to have shipped where this results in an ambiguity, > >>>>> where regions that should be adjacent have an overlap of one value. > >>>>> > >>>>> So let's work around this by detecting this case specifically: when > >>>>> resolving an ID translation, allow one that matches right at the end of > >>>>> a multi-ID region to be superseded by a subsequent one. 
> >>>>> > >>>>> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > >>>>> --- > >>>>> drivers/acpi/arm64/iort.c | 23 +++++++++++++++----- > >>>>> 1 file changed, 18 insertions(+), 5 deletions(-) > >>>>> > >>>>> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c > >>>>> index 98be18266a73..d826dd9dc4c5 100644 > >>>>> --- a/drivers/acpi/arm64/iort.c > >>>>> +++ b/drivers/acpi/arm64/iort.c > >>>>> @@ -316,10 +316,19 @@ static int iort_id_map(struct acpi_iort_id_mapping *map, u8 type, u32 rid_in, > >>>>> } > >>>>> > >>>>> if (rid_in < map->input_base || > >>>>> - (rid_in >= map->input_base + map->id_count)) > >>>>> + (rid_in > map->input_base + map->id_count)) > >>>>> return -ENXIO; > >>>>> > >>>>> *rid_out = map->output_base + (rid_in - map->input_base); > >>>>> + > >>>>> + /* > >>>>> + * Due to confusion regarding the meaning of the id_count field (which > >>>>> + * carries the number of IDs *minus 1*), we may have to disregard this > >>>>> + * match if it is at the end of the range, and overlaps with the start > >>>>> + * of another one. 
> >>>>> + */ > >>>>> + if (map->id_count > 0 && rid_in == map->input_base + map->id_count) > >>>>> + return -EAGAIN; > >>>>> return 0; > >>>>> } > >>>>> > >>>>> @@ -404,7 +413,8 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node, > >>>>> /* Parse the ID mapping tree to find specified node type */ > >>>>> while (node) { > >>>>> struct acpi_iort_id_mapping *map; > >>>>> - int i, index; > >>>>> + int i, index, rc = 0; > >>>>> + u32 out_ref = 0, map_id = id; > >>>>> > >>>>> if (IORT_TYPE_MASK(node->type) & type_mask) { > >>>>> if (id_out) > >>>>> @@ -438,15 +448,18 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node, > >>>>> if (i == index) > >>>>> continue; > >>>>> > >>>>> - if (!iort_id_map(map, node->type, id, &id)) > >>>>> + rc = iort_id_map(map, node->type, map_id, &id); > >>>>> + if (!rc) > >>>>> break; > >>>> > >>>> This needs a big FW_BUG splat in the case where it did find an overlap. > >>> > >>> Sure, although we did help create the problem in the first place. > >>> > >>>> Ideally we'd also enforce that the other half of must be the first entry > >>>> of another range, but perhaps we're into diminishing returns by that point. > >>>> > >>> > >>> That would mean the regions overlap regardless of whether you > >>> interpret num_ids correctly or not, which means we'll be doing > >>> validation of general well-formedness of the table rather than > >>> providing a workaround for this particular issue. > >> > >> The point was to limit any change in behaviour to the specific case that > >> we need to work around. Otherwise a table that was entirely malformed > >> rather than just off-by-one on the sizes might go from happening-to-work > >> to not working, or vice versa; the diminishing return is in how much we > >> care about that. > >> > > > > I see. 
> > I think it is quite unlikely that a working system with
> > overlapping ID ranges would work, and suddenly fail horribly when the
> > exact point of overlap is shifted by 1. But yeah, I see your point.
>
> Say that due to a copy-paste error or some other development artefact,
> the same correctly-sized input range is described twice, but the second
> copy has the wrong output base. Unless the IORT implementation is wacky
> enough to process mappings in reverse order it will have worked out OK,
> until suddenly the highest input ID starts falling through to the
> spurious broken mapping instead.
>

OK, so there are other quite unlikely scenarios where this might break :-)

> The match quirk implicitly encodes the exact nature of the ambiguity
> known to be present in the given table, so can be confident in fixing it
> up quietly. The heuristic doesn't have that luxury, so is wise to keep
> its scope as narrow as possible, and warn the user when it does choose
> to second-guess something on the off-chance that doing so actually makes
> the situation worse.
>

Fair enough. I'll have a go at incorporating the FW_BUG and the double check.
On Fri, 1 May 2020 at 15:50, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote:
>
> On Fri, May 01, 2020 at 03:10:59PM +0200, Ard Biesheuvel wrote:
>
> [...]
>
> > > >> If we silently fix things up, then people will continue to write broken
> > > >> tables without even realising, new OSes will have to implement the same
> > > >> mechanism because vendors will have little interest in changing things
> > > >> that have worked "correctly" with Linux for years, and we've effectively
> > > >> achieved a de-facto redefinition of the spec. Making our end of the
> > > >> interface robust is obviously desirable, but there still needs to be
> > > >> *some* incentive for the folks on the other end to get it right.
> > > >>
> > > >
> > > > Agreed. But at least we'll be able to detect it and flag it in the
> > > > general case, rather than having a special case for D05/06 only
> > > > (although I suppose splitting the output ranges like those platforms
> > > > do is rather unusual)
> > >
> > > Yup, in principle the fixed quirk list gives a nice reassuring sense of
> > > "we'll work around these early platforms and everyone from now on will
> > > get it right", but whether reality plays out that way is another matter
> > > entirely...
> >
> > The reason I am looking into this is that I think the fix should go to
> > stable, given that the current situation makes it impossible to write
> > firmware that works with older and newer kernels.
>
> Yes. If we do remove the quirk the sooner we do it the better to
> reduce the stable patches.
>
> > Lorenzo said he wouldn't mind taking the current version with ACPI OEM
> > ID matching back to -stable, but it's another quirk list to manage,
> > which I would prefer to avoid.
> >
> > But I don't care deeply either way, to be honest, as long as we can
> > get something backported so compliant firmware is not being penalized
> > anymore.
>
> Question: if we remove the iort_workaround_oem_info stuff but we *do*
> keep the existing apply_id_count_workaround flag and we set it by going
> through all the mappings at boot time and detect if any of these
> off-by-one conditions apply - would the resulting code be any simpler ?
>
> The global flag would apply (as it does now) to _all_ mappings but it is
> very likely that if the off-by-one firmware bug is present it applies to
> the IORT table as a whole rather than a single mapping entry.
>

This particular issue is based on a misinterpretation, so I agree that
it makes sense to have a global flag, as long as we only set it if the
mappings are fully consistent in every other respect, or we'll run the
risk of hitting issues like the one Robin describes, where things
happen to work, but will fail once we apply the heuristic. Such an
issue could exist on one end of the table, while we could spot the
off-by-one issue somewhere else.

Which brings us back to a point I made earlier: do we really want to
validate the table and ensure that it is fully internally consistent?
Or do we want to be robust in the face of a single known issue that we
helped create?

So in my opinion, just fixing it up when we run into it is fine. I can
add the extra sanity check to reduce the potential fallout for other
broken systems, but beyond that, I think we shouldn't do too much.
On 2020-05-01 3:35 pm, Ard Biesheuvel wrote:
> On Fri, 1 May 2020 at 15:50, Lorenzo Pieralisi
> <lorenzo.pieralisi@arm.com> wrote:
>>
>> On Fri, May 01, 2020 at 03:10:59PM +0200, Ard Biesheuvel wrote:
>>
>> [...]
>>
>>>>>> If we silently fix things up, then people will continue to write broken
>>>>>> tables without even realising, new OSes will have to implement the same
>>>>>> mechanism because vendors will have little interest in changing things
>>>>>> that have worked "correctly" with Linux for years, and we've effectively
>>>>>> achieved a de-facto redefinition of the spec. Making our end of the
>>>>>> interface robust is obviously desirable, but there still needs to be
>>>>>> *some* incentive for the folks on the other end to get it right.
>>>>>>
>>>>>
>>>>> Agreed. But at least we'll be able to detect it and flag it in the
>>>>> general case, rather than having a special case for D05/06 only
>>>>> (although I suppose splitting the output ranges like those platforms
>>>>> do is rather unusual)
>>>>
>>>> Yup, in principle the fixed quirk list gives a nice reassuring sense of
>>>> "we'll work around these early platforms and everyone from now on will
>>>> get it right", but whether reality plays out that way is another matter
>>>> entirely...
>>>
>>> The reason I am looking into this is that I think the fix should go to
>>> stable, given that the current situation makes it impossible to write
>>> firmware that works with older and newer kernels.
>>
>> Yes. If we do remove the quirk the sooner we do it the better to
>> reduce the stable patches.
>>
>>> Lorenzo said he wouldn't mind taking the current version with ACPI OEM
>>> ID matching back to -stable, but it's another quirk list to manage,
>>> which I would prefer to avoid.
>>>
>>> But I don't care deeply either way, to be honest, as long as we can
>>> get something backported so compliant firmware is not being penalized
>>> anymore.
>>
>> Question: if we remove the iort_workaround_oem_info stuff but we *do*
>> keep the existing apply_id_count_workaround flag and we set it by going
>> through all the mappings at boot time and detect if any of these
>> off-by-one conditions apply - would the resulting code be any simpler ?
>>
>> The global flag would apply (as it does now) to _all_ mappings but it is
>> very likely that if the off-by-one firmware bug is present it applies to
>> the IORT table as a whole rather than a single mapping entry.
>>
>
> This particular issue is based on a misinterpretation, so I agree that
> it makes sense to have a global flag, as long as we only set it if the
> mappings are fully consistent in every other respect, or we'll run the
> risk of hitting issues like the one Robin describes, where things
> happen to work, but will fail once we apply the heuristic. Such an
> issue could exist on one end of the table, while we could spot the
> off-by-one issue somewhere else.
>
> Which brings us back to a point I made earlier: do we really want to
> validate the table and ensure that it is fully internally consistent?
> Or do we want to be robust in the face of a single known issue that we
> helped create?
>
> So in my opinion, just fixing it up when we run into it is fine. I can
> add the extra sanity check to reduce the potential fallout for other
> broken systems, but beyond that, I think we shouldn't do too much.

Agreed - AFAICS the extra robustness I'm asking for should only amount
to a handful more lines on top of the proposed patch (maybe a couple of
positive return values for "by the way this came from the start/end of
a mapping range" instead of -EAGAIN). I think a separate scanning pass
is likely to add up to more complexity and similar-but-not-quite-reusable
code than simply detecting and handling potential off-by-one edges
in-line.

Robin.
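
As a rough sketch of the in-line approach suggested above, the mapping helper could report where in the range a match fell, and the caller could use that to spot the overlap. Everything here is an assumption made for illustration: `struct id_mapping`, `MAP_AT_START`, `MAP_AT_END`, `id_map()` and `resolve_id()` are hypothetical names, the struct is a simplified stand-in for the real mapping entry, and the node walk of the real code is reduced to a flat array.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the IORT ID mapping fields used here. */
struct id_mapping {
	uint32_t input_base;
	uint32_t id_count;	/* number of IDs minus one */
	uint32_t output_base;
};

/* Hypothetical positive return codes; 0 still means a plain match. */
enum { MAP_AT_START = 1, MAP_AT_END = 2 };

static int id_map(const struct id_mapping *map, uint32_t rid_in,
		  uint32_t *rid_out)
{
	/* 64-bit sum so that input_base + id_count cannot wrap */
	uint64_t last = (uint64_t)map->input_base + map->id_count;

	if (rid_in < map->input_base || rid_in > last)
		return -ENXIO;

	*rid_out = map->output_base + (rid_in - map->input_base);

	if (map->id_count == 0)
		return 0;		/* single-ID range: no ambiguity */
	if (rid_in == last)
		return MAP_AT_END;
	if (rid_in == map->input_base)
		return MAP_AT_START;
	return 0;
}

/*
 * Resolve rid_in against n mappings. An end-of-range match is used only if
 * no other mapping claims the ID. *fw_bug is set when an end-of-range match
 * was superseded by a match at the start of another range.
 */
static int resolve_id(const struct id_mapping *maps, size_t n,
		      uint32_t rid_in, uint32_t *rid_out, int *fw_bug)
{
	uint32_t out = 0, end_out = 0, other_out = 0;
	int have_end = 0, have_other = 0, other_rc = 0;

	for (size_t i = 0; i < n; i++) {
		int rc = id_map(&maps[i], rid_in, &out);

		if (rc < 0)
			continue;
		if (rc == MAP_AT_END) {
			if (!have_end) {
				end_out = out;
				have_end = 1;
			}
		} else if (!have_other) {
			other_out = out;
			other_rc = rc;
			have_other = 1;
		}
	}

	*fw_bug = have_end && have_other && other_rc == MAP_AT_START;
	if (have_other) {
		*rid_out = other_out;
		return 0;
	}
	if (have_end) {
		*rid_out = end_out;
		return 0;
	}
	return -ENXIO;
}
```

The `fw_bug` output is where a caller could hang the firmware-bug warning requested earlier in the thread. Scanning every mapping rather than stopping at the first match is a choice made in this sketch so that the overlap is detected regardless of table order.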
diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 98be18266a73..d826dd9dc4c5 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -316,10 +316,19 @@ static int iort_id_map(struct acpi_iort_id_mapping *map, u8 type, u32 rid_in,
 	}
 
 	if (rid_in < map->input_base ||
-	    (rid_in >= map->input_base + map->id_count))
+	    (rid_in > map->input_base + map->id_count))
 		return -ENXIO;
 
 	*rid_out = map->output_base + (rid_in - map->input_base);
+
+	/*
+	 * Due to confusion regarding the meaning of the id_count field (which
+	 * carries the number of IDs *minus 1*), we may have to disregard this
+	 * match if it is at the end of the range, and overlaps with the start
+	 * of another one.
+	 */
+	if (map->id_count > 0 && rid_in == map->input_base + map->id_count)
+		return -EAGAIN;
 	return 0;
 }
 
@@ -404,7 +413,8 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
 	/* Parse the ID mapping tree to find specified node type */
 	while (node) {
 		struct acpi_iort_id_mapping *map;
-		int i, index;
+		int i, index, rc = 0;
+		u32 out_ref = 0, map_id = id;
 
 		if (IORT_TYPE_MASK(node->type) & type_mask) {
 			if (id_out)
@@ -438,15 +448,18 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
 			if (i == index)
 				continue;
 
-			if (!iort_id_map(map, node->type, id, &id))
+			rc = iort_id_map(map, node->type, map_id, &id);
+			if (!rc)
 				break;
+			if (rc == -EAGAIN)
+				out_ref = map->output_reference;
 		}
 
-		if (i == node->mapping_count)
+		if (i == node->mapping_count && rc != -EAGAIN)
 			goto fail_map;
 
 		node = ACPI_ADD_PTR(struct acpi_iort_node, iort_table,
-				    map->output_reference);
+				    rc ? out_ref : map->output_reference);
 	}
 
 fail_map:
The ID mapping table structure of the IORT table describes the size of
a range using a num_ids field carrying the number of IDs in the region
minus one. This has been misinterpreted in the past in the parsing code,
and firmware is known to have shipped where this results in an ambiguity,
where regions that should be adjacent have an overlap of one value.

So let's work around this by detecting this case specifically: when
resolving an ID translation, allow one that matches right at the end of
a multi-ID region to be superseded by a subsequent one.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 drivers/acpi/arm64/iort.c | 23 +++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)
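
To make the arithmetic in the commit message concrete, here is a small sketch with hypothetical helper names (`last_id_spec()` and `range_size_spec()` are not kernel functions). It reads num_ids as the commit message describes, the number of IDs minus one.

```c
#include <stdint.h>

/* Last input ID of a range when num_ids holds "number of IDs minus one". */
static uint64_t last_id_spec(uint32_t input_base, uint32_t num_ids)
{
	return (uint64_t)input_base + num_ids;
}

/* Number of IDs the range covers under the same reading. */
static uint64_t range_size_spec(uint32_t num_ids)
{
	return (uint64_t)num_ids + 1;
}
```

Under this reading, two adjacent 256-ID ranges starting at 0x000 and 0x100 each carry num_ids = 0xff, so the first range ends at 0x0ff. If firmware instead stores the full count, 0x100 (an assumed example of the misinterpretation, not taken from a real table), the first range ends at 0x100, which is also the first ID of the second range. That is the one-value overlap the patch works around.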