Message ID | 20230417100258.22965-1-quic_karm@quicinc.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Johannes Berg |
Headers | show |
Series | [v1] wifi: mac80211: Initialize EWMA fail avg to 1 | expand |
On Mon, 2023-04-17 at 15:32 +0530, Karthik M wrote: > If the average value in mesh metrics calculation > has been rounded to 0 (success), this resets it to > the smallest nonzero value (similarly to the initialization) > to avoid a case where a single failure would result in > an average value that goes beyond the value > of 95 (Link Failure Threshold). > > Signed-off-by: Tamizh Chelvam Raja <quic_tamizhr@quicinc.com> > Signed-off-by: Karthik M <quic_karm@quicinc.com> > --- > Changes since v1: > - Altered the comment to mention "mesh" and thershold value. > - Checkpatch done This is v2 now, how did an explicit "v1" happen? :) Anyway ... > +++ b/net/mac80211/mesh_hwmp.c > @@ -298,10 +298,23 @@ void ieee80211s_update_metric(struct ieee80211_local *local, > { > struct ieee80211_tx_info *txinfo = st->info; > int failed; > + u32 fail_avg; > struct rate_info rinfo; > > failed = !(txinfo->flags & IEEE80211_TX_STAT_ACK); > > + fail_avg = ewma_mesh_fail_avg_read(&sta->mesh->fail_avg); > + if (!fail_avg) { > + /* If the average value in mesh metrics calculation > + * has been rounded to 0 (success), this resets it to > + * the smallest nonzero value (similarly to the initialization) > + * to avoid a case where a single failure would result in > + * an average value that goes beyond the value > + * of 95 (Link Failure Threshold) > + */ > + ewma_mesh_fail_avg_add(&sta->mesh->fail_avg, 1); > + } > + > /* moving average, scaled to 100. > * feed failure as 100 and success as 0 > */ This still seems really non-intuitive to me. It seems to me this is down to the special "0 means init" behaviour, that you don't want, because your values actually fluctate between 0 and 100, and you can actually legitimately reach 0 with a lot of successes. But to me it's really non-intuitive, if not counter-intuitive, to say "oh yeah my values are 0 to 100 inclusive, but I can't ever deal with reaching 0 because then I jump to 100 immediately". That doesn't make much sense to me? I mean, I guess I can see where this patch makes some sense like from a code point of view, this is sort of the minimal code change you could make to make the existing code work, but ... I'd argue you're optimising to the wrong metric here, "minimal code changes to fix the bug" should normally not be your metric, it should be "code changes that make this clearer and avoid the problem", or something like that? Anyway I guess that's all a long-winded way of saying that I don't really agree with this change. To me, basically, I see two ways to solve this: 1) we have DECLARE_EWMA_ZERO_VALID() or something like that which *doesn't* treat 0 as an uninitialized value, and either has a separate "not initialized yet" bit (but that's iffy storage wise), or simply has another argument to _init() for the initial value or so. 2) you don't just don't use 0 and 100 but say 1 and 100, that results in basically the same behaviour, but avoids the special 0. johannes
Hi, > > This still seems really non-intuitive to me. > > It seems to me this is down to the special "0 means init" behaviour, > that you don't want, because your values actually fluctate between 0 and > 100, and you can actually legitimately reach 0 with a lot of successes. > > But to me it's really non-intuitive, if not counter-intuitive, to say > "oh yeah my values are 0 to 100 inclusive, but I can't ever deal with > reaching 0 because then I jump to 100 immediately". That doesn't make > much sense to me? > > I mean, I guess I can see where this patch makes some sense like from a > code point of view, this is sort of the minimal code change you could > make to make the existing code work, but ... I'd argue you're optimising > to the wrong metric here, "minimal code changes to fix the bug" should > normally not be your metric, it should be "code changes that make this > clearer and avoid the problem", or something like that? > > Anyway I guess that's all a long-winded way of saying that I don't > really agree with this change. > > To me, basically, I see two ways to solve this: > > 1) we have DECLARE_EWMA_ZERO_VALID() or something like that which > *doesn't* treat 0 as an uninitialized value, and either has a > separate "not initialized yet" bit (but that's iffy storage wise), > or simply has another argument to _init() for the initial value or > so. > > 2) you don't just don't use 0 and 100 but say 1 and 100, that results in > basically the same behaviour, but avoids the special 0. > > johannes I also ran into that problem in the past, and reviewing it again with a college, I think, this is a real bug in the EWMA implementation. I try to provide a proper patch in the next days, but actually the EWMA handles the internal value zero, always like in the initialization, which is wrong, e.g., for positive/negative averaged values. A quick research shows, this bug is since the first implementation of the ewma in the code ... kind regards Benjamin
On Thu, 2023-04-20 at 11:30 +0200, Benjamin Beichler wrote: > > To me, basically, I see two ways to solve this: > > > > 1) we have DECLARE_EWMA_ZERO_VALID() or something like that which > > *doesn't* treat 0 as an uninitialized value, and either has a > > separate "not initialized yet" bit (but that's iffy storage wise), > > or simply has another argument to _init() for the initial value or > > so. > > > > 2) you don't just don't use 0 and 100 but say 1 and 100, that results in > > basically the same behaviour, but avoids the special 0. > > > > johannes > > I also ran into that problem in the past, and reviewing it again with a > college, I think, this is a real bug in the EWMA implementation. I try > to provide a proper patch in the next days, but actually the EWMA > handles the internal value zero, always like in the initialization, > which is wrong, e.g., for positive/negative averaged values. Yes, it's always wrong as long as you feed it something zero, or values with different sign. For a lot of use cases, however, that doesn't matter. Originally, it was used e.g. for signal strength averaging, average packet lengths, etc. where it really doesn't matter since you can never use 0 or values that have different sign. > A quick research shows, this bug is since the first implementation of > the ewma in the code ... > Yeah, I'm aware of that, I was around for it ;-) But see above, I'm not sure I'd even call it a bug, at least not originally with the users that we had intended. Hence I don't know if it's really good to fix this in general - for many of these cases zero can still be treated specially (and like I mentioned in my previous email, we can even here avoid 0), and then we don't spend an extra byte (or likely 4) to hold a "first time" flag. Dunno. Maybe it's not worth thinking about the extra memory space vs. the extra maintenance cost. But maybe at least on 64-bit we could steal a bit from the unsigned long? Not sure what all the users are... johannes
>>> To me, basically, I see two ways to solve this: >>> >>> 1) we have DECLARE_EWMA_ZERO_VALID() or something like that which >>> *doesn't* treat 0 as an uninitialized value, and either has a >>> separate "not initialized yet" bit (but that's iffy storage wise), >>> or simply has another argument to _init() for the initial value or >>> so. >>> >>> 2) you don't just don't use 0 and 100 but say 1 and 100, that results in >>> basically the same behaviour, but avoids the special 0. >>> >>> johannes >> I also ran into that problem in the past, and reviewing it again with a >> college, I think, this is a real bug in the EWMA implementation. I try >> to provide a proper patch in the next days, but actually the EWMA >> handles the internal value zero, always like in the initialization, >> which is wrong, e.g., for positive/negative averaged values. > Yes, it's always wrong as long as you feed it something zero, or values > with different sign. > > For a lot of use cases, however, that doesn't matter. Originally, it was > used e.g. for signal strength averaging, average packet lengths, etc. > where it really doesn't matter since you can never use 0 or values that > have different sign. > >> A quick research shows, this bug is since the first implementation of >> the ewma in the code ... >> > Yeah, I'm aware of that, I was around for it ;-) > > But see above, I'm not sure I'd even call it a bug, at least not > originally with the users that we had intended. > > Hence I don't know if it's really good to fix this in general - for many > of these cases zero can still be treated specially (and like I mentioned > in my previous email, we can even here avoid 0), and then we don't spend > an extra byte (or likely 4) to hold a "first time" flag. > > Dunno. Maybe it's not worth thinking about the extra memory space vs. > the extra maintenance cost. But maybe at least on 64-bit we could steal > a bit from the unsigned long? Not sure what all the users are... I thought of introducing a separate function to initialize the "average", which could be optimized away, when unused. I had a look at the usage, and it looks like 10-15 places, which should work with initializing or simply weight the new value always, instead of the special case. For me the problem is, that the current implementation is unintuitive or at least badly documented. And I would even claim the same argument, that for most users, the behavior for initialization also does not matter, therefore I would use the mathematically more natural implementation :-)
On 20.04.23 12:27, Johannes Berg wrote: > On Thu, 2023-04-20 at 11:30 +0200, Benjamin Beichler wrote: >> > To me, basically, I see two ways to solve this: >> > >> > 1) we have DECLARE_EWMA_ZERO_VALID() or something like that which >> > *doesn't* treat 0 as an uninitialized value, and either has a >> > separate "not initialized yet" bit (but that's iffy storage wise), >> > or simply has another argument to _init() for the initial value or >> > so. >> > >> > 2) you don't just don't use 0 and 100 but say 1 and 100, that results in >> > basically the same behaviour, but avoids the special 0. >> > >> > johannes >> >> I also ran into that problem in the past, and reviewing it again with a >> college, I think, this is a real bug in the EWMA implementation. I try >> to provide a proper patch in the next days, but actually the EWMA >> handles the internal value zero, always like in the initialization, >> which is wrong, e.g., for positive/negative averaged values. > > Yes, it's always wrong as long as you feed it something zero, or values > with different sign. > > For a lot of use cases, however, that doesn't matter. Originally, it was > used e.g. for signal strength averaging, average packet lengths, etc. > where it really doesn't matter since you can never use 0 or values that > have different sign. > >> A quick research shows, this bug is since the first implementation of >> the ewma in the code ... >> > > Yeah, I'm aware of that, I was around for it ;-) > > But see above, I'm not sure I'd even call it a bug, at least not > originally with the users that we had intended. > > Hence I don't know if it's really good to fix this in general - for many > of these cases zero can still be treated specially (and like I mentioned > in my previous email, we can even here avoid 0), and then we don't spend > an extra byte (or likely 4) to hold a "first time" flag. > > Dunno. Maybe it's not worth thinking about the extra memory space vs. > the extra maintenance cost. But maybe at least on 64-bit we could steal > a bit from the unsigned long? Not sure what all the users are.. We don't actually need a full bit. We can just add 1 to the internal value for initialized values. How about this (completely untested): https://nbd.name/p/69b00c5b - Felix
Am 20.04.2023 um 14:22 schrieb Felix Fietkau: > On 20.04.23 12:27, Johannes Berg wrote: >> On Thu, 2023-04-20 at 11:30 +0200, Benjamin Beichler wrote: >>> > To me, basically, I see two ways to solve this: >>> > >>> > 1) we have DECLARE_EWMA_ZERO_VALID() or something like that which >>> > *doesn't* treat 0 as an uninitialized value, and either has a >>> > separate "not initialized yet" bit (but that's iffy storage >>> wise), >>> > or simply has another argument to _init() for the initial >>> value or >>> > so. >>> > >>> > 2) you don't just don't use 0 and 100 but say 1 and 100, that >>> results in >>> > basically the same behaviour, but avoids the special 0. >>> > >>> > johannes >>> >>> I also ran into that problem in the past, and reviewing it again with a >>> college, I think, this is a real bug in the EWMA implementation. I try >>> to provide a proper patch in the next days, but actually the EWMA >>> handles the internal value zero, always like in the initialization, >>> which is wrong, e.g., for positive/negative averaged values. >> >> Yes, it's always wrong as long as you feed it something zero, or values >> with different sign. >> >> For a lot of use cases, however, that doesn't matter. Originally, it was >> used e.g. for signal strength averaging, average packet lengths, etc. >> where it really doesn't matter since you can never use 0 or values that >> have different sign. >> >>> A quick research shows, this bug is since the first implementation of >>> the ewma in the code ... >>> >> >> Yeah, I'm aware of that, I was around for it ;-) >> >> But see above, I'm not sure I'd even call it a bug, at least not >> originally with the users that we had intended. >> >> Hence I don't know if it's really good to fix this in general - for many >> of these cases zero can still be treated specially (and like I mentioned >> in my previous email, we can even here avoid 0), and then we don't spend >> an extra byte (or likely 4) to hold a "first time" flag. >> >> Dunno. Maybe it's not worth thinking about the extra memory space vs. >> the extra maintenance cost. But maybe at least on 64-bit we could steal >> a bit from the unsigned long? Not sure what all the users are.. > We don't actually need a full bit. We can just add 1 to the internal > value for initialized values. How about this (completely untested): > https://nbd.name/p/69b00c5b > Nice hack, but even a bit more dirty than before :-D I think being more explicit, does not hurt here. Maybe we could rename it ewma_*_add_or_init and do a separate function with ewma_*_add. Since it is inlined, there will be no significant text size increase (except a driver uses both functions). > - Felix >
> On 20. Apr 2023, at 15:00, Benjamin Beichler <Benjamin.Beichler@uni-rostock.de> wrote: > > Am 20.04.2023 um 14:22 schrieb Felix Fietkau: >>> On 20.04.23 12:27, Johannes Berg wrote: >>> On Thu, 2023-04-20 at 11:30 +0200, Benjamin Beichler wrote: >>>> > To me, basically, I see two ways to solve this: >>>> > >>>> > 1) we have DECLARE_EWMA_ZERO_VALID() or something like that which >>>> > *doesn't* treat 0 as an uninitialized value, and either has a >>>> > separate "not initialized yet" bit (but that's iffy storage wise), >>>> > or simply has another argument to _init() for the initial value or >>>> > so. >>>> > >>>> > 2) you don't just don't use 0 and 100 but say 1 and 100, that results in >>>> > basically the same behaviour, but avoids the special 0. >>>> > >>>> > johannes >>>> >>>> I also ran into that problem in the past, and reviewing it again with a >>>> college, I think, this is a real bug in the EWMA implementation. I try >>>> to provide a proper patch in the next days, but actually the EWMA >>>> handles the internal value zero, always like in the initialization, >>>> which is wrong, e.g., for positive/negative averaged values. >>> >>> Yes, it's always wrong as long as you feed it something zero, or values >>> with different sign. >>> >>> For a lot of use cases, however, that doesn't matter. Originally, it was >>> used e.g. for signal strength averaging, average packet lengths, etc. >>> where it really doesn't matter since you can never use 0 or values that >>> have different sign. >>> >>>> A quick research shows, this bug is since the first implementation of >>>> the ewma in the code ... >>>> >>> >>> Yeah, I'm aware of that, I was around for it ;-) >>> >>> But see above, I'm not sure I'd even call it a bug, at least not >>> originally with the users that we had intended. >>> >>> Hence I don't know if it's really good to fix this in general - for many >>> of these cases zero can still be treated specially (and like I mentioned >>> in my previous email, we can even here avoid 0), and then we don't spend >>> an extra byte (or likely 4) to hold a "first time" flag. >>> >>> Dunno. Maybe it's not worth thinking about the extra memory space vs. >>> the extra maintenance cost. But maybe at least on 64-bit we could steal >>> a bit from the unsigned long? Not sure what all the users are.. >> We don't actually need a full bit. We can just add 1 to the internal >> value for initialized values. How about this (completely untested): >> https://nbd.name/p/69b00c5b >> > Nice hack, but even a bit more dirty than before :-D I disagree. With my “hack” at least the implementation will do the right thing without the API user having to worry about 0 as a value. It could use some comments to clarify the details, but I do think it makes things easier. - Felix
On Thu, 2023-04-20 at 15:15 +0200, Felix Fietkau wrote: > > > On 20. Apr 2023, at 15:00, Benjamin Beichler <Benjamin.Beichler@uni-rostock.de> wrote: > > > > Am 20.04.2023 um 14:22 schrieb Felix Fietkau: > > > > On 20.04.23 12:27, Johannes Berg wrote: > > > > On Thu, 2023-04-20 at 11:30 +0200, Benjamin Beichler wrote: > > > > > > To me, basically, I see two ways to solve this: > > > > > > > > > > > > 1) we have DECLARE_EWMA_ZERO_VALID() or something like that which > > > > > > *doesn't* treat 0 as an uninitialized value, and either has a > > > > > > separate "not initialized yet" bit (but that's iffy storage wise), > > > > > > or simply has another argument to _init() for the initial value or > > > > > > so. > > > > > > > > > > > > 2) you don't just don't use 0 and 100 but say 1 and 100, that results in > > > > > > basically the same behaviour, but avoids the special 0. > > > > > > > > > > > > johannes > > > > > > > > > > I also ran into that problem in the past, and reviewing it again with a > > > > > college, I think, this is a real bug in the EWMA implementation. I try > > > > > to provide a proper patch in the next days, but actually the EWMA > > > > > handles the internal value zero, always like in the initialization, > > > > > which is wrong, e.g., for positive/negative averaged values. > > > > > > > > Yes, it's always wrong as long as you feed it something zero, or values > > > > with different sign. > > > > > > > > For a lot of use cases, however, that doesn't matter. Originally, it was > > > > used e.g. for signal strength averaging, average packet lengths, etc. > > > > where it really doesn't matter since you can never use 0 or values that > > > > have different sign. > > > > > > > > > A quick research shows, this bug is since the first implementation of > > > > > the ewma in the code ... > > > > > > > > > > > > > Yeah, I'm aware of that, I was around for it ;-) > > > > > > > > But see above, I'm not sure I'd even call it a bug, at least not > > > > originally with the users that we had intended. > > > > > > > > Hence I don't know if it's really good to fix this in general - for many > > > > of these cases zero can still be treated specially (and like I mentioned > > > > in my previous email, we can even here avoid 0), and then we don't spend > > > > an extra byte (or likely 4) to hold a "first time" flag. > > > > > > > > Dunno. Maybe it's not worth thinking about the extra memory space vs. > > > > the extra maintenance cost. But maybe at least on 64-bit we could steal > > > > a bit from the unsigned long? Not sure what all the users are.. > > > We don't actually need a full bit. We can just add 1 to the internal > > > value for initialized values. How about this (completely untested): > > > https://nbd.name/p/69b00c5b > > > > > Nice hack, but even a bit more dirty than before :-D > > I disagree. With my “hack” at least the implementation will do the > right thing without the API user having to worry about 0 as a value. > It could use some comments to clarify the details, but I do think it > makes things easier. > To me, the first question is if there are potentially any users that are _relying_ on the current behaviour. This seems unlikely though, looking at the ~30 users, most sound like signal/rssi, packet sizes, etc. So let's say with the bug found here that prompted this patch, chances are that there aren't any users that really want 0 to be special. I also can't even really think of a reason for wanting that. So then let's say we want to fix the existing code. I can think of these possible ways: * splitting off a bit for initialized from the unsigned long (which at least for 64-bit should be OK since presumably most code using this will run on 32-bit systems too) * adding another value for it, e.g. making it u32 and adding a bool for "first value" * biasing the value, like Felix proposes, could be by 1 or -1 for example All of these have a memory cost, of course, though the first two are data and the second code, so for things like stations the code exists only once and the data multiple times. On 64-bit we can probably make the first two not have a data memory cost though. As for biasing the value, couldn't that lead to a similar problem? It's clearly less likely that the end of the range is reached rather than zero, but still? johannes
On 21.04.23 11:35, Johannes Berg wrote: > To me, the first question is if there are potentially any users that are > _relying_ on the current behaviour. This seems unlikely though, looking > at the ~30 users, most sound like signal/rssi, packet sizes, etc. > > So let's say with the bug found here that prompted this patch, chances > are that there aren't any users that really want 0 to be special. I also > can't even really think of a reason for wanting that. > > > So then let's say we want to fix the existing code. I can think of these > possible ways: > > * splitting off a bit for initialized from the unsigned long > (which at least for 64-bit should be OK since presumably most code > using this will run on 32-bit systems too) > * adding another value for it, e.g. making it u32 and adding a bool for > "first value" > * biasing the value, like Felix proposes, could be by 1 or -1 for > example > > All of these have a memory cost, of course, though the first two are > data and the second code, so for things like stations the code exists > only once and the data multiple times. On 64-bit we can probably make > the first two not have a data memory cost though. > > As for biasing the value, couldn't that lead to a similar problem? It's > clearly less likely that the end of the range is reached rather than > zero, but still? I don't see how it can reduce the range in any way, since the bias is added to the fractional part. A range reduction would seem to imply having an average value that's bigger than the maximum allowed shifted input (top bits cut off), and I don't think that's possible. - Felix
Am 20.04.2023 um 15:15 schrieb Felix Fietkau: >> On 20. Apr 2023, at 15:00, Benjamin Beichler<Benjamin.Beichler@uni-rostock.de> wrote: >> >> Am 20.04.2023 um 14:22 schrieb Felix Fietkau: >>>> On 20.04.23 12:27, Johannes Berg wrote: >>>> On Thu, 2023-04-20 at 11:30 +0200, Benjamin Beichler wrote: >>>>>> To me, basically, I see two ways to solve this: >>>>>> >>>>>> 1) we have DECLARE_EWMA_ZERO_VALID() or something like that which >>>>>> *doesn't* treat 0 as an uninitialized value, and either has a >>>>>> separate "not initialized yet" bit (but that's iffy storage wise), >>>>>> or simply has another argument to _init() for the initial value or >>>>>> so. >>>>>> >>>>>> 2) you don't just don't use 0 and 100 but say 1 and 100, that results in >>>>>> basically the same behaviour, but avoids the special 0. >>>>>> >>>>>> johannes >>>>> I also ran into that problem in the past, and reviewing it again with a >>>>> college, I think, this is a real bug in the EWMA implementation. I try >>>>> to provide a proper patch in the next days, but actually the EWMA >>>>> handles the internal value zero, always like in the initialization, >>>>> which is wrong, e.g., for positive/negative averaged values. >>>> Yes, it's always wrong as long as you feed it something zero, or values >>>> with different sign. >>>> >>>> For a lot of use cases, however, that doesn't matter. Originally, it was >>>> used e.g. for signal strength averaging, average packet lengths, etc. >>>> where it really doesn't matter since you can never use 0 or values that >>>> have different sign. >>>> >>>>> A quick research shows, this bug is since the first implementation of >>>>> the ewma in the code ... >>>>> >>>> Yeah, I'm aware of that, I was around for it ;-) >>>> >>>> But see above, I'm not sure I'd even call it a bug, at least not >>>> originally with the users that we had intended. >>>> >>>> Hence I don't know if it's really good to fix this in general - for many >>>> of these cases zero can still be treated specially (and like I mentioned >>>> in my previous email, we can even here avoid 0), and then we don't spend >>>> an extra byte (or likely 4) to hold a "first time" flag. >>>> >>>> Dunno. Maybe it's not worth thinking about the extra memory space vs. >>>> the extra maintenance cost. But maybe at least on 64-bit we could steal >>>> a bit from the unsigned long? Not sure what all the users are.. >>> We don't actually need a full bit. We can just add 1 to the internal >>> value for initialized values. How about this (completely untested): >>> https://nbd.name/p/69b00c5b >>> >> Nice hack, but even a bit more dirty than before :-D > I disagree. With my “hack” at least the implementation will do the right thing without the API user having to worry about 0 as a value. It could use some comments to clarify the details, but I do think it makes things easier. I would like to cite Johannes here: > I mean, I guess I can see where this patch makes some sense like from a > code point of view, this is sort of the minimal code change you could > make to make the existing code work, but ... I'd argue you're optimising > to the wrong metric here, "minimal code changes to fix the bug" should > normally not be your metric, it should be "code changes that make this > clearer and avoid the problem", or something like that? I agree that you solve the problem, but it is not clearer. And as a user I expect to have a classical EWMA, which I initialize with a proper value and then use it with normal inputs (where I include zero and negative numbers). From my understanding, you shifted the problem only to one digit before zero, which may even not solve it for the mesh case here (but maybe I did not catch it right). I may also argue, that you introduce 2 additions into the hot path to avoid a separate init function, which is mostly called once. And it even introduces non-intuitive behavior in corner cases. However, even the mostly unused branch on the value of internal for initialization is suboptimal for the hot path and would be an indicator for a separate init function. From my dig into the usages, most do not care for initialization or do it explicitly with bogus add calls, which can be simply replaced. Benjamin
Am 21.04.2023 um 11:53 schrieb Felix Fietkau: > ________________________________ > Achtung! Externe E-Mail: Klicken Sie erst dann auf Links und Anhänge, > nachdem Sie die Vertrauenswürdigkeit der Absenderadresse geprüft haben. > ________________________________ > > On 21.04.23 11:35, Johannes Berg wrote: >> To me, the first question is if there are potentially any users that are >> _relying_ on the current behaviour. This seems unlikely though, looking >> at the ~30 users, most sound like signal/rssi, packet sizes, etc. >> >> So let's say with the bug found here that prompted this patch, chances >> are that there aren't any users that really want 0 to be special. I also >> can't even really think of a reason for wanting that. >> >> >> So then let's say we want to fix the existing code. I can think of these >> possible ways: >> >> * splitting off a bit for initialized from the unsigned long >> (which at least for 64-bit should be OK since presumably most code >> using this will run on 32-bit systems too) >> * adding another value for it, e.g. making it u32 and adding a bool >> for >> "first value" >> * biasing the value, like Felix proposes, could be by 1 or -1 for >> example You forgot the possibility to introduce a separate init function, which boils down to a shift with an assignment statement for code, and no further data memory cost. Even simply extending the current init function (which simply always set 0) would be enough. >> >> All of these have a memory cost, of course, though the first two are >> data and the second code, so for things like stations the code exists >> only once and the data multiple times. On 64-bit we can probably make >> the first two not have a data memory cost though. >> >> As for biasing the value, couldn't that lead to a similar problem? It's >> clearly less likely that the end of the range is reached rather than >> zero, but still? > I don't see how it can reduce the range in any way, since the bias is > added to the fractional part. A range reduction would seem to imply > having an average value that's bigger than the maximum allowed shifted > input (top bits cut off), and I don't think that's possible. > It does not reduce the range, but it does not matter whether your internal state is 0 or 2^(-precision), the non-intuitive behavior stays the same. > - Felix >
On Fri, 2023-04-21 at 12:34 +0200, Benjamin Beichler wrote: > > > > > > So then let's say we want to fix the existing code. I can think of these > > > possible ways: > > > > > > * splitting off a bit for initialized from the unsigned long > > > (which at least for 64-bit should be OK since presumably most code > > > using this will run on 32-bit systems too) > > > * adding another value for it, e.g. making it u32 and adding a bool > > > for > > > "first value" > > > * biasing the value, like Felix proposes, could be by 1 or -1 for > > > example > You forgot the possibility to introduce a separate init function, which > boils down to a shift with an assignment statement for code, and no > further data memory cost. Even simply extending the current init > function (which simply always set 0) would be enough. Sort of. Yeah I should've mentioned it, but that means you actually have to know the first value, and track "first time usage" separately in the user code. Or you init to something useful at the first value, e.g. saying for signal strength "let's assume -45dBm average if we don't know". That doesn't seem very practical? The behaviour of "first value inserted will init" seems sensible. > > > As for biasing the value, couldn't that lead to a similar problem? It's > > > clearly less likely that the end of the range is reached rather than > > > zero, but still? > > I don't see how it can reduce the range in any way, since the bias is > > added to the fractional part. A range reduction would seem to imply > > having an average value that's bigger than the maximum allowed shifted > > input (top bits cut off), and I don't think that's possible. > > > It does not reduce the range, but it does not matter whether your > internal state is 0 or 2^(-precision), the non-intuitive behavior stays > the same. OK I was sort of handwaving ... :-) To have a problem, basically the +1 has to overflow the value, so that we think that the next time around we should init, rather than add. That means the existing average has to be 0xffff'ffff (let's take 32 bits, its easier to type). Clearly that can't happen on the first time since then the precision bits are all 0. But I think Felix is right (thought not sure about the reasoning) and that cannot happen, because the add calculation does a ">>weight_rcp" shift at the end, so there are always some top bits that are non-zero, and _weight_rcp has to be a power of two. Now, 1 is a power of two, but that'd be really stupid, and nobody is using it ... So I think if we prohibit 1 for that, we're fine? Btw, Felix, shouldn't your patch have said "bool init = !internal"? johannes
Am 21.04.2023 um 13:13 schrieb Johannes Berg: > ________________________________ > Achtung! Externe E-Mail: Klicken Sie erst dann auf Links und Anhänge, nachdem Sie die Vertrauenswürdigkeit der Absenderadresse geprüft haben. > ________________________________ > > On Fri, 2023-04-21 at 12:34 +0200, Benjamin Beichler wrote: >>>> So then let's say we want to fix the existing code. I can think of these >>>> possible ways: >>>> >>>> * splitting off a bit for initialized from the unsigned long >>>> (which at least for 64-bit should be OK since presumably most code >>>> using this will run on 32-bit systems too) >>>> * adding another value for it, e.g. making it u32 and adding a bool >>>> for >>>> "first value" >>>> * biasing the value, like Felix proposes, could be by 1 or -1 for >>>> example >> You forgot the possibility to introduce a separate init function, which >> boils down to a shift with an assignment statement for code, and no >> further data memory cost. Even simply extending the current init >> function (which simply always set 0) would be enough. > Sort of. Yeah I should've mentioned it, but that means you actually have > to know the first value, and track "first time usage" separately in the > user code. > > Or you init to something useful at the first value, e.g. saying for > signal strength "let's assume -45dBm average if we don't know". That > doesn't seem very practical? I think (at least for the usages I dig into), either those values are already defined and applied with a bogus *_add. > > The behaviour of "first value inserted will init" seems sensible. Okay, my suggestion, as this is a c-macro version, which is generated for each user: why not offer both? If it is unused, the compiler will sort that out. Then we simply keep the zero as special value for init (and document it) and otherwise for the more mathematically defined usage. But my feeling is, that most users do not care about the first values that much and wait to let the ewma settle to some value. I try to do a RFC patch, with what I mean. Benjamin
On 21.04.23 13:13, Johannes Berg wrote: > On Fri, 2023-04-21 at 12:34 +0200, Benjamin Beichler wrote: >> > > >> > > So then let's say we want to fix the existing code. I can think of these >> > > possible ways: >> > > >> > > * splitting off a bit for initialized from the unsigned long >> > > (which at least for 64-bit should be OK since presumably most code >> > > using this will run on 32-bit systems too) >> > > * adding another value for it, e.g. making it u32 and adding a bool >> > > for >> > > "first value" >> > > * biasing the value, like Felix proposes, could be by 1 or -1 for >> > > example >> You forgot the possibility to introduce a separate init function, which >> boils down to a shift with an assignment statement for code, and no >> further data memory cost. Even simply extending the current init >> function (which simply always set 0) would be enough. > > Sort of. Yeah I should've mentioned it, but that means you actually have > to know the first value, and track "first time usage" separately in the > user code. > > Or you init to something useful at the first value, e.g. saying for > signal strength "let's assume -45dBm average if we don't know". That > doesn't seem very practical? > > The behaviour of "first value inserted will init" seems sensible. > >> > > As for biasing the value, couldn't that lead to a similar problem? It's >> > > clearly less likely that the end of the range is reached rather than >> > > zero, but still? >> > I don't see how it can reduce the range in any way, since the bias is >> > added to the fractional part. A range reduction would seem to imply >> > having an average value that's bigger than the maximum allowed shifted >> > input (top bits cut off), and I don't think that's possible. >> > >> It does not reduce the range, but it does not matter whether your >> internal state is 0 or 2^(-precision), the non-intuitive behavior stays >> the same. > > OK I was sort of handwaving ... :-) > > To have a problem, basically the +1 has to overflow the value, so that > we think that the next time around we should init, rather than add. > > That means the existing average has to be 0xffff'ffff (let's take 32 > bits, its easier to type). Clearly that can't happen on the first time > since then the precision bits are all 0. > > But I think Felix is right (thought not sure about the reasoning) and > that cannot happen, because the add calculation does a ">>weight_rcp" > shift at the end, so there are always some top bits that are non-zero, > and _weight_rcp has to be a power of two. Now, 1 is a power of two, but > that'd be really stupid, and nobody is using it ... So I think if we > prohibit 1 for that, we're fine? > > Btw, Felix, shouldn't your patch have said "bool init = !internal"? No, 'internal' is the correct value. The name is confusing though - I intended it to be short for 'initalized', but it should probably be renamed to 'valid' or something like that. - Felix
diff --git a/net/mac80211/mesh_hwmp.c b/net/mac80211/mesh_hwmp.c index 9b1ce7c3925a..f89cbcf212d5 100644 --- a/net/mac80211/mesh_hwmp.c +++ b/net/mac80211/mesh_hwmp.c @@ -298,10 +298,23 @@ void ieee80211s_update_metric(struct ieee80211_local *local, { struct ieee80211_tx_info *txinfo = st->info; int failed; + u32 fail_avg; struct rate_info rinfo; failed = !(txinfo->flags & IEEE80211_TX_STAT_ACK); + fail_avg = ewma_mesh_fail_avg_read(&sta->mesh->fail_avg); + if (!fail_avg) { + /* If the average value in mesh metrics calculation + * has been rounded to 0 (success), this resets it to + * the smallest nonzero value (similarly to the initialization) + * to avoid a case where a single failure would result in + * an average value that goes beyond the value + * of 95 (Link Failure Threshold) + */ + ewma_mesh_fail_avg_add(&sta->mesh->fail_avg, 1); + } + /* moving average, scaled to 100. * feed failure as 100 and success as 0 */