Message ID | 20240415132054.3822230-2-edumazet@google.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | net_sched: first series for RTNL-less qdisc dumps |
On Mon, Apr 15, 2024 at 01:20:41PM +0000, Eric Dumazet wrote:
> Instead of relying on RTNL, fq_dump() can use READ_ONCE()
> annotations, paired with WRITE_ONCE() in fq_change()
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  net/sched/sch_fq.c | 96 +++++++++++++++++++++++++++++-----------------
>  1 file changed, 60 insertions(+), 36 deletions(-)
>
> diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
> index cdf23ff16f40bf244bb822e76016fde44e0c439b..934c220b3f4336dc2f70af74d7758218492b675d 100644
> --- a/net/sched/sch_fq.c
> +++ b/net/sched/sch_fq.c
> @@ -888,7 +888,7 @@ static int fq_resize(struct Qdisc *sch, u32 log)
>          fq_rehash(q, old_fq_root, q->fq_trees_log, array, log);
>
>          q->fq_root = array;
> -        q->fq_trees_log = log;
> +        WRITE_ONCE(q->fq_trees_log, log);
>
>          sch_tree_unlock(sch);
>
> @@ -931,7 +931,7 @@ static void fq_prio2band_compress_crumb(const u8 *in, u8 *out)
>
>          memset(out, 0, num_elems / 4);
>          for (i = 0; i < num_elems; i++)
> -                out[i / 4] |= in[i] << (2 * (i & 0x3));
> +                out[i / 4] |= READ_ONCE(in[i]) << (2 * (i & 0x3));
>  }
>

Hi Eric,

I am a little unsure about the handling of q->prio2band in this patch.

It seems to me that fq_prio2band_compress_crumb() is used to store
values in q->prio2band, and is called (indirectly) from fq_change()
(and directly from fq_init()).

While fq_prio2band_decompress_crumb() is used to read values from
q->prio2band, and is called from fq_dump().

So I am wondering if it should use WRITE_ONCE() when storing elements
of out. And fq_prio2band_decompress_crumb() should use READ_ONCE()
when reading elements of in.
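To make the structure under discussion concrete: q->prio2band packs one 2-bit band number ("crumb") per priority, four priorities per byte, which is what fq_prio2band_compress_crumb() builds and fq_prio2band() / fq_prio2band_decompress_crumb() read back. A standalone userspace sketch of just the packing scheme (helper names and the example mapping are made up for illustration, not the kernel functions):

#include <stdio.h>

#define NUM_ELEMS 16                            /* TC_PRIO_MAX + 1 in the kernel */

static unsigned char packed[NUM_ELEMS / 4];     /* four 2-bit crumbs per byte */

static void pack_crumbs(const unsigned char *prio2band)
{
        int i;

        for (i = 0; i < NUM_ELEMS; i++)
                packed[i / 4] |= (prio2band[i] & 0x3) << (2 * (i & 0x3));
}

static unsigned char get_crumb(unsigned int prio)
{
        return (packed[prio / 4] >> (2 * (prio & 0x3))) & 0x3;
}

int main(void)
{
        /* arbitrary example mapping onto three bands */
        unsigned char prio2band[NUM_ELEMS] = { 1, 2, 2, 2, 1, 2, 0, 0,
                                               1, 1, 1, 1, 1, 1, 1, 1 };

        pack_crumbs(prio2band);
        printf("prio 0 -> band %u, prio 6 -> band %u\n",
               get_crumb(0), get_crumb(6));      /* prints 1 and 0 */
        return 0;
}

The race Simon is pointing at is between fq_change() rebuilding these packed bytes and fq_dump() (which this series wants to run without RTNL) unpacking them concurrently.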
On Tue, Apr 16, 2024 at 8:19 PM Simon Horman <horms@kernel.org> wrote: > > On Mon, Apr 15, 2024 at 01:20:41PM +0000, Eric Dumazet wrote: > > Instead of relying on RTNL, fq_dump() can use READ_ONCE() > > annotations, paired with WRITE_ONCE() in fq_change() > > > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > --- > > net/sched/sch_fq.c | 96 +++++++++++++++++++++++++++++----------------- > > 1 file changed, 60 insertions(+), 36 deletions(-) > > > > diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c > > index cdf23ff16f40bf244bb822e76016fde44e0c439b..934c220b3f4336dc2f70af74d7758218492b675d 100644 > > --- a/net/sched/sch_fq.c > > +++ b/net/sched/sch_fq.c > > @@ -888,7 +888,7 @@ static int fq_resize(struct Qdisc *sch, u32 log) > > fq_rehash(q, old_fq_root, q->fq_trees_log, array, log); > > > > q->fq_root = array; > > - q->fq_trees_log = log; > > + WRITE_ONCE(q->fq_trees_log, log); > > > > sch_tree_unlock(sch); > > > > @@ -931,7 +931,7 @@ static void fq_prio2band_compress_crumb(const u8 *in, u8 *out) > > > > memset(out, 0, num_elems / 4); > > for (i = 0; i < num_elems; i++) > > - out[i / 4] |= in[i] << (2 * (i & 0x3)); > > + out[i / 4] |= READ_ONCE(in[i]) << (2 * (i & 0x3)); > > } > > > > Hi Eric, > > I am a little unsure about the handling of q->prio2band in this patch. > > It seems to me that fq_prio2band_compress_crumb() is used to > to store values in q->prio2band, and is called (indirectly) > from fq_change() (and directly from fq_init()). > > While fq_prio2band_decompress_crumb() is used to read values > from q->prio2band, and is called from fq_dump(). > > So I am wondering if should use WRITE_ONCE() when storing elements > of out. And fq_prio2band_decompress_crumb should use READ_ONCE when > reading elements of in. Yeah, you are probably right, I recall being a bit lazy on this part, thanks !
On Tue, Apr 16, 2024 at 8:33 PM Eric Dumazet <edumazet@google.com> wrote: > > On Tue, Apr 16, 2024 at 8:19 PM Simon Horman <horms@kernel.org> wrote: > > > > On Mon, Apr 15, 2024 at 01:20:41PM +0000, Eric Dumazet wrote: > > > Instead of relying on RTNL, fq_dump() can use READ_ONCE() > > > annotations, paired with WRITE_ONCE() in fq_change() > > > > > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > > --- > > > net/sched/sch_fq.c | 96 +++++++++++++++++++++++++++++----------------- > > > 1 file changed, 60 insertions(+), 36 deletions(-) > > > > > > diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c > > > index cdf23ff16f40bf244bb822e76016fde44e0c439b..934c220b3f4336dc2f70af74d7758218492b675d 100644 > > > --- a/net/sched/sch_fq.c > > > +++ b/net/sched/sch_fq.c > > > @@ -888,7 +888,7 @@ static int fq_resize(struct Qdisc *sch, u32 log) > > > fq_rehash(q, old_fq_root, q->fq_trees_log, array, log); > > > > > > q->fq_root = array; > > > - q->fq_trees_log = log; > > > + WRITE_ONCE(q->fq_trees_log, log); > > > > > > sch_tree_unlock(sch); > > > > > > @@ -931,7 +931,7 @@ static void fq_prio2band_compress_crumb(const u8 *in, u8 *out) > > > > > > memset(out, 0, num_elems / 4); > > > for (i = 0; i < num_elems; i++) > > > - out[i / 4] |= in[i] << (2 * (i & 0x3)); > > > + out[i / 4] |= READ_ONCE(in[i]) << (2 * (i & 0x3)); > > > } > > > > > > > Hi Eric, > > > > I am a little unsure about the handling of q->prio2band in this patch. > > > > It seems to me that fq_prio2band_compress_crumb() is used to > > to store values in q->prio2band, and is called (indirectly) > > from fq_change() (and directly from fq_init()). > > > > While fq_prio2band_decompress_crumb() is used to read values > > from q->prio2band, and is called from fq_dump(). > > > > So I am wondering if should use WRITE_ONCE() when storing elements > > of out. And fq_prio2band_decompress_crumb should use READ_ONCE when > > reading elements of in. > > Yeah, you are probably right, I recall being a bit lazy on this part, > thanks ! I will squash in V2 this part : diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c index 934c220b3f4336dc2f70af74d7758218492b675d..238974725679327b0a0d483c011e15fc94ab0878 100644 --- a/net/sched/sch_fq.c +++ b/net/sched/sch_fq.c @@ -106,6 +106,8 @@ struct fq_perband_flows { int quantum; /* based on band nr : 576KB, 192KB, 64KB */ }; +#define FQ_PRIO2BAND_CRUMB_SIZE ((TC_PRIO_MAX + 1) >> 2) + struct fq_sched_data { /* Read mostly cache line */ @@ -122,7 +124,7 @@ struct fq_sched_data { u8 rate_enable; u8 fq_trees_log; u8 horizon_drop; - u8 prio2band[(TC_PRIO_MAX + 1) >> 2]; + u8 prio2band[FQ_PRIO2BAND_CRUMB_SIZE]; u32 timer_slack; /* hrtimer slack in ns */ /* Read/Write fields. 
*/ @@ -159,7 +161,7 @@ struct fq_sched_data { /* return the i-th 2-bit value ("crumb") */ static u8 fq_prio2band(const u8 *prio2band, unsigned int prio) { - return (prio2band[prio / 4] >> (2 * (prio & 0x3))) & 0x3; + return (READ_ONCE(prio2band[prio / 4]) >> (2 * (prio & 0x3))) & 0x3; } /* @@ -927,11 +929,15 @@ static const struct nla_policy fq_policy[TCA_FQ_MAX + 1] = { static void fq_prio2band_compress_crumb(const u8 *in, u8 *out) { const int num_elems = TC_PRIO_MAX + 1; + u8 tmp[FQ_PRIO2BAND_CRUMB_SIZE]; int i; - memset(out, 0, num_elems / 4); + memset(tmp, 0, sizeof(tmp)); for (i = 0; i < num_elems; i++) - out[i / 4] |= READ_ONCE(in[i]) << (2 * (i & 0x3)); + tmp[i / 4] |= in[i] << (2 * (i & 0x3)); + + for (i = 0; i < FQ_PRIO2BAND_CRUMB_SIZE; i++) + WRITE_ONCE(out[i], tmp[i]); } static void fq_prio2band_decompress_crumb(const u8 *in, u8 *out)
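Two things change in this squash: the writer no longer zeroes and rebuilds the live q->prio2band[] in place (a lockless reader could otherwise observe the array zeroed or half rebuilt), and each finished byte is published with a single WRITE_ONCE(), paired with the READ_ONCE() added in fq_prio2band(). A reduced sketch of the before/after writer, with WRITE_ONCE() spelled out as the volatile store it amounts to for byte-sized scalars (a simplification; the real kernel macro carries extra machinery), and all names simplified for illustration:

#include <string.h>

#define WRITE_ONCE_U8(dst, val) (*(volatile unsigned char *)&(dst) = (val))

#define NUM_ELEMS   16                  /* TC_PRIO_MAX + 1 */
#define CRUMB_BYTES (NUM_ELEMS / 4)

static unsigned char live[CRUMB_BYTES]; /* shared with a lockless reader */

/* Racy variant (old shape): the reader can see a half-built byte. */
static void rebuild_in_place(const unsigned char *in)
{
        int i;

        memset(live, 0, sizeof(live));
        for (i = 0; i < NUM_ELEMS; i++)
                live[i / 4] |= (in[i] & 0x3) << (2 * (i & 0x3));
}

/* Pattern from the squash: build privately, then publish byte by byte. */
static void rebuild_via_tmp(const unsigned char *in)
{
        unsigned char tmp[CRUMB_BYTES];
        int i;

        memset(tmp, 0, sizeof(tmp));
        for (i = 0; i < NUM_ELEMS; i++)
                tmp[i / 4] |= (in[i] & 0x3) << (2 * (i & 0x3));

        for (i = 0; i < CRUMB_BYTES; i++)
                WRITE_ONCE_U8(live[i], tmp[i]); /* each byte lands in one store */
}

A concurrent reader can still see a mix of old and new bytes if the priomap changes mid-dump, but every byte it reads is a value that was actually stored at some point, which is all fq_dump() needs.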
On Wed, Apr 17, 2024 at 10:45:09AM +0200, Eric Dumazet wrote: > On Tue, Apr 16, 2024 at 8:33 PM Eric Dumazet <edumazet@google.com> wrote: > > > > On Tue, Apr 16, 2024 at 8:19 PM Simon Horman <horms@kernel.org> wrote: > > > > > > On Mon, Apr 15, 2024 at 01:20:41PM +0000, Eric Dumazet wrote: > > > > Instead of relying on RTNL, fq_dump() can use READ_ONCE() > > > > annotations, paired with WRITE_ONCE() in fq_change() > > > > > > > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > > > --- > > > > net/sched/sch_fq.c | 96 +++++++++++++++++++++++++++++----------------- > > > > 1 file changed, 60 insertions(+), 36 deletions(-) > > > > > > > > diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c > > > > index cdf23ff16f40bf244bb822e76016fde44e0c439b..934c220b3f4336dc2f70af74d7758218492b675d 100644 > > > > --- a/net/sched/sch_fq.c > > > > +++ b/net/sched/sch_fq.c > > > > @@ -888,7 +888,7 @@ static int fq_resize(struct Qdisc *sch, u32 log) > > > > fq_rehash(q, old_fq_root, q->fq_trees_log, array, log); > > > > > > > > q->fq_root = array; > > > > - q->fq_trees_log = log; > > > > + WRITE_ONCE(q->fq_trees_log, log); > > > > > > > > sch_tree_unlock(sch); > > > > > > > > @@ -931,7 +931,7 @@ static void fq_prio2band_compress_crumb(const u8 *in, u8 *out) > > > > > > > > memset(out, 0, num_elems / 4); > > > > for (i = 0; i < num_elems; i++) > > > > - out[i / 4] |= in[i] << (2 * (i & 0x3)); > > > > + out[i / 4] |= READ_ONCE(in[i]) << (2 * (i & 0x3)); > > > > } > > > > > > > > > > Hi Eric, > > > > > > I am a little unsure about the handling of q->prio2band in this patch. > > > > > > It seems to me that fq_prio2band_compress_crumb() is used to > > > to store values in q->prio2band, and is called (indirectly) > > > from fq_change() (and directly from fq_init()). > > > > > > While fq_prio2band_decompress_crumb() is used to read values > > > from q->prio2band, and is called from fq_dump(). > > > > > > So I am wondering if should use WRITE_ONCE() when storing elements > > > of out. And fq_prio2band_decompress_crumb should use READ_ONCE when > > > reading elements of in. > > > > Yeah, you are probably right, I recall being a bit lazy on this part, > > thanks ! > > I will squash in V2 this part : > > diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c > index 934c220b3f4336dc2f70af74d7758218492b675d..238974725679327b0a0d483c011e15fc94ab0878 > 100644 > --- a/net/sched/sch_fq.c > +++ b/net/sched/sch_fq.c > @@ -106,6 +106,8 @@ struct fq_perband_flows { > int quantum; /* based on band nr : 576KB, 192KB, 64KB */ > }; > > +#define FQ_PRIO2BAND_CRUMB_SIZE ((TC_PRIO_MAX + 1) >> 2) > + > struct fq_sched_data { > /* Read mostly cache line */ > > @@ -122,7 +124,7 @@ struct fq_sched_data { > u8 rate_enable; > u8 fq_trees_log; > u8 horizon_drop; > - u8 prio2band[(TC_PRIO_MAX + 1) >> 2]; > + u8 prio2band[FQ_PRIO2BAND_CRUMB_SIZE]; > u32 timer_slack; /* hrtimer slack in ns */ > > /* Read/Write fields. */ > @@ -159,7 +161,7 @@ struct fq_sched_data { > /* return the i-th 2-bit value ("crumb") */ > static u8 fq_prio2band(const u8 *prio2band, unsigned int prio) > { > - return (prio2band[prio / 4] >> (2 * (prio & 0x3))) & 0x3; > + return (READ_ONCE(prio2band[prio / 4]) >> (2 * (prio & 0x3))) & 0x3; > } Thanks Eric, assuming that it is ok for this version of fq_prio2band() to run from fq_enqueue(), this update looks good to me. 
> > /* > @@ -927,11 +929,15 @@ static const struct nla_policy > fq_policy[TCA_FQ_MAX + 1] = { > static void fq_prio2band_compress_crumb(const u8 *in, u8 *out) > { > const int num_elems = TC_PRIO_MAX + 1; > + u8 tmp[FQ_PRIO2BAND_CRUMB_SIZE]; > int i; > > - memset(out, 0, num_elems / 4); > + memset(tmp, 0, sizeof(tmp)); > for (i = 0; i < num_elems; i++) > - out[i / 4] |= READ_ONCE(in[i]) << (2 * (i & 0x3)); > + tmp[i / 4] |= in[i] << (2 * (i & 0x3)); > + > + for (i = 0; i < FQ_PRIO2BAND_CRUMB_SIZE; i++) > + WRITE_ONCE(out[i], tmp[i]); > } > > static void fq_prio2band_decompress_crumb(const u8 *in, u8 *out) >
On Wed, Apr 17, 2024 at 11:00 AM Simon Horman <horms@kernel.org> wrote: > > On Wed, Apr 17, 2024 at 10:45:09AM +0200, Eric Dumazet wrote: > > On Tue, Apr 16, 2024 at 8:33 PM Eric Dumazet <edumazet@google.com> wrote: > > > > > > On Tue, Apr 16, 2024 at 8:19 PM Simon Horman <horms@kernel.org> wrote: > > > > > > > > On Mon, Apr 15, 2024 at 01:20:41PM +0000, Eric Dumazet wrote: > > > > > Instead of relying on RTNL, fq_dump() can use READ_ONCE() > > > > > annotations, paired with WRITE_ONCE() in fq_change() > > > > > > > > > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > > > > --- > > > > > net/sched/sch_fq.c | 96 +++++++++++++++++++++++++++++----------------- > > > > > 1 file changed, 60 insertions(+), 36 deletions(-) > > > > > > > > > > diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c > > > > > index cdf23ff16f40bf244bb822e76016fde44e0c439b..934c220b3f4336dc2f70af74d7758218492b675d 100644 > > > > > --- a/net/sched/sch_fq.c > > > > > +++ b/net/sched/sch_fq.c > > > > > @@ -888,7 +888,7 @@ static int fq_resize(struct Qdisc *sch, u32 log) > > > > > fq_rehash(q, old_fq_root, q->fq_trees_log, array, log); > > > > > > > > > > q->fq_root = array; > > > > > - q->fq_trees_log = log; > > > > > + WRITE_ONCE(q->fq_trees_log, log); > > > > > > > > > > sch_tree_unlock(sch); > > > > > > > > > > @@ -931,7 +931,7 @@ static void fq_prio2band_compress_crumb(const u8 *in, u8 *out) > > > > > > > > > > memset(out, 0, num_elems / 4); > > > > > for (i = 0; i < num_elems; i++) > > > > > - out[i / 4] |= in[i] << (2 * (i & 0x3)); > > > > > + out[i / 4] |= READ_ONCE(in[i]) << (2 * (i & 0x3)); > > > > > } > > > > > > > > > > > > > Hi Eric, > > > > > > > > I am a little unsure about the handling of q->prio2band in this patch. > > > > > > > > It seems to me that fq_prio2band_compress_crumb() is used to > > > > to store values in q->prio2band, and is called (indirectly) > > > > from fq_change() (and directly from fq_init()). > > > > > > > > While fq_prio2band_decompress_crumb() is used to read values > > > > from q->prio2band, and is called from fq_dump(). > > > > > > > > So I am wondering if should use WRITE_ONCE() when storing elements > > > > of out. And fq_prio2band_decompress_crumb should use READ_ONCE when > > > > reading elements of in. > > > > > > Yeah, you are probably right, I recall being a bit lazy on this part, > > > thanks ! > > > > I will squash in V2 this part : > > > > diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c > > index 934c220b3f4336dc2f70af74d7758218492b675d..238974725679327b0a0d483c011e15fc94ab0878 > > 100644 > > --- a/net/sched/sch_fq.c > > +++ b/net/sched/sch_fq.c > > @@ -106,6 +106,8 @@ struct fq_perband_flows { > > int quantum; /* based on band nr : 576KB, 192KB, 64KB */ > > }; > > > > +#define FQ_PRIO2BAND_CRUMB_SIZE ((TC_PRIO_MAX + 1) >> 2) > > + > > struct fq_sched_data { > > /* Read mostly cache line */ > > > > @@ -122,7 +124,7 @@ struct fq_sched_data { > > u8 rate_enable; > > u8 fq_trees_log; > > u8 horizon_drop; > > - u8 prio2band[(TC_PRIO_MAX + 1) >> 2]; > > + u8 prio2band[FQ_PRIO2BAND_CRUMB_SIZE]; > > u32 timer_slack; /* hrtimer slack in ns */ > > > > /* Read/Write fields. 
*/ > > @@ -159,7 +161,7 @@ struct fq_sched_data { > > /* return the i-th 2-bit value ("crumb") */ > > static u8 fq_prio2band(const u8 *prio2band, unsigned int prio) > > { > > - return (prio2band[prio / 4] >> (2 * (prio & 0x3))) & 0x3; > > + return (READ_ONCE(prio2band[prio / 4]) >> (2 * (prio & 0x3))) & 0x3; > > } > > Thanks Eric, > > assuming that it is ok for this version of fq_prio2band() to run > from fq_enqueue(), this update looks good to me. It is ok, a READ_ONCE() here is not adding any constraint on compiler output.
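For readers following along: the reason this costs nothing on the fq_enqueue() fast path is that, for an aligned byte-sized scalar, READ_ONCE() boils down to a volatile load, and a single byte load used once is what the plain access compiles to anyway; the annotation only rules out refetching, caching or tearing, none of which could have helped here. Roughly (simplified macro and function name for illustration, not the kernel definitions):

#define READ_ONCE_U8(x) (*(const volatile unsigned char *)&(x))

/* Same shape as the patched fq_prio2band(): one byte load plus shift/mask. */
static unsigned char crumb_lookup(const unsigned char *prio2band,
                                  unsigned int prio)
{
        return (READ_ONCE_U8(prio2band[prio / 4]) >> (2 * (prio & 0x3))) & 0x3;
}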
On Wed, Apr 17, 2024 at 11:02:55AM +0200, Eric Dumazet wrote: > On Wed, Apr 17, 2024 at 11:00 AM Simon Horman <horms@kernel.org> wrote: > > > > On Wed, Apr 17, 2024 at 10:45:09AM +0200, Eric Dumazet wrote: > > > On Tue, Apr 16, 2024 at 8:33 PM Eric Dumazet <edumazet@google.com> wrote: > > > > > > > > On Tue, Apr 16, 2024 at 8:19 PM Simon Horman <horms@kernel.org> wrote: > > > > > > > > > > On Mon, Apr 15, 2024 at 01:20:41PM +0000, Eric Dumazet wrote: > > > > > > Instead of relying on RTNL, fq_dump() can use READ_ONCE() > > > > > > annotations, paired with WRITE_ONCE() in fq_change() > > > > > > > > > > > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > > > > > --- > > > > > > net/sched/sch_fq.c | 96 +++++++++++++++++++++++++++++----------------- > > > > > > 1 file changed, 60 insertions(+), 36 deletions(-) > > > > > > > > > > > > diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c > > > > > > index cdf23ff16f40bf244bb822e76016fde44e0c439b..934c220b3f4336dc2f70af74d7758218492b675d 100644 > > > > > > --- a/net/sched/sch_fq.c > > > > > > +++ b/net/sched/sch_fq.c > > > > > > @@ -888,7 +888,7 @@ static int fq_resize(struct Qdisc *sch, u32 log) > > > > > > fq_rehash(q, old_fq_root, q->fq_trees_log, array, log); > > > > > > > > > > > > q->fq_root = array; > > > > > > - q->fq_trees_log = log; > > > > > > + WRITE_ONCE(q->fq_trees_log, log); > > > > > > > > > > > > sch_tree_unlock(sch); > > > > > > > > > > > > @@ -931,7 +931,7 @@ static void fq_prio2band_compress_crumb(const u8 *in, u8 *out) > > > > > > > > > > > > memset(out, 0, num_elems / 4); > > > > > > for (i = 0; i < num_elems; i++) > > > > > > - out[i / 4] |= in[i] << (2 * (i & 0x3)); > > > > > > + out[i / 4] |= READ_ONCE(in[i]) << (2 * (i & 0x3)); > > > > > > } > > > > > > > > > > > > > > > > Hi Eric, > > > > > > > > > > I am a little unsure about the handling of q->prio2band in this patch. > > > > > > > > > > It seems to me that fq_prio2band_compress_crumb() is used to > > > > > to store values in q->prio2band, and is called (indirectly) > > > > > from fq_change() (and directly from fq_init()). > > > > > > > > > > While fq_prio2band_decompress_crumb() is used to read values > > > > > from q->prio2band, and is called from fq_dump(). > > > > > > > > > > So I am wondering if should use WRITE_ONCE() when storing elements > > > > > of out. And fq_prio2band_decompress_crumb should use READ_ONCE when > > > > > reading elements of in. > > > > > > > > Yeah, you are probably right, I recall being a bit lazy on this part, > > > > thanks ! > > > > > > I will squash in V2 this part : > > > > > > diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c > > > index 934c220b3f4336dc2f70af74d7758218492b675d..238974725679327b0a0d483c011e15fc94ab0878 > > > 100644 > > > --- a/net/sched/sch_fq.c > > > +++ b/net/sched/sch_fq.c > > > @@ -106,6 +106,8 @@ struct fq_perband_flows { > > > int quantum; /* based on band nr : 576KB, 192KB, 64KB */ > > > }; > > > > > > +#define FQ_PRIO2BAND_CRUMB_SIZE ((TC_PRIO_MAX + 1) >> 2) > > > + > > > struct fq_sched_data { > > > /* Read mostly cache line */ > > > > > > @@ -122,7 +124,7 @@ struct fq_sched_data { > > > u8 rate_enable; > > > u8 fq_trees_log; > > > u8 horizon_drop; > > > - u8 prio2band[(TC_PRIO_MAX + 1) >> 2]; > > > + u8 prio2band[FQ_PRIO2BAND_CRUMB_SIZE]; > > > u32 timer_slack; /* hrtimer slack in ns */ > > > > > > /* Read/Write fields. 
*/ > > > @@ -159,7 +161,7 @@ struct fq_sched_data { > > > /* return the i-th 2-bit value ("crumb") */ > > > static u8 fq_prio2band(const u8 *prio2band, unsigned int prio) > > > { > > > - return (prio2band[prio / 4] >> (2 * (prio & 0x3))) & 0x3; > > > + return (READ_ONCE(prio2band[prio / 4]) >> (2 * (prio & 0x3))) & 0x3; > > > } > > > > Thanks Eric, > > > > assuming that it is ok for this version of fq_prio2band() to run > > from fq_enqueue(), this update looks good to me. > > It is ok, a READ_ONCE() here is not adding any constraint on compiler output. Thanks for confirming. This looks good to me.
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index cdf23ff16f40bf244bb822e76016fde44e0c439b..934c220b3f4336dc2f70af74d7758218492b675d 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -888,7 +888,7 @@ static int fq_resize(struct Qdisc *sch, u32 log)
         fq_rehash(q, old_fq_root, q->fq_trees_log, array, log);
 
         q->fq_root = array;
-        q->fq_trees_log = log;
+        WRITE_ONCE(q->fq_trees_log, log);
 
         sch_tree_unlock(sch);
 
@@ -931,7 +931,7 @@ static void fq_prio2band_compress_crumb(const u8 *in, u8 *out)
 
         memset(out, 0, num_elems / 4);
         for (i = 0; i < num_elems; i++)
-                out[i / 4] |= in[i] << (2 * (i & 0x3));
+                out[i / 4] |= READ_ONCE(in[i]) << (2 * (i & 0x3));
 }
 
 static void fq_prio2band_decompress_crumb(const u8 *in, u8 *out)
@@ -958,7 +958,7 @@ static int fq_load_weights(struct fq_sched_data *q,
                 }
         }
         for (i = 0; i < FQ_BANDS; i++)
-                q->band_flows[i].quantum = weights[i];
+                WRITE_ONCE(q->band_flows[i].quantum, weights[i]);
         return 0;
 }
 
@@ -1011,16 +1011,18 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt,
                         err = -EINVAL;
         }
         if (tb[TCA_FQ_PLIMIT])
-                sch->limit = nla_get_u32(tb[TCA_FQ_PLIMIT]);
+                WRITE_ONCE(sch->limit,
+                           nla_get_u32(tb[TCA_FQ_PLIMIT]));
 
         if (tb[TCA_FQ_FLOW_PLIMIT])
-                q->flow_plimit = nla_get_u32(tb[TCA_FQ_FLOW_PLIMIT]);
+                WRITE_ONCE(q->flow_plimit,
+                           nla_get_u32(tb[TCA_FQ_FLOW_PLIMIT]));
 
         if (tb[TCA_FQ_QUANTUM]) {
                 u32 quantum = nla_get_u32(tb[TCA_FQ_QUANTUM]);
 
                 if (quantum > 0 && quantum <= (1 << 20)) {
-                        q->quantum = quantum;
+                        WRITE_ONCE(q->quantum, quantum);
                 } else {
                         NL_SET_ERR_MSG_MOD(extack, "invalid quantum");
                         err = -EINVAL;
@@ -1028,7 +1030,8 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt,
         }
 
         if (tb[TCA_FQ_INITIAL_QUANTUM])
-                q->initial_quantum = nla_get_u32(tb[TCA_FQ_INITIAL_QUANTUM]);
+                WRITE_ONCE(q->initial_quantum,
+                           nla_get_u32(tb[TCA_FQ_INITIAL_QUANTUM]));
 
         if (tb[TCA_FQ_FLOW_DEFAULT_RATE])
                 pr_warn_ratelimited("sch_fq: defrate %u ignored.\n",
@@ -1037,17 +1040,19 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt,
         if (tb[TCA_FQ_FLOW_MAX_RATE]) {
                 u32 rate = nla_get_u32(tb[TCA_FQ_FLOW_MAX_RATE]);
 
-                q->flow_max_rate = (rate == ~0U) ? ~0UL : rate;
+                WRITE_ONCE(q->flow_max_rate,
+                           (rate == ~0U) ? ~0UL : rate);
         }
         if (tb[TCA_FQ_LOW_RATE_THRESHOLD])
-                q->low_rate_threshold =
-                        nla_get_u32(tb[TCA_FQ_LOW_RATE_THRESHOLD]);
+                WRITE_ONCE(q->low_rate_threshold,
+                           nla_get_u32(tb[TCA_FQ_LOW_RATE_THRESHOLD]));
 
         if (tb[TCA_FQ_RATE_ENABLE]) {
                 u32 enable = nla_get_u32(tb[TCA_FQ_RATE_ENABLE]);
 
                 if (enable <= 1)
-                        q->rate_enable = enable;
+                        WRITE_ONCE(q->rate_enable,
+                                   enable);
                 else
                         err = -EINVAL;
         }
@@ -1055,7 +1060,8 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt,
         if (tb[TCA_FQ_FLOW_REFILL_DELAY]) {
                 u32 usecs_delay = nla_get_u32(tb[TCA_FQ_FLOW_REFILL_DELAY]) ;
 
-                q->flow_refill_delay = usecs_to_jiffies(usecs_delay);
+                WRITE_ONCE(q->flow_refill_delay,
+                           usecs_to_jiffies(usecs_delay));
         }
 
         if (!err && tb[TCA_FQ_PRIOMAP])
@@ -1065,21 +1071,26 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt,
                 err = fq_load_weights(q, tb[TCA_FQ_WEIGHTS], extack);
 
         if (tb[TCA_FQ_ORPHAN_MASK])
-                q->orphan_mask = nla_get_u32(tb[TCA_FQ_ORPHAN_MASK]);
+                WRITE_ONCE(q->orphan_mask,
+                           nla_get_u32(tb[TCA_FQ_ORPHAN_MASK]));
 
         if (tb[TCA_FQ_CE_THRESHOLD])
-                q->ce_threshold = (u64)NSEC_PER_USEC *
-                                  nla_get_u32(tb[TCA_FQ_CE_THRESHOLD]);
+                WRITE_ONCE(q->ce_threshold,
+                           (u64)NSEC_PER_USEC *
+                           nla_get_u32(tb[TCA_FQ_CE_THRESHOLD]));
 
         if (tb[TCA_FQ_TIMER_SLACK])
-                q->timer_slack = nla_get_u32(tb[TCA_FQ_TIMER_SLACK]);
+                WRITE_ONCE(q->timer_slack,
+                           nla_get_u32(tb[TCA_FQ_TIMER_SLACK]));
 
         if (tb[TCA_FQ_HORIZON])
-                q->horizon = (u64)NSEC_PER_USEC *
-                                  nla_get_u32(tb[TCA_FQ_HORIZON]);
+                WRITE_ONCE(q->horizon,
+                           (u64)NSEC_PER_USEC *
+                           nla_get_u32(tb[TCA_FQ_HORIZON]));
 
         if (tb[TCA_FQ_HORIZON_DROP])
-                q->horizon_drop = nla_get_u8(tb[TCA_FQ_HORIZON_DROP]);
+                WRITE_ONCE(q->horizon_drop,
+                           nla_get_u8(tb[TCA_FQ_HORIZON_DROP]));
 
         if (!err) {
 
@@ -1160,13 +1171,13 @@ static int fq_init(struct Qdisc *sch, struct nlattr *opt,
 static int fq_dump(struct Qdisc *sch, struct sk_buff *skb)
 {
         struct fq_sched_data *q = qdisc_priv(sch);
-        u64 ce_threshold = q->ce_threshold;
         struct tc_prio_qopt prio = {
                 .bands = FQ_BANDS,
         };
-        u64 horizon = q->horizon;
         struct nlattr *opts;
+        u64 ce_threshold;
         s32 weights[3];
+        u64 horizon;
 
         opts = nla_nest_start_noflag(skb, TCA_OPTIONS);
         if (opts == NULL)
@@ -1174,35 +1185,48 @@ static int fq_dump(struct Qdisc *sch, struct sk_buff *skb)
 
         /* TCA_FQ_FLOW_DEFAULT_RATE is not used anymore */
 
+        ce_threshold = READ_ONCE(q->ce_threshold);
         do_div(ce_threshold, NSEC_PER_USEC);
+
+        horizon = READ_ONCE(q->horizon);
         do_div(horizon, NSEC_PER_USEC);
 
-        if (nla_put_u32(skb, TCA_FQ_PLIMIT, sch->limit) ||
-            nla_put_u32(skb, TCA_FQ_FLOW_PLIMIT, q->flow_plimit) ||
-            nla_put_u32(skb, TCA_FQ_QUANTUM, q->quantum) ||
-            nla_put_u32(skb, TCA_FQ_INITIAL_QUANTUM, q->initial_quantum) ||
-            nla_put_u32(skb, TCA_FQ_RATE_ENABLE, q->rate_enable) ||
+        if (nla_put_u32(skb, TCA_FQ_PLIMIT,
+                        READ_ONCE(sch->limit)) ||
+            nla_put_u32(skb, TCA_FQ_FLOW_PLIMIT,
+                        READ_ONCE(q->flow_plimit)) ||
+            nla_put_u32(skb, TCA_FQ_QUANTUM,
+                        READ_ONCE(q->quantum)) ||
+            nla_put_u32(skb, TCA_FQ_INITIAL_QUANTUM,
+                        READ_ONCE(q->initial_quantum)) ||
+            nla_put_u32(skb, TCA_FQ_RATE_ENABLE,
+                        READ_ONCE(q->rate_enable)) ||
             nla_put_u32(skb, TCA_FQ_FLOW_MAX_RATE,
-                        min_t(unsigned long, q->flow_max_rate, ~0U)) ||
+                        min_t(unsigned long,
+                              READ_ONCE(q->flow_max_rate), ~0U)) ||
             nla_put_u32(skb, TCA_FQ_FLOW_REFILL_DELAY,
-                        jiffies_to_usecs(q->flow_refill_delay)) ||
-            nla_put_u32(skb, TCA_FQ_ORPHAN_MASK, q->orphan_mask) ||
+                        jiffies_to_usecs(READ_ONCE(q->flow_refill_delay))) ||
+            nla_put_u32(skb, TCA_FQ_ORPHAN_MASK,
+                        READ_ONCE(q->orphan_mask)) ||
             nla_put_u32(skb, TCA_FQ_LOW_RATE_THRESHOLD,
-                        q->low_rate_threshold) ||
+                        READ_ONCE(q->low_rate_threshold)) ||
             nla_put_u32(skb, TCA_FQ_CE_THRESHOLD, (u32)ce_threshold) ||
-            nla_put_u32(skb, TCA_FQ_BUCKETS_LOG, q->fq_trees_log) ||
-            nla_put_u32(skb, TCA_FQ_TIMER_SLACK, q->timer_slack) ||
+            nla_put_u32(skb, TCA_FQ_BUCKETS_LOG,
+                        READ_ONCE(q->fq_trees_log)) ||
+            nla_put_u32(skb, TCA_FQ_TIMER_SLACK,
+                        READ_ONCE(q->timer_slack)) ||
             nla_put_u32(skb, TCA_FQ_HORIZON, (u32)horizon) ||
-            nla_put_u8(skb, TCA_FQ_HORIZON_DROP, q->horizon_drop))
+            nla_put_u8(skb, TCA_FQ_HORIZON_DROP,
+                       READ_ONCE(q->horizon_drop)))
                 goto nla_put_failure;
 
         fq_prio2band_decompress_crumb(q->prio2band, prio.priomap);
         if (nla_put(skb, TCA_FQ_PRIOMAP, sizeof(prio), &prio))
                 goto nla_put_failure;
 
-        weights[0] = q->band_flows[0].quantum;
-        weights[1] = q->band_flows[1].quantum;
-        weights[2] = q->band_flows[2].quantum;
+        weights[0] = READ_ONCE(q->band_flows[0].quantum);
+        weights[1] = READ_ONCE(q->band_flows[1].quantum);
+        weights[2] = READ_ONCE(q->band_flows[2].quantum);
         if (nla_put(skb, TCA_FQ_WEIGHTS, sizeof(weights), &weights))
                 goto nla_put_failure;
Instead of relying on RTNL, fq_dump() can use READ_ONCE()
annotations, paired with WRITE_ONCE() in fq_change()

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/sched/sch_fq.c | 96 +++++++++++++++++++++++++++++-----------------
 1 file changed, 60 insertions(+), 36 deletions(-)
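Stepping back, the whole patch is one pattern applied per field: fq_change(), still serialized by the qdisc tree lock, publishes each scalar with WRITE_ONCE(), and fq_dump() reads it back with READ_ONCE(), so that later patches in the series can stop taking RTNL for dumps. A condensed sketch of that shape, with made-up foo_* names (illustration only, not the actual sch_fq code):

#include <linux/compiler.h>     /* READ_ONCE / WRITE_ONCE */
#include <net/netlink.h>        /* nla_put_u32 */
#include <net/pkt_sched.h>      /* TCA_FQ_QUANTUM */

struct foo_sched_data {
        u32 quantum;            /* written by change(), read locklessly by dump() */
};

/* Configuration path: still serialized, but readers may run concurrently. */
static void foo_change(struct foo_sched_data *q, u32 new_quantum)
{
        WRITE_ONCE(q->quantum, new_quantum);    /* single, untorn store */
}

/* Dump path: may run without RTNL once the rest of the series lands. */
static int foo_dump(const struct foo_sched_data *q, struct sk_buff *skb)
{
        return nla_put_u32(skb, TCA_FQ_QUANTUM, READ_ONCE(q->quantum));
}

In the real diff, fields that need post-processing (ce_threshold, horizon) are snapshotted into locals with READ_ONCE() first and then converted with do_div(), so each shared field is read exactly once per dump.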