Message ID | 20240718084112.12202-1-boy.wu@mediatek.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v4] blk-cgroup: Replace u64 sync with spinlock for iostat | expand |
Hello, On Thu, Jul 18, 2024 at 04:41:12PM +0800, boy.wu wrote: > static void blkg_clear_stat(struct blkcg_gq *blkg) > { > int cpu; > > +#if BITS_PER_LONG == 32 > + guard(raw_spinlock_irqsave)(&blkg_stat_lock); > +#endif Can you please collect the ifdefs into a single place? If guard can be used for that, that's great. If not, just spin_lock/unlock wrappers are fine too, but please collect them into a single place and add a comment explaining why this is necessary and why u64_sync isn't being used. Also, for blkg_clear_stat(), we're running a slight chance of clearing of iostat_cpu racing against state updates from the hot path. Given the circumstances - stat reset is an cgroup1-only feature which isn't used widely and a problematic interface to begin with, I believe this is a valid choice but it needs to be noted. Thanks.
On Thu, 2024-07-18 at 11:15 -1000, Tejun Heo wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > Hello, > > On Thu, Jul 18, 2024 at 04:41:12PM +0800, boy.wu wrote: > > static void blkg_clear_stat(struct blkcg_gq *blkg) > > { > > int cpu; > > > > +#if BITS_PER_LONG == 32 > > +guard(raw_spinlock_irqsave)(&blkg_stat_lock); > > +#endif > > Can you please collect the ifdefs into a single place? If guard can > be used > for that, that's great. If not, just spin_lock/unlock wrappers are > fine too, > but please collect them into a single place and add a comment > explaining why > this is necessary and why u64_sync isn't being used. > If it is for readability, I think keeping using u64 sync instead of replacing it with spinlock is better, because what u64 sync protects is 64-bit data for 32-bit systems, while spinlock can be used for many other reasons. The root cause of this issue is just the incorrect use of u64 sync. Adding back the missing spinlock for the correct usage of u64 sync is simpler. Is there any benefit to replacing u64 sync with spinlock? > Also, for blkg_clear_stat(), we're running a slight chance of > clearing of > iostat_cpu racing against state updates from the hot path. Given the > circumstances - stat reset is an cgroup1-only feature which isn't > used > widely and a problematic interface to begin with, I believe this is a > valid > choice but it needs to be noted. I don't get this part, but if this is another issue, maybe another patch would be better? > > Thanks. > > -- > tejun -- boy.wu
Hello, Boy. On Fri, Jul 19, 2024 at 01:47:35AM +0000, Boy Wu (吳勃誼) wrote: ... > If it is for readability, I think keeping using u64 sync instead of > replacing it with spinlock is better, because what u64 sync protects is > 64-bit data for 32-bit systems, while spinlock can be used for many > other reasons. The root cause of this issue is just the incorrect use Yeah, but it can't be used when there are multiple updaters. > of u64 sync. Adding back the missing spinlock for the correct usage of > u64 sync is simpler. Is there any benefit to replacing u64 sync with > spinlock? It doesn't make sense to protect u64_sync with a spinlock. Both are only needed on 32bit. What's the point of having both? Also, note that iostat_cpu is also updated from two paths - bio issue and stat reset. If you want to keep that u64_sync, the only way to avoid possible deadlocks is adding spinlock in the bio issue path too, which will be pretty expensive. > > Also, for blkg_clear_stat(), we're running a slight chance of > > clearing of > > iostat_cpu racing against state updates from the hot path. Given the > > circumstances - stat reset is an cgroup1-only feature which isn't > > used > > widely and a problematic interface to begin with, I believe this is a > > valid > > choice but it needs to be noted. > > I don't get this part, but if this is another issue, maybe another > patch would be better? It's the same issue. Reset is another writer and whenever you have more than one writers w/ u64_sync, there's a chance of deadlocks. So, iostat_cpu also has two writers - bio issue and stat reset. If you want to keep using u64_sync in both places, the only way to do it is adding spinlock protection to both paths, which is an expensive thing to do for the bio issue path. So, here, we'd rather just give up and let stat resetting be racy on 32bit. Thanks.
On Fri, 2024-07-19 at 07:31 -1000, tj@kernel.org wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > Hello, Boy. > > On Fri, Jul 19, 2024 at 01:47:35AM +0000, Boy Wu (吳勃誼) wrote: > ... > > If it is for readability, I think keeping using u64 sync instead of > > replacing it with spinlock is better, because what u64 sync > protects is > > 64-bit data for 32-bit systems, while spinlock can be used for many > > other reasons. The root cause of this issue is just the incorrect > use > > Yeah, but it can't be used when there are multiple updaters. > > > of u64 sync. Adding back the missing spinlock for the correct usage > of > > u64 sync is simpler. Is there any benefit to replacing u64 sync > with > > spinlock? > > It doesn't make sense to protect u64_sync with a spinlock. Both are > only > needed on 32bit. What's the point of having both? Also, note that > iostat_cpu > is also updated from two paths - bio issue and stat reset. If you > want to > keep that u64_sync, the only way to avoid possible deadlocks is > adding > spinlock in the bio issue path too, which will be pretty expensive. > The use of a spinlock with u64 sync is suggested in include/linux/u64_stats_sync.h:33. * Usage : * * Stats producer (writer) should use following template granted it already got * an exclusive access to counters (a lock is already taken, or per cpu * data is used [in a non preemptable context]) * * spin_lock_bh(...) or other synchronization to get exclusive access * ... * u64_stats_update_begin(&stats->syncp); * u64_stats_add(&stats->bytes64, len); // non atomic operation * u64_stats_inc(&stats->packets64); // non atomic operation * u64_stats_update_end(&stats->syncp); Is this a incorrect statment? > > > Also, for blkg_clear_stat(), we're running a slight chance of > > > clearing of > > > iostat_cpu racing against state updates from the hot path. Given > the > > > circumstances - stat reset is an cgroup1-only feature which isn't > > > used > > > widely and a problematic interface to begin with, I believe this > is a > > > valid > > > choice but it needs to be noted. > > > > I don't get this part, but if this is another issue, maybe another > > patch would be better? > > It's the same issue. Reset is another writer and whenever you have > more than > one writers w/ u64_sync, there's a chance of deadlocks. So, > iostat_cpu also > has two writers - bio issue and stat reset. If you want to keep using > u64_sync in both places, the only way to do it is adding spinlock > protection > to both paths, which is an expensive thing to do for the bio issue > path. So, > here, we'd rather just give up and let stat resetting be racy on > 32bit. > > Thanks. > > -- > tejun There are three places update iostat and two places update iostat_cpu in blk-cgroup.c in version 6.10.1. I assume the suggestion in u64_stats_sync.h is correct. For readability, how about the following change? diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index 85b3b9051455..7472cfa9a506 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -62,6 +62,10 @@ static struct workqueue_struct *blkcg_punt_bio_wq; static DEFINE_RAW_SPINLOCK(blkg_stat_lock); +#if BITS_PER_LONG == 32 +static DEFINE_RAW_SPINLOCK(blkg_stat_cpu_lock); +#endif + #define BLKG_DESTROY_BATCH_SIZE 64 /* @@ -535,5 +539,55 @@ static void blkg_destroy_all(struct gendisk *disk) spin_unlock_irq(&q->queue_lock); } +static inline unsigned long blkcg_iostats_update_begin_irqsave(struct u64_stats_sync *syncp) +{ + unsigned long flags; + +#if BITS_PER_LONG == 64 + flags = u64_stats_update_begin_irqsave(syncp); +#else /* 64 bit */ + raw_spin_lock_irqsave(&blkg_stat_lock, flags); + u64_stats_update_begin(syncp); +#endif + + return flags; +} + +static inline void blkcg_iostats_update_end_irqrestore(struct u64_stats_sync *syncp, + unsigned long flags) +{ +#if BITS_PER_LONG == 64 + u64_stats_update_end_irqrestore(syncp, flags); +#else /* 64 bit */ + u64_stats_update_end(syncp); + raw_spin_unlock_irqrestore(&blkg_stat_lock, flags); +#endif +} + +static unsigned long blkcg_iostats_cpu_update_begin_irqsave(struct u64_stats_sync *syncp) +{ + unsigned long flags; + +#if BITS_PER_LONG == 64 + flags = u64_stats_update_begin_irqsave(syncp); +#else /* 64 bit */ + raw_spin_lock_irqsave(&blkg_stat_cpu_lock, flags); + u64_stats_update_begin(syncp); +#endif + + return flags; +} + +static inline void blkcg_iostats_cpu_update_end_irqrestore(struct u64_stats_sync *syncp, + unsigned long flags) +{ +#if BITS_PER_LONG == 64 + u64_stats_update_end_irqrestore(syncp, flags); +#else /* 64 bit */ + u64_stats_update_end(syncp); + raw_spin_unlock_irqrestore(&blkg_stat_cpu_lock, flags); +#endif +} + static void blkg_iostat_set(struct blkg_iostat *dst, struct blkg_iostat *src) { int i; @@ -632,24 +686,26 @@ static void blkg_iostat_set(struct blkg_iostat *dst, struct blkg_iostat *src) static void __blkg_clear_stat(struct blkg_iostat_set *bis) { struct blkg_iostat cur = {0}; - unsigned long flags; - flags = u64_stats_update_begin_irqsave(&bis->sync); blkg_iostat_set(&bis->cur, &cur); blkg_iostat_set(&bis->last, &cur); - u64_stats_update_end_irqrestore(&bis->sync, flags); } static void blkg_clear_stat(struct blkcg_gq *blkg) { int cpu; + unsigned long flags; for_each_possible_cpu(cpu) { struct blkg_iostat_set *s = per_cpu_ptr(blkg- >iostat_cpu, cpu); + flags = blkcg_iostats_cpu_update_begin_irqsave(&s- >sync); __blkg_clear_stat(s); + blkcg_iostats_cpu_update_end_irqrestore(&s->sync, flags); } + flags = blkcg_iostats_update_begin_irqsave(&blkg->iostat.sync); __blkg_clear_stat(&blkg->iostat); + blkcg_iostats_update_end_irqrestore(&blkg->iostat.sync, flags); } static int blkcg_reset_stats(struct cgroup_subsys_state *css, @@ -1134,9 +1190,9 @@ static void blkcg_fill_root_iostats(void) cpu_dkstats->sectors[STAT_DISCARD] << 9; } - flags = u64_stats_update_begin_irqsave(&blkg- >iostat.sync); + flags = blkcg_iostats_update_begin_irqsave(&blkg- >iostat.sync); blkg_iostat_set(&blkg->iostat.cur, &tmp); - u64_stats_update_end_irqrestore(&blkg->iostat.sync, flags); + blkcg_iostats_update_end_irqrestore(&blkg->iostat.sync, flags); } } @@ -2152,7 +2208,7 @@ void blk_cgroup_bio_start(struct bio *bio) cpu = get_cpu(); bis = per_cpu_ptr(bio->bi_blkg->iostat_cpu, cpu); - flags = u64_stats_update_begin_irqsave(&bis->sync); + flags = blkcg_iostats_cpu_update_begin_irqsave(&bis->sync); /* * If the bio is flagged with BIO_CGROUP_ACCT it means this is a split @@ -2175,7 +2231,7 @@ void blk_cgroup_bio_start(struct bio *bio) WRITE_ONCE(bis->lqueued, true); } - u64_stats_update_end_irqrestore(&bis->sync, flags); + blkcg_iostats_cpu_update_end_irqrestore(&bis->sync, flags); cgroup_rstat_updated(blkcg->css.cgroup, cpu); put_cpu(); } -- boy.wu
Hello, Boy. On Fri, Jul 26, 2024 at 03:43:27AM +0000, Boy Wu (吳勃誼) wrote: ... > The use of a spinlock with u64 sync is suggested in > include/linux/u64_stats_sync.h:33. > > * Usage : > * > * Stats producer (writer) should use following template granted it > already got > * an exclusive access to counters (a lock is already taken, or per cpu > * data is used [in a non preemptable context]) > * > * spin_lock_bh(...) or other synchronization to get exclusive access > * ... > * u64_stats_update_begin(&stats->syncp); > * u64_stats_add(&stats->bytes64, len); // non atomic operation > * u64_stats_inc(&stats->packets64); // non atomic operation > * u64_stats_update_end(&stats->syncp); > > Is this a incorrect statment? That's not incorrect and it'd make sense if we really want to use u64_sync - e.g. the reader is hot path. Here, just a spinlock would be simpler and do fine. Thanks.
On Tue, 2024-07-30 at 09:49 -1000, tj@kernel.org wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > Hello, Boy. > > On Fri, Jul 26, 2024 at 03:43:27AM +0000, Boy Wu (吳勃誼) wrote: > ... > > The use of a spinlock with u64 sync is suggested in > > include/linux/u64_stats_sync.h:33. > > > > * Usage : > > * > > * Stats producer (writer) should use following template granted it > > already got > > * an exclusive access to counters (a lock is already taken, or per > cpu > > * data is used [in a non preemptable context]) > > * > > * spin_lock_bh(...) or other synchronization to get exclusive > access > > * ... > > * u64_stats_update_begin(&stats->syncp); > > * u64_stats_add(&stats->bytes64, len); // non atomic operation > > * u64_stats_inc(&stats->packets64); // non atomic operation > > * u64_stats_update_end(&stats->syncp); > > > > Is this a incorrect statment? > > That's not incorrect and it'd make sense if we really want to use > u64_sync - > e.g. the reader is hot path. Here, just a spinlock would be simpler > and do > fine. > > Thanks. > > -- > tejun u64_sync with spin lock has the benefit of locking only when writing iostat, but replacing u64_sync with spin lock will lock not only when writing iostat but also when reading iostat. Does it have enough benefit to replace u64_sync and add the cost of locking when reading iostat? -- Boy.Wu
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index 37e6cc91d576..faa604c6fab9 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -329,7 +329,6 @@ static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, struct gendisk *disk, INIT_WORK(&blkg->async_bio_work, blkg_async_bio_workfn); #endif - u64_stats_init(&blkg->iostat.sync); for_each_possible_cpu(cpu) { u64_stats_init(&per_cpu_ptr(blkg->iostat_cpu, cpu)->sync); per_cpu_ptr(blkg->iostat_cpu, cpu)->blkg = blkg; @@ -632,18 +631,18 @@ static void blkg_iostat_set(struct blkg_iostat *dst, struct blkg_iostat *src) static void __blkg_clear_stat(struct blkg_iostat_set *bis) { struct blkg_iostat cur = {0}; - unsigned long flags; - flags = u64_stats_update_begin_irqsave(&bis->sync); blkg_iostat_set(&bis->cur, &cur); blkg_iostat_set(&bis->last, &cur); - u64_stats_update_end_irqrestore(&bis->sync, flags); } static void blkg_clear_stat(struct blkcg_gq *blkg) { int cpu; +#if BITS_PER_LONG == 32 + guard(raw_spinlock_irqsave)(&blkg_stat_lock); +#endif for_each_possible_cpu(cpu) { struct blkg_iostat_set *s = per_cpu_ptr(blkg->iostat_cpu, cpu); @@ -995,15 +994,12 @@ static void blkcg_iostat_update(struct blkcg_gq *blkg, struct blkg_iostat *cur, struct blkg_iostat *last) { struct blkg_iostat delta; - unsigned long flags; /* propagate percpu delta to global */ - flags = u64_stats_update_begin_irqsave(&blkg->iostat.sync); blkg_iostat_set(&delta, cur); blkg_iostat_sub(&delta, last); blkg_iostat_add(&blkg->iostat.cur, &delta); blkg_iostat_add(last, &delta); - u64_stats_update_end_irqrestore(&blkg->iostat.sync, flags); } static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu) @@ -1034,7 +1030,6 @@ static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu) struct blkcg_gq *blkg = bisc->blkg; struct blkcg_gq *parent = blkg->parent; struct blkg_iostat cur; - unsigned int seq; /* * Order assignment of `next_bisc` from `bisc->lnode.next` in @@ -1051,10 +1046,7 @@ static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu) goto propagate_up; /* propagate up to parent only */ /* fetch the current per-cpu values */ - do { - seq = u64_stats_fetch_begin(&bisc->sync); - blkg_iostat_set(&cur, &bisc->cur); - } while (u64_stats_fetch_retry(&bisc->sync, seq)); + blkg_iostat_set(&cur, &bisc->cur); blkcg_iostat_update(blkg, &cur, &bisc->last); @@ -1112,7 +1104,6 @@ static void blkcg_fill_root_iostats(void) struct blkcg_gq *blkg = bdev->bd_disk->queue->root_blkg; struct blkg_iostat tmp; int cpu; - unsigned long flags; memset(&tmp, 0, sizeof(tmp)); for_each_possible_cpu(cpu) { @@ -1133,10 +1124,10 @@ static void blkcg_fill_root_iostats(void) tmp.bytes[BLKG_IOSTAT_DISCARD] += cpu_dkstats->sectors[STAT_DISCARD] << 9; } - - flags = u64_stats_update_begin_irqsave(&blkg->iostat.sync); +#if BITS_PER_LONG == 32 + guard(raw_spinlock_irqsave)(&blkg_stat_lock); +#endif blkg_iostat_set(&blkg->iostat.cur, &tmp); - u64_stats_update_end_irqrestore(&blkg->iostat.sync, flags); } } @@ -1145,7 +1136,6 @@ static void blkcg_print_one_stat(struct blkcg_gq *blkg, struct seq_file *s) struct blkg_iostat_set *bis = &blkg->iostat; u64 rbytes, wbytes, rios, wios, dbytes, dios; const char *dname; - unsigned seq; int i; if (!blkg->online) @@ -1157,16 +1147,18 @@ static void blkcg_print_one_stat(struct blkcg_gq *blkg, struct seq_file *s) seq_printf(s, "%s ", dname); - do { - seq = u64_stats_fetch_begin(&bis->sync); - +#if BITS_PER_LONG == 32 + scoped_guard(raw_spinlock_irqsave, &blkg_stat_lock) { +#endif rbytes = bis->cur.bytes[BLKG_IOSTAT_READ]; wbytes = bis->cur.bytes[BLKG_IOSTAT_WRITE]; dbytes = bis->cur.bytes[BLKG_IOSTAT_DISCARD]; rios = bis->cur.ios[BLKG_IOSTAT_READ]; wios = bis->cur.ios[BLKG_IOSTAT_WRITE]; dios = bis->cur.ios[BLKG_IOSTAT_DISCARD]; - } while (u64_stats_fetch_retry(&bis->sync, seq)); +#if BITS_PER_LONG == 32 + } +#endif if (rbytes || wbytes || rios || wios) { seq_printf(s, "rbytes=%llu wbytes=%llu rios=%llu wios=%llu dbytes=%llu dios=%llu",