Message ID | 20250128075250.11500-1-sooraj20636@gmail.com (mailing list archive) |
---|---|
State | New |
Series | mm/bdi: fix race between cgwb_create and conflicting blkcg associations |
On Tue, 28 Jan 2025 02:52:50 -0500 sooraj <sooraj20636@gmail.com> wrote:

> Ensure cgwb (cgroup writeback) structures are uniquely associated with a
> memcg-blkcg pair to prevent inconsistencies when concurrent cgwb_create
> calls race. This resolves a scenario where two threads creating cgwbs
> for the same memory cgroup (memcg) but different I/O control groups (blkcg)
> could insert conflicting entries.
>
> The fix rechecks for existing cgwbs under the cgwb_lock spinlock after
> initial creation. If a conflicting cgwb (same memcg, different blkcg) is
> found, it is killed before inserting the new entry. This guarantees a
> 1:1 relationship between memcg-blkcg pairs and their cgwbs, preserving
> system invariants.

Thanks. This looks sensible, but it would be best to bring it to Tejun's attention.

I assume that this race has been observed in the real world? If so, please fully describe the circumstances under which it occurred, and describe the userspace-visible effects.

Probably a "Cc: <stable@vger.kernel.org>" is appropriate. And it looks like the offending code is so old that a Fixes: won't be needed.
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -723,24 +723,39 @@ static int cgwb_create(struct backing_dev_info *bdi,
>  	spin_lock_irqsave(&cgwb_lock, flags);
>  	if (test_bit(WB_registered, &bdi->wb.state) &&
>  	    blkcg_cgwb_list->next && memcg_cgwb_list->next) {
> -		/* we might have raced another instance of this function */
> -		ret = radix_tree_insert(&bdi->cgwb_tree, memcg_css->id, wb);
> -		if (!ret) {
> -			list_add_tail_rcu(&wb->bdi_node, &bdi->wb_list);
> -			list_add(&wb->memcg_node, memcg_cgwb_list);
> -			list_add(&wb->blkcg_node, blkcg_cgwb_list);
> -			blkcg_pin_online(blkcg_css);
> -			css_get(memcg_css);
> -			css_get(blkcg_css);
> +		/* Re-check under lock to handle races */
> +		struct bdi_writeback *existing;
> +
> +		existing = radix_tree_lookup(&bdi->cgwb_tree, memcg_css->id);
> +		if (existing) {
> +			if (existing->blkcg_css != blkcg_css) {
> +				cgwb_kill(existing);
> +				existing = NULL;
> +			} else {
> +				ret = 0; /* Already exists, treat as success */
> +			}
> +		}
> +
> +		if (!existing) {
> +			ret = radix_tree_insert(&bdi->cgwb_tree, memcg_css->id, wb);
> +			if (!ret) {
> +				list_add_tail_rcu(&wb->bdi_node, &bdi->wb_list);
> +				list_add(&wb->memcg_node, memcg_cgwb_list);
> +				list_add(&wb->blkcg_node, blkcg_cgwb_list);
> +				blkcg_pin_online(blkcg_css);
> +				css_get(memcg_css);
> +				css_get(blkcg_css);
> +			}
>  		}
>  	}
>  	spin_unlock_irqrestore(&cgwb_lock, flags);
> -	if (ret) {
> -		if (ret == -EEXIST)
> -			ret = 0;
> +
> +	if (!ret)
> +		goto out_put;
> +	if (ret == -EEXIST)
> +		ret = 0; /* Lost race, another thread created the same wb */
> +	else
>  		goto err_fprop_exit;
> -	}
> -	goto out_put;
>
>  err_fprop_exit:
>  	bdi_put(bdi);
Hello,

On Mon, Jan 27, 2025 at 04:53:11PM -0800, Andrew Morton wrote:
> On Tue, 28 Jan 2025 02:52:50 -0500 sooraj <sooraj20636@gmail.com> wrote:
>
> > Ensure cgwb (cgroup writeback) structures are uniquely associated with a
> > memcg-blkcg pair to prevent inconsistencies when concurrent cgwb_create
> > calls race. This resolves a scenario where two threads creating cgwbs
> > for the same memory cgroup (memcg) but different I/O control groups (blkcg)
> > could insert conflicting entries.
> >
> > The fix rechecks for existing cgwbs under the cgwb_lock spinlock after
> > initial creation. If a conflicting cgwb (same memcg, different blkcg) is
> > found, it is killed before inserting the new entry. This guarantees a
> > 1:1 relationship between memcg-blkcg pairs and their cgwbs, preserving
> > system invariants.

I'm a bit confused. Radix tree doesn't allow two entries to be inserted on the same key and the tree is keyed by memcg_id. Wouldn't that automatically guarantee 1:1 relationship?

Thanks.
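The point about keyed insertion can be illustrated with a small userspace sketch. This is not kernel code: `toy_tree`, `toy_insert`, `toy_lookup` and `toy_race_demo` are hypothetical names invented for illustration, and the fixed-size table only mimics the part of the radix_tree_insert() contract that matters here, namely returning 0 on success and -EEXIST when the key is already occupied. The two `int` objects stand in for two cgwbs created for the same memcg id.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_KEYS 16

/* Hypothetical stand-in for a keyed tree: at most one item per key. */
struct toy_tree {
	const void *slots[TOY_MAX_KEYS];
};

/*
 * Mimics the insert contract discussed in the thread:
 * 0 on success, -EEXIST if the key already holds an item.
 */
static int toy_insert(struct toy_tree *tree, unsigned int key, const void *item)
{
	if (key >= TOY_MAX_KEYS || item == NULL)
		return -EINVAL;
	if (tree->slots[key] != NULL)
		return -EEXIST;
	tree->slots[key] = item;
	return 0;
}

/* Returns the item stored at key, or NULL if the slot is empty. */
static const void *toy_lookup(const struct toy_tree *tree, unsigned int key)
{
	return key < TOY_MAX_KEYS ? tree->slots[key] : NULL;
}

/*
 * Two "creators" try to insert different objects under the same key
 * (standing in for the same memcg id). Returns the second insert's result.
 */
static int toy_race_demo(void)
{
	struct toy_tree tree = { { NULL } };
	int wb_a = 0, wb_b = 0;		/* placeholders for two cgwbs */
	const unsigned int memcg_id = 3;	/* arbitrary example key */
	int first = toy_insert(&tree, memcg_id, &wb_a);
	int second = toy_insert(&tree, memcg_id, &wb_b);

	assert(first == 0);
	/* The first entry is still the one stored under the key. */
	assert(toy_lookup(&tree, memcg_id) == &wb_a);
	return second;
}
```

In this sketch the second insert is rejected with -EEXIST and the first entry stays in place, which is the behaviour the question above relies on: a structure keyed by memcg id cannot hold two entries for the same memcg, regardless of which blkcg each entry refers to.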
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index e61bbb1bd622..67acb565e9a7 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -723,24 +723,39 @@ static int cgwb_create(struct backing_dev_info *bdi,
 	spin_lock_irqsave(&cgwb_lock, flags);
 	if (test_bit(WB_registered, &bdi->wb.state) &&
 	    blkcg_cgwb_list->next && memcg_cgwb_list->next) {
-		/* we might have raced another instance of this function */
-		ret = radix_tree_insert(&bdi->cgwb_tree, memcg_css->id, wb);
-		if (!ret) {
-			list_add_tail_rcu(&wb->bdi_node, &bdi->wb_list);
-			list_add(&wb->memcg_node, memcg_cgwb_list);
-			list_add(&wb->blkcg_node, blkcg_cgwb_list);
-			blkcg_pin_online(blkcg_css);
-			css_get(memcg_css);
-			css_get(blkcg_css);
+		/* Re-check under lock to handle races */
+		struct bdi_writeback *existing;
+
+		existing = radix_tree_lookup(&bdi->cgwb_tree, memcg_css->id);
+		if (existing) {
+			if (existing->blkcg_css != blkcg_css) {
+				cgwb_kill(existing);
+				existing = NULL;
+			} else {
+				ret = 0; /* Already exists, treat as success */
+			}
+		}
+
+		if (!existing) {
+			ret = radix_tree_insert(&bdi->cgwb_tree, memcg_css->id, wb);
+			if (!ret) {
+				list_add_tail_rcu(&wb->bdi_node, &bdi->wb_list);
+				list_add(&wb->memcg_node, memcg_cgwb_list);
+				list_add(&wb->blkcg_node, blkcg_cgwb_list);
+				blkcg_pin_online(blkcg_css);
+				css_get(memcg_css);
+				css_get(blkcg_css);
+			}
 		}
 	}
 	spin_unlock_irqrestore(&cgwb_lock, flags);
-	if (ret) {
-		if (ret == -EEXIST)
-			ret = 0;
+
+	if (!ret)
+		goto out_put;
+	if (ret == -EEXIST)
+		ret = 0; /* Lost race, another thread created the same wb */
+	else
 		goto err_fprop_exit;
-	}
-	goto out_put;

 err_fprop_exit:
 	bdi_put(bdi);
Ensure cgwb (cgroup writeback) structures are uniquely associated with a
memcg-blkcg pair to prevent inconsistencies when concurrent cgwb_create
calls race. This resolves a scenario where two threads creating cgwbs
for the same memory cgroup (memcg) but different I/O control groups (blkcg)
could insert conflicting entries.

The fix rechecks for existing cgwbs under the cgwb_lock spinlock after
initial creation. If a conflicting cgwb (same memcg, different blkcg) is
found, it is killed before inserting the new entry. This guarantees a
1:1 relationship between memcg-blkcg pairs and their cgwbs, preserving
system invariants.

Signed-off-by: sooraj <sooraj20636@gmail.com>
---
 mm/backing-dev.c | 43 +++++++++++++++++++++++++++++--------------
 1 file changed, 29 insertions(+), 14 deletions(-)