mm/bdi: fix race between cgwb_create and conflicting blkcg associations

Message ID 20250128075250.11500-1-sooraj20636@gmail.com
State New
Series mm/bdi: fix race between cgwb_create and conflicting blkcg associations

Commit Message

sooraj Jan. 28, 2025, 7:52 a.m. UTC
Ensure cgwb (cgroup writeback) structures are uniquely associated with a
memcg-blkcg pair to prevent inconsistencies when concurrent cgwb_create
calls race. This resolves a scenario where two threads creating cgwbs
for the same memory cgroup (memcg) but different I/O control groups (blkcg)
could insert conflicting entries.

The fix rechecks for existing cgwbs under the cgwb_lock spinlock after
initial creation. If a conflicting cgwb (same memcg, different blkcg) is
found, it is killed before inserting the new entry. This guarantees a
1:1 relationship between memcg-blkcg pairs and their cgwbs, preserving
system invariants.

Signed-off-by: sooraj <sooraj20636@gmail.com>
---
 mm/backing-dev.c | 43 +++++++++++++++++++++++++++++--------------
 1 file changed, 29 insertions(+), 14 deletions(-)
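The recheck-under-lock step described in the commit message can be modeled in userspace. This is a minimal sketch under stated assumptions, not kernel code: every name in it (model_wb, table, table_lock, model_cgwb_insert, MAX_MEMCG_IDS) is hypothetical, a plain array keyed by memcg id stands in for bdi->cgwb_tree, and a pthread mutex stands in for the cgwb_lock spinlock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's data structures. */
#define MAX_MEMCG_IDS 64

struct model_wb {
	int memcg_id;
	int blkcg_id;
	bool killed;		/* set once removed from the table */
};

/* Stand-ins for bdi->cgwb_tree and cgwb_lock. */
static struct model_wb *table[MAX_MEMCG_IDS];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Insert @wb, rechecking under the lock: an entry for the same memcg but
 * a different blkcg is killed and replaced; an entry with the same pair
 * is kept, and -EEXIST tells the caller that @wb was not consumed.
 */
static int model_cgwb_insert(struct model_wb *wb)
{
	struct model_wb *existing;
	int ret = 0;

	assert(wb->memcg_id >= 0 && wb->memcg_id < MAX_MEMCG_IDS);

	pthread_mutex_lock(&table_lock);
	existing = table[wb->memcg_id];
	if (existing && existing->blkcg_id != wb->blkcg_id) {
		/* stale association: drop it, standing in for cgwb_kill() */
		existing->killed = true;
		table[wb->memcg_id] = NULL;
		existing = NULL;
	}
	if (existing)
		ret = -EEXIST;	/* same memcg-blkcg pair already present */
	else
		table[wb->memcg_id] = wb;
	pthread_mutex_unlock(&table_lock);
	return ret;
}
```

Reporting the duplicate case as -EEXIST rather than 0 is a modeling choice: it lets the caller know its freshly allocated object was not inserted and must be released.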

Comments

Andrew Morton Jan. 28, 2025, 12:53 a.m. UTC | #1
On Tue, 28 Jan 2025 02:52:50 -0500 sooraj <sooraj20636@gmail.com> wrote:

> Ensure cgwb (cgroup writeback) structures are uniquely associated with a
> memcg-blkcg pair to prevent inconsistencies when concurrent cgwb_create
> calls race. This resolves a scenario where two threads creating cgwbs
> for the same memory cgroup (memcg) but different I/O control groups (blkcg)
> could insert conflicting entries.
> 
> The fix rechecks for existing cgwbs under the cgwb_lock spinlock after
> initial creation. If a conflicting cgwb (same memcg, different blkcg) is
> found, it is killed before inserting the new entry. This guarantees a
> 1:1 relationship between memcg-blkcg pairs and their cgwbs, preserving
> system invariants.

Thanks.

This looks sensible, but it would be best to bring it to Tejun's attention.

I assume that this race has been observed in the real world?  If so,
please fully describe the circumstances under which it occurred, and
describe the userspace-visible effects.

Probably a "Cc: <stable@vger.kernel.org>" is appropriate.  And it looks
like the offending code is so old that a Fixes: won't be needed.

> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -723,24 +723,39 @@ static int cgwb_create(struct backing_dev_info *bdi,
>  	spin_lock_irqsave(&cgwb_lock, flags);
>  	if (test_bit(WB_registered, &bdi->wb.state) &&
>  	    blkcg_cgwb_list->next && memcg_cgwb_list->next) {
> -		/* we might have raced another instance of this function */
> -		ret = radix_tree_insert(&bdi->cgwb_tree, memcg_css->id, wb);
> -		if (!ret) {
> -			list_add_tail_rcu(&wb->bdi_node, &bdi->wb_list);
> -			list_add(&wb->memcg_node, memcg_cgwb_list);
> -			list_add(&wb->blkcg_node, blkcg_cgwb_list);
> -			blkcg_pin_online(blkcg_css);
> -			css_get(memcg_css);
> -			css_get(blkcg_css);
> +		/* Re-check under lock to handle races */
> +		struct bdi_writeback *existing;
> +
> +		existing = radix_tree_lookup(&bdi->cgwb_tree, memcg_css->id);
> +		if (existing) {
> +			if (existing->blkcg_css != blkcg_css) {
> +				cgwb_kill(existing);
> +				existing = NULL;
> +			} else {
> +				ret = 0; /* Already exists, treat as success */
> +			}
> +		}
> +
> +		if (!existing) {
> +			ret = radix_tree_insert(&bdi->cgwb_tree, memcg_css->id, wb);
> +			if (!ret) {
> +				list_add_tail_rcu(&wb->bdi_node, &bdi->wb_list);
> +				list_add(&wb->memcg_node, memcg_cgwb_list);
> +				list_add(&wb->blkcg_node, blkcg_cgwb_list);
> +				blkcg_pin_online(blkcg_css);
> +				css_get(memcg_css);
> +				css_get(blkcg_css);
> +			}
>  		}
>  	}
>  	spin_unlock_irqrestore(&cgwb_lock, flags);
> -	if (ret) {
> -		if (ret == -EEXIST)
> -			ret = 0;
> +
> +	if (!ret)
> +		goto out_put;
> +	if (ret == -EEXIST)
> +		ret = 0; /* Lost race, another thread created the same wb */
> +	else
>  		goto err_fprop_exit;
> -	}
> -	goto out_put;
>  
>  err_fprop_exit:
>  	bdi_put(bdi);

Tejun Heo Jan. 28, 2025, 8:48 p.m. UTC | #2
Hello,

On Mon, Jan 27, 2025 at 04:53:11PM -0800, Andrew Morton wrote:
> On Tue, 28 Jan 2025 02:52:50 -0500 sooraj <sooraj20636@gmail.com> wrote:
> 
> > Ensure cgwb (cgroup writeback) structures are uniquely associated with a
> > memcg-blkcg pair to prevent inconsistencies when concurrent cgwb_create
> > calls race. This resolves a scenario where two threads creating cgwbs
> > for the same memory cgroup (memcg) but different I/O control groups (blkcg)
> > could insert conflicting entries.
> > 
> > The fix rechecks for existing cgwbs under the cgwb_lock spinlock after
> > initial creation. If a conflicting cgwb (same memcg, different blkcg) is
> > found, it is killed before inserting the new entry. This guarantees a
> > 1:1 relationship between memcg-blkcg pairs and their cgwbs, preserving
> > system invariants.

I'm a bit confused. Radix tree doesn't allow two entries to be inserted on
the same key and the tree is keyed by memcg_id. Wouldn't that automatically
guarantee 1:1 relationship?

Thanks.
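Tejun's point can be illustrated with a small userspace model: if the lookup structure is keyed only by memcg id and insertion refuses an occupied key (as radix_tree_insert() does by returning -EEXIST), then two racing inserts for the same memcg can never both succeed, whatever their blkcg. This is a sketch under assumptions; the names entry, slots, slots_lock, keyed_insert, race_arg, racer and race_two_inserts are hypothetical, and a pthread mutex stands in for cgwb_lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model: slots[] stands in for bdi->cgwb_tree. */
#define NR_KEYS 64

struct entry {
	int memcg_id;		/* the only key */
	int blkcg_id;		/* payload; not part of the key */
};

static struct entry *slots[NR_KEYS];
static pthread_mutex_t slots_lock = PTHREAD_MUTEX_INITIALIZER;

/* Insert keyed by memcg id; an occupied key is rejected, never overwritten. */
static int keyed_insert(struct entry *e)
{
	int ret = 0;

	assert(e->memcg_id >= 0 && e->memcg_id < NR_KEYS);
	pthread_mutex_lock(&slots_lock);
	if (slots[e->memcg_id])
		ret = -EEXIST;
	else
		slots[e->memcg_id] = e;
	pthread_mutex_unlock(&slots_lock);
	return ret;
}

struct race_arg {
	struct entry e;
	int result;		/* return value of keyed_insert() */
};

/* Thread body: attempt one insert and record the outcome. */
static void *racer(void *p)
{
	struct race_arg *arg = p;

	arg->result = keyed_insert(&arg->e);
	return NULL;
}

/* Run two inserts concurrently; returns -1 if a thread cannot be created. */
static int race_two_inserts(struct race_arg *a, struct race_arg *b)
{
	pthread_t ta, tb;

	if (pthread_create(&ta, NULL, racer, a) != 0)
		return -1;
	if (pthread_create(&tb, NULL, racer, b) != 0) {
		pthread_join(ta, NULL);
		return -1;
	}
	pthread_join(ta, NULL);
	pthread_join(tb, NULL);
	return 0;
}
```

Racing two entries with the same memcg_id and different blkcg_id values always leaves exactly one winner; the loser sees -EEXIST, which is the case the existing code already maps to success.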

Patch

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index e61bbb1bd622..67acb565e9a7 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -723,24 +723,39 @@ static int cgwb_create(struct backing_dev_info *bdi,
 	spin_lock_irqsave(&cgwb_lock, flags);
 	if (test_bit(WB_registered, &bdi->wb.state) &&
 	    blkcg_cgwb_list->next && memcg_cgwb_list->next) {
-		/* we might have raced another instance of this function */
-		ret = radix_tree_insert(&bdi->cgwb_tree, memcg_css->id, wb);
-		if (!ret) {
-			list_add_tail_rcu(&wb->bdi_node, &bdi->wb_list);
-			list_add(&wb->memcg_node, memcg_cgwb_list);
-			list_add(&wb->blkcg_node, blkcg_cgwb_list);
-			blkcg_pin_online(blkcg_css);
-			css_get(memcg_css);
-			css_get(blkcg_css);
+		/* Re-check under lock to handle races */
+		struct bdi_writeback *existing;
+
+		existing = radix_tree_lookup(&bdi->cgwb_tree, memcg_css->id);
+		if (existing) {
+			if (existing->blkcg_css != blkcg_css) {
+				cgwb_kill(existing);
+				existing = NULL;
+			} else {
+				ret = 0; /* Already exists, treat as success */
+			}
+		}
+
+		if (!existing) {
+			ret = radix_tree_insert(&bdi->cgwb_tree, memcg_css->id, wb);
+			if (!ret) {
+				list_add_tail_rcu(&wb->bdi_node, &bdi->wb_list);
+				list_add(&wb->memcg_node, memcg_cgwb_list);
+				list_add(&wb->blkcg_node, blkcg_cgwb_list);
+				blkcg_pin_online(blkcg_css);
+				css_get(memcg_css);
+				css_get(blkcg_css);
+			}
 		}
 	}
 	spin_unlock_irqrestore(&cgwb_lock, flags);
-	if (ret) {
-		if (ret == -EEXIST)
-			ret = 0;
+
+	if (!ret)
+		goto out_put;
+	if (ret == -EEXIST)
+		ret = 0; /* Lost race, another thread created the same wb */
+	else
 		goto err_fprop_exit;
-	}
-	goto out_put;
 
 err_fprop_exit:
 	bdi_put(bdi);
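The tail of the hunk reorders the goto logic after spin_unlock_irqrestore(). The sketch below reduces the old and new versions to their control flow so they can be compared for each return code. The function names are hypothetical, and it assumes that the err_fprop_exit unwinding (which releases the unused wb) eventually falls through to out_put, which the hunk's context lines do not show.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the end of cgwb_create(). *unwound records
 * whether control reached err_fprop_exit; the return value is what the
 * function would return. Only the shape of the gotos is modeled.
 */
static int tail_before_patch(int ret, bool *unwound)
{
	*unwound = false;
	if (ret) {
		if (ret == -EEXIST)
			ret = 0;
		goto err_fprop_exit;
	}
	goto out_put;

err_fprop_exit:
	*unwound = true;	/* bdi_put(bdi) and the rest of the unwinding */
out_put:
	return ret;		/* css_put(blkcg_css) in the real function */
}

static int tail_after_patch(int ret, bool *unwound)
{
	*unwound = false;
	if (!ret)
		goto out_put;
	if (ret == -EEXIST)
		ret = 0;	/* lost race: fall through and release our wb */
	else
		goto err_fprop_exit;

err_fprop_exit:
	*unwound = true;
out_put:
	return ret;
}
```

Under this model both versions return the same value and take the same unwinding decision for 0, -EEXIST and any other error, so the restructuring by itself does not change behavior. The model also shows that any path that leaves ret at 0 without inserting wb, such as the new "Already exists" branch, goes straight to out_put and skips the unwinding; whether that matters depends on the parts of cgwb_create() outside this hunk.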