diff mbox series

[v5] btrfs: Turn delayed_nodes_tree into an XArray

Message ID 20220419155721.6702-1-gniebler@suse.com (mailing list archive)
State New, archived
Headers show
Series [v5] btrfs: Turn delayed_nodes_tree into an XArray | expand

Commit Message

Gabriel Niebler April 19, 2022, 3:57 p.m. UTC
… in the btrfs_root struct and adjust all usages of this object to use the
XArray API, because it is notionally easier to use and unserstand, as it
provides array semantics, and also takes care of locking for us, further
simplifying the code.

Also use the opportunity to do some light refactoring.

Signed-off-by: Gabriel Niebler <gniebler@suse.com>
---

Changes from v4:
 - Fixed breaking syntax error

Changes from v3:
 - Replaced goto-label construct with do-while loop (Nikolay)
 - Replaced `break` with equivalent `return` for better understandability (Nikolay)
 - Made use of `delayed_nodes` array more efficient (Nikolay)

Changes from v2:
 - Fixed uninitialised index variable (Nikolay)
 - Fixed missing storage of node in array (Nikolay)
 - Improved commit message to motivate patch (David)

Changes from v1:
 - Reworked patch set into single patch (David)
 - New member name `delayed_nodes` is independent of data strutcture used (David)
 - Shortened commit message and made it start with 'btrfs:' (David)

---
 fs/btrfs/ctree.h         |  4 +-
 fs/btrfs/delayed-inode.c | 79 +++++++++++++++++++---------------------
 fs/btrfs/disk-io.c       |  2 +-
 fs/btrfs/inode.c         |  2 +-
 4 files changed, 42 insertions(+), 45 deletions(-)

Comments

Roman Mamedov April 19, 2022, 4:05 p.m. UTC | #1
On Tue, 19 Apr 2022 17:57:21 +0200
Gabriel Niebler <gniebler@suse.com> wrote:

> -	btrfs_init_delayed_node(node, root, ino);
> +        do {

The "do" line appears one character off to the right for me, and upon
examination that's because it uses spaces for indentation instead of TABs like
all other lines. Did not check if this is the only place.

There is "checkpatch.pl" which would catch issues like this:
https://www.kernel.org/doc/html/latest/dev-tools/checkpatch.html
David Sterba April 19, 2022, 8:23 p.m. UTC | #2
On Tue, Apr 19, 2022 at 09:05:51PM +0500, Roman Mamedov wrote:
> On Tue, 19 Apr 2022 17:57:21 +0200
> Gabriel Niebler <gniebler@suse.com> wrote:
> 
> > -	btrfs_init_delayed_node(node, root, ino);
> > +        do {
> 
> The "do" line appears one character off to the right for me, and upon
> examination that's because it uses spaces for indentation instead of TABs like
> all other lines. Did not check if this is the only place.

Thanks for catching it, git am did not complain about that so I checked
it manually, there was one more place, now fixed in my local copy.
Nikolay Borisov April 20, 2022, 7:01 a.m. UTC | #3
On 19.04.22 г. 23:23 ч., David Sterba wrote:
> On Tue, Apr 19, 2022 at 09:05:51PM +0500, Roman Mamedov wrote:
>> On Tue, 19 Apr 2022 17:57:21 +0200
>> Gabriel Niebler <gniebler@suse.com> wrote:
>>
>>> -	btrfs_init_delayed_node(node, root, ino);
>>> +        do {
>>
>> The "do" line appears one character off to the right for me, and upon
>> examination that's because it uses spaces for indentation instead of TABs like
>> all other lines. Did not check if this is the only place.
> 
> Thanks for catching it, git am did not complain about that so I checked
> it manually, there was one more place, now fixed in my local copy.
> 


checkpatch actually found some other stuff as well:

ERROR:CODE_INDENT: code indent should use tabs where possible
#65: FILE: fs/btrfs/delayed-inode.c:131:
+        do {$

WARNING:LEADING_SPACE: please, no spaces at the start of a line
#65: FILE: fs/btrfs/delayed-inode.c:131:
+        do {$

ERROR:CODE_INDENT: code indent should use tabs where possible
#100: FILE: fs/btrfs/delayed-inode.c:151:
+                }$

WARNING:LEADING_SPACE: please, no spaces at the start of a line
#100: FILE: fs/btrfs/delayed-inode.c:151:
+                }$

ERROR:CODE_INDENT: code indent should use tabs where possible
#154: FILE: fs/btrfs/delayed-inode.c:1888:
+                        }$

WARNING:LEADING_SPACE: please, no spaces at the start of a line
#154: FILE: fs/btrfs/delayed-inode.c:1888:
+                        }$
Nikolay Borisov April 20, 2022, 7:09 a.m. UTC | #4
On 19.04.22 г. 18:57 ч., Gabriel Niebler wrote:
> … in the btrfs_root struct and adjust all usages of this object to use the
> XArray API, because it is notionally easier to use and unserstand, as it
> provides array semantics, and also takes care of locking for us, further
> simplifying the code.
> 
> Also use the opportunity to do some light refactoring.
> 
> Signed-off-by: Gabriel Niebler <gniebler@suse.com>
> ---
> 

<snip>

> @@ -1870,29 +1863,33 @@ void btrfs_kill_delayed_inode_items(struct btrfs_inode *inode)
>   
>   void btrfs_kill_all_delayed_nodes(struct btrfs_root *root)
>   {
> -	u64 inode_id = 0;
> +	unsigned long index = 0;
> +	struct btrfs_delayed_node *delayed_node;
>   	struct btrfs_delayed_node *delayed_nodes[8];

Actually in order for the new code to be correct this array needs to be 
zero-initialized upon every iteration of the while() loop. That's 
because as it stands ATM delayed_nodes would retain the value from the 
previous iteration. What this could lead to is on the last iteration we 
potentially have 8 nodes in delayed_nodes which have been freed, so if 
in the last iteration of the xa_for_each_start we copy only 3 nodes the 
array would still have 8 nodes in total - 3 from the  last iteration and 
5 stale from the previous since they haven't been cleared out. So the 
final for() loop which does the freeing would incur a double free on the 
5 already-freed entries.

To fix this you either need to clear delayed_nodes[i] in the final for 
loop, so simply define it inside the while() like so:

struct btrfs_delayed_node *delayed_nodes[8] = {};

That would ensure that on every iteration of the while() loop only those 
entries which have been poopulated by the inner xa_for_each are non-null 
when the final for loop executes.

>   	int i, n;
>   
>   	while (1) {
>   		spin_lock(&root->inode_lock);
> -		n = radix_tree_gang_lookup(&root->delayed_nodes_tree,
> -					   (void **)delayed_nodes, inode_id,
> -					   ARRAY_SIZE(delayed_nodes));
> -		if (!n) {
> +		if (xa_empty(&root->delayed_nodes)) {
>   			spin_unlock(&root->inode_lock);
> -			break;
> +			return;
>   		}
>   
> -		inode_id = delayed_nodes[n - 1]->inode_id + 1;
> -		for (i = 0; i < n; i++) {
> +		n = 0;
> +		xa_for_each_start(&root->delayed_nodes, index,
> +				  delayed_node, index) {
>   			/*
>   			 * Don't increase refs in case the node is dead and
>   			 * about to be removed from the tree in the loop below
>   			 */
> -			if (!refcount_inc_not_zero(&delayed_nodes[i]->refs))
> -				delayed_nodes[i] = NULL;
> +			if (refcount_inc_not_zero(&delayed_node->refs)) {
> +				delayed_nodes[n] = delayed_node;
> +				n++;
> +                        }
> +			if (n >= ARRAY_SIZE(delayed_nodes))
> +				break;
>   		}
> +		index++;
>   		spin_unlock(&root->inode_lock);
>   
>   		for (i = 0; i < n; i++) {

<snip>
Gabriel Niebler April 25, 2022, 8:49 a.m. UTC | #5
Am 20.04.22 um 09:09 schrieb Nikolay Borisov:
> On 19.04.22 г. 18:57 ч., Gabriel Niebler wrote:
>> @@ -1870,29 +1863,33 @@ void btrfs_kill_delayed_inode_items(struct 
>> btrfs_inode *inode)
>>   void btrfs_kill_all_delayed_nodes(struct btrfs_root *root)
>>   {
>> -    u64 inode_id = 0;
>> +    unsigned long index = 0;
>> +    struct btrfs_delayed_node *delayed_node;
>>       struct btrfs_delayed_node *delayed_nodes[8];
> 
> Actually in order for the new code to be correct this array needs to be 
> zero-initialized upon every iteration of the while() loop. That's 
> because as it stands ATM delayed_nodes would retain the value from the 
> previous iteration. What this could lead to is on the last iteration we 
> potentially have 8 nodes in delayed_nodes which have been freed, so if 
> in the last iteration of the xa_for_each_start we copy only 3 nodes the 
> array would still have 8 nodes in total - 3 from the  last iteration and 
> 5 stale from the previous since they haven't been cleared out. So the 
> final for() loop which does the freeing would incur a double free on the 
> 5 already-freed entries.

Oh, you're right! I must not have thought about the *last* iteration, 
but yeah: that's a problem!

> To fix this you either need to clear delayed_nodes[i] in the final for 
> loop, so simply define it inside the while() like so:
> 
> struct btrfs_delayed_node *delayed_nodes[8] = {};

I will do that. Thanks for introducing me to this neat trick how to 
initialise an array to all NULLs, BTW. I admit I had to look it up, but 
now it'll save me from writing another loop...

<snip>

I'll also use the opportunity to fix the whitespace issues.

Sorry for the delay, BTW, don't know why I didn't see the message earlier.
Gabriel Niebler April 25, 2022, 9:28 a.m. UTC | #6
I looked at this again, to see how this obvious problem could have 
slipped by me, and found something interesting...

Am 20.04.22 um 09:09 schrieb Nikolay Borisov:
> On 19.04.22 г. 18:57 ч., Gabriel Niebler wrote:
>> … in the btrfs_root struct and adjust all usages of this object to use 
>> the
>> XArray API, because it is notionally easier to use and unserstand, as it
>> provides array semantics, and also takes care of locking for us, further
>> simplifying the code.
>>
>> Also use the opportunity to do some light refactoring.
>>
>> Signed-off-by: Gabriel Niebler <gniebler@suse.com>
>> ---
>>
> 
> <snip>
> 
>> @@ -1870,29 +1863,33 @@ void btrfs_kill_delayed_inode_items(struct 
>> btrfs_inode *inode)
>>   void btrfs_kill_all_delayed_nodes(struct btrfs_root *root)
>>   {
>> -    u64 inode_id = 0;
>> +    unsigned long index = 0;
>> +    struct btrfs_delayed_node *delayed_node;
>>       struct btrfs_delayed_node *delayed_nodes[8];
> 
> Actually in order for the new code to be correct this array needs to be 
> zero-initialized upon every iteration of the while() loop. That's 
> because as it stands ATM delayed_nodes would retain the value from the 
> previous iteration.

That is correct.

> What this could lead to is on the last iteration we 
> potentially have 8 nodes in delayed_nodes which have been freed, so if 
> in the last iteration of the xa_for_each_start we copy only 3 nodes the 
> array would still have 8 nodes in total - 3 from the  last iteration and 
> 5 stale from the previous since they haven't been cleared out.

Still correct and a similar situation could occur when there are fewer 
than 8 delayed nodes in the array to begin with. Then the uninitialised 
entries in the array would contain garbage...

> So the 
> final for() loop which does the freeing would incur a double free on the 
> 5 already-freed entries.

Except that it never gets there, because it only goes up to 'n' and n 
will have the value of the highest *actual* entry in _this_ iteration of 
the enclosing while-loop.

<snip>

>>       int i, n;
>>       while (1) {
>>           spin_lock(&root->inode_lock);
>> -        n = radix_tree_gang_lookup(&root->delayed_nodes_tree,
>> -                       (void **)delayed_nodes, inode_id,
>> -                       ARRAY_SIZE(delayed_nodes));
>> -        if (!n) {
>> +        if (xa_empty(&root->delayed_nodes)) {
>>               spin_unlock(&root->inode_lock);
>> -            break;
>> +            return;
>>           }
>> -        inode_id = delayed_nodes[n - 1]->inode_id + 1;
>> -        for (i = 0; i < n; i++) {
>> +        n = 0;
>> +        xa_for_each_start(&root->delayed_nodes, index,
>> +                  delayed_node, index) {
>>               /*
>>                * Don't increase refs in case the node is dead and
>>                * about to be removed from the tree in the loop below
>>                */
>> -            if (!refcount_inc_not_zero(&delayed_nodes[i]->refs))
>> -                delayed_nodes[i] = NULL;
>> +            if (refcount_inc_not_zero(&delayed_node->refs)) {
>> +                delayed_nodes[n] = delayed_node;
>> +                n++;
>> +                        }
>> +            if (n >= ARRAY_SIZE(delayed_nodes))
>> +                break;
>>           }
>> +        index++;
>>           spin_unlock(&root->inode_lock);
>>           for (i = 0; i < n; i++) {

There it is. This loop will never encounter old (already freed) or 
uninitialised entries, AFAICS.

I'd say that there actually isn't a problem with this code.

Now the question is whether the change should still be done. Declaring 
and initialising the array for each iteration incurs a little overhead, 
though, so I don't really see why it should be.

And if not, should I then resubmit purely for the whitespace changes?
David Sterba April 25, 2022, 2:55 p.m. UTC | #7
On Mon, Apr 25, 2022 at 11:28:59AM +0200, Gabriel Niebler wrote:
> I looked at this again, to see how this obvious problem could have 
> slipped by me, and found something interesting...
> 
> Am 20.04.22 um 09:09 schrieb Nikolay Borisov:
> > On 19.04.22 г. 18:57 ч., Gabriel Niebler wrote:
> >> … in the btrfs_root struct and adjust all usages of this object to use 
> >> the
> >> XArray API, because it is notionally easier to use and unserstand, as it
> >> provides array semantics, and also takes care of locking for us, further
> >> simplifying the code.
> >>
> >> Also use the opportunity to do some light refactoring.
> >>
> >> Signed-off-by: Gabriel Niebler <gniebler@suse.com>
> >> ---
> >>
> > 
> > <snip>
> > 
> >> @@ -1870,29 +1863,33 @@ void btrfs_kill_delayed_inode_items(struct 
> >> btrfs_inode *inode)
> >>   void btrfs_kill_all_delayed_nodes(struct btrfs_root *root)
> >>   {
> >> -    u64 inode_id = 0;
> >> +    unsigned long index = 0;
> >> +    struct btrfs_delayed_node *delayed_node;
> >>       struct btrfs_delayed_node *delayed_nodes[8];
> > 
> > Actually in order for the new code to be correct this array needs to be 
> > zero-initialized upon every iteration of the while() loop. That's 
> > because as it stands ATM delayed_nodes would retain the value from the 
> > previous iteration.
> 
> That is correct.
> 
> > What this could lead to is on the last iteration we 
> > potentially have 8 nodes in delayed_nodes which have been freed, so if 
> > in the last iteration of the xa_for_each_start we copy only 3 nodes the 
> > array would still have 8 nodes in total - 3 from the  last iteration and 
> > 5 stale from the previous since they haven't been cleared out.
> 
> Still correct and a similar situation could occur when there are fewer 
> than 8 delayed nodes in the array to begin with. Then the uninitialised 
> entries in the array would contain garbage...
> 
> > So the 
> > final for() loop which does the freeing would incur a double free on the 
> > 5 already-freed entries.
> 
> Except that it never gets there, because it only goes up to 'n' and n 
> will have the value of the highest *actual* entry in _this_ iteration of 
> the enclosing while-loop.
> 
> <snip>
> 
> >>       int i, n;
> >>       while (1) {
> >>           spin_lock(&root->inode_lock);
> >> -        n = radix_tree_gang_lookup(&root->delayed_nodes_tree,
> >> -                       (void **)delayed_nodes, inode_id,
> >> -                       ARRAY_SIZE(delayed_nodes));
> >> -        if (!n) {
> >> +        if (xa_empty(&root->delayed_nodes)) {
> >>               spin_unlock(&root->inode_lock);
> >> -            break;
> >> +            return;
> >>           }
> >> -        inode_id = delayed_nodes[n - 1]->inode_id + 1;
> >> -        for (i = 0; i < n; i++) {
> >> +        n = 0;
> >> +        xa_for_each_start(&root->delayed_nodes, index,
> >> +                  delayed_node, index) {
> >>               /*
> >>                * Don't increase refs in case the node is dead and
> >>                * about to be removed from the tree in the loop below
> >>                */
> >> -            if (!refcount_inc_not_zero(&delayed_nodes[i]->refs))
> >> -                delayed_nodes[i] = NULL;
> >> +            if (refcount_inc_not_zero(&delayed_node->refs)) {
> >> +                delayed_nodes[n] = delayed_node;
> >> +                n++;
> >> +                        }
> >> +            if (n >= ARRAY_SIZE(delayed_nodes))
> >> +                break;
> >>           }
> >> +        index++;
> >>           spin_unlock(&root->inode_lock);
> >>           for (i = 0; i < n; i++) {
> 
> There it is. This loop will never encounter old (already freed) or 
> uninitialised entries, AFAICS.
> 
> I'd say that there actually isn't a problem with this code.
> 
> Now the question is whether the change should still be done. Declaring 
> and initialising the array for each iteration incurs a little overhead, 
> though, so I don't really see why it should be.
> 
> And if not, should I then resubmit purely for the whitespace changes?

Please resend, there are several changes, thanks.
Nikolay Borisov April 26, 2022, 7:40 a.m. UTC | #8
On 25.04.22 г. 12:28 ч., Gabriel Niebler wrote:
<snip>

> 
>>>       int i, n;
>>>       while (1) {
>>>           spin_lock(&root->inode_lock);
>>> -        n = radix_tree_gang_lookup(&root->delayed_nodes_tree,
>>> -                       (void **)delayed_nodes, inode_id,
>>> -                       ARRAY_SIZE(delayed_nodes));
>>> -        if (!n) {
>>> +        if (xa_empty(&root->delayed_nodes)) {
>>>               spin_unlock(&root->inode_lock);
>>> -            break;
>>> +            return;
>>>           }
>>> -        inode_id = delayed_nodes[n - 1]->inode_id + 1;
>>> -        for (i = 0; i < n; i++) {
>>> +        n = 0;
>>> +        xa_for_each_start(&root->delayed_nodes, index,
>>> +                  delayed_node, index) {
>>>               /*
>>>                * Don't increase refs in case the node is dead and
>>>                * about to be removed from the tree in the loop below
>>>                */
>>> -            if (!refcount_inc_not_zero(&delayed_nodes[i]->refs))
>>> -                delayed_nodes[i] = NULL;
>>> +            if (refcount_inc_not_zero(&delayed_node->refs)) {
>>> +                delayed_nodes[n] = delayed_node;
>>> +                n++;
>>> +                        }
>>> +            if (n >= ARRAY_SIZE(delayed_nodes))
>>> +                break;
>>>           }
>>> +        index++;
>>>           spin_unlock(&root->inode_lock);
>>>           for (i = 0; i < n; i++) {
> 
> There it is. This loop will never encounter old (already freed) or 
> uninitialised entries, AFAICS.
> 
> I'd say that there actually isn't a problem with this code.

You are right, the code as-is is correct. I have 2 more suggestions but 
at this point they can be considered "bike shed" so you might as well 
disregard them in which case it's only the whitespace issue that remains 
and David could probably fix this. Here are the suggestions:

1. The 'n' variable could be defined inside the while() loop:

int n = 0;

That makes it obvious its lifetime is within the loop and it's always 
initialized to 0.

2. The kernel now moved to using -std=gnu11 (starting with e8c07082a810 
("Kbuild: move to -std=gnu11"))  meaning variables can now be defined 
inside for() loop bodies like:

for (int i = 0; i < n; i++)

IMO it would be good if we gradually start moving code to using this.


Again I consider those 2 minor and the patch as-is is good to go so it's 
up to David to decide if they are worth it.

Thanks for sticking with it.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>


<snip>
Gabriel Niebler April 26, 2022, 9:18 a.m. UTC | #9
Am 26.04.22 um 09:40 schrieb Nikolay Borisov:
> [...]
> You are right, the code as-is is correct. I have 2 more suggestions but 
> at this point they can be considered "bike shed" so you might as well 
> disregard them in which case it's only the whitespace issue that remains 
> and David could probably fix this. Here are the suggestions:
> 
> 1. The 'n' variable could be defined inside the while() loop:
> 
> int n = 0;
> 
> That makes it obvious its lifetime is within the loop and it's always 
> initialized to 0.
> 
> 2. The kernel now moved to using -std=gnu11 (starting with e8c07082a810 
> ("Kbuild: move to -std=gnu11"))  meaning variables can now be defined 
> inside for() loop bodies like:
> 
> for (int i = 0; i < n; i++)
> 
> IMO it would be good if we gradually start moving code to using this.

I like both of these suggestions and will implement them.

> Again I consider those 2 minor and the patch as-is is good to go so it's 
> up to David to decide if they are worth it.

David has already asked me to resubmit with the whitespace fixes in 
place, so I will just do that with your suggestions added in.

> Thanks for sticking with it.

My pleasure.
David Sterba April 26, 2022, 5:52 p.m. UTC | #10
On Tue, Apr 26, 2022 at 10:40:39AM +0300, Nikolay Borisov wrote:
> 2. The kernel now moved to using -std=gnu11 (starting with e8c07082a810 
> ("Kbuild: move to -std=gnu11"))  meaning variables can now be defined 
> inside for() loop bodies like:
> 
> for (int i = 0; i < n; i++)
> 
> IMO it would be good if we gradually start moving code to using this.

OK, I'm not aware of any problems we'd run into, the minimum compiler
version is old enough for all who care and fixups to change it back to
pre-C11 is easy.

We'd have to select a subset of the allowed features as not all of them
are free of controversy. The variable definition inside for() is OK.
diff mbox series

Patch

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index b7631b88426e..9377dded9679 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1224,10 +1224,10 @@  struct btrfs_root {
 	struct rb_root inode_tree;
 
 	/*
-	 * radix tree that keeps track of delayed nodes of every inode,
+	 * XArray that keeps track of delayed nodes of every inode,
 	 * protected by inode_lock
 	 */
-	struct radix_tree_root delayed_nodes_tree;
+	struct xarray delayed_nodes;
 	/*
 	 * right now this just gets used so that a root has its own devid
 	 * for stat.  It may be used for more later
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 748bf6b0d860..d6dd3be5f08e 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -78,7 +78,7 @@  static struct btrfs_delayed_node *btrfs_get_delayed_node(
 	}
 
 	spin_lock(&root->inode_lock);
-	node = radix_tree_lookup(&root->delayed_nodes_tree, ino);
+	node = xa_load(&root->delayed_nodes, ino);
 
 	if (node) {
 		if (btrfs_inode->delayed_node) {
@@ -90,9 +90,9 @@  static struct btrfs_delayed_node *btrfs_get_delayed_node(
 
 		/*
 		 * It's possible that we're racing into the middle of removing
-		 * this node from the radix tree.  In this case, the refcount
+		 * this node from the XArray.  In this case, the refcount
 		 * was zero and it should never go back to one.  Just return
-		 * NULL like it was never in the radix at all; our release
+		 * NULL like it was never in the XArray at all; our release
 		 * function is in the process of removing it.
 		 *
 		 * Some implementations of refcount_inc refuse to bump the
@@ -100,7 +100,7 @@  static struct btrfs_delayed_node *btrfs_get_delayed_node(
 		 * here, refcount_inc() may decide to just WARN_ONCE() instead
 		 * of actually bumping the refcount.
 		 *
-		 * If this node is properly in the radix, we want to bump the
+		 * If this node is properly in the XArray, we want to bump the
 		 * refcount twice, once for the inode and once for this get
 		 * operation.
 		 */
@@ -128,36 +128,30 @@  static struct btrfs_delayed_node *btrfs_get_or_create_delayed_node(
 	u64 ino = btrfs_ino(btrfs_inode);
 	int ret;
 
-again:
-	node = btrfs_get_delayed_node(btrfs_inode);
-	if (node)
-		return node;
-
-	node = kmem_cache_zalloc(delayed_node_cache, GFP_NOFS);
-	if (!node)
-		return ERR_PTR(-ENOMEM);
-	btrfs_init_delayed_node(node, root, ino);
+        do {
+		node = btrfs_get_delayed_node(btrfs_inode);
+		if (node)
+			return node;
 
-	/* cached in the btrfs inode and can be accessed */
-	refcount_set(&node->refs, 2);
+		node = kmem_cache_zalloc(delayed_node_cache, GFP_NOFS);
+		if (!node)
+			return ERR_PTR(-ENOMEM);
+		btrfs_init_delayed_node(node, root, ino);
 
-	ret = radix_tree_preload(GFP_NOFS);
-	if (ret) {
-		kmem_cache_free(delayed_node_cache, node);
-		return ERR_PTR(ret);
-	}
+		/* cached in the btrfs inode and can be accessed */
+		refcount_set(&node->refs, 2);
 
-	spin_lock(&root->inode_lock);
-	ret = radix_tree_insert(&root->delayed_nodes_tree, ino, node);
-	if (ret == -EEXIST) {
-		spin_unlock(&root->inode_lock);
-		kmem_cache_free(delayed_node_cache, node);
-		radix_tree_preload_end();
-		goto again;
-	}
+		spin_lock(&root->inode_lock);
+		ret = xa_insert(&root->delayed_nodes, ino, node, GFP_NOFS);
+		if (ret) {
+			spin_unlock(&root->inode_lock);
+			kmem_cache_free(delayed_node_cache, node);
+			if (ret != -EBUSY)
+				return ERR_PTR(ret);
+                }
+	} while (ret);
 	btrfs_inode->delayed_node = node;
 	spin_unlock(&root->inode_lock);
-	radix_tree_preload_end();
 
 	return node;
 }
@@ -276,8 +270,7 @@  static void __btrfs_release_delayed_node(
 		 * back up.  We can delete it now.
 		 */
 		ASSERT(refcount_read(&delayed_node->refs) == 0);
-		radix_tree_delete(&root->delayed_nodes_tree,
-				  delayed_node->inode_id);
+		xa_erase(&root->delayed_nodes, delayed_node->inode_id);
 		spin_unlock(&root->inode_lock);
 		kmem_cache_free(delayed_node_cache, delayed_node);
 	}
@@ -1870,29 +1863,33 @@  void btrfs_kill_delayed_inode_items(struct btrfs_inode *inode)
 
 void btrfs_kill_all_delayed_nodes(struct btrfs_root *root)
 {
-	u64 inode_id = 0;
+	unsigned long index = 0;
+	struct btrfs_delayed_node *delayed_node;
 	struct btrfs_delayed_node *delayed_nodes[8];
 	int i, n;
 
 	while (1) {
 		spin_lock(&root->inode_lock);
-		n = radix_tree_gang_lookup(&root->delayed_nodes_tree,
-					   (void **)delayed_nodes, inode_id,
-					   ARRAY_SIZE(delayed_nodes));
-		if (!n) {
+		if (xa_empty(&root->delayed_nodes)) {
 			spin_unlock(&root->inode_lock);
-			break;
+			return;
 		}
 
-		inode_id = delayed_nodes[n - 1]->inode_id + 1;
-		for (i = 0; i < n; i++) {
+		n = 0;
+		xa_for_each_start(&root->delayed_nodes, index,
+				  delayed_node, index) {
 			/*
 			 * Don't increase refs in case the node is dead and
 			 * about to be removed from the tree in the loop below
 			 */
-			if (!refcount_inc_not_zero(&delayed_nodes[i]->refs))
-				delayed_nodes[i] = NULL;
+			if (refcount_inc_not_zero(&delayed_node->refs)) {
+				delayed_nodes[n] = delayed_node;
+				n++;
+                        }
+			if (n >= ARRAY_SIZE(delayed_nodes))
+				break;
 		}
+		index++;
 		spin_unlock(&root->inode_lock);
 
 		for (i = 0; i < n; i++) {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 126f244cdf88..913261481c1a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1164,7 +1164,7 @@  static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info,
 	root->nr_delalloc_inodes = 0;
 	root->nr_ordered_extents = 0;
 	root->inode_tree = RB_ROOT;
-	INIT_RADIX_TREE(&root->delayed_nodes_tree, GFP_ATOMIC);
+	xa_init_flags(&root->delayed_nodes, GFP_ATOMIC);
 
 	btrfs_init_root_block_rsv(root);
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5082b9c70f8c..50a699ece606 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3826,7 +3826,7 @@  static int btrfs_read_locked_inode(struct inode *inode,
 	 * cache.
 	 *
 	 * This is required for both inode re-read from disk and delayed inode
-	 * in delayed_nodes_tree.
+	 * in the delayed_nodes XArray.
 	 */
 	if (BTRFS_I(inode)->last_trans == fs_info->generation)
 		set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,