Message ID: 20220414101054.15230-1-gniebler@suse.com
State:      New, archived
Series:     [v3] btrfs: Turn delayed_nodes_tree into an XArray
On 14.04.22 at 13:10, Gabriel Niebler wrote:
> … in the btrfs_root struct and adjust all usages of this object to use
> the XArray API, because it is notionally easier to use and understand,
> as it provides array semantics, and also takes care of locking for us,
> further simplifying the code.
>
> Signed-off-by: Gabriel Niebler <gniebler@suse.com>

Apart from some minor remarks (see below) looks good.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/ctree.h         |  4 ++--
>  fs/btrfs/delayed-inode.c | 49 +++++++++++++++++++---------------------
>  fs/btrfs/disk-io.c       |  2 +-
>  fs/btrfs/inode.c         |  2 +-
>  4 files changed, 27 insertions(+), 30 deletions(-)

<snip>

> @@ -141,23 +141,17 @@ static struct btrfs_delayed_node *btrfs_get_or_create_delayed_node(
>  	/* cached in the btrfs inode and can be accessed */
>  	refcount_set(&node->refs, 2);
>  
> -	ret = radix_tree_preload(GFP_NOFS);
> -	if (ret) {
> -		kmem_cache_free(delayed_node_cache, node);
> -		return ERR_PTR(ret);
> -	}
> -
>  	spin_lock(&root->inode_lock);
> -	ret = radix_tree_insert(&root->delayed_nodes_tree, ino, node);
> -	if (ret == -EEXIST) {
> +	ret = xa_insert(&root->delayed_nodes, ino, node, GFP_NOFS);
> +	if (ret) {
>  		spin_unlock(&root->inode_lock);
>  		kmem_cache_free(delayed_node_cache, node);
> -		radix_tree_preload_end();
> -		goto again;
> +		if (ret == -EBUSY)
> +			goto again;
> +		return ERR_PTR(ret);
>  	}
>  	btrfs_inode->delayed_node = node;
>  	spin_unlock(&root->inode_lock);
> -	radix_tree_preload_end();

nit: One suggestion: instead of relying on the goto again, implement
what's essentially a do {} while() loop; this code can easily be turned
into a well-formed do {} while().
Usually this type of construct (a label implementing a loop) is used to
reduce indentation levels, but in this case I did the transformation and
we are not breaking the 80-char line limit:

	do {
		node = btrfs_get_delayed_node(btrfs_inode);
		if (node)
			return node;

		node = kmem_cache_zalloc(delayed_node_cache, GFP_NOFS);
		if (!node)
			return ERR_PTR(-ENOMEM);
		btrfs_init_delayed_node(node, root, ino);

		/* cached in the btrfs inode and can be accessed */
		refcount_set(&node->refs, 2);

		spin_lock(&root->inode_lock);
		ret = xa_insert(&root->delayed_nodes, ino, node, GFP_NOFS);
		if (ret) {
			spin_unlock(&root->inode_lock);
			kmem_cache_free(delayed_node_cache, node);
			if (ret != -EBUSY)
				return ERR_PTR(ret);
		}
	} while (ret);

I personally dislike using labels like that, but since you are changing
this code now might be a good time to also fix this wrinkle. This is not
a deal breaker, though, and I'd be happy to see David's opinion.

>  
>  	return node;
>  }

<snip>

> @@ -1870,29 +1863,33 @@ void btrfs_kill_delayed_inode_items(struct btrfs_inode *inode)
>  
>  void btrfs_kill_all_delayed_nodes(struct btrfs_root *root)
>  {
> -	u64 inode_id = 0;
> +	unsigned long index = 0;
> +	struct btrfs_delayed_node *delayed_node;
>  	struct btrfs_delayed_node *delayed_nodes[8];
>  	int i, n;
>  
>  	while (1) {
>  		spin_lock(&root->inode_lock);
> -		n = radix_tree_gang_lookup(&root->delayed_nodes_tree,
> -					   (void **)delayed_nodes, inode_id,
> -					   ARRAY_SIZE(delayed_nodes));
> -		if (!n) {
> +		if (xa_empty(&root->delayed_nodes)) {
>  			spin_unlock(&root->inode_lock);
>  			break;

nit: This could be turned into a return, because someone reading the
code and seeing the return gets an instant "let's forget this function",
rather than having to scan downwards to see whether anything special is
going on after the loop's body.
>  		}
>  
> -		inode_id = delayed_nodes[n - 1]->inode_id + 1;
> -		for (i = 0; i < n; i++) {
> +		n = 0;
> +		xa_for_each_start(&root->delayed_nodes, index,
> +				  delayed_node, index) {
>  			/*
>  			 * Don't increase refs in case the node is dead and
>  			 * about to be removed from the tree in the loop below
>  			 */
> -			if (!refcount_inc_not_zero(&delayed_nodes[i]->refs))
> -				delayed_nodes[i] = NULL;
> +			delayed_nodes[n] = delayed_node;
> +			if (!refcount_inc_not_zero(&delayed_node->refs))
> +				delayed_nodes[n] = NULL;

nit: Instead of doing the assignment and then checking whether to undo
it, you can do something like:

@@ -1878,13 +1879,13 @@ void btrfs_kill_all_delayed_nodes(struct btrfs_root *root)
 		n = 0;
 		xa_for_each_start(&root->delayed_nodes, index,
 				  delayed_node, index) {
+			struct btrfs_delayed_node *delayed_nodes[8] = {};
 			/*
 			 * Don't increase refs in case the node is dead and
 			 * about to be removed from the tree in the loop below
 			 */
-			delayed_nodes[n] = delayed_node;
-			if (!refcount_inc_not_zero(&delayed_node->refs))
-				delayed_nodes[n] = NULL;
+			if (refcount_inc_not_zero(&delayed_node->refs))
+				delayed_nodes[n] = delayed_node;
 			n++;
 			if (n >= ARRAY_SIZE(delayed_nodes))
 				break;

That is, define the delayed_nodes array inside the loop, as this is its
lifetime, and use an empty designated initializer so that the array is
guaranteed to be zero-initialized; then you only assign a node if you
managed to get a refcount. IMO this is somewhat cleaner.

> +			n++;

Also, since you are doing a "for each" style iteration, n++ should be
incremented only if you actually saved a node, i.e. if taking the
refcount was successful. That way you guarantee that delayed_nodes will
contain 8 nodes to free.

Alternatively, think what happens if you have 8 nodes and 7 of them are
dying: in this case you will only save 1 node but increment n 8 times
and then break, so the subsequent invocation of the freeing loop will
only free 1 node, i.e. you lose the effect of batching.

<snip>
On Thu, Apr 14, 2022 at 04:18:53PM +0300, Nikolay Borisov wrote:
> > -	radix_tree_preload_end();
>
> nit: One suggestion: instead of relying on the goto again, implement
> what's essentially a do {} while() loop; this code can easily be
> turned into a well-formed do {} while(). Usually this type of
> construct (a label implementing a loop) is used to reduce indentation
> levels, but in this case I did the transformation and we are not
> breaking the 80-char line limit:
>
> 	do {
> 		node = btrfs_get_delayed_node(btrfs_inode);
> 		if (node)
> 			return node;
>
> 		node = kmem_cache_zalloc(delayed_node_cache, GFP_NOFS);
> 		if (!node)
> 			return ERR_PTR(-ENOMEM);
> 		btrfs_init_delayed_node(node, root, ino);
>
> 		/* cached in the btrfs inode and can be accessed */
> 		refcount_set(&node->refs, 2);
>
> 		spin_lock(&root->inode_lock);
> 		ret = xa_insert(&root->delayed_nodes, ino, node, GFP_NOFS);
> 		if (ret) {
> 			spin_unlock(&root->inode_lock);
> 			kmem_cache_free(delayed_node_cache, node);
> 			if (ret != -EBUSY)
> 				return ERR_PTR(ret);
> 		}
> 	} while (ret);
>
> I personally dislike using labels like that, but since you are
> changing this code now might be a good time to also fix this wrinkle.
> This is not a deal breaker, though, and I'd be happy to see David's
> opinion.

I also dislike the while and again: label, and turning that into a
do/while is good as long as it does not obscure the real change. In this
case, and given the example above, I think it makes sense: the loop body
is short enough and we need to review the whole logic after the
structure switch anyway.
tl;dr: All your points are well taken; I will resubmit with them
applied.

On 14.04.22 at 15:18, Nikolay Borisov wrote:
> On 14.04.22 at 13:10, Gabriel Niebler wrote:
>> … in the btrfs_root struct and adjust all usages of this object to
>> use the XArray API, because it is notionally easier to use and
>> understand, as it provides array semantics, and also takes care of
>> locking for us, further simplifying the code.

[...]

>> @@ -141,23 +141,17 @@ static struct btrfs_delayed_node *btrfs_get_or_create_delayed_node(
>>  	/* cached in the btrfs inode and can be accessed */
>>  	refcount_set(&node->refs, 2);
>>  
>> -	ret = radix_tree_preload(GFP_NOFS);
>> -	if (ret) {
>> -		kmem_cache_free(delayed_node_cache, node);
>> -		return ERR_PTR(ret);
>> -	}
>> -
>>  	spin_lock(&root->inode_lock);
>> -	ret = radix_tree_insert(&root->delayed_nodes_tree, ino, node);
>> -	if (ret == -EEXIST) {
>> +	ret = xa_insert(&root->delayed_nodes, ino, node, GFP_NOFS);
>> +	if (ret) {
>>  		spin_unlock(&root->inode_lock);
>>  		kmem_cache_free(delayed_node_cache, node);
>> -		radix_tree_preload_end();
>> -		goto again;
>> +		if (ret == -EBUSY)
>> +			goto again;
>> +		return ERR_PTR(ret);
>>  	}
>>  	btrfs_inode->delayed_node = node;
>>  	spin_unlock(&root->inode_lock);
>> -	radix_tree_preload_end();
>
> nit: One suggestion: instead of relying on the goto again, implement
> what's essentially a do {} while() loop; this code can easily be
> turned into a well-formed do {} while().
> Usually this type of construct (a label implementing a loop) is used
> to reduce indentation levels, but in this case I did the
> transformation and we are not breaking the 80-char line limit:
>
> 	do {
> 		node = btrfs_get_delayed_node(btrfs_inode);
> 		if (node)
> 			return node;
>
> 		node = kmem_cache_zalloc(delayed_node_cache, GFP_NOFS);
> 		if (!node)
> 			return ERR_PTR(-ENOMEM);
> 		btrfs_init_delayed_node(node, root, ino);
>
> 		/* cached in the btrfs inode and can be accessed */
> 		refcount_set(&node->refs, 2);
>
> 		spin_lock(&root->inode_lock);
> 		ret = xa_insert(&root->delayed_nodes, ino, node, GFP_NOFS);
> 		if (ret) {
> 			spin_unlock(&root->inode_lock);
> 			kmem_cache_free(delayed_node_cache, node);
> 			if (ret != -EBUSY)
> 				return ERR_PTR(ret);
> 		}
> 	} while (ret);
>
> I personally dislike using labels like that, but since you are
> changing this code now might be a good time to also fix this wrinkle.
> This is not a deal breaker, though, and I'd be happy to see David's
> opinion.

I don't like labels (and gotos) either, especially when they're used
like this. But as I had tried (and failed) to opportunistically remove
some labels in my last patch, I've become more "minimalist" in this one.
The trick is striking the right balance, I guess, and you demonstrate
very nicely how to do it here, so I will apply that.
>> @@ -1870,29 +1863,33 @@ void btrfs_kill_delayed_inode_items(struct btrfs_inode *inode)
>>  
>>  void btrfs_kill_all_delayed_nodes(struct btrfs_root *root)
>>  {
>> -	u64 inode_id = 0;
>> +	unsigned long index = 0;
>> +	struct btrfs_delayed_node *delayed_node;
>>  	struct btrfs_delayed_node *delayed_nodes[8];
>>  	int i, n;
>>  
>>  	while (1) {
>>  		spin_lock(&root->inode_lock);
>> -		n = radix_tree_gang_lookup(&root->delayed_nodes_tree,
>> -					   (void **)delayed_nodes, inode_id,
>> -					   ARRAY_SIZE(delayed_nodes));
>> -		if (!n) {
>> +		if (xa_empty(&root->delayed_nodes)) {
>>  			spin_unlock(&root->inode_lock);
>>  			break;
>
> nit: This could be turned into a return, because someone reading the
> code and seeing the return gets an instant "let's forget this
> function", rather than having to scan downwards to see whether
> anything special is going on after the loop's body.

You're right! I think I just kept the old logic here without thinking
about it, but doing a return is exactly equivalent and quicker to
understand, as you point out.

(It's funny: there are people who are very dogmatic about the idea that
"a function should only ever have a single return statement!". I'm *not*
one of them, but I've worked with such people and perhaps the attitude
has rubbed off on me, subconsciously even. Maybe it's a Python thing; is
this thinking ever found in the C/kernel development space?)
>>  		}
>>  
>> -		inode_id = delayed_nodes[n - 1]->inode_id + 1;
>> -		for (i = 0; i < n; i++) {
>> +		n = 0;
>> +		xa_for_each_start(&root->delayed_nodes, index,
>> +				  delayed_node, index) {
>>  			/*
>>  			 * Don't increase refs in case the node is dead and
>>  			 * about to be removed from the tree in the loop below
>>  			 */
>> -			if (!refcount_inc_not_zero(&delayed_nodes[i]->refs))
>> -				delayed_nodes[i] = NULL;
>> +			delayed_nodes[n] = delayed_node;
>> +			if (!refcount_inc_not_zero(&delayed_node->refs))
>> +				delayed_nodes[n] = NULL;
>
> nit: Instead of doing the assignment and then checking whether to undo
> it, you can do something like:
>
> @@ -1878,13 +1879,13 @@ void btrfs_kill_all_delayed_nodes(struct btrfs_root *root)
>  		n = 0;
>  		xa_for_each_start(&root->delayed_nodes, index,
>  				  delayed_node, index) {
> +			struct btrfs_delayed_node *delayed_nodes[8] = {};
>  			/*
>  			 * Don't increase refs in case the node is dead and
>  			 * about to be removed from the tree in the loop below
>  			 */
> -			delayed_nodes[n] = delayed_node;
> -			if (!refcount_inc_not_zero(&delayed_node->refs))
> -				delayed_nodes[n] = NULL;
> +			if (refcount_inc_not_zero(&delayed_node->refs))
> +				delayed_nodes[n] = delayed_node;
>  			n++;
>  			if (n >= ARRAY_SIZE(delayed_nodes))
>  				break;
>
> That is, define the delayed_nodes array inside the loop, as this is
> its lifetime, and use an empty designated initializer so that the
> array is guaranteed to be zero-initialized; then you only assign a
> node if you managed to get a refcount. IMO this is somewhat cleaner.
>
>> +			n++;
>
> Also, since you are doing a "for each" style iteration, n++ should be
> incremented only if you actually saved a node, i.e. if taking the
> refcount was successful. That way you guarantee that delayed_nodes
> will contain 8 nodes to free.
> Alternatively, think what happens if you have 8 nodes and 7 of them
> are dying: in this case you will only save 1 node but increment n 8
> times and then break, so the subsequent invocation of the freeing loop
> will only free 1 node, i.e. you lose the effect of batching.

Oh yeah, this is somewhat inefficient. Again, I stuck too close to the
old logic without "stepping back" and thinking about it. But your
suggestions make perfect sense and I will apply them.
On Tue, Apr 19, 2022 at 11:03:12AM +0200, Gabriel Niebler wrote:
> > nit: This could be turned into a return, because someone reading
> > the code and seeing the return gets an instant "let's forget this
> > function", rather than having to scan downwards to see whether
> > anything special is going on after the loop's body.
>
> You're right! I think I just kept the old logic here without thinking
> about it, but doing a return is exactly equivalent and quicker to
> understand, as you point out.
>
> (It's funny: there are people who are very dogmatic about the idea
> that "a function should only ever have a single return statement!".
> I'm *not* one of them, but I've worked with such people and perhaps
> the attitude has rubbed off on me, subconsciously even. Maybe it's a
> Python thing; is this thinking ever found in the C/kernel development
> space?)

We've settled on a pattern regarding return statements that's somewhere
in between. If a function has some sort of quick setup/quick check
sequence that does not need any cleanup, then the return follows. Once
there are loops, a long function body, or undo/cleanup functions, then
it's the goto pattern. And there are exceptions, e.g. for short
functions with loops where the code flow is obvious.
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index b7631b88426e..9377dded9679 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1224,10 +1224,10 @@ struct btrfs_root {
 	struct rb_root inode_tree;
 
 	/*
-	 * radix tree that keeps track of delayed nodes of every inode,
+	 * XArray that keeps track of delayed nodes of every inode,
 	 * protected by inode_lock
 	 */
-	struct radix_tree_root delayed_nodes_tree;
+	struct xarray delayed_nodes;
 	/*
 	 * right now this just gets used so that a root has its own devid
 	 * for stat. It may be used for more later
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 748bf6b0d860..bf7393e72633 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -78,7 +78,7 @@ static struct btrfs_delayed_node *btrfs_get_delayed_node(
 	}
 
 	spin_lock(&root->inode_lock);
-	node = radix_tree_lookup(&root->delayed_nodes_tree, ino);
+	node = xa_load(&root->delayed_nodes, ino);
 
 	if (node) {
 		if (btrfs_inode->delayed_node) {
@@ -90,9 +90,9 @@ static struct btrfs_delayed_node *btrfs_get_delayed_node(
 
 		/*
 		 * It's possible that we're racing into the middle of removing
-		 * this node from the radix tree. In this case, the refcount
+		 * this node from the XArray. In this case, the refcount
 		 * was zero and it should never go back to one. Just return
-		 * NULL like it was never in the radix at all; our release
+		 * NULL like it was never in the XArray at all; our release
 		 * function is in the process of removing it.
 		 *
 		 * Some implementations of refcount_inc refuse to bump the
@@ -100,7 +100,7 @@ static struct btrfs_delayed_node *btrfs_get_delayed_node(
 		 * here, refcount_inc() may decide to just WARN_ONCE() instead
 		 * of actually bumping the refcount.
 		 *
-		 * If this node is properly in the radix, we want to bump the
+		 * If this node is properly in the XArray, we want to bump the
 		 * refcount twice, once for the inode and once for this get
 		 * operation.
 		 */
@@ -141,23 +141,17 @@ static struct btrfs_delayed_node *btrfs_get_or_create_delayed_node(
 	/* cached in the btrfs inode and can be accessed */
 	refcount_set(&node->refs, 2);
 
-	ret = radix_tree_preload(GFP_NOFS);
-	if (ret) {
-		kmem_cache_free(delayed_node_cache, node);
-		return ERR_PTR(ret);
-	}
-
 	spin_lock(&root->inode_lock);
-	ret = radix_tree_insert(&root->delayed_nodes_tree, ino, node);
-	if (ret == -EEXIST) {
+	ret = xa_insert(&root->delayed_nodes, ino, node, GFP_NOFS);
+	if (ret) {
 		spin_unlock(&root->inode_lock);
 		kmem_cache_free(delayed_node_cache, node);
-		radix_tree_preload_end();
-		goto again;
+		if (ret == -EBUSY)
+			goto again;
+		return ERR_PTR(ret);
 	}
 	btrfs_inode->delayed_node = node;
 	spin_unlock(&root->inode_lock);
-	radix_tree_preload_end();
 
 	return node;
 }
@@ -276,8 +270,7 @@ static void __btrfs_release_delayed_node(
 	 * back up. We can delete it now.
 	 */
 	ASSERT(refcount_read(&delayed_node->refs) == 0);
-	radix_tree_delete(&root->delayed_nodes_tree,
-			  delayed_node->inode_id);
+	xa_erase(&root->delayed_nodes, delayed_node->inode_id);
 	spin_unlock(&root->inode_lock);
 	kmem_cache_free(delayed_node_cache, delayed_node);
 }
@@ -1870,29 +1863,33 @@ void btrfs_kill_delayed_inode_items(struct btrfs_inode *inode)
 
 void btrfs_kill_all_delayed_nodes(struct btrfs_root *root)
 {
-	u64 inode_id = 0;
+	unsigned long index = 0;
+	struct btrfs_delayed_node *delayed_node;
 	struct btrfs_delayed_node *delayed_nodes[8];
 	int i, n;
 
 	while (1) {
 		spin_lock(&root->inode_lock);
-		n = radix_tree_gang_lookup(&root->delayed_nodes_tree,
-					   (void **)delayed_nodes, inode_id,
-					   ARRAY_SIZE(delayed_nodes));
-		if (!n) {
+		if (xa_empty(&root->delayed_nodes)) {
 			spin_unlock(&root->inode_lock);
 			break;
 		}
 
-		inode_id = delayed_nodes[n - 1]->inode_id + 1;
-		for (i = 0; i < n; i++) {
+		n = 0;
+		xa_for_each_start(&root->delayed_nodes, index,
+				  delayed_node, index) {
 			/*
 			 * Don't increase refs in case the node is dead and
 			 * about to be removed from the tree in the loop below
 			 */
-			if (!refcount_inc_not_zero(&delayed_nodes[i]->refs))
-				delayed_nodes[i] = NULL;
+			delayed_nodes[n] = delayed_node;
+			if (!refcount_inc_not_zero(&delayed_node->refs))
+				delayed_nodes[n] = NULL;
+			n++;
+			if (n >= ARRAY_SIZE(delayed_nodes))
+				break;
 		}
+		index++;
 		spin_unlock(&root->inode_lock);
 
 		for (i = 0; i < n; i++) {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b30309f187cf..2a49c772d175 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1164,7 +1164,7 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info,
 	root->nr_delalloc_inodes = 0;
 	root->nr_ordered_extents = 0;
 	root->inode_tree = RB_ROOT;
-	INIT_RADIX_TREE(&root->delayed_nodes_tree, GFP_ATOMIC);
+	xa_init_flags(&root->delayed_nodes, GFP_ATOMIC);
 
 	btrfs_init_root_block_rsv(root);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 17d5557f98ec..5b5355fdfa62 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3827,7 +3827,7 @@ static int btrfs_read_locked_inode(struct inode *inode,
 	 * cache.
 	 *
 	 * This is required for both inode re-read from disk and delayed inode
-	 * in delayed_nodes_tree.
+	 * in the delayed_nodes XArray.
 	 */
 	if (BTRFS_I(inode)->last_trans == fs_info->generation)
 		set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
… in the btrfs_root struct and adjust all usages of this object to use
the XArray API, because it is notionally easier to use and understand,
as it provides array semantics, and also takes care of locking for us,
further simplifying the code.

Signed-off-by: Gabriel Niebler <gniebler@suse.com>
---
 fs/btrfs/ctree.h         |  4 ++--
 fs/btrfs/delayed-inode.c | 49 +++++++++++++++++++---------------------
 fs/btrfs/disk-io.c       |  2 +-
 fs/btrfs/inode.c         |  2 +-
 4 files changed, 27 insertions(+), 30 deletions(-)