Message ID | 20231016032226.59199-4-zhangpeng.00@bytedance.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Introduce __mt_dup() to improve the performance of fork() | expand |
On Mon, Oct 16, 2023 at 11:22:19AM +0800, Peng Zhang wrote: > +++ b/lib/maple_tree.c > @@ -4,6 +4,10 @@ > * Copyright (c) 2018-2022 Oracle Corporation > * Authors: Liam R. Howlett <Liam.Howlett@oracle.com> > * Matthew Wilcox <willy@infradead.org> > + * > + * Algorithm for duplicating Maple Tree > + * Copyright (c) 2023 ByteDance > + * Author: Peng Zhang <zhangpeng.00@bytedance.com> You can't copyright an algorithm. You can copyright the _implementation_ of an algorithm. You have a significant chunk of code in this file, so adding your copyright is reasonable (although not legally required, AIUI). Just leave out this line: + * Algorithm for duplicating Maple Tree > +/** > + * __mt_dup(): Duplicate an entire maple tree > + * @mt: The source maple tree > + * @new: The new maple tree > + * @gfp: The GFP_FLAGS to use for allocations > + * > + * This function duplicates a maple tree in Depth-First Search (DFS) pre-order > + * traversal. It uses memcopy() to copy nodes in the source tree and allocate memcpy()? > +int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp) > +{ > + int ret = 0; > + MA_STATE(mas, mt, 0, 0); > + MA_STATE(new_mas, new, 0, 0); > + > + mas_dup_build(&mas, &new_mas, gfp); > + > + if (unlikely(mas_is_err(&mas))) { > + ret = xa_err(mas.node); > + if (ret == -ENOMEM) > + mas_dup_free(&new_mas); > + } > + > + return ret; > +} > +EXPORT_SYMBOL(__mt_dup); Why does it need to be exported?
在 2023/10/16 22:10, Matthew Wilcox 写道: > On Mon, Oct 16, 2023 at 11:22:19AM +0800, Peng Zhang wrote: >> +++ b/lib/maple_tree.c >> @@ -4,6 +4,10 @@ >> * Copyright (c) 2018-2022 Oracle Corporation >> * Authors: Liam R. Howlett <Liam.Howlett@oracle.com> >> * Matthew Wilcox <willy@infradead.org> >> + * >> + * Algorithm for duplicating Maple Tree >> + * Copyright (c) 2023 ByteDance >> + * Author: Peng Zhang <zhangpeng.00@bytedance.com> > > You can't copyright an algorithm. You can copyright the > _implementation_ of an algorithm. You have a significant chunk of code > in this file, so adding your copyright is reasonable (although not > legally required, AIUI). Just leave out this line: > > + * Algorithm for duplicating Maple Tree Okay, I will make the correction, thank you. > >> +/** >> + * __mt_dup(): Duplicate an entire maple tree >> + * @mt: The source maple tree >> + * @new: The new maple tree >> + * @gfp: The GFP_FLAGS to use for allocations >> + * >> + * This function duplicates a maple tree in Depth-First Search (DFS) pre-order >> + * traversal. It uses memcopy() to copy nodes in the source tree and allocate > > memcpy()? Yes, thank you for pointing this out. > >> +int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp) >> +{ >> + int ret = 0; >> + MA_STATE(mas, mt, 0, 0); >> + MA_STATE(new_mas, new, 0, 0); >> + >> + mas_dup_build(&mas, &new_mas, gfp); >> + >> + if (unlikely(mas_is_err(&mas))) { >> + ret = xa_err(mas.node); >> + if (ret == -ENOMEM) >> + mas_dup_free(&new_mas); >> + } >> + >> + return ret; >> +} >> +EXPORT_SYMBOL(__mt_dup); > > Why does it need to be exported? I consider __mt_dup() as a general interface for Maple Tree, uncertain whether it will be used by certain modules in the future. > >
* Peng Zhang <zhangpeng.00@bytedance.com> [231015 23:23]: > Introduce interfaces __mt_dup() and mtree_dup(), which are used to > duplicate a maple tree. They duplicate a maple tree in Depth-First > Search (DFS) pre-order traversal. It uses memcopy() to copy nodes in the > source tree and allocate new child nodes in non-leaf nodes. The new node > is exactly the same as the source node except for all the addresses > stored in it. It will be faster than traversing all elements in the > source tree and inserting them one by one into the new tree. The time > complexity of these two functions is O(n). > > The difference between __mt_dup() and mtree_dup() is that mtree_dup() > handles locks internally. > > Analysis of the average time complexity of this algorithm: > > For simplicity, let's assume that the maximum branching factor of all > non-leaf nodes is 16 (in allocation mode, it is 10), and the tree is a > full tree. > > Under the given conditions, if there is a maple tree with n elements, > the number of its leaves is n/16. From bottom to top, the number of > nodes in each level is 1/16 of the number of nodes in the level below. > So the total number of nodes in the entire tree is given by the sum of > n/16 + n/16^2 + n/16^3 + ... + 1. This is a geometric series, and it has > log(n) terms with base 16. According to the formula for the sum of a > geometric series, the sum of this series can be calculated as (n-1)/15. > Each node has only one parent node pointer, which can be considered as > an edge. In total, there are (n-1)/15-1 edges. > > This algorithm consists of two operations: > > 1. Traversing all nodes in DFS order. > 2. For each node, making a copy and performing necessary modifications > to create a new node. > > For the first part, DFS traversal will visit each edge twice. Let > T(ascend) represent the cost of taking one step downwards, and > T(descend) represent the cost of taking one step upwards. And both of > them are constants (although mas_ascend() may not be, as it contains a > loop, but here we ignore it and treat it as a constant). So the time > spent on the first part can be represented as > ((n-1)/15-1) * (T(ascend) + T(descend)). > > For the second part, each node will be copied, and the cost of copying a > node is denoted as T(copy_node). For each non-leaf node, it is necessary > to reallocate all child nodes, and the cost of this operation is denoted > as T(dup_alloc). The behavior behind memory allocation is complex and > not specific to the maple tree operation. Here, we assume that the time > required for a single allocation is constant. Since the size of a node > is fixed, both of these symbols are also constants. We can calculate > that the time spent on the second part is > ((n-1)/15) * T(copy_node) + ((n-1)/15 - n/16) * T(dup_alloc). > > Adding both parts together, the total time spent by the algorithm can be > represented as: > > ((n-1)/15) * (T(ascend) + T(descend) + T(copy_node) + T(dup_alloc)) - > n/16 * T(dup_alloc) - (T(ascend) + T(descend)) > > Let C1 = T(ascend) + T(descend) + T(copy_node) + T(dup_alloc) > Let C2 = T(dup_alloc) > Let C3 = T(ascend) + T(descend) > > Finally, the expression can be simplified as: > ((16 * C1 - 15 * C2) / (15 * 16)) * n - (C1 / 15 + C3). > > This is a linear function, so the average time complexity is O(n). > > Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> > --- > include/linux/maple_tree.h | 3 + > lib/maple_tree.c | 290 +++++++++++++++++++++++++++++++++++++ > 2 files changed, 293 insertions(+) > > diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h > index f91dbc7fe091..a452dd8a1e5c 100644 > --- a/include/linux/maple_tree.h > +++ b/include/linux/maple_tree.h > @@ -329,6 +329,9 @@ int mtree_store(struct maple_tree *mt, unsigned long index, > void *entry, gfp_t gfp); > void *mtree_erase(struct maple_tree *mt, unsigned long index); > > +int mtree_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp); > +int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp); > + > void mtree_destroy(struct maple_tree *mt); > void __mt_destroy(struct maple_tree *mt); > > diff --git a/lib/maple_tree.c b/lib/maple_tree.c > index ca7039633844..6e0ad83f14e3 100644 > --- a/lib/maple_tree.c > +++ b/lib/maple_tree.c > @@ -4,6 +4,10 @@ > * Copyright (c) 2018-2022 Oracle Corporation > * Authors: Liam R. Howlett <Liam.Howlett@oracle.com> > * Matthew Wilcox <willy@infradead.org> > + * > + * Algorithm for duplicating Maple Tree > + * Copyright (c) 2023 ByteDance > + * Author: Peng Zhang <zhangpeng.00@bytedance.com> > */ > > /* > @@ -6475,6 +6479,292 @@ void *mtree_erase(struct maple_tree *mt, unsigned long index) > } > EXPORT_SYMBOL(mtree_erase); > > +/* > + * mas_dup_free() - Free an incomplete duplication of a tree. > + * @mas: The maple state of a incomplete tree. > + * > + * The parameter @mas->node passed in indicates that the allocation failed on > + * this node. This function frees all nodes starting from @mas->node in the > + * reverse order of mas_dup_build(). There is no need to hold the source tree > + * lock at this time. > + */ > +static void mas_dup_free(struct ma_state *mas) > +{ > + struct maple_node *node; > + enum maple_type type; > + void __rcu **slots; > + unsigned char count, i; > + > + /* Maybe the first node allocation failed. */ > + if (mas_is_none(mas)) > + return; > + > + while (!mte_is_root(mas->node)) { > + mas_ascend(mas); > + Please watch the extra whitespace. There are a few in this patch. > + if (mas->offset) { > + mas->offset--; > + do { > + mas_descend(mas); > + mas->offset = mas_data_end(mas); > + } while (!mte_is_leaf(mas->node)); > + > + mas_ascend(mas); > + } > + > + node = mte_to_node(mas->node); > + type = mte_node_type(mas->node); > + slots = ma_slots(node, type); > + count = mas_data_end(mas) + 1; > + for (i = 0; i < count; i++) > + ((unsigned long *)slots)[i] &= ~MAPLE_NODE_MASK; > + > + mt_free_bulk(count, slots); > + } > + > + node = mte_to_node(mas->node); > + mt_free_one(node); > +} > + > +/* > + * mas_copy_node() - Copy a maple node and replace the parent. > + * @mas: The maple state of source tree. > + * @new_mas: The maple state of new tree. > + * @parent: The parent of the new node. > + * > + * Copy @mas->node to @new_mas->node, set @parent to be the parent of > + * @new_mas->node. If memory allocation fails, @mas is set to -ENOMEM. > + */ > +static inline void mas_copy_node(struct ma_state *mas, struct ma_state *new_mas, > + struct maple_pnode *parent) > +{ > + struct maple_node *node = mte_to_node(mas->node); > + struct maple_node *new_node = mte_to_node(new_mas->node); > + unsigned long val; > + > + /* Copy the node completely. */ > + memcpy(new_node, node, sizeof(struct maple_node)); > + > + /* Update the parent node pointer. */ > + val = (unsigned long)node->parent & MAPLE_NODE_MASK; > + new_node->parent = ma_parent_ptr(val | (unsigned long)parent); > +} > + > +/* > + * mas_dup_alloc() - Allocate child nodes for a maple node. > + * @mas: The maple state of source tree. > + * @new_mas: The maple state of new tree. > + * @gfp: The GFP_FLAGS to use for allocations. > + * > + * This function allocates child nodes for @new_mas->node during the duplication > + * process. If memory allocation fails, @mas is set to -ENOMEM. > + */ > +static inline void mas_dup_alloc(struct ma_state *mas, struct ma_state *new_mas, > + gfp_t gfp) > +{ > + struct maple_node *node = mte_to_node(mas->node); > + struct maple_node *new_node = mte_to_node(new_mas->node); > + enum maple_type type; > + unsigned char request, count, i; > + void __rcu **slots; > + void __rcu **new_slots; > + unsigned long val; > + > + /* Allocate memory for child nodes. */ > + type = mte_node_type(mas->node); > + new_slots = ma_slots(new_node, type); > + request = mas_data_end(mas) + 1; > + count = mt_alloc_bulk(gfp, request, (void **)new_slots); > + if (unlikely(count < request)) { > + if (count) > + mt_free_bulk(count, new_slots); We were dropping this mt_free_bulk() call as discussed in [1]. Did I miss something? > + > + memset(new_slots, 0, request * sizeof(void *)); > + mas_set_err(mas, -ENOMEM); > + return; > + } > + > + /* Restore node type information in slots. */ > + slots = ma_slots(node, type); > + for (i = 0; i < count; i++) { > + val = (unsigned long)mt_slot_locked(mas->tree, slots, i); > + val &= MAPLE_NODE_MASK; > + ((unsigned long *)new_slots)[i] |= val; > + } > +} > + > +/* > + * mas_dup_build() - Build a new maple tree from a source tree > + * @mas: The maple state of source tree, need to be in MAS_START state. > + * @new_mas: The maple state of new tree, need to be in MAS_START state. > + * @gfp: The GFP_FLAGS to use for allocations. > + * > + * This function builds a new tree in DFS preorder. If the memory allocation > + * fails, the error code -ENOMEM will be set in @mas, and @new_mas points to the > + * last node. mas_dup_free() will free the incomplete duplication of a tree. > + * > + * Note that the attributes of the two trees need to be exactly the same, and the > + * new tree needs to be empty, otherwise -EINVAL will be set in @mas. > + */ > +static inline void mas_dup_build(struct ma_state *mas, struct ma_state *new_mas, > + gfp_t gfp) > +{ > + struct maple_node *node; > + struct maple_pnode *parent = NULL; > + struct maple_enode *root; > + enum maple_type type; > + > + if (unlikely(mt_attr(mas->tree) != mt_attr(new_mas->tree)) || > + unlikely(!mtree_empty(new_mas->tree))) { > + mas_set_err(mas, -EINVAL); > + return; > + } > + > + mas_start(mas); > + if (mas_is_ptr(mas) || mas_is_none(mas)) { > + root = mt_root_locked(mas->tree); mas_start(mas) would return the root entry if it's a pointer and NULL if the tree is empty, so this can be written: root = mas_start(mas); if (mas_is_ptry() || mas_is_none() goto set_new_tree; > + goto set_new_tree; > + } > + > + node = mt_alloc_one(gfp); > + if (!node) { > + new_mas->node = MAS_NONE; > + mas_set_err(mas, -ENOMEM); > + return; > + } > + > + type = mte_node_type(mas->node); > + root = mt_mk_node(node, type); > + new_mas->node = root; > + new_mas->min = 0; > + new_mas->max = ULONG_MAX; > + root = mte_mk_root(root); > + > + while (1) { > + mas_copy_node(mas, new_mas, parent); > + > + if (!mte_is_leaf(mas->node)) { > + /* Only allocate child nodes for non-leaf nodes. */ > + mas_dup_alloc(mas, new_mas, gfp); > + if (unlikely(mas_is_err(mas))) > + return; > + } else { > + /* > + * This is the last leaf node and duplication is > + * completed. > + */ > + if (mas->max == ULONG_MAX) > + goto done; > + > + /* This is not the last leaf node and needs to go up. */ > + do { > + mas_ascend(mas); > + mas_ascend(new_mas); > + } while (mas->offset == mas_data_end(mas)); > + > + /* Move to the next subtree. */ > + mas->offset++; > + new_mas->offset++; > + } > + > + mas_descend(mas); > + parent = ma_parent_ptr(mte_to_node(new_mas->node)); > + mas_descend(new_mas); > + mas->offset = 0; > + new_mas->offset = 0; > + } > +done: > + /* Specially handle the parent of the root node. */ > + mte_to_node(root)->parent = ma_parent_ptr(mas_tree_parent(new_mas)); > +set_new_tree: > + /* Make them the same height */ > + new_mas->tree->ma_flags = mas->tree->ma_flags; > + rcu_assign_pointer(new_mas->tree->ma_root, root); > +} > + > +/** > + * __mt_dup(): Duplicate an entire maple tree > + * @mt: The source maple tree > + * @new: The new maple tree > + * @gfp: The GFP_FLAGS to use for allocations > + * > + * This function duplicates a maple tree in Depth-First Search (DFS) pre-order > + * traversal. It uses memcopy() to copy nodes in the source tree and allocate > + * new child nodes in non-leaf nodes. The new node is exactly the same as the > + * source node except for all the addresses stored in it. It will be faster than > + * traversing all elements in the source tree and inserting them one by one into > + * the new tree. > + * The user needs to ensure that the attributes of the source tree and the new > + * tree are the same, and the new tree needs to be an empty tree, otherwise > + * -EINVAL will be returned. > + * Note that the user needs to manually lock the source tree and the new tree. > + * > + * Return: 0 on success, -ENOMEM if memory could not be allocated, -EINVAL If > + * the attributes of the two trees are different or the new tree is not an empty > + * tree. > + */ > +int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp) > +{ > + int ret = 0; > + MA_STATE(mas, mt, 0, 0); > + MA_STATE(new_mas, new, 0, 0); > + > + mas_dup_build(&mas, &new_mas, gfp); > + > + if (unlikely(mas_is_err(&mas))) { > + ret = xa_err(mas.node); > + if (ret == -ENOMEM) > + mas_dup_free(&new_mas); > + } > + > + return ret; > +} > +EXPORT_SYMBOL(__mt_dup); > + > +/** > + * mtree_dup(): Duplicate an entire maple tree > + * @mt: The source maple tree > + * @new: The new maple tree > + * @gfp: The GFP_FLAGS to use for allocations > + * > + * This function duplicates a maple tree in Depth-First Search (DFS) pre-order > + * traversal. It uses memcopy() to copy nodes in the source tree and allocate > + * new child nodes in non-leaf nodes. The new node is exactly the same as the > + * source node except for all the addresses stored in it. It will be faster than > + * traversing all elements in the source tree and inserting them one by one into > + * the new tree. > + * The user needs to ensure that the attributes of the source tree and the new > + * tree are the same, and the new tree needs to be an empty tree, otherwise > + * -EINVAL will be returned. > + * > + * Return: 0 on success, -ENOMEM if memory could not be allocated, -EINVAL If > + * the attributes of the two trees are different or the new tree is not an empty > + * tree. > + */ > +int mtree_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp) > +{ > + int ret = 0; > + MA_STATE(mas, mt, 0, 0); > + MA_STATE(new_mas, new, 0, 0); > + > + mas_lock(&new_mas); > + mas_lock_nested(&mas, SINGLE_DEPTH_NESTING); > + > + mas_dup_build(&mas, &new_mas, gfp); > + mas_unlock(&mas); > + > + if (unlikely(mas_is_err(&mas))) { > + ret = xa_err(mas.node); > + if (ret == -ENOMEM) > + mas_dup_free(&new_mas); > + } > + > + mas_unlock(&new_mas); > + > + return ret; > +} > +EXPORT_SYMBOL(mtree_dup); > + > /** > * __mt_destroy() - Walk and free all nodes of a locked maple tree. > * @mt: The maple tree > -- > 2.20.1 > [1]. https://lore.kernel.org/lkml/20231004142500.gz2552r74aiphl4z@revolver/ Thanks, Liam
在 2023/10/17 21:57, Liam R. Howlett 写道: > * Peng Zhang <zhangpeng.00@bytedance.com> [231015 23:23]: >> Introduce interfaces __mt_dup() and mtree_dup(), which are used to >> duplicate a maple tree. They duplicate a maple tree in Depth-First >> Search (DFS) pre-order traversal. It uses memcopy() to copy nodes in the >> source tree and allocate new child nodes in non-leaf nodes. The new node >> is exactly the same as the source node except for all the addresses >> stored in it. It will be faster than traversing all elements in the >> source tree and inserting them one by one into the new tree. The time >> complexity of these two functions is O(n). >> >> The difference between __mt_dup() and mtree_dup() is that mtree_dup() >> handles locks internally. >> >> Analysis of the average time complexity of this algorithm: >> >> For simplicity, let's assume that the maximum branching factor of all >> non-leaf nodes is 16 (in allocation mode, it is 10), and the tree is a >> full tree. >> >> Under the given conditions, if there is a maple tree with n elements, >> the number of its leaves is n/16. From bottom to top, the number of >> nodes in each level is 1/16 of the number of nodes in the level below. >> So the total number of nodes in the entire tree is given by the sum of >> n/16 + n/16^2 + n/16^3 + ... + 1. This is a geometric series, and it has >> log(n) terms with base 16. According to the formula for the sum of a >> geometric series, the sum of this series can be calculated as (n-1)/15. >> Each node has only one parent node pointer, which can be considered as >> an edge. In total, there are (n-1)/15-1 edges. >> >> This algorithm consists of two operations: >> >> 1. Traversing all nodes in DFS order. >> 2. For each node, making a copy and performing necessary modifications >> to create a new node. >> >> For the first part, DFS traversal will visit each edge twice. Let >> T(ascend) represent the cost of taking one step downwards, and >> T(descend) represent the cost of taking one step upwards. And both of >> them are constants (although mas_ascend() may not be, as it contains a >> loop, but here we ignore it and treat it as a constant). So the time >> spent on the first part can be represented as >> ((n-1)/15-1) * (T(ascend) + T(descend)). >> >> For the second part, each node will be copied, and the cost of copying a >> node is denoted as T(copy_node). For each non-leaf node, it is necessary >> to reallocate all child nodes, and the cost of this operation is denoted >> as T(dup_alloc). The behavior behind memory allocation is complex and >> not specific to the maple tree operation. Here, we assume that the time >> required for a single allocation is constant. Since the size of a node >> is fixed, both of these symbols are also constants. We can calculate >> that the time spent on the second part is >> ((n-1)/15) * T(copy_node) + ((n-1)/15 - n/16) * T(dup_alloc). >> >> Adding both parts together, the total time spent by the algorithm can be >> represented as: >> >> ((n-1)/15) * (T(ascend) + T(descend) + T(copy_node) + T(dup_alloc)) - >> n/16 * T(dup_alloc) - (T(ascend) + T(descend)) >> >> Let C1 = T(ascend) + T(descend) + T(copy_node) + T(dup_alloc) >> Let C2 = T(dup_alloc) >> Let C3 = T(ascend) + T(descend) >> >> Finally, the expression can be simplified as: >> ((16 * C1 - 15 * C2) / (15 * 16)) * n - (C1 / 15 + C3). >> >> This is a linear function, so the average time complexity is O(n). >> >> Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> >> --- >> include/linux/maple_tree.h | 3 + >> lib/maple_tree.c | 290 +++++++++++++++++++++++++++++++++++++ >> 2 files changed, 293 insertions(+) >> >> diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h >> index f91dbc7fe091..a452dd8a1e5c 100644 >> --- a/include/linux/maple_tree.h >> +++ b/include/linux/maple_tree.h >> @@ -329,6 +329,9 @@ int mtree_store(struct maple_tree *mt, unsigned long index, >> void *entry, gfp_t gfp); >> void *mtree_erase(struct maple_tree *mt, unsigned long index); >> >> +int mtree_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp); >> +int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp); >> + >> void mtree_destroy(struct maple_tree *mt); >> void __mt_destroy(struct maple_tree *mt); >> >> diff --git a/lib/maple_tree.c b/lib/maple_tree.c >> index ca7039633844..6e0ad83f14e3 100644 >> --- a/lib/maple_tree.c >> +++ b/lib/maple_tree.c >> @@ -4,6 +4,10 @@ >> * Copyright (c) 2018-2022 Oracle Corporation >> * Authors: Liam R. Howlett <Liam.Howlett@oracle.com> >> * Matthew Wilcox <willy@infradead.org> >> + * >> + * Algorithm for duplicating Maple Tree >> + * Copyright (c) 2023 ByteDance >> + * Author: Peng Zhang <zhangpeng.00@bytedance.com> >> */ >> >> /* >> @@ -6475,6 +6479,292 @@ void *mtree_erase(struct maple_tree *mt, unsigned long index) >> } >> EXPORT_SYMBOL(mtree_erase); >> >> +/* >> + * mas_dup_free() - Free an incomplete duplication of a tree. >> + * @mas: The maple state of a incomplete tree. >> + * >> + * The parameter @mas->node passed in indicates that the allocation failed on >> + * this node. This function frees all nodes starting from @mas->node in the >> + * reverse order of mas_dup_build(). There is no need to hold the source tree >> + * lock at this time. >> + */ >> +static void mas_dup_free(struct ma_state *mas) >> +{ >> + struct maple_node *node; >> + enum maple_type type; >> + void __rcu **slots; >> + unsigned char count, i; >> + >> + /* Maybe the first node allocation failed. */ >> + if (mas_is_none(mas)) >> + return; >> + >> + while (!mte_is_root(mas->node)) { >> + mas_ascend(mas); >> + > > Please watch the extra whitespace. There are a few in this patch. Done in v6, thank you. > >> + if (mas->offset) { >> + mas->offset--; >> + do { >> + mas_descend(mas); >> + mas->offset = mas_data_end(mas); >> + } while (!mte_is_leaf(mas->node)); >> + >> + mas_ascend(mas); >> + } >> + >> + node = mte_to_node(mas->node); >> + type = mte_node_type(mas->node); >> + slots = ma_slots(node, type); >> + count = mas_data_end(mas) + 1; >> + for (i = 0; i < count; i++) >> + ((unsigned long *)slots)[i] &= ~MAPLE_NODE_MASK; >> + >> + mt_free_bulk(count, slots); >> + } >> + >> + node = mte_to_node(mas->node); >> + mt_free_one(node); >> +} >> + >> +/* >> + * mas_copy_node() - Copy a maple node and replace the parent. >> + * @mas: The maple state of source tree. >> + * @new_mas: The maple state of new tree. >> + * @parent: The parent of the new node. >> + * >> + * Copy @mas->node to @new_mas->node, set @parent to be the parent of >> + * @new_mas->node. If memory allocation fails, @mas is set to -ENOMEM. >> + */ >> +static inline void mas_copy_node(struct ma_state *mas, struct ma_state *new_mas, >> + struct maple_pnode *parent) >> +{ >> + struct maple_node *node = mte_to_node(mas->node); >> + struct maple_node *new_node = mte_to_node(new_mas->node); >> + unsigned long val; >> + >> + /* Copy the node completely. */ >> + memcpy(new_node, node, sizeof(struct maple_node)); >> + >> + /* Update the parent node pointer. */ >> + val = (unsigned long)node->parent & MAPLE_NODE_MASK; >> + new_node->parent = ma_parent_ptr(val | (unsigned long)parent); >> +} >> + >> +/* >> + * mas_dup_alloc() - Allocate child nodes for a maple node. >> + * @mas: The maple state of source tree. >> + * @new_mas: The maple state of new tree. >> + * @gfp: The GFP_FLAGS to use for allocations. >> + * >> + * This function allocates child nodes for @new_mas->node during the duplication >> + * process. If memory allocation fails, @mas is set to -ENOMEM. >> + */ >> +static inline void mas_dup_alloc(struct ma_state *mas, struct ma_state *new_mas, >> + gfp_t gfp) >> +{ >> + struct maple_node *node = mte_to_node(mas->node); >> + struct maple_node *new_node = mte_to_node(new_mas->node); >> + enum maple_type type; >> + unsigned char request, count, i; >> + void __rcu **slots; >> + void __rcu **new_slots; >> + unsigned long val; >> + >> + /* Allocate memory for child nodes. */ >> + type = mte_node_type(mas->node); >> + new_slots = ma_slots(new_node, type); >> + request = mas_data_end(mas) + 1; >> + count = mt_alloc_bulk(gfp, request, (void **)new_slots); >> + if (unlikely(count < request)) { >> + if (count) >> + mt_free_bulk(count, new_slots); > > We were dropping this mt_free_bulk() call as discussed in [1]. Did I > miss something? It seems that I misunderstood earlier, I thought it needed to be kept. It has been deleted in v6, thank you. > >> + >> + memset(new_slots, 0, request * sizeof(void *)); >> + mas_set_err(mas, -ENOMEM); >> + return; >> + } >> + >> + /* Restore node type information in slots. */ >> + slots = ma_slots(node, type); >> + for (i = 0; i < count; i++) { >> + val = (unsigned long)mt_slot_locked(mas->tree, slots, i); >> + val &= MAPLE_NODE_MASK; >> + ((unsigned long *)new_slots)[i] |= val; >> + } >> +} >> + >> +/* >> + * mas_dup_build() - Build a new maple tree from a source tree >> + * @mas: The maple state of source tree, need to be in MAS_START state. >> + * @new_mas: The maple state of new tree, need to be in MAS_START state. >> + * @gfp: The GFP_FLAGS to use for allocations. >> + * >> + * This function builds a new tree in DFS preorder. If the memory allocation >> + * fails, the error code -ENOMEM will be set in @mas, and @new_mas points to the >> + * last node. mas_dup_free() will free the incomplete duplication of a tree. >> + * >> + * Note that the attributes of the two trees need to be exactly the same, and the >> + * new tree needs to be empty, otherwise -EINVAL will be set in @mas. >> + */ >> +static inline void mas_dup_build(struct ma_state *mas, struct ma_state *new_mas, >> + gfp_t gfp) >> +{ >> + struct maple_node *node; >> + struct maple_pnode *parent = NULL; >> + struct maple_enode *root; >> + enum maple_type type; >> + >> + if (unlikely(mt_attr(mas->tree) != mt_attr(new_mas->tree)) || >> + unlikely(!mtree_empty(new_mas->tree))) { >> + mas_set_err(mas, -EINVAL); >> + return; >> + } >> + >> + mas_start(mas); >> + if (mas_is_ptr(mas) || mas_is_none(mas)) { >> + root = mt_root_locked(mas->tree); > > mas_start(mas) would return the root entry if it's a pointer and NULL if > the tree is empty, so this can be written: > root = mas_start(mas); > if (mas_is_ptry() || mas_is_none() > goto set_new_tree; Done in v6, thank you. > > >> + goto set_new_tree; >> + } >> + >> + node = mt_alloc_one(gfp); >> + if (!node) { >> + new_mas->node = MAS_NONE; >> + mas_set_err(mas, -ENOMEM); >> + return; >> + } >> + >> + type = mte_node_type(mas->node); >> + root = mt_mk_node(node, type); >> + new_mas->node = root; >> + new_mas->min = 0; >> + new_mas->max = ULONG_MAX; >> + root = mte_mk_root(root); >> + >> + while (1) { >> + mas_copy_node(mas, new_mas, parent); >> + >> + if (!mte_is_leaf(mas->node)) { >> + /* Only allocate child nodes for non-leaf nodes. */ >> + mas_dup_alloc(mas, new_mas, gfp); >> + if (unlikely(mas_is_err(mas))) >> + return; >> + } else { >> + /* >> + * This is the last leaf node and duplication is >> + * completed. >> + */ >> + if (mas->max == ULONG_MAX) >> + goto done; >> + >> + /* This is not the last leaf node and needs to go up. */ >> + do { >> + mas_ascend(mas); >> + mas_ascend(new_mas); >> + } while (mas->offset == mas_data_end(mas)); >> + >> + /* Move to the next subtree. */ >> + mas->offset++; >> + new_mas->offset++; >> + } >> + >> + mas_descend(mas); >> + parent = ma_parent_ptr(mte_to_node(new_mas->node)); >> + mas_descend(new_mas); >> + mas->offset = 0; >> + new_mas->offset = 0; >> + } >> +done: >> + /* Specially handle the parent of the root node. */ >> + mte_to_node(root)->parent = ma_parent_ptr(mas_tree_parent(new_mas)); >> +set_new_tree: >> + /* Make them the same height */ >> + new_mas->tree->ma_flags = mas->tree->ma_flags; >> + rcu_assign_pointer(new_mas->tree->ma_root, root); >> +} >> + >> +/** >> + * __mt_dup(): Duplicate an entire maple tree >> + * @mt: The source maple tree >> + * @new: The new maple tree >> + * @gfp: The GFP_FLAGS to use for allocations >> + * >> + * This function duplicates a maple tree in Depth-First Search (DFS) pre-order >> + * traversal. It uses memcopy() to copy nodes in the source tree and allocate >> + * new child nodes in non-leaf nodes. The new node is exactly the same as the >> + * source node except for all the addresses stored in it. It will be faster than >> + * traversing all elements in the source tree and inserting them one by one into >> + * the new tree. >> + * The user needs to ensure that the attributes of the source tree and the new >> + * tree are the same, and the new tree needs to be an empty tree, otherwise >> + * -EINVAL will be returned. >> + * Note that the user needs to manually lock the source tree and the new tree. >> + * >> + * Return: 0 on success, -ENOMEM if memory could not be allocated, -EINVAL If >> + * the attributes of the two trees are different or the new tree is not an empty >> + * tree. >> + */ >> +int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp) >> +{ >> + int ret = 0; >> + MA_STATE(mas, mt, 0, 0); >> + MA_STATE(new_mas, new, 0, 0); >> + >> + mas_dup_build(&mas, &new_mas, gfp); >> + >> + if (unlikely(mas_is_err(&mas))) { >> + ret = xa_err(mas.node); >> + if (ret == -ENOMEM) >> + mas_dup_free(&new_mas); >> + } >> + >> + return ret; >> +} >> +EXPORT_SYMBOL(__mt_dup); >> + >> +/** >> + * mtree_dup(): Duplicate an entire maple tree >> + * @mt: The source maple tree >> + * @new: The new maple tree >> + * @gfp: The GFP_FLAGS to use for allocations >> + * >> + * This function duplicates a maple tree in Depth-First Search (DFS) pre-order >> + * traversal. It uses memcopy() to copy nodes in the source tree and allocate >> + * new child nodes in non-leaf nodes. The new node is exactly the same as the >> + * source node except for all the addresses stored in it. It will be faster than >> + * traversing all elements in the source tree and inserting them one by one into >> + * the new tree. >> + * The user needs to ensure that the attributes of the source tree and the new >> + * tree are the same, and the new tree needs to be an empty tree, otherwise >> + * -EINVAL will be returned. >> + * >> + * Return: 0 on success, -ENOMEM if memory could not be allocated, -EINVAL If >> + * the attributes of the two trees are different or the new tree is not an empty >> + * tree. >> + */ >> +int mtree_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp) >> +{ >> + int ret = 0; >> + MA_STATE(mas, mt, 0, 0); >> + MA_STATE(new_mas, new, 0, 0); >> + >> + mas_lock(&new_mas); >> + mas_lock_nested(&mas, SINGLE_DEPTH_NESTING); >> + >> + mas_dup_build(&mas, &new_mas, gfp); >> + mas_unlock(&mas); >> + >> + if (unlikely(mas_is_err(&mas))) { >> + ret = xa_err(mas.node); >> + if (ret == -ENOMEM) >> + mas_dup_free(&new_mas); >> + } >> + >> + mas_unlock(&new_mas); >> + >> + return ret; >> +} >> +EXPORT_SYMBOL(mtree_dup); >> + >> /** >> * __mt_destroy() - Walk and free all nodes of a locked maple tree. >> * @mt: The maple tree >> -- >> 2.20.1 >> > > [1]. https://lore.kernel.org/lkml/20231004142500.gz2552r74aiphl4z@revolver/ > > Thanks, > Liam
diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h index f91dbc7fe091..a452dd8a1e5c 100644 --- a/include/linux/maple_tree.h +++ b/include/linux/maple_tree.h @@ -329,6 +329,9 @@ int mtree_store(struct maple_tree *mt, unsigned long index, void *entry, gfp_t gfp); void *mtree_erase(struct maple_tree *mt, unsigned long index); +int mtree_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp); +int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp); + void mtree_destroy(struct maple_tree *mt); void __mt_destroy(struct maple_tree *mt); diff --git a/lib/maple_tree.c b/lib/maple_tree.c index ca7039633844..6e0ad83f14e3 100644 --- a/lib/maple_tree.c +++ b/lib/maple_tree.c @@ -4,6 +4,10 @@ * Copyright (c) 2018-2022 Oracle Corporation * Authors: Liam R. Howlett <Liam.Howlett@oracle.com> * Matthew Wilcox <willy@infradead.org> + * + * Algorithm for duplicating Maple Tree + * Copyright (c) 2023 ByteDance + * Author: Peng Zhang <zhangpeng.00@bytedance.com> */ /* @@ -6475,6 +6479,292 @@ void *mtree_erase(struct maple_tree *mt, unsigned long index) } EXPORT_SYMBOL(mtree_erase); +/* + * mas_dup_free() - Free an incomplete duplication of a tree. + * @mas: The maple state of a incomplete tree. + * + * The parameter @mas->node passed in indicates that the allocation failed on + * this node. This function frees all nodes starting from @mas->node in the + * reverse order of mas_dup_build(). There is no need to hold the source tree + * lock at this time. + */ +static void mas_dup_free(struct ma_state *mas) +{ + struct maple_node *node; + enum maple_type type; + void __rcu **slots; + unsigned char count, i; + + /* Maybe the first node allocation failed. */ + if (mas_is_none(mas)) + return; + + while (!mte_is_root(mas->node)) { + mas_ascend(mas); + + if (mas->offset) { + mas->offset--; + do { + mas_descend(mas); + mas->offset = mas_data_end(mas); + } while (!mte_is_leaf(mas->node)); + + mas_ascend(mas); + } + + node = mte_to_node(mas->node); + type = mte_node_type(mas->node); + slots = ma_slots(node, type); + count = mas_data_end(mas) + 1; + for (i = 0; i < count; i++) + ((unsigned long *)slots)[i] &= ~MAPLE_NODE_MASK; + + mt_free_bulk(count, slots); + } + + node = mte_to_node(mas->node); + mt_free_one(node); +} + +/* + * mas_copy_node() - Copy a maple node and replace the parent. + * @mas: The maple state of source tree. + * @new_mas: The maple state of new tree. + * @parent: The parent of the new node. + * + * Copy @mas->node to @new_mas->node, set @parent to be the parent of + * @new_mas->node. If memory allocation fails, @mas is set to -ENOMEM. + */ +static inline void mas_copy_node(struct ma_state *mas, struct ma_state *new_mas, + struct maple_pnode *parent) +{ + struct maple_node *node = mte_to_node(mas->node); + struct maple_node *new_node = mte_to_node(new_mas->node); + unsigned long val; + + /* Copy the node completely. */ + memcpy(new_node, node, sizeof(struct maple_node)); + + /* Update the parent node pointer. */ + val = (unsigned long)node->parent & MAPLE_NODE_MASK; + new_node->parent = ma_parent_ptr(val | (unsigned long)parent); +} + +/* + * mas_dup_alloc() - Allocate child nodes for a maple node. + * @mas: The maple state of source tree. + * @new_mas: The maple state of new tree. + * @gfp: The GFP_FLAGS to use for allocations. + * + * This function allocates child nodes for @new_mas->node during the duplication + * process. If memory allocation fails, @mas is set to -ENOMEM. + */ +static inline void mas_dup_alloc(struct ma_state *mas, struct ma_state *new_mas, + gfp_t gfp) +{ + struct maple_node *node = mte_to_node(mas->node); + struct maple_node *new_node = mte_to_node(new_mas->node); + enum maple_type type; + unsigned char request, count, i; + void __rcu **slots; + void __rcu **new_slots; + unsigned long val; + + /* Allocate memory for child nodes. */ + type = mte_node_type(mas->node); + new_slots = ma_slots(new_node, type); + request = mas_data_end(mas) + 1; + count = mt_alloc_bulk(gfp, request, (void **)new_slots); + if (unlikely(count < request)) { + if (count) + mt_free_bulk(count, new_slots); + + memset(new_slots, 0, request * sizeof(void *)); + mas_set_err(mas, -ENOMEM); + return; + } + + /* Restore node type information in slots. */ + slots = ma_slots(node, type); + for (i = 0; i < count; i++) { + val = (unsigned long)mt_slot_locked(mas->tree, slots, i); + val &= MAPLE_NODE_MASK; + ((unsigned long *)new_slots)[i] |= val; + } +} + +/* + * mas_dup_build() - Build a new maple tree from a source tree + * @mas: The maple state of source tree, need to be in MAS_START state. + * @new_mas: The maple state of new tree, need to be in MAS_START state. + * @gfp: The GFP_FLAGS to use for allocations. + * + * This function builds a new tree in DFS preorder. If the memory allocation + * fails, the error code -ENOMEM will be set in @mas, and @new_mas points to the + * last node. mas_dup_free() will free the incomplete duplication of a tree. + * + * Note that the attributes of the two trees need to be exactly the same, and the + * new tree needs to be empty, otherwise -EINVAL will be set in @mas. + */ +static inline void mas_dup_build(struct ma_state *mas, struct ma_state *new_mas, + gfp_t gfp) +{ + struct maple_node *node; + struct maple_pnode *parent = NULL; + struct maple_enode *root; + enum maple_type type; + + if (unlikely(mt_attr(mas->tree) != mt_attr(new_mas->tree)) || + unlikely(!mtree_empty(new_mas->tree))) { + mas_set_err(mas, -EINVAL); + return; + } + + mas_start(mas); + if (mas_is_ptr(mas) || mas_is_none(mas)) { + root = mt_root_locked(mas->tree); + goto set_new_tree; + } + + node = mt_alloc_one(gfp); + if (!node) { + new_mas->node = MAS_NONE; + mas_set_err(mas, -ENOMEM); + return; + } + + type = mte_node_type(mas->node); + root = mt_mk_node(node, type); + new_mas->node = root; + new_mas->min = 0; + new_mas->max = ULONG_MAX; + root = mte_mk_root(root); + + while (1) { + mas_copy_node(mas, new_mas, parent); + + if (!mte_is_leaf(mas->node)) { + /* Only allocate child nodes for non-leaf nodes. */ + mas_dup_alloc(mas, new_mas, gfp); + if (unlikely(mas_is_err(mas))) + return; + } else { + /* + * This is the last leaf node and duplication is + * completed. + */ + if (mas->max == ULONG_MAX) + goto done; + + /* This is not the last leaf node and needs to go up. */ + do { + mas_ascend(mas); + mas_ascend(new_mas); + } while (mas->offset == mas_data_end(mas)); + + /* Move to the next subtree. */ + mas->offset++; + new_mas->offset++; + } + + mas_descend(mas); + parent = ma_parent_ptr(mte_to_node(new_mas->node)); + mas_descend(new_mas); + mas->offset = 0; + new_mas->offset = 0; + } +done: + /* Specially handle the parent of the root node. */ + mte_to_node(root)->parent = ma_parent_ptr(mas_tree_parent(new_mas)); +set_new_tree: + /* Make them the same height */ + new_mas->tree->ma_flags = mas->tree->ma_flags; + rcu_assign_pointer(new_mas->tree->ma_root, root); +} + +/** + * __mt_dup(): Duplicate an entire maple tree + * @mt: The source maple tree + * @new: The new maple tree + * @gfp: The GFP_FLAGS to use for allocations + * + * This function duplicates a maple tree in Depth-First Search (DFS) pre-order + * traversal. It uses memcopy() to copy nodes in the source tree and allocate + * new child nodes in non-leaf nodes. The new node is exactly the same as the + * source node except for all the addresses stored in it. It will be faster than + * traversing all elements in the source tree and inserting them one by one into + * the new tree. + * The user needs to ensure that the attributes of the source tree and the new + * tree are the same, and the new tree needs to be an empty tree, otherwise + * -EINVAL will be returned. + * Note that the user needs to manually lock the source tree and the new tree. + * + * Return: 0 on success, -ENOMEM if memory could not be allocated, -EINVAL If + * the attributes of the two trees are different or the new tree is not an empty + * tree. + */ +int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp) +{ + int ret = 0; + MA_STATE(mas, mt, 0, 0); + MA_STATE(new_mas, new, 0, 0); + + mas_dup_build(&mas, &new_mas, gfp); + + if (unlikely(mas_is_err(&mas))) { + ret = xa_err(mas.node); + if (ret == -ENOMEM) + mas_dup_free(&new_mas); + } + + return ret; +} +EXPORT_SYMBOL(__mt_dup); + +/** + * mtree_dup(): Duplicate an entire maple tree + * @mt: The source maple tree + * @new: The new maple tree + * @gfp: The GFP_FLAGS to use for allocations + * + * This function duplicates a maple tree in Depth-First Search (DFS) pre-order + * traversal. It uses memcopy() to copy nodes in the source tree and allocate + * new child nodes in non-leaf nodes. The new node is exactly the same as the + * source node except for all the addresses stored in it. It will be faster than + * traversing all elements in the source tree and inserting them one by one into + * the new tree. + * The user needs to ensure that the attributes of the source tree and the new + * tree are the same, and the new tree needs to be an empty tree, otherwise + * -EINVAL will be returned. + * + * Return: 0 on success, -ENOMEM if memory could not be allocated, -EINVAL If + * the attributes of the two trees are different or the new tree is not an empty + * tree. + */ +int mtree_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp) +{ + int ret = 0; + MA_STATE(mas, mt, 0, 0); + MA_STATE(new_mas, new, 0, 0); + + mas_lock(&new_mas); + mas_lock_nested(&mas, SINGLE_DEPTH_NESTING); + + mas_dup_build(&mas, &new_mas, gfp); + mas_unlock(&mas); + + if (unlikely(mas_is_err(&mas))) { + ret = xa_err(mas.node); + if (ret == -ENOMEM) + mas_dup_free(&new_mas); + } + + mas_unlock(&new_mas); + + return ret; +} +EXPORT_SYMBOL(mtree_dup); + /** * __mt_destroy() - Walk and free all nodes of a locked maple tree. * @mt: The maple tree
Introduce interfaces __mt_dup() and mtree_dup(), which are used to duplicate a maple tree. They duplicate a maple tree in Depth-First Search (DFS) pre-order traversal. It uses memcopy() to copy nodes in the source tree and allocate new child nodes in non-leaf nodes. The new node is exactly the same as the source node except for all the addresses stored in it. It will be faster than traversing all elements in the source tree and inserting them one by one into the new tree. The time complexity of these two functions is O(n). The difference between __mt_dup() and mtree_dup() is that mtree_dup() handles locks internally. Analysis of the average time complexity of this algorithm: For simplicity, let's assume that the maximum branching factor of all non-leaf nodes is 16 (in allocation mode, it is 10), and the tree is a full tree. Under the given conditions, if there is a maple tree with n elements, the number of its leaves is n/16. From bottom to top, the number of nodes in each level is 1/16 of the number of nodes in the level below. So the total number of nodes in the entire tree is given by the sum of n/16 + n/16^2 + n/16^3 + ... + 1. This is a geometric series, and it has log(n) terms with base 16. According to the formula for the sum of a geometric series, the sum of this series can be calculated as (n-1)/15. Each node has only one parent node pointer, which can be considered as an edge. In total, there are (n-1)/15-1 edges. This algorithm consists of two operations: 1. Traversing all nodes in DFS order. 2. For each node, making a copy and performing necessary modifications to create a new node. For the first part, DFS traversal will visit each edge twice. Let T(ascend) represent the cost of taking one step downwards, and T(descend) represent the cost of taking one step upwards. And both of them are constants (although mas_ascend() may not be, as it contains a loop, but here we ignore it and treat it as a constant). So the time spent on the first part can be represented as ((n-1)/15-1) * (T(ascend) + T(descend)). For the second part, each node will be copied, and the cost of copying a node is denoted as T(copy_node). For each non-leaf node, it is necessary to reallocate all child nodes, and the cost of this operation is denoted as T(dup_alloc). The behavior behind memory allocation is complex and not specific to the maple tree operation. Here, we assume that the time required for a single allocation is constant. Since the size of a node is fixed, both of these symbols are also constants. We can calculate that the time spent on the second part is ((n-1)/15) * T(copy_node) + ((n-1)/15 - n/16) * T(dup_alloc). Adding both parts together, the total time spent by the algorithm can be represented as: ((n-1)/15) * (T(ascend) + T(descend) + T(copy_node) + T(dup_alloc)) - n/16 * T(dup_alloc) - (T(ascend) + T(descend)) Let C1 = T(ascend) + T(descend) + T(copy_node) + T(dup_alloc) Let C2 = T(dup_alloc) Let C3 = T(ascend) + T(descend) Finally, the expression can be simplified as: ((16 * C1 - 15 * C2) / (15 * 16)) * n - (C1 / 15 + C3). This is a linear function, so the average time complexity is O(n). Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> --- include/linux/maple_tree.h | 3 + lib/maple_tree.c | 290 +++++++++++++++++++++++++++++++++++++ 2 files changed, 293 insertions(+)