Message ID | 20250211155034.268962-2-ziy@nvidia.com (mailing list archive)
---|---
State | New
Series | Buddy allocator like (or non-uniform) folio split
On 11 Feb 2025, at 10:50, Zi Yan wrote:

> This is a preparation patch for non-uniform folio split, which always
> splits a folio in half iteratively, and for minimal xarray entry split.
>
> Currently, xas_split_alloc() and xas_split() always split all slots from a
> multi-index entry. They cost the same number of xa_nodes as the to-be-split
> slots. For example, to split an order-9 entry, which takes 2^(9-6)=8
> slots, assuming XA_CHUNK_SHIFT is 6 (!CONFIG_BASE_SMALL), 8 xa_nodes are
> needed. Instead, xas_try_split() is intended to be used iteratively to split
> the order-9 entry into 2 order-8 entries, then split one order-8 entry,
> based on the given index, into 2 order-7 entries, ..., and split one order-1
> entry into 2 order-0 entries. When splitting the order-6 entry, where a new
> xa_node is needed, xas_try_split() will try to allocate one if possible.
> As a result, xas_try_split() would only need one xa_node instead of 8.
>
> When a new xa_node is needed during the split, xas_try_split() can try to
> allocate one but no more. -ENOMEM will be returned if a node cannot be
> allocated. -EINVAL will be returned if a sibling entry is split or a
> cascade split happens, where two or more new nodes are needed; these
> are not supported by xas_try_split().
>
> xas_split_alloc() and xas_split() split an order-9 to order-0:
>
> ---------------------------------
> |   |   |   |   |   |   |   |   |
> | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
> |   |   |   |   |   |   |   |   |
> ---------------------------------
>   |    |                 |    |
>   -------    ---   ---    -------
>         |    |  ...  |    |
>         V    V       V    V
> ----------- -----------     ----------- -----------
> | xa_node | | xa_node | ... | xa_node | | xa_node |
> ----------- -----------     ----------- -----------
>
> xas_try_split() splits an order-9 to order-0:
>
> ---------------------------------
> |   |   |   |   |   |   |   |   |
> | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
> |   |   |   |   |   |   |   |   |
> ---------------------------------
>   |
>   |
>   V
> -----------
> | xa_node |
> -----------
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
> Documentation/core-api/xarray.rst |  14 ++-
> include/linux/xarray.h            |   7 ++
> lib/test_xarray.c                 |  47 +++++++++++
> lib/xarray.c                      | 136 ++++++++++++++++++++++++++----
> tools/testing/radix-tree/Makefile |   1 +
> 5 files changed, 188 insertions(+), 17 deletions(-)

Hi Andrew,

Do you mind folding the diff below into this one? I changed the function
name but forgot the one in the xarray test. Thanks.

diff --git a/lib/test_xarray.c b/lib/test_xarray.c
index 598ca38a2f5b..cc2dd325158f 100644
--- a/lib/test_xarray.c
+++ b/lib/test_xarray.c
@@ -1868,7 +1868,7 @@ static void check_split_2(struct xarray *xa, unsigned long index,
 	xa_set_mark(xa, index, XA_MARK_1);

 	xas_lock(&xas);
-	xas_try_halve(&xas, xa, order, GFP_KERNEL);
+	xas_try_split(&xas, xa, order, GFP_KERNEL);
 	if (((new_order / XA_CHUNK_SHIFT) < (order / XA_CHUNK_SHIFT)) &&
 	    new_order < order - 1) {
 		XA_BUG_ON(xa, !xas_error(&xas) || xas_error(&xas) != -EINVAL);

Best Regards,
Yan, Zi
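To make the xa_node accounting above concrete, here is a small userspace sketch (not kernel code; it only assumes XA_CHUNK_SHIFT == 6, the !CONFIG_BASE_SMALL value) that reproduces the numbers in the commit message: 8 xa_nodes for the one-shot uniform split of an order-9 entry down to order-0, versus 1 for the iterative halving done by xas_try_split():

/*
 * Userspace sketch of the xa_node accounting in the commit message above.
 * Not kernel code; assumes XA_CHUNK_SHIFT == 6 (!CONFIG_BASE_SMALL).
 */
#include <stdio.h>

#define XA_CHUNK_SHIFT 6

int main(void)
{
	unsigned int order = 9;
	/* Slots the order-9 entry occupies in its node: 2^(9 % 6) = 8. */
	unsigned int slots = 1u << (order % XA_CHUNK_SHIFT);
	unsigned int nodes = 0;
	unsigned int o;

	/* xas_split_alloc() + xas_split(): one xa_node per to-be-split slot. */
	printf("uniform split to order-0:   %u xa_nodes\n", slots);

	/*
	 * xas_try_split() used iteratively (9 -> 8 -> ... -> 0): a new
	 * xa_node is only needed when a step crosses a chunk boundary,
	 * which here happens once, when the order-6 entry is split.
	 */
	for (o = order; o > 0; o--)
		if ((o - 1) / XA_CHUNK_SHIFT < o / XA_CHUNK_SHIFT)
			nodes++;
	printf("iterative split to order-0: %u xa_node\n", nodes);
	return 0;
}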
On 11 Feb 2025, at 19:57, Zi Yan wrote: > On 11 Feb 2025, at 10:50, Zi Yan wrote: > >> It is a preparation patch for non-uniform folio split, which always split >> a folio into half iteratively, and minimal xarray entry split. >> >> Currently, xas_split_alloc() and xas_split() always split all slots from a >> multi-index entry. They cost the same number of xa_node as the to-be-split >> slots. For example, to split an order-9 entry, which takes 2^(9-6)=8 >> slots, assuming XA_CHUNK_SHIFT is 6 (!CONFIG_BASE_SMALL), 8 xa_node are >> needed. Instead xas_try_split() is intended to be used iteratively to split >> the order-9 entry into 2 order-8 entries, then split one order-8 entry, >> based on the given index, to 2 order-7 entries, ..., and split one order-1 >> entry to 2 order-0 entries. When splitting the order-6 entry and a new >> xa_node is needed, xas_try_split() will try to allocate one if possible. >> As a result, xas_try_split() would only need one xa_node instead of 8. >> >> When a new xa_node is needed during the split, xas_try_split() can try to >> allocate one but no more. -ENOMEM will be return if a node cannot be >> allocated. -EINVAL will be return if a sibling node is split or >> cascade split happens, where two or more new nodes are needed, and these >> are not supported by xas_try_split(). >> >> xas_split_alloc() and xas_split() split an order-9 to order-0: >> >> --------------------------------- >> | | | | | | | | | >> | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | >> | | | | | | | | | >> --------------------------------- >> | | | | >> ------- --- --- ------- >> | | ... | | >> V V V V >> ----------- ----------- ----------- ----------- >> | xa_node | | xa_node | ... | xa_node | | xa_node | >> ----------- ----------- ----------- ----------- >> >> xas_try_split() splits an order-9 to order-0: >> --------------------------------- >> | | | | | | | | | >> | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | >> | | | | | | | | | >> --------------------------------- >> | >> | >> V >> ----------- >> | xa_node | >> ----------- >> >> Signed-off-by: Zi Yan <ziy@nvidia.com> >> --- >> Documentation/core-api/xarray.rst | 14 ++- >> include/linux/xarray.h | 7 ++ >> lib/test_xarray.c | 47 +++++++++++ >> lib/xarray.c | 136 ++++++++++++++++++++++++++---- >> tools/testing/radix-tree/Makefile | 1 + >> 5 files changed, 188 insertions(+), 17 deletions(-) > > Hi Andrew, > > Do you mind folding the diff below to this one? I changed the function > name but forgot the one in the xarray test. Thanks. From bdf3b10f2ebcd09898ba7a277ac7107c25b8c71b Mon Sep 17 00:00:00 2001 From: Zi Yan <ziy@nvidia.com> Date: Tue, 11 Feb 2025 20:48:55 -0500 Subject: [PATCH] correct the function name. Signed-off-by: Zi Yan <ziy@nvidia.com> --- lib/test_xarray.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/test_xarray.c b/lib/test_xarray.c index 598ca38a2f5b..cc2dd325158f 100644 --- a/lib/test_xarray.c +++ b/lib/test_xarray.c @@ -1868,7 +1868,7 @@ static void check_split_2(struct xarray *xa, unsigned long index, xa_set_mark(xa, index, XA_MARK_1); xas_lock(&xas); - xas_try_halve(&xas, xa, order, GFP_KERNEL); + xas_try_split(&xas, xa, order, GFP_KERNEL); if (((new_order / XA_CHUNK_SHIFT) < (order / XA_CHUNK_SHIFT)) && new_order < order - 1) { XA_BUG_ON(xa, !xas_error(&xas) || xas_error(&xas) != -EINVAL);
On 11.02.25 16:50, Zi Yan wrote: > It is a preparation patch for non-uniform folio split, which always split > a folio into half iteratively, and minimal xarray entry split. > > Currently, xas_split_alloc() and xas_split() always split all slots from a > multi-index entry. They cost the same number of xa_node as the to-be-split > slots. For example, to split an order-9 entry, which takes 2^(9-6)=8 > slots, assuming XA_CHUNK_SHIFT is 6 (!CONFIG_BASE_SMALL), 8 xa_node are > needed. Instead xas_try_split() is intended to be used iteratively to split > the order-9 entry into 2 order-8 entries, then split one order-8 entry, > based on the given index, to 2 order-7 entries, ..., and split one order-1 > entry to 2 order-0 entries. When splitting the order-6 entry and a new > xa_node is needed, xas_try_split() will try to allocate one if possible. > As a result, xas_try_split() would only need one xa_node instead of 8. > > When a new xa_node is needed during the split, xas_try_split() can try to > allocate one but no more. -ENOMEM will be return if a node cannot be > allocated. -EINVAL will be return if a sibling node is split or > cascade split happens, where two or more new nodes are needed, and these > are not supported by xas_try_split(). > > xas_split_alloc() and xas_split() split an order-9 to order-0: > > --------------------------------- > | | | | | | | | | > | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | > | | | | | | | | | > --------------------------------- > | | | | > ------- --- --- ------- > | | ... | | > V V V V > ----------- ----------- ----------- ----------- > | xa_node | | xa_node | ... | xa_node | | xa_node | > ----------- ----------- ----------- ----------- > > xas_try_split() splits an order-9 to order-0: > --------------------------------- > | | | | | | | | | > | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | > | | | | | | | | | > --------------------------------- > | > | > V > ----------- > | xa_node | > ----------- > > Signed-off-by: Zi Yan <ziy@nvidia.com> > --- > Documentation/core-api/xarray.rst | 14 ++- > include/linux/xarray.h | 7 ++ > lib/test_xarray.c | 47 +++++++++++ > lib/xarray.c | 136 ++++++++++++++++++++++++++---- > tools/testing/radix-tree/Makefile | 1 + > 5 files changed, 188 insertions(+), 17 deletions(-) > > diff --git a/Documentation/core-api/xarray.rst b/Documentation/core-api/xarray.rst > index f6a3eef4fe7f..c6c91cbd0c3c 100644 > --- a/Documentation/core-api/xarray.rst > +++ b/Documentation/core-api/xarray.rst > @@ -489,7 +489,19 @@ Storing ``NULL`` into any index of a multi-index entry will set the > entry at every index to ``NULL`` and dissolve the tie. A multi-index > entry can be split into entries occupying smaller ranges by calling > xas_split_alloc() without the xa_lock held, followed by taking the lock > -and calling xas_split(). > +and calling xas_split() or calling xas_try_split() with xa_lock. The > +difference between xas_split_alloc()+xas_split() and xas_try_alloc() is > +that xas_split_alloc() + xas_split() split the entry from the original > +order to the new order in one shot uniformly, whereas xas_try_split() > +iteratively splits the entry containing the index non-uniformly. > +For example, to split an order-9 entry, which takes 2^(9-6)=8 slots, > +assuming ``XA_CHUNK_SHIFT`` is 6, xas_split_alloc() + xas_split() need > +8 xa_node. xas_try_split() splits the order-9 entry into > +2 order-8 entries, then split one order-8 entry, based on the given index, > +to 2 order-7 entries, ..., and split one order-1 entry to 2 order-0 entries. 
> +When splitting the order-6 entry and a new xa_node is needed, xas_try_split() > +will try to allocate one if possible. As a result, xas_try_split() would only > +need 1 xa_node instead of 8. > > Functions and structures > ======================== > diff --git a/include/linux/xarray.h b/include/linux/xarray.h > index 0b618ec04115..9eb8c7425090 100644 > --- a/include/linux/xarray.h > +++ b/include/linux/xarray.h > @@ -1555,6 +1555,8 @@ int xa_get_order(struct xarray *, unsigned long index); > int xas_get_order(struct xa_state *xas); > void xas_split(struct xa_state *, void *entry, unsigned int order); > void xas_split_alloc(struct xa_state *, void *entry, unsigned int order, gfp_t); > +void xas_try_split(struct xa_state *xas, void *entry, unsigned int order, > + gfp_t gfp); > #else > static inline int xa_get_order(struct xarray *xa, unsigned long index) > { > @@ -1576,6 +1578,11 @@ static inline void xas_split_alloc(struct xa_state *xas, void *entry, > unsigned int order, gfp_t gfp) > { > } > + > +static inline void xas_try_split(struct xa_state *xas, void *entry, > + unsigned int order, gfp_t gfp) > +{ > +} > #endif > > /** > diff --git a/lib/test_xarray.c b/lib/test_xarray.c > index 6932a26f4927..598ca38a2f5b 100644 > --- a/lib/test_xarray.c > +++ b/lib/test_xarray.c > @@ -1857,6 +1857,49 @@ static void check_split_1(struct xarray *xa, unsigned long index, > xa_destroy(xa); > } > > +static void check_split_2(struct xarray *xa, unsigned long index, > + unsigned int order, unsigned int new_order) > +{ > + XA_STATE_ORDER(xas, xa, index, new_order); > + unsigned int i, found; > + void *entry; > + > + xa_store_order(xa, index, order, xa, GFP_KERNEL); > + xa_set_mark(xa, index, XA_MARK_1); > + > + xas_lock(&xas); > + xas_try_halve(&xas, xa, order, GFP_KERNEL); > + if (((new_order / XA_CHUNK_SHIFT) < (order / XA_CHUNK_SHIFT)) && > + new_order < order - 1) { > + XA_BUG_ON(xa, !xas_error(&xas) || xas_error(&xas) != -EINVAL); > + xas_unlock(&xas); > + goto out; > + } > + for (i = 0; i < (1 << order); i += (1 << new_order)) > + __xa_store(xa, index + i, xa_mk_index(index + i), 0); > + xas_unlock(&xas); > + > + for (i = 0; i < (1 << order); i++) { > + unsigned int val = index + (i & ~((1 << new_order) - 1)); > + XA_BUG_ON(xa, xa_load(xa, index + i) != xa_mk_index(val)); > + } > + > + xa_set_mark(xa, index, XA_MARK_0); > + XA_BUG_ON(xa, !xa_get_mark(xa, index, XA_MARK_0)); > + > + xas_set_order(&xas, index, 0); > + found = 0; > + rcu_read_lock(); > + xas_for_each_marked(&xas, entry, ULONG_MAX, XA_MARK_1) { > + found++; > + XA_BUG_ON(xa, xa_is_internal(entry)); > + } > + rcu_read_unlock(); > + XA_BUG_ON(xa, found != 1 << (order - new_order)); > +out: > + xa_destroy(xa); > +} > + > static noinline void check_split(struct xarray *xa) > { > unsigned int order, new_order; > @@ -1868,6 +1911,10 @@ static noinline void check_split(struct xarray *xa) > check_split_1(xa, 0, order, new_order); > check_split_1(xa, 1UL << order, order, new_order); > check_split_1(xa, 3UL << order, order, new_order); > + > + check_split_2(xa, 0, order, new_order); > + check_split_2(xa, 1UL << order, order, new_order); > + check_split_2(xa, 3UL << order, order, new_order); > } > } > } > diff --git a/lib/xarray.c b/lib/xarray.c > index 116e9286c64e..c38beca77830 100644 > --- a/lib/xarray.c > +++ b/lib/xarray.c > @@ -1007,6 +1007,31 @@ static void node_set_marks(struct xa_node *node, unsigned int offset, > } > } > > +static struct xa_node *__xas_alloc_node_for_split(struct xa_state *xas, > + void *entry, gfp_t gfp) > +{ > + 
unsigned int i; > + void *sibling = NULL; > + struct xa_node *node; > + unsigned int mask = xas->xa_sibs; > + > + node = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp); > + if (!node) > + return NULL; > + node->array = xas->xa; > + for (i = 0; i < XA_CHUNK_SIZE; i++) { > + if ((i & mask) == 0) { > + RCU_INIT_POINTER(node->slots[i], entry); > + sibling = xa_mk_sibling(i); > + } else { > + RCU_INIT_POINTER(node->slots[i], sibling); > + } > + } > + RCU_INIT_POINTER(node->parent, xas->xa_alloc); > + > + return node; > +} > + > /** > * xas_split_alloc() - Allocate memory for splitting an entry. > * @xas: XArray operation state. > @@ -1025,7 +1050,6 @@ void xas_split_alloc(struct xa_state *xas, void *entry, unsigned int order, > gfp_t gfp) > { > unsigned int sibs = (1 << (order % XA_CHUNK_SHIFT)) - 1; > - unsigned int mask = xas->xa_sibs; > > /* XXX: no support for splitting really large entries yet */ > if (WARN_ON(xas->xa_shift + 2 * XA_CHUNK_SHIFT <= order)) > @@ -1034,23 +1058,9 @@ void xas_split_alloc(struct xa_state *xas, void *entry, unsigned int order, > return; > > do { > - unsigned int i; > - void *sibling = NULL; > - struct xa_node *node; > - > - node = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp); > + struct xa_node *node = __xas_alloc_node_for_split(xas, entry, gfp); > if (!node) > goto nomem; > - node->array = xas->xa; > - for (i = 0; i < XA_CHUNK_SIZE; i++) { > - if ((i & mask) == 0) { > - RCU_INIT_POINTER(node->slots[i], entry); > - sibling = xa_mk_sibling(i); > - } else { > - RCU_INIT_POINTER(node->slots[i], sibling); > - } > - } > - RCU_INIT_POINTER(node->parent, xas->xa_alloc); > xas->xa_alloc = node; > } while (sibs-- > 0); > > @@ -1122,6 +1132,100 @@ void xas_split(struct xa_state *xas, void *entry, unsigned int order) > xas_update(xas, node); > } > EXPORT_SYMBOL_GPL(xas_split); > + > +/** > + * xas_try_split() - Try to split a multi-index entry. > + * @xas: XArray operation state. > + * @entry: New entry to store in the array. > + * @order: Current entry order. > + * @gfp: Memory allocation flags. > + * > + * The size of the new entries is set in @xas. The value in @entry is > + * copied to all the replacement entries. If and only if one xa_node needs to > + * be allocated, the function will use @gfp to get one. If more xa_node are > + * needed, the function gives EINVAL error. > + * > + * Context: Any context. The caller should hold the xa_lock. > + */ > +void xas_try_split(struct xa_state *xas, void *entry, unsigned int order, > + gfp_t gfp) > +{ > + unsigned int sibs = (1 << (order % XA_CHUNK_SHIFT)) - 1; > + unsigned int offset, marks; > + struct xa_node *node; > + void *curr = xas_load(xas); > + int values = 0; > + > + node = xas->xa_node; > + if (xas_top(node)) > + return; > + > + if (xas->xa->xa_flags & XA_FLAGS_ACCOUNT) > + gfp |= __GFP_ACCOUNT; > + > + marks = node_get_marks(node, xas->xa_offset); > + > + offset = xas->xa_offset + sibs; > + do { > + if (xas->xa_shift < node->shift) { > + struct xa_node *child = xas->xa_alloc; > + unsigned int expected_sibs = > + (1 << ((order - 1) % XA_CHUNK_SHIFT)) - 1; > + > + /* > + * No support for splitting sibling entries > + * (horizontally) or cascade split (vertically), which > + * requires two or more new xa_nodes. > + * Since if one xa_node allocation fails, > + * it is hard to free the prior allocations. 
> +			 */
> +			if (sibs || xas->xa_sibs != expected_sibs) {
> +				xas_destroy(xas);
> +				xas_set_err(xas, -EINVAL);
> +				return;
> +			}
> +
> +			if (!child) {
> +				child = __xas_alloc_node_for_split(xas, entry,
> +						gfp);
> +				if (!child) {
> +					xas_destroy(xas);
> +					xas_set_err(xas, -ENOMEM);
> +					return;
> +				}
> +			}

No expert on this, just wondering ...

... what is the effect if we fail the split halfway through? Is it okay to leave that "partially split" thing in place? Can callers deal with that?
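As an aside on the __xas_alloc_node_for_split() helper quoted above: it fills every slot of the newly allocated child node with either @entry or a sibling pointer back to the nearest preceding @entry slot. A userspace sketch of that slot layout (assuming XA_CHUNK_SHIFT == 6, so XA_CHUNK_SIZE == 64, and taking the order-6 to order-5 split as the example, where xa_sibs == 31):

#include <stdio.h>

#define XA_CHUNK_SHIFT 6
#define XA_CHUNK_SIZE (1u << XA_CHUNK_SHIFT)

int main(void)
{
	/* For a split into order-5 entries: xa_sibs = (1 << (5 % 6)) - 1 = 31. */
	unsigned int mask = (1u << (5 % XA_CHUNK_SHIFT)) - 1;
	unsigned int i;

	for (i = 0; i < XA_CHUNK_SIZE; i++) {
		if ((i & mask) == 0)
			printf("slot %2u: entry\n", i);
		else
			printf("slot %2u: sibling of slot %u\n", i, i & ~mask);
	}
	return 0;
}

Slots 0 and 32 hold @entry and every other slot holds a sibling pointer back to one of them, so the node describes exactly two order-5 ranges.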
On 17 Feb 2025, at 16:44, David Hildenbrand wrote: > On 11.02.25 16:50, Zi Yan wrote: >> It is a preparation patch for non-uniform folio split, which always split >> a folio into half iteratively, and minimal xarray entry split. >> >> Currently, xas_split_alloc() and xas_split() always split all slots from a >> multi-index entry. They cost the same number of xa_node as the to-be-split >> slots. For example, to split an order-9 entry, which takes 2^(9-6)=8 >> slots, assuming XA_CHUNK_SHIFT is 6 (!CONFIG_BASE_SMALL), 8 xa_node are >> needed. Instead xas_try_split() is intended to be used iteratively to split >> the order-9 entry into 2 order-8 entries, then split one order-8 entry, >> based on the given index, to 2 order-7 entries, ..., and split one order-1 >> entry to 2 order-0 entries. When splitting the order-6 entry and a new >> xa_node is needed, xas_try_split() will try to allocate one if possible. >> As a result, xas_try_split() would only need one xa_node instead of 8. >> >> When a new xa_node is needed during the split, xas_try_split() can try to >> allocate one but no more. -ENOMEM will be return if a node cannot be >> allocated. -EINVAL will be return if a sibling node is split or >> cascade split happens, where two or more new nodes are needed, and these >> are not supported by xas_try_split(). >> >> xas_split_alloc() and xas_split() split an order-9 to order-0: >> >> --------------------------------- >> | | | | | | | | | >> | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | >> | | | | | | | | | >> --------------------------------- >> | | | | >> ------- --- --- ------- >> | | ... | | >> V V V V >> ----------- ----------- ----------- ----------- >> | xa_node | | xa_node | ... | xa_node | | xa_node | >> ----------- ----------- ----------- ----------- >> >> xas_try_split() splits an order-9 to order-0: >> --------------------------------- >> | | | | | | | | | >> | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | >> | | | | | | | | | >> --------------------------------- >> | >> | >> V >> ----------- >> | xa_node | >> ----------- >> >> Signed-off-by: Zi Yan <ziy@nvidia.com> >> --- >> Documentation/core-api/xarray.rst | 14 ++- >> include/linux/xarray.h | 7 ++ >> lib/test_xarray.c | 47 +++++++++++ >> lib/xarray.c | 136 ++++++++++++++++++++++++++---- >> tools/testing/radix-tree/Makefile | 1 + >> 5 files changed, 188 insertions(+), 17 deletions(-) >> >> diff --git a/Documentation/core-api/xarray.rst b/Documentation/core-api/xarray.rst >> index f6a3eef4fe7f..c6c91cbd0c3c 100644 >> --- a/Documentation/core-api/xarray.rst >> +++ b/Documentation/core-api/xarray.rst >> @@ -489,7 +489,19 @@ Storing ``NULL`` into any index of a multi-index entry will set the >> entry at every index to ``NULL`` and dissolve the tie. A multi-index >> entry can be split into entries occupying smaller ranges by calling >> xas_split_alloc() without the xa_lock held, followed by taking the lock >> -and calling xas_split(). >> +and calling xas_split() or calling xas_try_split() with xa_lock. The >> +difference between xas_split_alloc()+xas_split() and xas_try_alloc() is >> +that xas_split_alloc() + xas_split() split the entry from the original >> +order to the new order in one shot uniformly, whereas xas_try_split() >> +iteratively splits the entry containing the index non-uniformly. >> +For example, to split an order-9 entry, which takes 2^(9-6)=8 slots, >> +assuming ``XA_CHUNK_SHIFT`` is 6, xas_split_alloc() + xas_split() need >> +8 xa_node. 
xas_try_split() splits the order-9 entry into >> +2 order-8 entries, then split one order-8 entry, based on the given index, >> +to 2 order-7 entries, ..., and split one order-1 entry to 2 order-0 entries. >> +When splitting the order-6 entry and a new xa_node is needed, xas_try_split() >> +will try to allocate one if possible. As a result, xas_try_split() would only >> +need 1 xa_node instead of 8. >> Functions and structures >> ======================== >> diff --git a/include/linux/xarray.h b/include/linux/xarray.h >> index 0b618ec04115..9eb8c7425090 100644 >> --- a/include/linux/xarray.h >> +++ b/include/linux/xarray.h >> @@ -1555,6 +1555,8 @@ int xa_get_order(struct xarray *, unsigned long index); >> int xas_get_order(struct xa_state *xas); >> void xas_split(struct xa_state *, void *entry, unsigned int order); >> void xas_split_alloc(struct xa_state *, void *entry, unsigned int order, gfp_t); >> +void xas_try_split(struct xa_state *xas, void *entry, unsigned int order, >> + gfp_t gfp); >> #else >> static inline int xa_get_order(struct xarray *xa, unsigned long index) >> { >> @@ -1576,6 +1578,11 @@ static inline void xas_split_alloc(struct xa_state *xas, void *entry, >> unsigned int order, gfp_t gfp) >> { >> } >> + >> +static inline void xas_try_split(struct xa_state *xas, void *entry, >> + unsigned int order, gfp_t gfp) >> +{ >> +} >> #endif >> /** >> diff --git a/lib/test_xarray.c b/lib/test_xarray.c >> index 6932a26f4927..598ca38a2f5b 100644 >> --- a/lib/test_xarray.c >> +++ b/lib/test_xarray.c >> @@ -1857,6 +1857,49 @@ static void check_split_1(struct xarray *xa, unsigned long index, >> xa_destroy(xa); >> } >> +static void check_split_2(struct xarray *xa, unsigned long index, >> + unsigned int order, unsigned int new_order) >> +{ >> + XA_STATE_ORDER(xas, xa, index, new_order); >> + unsigned int i, found; >> + void *entry; >> + >> + xa_store_order(xa, index, order, xa, GFP_KERNEL); >> + xa_set_mark(xa, index, XA_MARK_1); >> + >> + xas_lock(&xas); >> + xas_try_halve(&xas, xa, order, GFP_KERNEL); >> + if (((new_order / XA_CHUNK_SHIFT) < (order / XA_CHUNK_SHIFT)) && >> + new_order < order - 1) { >> + XA_BUG_ON(xa, !xas_error(&xas) || xas_error(&xas) != -EINVAL); >> + xas_unlock(&xas); >> + goto out; >> + } >> + for (i = 0; i < (1 << order); i += (1 << new_order)) >> + __xa_store(xa, index + i, xa_mk_index(index + i), 0); >> + xas_unlock(&xas); >> + >> + for (i = 0; i < (1 << order); i++) { >> + unsigned int val = index + (i & ~((1 << new_order) - 1)); >> + XA_BUG_ON(xa, xa_load(xa, index + i) != xa_mk_index(val)); >> + } >> + >> + xa_set_mark(xa, index, XA_MARK_0); >> + XA_BUG_ON(xa, !xa_get_mark(xa, index, XA_MARK_0)); >> + >> + xas_set_order(&xas, index, 0); >> + found = 0; >> + rcu_read_lock(); >> + xas_for_each_marked(&xas, entry, ULONG_MAX, XA_MARK_1) { >> + found++; >> + XA_BUG_ON(xa, xa_is_internal(entry)); >> + } >> + rcu_read_unlock(); >> + XA_BUG_ON(xa, found != 1 << (order - new_order)); >> +out: >> + xa_destroy(xa); >> +} >> + >> static noinline void check_split(struct xarray *xa) >> { >> unsigned int order, new_order; >> @@ -1868,6 +1911,10 @@ static noinline void check_split(struct xarray *xa) >> check_split_1(xa, 0, order, new_order); >> check_split_1(xa, 1UL << order, order, new_order); >> check_split_1(xa, 3UL << order, order, new_order); >> + >> + check_split_2(xa, 0, order, new_order); >> + check_split_2(xa, 1UL << order, order, new_order); >> + check_split_2(xa, 3UL << order, order, new_order); >> } >> } >> } >> diff --git a/lib/xarray.c b/lib/xarray.c >> index 
116e9286c64e..c38beca77830 100644 >> --- a/lib/xarray.c >> +++ b/lib/xarray.c >> @@ -1007,6 +1007,31 @@ static void node_set_marks(struct xa_node *node, unsigned int offset, >> } >> } >> +static struct xa_node *__xas_alloc_node_for_split(struct xa_state *xas, >> + void *entry, gfp_t gfp) >> +{ >> + unsigned int i; >> + void *sibling = NULL; >> + struct xa_node *node; >> + unsigned int mask = xas->xa_sibs; >> + >> + node = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp); >> + if (!node) >> + return NULL; >> + node->array = xas->xa; >> + for (i = 0; i < XA_CHUNK_SIZE; i++) { >> + if ((i & mask) == 0) { >> + RCU_INIT_POINTER(node->slots[i], entry); >> + sibling = xa_mk_sibling(i); >> + } else { >> + RCU_INIT_POINTER(node->slots[i], sibling); >> + } >> + } >> + RCU_INIT_POINTER(node->parent, xas->xa_alloc); >> + >> + return node; >> +} >> + >> /** >> * xas_split_alloc() - Allocate memory for splitting an entry. >> * @xas: XArray operation state. >> @@ -1025,7 +1050,6 @@ void xas_split_alloc(struct xa_state *xas, void *entry, unsigned int order, >> gfp_t gfp) >> { >> unsigned int sibs = (1 << (order % XA_CHUNK_SHIFT)) - 1; >> - unsigned int mask = xas->xa_sibs; >> /* XXX: no support for splitting really large entries yet */ >> if (WARN_ON(xas->xa_shift + 2 * XA_CHUNK_SHIFT <= order)) >> @@ -1034,23 +1058,9 @@ void xas_split_alloc(struct xa_state *xas, void *entry, unsigned int order, >> return; >> do { >> - unsigned int i; >> - void *sibling = NULL; >> - struct xa_node *node; >> - >> - node = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp); >> + struct xa_node *node = __xas_alloc_node_for_split(xas, entry, gfp); >> if (!node) >> goto nomem; >> - node->array = xas->xa; >> - for (i = 0; i < XA_CHUNK_SIZE; i++) { >> - if ((i & mask) == 0) { >> - RCU_INIT_POINTER(node->slots[i], entry); >> - sibling = xa_mk_sibling(i); >> - } else { >> - RCU_INIT_POINTER(node->slots[i], sibling); >> - } >> - } >> - RCU_INIT_POINTER(node->parent, xas->xa_alloc); >> xas->xa_alloc = node; >> } while (sibs-- > 0); >> @@ -1122,6 +1132,100 @@ void xas_split(struct xa_state *xas, void *entry, unsigned int order) >> xas_update(xas, node); >> } >> EXPORT_SYMBOL_GPL(xas_split); >> + >> +/** >> + * xas_try_split() - Try to split a multi-index entry. >> + * @xas: XArray operation state. >> + * @entry: New entry to store in the array. >> + * @order: Current entry order. >> + * @gfp: Memory allocation flags. >> + * >> + * The size of the new entries is set in @xas. The value in @entry is >> + * copied to all the replacement entries. If and only if one xa_node needs to >> + * be allocated, the function will use @gfp to get one. If more xa_node are >> + * needed, the function gives EINVAL error. >> + * >> + * Context: Any context. The caller should hold the xa_lock. 
>> + */
>> +void xas_try_split(struct xa_state *xas, void *entry, unsigned int order,
>> +		gfp_t gfp)
>> +{
>> +	unsigned int sibs = (1 << (order % XA_CHUNK_SHIFT)) - 1;
>> +	unsigned int offset, marks;
>> +	struct xa_node *node;
>> +	void *curr = xas_load(xas);
>> +	int values = 0;
>> +
>> +	node = xas->xa_node;
>> +	if (xas_top(node))
>> +		return;
>> +
>> +	if (xas->xa->xa_flags & XA_FLAGS_ACCOUNT)
>> +		gfp |= __GFP_ACCOUNT;
>> +
>> +	marks = node_get_marks(node, xas->xa_offset);
>> +
>> +	offset = xas->xa_offset + sibs;
>> +	do {
>> +		if (xas->xa_shift < node->shift) {
>> +			struct xa_node *child = xas->xa_alloc;
>> +			unsigned int expected_sibs =
>> +				(1 << ((order - 1) % XA_CHUNK_SHIFT)) - 1;
>> +
>> +			/*
>> +			 * No support for splitting sibling entries
>> +			 * (horizontally) or cascade split (vertically), which
>> +			 * requires two or more new xa_nodes.
>> +			 * Since if one xa_node allocation fails,
>> +			 * it is hard to free the prior allocations.
>> +			 */
>> +			if (sibs || xas->xa_sibs != expected_sibs) {
>> +				xas_destroy(xas);
>> +				xas_set_err(xas, -EINVAL);
>> +				return;
>> +			}
>> +
>> +			if (!child) {
>> +				child = __xas_alloc_node_for_split(xas, entry,
>> +						gfp);
>> +				if (!child) {
>> +					xas_destroy(xas);
>> +					xas_set_err(xas, -ENOMEM);
>> +					return;
>> +				}
>> +			}
>
> No expert on this, just wondering ...
>
> ... what is the effect if we fail the split halfway through? Is it okay to leave that "partially split" thing in place? Can callers deal with that?

Good question.

xas_try_split() constrains what kind of split it performs and is usually used
to split from order N to order N-1:

1. when N is a multiple of XA_CHUNK_SHIFT, a new xa_node is needed, so
either child (namely xas->xa_alloc) is not NULL, meaning someone called
xa_nomem() to allocate an xa_node before xas_try_split(), or child is NULL
and an allocation is needed. If child is still NULL after the allocation,
meaning we are out of memory, no split is done;

2. when N is not, no new xa_node is needed; xas_try_split() just rewrites
existing slot values to perform the split (the code after the hunk above).
No failure can happen here. For this split, since no new xa_node is needed,
the caller is actually allowed to split from N to a value smaller than
N-1, as long as the new order is >= (N - N % XA_CHUNK_SHIFT).

Various checks make sure xas_try_split() only sees the two situations above:

a. "xas->xa_shift < node->shift" means the split crosses XA_CHUNK_SHIFT, so
at least 1 new xa_node is needed; the else branch only handles case 2 above;

b. for the then branch, the "if (sibs || xas->xa_sibs != expected_sibs)"
check makes sure N is a multiple of XA_CHUNK_SHIFT and the new order has to
be N-1. In "if (sibs || xas->xa_sibs != expected_sibs)", "sibs != 0" means
the source order-N entry covers more than one slot, so more than 1 new
xa_node would be needed, and "xas->xa_sibs != expected_sibs" makes sure the
new order is N-1 (you can see it from how expected_sibs is assigned).

Let me know if you have any other questions.

--
Best Regards,
Yan, Zi
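The case analysis above can be condensed into a compilable userspace sketch (assumptions: XA_CHUNK_SHIFT == 6; the sibs/xa_sibs/expected_sibs arithmetic is taken from the quoted patch; the chunk-crossing test mirrors the new_order / XA_CHUNK_SHIFT < order / XA_CHUNK_SHIFT form used in check_split_2() rather than the in-kernel "xas->xa_shift < node->shift" form):

#include <stdio.h>

#define XA_CHUNK_SHIFT 6

static const char *classify(unsigned int order, unsigned int new_order)
{
	unsigned int sibs = (1u << (order % XA_CHUNK_SHIFT)) - 1;
	unsigned int xa_sibs = (1u << (new_order % XA_CHUNK_SHIFT)) - 1;
	unsigned int expected_sibs = (1u << ((order - 1) % XA_CHUNK_SHIFT)) - 1;

	if (new_order / XA_CHUNK_SHIFT < order / XA_CHUNK_SHIFT) {
		/* Crosses a chunk boundary: case 1, one new xa_node. */
		if (sibs || xa_sibs != expected_sibs)
			return "-EINVAL (would need 2+ new xa_nodes)";
		return "case 1: allocate exactly one xa_node";
	}
	/* Case 2: only slot rewrites within the existing node. */
	return "case 2: no allocation, cannot fail";
}

int main(void)
{
	printf("9 -> 8: %s\n", classify(9, 8));
	printf("6 -> 5: %s\n", classify(6, 5));
	printf("7 -> 5: %s\n", classify(7, 5));
	return 0;
}

So 9 -> 8 is a pure slot rewrite, 6 -> 5 is the single-allocation case, and 7 -> 5 is rejected because it would need more than one new node.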
On 17.02.25 23:05, Zi Yan wrote: > On 17 Feb 2025, at 16:44, David Hildenbrand wrote: > >> On 11.02.25 16:50, Zi Yan wrote: >>> It is a preparation patch for non-uniform folio split, which always split >>> a folio into half iteratively, and minimal xarray entry split. >>> >>> Currently, xas_split_alloc() and xas_split() always split all slots from a >>> multi-index entry. They cost the same number of xa_node as the to-be-split >>> slots. For example, to split an order-9 entry, which takes 2^(9-6)=8 >>> slots, assuming XA_CHUNK_SHIFT is 6 (!CONFIG_BASE_SMALL), 8 xa_node are >>> needed. Instead xas_try_split() is intended to be used iteratively to split >>> the order-9 entry into 2 order-8 entries, then split one order-8 entry, >>> based on the given index, to 2 order-7 entries, ..., and split one order-1 >>> entry to 2 order-0 entries. When splitting the order-6 entry and a new >>> xa_node is needed, xas_try_split() will try to allocate one if possible. >>> As a result, xas_try_split() would only need one xa_node instead of 8. >>> >>> When a new xa_node is needed during the split, xas_try_split() can try to >>> allocate one but no more. -ENOMEM will be return if a node cannot be >>> allocated. -EINVAL will be return if a sibling node is split or >>> cascade split happens, where two or more new nodes are needed, and these >>> are not supported by xas_try_split(). >>> >>> xas_split_alloc() and xas_split() split an order-9 to order-0: >>> >>> --------------------------------- >>> | | | | | | | | | >>> | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | >>> | | | | | | | | | >>> --------------------------------- >>> | | | | >>> ------- --- --- ------- >>> | | ... | | >>> V V V V >>> ----------- ----------- ----------- ----------- >>> | xa_node | | xa_node | ... | xa_node | | xa_node | >>> ----------- ----------- ----------- ----------- >>> >>> xas_try_split() splits an order-9 to order-0: >>> --------------------------------- >>> | | | | | | | | | >>> | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | >>> | | | | | | | | | >>> --------------------------------- >>> | >>> | >>> V >>> ----------- >>> | xa_node | >>> ----------- >>> >>> Signed-off-by: Zi Yan <ziy@nvidia.com> >>> --- >>> Documentation/core-api/xarray.rst | 14 ++- >>> include/linux/xarray.h | 7 ++ >>> lib/test_xarray.c | 47 +++++++++++ >>> lib/xarray.c | 136 ++++++++++++++++++++++++++---- >>> tools/testing/radix-tree/Makefile | 1 + >>> 5 files changed, 188 insertions(+), 17 deletions(-) >>> >>> diff --git a/Documentation/core-api/xarray.rst b/Documentation/core-api/xarray.rst >>> index f6a3eef4fe7f..c6c91cbd0c3c 100644 >>> --- a/Documentation/core-api/xarray.rst >>> +++ b/Documentation/core-api/xarray.rst >>> @@ -489,7 +489,19 @@ Storing ``NULL`` into any index of a multi-index entry will set the >>> entry at every index to ``NULL`` and dissolve the tie. A multi-index >>> entry can be split into entries occupying smaller ranges by calling >>> xas_split_alloc() without the xa_lock held, followed by taking the lock >>> -and calling xas_split(). >>> +and calling xas_split() or calling xas_try_split() with xa_lock. The >>> +difference between xas_split_alloc()+xas_split() and xas_try_alloc() is >>> +that xas_split_alloc() + xas_split() split the entry from the original >>> +order to the new order in one shot uniformly, whereas xas_try_split() >>> +iteratively splits the entry containing the index non-uniformly. 
>>> +For example, to split an order-9 entry, which takes 2^(9-6)=8 slots, >>> +assuming ``XA_CHUNK_SHIFT`` is 6, xas_split_alloc() + xas_split() need >>> +8 xa_node. xas_try_split() splits the order-9 entry into >>> +2 order-8 entries, then split one order-8 entry, based on the given index, >>> +to 2 order-7 entries, ..., and split one order-1 entry to 2 order-0 entries. >>> +When splitting the order-6 entry and a new xa_node is needed, xas_try_split() >>> +will try to allocate one if possible. As a result, xas_try_split() would only >>> +need 1 xa_node instead of 8. >>> Functions and structures >>> ======================== >>> diff --git a/include/linux/xarray.h b/include/linux/xarray.h >>> index 0b618ec04115..9eb8c7425090 100644 >>> --- a/include/linux/xarray.h >>> +++ b/include/linux/xarray.h >>> @@ -1555,6 +1555,8 @@ int xa_get_order(struct xarray *, unsigned long index); >>> int xas_get_order(struct xa_state *xas); >>> void xas_split(struct xa_state *, void *entry, unsigned int order); >>> void xas_split_alloc(struct xa_state *, void *entry, unsigned int order, gfp_t); >>> +void xas_try_split(struct xa_state *xas, void *entry, unsigned int order, >>> + gfp_t gfp); >>> #else >>> static inline int xa_get_order(struct xarray *xa, unsigned long index) >>> { >>> @@ -1576,6 +1578,11 @@ static inline void xas_split_alloc(struct xa_state *xas, void *entry, >>> unsigned int order, gfp_t gfp) >>> { >>> } >>> + >>> +static inline void xas_try_split(struct xa_state *xas, void *entry, >>> + unsigned int order, gfp_t gfp) >>> +{ >>> +} >>> #endif >>> /** >>> diff --git a/lib/test_xarray.c b/lib/test_xarray.c >>> index 6932a26f4927..598ca38a2f5b 100644 >>> --- a/lib/test_xarray.c >>> +++ b/lib/test_xarray.c >>> @@ -1857,6 +1857,49 @@ static void check_split_1(struct xarray *xa, unsigned long index, >>> xa_destroy(xa); >>> } >>> +static void check_split_2(struct xarray *xa, unsigned long index, >>> + unsigned int order, unsigned int new_order) >>> +{ >>> + XA_STATE_ORDER(xas, xa, index, new_order); >>> + unsigned int i, found; >>> + void *entry; >>> + >>> + xa_store_order(xa, index, order, xa, GFP_KERNEL); >>> + xa_set_mark(xa, index, XA_MARK_1); >>> + >>> + xas_lock(&xas); >>> + xas_try_halve(&xas, xa, order, GFP_KERNEL); >>> + if (((new_order / XA_CHUNK_SHIFT) < (order / XA_CHUNK_SHIFT)) && >>> + new_order < order - 1) { >>> + XA_BUG_ON(xa, !xas_error(&xas) || xas_error(&xas) != -EINVAL); >>> + xas_unlock(&xas); >>> + goto out; >>> + } >>> + for (i = 0; i < (1 << order); i += (1 << new_order)) >>> + __xa_store(xa, index + i, xa_mk_index(index + i), 0); >>> + xas_unlock(&xas); >>> + >>> + for (i = 0; i < (1 << order); i++) { >>> + unsigned int val = index + (i & ~((1 << new_order) - 1)); >>> + XA_BUG_ON(xa, xa_load(xa, index + i) != xa_mk_index(val)); >>> + } >>> + >>> + xa_set_mark(xa, index, XA_MARK_0); >>> + XA_BUG_ON(xa, !xa_get_mark(xa, index, XA_MARK_0)); >>> + >>> + xas_set_order(&xas, index, 0); >>> + found = 0; >>> + rcu_read_lock(); >>> + xas_for_each_marked(&xas, entry, ULONG_MAX, XA_MARK_1) { >>> + found++; >>> + XA_BUG_ON(xa, xa_is_internal(entry)); >>> + } >>> + rcu_read_unlock(); >>> + XA_BUG_ON(xa, found != 1 << (order - new_order)); >>> +out: >>> + xa_destroy(xa); >>> +} >>> + >>> static noinline void check_split(struct xarray *xa) >>> { >>> unsigned int order, new_order; >>> @@ -1868,6 +1911,10 @@ static noinline void check_split(struct xarray *xa) >>> check_split_1(xa, 0, order, new_order); >>> check_split_1(xa, 1UL << order, order, new_order); >>> check_split_1(xa, 3UL << order, 
order, new_order); >>> + >>> + check_split_2(xa, 0, order, new_order); >>> + check_split_2(xa, 1UL << order, order, new_order); >>> + check_split_2(xa, 3UL << order, order, new_order); >>> } >>> } >>> } >>> diff --git a/lib/xarray.c b/lib/xarray.c >>> index 116e9286c64e..c38beca77830 100644 >>> --- a/lib/xarray.c >>> +++ b/lib/xarray.c >>> @@ -1007,6 +1007,31 @@ static void node_set_marks(struct xa_node *node, unsigned int offset, >>> } >>> } >>> +static struct xa_node *__xas_alloc_node_for_split(struct xa_state *xas, >>> + void *entry, gfp_t gfp) >>> +{ >>> + unsigned int i; >>> + void *sibling = NULL; >>> + struct xa_node *node; >>> + unsigned int mask = xas->xa_sibs; >>> + >>> + node = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp); >>> + if (!node) >>> + return NULL; >>> + node->array = xas->xa; >>> + for (i = 0; i < XA_CHUNK_SIZE; i++) { >>> + if ((i & mask) == 0) { >>> + RCU_INIT_POINTER(node->slots[i], entry); >>> + sibling = xa_mk_sibling(i); >>> + } else { >>> + RCU_INIT_POINTER(node->slots[i], sibling); >>> + } >>> + } >>> + RCU_INIT_POINTER(node->parent, xas->xa_alloc); >>> + >>> + return node; >>> +} >>> + >>> /** >>> * xas_split_alloc() - Allocate memory for splitting an entry. >>> * @xas: XArray operation state. >>> @@ -1025,7 +1050,6 @@ void xas_split_alloc(struct xa_state *xas, void *entry, unsigned int order, >>> gfp_t gfp) >>> { >>> unsigned int sibs = (1 << (order % XA_CHUNK_SHIFT)) - 1; >>> - unsigned int mask = xas->xa_sibs; >>> /* XXX: no support for splitting really large entries yet */ >>> if (WARN_ON(xas->xa_shift + 2 * XA_CHUNK_SHIFT <= order)) >>> @@ -1034,23 +1058,9 @@ void xas_split_alloc(struct xa_state *xas, void *entry, unsigned int order, >>> return; >>> do { >>> - unsigned int i; >>> - void *sibling = NULL; >>> - struct xa_node *node; >>> - >>> - node = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp); >>> + struct xa_node *node = __xas_alloc_node_for_split(xas, entry, gfp); >>> if (!node) >>> goto nomem; >>> - node->array = xas->xa; >>> - for (i = 0; i < XA_CHUNK_SIZE; i++) { >>> - if ((i & mask) == 0) { >>> - RCU_INIT_POINTER(node->slots[i], entry); >>> - sibling = xa_mk_sibling(i); >>> - } else { >>> - RCU_INIT_POINTER(node->slots[i], sibling); >>> - } >>> - } >>> - RCU_INIT_POINTER(node->parent, xas->xa_alloc); >>> xas->xa_alloc = node; >>> } while (sibs-- > 0); >>> @@ -1122,6 +1132,100 @@ void xas_split(struct xa_state *xas, void *entry, unsigned int order) >>> xas_update(xas, node); >>> } >>> EXPORT_SYMBOL_GPL(xas_split); >>> + >>> +/** >>> + * xas_try_split() - Try to split a multi-index entry. >>> + * @xas: XArray operation state. >>> + * @entry: New entry to store in the array. >>> + * @order: Current entry order. >>> + * @gfp: Memory allocation flags. >>> + * >>> + * The size of the new entries is set in @xas. The value in @entry is >>> + * copied to all the replacement entries. If and only if one xa_node needs to >>> + * be allocated, the function will use @gfp to get one. If more xa_node are >>> + * needed, the function gives EINVAL error. >>> + * >>> + * Context: Any context. The caller should hold the xa_lock. 
>>> + */ >>> +void xas_try_split(struct xa_state *xas, void *entry, unsigned int order, >>> + gfp_t gfp) >>> +{ >>> + unsigned int sibs = (1 << (order % XA_CHUNK_SHIFT)) - 1; >>> + unsigned int offset, marks; >>> + struct xa_node *node; >>> + void *curr = xas_load(xas); >>> + int values = 0; >>> + >>> + node = xas->xa_node; >>> + if (xas_top(node)) >>> + return; >>> + >>> + if (xas->xa->xa_flags & XA_FLAGS_ACCOUNT) >>> + gfp |= __GFP_ACCOUNT; >>> + >>> + marks = node_get_marks(node, xas->xa_offset); >>> + >>> + offset = xas->xa_offset + sibs; >>> + do { >>> + if (xas->xa_shift < node->shift) { >>> + struct xa_node *child = xas->xa_alloc; >>> + unsigned int expected_sibs = >>> + (1 << ((order - 1) % XA_CHUNK_SHIFT)) - 1; >>> + >>> + /* >>> + * No support for splitting sibling entries >>> + * (horizontally) or cascade split (vertically), which >>> + * requires two or more new xa_nodes. >>> + * Since if one xa_node allocation fails, >>> + * it is hard to free the prior allocations. >>> + */ >>> + if (sibs || xas->xa_sibs != expected_sibs) { >>> + xas_destroy(xas); >>> + xas_set_err(xas, -EINVAL); >>> + return; >>> + } >>> + >>> + if (!child) { >>> + child = __xas_alloc_node_for_split(xas, entry, >>> + gfp); >>> + if (!child) { >>> + xas_destroy(xas); >>> + xas_set_err(xas, -ENOMEM); >>> + return; >>> + } >>> + } >> >> No expert on this, just wondering ... >> >> ... what is the effect if we halfway-through fail the split? Is it okay to leave that "partially split" thing in place? Can callers deal with that? > > Good question. > Let me rephrase: In __split_unmapped_folio(), we call xas_try_split(). If that fails, we stop the split and effectively skip over the __split_folio_to_order(). The folio remains unsplit (no order change: old_order). xas_try_split() was instructed to split from old_order -> split_order. xas_try_split() documents that: "The value in @entry is copied to all the replacement entries.", meaning after the split, all entries will be pointing at the folio. Now, can it happen that xas_try_split() would ever perform a partial split in any way, when invoked from __split_unmapped_folio(), such that we run into the do { } while(); loop and fail with -ENOMEM after already having performed changes -- xas_update(). Or is that simply impossible? Maybe it's just the do { } while(); loop in there that is confusing me. (again, no expert) > xas_try_split() imposes what kind of split it does and is usually used to > split from order N to order N-1: You mean that old_order -> split_order will in the case of __split_unmapped_folio() always be a difference of 1? > > 1. when N is a multiplier of XA_CHUNK_SHIFT, a new xa_node is needed, so > either child (namely xas->xa_alloc) is not NULL, meaning someone called > xa_nomem() to allocate a xa_node before xas_try_split() or child is NULL > and an allocation is needed. If child is still NULL after the allocation, > meaning we are out of memory, no split is done; > > 2. when N is not, no new xa_node is needed, xas_try_split() just rewrites > existing slot values to perform the split (the code after the hunk above). > No fail will happen. For this split, since no new xa_node is needed, > the caller is actually allowed to split from N to a value smaller than > N-1 as long as N-1 is >= (N - N % XA_CHUNK_SHIFT). > > > Various checks make sure xas_try_split() only sees the two above situation: > > a. 
"xas->xa_shift < node->shift" means the split crosses XA_CHUNK_SHIFT, > at least 1 new xa_node is needed; the else branch only handles the case > 2 above; > > b. for the then branch the "if (sibs || xas->xa_sibs != expected_sibs)" > check makes sure N is a multiplier of XA_CHUNK_SHIFT and the new order > has to be N-1. In "if (sibs || xas->xa_sibs != expected_sibs)", > "sibs != 0" means the from order N covers more than 1 slot, so more than 1 > new xa_node is needed, "xas->xa_sibs != expected_sibs" makes sure > the new order is N-1 (you can see it from how expected_sibs is assigned). Thanks! > > Let me know if you have any other question. Thanks for the details!
On 18 Feb 2025, at 10:44, David Hildenbrand wrote: > On 17.02.25 23:05, Zi Yan wrote: >> On 17 Feb 2025, at 16:44, David Hildenbrand wrote: >> >>> On 11.02.25 16:50, Zi Yan wrote: >>>> It is a preparation patch for non-uniform folio split, which always split >>>> a folio into half iteratively, and minimal xarray entry split. >>>> >>>> Currently, xas_split_alloc() and xas_split() always split all slots from a >>>> multi-index entry. They cost the same number of xa_node as the to-be-split >>>> slots. For example, to split an order-9 entry, which takes 2^(9-6)=8 >>>> slots, assuming XA_CHUNK_SHIFT is 6 (!CONFIG_BASE_SMALL), 8 xa_node are >>>> needed. Instead xas_try_split() is intended to be used iteratively to split >>>> the order-9 entry into 2 order-8 entries, then split one order-8 entry, >>>> based on the given index, to 2 order-7 entries, ..., and split one order-1 >>>> entry to 2 order-0 entries. When splitting the order-6 entry and a new >>>> xa_node is needed, xas_try_split() will try to allocate one if possible. >>>> As a result, xas_try_split() would only need one xa_node instead of 8. >>>> >>>> When a new xa_node is needed during the split, xas_try_split() can try to >>>> allocate one but no more. -ENOMEM will be return if a node cannot be >>>> allocated. -EINVAL will be return if a sibling node is split or >>>> cascade split happens, where two or more new nodes are needed, and these >>>> are not supported by xas_try_split(). >>>> >>>> xas_split_alloc() and xas_split() split an order-9 to order-0: >>>> >>>> --------------------------------- >>>> | | | | | | | | | >>>> | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | >>>> | | | | | | | | | >>>> --------------------------------- >>>> | | | | >>>> ------- --- --- ------- >>>> | | ... | | >>>> V V V V >>>> ----------- ----------- ----------- ----------- >>>> | xa_node | | xa_node | ... | xa_node | | xa_node | >>>> ----------- ----------- ----------- ----------- >>>> >>>> xas_try_split() splits an order-9 to order-0: >>>> --------------------------------- >>>> | | | | | | | | | >>>> | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | >>>> | | | | | | | | | >>>> --------------------------------- >>>> | >>>> | >>>> V >>>> ----------- >>>> | xa_node | >>>> ----------- >>>> >>>> Signed-off-by: Zi Yan <ziy@nvidia.com> >>>> --- >>>> Documentation/core-api/xarray.rst | 14 ++- >>>> include/linux/xarray.h | 7 ++ >>>> lib/test_xarray.c | 47 +++++++++++ >>>> lib/xarray.c | 136 ++++++++++++++++++++++++++---- >>>> tools/testing/radix-tree/Makefile | 1 + >>>> 5 files changed, 188 insertions(+), 17 deletions(-) >>>> >>>> diff --git a/Documentation/core-api/xarray.rst b/Documentation/core-api/xarray.rst >>>> index f6a3eef4fe7f..c6c91cbd0c3c 100644 >>>> --- a/Documentation/core-api/xarray.rst >>>> +++ b/Documentation/core-api/xarray.rst >>>> @@ -489,7 +489,19 @@ Storing ``NULL`` into any index of a multi-index entry will set the >>>> entry at every index to ``NULL`` and dissolve the tie. A multi-index >>>> entry can be split into entries occupying smaller ranges by calling >>>> xas_split_alloc() without the xa_lock held, followed by taking the lock >>>> -and calling xas_split(). >>>> +and calling xas_split() or calling xas_try_split() with xa_lock. The >>>> +difference between xas_split_alloc()+xas_split() and xas_try_alloc() is >>>> +that xas_split_alloc() + xas_split() split the entry from the original >>>> +order to the new order in one shot uniformly, whereas xas_try_split() >>>> +iteratively splits the entry containing the index non-uniformly. 
>>>> +For example, to split an order-9 entry, which takes 2^(9-6)=8 slots, >>>> +assuming ``XA_CHUNK_SHIFT`` is 6, xas_split_alloc() + xas_split() need >>>> +8 xa_node. xas_try_split() splits the order-9 entry into >>>> +2 order-8 entries, then split one order-8 entry, based on the given index, >>>> +to 2 order-7 entries, ..., and split one order-1 entry to 2 order-0 entries. >>>> +When splitting the order-6 entry and a new xa_node is needed, xas_try_split() >>>> +will try to allocate one if possible. As a result, xas_try_split() would only >>>> +need 1 xa_node instead of 8. >>>> Functions and structures >>>> ======================== >>>> diff --git a/include/linux/xarray.h b/include/linux/xarray.h >>>> index 0b618ec04115..9eb8c7425090 100644 >>>> --- a/include/linux/xarray.h >>>> +++ b/include/linux/xarray.h >>>> @@ -1555,6 +1555,8 @@ int xa_get_order(struct xarray *, unsigned long index); >>>> int xas_get_order(struct xa_state *xas); >>>> void xas_split(struct xa_state *, void *entry, unsigned int order); >>>> void xas_split_alloc(struct xa_state *, void *entry, unsigned int order, gfp_t); >>>> +void xas_try_split(struct xa_state *xas, void *entry, unsigned int order, >>>> + gfp_t gfp); >>>> #else >>>> static inline int xa_get_order(struct xarray *xa, unsigned long index) >>>> { >>>> @@ -1576,6 +1578,11 @@ static inline void xas_split_alloc(struct xa_state *xas, void *entry, >>>> unsigned int order, gfp_t gfp) >>>> { >>>> } >>>> + >>>> +static inline void xas_try_split(struct xa_state *xas, void *entry, >>>> + unsigned int order, gfp_t gfp) >>>> +{ >>>> +} >>>> #endif >>>> /** >>>> diff --git a/lib/test_xarray.c b/lib/test_xarray.c >>>> index 6932a26f4927..598ca38a2f5b 100644 >>>> --- a/lib/test_xarray.c >>>> +++ b/lib/test_xarray.c >>>> @@ -1857,6 +1857,49 @@ static void check_split_1(struct xarray *xa, unsigned long index, >>>> xa_destroy(xa); >>>> } >>>> +static void check_split_2(struct xarray *xa, unsigned long index, >>>> + unsigned int order, unsigned int new_order) >>>> +{ >>>> + XA_STATE_ORDER(xas, xa, index, new_order); >>>> + unsigned int i, found; >>>> + void *entry; >>>> + >>>> + xa_store_order(xa, index, order, xa, GFP_KERNEL); >>>> + xa_set_mark(xa, index, XA_MARK_1); >>>> + >>>> + xas_lock(&xas); >>>> + xas_try_halve(&xas, xa, order, GFP_KERNEL); >>>> + if (((new_order / XA_CHUNK_SHIFT) < (order / XA_CHUNK_SHIFT)) && >>>> + new_order < order - 1) { >>>> + XA_BUG_ON(xa, !xas_error(&xas) || xas_error(&xas) != -EINVAL); >>>> + xas_unlock(&xas); >>>> + goto out; >>>> + } >>>> + for (i = 0; i < (1 << order); i += (1 << new_order)) >>>> + __xa_store(xa, index + i, xa_mk_index(index + i), 0); >>>> + xas_unlock(&xas); >>>> + >>>> + for (i = 0; i < (1 << order); i++) { >>>> + unsigned int val = index + (i & ~((1 << new_order) - 1)); >>>> + XA_BUG_ON(xa, xa_load(xa, index + i) != xa_mk_index(val)); >>>> + } >>>> + >>>> + xa_set_mark(xa, index, XA_MARK_0); >>>> + XA_BUG_ON(xa, !xa_get_mark(xa, index, XA_MARK_0)); >>>> + >>>> + xas_set_order(&xas, index, 0); >>>> + found = 0; >>>> + rcu_read_lock(); >>>> + xas_for_each_marked(&xas, entry, ULONG_MAX, XA_MARK_1) { >>>> + found++; >>>> + XA_BUG_ON(xa, xa_is_internal(entry)); >>>> + } >>>> + rcu_read_unlock(); >>>> + XA_BUG_ON(xa, found != 1 << (order - new_order)); >>>> +out: >>>> + xa_destroy(xa); >>>> +} >>>> + >>>> static noinline void check_split(struct xarray *xa) >>>> { >>>> unsigned int order, new_order; >>>> @@ -1868,6 +1911,10 @@ static noinline void check_split(struct xarray *xa) >>>> check_split_1(xa, 0, order, new_order); 
>>>> check_split_1(xa, 1UL << order, order, new_order); >>>> check_split_1(xa, 3UL << order, order, new_order); >>>> + >>>> + check_split_2(xa, 0, order, new_order); >>>> + check_split_2(xa, 1UL << order, order, new_order); >>>> + check_split_2(xa, 3UL << order, order, new_order); >>>> } >>>> } >>>> } >>>> diff --git a/lib/xarray.c b/lib/xarray.c >>>> index 116e9286c64e..c38beca77830 100644 >>>> --- a/lib/xarray.c >>>> +++ b/lib/xarray.c >>>> @@ -1007,6 +1007,31 @@ static void node_set_marks(struct xa_node *node, unsigned int offset, >>>> } >>>> } >>>> +static struct xa_node *__xas_alloc_node_for_split(struct xa_state *xas, >>>> + void *entry, gfp_t gfp) >>>> +{ >>>> + unsigned int i; >>>> + void *sibling = NULL; >>>> + struct xa_node *node; >>>> + unsigned int mask = xas->xa_sibs; >>>> + >>>> + node = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp); >>>> + if (!node) >>>> + return NULL; >>>> + node->array = xas->xa; >>>> + for (i = 0; i < XA_CHUNK_SIZE; i++) { >>>> + if ((i & mask) == 0) { >>>> + RCU_INIT_POINTER(node->slots[i], entry); >>>> + sibling = xa_mk_sibling(i); >>>> + } else { >>>> + RCU_INIT_POINTER(node->slots[i], sibling); >>>> + } >>>> + } >>>> + RCU_INIT_POINTER(node->parent, xas->xa_alloc); >>>> + >>>> + return node; >>>> +} >>>> + >>>> /** >>>> * xas_split_alloc() - Allocate memory for splitting an entry. >>>> * @xas: XArray operation state. >>>> @@ -1025,7 +1050,6 @@ void xas_split_alloc(struct xa_state *xas, void *entry, unsigned int order, >>>> gfp_t gfp) >>>> { >>>> unsigned int sibs = (1 << (order % XA_CHUNK_SHIFT)) - 1; >>>> - unsigned int mask = xas->xa_sibs; >>>> /* XXX: no support for splitting really large entries yet */ >>>> if (WARN_ON(xas->xa_shift + 2 * XA_CHUNK_SHIFT <= order)) >>>> @@ -1034,23 +1058,9 @@ void xas_split_alloc(struct xa_state *xas, void *entry, unsigned int order, >>>> return; >>>> do { >>>> - unsigned int i; >>>> - void *sibling = NULL; >>>> - struct xa_node *node; >>>> - >>>> - node = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp); >>>> + struct xa_node *node = __xas_alloc_node_for_split(xas, entry, gfp); >>>> if (!node) >>>> goto nomem; >>>> - node->array = xas->xa; >>>> - for (i = 0; i < XA_CHUNK_SIZE; i++) { >>>> - if ((i & mask) == 0) { >>>> - RCU_INIT_POINTER(node->slots[i], entry); >>>> - sibling = xa_mk_sibling(i); >>>> - } else { >>>> - RCU_INIT_POINTER(node->slots[i], sibling); >>>> - } >>>> - } >>>> - RCU_INIT_POINTER(node->parent, xas->xa_alloc); >>>> xas->xa_alloc = node; >>>> } while (sibs-- > 0); >>>> @@ -1122,6 +1132,100 @@ void xas_split(struct xa_state *xas, void *entry, unsigned int order) >>>> xas_update(xas, node); >>>> } >>>> EXPORT_SYMBOL_GPL(xas_split); >>>> + >>>> +/** >>>> + * xas_try_split() - Try to split a multi-index entry. >>>> + * @xas: XArray operation state. >>>> + * @entry: New entry to store in the array. >>>> + * @order: Current entry order. >>>> + * @gfp: Memory allocation flags. >>>> + * >>>> + * The size of the new entries is set in @xas. The value in @entry is >>>> + * copied to all the replacement entries. If and only if one xa_node needs to >>>> + * be allocated, the function will use @gfp to get one. If more xa_node are >>>> + * needed, the function gives EINVAL error. >>>> + * >>>> + * Context: Any context. The caller should hold the xa_lock. 
>>>> + */
>>>> +void xas_try_split(struct xa_state *xas, void *entry, unsigned int order,
>>>> +		gfp_t gfp)
>>>> +{
>>>> [...]
>>>> +			if (!child) {
>>>> +				child = __xas_alloc_node_for_split(xas, entry,
>>>> +						gfp);
>>>> +				if (!child) {
>>>> +					xas_destroy(xas);
>>>> +					xas_set_err(xas, -ENOMEM);
>>>> +					return;
>>>> +				}
>>>> +			}
>>>
>>> No expert on this, just wondering ...
>>>
>>> ... what is the effect if we fail the split halfway through? Is it okay to leave that "partially split" thing in place? Can callers deal with that?
>>
>> Good question.
>>
>
> Let me rephrase: In __split_unmapped_folio(), we call xas_try_split(). If that fails, we stop the split and effectively skip over the __split_folio_to_order(). The folio remains unsplit (no order change: old_order).

Right. To be more specific, in the !uniform_split case, the original folio
can be split and old_order can change. Namely, if the caller wants to split
an order-9, folio_split() can split it into 2 order-6s, 1 order-7, and 1
order-8, then fail to allocate a new xa_node due to memory constraints and
stop. The caller will get 2 order-6s, 1 order-7, and 1 order-8, and
folio_split() returns -ENOMEM. The caller needs to handle this situation,
although it should be quite rare: unless the caller is splitting order-12
(or even higher orders) to order-0, at most 1 xa_node is needed.

> xas_try_split() was instructed to split from old_order -> split_order.
>
> xas_try_split() documents that "The value in @entry is copied to all the replacement entries.", meaning after the split, all entries will be pointing at the folio.

Right.

> Now, can it happen that xas_try_split() would ever perform a partial split in any way, when invoked from __split_unmapped_folio(), such that we run into the do { } while(); loop and fail with -ENOMEM after already having performed changes -- xas_update().
>
> Or is that simply impossible?

Right. It is impossible. xas_try_split() either splits by copying @entry
to all the replacement entries, or is trying to allocate a new xa_node,
which can result in -ENOMEM. These two will not be mixed.

> Maybe it's just the do { } while(); loop in there that is confusing me. (again, no expert)

Yeah, the do-while loop is confusing. Let me restructure the code
so that the do-while loop only runs in the @entry copy case, not the
xa_node allocation case.
>
>> xas_try_split() imposes what kind of split it does and is usually used to
>> split from order N to order N-1:
>
> You mean that old_order -> split_order will in the case of
> __split_unmapped_folio() always be a difference of 1?

Yes, for the !uniform_split case. For the uniform_split case (which
split_huge_page*() uses), xas_split() is used and all required new xa_nodes
are preallocated by xas_split_alloc() in __folio_split().

>
>>
>> 1. when N is a multiple of XA_CHUNK_SHIFT, a new xa_node is needed, so
>> either child (namely xas->xa_alloc) is not NULL, meaning someone called
>> xa_nomem() to allocate a xa_node before xas_try_split(), or child is NULL
>> and an allocation is needed. If child is still NULL after the allocation,
>> meaning we are out of memory, no split is done;
>>
>> 2. when N is not, no new xa_node is needed; xas_try_split() just rewrites
>> existing slot values to perform the split (the code after the hunk above).
>> No failure can happen here. For this split, since no new xa_node is
>> needed, the caller is actually allowed to split from N to a value smaller
>> than N-1, as long as the new order is >= (N - N % XA_CHUNK_SHIFT).
>>
>>
>> Various checks make sure xas_try_split() only sees the two situations
>> above:
>>
>> a. "xas->xa_shift < node->shift" means the split crosses XA_CHUNK_SHIFT,
>> so at least 1 new xa_node is needed; the else branch only handles case
>> 2 above;
>>
>> b. for the then-branch, the "if (sibs || xas->xa_sibs != expected_sibs)"
>> check makes sure N is a multiple of XA_CHUNK_SHIFT and the new order
>> has to be N-1. In "if (sibs || xas->xa_sibs != expected_sibs)",
>> "sibs != 0" means the order-N entry being split covers more than 1 slot,
>> so more than 1 new xa_node would be needed, and "xas->xa_sibs !=
>> expected_sibs" makes sure the new order is N-1 (you can see it from how
>> expected_sibs is assigned).
>
> Thanks!
>
>>
>> Let me know if you have any other questions.
>
> Thanks for the details!

Thank you for checking the code. :)

Best Regards,
Yan, Zi
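For concreteness, the sibs arithmetic behind cases 1/2 and checks a/b above
can be reproduced in a standalone userspace snippet (illustration only;
chunk_shift stands in for XA_CHUNK_SHIFT = 6, the !CONFIG_BASE_SMALL value).
It prints which order N -> N-1 steps need a new xa_node:

#include <stdio.h>

int main(void)
{
	const unsigned int chunk_shift = 6;	/* stands in for XA_CHUNK_SHIFT */
	unsigned int order;

	for (order = 5; order <= 9; order++) {
		/* sibs computed by xas_try_split() for the current order */
		unsigned int sibs = (1U << (order % chunk_shift)) - 1;
		/* expected_sibs for a split to order - 1 */
		unsigned int expected_sibs =
				(1U << ((order - 1) % chunk_shift)) - 1;

		/*
		 * Case 1: order is a multiple of chunk_shift, sibs == 0,
		 * the entry occupies a single slot and the split needs
		 * exactly one new xa_node.  Case 2: slots are rewritten
		 * in place, so no allocation can fail.
		 */
		printf("order %u -> %u: sibs=%u expected_sibs=%u (%s)\n",
		       order, order - 1, sibs, expected_sibs,
		       order % chunk_shift == 0 ?
				"needs one xa_node" : "slot rewrite only");
	}
	return 0;
}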
>>
>> Now, can it happen that xas_try_split() would ever perform a partial split
>> in any way, when invoked from __split_unmapped_folio(), such that we run
>> into the do { } while(); loop and fail with -ENOMEM after already having
>> performed changes -- xas_update().
>>
>> Or is that simply impossible?
>
> Right. It is impossible. xas_try_split() either splits by copying @entry to
> all the replacement entries, or is trying to allocate a new xa_node, which
> can result in -ENOMEM. These two will not be mixed.
>
>>
>> Maybe it's just the do { } while(); loop in there that is confusing me.
>> (again, no expert)
>
> Yeah, the do-while loop is confusing. Let me restructure the code so that
> it only runs in the @entry copy case, not in the xa_node allocation case.

Great!

>
>>
>>> xas_try_split() imposes what kind of split it does and is usually used to
>>> split from order N to order N-1:
>>
>> You mean that old_order -> split_order will in the case of
>> __split_unmapped_folio() always be a difference of 1?
>
> Yes, for the !uniform_split case. For the uniform_split case (which
> split_huge_page*() uses), xas_split() is used and all required new xa_nodes
> are preallocated by xas_split_alloc() in __folio_split().

Got it, thanks!
diff --git a/Documentation/core-api/xarray.rst b/Documentation/core-api/xarray.rst
index f6a3eef4fe7f..c6c91cbd0c3c 100644
--- a/Documentation/core-api/xarray.rst
+++ b/Documentation/core-api/xarray.rst
@@ -489,7 +489,19 @@ Storing ``NULL`` into any index of a multi-index entry will set the entry
 at every index to ``NULL`` and dissolve the tie. A multi-index entry can
 be split into entries occupying smaller ranges by calling
 xas_split_alloc() without the xa_lock held, followed by taking the lock
-and calling xas_split().
+and calling xas_split(), or by calling xas_try_split() with the xa_lock
+held. The difference between xas_split_alloc()+xas_split() and
+xas_try_split() is that the former splits the entry from the original
+order to the new order in one shot uniformly, whereas xas_try_split()
+iteratively splits the entry containing the index non-uniformly.
+For example, to split an order-9 entry, which takes 2^(9-6)=8 slots,
+assuming ``XA_CHUNK_SHIFT`` is 6, xas_split_alloc() + xas_split() need
+8 xa_nodes. xas_try_split() splits the order-9 entry into
+2 order-8 entries, then splits one order-8 entry, based on the given index,
+into 2 order-7 entries, ..., and splits one order-1 entry into 2 order-0
+entries. When splitting the order-6 entry and a new xa_node is needed,
+xas_try_split() will try to allocate one if possible. As a result,
+xas_try_split() would only need 1 xa_node instead of 8.

 Functions and structures
 ========================
diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 0b618ec04115..9eb8c7425090 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -1555,6 +1555,8 @@ int xa_get_order(struct xarray *, unsigned long index);
 int xas_get_order(struct xa_state *xas);
 void xas_split(struct xa_state *, void *entry, unsigned int order);
 void xas_split_alloc(struct xa_state *, void *entry, unsigned int order, gfp_t);
+void xas_try_split(struct xa_state *xas, void *entry, unsigned int order,
+		gfp_t gfp);
 #else
 static inline int xa_get_order(struct xarray *xa, unsigned long index)
 {
@@ -1576,6 +1578,11 @@ static inline void xas_split_alloc(struct xa_state *xas, void *entry,
 		unsigned int order, gfp_t gfp)
 {
 }
+
+static inline void xas_try_split(struct xa_state *xas, void *entry,
+		unsigned int order, gfp_t gfp)
+{
+}
 #endif

 /**
diff --git a/lib/test_xarray.c b/lib/test_xarray.c
index 6932a26f4927..598ca38a2f5b 100644
--- a/lib/test_xarray.c
+++ b/lib/test_xarray.c
@@ -1857,6 +1857,49 @@ static void check_split_1(struct xarray *xa, unsigned long index,
 	xa_destroy(xa);
 }

+static void check_split_2(struct xarray *xa, unsigned long index,
+		unsigned int order, unsigned int new_order)
+{
+	XA_STATE_ORDER(xas, xa, index, new_order);
+	unsigned int i, found;
+	void *entry;
+
+	xa_store_order(xa, index, order, xa, GFP_KERNEL);
+	xa_set_mark(xa, index, XA_MARK_1);
+
+	xas_lock(&xas);
+	xas_try_halve(&xas, xa, order, GFP_KERNEL);
+	if (((new_order / XA_CHUNK_SHIFT) < (order / XA_CHUNK_SHIFT)) &&
+	    new_order < order - 1) {
+		XA_BUG_ON(xa, !xas_error(&xas) || xas_error(&xas) != -EINVAL);
+		xas_unlock(&xas);
+		goto out;
+	}
+	for (i = 0; i < (1 << order); i += (1 << new_order))
+		__xa_store(xa, index + i, xa_mk_index(index + i), 0);
+	xas_unlock(&xas);
+
+	for (i = 0; i < (1 << order); i++) {
+		unsigned int val = index + (i & ~((1 << new_order) - 1));
+		XA_BUG_ON(xa, xa_load(xa, index + i) != xa_mk_index(val));
+	}
+
+	xa_set_mark(xa, index, XA_MARK_0);
+	XA_BUG_ON(xa, !xa_get_mark(xa, index, XA_MARK_0));
+
+	xas_set_order(&xas, index, 0);
+	found = 0;
+	rcu_read_lock();
+	xas_for_each_marked(&xas, entry, ULONG_MAX, XA_MARK_1) {
+		found++;
+		XA_BUG_ON(xa, xa_is_internal(entry));
+	}
+	rcu_read_unlock();
+	XA_BUG_ON(xa, found != 1 << (order - new_order));
+out:
+	xa_destroy(xa);
+}
+
 static noinline void check_split(struct xarray *xa)
 {
 	unsigned int order, new_order;
@@ -1868,6 +1911,10 @@ static noinline void check_split(struct xarray *xa)
 		check_split_1(xa, 0, order, new_order);
 		check_split_1(xa, 1UL << order, order, new_order);
 		check_split_1(xa, 3UL << order, order, new_order);
+
+		check_split_2(xa, 0, order, new_order);
+		check_split_2(xa, 1UL << order, order, new_order);
+		check_split_2(xa, 3UL << order, order, new_order);
 		}
 	}
 }
diff --git a/lib/xarray.c b/lib/xarray.c
index 116e9286c64e..c38beca77830 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -1007,6 +1007,31 @@ static void node_set_marks(struct xa_node *node, unsigned int offset,
 	}
 }

+static struct xa_node *__xas_alloc_node_for_split(struct xa_state *xas,
+		void *entry, gfp_t gfp)
+{
+	unsigned int i;
+	void *sibling = NULL;
+	struct xa_node *node;
+	unsigned int mask = xas->xa_sibs;
+
+	node = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp);
+	if (!node)
+		return NULL;
+	node->array = xas->xa;
+	for (i = 0; i < XA_CHUNK_SIZE; i++) {
+		if ((i & mask) == 0) {
+			RCU_INIT_POINTER(node->slots[i], entry);
+			sibling = xa_mk_sibling(i);
+		} else {
+			RCU_INIT_POINTER(node->slots[i], sibling);
+		}
+	}
+	RCU_INIT_POINTER(node->parent, xas->xa_alloc);
+
+	return node;
+}
+
 /**
  * xas_split_alloc() - Allocate memory for splitting an entry.
  * @xas: XArray operation state.
@@ -1025,7 +1050,6 @@ void xas_split_alloc(struct xa_state *xas, void *entry, unsigned int order,
 		gfp_t gfp)
 {
 	unsigned int sibs = (1 << (order % XA_CHUNK_SHIFT)) - 1;
-	unsigned int mask = xas->xa_sibs;

 	/* XXX: no support for splitting really large entries yet */
 	if (WARN_ON(xas->xa_shift + 2 * XA_CHUNK_SHIFT <= order))
@@ -1034,23 +1058,9 @@ void xas_split_alloc(struct xa_state *xas, void *entry, unsigned int order,
 		return;

 	do {
-		unsigned int i;
-		void *sibling = NULL;
-		struct xa_node *node;
-
-		node = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp);
+		struct xa_node *node = __xas_alloc_node_for_split(xas, entry, gfp);
 		if (!node)
 			goto nomem;
-		node->array = xas->xa;
-		for (i = 0; i < XA_CHUNK_SIZE; i++) {
-			if ((i & mask) == 0) {
-				RCU_INIT_POINTER(node->slots[i], entry);
-				sibling = xa_mk_sibling(i);
-			} else {
-				RCU_INIT_POINTER(node->slots[i], sibling);
-			}
-		}
-		RCU_INIT_POINTER(node->parent, xas->xa_alloc);
 		xas->xa_alloc = node;
 	} while (sibs-- > 0);

@@ -1122,6 +1132,100 @@ void xas_split(struct xa_state *xas, void *entry, unsigned int order)
 	xas_update(xas, node);
 }
 EXPORT_SYMBOL_GPL(xas_split);
+
+/**
+ * xas_try_split() - Try to split a multi-index entry.
+ * @xas: XArray operation state.
+ * @entry: New entry to store in the array.
+ * @order: Current entry order.
+ * @gfp: Memory allocation flags.
+ *
+ * The size of the new entries is set in @xas. The value in @entry is
+ * copied to all the replacement entries. If and only if one xa_node needs to
+ * be allocated, the function will use @gfp to get one. If more than one
+ * xa_node is needed, the function returns an -EINVAL error.
+ *
+ * Context: Any context. The caller should hold the xa_lock.
+ */
+void xas_try_split(struct xa_state *xas, void *entry, unsigned int order,
+		gfp_t gfp)
+{
+	unsigned int sibs = (1 << (order % XA_CHUNK_SHIFT)) - 1;
+	unsigned int offset, marks;
+	struct xa_node *node;
+	void *curr = xas_load(xas);
+	int values = 0;
+
+	node = xas->xa_node;
+	if (xas_top(node))
+		return;
+
+	if (xas->xa->xa_flags & XA_FLAGS_ACCOUNT)
+		gfp |= __GFP_ACCOUNT;
+
+	marks = node_get_marks(node, xas->xa_offset);
+
+	offset = xas->xa_offset + sibs;
+	do {
+		if (xas->xa_shift < node->shift) {
+			struct xa_node *child = xas->xa_alloc;
+			unsigned int expected_sibs =
+				(1 << ((order - 1) % XA_CHUNK_SHIFT)) - 1;
+
+			/*
+			 * No support for splitting sibling entries
+			 * (horizontally) or cascade split (vertically), which
+			 * requires two or more new xa_nodes, because if one
+			 * xa_node allocation fails, it is hard to free the
+			 * prior allocations.
+			 */
+			if (sibs || xas->xa_sibs != expected_sibs) {
+				xas_destroy(xas);
+				xas_set_err(xas, -EINVAL);
+				return;
+			}
+
+			if (!child) {
+				child = __xas_alloc_node_for_split(xas, entry,
+						gfp);
+				if (!child) {
+					xas_destroy(xas);
+					xas_set_err(xas, -ENOMEM);
+					return;
+				}
+			}
+
+			xas->xa_alloc = rcu_dereference_raw(child->parent);
+			child->shift = node->shift - XA_CHUNK_SHIFT;
+			child->offset = offset;
+			child->count = XA_CHUNK_SIZE;
+			child->nr_values = xa_is_value(entry) ?
+					XA_CHUNK_SIZE : 0;
+			RCU_INIT_POINTER(child->parent, node);
+			node_set_marks(node, offset, child, xas->xa_sibs,
+					marks);
+			rcu_assign_pointer(node->slots[offset],
+					xa_mk_node(child));
+			if (xa_is_value(curr))
+				values--;
+			xas_update(xas, child);
+		} else {
+			unsigned int canon = offset - xas->xa_sibs;
+
+			node_set_marks(node, canon, NULL, 0, marks);
+			rcu_assign_pointer(node->slots[canon], entry);
+			while (offset > canon)
+				rcu_assign_pointer(node->slots[offset--],
+						xa_mk_sibling(canon));
+			values += (xa_is_value(entry) - xa_is_value(curr)) *
+					(xas->xa_sibs + 1);
+		}
+	} while (offset-- > xas->xa_offset);
+
+	node->nr_values += values;
+	xas_update(xas, node);
+}
+EXPORT_SYMBOL_GPL(xas_try_split);
 #endif

 /**
diff --git a/tools/testing/radix-tree/Makefile b/tools/testing/radix-tree/Makefile
index 8b3591a51e1f..b2a6660bbd92 100644
--- a/tools/testing/radix-tree/Makefile
+++ b/tools/testing/radix-tree/Makefile
@@ -14,6 +14,7 @@ include ../shared/shared.mk

 main: $(OFILES)

+xarray.o: ../../../lib/test_xarray.c
 idr-test.o: ../../../lib/test_ida.c

 idr-test: idr-test.o $(CORE_OFILES)
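As an aside on the do { } while(); loop discussed earlier in the thread:
the else branch above (the in-place slot rewrite) can be modeled in a few
lines of plain C. This toy model is an illustration only; it uses a char
array in place of node->slots and hard-codes an order-7 entry being split
to order-6 with XA_CHUNK_SHIFT = 6:

#include <stdio.h>

/*
 * Toy model of the slot rewrite: 'E' marks a canonical entry, lowercase
 * 'e'/'n' marks a sibling slot pointing back at it.  slots[0..1] hold
 * the order-7 entry being split; slots[2..3] hold a neighboring order-7
 * entry that must stay untouched.
 */
int main(void)
{
	char slots[4] = { 'E', 'e', 'E', 'e' };
	unsigned int xa_offset = 0;	/* offset of the entry being split */
	unsigned int sibs = 1;		/* the old order spans 2 slots */
	unsigned int xa_sibs = 0;	/* each new entry spans 1 slot */
	unsigned int offset = xa_offset + sibs;

	/* same walk as the else branch of xas_try_split() */
	do {
		unsigned int canon = offset - xa_sibs;

		slots[canon] = 'N';		/* store the new entry */
		while (offset > canon)
			slots[offset--] = 'n';	/* siblings of canon */
	} while (offset-- > xa_offset);

	/* prints "NNEe": slots 0-1 were split, slots 2-3 are untouched */
	printf("%c%c%c%c\n", slots[0], slots[1], slots[2], slots[3]);
	return 0;
}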
This is a preparation patch for non-uniform folio split, which always
splits a folio in half iteratively, and for minimal xarray entry split.

Currently, xas_split_alloc() and xas_split() always split all slots from a
multi-index entry. They cost the same number of xa_nodes as the to-be-split
slots. For example, to split an order-9 entry, which takes 2^(9-6)=8
slots, assuming XA_CHUNK_SHIFT is 6 (!CONFIG_BASE_SMALL), 8 xa_nodes are
needed. Instead, xas_try_split() is intended to be used iteratively to
split the order-9 entry into 2 order-8 entries, then split one order-8
entry, based on the given index, into 2 order-7 entries, ..., and split one
order-1 entry into 2 order-0 entries. When splitting the order-6 entry and
a new xa_node is needed, xas_try_split() will try to allocate one if
possible. As a result, xas_try_split() would only need one xa_node instead
of 8.

When a new xa_node is needed during the split, xas_try_split() can try to
allocate one but no more. -ENOMEM will be returned if a node cannot be
allocated. -EINVAL will be returned if a sibling node is split or a cascade
split happens, where two or more new nodes are needed; these cases are not
supported by xas_try_split().

xas_split_alloc() and xas_split() split an order-9 to order-0:

---------------------------------
|   |   |   |   |   |   |   |   |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|   |   |   |   |   |   |   |   |
---------------------------------
  |         |             |         |
-------    ---           ---    -------
|          |      ...      |          |
V          V               V          V
----------- -----------     ----------- -----------
| xa_node | | xa_node | ... | xa_node | | xa_node |
----------- -----------     ----------- -----------

xas_try_split() splits an order-9 to order-0:

---------------------------------
|   |   |   |   |   |   |   |   |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|   |   |   |   |   |   |   |   |
---------------------------------
  |
  |
  V
-----------
| xa_node |
-----------

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 Documentation/core-api/xarray.rst |  14 ++-
 include/linux/xarray.h            |   7 ++
 lib/test_xarray.c                 |  47 +++++++++++
 lib/xarray.c                      | 136 ++++++++++++++++++++++++++----
 tools/testing/radix-tree/Makefile |   1 +
 5 files changed, 188 insertions(+), 17 deletions(-)
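As a cross-check of the 8-versus-1 xa_node claim above, a standalone
calculation (illustration only, with XA_CHUNK_SHIFT hard-coded to 6):

#include <stdio.h>

int main(void)
{
	const unsigned int shift = 6;	/* XA_CHUNK_SHIFT, !CONFIG_BASE_SMALL */
	const unsigned int order = 9;
	unsigned int uniform, iterative = 0, o;

	/* xas_split_alloc() + xas_split(): one xa_node per occupied slot */
	uniform = 1U << (order % shift);	/* 2^(9-6) = 8 */

	/*
	 * xas_try_split() order-9 -> 8 -> ... -> 0: only the step that
	 * crosses a node boundary (order 6 -> 5 here) allocates a node.
	 */
	for (o = order; o > 0; o--)
		if (o % shift == 0)
			iterative++;

	printf("uniform: %u xa_nodes, iterative: %u xa_node\n",
	       uniform, iterative);		/* prints 8 and 1 */
	return 0;
}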