Message ID | 20160321173458.GO23727@linux.intel.com |
---|---|

State | New, archived |

Headers | show |

On Mon 21-03-16 13:34:58, Matthew Wilcox wrote: > I have a patch currently in my tree which has the same effect, but looks > a little neater: > > diff --git a/lib/radix-tree.c b/lib/radix-tree.c > index b77c31c..06dfed5 100644 > --- a/lib/radix-tree.c > +++ b/lib/radix-tree.c > @@ -70,6 +70,8 @@ struct radix_tree_preload { > }; > static DEFINE_PER_CPU(struct radix_tree_preload, radix_tree_preloads) = { 0, }; > > +#define RADIX_TREE_RETRY ((void *)1) > + > static inline void *ptr_to_indirect(void *ptr) > { > return (void *)((unsigned long)ptr | RADIX_TREE_INDIRECT_PTR); > @@ -934,7 +936,7 @@ restart: > } > > slot = rcu_dereference_raw(node->slots[offset]); > - if (slot == NULL) > + if ((slot == NULL) || (slot == RADIX_TREE_RETRY)) > goto restart; > offset = follow_sibling(node, &slot, offset); > if (!radix_tree_is_indirect_ptr(slot)) > @@ -1443,8 +1455,7 @@ static inline void radix_tree_shrink(struct radix_tree_root *root) > * to force callers to retry. > */ > if (!radix_tree_is_indirect_ptr(slot)) > - *((unsigned long *)&to_free->slots[0]) |= > - RADIX_TREE_INDIRECT_PTR; > + to_free->slots[0] = RADIX_TREE_RETRY; > > radix_tree_node_free(to_free); > } > > What do you think to doing it this way? > > It might be slightly neater to replace the first hunk with this: > > #define RADIX_TREE_RETRY ((void *)RADIX_TREE_INDIRECT_PTR) > > I also considered putting that define in radix-tree.h instead of > radix-tree.c, but on the whole I don't think it'll be useful outside > radix-tree.h. So after spending over and hour reading radix tree code back and forth (and also digging into historical versions where stuff is easier to understand) I think I can finally fully appreciate subtlety of the retry logic ;). And actually now I think that Neil's variant is buggy because in his case radix_tree_lookup() could return NULL if it raced with radix_tree_shrink() for index 0, although there is valid entry at index 0. Your variant actually doesn't make things much better. See e.g. mm/filemap.c: find_get_entry() pagep = radix_tree_lookup_slot(&mapping->page_tree, offset); // pagep points to node that is under RCU freeing if (pagep) { page = radix_tree_deref_slot(pagep); if (unlikely(!page)) // False since // RADIX_TREE_INDIRECT_PTR is set if (radix_tree_exception(page)) // False - no exeptional bit if (!page_cache_get_speculative(page)) // oops... What we need to do is either to make all radix_tree_deref_slot() callers check return value immediately with something like radix_tree_deref_retry() (but that is still prone to hard to debug bugs when someone forgets to call radix_tree_deref_retry() in some place) or we just give up the idea of using INDIRECT bit in exceptional entries. Honza

On Tue, Mar 22, 2016 at 10:12:32AM +0100, Jan Kara wrote: > if (unlikely(!page)) // False since > // RADIX_TREE_INDIRECT_PTR is set > if (radix_tree_exception(page)) // False - no exeptional bit Oops, you got confused: static inline int radix_tree_exception(void *arg) { return unlikely((unsigned long)arg & (RADIX_TREE_INDIRECT_PTR | RADIX_TREE_EXCEPTIONAL_ENTRY)); }

On Tue 22-03-16 05:27:08, Matthew Wilcox wrote: > On Tue, Mar 22, 2016 at 10:12:32AM +0100, Jan Kara wrote: > > if (unlikely(!page)) // False since > > // RADIX_TREE_INDIRECT_PTR is set > > if (radix_tree_exception(page)) // False - no exeptional bit > > Oops, you got confused: > > static inline int radix_tree_exception(void *arg) > { > return unlikely((unsigned long)arg & > (RADIX_TREE_INDIRECT_PTR | RADIX_TREE_EXCEPTIONAL_ENTRY)); > } Ah, I've confused radix_tree_exception() and radix_tree_exceptional_entry(). OK, so your code works AFAICT. But using RADIX_TREE_RETRY still doesn't make things clearer to me - you still need to check for INDIRECT bit in the retry logic to catch the radix_tree_extend() race as well... As a side note I think we should do away with radix_tree_exception() - it isn't very useful (doesn't simplify any of its callers) and only creates possibility for confusion. Honza

On Tue, Mar 22, 2016 at 11:37:54AM +0100, Jan Kara wrote: > On Tue 22-03-16 05:27:08, Matthew Wilcox wrote: > > On Tue, Mar 22, 2016 at 10:12:32AM +0100, Jan Kara wrote: > > > if (unlikely(!page)) // False since > > > // RADIX_TREE_INDIRECT_PTR is set > > > if (radix_tree_exception(page)) // False - no exeptional bit > > > > Oops, you got confused: > > > > static inline int radix_tree_exception(void *arg) > > { > > return unlikely((unsigned long)arg & > > (RADIX_TREE_INDIRECT_PTR | RADIX_TREE_EXCEPTIONAL_ENTRY)); > > } > > Ah, I've confused radix_tree_exception() and > radix_tree_exceptional_entry(). OK, so your code works AFAICT. But using > RADIX_TREE_RETRY still doesn't make things clearer to me - you still need > to check for INDIRECT bit in the retry logic to catch the > radix_tree_extend() race as well... > > As a side note I think we should do away with radix_tree_exception() - it > isn't very useful (doesn't simplify any of its callers) and only creates > possibility for confusion. Perhaps it would be clearer if we explicitly enumerated the four radix tree entry types? #define RADIX_TREE_TYPE_MASK 3 #define RADIX_TREE_TYPE_DATA 0 #define RADIX_TREE_TYPE_INDIRECT 1 #define RADIX_TREE_TYPE_EXCEPTIONAL 2 #define RADIX_TREE_TYPE_LOCKED_EXC 3 This would make radix_tree_exception (which we could rename so it doesn't get confused with "exceptional" entries): static inline int radix_tree_non_data(void *arg) { return unlikely((unsigned long)arg & RADIX_TREE_TYPE_MASK); } Etc? I guess we'd have to code it up and see if the result was simpler, but it seems like it might be.

On Wed 23-03-16 10:41:44, Ross Zwisler wrote: > On Tue, Mar 22, 2016 at 11:37:54AM +0100, Jan Kara wrote: > > On Tue 22-03-16 05:27:08, Matthew Wilcox wrote: > > > On Tue, Mar 22, 2016 at 10:12:32AM +0100, Jan Kara wrote: > > > > if (unlikely(!page)) // False since > > > > // RADIX_TREE_INDIRECT_PTR is set > > > > if (radix_tree_exception(page)) // False - no exeptional bit > > > > > > Oops, you got confused: > > > > > > static inline int radix_tree_exception(void *arg) > > > { > > > return unlikely((unsigned long)arg & > > > (RADIX_TREE_INDIRECT_PTR | RADIX_TREE_EXCEPTIONAL_ENTRY)); > > > } > > > > Ah, I've confused radix_tree_exception() and > > radix_tree_exceptional_entry(). OK, so your code works AFAICT. But using > > RADIX_TREE_RETRY still doesn't make things clearer to me - you still need > > to check for INDIRECT bit in the retry logic to catch the > > radix_tree_extend() race as well... > > > > As a side note I think we should do away with radix_tree_exception() - it > > isn't very useful (doesn't simplify any of its callers) and only creates > > possibility for confusion. > > Perhaps it would be clearer if we explicitly enumerated the four radix tree > entry types? > > #define RADIX_TREE_TYPE_MASK 3 > > #define RADIX_TREE_TYPE_DATA 0 > #define RADIX_TREE_TYPE_INDIRECT 1 > #define RADIX_TREE_TYPE_EXCEPTIONAL 2 > #define RADIX_TREE_TYPE_LOCKED_EXC 3 > > This would make radix_tree_exception (which we could rename so it doesn't > get confused with "exceptional" entries): > > static inline int radix_tree_non_data(void *arg) > { > return unlikely((unsigned long)arg & RADIX_TREE_TYPE_MASK); > } > > Etc? I guess we'd have to code it up and see if the result was simpler, but > it seems like it might be. Well, for now I have decided to postpone tricks with saving exceptional entry bits and just used bit 2 for the lock bit for DAX exceptional entries because the retry logic in the RCU walking code got rather convoluted with that. If we ever feel we are running out of bits in the entry, we can always look again at compressing the contents more. Honza

diff --git a/lib/radix-tree.c b/lib/radix-tree.c index b77c31c..06dfed5 100644 --- a/lib/radix-tree.c +++ b/lib/radix-tree.c @@ -70,6 +70,8 @@ struct radix_tree_preload { }; static DEFINE_PER_CPU(struct radix_tree_preload, radix_tree_preloads) = { 0, }; +#define RADIX_TREE_RETRY ((void *)1) + static inline void *ptr_to_indirect(void *ptr) { return (void *)((unsigned long)ptr | RADIX_TREE_INDIRECT_PTR); @@ -934,7 +936,7 @@ restart: } slot = rcu_dereference_raw(node->slots[offset]); - if (slot == NULL) + if ((slot == NULL) || (slot == RADIX_TREE_RETRY)) goto restart; offset = follow_sibling(node, &slot, offset); if (!radix_tree_is_indirect_ptr(slot)) @@ -1443,8 +1455,7 @@ static inline void radix_tree_shrink(struct radix_tree_root *root) * to force callers to retry. */ if (!radix_tree_is_indirect_ptr(slot)) - *((unsigned long *)&to_free->slots[0]) |= - RADIX_TREE_INDIRECT_PTR; + to_free->slots[0] = RADIX_TREE_RETRY; radix_tree_node_free(to_free); }