fs: Safe rcu access to hlist.

Message ID 20171119200210.hhrklgm6hxhoyhqh@debian (mailing list archive)
State New, archived

Commit Message

Tim Hansen Nov. 19, 2017, 8:02 p.m. UTC
Adds hlist_first_rcu and hlist_next_rcu for safe access
to the hlist in seq_hlist_next_rcu.

Found on linux-next branch, tag next-20171117 with sparse.

Signed-off-by: Tim Hansen <devtimhansen@gmail.com>
---
 fs/seq_file.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Al Viro Nov. 19, 2017, 9:28 p.m. UTC | #1
On Sun, Nov 19, 2017 at 03:02:10PM -0500, Tim Hansen wrote:
> Adds hlist_first_rcu and hlist_next_rcu for safe access
> to the hlist in seq_hlist_next_rcu.
> 
> Found on linux-next branch, tag next-20171117 with sparse.

Frankly, I'm tempted to take sparse RCU annotations out for good -
they are far too noisy and I'm not sure sparse is suitable for the
analysis needed to prove safety of that stuff, so unless you (or
somebody else) figures out how to use them in a reasonably clean
way, we'd probably be better off just dropping them.
Tim Hansen Nov. 20, 2017, 6:55 p.m. UTC | #2
On Sun, Nov 19, 2017 at 09:28:49PM +0000, Al Viro wrote:
> On Sun, Nov 19, 2017 at 03:02:10PM -0500, Tim Hansen wrote:
> > Adds hlist_first_rcu and hlist_next_rcu for safe access
> > to the hlist in seq_hlist_next_rcu.
> > 
> > Found on linux-next branch, tag next-20171117 with sparse.
> 
> Frankly, I'm tempted to take sparse RCU annotations out for good -
> they are far too noisy and I'm not sure sparse is suitable for the
> analysis needed to prove safety of that stuff, so unless you (or
> somebody else) figures out how to use them in a reasonably clean
> way, we'd probably be better off just dropping them.

Can you detail how sparse is insufficient to prove RCU safety?
I'm not an RCU expert by any means, but I don't know of any
complaints in the community regarding the capabilities of sparse
to detect RCU correctness.  That, however, could just be my
own ignorance. As far as I know, these sparse RCU annotations
are used widely across other subsystems.

I'd defer to other people more knowledgeable about sparse to chime
in regarding the "correctness" of its capabilities for RCU.
Luc Van Oostenryck Nov. 20, 2017, 8:01 p.m. UTC | #3
On Mon, Nov 20, 2017 at 01:55:35PM -0500, Tim Hansen wrote:
> On Sun, Nov 19, 2017 at 09:28:49PM +0000, Al Viro wrote:
> > On Sun, Nov 19, 2017 at 03:02:10PM -0500, Tim Hansen wrote:
> > > Adds hlist_first_rcu and hlist_next_rcu for safe access
> > > to the hlist in seq_hlist_next_rcu.
> > > 
> > > Found on linux-next branch, tag next-20171117 with sparse.
> > 
> > Frankly, I'm tempted to take sparse RCU annotations out for good -
> > they are far too noisy and I'm not sure sparse is suitable for the
> > analysis needed to prove safety of that stuff, so unless you (or
> > somebody else) figures out how to use them in a reasonably clean
> > way, we'd probably be better off just dropping them.
> 
> Can you detail how sparse is insufficient to prove RCU safety?
> I'm not an RCU expert by any means, but I don't know of any
> complaints in the community regarding the capabilities of sparse
> to detect RCU correctness.  That, however, could just be my
> own ignorance. As far as I know, these sparse RCU annotations
> are used widely across other subsystems.
>
> I'd defer to other people more knowledgeable about sparse to chime
> in regarding the "correctness" of its capabilities for RCU.

Hi,

[not knowing much about RCU's needs here but knowing quite a bit
 about sparse]

I think the issue here is mainly about the use of the address space.
For kernel space vs. __user vs. __iomem, address space works quite
well: a pointer points either to a kernel address or to userland
or to some device or bus memory. It's an exclusive thing.
So you can annotate the pointer with __user or __iomem, it's a 
kind of extension of the typing system, and you can let sparse
do its job.
For the endianness annotations, it's very similar: a variable
either points to a native value or to a big or little endian
value, only one can (should!) be correct. The __be32/__le32/...
annotations are once again an extension of the typing system.
Fine for sparse.
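
As a rough illustration of the "extension of the typing system" described above, the following userspace sketch shows how such annotations can be defined so that they disappear under an ordinary compiler and turn into attributes when sparse runs (sparse defines `__CHECKER__`). The macro pattern is modelled on the kernel's compiler headers; the numeric address-space value and the names `sketch_be32`, `host_to_be32`, `be32_to_host` and `read_reg` are assumptions made for this example only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* sparse defines __CHECKER__; a regular compiler sees empty macros. */
#ifdef __CHECKER__
# define __iomem   __attribute__((noderef, address_space(2)))
# define __force   __attribute__((force))
# define __bitwise __attribute__((bitwise))
#else
# define __iomem
# define __force
# define __bitwise
#endif

/* A big-endian 32-bit value: same bits as uint32_t, distinct type for sparse. */
typedef uint32_t __bitwise sketch_be32;

/* Host to big-endian: lay the bytes out most-significant first. */
static inline sketch_be32 host_to_be32(uint32_t v)
{
	unsigned char b[4] = {
		(unsigned char)(v >> 24), (unsigned char)(v >> 16),
		(unsigned char)(v >> 8),  (unsigned char)v
	};
	uint32_t raw;

	memcpy(&raw, b, sizeof(raw));
	return (__force sketch_be32)raw;   /* explicit, checked crossing point */
}

/* Big-endian to host: read the bytes back most-significant first. */
static inline uint32_t be32_to_host(sketch_be32 v)
{
	uint32_t raw = (__force uint32_t)v;
	unsigned char b[4];

	memcpy(b, &raw, sizeof(b));
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* An __iomem pointer may not be dereferenced directly; an accessor strips it. */
static inline uint32_t read_reg(const volatile uint32_t __iomem *addr)
{
	return *(const volatile uint32_t __force *)addr;
}
```

In both cases the annotation is a permanent property of the value or pointer, so the only `__force` casts sit inside a handful of accessors.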

For RCU, the impression I have is that things are completely
different: it's more a question of transient state than
something exclusive. The choice to use another address space
imposes the need for a lot of artificial annotations. And to
make sparse able to do its job, a lot of artificial helpers
are needed to cast variables in and out of the __rcu address
space.
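
To make the "cast in and out" point concrete, here is a minimal userspace sketch that assumes the same `__CHECKER__` pattern as the kernel headers. `struct config`, `cur_config`, `config_publish_sketch` and `config_deref_sketch` are hypothetical names. The real kernel macros (`rcu_assign_pointer()` and `rcu_dereference()`) also provide memory ordering and debugging checks, which this sketch deliberately leaves out; it only shows the address-space casts.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __CHECKER__
# define __rcu   __attribute__((noderef, address_space(4)))
# define __force __attribute__((force))
#else
# define __rcu
# define __force
#endif

struct config {
	int timeout;
};

/* The pointer is annotated; the memory it points to is ordinary memory. */
static struct config __rcu *cur_config;

/* Cast "into" the __rcu address space when storing a plain pointer. */
static void config_publish_sketch(struct config *c)
{
	cur_config = (struct config __force __rcu *)c;
}

/* Cast "out of" the __rcu address space to get a dereferenceable pointer. */
static struct config *config_deref_sketch(void)
{
	return (struct config __force *)cur_config;
}
```

Every read or write of `cur_config` has to go through a helper of this kind, which is the annotation overhead being described.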

-- Luc Van Oostenryck
Matthew Wilcox Nov. 20, 2017, 8:42 p.m. UTC | #4
On Mon, Nov 20, 2017 at 09:01:32PM +0100, Luc Van Oostenryck wrote:
> [not knowing much about RCU's needs here but knowing quite a bit
>  about sparse]
> 
> I think the issue here is mainly about the use of the address space.
> For kernel space vs. __user vs. __iomem, address space works quite
> well: a pointer points either to a kernel address or to userland
> or to some device or bus memory. It's an exclusive thing.
> So you can annotate the pointer with __user or __iomem, it's a 
> kind of extension of the typing system, and you can let sparse
> do its job.
> For the endianness annotations, it's very similar: a variable
> either points to a native value or to a big or little endian
> value, only one can (should!) be correct. The __be32/__le32/...
> annotations are once again an extension of the typing system.
> Fine for sparse.
> 
> For RCU, the impression I have is that things are completely
> different: it's more a question of transient state than
> something exclusive. The choice to use another address space
> imposes the need for a lot of artificial annotations. And to
> make sparse able to do its job, a lot of artificial helpers
> are needed to cast variables in and out of the __rcu address
> space.

I disagree.  The notion of whether a pointer is protected by RCU or not
is definitely not transient.  There are a lot of places in the kernel
with missing RCU annotations, and that's where you'll see a lot of
sparse warnings.  They're even correct in some cases!  For example,
this part *of the page cache* is not RCU safe (uhm, if I'm reading
rcu_dereference.txt correctly):

        void **slot;
        rcu_read_lock();
        radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
                page = radix_tree_deref_slot(slot);
                page_cache_get_speculative(head);
                /* Has the page moved? */
                if (unlikely(page != *slot)) {

Now, it's pretty subtle why it's wrong, and we're probably getting away
with it with current compiler & CPU technology, but if people were more
diligent about the sparse RCU warnings, there would be no doubt that it
was correct.

(one of the major problems was that the radix tree was not diligent about
annotating the 'slot' pointer as being an __rcu pointer).
Luc Van Oostenryck Nov. 20, 2017, 8:58 p.m. UTC | #5
On Mon, Nov 20, 2017 at 12:42:53PM -0800, Matthew Wilcox wrote:
> 
> I disagree.  The notion of whether a pointer is protected by RCU or not
> is definitely not transient.

Sure. But what about the memory it points to?
It's just 'normal' kernel memory, there is nowhere
something like some 'RCU memory', right?

And the memory accessed through a __rcu annotated
pointer can legally be accessed with normal
memory operations, because it's only the pointer that
is concerned by the annotation?

-- Luc
Paul E. McKenney Nov. 20, 2017, 9:21 p.m. UTC | #6
On Mon, Nov 20, 2017 at 09:58:02PM +0100, Luc Van Oostenryck wrote:
> On Mon, Nov 20, 2017 at 12:42:53PM -0800, Matthew Wilcox wrote:
> > 
> > I disagree.  The notion of whether a pointer is protected by RCU or not
> > is definitely not transient.
> 
> Sure. But what about the memory it points to?
> It's just 'normal' kernel memory, there is nowhere
> something like some 'RCU memory', right?
> 
> And the memory accessed through a __rcu annotated
> pointer can legally be accessed with normal
> memory operations, because it's only the pointer that
> is concerned by the annotation?

It is the dereferencing of the pointer that is important.

For the pointer itself, once we have loaded it, we have loaded it,
and that is that.

The ordering that must be preserved is the load of the pointer against
later loads dereferencing that pointer.  Now you might ask, as I once
did, "How can the later dereference possibly be reordered against the
pointer being dereferenced?"  And the answer is that DEC Alpha really
did such reordering, and also that feedback-based optimizations could
potentially cause compilers to do such reordering.  There is a lot
written on this topic, but Documentation/RCU/rcu_dereference.txt and
Documentation/memory-barriers.txt are reasonable places to start.
Or, for more recent but still experimental documentation, the file
Documentation/explanation.txt at https://github.com/aparri/memory-model.
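
The ordering requirement described above can be sketched in userspace C11 as a publish/subscribe pair. This is an analogue, not kernel code: `struct item`, `shared_item`, `publish_item` and `read_payload` are hypothetical names, and the acquire load is a conservative stand-in that is stronger than the dependency ordering `rcu_dereference()` relies on. Grace periods and reclamation are not modelled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct item {
	int payload;
};

/* Shared pointer; zero-initialised, so readers first see NULL. */
static _Atomic(struct item *) shared_item;

/* Writer: fully initialise the object, then publish the pointer.
 * The release store keeps the payload write ahead of the pointer write. */
static void publish_item(struct item *it, int value)
{
	it->payload = value;
	atomic_store_explicit(&shared_item, it, memory_order_release);
}

/* Reader: load the pointer exactly once, then dereference that loaded
 * value. The acquire load keeps the payload read after the pointer read. */
static int read_payload(int fallback)
{
	struct item *it = atomic_load_explicit(&shared_item,
					       memory_order_acquire);

	return it ? it->payload : fallback;
}
```

With a plain, unordered load of `shared_item`, nothing would formally prevent the payload read from observing a value older than the published one, which is the reordering in question.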

In short, sparse's approach really does make sense here.

							Thanx, Paul

Patch

diff --git a/fs/seq_file.c b/fs/seq_file.c
index fb17f35a49a6..0b966781fd60 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -968,9 +968,9 @@  struct hlist_node *seq_hlist_next_rcu(void *v,
 
 	++*ppos;
 	if (v == SEQ_START_TOKEN)
-		return rcu_dereference(head->first);
+		return rcu_dereference(hlist_first_rcu(head));
 	else
-		return rcu_dereference(node->next);
+		return rcu_dereference(hlist_next_rcu(node));
 }
 EXPORT_SYMBOL(seq_hlist_next_rcu);
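
For readers unfamiliar with the two helpers, the sketch below shows roughly what the patched lines do. The two `hlist_*_rcu` macros are modelled on the definitions in include/linux/rculist.h: they reinterpret the plain `first` and `next` link fields as `__rcu` pointers so that the dereference macro receives the pointer type sparse expects. Everything else is an assumption for illustration: `deref_sketch` only strips the annotation and provides no ordering, `next_sketch` is a simplified model of the function, and a NULL `v` stands in for `SEQ_START_TOKEN`.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __CHECKER__
# define __rcu   __attribute__((noderef, address_space(4)))
# define __force __attribute__((force))
#else
# define __rcu
# define __force
#endif

struct hlist_node {
	struct hlist_node *next, **pprev;
};

struct hlist_head {
	struct hlist_node *first;
};

/* View the plain link fields as __rcu pointers. */
#define hlist_first_rcu(head) (*((struct hlist_node __rcu **)(&(head)->first)))
#define hlist_next_rcu(node)  (*((struct hlist_node __rcu **)(&(node)->next)))

/* Stand-in for rcu_dereference(): removes __rcu only, no memory ordering. */
#define deref_sketch(p) ((struct hlist_node __force *)(p))

/* Simplified model of the patched function body. */
static struct hlist_node *next_sketch(struct hlist_node *v,
				      struct hlist_head *head)
{
	if (v == NULL)	/* plays the role of SEQ_START_TOKEN */
		return deref_sketch(hlist_first_rcu(head));
	return deref_sketch(hlist_next_rcu(v));
}
```

The generated code is the same before and after the patch; the change only affects the pointer types that sparse sees.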