
Btrfs: memset to avoid stale content in btree node block

Message ID 1473898977-29406-1-git-send-email-bo.li.liu@oracle.com (mailing list archive)
State Accepted

Commit Message

Liu Bo Sept. 15, 2016, 12:22 a.m. UTC
While updating a btree, we may push items between sibling
nodes/leaves. In leaves, the data sections grow backwards from
the end of the block, while nodes hold only key pairs, which
are stored one by one from the start of the block.

So we may push key pairs from one node to the next node on its
right in the tree, and after that we update the node's nritems
to reflect the correct end while leaving the stale content in
the node.  One may intentionally corrupt the fs image and
access the stale content by bumping nritems, which causes
various crashes.

This patch takes the in-memory @nritems as the correct one and
memsets the unused part of a btree node.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/extent_io.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

David Sterba Sept. 20, 2016, 1:16 p.m. UTC | #1
On Wed, Sep 14, 2016 at 05:22:57PM -0700, Liu Bo wrote:
> During updating btree, we could push items between sibling
> nodes/leaves, for leaves data sections starts reversely from
> the end of the block while for nodes we only have key pairs
> which are stored one by one from the start of the block.
> 
> So we could do try to push key pairs from one node to the next
> node right in the tree, and after that, we update the node's
> nritems to reflect the correct end while leaving the stale
> content in the node.  One may intentionally corrupt the fs
> image and access the stale content by bumping the nritems and
> causes various crashes.
> 
> This takes the in-memory @nritems as the correct one and
> gets to memset the unused part of a btree node.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>

Reviewed-by: David Sterba <dsterba@suse.com>

> ---
>  fs/btrfs/extent_io.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index c2325c3..56c9dee 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3732,6 +3732,17 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
>  	if (btrfs_header_owner(eb) == BTRFS_TREE_LOG_OBJECTID)
>  		bio_flags = EXTENT_BIO_TREE_LOG;
>  
> +	/* set btree node beyond nritems with 0 to avoid stale content */
> +	if (btrfs_header_level(eb) > 0) {

We can do the same for leaves.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Liu Bo Sept. 20, 2016, 5:57 p.m. UTC | #2
On Tue, Sep 20, 2016 at 03:16:36PM +0200, David Sterba wrote:
> On Wed, Sep 14, 2016 at 05:22:57PM -0700, Liu Bo wrote:
> > During updating btree, we could push items between sibling
> > nodes/leaves, for leaves data sections starts reversely from
> > the end of the block while for nodes we only have key pairs
> > which are stored one by one from the start of the block.
> > 
> > So we could do try to push key pairs from one node to the next
> > node right in the tree, and after that, we update the node's
> > nritems to reflect the correct end while leaving the stale
> > content in the node.  One may intentionally corrupt the fs
> > image and access the stale content by bumping the nritems and
> > causes various crashes.
> > 
> > This takes the in-memory @nritems as the correct one and
> > gets to memset the unused part of a btree node.
> > 
> > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> 
> Reviewed-by: David Sterba <dsterba@suse.com>
> 
> > ---
> >  fs/btrfs/extent_io.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > index c2325c3..56c9dee 100644
> > --- a/fs/btrfs/extent_io.c
> > +++ b/fs/btrfs/extent_io.c
> > @@ -3732,6 +3732,17 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
> >  	if (btrfs_header_owner(eb) == BTRFS_TREE_LOG_OBJECTID)
> >  		bio_flags = EXTENT_BIO_TREE_LOG;
> >  
> > +	/* set btree node beyond nritems with 0 to avoid stale content */
> > +	if (btrfs_header_level(eb) > 0) {
> 
> We can do the same for leaves.

In theory, the problem also applies to leaves, but I haven't got a
reproducer for the leaf case.

So I'll send a v2 with the leaf memset, please review that part more
carefully :)

Thanks,

-liubo
David Sterba Sept. 21, 2016, 8:04 a.m. UTC | #3
On Tue, Sep 20, 2016 at 10:57:41AM -0700, Liu Bo wrote:
> On Tue, Sep 20, 2016 at 03:16:36PM +0200, David Sterba wrote:
> > On Wed, Sep 14, 2016 at 05:22:57PM -0700, Liu Bo wrote:
> > > During updating btree, we could push items between sibling
> > > nodes/leaves, for leaves data sections starts reversely from
> > > the end of the block while for nodes we only have key pairs
> > > which are stored one by one from the start of the block.
> > > 
> > > So we could do try to push key pairs from one node to the next
> > > node right in the tree, and after that, we update the node's
> > > nritems to reflect the correct end while leaving the stale
> > > content in the node.  One may intentionally corrupt the fs
> > > image and access the stale content by bumping the nritems and
> > > causes various crashes.
> > > 
> > > This takes the in-memory @nritems as the correct one and
> > > gets to memset the unused part of a btree node.
> > > 
> > > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > 
> > Reviewed-by: David Sterba <dsterba@suse.com>
> > 
> > > ---
> > >  fs/btrfs/extent_io.c | 11 +++++++++++
> > >  1 file changed, 11 insertions(+)
> > > 
> > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > > index c2325c3..56c9dee 100644
> > > --- a/fs/btrfs/extent_io.c
> > > +++ b/fs/btrfs/extent_io.c
> > > @@ -3732,6 +3732,17 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
> > >  	if (btrfs_header_owner(eb) == BTRFS_TREE_LOG_OBJECTID)
> > >  		bio_flags = EXTENT_BIO_TREE_LOG;
> > >  
> > > +	/* set btree node beyond nritems with 0 to avoid stale content */
> > > +	if (btrfs_header_level(eb) > 0) {
> > 
> > We can do the same for leaves.
> 
> In theory, the problem also applies for leaves, but I haven't got a
> reproducer for leaf case.
> 
> So I'll update a v2 with leaf memset, please review that part more
> carefully :)

You can keep it as a separate patch, this one is fine. I didn't expect you
to reproduce a crash with a bogus nritems in a leaf, but rather to apply
the same zeroing to a leaf buffer. The magic formula is (please verify):

start = nr * sizeof(struct btrfs_disk_key);
end = nr ? btrfs_item_offset(eb, btrfs_item_nr(nr - 1)) : eb->len;

Chris Mason Sept. 21, 2016, 1:09 p.m. UTC | #4
On 09/21/2016 04:04 AM, David Sterba wrote:
> On Tue, Sep 20, 2016 at 10:57:41AM -0700, Liu Bo wrote:
>> On Tue, Sep 20, 2016 at 03:16:36PM +0200, David Sterba wrote:
>>> On Wed, Sep 14, 2016 at 05:22:57PM -0700, Liu Bo wrote:
>>>> During updating btree, we could push items between sibling
>>>> nodes/leaves, for leaves data sections starts reversely from
>>>> the end of the block while for nodes we only have key pairs
>>>> which are stored one by one from the start of the block.
>>>>
>>>> So we could do try to push key pairs from one node to the next
>>>> node right in the tree, and after that, we update the node's
>>>> nritems to reflect the correct end while leaving the stale
>>>> content in the node.  One may intentionally corrupt the fs
>>>> image and access the stale content by bumping the nritems and
>>>> causes various crashes.
>>>>
>>>> This takes the in-memory @nritems as the correct one and
>>>> gets to memset the unused part of a btree node.
>>>>
>>>> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
>>>
>>> Reviewed-by: David Sterba <dsterba@suse.com>
>>>
>>>> ---
>>>>  fs/btrfs/extent_io.c | 11 +++++++++++
>>>>  1 file changed, 11 insertions(+)
>>>>
>>>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>>>> index c2325c3..56c9dee 100644
>>>> --- a/fs/btrfs/extent_io.c
>>>> +++ b/fs/btrfs/extent_io.c
>>>> @@ -3732,6 +3732,17 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
>>>>  	if (btrfs_header_owner(eb) == BTRFS_TREE_LOG_OBJECTID)
>>>>  		bio_flags = EXTENT_BIO_TREE_LOG;
>>>>
>>>> +	/* set btree node beyond nritems with 0 to avoid stale content */
>>>> +	if (btrfs_header_level(eb) > 0) {
>>>
>>> We can do the same for leaves.
>>
>> In theory, the problem also applies for leaves, but I haven't got a
>> reproducer for leaf case.
>>
>> So I'll update a v2 with leaf memset, please review that part more
>> carefully :)
>
> You can keep it a separate patch, this one is fine. I didn't expect to
> reproduce a crash with a bogus nritems in a leaf but rather apply the
> same on a leaf buffer. The magic formula is (please verify)
>
> start = nr * sizeof(struct btrfs_disk_key);
> end = nr ? btrfs_item_offset(eb, btrfs_item_nr(nr - 1)) : eb->len;
>

Is this the start/end of the memset for the leaves?  It doesn't look right,
since leaves look like this:

item headers 0,1,2 .. N  -> [ empty space ] <- item data N, ... 2, 1, 0

The empty space is from the end of header N to the start of data N.

-chris
Liu Bo Sept. 22, 2016, 1:20 a.m. UTC | #5
On Wed, Sep 21, 2016 at 09:09:32AM -0400, Chris Mason wrote:
> 
> 
> On 09/21/2016 04:04 AM, David Sterba wrote:
> > On Tue, Sep 20, 2016 at 10:57:41AM -0700, Liu Bo wrote:
> > > On Tue, Sep 20, 2016 at 03:16:36PM +0200, David Sterba wrote:
> > > > On Wed, Sep 14, 2016 at 05:22:57PM -0700, Liu Bo wrote:
> > > > > During updating btree, we could push items between sibling
> > > > > nodes/leaves, for leaves data sections starts reversely from
> > > > > the end of the block while for nodes we only have key pairs
> > > > > which are stored one by one from the start of the block.
> > > > > 
> > > > > So we could do try to push key pairs from one node to the next
> > > > > node right in the tree, and after that, we update the node's
> > > > > nritems to reflect the correct end while leaving the stale
> > > > > content in the node.  One may intentionally corrupt the fs
> > > > > image and access the stale content by bumping the nritems and
> > > > > causes various crashes.
> > > > > 
> > > > > This takes the in-memory @nritems as the correct one and
> > > > > gets to memset the unused part of a btree node.
> > > > > 
> > > > > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > > > 
> > > > Reviewed-by: David Sterba <dsterba@suse.com>
> > > > 
> > > > > ---
> > > > >  fs/btrfs/extent_io.c | 11 +++++++++++
> > > > >  1 file changed, 11 insertions(+)
> > > > > 
> > > > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > > > > index c2325c3..56c9dee 100644
> > > > > --- a/fs/btrfs/extent_io.c
> > > > > +++ b/fs/btrfs/extent_io.c
> > > > > @@ -3732,6 +3732,17 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
> > > > >  	if (btrfs_header_owner(eb) == BTRFS_TREE_LOG_OBJECTID)
> > > > >  		bio_flags = EXTENT_BIO_TREE_LOG;
> > > > > 
> > > > > +	/* set btree node beyond nritems with 0 to avoid stale content */
> > > > > +	if (btrfs_header_level(eb) > 0) {
> > > > 
> > > > We can do the same for leaves.
> > > 
> > > In theory, the problem also applies for leaves, but I haven't got a
> > > reproducer for leaf case.
> > > 
> > > So I'll update a v2 with leaf memset, please review that part more
> > > carefully :)
> > 
> > You can keep it a separate patch, this one is fine. I didn't expect to
> > reproduce a crash with a bogus nritems in a leaf but rather apply the
> > same on a leaf buffer. The magic formula is (please verify)
> > 
> > start = nr * sizeof(struct btrfs_disk_key);
> > end = nr ? btrfs_item_offset(eb, btrfs_item_nr(nr - 1)) : eb->len;
> > 
> 
> This is the start/end of the memset for the leaves?  Doesn't look right,
> since leaves looks like this:
> 
> item headers 0,1,2 .. N  -> [ empty space ] <- item data N, ... 2, 1, 0
> 
> The empty space is from the end of header N to the start of data N.

Right, we've already got a good helper "leaf_data_end".

Thanks,

-liubo

Patch

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c2325c3..56c9dee 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3732,6 +3732,17 @@  static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 	if (btrfs_header_owner(eb) == BTRFS_TREE_LOG_OBJECTID)
 		bio_flags = EXTENT_BIO_TREE_LOG;
 
+	/* set btree node beyond nritems with 0 to avoid stale content */
+	if (btrfs_header_level(eb) > 0) {
+		u32 nritems;
+		unsigned long end;
+
+		nritems = btrfs_header_nritems(eb);
+		end = btrfs_node_key_ptr_offset(nritems);
+
+		memset_extent_buffer(eb, 0, end, eb->len - end);
+	}
+
 	for (i = 0; i < num_pages; i++) {
 		struct page *p = eb->pages[i];