
hfsplus: release bnode pages after use, not before

Message ID 1433637776-3559-1-git-send-email-saproj@gmail.com (mailing list archive)
State New, archived

Commit Message

Sergei Antonov June 7, 2015, 12:42 a.m. UTC
Fix this bug report by Sasha Levin:
http://lkml.org/lkml/2015/2/20/85 ("use after free")
Make sure mapped pages are available for the entire lifetime of hfs_bnode.

Cc: Anton Altaparmakov <aia21@cam.ac.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Sougata Santra <sougata@tuxera.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Sergei Antonov <saproj@gmail.com>
---
 fs/hfsplus/bnode.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

Comments

Viacheslav Dubeyko June 8, 2015, 3:45 p.m. UTC | #1
On Sun, 2015-06-07 at 02:42 +0200, Sergei Antonov wrote:
> Fix this bugreport by Sasha Levin:
> http://lkml.org/lkml/2015/2/20/85 ("use after free")
> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
> 

Sorry, I missed the point. What are you trying to fix? How does this
change fix the issue?

I think this fix may make sense, but it needs to be described more
deeply. Could you describe the fix in more detail?

Thanks,
Vyacheslav Dubeyko.


> Cc: Anton Altaparmakov <aia21@cam.ac.uk>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
> Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
> Cc: Sougata Santra <sougata@tuxera.com>
> Reported-by: Sasha Levin <sasha.levin@oracle.com>
> Signed-off-by: Sergei Antonov <saproj@gmail.com>
> ---
>  fs/hfsplus/bnode.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
> index 759708f..5af50fb 100644
> --- a/fs/hfsplus/bnode.c
> +++ b/fs/hfsplus/bnode.c
> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
>  			page_cache_release(page);
>  			goto fail;
>  		}
> -		page_cache_release(page);
>  		node->page[i] = page;
>  	}
>  
> @@ -566,13 +565,12 @@ node_error:
>  
>  void hfs_bnode_free(struct hfs_bnode *node)
>  {
> -#if 0
>  	int i;
>  
> -	for (i = 0; i < node->tree->pages_per_bnode; i++)
> +	for (i = 0; i < node->tree->pages_per_bnode; i++) {
>  		if (node->page[i])
>  			page_cache_release(node->page[i]);
> -#endif
> +	}
>  	kfree(node);
>  }
>  


--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sergei Antonov June 8, 2015, 4:32 p.m. UTC | #2
On 8 June 2015 at 17:45, Vyacheslav Dubeyko <slava@dubeyko.com> wrote:
> On Sun, 2015-06-07 at 02:42 +0200, Sergei Antonov wrote:
>> Fix this bugreport by Sasha Levin:
>> http://lkml.org/lkml/2015/2/20/85 ("use after free")
>> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
>>
>
> Sorry, I missed the point. What do you try to fix? How this change fixes
> the issue?
>
> I think that maybe this fix makes sense. But it needs to describe it
> more deeply. Could you describe the fix with more details?

You are basically saying you don’t understand it. Too bad, because the
bug is very simple. It is a „use after free“ type of bug, and it can
be illustrated by this:
(1) void *ptr = malloc(…);
(2) free(ptr);
(3) memcpy(…, ptr, 1);
Guess which two of these three lines are executed in the wrong order.

My patch is about the same type of bug, but with memory page mappings.
The driver currently accesses pages that may be unavailable or may
contain different data. The problem is more likely to occur when
memory is a limited resource; I reproduced it while running a
memory-hungry program.
Viacheslav Dubeyko June 8, 2015, 4:45 p.m. UTC | #3
On Mon, 2015-06-08 at 18:32 +0200, Sergei Antonov wrote:
> On 8 June 2015 at 17:45, Vyacheslav Dubeyko <slava@dubeyko.com> wrote:
> > On Sun, 2015-06-07 at 02:42 +0200, Sergei Antonov wrote:
> >> Fix this bugreport by Sasha Levin:
> >> http://lkml.org/lkml/2015/2/20/85 ("use after free")
> >> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
> >>
> >
> > Sorry, I missed the point. What do you try to fix? How this change fixes
> > the issue?
> >
> > I think that maybe this fix makes sense. But it needs to describe it
> > more deeply. Could you describe the fix with more details?
> 
> You are basically saying you don’t understand it. Too bad, because the
> bug is very simple. It is the „use after free“ type of bug, and it can
> be illustrated by this:
> (1) void *ptr = malloc(…);
> (2) free(ptr);
> (3) memcpy(…, ptr, 1);
> Guess which two of these three lines are executed in wrong order.
> 
> My patch is about the same type of bug, but with memory pages mapping.
> The driver currently accesses pages that may be unavailable, or
> contain different data. The problem is more likely to occur when
> memory is a limited resource. I reproduced it while running a
> memory-hungry program.

I am worried not about myself but about potential readers of the
description of the fix. The description is completely obscure, and it
needs to describe the fix in a clear and descriptive manner. This is my
request: please describe the fix in a clear way.

Thanks,
Vyacheslav Dubeyko.


Sergei Antonov June 8, 2015, 4:50 p.m. UTC | #4
On 8 June 2015 at 18:45, Viacheslav Dubeyko <slava@dubeyko.com> wrote:
> On Mon, 2015-06-08 at 18:32 +0200, Sergei Antonov wrote:
>> On 8 June 2015 at 17:45, Vyacheslav Dubeyko <slava@dubeyko.com> wrote:
>> > On Sun, 2015-06-07 at 02:42 +0200, Sergei Antonov wrote:
>> >> Fix this bugreport by Sasha Levin:
>> >> http://lkml.org/lkml/2015/2/20/85 ("use after free")
>> >> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
>> >>
>> >
>> > Sorry, I missed the point. What do you try to fix? How this change fixes
>> > the issue?
>> >
>> > I think that maybe this fix makes sense. But it needs to describe it
>> > more deeply. Could you describe the fix with more details?
>>
>> You are basically saying you don’t understand it. Too bad, because the
>> bug is very simple. It is the „use after free“ type of bug, and it can
>> be illustrated by this:
>> (1) void *ptr = malloc(…);
>> (2) free(ptr);
>> (3) memcpy(…, ptr, 1);
>> Guess which two of these three lines are executed in wrong order.
>>
>> My patch is about the same type of bug, but with memory pages mapping.
>> The driver currently accesses pages that may be unavailable, or
>> contain different data. The problem is more likely to occur when
>> memory is a limited resource. I reproduced it while running a
>> memory-hungry program.
>
> I worried not about myself but about potential readers of description of
> the fix. The description is completely obscure. And it needs to describe
> the fix in clear and descriptive manner. This is my request. Please,
> describe the fix in a clear way.

The description is just right.
Anton, can you give your opinion? You have commented on my patches before.
Anton Altaparmakov June 9, 2015, 6:06 p.m. UTC | #5
Hi Andrew,

Can you please take this patch up and get it merged into mainline?  Despite Vyacheslav's lamentations, this patch is obviously correct.  __hfs_bnode_create() is called in two places from the driver, and the pages it returns are kmap()-ed and used (both read and write), so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed.  Without this patch they are released before they are even kmap()-ed; the patch fixes exactly that by moving the page_cache_release() calls to after the pages are kunmap()-ed.

Feel free to add:

Reviewed-by: Anton Altaparmakov <anton@tuxera.com>

Thanks a lot in advance!

Best regards,

	Anton

> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@gmail.com> wrote:
> 
> Fix this bugreport by Sasha Levin:
> http://lkml.org/lkml/2015/2/20/85 ("use after free")
> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
> 
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
> Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
> Cc: Sougata Santra <sougata@tuxera.com>
> Reported-by: Sasha Levin <sasha.levin@oracle.com>
> Signed-off-by: Sergei Antonov <saproj@gmail.com>
> ---
> fs/hfsplus/bnode.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
> index 759708f..5af50fb 100644
> --- a/fs/hfsplus/bnode.c
> +++ b/fs/hfsplus/bnode.c
> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
> 			page_cache_release(page);
> 			goto fail;
> 		}
> -		page_cache_release(page);
> 		node->page[i] = page;
> 	}
> 
> @@ -566,13 +565,12 @@ node_error:
> 
> void hfs_bnode_free(struct hfs_bnode *node)
> {
> -#if 0
> 	int i;
> 
> -	for (i = 0; i < node->tree->pages_per_bnode; i++)
> +	for (i = 0; i < node->tree->pages_per_bnode; i++) {
> 		if (node->page[i])
> 			page_cache_release(node->page[i]);
> -#endif
> +	}
> 	kfree(node);
> }
> 
> -- 
> 2.3.0
> 
Andrew Morton June 9, 2015, 10:15 p.m. UTC | #6
On Mon, 8 Jun 2015 18:50:00 +0200 Sergei Antonov <saproj@gmail.com> wrote:

> >> You are basically saying you don’t understand it. Too bad, because the
> >> bug is very simple. It is a „use after free“ type of bug, and it can
> >> be illustrated by this:
> >> (1) void *ptr = malloc(…);
> >> (2) free(ptr);
> >> (3) memcpy(…, ptr, 1);
> >> Guess which two of these three lines are executed in wrong order.
> >>
> >> My patch is about the same type of bug, but with memory pages mapping.
> >> The driver currently accesses pages that may be unavailable, or
> >> contain different data. The problem is more likely to occur when
> >> memory is a limited resource. I reproduced it while running a
> >> memory-hungry program.
> >
> > I worried not about myself but about potential readers of description of
> > the fix. The description is completely obscure. And it needs to describe
> > the fix in clear and descriptive manner. This is my request. Please,
> > describe the fix in a clear way.
> 
> The description is just right.

Yes, I too would like to hear much more about your thinking on this,
and a detailed description of the bug and how the patch fixes it.


The code is distressingly undocumented and has been that way since
Roman Zippel's original commit in 2004.

From the looks of it, that loop in __hfs_bnode_create() is simply doing
readahead and is designed as a performance optimisation.  The pages are
pulled into pagecache in the expectation that they will soon be
accessed.  What your patch does is to instead pin the pages in
pagecache until the bnode is freed.  If we're going to do this then we
need to be very careful about worst-case scenarios: we could even run
the machine out of memory.

If I'm correct, and this is just readahead then the bug lies elsewhere:
if other parts of hfsplus are assuming that this memory is in pagecache
then that's an error - that code (where is it?) should instead be performing
a pagecache lookup and if the page isn't present, read it from disk
again.

But for others to be able to review and understand this change and
suggest alternatives, we'll need a much much better changelog!
Anton Altaparmakov June 9, 2015, 11 p.m. UTC | #7
Hi Andrew,

> On 10 Jun 2015, at 01:15, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Mon, 8 Jun 2015 18:50:00 +0200 Sergei Antonov <saproj@gmail.com> wrote:
>>>> You are basically saying you don’t understand it. Too bad, because the
>>>> bug is very simple. It is a „use after free“ type of bug, and it can
>>>> be illustrated by this:
>>>> (1) void *ptr = malloc(…);
>>>> (2) free(ptr);
>>>> (3) memcpy(…, ptr, 1);
>>>> Guess which two of these three lines are executed in wrong order.
>>>> 
>>>> My patch is about the same type of bug, but with memory pages mapping.
>>>> The driver currently accesses pages that may be unavailable, or
>>>> contain different data. The problem is more likely to occur when
>>>> memory is a limited resource. I reproduced it while running a
>>>> memory-hungry program.
>>> 
>>> I worried not about myself but about potential readers of description of
>>> the fix. The description is completely obscure. And it needs to describe
>>> the fix in clear and descriptive manner. This is my request. Please,
>>> describe the fix in a clear way.
>> 
>> The description is just right.
> 
> Yes, I too would like to hear much more about your thinking on this,
> and a detailed description of the bug and how the patch fixes it.
> 
> 
> The code is distressingly undocumented and has been that way since
> Roman Zippel's original commit in 2004.
> 
> From the looks of it, that loop in __hfs_bnode_create() is simply doing
> readahead and is designed as a performance optimisation.  The pages are
> pulled into pagecache in the expectation that they will soon be
> accessed.  What your patch does is to instead pin the pages in
> pagecache until the bnode is freed.  If we're going to do this then we
> need to be very careful about worst-case scenarios: we could even run
> the machine out of memory.
> 
> If I'm correct, and this is just readahead then the bug lies elsewhere:
> if other parts of hfsplus are assuming that this memory is in pagecache
> then that's an error - that code (where is it?) should instead be performing
> a pagecache lookup and if the page isn't present, read it from disk
> again.
> 
> But for others to be able to review and understand this change and
> suggest alternatives, we'll need a much much better changelog!

Given I had just looked at the code...  __hfs_bnode_create() is not doing read-ahead at all, from my reading of the code.  What it does is gather the needed pages, which are then immediately processed, i.e.:

hfs_bnode_find() does:

- call __hfs_bnode_create() which gathers the pages into the array of pages @node->page

- kmap() some page from @node->page array, read some information from the kmapped page then kunmap it again (this is actually insane - either all the pages should have been kmapped in __hfs_bnode_create or it should be using kmap_atomic!).

Here is the actual code:

        desc = (struct hfs_bnode_desc *)(kmap(node->page[0]) +
                        node->page_offset);
        node->prev = be32_to_cpu(desc->prev);
        node->next = be32_to_cpu(desc->next);
        node->num_recs = be16_to_cpu(desc->num_recs);
        node->type = desc->type;
        node->height = desc->height;
        kunmap(node->page[0]);

That actually makes some sense (though kmap_atomic would still be much better!) but then this follows:

	off = hfs_bnode_read_u16(node, rec_off);

and (omitting lines for clarity):

        for (i = 1; i <= node->num_recs; off = next_off, i++) {
                ...
                next_off = hfs_bnode_read_u16(node, rec_off);
                ...
                key_size = hfs_bnode_read_u16(node, off) + 2;
                ...
        }

Where all those hfs_bnode_read_u16() calls are simply "kmap, copy u16 from the kmapped page, kunmap" - so again a crazy thing to be doing: either use kmap_atomic, or probably better, just kmap all the pages in the @node->page array to start with, at the time of reading them all in...

If you consider __hfs_bnode_create to be doing readahead then all those calls to hfs_bnode_read_u16() would have to do a read_mapping_page() and incur the overhead of a radix tree lookup for each u16 read.  That would not be very good for performance/cpu usage.

Also note there is no danger of running out of RAM, as the largest allowed B-tree node size in HFS+ according to Apple's own specification is 32 KiB, i.e. with a 4 KiB page size only 8 pages maximum - readpages functions deal with significantly more pages at a time without worrying about running out of RAM.

But yes, I agree with you that HFS+ is an undocumented mess; at least Sergei is putting some effort into making it better!

Best regards,

	Anton
Anton Altaparmakov June 9, 2015, 11:08 p.m. UTC | #8
Hi Andrew,

Forgot to reply to one point you made:

> On 10 Jun 2015, at 01:15, Andrew Morton <akpm@linux-foundation.org> wrote:
> Yes, I too would like to hear much more about your thinking on this,
> and a detailed description of the bug and how the patch fixes it.

Perhaps the patch description is lacking but here is what the current code does:

struct page *page = read_mapping_page();
page_cache_release(page);
u8 *kaddr = kmap(page);
memcpy(..., kaddr, ...);
kunmap(page);

Now in what world is that a valid thing to do?  When the page_cache_release() happens the page is no longer allocated and the kmap() is referencing not-in-use memory and so is the memcpy() and so is the kunmap().

The only reason the code gets away with it is that the kmap/memcpy/kunmap follow very quickly after the page_cache_release() so the kernel has not had a chance to reuse the memory for something else.

Sergei said that he got the problem when he was running memory-intensive processes at the same time, so the kernel was thrashing/evicting/reusing page cache pages at high speed. The kmap() then obviously mapped a different page from the original that was page_cache_release()d, and thus the memcpy() effectively copied random data, which was then considered corrupt by the verification code; thus the entire B-tree node was considered corrupt and, in Sergei's case, the volume failed to mount.

And his patch changes the above to this instead:

struct page *page = read_mapping_page();
u8 *kaddr = kmap(page);
memcpy(..., kaddr, ...);
kunmap(page);
page_cache_release(page);

Which is the correct sequence of events.

Although perhaps there should also be a mark_page_accessed(page); thrown in there, too, before the page_cache_release() in the expectation that the B tree node is likely to be used again?

Best regards,

	Anton
Andrew Morton June 9, 2015, 11:16 p.m. UTC | #9
On Tue, 9 Jun 2015 23:08:48 +0000 Anton Altaparmakov <anton@tuxera.com> wrote:

> Hi Andrew,
> 
> Forgot to reply to one point you made:
> 
> > On 10 Jun 2015, at 01:15, Andrew Morton <akpm@linux-foundation.org> wrote:
> > Yes, I too would like to hear much more about your thinking on this,
> > and a detailed description of the bug and how the patch fixes it.
> 
> Perhaps the patch description is lacking but here is what the current code does:
> 
> struct page *page = read_mapping_page();
> page_cache_release(page);
> u8 *kaddr = kmap(page);
> memcpy(..., kaddr, ...);
> kunmap(page);
> 
> Now in what world is that a valid thing to do?  When the page_cache_release() happens the page is no longer allocated and the kmap() is referencing not-in-use memory and so is the memcpy() and so is the kunmap().
> 
> The only reason the code gets away with it is that the kmap/memcpy/kunmap follow very quickly after the page_cache_release() so the kernel has not had a chance to reuse the memory for something else.
> 
> Sergei said that he got a problem when he was running memory intensive processes at same time so the kernel was thrashing/evicting/reusing page cache pages at high speed and then obviously the kmap() actually mapped a different page to the original that was page_cache_release()d and thus the memcpy() effectively copied random data which was then considered corrupt by the verification code and thus the entire B tree node was considered corrupt and in Sergei's case the volume thus failed to mount.
> 
> And his patch changes the above to this instead:
> 
> struct page *page = read_mapping_page();
> u8 *kaddr = kmap(page);
> memcpy(..., kaddr, ...);
> kunmap(page);
> page_cache_release(page);
> 
> Which is the correct sequence of events.

OK, pinning 8 pages for the duration of hfs_bnode_find() sounds
reasonable.

This is a painful way to write a changelog :(

> Although perhaps there should also be a mark_page_accessed(page);
> thrown in there, too, before the page_cache_release() in the
> expectation that the B tree node is likely to be used again?

Probably.

Also, using read_mapping_page() is quite inefficient: it's a
synchronous read.  Putting a single call to read_cache_pages() before
the loop would be sufficient to get all that IO done in a single lump. 

But first we fix the bug.
Anton Altaparmakov June 9, 2015, 11:23 p.m. UTC | #10
btw. XFS does something very similar, except they actually go and vm_map_ram() the pages into a contiguous virtual memory region so they can do normal accesses into a linear buffer, instead of having to mess about with the node being spread across multiple pages as HFS+ does; and they use submit_bio() directly instead of read_mapping_page(). But the idea is much the same...  See fs/xfs/xfs_buf.c...

Best regards,

	Anton

> On 10 Jun 2015, at 02:08, Anton Altaparmakov <anton@tuxera.com> wrote:
> 
> Hi Andrew,
> 
> Forgot to reply to one point you made:
> 
>> On 10 Jun 2015, at 01:15, Andrew Morton <akpm@linux-foundation.org> wrote:
>> Yes, I too would like to hear much more about your thinking on this,
>> and a detailed description of the bug and how the patch fixes it.
> 
> Perhaps the patch description is lacking but here is what the current code does:
> 
> struct page *page = read_mapping_page();
> page_cache_release(page);
> u8 *kaddr = kmap(page);
> memcpy(..., kaddr, ...);
> kunmap(page);
> 
> Now in what world is that a valid thing to do?  When the page_cache_release() happens the page is no longer allocated and the kmap() is referencing not-in-use memory and so is the memcpy() and so is the kunmap().
> 
> The only reason the code gets away with it is that the kmap/memcpy/kunmap follow very quickly after the page_cache_release() so the kernel has not had a chance to reuse the memory for something else.
> 
> Sergei said that he got a problem when he was running memory intensive processes at same time so the kernel was thrashing/evicting/reusing page cache pages at high speed and then obviously the kmap() actually mapped a different page to the original that was page_cache_release()d and thus the memcpy() effectively copied random data which was then considered corrupt by the verification code and thus the entire B tree node was considered corrupt and in Sergei's case the volume thus failed to mount.
> 
> And his patch changes the above to this instead:
> 
> struct page *page = read_mapping_page();
> u8 *kaddr = kmap(page);
> memcpy(..., kaddr, ...);
> kunmap(page);
> page_cache_release(page);
> 
> Which is the correct sequence of events.
> 
> Although perhaps there should also be a mark_page_accessed(page); thrown in there, too, before the page_cache_release() in the expectation that the B tree node is likely to be used again?
> 
> Best regards,
> 
> 	Anton
Anton Altaparmakov June 9, 2015, 11:34 p.m. UTC | #11
Hi,

> On 10 Jun 2015, at 02:16, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Tue, 9 Jun 2015 23:08:48 +0000 Anton Altaparmakov <anton@tuxera.com> wrote:
>> Hi Andrew,
>> 
>> Forgot to reply to one point you made:
>> 
>>> On 10 Jun 2015, at 01:15, Andrew Morton <akpm@linux-foundation.org> wrote:
>>> Yes, I too would like to hear much more about your thinking on this,
>>> and a detailed description of the bug and how the patch fixes it.
>> 
>> Perhaps the patch description is lacking but here is what the current code does:
>> 
>> struct page *page = read_mapping_page();
>> page_cache_release(page);
>> u8 *kaddr = kmap(page);
>> memcpy(..., kaddr, ...);
>> kunmap(page);
>> 
>> Now in what world is that a valid thing to do?  When the page_cache_release() happens the page is no longer allocated and the kmap() is referencing not-in-use memory and so is the memcpy() and so is the kunmap().
>> 
>> The only reason the code gets away with it is that the kmap/memcpy/kunmap follow very quickly after the page_cache_release() so the kernel has not had a chance to reuse the memory for something else.
>> 
>> Sergei said that he got a problem when he was running memory intensive processes at same time so the kernel was thrashing/evicting/reusing page cache pages at high speed and then obviously the kmap() actually mapped a different page to the original that was page_cache_release()d and thus the memcpy() effectively copied random data which was then considered corrupt by the verification code and thus the entire B tree node was considered corrupt and in Sergei's case the volume thus failed to mount.
>> 
>> And his patch changes the above to this instead:
>> 
>> struct page *page = read_mapping_page();
>> u8 *kaddr = kmap(page);
>> memcpy(..., kaddr, ...);
>> kunmap(page);
>> page_cache_release(page);
>> 
>> Which is the correct sequence of events.
> 
> OK, pinning 8 pages for the duration of hfs_bnode_find() sounds
> reasonable.
> 
> This is a painful way to write a changelog :(

I will grant you that Sergei's changelog was a bit brief.  I had to wade through the code to ensure I knew what he was talking about, which the changelog should have spared me from doing.

Sergei, perhaps your take-home message is that more verbose changelogs would be a good idea: even if something is obvious to you because you have studied and worked on the code, it is not necessarily obvious to anyone else, who has likely never seen the HFS+ code except in passing.  (-;

/offtopic alert: This reminds me of the maths professor who wrote a long and very complicated proof of some difficult problem on the blackboard and finished with "... and thus this obviously concludes the proof." and an incredulous student asked him "Excuse me professor but is it obvious?".  The professor left the room came back some time later and simply answered "Yes it is" without further explanation.  (-;

>> Although perhaps there should also be a mark_page_accessed(page);
>> thrown in there, too, before the page_cache_release() in the
>> expectation that the B tree node is likely to be used again?
> 
> Probably.
> 
> Also, using read_mapping_page() is quite inefficient: it's a
> synchronous read.  Putting a single call to read_cache_pages() before
> the loop would be sufficient to get all that IO done in a single lump.

That is very true.  IIRC the code came before the advent of the ->readpages address space operation...  But yes it needs modernising...

> But first we fix the bug.

I am glad we agree on that point.  (-:

Best regards,

	Anton
Sergei Antonov June 9, 2015, 11:40 p.m. UTC | #12
On 10 June 2015 at 00:15, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Mon, 8 Jun 2015 18:50:00 +0200 Sergei Antonov <saproj@gmail.com> wrote:
>
>> >> You are basically saying you don’t understand it. Too bad, because the
>> >> bug is very simple. It is a „use after free“ type of bug, and it can
>> >> be illustrated by this:
>> >> (1) void *ptr = malloc(…);
>> >> (2) free(ptr);
>> >> (3) memcpy(…, ptr, 1);
>> >> Guess which two of these three lines are executed in wrong order.
>> >>
>> >> My patch is about the same type of bug, but with memory pages mapping.
>> >> The driver currently accesses pages that may be unavailable, or
>> >> contain different data. The problem is more likely to occur when
>> >> memory is a limited resource. I reproduced it while running a
>> >> memory-hungry program.
>> >
>> > I worried not about myself but about potential readers of description of
>> > the fix. The description is completely obscure. And it needs to describe
>> > the fix in clear and descriptive manner. This is my request. Please,
>> > describe the fix in a clear way.
>>
>> The description is just right.
>
> Yes, I too would like to hear much more about your thinking on this,
> and a detailed description of the bug and how the patch fixes it.

By calling page_cache_release() when it is OK to.

> The code is distressingly undocumented and has been that way since
> Roman Zippel's original commit in 2004.

I looked into it before submitting the patch. The code submitted in
2004 was already broken.

> From the looks of it, that loop in __hfs_bnode_create() is simply doing
> readahead and is designed as a performance optimisation.  The pages are
> pulled into pagecache in the expectation that they will soon be
> accessed.  What your patch does is to instead pin the pages in
> pagecache until the bnode is freed.  If we're going to do this then we
> need to be very careful about worst-case scenarios: we could even run
> the machine out of memory.

I did not try to change the logic of the driver, just fixed one
glaring defect - which, by the way, in addition to the aforementioned
bug reported by Sasha Levin, caused:
1. A "du"-related bug reported by Hin-Tak Leung earlier on the list.
2. https://bugzilla.kernel.org/show_bug.cgi?id=63841
3. https://bugzilla.kernel.org/show_bug.cgi?id=78761
4. https://bugzilla.kernel.org/show_bug.cgi?id=42342

> If I'm correct, and this is just readahead then the bug lies elsewhere:

I pinpointed the bug quite precisely. My test code dumped the content
of the page at the moment corrupted data was detected. Then I looked at
the dumped data and - guess what - data from the memory-hungry program
I was using had sneaked in there! So I am certain the cause of the bug
is indeed the wrong sequence of
read_mapping_page/kmap/kunmap/page_cache_release.

> if other parts of hfsplus are assuming that this memory is in pagecache
> then that's an error - that code (where is it?) should instead be performing
> a pagecache lookup and if the page isn't present, read it from disk
> again.
>
> But for others to be able to review and understand this change and
> suggest alternatives, we'll need a much much better changelog!
Hin-Tak Leung June 14, 2015, 2:27 a.m. UTC | #13
On 10 June 2015 at 00:40, Sergei Antonov <saproj@gmail.com> wrote:
...
> I did not try to change the logic of the driver. Just fixed one
> glaring defect. Which, by the way, in addition to the aforementioned
> bug by Sasha Levin caused:
> 1. A "du"-related bug reported by Hin-Tak Leung earlier in the list.
> 2. https://bugzilla.kernel.org/show_bug.cgi?id=63841
> 3. https://bugzilla.kernel.org/show_bug.cgi?id=78761
> 4. https://bugzilla.kernel.org/show_bug.cgi?id=42342

For some unknown reason majordomo won't let my regular sourceforge
account/alias post to its lists (not just fsdevel, but also git).
Anyway, I just managed to reproduce the issue - my older hardware has
been retired, so it is harder to trigger on systems with more generous
memory.
The steps are:
1. Mount a Mac OS X system disk image (I mount it read-only, via
qemu-nbd, also read-only, but those details don't seem to be important).
2. Run "du /mnt" on the Mac OS X mount repeatedly; I wasn't able to make
the kernel fault on my current system (8GB, kernel
4.0.5-300.fc22.x86_64) until I also ran 'du /' (i.e. the whole linux
system) in a separate window.
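Condensed into commands, the reproduction might look like this
(a hypothetical transcript; the image name, nbd device and mount point
are placeholders, and the commands need root):

```shell
# Hypothetical reproduction sketch -- paths and devices are placeholders.
modprobe nbd                                      # nbd driver for qemu-nbd
qemu-nbd --read-only --connect=/dev/nbd0 osx.img  # export the disk image
mount -t hfsplus -o ro /dev/nbd0 /mnt             # read-only hfsplus mount
while true; do du -s /mnt >/dev/null; done &      # hammer the b-tree nodes
du / >/dev/null                # in parallel, generate pagecache pressure
dmesg | tail                   # watch for "recoff ... too large" /
                               # "BUG: Bad page state" messages
```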

Then I get this:
--------------
[17257.917645] hfsplus: recoff 22364 too large
[17257.993329] hfsplus: recoff 22364 too large
[17274.686714] hfsplus: recoff 27136 too large
[17274.686747] hfsplus: recoff 27136 too large
[17276.643246] hfsplus: recoff 28528 too large
[17276.643283] hfsplus: recoff 28528 too large
[17276.643334] hfsplus: recoff 31034 too large
[17276.643432] hfsplus: recoff 31034 too large
[17276.644568] hfsplus: recoff 31034 too large
[17276.645408] hfsplus: recoff 31034 too large
[17276.648083] hfsplus: recoff 31034 too large
[17283.456405] hfsplus: recoff 18688 too large
[17283.484236] hfsplus: recoff 18688 too large
[17283.489637] hfsplus: recoff 18688 too large
[17283.567689] hfsplus: recoff 18688 too large
[17283.580719] hfsplus: recoff 18688 too large
[17283.586192] hfsplus: recoff 18688 too large
[17283.590115] hfsplus: recoff 18688 too large
[17283.594808] hfsplus: recoff 18688 too large
[17283.598225] hfsplus: recoff 18688 too large
[17283.608131] hfsplus: recoff 18688 too large
[17283.613090] hfsplus: recoff 18688 too large
[17283.621220] hfsplus: recoff 18688 too large
[17283.777970] hfsplus: recoff 18688 too large
[17283.816374] hfsplus: recoff 18688 too large
[17283.902398] hfsplus: recoff 18688 too large
[17284.552987] hfsplus: recoff 16933 too large
[17284.738526] hfsplus: recoff 25661 too large
[17284.738557] hfsplus: recoff 25661 too large
[17284.871063] hfsplus: recoff 14131 too large
[17284.871992] hfsplus: recoff 14131 too large
[17287.330129] hfsplus: recoff 14131 too large
[17287.685383] hfsplus: recoff 14131 too large
[17287.860485] hfsplus: recoff 14131 too large
[17287.881720] hfsplus: recoff 14131 too large
[17287.923793] hfsplus: recoff 14131 too large
[17288.133657] hfsplus: recoff 14131 too large
[17288.310771] hfsplus: recoff 14131 too large
[17288.789545] BUG: Bad page state in process firefox  pfn:22cb45
[17288.789559] page:ffffea0008b2d140 count:0 mapcount:0 mapping:
   (null) index:0x2
[17288.789563] flags: 0x5ffff800000004(referenced)
[17288.789570] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
[17288.789573] bad because of flags:
[17288.789576] flags: 0x4(referenced)
[17288.789580] Modules linked in: nls_utf8 hfsplus nbd uas usb_storage
fuse ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack
ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw vfat
fat uvcvideo videobuf2_vmalloc videobuf2_core videobuf2_memops
v4l2_common videodev ath3k btusb bluetooth media kvm_amd kvm
snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic arc4
ath9k ath9k_common snd_hda_intel crct10dif_pclmul snd_hda_controller
ath9k_hw crc32_pclmul ath snd_hda_codec crc32c_intel mac80211
snd_hwdep snd_seq snd_seq_device
[17288.789660]  snd_pcm ghash_clmulni_intel toshiba_acpi fam15h_power
edac_core edac_mce_amd cfg80211 snd_timer joydev rtsx_pci_ms memstick
serio_raw i2c_piix4 sparse_keymap k10temp wmi tpm_tis tpm video ccp
acpi_cpufreq rfkill snd soundcore shpchp toshiba_haps
toshiba_bluetooth nfsd auth_rpcgss nfs_acl lockd grace sunrpc
vboxnetadp(O) vboxnetflt(O) binfmt_misc vboxdrv(O) radeon
rtsx_pci_sdmmc mmc_core i2c_algo_bit drm_kms_helper ttm drm r8169
rtsx_pci mii mfd_core
[17288.789722] CPU: 3 PID: 2844 Comm: firefox Tainted: G           O
 4.0.5-300.fc22.x86_64 #1
[17288.789726] Hardware name: TOSHIBA SATELLITE C50D-B/ZBWAE, BIOS
1.30 06/06/2014
[17288.789730]  0000000000000000 0000000064b601fc ffff8801fff9baf8
ffffffff817834f4
[17288.789736]  0000000000000000 ffffea0008b2d140 ffff8801fff9bb28
ffffffff811a48ec
[17288.789742]  ffffffff811c0330 ffff8801fff9bce8 ffff88023ed98e38
ffff88023ed98e58
[17288.789748] Call Trace:
[17288.789762]  [<ffffffff817834f4>] dump_stack+0x45/0x57
[17288.789771]  [<ffffffff811a48ec>] bad_page.part.62+0xbc/0x110
[17288.789780]  [<ffffffff811c0330>] ? zone_statistics+0x80/0xa0
[17288.789787]  [<ffffffff811a9135>] get_page_from_freelist+0x4f5/0xab0
[17288.789794]  [<ffffffff8101265b>] ? __switch_to+0x19b/0x5e0
[17288.789802]  [<ffffffff811a998a>] __alloc_pages_nodemask+0x19a/0xa10
[17288.789808]  [<ffffffff811a7db6>] ? free_hot_cold_page_list+0x56/0xb0
[17288.789817]  [<ffffffff811f4778>] alloc_pages_vma+0xb8/0x230
[17288.789825]  [<ffffffff811d2c3c>] handle_mm_fault+0x10ec/0x1840
[17288.789832]  [<ffffffff810cea88>] ? __enqueue_entity+0x78/0x80
[17288.789839]  [<ffffffff810d8b29>] ? pick_next_task_fair+0x919/0x970
[17288.789846]  [<ffffffff81063d33>] __do_page_fault+0x193/0x440
[17288.789851]  [<ffffffff81064011>] do_page_fault+0x31/0x70
[17288.789858]  [<ffffffff8178bb08>] page_fault+0x28/0x30
[17288.789862] Disabling lock debugging due to kernel taint
[17288.790999] BUG: Bad page state in process firefox  pfn:22ce72
[17288.791042] page:ffffea0008b39c80 count:0 mapcount:0 mapping:
   (null) index:0x2
[17288.791046] flags: 0x5ffff800000004(referenced)
[17288.791053] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
[17288.791055] bad because of flags:
[17288.791057] flags: 0x4(referenced)
[17288.791062] Modules linked in: nls_utf8 hfsplus nbd uas usb_storage
fuse ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack
ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw vfat
fat uvcvideo videobuf2_vmalloc videobuf2_core videobuf2_memops
v4l2_common videodev ath3k btusb bluetooth media kvm_amd kvm
snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic arc4
ath9k ath9k_common snd_hda_intel crct10dif_pclmul snd_hda_controller
ath9k_hw crc32_pclmul ath snd_hda_codec crc32c_intel mac80211
snd_hwdep snd_seq snd_seq_device
[17288.791141]  snd_pcm ghash_clmulni_intel toshiba_acpi fam15h_power
edac_core edac_mce_amd cfg80211 snd_timer joydev rtsx_pci_ms memstick
serio_raw i2c_piix4 sparse_keymap k10temp wmi tpm_tis tpm video ccp
acpi_cpufreq rfkill snd soundcore shpchp toshiba_haps
toshiba_bluetooth nfsd auth_rpcgss nfs_acl lockd grace sunrpc
vboxnetadp(O) vboxnetflt(O) binfmt_misc vboxdrv(O) radeon
rtsx_pci_sdmmc mmc_core i2c_algo_bit drm_kms_helper ttm drm r8169
rtsx_pci mii mfd_core
[17288.791202] CPU: 3 PID: 2844 Comm: firefox Tainted: G    B      O
 4.0.5-300.fc22.x86_64 #1
[17288.791206] Hardware name: TOSHIBA SATELLITE C50D-B/ZBWAE, BIOS
1.30 06/06/2014
[17288.791209]  0000000000000000 0000000064b601fc ffff8801fff9baf8
ffffffff817834f4
[17288.791215]  0000000000000000 ffffea0008b39c80 ffff8801fff9bb28
ffffffff811a48ec
[17288.791221]  ffffffff811c0330 ffff8801fff9bce8 ffff88023ed98e38
ffff88023ed98e58
[17288.791226] Call Trace:
[17288.791241]  [<ffffffff817834f4>] dump_stack+0x45/0x57
[17288.791249]  [<ffffffff811a48ec>] bad_page.part.62+0xbc/0x110
[17288.791256]  [<ffffffff811c0330>] ? zone_statistics+0x80/0xa0
[17288.791263]  [<ffffffff811a9135>] get_page_from_freelist+0x4f5/0xab0
[17288.791271]  [<ffffffff8101265b>] ? __switch_to+0x19b/0x5e0
[17288.791278]  [<ffffffff811a998a>] __alloc_pages_nodemask+0x19a/0xa10
[17288.791283]  [<ffffffff811a7db6>] ? free_hot_cold_page_list+0x56/0xb0
[17288.791292]  [<ffffffff811f4778>] alloc_pages_vma+0xb8/0x230
[17288.791300]  [<ffffffff811d2c3c>] handle_mm_fault+0x10ec/0x1840
[17288.791307]  [<ffffffff810cea88>] ? __enqueue_entity+0x78/0x80
[17288.791314]  [<ffffffff810d8b29>] ? pick_next_task_fair+0x919/0x970
[17288.791320]  [<ffffffff81063d33>] __do_page_fault+0x193/0x440
[17288.791325]  [<ffffffff81064011>] do_page_fault+0x31/0x70
[17288.791332]  [<ffffffff8178bb08>] page_fault+0x28/0x30
[17288.791338] BUG: Bad page state in process firefox  pfn:22ce73
[17288.791342] page:ffffea0008b39cc0 count:0 mapcount:0 mapping:
   (null) index:0x2
[17288.791344] flags: 0x5ffff800000004(referenced)
[17288.791349] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
[17288.791351] bad because of flags:
[17288.791354] flags: 0x4(referenced)
[17288.791357] Modules linked in: nls_utf8 hfsplus nbd uas usb_storage
fuse ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack
ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw vfat
fat uvcvideo videobuf2_vmalloc videobuf2_core videobuf2_memops
v4l2_common videodev ath3k btusb bluetooth media kvm_amd kvm
snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic arc4
ath9k ath9k_common snd_hda_intel crct10dif_pclmul snd_hda_controller
ath9k_hw crc32_pclmul ath snd_hda_codec crc32c_intel mac80211
snd_hwdep snd_seq snd_seq_device
[17288.791419]  snd_pcm ghash_clmulni_intel toshiba_acpi fam15h_power
edac_core edac_mce_amd cfg80211 snd_timer joydev rtsx_pci_ms memstick
serio_raw i2c_piix4 sparse_keymap k10temp wmi tpm_tis tpm video ccp
acpi_cpufreq rfkill snd soundcore shpchp toshiba_haps
toshiba_bluetooth nfsd auth_rpcgss nfs_acl lockd grace sunrpc
vboxnetadp(O) vboxnetflt(O) binfmt_misc vboxdrv(O) radeon
rtsx_pci_sdmmc mmc_core i2c_algo_bit drm_kms_helper ttm drm r8169
rtsx_pci mii mfd_core
[17288.791464] CPU: 3 PID: 2844 Comm: firefox Tainted: G    B      O
 4.0.5-300.fc22.x86_64 #1
[17288.791467] Hardware name: TOSHIBA SATELLITE C50D-B/ZBWAE, BIOS
1.30 06/06/2014
[17288.791470]  0000000000000000 0000000064b601fc ffff8801fff9baf8
ffffffff817834f4
[17288.791475]  0000000000000000 ffffea0008b39cc0 ffff8801fff9bb28
ffffffff811a48ec
[17288.791480]  ffffffff811c0330 ffff8801fff9bce8 ffff88023ed98e38
ffff88023ed98e58
[17288.791486] Call Trace:
[17288.791492]  [<ffffffff817834f4>] dump_stack+0x45/0x57
[17288.791497]  [<ffffffff811a48ec>] bad_page.part.62+0xbc/0x110
[17288.791503]  [<ffffffff811c0330>] ? zone_statistics+0x80/0xa0
[17288.791509]  [<ffffffff811a9135>] get_page_from_freelist+0x4f5/0xab0
[17288.791514]  [<ffffffff8101265b>] ? __switch_to+0x19b/0x5e0
[17288.791521]  [<ffffffff811a998a>] __alloc_pages_nodemask+0x19a/0xa10
[17288.791527]  [<ffffffff811a7db6>] ? free_hot_cold_page_list+0x56/0xb0
[17288.791534]  [<ffffffff811f4778>] alloc_pages_vma+0xb8/0x230
[17288.791540]  [<ffffffff811d2c3c>] handle_mm_fault+0x10ec/0x1840
[17288.791546]  [<ffffffff810cea88>] ? __enqueue_entity+0x78/0x80
[17288.791552]  [<ffffffff810d8b29>] ? pick_next_task_fair+0x919/0x970
[17288.791558]  [<ffffffff81063d33>] __do_page_fault+0x193/0x440
[17288.791563]  [<ffffffff81064011>] do_page_fault+0x31/0x70
[17288.791568]  [<ffffffff8178bb08>] page_fault+0x28/0x30
[17289.351078] hfsplus: recoff 14131 too large
[17289.900811] hfsplus: recoff 14131 too large
[17290.059094] hfsplus: recoff 16933 too large
[17323.760264] hfsplus: recoff 27392 too large
[17323.761847] hfsplus: recoff 27392 too large
[17323.762163] hfsplus: recoff 27392 too large
[17323.762435] hfsplus: recoff 27392 too large
[17323.762545] hfsplus: recoff 27392 too large
[17323.762811] hfsplus: recoff 27392 too large
[17323.763119] hfsplus: recoff 27392 too large
[17323.763236] hfsplus: recoff 27392 too large
[17323.763345] hfsplus: recoff 27392 too large
[17323.763453] hfsplus: recoff 27392 too large
[17323.763725] hfsplus: recoff 27392 too large
[17348.390526] hfsplus: recoff 30565 too large
[17348.390629] hfsplus: recoff 30565 too large
[17371.062906] hfsplus: bad catalog entry type
[17371.062944] hfsplus: bad catalog entry type
[17387.389622] hfsplus: recoff 22364 too large
[17387.501186] hfsplus: recoff 22364 too large
[17394.430052] hfsplus: walked past end of dir
[17394.430099] hfsplus: walked past end of dir
[17395.077257] hfsplus: recoff 18688 too large
[17395.081749] hfsplus: recoff 18688 too large
[17395.086034] hfsplus: recoff 18688 too large
[17395.090322] hfsplus: recoff 18688 too large
[17395.094605] hfsplus: recoff 18688 too large
[17395.099004] hfsplus: recoff 18688 too large
[17395.103987] hfsplus: recoff 18688 too large
[17395.108969] hfsplus: recoff 18688 too large
[17395.113921] hfsplus: recoff 18688 too large
[17395.123899] hfsplus: recoff 18688 too large
[17395.128711] hfsplus: recoff 18688 too large
[17395.133313] hfsplus: recoff 18688 too large
[17395.142577] hfsplus: recoff 18688 too large
[17395.147776] hfsplus: recoff 18688 too large
[17395.157595] hfsplus: recoff 18688 too large
[17399.292963] BUG: Bad page state in process qemu-nbd  pfn:2307de
[17399.292975] page:ffffea0008c1f780 count:0 mapcount:0 mapping:
   (null) index:0x2
[17399.292979] flags: 0x5ffff800000004(referenced)
[17399.292986] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
[17399.292988] bad because of flags:
[17399.292991] flags: 0x4(referenced)
[17399.292995] Modules linked in: nls_utf8 hfsplus nbd uas usb_storage
fuse ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack
ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw vfat
fat uvcvideo videobuf2_vmalloc videobuf2_core videobuf2_memops
v4l2_common videodev ath3k btusb bluetooth media kvm_amd kvm
snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic arc4
ath9k ath9k_common snd_hda_intel crct10dif_pclmul snd_hda_controller
ath9k_hw crc32_pclmul ath snd_hda_codec crc32c_intel mac80211
snd_hwdep snd_seq snd_seq_device
[17399.293117]  snd_pcm ghash_clmulni_intel toshiba_acpi fam15h_power
edac_core edac_mce_amd cfg80211 snd_timer joydev rtsx_pci_ms memstick
serio_raw i2c_piix4 sparse_keymap k10temp wmi tpm_tis tpm video ccp
acpi_cpufreq rfkill snd soundcore shpchp toshiba_haps
toshiba_bluetooth nfsd auth_rpcgss nfs_acl lockd grace sunrpc
vboxnetadp(O) vboxnetflt(O) binfmt_misc vboxdrv(O) radeon
rtsx_pci_sdmmc mmc_core i2c_algo_bit drm_kms_helper ttm drm r8169
rtsx_pci mii mfd_core
[17399.293181] CPU: 3 PID: 14819 Comm: qemu-nbd Tainted: G    B      O
   4.0.5-300.fc22.x86_64 #1
[17399.293185] Hardware name: TOSHIBA SATELLITE C50D-B/ZBWAE, BIOS
1.30 06/06/2014
[17399.293189]  0000000000000000 00000000ffffffff ffff88010a30b948
ffffffff817834f4
[17399.293194]  0000000000000000 ffffea0008c1f780 ffff88010a30b978
ffffffff811a48ec
[17399.293199]  ffffffff811c0330 ffff88010a30bb38 ffff88023ed98e38
ffff88023ed98e58
[17399.293205] Call Trace:
[17399.293219]  [<ffffffff817834f4>] dump_stack+0x45/0x57
[17399.293227]  [<ffffffff811a48ec>] bad_page.part.62+0xbc/0x110
[17399.293234]  [<ffffffff811c0330>] ? zone_statistics+0x80/0xa0
[17399.293241]  [<ffffffff811a9135>] get_page_from_freelist+0x4f5/0xab0
[17399.293249]  [<ffffffff813965a1>] ? cfq_dispatch_requests+0xaf1/0xca0
[17399.293256]  [<ffffffff811a998a>] __alloc_pages_nodemask+0x19a/0xa10
[17399.293263]  [<ffffffff810d3cfc>] ? update_curr+0x5c/0x160
[17399.293272]  [<ffffffff811f2e21>] alloc_pages_current+0x91/0x110
[17399.293277]  [<ffffffff8119fbdf>] __page_cache_alloc+0xaf/0xd0
[17399.293284]  [<ffffffff811ad601>] __do_page_cache_readahead+0x101/0x250
[17399.293291]  [<ffffffff81117a14>] ? futex_wait+0x204/0x290
[17399.293298]  [<ffffffff811ad9c4>] ondemand_readahead+0x274/0x280
[17399.293304]  [<ffffffff811adb1e>] page_cache_sync_readahead+0x2e/0x50
[17399.293309]  [<ffffffff811a1574>] generic_file_read_iter+0x504/0x620
[17399.293315]  [<ffffffff8121c92e>] new_sync_read+0x8e/0xd0
[17399.293321]  [<ffffffff8121dc38>] __vfs_read+0x18/0x50
[17399.293326]  [<ffffffff8121dcf7>] vfs_read+0x87/0x140
[17399.293331]  [<ffffffff8121dfea>] SyS_pread64+0x9a/0xc0
[17399.293339]  [<ffffffff81789b09>] system_call_fastpath+0x12/0x17
------------------

My previous experience with this problem has a slightly different oops,
but closer to what's in
http://www.spinics.net/lists/linux-fsdevel/msg63807.html
involving hfsplus_brec_find and hfsplus_bnode_read.

I remember there were other similar reports concerning updatedb, but I
could only find
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887
No doubt there are more.

I agree that Sergei needs to explain the problem more clearly, though.
Sergei Antonov June 14, 2015, 2:18 p.m. UTC | #14
On 14 June 2015 at 04:27, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
> On 10 June 2015 at 00:40, Sergei Antonov <saproj@gmail.com> wrote:
> ...
>> I did not try to change the logic of the driver. Just fixed one
>> glaring defect. Which, by the way, in addition to the aforementioned
>> bug by Sasha Levin caused:
>> 1. A "du"-related bug reported by Hin-Tak Leung earlier in the list.
>> 2. https://bugzilla.kernel.org/show_bug.cgi?id=63841
>> 3. https://bugzilla.kernel.org/show_bug.cgi?id=78761
>> 4. https://bugzilla.kernel.org/show_bug.cgi?id=42342
>
> For some unknown reason majordomo won't let my regular sourceforge
> account/alias post
> to its lists (not just fsdevel, but also git), anyway, I just managed
> to reproduced the issue -

With the patch or without?

> my older hardware has been retired so it is harder on more-generous
> memory systems.
> The steps are:
> 1. mount mac os x system disk image (I do read-only, and also from
> qemu-nbd read-only, but those
> details don't seem to be important)
> 2. run "du /mnt" on the mac os x disk mount repeatedly; I wasn't able
> to cause the kernel
> to fault on my current system - 8GB, kernel 4.0.5-300.fc22.x86_64 ,
> until I also run 'du /' (i.e. the whole linux system) in a separate window.
> ...
> [17399.293256]  [<ffffffff811a998a>] __alloc_pages_nodemask+0x19a/0xa10
> [17399.293263]  [<ffffffff810d3cfc>] ? update_curr+0x5c/0x160
> [17399.293272]  [<ffffffff811f2e21>] alloc_pages_current+0x91/0x110
> [17399.293277]  [<ffffffff8119fbdf>] __page_cache_alloc+0xaf/0xd0
> [17399.293284]  [<ffffffff811ad601>] __do_page_cache_readahead+0x101/0x250
> [17399.293291]  [<ffffffff81117a14>] ? futex_wait+0x204/0x290
> [17399.293298]  [<ffffffff811ad9c4>] ondemand_readahead+0x274/0x280
> [17399.293304]  [<ffffffff811adb1e>] page_cache_sync_readahead+0x2e/0x50
> [17399.293309]  [<ffffffff811a1574>] generic_file_read_iter+0x504/0x620
> [17399.293315]  [<ffffffff8121c92e>] new_sync_read+0x8e/0xd0
> [17399.293321]  [<ffffffff8121dc38>] __vfs_read+0x18/0x50
> [17399.293326]  [<ffffffff8121dcf7>] vfs_read+0x87/0x140
> [17399.293331]  [<ffffffff8121dfea>] SyS_pread64+0x9a/0xc0
> [17399.293339]  [<ffffffff81789b09>] system_call_fastpath+0x12/0x17
> ------------------
>
> My previous experience with this problem had a slightly different oops,
> but closer to what's in
> http://www.spinics.net/lists/linux-fsdevel/msg63807.html
> involving hfsplus_brec_find and hfsplus_bnode_read.
>
> I remember there were other similar reports concerning updatedb, but I
> could only find
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887
> No doubt there are more.
>
> I agree that Sergei needs to explain the problem clearer though.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hin-Tak Leung June 17, 2015, 11:26 p.m. UTC | #15
On 14 June 2015 at 15:18, Sergei Antonov <saproj@gmail.com> wrote:
> On 14 June 2015 at 04:27, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>> On 10 June 2015 at 00:40, Sergei Antonov <saproj@gmail.com> wrote:
>> ...
>>> I did not try to change the logic of the driver. Just fixed one
>>> glaring defect. Which, by the way, in addition to the aforementioned
>>> bug by Sasha Levin caused:
>>> 1. A "du"-related bug reported by Hin-Tak Leung earlier in the list.
>>> 2. https://bugzilla.kernel.org/show_bug.cgi?id=63841
>>> 3. https://bugzilla.kernel.org/show_bug.cgi?id=78761
>>> 4. https://bugzilla.kernel.org/show_bug.cgi?id=42342
>>
>> For some unknown reason majordomo won't let my regular sourceforge
>> account/alias post
>> to its lists (not just fsdevel, but also git), anyway, I just managed
>> to reproduce the issue -
>
> With the patch or without?

Without. My older computer died about 9 months ago and I had not tried
to reproduce the problem on my current hardware until now. The point
is to first see the problem on my current hardware,
then test the change (which I hope to find time to do in the next few days).
Otherwise it would just go from not-seeing-the-problem to not-seeing-the-problem.

I'll write again after I've tested the change.

Hin-Tak
Sergei Antonov June 18, 2015, 1:09 p.m. UTC | #16
On 18 June 2015 at 01:26, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
> On 14 June 2015 at 15:18, Sergei Antonov <saproj@gmail.com> wrote:
>> On 14 June 2015 at 04:27, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>>> On 10 June 2015 at 00:40, Sergei Antonov <saproj@gmail.com> wrote:
>>> ...
>>>> I did not try to change the logic of the driver. Just fixed one
>>>> glaring defect. Which, by the way, in addition to the aforementioned
>>>> bug by Sasha Levin caused:
>>>> 1. A "du"-related bug reported by Hin-Tak Leung earlier in the list.
>>>> 2. https://bugzilla.kernel.org/show_bug.cgi?id=63841
>>>> 3. https://bugzilla.kernel.org/show_bug.cgi?id=78761
>>>> 4. https://bugzilla.kernel.org/show_bug.cgi?id=42342
>>>
>>> For some unknown reason majordomo won't let my regular sourceforge
>>> account/alias post
>>> to its lists (not just fsdevel, but also git), anyway, I just managed
>>> to reproduce the issue -
>>
>> With the patch or without?
>
> Without. My older computer died about 9 months ago and I had not tried
> to reproduce the problem on my current hardware until now. The point
> is to first see the problem on my current hardware,
> then test the change (which I hope to find time to do in the next few days).
> Otherwise it would just go from not-seeing-the-problem to not-seeing-the-problem.
>
> I'll write again after I've tested the change.

I'm looking forward to it.
The result of the test (which I'm sure will show the bug is gone) will
be a far more valuable contribution than your other messages in the
thread.
Patch

diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
index 759708f..5af50fb 100644
--- a/fs/hfsplus/bnode.c
+++ b/fs/hfsplus/bnode.c
@@ -454,7 +454,6 @@  static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
 			page_cache_release(page);
 			goto fail;
 		}
-		page_cache_release(page);
 		node->page[i] = page;
 	}
 
@@ -566,13 +565,12 @@  node_error:
 
 void hfs_bnode_free(struct hfs_bnode *node)
 {
-#if 0
 	int i;
 
-	for (i = 0; i < node->tree->pages_per_bnode; i++)
+	for (i = 0; i < node->tree->pages_per_bnode; i++) {
 		if (node->page[i])
 			page_cache_release(node->page[i]);
-#endif
+	}
 	kfree(node);
 }