diff mbox

hfsplus: release bnode pages after use, not before

Message ID CAJMB+Ng3MPHfMXcPvTU0udxX7_Z5+pFVhWVQuP-yzecA-QqV8w@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Hin-Tak Leung June 18, 2015, 2:51 a.m. UTC
Hi Andrew and everybody else,

I looked through the pre-git history and seem to have found the reason of how
the current code had come to be, and why Sergei's fix seems to involve
re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry,
in that the first !REF_PAGES should be REF_PAGES. Then the "remove old
debug code"
commit below did not remove the right stuff.

Looking at the old code, I think REF_PAGES may have started out being
1 (i.e. pages are released right after create, then get/put when needed, no
need to release on bnode_free),
then the code was modified for efficiency so that pages are not released
on bnode_create and not put/get at bnode_put/get but release at bnode_free.
But it was still largely working in the REF_PAGE=1 mode (and with the commented
out release at bnode_free, which is supposed to be spanned by a !REF_PAGE).
Then it was flipped over, and seems to work, and things were forgotten.

I think the release at bnode_free was commented out because the person who
wrote it wasn't sure - I am not sure either, since it seems like there might be
other/more places that one might need to release the pages than just
at bnode_free().

Does this train of thought make sense?

Hin-Tak

<commit>
commit a5e3985fa014029eb6795664c704953720cc7f7d
Author: Roman Zippel <zippel@linux-m68k.org>
Date:   Tue Sep 6 15:18:47 2005 -0700

    [PATCH] hfs: remove debug code

    This removes some old debug code, which is no longer needed.

    Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>


@@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
 {
     if (node) {
         atomic_inc(&node->refcnt);
-#if REF_PAGES
-        {
-        int i;
-        for (i = 0; i < node->tree->pages_per_bnode; i++)
-            get_page(node->page[i]);
-        }
-#endif
         dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
                node->tree->cnid, node->this, atomic_read(&node->refcnt));
     }
@@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
                node->tree->cnid, node->this, atomic_read(&node->refcnt));
         if (!atomic_read(&node->refcnt))
             BUG();
-        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
-#if REF_PAGES
-            for (i = 0; i < tree->pages_per_bnode; i++)
-                put_page(node->page[i]);
-#endif
+        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
             return;
-        }
         for (i = 0; i < tree->pages_per_bnode; i++) {
             if (!node->page[i])
                 continue;
             mark_page_accessed(node->page[i]);
-#if REF_PAGES
-            put_page(node->page[i]);
-#endif
         }

         if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
</commit>

On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@users.sourceforge.net> wrote:
> ------------------------------
> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote:
>
>>Hi Andrew,
>>
>>Can you please take this patch up and get it merged into mainline?  Despite Vyacheslav's lamentations this patch is obviously correct.  __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed.
>>
>>Feel free to add:
>>
>>Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
>>
>>Thanks a lot in advance!
>>
>>Best regards,
>>
>>    Anton
>
> I went around and looked at the code and Anton is right -  __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong.
>
> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point.
>
> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead.
> I'm looking over the pre-git history to see how that comes about.
>
> Hin-Tak
>
>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@gmail.com> wrote:
>>>
>>> Fix this bugreport by Sasha Levin:
>>> http://lkml.org/lkml/2015/2/20/85 ("use after free")
>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
>>>
>>> Cc: Al Viro <viro@zeniv.linux.org.uk>
>>> Cc: Christoph Hellwig <hch@infradead.org>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
>>> Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
>>> Cc: Sougata Santra <sougata@tuxera.com>
>>> Reported-by: Sasha Levin <sasha.levin@oracle.com>
>>> Signed-off-by: Sergei Antonov <saproj@gmail.com>
>>> ---
>>> fs/hfsplus/bnode.c | 6 ++----
>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>> index 759708f..5af50fb 100644
>>> --- a/fs/hfsplus/bnode.c
>>> +++ b/fs/hfsplus/bnode.c
>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
>>>             page_cache_release(page);
>>>             goto fail;
>>>         }
>>> -        page_cache_release(page);
>>>         node->page[i] = page;
>>>     }
>>>
>>> @@ -566,13 +565,12 @@ node_error:
>>>
>>> void hfs_bnode_free(struct hfs_bnode *node)
>>> {
>>> -#if 0
>>>     int i;
>>>
>>> -    for (i = 0; i < node->tree->pages_per_bnode; i++)
>>> +    for (i = 0; i < node->tree->pages_per_bnode; i++) {
>>>         if (node->page[i])
>>>             page_cache_release(node->page[i]);
>>> -#endif
>>> +    }
>>>     kfree(node);
>>> }
>>>
>>> --
>>> 2.3.0
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>--
>>Anton Altaparmakov <anton at tuxera.com> (replace at with @)
>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
>>Linux NTFS maintainer
>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Sergei Antonov June 18, 2015, 12:58 p.m. UTC | #1
On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
> Hi Andrew and everybody else,
>
> I looked through the pre-git history and seem to have found the reason of how
> the current code had come to be, and why Sergei's fix seems to involve
> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry,
> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old
> debug code"
> commit below did not remove the right stuff.
>
> Looking at the old code, I think REF_PAGES may have started out being
> 1 (i.e. pages are released right after create, then get/put when needed, no
> need to release on bnode_free),
> then the code was modified for efficiency so that pages are not released
> on bnode_create and not put/get at bnode_put/get but release at bnode_free.
> But it was still largely working in the REF_PAGE=1 mode (and with the commented
> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE).
> Then it was flipped over, and seems to work, and things were forgotten.
>
> I think the release at bnode_free was commented out because the person who
> wrote it wasn't sure - I am not sure either, since it seems like there might be
> other/more places that one might need to release the pages than just
> at bnode_free().
>
> Does this train of thought make sense?

No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well,
at first it was my thought too. But later I considered what the logic
of the author could have been with REF_PAGES=0 and REF_PAGES=1 and
came to a conclusion that REF_PAGES=1 was a safe approach and
REF_PAGES=0 was an experimental unsafe release-early approach, so
!REF_PAGES was what he really meant.

>
> Hin-Tak
>
> <commit>
> commit a5e3985fa014029eb6795664c704953720cc7f7d
> Author: Roman Zippel <zippel@linux-m68k.org>
> Date:   Tue Sep 6 15:18:47 2005 -0700
>
>     [PATCH] hfs: remove debug code
>
>     This removes some old debug code, which is no longer needed.
>
>     Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
>     Signed-off-by: Andrew Morton <akpm@osdl.org>
>     Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>
> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
> index a096c5a..3d5cdc6 100644
> --- a/fs/hfs/bnode.c
> +++ b/fs/hfs/bnode.c
> @@ -13,8 +13,6 @@
>
>  #include "btree.h"
>
> -#define REF_PAGES    0
> -
>  void hfs_bnode_read(struct hfs_bnode *node, void *buf,
>          int off, int len)
>  {
> @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
> hfs_btree *tree, u32 cnid)
>              page_cache_release(page);
>              goto fail;
>          }
> -#if !REF_PAGES
>          page_cache_release(page);
> -#endif
>          node->page[i] = page;
>      }
>
> @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>  {
>      if (node) {
>          atomic_inc(&node->refcnt);
> -#if REF_PAGES
> -        {
> -        int i;
> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
> -            get_page(node->page[i]);
> -        }
> -#endif
>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>      }
> @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>          if (!atomic_read(&node->refcnt))
>              BUG();
> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
> -#if REF_PAGES
> -            for (i = 0; i < tree->pages_per_bnode; i++)
> -                put_page(node->page[i]);
> -#endif
> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>              return;
> -        }
>          for (i = 0; i < tree->pages_per_bnode; i++) {
>              if (!node->page[i])
>                  continue;
>              mark_page_accessed(node->page[i]);
> -#if REF_PAGES
> -            put_page(node->page[i]);
> -#endif
>          }
>
>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
> index 8868d3b..b85abc6 100644
> --- a/fs/hfsplus/bnode.c
> +++ b/fs/hfsplus/bnode.c
> @@ -18,8 +18,6 @@
>  #include "hfsplus_fs.h"
>  #include "hfsplus_raw.h"
>
> -#define REF_PAGES    0
> -
>  /* Copy a specified range of bytes from the raw data of a node */
>  void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
>  {
> @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
> hfs_btree *tree, u32 cnid)
>              page_cache_release(page);
>              goto fail;
>          }
> -#if !REF_PAGES
>          page_cache_release(page);
> -#endif
>          node->page[i] = page;
>      }
>
> @@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>  {
>      if (node) {
>          atomic_inc(&node->refcnt);
> -#if REF_PAGES
> -        {
> -        int i;
> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
> -            get_page(node->page[i]);
> -        }
> -#endif
>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>      }
> @@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>          if (!atomic_read(&node->refcnt))
>              BUG();
> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
> -#if REF_PAGES
> -            for (i = 0; i < tree->pages_per_bnode; i++)
> -                put_page(node->page[i]);
> -#endif
> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>              return;
> -        }
>          for (i = 0; i < tree->pages_per_bnode; i++) {
>              if (!node->page[i])
>                  continue;
>              mark_page_accessed(node->page[i]);
> -#if REF_PAGES
> -            put_page(node->page[i]);
> -#endif
>          }
>
>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
> </commit>
>
> On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@users.sourceforge.net> wrote:
>> ------------------------------
>> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote:
>>
>>>Hi Andrew,
>>>
>>>Can you please take this patch up and get it merged into mainline?  Despite Vyacheslav's lamentations this patch is obviously correct.  __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed.
>>>
>>>Feel free to add:
>>>
>>>Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
>>>
>>>Thanks a lot in advance!
>>>
>>>Best regards,
>>>
>>>    Anton
>>
>> I went around and looked at the code and Anton is right -  __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong.
>>
>> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point.
>>
>> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead.
>> I'm looking over the pre-git history to see how that comes about.
>>
>> Hin-Tak
>>
>>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@gmail.com> wrote:
>>>>
>>>> Fix this bugreport by Sasha Levin:
>>>> http://lkml.org/lkml/2015/2/20/85 ("use after free")
>>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
>>>>
>>>> Cc: Al Viro <viro@zeniv.linux.org.uk>
>>>> Cc: Christoph Hellwig <hch@infradead.org>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
>>>> Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
>>>> Cc: Sougata Santra <sougata@tuxera.com>
>>>> Reported-by: Sasha Levin <sasha.levin@oracle.com>
>>>> Signed-off-by: Sergei Antonov <saproj@gmail.com>
>>>> ---
>>>> fs/hfsplus/bnode.c | 6 ++----
>>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>>> index 759708f..5af50fb 100644
>>>> --- a/fs/hfsplus/bnode.c
>>>> +++ b/fs/hfsplus/bnode.c
>>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
>>>>             page_cache_release(page);
>>>>             goto fail;
>>>>         }
>>>> -        page_cache_release(page);
>>>>         node->page[i] = page;
>>>>     }
>>>>
>>>> @@ -566,13 +565,12 @@ node_error:
>>>>
>>>> void hfs_bnode_free(struct hfs_bnode *node)
>>>> {
>>>> -#if 0
>>>>     int i;
>>>>
>>>> -    for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>> +    for (i = 0; i < node->tree->pages_per_bnode; i++) {
>>>>         if (node->page[i])
>>>>             page_cache_release(node->page[i]);
>>>> -#endif
>>>> +    }
>>>>     kfree(node);
>>>> }
>>>>
>>>> --
>>>> 2.3.0
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>--
>>>Anton Altaparmakov <anton at tuxera.com> (replace at with @)
>>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
>>>Linux NTFS maintainer
>>>
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hin-Tak Leung June 18, 2015, 3:37 p.m. UTC | #2
On 18 June 2015 at 13:58, Sergei Antonov <saproj@gmail.com> wrote:
> On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>> Hi Andrew and everybody else,
>>
>> I looked through the pre-git history and seem to have found the reason of how
>> the current code had come to be, and why Sergei's fix seems to involve
>> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry,
>> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old
>> debug code"
>> commit below did not remove the right stuff.
>>
>> Looking at the old code, I think REF_PAGES may have started out being
>> 1 (i.e. pages are released right after create, then get/put when needed, no
>> need to release on bnode_free),
>> then the code was modified for efficiency so that pages are not released
>> on bnode_create and not put/get at bnode_put/get but release at bnode_free.
>> But it was still largely working in the REF_PAGE=1 mode (and with the commented
>> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE).
>> Then it was flipped over, and seems to work, and things were forgotten.
>>
>> I think the release at bnode_free was commented out because the person who
>> wrote it wasn't sure - I am not sure either, since it seems like there might be
>> other/more places that one might need to release the pages than just
>> at bnode_free().
>>
>> Does this train of thought make sense?
>
> No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well,
> at first it was my thought too. But later I considered what the logic
> of the author could have been with REF_PAGES=0 and REF_PAGES=1 and
> came to a conclusion that REF_PAGES=1 was a safe approach and
> REF_PAGES=0 was an experimental unsafe release-early approach, so
> !REF_PAGES was what he really meant.
>

You are wrong, it seems. I found even earlier versions of the code,
with history, at
http://sourceforge.net/projects/linux-hfsplus/ .
The earlier version of the code
(http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/)
does get_page and put_page at bnode_get and bnode_put, while the later version
(http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/
)
have those removed, dated 2001 december and 2002 june.

The change log says so:

Date:   Thu Jun 6 09:45:14 2002 +0000

    Use buffer cache instead of page cache in bnode.c. Cache inode extents.

The change also removed any page_cache_release() . It would seem that
the additional REF_PAGES
conditional was a fork/rewrite/clean-up which re-uses the earlier
development logic, and eventually took over.

The release-early code came first, not after. That makes sense, as
releasing early
and read_cache_page on each bnode_get is the slower, safer but
inefficient approach.
It would be absurd to go from caching pages, then to 'experiment' to
see if release-early + later read_cache_page
on each bnode_get 'improves'.

>>
>> Hin-Tak
>>
>> <commit>
>> commit a5e3985fa014029eb6795664c704953720cc7f7d
>> Author: Roman Zippel <zippel@linux-m68k.org>
>> Date:   Tue Sep 6 15:18:47 2005 -0700
>>
>>     [PATCH] hfs: remove debug code
>>
>>     This removes some old debug code, which is no longer needed.
>>
>>     Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
>>     Signed-off-by: Andrew Morton <akpm@osdl.org>
>>     Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>>
>> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
>> index a096c5a..3d5cdc6 100644
>> --- a/fs/hfs/bnode.c
>> +++ b/fs/hfs/bnode.c
>> @@ -13,8 +13,6 @@
>>
>>  #include "btree.h"
>>
>> -#define REF_PAGES    0
>> -
>>  void hfs_bnode_read(struct hfs_bnode *node, void *buf,
>>          int off, int len)
>>  {
>> @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>> hfs_btree *tree, u32 cnid)
>>              page_cache_release(page);
>>              goto fail;
>>          }
>> -#if !REF_PAGES
>>          page_cache_release(page);
>> -#endif
>>          node->page[i] = page;
>>      }
>>
>> @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>>  {
>>      if (node) {
>>          atomic_inc(&node->refcnt);
>> -#if REF_PAGES
>> -        {
>> -        int i;
>> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
>> -            get_page(node->page[i]);
>> -        }
>> -#endif
>>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>      }
>> @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>          if (!atomic_read(&node->refcnt))
>>              BUG();
>> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
>> -#if REF_PAGES
>> -            for (i = 0; i < tree->pages_per_bnode; i++)
>> -                put_page(node->page[i]);
>> -#endif
>> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>>              return;
>> -        }
>>          for (i = 0; i < tree->pages_per_bnode; i++) {
>>              if (!node->page[i])
>>                  continue;
>>              mark_page_accessed(node->page[i]);
>> -#if REF_PAGES
>> -            put_page(node->page[i]);
>> -#endif
>>          }
>>
>>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>> index 8868d3b..b85abc6 100644
>> --- a/fs/hfsplus/bnode.c
>> +++ b/fs/hfsplus/bnode.c
>> @@ -18,8 +18,6 @@
>>  #include "hfsplus_fs.h"
>>  #include "hfsplus_raw.h"
>>
>> -#define REF_PAGES    0
>> -
>>  /* Copy a specified range of bytes from the raw data of a node */
>>  void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
>>  {
>> @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>> hfs_btree *tree, u32 cnid)
>>              page_cache_release(page);
>>              goto fail;
>>          }
>> -#if !REF_PAGES
>>          page_cache_release(page);
>> -#endif
>>          node->page[i] = page;
>>      }
>>
>> @@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>>  {
>>      if (node) {
>>          atomic_inc(&node->refcnt);
>> -#if REF_PAGES
>> -        {
>> -        int i;
>> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
>> -            get_page(node->page[i]);
>> -        }
>> -#endif
>>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>      }
>> @@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>          if (!atomic_read(&node->refcnt))
>>              BUG();
>> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
>> -#if REF_PAGES
>> -            for (i = 0; i < tree->pages_per_bnode; i++)
>> -                put_page(node->page[i]);
>> -#endif
>> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>>              return;
>> -        }
>>          for (i = 0; i < tree->pages_per_bnode; i++) {
>>              if (!node->page[i])
>>                  continue;
>>              mark_page_accessed(node->page[i]);
>> -#if REF_PAGES
>> -            put_page(node->page[i]);
>> -#endif
>>          }
>>
>>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
>> </commit>
>>
>> On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@users.sourceforge.net> wrote:
>>> ------------------------------
>>> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote:
>>>
>>>>Hi Andrew,
>>>>
>>>>Can you please take this patch up and get it merged into mainline?  Despite Vyacheslav's lamentations this patch is obviously correct.  __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed.
>>>>
>>>>Feel free to add:
>>>>
>>>>Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
>>>>
>>>>Thanks a lot in advance!
>>>>
>>>>Best regards,
>>>>
>>>>    Anton
>>>
>>> I went around and looked at the code and Anton is right -  __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong.
>>>
>>> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point.
>>>
>>> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead.
>>> I'm looking over the pre-git history to see how that comes about.
>>>
>>> Hin-Tak
>>>
>>>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@gmail.com> wrote:
>>>>>
>>>>> Fix this bugreport by Sasha Levin:
>>>>> http://lkml.org/lkml/2015/2/20/85 ("use after free")
>>>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
>>>>>
>>>>> Cc: Al Viro <viro@zeniv.linux.org.uk>
>>>>> Cc: Christoph Hellwig <hch@infradead.org>
>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>> Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
>>>>> Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
>>>>> Cc: Sougata Santra <sougata@tuxera.com>
>>>>> Reported-by: Sasha Levin <sasha.levin@oracle.com>
>>>>> Signed-off-by: Sergei Antonov <saproj@gmail.com>
>>>>> ---
>>>>> fs/hfsplus/bnode.c | 6 ++----
>>>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>>>> index 759708f..5af50fb 100644
>>>>> --- a/fs/hfsplus/bnode.c
>>>>> +++ b/fs/hfsplus/bnode.c
>>>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
>>>>>             page_cache_release(page);
>>>>>             goto fail;
>>>>>         }
>>>>> -        page_cache_release(page);
>>>>>         node->page[i] = page;
>>>>>     }
>>>>>
>>>>> @@ -566,13 +565,12 @@ node_error:
>>>>>
>>>>> void hfs_bnode_free(struct hfs_bnode *node)
>>>>> {
>>>>> -#if 0
>>>>>     int i;
>>>>>
>>>>> -    for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>>> +    for (i = 0; i < node->tree->pages_per_bnode; i++) {
>>>>>         if (node->page[i])
>>>>>             page_cache_release(node->page[i]);
>>>>> -#endif
>>>>> +    }
>>>>>     kfree(node);
>>>>> }
>>>>>
>>>>> --
>>>>> 2.3.0
>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>--
>>>>Anton Altaparmakov <anton at tuxera.com> (replace at with @)
>>>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
>>>>Linux NTFS maintainer
>>>>
>>>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sergei Antonov June 18, 2015, 4:19 p.m. UTC | #3
On 18 June 2015 at 17:37, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
> On 18 June 2015 at 13:58, Sergei Antonov <saproj@gmail.com> wrote:
>> On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>>> Hi Andrew and everybody else,
>>>
>>> I looked through the pre-git history and seem to have found the reason of how
>>> the current code had come to be, and why Sergei's fix seems to involve
>>> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry,
>>> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old
>>> debug code"
>>> commit below did not remove the right stuff.
>>>
>>> Looking at the old code, I think REF_PAGES may have started out being
>>> 1 (i.e. pages are released right after create, then get/put when needed, no
>>> need to release on bnode_free),
>>> then the code was modified for efficiency so that pages are not released
>>> on bnode_create and not put/get at bnode_put/get but release at bnode_free.
>>> But it was still largely working in the REF_PAGE=1 mode (and with the commented
>>> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE).
>>> Then it was flipped over, and seems to work, and things were forgotten.
>>>
>>> I think the release at bnode_free was commented out because the person who
>>> wrote it wasn't sure - I am not sure either, since it seems like there might be
>>> other/more places that one might need to release the pages than just
>>> at bnode_free().
>>>
>>> Does this train of thought make sense?
>>
>> No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well,
>> at first it was my thought too. But later I considered what the logic
>> of the author could have been with REF_PAGES=0 and REF_PAGES=1 and
>> came to a conclusion that REF_PAGES=1 was a safe approach and
>> REF_PAGES=0 was an experimental unsafe release-early approach, so
>> !REF_PAGES was what he really meant.
>>
>
> You are wrong, it seems. I found even earlier versions of the code,
> with history, at
> http://sourceforge.net/projects/linux-hfsplus/ .
> The earlier version of the code
> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/)
> does get_page and put_page at bnode_get and bnode_put, while the later version
> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/
> )
> have those removed, dated 2001 december and 2002 june.

Interesting. I only looked at historical linux git.

> The change log says so:
>
> Date:   Thu Jun 6 09:45:14 2002 +0000
>
>     Use buffer cache instead of page cache in bnode.c. Cache inode extents.
>
> The change also removed any page_cache_release() . It would seem that
> the additional REF_PAGES
> conditional was a fork/rewrite/clean-up which re-uses the earlier
> development logic, and eventually took over.
>
> The release-early code came first, not after. That makes sense, as
> releasing early

We may be talking about different things. I did not write what code
came earlier and what after. Anyway, this becomes to difficult to
follow. My point was that in the variant without refs, calling
page_cache_release() right after to read_cache_page() was exactly what
the author wanted (although it is a bug).

Again, we are probably arguing about different things. And I'm a
little less zealous than you to do this code archeology.

> and read_cache_page on each bnode_get is the slower, safer but
> inefficient approach.
> It would be absurd to go from caching pages, then to 'experiment' to
> see if release-early + later read_cache_page
> on each bnode_get 'improves'.
>
>>>
>>> Hin-Tak
>>>
>>> <commit>
>>> commit a5e3985fa014029eb6795664c704953720cc7f7d
>>> Author: Roman Zippel <zippel@linux-m68k.org>
>>> Date:   Tue Sep 6 15:18:47 2005 -0700
>>>
>>>     [PATCH] hfs: remove debug code
>>>
>>>     This removes some old debug code, which is no longer needed.
>>>
>>>     Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
>>>     Signed-off-by: Andrew Morton <akpm@osdl.org>
>>>     Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>>>
>>> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
>>> index a096c5a..3d5cdc6 100644
>>> --- a/fs/hfs/bnode.c
>>> +++ b/fs/hfs/bnode.c
>>> @@ -13,8 +13,6 @@
>>>
>>>  #include "btree.h"
>>>
>>> -#define REF_PAGES    0
>>> -
>>>  void hfs_bnode_read(struct hfs_bnode *node, void *buf,
>>>          int off, int len)
>>>  {
>>> @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>>> hfs_btree *tree, u32 cnid)
>>>              page_cache_release(page);
>>>              goto fail;
>>>          }
>>> -#if !REF_PAGES
>>>          page_cache_release(page);
>>> -#endif
>>>          node->page[i] = page;
>>>      }
>>>
>>> @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>>>  {
>>>      if (node) {
>>>          atomic_inc(&node->refcnt);
>>> -#if REF_PAGES
>>> -        {
>>> -        int i;
>>> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
>>> -            get_page(node->page[i]);
>>> -        }
>>> -#endif
>>>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>      }
>>> @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>          if (!atomic_read(&node->refcnt))
>>>              BUG();
>>> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
>>> -#if REF_PAGES
>>> -            for (i = 0; i < tree->pages_per_bnode; i++)
>>> -                put_page(node->page[i]);
>>> -#endif
>>> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>>>              return;
>>> -        }
>>>          for (i = 0; i < tree->pages_per_bnode; i++) {
>>>              if (!node->page[i])
>>>                  continue;
>>>              mark_page_accessed(node->page[i]);
>>> -#if REF_PAGES
>>> -            put_page(node->page[i]);
>>> -#endif
>>>          }
>>>
>>>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>> index 8868d3b..b85abc6 100644
>>> --- a/fs/hfsplus/bnode.c
>>> +++ b/fs/hfsplus/bnode.c
>>> @@ -18,8 +18,6 @@
>>>  #include "hfsplus_fs.h"
>>>  #include "hfsplus_raw.h"
>>>
>>> -#define REF_PAGES    0
>>> -
>>>  /* Copy a specified range of bytes from the raw data of a node */
>>>  void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
>>>  {
>>> @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>>> hfs_btree *tree, u32 cnid)
>>>              page_cache_release(page);
>>>              goto fail;
>>>          }
>>> -#if !REF_PAGES
>>>          page_cache_release(page);
>>> -#endif
>>>          node->page[i] = page;
>>>      }
>>>
>>> @@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>>>  {
>>>      if (node) {
>>>          atomic_inc(&node->refcnt);
>>> -#if REF_PAGES
>>> -        {
>>> -        int i;
>>> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
>>> -            get_page(node->page[i]);
>>> -        }
>>> -#endif
>>>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>      }
>>> @@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>          if (!atomic_read(&node->refcnt))
>>>              BUG();
>>> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
>>> -#if REF_PAGES
>>> -            for (i = 0; i < tree->pages_per_bnode; i++)
>>> -                put_page(node->page[i]);
>>> -#endif
>>> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>>>              return;
>>> -        }
>>>          for (i = 0; i < tree->pages_per_bnode; i++) {
>>>              if (!node->page[i])
>>>                  continue;
>>>              mark_page_accessed(node->page[i]);
>>> -#if REF_PAGES
>>> -            put_page(node->page[i]);
>>> -#endif
>>>          }
>>>
>>>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
>>> </commit>
>>>
>>> On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@users.sourceforge.net> wrote:
>>>> ------------------------------
>>>> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote:
>>>>
>>>>>Hi Andrew,
>>>>>
>>>>>Can you please take this patch up and get it merged into mainline?  Despite Vyacheslav's lamentations this patch is obviously correct.  __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed.
>>>>>
>>>>>Feel free to add:
>>>>>
>>>>>Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
>>>>>
>>>>>Thanks a lot in advance!
>>>>>
>>>>>Best regards,
>>>>>
>>>>>    Anton
>>>>
>>>> I went around and looked at the code and Anton is right -  __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong.
>>>>
>>>> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point.
>>>>
>>>> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead.
>>>> I'm looking over the pre-git history to see how that comes about.
>>>>
>>>> Hin-Tak
>>>>
>>>>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@gmail.com> wrote:
>>>>>>
>>>>>> Fix this bugreport by Sasha Levin:
>>>>>> http://lkml.org/lkml/2015/2/20/85 ("use after free")
>>>>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
>>>>>>
>>>>>> Cc: Al Viro <viro@zeniv.linux.org.uk>
>>>>>> Cc: Christoph Hellwig <hch@infradead.org>
>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>>> Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
>>>>>> Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
>>>>>> Cc: Sougata Santra <sougata@tuxera.com>
>>>>>> Reported-by: Sasha Levin <sasha.levin@oracle.com>
>>>>>> Signed-off-by: Sergei Antonov <saproj@gmail.com>
>>>>>> ---
>>>>>> fs/hfsplus/bnode.c | 6 ++----
>>>>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>>>>> index 759708f..5af50fb 100644
>>>>>> --- a/fs/hfsplus/bnode.c
>>>>>> +++ b/fs/hfsplus/bnode.c
>>>>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
>>>>>>             page_cache_release(page);
>>>>>>             goto fail;
>>>>>>         }
>>>>>> -        page_cache_release(page);
>>>>>>         node->page[i] = page;
>>>>>>     }
>>>>>>
>>>>>> @@ -566,13 +565,12 @@ node_error:
>>>>>>
>>>>>> void hfs_bnode_free(struct hfs_bnode *node)
>>>>>> {
>>>>>> -#if 0
>>>>>>     int i;
>>>>>>
>>>>>> -    for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>>>> +    for (i = 0; i < node->tree->pages_per_bnode; i++) {
>>>>>>         if (node->page[i])
>>>>>>             page_cache_release(node->page[i]);
>>>>>> -#endif
>>>>>> +    }
>>>>>>     kfree(node);
>>>>>> }
>>>>>>
>>>>>> --
>>>>>> 2.3.0
>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>>--
>>>>>Anton Altaparmakov <anton at tuxera.com> (replace at with @)
>>>>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
>>>>>Linux NTFS maintainer
>>>>>
>>>>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hin-Tak Leung June 18, 2015, 5:16 p.m. UTC | #4
On 18 June 2015 at 17:19, Sergei Antonov <saproj@gmail.com> wrote:
> On 18 June 2015 at 17:37, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>> On 18 June 2015 at 13:58, Sergei Antonov <saproj@gmail.com> wrote:
>>> On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>>>> Hi Andrew and everybody else,
>>>>
>>>> I looked through the pre-git history and seem to have found the reason of how
>>>> the current code had come to be, and why Sergei's fix seems to involve
>>>> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry,
>>>> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old
>>>> debug code"
>>>> commit below did not remove the right stuff.
>>>>
>>>> Looking at the old code, I think REF_PAGES may have started out being
>>>> 1 (i.e. pages are released right after create, then get/put when needed, no
>>>> need to release on bnode_free),
>>>> then the code was modified for efficiency so that pages are not released
>>>> on bnode_create and not put/get at bnode_put/get but release at bnode_free.
>>>> But it was still largely working in the REF_PAGE=1 mode (and with the commented
>>>> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE).
>>>> Then it was flipped over, and seems to work, and things were forgotten.
>>>>
>>>> I think the release at bnode_free was commented out because the person who
>>>> wrote it wasn't sure - I am not sure either, since it seems like there might be
>>>> other/more places that one might need to release the pages than just
>>>> at bnode_free().
>>>>
>>>> Does this train of thought make sense?
>>>
>>> No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well,
>>> at first it was my thought too. But later I considered what the logic
>>> of the author could have been with REF_PAGES=0 and REF_PAGES=1 and
>>> came to a conclusion that REF_PAGES=1 was a safe approach and
>>> REF_PAGES=0 was an experimental unsafe release-early approach, so
>>> !REF_PAGES was what he really meant.
>>>
>>
>> You are wrong, it seems. I found even earlier versions of the code,
>> with history, at
>> http://sourceforge.net/projects/linux-hfsplus/ .
>> The earlier version of the code
>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/)
>> does get_page and put_page at bnode_get and bnode_put, while the later version
>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/
>> )
>> have those removed, dated 2001 december and 2002 june.
>
> Interesting. I only looked at historical linux git.
>
>> The change log says so:
>>
>> Date:   Thu Jun 6 09:45:14 2002 +0000
>>
>>     Use buffer cache instead of page cache in bnode.c. Cache inode extents.
>>
>> The change also removed any page_cache_release() . It would seem that
>> the additional REF_PAGES
>> conditional was a fork/rewrite/clean-up which re-uses the earlier
>> development logic, and eventually took over.
>>
>> The release-early code came first, not after. That makes sense, as
>> releasing early
>
> We may be talking about different things. I did not write what code
> came earlier and what after. Anyway, this becomes to difficult to
> follow. My point was that in the variant without refs, calling
> page_cache_release() right after to read_cache_page() was exactly what
> the author wanted (although it is a bug).
>
> Again, we are probably arguing about different things. And I'm a
> little less zealous than you to do this code archeology.
>

I would really prefer that you don't use personal descriptions such as "zealous"
as well as making the kind of comments such as
"The result of the test (which I'm sure will show the bug is gone) will
be by far more valuable contribution than your other messages in the
thread.". Those are not helpful.

In my first reply I have already stated my reasoning: your fix consists
mainly of removing an "if 0" and re-enabling seemingly dead code. So the
dead code was written for a purpose which is now lost and forgotten.

And here is that purpose - early development of the code was done
with getting and putting pages within each routine, for safeness, and migrating
towards caching pages for a longer life cycle in 2001-2002. When the
code was cleaned
up and re-worked before putting into the kernel in 2005, this
development process
was repeated but with a REF_PAGES conditional and some additional code
which was // commented out.

The migration to caching pages
would have been more or less correct, if REF_PAGES was flipped
and the // commented out code was enabled. Except there was a mistake
in one of the conditionals, and also the commented out code never got
enabled. The commented out code was where a !REF_PAGES supposed to be.
the rest should all be all plain "if REF_PAGES". The singular "if !REF_PAGES"
was a mistake that has propagated.


>> and read_cache_page on each bnode_get is the slower, safer but
>> inefficient approach.
>> It would be absurd to go from caching pages, then to 'experiment' to
>> see if release-early + later read_cache_page
>> on each bnode_get 'improves'.
>>
>>>>
>>>> Hin-Tak
>>>>
>>>> <commit>
>>>> commit a5e3985fa014029eb6795664c704953720cc7f7d
>>>> Author: Roman Zippel <zippel@linux-m68k.org>
>>>> Date:   Tue Sep 6 15:18:47 2005 -0700
>>>>
>>>>     [PATCH] hfs: remove debug code
>>>>
>>>>     This removes some old debug code, which is no longer needed.
>>>>
>>>>     Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
>>>>     Signed-off-by: Andrew Morton <akpm@osdl.org>
>>>>     Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>>>>
>>>> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
>>>> index a096c5a..3d5cdc6 100644
>>>> --- a/fs/hfs/bnode.c
>>>> +++ b/fs/hfs/bnode.c
>>>> @@ -13,8 +13,6 @@
>>>>
>>>>  #include "btree.h"
>>>>
>>>> -#define REF_PAGES    0
>>>> -
>>>>  void hfs_bnode_read(struct hfs_bnode *node, void *buf,
>>>>          int off, int len)
>>>>  {
>>>> @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>>>> hfs_btree *tree, u32 cnid)
>>>>              page_cache_release(page);
>>>>              goto fail;
>>>>          }
>>>> -#if !REF_PAGES
>>>>          page_cache_release(page);
>>>> -#endif
>>>>          node->page[i] = page;
>>>>      }
>>>>
>>>> @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>>>>  {
>>>>      if (node) {
>>>>          atomic_inc(&node->refcnt);
>>>> -#if REF_PAGES
>>>> -        {
>>>> -        int i;
>>>> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>> -            get_page(node->page[i]);
>>>> -        }
>>>> -#endif
>>>>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>      }
>>>> @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>          if (!atomic_read(&node->refcnt))
>>>>              BUG();
>>>> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
>>>> -#if REF_PAGES
>>>> -            for (i = 0; i < tree->pages_per_bnode; i++)
>>>> -                put_page(node->page[i]);
>>>> -#endif
>>>> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>>>>              return;
>>>> -        }
>>>>          for (i = 0; i < tree->pages_per_bnode; i++) {
>>>>              if (!node->page[i])
>>>>                  continue;
>>>>              mark_page_accessed(node->page[i]);
>>>> -#if REF_PAGES
>>>> -            put_page(node->page[i]);
>>>> -#endif
>>>>          }
>>>>
>>>>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>>> index 8868d3b..b85abc6 100644
>>>> --- a/fs/hfsplus/bnode.c
>>>> +++ b/fs/hfsplus/bnode.c
>>>> @@ -18,8 +18,6 @@
>>>>  #include "hfsplus_fs.h"
>>>>  #include "hfsplus_raw.h"
>>>>
>>>> -#define REF_PAGES    0
>>>> -
>>>>  /* Copy a specified range of bytes from the raw data of a node */
>>>>  void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
>>>>  {
>>>> @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>>>> hfs_btree *tree, u32 cnid)
>>>>              page_cache_release(page);
>>>>              goto fail;
>>>>          }
>>>> -#if !REF_PAGES
>>>>          page_cache_release(page);
>>>> -#endif
>>>>          node->page[i] = page;
>>>>      }
>>>>
>>>> @@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>>>>  {
>>>>      if (node) {
>>>>          atomic_inc(&node->refcnt);
>>>> -#if REF_PAGES
>>>> -        {
>>>> -        int i;
>>>> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>> -            get_page(node->page[i]);
>>>> -        }
>>>> -#endif
>>>>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>      }
>>>> @@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>          if (!atomic_read(&node->refcnt))
>>>>              BUG();
>>>> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
>>>> -#if REF_PAGES
>>>> -            for (i = 0; i < tree->pages_per_bnode; i++)
>>>> -                put_page(node->page[i]);
>>>> -#endif
>>>> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>>>>              return;
>>>> -        }
>>>>          for (i = 0; i < tree->pages_per_bnode; i++) {
>>>>              if (!node->page[i])
>>>>                  continue;
>>>>              mark_page_accessed(node->page[i]);
>>>> -#if REF_PAGES
>>>> -            put_page(node->page[i]);
>>>> -#endif
>>>>          }
>>>>
>>>>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
>>>> </commit>
>>>>
>>>> On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@users.sourceforge.net> wrote:
>>>>> ------------------------------
>>>>> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote:
>>>>>
>>>>>>Hi Andrew,
>>>>>>
>>>>>>Can you please take this patch up and get it merged into mainline?  Despite Vyacheslav's lamentations this patch is obviously correct.  __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed.
>>>>>>
>>>>>>Feel free to add:
>>>>>>
>>>>>>Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
>>>>>>
>>>>>>Thanks a lot in advance!
>>>>>>
>>>>>>Best regards,
>>>>>>
>>>>>>    Anton
>>>>>
>>>>> I went around and looked at the code and Anton is right -  __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong.
>>>>>
>>>>> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point.
>>>>>
>>>>> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead.
>>>>> I'm looking over the pre-git history to see how that comes about.
>>>>>
>>>>> Hin-Tak
>>>>>
>>>>>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@gmail.com> wrote:
>>>>>>>
>>>>>>> Fix this bugreport by Sasha Levin:
>>>>>>> http://lkml.org/lkml/2015/2/20/85 ("use after free")
>>>>>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
>>>>>>>
>>>>>>> Cc: Al Viro <viro@zeniv.linux.org.uk>
>>>>>>> Cc: Christoph Hellwig <hch@infradead.org>
>>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>>>> Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
>>>>>>> Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
>>>>>>> Cc: Sougata Santra <sougata@tuxera.com>
>>>>>>> Reported-by: Sasha Levin <sasha.levin@oracle.com>
>>>>>>> Signed-off-by: Sergei Antonov <saproj@gmail.com>
>>>>>>> ---
>>>>>>> fs/hfsplus/bnode.c | 6 ++----
>>>>>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>>>>>> index 759708f..5af50fb 100644
>>>>>>> --- a/fs/hfsplus/bnode.c
>>>>>>> +++ b/fs/hfsplus/bnode.c
>>>>>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
>>>>>>>             page_cache_release(page);
>>>>>>>             goto fail;
>>>>>>>         }
>>>>>>> -        page_cache_release(page);
>>>>>>>         node->page[i] = page;
>>>>>>>     }
>>>>>>>
>>>>>>> @@ -566,13 +565,12 @@ node_error:
>>>>>>>
>>>>>>> void hfs_bnode_free(struct hfs_bnode *node)
>>>>>>> {
>>>>>>> -#if 0
>>>>>>>     int i;
>>>>>>>
>>>>>>> -    for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>>>>> +    for (i = 0; i < node->tree->pages_per_bnode; i++) {
>>>>>>>         if (node->page[i])
>>>>>>>             page_cache_release(node->page[i]);
>>>>>>> -#endif
>>>>>>> +    }
>>>>>>>     kfree(node);
>>>>>>> }
>>>>>>>
>>>>>>> --
>>>>>>> 2.3.0
>>>>>>>
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>>--
>>>>>>Anton Altaparmakov <anton at tuxera.com> (replace at with @)
>>>>>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
>>>>>>Linux NTFS maintainer
>>>>>>
>>>>>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sergei Antonov June 18, 2015, 8:51 p.m. UTC | #5
On 18 June 2015 at 19:16, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
> On 18 June 2015 at 17:19, Sergei Antonov <saproj@gmail.com> wrote:
>> On 18 June 2015 at 17:37, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>>> On 18 June 2015 at 13:58, Sergei Antonov <saproj@gmail.com> wrote:
>>>> On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>>>>> Hi Andrew and everybody else,
>>>>>
>>>>> I looked through the pre-git history and seem to have found the reason of how
>>>>> the current code had come to be, and why Sergei's fix seems to involve
>>>>> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry,
>>>>> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old
>>>>> debug code"
>>>>> commit below did not remove the right stuff.
>>>>>
>>>>> Looking at the old code, I think REF_PAGES may have started out being
>>>>> 1 (i.e. pages are released right after create, then get/put when needed, no
>>>>> need to release on bnode_free),
>>>>> then the code was modified for efficiency so that pages are not released
>>>>> on bnode_create and not put/get at bnode_put/get but release at bnode_free.
>>>>> But it was still largely working in the REF_PAGE=1 mode (and with the commented
>>>>> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE).
>>>>> Then it was flipped over, and seems to work, and things were forgotten.
>>>>>
>>>>> I think the release at bnode_free was commented out because the person who
>>>>> wrote it wasn't sure - I am not sure either, since it seems like there might be
>>>>> other/more places that one might need to release the pages than just
>>>>> at bnode_free().
>>>>>
>>>>> Does this train of thought make sense?
>>>>
>>>> No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well,
>>>> at first it was my thought too. But later I considered what the logic
>>>> of the author could have been with REF_PAGES=0 and REF_PAGES=1 and
>>>> came to a conclusion that REF_PAGES=1 was a safe approach and
>>>> REF_PAGES=0 was an experimental unsafe release-early approach, so
>>>> !REF_PAGES was what he really meant.
>>>>
>>>
>>> You are wrong, it seems. I found even earlier versions of the code,
>>> with history, at
>>> http://sourceforge.net/projects/linux-hfsplus/ .
>>> The earlier version of the code
>>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/)
>>> does get_page and put_page at bnode_get and bnode_put, while the later version
>>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/
>>> )
>>> have those removed, dated 2001 december and 2002 june.
>>
>> Interesting. I only looked at historical linux git.
>>
>>> The change log says so:
>>>
>>> Date:   Thu Jun 6 09:45:14 2002 +0000
>>>
>>>     Use buffer cache instead of page cache in bnode.c. Cache inode extents.
>>>
>>> The change also removed any page_cache_release() . It would seem that
>>> the additional REF_PAGES
>>> conditional was a fork/rewrite/clean-up which re-uses the earlier
>>> development logic, and eventually took over.
>>>
>>> The release-early code came first, not after. That makes sense, as
>>> releasing early
>>
>> We may be talking about different things. I did not write what code
>> came earlier and what after. Anyway, this becomes to difficult to
>> follow. My point was that in the variant without refs, calling
>> page_cache_release() right after to read_cache_page() was exactly what
>> the author wanted (although it is a bug).
>>
>> Again, we are probably arguing about different things. And I'm a
>> little less zealous than you to do this code archeology.
>>
>
> I would really prefer that you don't use personal descriptions such as "zealous"
> as well as making the kind of comments such as
> "The result of the test (which I'm sure will show the bug is gone) will
> be by far more valuable contribution than your other messages in the
> thread.". Those are not helpful.
>
> In my first reply I have already stated my reasoning: your fix consists
> mainly of removing an "if 0" and re-enabling seemingly dead code. So the
> dead code was written for a purpose which is now lost and forgotten.

Of course. And it is not hard to guess that purpose quite reliably :).
The code used to release pages correctly, i. e. when freeing the bnode
structure.

> And here is that purpose - early development of the code was done
> with getting and putting pages within each routine, for safeness, and migrating
> towards caching pages for a longer life cycle in 2001-2002. When the
> code was cleaned
> up and re-worked before putting into the kernel in 2005, this
> development process
> was repeated but with a REF_PAGES conditional and some additional code
> which was // commented out.
>
> The migration to caching pages
> would have been more or less correct, if REF_PAGES was flipped
> and the // commented out code was enabled.  Except there was a mistake
> in one of the conditionals

By mistake you mean !REF_PAGES in __hfs_bnode_create()? But it is not
a mistake under the conditions you provided above. With REF_PAGES
flipped to 1 and the // piece uncommented you don't want to release in
__hfs_bnode_create() because releasing is done in hfs_bnode_free().

>, and also the commented out code never got
> enabled. The commented out code was where a !REF_PAGES supposed to be.
> the rest should all be all plain "if REF_PAGES". The singular "if !REF_PAGES"
> was a mistake that has propagated.
>
>
>>> and read_cache_page on each bnode_get is the slower, safer but
>>> inefficient approach.
>>> It would be absurd to go from caching pages, then to 'experiment' to
>>> see if release-early + later read_cache_page
>>> on each bnode_get 'improves'.
>>>
>>>>>
>>>>> Hin-Tak
>>>>>
>>>>> <commit>
>>>>> commit a5e3985fa014029eb6795664c704953720cc7f7d
>>>>> Author: Roman Zippel <zippel@linux-m68k.org>
>>>>> Date:   Tue Sep 6 15:18:47 2005 -0700
>>>>>
>>>>>     [PATCH] hfs: remove debug code
>>>>>
>>>>>     This removes some old debug code, which is no longer needed.
>>>>>
>>>>>     Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
>>>>>     Signed-off-by: Andrew Morton <akpm@osdl.org>
>>>>>     Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>>>>>
>>>>> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
>>>>> index a096c5a..3d5cdc6 100644
>>>>> --- a/fs/hfs/bnode.c
>>>>> +++ b/fs/hfs/bnode.c
>>>>> @@ -13,8 +13,6 @@
>>>>>
>>>>>  #include "btree.h"
>>>>>
>>>>> -#define REF_PAGES    0
>>>>> -
>>>>>  void hfs_bnode_read(struct hfs_bnode *node, void *buf,
>>>>>          int off, int len)
>>>>>  {
>>>>> @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>>>>> hfs_btree *tree, u32 cnid)
>>>>>              page_cache_release(page);
>>>>>              goto fail;
>>>>>          }
>>>>> -#if !REF_PAGES
>>>>>          page_cache_release(page);
>>>>> -#endif
>>>>>          node->page[i] = page;
>>>>>      }
>>>>>
>>>>> @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>>>>>  {
>>>>>      if (node) {
>>>>>          atomic_inc(&node->refcnt);
>>>>> -#if REF_PAGES
>>>>> -        {
>>>>> -        int i;
>>>>> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>>> -            get_page(node->page[i]);
>>>>> -        }
>>>>> -#endif
>>>>>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>>      }
>>>>> @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>>          if (!atomic_read(&node->refcnt))
>>>>>              BUG();
>>>>> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
>>>>> -#if REF_PAGES
>>>>> -            for (i = 0; i < tree->pages_per_bnode; i++)
>>>>> -                put_page(node->page[i]);
>>>>> -#endif
>>>>> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>>>>>              return;
>>>>> -        }
>>>>>          for (i = 0; i < tree->pages_per_bnode; i++) {
>>>>>              if (!node->page[i])
>>>>>                  continue;
>>>>>              mark_page_accessed(node->page[i]);
>>>>> -#if REF_PAGES
>>>>> -            put_page(node->page[i]);
>>>>> -#endif
>>>>>          }
>>>>>
>>>>>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>>>> index 8868d3b..b85abc6 100644
>>>>> --- a/fs/hfsplus/bnode.c
>>>>> +++ b/fs/hfsplus/bnode.c
>>>>> @@ -18,8 +18,6 @@
>>>>>  #include "hfsplus_fs.h"
>>>>>  #include "hfsplus_raw.h"
>>>>>
>>>>> -#define REF_PAGES    0
>>>>> -
>>>>>  /* Copy a specified range of bytes from the raw data of a node */
>>>>>  void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
>>>>>  {
>>>>> @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>>>>> hfs_btree *tree, u32 cnid)
>>>>>              page_cache_release(page);
>>>>>              goto fail;
>>>>>          }
>>>>> -#if !REF_PAGES
>>>>>          page_cache_release(page);
>>>>> -#endif
>>>>>          node->page[i] = page;
>>>>>      }
>>>>>
>>>>> @@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>>>>>  {
>>>>>      if (node) {
>>>>>          atomic_inc(&node->refcnt);
>>>>> -#if REF_PAGES
>>>>> -        {
>>>>> -        int i;
>>>>> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>>> -            get_page(node->page[i]);
>>>>> -        }
>>>>> -#endif
>>>>>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>>      }
>>>>> @@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>>          if (!atomic_read(&node->refcnt))
>>>>>              BUG();
>>>>> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
>>>>> -#if REF_PAGES
>>>>> -            for (i = 0; i < tree->pages_per_bnode; i++)
>>>>> -                put_page(node->page[i]);
>>>>> -#endif
>>>>> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>>>>>              return;
>>>>> -        }
>>>>>          for (i = 0; i < tree->pages_per_bnode; i++) {
>>>>>              if (!node->page[i])
>>>>>                  continue;
>>>>>              mark_page_accessed(node->page[i]);
>>>>> -#if REF_PAGES
>>>>> -            put_page(node->page[i]);
>>>>> -#endif
>>>>>          }
>>>>>
>>>>>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
>>>>> </commit>
>>>>>
>>>>> On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@users.sourceforge.net> wrote:
>>>>>> ------------------------------
>>>>>> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote:
>>>>>>
>>>>>>>Hi Andrew,
>>>>>>>
>>>>>>>Can you please take this patch up and get it merged into mainline?  Despite Vyacheslav's lamentations this patch is obviously correct.  __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed.
>>>>>>>
>>>>>>>Feel free to add:
>>>>>>>
>>>>>>>Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
>>>>>>>
>>>>>>>Thanks a lot in advance!
>>>>>>>
>>>>>>>Best regards,
>>>>>>>
>>>>>>>    Anton
>>>>>>
>>>>>> I went around and looked at the code and Anton is right -  __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong.
>>>>>>
>>>>>> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point.
>>>>>>
>>>>>> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead.
>>>>>> I'm looking over the pre-git history to see how that comes about.
>>>>>>
>>>>>> Hin-Tak
>>>>>>
>>>>>>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Fix this bugreport by Sasha Levin:
>>>>>>>> http://lkml.org/lkml/2015/2/20/85 ("use after free")
>>>>>>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
>>>>>>>>
>>>>>>>> Cc: Al Viro <viro@zeniv.linux.org.uk>
>>>>>>>> Cc: Christoph Hellwig <hch@infradead.org>
>>>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>>>>> Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
>>>>>>>> Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
>>>>>>>> Cc: Sougata Santra <sougata@tuxera.com>
>>>>>>>> Reported-by: Sasha Levin <sasha.levin@oracle.com>
>>>>>>>> Signed-off-by: Sergei Antonov <saproj@gmail.com>
>>>>>>>> ---
>>>>>>>> fs/hfsplus/bnode.c | 6 ++----
>>>>>>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>>>>>>> index 759708f..5af50fb 100644
>>>>>>>> --- a/fs/hfsplus/bnode.c
>>>>>>>> +++ b/fs/hfsplus/bnode.c
>>>>>>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
>>>>>>>>             page_cache_release(page);
>>>>>>>>             goto fail;
>>>>>>>>         }
>>>>>>>> -        page_cache_release(page);
>>>>>>>>         node->page[i] = page;
>>>>>>>>     }
>>>>>>>>
>>>>>>>> @@ -566,13 +565,12 @@ node_error:
>>>>>>>>
>>>>>>>> void hfs_bnode_free(struct hfs_bnode *node)
>>>>>>>> {
>>>>>>>> -#if 0
>>>>>>>>     int i;
>>>>>>>>
>>>>>>>> -    for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>>>>>> +    for (i = 0; i < node->tree->pages_per_bnode; i++) {
>>>>>>>>         if (node->page[i])
>>>>>>>>             page_cache_release(node->page[i]);
>>>>>>>> -#endif
>>>>>>>> +    }
>>>>>>>>     kfree(node);
>>>>>>>> }
>>>>>>>>
>>>>>>>> --
>>>>>>>> 2.3.0
>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>
>>>>>>>--
>>>>>>>Anton Altaparmakov <anton at tuxera.com> (replace at with @)
>>>>>>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
>>>>>>>Linux NTFS maintainer
>>>>>>>
>>>>>>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hin-Tak Leung June 18, 2015, 10:16 p.m. UTC | #6
On 18 June 2015 at 21:51, Sergei Antonov <saproj@gmail.com> wrote:
> On 18 June 2015 at 19:16, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>> On 18 June 2015 at 17:19, Sergei Antonov <saproj@gmail.com> wrote:
>>> On 18 June 2015 at 17:37, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>>>> On 18 June 2015 at 13:58, Sergei Antonov <saproj@gmail.com> wrote:
>>>>> On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>>>>>> Hi Andrew and everybody else,
>>>>>>
>>>>>> I looked through the pre-git history and seem to have found the reason of how
>>>>>> the current code had come to be, and why Sergei's fix seems to involve
>>>>>> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry,
>>>>>> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old
>>>>>> debug code"
>>>>>> commit below did not remove the right stuff.
>>>>>>
>>>>>> Looking at the old code, I think REF_PAGES may have started out being
>>>>>> 1 (i.e. pages are released right after create, then get/put when needed, no
>>>>>> need to release on bnode_free),
>>>>>> then the code was modified for efficiency so that pages are not released
>>>>>> on bnode_create and not put/get at bnode_put/get but release at bnode_free.
>>>>>> But it was still largely working in the REF_PAGE=1 mode (and with the commented
>>>>>> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE).
>>>>>> Then it was flipped over, and seems to work, and things were forgotten.
>>>>>>
>>>>>> I think the release at bnode_free was commented out because the person who
>>>>>> wrote it wasn't sure - I am not sure either, since it seems like there might be
>>>>>> other/more places that one might need to release the pages than just
>>>>>> at bnode_free().
>>>>>>
>>>>>> Does this train of thought make sense?
>>>>>
>>>>> No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well,
>>>>> at first it was my thought too. But later I considered what the logic
>>>>> of the author could have been with REF_PAGES=0 and REF_PAGES=1 and
>>>>> came to a conclusion that REF_PAGES=1 was a safe approach and
>>>>> REF_PAGES=0 was an experimental unsafe release-early approach, so
>>>>> !REF_PAGES was what he really meant.
>>>>>
>>>>
>>>> You are wrong, it seems. I found even earlier versions of the code,
>>>> with history, at
>>>> http://sourceforge.net/projects/linux-hfsplus/ .
>>>> The earlier version of the code
>>>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/)
>>>> does get_page and put_page at bnode_get and bnode_put, while the later version
>>>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/
>>>> )
>>>> have those removed, dated 2001 december and 2002 june.
>>>
>>> Interesting. I only looked at historical linux git.
>>>
>>>> The change log says so:
>>>>
>>>> Date:   Thu Jun 6 09:45:14 2002 +0000
>>>>
>>>>     Use buffer cache instead of page cache in bnode.c. Cache inode extents.
>>>>
>>>> The change also removed any page_cache_release() . It would seem that
>>>> the additional REF_PAGES
>>>> conditional was a fork/rewrite/clean-up which re-uses the earlier
>>>> development logic, and eventually took over.
>>>>
>>>> The release-early code came first, not after. That makes sense, as
>>>> releasing early
>>>
>>> We may be talking about different things. I did not write what code
>>> came earlier and what after. Anyway, this becomes to difficult to
>>> follow. My point was that in the variant without refs, calling
>>> page_cache_release() right after to read_cache_page() was exactly what
>>> the author wanted (although it is a bug).
>>>
>>> Again, we are probably arguing about different things. And I'm a
>>> little less zealous than you to do this code archeology.
>>>
>>
>> I would really prefer that you don't use personal descriptions such as "zealous"
>> as well as making the kind of comments such as
>> "The result of the test (which I'm sure will show the bug is gone) will
>> be by far more valuable contribution than your other messages in the
>> thread.". Those are not helpful.
>>
>> In my first reply I have already stated my reasoning: your fix consists
>> mainly of removing an "if 0" and re-enabling seemingly dead code. So the
>> dead code was written for a purpose which is now lost and forgotten.
>
> Of course. And it is not hard to guess that purpose quite reliably :).
> The code used to release pages correctly, i. e. when freeing the bnode
> structure.
>
>> And here is that purpose - early development of the code was done
>> with getting and putting pages within each routine, for safeness, and migrating
>> towards caching pages for a longer life cycle in 2001-2002. When the
>> code was cleaned
>> up and re-worked before putting into the kernel in 2005, this
>> development process
>> was repeated but with a REF_PAGES conditional and some additional code
>> which was // commented out.
>>
>> The migration to caching pages
>> would have been more or less correct, if REF_PAGES was flipped
>> and the // commented out code was enabled.  Except there was a mistake
>> in one of the conditionals
>
> By mistake you mean !REF_PAGES in __hfs_bnode_create()? But it is not
> a mistake under the conditions you provided above. With REF_PAGES
> flipped to 1 and the // piece uncommented you don't want to release in
> __hfs_bnode_create() because releasing is done in hfs_bnode_free().
>

Yes, that's what I am saying. I believe the development of the code happens
as follows:

- bnode_get() does get_page(), bnode_put() does put_page(), and the rest
of the routines - i.e. bnode_create() look up the pages but
immediately release them. bnode_free()
does nothing about pages.

- then it was desired to keep the pages longer without the repeated
get_pages and put_pages.
so those are "#if REF_PAGES" out. And we no longer get and put pages on demand
and should also stop releasing them after looking up at bnode_create() also.
(the bnode_create() is a poor name for what it does, but never mind). New code
is added to release at bnode_free(). The new code should have been spanned
by the opposite, i.e. "#if !REF_PAGES", but was inserted as place
holder with //.

The rest of the story I have already tried to explain a couple of
times. The switch
from REF_PAGES=1 to =0 (i.e. from get_pages/put_pages per
bnode_get/bnode_put routines,
to keeping pages around longer between bnode_create/bnode_free) did
not quite work correctly.

For the others who want to see the version of the code with those REF_PAGES
and want to weigh in on this,
I cannot find a web interface for the repo, but it is the commit below
in git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git

commit 91556682e0bf004d98a529bf829d339abb98bbbd
Author: Andrew Morton <akpm@osdl.org>
Date:   Wed Feb 25 16:17:48 2004 -0800

    [PATCH] HFS+ support

    From: Roman Zippel <zippel@linux-m68k.org>

    This driver adds full read/write support for HFS+ and is based on the
    readonly driver by Brad Broyer.

    Thanks to Ethan Benson <erbenson@alaska.net> for a number of patches to
    make the driver more compliant with the spec.

I would also repeat that earlier code and change history was at
http://sourceforge.net/projects/linux-hfsplus/, and it showed the
changes between
2001 dec to 2002 june

from get_page/put_page per bnode_get/bnode_put to removing those and
switching to a longer page life-cycle with the log message:

 "Use buffer cache instead of page cache in bnode.c. Cache inode extents."


>>, and also the commented out code never got
>> enabled. The commented out code was where a !REF_PAGES supposed to be.
>> the rest should all be all plain "if REF_PAGES". The singular "if !REF_PAGES"
>> was a mistake that has propagated.
>>
>>
>>>> and read_cache_page on each bnode_get is the slower, safer but
>>>> inefficient approach.
>>>> It would be absurd to go from caching pages, then to 'experiment' to
>>>> see if release-early + later read_cache_page
>>>> on each bnode_get 'improves'.
>>>>
>>>>>>
>>>>>> Hin-Tak
>>>>>>
>>>>>> <commit>
>>>>>> commit a5e3985fa014029eb6795664c704953720cc7f7d
>>>>>> Author: Roman Zippel <zippel@linux-m68k.org>
>>>>>> Date:   Tue Sep 6 15:18:47 2005 -0700
>>>>>>
>>>>>>     [PATCH] hfs: remove debug code
>>>>>>
>>>>>>     This removes some old debug code, which is no longer needed.
>>>>>>
>>>>>>     Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
>>>>>>     Signed-off-by: Andrew Morton <akpm@osdl.org>
>>>>>>     Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>>>>>>
>>>>>> diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
>>>>>> index a096c5a..3d5cdc6 100644
>>>>>> --- a/fs/hfs/bnode.c
>>>>>> +++ b/fs/hfs/bnode.c
>>>>>> @@ -13,8 +13,6 @@
>>>>>>
>>>>>>  #include "btree.h"
>>>>>>
>>>>>> -#define REF_PAGES    0
>>>>>> -
>>>>>>  void hfs_bnode_read(struct hfs_bnode *node, void *buf,
>>>>>>          int off, int len)
>>>>>>  {
>>>>>> @@ -289,9 +287,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>>>>>> hfs_btree *tree, u32 cnid)
>>>>>>              page_cache_release(page);
>>>>>>              goto fail;
>>>>>>          }
>>>>>> -#if !REF_PAGES
>>>>>>          page_cache_release(page);
>>>>>> -#endif
>>>>>>          node->page[i] = page;
>>>>>>      }
>>>>>>
>>>>>> @@ -449,13 +445,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>>>>>>  {
>>>>>>      if (node) {
>>>>>>          atomic_inc(&node->refcnt);
>>>>>> -#if REF_PAGES
>>>>>> -        {
>>>>>> -        int i;
>>>>>> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>>>> -            get_page(node->page[i]);
>>>>>> -        }
>>>>>> -#endif
>>>>>>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>>>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>>>      }
>>>>>> @@ -472,20 +461,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>>>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>>>          if (!atomic_read(&node->refcnt))
>>>>>>              BUG();
>>>>>> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
>>>>>> -#if REF_PAGES
>>>>>> -            for (i = 0; i < tree->pages_per_bnode; i++)
>>>>>> -                put_page(node->page[i]);
>>>>>> -#endif
>>>>>> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>>>>>>              return;
>>>>>> -        }
>>>>>>          for (i = 0; i < tree->pages_per_bnode; i++) {
>>>>>>              if (!node->page[i])
>>>>>>                  continue;
>>>>>>              mark_page_accessed(node->page[i]);
>>>>>> -#if REF_PAGES
>>>>>> -            put_page(node->page[i]);
>>>>>> -#endif
>>>>>>          }
>>>>>>
>>>>>>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
>>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>>>>> index 8868d3b..b85abc6 100644
>>>>>> --- a/fs/hfsplus/bnode.c
>>>>>> +++ b/fs/hfsplus/bnode.c
>>>>>> @@ -18,8 +18,6 @@
>>>>>>  #include "hfsplus_fs.h"
>>>>>>  #include "hfsplus_raw.h"
>>>>>>
>>>>>> -#define REF_PAGES    0
>>>>>> -
>>>>>>  /* Copy a specified range of bytes from the raw data of a node */
>>>>>>  void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
>>>>>>  {
>>>>>> @@ -450,9 +448,7 @@ static struct hfs_bnode *__hfs_bnode_create(struct
>>>>>> hfs_btree *tree, u32 cnid)
>>>>>>              page_cache_release(page);
>>>>>>              goto fail;
>>>>>>          }
>>>>>> -#if !REF_PAGES
>>>>>>          page_cache_release(page);
>>>>>> -#endif
>>>>>>          node->page[i] = page;
>>>>>>      }
>>>>>>
>>>>>> @@ -612,13 +608,6 @@ void hfs_bnode_get(struct hfs_bnode *node)
>>>>>>  {
>>>>>>      if (node) {
>>>>>>          atomic_inc(&node->refcnt);
>>>>>> -#if REF_PAGES
>>>>>> -        {
>>>>>> -        int i;
>>>>>> -        for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>>>> -            get_page(node->page[i]);
>>>>>> -        }
>>>>>> -#endif
>>>>>>          dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
>>>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>>>      }
>>>>>> @@ -635,20 +624,12 @@ void hfs_bnode_put(struct hfs_bnode *node)
>>>>>>                 node->tree->cnid, node->this, atomic_read(&node->refcnt));
>>>>>>          if (!atomic_read(&node->refcnt))
>>>>>>              BUG();
>>>>>> -        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
>>>>>> -#if REF_PAGES
>>>>>> -            for (i = 0; i < tree->pages_per_bnode; i++)
>>>>>> -                put_page(node->page[i]);
>>>>>> -#endif
>>>>>> +        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
>>>>>>              return;
>>>>>> -        }
>>>>>>          for (i = 0; i < tree->pages_per_bnode; i++) {
>>>>>>              if (!node->page[i])
>>>>>>                  continue;
>>>>>>              mark_page_accessed(node->page[i]);
>>>>>> -#if REF_PAGES
>>>>>> -            put_page(node->page[i]);
>>>>>> -#endif
>>>>>>          }
>>>>>>
>>>>>>          if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
>>>>>> </commit>
>>>>>>
>>>>>> On 18 June 2015 at 00:41, Hin-Tak Leung <htl10@users.sourceforge.net> wrote:
>>>>>>> ------------------------------
>>>>>>> On Tue, Jun 9, 2015 7:06 PM BST Anton Altaparmakov wrote:
>>>>>>>
>>>>>>>>Hi Andrew,
>>>>>>>>
>>>>>>>>Can you please take this patch up and get it merged into mainline?  Despite Vyacheslav's lamentations this patch is obviously correct.  __hfs_bnode_create() is called in two places from the driver and the pages it returned are kmap()-ed and used (both read and write) so it is quite obvious they cannot be page_cache_release()-d until after they are kunmap()-ed instead of before they are even kmap()-ed as happens without this patch and this patch fixes exactly this by moving the page_cache_release() calls to after the pages are kunmap()-ed.
>>>>>>>>
>>>>>>>>Feel free to add:
>>>>>>>>
>>>>>>>>Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
>>>>>>>>
>>>>>>>>Thanks a lot in advance!
>>>>>>>>
>>>>>>>>Best regards,
>>>>>>>>
>>>>>>>>    Anton
>>>>>>>
>>>>>>> I went around and looked at the code and Anton is right -  __hfs_bnode_create() is called from hfs_bnode_create() and bfs_bnode_find(), and the pages are played with afterwards a bit further, so page_cache_release() at where the current location is, is wrong.
>>>>>>>
>>>>>>> Since I have a way of seeing a likely-related problem, I'll try testing this change in the next few days and possibly adding a Signed-off at the point.
>>>>>>>
>>>>>>> BTW, there is a similiar problem with the hfs (no plus) driver also; the "if 0" part is commented out with // c++ style comments instead.
>>>>>>> I'm looking over the pre-git history to see how that comes about.
>>>>>>>
>>>>>>> Hin-Tak
>>>>>>>
>>>>>>>>> On 7 Jun 2015, at 03:42, Sergei Antonov <saproj@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Fix this bugreport by Sasha Levin:
>>>>>>>>> http://lkml.org/lkml/2015/2/20/85 ("use after free")
>>>>>>>>> Make sure mapped pages are available for the entire lifetime of hfs_bnode.
>>>>>>>>>
>>>>>>>>> Cc: Al Viro <viro@zeniv.linux.org.uk>
>>>>>>>>> Cc: Christoph Hellwig <hch@infradead.org>
>>>>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>>>>>> Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
>>>>>>>>> Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
>>>>>>>>> Cc: Sougata Santra <sougata@tuxera.com>
>>>>>>>>> Reported-by: Sasha Levin <sasha.levin@oracle.com>
>>>>>>>>> Signed-off-by: Sergei Antonov <saproj@gmail.com>
>>>>>>>>> ---
>>>>>>>>> fs/hfsplus/bnode.c | 6 ++----
>>>>>>>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
>>>>>>>>> index 759708f..5af50fb 100644
>>>>>>>>> --- a/fs/hfsplus/bnode.c
>>>>>>>>> +++ b/fs/hfsplus/bnode.c
>>>>>>>>> @@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
>>>>>>>>>             page_cache_release(page);
>>>>>>>>>             goto fail;
>>>>>>>>>         }
>>>>>>>>> -        page_cache_release(page);
>>>>>>>>>         node->page[i] = page;
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>> @@ -566,13 +565,12 @@ node_error:
>>>>>>>>>
>>>>>>>>> void hfs_bnode_free(struct hfs_bnode *node)
>>>>>>>>> {
>>>>>>>>> -#if 0
>>>>>>>>>     int i;
>>>>>>>>>
>>>>>>>>> -    for (i = 0; i < node->tree->pages_per_bnode; i++)
>>>>>>>>> +    for (i = 0; i < node->tree->pages_per_bnode; i++) {
>>>>>>>>>         if (node->page[i])
>>>>>>>>>             page_cache_release(node->page[i]);
>>>>>>>>> -#endif
>>>>>>>>> +    }
>>>>>>>>>     kfree(node);
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> 2.3.0
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>
>>>>>>>>--
>>>>>>>>Anton Altaparmakov <anton at tuxera.com> (replace at with @)
>>>>>>>>Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
>>>>>>>>Linux NTFS maintainer
>>>>>>>>
>>>>>>>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sergei Antonov June 19, 2015, 1:30 a.m. UTC | #7
On 19 June 2015 at 00:16, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
> On 18 June 2015 at 21:51, Sergei Antonov <saproj@gmail.com> wrote:
>> On 18 June 2015 at 19:16, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>>> On 18 June 2015 at 17:19, Sergei Antonov <saproj@gmail.com> wrote:
>>>> On 18 June 2015 at 17:37, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>>>>> On 18 June 2015 at 13:58, Sergei Antonov <saproj@gmail.com> wrote:
>>>>>> On 18 June 2015 at 04:51, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>>>>>>> Hi Andrew and everybody else,
>>>>>>>
>>>>>>> I looked through the pre-git history and seem to have found the reason of how
>>>>>>> the current code had come to be, and why Sergei's fix seems to involve
>>>>>>> re-enable "#if 0"'ed out code. It is this - I think the code was wrong on entry,
>>>>>>> in that the first !REF_PAGES should be REF_PAGES. Then the "remove old
>>>>>>> debug code"
>>>>>>> commit below did not remove the right stuff.
>>>>>>>
>>>>>>> Looking at the old code, I think REF_PAGES may have started out being
>>>>>>> 1 (i.e. pages are released right after create, then get/put when needed, no
>>>>>>> need to release on bnode_free),
>>>>>>> then the code was modified for efficiency so that pages are not released
>>>>>>> on bnode_create and not put/get at bnode_put/get but release at bnode_free.
>>>>>>> But it was still largely working in the REF_PAGE=1 mode (and with the commented
>>>>>>> out release at bnode_free, which is supposed to be spanned by a !REF_PAGE).
>>>>>>> Then it was flipped over, and seems to work, and things were forgotten.
>>>>>>>
>>>>>>> I think the release at bnode_free was commented out because the person who
>>>>>>> wrote it wasn't sure - I am not sure either, since it seems like there might be
>>>>>>> other/more places that one might need to release the pages than just
>>>>>>> at bnode_free().
>>>>>>>
>>>>>>> Does this train of thought make sense?
>>>>>>
>>>>>> No. The occurrence of !REF_PAGES was not a typo, it seems to me. Well,
>>>>>> at first it was my thought too. But later I considered what the logic
>>>>>> of the author could have been with REF_PAGES=0 and REF_PAGES=1 and
>>>>>> came to a conclusion that REF_PAGES=1 was a safe approach and
>>>>>> REF_PAGES=0 was an experimental unsafe release-early approach, so
>>>>>> !REF_PAGES was what he really meant.
>>>>>>
>>>>>
>>>>> You are wrong, it seems. I found even earlier versions of the code,
>>>>> with history, at
>>>>> http://sourceforge.net/projects/linux-hfsplus/ .
>>>>> The earlier version of the code
>>>>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/)
>>>>> does get_page and put_page at bnode_get and bnode_put, while the later version
>>>>> (http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/
>>>>> )
>>>>> have those removed, dated 2001 december and 2002 june.
>>>>
>>>> Interesting. I only looked at historical linux git.
>>>>
>>>>> The change log says so:
>>>>>
>>>>> Date:   Thu Jun 6 09:45:14 2002 +0000
>>>>>
>>>>>     Use buffer cache instead of page cache in bnode.c. Cache inode extents.
>>>>>
>>>>> The change also removed any page_cache_release() . It would seem that
>>>>> the additional REF_PAGES
>>>>> conditional was a fork/rewrite/clean-up which re-uses the earlier
>>>>> development logic, and eventually took over.
>>>>>
>>>>> The release-early code came first, not after. That makes sense, as
>>>>> releasing early
>>>>
>>>> We may be talking about different things. I did not write what code
>>>> came earlier and what after. Anyway, this becomes to difficult to
>>>> follow. My point was that in the variant without refs, calling
>>>> page_cache_release() right after to read_cache_page() was exactly what
>>>> the author wanted (although it is a bug).
>>>>
>>>> Again, we are probably arguing about different things. And I'm a
>>>> little less zealous than you to do this code archeology.
>>>>
>>>
>>> I would really prefer that you don't use personal descriptions such as "zealous"
>>> as well as making the kind of comments such as
>>> "The result of the test (which I'm sure will show the bug is gone) will
>>> be by far more valuable contribution than your other messages in the
>>> thread.". Those are not helpful.
>>>
>>> In my first reply I have already stated my reasoning: your fix consists
>>> mainly of removing an "if 0" and re-enabling seemingly dead code. So the
>>> dead code was written for a purpose which is now lost and forgotten.
>>
>> Of course. And it is not hard to guess that purpose quite reliably :).
>> The code used to release pages correctly, i. e. when freeing the bnode
>> structure.
>>
>>> And here is that purpose - early development of the code was done
>>> with getting and putting pages within each routine, for safeness, and migrating
>>> towards caching pages for a longer life cycle in 2001-2002. When the
>>> code was cleaned
>>> up and re-worked before putting into the kernel in 2005, this
>>> development process
>>> was repeated but with a REF_PAGES conditional and some additional code
>>> which was // commented out.
>>>
>>> The migration to caching pages
>>> would have been more or less correct, if REF_PAGES was flipped
>>> and the // commented out code was enabled.  Except there was a mistake
>>> in one of the conditionals
>>
>> By mistake you mean !REF_PAGES in __hfs_bnode_create()? But it is not
>> a mistake under the conditions you provided above. With REF_PAGES
>> flipped to 1 and the // piece uncommented you don't want to release in
>> __hfs_bnode_create() because releasing is done in hfs_bnode_free().
>>
>
> Yes, that's what I am saying. I believe the development of the code happens
> as follows:
>
> - bnode_get() does get_page(), bnode_put() does put_page(), and the rest
> of the routines - i.e. bnode_create() look up the pages but
> immediately release them. bnode_free()
> does nothing about pages.
>
> - then it was desired to keep the pages longer without the repeated
> get_pages and put_pages.
> so those are "#if REF_PAGES" out. And we no longer get and put pages on demand
> and should also stop releasing them after looking up at bnode_create() also.
> (the bnode_create() is a poor name for what it does, but never mind). New code
> is added to release at bnode_free(). The new code should have been spanned
> by the opposite, i.e. "#if !REF_PAGES", but was inserted as place
> holder with //.
>
> The rest of the story I have already tried to explain a couple of
> times. The switch
> from REF_PAGES=1 to =0 (i.e. from get_pages/put_pages per
> bnode_get/bnode_put routines,
> to keeping pages around longer between bnode_create/bnode_free) did
> not quite work correctly.
>
> For the others who want to see the version of the code with those REF_PAGES
> and want to weigh in on this,
> I cannot find a web interface for the repo, but it is the commit below
> in git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git

A web link: http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/log/fs/hfsplus/bnode.c

> commit 91556682e0bf004d98a529bf829d339abb98bbbd
> Author: Andrew Morton <akpm@osdl.org>
> Date:   Wed Feb 25 16:17:48 2004 -0800
>
>     [PATCH] HFS+ support
>
>     From: Roman Zippel <zippel@linux-m68k.org>
>
>     This driver adds full read/write support for HFS+ and is based on the
>     readonly driver by Brad Broyer.
>
>     Thanks to Ethan Benson <erbenson@alaska.net> for a number of patches to
>     make the driver more compliant with the spec.
>
> I would also repeat that earlier code and change history was at
> http://sourceforge.net/projects/linux-hfsplus/, and it showed the
> changes between
> 2001 dec to 2002 june
>
> from get_page/put_page per bnode_get/bnode_put to removing those and
> switching to a longer page life-cycle with the log message:
>
>  "Use buffer cache instead of page cache in bnode.c. Cache inode extents."

Alright. I did not correctly understand your original idea and write
some irrelevant stuff. Maybe you were right. However, it is only a
question of historical interest.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
index a096c5a..3d5cdc6 100644
--- a/fs/hfs/bnode.c
+++ b/fs/hfs/bnode.c
@@ -13,8 +13,6 @@ 

 #include "btree.h"

-#define REF_PAGES    0
-
 void hfs_bnode_read(struct hfs_bnode *node, void *buf,
         int off, int len)
 {
@@ -289,9 +287,7 @@  static struct hfs_bnode *__hfs_bnode_create(struct
hfs_btree *tree, u32 cnid)
             page_cache_release(page);
             goto fail;
         }
-#if !REF_PAGES
         page_cache_release(page);
-#endif
         node->page[i] = page;
     }

@@ -449,13 +445,6 @@  void hfs_bnode_get(struct hfs_bnode *node)
 {
     if (node) {
         atomic_inc(&node->refcnt);
-#if REF_PAGES
-        {
-        int i;
-        for (i = 0; i < node->tree->pages_per_bnode; i++)
-            get_page(node->page[i]);
-        }
-#endif
         dprint(DBG_BNODE_REFS, "get_node(%d:%d): %d\n",
                node->tree->cnid, node->this, atomic_read(&node->refcnt));
     }
@@ -472,20 +461,12 @@  void hfs_bnode_put(struct hfs_bnode *node)
                node->tree->cnid, node->this, atomic_read(&node->refcnt));
         if (!atomic_read(&node->refcnt))
             BUG();
-        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock)) {
-#if REF_PAGES
-            for (i = 0; i < tree->pages_per_bnode; i++)
-                put_page(node->page[i]);
-#endif
+        if (!atomic_dec_and_lock(&node->refcnt, &tree->hash_lock))
             return;
-        }
         for (i = 0; i < tree->pages_per_bnode; i++) {
             if (!node->page[i])
                 continue;
             mark_page_accessed(node->page[i]);
-#if REF_PAGES
-            put_page(node->page[i]);
-#endif
         }

         if (test_bit(HFS_BNODE_DELETED, &node->flags)) {
diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
index 8868d3b..b85abc6 100644
--- a/fs/hfsplus/bnode.c
+++ b/fs/hfsplus/bnode.c
@@ -18,8 +18,6 @@ 
 #include "hfsplus_fs.h"
 #include "hfsplus_raw.h"

-#define REF_PAGES    0
-
 /* Copy a specified range of bytes from the raw data of a node */
 void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
 {
@@ -450,9 +448,7 @@  static struct hfs_bnode *__hfs_bnode_create(struct
hfs_btree *tree, u32 cnid)
             page_cache_release(page);
             goto fail;
         }
-#if !REF_PAGES
         page_cache_release(page);
-#endif
         node->page[i] = page;
     }