Message ID | 1525862104-3407-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, 9 May 2018 19:35:04 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote: > syzbot is reporting too large memory allocation at bfs_fill_super() [1]. > Since file system image is corrupted such that bfs_sb->s_start == 0, > bfs_fill_super() is trying to allocate 8MB of continuous memory. Fix this > by adding a sanity check on bfs_sb->s_start, __GFP_NOWARN and printf(). > > [1] https://syzkaller.appspot.com/bug?id=16a87c236b951351374a84c8a32f40edbc034e96 > > ... > > --- a/fs/bfs/inode.c > +++ b/fs/bfs/inode.c > @@ -350,7 +350,8 @@ static int bfs_fill_super(struct super_block *s, void *data, int silent) > > s->s_magic = BFS_MAGIC; > > - if (le32_to_cpu(bfs_sb->s_start) > le32_to_cpu(bfs_sb->s_end)) { > + if (le32_to_cpu(bfs_sb->s_start) > le32_to_cpu(bfs_sb->s_end) || > + le32_to_cpu(bfs_sb->s_start) < BFS_BSIZE) { hm, how did you figure out that s_start cannot be less than BFS_BSIZE? From this, I guess? info->si_lasti = (le32_to_cpu(bfs_sb->s_start) - BFS_BSIZE) / I wonder if this is the most appropriate way to check. > printf("Superblock is corrupted\n"); > goto out1; > } > @@ -359,9 +360,11 @@ static int bfs_fill_super(struct super_block *s, void *data, int silent) > sizeof(struct bfs_inode) > + BFS_ROOT_INO - 1; > imap_len = (info->si_lasti / 8) + 1; > - info->si_imap = kzalloc(imap_len, GFP_KERNEL); > - if (!info->si_imap) > + info->si_imap = kzalloc(imap_len, GFP_KERNEL | __GFP_NOWARN); > + if (!info->si_imap) { > + printf("Cannot allocate %u bytes\n", imap_len); > goto out1; > + } This seems unnecessary - if the kzalloc() fails we'll get the page-allocation-fauilure warning and a nice backtrace, etc. Why suppress all of that and add our custom warning instead?
On Thu, 10 May 2018 08:46:18 +0900 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > page-allocation-fauilure warning and a nice backtrace, etc. Why > > suppress all of that and add our custom warning instead? > > the intent of this patch is to avoid panic() by panic_on_warn == 1 > due to hitting > > struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags) > { > unsigned int index; > > if (unlikely(size > KMALLOC_MAX_SIZE)) { > WARN_ON_ONCE(!(flags & __GFP_NOWARN)); /* <= this line */ > return NULL; > } > > when size to allocate is controlled by the filesystem image. Well, the same could happen with many many memory-allocation sites. What's special about BFS? If someone sets panic_on_warn=1 then presumably this panic is the behaviour they wanted in this case.
On Wed, May 09, 2018 at 07:35:04PM +0900, Tetsuo Handa wrote: > syzbot is reporting too large memory allocation at bfs_fill_super() [1]. > Since file system image is corrupted such that bfs_sb->s_start == 0, > bfs_fill_super() is trying to allocate 8MB of continuous memory. Fix this > by adding a sanity check on bfs_sb->s_start, __GFP_NOWARN and printf(). > > [1] https://syzkaller.appspot.com/bug?id=16a87c236b951351374a84c8a32f40edbc034e96 Looking at that C reproducer, it looks like it's trying to fuzz filesystem images: https://syzkaller.appspot.com/text?tag=ReproC&x=15b834a7800000 Do we really want this? I was under the impression that we simply do not care about attackers who have control of the media, and having syzbot generate thousands of bugs that nobody will work on is not a good use of anybody's time.
On 2018/05/10 8:53, Andrew Morton wrote: > On Thu, 10 May 2018 08:46:18 +0900 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > >>> page-allocation-fauilure warning and a nice backtrace, etc. Why >>> suppress all of that and add our custom warning instead? >> >> the intent of this patch is to avoid panic() by panic_on_warn == 1 >> due to hitting >> >> struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags) >> { >> unsigned int index; >> >> if (unlikely(size > KMALLOC_MAX_SIZE)) { >> WARN_ON_ONCE(!(flags & __GFP_NOWARN)); /* <= this line */ >> return NULL; >> } >> >> when size to allocate is controlled by the filesystem image. > > Well, the same could happen with many many memory-allocation sites. > What's special about BFS? If someone sets panic_on_warn=1 then > presumably this panic is the behaviour they wanted in this case. > Tigran, this patch is stalling. Do we want to apply this? Or, ignore as invalid? errors=panic mount option for ext4 case was ignored as invalid. http://lkml.kernel.org/r/CACT4Y+Z+2YW_VALJzzQr6hLsviA=dXk3iFqwVf+P5zqojeC9Zg@mail.gmail.com But I prefer avoiding crashes if we can fix it.
Having read the discussion carefully, I personally prefer to ignore the fix as invalid, because mounting a filesystem image is a privileged operation and if attempting to mount a corrupted image causes a panic, that is no big deal, imho. On 13 June 2018 at 14:33, Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > On 2018/05/10 8:53, Andrew Morton wrote: >> On Thu, 10 May 2018 08:46:18 +0900 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: >> >>>> page-allocation-fauilure warning and a nice backtrace, etc. Why >>>> suppress all of that and add our custom warning instead? >>> >>> the intent of this patch is to avoid panic() by panic_on_warn == 1 >>> due to hitting >>> >>> struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags) >>> { >>> unsigned int index; >>> >>> if (unlikely(size > KMALLOC_MAX_SIZE)) { >>> WARN_ON_ONCE(!(flags & __GFP_NOWARN)); /* <= this line */ >>> return NULL; >>> } >>> >>> when size to allocate is controlled by the filesystem image. >> >> Well, the same could happen with many many memory-allocation sites. >> What's special about BFS? If someone sets panic_on_warn=1 then >> presumably this panic is the behaviour they wanted in this case. >> > > Tigran, this patch is stalling. Do we want to apply this? Or, ignore as invalid? > > errors=panic mount option for ext4 case was ignored as invalid. > http://lkml.kernel.org/r/CACT4Y+Z+2YW_VALJzzQr6hLsviA=dXk3iFqwVf+P5zqojeC9Zg@mail.gmail.com > > But I prefer avoiding crashes if we can fix it.
On Wed, Jun 13, 2018 at 3:49 PM, Tigran Aivazian <aivazian.tigran@gmail.com> wrote: > Having read the discussion carefully, I personally prefer to ignore > the fix as invalid, because mounting a filesystem image is a > privileged operation and if attempting to mount a corrupted image > causes a panic, that is no big deal, imho. Even if not a big deal, but still a bug? Also note that this is kind a chicken and egg problem. The only reason why these operations are privileged is that we have bugs and we are not fixing them. Also keep this in mind whenever you insert anything into usb and automount if turned on ;) You are basically giving permission to that thing to do everything with the machine. Or when you mount any image not created by you with trusted tools that you build yourself from trusted sources with a trusted compiler. Also keep in mind Android and other systems where root is not a trusted entity. > On 13 June 2018 at 14:33, Tetsuo Handa > <penguin-kernel@i-love.sakura.ne.jp> wrote: >> On 2018/05/10 8:53, Andrew Morton wrote: >>> On Thu, 10 May 2018 08:46:18 +0900 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: >>> >>>>> page-allocation-fauilure warning and a nice backtrace, etc. Why >>>>> suppress all of that and add our custom warning instead? >>>> >>>> the intent of this patch is to avoid panic() by panic_on_warn == 1 >>>> due to hitting >>>> >>>> struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags) >>>> { >>>> unsigned int index; >>>> >>>> if (unlikely(size > KMALLOC_MAX_SIZE)) { >>>> WARN_ON_ONCE(!(flags & __GFP_NOWARN)); /* <= this line */ >>>> return NULL; >>>> } >>>> >>>> when size to allocate is controlled by the filesystem image. >>> >>> Well, the same could happen with many many memory-allocation sites. >>> What's special about BFS? If someone sets panic_on_warn=1 then >>> presumably this panic is the behaviour they wanted in this case. >>> >> >> Tigran, this patch is stalling. Do we want to apply this? Or, ignore as invalid? >> >> errors=panic mount option for ext4 case was ignored as invalid. >> http://lkml.kernel.org/r/CACT4Y+Z+2YW_VALJzzQr6hLsviA=dXk3iFqwVf+P5zqojeC9Zg@mail.gmail.com >> >> But I prefer avoiding crashes if we can fix it. > > -- > You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group. > To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com. > To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/CAK%2B_RLko_OCepN4FCmsaQPAKkt9JNGe8pNRK7SO-onhw5zCneA%40mail.gmail.com. > For more options, visit https://groups.google.com/d/optout.
On 2018/06/13 22:49, Tigran Aivazian wrote: > Having read the discussion carefully, I personally prefer to ignore > the fix as invalid, because mounting a filesystem image is a > privileged operation and if attempting to mount a corrupted image > causes a panic, that is no big deal, imho. While this report is triggered by a crafted filesystem image, I don't think that a legitimate but huge filesystem image crashes the system by hitting (size > KMALLOC_MAX_SIZE) path is nice. While filesystem should try to avoid such large allocation, there is no need to crash the system just because kmalloc() failed. e.g. http://lkml.kernel.org/r/927f24d4-b0c3-8192-5723-c314f38b4292@iogearbox.net
On 13 June 2018 at 23:09, Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > While this report is triggered by a crafted filesystem image, I don't think that > a legitimate but huge filesystem image crashes the system by hitting > (size > KMALLOC_MAX_SIZE) path is nice. While filesystem should try to avoid > such large allocation, there is no need to crash the system just because > kmalloc() failed. > > e.g. http://lkml.kernel.org/r/927f24d4-b0c3-8192-5723-c314f38b4292@iogearbox.net Ok, could you please show me the very final version of the suggested patch, please?
Hello Dmitry, Historically, Unix mount(2) (and mount(8)) are normally privileged unless the filesystem is "trusted" by sysadmin who recorded this fact in the corresponding entry in /etc/fstab (see fstab(5)). I agree that the filesystem mounting code should perform enough validation on the superblock to avoid crashing on an invalid filesystem image. But I disagree that some specific filesystem (and least of all BFS!) should modify the panic_on_warn semantics and replace it with its own private warning message on kernel memory allocation failures. Kind regards, Tigran On 13 June 2018 at 17:00, Dmitry Vyukov <dvyukov@google.com> wrote: > On Wed, Jun 13, 2018 at 3:49 PM, Tigran Aivazian > <aivazian.tigran@gmail.com> wrote: >> Having read the discussion carefully, I personally prefer to ignore >> the fix as invalid, because mounting a filesystem image is a >> privileged operation and if attempting to mount a corrupted image >> causes a panic, that is no big deal, imho. > > Even if not a big deal, but still a bug? > > Also note that this is kind a chicken and egg problem. The only reason > why these operations are privileged is that we have bugs and we are > not fixing them. > > Also keep this in mind whenever you insert anything into usb and > automount if turned on ;) You are basically giving permission to that > thing to do everything with the machine. Or when you mount any image > not created by you with trusted tools that you build yourself from > trusted sources with a trusted compiler. > > Also keep in mind Android and other systems where root is not a trusted entity. > > > >> On 13 June 2018 at 14:33, Tetsuo Handa >> <penguin-kernel@i-love.sakura.ne.jp> wrote: >>> On 2018/05/10 8:53, Andrew Morton wrote: >>>> On Thu, 10 May 2018 08:46:18 +0900 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: >>>> >>>>>> page-allocation-fauilure warning and a nice backtrace, etc. Why >>>>>> suppress all of that and add our custom warning instead? >>>>> >>>>> the intent of this patch is to avoid panic() by panic_on_warn == 1 >>>>> due to hitting >>>>> >>>>> struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags) >>>>> { >>>>> unsigned int index; >>>>> >>>>> if (unlikely(size > KMALLOC_MAX_SIZE)) { >>>>> WARN_ON_ONCE(!(flags & __GFP_NOWARN)); /* <= this line */ >>>>> return NULL; >>>>> } >>>>> >>>>> when size to allocate is controlled by the filesystem image. >>>> >>>> Well, the same could happen with many many memory-allocation sites. >>>> What's special about BFS? If someone sets panic_on_warn=1 then >>>> presumably this panic is the behaviour they wanted in this case. >>>> >>> >>> Tigran, this patch is stalling. Do we want to apply this? Or, ignore as invalid? >>> >>> errors=panic mount option for ext4 case was ignored as invalid. >>> http://lkml.kernel.org/r/CACT4Y+Z+2YW_VALJzzQr6hLsviA=dXk3iFqwVf+P5zqojeC9Zg@mail.gmail.com >>> >>> But I prefer avoiding crashes if we can fix it. >> >> -- >> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group. >> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com. >> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/CAK%2B_RLko_OCepN4FCmsaQPAKkt9JNGe8pNRK7SO-onhw5zCneA%40mail.gmail.com. >> For more options, visit https://groups.google.com/d/optout.
On Thu, Jun 14, 2018 at 2:23 PM, Tigran Aivazian <aivazian.tigran@gmail.com> wrote: > Hello Dmitry, > > Historically, Unix mount(2) (and mount(8)) are normally privileged > unless the filesystem is "trusted" by sysadmin who recorded this fact > in the corresponding entry in /etc/fstab (see fstab(5)). I agree that > the filesystem mounting code should perform enough validation on the > superblock to avoid crashing on an invalid filesystem image. But I > disagree that some specific filesystem (and least of all BFS!) should > modify the panic_on_warn semantics and replace it with its own private > warning message on kernel memory allocation failures. This is not a general memory allocation failure. Nothing is printed on memory allocation failures. This is a very specific case of memory allocation failure, namely when kernel code asks for too large block, such large that it can't be possible allocated regardless of amount of available memory. For allocations with user-controllable size kernel code either needs to use __GFP_NOWARN, or (better) impose own limits such that it can always be allocated with kmalloc. Consider, currently we can have a bfs image that works fine on one kernel, but fails to mount on another just because it happens so that one could allocate 4MB with kmalloc, but another can't (different allocator/different settings/different kernel revision). This looks like pretty unfortunate property for persistent data formats in general. > Kind regards, > Tigran > > On 13 June 2018 at 17:00, Dmitry Vyukov <dvyukov@google.com> wrote: >> On Wed, Jun 13, 2018 at 3:49 PM, Tigran Aivazian >> <aivazian.tigran@gmail.com> wrote: >>> Having read the discussion carefully, I personally prefer to ignore >>> the fix as invalid, because mounting a filesystem image is a >>> privileged operation and if attempting to mount a corrupted image >>> causes a panic, that is no big deal, imho. >> >> Even if not a big deal, but still a bug? >> >> Also note that this is kind a chicken and egg problem. The only reason >> why these operations are privileged is that we have bugs and we are >> not fixing them. >> >> Also keep this in mind whenever you insert anything into usb and >> automount if turned on ;) You are basically giving permission to that >> thing to do everything with the machine. Or when you mount any image >> not created by you with trusted tools that you build yourself from >> trusted sources with a trusted compiler. >> >> Also keep in mind Android and other systems where root is not a trusted entity. >> >> >> >>> On 13 June 2018 at 14:33, Tetsuo Handa >>> <penguin-kernel@i-love.sakura.ne.jp> wrote: >>>> On 2018/05/10 8:53, Andrew Morton wrote: >>>>> On Thu, 10 May 2018 08:46:18 +0900 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: >>>>> >>>>>>> page-allocation-fauilure warning and a nice backtrace, etc. Why >>>>>>> suppress all of that and add our custom warning instead? >>>>>> >>>>>> the intent of this patch is to avoid panic() by panic_on_warn == 1 >>>>>> due to hitting >>>>>> >>>>>> struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags) >>>>>> { >>>>>> unsigned int index; >>>>>> >>>>>> if (unlikely(size > KMALLOC_MAX_SIZE)) { >>>>>> WARN_ON_ONCE(!(flags & __GFP_NOWARN)); /* <= this line */ >>>>>> return NULL; >>>>>> } >>>>>> >>>>>> when size to allocate is controlled by the filesystem image. >>>>> >>>>> Well, the same could happen with many many memory-allocation sites. >>>>> What's special about BFS? If someone sets panic_on_warn=1 then >>>>> presumably this panic is the behaviour they wanted in this case. >>>>> >>>> >>>> Tigran, this patch is stalling. Do we want to apply this? Or, ignore as invalid? >>>> >>>> errors=panic mount option for ext4 case was ignored as invalid. >>>> http://lkml.kernel.org/r/CACT4Y+Z+2YW_VALJzzQr6hLsviA=dXk3iFqwVf+P5zqojeC9Zg@mail.gmail.com >>>> >>>> But I prefer avoiding crashes if we can fix it. >>> >>> -- >>> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group. >>> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com. >>> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/CAK%2B_RLko_OCepN4FCmsaQPAKkt9JNGe8pNRK7SO-onhw5zCneA%40mail.gmail.com. >>> For more options, visit https://groups.google.com/d/optout.
On 14 June 2018 at 13:38, Dmitry Vyukov <dvyukov@google.com> wrote: > Consider, currently we can have a bfs image that works fine on one > kernel, but fails to mount on another just because it happens so that > one could allocate 4MB with kmalloc, but another can't (different > allocator/different settings/different kernel revision). Yes, but this would only happen _without_ the validation proposed by Tetsuo Handa. If we check s_start then the invalid enormous allocation request will not be made and what you describe won't not happen.
On Thu, Jun 14, 2018 at 3:05 PM, Tigran Aivazian <aivazian.tigran@gmail.com> wrote: > On 14 June 2018 at 13:38, Dmitry Vyukov <dvyukov@google.com> wrote: >> Consider, currently we can have a bfs image that works fine on one >> kernel, but fails to mount on another just because it happens so that >> one could allocate 4MB with kmalloc, but another can't (different >> allocator/different settings/different kernel revision). > > Yes, but this would only happen _without_ the validation proposed by > Tetsuo Handa. If we check s_start then the invalid enormous allocation > request will not be made and what you describe won't not happen. Agree. If we do a sanity check first, then __GFP_NOWARN is actually useful: it will detect the cases where a filesystem can be mounted on one machine but not on another (and will also double check that our sanity is really sane).
On 2018/06/14 22:05, Tigran Aivazian wrote: > On 14 June 2018 at 13:38, Dmitry Vyukov <dvyukov@google.com> wrote: >> Consider, currently we can have a bfs image that works fine on one >> kernel, but fails to mount on another just because it happens so that >> one could allocate 4MB with kmalloc, but another can't (different >> allocator/different settings/different kernel revision). > > Yes, but this would only happen _without_ the validation proposed by > Tetsuo Handa. If we check s_start then the invalid enormous allocation > request will not be made and what you describe won't not happen. > What is possible largest value for imap_len ? info->si_lasti = (le32_to_cpu(bfs_sb->s_start) - BFS_BSIZE) / sizeof(struct bfs_inode) + BFS_ROOT_INO - 1; imap_len = (info->si_lasti / 8) + 1; info->si_imap = kzalloc(imap_len, GFP_KERNEL); Since sizeof(struct bfs_inode) is 64 and bfs_sb->s_start is unsigned 32bits integer (where constraints is BFS_BSIZE <= bfs_sb->s_start <= bfs_sb->s_end), theoretically it is possible to assign bfs_sb->s_start > 2GB (apart from whether such value makes sense). Then, isn't it possible that imap_len > 4M and still hit KMALLOC_MAX_SIZE limit?
On 14 June 2018 at 14:28, Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > What is possible largest value for imap_len ? > > info->si_lasti = (le32_to_cpu(bfs_sb->s_start) - BFS_BSIZE) / sizeof(struct bfs_inode) + BFS_ROOT_INO - 1; > imap_len = (info->si_lasti / 8) + 1; > info->si_imap = kzalloc(imap_len, GFP_KERNEL); > > Since sizeof(struct bfs_inode) is 64 and bfs_sb->s_start is unsigned 32bits integer > (where constraints is BFS_BSIZE <= bfs_sb->s_start <= bfs_sb->s_end), theoretically > it is possible to assign bfs_sb->s_start > 2GB (apart from whether such value makes > sense). Then, isn't it possible that imap_len > 4M and still hit KMALLOC_MAX_SIZE limit? You are correct, but the proper fix should be to restrict imap_len to whatever the maximum value allowed by BFS filesystem layout and reject anything beyond it. I will try to remember what it was from the notes I made when I wrote BFS back in 1999. Please wait (possibly a few days) and I will let you know what those values are. Kind regards, Tigran
diff --git a/fs/bfs/inode.c b/fs/bfs/inode.c index 9a69392..d81c148 100644 --- a/fs/bfs/inode.c +++ b/fs/bfs/inode.c @@ -350,7 +350,8 @@ static int bfs_fill_super(struct super_block *s, void *data, int silent) s->s_magic = BFS_MAGIC; - if (le32_to_cpu(bfs_sb->s_start) > le32_to_cpu(bfs_sb->s_end)) { + if (le32_to_cpu(bfs_sb->s_start) > le32_to_cpu(bfs_sb->s_end) || + le32_to_cpu(bfs_sb->s_start) < BFS_BSIZE) { printf("Superblock is corrupted\n"); goto out1; } @@ -359,9 +360,11 @@ static int bfs_fill_super(struct super_block *s, void *data, int silent) sizeof(struct bfs_inode) + BFS_ROOT_INO - 1; imap_len = (info->si_lasti / 8) + 1; - info->si_imap = kzalloc(imap_len, GFP_KERNEL); - if (!info->si_imap) + info->si_imap = kzalloc(imap_len, GFP_KERNEL | __GFP_NOWARN); + if (!info->si_imap) { + printf("Cannot allocate %u bytes\n", imap_len); goto out1; + } for (i = 0; i < BFS_ROOT_INO; i++) set_bit(i, info->si_imap);
syzbot is reporting too large memory allocation at bfs_fill_super() [1]. Since file system image is corrupted such that bfs_sb->s_start == 0, bfs_fill_super() is trying to allocate 8MB of continuous memory. Fix this by adding a sanity check on bfs_sb->s_start, __GFP_NOWARN and printf(). [1] https://syzkaller.appspot.com/bug?id=16a87c236b951351374a84c8a32f40edbc034e96 Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot <syzbot+71c6b5d68e91149fc8a4@syzkaller.appspotmail.com> Cc: Tigran Aivazian <aivazian.tigran@gmail.com> --- fs/bfs/inode.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)