diff mbox

sound: use-after-free in snd_timer_interrupt

Message ID s5ha8o8uoqb.wl-tiwai@suse.de (mailing list archive)
State New, archived
Headers show

Commit Message

Takashi Iwai Jan. 14, 2016, 4:09 p.m. UTC
On Wed, 13 Jan 2016 21:54:10 +0100,
Takashi Iwai wrote:
> 
> OK, then this might be a possible race at the current snd_timer_stop()
> implementation.  There is no sync action there, so the ISR might be
> still alive after snd_timer_close() call.  Or might be another race.
> This pattern looks a bit different, as it's involved with hrtimer.
> 
> I'll take a look at it tomorrow.

I've audited the code today, but the open window doesn't look like
what I expected.  I found only some possible cases with slave timer
instances.

In anyway, below is a test fix patch.  Since I couldn't reproduce the
issue on my local machines, it's hard to say whether this covers the
holes you fell.  Let's see...


thanks,

Takashi

-- 8< --
From: Takashi Iwai <tiwai@suse.de>
Subject: [PATCH] ALSA: timer: Harden slave timer list handling

A slave timer instance might be still accessible in a racy way while
operating the master instance as it lacks of locking.  Since the
master operation is mostly protected with timer->lock, we should cope
with it while changing the slave instance, too.  Also, some linked
lists (active_list and ack_list) of slave instances aren't unlinked
immediately at stopping or closing, and this may lead to unexpected
accesses.

This patch tries to address these issues.  It adds spin lock of
timer->lock (either from master or slave, which is equivalent) in a
few places.  For avoiding a deadlock, we ensure that the global
slave_active_lock is always locked at first before each timer lock.

Also, ack and active_list of slave instances are properly unlinked at
snd_timer_stop() and snd_timer_close().

Last but not least, remove the superfluous call of _snd_timer_stop()
at removing slave links.  This is a noop, and calling it may confuse
readers wrt locking.  Further cleanup will follow in a later patch.

Actually we've got reports of use-after-free by syzkaller fuzzer, and
this hopefully fixes these issues.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 sound/core/timer.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

Comments

Dmitry Vyukov Jan. 15, 2016, 8:06 a.m. UTC | #1
On Thu, Jan 14, 2016 at 5:09 PM, Takashi Iwai <tiwai@suse.de> wrote:
> On Wed, 13 Jan 2016 21:54:10 +0100,
> Takashi Iwai wrote:
>>
>> OK, then this might be a possible race at the current snd_timer_stop()
>> implementation.  There is no sync action there, so the ISR might be
>> still alive after snd_timer_close() call.  Or might be another race.
>> This pattern looks a bit different, as it's involved with hrtimer.
>>
>> I'll take a look at it tomorrow.
>
> I've audited the code today, but the open window doesn't look like
> what I expected.  I found only some possible cases with slave timer
> instances.
>
> In anyway, below is a test fix patch.  Since I couldn't reproduce the
> issue on my local machines, it's hard to say whether this covers the
> holes you fell.  Let's see...


Hi Takashi,

I would be interested to understand why other people can't reproduce
issues that I hit pretty reliably.
I suspect that it can be due to .config. Please try with the following
config values.

I also start qemu with "-soundhw all" arg.

CONFIG_SOUND=y
CONFIG_SOUND_OSS_CORE=y
CONFIG_SOUND_OSS_CORE_PRECLAIM=y
CONFIG_SND=y
CONFIG_SND_TIMER=y
CONFIG_SND_PCM=y
CONFIG_SND_HWDEP=y
CONFIG_SND_RAWMIDI=y
CONFIG_SND_JACK=y
CONFIG_SND_SEQUENCER=y
CONFIG_SND_SEQ_DUMMY=y
CONFIG_SND_OSSEMUL=y
CONFIG_SND_MIXER_OSS=y
CONFIG_SND_PCM_OSS=y
CONFIG_SND_PCM_OSS_PLUGINS=y
CONFIG_SND_PCM_TIMER=y
CONFIG_SND_SEQUENCER_OSS=y
CONFIG_SND_HRTIMER=y
CONFIG_SND_SEQ_HRTIMER_DEFAULT=y
CONFIG_SND_SUPPORT_OLD_API=y
CONFIG_SND_PROC_FS=y
CONFIG_SND_VERBOSE_PROCFS=y
CONFIG_SND_VMASTER=y
CONFIG_SND_DMA_SGBUF=y
CONFIG_SND_RAWMIDI_SEQ=y
CONFIG_SND_OPL3_LIB_SEQ=y
CONFIG_SND_MPU401_UART=y
CONFIG_SND_OPL3_LIB=y
CONFIG_SND_AC97_CODEC=y
CONFIG_SND_DRIVERS=y
CONFIG_SND_AC97_POWER_SAVE=y
CONFIG_SND_AC97_POWER_SAVE_DEFAULT=0
CONFIG_SND_SB_COMMON=y
CONFIG_SND_PCI=y
CONFIG_SND_AD1889=y
CONFIG_SND_ALS300=y
CONFIG_SND_ALS4000=y
CONFIG_SND_ALI5451=y
CONFIG_SND_OXYGEN_LIB=y
CONFIG_SND_VIRTUOSO=y
CONFIG_SND_HDA=y
CONFIG_SND_HDA_INTEL=y
CONFIG_SND_HDA_HWDEP=y
CONFIG_SND_HDA_RECONFIG=y
CONFIG_SND_HDA_INPUT_BEEP=y
CONFIG_SND_HDA_INPUT_BEEP_MODE=1
CONFIG_SND_HDA_PATCH_LOADER=y
CONFIG_SND_HDA_CODEC_REALTEK=y
CONFIG_SND_HDA_CODEC_ANALOG=y
CONFIG_SND_HDA_CODEC_SIGMATEL=y
CONFIG_SND_HDA_CODEC_VIA=y
CONFIG_SND_HDA_CODEC_HDMI=y
CONFIG_SND_HDA_GENERIC=y
CONFIG_SND_HDA_POWER_SAVE_DEFAULT=0
CONFIG_SND_HDA_CORE=y
CONFIG_SND_HDA_I915=y
CONFIG_SND_HDA_PREALLOC_SIZE=64
CONFIG_SND_USB=y
CONFIG_SND_USB_AUDIO=y
CONFIG_SND_FIREWIRE=y
CONFIG_SND_FIREWIRE_LIB=y
CONFIG_SND_DICE=y
CONFIG_SND_OXFW=y
CONFIG_SND_ISIGHT=y
CONFIG_SND_SCS1X=y
CONFIG_SND_FIREWORKS=y
CONFIG_SND_BEBOB=y
CONFIG_SND_FIREWIRE_DIGI00X=y
CONFIG_SND_FIREWIRE_TASCAM=y
CONFIG_SND_PCMCIA=y



> Takashi
>
> -- 8< --
> From: Takashi Iwai <tiwai@suse.de>
> Subject: [PATCH] ALSA: timer: Harden slave timer list handling
>
> A slave timer instance might be still accessible in a racy way while
> operating the master instance as it lacks of locking.  Since the
> master operation is mostly protected with timer->lock, we should cope
> with it while changing the slave instance, too.  Also, some linked
> lists (active_list and ack_list) of slave instances aren't unlinked
> immediately at stopping or closing, and this may lead to unexpected
> accesses.
>
> This patch tries to address these issues.  It adds spin lock of
> timer->lock (either from master or slave, which is equivalent) in a
> few places.  For avoiding a deadlock, we ensure that the global
> slave_active_lock is always locked at first before each timer lock.
>
> Also, ack and active_list of slave instances are properly unlinked at
> snd_timer_stop() and snd_timer_close().
>
> Last but not least, remove the superfluous call of _snd_timer_stop()
> at removing slave links.  This is a noop, and calling it may confuse
> readers wrt locking.  Further cleanup will follow in a later patch.
>
> Actually we've got reports of use-after-free by syzkaller fuzzer, and
> this hopefully fixes these issues.
>
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Takashi Iwai <tiwai@suse.de>
> ---
>  sound/core/timer.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/sound/core/timer.c b/sound/core/timer.c
> index 3810ee8f1205..4e8d7bfffff6 100644
> --- a/sound/core/timer.c
> +++ b/sound/core/timer.c
> @@ -215,11 +215,13 @@ static void snd_timer_check_master(struct snd_timer_instance *master)
>                     slave->slave_id == master->slave_id) {
>                         list_move_tail(&slave->open_list, &master->slave_list_head);
>                         spin_lock_irq(&slave_active_lock);
> +                       spin_lock(&master->timer->lock);
>                         slave->master = master;
>                         slave->timer = master->timer;
>                         if (slave->flags & SNDRV_TIMER_IFLG_RUNNING)
>                                 list_add_tail(&slave->active_list,
>                                               &master->slave_active_head);
> +                       spin_unlock(&master->timer->lock);
>                         spin_unlock_irq(&slave_active_lock);
>                 }
>         }
> @@ -346,15 +348,18 @@ int snd_timer_close(struct snd_timer_instance *timeri)
>                     timer->hw.close)
>                         timer->hw.close(timer);
>                 /* remove slave links */
> +               spin_lock_irq(&slave_active_lock);
> +               spin_lock(&timer->lock);
>                 list_for_each_entry_safe(slave, tmp, &timeri->slave_list_head,
>                                          open_list) {
> -                       spin_lock_irq(&slave_active_lock);
> -                       _snd_timer_stop(slave, 1, SNDRV_TIMER_EVENT_RESOLUTION);
>                         list_move_tail(&slave->open_list, &snd_timer_slave_list);
>                         slave->master = NULL;
>                         slave->timer = NULL;
> -                       spin_unlock_irq(&slave_active_lock);
> +                       list_del_init(&slave->ack_list);
> +                       list_del_init(&slave->active_list);
>                 }
> +               spin_unlock(&timer->lock);
> +               spin_unlock_irq(&slave_active_lock);
>                 mutex_unlock(&register_mutex);
>         }
>   out:
> @@ -441,9 +446,12 @@ static int snd_timer_start_slave(struct snd_timer_instance *timeri)
>
>         spin_lock_irqsave(&slave_active_lock, flags);
>         timeri->flags |= SNDRV_TIMER_IFLG_RUNNING;
> -       if (timeri->master)
> +       if (timeri->master && timeri->timer) {
> +               spin_lock(&timeri->timer->lock);
>                 list_add_tail(&timeri->active_list,
>                               &timeri->master->slave_active_head);
> +               spin_unlock(&timeri->timer->lock);
> +       }
>         spin_unlock_irqrestore(&slave_active_lock, flags);
>         return 1; /* delayed start */
>  }
> @@ -489,6 +497,8 @@ static int _snd_timer_stop(struct snd_timer_instance * timeri,
>                 if (!keep_flag) {
>                         spin_lock_irqsave(&slave_active_lock, flags);
>                         timeri->flags &= ~SNDRV_TIMER_IFLG_RUNNING;
> +                       list_del_init(&timeri->ack_list);
> +                       list_del_init(&timeri->active_list);
>                         spin_unlock_irqrestore(&slave_active_lock, flags);
>                 }
>                 goto __end;
> --
> 2.7.0
>
>
Takashi Iwai Jan. 15, 2016, 11 a.m. UTC | #2
On Fri, 15 Jan 2016 09:06:10 +0100,
Dmitry Vyukov wrote:
> 
> On Thu, Jan 14, 2016 at 5:09 PM, Takashi Iwai <tiwai@suse.de> wrote:
> > On Wed, 13 Jan 2016 21:54:10 +0100,
> > Takashi Iwai wrote:
> >>
> >> OK, then this might be a possible race at the current snd_timer_stop()
> >> implementation.  There is no sync action there, so the ISR might be
> >> still alive after snd_timer_close() call.  Or might be another race.
> >> This pattern looks a bit different, as it's involved with hrtimer.
> >>
> >> I'll take a look at it tomorrow.
> >
> > I've audited the code today, but the open window doesn't look like
> > what I expected.  I found only some possible cases with slave timer
> > instances.
> >
> > In anyway, below is a test fix patch.  Since I couldn't reproduce the
> > issue on my local machines, it's hard to say whether this covers the
> > holes you fell.  Let's see...
> 
> 
> Hi Takashi,
> 
> I would be interested to understand why other people can't reproduce
> issues that I hit pretty reliably.
> I suspect that it can be due to .config. Please try with the following
> config values.

I guess rather other config, e.g. the kernel debug options.
I suppose you enabled KASAN and DEBUG_LIST.  What else?


> I also start qemu with "-soundhw all" arg.

OK, so you're testing with VM?  This makes easier to recheck.


Takashi
Dmitry Vyukov Jan. 15, 2016, 11:03 a.m. UTC | #3
On Fri, Jan 15, 2016 at 12:00 PM, Takashi Iwai <tiwai@suse.de> wrote:
> On Fri, 15 Jan 2016 09:06:10 +0100,
> Dmitry Vyukov wrote:
>>
>> On Thu, Jan 14, 2016 at 5:09 PM, Takashi Iwai <tiwai@suse.de> wrote:
>> > On Wed, 13 Jan 2016 21:54:10 +0100,
>> > Takashi Iwai wrote:
>> >>
>> >> OK, then this might be a possible race at the current snd_timer_stop()
>> >> implementation.  There is no sync action there, so the ISR might be
>> >> still alive after snd_timer_close() call.  Or might be another race.
>> >> This pattern looks a bit different, as it's involved with hrtimer.
>> >>
>> >> I'll take a look at it tomorrow.
>> >
>> > I've audited the code today, but the open window doesn't look like
>> > what I expected.  I found only some possible cases with slave timer
>> > instances.
>> >
>> > In anyway, below is a test fix patch.  Since I couldn't reproduce the
>> > issue on my local machines, it's hard to say whether this covers the
>> > holes you fell.  Let's see...
>>
>>
>> Hi Takashi,
>>
>> I would be interested to understand why other people can't reproduce
>> issues that I hit pretty reliably.
>> I suspect that it can be due to .config. Please try with the following
>> config values.
>
> I guess rather other config, e.g. the kernel debug options.
> I suppose you enabled KASAN and DEBUG_LIST.  What else?

I've attached my config (you will need to disable CONFIG_KCOV, it is
not upstreamed).


>> I also start qemu with "-soundhw all" arg.
>
> OK, so you're testing with VM?  This makes easier to recheck.

Yes, I start qemu as:

qemu-system-x86_64 -hda wheezy.img -net
user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic -nographic -kernel
arch/x86/boot/bzImage -append "console=ttyS0 root=/dev/sda debug
earlyprintk=serial slub_debug=UZ" -enable-kvm -m 2G -numa
node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 -smp
sockets=2,cores=2,threads=1 -usb -usbdevice mouse -usbdevice tablet
-soundhw all
Takashi Iwai Jan. 15, 2016, 1:51 p.m. UTC | #4
On Fri, 15 Jan 2016 12:03:17 +0100,
Dmitry Vyukov wrote:
> 
> On Fri, Jan 15, 2016 at 12:00 PM, Takashi Iwai <tiwai@suse.de> wrote:
> > On Fri, 15 Jan 2016 09:06:10 +0100,
> > Dmitry Vyukov wrote:
> >>
> >> On Thu, Jan 14, 2016 at 5:09 PM, Takashi Iwai <tiwai@suse.de> wrote:
> >> > On Wed, 13 Jan 2016 21:54:10 +0100,
> >> > Takashi Iwai wrote:
> >> >>
> >> >> OK, then this might be a possible race at the current snd_timer_stop()
> >> >> implementation.  There is no sync action there, so the ISR might be
> >> >> still alive after snd_timer_close() call.  Or might be another race.
> >> >> This pattern looks a bit different, as it's involved with hrtimer.
> >> >>
> >> >> I'll take a look at it tomorrow.
> >> >
> >> > I've audited the code today, but the open window doesn't look like
> >> > what I expected.  I found only some possible cases with slave timer
> >> > instances.
> >> >
> >> > In anyway, below is a test fix patch.  Since I couldn't reproduce the
> >> > issue on my local machines, it's hard to say whether this covers the
> >> > holes you fell.  Let's see...
> >>
> >>
> >> Hi Takashi,
> >>
> >> I would be interested to understand why other people can't reproduce
> >> issues that I hit pretty reliably.
> >> I suspect that it can be due to .config. Please try with the following
> >> config values.
> >
> > I guess rather other config, e.g. the kernel debug options.
> > I suppose you enabled KASAN and DEBUG_LIST.  What else?
> 
> I've attached my config (you will need to disable CONFIG_KCOV, it is
> not upstreamed).

Hm, that has lots of other drivers built-in...

> >> I also start qemu with "-soundhw all" arg.
> >
> > OK, so you're testing with VM?  This makes easier to recheck.
> 
> Yes, I start qemu as:
> 
> qemu-system-x86_64 -hda wheezy.img -net
> user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic -nographic -kernel
> arch/x86/boot/bzImage -append "console=ttyS0 root=/dev/sda debug
> earlyprintk=serial slub_debug=UZ" -enable-kvm -m 2G -numa
> node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 -smp
> sockets=2,cores=2,threads=1 -usb -usbdevice mouse -usbdevice tablet
> -soundhw all

And which test did trigger use-after-free, even with all previous
patches?


Takashi
Dmitry Vyukov Jan. 15, 2016, 2:38 p.m. UTC | #5
On Fri, Jan 15, 2016 at 2:51 PM, Takashi Iwai <tiwai@suse.de> wrote:
> On Fri, 15 Jan 2016 12:03:17 +0100,
> Dmitry Vyukov wrote:
>>
>> On Fri, Jan 15, 2016 at 12:00 PM, Takashi Iwai <tiwai@suse.de> wrote:
>> > On Fri, 15 Jan 2016 09:06:10 +0100,
>> > Dmitry Vyukov wrote:
>> >>
>> >> On Thu, Jan 14, 2016 at 5:09 PM, Takashi Iwai <tiwai@suse.de> wrote:
>> >> > On Wed, 13 Jan 2016 21:54:10 +0100,
>> >> > Takashi Iwai wrote:
>> >> >>
>> >> >> OK, then this might be a possible race at the current snd_timer_stop()
>> >> >> implementation.  There is no sync action there, so the ISR might be
>> >> >> still alive after snd_timer_close() call.  Or might be another race.
>> >> >> This pattern looks a bit different, as it's involved with hrtimer.
>> >> >>
>> >> >> I'll take a look at it tomorrow.
>> >> >
>> >> > I've audited the code today, but the open window doesn't look like
>> >> > what I expected.  I found only some possible cases with slave timer
>> >> > instances.
>> >> >
>> >> > In anyway, below is a test fix patch.  Since I couldn't reproduce the
>> >> > issue on my local machines, it's hard to say whether this covers the
>> >> > holes you fell.  Let's see...
>> >>
>> >>
>> >> Hi Takashi,
>> >>
>> >> I would be interested to understand why other people can't reproduce
>> >> issues that I hit pretty reliably.
>> >> I suspect that it can be due to .config. Please try with the following
>> >> config values.
>> >
>> > I guess rather other config, e.g. the kernel debug options.
>> > I suppose you enabled KASAN and DEBUG_LIST.  What else?
>>
>> I've attached my config (you will need to disable CONFIG_KCOV, it is
>> not upstreamed).
>
> Hm, that has lots of other drivers built-in...
>
>> >> I also start qemu with "-soundhw all" arg.
>> >
>> > OK, so you're testing with VM?  This makes easier to recheck.
>>
>> Yes, I start qemu as:
>>
>> qemu-system-x86_64 -hda wheezy.img -net
>> user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic -nographic -kernel
>> arch/x86/boot/bzImage -append "console=ttyS0 root=/dev/sda debug
>> earlyprintk=serial slub_debug=UZ" -enable-kvm -m 2G -numa
>> node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 -smp
>> sockets=2,cores=2,threads=1 -usb -usbdevice mouse -usbdevice tablet
>> -soundhw all
>
> And which test did trigger use-after-free, even with all previous
> patches?

I will try to extract a new reproducer now.
Meanwhile, can you try to reproduce this one:
https://groups.google.com/forum/#!msg/syzkaller/bbtG9_h1ONU/CPLblMC6FAAJ
? I run the program in a tight parallel loop.
Takashi Iwai Jan. 15, 2016, 3:21 p.m. UTC | #6
On Fri, 15 Jan 2016 15:38:58 +0100,
Dmitry Vyukov wrote:
> 
> On Fri, Jan 15, 2016 at 2:51 PM, Takashi Iwai <tiwai@suse.de> wrote:
> > On Fri, 15 Jan 2016 12:03:17 +0100,
> > Dmitry Vyukov wrote:
> >>
> >> On Fri, Jan 15, 2016 at 12:00 PM, Takashi Iwai <tiwai@suse.de> wrote:
> >> > On Fri, 15 Jan 2016 09:06:10 +0100,
> >> > Dmitry Vyukov wrote:
> >> >>
> >> >> On Thu, Jan 14, 2016 at 5:09 PM, Takashi Iwai <tiwai@suse.de> wrote:
> >> >> > On Wed, 13 Jan 2016 21:54:10 +0100,
> >> >> > Takashi Iwai wrote:
> >> >> >>
> >> >> >> OK, then this might be a possible race at the current snd_timer_stop()
> >> >> >> implementation.  There is no sync action there, so the ISR might be
> >> >> >> still alive after snd_timer_close() call.  Or might be another race.
> >> >> >> This pattern looks a bit different, as it's involved with hrtimer.
> >> >> >>
> >> >> >> I'll take a look at it tomorrow.
> >> >> >
> >> >> > I've audited the code today, but the open window doesn't look like
> >> >> > what I expected.  I found only some possible cases with slave timer
> >> >> > instances.
> >> >> >
> >> >> > In anyway, below is a test fix patch.  Since I couldn't reproduce the
> >> >> > issue on my local machines, it's hard to say whether this covers the
> >> >> > holes you fell.  Let's see...
> >> >>
> >> >>
> >> >> Hi Takashi,
> >> >>
> >> >> I would be interested to understand why other people can't reproduce
> >> >> issues that I hit pretty reliably.
> >> >> I suspect that it can be due to .config. Please try with the following
> >> >> config values.
> >> >
> >> > I guess rather other config, e.g. the kernel debug options.
> >> > I suppose you enabled KASAN and DEBUG_LIST.  What else?
> >>
> >> I've attached my config (you will need to disable CONFIG_KCOV, it is
> >> not upstreamed).
> >
> > Hm, that has lots of other drivers built-in...
> >
> >> >> I also start qemu with "-soundhw all" arg.
> >> >
> >> > OK, so you're testing with VM?  This makes easier to recheck.
> >>
> >> Yes, I start qemu as:
> >>
> >> qemu-system-x86_64 -hda wheezy.img -net
> >> user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic -nographic -kernel
> >> arch/x86/boot/bzImage -append "console=ttyS0 root=/dev/sda debug
> >> earlyprintk=serial slub_debug=UZ" -enable-kvm -m 2G -numa
> >> node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 -smp
> >> sockets=2,cores=2,threads=1 -usb -usbdevice mouse -usbdevice tablet
> >> -soundhw all
> >
> > And which test did trigger use-after-free, even with all previous
> > patches?
> 
> I will try to extract a new reproducer now.
> Meanwhile, can you try to reproduce this one:
> https://groups.google.com/forum/#!msg/syzkaller/bbtG9_h1ONU/CPLblMC6FAAJ
> ? I run the program in a tight parallel loop.

So you're running this in parallel?  Or a tight sequential loop?
I did the latter, and I tried even this on a bare metal, but couldn't
trigger the Oops, so far.

Meanwhile, I pushed the tree including all fixes at for-linus branch:
  git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git for-linus

It'd be appreciated if you can test this one.


thanks,

Takashi
Dmitry Vyukov Jan. 15, 2016, 3:28 p.m. UTC | #7
On Fri, Jan 15, 2016 at 4:21 PM, Takashi Iwai <tiwai@suse.de> wrote:
> So you're running this in parallel?  Or a tight sequential loop?
> I did the latter, and I tried even this on a bare metal, but couldn't
> trigger the Oops, so far.

Yes, I run it in parallel using:

$ go get golang.org/x/tools/cmd/stress
$ ./stress -p 8 ./a.out

But it just keeps 8 parallel processes running.


> Meanwhile, I pushed the tree including all fixes at for-linus branch:
>   git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git for-linus
>
> It'd be appreciated if you can test this one.

Is it different from the patches you mailed? I keep several dozens
fixes for bugs that are not yet merged into Linus tree + own kcov
patch. It is not easy to rebase...
Here is what I now have for sound/
https://gist.githubusercontent.com/dvyukov/dc29dbfd320126285fd8/raw/e2ca7b59c0dc118045f9fb4e3d052cbc751e787e/gistfile1.txt
Takashi Iwai Jan. 15, 2016, 3:39 p.m. UTC | #8
On Fri, 15 Jan 2016 16:28:33 +0100,
Dmitry Vyukov wrote:
> 
> On Fri, Jan 15, 2016 at 4:21 PM, Takashi Iwai <tiwai@suse.de> wrote:
> > So you're running this in parallel?  Or a tight sequential loop?
> > I did the latter, and I tried even this on a bare metal, but couldn't
> > trigger the Oops, so far.
> 
> Yes, I run it in parallel using:
> 
> $ go get golang.org/x/tools/cmd/stress
> $ ./stress -p 8 ./a.out
> 
> But it just keeps 8 parallel processes running.

OK, then a bit different than I tested.  Will check.

> > Meanwhile, I pushed the tree including all fixes at for-linus branch:
> >   git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git for-linus
> >
> > It'd be appreciated if you can test this one.
> 
> Is it different from the patches you mailed?

No, they should be basically same, but just to make sure that we're
on the same ground.

> I keep several dozens
> fixes for bugs that are not yet merged into Linus tree + own kcov
> patch. It is not easy to rebase...

The branch should be pullable onto 4.4-final cleanly.

> Here is what I now have for sound/
> https://gist.githubusercontent.com/dvyukov/dc29dbfd320126285fd8/raw/e2ca7b59c0dc118045f9fb4e3d052cbc751e787e/gistfile1.txt
> 

Takashi
Dmitry Vyukov Jan. 15, 2016, 7:13 p.m. UTC | #9
On Fri, Jan 15, 2016 at 3:38 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Fri, Jan 15, 2016 at 2:51 PM, Takashi Iwai <tiwai@suse.de> wrote:
>> On Fri, 15 Jan 2016 12:03:17 +0100,
>> Dmitry Vyukov wrote:
>>>
>>> On Fri, Jan 15, 2016 at 12:00 PM, Takashi Iwai <tiwai@suse.de> wrote:
>>> > On Fri, 15 Jan 2016 09:06:10 +0100,
>>> > Dmitry Vyukov wrote:
>>> >>
>>> >> On Thu, Jan 14, 2016 at 5:09 PM, Takashi Iwai <tiwai@suse.de> wrote:
>>> >> > On Wed, 13 Jan 2016 21:54:10 +0100,
>>> >> > Takashi Iwai wrote:
>>> >> >>
>>> >> >> OK, then this might be a possible race at the current snd_timer_stop()
>>> >> >> implementation.  There is no sync action there, so the ISR might be
>>> >> >> still alive after snd_timer_close() call.  Or might be another race.
>>> >> >> This pattern looks a bit different, as it's involved with hrtimer.
>>> >> >>
>>> >> >> I'll take a look at it tomorrow.
>>> >> >
>>> >> > I've audited the code today, but the open window doesn't look like
>>> >> > what I expected.  I found only some possible cases with slave timer
>>> >> > instances.
>>> >> >
>>> >> > In anyway, below is a test fix patch.  Since I couldn't reproduce the
>>> >> > issue on my local machines, it's hard to say whether this covers the
>>> >> > holes you fell.  Let's see...
>>> >>
>>> >>
>>> >> Hi Takashi,
>>> >>
>>> >> I would be interested to understand why other people can't reproduce
>>> >> issues that I hit pretty reliably.
>>> >> I suspect that it can be due to .config. Please try with the following
>>> >> config values.
>>> >
>>> > I guess rather other config, e.g. the kernel debug options.
>>> > I suppose you enabled KASAN and DEBUG_LIST.  What else?
>>>
>>> I've attached my config (you will need to disable CONFIG_KCOV, it is
>>> not upstreamed).
>>
>> Hm, that has lots of other drivers built-in...
>>
>>> >> I also start qemu with "-soundhw all" arg.
>>> >
>>> > OK, so you're testing with VM?  This makes easier to recheck.
>>>
>>> Yes, I start qemu as:
>>>
>>> qemu-system-x86_64 -hda wheezy.img -net
>>> user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic -nographic -kernel
>>> arch/x86/boot/bzImage -append "console=ttyS0 root=/dev/sda debug
>>> earlyprintk=serial slub_debug=UZ" -enable-kvm -m 2G -numa
>>> node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 -smp
>>> sockets=2,cores=2,threads=1 -usb -usbdevice mouse -usbdevice tablet
>>> -soundhw all
>>
>> And which test did trigger use-after-free, even with all previous
>> patches?
>
> I will try to extract a new reproducer now.

Ok, I does not seem to see any crashes except the timer hangs below.
Let's consider all other bugs as fixed. I will report anything new
that I see separately.


> Meanwhile, can you try to reproduce this one:
> https://groups.google.com/forum/#!msg/syzkaller/bbtG9_h1ONU/CPLblMC6FAAJ
> ? I run the program in a tight parallel loop.
Takashi Iwai Jan. 15, 2016, 7:18 p.m. UTC | #10
On Fri, 15 Jan 2016 20:13:11 +0100,
Dmitry Vyukov wrote:
> 
> On Fri, Jan 15, 2016 at 3:38 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> > On Fri, Jan 15, 2016 at 2:51 PM, Takashi Iwai <tiwai@suse.de> wrote:
> >> On Fri, 15 Jan 2016 12:03:17 +0100,
> >> Dmitry Vyukov wrote:
> >>>
> >>> On Fri, Jan 15, 2016 at 12:00 PM, Takashi Iwai <tiwai@suse.de> wrote:
> >>> > On Fri, 15 Jan 2016 09:06:10 +0100,
> >>> > Dmitry Vyukov wrote:
> >>> >>
> >>> >> On Thu, Jan 14, 2016 at 5:09 PM, Takashi Iwai <tiwai@suse.de> wrote:
> >>> >> > On Wed, 13 Jan 2016 21:54:10 +0100,
> >>> >> > Takashi Iwai wrote:
> >>> >> >>
> >>> >> >> OK, then this might be a possible race at the current snd_timer_stop()
> >>> >> >> implementation.  There is no sync action there, so the ISR might be
> >>> >> >> still alive after snd_timer_close() call.  Or might be another race.
> >>> >> >> This pattern looks a bit different, as it's involved with hrtimer.
> >>> >> >>
> >>> >> >> I'll take a look at it tomorrow.
> >>> >> >
> >>> >> > I've audited the code today, but the open window doesn't look like
> >>> >> > what I expected.  I found only some possible cases with slave timer
> >>> >> > instances.
> >>> >> >
> >>> >> > In anyway, below is a test fix patch.  Since I couldn't reproduce the
> >>> >> > issue on my local machines, it's hard to say whether this covers the
> >>> >> > holes you fell.  Let's see...
> >>> >>
> >>> >>
> >>> >> Hi Takashi,
> >>> >>
> >>> >> I would be interested to understand why other people can't reproduce
> >>> >> issues that I hit pretty reliably.
> >>> >> I suspect that it can be due to .config. Please try with the following
> >>> >> config values.
> >>> >
> >>> > I guess rather other config, e.g. the kernel debug options.
> >>> > I suppose you enabled KASAN and DEBUG_LIST.  What else?
> >>>
> >>> I've attached my config (you will need to disable CONFIG_KCOV, it is
> >>> not upstreamed).
> >>
> >> Hm, that has lots of other drivers built-in...
> >>
> >>> >> I also start qemu with "-soundhw all" arg.
> >>> >
> >>> > OK, so you're testing with VM?  This makes easier to recheck.
> >>>
> >>> Yes, I start qemu as:
> >>>
> >>> qemu-system-x86_64 -hda wheezy.img -net
> >>> user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic -nographic -kernel
> >>> arch/x86/boot/bzImage -append "console=ttyS0 root=/dev/sda debug
> >>> earlyprintk=serial slub_debug=UZ" -enable-kvm -m 2G -numa
> >>> node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 -smp
> >>> sockets=2,cores=2,threads=1 -usb -usbdevice mouse -usbdevice tablet
> >>> -soundhw all
> >>
> >> And which test did trigger use-after-free, even with all previous
> >> patches?
> >
> > I will try to extract a new reproducer now.
> 
> Ok, I does not seem to see any crashes except the timer hangs below.
> Let's consider all other bugs as fixed. I will report anything new
> that I see separately.

OK, good to hear.

> > Meanwhile, can you try to reproduce this one:
> > https://groups.google.com/forum/#!msg/syzkaller/bbtG9_h1ONU/CPLblMC6FAAJ
> > ? I run the program in a tight parallel loop.

I could reproduce this after your suggestion with parallel runs.

This seems specific to hrtimer.  Possibly it's not about the snd-timer
core itself.  Could you check whether this doesn't happen when
CONFIG_SND_HRTIMER isn't set?


thanks,

Takashi
Dmitry Vyukov Jan. 15, 2016, 7:47 p.m. UTC | #11
On Fri, Jan 15, 2016 at 8:18 PM, Takashi Iwai <tiwai@suse.de> wrote:
> On Fri, 15 Jan 2016 20:13:11 +0100,
> Dmitry Vyukov wrote:
>>
>> On Fri, Jan 15, 2016 at 3:38 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> > On Fri, Jan 15, 2016 at 2:51 PM, Takashi Iwai <tiwai@suse.de> wrote:
>> >> On Fri, 15 Jan 2016 12:03:17 +0100,
>> >> Dmitry Vyukov wrote:
>> >>>
>> >>> On Fri, Jan 15, 2016 at 12:00 PM, Takashi Iwai <tiwai@suse.de> wrote:
>> >>> > On Fri, 15 Jan 2016 09:06:10 +0100,
>> >>> > Dmitry Vyukov wrote:
>> >>> >>
>> >>> >> On Thu, Jan 14, 2016 at 5:09 PM, Takashi Iwai <tiwai@suse.de> wrote:
>> >>> >> > On Wed, 13 Jan 2016 21:54:10 +0100,
>> >>> >> > Takashi Iwai wrote:
>> >>> >> >>
>> >>> >> >> OK, then this might be a possible race at the current snd_timer_stop()
>> >>> >> >> implementation.  There is no sync action there, so the ISR might be
>> >>> >> >> still alive after snd_timer_close() call.  Or might be another race.
>> >>> >> >> This pattern looks a bit different, as it's involved with hrtimer.
>> >>> >> >>
>> >>> >> >> I'll take a look at it tomorrow.
>> >>> >> >
>> >>> >> > I've audited the code today, but the open window doesn't look like
>> >>> >> > what I expected.  I found only some possible cases with slave timer
>> >>> >> > instances.
>> >>> >> >
>> >>> >> > In anyway, below is a test fix patch.  Since I couldn't reproduce the
>> >>> >> > issue on my local machines, it's hard to say whether this covers the
>> >>> >> > holes you fell.  Let's see...
>> >>> >>
>> >>> >>
>> >>> >> Hi Takashi,
>> >>> >>
>> >>> >> I would be interested to understand why other people can't reproduce
>> >>> >> issues that I hit pretty reliably.
>> >>> >> I suspect that it can be due to .config. Please try with the following
>> >>> >> config values.
>> >>> >
>> >>> > I guess rather other config, e.g. the kernel debug options.
>> >>> > I suppose you enabled KASAN and DEBUG_LIST.  What else?
>> >>>
>> >>> I've attached my config (you will need to disable CONFIG_KCOV, it is
>> >>> not upstreamed).
>> >>
>> >> Hm, that has lots of other drivers built-in...
>> >>
>> >>> >> I also start qemu with "-soundhw all" arg.
>> >>> >
>> >>> > OK, so you're testing with VM?  This makes easier to recheck.
>> >>>
>> >>> Yes, I start qemu as:
>> >>>
>> >>> qemu-system-x86_64 -hda wheezy.img -net
>> >>> user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic -nographic -kernel
>> >>> arch/x86/boot/bzImage -append "console=ttyS0 root=/dev/sda debug
>> >>> earlyprintk=serial slub_debug=UZ" -enable-kvm -m 2G -numa
>> >>> node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 -smp
>> >>> sockets=2,cores=2,threads=1 -usb -usbdevice mouse -usbdevice tablet
>> >>> -soundhw all
>> >>
>> >> And which test did trigger use-after-free, even with all previous
>> >> patches?
>> >
>> > I will try to extract a new reproducer now.
>>
>> Ok, I does not seem to see any crashes except the timer hangs below.
>> Let's consider all other bugs as fixed. I will report anything new
>> that I see separately.
>
> OK, good to hear.
>
>> > Meanwhile, can you try to reproduce this one:
>> > https://groups.google.com/forum/#!msg/syzkaller/bbtG9_h1ONU/CPLblMC6FAAJ
>> > ? I run the program in a tight parallel loop.
>
> I could reproduce this after your suggestion with parallel runs.
>
> This seems specific to hrtimer.  Possibly it's not about the snd-timer
> core itself.  Could you check whether this doesn't happen when
> CONFIG_SND_HRTIMER isn't set?


Does not happen without CONFIG_SND_HRTIMER.
Do you mean that this is hrtimer bug?
Takashi Iwai Jan. 15, 2016, 9:22 p.m. UTC | #12
On Fri, 15 Jan 2016 20:47:05 +0100,
Dmitry Vyukov wrote:
> 
> On Fri, Jan 15, 2016 at 8:18 PM, Takashi Iwai <tiwai@suse.de> wrote:
> > On Fri, 15 Jan 2016 20:13:11 +0100,
> > Dmitry Vyukov wrote:
> >>
> >> On Fri, Jan 15, 2016 at 3:38 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> >> > On Fri, Jan 15, 2016 at 2:51 PM, Takashi Iwai <tiwai@suse.de> wrote:
> >> >> On Fri, 15 Jan 2016 12:03:17 +0100,
> >> >> Dmitry Vyukov wrote:
> >> >>>
> >> >>> On Fri, Jan 15, 2016 at 12:00 PM, Takashi Iwai <tiwai@suse.de> wrote:
> >> >>> > On Fri, 15 Jan 2016 09:06:10 +0100,
> >> >>> > Dmitry Vyukov wrote:
> >> >>> >>
> >> >>> >> On Thu, Jan 14, 2016 at 5:09 PM, Takashi Iwai <tiwai@suse.de> wrote:
> >> >>> >> > On Wed, 13 Jan 2016 21:54:10 +0100,
> >> >>> >> > Takashi Iwai wrote:
> >> >>> >> >>
> >> >>> >> >> OK, then this might be a possible race at the current snd_timer_stop()
> >> >>> >> >> implementation.  There is no sync action there, so the ISR might be
> >> >>> >> >> still alive after snd_timer_close() call.  Or might be another race.
> >> >>> >> >> This pattern looks a bit different, as it's involved with hrtimer.
> >> >>> >> >>
> >> >>> >> >> I'll take a look at it tomorrow.
> >> >>> >> >
> >> >>> >> > I've audited the code today, but the open window doesn't look like
> >> >>> >> > what I expected.  I found only some possible cases with slave timer
> >> >>> >> > instances.
> >> >>> >> >
> >> >>> >> > In anyway, below is a test fix patch.  Since I couldn't reproduce the
> >> >>> >> > issue on my local machines, it's hard to say whether this covers the
> >> >>> >> > holes you fell.  Let's see...
> >> >>> >>
> >> >>> >>
> >> >>> >> Hi Takashi,
> >> >>> >>
> >> >>> >> I would be interested to understand why other people can't reproduce
> >> >>> >> issues that I hit pretty reliably.
> >> >>> >> I suspect that it can be due to .config. Please try with the following
> >> >>> >> config values.
> >> >>> >
> >> >>> > I guess rather other config, e.g. the kernel debug options.
> >> >>> > I suppose you enabled KASAN and DEBUG_LIST.  What else?
> >> >>>
> >> >>> I've attached my config (you will need to disable CONFIG_KCOV, it is
> >> >>> not upstreamed).
> >> >>
> >> >> Hm, that has lots of other drivers built-in...
> >> >>
> >> >>> >> I also start qemu with "-soundhw all" arg.
> >> >>> >
> >> >>> > OK, so you're testing with VM?  This makes easier to recheck.
> >> >>>
> >> >>> Yes, I start qemu as:
> >> >>>
> >> >>> qemu-system-x86_64 -hda wheezy.img -net
> >> >>> user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic -nographic -kernel
> >> >>> arch/x86/boot/bzImage -append "console=ttyS0 root=/dev/sda debug
> >> >>> earlyprintk=serial slub_debug=UZ" -enable-kvm -m 2G -numa
> >> >>> node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 -smp
> >> >>> sockets=2,cores=2,threads=1 -usb -usbdevice mouse -usbdevice tablet
> >> >>> -soundhw all
> >> >>
> >> >> And which test did trigger use-after-free, even with all previous
> >> >> patches?
> >> >
> >> > I will try to extract a new reproducer now.
> >>
> >> Ok, I does not seem to see any crashes except the timer hangs below.
> >> Let's consider all other bugs as fixed. I will report anything new
> >> that I see separately.
> >
> > OK, good to hear.
> >
> >> > Meanwhile, can you try to reproduce this one:
> >> > https://groups.google.com/forum/#!msg/syzkaller/bbtG9_h1ONU/CPLblMC6FAAJ
> >> > ? I run the program in a tight parallel loop.
> >
> > I could reproduce this after your suggestion with parallel runs.
> >
> > This seems specific to hrtimer.  Possibly it's not about the snd-timer
> > core itself.  Could you check whether this doesn't happen when
> > CONFIG_SND_HRTIMER isn't set?
> 
> 
> Does not happen without CONFIG_SND_HRTIMER.
> Do you mean that this is hrtimer bug?

I guess rather it's a bug in snd-hrtimer driver.
Will check it later.


Takashi
diff mbox

Patch

diff --git a/sound/core/timer.c b/sound/core/timer.c
index 3810ee8f1205..4e8d7bfffff6 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -215,11 +215,13 @@  static void snd_timer_check_master(struct snd_timer_instance *master)
 		    slave->slave_id == master->slave_id) {
 			list_move_tail(&slave->open_list, &master->slave_list_head);
 			spin_lock_irq(&slave_active_lock);
+			spin_lock(&master->timer->lock);
 			slave->master = master;
 			slave->timer = master->timer;
 			if (slave->flags & SNDRV_TIMER_IFLG_RUNNING)
 				list_add_tail(&slave->active_list,
 					      &master->slave_active_head);
+			spin_unlock(&master->timer->lock);
 			spin_unlock_irq(&slave_active_lock);
 		}
 	}
@@ -346,15 +348,18 @@  int snd_timer_close(struct snd_timer_instance *timeri)
 		    timer->hw.close)
 			timer->hw.close(timer);
 		/* remove slave links */
+		spin_lock_irq(&slave_active_lock);
+		spin_lock(&timer->lock);
 		list_for_each_entry_safe(slave, tmp, &timeri->slave_list_head,
 					 open_list) {
-			spin_lock_irq(&slave_active_lock);
-			_snd_timer_stop(slave, 1, SNDRV_TIMER_EVENT_RESOLUTION);
 			list_move_tail(&slave->open_list, &snd_timer_slave_list);
 			slave->master = NULL;
 			slave->timer = NULL;
-			spin_unlock_irq(&slave_active_lock);
+			list_del_init(&slave->ack_list);
+			list_del_init(&slave->active_list);
 		}
+		spin_unlock(&timer->lock);
+		spin_unlock_irq(&slave_active_lock);
 		mutex_unlock(&register_mutex);
 	}
  out:
@@ -441,9 +446,12 @@  static int snd_timer_start_slave(struct snd_timer_instance *timeri)
 
 	spin_lock_irqsave(&slave_active_lock, flags);
 	timeri->flags |= SNDRV_TIMER_IFLG_RUNNING;
-	if (timeri->master)
+	if (timeri->master && timeri->timer) {
+		spin_lock(&timeri->timer->lock);
 		list_add_tail(&timeri->active_list,
 			      &timeri->master->slave_active_head);
+		spin_unlock(&timeri->timer->lock);
+	}
 	spin_unlock_irqrestore(&slave_active_lock, flags);
 	return 1; /* delayed start */
 }
@@ -489,6 +497,8 @@  static int _snd_timer_stop(struct snd_timer_instance * timeri,
 		if (!keep_flag) {
 			spin_lock_irqsave(&slave_active_lock, flags);
 			timeri->flags &= ~SNDRV_TIMER_IFLG_RUNNING;
+			list_del_init(&timeri->ack_list);
+			list_del_init(&timeri->active_list);
 			spin_unlock_irqrestore(&slave_active_lock, flags);
 		}
 		goto __end;