Message ID | s5ha8o8uoqb.wl-tiwai@suse.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Jan 14, 2016 at 5:09 PM, Takashi Iwai <tiwai@suse.de> wrote: > On Wed, 13 Jan 2016 21:54:10 +0100, > Takashi Iwai wrote: >> >> OK, then this might be a possible race at the current snd_timer_stop() >> implementation. There is no sync action there, so the ISR might be >> still alive after snd_timer_close() call. Or might be another race. >> This pattern looks a bit different, as it's involved with hrtimer. >> >> I'll take a look at it tomorrow. > > I've audited the code today, but the open window doesn't look like > what I expected. I found only some possible cases with slave timer > instances. > > In anyway, below is a test fix patch. Since I couldn't reproduce the > issue on my local machines, it's hard to say whether this covers the > holes you fell. Let's see... Hi Takashi, I would be interested to understand why other people can't reproduce issues that I hit pretty reliably. I suspect that it can be due to .config. Please try with the following config values. I also start qemu with "-soundhw all" arg. CONFIG_SOUND=y CONFIG_SOUND_OSS_CORE=y CONFIG_SOUND_OSS_CORE_PRECLAIM=y CONFIG_SND=y CONFIG_SND_TIMER=y CONFIG_SND_PCM=y CONFIG_SND_HWDEP=y CONFIG_SND_RAWMIDI=y CONFIG_SND_JACK=y CONFIG_SND_SEQUENCER=y CONFIG_SND_SEQ_DUMMY=y CONFIG_SND_OSSEMUL=y CONFIG_SND_MIXER_OSS=y CONFIG_SND_PCM_OSS=y CONFIG_SND_PCM_OSS_PLUGINS=y CONFIG_SND_PCM_TIMER=y CONFIG_SND_SEQUENCER_OSS=y CONFIG_SND_HRTIMER=y CONFIG_SND_SEQ_HRTIMER_DEFAULT=y CONFIG_SND_SUPPORT_OLD_API=y CONFIG_SND_PROC_FS=y CONFIG_SND_VERBOSE_PROCFS=y CONFIG_SND_VMASTER=y CONFIG_SND_DMA_SGBUF=y CONFIG_SND_RAWMIDI_SEQ=y CONFIG_SND_OPL3_LIB_SEQ=y CONFIG_SND_MPU401_UART=y CONFIG_SND_OPL3_LIB=y CONFIG_SND_AC97_CODEC=y CONFIG_SND_DRIVERS=y CONFIG_SND_AC97_POWER_SAVE=y CONFIG_SND_AC97_POWER_SAVE_DEFAULT=0 CONFIG_SND_SB_COMMON=y CONFIG_SND_PCI=y CONFIG_SND_AD1889=y CONFIG_SND_ALS300=y CONFIG_SND_ALS4000=y CONFIG_SND_ALI5451=y CONFIG_SND_OXYGEN_LIB=y CONFIG_SND_VIRTUOSO=y CONFIG_SND_HDA=y CONFIG_SND_HDA_INTEL=y CONFIG_SND_HDA_HWDEP=y CONFIG_SND_HDA_RECONFIG=y CONFIG_SND_HDA_INPUT_BEEP=y CONFIG_SND_HDA_INPUT_BEEP_MODE=1 CONFIG_SND_HDA_PATCH_LOADER=y CONFIG_SND_HDA_CODEC_REALTEK=y CONFIG_SND_HDA_CODEC_ANALOG=y CONFIG_SND_HDA_CODEC_SIGMATEL=y CONFIG_SND_HDA_CODEC_VIA=y CONFIG_SND_HDA_CODEC_HDMI=y CONFIG_SND_HDA_GENERIC=y CONFIG_SND_HDA_POWER_SAVE_DEFAULT=0 CONFIG_SND_HDA_CORE=y CONFIG_SND_HDA_I915=y CONFIG_SND_HDA_PREALLOC_SIZE=64 CONFIG_SND_USB=y CONFIG_SND_USB_AUDIO=y CONFIG_SND_FIREWIRE=y CONFIG_SND_FIREWIRE_LIB=y CONFIG_SND_DICE=y CONFIG_SND_OXFW=y CONFIG_SND_ISIGHT=y CONFIG_SND_SCS1X=y CONFIG_SND_FIREWORKS=y CONFIG_SND_BEBOB=y CONFIG_SND_FIREWIRE_DIGI00X=y CONFIG_SND_FIREWIRE_TASCAM=y CONFIG_SND_PCMCIA=y > Takashi > > -- 8< -- > From: Takashi Iwai <tiwai@suse.de> > Subject: [PATCH] ALSA: timer: Harden slave timer list handling > > A slave timer instance might be still accessible in a racy way while > operating the master instance as it lacks of locking. Since the > master operation is mostly protected with timer->lock, we should cope > with it while changing the slave instance, too. Also, some linked > lists (active_list and ack_list) of slave instances aren't unlinked > immediately at stopping or closing, and this may lead to unexpected > accesses. > > This patch tries to address these issues. It adds spin lock of > timer->lock (either from master or slave, which is equivalent) in a > few places. For avoiding a deadlock, we ensure that the global > slave_active_lock is always locked at first before each timer lock. > > Also, ack and active_list of slave instances are properly unlinked at > snd_timer_stop() and snd_timer_close(). > > Last but not least, remove the superfluous call of _snd_timer_stop() > at removing slave links. This is a noop, and calling it may confuse > readers wrt locking. Further cleanup will follow in a later patch. > > Actually we've got reports of use-after-free by syzkaller fuzzer, and > this hopefully fixes these issues. > > Reported-by: Dmitry Vyukov <dvyukov@google.com> > Cc: <stable@vger.kernel.org> > Signed-off-by: Takashi Iwai <tiwai@suse.de> > --- > sound/core/timer.c | 18 ++++++++++++++---- > 1 file changed, 14 insertions(+), 4 deletions(-) > > diff --git a/sound/core/timer.c b/sound/core/timer.c > index 3810ee8f1205..4e8d7bfffff6 100644 > --- a/sound/core/timer.c > +++ b/sound/core/timer.c > @@ -215,11 +215,13 @@ static void snd_timer_check_master(struct snd_timer_instance *master) > slave->slave_id == master->slave_id) { > list_move_tail(&slave->open_list, &master->slave_list_head); > spin_lock_irq(&slave_active_lock); > + spin_lock(&master->timer->lock); > slave->master = master; > slave->timer = master->timer; > if (slave->flags & SNDRV_TIMER_IFLG_RUNNING) > list_add_tail(&slave->active_list, > &master->slave_active_head); > + spin_unlock(&master->timer->lock); > spin_unlock_irq(&slave_active_lock); > } > } > @@ -346,15 +348,18 @@ int snd_timer_close(struct snd_timer_instance *timeri) > timer->hw.close) > timer->hw.close(timer); > /* remove slave links */ > + spin_lock_irq(&slave_active_lock); > + spin_lock(&timer->lock); > list_for_each_entry_safe(slave, tmp, &timeri->slave_list_head, > open_list) { > - spin_lock_irq(&slave_active_lock); > - _snd_timer_stop(slave, 1, SNDRV_TIMER_EVENT_RESOLUTION); > list_move_tail(&slave->open_list, &snd_timer_slave_list); > slave->master = NULL; > slave->timer = NULL; > - spin_unlock_irq(&slave_active_lock); > + list_del_init(&slave->ack_list); > + list_del_init(&slave->active_list); > } > + spin_unlock(&timer->lock); > + spin_unlock_irq(&slave_active_lock); > mutex_unlock(®ister_mutex); > } > out: > @@ -441,9 +446,12 @@ static int snd_timer_start_slave(struct snd_timer_instance *timeri) > > spin_lock_irqsave(&slave_active_lock, flags); > timeri->flags |= SNDRV_TIMER_IFLG_RUNNING; > - if (timeri->master) > + if (timeri->master && timeri->timer) { > + spin_lock(&timeri->timer->lock); > list_add_tail(&timeri->active_list, > &timeri->master->slave_active_head); > + spin_unlock(&timeri->timer->lock); > + } > spin_unlock_irqrestore(&slave_active_lock, flags); > return 1; /* delayed start */ > } > @@ -489,6 +497,8 @@ static int _snd_timer_stop(struct snd_timer_instance * timeri, > if (!keep_flag) { > spin_lock_irqsave(&slave_active_lock, flags); > timeri->flags &= ~SNDRV_TIMER_IFLG_RUNNING; > + list_del_init(&timeri->ack_list); > + list_del_init(&timeri->active_list); > spin_unlock_irqrestore(&slave_active_lock, flags); > } > goto __end; > -- > 2.7.0 > >
On Fri, 15 Jan 2016 09:06:10 +0100, Dmitry Vyukov wrote: > > On Thu, Jan 14, 2016 at 5:09 PM, Takashi Iwai <tiwai@suse.de> wrote: > > On Wed, 13 Jan 2016 21:54:10 +0100, > > Takashi Iwai wrote: > >> > >> OK, then this might be a possible race at the current snd_timer_stop() > >> implementation. There is no sync action there, so the ISR might be > >> still alive after snd_timer_close() call. Or might be another race. > >> This pattern looks a bit different, as it's involved with hrtimer. > >> > >> I'll take a look at it tomorrow. > > > > I've audited the code today, but the open window doesn't look like > > what I expected. I found only some possible cases with slave timer > > instances. > > > > In anyway, below is a test fix patch. Since I couldn't reproduce the > > issue on my local machines, it's hard to say whether this covers the > > holes you fell. Let's see... > > > Hi Takashi, > > I would be interested to understand why other people can't reproduce > issues that I hit pretty reliably. > I suspect that it can be due to .config. Please try with the following > config values. I guess rather other config, e.g. the kernel debug options. I suppose you enabled KASAN and DEBUG_LIST. What else? > I also start qemu with "-soundhw all" arg. OK, so you're testing with VM? This makes easier to recheck. Takashi
On Fri, Jan 15, 2016 at 12:00 PM, Takashi Iwai <tiwai@suse.de> wrote: > On Fri, 15 Jan 2016 09:06:10 +0100, > Dmitry Vyukov wrote: >> >> On Thu, Jan 14, 2016 at 5:09 PM, Takashi Iwai <tiwai@suse.de> wrote: >> > On Wed, 13 Jan 2016 21:54:10 +0100, >> > Takashi Iwai wrote: >> >> >> >> OK, then this might be a possible race at the current snd_timer_stop() >> >> implementation. There is no sync action there, so the ISR might be >> >> still alive after snd_timer_close() call. Or might be another race. >> >> This pattern looks a bit different, as it's involved with hrtimer. >> >> >> >> I'll take a look at it tomorrow. >> > >> > I've audited the code today, but the open window doesn't look like >> > what I expected. I found only some possible cases with slave timer >> > instances. >> > >> > In anyway, below is a test fix patch. Since I couldn't reproduce the >> > issue on my local machines, it's hard to say whether this covers the >> > holes you fell. Let's see... >> >> >> Hi Takashi, >> >> I would be interested to understand why other people can't reproduce >> issues that I hit pretty reliably. >> I suspect that it can be due to .config. Please try with the following >> config values. > > I guess rather other config, e.g. the kernel debug options. > I suppose you enabled KASAN and DEBUG_LIST. What else? I've attached my config (you will need to disable CONFIG_KCOV, it is not upstreamed). >> I also start qemu with "-soundhw all" arg. > > OK, so you're testing with VM? This makes easier to recheck. Yes, I start qemu as: qemu-system-x86_64 -hda wheezy.img -net user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic -nographic -kernel arch/x86/boot/bzImage -append "console=ttyS0 root=/dev/sda debug earlyprintk=serial slub_debug=UZ" -enable-kvm -m 2G -numa node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 -smp sockets=2,cores=2,threads=1 -usb -usbdevice mouse -usbdevice tablet -soundhw all
On Fri, 15 Jan 2016 12:03:17 +0100, Dmitry Vyukov wrote: > > On Fri, Jan 15, 2016 at 12:00 PM, Takashi Iwai <tiwai@suse.de> wrote: > > On Fri, 15 Jan 2016 09:06:10 +0100, > > Dmitry Vyukov wrote: > >> > >> On Thu, Jan 14, 2016 at 5:09 PM, Takashi Iwai <tiwai@suse.de> wrote: > >> > On Wed, 13 Jan 2016 21:54:10 +0100, > >> > Takashi Iwai wrote: > >> >> > >> >> OK, then this might be a possible race at the current snd_timer_stop() > >> >> implementation. There is no sync action there, so the ISR might be > >> >> still alive after snd_timer_close() call. Or might be another race. > >> >> This pattern looks a bit different, as it's involved with hrtimer. > >> >> > >> >> I'll take a look at it tomorrow. > >> > > >> > I've audited the code today, but the open window doesn't look like > >> > what I expected. I found only some possible cases with slave timer > >> > instances. > >> > > >> > In anyway, below is a test fix patch. Since I couldn't reproduce the > >> > issue on my local machines, it's hard to say whether this covers the > >> > holes you fell. Let's see... > >> > >> > >> Hi Takashi, > >> > >> I would be interested to understand why other people can't reproduce > >> issues that I hit pretty reliably. > >> I suspect that it can be due to .config. Please try with the following > >> config values. > > > > I guess rather other config, e.g. the kernel debug options. > > I suppose you enabled KASAN and DEBUG_LIST. What else? > > I've attached my config (you will need to disable CONFIG_KCOV, it is > not upstreamed). Hm, that has lots of other drivers built-in... > >> I also start qemu with "-soundhw all" arg. > > > > OK, so you're testing with VM? This makes easier to recheck. > > Yes, I start qemu as: > > qemu-system-x86_64 -hda wheezy.img -net > user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic -nographic -kernel > arch/x86/boot/bzImage -append "console=ttyS0 root=/dev/sda debug > earlyprintk=serial slub_debug=UZ" -enable-kvm -m 2G -numa > node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 -smp > sockets=2,cores=2,threads=1 -usb -usbdevice mouse -usbdevice tablet > -soundhw all And which test did trigger use-after-free, even with all previous patches? Takashi
On Fri, Jan 15, 2016 at 2:51 PM, Takashi Iwai <tiwai@suse.de> wrote: > On Fri, 15 Jan 2016 12:03:17 +0100, > Dmitry Vyukov wrote: >> >> On Fri, Jan 15, 2016 at 12:00 PM, Takashi Iwai <tiwai@suse.de> wrote: >> > On Fri, 15 Jan 2016 09:06:10 +0100, >> > Dmitry Vyukov wrote: >> >> >> >> On Thu, Jan 14, 2016 at 5:09 PM, Takashi Iwai <tiwai@suse.de> wrote: >> >> > On Wed, 13 Jan 2016 21:54:10 +0100, >> >> > Takashi Iwai wrote: >> >> >> >> >> >> OK, then this might be a possible race at the current snd_timer_stop() >> >> >> implementation. There is no sync action there, so the ISR might be >> >> >> still alive after snd_timer_close() call. Or might be another race. >> >> >> This pattern looks a bit different, as it's involved with hrtimer. >> >> >> >> >> >> I'll take a look at it tomorrow. >> >> > >> >> > I've audited the code today, but the open window doesn't look like >> >> > what I expected. I found only some possible cases with slave timer >> >> > instances. >> >> > >> >> > In anyway, below is a test fix patch. Since I couldn't reproduce the >> >> > issue on my local machines, it's hard to say whether this covers the >> >> > holes you fell. Let's see... >> >> >> >> >> >> Hi Takashi, >> >> >> >> I would be interested to understand why other people can't reproduce >> >> issues that I hit pretty reliably. >> >> I suspect that it can be due to .config. Please try with the following >> >> config values. >> > >> > I guess rather other config, e.g. the kernel debug options. >> > I suppose you enabled KASAN and DEBUG_LIST. What else? >> >> I've attached my config (you will need to disable CONFIG_KCOV, it is >> not upstreamed). > > Hm, that has lots of other drivers built-in... > >> >> I also start qemu with "-soundhw all" arg. >> > >> > OK, so you're testing with VM? This makes easier to recheck. >> >> Yes, I start qemu as: >> >> qemu-system-x86_64 -hda wheezy.img -net >> user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic -nographic -kernel >> arch/x86/boot/bzImage -append "console=ttyS0 root=/dev/sda debug >> earlyprintk=serial slub_debug=UZ" -enable-kvm -m 2G -numa >> node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 -smp >> sockets=2,cores=2,threads=1 -usb -usbdevice mouse -usbdevice tablet >> -soundhw all > > And which test did trigger use-after-free, even with all previous > patches? I will try to extract a new reproducer now. Meanwhile, can you try to reproduce this one: https://groups.google.com/forum/#!msg/syzkaller/bbtG9_h1ONU/CPLblMC6FAAJ ? I run the program in a tight parallel loop.
On Fri, 15 Jan 2016 15:38:58 +0100, Dmitry Vyukov wrote: > > On Fri, Jan 15, 2016 at 2:51 PM, Takashi Iwai <tiwai@suse.de> wrote: > > On Fri, 15 Jan 2016 12:03:17 +0100, > > Dmitry Vyukov wrote: > >> > >> On Fri, Jan 15, 2016 at 12:00 PM, Takashi Iwai <tiwai@suse.de> wrote: > >> > On Fri, 15 Jan 2016 09:06:10 +0100, > >> > Dmitry Vyukov wrote: > >> >> > >> >> On Thu, Jan 14, 2016 at 5:09 PM, Takashi Iwai <tiwai@suse.de> wrote: > >> >> > On Wed, 13 Jan 2016 21:54:10 +0100, > >> >> > Takashi Iwai wrote: > >> >> >> > >> >> >> OK, then this might be a possible race at the current snd_timer_stop() > >> >> >> implementation. There is no sync action there, so the ISR might be > >> >> >> still alive after snd_timer_close() call. Or might be another race. > >> >> >> This pattern looks a bit different, as it's involved with hrtimer. > >> >> >> > >> >> >> I'll take a look at it tomorrow. > >> >> > > >> >> > I've audited the code today, but the open window doesn't look like > >> >> > what I expected. I found only some possible cases with slave timer > >> >> > instances. > >> >> > > >> >> > In anyway, below is a test fix patch. Since I couldn't reproduce the > >> >> > issue on my local machines, it's hard to say whether this covers the > >> >> > holes you fell. Let's see... > >> >> > >> >> > >> >> Hi Takashi, > >> >> > >> >> I would be interested to understand why other people can't reproduce > >> >> issues that I hit pretty reliably. > >> >> I suspect that it can be due to .config. Please try with the following > >> >> config values. > >> > > >> > I guess rather other config, e.g. the kernel debug options. > >> > I suppose you enabled KASAN and DEBUG_LIST. What else? > >> > >> I've attached my config (you will need to disable CONFIG_KCOV, it is > >> not upstreamed). > > > > Hm, that has lots of other drivers built-in... > > > >> >> I also start qemu with "-soundhw all" arg. > >> > > >> > OK, so you're testing with VM? This makes easier to recheck. > >> > >> Yes, I start qemu as: > >> > >> qemu-system-x86_64 -hda wheezy.img -net > >> user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic -nographic -kernel > >> arch/x86/boot/bzImage -append "console=ttyS0 root=/dev/sda debug > >> earlyprintk=serial slub_debug=UZ" -enable-kvm -m 2G -numa > >> node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 -smp > >> sockets=2,cores=2,threads=1 -usb -usbdevice mouse -usbdevice tablet > >> -soundhw all > > > > And which test did trigger use-after-free, even with all previous > > patches? > > I will try to extract a new reproducer now. > Meanwhile, can you try to reproduce this one: > https://groups.google.com/forum/#!msg/syzkaller/bbtG9_h1ONU/CPLblMC6FAAJ > ? I run the program in a tight parallel loop. So you're running this in parallel? Or a tight sequential loop? I did the latter, and I tried even this on a bare metal, but couldn't trigger the Oops, so far. Meanwhile, I pushed the tree including all fixes at for-linus branch: git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git for-linus It'd be appreciated if you can test this one. thanks, Takashi
On Fri, Jan 15, 2016 at 4:21 PM, Takashi Iwai <tiwai@suse.de> wrote: > So you're running this in parallel? Or a tight sequential loop? > I did the latter, and I tried even this on a bare metal, but couldn't > trigger the Oops, so far. Yes, I run it in parallel using: $ go get golang.org/x/tools/cmd/stress $ ./stress -p 8 ./a.out But it just keeps 8 parallel processes running. > Meanwhile, I pushed the tree including all fixes at for-linus branch: > git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git for-linus > > It'd be appreciated if you can test this one. Is it different from the patches you mailed? I keep several dozens fixes for bugs that are not yet merged into Linus tree + own kcov patch. It is not easy to rebase... Here is what I now have for sound/ https://gist.githubusercontent.com/dvyukov/dc29dbfd320126285fd8/raw/e2ca7b59c0dc118045f9fb4e3d052cbc751e787e/gistfile1.txt
On Fri, 15 Jan 2016 16:28:33 +0100, Dmitry Vyukov wrote: > > On Fri, Jan 15, 2016 at 4:21 PM, Takashi Iwai <tiwai@suse.de> wrote: > > So you're running this in parallel? Or a tight sequential loop? > > I did the latter, and I tried even this on a bare metal, but couldn't > > trigger the Oops, so far. > > Yes, I run it in parallel using: > > $ go get golang.org/x/tools/cmd/stress > $ ./stress -p 8 ./a.out > > But it just keeps 8 parallel processes running. OK, then a bit different than I tested. Will check. > > Meanwhile, I pushed the tree including all fixes at for-linus branch: > > git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git for-linus > > > > It'd be appreciated if you can test this one. > > Is it different from the patches you mailed? No, they should be basically same, but just to make sure that we're on the same ground. > I keep several dozens > fixes for bugs that are not yet merged into Linus tree + own kcov > patch. It is not easy to rebase... The branch should be pullable onto 4.4-final cleanly. > Here is what I now have for sound/ > https://gist.githubusercontent.com/dvyukov/dc29dbfd320126285fd8/raw/e2ca7b59c0dc118045f9fb4e3d052cbc751e787e/gistfile1.txt > Takashi
On Fri, Jan 15, 2016 at 3:38 PM, Dmitry Vyukov <dvyukov@google.com> wrote: > On Fri, Jan 15, 2016 at 2:51 PM, Takashi Iwai <tiwai@suse.de> wrote: >> On Fri, 15 Jan 2016 12:03:17 +0100, >> Dmitry Vyukov wrote: >>> >>> On Fri, Jan 15, 2016 at 12:00 PM, Takashi Iwai <tiwai@suse.de> wrote: >>> > On Fri, 15 Jan 2016 09:06:10 +0100, >>> > Dmitry Vyukov wrote: >>> >> >>> >> On Thu, Jan 14, 2016 at 5:09 PM, Takashi Iwai <tiwai@suse.de> wrote: >>> >> > On Wed, 13 Jan 2016 21:54:10 +0100, >>> >> > Takashi Iwai wrote: >>> >> >> >>> >> >> OK, then this might be a possible race at the current snd_timer_stop() >>> >> >> implementation. There is no sync action there, so the ISR might be >>> >> >> still alive after snd_timer_close() call. Or might be another race. >>> >> >> This pattern looks a bit different, as it's involved with hrtimer. >>> >> >> >>> >> >> I'll take a look at it tomorrow. >>> >> > >>> >> > I've audited the code today, but the open window doesn't look like >>> >> > what I expected. I found only some possible cases with slave timer >>> >> > instances. >>> >> > >>> >> > In anyway, below is a test fix patch. Since I couldn't reproduce the >>> >> > issue on my local machines, it's hard to say whether this covers the >>> >> > holes you fell. Let's see... >>> >> >>> >> >>> >> Hi Takashi, >>> >> >>> >> I would be interested to understand why other people can't reproduce >>> >> issues that I hit pretty reliably. >>> >> I suspect that it can be due to .config. Please try with the following >>> >> config values. >>> > >>> > I guess rather other config, e.g. the kernel debug options. >>> > I suppose you enabled KASAN and DEBUG_LIST. What else? >>> >>> I've attached my config (you will need to disable CONFIG_KCOV, it is >>> not upstreamed). >> >> Hm, that has lots of other drivers built-in... >> >>> >> I also start qemu with "-soundhw all" arg. >>> > >>> > OK, so you're testing with VM? This makes easier to recheck. >>> >>> Yes, I start qemu as: >>> >>> qemu-system-x86_64 -hda wheezy.img -net >>> user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic -nographic -kernel >>> arch/x86/boot/bzImage -append "console=ttyS0 root=/dev/sda debug >>> earlyprintk=serial slub_debug=UZ" -enable-kvm -m 2G -numa >>> node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 -smp >>> sockets=2,cores=2,threads=1 -usb -usbdevice mouse -usbdevice tablet >>> -soundhw all >> >> And which test did trigger use-after-free, even with all previous >> patches? > > I will try to extract a new reproducer now. Ok, I does not seem to see any crashes except the timer hangs below. Let's consider all other bugs as fixed. I will report anything new that I see separately. > Meanwhile, can you try to reproduce this one: > https://groups.google.com/forum/#!msg/syzkaller/bbtG9_h1ONU/CPLblMC6FAAJ > ? I run the program in a tight parallel loop.
On Fri, 15 Jan 2016 20:13:11 +0100, Dmitry Vyukov wrote: > > On Fri, Jan 15, 2016 at 3:38 PM, Dmitry Vyukov <dvyukov@google.com> wrote: > > On Fri, Jan 15, 2016 at 2:51 PM, Takashi Iwai <tiwai@suse.de> wrote: > >> On Fri, 15 Jan 2016 12:03:17 +0100, > >> Dmitry Vyukov wrote: > >>> > >>> On Fri, Jan 15, 2016 at 12:00 PM, Takashi Iwai <tiwai@suse.de> wrote: > >>> > On Fri, 15 Jan 2016 09:06:10 +0100, > >>> > Dmitry Vyukov wrote: > >>> >> > >>> >> On Thu, Jan 14, 2016 at 5:09 PM, Takashi Iwai <tiwai@suse.de> wrote: > >>> >> > On Wed, 13 Jan 2016 21:54:10 +0100, > >>> >> > Takashi Iwai wrote: > >>> >> >> > >>> >> >> OK, then this might be a possible race at the current snd_timer_stop() > >>> >> >> implementation. There is no sync action there, so the ISR might be > >>> >> >> still alive after snd_timer_close() call. Or might be another race. > >>> >> >> This pattern looks a bit different, as it's involved with hrtimer. > >>> >> >> > >>> >> >> I'll take a look at it tomorrow. > >>> >> > > >>> >> > I've audited the code today, but the open window doesn't look like > >>> >> > what I expected. I found only some possible cases with slave timer > >>> >> > instances. > >>> >> > > >>> >> > In anyway, below is a test fix patch. Since I couldn't reproduce the > >>> >> > issue on my local machines, it's hard to say whether this covers the > >>> >> > holes you fell. Let's see... > >>> >> > >>> >> > >>> >> Hi Takashi, > >>> >> > >>> >> I would be interested to understand why other people can't reproduce > >>> >> issues that I hit pretty reliably. > >>> >> I suspect that it can be due to .config. Please try with the following > >>> >> config values. > >>> > > >>> > I guess rather other config, e.g. the kernel debug options. > >>> > I suppose you enabled KASAN and DEBUG_LIST. What else? > >>> > >>> I've attached my config (you will need to disable CONFIG_KCOV, it is > >>> not upstreamed). > >> > >> Hm, that has lots of other drivers built-in... > >> > >>> >> I also start qemu with "-soundhw all" arg. > >>> > > >>> > OK, so you're testing with VM? This makes easier to recheck. > >>> > >>> Yes, I start qemu as: > >>> > >>> qemu-system-x86_64 -hda wheezy.img -net > >>> user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic -nographic -kernel > >>> arch/x86/boot/bzImage -append "console=ttyS0 root=/dev/sda debug > >>> earlyprintk=serial slub_debug=UZ" -enable-kvm -m 2G -numa > >>> node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 -smp > >>> sockets=2,cores=2,threads=1 -usb -usbdevice mouse -usbdevice tablet > >>> -soundhw all > >> > >> And which test did trigger use-after-free, even with all previous > >> patches? > > > > I will try to extract a new reproducer now. > > Ok, I does not seem to see any crashes except the timer hangs below. > Let's consider all other bugs as fixed. I will report anything new > that I see separately. OK, good to hear. > > Meanwhile, can you try to reproduce this one: > > https://groups.google.com/forum/#!msg/syzkaller/bbtG9_h1ONU/CPLblMC6FAAJ > > ? I run the program in a tight parallel loop. I could reproduce this after your suggestion with parallel runs. This seems specific to hrtimer. Possibly it's not about the snd-timer core itself. Could you check whether this doesn't happen when CONFIG_SND_HRTIMER isn't set? thanks, Takashi
On Fri, Jan 15, 2016 at 8:18 PM, Takashi Iwai <tiwai@suse.de> wrote: > On Fri, 15 Jan 2016 20:13:11 +0100, > Dmitry Vyukov wrote: >> >> On Fri, Jan 15, 2016 at 3:38 PM, Dmitry Vyukov <dvyukov@google.com> wrote: >> > On Fri, Jan 15, 2016 at 2:51 PM, Takashi Iwai <tiwai@suse.de> wrote: >> >> On Fri, 15 Jan 2016 12:03:17 +0100, >> >> Dmitry Vyukov wrote: >> >>> >> >>> On Fri, Jan 15, 2016 at 12:00 PM, Takashi Iwai <tiwai@suse.de> wrote: >> >>> > On Fri, 15 Jan 2016 09:06:10 +0100, >> >>> > Dmitry Vyukov wrote: >> >>> >> >> >>> >> On Thu, Jan 14, 2016 at 5:09 PM, Takashi Iwai <tiwai@suse.de> wrote: >> >>> >> > On Wed, 13 Jan 2016 21:54:10 +0100, >> >>> >> > Takashi Iwai wrote: >> >>> >> >> >> >>> >> >> OK, then this might be a possible race at the current snd_timer_stop() >> >>> >> >> implementation. There is no sync action there, so the ISR might be >> >>> >> >> still alive after snd_timer_close() call. Or might be another race. >> >>> >> >> This pattern looks a bit different, as it's involved with hrtimer. >> >>> >> >> >> >>> >> >> I'll take a look at it tomorrow. >> >>> >> > >> >>> >> > I've audited the code today, but the open window doesn't look like >> >>> >> > what I expected. I found only some possible cases with slave timer >> >>> >> > instances. >> >>> >> > >> >>> >> > In anyway, below is a test fix patch. Since I couldn't reproduce the >> >>> >> > issue on my local machines, it's hard to say whether this covers the >> >>> >> > holes you fell. Let's see... >> >>> >> >> >>> >> >> >>> >> Hi Takashi, >> >>> >> >> >>> >> I would be interested to understand why other people can't reproduce >> >>> >> issues that I hit pretty reliably. >> >>> >> I suspect that it can be due to .config. Please try with the following >> >>> >> config values. >> >>> > >> >>> > I guess rather other config, e.g. the kernel debug options. >> >>> > I suppose you enabled KASAN and DEBUG_LIST. What else? >> >>> >> >>> I've attached my config (you will need to disable CONFIG_KCOV, it is >> >>> not upstreamed). >> >> >> >> Hm, that has lots of other drivers built-in... >> >> >> >>> >> I also start qemu with "-soundhw all" arg. >> >>> > >> >>> > OK, so you're testing with VM? This makes easier to recheck. >> >>> >> >>> Yes, I start qemu as: >> >>> >> >>> qemu-system-x86_64 -hda wheezy.img -net >> >>> user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic -nographic -kernel >> >>> arch/x86/boot/bzImage -append "console=ttyS0 root=/dev/sda debug >> >>> earlyprintk=serial slub_debug=UZ" -enable-kvm -m 2G -numa >> >>> node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 -smp >> >>> sockets=2,cores=2,threads=1 -usb -usbdevice mouse -usbdevice tablet >> >>> -soundhw all >> >> >> >> And which test did trigger use-after-free, even with all previous >> >> patches? >> > >> > I will try to extract a new reproducer now. >> >> Ok, I does not seem to see any crashes except the timer hangs below. >> Let's consider all other bugs as fixed. I will report anything new >> that I see separately. > > OK, good to hear. > >> > Meanwhile, can you try to reproduce this one: >> > https://groups.google.com/forum/#!msg/syzkaller/bbtG9_h1ONU/CPLblMC6FAAJ >> > ? I run the program in a tight parallel loop. > > I could reproduce this after your suggestion with parallel runs. > > This seems specific to hrtimer. Possibly it's not about the snd-timer > core itself. Could you check whether this doesn't happen when > CONFIG_SND_HRTIMER isn't set? Does not happen without CONFIG_SND_HRTIMER. Do you mean that this is hrtimer bug?
On Fri, 15 Jan 2016 20:47:05 +0100, Dmitry Vyukov wrote: > > On Fri, Jan 15, 2016 at 8:18 PM, Takashi Iwai <tiwai@suse.de> wrote: > > On Fri, 15 Jan 2016 20:13:11 +0100, > > Dmitry Vyukov wrote: > >> > >> On Fri, Jan 15, 2016 at 3:38 PM, Dmitry Vyukov <dvyukov@google.com> wrote: > >> > On Fri, Jan 15, 2016 at 2:51 PM, Takashi Iwai <tiwai@suse.de> wrote: > >> >> On Fri, 15 Jan 2016 12:03:17 +0100, > >> >> Dmitry Vyukov wrote: > >> >>> > >> >>> On Fri, Jan 15, 2016 at 12:00 PM, Takashi Iwai <tiwai@suse.de> wrote: > >> >>> > On Fri, 15 Jan 2016 09:06:10 +0100, > >> >>> > Dmitry Vyukov wrote: > >> >>> >> > >> >>> >> On Thu, Jan 14, 2016 at 5:09 PM, Takashi Iwai <tiwai@suse.de> wrote: > >> >>> >> > On Wed, 13 Jan 2016 21:54:10 +0100, > >> >>> >> > Takashi Iwai wrote: > >> >>> >> >> > >> >>> >> >> OK, then this might be a possible race at the current snd_timer_stop() > >> >>> >> >> implementation. There is no sync action there, so the ISR might be > >> >>> >> >> still alive after snd_timer_close() call. Or might be another race. > >> >>> >> >> This pattern looks a bit different, as it's involved with hrtimer. > >> >>> >> >> > >> >>> >> >> I'll take a look at it tomorrow. > >> >>> >> > > >> >>> >> > I've audited the code today, but the open window doesn't look like > >> >>> >> > what I expected. I found only some possible cases with slave timer > >> >>> >> > instances. > >> >>> >> > > >> >>> >> > In anyway, below is a test fix patch. Since I couldn't reproduce the > >> >>> >> > issue on my local machines, it's hard to say whether this covers the > >> >>> >> > holes you fell. Let's see... > >> >>> >> > >> >>> >> > >> >>> >> Hi Takashi, > >> >>> >> > >> >>> >> I would be interested to understand why other people can't reproduce > >> >>> >> issues that I hit pretty reliably. > >> >>> >> I suspect that it can be due to .config. Please try with the following > >> >>> >> config values. > >> >>> > > >> >>> > I guess rather other config, e.g. the kernel debug options. > >> >>> > I suppose you enabled KASAN and DEBUG_LIST. What else? > >> >>> > >> >>> I've attached my config (you will need to disable CONFIG_KCOV, it is > >> >>> not upstreamed). > >> >> > >> >> Hm, that has lots of other drivers built-in... > >> >> > >> >>> >> I also start qemu with "-soundhw all" arg. > >> >>> > > >> >>> > OK, so you're testing with VM? This makes easier to recheck. > >> >>> > >> >>> Yes, I start qemu as: > >> >>> > >> >>> qemu-system-x86_64 -hda wheezy.img -net > >> >>> user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic -nographic -kernel > >> >>> arch/x86/boot/bzImage -append "console=ttyS0 root=/dev/sda debug > >> >>> earlyprintk=serial slub_debug=UZ" -enable-kvm -m 2G -numa > >> >>> node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 -smp > >> >>> sockets=2,cores=2,threads=1 -usb -usbdevice mouse -usbdevice tablet > >> >>> -soundhw all > >> >> > >> >> And which test did trigger use-after-free, even with all previous > >> >> patches? > >> > > >> > I will try to extract a new reproducer now. > >> > >> Ok, I does not seem to see any crashes except the timer hangs below. > >> Let's consider all other bugs as fixed. I will report anything new > >> that I see separately. > > > > OK, good to hear. > > > >> > Meanwhile, can you try to reproduce this one: > >> > https://groups.google.com/forum/#!msg/syzkaller/bbtG9_h1ONU/CPLblMC6FAAJ > >> > ? I run the program in a tight parallel loop. > > > > I could reproduce this after your suggestion with parallel runs. > > > > This seems specific to hrtimer. Possibly it's not about the snd-timer > > core itself. Could you check whether this doesn't happen when > > CONFIG_SND_HRTIMER isn't set? > > > Does not happen without CONFIG_SND_HRTIMER. > Do you mean that this is hrtimer bug? I guess rather it's a bug in snd-hrtimer driver. Will check it later. Takashi
diff --git a/sound/core/timer.c b/sound/core/timer.c index 3810ee8f1205..4e8d7bfffff6 100644 --- a/sound/core/timer.c +++ b/sound/core/timer.c @@ -215,11 +215,13 @@ static void snd_timer_check_master(struct snd_timer_instance *master) slave->slave_id == master->slave_id) { list_move_tail(&slave->open_list, &master->slave_list_head); spin_lock_irq(&slave_active_lock); + spin_lock(&master->timer->lock); slave->master = master; slave->timer = master->timer; if (slave->flags & SNDRV_TIMER_IFLG_RUNNING) list_add_tail(&slave->active_list, &master->slave_active_head); + spin_unlock(&master->timer->lock); spin_unlock_irq(&slave_active_lock); } } @@ -346,15 +348,18 @@ int snd_timer_close(struct snd_timer_instance *timeri) timer->hw.close) timer->hw.close(timer); /* remove slave links */ + spin_lock_irq(&slave_active_lock); + spin_lock(&timer->lock); list_for_each_entry_safe(slave, tmp, &timeri->slave_list_head, open_list) { - spin_lock_irq(&slave_active_lock); - _snd_timer_stop(slave, 1, SNDRV_TIMER_EVENT_RESOLUTION); list_move_tail(&slave->open_list, &snd_timer_slave_list); slave->master = NULL; slave->timer = NULL; - spin_unlock_irq(&slave_active_lock); + list_del_init(&slave->ack_list); + list_del_init(&slave->active_list); } + spin_unlock(&timer->lock); + spin_unlock_irq(&slave_active_lock); mutex_unlock(®ister_mutex); } out: @@ -441,9 +446,12 @@ static int snd_timer_start_slave(struct snd_timer_instance *timeri) spin_lock_irqsave(&slave_active_lock, flags); timeri->flags |= SNDRV_TIMER_IFLG_RUNNING; - if (timeri->master) + if (timeri->master && timeri->timer) { + spin_lock(&timeri->timer->lock); list_add_tail(&timeri->active_list, &timeri->master->slave_active_head); + spin_unlock(&timeri->timer->lock); + } spin_unlock_irqrestore(&slave_active_lock, flags); return 1; /* delayed start */ } @@ -489,6 +497,8 @@ static int _snd_timer_stop(struct snd_timer_instance * timeri, if (!keep_flag) { spin_lock_irqsave(&slave_active_lock, flags); timeri->flags &= ~SNDRV_TIMER_IFLG_RUNNING; + list_del_init(&timeri->ack_list); + list_del_init(&timeri->active_list); spin_unlock_irqrestore(&slave_active_lock, flags); } goto __end;