Message ID | 1370291642-13259-2-git-send-email-sboyd@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
On 06/03/2013 10:33 PM, Stephen Boyd wrote:
> On an SMP system with only one global clockevent and a dummy
> clockevent per CPU we run into problems. We want the dummy
> clockevents to be registered as the per CPU tick devices, but
> we can only achieve that if we register the dummy clockevents
> before the global clockevent or if we artificially inflate the
> rating of the dummy clockevents to be higher than the rating
> of the global clockevent. Failure to do so leads to boot
> hangs when the dummy timers are registered on all other CPUs
> besides the CPU that accepted the global clockevent as its tick
> device and there is no broadcast timer to poke the dummy
> devices.
>
> If we're registering multiple clockevents and one clockevent is
> global and the other is local to a particular CPU we should
> choose to use the local clockevent regardless of the rating of
> the device. This way, if the clockevent is a dummy it will take
> the tick device duty as long as there isn't a higher rated tick
> device and any global clockevent will be bumped out into
> broadcast mode, fixing the problem described above.

The connection between the changelog, the patch and the comment is
not clear. Could you clarify a bit?

Thanks
  -- Daniel

> Reported-by: Mark Rutland <mark.rutland@arm.com>
> Tested-by: Mark Rutland <mark.rutland@arm.com>
> Tested-by: Sören Brinkmann <soren.brinkmann@xilinx.com>
> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
> ---
>  kernel/time/tick-common.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> index 5d3fb10..3da62de 100644
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -254,9 +254,10 @@ static int tick_check_new_device(struct clock_event_device *newdev)
>  		    !(newdev->features & CLOCK_EVT_FEAT_ONESHOT))
>  			goto out_bc;
>  		/*
> -		 * Check the rating
> +		 * Check the rating, but prefer CPU local devices
>  		 */
> -		if (curdev->rating >= newdev->rating)
> +		if (curdev->rating >= newdev->rating &&
> +		    cpumask_equal(curdev->cpumask, newdev->cpumask))
>  			goto out_bc;
>  	}
>

On 06/06, Daniel Lezcano wrote:
> On 06/03/2013 10:33 PM, Stephen Boyd wrote:
> > [...]
>
> The connection between the changelog, the patch and the comment is
> not clear. Could you clarify a bit?
>

There is one tick device per-cpu and one broadcast device. The
broadcast device can only be a global clockevent, whereas the
per-cpu tick device can be a global clockevent or a per-cpu
clockevent. The code tries hard to keep per-cpu clockevents in
the tick device slots but it has an ordering/rating requirement
that doesn't work when there are only dummy per-cpu devices and
one global device.

Perhaps an example will help. Let's say you only have one global
clockevent such as the sp804, and you have SMP enabled. To
support SMP we have to register dummy clockevents on each CPU so
that the sp804 can go into broadcast mode. If we don't do this,
only the CPU that registered the sp804 will get interrupts while
the other CPUs will be left with no tick device and thus no
scheduling. To fix this we register dummy clockevents on all the
CPUs _before_ we register the sp804 to force the sp804 into the
broadcast slot. Or we give the dummy clockevents a higher rating
than the sp804 so that when we register them after the sp804 the
sp804 is bumped out to broadcast duty.

If the dummy devices are registered before the sp804 we can give
the dummies a low rating and the sp804 will still go into the
broadcast slot due to this code:

	/*
	 * If we have a cpu local device already, do not replace it
	 * by a non cpu local device
	 */
	if (curdev && cpumask_equal(curdev->cpumask, cpumask_of(cpu)))
		goto out_bc;

If we register the sp804 before the dummies we're also fine as
long as the rating of the dummy is more than the sp804. Playing
games with the dummy rating is not very nice so this patch fixes
it by allowing the per-cpu device to replace the global device no
matter what the rating of the global device is.

This fixes the sp804 case when the dummy is rated lower than
sp804 and it removes any ordering requirement from the
registration of clockevents. It also completes the logic above
where we prefer cpu local devices over non cpu local devices.

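The ordering requirement described in this message can be reproduced in a small user-space model. The sketch below is not kernel code: `struct clkevt`, `register_device()`, `offer_broadcast()` and `would_hang()` are hypothetical stand-ins, cpumasks are reduced to a plain bitmask, the ratings (300 for the sp804, 100 for the dummies) are illustrative, and only the pre-patch checks discussed above are modelled (the oneshot and IRQ affinity checks are left out).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS  2                        /* illustrative two-CPU system */
#define ALL_CPUS ((1u << NR_CPUS) - 1u)   /* mask of a global device */

/* Simplified stand-in for the kernel's struct clock_event_device. */
struct clkevt {
	const char *name;
	int rating;             /* illustrative values, not taken from a driver */
	unsigned int cpumask;   /* bit n set: the device can tick CPU n */
	bool dummy;             /* dummies cannot become the broadcast device */
};

struct tick_state {
	const struct clkevt *tick[NR_CPUS];   /* one tick device slot per CPU */
	const struct clkevt *broadcast;       /* the single broadcast slot */
};

/* Offer a device for broadcast duty; dummies are never accepted. */
static void offer_broadcast(struct tick_state *s, const struct clkevt *dev)
{
	if (dev->dummy)
		return;
	if (s->broadcast && s->broadcast->rating >= dev->rating)
		return;
	s->broadcast = dev;
}

/* Pre-patch selection, reduced to the checks discussed in the thread. */
static void register_device(struct tick_state *s,
			    const struct clkevt *newdev, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	const struct clkevt *curdev = s->tick[cpu];
	unsigned int self = 1u << cpu;

	/* The device cannot serve this CPU at all. */
	if (!(newdev->cpumask & self)) {
		offer_broadcast(s, newdev);
		return;
	}
	/* Never replace a CPU-local device with a non-local one. */
	if (newdev->cpumask != self && curdev && curdev->cpumask == self) {
		offer_broadcast(s, newdev);
		return;
	}
	/* Rating check as it stood before the patch. */
	if (curdev && curdev->rating >= newdev->rating) {
		offer_broadcast(s, newdev);
		return;
	}
	s->tick[cpu] = newdev;
	if (curdev)
		offer_broadcast(s, curdev);   /* displaced device may broadcast */
}

/* True when some CPU ticks from a dummy but nothing can broadcast to it. */
static bool would_hang(const struct tick_state *s)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (s->tick[cpu] && s->tick[cpu]->dummy && !s->broadcast)
			return true;
	return false;
}

static const struct clkevt sp804  = { "sp804",  300, ALL_CPUS, false };
static const struct clkevt dummy0 = { "dummy0", 100, 1u << 0,  true  };
static const struct clkevt dummy1 = { "dummy1", 100, 1u << 1,  true  };

/* Dummies registered before the global device. */
static struct tick_state dummies_first(void)
{
	struct tick_state s = { { NULL }, NULL };

	register_device(&s, &dummy0, 0);
	register_device(&s, &sp804, 0);
	register_device(&s, &dummy1, 1);
	return s;
}

/* Global device registered first, dummies rated lower. */
static struct tick_state global_first(void)
{
	struct tick_state s = { { NULL }, NULL };

	register_device(&s, &sp804, 0);
	register_device(&s, &dummy0, 0);
	register_device(&s, &dummy1, 1);
	return s;
}
```

In this model `dummies_first()` leaves the sp804 in the broadcast slot, while `global_first()` leaves it as CPU0's tick device with the broadcast slot empty, so `would_hang()` reports the boot hang from the changelog.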
On 06/06/2013 08:04 PM, Stephen Boyd wrote:
> [...]
> This fixes the sp804 case when the dummy is rated lower than
> sp804 and it removes any ordering requirement from the
> registration of clockevents. It also completes the logic above
> where we prefer cpu local devices over non cpu local devices.

Thanks for the detailed explanation.

Did Thomas react to this patch?

On 06/07, Daniel Lezcano wrote:
> On 06/06/2013 08:04 PM, Stephen Boyd wrote:
> > [...]
>
> Thanks for the detailed explanation.
>
> Did Thomas react to this patch?
>

So far there has been no response from Thomas.

On 06/06, Stephen Boyd wrote:
> On 06/07, Daniel Lezcano wrote:
> > [...]
> > Did Thomas react to this patch?
> >
>
> So far there has been no response from Thomas.
>

Will you ack this patch anyway? Or do we need Thomas to review
this patch? It seems that this patch series has stalled again.

On 06/12/2013 11:44 PM, Stephen Boyd wrote:
> [...]
> Will you ack this patch anyway? Or do we need Thomas to review
> this patch? It seems that this patch series has stalled again.

I prefer Thomas to have a look at it and ack it. I changed Cc to To for
Thomas.

Thanks
  -- Daniel

On Thu, 13 Jun 2013, Daniel Lezcano wrote:
> I prefer Thomas to have a look at it and ack it. I changed Cc to To for
> Thomas.

The patch does not apply on tip timers/core. The code was reworked
a month ago. Please work against tip timers/core. That's where this
stuff ends up.

Thanks,

	tglx

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 5d3fb10..3da62de 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -254,9 +254,10 @@ static int tick_check_new_device(struct clock_event_device *newdev)
 		    !(newdev->features & CLOCK_EVT_FEAT_ONESHOT))
 			goto out_bc;
 		/*
-		 * Check the rating
+		 * Check the rating, but prefer CPU local devices
 		 */
-		if (curdev->rating >= newdev->rating)
+		if (curdev->rating >= newdev->rating &&
+		    cpumask_equal(curdev->cpumask, newdev->cpumask))
 			goto out_bc;
 	}
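
The effect of the changed condition can be checked in isolation. This is a minimal sketch under stated assumptions: `struct clkevt`, `keeps_current_before()` and `keeps_current_after()` are hypothetical names, a plain bitmask comparison stands in for `cpumask_equal()`, and the ratings are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced stand-in for struct clock_event_device (hypothetical). */
struct clkevt {
	int rating;
	unsigned int cpumask;   /* bit n set: the device can serve CPU n */
};

/* Rating check before the patch: keep curdev if it is rated at least as high. */
static bool keeps_current_before(const struct clkevt *curdev,
				 const struct clkevt *newdev)
{
	assert(curdev != NULL && newdev != NULL);
	return curdev->rating >= newdev->rating;
}

/*
 * Rating check after the patch: the rating only decides between devices
 * that cover the same set of CPUs. The bitmask comparison stands in for
 * cpumask_equal().
 */
static bool keeps_current_after(const struct clkevt *curdev,
				const struct clkevt *newdev)
{
	assert(curdev != NULL && newdev != NULL);
	return curdev->rating >= newdev->rating &&
	       curdev->cpumask == newdev->cpumask;
}

/* Illustrative devices on a two-CPU system. */
static const struct clkevt global_timer = { 300, 0x3 };   /* all CPUs */
static const struct clkevt local_dummy  = { 100, 0x1 };   /* CPU0 only */
static const struct clkevt local_timer  = { 400, 0x1 };   /* CPU0 only */
```

With `global_timer` as the current device and `local_dummy` as the new one, the old check keeps the global device while the patched check lets the lower-rated local device take the slot. Between two devices with the same mask, such as `local_timer` and `local_dummy`, the rating still decides.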