diff mbox

[RFC,00/10] hw/mos6522: VIA timer emulation fixes and improvements

Message ID cover.1629799776.git.fthain@linux-m68k.org (mailing list archive)
State New, archived
Headers show

Commit Message

Finn Thain Aug. 24, 2021, 10:09 a.m. UTC
This is a patch series that I started last year. The aim was to try to 
get a monotonic clocksource for Linux/m68k guests. That aim hasn't been 
achieved yet (for q800 machines) but I'm submitting the patch series as 
an RFC because,

 - It does improve 6522 emulation fidelity.

 - It allows Linux/m68k to make use of the additional timer that the 
   hardware indeed offers but which QEMU omits. This has several 
   benefits for Linux guests [1].

 - I see that Mark has been working on timer emulation issues in his 
   github repo [2] and it seems likely that MacOS, NetBSD or A/UX guests 
   will also require better 6522 emulation.

To make collaboration easier these patches can also be fetched from 
github [3].

On a real Quadra, accesses to the SY6522 chips are slow because they are 
synchronous with the 783360 Hz "phase 2" clock. In QEMU, they are slow 
only because of the division operation in the timer count calculation.

This patch series improves the fidelity of the emulated chip, but the 
price is more division ops. I haven't tried to measure this yet.

The emulated 6522 still deviates from the behaviour of the real thing, 
however. For example, two consecutive accesses to a real 6522 timer 
counter can never yield the same value. This is not true of the 6522 in 
QEMU 6 wherein two consecutive accesses to a timer count register have 
been observed to yield the same value.

Linux is not particularly robust in the face of a 6522 that deviates 
from the usual behaviour. The problem presently affecting a Linux guest 
is that its 'via' clocksource is prone to monotonicity failure. That is, 
the clocksource counter can jump backwards. This can be observed by 
patching Linux like so:


This logic asserts that, given that Linux/m68k sets CONFIG_HZ to 100, 
the emulator will theoretically see each timer 1 interrupt cleared 
within 20 ms of the previous one. But that deadline is often missed on 
my QEMU host [4].

On real Mac hardware you could observe the same scenario if a high 
priority interrupt were to sufficiently delay the timer interrupt 
handler. (This is the reason why the VIA1 interrupt priority gets 
increased from level 1 to level 5 when running on Quadras.)

Anyway, for now, the clocksource monotonicity problem in Linux/mac68k 
guests is still unresolved. Nonetheless, I think this patch series does 
improve the situation.

[1] I've also been working on some improvements to Linux/m68k based on 
Arnd Bergman's clockevent RFC patch, 
https://lore.kernel.org/linux-m68k/20201008154651.1901126-14-arnd@arndb.de/ 
The idea is to add a oneshot clockevent device by making use of the 
second VIA1 timer. This approach should help mitigate the clock drift 
problem as well as assist with GENERIC_CLOCKEVENTS adoption.

[2] https://github.com/mcayland/qemu/commits/q800.upstream

[3] https://github.com/fthain/qemu/commits/via-timer/

[4] This theoretical 20 ms deadline is not missed prior to every 
backwards jump in the clocksource counter. AFAICT, that's because the 
true deadline is somewhat shorter than 20 ms.


Finn Thain (10):
  hw/mos6522: Remove get_load_time() methods and functions
  hw/mos6522: Remove get_counter_value() methods and functions
  hw/mos6522: Remove redundant mos6522_timer1_update() calls
  hw/mos6522: Rename timer callback functions
  hw/mos6522: Don't clear T1 interrupt flag on latch write
  hw/mos6522: Implement oneshot mode
  hw/mos6522: Fix initial timer counter reload
  hw/mos6522: Call mos6522_update_irq() when appropriate
  hw/mos6522: Avoid using discrepant QEMU clock values
  hw/mos6522: Synchronize timer interrupt and timer counter

 hw/misc/mos6522.c         | 232 +++++++++++++++++---------------------
 hw/misc/trace-events      |   2 +-
 include/hw/misc/mos6522.h |   9 ++
 3 files changed, 113 insertions(+), 130 deletions(-)

Comments

Philippe Mathieu-Daudé Aug. 24, 2021, 10:34 a.m. UTC | #1
On 8/24/21 12:09 PM, Finn Thain wrote:

> On a real Quadra, accesses to the SY6522 chips are slow because they are 
> synchronous with the 783360 Hz "phase 2" clock. In QEMU, they are slow 
> only because of the division operation in the timer count calculation.
> 
> This patch series improves the fidelity of the emulated chip, but the 
> price is more division ops. I haven't tried to measure this yet.
> 
> The emulated 6522 still deviates from the behaviour of the real thing, 
> however. For example, two consecutive accesses to a real 6522 timer 
> counter can never yield the same value. This is not true of the 6522 in 
> QEMU 6 wherein two consecutive accesses to a timer count register have 
> been observed to yield the same value.
> 
> Linux is not particularly robust in the face of a 6522 that deviates 
> from the usual behaviour. The problem presently affecting a Linux guest 
> is that its 'via' clocksource is prone to monotonicity failure. That is, 
> the clocksource counter can jump backwards. This can be observed by 
> patching Linux like so:
> 
> diff --git a/arch/m68k/mac/via.c b/arch/m68k/mac/via.c
> --- a/arch/m68k/mac/via.c
> +++ b/arch/m68k/mac/via.c
> @@ -606,6 +606,8 @@ void __init via_init_clock(void)
>  	clocksource_register_hz(&mac_clk, VIA_CLOCK_FREQ);
>  }
>  
> +static u32 prev_ticks;
> +
>  static u64 mac_read_clk(struct clocksource *cs)
>  {
>  	unsigned long flags;
> @@ -631,6 +633,8 @@ static u64 mac_read_clk(struct clocksource *cs)
>  	count = count_high << 8;
>  	ticks = VIA_TIMER_CYCLES - count;
>  	ticks += clk_offset + clk_total;
> +if (ticks < prev_ticks) pr_warn("%s: %u < %u\n", __func__, ticks, prev_ticks);
> +prev_ticks = ticks;
>  	local_irq_restore(flags);
>  
>  	return ticks;
> 
> This problem can be partly blamed on a 6522 design limitation, which is 
> that the timer counter has no overflow register. Hence, if a timer 
> counter wraps around and the kernel is late to handle the subsequent 
> interrupt, the kernel can't account for any missed ticks.
> 
> On a real Quadra, the kernel mitigates this limitation by minimizing 
> interrupt latency. But on QEMU, interrupt latency is unbounded. This 
> can't be mitigated by the guest kernel at all and leads to clock drift. 
> This can be observed by patching QEMU like so:
> 
> diff --git a/hw/misc/mos6522.c b/hw/misc/mos6522.c
> --- a/hw/misc/mos6522.c
> +++ b/hw/misc/mos6522.c
> @@ -379,6 +379,12 @@ void mos6522_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
>          s->pcr = val;
>          break;
>      case VIA_REG_IFR:
> +        if (val & T1_INT) {
> +            static int64_t last_t1_int_cleared;
> +            int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
> +            if (now - last_t1_int_cleared > 20000000) printf("\t%s: t1 int clear is late\n", __func__);
> +            last_t1_int_cleared = now;
> +        }
>          /* reset bits */
>          s->ifr &= ~val;
>          mos6522_update_irq(s);
> 
> This logic asserts that, given that Linux/m68k sets CONFIG_HZ to 100, 
> the emulator will theoretically see each timer 1 interrupt cleared 
> within 20 ms of the previous one. But that deadline is often missed on 
> my QEMU host [4].

I wonder if using QEMU ptimer wouldn't help here, instead of
calling qemu_clock_get_ns() and doing the math by hand:

/* Starting to run with/setting counter to "0" won't trigger immediately,
 * but after a one period for both oneshot and periodic modes.  */
#define PTIMER_POLICY_NO_IMMEDIATE_TRIGGER  (1 << 2)

/* Starting to run with/setting counter to "0" won't re-load counter
 * immediately, but after a one period.  */
#define PTIMER_POLICY_NO_IMMEDIATE_RELOAD   (1 << 3)

/* Make counter value of the running timer represent the actual value and
 * not the one less.  */
#define PTIMER_POLICY_NO_COUNTER_ROUND_DOWN (1 << 4)

> On real Mac hardware you could observe the same scenario if a high 
> priority interrupt were to sufficiently delay the timer interrupt 
> handler. (This is the reason why the VIA1 interrupt priority gets 
> increased from level 1 to level 5 when running on Quadras.)
> 
> Anyway, for now, the clocksource monotonicity problem in Linux/mac68k 
> guests is still unresolved. Nonetheless, I think this patch series does 
> improve the situation.
David Gibson Aug. 25, 2021, 3:11 a.m. UTC | #2
On Tue, Aug 24, 2021 at 08:09:36PM +1000, Finn Thain wrote:
> This is a patch series that I started last year. The aim was to try to 
> get a monotonic clocksource for Linux/m68k guests. That aim hasn't been 
> achieved yet (for q800 machines) but I'm submitting the patch series as 
> an RFC because,
> 
>  - It does improve 6522 emulation fidelity.
> 
>  - It allows Linux/m68k to make use of the additional timer that the 
>    hardware indeed offers but which QEMU omits. This has several 
>    benefits for Linux guests [1].
> 
>  - I see that Mark has been working on timer emulation issues in his 
>    github repo [2] and it seems likely that MacOS, NetBSD or A/UX guests 
>    will also require better 6522 emulation.
> 
> To make collaboration easier these patches can also be fetched from 
> github [3].
> 
> On a real Quadra, accesses to the SY6522 chips are slow because they are 
> synchronous with the 783360 Hz "phase 2" clock. In QEMU, they are slow 
> only because of the division operation in the timer count calculation.
> 
> This patch series improves the fidelity of the emulated chip, but the 
> price is more division ops. I haven't tried to measure this yet.
> 
> The emulated 6522 still deviates from the behaviour of the real thing, 
> however. For example, two consecutive accesses to a real 6522 timer 
> counter can never yield the same value. This is not true of the 6522 in 
> QEMU 6 wherein two consecutive accesses to a timer count register have 
> been observed to yield the same value.
> 
> Linux is not particularly robust in the face of a 6522 that deviates 
> from the usual behaviour. The problem presently affecting a Linux guest 
> is that its 'via' clocksource is prone to monotonicity failure. That is, 
> the clocksource counter can jump backwards. This can be observed by 
> patching Linux like so:

I'm afraid I know very little about Mac hardware, and almost nothing
about m68k, so I can't really review these.  I have a batch of updates
for MAINTAINERS underway, which will take me off the list for this
file.
Mark Cave-Ayland Aug. 25, 2021, 9:10 a.m. UTC | #3
On 24/08/2021 11:09, Finn Thain wrote:

> This is a patch series that I started last year. The aim was to try to
> get a monotonic clocksource for Linux/m68k guests. That aim hasn't been
> achieved yet (for q800 machines) but I'm submitting the patch series as
> an RFC because,
> 
>   - It does improve 6522 emulation fidelity.
> 
>   - It allows Linux/m68k to make use of the additional timer that the
>     hardware indeed offers but which QEMU omits. This has several
>     benefits for Linux guests [1].
> 
>   - I see that Mark has been working on timer emulation issues in his
>     github repo [2] and it seems likely that MacOS, NetBSD or A/UX guests
>     will also require better 6522 emulation.
> 
> To make collaboration easier these patches can also be fetched from
> github [3].
> 
> On a real Quadra, accesses to the SY6522 chips are slow because they are
> synchronous with the 783360 Hz "phase 2" clock. In QEMU, they are slow
> only because of the division operation in the timer count calculation.
> 
> This patch series improves the fidelity of the emulated chip, but the
> price is more division ops. I haven't tried to measure this yet.
> 
> The emulated 6522 still deviates from the behaviour of the real thing,
> however. For example, two consecutive accesses to a real 6522 timer
> counter can never yield the same value. This is not true of the 6522 in
> QEMU 6 wherein two consecutive accesses to a timer count register have
> been observed to yield the same value.
> 
> Linux is not particularly robust in the face of a 6522 that deviates
> from the usual behaviour. The problem presently affecting a Linux guest
> is that its 'via' clocksource is prone to monotonicity failure. That is,
> the clocksource counter can jump backwards. This can be observed by
> patching Linux like so:
> 
> diff --git a/arch/m68k/mac/via.c b/arch/m68k/mac/via.c
> --- a/arch/m68k/mac/via.c
> +++ b/arch/m68k/mac/via.c
> @@ -606,6 +606,8 @@ void __init via_init_clock(void)
>   	clocksource_register_hz(&mac_clk, VIA_CLOCK_FREQ);
>   }
>   
> +static u32 prev_ticks;
> +
>   static u64 mac_read_clk(struct clocksource *cs)
>   {
>   	unsigned long flags;
> @@ -631,6 +633,8 @@ static u64 mac_read_clk(struct clocksource *cs)
>   	count = count_high << 8;
>   	ticks = VIA_TIMER_CYCLES - count;
>   	ticks += clk_offset + clk_total;
> +if (ticks < prev_ticks) pr_warn("%s: %u < %u\n", __func__, ticks, prev_ticks);
> +prev_ticks = ticks;
>   	local_irq_restore(flags);
>   
>   	return ticks;
> 
> This problem can be partly blamed on a 6522 design limitation, which is
> that the timer counter has no overflow register. Hence, if a timer
> counter wraps around and the kernel is late to handle the subsequent
> interrupt, the kernel can't account for any missed ticks.
> 
> On a real Quadra, the kernel mitigates this limitation by minimizing
> interrupt latency. But on QEMU, interrupt latency is unbounded. This
> can't be mitigated by the guest kernel at all and leads to clock drift.
> This can be observed by patching QEMU like so:
> 
> diff --git a/hw/misc/mos6522.c b/hw/misc/mos6522.c
> --- a/hw/misc/mos6522.c
> +++ b/hw/misc/mos6522.c
> @@ -379,6 +379,12 @@ void mos6522_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
>           s->pcr = val;
>           break;
>       case VIA_REG_IFR:
> +        if (val & T1_INT) {
> +            static int64_t last_t1_int_cleared;
> +            int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
> +            if (now - last_t1_int_cleared > 20000000) printf("\t%s: t1 int clear is late\n", __func__);
> +            last_t1_int_cleared = now;
> +        }
>           /* reset bits */
>           s->ifr &= ~val;
>           mos6522_update_irq(s);
> 
> This logic asserts that, given that Linux/m68k sets CONFIG_HZ to 100,
> the emulator will theoretically see each timer 1 interrupt cleared
> within 20 ms of the previous one. But that deadline is often missed on
> my QEMU host [4].
> 
> On real Mac hardware you could observe the same scenario if a high
> priority interrupt were to sufficiently delay the timer interrupt
> handler. (This is the reason why the VIA1 interrupt priority gets
> increased from level 1 to level 5 when running on Quadras.)
> 
> Anyway, for now, the clocksource monotonicity problem in Linux/mac68k
> guests is still unresolved. Nonetheless, I think this patch series does
> improve the situation.
> 
> [1] I've also been working on some improvements to Linux/m68k based on
> Arnd Bergman's clockevent RFC patch,
> https://lore.kernel.org/linux-m68k/20201008154651.1901126-14-arnd@arndb.de/
> The idea is to add a oneshot clockevent device by making use of the
> second VIA1 timer. This approach should help mitigate the clock drift
> problem as well as assist with GENERIC_CLOCKEVENTS adoption.
> 
> [2] https://github.com/mcayland/qemu/commits/q800.upstream
> 
> [3] https://github.com/fthain/qemu/commits/via-timer/
> 
> [4] This theoretical 20 ms deadline is not missed prior to every
> backwards jump in the clocksource counter. AFAICT, that's because the
> true deadline is somewhat shorter than 20 ms.
> 
> 
> Finn Thain (10):
>    hw/mos6522: Remove get_load_time() methods and functions
>    hw/mos6522: Remove get_counter_value() methods and functions
>    hw/mos6522: Remove redundant mos6522_timer1_update() calls
>    hw/mos6522: Rename timer callback functions
>    hw/mos6522: Don't clear T1 interrupt flag on latch write
>    hw/mos6522: Implement oneshot mode
>    hw/mos6522: Fix initial timer counter reload
>    hw/mos6522: Call mos6522_update_irq() when appropriate
>    hw/mos6522: Avoid using discrepant QEMU clock values
>    hw/mos6522: Synchronize timer interrupt and timer counter
> 
>   hw/misc/mos6522.c         | 232 +++++++++++++++++---------------------
>   hw/misc/trace-events      |   2 +-
>   include/hw/misc/mos6522.h |   9 ++
>   3 files changed, 113 insertions(+), 130 deletions(-)

I just wanted to say that this patchset is obviously the result of a huge amount of 
effort trying to figure out why the clock in Linux/m68k appears to jump backwards in 
QEMU, and certainly references conditions in real hardware that is not explained in 
the datasheet in sufficient detail.

 From my perspective I'd suggest tackling the 2 main issues first: 1) ensuring that 
the clock is monotonic and 2) adding the one shot timer mode. The other fixes/updates 
can then be layered on top once we're confident that the underlying timing mechanism 
works not just for Linux/m68k but also for cuda on PPC.

I'm also slightly suspicious of the if() blocks introduced in mos6522_read() 
introduce via commit cd8843ff25d ("mos6522: fix T1 and T2 timers").

In your comments above you mention:

 > On real Mac hardware you could observe the same scenario if a high
 > priority interrupt were to sufficiently delay the timer interrupt
 > handler. (This is the reason why the VIA1 interrupt priority gets
 > increased from level 1 to level 5 when running on Quadras.)

This isn't currently true for QEMU: if you look at hw/m68k/q800.c you can see that 
the VIA interrupts are hard-wired to levels 1 and 2 respectively. You can change the 
VIA1 interrupt so it is routed to level 5 instead of level 1 with the following diff:

diff --git a/hw/m68k/q800.c b/hw/m68k/q800.c
index ac0a13060b..dc8dbe5c6f 100644
--- a/hw/m68k/q800.c
+++ b/hw/m68k/q800.c
@@ -281,7 +281,7 @@ static void q800_init(MachineState *machine)
      sysbus_realize_and_unref(sysbus, &error_fatal);
      sysbus_mmio_map(sysbus, 0, VIA_BASE);
      qdev_connect_gpio_out_named(DEVICE(sysbus), "irq", 0,
-                                qdev_get_gpio_in(glue, 0));
+                                qdev_get_gpio_in(glue, 4));
      qdev_connect_gpio_out_named(DEVICE(sysbus), "irq", 1,
                                  qdev_get_gpio_in(glue, 1));

The q800.upstream branch goes further and implements the dynamic interrupt routing 
required by A/UX but the above should be a basic test to see if the increased 
priority helps with your timing issue at all.


ATB,

Mark.
Finn Thain Aug. 28, 2021, 1:22 a.m. UTC | #4
On Tue, 24 Aug 2021, Philippe Mathieu-Daudé wrote:

> On 8/24/21 12:09 PM, Finn Thain wrote:
> 
> > On a real Quadra, accesses to the SY6522 chips are slow because they are 
> > synchronous with the 783360 Hz "phase 2" clock. In QEMU, they are slow 
> > only because of the division operation in the timer count calculation.
> > 
> > This patch series improves the fidelity of the emulated chip, but the 
> > price is more division ops. I haven't tried to measure this yet.
> > 
> > The emulated 6522 still deviates from the behaviour of the real thing, 
> > however. For example, two consecutive accesses to a real 6522 timer 
> > counter can never yield the same value. This is not true of the 6522 in 
> > QEMU 6 wherein two consecutive accesses to a timer count register have 
> > been observed to yield the same value.
> > 
> > Linux is not particularly robust in the face of a 6522 that deviates 
> > from the usual behaviour. The problem presently affecting a Linux guest 
> > is that its 'via' clocksource is prone to monotonicity failure. That is, 
> > the clocksource counter can jump backwards. This can be observed by 
> > patching Linux like so:
> > 
> > diff --git a/arch/m68k/mac/via.c b/arch/m68k/mac/via.c
> > --- a/arch/m68k/mac/via.c
> > +++ b/arch/m68k/mac/via.c
> > @@ -606,6 +606,8 @@ void __init via_init_clock(void)
> >  	clocksource_register_hz(&mac_clk, VIA_CLOCK_FREQ);
> >  }
> >  
> > +static u32 prev_ticks;
> > +
> >  static u64 mac_read_clk(struct clocksource *cs)
> >  {
> >  	unsigned long flags;
> > @@ -631,6 +633,8 @@ static u64 mac_read_clk(struct clocksource *cs)
> >  	count = count_high << 8;
> >  	ticks = VIA_TIMER_CYCLES - count;
> >  	ticks += clk_offset + clk_total;
> > +if (ticks < prev_ticks) pr_warn("%s: %u < %u\n", __func__, ticks, prev_ticks);
> > +prev_ticks = ticks;
> >  	local_irq_restore(flags);
> >  
> >  	return ticks;
> > 
> > This problem can be partly blamed on a 6522 design limitation, which is 
> > that the timer counter has no overflow register. Hence, if a timer 
> > counter wraps around and the kernel is late to handle the subsequent 
> > interrupt, the kernel can't account for any missed ticks.
> > 
> > On a real Quadra, the kernel mitigates this limitation by minimizing 
> > interrupt latency. But on QEMU, interrupt latency is unbounded. This 
> > can't be mitigated by the guest kernel at all and leads to clock drift. 
> > This can be observed by patching QEMU like so:
> > 
> > diff --git a/hw/misc/mos6522.c b/hw/misc/mos6522.c
> > --- a/hw/misc/mos6522.c
> > +++ b/hw/misc/mos6522.c
> > @@ -379,6 +379,12 @@ void mos6522_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
> >          s->pcr = val;
> >          break;
> >      case VIA_REG_IFR:
> > +        if (val & T1_INT) {
> > +            static int64_t last_t1_int_cleared;
> > +            int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
> > +            if (now - last_t1_int_cleared > 20000000) printf("\t%s: t1 int clear is late\n", __func__);
> > +            last_t1_int_cleared = now;
> > +        }
> >          /* reset bits */
> >          s->ifr &= ~val;
> >          mos6522_update_irq(s);
> > 
> > This logic asserts that, given that Linux/m68k sets CONFIG_HZ to 100, 
> > the emulator will theoretically see each timer 1 interrupt cleared 
> > within 20 ms of the previous one. But that deadline is often missed on 
> > my QEMU host [4].
> 
> I wonder if using QEMU ptimer wouldn't help here, instead of
> calling qemu_clock_get_ns() and doing the math by hand:
> 
> /* Starting to run with/setting counter to "0" won't trigger immediately,
>  * but after a one period for both oneshot and periodic modes.  */
> #define PTIMER_POLICY_NO_IMMEDIATE_TRIGGER  (1 << 2)
> 
> /* Starting to run with/setting counter to "0" won't re-load counter
>  * immediately, but after a one period.  */
> #define PTIMER_POLICY_NO_IMMEDIATE_RELOAD   (1 << 3)
> 
> /* Make counter value of the running timer represent the actual value and
>  * not the one less.  */
> #define PTIMER_POLICY_NO_COUNTER_ROUND_DOWN (1 << 4)
> 

It's often the case that a conversion to a new API is trivial for someone 
who understands that API. But I still haven't got my head around the 
implementation (hw/core/ptimer.c).

So I agree the ptimer API could simplify mos6522.c but adopting it is not 
trivial for me.

QEMU's 6522 device does not attempt to model the relationship between the 
"phase two" clock and timer counters and timer interrupts. I haven't 
attempted to fix these deviations at all, as that's not trivial either.

But using the ptimer API could potentially make it easier to address those 
fidelity issues.
Finn Thain Aug. 28, 2021, 4:11 a.m. UTC | #5
On Wed, 25 Aug 2021, Mark Cave-Ayland wrote:

> On 24/08/2021 11:09, Finn Thain wrote:
> 
> > This is a patch series that I started last year. The aim was to try to
> > get a monotonic clocksource for Linux/m68k guests. That aim hasn't been
> > achieved yet (for q800 machines) but I'm submitting the patch series as
> > an RFC because,
> > 
> >   - It does improve 6522 emulation fidelity.
> > 
> >   - It allows Linux/m68k to make use of the additional timer that the
> >     hardware indeed offers but which QEMU omits. This has several
> >     benefits for Linux guests [1].
> > 
> >   - I see that Mark has been working on timer emulation issues in his
> >     github repo [2] and it seems likely that MacOS, NetBSD or A/UX guests
> >     will also require better 6522 emulation.
> > 
> > To make collaboration easier these patches can also be fetched from
> > github [3].
> > 
> > On a real Quadra, accesses to the SY6522 chips are slow because they are
> > synchronous with the 783360 Hz "phase 2" clock. In QEMU, they are slow
> > only because of the division operation in the timer count calculation.
> > 
> > This patch series improves the fidelity of the emulated chip, but the
> > price is more division ops. I haven't tried to measure this yet.
> > 
> > The emulated 6522 still deviates from the behaviour of the real thing,
> > however. For example, two consecutive accesses to a real 6522 timer
> > counter can never yield the same value. This is not true of the 6522 in
> > QEMU 6 wherein two consecutive accesses to a timer count register have
> > been observed to yield the same value.
> > 
> > Linux is not particularly robust in the face of a 6522 that deviates
> > from the usual behaviour. The problem presently affecting a Linux guest
> > is that its 'via' clocksource is prone to monotonicity failure. That is,
> > the clocksource counter can jump backwards. This can be observed by
> > patching Linux like so:
> > 
> > diff --git a/arch/m68k/mac/via.c b/arch/m68k/mac/via.c
> > --- a/arch/m68k/mac/via.c
> > +++ b/arch/m68k/mac/via.c
> > @@ -606,6 +606,8 @@ void __init via_init_clock(void)
> >   	clocksource_register_hz(&mac_clk, VIA_CLOCK_FREQ);
> >   }
> >   +static u32 prev_ticks;
> > +
> >   static u64 mac_read_clk(struct clocksource *cs)
> >   {
> >   	unsigned long flags;
> > @@ -631,6 +633,8 @@ static u64 mac_read_clk(struct clocksource *cs)
> >   	count = count_high << 8;
> >   	ticks = VIA_TIMER_CYCLES - count;
> >   	ticks += clk_offset + clk_total;
> > +if (ticks < prev_ticks) pr_warn("%s: %u < %u\n", __func__, ticks,
> > prev_ticks);
> > +prev_ticks = ticks;
> >   	local_irq_restore(flags);
> >     	return ticks;
> > 
> > This problem can be partly blamed on a 6522 design limitation, which is
> > that the timer counter has no overflow register. Hence, if a timer
> > counter wraps around and the kernel is late to handle the subsequent
> > interrupt, the kernel can't account for any missed ticks.
> > 
> > On a real Quadra, the kernel mitigates this limitation by minimizing
> > interrupt latency. But on QEMU, interrupt latency is unbounded. This
> > can't be mitigated by the guest kernel at all and leads to clock drift.
> > This can be observed by patching QEMU like so:
> > 
> > diff --git a/hw/misc/mos6522.c b/hw/misc/mos6522.c
> > --- a/hw/misc/mos6522.c
> > +++ b/hw/misc/mos6522.c
> > @@ -379,6 +379,12 @@ void mos6522_write(void *opaque, hwaddr addr, uint64_t
> > val, unsigned size)
> >           s->pcr = val;
> >           break;
> >       case VIA_REG_IFR:
> > +        if (val & T1_INT) {
> > +            static int64_t last_t1_int_cleared;
> > +            int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
> > +            if (now - last_t1_int_cleared > 20000000) printf("\t%s: t1 int
> > clear is late\n", __func__);
> > +            last_t1_int_cleared = now;
> > +        }
> >           /* reset bits */
> >           s->ifr &= ~val;
> >           mos6522_update_irq(s);
> > 
> > This logic asserts that, given that Linux/m68k sets CONFIG_HZ to 100,
> > the emulator will theoretically see each timer 1 interrupt cleared
> > within 20 ms of the previous one. But that deadline is often missed on
> > my QEMU host [4].
> > 
> > On real Mac hardware you could observe the same scenario if a high
> > priority interrupt were to sufficiently delay the timer interrupt
> > handler. (This is the reason why the VIA1 interrupt priority gets
> > increased from level 1 to level 5 when running on Quadras.)
> > 
> > Anyway, for now, the clocksource monotonicity problem in Linux/mac68k
> > guests is still unresolved. Nonetheless, I think this patch series does
> > improve the situation.
> > 
> > [1] I've also been working on some improvements to Linux/m68k based on
> > Arnd Bergman's clockevent RFC patch,
> > https://lore.kernel.org/linux-m68k/20201008154651.1901126-14-arnd@arndb.de/
> > The idea is to add a oneshot clockevent device by making use of the
> > second VIA1 timer. This approach should help mitigate the clock drift
> > problem as well as assist with GENERIC_CLOCKEVENTS adoption.
> > 
> > [2] https://github.com/mcayland/qemu/commits/q800.upstream
> > 
> > [3] https://github.com/fthain/qemu/commits/via-timer/
> > 
> > [4] This theoretical 20 ms deadline is not missed prior to every
> > backwards jump in the clocksource counter. AFAICT, that's because the
> > true deadline is somewhat shorter than 20 ms.
> > 
> > 
> > Finn Thain (10):
> >    hw/mos6522: Remove get_load_time() methods and functions
> >    hw/mos6522: Remove get_counter_value() methods and functions
> >    hw/mos6522: Remove redundant mos6522_timer1_update() calls
> >    hw/mos6522: Rename timer callback functions
> >    hw/mos6522: Don't clear T1 interrupt flag on latch write
> >    hw/mos6522: Implement oneshot mode
> >    hw/mos6522: Fix initial timer counter reload
> >    hw/mos6522: Call mos6522_update_irq() when appropriate
> >    hw/mos6522: Avoid using discrepant QEMU clock values
> >    hw/mos6522: Synchronize timer interrupt and timer counter
> > 
> >   hw/misc/mos6522.c         | 232 +++++++++++++++++---------------------
> >   hw/misc/trace-events      |   2 +-
> >   include/hw/misc/mos6522.h |   9 ++
> >   3 files changed, 113 insertions(+), 130 deletions(-)
> 
> I just wanted to say that this patchset is obviously the result of a huge
> amount of effort trying to figure out why the clock in Linux/m68k appears to
> jump backwards in QEMU, and certainly references conditions in real hardware
> that is not explained in the datasheet in sufficient detail.
> 
> From my perspective I'd suggest tackling the 2 main issues first: 1) ensuring
> that the clock is monotonic and 2) adding the one shot timer mode. 

Well, that's how I arrived at this series. The oneshot timer mode was 
needed in order to implement a oneshot clockevent device.

> The other fixes/updates can then be layered on top once we're confident 
> that the underlying timing mechanism works not just for Linux/m68k but 
> also for cuda on PPC.
> 

Fixes normally go at the start of the series, new features at the end. 
This helps review, bisection and backporting.

> I'm also slightly suspicious of the if() blocks introduced in mos6522_read()
> introduce via commit cd8843ff25d ("mos6522: fix T1 and T2 timers").
> 

Fair enough... I see that you also raised this in your reply to patch 
6/10; I'll have to take another look at that code and respond in that 
thread. That patch was written some time ago and I've forgotten the 
rationale. Perhaps Laurent will comment on commit cd8843ff25d.

> In your comments above you mention:
> 
> > On real Mac hardware you could observe the same scenario if a high
> > priority interrupt were to sufficiently delay the timer interrupt
> > handler. (This is the reason why the VIA1 interrupt priority gets
> > increased from level 1 to level 5 when running on Quadras.)
> 
> This isn't currently true for QEMU: if you look at hw/m68k/q800.c you can see
> that the VIA interrupts are hard-wired to levels 1 and 2 respectively. You can
> change the VIA1 interrupt so it is routed to level 5 instead of level 1 with
> the following diff:
> 
> diff --git a/hw/m68k/q800.c b/hw/m68k/q800.c
> index ac0a13060b..dc8dbe5c6f 100644
> --- a/hw/m68k/q800.c
> +++ b/hw/m68k/q800.c
> @@ -281,7 +281,7 @@ static void q800_init(MachineState *machine)
>      sysbus_realize_and_unref(sysbus, &error_fatal);
>      sysbus_mmio_map(sysbus, 0, VIA_BASE);
>      qdev_connect_gpio_out_named(DEVICE(sysbus), "irq", 0,
> -                                qdev_get_gpio_in(glue, 0));
> +                                qdev_get_gpio_in(glue, 4));
>      qdev_connect_gpio_out_named(DEVICE(sysbus), "irq", 1,
>                                  qdev_get_gpio_in(glue, 1));
> 
> The q800.upstream branch goes further and implements the dynamic interrupt
> routing required by A/UX but the above should be a basic test to see if the
> increased priority helps with your timing issue at all.
> 

That would only help if the handlers for interrupts having greater 
priority (IPL > 1) were slow. But the problem is not that. The root cause 
here is unbounded interrupt latency in general. A real quadra does not 
inflict that on the kernel.
Mark Cave-Ayland Aug. 31, 2021, 9:14 p.m. UTC | #6
On 28/08/2021 02:22, Finn Thain wrote:

>> On 8/24/21 12:09 PM, Finn Thain wrote:
>>
>>> On a real Quadra, accesses to the SY6522 chips are slow because they are
>>> synchronous with the 783360 Hz "phase 2" clock. In QEMU, they are slow
>>> only because of the division operation in the timer count calculation.
>>>
>>> This patch series improves the fidelity of the emulated chip, but the
>>> price is more division ops. I haven't tried to measure this yet.
>>>
>>> The emulated 6522 still deviates from the behaviour of the real thing,
>>> however. For example, two consecutive accesses to a real 6522 timer
>>> counter can never yield the same value. This is not true of the 6522 in
>>> QEMU 6 wherein two consecutive accesses to a timer count register have
>>> been observed to yield the same value.
>>>
>>> Linux is not particularly robust in the face of a 6522 that deviates
>>> from the usual behaviour. The problem presently affecting a Linux guest
>>> is that its 'via' clocksource is prone to monotonicity failure. That is,
>>> the clocksource counter can jump backwards. This can be observed by
>>> patching Linux like so:
>>>
>>> diff --git a/arch/m68k/mac/via.c b/arch/m68k/mac/via.c
>>> --- a/arch/m68k/mac/via.c
>>> +++ b/arch/m68k/mac/via.c
>>> @@ -606,6 +606,8 @@ void __init via_init_clock(void)
>>>   	clocksource_register_hz(&mac_clk, VIA_CLOCK_FREQ);
>>>   }
>>>   
>>> +static u32 prev_ticks;
>>> +
>>>   static u64 mac_read_clk(struct clocksource *cs)
>>>   {
>>>   	unsigned long flags;
>>> @@ -631,6 +633,8 @@ static u64 mac_read_clk(struct clocksource *cs)
>>>   	count = count_high << 8;
>>>   	ticks = VIA_TIMER_CYCLES - count;
>>>   	ticks += clk_offset + clk_total;
>>> +if (ticks < prev_ticks) pr_warn("%s: %u < %u\n", __func__, ticks, prev_ticks);
>>> +prev_ticks = ticks;
>>>   	local_irq_restore(flags);
>>>   
>>>   	return ticks;
>>>
>>> This problem can be partly blamed on a 6522 design limitation, which is
>>> that the timer counter has no overflow register. Hence, if a timer
>>> counter wraps around and the kernel is late to handle the subsequent
>>> interrupt, the kernel can't account for any missed ticks.
>>>
>>> On a real Quadra, the kernel mitigates this limitation by minimizing
>>> interrupt latency. But on QEMU, interrupt latency is unbounded. This
>>> can't be mitigated by the guest kernel at all and leads to clock drift.
>>> This can be observed by patching QEMU like so:
>>>
>>> diff --git a/hw/misc/mos6522.c b/hw/misc/mos6522.c
>>> --- a/hw/misc/mos6522.c
>>> +++ b/hw/misc/mos6522.c
>>> @@ -379,6 +379,12 @@ void mos6522_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
>>>           s->pcr = val;
>>>           break;
>>>       case VIA_REG_IFR:
>>> +        if (val & T1_INT) {
>>> +            static int64_t last_t1_int_cleared;
>>> +            int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
>>> +            if (now - last_t1_int_cleared > 20000000) printf("\t%s: t1 int clear is late\n", __func__);
>>> +            last_t1_int_cleared = now;
>>> +        }
>>>           /* reset bits */
>>>           s->ifr &= ~val;
>>>           mos6522_update_irq(s);
>>>
>>> This logic asserts that, given that Linux/m68k sets CONFIG_HZ to 100,
>>> the emulator will theoretically see each timer 1 interrupt cleared
>>> within 20 ms of the previous one. But that deadline is often missed on
>>> my QEMU host [4].
>>
>> I wonder if using QEMU ptimer wouldn't help here, instead of
>> calling qemu_clock_get_ns() and doing the math by hand:
>>
>> /* Starting to run with/setting counter to "0" won't trigger immediately,
>>   * but after a one period for both oneshot and periodic modes.  */
>> #define PTIMER_POLICY_NO_IMMEDIATE_TRIGGER  (1 << 2)
>>
>> /* Starting to run with/setting counter to "0" won't re-load counter
>>   * immediately, but after a one period.  */
>> #define PTIMER_POLICY_NO_IMMEDIATE_RELOAD   (1 << 3)
>>
>> /* Make counter value of the running timer represent the actual value and
>>   * not the one less.  */
>> #define PTIMER_POLICY_NO_COUNTER_ROUND_DOWN (1 << 4)
>>
> 
> It's often the case that a conversion to a new API is trivial for someone
> who understands that API. But I still haven't got my head around the
> implementation (hw/core/ptimer.c).
> 
> So I agree the ptimer API could simplify mos6522.c but adopting it is not
> trivial for me.
> 
> QEMU's 6522 device does not attempt to model the relationship between the
> "phase two" clock and timer counters and timer interrupts. I haven't
> attempted to fix these deviations at all, as that's not trivial either.
> 
> But using the ptimer API could potentially make it easier to address those
> fidelity issues.

I had another look at the mos6522 code this evening, and certainly whilst there are 
things that could be improved, I'm still puzzled as to how you would see time going 
backwards:

- qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) eventually ends up calling cpu_get_clock() 
where the comment for the function is "Return the monotonic time elapsed in VM"

- get_next_irq_time() calculates the counter value for the current clock, adds the 
value of the counter (compensating for wraparound) and calculates the clock for the 
next IRQ

- Using the current clock instead of ti->next_irq_time in the timer callbacks should 
compensate for any latency when the callback is invoked

You mentioned that the OS may compensate for the fact that the 6522 doesn't have an 
overflow flag: can you explain more as to how this works in Linux? Is the problem 
here that even if you read the counter value in the interrupt handler to work out the 
latency, the counter may still have already wrapped?


ATB,

Mark.
Finn Thain Aug. 31, 2021, 10:44 p.m. UTC | #7
On Tue, 31 Aug 2021, Mark Cave-Ayland wrote:

> On 28/08/2021 02:22, Finn Thain wrote:
> 
> > > On 8/24/21 12:09 PM, Finn Thain wrote:
> > > 
> > > > On a real Quadra, accesses to the SY6522 chips are slow because they are
> > > > synchronous with the 783360 Hz "phase 2" clock. In QEMU, they are slow
> > > > only because of the division operation in the timer count calculation.
> > > > 
> > > > This patch series improves the fidelity of the emulated chip, but the
> > > > price is more division ops. I haven't tried to measure this yet.
> > > > 
> > > > The emulated 6522 still deviates from the behaviour of the real thing,
> > > > however. For example, two consecutive accesses to a real 6522 timer
> > > > counter can never yield the same value. This is not true of the 6522 in
> > > > QEMU 6 wherein two consecutive accesses to a timer count register have
> > > > been observed to yield the same value.
> > > > 
> > > > Linux is not particularly robust in the face of a 6522 that deviates
> > > > from the usual behaviour. The problem presently affecting a Linux guest
> > > > is that its 'via' clocksource is prone to monotonicity failure. That is,
> > > > the clocksource counter can jump backwards. This can be observed by
> > > > patching Linux like so:
> > > > 
> > > > diff --git a/arch/m68k/mac/via.c b/arch/m68k/mac/via.c
> > > > --- a/arch/m68k/mac/via.c
> > > > +++ b/arch/m68k/mac/via.c
> > > > @@ -606,6 +606,8 @@ void __init via_init_clock(void)
> > > >   	clocksource_register_hz(&mac_clk, VIA_CLOCK_FREQ);
> > > >   }
> > > >   +static u32 prev_ticks;
> > > > +
> > > >   static u64 mac_read_clk(struct clocksource *cs)
> > > >   {
> > > >   	unsigned long flags;
> > > > @@ -631,6 +633,8 @@ static u64 mac_read_clk(struct clocksource *cs)
> > > >   	count = count_high << 8;
> > > >   	ticks = VIA_TIMER_CYCLES - count;
> > > >   	ticks += clk_offset + clk_total;
> > > > +if (ticks < prev_ticks) pr_warn("%s: %u < %u\n", __func__, ticks,
> > > > prev_ticks);
> > > > +prev_ticks = ticks;
> > > >   	local_irq_restore(flags);
> > > >     	return ticks;
> > > > 
> > > > This problem can be partly blamed on a 6522 design limitation, which is
> > > > that the timer counter has no overflow register. Hence, if a timer
> > > > counter wraps around and the kernel is late to handle the subsequent
> > > > interrupt, the kernel can't account for any missed ticks.
> > > > 
> > > > On a real Quadra, the kernel mitigates this limitation by minimizing
> > > > interrupt latency. But on QEMU, interrupt latency is unbounded. This
> > > > can't be mitigated by the guest kernel at all and leads to clock drift.
> > > > This can be observed by patching QEMU like so:
> > > > 
> > > > diff --git a/hw/misc/mos6522.c b/hw/misc/mos6522.c
> > > > --- a/hw/misc/mos6522.c
> > > > +++ b/hw/misc/mos6522.c
> > > > @@ -379,6 +379,12 @@ void mos6522_write(void *opaque, hwaddr addr,
> > > > uint64_t val, unsigned size)
> > > >           s->pcr = val;
> > > >           break;
> > > >       case VIA_REG_IFR:
> > > > +        if (val & T1_INT) {
> > > > +            static int64_t last_t1_int_cleared;
> > > > +            int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
> > > > +            if (now - last_t1_int_cleared > 20000000) printf("\t%s: t1
> > > > int clear is late\n", __func__);
> > > > +            last_t1_int_cleared = now;
> > > > +        }
> > > >           /* reset bits */
> > > >           s->ifr &= ~val;
> > > >           mos6522_update_irq(s);
> > > > 
> > > > This logic asserts that, given that Linux/m68k sets CONFIG_HZ to 100,
> > > > the emulator will theoretically see each timer 1 interrupt cleared
> > > > within 20 ms of the previous one. But that deadline is often missed on
> > > > my QEMU host [4].
> > > 
> > > I wonder if using QEMU ptimer wouldn't help here, instead of
> > > calling qemu_clock_get_ns() and doing the math by hand:
> > > 
> > > /* Starting to run with/setting counter to "0" won't trigger immediately,
> > >   * but after a one period for both oneshot and periodic modes.  */
> > > #define PTIMER_POLICY_NO_IMMEDIATE_TRIGGER  (1 << 2)
> > > 
> > > /* Starting to run with/setting counter to "0" won't re-load counter
> > >   * immediately, but after a one period.  */
> > > #define PTIMER_POLICY_NO_IMMEDIATE_RELOAD   (1 << 3)
> > > 
> > > /* Make counter value of the running timer represent the actual value and
> > >   * not the one less.  */
> > > #define PTIMER_POLICY_NO_COUNTER_ROUND_DOWN (1 << 4)
> > > 
> > 
> > It's often the case that a conversion to a new API is trivial for someone
> > who understands that API. But I still haven't got my head around the
> > implementation (hw/core/ptimer.c).
> > 
> > So I agree the ptimer API could simplify mos6522.c but adopting it is not
> > trivial for me.
> > 
> > QEMU's 6522 device does not attempt to model the relationship between the
> > "phase two" clock and timer counters and timer interrupts. I haven't
> > attempted to fix these deviations at all, as that's not trivial either.
> > 
> > But using the ptimer API could potentially make it easier to address those
> > fidelity issues.
> 
> I had another look at the mos6522 code this evening, and certainly whilst
> there are things that could be improved, I'm still puzzled as to how you would
> see time going backwards:
> 

I didn't say time goes backwards, I said the "clocksource counter can jump 
backwards". You can easily observe this for yourself if you apply the 
patch above.

> - qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) eventually ends up calling 
> cpu_get_clock() where the comment for the function is "Return the 
> monotonic time elapsed in VM"
> 

AFAICT, QEMU_CLOCK_VIRTUAL is monotonic. I confirmed last year when 
debugging mos6522.c.

> - get_next_irq_time() calculates the counter value for the current 
> clock, adds the value of the counter (compensating for wraparound) and 
> calculates the clock for the next IRQ
> 
> - Using the current clock instead of ti->next_irq_time in the timer 
> callbacks should compensate for any latency when the callback is invoked
> 

That's the theory. In practice, however, there are known flaws in the 
present implementation which this series addresses.

> You mentioned that the OS may compensate for the fact that the 6522 
> doesn't have an overflow flag: can you explain more as to how this works 
> in Linux? Is the problem here that even if you read the counter value in 
> the interrupt handler to work out the latency, the counter may still 
> have already wrapped?
> 

Unbounded interrupt latency means that the guest kernel can not tell how 
many times the timer counter wrapped between timer interrupts. There will 
always be a race condition in the guest kernel.

It's possible to code around this in the kernel with a ratchet to prevent 
backwards movement in the clocksource counter (I had to do something 
similar for the Atari clocksource driver) but that would address only one 
symptom.

And that kind of hack certainly doesn't help any other guest operating 
system that's adversely affected by QEMU's design limitations (which is 
behavioural simulation rather than gate-level simulation).

This patch series doesn't solve the timer problem but it does improve the 
situation (for the reasons given in the cover letter).
Finn Thain Sept. 1, 2021, 2:20 a.m. UTC | #8
On Tue, 31 Aug 2021, Mark Cave-Ayland wrote:

> You mentioned that the OS may compensate for the fact that the 6522 
> doesn't have an overflow flag: can you explain more as to how this works 
> in Linux?

When running on real hardware, Linux/mac68k does so by

 - Elevating the interrupt priority of VIA 1 so that other drivers do not 
   interfere with timekeeping

 - Constraining intervals during which the IPL is kept elevated (i.e. 
   local_irq_disable/enable).

When runing on QEMU, none of that is sufficient and the Linux/mac68k 
kernel can do very little to influence interrupt latency.

Linux ports to other platforms typically have multiple timers and counters 
with which to implement reliable clocksource devices.

When running on other virtualization platforms, Linux may solve the 
problem using a paravirtual clock device. Please see 
CONFIG_PARAVIRT_CLOCK, arch/x86/include/asm/pvclock-abi.h, 
arch/x86/kernel/pvclock.c, arch/x86/include/asm/vdso/gettimeofday.h 
arch/x86/xen/time.c and so on.
Mark Cave-Ayland Sept. 1, 2021, 7:57 a.m. UTC | #9
On 31/08/2021 23:44, Finn Thain wrote:

>> You mentioned that the OS may compensate for the fact that the 6522
>> doesn't have an overflow flag: can you explain more as to how this works
>> in Linux? Is the problem here that even if you read the counter value in
>> the interrupt handler to work out the latency, the counter may still
>> have already wrapped?
>>
> 
> Unbounded interrupt latency means that the guest kernel can not tell how
> many times the timer counter wrapped between timer interrupts. There will
> always be a race condition in the guest kernel.
> 
> It's possible to code around this in the kernel with a ratchet to prevent
> backwards movement in the clocksource counter (I had to do something
> similar for the Atari clocksource driver) but that would address only one
> symptom.
> 
> And that kind of hack certainly doesn't help any other guest operating
> system that's adversely affected by QEMU's design limitations (which is
> behavioural simulation rather than gate-level simulation).
> 
> This patch series doesn't solve the timer problem but it does improve the
> situation (for the reasons given in the cover letter).

I had a quick look at your via-timer branch at 
https://github.com/fthain/qemu/commits/via-timer and spotted that your work is based 
upon the v6.0 release. Before digging further into this, can you try using vanilla 
git master or the v6.1 tag instead as there are a couple of related fixes from the 
6.1 development cycle that you are missing:

82ff856fe7 ("mac_via: fix 60Hz VIA1 timer interval")
- This fixed the 60Hz VIA1 timer interval (it was running 100x too fast) and so could 
limit the virtual CPUs ability to process the 100Hz timer.

30ca7eddc4 ("mac_via: remove VIA1 timer optimisations")
- This fixed an issue whereby constant writes to portB would reset the 60Hz VIA1 
timer and 1Hz VIA1 timer (Probably not relevant here, but worth having).


ATB,

Mark.
Mark Cave-Ayland Sept. 1, 2021, 8:06 a.m. UTC | #10
On 01/09/2021 08:57, Mark Cave-Ayland wrote:

> I had a quick look at your via-timer branch at 
> https://github.com/fthain/qemu/commits/via-timer and spotted that your work is based 
> upon the v6.0 release. Before digging further into this, can you try using vanilla 
> git master or the v6.1 tag instead as there are a couple of related fixes from the 
> 6.1 development cycle that you are missing:
> 
> 82ff856fe7 ("mac_via: fix 60Hz VIA1 timer interval")
> - This fixed the 60Hz VIA1 timer interval (it was running 100x too fast) and so could 
> limit the virtual CPUs ability to process the 100Hz timer.
> 
> 30ca7eddc4 ("mac_via: remove VIA1 timer optimisations")
> - This fixed an issue whereby constant writes to portB would reset the 60Hz VIA1 
> timer and 1Hz VIA1 timer (Probably not relevant here, but worth having).

Ah my mistake, clearly not enough caffeine this morning - looks like these patches 
did make v6.0 after all :/

I'll have a go at some basic timer measurements using your patch to see what sort of 
numbers I get for the latency here. Obviously QEMU doesn't guarantee response times 
but over 20ms does seem high.


ATB,

Mark.
Mark Cave-Ayland Sept. 10, 2021, 5:29 p.m. UTC | #11
On 01/09/2021 09:06, Mark Cave-Ayland wrote:

> I'll have a go at some basic timer measurements using your patch to see what sort of 
> numbers I get for the latency here. Obviously QEMU doesn't guarantee response times 
> but over 20ms does seem high.

I was able to spend some time today looking at this and came up with 
https://github.com/mcayland/qemu/tree/via-hacks with the aim of starting with your 
patches to reduce the calls to the timer update functions (to reduce jitter) and 
fixing up the places where mos6522_update_irq() isn't called.

That seemed okay, but the part I'm struggling with at the moment is removing the 
re-assertion of the timer interrupt if the timer has expired when any of the 
registers are read i.e.

diff --git a/hw/misc/mos6522.c b/hw/misc/mos6522.c
index 3c33cbebde..9884d7e178 100644
--- a/hw/misc/mos6522.c
+++ b/hw/misc/mos6522.c
@@ -229,16 +229,7 @@ uint64_t mos6522_read(void *opaque, hwaddr addr, unsigned size)
  {
      MOS6522State *s = opaque;
      uint32_t val;
-    int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);

-    if (now >= s->timers[0].next_irq_time) {
-        mos6522_timer1_update(s, &s->timers[0], now);
-        s->ifr |= T1_INT;
-    }
-    if (now >= s->timers[1].next_irq_time) {
-        mos6522_timer2_update(s, &s->timers[1], now);
-        s->ifr |= T2_INT;
-    }
      switch (addr) {
      case VIA_REG_B:
          val = s->b;

If I apply this then I see a hang in about roughly 1 in 10 boots. Poking it with a 
debugger shows that the VIA1 timer interrupts are constantly being fired as IER and 
IFR have the timer 1 bit set, but it is being filtered out by SR being set to 0x2700 
on the CPU.

The existing code above isn't correct since T1_INT should only be asserted by the 
expiry of the timer 1 via mos6522_timer1(), so I'm wondering if this is tickling a 
CPU bug somewhere? In what circumstances could interrupts be disabled via SR but not 
enabled again?


ATB,

Mark.
Finn Thain Sept. 11, 2021, 12:08 a.m. UTC | #12
On Fri, 10 Sep 2021, Mark Cave-Ayland wrote:

> On 01/09/2021 09:06, Mark Cave-Ayland wrote:
> 
> > I'll have a go at some basic timer measurements using your patch to 
> > see what sort of numbers I get for the latency here. Obviously QEMU 
> > doesn't guarantee response times but over 20ms does seem high.
> 
> I was able to spend some time today looking at this and came up with 
> https://github.com/mcayland/qemu/tree/via-hacks with the aim of starting 
> with your patches to reduce the calls to the timer update functions (to 
> reduce jitter) and fixing up the places where mos6522_update_irq() isn't 
> called.
> 

What kind of guest was that? What impact does jitter have on that guest? 
Was the jitter measurement increased, decreased or unchanged by this patch 
series?

> That seemed okay, but the part I'm struggling with at the moment is 
> removing the re-assertion of the timer interrupt if the timer has 
> expired when any of the registers are read i.e.
> 
> diff --git a/hw/misc/mos6522.c b/hw/misc/mos6522.c
> index 3c33cbebde..9884d7e178 100644
> --- a/hw/misc/mos6522.c
> +++ b/hw/misc/mos6522.c
> @@ -229,16 +229,7 @@ uint64_t mos6522_read(void *opaque, hwaddr addr, unsigned
> size)
>  {
>      MOS6522State *s = opaque;
>      uint32_t val;
> -    int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
> 
> -    if (now >= s->timers[0].next_irq_time) {
> -        mos6522_timer1_update(s, &s->timers[0], now);
> -        s->ifr |= T1_INT;
> -    }
> -    if (now >= s->timers[1].next_irq_time) {
> -        mos6522_timer2_update(s, &s->timers[1], now);
> -        s->ifr |= T2_INT;
> -    }
>      switch (addr) {
>      case VIA_REG_B:
>          val = s->b;
> 
> If I apply this then I see a hang in about roughly 1 in 10 boots. Poking 
> it with a debugger shows that the VIA1 timer interrupts are constantly 
> being fired as IER and IFR have the timer 1 bit set, but it is being 
> filtered out by SR being set to 0x2700 on the CPU.
> 
> The existing code above isn't correct since T1_INT should only be 
> asserted by the expiry of the timer 1 via mos6522_timer1(), so I'm 
> wondering if this is tickling a CPU bug somewhere? In what circumstances 
> could interrupts be disabled via SR but not enabled again?
> 

The code you're patching here was part of Laurent's commit cd8843ff25 
("mos6522: fix T1 and T2 timers"). You've mentioned that elsewhere in this 
thread. My response is the same as before: this patch series removes that 
code so it's moot.

Please test the patch series I sent, unmodified. If you find a problem 
with my code, please do report it here. I believe that you will see no 
hangs at all.
diff mbox

Patch

diff --git a/arch/m68k/mac/via.c b/arch/m68k/mac/via.c
--- a/arch/m68k/mac/via.c
+++ b/arch/m68k/mac/via.c
@@ -606,6 +606,8 @@  void __init via_init_clock(void)
 	clocksource_register_hz(&mac_clk, VIA_CLOCK_FREQ);
 }
 
+static u32 prev_ticks;
+
 static u64 mac_read_clk(struct clocksource *cs)
 {
 	unsigned long flags;
@@ -631,6 +633,8 @@  static u64 mac_read_clk(struct clocksource *cs)
 	count = count_high << 8;
 	ticks = VIA_TIMER_CYCLES - count;
 	ticks += clk_offset + clk_total;
+if (ticks < prev_ticks) pr_warn("%s: %u < %u\n", __func__, ticks, prev_ticks);
+prev_ticks = ticks;
 	local_irq_restore(flags);
 
 	return ticks;

This problem can be partly blamed on a 6522 design limitation, which is 
that the timer counter has no overflow register. Hence, if a timer 
counter wraps around and the kernel is late to handle the subsequent 
interrupt, the kernel can't account for any missed ticks.

On a real Quadra, the kernel mitigates this limitation by minimizing 
interrupt latency. But on QEMU, interrupt latency is unbounded. This 
can't be mitigated by the guest kernel at all and leads to clock drift. 
This can be observed by patching QEMU like so:

diff --git a/hw/misc/mos6522.c b/hw/misc/mos6522.c
--- a/hw/misc/mos6522.c
+++ b/hw/misc/mos6522.c
@@ -379,6 +379,12 @@  void mos6522_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
         s->pcr = val;
         break;
     case VIA_REG_IFR:
+        if (val & T1_INT) {
+            static int64_t last_t1_int_cleared;
+            int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+            if (now - last_t1_int_cleared > 20000000) printf("\t%s: t1 int clear is late\n", __func__);
+            last_t1_int_cleared = now;
+        }
         /* reset bits */
         s->ifr &= ~val;
         mos6522_update_irq(s);