diff mbox

[1/3] arm_arch_timer: introduce arch_timer_stolen_ticks

Message ID 1367436460-10183-1-git-send-email-stefano.stabellini@eu.citrix.com (mailing list archive)
State New, archived
Headers show

Commit Message

Stefano Stabellini May 1, 2013, 7:27 p.m. UTC
Introduce a function, called arch_timer_stolen_ticks, called from the
arch_timer interrupt handler to account for stolen ticks.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: linux@arm.linux.org.uk
CC: john.stultz@linaro.org
CC: catalin.marinas@arm.com
CC: marc.zyngier@arm.com
---
 arch/arm/include/asm/arch_timer.h    |    5 +++++
 drivers/clocksource/arm_arch_timer.c |    6 ++++++
 2 files changed, 11 insertions(+), 0 deletions(-)

Comments

Christopher Covington May 1, 2013, 8:36 p.m. UTC | #1
Hi Stefano,

On 05/01/2013 03:27 PM, Stefano Stabellini wrote:
> Introduce a function, called arch_timer_stolen_ticks, called from the
> arch_timer interrupt handler to account for stolen ticks.

[...]

> diff --git a/arch/arm/include/asm/arch_timer.h b/arch/arm/include/asm/arch_timer.h
> index 7ade91d..30db413 100644
> --- a/arch/arm/include/asm/arch_timer.h
> +++ b/arch/arm/include/asm/arch_timer.h
> @@ -13,6 +13,11 @@
>  int arch_timer_of_register(void);
>  int arch_timer_sched_clock_init(void);
>  
> +/* per-platform function to calculate stolen ticks (clock cycles stolen
> + * to the vcpu by the hypervisor).

Stolen from the vcpu by the hypervisor?

Is the hypervisor adjusting the Virtual Offset Register?

[...]

Thanks,
Christopher
Ian Campbell May 2, 2013, 8:19 a.m. UTC | #2
On Wed, 2013-05-01 at 21:36 +0100, Christopher Covington wrote:
> Hi Stefano,
> 
> On 05/01/2013 03:27 PM, Stefano Stabellini wrote:
> > Introduce a function, called arch_timer_stolen_ticks, called from the
> > arch_timer interrupt handler to account for stolen ticks.
> 
> [...]
> 
> > diff --git a/arch/arm/include/asm/arch_timer.h b/arch/arm/include/asm/arch_timer.h
> > index 7ade91d..30db413 100644
> > --- a/arch/arm/include/asm/arch_timer.h
> > +++ b/arch/arm/include/asm/arch_timer.h
> > @@ -13,6 +13,11 @@
> >  int arch_timer_of_register(void);
> >  int arch_timer_sched_clock_init(void);
> >  
> > +/* per-platform function to calculate stolen ticks (clock cycles stolen
> > + * to the vcpu by the hypervisor).
> 
> Stolen from the vcpu by the hypervisor?

Stolen is time where the VCPU wants to be running bit isn't because the
hypervisor has descheduled it, e.g. because another VCPU is being run.
So yes, Stefano meant "from".

> Is the hypervisor adjusting the Virtual Offset Register?

The virtual offset register is useful when a VCPU is migrated to another
system to account for the differences in physical time on the two hosts
but isn't useful for accounting for stolen time while running on a
single host.

e.g. if a VCPU sets a timer for NOW+5, but 3 are stolen in the middle it
would not make sense (from the guests PoV) for NOW'==NOW+2 at the point
where the timer goes off. Nor does it make sense to require that the
guest actually be running for 5 before injecting the timer because that
would mean real time elapsed time for the timer would be 5+3 in the case
where 3 are stolen.

So the virtual timer should appear to have been running even while time
is being stolen and therefore stolen time needs to be accounted via some
other means.

Ian.
Christopher Covington May 2, 2013, 9:33 p.m. UTC | #3
Hi Ian,

On 05/02/2013 04:19 AM, Ian Campbell wrote:
> On Wed, 2013-05-01 at 21:36 +0100, Christopher Covington wrote:
>> Hi Stefano,
>>
>> On 05/01/2013 03:27 PM, Stefano Stabellini wrote:
>>> Introduce a function, called arch_timer_stolen_ticks, called from the
>>> arch_timer interrupt handler to account for stolen ticks.
>>
>> [...]
>>
>>> diff --git a/arch/arm/include/asm/arch_timer.h b/arch/arm/include/asm/arch_timer.h
>>> index 7ade91d..30db413 100644
>>> --- a/arch/arm/include/asm/arch_timer.h
>>> +++ b/arch/arm/include/asm/arch_timer.h
>>> @@ -13,6 +13,11 @@
>>>  int arch_timer_of_register(void);
>>>  int arch_timer_sched_clock_init(void);
>>>  
>>> +/* per-platform function to calculate stolen ticks (clock cycles stolen
>>> + * to the vcpu by the hypervisor).

[...]

>> Is the hypervisor adjusting the Virtual Offset Register?
> 
> The virtual offset register is useful when a VCPU is migrated to another
> system to account for the differences in physical time on the two hosts
> but isn't useful for accounting for stolen time while running on a
> single host.
> 
> e.g. if a VCPU sets a timer for NOW+5, but 3 are stolen in the middle it
> would not make sense (from the guests PoV) for NOW'==NOW+2 at the point
> where the timer goes off. Nor does it make sense to require that the
> guest actually be running for 5 before injecting the timer because that
> would mean real time elapsed time for the timer would be 5+3 in the case
> where 3 are stolen.

This is a bit of an aside, but I think that hiding time spent at higher
privilege levels can be a quite sensible approach to timekeeping in a
virtualized environment, but I understand that it's not the approach taken
with Xen, and as you pointed out above, adjusting the Virtual Offset Register
by itself isn't enough to implement that approach.

> So the virtual timer should appear to have been running even while time
> is being stolen and therefore stolen time needs to be accounted via some
> other means.

Something that's not currently obvious to me is that given that the stolen
cycle accounting should be done, what makes the architected timer interrupt
handler the ideal place to do it?

Thanks,
Christopher
Stefano Stabellini May 3, 2013, 10:43 a.m. UTC | #4
On Thu, 2 May 2013, Christopher Covington wrote:
> > So the virtual timer should appear to have been running even while time
> > is being stolen and therefore stolen time needs to be accounted via some
> > other means.
> 
> Something that's not currently obvious to me is that given that the stolen
> cycle accounting should be done, what makes the architected timer interrupt
> handler the ideal place to do it?

That is a good question and I would appreciate suggestions to improve
the patch.

Given that Xen x86 and ia64 does stolen time accounting from the timer
interrupt handler:

arch/x86/xen/time.c:xen_timer_interrupt
arch/ia64/kernel/time.c:timer_interrupt

and given that the arch_timer is the only timer used by Xen on ARM and
that it includes a virt_timer that is made on purpose to be used by
virtual machines, I thought that it might be a good place for it.

I also thought that doing it this way, KVM should be able to reuse the
same hook.
Marc Zyngier May 3, 2013, 10:54 a.m. UTC | #5
On 03/05/13 11:43, Stefano Stabellini wrote:
> On Thu, 2 May 2013, Christopher Covington wrote:
>>> So the virtual timer should appear to have been running even while time
>>> is being stolen and therefore stolen time needs to be accounted via some
>>> other means.
>>
>> Something that's not currently obvious to me is that given that the stolen
>> cycle accounting should be done, what makes the architected timer interrupt
>> handler the ideal place to do it?
> 
> That is a good question and I would appreciate suggestions to improve
> the patch.
> 
> Given that Xen x86 and ia64 does stolen time accounting from the timer
> interrupt handler:
> 
> arch/x86/xen/time.c:xen_timer_interrupt
> arch/ia64/kernel/time.c:timer_interrupt
> 
> and given that the arch_timer is the only timer used by Xen on ARM and
> that it includes a virt_timer that is made on purpose to be used by
> virtual machines, I thought that it might be a good place for it.
> 
> I also thought that doing it this way, KVM should be able to reuse the
> same hook.

Indeed. I just need to understand how time stealing works there ;-).

Now, KVM is not necessarily limited to arch_timers, and we've run KVM
using a QEMU-provided timer in the past. Can you think of a more generic
location for this hook? Possibly something that would satisfy the
requirements of other architectures while we're at it?

	M.
Stefano Stabellini May 5, 2013, 4:47 p.m. UTC | #6
On Fri, 3 May 2013, Marc Zyngier wrote:
> On 03/05/13 11:43, Stefano Stabellini wrote:
> > On Thu, 2 May 2013, Christopher Covington wrote:
> >>> So the virtual timer should appear to have been running even while time
> >>> is being stolen and therefore stolen time needs to be accounted via some
> >>> other means.
> >>
> >> Something that's not currently obvious to me is that given that the stolen
> >> cycle accounting should be done, what makes the architected timer interrupt
> >> handler the ideal place to do it?
> > 
> > That is a good question and I would appreciate suggestions to improve
> > the patch.
> > 
> > Given that Xen x86 and ia64 does stolen time accounting from the timer
> > interrupt handler:
> > 
> > arch/x86/xen/time.c:xen_timer_interrupt
> > arch/ia64/kernel/time.c:timer_interrupt
> > 
> > and given that the arch_timer is the only timer used by Xen on ARM and
> > that it includes a virt_timer that is made on purpose to be used by
> > virtual machines, I thought that it might be a good place for it.
> > 
> > I also thought that doing it this way, KVM should be able to reuse the
> > same hook.
> 
> Indeed. I just need to understand how time stealing works there ;-).
> 
> Now, KVM is not necessarily limited to arch_timers, and we've run KVM
> using a QEMU-provided timer in the past. Can you think of a more generic
> location for this hook? Possibly something that would satisfy the
> requirements of other architectures while we're at it?

Probably the best option would be to reuse

kernel/sched/cputime.c:steal_account_process_tick

but that also means introducing CONFIG_PARAVIRT on arm.
I am up for that, what do you think?
Konrad Rzeszutek Wilk May 6, 2013, 2:35 p.m. UTC | #7
> > e.g. if a VCPU sets a timer for NOW+5, but 3 are stolen in the middle it
> > would not make sense (from the guests PoV) for NOW'==NOW+2 at the point
> > where the timer goes off. Nor does it make sense to require that the
> > guest actually be running for 5 before injecting the timer because that
> > would mean real time elapsed time for the timer would be 5+3 in the case
> > where 3 are stolen.
> 
> This is a bit of an aside, but I think that hiding time spent at higher
> privilege levels can be a quite sensible approach to timekeeping in a
> virtualized environment, but I understand that it's not the approach taken
> with Xen, and as you pointed out above, adjusting the Virtual Offset Register
> by itself isn't enough to implement that approach.

This is the approach taken by Xen and KVM. Look in CONFIG_PARAVIRT_CLOCK for
implementation. In the user-space, the entry in 'top' of "stolen" (%st)
is for this exact value.
Christopher Covington May 6, 2013, 2:55 p.m. UTC | #8
On 05/05/2013 12:47 PM, Stefano Stabellini wrote:
> On Fri, 3 May 2013, Marc Zyngier wrote:
>> On 03/05/13 11:43, Stefano Stabellini wrote:
>>> On Thu, 2 May 2013, Christopher Covington wrote:
>>>>> So the virtual timer should appear to have been running even while time
>>>>> is being stolen and therefore stolen time needs to be accounted via some
>>>>> other means.
>>>>
>>>> Something that's not currently obvious to me is that given that the stolen
>>>> cycle accounting should be done, what makes the architected timer interrupt
>>>> handler the ideal place to do it?
>>>
>>> That is a good question and I would appreciate suggestions to improve
>>> the patch.
>>>
>>> Given that Xen x86 and ia64 does stolen time accounting from the timer
>>> interrupt handler:
>>>
>>> arch/x86/xen/time.c:xen_timer_interrupt
>>> arch/ia64/kernel/time.c:timer_interrupt
>>>
>>> and given that the arch_timer is the only timer used by Xen on ARM and
>>> that it includes a virt_timer that is made on purpose to be used by
>>> virtual machines, I thought that it might be a good place for it.
>>>
>>> I also thought that doing it this way, KVM should be able to reuse the
>>> same hook.
>>
>> Indeed. I just need to understand how time stealing works there ;-).
>>
>> Now, KVM is not necessarily limited to arch_timers, and we've run KVM
>> using a QEMU-provided timer in the past. Can you think of a more generic
>> location for this hook? Possibly something that would satisfy the
>> requirements of other architectures while we're at it?
> 
> Probably the best option would be to reuse
> 
> kernel/sched/cputime.c:steal_account_process_tick
> 
> but that also means introducing CONFIG_PARAVIRT on arm.
> I am up for that, what do you think?

That sounds like a good move to me.

Christopher
Christopher Covington May 7, 2013, 4:17 p.m. UTC | #9
Hi Konrad,

On 05/06/2013 10:35 AM, Konrad Rzeszutek Wilk wrote:
>>> e.g. if a VCPU sets a timer for NOW+5, but 3 are stolen in the middle it
>>> would not make sense (from the guests PoV) for NOW'==NOW+2 at the point
>>> where the timer goes off. Nor does it make sense to require that the
>>> guest actually be running for 5 before injecting the timer because that
>>> would mean real time elapsed time for the timer would be 5+3 in the case
>>> where 3 are stolen.
>>
>> This is a bit of an aside, but I think that hiding time spent at higher
>> privilege levels can be a quite sensible approach to timekeeping in a
>> virtualized environment, but I understand that it's not the approach taken
>> with Xen, and as you pointed out above, adjusting the Virtual Offset Register
>> by itself isn't enough to implement that approach.
> 
> This is the approach taken by Xen and KVM. Look in CONFIG_PARAVIRT_CLOCK for
> implementation. In the user-space, the entry in 'top' of "stolen" (%st)
> is for this exact value.

I may have been unclear with my terms, sorry. When I refer to time being
"hidden", I mean that kernel level software (supervisor mode, EL1) cannot
detect the passage of that time at all. I don't know whether this would really
work, but I imagine one might be able to get close with the current
virtualization facilities for ARM.

Am I correct in interpreting that what you're referring to is the deployment
of paravirtualization code that ensures (observable) "stolen" time is factored
into kernel decision-making?

Thanks,
Christopher
Stefano Stabellini May 8, 2013, 11:19 a.m. UTC | #10
On Tue, 7 May 2013, Christopher Covington wrote:
> Hi Konrad,
> 
> On 05/06/2013 10:35 AM, Konrad Rzeszutek Wilk wrote:
> >>> e.g. if a VCPU sets a timer for NOW+5, but 3 are stolen in the middle it
> >>> would not make sense (from the guests PoV) for NOW'==NOW+2 at the point
> >>> where the timer goes off. Nor does it make sense to require that the
> >>> guest actually be running for 5 before injecting the timer because that
> >>> would mean real time elapsed time for the timer would be 5+3 in the case
> >>> where 3 are stolen.
> >>
> >> This is a bit of an aside, but I think that hiding time spent at higher
> >> privilege levels can be a quite sensible approach to timekeeping in a
> >> virtualized environment, but I understand that it's not the approach taken
> >> with Xen, and as you pointed out above, adjusting the Virtual Offset Register
> >> by itself isn't enough to implement that approach.
> > 
> > This is the approach taken by Xen and KVM. Look in CONFIG_PARAVIRT_CLOCK for
> > implementation. In the user-space, the entry in 'top' of "stolen" (%st)
> > is for this exact value.
> 
> I may have been unclear with my terms, sorry. When I refer to time being
> "hidden", I mean that kernel level software (supervisor mode, EL1) cannot
> detect the passage of that time at all. I don't know whether this would really
> work, but I imagine one might be able to get close with the current
> virtualization facilities for ARM.
> 
> Am I correct in interpreting that what you're referring to is the deployment
> of paravirtualization code that ensures (observable) "stolen" time is factored
> into kernel decision-making?

Although it might be possible to hide the real time flow from the VM, it
is inadvisable: what would happen when the VM needs to deal with a real
hardware device? Or just send packets over the network?  This is why it
is much safer and more reliable to expose the stolen ticks to the VM.
Christopher Covington May 8, 2013, 11:48 a.m. UTC | #11
On 05/08/2013 07:19 AM, Stefano Stabellini wrote:
> On Tue, 7 May 2013, Christopher Covington wrote:
>> Hi Konrad,
>>
>> On 05/06/2013 10:35 AM, Konrad Rzeszutek Wilk wrote:
>>>>> e.g. if a VCPU sets a timer for NOW+5, but 3 are stolen in the middle it
>>>>> would not make sense (from the guests PoV) for NOW'==NOW+2 at the point
>>>>> where the timer goes off. Nor does it make sense to require that the
>>>>> guest actually be running for 5 before injecting the timer because that
>>>>> would mean real time elapsed time for the timer would be 5+3 in the case
>>>>> where 3 are stolen.
>>>>
>>>> This is a bit of an aside, but I think that hiding time spent at higher
>>>> privilege levels can be a quite sensible approach to timekeeping in a
>>>> virtualized environment, but I understand that it's not the approach taken
>>>> with Xen, and as you pointed out above, adjusting the Virtual Offset Register
>>>> by itself isn't enough to implement that approach.
>>>
>>> This is the approach taken by Xen and KVM. Look in CONFIG_PARAVIRT_CLOCK for
>>> implementation. In the user-space, the entry in 'top' of "stolen" (%st)
>>> is for this exact value.
>>
>> I may have been unclear with my terms, sorry. When I refer to time being
>> "hidden", I mean that kernel level software (supervisor mode, EL1) cannot
>> detect the passage of that time at all. I don't know whether this would really
>> work, but I imagine one might be able to get close with the current
>> virtualization facilities for ARM.
>>
>> Am I correct in interpreting that what you're referring to is the deployment
>> of paravirtualization code that ensures (observable) "stolen" time is factored
>> into kernel decision-making?
> 
> Although it might be possible to hide the real time flow from the VM, it
> is inadvisable: what would happen when the VM needs to deal with a real
> hardware device? Or just send packets over the network?  This is why it
> is much safer and more reliable to expose the stolen ticks to the VM.

One would probably have to emulate all the hardware. I don't mean to imply
that I think this is useful for everyday virtualization, but I speculate that
such an approach might enable the analysis of kernels infected with
VM-detecting malware or other niche use-cases.

Christopher
diff mbox

Patch

diff --git a/arch/arm/include/asm/arch_timer.h b/arch/arm/include/asm/arch_timer.h
index 7ade91d..30db413 100644
--- a/arch/arm/include/asm/arch_timer.h
+++ b/arch/arm/include/asm/arch_timer.h
@@ -13,6 +13,11 @@ 
 int arch_timer_of_register(void);
 int arch_timer_sched_clock_init(void);
 
+/* per-platform function to calculate stolen ticks (clock cycles stolen
+ * to the vcpu by the hypervisor).
+ */
+extern void (*arch_timer_stolen_ticks)(void);
+
 /*
  * These register accessors are marked inline so the compiler can
  * nicely work out which register we want, and chuck away the rest of
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index d7ad425..d1ddf0b 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -37,6 +37,8 @@  static int arch_timer_ppi[MAX_TIMER_PPI];
 
 static struct clock_event_device __percpu *arch_timer_evt;
 
+void (*arch_timer_stolen_ticks)(void);
+
 static bool arch_timer_use_virtual = true;
 
 /*
@@ -52,6 +54,10 @@  static inline irqreturn_t timer_handler(const int access,
 		ctrl |= ARCH_TIMER_CTRL_IT_MASK;
 		arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl);
 		evt->event_handler(evt);
+
+		if ( arch_timer_stolen_ticks != NULL )
+			arch_timer_stolen_ticks();
+
 		return IRQ_HANDLED;
 	}