Message ID | 1367436460-10183-1-git-send-email-stefano.stabellini@eu.citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Stefano, On 05/01/2013 03:27 PM, Stefano Stabellini wrote: > Introduce a function, called arch_timer_stolen_ticks, called from the > arch_timer interrupt handler to account for stolen ticks. [...] > diff --git a/arch/arm/include/asm/arch_timer.h b/arch/arm/include/asm/arch_timer.h > index 7ade91d..30db413 100644 > --- a/arch/arm/include/asm/arch_timer.h > +++ b/arch/arm/include/asm/arch_timer.h > @@ -13,6 +13,11 @@ > int arch_timer_of_register(void); > int arch_timer_sched_clock_init(void); > > +/* per-platform function to calculate stolen ticks (clock cycles stolen > + * to the vcpu by the hypervisor). Stolen from the vcpu by the hypervisor? Is the hypervisor adjusting the Virtual Offset Register? [...] Thanks, Christopher
On Wed, 2013-05-01 at 21:36 +0100, Christopher Covington wrote: > Hi Stefano, > > On 05/01/2013 03:27 PM, Stefano Stabellini wrote: > > Introduce a function, called arch_timer_stolen_ticks, called from the > > arch_timer interrupt handler to account for stolen ticks. > > [...] > > > diff --git a/arch/arm/include/asm/arch_timer.h b/arch/arm/include/asm/arch_timer.h > > index 7ade91d..30db413 100644 > > --- a/arch/arm/include/asm/arch_timer.h > > +++ b/arch/arm/include/asm/arch_timer.h > > @@ -13,6 +13,11 @@ > > int arch_timer_of_register(void); > > int arch_timer_sched_clock_init(void); > > > > +/* per-platform function to calculate stolen ticks (clock cycles stolen > > + * to the vcpu by the hypervisor). > > Stolen from the vcpu by the hypervisor? Stolen is time where the VCPU wants to be running bit isn't because the hypervisor has descheduled it, e.g. because another VCPU is being run. So yes, Stefano meant "from". > Is the hypervisor adjusting the Virtual Offset Register? The virtual offset register is useful when a VCPU is migrated to another system to account for the differences in physical time on the two hosts but isn't useful for accounting for stolen time while running on a single host. e.g. if a VCPU sets a timer for NOW+5, but 3 are stolen in the middle it would not make sense (from the guests PoV) for NOW'==NOW+2 at the point where the timer goes off. Nor does it make sense to require that the guest actually be running for 5 before injecting the timer because that would mean real time elapsed time for the timer would be 5+3 in the case where 3 are stolen. So the virtual timer should appear to have been running even while time is being stolen and therefore stolen time needs to be accounted via some other means. Ian.
Hi Ian, On 05/02/2013 04:19 AM, Ian Campbell wrote: > On Wed, 2013-05-01 at 21:36 +0100, Christopher Covington wrote: >> Hi Stefano, >> >> On 05/01/2013 03:27 PM, Stefano Stabellini wrote: >>> Introduce a function, called arch_timer_stolen_ticks, called from the >>> arch_timer interrupt handler to account for stolen ticks. >> >> [...] >> >>> diff --git a/arch/arm/include/asm/arch_timer.h b/arch/arm/include/asm/arch_timer.h >>> index 7ade91d..30db413 100644 >>> --- a/arch/arm/include/asm/arch_timer.h >>> +++ b/arch/arm/include/asm/arch_timer.h >>> @@ -13,6 +13,11 @@ >>> int arch_timer_of_register(void); >>> int arch_timer_sched_clock_init(void); >>> >>> +/* per-platform function to calculate stolen ticks (clock cycles stolen >>> + * to the vcpu by the hypervisor). [...] >> Is the hypervisor adjusting the Virtual Offset Register? > > The virtual offset register is useful when a VCPU is migrated to another > system to account for the differences in physical time on the two hosts > but isn't useful for accounting for stolen time while running on a > single host. > > e.g. if a VCPU sets a timer for NOW+5, but 3 are stolen in the middle it > would not make sense (from the guests PoV) for NOW'==NOW+2 at the point > where the timer goes off. Nor does it make sense to require that the > guest actually be running for 5 before injecting the timer because that > would mean real time elapsed time for the timer would be 5+3 in the case > where 3 are stolen. This is a bit of an aside, but I think that hiding time spent at higher privilege levels can be a quite sensible approach to timekeeping in a virtualized environment, but I understand that it's not the approach taken with Xen, and as you pointed out above, adjusting the Virtual Offset Register by itself isn't enough to implement that approach. > So the virtual timer should appear to have been running even while time > is being stolen and therefore stolen time needs to be accounted via some > other means. Something that's not currently obvious to me is that given that the stolen cycle accounting should be done, what makes the architected timer interrupt handler the ideal place to do it? Thanks, Christopher
On Thu, 2 May 2013, Christopher Covington wrote: > > So the virtual timer should appear to have been running even while time > > is being stolen and therefore stolen time needs to be accounted via some > > other means. > > Something that's not currently obvious to me is that given that the stolen > cycle accounting should be done, what makes the architected timer interrupt > handler the ideal place to do it? That is a good question and I would appreciate suggestions to improve the patch. Given that Xen x86 and ia64 does stolen time accounting from the timer interrupt handler: arch/x86/xen/time.c:xen_timer_interrupt arch/ia64/kernel/time.c:timer_interrupt and given that the arch_timer is the only timer used by Xen on ARM and that it includes a virt_timer that is made on purpose to be used by virtual machines, I thought that it might be a good place for it. I also thought that doing it this way, KVM should be able to reuse the same hook.
On 03/05/13 11:43, Stefano Stabellini wrote: > On Thu, 2 May 2013, Christopher Covington wrote: >>> So the virtual timer should appear to have been running even while time >>> is being stolen and therefore stolen time needs to be accounted via some >>> other means. >> >> Something that's not currently obvious to me is that given that the stolen >> cycle accounting should be done, what makes the architected timer interrupt >> handler the ideal place to do it? > > That is a good question and I would appreciate suggestions to improve > the patch. > > Given that Xen x86 and ia64 does stolen time accounting from the timer > interrupt handler: > > arch/x86/xen/time.c:xen_timer_interrupt > arch/ia64/kernel/time.c:timer_interrupt > > and given that the arch_timer is the only timer used by Xen on ARM and > that it includes a virt_timer that is made on purpose to be used by > virtual machines, I thought that it might be a good place for it. > > I also thought that doing it this way, KVM should be able to reuse the > same hook. Indeed. I just need to understand how time stealing works there ;-). Now, KVM is not necessarily limited to arch_timers, and we've run KVM using a QEMU-provided timer in the past. Can you think of a more generic location for this hook? Possibly something that would satisfy the requirements of other architectures while we're at it? M.
On Fri, 3 May 2013, Marc Zyngier wrote: > On 03/05/13 11:43, Stefano Stabellini wrote: > > On Thu, 2 May 2013, Christopher Covington wrote: > >>> So the virtual timer should appear to have been running even while time > >>> is being stolen and therefore stolen time needs to be accounted via some > >>> other means. > >> > >> Something that's not currently obvious to me is that given that the stolen > >> cycle accounting should be done, what makes the architected timer interrupt > >> handler the ideal place to do it? > > > > That is a good question and I would appreciate suggestions to improve > > the patch. > > > > Given that Xen x86 and ia64 does stolen time accounting from the timer > > interrupt handler: > > > > arch/x86/xen/time.c:xen_timer_interrupt > > arch/ia64/kernel/time.c:timer_interrupt > > > > and given that the arch_timer is the only timer used by Xen on ARM and > > that it includes a virt_timer that is made on purpose to be used by > > virtual machines, I thought that it might be a good place for it. > > > > I also thought that doing it this way, KVM should be able to reuse the > > same hook. > > Indeed. I just need to understand how time stealing works there ;-). > > Now, KVM is not necessarily limited to arch_timers, and we've run KVM > using a QEMU-provided timer in the past. Can you think of a more generic > location for this hook? Possibly something that would satisfy the > requirements of other architectures while we're at it? Probably the best option would be to reuse kernel/sched/cputime.c:steal_account_process_tick but that also means introducing CONFIG_PARAVIRT on arm. I am up for that, what do you think?
> > e.g. if a VCPU sets a timer for NOW+5, but 3 are stolen in the middle it > > would not make sense (from the guests PoV) for NOW'==NOW+2 at the point > > where the timer goes off. Nor does it make sense to require that the > > guest actually be running for 5 before injecting the timer because that > > would mean real time elapsed time for the timer would be 5+3 in the case > > where 3 are stolen. > > This is a bit of an aside, but I think that hiding time spent at higher > privilege levels can be a quite sensible approach to timekeeping in a > virtualized environment, but I understand that it's not the approach taken > with Xen, and as you pointed out above, adjusting the Virtual Offset Register > by itself isn't enough to implement that approach. This is the approach taken by Xen and KVM. Look in CONFIG_PARAVIRT_CLOCK for implementation. In the user-space, the entry in 'top' of "stolen" (%st) is for this exact value.
On 05/05/2013 12:47 PM, Stefano Stabellini wrote: > On Fri, 3 May 2013, Marc Zyngier wrote: >> On 03/05/13 11:43, Stefano Stabellini wrote: >>> On Thu, 2 May 2013, Christopher Covington wrote: >>>>> So the virtual timer should appear to have been running even while time >>>>> is being stolen and therefore stolen time needs to be accounted via some >>>>> other means. >>>> >>>> Something that's not currently obvious to me is that given that the stolen >>>> cycle accounting should be done, what makes the architected timer interrupt >>>> handler the ideal place to do it? >>> >>> That is a good question and I would appreciate suggestions to improve >>> the patch. >>> >>> Given that Xen x86 and ia64 does stolen time accounting from the timer >>> interrupt handler: >>> >>> arch/x86/xen/time.c:xen_timer_interrupt >>> arch/ia64/kernel/time.c:timer_interrupt >>> >>> and given that the arch_timer is the only timer used by Xen on ARM and >>> that it includes a virt_timer that is made on purpose to be used by >>> virtual machines, I thought that it might be a good place for it. >>> >>> I also thought that doing it this way, KVM should be able to reuse the >>> same hook. >> >> Indeed. I just need to understand how time stealing works there ;-). >> >> Now, KVM is not necessarily limited to arch_timers, and we've run KVM >> using a QEMU-provided timer in the past. Can you think of a more generic >> location for this hook? Possibly something that would satisfy the >> requirements of other architectures while we're at it? > > Probably the best option would be to reuse > > kernel/sched/cputime.c:steal_account_process_tick > > but that also means introducing CONFIG_PARAVIRT on arm. > I am up for that, what do you think? That sounds like a good move to me. Christopher
Hi Konrad, On 05/06/2013 10:35 AM, Konrad Rzeszutek Wilk wrote: >>> e.g. if a VCPU sets a timer for NOW+5, but 3 are stolen in the middle it >>> would not make sense (from the guests PoV) for NOW'==NOW+2 at the point >>> where the timer goes off. Nor does it make sense to require that the >>> guest actually be running for 5 before injecting the timer because that >>> would mean real time elapsed time for the timer would be 5+3 in the case >>> where 3 are stolen. >> >> This is a bit of an aside, but I think that hiding time spent at higher >> privilege levels can be a quite sensible approach to timekeeping in a >> virtualized environment, but I understand that it's not the approach taken >> with Xen, and as you pointed out above, adjusting the Virtual Offset Register >> by itself isn't enough to implement that approach. > > This is the approach taken by Xen and KVM. Look in CONFIG_PARAVIRT_CLOCK for > implementation. In the user-space, the entry in 'top' of "stolen" (%st) > is for this exact value. I may have been unclear with my terms, sorry. When I refer to time being "hidden", I mean that kernel level software (supervisor mode, EL1) cannot detect the passage of that time at all. I don't know whether this would really work, but I imagine one might be able to get close with the current virtualization facilities for ARM. Am I correct in interpreting that what you're referring to is the deployment of paravirtualization code that ensures (observable) "stolen" time is factored into kernel decision-making? Thanks, Christopher
On Tue, 7 May 2013, Christopher Covington wrote: > Hi Konrad, > > On 05/06/2013 10:35 AM, Konrad Rzeszutek Wilk wrote: > >>> e.g. if a VCPU sets a timer for NOW+5, but 3 are stolen in the middle it > >>> would not make sense (from the guests PoV) for NOW'==NOW+2 at the point > >>> where the timer goes off. Nor does it make sense to require that the > >>> guest actually be running for 5 before injecting the timer because that > >>> would mean real time elapsed time for the timer would be 5+3 in the case > >>> where 3 are stolen. > >> > >> This is a bit of an aside, but I think that hiding time spent at higher > >> privilege levels can be a quite sensible approach to timekeeping in a > >> virtualized environment, but I understand that it's not the approach taken > >> with Xen, and as you pointed out above, adjusting the Virtual Offset Register > >> by itself isn't enough to implement that approach. > > > > This is the approach taken by Xen and KVM. Look in CONFIG_PARAVIRT_CLOCK for > > implementation. In the user-space, the entry in 'top' of "stolen" (%st) > > is for this exact value. > > I may have been unclear with my terms, sorry. When I refer to time being > "hidden", I mean that kernel level software (supervisor mode, EL1) cannot > detect the passage of that time at all. I don't know whether this would really > work, but I imagine one might be able to get close with the current > virtualization facilities for ARM. > > Am I correct in interpreting that what you're referring to is the deployment > of paravirtualization code that ensures (observable) "stolen" time is factored > into kernel decision-making? Although it might be possible to hide the real time flow from the VM, it is inadvisable: what would happen when the VM needs to deal with a real hardware device? Or just send packets over the network? This is why it is much safer and more reliable to expose the stolen ticks to the VM.
On 05/08/2013 07:19 AM, Stefano Stabellini wrote: > On Tue, 7 May 2013, Christopher Covington wrote: >> Hi Konrad, >> >> On 05/06/2013 10:35 AM, Konrad Rzeszutek Wilk wrote: >>>>> e.g. if a VCPU sets a timer for NOW+5, but 3 are stolen in the middle it >>>>> would not make sense (from the guests PoV) for NOW'==NOW+2 at the point >>>>> where the timer goes off. Nor does it make sense to require that the >>>>> guest actually be running for 5 before injecting the timer because that >>>>> would mean real time elapsed time for the timer would be 5+3 in the case >>>>> where 3 are stolen. >>>> >>>> This is a bit of an aside, but I think that hiding time spent at higher >>>> privilege levels can be a quite sensible approach to timekeeping in a >>>> virtualized environment, but I understand that it's not the approach taken >>>> with Xen, and as you pointed out above, adjusting the Virtual Offset Register >>>> by itself isn't enough to implement that approach. >>> >>> This is the approach taken by Xen and KVM. Look in CONFIG_PARAVIRT_CLOCK for >>> implementation. In the user-space, the entry in 'top' of "stolen" (%st) >>> is for this exact value. >> >> I may have been unclear with my terms, sorry. When I refer to time being >> "hidden", I mean that kernel level software (supervisor mode, EL1) cannot >> detect the passage of that time at all. I don't know whether this would really >> work, but I imagine one might be able to get close with the current >> virtualization facilities for ARM. >> >> Am I correct in interpreting that what you're referring to is the deployment >> of paravirtualization code that ensures (observable) "stolen" time is factored >> into kernel decision-making? > > Although it might be possible to hide the real time flow from the VM, it > is inadvisable: what would happen when the VM needs to deal with a real > hardware device? Or just send packets over the network? This is why it > is much safer and more reliable to expose the stolen ticks to the VM. One would probably have to emulate all the hardware. I don't mean to imply that I think this is useful for everyday virtualization, but I speculate that such an approach might enable the analysis of kernels infected with VM-detecting malware or other niche use-cases. Christopher
diff --git a/arch/arm/include/asm/arch_timer.h b/arch/arm/include/asm/arch_timer.h index 7ade91d..30db413 100644 --- a/arch/arm/include/asm/arch_timer.h +++ b/arch/arm/include/asm/arch_timer.h @@ -13,6 +13,11 @@ int arch_timer_of_register(void); int arch_timer_sched_clock_init(void); +/* per-platform function to calculate stolen ticks (clock cycles stolen + * to the vcpu by the hypervisor). + */ +extern void (*arch_timer_stolen_ticks)(void); + /* * These register accessors are marked inline so the compiler can * nicely work out which register we want, and chuck away the rest of diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c index d7ad425..d1ddf0b 100644 --- a/drivers/clocksource/arm_arch_timer.c +++ b/drivers/clocksource/arm_arch_timer.c @@ -37,6 +37,8 @@ static int arch_timer_ppi[MAX_TIMER_PPI]; static struct clock_event_device __percpu *arch_timer_evt; +void (*arch_timer_stolen_ticks)(void); + static bool arch_timer_use_virtual = true; /* @@ -52,6 +54,10 @@ static inline irqreturn_t timer_handler(const int access, ctrl |= ARCH_TIMER_CTRL_IT_MASK; arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl); evt->event_handler(evt); + + if ( arch_timer_stolen_ticks != NULL ) + arch_timer_stolen_ticks(); + return IRQ_HANDLED; }
Introduce a function, called arch_timer_stolen_ticks, called from the arch_timer interrupt handler to account for stolen ticks. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> CC: linux@arm.linux.org.uk CC: john.stultz@linaro.org CC: catalin.marinas@arm.com CC: marc.zyngier@arm.com --- arch/arm/include/asm/arch_timer.h | 5 +++++ drivers/clocksource/arm_arch_timer.c | 6 ++++++ 2 files changed, 11 insertions(+), 0 deletions(-)