Message ID | alpine.DEB.2.10.1702091603240.20549@sstabellini-ThinkPad-X260 (mailing list archive)
---|---
State | New, archived
On Thu, 2017-02-09 at 16:54 -0800, Stefano Stabellini wrote:
> Hi all,
>
Hi,

> I have run some IRQ latency measurements on Xen on ARM on a Xilinx
> ZynqMP board (four Cortex A53 cores, GICv2).
>
> Dom0 has 1 vcpu pinned to cpu0, DomU has 1 vcpu pinned to cpu2.
> Dom0 is Ubuntu. DomU is an ad-hoc baremetal app to measure interrupt
> latency: https://github.com/edgarigl/tbm
>
Right, interesting use case. I'm glad to see there's some interest in
it, and am happy to help investigating, and trying to make things
better.

> I modified the app to use the phys_timer instead of the virt_timer.
> You can build it with:
>
>   make CFG=configs/xen-guest-irq-latency.cfg
>
Ok, do you (or anyone) mind explaining in a little bit more detail
what the app tries to measure and how it does that?

As a matter of fact, I'm quite familiar with the scenario (I've spent
a lot of time playing with cyclictest,
https://rt.wiki.kernel.org/index.php/Cyclictest ), but I don't
immediately understand the way the timer is programmed, what is
supposed to be in the various variables/registers, what 'freq'
actually is, etc.

> These are the results, in nanosec:
>
>                          AVG     MIN     MAX     WARM MAX
> NODEBUG no WFI           1890    1800    3170    2070
> NODEBUG WFI              4850    4810    7030    4980
> NODEBUG no WFI credit2   2217    2090    3420    2650
> NODEBUG WFI credit2      8080    7890    10320   8300
>
> DEBUG no WFI             2252    2080    3320    2650
> DEBUG WFI                6500    6140    8520    8130
> DEBUG WFI, credit2       8050    7870    10680   8450
>
> DEBUG means Xen DEBUG build.
>
Mmm, and Credit2 (with WFI) behaves almost the same (and even a bit
better in some cases) with debug enabled, while in Credit1, debug yes
or no makes quite a difference, AFAICT, especially in the WFI case.

That looks a bit strange, as I'd have expected the effect to be
similar (there's actually quite a bit of debug checks in Credit2,
maybe even more than in Credit1).

> WARM MAX is the maximum latency, taking out the first few interrupts
> to warm the caches.
> WFI is the ARM and ARM64 sleeping instruction, trapped and emulated
> by Xen by calling vcpu_block.
>
> As you can see, depending on whether the guest issues a WFI or not
> while waiting for interrupts, the results change significantly.
> Interestingly, credit2 does worse than credit1 in this area.
>
This is with current staging, right?

If yes, in Credit1, on ARM you never stop the scheduler tick, like we
do on x86. This means the system is, in general, "more awake" than
under Credit2, which does not have a periodic tick (and, FWIW, also
"more awake" than Credit1 on x86, as far as the scheduler is
concerned, at least).

Whether or not this impacts your measurements significantly, I don't
know, as it depends on a bunch of factors. What we know is that this
has enough impact to trigger the RCU bug Julien discovered (in a
different scenario, I know), so I would not rule it out.

I can try sending a quick patch for disabling the tick when a CPU is
idle, but I'd need your help in testing it.

> Trying to figure out where those 3000-4000ns of difference between
> the WFI and non-WFI cases come from, I wrote a patch to zero the
> latency introduced by xen/arch/arm/domain.c:schedule_tail. That saves
> about 1000ns. There are no other arch-specific context switch
> functions worth optimizing.
>
Yeah. It would be interesting to see a trace, but we still don't have
that for ARM. :-(

> We are down to 2000-3000ns. Then, I started investigating the
> scheduler. I measured how long it takes to run "vcpu_unblock":
> 1050ns, which is significant.
>
How did you measure, if I can ask?
> I don't know what is causing the remaining 1000-2000ns, but I bet on
> another scheduler function. Do you have any suggestions on which one?
>
Well, when a vcpu is woken up, it is put in a runqueue, and a pCPU is
poked to go get and run it. The other thing you may want to try to
measure is how much time passes between when the vCPU becomes runnable
and is added to the runqueue, and when it is actually put to run.

Again, this would be visible in tracing. :-/

> Assuming that the problem is indeed the scheduler, one workaround
> that we could introduce today would be to avoid calling vcpu_unblock
> on guest WFI and call vcpu_yield instead. This change makes things
> significantly better:
>
>                                       AVG     MIN     MAX     WARM MAX
> DEBUG WFI (yield, no block)           2900    2190    5130    5130
> DEBUG WFI (yield, no block) credit2   3514    2280    6180    5430
>
> Is that a reasonable change to make? Would it cause significantly
> more power consumption in Xen (because
> xen/arch/arm/domain.c:idle_loop might not be called anymore)?
>
Exactly. So, I think that, as Linux has 'idle=poll', it is conceivable
to have something similar in Xen, and if we do, I guess it can be
implemented as you suggest.

But, no, I don't think this is satisfying as a default, not before
trying to figure out what is going on, and whether we can improve
things in other ways.

> If we wanted to zero the difference between the WFI and non-WFI
> cases, would we need a new scheduler? A simple "noop scheduler" that
> statically assigns vcpus to pcpus, one by one, until they run out,
> then returns an error?
>
Well, writing such a scheduler would at least be useful as a
reference. As in, the latency that you measure on it is the minimum
possible latency the scheduler is responsible for, and we can compare
that with what we get from 'regular' schedulers.

As a matter of fact, it may also turn out useful for a couple of other
issues/reasons, so I may indeed give this a go. But it would not be
much more useful than that, IMO.

> Or do we need more extensive modifications to xen/common/schedule.c?
> Any other ideas?
>
Not yet. :-/

Regards,
Dario
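[The exchange above revolves around how Xen emulates a guest's WFI: trap it, then either block the vCPU (today's behaviour) or merely yield (the measured workaround). For illustration only, a minimal stand-alone C sketch of that choice; vcpu_block()/vcpu_yield() here are stubs standing in for the real Xen primitives named in the thread, and wfi_yield is a hypothetical switch, not an existing Xen variable:]

    #include <stdbool.h>
    #include <stdio.h>

    /*
     * Stand-alone sketch of the trade-off discussed in this thread: on a
     * trapped guest WFI, Xen currently blocks the vCPU (vcpu_block), and
     * the proposed workaround yields instead (vcpu_yield). The two
     * functions below are stubs standing in for the real Xen primitives.
     */
    static bool wfi_yield; /* hypothetical switch for the proposed mode */

    static void vcpu_block(void) { puts("block: vCPU sleeps, idle loop runs"); }
    static void vcpu_yield(void) { puts("yield: vCPU stays runnable"); }

    static void emulate_wfi(void)
    {
        if ( wfi_yield )
            vcpu_yield(); /* shorter wake-up path, but no idle_loop */
        else
            vcpu_block(); /* default: wake-up goes through vcpu_unblock */
    }

    int main(void)
    {
        emulate_wfi();      /* current behaviour */
        wfi_yield = true;
        emulate_wfi();      /* the measured "yield, no block" variant */
        return 0;
    }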
On Fri, 10 Feb 2017, Dario Faggioli wrote:
> On Thu, 2017-02-09 at 16:54 -0800, Stefano Stabellini wrote:
> > Hi all,
> >
> Hi,
>
> > I have run some IRQ latency measurements on Xen on ARM on a Xilinx
> > ZynqMP board (four Cortex A53 cores, GICv2).
> >
> > Dom0 has 1 vcpu pinned to cpu0, DomU has 1 vcpu pinned to cpu2.
> > Dom0 is Ubuntu. DomU is an ad-hoc baremetal app to measure interrupt
> > latency: https://github.com/edgarigl/tbm
> >
> Right, interesting use case. I'm glad to see there's some interest in
> it, and am happy to help investigating, and trying to make things
> better.

Thank you!

> > I modified the app to use the phys_timer instead of the virt_timer.
> > You can build it with:
> >
> >   make CFG=configs/xen-guest-irq-latency.cfg
> >
> Ok, do you (or anyone) mind explaining in a little bit more detail
> what the app tries to measure and how it does that?

Have a look at app/xen/guest_irq_latency/apu.c:

https://github.com/edgarigl/tbm/blob/master/app/xen/guest_irq_latency/apu.c

This is my version which uses the phys_timer (instead of the
virt_timer):

https://github.com/sstabellini/tbm/blob/phys-timer/app/xen/guest_irq_latency/apu.c

Edgar can jump in to add more info if needed (he is the author of the
app), but as you can see from the code, the app is very simple. It
sets a timer event in the future, then, after receiving the event, it
checks the current time and compares it with the deadline.

> As a matter of fact, I'm quite familiar with the scenario (I've spent
> a lot of time playing with cyclictest,
> https://rt.wiki.kernel.org/index.php/Cyclictest ), but I don't
> immediately understand the way the timer is programmed, what is
> supposed to be in the various variables/registers, what 'freq'
> actually is, etc.

The timer is programmed by writing the compare value to the cntp_cval
system register, see a64_write_timer_cval. The counter is read by
reading the cntpct system register, see
arch-aarch64/aarch64-excp.c:aarch64_irq. freq is the frequency of the
timer (which is lower than the cpu frequency). freq_k is the
multiplication factor to convert timer counter numbers into nanosec;
on my platform it's 10.

If you want more info on the timer, have a look at "Generic Timer" in
the ARM Architecture Reference Manual.

> > These are the results, in nanosec:
> >
> >                          AVG     MIN     MAX     WARM MAX
> > NODEBUG no WFI           1890    1800    3170    2070
> > NODEBUG WFI              4850    4810    7030    4980
> > NODEBUG no WFI credit2   2217    2090    3420    2650
> > NODEBUG WFI credit2      8080    7890    10320   8300
> >
> > DEBUG no WFI             2252    2080    3320    2650
> > DEBUG WFI                6500    6140    8520    8130
> > DEBUG WFI, credit2       8050    7870    10680   8450
> >
> > DEBUG means Xen DEBUG build.
> >
> Mmm, and Credit2 (with WFI) behaves almost the same (and even a bit
> better in some cases) with debug enabled, while in Credit1, debug yes
> or no makes quite a difference, AFAICT, especially in the WFI case.
>
> That looks a bit strange, as I'd have expected the effect to be
> similar (there's actually quite a bit of debug checks in Credit2,
> maybe even more than in Credit1).
>
> > WARM MAX is the maximum latency, taking out the first few
> > interrupts to warm the caches.
> > WFI is the ARM and ARM64 sleeping instruction, trapped and emulated
> > by Xen by calling vcpu_block.
> >
> > As you can see, depending on whether the guest issues a WFI or not
> > while waiting for interrupts, the results change significantly.
> > Interestingly, credit2 does worse than credit1 in this area.
> >
> This is with current staging, right?

That's right.
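[To make the conversion Stefano describes above concrete: with a ~100 MHz generic timer, one counter tick is ~10 ns, which is where freq_k = 10 comes from (the attached logs later show CNTFRQ=99998999 Hz, k=10). A small self-contained sketch; the two counter readings are made up for the example:]

    #include <stdint.h>
    #include <stdio.h>

    /*
     * Sketch of the conversion described above. freq is what CNTFRQ
     * reports; freq_k is the nanoseconds-per-tick factor applied to
     * deltas of cntpct readings.
     */
    int main(void)
    {
        uint64_t freq = 99998999;                 /* timer ticks per second */
        uint64_t freq_k = 1000000000ULL / freq;   /* ~10 ns per tick here */
        uint64_t before = 1000, after = 1189;     /* example cntpct reads */

        printf("freq_k=%llu ns/tick, latency=%llu ns\n",
               (unsigned long long)freq_k,
               (unsigned long long)((after - before) * freq_k));
        return 0;
    }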
> If yes, in Credit1, on ARM you never stop the scheduler tick, like we
> do on x86. This means the system is, in general, "more awake" than
> under Credit2, which does not have a periodic tick (and, FWIW, also
> "more awake" than Credit1 on x86, as far as the scheduler is
> concerned, at least).
>
> Whether or not this impacts your measurements significantly, I don't
> know, as it depends on a bunch of factors. What we know is that this
> has enough impact to trigger the RCU bug Julien discovered (in a
> different scenario, I know), so I would not rule it out.
>
> I can try sending a quick patch for disabling the tick when a CPU is
> idle, but I'd need your help in testing it.

That might be useful; however, if I understand this right, we don't
actually want a periodic timer in Xen just to make the system more
responsive, do we?

> > Trying to figure out where those 3000-4000ns of difference between
> > the WFI and non-WFI cases come from, I wrote a patch to zero the
> > latency introduced by xen/arch/arm/domain.c:schedule_tail. That
> > saves about 1000ns. There are no other arch-specific context switch
> > functions worth optimizing.
> >
> Yeah. It would be interesting to see a trace, but we still don't have
> that for ARM. :-(

Indeed.

> > We are down to 2000-3000ns. Then, I started investigating the
> > scheduler. I measured how long it takes to run "vcpu_unblock":
> > 1050ns, which is significant.
> >
> How did you measure, if I can ask?

Simple. I added a timer counter read before and after the function
call:

    uint64_t n1 = 0, n2 = 0;

    n1 = READ_SYSREG64(CNTPCT_EL0);
    function_call_to_measure();
    n2 = READ_SYSREG64(CNTPCT_EL0);
    printk("DEBUG %s %d ns=%lu\n", __func__, __LINE__, (n2 - n1) * 10);

Where 10 is the calculated freq_k for the platform I have.

> > I don't know what is causing the remaining 1000-2000ns, but I bet
> > on another scheduler function. Do you have any suggestions on which
> > one?
> >
> Well, when a vcpu is woken up, it is put in a runqueue, and a pCPU is
> poked to go get and run it. The other thing you may want to try to
> measure is how much time passes between when the vCPU becomes
> runnable and is added to the runqueue, and when it is actually put to
> run.
>
> Again, this would be visible in tracing. :-/

I could do that if you tell me where to add the two
'READ_SYSREG64(CNTPCT_EL0)'.

> > Assuming that the problem is indeed the scheduler, one workaround
> > that we could introduce today would be to avoid calling
> > vcpu_unblock on guest WFI and call vcpu_yield instead. This change
> > makes things significantly better:
> >
> >                                       AVG     MIN     MAX     WARM MAX
> > DEBUG WFI (yield, no block)           2900    2190    5130    5130
> > DEBUG WFI (yield, no block) credit2   3514    2280    6180    5430
> >
> > Is that a reasonable change to make? Would it cause significantly
> > more power consumption in Xen (because
> > xen/arch/arm/domain.c:idle_loop might not be called anymore)?
> >
> Exactly. So, I think that, as Linux has 'idle=poll', it is conceivable
> to have something similar in Xen, and if we do, I guess it can be
> implemented as you suggest.
>
> But, no, I don't think this is satisfying as a default, not before
> trying to figure out what is going on, and whether we can improve
> things in other ways.

OK. Should I write a patch for that? I guess it would be arm specific
initially. What do you think would be a good name for the option?

> > If we wanted to zero the difference between the WFI and non-WFI
> > cases, would we need a new scheduler?
A simple "noop scheduler" that > > statically > > assigns vcpus to pcpus, one by one, until they run out, then return > > error? > > > Well, writing such a scheduler would at least be useful as reference. > As in, the latency that you measure on it, is the minimum possible > latency the scheduler is responsible for, and we can compare that with > what we get from 'regular' schedulers. > > As a matter of fact, it may also turn out useful for a couple of other > issues/reason, so I may indeed give this a go. Thank you! If you write it, I'll help you test it on ARM64 :-) > But it would not be much more useful than that, IMO. Why? Actually I know of several potential users of Xen on ARM interested exactly in this use-case. They only have a statically defined number of guests with a total amount of vcpu lower or equal to the number of pcpu in the system. Wouldn't a scheduler like that help in this scenario?
On Fri, 2017-02-10 at 10:32 -0800, Stefano Stabellini wrote:
> On Fri, 10 Feb 2017, Dario Faggioli wrote:
> > Right, interesting use case. I'm glad to see there's some interest
> > in it, and am happy to help investigating, and trying to make
> > things better.
>
> Thank you!
>
Hey, FYI, I am looking into this. It's just that I've got a couple of
other things on my plate right now.

> > Ok, do you (or anyone) mind explaining in a little bit more detail
> > what the app tries to measure and how it does that?
>
> Have a look at app/xen/guest_irq_latency/apu.c:
>
> https://github.com/edgarigl/tbm/blob/master/app/xen/guest_irq_latency/apu.c
>
> This is my version which uses the phys_timer (instead of the
> virt_timer):
>
> https://github.com/sstabellini/tbm/blob/phys-timer/app/xen/guest_irq_latency/apu.c
>
Yep, I did look at those.

> Edgar can jump in to add more info if needed (he is the author of the
> app), but as you can see from the code, the app is very simple. It
> sets a timer event in the future, then, after receiving the event, it
> checks the current time and compares it with the deadline.
>
Right, and you check the current time with:

    now = aarch64_irq_get_stamp(el);

which I guess is compatible with the values you use for the counter.

> > > These are the results, in nanosec:
> > >
> > >                          AVG     MIN     MAX     WARM MAX
> > > NODEBUG no WFI           1890    1800    3170    2070
> > > NODEBUG WFI              4850    4810    7030    4980
> > > NODEBUG no WFI credit2   2217    2090    3420    2650
> > > NODEBUG WFI credit2     8080    7890    10320   8300
> > >
> > > DEBUG no WFI             2252    2080    3320    2650
> > > DEBUG WFI                6500    6140    8520    8130
> > > DEBUG WFI, credit2       8050    7870    10680   8450
> > >
> > > DEBUG means Xen DEBUG build.
> > >
[...]
> > > As you can see, depending on whether the guest issues a WFI or
> > > not while waiting for interrupts, the results change
> > > significantly. Interestingly, credit2 does worse than credit1 in
> > > this area.
> > >
> > This is with current staging, right?
>
> That's right.
>
So, when you have the chance, can I see the output of

    xl debug-key r
    xl dmesg

Both under Credit1 and Credit2?

> > I can try sending a quick patch for disabling the tick when a CPU
> > is idle, but I'd need your help in testing it.
>
> That might be useful; however, if I understand this right, we don't
> actually want a periodic timer in Xen just to make the system more
> responsive, do we?
>
IMO, no. I'd call that a hack, and don't think we should go that
route.

Not until we have figured out and squeezed as much as possible all the
other sources of latency, and that has proven not to be enough, at
least.

I'll send the patch.

> > > Assuming that the problem is indeed the scheduler, one workaround
> > > that we could introduce today would be to avoid calling
> > > vcpu_unblock on guest WFI and call vcpu_yield instead. This
> > > change makes things significantly better:
> > >
> > >                                       AVG     MIN     MAX     WARM MAX
> > > DEBUG WFI (yield, no block)           2900    2190    5130    5130
> > > DEBUG WFI (yield, no block) credit2   3514    2280    6180    5430
> > >
> > > Is that a reasonable change to make? Would it cause significantly
> > > more power consumption in Xen (because
> > > xen/arch/arm/domain.c:idle_loop might not be called anymore)?
> > >
> > Exactly. So, I think that, as Linux has 'idle=poll', it is
> > conceivable to have something similar in Xen, and if we do, I guess
> > it can be implemented as you suggest.
> >
> > But, no, I don't think this is satisfying as a default, not before
> > trying to figure out what is going on, and whether we can improve
> > things in other ways.
>
> OK. Should I write a patch for that? I guess it would be arm specific
> initially. What do you think would be a good name for the option?
>
Well, I think such an option may be useful on other arches too, but we
better measure/verify that before. Therefore, I'd be ok for this to be
only implemented on ARM for now.

As per the name, I actually like the 'idle=', and as values, what
about 'sleep' or 'block' for the current default, and stick to 'poll'
for the new behavior you'll implement? Or do you think it is at risk
of confusion with Linux?

An alternative would be something like 'wfi=[sleep,idle]', or
'wfi=[block,poll]', but that is ARM specific, and it'd mean we will
need another option for making x86 behave similarly.

> > But it would not be much more useful than that, IMO.
>
> Why? Actually I know of several potential users of Xen on ARM
> interested exactly in this use-case. They only have a statically
> defined number of guests with a total number of vcpus lower than or
> equal to the number of pcpus in the system. Wouldn't a scheduler like
> that help in this scenario?
>
What I'm saying is that it would be rather inflexible. In the sense
that it won't be possible to have statically pinned and dynamically
moving vcpus in the same guest, it would be hard to control what vcpu
is statically assigned to what pcpu, making a domain statically
assigned would mean moving it to another cpupool (which is the only
way to use a different scheduler, right now, in Xen), and things like
this.

I know there are static use cases... But I'm not entirely sure how
static they really are, and whether they, in the end, will really like
such a degree of inflexibility.

But anyway, indeed I can give you a scheduler that, provided it lives
in a cpupool with M pcpus, as soon as a new domain with n vcpus
(n <= M) is moved inside the pool, statically assigns each of its
vcpus v0, v1, ..., vn-1 to a pcpu, and always sticks with that. And
we'll see what will happen! :-)

Regards,
Dario
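[Dario's closing paragraph describes the assignment rule precisely: in a pool with M pcpus, each incoming vcpu is bound to a free pcpu and stays there. A toy stand-alone sketch of just that rule; the real thing would be a struct scheduler implementation inside Xen, which is not modelled here:]

    #include <stdio.h>

    /*
     * Toy model of the static assignment rule described above: each new
     * vcpu takes the first free pcpu, permanently; once the pcpus run
     * out, placement fails.
     */
    #define M 4                      /* pcpus in the cpupool */

    static int owner[M];             /* 0 = free, otherwise vcpu id + 1 */

    static int assign_vcpu(int vcpu)
    {
        for ( int p = 0; p < M; p++ )
        {
            if ( !owner[p] )
            {
                owner[p] = vcpu + 1;
                return p;            /* vcpu runs on pcpu p, forever */
            }
        }
        return -1;                   /* no free pcpu left */
    }

    int main(void)
    {
        for ( int v = 0; v < 6; v++ )
            printf("vcpu%d -> pcpu %d\n", v, assign_vcpu(v));
        return 0;
    }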
On Thu, 16 Feb 2017, Dario Faggioli wrote:
> On Fri, 2017-02-10 at 10:32 -0800, Stefano Stabellini wrote:
> > On Fri, 10 Feb 2017, Dario Faggioli wrote:
> > > Right, interesting use case. I'm glad to see there's some
> > > interest in it, and am happy to help investigating, and trying to
> > > make things better.
> >
> > Thank you!
> >
> Hey, FYI, I am looking into this. It's just that I've got a couple of
> other things on my plate right now.

OK

> > > Ok, do you (or anyone) mind explaining in a little bit more
> > > detail what the app tries to measure and how it does that?
> >
> > Have a look at app/xen/guest_irq_latency/apu.c:
> >
> > https://github.com/edgarigl/tbm/blob/master/app/xen/guest_irq_latency/apu.c
> >
> > This is my version which uses the phys_timer (instead of the
> > virt_timer):
> >
> > https://github.com/sstabellini/tbm/blob/phys-timer/app/xen/guest_irq_latency/apu.c
> >
> Yep, I did look at those.
>
> > Edgar can jump in to add more info if needed (he is the author of
> > the app), but as you can see from the code, the app is very simple.
> > It sets a timer event in the future, then, after receiving the
> > event, it checks the current time and compares it with the
> > deadline.
> >
> Right, and you check the current time with:
>
>     now = aarch64_irq_get_stamp(el);
>
> which I guess is compatible with the values you use for the counter.

Yes

> > > > These are the results, in nanosec:
> > > >
> > > >                          AVG     MIN     MAX     WARM MAX
> > > > NODEBUG no WFI           1890    1800    3170    2070
> > > > NODEBUG WFI              4850    4810    7030    4980
> > > > NODEBUG no WFI credit2   2217    2090    3420    2650
> > > > NODEBUG WFI credit2      8080    7890    10320   8300
> > > >
> > > > DEBUG no WFI             2252    2080    3320    2650
> > > > DEBUG WFI                6500    6140    8520    8130
> > > > DEBUG WFI, credit2       8050    7870    10680   8450
> > > >
> > > > DEBUG means Xen DEBUG build.
> > > >
> [...]
> > > > As you can see, depending on whether the guest issues a WFI or
> > > > not while waiting for interrupts, the results change
> > > > significantly. Interestingly, credit2 does worse than credit1
> > > > in this area.
> > > >
> > > This is with current staging, right?
> >
> > That's right.
> >
> So, when you have the chance, can I see the output of
>
>     xl debug-key r
>     xl dmesg
>
> Both under Credit1 and Credit2?

I'll see what I can do.

> > > I can try sending a quick patch for disabling the tick when a CPU
> > > is idle, but I'd need your help in testing it.
> >
> > That might be useful; however, if I understand this right, we don't
> > actually want a periodic timer in Xen just to make the system more
> > responsive, do we?
> >
> IMO, no. I'd call that a hack, and don't think we should go that
> route.
>
> Not until we have figured out and squeezed as much as possible all
> the other sources of latency, and that has proven not to be enough,
> at least.
>
> I'll send the patch.
>
> > > > Assuming that the problem is indeed the scheduler, one
> > > > workaround that we could introduce today would be to avoid
> > > > calling vcpu_unblock on guest WFI and call vcpu_yield instead.
> > > > This change makes things significantly better:
> > > >
> > > >                                       AVG     MIN     MAX     WARM MAX
> > > > DEBUG WFI (yield, no block)           2900    2190    5130    5130
> > > > DEBUG WFI (yield, no block) credit2   3514    2280    6180    5430
> > > >
> > > > Is that a reasonable change to make?
> > > > Would it cause significantly more power consumption in Xen
> > > > (because xen/arch/arm/domain.c:idle_loop might not be called
> > > > anymore)?
> > > >
> > > Exactly. So, I think that, as Linux has 'idle=poll', it is
> > > conceivable to have something similar in Xen, and if we do, I
> > > guess it can be implemented as you suggest.
> > >
> > > But, no, I don't think this is satisfying as a default, not
> > > before trying to figure out what is going on, and whether we can
> > > improve things in other ways.
> >
> > OK. Should I write a patch for that? I guess it would be arm
> > specific initially. What do you think would be a good name for the
> > option?
> >
> Well, I think such an option may be useful on other arches too, but
> we better measure/verify that before. Therefore, I'd be ok for this
> to be only implemented on ARM for now.
>
> As per the name, I actually like the 'idle=', and as values, what
> about 'sleep' or 'block' for the current default, and stick to 'poll'
> for the new behavior you'll implement? Or do you think it is at risk
> of confusion with Linux?
>
> An alternative would be something like 'wfi=[sleep,idle]', or
> 'wfi=[block,poll]', but that is ARM specific, and it'd mean we will
> need another option for making x86 behave similarly.

That's a good idea. vwfi=[sleep,idle] looks like the right thing to
introduce, given that the option would be ARM only at the moment and
that it's the virtual wfi, not the physical wfi, behavior that we are
changing.

> > > But it would not be much more useful than that, IMO.
> >
> > Why? Actually I know of several potential users of Xen on ARM
> > interested exactly in this use-case. They only have a statically
> > defined number of guests with a total number of vcpus lower than or
> > equal to the number of pcpus in the system. Wouldn't a scheduler
> > like that help in this scenario?
> >
> What I'm saying is that it would be rather inflexible. In the sense
> that it won't be possible to have statically pinned and dynamically
> moving vcpus in the same guest, it would be hard to control what vcpu
> is statically assigned to what pcpu, making a domain statically
> assigned would mean moving it to another cpupool (which is the only
> way to use a different scheduler, right now, in Xen), and things like
> this.
>
> I know there are static use cases... But I'm not entirely sure how
> static they really are, and whether they, in the end, will really
> like such a degree of inflexibility.

They are _very_ static :-) Think about the board on a mechanical robot
or a drone. VMs are only created at host boot and never again. In fact
we are planning to introduce a feature in Xen to be able to create a
few VMs directly from the hypervisor, to skip the tools in Dom0 for
these cases.

> But anyway, indeed I can give you a scheduler that, provided it lives
> in a cpupool with M pcpus, as soon as a new domain with n vcpus
> (n <= M) is moved inside the pool, statically assigns each of its
> vcpus v0, v1, ..., vn-1 to a pcpu, and always sticks with that. And
> we'll see what will happen! :-)

I am looking forward to it.
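[The thread settles on 'vwfi=[sleep,idle]'. A stand-alone sketch of how such an option could be parsed and acted upon; in Xen this would be wired through the hypervisor's command-line parameter machinery, which is not modelled here, and the value names simply follow the proposal above:]

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /*
     * Sketch of parsing a 'vwfi=[sleep,idle]' option: "sleep" keeps
     * today's vcpu_block() behaviour on guest WFI, "idle" switches the
     * emulation to vcpu_yield().
     */
    static bool vwfi_idle; /* false = "sleep", the default */

    static void parse_vwfi(const char *s)
    {
        if ( !strcmp(s, "idle") )
            vwfi_idle = true;
        else if ( strcmp(s, "sleep") )
            fprintf(stderr, "vwfi: unknown value '%s', using 'sleep'\n", s);
    }

    int main(void)
    {
        parse_vwfi("idle");
        printf("guest WFI -> %s\n", vwfi_idle ? "vcpu_yield()" : "vcpu_block()");
        return 0;
    }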
On Thu, 16 Feb 2017, Stefano Stabellini wrote:
> > > > >                          AVG     MIN     MAX     WARM MAX
> > > > > NODEBUG no WFI           1890    1800    3170    2070
> > > > > NODEBUG WFI              4850    4810    7030    4980
> > > > > NODEBUG no WFI credit2   2217    2090    3420    2650
> > > > > NODEBUG WFI credit2      8080    7890    10320   8300
> > > > >
> > > > > DEBUG no WFI             2252    2080    3320    2650
> > > > > DEBUG WFI                6500    6140    8520    8130
> > > > > DEBUG WFI, credit2       8050    7870    10680   8450
> > > > >
> > > > > DEBUG means Xen DEBUG build.
> > > > >
> > [...]
> > > > > As you can see, depending on whether the guest issues a WFI
> > > > > or not while waiting for interrupts, the results change
> > > > > significantly. Interestingly, credit2 does worse than credit1
> > > > > in this area.
> > > > >
> > > > This is with current staging, right?
> > >
> > > That's right.
> > >
> > So, when you have the chance, can I see the output of
> >
> >     xl debug-key r
> >     xl dmesg
> >
> > Both under Credit1 and Credit2?
>
> I'll see what I can do.

See attached.

(XEN) Xen version 4.9-unstable (sstabellini@) (aarch64-linux-gnu-gcc (Linaro GCC 2014.05) 4.9.1 20140422 (prerelease)) debug=y Thu Feb 16 14:39:21 PST 2017
(XEN) Latest ChangeSet: Thu Feb 16 14:39:16 2017 -0800 git:4e0ef4d
(XEN) Processor: 410fd034: "ARM Limited", variant: 0x0, part 0xd03, rev 0x4
(XEN) 64-bit Execution:
(XEN) Processor Features: 0000000000002222 0000000000000000
(XEN) Exception Levels: EL3:64+32 EL2:64+32 EL1:64+32 EL0:64+32
(XEN) Extensions: FloatingPoint AdvancedSIMD
(XEN) Debug Features: 0000000010305106 0000000000000000
(XEN) Auxiliary Features: 0000000000000000 0000000000000000
(XEN) Memory Model Features: 0000000000001122 0000000000000000
(XEN) ISA Features: 0000000000011120 0000000000000000
(XEN) 32-bit Execution:
(XEN) Processor Features: 00000131:00011011
(XEN) Instruction Sets: AArch32 A32 Thumb Thumb-2 Jazelle
(XEN) Extensions: GenericTimer Security
(XEN) Debug Features: 03010066
(XEN) Auxiliary Features: 00000000
(XEN) Memory Model Features: 10201105 40000000 01260000 02102211
(XEN) ISA Features: 02101110 13112111 21232042 01112131 00011142 00011121
(XEN) Using PSCI-1.0 for SMP bringup
(XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27 Freq: 99998 KHz
(XEN) GICv2 initialization:
(XEN) gic_dist_addr=00000000f9010000
(XEN) gic_cpu_addr=00000000f9020000
(XEN) gic_hyp_addr=00000000f9040000
(XEN) gic_vcpu_addr=00000000f9060000
(XEN) gic_maintenance_irq=25
(XEN) GICv2: Adjusting CPU interface base to 0xf902f000
(XEN) GICv2: 192 lines, 4 cpus, secure (IID 0200143b).
(XEN) GICv2: WARNING: CPU0: Failed to configure IRQ26 as Edge-triggered. H/w forces to Level-triggered.
(XEN) GICv2: WARNING: CPU0: Failed to configure IRQ27 as Edge-triggered. H/w forces to Level-triggered.
(XEN) GICv2: WARNING: CPU0: Failed to configure IRQ30 as Edge-triggered. H/w forces to Level-triggered.
(XEN) Could not find scheduler: credit1
(XEN) Using 'SMP Credit Scheduler' (credit)
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Allocated console ring of 32 KiB.
(XEN) Bringing up CPU1
(XEN) GICv2: WARNING: CPU1: Failed to configure IRQ26 as Edge-triggered. H/w forces to Level-triggered.
(XEN) GICv2: WARNING: CPU1: Failed to configure IRQ27 as Edge-triggered. H/w forces to Level-triggered.
(XEN) GICv2: WARNING: CPU1: Failed to configure IRQ30 as Edge-triggered. H/w forces to Level-triggered.
(XEN) CPU 1 booted.
(XEN) Bringing up CPU2
(XEN) GICv2: WARNING: CPU2: Failed to configure IRQ26 as Edge-triggered. H/w forces to Level-triggered.
(XEN) GICv2: WARNING: CPU2: Failed to configure IRQ27 as Edge-triggered. H/w forces to Level-triggered.
(XEN) GICv2: WARNING: CPU2: Failed to configure IRQ30 as Edge-triggered. H/w forces to Level-triggered.
(XEN) CPU 2 booted.
(XEN) Bringing up CPU3
(XEN) GICv2: WARNING: CPU3: Failed to configure IRQ26 as Edge-triggered. H/w forces to Level-triggered.
(XEN) GICv2: WARNING: CPU3: Failed to configure IRQ27 as Edge-triggered. H/w forces to Level-triggered.
(XEN) GICv2: WARNING: CPU3: Failed to configure IRQ30 as Edge-triggered. H/w forces to Level-triggered.
(XEN) CPU 3 booted.
(XEN) Brought up 4 CPUs
(XEN) P2M: 40-bit IPA with 40-bit PA and 8-bit VMID
(XEN) P2M: 3 levels with order-1 root, VTCR 0x80023558
(XEN) smmu: /amba/smmu@fd800000: probing hardware configuration...
(XEN) smmu: /amba/smmu@fd800000: SMMUv2 with:
(XEN) smmu: /amba/smmu@fd800000: stage 2 translation
(XEN) smmu: /amba/smmu@fd800000: stream matching with 48 register groups, mask 0x7fff
(XEN) smmu: /amba/smmu@fd800000: 16 context banks (0 stage-2 only)
(XEN) smmu: /amba/smmu@fd800000: Stage-2: 40-bit IPA -> 48-bit PA
(XEN) smmu: /amba/smmu@fd800000: registered 26 master devices
(XEN) I/O virtualisation enabled
(XEN) - Dom0 mode: Relaxed
(XEN) Interrupt remapping enabled
(XEN) build-id: 3162bd9a1aaa9749ea1c5a484a7062f60ea73550
(XEN) alternatives: Patching with alt table 00000000400f3c38 -> 00000000400f3fec
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Loading kernel from boot module @ 0000000000080000
(XEN) Loading ramdisk from boot module @ 00000000060af000
(XEN) Allocating 1:1 mappings totalling 1024MB for dom0:
(XEN) BANK[0] 0x00000040000000-0x00000080000000 (1024MB)
(XEN) Grant table range: 0x0000087fe00000-0x0000087fe61000
(XEN) smmu: /amba/smmu@fd800000: d0: p2maddr 0x000000087ff68000
(XEN) Loading zImage from 0000000000080000 to 0000000040080000-0000000043180000
(XEN) Loading dom0 initrd from 00000000060af000 to 0x0000000048200000-0x000000004a150400
(XEN) Allocating PPI 16 for event channel interrupt
(XEN) Loading dom0 DTB to 0x0000000048000000-0x0000000048006c4a
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 296kB init memory.
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 4.6.0-xilinx (xbrbbot@xcosswbld01) (gcc version 5.2.1 20151005 (Linaro GCC 5.2-2015.11-2) ) #1 SMP Thu Dec 15 18:01:07 MST 2016
[ 0.000000] Boot CPU: AArch64 Processor [410fd034]
[ 0.000000] earlycon: xen0 at I/O port 0x0 (options '')
[ 0.000000] bootconsole [xen0] enabled
[ 0.000000] efi: Getting EFI parameters from FDT:
[ 0.000000] efi: UEFI not found.
[ 0.000000] cma: Reserved 128 MiB at 0x0000000078000000
[ 0.000000] psci: probing for conduit method from DT.
[ 0.000000] psci: PSCIv0.2 detected in firmware.
[ 0.000000] psci: Using standard PSCI v0.2 function IDs
[ 0.000000] psci: Trusted OS migration not required
[ 0.000000] Xen 4.9 support found
[ 0.000000] percpu: Embedded 20 pages/cpu @ffffffc037f73000 s42240 r8192 d31488 u81920
[ 0.000000] Detected VIPT I-cache on CPU0
[ 0.000000] CPU features: enabling workaround for ARM erratum 845719
[ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 258560
[ 0.000000] Kernel command line: console=hvc0 earlycon=xen earlyprintk=xen maxcpus=1
[ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.000000] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
[ 0.000000] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
[ 0.000000] software IO TLB [mem 0x72e00000-0x76e00000] (64MB) mapped at [ffffffc032e00000-ffffffc036dfffff]
[ 0.000000] Memory: 790568K/1048576K available (8136K kernel code, 561K rwdata, 2860K rodata, 460K init, 374K bss, 126936K reserved, 131072K cma-reserved)
[ 0.000000] Virtual kernel memory layout:
[ 0.000000] modules : 0xffffff8000000000 - 0xffffff8008000000 ( 128 MB)
[ 0.000000] vmalloc : 0xffffff8008000000 - 0xffffffbdffff0000 ( 247 GB)
[ 0.000000] .text : 0xffffff8008080000 - 0xffffff8008870000 ( 8128 KB)
[ 0.000000] .rodata : 0xffffff8008870000 - 0xffffff8008b3f000 ( 2876 KB)
[ 0.000000] .init : 0xffffff8008b3f000 - 0xffffff8008bb2000 ( 460 KB)
[ 0.000000] .data : 0xffffff8008bb2000 - 0xffffff8008c3e400 ( 561 KB)
[ 0.000000] vmemmap : 0xffffffbe00000000 - 0xffffffbfc0000000 ( 7 GB maximum)
[ 0.000000] 0xffffffbe00000000 - 0xffffffbe00e00000 ( 14 MB actual)
[ 0.000000] fixed : 0xffffffbffe7fd000 - 0xffffffbffec00000 ( 4108 KB)
[ 0.000000] PCI I/O : 0xffffffbffee00000 - 0xffffffbfffe00000 ( 16 MB)
[ 0.000000] memory : 0xffffffc000000000 - 0xffffffc040000000 ( 1024 MB)
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] Build-time adjustment of leaf fanout to 64.
[ 0.000000] RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=4.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=4
[ 0.000000] NR_IRQS:64 nr_irqs:64 0
(XEN) d0v0: vGICD: unhandled word write 0xffffffff to ICACTIVER4
(XEN) d0v0: vGICD: unhandled word write 0xffffffff to ICACTIVER8
(XEN) d0v0: vGICD: unhandled word write 0xffffffff to ICACTIVER12
(XEN) d0v0: vGICD: unhandled word write 0xffffffff to ICACTIVER16
(XEN) d0v0: vGICD: unhandled word write 0xffffffff to ICACTIVER20
(XEN) d0v0: vGICD: unhandled word write 0xffffffff to ICACTIVER0
[ 0.000000] Architected cp15 timer(s) running at 99.99MHz (virt).
[ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171015c6fb, max_idle_ns: 440795203665 ns
[ 0.000004] sched_clock: 56 bits at 99MHz, resolution 10ns, wraps every 4398046511101ns
[ 0.008279] Console: colour dummy device 80x25
[ 0.012382] console [hvc0] enabled
[ 0.012382] console [hvc0] enabled
[ 0.019204] bootconsole [xen0] disabled
[ 0.019204] bootconsole [xen0] disabled
[ 0.026913] Calibrating delay loop (skipped), value calculated using timer frequency.. 199.99 BogoMIPS (lpj=399995)
[ 0.037362] pid_max: default: 32768 minimum: 301
[ 0.042141] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.048788] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.056539] ASID allocator initialised with 65536 entries
[ 0.061530] xen:grant_table: Grant tables using version 1 layout
[ 0.067499] Grant table initialized
[ 0.071037] xen:events: Using FIFO-based ABI
[ 0.075349] Xen: initializing cpu0
[ 0.079399] zynqmp_plat_init: pm node not found
[ 0.083456] EFI services will not be available.
[ 0.088061] Brought up 1 CPUs
[ 0.090992] SMP: Total of 1 processors activated.
[ 0.095763] CPU: All CPU(s) started at EL1
[ 0.099923] alternatives: patching kernel code
[ 0.104878] devtmpfs: initialized
[ 0.111860] DMI not present or invalid.
[ 0.111979] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[ 0.124334] xor: measuring software checksum speed
[ 0.166394] 8regs : 2177.000 MB/sec
[ 0.206457] 8regs_prefetch: 1934.000 MB/sec
[ 0.246520] 32regs : 2412.000 MB/sec
[ 0.286585] 32regs_prefetch: 2178.000 MB/sec
[ 0.286626] xor: using function: 32regs (2412.000 MB/sec)
[ 0.292173] NET: Registered protocol family 16
[ 0.296514] vdso: 2 pages (1 code @ ffffff8008876000, 1 data @ ffffff8008bb8000)
[ 0.303066] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers.
[ 0.310704] DMA: preallocated 256 KiB pool for atomic allocations
[ 0.316088] xen:swiotlb_xen: Warning: only able to allocate 4 MB for software IO TLB
[ 0.324491] software IO TLB [mem 0x72000000-0x72400000] (4MB) mapped at [ffffffc032000000-ffffffc0323fffff]
[ 0.334332] ARM CCI_400_r1 PMU driver probed
[ 0.337722] /amba/dma@fd500000: could not find phandle
[ 0.343190] /amba/dma@fd510000: could not find phandle
[ 0.348368] /amba/dma@fd520000: could not find phandle
[ 0.353562] /amba/dma@fd530000: could not find phandle
[ 0.358746] /amba/dma@fd540000: could not find phandle
[ 0.363946] /amba/dma@fd550000: could not find phandle
[ 0.369129] /amba/dma@fd560000: could not find phandle
[ 0.374321] /amba/dma@fd570000: could not find phandle
[ 0.379625] /amba/dma@ffa80000: could not find phandle
[ 0.384703] /amba/dma@ffa90000: could not find phandle
[ 0.389886] /amba/dma@ffaa0000: could not find phandle
[ 0.395083] /amba/dma@ffab0000: could not find phandle
[ 0.400270] /amba/dma@ffac0000: could not find phandle
[ 0.405464] /amba/dma@ffad0000: could not find phandle
[ 0.410651] /amba/dma@ffae0000: could not find phandle
[ 0.415844] /amba/dma@ffaf0000: could not find phandle
[ 0.421126] /amba/ethernet@ff0e0000: could not find phandle
[ 0.428928] /amba/spi@ff0f0000: could not find phandle
[ 0.432102] /amba/sdhci@ff160000: could not find phandle
[ 0.437225] /amba/sdhci@ff170000: could not find phandle
[ 0.442638] /amba/usb@fe200000: could not find phandle
[ 0.467070] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[ 0.536192] raid6: int64x1 gen() 409 MB/s
[ 0.604225] raid6: int64x1 xor() 430 MB/s
[ 0.672351] raid6: int64x2 gen() 617 MB/s
[ 0.740502] raid6: int64x2 xor() 559 MB/s
[ 0.808626] raid6: int64x4 gen() 907 MB/s
[ 0.876750] raid6: int64x4 xor() 685 MB/s
[ 0.944895] raid6: int64x8 gen() 1056 MB/s
[ 1.012962] raid6: int64x8 xor() 683 MB/s
[ 1.081131] raid6: neonx1 gen() 672 MB/s
[ 1.149234] raid6: neonx1 xor() 702 MB/s
[ 1.217382] raid6: neonx2 gen() 1044 MB/s
[ 1.285484] raid6: neonx2 xor() 925 MB/s
[ 1.353620] raid6: neonx4 gen() 1385 MB/s
[ 1.421726] raid6: neonx4 xor() 1079 MB/s
[ 1.489843] raid6: neonx8 gen() 1556 MB/s
[ 1.557977] raid6: neonx8 xor() 1131 MB/s
[ 1.558015] raid6: using algorithm neonx8 gen() 1556 MB/s
[ 1.562155] raid6: .... xor() 1131 MB/s, rmw enabled
[ 1.567169] raid6: using intx1 recovery algorithm
[ 1.572162] ACPI: Interpreter disabled.
[ 1.575851] xen:balloon: Initialising balloon driver
[ 1.581157] xen_balloon: Initialising balloon driver
[ 1.586566] SCSI subsystem initialized
[ 1.589843] usbcore: registered new interface driver usbfs
[ 1.595239] usbcore: registered new interface driver hub
[ 1.600594] usbcore: registered new device driver usb
[ 1.605749] media: Linux media interface: v0.10
[ 1.610279] Linux video capture interface: v2.00
[ 1.614952] pps_core: LinuxPPS API ver. 1 registered
[ 1.619939] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[ 1.629124] PTP clock support registered
[ 1.633131] EDAC MC: Ver: 3.0.0
[ 1.637530] dmi: Firmware registration failed.
[ 1.640838] FPGA manager framework
[ 1.644297] Advanced Linux Sound Architecture Driver Initialized.
[ 1.651805] clocksource: Switched to clocksource arch_sys_counter
[ 1.656639] VFS: Disk quotas dquot_6.6.0
[ 1.660555] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 1.667640] pnp: PnP ACPI: disabled
[ 1.677722] NET: Registered protocol family 2
[ 1.678115] TCP established hash table entries: 8192 (order: 4, 65536 bytes)
[ 1.683729] TCP bind hash table entries: 8192 (order: 5, 131072 bytes)
[ 1.690352] TCP: Hash tables configured (established 8192 bind 8192)
[ 1.696668] UDP hash table entries: 512 (order: 2, 16384 bytes)
[ 1.702620] UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
[ 1.709187] NET: Registered protocol family 1
[ 1.714143] RPC: Registered named UNIX socket transport module.
[ 1.719368] RPC: Registered udp transport module.
[ 1.724118] RPC: Registered tcp transport module.
[ 1.728875] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 1.735507] Trying to unpack rootfs image as initramfs...
[ 1.829211] Freeing initrd memory: 32064K (ffffffc008200000 - ffffffc00a150000)
[ 1.831837] futex hash table entries: 1024 (order: 5, 131072 bytes)
[ 1.837381] audit: initializing netlink subsys (disabled)
[ 1.842785] audit: type=2000 audit(1.716:1): initialized
[ 1.849215] workingset: timestamp_bits=60 max_order=18 bucket_order=0
[ 1.855369] NFS: Registering the id_resolver key type
[ 1.859735] Key type id_resolver registered
[ 1.863946] Key type id_legacy registered
[ 1.868024] nfs4filelayout_init: NFSv4 File Layout Driver Registering...
[ 1.874794] jffs2: version 2.2. (NAND) (SUMMARY) © 2001-2006 Red Hat, Inc.
[ 1.890483] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 247)
[ 1.892368] io scheduler noop registered
[ 1.896338] io scheduler deadline registered
[ 1.900671] io scheduler cfq registered (default)
[ 1.905913] nwl-pcie fd0e0000.pcie: Link is DOWN
[ 1.910114] PCI host bridge /amba/pcie@fd0e0000 ranges:
[ 1.915355] No bus range found for /amba/pcie@fd0e0000, using [bus 00-ff]
[ 1.922364] MEM 0xe0000000..0xefffffff -> 0xe0000000
[ 1.927547] MEM 0x600000000..0x7ffffffff -> 0x600000000
[ 1.933079] nwl-pcie fd0e0000.pcie: PCI host bridge to bus 0000:00
[ 1.939225] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 1.944764] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xefffffff]
[ 1.951682] pci_bus 0000:00: root bus resource [mem 0x600000000-0x7ffffffff pref]
(XEN) physdev.c:17:d0v0 PHYSDEVOP cmd=25: not implemented
(XEN) physdev.c:17:d0v0 PHYSDEVOP cmd=15: not implemented
[ 1.969418] pci 0000:00:00.0: Failed to add - passthrough or MSI/MSI-X might fail!
[ 1.977117] pci 0000:00:00.0: PCI bridge to [bus 01-0c]
[ 1.983421] xilinx-dpdma fd4c0000.dma: Xilinx DPDMA engine is probed
[ 1.988983] xilinx-zynqmp-dma fd500000.dma: ZynqMP DMA driver Probe success
[ 1.995839] xilinx-zynqmp-dma fd510000.dma: ZynqMP DMA driver Probe success
[ 2.002833] xilinx-zynqmp-dma fd520000.dma: ZynqMP DMA driver Probe success
[ 2.009840] xilinx-zynqmp-dma fd530000.dma: ZynqMP DMA driver Probe success
[ 2.016839] xilinx-zynqmp-dma fd540000.dma: ZynqMP DMA driver Probe success
[ 2.023862] xilinx-zynqmp-dma fd550000.dma: ZynqMP DMA driver Probe success
[ 2.030855] xilinx-zynqmp-dma fd560000.dma: ZynqMP DMA driver Probe success
[ 2.037859] xilinx-zynqmp-dma fd570000.dma: ZynqMP DMA driver Probe success
[ 2.044870] xilinx-zynqmp-dma ffa80000.dma: ZynqMP DMA driver Probe success
[ 2.051885] xilinx-zynqmp-dma ffa90000.dma: ZynqMP DMA driver Probe success
[ 2.058876] xilinx-zynqmp-dma ffaa0000.dma: ZynqMP DMA driver Probe success
[ 2.065892] xilinx-zynqmp-dma ffab0000.dma: ZynqMP DMA driver Probe success
[ 2.072891] xilinx-zynqmp-dma ffac0000.dma: ZynqMP DMA driver Probe success
[ 2.079906] xilinx-zynqmp-dma ffad0000.dma: ZynqMP DMA driver Probe success
[ 2.086911] xilinx-zynqmp-dma ffae0000.dma: ZynqMP DMA driver Probe success
[ 2.093913] xilinx-zynqmp-dma ffaf0000.dma: ZynqMP DMA driver Probe success
[ 2.102037] xen:xen_evtchn: Event-channel device installed
[ 2.153318] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[ 2.155458] ff010000.serial: ttyPS0 at MMIO 0xff010000 (irq = 215, base_baud = 6250000) is a xuartps
[ 2.164196] [drm] Initialized drm 1.1.0 20060810
[ 2.168225] Unable to detect cache hierarchy from DT for CPU 0
[ 2.178951] brd: module loaded
[ 2.182686] loop: module loaded
[ 2.182731] Invalid max_queues (4), will use default max: 1.
[ 2.186734] ahci-ceva fd0c0000.ahci: AHCI 0001.0301 32 slots 2 ports 6 Gbps 0x3 impl platform mode
[ 2.195012] ahci-ceva fd0c0000.ahci: flags: 64bit ncq sntf pm clo only pmp fbs pio slum part ccc sds apst
[ 2.206047] scsi host0: ahci-ceva
[ 2.208400] scsi host1: ahci-ceva
[ 2.211527] ata1: SATA max UDMA/133 mmio [mem 0xfd0c0000-0xfd0c1fff] port 0x100 irq 212
[ 2.219494] ata2: SATA max UDMA/133 mmio [mem 0xfd0c0000-0xfd0c1fff] port 0x180 irq 212
[ 2.227654] mtdoops: mtd device (mtddev=name/number) must be supplied
[ 2.234416] zynqmp-qspi ff0f0000.spi: rx bus width not found
[ 2.240531] m25p80 spi0.0: found n25q512a, expected m25p80
[ 2.245489] m25p80 spi0.0: n25q512a (131072 Kbytes)
[ 2.250216] 4 ofpart partitions found on MTD device spi0.0
[ 2.255731] Creating 4 MTD partitions on "spi0.0":
[ 2.260573] 0x000000000000-0x000001e00000 : "boot"
[ 2.265964] 0x000001e00000-0x000001e40000 : "bootenv"
[ 2.271076] 0x000001e40000-0x000004240000 : "kernel"
[ 2.276114] 0x000004240000-0x000008000000 : "spare"
[ 2.282518] libphy: Fixed MDIO Bus: probed
[ 2.285069] tun: Universal TUN/TAP device driver, 1.6
[ 2.289714] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
[ 2.299512] CAN device driver interface
[ 2.300696] gpiod_set_value: invalid GPIO
[ 2.304298] libphy: MACB_mii_bus: probed
[ 2.311376] macb ff0e0000.ethernet eth0: Cadence GEM rev 0x50070106 at 0xff0e0000 irq 25 (00:0a:35:00:22:01)
[ 2.317773] TI DP83867 ff0e0000.etherne:0c: attached PHY driver [TI DP83867] (mii_bus:phy_addr=ff0e0000.etherne:0c, irq=-1)
[ 2.329427] xen_netfront: Initialising Xen virtual ethernet driver
[ 2.335218] usbcore: registered new interface driver asix
[ 2.340620] usbcore: registered new interface driver ax88179_178a
[ 2.346749] usbcore: registered new interface driver cdc_ether
[ 2.352629] usbcore: registered new interface driver net1080
[ 2.358339] usbcore: registered new interface driver cdc_subset
[ 2.364306] usbcore: registered new interface driver zaurus
[ 2.369953] usbcore: registered new interface driver cdc_ncm
[ 2.377229] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[ 2.382206] ehci-pci: EHCI PCI platform driver
[ 2.386955] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
[ 2.392245] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 1
[ 2.400870] xhci-hcd xhci-hcd.0.auto: hcc params 0x0238f625 hci version 0x100 quirks 0x00010010
[ 2.408717] xhci-hcd xhci-hcd.0.auto: irq 220, io mem 0xfe200000
[ 2.414936] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
[ 2.421555] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 2.428815] usb usb1: Product: xHCI Host Controller
[ 2.433743] usb usb1: Manufacturer: Linux 4.6.0-xilinx xhci-hcd
[ 2.439707] usb usb1: SerialNumber: xhci-hcd.0.auto
[ 2.444940] hub 1-0:1.0: USB hub found
[ 2.448559] hub 1-0:1.0: 1 port detected
[ 2.452549] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
[ 2.457971] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 2
[ 2.465711] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[ 2.473878] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003
[ 2.480630] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 2.487889] usb usb2: Product: xHCI Host Controller
[ 2.492823] usb usb2: Manufacturer: Linux 4.6.0-xilinx xhci-hcd
[ 2.498787] usb usb2: SerialNumber: xhci-hcd.0.auto
[ 2.503975] hub 2-0:1.0: USB hub found
[ 2.507586] hub 2-0:1.0: 1 port detected
[ 2.511689] usbcore: registered new interface driver uas
[ 2.516936] usbcore: registered new interface driver usb-storage
[ 2.523312] mousedev: PS/2 mouse device common for all mice
[ 2.529083] rtc_zynqmp ffa60000.rtc: rtc core: registered ffa60000.rtc as rtc0
[ 2.535862] i2c /dev entries driver
[ 2.541605] pca953x 0-0020: interrupt support not compiled in
[ 2.546955] pca953x 0-0021: interrupt support not compiled in
[ 2.551014] cdns-i2c ff020000.i2c: 400 kHz mmio ff020000 irq 201
[ 2.557191] ata2: SATA link down (SStatus 0 SControl 330)
[ 2.562484] ata1: SATA link down (SStatus 0 SControl 330)
[ 2.568159] cdns-i2c ff030000.i2c: 400 kHz mmio ff030000 irq 202
[ 2.574911] i2c i2c-0: Added multiplexed i2c bus 2
[ 2.579665] i2c i2c-0: Added multiplexed i2c bus 3
[ 2.584655] i2c i2c-0: Added multiplexed i2c bus 4
[ 2.589068] i2c i2c-0: Added multiplexed i2c bus 5
[ 2.593352] pca954x 0-0075: registered 4 multiplexed busses for I2C mux pca9544
[ 2.601179] at24 6-0054: 1024 byte 24c08 EEPROM, writable, 1 bytes/write
[ 2.607451] i2c i2c-6: Failed to register i2c client 24c08 at 0x55 (-16)
[ 2.614197] i2c i2c-6: of_i2c: Failure registering /amba/i2c@ff030000/i2cswitch@74/i2c@0/eeprom@55
[ 2.623186] i2c i2c-6: Failed to register i2c client 24c08 at 0x56 (-16)
[ 2.629927] i2c i2c-6: of_i2c: Failure registering /amba/i2c@ff030000/i2cswitch@74/i2c@0/eeprom@56
[ 2.638934] i2c i2c-6: Failed to register i2c client 24c08 at 0x57 (-16)
[ 2.645677] i2c i2c-6: of_i2c: Failure registering /amba/i2c@ff030000/i2cswitch@74/i2c@0/eeprom@57
[ 2.654674] i2c i2c-1: Added multiplexed i2c bus 6
[ 2.659968] i2c i2c-1: Added multiplexed i2c bus 7
[ 2.664636] i2c i2c-1: Added multiplexed i2c bus 8
[ 2.669482] i2c i2c-1: Added multiplexed i2c bus 9
[ 2.674317] i2c i2c-1: Added multiplexed i2c bus 10
[ 2.679198] i2c i2c-1: Added multiplexed i2c bus 11
[ 2.684097] i2c i2c-1: Added multiplexed i2c bus 12
[ 2.689037] i2c i2c-1: Added multiplexed i2c bus 13
[ 2.693771] pca954x 1-0074: registered 8 multiplexed busses for I2C switch pca9548
[ 2.701596] i2c i2c-1: Added multiplexed i2c bus 14
[ 2.706487] i2c i2c-1: Added multiplexed i2c bus 15
[ 2.711445] i2c i2c-1: Added multiplexed i2c bus 16
[ 2.716584] i2c i2c-1: Added multiplexed i2c bus 17
[ 2.721445] i2c i2c-1: Added multiplexed i2c bus 18
[ 2.726229] i2c i2c-1: Added multiplexed i2c bus 19
[ 2.731187] i2c i2c-1: Added multiplexed i2c bus 20
[ 2.736091] i2c i2c-1: Added multiplexed i2c bus 21
[ 2.740828] pca954x 1-0075: registered 8 multiplexed busses for I2C switch pca9548
[ 2.750093] ina2xx 2-0040: power monitor ina226 (Rshunt = 5000 uOhm)
[ 2.755179] ina2xx 2-0041: power monitor ina226 (Rshunt = 5000 uOhm)
[ 2.761582] ina2xx 2-0042: power monitor ina226 (Rshunt = 5000 uOhm)
[ 2.768009] ina2xx 2-0043: power monitor ina226 (Rshunt = 5000 uOhm)
[ 2.774384] ina2xx 2-0044: power monitor ina226 (Rshunt = 5000 uOhm)
[ 2.780774] ina2xx 2-0045: power monitor ina226 (Rshunt = 5000 uOhm)
[ 2.787182] ina2xx 2-0046: power monitor ina226 (Rshunt = 5000 uOhm)
[ 2.793590] ina2xx 2-0047: power monitor ina226 (Rshunt = 5000 uOhm)
[ 2.799996] ina2xx 2-004a: power monitor ina226 (Rshunt = 5000 uOhm)
[ 2.806384] ina2xx 2-004b: power monitor ina226 (Rshunt = 5000 uOhm)
[ 2.812869] ina2xx 3-0040: power monitor ina226 (Rshunt = 2000 uOhm)
[ 2.819187] ina2xx 3-0041: power monitor ina226 (Rshunt = 5000 uOhm)
[ 2.825588] ina2xx 3-0042: power monitor ina226 (Rshunt = 5000 uOhm)
[ 2.831994] ina2xx 3-0043: power monitor ina226 (Rshunt = 5000 uOhm)
[ 2.838399] ina2xx 3-0044: power monitor ina226 (Rshunt = 5000 uOhm)
[ 2.844799] ina2xx 3-0045: power monitor ina226 (Rshunt = 5000 uOhm)
[ 2.851191] ina2xx 3-0046: power monitor ina226 (Rshunt = 5000 uOhm)
[ 2.857596] ina2xx 3-0047: power monitor ina226 (Rshunt = 5000 uOhm)
[ 2.892592] EDAC MC: ECC not enabled
[ 2.892881] EDAC DEVICE0: Giving out device to module zynqmp-ocm-edac controller zynqmp_ocm: DEV ff960000.memory-controller (INTERRUPT)
[ 2.902927] sdhci: Secure Digital Host Controller Interface driver
[ 2.909051] sdhci: Copyright(c) Pierre Ossman
[ 2.913467] sdhci-pltfm: SDHCI platform and OF driver helper
[ 2.919334] sdhci-arasan ff160000.sdhci: "xlnx,mio_bank " property is missing.
[ 2.926451] sdhci-arasan: probe of ff160000.sdhci failed with error -22
[ 2.971794] mmc0: SDHCI controller on ff170000.sdhci [ff170000.sdhci] using ADMA 64-bit
[ 2.974397] ledtrig-cpu: registered to indicate activity on CPUs
[ 2.980436] usbcore: registered new interface driver usbhid
[ 2.985915] usbhid: USB HID core driver
[ 3.006852] si570 8-005d: registered, current frequency 300000000 Hz
[ 3.031440] si570 9-005d: registered, current frequency 148500000 Hz
[ 3.032480] fpga_manager fpga0: Xilinx ZynqMp FPGA Manager registered
[ 3.039447] pktgen: Packet Generator for packet performance testing. Version: 2.75
[ 3.046764] Initializing XFRM netlink socket
[ 3.050727] NET: Registered protocol family 10
[ 3.057680] sit: IPv6 over IPv4 tunneling driver
[ 3.060216] NET: Registered protocol family 17
[ 3.064351] NET: Registered protocol family 15
[ 3.068838] can: controller area network core (rev 20120528 abi 9)
[ 3.075093] NET: Registered protocol family 29
[ 3.079572] can: raw protocol (rev 20120528)
[ 3.083890] can: broadcast manager protocol (rev 20120528 t)
[ 3.089600] can: netlink gateway (rev 20130117) max_hops=1
[ 3.095299] 9pnet: Installing 9P2000 support
[ 3.099481] Key type dns_resolver registered
[ 3.106448] registered taskstats version 1
[ 3.108418] Btrfs loaded
[ 3.114808] rtc_zynqmp ffa60000.rtc: setting system clock to 1986-12-22 00:43:37 UTC (535596217)
[ 3.119494] ALSA device list:
[ 3.122394] No soundcards found.
[ 3.129402] Freeing unused kernel memory: 460K (ffffff8008b3f000 - ffffff8008bb2000)
INIT: version 2.88 booting
[ 4.087687] random: dd urandom read with 7 bits of entropy available
Fri Dec 16 01:23:11 UTC 2016
Starting internet superserver: inetd.
INIT: Entering runlevel: 5
Configuring network interfaces... [ 4.186752] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
udhcpc (v1.24.1) started
Sending discover...
[ 6.308108] macb ff0e0000.ethernet eth0: link up (1000/Full)
[ 6.308242] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Sending discover...
Sending select for 10.0.0.191...
Lease of 10.0.0.191 obtained, lease time 604800
/etc/udhcpc.d/50default: Adding DNS 75.75.75.75
/etc/udhcpc.d/50default: Adding DNS 75.75.76.76
done.
Starting OpenBSD Secure Shell server: sshd
generating ssh RSA key...
generating ssh ECDSA key...
generating ssh DSA key...
generating ssh ED25519 key...
done.
Starting syslogd/klogd: done
Starting /usr/sbin/xenstored....
Setting domain 0 name, domid and JSON config...
libxl: error: libxl.c:689:libxl_domain_info: getting domain info list: Permission denied
libxl: error: libxl_dom.c:1127:libxl__userdata_path: unable to find domain info for domain 0: Permission denied
cannot store stub json config for domain 0
Starting xenconsoled...
Starting QEMU as disk backend for dom0
/etc/rc5.d/S80xencommons: line 102: /usr/bin/qemu-system-i386: No such file or directory
Starting domain watchdog daemon: xenwatchdogd startup

PetaLinux 2016.4 plnx_aarch64 /dev/hvc0

plnx_aarch64 login: root
root@plnx_aarch64:~# [ 33.180039] random: nonblocking pool is initialized
(XEN)
(XEN) TBM Jan 25 2017 12:45:38 running on xen-guest_irq_latency
(XEN) MIDR=410fd034
(XEN) D$ L1 32KB linesize=64b ways=4 sets=128
(XEN) I$ L1 32KB linesize=64b ways=2 sets=256
(XEN) D$ L2 1024KB linesize=64b ways=16 sets=1024
(XEN) I$ L2 1024KB linesize=64b ways=16 sets=1024
(XEN) plat_init_a64
(XEN) EL1 I and D caches enabled
(XEN) RDO: enable MMU
(XEN) RDO: Still here? EL1 S1 MMU is on
(XEN) CNTFRQ=99998999 Hz k=10
(XEN) latency (ns): max=8470 warm_max=0 min=8470 avg=8470
(XEN) latency (ns): max=8470 warm_max=0 min=6700 avg=8249
(XEN) latency (ns): max=8470 warm_max=6580 min=6580 avg=8041
(XEN) latency (ns): max=8470 warm_max=6600 min=6580 avg=7861
(XEN) latency (ns): max=8470 warm_max=6600 min=6530 avg=7695
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=7525
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=7392
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=7281
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=7178
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=7102
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=7020
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=6948
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=6886
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=6837
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=6794
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=6769
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=6725
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=6696
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=6663
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=6644
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=6628
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=6616
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=6591
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=6565
(XEN) latency (ns): max=8470 warm_max=6660 min=6330 avg=6576
(XEN) latency (ns): max=8470 warm_max=6660 min=6330 avg=6558
(XEN) latency (ns): max=8470 warm_max=6660 min=6180 avg=6511
(XEN) latency (ns): max=8470 warm_max=6870 min=6180 avg=6555
(XEN) latency (ns): max=8470 warm_max=6870 min=6180 avg=6535
(XEN) latency (ns): max=8470 warm_max=6870 min=6180 avg=6522
(XEN) latency (ns): max=8470 warm_max=6870 min=6180 avg=6507
(XEN) latency (ns): max=8470 warm_max=6870 min=6180 avg=6497
(XEN) latency (ns): max=8470 warm_max=6870 min=6180 avg=6494
(XEN) latency (ns): max=8470 warm_max=6870 min=6180 avg=6484
(XEN) latency (ns): max=8470 warm_max=6870 min=6180 avg=6478
(XEN) latency (ns): max=8470 warm_max=6870 min=6180 avg=6464
(XEN) latency (ns): max=8470 warm_max=6870 min=6180 avg=6461
(XEN) latency (ns): max=8470 warm_max=6870 min=6180 avg=6462
(XEN) latency (ns): max=8470 warm_max=6870 min=6180 avg=6457
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=56019474589
(XEN) Online Cpus: 0-3
(XEN) Cpupool 0:
(XEN) Cpus: 0-3
(XEN) Scheduler: SMP Credit Scheduler (credit)
(XEN) info: (XEN) ncpus = 4 (XEN) master = 0 (XEN) credit = 1200 (XEN) credit balance = 0 (XEN) weight = 256 (XEN) runq_sort = 260 (XEN) default-weight = 256 (XEN) tslice = 30ms (XEN) ratelimit = 1000us (XEN) credits per msec = 10 (XEN) ticks per tslice = 3 (XEN) migration delay = 0us (XEN) idlers: 00000000,00000000,00000000,0000000e (XEN) active vcpus: (XEN) 1: [0.0] pri=0 flags=0 cpu=0 credit=96 [w=256,cap=0] (XEN) CPUs info: (XEN) CPU[00] sort=260, sibling=00000000,00000000,00000000,00000001, core=00000000,00000000,00000000,00000001 (XEN) run: [0.0] pri=0 flags=0 cpu=0 credit=96 [w=256,cap=0] (XEN) 1: [32767.0] pri=-64 flags=0 cpu=0 (XEN) CPU[01] sort=260, sibling=00000000,00000000,00000000,00000002, core=00000000,00000000,00000000,00000002 (XEN) run: [32767.1] pri=-64 flags=0 cpu=1 (XEN) CPU[02] sort=260, sibling=00000000,00000000,00000000,00000004, core=00000000,00000000,00000000,00000004 (XEN) run: [32767.2] pri=-64 flags=0 cpu=2 (XEN) CPU[03] sort=260, sibling=00000000,00000000,00000000,00000008, core=00000000,00000000,00000000,00000008 (XEN) run: [32767.3] pri=-64 flags=0 cpu=3 (XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6629 (XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6596 (XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6579 (XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6570 (XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6562 (XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6555 (XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6534 (XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6505 (XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6480 (XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6473 (XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6459 (XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6448 (XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6447 (XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6438 (XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6427 (XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6432 (XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6423 (XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6425 (XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6414 (XEN) Xen version 4.9-unstable (sstabellini@) (aarch64-linux-gnu-gcc (Linaro GCC 2014.05) 4.9.1 20140422 (prerelease)) debug=y Thu Feb 16 14:55:12 PST 2017 (XEN) Latest ChangeSet: Thu Feb 16 14:39:16 2017 -0800 git:4e0ef4d-dirty (XEN) Processor: 410fd034: "ARM Limited", variant: 0x0, part 0xd03, rev 0x4 (XEN) 64-bit Execution: (XEN) Processor Features: 0000000000002222 0000000000000000 (XEN) Exception Levels: EL3:64+32 EL2:64+32 EL1:64+32 EL0:64+32 (XEN) Extensions: FloatingPoint AdvancedSIMD (XEN) Debug Features: 0000000010305106 0000000000000000 (XEN) Auxiliary Features: 0000000000000000 0000000000000000 (XEN) Memory Model Features: 0000000000001122 0000000000000000 (XEN) ISA Features: 0000000000011120 0000000000000000 (XEN) 32-bit Execution: (XEN) Processor Features: 00000131:00011011 (XEN) Instruction Sets: AArch32 A32 Thumb Thumb-2 Jazelle (XEN) Extensions: GenericTimer Security (XEN) Debug Features: 03010066 (XEN) Auxiliary Features: 00000000 (XEN) Memory Model Features: 10201105 40000000 01260000 02102211 (XEN) ISA Features: 02101110 13112111 21232042 01112131 00011142 00011121 (XEN) Using PSCI-1.0 for SMP bringup (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27 Freq: 99998 KHz (XEN) GICv2 initialization: (XEN) gic_dist_addr=00000000f9010000 (XEN) 
gic_cpu_addr=00000000f9020000 (XEN) gic_hyp_addr=00000000f9040000 (XEN) gic_vcpu_addr=00000000f9060000 (XEN) gic_maintenance_irq=25 (XEN) GICv2: Adjusting CPU interface base to 0xf902f000 (XEN) GICv2: 192 lines, 4 cpus, secure (IID 0200143b). (XEN) GICv2: WARNING: CPU0: Failed to configure IRQ26 as Edge-triggered. H/w forces to Level-triggered. (XEN) GICv2: WARNING: CPU0: Failed to configure IRQ27 as Edge-triggered. H/w forces to Level-triggered. (XEN) GICv2: WARNING: CPU0: Failed to configure IRQ30 as Edge-triggered. H/w forces to Level-triggered. (XEN) Using scheduler: SMP Credit Scheduler rev2 (credit2) (XEN) Initializing Credit2 scheduler (XEN) load_precision_shift: 18 (XEN) load_window_shift: 30 (XEN) underload_balance_tolerance: 0 (XEN) overload_balance_tolerance: -3 (XEN) runqueues arrangement: socket (XEN) load tracking window length 1073741824 ns (XEN) Adding cpu 0 to runqueue 0 (XEN) First cpu on runqueue, activating (XEN) Allocated console ring of 32 KiB. (XEN) Bringing up CPU1 (XEN) GICv2: WARNING: CPU1: Failed to configure IRQ26 as Edge-triggered. H/w forces to Level-triggered. (XEN) GICv2: WARNING: CPU1: Failed to configure IRQ27 as Edge-triggered. H/w forces to Level-triggered. (XEN) GICv2: WARNING: CPU1: Failed to configure IRQ30 as Edge-triggered. H/w forces to Level-triggered. (XEN) Adding cpu 1 to runqueue 0 (XEN) CPU 1 booted. (XEN) Bringing up CPU2 (XEN) GICv2: WARNING: CPU2: Failed to configure IRQ26 as Edge-triggered. H/w forces to Level-triggered. (XEN) GICv2: WARNING: CPU2: Failed to configure IRQ27 as Edge-triggered. H/w forces to Level-triggered. (XEN) GICv2: WARNING: CPU2: Failed to configure IRQ30 as Edge-triggered. H/w forces to Level-triggered. (XEN) Adding cpu 2 to runqueue 0 (XEN) CPU 2 booted. (XEN) Bringing up CPU3 (XEN) GICv2: WARNING: CPU3: Failed to configure IRQ26 as Edge-triggered. H/w forces to Level-triggered. (XEN) GICv2: WARNING: CPU3: Failed to configure IRQ27 as Edge-triggered. H/w forces to Level-triggered. (XEN) GICv2: WARNING: CPU3: Failed to configure IRQ30 as Edge-triggered. H/w forces to Level-triggered. (XEN) Adding cpu 3 to runqueue 0 (XEN) CPU 3 booted. (XEN) Brought up 4 CPUs (XEN) P2M: 40-bit IPA with 40-bit PA and 8-bit VMID (XEN) P2M: 3 levels with order-1 root, VTCR 0x80023558 (XEN) smmu: /amba/smmu@fd800000: probing hardware configuration... 
(XEN) smmu: /amba/smmu@fd800000: SMMUv2 with: (XEN) smmu: /amba/smmu@fd800000: stage 2 translation (XEN) smmu: /amba/smmu@fd800000: stream matching with 48 register groups, mask 0x7fff (XEN) smmu: /amba/smmu@fd800000: 16 context banks (0 stage-2 only) (XEN) smmu: /amba/smmu@fd800000: Stage-2: 40-bit IPA -> 48-bit PA (XEN) smmu: /amba/smmu@fd800000: registered 26 master devices (XEN) I/O virtualisation enabled (XEN) - Dom0 mode: Relaxed (XEN) Interrupt remapping enabled (XEN) build-id: 6f4fcc598664b1af0ae7b2e2f64b6ed3f140ef2f (XEN) alternatives: Patching with alt table 00000000400f3c48 -> 00000000400f3ffc (XEN) *** LOADING DOMAIN 0 *** (XEN) Loading kernel from boot module @ 0000000000080000 (XEN) Loading ramdisk from boot module @ 00000000060af000 (XEN) Allocating 1:1 mappings totalling 1024MB for dom0: (XEN) BANK[0] 0x00000040000000-0x00000080000000 (1024MB) (XEN) Grant table range: 0x0000087fe00000-0x0000087fe61000 (XEN) smmu: /amba/smmu@fd800000: d0: p2maddr 0x000000087ff62000 (XEN) Loading zImage from 0000000000080000 to 0000000040080000-0000000043180000 (XEN) Loading dom0 initrd from 00000000060af000 to 0x0000000048200000-0x000000004a150400 (XEN) Allocating PPI 16 for event channel interrupt (XEN) Loading dom0 DTB to 0x0000000048000000-0x0000000048006c4a (XEN) Std. Loglevel: All (XEN) Guest Loglevel: All (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen) (XEN) Freed 296kB init memory. [ 0.000000] Booting Linux on physical CPU 0x0 [ 0.000000] Linux version 4.6.0-xilinx (xbrbbot@xcosswbld01) (gcc version 5.2.1 20151005 (Linaro GCC 5.2-2015.11-2) ) #1 SMP Thu Dec 15 18:01:07 MST 2016 [ 0.000000] Boot CPU: AArch64 Processor [410fd034] [ 0.000000] earlycon: xen0 at I/O port 0x0 (options '') [ 0.000000] bootconsole [xen0] enabled [ 0.000000] efi: Getting EFI parameters from FDT: [ 0.000000] efi: UEFI not found. [ 0.000000] cma: Reserved 128 MiB at 0x0000000078000000 [ 0.000000] psci: probing for conduit method from DT. [ 0.000000] psci: PSCIv0.2 detected in firmware. [ 0.000000] psci: Using standard PSCI v0.2 function IDs [ 0.000000] psci: Trusted OS migration not required [ 0.000000] Xen 4.9 support found [ 0.000000] percpu: Embedded 20 pages/cpu @ffffffc037f73000 s42240 r8192 d31488 u81920 [ 0.000000] Detected VIPT I-cache on CPU0 [ 0.000000] CPU features: enabling workaround for ARM erratum 845719 [ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. 
Total pages: 258560 [ 0.000000] Kernel command line: console=hvc0 earlycon=xen earlyprintk=xen maxcpus=1 [ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes) [ 0.000000] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes) [ 0.000000] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes) [ 0.000000] software IO TLB [mem 0x72e00000-0x76e00000] (64MB) mapped at [ffffffc032e00000-ffffffc036dfffff] [ 0.000000] Memory: 790568K/1048576K available (8136K kernel code, 561K rwdata, 2860K rodata, 460K init, 374K bss, 126936K reserved, 131072K cma-reserved) [ 0.000000] Virtual kernel memory layout: [ 0.000000] modules : 0xffffff8000000000 - 0xffffff8008000000 ( 128 MB) [ 0.000000] vmalloc : 0xffffff8008000000 - 0xffffffbdffff0000 ( 247 GB) [ 0.000000] .text : 0xffffff8008080000 - 0xffffff8008870000 ( 8128 KB) [ 0.000000] .rodata : 0xffffff8008870000 - 0xffffff8008b3f000 ( 2876 KB) [ 0.000000] .init : 0xffffff8008b3f000 - 0xffffff8008bb2000 ( 460 KB) [ 0.000000] .data : 0xffffff8008bb2000 - 0xffffff8008c3e400 ( 561 KB) [ 0.000000] vmemmap : 0xffffffbe00000000 - 0xffffffbfc0000000 ( 7 GB maximum) [ 0.000000] 0xffffffbe00000000 - 0xffffffbe00e00000 ( 14 MB actual) [ 0.000000] fixed : 0xffffffbffe7fd000 - 0xffffffbffec00000 ( 4108 KB) [ 0.000000] PCI I/O : 0xffffffbffee00000 - 0xffffffbfffe00000 ( 16 MB) [ 0.000000] memory : 0xffffffc000000000 - 0xffffffc040000000 ( 1024 MB) [ 0.000000] Hierarchical RCU implementation. [ 0.000000] Build-time adjustment of leaf fanout to 64. [ 0.000000] RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=4. [ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=4 [ 0.000000] NR_IRQS:64 nr_irqs:64 0 (XEN) d0v0: vGICD: unhandled word write 0xffffffff to ICACTIVER4 (XEN) d0v0: vGICD: unhandled word write 0xffffffff to ICACTIVER8 (XEN) d0v0: vGICD: unhandled word write 0xffffffff to ICACTIVER12 (XEN) d0v0: vGICD: unhandled word write 0xffffffff to ICACTIVER16 (XEN) d0v0: vGICD: unhandled word write 0xffffffff to ICACTIVER20 (XEN) d0v0: vGICD: unhandled word write 0xffffffff to ICACTIVER0 [ 0.000000] Architected cp15 timer(s) running at 99.99MHz (virt). [ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171015c6fb, max_idle_ns: 440795203665 ns [ 0.000004] sched_clock: 56 bits at 99MHz, resolution 10ns, wraps every 4398046511101ns [ 0.008283] Console: colour dummy device 80x25 [ 0.012378] console [hvc0] enabled [ 0.012378] console [hvc0] enabled [ 0.019210] bootconsole [xen0] disabled [ 0.019210] bootconsole [xen0] disabled [ 0.026913] Calibrating delay loop (skipped), value calculated using timer frequency.. 199.99 BogoMIPS (lpj=399995) [ 0.037368] pid_max: default: 32768 minimum: 301 [ 0.042142] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes) [ 0.048789] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes) [ 0.056540] ASID allocator initialised with 65536 entries [ 0.061531] xen:grant_table: Grant tables using version 1 layout [ 0.067507] Grant table initialized [ 0.071038] xen:events: Using FIFO-based ABI [ 0.075349] Xen: initializing cpu0 [ 0.079398] zynqmp_plat_init: pm node not found [ 0.083458] EFI services will not be available. [ 0.088063] Brought up 1 CPUs [ 0.090993] SMP: Total of 1 processors activated. [ 0.095764] CPU: All CPU(s) started at EL1 [ 0.099919] alternatives: patching kernel code [ 0.104879] devtmpfs: initialized [ 0.111828] DMI not present or invalid. 
[ 0.111948] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns [ 0.124339] xor: measuring software checksum speed [ 0.166394] 8regs : 2177.000 MB/sec [ 0.206457] 8regs_prefetch: 1934.000 MB/sec [ 0.246520] 32regs : 2412.000 MB/sec [ 0.286585] 32regs_prefetch: 2178.000 MB/sec [ 0.286626] xor: using function: 32regs (2412.000 MB/sec) [ 0.292168] NET: Registered protocol family 16 [ 0.296515] vdso: 2 pages (1 code @ ffffff8008876000, 1 data @ ffffff8008bb8000) [ 0.303066] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers. [ 0.310705] DMA: preallocated 256 KiB pool for atomic allocations [ 0.316082] xen:swiotlb_xen: Warning: only able to allocate 4 MB for software IO TLB [ 0.324493] software IO TLB [mem 0x72000000-0x72400000] (4MB) mapped at [ffffffc032000000-ffffffc0323fffff] [ 0.334331] ARM CCI_400_r1 PMU driver probed [ 0.337730] /amba/dma@fd500000: could not find phandle [ 0.343186] /amba/dma@fd510000: could not find phandle [ 0.348369] /amba/dma@fd520000: could not find phandle [ 0.353556] /amba/dma@fd530000: could not find phandle [ 0.358747] /amba/dma@fd540000: could not find phandle [ 0.363941] /amba/dma@fd550000: could not find phandle [ 0.369132] /amba/dma@fd560000: could not find phandle [ 0.374317] /amba/dma@fd570000: could not find phandle [ 0.379625] /amba/dma@ffa80000: could not find phandle [ 0.384697] /amba/dma@ffa90000: could not find phandle [ 0.389889] /amba/dma@ffaa0000: could not find phandle [ 0.395077] /amba/dma@ffab0000: could not find phandle [ 0.400272] /amba/dma@ffac0000: could not find phandle [ 0.405459] /amba/dma@ffad0000: could not find phandle [ 0.410651] /amba/dma@ffae0000: could not find phandle [ 0.415840] /amba/dma@ffaf0000: could not find phandle [ 0.421124] /amba/ethernet@ff0e0000: could not find phandle [ 0.428927] /amba/spi@ff0f0000: could not find phandle [ 0.432103] /amba/sdhci@ff160000: could not find phandle [ 0.437227] /amba/sdhci@ff170000: could not find phandle [ 0.442640] /amba/usb@fe200000: could not find phandle [ 0.467039] HugeTLB registered 2 MB page size, pre-allocated 0 pages [ 0.536076] raid6: int64x1 gen() 409 MB/s [ 0.604165] raid6: int64x1 xor() 432 MB/s [ 0.672300] raid6: int64x2 gen() 617 MB/s [ 0.740425] raid6: int64x2 xor() 555 MB/s [ 0.808549] raid6: int64x4 gen() 907 MB/s [ 0.876699] raid6: int64x4 xor() 685 MB/s [ 0.944824] raid6: int64x8 gen() 1056 MB/s [ 1.012954] raid6: int64x8 xor() 683 MB/s [ 1.081060] raid6: neonx1 gen() 672 MB/s [ 1.149185] raid6: neonx1 xor() 702 MB/s [ 1.217308] raid6: neonx2 gen() 1044 MB/s [ 1.285429] raid6: neonx2 xor() 925 MB/s [ 1.353553] raid6: neonx4 gen() 1385 MB/s [ 1.421680] raid6: neonx4 xor() 1079 MB/s [ 1.489788] raid6: neonx8 gen() 1556 MB/s [ 1.557921] raid6: neonx8 xor() 1131 MB/s [ 1.557959] raid6: using algorithm neonx8 gen() 1556 MB/s [ 1.562095] raid6: .... xor() 1131 MB/s, rmw enabled [ 1.567115] raid6: using intx1 recovery algorithm [ 1.572102] ACPI: Interpreter disabled. [ 1.575793] xen:balloon: Initialising balloon driver [ 1.581095] xen_balloon: Initialising balloon driver [ 1.586511] SCSI subsystem initialized [ 1.589786] usbcore: registered new interface driver usbfs [ 1.595180] usbcore: registered new interface driver hub [ 1.600536] usbcore: registered new device driver usb [ 1.605689] media: Linux media interface: v0.10 [ 1.610221] Linux video capture interface: v2.00 [ 1.614892] pps_core: LinuxPPS API ver. 1 registered [ 1.619881] pps_core: Software ver. 
5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it> [ 1.629066] PTP clock support registered [ 1.633072] EDAC MC: Ver: 3.0.0 [ 1.637484] dmi: Firmware registration failed. [ 1.640780] FPGA manager framework [ 1.644238] Advanced Linux Sound Architecture Driver Initialized. [ 1.651754] clocksource: Switched to clocksource arch_sys_counter [ 1.656580] VFS: Disk quotas dquot_6.6.0 [ 1.660496] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes) [ 1.667589] pnp: PnP ACPI: disabled [ 1.677675] NET: Registered protocol family 2 [ 1.678071] TCP established hash table entries: 8192 (order: 4, 65536 bytes) [ 1.683679] TCP bind hash table entries: 8192 (order: 5, 131072 bytes) [ 1.690302] TCP: Hash tables configured (established 8192 bind 8192) [ 1.696611] UDP hash table entries: 512 (order: 2, 16384 bytes) [ 1.702559] UDP-Lite hash table entries: 512 (order: 2, 16384 bytes) [ 1.709133] NET: Registered protocol family 1 [ 1.714098] RPC: Registered named UNIX socket transport module. [ 1.719317] RPC: Registered udp transport module. [ 1.724068] RPC: Registered tcp transport module. [ 1.728824] RPC: Registered tcp NFSv4.1 backchannel transport module. [ 1.735457] Trying to unpack rootfs image as initramfs... [ 1.829214] Freeing initrd memory: 32064K (ffffffc008200000 - ffffffc00a150000) [ 1.831842] futex hash table entries: 1024 (order: 5, 131072 bytes) [ 1.837389] audit: initializing netlink subsys (disabled) [ 1.842787] audit: type=2000 audit(1.716:1): initialized [ 1.849222] workingset: timestamp_bits=60 max_order=18 bucket_order=0 [ 1.855367] NFS: Registering the id_resolver key type [ 1.859755] Key type id_resolver registered [ 1.863948] Key type id_legacy registered [ 1.868030] nfs4filelayout_init: NFSv4 File Layout Driver Registering... [ 1.874796] jffs2: version 2.2. (NAND) (SUMMARY) © 2001-2006 Red Hat, Inc. [ 1.890491] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 247) [ 1.892369] io scheduler noop registered [ 1.896340] io scheduler deadline registered [ 1.900675] io scheduler cfq registered (default) [ 1.905914] nwl-pcie fd0e0000.pcie: Link is DOWN [ 1.910115] PCI host bridge /amba/pcie@fd0e0000 ranges: [ 1.915351] No bus range found for /amba/pcie@fd0e0000, using [bus 00-ff] [ 1.922366] MEM 0xe0000000..0xefffffff -> 0xe0000000 [ 1.927553] MEM 0x600000000..0x7ffffffff -> 0x600000000 [ 1.933080] nwl-pcie fd0e0000.pcie: PCI host bridge to bus 0000:00 [ 1.939227] pci_bus 0000:00: root bus resource [bus 00-ff] [ 1.944760] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xefffffff] [ 1.951683] pci_bus 0000:00: root bus resource [mem 0x600000000-0x7ffffffff pref] (XEN) physdev.c:17:d0v0 PHYSDEVOP cmd=25: not implemented (XEN) physdev.c:17:d0v0 PHYSDEVOP cmd=15: not implemented [ 1.969420] pci 0000:00:00.0: Failed to add - passthrough or MSI/MSI-X might fail! 
[ 1.977127] pci 0000:00:00.0: PCI bridge to [bus 01-0c] [ 1.983420] xilinx-dpdma fd4c0000.dma: Xilinx DPDMA engine is probed [ 1.988985] xilinx-zynqmp-dma fd500000.dma: ZynqMP DMA driver Probe success [ 1.995839] xilinx-zynqmp-dma fd510000.dma: ZynqMP DMA driver Probe success [ 2.002835] xilinx-zynqmp-dma fd520000.dma: ZynqMP DMA driver Probe success [ 2.009841] xilinx-zynqmp-dma fd530000.dma: ZynqMP DMA driver Probe success [ 2.016840] xilinx-zynqmp-dma fd540000.dma: ZynqMP DMA driver Probe success [ 2.023846] xilinx-zynqmp-dma fd550000.dma: ZynqMP DMA driver Probe success [ 2.030856] xilinx-zynqmp-dma fd560000.dma: ZynqMP DMA driver Probe success [ 2.037865] xilinx-zynqmp-dma fd570000.dma: ZynqMP DMA driver Probe success [ 2.044871] xilinx-zynqmp-dma ffa80000.dma: ZynqMP DMA driver Probe success [ 2.051878] xilinx-zynqmp-dma ffa90000.dma: ZynqMP DMA driver Probe success [ 2.058878] xilinx-zynqmp-dma ffaa0000.dma: ZynqMP DMA driver Probe success [ 2.065893] xilinx-zynqmp-dma ffab0000.dma: ZynqMP DMA driver Probe success [ 2.072894] xilinx-zynqmp-dma ffac0000.dma: ZynqMP DMA driver Probe success [ 2.079901] xilinx-zynqmp-dma ffad0000.dma: ZynqMP DMA driver Probe success [ 2.086912] xilinx-zynqmp-dma ffae0000.dma: ZynqMP DMA driver Probe success [ 2.093913] xilinx-zynqmp-dma ffaf0000.dma: ZynqMP DMA driver Probe success [ 2.102033] xen:xen_evtchn: Event-channel device installed [ 2.153189] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled [ 2.155328] ff010000.serial: ttyPS0 at MMIO 0xff010000 (irq = 215, base_baud = 6250000) is a xuartps [ 2.164053] [drm] Initialized drm 1.1.0 20060810 [ 2.168098] Unable to detect cache hierarchy from DT for CPU 0 [ 2.178894] brd: module loaded [ 2.182529] loop: module loaded [ 2.182577] Invalid max_queues (4), will use default max: 1. 
[ 2.186597] ahci-ceva fd0c0000.ahci: AHCI 0001.0301 32 slots 2 ports 6 Gbps 0x3 impl platform mode [ 2.194860] ahci-ceva fd0c0000.ahci: flags: 64bit ncq sntf pm clo only pmp fbs pio slum part ccc sds apst [ 2.205933] scsi host0: ahci-ceva [ 2.208250] scsi host1: ahci-ceva [ 2.211373] ata1: SATA max UDMA/133 mmio [mem 0xfd0c0000-0xfd0c1fff] port 0x100 irq 212 [ 2.219333] ata2: SATA max UDMA/133 mmio [mem 0xfd0c0000-0xfd0c1fff] port 0x180 irq 212 [ 2.227508] mtdoops: mtd device (mtddev=name/number) must be supplied [ 2.234264] zynqmp-qspi ff0f0000.spi: rx bus width not found [ 2.240356] m25p80 spi0.0: found n25q512a, expected m25p80 [ 2.245312] m25p80 spi0.0: n25q512a (131072 Kbytes) [ 2.250055] 4 ofpart partitions found on MTD device spi0.0 [ 2.255569] Creating 4 MTD partitions on "spi0.0": [ 2.260416] 0x000000000000-0x000001e00000 : "boot" [ 2.265778] 0x000001e00000-0x000001e40000 : "bootenv" [ 2.270946] 0x000001e40000-0x000004240000 : "kernel" [ 2.275954] 0x000004240000-0x000008000000 : "spare" [ 2.282358] libphy: Fixed MDIO Bus: probed [ 2.284917] tun: Universal TUN/TAP device driver, 1.6 [ 2.289566] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com> [ 2.299396] CAN device driver interface [ 2.300542] gpiod_set_value: invalid GPIO [ 2.304151] libphy: MACB_mii_bus: probed [ 2.311247] macb ff0e0000.ethernet eth0: Cadence GEM rev 0x50070106 at 0xff0e0000 irq 25 (00:0a:35:00:22:01) [ 2.317621] TI DP83867 ff0e0000.etherne:0c: attached PHY driver [TI DP83867] (mii_bus:phy_addr=ff0e0000.etherne:0c, irq=-1) [ 2.329274] xen_netfront: Initialising Xen virtual ethernet driver [ 2.335065] usbcore: registered new interface driver asix [ 2.340460] usbcore: registered new interface driver ax88179_178a [ 2.346595] usbcore: registered new interface driver cdc_ether [ 2.352474] usbcore: registered new interface driver net1080 [ 2.358185] usbcore: registered new interface driver cdc_subset [ 2.364152] usbcore: registered new interface driver zaurus [ 2.369795] usbcore: registered new interface driver cdc_ncm [ 2.377082] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver [ 2.382051] ehci-pci: EHCI PCI platform driver [ 2.386802] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller [ 2.392085] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 1 [ 2.400713] xhci-hcd xhci-hcd.0.auto: hcc params 0x0238f625 hci version 0x100 quirks 0x00010010 [ 2.408562] xhci-hcd xhci-hcd.0.auto: irq 220, io mem 0xfe200000 [ 2.414782] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002 [ 2.421402] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1 [ 2.428660] usb usb1: Product: xHCI Host Controller [ 2.433588] usb usb1: Manufacturer: Linux 4.6.0-xilinx xhci-hcd [ 2.439552] usb usb1: SerialNumber: xhci-hcd.0.auto [ 2.444783] hub 1-0:1.0: USB hub found [ 2.448405] hub 1-0:1.0: 1 port detected [ 2.452393] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller [ 2.457823] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 2 [ 2.465551] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM. 
[ 2.473724] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003 [ 2.480476] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1 [ 2.487734] usb usb2: Product: xHCI Host Controller [ 2.492663] usb usb2: Manufacturer: Linux 4.6.0-xilinx xhci-hcd [ 2.498633] usb usb2: SerialNumber: xhci-hcd.0.auto [ 2.503830] hub 2-0:1.0: USB hub found [ 2.507440] hub 2-0:1.0: 1 port detected [ 2.511533] usbcore: registered new interface driver uas [ 2.516782] usbcore: registered new interface driver usb-storage [ 2.523153] mousedev: PS/2 mouse device common for all mice [ 2.528935] rtc_zynqmp ffa60000.rtc: rtc core: registered ffa60000.rtc as rtc0 [ 2.535707] i2c /dev entries driver [ 2.541444] pca953x 0-0020: interrupt support not compiled in [ 2.546795] pca953x 0-0021: interrupt support not compiled in [ 2.550859] cdns-i2c ff020000.i2c: 400 kHz mmio ff020000 irq 201 [ 2.557035] ata2: SATA link down (SStatus 0 SControl 330) [ 2.562335] ata1: SATA link down (SStatus 0 SControl 330) [ 2.568009] cdns-i2c ff030000.i2c: 400 kHz mmio ff030000 irq 202 [ 2.574769] i2c i2c-0: Added multiplexed i2c bus 2 [ 2.579511] i2c i2c-0: Added multiplexed i2c bus 3 [ 2.584506] i2c i2c-0: Added multiplexed i2c bus 4 [ 2.588941] i2c i2c-0: Added multiplexed i2c bus 5 [ 2.593191] pca954x 0-0075: registered 4 multiplexed busses for I2C mux pca9544 [ 2.600992] at24 6-0054: 1024 byte 24c08 EEPROM, writable, 1 bytes/write [ 2.607297] i2c i2c-6: Failed to register i2c client 24c08 at 0x55 (-16) [ 2.614037] i2c i2c-6: of_i2c: Failure registering /amba/i2c@ff030000/i2cswitch@74/i2c@0/eeprom@55 [ 2.623032] i2c i2c-6: Failed to register i2c client 24c08 at 0x56 (-16) [ 2.629773] i2c i2c-6: of_i2c: Failure registering /amba/i2c@ff030000/i2cswitch@74/i2c@0/eeprom@56 [ 2.638775] i2c i2c-6: Failed to register i2c client 24c08 at 0x57 (-16) [ 2.645517] i2c i2c-6: of_i2c: Failure registering /amba/i2c@ff030000/i2cswitch@74/i2c@0/eeprom@57 [ 2.654520] i2c i2c-1: Added multiplexed i2c bus 6 [ 2.659789] i2c i2c-1: Added multiplexed i2c bus 7 [ 2.664473] i2c i2c-1: Added multiplexed i2c bus 8 [ 2.669330] i2c i2c-1: Added multiplexed i2c bus 9 [ 2.674164] i2c i2c-1: Added multiplexed i2c bus 10 [ 2.679045] i2c i2c-1: Added multiplexed i2c bus 11 [ 2.683945] i2c i2c-1: Added multiplexed i2c bus 12 [ 2.688875] i2c i2c-1: Added multiplexed i2c bus 13 [ 2.693617] pca954x 1-0074: registered 8 multiplexed busses for I2C switch pca9548 [ 2.701443] i2c i2c-1: Added multiplexed i2c bus 14 [ 2.706331] i2c i2c-1: Added multiplexed i2c bus 15 [ 2.711290] i2c i2c-1: Added multiplexed i2c bus 16 [ 2.716428] i2c i2c-1: Added multiplexed i2c bus 17 [ 2.721284] i2c i2c-1: Added multiplexed i2c bus 18 [ 2.726075] i2c i2c-1: Added multiplexed i2c bus 19 [ 2.731025] i2c i2c-1: Added multiplexed i2c bus 20 [ 2.735935] i2c i2c-1: Added multiplexed i2c bus 21 [ 2.740674] pca954x 1-0075: registered 8 multiplexed busses for I2C switch pca9548 [ 2.749935] ina2xx 2-0040: power monitor ina226 (Rshunt = 5000 uOhm) [ 2.755025] ina2xx 2-0041: power monitor ina226 (Rshunt = 5000 uOhm) [ 2.761436] ina2xx 2-0042: power monitor ina226 (Rshunt = 5000 uOhm) [ 2.767820] ina2xx 2-0043: power monitor ina226 (Rshunt = 5000 uOhm) [ 2.774230] ina2xx 2-0044: power monitor ina226 (Rshunt = 5000 uOhm) [ 2.780623] ina2xx 2-0045: power monitor ina226 (Rshunt = 5000 uOhm) [ 2.787028] ina2xx 2-0046: power monitor ina226 (Rshunt = 5000 uOhm) [ 2.793432] ina2xx 2-0047: power monitor ina226 (Rshunt = 5000 uOhm) [ 2.799835] ina2xx 2-004a: power monitor ina226 (Rshunt = 5000 
uOhm) [ 2.806230] ina2xx 2-004b: power monitor ina226 (Rshunt = 5000 uOhm) [ 2.812724] ina2xx 3-0040: power monitor ina226 (Rshunt = 2000 uOhm) [ 2.819033] ina2xx 3-0041: power monitor ina226 (Rshunt = 5000 uOhm) [ 2.825438] ina2xx 3-0042: power monitor ina226 (Rshunt = 5000 uOhm) [ 2.831835] ina2xx 3-0043: power monitor ina226 (Rshunt = 5000 uOhm) [ 2.838245] ina2xx 3-0044: power monitor ina226 (Rshunt = 5000 uOhm) [ 2.844638] ina2xx 3-0045: power monitor ina226 (Rshunt = 5000 uOhm) [ 2.851038] ina2xx 3-0046: power monitor ina226 (Rshunt = 5000 uOhm) [ 2.857443] ina2xx 3-0047: power monitor ina226 (Rshunt = 5000 uOhm) [ 2.893958] EDAC MC: ECC not enabled [ 2.894244] EDAC DEVICE0: Giving out device to module zynqmp-ocm-edac controller zynqmp_ocm: DEV ff960000.memory-controller (INTERRUPT) [ 2.904297] sdhci: Secure Digital Host Controller Interface driver [ 2.910419] sdhci: Copyright(c) Pierre Ossman [ 2.914835] sdhci-pltfm: SDHCI platform and OF driver helper [ 2.920704] sdhci-arasan ff160000.sdhci: "xlnx,mio_bank " property is missing. [ 2.927819] sdhci-arasan: probe of ff160000.sdhci failed with error -22 [ 2.971743] mmc0: SDHCI controller on ff170000.sdhci [ff170000.sdhci] using ADMA 64-bit [ 2.974342] ledtrig-cpu: registered to indicate activity on CPUs [ 2.980386] usbcore: registered new interface driver usbhid [ 2.985864] usbhid: USB HID core driver [ 3.006812] si570 8-005d: registered, current frequency 300000000 Hz [ 3.031405] si570 9-005d: registered, current frequency 148500000 Hz [ 3.032429] fpga_manager fpga0: Xilinx ZynqMp FPGA Manager registered [ 3.039397] pktgen: Packet Generator for packet performance testing. Version: 2.75 [ 3.046731] Initializing XFRM netlink socket [ 3.050686] NET: Registered protocol family 10 [ 3.057639] sit: IPv6 over IPv4 tunneling driver [ 3.060176] NET: Registered protocol family 17 [ 3.064309] NET: Registered protocol family 15 [ 3.068796] can: controller area network core (rev 20120528 abi 9) [ 3.075050] NET: Registered protocol family 29 [ 3.079530] can: raw protocol (rev 20120528) [ 3.083847] can: broadcast manager protocol (rev 20120528 t) [ 3.089559] can: netlink gateway (rev 20130117) max_hops=1 [ 3.095259] 9pnet: Installing 9P2000 support [ 3.099434] Key type dns_resolver registered [ 3.106404] registered taskstats version 1 [ 3.108383] Btrfs loaded [ 3.114762] rtc_zynqmp ffa60000.rtc: setting system clock to 1986-12-22 00:56:59 UTC (535597019) [ 3.119456] ALSA device list: [ 3.122346] No soundcards found. [ 3.129359] Freeing unused kernel memory: 460K (ffffff8008b3f000 - ffffff8008bb2000) INIT: version 2.88 booting [ 4.093690] random: dd urandom read with 7 bits of entropy available Fri Dec 16 01:23:11 UTC 2016 Starting internet superserver: inetd. INIT: Entering runlevel: 5 Configuring network interfaces... [ 4.193262] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready udhcpc (v1.24.1) started Sending discover... [ 6.308058] macb ff0e0000.ethernet eth0: link up (1000/Full) [ 6.308191] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready Sending discover... Sending select for 10.0.0.191... Lease of 10.0.0.191 obtained, lease time 604800 /etc/udhcpc.d/50default: Adding DNS 75.75.75.75 /etc/udhcpc.d/50default: Adding DNS 75.75.76.76 done. Starting OpenBSD Secure Shell server: sshd generating ssh RSA key... generating ssh ECDSA key... generating ssh DSA key... generating ssh ED25519 key... done. Starting syslogd/klogd: done Starting /usr/sbin/xenstored.... Setting domain 0 name, domid and JSON config... 
libxl: error: libxl.c:689:libxl_domain_info: getting domain info list: Permission denied libxl: error: libxl_dom.c:1127:libxl__userdata_path: unable to find domain info for domain 0: Permission denied cannot store stub json config for domain 0 Starting xenconsoled... Starting QEMU as disk backend for dom0 /etc/rc5.d/S80xencommons: line 102: /usr/bin/qemu-system-i386: No such file or directory Starting domain watchdog daemon: xenwatchdogd startup PetaLinux 2016.4 plnx_aarch64 /dev/hvc0 plnx_aarch64 login: [ 50.558969] random: nonblocking pool is initialized (XEN) (XEN) TBM Jan 25 2017 12:45:38 running on xen-guest_irq_latency (XEN) MIDR=410fd034 (XEN) D$ L1 32KB linesize=64b ways=4 sets=128 (XEN) I$ L1 32KB linesize=64b ways=2 sets=256 (XEN) D$ L2 1024KB linesize=64b ways=16 sets=1024 (XEN) I$ L2 1024KB linesize=64b ways=16 sets=1024 (XEN) plat_init_a64 (XEN) EL1 I and D caches enabled (XEN) RDO: enable MMU (XEN) RDO: Still here? EL1 S1 MMU is on (XEN) CNTFRQ=99998999 Hz k=10 (XEN) latency (ns): max=10180 warm_max=0 min=10180 avg=10180 (XEN) latency (ns): max=10180 warm_max=0 min=8470 avg=9967 (XEN) latency (ns): max=10180 warm_max=8150 min=8150 avg=9740 (XEN) latency (ns): max=10180 warm_max=8150 min=8050 avg=9529 (XEN) latency (ns): max=10180 warm_max=8150 min=8050 avg=9357 (XEN) latency (ns): max=10180 warm_max=8150 min=8030 avg=9192 (XEN) latency (ns): max=10180 warm_max=8150 min=8030 avg=9058 (XEN) latency (ns): max=10180 warm_max=8150 min=8020 avg=8929 (XEN) latency (ns): max=10180 warm_max=8150 min=8020 avg=8832 (XEN) latency (ns): max=10180 warm_max=8150 min=7840 avg=8708 (XEN) latency (ns): max=10180 warm_max=8150 min=7840 avg=8617 (XEN) latency (ns): max=10180 warm_max=8150 min=7840 avg=8552 (XEN) latency (ns): max=10180 warm_max=8150 min=7840 avg=8491 (XEN) latency (ns): max=10180 warm_max=8150 min=7840 avg=8440 (XEN) latency (ns): max=10180 warm_max=8150 min=7840 avg=8395 (XEN) latency (ns): max=10180 warm_max=8150 min=7840 avg=8346 (XEN) latency (ns): max=10180 warm_max=8150 min=7840 avg=8304 (XEN) latency (ns): max=10180 warm_max=8150 min=7840 avg=8270 (XEN) latency (ns): max=10180 warm_max=8150 min=7840 avg=8252 (XEN) latency (ns): max=10180 warm_max=8150 min=7840 avg=8215 (XEN) latency (ns): max=10180 warm_max=8150 min=7840 avg=8196 (XEN) latency (ns): max=10180 warm_max=8150 min=7840 avg=8176 (XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8180 (XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8173 (XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8159 (XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8157 (XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8144 (XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8119 (XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8127 (XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8113 (XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8098 (XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8077 (XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8068 (XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8056 (XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8051 (XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8054 (XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8068 (XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8061 (XEN) sched_smt_power_savings: disabled (XEN) NOW=83505753225 (XEN) Online Cpus: 0-3 (XEN) Cpupool 0: (XEN) Cpus: 0-3 (XEN) Scheduler: SMP Credit Scheduler rev2 (credit2) (XEN) Active queues: 1 (XEN) 
default-weight = 256
(XEN) Runqueue 0:
(XEN) ncpus = 4
(XEN) cpus = 0-3
(XEN) max_weight = 256
(XEN) instload = 1
(XEN) aveload = 3208 (~1%)
(XEN) idlers: 00000000,00000000,00000000,0000000a
(XEN) tickled: 00000000,00000000,00000000,00000000
(XEN) fully idle cores: 00000000,00000000,00000000,0000000a
(XEN) Domain info:
(XEN) Domain: 0 w 256 v 4
(XEN) 1: [0.0] flags=2 cpu=0 credit=10500000 [w=256] load=3170 (~1%)
(XEN) 2: [0.1] flags=0 cpu=1 credit=10500000 [w=256] load=131072 (~50%)
(XEN) 3: [0.2] flags=0 cpu=2 credit=10500000 [w=256] load=131072 (~50%)
(XEN) 4: [0.3] flags=0 cpu=3 credit=10500000 [w=256] load=131072 (~50%)
(XEN) Domain: 1 w 256 v 1
(XEN) 5: [1.0] flags=2 cpu=2 credit=9713074 [w=256] load=56 (~0%)
(XEN) Runqueue info:
(XEN) runqueue 0:
(XEN) CPUs info:
(XEN) CPU[00] runq=0, sibling=00000000,00000000,00000000,00000001, core=00000000,00000000,00000000,00000001
(XEN) run: [0.0] flags=2 cpu=0 credit=10500000 [w=256] load=3170 (~1%)
(XEN) CPU[01] runq=0, sibling=00000000,00000000,00000000,00000002, core=00000000,00000000,00000000,00000002
(XEN) CPU[02] runq=0, sibling=00000000,00000000,00000000,00000004, core=00000000,00000000,00000000,00000004
(XEN) run: [1.0] flags=2 cpu=2 credit=2915518 [w=256] load=56 (~0%)
(XEN) CPU[03] runq=0, sibling=00000000,00000000,00000000,00000008, core=00000000,00000000,00000000,00000008
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8514
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8494
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8444
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8401
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8346
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8664
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8565
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8497
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8432
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8388
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8349
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8305
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8280
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8237
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8203
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8197
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8187
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8154
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8150
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8144
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8136
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8111
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8109
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8116
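[The repeated "latency (ns): ..." lines in these logs are running statistics printed on each interrupt. As a rough illustration of how such figures are typically accumulated, here is a hedged C sketch: this is not the actual TBM code, and the struct, function name and warm-up threshold are invented.]

  #include <stdint.h>
  #include <stdio.h>

  #define WARMUP_IRQS 2  /* assumption: skip the first few, cache-cold interrupts */

  struct lat_stats {
      uint64_t min, max, warm_max, sum, count;
  };

  /* Update running stats with one measured latency (in ns) and print
   * them, mimicking the "latency (ns): max=... warm_max=... min=...
   * avg=..." lines in the log above. */
  static void update_latency_stats(struct lat_stats *s, uint64_t ns)
  {
      s->count++;
      s->sum += ns;
      if (ns > s->max)
          s->max = ns;
      if (s->count == 1 || ns < s->min)
          s->min = ns;
      /* warm_max ignores the warm-up interrupts, hence warm_max=0 in
       * the first samples of the logs above. */
      if (s->count > WARMUP_IRQS && ns > s->warm_max)
          s->warm_max = ns;
      printf("latency (ns): max=%llu warm_max=%llu min=%llu avg=%llu\n",
             (unsigned long long)s->max, (unsigned long long)s->warm_max,
             (unsigned long long)s->min,
             (unsigned long long)(s->sum / s->count));
  }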
Just very quickly...

On Thu, 2017-02-16 at 15:07 -0800, Stefano Stabellini wrote:
> (XEN) Active queues: 1
> (XEN) default-weight = 256
> (XEN) Runqueue 0:
> (XEN) ncpus = 4
> (XEN) cpus = 0-3
> (XEN) max_weight = 256
> (XEN) instload = 1
> (XEN) aveload = 3208 (~1%)
> (XEN) idlers: 00000000,00000000,00000000,0000000a
> (XEN) tickled: 00000000,00000000,00000000,00000000
> (XEN) fully idle cores: 00000000,00000000,00000000,0000000a
> (XEN) Domain info:
> (XEN) Domain: 0 w 256 v 4
> (XEN) 1: [0.0] flags=2 cpu=0 credit=10500000 [w=256] load=3170 (~1%)
> (XEN) 2: [0.1] flags=0 cpu=1 credit=10500000 [w=256] load=131072 (~50%)
> (XEN) 3: [0.2] flags=0 cpu=2 credit=10500000 [w=256] load=131072 (~50%)
> (XEN) 4: [0.3] flags=0 cpu=3 credit=10500000 [w=256] load=131072 (~50%)
>
Status of vcpus 2, 3 and 4 is a bit weird. I'll think about it.

> (XEN) Domain: 1 w 256 v 1
> (XEN) 5: [1.0] flags=2 cpu=2 credit=9713074 [w=256] load=56 (~0%)
> (XEN) Runqueue info:
> (XEN) runqueue 0:
> (XEN) CPUs info:
> (XEN) CPU[00] runq=0, sibling=00000000,00000000,00000000,00000001, core=00000000,00000000,00000000,00000001
>
This tells me that nr_cpu_ids is very big (I think it says it is 128, i.e., the ARM default), which means cpumask_t-s are huge.

What does `xl info' say? On my (x86) test box, it's like this:

...
nr_cpus              : 16
max_cpu_id           : 63
...

(and I have NR_CPUS=256, i.e., the x86 default).

Cpumasks being bigger also means cpumask operations being slower, and this matters quite a bit in Credit2, because we use cpumasks a lot (but also in Credit1, which uses cpumasks a little less than Credit2, but still quite a bit).

Isn't there a way, on ARM, to figure out at runtime that you're not going to have 128 cpus in the platform?

Dario
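[To make the size argument concrete: a cpumask is a fixed-size bitmap of NR_CPUS bits, and every mask operation walks all of its words unless it can stop at nr_cpu_ids. A minimal C sketch of the idea, not Xen's actual cpumask implementation; the names are illustrative.]

  #include <limits.h>

  #define NR_CPUS 128    /* ARM build-time default mentioned above */
  #define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
  #define MASK_WORDS ((NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG)

  typedef struct { unsigned long bits[MASK_WORDS]; } cpumask_t;

  /* An and/or/weight-style mask operation costs one machine word per
   * BITS_PER_LONG possible CPUs.  Capping nr_cpu_ids at the number of
   * CPUs that can actually exist lets it stop early, which is the
   * point of the nr_cpu_ids fix discussed below. */
  static void cpumask_and_upto(cpumask_t *dst, const cpumask_t *a,
                               const cpumask_t *b, unsigned int nr_cpu_ids)
  {
      unsigned int w;
      unsigned int words = (nr_cpu_ids + BITS_PER_LONG - 1) / BITS_PER_LONG;

      for (w = 0; w < words; w++)
          dst->bits[w] = a->bits[w] & b->bits[w];
  }

[With NR_CPUS=128 but nr_cpu_ids capped at 4, each operation walks one 64-bit word instead of two; the saving per call is small, but these operations sit on the scheduler's hot paths.]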
On Thu, 2017-02-09 at 16:54 -0800, Stefano Stabellini wrote:
> These are the results, in nanosec:
>
>                          AVG     MIN     MAX     WARM MAX
> NODEBUG no WFI          1890    1800    3170    2070
> NODEBUG WFI             4850    4810    7030    4980
> NODEBUG no WFI credit2  2217    2090    3420    2650
> NODEBUG WFI credit2     8080    7890   10320    8300
>
> DEBUG no WFI            2252    2080    3320    2650
> DEBUG WFI               6500    6140    8520    8130
> DEBUG WFI, credit2      8050    7870   10680    8450
>
> As you can see, depending on whether the guest issues a WFI or not
> while waiting for interrupts, the results change significantly.
> Interestingly, credit2 does worse than credit1 in this area.
>
I did some measuring myself, on x86, with different tools. Cyclictest is basically something very similar to Stefano's app.

I've run it both within Dom0 and inside a guest. I also ran a Xen build (in this case, only inside the guest).

> We are down to 2000-3000ns. Then, I started investigating the
> scheduler. I measured how long it takes to run "vcpu_unblock":
> 1050ns, which is significant. I don't know what is causing the
> remaining 1000-2000ns, but I bet on another scheduler function. Do
> you have any suggestions on which one?
>
So, vcpu_unblock() calls vcpu_wake(), which then invokes the scheduler's wakeup-related functions.

If you time vcpu_unblock() from beginning to end of the function, you actually capture quite a few things. E.g., the scheduler lock is taken inside vcpu_wake(), so you're basically including the time spent waiting on the lock in the estimate.

That is probably ok (as in, lock contention definitely is something relevant to latency), but it is to be expected that things differ between Credit1 and Credit2.

I've, OTOH, tried to time SCHED_OP(wake) and SCHED_OP(do_schedule), and here's the result. Numbers are in cycles (I've used RDTSC) and, to make sure I obtain consistent and comparable numbers, I've set the frequency scaling governor to performance.

Dom0, [performance]
                 cyclictest 1us     cyclictest 1ms     cyclictest 100ms
(cycles)        Credit1  Credit2   Credit1  Credit2   Credit1  Credit2
wakeup-avg         2429     2035      1980     1633      2535     1979
wakeup-max        14577   113682     15153   203136     12285   115164
sched-avg          1716     1860      2527     1651      2286     1670
sched-max         16059    15000     12297   101760     15831    13122

VM, [performance]
                 cyclictest 1us     cyclictest 1ms     cyclictest 100ms    make -j xen
(cycles)        Credit1  Credit2   Credit1  Credit2   Credit1  Credit2   Credit1  Credit2
wakeup-avg         2213     2128      1944     2342      2374     2213      2429     1618
wakeup-max         9990    10104     11262     9927     10290    10218     14430    15108
sched-avg          2437     2472      1620     1594      2498     1759      2449     1809
sched-max         14100    14634     10071     9984     10878     8748     16476    14220

Actually, TSC on this box should be stable and invariant, so I guess I can try with the default governor. Will do that on Monday. Does ARM have frequency scaling? (I remember something on xen-devel, but I am not sure whether it landed upstream.)

But anyway. You're seeing big differences between Credit1 and Credit2, while I, at least as far as the actual schedulers' code is concerned, don't. Credit2 shows higher wakeup-max values, but only in cases where the workload runs in dom0. But it also shows better (lower) averages, in both kinds of workload considered, and both in the dom0 and the VM case.

I therefore wonder what is actually responsible for the huge differences between the two schedulers that you are seeing... It could be lock contention, but with only 4 pCPUs and 2 active vCPUs, I honestly doubt it...

Regards,
Dario
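[For reference, a hedged sketch of the RDTSC-based timing described above, in generic x86 C; this is not the actual instrumentation patch, and the helper name is invented.]

  #include <stdint.h>
  #include <x86intrin.h>   /* __rdtsc(), _mm_lfence() */

  /* Time one call in TSC cycles.  The lfence barriers keep the
   * measured call from being reordered around the timestamp reads;
   * with a stable, invariant TSC the deltas are comparable across
   * samples (and across governors). */
  static inline uint64_t time_call_cycles(void (*fn)(void *), void *arg)
  {
      uint64_t t0, t1;

      _mm_lfence();
      t0 = __rdtsc();
      fn(arg);              /* e.g. the body of the wake hook */
      _mm_lfence();
      t1 = __rdtsc();
      return t1 - t0;
  }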
Hi,

On 02/17/2017 11:02 AM, Dario Faggioli wrote:
> Just very quickly...
>
> On Thu, 2017-02-16 at 15:07 -0800, Stefano Stabellini wrote:
>> (XEN) Active queues: 1
>> (XEN) default-weight = 256
>> (XEN) Runqueue 0:
>> (XEN) ncpus = 4
>> (XEN) cpus = 0-3
>> (XEN) max_weight = 256
>> (XEN) instload = 1
>> (XEN) aveload = 3208 (~1%)
>> (XEN) idlers: 00000000,00000000,00000000,0000000a
>> (XEN) tickled: 00000000,00000000,00000000,00000000
>> (XEN) fully idle cores: 00000000,00000000,00000000,0000000a
>> (XEN) Domain info:
>> (XEN) Domain: 0 w 256 v 4
>> (XEN) 1: [0.0] flags=2 cpu=0 credit=10500000 [w=256] load=3170 (~1%)
>> (XEN) 2: [0.1] flags=0 cpu=1 credit=10500000 [w=256] load=131072 (~50%)
>> (XEN) 3: [0.2] flags=0 cpu=2 credit=10500000 [w=256] load=131072 (~50%)
>> (XEN) 4: [0.3] flags=0 cpu=3 credit=10500000 [w=256] load=131072 (~50%)
>>
> Status of vcpus 2, 3 and 4 is a bit weird. I'll think about it.
>
>> (XEN) Domain: 1 w 256 v 1
>> (XEN) 5: [1.0] flags=2 cpu=2 credit=9713074 [w=256] load=56 (~0%)
>> (XEN) Runqueue info:
>> (XEN) runqueue 0:
>> (XEN) CPUs info:
>> (XEN) CPU[00] runq=0, sibling=00000000,00000000,00000000,00000001, core=00000000,00000000,00000000,00000001
>>
> This tells me that nr_cpu_ids is very big (I think it says it is 128,
> i.e., the ARM default), which means cpumask_t-s are huge.
>
> What does `xl info' say? On my (x86) test box, it's like this:
>
> ...
> nr_cpus              : 16
> max_cpu_id           : 63
> ...
>
> (and I have NR_CPUS=256, i.e., the x86 default).
>
> Cpumasks being bigger also means cpumask operations being slower, and
> this matters quite a bit in Credit2, because we use cpumasks a lot (but
> also in Credit1, which uses cpumasks a little less than Credit2, but
> still quite a bit).
>
> Isn't there a way, on ARM, to figure out at runtime that you're not
> going to have 128 cpus in the platform?

It is just that we never set nr_cpu_ids on ARM :/. There was a patch on the ML a while ago [1], but it never got applied.

Stefano, I think the patch is still valid. Could you apply it? It would probably be worth redoing the benchmark with this patch applied.

Cheers,

[1] https://patchwork.kernel.org/patch/8177261/
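[The fix is conceptually tiny. A hypothetical sketch of what such a patch does; this is not the actual patch at [1], and the function name is invented.]

  #define NR_CPUS 128          /* build-time worst case (ARM default) */

  unsigned int nr_cpu_ids = NR_CPUS;   /* pessimistic until enumeration */

  /* Hypothetical hook, called once the boot CPU has enumerated the
   * platform's CPUs (e.g. by counting the cpu@N nodes in the device
   * tree): with 4 enumerated CPUs, cpumask walks shrink from NR_CPUS
   * bits to 4. */
  void cap_nr_cpu_ids(unsigned int enumerated_cpus)
  {
      if (enumerated_cpus && enumerated_cpus < nr_cpu_ids)
          nr_cpu_ids = enumerated_cpus;
  }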
Hi Dario,

On 02/17/2017 06:40 PM, Dario Faggioli wrote:
> Actually, TSC on this box should be stable and invariant, so I guess I
> can try with the default governor. Will do that on Monday. Does ARM
> have frequency scaling? (I remember something on xen-devel, but I am
> not sure whether it landed upstream.)

I guess you mean the series sent by GlobalLogic ([1])? If so, it was never upstreamed.

However, frequency scaling may depend on the processor used, and is sometimes implemented using big.LITTLE (i.e., the task has to switch between cores).

I am not sure why it would matter in this case.

[1] https://lists.xenproject.org/archives/html/xen-devel/2014-10/msg00801.html
On Fri, 17 Feb 2017, Julien Grall wrote:
> Hi Dario,
>
> On 02/17/2017 06:40 PM, Dario Faggioli wrote:
> > Actually, TSC on this box should be stable and invariant, so I guess I
> > can try with the default governor. Will do that on Monday. Does ARM
> > have frequency scaling? (I remember something on xen-devel, but I am
> > not sure whether it landed upstream.)
>
> I guess you mean the series sent by GlobalLogic ([1])? If so, it was
> never upstreamed.
>
> However, frequency scaling may depend on the processor used, and is
> sometimes implemented using big.LITTLE (i.e., the task has to switch
> between cores).
>
> I am not sure why it would matter in this case.
>
> [1] https://lists.xenproject.org/archives/html/xen-devel/2014-10/msg00801.html

Not in this case; I have four A53s on this platform.
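[One more reason CPU frequency scaling should not distort these particular numbers: the latencies are derived from the ARM generic timer, which counts at the fixed CNTFRQ rate (99998999 Hz in the boot logs above, so one tick is roughly 10ns; the "k=10" in the TBM banner is presumably that per-tick factor, though that is an inference). A C sketch of the tick-to-nanosecond conversion under that assumption; the real app may do this differently.]

  #include <stdint.h>

  /* Generic-timer frequency as reported via CNTFRQ in the logs above.
   * It is fixed, independent of core DVFS, which is why governor
   * settings matter less here than for TSC-in-cycles measurements. */
  static const uint64_t cntfrq_hz = 99998999;

  static inline uint64_t ticks_to_ns(uint64_t ticks)
  {
      /* Scale before dividing to keep precision; fine for latency
       * deltas, since ticks * 1e9 only overflows 64 bits for spans
       * beyond roughly three minutes at ~100MHz. */
      return (ticks * 1000000000ULL) / cntfrq_hz;
  }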
On Fri, 17 Feb 2017, Julien Grall wrote:
> Hi,
>
> On 02/17/2017 11:02 AM, Dario Faggioli wrote:
> > Just very quickly...
> >
> > On Thu, 2017-02-16 at 15:07 -0800, Stefano Stabellini wrote:
> > > (XEN) Active queues: 1
> > > (XEN) default-weight = 256
> > > (XEN) Runqueue 0:
> > > (XEN) ncpus = 4
> > > (XEN) cpus = 0-3
> > > (XEN) max_weight = 256
> > > (XEN) instload = 1
> > > (XEN) aveload = 3208 (~1%)
> > > (XEN) idlers: 00000000,00000000,00000000,0000000a
> > > (XEN) tickled: 00000000,00000000,00000000,00000000
> > > (XEN) fully idle cores: 00000000,00000000,00000000,0000000a
> > > (XEN) Domain info:
> > > (XEN) Domain: 0 w 256 v 4
> > > (XEN) 1: [0.0] flags=2 cpu=0 credit=10500000 [w=256] load=3170 (~1%)
> > > (XEN) 2: [0.1] flags=0 cpu=1 credit=10500000 [w=256] load=131072 (~50%)
> > > (XEN) 3: [0.2] flags=0 cpu=2 credit=10500000 [w=256] load=131072 (~50%)
> > > (XEN) 4: [0.3] flags=0 cpu=3 credit=10500000 [w=256] load=131072 (~50%)
> > >
> > Status of vcpus 2, 3 and 4 is a bit weird. I'll think about it.
> >
> > > (XEN) Domain: 1 w 256 v 1
> > > (XEN) 5: [1.0] flags=2 cpu=2 credit=9713074 [w=256] load=56 (~0%)
> > > (XEN) Runqueue info:
> > > (XEN) runqueue 0:
> > > (XEN) CPUs info:
> > > (XEN) CPU[00] runq=0, sibling=00000000,00000000,00000000,00000001, core=00000000,00000000,00000000,00000001
> > >
> > This tells me that nr_cpu_ids is very big (I think it says it is 128,
> > i.e., the ARM default), which means cpumask_t-s are huge.
> >
> > What does `xl info' say? On my (x86) test box, it's like this:
> >
> > ...
> > nr_cpus              : 16
> > max_cpu_id           : 63
> > ...
> >
> > (and I have NR_CPUS=256, i.e., the x86 default).

Indeed, I have 127.

> > Cpumasks being bigger also means cpumask operations being slower, and
> > this matters quite a bit in Credit2, because we use cpumasks a lot (but
> > also in Credit1, which uses cpumasks a little less than Credit2, but
> > still quite a bit).
> >
> > Isn't there a way, on ARM, to figure out at runtime that you're not
> > going to have 128 cpus in the platform?
>
> It is just that we never set nr_cpu_ids on ARM :/. There was a patch on
> the ML a while ago [1], but it never got applied.
>
> Stefano, I think the patch is still valid. Could you apply it?
> [1] https://patchwork.kernel.org/patch/8177261/

I pushed the patch.

> It would probably be worth redoing the benchmark with this patch applied.

Unfortunately the numbers haven't changed much:

                                          AVG   MIN    MAX  WARM MAX
NODEBUG vwfi=sleep credit2 fix-cpumasks  8020  7670  10320      8390
NODEBUG vwfi=sleep credit1 fix-cpumasks  6400  6330   9650      6720

In addition to the mysterious difference between credit1 and credit2, we also have the difference between vwfi=idle and vwfi=sleep to deal with:

                                          AVG   MIN    MAX  WARM MAX
NODEBUG vwfi=idle credit2 fix cpumasks   4000  2370   4500      3350
NODEBUG vwfi=idle credit1 fix cpumasks   3220  2180   4500      4320
On Fri, 17 Feb 2017, Stefano Stabellini wrote:
> On Fri, 17 Feb 2017, Julien Grall wrote:
> > Hi,
> >
> > On 02/17/2017 11:02 AM, Dario Faggioli wrote:
> > > Just very quickly...
> > >
> > > On Thu, 2017-02-16 at 15:07 -0800, Stefano Stabellini wrote:
> > > > (XEN) Active queues: 1
> > > > (XEN) default-weight = 256
> > > > (XEN) Runqueue 0:
> > > > (XEN) ncpus = 4
> > > > (XEN) cpus = 0-3
> > > > (XEN) max_weight = 256
> > > > (XEN) instload = 1
> > > > (XEN) aveload = 3208 (~1%)
> > > > (XEN) idlers: 00000000,00000000,00000000,0000000a
> > > > (XEN) tickled: 00000000,00000000,00000000,00000000
> > > > (XEN) fully idle cores: 00000000,00000000,00000000,0000000a
> > > > (XEN) Domain info:
> > > > (XEN) Domain: 0 w 256 v 4
> > > > (XEN) 1: [0.0] flags=2 cpu=0 credit=10500000 [w=256] load=3170 (~1%)
> > > > (XEN) 2: [0.1] flags=0 cpu=1 credit=10500000 [w=256] load=131072 (~50%)
> > > > (XEN) 3: [0.2] flags=0 cpu=2 credit=10500000 [w=256] load=131072 (~50%)
> > > > (XEN) 4: [0.3] flags=0 cpu=3 credit=10500000 [w=256] load=131072 (~50%)
> > > >
> > > Status of vcpus 2, 3 and 4 is a bit weird. I'll think about it.
> > >
> > > > (XEN) Domain: 1 w 256 v 1
> > > > (XEN) 5: [1.0] flags=2 cpu=2 credit=9713074 [w=256] load=56 (~0%)
> > > > (XEN) Runqueue info:
> > > > (XEN) runqueue 0:
> > > > (XEN) CPUs info:
> > > > (XEN) CPU[00] runq=0, sibling=00000000,00000000,00000000,00000001, core=00000000,00000000,00000000,00000001
> > > >
> > > This tells me that nr_cpu_ids is very big (I think it says it is 128,
> > > i.e., the ARM default), which means cpumask_t-s are huge.
> > >
> > > What does `xl info' say? On my (x86) test box, it's like this:
> > >
> > > ...
> > > nr_cpus              : 16
> > > max_cpu_id           : 63
> > > ...
> > >
> > > (and I have NR_CPUS=256, i.e., the x86 default).
>
> Indeed, I have 127.
>
> > > Cpumasks being bigger also means cpumask operations being slower, and
> > > this matters quite a bit in Credit2, because we use cpumasks a lot (but
> > > also in Credit1, which uses cpumasks a little less than Credit2, but
> > > still quite a bit).
> > >
> > > Isn't there a way, on ARM, to figure out at runtime that you're not
> > > going to have 128 cpus in the platform?
> >
> > It is just that we never set nr_cpu_ids on ARM :/. There was a patch on
> > the ML a while ago [1], but it never got applied.
> >
> > Stefano, I think the patch is still valid. Could you apply it?
> > [1] https://patchwork.kernel.org/patch/8177261/
>
> I pushed the patch.
>
> > It would probably be worth redoing the benchmark with this patch applied.
>
> Unfortunately the numbers haven't changed much:
>
>                                           AVG   MIN    MAX  WARM MAX
> NODEBUG vwfi=sleep credit2 fix-cpumasks  8020  7670  10320      8390
> NODEBUG vwfi=sleep credit1 fix-cpumasks  6400  6330   9650      6720
>
> In addition to the mysterious difference between credit1 and credit2, we
> also have the difference between vwfi=idle and vwfi=sleep to deal with:
>
>                                           AVG   MIN    MAX  WARM MAX
> NODEBUG vwfi=idle credit2 fix cpumasks   4000  2370   4500      3350
> NODEBUG vwfi=idle credit1 fix cpumasks   3220  2180   4500      4320

Actually those are still the old numbers, sorry! I didn't update the xen binary properly. These are the new numbers:

                                  AVG   MIN   MAX  WARM MAX
vwfi=sleep credit2 fix cpumasks  5910  5800  8520      6180
vwfi=sleep credit1 fix cpumasks  4900  4810  6910      4980
vwfi=idle  credit2 fix cpumasks  2800  1560  4550      4200
vwfi=idle  credit1 fix cpumasks  2800  1610  3420      1770

The difference between credit2 and credit1 is smaller now. In fact, it's zero (on average) when vwfi=idle.
However, with vwfi=sleep, the larger MAX value is a bit worrying.
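[For readers following along: the difference between the two modes being measured is whether a guest WFI actually blocks the vCPU. The sketch below is a hypothetical rendering of the trap path; the vwfi= names come from the experimental work under discussion in this thread, not from released Xen, and vcpu_block() is the same call mentioned earlier.]

  struct vcpu;                       /* opaque here */
  void vcpu_block(struct vcpu *v);   /* prototype only, for the sketch */

  enum vwfi_mode { VWFI_SLEEP, VWFI_IDLE };

  /* Hypothetical WFI trap handler: with vwfi=sleep the vCPU blocks, so
   * the next interrupt pays the full wakeup + schedule path (the higher
   * numbers above); with vwfi=idle the vCPU stays runnable and the
   * interrupt only pays the injection cost. */
  static void handle_guest_wfi(struct vcpu *v, enum vwfi_mode mode)
  {
      if (mode == VWFI_SLEEP)
          vcpu_block(v);
      /* else: VWFI_IDLE, return to the guest without blocking */
  }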
On Fri, 17 Feb 2017, Dario Faggioli wrote:
> On Thu, 2017-02-09 at 16:54 -0800, Stefano Stabellini wrote:
> > These are the results, in nanosec:
> >
> >                          AVG     MIN     MAX     WARM MAX
> > NODEBUG no WFI          1890    1800    3170    2070
> > NODEBUG WFI             4850    4810    7030    4980
> > NODEBUG no WFI credit2  2217    2090    3420    2650
> > NODEBUG WFI credit2     8080    7890   10320    8300
> >
> > DEBUG no WFI            2252    2080    3320    2650
> > DEBUG WFI               6500    6140    8520    8130
> > DEBUG WFI, credit2      8050    7870   10680    8450
> >
> > As you can see, depending on whether the guest issues a WFI or not
> > while waiting for interrupts, the results change significantly.
> > Interestingly, credit2 does worse than credit1 in this area.
> >
> I did some measuring myself, on x86, with different tools. Cyclictest
> is basically something very similar to Stefano's app.
>
> I've run it both within Dom0 and inside a guest. I also ran a Xen
> build (in this case, only inside the guest).
>
> > We are down to 2000-3000ns. Then, I started investigating the
> > scheduler. I measured how long it takes to run "vcpu_unblock":
> > 1050ns, which is significant. I don't know what is causing the
> > remaining 1000-2000ns, but I bet on another scheduler function. Do
> > you have any suggestions on which one?
> >
> So, vcpu_unblock() calls vcpu_wake(), which then invokes the
> scheduler's wakeup-related functions.
>
> If you time vcpu_unblock() from beginning to end of the function, you
> actually capture quite a few things. E.g., the scheduler lock is taken
> inside vcpu_wake(), so you're basically including the time spent
> waiting on the lock in the estimate.
>
> That is probably ok (as in, lock contention definitely is something
> relevant to latency), but it is to be expected that things differ
> between Credit1 and Credit2.
>
> I've, OTOH, tried to time SCHED_OP(wake) and SCHED_OP(do_schedule),
> and here's the result. Numbers are in cycles (I've used RDTSC) and, to
> make sure I obtain consistent and comparable numbers, I've set the
> frequency scaling governor to performance.
>
> Dom0, [performance]
>                  cyclictest 1us     cyclictest 1ms     cyclictest 100ms
> (cycles)        Credit1  Credit2   Credit1  Credit2   Credit1  Credit2
> wakeup-avg         2429     2035      1980     1633      2535     1979
> wakeup-max        14577   113682     15153   203136     12285   115164

I am not that familiar with the x86 side of things, but the 113682 and 203136 look worrisome, especially considering that credit1 doesn't have them.

> sched-avg          1716     1860      2527     1651      2286     1670
> sched-max         16059    15000     12297   101760     15831    13122
>
> VM, [performance]
>                  cyclictest 1us     cyclictest 1ms     cyclictest 100ms    make -j xen
> (cycles)        Credit1  Credit2   Credit1  Credit2   Credit1  Credit2   Credit1  Credit2
> wakeup-avg         2213     2128      1944     2342      2374     2213      2429     1618
> wakeup-max         9990    10104     11262     9927     10290    10218     14430    15108
> sched-avg          2437     2472      1620     1594      2498     1759      2449     1809
> sched-max         14100    14634     10071     9984     10878     8748     16476    14220
>
These are the corresponding numbers I have in ns:

                              AVG    MAX  WARM MAX
credit2 sched_op do_schedule  638   2410      2290
credit2 sched_op wake         603   2920       670
credit1 sched_op do_schedule  508    980       980
credit1 sched_op wake         792   2080       930

I would also like to see the nop scheduler as a comparison. It looks like credit2 has higher max values. I am attaching the raw numbers because I think they are interesting (also in ns): credit2 has a higher initial variability.

FYI the scenario is still the same: domU vcpu pinned to a pcpu, dom0 running elsewhere.
[Raw log attachment, trimmed: stray console input removed and the
repetitive middle omitted. The opening samples of each of the two runs
are kept below; in between, values settle to vcpu_wake ~550-780ns and
schedule ~250-300ns / ~600-800ns, with occasional ~800-1000ns
outliers.]

(XEN) DEBUG vcpu_wake 450 ns=1640
(XEN) DEBUG schedule 1401 ns=480
(XEN) DEBUG schedule 1401 ns=590
(XEN) DEBUG schedule 1401 ns=980
(XEN) DEBUG vcpu_wake 450 ns=2080
(XEN) DEBUG schedule 1401 ns=330
(XEN) DEBUG schedule 1401 ns=800
(XEN) DEBUG vcpu_wake 450 ns=930
(XEN) DEBUG schedule 1401 ns=280
(XEN) DEBUG schedule 1401 ns=770
(XEN) DEBUG vcpu_wake 450 ns=800
(XEN) DEBUG schedule 1401 ns=290
(XEN) DEBUG schedule 1401 ns=800
(XEN) DEBUG vcpu_wake 450 ns=770
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=780
[...]
(XEN) DEBUG vcpu_wake 450 ns=2920
(XEN) DEBUG schedule 1401 ns=1160
(XEN) DEBUG schedule 1401 ns=2410
(XEN) DEBUG schedule 1401 ns=2290
(XEN) DEBUG schedule 1401 ns=700
(XEN) DEBUG schedule 1401 ns=1150
(XEN) DEBUG vcpu_wake 450 ns=820
(XEN) DEBUG schedule 1401 ns=740
(XEN) DEBUG schedule 1401 ns=710
(XEN) DEBUG vcpu_wake 450 ns=670
(XEN) DEBUG schedule 1401 ns=630
(XEN) DEBUG schedule 1401 ns=650
(XEN) DEBUG vcpu_wake 450 ns=630
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG schedule 1401 ns=620
[...]
On Fri, 2017-02-17 at 16:02 -0800, Stefano Stabellini wrote:
> On Fri, 17 Feb 2017, Stefano Stabellini wrote:
> >
> > NODEBUG vwfi=idle credit2 fix cpumasks  4000  2370  4500  3350
> > NODEBUG vwfi=idle credit1 fix cpumasks  3220  2180  4500  4320
>
> Actually those are still the old numbers, sorry!
>
Yeah, I know, I've done the same sooo many times! :-P

> I didn't update the xen binary properly. These are the new numbers:
>
>                                      AVG     MIN     MAX    WARM MAX
> vwfi=sleep credit2 fix cpumasks     5910    5800    8520    6180
> vwfi=sleep credit1 fix cpumasks     4900    4810    6910    4980
> vwfi=idle  credit2 fix cpumasks     2800    1560    4550    4200
> vwfi=idle  credit1 fix cpumasks     2800    1610    3420    1770
>
> The difference between credit2 and credit1 is smaller now. In fact
> it's zero, when vwfi=idle.
>
Ok, cool.

> However, with vwfi=sleep, the larger MAX value is a bit worrying.
>
I still think this may be due to the fact that you never stop Credit1's
tick on ARM, which means that, when on Credit1, the system is a lot
more "alive" (e.g., if we were on x86, I'd have said it'd spend less
time in deep C-states, and things like that).

And yes, I still owe you a patch for disabling it. I'm working on that
one. :-)

Regards,
Dario
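For reference, the kind of tick-disabling change being discussed would
look, very roughly, like the sketch below. This is hypothetical, not
Dario's actual patch: struct csched_pcpu and its ticker field exist in
xen/common/sched_credit.c, while the two helpers are invented here for
illustration.

/* Hypothetical sketch: park Credit1's per-pCPU accounting ticker while
 * the pCPU runs the idle vCPU, and re-arm it when real work shows up
 * again, so an idle CPU stops taking periodic tick interrupts. */
static void csched_ticker_park(struct csched_pcpu *spc)
{
    stop_timer(&spc->ticker);   /* no accounting needed while idle */
}

static void csched_ticker_unpark(struct csched_pcpu *spc,
                                 unsigned int tick_period_us)
{
    /* restart the accounting period from "now" */
    set_timer(&spc->ticker, NOW() + MICROSECS(tick_period_us));
}

The park call would hang off the idle-entry path and the unpark off
vcpu wakeup; the hard part, as with x86's tickless work, is making sure
credit accounting stays correct across the gap.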
On Fri, 2017-02-17 at 19:44 +0000, Julien Grall wrote:
> On 02/17/2017 06:40 PM, Dario Faggioli wrote:
> > Does ARM have frequency scaling (I did remember something on
> > xen-devel, but I am not sure whether it landed upstream)?
>
> I guess you mean the series sent by globallogic ([1])? If so, it was
> never upstreamed.
>
Yep, that one. I was quite sure it did not go in, but I was in a hurry
and could not double-check the code. Sorry.

> However, frequency scaling may depend on the processor used and is
> sometimes implemented using big.LITTLE (e.g., the task has to
> switch).
>
> I am not sure why it would matter in this case.
>
It's hard to tell whether (and if yes, how much) it matters in this
specific case. In general, when I was working on squeezing latency out
of bare-metal Linux, disabling scaling was one of the steps.

It depends on quite a few different factors but, roughly, during the
time you're idle waiting for the interrupt, you're running at low
frequency.

When the interrupt comes, what happens depends on the governor, but
you usually don't immediately jump to the highest frequency, because
that happens gradually. And if all you do is something quick in
response to the timer interrupt (like in this and similar cases), you
most likely end up doing it while still running at low frequency, which
may impact latency (at least the latency of the completion of the work
that the timer triggered).

But I was also referring to the fact that the app is using some
hardware counters, and a frequency value for updating them and for
converting their value to time. So I was wondering how the fact that
frequency may be changing dynamically may interact with that (and
that's because I'm pretty ignorant about ARM internals, I know :-/).

Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
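For the record, pinning the governor (as was done for the x86 numbers
earlier in the thread) is a one-line sysfs write on Linux. A minimal,
self-contained sketch, assuming the standard cpufreq sysfs layout and
root privileges:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* cpu0 only; real setups loop over /sys/devices/system/cpu/cpu* */
    const char *path =
        "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor";
    FILE *f = fopen(path, "w");

    if (!f) {
        perror("fopen");    /* needs root; cpufreq may be absent */
        return EXIT_FAILURE;
    }
    fputs("performance\n", f);  /* run at max frequency, no ramp-up */
    fclose(f);
    return EXIT_SUCCESS;
}

With the "performance" governor, the frequency ramp-up delay described
above disappears from the measurement entirely.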
On 18/02/17 00:41, Stefano Stabellini wrote:
> On Fri, 17 Feb 2017, Dario Faggioli wrote:
>> [...]
>> I've, OTOH, tried to time SCHED_OP(wake) and SCHED_OP(do_schedule),
>> and here's the result. Numbers are in cycles (I've used RDTSC) and,
>> to make sure I obtain consistent and comparable numbers, I've set
>> the frequency scaling governor to performance.
>>
>> Dom0, [performance]
>>                  cyclictest 1us    cyclictest 1ms    cyclictest 100ms
>> (cycles)        Credit1 Credit2   Credit1 Credit2   Credit1 Credit2
>> wakeup-avg         2429    2035      1980    1633      2535    1979
>> wakeup-max        14577  113682     15153  203136     12285  115164
>
> I am not that familiar with the x86 side of things, but the 113682
> and 203136 look worrisome, especially considering that credit1
> doesn't have them.

Dario,

Do you reckon those 'MAX' values could be the load balancer running
(both for credit1 and credit2)?

 -George
On Mon, 2017-02-20 at 11:04 +0000, George Dunlap wrote:
> On 18/02/17 00:41, Stefano Stabellini wrote:
> > On Fri, 17 Feb 2017, Dario Faggioli wrote:
> > >                  cyclictest 1us    cyclictest 1ms    cyclictest 100ms
> > > (cycles)        Credit1 Credit2   Credit1 Credit2   Credit1 Credit2
> > > wakeup-avg         2429    2035      1980    1633      2535    1979
> > > wakeup-max        14577  113682     15153  203136     12285  115164
> >
> > I am not that familiar with the x86 side of things, but the 113682
> > and 203136 look worrisome, especially considering that credit1
> > doesn't have them.
>
> Dario,
>
> Do you reckon those 'MAX' values could be the load balancer running
> (both for credit1 and credit2)?
>
If they were max-es of do_schedule, that would have been my first
guess. But not in this case, as the load balancer is never called
during wakeup (neither in Credit1 nor in Credit2).

But I'm curious about what happens, and am going to investigate. It
should not be too hard, as:
1) it looks reproducible on the (x86) testbox that I have here;
2) since it's x86, I have tracing! :-P

Dario
Dario Faggioli <dario.faggioli@citrix.com> writes:
> On Fri, 2017-02-17 at 19:44 +0000, Julien Grall wrote:
> > However, frequency scaling may depend on the processor used and is
> > sometimes implemented using big.LITTLE (e.g., the task has to
> > switch).
> >
> > I am not sure why it would matter in this case.
> >
> [...]
>
> It depends on quite a few different factors but, roughly, during the
> time you're idle waiting for the interrupt, you're running at low
> frequency.
>
> When the interrupt comes, what happens depends on the governor, but
> you usually don't immediately jump to the highest frequency, because
> that happens gradually. And if all you do is something quick in
> response to the timer interrupt (like in this and similar cases), you
> most likely end up doing it while still running at low frequency,
> which may impact latency (at least the latency of the completion of
> the work that the timer triggered).

Jumping in a little late, but thought I'd add my 2c on CPU frequency
control on ARM.

Typically, changing frequency requires active participation by a
software agent, e.g., Linux. It's not done automagically behind the
scenes in hardware. If there are no agents changing the frequency,
then the cores run at the frequency they were booted at by the
firmware.

When waiting for an event (WFI/WFE), it is unlikely that the frequency
is lowered and ramped back up on receiving such events. Such behaviour
would need to be implemented as part of the agent controlling the
frequency. Indeed, there have been discussions about implementing such
functionality in Linux on certain ARM platforms.

> But I was also referring to the fact that the app is using some
> hardware counters, and a frequency value for updating them and for
> converting their value to time. So I was wondering how the fact that
> frequency may be changing dynamically may interact with that (and
> that's because I'm pretty ignorant about ARM internals, I know :-/).

In the absence of a controlling agent, hardware counters should not be
running at variable frequency.

Hope this helps.

Thanks,
Punit
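Concretely, on arm64 the generic timer exposes a fixed frequency
(CNTFRQ_EL0) and a free-running count (CNTVCT_EL0), and neither varies
with CPU frequency scaling, which is the point above. A minimal sketch
of the counter-to-time conversion the bare-metal app presumably does
(register names are architectural; whether the app uses the virtual or
the physical counter is an assumption):

#include <stdint.h>

static inline uint64_t cnt_freq(void)
{
    uint64_t f;
    /* fixed generic-timer frequency, set by firmware at boot */
    __asm__ volatile("mrs %0, cntfrq_el0" : "=r"(f));
    return f;
}

static inline uint64_t cnt_now(void)
{
    uint64_t c;
    /* isb orders the read against preceding instructions */
    __asm__ volatile("isb; mrs %0, cntvct_el0" : "=r"(c));
    return c;
}

static inline uint64_t ticks_to_ns(uint64_t ticks)
{
    /* beware of 64-bit overflow for very long intervals */
    return ticks * 1000000000ULL / cnt_freq();
}

So a dynamically changing CPU frequency would not, by itself, skew
timestamps derived this way.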
diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 7e43691..f5ff69b 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -663,6 +663,7 @@ void arch_domain_destroy(struct domain *d)
     /* IOMMU page table is shared with P2M, always call
      * iommu_domain_destroy() before p2m_teardown().
      */
+    WRITE_SYSREG32(0, CNTP_CTL_EL0);
     iommu_domain_destroy(d);
     p2m_teardown(d);
     domain_vgic_free(d);
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index a5348f2..5c8b621 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -47,7 +47,7 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);

 static void gic_update_one_lr(struct vcpu *v, int i);

-static const struct gic_hw_operations *gic_hw_ops;
+const struct gic_hw_operations *gic_hw_ops;

 void register_gic_ops(const struct gic_hw_operations *ops)
 {
diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index dd62ba6..9a4e50d 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -184,6 +184,7 @@ int request_irq(unsigned int irq, unsigned int irqflags,
 }

 /* Dispatch an interrupt */
+extern const struct gic_hw_operations *gic_hw_ops;
 void do_IRQ(struct cpu_user_regs *regs, unsigned int irq, int is_fiq)
 {
     struct irq_desc *desc = irq_to_desc(irq);
@@ -202,6 +203,12 @@ void do_IRQ(struct cpu_user_regs *regs, unsigned int irq, int is_fiq)
     irq_enter();

     spin_lock(&desc->lock);
+
+    if (irq == 30) {
+        set_bit(_IRQ_GUEST, &desc->status);
+        desc->handler = gic_hw_ops->gic_guest_irq_type;
+    }
+
     desc->handler->ack(desc);

     if ( !desc->action )
@@ -224,7 +231,23 @@ void do_IRQ(struct cpu_user_regs *regs, unsigned int irq, int is_fiq)
         /*
          * The irq cannot be a PPI, we only support delivery of SPIs to
          * guests.
          */
-        vgic_vcpu_inject_spi(info->d, info->virq);
+        if (irq != 30)
+            vgic_vcpu_inject_spi(info->d, info->virq);
+        else {
+            struct domain *d;
+
+            for_each_domain ( d )
+            {
+                struct pending_irq *p;
+
+                if (d->domain_id == 0 || is_idle_domain(d))
+                    continue;
+                p = irq_to_pending(d->vcpu[0], 30);
+                p->desc = desc;
+                vgic_vcpu_inject_irq(d->vcpu[0], 30);
+                break;
+            }
+        }
         goto out_no_end;
     }
diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
index 7dae28b..0249631 100644
--- a/xen/arch/arm/time.c
+++ b/xen/arch/arm/time.c
@@ -297,9 +300,9 @@ void init_timer_interrupt(void)
     /* Sensible defaults */
     WRITE_SYSREG64(0, CNTVOFF_EL2);     /* No VM-specific offset */
     /* Do not let the VMs program the physical timer, only read the physical counter */
-    WRITE_SYSREG32(CNTHCTL_EL2_EL1PCTEN, CNTHCTL_EL2);
     WRITE_SYSREG32(0, CNTP_CTL_EL0);    /* Physical timer disabled */
     WRITE_SYSREG32(0, CNTHP_CTL_EL2);   /* Hypervisor's timer disabled */
+    WRITE_SYSREG32(CNTHCTL_EL2_EL1PCTEN|CNTHCTL_EL2_EL1PCEN, CNTHCTL_EL2);
     isb();

     request_irq(timer_irq[TIMER_HYP_PPI], 0, timer_interrupt,
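For context, the CNTHCTL_EL2.EL1PCEN bit set in the time.c hunk is what
lets the guest program the physical timer itself, and the irq == 30
hack routes the non-secure physical timer PPI to the first non-dom0
domain. A bare-metal guest could then drive the timer roughly as below
(register names are architectural; the two helpers are illustrative,
not taken from the tbm app):

#include <stdint.h>

/* arm the EL1 physical timer to fire after 'ticks' counter ticks */
static inline void phys_timer_fire_in(uint32_t ticks)
{
    __asm__ volatile("msr cntp_tval_el0, %0" :: "r"((uint64_t)ticks));
    __asm__ volatile("msr cntp_ctl_el0, %0"  :: "r"(1UL)); /* ENABLE=1, IMASK=0 */
    __asm__ volatile("isb");
}

/* quiesce the timer from the PPI handler */
static inline void phys_timer_ack(void)
{
    __asm__ volatile("msr cntp_ctl_el0, %0" :: "r"(0UL));
}

Since the counter runs at the fixed CNTFRQ_EL0 rate, the tick count for
a desired interval follows directly from the conversion sketched after
Punit's mail above.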