Message ID: 4ccdfc3c-4796-cf1f-61a1-724ac560aa37@gmx.de (mailing list archive)
State: New, archived
On 07/09/2016 02:48, Thorsten Kohfeldt wrote: > From: Thorsten Kohfeldt <mailto__qemu-devel@nongnu.org> > Date: Wed, 31 Aug 2016 22:43:14 +0200 > Subject: [PATCH] hmp: Improve 'info mtree' with optional parm for mapinfo > > Motivation > When 'tuning' 'quirks' for VFIO imported devices, it is not easy to > directly grasp the implications of the priorisation algorithms in place > for the 'layered mapping' of memory regions. > Even though there are rules (documented in docs/memory.txt), once in a > while one might question the correctness of the actual implementation of > the rules. > Particularly, I believe I have uncovered a divergence of (sub-)region > priorisation/order/visibility in a corner case of importing a device > (which requires a 'quirk') with mmap enabled vs. mmap disabled. > This modification provides a means of visualising the ACTUAL > mapping/visibility/occlusion of subregions within regions, whereas the > current info mtree command only lists the tree of regions (all, visible > and invisible ones). > It is primarily intended to provide support for easy presentation of my > cause, but I strongly believe this modification also has general purpose > advantages. It is a useful addition, but I think a simpler presentation is also fine. What about "info mtree -f" which would visit the FlatView instead of the tree? The patch would probably be much shorter. Thanks, Paolo
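For orientation: the flag-based variant Paolo suggests would be a very small change on the HMP side. The sketch below only illustrates that idea and is not part of any posted patch; the "flatview:-f" spelling and the mtree_info_flatview() helper are assumptions. An optional flag is declared in .args_type in hmp-commands-info.hx, and the handler reads it with qdict_get_try_bool():

    /* hmp-commands-info.hx -- hypothetical "-f" flag for info mtree */
    {
        .name       = "mtree",
        .args_type  = "flatview:-f",
        .params     = "[-f]",
        .help       = "show memory tree (-f: dump the flat views instead)",
        .mhandler.cmd = hmp_info_mtree,
    },

    /* monitor.c -- the handler picks the flag up from the QDict */
    static void hmp_info_mtree(Monitor *mon, const QDict *qdict)
    {
        bool flatview = qdict_get_try_bool(qdict, "flatview", false);

        if (flatview) {
            /* mtree_info_flatview() is a hypothetical FlatView-walking variant */
            mtree_info_flatview((fprintf_function)monitor_printf, mon);
        } else {
            mtree_info((fprintf_function)monitor_printf, mon);
        }
    }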
On 09/15/16 11:52, Paolo Bonzini wrote: > > > On 07/09/2016 02:48, Thorsten Kohfeldt wrote: >> From: Thorsten Kohfeldt <mailto__qemu-devel@nongnu.org> >> Date: Wed, 31 Aug 2016 22:43:14 +0200 >> Subject: [PATCH] hmp: Improve 'info mtree' with optional parm for mapinfo >> >> Motivation >> When 'tuning' 'quirks' for VFIO imported devices, it is not easy to >> directly grasp the implications of the priorisation algorithms in place >> for the 'layered mapping' of memory regions. >> Even though there are rules (documented in docs/memory.txt), once in a >> while one might question the correctness of the actual implementation of >> the rules. >> Particularly, I believe I have uncovered a divergence of (sub-)region >> priorisation/order/visibility in a corner case of importing a device >> (which requires a 'quirk') with mmap enabled vs. mmap disabled. >> This modification provides a means of visualising the ACTUAL >> mapping/visibility/occlusion of subregions within regions, whereas the >> current info mtree command only lists the tree of regions (all, visible >> and invisible ones). >> It is primarily intended to provide support for easy presentation of my >> cause, but I strongly believe this modification also has general purpose >> advantages. > > It is a useful addition, but I think a simpler presentation is also > fine. What about "info mtree -f" which would visit the FlatView instead > of the tree? The patch would probably be much shorter. I'm very interested in this feature (as a mere user at this point). Thorsten, can you please CC me on your next version, and also include an example command response in either the Notes section of the patch, or in a separate 0/1 blurb? Thanks Laszlo
Am 15.09.2016 um 11:52 schrieb Paolo Bonzini: > > On 07/09/2016 02:48, Thorsten Kohfeldt wrote: >> From: Thorsten Kohfeldt <mailto__qemu-devel@nongnu.org> >> Date: Wed, 31 Aug 2016 22:43:14 +0200 >> Subject: [PATCH] hmp: Improve 'info mtree' with optional parm for mapinfo >> >> Motivation >> When 'tuning' 'quirks' for VFIO imported devices, it is not easy to >> directly grasp the implications of the priorisation algorithms in place >> for the 'layered mapping' of memory regions. >> Even though there are rules (documented in docs/memory.txt), once in a >> while one might question the correctness of the actual implementation of >> the rules. >> Particularly, I believe I have uncovered a divergence of (sub-)region >> priorisation/order/visibility in a corner case of importing a device >> (which requires a 'quirk') with mmap enabled vs. mmap disabled. >> This modification provides a means of visualising the ACTUAL >> mapping/visibility/occlusion of subregions within regions, whereas the >> current info mtree command only lists the tree of regions (all, visible >> and invisible ones). >> It is primarily intended to provide support for easy presentation of my >> cause, but I strongly believe this modification also has general purpose >> advantages. > > It is a useful addition, but I think a simpler presentation is also > fine. What about "info mtree -f" which would visit the FlatView instead > of the tree? The patch would probably be much shorter. > > Thanks, > > Paolo Paolo, For quite some time I had the patch in use as a direct modification of 'info mtree', but I felt that a general purpose use must provide an ad hoc user selectable presentation width parameter for the map info. I personally use a width of 65 or even 129 characters PREFIXING the tree elements which the command currently responds. My guess is though that most users would want a width of only 9 or 17. So I believe that a numerical parameter is a must. Visit the flat view - I'm not sure I understand you. Do you suggest to traverse a completely different data structure ? The purpose of the suggested visualisation is to point out where each of the tree's memory regions are "pinched" by other regions, so their "native" contents is NOT visible any more throughout the full region length, but (fractionally) rather another regions's content. Thus, I personally require to traverse exactly that tree structure. No offence, but I would rather not want to modify the patch towards what I feel would be a completely different purpose. I would appreciate if someone would review the patch in its current functional form to get it queued for qemu 2.8. My intention is to be able to rely on communication partners being able to reproduce findings using the new command once I start to attack the VFIO mmap flaw I talk about in the commit comment. @ALL I have provided 2 patch branches in github, one for qemu-2.7.0 and one for qemu-current-master (this needed a tiny sed-conversion, see below). 
I also placed some example info mtree mapinfo output on gist.github:

# patch commit for qemu-2.7 (same patch also works for qemu-2.6):
https://github.com/Thorsten-Kohfeldt/qemu/commit/5633a3cdbf6fd7cccd098fb83f591fbb15e8d383
# in branch:
https://github.com/Thorsten-Kohfeldt/qemu/commits/monitor_--hmp_info_mtree_mapinfo-width

# PATCH (as published in mailing list) *CONVERSION* from qemu-2.6/qemu-2.7 to qemu-master:
sed s/'[.]mhandler[.]cmd = '/'.cmd = '/ <qemu-2.6and7.patch >qemu-master.patch

# patch commit adapted that way for qemu-master:
https://github.com/Thorsten-Kohfeldt/qemu/commit/60b8c1be1a0119df1f1859b0d484e06e709c2ea2
# in branch:
https://github.com/Thorsten-Kohfeldt/qemu/commits/upstream-pullrequest-mapinfo-V1

# sample output info mtree 9
https://gist.github.com/Thorsten-Kohfeldt/254b5a21fef497054959f58af53a44c9
# sample output info mtree 65
https://gist.github.com/Thorsten-Kohfeldt/39cf5c8521c1999518b3438315e439f4

Regards, Thorsten
On 09/20/16 02:16, Thorsten Kohfeldt wrote: > > Am 15.09.2016 um 11:52 schrieb Paolo Bonzini: >> >> On 07/09/2016 02:48, Thorsten Kohfeldt wrote: >>> From: Thorsten Kohfeldt <mailto__qemu-devel@nongnu.org> >>> Date: Wed, 31 Aug 2016 22:43:14 +0200 >>> Subject: [PATCH] hmp: Improve 'info mtree' with optional parm for >>> mapinfo >>> >>> Motivation >>> When 'tuning' 'quirks' for VFIO imported devices, it is not easy to >>> directly grasp the implications of the priorisation algorithms in place >>> for the 'layered mapping' of memory regions. >>> Even though there are rules (documented in docs/memory.txt), once in a >>> while one might question the correctness of the actual implementation of >>> the rules. >>> Particularly, I believe I have uncovered a divergence of (sub-)region >>> priorisation/order/visibility in a corner case of importing a device >>> (which requires a 'quirk') with mmap enabled vs. mmap disabled. >>> This modification provides a means of visualising the ACTUAL >>> mapping/visibility/occlusion of subregions within regions, whereas the >>> current info mtree command only lists the tree of regions (all, visible >>> and invisible ones). >>> It is primarily intended to provide support for easy presentation of my >>> cause, but I strongly believe this modification also has general purpose >>> advantages. >> >> It is a useful addition, but I think a simpler presentation is also >> fine. What about "info mtree -f" which would visit the FlatView instead >> of the tree? The patch would probably be much shorter. >> >> Thanks, >> >> Paolo > > Paolo, > > For quite some time I had the patch in use as a direct modification of > 'info mtree', but I felt that a general purpose use must provide an ad > hoc user selectable presentation width parameter for the map info. > I personally use a width of 65 or even 129 characters PREFIXING the > tree elements which the command currently responds. > My guess is though that most users would want a width of only 9 or 17. > So I believe that a numerical parameter is a must. > Visit the flat view - > I'm not sure I understand you. Do you suggest to traverse a completely > different data structure ? > The purpose of the suggested visualisation is to point out where > each of the tree's memory regions are "pinched" by other regions, so > their "native" contents is NOT visible any more throughout the full > region length, but (fractionally) rather another regions's content. > Thus, I personally require to traverse exactly that tree structure. > > No offence, but I would rather not want to modify the patch towards > what I feel would be a completely different purpose. > > I would appreciate if someone would review the patch in its current > functional form to get it queued for qemu 2.8. > My intention is to be able to rely on communication partners being > able to reproduce findings using the new command once I start to > attack the VFIO mmap flaw I talk about in the commit comment. > > > @ALL > I have provided 2 patch branches in github, > one for qemu-2.7.0 and > one for qemu-current-master (this needed a tiny sed-conversion, see below). 
> I also placed some example info mtree mapinfo output on gist.github: > > > # patch commit for qemu-2.7 (same patch also works for qemu-2.6): > https://github.com/Thorsten-Kohfeldt/qemu/commit/5633a3cdbf6fd7cccd098fb83f591fbb15e8d383 > > # in branch: > https://github.com/Thorsten-Kohfeldt/qemu/commits/monitor_--hmp_info_mtree_mapinfo-width > > > # PATCH (as published in mailing list) *CONVERSION* from > qemu-2.6/qemu-2.7 to qemu-master: > sed s/'[.]mhandler[.]cmd = '/'.cmd = '/ <qemu-2.6and7.patch >>qemu-master.patch > > # patch commit adapted that way for qemu-master: > https://github.com/Thorsten-Kohfeldt/qemu/commit/60b8c1be1a0119df1f1859b0d484e06e709c2ea2 > > # in branch: > https://github.com/Thorsten-Kohfeldt/qemu/commits/upstream-pullrequest-mapinfo-V1 > > > > # sample output info mtree 9 > https://gist.github.com/Thorsten-Kohfeldt/254b5a21fef497054959f58af53a44c9 > > # sample output info mtree 65 > https://gist.github.com/Thorsten-Kohfeldt/39cf5c8521c1999518b3438315e439f4 I think your current output explains, for each part (that is, each "sample" of user-selected size) of each MemoryRegion, whether that sample is actually visible in the finally rendered, flat view of the address space(s). (And, if not, why not.) Whereas I think Paolo's idea is to map the flat view to begin with, and visually associate each interval of the guest visible physical address space with the one region that is ultimately accessed in that interval. This too would explain what the guest sees where, and why. It wouldn't give much information about ranges that the guest can *not* access (due to occlusion by other regions or otherwise), but maybe the "why not" is not so important after all? Regarding the FlatView thing -- QEMU already maintains a flat rendering of the memory region tree, so that guest accesses to the various regions can be quickly dispatched to the accessors that can serve these accesses. Memory regions are massaged (created, deleted, enabled/disabled etc) in "transactions", and when the outermost transaction is committed, the flat-view is re-rendered (which is expensive). (I'm not an expert on this, so my explanation may be inaccurate.) I think it would make sense to work from the pre-rendered FlatView, if the information you can get out of it covers your needs. Here's an example, from one of your sample outputs: you currently have ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 00000000000c0000-00000000000c3fff (prio 1, RW): alias pam-ram @pc.ram 00000000000c0000-00000000000c3fff [disabled] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 00000000000c0000-00000000000c3fff (prio 1, RW): alias pam-pci @pc.ram 00000000000c0000-00000000000c3fff [disabled] @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 00000000000c0000-00000000000c3fff (prio 1, R-): alias pam-rom @pc.ram 00000000000c0000-00000000000c3fff ooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo 00000000000c0000-00000000000c3fff (prio 1, RW): alias pam-pci @pci 00000000000c0000-00000000000c3fff [disabled] with @: alias region mapped at sample ~: alias region mappable but disabled at sample o: region occluded by some other region at sample If you have an address in the c0000-c3fff range, you have to consult all four lines to see which region will match. 
Working off of the FlatView, first I think you would find the right resolution for the output "for free" (you wouldn't need a user sample size -- the interval c0000-c3fff would neither need further subdivision nor be blurred by over-coarse resolution). You could represent the c0000-c3fff interval (and every other interval too) with a single letter, such as P, where P would stand for "alias pam-rom @pc.ram 00000000000c0000-00000000000c3fff". Second, given an address in c0000-c3fff, there would be only one range to consult (same as QEMU itself does with FlatView), and you'd find the MemoryRegion visible in that range at once. ... I hope Paolo will correct me if I misunderstood his suggestion. Thanks Laszlo
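As background to the transaction mechanism Laszlo describes above: updates to the region topology are batched between memory_region_transaction_begin() and memory_region_transaction_commit(), and the expensive FlatView re-render happens once, at the outermost commit. A minimal sketch; the region variables are placeholders, not code from this thread:

    memory_region_transaction_begin();
    memory_region_set_enabled(pam_ram_alias, false);   /* placeholder regions */
    memory_region_set_enabled(pam_rom_alias, true);
    memory_region_set_address(bar_mr, new_base);
    memory_region_transaction_commit();   /* FlatViews re-rendered here, once */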
On 20/09/2016 02:51, Laszlo Ersek wrote: > Here's an example, from one of your sample outputs: you currently have > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 00000000000c0000-00000000000c3fff (prio 1, RW): alias pam-ram @pc.ram 00000000000c0000-00000000000c3fff [disabled] > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 00000000000c0000-00000000000c3fff (prio 1, RW): alias pam-pci @pc.ram 00000000000c0000-00000000000c3fff [disabled] > @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 00000000000c0000-00000000000c3fff (prio 1, R-): alias pam-rom @pc.ram 00000000000c0000-00000000000c3fff > ooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo 00000000000c0000-00000000000c3fff (prio 1, RW): alias pam-pci @pci 00000000000c0000-00000000000c3fff [disabled] > > with > > @: alias region mapped at sample > ~: alias region mappable but disabled at sample > o: region occluded by some other region at sample > > If you have an address in the c0000-c3fff range, you have to consult all four lines to see which region will match. > > Working off of the FlatView, first I think you would find the right resolution for the output "for free" (you wouldn't need a user sample size -- the interval c0000-c3fff would neither need further subdivision nor be blurred by over-coarse resolution). You could represent the c0000-c3fff interval (and every other interval too) with a single letter, such as P, where P would stand for "alias pam-rom @pc.ram 00000000000c0000-00000000000c3fff". > > Second, given an address in c0000-c3fff, there would be only one range to consult (same as QEMU itself does with FlatView), and you'd find the MemoryRegion visible in that range at once. > > ... I hope Paolo will correct me if I misunderstood his suggestion. Yes, this would work. Alternatively, dumping the flat-view would give you something much simpler: 0000000000000000-000000000009ffff (RW): pc.ram 00000000000a0000-00000000000bffff (RW): vga-lowmem 00000000000c0000-00000000000fffff (R-): pc.ram @ 00000000000c0000 0000000000100000-0000000007ffffff (RW): pc.ram @ 0000000000100000 00000000fd000000-00000000fdffffff (RW): vga.vram 00000000febc0000-00000000febdffff (RW): e1000-mmio 00000000febf0000-00000000febf0fff (RW): vga.mmio 00000000febf0400-00000000febf041f (RW): vga ioports remapped 00000000febf0500-00000000febf0515 (RW): bochs dispi interface 00000000febf0600-00000000febf0607 (RW): qemu extended regs 00000000fec00000-00000000fec00fff (RW): ioapic 00000000fed00000-00000000fed003ff (RW): hpet 00000000fee00000-00000000feefffff (RW): apic-msi 00000000fffc0000-00000000ffffffff (R-): pc.bios This is less information of course, but it may good idea depending on _why_ you're doing that. Paolo
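A rough sketch of how such a flat-view dump could be generated inside memory.c, where FlatView, FlatRange and address_space_get_flatview() are visible. This is an untested illustration of Paolo's idea, not code from the thread, and the field names are assumptions based on memory.c of that period:

    static void mtree_print_flatview(fprintf_function p, void *f,
                                     AddressSpace *as)
    {
        FlatView *view = address_space_get_flatview(as); /* takes a reference */
        unsigned i;

        for (i = 0; i < view->nr; i++) {
            FlatRange *range = &view->ranges[i];
            /* int128_get64() asserts the values fit in 64 bits */
            hwaddr start = int128_get64(range->addr.start);
            hwaddr end = start + int128_get64(range->addr.size) - 1;

            p(f, TARGET_FMT_plx "-" TARGET_FMT_plx " (R%c): %s\n",
              start, end,
              range->readonly ? '-' : 'W',
              memory_region_name(range->mr));
        }
        flatview_unref(view);
    }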
Am 20.09.2016 um 02:51 schrieb Laszlo Ersek: > On 09/20/16 02:16, Thorsten Kohfeldt wrote: >> >> Am 15.09.2016 um 11:52 schrieb Paolo Bonzini: >>> >>> On 07/09/2016 02:48, Thorsten Kohfeldt wrote: >>>> From: Thorsten Kohfeldt <mailto__qemu-devel@nongnu.org> >>>> Date: Wed, 31 Aug 2016 22:43:14 +0200 >>>> Subject: [PATCH] hmp: Improve 'info mtree' with optional parm for >>>> mapinfo >>>> >>>> Motivation >>>> When 'tuning' 'quirks' for VFIO imported devices, it is not easy to >>>> directly grasp the implications of the priorisation algorithms in place >>>> for the 'layered mapping' of memory regions. >>>> Even though there are rules (documented in docs/memory.txt), once in a >>>> while one might question the correctness of the actual implementation of >>>> the rules. >>>> Particularly, I believe I have uncovered a divergence of (sub-)region >>>> priorisation/order/visibility in a corner case of importing a device >>>> (which requires a 'quirk') with mmap enabled vs. mmap disabled. >>>> This modification provides a means of visualising the ACTUAL >>>> mapping/visibility/occlusion of subregions within regions, whereas the >>>> current info mtree command only lists the tree of regions (all, visible >>>> and invisible ones). >>>> It is primarily intended to provide support for easy presentation of my >>>> cause, but I strongly believe this modification also has general purpose >>>> advantages. >>> >>> It is a useful addition, but I think a simpler presentation is also >>> fine. What about "info mtree -f" which would visit the FlatView instead >>> of the tree? The patch would probably be much shorter. >> >> For quite some time I had the patch in use as a direct modification of >> 'info mtree', but I felt that a general purpose use must provide an ad >> hoc user selectable presentation width parameter for the map info. >> I personally use a width of 65 or even 129 characters PREFIXING the >> tree elements which the command currently responds. >> My guess is though that most users would want a width of only 9 or 17. >> So I believe that a numerical parameter is a must. >> Visit the flat view - >> I'm not sure I understand you. Do you suggest to traverse a completely >> different data structure ? >> The purpose of the suggested visualisation is to point out where >> each of the tree's memory regions are "pinched" by other regions, so >> their "native" contents is NOT visible any more throughout the full >> region length, but (fractionally) rather another regions's content. >> Thus, I personally require to traverse exactly that tree structure. >> >> No offence, but I would rather not want to modify the patch towards >> what I feel would be a completely different purpose. >> >> I would appreciate if someone would review the patch in its current >> functional form to get it queued for qemu 2.8. >> My intention is to be able to rely on communication partners being >> able to reproduce findings using the new command once I start to >> attack the VFIO mmap flaw I talk about in the commit comment. 
>> >> >> # PATCH (as published in mailing list) *CONVERSION* from >> qemu-2.6/qemu-2.7 to qemu-master: >> sed s/'[.]mhandler[.]cmd = '/'.cmd = '/ <qemu-2.6and7.patch >>> qemu-master.patch # patch commit adapted that way for qemu-master and frequently rebased in branch: https://github.com/Thorsten-Kohfeldt/qemu/commits/upstream-pullrequest-mapinfo-V1-rebased >> # sample output info mtree 9 >> https://gist.github.com/Thorsten-Kohfeldt/254b5a21fef497054959f58af53a44c9 >> >> # sample output info mtree 65 >> https://gist.github.com/Thorsten-Kohfeldt/39cf5c8521c1999518b3438315e439f4 > > I think your current output explains, for each part (that is, each "sample" > of user-selected size) of each MemoryRegion, whether that sample is actually > visible in the finally rendered, flat view of the address space(s). > (And, if not, why not.) If not why not: Right. That _is_ my goal. > Whereas I think Paolo's idea is to map the flat view to begin with, > and visually associate each interval of the guest visible physical address > space with the one region that is ultimately accessed in that interval. > This too would explain what the guest sees where, and why. It wouldn't give > much information about ranges that the guest can *not* access (due to > occlusion by other regions or otherwise), > but maybe the "why not" is not so important after all? To the contrary IMHO. See my whole line of statements above. The "why not" explanation is all what drove me to attempt to make this patch an official part of qemu - as a tool for 'bugfix' discussions. Otherwise I'd have to state "apply this patch then you see what I mean". For me that "why not" is an important part of information from point of view of a "memory region subtree/stack maintainer". Please note that while hunting down that VFIO/mmap problem around last Xmas I _was_ experimenting with flat view dumps. But only the final mtree mapinfo was the real eye opener for me. > Regarding the FlatView thing -- QEMU already maintains a flat rendering of > the memory region tree, so that guest accesses to the various regions can > be quickly dispatched to the accessors that can serve these accesses. > Memory regions are massaged (created, deleted, enabled/disabled etc) in > "transactions", and when the outermost transaction is committed, > the flat-view is re-rendered (which is expensive). I knew that already, but thank you for summarising it quite to the point. > I think it would make sense to work from the pre-rendered FlatView, > if the information you can get out of it covers your needs. I assume you know by now that I do _not_ feel that it covers my needs. 
> Here's an example, from one of your sample outputs: you currently have > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 00000000000c0000-00000000000c3fff (prio 1, RW): alias pam-ram @pc.ram 00000000000c0000-00000000000c3fff [disabled] > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 00000000000c0000-00000000000c3fff (prio 1, RW): alias pam-pci @pc.ram 00000000000c0000-00000000000c3fff [disabled] > @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 00000000000c0000-00000000000c3fff (prio 1, R-): alias pam-rom @pc.ram 00000000000c0000-00000000000c3fff > ooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo 00000000000c0000-00000000000c3fff (prio 1, RW): alias pam-pci @pci 00000000000c0000-00000000000c3fff [disabled] > > with > > @: alias region mapped at sample > ~: alias region mappable but disabled at sample > o: region occluded by some other region at sample > > If you have an address in the c0000-c3fff range, you have to consult > all four lines to see which region will match. Actually, I feel this is too simple an example. The real stuff starts when you mix in different priorities and mmap. Please note that I do not remember negative priorities being employed, but I would want to make a pass at suggesting to use them in order to achieve a certain reduction in complexity for VFIO/mmap region stacks. That suggestion is way easier explained when I can refer to the info mtree mapinfo output. > Working off of the FlatView, first I think you would find the right > resolution for the output "for free" (you wouldn't need a user sample > size -- the interval c0000-c3fff would neither need further subdivision > nor be blurred by over-coarse resolution). I knew all along that this is a weak point of my tree sampler. But I still prefer its combination of information presented. > You could represent the c0000-c3fff interval (and every other interval > too) with a single letter, such as P, where P would stand for > "alias pam-rom @pc.ram 00000000000c0000-00000000000c3fff". > Second, given an address in c0000-c3fff, there would be only one range > to consult (same as QEMU itself does with FlatView), and you'd find > the MemoryRegion visible in that range at once. > > Thanks > Laszlo Please see also my answer to Paolo's follow up mail.
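For readers following the priority argument: the occlusion that the mapinfo columns mark comes from overlapping subregions with different priorities, as described in docs/memory.txt. A small fragment with invented names, in which a higher-priority sibling hides part of a larger, lower-priority BAR region:

    /* container: parent MemoryRegion; bar/quirk: regions owned by the device.
     * Accesses to [0x1000, 0x1fff] hit "example-quirk" (prio 1), so that page
     * of "example-bar" (prio 0) is occluded - the kind of occlusion the
     * mapinfo prefix characters are meant to expose.
     */
    memory_region_init_io(bar, owner, bar_ops, opaque, "example-bar", 0x10000);
    memory_region_add_subregion(container, 0x0, bar);                 /* prio 0 */

    memory_region_init_io(quirk, owner, quirk_ops, opaque, "example-quirk", 0x1000);
    memory_region_add_subregion_overlap(container, 0x1000, quirk, 1); /* prio 1 */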
Am 20.09.2016 um 09:56 schrieb Paolo Bonzini: > > ... dumping the flat-view would give you something much simpler: > > 0000000000000000-000000000009ffff (RW): pc.ram > 00000000000a0000-00000000000bffff (RW): vga-lowmem > 00000000000c0000-00000000000fffff (R-): pc.ram @ 00000000000c0000 > 0000000000100000-0000000007ffffff (RW): pc.ram @ 0000000000100000 > 00000000fd000000-00000000fdffffff (RW): vga.vram > 00000000febc0000-00000000febdffff (RW): e1000-mmio > 00000000febf0000-00000000febf0fff (RW): vga.mmio > 00000000febf0400-00000000febf041f (RW): vga ioports remapped > 00000000febf0500-00000000febf0515 (RW): bochs dispi interface > 00000000febf0600-00000000febf0607 (RW): qemu extended regs > 00000000fec00000-00000000fec00fff (RW): ioapic > 00000000fed00000-00000000fed003ff (RW): hpet > 00000000fee00000-00000000feefffff (RW): apic-msi > 00000000fffc0000-00000000ffffffff (R-): pc.bios How did you create this output ? I did not find any hint in monitor help or the qemu source tree. (either too many matches or no matches at all for my searches.) Is this what you envision to receive from this patch initiative ? > This is less information of course, but it may good idea depending on > _why_ you're doing that. > > Paolo I explained more of my motivation in a reply to Laszlo's mail, i.e. why I am really keen on exactly the kind of output I suggest with the patch. But as a compromise, I would be willing to add the flat view map as well. Still though, I see that as an additional patch on top, coming later, as this would need more discussion about how to exacly format the output. I'd really like to have that mtree mapinfo discussion tool merged _soon_.
On 09/20/16 20:23, Thorsten Kohfeldt wrote: > > Am 20.09.2016 um 02:51 schrieb Laszlo Ersek: >> I think it would make sense to work from the pre-rendered FlatView, >> if the information you can get out of it covers your needs. > > I assume you know by now that I do _not_ feel that it covers my needs. Okay. I guess you've made your point convincingly enough (although I'm just an interested bystander in this thread). I think the patch has a very low chance for regressions and it shouldn't draw in a big maintenance burden, so it could be considered "pure improvement" as-is. I'm not volunteering to do a full review. One thing I notice though is that the constification of memory_region_size()'s sole parameter could be / should be moved into a separate, "prep" patch. (I propose you wait for further feedback before reposting just with this change.) Thanks Laszlo
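The "prep" patch Laszlo suggests would carry only the constification hunks that are currently folded into the mapinfo patch further down, roughly along these lines; this is just the relevant excerpt, not a separately posted patch:

    --- a/include/exec/memory.h
    +++ b/include/exec/memory.h
    @@ ... @@
    -uint64_t memory_region_size(MemoryRegion *mr);
    +uint64_t memory_region_size(const MemoryRegion *mr);
    --- a/memory.c
    +++ b/memory.c
    @@ ... @@
    -uint64_t memory_region_size(MemoryRegion *mr)
    +uint64_t memory_region_size(const MemoryRegion *mr)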
Am 20.09.2016 um 20:46 schrieb Laszlo Ersek: > On 09/20/16 20:23, Thorsten Kohfeldt wrote: >> >> Am 20.09.2016 um 02:51 schrieb Laszlo Ersek: > >>> I think it would make sense to work from the pre-rendered FlatView, >>> if the information you can get out of it covers your needs. >> >> I assume you know by now that I do _not_ feel that it covers my needs. > > Okay. I guess you've made your point convincingly enough (although I'm > just an interested bystander in this thread). I think the patch has a > very low chance for regressions and it shouldn't draw in a big > maintenance burden, so it could be considered "pure improvement" as-is. > > I'm not volunteering to do a full review. One thing I notice though is > that the constification of memory_region_size()'s sole parameter could > be / should be moved into a separate, "prep" patch. I am really not having the attitude of unwillingness to listen to suggestions. But this was my train of thoughts towards that split up (before I posted): 1) This (split up) moves the patch suite size from 1 file to 3 files. 2) It now requires an aditional cover letter, a trivial patch and the actual patch. 3) If the trivial patch was applied but not the actual patch, I'd have to maintain 2 variants of the actual patch to fit current and also older trees (with and without constant local mr pointer variable). And this comes on top since 2b9e35760a8ff5548f6789d048c6e6e3c22c70ee 4) Well, that would be even more, as I have to maintain 2 variants already anyway: V2.6/V2.7 and master. Having that stated, of course I am willing to follow your suggestion anyway to get this through if it is insisted upon. > (I propose you wait for further feedback before reposting just with this > change.) I'm gladly looking forward to decisive guidance.
On 20/09/2016 20:29, Thorsten Kohfeldt wrote: >> >> 0000000000000000-000000000009ffff (RW): pc.ram >> 00000000000a0000-00000000000bffff (RW): vga-lowmem >> 00000000000c0000-00000000000fffff (R-): pc.ram @ 00000000000c0000 >> 0000000000100000-0000000007ffffff (RW): pc.ram @ 0000000000100000 >> 00000000fd000000-00000000fdffffff (RW): vga.vram >> 00000000febc0000-00000000febdffff (RW): e1000-mmio >> 00000000febf0000-00000000febf0fff (RW): vga.mmio >> 00000000febf0400-00000000febf041f (RW): vga ioports remapped >> 00000000febf0500-00000000febf0515 (RW): bochs dispi interface >> 00000000febf0600-00000000febf0607 (RW): qemu extended regs >> 00000000fec00000-00000000fec00fff (RW): ioapic >> 00000000fed00000-00000000fed003ff (RW): hpet >> 00000000fee00000-00000000feefffff (RW): apic-msi >> 00000000fffc0000-00000000ffffffff (R-): pc.bios > > How did you create this output ? By hand. :) Paolo > I did not find any hint in monitor help or the qemu source tree. > (either too many matches or no matches at all for my searches.) > Is this what you envision to receive from this patch initiative ?
> On 2016/9/14 6:55, Alex Williamson wrote: > > [cc +Paolo] > >> On Thu, 11 Aug 2016 19:05:57 +0800 >> Yongji Xie <address@hidden> wrote: >> >> Now the kernel commit 05f0c03fbac1 ("vfio-pci: Allow to mmap >> sub-page MMIO BARs if the mmio page is exclusive") allows VFIO >> to mmap sub-page BARs. This is the corresponding QEMU patch. Immediate questions first: It seems that mentioned commit will be part of Kernel 4.8 ? But as far as I can judge this change should also cooperate with older/existing kernels (which then just have qemu behave as before) ? (For my actual point of interrest related to this patch please see further down.) >> With those patches applied, we could passthrough sub-page BARs >> to guest, which can help to improve IO performance for some devices. >> >> In this patch, we expand MemoryRegions of these sub-page >> MMIO BARs to PAGE_SIZE in vfio_pci_write_config(), so that >> the BARs could be passed to KVM ioctl KVM_SET_USER_MEMORY_REGION >> with a valid size. The expanding size will be recovered when >> the base address of sub-page BAR is changed and not page aligned >> any more in guest. And we also set the priority of these BARs' >> memory regions to zero in case of overlap with BARs which share >> the same page with sub-page BARs in guest. >> >> Signed-off-by: Yongji Xie <address@hidden> >> --- >> hw/vfio/common.c | 3 +-- >> hw/vfio/pci.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ >> 2 files changed, 77 insertions(+), 2 deletions(-) > > Hi Yongji, ... >> + mr = region->mem; >> + mmap_mr = ®ion->mmaps[0].mem; >> + memory_region_transaction_begin(); >> + if (memory_region_size(mr) < qemu_real_host_page_size) { >> + if (!(bar_addr & ~qemu_real_host_page_mask) && >> + memory_region_is_mapped(mr) && region->mmaps[0].mmap) { >> + /* Expand memory region to page size and set priority */ >> + memory_region_del_subregion(r->address_space, mr); >> + memory_region_set_size(mr, qemu_real_host_page_size); >> + memory_region_set_size(mmap_mr, qemu_real_host_page_size); >> + memory_region_add_subregion_overlap(r->address_space, >> + bar_addr, mr, 0); >> + } >> + } else { >> + /* This case would happen when guest rescan one PCI device */ >> + if (bar_addr & ~qemu_real_host_page_mask) { >> + /* Recover the size of memory region */ >> + memory_region_set_size(mr, r->size); >> + memory_region_set_size(mmap_mr, r->size); >> + } else if (memory_region_is_mapped(mr)) { >> + /* Set the priority of memory region to zero */ >> + memory_region_del_subregion(r->address_space, mr); >> + memory_region_add_subregion_overlap(r->address_space, >> + bar_addr, mr, 0); >> + } >> + } >> + memory_region_transaction_commit(); > > Paolo, as the reigning memory API expert, do you see any issues with > this? Expanding the size of a container MemoryRegion and the contained > mmap'd subregion and manipulating priorities so that we don't interfere > with other MemoryRegions. Since the following qemu commit function memory_region_add_subregion_overlap() actually has a misleading name: http://git.qemu.org/?p=qemu.git;a=blobdiff;f=memory.c;h=ac5236b51587ee397edd177502fc20ce159f2235;hp=9daac5ea2d9a9c83533880a812760683f6e09765;hb=b61359781958759317ee6fd1a45b59be0b7dbbe1;hpb=ab0a99560857302b60053c245d1231acbd976cd4 The sole thing that memory_region_add_subregion_overlap() now actually does differently from memory_region_add_subregion() is nothing else than setting the region's priority to a value of callers choice. The _default_ priority as chosen by memory_region_add_subregion() _is_ 0. 
So, explicitly choosing memory_region_add_subregion_overlap(... , 0) does nothing new. Or does it: Actually, _yes_, because I see Alex actually willing to discuss choice of memory region priorities related to VFIO and mmap. Why do I "invade" this thread ? I would like you to consider thinking twice about selecting proper priorities for _any_ mmap related region (i.e. also the aligned case), and here is why: (I will also make a statement related to region expansion for alignment.) First of all, I recently suggested a patch which can visualise what I write about subsequently: https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg01315.html (I would appreciate if somebody would review and thus get it merged.) As a general remark, the sub-page mmap case does not only occur when a 'small' BAR is encountered, it also occurs when a fully mmap-ed page is split by a 'small' VFIO quirk. Hi Alex, here we go again about RTL8168 and its MSIX quirk. (Subsequently I also relate to/conclude for Yongji's patch.) Mentioned quirk cuts for certain RTL8168 models a full-page BAR right into 3 pieces, 0..qirkaddr-1, quirk and quirk+qsize..pagesize-1. What I found is that both "mmap-fractions" behave _buggy_. (Attempt of an analysis subsequently.) Here the first piece could be covered by Yongji's patch, but the third piece certainly not, as the kernel patch is limited to aligned small pieces. And that only starting with kernel 4.8. This is what we also need to solve and where above priority choice is coming into play (but there's more to consider): Right after memory_region_transaction_commit() a new flat view is created. As documented in docs/memory.txt, priorities _are_ important for the final 'stuff' which is exposed in the flat view in a certain address range. _But_ - priorities are not the only important property. So priorities need to fit into a larger picture. (Here I refer again to my suggested mtree info <mapinfo> patch.) Complexity of the situation besides watching your priorities: As far as I understand the mmap functionality is 2-phased, there is the mmap system call, which Yongji's patch is tuning around, and then the setup of the actual dma, which happens for sections of the flat view in function hw/vfio/common.c/vfio_dma_map() via an ioctl on the container file descriptor. vfio_dma_map() is called from hw/vfio/common.c/vfio_iommu_map_notify() _and_ hw/vfio/common.c/vfio_listener_region_add() (for mmap flat ranges). There be dragons - vfio_dma_map() is _skipped_ for ranges which are _not_ fully page aligned (there may be exceptions, but the buggy behavior is triggered by a skip-case currently occurring for the quirk split pieces mentioned above). This would also happen for non-page-extended page-aligned sub-page BARs, I suppose. Why is this skip-and-forget 'dangerous' ? Well, on the one hand memory.txt rules specify which mmap region is actually exposed in the flat range and thus fed into the skip-happy function vfio_listener_region_add() (I do not understand enough about vfio_iommu_map_notify(), so I disregard that one here for now). On the other hand, those regions that are dma-skipped do not seem to safely fallback to the 'slow' region-ops/memory-handlers. This is certainly also due to rules specified in memory.txt, be those either implemented correctly, but not correctly employed by well adjusted subregion-stacking or even by a buggy implementation of those stacking rules. 
End of last year I was able to 'fix' the RTL8168 MSIX quirk problem by adding an additional subregion (one page in size, referring to the page sized quirk split BAR), which re-exposed the required region-ops for the slow path. Back then Alex called that a hack, but still: I assume that sub-page BARs, which are the subject of Yongji's patch, will also be subject to the correct-stacking problem, and thus just picking a seemingly well educated priority for a page-extended BAR region might 'endanger' the extended page-fraction's slow-path/region-ops. Conclusion: _This_ patch probably needs an additional page sized region added right at the correct position of the subregion 'stack' in order to be able to 'create-and-forget', i.e. to be prepared for all different additional subregion add-ons, potentially by yet unknown quirks. And while we're at it, we can also fix the RTL8168 MSIX quirk using the same precautionary add-a-helper-region approach. I have a few patch snippets in store since the start of this year which roughly address the dma-skip problem (they still need refinement). Anybody interested in starting to discuss those? In order to test/verify/visualise I suggest again https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg04808.html (I would really appreciate a review, refinement and merge.) And maybe we should even add a refactoring patch for the purpose of renaming memory_region_add_subregion_overlap() to memory_region_add_subregion_stacked() or memory_region_add_subregion_prioritised() or similar. Regards, Thorsten
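To make the memory_region_add_subregion_overlap() point concrete: since the commit referenced above, the _overlap variant differs from the plain call only in the priority it records for the subregion, and the plain call uses priority 0. Illustration only, with placeholder variables; the calls are alternatives, not a sequence:

    /* These two calls have the same effect - priority defaults to 0: */
    memory_region_add_subregion(parent, bar_addr, mr);
    /* ...or equivalently... */
    memory_region_add_subregion_overlap(parent, bar_addr, mr, 0);

    /* Only a non-default priority changes the flat-view rendering where
     * regions overlap; a negative value makes this region yield to its
     * priority-0 siblings:
     */
    memory_region_add_subregion_overlap(parent, bar_addr, mr, -1);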
diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 74446c6..1593238 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -264,16 +264,17 @@ ETEXI
     {
         .name       = "mtree",
-        .args_type  = "",
-        .params     = "",
-        .help       = "show memory tree",
+        .args_type  = "mapinfo-width:l?",
+        .params     = "[mapinfo-width]",
+        .help       = "show memory tree "
+                      "(mapinfo-width: depict memory subregion mappings with leading characters)",
         .mhandler.cmd = hmp_info_mtree,
     },
 
 STEXI
 @item info mtree
 @findex mtree
-Show memory tree.
+Show memory tree optionally depicting subregion mappings.
 ETEXI
 
     {
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3e4d416..751483c 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -536,7 +536,7 @@ struct Object *memory_region_owner(MemoryRegion *mr);
  *
  * @mr: the memory region being queried.
  */
-uint64_t memory_region_size(MemoryRegion *mr);
+uint64_t memory_region_size(const MemoryRegion *mr);
 
 /**
  * memory_region_is_ram: check whether a memory region is random access
@@ -1202,7 +1202,10 @@ void memory_global_dirty_log_start(void);
  */
 void memory_global_dirty_log_stop(void);
 
-void mtree_info(fprintf_function mon_printf, void *f);
+/**
+ * mtree_info: see hmp-commands-info.hx item info mtree
+ */
+void mtree_info(fprintf_function mon_printf, void *f, const int mapinfo_width);
 
 /**
  * memory_region_dispatch_read: perform a read directly to the specified
diff --git a/memory.c b/memory.c
index 0eb6895..99161bd 100644
--- a/memory.c
+++ b/memory.c
@@ -595,7 +595,7 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
     return r;
 }
 
-static AddressSpace *memory_region_to_address_space(MemoryRegion *mr)
+static AddressSpace *memory_region_to_address_space(const MemoryRegion *mr)
 {
     AddressSpace *as;
 
@@ -1477,7 +1477,7 @@ void memory_region_unref(MemoryRegion *mr)
     }
 }
 
-uint64_t memory_region_size(MemoryRegion *mr)
+uint64_t memory_region_size(const MemoryRegion *mr)
 {
     if (int128_eq(mr->size, int128_2_64())) {
         return UINT64_MAX;
@@ -2062,11 +2062,11 @@ bool memory_region_is_mapped(MemoryRegion *mr)
 /* Same as memory_region_find, but it does not add a reference to the
  * returned region. It must be called from an RCU critical section.
  */
-static MemoryRegionSection memory_region_find_rcu(MemoryRegion *mr,
+static MemoryRegionSection memory_region_find_rcu(const MemoryRegion *mr,
                                                   hwaddr addr, uint64_t size)
 {
     MemoryRegionSection ret = { .mr = NULL };
-    MemoryRegion *root;
+    const MemoryRegion *root;
     AddressSpace *as;
     AddrRange range;
     FlatView *view;
@@ -2324,10 +2324,172 @@ struct MemoryRegionList {
 
 typedef QTAILQ_HEAD(queue, MemoryRegionList) MemoryRegionListHead;
 
+static char mtree_mr_sample_reftype_marker(const MemoryRegion *mr,
+                                           const MemoryRegion *mv,
+                                           const MemoryRegion *mc,
+                                           const hwaddr offset,
+                                           const int max_chainlen)
+{
+    const MemoryRegion *al = mc->alias;
+    const MemoryRegion *submr;
+    char marker, hint = 0;
+
+    if (int128_ge(int128_make64(offset), mc->size)) {
+        return 0;
+    }
+    if (max_chainlen < 0) {
+        /* this is most probably a complex alias loop situation */
+        return 'E'; /* max. link chain length exceeded */
+    }
+
+    if (al) {
+        if (al == mv) {
+            if (mr->enabled) {
+                if (mc == mr) {
+                    return '@'; /* indirect link found */
+                } else {
+                    return 'a'; /* 2nd degree related alias */
+                }
+            } else {
+                return '~';
+            }
+        }
+        if (al == mr) {
+            return 'L'; /* alias loop */
+        }
+        if (al == mc) {
+            return 'X'; /* alias self reference */
+        }
+        if (al->alias == mc) {
+            return 'M'; /* mutual alias */
+        }
+        if ((al->enabled) &&
+            (int128_lt(int128_add(int128_make64(offset),
+                                  int128_make64(mc->alias_offset)),
+                       al->size))) {
+            marker = mtree_mr_sample_reftype_marker(mr, mv, al,
+                                                    offset + mc->alias_offset,
+                                                    max_chainlen - 1);
+            if (marker && marker != 'E') {
+                return marker;
+            }
+            hint |= marker; /* propagate 'E' dominantly over 0 */
+        }
+    }
+
+    QTAILQ_FOREACH(submr, &mc->subregions, subregions_link) {
+        if (submr == mv) {
+            if (mr->enabled) {
+                if (mc == mr) {
+                    return 's'; /* occluded by subregion */
+                } else {
+                    return 'c'; /* 2nd degree related child */
+                }
+            } else {
+                return '.';
+            }
+        }
+        if ((submr->enabled) && (offset >= submr->addr)) {
+            marker = mtree_mr_sample_reftype_marker(mr, mv, submr,
+                                                    offset - submr->addr,
+                                                    max_chainlen - 1);
+            if (marker && marker != 'E') {
+                return marker;
+            }
+            hint |= marker; /* propagate 'E' dominantly over 0 */
+        }
+    }
+
+    return hint; /* either 0 or 'E' */
+}
+
+/* Depict memory region subrange structure,
+ * i.e. occlusion by submaps respectively visibility of submaps
+ * as supposedly resulting from region priorisation rules.
+ */
+static void mtree_print_mr_v_samples(fprintf_function mon_printf,
+                                     void *f,
+                                     const MemoryRegion *mr,
+                                     const unsigned int columns)
+{
+    const MemoryRegion *mv;
+    unsigned int i, j;
+    hwaddr size, offset;
+    bool covered = false;
+
+    /* prevent uncovered corner cases and excessive sampling effort */
+    if ((columns < 2) || (columns > 1000)) {
+        if (columns) {
+            mon_printf(f, "[%s: not supporting %d column(s)]",
+                       __func__, columns);
+        }
+        return;
+    }
+
+    /* convert/constrain mr->size to hwaddr bit size */
+    size = memory_region_size(mr);
+    if (!size) {
+        /* size is 0 for example with 'dummy' regions in VFIO */
+        mon_printf(f, "%*s", columns, "");
+        return;
+    }
+
+    i = columns;
+    j = size / (i - 1); /* step ('slice') width */
+    if (j < 1) {
+        j = 1;
+    }
+    offset = 0; /* sample position within region */
+    while (i--) {
+        if (offset >= (size - 1)) {
+            /* region might have less bytes than columns were requested */
+            if (covered) {
+                mon_printf(f, " "); /* no more additional samples */
+                continue;
+            }
+            covered = true;
+            offset = size - 1;
+        }
+        rcu_read_lock();
+        mv = memory_region_find_rcu(mr, offset, 1).mr;
+        rcu_read_unlock();
+        if (mv == mr) {
+            if (mr->enabled) {
+                mon_printf(f, "+"); /* mr is directly visible at the sample addr */
+            } else {
+                mon_printf(f, "-");
+            }
+        } else {
+            /* mr is not visible, but mv is visible unless NULL */
+            if (!mv) {
+                mon_printf(f, "/"); /* nothing mapped at the sample addr */
+            } else {
+                /* mr is occluded by mv which supposedly is
+                 * either linked via an alias/subregion construct
+                 * or a higher prioritized subregion of the same parent
+                 */
+                char marker;
+
+                /* current memory region hierarchy depth is close to 5,
+                 * so a maximum search depth of 20 should be sufficient;
+                 * i.e. a link chain length longer than 20 is considered a loop
+                 */
+                marker = mtree_mr_sample_reftype_marker(mr, mv, mr, offset, 20);
+                if (!marker) {
+                    marker = 'o'; /* occlusion by sibling or unrelated region */
+                }
+                mon_printf(f, "%c", marker);
+            }
+        }
+        offset += j;
+    }
+}
+
 static void mtree_print_mr(fprintf_function mon_printf, void *f,
                            const MemoryRegion *mr, unsigned int level,
                            hwaddr base,
-                           MemoryRegionListHead *alias_print_queue)
+                           MemoryRegionListHead *alias_print_queue,
+                           const unsigned int mr_visibility_samples)
 {
     MemoryRegionList *new_ml, *ml, *next_ml;
     MemoryRegionListHead submr_print_queue;
@@ -2338,6 +2500,12 @@ static void mtree_print_mr(fprintf_function mon_printf, void *f,
         return;
     }
 
+    if (mr_visibility_samples) {
+        /* on the very left depict region occlusion/visibility */
+        mtree_print_mr_v_samples(mon_printf, f,
+                                 mr, mr_visibility_samples);
+    }
+
     for (i = 0; i < level; i++) {
         mon_printf(f, " ");
     }
@@ -2415,7 +2583,7 @@ static void mtree_print_mr(fprintf_function mon_printf, void *f,
     QTAILQ_FOREACH(ml, &submr_print_queue, queue) {
         mtree_print_mr(mon_printf, f, ml->mr, level + 1, base + mr->addr,
-                       alias_print_queue);
+                       alias_print_queue, mr_visibility_samples);
     }
 
     QTAILQ_FOREACH_SAFE(ml, &submr_print_queue, queue, next_ml) {
@@ -2423,24 +2591,84 @@ static void mtree_print_mr(fprintf_function mon_printf, void *f,
     }
 }
 
-void mtree_info(fprintf_function mon_printf, void *f)
+static int sane_mtree_info_mapinfo_width(const int requested_width)
+{
+    static int default_width = -1;
+
+    int width = requested_width;
+
+    /* (on first call) establish a default for the number of region samples */
+    if (default_width < 0) {
+        char *str = getenv("QEMU_INFO_MTREE_MAPINFO_WIDTH");
+
+        if (str) {
+            default_width = atoi(str);
+        } else {
+            default_width = 0;
+        }
+        if (default_width < 0) {
+            /* prevent repeating getenv - fallback to sampling over 8 'slices' */
+            default_width = 9;
+        }
+    }
+
+    /* use the default when the requested width is negative */
+    if (width < 0) {
+        width = default_width;
+    }
+
+    /* and finally adapt to 'usability' constraints */
+    if (width < 5) {
+        /* sampling over very few 'slices' creates too much
+         * corner case complexity and is also not much of use
+         */
+        width = 0; /* disable the sample display completely */
+    }
+    if (width > 0x101) {
+        /* limit the output prefix width to some (un-)reasonable value:
+         * max 2^8+1 samples subdividing a region into 2^8 'slices'
+         */
+        width = 0x101;
+    }
+
+    return width;
+}
+
+void mtree_info(fprintf_function mon_printf, void *f, const int mapinfo_width)
 {
     MemoryRegionListHead ml_head;
     MemoryRegionList *ml, *ml2;
     AddressSpace *as;
+    int mr_visibility_samples = sane_mtree_info_mapinfo_width(mapinfo_width);
+
+    if (mr_visibility_samples) {
+        /* print a symbolisation cross reference */
+        mon_printf(f, "\n");
+        mon_printf(f, "/: nothing mapped at sample address\n");
+        mon_printf(f, "+: region directly mapped at sample\n");
+        mon_printf(f, "@: alias region mapped at sample\n");
+        mon_printf(f, "~: alias region mappable but disabled at sample\n");
+        mon_printf(f, "s: region occluded by subregion at sample\n");
+        mon_printf(f, "a: region occluded by an aliased subregion at sample\n");
+        mon_printf(f, "c: region occluded by a child (subregion) of an alias at sample\n");
+        mon_printf(f, "o: region occluded by some other region at sample\n");
+        mon_printf(f, "\n");
+    }
 
     QTAILQ_INIT(&ml_head);
 
     QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
         mon_printf(f, "address-space: %s\n", as->name);
-        mtree_print_mr(mon_printf, f, as->root, 1, 0, &ml_head);
+        mtree_print_mr(mon_printf, f, as->root, 1, 0, &ml_head,
+                       mr_visibility_samples);
         mon_printf(f, "\n");
     }
 
     /* print aliased regions */
     QTAILQ_FOREACH(ml, &ml_head, queue) {
         mon_printf(f, "memory-region: %s\n", memory_region_name(ml->mr));
-        mtree_print_mr(mon_printf, f, ml->mr, 1, 0, &ml_head);
+        mtree_print_mr(mon_printf, f, ml->mr, 1, 0, &ml_head,
+                       mr_visibility_samples);
         mon_printf(f, "\n");
     }
 
diff --git a/monitor.c b/monitor.c
index 5c00373..2f55d12 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1527,7 +1527,9 @@ static void hmp_boot_set(Monitor *mon, const QDict *qdict)
 
 static void hmp_info_mtree(Monitor *mon, const QDict *qdict)
 {
-    mtree_info((fprintf_function)monitor_printf, mon);
+    int mapinfo_width = qdict_get_try_int(qdict, "mapinfo-width", -1);
+
+    mtree_info((fprintf_function)monitor_printf, mon, mapinfo_width);
 }
 
 static void hmp_info_numa(Monitor *mon, const QDict *qdict)
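For reference, a monitor session with the patch applied would begin roughly as follows. The legend is exactly what the new mtree_info() prints; the region line is abridged from the width-65 sample quoted earlier in the thread, and the complete outputs are in the gists linked above:

    (qemu) info mtree 65

    /: nothing mapped at sample address
    +: region directly mapped at sample
    @: alias region mapped at sample
    ~: alias region mappable but disabled at sample
    s: region occluded by subregion at sample
    a: region occluded by an aliased subregion at sample
    c: region occluded by a child (subregion) of an alias at sample
    o: region occluded by some other region at sample

    address-space: memory
      [65 sample characters precede each region line, e.g.]
      @@@@@[...]@@@@@ 00000000000c0000-00000000000c3fff (prio 1, R-): alias pam-rom @pc.ram 00000000000c0000-00000000000c3fff
      ...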