Message ID | 1475316333-9776-4-git-send-email-atar4qemu@gmail.com (mailing list archive) |
---|---|
State | New, archived |
On 10/01/2016 05:05 AM, Artyom Tarasenko wrote:
> #define TTE_VALID_BIT (1ULL << 63)
> #define TTE_NFO_BIT (1ULL << 60)
> +#define TTE_NFO_BIT_UA2005 (1ULL << 62)
> #define TTE_USED_BIT (1ULL << 41)
> +#define TTE_USED_BIT_UA2005 (1ULL << 47)
> #define TTE_LOCKED_BIT (1ULL << 6)
> +#define TTE_LOCKED_BIT_UA2005 (1ULL << 61)
> #define TTE_SIDEEFFECT_BIT (1ULL << 3)
> +#define TTE_SIDEEFFECT_BIT_UA2005 (1ULL << 11)
> #define TTE_PRIV_BIT (1ULL << 2)
> +#define TTE_PRIV_BIT_UA2005 (1ULL << 8)
> #define TTE_W_OK_BIT (1ULL << 1)
> +#define TTE_W_OK_BIT_UA2005 (1ULL << 6)
> #define TTE_GLOBAL_BIT (1ULL << 0)

Hmm. Would it make more sense to reorg these as

TTE_US1_*
TTE_UA2005_*

with some duplication for the bits that are shared?
As is, it's pretty hard to tell which actually change...

r~
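For illustration, one way the suggested reorganization might look. This is only a sketch: the bit positions are the ones from the quoted patch, but the TTE_US1_* and TTE_UA2005_* names, the TteFormat enum and the tte_is_priv() helper are hypothetical, and VALID is assumed to be shared because the patch adds no UA2005 variant for it.

```c
#include <stdbool.h>
#include <stdint.h>

/* UltraSPARC I layout; bit positions as in the quoted patch. */
#define TTE_US1_VALID_BIT         (1ULL << 63)
#define TTE_US1_NFO_BIT           (1ULL << 60)
#define TTE_US1_USED_BIT          (1ULL << 41)
#define TTE_US1_LOCKED_BIT        (1ULL <<  6)
#define TTE_US1_SIDEEFFECT_BIT    (1ULL <<  3)
#define TTE_US1_PRIV_BIT          (1ULL <<  2)
#define TTE_US1_W_OK_BIT          (1ULL <<  1)
#define TTE_US1_GLOBAL_BIT        (1ULL <<  0)

/* UA2005 layout. VALID is duplicated on purpose (assumed shared);
 * the patch defines no UA2005 variant of GLOBAL, so none is given here. */
#define TTE_UA2005_VALID_BIT      (1ULL << 63)
#define TTE_UA2005_NFO_BIT        (1ULL << 62)
#define TTE_UA2005_LOCKED_BIT     (1ULL << 61)
#define TTE_UA2005_USED_BIT       (1ULL << 47)
#define TTE_UA2005_SIDEEFFECT_BIT (1ULL << 11)
#define TTE_UA2005_PRIV_BIT       (1ULL <<  8)
#define TTE_UA2005_W_OK_BIT       (1ULL <<  6)

/* Hypothetical format selector, not part of QEMU. */
typedef enum { TTE_FMT_US1, TTE_FMT_UA2005 } TteFormat;

/* Test the privileged bit according to the given TTE format. */
static inline bool tte_is_priv(uint64_t tte, TteFormat fmt)
{
    uint64_t mask = (fmt == TTE_FMT_UA2005) ? TTE_UA2005_PRIV_BIT
                                            : TTE_US1_PRIV_BIT;
    return (tte & mask) != 0;
}
```

With the two sets side by side it becomes visible that bit 6 means LOCKED in the US1 layout but W_OK in the UA2005 layout, which is the kind of difference that is hard to spot in the interleaved list.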
On 10 Oct 2016 at 23:22, "Richard Henderson" <rth@twiddle.net> wrote:
>
> On 10/01/2016 05:05 AM, Artyom Tarasenko wrote:
>>
>> #define TTE_VALID_BIT (1ULL << 63)
>> #define TTE_NFO_BIT (1ULL << 60)
>> +#define TTE_NFO_BIT_UA2005 (1ULL << 62)
>> #define TTE_USED_BIT (1ULL << 41)
>> +#define TTE_USED_BIT_UA2005 (1ULL << 47)
>> #define TTE_LOCKED_BIT (1ULL << 6)
>> +#define TTE_LOCKED_BIT_UA2005 (1ULL << 61)
>> #define TTE_SIDEEFFECT_BIT (1ULL << 3)
>> +#define TTE_SIDEEFFECT_BIT_UA2005 (1ULL << 11)
>> #define TTE_PRIV_BIT (1ULL << 2)
>> +#define TTE_PRIV_BIT_UA2005 (1ULL << 8)
>> #define TTE_W_OK_BIT (1ULL << 1)
>> +#define TTE_W_OK_BIT_UA2005 (1ULL << 6)
>> #define TTE_GLOBAL_BIT (1ULL << 0)
>
> Hmm. Would it make more sense to reorg these as
>
> TTE_US1_*
> TTE_UA2005_*
>
> with some duplication for the bits that are shared?
> As is, it's pretty hard to tell which actually change...

All of them :-)
I'm not sure about renaming: the US1 format is still used in T1 on the read access.

On the other hand, it's not used in T2. And then again we don't have the T2 emulation yet.

Artyom
On 10/10/2016 04:45 PM, Artyom Tarasenko wrote:
>> Hmm. Would it make more sense to reorg these as
>>
>> TTE_US1_*
>> TTE_UA2005_*
>>
>> with some duplication for the bits that are shared?
>> As is, it's pretty hard to tell which actually change...
>
> All of them :-)
> I'm not sure about renaming: the US1 format is still used in T1 on the read
> access.
>
> On the other hand, it's not used in T2. And then again we don't have the T2
> emulation yet.

Oh my. Different on T2 as well?

I wonder if it would make sense to have different functions with which to fill in the CPUClass hooks (or invent new SPARCCPUClass hooks as necessary) for the major entry points.

E.g. sparc_cpu_handle_mmu_fault or get_physical_address could be hooked, so that the choice of how to handle the tlb miss is chosen at startup time, and not during each fault. One can arrange subroutines as necessary to share code between the alternate routines, such as when T1 needs to use parts of US1.

Similarly for out-of-line ASI handling, which is already beyond messy, with handling for all cpus thrown in the same switch statement.

r~
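The hook idea can be sketched outside of QEMU roughly as follows. This is an illustration under stated assumptions, not QEMU's CPUClass API: SparcMmuOps, MiniCPU, MiniModel, mini_cpu_init() and mini_write_allowed() are hypothetical names, and only the bit positions come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-model MMU operations, selected once at CPU creation. */
typedef struct SparcMmuOps {
    bool (*tte_is_priv)(uint64_t tte);
    bool (*tte_is_w_ok)(uint64_t tte);
} SparcMmuOps;

/* US1 layout: PRIV is bit 2, W_OK is bit 1 (from the patch). */
static bool us1_is_priv(uint64_t tte)    { return (tte & (1ULL << 2)) != 0; }
static bool us1_is_w_ok(uint64_t tte)    { return (tte & (1ULL << 1)) != 0; }

/* UA2005 layout: PRIV is bit 8, W_OK is bit 6 (from the patch). */
static bool ua2005_is_priv(uint64_t tte) { return (tte & (1ULL << 8)) != 0; }
static bool ua2005_is_w_ok(uint64_t tte) { return (tte & (1ULL << 6)) != 0; }

static const SparcMmuOps us1_ops    = { us1_is_priv, us1_is_w_ok };
static const SparcMmuOps ua2005_ops = { ua2005_is_priv, ua2005_is_w_ok };

typedef enum { MODEL_US1, MODEL_UA2005 } MiniModel;  /* hypothetical */

typedef struct MiniCPU {
    const SparcMmuOps *mmu;  /* chosen at startup, never re-checked */
} MiniCPU;

/* Startup: pick the TTE format once, based on the CPU model. */
static void mini_cpu_init(MiniCPU *cpu, MiniModel model)
{
    cpu->mmu = (model == MODEL_UA2005) ? &ua2005_ops : &us1_ops;
}

/* Fault path: no per-fault model test, just an indirect call. */
static bool mini_write_allowed(const MiniCPU *cpu, uint64_t tte,
                               bool user_mode)
{
    if (user_mode && cpu->mmu->tte_is_priv(tte)) {
        return false;
    }
    return cpu->mmu->tte_is_w_ok(tte);
}
```

Shared pieces, such as T1 reusing parts of the US1 handling, would then be ordinary helper functions called from more than one set of operations.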
On Tue, Oct 11, 2016 at 7:50 AM, Richard Henderson <rth@twiddle.net> wrote:
> On 10/10/2016 04:45 PM, Artyom Tarasenko wrote:
>>> Hmm. Would it make more sense to reorg these as
>>>
>>> TTE_US1_*
>>> TTE_UA2005_*
>>>
>>> with some duplication for the bits that are shared?
>>> As is, it's pretty hard to tell which actually change...
>>
>> All of them :-)
>> I'm not sure about renaming: the US1 format is still used in T1 on the
>> read access.
>>
>> On the other hand, it's not used in T2. And then again we don't have the
>> T2 emulation yet.
>
> Oh my. Different on T2 as well?

T2 has more used bits, and can not use the US1 format, I think.

> I wonder if it would make sense to have different functions with which to
> fill in the CPUClass hooks (or invent new SPARCCPUClass hooks as necessary)
> for the major entry points.
>
> E.g. sparc_cpu_handle_mmu_fault or get_physical_address could be hooked, so
> that the choice of how to handle the tlb miss is chosen at startup time, and
> not during each fault. One can arrange subroutines as necessary to share
> code between the alternate routines, such as when T1 needs to use parts of
> US1.

Yes, I plan to do it once I get to T2 emulation.

> Similarly for out-of-line ASI handling, which is already beyond messy, with
> handling for all cpus thrown in the same switch statement.

Yes. I think we need to split SPARCv9 standard ASIs from CPU-specific ones, call cpu-specific handlers first and standard handler afterwards.
But not in this series.

Artyom
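The proposed ordering (CPU-specific ASI handler first, standard handler afterwards) could be sketched like this. All function and type names are hypothetical, the returned values are dummies, and EXAMPLE_ASI_IMPL is an arbitrary number standing in for some implementation-specific ASI; only 0x80 is the standard SPARC V9 ASI_PRIMARY.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_ASI_STANDARD 0x80  /* ASI_PRIMARY in SPARC V9 */
#define EXAMPLE_ASI_IMPL     0x45  /* arbitrary stand-in for a CPU-specific ASI */

/* A handler returns true if it recognized the ASI and stored a result. */
typedef bool (*AsiLoadFn)(int asi, uint64_t addr, uint64_t *val);

/* Hypothetical handler for one CPU model's private ASIs. */
static bool t1_ld_asi(int asi, uint64_t addr, uint64_t *val)
{
    (void)addr;
    if (asi == EXAMPLE_ASI_IMPL) {
        *val = 0x1111;  /* dummy register contents */
        return true;
    }
    return false;
}

/* Hypothetical handler for the ASIs every SPARC V9 CPU shares. */
static bool standard_ld_asi(int asi, uint64_t addr, uint64_t *val)
{
    if (asi == EXAMPLE_ASI_STANDARD) {
        *val = addr;    /* stand-in for an ordinary memory load */
        return true;
    }
    return false;
}

/* Dispatch: try the CPU-specific handler first, then the standard one.
 * A false result means no handler claimed the ASI. */
static bool ld_asi(AsiLoadFn cpu_specific, int asi, uint64_t addr,
                   uint64_t *val)
{
    if (cpu_specific != NULL && cpu_specific(asi, addr, val)) {
        return true;
    }
    return standard_ld_asi(asi, addr, val);
}
```

Calling the CPU-specific handler first lets a model override a standard ASI if it has to, while the large shared switch statement shrinks to the standard cases only.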
On 10/11/2016 08:51 AM, Artyom Tarasenko wrote:
> On Tue, Oct 11, 2016 at 7:50 AM, Richard Henderson <rth@twiddle.net> wrote:
>> On 10/10/2016 04:45 PM, Artyom Tarasenko wrote:
>>>> Hmm. Would it make more sense to reorg these as
>>>>
>>>> TTE_US1_*
>>>> TTE_UA2005_*
>>>>
>>>> with some duplication for the bits that are shared?
>>>> As is, it's pretty hard to tell which actually change...
>>>
>>> All of them :-)
>>> I'm not sure about renaming: the US1 format is still used in T1 on the
>>> read access.
>>>
>>> On the other hand, it's not used in T2. And then again we don't have the
>>> T2 emulation yet.
>>
>> Oh my. Different on T2 as well?
>
> T2 has more used bits, and can not use the US1 format, I think.
>
>> I wonder if it would make sense to have different functions with which to
>> fill in the CPUClass hooks (or invent new SPARCCPUClass hooks as necessary)
>> for the major entry points.
>>
>> E.g. sparc_cpu_handle_mmu_fault or get_physical_address could be hooked, so
>> that the choice of how to handle the tlb miss is chosen at startup time, and
>> not during each fault. One can arrange subroutines as necessary to share
>> code between the alternate routines, such as when T1 needs to use parts of
>> US1.
>
> Yes, I plan to do it once I get to T2 emulation.

Ok.

>> Similarly for out-of-line ASI handling, which is already beyond messy, with
>> handling for all cpus thrown in the same switch statement.
>
> Yes. I think we need to split SPARCv9 standard ASIs from CPU-specific
> ones, call cpu-specific handlers first and standard handler
> afterwards.
> But not in this series.

Fair enough.

What I would most like to see, for QEMU, is an artificial sun4v compatible machine that implements a "hardware" page table walk. I.e. no use of SparcTLBEntry, but walking the page tables directly.

Because QEMU can then satisfy a page lookup internally, without having to longjmp out of a memory reference in progress in order to restart the cpu for the software TLB miss handler, the emulation runs about 30-50% faster. At least that has been my experience emulating Alpha vs MIPS.

It would require custom roms, but those should be fairly easy to modify from the existing source.

r~
On Tue, Oct 11, 2016 at 5:08 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 10/11/2016 08:51 AM, Artyom Tarasenko wrote:
>> On Tue, Oct 11, 2016 at 7:50 AM, Richard Henderson <rth@twiddle.net>
>> wrote:
>>> On 10/10/2016 04:45 PM, Artyom Tarasenko wrote:
>>>>> Hmm. Would it make more sense to reorg these as
>>>>>
>>>>> TTE_US1_*
>>>>> TTE_UA2005_*
>>>>>
>>>>> with some duplication for the bits that are shared?
>>>>> As is, it's pretty hard to tell which actually change...
>>>>
>>>> All of them :-)
>>>> I'm not sure about renaming: the US1 format is still used in T1 on the
>>>> read access.
>>>>
>>>> On the other hand, it's not used in T2. And then again we don't have the
>>>> T2 emulation yet.
>>>
>>> Oh my. Different on T2 as well?
>>
>> T2 has more used bits, and can not use the US1 format, I think.
>>
>>> I wonder if it would make sense to have different functions with which to
>>> fill in the CPUClass hooks (or invent new SPARCCPUClass hooks as
>>> necessary) for the major entry points.
>>>
>>> E.g. sparc_cpu_handle_mmu_fault or get_physical_address could be hooked,
>>> so that the choice of how to handle the tlb miss is chosen at startup time,
>>> and not during each fault. One can arrange subroutines as necessary to share
>>> code between the alternate routines, such as when T1 needs to use parts
>>> of US1.
>>
>> Yes, I plan to do it once I get to T2 emulation.
>
> Ok.
>
>>> Similarly for out-of-line ASI handling, which is already beyond messy,
>>> with handling for all cpus thrown in the same switch statement.
>>
>> Yes. I think we need to split SPARCv9 standard ASIs from CPU-specific
>> ones, call cpu-specific handlers first and standard handler
>> afterwards.
>> But not in this series.
>
> Fair enough.
>
> What I would most like to see, for QEMU, is an artificial sun4v compatible
> machine that implements a "hardware" page table walk. I.e. no use of
> SparcTLBEntry, but walking the page tables directly.
>
> Because QEMU can then satisfy a page lookup internally, without having to
> longjmp out of a memory reference in progress in order to restart the cpu
> for the software TLB miss handler, the emulation runs about 30-50% faster.
> At least that has been my experience emulating Alpha vs MIPS.
>
> It would require custom roms, but those should be fairly easy to modify from
> the existing source.
>

Maybe it's even possible without the modifications. For instance, implement the table walk compatible with the current hypervisor, and then just add possibility to overlay hypervisor call using some CPU feature flag.
On 10/12/2016 06:18 AM, Artyom Tarasenko wrote:
>> What I would most like to see, for QEMU, is an artificial sun4v compatible
>> machine that implements a "hardware" page table walk. I.e. no use of
>> SparcTLBEntry, but walking the page tables directly.
>>
>> Because QEMU can then satisfy a page lookup internally, without having to
>> longjmp out of a memory reference in progress in order to restart the cpu
>> for the software TLB miss handler, the emulation runs about 30-50% faster.
>> At least that has been my experience emulating Alpha vs MIPS.
>>
>> It would require custom roms, but those should be fairly easy to modify from
>> the existing source.
>>
>
> Maybe it's even possible without the modifications. For instance,
> implement the table walk compatible with the current hypervisor, and
> then just add possibility to overlay hypervisor call using some CPU
> feature flag.

Maybe so. What we lack is being given direct access to the page table base.

But we know that the CPU structure is in the hypervisor shadow register 0, and that offset CPU_ROOT is the page table base. As long as we're willing to hard-code these two facts concerning any rom we care to load, we could in fact implement the tlb miss success path inside QEMU. We would let the rom re-do the work for the tlb miss failure path, on the way to raising the exception with the supervisor.

r~
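A toy model of that success path, to make the two hard-coded facts concrete. Everything here is an assumption made for illustration: ToyMachine, toy_read64() and toy_walk() are hypothetical, EXAMPLE_CPU_ROOT is a placeholder (the real offset would come from the rom source), and the table is invented as a flat array of 64-bit TTEs indexed by virtual page number, which is not necessarily how the hypervisor lays it out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TTE_VALID_BIT       (1ULL << 63)
#define EXAMPLE_CPU_ROOT    0x10  /* placeholder offset of the table base */
#define EXAMPLE_PAGE_SHIFT  13    /* 8 KiB pages */

typedef struct ToyMachine {
    uint8_t  ram[1 << 16];  /* toy guest memory */
    uint64_t hscratch0;     /* stand-in for hypervisor shadow register 0 */
} ToyMachine;

/* Bounds-checked 64-bit read. Host byte order is used for simplicity;
 * real SPARC memory is big-endian. */
static bool toy_read64(const ToyMachine *m, uint64_t pa, uint64_t *out)
{
    if (pa > sizeof(m->ram) - sizeof(*out)) {
        return false;
    }
    memcpy(out, &m->ram[pa], sizeof(*out));
    return true;
}

/* Success path only: return true and the TTE on a hit. On any failure
 * return false, so the caller can fall back to the rom's own miss
 * handler, which redoes the work and raises the exception. */
static bool toy_walk(const ToyMachine *m, uint64_t va, uint64_t *tte_out)
{
    uint64_t root, tte;

    /* Fact 1: the CPU structure pointer lives in shadow register 0.
     * Fact 2: the page table base sits at offset CPU_ROOT within it. */
    if (!toy_read64(m, m->hscratch0 + EXAMPLE_CPU_ROOT, &root)) {
        return false;
    }
    if (!toy_read64(m, root + (va >> EXAMPLE_PAGE_SHIFT) * sizeof(tte),
                    &tte)) {
        return false;
    }
    if (!(tte & TTE_VALID_BIT)) {
        return false;
    }
    *tte_out = tte;
    return true;
}
```

Because the walker never raises anything itself, a wrong guess about the rom costs only speed: the miss is simply handled the slow way by the guest code.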
diff --git a/target-sparc/cpu.h b/target-sparc/cpu.h
index 238ebf2..2c169e1 100644
--- a/target-sparc/cpu.h
+++ b/target-sparc/cpu.h
@@ -290,11 +290,17 @@ enum {
 
 #define TTE_VALID_BIT (1ULL << 63)
 #define TTE_NFO_BIT (1ULL << 60)
+#define TTE_NFO_BIT_UA2005 (1ULL << 62)
 #define TTE_USED_BIT (1ULL << 41)
+#define TTE_USED_BIT_UA2005 (1ULL << 47)
 #define TTE_LOCKED_BIT (1ULL << 6)
+#define TTE_LOCKED_BIT_UA2005 (1ULL << 61)
 #define TTE_SIDEEFFECT_BIT (1ULL << 3)
+#define TTE_SIDEEFFECT_BIT_UA2005 (1ULL << 11)
 #define TTE_PRIV_BIT (1ULL << 2)
+#define TTE_PRIV_BIT_UA2005 (1ULL << 8)
 #define TTE_W_OK_BIT (1ULL << 1)
+#define TTE_W_OK_BIT_UA2005 (1ULL << 6)
 #define TTE_GLOBAL_BIT (1ULL << 0)
 
 #define TTE_IS_VALID(tte) ((tte) & TTE_VALID_BIT)
@@ -302,14 +308,24 @@ enum {
 #define TTE_IS_USED(tte) ((tte) & TTE_USED_BIT)
 #define TTE_IS_LOCKED(tte) ((tte) & TTE_LOCKED_BIT)
 #define TTE_IS_SIDEEFFECT(tte) ((tte) & TTE_SIDEEFFECT_BIT)
+#define TTE_IS_SIDEEFFECT_UA2005(tte) ((tte) & TTE_SIDEEFFECT_BIT_UA2005)
 #define TTE_IS_PRIV(tte) ((tte) & TTE_PRIV_BIT)
 #define TTE_IS_W_OK(tte) ((tte) & TTE_W_OK_BIT)
+
+#define TTE_IS_NFO_UA2005(tte) ((tte) & TTE_NFO_BIT_UA2005)
+#define TTE_IS_USED_UA2005(tte) ((tte) & TTE_USED_BIT_UA2005)
+#define TTE_IS_LOCKED_UA2005(tte) ((tte) & TTE_LOCKED_BIT_UA2005)
+#define TTE_IS_SIDEEFFECT_UA2005(tte) ((tte) & TTE_SIDEEFFECT_BIT_UA2005)
+#define TTE_IS_PRIV_UA2005(tte) ((tte) & TTE_PRIV_BIT_UA2005)
+#define TTE_IS_W_OK_UA2005(tte) ((tte) & TTE_W_OK_BIT_UA2005)
+
 #define TTE_IS_GLOBAL(tte) ((tte) & TTE_GLOBAL_BIT)
 
 #define TTE_SET_USED(tte) ((tte) |= TTE_USED_BIT)
 #define TTE_SET_UNUSED(tte) ((tte) &= ~TTE_USED_BIT)
 
 #define TTE_PGSIZE(tte) (((tte) >> 61) & 3ULL)
+#define TTE_PGSIZE_UA2005(tte) ((tte) & 7ULL)
 #define TTE_PA(tte) ((tte) & 0x1ffffffe000ULL)
 
 #define SFSR_NF_BIT (1ULL << 24) /* JPS1 NoFault */
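As a small worked example of the two page-size macros from the patch: the macros below are copied from the diff, while tte_page_bytes() is a hypothetical helper that assumes the usual encoding in which size code 0 means 8 KiB and each further step multiplies the page size by 8.

```c
#include <stdint.h>

/* Copied from the patch. */
#define TTE_LOCKED_BIT_UA2005   (1ULL << 61)
#define TTE_PGSIZE(tte)         (((tte) >> 61) & 3ULL)
#define TTE_PGSIZE_UA2005(tte)  ((tte) & 7ULL)

/* Hypothetical helper: convert a size code to bytes, assuming
 * code 0 = 8 KiB and a factor of 8 per step (64 KiB, 512 KiB, 4 MiB, ...). */
static inline uint64_t tte_page_bytes(uint64_t size_code)
{
    return 1ULL << (13 + 3 * size_code);
}
```

The same 64-bit value decodes differently in the two formats. A TTE with only bit 61 set has US1 size code 1 (64 KiB), whereas in the UA2005 layout bit 61 is the LOCKED bit and the size code, taken from the low three bits, is 0 (8 KiB).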
Signed-off-by: Artyom Tarasenko <atar4qemu@gmail.com>
---
 target-sparc/cpu.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)