From patchwork Thu Jul 25 09:51:14 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tony Nguyen X-Patchwork-Id: 11058367 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F0AFE1398 for ; Thu, 25 Jul 2019 09:51:30 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DCFDF28989 for ; Thu, 25 Jul 2019 09:51:30 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id CCAAD289A0; Thu, 25 Jul 2019 09:51:30 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id BB06728989 for ; Thu, 25 Jul 2019 09:51:29 +0000 (UTC) Received: from localhost ([::1]:57750 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.86_2) (envelope-from ) id 1hqaPJ-0003GP-2c for patchwork-qemu-devel@patchwork.kernel.org; Thu, 25 Jul 2019 05:51:29 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:43838) by lists.gnu.org with esmtp (Exim 4.86_2) (envelope-from ) id 1hqaP9-0002qN-A2 for qemu-devel@nongnu.org; Thu, 25 Jul 2019 05:51:21 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1hqaP7-0007sW-Em for qemu-devel@nongnu.org; Thu, 25 Jul 2019 05:51:19 -0400 Received: from smtpe1.intersmtp.com ([213.121.35.80]:18602) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1hqaP7-0007ro-4x; Thu, 25 Jul 2019 05:51:17 -0400 Received: from tpw09926dag18h.domain1.systemhost.net (10.9.212.42) by BWP09926085.bt.com (10.36.82.116) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P256) id 15.1.1713.5; Thu, 25 Jul 2019 10:51:12 +0100 Received: from tpw09926dag18e.domain1.systemhost.net (10.9.212.18) by tpw09926dag18h.domain1.systemhost.net (10.9.212.42) with Microsoft SMTP Server (TLS) id 15.0.1395.4; Thu, 25 Jul 2019 10:51:14 +0100 Received: from tpw09926dag18e.domain1.systemhost.net ([fe80::a946:6348:ccf4:fa6c]) by tpw09926dag18e.domain1.systemhost.net ([fe80::a946:6348:ccf4:fa6c%12]) with mapi id 15.00.1395.000; Thu, 25 Jul 2019 10:51:14 +0100 From: To: Thread-Topic: [Qemu-devel] [PATCH v4 00/15] Invert Endian bit in SPARCv9 MMU TTE Thread-Index: AQHVQs5/0GZI33VvdEK7f5C9/L2VTg== Date: Thu, 25 Jul 2019 09:51:14 +0000 Message-ID: <45d1ebe4b2ed4c039c9da20a738652df@tpw09926dag18e.domain1.systemhost.net> Accept-Language: en-AU, en-GB, en-US Content-Language: en-AU X-MS-Has-Attach: X-MS-TNEF-Correlator: x-ms-exchange-transport-fromentityheader: Hosted x-originating-ip: [10.187.101.42] MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Windows 7 or 8 [fuzzy] X-Received-From: 213.121.35.80 Subject: [Qemu-devel] [PATCH v4 00/15] Invert Endian bit in SPARCv9 MMU TTE X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, walling@linux.ibm.com, sagark@eecs.berkeley.edu, david@redhat.com, palmer@sifive.com, mark.cave-ayland@ilande.co.uk, Alistair.Francis@wdc.com, edgar.iglesias@gmail.com, arikalo@wavecomp.com, pasic@linux.ibm.com, borntraeger@de.ibm.com, rth@twiddle.net, atar4qemu@gmail.com, ehabkost@redhat.com, alex.williamson@redhat.com, qemu-arm@nongnu.org, stefanha@redhat.com, shorne@gmail.com, david@gibson.dropbear.id.au, qemu-riscv@nongnu.org, qemu-s390x@nongnu.org, kbastian@mail.uni-paderborn.de, cohuck@redhat.com, laurent@vivier.eu, qemu-ppc@nongnu.org, amarkovic@wavecomp.com, pbonzini@redhat.com, aurelien@aurel32.net Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" X-Virus-Scanned: ClamAV using ClamSMTP This patchset implements the IE (Invert Endian) bit in SPARCv9 MMU TTE. It is an attempt of the instructions outlined by Richard Henderson to Mark Cave-Ayland. Tested with OpenBSD on sun4u. Solaris 10 is my actual goal, but unfortunately a separate keyboard issue remains in the way. On 01/11/17 19:15, Mark Cave-Ayland wrote: >On 15/08/17 19:10, Richard Henderson wrote: > >> [CC Peter re MemTxAttrs below] >> >> On 08/15/2017 09:38 AM, Mark Cave-Ayland wrote: >>> Working through an incorrect endian issue on qemu-system-sparc64, it has >>> become apparent that at least one OS makes use of the IE (Invert Endian) >>> bit in the SPARCv9 MMU TTE to map PCI memory space without the >>> programmer having to manually endian-swap accesses. >>> >>> In other words, to quote the UltraSPARC specification: "if this bit is >>> set, accesses to the associated page are processed with inverse >>> endianness from what is specified by the instruction (big-for-little and >>> little-for-big)". A good explanation by Mark why the IE bit is required. >>> >>> Looking through various bits of code, I'm trying to get a feel for the >>> best way to implement this in an efficient manner. From what I can see >>> this could be solved using an additional MMU index, however I'm not >>> overly familiar with the memory and softmmu subsystems. >> >> No, it can't be solved with an MMU index. >> >>> Can anyone point me in the right direction as to what would be the best >>> way to implement this feature within QEMU? >> >> It's definitely tricky. >> >> We definitely need some TLB_FLAGS_MASK bit set so that we're forced through >> the >> memory slow path. There is no other way to bypass the endianness that we've >> already encoded from the target instruction. >> >> Given the tlb_set_page_with_attrs interface, I would think that we need a new >> bit in MemTxAttrs, so that the target/sparc tlb_fill (and subroutines) can >> pass >> along the TTE bit for the given page. >> >> We have an existing problem in softmmu_template.h, >> >> /* ??? Note that the io helpers always read data in the target >> byte ordering. We should push the LE/BE request down into io. */ >> res = glue(io_read, SUFFIX)(env, mmu_idx, index, addr, retaddr); >> res = TGT_BE(res); >> >> We do not want to add a third(!) byte swap along the i/o path. We need to >> collapse the two that we have already before considering this one. >> >> This probably takes the form of: >> >> (1) Replacing the "int size" argument with "TCGMemOp memop" for >> a) io_{read,write}x in accel/tcg/cputlb.c, >> b) memory_region_dispatch_{read,write} in memory.c, >> c) adjust_endianness in memory.c. >> This carries size+sign+endianness down to the next level. >> >> (2) In memory.c, adjust_endianness, >> >> if (memory_region_wrong_endianness(mr)) { >> - switch (size) { >> + memop ^= MO_BSWAP; >> + } >> + if (memop & MO_BSWAP) { >> >> For extra credit, re-arrange memory_region_wrong_endianness >> to something more explicit -- "wrong" isn't helpful. > >Finally I've had a bit of spare time to experiment with this approach, >and from what I can see there are currently 2 issues: > > >1) Using TCGMemOp in memory.c means it is no longer accelerator agnostic > >For the moment I've defined a separate MemOp in memory.h and provided a >mapping function in io_{read,write}x to map from TCGMemOp to MemOp and >then pass that into memory_region_dispatch_{read,write}. > >Other than not referencing TCGMemOp in the memory API, another reason >for doing this was that I wasn't convinced that all the MO_ attributes >were valid outside of TCG. I do, of course, strongly defer to other >people's knowledge in this area though. > > >2) The above changes to adjust_endianness() fail when >memory_region_dispatch_{read,write} are called recursively > >Whilst booting qemu-system-sparc64 I see that >memory_region_dispatch_{read,write} get called recursively - once via >io_{read,write}x and then again via flatview_read_continue() in exec.c. > >The net effect of this is that we perform the bswap correctly at the >tail of the recursion, but then as we travel back up the stack we hit >memory_region_dispatch_{read,write} once again causing a second bswap >which means the value is returned with the incorrect endian again. > > >My understanding from your softmmu_template.h comment above is that the >memory API should do the endian swapping internally allowing the removal >of the final TGT_BE/TGT_LE applied to the result, or did I get this wrong? > >> (3) In tlb_set_page_with_attrs, notice attrs.byte_swap and set >> a new TLB_FORCE_SLOW bit within TLB_FLAGS_MASK. >> >> (4) In io_{read,write}x, if iotlbentry->attrs.byte_swap is set, >> then memop ^= MO_BSWAP. Thanks all for the v1 and v2 feedback. v2: - Moved size+sign+endianness attributes from TCGMemOp into MemOp. In v1 TCGMemOp was re-purposed entirely into MemOp. - Replaced MemOp MO_{8|16|32|64} with TCGMemOp MO_{UB|UW|UL|UQ} alias. This is to avoid warnings on comparing and coercing different enums. - Renamed get_memop to get_tcgmemop for clarity. - MEMOP is now SIZE_MEMOP, which is just ctzl(size). - Split patch 3/4 so one memory_region_dispatch_{read|write} interface is converted per patch. - Do not reuse TLB_RECHECK, use new TLB_FORCE_SLOW instead. - Split patch 4/4 so adding the MemTxAddrs parameters and converting tlb_set_page() to tlb_set_page_with_attrs() is separate from usage. - CC'd maintainers. v3: - Like v1, the entire TCGMemOp enum is now MemOp. - MemOp target dependant attributes are conditional upon NEED_CPU_H v4: - Added Paolo Bonzini as include/exec/memop.h maintainer Tony Nguyen (15): tcg: TCGMemOp is now accelerator independent MemOp memory: Access MemoryRegion with MemOp target/mips: Access MemoryRegion with MemOp hw/s390x: Access MemoryRegion with MemOp hw/intc/armv7m_nic: Access MemoryRegion with MemOp hw/virtio: Access MemoryRegion with MemOp hw/vfio: Access MemoryRegion with MemOp exec: Access MemoryRegion with MemOp cputlb: Access MemoryRegion with MemOp memory: Access MemoryRegion with MemOp semantics memory: Single byte swap along the I/O path cpu: TLB_FLAGS_MASK bit to force memory slow path cputlb: Byte swap memory transaction attribute target/sparc: Add TLB entry with attributes target/sparc: sun4u Invert Endian TTE bit MAINTAINERS | 1 + accel/tcg/cputlb.c | 71 +++++++++-------- exec.c | 6 +- hw/intc/armv7m_nvic.c | 12 ++- hw/s390x/s390-pci-inst.c | 8 +- hw/vfio/pci-quirks.c | 5 +- hw/virtio/virtio-pci.c | 7 +- include/exec/cpu-all.h | 10 ++- include/exec/memattrs.h | 2 + include/exec/memop.h | 112 +++++++++++++++++++++++++++ include/exec/memory.h | 9 ++- memory.c | 37 +++++---- memory_ldst.inc.c | 18 ++--- target/alpha/translate.c | 2 +- target/arm/translate-a64.c | 48 ++++++------ target/arm/translate-a64.h | 2 +- target/arm/translate-sve.c | 2 +- target/arm/translate.c | 32 ++++---- target/arm/translate.h | 2 +- target/hppa/translate.c | 14 ++-- target/i386/translate.c | 132 ++++++++++++++++---------------- target/m68k/translate.c | 2 +- target/microblaze/translate.c | 4 +- target/mips/op_helper.c | 5 +- target/mips/translate.c | 8 +- target/openrisc/translate.c | 4 +- target/ppc/translate.c | 12 +-- target/riscv/insn_trans/trans_rva.inc.c | 8 +- target/riscv/insn_trans/trans_rvi.inc.c | 4 +- target/s390x/translate.c | 6 +- target/s390x/translate_vx.inc.c | 10 +-- target/sparc/cpu.h | 2 + target/sparc/mmu_helper.c | 40 ++++++---- target/sparc/translate.c | 14 ++-- target/tilegx/translate.c | 10 +-- target/tricore/translate.c | 8 +- tcg/README | 2 +- tcg/aarch64/tcg-target.inc.c | 26 +++---- tcg/arm/tcg-target.inc.c | 26 +++---- tcg/i386/tcg-target.inc.c | 24 +++--- tcg/mips/tcg-target.inc.c | 16 ++-- tcg/optimize.c | 2 +- tcg/ppc/tcg-target.inc.c | 12 +-- tcg/riscv/tcg-target.inc.c | 20 ++--- tcg/s390/tcg-target.inc.c | 14 ++-- tcg/sparc/tcg-target.inc.c | 6 +- tcg/tcg-op.c | 38 ++++----- tcg/tcg-op.h | 86 ++++++++++----------- tcg/tcg.c | 2 +- tcg/tcg.h | 99 ++---------------------- trace/mem-internal.h | 4 +- trace/mem.h | 4 +- 52 files changed, 562 insertions(+), 488 deletions(-) create mode 100644 include/exec/memop.h