Message ID | 20210114150902.11515-1-bmeng.cn@gmail.com (mailing list archive) |
---|---|
Series | hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands |
On 1/14/21 4:08 PM, Bin Meng wrote: > From: Bin Meng <bin.meng@windriver.com> > > The m25p80 model uses s->needed_bytes to indicate how many follow-up > bytes are expected to be received after it receives a command. For > example, depending on the address mode, either 3-byte address or > 4-byte address is needed. > > For fast read family commands, some dummy cycles are required after > sending the address bytes, and the dummy cycles need to be counted > in s->needed_bytes. This is where the mess began. > > As the variable name (needed_bytes) indicates, the unit is in byte. > It is not in bit, or cycle. However for some reason the model has > been using the number of dummy cycles for s->needed_bytes. The right > approach is to convert the number of dummy cycles to bytes based on > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8). > > Things get complicated when interacting with different SPI or QSPI > flash controllers. There are two major cases: > > - Dummy bytes prepared by drivers, and written to the controller fifo. > For such case, driver will calculate the correct number of dummy > bytes and write them into the tx fifo. Fixing the m25p80 model will > fix flashes working with such controllers. > - Dummy bytes not prepared by drivers. Drivers just tell the hardware > the dummy cycle configuration via some registers, and hardware will > automatically generate dummy cycles for us. Fixing the m25p80 model > is not enough, and we will need to fix the SPI/QSPI models for such > controllers. > > This series fixes the mess in the m25p80 from the flash side first, > followed by fixes to 3 known SPI controller models that fall into > the 2nd case above. 
> > Please note, I have no way to verify patch 7/8/9 because: > > * There is no public datasheet available for the SoC / SPI controller > * There are no QEMU docs, or details that tell people how to boot either > U-Boot or Linux kernel to verify the functionality

The Linux drivers are available in mainline, but these branches are more up to date since not everything is merged:

Linux: https://github.com/openbmc/linux
U-Boot: https://github.com/openbmc/u-boot/tree/v2016.07-aspeed-openbmc (ast2400/ast2500)
        https://github.com/openbmc/u-boot/tree/v2019.04-aspeed-openbmc (ast2600)

A quick intro: https://www.qemu.org/docs/master/system/arm/aspeed.html

> > These 3 patches are very likely to be wrong. Hence I would like to ask > help from the original author who wrote these SPI controller models > to help testing, or completely rewrite these 3 patches to fix things. > Thanks!

A quick test shows that all Aspeed machines are broken with this patchset. Please try these command lines:

  wget https://openpower.xyz/job/openbmc-build/lastSuccessfulBuild/distro=ubuntu,label=builder,target=palmetto/artifact/deploy/images/palmetto/flash-palmetto
  wget https://openpower.xyz/job/openbmc-build/lastSuccessfulBuild/distro=ubuntu,label=builder,target=romulus/artifact/deploy/images/romulus/flash-romulus
  wget https://openpower.xyz/job/openbmc-build/lastSuccessfulBuild/distro=ubuntu,label=builder,target=witherspoon/artifact/deploy/images/witherspoon/obmc-phosphor-image-witherspoon.ubi.mtd

  qemu-system-arm -M witherspoon-bmc -nic user -drive file=obmc-phosphor-image-witherspoon.ubi.mtd,format=raw,if=mtd -nographic
  qemu-system-arm -M romulus-bmc -nic user -drive file=flash-romulus,format=raw,if=mtd -nographic
  qemu-system-arm -M palmetto-bmc -nic user -drive file=flash-palmetto,format=raw,if=mtd -nographic

The Aspeed SMC model has traces to help you with the task.

Thanks,

C.

> Patch 6 is unvalidated with QEMU, mainly because there is no doc to > tell people how to boot anything to test.
But I have some confidence > based on my read of the ZynqMP manual, as well as some experimental > testing on a real ZCU102 board. > > Other flash patches can be tested with the SiFive SPI series: > http://patchwork.ozlabs.org/project/qemu-devel/list/?series=222391 > > Cherry-pick patches 16 and 17 from the series above, and switch to > a different flash model to test with the following command: > > $ qemu-system-riscv64 -nographic -M sifive_u -m 2G -smp 5 -kernel u-boot > > I've picked up two for testing: > > QEMU flash: "sst25vf032b" > > U-Boot 2020.10 (Jan 14 2021 - 21:55:59 +0800) > > CPU: rv64imafdcsu > Model: SiFive HiFive Unleashed A00 > DRAM: 2 GiB > MMC: > Loading Environment from SPIFlash... SF: Detected sst25vf032b with page size 256 Bytes, erase size 4 KiB, total 4 MiB > *** Warning - bad CRC, using default environment > > In: serial@10010000 > Out: serial@10010000 > Err: serial@10010000 > Net: failed to get gemgxl_reset reset > > Warning: ethernet@10090000 MAC addresses don't match: > Address in DT is 52:54:00:12:34:56 > Address in environment is 70:b3:d5:92:f0:01 > eth0: ethernet@10090000 > Hit any key to stop autoboot: 0 > => sf probe > SF: Detected sst25vf032b with page size 256 Bytes, erase size 4 KiB, > total 4 MiB > => sf test 1ff000 1000 > SPI flash test: > 0 erase: 0 ticks, 4096000 KiB/s 32768.000 Mbps > 1 check: 10 ticks, 400 KiB/s 3.200 Mbps > 2 write: 170 ticks, 23 KiB/s 0.184 Mbps > 3 read: 9 ticks, 444 KiB/s 3.552 Mbps > Test passed > 0 erase: 0 ticks, 4096000 KiB/s 32768.000 Mbps > 1 check: 10 ticks, 400 KiB/s 3.200 Mbps > 2 write: 170 ticks, 23 KiB/s 0.184 Mbps > 3 read: 9 ticks, 444 KiB/s 3.552 Mbps > > QEMU flash: "mx66u51235f" > > U-Boot 2020.10 (Jan 14 2021 - 21:55:59 +0800) > > CPU: rv64imafdcsu > Model: SiFive HiFive Unleashed A00 > DRAM: 2 GiB > MMC: > Loading Environment from SPIFlash... 
SF: Detected mx66u51235f with page size 256 Bytes, erase size 4 KiB, total 64 MiB > *** Warning - bad CRC, using default environment > > In: serial@10010000 > Out: serial@10010000 > Err: serial@10010000 > Net: failed to get gemgxl_reset reset > > Warning: ethernet@10090000 MAC addresses don't match: > Address in DT is 52:54:00:12:34:56 > Address in environment is 70:b3:d5:92:f0:01 > eth0: ethernet@10090000 > Hit any key to stop autoboot: 0 > => sf probe > SF: Detected mx66u51235f with page size 256 Bytes, erase size 4 KiB, total 64 MiB > => sf test 0 8000 > SPI flash test: > 0 erase: 1 ticks, 32000 KiB/s 256.000 Mbps > 1 check: 80 ticks, 400 KiB/s 3.200 Mbps > 2 write: 83 ticks, 385 KiB/s 3.080 Mbps > 3 read: 79 ticks, 405 KiB/s 3.240 Mbps > Test passed > 0 erase: 1 ticks, 32000 KiB/s 256.000 Mbps > 1 check: 80 ticks, 400 KiB/s 3.200 Mbps > 2 write: 83 ticks, 385 KiB/s 3.080 Mbps > 3 read: 79 ticks, 405 KiB/s 3.240 Mbps > > I am sure there will be bugs, and I have not tested all flashes affected. > But I want to send out this series for an early discussion and comments. > I will continue my testing. 
> > > Bin Meng (9): > hw/block: m25p80: Fix the number of dummy bytes needed for Windbond > flashes > hw/block: m25p80: Fix the number of dummy bytes needed for > Numonyx/Micron flashes > hw/block: m25p80: Fix the number of dummy bytes needed for Macronix > flashes > hw/block: m25p80: Fix the number of dummy bytes needed for Spansion > flashes > hw/block: m25p80: Support fast read for SST flashes > hw/ssi: xilinx_spips: Fix generic fifo dummy cycle handling > Revert "aspeed/smc: Fix number of dummy cycles for FAST_READ_4 > command" > Revert "aspeed/smc: snoop SPI transfers to fake dummy cycles" > hw/ssi: npcm7xx_fiu: Correct the dummy cycle emulation logic > > include/hw/ssi/aspeed_smc.h | 3 - > hw/block/m25p80.c | 153 ++++++++++++++++++++++++++++-------- > hw/ssi/aspeed_smc.c | 116 +-------------------------- > hw/ssi/npcm7xx_fiu.c | 8 +- > hw/ssi/xilinx_spips.c | 29 ++++++- > 5 files changed, 153 insertions(+), 156 deletions(-) >
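The unit mismatch described in the cover letter can be made concrete with a minimal C sketch. It assumes only what the cover letter states: after the opcode the flash expects a 3- or 4-byte address followed by the dummy phase, and dummy bytes are derived as cycles × lines / 8. The function names are hypothetical and are not part of hw/block/m25p80.c.

```c
#include <assert.h>

/*
 * Hypothetical helpers (not from hw/block/m25p80.c).
 * addr_len: 3 or 4 address bytes; dummy_cycles: dummy clocks the command
 * requires; lines: data lines used during the dummy phase (1, 2 or 4).
 */

/* In-tree convention described above: one "byte" counted per dummy clock. */
static unsigned needed_bytes_per_cycle(unsigned addr_len, unsigned dummy_cycles)
{
    return addr_len + dummy_cycles;
}

/* Proposed convention: dummy clocks converted to the bytes seen on the bus. */
static unsigned needed_bytes_converted(unsigned addr_len, unsigned dummy_cycles,
                                       unsigned lines)
{
    assert(addr_len == 3 || addr_len == 4);
    assert(lines == 1 || lines == 2 || lines == 4);
    /* Each clock moves one bit per line; integer division drops leftover bits. */
    return addr_len + dummy_cycles * lines / 8;
}
```

For Fast Read Quad I/O (EBh) with a 3-byte address and 6 dummy cycles on four lines, the first convention waits for 9 follow-up bytes, while the second waits for 6 (3 address bytes plus 3 dummy bytes).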
Patchew URL: https://patchew.org/QEMU/20210114150902.11515-1-bmeng.cn@gmail.com/

Hi,

This series seems to have some coding style problems. See output below for more information:

Type: series
Message-id: 20210114150902.11515-1-bmeng.cn@gmail.com
Subject: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update] patchew/20210114013147.92962-1-jiaxun.yang@flygoat.com -> patchew/20210114013147.92962-1-jiaxun.yang@flygoat.com
 * [new tag] patchew/20210114150902.11515-1-bmeng.cn@gmail.com -> patchew/20210114150902.11515-1-bmeng.cn@gmail.com
Switched to a new branch 'test'
b87aded hw/ssi: npcm7xx_fiu: Correct the dummy cycle emulation logic
4518be2 Revert "aspeed/smc: snoop SPI transfers to fake dummy cycles"
6a4067a Revert "aspeed/smc: Fix number of dummy cycles for FAST_READ_4 command"
e5ea744 hw/ssi: xilinx_spips: Fix generic fifo dummy cycle handling
3294942 hw/block: m25p80: Support fast read for SST flashes
50a7f9f hw/block: m25p80: Fix the number of dummy bytes needed for Spansion flashes
cf6f8e1 hw/block: m25p80: Fix the number of dummy bytes needed for Macronix flashes
3925fcf hw/block: m25p80: Fix the number of dummy bytes needed for Numonyx/Micron flashes
5344168 hw/block: m25p80: Fix the number of dummy bytes needed for Windbond flashes

=== OUTPUT BEGIN ===
1/9 Checking commit 5344168de433 (hw/block: m25p80: Fix the number of dummy bytes needed for Windbond flashes)
2/9 Checking commit 3925fcf79dbc (hw/block: m25p80: Fix the number of dummy bytes needed for Numonyx/Micron flashes)
3/9 Checking commit cf6f8e145faa (hw/block: m25p80: Fix the number of dummy bytes needed for Macronix flashes)
4/9 Checking commit 50a7f9fb909b (hw/block: m25p80: Fix the number of dummy bytes needed for Spansion flashes)
5/9 Checking commit 3294942ca3a1 (hw/block: m25p80: Support fast read for SST flashes)
6/9 Checking commit e5ea74473d87 (hw/ssi: xilinx_spips: Fix generic fifo dummy cycle handling)
ERROR: line over 90 characters
#63: FILE: hw/ssi/xilinx_spips.c:510:
+ uint8_t spi_mode = ARRAY_FIELD_EX32(s->regs, GQSPI_GF_SNAPSHOT, SPI_MODE);

ERROR: line over 90 characters
#71: FILE: hw/ssi/xilinx_spips.c:518:
+ qemu_log_mask(LOG_GUEST_ERROR, "Unknown SPI MODE: 0x%x ", spi_mode);

total: 2 errors, 0 warnings, 41 lines checked

Patch 6/9 has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS.

7/9 Checking commit 6a4067a6a9fc (Revert "aspeed/smc: Fix number of dummy cycles for FAST_READ_4 command")
8/9 Checking commit 4518be22e1c9 (Revert "aspeed/smc: snoop SPI transfers to fake dummy cycles")
9/9 Checking commit b87aded6dc2a (hw/ssi: npcm7xx_fiu: Correct the dummy cycle emulation logic)
=== OUTPUT END ===

Test command exited with code: 1

The full log is available at
http://patchew.org/logs/20210114150902.11515-1-bmeng.cn@gmail.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Hi Bin, On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote: > From: Bin Meng <bin.meng@windriver.com> > > The m25p80 model uses s->needed_bytes to indicate how many follow-up > bytes are expected to be received after it receives a command. For > example, depending on the address mode, either 3-byte address or > 4-byte address is needed. > > For fast read family commands, some dummy cycles are required after > sending the address bytes, and the dummy cycles need to be counted > in s->needed_bytes. This is where the mess began. > > As the variable name (needed_bytes) indicates, the unit is in byte. > It is not in bit, or cycle. However for some reason the model has > been using the number of dummy cycles for s->needed_bytes. The right > approach is to convert the number of dummy cycles to bytes based on > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8). While not being the original implementor I must assume that above solution was considered but not chosen by the developers due to its inaccuracy (it wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8, meaning that if the controller is wrongly programmed to generate 7 the error wouldn't be caught and the controller will still be considered "correct"). Now that we have this detail in the implementation I'm in favor of keeping it, this also because the detail is already in use for catching exactly above error. > > Things get complicated when interacting with different SPI or QSPI > flash controllers. There are two major cases: > > - Dummy bytes prepared by drivers, and written to the controller fifo. > For such case, driver will calculate the correct number of dummy > bytes and write them into the tx fifo. Fixing the m25p80 model will > fix flashes working with such controllers. Above can be fixed while still keeping the detailed dummy cycle implementation inside m25p80.
Perhaps one of the following could be looked into: configuring the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting some functionality handling this in the SPI controller. Or a mixture of above. > - Dummy bytes not prepared by drivers. Drivers just tell the hardware > the dummy cycle configuration via some registers, and hardware will > automatically generate dummy cycles for us. Fixing the m25p80 model > is not enough, and we will need to fix the SPI/QSPI models for such > controllers. > > This series fixes the mess in the m25p80 from the flash side first, Considering the problems solved by the solution in tree I find m25p80 pretty clean, at least I don't see any clearly better way for accurately modeling the dummy clock cycles. Counting bits instead of bytes would for example still force the controllers to mark which bits to count (when transmitting one dummy byte from a txfifo on four lines (Quad command) it generates 2 dummy clock cycles since it takes two cycles to transfer 8 bits). Best regards, Francisco Iglesias
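Francisco's quad example, and his concern that a wrongly programmed cycle count could go unnoticed, can be checked with a little arithmetic. The sketch below is hypothetical (no such helpers exist in QEMU) and assumes a dummy phase that moves one bit per line per clock.

```c
#include <assert.h>

/* Clock cycles generated when `bytes` dummy bytes are shifted out on `lines` lines. */
static unsigned dummy_bytes_to_cycles(unsigned bytes, unsigned lines)
{
    assert(lines == 1 || lines == 2 || lines == 4);
    return bytes * 8 / lines;
}

/*
 * Cycle count a byte-granular flash model would effectively see if a
 * controller were programmed for `cycles` dummy clocks: the bit count is
 * truncated to whole bytes first, then converted back to clocks.
 */
static unsigned cycles_seen_by_byte_model(unsigned cycles, unsigned lines)
{
    assert(lines == 1 || lines == 2 || lines == 4);
    return dummy_bytes_to_cycles(cycles * lines / 8, lines);
}
```

One dummy byte on four lines gives 2 cycles, as in the message above. A controller misprogrammed for 7 quad cycles maps to the same 3 bytes as a correct 6-cycle setup, so `cycles_seen_by_byte_model(7, 4)` returns 6 and the mistake is invisible at byte granularity; 7 single-line cycles truncate to no whole byte at all.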
Hi Francisco, On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias <frasse.iglesias@gmail.com> wrote: > > Hi Bin, > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote: > > From: Bin Meng <bin.meng@windriver.com> > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up > > bytes are expected to be received after it receives a command. For > > example, depending on the address mode, either 3-byte address or > > 4-byte address is needed. > > > > For fast read family commands, some dummy cycles are required after > > sending the address bytes, and the dummy cycles need to be counted > > in s->needed_bytes. This is where the mess began. > > > > As the variable name (needed_bytes) indicates, the unit is in byte. > > It is not in bit, or cycle. However for some reason the model has > > been using the number of dummy cycles for s->needed_bytes. The right > > approach is to convert the number of dummy cycles to bytes based on > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8). > > While not being the original implementor I must assume that above solution was > considered but not chosen by the developers due to its inaccuracy (it > wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8, > meaning that if the controller is wrongly programmed to generate 7 the error > wouldn't be caught and the controller will still be considered "correct"). Now > that we have this detail in the implementation I'm in favor of keeping it, this > also because the detail is already in use for catching exactly above error. > I found no clue in the commit message that my proposed solution here was ever considered, otherwise all SPI controller models supporting software generation should have been found out to be seriously broken a long time ago! 
The issue you pointed out, that we require the total number of dummy bits to be a multiple of 8, is true; that's why I added the unimplemented log message in this series (patch 2/3/4) to warn users if this expectation is not met. However, this will not cause any issue when running U-Boot or Linux, because both spi-nor drivers make the same assumption as we do here. See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(); there is logic to calculate the dummy bytes needed for fast read commands: /* convert the dummy cycles to the number of bytes */ op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8; Note that the default dummy cycle configuration of all flashes I have looked into as of today meets the multiple-of-8 assumption. On some flashes the dummy cycle number is configurable, and if it has been configured to an odd value, it would not work on U-Boot/Linux in the first place. > > > > Things get complicated when interacting with different SPI or QSPI > > flash controllers. There are two major cases: > > > > - Dummy bytes prepared by drivers, and written to the controller fifo. > > For such case, driver will calculate the correct number of dummy > > bytes and write them into the tx fifo. Fixing the m25p80 model will > > fix flashes working with such controllers. > > Above can be fixed while still keeping the detailed dummy cycle implementation > inside m25p80. Perhaps one of the following could be looked into: configuring > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting > some functionality handling this in the SPI controller. Or a mixture of above. Please send patches that explain in detail how this is going to work. I am open to all possible solutions. > > > - Dummy bytes not prepared by drivers. Drivers just tell the hardware > > the dummy cycle configuration via some registers, and hardware will > > automatically generate dummy cycles for us.
Fixing the m25p80 model > > is not enough, and we will need to fix the SPI/QSPI models for such > > controllers. > > > > This series fixes the mess in the m25p80 from the flash side first, > > Considering the problems solved by the solution in tree I find m25p80 pretty > clean, at least I don't see any clearly better way for accurately modeling the > dummy clock cycles. Counting bits instead of bytes would for example still > force the controllers to mark which bits to count (when transmitting one dummy > byte from a txfifo on four lines (Quad command) it generates 2 dummy clock > cycles since it takes two cycles to transfer 8 bits). > SPI is a bit-based protocol, not a byte-based one. If you insist on bit modeling with the dummy cycles, then you should also suggest we change all cycles (including command/addr/dummy/data phases) to be modeled with bits. That way we can accurately emulate everything, for example one potential problem like transferring 9 bits in the data phase. However, modeling everything with bits is super inefficient. My view is that we should avoid trying to support uncommon use cases (like a dummy bit count that is not a multiple of 8) in QEMU. Regards, Bin
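The driver-side formula Bin quotes, together with the kind of warning he describes adding in patches 2/3/4, can be sketched as follows. The helper name is hypothetical; `fprintf()` stands in for QEMU's logging, and the series itself may word or route the message differently (for example through `qemu_log_mask(LOG_UNIMP, ...)`).

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper mirroring the U-Boot/Linux computation
 *     op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
 * and warning when the dummy bit count is not a multiple of 8.
 */
static unsigned dummy_nbytes_checked(unsigned read_dummy, unsigned buswidth)
{
    unsigned bits;

    assert(buswidth == 1 || buswidth == 2 || buswidth == 4 || buswidth == 8);
    bits = read_dummy * buswidth;
    if (bits % 8 != 0) {
        /* Stand-in for an "unimplemented" log message in the flash model. */
        fprintf(stderr, "%u dummy cycles on %u line(s) = %u bits, "
                "not a multiple of 8; truncating to %u byte(s)\n",
                read_dummy, buswidth, bits, bits / 8);
    }
    return bits / 8;
}
```

With common default configurations the division is exact (8 cycles on one line give 1 byte, 6 cycles on four lines give 3). An odd setting such as 7 cycles on one line truncates to 0 bytes, which is consistent with Bin's remark that such a configuration would not work under U-Boot or Linux in the first place.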
Hi Bin, On Thu, Jan 14, 2021 at 6:08 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > Hi Francisco, > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias > <frasse.iglesias@gmail.com> wrote: > > > > Hi Bin, > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote: > > > From: Bin Meng <bin.meng@windriver.com> > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up > > > bytes are expected to be received after it receives a command. For > > > example, depending on the address mode, either 3-byte address or > > > 4-byte address is needed. > > > > > > For fast read family commands, some dummy cycles are required after > > > sending the address bytes, and the dummy cycles need to be counted > > > in s->needed_bytes. This is where the mess began. > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte. > > > It is not in bit, or cycle. However for some reason the model has > > > been using the number of dummy cycles for s->needed_bytes. The right > > > approach is to convert the number of dummy cycles to bytes based on > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8). > > > > While not being the original implementor I must assume that above solution was > > considered but not chosen by the developers due to its inaccuracy (it > > wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8, > > meaning that if the controller is wrongly programmed to generate 7 the error > > wouldn't be caught and the controller will still be considered "correct"). Now > > that we have this detail in the implementation I'm in favor of keeping it, this > > also because the detail is already in use for catching exactly above error. 
> > > > I found no clue in the commit message that my proposed solution here > was ever considered, otherwise all SPI controller models supporting > software generation should have been found out to be seriously broken a long > time ago! > > The issue you pointed out, that we require the total number of dummy > bits to be a multiple of 8, is true; that's why I added the > unimplemented log message in this series (patch 2/3/4) to warn users > if this expectation is not met. However, this will not cause any issue > when running U-Boot or Linux, because both spi-nor drivers make the > same assumption as we do here. > > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(); > there is logic to calculate the dummy bytes needed for fast read > commands: > > /* convert the dummy cycles to the number of bytes */ > op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8; > > Note that the default dummy cycle configuration of all flashes I have > looked into as of today meets the multiple-of-8 assumption. On some > flashes the dummy cycle number is configurable, and if it has been > configured to an odd value, it would not work on U-Boot/Linux in > the first place. > > > > > > > Things get complicated when interacting with different SPI or QSPI > > > flash controllers. There are two major cases: > > > > > > - Dummy bytes prepared by drivers, and written to the controller fifo. > > > For such case, driver will calculate the correct number of dummy > > > bytes and write them into the tx fifo. Fixing the m25p80 model will > > > fix flashes working with such controllers. > > > > Above can be fixed while still keeping the detailed dummy cycle implementation > > inside m25p80. Perhaps one of the following could be looked into: configuring > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting > > some functionality handling this in the SPI controller. Or a mixture of above.
> > Please send patches that explain in detail how this is going to > work. I am open to all possible solutions. > > > > > > - Dummy bytes not prepared by drivers. Drivers just tell the hardware > > > the dummy cycle configuration via some registers, and hardware will > > > automatically generate dummy cycles for us. Fixing the m25p80 model > > > is not enough, and we will need to fix the SPI/QSPI models for such > > > controllers. > > > > > > This series fixes the mess in the m25p80 from the flash side first, > > > > Considering the problems solved by the solution in tree I find m25p80 pretty > > clean, at least I don't see any clearly better way for accurately modeling the > > dummy clock cycles. Counting bits instead of bytes would for example still > > force the controllers to mark which bits to count (when transmitting one dummy > > byte from a txfifo on four lines (Quad command) it generates 2 dummy clock > > cycles since it takes two cycles to transfer 8 bits). > > > > SPI is a bit-based protocol, not a byte-based one. If you insist on bit modeling > with the dummy cycles, then you should also suggest we change all > cycles (including command/addr/dummy/data phases) to be modeled with > bits. That way we can accurately emulate everything, for example one > potential problem like transferring 9 bits in the data phase. I agree with this. There's really nothing special about dummy cycles. Making them special makes it super painful to implement SPI controller emulation because you have to anticipate when ssi_transfer changes semantics from byte-at-a-time to bit-at-a-time. I doubt all the SPI controllers in the tree get it right all the time. > However, modeling everything with bits is super inefficient. My view is > that we should avoid trying to support uncommon use cases (like a dummy bit count that is > not a multiple of 8) in QEMU. Perhaps ssi_transfer could take an additional bits parameter?
That should make it possible to transfer any number of bits up to 32, while keeping the common case simple on both sides. And it would work for any SPI transfer, not just dummy cycles. Havard
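Havard's suggestion can be illustrated with a toy flash that counts its dummy phase in clock cycles, driven by a transfer call that states how many bits it clocks. Everything here is hypothetical: QEMU's real `ssi_transfer()` takes only the bus and a 32-bit value, `toy_transfer_bits()` is an invented name, and the `lines` argument is an extra assumption added to cover dual and quad phases.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy flash state for the dummy phase only. */
typedef struct {
    unsigned dummy_cycles_left; /* clocks still expected before data */
    bool overrun;               /* controller clocked more than expected */
} ToyFlash;

/*
 * Hypothetical bit-granular transfer: the controller says how many bits
 * (1..32) it clocks and over how many lines, so the flash can count dummy
 * clocks exactly whether it receives whole bytes or an odd bit count.
 */
static void toy_transfer_bits(ToyFlash *f, unsigned bits, unsigned lines)
{
    unsigned cycles;

    assert(bits >= 1 && bits <= 32);
    assert(lines == 1 || lines == 2 || lines == 4);
    assert(bits % lines == 0);

    cycles = bits / lines;
    if (cycles > f->dummy_cycles_left) {
        f->overrun = true;
        f->dummy_cycles_left = 0;
    } else {
        f->dummy_cycles_left -= cycles;
    }
}
```

A controller that pushes whole bytes on four lines (8 bits, so 2 clocks per call) and one that clocks 7 single-line bits in one call both land exactly on their targets, while clocking 28 bits on four lines against a 6-cycle requirement sets `overrun`. The common byte-sized case stays a plain call with `bits = 8`.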
Hi Bin, On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote: > Hi Francisco, > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias > <frasse.iglesias@gmail.com> wrote: > > > > Hi Bin, > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote: > > > From: Bin Meng <bin.meng@windriver.com> > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up > > > bytes are expected to be received after it receives a command. For > > > example, depending on the address mode, either 3-byte address or > > > 4-byte address is needed. > > > > > > For fast read family commands, some dummy cycles are required after > > > sending the address bytes, and the dummy cycles need to be counted > > > in s->needed_bytes. This is where the mess began. > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte. > > > It is not in bit, or cycle. However for some reason the model has > > > been using the number of dummy cycles for s->needed_bytes. The right > > > approach is to convert the number of dummy cycles to bytes based on > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8). > > > > While not being the original implementor I must assume that above solution was > > considered but not chosen by the developers due to its inaccuracy (it > > wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8, > > meaning that if the controller is wrongly programmed to generate 7 the error > > wouldn't be caught and the controller will still be considered "correct"). Now > > that we have this detail in the implementation I'm in favor of keeping it, this > > also because the detail is already in use for catching exactly above error. > > > > I found no clue in the commit message that my proposed solution here > was ever considered, otherwise all SPI controller models supporting > software generation should have been found out to be seriously broken a long > time ago! 
The controllers you are referring to might lack support for commands requiring dummy clock cycles, but I really hope they work with the other commands? If so, I don't think it is fair to call them 'seriously broken' (otherwise we should probably let the maintainers know about it). Most likely the commands are unsupported simply because nobody has requested them. Also, there is one controller that already supports them. > > The issue you pointed out, that we require the total number of dummy > bits to be a multiple of 8, is true; that's why I added the > unimplemented log message in this series (patch 2/3/4) to warn users > if this expectation is not met. However, this will not cause any issue > when running U-Boot or Linux, because both spi-nor drivers make the > same assumption as we do here. > > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(); > there is logic to calculate the dummy bytes needed for fast read > commands: > > /* convert the dummy cycles to the number of bytes */ > op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8; > > Note that the default dummy cycle configuration of all flashes I have > looked into as of today meets the multiple-of-8 assumption. On some > flashes the dummy cycle number is configurable, and if it has been > configured to an odd value, it would not work on U-Boot/Linux in > the first place. > > > > > > > Things get complicated when interacting with different SPI or QSPI > > > flash controllers. There are two major cases: > > > > > > - Dummy bytes prepared by drivers, and written to the controller fifo. > > > For such case, driver will calculate the correct number of dummy > > > bytes and write them into the tx fifo. Fixing the m25p80 model will > > > fix flashes working with such controllers. > > > > Above can be fixed while still keeping the detailed dummy cycle implementation > > inside m25p80.
Perhaps one of the following could be looked into: configuring > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting > > some functionality handling this in the SPI controller. Or a mixture of above. > > Please send patches to explain this in detail how this is going to > work. I am open to all possible solutions. In that case I suggest that you instead try a device property 'model_dummy_bytes' that selects whether m25p80 converts the accurate dummy clock cycle count to dummy bytes. Below is an example of how to modify the decode_fast_read_cmd function (the other commands requiring dummy clock cycles can follow a similar pattern). This way the fifo mode will be able to work the way you desire while the current functionality is kept intact. Suddenly removing functionality (features) will take users by surprise.

static void decode_fast_read_cmd(Flash *s)
{
    uint8_t dummy_clk_cycles = 0;
    uint8_t extra_bytes;

    s->needed_bytes = get_addr_length(s);

    /* Obtain the number of dummy clock cycles needed */
    switch (get_man(s)) {
    case MAN_WINBOND:
        dummy_clk_cycles += 8;
        break;
    case MAN_NUMONYX:
        dummy_clk_cycles += numonyx_extract_cfg_num_dummies(s);
        break;
    case MAN_MACRONIX:
        if (extract32(s->volatile_cfg, 6, 2) == 1) {
            dummy_clk_cycles += 6;
        } else {
            dummy_clk_cycles += 8;
        }
        break;
    case MAN_SPANSION:
        dummy_clk_cycles += extract32(s->spansion_cr2v,
                                      SPANSION_DUMMY_CLK_POS,
                                      SPANSION_DUMMY_CLK_LEN);
        break;
    default:
        break;
    }

    /* model_dummy_bytes is the proposed new device property */
    if (s->model_dummy_bytes) {
        int lines = 1;

        /*
         * Expect dummy bytes from the controller so convert the dummy
         * clock cycles to dummy_bytes (convert_to_dummy_bytes() is a
         * new helper assumed by this proposal).
         */
        extra_bytes = convert_to_dummy_bytes(dummy_clk_cycles, lines);
    } else {
        /* Model individual dummy clock cycles as byte writes */
        extra_bytes = dummy_clk_cycles;
    }

    s->needed_bytes += extra_bytes;
    s->pos = 0;
    s->len = 0;
    s->state = STATE_COLLECTING_DATA;
}

Best regards, Francisco Iglesias > > > > > > - Dummy bytes not prepared by drivers.
Drivers just tell the hardware > > > the dummy cycle configuration via some registers, and hardware will > > > automatically generate dummy cycles for us. Fixing the m25p80 model > > > is not enough, and we will need to fix the SPI/QSPI models for such > > > controllers. > > > > > > This series fixes the mess in the m25p80 from the flash side first, > > > > Considering the problems solved by the solution in tree I find m25p80 pretty > > clean, at least I don't see any clearly better way for accurately modeling the > > dummy clock cycles. Counting bits instead of bytes would for example still > > force the controllers to mark which bits to count (when transmitting one dummy > > byte from a txfifo on four lines (Quad command) it generates 2 dummy clock > > cycles since it takes two cycles to transfer 8 bits). > > > > SPI is a bit based protocol, not bytes. If you insist on bit modeling > with the dummy cycles then you should also suggest we change all > cycles (including command/addr/dummy/data phases) to be modeled with > bits. That way we can accurately emulate everything, for example one > potential problem like transferring 9 bit in the data phase. > > However modeling everything with bit is super inefficient. My view is > that we should avoid trying to support uncommon use cases (like not > multiple of 8 for dummy bits) in QEMU. > > Regards, > Bin
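The decode_fast_read_cmd() proposal above calls convert_to_dummy_bytes(), which is not shown in the thread. Below is a minimal sketch of what such a helper might look like, assuming it applies the same formula as the quoted U-Boot/Linux spi-nor code (cycles * buswidth / 8). The predicate dummy_cycles_fit_in_bytes() and the reverse conversion dummy_cycles_per_byte() are hypothetical additions for illustration, not existing m25p80 API; the former marks the non-multiple-of-8 case that a byte-based model cannot represent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of QEMU's m25p80): convert a dummy clock
 * cycle count into dummy bytes for a transfer on `lines` data lines, using
 * the spi-nor formula cycles * buswidth / 8. Partial bytes are truncated.
 */
static uint8_t convert_to_dummy_bytes(uint8_t dummy_clk_cycles, int lines)
{
    assert(lines == 1 || lines == 2 || lines == 4 || lines == 8);
    return (uint8_t)((dummy_clk_cycles * lines) / 8);
}

/*
 * True when the dummy phase is a whole number of bytes. False is the case
 * (e.g. 7 cycles on one line) that only a cycle-accurate model can catch.
 */
static bool dummy_cycles_fit_in_bytes(uint8_t dummy_clk_cycles, int lines)
{
    assert(lines == 1 || lines == 2 || lines == 4 || lines == 8);
    return (dummy_clk_cycles * lines) % 8 == 0;
}

/* Reverse direction: clock cycles produced by one dummy byte on `lines` lines. */
static unsigned dummy_cycles_per_byte(int lines)
{
    assert(lines == 1 || lines == 2 || lines == 4 || lines == 8);
    return 8u / (unsigned)lines;
}
```

For example, the Fast Read Quad I/O (EBh) case from the cover letter, convert_to_dummy_bytes(6, 4), gives 3 bytes, and dummy_cycles_per_byte(4) gives 2, matching the observation that one dummy byte on four lines takes two clock cycles. A mis-programmed 7 cycles on one line truncates to 0 bytes, and dummy_cycles_fit_in_bytes(7, 1) reports it.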
Hi Havard, On Fri, Jan 15, 2021 at 11:29 AM Havard Skinnemoen <hskinnemoen@google.com> wrote: > > Hi Bin, > > On Thu, Jan 14, 2021 at 6:08 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > Hi Francisco, > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias > > <frasse.iglesias@gmail.com> wrote: > > > > > > Hi Bin, > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote: > > > > From: Bin Meng <bin.meng@windriver.com> > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up > > > > bytes are expected to be received after it receives a command. For > > > > example, depending on the address mode, either 3-byte address or > > > > 4-byte address is needed. > > > > > > > > For fast read family commands, some dummy cycles are required after > > > > sending the address bytes, and the dummy cycles need to be counted > > > > in s->needed_bytes. This is where the mess began. > > > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte. > > > > It is not in bit, or cycle. However for some reason the model has > > > > been using the number of dummy cycles for s->needed_bytes. The right > > > > approach is to convert the number of dummy cycles to bytes based on > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8). > > > > > > While not being the original implementor I must assume that above solution was > > > considered but not chosen by the developers due to it is inaccuracy (it > > > wouldn't be possible to model exacly 6 dummy cycles, only a multiple of 8, > > > meaning that if the controller is wrongly programmed to generate 7 the error > > > wouldn't be caught and the controller will still be considered "correct"). Now > > > that we have this detail in the implementation I'm in favor of keeping it, this > > > also because the detail is already in use for catching exactly above error. 
> > > > > > > I found no clue from the commit message that my proposed solution here > > was ever considered, otherwise all SPI controller models supporting > > software generation should have been found out seriously broken long > > time ago! > > > > The issue you pointed out that we require the total number of dummy > > bits should be multiple of 8 is true, that's why I added the > > unimplemented log message in this series (patch 2/3/4) to warn users > > if this expectation is not met. However this will not cause any issue > > when running U-Boot or Linux, because both spi-nor drivers expect the > > same assumption as we do here. > > > > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(), > > there is a logic to calculate the dummy bytes needed for fast read > > command: > > > > /* convert the dummy cycles to the number of bytes */ > > op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8; > > > > Note the default dummy cycles configuration for all flashes I have > > looked into as of today, meets the multiple of 8 assumption. On some > > flashes the dummy cycle number is configurable, and if it's been > > configured to be an odd value, it would not work on U-Boot/Linux in > > the first place. > > > > > > > > > > Things get complicated when interacting with different SPI or QSPI > > > > flash controllers. There are major two cases: > > > > > > > > - Dummy bytes prepared by drivers, and wrote to the controller fifo. > > > > For such case, driver will calculate the correct number of dummy > > > > bytes and write them into the tx fifo. Fixing the m25p80 model will > > > > fix flashes working with such controllers. > > > > > > Above can be fixed while still keeping the detailed dummy cycle implementation > > > inside m25p80. Perhaps one of the following could be looked into: configurating > > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting > > > some functionality handling this in the SPI controller. 
Or a mixture of above. > > > > Please send patches to explain this in detail how this is going to > > work. I am open to all possible solutions. > > > > > > > > > - Dummy bytes not prepared by drivers. Drivers just tell the hardware > > > > the dummy cycle configuration via some registers, and hardware will > > > > automatically generate dummy cycles for us. Fixing the m25p80 model > > > > is not enough, and we will need to fix the SPI/QSPI models for such > > > > controllers. > > > > > > > > This series fixes the mess in the m25p80 from the flash side first, > > > > > > Considering the problems solved by the solution in tree I find m25p80 pretty > > > clean, at least I don't see any clearly better way for accurately modeling the > > > dummy clock cycles. Counting bits instead of bytes would for example still > > > force the controllers to mark which bits to count (when transmitting one dummy > > > byte from a txfifo on four lines (Quad command) it generates 2 dummy clock > > > cycles since it takes two cycles to transfer 8 bits). > > > > > > > SPI is a bit based protocol, not bytes. If you insist on bit modeling > > with the dummy cycles then you should also suggest we change all > > cycles (including command/addr/dummy/data phases) to be modeled with > > bits. That way we can accurately emulate everything, for example one > > potential problem like transferring 9 bit in the data phase. > > I agree with this. There's really nothing special about dummy cycles. > Making them special makes it super painful to implement SPI controller > emulation because you have to anticipate when ssi_transfer changes > semantics from byte-at-a-time to bit-at-a-time. I doubt all the SPI > controllers in the tree gets it right all the time. > Yep, it's not just painful for SPI controllers, and for the case 1 SPI controller it's impossible to snoop the data to distinguish when the dummy cycles begin. > > However modeling everything with bit is super inefficient. 
My view is > > that we should avoid trying to support uncommon use cases (like not > > multiple of 8 for dummy bits) in QEMU. > > Perhaps ssi_transfer could take an additional bits parameter? That > should make it possible to transfer any number of bits up to 32, while > keeping the common case simple on both sides. And it would work for > any SPI transfer, not just dummy cycles. This sounds like a good tradeoff from the emulator perspective. But I am not sure we should do this to solve the dummy cycle mess given all the default dummy cycle configurations so far match the multiple of 8 assumption. Regards, Bin
Hi Francisco, On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias <frasse.iglesias@gmail.com> wrote: > > Hi Bin, > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote: > > Hi Francisco, > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias > > <frasse.iglesias@gmail.com> wrote: > > > > > > Hi Bin, > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote: > > > > From: Bin Meng <bin.meng@windriver.com> > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up > > > > bytes are expected to be received after it receives a command. For > > > > example, depending on the address mode, either 3-byte address or > > > > 4-byte address is needed. > > > > > > > > For fast read family commands, some dummy cycles are required after > > > > sending the address bytes, and the dummy cycles need to be counted > > > > in s->needed_bytes. This is where the mess began. > > > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte. > > > > It is not in bit, or cycle. However for some reason the model has > > > > been using the number of dummy cycles for s->needed_bytes. The right > > > > approach is to convert the number of dummy cycles to bytes based on > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8). > > > > > > While not being the original implementor I must assume that above solution was > > > considered but not chosen by the developers due to it is inaccuracy (it > > > wouldn't be possible to model exacly 6 dummy cycles, only a multiple of 8, > > > meaning that if the controller is wrongly programmed to generate 7 the error > > > wouldn't be caught and the controller will still be considered "correct"). Now > > > that we have this detail in the implementation I'm in favor of keeping it, this > > > also because the detail is already in use for catching exactly above error. 
> > > > > > > I found no clue from the commit message that my proposed solution here > > was ever considered, otherwise all SPI controller models supporting > > software generation should have been found out seriously broken long > > time ago! > > > The controllers you are referring to might lack support for commands requiring > dummy clock cycles but I really hope they work with the other commands? If so I I am not sure why you view dummy clock cycles as something special that needs special support from the SPI controller. For a case 1 controller, they are nothing special from the controller's perspective, just like sending out a command, address bytes, or data: the controller just shifts data bit by bit out of its tx fifo and that's it. In the Xilinx GQSPI controller case, the dummy cycles can either be sent as regular data in the tx fifo (the case 1 controller), or be generated automatically by the hardware (the case 2 controller). > don't think it is fair to call them 'seriously broken' (and else we should > probably let the maintainers know about it). Most likely the lack of support I called it "seriously broken" because the current implementation only considers one type of SPI controller while completely ignoring the other type. > for the commands is because no request has been made for them. Also there is > one controller that has support. Definitely it's not "no request". Nearly all SPI flashes support the Fast Read (0Bh) command today, and 0Bh requires dummy cycles. This is "seriously broken" for those case 1 type controllers because they cannot read anything from the m25p80 model at all, unless the guest software being tested only uses the Read (03h) command, which is not affected. But I can't find any software that uses Read instead of Fast Read.
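The case 1 failure described above can be reproduced with a toy model; every name here is hypothetical and none of it is QEMU code. Assume Fast Read (0Bh) with 8 dummy clock cycles on one line, so a spi-nor driver writes 3 address bytes and one dummy byte to the tx fifo and then treats what comes back as data. If the flash counts each dummy cycle as one byte, it is still collecting "dummy" bytes when the driver starts reading, so no data byte lands at the right offset.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy flash (not QEMU's m25p80): after the 0Bh opcode it collects
 * `needed_bytes` follow-up bytes (address + dummy), returning 0xFF meanwhile,
 * then returns consecutive data bytes starting at 0xA0.
 */
typedef struct {
    int needed_bytes;
    uint8_t next_data;
} ToyFlash;

static uint8_t toy_flash_transfer(ToyFlash *f, uint8_t tx)
{
    (void)tx;                       /* content of address/dummy bytes is ignored */
    if (f->needed_bytes > 0) {
        f->needed_bytes--;
        return 0xFF;
    }
    return f->next_data++;
}

/*
 * Case 1 controller: shift out 3 address bytes plus `driver_dummy_bytes`
 * dummy bytes, then clock `nread` more bytes and store them as data.
 * `flash_dummy_units` is what the flash adds to needed_bytes for the dummy
 * phase. Returns how many stored bytes are the correct flash data.
 */
static int case1_fast_read(int flash_dummy_units, int driver_dummy_bytes,
                           int nread)
{
    ToyFlash f = { .needed_bytes = 3 + flash_dummy_units, .next_data = 0xA0 };
    int good = 0;

    for (int i = 0; i < 3 + driver_dummy_bytes; i++) {
        toy_flash_transfer(&f, 0x00);
    }
    for (int i = 0; i < nread; i++) {
        if (toy_flash_transfer(&f, 0x00) == (uint8_t)(0xA0 + i)) {
            good++;
        }
    }
    return good;
}
```

With the flash counting cycles, case1_fast_read(8, 1, 16) returns 0: the first 7 reads are still swallowed as dummy and the data that follows is shifted by 7 bytes. With the flash counting bytes, case1_fast_read(1, 1, 16) returns 16.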
> > The issue you pointed out that we require the total number of dummy > > bits should be multiple of 8 is true, that's why I added the > > unimplemented log message in this series (patch 2/3/4) to warn users > > if this expectation is not met. However this will not cause any issue > > when running U-Boot or Linux, because both spi-nor drivers expect the > > same assumption as we do here. > > > > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(), > > there is a logic to calculate the dummy bytes needed for fast read > > command: > > > > /* convert the dummy cycles to the number of bytes */ > > op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8; > > > > Note the default dummy cycles configuration for all flashes I have > > looked into as of today, meets the multiple of 8 assumption. On some > > flashes the dummy cycle number is configurable, and if it's been > > configured to be an odd value, it would not work on U-Boot/Linux in > > the first place. > > > > > > > > > > Things get complicated when interacting with different SPI or QSPI > > > > flash controllers. There are major two cases: > > > > > > > > - Dummy bytes prepared by drivers, and wrote to the controller fifo. > > > > For such case, driver will calculate the correct number of dummy > > > > bytes and write them into the tx fifo. Fixing the m25p80 model will > > > > fix flashes working with such controllers. > > > > > > Above can be fixed while still keeping the detailed dummy cycle implementation > > > inside m25p80. Perhaps one of the following could be looked into: configurating > > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting > > > some functionality handling this in the SPI controller. Or a mixture of above. > > > > Please send patches to explain this in detail how this is going to > > work. I am open to all possible solutions. 
> > In that case I suggest that you instead try with a device property > 'model_dummy_bytes' used to select to convert the accurate dummy clock cycle > count to dummy bytes inside m25p80. Below is an example on how to modify the No, this is wrong in my view. This is not like DMA vs. PIO handling. > decode_fast_read_cmd function (the other commands requiring dummy clock cycles > can follow a similar pattern). This way the fifo mode will be able to work the > way you desire while also keeping the current functionality intact. Suddenly > removing functionality (features) will take users by surprise. I don't think we are removing any features. This is a fix that makes the model usable by any SPI controller. As I pointed out, both U-Boot and Linux have the multiple of 8 assumption for the dummy bits, which is the default configuration for all flashes I have looked into so far. Can you please comment on what use case you want to support? I requested U-Boot/Linux kernel testing against Xilinx GQSPI in the previous SST thread [1] but there was no response.
[1] http://patchwork.ozlabs.org/project/qemu-devel/patch/1606704602-59435-1-git-send-email-bmeng.cn@gmail.com/

> static void decode_fast_read_cmd(Flash *s)
> {
>     uint8_t dummy_clk_cycles = 0;
>     uint8_t extra_bytes;
>
>     s->needed_bytes = get_addr_length(s);
>
>     /* Obtain the number of dummy clock cycles needed */
>     switch (get_man(s)) {
>     case MAN_WINBOND:
>         dummy_clk_cycles += 8;
>         break;
>     case MAN_NUMONYX:
>         dummy_clk_cycles += numonyx_extract_cfg_num_dummies(s);
>         break;
>     case MAN_MACRONIX:
>         if (extract32(s->volatile_cfg, 6, 2) == 1) {
>             dummy_clk_cycles += 6;
>         } else {
>             dummy_clk_cycles += 8;
>         }
>         break;
>     case MAN_SPANSION:
>         dummy_clk_cycles += extract32(s->spansion_cr2v,
>                                       SPANSION_DUMMY_CLK_POS,
>                                       SPANSION_DUMMY_CLK_LEN);
>         break;
>     default:
>         break;
>     }
>
>     if (s->model_dummy_bytes) {
>         int lines = 1;
>
>         /*
>          * Expect dummy bytes from the controller so convert the dummy
>          * clock cycles to dummy_bytes.
>          */
>         extra_bytes = convert_to_dummy_bytes(dummy_clk_cycles, lines);
>     } else {
>         /* Model individual dummy clock cycles as byte writes */
>         extra_bytes = dummy_clk_cycles;
>     }
>
>     s->needed_bytes += extra_bytes;
>     s->pos = 0;
>     s->len = 0;
>     s->state = STATE_COLLECTING_DATA;
> }
>
> Best regards, > Francisco Iglesias > > > > > > > > > > - Dummy bytes not prepared by drivers. Drivers just tell the hardware > > > > the dummy cycle configuration via some registers, and hardware will > > > > automatically generate dummy cycles for us. Fixing the m25p80 model > > > > is not enough, and we will need to fix the SPI/QSPI models for such > > > > controllers. > > > > > > > > This series fixes the mess in the m25p80 from the flash side first, > > > > > > Considering the problems solved by the solution in tree I find m25p80 pretty > > > clean, at least I don't see any clearly better way for accurately modeling the > > > dummy clock cycles.
Counting bits instead of bytes would for example still > > > force the controllers to mark which bits to count (when transmitting one dummy > > > byte from a txfifo on four lines (Quad command) it generates 2 dummy clock > > > cycles since it takes two cycles to transfer 8 bits). > > > > > > > SPI is a bit based protocol, not bytes. If you insist on bit modeling > > with the dummy cycles then you should also suggest we change all > > cycles (including command/addr/dummy/data phases) to be modeled with > > bits. That way we can accurately emulate everything, for example one > > potential problem like transferring 9 bit in the data phase. > > > > However modeling everything with bit is super inefficient. My view is > > that we should avoid trying to support uncommon use cases (like not > > multiple of 8 for dummy bits) in QEMU. Regards, Bin
Hi Bin, On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote: > Hi Francisco, > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias > <frasse.iglesias@gmail.com> wrote: > > > > Hi Bin, > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote: > > > Hi Francisco, > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias > > > <frasse.iglesias@gmail.com> wrote: > > > > > > > > Hi Bin, > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote: > > > > > From: Bin Meng <bin.meng@windriver.com> > > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up > > > > > bytes are expected to be received after it receives a command. For > > > > > example, depending on the address mode, either 3-byte address or > > > > > 4-byte address is needed. > > > > > > > > > > For fast read family commands, some dummy cycles are required after > > > > > sending the address bytes, and the dummy cycles need to be counted > > > > > in s->needed_bytes. This is where the mess began. > > > > > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte. > > > > > It is not in bit, or cycle. However for some reason the model has > > > > > been using the number of dummy cycles for s->needed_bytes. The right > > > > > approach is to convert the number of dummy cycles to bytes based on > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8). > > > > > > > > While not being the original implementor I must assume that above solution was > > > > considered but not chosen by the developers due to it is inaccuracy (it > > > > wouldn't be possible to model exacly 6 dummy cycles, only a multiple of 8, > > > > meaning that if the controller is wrongly programmed to generate 7 the error > > > > wouldn't be caught and the controller will still be considered "correct"). 
Now > > > > that we have this detail in the implementation I'm in favor of keeping it, this > > > > also because the detail is already in use for catching exactly above error. > > > > > > > > > > I found no clue from the commit message that my proposed solution here > > > was ever considered, otherwise all SPI controller models supporting > > > software generation should have been found out seriously broken long > > > time ago! > > > > > > The controllers you are referring to might lack support for commands requiring > > dummy clock cycles but I really hope they work with the other commands? If so I > > I am not sure why you view dummy clock cycles as something special > that needs some special support from the SPI controller. For the case > 1 controller, it's nothing special from the controller perspective, > just like sending out a command, or address bytes, or data. The > controller just shifts data bit by bit from its tx fifo and that's it. > In the Xilinx GQSPI controller case, the dummy cycles can either be > sent via a regular data (the case 1 controller) in the tx fifo, or > automatically generated (case 2 controller) by the hardware. Ok, I'll try to explain my viewpoint a little differently. For that we also need to keep in mind that QEMU models HW, and any binary that runs on a HW board supported in QEMU should ideally run on that board inside QEMU as well (this can equally well be a bare metal application or a modified U-Boot/Linux using SPI commands with a non-multiple-of-8 number of dummy clock cycles). Once functionality has been introduced into QEMU, it is not easy to know which intentional or unintentional features provided by that functionality are being used by users.
One of the (perhaps not well known) features I'm aware of that is in use, and that is provided by the accurate dummy clock cycle modeling inside m25p80, is the ability to test drivers accurately with regard to the dummy clock cycles (even when using commands with a non-multiple-of-8 number of dummy clock cycles), but there might be others as well. So removing this functionality will break the above use case, since those tests will no longer be reliable. Furthermore, since users tend to be creative, it is not possible to know whether other use cases will be affected. This means that if [1] is to be followed, the safe path is to add functionality instead of removing it. Luckily, that is also easier in this case, see below. > > > don't think it is fair to call them 'seriously broken' (and else we should > > probably let the maintainers know about it). Most likely the lack of support > > I called it "seriously broken" because current implementation only > considered one type of SPI controllers while completely ignoring the > other type. If we instead look at this from the perspective of m25p80, it models the commands a certain way and provides an API that the SPI controllers need to implement to interact with it. It is true that the SPI controllers referred to above do not support the portion of that API that corresponds to commands with dummy clock cycles, but I don't think that makes it broken, since there is also one SPI controller with a working implementation of m25p80's full API, including when transferring through a tx fifo (use case 1). But as mentioned above, by making a minor extension to m25p80's API that allows toggling the accuracy from dummy clock cycles to dummy bytes, [1] will still be honored while at the same time making it possible to have full support for the API in the SPI controllers that currently lack it (please reread the proposal in my previous reply that attempts to do this).
I myself see this as a win/win situation, also because no controller should need modifications. > > > for the commands is because no request has been made for them. Also there is > > one controller that has support. > > Definitely it's not "no request". Nearly all SPI flashes support the > Fast Read (0Bh) command today, and 0Bh requires a dummy cycle. This is > "seriously broken" for those case 1 type controllers because they > cannot read anything from the m25p80 model at all. Unless the guest > software being tested only uses Read (03h) command which is not > affected. But I can't find a software that uses Read instead of Fast > Read. > > > > The issue you pointed out that we require the total number of dummy > > > bits should be multiple of 8 is true, that's why I added the > > > unimplemented log message in this series (patch 2/3/4) to warn users > > > if this expectation is not met. However this will not cause any issue > > > when running U-Boot or Linux, because both spi-nor drivers expect the > > > same assumption as we do here. > > > > > > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(), > > > there is a logic to calculate the dummy bytes needed for fast read > > > command: > > > > > > /* convert the dummy cycles to the number of bytes */ > > > op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8; > > > > > > Note the default dummy cycles configuration for all flashes I have > > > looked into as of today, meets the multiple of 8 assumption. On some > > > flashes the dummy cycle number is configurable, and if it's been > > > configured to be an odd value, it would not work on U-Boot/Linux in > > > the first place. > > > > > > > > > > > > > Things get complicated when interacting with different SPI or QSPI > > > > > flash controllers. There are major two cases: > > > > > > > > > > - Dummy bytes prepared by drivers, and wrote to the controller fifo.
> > > > > For such case, driver will calculate the correct number of dummy > > > > > bytes and write them into the tx fifo. Fixing the m25p80 model will > > > > > fix flashes working with such controllers. > > > > > > > > Above can be fixed while still keeping the detailed dummy cycle implementation > > > > inside m25p80. Perhaps one of the following could be looked into: configurating > > > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting > > > > some functionality handling this in the SPI controller. Or a mixture of above. > > > > > > Please send patches to explain this in detail how this is going to > > > work. I am open to all possible solutions. > > > > In that case I suggest that you instead try with a device property > > 'model_dummy_bytes' used to select to convert the accurate dummy clock cycle > > count to dummy bytes inside m25p80. Below is an example on how to modify the > > No this is wrong in my view. This is not like a DMA vs. PIO handling. > > > decode_fast_read_cmd function (the other commands requiring dummy clock cycles > > can follow a similar pattern). This way the fifo mode will be able to work the > > way you desire while also keeping the current functionality intact. Suddenly > > removing functionality (features) will take users by surprise. > > I don't think we are removing any features. This is a fix to make the > model to be used by any SPI controllers. > > As I pointed out, both U-Boot and Linux have the multiple of 8 > assumption for the dummy bit, which is the default configuration for > all flashes I have looked into so far. Can you please comment what use > case you want to support? I requested a U-Boot/Linux kernel testing in > the previous SST thread [1] against Xilinx GQSPI but there was no > response. Instructions on how to boot U-Boot/Linux can be found in [2]. To build the various software components, I followed the official documentation in [3].
Best regards,
Francisco

[1] qemu/docs/system/deprecated.rst
[2] https://github.com/franciscoIglesias/qemu-cmdline/blob/master/xlnx-zcu102-atf-u-boot-linux.md
[3] https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/460653138/Xilinx+Open+Source+Linux

> > [1] http://patchwork.ozlabs.org/project/qemu-devel/patch/1606704602-59435-1-git-send-email-bmeng.cn@gmail.com/
> >
> > static void decode_fast_read_cmd(Flash *s)
> > {
> >     uint8_t dummy_clk_cycles = 0;
> >     uint8_t extra_bytes;
> >
> >     s->needed_bytes = get_addr_length(s);
> >
> >     /* Obtain the number of dummy clock cycles needed */
> >     switch (get_man(s)) {
> >     case MAN_WINBOND:
> >         dummy_clk_cycles += 8;
> >         break;
> >     case MAN_NUMONYX:
> >         dummy_clk_cycles += numonyx_extract_cfg_num_dummies(s);
> >         break;
> >     case MAN_MACRONIX:
> >         if (extract32(s->volatile_cfg, 6, 2) == 1) {
> >             dummy_clk_cycles += 6;
> >         } else {
> >             dummy_clk_cycles += 8;
> >         }
> >         break;
> >     case MAN_SPANSION:
> >         dummy_clk_cycles += extract32(s->spansion_cr2v,
> >                                       SPANSION_DUMMY_CLK_POS,
> >                                       SPANSION_DUMMY_CLK_LEN);
> >         break;
> >     default:
> >         break;
> >     }
> >
> >     if (s->model_dummy_bytes) {
> >         int lines = 1;
> >
> >         /*
> >          * Expect dummy bytes from the controller so convert the dummy
> >          * clock cycles to dummy_bytes.
> >          */
> >         extra_bytes = convert_to_dummy_bytes(dummy_clk_cycles, lines);
> >     } else {
> >         /* Model individual dummy clock cycles as byte writes */
> >         extra_bytes = dummy_clk_cycles;
> >     }
> >
> >     s->needed_bytes += extra_bytes;
> >     s->pos = 0;
> >     s->len = 0;
> >     s->state = STATE_COLLECTING_DATA;
> > }
> >
> > Best regards, > > Francisco Iglesias > > > > > > > > > > > > > > - Dummy bytes not prepared by drivers. Drivers just tell the hardware > > > > > the dummy cycle configuration via some registers, and hardware will > > > > > automatically generate dummy cycles for us. Fixing the m25p80 model > > > > > is not enough, and we will need to fix the SPI/QSPI models for such > > > > > controllers.
> > > > > > > > > > This series fixes the mess in the m25p80 from the flash side first, > > > > > > > > Considering the problems solved by the solution in tree I find m25p80 pretty > > > > clean, at least I don't see any clearly better way for accurately modeling the > > > > dummy clock cycles. Counting bits instead of bytes would for example still > > > > force the controllers to mark which bits to count (when transmitting one dummy > > > > byte from a txfifo on four lines (Quad command) it generates 2 dummy clock > > > > cycles since it takes two cycles to transfer 8 bits). > > > > > > > > > > SPI is a bit based protocol, not bytes. If you insist on bit modeling > > > with the dummy cycles then you should also suggest we change all > > > cycles (including command/addr/dummy/data phases) to be modeled with > > > bits. That way we can accurately emulate everything, for example one > > > potential problem like transferring 9 bit in the data phase. > > > > > > However modeling everything with bit is super inefficient. My view is > > > that we should avoid trying to support uncommon use cases (like not > > > multiple of 8 for dummy bits) in QEMU. > > Regards, > Bin
Hi Francisco, On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias <frasse.iglesias@gmail.com> wrote: > > Hi Bin, > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote: > > Hi Francisco, > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias > > <frasse.iglesias@gmail.com> wrote: > > > > > > Hi Bin, > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote: > > > > Hi Francisco, > > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias > > > > <frasse.iglesias@gmail.com> wrote: > > > > > > > > > > Hi Bin, > > > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote: > > > > > > From: Bin Meng <bin.meng@windriver.com> > > > > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up > > > > > > bytes are expected to be received after it receives a command. For > > > > > > example, depending on the address mode, either 3-byte address or > > > > > > 4-byte address is needed. > > > > > > > > > > > > For fast read family commands, some dummy cycles are required after > > > > > > sending the address bytes, and the dummy cycles need to be counted > > > > > > in s->needed_bytes. This is where the mess began. > > > > > > > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte. > > > > > > It is not in bit, or cycle. However for some reason the model has > > > > > > been using the number of dummy cycles for s->needed_bytes. The right > > > > > > approach is to convert the number of dummy cycles to bytes based on > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8). 
> > > > > > > > > > While not being the original implementor I must assume that above solution was > > > > > considered but not chosen by the developers due to it is inaccuracy (it > > > > > wouldn't be possible to model exacly 6 dummy cycles, only a multiple of 8, > > > > > meaning that if the controller is wrongly programmed to generate 7 the error > > > > > wouldn't be caught and the controller will still be considered "correct"). Now > > > > > that we have this detail in the implementation I'm in favor of keeping it, this > > > > > also because the detail is already in use for catching exactly above error. > > > > > > > > > > > > > I found no clue from the commit message that my proposed solution here > > > > was ever considered, otherwise all SPI controller models supporting > > > > software generation should have been found out seriously broken long > > > > time ago! > > > > > > > > > The controllers you are referring to might lack support for commands requiring > > > dummy clock cycles but I really hope they work with the other commands? If so I > > > > I am not sure why you view dummy clock cycles as something special > > that needs some special support from the SPI controller. For the case > > 1 controller, it's nothing special from the controller perspective, > > just like sending out a command, or address bytes, or data. The > > controller just shifts data bit by bit from its tx fifo and that's it. > > In the Xilinx GQSPI controller case, the dummy cycles can either be > > sent via a regular data (the case 1 controller) in the tx fifo, or > > automatically generated (case 2 controller) by the hardware. > > Ok, I'll try to explain my view point a little differently. 
For that we also > need to keep in mind that QEMU models HW, and any binary that runs on a HW > board supported in QEMU should ideally run on that board inside QEMU aswell > (this can be a bare metal application equaly well as a modified u-boot/Linux > using SPI commands with a non multiple of 8 number of dummy clock cycles). > > Once functionality has been introduced into QEMU it is not easy to know which > intentional or untentional features provided by the functionality are being > used by users. One of the (perhaps not well known) features I'm aware of that > is in use and is provided by the accurate dummy clock cycle modeling inside > m25p80 is the be ability to test drivers accurately regarding the dummy clock > cycles (even when using commands with a non-multiple of 8 number of dummy clock > cycles), but there might be others aswell. So by removing this functionality > above use case will brake, this since those test will not be reliable. > Furthermore, since users tend to be creative it is not possible to know if > there are other use cases that will be affected. This means that in case [1] > needs to be followed the safe path is to add functionality instead of removing. > Luckily it also easier in this case, see below. I understand there might be users other than U-Boot/Linux that use an odd number of dummy bits (not multiple of 8). If your concern was about model behavior changes, sure I can update qemu/docs/system/deprecated.rst to mention that some flashes in the m25p80 model now implement dummy cycles as bytes. > > > > > don't think it is fair to call them 'seriously broken' (and else we should > > > probably let the maintainers know about it). Most likely the lack of support > > > > I called it "seriously broken" because current implementation only > > considered one type of SPI controllers while completely ignoring the > > other type. 
> > If we change view and see this from the perspective of m25p80, it models the > commands a certain way and provides an API that the SPI controllers need to > implement for interacting with it. It is true that there are SPI controllers > referred to above that do not support the portion of that API that corresponds > to commands with dummy clock cycles, but I don't think it is true that this is > broken since there is also one SPI controller that has a working implementation > of m25p80's full API also when transfering through a tx fifo (use case 1). But > as mentioned above, by doing a minor extension and improvement to m25p80's API > and allow for toggling the accuracy from dummy clock cycles to dummy bytes [1] > will still be honored as in the same time making it possible to have full > support for the API in the SPI controllers that currently do not (please reread > the proposal in my previous reply that attempts to do this). I myself see this > as win/win situation, also because no controller should need modifications. > I am afraid your proposal does not work. Your proposed new device property 'model_dummy_bytes', which would select whether m25p80 converts the accurate dummy clock cycle count to dummy bytes, is hard to justify as a property of the flash itself, because the behavior is tightly coupled to how the SPI controller works. Please take a look at the Xilinx GQSPI controller, which supports both use cases: the dummy cycles can either be transferred via the tx fifo or be generated automatically by the controller. Please read the example given in table 24‐22 (an example of Generic FIFO Contents for Quad I/O Read Command (EBh)) in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf If you set the m25p80 device property 'model_dummy_bytes' to true when working with the Xilinx GQSPI controller, guest software is restricted to using the tx fifo to transfer the dummy cycles, and this is wrong.
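Bin's objection is that the dummy-cycle encoding is chosen per transfer by the guest driver, not fixed per flash device. The C sketch below illustrates this under assumptions: `enum dummy_source`, `struct dummy_phase` and `dummy_phase_cycles()` are hypothetical names, not QEMU or Xilinx APIs. It also encodes Francisco's earlier observation that one dummy byte shifted out of a tx fifo on four data lines produces 2 clock cycles.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: how the dummy phase of a single transfer reaches the flash. */
enum dummy_source {
    DUMMY_VIA_TX_FIFO,  /* driver writes dummy bytes into the tx fifo */
    DUMMY_HW_GENERATED, /* controller clocks the dummy cycles itself */
};

/* Hypothetical description of one transfer's dummy phase. */
struct dummy_phase {
    enum dummy_source source;
    uint32_t count;    /* bytes for DUMMY_VIA_TX_FIFO, cycles otherwise */
    uint32_t buswidth; /* number of data lines: 1, 2 or 4 */
};

/* Clock cycles the flash actually observes during the dummy phase. */
static uint32_t dummy_phase_cycles(const struct dummy_phase *p)
{
    assert(p->buswidth == 1 || p->buswidth == 2 || p->buswidth == 4);
    if (p->source == DUMMY_VIA_TX_FIFO) {
        /* each byte is 8 bits, shifted out buswidth bits per clock */
        return p->count * 8 / p->buswidth;
    }
    return p->count;
}
```

Per Bin's reading of table 24‐22, the same EBh read with 6 dummy cycles could reach the flash either as `{ DUMMY_VIA_TX_FIFO, 3, 4 }` or as `{ DUMMY_HW_GENERATED, 6, 4 }`. Both describe 6 clock cycles, so a single per-flash switch such as the proposed 'model_dummy_bytes' cannot tell the model which encoding the next transfer will use.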
> > > > > > for the commands is because no request has been made for them. Also there is > > > one controller that has support. > > > > Definitely it's not "no request". Nearly all SPI flashes support the > > Fast Read (0Bh) command today, and 0Bh requires a dummy cycle. This is > > "seriously broken" for those case 1 type controllers because they > > cannot read anything from the m25p80 model at all. Unless the guest > > software being tested only uses Read (03h) command which is not > > affected. But I can't find a software that uses Read instead of Fast > > Read. > > > > > > The issue you pointed out that we require the total number of dummy > > > > bits should be multiple of 8 is true, that's why I added the > > > > unimplemented log message in this series (patch 2/3/4) to warn users > > > > if this expectation is not met. However this will not cause any issue > > > > when running U-Boot or Linux, because both spi-nor drivers expect the > > > > same assumption as we do here. > > > > > > > > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(), > > > > there is a logic to calculate the dummy bytes needed for fast read > > > > command: > > > > > > > > /* convert the dummy cycles to the number of bytes */ > > > > op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8; > > > > > > > > Note the default dummy cycles configuration for all flashes I have > > > > looked into as of today, meets the multiple of 8 assumption. On some > > > > flashes the dummy cycle number is configurable, and if it's been > > > > configured to be an odd value, it would not work on U-Boot/Linux in > > > > the first place. > > > > > > > > > > > > > > > > Things get complicated when interacting with different SPI or QSPI > > > > > > flash controllers. There are major two cases: > > > > > > > > > > > > - Dummy bytes prepared by drivers, and wrote to the controller fifo. 
> > > > > > For such case, driver will calculate the correct number of dummy > > > > > > bytes and write them into the tx fifo. Fixing the m25p80 model will > > > > > > fix flashes working with such controllers. > > > > > > > > > > Above can be fixed while still keeping the detailed dummy cycle implementation > > > > > inside m25p80. Perhaps one of the following could be looked into: configurating > > > > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting > > > > > some functionality handling this in the SPI controller. Or a mixture of above. > > > > > > > > Please send patches to explain this in detail how this is going to > > > > work. I am open to all possible solutions. > > > > > > In that case I suggest that you instead try with a device property > > > 'model_dummy_bytes' used to select to convert the accurate dummy clock cycle > > > count to dummy bytes inside m25p80. Below is an example on how to modify the > > > > No this is wrong in my view. This is not like a DMA vs. PIO handling. > > > > > decode_fast_read_cmd function (the other commands requiring dummy clock cycles > > > can follow a similar pattern). This way the fifo mode will be able to work the > > > way you desire while also keeping the current functionality intact. Suddenly > > > removing functionality (features) will take users by surprise. > > > > I don't think we are removing any features. This is a fix to make the > > model to be used by any SPI controllers. > > > > As I pointed out, both U-Boot and Linux have the multiple of 8 > > assumption for the dummy bit, which is the default configuration for > > all flashes I have looked into so far. Can you please comment what use > > case you want to support? I requested a U-Boot/Linux kernel testing in > > the previous SST thread [1] against Xilinx GQSPI but there was no > > response. > > In [2] instructions on how to boot u-boot/Linux is found. 
For building the > various software components I followed the official doc in [3]. I see the following QEMU commands are used to test booting U-Boot/Linux: $ qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m 4G -serial stdio -display none -device loader,file=u-boot.elf -kernel bl31.elf -device loader,addr=0x40000000,file=Image -device loader,addr=0x2000000,file=system.dtb I am not sure where the system.dtb gets built from? In [3], it mentions the Xilinx QEMU is used. And a different QEMU command is used as the example to launch U-Boot which is different from your command above. See https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18841606/QEMU+-+Zynq+UltraScale+MPSoC#QEMU-ZynqUltraScale+MPSoC-RunningaZynqUltraScale+U-bootImageOnXilinx'sARMQEMU $ ./aarch64-softmmu/qemu-system-aarch64 -M arm-generic-fdt -serial mon:stdio -serial /dev/null -display none \ -device loader,addr=0xfd1a0104,data=0x8000000e,data-len=4 \ # Un-reset the A53 -device loader,file=./pre-built/linux/images/bl31.elf,cpu-num=0 \ # ARM Trusted Firmware -device loader,file=./pre-built/linux/images/u-boot.elf\ # The u-boot exectuable -hw-dtb ./pre-built/linux/images/zynqmp-qemu-arm.dtb # HW Device Tree that QEMU uses to generate the model It is using a machine called "arm-generic-fdt", but in the mainline QEMU there is no such machine called "arm-generic-fdt". > > Best regards, > Francisco > > [1] qemu/docs/system/deprecated.rst > [2] https://github.com/franciscoIglesias/qemu-cmdline/blob/master/xlnx-zcu102-atf-u-boot-linux.md > [3] https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/460653138/Xilinx+Open+Source+Linux > Regards, Bin
Hi Bin, On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote: > Hi Francisco, > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias > <frasse.iglesias@gmail.com> wrote: > > > > Hi Bin, > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote: > > > Hi Francisco, > > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias > > > <frasse.iglesias@gmail.com> wrote: > > > > > > > > Hi Bin, > > > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote: > > > > > Hi Francisco, > > > > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias > > > > > <frasse.iglesias@gmail.com> wrote: > > > > > > > > > > > > Hi Bin, > > > > > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote: > > > > > > > From: Bin Meng <bin.meng@windriver.com> > > > > > > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up > > > > > > > bytes are expected to be received after it receives a command. For > > > > > > > example, depending on the address mode, either 3-byte address or > > > > > > > 4-byte address is needed. > > > > > > > > > > > > > > For fast read family commands, some dummy cycles are required after > > > > > > > sending the address bytes, and the dummy cycles need to be counted > > > > > > > in s->needed_bytes. This is where the mess began. > > > > > > > > > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte. > > > > > > > It is not in bit, or cycle. However for some reason the model has > > > > > > > been using the number of dummy cycles for s->needed_bytes. The right > > > > > > > approach is to convert the number of dummy cycles to bytes based on > > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8). 
> > > > > > > > > > > > While not being the original implementor I must assume that above solution was > > > > > > considered but not chosen by the developers due to it is inaccuracy (it > > > > > > wouldn't be possible to model exacly 6 dummy cycles, only a multiple of 8, > > > > > > meaning that if the controller is wrongly programmed to generate 7 the error > > > > > > wouldn't be caught and the controller will still be considered "correct"). Now > > > > > > that we have this detail in the implementation I'm in favor of keeping it, this > > > > > > also because the detail is already in use for catching exactly above error. > > > > > > > > > > > > > > > > I found no clue from the commit message that my proposed solution here > > > > > was ever considered, otherwise all SPI controller models supporting > > > > > software generation should have been found out seriously broken long > > > > > time ago! > > > > > > > > > > > > The controllers you are referring to might lack support for commands requiring > > > > dummy clock cycles but I really hope they work with the other commands? If so I > > > > > > I am not sure why you view dummy clock cycles as something special > > > that needs some special support from the SPI controller. For the case > > > 1 controller, it's nothing special from the controller perspective, > > > just like sending out a command, or address bytes, or data. The > > > controller just shifts data bit by bit from its tx fifo and that's it. > > > In the Xilinx GQSPI controller case, the dummy cycles can either be > > > sent via a regular data (the case 1 controller) in the tx fifo, or > > > automatically generated (case 2 controller) by the hardware. > > > > Ok, I'll try to explain my view point a little differently. 
For that we also > > need to keep in mind that QEMU models HW, and any binary that runs on a HW > > board supported in QEMU should ideally run on that board inside QEMU aswell > > (this can be a bare metal application equaly well as a modified u-boot/Linux > > using SPI commands with a non multiple of 8 number of dummy clock cycles). > > > > Once functionality has been introduced into QEMU it is not easy to know which > > intentional or untentional features provided by the functionality are being > > used by users. One of the (perhaps not well known) features I'm aware of that > > is in use and is provided by the accurate dummy clock cycle modeling inside > > m25p80 is the be ability to test drivers accurately regarding the dummy clock > > cycles (even when using commands with a non-multiple of 8 number of dummy clock > > cycles), but there might be others aswell. So by removing this functionality > > above use case will brake, this since those test will not be reliable. > > Furthermore, since users tend to be creative it is not possible to know if > > there are other use cases that will be affected. This means that in case [1] > > needs to be followed the safe path is to add functionality instead of removing. > > Luckily it also easier in this case, see below. > > I understand there might be users other than U-Boot/Linux that use an > odd number of dummy bits (not multiple of 8). If your concern was > about model behavior changes, sure I can update > qemu/docs/system/deprecated.rst to mention that some flashes in the > m25p80 model now implement dummy cycles as bytes. Yes, something like that. My concern is that since this functionality has been in tree for a while, users have found known or unknown features that were introduced by it. By removing the functionality (and those known/unknown features) we risk breaking our users' use cases (currently I'm aware of one feature/use case, but there may well be more).
[1] states that "In general features are intended to be supported indefinitely once introduced into QEMU"; to me that makes a lot of sense, because the opposite would mean that we were not reliable. So if [1] needs to be honored, it looks safer to add functionality than to remove it (and risk removing use cases/features). Luckily, I still believe that in this case going forward will be easier (even though I also agree with what you say below about my proposal). > > > > > > > > don't think it is fair to call them 'seriously broken' (and else we should > > > > probably let the maintainers know about it). Most likely the lack of support > > > > > > I called it "seriously broken" because current implementation only > > > considered one type of SPI controllers while completely ignoring the > > > other type. > > > > If we change view and see this from the perspective of m25p80, it models the > > commands a certain way and provides an API that the SPI controllers need to > > implement for interacting with it. It is true that there are SPI controllers > > referred to above that do not support the portion of that API that corresponds > > to commands with dummy clock cycles, but I don't think it is true that this is > > broken since there is also one SPI controller that has a working implementation > > of m25p80's full API also when transfering through a tx fifo (use case 1). But > > as mentioned above, by doing a minor extension and improvement to m25p80's API > > and allow for toggling the accuracy from dummy clock cycles to dummy bytes [1] > > will still be honored as in the same time making it possible to have full > > support for the API in the SPI controllers that currently do not (please reread > > the proposal in my previous reply that attempts to do this). I myself see this > > as win/win situation, also because no controller should need modifications. > > > > I am afraid your proposal does not work.
Your proposed new device > property 'model_dummy_bytes' to select to convert the accurate dummy > clock cycle count to dummy bytes inside m25p80, is hard to justify as > a property to the flash itself, as the behavior is tightly coupled to > how the SPI controller works. I agree with the above. Instead of posting sample code here, though, I decided to post an RFC with what I hope is an improved proposal. I'll cc you. Regarding the point below, the Xilinx ZynqMP GQSPI should not need any modification as a first step. > > Please take a look at the Xilinx GQSPI controller, which supports both > use cases, that the dummy cycles can be transferred via tx fifo, or > generated by the controller automatically. Please read the example > given in: > > table 24‐22, an example of Generic FIFO Contents for Quad I/O Read > Command (EBh) > > in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf > > If you choose to set the m25p80 device property 'model_dummy_bytes' to > true when working with the Xilinx GQSPI controller, you are bound to > only allow guest software to use tx fifo to transfer the dummy cycles, > and this is wrong. > > > > > > > > > > for the commands is because no request has been made for them. Also there is > > > > one controller that has support. > > > > > > Definitely it's not "no request". Nearly all SPI flashes support the > > > Fast Read (0Bh) command today, and 0Bh requires a dummy cycle. This is > > > "seriously broken" for those case 1 type controllers because they > > > cannot read anything from the m25p80 model at all. Unless the guest > > > software being tested only uses Read (03h) command which is not > > > affected. But I can't find a software that uses Read instead of Fast > > > Read.
> > > > > > > > The issue you pointed out that we require the total number of dummy > > > > > bits should be multiple of 8 is true, that's why I added the > > > > > unimplemented log message in this series (patch 2/3/4) to warn users > > > > > if this expectation is not met. However this will not cause any issue > > > > > when running U-Boot or Linux, because both spi-nor drivers expect the > > > > > same assumption as we do here. > > > > > > > > > > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(), > > > > > there is a logic to calculate the dummy bytes needed for fast read > > > > > command: > > > > > > > > > > /* convert the dummy cycles to the number of bytes */ > > > > > op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8; > > > > > > > > > > Note the default dummy cycles configuration for all flashes I have > > > > > looked into as of today, meets the multiple of 8 assumption. On some > > > > > flashes the dummy cycle number is configurable, and if it's been > > > > > configured to be an odd value, it would not work on U-Boot/Linux in > > > > > the first place. > > > > > > > > > > > > > > > > > > > Things get complicated when interacting with different SPI or QSPI > > > > > > > flash controllers. There are major two cases: > > > > > > > > > > > > > > - Dummy bytes prepared by drivers, and wrote to the controller fifo. > > > > > > > For such case, driver will calculate the correct number of dummy > > > > > > > bytes and write them into the tx fifo. Fixing the m25p80 model will > > > > > > > fix flashes working with such controllers. > > > > > > > > > > > > Above can be fixed while still keeping the detailed dummy cycle implementation > > > > > > inside m25p80. Perhaps one of the following could be looked into: configurating > > > > > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting > > > > > > some functionality handling this in the SPI controller. Or a mixture of above. 
> > > > > > > > > > Please send patches to explain this in detail how this is going to > > > > > work. I am open to all possible solutions. > > > > > > > > In that case I suggest that you instead try with a device property > > > > 'model_dummy_bytes' used to select to convert the accurate dummy clock cycle > > > > count to dummy bytes inside m25p80. Below is an example on how to modify the > > > > > > No this is wrong in my view. This is not like a DMA vs. PIO handling. > > > > > > > decode_fast_read_cmd function (the other commands requiring dummy clock cycles > > > > can follow a similar pattern). This way the fifo mode will be able to work the > > > > way you desire while also keeping the current functionality intact. Suddenly > > > > removing functionality (features) will take users by surprise. > > > > > > I don't think we are removing any features. This is a fix to make the > > > model to be used by any SPI controllers. > > > > > > As I pointed out, both U-Boot and Linux have the multiple of 8 > > > assumption for the dummy bit, which is the default configuration for > > > all flashes I have looked into so far. Can you please comment what use > > > case you want to support? I requested a U-Boot/Linux kernel testing in > > > the previous SST thread [1] against Xilinx GQSPI but there was no > > > response. > > > > In [2] instructions on how to boot u-boot/Linux is found. For building the > > various software components I followed the official doc in [3]. > > I see the following QEMU commands are used to test booting U-Boot/Linux: > > $ qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m 4G > -serial stdio -display none -device loader,file=u-boot.elf -kernel > bl31.elf -device loader,addr=0x40000000,file=Image -device > loader,addr=0x2000000,file=system.dtb > > I am not sure where the system.dtb gets built from? It is the instructions in [2] to look into. 'system.dtb' is the kernel dtb for zcu102 ([2] has been fixed). 
I created [2] purely for you, so respectfully I will ask you to try a little first before asking for further guidance. Best regards, Francisco Iglesias [1] qemu/docs/system/deprecated.rst [2] https://github.com/franciscoIglesias/qemu-cmdline/blob/master/xlnx-zcu102-atf-u-boot-linux.md > > In [3], it mentions the Xilinx QEMU is used. And a different QEMU > command is used as the example to launch U-Boot which is different > from your command above. > > See https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18841606/QEMU+-+Zynq+UltraScale+MPSoC#QEMU-ZynqUltraScale+MPSoC-RunningaZynqUltraScale+U-bootImageOnXilinx'sARMQEMU > > $ ./aarch64-softmmu/qemu-system-aarch64 -M arm-generic-fdt -serial > mon:stdio -serial /dev/null -display none \ > -device loader,addr=0xfd1a0104,data=0x8000000e,data-len=4 \ # Un-reset the A53 > -device loader,file=./pre-built/linux/images/bl31.elf,cpu-num=0 \ # > ARM Trusted Firmware > -device loader,file=./pre-built/linux/images/u-boot.elf\ # The > u-boot exectuable > -hw-dtb ./pre-built/linux/images/zynqmp-qemu-arm.dtb # HW Device > Tree that QEMU uses to generate the model > > It is using a machine called "arm-generic-fdt", but in the mainline > QEMU there is no such machine called "arm-generic-fdt". > > > > > Best regards, > > Francisco > > > > [1] qemu/docs/system/deprecated.rst > > [2] https://github.com/franciscoIglesias/qemu-cmdline/blob/master/xlnx-zcu102-atf-u-boot-linux.md > > [3] https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/460653138/Xilinx+Open+Source+Linux > > > > Regards, > Bin
Hi Francisco, On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias <frasse.iglesias@gmail.com> wrote: > > Hi Bin, > > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote: > > Hi Francisco, > > > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias > > <frasse.iglesias@gmail.com> wrote: > > > > > > Hi Bin, > > > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote: > > > > Hi Francisco, > > > > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias > > > > <frasse.iglesias@gmail.com> wrote: > > > > > > > > > > Hi Bin, > > > > > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote: > > > > > > Hi Francisco, > > > > > > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias > > > > > > <frasse.iglesias@gmail.com> wrote: > > > > > > > > > > > > > > Hi Bin, > > > > > > > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote: > > > > > > > > From: Bin Meng <bin.meng@windriver.com> > > > > > > > > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up > > > > > > > > bytes are expected to be received after it receives a command. For > > > > > > > > example, depending on the address mode, either 3-byte address or > > > > > > > > 4-byte address is needed. > > > > > > > > > > > > > > > > For fast read family commands, some dummy cycles are required after > > > > > > > > sending the address bytes, and the dummy cycles need to be counted > > > > > > > > in s->needed_bytes. This is where the mess began. > > > > > > > > > > > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte. > > > > > > > > It is not in bit, or cycle. However for some reason the model has > > > > > > > > been using the number of dummy cycles for s->needed_bytes. The right > > > > > > > > approach is to convert the number of dummy cycles to bytes based on > > > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad > > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8). 
> > > > > > > > > > > > > > While not being the original implementor I must assume that above solution was > > > > > > > considered but not chosen by the developers due to it is inaccuracy (it > > > > > > > wouldn't be possible to model exacly 6 dummy cycles, only a multiple of 8, > > > > > > > meaning that if the controller is wrongly programmed to generate 7 the error > > > > > > > wouldn't be caught and the controller will still be considered "correct"). Now > > > > > > > that we have this detail in the implementation I'm in favor of keeping it, this > > > > > > > also because the detail is already in use for catching exactly above error. > > > > > > > > > > > > > > > > > > > I found no clue from the commit message that my proposed solution here > > > > > > was ever considered, otherwise all SPI controller models supporting > > > > > > software generation should have been found out seriously broken long > > > > > > time ago! > > > > > > > > > > > > > > > The controllers you are referring to might lack support for commands requiring > > > > > dummy clock cycles but I really hope they work with the other commands? If so I > > > > > > > > I am not sure why you view dummy clock cycles as something special > > > > that needs some special support from the SPI controller. For the case > > > > 1 controller, it's nothing special from the controller perspective, > > > > just like sending out a command, or address bytes, or data. The > > > > controller just shifts data bit by bit from its tx fifo and that's it. > > > > In the Xilinx GQSPI controller case, the dummy cycles can either be > > > > sent via a regular data (the case 1 controller) in the tx fifo, or > > > > automatically generated (case 2 controller) by the hardware. > > > > > > Ok, I'll try to explain my view point a little differently. 
For that we also > > > need to keep in mind that QEMU models HW, and any binary that runs on a HW > > > board supported in QEMU should ideally run on that board inside QEMU aswell > > > (this can be a bare metal application equaly well as a modified u-boot/Linux > > > using SPI commands with a non multiple of 8 number of dummy clock cycles). > > > > > > Once functionality has been introduced into QEMU it is not easy to know which > > > intentional or untentional features provided by the functionality are being > > > used by users. One of the (perhaps not well known) features I'm aware of that > > > is in use and is provided by the accurate dummy clock cycle modeling inside > > > m25p80 is the be ability to test drivers accurately regarding the dummy clock > > > cycles (even when using commands with a non-multiple of 8 number of dummy clock > > > cycles), but there might be others aswell. So by removing this functionality > > > above use case will brake, this since those test will not be reliable. > > > Furthermore, since users tend to be creative it is not possible to know if > > > there are other use cases that will be affected. This means that in case [1] > > > needs to be followed the safe path is to add functionality instead of removing. > > > Luckily it also easier in this case, see below. > > > > I understand there might be users other than U-Boot/Linux that use an > > odd number of dummy bits (not multiple of 8). If your concern was > > about model behavior changes, sure I can update > > qemu/docs/system/deprecated.rst to mention that some flashes in the > > m25p80 model now implement dummy cycles as bytes. > > Yes, something like that. My concern is that since this functionality has been > in tree for while, users have found known or unknown features that got > introduced by it. 
By removing the functionality (and the known/uknown features) > we are riscing to brake our user's use cases (currently I'm aware of one > feature/use case but it is not unlikely that there are more). [1] states that > "In general features are intended to be supported indefinitely once introduced > into QEMU", to me that makes very much sense because the opposite would mean > that we were not reliable. So in case [1] needs to be honored it looks to be > safer to add functionality instead of removing (and riscing the removal of use > cases/features). Luckily I still believe in this case that it will be easier to > go forward (even if I also agree on what you are saying below about what I > proposed). > Even if the implementation is buggy and we would need to keep the buggy implementation forever? I think that is exactly why qemu/docs/system/deprecated.rst was created: for deprecating such features. > > > > > > > > > > > don't think it is fair to call them 'seriously broken' (and else we should > > > > > probably let the maintainers know about it). Most likely the lack of support > > > > > > > > I called it "seriously broken" because current implementation only > > > > considered one type of SPI controllers while completely ignoring the > > > > other type. > > > > > > If we change view and see this from the perspective of m25p80, it models the > > > commands a certain way and provides an API that the SPI controllers need to > > > implement for interacting with it. It is true that there are SPI controllers > > > referred to above that do not support the portion of that API that corresponds > > > to commands with dummy clock cycles, but I don't think it is true that this is > > > broken since there is also one SPI controller that has a working implementation > > > of m25p80's full API also when transfering through a tx fifo (use case 1).
But > > > as mentioned above, by doing a minor extension and improvement to m25p80's API > > > and allow for toggling the accuracy from dummy clock cycles to dummy bytes [1] > > > will still be honored as in the same time making it possible to have full > > > support for the API in the SPI controllers that currently do not (please reread > > > the proposal in my previous reply that attempts to do this). I myself see this > > > as win/win situation, also because no controller should need modifications. > > > > > > > I am afraid your proposal does not work. Your proposed new device > > property 'model_dummy_bytes' to select to convert the accurate dummy > > clock cycle count to dummy bytes inside m25p80, is hard to justify as > > a property to the flash itself, as the behavior is tightly coupled to > > how the SPI controller works. > > I agree on above. I decided though that instead of posting sample code in here > I'll post an RFC with hopefully an improved proposal. I'll cc you. About below, > Xilinx ZynqMP GQSPI should not need any modication in a first step. > Wait, (see below) > > > > Please take a look at the Xilinx GQSPI controller, which supports both > > use cases, that the dummy cycles can be transferred via tx fifo, or > > generated by the controller automatically. Please read the example > > given in: > > > > table 24‐22, an example of Generic FIFO Contents for Quad I/O Read > > Command (EBh) > > > > in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf > > > > If you choose to set the m25p80 device property 'model_dummy_bytes' to > > true when working with the Xilinx GQSPI controller, you are bound to > > only allow guest software to use tx fifo to transfer the dummy cycles, > > and this is wrong. > > You missed this part. I looked at your RFC, and as I mentioned above your proposal cannot support a complicated controller like the Xilinx GQSPI. Please read the example of table 24-22.
With your RFC, you mandate guest software's GQSPI driver to only use hardware dummy cycle generation, which is wrong. > > > > > > > > > > > > for the commands is because no request has been made for them. Also there is > > > > > one controller that has support. > > > > > > > > Definitely it's not "no request". Nearly all SPI flashes support the > > > > Fast Read (0Bh) command today, and 0Bh requires a dummy cycle. This is > > > > "seriously broken" for those case 1 type controllers because they > > > > cannot read anything from the m25p80 model at all. Unless the guest > > > > software being tested only uses Read (03h) command which is not > > > > affected. But I can't find a software that uses Read instead of Fast > > > > Read. > > > > > > > > > > The issue you pointed out that we require the total number of dummy > > > > > > bits should be multiple of 8 is true, that's why I added the > > > > > > unimplemented log message in this series (patch 2/3/4) to warn users > > > > > > if this expectation is not met. However this will not cause any issue > > > > > > when running U-Boot or Linux, because both spi-nor drivers expect the > > > > > > same assumption as we do here. > > > > > > > > > > > > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(), > > > > > > there is a logic to calculate the dummy bytes needed for fast read > > > > > > command: > > > > > > > > > > > > /* convert the dummy cycles to the number of bytes */ > > > > > > op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8; > > > > > > > > > > > > Note the default dummy cycles configuration for all flashes I have > > > > > > looked into as of today, meets the multiple of 8 assumption. On some > > > > > > flashes the dummy cycle number is configurable, and if it's been > > > > > > configured to be an odd value, it would not work on U-Boot/Linux in > > > > > > the first place. 
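The conversion that both spi-nor drivers perform, and the multiple-of-8 restriction that the series warns about, can be sketched in C as below. This is a minimal illustration, not code from QEMU, U-Boot or Linux: `dummy_cycles_to_bytes()` is a hypothetical helper that only assumes the formula quoted above, nbytes = cycles * buswidth / 8, and reports when the result is not a whole number of bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring
 *     op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
 * Converts a dummy clock cycle count into dummy bytes for a bus with
 * 1, 2 or 4 data lines. Returns false when cycles * buswidth is not a
 * multiple of 8, i.e. the dummy phase cannot be expressed in whole bytes
 * (the case the series logs as unimplemented).
 */
static bool dummy_cycles_to_bytes(uint32_t cycles, uint32_t buswidth,
                                  uint32_t *nbytes)
{
    assert(buswidth == 1 || buswidth == 2 || buswidth == 4);

    uint32_t bits = cycles * buswidth;  /* bits clocked during the dummy phase */

    *nbytes = bits / 8;
    return (bits % 8) == 0;
}
```

For example, 6 dummy cycles on a quad bus (Fast Read Quad I/O, EBh) give 24 bits, i.e. 3 bytes, and 8 dummy cycles on a single line (Fast Read, 0Bh) give 1 byte. A flash configured for 7 quad cycles (28 bits) has no exact byte representation, which is exactly the accuracy that a byte-based model gives up.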
> > > > > > > > > > > > > > > > > > > > > > Things get complicated when interacting with different SPI or QSPI > > > > > > > > flash controllers. There are major two cases: > > > > > > > > > > > > > > > > - Dummy bytes prepared by drivers, and wrote to the controller fifo. > > > > > > > > For such case, driver will calculate the correct number of dummy > > > > > > > > bytes and write them into the tx fifo. Fixing the m25p80 model will > > > > > > > > fix flashes working with such controllers. > > > > > > > > > > > > > > Above can be fixed while still keeping the detailed dummy cycle implementation > > > > > > > inside m25p80. Perhaps one of the following could be looked into: configurating > > > > > > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting > > > > > > > some functionality handling this in the SPI controller. Or a mixture of above. > > > > > > > > > > > > Please send patches to explain this in detail how this is going to > > > > > > work. I am open to all possible solutions. > > > > > > > > > > In that case I suggest that you instead try with a device property > > > > > 'model_dummy_bytes' used to select to convert the accurate dummy clock cycle > > > > > count to dummy bytes inside m25p80. Below is an example on how to modify the > > > > > > > > No this is wrong in my view. This is not like a DMA vs. PIO handling. > > > > > > > > > decode_fast_read_cmd function (the other commands requiring dummy clock cycles > > > > > can follow a similar pattern). This way the fifo mode will be able to work the > > > > > way you desire while also keeping the current functionality intact. Suddenly > > > > > removing functionality (features) will take users by surprise. > > > > > > > > I don't think we are removing any features. This is a fix to make the > > > > model to be used by any SPI controllers. 
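To make the disagreement concrete, the two ways a flash model could account for the dummy phase of a fast read can be sketched side by side. This is a hypothetical stand-in, not QEMU's actual `decode_fast_read_cmd()`: `ToyFlash`, `toy_fast_read_needed()` and the `model_dummy_bytes` field are invented for illustration. The sketch assumes, as described in this thread, that the cycle-accurate mode counts one transfer per dummy clock cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flash state, loosely modelled on the needed_bytes idea. */
typedef struct {
    bool model_dummy_bytes;  /* hypothetical toggle from the proposal above */
    uint32_t needed_bytes;   /* follow-up transfers expected after the opcode */
} ToyFlash;

/*
 * Compute how many follow-up transfers a fast read needs: the address
 * bytes plus the dummy phase, counted either per clock cycle
 * (cycle-accurate) or in whole bytes on the given bus width.
 */
static void toy_fast_read_needed(ToyFlash *s, uint32_t addr_bytes,
                                 uint32_t dummy_cycles, uint32_t buswidth)
{
    assert(buswidth == 1 || buswidth == 2 || buswidth == 4);

    s->needed_bytes = addr_bytes;
    if (s->model_dummy_bytes) {
        /* One transfer per dummy byte, as a driver writes them to a tx fifo. */
        s->needed_bytes += dummy_cycles * buswidth / 8;
    } else {
        /* One transfer per dummy clock cycle. */
        s->needed_bytes += dummy_cycles;
    }
}
```

For Quad I/O Read (EBh) with a 3-byte address and 6 dummy cycles, the cycle-accurate mode expects 3 + 6 = 9 follow-up transfers while the byte mode expects 3 + 3 = 6. The objection raised in this thread is that which of the two is correct depends on the controller, not on the flash.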
> > > > > > > > As I pointed out, both U-Boot and Linux have the multiple of 8 > > > > assumption for the dummy bit, which is the default configuration for > > > > all flashes I have looked into so far. Can you please comment what use > > > > case you want to support? I requested a U-Boot/Linux kernel testing in > > > > the previous SST thread [1] against Xilinx GQSPI but there was no > > > > response. > > > > > > In [2] instructions on how to boot u-boot/Linux is found. For building the > > > various software components I followed the official doc in [3]. > > > > I see the following QEMU commands are used to test booting U-Boot/Linux: > > > > $ qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m 4G > > -serial stdio -display none -device loader,file=u-boot.elf -kernel > > bl31.elf -device loader,addr=0x40000000,file=Image -device > > loader,addr=0x2000000,file=system.dtb > > > > I am not sure where the system.dtb gets built from? > > It is the instructions in [2] to look into. 'system.dtb' is the kernel dtb for > zcu102 ([2] has been fixed). I created [2] purely for you, so respectfully I > will ask you to try a little first before asking for further guidance. > I tried, but without success. I removed the "-device loader" options that load the kernel image and the device tree, and focused only on booting U-Boot. The ATF bl31.elf was built from https://github.com/ARM-software/arm-trusted-firmware, by following build instructions at https://trustedfirmware-a.readthedocs.io/en/latest/plat/xilinx-zynqmp.html. U-Boot was built from the upstream U-Boot. $ ./qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m 4G -serial stdio -display none -device loader,file=u-boot.elf -kernel bl31.elf ERROR: Incorrect XILINX IDCODE 0x0, maskid 0x4600093 NOTICE: ATF running on XCZUUNKN/silicon v1/RTL0.0 at 0xfffea000 NOTICE: BL31: v2.4(release):v2.4-228-g337e493 NOTICE: BL31: Built : 21:18:14, Jan 20 2021 ERROR: BL31: Platform Management API version error.
Expected: v1.1 - Found: v0.0 ERROR: Error initializing runtime service sip_svc I also tried the Xilinx fork of ATF from https://github.com/Xilinx/arm-trusted-firmware, by following build instructions at https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842305/Build+ARM+Trusted+Firmware+ATF $ ./qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m 4G -serial stdio -display none -device loader,file=u-boot.elf -kernel bl31.elf ERROR: Incorrect XILINX IDCODE 0x0, maskid 0x4600093 NOTICE: ATF running on XCZUUNKN/silicon v1/RTL0.0 at 0xfffea000 NOTICE: BL31: v2.2(release):xilinx-v2020.2 NOTICE: BL31: Built : 21:52:38, Jan 20 2021 ERROR: BL31: Platform Management API version error. Expected: v1.1 - Found: v0.0 ERROR: Error initializing runtime service sip_svc Then I tried building U-Boot from the Xilinx fork at https://github.com/Xilinx/u-boot-xlnx/, but still without success. > Best regards, > Francisco Iglesias > > [1] qemu/docs/system/deprecated.rst > [2] https://github.com/franciscoIglesias/qemu-cmdline/blob/master/xlnx-zcu102-atf-u-boot-linux.md > > Regards, Bin
Dear Bin, On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote: > Hi Francisco, > > On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias > <frasse.iglesias@gmail.com> wrote: > > > > Hi Bin, > > > > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote: > > > Hi Francisco, > > > > > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias > > > <frasse.iglesias@gmail.com> wrote: > > > > > > > > Hi Bin, > > > > > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote: > > > > > Hi Francisco, > > > > > > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias > > > > > <frasse.iglesias@gmail.com> wrote: > > > > > > > > > > > > Hi Bin, > > > > > > > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote: > > > > > > > Hi Francisco, > > > > > > > > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias > > > > > > > <frasse.iglesias@gmail.com> wrote: > > > > > > > > > > > > > > > > Hi Bin, > > > > > > > > > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote: > > > > > > > > > From: Bin Meng <bin.meng@windriver.com> > > > > > > > > > > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up > > > > > > > > > bytes are expected to be received after it receives a command. For > > > > > > > > > example, depending on the address mode, either 3-byte address or > > > > > > > > > 4-byte address is needed. > > > > > > > > > > > > > > > > > > For fast read family commands, some dummy cycles are required after > > > > > > > > > sending the address bytes, and the dummy cycles need to be counted > > > > > > > > > in s->needed_bytes. This is where the mess began. > > > > > > > > > > > > > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte. > > > > > > > > > It is not in bit, or cycle. However for some reason the model has > > > > > > > > > been using the number of dummy cycles for s->needed_bytes. 
The right > > > > > > > > > approach is to convert the number of dummy cycles to bytes based on > > > > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad > > > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8). > > > > > > > > > > > > > > > > While not being the original implementor I must assume that above solution was > > > > > > > > considered but not chosen by the developers due to it is inaccuracy (it > > > > > > > > wouldn't be possible to model exacly 6 dummy cycles, only a multiple of 8, > > > > > > > > meaning that if the controller is wrongly programmed to generate 7 the error > > > > > > > > wouldn't be caught and the controller will still be considered "correct"). Now > > > > > > > > that we have this detail in the implementation I'm in favor of keeping it, this > > > > > > > > also because the detail is already in use for catching exactly above error. > > > > > > > > > > > > > > > > > > > > > > I found no clue from the commit message that my proposed solution here > > > > > > > was ever considered, otherwise all SPI controller models supporting > > > > > > > software generation should have been found out seriously broken long > > > > > > > time ago! > > > > > > > > > > > > > > > > > > The controllers you are referring to might lack support for commands requiring > > > > > > dummy clock cycles but I really hope they work with the other commands? If so I > > > > > > > > > > I am not sure why you view dummy clock cycles as something special > > > > > that needs some special support from the SPI controller. For the case > > > > > 1 controller, it's nothing special from the controller perspective, > > > > > just like sending out a command, or address bytes, or data. The > > > > > controller just shifts data bit by bit from its tx fifo and that's it. 
> > > > > In the Xilinx GQSPI controller case, the dummy cycles can either be > > > > > sent via a regular data (the case 1 controller) in the tx fifo, or > > > > > automatically generated (case 2 controller) by the hardware. > > > > > > > > Ok, I'll try to explain my view point a little differently. For that we also > > > > need to keep in mind that QEMU models HW, and any binary that runs on a HW > > > > board supported in QEMU should ideally run on that board inside QEMU aswell > > > > (this can be a bare metal application equaly well as a modified u-boot/Linux > > > > using SPI commands with a non multiple of 8 number of dummy clock cycles). > > > > > > > > Once functionality has been introduced into QEMU it is not easy to know which > > > > intentional or untentional features provided by the functionality are being > > > > used by users. One of the (perhaps not well known) features I'm aware of that > > > > is in use and is provided by the accurate dummy clock cycle modeling inside > > > > m25p80 is the be ability to test drivers accurately regarding the dummy clock > > > > cycles (even when using commands with a non-multiple of 8 number of dummy clock > > > > cycles), but there might be others aswell. So by removing this functionality > > > > above use case will brake, this since those test will not be reliable. > > > > Furthermore, since users tend to be creative it is not possible to know if > > > > there are other use cases that will be affected. This means that in case [1] > > > > needs to be followed the safe path is to add functionality instead of removing. > > > > Luckily it also easier in this case, see below. > > > > > > I understand there might be users other than U-Boot/Linux that use an > > > odd number of dummy bits (not multiple of 8). If your concern was > > > about model behavior changes, sure I can update > > > qemu/docs/system/deprecated.rst to mention that some flashes in the > > > m25p80 model now implement dummy cycles as bytes. 
> > > > Yes, something like that. My concern is that since this functionality has been > > in tree for while, users have found known or unknown features that got > > introduced by it. By removing the functionality (and the known/uknown features) > > we are riscing to brake our user's use cases (currently I'm aware of one > > feature/use case but it is not unlikely that there are more). [1] states that > > "In general features are intended to be supported indefinitely once introduced > > into QEMU", to me that makes very much sense because the opposite would mean > > that we were not reliable. So in case [1] needs to be honored it looks to be > > safer to add functionality instead of removing (and riscing the removal of use > > cases/features). Luckily I still believe in this case that it will be easier to > > go forward (even if I also agree on what you are saying below about what I > > proposed). > > > > Even if the implementation is buggy and we need to keep the buggy > implementation forever? I think that's why > qemu/docs/system/deprecated.rst was created for deprecating such > feature. With the RFC I posted, all commands in m25p80 work with both the case 1 controller (using a tx fifo) and the case 2 controller (no tx fifo, as with GQSPI). Because of this, with all respect, I have to disagree that this is buggy. > > > > > > > > > > > > > > > don't think it is fair to call them 'seriously broken' (and else we should > > > > > > probably let the maintainers know about it). Most likely the lack of support > > > > > > > > > > I called it "seriously broken" because current implementation only > > > > > considered one type of SPI controllers while completely ignoring the > > > > > other type. > > > > > > > > If we change view and see this from the perspective of m25p80, it models the > > > > commands a certain way and provides an API that the SPI controllers need to > > > > implement for interacting with it.
It is true that there are SPI controllers > > > > referred to above that do not support the portion of that API that corresponds > > > > to commands with dummy clock cycles, but I don't think it is true that this is > > > > broken since there is also one SPI controller that has a working implementation > > > > of m25p80's full API also when transfering through a tx fifo (use case 1). But > > > > as mentioned above, by doing a minor extension and improvement to m25p80's API > > > > and allow for toggling the accuracy from dummy clock cycles to dummy bytes [1] > > > > will still be honored as in the same time making it possible to have full > > > > support for the API in the SPI controllers that currently do not (please reread > > > > the proposal in my previous reply that attempts to do this). I myself see this > > > > as win/win situation, also because no controller should need modifications. > > > > > > > > > > I am afraid your proposal does not work. Your proposed new device > > > property 'model_dummy_bytes' to select to convert the accurate dummy > > > clock cycle count to dummy bytes inside m25p80, is hard to justify as > > > a property to the flash itself, as the behavior is tightly coupled to > > > how the SPI controller works. > > > > I agree on above. I decided though that instead of posting sample code in here > > I'll post an RFC with hopefully an improved proposal. I'll cc you. About below, > > Xilinx ZynqMP GQSPI should not need any modication in a first step. > > > > Wait, (see below) > > > > > > > Please take a look at the Xilinx GQSPI controller, which supports both > > > use cases, that the dummy cycles can be transferred via tx fifo, or > > > generated by the controller automatically. 
Please read the example > > > given in: > > > > > > table 24‐22, an example of Generic FIFO Contents for Quad I/O Read > > > Command (EBh) > > > > > > in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf > > > > > > If you choose to set the m25p80 device property 'model_dummy_bytes' to > > > true when working with the Xilinx GQSPI controller, you are bound to > > > only allow guest software to use tx fifo to transfer the dummy cycles, > > > and this is wrong. > > > > > You missed this part. I looked at your RFC, and as I mentioned above > your proposal cannot support the complicated controller like Xilinx > GQSPI. Please read the example of table 24-22. With your RFC, you > mandate guest software's GQSPI driver to only use hardware dummy cycle > generation, which is wrong. > First, thank you very much for looking into the RFC series; it is much appreciated. Second, regarding the above: the GQSPI model in QEMU performs transfers from two places in the file. In one place the transfer referred to above is done; in the other, the transfer through the tx fifo is done. The place where the transfer referred to above is done will not need any modifications (and will thus work as well as it does today). Now that this has been cleared up, and since I know you are heavily loaded with other higher-priority tasks, let's wait for the maintainers to also have a look at the RFC (understandably this can take some time, since they are also heavily loaded). Best regards, Francisco Iglesias > > > > > > > > > > > > > > > for the commands is because no request has been made for them. Also there is > > > > > > one controller that has support. > > > > > > > > > > Definitely it's not "no request". Nearly all SPI flashes support the > > > > > Fast Read (0Bh) command today, and 0Bh requires a dummy cycle. This is > > > > > "seriously broken" for those case 1 type controllers because they > > > > > cannot read anything from the m25p80 model at all.
Unless the guest > > > > > software being tested only uses Read (03h) command which is not > > > > > affected. But I can't find a software that uses Read instead of Fast > > > > > Read. > > > > > > > > > > > > The issue you pointed out that we require the total number of dummy > > > > > > > bits should be multiple of 8 is true, that's why I added the > > > > > > > unimplemented log message in this series (patch 2/3/4) to warn users > > > > > > > if this expectation is not met. However this will not cause any issue > > > > > > > when running U-Boot or Linux, because both spi-nor drivers expect the > > > > > > > same assumption as we do here. > > > > > > > > > > > > > > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(), > > > > > > > there is a logic to calculate the dummy bytes needed for fast read > > > > > > > command: > > > > > > > > > > > > > > /* convert the dummy cycles to the number of bytes */ > > > > > > > op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8; > > > > > > > > > > > > > > Note the default dummy cycles configuration for all flashes I have > > > > > > > looked into as of today, meets the multiple of 8 assumption. On some > > > > > > > flashes the dummy cycle number is configurable, and if it's been > > > > > > > configured to be an odd value, it would not work on U-Boot/Linux in > > > > > > > the first place. > > > > > > > > > > > > > > > > > > > > > > > > > Things get complicated when interacting with different SPI or QSPI > > > > > > > > > flash controllers. There are major two cases: > > > > > > > > > > > > > > > > > > - Dummy bytes prepared by drivers, and wrote to the controller fifo. > > > > > > > > > For such case, driver will calculate the correct number of dummy > > > > > > > > > bytes and write them into the tx fifo. Fixing the m25p80 model will > > > > > > > > > fix flashes working with such controllers. 
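The effect on a case 1 controller can be illustrated with a toy simulation. Everything here is hypothetical (`SimFlash`, `sim_transfer()` and `case1_read()` are not QEMU's SSI API): the toy flash swallows its address and dummy transfers and then streams data from offset 0, and the driver writes a 3-byte address plus the dummy bytes it computed into the tx fifo. When the flash counts dummy cycles (6 for EBh) but the driver sends dummy bytes (3), the first data transfers are consumed by the dummy phase and the data comes back misaligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy flash: swallows `needed` follow-up transfers
 * (address + dummy) and then streams data starting at offset 0. */
typedef struct {
    const uint8_t *mem;
    size_t len;
    uint32_t needed;  /* address/dummy transfers still expected */
    size_t pos;       /* next data offset */
} SimFlash;

static uint8_t sim_transfer(SimFlash *f, uint8_t tx)
{
    (void)tx;             /* the toy ignores the address value itself */
    if (f->needed > 0) {
        f->needed--;
        return 0xff;      /* toy returns 0xff (idle bus) during addr/dummy */
    }
    assert(f->pos < f->len);
    return f->mem[f->pos++];
}

/* Case 1: the driver writes address and dummy bytes into the tx fifo,
 * and the controller shifts them out like any other data. */
static void case1_read(SimFlash *f, uint32_t dummy_bytes,
                       uint8_t *out, size_t n)
{
    for (int i = 0; i < 3; i++) {
        sim_transfer(f, 0x00);               /* 3-byte address */
    }
    for (uint32_t i = 0; i < dummy_bytes; i++) {
        sim_transfer(f, 0x00);               /* dummy bytes from the fifo */
    }
    for (size_t i = 0; i < n; i++) {
        out[i] = sim_transfer(f, 0x00);      /* read data */
    }
}
```

With memory contents 11 22 33 44 55 66, a flash expecting 3 + 6 follow-up transfers returns ff ff ff 11 for the first four data bytes, while a flash expecting 3 + 3 returns 11 22 33 44. A case 2 controller model that clocks one transfer per configured dummy cycle would instead stay aligned with the cycle-counting flash, which is why the two controller types pull the flash model in opposite directions.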
> > > > > > > > > > > > > > > > Above can be fixed while still keeping the detailed dummy cycle implementation > > > > > > > > inside m25p80. Perhaps one of the following could be looked into: configurating > > > > > > > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting > > > > > > > > some functionality handling this in the SPI controller. Or a mixture of above. > > > > > > > > > > > > > > Please send patches to explain this in detail how this is going to > > > > > > > work. I am open to all possible solutions. > > > > > > > > > > > > In that case I suggest that you instead try with a device property > > > > > > 'model_dummy_bytes' used to select to convert the accurate dummy clock cycle > > > > > > count to dummy bytes inside m25p80. Below is an example on how to modify the > > > > > > > > > > No this is wrong in my view. This is not like a DMA vs. PIO handling. > > > > > > > > > > > decode_fast_read_cmd function (the other commands requiring dummy clock cycles > > > > > > can follow a similar pattern). This way the fifo mode will be able to work the > > > > > > way you desire while also keeping the current functionality intact. Suddenly > > > > > > removing functionality (features) will take users by surprise. > > > > > > > > > > I don't think we are removing any features. This is a fix to make the > > > > > model to be used by any SPI controllers. > > > > > > > > > > As I pointed out, both U-Boot and Linux have the multiple of 8 > > > > > assumption for the dummy bit, which is the default configuration for > > > > > all flashes I have looked into so far. Can you please comment what use > > > > > case you want to support? I requested a U-Boot/Linux kernel testing in > > > > > the previous SST thread [1] against Xilinx GQSPI but there was no > > > > > response. > > > > > > > > In [2] instructions on how to boot u-boot/Linux is found. For building the > > > > various software components I followed the official doc in [3]. 
> > > > > > I see the following QEMU commands are used to test booting U-Boot/Linux: > > > > > > $ qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m 4G > > > -serial stdio -display none -device loader,file=u-boot.elf -kernel > > > bl31.elf -device loader,addr=0x40000000,file=Image -device > > > loader,addr=0x2000000,file=system.dtb > > > > > > I am not sure where the system.dtb gets built from? > > > > It is the instructions in [2] to look into. 'system.dtb' is the kernel dtb for > > zcu102 ([2] has been fixed). I created [2] purely for you, so respectfully I > > will ask you to try a little first before asking for further guidance. > > > > I tried, but no success. I removed the "-device loader" part for > loading kernel image and the device tree, and only focused on booting > U-Boot. > > The ATF bl31.elf was built from > https://github.com/ARM-software/arm-trusted-firmware, by following > build instructions at > https://trustedfirmware-a.readthedocs.io/en/latest/plat/xilinx-zynqmp.html. > U-Boot was built from the upstream U-Boot. > > $ ./qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m > 4G -serial stdio -display none -device loader,file=u-boot.elf -kernel > bl31.elf > ERROR: Incorrect XILINX IDCODE 0x0, maskid 0x4600093 > NOTICE: ATF running on XCZUUNKN/silicon v1/RTL0.0 at 0xfffea000 > NOTICE: BL31: v2.4(release):v2.4-228-g337e493 > NOTICE: BL31: Built : 21:18:14, Jan 20 2021 > ERROR: BL31: Platform Management API version error. 
Expected: v1.1 - > Found: v0.0 > ERROR: Error initializing runtime service sip_svc > > I also tried the Xilinx fork of ATF from > https://github.com/Xilinx/arm-trusted-firmware, by following build > instructions at > https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842305/Build+ARM+Trusted+Firmware+ATF > > $ ./qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m > 4G -serial stdio -display none -device loader,file=u-boot.elf -kernel > bl31.elf > ERROR: Incorrect XILINX IDCODE 0x0, maskid 0x4600093 > NOTICE: ATF running on XCZUUNKN/silicon v1/RTL0.0 at 0xfffea000 > NOTICE: BL31: v2.2(release):xilinx-v2020.2 > NOTICE: BL31: Built : 21:52:38, Jan 20 2021 > ERROR: BL31: Platform Management API version error. Expected: v1.1 - > Found: v0.0 > ERROR: Error initializing runtime service sip_svc > > Then I tried to build a U-Boot from the Xilinx fork at > https://github.com/Xilinx/u-boot-xlnx/, still no success. > > > Best regards, > > Francisco Iglesias > > > > [1] qemu/docs/system/deprecated.rst > > [2] https://github.com/franciscoIglesias/qemu-cmdline/blob/master/xlnx-zcu102-atf-u-boot-linux.md > > > > > > Regards, > Bin
Hi Francisco, On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias <frasse.iglesias@gmail.com> wrote: > > Dear Bin, > > On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote: > > Hi Francisco, > > > > On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias > > <frasse.iglesias@gmail.com> wrote: > > > > > > Hi Bin, > > > > > > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote: > > > > Hi Francisco, > > > > > > > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias > > > > <frasse.iglesias@gmail.com> wrote: > > > > > > > > > > Hi Bin, > > > > > > > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote: > > > > > > Hi Francisco, > > > > > > > > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias > > > > > > <frasse.iglesias@gmail.com> wrote: > > > > > > > > > > > > > > Hi Bin, > > > > > > > > > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote: > > > > > > > > Hi Francisco, > > > > > > > > > > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias > > > > > > > > <frasse.iglesias@gmail.com> wrote: > > > > > > > > > > > > > > > > > > Hi Bin, > > > > > > > > > > > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote: > > > > > > > > > > From: Bin Meng <bin.meng@windriver.com> > > > > > > > > > > > > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up > > > > > > > > > > bytes are expected to be received after it receives a command. For > > > > > > > > > > example, depending on the address mode, either 3-byte address or > > > > > > > > > > 4-byte address is needed. > > > > > > > > > > > > > > > > > > > > For fast read family commands, some dummy cycles are required after > > > > > > > > > > sending the address bytes, and the dummy cycles need to be counted > > > > > > > > > > in s->needed_bytes. This is where the mess began. > > > > > > > > > > > > > > > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte. > > > > > > > > > > It is not in bit, or cycle. 
However for some reason the model has > > > > > > > > > > been using the number of dummy cycles for s->needed_bytes. The right > > > > > > > > > > approach is to convert the number of dummy cycles to bytes based on > > > > > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad > > > > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8). > > > > > > > > > > > > > > > > > > While not being the original implementor I must assume that above solution was > > > > > > > > > considered but not chosen by the developers due to it is inaccuracy (it > > > > > > > > > wouldn't be possible to model exacly 6 dummy cycles, only a multiple of 8, > > > > > > > > > meaning that if the controller is wrongly programmed to generate 7 the error > > > > > > > > > wouldn't be caught and the controller will still be considered "correct"). Now > > > > > > > > > that we have this detail in the implementation I'm in favor of keeping it, this > > > > > > > > > also because the detail is already in use for catching exactly above error. > > > > > > > > > > > > > > > > > > > > > > > > > I found no clue from the commit message that my proposed solution here > > > > > > > > was ever considered, otherwise all SPI controller models supporting > > > > > > > > software generation should have been found out seriously broken long > > > > > > > > time ago! > > > > > > > > > > > > > > > > > > > > > The controllers you are referring to might lack support for commands requiring > > > > > > > dummy clock cycles but I really hope they work with the other commands? If so I > > > > > > > > > > > > I am not sure why you view dummy clock cycles as something special > > > > > > that needs some special support from the SPI controller. For the case > > > > > > 1 controller, it's nothing special from the controller perspective, > > > > > > just like sending out a command, or address bytes, or data. 
The > > > > > > controller just shifts data bit by bit from its tx fifo and that's it. > > > > > > In the Xilinx GQSPI controller case, the dummy cycles can either be > > > > > > sent via a regular data (the case 1 controller) in the tx fifo, or > > > > > > automatically generated (case 2 controller) by the hardware. > > > > > > > > > > Ok, I'll try to explain my view point a little differently. For that we also > > > > > need to keep in mind that QEMU models HW, and any binary that runs on a HW > > > > > board supported in QEMU should ideally run on that board inside QEMU aswell > > > > > (this can be a bare metal application equaly well as a modified u-boot/Linux > > > > > using SPI commands with a non multiple of 8 number of dummy clock cycles). > > > > > > > > > > Once functionality has been introduced into QEMU it is not easy to know which > > > > > intentional or untentional features provided by the functionality are being > > > > > used by users. One of the (perhaps not well known) features I'm aware of that > > > > > is in use and is provided by the accurate dummy clock cycle modeling inside > > > > > m25p80 is the be ability to test drivers accurately regarding the dummy clock > > > > > cycles (even when using commands with a non-multiple of 8 number of dummy clock > > > > > cycles), but there might be others aswell. So by removing this functionality > > > > > above use case will brake, this since those test will not be reliable. > > > > > Furthermore, since users tend to be creative it is not possible to know if > > > > > there are other use cases that will be affected. This means that in case [1] > > > > > needs to be followed the safe path is to add functionality instead of removing. > > > > > Luckily it also easier in this case, see below. > > > > > > > > I understand there might be users other than U-Boot/Linux that use an > > > > odd number of dummy bits (not multiple of 8). 
If your concern was > > > > about model behavior changes, sure I can update > > > > qemu/docs/system/deprecated.rst to mention that some flashes in the > > > > m25p80 model now implement dummy cycles as bytes. > > > > > > Yes, something like that. My concern is that since this functionality has been > > > in tree for while, users have found known or unknown features that got > > > introduced by it. By removing the functionality (and the known/uknown features) > > > we are riscing to brake our user's use cases (currently I'm aware of one > > > feature/use case but it is not unlikely that there are more). [1] states that > > > "In general features are intended to be supported indefinitely once introduced > > > into QEMU", to me that makes very much sense because the opposite would mean > > > that we were not reliable. So in case [1] needs to be honored it looks to be > > > safer to add functionality instead of removing (and riscing the removal of use > > > cases/features). Luckily I still believe in this case that it will be easier to > > > go forward (even if I also agree on what you are saying below about what I > > > proposed). > > > > > > > Even if the implementation is buggy and we need to keep the buggy > > implementation forever? I think that's why > > qemu/docs/system/deprecated.rst was created for deprecating such > > feature. > > With the RFC I posted all commands in m25p80 are working for both the case 1 > controller (using a txfifo) and the case 2 controller (no txfifo, as GQSPI). > Because of this, I, with all respect, will have to disagree that this is buggy. Well, the existing m25p80 implementation that uses dummy cycle accuracy for those flashes prevents all SPI controllers that use tx fifo to work with those flashes. Hence it is buggy. > > > > > > > > > > > > > > > > > > > > don't think it is fair to call them 'seriously broken' (and else we should > > > > > > > probably let the maintainers know about it). 
Most likely the lack of support > > > > > > > > > > > > I called it "seriously broken" because current implementation only > > > > > > considered one type of SPI controllers while completely ignoring the > > > > > > other type. > > > > > > > > > > If we change view and see this from the perspective of m25p80, it models the > > > > > commands a certain way and provides an API that the SPI controllers need to > > > > > implement for interacting with it. It is true that there are SPI controllers > > > > > referred to above that do not support the portion of that API that corresponds > > > > > to commands with dummy clock cycles, but I don't think it is true that this is > > > > > broken since there is also one SPI controller that has a working implementation > > > > > of m25p80's full API also when transfering through a tx fifo (use case 1). But > > > > > as mentioned above, by doing a minor extension and improvement to m25p80's API > > > > > and allow for toggling the accuracy from dummy clock cycles to dummy bytes [1] > > > > > will still be honored as in the same time making it possible to have full > > > > > support for the API in the SPI controllers that currently do not (please reread > > > > > the proposal in my previous reply that attempts to do this). I myself see this > > > > > as win/win situation, also because no controller should need modifications. > > > > > > > > > > > > > I am afraid your proposal does not work. Your proposed new device > > > > property 'model_dummy_bytes' to select to convert the accurate dummy > > > > clock cycle count to dummy bytes inside m25p80, is hard to justify as > > > > a property to the flash itself, as the behavior is tightly coupled to > > > > how the SPI controller works. > > > > > > I agree on above. I decided though that instead of posting sample code in here > > > I'll post an RFC with hopefully an improved proposal. I'll cc you. About below, > > > Xilinx ZynqMP GQSPI should not need any modication in a first step. 
> > > > > > > > Wait (see below). > > > > > > > > > > Please take a look at the Xilinx GQSPI controller, which supports both > > > > use cases: the dummy cycles can be transferred via the tx fifo, or > > > > generated by the controller automatically. Please read the example > > > > given in: > > > > > > > > table 24-22, an example of Generic FIFO Contents for Quad I/O Read > > > > Command (EBh) > > > > > > > > in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf > > > > > > > > If you choose to set the m25p80 device property 'model_dummy_bytes' to > > > > true when working with the Xilinx GQSPI controller, you are bound to > > > > only allow guest software to use the tx fifo to transfer the dummy cycles, > > > > and this is wrong. > > > > > > > > You missed this part. I looked at your RFC, and as I mentioned above, > > your proposal cannot support a complicated controller like the Xilinx > > GQSPI. Please read the example of table 24-22. With your RFC, you > > force the guest software's GQSPI driver to only use hardware dummy cycle > > generation, which is wrong. > > > > First, thank you very much for looking into the RFC series, very much > appreciated. Secondly, regarding the above, the GQSPI model in QEMU transfers from 2 > locations in the file: in one location the transfer referred to above is done, in > the other location the transfer through the txfifo is done. The location where > the transfer referred to above is done will not need any modifications (and will > thus work equally well as it does currently). Please explain this a little bit. How does your RFC series handle cases as described in table 24-22, where the 6 dummy cycles are split into 2 transfers, with one transfer using the tx fifo, and the other one using hardware dummy cycle generation?
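A minimal C sketch of why cycle-granular accounting copes with a dummy phase split between the tx fifo and hardware-generated cycles. All names are hypothetical, this is not the QEMU GQSPI or m25p80 code, and the split used in the test (one fifo byte plus 4 hardware cycles) is illustrative rather than copied from table 24-22.

```c
#include <assert.h>

/* Hypothetical sketch (not QEMU code): dummy phase tracked in clock cycles. */
struct dummy_phase {
    unsigned cycles_left;
};

/* Subtract up to `cycles` from the remaining dummy cycles, saturating at 0. */
static void consume_cycles(struct dummy_phase *d, unsigned cycles)
{
    d->cycles_left = cycles >= d->cycles_left ? 0 : d->cycles_left - cycles;
}

/* One byte shifted out of the tx fifo on `buswidth` lines lasts 8 / buswidth cycles. */
static void dummy_from_fifo_byte(struct dummy_phase *d, unsigned buswidth)
{
    assert(buswidth == 1 || buswidth == 2 || buswidth == 4 || buswidth == 8);
    consume_cycles(d, 8 / buswidth);
}

/* Cycles generated by the controller hardware, with no fifo data involved. */
static void dummy_from_hw(struct dummy_phase *d, unsigned cycles)
{
    consume_cycles(d, cycles);
}
```

Starting from the 6 dummy cycles of Quad I/O Read (EBh), one fifo byte on 4 lines consumes 2 cycles and 4 hardware-generated cycles finish the phase. With a counter kept in bytes instead, each part of such a split would have to be a whole number of bytes, which hardware-generated cycles need not be.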
> > Now that the above is cleared up, and since I know you are heavily loaded with > other higher prio tasks, let's wait for the maintainers to also have a look into > the RFC (understandably this can take some time since they are also > heavily loaded). Yes, maintainers are pretty much silent on this topic. However, may I ask you to provide more details on my questions below on booting U-Boot/Linux with QEMU? You can post patches to add documentation for zynqmp in docs/system/arm, or once I get working instructions, I could do that too. Much appreciated. > > Best regards, > Francisco Iglesias > > > > > > > > > > > > > > > > > > > > for the commands is because no request has been made for them. Also there is > > > > > > > one controller that has support. > > > > > > > > > > > > Definitely it's not "no request". Nearly all SPI flashes support the > > > > > > Fast Read (0Bh) command today, and 0Bh requires dummy cycles. This is > > > > > > "seriously broken" for those case 1 type controllers because they > > > > > > cannot read anything from the m25p80 model at all, unless the guest > > > > > > software being tested only uses the Read (03h) command, which is not > > > > > > affected. But I can't find any software that uses Read instead of Fast > > > > > > Read. > > > > > > > > > > > > > > The issue you pointed out, that we require the total number of dummy > > > > > > > > bits to be a multiple of 8, is true; that's why I added the > > > > > > > > unimplemented log message in this series (patch 2/3/4) to warn users > > > > > > > > if this expectation is not met. However, this will not cause any issue > > > > > > > > when running U-Boot or Linux, because both spi-nor drivers make the > > > > > > > > same assumption as we do here.
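The case 1 failure described above can be reproduced in a toy model. This C sketch is hypothetical and heavily simplified (it is not the m25p80 implementation): every byte the controller shifts out of its tx fifo counts as one unit of `needed_bytes`. It assumes a Fast Read (0Bh) with a 3-byte address and 8 dummy cycles on a single data line, so a driver writes 3 address bytes plus 1 dummy byte.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy flash state: units still expected before the data phase starts. */
struct toy_flash {
    unsigned needed_bytes;
};

/* Cycle accounting: dummy cycles are added to needed_bytes unchanged. */
static void start_read_counting_cycles(struct toy_flash *f, unsigned addr_bytes,
                                       unsigned dummy_cycles)
{
    f->needed_bytes = addr_bytes + dummy_cycles;
}

/* Byte accounting: dummy cycles are first converted to bytes. */
static void start_read_counting_bytes(struct toy_flash *f, unsigned addr_bytes,
                                      unsigned dummy_cycles, unsigned buswidth)
{
    f->needed_bytes = addr_bytes + (dummy_cycles * buswidth) / 8;
}

/*
 * A case 1 controller shifts `sent` bytes from its tx fifo, one unit each.
 * Returns true once the flash has reached its data phase.
 */
static bool shift_tx_bytes(struct toy_flash *f, unsigned sent)
{
    f->needed_bytes = sent >= f->needed_bytes ? 0 : f->needed_bytes - sent;
    return f->needed_bytes == 0;
}
```

In the toy model, cycle accounting leaves the flash waiting for 7 more dummy units after the driver's 4 bytes, so the following bytes are still consumed as dummy input instead of starting the data phase. Byte accounting reaches the data phase exactly after the 4 bytes.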
> > > > > > > > > > > > > > > > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(): > > > > > > > > there is logic to calculate the dummy bytes needed for the fast read > > > > > > > > command: > > > > > > > > > > > > > > > > /* convert the dummy cycles to the number of bytes */ > > > > > > > > op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8; > > > > > > > > > > > > > > > > Note that the default dummy cycle configuration of all flashes I have > > > > > > > > looked into as of today meets the multiple of 8 assumption. On some > > > > > > > > flashes the dummy cycle number is configurable, and if it's been > > > > > > > > configured to be an odd value, it would not work on U-Boot/Linux in > > > > > > > > the first place. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Things get complicated when interacting with different SPI or QSPI > > > > > > > > > > flash controllers. There are two major cases: > > > > > > > > > > > > > > > > > > > > - Dummy bytes prepared by drivers, and written to the controller fifo. > > > > > > > > > > In such a case, the driver will calculate the correct number of dummy > > > > > > > > > > bytes and write them into the tx fifo. Fixing the m25p80 model will > > > > > > > > > > fix flashes working with such controllers. > > > > > > > > > > > > > > > > > > The above can be fixed while still keeping the detailed dummy cycle implementation > > > > > > > > > inside m25p80. Perhaps one of the following could be looked into: configuring > > > > > > > > > the amount, letting the spi ctrl fetch the amount from m25p80, or inheriting > > > > > > > > > some functionality handling this in the SPI controller. Or a mixture of the above. > > > > > > > > > > > > > > > > Please send patches to explain in detail how this is going to > > > > > > > > work. I am open to all possible solutions.
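The conversion quoted above and the multiple-of-8 condition behind the unimplemented log message can be written as two small C helpers. The helper names are hypothetical; only the arithmetic comes from the U-Boot/Linux line quoted in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Dummy cycles -> dummy bytes, same arithmetic as
 *     op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
 * Each dummy clock cycle moves one bit on each of `buswidth` lines.
 */
static unsigned dummy_cycles_to_bytes(unsigned cycles, unsigned buswidth)
{
    assert(buswidth == 1 || buswidth == 2 || buswidth == 4 || buswidth == 8);
    return (cycles * buswidth) / 8;
}

/*
 * True when the conversion is exact. Otherwise the integer division above
 * truncates, and a byte-based flash model cannot represent the cycle count;
 * this is the situation an unimplemented log message would report.
 */
static bool dummy_cycles_are_whole_bytes(unsigned cycles, unsigned buswidth)
{
    return (cycles * buswidth) % 8 == 0;
}
```

For the Fast Read Quad I/O (EBh) case from the cover letter, 6 cycles on 4 lines are 24 bits, i.e. 3 bytes. The 7-cycle misprogramming mentioned earlier gives 28 bits, which the integer division truncates to 3 bytes, so only the explicit check would flag it.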
> > > > > > > > > > > > > > > In that case I suggest that you instead try a device property > > > > > > > 'model_dummy_bytes' used to select whether to convert the accurate dummy clock cycle > > > > > > > count to dummy bytes inside m25p80. Below is an example of how to modify the > > > > > > > > > > > > No, this is wrong in my view. This is not like DMA vs. PIO handling. > > > > > > > > > > > > > decode_fast_read_cmd function (the other commands requiring dummy clock cycles > > > > > > > can follow a similar pattern). This way the fifo mode will be able to work the > > > > > > > way you desire while also keeping the current functionality intact. Suddenly > > > > > > > removing functionality (features) will take users by surprise. > > > > > > > > > > > > I don't think we are removing any features. This is a fix to make the > > > > > > model usable by any SPI controller. > > > > > > > > > > > > As I pointed out, both U-Boot and Linux have the multiple of 8 > > > > > > assumption for the dummy bits, which is the default configuration for > > > > > > all flashes I have looked into so far. Can you please comment on what use > > > > > > case you want to support? I requested U-Boot/Linux kernel testing in > > > > > > the previous SST thread [1] against Xilinx GQSPI but there was no > > > > > > response. > > > > > > > > > > In [2], instructions on how to boot u-boot/Linux can be found. For building the > > > > > various software components I followed the official doc in [3]. > > > > > > > > I see the following QEMU commands are used to test booting U-Boot/Linux: > > > > > > > > $ qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m 4G > > > > -serial stdio -display none -device loader,file=u-boot.elf -kernel > > > > bl31.elf -device loader,addr=0x40000000,file=Image -device > > > > loader,addr=0x2000000,file=system.dtb > > > > > > > > I am not sure where the system.dtb gets built from? > > > > > > Please look at the instructions in [2].
'system.dtb' is the kernel dtb for > > > zcu102 ([2] has been fixed). I created [2] purely for you, so respectfully I > > > will ask you to try a little first before asking for further guidance. > > > > > > > I tried, but no success. I removed the "-device loader" part for > > loading kernel image and the device tree, and only focused on booting > > U-Boot. > > > > The ATF bl31.elf was built from > > https://github.com/ARM-software/arm-trusted-firmware, by following > > build instructions at > > https://trustedfirmware-a.readthedocs.io/en/latest/plat/xilinx-zynqmp.html. > > U-Boot was built from the upstream U-Boot. > > > > $ ./qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m > > 4G -serial stdio -display none -device loader,file=u-boot.elf -kernel > > bl31.elf > > ERROR: Incorrect XILINX IDCODE 0x0, maskid 0x4600093 > > NOTICE: ATF running on XCZUUNKN/silicon v1/RTL0.0 at 0xfffea000 > > NOTICE: BL31: v2.4(release):v2.4-228-g337e493 > > NOTICE: BL31: Built : 21:18:14, Jan 20 2021 > > ERROR: BL31: Platform Management API version error. Expected: v1.1 - > > Found: v0.0 > > ERROR: Error initializing runtime service sip_svc > > > > I also tried the Xilinx fork of ATF from > > https://github.com/Xilinx/arm-trusted-firmware, by following build > > instructions at > > https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842305/Build+ARM+Trusted+Firmware+ATF > > > > $ ./qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m > > 4G -serial stdio -display none -device loader,file=u-boot.elf -kernel > > bl31.elf > > ERROR: Incorrect XILINX IDCODE 0x0, maskid 0x4600093 > > NOTICE: ATF running on XCZUUNKN/silicon v1/RTL0.0 at 0xfffea000 > > NOTICE: BL31: v2.2(release):xilinx-v2020.2 > > NOTICE: BL31: Built : 21:52:38, Jan 20 2021 > > ERROR: BL31: Platform Management API version error. 
Expected: v1.1 - > > Found: v0.0 > > ERROR: Error initializing runtime service sip_svc > > > > Then I tried to build a U-Boot from the Xilinx fork at > > https://github.com/Xilinx/u-boot-xlnx/, still no success. > > > > > Best regards, > > > Francisco Iglesias > > > > > > [1] qemu/docs/system/deprecated.rst > > > [2] https://github.com/franciscoIglesias/qemu-cmdline/blob/master/xlnx-zcu102-atf-u-boot-linux.md > > > Regards, Bin
Dear Bin, On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote: > Hi Francisco, > > [...] > Please explain this a little bit. How does your RFC series handle > cases as described in table 24-22, where the 6 dummy cycles are split > into 2 transfers, with one transfer using tx fifo, and the other one > using hardware dummy cycle generation? Above transfer is already handled in the model, and since it will not change it will still work afterwards. About below, sure I'll provide some doc once I get some time over. Best regards, Francisco Iglesias > > [...] > > Yes, maintainers are pretty much silent on this topic. > > However may I ask you to provide more details on my questions below on > booting U-Boot/Linux with the QEMU? > > You can post patches to add documentation for zynqmp in > docs/system/arm, or once I get a working instructions, I could do that > too. Much appreciated. > > [...] > > Regards, > Bin
Hi Bin, On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote: > Hi Francisco, > > On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias > <frasse.iglesias@gmail.com> wrote: > > > > Dear Bin, > > > > On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote: > > > Hi Francisco, > > > > > > On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias > > > <frasse.iglesias@gmail.com> wrote: > > > > > > > > Hi Bin, > > > > > > > > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote: > > > > > Hi Francisco, > > > > > > > > > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias > > > > > <frasse.iglesias@gmail.com> wrote: > > > > > > > > > > > > Hi Bin, > > > > > > > > > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote: > > > > > > > Hi Francisco, > > > > > > > > > > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias > > > > > > > <frasse.iglesias@gmail.com> wrote: > > > > > > > > > > > > > > > > Hi Bin, > > > > > > > > > > > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote: > > > > > > > > > Hi Francisco, > > > > > > > > > > > > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias > > > > > > > > > <frasse.iglesias@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > > Hi Bin, > > > > > > > > > > > > > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote: > > > > > > > > > > > From: Bin Meng <bin.meng@windriver.com> > > > > > > > > > > > > > > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up > > > > > > > > > > > bytes are expected to be received after it receives a command. For > > > > > > > > > > > example, depending on the address mode, either 3-byte address or > > > > > > > > > > > 4-byte address is needed. > > > > > > > > > > > > > > > > > > > > > > For fast read family commands, some dummy cycles are required after > > > > > > > > > > > sending the address bytes, and the dummy cycles need to be counted > > > > > > > > > > > in s->needed_bytes. This is where the mess began. 
> > > > > > > > > > > > > > > > > > > > > > > > As the variable name (needed_bytes) indicates, the unit is bytes. > > > > > > > > > > > It is not bits, or cycles. However, for some reason the model has > > > > > > > > > > > been using the number of dummy cycles for s->needed_bytes. The right > > > > > > > > > > > approach is to convert the number of dummy cycles to bytes based on > > > > > > > > > > > the SPI protocol; for example, 6 dummy cycles for the Fast Read Quad > > > > > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8). > > > > > > > > > > > > > > > > > > > > While not being the original implementor, I must assume that the above solution was > > > > > > > > > > considered but not chosen by the developers due to its inaccuracy (it > > > > > > > > > > wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8, > > > > > > > > > > meaning that if the controller is wrongly programmed to generate 7 the error > > > > > > > > > > wouldn't be caught and the controller would still be considered "correct"). Now > > > > > > > > > > that we have this detail in the implementation I'm in favor of keeping it, also > > > > > > > > > > because the detail is already in use for catching exactly the above error. > > > > > > > > > > > > > > > > > > > > > > > > > > > > I found no clue in the commit message that my proposed solution here > > > > > > > > > was ever considered; otherwise all SPI controller models supporting > > > > > > > > > software generation would have been found to be seriously broken a long > > > > > > > > > time ago! > > > > > > > > > > > > > > > > > > > > > > > > The controllers you are referring to might lack support for commands requiring > > > > > > > > dummy clock cycles but I really hope they work with the other commands? If so I > > > > > > > > > > > > > > I am not sure why you view dummy clock cycles as something special > > > > > > > that needs some special support from the SPI controller.
For the case > > > > > > > > 1 controller, it's nothing special from the controller perspective, > > > > > > > just like sending out a command, or address bytes, or data. The > > > > > > > controller just shifts data bit by bit from its tx fifo and that's it. > > > > > > > In the Xilinx GQSPI controller case, the dummy cycles can either be > > > > > > > sent as regular data (the case 1 controller) in the tx fifo, or > > > > > > > automatically generated (case 2 controller) by the hardware. > > > > > > > > > > > > Ok, I'll try to explain my viewpoint a little differently. For that we also > > > > > > need to keep in mind that QEMU models HW, and any binary that runs on a HW > > > > > > board supported in QEMU should ideally run on that board inside QEMU as well > > > > > > (this can be a bare metal application equally well as a modified u-boot/Linux > > > > > > using SPI commands with a non-multiple of 8 number of dummy clock cycles). > > > > > > > > > > > > Once functionality has been introduced into QEMU it is not easy to know which > > > > > > intentional or unintentional features provided by the functionality are being > > > > > > used by users. One of the (perhaps not well known) features I'm aware of that > > > > > > is in use and is provided by the accurate dummy clock cycle modeling inside > > > > > > m25p80 is the ability to test drivers accurately regarding the dummy clock > > > > > > cycles (even when using commands with a non-multiple of 8 number of dummy clock > > > > > > cycles), but there might be others as well.
> > > > > > > > > > I understand there might be users other than U-Boot/Linux that use an > > > > > odd number of dummy bits (not multiple of 8). If your concern was > > > > > about model behavior changes, sure I can update > > > > > qemu/docs/system/deprecated.rst to mention that some flashes in the > > > > > m25p80 model now implement dummy cycles as bytes. > > > > > > > > Yes, something like that. My concern is that since this functionality has been > > > > in tree for while, users have found known or unknown features that got > > > > introduced by it. By removing the functionality (and the known/uknown features) > > > > we are riscing to brake our user's use cases (currently I'm aware of one > > > > feature/use case but it is not unlikely that there are more). [1] states that > > > > "In general features are intended to be supported indefinitely once introduced > > > > into QEMU", to me that makes very much sense because the opposite would mean > > > > that we were not reliable. So in case [1] needs to be honored it looks to be > > > > safer to add functionality instead of removing (and riscing the removal of use > > > > cases/features). Luckily I still believe in this case that it will be easier to > > > > go forward (even if I also agree on what you are saying below about what I > > > > proposed). > > > > > > > > > > Even if the implementation is buggy and we need to keep the buggy > > > implementation forever? I think that's why > > > qemu/docs/system/deprecated.rst was created for deprecating such > > > feature. > > > > With the RFC I posted all commands in m25p80 are working for both the case 1 > > controller (using a txfifo) and the case 2 controller (no txfifo, as GQSPI). > > Because of this, I, with all respect, will have to disagree that this is buggy. > > Well, the existing m25p80 implementation that uses dummy cycle > accuracy for those flashes prevents all SPI controllers that use tx > fifo to work with those flashes. Hence it is buggy. 
> > > > > > > > > > > > > > > > > > > > > > > > > > don't think it is fair to call them 'seriously broken' (otherwise we should > > > > > > > > probably let the maintainers know about it). Most likely the lack of support > > > > > > > > > > > > > > I called it "seriously broken" because the current implementation only > > > > > > > considered one type of SPI controller while completely ignoring the > > > > > > > other type. > > > > > > > > > > > > If we change our view and look at this from the perspective of m25p80, it models the > > > > > > commands a certain way and provides an API that the SPI controllers need to > > > > > > implement for interacting with it. It is true that there are SPI controllers > > > > > > referred to above that do not support the portion of that API that corresponds > > > > > > to commands with dummy clock cycles, but I don't think it is true that this is > > > > > > broken, since there is also one SPI controller that has a working implementation > > > > > > of m25p80's full API also when transferring through a tx fifo (use case 1). But > > > > > > as mentioned above, by doing a minor extension and improvement to m25p80's API > > > > > > that allows toggling the accuracy from dummy clock cycles to dummy bytes, [1] > > > > > > will still be honored while at the same time making it possible to have full > > > > > > support for the API in the SPI controllers that currently do not (please reread > > > > > > the proposal in my previous reply that attempts to do this). I myself see this > > > > > > as a win/win situation, also because no controller should need modifications. > > > > > > > > > > > > > > > > I am afraid your proposal does not work. Your proposed new device > > > > > property 'model_dummy_bytes', which selects converting the accurate dummy > > > > > clock cycle count to dummy bytes inside m25p80, is hard to justify as > > > > > a property of the flash itself, as the behavior is tightly coupled to > > > > > how the SPI controller works.
> > > > > > > > I agree with the above. I decided though that instead of posting sample code here > > > > I'll post an RFC with hopefully an improved proposal. I'll cc you. Regarding the below, > > > > Xilinx ZynqMP GQSPI should not need any modification in a first step. > > > > > > > > > > Wait, (see below) > > > > > > > > > > > > > Please take a look at the Xilinx GQSPI controller, which supports both > > > > > use cases: the dummy cycles can be transferred via the tx fifo, or > > > > > generated by the controller automatically. Please read the example > > > > > given in: > > > > > > > > > > table 24-22, an example of Generic FIFO Contents for Quad I/O Read > > > > > Command (EBh) > > > > > > > > > > in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf > > > > > > > > > > If you choose to set the m25p80 device property 'model_dummy_bytes' to > > > > > true when working with the Xilinx GQSPI controller, you are bound to > > > > > only allow guest software to use the tx fifo to transfer the dummy cycles, > > > > > and this is wrong. > > > > > > > > > > > You missed this part. I looked at your RFC, and as I mentioned above > > > your proposal cannot support a complicated controller like Xilinx > > > GQSPI. Please read the example of table 24-22. With your RFC, you > > > force the guest software's GQSPI driver to only use hardware dummy cycle > > > generation, which is wrong. > > > > > > > First, thank you very much for looking into the RFC series, very much > > appreciated. Secondly, regarding the above, the GQSPI model in QEMU transfers from 2 > > locations in the file: in one location the transfer referred to above is done, in > > the other location the transfer through the txfifo is done. The location where > > the transfer referred to above is done will not need any modifications (and will > > thus work equally well as it does currently). > > Please explain this a little bit.
How does your RFC series handle > cases as described in table 24-22, where the 6 dummy cycles are split > into 2 transfers, with one transfer using the tx fifo, and the other one > using hardware dummy cycle generation? Sorry, I misunderstood. You are right, that won't work. Best regards, Francisco Iglesias > > > > > Now that the above is cleared up, and since I know you are heavily loaded with > > other higher-priority tasks, let's wait for the maintainers to also have a look at > > the RFC (understandably this can take some time since they are also > > heavily loaded). > > Yes, maintainers are pretty much silent on this topic. > > However, may I ask you to provide more details on my questions below on > booting U-Boot/Linux with QEMU? > > You can post patches to add documentation for zynqmp in > docs/system/arm, or once I get working instructions, I could do that > too. Much appreciated. > > > > > Best regards, > > Francisco Iglesias > > > > > > > > > > > > > > > > > > > > > > > > > for the commands is because no request has been made for them. Also there is > > > > > > > > one controller that has support. > > > > > > > > > > > > > > Definitely it's not "no request". Nearly all SPI flashes support the > > > > > > > Fast Read (0Bh) command today, and 0Bh requires dummy cycles. This is > > > > > > > "seriously broken" for those case 1 type controllers because they > > > > > > > cannot read anything from the m25p80 model at all, unless the guest > > > > > > > software being tested only uses the Read (03h) command, which is not > > > > > > > affected. But I can't find any software that uses Read instead of Fast > > > > > > > Read. > > > > > > > > > > > > > > > > The issue you pointed out, that we require the total number of dummy > > > > > > > > > bits to be a multiple of 8, is true; that's why I added the > > > > > > > > > unimplemented log message in this series (patch 2/3/4) to warn users > > > > > > > > > if this expectation is not met.
However, this will not cause any issue > > > > > > > > > when running U-Boot or Linux, because both spi-nor drivers make the > > > > > > > > > same assumption as we do here. > > > > > > > > > > > > > > > > > > In U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(), > > > > > > > > > there is logic to calculate the dummy bytes needed for a fast read > > > > > > > > > command: > > > > > > > > > > > > > > > > > > /* convert the dummy cycles to the number of bytes */ > > > > > > > > > op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8; > > > > > > > > > > > > > > > > > > Note that the default dummy cycle configuration for all flashes I have > > > > > > > > > looked into as of today meets the multiple of 8 assumption. On some > > > > > > > > > flashes the number of dummy cycles is configurable, and if it's been > > > > > > > > > configured to an odd value, it would not work on U-Boot/Linux in > > > > > > > > > the first place. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Things get complicated when interacting with different SPI or QSPI > > > > > > > > > > > flash controllers. There are two major cases: > > > > > > > > > > > > > > > > > > > > > > - Dummy bytes prepared by drivers, and written to the controller fifo. > > > > > > > > > > > In this case, the driver calculates the correct number of dummy > > > > > > > > > > > bytes and writes them into the tx fifo. Fixing the m25p80 model will > > > > > > > > > > > fix flashes working with such controllers. > > > > > > > > > > > > > > > > > > > > The above can be fixed while still keeping the detailed dummy cycle implementation > > > > > > > > > > inside m25p80. Perhaps one of the following could be looked into: configuring > > > > > > > > > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting > > > > > > > > > > some functionality handling this in the SPI controller. Or a mixture of the above.
> > > > > > > > > > > > > > > > > > > Please send patches that explain in detail how this is going to > > > > > > > > > work. I am open to all possible solutions. > > > > > > > > > > > > > > > > In that case I suggest that you instead try a device property > > > > > > > > 'model_dummy_bytes' used to select whether to convert the accurate dummy clock cycle > > > > > > > > count to dummy bytes inside m25p80. Below is an example of how to modify the > > > > > > > > > > > > > > No, this is wrong in my view. This is not like a DMA vs. PIO handling. > > > > > > > > > > > > > > > decode_fast_read_cmd function (the other commands requiring dummy clock cycles > > > > > > > > can follow a similar pattern). This way the fifo mode will be able to work the > > > > > > > > way you desire while also keeping the current functionality intact. Suddenly > > > > > > > > removing functionality (features) will take users by surprise. > > > > > > > > > > > > > > I don't think we are removing any features. This is a fix to make the > > > > > > > model usable by any SPI controller. > > > > > > > > > > > > > > As I pointed out, both U-Boot and Linux have the multiple of 8 > > > > > > > assumption for the dummy bits, which is the default configuration for > > > > > > > all flashes I have looked into so far. Can you please comment on what use > > > > > > > case you want to support? I requested U-Boot/Linux kernel testing in > > > > > > > the previous SST thread [1] against Xilinx GQSPI but there was no > > > > > > > response. > > > > > > > > > > > > In [2], instructions on how to boot U-Boot/Linux are found. For building the > > > > > > various software components I followed the official doc in [3].
> > > > > > > > > > I see the following QEMU commands are used to test booting U-Boot/Linux: > > > > > > > > > > $ qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m 4G > > > > > -serial stdio -display none -device loader,file=u-boot.elf -kernel > > > > > bl31.elf -device loader,addr=0x40000000,file=Image -device > > > > > loader,addr=0x2000000,file=system.dtb > > > > > > > > > > I am not sure where system.dtb gets built from. > > > > > > > > The instructions in [2] are the ones to look into. 'system.dtb' is the kernel dtb for > > > > zcu102 ([2] has been fixed). I created [2] purely for you, so respectfully I > > > > will ask you to try a little first before asking for further guidance. > > > > > > > > > > I tried, but without success. I removed the "-device loader" part for > > > loading the kernel image and the device tree, and only focused on booting > > > U-Boot. > > > > > > The ATF bl31.elf was built from > > > https://github.com/ARM-software/arm-trusted-firmware, by following > > > build instructions at > > > https://trustedfirmware-a.readthedocs.io/en/latest/plat/xilinx-zynqmp.html. > > > U-Boot was built from the upstream U-Boot. > > > > > > $ ./qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m > > > 4G -serial stdio -display none -device loader,file=u-boot.elf -kernel > > > bl31.elf > > > ERROR: Incorrect XILINX IDCODE 0x0, maskid 0x4600093 > > > NOTICE: ATF running on XCZUUNKN/silicon v1/RTL0.0 at 0xfffea000 > > > NOTICE: BL31: v2.4(release):v2.4-228-g337e493 > > > NOTICE: BL31: Built : 21:18:14, Jan 20 2021 > > > ERROR: BL31: Platform Management API version error.
Expected: v1.1 - > > > Found: v0.0 > > > ERROR: Error initializing runtime service sip_svc > > > > > > I also tried the Xilinx fork of ATF from > > > https://github.com/Xilinx/arm-trusted-firmware, by following build > > > instructions at > > > https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842305/Build+ARM+Trusted+Firmware+ATF > > > > > > $ ./qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m > > > 4G -serial stdio -display none -device loader,file=u-boot.elf -kernel > > > bl31.elf > > > ERROR: Incorrect XILINX IDCODE 0x0, maskid 0x4600093 > > > NOTICE: ATF running on XCZUUNKN/silicon v1/RTL0.0 at 0xfffea000 > > > NOTICE: BL31: v2.2(release):xilinx-v2020.2 > > > NOTICE: BL31: Built : 21:52:38, Jan 20 2021 > > > ERROR: BL31: Platform Management API version error. Expected: v1.1 - > > > Found: v0.0 > > > ERROR: Error initializing runtime service sip_svc > > > > > > Then I tried to build a U-Boot from the Xilinx fork at > > > https://github.com/Xilinx/u-boot-xlnx/, still no success. > > > > > > > Best regards, > > > > Francisco Iglesias > > > > > > > > [1] qemu/docs/system/deprecated.rst > > > > [2] https://github.com/franciscoIglesias/qemu-cmdline/blob/master/xlnx-zcu102-atf-u-boot-linux.md > > > > > > Regards, > Bin
On Thu, Jan 21, 2021 at 10:18 PM Francisco Iglesias <frasse.iglesias@gmail.com> wrote: > > Hi Bin, > > On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote: > > Hi Francisco, > > > > On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias > > <frasse.iglesias@gmail.com> wrote: > > > > > > Dear Bin, > > > > > > On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote: > > > > Hi Francisco, > > > > > > > > On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias > > > > <frasse.iglesias@gmail.com> wrote: > > > > > > > > > > Hi Bin, > > > > > > > > > > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote: > > > > > > Hi Francisco, > > > > > > > > > > > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias > > > > > > <frasse.iglesias@gmail.com> wrote: > > > > > > > > > > > > > > Hi Bin, > > > > > > > > > > > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote: > > > > > > > > Hi Francisco, > > > > > > > > > > > > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias > > > > > > > > <frasse.iglesias@gmail.com> wrote: > > > > > > > > > > > > > > > > > > Hi Bin, > > > > > > > > > > > > > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote: > > > > > > > > > > Hi Francisco, > > > > > > > > > > > > > > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias > > > > > > > > > > <frasse.iglesias@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > > > > Hi Bin, > > > > > > > > > > > > > > > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote: > > > > > > > > > > > > From: Bin Meng <bin.meng@windriver.com> > > > > > > > > > > > > > > > > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up > > > > > > > > > > > > bytes are expected to be received after it receives a command. For > > > > > > > > > > > > example, depending on the address mode, either 3-byte address or > > > > > > > > > > > > 4-byte address is needed. 
> > > > > > > > > > > > > > > > > > > > > > > > > > For fast read family commands, some dummy cycles are required after > > > > > > > > > > > > sending the address bytes, and the dummy cycles need to be counted > > > > > > > > > > > > in s->needed_bytes. This is where the mess began. > > > > > > > > > > > > > > > > > > > > > > > > As the variable name (needed_bytes) indicates, the unit is bytes. > > > > > > > > > > > > It is not bits, or cycles. However, for some reason the model has > > > > > > > > > > > > been using the number of dummy cycles for s->needed_bytes. The right > > > > > > > > > > > > approach is to convert the number of dummy cycles to bytes based on > > > > > > > > > > > > the SPI protocol; for example, 6 dummy cycles for the Fast Read Quad > > > > > > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8). > > > > > > > > > > > > > > > > > > > > > > While not being the original implementor, I must assume that the above solution was > > > > > > > > > > > considered but not chosen by the developers due to its inaccuracy (it > > > > > > > > > > > wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8, > > > > > > > > > > > meaning that if the controller is wrongly programmed to generate 7 the error > > > > > > > > > > > wouldn't be caught and the controller would still be considered "correct"). Now > > > > > > > > > > > that we have this detail in the implementation I'm in favor of keeping it, also > > > > > > > > > > > because the detail is already in use for catching exactly the above error. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I found no clue in the commit message that my proposed solution here > > > > > > > > > > was ever considered; otherwise all SPI controller models supporting > > > > > > > > > > software generation would have been found to be seriously broken a long > > > > > > > > > > time ago!
> > > > > > > > > > > > > > > > > > > > > > > > > > > The controllers you are referring to might lack support for commands requiring > > > > > > > > > dummy clock cycles but I really hope they work with the other commands? If so I > > > > > > > > > > > > > > > > I am not sure why you view dummy clock cycles as something special > > > > > > > > that needs some special support from the SPI controller. For the case > > > > > > > > 1 controller, it's nothing special from the controller perspective, > > > > > > > > just like sending out a command, or address bytes, or data. The > > > > > > > > controller just shifts data bit by bit from its tx fifo and that's it. > > > > > > > > In the Xilinx GQSPI controller case, the dummy cycles can either be > > > > > > > > sent as regular data (the case 1 controller) in the tx fifo, or > > > > > > > > automatically generated (case 2 controller) by the hardware. > > > > > > > > > > > > > > Ok, I'll try to explain my viewpoint a little differently. For that we also > > > > > > > need to keep in mind that QEMU models HW, and any binary that runs on a HW > > > > > > > board supported in QEMU should ideally run on that board inside QEMU as well > > > > > > > (this can be a bare metal application equally well as a modified u-boot/Linux > > > > > > > using SPI commands with a non-multiple of 8 number of dummy clock cycles). > > > > > > > > > > > > > > Once functionality has been introduced into QEMU it is not easy to know which > > > > > > > intentional or unintentional features provided by the functionality are being > > > > > > > used by users. One of the (perhaps not well known) features I'm aware of that > > > > > > > is in use and is provided by the accurate dummy clock cycle modeling inside > > > > > > > m25p80 is the ability to test drivers accurately regarding the dummy clock > > > > > > > cycles (even when using commands with a non-multiple of 8 number of dummy clock > > > > > > > cycles), but there might be others as well.
So by removing this functionality > > > > > > > the above use case will break, since those tests will no longer be reliable. > > > > > > > Furthermore, since users tend to be creative, it is not possible to know if > > > > > > > there are other use cases that will be affected. This means that if [1] > > > > > > > needs to be followed, the safe path is to add functionality instead of removing it. > > > > > > > Luckily it is also easier in this case, see below. > > > > > > > > > > > > I understand there might be users other than U-Boot/Linux that use an > > > > > > odd number of dummy bits (not a multiple of 8). If your concern was > > > > > > about model behavior changes, sure, I can update > > > > > > qemu/docs/system/deprecated.rst to mention that some flashes in the > > > > > > m25p80 model now implement dummy cycles as bytes. > > > > > > > > > > Yes, something like that. My concern is that since this functionality has been > > > > > in tree for a while, users have found known or unknown features that got > > > > > introduced by it. By removing the functionality (and the known/unknown features) > > > > > we risk breaking our users' use cases (currently I'm aware of one > > > > > feature/use case but it is not unlikely that there are more). [1] states that > > > > > "In general features are intended to be supported indefinitely once introduced > > > > > into QEMU"; to me that makes a lot of sense because the opposite would mean > > > > > that we were not reliable. So if [1] needs to be honored, it looks to be > > > > > safer to add functionality instead of removing it (and risking the removal of use > > > > > cases/features). Luckily I still believe that in this case it will be easier to > > > > > go forward (even if I also agree with what you are saying below about what I > > > > > proposed). > > > > > > > > > > > > > Even if the implementation is buggy and we need to keep the buggy > > > > implementation forever?
I think that's why > > > > qemu/docs/system/deprecated.rst was created for deprecating such > > > > features. > > > > > > With the RFC I posted, all commands in m25p80 are working for both the case 1 > > > controller (using a txfifo) and the case 2 controller (no txfifo, as GQSPI). > > > Because of this, I, with all respect, will have to disagree that this is buggy. > > > > Well, the existing m25p80 implementation that uses dummy cycle > > accuracy for those flashes prevents all SPI controllers that use a tx > > fifo from working with those flashes. Hence it is buggy. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > don't think it is fair to call them 'seriously broken' (otherwise we should > > > > > > > > > probably let the maintainers know about it). Most likely the lack of support > > > > > > > > > > > > > > > > I called it "seriously broken" because the current implementation only > > > > > > > > considered one type of SPI controller while completely ignoring the > > > > > > > > other type. > > > > > > > > > > > > > > If we change our view and look at this from the perspective of m25p80, it models the > > > > > > > commands a certain way and provides an API that the SPI controllers need to > > > > > > > implement for interacting with it. It is true that there are SPI controllers > > > > > > > referred to above that do not support the portion of that API that corresponds > > > > > > > to commands with dummy clock cycles, but I don't think it is true that this is > > > > > > > broken, since there is also one SPI controller that has a working implementation > > > > > > > of m25p80's full API also when transferring through a tx fifo (use case 1).
But > > > > > > > as mentioned above, by doing a minor extension and improvement to m25p80's API > > > > > > > that allows toggling the accuracy from dummy clock cycles to dummy bytes, [1] > > > > > > > will still be honored while at the same time making it possible to have full > > > > > > > support for the API in the SPI controllers that currently do not (please reread > > > > > > > the proposal in my previous reply that attempts to do this). I myself see this > > > > > > > as a win/win situation, also because no controller should need modifications. > > > > > > > > > > > > > > > > > > > I am afraid your proposal does not work. Your proposed new device > > > > > > property 'model_dummy_bytes', which selects converting the accurate dummy > > > > > > clock cycle count to dummy bytes inside m25p80, is hard to justify as > > > > > > a property of the flash itself, as the behavior is tightly coupled to > > > > > > how the SPI controller works. > > > > > > > > > > I agree with the above. I decided though that instead of posting sample code here > > > > > I'll post an RFC with hopefully an improved proposal. I'll cc you. Regarding the below, > > > > > Xilinx ZynqMP GQSPI should not need any modification in a first step. > > > > > > > > > > > > > Wait, (see below) > > > > > > > > > > > > > > > > Please take a look at the Xilinx GQSPI controller, which supports both > > > > > > use cases: the dummy cycles can be transferred via the tx fifo, or > > > > > > generated by the controller automatically.
Please read the example > > > > > > given in: > > > > > > > > > > > > table 24‐22, an example of Generic FIFO Contents for Quad I/O Read > > > > > > Command (EBh) > > > > > > > > > > > > in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf > > > > > > > > > > > > If you choose to set the m25p80 device property 'model_dummy_bytes' to > > > > > > true when working with the Xilinx GQSPI controller, you are bound to > > > > > > only allow guest software to use tx fifo to transfer the dummy cycles, > > > > > > and this is wrong. > > > > > > > > > > > > > > You missed this part. I looked at your RFC, and as I mentioned above > > > > your proposal cannot support the complicated controller like Xilinx > > > > GQSPI. Please read the example of table 24-22. With your RFC, you > > > > mandate guest software's GQSPI driver to only use hardware dummy cycle > > > > generation, which is wrong. > > > > > > > > > > First, thank you very much for looking into the RFC series, very much > > > appreciated. Secondly, about above, the GQSPI model in QEMU transfers from 2 > > > locations in the file, in 1 location the transfer referred to above is done, in > > > another location the transfer through the txfifo is done. The location where > > > transfer referred to above is done will not need any modifications (and will > > > thus work equally well as it does currently). > > > > Please explain this a little bit. How does your RFC series handle > > cases as described in table 24-22, where the 6 dummy cycles are split > > into 2 transfers, with one transfer using tx fifo, and the other one > > using hardware dummy cycle generation? > > > > Sorry, I missunderstod. You are right, that won't work. +Edgar E. Iglesias So it looks like, so far, the only way to implement dummy cycles correctly for all SPI controller models is what I proposed in this patch series. The maintainers have been quite silent, so I would like to hear your thoughts.
@Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you please share your thoughts, since you are the ones who reviewed the existing dummy cycle implementation (based on the commit history)? Regards, Bin
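The byte conversion argued for in the patch description (6 dummy cycles on the four data lines of Fast Read Quad I/O (EBh) give 6 * 4 / 8 = 3 bytes), and the split transfer raised about table 24-22, can be illustrated with a minimal sketch. This is not code from m25p80 or from any patch in this series: `dummy_cycles_to_bytes` and `bytes_to_dummy_cycles` are hypothetical helpers, and the "one fifo byte plus controller-generated cycles" split is an assumed example, not the exact contents of the Xilinx table.

```python
def dummy_cycles_to_bytes(cycles: int, lines: int) -> int:
    """Convert dummy clock cycles to tx fifo bytes.

    Assumes `lines` data lines (1, 2 or 4) are driven during the dummy
    phase, so every cycle carries `lines` bits.
    """
    bits = cycles * lines
    if bits % 8:
        # e.g. 7 cycles on 4 lines = 28 bits: not a whole number of bytes
        raise ValueError(f"{cycles} cycles on {lines} line(s) is {bits} bits, not whole bytes")
    return bits // 8


def bytes_to_dummy_cycles(nbytes: int, lines: int) -> int:
    """Number of dummy cycles clocked out by `nbytes` tx fifo bytes."""
    return nbytes * 8 // lines


# Fast Read Quad I/O (EBh): 6 dummy cycles on 4 lines
print(dummy_cycles_to_bytes(6, 4))          # -> 3

# Hypothetical split of the same 6-cycle dummy phase: one byte pushed
# through the tx fifo, the rest generated by the controller itself.
fifo_cycles = bytes_to_dummy_cycles(1, 4)   # 2 cycles
hw_cycles = 6 - fifo_cycles                 # 4 cycles left for hardware
print(fifo_cycles, hw_cycles)               # -> 2 4
```

Under a byte-counting flash model, the 4 controller-generated cycles in such a split would have to be turned into 4 * 4 / 8 = 2 dummy bytes by the controller model before reaching the flash, which is why the series also touches the controller models in the second case.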
On Mon, Feb 8, 2021 at 3:42 PM Bin Meng <bmeng.cn@gmail.com> wrote: > [...] > So it looks by far the only way to implement dummy cycles correctly to > work with all SPI controller models is what I proposed here in this > patch series. > > Maintainers are quite silent, so I would like to hear your thoughts. > > @Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you > please share your thoughts since you are the one who reviewed the > existing dummy implementation (based on commits history) > > Francisco really knows this stuff better than me... I would tend to agree that it's unfortunate to model things in cycles; if we could abstract things at a higher level, that would be nice, without breaking existing use cases. Francisco, is it impossible to bring the abstraction level up to bytes and keep existing use cases? We have a bunch of test cases. We'll publish some of them as source code; others we can't publish at all, since they use proprietary SW, but we can run tests and Ack if things work. Best regards, Edgar
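Francisco's earlier objection to byte granularity was that a controller wrongly programmed to generate 7 dummy cycles instead of 6 would no longer be caught. The sketch below shows why, under stated assumptions: both checker functions are hypothetical, and the byte-granular one assumes partial bytes are rounded down, which is only one possible policy for a byte-counting model.

```python
def cycle_accurate_check(generated: int, expected: int) -> bool:
    """Flash model that counts individual dummy cycles."""
    return generated == expected


def byte_granular_check(generated: int, expected: int, lines: int) -> bool:
    """Flash model that only sees whole dummy bytes.

    Assumes partial bytes are rounded down (floor); rounding up would
    hide a different set of off-by-one mistakes instead.
    """
    return (generated * lines) // 8 == (expected * lines) // 8


# Fast Read Quad I/O (EBh) expects 6 dummy cycles on 4 lines.
print(cycle_accurate_check(7, 6))            # -> False: mis-programming caught
print(byte_granular_check(7, 6, lines=4))    # -> True: 28 bits and 24 bits both floor to 3 bytes
```

Whichever rounding is chosen, some off-by-one cycle counts become indistinguishable from the correct one, which is the trade-off behind asking whether bytes can be used while keeping existing use cases.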
Hello Edgar, On [2021 Feb 08] Mon 16:30:00, Edgar E. Iglesias wrote: > On Mon, Feb 8, 2021 at 3:42 PM Bin Meng <bmeng.cn@gmail.com> wrote: > [...] > Francisco really knows this stuff better than me.... > I would tend to agree that it's unfortunate to model things in cycles, if > we could abstract things at a higher level that would be nice. Without > breaking existing use-cases. > Francisco, is it impossible to bring up the abstraction level to bytes and > keep existing use-cases? Great question. To be honest, I'm inclined to think it shouldn't be impossible (though I haven't been able to try anything yet). Best regards, Francisco Iglesias
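One way such a middle ground could look, purely as a hypothetical illustration, is a flash-side counter that stays in cycles but also accepts whole tx fifo bytes, so a table 24-22 style split adds up and an off-by-one cycle count is still detectable. `DummyPhase`, `feed_fifo_byte` and `feed_cycles` are invented names, not part of m25p80's API or of any patch in this thread, and the sketch ignores how a real controller model would report a partially used byte.

```python
class DummyPhase:
    """Hypothetical dummy-phase tracker for a flash model.

    Counts remaining dummy cycles, but lets a controller model consume
    them either as whole tx fifo bytes or as individual cycles.
    """

    def __init__(self, cycles: int, lines: int):
        if lines not in (1, 2, 4, 8):
            raise ValueError("lines must be 1, 2, 4 or 8")
        self.remaining = cycles
        self.lines = lines

    def feed_fifo_byte(self) -> None:
        # One fifo byte clocks 8 // lines cycles on `lines` data lines.
        self._consume(8 // self.lines)

    def feed_cycles(self, n: int) -> None:
        # Cycles generated directly by the controller hardware.
        self._consume(n)

    def _consume(self, n: int) -> None:
        if n > self.remaining:
            raise ValueError(f"{n} cycles overrun the {self.remaining} remaining")
        self.remaining -= n

    @property
    def done(self) -> bool:
        return self.remaining == 0


phase = DummyPhase(cycles=6, lines=4)   # Fast Read Quad I/O (EBh)
phase.feed_fifo_byte()                  # 2 cycles via the tx fifo
phase.feed_cycles(4)                    # 4 cycles from the controller
print(phase.done)                       # -> True
```

With this shape, a controller generating 7 cycles raises an error and one generating 5 leaves `done` false, so cycle accuracy is kept while byte-oriented controllers only ever call `feed_fifo_byte`.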
On Mon, Feb 8, 2021 at 10:41 PM Bin Meng <bmeng.cn@gmail.com> wrote:
>
> So it looks by far the only way to implement dummy cycles correctly to
> work with all SPI controller models is what I proposed here in this
> patch series.
>
> Maintainers are quite silent, so I would like to hear your thoughts.
>
> @Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you
> please share your thoughts since you are the one who reviewed the
> existing dummy implementation (based on commits history)

Hello maintainers,

We apparently missed the 6.0 window to address this mess of the m25p80
model. Please provide your inputs on this before I start working on
the v2.

Regards,
Bin
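Bin's proposal is that s->needed_bytes stays strictly in bytes. As a rough illustration only, the sketch below computes the follow-up byte count for a fast read command as address bytes plus dummy bytes; `FastReadCmd` and `fast_read_needed_bytes()` are hypothetical names and not the actual m25p80 implementation, and SDR with a byte-aligned dummy phase is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical command description; not the real m25p80 data structures. */
typedef struct {
    uint8_t opcode;         /* e.g. 0xEB, Fast Read Quad I/O */
    uint32_t dummy_cycles;  /* dummy clock cycles required after the address */
    uint32_t lanes;         /* I/O lanes used during the dummy phase */
} FastReadCmd;

/*
 * Follow-up bytes expected after the opcode, counted strictly in bytes:
 * the address (3 or 4 bytes depending on the address mode) plus the
 * dummy cycles converted to bytes (cycles * lanes / 8).
 */
static uint32_t fast_read_needed_bytes(const FastReadCmd *cmd,
                                       bool four_byte_addr)
{
    uint32_t dummy_bits = cmd->dummy_cycles * cmd->lanes;

    assert(dummy_bits % 8 == 0);    /* a byte-based model needs whole bytes */
    return (four_byte_addr ? 4u : 3u) + dummy_bits / 8;
}
```

For Fast Read Quad I/O (EBh) with 6 dummy cycles this gives 3 + 3 = 6 bytes in 3-byte address mode and 7 in 4-byte mode; a plain Fast Read (0Bh) with 8 single-lane dummy cycles gives 3 + 1 = 4.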
On Fri, Apr 23, 2021 at 4:46 PM Bin Meng <bmeng.cn@gmail.com> wrote:
>
> On Mon, Feb 8, 2021 at 10:41 PM Bin Meng <bmeng.cn@gmail.com> wrote:
> >
> > So it looks by far the only way to implement dummy cycles correctly to
> > work with all SPI controller models is what I proposed here in this
> > patch series.
> >
> > Maintainers are quite silent, so I would like to hear your thoughts.
> >
> > @Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you
> > please share your thoughts since you are the one who reviewed the
> > existing dummy implementation (based on commits history)

I agree with Edgar, in that Francisco and Bin know this better than me
and that modelling things in cycles is a pain.

As Bin points out it seems like currently we should be modelling bytes
(from the variable name) so it makes sense to keep it in bytes. I would
be in favour of this series in that case.

Do we know what use cases this will break? I know it's hard to answer
but I don't think there are too many SSI users in QEMU so it might not
be too hard to test most of the possible use cases.

Alistair

>
> Hello maintainers,
>
> We apparently missed the 6.0 window to address this mess of the m25p80
> model. Please provide your inputs on this before I start working on
> the v2.
>
> Regards,
> Bin
>
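If the flash model counts bytes, the cover letter notes that controllers whose hardware generates dummy cycles (the second case) must convert those cycles themselves. Below is a minimal sketch of that conversion, assuming SDR and a hypothetical `DummyAccumulator` in the controller model (no existing QEMU SPI model is claimed to work this way). It carries leftover bits between transfers so that a dummy phase split between tx FIFO bytes and hardware-generated cycles, as in the table 24-22 case raised earlier in the thread, still reaches the flash as the expected number of bytes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical state for a "case 2" controller model, i.e. one whose
 * hardware generates dummy cycles itself instead of shifting dummy bytes
 * out of the tx FIFO. Not taken from any QEMU SPI model.
 */
typedef struct {
    uint32_t pending_bits;  /* dummy bits not yet forming a whole byte */
    uint32_t bytes_sent;    /* dummy bytes forwarded to the flash model */
} DummyAccumulator;

/* Stand-in for pushing one byte to the flash over the SSI bus. */
static void send_dummy_byte(DummyAccumulator *acc)
{
    acc->bytes_sent++;
}

/*
 * Account for `cycles` hardware-generated dummy cycles on `lanes` lanes.
 * Whole bytes are forwarded immediately; leftover bits are carried over
 * so that a dummy phase split across several transfers still adds up.
 */
static void hw_dummy_cycles(DummyAccumulator *acc, uint32_t cycles,
                            uint32_t lanes)
{
    assert(lanes == 1 || lanes == 2 || lanes == 4 || lanes == 8);
    acc->pending_bits += cycles * lanes;
    while (acc->pending_bits >= 8) {
        send_dummy_byte(acc);
        acc->pending_bits -= 8;
    }
}

/* A dummy byte written by the guest into the tx FIFO goes out as-is. */
static void fifo_dummy_byte(DummyAccumulator *acc)
{
    assert(acc->pending_bits == 0);  /* byte model needs byte alignment */
    send_dummy_byte(acc);
}
```

The split one would test with, for example one FIFO byte covering 2 quad cycles followed by 4 hardware-generated quad cycles, is illustrative only and not copied from table 24-22; either way the flash model sees 3 dummy bytes.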
On [2021 Apr 27] Tue 15:56:10, Alistair Francis wrote:
> On Fri, Apr 23, 2021 at 4:46 PM Bin Meng <bmeng.cn@gmail.com> wrote:
> >
> > On Mon, Feb 8, 2021 at 10:41 PM Bin Meng <bmeng.cn@gmail.com> wrote:
> > >
> > > On Thu, Jan 21, 2021 at 10:18 PM Francisco Iglesias
> > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > don't think it is fair to call them 'seriously broken' (and else we should > > > > > > > > > > > > probably let the maintainers know about it). Most likely the lack of support > > > > > > > > > > > > > > > > > > > > > > I called it "seriously broken" because current implementation only > > > > > > > > > > > considered one type of SPI controllers while completely ignoring the > > > > > > > > > > > other type. > > > > > > > > > > > > > > > > > > > > If we change view and see this from the perspective of m25p80, it models the > > > > > > > > > > commands a certain way and provides an API that the SPI controllers need to > > > > > > > > > > implement for interacting with it. It is true that there are SPI controllers > > > > > > > > > > referred to above that do not support the portion of that API that corresponds > > > > > > > > > > to commands with dummy clock cycles, but I don't think it is true that this is > > > > > > > > > > broken since there is also one SPI controller that has a working implementation > > > > > > > > > > of m25p80's full API also when transfering through a tx fifo (use case 1). But > > > > > > > > > > as mentioned above, by doing a minor extension and improvement to m25p80's API > > > > > > > > > > and allow for toggling the accuracy from dummy clock cycles to dummy bytes [1] > > > > > > > > > > will still be honored as in the same time making it possible to have full > > > > > > > > > > support for the API in the SPI controllers that currently do not (please reread > > > > > > > > > > the proposal in my previous reply that attempts to do this). I myself see this > > > > > > > > > > as win/win situation, also because no controller should need modifications. > > > > > > > > > > > > > > > > > > > > > > > > > > > > I am afraid your proposal does not work. 
Your proposed new device > > > > > > > > > property 'model_dummy_bytes' to select to convert the accurate dummy > > > > > > > > > clock cycle count to dummy bytes inside m25p80, is hard to justify as > > > > > > > > > a property to the flash itself, as the behavior is tightly coupled to > > > > > > > > > how the SPI controller works. > > > > > > > > > > > > > > > > I agree on above. I decided though that instead of posting sample code in here > > > > > > > > I'll post an RFC with hopefully an improved proposal. I'll cc you. About below, > > > > > > > > Xilinx ZynqMP GQSPI should not need any modication in a first step. > > > > > > > > > > > > > > > > > > > > > > Wait, (see below) > > > > > > > > > > > > > > > > > > > > > > > > > Please take a look at the Xilinx GQSPI controller, which supports both > > > > > > > > > use cases, that the dummy cycles can be transferred via tx fifo, or > > > > > > > > > generated by the controller automatically. Please read the example > > > > > > > > > given in: > > > > > > > > > > > > > > > > > > table 24‐22, an example of Generic FIFO Contents for Quad I/O Read > > > > > > > > > Command (EBh) > > > > > > > > > > > > > > > > > > in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf > > > > > > > > > > > > > > > > > > If you choose to set the m25p80 device property 'model_dummy_bytes' to > > > > > > > > > true when working with the Xilinx GQSPI controller, you are bound to > > > > > > > > > only allow guest software to use tx fifo to transfer the dummy cycles, > > > > > > > > > and this is wrong. > > > > > > > > > > > > > > > > > > > > > > > You missed this part. I looked at your RFC, and as I mentioned above > > > > > > > your proposal cannot support the complicated controller like Xilinx > > > > > > > GQSPI. Please read the example of table 24-22. With your RFC, you > > > > > > > mandate guest software's GQSPI driver to only use hardware dummy cycle > > > > > > > generation, which is wrong. 
> > > > > > > > > > > > > > > > > > > First, thank you very much for looking into the RFC series, very much > > > > > > appreciated. Secondly, about above, the GQSPI model in QEMU transfers from 2 > > > > > > locations in the file, in 1 location the transfer referred to above is done, in > > > > > > another location the transfer through the txfifo is done. The location where > > > > > > transfer referred to above is done will not need any modifications (and will > > > > > > thus work equally well as it does currently). > > > > > > > > > > Please explain this a little bit. How does your RFC series handle > > > > > cases as described in table 24-22, where the 6 dummy cycles are split > > > > > into 2 transfers, with one transfer using tx fifo, and the other one > > > > > using hardware dummy cycle generation? > > > > > > > > Sorry, I missunderstod. You are right, that won't work. > > > > > > +Edgar E. Iglesias > > > > > > So it looks by far the only way to implement dummy cycles correctly to > > > work with all SPI controller models is what I proposed here in this > > > patch series. > > > > > > Maintainers are quite silent, so I would like to hear your thoughts. > > > > > > @Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you > > > please share your thoughts since you are the one who reviewed the > > > existing dummy implementation (based on commits history) > > I agree with Edgar, in that Francisco and Bin know this better than me > and that modelling things in cycles is a pain. Hi Alistair, > > As Bin points out it seems like currently we should be modelling bytes > (from the variable name) so it makes sense to keep it in bytes. I > would be in favour of this series in that case. Do we know what use > cases this will break? I know it's hard to answer but I don't think > there are too many SSI users in QEMU so it might not be too hard to > test most of the possible use cases. The use case I'm aware of is regression testing of drivers. 
For example, if a driver uses 10 dummy clock cycles with these commands and a patch accidentally changes it to use 11, QEMU currently catches the problem; that won't be possible with this series. It's hard to say, but there may be other use cases as well.

More importantly, IMO, the current use cases can be kept while still adding support for commands with dummy clock cycles to the QEMU SPI controllers that lack it at the moment.

(If I recall correctly, this series might also have another issue with the GQSPI SPI mode configuration: with it, 8 dummy clock cycles can be transmitted as 1, 2 or 4 data bytes, so I think some form of calculation might be needed inside m25p80.)

Best regards,
Francisco

>
> Alistair
>
> >
> > Hello maintainers,
> >
> > We apparently missed the 6.0 window to address this mess of the m25p80
> > model. Please provide your inputs on this before I start working on
> > the v2.
> >
> > Regards,
> > Bin
> >
Hello, On 4/27/21 10:54 AM, Francisco Iglesias wrote: > On [2021 Apr 27] Tue 15:56:10, Alistair Francis wrote: >> On Fri, Apr 23, 2021 at 4:46 PM Bin Meng <bmeng.cn@gmail.com> wrote: >>> >>> On Mon, Feb 8, 2021 at 10:41 PM Bin Meng <bmeng.cn@gmail.com> wrote: >>>> >>>> On Thu, Jan 21, 2021 at 10:18 PM Francisco Iglesias >>>> <frasse.iglesias@gmail.com> wrote: >>>>> >>>>> Hi Bin, >>>>> >>>>> On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote: >>>>>> Hi Francisco, >>>>>> >>>>>> On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias >>>>>> <frasse.iglesias@gmail.com> wrote: >>>>>>> >>>>>>> Dear Bin, >>>>>>> >>>>>>> On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote: >>>>>>>> Hi Francisco, >>>>>>>> >>>>>>>> On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias >>>>>>>> <frasse.iglesias@gmail.com> wrote: >>>>>>>>> >>>>>>>>> Hi Bin, >>>>>>>>> >>>>>>>>> On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote: >>>>>>>>>> Hi Francisco, >>>>>>>>>> >>>>>>>>>> On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias >>>>>>>>>> <frasse.iglesias@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>> Hi Bin, >>>>>>>>>>> >>>>>>>>>>> On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote: >>>>>>>>>>>> Hi Francisco, >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias >>>>>>>>>>>> <frasse.iglesias@gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Bin, >>>>>>>>>>>>> >>>>>>>>>>>>> On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote: >>>>>>>>>>>>>> Hi Francisco, >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias >>>>>>>>>>>>>> <frasse.iglesias@gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Bin, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote: >>>>>>>>>>>>>>>> From: Bin Meng <bin.meng@windriver.com> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The m25p80 model uses s->needed_bytes to indicate how many follow-up >>>>>>>>>>>>>>>> bytes are expected to be received after it receives a command. 
>>>>
>>>> Maintainers are quite silent, so I would like to hear your thoughts.
>>>>
>>>> @Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you
>>>> please share your thoughts since you are the one who reviewed the
>>>> existing dummy implementation (based on commits history)
>>
>> I agree with Edgar, in that Francisco and Bin know this better than me
>> and that modelling things in cycles is a pain.
>
> Hi Alistair,
>
>>
>> As Bin points out it seems like currently we should be modelling bytes
>> (from the variable name) so it makes sense to keep it in bytes. I
>> would be in favour of this series in that case. Do we know what use
>> cases this will break? I know it's hard to answer but I don't think
>> there are too many SSI users in QEMU so it might not be too hard to
>> test most of the possible use cases.
>
> The use case I'm aware of is regression testing of drivers. Ex: if a
> driver is using 10 dummy clock cycles with the commands and a patch
> accidentaly changes the driver to use 11 dummy clock cycles QEMU currently
> finds the problem, that won't be possible with this series. It's difficult
> to say but it is not impossible there are other use cases also.

It was breaking the Aspeed machines:

https://lore.kernel.org/qemu-devel/78a12882-1303-dd6d-6619-96c5e2cbf531@kaod.org/

QEMU 6.1 should have acceptance tests that will help in detecting
regressions in this area.

Thanks,

C.

>
> More importantly IMO though is that the current use cases can be keept
> while still providing support for commands with dummy clock cycles into
> the QEMU SPI controllers lacking at the moment.
>
> (If I recall correctly this series might also have another issue regarding
> the GQSPI SPI mode configuration, with that it is possible transmit 8
> dummy clock cycles as 1 data byte, 2 data bytes or 4 data bytes, so I
> think some form of calculation might be needed inside m25p80).
Hi Cédric, On Tue, Apr 27, 2021 at 10:32 PM Cédric Le Goater <clg@kaod.org> wrote: > > Hello, > > On 4/27/21 10:54 AM, Francisco Iglesias wrote: > > On [2021 Apr 27] Tue 15:56:10, Alistair Francis wrote: > >> On Fri, Apr 23, 2021 at 4:46 PM Bin Meng <bmeng.cn@gmail.com> wrote: > >>> > >>> On Mon, Feb 8, 2021 at 10:41 PM Bin Meng <bmeng.cn@gmail.com> wrote: > >>>> > >>>> On Thu, Jan 21, 2021 at 10:18 PM Francisco Iglesias > >>>> <frasse.iglesias@gmail.com> wrote: > >>>>> > >>>>> Hi Bin, > >>>>> > >>>>> On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote: > >>>>>> Hi Francisco, > >>>>>> > >>>>>> On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias > >>>>>> <frasse.iglesias@gmail.com> wrote: > >>>>>>> > >>>>>>> Dear Bin, > >>>>>>> > >>>>>>> On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote: > >>>>>>>> Hi Francisco, > >>>>>>>> > >>>>>>>> On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias > >>>>>>>> <frasse.iglesias@gmail.com> wrote: > >>>>>>>>> > >>>>>>>>> Hi Bin, > >>>>>>>>> > >>>>>>>>> On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote: > >>>>>>>>>> Hi Francisco, > >>>>>>>>>> > >>>>>>>>>> On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias > >>>>>>>>>> <frasse.iglesias@gmail.com> wrote: > >>>>>>>>>>> > >>>>>>>>>>> Hi Bin, > >>>>>>>>>>> > >>>>>>>>>>> On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote: > >>>>>>>>>>>> Hi Francisco, > >>>>>>>>>>>> > >>>>>>>>>>>> On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias > >>>>>>>>>>>> <frasse.iglesias@gmail.com> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>> Hi Bin, > >>>>>>>>>>>>> > >>>>>>>>>>>>> On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote: > >>>>>>>>>>>>>> Hi Francisco, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias > >>>>>>>>>>>>>> <frasse.iglesias@gmail.com> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Hi Bin, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote: > >>>>>>>>>>>>>>>> From: Bin Meng <bin.meng@windriver.com> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> The m25p80 model uses 
s->needed_bytes to indicate how many follow-up > >>>>>>>>>>>>>>>> bytes are expected to be received after it receives a command. For > >>>>>>>>>>>>>>>> example, depending on the address mode, either 3-byte address or > >>>>>>>>>>>>>>>> 4-byte address is needed. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> For fast read family commands, some dummy cycles are required after > >>>>>>>>>>>>>>>> sending the address bytes, and the dummy cycles need to be counted > >>>>>>>>>>>>>>>> in s->needed_bytes. This is where the mess began. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> As the variable name (needed_bytes) indicates, the unit is in byte. > >>>>>>>>>>>>>>>> It is not in bit, or cycle. However for some reason the model has > >>>>>>>>>>>>>>>> been using the number of dummy cycles for s->needed_bytes. The right > >>>>>>>>>>>>>>>> approach is to convert the number of dummy cycles to bytes based on > >>>>>>>>>>>>>>>> the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad > >>>>>>>>>>>>>>>> I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8). > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> While not being the original implementor I must assume that above solution was > >>>>>>>>>>>>>>> considered but not chosen by the developers due to it is inaccuracy (it > >>>>>>>>>>>>>>> wouldn't be possible to model exacly 6 dummy cycles, only a multiple of 8, > >>>>>>>>>>>>>>> meaning that if the controller is wrongly programmed to generate 7 the error > >>>>>>>>>>>>>>> wouldn't be caught and the controller will still be considered "correct"). Now > >>>>>>>>>>>>>>> that we have this detail in the implementation I'm in favor of keeping it, this > >>>>>>>>>>>>>>> also because the detail is already in use for catching exactly above error. 
> >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I found no clue from the commit message that my proposed solution here > >>>>>>>>>>>>>> was ever considered, otherwise all SPI controller models supporting > >>>>>>>>>>>>>> software generation should have been found out seriously broken long > >>>>>>>>>>>>>> time ago! > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> The controllers you are referring to might lack support for commands requiring > >>>>>>>>>>>>> dummy clock cycles but I really hope they work with the other commands? If so I > >>>>>>>>>>>> > >>>>>>>>>>>> I am not sure why you view dummy clock cycles as something special > >>>>>>>>>>>> that needs some special support from the SPI controller. For the case > >>>>>>>>>>>> 1 controller, it's nothing special from the controller perspective, > >>>>>>>>>>>> just like sending out a command, or address bytes, or data. The > >>>>>>>>>>>> controller just shifts data bit by bit from its tx fifo and that's it. > >>>>>>>>>>>> In the Xilinx GQSPI controller case, the dummy cycles can either be > >>>>>>>>>>>> sent via a regular data (the case 1 controller) in the tx fifo, or > >>>>>>>>>>>> automatically generated (case 2 controller) by the hardware. > >>>>>>>>>>> > >>>>>>>>>>> Ok, I'll try to explain my view point a little differently. For that we also > >>>>>>>>>>> need to keep in mind that QEMU models HW, and any binary that runs on a HW > >>>>>>>>>>> board supported in QEMU should ideally run on that board inside QEMU aswell > >>>>>>>>>>> (this can be a bare metal application equaly well as a modified u-boot/Linux > >>>>>>>>>>> using SPI commands with a non multiple of 8 number of dummy clock cycles). > >>>>>>>>>>> > >>>>>>>>>>> Once functionality has been introduced into QEMU it is not easy to know which > >>>>>>>>>>> intentional or untentional features provided by the functionality are being > >>>>>>>>>>> used by users. 
One of the (perhaps not well known) features I'm aware of that > >>>>>>>>>>> is in use and is provided by the accurate dummy clock cycle modeling inside > >>>>>>>>>>> m25p80 is the be ability to test drivers accurately regarding the dummy clock > >>>>>>>>>>> cycles (even when using commands with a non-multiple of 8 number of dummy clock > >>>>>>>>>>> cycles), but there might be others aswell. So by removing this functionality > >>>>>>>>>>> above use case will brake, this since those test will not be reliable. > >>>>>>>>>>> Furthermore, since users tend to be creative it is not possible to know if > >>>>>>>>>>> there are other use cases that will be affected. This means that in case [1] > >>>>>>>>>>> needs to be followed the safe path is to add functionality instead of removing. > >>>>>>>>>>> Luckily it also easier in this case, see below. > >>>>>>>>>> > >>>>>>>>>> I understand there might be users other than U-Boot/Linux that use an > >>>>>>>>>> odd number of dummy bits (not multiple of 8). If your concern was > >>>>>>>>>> about model behavior changes, sure I can update > >>>>>>>>>> qemu/docs/system/deprecated.rst to mention that some flashes in the > >>>>>>>>>> m25p80 model now implement dummy cycles as bytes. > >>>>>>>>> > >>>>>>>>> Yes, something like that. My concern is that since this functionality has been > >>>>>>>>> in tree for while, users have found known or unknown features that got > >>>>>>>>> introduced by it. By removing the functionality (and the known/uknown features) > >>>>>>>>> we are riscing to brake our user's use cases (currently I'm aware of one > >>>>>>>>> feature/use case but it is not unlikely that there are more). [1] states that > >>>>>>>>> "In general features are intended to be supported indefinitely once introduced > >>>>>>>>> into QEMU", to me that makes very much sense because the opposite would mean > >>>>>>>>> that we were not reliable. 
So if [1] needs to be honored, it looks to be > >>>>>>>>> safer to add functionality instead of removing it (and risking the removal of use > >>>>>>>>> cases/features). Luckily I still believe in this case that it will be easier to > >>>>>>>>> go forward (even if I also agree with what you are saying below about what I > >>>>>>>>> proposed). > >>>>>>>>> > >>>>>>>> > >>>>>>>> Even if the implementation is buggy and we need to keep the buggy > >>>>>>>> implementation forever? I think that's why > >>>>>>>> qemu/docs/system/deprecated.rst was created for deprecating such > >>>>>>>> features. > >>>>>>> > >>>>>>> With the RFC I posted, all commands in m25p80 work for both the case 1 > >>>>>>> controller (using a txfifo) and the case 2 controller (no txfifo, as GQSPI). > >>>>>>> Because of this, I, with all respect, will have to disagree that this is buggy. > >>>>>> > >>>>>> Well, the existing m25p80 implementation that uses dummy cycle > >>>>>> accuracy for those flashes prevents all SPI controllers that use a tx > >>>>>> fifo from working with those flashes. Hence it is buggy. > >>>>>> > >>>>>>> > >>>>>>>> > >>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>>> don't think it is fair to call them 'seriously broken' (otherwise we should > >>>>>>>>>>>>> probably let the maintainers know about it). Most likely the lack of support > >>>>>>>>>>>> > >>>>>>>>>>>> I called it "seriously broken" because the current implementation only > >>>>>>>>>>>> considered one type of SPI controller while completely ignoring the > >>>>>>>>>>>> other type. > >>>>>>>>>>> > >>>>>>>>>>> If we change our view and see this from the perspective of m25p80, it models the > >>>>>>>>>>> commands a certain way and provides an API that the SPI controllers need to > >>>>>>>>>>> implement for interacting with it.
It is true that there are SPI controllers > >>>>>>>>>>> referred to above that do not support the portion of that API that corresponds > >>>>>>>>>>> to commands with dummy clock cycles, but I don't think it is true that this is > >>>>>>>>>>> broken, since there is also one SPI controller that has a working implementation > >>>>>>>>>>> of m25p80's full API, also when transferring through a tx fifo (use case 1). But > >>>>>>>>>>> as mentioned above, by doing a minor extension and improvement to m25p80's API > >>>>>>>>>>> to allow toggling the accuracy from dummy clock cycles to dummy bytes, [1] > >>>>>>>>>>> will still be honored while at the same time making it possible to have full > >>>>>>>>>>> support for the API in the SPI controllers that currently do not (please reread > >>>>>>>>>>> the proposal in my previous reply that attempts to do this). I myself see this > >>>>>>>>>>> as a win/win situation, also because no controller should need modifications. > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> I am afraid your proposal does not work. Your proposed new device > >>>>>>>>>> property 'model_dummy_bytes', which converts the accurate dummy > >>>>>>>>>> clock cycle count to dummy bytes inside m25p80, is hard to justify as > >>>>>>>>>> a property of the flash itself, as the behavior is tightly coupled to > >>>>>>>>>> how the SPI controller works. > >>>>>>>>> > >>>>>>>>> I agree with the above. I decided, though, that instead of posting sample code in here > >>>>>>>>> I'll post an RFC with hopefully an improved proposal. I'll cc you. About below, > >>>>>>>>> Xilinx ZynqMP GQSPI should not need any modification in a first step. > >>>>>>>>> > >>>>>>>> > >>>>>>>> Wait, (see below) > >>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Please take a look at the Xilinx GQSPI controller, which supports both > >>>>>>>>>> use cases: the dummy cycles can be transferred via the tx fifo, or > >>>>>>>>>> generated by the controller automatically.
Please read the example > >>>>>>>>>> given in: > >>>>>>>>>> > >>>>>>>>>> table 24-22, an example of Generic FIFO Contents for Quad I/O Read > >>>>>>>>>> Command (EBh) > >>>>>>>>>> > >>>>>>>>>> in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf > >>>>>>>>>> > >>>>>>>>>> If you choose to set the m25p80 device property 'model_dummy_bytes' to > >>>>>>>>>> true when working with the Xilinx GQSPI controller, you are bound to > >>>>>>>>>> only allow guest software to use the tx fifo to transfer the dummy cycles, > >>>>>>>>>> and this is wrong. > >>>>>>>>>> > >>>>>>>> > >>>>>>>> You missed this part. I looked at your RFC, and as I mentioned above > >>>>>>>> your proposal cannot support a complicated controller like Xilinx > >>>>>>>> GQSPI. Please read the example of table 24-22. With your RFC, you > >>>>>>>> require the guest software's GQSPI driver to only use hardware dummy cycle > >>>>>>>> generation, which is wrong. > >>>>>>>> > >>>>>>> > >>>>>>> First, thank you very much for looking into the RFC series, very much > >>>>>>> appreciated. Secondly, regarding the above, the GQSPI model in QEMU transfers from 2 > >>>>>>> locations in the file: in 1 location the transfer referred to above is done, in > >>>>>>> another location the transfer through the txfifo is done. The location where the > >>>>>>> transfer referred to above is done will not need any modifications (and will > >>>>>>> thus work equally well as it does currently). > >>>>>> > >>>>>> Please explain this a little bit. How does your RFC series handle > >>>>>> cases as described in table 24-22, where the 6 dummy cycles are split > >>>>>> into 2 transfers, with one transfer using the tx fifo, and the other one > >>>>>> using hardware dummy cycle generation? > >>>>> > >>>>> Sorry, I misunderstood. You are right, that won't work. > >>>> > >>>> +Edgar E.
Iglesias > >>>> > >>>> So it looks like, by far, the only way to implement dummy cycles correctly to > >>>> work with all SPI controller models is what I proposed here in this > >>>> patch series. > >>>> > >>>> Maintainers are quite silent, so I would like to hear your thoughts. > >>>> > >>>> @Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you > >>>> please share your thoughts, since you are the ones who reviewed the > >>>> existing dummy implementation (based on the commit history)? > >> > >> I agree with Edgar, in that Francisco and Bin know this better than me > >> and that modelling things in cycles is a pain. > > > > Hi Alistair, > > > >> > >> As Bin points out, it seems like currently we should be modelling bytes > >> (from the variable name) so it makes sense to keep it in bytes. I > >> would be in favour of this series in that case. Do we know what use > >> cases this will break? I know it's hard to answer but I don't think > >> there are too many SSI users in QEMU so it might not be too hard to > >> test most of the possible use cases. > > > > The use case I'm aware of is regression testing of drivers. Ex: if a > > driver is using 10 dummy clock cycles with the commands and a patch > > accidentally changes the driver to use 11 dummy clock cycles, QEMU currently > > finds the problem; that won't be possible with this series. It's difficult > > to say, but it is not impossible that there are other use cases as well. > > > It was breaking the Aspeed machines: > > https://lore.kernel.org/qemu-devel/78a12882-1303-dd6d-6619-96c5e2cbf531@kaod.org/ Yes, as I mentioned in the series, the modification was based on a pure guess from the existing QEMU code, as I could not find a datasheet of the Aspeed SPI controller on the internet. Do you know if this is publicly available? > > QEMU 6.1 should have acceptance tests that will help in detecting > regressions in this area. > Regards, Bin
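The regression-testing concern raised above (a driver that accidentally programs 11 instead of 10 dummy cycles) can be illustrated with a minimal C sketch. It is not QEMU code: the helper names are hypothetical, and it assumes a quad I/O bus (4 data lines) plus a controller model that converts a programmed cycle count to bytes with integer division.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: dummy bytes seen by a byte-granular flash model
 * when a controller model converts a programmed cycle count using
 * integer division (cycles * lines / 8), silently dropping leftover bits. */
static unsigned dummy_cycles_to_bytes(unsigned cycles, unsigned lines)
{
    assert(lines == 1 || lines == 2 || lines == 4);
    return cycles * lines / 8;
}

/* Cycle-accurate model: only the exact expected cycle count is accepted. */
static bool cycle_model_accepts(unsigned expected_cycles,
                                unsigned actual_cycles)
{
    return expected_cycles == actual_cycles;
}

/* Byte-granular model: only the converted byte counts are compared, so
 * cycle counts that truncate to the same number of bytes look identical. */
static bool byte_model_accepts(unsigned expected_cycles,
                               unsigned actual_cycles, unsigned lines)
{
    return dummy_cycles_to_bytes(expected_cycles, lines) ==
           dummy_cycles_to_bytes(actual_cycles, lines);
}
```

With 4 lines, 10 and 11 cycles both truncate to 5 bytes (40 / 8 and 44 / 8), so byte_model_accepts(10, 11, 4) returns true while cycle_model_accepts(10, 11) returns false. A real controller model could instead round up or reject such counts; the sketch only shows why byte granularity alone cannot tell the two drivers apart.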
On 4/28/21 3:12 PM, Bin Meng wrote: > Hi Cédric, > > On Tue, Apr 27, 2021 at 10:32 PM Cédric Le Goater <clg@kaod.org> wrote: >> >> Hello, >> >> On 4/27/21 10:54 AM, Francisco Iglesias wrote: >>> On [2021 Apr 27] Tue 15:56:10, Alistair Francis wrote: >>>> On Fri, Apr 23, 2021 at 4:46 PM Bin Meng <bmeng.cn@gmail.com> wrote: >>>>> >>>>> On Mon, Feb 8, 2021 at 10:41 PM Bin Meng <bmeng.cn@gmail.com> wrote: >>>>>> >>>>>> On Thu, Jan 21, 2021 at 10:18 PM Francisco Iglesias >>>>>> <frasse.iglesias@gmail.com> wrote: >>>>>>> >>>>>>> Hi Bin, >>>>>>> >>>>>>> On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote: >>>>>>>> Hi Francisco, >>>>>>>> >>>>>>>> On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias >>>>>>>> <frasse.iglesias@gmail.com> wrote: >>>>>>>>> >>>>>>>>> Dear Bin, >>>>>>>>> >>>>>>>>> On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote: >>>>>>>>>> Hi Francisco, >>>>>>>>>> >>>>>>>>>> On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias >>>>>>>>>> <frasse.iglesias@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>> Hi Bin, >>>>>>>>>>> >>>>>>>>>>> On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote: >>>>>>>>>>>> Hi Francisco, >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias >>>>>>>>>>>> <frasse.iglesias@gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Bin, >>>>>>>>>>>>> >>>>>>>>>>>>> On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote: >>>>>>>>>>>>>> Hi Francisco, >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias >>>>>>>>>>>>>> <frasse.iglesias@gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Bin, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote: >>>>>>>>>>>>>>>> Hi Francisco, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias >>>>>>>>>>>>>>>> <frasse.iglesias@gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Bin, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote: >>>>>>>>>>>>>>>>>> From: Bin Meng <bin.meng@windriver.com> >>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> The m25p80 model uses s->needed_bytes to indicate how many follow-up >>>>>>>>>>>>>>>>>> bytes are expected to be received after it receives a command. For >>>>>>>>>>>>>>>>>> example, depending on the address mode, either a 3-byte address or a >>>>>>>>>>>>>>>>>> 4-byte address is needed. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> For fast read family commands, some dummy cycles are required after >>>>>>>>>>>>>>>>>> sending the address bytes, and the dummy cycles need to be counted >>>>>>>>>>>>>>>>>> in s->needed_bytes. This is where the mess began. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> As the variable name (needed_bytes) indicates, the unit is bytes. >>>>>>>>>>>>>>>>>> It is not bits, or cycles. However, for some reason the model has >>>>>>>>>>>>>>>>>> been using the number of dummy cycles for s->needed_bytes. The right >>>>>>>>>>>>>>>>>> approach is to convert the number of dummy cycles to bytes based on >>>>>>>>>>>>>>>>>> the SPI protocol; for example, 6 dummy cycles for the Fast Read Quad >>>>>>>>>>>>>>>>>> I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8). >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> While not being the original implementor, I must assume that the above solution was >>>>>>>>>>>>>>>>> considered but not chosen by the developers due to its inaccuracy (it >>>>>>>>>>>>>>>>> wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8, >>>>>>>>>>>>>>>>> meaning that if the controller is wrongly programmed to generate 7, the error >>>>>>>>>>>>>>>>> wouldn't be caught and the controller would still be considered "correct"). Now >>>>>>>>>>>>>>>>> that we have this detail in the implementation I'm in favor of keeping it, >>>>>>>>>>>>>>>>> also because the detail is already in use for catching exactly the above error.
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I found no clue from the commit message that my proposed solution here >>>>>>>>>>>>>>>> was ever considered, otherwise all SPI controller models supporting >>>>>>>>>>>>>>>> software generation should have been found to be seriously broken a long >>>>>>>>>>>>>>>> time ago! >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The controllers you are referring to might lack support for commands requiring >>>>>>>>>>>>>>> dummy clock cycles but I really hope they work with the other commands? If so I >>>>>>>>>>>>>> >>>>>>>>>>>>>> I am not sure why you view dummy clock cycles as something special >>>>>>>>>>>>>> that needs some special support from the SPI controller. For the case >>>>>>>>>>>>>> 1 controller, it's nothing special from the controller perspective, >>>>>>>>>>>>>> just like sending out a command, or address bytes, or data. The >>>>>>>>>>>>>> controller just shifts data bit by bit from its tx fifo and that's it. >>>>>>>>>>>>>> In the Xilinx GQSPI controller case, the dummy cycles can either be >>>>>>>>>>>>>> sent as regular data (the case 1 controller) in the tx fifo, or >>>>>>>>>>>>>> automatically generated (case 2 controller) by the hardware. >>>>>>>>>>>>> >>>>>>>>>>>>> Ok, I'll try to explain my viewpoint a little differently. For that we also >>>>>>>>>>>>> need to keep in mind that QEMU models HW, and any binary that runs on a HW >>>>>>>>>>>>> board supported in QEMU should ideally run on that board inside QEMU as well >>>>>>>>>>>>> (this can be a bare metal application equally well as a modified u-boot/Linux >>>>>>>>>>>>> using SPI commands with a non-multiple of 8 number of dummy clock cycles). >>>>>>>>>>>>> >>>>>>>>>>>>> Once functionality has been introduced into QEMU it is not easy to know which >>>>>>>>>>>>> intentional or unintentional features provided by the functionality are being >>>>>>>>>>>>> used by users.
One of the (perhaps not well-known) features I'm aware of that >>>>>>>>>>>>> is in use and is provided by the accurate dummy clock cycle modeling inside >>>>>>>>>>>>> m25p80 is the ability to test drivers accurately regarding the dummy clock >>>>>>>>>>>>> cycles (even when using commands with a non-multiple of 8 number of dummy clock >>>>>>>>>>>>> cycles), but there might be others as well. So by removing this functionality >>>>>>>>>>>>> the above use case will break, since those tests will no longer be reliable. >>>>>>>>>>>>> Furthermore, since users tend to be creative, it is not possible to know if >>>>>>>>>>>>> there are other use cases that will be affected. This means that if [1] >>>>>>>>>>>>> needs to be followed, the safe path is to add functionality instead of removing it. >>>>>>>>>>>>> Luckily it is also easier in this case, see below. >>>>>>>>>>>> >>>>>>>>>>>> I understand there might be users other than U-Boot/Linux that use an >>>>>>>>>>>> odd number of dummy bits (not a multiple of 8). If your concern was >>>>>>>>>>>> about model behavior changes, sure I can update >>>>>>>>>>>> qemu/docs/system/deprecated.rst to mention that some flashes in the >>>>>>>>>>>> m25p80 model now implement dummy cycles as bytes. >>>>>>>>>>> >>>>>>>>>>> Yes, something like that. My concern is that since this functionality has been >>>>>>>>>>> in tree for a while, users have found known or unknown features that got >>>>>>>>>>> introduced by it. By removing the functionality (and the known/unknown features) >>>>>>>>>>> we are risking breaking our users' use cases (currently I'm aware of one >>>>>>>>>>> feature/use case but it is not unlikely that there are more). [1] states that >>>>>>>>>>> "In general features are intended to be supported indefinitely once introduced >>>>>>>>>>> into QEMU", and to me that makes a lot of sense because the opposite would mean >>>>>>>>>>> that we were not reliable.
So if [1] needs to be honored, it looks to be >>>>>>>>>>> safer to add functionality instead of removing it (and risking the removal of use >>>>>>>>>>> cases/features). Luckily I still believe in this case that it will be easier to >>>>>>>>>>> go forward (even if I also agree with what you are saying below about what I >>>>>>>>>>> proposed). >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Even if the implementation is buggy and we need to keep the buggy >>>>>>>>>> implementation forever? I think that's why >>>>>>>>>> qemu/docs/system/deprecated.rst was created for deprecating such >>>>>>>>>> features. >>>>>>>>> >>>>>>>>> With the RFC I posted, all commands in m25p80 work for both the case 1 >>>>>>>>> controller (using a txfifo) and the case 2 controller (no txfifo, as GQSPI). >>>>>>>>> Because of this, I, with all respect, will have to disagree that this is buggy. >>>>>>>> >>>>>>>> Well, the existing m25p80 implementation that uses dummy cycle >>>>>>>> accuracy for those flashes prevents all SPI controllers that use a tx >>>>>>>> fifo from working with those flashes. Hence it is buggy. >>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> don't think it is fair to call them 'seriously broken' (otherwise we should >>>>>>>>>>>>>>> probably let the maintainers know about it). Most likely the lack of support >>>>>>>>>>>>>> >>>>>>>>>>>>>> I called it "seriously broken" because the current implementation only >>>>>>>>>>>>>> considered one type of SPI controller while completely ignoring the >>>>>>>>>>>>>> other type. >>>>>>>>>>>>> >>>>>>>>>>>>> If we change our view and see this from the perspective of m25p80, it models the >>>>>>>>>>>>> commands a certain way and provides an API that the SPI controllers need to >>>>>>>>>>>>> implement for interacting with it.
It is true that there are SPI controllers >>>>>>>>>>>>> referred to above that do not support the portion of that API that corresponds >>>>>>>>>>>>> to commands with dummy clock cycles, but I don't think it is true that this is >>>>>>>>>>>>> broken, since there is also one SPI controller that has a working implementation >>>>>>>>>>>>> of m25p80's full API, also when transferring through a tx fifo (use case 1). But >>>>>>>>>>>>> as mentioned above, by doing a minor extension and improvement to m25p80's API >>>>>>>>>>>>> to allow toggling the accuracy from dummy clock cycles to dummy bytes, [1] >>>>>>>>>>>>> will still be honored while at the same time making it possible to have full >>>>>>>>>>>>> support for the API in the SPI controllers that currently do not (please reread >>>>>>>>>>>>> the proposal in my previous reply that attempts to do this). I myself see this >>>>>>>>>>>>> as a win/win situation, also because no controller should need modifications. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I am afraid your proposal does not work. Your proposed new device >>>>>>>>>>>> property 'model_dummy_bytes', which converts the accurate dummy >>>>>>>>>>>> clock cycle count to dummy bytes inside m25p80, is hard to justify as >>>>>>>>>>>> a property of the flash itself, as the behavior is tightly coupled to >>>>>>>>>>>> how the SPI controller works. >>>>>>>>>>> >>>>>>>>>>> I agree with the above. I decided, though, that instead of posting sample code in here >>>>>>>>>>> I'll post an RFC with hopefully an improved proposal. I'll cc you. About below, >>>>>>>>>>> Xilinx ZynqMP GQSPI should not need any modification in a first step. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Wait, (see below) >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Please take a look at the Xilinx GQSPI controller, which supports both >>>>>>>>>>>> use cases: the dummy cycles can be transferred via the tx fifo, or >>>>>>>>>>>> generated by the controller automatically.
Please read the example >>>>>>>>>>>> given in: >>>>>>>>>>>> >>>>>>>>>>>> table 24-22, an example of Generic FIFO Contents for Quad I/O Read >>>>>>>>>>>> Command (EBh) >>>>>>>>>>>> >>>>>>>>>>>> in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf >>>>>>>>>>>> >>>>>>>>>>>> If you choose to set the m25p80 device property 'model_dummy_bytes' to >>>>>>>>>>>> true when working with the Xilinx GQSPI controller, you are bound to >>>>>>>>>>>> only allow guest software to use the tx fifo to transfer the dummy cycles, >>>>>>>>>>>> and this is wrong. >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> You missed this part. I looked at your RFC, and as I mentioned above >>>>>>>>>> your proposal cannot support a complicated controller like Xilinx >>>>>>>>>> GQSPI. Please read the example of table 24-22. With your RFC, you >>>>>>>>>> require the guest software's GQSPI driver to only use hardware dummy cycle >>>>>>>>>> generation, which is wrong. >>>>>>>>>> >>>>>>>>> >>>>>>>>> First, thank you very much for looking into the RFC series, very much >>>>>>>>> appreciated. Secondly, regarding the above, the GQSPI model in QEMU transfers from 2 >>>>>>>>> locations in the file: in 1 location the transfer referred to above is done, in >>>>>>>>> another location the transfer through the txfifo is done. The location where the >>>>>>>>> transfer referred to above is done will not need any modifications (and will >>>>>>>>> thus work equally well as it does currently). >>>>>>>> >>>>>>>> Please explain this a little bit. How does your RFC series handle >>>>>>>> cases as described in table 24-22, where the 6 dummy cycles are split >>>>>>>> into 2 transfers, with one transfer using the tx fifo, and the other one >>>>>>>> using hardware dummy cycle generation? >>>>>>> >>>>>>> Sorry, I misunderstood. You are right, that won't work. >>>>>> >>>>>> +Edgar E.
Iglesias >>>>>> >>>>>> So it looks like, by far, the only way to implement dummy cycles correctly to >>>>>> work with all SPI controller models is what I proposed here in this >>>>>> patch series. >>>>>> >>>>>> Maintainers are quite silent, so I would like to hear your thoughts. >>>>>> >>>>>> @Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you >>>>>> please share your thoughts, since you are the ones who reviewed the >>>>>> existing dummy implementation (based on the commit history)? >>>> >>>> I agree with Edgar, in that Francisco and Bin know this better than me >>>> and that modelling things in cycles is a pain. >>> >>> Hi Alistair, >>> >>>> >>>> As Bin points out, it seems like currently we should be modelling bytes >>>> (from the variable name) so it makes sense to keep it in bytes. I >>>> would be in favour of this series in that case. Do we know what use >>>> cases this will break? I know it's hard to answer but I don't think >>>> there are too many SSI users in QEMU so it might not be too hard to >>>> test most of the possible use cases. >>> >>> The use case I'm aware of is regression testing of drivers. Ex: if a >>> driver is using 10 dummy clock cycles with the commands and a patch >>> accidentally changes the driver to use 11 dummy clock cycles, QEMU currently >>> finds the problem; that won't be possible with this series. It's difficult >>> to say, but it is not impossible that there are other use cases as well. >> >> >> It was breaking the Aspeed machines: >> >> https://lore.kernel.org/qemu-devel/78a12882-1303-dd6d-6619-96c5e2cbf531@kaod.org/ > > Yes, as I mentioned in the series, the modification was based on a pure > guess from the existing QEMU code, as I could not find a datasheet of the > Aspeed SPI controller on the internet. Do you know if this is publicly > available? It is not, but many of the register bitfields are described in the code. I should be able to help you make this work. Thanks, C.
>> QEMU 6.1 should have acceptance tests that will help in detecting >> regressions in this area. >> > > Regards, > Bin >
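The table 24-22 question in the quoted exchange (6 dummy cycles split between a tx fifo transfer and hardware-generated cycles) is easier to reason about when the flash counts bytes. The following C sketch is hypothetical and much simpler than the real m25p80 and xilinx_spips models: struct toy_flash stands in for a needed_bytes-style counter, and the 2 + 4 cycle split on a 4-line bus is an assumption chosen only so that the parts add up to 6 cycles; it is not taken from the Xilinx manual.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a flash model that, like m25p80's
 * needed_bytes, counts the follow-up bytes it still expects. */
struct toy_flash {
    unsigned needed_bytes;
};

/* The flash consumes one byte per transfer until nothing is pending. */
static void toy_flash_rx_byte(struct toy_flash *f)
{
    if (f->needed_bytes > 0) {
        f->needed_bytes--;
    }
}

/* Case 1: the driver wrote the dummy bytes into the tx fifo, so the
 * controller model forwards them to the flash one byte at a time. */
static void ctrl_send_fifo_bytes(struct toy_flash *f, unsigned nbytes)
{
    for (unsigned i = 0; i < nbytes; i++) {
        toy_flash_rx_byte(f);
    }
}

/* Case 2: the driver only programmed a dummy cycle count, so the
 * controller model converts it to bytes for the current bus width.
 * Counts that do not fill whole bytes are rejected in this sketch. */
static bool ctrl_gen_dummy_cycles(struct toy_flash *f, unsigned cycles,
                                  unsigned lines)
{
    assert(lines == 1 || lines == 2 || lines == 4);
    unsigned bits = cycles * lines;
    if (bits % 8 != 0) {
        return false;
    }
    ctrl_send_fifo_bytes(f, bits / 8);
    return true;
}
```

In that assumed split, one fifo byte (2 cycles on 4 lines) plus 4 generated cycles (2 bytes) drains exactly the 3 bytes that 6 quad I/O dummy cycles correspond to, whichever path carried each part.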
From: Bin Meng <bin.meng@windriver.com>

The m25p80 model uses s->needed_bytes to indicate how many follow-up bytes are expected to be received after it receives a command. For example, depending on the address mode, either a 3-byte or a 4-byte address is needed.

For fast read family commands, some dummy cycles are required after sending the address bytes, and the dummy cycles need to be counted in s->needed_bytes. This is where the mess began.

As the variable name (needed_bytes) indicates, the unit is bytes, not bits or cycles. However, for some reason the model has been using the number of dummy cycles for s->needed_bytes. The right approach is to convert the number of dummy cycles to bytes based on the SPI protocol; for example, 6 dummy cycles for the Fast Read Quad I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).

Things get complicated when interacting with different SPI or QSPI flash controllers. There are two major cases:

- Dummy bytes prepared by drivers and written to the controller fifo. In this case, the driver calculates the correct number of dummy bytes and writes them into the tx fifo. Fixing the m25p80 model will fix flashes working with such controllers.
- Dummy bytes not prepared by drivers. Drivers just tell the hardware the dummy cycle configuration via some registers, and the hardware automatically generates the dummy cycles. Fixing the m25p80 model is not enough; we will also need to fix the SPI/QSPI models for such controllers.

This series fixes the mess in m25p80 from the flash side first, followed by fixes to 3 known SPI controller models that fall into the 2nd case above.

Please note, I have no way to verify patches 7/8/9 because:

* There is no public datasheet available for the SoC / SPI controller
* There are no QEMU docs or details that tell people how to boot either U-Boot or the Linux kernel to verify the functionality

These 3 patches are very likely to be wrong. Hence I would like to ask for help from the original authors of these SPI controller models, either to help with testing or to completely rewrite these 3 patches to fix things. Thanks!

Patch 6 is unvalidated with QEMU, mainly because there is no doc telling people how to boot anything to test it. But I have some confidence based on my reading of the ZynqMP manual, as well as some experimental testing on a real ZCU102 board.

Other flash patches can be tested with the SiFive SPI series:
http://patchwork.ozlabs.org/project/qemu-devel/list/?series=222391

Cherry-pick patches 16 and 17 from the series above, and switch to a different flash model to test with the following command:

$ qemu-system-riscv64 -nographic -M sifive_u -m 2G -smp 5 -kernel u-boot

I picked two for testing:

QEMU flash: "sst25vf032b"

U-Boot 2020.10 (Jan 14 2021 - 21:55:59 +0800)

CPU: rv64imafdcsu
Model: SiFive HiFive Unleashed A00
DRAM: 2 GiB
MMC:
Loading Environment from SPIFlash... SF: Detected sst25vf032b with page size 256 Bytes, erase size 4 KiB, total 4 MiB
*** Warning - bad CRC, using default environment

In: serial@10010000
Out: serial@10010000
Err: serial@10010000
Net: failed to get gemgxl_reset reset

Warning: ethernet@10090000 MAC addresses don't match:
Address in DT is 52:54:00:12:34:56
Address in environment is 70:b3:d5:92:f0:01
eth0: ethernet@10090000
Hit any key to stop autoboot: 0
=> sf probe
SF: Detected sst25vf032b with page size 256 Bytes, erase size 4 KiB, total 4 MiB
=> sf test 1ff000 1000
SPI flash test:
0 erase: 0 ticks, 4096000 KiB/s 32768.000 Mbps
1 check: 10 ticks, 400 KiB/s 3.200 Mbps
2 write: 170 ticks, 23 KiB/s 0.184 Mbps
3 read: 9 ticks, 444 KiB/s 3.552 Mbps
Test passed
0 erase: 0 ticks, 4096000 KiB/s 32768.000 Mbps
1 check: 10 ticks, 400 KiB/s 3.200 Mbps
2 write: 170 ticks, 23 KiB/s 0.184 Mbps
3 read: 9 ticks, 444 KiB/s 3.552 Mbps

QEMU flash: "mx66u51235f"

U-Boot 2020.10 (Jan 14 2021 - 21:55:59 +0800)

CPU: rv64imafdcsu
Model: SiFive HiFive Unleashed A00
DRAM: 2 GiB
MMC:
Loading Environment from SPIFlash... SF: Detected mx66u51235f with page size 256 Bytes, erase size 4 KiB, total 64 MiB
*** Warning - bad CRC, using default environment

In: serial@10010000
Out: serial@10010000
Err: serial@10010000
Net: failed to get gemgxl_reset reset

Warning: ethernet@10090000 MAC addresses don't match:
Address in DT is 52:54:00:12:34:56
Address in environment is 70:b3:d5:92:f0:01
eth0: ethernet@10090000
Hit any key to stop autoboot: 0
=> sf probe
SF: Detected mx66u51235f with page size 256 Bytes, erase size 4 KiB, total 64 MiB
=> sf test 0 8000
SPI flash test:
0 erase: 1 ticks, 32000 KiB/s 256.000 Mbps
1 check: 80 ticks, 400 KiB/s 3.200 Mbps
2 write: 83 ticks, 385 KiB/s 3.080 Mbps
3 read: 79 ticks, 405 KiB/s 3.240 Mbps
Test passed
0 erase: 1 ticks, 32000 KiB/s 256.000 Mbps
1 check: 80 ticks, 400 KiB/s 3.200 Mbps
2 write: 83 ticks, 385 KiB/s 3.080 Mbps
3 read: 79 ticks, 405 KiB/s 3.240 Mbps

I am sure there will be bugs, and I have not tested all affected flashes. But I want to send out this series for early discussion and comments. I will continue my testing.

Bin Meng (9):
  hw/block: m25p80: Fix the number of dummy bytes needed for Windbond flashes
  hw/block: m25p80: Fix the number of dummy bytes needed for Numonyx/Micron flashes
  hw/block: m25p80: Fix the number of dummy bytes needed for Macronix flashes
  hw/block: m25p80: Fix the number of dummy bytes needed for Spansion flashes
  hw/block: m25p80: Support fast read for SST flashes
  hw/ssi: xilinx_spips: Fix generic fifo dummy cycle handling
  Revert "aspeed/smc: Fix number of dummy cycles for FAST_READ_4 command"
  Revert "aspeed/smc: snoop SPI transfers to fake dummy cycles"
  hw/ssi: npcm7xx_fiu: Correct the dummy cycle emulation logic

 include/hw/ssi/aspeed_smc.h | 3 -
 hw/block/m25p80.c | 153 ++++++++++++++++++++++++++++--------
 hw/ssi/aspeed_smc.c | 116 +--------------------------
 hw/ssi/npcm7xx_fiu.c | 8 +-
 hw/ssi/xilinx_spips.c | 29 ++++++-
 5 files changed, 153 insertions(+), 156 deletions(-)
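The dummy-cycle conversion described in the cover letter can be written as a small helper. This is a minimal sketch, not the code from the series: fast_read_needed_bytes() and its parameters are hypothetical, and it assumes the address bytes and the dummy phase are simply summed into one follow-up byte count, as the cover letter describes for s->needed_bytes.

```c
#include <assert.h>

/* Convert dummy clock cycles to bytes: each cycle carries one bit per
 * data line, so bytes = cycles * lines / 8. The assert flags cycle
 * counts that do not fill whole bytes on the given bus width. */
static unsigned dummy_cycles_to_bytes(unsigned cycles, unsigned lines)
{
    assert(lines == 1 || lines == 2 || lines == 4);
    assert((cycles * lines) % 8 == 0);
    return cycles * lines / 8;
}

/* Hypothetical: follow-up bytes a fast read command needs after the
 * opcode, i.e. the address (3 or 4 bytes depending on the address mode)
 * plus the dummy phase expressed in bytes. */
static unsigned fast_read_needed_bytes(unsigned addr_bytes,
                                       unsigned dummy_cycles,
                                       unsigned lines)
{
    assert(addr_bytes == 3 || addr_bytes == 4);
    return addr_bytes + dummy_cycles_to_bytes(dummy_cycles, lines);
}
```

For Fast Read Quad I/O (EBh) with 6 dummy cycles this yields 3 dummy bytes, matching the (6 * 4 / 8) example; with a 3-byte address the flash would expect 6 follow-up bytes in total. Asserting on partial bytes is only one possible policy; how to handle cycle counts that do not fill whole bytes is exactly the point debated in the thread.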