[2/2] spi: Add Baikal-T1 System Boot SPI Controller driver

Message ID	20200508093621.31619-3-Sergey.Semin@baikalelectronics.ru (mailing list archive)
State	Not Applicable
Headers	show Return-Path: <SRS0=/ci8=6W=vger.kernel.org=linux-mips-owner@kernel.org> Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 61FCC81 for <patchwork-linux-mips@patchwork.kernel.org>; Fri, 8 May 2020 09:37:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3E7C7208DB for <patchwork-linux-mips@patchwork.kernel.org>; Fri, 8 May 2020 09:37:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726746AbgEHJhU (ORCPT <rfc822;patchwork-linux-mips@patchwork.kernel.org>); Fri, 8 May 2020 05:37:20 -0400 Received: from mail.baikalelectronics.com ([87.245.175.226]:41224 "EHLO mail.baikalelectronics.ru" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725775AbgEHJhT (ORCPT <rfc822;linux-mips@vger.kernel.org>); Fri, 8 May 2020 05:37:19 -0400 Received: from localhost (unknown [127.0.0.1]) by mail.baikalelectronics.ru (Postfix) with ESMTP id C224F80307C7; Fri, 8 May 2020 09:37:11 +0000 (UTC) X-Virus-Scanned: amavisd-new at baikalelectronics.ru Received: from mail.baikalelectronics.ru ([127.0.0.1]) by localhost (mail.baikalelectronics.ru [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id k1SpiEDPikzy; Fri, 8 May 2020 12:37:09 +0300 (MSK) From: Serge Semin <Sergey.Semin@baikalelectronics.ru> To: Mark Brown <broonie@kernel.org> CC: Serge Semin <Sergey.Semin@baikalelectronics.ru>, Serge Semin <fancer.lancer@gmail.com>, Ramil Zaripov <Ramil.Zaripov@baikalelectronics.ru>, Alexey Malahov <Alexey.Malahov@baikalelectronics.ru>, Thomas Bogendoerfer <tsbogend@alpha.franken.de>, Paul Burton <paulburton@kernel.org>, Ralf Baechle <ralf@linux-mips.org>, Lee Jones <lee.jones@linaro.org>, Miquel Raynal <miquel.raynal@bootlin.com>, Arnd Bergmann <arnd@arndb.de>, Rob Herring <robh+dt@kernel.org>, <linux-mips@vger.kernel.org>, <devicetree@vger.kernel.org>, John Garry <john.garry@huawei.com>, Chuanhong Guo <gch981213@gmail.com>, Andy Shevchenko <andy.shevchenko@gmail.com>, Eddie James <eajames@linux.ibm.com>, Geert Uytterhoeven <geert@linux-m68k.org>, Chris Packham <chris.packham@alliedtelesis.co.nz>, Tomer Maimon <tmaimon77@gmail.com>, Masahisa Kojima <masahisa.kojima@linaro.org>, Krzysztof Kozlowski <krzk@kernel.org>, Florian Fainelli <f.fainelli@gmail.com>, Jassi Brar <jaswinder.singh@linaro.org>, <linux-kernel@vger.kernel.org>, <linux-spi@vger.kernel.org> Subject: [PATCH 2/2] spi: Add Baikal-T1 System Boot SPI Controller driver Date: Fri, 8 May 2020 12:36:21 +0300 Message-ID: <20200508093621.31619-3-Sergey.Semin@baikalelectronics.ru> In-Reply-To: <20200508093621.31619-1-Sergey.Semin@baikalelectronics.ru> References: <20200508093621.31619-1-Sergey.Semin@baikalelectronics.ru> MIME-Version: 1.0 Content-Transfer-Encoding: 7BIT Content-Type: text/plain; charset=US-ASCII X-ClientProxiedBy: MAIL.baikal.int (192.168.51.25) To mail (192.168.51.25) Sender: linux-mips-owner@vger.kernel.org Precedence: bulk List-ID: <linux-mips.vger.kernel.org> X-Mailing-List: linux-mips@vger.kernel.org
Series	spi: Add Baikal-T1 System Boot SPI Controller driver \| expand [0/2] spi: Add Baikal-T1 System Boot SPI Controller driver [1/2] dt-bindings: spi: Add Baikal-T1 System Boot SPI Controller binding [2/2] spi: Add Baikal-T1 System Boot SPI Controller driver

Serge Semin May 8, 2020, 9:36 a.m. UTC

This SPI-controller is a part of the Baikal-T1 System Controller and
is based on the DW APB SSI IP-core, but with very limited resources:
no IRQ, no DMA, only a single native chip-select and just 8 bytes Tx/Rx
FIFO available. In order to provide a transparent initial boot code
execution this controller is also utilized by an vendor-specific block,
which provides an CS0 SPI flash direct mapping interface. Since both
direct mapping and SPI controller normal utilization are mutual exclusive
only a one of these interfaces can be used to access an external SPI
slave device. Taking into account the peculiarities of the controller
registers and physically mapped SPI flash access, very limited resources
and seeing the normal usecase of the controller is to access an external
SPI-nor flash, we decided to create a dedicated SPI driver for it.

The driver provides callbacks for native messages-based SPI interface,
SPI-memory and direct mapping read operations. Due to not having any
asynchronous signaling interface provided by the core we have no choice
but to implement a polling-based data transmission/reception algorithm.
In addition to that in order to bypass the automatic native chip-select
toggle the driver disables the local interrupts during the memory-based
transfers if no complementary GPIO-based chip-select detected in the
platform.

Co-developed-by: Ramil Zaripov <Ramil.Zaripov@baikalelectronics.ru>
Signed-off-by: Ramil Zaripov <Ramil.Zaripov@baikalelectronics.ru>
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Cc: Alexey Malahov <Alexey.Malahov@baikalelectronics.ru>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: linux-mips@vger.kernel.org
Cc: devicetree@vger.kernel.org
---
 drivers/spi/Kconfig       |  13 +
 drivers/spi/Makefile      |   1 +
 drivers/spi/spi-bt1-sys.c | 873 ++++++++++++++++++++++++++++++++++++++
 drivers/spi/spi-bt1-sys.h | 169 ++++++++
 4 files changed, 1056 insertions(+)
 create mode 100644 drivers/spi/spi-bt1-sys.c
 create mode 100644 drivers/spi/spi-bt1-sys.h

Andy Shevchenko May 8, 2020, 10:03 a.m. UTC | #1

On Fri, May 8, 2020 at 12:37 PM Serge Semin
<Sergey.Semin@baikalelectronics.ru> wrote:
>
> This SPI-controller is a part of the Baikal-T1 System Controller and
> is based on the DW APB SSI IP-core, but with very limited resources:
> no IRQ, no DMA, only a single native chip-select and just 8 bytes Tx/Rx
> FIFO available. In order to provide a transparent initial boot code
> execution this controller is also utilized by an vendor-specific block,
> which provides an CS0 SPI flash direct mapping interface. Since both
> direct mapping and SPI controller normal utilization are mutual exclusive
> only a one of these interfaces can be used to access an external SPI
> slave device. Taking into account the peculiarities of the controller
> registers and physically mapped SPI flash access, very limited resources
> and seeing the normal usecase of the controller is to access an external
> SPI-nor flash, we decided to create a dedicated SPI driver for it.

It seems a lot of code.
Why can't you use spi-dw-mmio.c et al.?

> The driver provides callbacks for native messages-based SPI interface,
> SPI-memory and direct mapping read operations. Due to not having any
> asynchronous signaling interface provided by the core we have no choice
> but to implement a polling-based data transmission/reception algorithm.
> In addition to that in order to bypass the automatic native chip-select
> toggle the driver disables the local interrupts during the memory-based
> transfers if no complementary GPIO-based chip-select detected in the
> platform.

Serge Semin May 8, 2020, 10:15 a.m. UTC | #2

On Fri, May 08, 2020 at 01:03:11PM +0300, Andy Shevchenko wrote:
> On Fri, May 8, 2020 at 12:37 PM Serge Semin
> <Sergey.Semin@baikalelectronics.ru> wrote:
> >
> > This SPI-controller is a part of the Baikal-T1 System Controller and
> > is based on the DW APB SSI IP-core, but with very limited resources:
> > no IRQ, no DMA, only a single native chip-select and just 8 bytes Tx/Rx
> > FIFO available. In order to provide a transparent initial boot code
> > execution this controller is also utilized by an vendor-specific block,
> > which provides an CS0 SPI flash direct mapping interface. Since both
> > direct mapping and SPI controller normal utilization are mutual exclusive
> > only a one of these interfaces can be used to access an external SPI
> > slave device. Taking into account the peculiarities of the controller
> > registers and physically mapped SPI flash access, very limited resources
> > and seeing the normal usecase of the controller is to access an external
> > SPI-nor flash, we decided to create a dedicated SPI driver for it.
> 
> It seems a lot of code.
> Why can't you use spi-dw-mmio.c et al.?

I said above why. Even though the registers set is similar It's too specific
to be integrated into the generic DW SSI driver.

-Sergey

> 
> > The driver provides callbacks for native messages-based SPI interface,
> > SPI-memory and direct mapping read operations. Due to not having any
> > asynchronous signaling interface provided by the core we have no choice
> > but to implement a polling-based data transmission/reception algorithm.
> > In addition to that in order to bypass the automatic native chip-select
> > toggle the driver disables the local interrupts during the memory-based
> > transfers if no complementary GPIO-based chip-select detected in the
> > platform.
> 
> 
> 
> -- 
> With Best Regards,
> Andy Shevchenko

Mark Brown May 8, 2020, 10:22 a.m. UTC | #3

On Fri, May 08, 2020 at 01:15:41PM +0300, Serge Semin wrote:
> On Fri, May 08, 2020 at 01:03:11PM +0300, Andy Shevchenko wrote:

> > > slave device. Taking into account the peculiarities of the controller
> > > registers and physically mapped SPI flash access, very limited resources
> > > and seeing the normal usecase of the controller is to access an external
> > > SPI-nor flash, we decided to create a dedicated SPI driver for it.

> > It seems a lot of code.
> > Why can't you use spi-dw-mmio.c et al.?

> I said above why. Even though the registers set is similar It's too specific
> to be integrated into the generic DW SSI driver.

Can you be more specific about the issues?  From what you wrote it
sounded like the main thing was chip select handling.

> > > The driver provides callbacks for native messages-based SPI interface,
> > > SPI-memory and direct mapping read operations. Due to not having any
> > > asynchronous signaling interface provided by the core we have no choice

What do you mean by "asynchronous signaling interface provided by the
core" here?

Andy Shevchenko May 8, 2020, 10:26 a.m. UTC | #4

On Fri, May 8, 2020 at 1:15 PM Serge Semin
<Sergey.Semin@baikalelectronics.ru> wrote:
>
> On Fri, May 08, 2020 at 01:03:11PM +0300, Andy Shevchenko wrote:
> > On Fri, May 8, 2020 at 12:37 PM Serge Semin
> > <Sergey.Semin@baikalelectronics.ru> wrote:
> > >
> > > This SPI-controller is a part of the Baikal-T1 System Controller and
> > > is based on the DW APB SSI IP-core, but with very limited resources:
> > > no IRQ, no DMA, only a single native chip-select and just 8 bytes Tx/Rx
> > > FIFO available. In order to provide a transparent initial boot code
> > > execution this controller is also utilized by an vendor-specific block,
> > > which provides an CS0 SPI flash direct mapping interface. Since both
> > > direct mapping and SPI controller normal utilization are mutual exclusive
> > > only a one of these interfaces can be used to access an external SPI
> > > slave device. Taking into account the peculiarities of the controller
> > > registers and physically mapped SPI flash access, very limited resources
> > > and seeing the normal usecase of the controller is to access an external
> > > SPI-nor flash, we decided to create a dedicated SPI driver for it.
> >
> > It seems a lot of code.
> > Why can't you use spi-dw-mmio.c et al.?
>
> I said above why. Even though the registers set is similar It's too specific
> to be integrated into the generic DW SSI driver.

At least you may do at the beginning is to reuse header spi-dw.h and
put your stuff under
spi-dw-baikal.c or so. Then, look at the spi-dw.c and check what can
be reused (I think a lot).

Mark Brown May 8, 2020, 11:37 a.m. UTC | #5

On Fri, May 08, 2020 at 12:36:21PM +0300, Serge Semin wrote:

Your CC list on this series is *very* wide - are you sure that this is
relevant to everyone you're sending things to?  Kernel developers often
get a lot of mail meaning that extra mail can become a bit overwhelming
and cause things to get lost in the noise.

> This SPI-controller is a part of the Baikal-T1 System Controller and
> is based on the DW APB SSI IP-core, but with very limited resources:
> no IRQ, no DMA, only a single native chip-select and just 8 bytes Tx/Rx
> FIFO available. In order to provide a transparent initial boot code
> execution this controller is also utilized by an vendor-specific block,
> which provides an CS0 SPI flash direct mapping interface. Since both
> direct mapping and SPI controller normal utilization are mutual exclusive
> only a one of these interfaces can be used to access an external SPI
> slave device. Taking into account the peculiarities of the controller
> registers and physically mapped SPI flash access, very limited resources
> and seeing the normal usecase of the controller is to access an external
> SPI-nor flash, we decided to create a dedicated SPI driver for it.

My overall impression reading this is that there could be a lot more
reliance on both generic functionality and as Andy suggested the spi-dw
driver - the headers seem easy for example.  As far as I can tell the
main things this is adding compared to spi-dw are the dirmap code and
the IRQ disabling around the PIO on the FIFO both of which seem like
relatively small additions which it should be possible to accomodate
within spi-dw.  For example the driver could have multiple transfer
functions and pick a different one with the interrupt disabling when
running on this hardware.

> --- /dev/null
> +++ b/drivers/spi/spi-bt1-sys.c
> @@ -0,0 +1,873 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2020 BAIKAL ELECTRONICS, JSC

Please make the entire comment a C++ one so things look a bit cleaner
and more intentional.

> +	writel(BIT(req->cs), bs->regs + BC_SPI_SER);
> +	if (req->cs_gpiod) {
> +		gpiod_set_value_cansleep(req->cs_gpiod,
> +					 !!(bs->cfg.mode & SPI_CS_HIGH));

If you have a GPIO chip select you should just let the core manage it
through cs_gpiod rather than open coding.

> +	} else if (!req->dirmap) {
> +		local_irq_save(bs->cfg.flags);
> +		preempt_disable();
> +	}

This is obviously not great, especially for larger transfers.  It would
be a lot easier to review and manage this if the IRQ disabling was done
around the transfers in a single function rather than separately, that
way everything is next to each other and it's clearer that we're doing
this for the minimum time and didn't miss anything.  Right now it's hard
to tell if everything is joined up.

> +static void bs_update_cfg(struct bt1_spi *bs, struct spi_transfer *xfer)
> +{

This has exactly one caller, perhaps it could just be inlined there?  I
think this applies to some of the other internal functions.

> +static int bs_check_status(struct bt1_spi *bs)
> +{

It's not obvious from the name that this function will busy wait rather
than just reading some status registers.

> +	/*
> +	 * Spin around the BUSY-state bit waiting for the controller to finish
> +	 * all the requested IO-operations.
> +	 */
> +	do {
> +		data = readl(bs->regs + BC_SPI_SR);
> +	} while ((data & BC_SPI_SR_BUSY) || !(data & BC_SPI_SR_TFE));

We could use a timeout or something here in case something goes wrong,
as things stand we could end up spinning for ever.

It's also not clear to me why we only check for over/underrun before we
do the busy wait, especially a RX overrun could happen and cause us to
loose data while things are finishing up.

> +	while (txlen || rxlen) {
> +		cnt = BC_SPI_FIFO_LVL - __raw_readl(bs->regs + BC_SPI_TXFLR);
> +		cnt = min(cnt, txlen);
> +		txlen -= cnt;
> +		while (cnt--) {
> +			data = src ? *src++ : 0xFF;
> +			__raw_writel(data, bs->regs + BC_SPI_DR);
> +		}

It's more typical to write zeros when there's no data.

> +static int bs_exec_mem_op(struct spi_mem *mem,
> +			  const struct spi_mem_op *op)
> +{

It's not clear to me that this hardware actually supports spi_mem in
hardware?  

> +	if (!bs->cfg.cs_gpiod) {
> +		bs_push_bytes(bs, cmd, len);
> +
> +		if (op->data.dir == SPI_MEM_DATA_IN)
> +			ret = bs_pull_bytes(bs, op->data.buf.in,
> +					    op->data.nbytes);
> +	} else {
> +		bs_pp_bytes(bs, cmd, NULL, len);
> +
> +		if (op->data.dir == SPI_MEM_DATA_IN)
> +			bs_pp_bytes(bs, NULL, op->data.buf.in,
> +				    op->data.nbytes);
> +	}

The actual transfers here are done using the same PIO functions as
everything else as far as I can see which makes me wonder what this
spi_mem implementation is adding over the standard SPI flash support.
I've not checked but if we need this to get the dirmap support to kick
in then we should fix this by making the core not have that requirement.

> +static void bs_copy_from_map(void *to, void __iomem *from, size_t len)
> +{
> +	size_t shift, chunk;
> +	u32 data;

This feels like something that is likely to be suitable for a generic
implementation?  Indeed I'd kind of expect that the architecture
memcpy_to/fromio() would try to copy aligned blocks since it's usually
going to perform better (which I assume is why this function is doing
it).

> +	if (shift) {
> +		chunk = min_t(size_t, 4 - shift, len);
> +		data = readl_relaxed(from - shift);
> +		memcpy(to, &data + shift, chunk);

It's a little unclear what we're actually doing though - we read a data
address from the hardware?

> +static int bs_prepare_message(struct spi_controller *ctrl,
> +			      struct spi_message *msg)
> +{
> +	struct bt1_spi *bs = spi_controller_get_devdata(ctrl);
> +	struct spi_transfer *xfer;
> +	struct bt1_spi_cfg req;
> +
> +	/*
> +	 * Currently the driver doesn't support the generic messages-based
> +	 * SPI interface with pure native chip-select signal. This is due
> +	 * to the automatic native chip-select toggle peculiarity described
> +	 * in the comment of the bs_set_cfg() method. Alas we can't use the
> +	 * IRQ-disable-based boost approach here since the native queue-based
> +	 * SPI messages transfer method can sleep waiting for the
> +	 * transfers/CS-change delays.
> +	 */
> +	if (!msg->spi->cs_gpiod)
> +		return -ENOTSUPP;

This seems a bit much - we can validate that the message doesn't contain
any delays (which is going to be the overwhelming majority of devices)
or multiple transfers and reject just those transfers we can't support.
Multiple transfers are an issue with this hardware but you could
implement support in the core for coalescing them into single transfers,
this isn't the only hardware with automatic chip select handling that
can't cope with scatter/gather so it would be a useful thing to have.

> +	/*
> +	 * Collect the controller configuration common for all transfers and
> +	 * specific to the very first one.
> +	 */
> +	xfer = list_first_entry(&msg->transfers, typeof(*xfer), transfer_list);
> +	req.dirmap = false;
> +	req.cs = msg->spi->chip_select;
> +	req.cs_gpiod = msg->spi->cs_gpiod;
> +	req.flags = 0;
> +	req.mode = msg->spi->mode;
> +	req.tmode = BC_SPI_CTRL0_TMOD_TR;
> +	req.ndf = 0;
> +	req.dfs = xfer->bits_per_word;
> +	req.freq = xfer->speed_hz;
> +
> +	bs_set_cfg(bs, &req);

It's not clear to me what the benefit is in this indirection through
req?

> +static int bs_transfer_one(struct spi_controller *ctrl, struct spi_device *spi,
> +			   struct spi_transfer *xfer)
> +{
> +	struct bt1_spi *bs = spi_controller_get_devdata(ctrl);
> +
> +	bs_update_cfg(bs, xfer);
> +
> +	if (bs->cfg.dfs <= 8)
> +		bs_pp_bytes(bs, xfer->tx_buf, xfer->rx_buf, xfer->len);
> +	else
> +		bs_pp_words(bs, xfer->tx_buf, xfer->rx_buf, xfer->len / 2);

This will have issues with transfers with an odd number of bytes won't
it?

> +static struct bt1_spi *bs_create_data(struct platform_device *pdev)
> +{
> +	struct device *dev = &pdev->dev;
> +	struct bt1_spi *bs;

As with some of the other functions this has exactly one caller and
barely any content in it, just inline it.

> +	bs->dev = dev;
> +	bs->pdev = pdev;

Do you really need to separately save these (and is there a context
where you have the driver data but not the device anyway)?

> +#ifdef CONFIG_DEBUG_FS
> +
> +#define BC_SPI_DBGFS_REG(_name, _off)	\
> +{					\
> +	.name = _name,			\
> +	.offset = _off			\
> +}

Perhaps consider regmap_mmio?

> +static int bs_dbgfs_open_regs(struct inode *inode, struct file *file)
> +{
> +	struct bt1_spi *bs = inode->i_private;
> +	int ret;
> +
> +	ret = single_open(file, bs_dbgfs_show_regs, bs);
> +	if (!ret) {
> +		mutex_lock(&bs->ctrl->io_mutex);

Holding a mutex over the entire time that a file is open is not good,
it's also not clear to me what this mutex is supposed to be protecting
given that it's only referenced in these debugfs functions.

> +		writel(BC_CSR_SPI_RDA, bs->regs + BC_CSR);

Whatever this write is doing we never seem to undo it?

Serge Semin May 8, 2020, 3:42 p.m. UTC | #6

On Fri, May 08, 2020 at 11:22:10AM +0100, Mark Brown wrote:
> On Fri, May 08, 2020 at 01:15:41PM +0300, Serge Semin wrote:
> > On Fri, May 08, 2020 at 01:03:11PM +0300, Andy Shevchenko wrote:
> 
> > > > slave device. Taking into account the peculiarities of the controller
> > > > registers and physically mapped SPI flash access, very limited resources
> > > > and seeing the normal usecase of the controller is to access an external
> > > > SPI-nor flash, we decided to create a dedicated SPI driver for it.
> 
> > > It seems a lot of code.
> > > Why can't you use spi-dw-mmio.c et al.?
> 
> > I said above why. Even though the registers set is similar It's too specific
> > to be integrated into the generic DW SSI driver.
> 
> Can you be more specific about the issues?  From what you wrote it
> sounded like the main thing was chip select handling.

I thought it would be obvious from the patch itself. I've thoroughly described all
the issues there. Here in cover-letter it's a summary of the main ones.
Anyway here are all collected at once:

1) Registers mapping. The DW SSI registers are shifted by 0x100 with
respect to the MMIO region start. The lowest 0x100 registers are
responsible for the Baikal-T1 Boot Controller settings. There aren't much
of them there though. Our code is interested only in a flag, which switches
an accessibility of the DW APB SSI registers and direct SPI flash mapping.
And this switchability is a reason of another peculiarity (see the next
item for details).

2) SPI flash direct mapping. SPI flash direct mapping and DW APB SSI registers
are mutual exclusive, so only one of them can be enabled at a time. In
order to use the dirmap we have to switch the RDA bit off in the Boot
Controller setup register. If DW APB SSI registers need to be accessed the
RDA bit should be set. For this reason we have to make sure that dirmap
operations, SPI operations and SPI-mem-ops operations are exclusive, since
some of them need to interact with the DW APB SSI registers, while another
the directly mapped SPI flash MMIO (currently ctlr->io_mutex is responsible
for this).

3) A specific access to MMIO (concerning directly mapped SPI flash MMIO).
The SoC interconnect is designed in a way so we can't use any instruction to
read/write from/to the MMIO space. It has to be done by lw/sw with
dword-aligned address passed. Though in this driver we only use a read
operation from the directly mapped SPI flash memory.

4) No direct handling of the CS. Though this is an issue of all DW SSI
controllers, here with very small FIFO and no DMA/IRQ supported it mandates to
workaround any preemption/interruption during a non-GPIO-CS-based transfer.
For the same reason the driver doesn't support normal spi-messages based
interface if no GPIO-CS supplied. In addition since FIFO is too small and most
of our platforms don't have GPIO-CS support we had to create the SPI-mem-ops
instead of generic SPI-callback.

5) MMIO access race condition. As I described in the in-code comment it's a
very tricky race happening during concurrent access from different cores to the
APB bus. Due to this if SPI interface is working high frequency like
12.5 - 25 MHz and there is some another code working with APB bus on another
core, the SPI data pushing function might not keep up with filling small FIFO
(8 bytes) on time, since the APB bus has got just 50 MHz frequency. Due to
this I had artificially limit the SPI bus frequency if there is more than one
CPU in the system.

6) A single CS. It's normally connected to an external SPI flash so the
driver is equipped with mem-ops and dirmap out of the box. BTW normal
SPI-operations are in fact unneeded on all the platforms currently
equipped with Baikal-T1 because all of them have got SPI flash attached to
this interface, and most likely it will be like in new platforms too.

7) No interrupts. Yes, this is another peculiarity. Since this DW APB core
is a part of the boot controller it just don't need IRQs. Boot controller
is responsible for the code loading from SPI flash. It has got a dedicated
RTL which provide a transparent access to the flash just by reading from a
corresponding MMIO range (see the SPI flash direct mapping described above).

8) No DMA. Yeah, and DMA isn't supported for the same reason.

I am pretty sure I have forgotten something. Anyway it has been much easier
to create a new driver instead of integrating all of these into a generic
one. Integrating something like this in the current DW APB SSI driver would
mean to have it completely overwritten (refactored if you want) which would
bring us to a new driver anyway. I don't think it would be good to
complicate the generic driver with so many peculiar things scattered around
the code with various hooks or ifdef, especially seeing the current code has
already become a bit messy.

> 
> > > > The driver provides callbacks for native messages-based SPI interface,
> > > > SPI-memory and direct mapping read operations. Due to not having any
> > > > asynchronous signaling interface provided by the core we have no choice
> 
> What do you mean by "asynchronous signaling interface provided by the
> core" here?

By asynchronous signalling I meant DMA and IRQ. Non of them provided by the
controller.

-Sergey

Serge Semin May 8, 2020, 4:24 p.m. UTC | #7

On Fri, May 08, 2020 at 01:26:52PM +0300, Andy Shevchenko wrote:
> On Fri, May 8, 2020 at 1:15 PM Serge Semin
> <Sergey.Semin@baikalelectronics.ru> wrote:
> >
> > On Fri, May 08, 2020 at 01:03:11PM +0300, Andy Shevchenko wrote:
> > > On Fri, May 8, 2020 at 12:37 PM Serge Semin
> > > <Sergey.Semin@baikalelectronics.ru> wrote:
> > > >
> > > > This SPI-controller is a part of the Baikal-T1 System Controller and
> > > > is based on the DW APB SSI IP-core, but with very limited resources:
> > > > no IRQ, no DMA, only a single native chip-select and just 8 bytes Tx/Rx
> > > > FIFO available. In order to provide a transparent initial boot code
> > > > execution this controller is also utilized by an vendor-specific block,
> > > > which provides an CS0 SPI flash direct mapping interface. Since both
> > > > direct mapping and SPI controller normal utilization are mutual exclusive
> > > > only a one of these interfaces can be used to access an external SPI
> > > > slave device. Taking into account the peculiarities of the controller
> > > > registers and physically mapped SPI flash access, very limited resources
> > > > and seeing the normal usecase of the controller is to access an external
> > > > SPI-nor flash, we decided to create a dedicated SPI driver for it.
> > >
> > > It seems a lot of code.
> > > Why can't you use spi-dw-mmio.c et al.?
> >
> > I said above why. Even though the registers set is similar It's too specific
> > to be integrated into the generic DW SSI driver.
> 
> At least you may do at the beginning is to reuse header spi-dw.h and
> put your stuff under
> spi-dw-baikal.c or so. Then, look at the spi-dw.c and check what can
> be reused (I think a lot).

Nah, It's not a lot, barely a few things. What is useful is the registers map
(though it has to be shifted and most of the registers are unused), IO operations
(two small read-write methods with questionable design) and possibly a few lines
of config setting procedure.

You think I didn't had in mind to integrate this controller support into
the generic driver or resuse something from there? If it was worth a try I would
have done it.

-Sergey

> 
> -- 
> With Best Regards,
> Andy Shevchenko

Mark Brown May 8, 2020, 5:07 p.m. UTC | #8

On Fri, May 08, 2020 at 06:42:10PM +0300, Serge Semin wrote:
> On Fri, May 08, 2020 at 11:22:10AM +0100, Mark Brown wrote:

> > Can you be more specific about the issues?  From what you wrote it
> > sounded like the main thing was chip select handling.

> I thought it would be obvious from the patch itself. I've thoroughly described all
> the issues there. Here in cover-letter it's a summary of the main ones.

Bear in mind that the patch is a stand alone thing, there's not a copy
of the existing driver sitting with it and the stylistic changes make
comparisons even less obvious.

> 1) Registers mapping. The DW SSI registers are shifted by 0x100 with
> respect to the MMIO region start. The lowest 0x100 registers are
> responsible for the Baikal-T1 Boot Controller settings. There aren't much
> of them there though. Our code is interested only in a flag, which switches
> an accessibility of the DW APB SSI registers and direct SPI flash mapping.
> And this switchability is a reason of another peculiarity (see the next
> item for details).

That seems fairly easy to address, for example with a subdevice or
indirecting through ops for I/O that could add on an offset (what a
subdevice would end up accomplishing really).

> 2) SPI flash direct mapping. SPI flash direct mapping and DW APB SSI registers
> are mutual exclusive, so only one of them can be enabled at a time. In
> order to use the dirmap we have to switch the RDA bit off in the Boot
> Controller setup register. If DW APB SSI registers need to be accessed the
> RDA bit should be set. For this reason we have to make sure that dirmap
> operations, SPI operations and SPI-mem-ops operations are exclusive, since
> some of them need to interact with the DW APB SSI registers, while another

This exclusivity requirement is pretty standard for these flash memory
map controllers, the framework should ensure you don't get any overlap
between memory mapped and regular interactions.

> the directly mapped SPI flash MMIO (currently ctlr->io_mutex is responsible
> for this).

It only seemed to be referenced in the debugfs code?

> 3) A specific access to MMIO (concerning directly mapped SPI flash MMIO).
> The SoC interconnect is designed in a way so we can't use any instruction to
> read/write from/to the MMIO space. It has to be done by lw/sw with
> dword-aligned address passed. Though in this driver we only use a read
> operation from the directly mapped SPI flash memory.

That's a custom IP block for your systems so that'd be a separate
operation no matter what.

> 4) No direct handling of the CS. Though this is an issue of all DW SSI
> controllers, here with very small FIFO and no DMA/IRQ supported it mandates to
> workaround any preemption/interruption during a non-GPIO-CS-based transfer.
> For the same reason the driver doesn't support normal spi-messages based

As I said when reviewing the driver I think all you need here is support
in the core for linearizing messages into a single transfer and then
what you're left with is what should be a fairly small quirk for running
with interrupts disabled if there's no DMA or interrupts.  I'd expect
both bits of that to benefit some other users too, there's definitely
other controllers that have problems with automated chip select
handling but happen to get away with it a lot of the time.

> interface if no GPIO-CS supplied. In addition since FIFO is too small and most
> of our platforms don't have GPIO-CS support we had to create the SPI-mem-ops
> instead of generic SPI-callback.

As I also said in reviewing the driver that's just not a good idea
anyway, there is no way a driver should be open coding things like that
and there are much better ways to support this.  This is only there
because the driver isn't able to cope with normal messages, it's better
to solve that problem and use the generic flash code than to emulate the
generic flash code here.

In both these cases it looks like the majority of the reason the driver
is different is that you're trying to solve problems in the driver
without changing the core, some things are a lot easier to handle
further up the stack.

> 5) MMIO access race condition. As I described in the in-code comment it's a
> very tricky race happening during concurrent access from different cores to the
> APB bus. Due to this if SPI interface is working high frequency like

This looks like it should be a fairly small quirk?

> I am pretty sure I have forgotten something. Anyway it has been much easier
> to create a new driver instead of integrating all of these into a generic
> one. Integrating something like this in the current DW APB SSI driver would
> mean to have it completely overwritten (refactored if you want) which would
> bring us to a new driver anyway. I don't think it would be good to
> complicate the generic driver with so many peculiar things scattered around
> the code with various hooks or ifdef, especially seeing the current code has
> already become a bit messy.

It really does sound like other than the bits I don't think should be
implemented on a per-driver level these differences are relatively
small.

Serge Semin May 10, 2020, 12:20 a.m. UTC | #9

On Fri, May 08, 2020 at 12:37:51PM +0100, Mark Brown wrote:
> On Fri, May 08, 2020 at 12:36:21PM +0300, Serge Semin wrote:
> 
> Your CC list on this series is *very* wide - are you sure that this is
> relevant to everyone you're sending things to?  Kernel developers often
> get a lot of mail meaning that extra mail can become a bit overwhelming
> and cause things to get lost in the noise.

MIPS folks are here since Baikal-T1 is a MIPS-based chip and this patchset
is a part of the campaign of integrating the SoC support into the kernel.
With Lee and Miquel we discussed the dirmap support in the framework of another
patchset. Rob and devicetree-list are CC'ed due to having the bindings tree
updated. Then a series of folks who recently submitted the biggest updates into
the spi subsystems so might have valuable comments for this driver as well. So
yes, I am sure.

> 
> > This SPI-controller is a part of the Baikal-T1 System Controller and
> > is based on the DW APB SSI IP-core, but with very limited resources:
> > no IRQ, no DMA, only a single native chip-select and just 8 bytes Tx/Rx
> > FIFO available. In order to provide a transparent initial boot code
> > execution this controller is also utilized by an vendor-specific block,
> > which provides an CS0 SPI flash direct mapping interface. Since both
> > direct mapping and SPI controller normal utilization are mutual exclusive
> > only a one of these interfaces can be used to access an external SPI
> > slave device. Taking into account the peculiarities of the controller
> > registers and physically mapped SPI flash access, very limited resources
> > and seeing the normal usecase of the controller is to access an external
> > SPI-nor flash, we decided to create a dedicated SPI driver for it.
> 
> My overall impression reading this is that there could be a lot more
> reliance on both generic functionality and as Andy suggested the spi-dw
> driver - the headers seem easy for example.  As far as I can tell the
> main things this is adding compared to spi-dw are the dirmap code and
> the IRQ disabling around the PIO on the FIFO both of which seem like
> relatively small additions which it should be possible to accomodate
> within spi-dw.  For example the driver could have multiple transfer
> functions and pick a different one with the interrupt disabling when
> running on this hardware.

dirmap and IRQs disabling isn't all they have different. Please see my answer to
you in the cover-letter thread and the comments below in this email.

> 
> > --- /dev/null
> > +++ b/drivers/spi/spi-bt1-sys.c
> > @@ -0,0 +1,873 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Copyright (C) 2020 BAIKAL ELECTRONICS, JSC
> 
> Please make the entire comment a C++ one so things look a bit cleaner
> and more intentional.

This has been unexpected. It's first time I see such a requirement.
Anyway ok. I'll fix it. Is it ok to have the C-style comments in the header
file?
* It isn't included by any assembly so from this point of view C++ style
* shall also work there.

> 
> > +	writel(BIT(req->cs), bs->regs + BC_SPI_SER);
> > +	if (req->cs_gpiod) {
> > +		gpiod_set_value_cansleep(req->cs_gpiod,
> > +					 !!(bs->cfg.mode & SPI_CS_HIGH));
> 
> If you have a GPIO chip select you should just let the core manage it
> through cs_gpiod rather than open coding.

Of course I know this, and normally I would have omitted the GPIO manual
assertion (hopefully soon my hands get to merging the AX99100 driver I've
developed some time ago). The thing is that this Baikal-T1 System SSI device
driver has been initially written before commit 05766050d5bd ("spi: spi-mem:
fallback to using transfers when CS gpios are used"). So asserting GPIO CS had
been required to initiate the SPI memory communications seeing the generic
spi_mem_exec_op() doesn't do this. Manual GPIO manipulation is indeed redundant
for the current SPI-mem op execution procedure.

Secondly the message of that commit states "Devices with chip selects driven
via GPIO are not compatible with the spi-mem operations." I find this statement
questionable, because for instance this device supports memory operations with
GPIO-driven CS. Though in current implementation the driver fallback to using normal
push-pull IO mode if GPIO CS is utilized as safer one. But even in this case
it's better than splitting the memory operations up into the transfers, which is
developed in the spi_mem_exec_op() method.

So in this matter my question is: how to modify the SPI-mem interface so the
SPI-memory operations would also work with GPIO driven CS? Some additional flag
might work...

Thirdly what about dirmap operations? If we got a GPIO driven CS then due to
lacking any CS manipulation in spi_mem_dirmap_read() method we wouldn't have
been able to make the direct mapping working without manual setting the GPIO.
So the same question here. How to work this around and justify your requirement?
Until the question is answered and we come up with reasonable solution in order
to have the SPI-mem/dirmap interface working together with GPIO CS support I have
to leave the manual GPIO manipulation.

> 
> > +	} else if (!req->dirmap) {
> > +		local_irq_save(bs->cfg.flags);
> > +		preempt_disable();
> > +	}
> 

> This is obviously not great, especially for larger transfers.

Don't tell me about this. But this is the only way to make the controller
working with all the weird stuff it has got.

> It would
> be a lot easier to review and manage this if the IRQ disabling was done
> around the transfers in a single function rather than separately, that
> way everything is next to each other and it's clearer that we're doing
> this for the minimum time and didn't miss anything.  Right now it's hard
> to tell if everything is joined up.

Hm, this is questionable. If it wasn't joined up I wouldn't have the driver
working. Anyway I see your point and from a different angle of view this might be
better to move this operations to the bs_exec_mem_op() method. But if you insist
on implementing the full SPI-message interface support even for native CS, I'd
leave it here to avoid the code duplication, since the IRQs disabling workaround
should be also used there.

> 
> > +static void bs_update_cfg(struct bt1_spi *bs, struct spi_transfer *xfer)
> > +{
> 
> This has exactly one caller, perhaps it could just be inlined there?  I
> think this applies to some of the other internal functions.

I don't want to inline methods just because they are used in a single place. This
might worsen readability and in this case it will. Currently the transfer_one()
method is consistent with self-explanatory methods: update config, push-pull
bytes/words, check status. This also applies to the rest of the code. I can
consider some improvements/optimizations in this matter though, but embedding
the functions isn't one of them. Moreover the compiler does the static
methods inlining automatically as soon as it sees they are called just once.

> 
> > +static int bs_check_status(struct bt1_spi *bs)
> > +{
> 
> It's not obvious from the name that this function will busy wait rather
> than just reading some status registers.

What do you suggest then? Renaming? Splitting up? If renaming, then what name do
you prefer? Something like bs_check_completion()?

> 
> > +	/*
> > +	 * Spin around the BUSY-state bit waiting for the controller to finish
> > +	 * all the requested IO-operations.
> > +	 */
> > +	do {
> > +		data = readl(bs->regs + BC_SPI_SR);
> > +	} while ((data & BC_SPI_SR_BUSY) || !(data & BC_SPI_SR_TFE));
> 
> We could use a timeout or something here in case something goes wrong,
> as things stand we could end up spinning for ever.

Ok. readl_poll_timeout_atomic() shall do the trick.

> 
> It's also not clear to me why we only check for over/underrun before we
> do the busy wait, especially a RX overrun could happen and cause us to
> loose data while things are finishing up.

In fact this busy-wait loop is relevant for the Tx-only operations. So here we
are waiting for the controller to push the rest of the data from FIFO to the
bus. This is necessary so the next IO operation wouldn't clear the FIFO up by
disabling the controller at the bs_set_cfg() method. The busy-wait loop shall
not cause any overruns (because Rx is turned off). If Tx/Rx overrun happened
it must have happened before and only If either the IO algorithm couldn't keep
up with data being pushed out/pulled in to/from the FIFOs (SPI bus is too fast
while MMIO operations are too slow or for some other case) or I messed
something up with the IO algorithm implementation.

Anyway in order to prevent such question from future code readers I'll move the
busy-wait loop in the beginning of the function.

> 
> > +	while (txlen || rxlen) {
> > +		cnt = BC_SPI_FIFO_LVL - __raw_readl(bs->regs + BC_SPI_TXFLR);
> > +		cnt = min(cnt, txlen);
> > +		txlen -= cnt;
> > +		while (cnt--) {
> > +			data = src ? *src++ : 0xFF;
> > +			__raw_writel(data, bs->regs + BC_SPI_DR);
> > +		}
> 
> It's more typical to write zeros when there's no data.

Ok.

> 
> > +static int bs_exec_mem_op(struct spi_mem *mem,
> > +			  const struct spi_mem_op *op)
> > +{
> 
> It's not clear to me that this hardware actually supports spi_mem in
> hardware?

SPI-mem operations are implemented by means of the EEPROM-read and Tx-only
modes of the controller.

> 
> > +	if (!bs->cfg.cs_gpiod) {
> > +		bs_push_bytes(bs, cmd, len);
> > +
> > +		if (op->data.dir == SPI_MEM_DATA_IN)
> > +			ret = bs_pull_bytes(bs, op->data.buf.in,
> > +					    op->data.nbytes);
> > +	} else {
> > +		bs_pp_bytes(bs, cmd, NULL, len);
> > +
> > +		if (op->data.dir == SPI_MEM_DATA_IN)
> > +			bs_pp_bytes(bs, NULL, op->data.buf.in,
> > +				    op->data.nbytes);
> > +	}
> 

> The actual transfers here are done using the same PIO functions as
> everything else as far as I can see which makes me wonder what this
> spi_mem implementation is adding over the standard SPI flash support.

No, if no GPIO CS is supplied the memory read operation is performed by means of
the EEPROM-read mode of the DW SSI controller (Write command is written to the
Tx FIFO and read data length - to the CTRLR1 register). The memory write
operation utilizes the Tx-only mode (just push data to the Tx FIFO, controller
discards all incoming data). See bs_set_cfg() for details.

If GPIO CS is utilized then we fallback to the normal push-pull methods because
first it's safer this way, second it's still better than handling multiple SPI
transfers.

BTW The EEPROM/Tx-only modes are required to be utilized for the controller on
our SoC in order to support the SPI bus operations with highest frequencies.
Due to relatively slow APB bus performance we have to minimize MMIO operation
per a single byte read/write. This is another reason why we had to create a
dedicated driver for this controller.

> I've not checked but if we need this to get the dirmap support to kick
> in then we should fix this by making the core not have that requirement.

No, dirmap is implemented by means of dedicated sub-block of the boot
controller, which does all of the IO by itself (though the controller config
still has to be accordingly setup including CS or GPIO CS). We just read a
ready SPI-flash data from a dedicated MMIO space.

> 
> > +static void bs_copy_from_map(void *to, void __iomem *from, size_t len)
> > +{
> > +	size_t shift, chunk;
> > +	u32 data;
> 
> This feels like something that is likely to be suitable for a generic
> implementation?  Indeed I'd kind of expect that the architecture
> memcpy_to/fromio() would try to copy aligned blocks since it's usually
> going to perform better (which I assume is why this function is doing
> it).

MIPS memcpy_to/fromio don't do this. They expect the MMIO space can be read in
a same way as normal memory space. It's our controller "feature", that MMIO can
be only accessible by lw/sw instructions. So I had no choice but to implement the
aligned read/write operations by myself here. AFAICS this won't be suitable for
the generic implementation due to the alignment restriction and the controller
setups, which are supposed to be performed before accessing the dirmap MMIO.

> 
> > +	if (shift) {
> > +		chunk = min_t(size_t, 4 - shift, len);
> > +		data = readl_relaxed(from - shift);
> > +		memcpy(to, &data + shift, chunk);
> 
> It's a little unclear what we're actually doing though - we read a data
> address from the hardware?

As I've written in the comment:
 * We split the copying up into the next three stages: unaligned head,
 * aligned body, unaligned tail.
here we do the unaligned head read operation. I read data from an aligned
address, then copy the requested part to the output buffer.

> 
> > +static int bs_prepare_message(struct spi_controller *ctrl,
> > +			      struct spi_message *msg)
> > +{
> > +	struct bt1_spi *bs = spi_controller_get_devdata(ctrl);
> > +	struct spi_transfer *xfer;
> > +	struct bt1_spi_cfg req;
> > +
> > +	/*
> > +	 * Currently the driver doesn't support the generic messages-based
> > +	 * SPI interface with pure native chip-select signal. This is due
> > +	 * to the automatic native chip-select toggle peculiarity described
> > +	 * in the comment of the bs_set_cfg() method. Alas we can't use the
> > +	 * IRQ-disable-based boost approach here since the native queue-based
> > +	 * SPI messages transfer method can sleep waiting for the
> > +	 * transfers/CS-change delays.
> > +	 */
> > +	if (!msg->spi->cs_gpiod)
> > +		return -ENOTSUPP;
> 
> This seems a bit much - we can validate that the message doesn't contain
> any delays (which is going to be the overwhelming majority of devices)
> or multiple transfers and reject just those transfers we can't support.
> Multiple transfers are an issue with this hardware but you could
> implement support in the core for coalescing them into single transfers,
> this isn't the only hardware with automatic chip select handling that
> can't cope with scatter/gather so it would be a useful thing to have.

As I said in the commit message to the binding this controller is nearly always
connected to an external SPI flash (I'd say always, because Baikal-T1 is fully
functional only when initially booted from an SPI flash). At least I don't know
any hardware using this interface differently (I've worked with all devices with
Baikal-T1 SoC on board). In current driver implementation the SPI-mem operations
covers the native CS case while the prepare_message()/transfer_one()/unprepare_message()
are used when GPIO-driven CS is available (due to the commit 05766050d5bd I had
to implement it). So I didn't want to over-complicate the driver just to support
some virtual hardware before it's really necessary.

> 
> > +	/*
> > +	 * Collect the controller configuration common for all transfers and
> > +	 * specific to the very first one.
> > +	 */
> > +	xfer = list_first_entry(&msg->transfers, typeof(*xfer), transfer_list);
> > +	req.dirmap = false;
> > +	req.cs = msg->spi->chip_select;
> > +	req.cs_gpiod = msg->spi->cs_gpiod;
> > +	req.flags = 0;
> > +	req.mode = msg->spi->mode;
> > +	req.tmode = BC_SPI_CTRL0_TMOD_TR;
> > +	req.ndf = 0;
> > +	req.dfs = xfer->bits_per_word;
> > +	req.freq = xfer->speed_hz;
> > +
> > +	bs_set_cfg(bs, &req);
> 
> It's not clear to me what the benefit is in this indirection through
> req?

Config request is useful to collect the controller configs in a single
structure from different objects like normal SPI transfers, SPI-mem
operations or dirmaps. The config structure is then uniformly translated
into the controller setting by the bs_set_cfg() method. 

> 
> > +static int bs_transfer_one(struct spi_controller *ctrl, struct spi_device *spi,
> > +			   struct spi_transfer *xfer)
> > +{
> > +	struct bt1_spi *bs = spi_controller_get_devdata(ctrl);
> > +
> > +	bs_update_cfg(bs, xfer);
> > +
> > +	if (bs->cfg.dfs <= 8)
> > +		bs_pp_bytes(bs, xfer->tx_buf, xfer->rx_buf, xfer->len);
> > +	else
> > +		bs_pp_words(bs, xfer->tx_buf, xfer->rx_buf, xfer->len / 2);
> 
> This will have issues with transfers with an odd number of bytes won't
> it?

No, it won't. Transfers with bits per word greater than 8 should have even
number of bytes seeing the len field is supposed to have the buffers length in
bytes:

 * In-memory data values are always in native CPU byte order, translated
 * from the wire byte order (big-endian except with SPI_LSB_FIRST).  So
 * for example when bits_per_word is sixteen, buffers are 2N bytes long
 * (@len = 2N) and hold N sixteen bit words in CPU byte order.
 *
 * When the word size of the SPI transfer is not a power-of-two multiple
 * of eight bits, those in-memory words include extra bits.  In-memory
 * words are always seen by protocol drivers as right-justified, so the
 * undefined (rx) or unused (tx) bits are always the most significant bits.

The controller expects exactly the same interpretation of right-justified data
when its length isn't not power of 2 including the extra bits.

> 
> > +static struct bt1_spi *bs_create_data(struct platform_device *pdev)
> > +{
> > +	struct device *dev = &pdev->dev;
> > +	struct bt1_spi *bs;
> 
> As with some of the other functions this has exactly one caller and
> barely any content in it, just inline it.

As I said before. I don't see this being necessary.

> 
> > +	bs->dev = dev;
> > +	bs->pdev = pdev;
> 
> Do you really need to separately save these (and is there a context
> where you have the driver data but not the device anyway)?

Even if I don't, having a device pointer around in the private data is a
matter of convenience. It's either referencing the controller/mem pointers
cascade to get it, which may look bulky like &mem->controller->dev and having
an additional argument passed to the sub-functions and in some cases allocating
pointers on the stack; or to prevent all of these complications I just save the
device structure pointer in the private data, so to use it when I need to print
an error or allocate dev-res, etc.

Regarding the platform_device pointer. Well I've defined it in the private data
for the same reason. Though it's used much less than generic device pointer, so
I can discard it.

> 
> > +#ifdef CONFIG_DEBUG_FS
> > +
> > +#define BC_SPI_DBGFS_REG(_name, _off)	\
> > +{					\
> > +	.name = _name,			\
> > +	.offset = _off			\
> > +}
> 
> Perhaps consider regmap_mmio?

For multiple reasons no. First of all I need a direct access to the DR register
and I need to do this as fast as possible due to relatively slow APB
bus. The same concern is regarding the controller configs setting. Secondly
everywhere in the driver I use the normal read/write methods and there is no
need in even a basic regmap-update function. So an additional abstraction like
regmap would be unnecessary complication. Thirdly the access to the registers
is always done from a thread-safe context anyway (except DebugFS read operations).
Fourthly due to the dirmap access the DW SSI registers can be unavailable at the
moment of the dirmap read method invocation, that's why I need a mutual exclusive
control around the operations concerning the registers. Due to this the regmap_mmio
won't work here and I would have to implement my own accessors. There might be
another problems, so using regmap isn't a good idea.

> 
> > +static int bs_dbgfs_open_regs(struct inode *inode, struct file *file)
> > +{
> > +	struct bt1_spi *bs = inode->i_private;
> > +	int ret;
> > +
> > +	ret = single_open(file, bs_dbgfs_show_regs, bs);
> > +	if (!ret) {
> > +		mutex_lock(&bs->ctrl->io_mutex);
> 
> Holding a mutex over the entire time that a file is open is not good,
> it's also not clear to me what this mutex is supposed to be protecting
> given that it's only referenced in these debugfs functions.

Hm, it's io_mutex protecting the SPI IO operation to be thread-safe. See:
spi_mem_access_start()/spi_mem_access_end();
__spi_pump_messages();

By locking the mutex here I make sure the DebugFS registers dump operation won't
interfere the IO operations like dirmap, which make the registers unavailable
for normal MMIO-based access.

> 
> > +		writel(BC_CSR_SPI_RDA, bs->regs + BC_CSR);
> 
> Whatever this write is doing we never seem to undo it?

Here I switch on the DW APB SSI registers to be available for normal
readl/writel accesses. It's not a problem to leave the mode on, since
bs_set_cfg() will do a proper config anyway.

Wow, what a thorough review...

-Sergey

Chris Packham May 10, 2020, 10:17 p.m. UTC | #10

On 10/05/20 12:20 pm, Serge Semin wrote:
> On Fri, May 08, 2020 at 12:37:51PM +0100, Mark Brown wrote:
<snip>
>>> +	writel(BIT(req->cs), bs->regs + BC_SPI_SER);
>>> +	if (req->cs_gpiod) {
>>> +		gpiod_set_value_cansleep(req->cs_gpiod,
>>> +					 !!(bs->cfg.mode & SPI_CS_HIGH));
>> If you have a GPIO chip select you should just let the core manage it
>> through cs_gpiod rather than open coding.
> Of course I know this, and normally I would have omitted the GPIO manual
> assertion (hopefully soon my hands get to merging the AX99100 driver I've
> developed some time ago). The thing is that this Baikal-T1 System SSI device
> driver has been initially written before commit 05766050d5bd ("spi: spi-mem:
> fallback to using transfers when CS gpios are used"). So asserting GPIO CS had
> been required to initiate the SPI memory communications seeing the generic
> spi_mem_exec_op() doesn't do this. Manual GPIO manipulation is indeed redundant
> for the current SPI-mem op execution procedure.
>
> Secondly the message of that commit states "Devices with chip selects driven
> via GPIO are not compatible with the spi-mem operations." I find this statement
> questionable, because for instance this device supports memory operations with
> GPIO-driven CS. Though in current implementation the driver fallback to using normal
> push-pull IO mode if GPIO CS is utilized as safer one. But even in this case
> it's better than splitting the memory operations up into the transfers, which is
> developed in the spi_mem_exec_op() method.
On this specific bit. My use-case for 05766050d5bd was a SPI controller 
that supported direct mem accesses but a hardware design that required a 
GPIO CS. So yes I probably should have qualified it as _some_ devices.
> So in this matter my question is: how to modify the SPI-mem interface so the
> SPI-memory operations would also work with GPIO driven CS? Some additional flag
> might work...

Mark Brown May 11, 2020, 9:25 p.m. UTC | #11

On Sun, May 10, 2020 at 03:20:39AM +0300, Serge Semin wrote:
> On Fri, May 08, 2020 at 12:37:51PM +0100, Mark Brown wrote:

> > Your CC list on this series is *very* wide - are you sure that this is
> > relevant to everyone you're sending things to?  Kernel developers often
> > get a lot of mail meaning that extra mail can become a bit overwhelming
> > and cause things to get lost in the noise.

> MIPS folks are here since Baikal-T1 is a MIPS-based chip and this patchset
> is a part of the campaign of integrating the SoC support into the kernel.
> With Lee and Miquel we discussed the dirmap support in the framework of another
> patchset. Rob and devicetree-list are CC'ed due to having the bindings tree
> updated. Then a series of folks who recently submitted the biggest updates into
> the spi subsystems so might have valuable comments for this driver as well. So
> yes, I am sure.

I think you've got too many people who've been contributing to the
subsystem here at least - it looks like you picked up some people who
recently wrote drivers but haven't been doing a lot of review for
example for example.

> Anyway ok. I'll fix it. Is it ok to have the C-style comments in the header
> file?
> * It isn't included by any assembly so from this point of view C++ style
> * shall also work there.

The SPDX stuff requires C style comments in headers so yes.

> Secondly the message of that commit states "Devices with chip selects driven
> via GPIO are not compatible with the spi-mem operations." I find this statement
> questionable, because for instance this device supports memory operations with
> GPIO-driven CS. Though in current implementation the driver fallback to using normal
> push-pull IO mode if GPIO CS is utilized as safer one. But even in this case
> it's better than splitting the memory operations up into the transfers, which is
> developed in the spi_mem_exec_op() method.

> So in this matter my question is: how to modify the SPI-mem interface so the
> SPI-memory operations would also work with GPIO driven CS? Some additional flag
> might work...

Yes, some flags should work here - the issue was that at least some
controllers may end up trying to do multiple SPI operations for one
spi-mem thing which will break if the chip select doesn't get changed to
correspond with what's going on.

> Thirdly what about dirmap operations? If we got a GPIO driven CS then due to
> lacking any CS manipulation in spi_mem_dirmap_read() method we wouldn't have
> been able to make the direct mapping working without manual setting the GPIO.
> So the same question here. How to work this around and justify your requirement?
> Until the question is answered and we come up with reasonable solution in order
> to have the SPI-mem/dirmap interface working together with GPIO CS support I have
> to leave the manual GPIO manipulation.

If the issue with ensuring that chip select is managed appropriately for
transfers can be resolved then push that stuff up to the framework.  If
not then it's not clear that the open coded version can work well
with GPIO chip selects either.

> > > +static void bs_update_cfg(struct bt1_spi *bs, struct spi_transfer *xfer)
> > > +{

> > This has exactly one caller, perhaps it could just be inlined there?  I
> > think this applies to some of the other internal functions.

> I don't want to inline methods just because they are used in a single place. This
> might worsen readability and in this case it will. Currently the transfer_one()
> method is consistent with self-explanatory methods: update config, push-pull
> bytes/words, check status. This also applies to the rest of the code. I can
> consider some improvements/optimizations in this matter though, but embedding
> the functions isn't one of them. Moreover the compiler does the static
> methods inlining automatically as soon as it sees they are called just once.

One of the things that creating internal APIs like this does is that it
adds another layer of structure to the driver, making it a bit harder to
follow that it's following the usual structures of a Linux driver
(especially noticable here given the open coding).  It creates the
impression of the platform portability layers that people sometimes try
to build.

> > > +static int bs_check_status(struct bt1_spi *bs)
> > > +{

> > It's not obvious from the name that this function will busy wait rather
> > than just reading some status registers.

> What do you suggest then? Renaming? Splitting up? If renaming, then what name do
> you prefer? Something like bs_check_completion()?

wait_for_completion() or something else that mentions that it will wait
for completion rather than just looking to see if the operation has
completed.

> > > +static int bs_exec_mem_op(struct spi_mem *mem,
> > > +			  const struct spi_mem_op *op)
> > > +{

> > It's not clear to me that this hardware actually supports spi_mem in
> > hardware?

> SPI-mem operations are implemented by means of the EEPROM-read and Tx-only
> modes of the controller.

Sure, but those seem like normal SPI-level things rather than cases
where the hardware understands that it has a flash attached and is doing
flash specific things.  A very common case for this stuff is that
controllers have acceleration blocks for read and fall back on normal
SPI for writes and erases, that sounds like what's going on here.

> > > +	if (!bs->cfg.cs_gpiod) {
> > > +		bs_push_bytes(bs, cmd, len);
> > > +
> > > +		if (op->data.dir == SPI_MEM_DATA_IN)
> > > +			ret = bs_pull_bytes(bs, op->data.buf.in,
> > > +					    op->data.nbytes);
> > > +	} else {
> > > +		bs_pp_bytes(bs, cmd, NULL, len);
> > > +
> > > +		if (op->data.dir == SPI_MEM_DATA_IN)
> > > +			bs_pp_bytes(bs, NULL, op->data.buf.in,
> > > +				    op->data.nbytes);
> > > +	}
> > 

> > The actual transfers here are done using the same PIO functions as
> > everything else as far as I can see which makes me wonder what this
> > spi_mem implementation is adding over the standard SPI flash support.

> No, if no GPIO CS is supplied the memory read operation is performed by means of
> the EEPROM-read mode of the DW SSI controller (Write command is written to the
> Tx FIFO and read data length - to the CTRLR1 register). The memory write

So EEPROM read mode is a combined write/read - is that somehow less
performant or something than separate opeations so there's a reason not
to use it except in this special circumstance?

> operation utilizes the Tx-only mode (just push data to the Tx FIFO, controller
> discards all incoming data). See bs_set_cfg() for details.

That's not super illuminating on this issue?

> If GPIO CS is utilized then we fallback to the normal push-pull methods because
> first it's safer this way, second it's still better than handling multiple SPI
> transfers.

> BTW The EEPROM/Tx-only modes are required to be utilized for the controller on
> our SoC in order to support the SPI bus operations with highest frequencies.
> Due to relatively slow APB bus performance we have to minimize MMIO operation
> per a single byte read/write. This is another reason why we had to create a
> dedicated driver for this controller.

If the controller has support for TX only mode that seems like a good
thing to have in the generic driver - for TX only transfers that's less
bus I/O.  The EEPROM mode potentially also, I'm less clear on what
exactly it is but it sounds potentially useful for common cases with
register operations.  Perhaps other systems might have similarly slow
buses and get noticable performance benefits from using these modes.

> > > +static void bs_copy_from_map(void *to, void __iomem *from, size_t len)
> > > +{
> > > +	size_t shift, chunk;
> > > +	u32 data;

> > This feels like something that is likely to be suitable for a generic
> > implementation?  Indeed I'd kind of expect that the architecture
> > memcpy_to/fromio() would try to copy aligned blocks since it's usually
> > going to perform better (which I assume is why this function is doing
> > it).

> MIPS memcpy_to/fromio don't do this. They expect the MMIO space can be read in
> a same way as normal memory space. It's our controller "feature", that MMIO can
> be only accessible by lw/sw instructions. So I had no choice but to implement the
> aligned read/write operations by myself here. AFAICS this won't be suitable for
> the generic implementation due to the alignment restriction and the controller
> setups, which are supposed to be performed before accessing the dirmap MMIO.

You should document this in the driver so nobody comes along and does
the refactoring as a cleanup, it looks like duplicated or redundant code.

> > This seems a bit much - we can validate that the message doesn't contain
> > any delays (which is going to be the overwhelming majority of devices)
> > or multiple transfers and reject just those transfers we can't support.
> > Multiple transfers are an issue with this hardware but you could
> > implement support in the core for coalescing them into single transfers,
> > this isn't the only hardware with automatic chip select handling that
> > can't cope with scatter/gather so it would be a useful thing to have.

> As I said in the commit message to the binding this controller is nearly always
> connected to an external SPI flash (I'd say always, because Baikal-T1 is fully
> functional only when initially booted from an SPI flash). At least I don't know
> any hardware using this interface differently (I've worked with all devices with
> Baikal-T1 SoC on board). In current driver implementation the SPI-mem operations
> covers the native CS case while the prepare_message()/transfer_one()/unprepare_message()
> are used when GPIO-driven CS is available (due to the commit 05766050d5bd I had
> to implement it). So I didn't want to over-complicate the driver just to support
> some virtual hardware before it's really necessary.

Aside from the issue of tempting fate in board designs this circles back
to the open coding of the bits of the spi-mem stuff that aren't really
offloaded by the hardware - the driver would look a lot more standard
(and be closer to spi-dw I guess) if it only implemented the bits that
are offloaded and this interface was available.  Having the normal
SPI message interface would not only support non-flash hardware, it'd
also enable the use of more of the generic SPI flash code like other
similar controllers.

> > > +static int bs_transfer_one(struct spi_controller *ctrl, struct spi_device *spi,
> > > +			   struct spi_transfer *xfer)
> > > +{
> > > +	struct bt1_spi *bs = spi_controller_get_devdata(ctrl);
> > > +
> > > +	bs_update_cfg(bs, xfer);
> > > +
> > > +	if (bs->cfg.dfs <= 8)
> > > +		bs_pp_bytes(bs, xfer->tx_buf, xfer->rx_buf, xfer->len);
> > > +	else
> > > +		bs_pp_words(bs, xfer->tx_buf, xfer->rx_buf, xfer->len / 2);
> > 
> > This will have issues with transfers with an odd number of bytes won't
> > it?

> No, it won't. Transfers with bits per word greater than 8 should have even
> number of bytes seeing the len field is supposed to have the buffers length in
> bytes:

So dfs is bits per word?  This is one of those things where the internal
APIs make things harder to follow.

> > Perhaps consider regmap_mmio?

> For multiple reasons no. First of all I need a direct access to the DR register
> and I need to do this as fast as possible due to relatively slow APB
> bus. The same concern is regarding the controller configs setting. Secondly
> everywhere in the driver I use the normal read/write methods and there is no
> need in even a basic regmap-update function. So an additional abstraction like
> regmap would be unnecessary complication. Thirdly the access to the registers

That comment was more due to writing debug infrastructure than stuff
like update_bits - the main reason regmap-mmio exists is for the debug
and cache features.  If the performance is too marginal to allow any
overhead then fine.  There's nothing stopping you combining regmap and
non-regmap on the same device if you exclude the more sensitive
registers from regmap with the access operations but it's not exactly
nice.

> > > +	if (!ret) {
> > > +		mutex_lock(&bs->ctrl->io_mutex);

> > Holding a mutex over the entire time that a file is open is not good,
> > it's also not clear to me what this mutex is supposed to be protecting
> > given that it's only referenced in these debugfs functions.

> Hm, it's io_mutex protecting the SPI IO operation to be thread-safe. See:
> spi_mem_access_start()/spi_mem_access_end();
> __spi_pump_messages();

> By locking the mutex here I make sure the DebugFS registers dump operation won't
> interfere the IO operations like dirmap, which make the registers unavailable
> for normal MMIO-based access.

Oh, it's the SPI level mutex...  that is a bit icky, and it does let
userspace code block the bus fairly easily which could be problematic
but I guess it's debugfs so meh.

> > > +		writel(BC_CSR_SPI_RDA, bs->regs + BC_CSR);

> > Whatever this write is doing we never seem to undo it?

> Here I switch on the DW APB SSI registers to be available for normal
> readl/writel accesses. It's not a problem to leave the mode on, since
> bs_set_cfg() will do a proper config anyway.

A comment would help here, the fact that it's not undone looks like a
bug.

Serge Semin May 18, 2020, 12:05 a.m. UTC | #12

Mark,
I give up.) I'll try to integrate what I've done here in the generic DW APB SSI
driver framework. Then add some hooks so to support our specific DW APB SSI
controller. The I create a glue-layer for it. My answers to you comments are
below.

On Mon, May 11, 2020 at 10:25:06PM +0100, Mark Brown wrote:
> On Sun, May 10, 2020 at 03:20:39AM +0300, Serge Semin wrote:
> > On Fri, May 08, 2020 at 12:37:51PM +0100, Mark Brown wrote:
> 
> > > Your CC list on this series is *very* wide - are you sure that this is
> > > relevant to everyone you're sending things to?  Kernel developers often
> > > get a lot of mail meaning that extra mail can become a bit overwhelming
> > > and cause things to get lost in the noise.
> 
> > MIPS folks are here since Baikal-T1 is a MIPS-based chip and this patchset
> > is a part of the campaign of integrating the SoC support into the kernel.
> > With Lee and Miquel we discussed the dirmap support in the framework of another
> > patchset. Rob and devicetree-list are CC'ed due to having the bindings tree
> > updated. Then a series of folks who recently submitted the biggest updates into
> > the spi subsystems so might have valuable comments for this driver as well. So
> > yes, I am sure.
> 
> I think you've got too many people who've been contributing to the
> subsystem here at least - it looks like you picked up some people who
> recently wrote drivers but haven't been doing a lot of review for
> example for example.
> 
> > Anyway ok. I'll fix it. Is it ok to have the C-style comments in the header
> > file?
> > * It isn't included by any assembly so from this point of view C++ style
> > * shall also work there.
> 
> The SPDX stuff requires C style comments in headers so yes.
> 
> > Secondly the message of that commit states "Devices with chip selects driven
> > via GPIO are not compatible with the spi-mem operations." I find this statement
> > questionable, because for instance this device supports memory operations with
> > GPIO-driven CS. Though in current implementation the driver fallback to using normal
> > push-pull IO mode if GPIO CS is utilized as safer one. But even in this case
> > it's better than splitting the memory operations up into the transfers, which is
> > developed in the spi_mem_exec_op() method.
> 
> > So in this matter my question is: how to modify the SPI-mem interface so the
> > SPI-memory operations would also work with GPIO driven CS? Some additional flag
> > might work...
> 

> Yes, some flags should work here - the issue was that at least some
> controllers may end up trying to do multiple SPI operations for one
> spi-mem thing which will break if the chip select doesn't get changed to
> correspond with what's going on.

Ok. New SPI flag it is then. It will be something like this:
+ #define SPI_CONTROLLER_FLASH_SS		BIT(6)

Then we have to convert spi_set_cs() method to be non-static in the spi.c
to be used in the spi-mem.c to properly select slaves

Then the spi_mem_access_start() and spi_mem_access_stop() of spi-mem.c should be
updated in the following way:
--- spi_mem_access_start()
+++ spi_mem_access_start()

 	mutex_lock(&ctlr->bus_lock_mutex);
 	mutex_lock(&ctlr->io_mutex);
+
+	if (ctlr->flags & SPI_CONTROLLER_FLASH_SS)
+		spi_set_cs(mem->spi, true);

 	return 0;
 }
--- spi_mem_access_end()
+++ spi_mem_access_end()
 static void spi_mem_access_end(struct spi_mem *mem)
 {
 	struct spi_controller *ctlr = mem->spi->controller;
+
+	if (ctlr->flags & SPI_CONTROLLER_FLASH_SS)
+		spi_set_cs(mem->spi, false);

 	mutex_unlock(&ctlr->io_mutex);
 	mutex_unlock(&ctlr->bus_lock_mutex);
---

Method spi_mem_exec_op() shall also make sure that even if GPIO-driven CS is
utilized the normal mem-op is executed. Something like this should do this:
--- spi_mem_exec_op()
+++ spi_mem_exec_op()
 	if (!spi_mem_internal_supports_op(mem, op))
 		return -ENOTSUPP;

-	if (ctlr->mem_ops && !mem->spi->cs_gpiod) {
+	if (ctlr->mem_ops &&
+	    (!mem->spi->cs_gpiod || (ctlr->flags & SPI_CONTROLLER_FLASH_SS))) {
 		ret = spi_mem_access_start(mem);
 		if (ret)
 			return ret;
---

So, what do you think?

> 
> > Thirdly what about dirmap operations? If we got a GPIO driven CS then due to
> > lacking any CS manipulation in spi_mem_dirmap_read() method we wouldn't have
> > been able to make the direct mapping working without manual setting the GPIO.
> > So the same question here. How to work this around and justify your requirement?
> > Until the question is answered and we come up with reasonable solution in order
> > to have the SPI-mem/dirmap interface working together with GPIO CS support I have
> > to leave the manual GPIO manipulation.
> 
> If the issue with ensuring that chip select is managed appropriately for
> transfers can be resolved then push that stuff up to the framework.  If
> not then it's not clear that the open coded version can work well
> with GPIO chip selects either.

See the comment above.

> 
> > > > +static void bs_update_cfg(struct bt1_spi *bs, struct spi_transfer *xfer)
> > > > +{
> 
> > > This has exactly one caller, perhaps it could just be inlined there?  I
> > > think this applies to some of the other internal functions.
> 
> > I don't want to inline methods just because they are used in a single place. This
> > might worsen readability and in this case it will. Currently the transfer_one()
> > method is consistent with self-explanatory methods: update config, push-pull
> > bytes/words, check status. This also applies to the rest of the code. I can
> > consider some improvements/optimizations in this matter though, but embedding
> > the functions isn't one of them. Moreover the compiler does the static
> > methods inlining automatically as soon as it sees they are called just once.
> 
> One of the things that creating internal APIs like this does is that it
> adds another layer of structure to the driver, making it a bit harder to
> follow that it's following the usual structures of a Linux driver
> (especially noticable here given the open coding).  It creates the
> impression of the platform portability layers that people sometimes try
> to build.

Let's stop this discussion.) Since I am going to integrate this patch into the
generic DW APB SSI driver, I will have to follow that driver design anyway. So
most likely I won't implement a new handy abstraction.

> 
> > > > +static int bs_check_status(struct bt1_spi *bs)
> > > > +{
> 
> > > It's not obvious from the name that this function will busy wait rather
> > > than just reading some status registers.
> 
> > What do you suggest then? Renaming? Splitting up? If renaming, then what name do
> > you prefer? Something like bs_check_completion()?
> 
> wait_for_completion() or something else that mentions that it will wait
> for completion rather than just looking to see if the operation has
> completed.

Ok. If it will be still applicable.

> 
> > > > +static int bs_exec_mem_op(struct spi_mem *mem,
> > > > +			  const struct spi_mem_op *op)
> > > > +{
> 
> > > It's not clear to me that this hardware actually supports spi_mem in
> > > hardware?
> 
> > SPI-mem operations are implemented by means of the EEPROM-read and Tx-only
> > modes of the controller.
> 
> Sure, but those seem like normal SPI-level things rather than cases
> where the hardware understands that it has a flash attached and is doing
> flash specific things.

No, hardware can't detect whether the flash is attached. This must be defined by
the platform, like based on the DT sub-nodes.

> A very common case for this stuff is that
> controllers have acceleration blocks for read and fall back on normal
> SPI for writes and erases, that sounds like what's going on here.

Well, yeah, they do provide some acceleration. EEPROM-read provides automatic
write-cmd-dummy-data-then-read operations. But in this case the only thing we
have to push into the SPI Tx FIFO is command and dummy bytes. The read operation
will be performed automatically. So the only thing the driver has to do is to
keep up with Rx FIFO data pulling out before it's overflown. Similarly Tx-only
mode is used to perform the Flash write/erase commands. The driver has to keep up
with pushing data into the Tx FIFO before it gets emptied and CS is de-asserted.
Using these operations is mandatory in case if there is no GPIO-driven CS is
utilized, since native CS is automatically asserted during the whole EEPROM-read
or Tx-only mode executed.

As I also said such operations are very useful, since minimize IO operations
performed over the APB bus. By doing so we can support higher SPI bus
frequencies.

> 
> > > > +	if (!bs->cfg.cs_gpiod) {
> > > > +		bs_push_bytes(bs, cmd, len);
> > > > +
> > > > +		if (op->data.dir == SPI_MEM_DATA_IN)
> > > > +			ret = bs_pull_bytes(bs, op->data.buf.in,
> > > > +					    op->data.nbytes);
> > > > +	} else {
> > > > +		bs_pp_bytes(bs, cmd, NULL, len);
> > > > +
> > > > +		if (op->data.dir == SPI_MEM_DATA_IN)
> > > > +			bs_pp_bytes(bs, NULL, op->data.buf.in,
> > > > +				    op->data.nbytes);
> > > > +	}
> > > 
> 
> > > The actual transfers here are done using the same PIO functions as
> > > everything else as far as I can see which makes me wonder what this
> > > spi_mem implementation is adding over the standard SPI flash support.
> 
> > No, if no GPIO CS is supplied the memory read operation is performed by means of
> > the EEPROM-read mode of the DW SSI controller (Write command is written to the
> > Tx FIFO and read data length - to the CTRLR1 register). The memory write
> 
> So EEPROM read mode is a combined write/read - is that somehow less
> performant or something than separate opeations so there's a reason not
> to use it except in this special circumstance?

Actually It's more performant (less APB-bus MMIO operations required to perform
the transfers) and safer in case of Native CS-driven slave devices. In case of
GPIO-driven CS EEPROM and Tx-only are still more performant modes, but less safe
than normal synchronous SPI Tx/Rx mode.

I'll explain it one more time to logically continue this discussion. In general
the DW APB SSI controller may be utilized on two types of platform setups:

1) A slave device is selected by the DW APB SSI Native CS lane only.

You know, that Dw APB SSI native chip select is automatically asserted by the
SPI controller during a transfer being executed. As soon as there is no data in
the Tx FIFO to transfer (Tx-only or Tx-Rx modes) or there is no data left to
receive (EEPROM-read or Rx-only mode), the native CS will be automatically
de-asserted.

If we decided not to implement the mem-ops in the driver, then we'd have to:
for flash-write operation: analyze the passed message ahead, merge the Tx-only
operations and atomically execute them after activating Tx-only mode;
for flash-read operation: analyze the passed message ahead, merge Tx-only
operations, combine it with Rx-only tail part and atomically execute them all
after activating the EEPROM-read mode. Well, what we've done here is basically
a reverse operation performed by the spi_exec_mem_op(), which just split the
spi_mem_op structure up into a set of spi_transfer's. Why would we need to do
this, while we could just use the already provided perfectly suitable spi_mem_op
data structure for our needs? Especially seeing our controller supports the
operations like Tx-only and EEPROM-read specifically provided for the cases
of flash-device write/read operations. Isn't that a reason of claiming that it
supports mem-ops?

If we decided to implement the mem-ops in the driver (as it's done in this
patch), then the only thing we'd need to do is to enable the corresponding DW APB
SSI controller mode and push or push/pull spi_mem_op data on time. That's it.
No remapping, no data analyzing, no merging. Yes, the hardware doesn't provide
that much acceleration than normal Flash-capable SPI controllers, but still
it's better than handling standard transfers.

You'd say why remapping, analyzing and merging? Just use the normal SPI
synchronous Tx/Rx mode and push dummy zeros or ignore incoming data. The answer
is performance. Well, this might be good solution if the APB bus was fast enough.
If it's not, then we have to minimize the IO operations with Dw APB SSI
registers. This can be only done by using the EEPROM-read and Tx-only mode utilized. 

Note, all the operations described above MUST be executed in the atomic context,
so the data push/pull algorithms would keep up with keeping the Tx FIFO
non-empty and preventing the Rx FIFO overflown by automatically incoming data.
If either for a reason of IRQ or another task preemption or due to the slow APB
bus we get the Tx FIFO emptied, then the native CS will be de-asserted. If for
the same reason we get the Rx FIFO overflown, then we'll loose data.

2) A slave is selected by GPIO CS.

Well, TBH in this case it's still more performant to use the Tx-only and
EEPROM-read modes for communications (because we need less APB bus activity),
but these modes mandate the IRQs disabling since data is automatically
received. Since in this case the problem with automatic CS de-assertion doesn't
exist (we can manually set and clear GPIO CS), it is safe to use the normal
synchronous SPI Tx/Rx mode and avoid IRQs disabling. Though in case if better
performance is required then mem-ops should be utilized.

Note using Tx-only and EEPROM-read operations is more performant than
normal synchronous SPI Tx-Rx mode because it causes less APB bus activity.
In addition we need less data analyzing and mapping for spi-mem operations.
In our case both of these factors are very important due to slow APB bus.

(See further comment for details of what I can port from this driver to the
generic DW APB SSI core.)

> 
> > operation utilizes the Tx-only mode (just push data to the Tx FIFO, controller
> > discards all incoming data). See bs_set_cfg() for details.
> 
> That's not super illuminating on this issue?
> 
> > If GPIO CS is utilized then we fallback to the normal push-pull methods because
> > first it's safer this way, second it's still better than handling multiple SPI
> > transfers.
> 
> > BTW The EEPROM/Tx-only modes are required to be utilized for the controller on
> > our SoC in order to support the SPI bus operations with highest frequencies.
> > Due to relatively slow APB bus performance we have to minimize MMIO operation
> > per a single byte read/write. This is another reason why we had to create a
> > dedicated driver for this controller.
> 
> If the controller has support for TX only mode that seems like a good
> thing to have in the generic driver - for TX only transfers that's less
> bus I/O.  The EEPROM mode potentially also, I'm less clear on what
> exactly it is but it sounds potentially useful for common cases with
> register operations.  Perhaps other systems might have similarly slow
> buses and get noticable performance benefits from using these modes.

See my comments above. I gave up and will try to merge my driver into the
generic Dw APB SSI framework.

> 
> > > > +static void bs_copy_from_map(void *to, void __iomem *from, size_t len)
> > > > +{
> > > > +	size_t shift, chunk;
> > > > +	u32 data;
> 
> > > This feels like something that is likely to be suitable for a generic
> > > implementation?  Indeed I'd kind of expect that the architecture
> > > memcpy_to/fromio() would try to copy aligned blocks since it's usually
> > > going to perform better (which I assume is why this function is doing
> > > it).
> 
> > MIPS memcpy_to/fromio don't do this. They expect the MMIO space can be read in
> > a same way as normal memory space. It's our controller "feature", that MMIO can
> > be only accessible by lw/sw instructions. So I had no choice but to implement the
> > aligned read/write operations by myself here. AFAICS this won't be suitable for
> > the generic implementation due to the alignment restriction and the controller
> > setups, which are supposed to be performed before accessing the dirmap MMIO.
> 
> You should document this in the driver so nobody comes along and does
> the refactoring as a cleanup, it looks like duplicated or redundant code.

Ok.

> 
> > > This seems a bit much - we can validate that the message doesn't contain
> > > any delays (which is going to be the overwhelming majority of devices)
> > > or multiple transfers and reject just those transfers we can't support.
> > > Multiple transfers are an issue with this hardware but you could
> > > implement support in the core for coalescing them into single transfers,
> > > this isn't the only hardware with automatic chip select handling that
> > > can't cope with scatter/gather so it would be a useful thing to have.
> 
> > As I said in the commit message to the binding this controller is nearly always
> > connected to an external SPI flash (I'd say always, because Baikal-T1 is fully
> > functional only when initially booted from an SPI flash). At least I don't know
> > any hardware using this interface differently (I've worked with all devices with
> > Baikal-T1 SoC on board). In current driver implementation the SPI-mem operations
> > covers the native CS case while the prepare_message()/transfer_one()/unprepare_message()
> > are used when GPIO-driven CS is available (due to the commit 05766050d5bd I had
> > to implement it). So I didn't want to over-complicate the driver just to support
> > some virtual hardware before it's really necessary.
> 
> Aside from the issue of tempting fate in board designs this circles back
> to the open coding of the bits of the spi-mem stuff that aren't really
> offloaded by the hardware - the driver would look a lot more standard
> (and be closer to spi-dw I guess) if it only implemented the bits that
> are offloaded and this interface was available.  Having the normal
> SPI message interface would not only support non-flash hardware, it'd
> also enable the use of more of the generic SPI flash code like other
> similar controllers.

See my comment above, why it's necessary to have the spi-mem implemented for
the case of native CS. If GPIO-driven CS is utilized, then generic (synchronous
SPI Tx/Rx mode) but poll-based (since there is no IRQ provided for our controller)
algorithm can be utilized. If native CS is only available, then the safest way
is to have the atomic context by disabling the interrupts. This can be easily
done for the spi-mem interface, but nearly impossible for the generic
spi_transfer_one_message()-based IO. The only alternative is to implement the
device-specific transfer_one_message() callback, analyze SPI transfers,
merge them up, execute in the atomic context. Needless to say, that such approach
is only applicable for the IRQless controllers. If IRQ is available for the
controller, then using Native CS is still unsafe.

To sum up what approaches are better in what configurations if IRQ-less DW APB
SSI controller is available:
1) Flash/spi-mem + Native CS => EEPROM-read/Tx-only + atomic context - disable IRQs.
2) Flash/spi-mem + GPIO CS   => normal synchronous SPI Tx/Rx mode (currently this
                                will fallback to 4) because spi_mem_exec_op()
                                implementation, probably for good).
3) SPI msg + Native CS => reimplement transfer_one_message() + merge transfers
                          if possible otherwise use normal Tx+Rx mode + atomic
                          context - disabling IRQs during the whole message.
4) SPI msg + GPIO CS   => normal synchronous SPI Tx/Rx mode.

For the Dw APB SSI controller with IRQ the 2) and 4) items are the same. 1) and
3) can't be safely implemented due to possible concurrent IRQs.

Item 4) is already available in the current spi-dw.c driver, although the poll-mode
has been recently removed (I'll have to get it back). From this patch I can port the
items 1) and 2) to the generic version of the DW APB SSI driver. I won't spend
my time for 3) because first I've already spent a lot of it on implementing this
driver, second I don't need such functionality on our platforms. If someone ever
needs it, I'll assist with an advice of how to do this.

So, what do you think?

> 
> > > > +static int bs_transfer_one(struct spi_controller *ctrl, struct spi_device *spi,
> > > > +			   struct spi_transfer *xfer)
> > > > +{
> > > > +	struct bt1_spi *bs = spi_controller_get_devdata(ctrl);
> > > > +
> > > > +	bs_update_cfg(bs, xfer);
> > > > +
> > > > +	if (bs->cfg.dfs <= 8)
> > > > +		bs_pp_bytes(bs, xfer->tx_buf, xfer->rx_buf, xfer->len);
> > > > +	else
> > > > +		bs_pp_words(bs, xfer->tx_buf, xfer->rx_buf, xfer->len / 2);
> > > 
> > > This will have issues with transfers with an odd number of bytes won't
> > > it?
> 
> > No, it won't. Transfers with bits per word greater than 8 should have even
> > number of bytes seeing the len field is supposed to have the buffers length in
> > bytes:
> 
> So dfs is bits per word?  This is one of those things where the internal
> APIs make things harder to follow.
> 
> > > Perhaps consider regmap_mmio?
> 
> > For multiple reasons no. First of all I need a direct access to the DR register
> > and I need to do this as fast as possible due to relatively slow APB
> > bus. The same concern is regarding the controller configs setting. Secondly
> > everywhere in the driver I use the normal read/write methods and there is no
> > need in even a basic regmap-update function. So an additional abstraction like
> > regmap would be unnecessary complication. Thirdly the access to the registers
> 
> That comment was more due to writing debug infrastructure than stuff
> like update_bits - the main reason regmap-mmio exists is for the debug
> and cache features.  If the performance is too marginal to allow any
> overhead then fine.  There's nothing stopping you combining regmap and
> non-regmap on the same device if you exclude the more sensitive
> registers from regmap with the access operations but it's not exactly
> nice.
> 
> > > > +	if (!ret) {
> > > > +		mutex_lock(&bs->ctrl->io_mutex);
> 
> > > Holding a mutex over the entire time that a file is open is not good,
> > > it's also not clear to me what this mutex is supposed to be protecting
> > > given that it's only referenced in these debugfs functions.
> 
> > Hm, it's io_mutex protecting the SPI IO operation to be thread-safe. See:
> > spi_mem_access_start()/spi_mem_access_end();
> > __spi_pump_messages();
> 
> > By locking the mutex here I make sure the DebugFS registers dump operation won't
> > interfere the IO operations like dirmap, which make the registers unavailable
> > for normal MMIO-based access.
> 
> Oh, it's the SPI level mutex...  that is a bit icky, and it does let
> userspace code block the bus fairly easily which could be problematic
> but I guess it's debugfs so meh.

Ok.

> 
> > > > +		writel(BC_CSR_SPI_RDA, bs->regs + BC_CSR);
> 
> > > Whatever this write is doing we never seem to undo it?
> 
> > Here I switch on the DW APB SSI registers to be available for normal
> > readl/writel accesses. It's not a problem to leave the mode on, since
> > bs_set_cfg() will do a proper config anyway.
> 
> A comment would help here, the fact that it's not undone looks like a
> bug.

Ok.

-Sergey

Mark Brown May 18, 2020, 3:19 p.m. UTC | #13

On Mon, May 18, 2020 at 03:05:42AM +0300, Serge Semin wrote:
> On Mon, May 11, 2020 at 10:25:06PM +0100, Mark Brown wrote:

> > Yes, some flags should work here - the issue was that at least some
> > controllers may end up trying to do multiple SPI operations for one
> > spi-mem thing which will break if the chip select doesn't get changed to
> > correspond with what's going on.

> Ok. New SPI flag it is then. It will be something like this:
> + #define SPI_CONTROLLER_FLASH_SS		BIT(6)

I'd rather use CS than SS (it's more common in the code).

> So, what do you think?

Should be fine, controllers that have an issue implementing just
shouldn't set the flag.

> > > > It's not clear to me that this hardware actually supports spi_mem in
> > > > hardware?

> > > SPI-mem operations are implemented by means of the EEPROM-read and Tx-only
> > > modes of the controller.

> > Sure, but those seem like normal SPI-level things rather than cases
> > where the hardware understands that it has a flash attached and is doing
> > flash specific things.

> No, hardware can't detect whether the flash is attached. This must be defined by
> the platform, like based on the DT sub-nodes.

This isn't about autodetection, it's about the abstraction level the
hardware is operating on - some hardware is able to generate flash
operations by itself (possibly with some help programming the opcodes
that are needed by a given flash), some hardware just works at the
bytestream level.

> > A very common case for this stuff is that
> > controllers have acceleration blocks for read and fall back on normal
> > SPI for writes and erases, that sounds like what's going on here.
> 
> Well, yeah, they do provide some acceleration. EEPROM-read provides automatic
> write-cmd-dummy-data-then-read operations. But in this case the only thing we
> have to push into the SPI Tx FIFO is command and dummy bytes. The read operation

So it's a write then read but you have to program the write each time?

Serge Semin May 18, 2020, 9:17 p.m. UTC | #14

On Mon, May 18, 2020 at 04:19:47PM +0100, Mark Brown wrote:
> On Mon, May 18, 2020 at 03:05:42AM +0300, Serge Semin wrote:
> > On Mon, May 11, 2020 at 10:25:06PM +0100, Mark Brown wrote:
> 
> > > Yes, some flags should work here - the issue was that at least some
> > > controllers may end up trying to do multiple SPI operations for one
> > > spi-mem thing which will break if the chip select doesn't get changed to
> > > correspond with what's going on.
> 
> > Ok. New SPI flag it is then. It will be something like this:
> > + #define SPI_CONTROLLER_FLASH_SS		BIT(6)
> 
> I'd rather use CS than SS (it's more common in the code).

Ok.

> 
> > So, what do you think?
> 
> Should be fine, controllers that have an issue implementing just
> shouldn't set the flag.

Yes, exactly what I intended.

> 
> > > > > It's not clear to me that this hardware actually supports spi_mem in
> > > > > hardware?
> 
> > > > SPI-mem operations are implemented by means of the EEPROM-read and Tx-only
> > > > modes of the controller.
> 
> > > Sure, but those seem like normal SPI-level things rather than cases
> > > where the hardware understands that it has a flash attached and is doing
> > > flash specific things.
> 
> > No, hardware can't detect whether the flash is attached. This must be defined by
> > the platform, like based on the DT sub-nodes.
> 
> This isn't about autodetection, it's about the abstraction level the
> hardware is operating on - some hardware is able to generate flash
> operations by itself (possibly with some help programming the opcodes
> that are needed by a given flash), some hardware just works at the
> bytestream level.
> 
> > > A very common case for this stuff is that
> > > controllers have acceleration blocks for read and fall back on normal
> > > SPI for writes and erases, that sounds like what's going on here.
> > 
> > Well, yeah, they do provide some acceleration. EEPROM-read provides automatic
> > write-cmd-dummy-data-then-read operations. But in this case the only thing we
> > have to push into the SPI Tx FIFO is command and dummy bytes. The read operation
> 
> So it's a write then read but you have to program the write each time?

Here is what we need to do to perform the EEPROM-read operation:
1) Enable EEPROM-read mode.
2) Initialize a corresponding registers with a number of SPI transfer words
   (with bits-per-word taken into account) to read.
3) Push opcode + address + dummy bytes into the Tx FIFO. When it's done and
   the Tx FIFO is empty, the controller will proceed with read operations by
   pushing zeros (or ones, don't remember what level it's by default) to MOSI
   and pulling data from MISO into the RX FIFO.
4) Keep up with getting data from the Rx FIFO so one wouldn't get overflown.

Regarding programming write each time. Well, it's up to the driver implementation.
If opcode, address, dummy bytes and number of words to read are the same as before,
then re-programming isn't required.

-Sergey

Mark Brown May 19, 2020, 10:32 a.m. UTC | #15

On Tue, May 19, 2020 at 12:17:27AM +0300, Serge Semin wrote:

> Here is what we need to do to perform the EEPROM-read operation:
> 1) Enable EEPROM-read mode.
> 2) Initialize a corresponding registers with a number of SPI transfer words
>    (with bits-per-word taken into account) to read.
> 3) Push opcode + address + dummy bytes into the Tx FIFO. When it's done and
>    the Tx FIFO is empty, the controller will proceed with read operations by
>    pushing zeros (or ones, don't remember what level it's by default) to MOSI
>    and pulling data from MISO into the RX FIFO.
> 4) Keep up with getting data from the Rx FIFO so one wouldn't get overflown.

> Regarding programming write each time. Well, it's up to the driver implementation.
> If opcode, address, dummy bytes and number of words to read are the same as before,
> then re-programming isn't required.

Ah, nice.  This should be useful for far more than just flash - most
register reads will also be able to take advantage of this, they follow
a similar write then read pattern.

[2/2] spi: Add Baikal-T1 System Boot SPI Controller driver

Commit Message

Comments

Patch