Message ID | 20200508093621.31619-3-Sergey.Semin@baikalelectronics.ru (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | spi: Add Baikal-T1 System Boot SPI Controller driver | expand |
On Fri, May 8, 2020 at 12:37 PM Serge Semin <Sergey.Semin@baikalelectronics.ru> wrote: > > This SPI-controller is a part of the Baikal-T1 System Controller and > is based on the DW APB SSI IP-core, but with very limited resources: > no IRQ, no DMA, only a single native chip-select and just 8 bytes Tx/Rx > FIFO available. In order to provide a transparent initial boot code > execution this controller is also utilized by an vendor-specific block, > which provides an CS0 SPI flash direct mapping interface. Since both > direct mapping and SPI controller normal utilization are mutual exclusive > only a one of these interfaces can be used to access an external SPI > slave device. Taking into account the peculiarities of the controller > registers and physically mapped SPI flash access, very limited resources > and seeing the normal usecase of the controller is to access an external > SPI-nor flash, we decided to create a dedicated SPI driver for it. It seems a lot of code. Why can't you use spi-dw-mmio.c et al.? > The driver provides callbacks for native messages-based SPI interface, > SPI-memory and direct mapping read operations. Due to not having any > asynchronous signaling interface provided by the core we have no choice > but to implement a polling-based data transmission/reception algorithm. > In addition to that in order to bypass the automatic native chip-select > toggle the driver disables the local interrupts during the memory-based > transfers if no complementary GPIO-based chip-select detected in the > platform.
On Fri, May 08, 2020 at 01:03:11PM +0300, Andy Shevchenko wrote: > On Fri, May 8, 2020 at 12:37 PM Serge Semin > <Sergey.Semin@baikalelectronics.ru> wrote: > > > > This SPI-controller is a part of the Baikal-T1 System Controller and > > is based on the DW APB SSI IP-core, but with very limited resources: > > no IRQ, no DMA, only a single native chip-select and just 8 bytes Tx/Rx > > FIFO available. In order to provide a transparent initial boot code > > execution this controller is also utilized by an vendor-specific block, > > which provides an CS0 SPI flash direct mapping interface. Since both > > direct mapping and SPI controller normal utilization are mutual exclusive > > only a one of these interfaces can be used to access an external SPI > > slave device. Taking into account the peculiarities of the controller > > registers and physically mapped SPI flash access, very limited resources > > and seeing the normal usecase of the controller is to access an external > > SPI-nor flash, we decided to create a dedicated SPI driver for it. > > It seems a lot of code. > Why can't you use spi-dw-mmio.c et al.? I said above why. Even though the registers set is similar It's too specific to be integrated into the generic DW SSI driver. -Sergey > > > The driver provides callbacks for native messages-based SPI interface, > > SPI-memory and direct mapping read operations. Due to not having any > > asynchronous signaling interface provided by the core we have no choice > > but to implement a polling-based data transmission/reception algorithm. > > In addition to that in order to bypass the automatic native chip-select > > toggle the driver disables the local interrupts during the memory-based > > transfers if no complementary GPIO-based chip-select detected in the > > platform. > > > > -- > With Best Regards, > Andy Shevchenko
On Fri, May 08, 2020 at 01:15:41PM +0300, Serge Semin wrote: > On Fri, May 08, 2020 at 01:03:11PM +0300, Andy Shevchenko wrote: > > > slave device. Taking into account the peculiarities of the controller > > > registers and physically mapped SPI flash access, very limited resources > > > and seeing the normal usecase of the controller is to access an external > > > SPI-nor flash, we decided to create a dedicated SPI driver for it. > > It seems a lot of code. > > Why can't you use spi-dw-mmio.c et al.? > I said above why. Even though the registers set is similar It's too specific > to be integrated into the generic DW SSI driver. Can you be more specific about the issues? From what you wrote it sounded like the main thing was chip select handling. > > > The driver provides callbacks for native messages-based SPI interface, > > > SPI-memory and direct mapping read operations. Due to not having any > > > asynchronous signaling interface provided by the core we have no choice What do you mean by "asynchronous signaling interface provided by the core" here?
On Fri, May 8, 2020 at 1:15 PM Serge Semin <Sergey.Semin@baikalelectronics.ru> wrote: > > On Fri, May 08, 2020 at 01:03:11PM +0300, Andy Shevchenko wrote: > > On Fri, May 8, 2020 at 12:37 PM Serge Semin > > <Sergey.Semin@baikalelectronics.ru> wrote: > > > > > > This SPI-controller is a part of the Baikal-T1 System Controller and > > > is based on the DW APB SSI IP-core, but with very limited resources: > > > no IRQ, no DMA, only a single native chip-select and just 8 bytes Tx/Rx > > > FIFO available. In order to provide a transparent initial boot code > > > execution this controller is also utilized by an vendor-specific block, > > > which provides an CS0 SPI flash direct mapping interface. Since both > > > direct mapping and SPI controller normal utilization are mutual exclusive > > > only a one of these interfaces can be used to access an external SPI > > > slave device. Taking into account the peculiarities of the controller > > > registers and physically mapped SPI flash access, very limited resources > > > and seeing the normal usecase of the controller is to access an external > > > SPI-nor flash, we decided to create a dedicated SPI driver for it. > > > > It seems a lot of code. > > Why can't you use spi-dw-mmio.c et al.? > > I said above why. Even though the registers set is similar It's too specific > to be integrated into the generic DW SSI driver. At least you may do at the beginning is to reuse header spi-dw.h and put your stuff under spi-dw-baikal.c or so. Then, look at the spi-dw.c and check what can be reused (I think a lot).
On Fri, May 08, 2020 at 12:36:21PM +0300, Serge Semin wrote: Your CC list on this series is *very* wide - are you sure that this is relevant to everyone you're sending things to? Kernel developers often get a lot of mail meaning that extra mail can become a bit overwhelming and cause things to get lost in the noise. > This SPI-controller is a part of the Baikal-T1 System Controller and > is based on the DW APB SSI IP-core, but with very limited resources: > no IRQ, no DMA, only a single native chip-select and just 8 bytes Tx/Rx > FIFO available. In order to provide a transparent initial boot code > execution this controller is also utilized by an vendor-specific block, > which provides an CS0 SPI flash direct mapping interface. Since both > direct mapping and SPI controller normal utilization are mutual exclusive > only a one of these interfaces can be used to access an external SPI > slave device. Taking into account the peculiarities of the controller > registers and physically mapped SPI flash access, very limited resources > and seeing the normal usecase of the controller is to access an external > SPI-nor flash, we decided to create a dedicated SPI driver for it. My overall impression reading this is that there could be a lot more reliance on both generic functionality and as Andy suggested the spi-dw driver - the headers seem easy for example. As far as I can tell the main things this is adding compared to spi-dw are the dirmap code and the IRQ disabling around the PIO on the FIFO both of which seem like relatively small additions which it should be possible to accomodate within spi-dw. For example the driver could have multiple transfer functions and pick a different one with the interrupt disabling when running on this hardware. > --- /dev/null > +++ b/drivers/spi/spi-bt1-sys.c > @@ -0,0 +1,873 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* > + * Copyright (C) 2020 BAIKAL ELECTRONICS, JSC Please make the entire comment a C++ one so things look a bit cleaner and more intentional. > + writel(BIT(req->cs), bs->regs + BC_SPI_SER); > + if (req->cs_gpiod) { > + gpiod_set_value_cansleep(req->cs_gpiod, > + !!(bs->cfg.mode & SPI_CS_HIGH)); If you have a GPIO chip select you should just let the core manage it through cs_gpiod rather than open coding. > + } else if (!req->dirmap) { > + local_irq_save(bs->cfg.flags); > + preempt_disable(); > + } This is obviously not great, especially for larger transfers. It would be a lot easier to review and manage this if the IRQ disabling was done around the transfers in a single function rather than separately, that way everything is next to each other and it's clearer that we're doing this for the minimum time and didn't miss anything. Right now it's hard to tell if everything is joined up. > +static void bs_update_cfg(struct bt1_spi *bs, struct spi_transfer *xfer) > +{ This has exactly one caller, perhaps it could just be inlined there? I think this applies to some of the other internal functions. > +static int bs_check_status(struct bt1_spi *bs) > +{ It's not obvious from the name that this function will busy wait rather than just reading some status registers. > + /* > + * Spin around the BUSY-state bit waiting for the controller to finish > + * all the requested IO-operations. > + */ > + do { > + data = readl(bs->regs + BC_SPI_SR); > + } while ((data & BC_SPI_SR_BUSY) || !(data & BC_SPI_SR_TFE)); We could use a timeout or something here in case something goes wrong, as things stand we could end up spinning for ever. It's also not clear to me why we only check for over/underrun before we do the busy wait, especially a RX overrun could happen and cause us to loose data while things are finishing up. > + while (txlen || rxlen) { > + cnt = BC_SPI_FIFO_LVL - __raw_readl(bs->regs + BC_SPI_TXFLR); > + cnt = min(cnt, txlen); > + txlen -= cnt; > + while (cnt--) { > + data = src ? *src++ : 0xFF; > + __raw_writel(data, bs->regs + BC_SPI_DR); > + } It's more typical to write zeros when there's no data. > +static int bs_exec_mem_op(struct spi_mem *mem, > + const struct spi_mem_op *op) > +{ It's not clear to me that this hardware actually supports spi_mem in hardware? > + if (!bs->cfg.cs_gpiod) { > + bs_push_bytes(bs, cmd, len); > + > + if (op->data.dir == SPI_MEM_DATA_IN) > + ret = bs_pull_bytes(bs, op->data.buf.in, > + op->data.nbytes); > + } else { > + bs_pp_bytes(bs, cmd, NULL, len); > + > + if (op->data.dir == SPI_MEM_DATA_IN) > + bs_pp_bytes(bs, NULL, op->data.buf.in, > + op->data.nbytes); > + } The actual transfers here are done using the same PIO functions as everything else as far as I can see which makes me wonder what this spi_mem implementation is adding over the standard SPI flash support. I've not checked but if we need this to get the dirmap support to kick in then we should fix this by making the core not have that requirement. > +static void bs_copy_from_map(void *to, void __iomem *from, size_t len) > +{ > + size_t shift, chunk; > + u32 data; This feels like something that is likely to be suitable for a generic implementation? Indeed I'd kind of expect that the architecture memcpy_to/fromio() would try to copy aligned blocks since it's usually going to perform better (which I assume is why this function is doing it). > + if (shift) { > + chunk = min_t(size_t, 4 - shift, len); > + data = readl_relaxed(from - shift); > + memcpy(to, &data + shift, chunk); It's a little unclear what we're actually doing though - we read a data address from the hardware? > +static int bs_prepare_message(struct spi_controller *ctrl, > + struct spi_message *msg) > +{ > + struct bt1_spi *bs = spi_controller_get_devdata(ctrl); > + struct spi_transfer *xfer; > + struct bt1_spi_cfg req; > + > + /* > + * Currently the driver doesn't support the generic messages-based > + * SPI interface with pure native chip-select signal. This is due > + * to the automatic native chip-select toggle peculiarity described > + * in the comment of the bs_set_cfg() method. Alas we can't use the > + * IRQ-disable-based boost approach here since the native queue-based > + * SPI messages transfer method can sleep waiting for the > + * transfers/CS-change delays. > + */ > + if (!msg->spi->cs_gpiod) > + return -ENOTSUPP; This seems a bit much - we can validate that the message doesn't contain any delays (which is going to be the overwhelming majority of devices) or multiple transfers and reject just those transfers we can't support. Multiple transfers are an issue with this hardware but you could implement support in the core for coalescing them into single transfers, this isn't the only hardware with automatic chip select handling that can't cope with scatter/gather so it would be a useful thing to have. > + /* > + * Collect the controller configuration common for all transfers and > + * specific to the very first one. > + */ > + xfer = list_first_entry(&msg->transfers, typeof(*xfer), transfer_list); > + req.dirmap = false; > + req.cs = msg->spi->chip_select; > + req.cs_gpiod = msg->spi->cs_gpiod; > + req.flags = 0; > + req.mode = msg->spi->mode; > + req.tmode = BC_SPI_CTRL0_TMOD_TR; > + req.ndf = 0; > + req.dfs = xfer->bits_per_word; > + req.freq = xfer->speed_hz; > + > + bs_set_cfg(bs, &req); It's not clear to me what the benefit is in this indirection through req? > +static int bs_transfer_one(struct spi_controller *ctrl, struct spi_device *spi, > + struct spi_transfer *xfer) > +{ > + struct bt1_spi *bs = spi_controller_get_devdata(ctrl); > + > + bs_update_cfg(bs, xfer); > + > + if (bs->cfg.dfs <= 8) > + bs_pp_bytes(bs, xfer->tx_buf, xfer->rx_buf, xfer->len); > + else > + bs_pp_words(bs, xfer->tx_buf, xfer->rx_buf, xfer->len / 2); This will have issues with transfers with an odd number of bytes won't it? > +static struct bt1_spi *bs_create_data(struct platform_device *pdev) > +{ > + struct device *dev = &pdev->dev; > + struct bt1_spi *bs; As with some of the other functions this has exactly one caller and barely any content in it, just inline it. > + bs->dev = dev; > + bs->pdev = pdev; Do you really need to separately save these (and is there a context where you have the driver data but not the device anyway)? > +#ifdef CONFIG_DEBUG_FS > + > +#define BC_SPI_DBGFS_REG(_name, _off) \ > +{ \ > + .name = _name, \ > + .offset = _off \ > +} Perhaps consider regmap_mmio? > +static int bs_dbgfs_open_regs(struct inode *inode, struct file *file) > +{ > + struct bt1_spi *bs = inode->i_private; > + int ret; > + > + ret = single_open(file, bs_dbgfs_show_regs, bs); > + if (!ret) { > + mutex_lock(&bs->ctrl->io_mutex); Holding a mutex over the entire time that a file is open is not good, it's also not clear to me what this mutex is supposed to be protecting given that it's only referenced in these debugfs functions. > + writel(BC_CSR_SPI_RDA, bs->regs + BC_CSR); Whatever this write is doing we never seem to undo it?
On Fri, May 08, 2020 at 11:22:10AM +0100, Mark Brown wrote: > On Fri, May 08, 2020 at 01:15:41PM +0300, Serge Semin wrote: > > On Fri, May 08, 2020 at 01:03:11PM +0300, Andy Shevchenko wrote: > > > > > slave device. Taking into account the peculiarities of the controller > > > > registers and physically mapped SPI flash access, very limited resources > > > > and seeing the normal usecase of the controller is to access an external > > > > SPI-nor flash, we decided to create a dedicated SPI driver for it. > > > > It seems a lot of code. > > > Why can't you use spi-dw-mmio.c et al.? > > > I said above why. Even though the registers set is similar It's too specific > > to be integrated into the generic DW SSI driver. > > Can you be more specific about the issues? From what you wrote it > sounded like the main thing was chip select handling. I thought it would be obvious from the patch itself. I've thoroughly described all the issues there. Here in cover-letter it's a summary of the main ones. Anyway here are all collected at once: 1) Registers mapping. The DW SSI registers are shifted by 0x100 with respect to the MMIO region start. The lowest 0x100 registers are responsible for the Baikal-T1 Boot Controller settings. There aren't much of them there though. Our code is interested only in a flag, which switches an accessibility of the DW APB SSI registers and direct SPI flash mapping. And this switchability is a reason of another peculiarity (see the next item for details). 2) SPI flash direct mapping. SPI flash direct mapping and DW APB SSI registers are mutual exclusive, so only one of them can be enabled at a time. In order to use the dirmap we have to switch the RDA bit off in the Boot Controller setup register. If DW APB SSI registers need to be accessed the RDA bit should be set. For this reason we have to make sure that dirmap operations, SPI operations and SPI-mem-ops operations are exclusive, since some of them need to interact with the DW APB SSI registers, while another the directly mapped SPI flash MMIO (currently ctlr->io_mutex is responsible for this). 3) A specific access to MMIO (concerning directly mapped SPI flash MMIO). The SoC interconnect is designed in a way so we can't use any instruction to read/write from/to the MMIO space. It has to be done by lw/sw with dword-aligned address passed. Though in this driver we only use a read operation from the directly mapped SPI flash memory. 4) No direct handling of the CS. Though this is an issue of all DW SSI controllers, here with very small FIFO and no DMA/IRQ supported it mandates to workaround any preemption/interruption during a non-GPIO-CS-based transfer. For the same reason the driver doesn't support normal spi-messages based interface if no GPIO-CS supplied. In addition since FIFO is too small and most of our platforms don't have GPIO-CS support we had to create the SPI-mem-ops instead of generic SPI-callback. 5) MMIO access race condition. As I described in the in-code comment it's a very tricky race happening during concurrent access from different cores to the APB bus. Due to this if SPI interface is working high frequency like 12.5 - 25 MHz and there is some another code working with APB bus on another core, the SPI data pushing function might not keep up with filling small FIFO (8 bytes) on time, since the APB bus has got just 50 MHz frequency. Due to this I had artificially limit the SPI bus frequency if there is more than one CPU in the system. 6) A single CS. It's normally connected to an external SPI flash so the driver is equipped with mem-ops and dirmap out of the box. BTW normal SPI-operations are in fact unneeded on all the platforms currently equipped with Baikal-T1 because all of them have got SPI flash attached to this interface, and most likely it will be like in new platforms too. 7) No interrupts. Yes, this is another peculiarity. Since this DW APB core is a part of the boot controller it just don't need IRQs. Boot controller is responsible for the code loading from SPI flash. It has got a dedicated RTL which provide a transparent access to the flash just by reading from a corresponding MMIO range (see the SPI flash direct mapping described above). 8) No DMA. Yeah, and DMA isn't supported for the same reason. I am pretty sure I have forgotten something. Anyway it has been much easier to create a new driver instead of integrating all of these into a generic one. Integrating something like this in the current DW APB SSI driver would mean to have it completely overwritten (refactored if you want) which would bring us to a new driver anyway. I don't think it would be good to complicate the generic driver with so many peculiar things scattered around the code with various hooks or ifdef, especially seeing the current code has already become a bit messy. > > > > > The driver provides callbacks for native messages-based SPI interface, > > > > SPI-memory and direct mapping read operations. Due to not having any > > > > asynchronous signaling interface provided by the core we have no choice > > What do you mean by "asynchronous signaling interface provided by the > core" here? By asynchronous signalling I meant DMA and IRQ. Non of them provided by the controller. -Sergey
On Fri, May 08, 2020 at 01:26:52PM +0300, Andy Shevchenko wrote: > On Fri, May 8, 2020 at 1:15 PM Serge Semin > <Sergey.Semin@baikalelectronics.ru> wrote: > > > > On Fri, May 08, 2020 at 01:03:11PM +0300, Andy Shevchenko wrote: > > > On Fri, May 8, 2020 at 12:37 PM Serge Semin > > > <Sergey.Semin@baikalelectronics.ru> wrote: > > > > > > > > This SPI-controller is a part of the Baikal-T1 System Controller and > > > > is based on the DW APB SSI IP-core, but with very limited resources: > > > > no IRQ, no DMA, only a single native chip-select and just 8 bytes Tx/Rx > > > > FIFO available. In order to provide a transparent initial boot code > > > > execution this controller is also utilized by an vendor-specific block, > > > > which provides an CS0 SPI flash direct mapping interface. Since both > > > > direct mapping and SPI controller normal utilization are mutual exclusive > > > > only a one of these interfaces can be used to access an external SPI > > > > slave device. Taking into account the peculiarities of the controller > > > > registers and physically mapped SPI flash access, very limited resources > > > > and seeing the normal usecase of the controller is to access an external > > > > SPI-nor flash, we decided to create a dedicated SPI driver for it. > > > > > > It seems a lot of code. > > > Why can't you use spi-dw-mmio.c et al.? > > > > I said above why. Even though the registers set is similar It's too specific > > to be integrated into the generic DW SSI driver. > > At least you may do at the beginning is to reuse header spi-dw.h and > put your stuff under > spi-dw-baikal.c or so. Then, look at the spi-dw.c and check what can > be reused (I think a lot). Nah, It's not a lot, barely a few things. What is useful is the registers map (though it has to be shifted and most of the registers are unused), IO operations (two small read-write methods with questionable design) and possibly a few lines of config setting procedure. You think I didn't had in mind to integrate this controller support into the generic driver or resuse something from there? If it was worth a try I would have done it. -Sergey > > -- > With Best Regards, > Andy Shevchenko
On Fri, May 08, 2020 at 06:42:10PM +0300, Serge Semin wrote: > On Fri, May 08, 2020 at 11:22:10AM +0100, Mark Brown wrote: > > Can you be more specific about the issues? From what you wrote it > > sounded like the main thing was chip select handling. > I thought it would be obvious from the patch itself. I've thoroughly described all > the issues there. Here in cover-letter it's a summary of the main ones. Bear in mind that the patch is a stand alone thing, there's not a copy of the existing driver sitting with it and the stylistic changes make comparisons even less obvious. > 1) Registers mapping. The DW SSI registers are shifted by 0x100 with > respect to the MMIO region start. The lowest 0x100 registers are > responsible for the Baikal-T1 Boot Controller settings. There aren't much > of them there though. Our code is interested only in a flag, which switches > an accessibility of the DW APB SSI registers and direct SPI flash mapping. > And this switchability is a reason of another peculiarity (see the next > item for details). That seems fairly easy to address, for example with a subdevice or indirecting through ops for I/O that could add on an offset (what a subdevice would end up accomplishing really). > 2) SPI flash direct mapping. SPI flash direct mapping and DW APB SSI registers > are mutual exclusive, so only one of them can be enabled at a time. In > order to use the dirmap we have to switch the RDA bit off in the Boot > Controller setup register. If DW APB SSI registers need to be accessed the > RDA bit should be set. For this reason we have to make sure that dirmap > operations, SPI operations and SPI-mem-ops operations are exclusive, since > some of them need to interact with the DW APB SSI registers, while another This exclusivity requirement is pretty standard for these flash memory map controllers, the framework should ensure you don't get any overlap between memory mapped and regular interactions. > the directly mapped SPI flash MMIO (currently ctlr->io_mutex is responsible > for this). It only seemed to be referenced in the debugfs code? > 3) A specific access to MMIO (concerning directly mapped SPI flash MMIO). > The SoC interconnect is designed in a way so we can't use any instruction to > read/write from/to the MMIO space. It has to be done by lw/sw with > dword-aligned address passed. Though in this driver we only use a read > operation from the directly mapped SPI flash memory. That's a custom IP block for your systems so that'd be a separate operation no matter what. > 4) No direct handling of the CS. Though this is an issue of all DW SSI > controllers, here with very small FIFO and no DMA/IRQ supported it mandates to > workaround any preemption/interruption during a non-GPIO-CS-based transfer. > For the same reason the driver doesn't support normal spi-messages based As I said when reviewing the driver I think all you need here is support in the core for linearizing messages into a single transfer and then what you're left with is what should be a fairly small quirk for running with interrupts disabled if there's no DMA or interrupts. I'd expect both bits of that to benefit some other users too, there's definitely other controllers that have problems with automated chip select handling but happen to get away with it a lot of the time. > interface if no GPIO-CS supplied. In addition since FIFO is too small and most > of our platforms don't have GPIO-CS support we had to create the SPI-mem-ops > instead of generic SPI-callback. As I also said in reviewing the driver that's just not a good idea anyway, there is no way a driver should be open coding things like that and there are much better ways to support this. This is only there because the driver isn't able to cope with normal messages, it's better to solve that problem and use the generic flash code than to emulate the generic flash code here. In both these cases it looks like the majority of the reason the driver is different is that you're trying to solve problems in the driver without changing the core, some things are a lot easier to handle further up the stack. > 5) MMIO access race condition. As I described in the in-code comment it's a > very tricky race happening during concurrent access from different cores to the > APB bus. Due to this if SPI interface is working high frequency like This looks like it should be a fairly small quirk? > I am pretty sure I have forgotten something. Anyway it has been much easier > to create a new driver instead of integrating all of these into a generic > one. Integrating something like this in the current DW APB SSI driver would > mean to have it completely overwritten (refactored if you want) which would > bring us to a new driver anyway. I don't think it would be good to > complicate the generic driver with so many peculiar things scattered around > the code with various hooks or ifdef, especially seeing the current code has > already become a bit messy. It really does sound like other than the bits I don't think should be implemented on a per-driver level these differences are relatively small.
On Fri, May 08, 2020 at 12:37:51PM +0100, Mark Brown wrote: > On Fri, May 08, 2020 at 12:36:21PM +0300, Serge Semin wrote: > > Your CC list on this series is *very* wide - are you sure that this is > relevant to everyone you're sending things to? Kernel developers often > get a lot of mail meaning that extra mail can become a bit overwhelming > and cause things to get lost in the noise. MIPS folks are here since Baikal-T1 is a MIPS-based chip and this patchset is a part of the campaign of integrating the SoC support into the kernel. With Lee and Miquel we discussed the dirmap support in the framework of another patchset. Rob and devicetree-list are CC'ed due to having the bindings tree updated. Then a series of folks who recently submitted the biggest updates into the spi subsystems so might have valuable comments for this driver as well. So yes, I am sure. > > > This SPI-controller is a part of the Baikal-T1 System Controller and > > is based on the DW APB SSI IP-core, but with very limited resources: > > no IRQ, no DMA, only a single native chip-select and just 8 bytes Tx/Rx > > FIFO available. In order to provide a transparent initial boot code > > execution this controller is also utilized by an vendor-specific block, > > which provides an CS0 SPI flash direct mapping interface. Since both > > direct mapping and SPI controller normal utilization are mutual exclusive > > only a one of these interfaces can be used to access an external SPI > > slave device. Taking into account the peculiarities of the controller > > registers and physically mapped SPI flash access, very limited resources > > and seeing the normal usecase of the controller is to access an external > > SPI-nor flash, we decided to create a dedicated SPI driver for it. > > My overall impression reading this is that there could be a lot more > reliance on both generic functionality and as Andy suggested the spi-dw > driver - the headers seem easy for example. As far as I can tell the > main things this is adding compared to spi-dw are the dirmap code and > the IRQ disabling around the PIO on the FIFO both of which seem like > relatively small additions which it should be possible to accomodate > within spi-dw. For example the driver could have multiple transfer > functions and pick a different one with the interrupt disabling when > running on this hardware. dirmap and IRQs disabling isn't all they have different. Please see my answer to you in the cover-letter thread and the comments below in this email. > > > --- /dev/null > > +++ b/drivers/spi/spi-bt1-sys.c > > @@ -0,0 +1,873 @@ > > +// SPDX-License-Identifier: GPL-2.0-only > > +/* > > + * Copyright (C) 2020 BAIKAL ELECTRONICS, JSC > > Please make the entire comment a C++ one so things look a bit cleaner > and more intentional. This has been unexpected. It's first time I see such a requirement. Anyway ok. I'll fix it. Is it ok to have the C-style comments in the header file? * It isn't included by any assembly so from this point of view C++ style * shall also work there. > > > + writel(BIT(req->cs), bs->regs + BC_SPI_SER); > > + if (req->cs_gpiod) { > > + gpiod_set_value_cansleep(req->cs_gpiod, > > + !!(bs->cfg.mode & SPI_CS_HIGH)); > > If you have a GPIO chip select you should just let the core manage it > through cs_gpiod rather than open coding. Of course I know this, and normally I would have omitted the GPIO manual assertion (hopefully soon my hands get to merging the AX99100 driver I've developed some time ago). The thing is that this Baikal-T1 System SSI device driver has been initially written before commit 05766050d5bd ("spi: spi-mem: fallback to using transfers when CS gpios are used"). So asserting GPIO CS had been required to initiate the SPI memory communications seeing the generic spi_mem_exec_op() doesn't do this. Manual GPIO manipulation is indeed redundant for the current SPI-mem op execution procedure. Secondly the message of that commit states "Devices with chip selects driven via GPIO are not compatible with the spi-mem operations." I find this statement questionable, because for instance this device supports memory operations with GPIO-driven CS. Though in current implementation the driver fallback to using normal push-pull IO mode if GPIO CS is utilized as safer one. But even in this case it's better than splitting the memory operations up into the transfers, which is developed in the spi_mem_exec_op() method. So in this matter my question is: how to modify the SPI-mem interface so the SPI-memory operations would also work with GPIO driven CS? Some additional flag might work... Thirdly what about dirmap operations? If we got a GPIO driven CS then due to lacking any CS manipulation in spi_mem_dirmap_read() method we wouldn't have been able to make the direct mapping working without manual setting the GPIO. So the same question here. How to work this around and justify your requirement? Until the question is answered and we come up with reasonable solution in order to have the SPI-mem/dirmap interface working together with GPIO CS support I have to leave the manual GPIO manipulation. > > > + } else if (!req->dirmap) { > > + local_irq_save(bs->cfg.flags); > > + preempt_disable(); > > + } > > This is obviously not great, especially for larger transfers. Don't tell me about this. But this is the only way to make the controller working with all the weird stuff it has got. > It would > be a lot easier to review and manage this if the IRQ disabling was done > around the transfers in a single function rather than separately, that > way everything is next to each other and it's clearer that we're doing > this for the minimum time and didn't miss anything. Right now it's hard > to tell if everything is joined up. Hm, this is questionable. If it wasn't joined up I wouldn't have the driver working. Anyway I see your point and from a different angle of view this might be better to move this operations to the bs_exec_mem_op() method. But if you insist on implementing the full SPI-message interface support even for native CS, I'd leave it here to avoid the code duplication, since the IRQs disabling workaround should be also used there. > > > +static void bs_update_cfg(struct bt1_spi *bs, struct spi_transfer *xfer) > > +{ > > This has exactly one caller, perhaps it could just be inlined there? I > think this applies to some of the other internal functions. I don't want to inline methods just because they are used in a single place. This might worsen readability and in this case it will. Currently the transfer_one() method is consistent with self-explanatory methods: update config, push-pull bytes/words, check status. This also applies to the rest of the code. I can consider some improvements/optimizations in this matter though, but embedding the functions isn't one of them. Moreover the compiler does the static methods inlining automatically as soon as it sees they are called just once. > > > +static int bs_check_status(struct bt1_spi *bs) > > +{ > > It's not obvious from the name that this function will busy wait rather > than just reading some status registers. What do you suggest then? Renaming? Splitting up? If renaming, then what name do you prefer? Something like bs_check_completion()? > > > + /* > > + * Spin around the BUSY-state bit waiting for the controller to finish > > + * all the requested IO-operations. > > + */ > > + do { > > + data = readl(bs->regs + BC_SPI_SR); > > + } while ((data & BC_SPI_SR_BUSY) || !(data & BC_SPI_SR_TFE)); > > We could use a timeout or something here in case something goes wrong, > as things stand we could end up spinning for ever. Ok. readl_poll_timeout_atomic() shall do the trick. > > It's also not clear to me why we only check for over/underrun before we > do the busy wait, especially a RX overrun could happen and cause us to > loose data while things are finishing up. In fact this busy-wait loop is relevant for the Tx-only operations. So here we are waiting for the controller to push the rest of the data from FIFO to the bus. This is necessary so the next IO operation wouldn't clear the FIFO up by disabling the controller at the bs_set_cfg() method. The busy-wait loop shall not cause any overruns (because Rx is turned off). If Tx/Rx overrun happened it must have happened before and only If either the IO algorithm couldn't keep up with data being pushed out/pulled in to/from the FIFOs (SPI bus is too fast while MMIO operations are too slow or for some other case) or I messed something up with the IO algorithm implementation. Anyway in order to prevent such question from future code readers I'll move the busy-wait loop in the beginning of the function. > > > + while (txlen || rxlen) { > > + cnt = BC_SPI_FIFO_LVL - __raw_readl(bs->regs + BC_SPI_TXFLR); > > + cnt = min(cnt, txlen); > > + txlen -= cnt; > > + while (cnt--) { > > + data = src ? *src++ : 0xFF; > > + __raw_writel(data, bs->regs + BC_SPI_DR); > > + } > > It's more typical to write zeros when there's no data. Ok. > > > +static int bs_exec_mem_op(struct spi_mem *mem, > > + const struct spi_mem_op *op) > > +{ > > It's not clear to me that this hardware actually supports spi_mem in > hardware? SPI-mem operations are implemented by means of the EEPROM-read and Tx-only modes of the controller. > > > + if (!bs->cfg.cs_gpiod) { > > + bs_push_bytes(bs, cmd, len); > > + > > + if (op->data.dir == SPI_MEM_DATA_IN) > > + ret = bs_pull_bytes(bs, op->data.buf.in, > > + op->data.nbytes); > > + } else { > > + bs_pp_bytes(bs, cmd, NULL, len); > > + > > + if (op->data.dir == SPI_MEM_DATA_IN) > > + bs_pp_bytes(bs, NULL, op->data.buf.in, > > + op->data.nbytes); > > + } > > The actual transfers here are done using the same PIO functions as > everything else as far as I can see which makes me wonder what this > spi_mem implementation is adding over the standard SPI flash support. No, if no GPIO CS is supplied the memory read operation is performed by means of the EEPROM-read mode of the DW SSI controller (Write command is written to the Tx FIFO and read data length - to the CTRLR1 register). The memory write operation utilizes the Tx-only mode (just push data to the Tx FIFO, controller discards all incoming data). See bs_set_cfg() for details. If GPIO CS is utilized then we fallback to the normal push-pull methods because first it's safer this way, second it's still better than handling multiple SPI transfers. BTW The EEPROM/Tx-only modes are required to be utilized for the controller on our SoC in order to support the SPI bus operations with highest frequencies. Due to relatively slow APB bus performance we have to minimize MMIO operation per a single byte read/write. This is another reason why we had to create a dedicated driver for this controller. > I've not checked but if we need this to get the dirmap support to kick > in then we should fix this by making the core not have that requirement. No, dirmap is implemented by means of dedicated sub-block of the boot controller, which does all of the IO by itself (though the controller config still has to be accordingly setup including CS or GPIO CS). We just read a ready SPI-flash data from a dedicated MMIO space. > > > +static void bs_copy_from_map(void *to, void __iomem *from, size_t len) > > +{ > > + size_t shift, chunk; > > + u32 data; > > This feels like something that is likely to be suitable for a generic > implementation? Indeed I'd kind of expect that the architecture > memcpy_to/fromio() would try to copy aligned blocks since it's usually > going to perform better (which I assume is why this function is doing > it). MIPS memcpy_to/fromio don't do this. They expect the MMIO space can be read in a same way as normal memory space. It's our controller "feature", that MMIO can be only accessible by lw/sw instructions. So I had no choice but to implement the aligned read/write operations by myself here. AFAICS this won't be suitable for the generic implementation due to the alignment restriction and the controller setups, which are supposed to be performed before accessing the dirmap MMIO. > > > + if (shift) { > > + chunk = min_t(size_t, 4 - shift, len); > > + data = readl_relaxed(from - shift); > > + memcpy(to, &data + shift, chunk); > > It's a little unclear what we're actually doing though - we read a data > address from the hardware? As I've written in the comment: * We split the copying up into the next three stages: unaligned head, * aligned body, unaligned tail. here we do the unaligned head read operation. I read data from an aligned address, then copy the requested part to the output buffer. > > > +static int bs_prepare_message(struct spi_controller *ctrl, > > + struct spi_message *msg) > > +{ > > + struct bt1_spi *bs = spi_controller_get_devdata(ctrl); > > + struct spi_transfer *xfer; > > + struct bt1_spi_cfg req; > > + > > + /* > > + * Currently the driver doesn't support the generic messages-based > > + * SPI interface with pure native chip-select signal. This is due > > + * to the automatic native chip-select toggle peculiarity described > > + * in the comment of the bs_set_cfg() method. Alas we can't use the > > + * IRQ-disable-based boost approach here since the native queue-based > > + * SPI messages transfer method can sleep waiting for the > > + * transfers/CS-change delays. > > + */ > > + if (!msg->spi->cs_gpiod) > > + return -ENOTSUPP; > > This seems a bit much - we can validate that the message doesn't contain > any delays (which is going to be the overwhelming majority of devices) > or multiple transfers and reject just those transfers we can't support. > Multiple transfers are an issue with this hardware but you could > implement support in the core for coalescing them into single transfers, > this isn't the only hardware with automatic chip select handling that > can't cope with scatter/gather so it would be a useful thing to have. As I said in the commit message to the binding this controller is nearly always connected to an external SPI flash (I'd say always, because Baikal-T1 is fully functional only when initially booted from an SPI flash). At least I don't know any hardware using this interface differently (I've worked with all devices with Baikal-T1 SoC on board). In current driver implementation the SPI-mem operations covers the native CS case while the prepare_message()/transfer_one()/unprepare_message() are used when GPIO-driven CS is available (due to the commit 05766050d5bd I had to implement it). So I didn't want to over-complicate the driver just to support some virtual hardware before it's really necessary. > > > + /* > > + * Collect the controller configuration common for all transfers and > > + * specific to the very first one. > > + */ > > + xfer = list_first_entry(&msg->transfers, typeof(*xfer), transfer_list); > > + req.dirmap = false; > > + req.cs = msg->spi->chip_select; > > + req.cs_gpiod = msg->spi->cs_gpiod; > > + req.flags = 0; > > + req.mode = msg->spi->mode; > > + req.tmode = BC_SPI_CTRL0_TMOD_TR; > > + req.ndf = 0; > > + req.dfs = xfer->bits_per_word; > > + req.freq = xfer->speed_hz; > > + > > + bs_set_cfg(bs, &req); > > It's not clear to me what the benefit is in this indirection through > req? Config request is useful to collect the controller configs in a single structure from different objects like normal SPI transfers, SPI-mem operations or dirmaps. The config structure is then uniformly translated into the controller setting by the bs_set_cfg() method. > > > +static int bs_transfer_one(struct spi_controller *ctrl, struct spi_device *spi, > > + struct spi_transfer *xfer) > > +{ > > + struct bt1_spi *bs = spi_controller_get_devdata(ctrl); > > + > > + bs_update_cfg(bs, xfer); > > + > > + if (bs->cfg.dfs <= 8) > > + bs_pp_bytes(bs, xfer->tx_buf, xfer->rx_buf, xfer->len); > > + else > > + bs_pp_words(bs, xfer->tx_buf, xfer->rx_buf, xfer->len / 2); > > This will have issues with transfers with an odd number of bytes won't > it? No, it won't. Transfers with bits per word greater than 8 should have even number of bytes seeing the len field is supposed to have the buffers length in bytes: * In-memory data values are always in native CPU byte order, translated * from the wire byte order (big-endian except with SPI_LSB_FIRST). So * for example when bits_per_word is sixteen, buffers are 2N bytes long * (@len = 2N) and hold N sixteen bit words in CPU byte order. * * When the word size of the SPI transfer is not a power-of-two multiple * of eight bits, those in-memory words include extra bits. In-memory * words are always seen by protocol drivers as right-justified, so the * undefined (rx) or unused (tx) bits are always the most significant bits. The controller expects exactly the same interpretation of right-justified data when its length isn't not power of 2 including the extra bits. > > > +static struct bt1_spi *bs_create_data(struct platform_device *pdev) > > +{ > > + struct device *dev = &pdev->dev; > > + struct bt1_spi *bs; > > As with some of the other functions this has exactly one caller and > barely any content in it, just inline it. As I said before. I don't see this being necessary. > > > + bs->dev = dev; > > + bs->pdev = pdev; > > Do you really need to separately save these (and is there a context > where you have the driver data but not the device anyway)? Even if I don't, having a device pointer around in the private data is a matter of convenience. It's either referencing the controller/mem pointers cascade to get it, which may look bulky like &mem->controller->dev and having an additional argument passed to the sub-functions and in some cases allocating pointers on the stack; or to prevent all of these complications I just save the device structure pointer in the private data, so to use it when I need to print an error or allocate dev-res, etc. Regarding the platform_device pointer. Well I've defined it in the private data for the same reason. Though it's used much less than generic device pointer, so I can discard it. > > > +#ifdef CONFIG_DEBUG_FS > > + > > +#define BC_SPI_DBGFS_REG(_name, _off) \ > > +{ \ > > + .name = _name, \ > > + .offset = _off \ > > +} > > Perhaps consider regmap_mmio? For multiple reasons no. First of all I need a direct access to the DR register and I need to do this as fast as possible due to relatively slow APB bus. The same concern is regarding the controller configs setting. Secondly everywhere in the driver I use the normal read/write methods and there is no need in even a basic regmap-update function. So an additional abstraction like regmap would be unnecessary complication. Thirdly the access to the registers is always done from a thread-safe context anyway (except DebugFS read operations). Fourthly due to the dirmap access the DW SSI registers can be unavailable at the moment of the dirmap read method invocation, that's why I need a mutual exclusive control around the operations concerning the registers. Due to this the regmap_mmio won't work here and I would have to implement my own accessors. There might be another problems, so using regmap isn't a good idea. > > > +static int bs_dbgfs_open_regs(struct inode *inode, struct file *file) > > +{ > > + struct bt1_spi *bs = inode->i_private; > > + int ret; > > + > > + ret = single_open(file, bs_dbgfs_show_regs, bs); > > + if (!ret) { > > + mutex_lock(&bs->ctrl->io_mutex); > > Holding a mutex over the entire time that a file is open is not good, > it's also not clear to me what this mutex is supposed to be protecting > given that it's only referenced in these debugfs functions. Hm, it's io_mutex protecting the SPI IO operation to be thread-safe. See: spi_mem_access_start()/spi_mem_access_end(); __spi_pump_messages(); By locking the mutex here I make sure the DebugFS registers dump operation won't interfere the IO operations like dirmap, which make the registers unavailable for normal MMIO-based access. > > > + writel(BC_CSR_SPI_RDA, bs->regs + BC_CSR); > > Whatever this write is doing we never seem to undo it? Here I switch on the DW APB SSI registers to be available for normal readl/writel accesses. It's not a problem to leave the mode on, since bs_set_cfg() will do a proper config anyway. Wow, what a thorough review... -Sergey
On 10/05/20 12:20 pm, Serge Semin wrote: > On Fri, May 08, 2020 at 12:37:51PM +0100, Mark Brown wrote: <snip> >>> + writel(BIT(req->cs), bs->regs + BC_SPI_SER); >>> + if (req->cs_gpiod) { >>> + gpiod_set_value_cansleep(req->cs_gpiod, >>> + !!(bs->cfg.mode & SPI_CS_HIGH)); >> If you have a GPIO chip select you should just let the core manage it >> through cs_gpiod rather than open coding. > Of course I know this, and normally I would have omitted the GPIO manual > assertion (hopefully soon my hands get to merging the AX99100 driver I've > developed some time ago). The thing is that this Baikal-T1 System SSI device > driver has been initially written before commit 05766050d5bd ("spi: spi-mem: > fallback to using transfers when CS gpios are used"). So asserting GPIO CS had > been required to initiate the SPI memory communications seeing the generic > spi_mem_exec_op() doesn't do this. Manual GPIO manipulation is indeed redundant > for the current SPI-mem op execution procedure. > > Secondly the message of that commit states "Devices with chip selects driven > via GPIO are not compatible with the spi-mem operations." I find this statement > questionable, because for instance this device supports memory operations with > GPIO-driven CS. Though in current implementation the driver fallback to using normal > push-pull IO mode if GPIO CS is utilized as safer one. But even in this case > it's better than splitting the memory operations up into the transfers, which is > developed in the spi_mem_exec_op() method. On this specific bit. My use-case for 05766050d5bd was a SPI controller that supported direct mem accesses but a hardware design that required a GPIO CS. So yes I probably should have qualified it as _some_ devices. > So in this matter my question is: how to modify the SPI-mem interface so the > SPI-memory operations would also work with GPIO driven CS? Some additional flag > might work...
On Sun, May 10, 2020 at 03:20:39AM +0300, Serge Semin wrote: > On Fri, May 08, 2020 at 12:37:51PM +0100, Mark Brown wrote: > > Your CC list on this series is *very* wide - are you sure that this is > > relevant to everyone you're sending things to? Kernel developers often > > get a lot of mail meaning that extra mail can become a bit overwhelming > > and cause things to get lost in the noise. > MIPS folks are here since Baikal-T1 is a MIPS-based chip and this patchset > is a part of the campaign of integrating the SoC support into the kernel. > With Lee and Miquel we discussed the dirmap support in the framework of another > patchset. Rob and devicetree-list are CC'ed due to having the bindings tree > updated. Then a series of folks who recently submitted the biggest updates into > the spi subsystems so might have valuable comments for this driver as well. So > yes, I am sure. I think you've got too many people who've been contributing to the subsystem here at least - it looks like you picked up some people who recently wrote drivers but haven't been doing a lot of review for example for example. > Anyway ok. I'll fix it. Is it ok to have the C-style comments in the header > file? > * It isn't included by any assembly so from this point of view C++ style > * shall also work there. The SPDX stuff requires C style comments in headers so yes. > Secondly the message of that commit states "Devices with chip selects driven > via GPIO are not compatible with the spi-mem operations." I find this statement > questionable, because for instance this device supports memory operations with > GPIO-driven CS. Though in current implementation the driver fallback to using normal > push-pull IO mode if GPIO CS is utilized as safer one. But even in this case > it's better than splitting the memory operations up into the transfers, which is > developed in the spi_mem_exec_op() method. > So in this matter my question is: how to modify the SPI-mem interface so the > SPI-memory operations would also work with GPIO driven CS? Some additional flag > might work... Yes, some flags should work here - the issue was that at least some controllers may end up trying to do multiple SPI operations for one spi-mem thing which will break if the chip select doesn't get changed to correspond with what's going on. > Thirdly what about dirmap operations? If we got a GPIO driven CS then due to > lacking any CS manipulation in spi_mem_dirmap_read() method we wouldn't have > been able to make the direct mapping working without manual setting the GPIO. > So the same question here. How to work this around and justify your requirement? > Until the question is answered and we come up with reasonable solution in order > to have the SPI-mem/dirmap interface working together with GPIO CS support I have > to leave the manual GPIO manipulation. If the issue with ensuring that chip select is managed appropriately for transfers can be resolved then push that stuff up to the framework. If not then it's not clear that the open coded version can work well with GPIO chip selects either. > > > +static void bs_update_cfg(struct bt1_spi *bs, struct spi_transfer *xfer) > > > +{ > > This has exactly one caller, perhaps it could just be inlined there? I > > think this applies to some of the other internal functions. > I don't want to inline methods just because they are used in a single place. This > might worsen readability and in this case it will. Currently the transfer_one() > method is consistent with self-explanatory methods: update config, push-pull > bytes/words, check status. This also applies to the rest of the code. I can > consider some improvements/optimizations in this matter though, but embedding > the functions isn't one of them. Moreover the compiler does the static > methods inlining automatically as soon as it sees they are called just once. One of the things that creating internal APIs like this does is that it adds another layer of structure to the driver, making it a bit harder to follow that it's following the usual structures of a Linux driver (especially noticable here given the open coding). It creates the impression of the platform portability layers that people sometimes try to build. > > > +static int bs_check_status(struct bt1_spi *bs) > > > +{ > > It's not obvious from the name that this function will busy wait rather > > than just reading some status registers. > What do you suggest then? Renaming? Splitting up? If renaming, then what name do > you prefer? Something like bs_check_completion()? wait_for_completion() or something else that mentions that it will wait for completion rather than just looking to see if the operation has completed. > > > +static int bs_exec_mem_op(struct spi_mem *mem, > > > + const struct spi_mem_op *op) > > > +{ > > It's not clear to me that this hardware actually supports spi_mem in > > hardware? > SPI-mem operations are implemented by means of the EEPROM-read and Tx-only > modes of the controller. Sure, but those seem like normal SPI-level things rather than cases where the hardware understands that it has a flash attached and is doing flash specific things. A very common case for this stuff is that controllers have acceleration blocks for read and fall back on normal SPI for writes and erases, that sounds like what's going on here. > > > + if (!bs->cfg.cs_gpiod) { > > > + bs_push_bytes(bs, cmd, len); > > > + > > > + if (op->data.dir == SPI_MEM_DATA_IN) > > > + ret = bs_pull_bytes(bs, op->data.buf.in, > > > + op->data.nbytes); > > > + } else { > > > + bs_pp_bytes(bs, cmd, NULL, len); > > > + > > > + if (op->data.dir == SPI_MEM_DATA_IN) > > > + bs_pp_bytes(bs, NULL, op->data.buf.in, > > > + op->data.nbytes); > > > + } > > > > The actual transfers here are done using the same PIO functions as > > everything else as far as I can see which makes me wonder what this > > spi_mem implementation is adding over the standard SPI flash support. > No, if no GPIO CS is supplied the memory read operation is performed by means of > the EEPROM-read mode of the DW SSI controller (Write command is written to the > Tx FIFO and read data length - to the CTRLR1 register). The memory write So EEPROM read mode is a combined write/read - is that somehow less performant or something than separate opeations so there's a reason not to use it except in this special circumstance? > operation utilizes the Tx-only mode (just push data to the Tx FIFO, controller > discards all incoming data). See bs_set_cfg() for details. That's not super illuminating on this issue? > If GPIO CS is utilized then we fallback to the normal push-pull methods because > first it's safer this way, second it's still better than handling multiple SPI > transfers. > BTW The EEPROM/Tx-only modes are required to be utilized for the controller on > our SoC in order to support the SPI bus operations with highest frequencies. > Due to relatively slow APB bus performance we have to minimize MMIO operation > per a single byte read/write. This is another reason why we had to create a > dedicated driver for this controller. If the controller has support for TX only mode that seems like a good thing to have in the generic driver - for TX only transfers that's less bus I/O. The EEPROM mode potentially also, I'm less clear on what exactly it is but it sounds potentially useful for common cases with register operations. Perhaps other systems might have similarly slow buses and get noticable performance benefits from using these modes. > > > +static void bs_copy_from_map(void *to, void __iomem *from, size_t len) > > > +{ > > > + size_t shift, chunk; > > > + u32 data; > > This feels like something that is likely to be suitable for a generic > > implementation? Indeed I'd kind of expect that the architecture > > memcpy_to/fromio() would try to copy aligned blocks since it's usually > > going to perform better (which I assume is why this function is doing > > it). > MIPS memcpy_to/fromio don't do this. They expect the MMIO space can be read in > a same way as normal memory space. It's our controller "feature", that MMIO can > be only accessible by lw/sw instructions. So I had no choice but to implement the > aligned read/write operations by myself here. AFAICS this won't be suitable for > the generic implementation due to the alignment restriction and the controller > setups, which are supposed to be performed before accessing the dirmap MMIO. You should document this in the driver so nobody comes along and does the refactoring as a cleanup, it looks like duplicated or redundant code. > > This seems a bit much - we can validate that the message doesn't contain > > any delays (which is going to be the overwhelming majority of devices) > > or multiple transfers and reject just those transfers we can't support. > > Multiple transfers are an issue with this hardware but you could > > implement support in the core for coalescing them into single transfers, > > this isn't the only hardware with automatic chip select handling that > > can't cope with scatter/gather so it would be a useful thing to have. > As I said in the commit message to the binding this controller is nearly always > connected to an external SPI flash (I'd say always, because Baikal-T1 is fully > functional only when initially booted from an SPI flash). At least I don't know > any hardware using this interface differently (I've worked with all devices with > Baikal-T1 SoC on board). In current driver implementation the SPI-mem operations > covers the native CS case while the prepare_message()/transfer_one()/unprepare_message() > are used when GPIO-driven CS is available (due to the commit 05766050d5bd I had > to implement it). So I didn't want to over-complicate the driver just to support > some virtual hardware before it's really necessary. Aside from the issue of tempting fate in board designs this circles back to the open coding of the bits of the spi-mem stuff that aren't really offloaded by the hardware - the driver would look a lot more standard (and be closer to spi-dw I guess) if it only implemented the bits that are offloaded and this interface was available. Having the normal SPI message interface would not only support non-flash hardware, it'd also enable the use of more of the generic SPI flash code like other similar controllers. > > > +static int bs_transfer_one(struct spi_controller *ctrl, struct spi_device *spi, > > > + struct spi_transfer *xfer) > > > +{ > > > + struct bt1_spi *bs = spi_controller_get_devdata(ctrl); > > > + > > > + bs_update_cfg(bs, xfer); > > > + > > > + if (bs->cfg.dfs <= 8) > > > + bs_pp_bytes(bs, xfer->tx_buf, xfer->rx_buf, xfer->len); > > > + else > > > + bs_pp_words(bs, xfer->tx_buf, xfer->rx_buf, xfer->len / 2); > > > > This will have issues with transfers with an odd number of bytes won't > > it? > No, it won't. Transfers with bits per word greater than 8 should have even > number of bytes seeing the len field is supposed to have the buffers length in > bytes: So dfs is bits per word? This is one of those things where the internal APIs make things harder to follow. > > Perhaps consider regmap_mmio? > For multiple reasons no. First of all I need a direct access to the DR register > and I need to do this as fast as possible due to relatively slow APB > bus. The same concern is regarding the controller configs setting. Secondly > everywhere in the driver I use the normal read/write methods and there is no > need in even a basic regmap-update function. So an additional abstraction like > regmap would be unnecessary complication. Thirdly the access to the registers That comment was more due to writing debug infrastructure than stuff like update_bits - the main reason regmap-mmio exists is for the debug and cache features. If the performance is too marginal to allow any overhead then fine. There's nothing stopping you combining regmap and non-regmap on the same device if you exclude the more sensitive registers from regmap with the access operations but it's not exactly nice. > > > + if (!ret) { > > > + mutex_lock(&bs->ctrl->io_mutex); > > Holding a mutex over the entire time that a file is open is not good, > > it's also not clear to me what this mutex is supposed to be protecting > > given that it's only referenced in these debugfs functions. > Hm, it's io_mutex protecting the SPI IO operation to be thread-safe. See: > spi_mem_access_start()/spi_mem_access_end(); > __spi_pump_messages(); > By locking the mutex here I make sure the DebugFS registers dump operation won't > interfere the IO operations like dirmap, which make the registers unavailable > for normal MMIO-based access. Oh, it's the SPI level mutex... that is a bit icky, and it does let userspace code block the bus fairly easily which could be problematic but I guess it's debugfs so meh. > > > + writel(BC_CSR_SPI_RDA, bs->regs + BC_CSR); > > Whatever this write is doing we never seem to undo it? > Here I switch on the DW APB SSI registers to be available for normal > readl/writel accesses. It's not a problem to leave the mode on, since > bs_set_cfg() will do a proper config anyway. A comment would help here, the fact that it's not undone looks like a bug.
Mark, I give up.) I'll try to integrate what I've done here in the generic DW APB SSI driver framework. Then add some hooks so to support our specific DW APB SSI controller. The I create a glue-layer for it. My answers to you comments are below. On Mon, May 11, 2020 at 10:25:06PM +0100, Mark Brown wrote: > On Sun, May 10, 2020 at 03:20:39AM +0300, Serge Semin wrote: > > On Fri, May 08, 2020 at 12:37:51PM +0100, Mark Brown wrote: > > > > Your CC list on this series is *very* wide - are you sure that this is > > > relevant to everyone you're sending things to? Kernel developers often > > > get a lot of mail meaning that extra mail can become a bit overwhelming > > > and cause things to get lost in the noise. > > > MIPS folks are here since Baikal-T1 is a MIPS-based chip and this patchset > > is a part of the campaign of integrating the SoC support into the kernel. > > With Lee and Miquel we discussed the dirmap support in the framework of another > > patchset. Rob and devicetree-list are CC'ed due to having the bindings tree > > updated. Then a series of folks who recently submitted the biggest updates into > > the spi subsystems so might have valuable comments for this driver as well. So > > yes, I am sure. > > I think you've got too many people who've been contributing to the > subsystem here at least - it looks like you picked up some people who > recently wrote drivers but haven't been doing a lot of review for > example for example. > > > Anyway ok. I'll fix it. Is it ok to have the C-style comments in the header > > file? > > * It isn't included by any assembly so from this point of view C++ style > > * shall also work there. > > The SPDX stuff requires C style comments in headers so yes. > > > Secondly the message of that commit states "Devices with chip selects driven > > via GPIO are not compatible with the spi-mem operations." I find this statement > > questionable, because for instance this device supports memory operations with > > GPIO-driven CS. Though in current implementation the driver fallback to using normal > > push-pull IO mode if GPIO CS is utilized as safer one. But even in this case > > it's better than splitting the memory operations up into the transfers, which is > > developed in the spi_mem_exec_op() method. > > > So in this matter my question is: how to modify the SPI-mem interface so the > > SPI-memory operations would also work with GPIO driven CS? Some additional flag > > might work... > > Yes, some flags should work here - the issue was that at least some > controllers may end up trying to do multiple SPI operations for one > spi-mem thing which will break if the chip select doesn't get changed to > correspond with what's going on. Ok. New SPI flag it is then. It will be something like this: + #define SPI_CONTROLLER_FLASH_SS BIT(6) Then we have to convert spi_set_cs() method to be non-static in the spi.c to be used in the spi-mem.c to properly select slaves Then the spi_mem_access_start() and spi_mem_access_stop() of spi-mem.c should be updated in the following way: --- spi_mem_access_start() +++ spi_mem_access_start() mutex_lock(&ctlr->bus_lock_mutex); mutex_lock(&ctlr->io_mutex); + + if (ctlr->flags & SPI_CONTROLLER_FLASH_SS) + spi_set_cs(mem->spi, true); return 0; } --- spi_mem_access_end() +++ spi_mem_access_end() static void spi_mem_access_end(struct spi_mem *mem) { struct spi_controller *ctlr = mem->spi->controller; + + if (ctlr->flags & SPI_CONTROLLER_FLASH_SS) + spi_set_cs(mem->spi, false); mutex_unlock(&ctlr->io_mutex); mutex_unlock(&ctlr->bus_lock_mutex); --- Method spi_mem_exec_op() shall also make sure that even if GPIO-driven CS is utilized the normal mem-op is executed. Something like this should do this: --- spi_mem_exec_op() +++ spi_mem_exec_op() if (!spi_mem_internal_supports_op(mem, op)) return -ENOTSUPP; - if (ctlr->mem_ops && !mem->spi->cs_gpiod) { + if (ctlr->mem_ops && + (!mem->spi->cs_gpiod || (ctlr->flags & SPI_CONTROLLER_FLASH_SS))) { ret = spi_mem_access_start(mem); if (ret) return ret; --- So, what do you think? > > > Thirdly what about dirmap operations? If we got a GPIO driven CS then due to > > lacking any CS manipulation in spi_mem_dirmap_read() method we wouldn't have > > been able to make the direct mapping working without manual setting the GPIO. > > So the same question here. How to work this around and justify your requirement? > > Until the question is answered and we come up with reasonable solution in order > > to have the SPI-mem/dirmap interface working together with GPIO CS support I have > > to leave the manual GPIO manipulation. > > If the issue with ensuring that chip select is managed appropriately for > transfers can be resolved then push that stuff up to the framework. If > not then it's not clear that the open coded version can work well > with GPIO chip selects either. See the comment above. > > > > > +static void bs_update_cfg(struct bt1_spi *bs, struct spi_transfer *xfer) > > > > +{ > > > > This has exactly one caller, perhaps it could just be inlined there? I > > > think this applies to some of the other internal functions. > > > I don't want to inline methods just because they are used in a single place. This > > might worsen readability and in this case it will. Currently the transfer_one() > > method is consistent with self-explanatory methods: update config, push-pull > > bytes/words, check status. This also applies to the rest of the code. I can > > consider some improvements/optimizations in this matter though, but embedding > > the functions isn't one of them. Moreover the compiler does the static > > methods inlining automatically as soon as it sees they are called just once. > > One of the things that creating internal APIs like this does is that it > adds another layer of structure to the driver, making it a bit harder to > follow that it's following the usual structures of a Linux driver > (especially noticable here given the open coding). It creates the > impression of the platform portability layers that people sometimes try > to build. Let's stop this discussion.) Since I am going to integrate this patch into the generic DW APB SSI driver, I will have to follow that driver design anyway. So most likely I won't implement a new handy abstraction. > > > > > +static int bs_check_status(struct bt1_spi *bs) > > > > +{ > > > > It's not obvious from the name that this function will busy wait rather > > > than just reading some status registers. > > > What do you suggest then? Renaming? Splitting up? If renaming, then what name do > > you prefer? Something like bs_check_completion()? > > wait_for_completion() or something else that mentions that it will wait > for completion rather than just looking to see if the operation has > completed. Ok. If it will be still applicable. > > > > > +static int bs_exec_mem_op(struct spi_mem *mem, > > > > + const struct spi_mem_op *op) > > > > +{ > > > > It's not clear to me that this hardware actually supports spi_mem in > > > hardware? > > > SPI-mem operations are implemented by means of the EEPROM-read and Tx-only > > modes of the controller. > > Sure, but those seem like normal SPI-level things rather than cases > where the hardware understands that it has a flash attached and is doing > flash specific things. No, hardware can't detect whether the flash is attached. This must be defined by the platform, like based on the DT sub-nodes. > A very common case for this stuff is that > controllers have acceleration blocks for read and fall back on normal > SPI for writes and erases, that sounds like what's going on here. Well, yeah, they do provide some acceleration. EEPROM-read provides automatic write-cmd-dummy-data-then-read operations. But in this case the only thing we have to push into the SPI Tx FIFO is command and dummy bytes. The read operation will be performed automatically. So the only thing the driver has to do is to keep up with Rx FIFO data pulling out before it's overflown. Similarly Tx-only mode is used to perform the Flash write/erase commands. The driver has to keep up with pushing data into the Tx FIFO before it gets emptied and CS is de-asserted. Using these operations is mandatory in case if there is no GPIO-driven CS is utilized, since native CS is automatically asserted during the whole EEPROM-read or Tx-only mode executed. As I also said such operations are very useful, since minimize IO operations performed over the APB bus. By doing so we can support higher SPI bus frequencies. > > > > > + if (!bs->cfg.cs_gpiod) { > > > > + bs_push_bytes(bs, cmd, len); > > > > + > > > > + if (op->data.dir == SPI_MEM_DATA_IN) > > > > + ret = bs_pull_bytes(bs, op->data.buf.in, > > > > + op->data.nbytes); > > > > + } else { > > > > + bs_pp_bytes(bs, cmd, NULL, len); > > > > + > > > > + if (op->data.dir == SPI_MEM_DATA_IN) > > > > + bs_pp_bytes(bs, NULL, op->data.buf.in, > > > > + op->data.nbytes); > > > > + } > > > > > > > The actual transfers here are done using the same PIO functions as > > > everything else as far as I can see which makes me wonder what this > > > spi_mem implementation is adding over the standard SPI flash support. > > > No, if no GPIO CS is supplied the memory read operation is performed by means of > > the EEPROM-read mode of the DW SSI controller (Write command is written to the > > Tx FIFO and read data length - to the CTRLR1 register). The memory write > > So EEPROM read mode is a combined write/read - is that somehow less > performant or something than separate opeations so there's a reason not > to use it except in this special circumstance? Actually It's more performant (less APB-bus MMIO operations required to perform the transfers) and safer in case of Native CS-driven slave devices. In case of GPIO-driven CS EEPROM and Tx-only are still more performant modes, but less safe than normal synchronous SPI Tx/Rx mode. I'll explain it one more time to logically continue this discussion. In general the DW APB SSI controller may be utilized on two types of platform setups: 1) A slave device is selected by the DW APB SSI Native CS lane only. You know, that Dw APB SSI native chip select is automatically asserted by the SPI controller during a transfer being executed. As soon as there is no data in the Tx FIFO to transfer (Tx-only or Tx-Rx modes) or there is no data left to receive (EEPROM-read or Rx-only mode), the native CS will be automatically de-asserted. If we decided not to implement the mem-ops in the driver, then we'd have to: for flash-write operation: analyze the passed message ahead, merge the Tx-only operations and atomically execute them after activating Tx-only mode; for flash-read operation: analyze the passed message ahead, merge Tx-only operations, combine it with Rx-only tail part and atomically execute them all after activating the EEPROM-read mode. Well, what we've done here is basically a reverse operation performed by the spi_exec_mem_op(), which just split the spi_mem_op structure up into a set of spi_transfer's. Why would we need to do this, while we could just use the already provided perfectly suitable spi_mem_op data structure for our needs? Especially seeing our controller supports the operations like Tx-only and EEPROM-read specifically provided for the cases of flash-device write/read operations. Isn't that a reason of claiming that it supports mem-ops? If we decided to implement the mem-ops in the driver (as it's done in this patch), then the only thing we'd need to do is to enable the corresponding DW APB SSI controller mode and push or push/pull spi_mem_op data on time. That's it. No remapping, no data analyzing, no merging. Yes, the hardware doesn't provide that much acceleration than normal Flash-capable SPI controllers, but still it's better than handling standard transfers. You'd say why remapping, analyzing and merging? Just use the normal SPI synchronous Tx/Rx mode and push dummy zeros or ignore incoming data. The answer is performance. Well, this might be good solution if the APB bus was fast enough. If it's not, then we have to minimize the IO operations with Dw APB SSI registers. This can be only done by using the EEPROM-read and Tx-only mode utilized. Note, all the operations described above MUST be executed in the atomic context, so the data push/pull algorithms would keep up with keeping the Tx FIFO non-empty and preventing the Rx FIFO overflown by automatically incoming data. If either for a reason of IRQ or another task preemption or due to the slow APB bus we get the Tx FIFO emptied, then the native CS will be de-asserted. If for the same reason we get the Rx FIFO overflown, then we'll loose data. 2) A slave is selected by GPIO CS. Well, TBH in this case it's still more performant to use the Tx-only and EEPROM-read modes for communications (because we need less APB bus activity), but these modes mandate the IRQs disabling since data is automatically received. Since in this case the problem with automatic CS de-assertion doesn't exist (we can manually set and clear GPIO CS), it is safe to use the normal synchronous SPI Tx/Rx mode and avoid IRQs disabling. Though in case if better performance is required then mem-ops should be utilized. Note using Tx-only and EEPROM-read operations is more performant than normal synchronous SPI Tx-Rx mode because it causes less APB bus activity. In addition we need less data analyzing and mapping for spi-mem operations. In our case both of these factors are very important due to slow APB bus. (See further comment for details of what I can port from this driver to the generic DW APB SSI core.) > > > operation utilizes the Tx-only mode (just push data to the Tx FIFO, controller > > discards all incoming data). See bs_set_cfg() for details. > > That's not super illuminating on this issue? > > > If GPIO CS is utilized then we fallback to the normal push-pull methods because > > first it's safer this way, second it's still better than handling multiple SPI > > transfers. > > > BTW The EEPROM/Tx-only modes are required to be utilized for the controller on > > our SoC in order to support the SPI bus operations with highest frequencies. > > Due to relatively slow APB bus performance we have to minimize MMIO operation > > per a single byte read/write. This is another reason why we had to create a > > dedicated driver for this controller. > > If the controller has support for TX only mode that seems like a good > thing to have in the generic driver - for TX only transfers that's less > bus I/O. The EEPROM mode potentially also, I'm less clear on what > exactly it is but it sounds potentially useful for common cases with > register operations. Perhaps other systems might have similarly slow > buses and get noticable performance benefits from using these modes. See my comments above. I gave up and will try to merge my driver into the generic Dw APB SSI framework. > > > > > +static void bs_copy_from_map(void *to, void __iomem *from, size_t len) > > > > +{ > > > > + size_t shift, chunk; > > > > + u32 data; > > > > This feels like something that is likely to be suitable for a generic > > > implementation? Indeed I'd kind of expect that the architecture > > > memcpy_to/fromio() would try to copy aligned blocks since it's usually > > > going to perform better (which I assume is why this function is doing > > > it). > > > MIPS memcpy_to/fromio don't do this. They expect the MMIO space can be read in > > a same way as normal memory space. It's our controller "feature", that MMIO can > > be only accessible by lw/sw instructions. So I had no choice but to implement the > > aligned read/write operations by myself here. AFAICS this won't be suitable for > > the generic implementation due to the alignment restriction and the controller > > setups, which are supposed to be performed before accessing the dirmap MMIO. > > You should document this in the driver so nobody comes along and does > the refactoring as a cleanup, it looks like duplicated or redundant code. Ok. > > > > This seems a bit much - we can validate that the message doesn't contain > > > any delays (which is going to be the overwhelming majority of devices) > > > or multiple transfers and reject just those transfers we can't support. > > > Multiple transfers are an issue with this hardware but you could > > > implement support in the core for coalescing them into single transfers, > > > this isn't the only hardware with automatic chip select handling that > > > can't cope with scatter/gather so it would be a useful thing to have. > > > As I said in the commit message to the binding this controller is nearly always > > connected to an external SPI flash (I'd say always, because Baikal-T1 is fully > > functional only when initially booted from an SPI flash). At least I don't know > > any hardware using this interface differently (I've worked with all devices with > > Baikal-T1 SoC on board). In current driver implementation the SPI-mem operations > > covers the native CS case while the prepare_message()/transfer_one()/unprepare_message() > > are used when GPIO-driven CS is available (due to the commit 05766050d5bd I had > > to implement it). So I didn't want to over-complicate the driver just to support > > some virtual hardware before it's really necessary. > > Aside from the issue of tempting fate in board designs this circles back > to the open coding of the bits of the spi-mem stuff that aren't really > offloaded by the hardware - the driver would look a lot more standard > (and be closer to spi-dw I guess) if it only implemented the bits that > are offloaded and this interface was available. Having the normal > SPI message interface would not only support non-flash hardware, it'd > also enable the use of more of the generic SPI flash code like other > similar controllers. See my comment above, why it's necessary to have the spi-mem implemented for the case of native CS. If GPIO-driven CS is utilized, then generic (synchronous SPI Tx/Rx mode) but poll-based (since there is no IRQ provided for our controller) algorithm can be utilized. If native CS is only available, then the safest way is to have the atomic context by disabling the interrupts. This can be easily done for the spi-mem interface, but nearly impossible for the generic spi_transfer_one_message()-based IO. The only alternative is to implement the device-specific transfer_one_message() callback, analyze SPI transfers, merge them up, execute in the atomic context. Needless to say, that such approach is only applicable for the IRQless controllers. If IRQ is available for the controller, then using Native CS is still unsafe. To sum up what approaches are better in what configurations if IRQ-less DW APB SSI controller is available: 1) Flash/spi-mem + Native CS => EEPROM-read/Tx-only + atomic context - disable IRQs. 2) Flash/spi-mem + GPIO CS => normal synchronous SPI Tx/Rx mode (currently this will fallback to 4) because spi_mem_exec_op() implementation, probably for good). 3) SPI msg + Native CS => reimplement transfer_one_message() + merge transfers if possible otherwise use normal Tx+Rx mode + atomic context - disabling IRQs during the whole message. 4) SPI msg + GPIO CS => normal synchronous SPI Tx/Rx mode. For the Dw APB SSI controller with IRQ the 2) and 4) items are the same. 1) and 3) can't be safely implemented due to possible concurrent IRQs. Item 4) is already available in the current spi-dw.c driver, although the poll-mode has been recently removed (I'll have to get it back). From this patch I can port the items 1) and 2) to the generic version of the DW APB SSI driver. I won't spend my time for 3) because first I've already spent a lot of it on implementing this driver, second I don't need such functionality on our platforms. If someone ever needs it, I'll assist with an advice of how to do this. So, what do you think? > > > > > +static int bs_transfer_one(struct spi_controller *ctrl, struct spi_device *spi, > > > > + struct spi_transfer *xfer) > > > > +{ > > > > + struct bt1_spi *bs = spi_controller_get_devdata(ctrl); > > > > + > > > > + bs_update_cfg(bs, xfer); > > > > + > > > > + if (bs->cfg.dfs <= 8) > > > > + bs_pp_bytes(bs, xfer->tx_buf, xfer->rx_buf, xfer->len); > > > > + else > > > > + bs_pp_words(bs, xfer->tx_buf, xfer->rx_buf, xfer->len / 2); > > > > > > This will have issues with transfers with an odd number of bytes won't > > > it? > > > No, it won't. Transfers with bits per word greater than 8 should have even > > number of bytes seeing the len field is supposed to have the buffers length in > > bytes: > > So dfs is bits per word? This is one of those things where the internal > APIs make things harder to follow. > > > > Perhaps consider regmap_mmio? > > > For multiple reasons no. First of all I need a direct access to the DR register > > and I need to do this as fast as possible due to relatively slow APB > > bus. The same concern is regarding the controller configs setting. Secondly > > everywhere in the driver I use the normal read/write methods and there is no > > need in even a basic regmap-update function. So an additional abstraction like > > regmap would be unnecessary complication. Thirdly the access to the registers > > That comment was more due to writing debug infrastructure than stuff > like update_bits - the main reason regmap-mmio exists is for the debug > and cache features. If the performance is too marginal to allow any > overhead then fine. There's nothing stopping you combining regmap and > non-regmap on the same device if you exclude the more sensitive > registers from regmap with the access operations but it's not exactly > nice. > > > > > + if (!ret) { > > > > + mutex_lock(&bs->ctrl->io_mutex); > > > > Holding a mutex over the entire time that a file is open is not good, > > > it's also not clear to me what this mutex is supposed to be protecting > > > given that it's only referenced in these debugfs functions. > > > Hm, it's io_mutex protecting the SPI IO operation to be thread-safe. See: > > spi_mem_access_start()/spi_mem_access_end(); > > __spi_pump_messages(); > > > By locking the mutex here I make sure the DebugFS registers dump operation won't > > interfere the IO operations like dirmap, which make the registers unavailable > > for normal MMIO-based access. > > Oh, it's the SPI level mutex... that is a bit icky, and it does let > userspace code block the bus fairly easily which could be problematic > but I guess it's debugfs so meh. Ok. > > > > > + writel(BC_CSR_SPI_RDA, bs->regs + BC_CSR); > > > > Whatever this write is doing we never seem to undo it? > > > Here I switch on the DW APB SSI registers to be available for normal > > readl/writel accesses. It's not a problem to leave the mode on, since > > bs_set_cfg() will do a proper config anyway. > > A comment would help here, the fact that it's not undone looks like a > bug. Ok. -Sergey
On Mon, May 18, 2020 at 03:05:42AM +0300, Serge Semin wrote: > On Mon, May 11, 2020 at 10:25:06PM +0100, Mark Brown wrote: > > Yes, some flags should work here - the issue was that at least some > > controllers may end up trying to do multiple SPI operations for one > > spi-mem thing which will break if the chip select doesn't get changed to > > correspond with what's going on. > Ok. New SPI flag it is then. It will be something like this: > + #define SPI_CONTROLLER_FLASH_SS BIT(6) I'd rather use CS than SS (it's more common in the code). > So, what do you think? Should be fine, controllers that have an issue implementing just shouldn't set the flag. > > > > It's not clear to me that this hardware actually supports spi_mem in > > > > hardware? > > > SPI-mem operations are implemented by means of the EEPROM-read and Tx-only > > > modes of the controller. > > Sure, but those seem like normal SPI-level things rather than cases > > where the hardware understands that it has a flash attached and is doing > > flash specific things. > No, hardware can't detect whether the flash is attached. This must be defined by > the platform, like based on the DT sub-nodes. This isn't about autodetection, it's about the abstraction level the hardware is operating on - some hardware is able to generate flash operations by itself (possibly with some help programming the opcodes that are needed by a given flash), some hardware just works at the bytestream level. > > A very common case for this stuff is that > > controllers have acceleration blocks for read and fall back on normal > > SPI for writes and erases, that sounds like what's going on here. > > Well, yeah, they do provide some acceleration. EEPROM-read provides automatic > write-cmd-dummy-data-then-read operations. But in this case the only thing we > have to push into the SPI Tx FIFO is command and dummy bytes. The read operation So it's a write then read but you have to program the write each time?
On Mon, May 18, 2020 at 04:19:47PM +0100, Mark Brown wrote: > On Mon, May 18, 2020 at 03:05:42AM +0300, Serge Semin wrote: > > On Mon, May 11, 2020 at 10:25:06PM +0100, Mark Brown wrote: > > > > Yes, some flags should work here - the issue was that at least some > > > controllers may end up trying to do multiple SPI operations for one > > > spi-mem thing which will break if the chip select doesn't get changed to > > > correspond with what's going on. > > > Ok. New SPI flag it is then. It will be something like this: > > + #define SPI_CONTROLLER_FLASH_SS BIT(6) > > I'd rather use CS than SS (it's more common in the code). Ok. > > > So, what do you think? > > Should be fine, controllers that have an issue implementing just > shouldn't set the flag. Yes, exactly what I intended. > > > > > > It's not clear to me that this hardware actually supports spi_mem in > > > > > hardware? > > > > > SPI-mem operations are implemented by means of the EEPROM-read and Tx-only > > > > modes of the controller. > > > > Sure, but those seem like normal SPI-level things rather than cases > > > where the hardware understands that it has a flash attached and is doing > > > flash specific things. > > > No, hardware can't detect whether the flash is attached. This must be defined by > > the platform, like based on the DT sub-nodes. > > This isn't about autodetection, it's about the abstraction level the > hardware is operating on - some hardware is able to generate flash > operations by itself (possibly with some help programming the opcodes > that are needed by a given flash), some hardware just works at the > bytestream level. > > > > A very common case for this stuff is that > > > controllers have acceleration blocks for read and fall back on normal > > > SPI for writes and erases, that sounds like what's going on here. > > > > Well, yeah, they do provide some acceleration. EEPROM-read provides automatic > > write-cmd-dummy-data-then-read operations. But in this case the only thing we > > have to push into the SPI Tx FIFO is command and dummy bytes. The read operation > > So it's a write then read but you have to program the write each time? Here is what we need to do to perform the EEPROM-read operation: 1) Enable EEPROM-read mode. 2) Initialize a corresponding registers with a number of SPI transfer words (with bits-per-word taken into account) to read. 3) Push opcode + address + dummy bytes into the Tx FIFO. When it's done and the Tx FIFO is empty, the controller will proceed with read operations by pushing zeros (or ones, don't remember what level it's by default) to MOSI and pulling data from MISO into the RX FIFO. 4) Keep up with getting data from the Rx FIFO so one wouldn't get overflown. Regarding programming write each time. Well, it's up to the driver implementation. If opcode, address, dummy bytes and number of words to read are the same as before, then re-programming isn't required. -Sergey
On Tue, May 19, 2020 at 12:17:27AM +0300, Serge Semin wrote: > Here is what we need to do to perform the EEPROM-read operation: > 1) Enable EEPROM-read mode. > 2) Initialize a corresponding registers with a number of SPI transfer words > (with bits-per-word taken into account) to read. > 3) Push opcode + address + dummy bytes into the Tx FIFO. When it's done and > the Tx FIFO is empty, the controller will proceed with read operations by > pushing zeros (or ones, don't remember what level it's by default) to MOSI > and pulling data from MISO into the RX FIFO. > 4) Keep up with getting data from the Rx FIFO so one wouldn't get overflown. > Regarding programming write each time. Well, it's up to the driver implementation. > If opcode, address, dummy bytes and number of words to read are the same as before, > then re-programming isn't required. Ah, nice. This should be useful for far more than just flash - most register reads will also be able to take advantage of this, they follow a similar write then read pattern.
diff --git a/drivers/spi/Kconfig b/drivers/spi/Kconfig index 741b9140992a..3b1815b6a904 100644 --- a/drivers/spi/Kconfig +++ b/drivers/spi/Kconfig @@ -171,6 +171,19 @@ config SPI_BCM_QSPI based platforms. This driver works for both SPI master for spi-nor flash device as well as MSPI device. +config SPI_BT1_SYS + tristate "Baikal-T1 System Boot SPI Controller" + depends on (MIPS_BAIKAL_T1 && OF) || COMPILE_TEST + imply SPI_MEM + help + SPI driver for Baikal-T1 System Boot SPI Controller. It's based on + the DW APB SSI IP block, but with very limited resources available: + no IRQ, no DMA, a single CS, FIFO of just 8 bytes depth. Its primary + usecase is to boot the system from an external SPI-nor flash. So + aside with the poll-based normal SPI interface the memory operations + and read-only direct mapping are supported by the driver from + scratch. + config SPI_BITBANG tristate "Utilities for Bitbanging SPI masters" help diff --git a/drivers/spi/Makefile b/drivers/spi/Makefile index 28f601327f8c..3c5d40a298e4 100644 --- a/drivers/spi/Makefile +++ b/drivers/spi/Makefile @@ -29,6 +29,7 @@ obj-$(CONFIG_SPI_BCM63XX) += spi-bcm63xx.o obj-$(CONFIG_SPI_BCM63XX_HSSPI) += spi-bcm63xx-hsspi.o obj-$(CONFIG_SPI_BCM_QSPI) += spi-iproc-qspi.o spi-brcmstb-qspi.o spi-bcm-qspi.o obj-$(CONFIG_SPI_BITBANG) += spi-bitbang.o +obj-$(CONFIG_SPI_BT1_SYS) += spi-bt1-sys.o obj-$(CONFIG_SPI_BUTTERFLY) += spi-butterfly.o obj-$(CONFIG_SPI_CADENCE) += spi-cadence.o obj-$(CONFIG_SPI_CLPS711X) += spi-clps711x.o diff --git a/drivers/spi/spi-bt1-sys.c b/drivers/spi/spi-bt1-sys.c new file mode 100644 index 000000000000..f7ab0eb4c247 --- /dev/null +++ b/drivers/spi/spi-bt1-sys.c @@ -0,0 +1,873 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2020 BAIKAL ELECTRONICS, JSC + * + * Authors: + * Ramil Zaripov <Ramil.Zaripov@baikalelectronics.ru> + * Serge Semin <Sergey.Semin@baikalelectronics.ru> + * + * Baikal-T1 System Boot SPI Controller driver + */ + +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/bits.h> +#include <linux/bitfield.h> +#include <linux/string.h> +#include <linux/platform_device.h> +#include <linux/spi/spi.h> +#include <linux/spi/spi-mem.h> +#include <linux/of.h> +#include <linux/clk.h> +#include <linux/irqflags.h> +#include <linux/preempt.h> +#include <linux/debugfs.h> +#include <linux/cpumask.h> +#include <linux/seq_file.h> + +#include "spi-bt1-sys.h" + +static void bs_set_cfg(struct bt1_spi *bs, const struct bt1_spi_cfg *req) +{ + u32 ctrl0, ctrl1, baudr, min_sckdv, sckdv; + + ctrl0 = FIELD_PREP(BC_SPI_CTRL0_DFS_MASK, req->dfs - 1) | + FIELD_PREP(BC_SPI_CTRL0_FRF_MASK, BC_SPI_CTRL0_FRF_SPI) | + (req->mode & SPI_CPHA ? BC_SPI_CTRL0_SCPH : 0) | + (req->mode & SPI_CPOL ? BC_SPI_CTRL0_SCPOL : 0) | + (req->mode & SPI_LOOP ? BC_SPI_CTRL0_SRL : 0) | + FIELD_PREP(BC_SPI_CTRL0_TMOD_MASK, req->tmode); + ctrl1 = FIELD_PREP(BC_SPI_CTRL1_NDF_MASK, req->ndf ? req->ndf - 1 : 0); + + /* + * There is a very tricky race condition hidden inside Baikal-T1 SoC + * MMIO implementation. Simultaneous accesses to any registers via + * the APB bus are serialized even if they are executed to different + * addresses of even different controllers. This may cause + * IO-operations delays (two, three, four, etc times the normal IO + * operation latency), which in our case may lead to the data loss if + * SPI-bus is fast enough and APB traffic is too heavy (either + * generated by sibling CPU or DMA). So in order to fix this we have + * no other way but to increase the SPI-bus clock divider lower limit. + * Note this limitation isn't applicable to the SPI flash direct + * mapping mode, since IO operations are performed by the Boot + * Controller itself while CPU just reads the requested data. + */ + sckdv = DIV_ROUND_UP(bs->ref_freq, req->freq); + if (!req->dirmap && !req->cs_gpiod && num_possible_cpus() > 1) + min_sckdv = BC_SPI_MIN_SCKDV; + else + min_sckdv = 2; + sckdv = clamp_val(sckdv, min_sckdv, BC_SPI_BAUDR_SCKDV_MASK); + baudr = FIELD_PREP(BC_SPI_BAUDR_SCKDV_MASK, sckdv); + + /* + * Enable DW APB SSI controller registers access by means of the + * RDA mode control bit. Only then we'll be able to read/write + * from/to the controller memory mapped config-space. + */ + writel(BC_CSR_SPI_RDA, bs->regs + BC_CSR); + + /* + * Update the SPI controller configuration registers and enable + * the requested slave for further communications. + */ + writel(0, bs->regs + BC_SPI_SSIENR); + writel(ctrl0, bs->regs + BC_SPI_CTRL0); + writel(ctrl1, bs->regs + BC_SPI_CTRL1); + writel(baudr, bs->regs + BC_SPI_BAUDR); + writel(BC_SPI_SSIENR_EN, bs->regs + BC_SPI_SSIENR); + + bs->cfg = *req; + + /* + * DW APB SSI controller has a nasty peculiarity. It doesn't provide + * a direct way to set and clear the native chip-select signal. Instead + * the controller asserts the CS pin if Tx FIFO isn't empty and a + * transmission is going on. When Tx FIFO doesn't have anything to be + * pushed out the chip-select pin will be automatically de-asserted + * back to the high level. Needless to say that the sudden CS + * de-assertion in the middle of the message transfer most likely will + * cause the data loss. So in order to make sure the executed transfers + * are CS-atomic we have to either keep the controller FIFO non-empty + * during the whole series of transfers or use GPIO-based chip-select + * signal. The last case availability depends on the platform + * configuration, while the former one can be achieved by making sure + * nothing preempt the transfer execution context. It can be done by + * disabling the local interrupts while the data being pushed and + * pulled to/from the controller FIFOs. Note we don't have to worry + * about this problem in case of SPI flash transparent access, because + * IO operations are performed by the Boot Controller. + */ + writel(BIT(req->cs), bs->regs + BC_SPI_SER); + if (req->cs_gpiod) { + gpiod_set_value_cansleep(req->cs_gpiod, + !!(bs->cfg.mode & SPI_CS_HIGH)); + } else if (!req->dirmap) { + local_irq_save(bs->cfg.flags); + preempt_disable(); + } + + /* + * If direct mapping is requested from now the DW APB SSI registers + * are getting unavailable. + */ + if (req->dirmap) + writel(0, bs->regs + BC_CSR); +} + +/* + * Note the next method is applicable only in non-direct mapping case. So + * we don't need to disable the transparent mode to access the DW APB SSI + * registers since they are already available after bs_set_cfg() actions. + */ +static void bs_update_cfg(struct bt1_spi *bs, struct spi_transfer *xfer) +{ + u32 ctrl0, baudr, min_sckdv, sckdv; + + sckdv = DIV_ROUND_UP(bs->ref_freq, xfer->speed_hz); + if (!bs->cfg.cs_gpiod && num_possible_cpus() > 1) + min_sckdv = BC_SPI_MIN_SCKDV; + else + min_sckdv = 2; + sckdv = clamp_val(sckdv, min_sckdv, BC_SPI_BAUDR_SCKDV_MASK); + xfer->effective_speed_hz = bs->ref_freq / sckdv; + + if (xfer->bits_per_word == bs->cfg.dfs && + xfer->speed_hz == bs->cfg.freq) + return; + + ctrl0 = readl(bs->regs + BC_SPI_CTRL0) & ~BC_SPI_CTRL0_DFS_MASK; + ctrl0 |= FIELD_PREP(BC_SPI_CTRL0_DFS_MASK, xfer->bits_per_word - 1); + baudr = FIELD_PREP(BC_SPI_BAUDR_SCKDV_MASK, sckdv); + + writel(0, bs->regs + BC_SPI_SSIENR); + writel(ctrl0, bs->regs + BC_SPI_CTRL0); + writel(baudr, bs->regs + BC_SPI_BAUDR); + writel(BC_SPI_SSIENR_EN, bs->regs + BC_SPI_SSIENR); + + bs->cfg.dfs = xfer->bits_per_word; + bs->cfg.freq = xfer->speed_hz; +} + +static void bs_clear_cfg(struct bt1_spi *bs) +{ + if (bs->cfg.cs_gpiod) { + gpiod_set_value_cansleep(bs->cfg.cs_gpiod, + !(bs->cfg.mode & SPI_CS_HIGH)); + } else if (!bs->cfg.dirmap) { + local_irq_restore(bs->cfg.flags); + preempt_enable(); + } + + /* Disable direct mapping if has been enabled in bs_set_cfg(). */ + if (bs->cfg.dirmap) + writel(BC_CSR_SPI_RDA, bs->regs + BC_CSR); + + writel(0, bs->regs + BC_SPI_SER); +} + +static int bs_check_status(struct bt1_spi *bs) +{ + int ret = 0; + u32 data; + + /* Check the bogus bits of the controller status register. */ + data = readl(bs->regs + BC_SPI_RISR); + if (data & BC_SPI_RISR_RXO) { + dev_err(bs->dev, "RX FIFO overrun detected\n"); + ret = -EIO; + } + + if (data & BC_SPI_RISR_RXU) { + dev_err(bs->dev, "RX FIFO underrun detected\n"); + ret = -EIO; + } + + if (data & BC_SPI_RISR_TXO) { + dev_err(bs->dev, "TX FIFO overrun detected\n"); + ret = -EIO; + } + + readl(bs->regs + BC_SPI_ICR); + + /* + * Spin around the BUSY-state bit waiting for the controller to finish + * all the requested IO-operations. + */ + do { + data = readl(bs->regs + BC_SPI_SR); + } while ((data & BC_SPI_SR_BUSY) || !(data & BC_SPI_SR_TFE)); + + return ret; +} + +static int bs_pull_bytes(struct bt1_spi *bs, u8 *dst, unsigned int len) +{ + const u8 *end = dst + len; + const u8 *next = dst; + int retry = BC_SPI_RETRY; + + /* Write dummy-data to the outbound FIFO to trigger the RO-transfer. */ + if (bs->cfg.tmode == BC_SPI_CTRL0_TMOD_RO) + writel(GENMASK(bs->cfg.dfs - 1, 0), bs->regs + BC_SPI_DR); + + while (dst < end) { + next += __raw_readl(bs->regs + BC_SPI_RXFLR); + if (dst < next) { + do { + *dst++ = __raw_readl(bs->regs + BC_SPI_DR); + } while (dst < next); + retry = BC_SPI_RETRY; + } else if (!retry--) { + return -EIO; + } + } + + return 0; +} + +static void bs_push_bytes(struct bt1_spi *bs, const u8 *src, unsigned int len) +{ + const u8 *end = src + len; + const u8 *next = src; + + while (src < end) { + next += BC_SPI_FIFO_LEN - __raw_readl(bs->regs + BC_SPI_TXFLR); + if (src < next) { + if (next > end) + next = end; + do { + __raw_writel(*src++, bs->regs + BC_SPI_DR); + } while (src < next); + } + } +} + +static void bs_pp_bytes(struct bt1_spi *bs, const u8 *src, u8 *dst, + unsigned int len) +{ + unsigned int txlen = len, rxlen = len; + u32 data, cnt; + + /* + * Note we have to keep the Tx FIFO only half full in order to prevent + * the Rx FIFO overrun in case of the task rescheduling in the middle + * of the loop. + */ + while (txlen || rxlen) { + cnt = BC_SPI_FIFO_LVL - __raw_readl(bs->regs + BC_SPI_TXFLR); + cnt = min(cnt, txlen); + txlen -= cnt; + while (cnt--) { + data = src ? *src++ : 0xFF; + __raw_writel(data, bs->regs + BC_SPI_DR); + } + cnt = __raw_readl(bs->regs + BC_SPI_RXFLR); + cnt = min(cnt, rxlen); + rxlen -= cnt; + while (cnt--) { + data = __raw_readl(bs->regs + BC_SPI_DR); + if (dst) + *dst++ = data; + } + } +} + +static void bs_pp_words(struct bt1_spi *bs, const u16 *src, u16 *dst, + unsigned int len) +{ + unsigned int txlen = len, rxlen = len; + u32 data, cnt; + + while (txlen || rxlen) { + cnt = BC_SPI_FIFO_LVL - __raw_readl(bs->regs + BC_SPI_TXFLR); + cnt = min(cnt, txlen); + txlen -= cnt; + while (cnt--) { + data = src ? *src++ : 0xFFFF; + __raw_writel(data, bs->regs + BC_SPI_DR); + } + cnt = __raw_readl(bs->regs + BC_SPI_RXFLR); + cnt = min(cnt, rxlen); + rxlen -= cnt; + while (cnt--) { + data = __raw_readl(bs->regs + BC_SPI_DR); + if (dst) + *dst++ = data; + } + } +} + +static u8 *bs_create_cmd(struct bt1_spi *bs, const struct spi_mem_op *op, + unsigned int *len) +{ + unsigned int i, j; + const u8 *data; + u8 *cmd; + + /* + * Calculate the total length of the EEPROM command transfer and + * either use the pre-allocated buffer or create a temporary one. + */ + *len = 1 + op->addr.nbytes + op->dummy.nbytes; + if (op->data.dir == SPI_MEM_DATA_OUT) + *len += op->data.nbytes; + + if (*len <= BC_SPI_BUF_SIZE) { + cmd = bs->buf; + } else { + cmd = kzalloc(*len, GFP_KERNEL); + if (!cmd) + return ERR_PTR(-ENOMEM); + } + + /* + * Collect the operation code, address and dummy bytes into the single + * buffer. + */ + cmd[0] = op->cmd.opcode; + for (i = 1; i <= op->addr.nbytes; ++i) { + cmd[i] = op->addr.val >> + (BITS_PER_BYTE * (op->addr.nbytes - i)); + } + for (; i <= op->addr.nbytes + op->dummy.nbytes; ++i) + cmd[i] = 0xFF; + + /* + * If it's a transfer with data to be sent, also copy it into the + * single buffer in order to speed up the push-data loop. Otherwise + * in case of high-frequency SPI transfers we may have an occasional + * output data loss. + */ + if (op->data.dir == SPI_MEM_DATA_OUT) { + data = op->data.buf.out; + for (j = 0; j < op->data.nbytes; ++i, ++j) + cmd[i] = data[j]; + } + + return cmd; +} + +static void bs_clear_cmd(struct bt1_spi *bs, u8 *cmd) +{ + if (cmd != bs->buf) + kfree(cmd); +} + +static int bs_adjust_mem_op_size(struct spi_mem *mem, struct spi_mem_op *op) +{ + if (op->data.dir == SPI_MEM_DATA_IN) { + if (op->data.nbytes > (BC_SPI_CTRL1_NDF_MASK + 1)) + op->data.nbytes = BC_SPI_CTRL1_NDF_MASK + 1; + } + + return 0; +} + +static bool bs_supports_mem_op(struct spi_mem *mem, + const struct spi_mem_op *op) +{ + if (op->data.buswidth > 1 || op->addr.buswidth > 1 || + op->dummy.buswidth > 1 || op->cmd.buswidth > 1) + return false; + + return spi_mem_default_supports_op(mem, op); +} + +static int bs_exec_mem_op(struct spi_mem *mem, + const struct spi_mem_op *op) +{ + struct bt1_spi *bs = spi_controller_get_devdata(mem->spi->controller); + struct bt1_spi_cfg req; + unsigned int len; + int ret; + u8 *cmd; + + /* Collect the controller configuration required by the operation. */ + req.dirmap = false; + req.cs = mem->spi->chip_select; + req.cs_gpiod = mem->spi->cs_gpiod; + req.flags = 0; + req.mode = mem->spi->mode; + if (req.cs_gpiod) { + req.tmode = BC_SPI_CTRL0_TMOD_TR; + req.ndf = 0; + } else if (op->data.dir == SPI_MEM_DATA_IN) { + req.tmode = BC_SPI_CTRL0_TMOD_READ_EEPROM; + req.ndf = op->data.nbytes; + } else { + req.tmode = BC_SPI_CTRL0_TMOD_TO; + req.ndf = 0; + } + req.dfs = BITS_PER_BYTE; + req.freq = mem->spi->max_speed_hz; + + /* + * Collect the command data into a single output buffer to speed the + * transmission up. + */ + cmd = bs_create_cmd(bs, op, &len); + if (IS_ERR(cmd)) + return PTR_ERR(cmd); + + bs_set_cfg(bs, &req); + + /* + * In the boosted mode we have to push and pull the data as soon as + * possible otherwise it might be lost, due to either native CS + * getting toggled or Rx FIFO getting overrun. In non-boosted mode we + * can relax and push-pull data synchronously with no worries of losing + * any data or breaking the transfers due to the native CS toggle. + */ + if (!bs->cfg.cs_gpiod) { + bs_push_bytes(bs, cmd, len); + + if (op->data.dir == SPI_MEM_DATA_IN) + ret = bs_pull_bytes(bs, op->data.buf.in, + op->data.nbytes); + } else { + bs_pp_bytes(bs, cmd, NULL, len); + + if (op->data.dir == SPI_MEM_DATA_IN) + bs_pp_bytes(bs, NULL, op->data.buf.in, + op->data.nbytes); + } + + ret = bs_check_status(bs); + + bs_clear_cfg(bs); + + bs_clear_cmd(bs, cmd); + + return ret; +} + +static int bs_dirmap_create(struct spi_mem_dirmap_desc *desc) +{ + struct spi_controller *ctlr = desc->mem->spi->controller; + struct bt1_spi *bs = spi_controller_get_devdata(ctlr); + + if (!bs->map || !bs_supports_mem_op(desc->mem, &desc->info.op_tmpl)) + return -ENOTSUPP; + + /* + * Make sure the requested region doesn't go out of the physically + * mapped flash memory bounds and the operation is read-only. + */ + if (desc->info.offset + desc->info.length > bs->map_len || + desc->info.op_tmpl.data.dir != SPI_MEM_DATA_IN) + return -ENOTSUPP; + + return 0; +} + +static void bs_copy_from_map(void *to, void __iomem *from, size_t len) +{ + size_t shift, chunk; + u32 data; + + /* + * We split the copying up into the next three stages: unaligned head, + * aligned body, unaligned tail. + */ + shift = (size_t)from & 0x3; + if (shift) { + chunk = min_t(size_t, 4 - shift, len); + data = readl_relaxed(from - shift); + memcpy(to, &data + shift, chunk); + from += chunk; + to += chunk; + len -= chunk; + } + + while (len >= 4) { + data = readl_relaxed(from); + memcpy(to, &data, 4); + from += 4; + to += 4; + len -= 4; + } + + if (len) { + data = readl_relaxed(from); + memcpy(to, &data, len); + } +} + +static ssize_t bs_dirmap_read(struct spi_mem_dirmap_desc *desc, + u64 offs, size_t len, void *buf) +{ + struct spi_mem *mem = desc->mem; + struct bt1_spi *bs = spi_controller_get_devdata(mem->spi->controller); + struct bt1_spi_cfg req; + + /* + * Make sure the requested operation length is valid. Truncate the + * length if it's greater than the length of the MMIO region. + */ + if (offs >= bs->map_len || !len) + return 0; + + len = min_t(size_t, len, bs->map_len - offs); + + /* Collect the controller configuration required by the operation. */ + req.dirmap = true; + req.cs = mem->spi->chip_select; + req.cs_gpiod = mem->spi->cs_gpiod; + req.flags = 0; + req.mode = mem->spi->mode; + req.tmode = BC_SPI_CTRL0_TMOD_READ_EEPROM; + req.ndf = 4; + req.dfs = BITS_PER_BYTE; + req.freq = mem->spi->max_speed_hz; + + bs_set_cfg(bs, &req); + + bs_copy_from_map(buf, bs->map + offs, len); + + bs_clear_cfg(bs); + + return len; +} + +static const struct spi_controller_mem_ops bs_mem_ops = { + .adjust_op_size = bs_adjust_mem_op_size, + .supports_op = bs_supports_mem_op, + .exec_op = bs_exec_mem_op, + .dirmap_create = bs_dirmap_create, + .dirmap_read = bs_dirmap_read +}; + +static int bs_prepare_message(struct spi_controller *ctrl, + struct spi_message *msg) +{ + struct bt1_spi *bs = spi_controller_get_devdata(ctrl); + struct spi_transfer *xfer; + struct bt1_spi_cfg req; + + /* + * Currently the driver doesn't support the generic messages-based + * SPI interface with pure native chip-select signal. This is due + * to the automatic native chip-select toggle peculiarity described + * in the comment of the bs_set_cfg() method. Alas we can't use the + * IRQ-disable-based boost approach here since the native queue-based + * SPI messages transfer method can sleep waiting for the + * transfers/CS-change delays. + */ + if (!msg->spi->cs_gpiod) + return -ENOTSUPP; + + /* + * Collect the controller configuration common for all transfers and + * specific to the very first one. + */ + xfer = list_first_entry(&msg->transfers, typeof(*xfer), transfer_list); + req.dirmap = false; + req.cs = msg->spi->chip_select; + req.cs_gpiod = msg->spi->cs_gpiod; + req.flags = 0; + req.mode = msg->spi->mode; + req.tmode = BC_SPI_CTRL0_TMOD_TR; + req.ndf = 0; + req.dfs = xfer->bits_per_word; + req.freq = xfer->speed_hz; + + bs_set_cfg(bs, &req); + + return 0; +} + +static int bs_unprepare_message(struct spi_controller *ctrl, + struct spi_message *msg) +{ + struct bt1_spi *bs = spi_controller_get_devdata(ctrl); + + bs_clear_cfg(bs); + + return 0; +} + +static int bs_transfer_one(struct spi_controller *ctrl, struct spi_device *spi, + struct spi_transfer *xfer) +{ + struct bt1_spi *bs = spi_controller_get_devdata(ctrl); + + bs_update_cfg(bs, xfer); + + if (bs->cfg.dfs <= 8) + bs_pp_bytes(bs, xfer->tx_buf, xfer->rx_buf, xfer->len); + else + bs_pp_words(bs, xfer->tx_buf, xfer->rx_buf, xfer->len / 2); + + return bs_check_status(bs); +} + +static struct bt1_spi *bs_create_data(struct platform_device *pdev) +{ + struct device *dev = &pdev->dev; + struct bt1_spi *bs; + + bs = devm_kzalloc(dev, sizeof(*bs), GFP_KERNEL); + if (!bs) + return ERR_PTR(-ENOMEM); + + bs->dev = dev; + bs->pdev = pdev; + + return bs; +} + +static int bs_request_regs(struct bt1_spi *bs) +{ + struct resource *res; + + bs->regs = devm_platform_ioremap_resource_byname(bs->pdev, "config"); + if (IS_ERR(bs->regs)) { + dev_err(bs->dev, "Couldn't get the controller registers\n"); + return PTR_ERR(bs->regs); + } + + /* + * Direct mapping is optional feature, which in fact doesn't really + * give much of the performance gain. But the driver supports it + * anyway. + */ + bs->map = devm_platform_ioremap_resource_byname(bs->pdev, "map"); + if (IS_ERR(bs->map)) { + bs->map = NULL; + bs->map_len = 0; + dev_warn(bs->dev, "No direct mapping memory detected\n"); + } else { + res = platform_get_resource_byname(bs->pdev, IORESOURCE_MEM, + "map"); + bs->map_len = resource_size(res); + } + + return 0; +} + +static void bs_disable_clk(void *data) +{ + struct bt1_spi *bs = data; + + clk_disable_unprepare(bs->ref); +} + +static int bs_request_clks(struct bt1_spi *bs) +{ + int ret; + + /* + * SSI reference and APB clocks are synchronous so we only need a + * single clock source requested and enabled. + */ + bs->ref = devm_clk_get(bs->dev, NULL); + if (IS_ERR(bs->ref)) { + dev_err(bs->dev, "Couldn't get the ref clock descriptor\n"); + return PTR_ERR(bs->ref); + } + + ret = clk_prepare_enable(bs->ref); + if (ret) { + dev_err(bs->dev, "Couldn't enable the reference clock\n"); + return ret; + } + + ret = devm_add_action_or_reset(bs->dev, bs_disable_clk, bs); + if (ret) { + dev_err(bs->dev, "Can't add the clocks disable action\n"); + return ret; + } + + bs->ref_freq = clk_get_rate(bs->ref); + if (!bs->ref_freq) { + dev_err(bs->dev, "Invalid reference clock frequency\n"); + return -EINVAL; + } + + return 0; +} + +static int bs_create_spi_controller(struct bt1_spi *bs) +{ + int ret; + + bs->ctrl = spi_alloc_master(bs->dev, 0); + if (!bs->ctrl) { + dev_err(bs->dev, "No memory for SPI controller descriptor\n"); + return -ENOMEM; + } + + spi_controller_set_devdata(bs->ctrl, bs); + + bs->ctrl->use_gpio_descriptors = true; + bs->ctrl->mode_bits = SPI_CPOL | SPI_CPHA | SPI_LOOP; + bs->ctrl->bits_per_word_mask = SPI_BPW_RANGE_MASK(4, 16); + bs->ctrl->bus_num = bs->pdev->id; + bs->ctrl->prepare_message = bs_prepare_message; + bs->ctrl->unprepare_message = bs_unprepare_message; + bs->ctrl->transfer_one = bs_transfer_one; + bs->ctrl->mem_ops = &bs_mem_ops; + bs->ctrl->min_speed_hz = bs->ref_freq / (BC_SPI_BAUDR_SCKDV_MASK - 1); + bs->ctrl->max_speed_hz = bs->ref_freq / 2; + bs->ctrl->dev.of_node = bs->dev->of_node; + bs->ctrl->dev.fwnode = bs->dev->fwnode; + bs->ctrl->flags = SPI_MASTER_GPIO_SS; + + ret = devm_spi_register_controller(bs->dev, bs->ctrl); + if (ret) { + spi_controller_put(bs->ctrl); + dev_err(bs->dev, "Failed to register the SPI controller\n"); + return ret; + } + + return 0; +} + +#ifdef CONFIG_DEBUG_FS + +#define BC_SPI_DBGFS_REG(_name, _off) \ +{ \ + .name = _name, \ + .offset = _off \ +} + +static const struct debugfs_reg32 bs_dbgfs_regs[] = { + BC_SPI_DBGFS_REG("CSR", BC_CSR), + BC_SPI_DBGFS_REG("MAR", BC_MAR), + BC_SPI_DBGFS_REG("DRID", BC_DRID), + BC_SPI_DBGFS_REG("VID", BC_VID), + BC_SPI_DBGFS_REG("CTRL0", BC_SPI_CTRL0), + BC_SPI_DBGFS_REG("CTRL1", BC_SPI_CTRL1), + BC_SPI_DBGFS_REG("SSIENR", BC_SPI_SSIENR), + BC_SPI_DBGFS_REG("SER", BC_SPI_SER), + BC_SPI_DBGFS_REG("BAUDR", BC_SPI_BAUDR), + BC_SPI_DBGFS_REG("TXFLR", BC_SPI_TXFLR), + BC_SPI_DBGFS_REG("RXFLR", BC_SPI_RXFLR), + BC_SPI_DBGFS_REG("SR", BC_SPI_SR), + BC_SPI_DBGFS_REG("RISR", BC_SPI_RISR), + BC_SPI_DBGFS_REG("IDR", BC_SPI_IDR), + BC_SPI_DBGFS_REG("VERSION", BC_SPI_VERSION) +}; + +static int bs_dbgfs_show_regs(struct seq_file *s, void *data) +{ + const struct debugfs_reg32 *regs = bs_dbgfs_regs; + struct bt1_spi *bs = s->private; + int i; + + for (i = 0; i < ARRAY_SIZE(bs_dbgfs_regs); i++, regs++) { + seq_printf(s, "%s = 0x%08x\n", regs->name, + readl(bs->regs + regs->offset)); + if (seq_has_overflowed(s)) + break; + } + + return 0; +} + +static int bs_dbgfs_open_regs(struct inode *inode, struct file *file) +{ + struct bt1_spi *bs = inode->i_private; + int ret; + + ret = single_open(file, bs_dbgfs_show_regs, bs); + if (!ret) { + mutex_lock(&bs->ctrl->io_mutex); + writel(BC_CSR_SPI_RDA, bs->regs + BC_CSR); + } + + return ret; +} + +static int bs_dbgfs_release_regs(struct inode *inode, struct file *file) +{ + struct bt1_spi *bs = inode->i_private; + + mutex_unlock(&bs->ctrl->io_mutex); + + return single_release(inode, file); +} + +static const struct file_operations bs_fops_regs = { + .open = bs_dbgfs_open_regs, + .read = seq_read, + .llseek = seq_lseek, + .release = bs_dbgfs_release_regs +}; + +static void bs_dbgfs_remove(void *data) +{ + struct bt1_spi *bs = data; + + debugfs_remove_recursive(bs->dbgfs); +} + +static void bs_dbgfs_init(struct bt1_spi *bs) +{ + char name[32]; + int ret; + + snprintf(name, 32, "bt1-sys-ssi%d", bs->ctrl->bus_num); + bs->dbgfs = debugfs_create_dir(name, NULL); + + ret = devm_add_action_or_reset(bs->dev, bs_dbgfs_remove, bs); + if (ret) + return; + + debugfs_create_file("registers", 0400, bs->dbgfs, bs, &bs_fops_regs); +} + +#else + +static inline int bs_dbgfs_init(struct bt1_spi *bs) +{ + return 0; +} + +#endif /* CONFIG_DEBUG_FS */ + +static int bs_probe(struct platform_device *pdev) +{ + struct bt1_spi *bs; + int ret; + + bs = bs_create_data(pdev); + if (IS_ERR(bs)) + return PTR_ERR(bs); + + ret = bs_request_regs(bs); + if (ret) + return ret; + + ret = bs_request_clks(bs); + if (ret) + return ret; + + ret = bs_create_spi_controller(bs); + if (ret) + return ret; + + bs_dbgfs_init(bs); + + return 0; +} + +static const struct of_device_id bs_of_match[] = { + { .compatible = "baikal,bt1-sys-ssi" }, + { } +}; + +static struct platform_driver bs_driver = { + .probe = bs_probe, + .driver = { + .name = "bt1-sys-ssi", + .of_match_table = bs_of_match + } +}; +module_platform_driver(bs_driver); + +MODULE_AUTHOR("Serge Semin <Sergey.Semin@baikalelectronics.ru>"); +MODULE_DESCRIPTION("Baikal-T1 System Boot SPI Controller driver"); +MODULE_LICENSE("GPL v2"); diff --git a/drivers/spi/spi-bt1-sys.h b/drivers/spi/spi-bt1-sys.h new file mode 100644 index 000000000000..a45c0ef9ff4d --- /dev/null +++ b/drivers/spi/spi-bt1-sys.h @@ -0,0 +1,169 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2020 BAIKAL ELECTRONICS, JSC + * + * Baikal-T1 System Boot SPI Controller driver + */ +#ifndef __SPI_BT1_SYS_H__ +#define __SPI_BT1_SYS_H__ + +#include <linux/kernel.h> +#include <linux/bits.h> +#include <linux/platform_device.h> +#include <linux/clk.h> +#include <linux/debugfs.h> +#include <linux/gpio/consumer.h> +#include <linux/spi/spi.h> + +/* + * Baikal-T1 System Boot SPI controller registers. In addition to the Vendor- + * specific settings these are normal Synopsys DW APB SSI controller registers. + * Exception is that the IP core is synthesized with no IRQ/DMA, with very + * small FIFO of just 8 words and have a specific IP block implementing + * physically mapped SPI flash found at CS0. + */ +#define BC_CSR 0x000 +#define BC_CSR_MODE_FLD 0 +#define BC_CSR_MODE_MASK GENMASK(1, BC_CSR_MODE_FLD) +#define BC_CSR_MODE_ROM 0 +#define BC_CSR_MODE_SRAM 1 +#define BC_CSR_MODE_FLASH 2 +#define BC_CSR_SPI_RDA BIT(8) +#define BC_MAR 0x004 +#define BC_MAR_BSAB BIT(0) +#define BC_DRID 0x008 +#define BC_VID 0x00C +#define BC_SPI_CTRL0 0x100 +#define BC_SPI_CTRL0_DFS_FLD 0 +#define BC_SPI_CTRL0_DFS_MASK GENMASK(3, BC_SPI_CTRL0_DFS_FLD) +#define BC_SPI_CTRL0_FRF_FLD 4 +#define BC_SPI_CTRL0_FRF_MASK GENMASK(5, BC_SPI_CTRL0_FRF_FLD) +#define BC_SPI_CTRL0_FRF_SPI 0 +#define BC_SPI_CTRL0_FRF_SSP 1 +#define BC_SPI_CTRL0_FRF_MW 2 +#define BC_SPI_CTRL0_SCPH BIT(6) +#define BC_SPI_CTRL0_SCPOL BIT(7) +#define BC_SPI_CTRL0_TMOD_FLD 8 +#define BC_SPI_CTRL0_TMOD_MASK GENMASK(9, BC_SPI_CTRL0_TMOD_FLD) +#define BC_SPI_CTRL0_TMOD_TR 0 +#define BC_SPI_CTRL0_TMOD_TO 1 +#define BC_SPI_CTRL0_TMOD_RO 2 +#define BC_SPI_CTRL0_TMOD_READ_EEPROM 3 +#define BC_SPI_CTRL0_SRL BIT(11) +#define BC_SPI_CTRL1 0x104 +#define BC_SPI_CTRL1_NDF_FLD 0 +#define BC_SPI_CTRL1_NDF_MASK GENMASK(15, BC_SPI_CTRL1_NDF_FLD) +#define BC_SPI_SSIENR 0x108 +#define BC_SPI_SSIENR_EN BIT(0) +#define BC_SPI_SER 0x110 +#define BC_SPI_SER_0 BIT(0) +#define BC_SPI_BAUDR 0x114 +#define BC_SPI_BAUDR_SCKDV_FLD 0 +#define BC_SPI_BAUDR_SCKDV_MASK GENMASK(15, BC_SPI_BAUDR_SCKDV_FLD) +#define BC_SPI_TXFTLR 0x118 +#define BC_SPI_RXFTLR 0x11C +#define BC_SPI_TXFLR 0x120 +#define BC_SPI_TXFLR_TXTFL_FLD 0 +#define BC_SPI_TXFLR_TXTFL_MASK GENMASK(2, BC_SPI_TXFLR_TXTFL_FLD) +#define BC_SPI_RXFLR 0x124 +#define BC_SPI_RXFLR_RXTFL_FLD 0 +#define BC_SPI_RXFLR_RXTFL_MASK GENMASK(2, BC_SPI_RXFLR_RXTFL_FLD) +#define BC_SPI_SR 0x128 +#define BC_SPI_SR_BUSY BIT(0) +#define BC_SPI_SR_TFNF BIT(1) +#define BC_SPI_SR_TFE BIT(2) +#define BC_SPI_SR_RFNE BIT(3) +#define BC_SPI_SR_RFF BIT(4) +#define BC_SPI_RISR 0x134 +#define BC_SPI_RISR_TXE BIT(0) +#define BC_SPI_RISR_TXO BIT(1) +#define BC_SPI_RISR_RXU BIT(2) +#define BC_SPI_RISR_RXO BIT(3) +#define BC_SPI_RISR_RXF BIT(4) +#define BC_SPI_TXOICR 0x138 +#define BC_SPI_RXOICR 0x13C +#define BC_SPI_RXUICR 0x140 +#define BC_SPI_ICR 0x148 +#define BC_SPI_ICR_STS BIT(0) +#define BC_SPI_IDR 0x158 +#define BC_SPI_VERSION 0x15C +#define BC_SPI_DR 0x160 +#define BC_SPI_RX_DLY 0x1F0 + +/* + * Boot controller specific parameters + * @BC_SPI_FIFO_LEN: DW APB SSI controller FIFO length. + * @BC_SPI_FIFO_LVL: Safe Tx FIFO level so not to afraid the xfer task being + * scheduled. + * @BC_SPI_NUM_CS: Number of hardware chip selects. + * @BC_SPI_MIN_SCKDV: Minimal clock divider supported when native CS is used. + * @BC_SPI_RETRY: Number of retries before give up in waiting new data. + * Timeout may happen if the controller detected Rx-overrun + * event. + * @BC_SPI_BUF_SIZE: Size of the cmd-buffer for Read/Write-EEPROM transfer. It + * includes the opcode, address, dummy and data bytes. + */ +#define BC_SPI_FIFO_LEN 8 +#define BC_SPI_FIFO_LVL (BC_SPI_FIFO_LEN / 2) +#define BC_SPI_NUM_CS 1 +#define BC_SPI_MIN_SCKDV 6 +#define BC_SPI_RETRY 1000 +#define BC_SPI_BUF_SIZE (1 + BITS_PER_LONG_LONG / BITS_PER_BYTE + 512) + +/* + * struct bt1_spi_cfg - DW APB SPI controller configuration + * @dirmap: Flag of SPI flash direct mapping. + * @cs: Native chip select number. + * @cs_gpiod: Chip-select GPIO descriptor. + * @flags: IRQ flags saved in case of the boosted transfers. + * @mode: SPI transfer mode. + * @tmode: TR/TO/RO/EEPROM modes. + * @dfs: Data frame size. + * @ndf: Number of data frames for RO/EEPROM transfers. + * @freq: Transfer frequency. + */ +struct bt1_spi_cfg { + bool dirmap; + u8 cs; + struct gpio_desc *cs_gpiod; + unsigned long flags; + u32 mode; + u32 tmode; + u32 dfs; + u32 ndf; + u32 freq; +}; + +/* + * struct bt1_spi - Bailal-T1 System Boot SPI controller private data + * @dev: Pointer to the device structure. + * @pdev: Pointer to the platform device structure. + * @ctrl: Pointer to the SPI controller structure. + * @regs: Memory mapped controller registers. + * @map: Physically mapped SPI flash. + * @map_len: Length of the physically mapped ROM. + * @buf: Temporary buffer. + * @cfg: Current controller configuration. + * @ref_freq: Reference frequency. + * @ref: Reference clock source. + */ +struct bt1_spi { + struct device *dev; + struct platform_device *pdev; + struct spi_controller *ctrl; + + void __iomem *regs; + void __iomem *map; + resource_size_t map_len; + + u8 buf[BC_SPI_BUF_SIZE]; + struct bt1_spi_cfg cfg; + unsigned long ref_freq; + struct clk *ref; + +#ifdef CONFIG_DEBUG_FS + struct dentry *dbgfs; +#endif +}; + +#endif /* __SPI_BT1_SYS_H__ */