Message ID | 20200912140730.3.Ided778fb4cd078e36c6b240d1b279cd7a534a313@changeid (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/3] spi: spi-geni-qcom: Use the FIFO even more | expand |
On Sat 12 Sep 16:08 CDT 2020, Douglas Anderson wrote: > When setting up a bidirectional transfer we need to program both the > TX and RX lengths. We don't need a memory barrier between those two > writes. Factor out the __iowmb() and use writel_relaxed(). This > saves a fraction of a microsecond of setup overhead on bidirectional > transfers. > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > --- > > drivers/spi/spi-geni-qcom.c | 13 ++++++++++--- > 1 file changed, 10 insertions(+), 3 deletions(-) > > diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c > index 92d88bf85a90..6c7e12b68bf0 100644 > --- a/drivers/spi/spi-geni-qcom.c > +++ b/drivers/spi/spi-geni-qcom.c > @@ -376,15 +376,22 @@ static void setup_fifo_xfer(struct spi_transfer *xfer, > len &= TRANS_LEN_MSK; > > mas->cur_xfer = xfer; > + > + /* > + * Factor out the __iowmb() so that we can use writel_relaxed() for > + * both writes below and thus only incur the overhead once even if > + * we execute both of them. > + */ How many passes through this function do we have to take before saving the amount of time it took me to read this comment? Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Regards, Bjorn > + __iowmb(); > + > if (xfer->tx_buf) { > m_cmd |= SPI_TX_ONLY; > mas->tx_rem_bytes = xfer->len; > - writel(len, se->base + SE_SPI_TX_TRANS_LEN); > + writel_relaxed(len, se->base + SE_SPI_TX_TRANS_LEN); > } > - > if (xfer->rx_buf) { > m_cmd |= SPI_RX_ONLY; > - writel(len, se->base + SE_SPI_RX_TRANS_LEN); > + writel_relaxed(len, se->base + SE_SPI_RX_TRANS_LEN); > mas->rx_rem_bytes = xfer->len; > } > > -- > 2.28.0.618.gf4bc123cb7-goog >
Hi, On Sat, Sep 12, 2020 at 3:54 PM Bjorn Andersson <bjorn.andersson@linaro.org> wrote: > > On Sat 12 Sep 16:08 CDT 2020, Douglas Anderson wrote: > > > When setting up a bidirectional transfer we need to program both the > > TX and RX lengths. We don't need a memory barrier between those two > > writes. Factor out the __iowmb() and use writel_relaxed(). This > > saves a fraction of a microsecond of setup overhead on bidirectional > > transfers. > > > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > > --- > > > > drivers/spi/spi-geni-qcom.c | 13 ++++++++++--- > > 1 file changed, 10 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c > > index 92d88bf85a90..6c7e12b68bf0 100644 > > --- a/drivers/spi/spi-geni-qcom.c > > +++ b/drivers/spi/spi-geni-qcom.c > > @@ -376,15 +376,22 @@ static void setup_fifo_xfer(struct spi_transfer *xfer, > > len &= TRANS_LEN_MSK; > > > > mas->cur_xfer = xfer; > > + > > + /* > > + * Factor out the __iowmb() so that we can use writel_relaxed() for > > + * both writes below and thus only incur the overhead once even if > > + * we execute both of them. > > + */ > > How many passes through this function do we have to take before saving > the amount of time it took me to read this comment? > > Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Thanks for the review! Yeah, in Chrome OS we do a crazy amount of SPI transfers since our EC and security chip are connected over SPI and we seem to pile a whole lot of stuff into the EC. This means we keep coming back to the SPI controller again and again when profiling things. I'm hoping that we'll eventually be able to get DMA enabled here, but until then at least it's nice to make the FIFO transfers better... -Doug
Hi Douglas, I love your patch! Yet something to improve: [auto build test ERROR on spi/for-next] [also build test ERROR on v5.9-rc4 next-20200911] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Douglas-Anderson/spi-spi-geni-qcom-Use-the-FIFO-even-more/20200913-050928 base: https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next config: x86_64-randconfig-a002-20200913 (attached as .config) compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 3170d54842655d6d936aae32b7d0bc92fce7f22e) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # install x86_64 cross compiling tool for clang build # apt-get install binutils-x86-64-linux-gnu # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): >> drivers/spi/spi-geni-qcom.c:385:2: error: implicit declaration of function '__iowmb' [-Werror,-Wimplicit-function-declaration] __iowmb(); ^ 1 error generated. # https://github.com/0day-ci/linux/commit/4458adf4007926cfaaa505b54a83059c9ba813ad git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Douglas-Anderson/spi-spi-geni-qcom-Use-the-FIFO-even-more/20200913-050928 git checkout 4458adf4007926cfaaa505b54a83059c9ba813ad vim +/__iowmb +385 drivers/spi/spi-geni-qcom.c 334 335 static void setup_fifo_xfer(struct spi_transfer *xfer, 336 struct spi_geni_master *mas, 337 u16 mode, struct spi_master *spi) 338 { 339 u32 m_cmd = 0; 340 u32 len; 341 struct geni_se *se = &mas->se; 342 int ret; 343 344 /* 345 * Ensure that our interrupt handler isn't still running from some 346 * prior command before we start messing with the hardware behind 347 * its back. We don't need to _keep_ the lock here since we're only 348 * worried about racing with out interrupt handler. The SPI core 349 * already handles making sure that we're not trying to do two 350 * transfers at once or setting a chip select and doing a transfer 351 * concurrently. 352 * 353 * NOTE: we actually _can't_ hold the lock here because possibly we 354 * might call clk_set_rate() which needs to be able to sleep. 355 */ 356 spin_lock_irq(&mas->lock); 357 spin_unlock_irq(&mas->lock); 358 359 if (xfer->bits_per_word != mas->cur_bits_per_word) { 360 spi_setup_word_len(mas, mode, xfer->bits_per_word); 361 mas->cur_bits_per_word = xfer->bits_per_word; 362 } 363 364 /* Speed and bits per word can be overridden per transfer */ 365 ret = geni_spi_set_clock_and_bw(mas, xfer->speed_hz); 366 if (ret) 367 return; 368 369 mas->tx_rem_bytes = 0; 370 mas->rx_rem_bytes = 0; 371 372 if (!(mas->cur_bits_per_word % MIN_WORD_LEN)) 373 len = xfer->len * BITS_PER_BYTE / mas->cur_bits_per_word; 374 else 375 len = xfer->len / (mas->cur_bits_per_word / BITS_PER_BYTE + 1); 376 len &= TRANS_LEN_MSK; 377 378 mas->cur_xfer = xfer; 379 380 /* 381 * Factor out the __iowmb() so that we can use writel_relaxed() for 382 * both writes below and thus only incur the overhead once even if 383 * we execute both of them. 384 */ > 385 __iowmb(); 386 387 if (xfer->tx_buf) { 388 m_cmd |= SPI_TX_ONLY; 389 mas->tx_rem_bytes = xfer->len; 390 writel_relaxed(len, se->base + SE_SPI_TX_TRANS_LEN); 391 } 392 if (xfer->rx_buf) { 393 m_cmd |= SPI_RX_ONLY; 394 writel_relaxed(len, se->base + SE_SPI_RX_TRANS_LEN); 395 mas->rx_rem_bytes = xfer->len; 396 } 397 398 /* 399 * Lock around right before we start the transfer since our 400 * interrupt could come in at any time now. 401 */ 402 spin_lock_irq(&mas->lock); 403 geni_se_setup_m_cmd(se, m_cmd, FRAGMENTATION); 404 405 /* 406 * TX_WATERMARK_REG should be set after SPI configuration and 407 * setting up GENI SE engine, as driver starts data transfer 408 * for the watermark interrupt. 409 */ 410 if (m_cmd & SPI_TX_ONLY) 411 writel(mas->tx_wm, se->base + SE_GENI_TX_WATERMARK_REG); 412 spin_unlock_irq(&mas->lock); 413 } 414 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Hi Douglas, I love your patch! Yet something to improve: [auto build test ERROR on spi/for-next] [also build test ERROR on v5.9-rc4 next-20200911] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Douglas-Anderson/spi-spi-geni-qcom-Use-the-FIFO-even-more/20200913-050928 base: https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next config: sparc-allyesconfig (attached as .config) compiler: sparc64-linux-gcc (GCC) 9.3.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=sparc If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): drivers/spi/spi-geni-qcom.c: In function 'setup_fifo_xfer': >> drivers/spi/spi-geni-qcom.c:385:2: error: implicit declaration of function '__iowmb' [-Werror=implicit-function-declaration] 385 | __iowmb(); | ^~~~~~~ cc1: some warnings being treated as errors # https://github.com/0day-ci/linux/commit/4458adf4007926cfaaa505b54a83059c9ba813ad git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Douglas-Anderson/spi-spi-geni-qcom-Use-the-FIFO-even-more/20200913-050928 git checkout 4458adf4007926cfaaa505b54a83059c9ba813ad vim +/__iowmb +385 drivers/spi/spi-geni-qcom.c 334 335 static void setup_fifo_xfer(struct spi_transfer *xfer, 336 struct spi_geni_master *mas, 337 u16 mode, struct spi_master *spi) 338 { 339 u32 m_cmd = 0; 340 u32 len; 341 struct geni_se *se = &mas->se; 342 int ret; 343 344 /* 345 * Ensure that our interrupt handler isn't still running from some 346 * prior command before we start messing with the hardware behind 347 * its back. We don't need to _keep_ the lock here since we're only 348 * worried about racing with out interrupt handler. The SPI core 349 * already handles making sure that we're not trying to do two 350 * transfers at once or setting a chip select and doing a transfer 351 * concurrently. 352 * 353 * NOTE: we actually _can't_ hold the lock here because possibly we 354 * might call clk_set_rate() which needs to be able to sleep. 355 */ 356 spin_lock_irq(&mas->lock); 357 spin_unlock_irq(&mas->lock); 358 359 if (xfer->bits_per_word != mas->cur_bits_per_word) { 360 spi_setup_word_len(mas, mode, xfer->bits_per_word); 361 mas->cur_bits_per_word = xfer->bits_per_word; 362 } 363 364 /* Speed and bits per word can be overridden per transfer */ 365 ret = geni_spi_set_clock_and_bw(mas, xfer->speed_hz); 366 if (ret) 367 return; 368 369 mas->tx_rem_bytes = 0; 370 mas->rx_rem_bytes = 0; 371 372 if (!(mas->cur_bits_per_word % MIN_WORD_LEN)) 373 len = xfer->len * BITS_PER_BYTE / mas->cur_bits_per_word; 374 else 375 len = xfer->len / (mas->cur_bits_per_word / BITS_PER_BYTE + 1); 376 len &= TRANS_LEN_MSK; 377 378 mas->cur_xfer = xfer; 379 380 /* 381 * Factor out the __iowmb() so that we can use writel_relaxed() for 382 * both writes below and thus only incur the overhead once even if 383 * we execute both of them. 384 */ > 385 __iowmb(); 386 387 if (xfer->tx_buf) { 388 m_cmd |= SPI_TX_ONLY; 389 mas->tx_rem_bytes = xfer->len; 390 writel_relaxed(len, se->base + SE_SPI_TX_TRANS_LEN); 391 } 392 if (xfer->rx_buf) { 393 m_cmd |= SPI_RX_ONLY; 394 writel_relaxed(len, se->base + SE_SPI_RX_TRANS_LEN); 395 mas->rx_rem_bytes = xfer->len; 396 } 397 398 /* 399 * Lock around right before we start the transfer since our 400 * interrupt could come in at any time now. 401 */ 402 spin_lock_irq(&mas->lock); 403 geni_se_setup_m_cmd(se, m_cmd, FRAGMENTATION); 404 405 /* 406 * TX_WATERMARK_REG should be set after SPI configuration and 407 * setting up GENI SE engine, as driver starts data transfer 408 * for the watermark interrupt. 409 */ 410 if (m_cmd & SPI_TX_ONLY) 411 writel(mas->tx_wm, se->base + SE_GENI_TX_WATERMARK_REG); 412 spin_unlock_irq(&mas->lock); 413 } 414 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Hi, On Sat, Sep 12, 2020 at 6:09 PM Doug Anderson <dianders@chromium.org> wrote: > > Hi, > > On Sat, Sep 12, 2020 at 3:54 PM Bjorn Andersson > <bjorn.andersson@linaro.org> wrote: > > > > On Sat 12 Sep 16:08 CDT 2020, Douglas Anderson wrote: > > > > > When setting up a bidirectional transfer we need to program both the > > > TX and RX lengths. We don't need a memory barrier between those two > > > writes. Factor out the __iowmb() and use writel_relaxed(). This > > > saves a fraction of a microsecond of setup overhead on bidirectional > > > transfers. > > > > > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > > > --- > > > > > > drivers/spi/spi-geni-qcom.c | 13 ++++++++++--- > > > 1 file changed, 10 insertions(+), 3 deletions(-) > > > > > > diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c > > > index 92d88bf85a90..6c7e12b68bf0 100644 > > > --- a/drivers/spi/spi-geni-qcom.c > > > +++ b/drivers/spi/spi-geni-qcom.c > > > @@ -376,15 +376,22 @@ static void setup_fifo_xfer(struct spi_transfer *xfer, > > > len &= TRANS_LEN_MSK; > > > > > > mas->cur_xfer = xfer; > > > + > > > + /* > > > + * Factor out the __iowmb() so that we can use writel_relaxed() for > > > + * both writes below and thus only incur the overhead once even if > > > + * we execute both of them. > > > + */ > > > > How many passes through this function do we have to take before saving > > the amount of time it took me to read this comment? > > > > Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> > > Thanks for the review! Yeah, in Chrome OS we do a crazy amount of SPI > transfers since our EC and security chip are connected over SPI and we > seem to pile a whole lot of stuff into the EC. This means we keep > coming back to the SPI controller again and again when profiling > things. I'm hoping that we'll eventually be able to get DMA enabled > here, but until then at least it's nice to make the FIFO transfers > better... Ugh. Given the problem that the kernel test robot found, I'm gonna say just drop this patch but keep the others I sent. As per the CL description, it's a pretty minor optimization and even though we do a lot of SPI transfers it's probably more worth it to work towards DMA mode than to try to find a cleaner solution for this one. -Doug
diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c index 92d88bf85a90..6c7e12b68bf0 100644 --- a/drivers/spi/spi-geni-qcom.c +++ b/drivers/spi/spi-geni-qcom.c @@ -376,15 +376,22 @@ static void setup_fifo_xfer(struct spi_transfer *xfer, len &= TRANS_LEN_MSK; mas->cur_xfer = xfer; + + /* + * Factor out the __iowmb() so that we can use writel_relaxed() for + * both writes below and thus only incur the overhead once even if + * we execute both of them. + */ + __iowmb(); + if (xfer->tx_buf) { m_cmd |= SPI_TX_ONLY; mas->tx_rem_bytes = xfer->len; - writel(len, se->base + SE_SPI_TX_TRANS_LEN); + writel_relaxed(len, se->base + SE_SPI_TX_TRANS_LEN); } - if (xfer->rx_buf) { m_cmd |= SPI_RX_ONLY; - writel(len, se->base + SE_SPI_RX_TRANS_LEN); + writel_relaxed(len, se->base + SE_SPI_RX_TRANS_LEN); mas->rx_rem_bytes = xfer->len; }
When setting up a bidirectional transfer we need to program both the TX and RX lengths. We don't need a memory barrier between those two writes. Factor out the __iowmb() and use writel_relaxed(). This saves a fraction of a microsecond of setup overhead on bidirectional transfers. Signed-off-by: Douglas Anderson <dianders@chromium.org> --- drivers/spi/spi-geni-qcom.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)