[PATCHv2,0/2] N900 Modem Speech Support

Message ID 20150305113013.GA3867@amd (mailing list archive)
State New, archived

Commit Message

Pavel Machek March 5, 2015, 11:30 a.m. UTC
Hi!

> Userland access goes via /dev/cmt_speech. The API is implemented in
> libcmtspeechdata, which is used by ofono and the freesmartphone.org project.
> Apart from that the device is also used by the phone binaries distributed
> with Maemo. So while this is a new userland ABI for the mainline kernel it
> has been tested in the wild for some years.

I'm sorry, Dave. I can't let you do that.

Yes, the ABI has been "tested" for some years, but it is not documented,
and it is a very wrong ABI.

I'm not sure what userspace does with the "read()". I was assuming it is
meant for passing voice data, but it can return at most 4 bytes,
AFAICT.

We already have a perfectly good ABI for passing voice data around. It
is called "ALSA". libcmtspeechdata would then become unnecessary, and the
daemon routing voice data would be as simple as "read samples from
ALSA/modem, write samples to ALSA/rx-51_soundcard" plus "read samples
from ALSA/rx-51_soundcard, write samples to ALSA/modem".
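
Roughly like this on the daemon side (just a sketch using alsa-lib; the
modem card does not exist yet, so both device names are made up and error
handling is omitted):

#include <alsa/asoundlib.h>
#include <stdint.h>

/* Sketch only: copy 8 kHz mono samples from one PCM to another.
 * src_name/dst_name would be e.g. "hw:Modem" and "hw:RX51". */
static void route(const char *src_name, const char *dst_name)
{
	snd_pcm_t *src, *dst;
	int16_t buf[160];		/* 20 ms at 8 kHz */
	snd_pcm_sframes_t n;

	snd_pcm_open(&src, src_name, SND_PCM_STREAM_CAPTURE, 0);
	snd_pcm_open(&dst, dst_name, SND_PCM_STREAM_PLAYBACK, 0);
	snd_pcm_set_params(src, SND_PCM_FORMAT_S16_LE,
			   SND_PCM_ACCESS_RW_INTERLEAVED, 1, 8000, 1, 20000);
	snd_pcm_set_params(dst, SND_PCM_FORMAT_S16_LE,
			   SND_PCM_ACCESS_RW_INTERLEAVED, 1, 8000, 1, 20000);

	for (;;) {
		n = snd_pcm_readi(src, buf, 160);
		if (n > 0)
			snd_pcm_writei(dst, buf, n);
	}
}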

Should this driver be merged into drivers/staging while the interface is
being fixed? A big part of the driver would stay the same; the userspace
interface is only a small part of it...

Sorry about that,
									Pavel

Signed-off-by: Pavel Machek <pavel@ucw.cz>

Comments

Kai Vehmanen March 5, 2015, 5:32 p.m. UTC | #1
Hi,

On Thu, 5 Mar 2015, Pavel Machek wrote:

>> Userland access goes via /dev/cmt_speech. The API is implemented in
>> libcmtspeechdata, which is used by ofono and the freesmartphone.org project.
> Yes, the ABI has been "tested" for some years, but it is not documented,
> and it is a very wrong ABI.
>
> I'm not sure what they do with the "read()". I was assuming it is
> meant for passing voice data, but it can return at most 4 bytes,
> AFAICT.
>
> We already have a perfectly good ABI for passing voice data around. It
> is called "ALSA". libcmtspeechdata would then become unnecessary, and the
> daemon routing voice data would be as simple as "read samples from

I'm no longer involved with cmt_speech (neither with this driver nor with
modems in general), but let me clarify some bits about the design.

First, the team that designed the driver and the stack above it had a lot
of folks who also worked with ALSA (and the related ALSA drivers were merged
to mainline long ago), and we considered ALSA as the interface for this on
multiple occasions as well.

Our take was that ALSA is not the right interface for cmt_speech. The 
cmt_speech interface in the modem is _not_ a PCM interface as modelled by 
ALSA. Specifically:

- the interface is lossy in both directions
- data is sent in packets, not a stream of samples (could be other things
   than PCM samples), with timing and meta-data
- timing of uplink is of utmost importance

Some definite similarities:
  - the mmap interface to manage the PCM buffers (that is on purpose
    similar to that of ALSA)

The interface was designed so that the audio mixer (e.g. Pulseaudio) is 
run with a soft real-time SCHED_FIFO/RR user-space thread that has full 
control over _when_ voice _packets_ are sent, and can receive packets with 
meta-data (see libcmtspeechdata interface, cmtspeech.h), and can 
detect and handle gaps in the received packets.
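
(For readers not familiar with the setup: the real-time part is plain
POSIX, something along the lines of the snippet below, with an arbitrary
priority value. The actual packet I/O goes through the cmtspeech.h calls
and is not shown here.)

#include <pthread.h>
#include <sched.h>

/* Illustration only: put the voice-routing thread into soft real-time
 * scheduling.  The priority value here is arbitrary. */
static int make_rt(pthread_t thread)
{
	struct sched_param sp = { .sched_priority = 50 };

	return pthread_setschedparam(thread, SCHED_FIFO, &sp);
}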

This is very different from modems that offer an actual PCM voice link for 
example over I2S to the application processor (there are lots of these on 
the market). When you walk out of coverage during a call with these 
modems, you'll still get samples over I2S, but not so with cmt_speech, so 
ALSA is not the right interface.

Now, I'm not saying the interface is perfect, but just to give a bit of 
background, why a custom char-device interface was chosen.

PS Not saying it's enough for mainline inclusion, but libcmtspeechdata [1]
    was released and documented to enable the driver to be used by
    software other than the closed pulseaudio modules. You, Pavel, of course
    know this as you've been maintaining the library, but FYI for others.

[1] https://www.gitorious.org/libcmtspeechdata

Br, Kai
Pavel Machek March 6, 2015, 9:43 a.m. UTC | #2
Hi!

> >>Userland access goes via /dev/cmt_speech. The API is implemented in
> >>libcmtspeechdata, which is used by ofono and the freesmartphone.org project.
> >Yes, the ABI has been "tested" for some years, but it is not documented,
> >and it is a very wrong ABI.
> >
> >I'm not sure what they do with the "read()". I was assuming it is
> >meant for passing voice data, but it can return at most 4 bytes,
> >AFAICT.
> >
> >We already have a perfectly good ABI for passing voice data around. It
> >is called "ALSA". libcmtspeechdata would then become unnecessary, and the
> >daemon routing voice data would be as simple as "read samples from
> 
> I'm no longer involved with cmt_speech (with this driver nor modems in
> general), but let me clarify some bits about the design.

Thanks a lot for your insights; high-level design decisions are quite
hard to understand from the C code alone.

> First, the team that designed the driver and the stack above had a lot of
> folks working also with ALSA (and the ALSA drivers have been merged to
> mainline long ago) and we considered ALSA on multiple occasions as the
> interface for this as well.
> 
> Our take was that ALSA is not the right interface for cmt_speech. The
> cmt_speech interface in the modem is _not_ a PCM interface as modelled by
> ALSA. Specifically:
> 
> - the interface is lossy in both directions
> - data is sent in packets, not a stream of samples (could be other things
>   than PCM samples), with timing and meta-data
> - timing of uplink is of utmost importance

I see that you may not have data available in the "downlink" scenario,
but how is it lossy in the "uplink" scenario? The phone should always try
to fill the uplink, no? (Or do you detect silence and not transmit in
that case?) (Actually, I guess applications should be ready for the "data
not ready" case even on "normal" hardware, due to differing clocks.)

Packets vs. a stream of samples... does userland need to know about the
packets? Could we simply hide them from userland? As the userland daemon
is (supposed to be) realtime, do we really need an extra set of
timestamps? What other metadata is there?

Uplink timing... As the daemon is realtime, can't it just send the data
at the right time? Also, normally the uplink would be filled, no?

> Some definite similarities:
>  - the mmap interface to manage the PCM buffers (that is on purpose
>    similar to that of ALSA)
> 
> The interface was designed so that the audio mixer (e.g. Pulseaudio) is run
> with a soft real-time SCHED_FIFO/RR user-space thread that has full control
> over _when_ voice _packets_ are sent, and can receive packets with meta-data
> (see libcmtspeechdata interface, cmtspeech.h), and can detect and handle
> gaps in the received packets.

Well, packets are of a fixed size, right? So userland can simply supply
the right size in the common case. As for sending at the right time...
well, if the userspace is already real-time, that should be easy.

Now, there's a difference in the downlink. Maybe the ALSA people have an
idea of what to do in this case? Perhaps we can just provide artificial
"zero" data?

> This is very different from modems that offer an actual PCM voice link for
> example over I2S to the application processor (there are lots of these on
> the market). When you walk out of coverage during a call with these modems,
> you'll still get samples over I2S, but not so with cmt_speech, so ALSA is
> not the right interface.

Yes, understood.

> Now, I'm not saying the interface is perfect, but just to give a bit of
> background, why a custom char-device interface was chosen.

Thanks and best regards,
									Pavel
Kai Vehmanen March 6, 2015, 8:49 p.m. UTC | #3
Hi,

On Fri, 6 Mar 2015, Pavel Machek wrote:

>> Our take was that ALSA is not the right interface for cmt_speech. The
>> cmt_speech interface in the modem is _not_ a PCM interface as modelled by
>> ALSA. Specifically:
>>
>> - the interface is lossy in both directions
>> - data is sent in packets, not a stream of samples (could be other things
>>   than PCM samples), with timing and meta-data
>> - timing of uplink is of utmost importance
>
> I see that you may not have data available in "downlink" scenario, but
> how is it lossy in "uplink" scenario? Phone should always try to fill
> the uplink, no? (Or do you detect silence and not transmit in this

"Lossy" was perhaps not the best choice of words; "non-continuous" would
be a better description of the uplink case. To adjust timing, some samples
from the continuous, locally recorded PCM stream need to be skipped and/or
duplicated. This would normally be done between speech bursts to avoid
audible artifacts.
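
(Conceptually something like the function below, with made-up names,
applied at a speech-burst boundary; the buffer is assumed to have room
for the extra samples in the duplication case.)

#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Illustration only: shrink or grow a block of locally recorded samples
 * by |drift| samples so the uplink stays aligned with the modem timing.
 * Returns the new sample count. */
static size_t adjust_uplink_timing(int16_t *buf, size_t nsamples, int drift)
{
	if (drift < 0) {
		/* we are late: skip the first -drift samples */
		memmove(buf, buf - drift, (nsamples + drift) * sizeof(*buf));
		return nsamples + drift;
	}
	if (drift > 0) {
		/* we are early: repeat the last sample drift times */
		for (int i = 0; i < drift; i++)
			buf[nsamples + i] = buf[nsamples - 1];
		return nsamples + drift;
	}
	return nsamples;
}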

> Packets vs. stream of samples... does userland need to know about the
> packets? Could we simply hide it from the userland? As userland daemon
> is (supposed to be) realtime, do we really need extra set of
> timestamps? What other metadata are there?

Yes, we need flags that tell about the frame. Please see docs for 
'frame_flags' and 'spc_flags' in libcmtspeechdata cmtspeech.h:
https://www.gitorious.org/libcmtspeechdata/libcmtspeechdata/source/9206835ea3c96815840a80ccba9eaeb16ff7e294:cmtspeech.h

Kernel space does not have enough info to handle these flags, as the audio
mixer is not implemented in the kernel, so they have to be passed to/from
user-space.

And there is some further info in libcmtspeechdata/doc/:
https://www.gitorious.org/libcmtspeechdata/libcmtspeechdata/source/9206835ea3c96815840a80ccba9eaeb16ff7e294:doc/libcmtspeechdata_api_docs_main.txt

> Uplink timing... As the daemon is realtime, can it just send the data
> at the right time? Also normally uplink would be filled, no?

But how would you implement that via the ALSA API? With cmt_speech, a
speech packet is prepared in an mmap'ed buffer, flags are set to describe
the buffer, and at the correct time write() is called to trigger
transmission in HW (see cmtspeech_ul_buffer_release() in
libcmtspeechdata -- compare this to snd_pcm_mmap_commit() in ALSA). In
ALSA, the mmap commit and PCM write variants just add data to the
ring buffer and update the application pointer. Only the initial start
(and stop) of a stream has the "do something now" semantics in ALSA.
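
(In alsa-lib terms the mmap write path looks roughly like the sketch
below; note that the commit merely advances the application pointer,
nothing is pushed out "now". Mono 16-bit data is assumed to keep the
channel-area arithmetic simple.)

#include <alsa/asoundlib.h>
#include <stdint.h>
#include <string.h>

/* Sketch: queue one period through the ALSA mmap interface.
 * snd_pcm_mmap_commit() only moves the application pointer forward;
 * when the samples actually go out is dictated by the hardware pointer,
 * not by this call. */
static void queue_period(snd_pcm_t *pcm, const int16_t *samples,
			 snd_pcm_uframes_t nframes)
{
	const snd_pcm_channel_area_t *areas;
	snd_pcm_uframes_t offset, frames = nframes;

	snd_pcm_avail_update(pcm);		/* refresh the hw pointer */
	snd_pcm_mmap_begin(pcm, &areas, &offset, &frames);
	memcpy((int16_t *)areas[0].addr + offset, samples,
	       frames * sizeof(*samples));
	snd_pcm_mmap_commit(pcm, offset, frames); /* appl_ptr += frames */
}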

The ALSA compressed offload API did not exist back when we were working on
cmt_speech, but even that is not a good fit, although it does add some of
the relevant concepts (notably frames).

> Well, packets are of fixed size, right? So the userland can simply
> supply the right size in the common case. As for sending at the right
> time... well... if the userspace is already real-time, that should be 
> easy

See above; ALSA just doesn't work like that. There is no syscall for "send
these samples now" -- the model is different.

Br, Kai

Patch

diff --git a/drivers/hsi/clients/cmt_speech.c b/drivers/hsi/clients/cmt_speech.c
index 5dbbc67..06dc81c 100644
--- a/drivers/hsi/clients/cmt_speech.c
+++ b/drivers/hsi/clients/cmt_speech.c
@@ -21,6 +21,8 @@ 
  * 02110-1301 USA
  */
 
+/* Thanks to http://ben-collins.blogspot.cz/2010/05/writing-alsa-driver-basics.html */
+
 #include <linux/errno.h>
 #include <linux/module.h>
 #include <linux/types.h>
@@ -39,6 +41,11 @@ 
 #include <linux/hsi/ssi_protocol.h>
 #include <linux/hsi/cs-protocol.h>
 
+#include <sound/initval.h>
+#include <sound/core.h>
+#include <sound/memalloc.h>
+#include <sound/pcm.h>
+
 #define CS_MMAP_SIZE	PAGE_SIZE
 
 struct char_queue {
@@ -62,8 +69,12 @@  struct cs_char {
 	/* hsi channel ids */
 	int                     channel_id_cmd;
 	int                     channel_id_data;
+	/* alsa */
+	struct snd_card *card;
 };
 
+#define cs_char cs_char
+
 #define SSI_CHANNEL_STATE_READING	1
 #define SSI_CHANNEL_STATE_WRITING	(1 << 1)
 #define SSI_CHANNEL_STATE_POLL		(1 << 2)
@@ -168,6 +179,8 @@  static void cs_notify(u32 message, struct list_head *head)
 	wake_up_interruptible(&cs_char_data.wait);
 	kill_fasync(&cs_char_data.async_queue, SIGIO, POLL_IN);
 
+	/* snd_pcm_period_elapsed(my_dev->ss); ?? FIXME */
+
 out:
 	return;
 }
@@ -1134,10 +1147,8 @@  static unsigned int cs_char_poll(struct file *file, poll_table *wait)
 	return ret;
 }
 
-static ssize_t cs_char_read(struct file *file, char __user *buf, size_t count,
-								loff_t *unused)
+static ssize_t __cs_char_read(struct cs_char *csdata, char __user *buf, size_t count, int nonblock)
 {
-	struct cs_char *csdata = file->private_data;
 	u32 data;
 	ssize_t retval;
 
@@ -1161,7 +1172,7 @@  static ssize_t cs_char_read(struct file *file, char __user *buf, size_t count,
 
 		if (data)
 			break;
-		if (file->f_flags & O_NONBLOCK) {
+		if (nonblock) {
 			retval = -EAGAIN;
 			goto out;
 		} else if (signal_pending(current)) {
@@ -1182,10 +1193,17 @@  out:
 	return retval;
 }
 
-static ssize_t cs_char_write(struct file *file, const char __user *buf,
-						size_t count, loff_t *unused)
+
+static ssize_t cs_char_read(struct file *file, char __user *buf, size_t count,
+								loff_t *unused)
 {
 	struct cs_char *csdata = file->private_data;
+	return __cs_char_read(csdata, buf, count, file->f_flags & O_NONBLOCK);
+}
+
+static ssize_t __cs_char_write(struct cs_char *csdata, const char __user *buf,
+						size_t count)
+{
 	u32 data;
 	int err;
 	ssize_t	retval;
@@ -1205,6 +1223,13 @@  static ssize_t cs_char_write(struct file *file, const char __user *buf,
 	return retval;
 }
 
+static ssize_t cs_char_write(struct file *file, const char __user *buf,
+						size_t count, loff_t *unused)
+{
+	struct cs_char *csdata = file->private_data;
+	return __cs_char_write(csdata, buf, count);
+}
+
 static long cs_char_ioctl(struct file *file, unsigned int cmd,
 				unsigned long arg)
 {
@@ -1269,7 +1294,7 @@  static int cs_char_mmap(struct file *file, struct vm_area_struct *vma)
 	return 0;
 }
 
-static int cs_char_open(struct inode *unused, struct file *file)
+static int __cs_char_open(void)
 {
 	int ret = 0;
 	unsigned long p;
@@ -1300,8 +1325,6 @@  static int cs_char_open(struct inode *unused, struct file *file)
 	cs_char_data.mmap_base = p;
 	cs_char_data.mmap_size = CS_MMAP_SIZE;
 
-	file->private_data = &cs_char_data;
-
 	return 0;
 
 out3:
@@ -1314,6 +1337,13 @@  out1:
 	return ret;
 }
 
+static int cs_char_open(struct inode *unused, struct file *file)
+{
+	file->private_data = &cs_char_data;
+
+	return __cs_char_open();
+}
+
 static void cs_free_char_queue(struct list_head *head)
 {
 	struct char_queue *entry;
@@ -1329,10 +1359,8 @@  static void cs_free_char_queue(struct list_head *head)
 
 }
 
-static int cs_char_release(struct inode *unused, struct file *file)
+static int __cs_char_release(struct cs_char *csdata)
 {
-	struct cs_char *csdata = file->private_data;
-
 	cs_hsi_stop(csdata->hi);
 	spin_lock_bh(&csdata->lock);
 	csdata->hi = NULL;
@@ -1345,6 +1373,13 @@  static int cs_char_release(struct inode *unused, struct file *file)
 	return 0;
 }
 
+static int cs_char_release(struct inode *unused, struct file *file)
+{
+	struct cs_char *csdata = file->private_data;
+
+	return __cs_char_release(csdata);
+}
+
 static const struct file_operations cs_char_fops = {
 	.owner		= THIS_MODULE,
 	.read		= cs_char_read,
@@ -1363,6 +1398,112 @@  static struct miscdevice cs_char_miscdev = {
 	.fops	= &cs_char_fops
 };
 
+static struct snd_pcm_hardware my_pcm_hw = {
+	.info = (SNDRV_PCM_INFO_MMAP |
+		 SNDRV_PCM_INFO_INTERLEAVED |
+		 SNDRV_PCM_INFO_BLOCK_TRANSFER |
+		 SNDRV_PCM_INFO_MMAP_VALID),
+	.formats          = SNDRV_PCM_FMTBIT_U8,
+	.rates            = SNDRV_PCM_RATE_8000,
+	.rate_min         = 8000,
+	.rate_max         = 8000,
+	.channels_min     = 1,
+	.channels_max     = 1,
+	.buffer_bytes_max = (32 * 48),
+	.period_bytes_min = 48,
+	.period_bytes_max = 48,
+	.periods_min      = 1,
+	.periods_max      = 32,
+};
+#define MAX_BUFFER 1024
+
+static int my_pcm_open(struct snd_pcm_substream *ss)
+{
+	ss->runtime->hw = my_pcm_hw;
+	ss->private_data = &cs_char_data;
+
+	printk("my_pcm_open\n");
+
+	return 0;
+}
+
+static int my_pcm_close(struct snd_pcm_substream *ss)
+{
+	ss->private_data = NULL;
+
+	printk("my_pcm_close\n");	
+
+	return 0;
+}
+
+static int my_hw_params(struct snd_pcm_substream *ss,
+			struct snd_pcm_hw_params *hw_params)
+{
+	return snd_pcm_lib_malloc_pages(ss,
+					params_buffer_bytes(hw_params));
+}
+
+static int my_hw_free(struct snd_pcm_substream *ss)
+{
+	return snd_pcm_lib_free_pages(ss);
+}
+
+static int my_pcm_prepare(struct snd_pcm_substream *ss)
+{
+	return 0;
+}
+
+static int my_pcm_trigger(struct snd_pcm_substream *ss,
+			  int cmd)
+{
+	struct cs_char *my_dev = snd_pcm_substream_chip(ss);
+	int ret = 0;
+
+	switch (cmd) {
+	case SNDRV_PCM_TRIGGER_START:
+		// Start the hardware capture
+		break;
+	case SNDRV_PCM_TRIGGER_STOP:
+		// Stop the hardware capture
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+static snd_pcm_uframes_t my_pcm_pointer(struct snd_pcm_substream *ss)
+{
+	struct cs_char *my_dev = snd_pcm_substream_chip(ss);
+
+//	return my_dev->hw_idx;
+	return 0;
+}
+
+static int my_pcm_copy(struct snd_pcm_substream *ss,
+		       int channel, snd_pcm_uframes_t pos,
+		       void __user *dst,
+		       snd_pcm_uframes_t count)
+{
+	struct cs_char *my_dev = snd_pcm_substream_chip(ss);
+
+	//return copy_to_user(dst, my_dev->buffer + pos, count);
+	return -EFAULT;
+}
+
+static struct snd_pcm_ops my_pcm_ops = {
+	.open      = my_pcm_open,
+	.close     = my_pcm_close,
+	.ioctl     = snd_pcm_lib_ioctl,
+	.hw_params = my_hw_params,
+	.hw_free   = my_hw_free,
+	.prepare   = my_pcm_prepare,
+	.trigger   = my_pcm_trigger,
+	.pointer   = my_pcm_pointer,
+	.copy      = my_pcm_copy,
+};
+
 static int cs_hsi_client_probe(struct device *dev)
 {
 	int err = 0;
@@ -1398,14 +1539,48 @@  static int cs_hsi_client_probe(struct device *dev)
 		dev_err(dev, "Failed to register: %d\n", err);
 
 	printk("Registering sound card\n");
-#if 0
+#if 1
 	{
 	struct snd_card *card;
 	int ret;
-	ret = snd_card_create(SNDRV_DEFAULT_IDX1, "Nokia HSI modem",
+	ret = snd_card_new(dev, SNDRV_DEFAULT_IDX1, "Nokia HSI modem",
 			      THIS_MODULE, 0, &card);
 	if (ret < 0)
 		return ret;
+
+	strcpy(card->driver, "cmt_speech");
+	strcpy(card->shortname, "HSI modem");
+	sprintf(card->longname, "Nokia HSI modem");
+	snd_card_set_dev(card, dev);
+
+	static struct snd_device_ops ops = { NULL };
+	ret = snd_device_new(card, SNDRV_DEV_LOWLEVEL, dev, &ops);
+	if (ret < 0)
+		return ret;
+
+	struct snd_pcm *pcm;
+	ret = snd_pcm_new(card, card->driver, 0, 0, 1,
+			  &pcm);
+	if (ret < 0)
+		return ret;
+
+	snd_pcm_set_ops(pcm, SNDRV_PCM_STREAM_CAPTURE,
+			&my_pcm_ops);
+	pcm->private_data = dev;
+	pcm->info_flags = 0;
+	strcpy(pcm->name, card->shortname);
+
+	ret = snd_pcm_lib_preallocate_pages_for_all(pcm,
+						    SNDRV_DMA_TYPE_CONTINUOUS,
+						    snd_dma_continuous_data(GFP_KERNEL),
+						    MAX_BUFFER, MAX_BUFFER);
+	if (ret < 0)
+		return ret;
+	
+	if ((ret = snd_card_register(card)) < 0)
+		return ret;
+
+	cs_char_data.card = card;
 	}
 #endif
 
@@ -1417,6 +1592,8 @@  static int cs_hsi_client_remove(struct device *dev)
 	struct cs_hsi_iface *hi;
 
 	dev_dbg(dev, "hsi_client_remove\n");
+
+	snd_card_free(cs_char_data.card);
 	misc_deregister(&cs_char_miscdev);
 	spin_lock_bh(&cs_char_data.lock);
 	hi = cs_char_data.hi;