Don't require constant customer dialog interaction for long-form content
I would love to see some improvement to the interactions that are possible between an Alexa skill and customers. Currently, skills rely heavily on a back-and-forth dialog with customers, which isn't well-suited to long-form content (like articles, which I'll use as an example throughout). I'd like to be able to return a relatively small amount of an article at a time, and react to events emitted by the Alexa infrastructure that allow me to prepare the next chunk of the article and present it to the customer seamlessly, without any interruption.
Say we have a 10 page article that I want to present to the customer one page at a time. The desired interaction would be something like this:
* Customer: Alexa, read the next article.
* Echo: The next article is entitled "Article 1", starting now.
* Echo: [Reads the first page]
* (before the first page has finished being read, the skill is called again with an "almost finished" event, so the next page can be sent back to the device)
* Echo: [Reads the second page]
* (event emitted before second page is finished)
* Echo: [Reads third page]
* ...and so on until...
* Echo: [Reads the last page]
* Echo: Up next... "Article 2", starting now.
The idea is that articles can be read continuously without being interrupted by a question. At any time, the customer could take actions associated with the article (skip, star, delete, etc.). The emitted events are similar to audio playback events. They would allow the skill not only to prepare the next page, but also to save progress in the article for future interactions.
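To make the desired flow concrete, here is a minimal sketch of the state a skill would track if such an event existed. Everything in it is hypothetical: `ReadingSession`, `start`, and `on_almost_finished` are illustrative names, not part of the ASK, and the sketch assumes every article has at least one page.

```python
from dataclasses import dataclass


@dataclass
class ReadingSession:
    """Tracks which article and page the customer is on (hypothetical model)."""
    articles: list          # list of (title, [page_text, ...]) tuples
    article_index: int = 0  # saved progress: current article
    page_index: int = 0     # saved progress: current page

    def start(self):
        """Speech for the first chunk of the current article."""
        title, pages = self.articles[self.article_index]
        return f'The next article is entitled "{title}", starting now. {pages[0]}'

    def on_almost_finished(self):
        """Called when the (hypothetical) "almost finished" event arrives.

        Advances the saved progress and returns the next chunk to speak,
        or None when there is nothing left to read.
        """
        _, pages = self.articles[self.article_index]
        if self.page_index + 1 < len(pages):
            self.page_index += 1
            return pages[self.page_index]
        if self.article_index + 1 < len(self.articles):
            self.article_index += 1
            self.page_index = 0
            next_title, next_pages = self.articles[self.article_index]
            return f'Up next... "{next_title}", starting now. {next_pages[0]}'
        return None
```

Because progress advances on every event, persisting `article_index` and `page_index` after each call would be enough to resume in a later interaction.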
What I've tried.
Text to speech with standard dialogs
This scenario uses the standard dialogs that you can create with the ASK, but requires that the customer continually interact with the skill ("Do you want to continue?"), to keep the dialog going.
- Same, familiar Alexa voice that's been configured by the customer.
- Ability to use standard Alexa text-to-speech facilities instead of using Polly for long-form content.
- Dialog stays open the whole time, so any actions taken by the customer are applied to the active skill first ("Alexa, archive that article" instead of "Alexa, ask [skill] to archive article").
- Progress can be saved after each dialog turn.
- To save progress effectively and keep the dialog open (and because there's a limit to the amount of text that can be added to the dialog payload), each chunk of the article has to end with a prompt back to the customer, such as "Do you want to continue reading?". This is jarring, and the experience would be better without it, but I can't find any way to avoid it.
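For reference, one dialog turn in this approach can be sketched roughly as follows. The response layout (`outputSpeech`, `reprompt`, `shouldEndSession`, `sessionAttributes`) follows the standard ASK JSON response format; the helper names, the `pageIndex` attribute, and the `MAX_CHARS` budget are assumptions made for illustration, so check the current ASK limits before relying on a specific number.

```python
import textwrap

# Assumed per-response character budget, not an official figure.
MAX_CHARS = 6000
CONTINUE_PROMPT = "Do you want to continue?"


def split_into_pages(article_text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, breaking on whitespace."""
    return textwrap.wrap(article_text, width=max_chars, break_long_words=False)


def build_page_response(pages, page_index):
    """Build a response that reads one page and, unless it is the last
    page, asks the customer whether to continue."""
    is_last = page_index >= len(pages) - 1
    text = pages[page_index]
    if not is_last:
        text += " " + CONTINUE_PROMPT
    response = {
        "version": "1.0",
        # Progress is carried in the session so the next turn knows where to resume.
        "sessionAttributes": {"pageIndex": page_index},
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": is_last,
        },
    }
    if not is_last:
        # The reprompt is what keeps the session open while waiting for an answer.
        response["response"]["reprompt"] = {
            "outputSpeech": {"type": "PlainText", "text": CONTINUE_PROMPT}
        }
    return response
```

The prompt appended to every non-final page is exactly the interruption this request is trying to remove.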
Audio skill and Text to speech with Polly
This works pretty well and gets close to the "almost done" events, which are provided by the AudioPlayer API.
- Continuous stream of the long-form content.
- "Almost done" events allow skill to render next chunk of text and save progress.
- Can break content into small chunks that help facilitate saving progress.
- Can keep playing content without customer interaction.
- Need to use Polly for text to speech.
- Standard Alexa voice is not available via Polly
- When taking non-standard actions (those that are not provided by the AudioPlayer API), you need to precede the utterance with the skill name ("Alexa, ask [skill] to [some action]").
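The "almost done" handling in this approach can be sketched as below. `AudioPlayer.PlaybackNearlyFinished` and the `AudioPlayer.Play` directive with `playBehavior: "ENQUEUE"` are part of the AudioPlayer interface; the `article:page` token scheme and the `audio_url_for` callable (which would return the HTTPS URL of a Polly-rendered chunk, or `None` when the article is finished) are assumptions made for this example.

```python
def parse_token(token):
    """Split an assumed 'articleId:pageNumber' token into its parts."""
    article_id, page = token.rsplit(":", 1)
    return article_id, int(page)


def handle_nearly_finished(request, audio_url_for):
    """Respond to AudioPlayer.PlaybackNearlyFinished by enqueueing the next chunk.

    audio_url_for(article_id, page) is a hypothetical lookup supplied by the
    skill; it returns a stream URL or None when there are no more pages.
    """
    article_id, page = parse_token(request["token"])
    next_page = page + 1
    url = audio_url_for(article_id, next_page)
    if url is None:
        # Nothing left to queue; return a response with no directives.
        return {"version": "1.0", "response": {}}
    return {
        "version": "1.0",
        "response": {
            "directives": [
                {
                    "type": "AudioPlayer.Play",
                    "playBehavior": "ENQUEUE",
                    "audioItem": {
                        "stream": {
                            "url": url,
                            "token": f"{article_id}:{next_page}",
                            # Must match the token of the stream that is playing now.
                            "expectedPreviousToken": request["token"],
                            "offsetInMilliseconds": 0,
                        }
                    },
                }
            ]
        },
    }
```

Since the token in each event identifies the chunk currently playing, the same handler is a natural place to persist reading progress.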
Implement "almost finished" events in normal (non-audio) dialogs that allow continuous Alexa speech without the need for using text-to-speech services such as Polly, and allow more natural actions without having to use phrases like "Alexa, ask the skill to do [action]". This will provide easier, more natural long-form content interactions for customers, and will maintain the same configured Alexa voice that customers are used to, instead of the different Polly voices.
NOTE: I originally asked about options for this sort of interaction in this forum post: