Stateful interaction model
Imagine a situation: I have a simple game that uses three states. One is the initial state, another is the in-game state, and the last is the end-game state.
I have a number of intents: launch, answer, pass question, player count, stop, yes, and no. I know these aren't all of the required intents, but I want to keep this example short.
Let's say the player launches the game. In this state I want the user to say how many players will be playing. But because the answer intent has a lot of answers in its slot, Alexa gets confused and starts matching that intent instead of the players intent, which has a numbers slot. So I have to handle all of the intents, even the ones that make no sense in the given context. This not only makes coding harder, because you have to cover every case, but it also hurts recognition, because Alexa has to decide what was said across all of the intents.
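A minimal sketch of the workaround this forces, assuming a session dict for state and with invented state, intent, and handler names: every handler must guard against being invoked in the wrong state.

```python
# All names below (states, handlers, slot keys) are illustrative,
# not from any real skill or the ASK SDK.
STATE_INITIAL, STATE_GAME, STATE_END = "INITIAL", "GAME", "END"

def handle_players_intent(session, slots):
    if session.get("state") != STATE_INITIAL:
        # PlayersIntent matched mid-game; all we can do is reprompt.
        return "I wasn't asking for a player count right now."
    session["players"] = int(slots["number"])
    session["state"] = STATE_GAME
    return "Okay, starting a game for {} players.".format(session["players"])

def handle_answer_intent(session, slots):
    if session.get("state") != STATE_GAME:
        # The answer slot swallowed the utterance during setup.
        return "We haven't started yet. How many players?"
    correct = slots["answer"] == session.get("expected_answer")
    return "Correct!" if correct else "Nope, try again."
```

Every handler carries a wrong-state branch like this, which is exactly the per-case bookkeeping the post is complaining about.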
What I would like is the ability to turn off certain intents at certain stages/states of the game. Let's say that on launch, Alexa would only choose between the stop and players intents for user input. In the game state, Alexa would choose between the answer, pass question, and stop intents. And in the end-game state, Alexa would choose from the stop, yes, and no intents.
I think this would help recognition, because Alexa would only have to choose from the intents that are actually used at that stage of the skill, instead of all of the available ones.
It would also make complex skills easier to code, because you wouldn't need to handle every intent at every stage of the game.
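The state-scoped matching being requested can at least be approximated on the skill side with a dispatch table: each state lists the intents it accepts, and anything else gets one generic reprompt. All state and intent names here are invented for the example.

```python
# Per-state allow-list of intents, mirroring the three states in the post.
ALLOWED_INTENTS = {
    "LAUNCH":   {"PlayersIntent", "StopIntent"},
    "IN_GAME":  {"AnswerIntent", "PassQuestionIntent", "StopIntent"},
    "GAME_END": {"YesIntent", "NoIntent", "StopIntent"},
}

def dispatch(state, intent_name, handlers):
    # Reject intents that make no sense in the current state, instead of
    # repeating a wrong-state guard inside every handler.
    if intent_name not in ALLOWED_INTENTS.get(state, set()):
        return "Sorry, that doesn't make sense right now."
    return handlers[intent_name]()
```

Note this only prunes after Alexa has already picked an intent; it tidies up the code, but it can't fix the recognition problem, which is the actual point of the request.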
Alex Kaye commented
+1 for this. I was just trying to build a trivia-style game feature, expecting I could have the user speak the answer after a question prompt. Looking at the Amazon-provided samples, you have to have an AnswerIntent with numeric slots and give the user multiple possible answers to choose from. To make it work as I'd like, I'd have had to do something crazy like creating an intent for each possible answer and checking attribute state to see whether it came at the expected time.
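A sketch of that per-answer workaround, with invented names throughout: one intent per possible answer, each handler checking a session attribute to see whether that answer was expected at this point.

```python
def make_answer_handler(answer):
    # Build one handler per candidate answer (hypothetical intent scheme).
    def handler(session):
        if session.get("expected_answer") != answer:
            # Right intent, wrong moment: reprompt.
            return "I wasn't expecting that answer yet."
        return "Yes, {} is right!".format(answer)
    return handler

# e.g. AnswerParisIntent, AnswerLondonIntent, ...
handlers = {"Answer{}Intent".format(a): make_answer_handler(a)
            for a in ("Paris", "London", "Rome")}
```

One intent per answer clearly doesn't scale, which is why it reads as "crazy" above.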
I wonder if the design team had the old UX debate about modal vs. modeless interfaces, and opted for the latter in skills, even though skills themselves are a mode. I would argue that given the utterance->intent mapping is fuzzy, modes are even more important for a voice interface than with e.g. keyboard shortcuts. At least keypress intents are unambiguous. With fuzzy voice matching, a global namespace is way too easy to pollute to the point of uselessness. We definitely need to be able to prune the match possibilities.