Our skill uses ListTemplate2 to display lists of customer-created images, with list sizes ranging from 3 to more than 20 items. Each list item has a description that we send as outputSpeech.
As the skill speaks each description (TextContent/PrimaryText), the outputSpeech soon begins describing slides that are not visible on screen. Our users have reported that this is confusing.
We tested a "paging" solution, where the skill showed only three images at a time (using next/previous intents), but that also received negative feedback. Users generally found it too confusing to keep track of what they had already seen, where they were in the list, and so on.
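For readers unfamiliar with the approach, paging of this kind can be sketched roughly as below. This is an illustration only, not our skill's actual code: `get_page`, `PAGE_SIZE`, and the `description` field are hypothetical names, and the next/previous intent handlers and the ListTemplate2 rendering are left out.

```python
from typing import Dict, List

PAGE_SIZE = 3  # assumption: three images per page, as in the test described above


def get_page(items: List[Dict[str, str]], page: int,
             page_size: int = PAGE_SIZE) -> Dict[str, object]:
    """Return the items for one page plus position info for the spoken prompt."""
    total_pages = max(1, -(-len(items) // page_size))  # ceiling division
    # Clamp the page index so repeated "next"/"previous" never runs off the list.
    page = min(max(page, 0), total_pages - 1)
    start = page * page_size
    visible = items[start:start + page_size]
    # Speak only the descriptions of the items currently shown.
    spoken = " ".join(item["description"] for item in visible)
    return {
        "page": page,
        "total_pages": total_pages,
        "visible": visible,
        "speech": f"Showing {start + 1} to {start + len(visible)} of {len(items)}. {spoken}",
        "has_next": page < total_pages - 1,
        "has_previous": page > 0,
    }
```

Even with a position cue such as "Showing 4 to 6 of 20" in the speech, the user still has to hold the overall list in their head, which may be part of why this approach tested poorly.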
In our view, the simple solution is to have Echo Spot/Show track the list item being spoken, and automatically scroll the list to stay in sync with spoken output.
Also, please add a feature request category for devices with visual displays (Show/Spot/Fire TV Cube).