Create an <earcon> tag for high-quality short-form assets (earcons) such as sonic logos, UX/UI sounds, mnemonics, and jingles. This tag should differ from other tags by supporting a higher sample rate and bit rate in exchange for a restricted length.
The <audio> tag supports clips of up to 240 seconds, whereas the average <earcon> is only 1-5 seconds long. At a sample rate of 24 kHz and a bit rate of 48 kbps, a full 240-second clip compresses to about 1.44 MB (48 kbps x 240 s = 11,520 kbit). A 5-second <earcon> of that same 1.44 MB could theoretically hold an uncompressed 48 kHz / 24-bit stereo file (48,000 samples/s x 3 bytes x 2 channels x 5 s = 1.44 MB). The difference in quality would be tremendous for exactly the same file size.
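The size comparison above can be checked with a few lines of arithmetic. This is only a rough sketch: the function names are made up for illustration, 1 MB is taken as 1,000,000 bytes, container overhead is ignored, and the uncompressed earcon is assumed to be stereo PCM (a mono file would be half the size).

```python
def compressed_size_bytes(bit_rate_kbps: int, seconds: int) -> int:
    """Size of a constant-bit-rate compressed clip, ignoring container overhead."""
    return bit_rate_kbps * 1000 * seconds // 8


def pcm_size_bytes(sample_rate_hz: int, bit_depth: int, channels: int, seconds: int) -> int:
    """Size of uncompressed PCM audio, ignoring any file header."""
    return sample_rate_hz * bit_depth * channels * seconds // 8


# Longest clip the <audio> tag allows: 240 s at 48 kbps.
audio_tag_max = compressed_size_bytes(48, 240)

# Hypothetical 5 s earcon at 48 kHz / 24-bit (stereo and mono assumed here).
earcon_stereo = pcm_size_bytes(48_000, 24, 2, 5)
earcon_mono = pcm_size_bytes(48_000, 24, 1, 5)

print(audio_tag_max / 1_000_000, "MB for 240 s of 48 kbps compressed audio")
print(earcon_stereo / 1_000_000, "MB for 5 s of 48 kHz / 24-bit stereo PCM")
print(earcon_mono / 1_000_000, "MB for 5 s of 48 kHz / 24-bit mono PCM")
```

Under these assumptions the longest permitted <audio> clip and a 5-second uncompressed stereo earcon come out to the same number of bytes, which is the point of the comparison.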
Some will recommend the AudioPlayer as a solution for streaming higher-quality music. While this may be true, the user loses control of the custom functions of the current skill and is restricted to a limited list of intents (e.g. pause, next), with no access to critical skill functions such as Help or any other form of navigation. Forcing a user to say "Next" or "Back" to navigate to an interactive portion of the skill is terrible UX, and it is even worse when they ask for help but receive the generic Alexa menu.
An earcon tag would be the best solution for many audio woes.