A MediaSession collects all audio-producing objects in one tab. It is usually unpleasant when multiple MediaSessions play sound at the same time. Audio focus handling manages the MediaSessions and mixes them in proper ways. This is part of the default media session on desktop project.
There are “persistent” and “transient” audio focus types.
MediaSessionAudio-producing objects should join MediaSession when they want to produce sound. MediaSession has the following states:
MediaSession has audio focus and its audio-producing objects can play.MediaSession.Besides, MediaSession has a DUCKING flag, which means its managed audio-producing objects has lowered volume. The flag is orthogonal with MediaSession state.
AudioFocusManagerAudioFocusManager is a global instance which manages the state of MediaSessions. It is used for platforms (e.g. Android) that do not have a system audio focus.
When an audio-producing object wants to play audio, it should join MediaSession and tell which kind of audio focus type it requires. MediaSession will then request audio focus from AudioFocusManager, and will allow the object to play sound if successful. AudioFocusManager will notify other MediaSessions if their states are changed.
When an audio-producing object stops playing audio, it should be removed from its MediaSession, and MediaSession should abandon its audio focus if its audio-producing objects is empty. AudioFocusManager will notify other MediaSessions of state change if necessary.
AudioFocusManager uses a stack implementation. It keeps track of all ACTIVE/SUSPENDED MediaSessions. When a MediaSession requests audio focus, it will be put at the top of the stack, and will be removed from the stack when it abandons audio focus.
The algorithm is as follows:
When a MediaSession requests audio focus:
Remove it from the audio focus stack if it's already there, and place it at the top of audio focus stack, grant focus to the session and let it play.
If the session is persistent, suspend all the other sessions on the stack.
If the session is transient, we should duck any active persistent audio focus entry if present:
When a MediaSession abandons audio focus:
If the session is not on the top, just remove it from the stack.
If the session is on the top, remove it from the stack.
Pepper is different from media elements since it has a different model. Pepper cannot be paused, but its volume can be changed. When considering Pepper, the above algorithm must be modified.
When Pepper joins MediaSession, it should request persistent focus type. When AudioFocusManager wants to suspend a MediaSession, it must check whether the session has Pepper instance, and if yes, it should duck the session instead.
Also, whenever a session abandons focus, and the next top session is INACTIVE, AudioFocusManager should find the next session having Pepper and unduck it.