YouTube recently announced a new analytics tool for 360-degree and virtual reality content creators: heatmaps that illustrate where viewers are actually looking. The new tool allows creators to see exactly what parts of their video are holding a viewer’s attention, and for how long.
YouTube has also released some enlightening early statistics on how – and this is important – viewers currently engage with immersive content.
“Surprisingly” (says YouTube), viewers spend 75% of their time focused on the front 90 degrees of an immersive video. Understandably, this figure has a lot of people questioning the point of VR content if the audience is only willing to engage with a quarter of it.
It’s an easy argument to make, but perhaps what these numbers are really saying is that VR content creators need to learn new ways to grab viewers’ attention in a 360° world?
Ever since moving pictures became a form of entertainment, camera angles have guided our eyes and told us where to look. For over a century, that’s what audiences have come to expect.
Virtual reality reminds us very much of the 2D world of film and television, but it’s an entirely different medium with its own set of rules that are still being written. Nothing is set in stone.
And camera angles? Well, those are up to the viewer to choose.
Content creators in the virtual reality space have the difficult task of catching the attention of an audience with over 100 years of collective experience looking straight ahead.
Does this make virtual reality a fad? A gimmick? No, of course not. It simply means that VR can’t rely on the same tools that have been used for film and television to engage an audience in a fully immersive format.
That’s a lot of unlearning to do for content creators, and a lot of new learning to do as the format develops. It’s an exciting new frontier.
Back to YouTube’s statistics: the most popular VR videos had the audience looking behind them almost 20% of the time. YouTube suggests that markers and animation will help draw attention to other parts of the surrounding space. In our day-to-day lives our attention is constantly guided by signs, so it’s a helpful suggestion. But think about this: what’s the one sure thing that will make you stop whatever you’re looking at and focus your attention elsewhere?
Sound…
We are programmed to react to sound. In a split second we must figure out where that sound is coming from and what it means. It is as true in the virtual world as it is in the real world, which is why 1.618 Digital is passionate about high-quality spatialised sound.
Spatial audio can be an effective tool to lead or surprise your audience. Because viewers are in the habit of looking in one direction at any given time, they can easily miss what is happening behind or beside them. Through the creative use of sonic cues within an immersive environment, content creators can control or suggest a narrative. Ultimately, this encourages the audience to engage with specific elements – or viewing angles – within the experience.
Virtual reality is an effective form of visual storytelling. What YouTube’s early heatmap data points to isn’t VR’s failure to engage its viewers. It’s the bigger picture of where audience attention currently is, and the gaps content creators need to fill to direct it elsewhere.
1.618 Digital Team
Importance of Spatial Audio in VR Content
Hearing is fundamental to our perception of the surrounding world. Achieving this effect in virtual reality requires audio that sounds real and authentic. Implementing spatial audio to create full immersion in 360° video or interactive VR requires either capturing audio on location or acoustically modelling the space where the scene takes place. An appropriate soundscape can provide the quickest path to immersion for just about any VR experience; even with the visual element removed, sound alone lets us perceive the surrounding world well enough to give us a sense of space, time and presence. In contrast, silent experiences, or those with incongruent sound, break the sense of presence and immersion, immediately removing the suspension of disbelief and substantially degrading the overall experience.
Spatial sound recording or let’s do it in post?
Conventional industry formats such as mono (single channel) and stereo (two channels) remain a basic requirement, but they are limiting and no longer sufficient to offer full immersion in 360° videos or interactive VR experiences. Spatial audio, which uses a higher number of channels, is the only way to create true three-dimensional sound, whether it is captured on location or built through sound design and mixing in post-production. Depending on the nature of the project, both methods are important. Designing the full sonic experience in VR often requires spatial sound recording on set along with sound design and spatialisation of individual elements such as atmosphere, dialogue, foley, sound effects and music in post-production.
Ambisonic format is the most effective method to capture location sound
There are a number of ways to capture location sound. However, the most effective method is to record in first-order ambisonic format, which uses four channels to capture sound from all directions, alongside discrete sound sources such as dialogue or any other diegetic sounds that are part of the scene. These discrete sources can then be positioned within the 3D soundfield using specialist spatialisation software in an audio editing application or a game engine. This approach gives VR audio content makers adequate spatial resolution for positioning sound components across four, 16 or more virtual or physical channels.
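To make the "four, 16 or more channels" figure and the idea of placing a discrete source in the soundfield more concrete, here is a minimal sketch of first-order ambisonic encoding. It assumes the AmbiX convention (ACN channel order W, Y, Z, X with SN3D normalisation), which is one common choice rather than necessarily the one used in any particular production; the function names are hypothetical and this is not a description of 1.618 Digital's tooling.

```python
import math


def ambisonic_channel_count(order: int) -> int:
    """Full-sphere ambisonics of order N uses (N + 1) ** 2 channels."""
    return (order + 1) ** 2


def encode_foa(sample: float, azimuth_deg: float, elevation_deg: float) -> tuple:
    """Encode one mono sample into first-order ambisonics.

    Assumed convention (hypothetical for this sketch): AmbiX, i.e. ACN channel
    order (W, Y, Z, X) with SN3D normalisation. Azimuth is measured
    anticlockwise from straight ahead (90 = hard left); elevation is measured
    upwards from the horizontal plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                  # omnidirectional component
    y = sample * math.sin(az) * math.cos(el)    # left/right figure-of-eight
    z = sample * math.sin(el)                   # up/down figure-of-eight
    x = sample * math.cos(az) * math.cos(el)    # front/back figure-of-eight
    return (w, y, z, x)


for order in (1, 3):
    # order 1 needs 4 channels, order 3 needs 16
    print(order, ambisonic_channel_count(order))

# A dialogue sample placed 90 degrees to the listener's left, at ear height.
print(encode_foa(1.0, 90.0, 0.0))
```

In practice each sample of a dialogue or foley track would be encoded this way (or with a higher-order equivalent) and summed with the ambisonic location recording before binaural decoding.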
Ambisonic sound offers a number of significant benefits that play a crucial role in making experiences as realistic as possible.
-Firstly, because sound is captured from all directions, the user can move their head and body while wearing a head-mounted display and, with the help of a head-tracking system, perceive their own changing position within the space relative to the surrounding environment.
-Secondly, a greater channel count allows more accurate positioning of individual elements within the 3D space. This avoids everything appearing to come from the same general direction, which is common when listening to music but lacks realism when it comes to creating a metaverse or offering your audience an authentic 360° video experience.
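The head-tracking benefit in the first point comes from the fact that an ambisonic soundfield can be rotated cheaply before it is rendered to headphones. The sketch below shows a yaw-only counter-rotation of a first-order frame, assuming the AmbiX (W, Y, Z, X) layout and an anticlockwise-positive yaw reading from the headset; the function name is hypothetical, and a real renderer would also handle pitch and roll and then decode the rotated field binaurally.

```python
import math


def rotate_foa_yaw(w: float, y: float, z: float, x: float, head_yaw_deg: float) -> tuple:
    """Counter-rotate a first-order AmbiX frame (W, Y, Z, X) for a head turn.

    head_yaw_deg is the listener's yaw from a head tracker, anticlockwise
    positive (turning left). A source at world azimuth a ends up at a - yaw
    relative to the head, so it stays fixed in the virtual room.
    """
    psi = math.radians(head_yaw_deg)
    x_rot = x * math.cos(psi) + y * math.sin(psi)
    y_rot = -x * math.sin(psi) + y * math.cos(psi)
    # W (omnidirectional) and Z (vertical) are unchanged by a rotation
    # about the vertical axis.
    return (w, y_rot, z, x_rot)


# A source straight ahead; the listener turns 90 degrees to the left,
# so the source should now sit hard right (Y close to -1, X close to 0).
print(rotate_foa_yaw(1.0, 0.0, 0.0, 1.0, 90.0))
```

Rotating the soundfield rather than re-positioning every source keeps the cost constant no matter how many elements were mixed into the scene.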
Why is this essential?
These considerations are essential because of the head-related transfer function (HRTF), which describes how each of our ears receives sound arriving from a given position in space. Together, the HRTFs of both ears give rise to binaural perception, allowing us to identify the direction and distance of sound sources by continuously comparing the intensity and the arrival time of sound at each ear. We re-create this psychoacoustic process in post-production to achieve low-latency, real-time binaural rendering via a close approximation of the listener’s personal HRTF. Taking human physiology into account is essential if the audience is to be fully immersed and enjoy the experience, be it a story, a game or cognitive therapy.
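One of the cues mentioned above, the time difference between sound arriving at the two ears, can be estimated with a classic textbook approximation, Woodworth's spherical-head formula ITD = (r / c)(θ + sin θ). The sketch below is only that simplified model, with an assumed average head radius and speed of sound; it is not a measured or personal HRTF and not the rendering method described in this post, which also relies on level and spectral cues.

```python
import math

SPEED_OF_SOUND_M_S = 343.0   # assumed: dry air at about 20 degrees C
HEAD_RADIUS_M = 0.0875       # assumed: common average-head value


def woodworth_itd(azimuth_deg: float,
                  head_radius_m: float = HEAD_RADIUS_M,
                  speed_of_sound: float = SPEED_OF_SOUND_M_S) -> float:
    """Approximate interaural time difference (seconds) for a distant source.

    Woodworth's spherical-head model: ITD = (r / c) * (theta + sin(theta)),
    where theta is the lateral angle from straight ahead (0..90 degrees).
    Returns the magnitude only; which ear leads depends on the side.
    """
    az = azimuth_deg % 360.0
    lateral = az if az <= 180.0 else 360.0 - az   # fold to 0..180 degrees
    if lateral > 90.0:
        # Rear sources mirror onto the front: timing alone cannot tell them
        # apart, which is one reason spectral HRTF cues matter.
        lateral = 180.0 - lateral
    theta = math.radians(lateral)
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))


for az in (0, 30, 60, 90):
    print(f"{az:3d} deg -> {woodworth_itd(az) * 1e6:6.1f} microseconds")
```

With these assumed values the difference peaks at roughly two thirds of a millisecond for a source directly to one side, which shows how fine the timing is that a binaural renderer has to reproduce.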
The use of audio in marketing campaigns to guide your audience in 360° content
Unlike 2D content, where a viewer can see the entire field of view in one direction, the 360° environment presents challenges as well as opportunities for creative content makers when it comes to constructing the narrative. Spatial audio can be an effective tool to lead or surprise your audience. Because the viewer looks in only one direction at any given time, they can easily miss what is behind or beside them; by implementing sonic cues within the space, we can control or suggest a narrative. By helping our audience navigate their point of view, we can ultimately guide them towards a specific element within the experience and encourage them to engage with it.
What is more important?
When fully integrated, visual and sonic perception work in harmony, enabling us to see, hear, feel and appreciate the beauty and richness of our world. Virtual reality has already proved its effectiveness in video storytelling, gaming, educational training, social interaction and medical applications. Making any of these experiences successful requires a coherent approach to sound and visual content, so that each is as effective for its purpose as possible: more immersive, more authentic and, as a result, more engaging, more memorable, more empathetic, more fun and ultimately good enough that people want to come back and experience it again and again.
1.618 Team