Humane video conferencing

I set the UX vision and strategy for humane, intelligent video conferencing features across a team of 60+ people:

  • We focused on the human aspects of telepresence, creating a product development environment that emphasized empathy and collaboration.
  • We started with science and made honest and difficult choices based on user data. These choices gave us confidence we were building the right thing.
  • We built to learn. Offline rooms, dedicated lab networks, DSLRs strapped to walls… We built our own video conferencing (VC) system to deeply understand the medium in which we work.

The humans in the room

We identified the “bowling alley effect” as a major pain point in meetings: a blocker to how people see others and are seen by them. This fundamentally affects communication and meeting outcomes.

Detecting a distant participant's face is the first challenge in framing everyone fairly

From this, we developed the concept of natural scale, in which participants are shown life-size when they speak, resulting in improved communication metrics. This led us to implement computer vision that controls the meeting room camera seamlessly and frames every participant fairly.

Face position drives how the camera crops the room, framing each participant at natural scale
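
To make the idea concrete, here is a minimal sketch of how a detected face box might be turned into a camera crop at natural scale. It is an illustration only, not the production implementation: the function name `natural_scale_crop`, the `Box` type, the assumed real-world face height of 0.24 m, and the display-height parameter are all hypothetical. The face box is assumed to come from some upstream detector.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x: float  # left edge, in camera pixels
    y: float  # top edge, in camera pixels
    w: float  # width, in camera pixels
    h: float  # height, in camera pixels


# Assumption for illustration: a typical face is about 0.24 m tall.
ASSUMED_FACE_HEIGHT_M = 0.24


def natural_scale_crop(face: Box, frame_w: int, frame_h: int,
                       display_h_m: float, aspect: float = 16 / 9) -> Box:
    """Return a crop of the camera frame in which `face` appears roughly
    life-size on a display that is `display_h_m` metres tall."""
    # Share of the display height a life-size face should occupy.
    face_fraction = ASSUMED_FACE_HEIGHT_M / display_h_m

    # Choose the crop height so that face.h / crop_h equals that share.
    crop_h = face.h / face_fraction
    crop_w = crop_h * aspect

    # The crop cannot exceed the frame; shrink it while keeping the aspect.
    scale = min(1.0, frame_w / crop_w, frame_h / crop_h)
    crop_w *= scale
    crop_h *= scale

    # Centre the crop on the face, then clamp it inside the frame.
    cx = face.x + face.w / 2
    cy = face.y + face.h / 2
    x = min(max(cx - crop_w / 2, 0.0), frame_w - crop_w)
    y = min(max(cy - crop_h / 2, 0.0), frame_h - crop_h)
    return Box(x, y, crop_w, crop_h)


# A small, distant face in a 4K frame, shown on a display 0.6 m tall.
crop = natural_scale_crop(Box(3000, 900, 50, 60), 3840, 2160, display_h_m=0.6)
```

The farther a participant sits from the camera, the smaller their face box and the tighter the resulting crop, which is what counters the distance penalty. A real system would also need to smooth the crop over time and handle several faces at once; both are left out here.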

89% of participants reported satisfaction with seeing others at natural scale, compared to 74% in the default configuration.

A template for innovation

Our work led us to pioneer other advanced features – many of which have been launched to Google Workspace users.

Animation of a camera automatically panning and zooming to keep meeting participants centered as they move.
Automatic camera framing
Before-and-after of a participant's face under poor lighting, then re-lit by machine learning to appear evenly illuminated.
Machine-generated portrait lighting
A meeting room with a laptop on the table showing the Companion mode UI alongside the room's main video feed.
Companion mode lets meeting room participants follow along on their personal devices
A dark video frame brightened in real time by computational photography, revealing a clear view of the participant.
Low-light capture enhancement