The following describes how we applied the onboarding phase of the Natively Adaptive Interfaces (NAI) approach to a specific use case: making online videos accessible to blind and low-vision (B/LV) users.
Understand multimodal agentive interfaces.
- Researched how AI agents, such as Google's Gemini and Gemma, process voice and text commands and generate audio or text responses relevant to video content.
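To ground this, here is a minimal sketch of a multimodal query about a video, assuming the google-generativeai Python SDK; the model name, file handling, and prompt are illustrative rather than a recommended configuration.

```python
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder credential

# Upload the video so the model can reason over its visual and audio tracks.
video = genai.upload_file(path="tutorial.mp4")
while video.state.name == "PROCESSING":  # wait until the upload is processed
    time.sleep(5)
    video = genai.get_file(video.name)

model = genai.GenerativeModel("gemini-1.5-flash")

# A transcribed voice command (or typed chat message) from a B/LV user.
question = "Describe what is written on the whiteboard in this video."
response = model.generate_content([video, question])

# The text answer can be routed to a screen reader, a TTS engine, or a braille display.
print(response.text)
```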
Understand critical accessibility concepts.
- Studied ability-based design for non-visual and low-vision interaction, focusing on leveraging auditory perception, screen reader compatibility, and alternative app navigation methods.
- Grasped universal design principles and how they might apply when transforming the core video experience beyond the visual modality.
- Examined curb-cut opportunities beyond B/LV users. Users experiencing situational visual impairments may also benefit from features built for B/LV users. For example, a sighted person driving often cannot—and should not—view their mobile device, and would benefit from a way to interact with it that doesn't require the visual modality.
- Researched specific barriers B/LV users face with video content and video players. These included lack of audio descriptions, reliance on visual cues, inaccessible controls, and poor screen reader support.
Understand the concept of equivalent experiences as a design goal.
- Conceptualized how a B/LV user can achieve the same goals as a sighted user when consuming an online video, such as learning from a tutorial, following a news report, or enjoying a movie.
- Recognized that new forms of audio and agent interactions could be critical in delivering a solution.
- Mapped out essential inputs and outputs. Examples of inputs include voice commands and text chat via a screen reader; examples of outputs include detailed audio descriptions, spoken summaries, answers to questions about visual content, clear audio cues for player state, and text output compatible with screen readers and braille displays.
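One lightweight way to capture this mapping is as an explicit data structure the agent can reason over when deciding how to interact with a given user. The names below are illustrative assumptions, not part of the NAI approach itself.

```python
from enum import Enum, auto
from dataclasses import dataclass, field

class InputModality(Enum):
    VOICE_COMMAND = auto()      # spoken requests, e.g. "pause the video"
    TEXT_CHAT = auto()          # typed via keyboard or screen reader

class OutputModality(Enum):
    AUDIO_DESCRIPTION = auto()  # detailed descriptions of visual content
    SPOKEN_SUMMARY = auto()     # synthesized summaries of scenes or sections
    SPOKEN_ANSWER = auto()      # answers to questions about visual content
    AUDIO_CUE = auto()          # clear cues for player state (playing, buffering)
    SCREEN_READER_TEXT = auto() # text compatible with screen readers and braille displays

@dataclass
class InteractionProfile:
    """Which modalities a particular user prefers or requires."""
    inputs: set[InputModality] = field(default_factory=set)
    outputs: set[OutputModality] = field(default_factory=set)

# Example: a blind user relying on voice input and spoken/braille output.
profile = InteractionProfile(
    inputs={InputModality.VOICE_COMMAND},
    outputs={OutputModality.AUDIO_DESCRIPTION, OutputModality.SCREEN_READER_TEXT},
)
```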
Understand feature themes that are useful when building adaptable agents.
- Application control: essential for B/LV users. The main focus is identifying each user's preferred or required means of providing input to the agent and receiving confirmation.
- Information seeking for content understanding: the agent must comprehend the content to describe visual elements. For example, users might say: "Describe the scene"; "Who just entered the room?"; or "What's written on the whiteboard?" Therefore, access to detailed audio descriptions (AD) or real-time visual analysis will be key. Consider summaries of visual sequences.
- Information seeking for content discovery: how can the agent help B/LV users find relevant videos without relying on visual thumbnails or browsing? For example, users might ask: "Find videos about assistive technology with good audio description" or "Read the description of the top three results."
- Content transformation: because much of the information in a video is conveyed visually, transformations such as converting video into detailed audio descriptions of visual information are fundamental to the agent providing an equivalent output experience.
- Multimodal interaction: the agent will use full multimodal interaction to cover the needs of users across the range of visual abilities. A combined sketch of these feature themes appears below.
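To tie the themes together, the following skeleton sketches one possible shape for such an agent. The class, method names, and intent handling are assumptions for illustration, not a prescribed NAI API.

```python
from dataclasses import dataclass

@dataclass
class AgentReply:
    text: str            # screen-reader / braille-friendly text
    speak: bool = True   # whether to also voice the reply aloud

class VideoAccessibilityAgent:
    """Skeleton grouping the feature themes into one agent surface."""

    # Application control: map spoken or typed commands to player actions
    # and confirm the result in the user's preferred output modality.
    def control(self, command: str) -> AgentReply:
        if "pause" in command.lower():
            return AgentReply("Video paused.")
        return AgentReply(f"Sorry, I didn't understand: {command}")

    # Information seeking (content understanding): answer questions about
    # what is currently on screen, e.g. "Who just entered the room?"
    def describe(self, question: str, timestamp_s: float) -> AgentReply:
        raise NotImplementedError  # would call a multimodal model on this segment

    # Information seeking (content discovery): find videos without relying on
    # visual thumbnails, e.g. "assistive technology videos with good audio description".
    def discover(self, query: str) -> AgentReply:
        raise NotImplementedError  # would query a catalog and read back top results

    # Content transformation: turn visual information into detailed audio
    # descriptions so the non-visual output experience is equivalent.
    def transform(self, video_uri: str) -> AgentReply:
        raise NotImplementedError  # would generate an audio-description track
```

In this framing, multimodal interaction is the layer that decides which of these methods to invoke and which input and output modalities (from the mapping above) to use for each user.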