The bulk of your Action development involves designing and building your conversation. Take three things into account when building conversations:
Surface capabilities describe the surface that the user is experiencing your Action on. Surfaces can have audio support, screen support, or both. Actions on Google sends the capabilities of the surface with every request to your fulfillment, so you can use this information to deliver the right UI.
Responses define what users can say to your Action and the corresponding response you return to them. Depending on the surface capabilities of the user's device, dialogs can be spoken aloud (TTS and SSML are supported) or represented visually (chat bubbles, cards, lists, carousels, and chips displayed on screen).
Helpers let you ask the Assistant to fulfill intents for you to carry out common functionality.
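As a rough illustration of reading surface capabilities, the sketch below collects capability names from an incoming request body. The request shape (a `surface` object holding a `capabilities` list of `name` entries) and the two capability identifiers are assumptions made for this example, and `surface_capabilities` is a hypothetical helper, not part of any client library. Check the webhook format you actually receive before relying on it.

```python
# Capability identifiers assumed for this sketch.
SCREEN_OUTPUT = "actions.capability.SCREEN_OUTPUT"
AUDIO_OUTPUT = "actions.capability.AUDIO_OUTPUT"


def surface_capabilities(request: dict) -> set:
    """Return the set of capability names found in a request body.

    Assumes the body looks like:
    {"surface": {"capabilities": [{"name": "..."}, ...]}}
    """
    capabilities = request.get("surface", {}).get("capabilities", [])
    return {cap["name"] for cap in capabilities if "name" in cap}


# Hypothetical request bodies: a voice-activated speaker and a phone.
speaker_request = {"surface": {"capabilities": [{"name": AUDIO_OUTPUT}]}}
phone_request = {
    "surface": {
        "capabilities": [{"name": AUDIO_OUTPUT}, {"name": SCREEN_OUTPUT}]
    }
}

speaker_caps = surface_capabilities(speaker_request)
phone_caps = surface_capabilities(phone_request)
```

Returning a set keeps later checks simple: `SCREEN_OUTPUT in phone_caps` reads naturally, and a request without a `surface` field yields an empty set instead of raising.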
How to build Assistant conversations
Your Actions can appear on a variety of surfaces, such as phones that support both audio and display experiences, or voice-activated speakers that support audio-only experiences. To create the best experience for all your users, start by designing the voice-only experience. This is the easiest way to create a natural-sounding conversation, which is still required for visual experiences. See our design principles on how to start writing great dialogs.
From there, supplement your conversation with components for the screen, such as basic cards and lists. Actions on Google also lets you supply different TTS audio and display responses when needed.
You might want to do this when you need to show less on a display than when it's spoken aloud, but want to retain a similar experience on all devices.
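One way to picture separate spoken and displayed text is a response object that carries both. The sketch below assumes a `simpleResponse` payload with `textToSpeech` and `displayText` fields; the helper `simple_response` is hypothetical, and the exact field names should be confirmed against the response format your fulfillment uses.

```python
def simple_response(speech: str, display_text: str = "") -> dict:
    """Build a response with spoken text and optional shorter display text.

    Field names are assumed for illustration. When display_text is empty,
    only the spoken text is included.
    """
    response = {"textToSpeech": speech}
    if display_text:
        response["displayText"] = display_text
    return {"simpleResponse": response}


# Spoken version is fuller; the on-screen version is abbreviated.
cart_summary = simple_response(
    speech="You have three items in your cart: milk, eggs, and bread.",
    display_text="3 items in your cart",
)

# No display text given, so only the spoken text is sent.
greeting = simple_response("Welcome back!")
```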
When your conversation needs it, you can branch your logic and provide completely different UIs for each type of surface.
This is useful when you need a simplified experience on audio-only devices: for example, a quick reorder of recent items on a speaker, and a full shopping cart experience on devices with both audio and screen output.
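The branching described here might look like the following sketch. It takes a set of capability names and a list of recently ordered items, then returns either a quick-reorder prompt or a fuller cart view. The capability identifier, the function `build_reply`, and the shape of the returned dictionary are all hypothetical and chosen only to show the control flow.

```python
# Capability identifier assumed for this sketch.
SCREEN_OUTPUT = "actions.capability.SCREEN_OUTPUT"


def build_reply(capabilities: set, recent_items: list) -> dict:
    """Pick a conversation flow based on whether the surface has a screen."""
    if SCREEN_OUTPUT in capabilities:
        # Screen available: offer the full shopping cart experience.
        return {
            "flow": "shopping_cart",
            "speech": "Here is your cart.",
            "items": list(recent_items),
        }
    # Audio only: keep it short and offer a quick reorder.
    item_names = ", ".join(recent_items)
    return {
        "flow": "quick_reorder",
        "speech": f"Do you want to reorder {item_names}?",
    }


recent = ["milk", "eggs"]
speaker_reply = build_reply({"actions.capability.AUDIO_OUTPUT"}, recent)
phone_reply = build_reply(
    {"actions.capability.AUDIO_OUTPUT", SCREEN_OUTPUT}, recent
)
```

Keeping the branch in one function means the rest of the fulfillment does not need to know which surface it is talking to; it only renders whatever reply comes back.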