Subscription Types
Pub/Sub offers different subscription types to cater to various use cases:
Pull subscriptions
With pull subscriptions, your application periodically makes requests to the Pub/Sub service to retrieve messages. This approach gives you more control over when and how many messages are consumed. Pull subscriptions are a good fit for:
- Batch processing where messages can be processed in bulk at specific intervals.
- Applications that need fine-grained control over message flow and processing rate.
- Scenarios where the subscriber cannot be exposed to a public endpoint (e.g., behind a firewall).
Pub/Sub pull subscriptions support two APIs for retrieving messages:
- Pull: A unary RPC based on a request and response model.
- StreamingPull: Utilizes a persistent bidirectional connection for receiving multiple messages as they become available.
Pub/Sub provides both high-level and low-level auto-generated client libraries. For optimal performance, it is recommended to use asynchronous pull with the StreamingPull API and the high-level client library.
Code samples for the client libraries are available in the Pub/Sub documentation. Before using these samples, make sure you have completed the necessary setup.
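As an illustration, the following sketch (assuming the Python client library, google-cloud-pubsub, with placeholder project and subscription IDs) shows asynchronous pull with the high-level client; the subscribe() call uses StreamingPull under the hood.

```python
# A minimal sketch of asynchronous pull with the high-level Python client
# library (google-cloud-pubsub). Project and subscription IDs are placeholders.
from concurrent.futures import TimeoutError
from google.cloud import pubsub_v1

project_id = "my-project"            # placeholder
subscription_id = "my-subscription"  # placeholder

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, subscription_id)

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    # Process the message payload, then acknowledge it so Pub/Sub
    # does not redeliver it.
    print(f"Received: {message.data!r}")
    message.ack()

streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)

with subscriber:
    try:
        # Block the main thread; messages are handled on background threads.
        streaming_pull_future.result(timeout=30)
    except TimeoutError:
        streaming_pull_future.cancel()  # stop pulling new messages
        streaming_pull_future.result()  # block until the shutdown completes
```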
Push subscriptions
For push subscriptions, the Pub/Sub server initiates requests to your application to deliver messages. This is suitable for applications that need real-time message delivery, for example:
- Real-time applications that need to process messages immediately upon arrival.
- Serverless architectures where your application can be triggered by Pub/Sub events.
To receive messages from push subscriptions, you must use a webhook to process the HTTP POST requests that Pub/Sub sends to the push endpoint.
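As a rough sketch of such a webhook, the example below assumes a Flask application and a hypothetical /pubsub/push endpoint; it parses the JSON envelope that Pub/Sub POSTs, decodes the base64 message data, and acknowledges the message by returning a 2xx status.

```python
# A minimal push-endpoint sketch, assuming Flask. Pub/Sub delivers each
# message as an HTTP POST whose JSON body carries a base64-encoded payload
# in the "message.data" field.
import base64

from flask import Flask, request

app = Flask(__name__)

@app.route("/pubsub/push", methods=["POST"])  # hypothetical endpoint path
def pubsub_push():
    envelope = request.get_json()
    if not envelope or "message" not in envelope:
        return "Bad Request: invalid Pub/Sub message format", 400

    message = envelope["message"]
    data = base64.b64decode(message.get("data", "")).decode("utf-8")
    print(f"Received push message: {data}")

    # Returning a 2xx status acknowledges the message; any other status
    # causes Pub/Sub to retry delivery.
    return "", 204
```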
For information on processing these requests in App Engine and on simulating push notifications locally, see the corresponding Pub/Sub documentation.
Authentication: If a push subscription uses authentication, Pub/Sub signs a JWT and sends it in the Authorization header of the push request. The JWT includes claims and a signature. For details and code samples, see the Pub/Sub documentation on authenticating push requests.
For production environments, it is recommended to use the Google client library for token validation. While the tokeninfo endpoint may be simpler, its use in production is discouraged due to potential throttling.
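For illustration, a minimal validation sketch using the google-auth library is shown below; the expected audience value is an assumption and must match the audience configured on your push subscription.

```python
# A minimal sketch of validating the JWT on an authenticated push request
# with the google-auth library. The audience is a placeholder.
from google.auth.transport import requests as google_requests
from google.oauth2 import id_token

def verify_push_token(authorization_header: str, expected_audience: str) -> dict:
    # The header has the form "Bearer <token>".
    token = authorization_header.split(" ", 1)[1]

    # verify_oauth2_token checks the token signature, expiry, and audience claim.
    claims = id_token.verify_oauth2_token(
        token, google_requests.Request(), audience=expected_audience
    )
    return claims
```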
BigQuery Subscriptions
With BigQuery subscriptions, Pub/Sub writes messages directly to a BigQuery table. This is useful for analytical workloads that need to persist messages for long-term storage and analysis. Common use cases include:
- Archiving messages for long-term data warehousing.
- Performing SQL-based analysis on message data.
- Joining message data with other datasets in BigQuery.
The following steps outline the process for establishing and testing a direct Pub/Sub to BigQuery integration:
Define and Create the BigQuery Table With Schema:
- Construct a schema that accurately represents the structure of the data to be ingested from Pub/Sub. Refer to the BigQuery Table Schema Guidelines for more details.
- Use the BigQuery console, the bq command-line tool, or the BigQuery API to create the table using the defined schema. The resulting BigQuery table will serve as the destination for the Pub/Sub messages.
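As a sketch of this step, the example below creates a destination table with the BigQuery Python client (google-cloud-bigquery); the dataset, table, and column names are placeholders to adapt to your own message structure.

```python
# A minimal sketch of creating the destination table with the BigQuery Python
# client. All names are placeholders; adjust the schema to your own messages.
from google.cloud import bigquery

client = bigquery.Client()

table_id = "my-project.my_dataset.pubsub_messages"  # placeholder
schema = [
    bigquery.SchemaField("data", "BYTES", mode="NULLABLE"),
    bigquery.SchemaField("message_id", "STRING", mode="NULLABLE"),
    bigquery.SchemaField("publish_time", "TIMESTAMP", mode="NULLABLE"),
]

table = bigquery.Table(table_id, schema=schema)
table = client.create_table(table)
print(f"Created table {table.full_table_id}")
```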
Create a BigQuery Subscription:
- Navigate to the Pub/Sub section within the Google Cloud Console.
- Create a new BigQuery subscription for the integration.
- Set the Delivery type to Write to BigQuery and specify the BigQuery dataset and table ID as the destination.
- Select the option "Use Table Schema." This ensures that Pub/Sub automatically aligns the incoming message structure with the BigQuery table's schema.
- The Pub/Sub service account must be granted an appropriate role (e.g., BigQuery Data Editor) to enable write operations to the BigQuery table.
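If you prefer to create the subscription programmatically rather than through the console, the following sketch uses the Pub/Sub Python client; the resource names are placeholders, and the use_table_schema field assumes a recent google-cloud-pubsub release that exposes it.

```python
# A minimal sketch of creating a BigQuery subscription with the Pub/Sub
# Python client. All resource names are placeholders, and use_table_schema
# assumes a recent google-cloud-pubsub release.
from google.cloud import pubsub_v1

project_id = "my-project"               # placeholder
topic_id = "my-topic"                   # placeholder
subscription_id = "my-bq-subscription"  # placeholder
bigquery_table = "my-project.my_dataset.pubsub_messages"  # placeholder

subscriber = pubsub_v1.SubscriberClient()
topic_path = subscriber.topic_path(project_id, topic_id)
subscription_path = subscriber.subscription_path(project_id, subscription_id)

bigquery_config = pubsub_v1.types.BigQueryConfig(
    table=bigquery_table,
    use_table_schema=True,  # map message fields to matching table columns
)

with subscriber:
    subscription = subscriber.create_subscription(
        request={
            "name": subscription_path,
            "topic": topic_path,
            "bigquery_config": bigquery_config,
        }
    )
print(f"Created BigQuery subscription: {subscription.name}")
```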
Publish Sample Messages to the Pub/Sub Topic:
- Publish sample messages to the Pub/Sub topic associated with the subscription.
- Ensure these messages adhere to the structure defined in the BigQuery table schema. See Example Messages for sample messages.
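A minimal publishing sketch with the Python client is shown below; the topic ID and message fields are placeholders and should be adjusted to match your table schema.

```python
# A minimal sketch of publishing a JSON sample message. Project, topic, and
# payload fields are placeholders; the payload should match the table schema.
import json

from google.cloud import pubsub_v1

project_id = "my-project"  # placeholder
topic_id = "my-topic"      # placeholder

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, topic_id)

payload = {"message_id": "sample-1", "data": "hello from Pub/Sub"}  # placeholder
future = publisher.publish(topic_path, json.dumps(payload).encode("utf-8"))
print(f"Published message ID: {future.result()}")
```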
Verify Data Ingestion in BigQuery:
- The published messages should be present within the table, structured according to the defined schema.
- Note: The data field of the published messages will be BASE64 encoded by default upon ingestion into BigQuery.
Decoding BASE64 Encoded Data: To decode the BASE64 encoded data, use the SAFE_CONVERT_BYTES_TO_STRING function from BigQuery's supported conversion functions.
Retrieve Data from BigQuery via Client Library: BigQuery provides flexible data consumption and processing capabilities, accessible through its client libraries. These libraries offer a simplified interface to the BigQuery API, minimizing the need for complex, direct API requests. For information on installing and using these libraries, see the client libraries documentation.
Cloud Storage Subscriptions
With Cloud Storage subscriptions, Pub/Sub writes messages directly to Cloud Storage files in a specified bucket. This type of subscription is useful for storing large numbers of messages in a cost-effective and scalable way. Typical use cases include:
- Data lakes: storing large volumes of raw message data for later processing.
- Archiving: long-term storage of messages for compliance or historical analysis.
- Batch processing: using other Google Cloud services like Dataflow or Dataproc to process messages in Cloud Storage files.
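As with BigQuery subscriptions, a Cloud Storage subscription can also be created programmatically. The sketch below assumes a recent google-cloud-pubsub release that exposes CloudStorageConfig; all resource names are placeholders.

```python
# A minimal sketch of creating a Cloud Storage subscription with the Pub/Sub
# Python client. Resource names are placeholders, and CloudStorageConfig
# assumes a recent google-cloud-pubsub release.
from google.cloud import pubsub_v1

project_id = "my-project"                # placeholder
topic_id = "my-topic"                    # placeholder
subscription_id = "my-gcs-subscription"  # placeholder
bucket = "my-pubsub-bucket"              # placeholder

subscriber = pubsub_v1.SubscriberClient()
topic_path = subscriber.topic_path(project_id, topic_id)
subscription_path = subscriber.subscription_path(project_id, subscription_id)

cloud_storage_config = pubsub_v1.types.CloudStorageConfig(
    bucket=bucket,
    filename_prefix="messages/",  # objects are written under this prefix
)

with subscriber:
    subscription = subscriber.create_subscription(
        request={
            "name": subscription_path,
            "topic": topic_path,
            "cloud_storage_config": cloud_storage_config,
        }
    )
print(f"Created Cloud Storage subscription: {subscription.name}")
```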
For a more in-depth comparison, see the section on how to choose a subscription type.