No Hubbub - Cloud PubSub
Mon, Mar 9, 2015
Last week, Google announced the beta release of Cloud PubSub. They state,
“We designed Google Cloud Pub/Sub to deliver real-time and reliable messaging, in one global, managed service that helps developers create simpler, more reliable, and more flexible applications. It’s been tested extensively, supporting critical applications like Google Cloud Monitoring and Snapchat’s new Discover feature.”
Push-based publish/subscribe over HTTP is something I’ve been interested in since Google helped draft the PubSubHubbub protocol five years ago.
From the abstract,
“We offer this spec in hopes that it fills a need or at least advances the state of the discussion in the pubsub space. Polling sucks. We think a decentralized pubsub layer is a fundamental, missing layer in the Internet architecture today and its existence, more than just enabling the obvious lower latency feed readers, would enable many cool applications, most of which we can’t even imagine. But we’re looking forward to decentralized social networking.”
It’s a shame they chose a silly name and focused unnecessarily on the distribution of changes in “feeds” instead of using more general language. There’s no reason this protocol can’t be applied to the distribution of arbitrary messages. In fact, shortly after the first draft was published in 2010, I decided to build an implementation to use as a message distribution system for Attribyte. HTTPS with authentication, or spiped, allows messages to be distributed in near-real-time to services running inside the same rack or anywhere in the cloud.
So, is Cloud PubSub based on PubSubHubbub? No, but the terminology and mechanics of the “push” model are almost exactly the same. Publishers create “topics” on the service to which they publish messages. Applications subscribe to topics and receive published messages on a pre-configured HTTP endpoint.
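To make the shared “push” mechanics concrete, here is a minimal in-memory sketch of the topic/subscribe/publish flow described above. It is a toy, not a client for either system: `ToyHub`, its method names, and the injected `post` transport are all hypothetical names I made up for illustration, and a real hub would of course deliver over HTTP and retry on failure.

```python
from typing import Callable, Dict, List

# Hypothetical transport: takes (endpoint_url, body), returns an HTTP status code.
Post = Callable[[str, bytes], int]


class ToyHub:
    """In-memory stand-in for a push hub (not a real Cloud PubSub or PubSubHubbub API)."""

    def __init__(self, post: Post) -> None:
        self._post = post
        self._subscribers: Dict[str, List[str]] = {}

    def create_topic(self, topic: str) -> None:
        # Publishers create the topic before anything can be published to it.
        self._subscribers.setdefault(topic, [])

    def subscribe(self, topic: str, endpoint: str) -> None:
        # Subscribers register a pre-configured HTTP callback endpoint.
        if topic not in self._subscribers:
            raise KeyError(f"unknown topic: {topic}")
        if endpoint not in self._subscribers[topic]:
            self._subscribers[topic].append(endpoint)

    def publish(self, topic: str, body: bytes) -> Dict[str, bool]:
        """Push body to every endpoint; report which ones answered with a 2xx."""
        results: Dict[str, bool] = {}
        for endpoint in self._subscribers[topic]:
            status = self._post(endpoint, body)
            results[endpoint] = 200 <= status < 300
        return results
```

Injecting the transport keeps the sketch runnable without a network; swapping in a function that performs a real HTTPS POST is the obvious next step.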
How do the two compare?
Feature | Cloud PubSub | PubSubHubbub |
---|---|---|
Subscribe to topic? | Synchronous. JSON API or Google developer console used to configure the callback endpoint for the topic. | Multi-message asynchronous “dance” over HTTP allows subscribers to negotiate an endpoint for message reception with verification of intent. |
Subscription lifetime? | Until explicitly canceled. | Specified lease time or explicitly canceled. |
Receive callback messages? | “Webhook HTTPS,” which I think is jargon that simply means HTTPS POST. | HTTP/HTTPS POST |
Message content type? | JSON. Binary messages must be Base64-encoded. | Arbitrary bytes! Because this protocol was originally created for distributing changes to Atom/RSS feeds, most people, I think, assume that it is only useful for distributing feed changes. |
Arbitrary “attributes” associated with the message? | Embedded in the JSON message. | HTTP headers. |
Subscriber acknowledges receipt? | “Success” HTTP response with configurable “ack deadline.” | “Success” HTTP response. Timeout is not part of the protocol, but obviously configurable. |
Delivery guarantees? | Out-of-order delivery possible. At-least-once delivery. Retry on failure? Yes. | Not specified. Neither delivery order nor single delivery is guaranteed. Retry on failure? “Hubs SHOULD retry notifications repeatedly until successful (up to some reasonable maximum over a reasonable time period)” |
Security? | HTTPS | HTTPS and per-subscriber, HMAC-based, “authenticated content distribution.” |
Run your own server? | No – but there’s nothing that prevents building one with an identical API. | Yes. |
Ignore the subscription mechanics. A server that supports PubSubHubbub can easily support the “push” version of the Cloud PubSub API.
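On the receiving side, the differences in the table mostly come down to how the payload is unwrapped and how it is authenticated. The sketch below shows both, under stated assumptions: the Cloud PubSub envelope shape (`message.data` as Base64, `message.attributes` as a string map) is my reading of the push format and should be checked against the current API documentation, while the `sha1=<hex>` form of the `X-Hub-Signature` header follows the PubSubHubbub spec’s authenticated content distribution. The function names are mine.

```python
import base64
import hashlib
import hmac
import json
from typing import Dict, Tuple


def decode_cloud_pubsub_push(body: bytes) -> Tuple[bytes, Dict[str, str]]:
    """Unwrap a push request body into (payload, attributes).

    Assumes an envelope like {"message": {"data": "<base64>", "attributes": {...}}}.
    """
    envelope = json.loads(body.decode("utf-8"))
    message = envelope["message"]
    payload = base64.b64decode(message.get("data", ""))
    attributes = dict(message.get("attributes") or {})
    return payload, attributes


def verify_hub_signature(body: bytes, secret: bytes, header_value: str) -> bool:
    """Check a PubSubHubbub X-Hub-Signature header of the form 'sha1=<hex digest>'."""
    algorithm, _, digest = header_value.partition("=")
    if algorithm != "sha1" or not digest:
        return False
    expected = hmac.new(secret, body, hashlib.sha1).hexdigest()
    # Constant-time comparison, so timing does not leak how much of the digest matched.
    return hmac.compare_digest(expected.encode("utf-8"), digest.lower().encode("utf-8"))
```

With PubSubHubbub the payload is simply the request body and the “attributes” are HTTP headers, so only the signature check is needed; with Cloud PubSub the JSON envelope carries both.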
For either, here’s what you get for delivery guarantees: maybe ordered, possibly delivered more than once. So, not much! Even if you just need to feel confident that “the subscriber finished processing the message,” you’ll have to rely on hub retry. To accomplish this, the subscriber must wait until processing is complete, however long that takes, before sending the “OK” response back to the hub. Of course, if the processing is stalled or takes a long time, the hub may decide to close the socket before processing finishes. When this happens, be prepared to handle the same message again when the hub attempts a retry! If it happens frequently, the hub is going to run out of resources, tying up time and connections waiting for the synchronous response. To avoid this (but not the potential for duplicate messages), Cloud PubSub supports a “pull” model that PubSubHubbub doesn’t.
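One way to live with “possibly delivered more than once” is to make the subscriber idempotent: acknowledge only after processing succeeds, and remember which messages are already done so that a retry is acknowledged without being reprocessed. The following is a minimal sketch of that idea, not a prescribed approach. `IdempotentSubscriber` is a hypothetical name, the set of completed IDs lives only in memory, and it assumes each message carries a stable ID; PubSubHubbub defines none, so there you would have to derive one yourself (a hash of the body, for example).

```python
from typing import Callable, Set


class IdempotentSubscriber:
    """Processes each message ID at most once, even if the hub redelivers it."""

    def __init__(self, process: Callable[[bytes], None]) -> None:
        self._process = process
        # In-memory only; a real service would persist this and expire old entries.
        self._done: Set[str] = set()

    def handle(self, message_id: str, payload: bytes) -> int:
        """Return the HTTP status code to send back to the hub."""
        if message_id in self._done:
            return 200  # duplicate: acknowledge again without reprocessing
        try:
            self._process(payload)
        except Exception:
            return 500  # not acknowledged, so the hub is expected to retry
        self._done.add(message_id)
        return 200
```

This does nothing about the hub timing out on slow processing; it only makes the resulting retries harmless.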
Publish-subscribe is a powerful tool, but not for the faint of heart. As Pat Helland says…
“Messaging across loosely coupled partners is inherently an exercise in confusion and uncertainty.”