
New Feature: Keep It All Together With CorrelationId


Say hello to a new feature!

One of the best things about AMPS is the way that it keeps publishers completely independent from subscribers. Publishers don’t need to know how many subscribers are listening for a message, where they are, or even whether they’re connected at a given point in time. That flexibility pays off: once publishers are set up, you can use the same message stream for any number of different applications, without changing the publisher.

Every now and then, though, this decoupling has a disadvantage. It can make it harder for subscribers to identify which messages belong together when they arrive on different subscriptions, or to communicate metadata, like a response address, that isn’t part of the original message.

One way to do this is to enrich the message. With this approach, the publisher can add an extra field with the correlation information to the message before sending it. The subscriber then uses the extra field to figure out which messages go together. There are some real advantages to message enrichment: because the correlation information is part of the message, AMPS can filter on that field, which can increase the precision of your subscriptions.
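As a quick sketch of the enrichment approach in Python (the topic, the correlation field name, and the handle function are hypothetical, and client is assumed to be a connected AMPS client):

import json

# Hypothetical enrichment step: parse the original message, add a
# correlation field, and reassemble it before publishing.
record = json.loads(original_message)
record["correlation"] = "order-batch-17"
client.publish("orders", json.dumps(record))

# Because the field is part of the message body, AMPS can filter on it.
for message in client.subscribe("orders", "/correlation = 'order-batch-17'"):
    handle(json.loads(message.get_data()))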

There are some drawbacks to enrichment, though. You need to add that field to every message when you publish the message. If your AMPS publisher isn’t the original source of the message, this adds extra work, because your publisher needs to parse and reassemble the message. Likewise, your subscriber may need that correlation information to figure out how to handle the message, which may mean parsing the message earlier than would otherwise be necessary, or (even worse) parsing the message more than once. For lots of applications, enrichment adds minimal overhead, and the added filtering capability makes enrichment a great option. For other applications, though, it’s expensive or impossible to enrich messages. What then?

In AMPS 4.0, there’s another option. We’ve added an optional CorrelationId header for messages published to AMPS. The publisher sets the CorrelationId before sending the message. AMPS passes along the CorrelationId to the subscriber, without parsing, altering, changing, folding, spindling, or mutilating the CorrelationId.

Code it Up

Enough introduction, let’s look at the code. First, a simple publisher sets a value on the CorrelationId:

import time
import AMPS

# (client construction and connection elided)

# Set up a subscription to wait for replies.
def get_replies(message):
    print "Got a response %s on topic %s" % \
          (message.get_data(), message.get_topic())

client.subscribe(get_replies, "reply-to-[0-9]")

# Publish messages
command = AMPS.Command("publish").set_topic("all-the-things")

# use the CorrelationId as a reply-to topic
for i in range(1, 10):
    command.set_correlation_id("reply-to-%d" % i)
    command.set_data('{"message":"hello, subscriber!"}')
    # because we don't check for a response, we use
    # execute_async with no message handler
    client.execute_async(command, None)

# wait for replies
while True:
    time.sleep(1)

First, the publisher sets up a subscription to process replies. It’s important to make sure that the subscription is running before sending any messages, because subscribers may begin replying before all of the messages are sent.

To send the messages, the publisher first creates a publish Command that will hold the common information for all of the publish requests. Each time through the loop, the publisher fills in a distinct CorrelationId and the message data, then publishes the message.

Once all the messages are published, the publisher waits for replies, each on a different topic communicated to the subscriber in the CorrelationId.

That’s all there is to it. AMPS doesn’t process the CorrelationId, so there’s no need to provide it in any particular format. For the subscriber, we just receive the messages and print what we get. Because, in this case, we use the CorrelationId as the topic to reply to, we just send a message back to the specified topic:

for message in client.subscribe("all-the-things"):
    print "Correlation: '%s' on message '%s'" % \
        (message.get_correlation_id(), message.get_data())
    client.publish(message.get_correlation_id(),
                   '{"response":"hi, world!"}')

The output of the subscriber is shown below:

Correlation: 'reply-to-1' on message '{"message":"hello, subscriber!"}'
Correlation: 'reply-to-2' on message '{"message":"hello, subscriber!"}'
Correlation: 'reply-to-3' on message '{"message":"hello, subscriber!"}'
Correlation: 'reply-to-4' on message '{"message":"hello, subscriber!"}'
Correlation: 'reply-to-5' on message '{"message":"hello, subscriber!"}'
Correlation: 'reply-to-6' on message '{"message":"hello, subscriber!"}'
Correlation: 'reply-to-7' on message '{"message":"hello, subscriber!"}'
Correlation: 'reply-to-8' on message '{"message":"hello, subscriber!"}'
Correlation: 'reply-to-9' on message '{"message":"hello, subscriber!"}'

And, once the subscriber replies to the messages, the publisher produces this output:

Got a response {"response":"hi, world!"} on topic reply-to-1
Got a response {"response":"hi, world!"} on topic reply-to-2
Got a response {"response":"hi, world!"} on topic reply-to-3
Got a response {"response":"hi, world!"} on topic reply-to-4
Got a response {"response":"hi, world!"} on topic reply-to-5
Got a response {"response":"hi, world!"} on topic reply-to-6
Got a response {"response":"hi, world!"} on topic reply-to-7
Got a response {"response":"hi, world!"} on topic reply-to-8
Got a response {"response":"hi, world!"} on topic reply-to-9

It’s just that simple.

Are there provisos? Caveats? Quid pro quos?

No! Absolutely not! It’s just that simple. Well, most of the time, anyway.

There are a few things you should know about how AMPS treats CorrelationId. These mostly boil down to a simple principle of “do the right thing when the right thing is clear, otherwise do nothing”.

  • For SOW records, the CorrelationId of the record is the CorrelationId of the most recent message that updated the record. Exactly what you would get if the CorrelationId were a field in the SOW. What if there’s no CorrelationId on that SOW record? Then AMPS doesn’t provide one – the message from AMPS doesn’t include that header. (See the sketch after this list.)

  • For transaction log replay, AMPS includes the CorrelationId on the replayed messages. To make this possible, AMPS stores the CorrelationId for a message in the transaction log when the message includes a CorrelationId.

  • For delta publish, the CorrelationId of the record is updated if there’s a CorrelationId on the update. Otherwise, AMPS preserves the existing CorrelationId.

  • For delta subscribe, AMPS provides the CorrelationId on the message if the record in the SOW has a CorrelationId, or if the new publish adds one. Otherwise, no CorrelationId. This is the same principle as SOW records.

  • For views (including JOINs and aggregations), AMPS never provides a CorrelationId. In this case, the results of the view might come from messages that have different CorrelationId values. Which one is the best? Who’s right and who’s wrong? How can AMPS choose a favorite? There’s no good answer, so AMPS doesn’t provide a CorrelationId.

  • For out-of-focus messages, the CorrelationId AMPS provides depends on why the message went out of focus. If the message has gone out of focus because it no longer matches the subscription, AMPS provides the CorrelationId of the update. If the message has been deleted, or the subscriber is no longer entitled to see the new message, AMPS provides the CorrelationId of the previous message.

  • For topics where AMPS creates messages, such as /AMPS/ClientStatus, AMPS never provides a CorrelationId. Since AMPS doesn’t use the contents of the CorrelationId at all, there’s no good answer for how to fill those in.
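Because the header is optional, code that consumes SOW query results should handle records with no CorrelationId. A minimal sketch in Python, reusing the topic from the example above (client is assumed to be connected):

# Query the SOW: a record only carries a CorrelationId if the most
# recent publish that updated it included one, so check before using it.
for message in client.sow("all-the-things"):
    if message.get_command() != AMPS.Message.Command.SOW:
        continue  # skip group_begin / group_end bookkeeping messages
    correlation = message.get_correlation_id()
    if correlation:
        print "record correlated by '%s'" % correlation
    else:
        print "record has no CorrelationId"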

Keep it All Together

That’s the story of CorrelationId. It’s a simple feature that does one thing, only one thing, and does it well.

But what will you do with it? How do you use it to keep track of messages and get them where they’re going? That’s up to you. Because AMPS doesn’t change the CorrelationId, you can put whatever routing or processing information you need to in that field. The sky’s the limit[1]!


[1] Within some common sense guidelines. The CorrelationId is part of the message headers sent to AMPS and is stored as part of a SOW record. The precise limits for those depend on your configuration and the data involved, but there’s no special limit on a CorrelationId.


How Fast Can You Go?


Built for speed.

AMPS is built from the ground up to go fast. AMPS tries to deliver messages at the fastest rate that an individual consumer can handle. AMPS has sophisticated machinery to find the fastest possible delivery rate for an individual consumer, and AMPS works hard[1] to keep slower consumers from causing problems for faster consumers.

These techniques apply to historical replay (bookmark subscriptions[2]) as well as subscriptions to current publishes. AMPS is designed to provide messages as fast as possible, regardless of the source.

Why Would Anyone Ever Want to Go Slow?

As fast as possible is always a good thing in our world here at 60East. But what if you need to slow the pace of messages, and why would you ever want to do that?

We recently worked with a developer who wanted to simulate real-time conditions for capacity planning and testing purposes. In the simulation, this developer wanted the ability to replay messages at any rate from the actual publish speed all the way up to the maximum throughput the client could handle. By controlling the replay speed, the simulation could exactly replicate activity peaks throughout the day, or run at 2x speed, or 4x speed, or anything up to the full speed of the system.

As it turns out, this is simple to do with AMPS 4.0, a topic backed by a transaction log, and a bookmark subscription.

The key ingredient is that AMPS 4.0 records the timestamp at processing time for each message and then makes that available when you replay messages. For regular subscriptions, the timestamp is close to the current time, but, for bookmark subscriptions, you get the original timestamp of when the message was processed. True-to-form, AMPS will continue to send messages as fast as your client can consume them. With a little bit of code, though, you can use the timestamps to replay messages to your client at any speed you like. Real time? Twice as fast? Four times as fast? Half speed? You’ve got full control.

Here is a sample bookmark subscription using the Java client and the Command Interface (new to AMPS 4.0 clients!):

public void subscribe(float pace, String bookmark)
{
    try
    {
        Client client = new Client("Paced-Replay-Client");
        client.connect(uri);
        client.logon();

        // Use your own message handler
        SOWAndSubscribeMessageHandler ssh = new SOWAndSubscribeMessageHandler();
        PacedBookMarkSubscribeHandler pacedh =
            new PacedBookMarkSubscribeHandler(pace, ssh);

        Command command = new Command("subscribe")
            .setTopic(topic)
            .setBookmark(bookmark)
            .setOptions("timestamp,oof");
        client.executeAsync(command, pacedh);
    }
    catch (AMPSException e)
    {
        System.out.println("exception in Main: " + e.toString());
    }
}

This is a pretty typical subscription, with the addition of the timestamp option. This option is key, as it returns a timestamp on each published message. We’ll use that timestamp to pace the messages.

To set the pace, we use a PacedBookMarkSubscribeHandler, passing our message handler of choice along with the desired replay pace.

public class PacedBookMarkSubscribeHandler implements MessageHandler
{
    final TimeZone gmt = TimeZone.getTimeZone("GMT");
    SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd'T'HHmmss");
    long previousTx = 0;
    float pace;
    MessageHandler wrappedHandler;

    public PacedBookMarkSubscribeHandler(float pace, MessageHandler h)
    {
        System.out.printf("Replaying at %fx of real time.\n", 1 / pace);
        this.pace = pace;
        this.sdf.setTimeZone(gmt);
        this.wrappedHandler = h;
    }

    private long getDelta(String timestamp)
    {
        long delta = 0, currentTx = 0;
        try
        {
            currentTx = sdf.parse(timestamp).getTime();
        }
        catch (ParseException e)
        {
            System.out.println("exception in getDelta: " + e.toString());
        }
        if (previousTx > 0) delta = (long) ((currentTx - previousTx) * pace);
        previousTx = currentTx;
        return delta;
    }

    public void invoke(Message m)
    {
        switch (m.getCommand())
        {
        case Message.Command.SOW:
            wrappedHandler.invoke(m);
            break;
        case Message.Command.OOF:
        case Message.Command.Publish:
            try
            {
                Thread.sleep(getDelta(m.getTimestamp()));
            }
            catch (InterruptedException e)
            {
                System.out.println("Paced Handler Exception: " + e.toString());
            }
            wrappedHandler.invoke(m);
            break;
        }
    }
}

As you can see, when each message comes in, we use the timestamp on the message to determine the elapsed time between this message and the previous message, scaled by the speed of the replay. We sleep for that amount of time, and then process the message.
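The same logic ports easily to other AMPS clients. Here’s a minimal sketch in Python (the class name is ours, and we parse just the leading yyyymmddTHHmmss portion of the timestamp, as the Java sample does):

import time
from datetime import datetime

class PacedHandler(object):
    """Sleeps between messages so the replay approximates the original pace."""
    def __init__(self, pace, wrapped_handler):
        self.pace = pace              # 1.0 = real time, 0.5 = twice as fast
        self.previous = None
        self.wrapped = wrapped_handler

    def __call__(self, message):
        # Parse the leading portion of the AMPS timestamp.
        current = datetime.strptime(message.get_timestamp()[:15],
                                    "%Y%m%dT%H%M%S")
        if self.previous is not None:
            delta = (current - self.previous).total_seconds() * self.pace
            if delta > 0:
                time.sleep(delta)
        self.previous = current
        self.wrapped(message)

You would then pass an instance, for example client.execute_async(command, PacedHandler(1.0, my_handler)), in place of the ordinary handler.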

Set the Pace

You can use this simple technique to set the speed of a replay, and use the PacedBookMarkSubscribeHandler (or your own variation of it!) in place of an existing message handler. How fast can your application process messages?

It’s up to you.


[1] See the AMPS User Guide, where Slow Client Management is discussed in section 23.4. We’re also working on a blog post to talk more about the techniques AMPS uses to make efficient use of resources – stay tuned!

[2] Bookmark subscribe allows you to begin a subscription at any point in the AMPS transaction log. It’s one of the most commonly-used features of AMPS. For more information, see the AMPS User Guide chapter 18, or the documentation for your AMPS client library of choice.

No Filesystem? No Problem! Keeping State in AMPS


Never lose track of your state.

AMPS makes a great platform for distributing messages to worker processes. The combination of low latency delivery, the SOW last value cache, message replay, and powerful content filtering make it easy to build a scalable grid of workers.

In this post, we show how to extend the AMPS client to provide a bookmark store for workers that don’t maintain persistent state locally. The post assumes a good working knowledge of resumable subscriptions (covered in detail in the AMPS User Guide and the AMPS Java Developer Guide), and also assumes some familiarity with the implementations of the AMPS clients.

Complete source code for this post, including an AMPS configuration file and a class that loads sample data into AMPS, is available for download.

Keeping State in AMPS

To keep state in AMPS, we use the following three steps (a minimal sketch of the pattern follows the list):

  1. Load state from AMPS when the application starts
  2. Periodically persist state to AMPS as the application performs work
  3. Flush the saved state to AMPS when the application shuts down
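Here’s the shape of that pattern, sketched in Python against the /ADMIN/bookmark_store topic defined later in this post (the helper names are ours; the real implementation below is in Java):

import json
import AMPS

STATE_TOPIC = "/ADMIN/bookmark_store"

def load_state(client, client_name):
    # Step 1: restore this client's saved records from the SOW at startup.
    state = {}
    for msg in client.sow(STATE_TOPIC, "/clientName = '%s'" % client_name):
        if msg.get_command() == AMPS.Message.Command.SOW:
            record = json.loads(msg.get_data())
            state[record["subId"]] = record["bookmark"]
    return state

def save_state(client, client_name, sub_id, bookmark):
    # Steps 2 and 3: publish a snapshot periodically while running,
    # and once more on shutdown.
    client.publish(STATE_TOPIC, json.dumps({"clientName": client_name,
                                            "subId": sub_id,
                                            "bookmark": bookmark}))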

One way to use this technique is to maintain the point at which a particular worker should resume a subscription. To do this, we implement a BookmarkStore that uses AMPS for persistence.

There are lots of other ways to use this technique, of course. Our post “Yuck! Stateless Poison Message Handling” shows a way to use the same technique to track messages that can’t be processed by a worker.

You can use variations on this technique to save the current state of a calculation or any other state that you need to track.

About Stores

The AMPS client libraries use stores to provide reliable publication and resumable subscriptions. Stores, as the name suggests, are used by the client to maintain state. The stores preserve the current state of the client. Bookmark stores save the state of incoming subscriptions. Publish stores save outgoing messages. For each type of store, the client library provides a simple interface, allowing you to choose the specific store implementation you want to use, or to write your own.

Bookmark Stores

Bookmark stores provide the following major functions:

  • Add a bookmark to a subscription, indicating that a message has been received
  • Remove a bookmark from a subscription, indicating that the message has been processed
  • Return the most recent bookmark for a subscription, which is the point at which the subscription should resume

With bookmark live subscriptions, bookmark stores have one additional responsibility:

  • Track which received messages have been persisted, and use those to help calculate the most recent bookmark

The AMPS clients include two varieties of bookmark stores. Memory-backed bookmark stores allow clients to resume after losing a connection to the server. In this case, the client loses state if the application that uses the client restarts. File-backed bookmark stores allow clients to resume after restarting. Some applications, though, need to be able to resume subscriptions without maintaining local state. For example, a set of worker tasks running on a virtual machine that is periodically re-provisioned can’t rely on persistent files to maintain state. Your application may want to present a consistent view of information whether the user is connecting from a desktop, a mobile phone, or through a website, without presenting messages that the user has already acted on.

The AMPS SOW is one way to preserve state for a client without relying on having access to a filesystem, or being able to access any resources other than AMPS itself. In this blog post, we’ll show you how to use AMPS as a bookmark store.

There are a few constraints to consider for creating the bookmark store. The solution needs to minimize the number of messages and the overall amount of data published to AMPS. While publishing to AMPS is fast, each message sent to AMPS consumes bandwidth between AMPS and the client. That bandwidth is often the most constrained resource for the application, so we need to use as little as possible. Last, but not least, it’s important to keep the solution simple, and use what’s already provided in the client wherever possible.

In addition, when a subscription uses the live option, the subscription receives messages that have not yet been persisted to the transaction log. This means that, if the server fails over, it is possible that the client has received messages that are not stored in the transaction log. In this case, AMPS periodically sends persisted acknowledgements on the subscription, which indicates the most recent point at which messages in the topic have been persisted. The bookmark store implementations provided with the AMPS clients track these acknowledgements, and the most recent method for those stores returns the latest persisted message rather than the latest discarded message. Using this strategy, the bookmark store guarantees that the client can restart from a valid message and will not miss messages even when using the live option.

To meet these constraints, the AMPS bookmark store takes this approach:

  • Progress for the clients is stored in a SOW topic. This SOW topic need not be on the same server that publishes the messages. The SOW topic can be replicated, as well, to provide highly-available storage.
  • Rather than persisting the entire state of the store to AMPS, keep the store in local memory and persist only the bookmark value for MOST_RECENT. When persisting the value, indicate whether the subscription is receiving persisted acknowledgements or not.
  • Persist the most recent value to AMPS periodically, based on the number of messages processed for the subscription. This interval is configurable.
  • Derive from MemoryBookmarkStore to take advantage of the logic that’s already written for duplicate handling, finding the correct value of MOST_RECENT, and maintaining quick in-memory access.

The rest of this post describes the implementation in detail.

Configuring The SOW

The records that hold the progress will contain the clientName, the subscription ID for every subscription tracked by that client, the last bookmark the client has persisted, and whether the subscriber is receiving persisted acknowledgements. Represented as JSON, each message will look something like this:

{"clientName":"resumableClient","subId":"sample-replay-id","bookmark":"10620414156524534001|3836|","persisted":"false"}

Messages in the SOW are uniquely identified using the clientName and subId fields. Because each client can have multiple subscriptions, each processing different bookmarks, the SOW definition creates a compound key where each unique combination of clientName and subId is a unique message. We define the SOW topic as follows:

<SOW>
  <TopicDefinition>
    <FileName>./sow/%n.sow</FileName>
    <Topic>/ADMIN/bookmark_store</Topic>
    <MessageType>json</MessageType>
    <Key>/clientName</Key>
    <Key>/subId</Key>
  </TopicDefinition>
</SOW>

For this example, we use the JSON message type to make it easy to read the messages. This means that the instance that hosts the bookmark store needs to accept connections from clients that use the JSON message type.

We use the /ADMIN/ prefix as a way of indicating that this topic is used for application record keeping, and is not a topic that contains data. This is a convention to help with logging and troubleshooting, and is also intended to make it unlikely that any existing applications that use regex topic subscriptions will accidentally subscribe to this topic. However, the prefix has no meaning for AMPS itself, and the implementation could choose a different name.

Working with the BookmarkStore Interface

To get the results we need, there are three sets of methods we need to worry about on the BookmarkStore interface:

  • log() records a message with the bookmark store, registering the subscription and bookmark on the message. We don’t need to override this, since we’re planning to use the functionality that’s already provided, but we’ll call the log() function when we load the current state of the bookmarks from the SOW.
  • discard() marks a message as processed, and allows the bookmark store to discard the message. Our new AMPS-based bookmark store will override this method. We’ll call the discard() method on the MemoryBookmarkStore and add code for persisting bookmarks to AMPS periodically.
  • persisted() is called when the client receives a persisted acknowledgement. This marks a message as persisted, and allows the bookmark store to discard the message. For our implementation, we track that the subscription is receiving persisted acknowledgements and then call the MemoryBookmarkStore implementation.

For all of the other functionality of the bookmark store, the MemoryBookmarkStore does exactly what we need.

Defining the BookmarkStore Class

Next, it’s time to define the class for the bookmark store and a constructor. The constructor for the class takes the client to use to manage the bookmark store – which must already be connected – and the name of the client to track subscriptions for. Notice that the client used to track subscriptions doesn’t need to be connected to the same AMPS instance as the subscriptions being tracked.

public AMPSBookmarkStore(Client bookmarkClient, String trackedClientName)
    throws AMPSException
{
    if (bookmarkClient == null)
    {
        throw new AMPSException("Null client passed to bookmark store.");
    }
    _internalClient = bookmarkClient;
    _trackedName = trackedClientName;
    _bookmarkPattern = Pattern.compile("\"bookmark\" *: *\"([^\"]*)\"");
    _subIdPattern = Pattern.compile("\"subId\" *: *\"([^\"]+)\"");
    _persistedAckPattern = Pattern.compile("\"persisted\" *: *\"([^\"]+)\"");

    MessageStream ms = null;
    try
    {
        // Message to use for logging bookmarks
        Message logmsg = _internalClient.allocateMessage();

        // Retrieve state for this client.
        ms = _internalClient.sow("/ADMIN/bookmark_store",
                                 "/clientName = '" + _trackedName + "'");
        for (Message msg : ms)
        {
            if (msg.getCommand() != Message.Command.SOW)
            {
                continue;
            }
            String data = msg.getData();
            Matcher bookmarkMatch = _bookmarkPattern.matcher(data);
            if (!bookmarkMatch.find()) continue;
            Matcher subIdMatch = _subIdPattern.matcher(data);
            if (!subIdMatch.find()) continue;
            Matcher persistedAckMatch = _persistedAckPattern.matcher(data);
            if (!persistedAckMatch.find()) continue;

            // Get the bookmark string, the subId, and whether this
            // subscription receives persisted acks.
            String bookmark = bookmarkMatch.group(1);
            String subId = subIdMatch.group(1);
            String persistedAcks = persistedAckMatch.group(1);

            // Extract individual bookmarks from the record if necessary
            String[] bookmarks = new String[1];
            if (bookmark.contains(","))
            {
                bookmarks = bookmark.split(",");
            }
            else
            {
                bookmarks[0] = bookmark;
            }
            for (String b : bookmarks)
            {
                // Create a message with the subId and bookmark
                // to use for logging with the MemoryBookmarkStore.
                logmsg.reset();
                logmsg.setSubId(subId);
                logmsg.setBookmark(b);
                // Register the bookmark from the SOW as the
                // last successfully processed bookmark for this subId.
                super.log(logmsg);
                super.discard(logmsg);
                if ("true".equals(persistedAcks))
                {
                    super.persisted(logmsg.getSubIdRaw(), logmsg.getBookmarkRaw());
                    _persistedAcks.add(logmsg.getSubId());
                }
            }
            _discardCounter.put(logmsg.getSubIdRaw(), 0);
        }
    }
    catch (AMPSException e)
    {
        System.err.println(e.getLocalizedMessage());
        e.printStackTrace(System.err);
        throw e;
    }
    finally
    {
        if (ms != null) ms.close();
    }

    // Start the worker to asynchronously handle updates
    _workerThread = new Thread(
        new UpdatePublisher(_internalClient, _trackedName, _workQueue),
        "Bookmark Update for " + _trackedName);
    _workerThread.start();
}

The constructor stores the client and the name of the client to track. The constructor then runs a SOW query on the topic that stores the persisted bookmarks and processes the results.

The persisted message for the subscription contains either a single bookmark, or a comma-delimited set of bookmarks. If the message contains a list, the constructor processes the bookmarks one at a time.

For each persisted bookmark, the constructor creates a message that contains the subscription ID and bookmark. The constructor logs the message, then immediately discards it. If the subscription has been receiving persisted acknowledgements, the constructor also logs an acknowledgement for the bookmark. This has the result of ensuring that the underlying MemoryBookmarkStore records the bookmark from the SOW query as the most recent bookmark for that subscription. The relevant lines are duplicated below:

// Register the bookmark from the SOW as the
// last successfully processed bookmark for this subId.
super.log(logmsg);
super.discard(logmsg);
if ("true".equals(persistedAcks))
{
    super.persisted(logmsg.getSubIdRaw(), logmsg.getBookmarkRaw());
    _persistedAcks.add(logmsg.getSubId());
}

To keep the sample self-contained, this class uses the standard Java regular expressions utility to process messages. In production, we would replace the regular expressions with one of the commonly-used JSON-parsing classes for Java.
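For illustration, here’s the same extraction with a real JSON parser, sketched in Python for brevity (a production Java implementation would typically use a library such as Jackson or Gson; data holds one record’s payload):

import json

# Parse one bookmark record instead of regex-matching its fields.
record = json.loads(data)
bookmark = record["bookmark"]
sub_id = record["subId"]
receives_persisted_acks = (record["persisted"] == "true")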

Publishing Updates to the Store

While publish operations are very efficient with AMPS, the class takes the standard approach of trying to do minimal work within a message handler. In this case, because the discard() methods may be called within a message handler, the AMPSBookmarkStore uses a simple BlockingQueue to deliver the subscription IDs and bookmarks to a worker thread. The worker thread dequeues each request, creates a message, and publishes the message to the SOW.

Managing Discarded Messages

Next, we override the discard() method of the MemoryBookmarkStore. In the overrides, we check to see if it’s necessary to persist the bookmark to the SOW topic, and then call discard() on the MemoryBookmarkStore.

@Override
public void discard(Message m) throws AMPSException
{
    super.discard(m);
    checkForPersist(m.getSubIdRaw());
}

The checkForPersist() method checks the number of messages discarded for the subId. If the number of messages is higher than the configured threshold, the method writes the most recent bookmark to SOW storage.

private void checkForPersist(Field subId) throws AMPSException
{
    Integer count = _discardCounter.get(subId);
    if (count == null)
    {
        count = 0;
    }
    ++count;
    if (count > _threshold)
    {
        try
        {
            _workQueue.put(new UpdateRecord(subId.copy(), getMostRecent(subId)));
            count = 0;
        }
        catch (InterruptedException e)
        {
            throw new AMPSException(e);
        }
    }
    _discardCounter.put(subId, count);
}

The method checks the current count. If the count is above the configured threshold, the method enqueues an update and resets the counter. Otherwise, the method just increments the counter and returns. Notice that when the method enqueues an update, the method requests that the MemoryBookmarkStore provide the MostRecent value for the update. This ensures that the update contains the current recovery point. The overrides for the persisted methods are equally simple:

@Override
public void persisted(Field subId, BookmarkField bookmark) throws AMPSException
{
    super.persisted(subId, bookmark);
    _persistedAcks.add(subId.toString());
}

In this case, we simply note that the subscription is getting persisted acknowledgements, and then call the MemoryBookmarkStore. Adding the subId to the set of persisted acks means that messages that note the most recent bookmark for this subscription will include the information that it gets persisted acknowledgements. The worker that processes the update simply blocks until an update is submitted to the queue, then creates and publishes an update message.

while (true)
{
    UpdateRecord update = _workQueue.take();
    String msg = "{\"clientName\":\"" + _trackedName + "\""
               + ",\"subId\":\"" + update.subId + "\""
               + ",\"bookmark\":\"" + update.bookmark + "\""
               + ",\"persisted\":\""
               + _persistedAcks.contains(update.subId.toString()) + "\""
               + "}";
    _internalClient.publish("/ADMIN/bookmark_store", msg);
}

Finally, we implement two utility methods. The close() method iterates over all of the subscriptions tracked by the store and enqueues a final update for each one, storing the state of the subscription to the SOW.

public void close()
{
    for (Field subId : _discardCounter.keySet())
    {
        try
        {
            _workQueue.put(new UpdateRecord(subId.copy(), getMostRecent(subId)));
        }
        catch (AMPSException e)
        {
            e.printStackTrace();
            // Swallow exception: could also translate to unchecked
            // or log in the object.
        }
        catch (InterruptedException e)
        {
            // Unable to update publish store. Recover at last saved
            // bookmark instead.
            e.printStackTrace();
            break;
        }
    }
    // Wait for the worker thread to drain the queue.
    while (!(_workQueue.size() == 0
             && _workerThread.getState() == Thread.State.WAITING))
    {
        Thread.yield();
    }
    _workerThread.interrupt();
    try
    {
        _workerThread.join();
    }
    catch (InterruptedException e)
    {
        e.printStackTrace();
        // trying to close, continue
    }
    _internalClient.close();
    _internalClient = null;
}

The setPersistEvery() method sets the number of messages to process for a subscription between snapshots of the bookmark into the SOW.

public void setPersistEvery(int messageCount)
{
    _threshold = messageCount;
}

The rest of the bookmark store interface, and the internal logic for managing the in-memory store, is provided by the existing MemoryBookmarkStore.

Using the Store

Using the store from an HAClient is simple. You construct the store with the client to use for persistence and the name of the client to track. For example, the code snippet below creates an AMPSBookmarkStore that uses an existing client:

// control_client is connected to the AMPS instance that hosts the bookmark store
HAClient client = new HAClient("Query-Client");
DefaultServerChooser sc = new DefaultServerChooser();
sc.add("tcp://amps-server:9007/amps");  // Server to subscribe to
client.setServerChooser(sc);

AMPSBookmarkStore store = new AMPSBookmarkStore(control_client, client.getName());
store.setPersistEvery(10);
client.setBookmarkStore(store);
client.connectAndLogon();

Once the bookmark store is set, you use the client just as you would with any other bookmark store. For example, using the Command interface introduced with the 4.0 clients:

BookmarkMessageHandler handler = new BookmarkMessageHandler(client);
client.executeAsync(new Command("subscribe")
                        .setTopic("messages-history")
                        .setSubId(new CommandId("sample-replay-id"))
                        .setBookmark(Client.Bookmarks.MOST_RECENT),
                    handler);

Notice that, just as with any other resumable subscription, we set the subscription ID and we ask to start the subscription from the MOST_RECENT bookmark in the bookmark store. The provided sample also uses a bookmark message handler that exits after processing ten messages: this allows you to see how the bookmark store works for resuming after the last processed message.

Next Steps: High Availability Bookmark Stores

Because the topic that stores the bookmarks is just a normal SOW topic in AMPS, you can add high availability to the bookmark store without changing the AMPSBookmarkStore class itself.

To do this, you’d make the following changes:

  1. Add a transaction log for the /ADMIN/bookmark_store topic.
  2. Replicate the topic to another AMPS instance.
  3. Provide an HAClient as the control client for the AMPSBookmarkStore, with the two AMPS instances in the server chooser for the client.

Wrapping it Up

While this post has been a deep dive into using AMPS itself as a BookmarkStore, the techniques in the post apply to any sort of state your application may need.

Just to review, the overall technique is:

  • Restore state from the SOW when the application starts
  • Periodically save state to the SOW as the application runs
  • Flush the saved state when the application exits

Have a different pattern for maintaining application state in AMPS? Let us know in the comments!

Scale Up and Out


Jeffrey M. Birnbaum describes how 60East builds high performance software.

CIO Review recently published “Scale Up and Out: The Changing Face of High Performance Computing” by Jeffrey M. Birnbaum. This article describes the state of high performance computing, and how things have changed since 1998.

You can get the PDF, or read the article online.



Tip: Easy File Sharing


File sharing: the first (and last?) great IT problem

At 60East, we’re always helping teams analyze complex, interconnected software systems. One thing that’s easy to take for granted is file sharing – something that’s not always easy to do between teams in large enterprises. There are often great reasons for this, but for transmission of log files containing no sensitive information, it can be a real burden. The best way to share files is a shared host or file system, but when two teams that need to share files don’t have any systems in common, there are more efficient options than e-mail or file sharing services.

Let’s look at two of my favorite alternatives on Linux for sharing files between enterprise teams that have no easy way to do so. In this example, a user on system “a.com” wants to share a file named “big.log” with another user on system “b.com”.

We’d never recommend sharing sensitive information using these techniques in environments where it’s forbidden. Know your restrictions and use these at your own risk.


Method 1

The easiest way to share a file is for the file owner to open a simple HTTP server while the other fetches the file using a web browser or wget.

Open an HTTP Server on port 8080 on a.com, using Python:

python -m SimpleHTTPServer 8080
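On Python 3, the module was renamed, so the equivalent command is:

python3 -m http.server 8080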

Fetch the file from b.com (or from any WebBrowser):

wget a.com:8080/big.log

Connecting to a.com:8080... connected.
HTTP request sent, awaiting response... 200 OK
Length: 29627512439 (28G) [application/octet-stream]
Saving to: “big.log”
 1% [>                                  ] 309,681,284  26.3M/s   eta 18m 24s

Wait for the download to complete. The user on b.com should now have the big.log file!


Method 2

The second method uses netcat (aka nc) and is great for file sharing ninjas who may need additional flexibility. The simple form of this method is to open netcat in listening mode on b.com and push the file to the other user from a.com.

On b.com, open the netcat listener on port 8080:

nc -l 8080 > big.log

On a.com, push the file over the network to the listening side:

cat big.log | nc b.com 8080

Wait for the file transfer to complete. The user on b.com should now have the big.log file!


Bonus: Add Inline Compression

If your file is a big text file, then you could benefit from compressing the data before sending it across. Here’s how to edit the previous command to compress the bytes:

On b.com, open the netcat listener on port 8080:

nc -l 8080 | gzip -d > big.log

On a.com, push the file over the network to the listening side:

cat big.log | gzip | nc b.com 8080


Super Bonus: Add a Progress Bar

Transferring large files can be frustrating when there’s no indication of how long it’s going to take. Plus, after the weather chit-chat, the progress indicators give introverted engineers something to talk about during the long pauses of silence when the bits are flying! :-) Let’s add some progress bars to our compressed netcat file transfers!

On b.com, open the netcat listener on port 8080 (this is the same as before):

nc -l 8080 | gzip -d > big.log

On a.com, push the file over the network to the listening side (just replace “cat” with the “pv -tpreb” command):

pv -tpreb big.log | gzip | nc b.com 8080
386MB 0:00:10 [27.9MB/s] [>                        ]  1% ETA 0:16:19


Transfer At Will

With these tricks in your toolbox, we hope you’ll never miss dinner again due to the hours spent trying to send a large file to a colleague sitting in the cubicle across from you.

Do you have a great file transfer technique that gets you home on time? Let us know in the comments!

Fresh From the Lab: AMPS 4.3


At 60East, we’ve been heads down in the lab, mixing up new ingredients to bring you the future of messaging today. We’ve just released AMPS 4.3.1.0.

This release focuses on extending the futuristic capabilities we introduced in AMPS 4.0, making AMPS even better!


Here are just a few of the new ingredients in AMPS 4.3.1.0:

  • Composite message types allow you to combine multiple types of data in a single message type. Among other uses, this feature lets you take advantage of the full flexibility and extensibility of AMPS content filtering, while letting you control how AMPS parses a message.
  • Hash indexing for topics that maintain a State-of-the-World (SOW) database. You can now explicitly create hash indexes over SOW topics. AMPS hash indexes are a lightweight way to optimize common SOW queries. Hash indexes provide a significant performance boost for a query that uses them, with almost no degradation of publish performance. When a hash index is present, AMPS transparently uses the index for queries wherever possible.
  • Explicitly keyed SOW topics let you create SOW topics with an application-provided SOW key. This gives you the ability to create SOW topics over unparseable message types (such as unparsed binary) or to let the application define what makes a record unique, regardless of the data in the message.
  • Replication Compression can significantly reduce the bandwidth required for AMPS replication. This is enabled with a simple configuration change on the publishing instance.
  • Persistent Journal Indexing can dramatically speed up recovery times for instances recovering large journal files. Rather than parsing each journal file to build an index of the messages in the file, AMPS now persists the index for the journal, reducing the recovery time.

These are just some of the improvements available in this release. AMPS 4.3 is available for download now – give it a try!

Stay tuned for blog posts on these new features. To stay informed on what’s new with 60East, subscribe to our newsletter and notification lists.

Composite Message Types: Answers Beyond Ints


It's a brave new world out here!

“I may not have gone where I intended to go, but I think I have ended up where I needed to be.” ― Douglas Adams, The Long Dark Tea-Time of the Soul

In the Hitchhiker’s Guide to the Galaxy, Douglas Adams conveyed to us that the Earth was actually a computer system designed to calculate the meaning of life. SPOILER ALERT: the answer was 42. Unfortunately, not all our answers come out in a concise short int. Many of the simulation farms or grids that AMPS is deployed in produce vast amounts of data. This data is often an intermediate step in a workflow of calculations. In fact, we get asked about the movement of large data sets so often that we previously devoted an entire blog post to the topic. In this article, we are going to highlight how we can be flexible in our treatment of very large messages through the use of the new AMPS composite message types.

Finding The Big Answers

Many people use AMPS to efficiently route messages to their appropriate destinations based on powerful content filtering. With AMPS, it’s straightforward to navigate through complex data in a wide variety of formats including JSON, BSON, FIX, NVFIX, and XML. AMPS also offers an unparsed binary type (or BLOB) that lets us send any data we want, such as a serialized object, but trades off the ability to filter based on the content of the data. Supporting unparsed binary not only avoids the costs of serialization and deserialization, but also allows AMPS to work with the variety of similar formats found in customer deployments. We have also seen such binary formats used to “hide” content from AMPS, for example, large or deeply-nested XML documents that you don’t want parsed. When a message type is declared as binary, AMPS won’t attempt to parse it.

When the messages become very large, people often consider some form of optimization that transforms them into a binary format (i.e. sending a series of one-byte chars as sequential bytes on the wire instead of a JSON array). This can make sense if the data can be optimized effectively: it alleviates memory and network load without incurring too much of a latency hit. The challenge it introduces is that we often lose the capability to do content-based filtering or routing. If we already know where the messages are supposed to go, then we can create the appropriate topic (target) and send them. Unfortunately, the use of explicit topics makes for a more brittle system and incurs the costs of topic management as more and more applications and topics are added to the system.

The answer? AMPS composite message types.

The Answer

With most traditional pub-sub systems, the workaround for handling binary payloads and rigid topic hierarchies was to utilize custom headers as a vehicle to store metadata which could be parsed by the routing agents. With AMPS composite message types, developers can avoid that traditional hack and embrace content filtering on regular message parts that are not limited to any particular size or structure. Unlike headers, composite message types can contain arbitrary data, and are fully filterable.

Fortunately, if we want to leverage optimized payloads while enabling critical content-based filtering, we can employ composite message types. Akin to MIME, the payload can remain untouched while content filtering is performed on the accessible parts or metadata.

In many financial services applications, the actual bulk of the payload are doubles or floating point numbers, and the data that is useful for filtering and routing is metadata and a small subset of critical information. Ideally, we would send the metadata in a format AMPS can filter, while maintaining the bulk of the payload in an efficient binary format.

No single message type meets both these needs, but the AMPS composite message type is ideal for this situation. We can store the information to be filtered on in a JSON part in a preprocessing step. We can combine that part with the BLOB payload to form a composite message type. We can treat it as a regular AMPS message type and filter/route it based on the JSON part of the message, all while maintaining the optimized payload until it is needed.

Composite message types can be treated like any other message type and can be used to create a SOW, conflated topics, or even delta subscriptions. To set your application up, you just need to update your configuration file to declare the composite JSON-binary message type: name the type and declare its parts (the component message types). After that, we just have to bind it to a transport.

<MessageType>
  <Name>composite-json-binary</Name>
  <Module>composite-local</Module>
  <MessageType>json</MessageType>
  <MessageType>binary</MessageType>
</MessageType>

<Transport>
  <Name>composite-json-binary-tcp</Name>
  <Type>tcp</Type>
  <InetAddr>9023</InetAddr>
  <MessageType>composite-json-binary</MessageType>
  <Protocol>amps</Protocol>
</Transport>

Searching the Whole or a Part

AMPS provides two different ways to create an AMPS composite message, depending on how you want AMPS to parse the message.

The composite-local option ensures that an XPath identifier can match any of the parts of the message, and that a filter can match a specific part of the message using the ordinal value of the message part. For example, /0/mymessage = 123.4 would test the mymessage field of the JSON part.

The alternative option is composite-global, which combines all of the parts into a single set of XPaths. This lets you find values without having to know which part of the message contains the value.
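As a sketch of the difference, using the Python client (the topic, field name, and process function are hypothetical):

# With composite-local, field paths are prefixed by the part ordinal,
# so this filter matches only within the JSON part (part 0):
for message in client.subscribe("messages", "/0/scenario = 'baseline'"):
    process(message)

# With composite-global, all parts share a single set of XPaths,
# so the same field would be addressed without the part number:
#     client.subscribe("messages", "/scenario = 'baseline'")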

Build a Message

The AMPS 4.3.1.0 clients include helper classes for building and parsing composite messages. To build a composite message, we just have to create the message and append() each part.

Say we had a large set of doubles that represented the results from a scenario calculation. That would be the binary part of the message. We would then take any essential data from that message and, along with any other enrichment information, create a JSON message, which would be the other part.

std::ostringstream json_part;
std::vector<double> data;

// ...skipping the population of variables...

// Create the payload for the composite message.
AMPS::CompositeMessageBuilder builder;

// insert the json part of the message
builder.append(json_part.str());

// copy the array of doubles into the second message part
builder.append(reinterpret_cast<const char*>(data.data()),
               data.size() * sizeof(double));

// Publish the payload on the topic
std::string topic("messages");
ampsClient.publish(topic.c_str(), topic.length(),
                   builder.data(), builder.length());

And that is all we have to do to create and publish a composite message.

On the subscriber side, we instantiate a CompositeMessageParser to access the distinct parts of the composite message.

AMPS::CompositeMessageParser parser;

We then create our subscription and upon receipt, we can parse the message and obtain the distinct parts with getPart().

for (auto message : ampsClient.subscribe("messages"))
{
    parser.parse(message);
    std::string json_part(parser.getPart(0));
    AMPS::Field binary = parser.getPart(1);
    ...
}

Of course, we don’t have to be so explicit: we could first use the parser object to obtain a count of how many parts the message contains.

std::cout << "Received message with " << parser.size()
          << " parts" << std::endl;

To make the binary payload usable, we convert it back into a vector.

std::vector<double> vec;
double* array_start = (double*)binary.data();
double* array_end = array_start + (binary.len() / sizeof(double));
vec.insert(vec.end(), array_start, array_end);

In this blog post, we looked at how composite messages provide great flexibility for optimizing our messages by having them encapsulate different message types. The primary use cases are payloads that do not need to be parsed by AMPS (i.e. large binaries, custom formats, etc.) as well as cases that benefit from optimized or hidden data. With AMPS, having the metadata available in a supported type such as JSON affords us the added luxury of content filtering on that particular part – without having to parse the binary payload.

In the future, we will analyze more options that could be employed to improve transport optimization and breathe life into those systems still using FIX with Base64-encoded payloads. We can also discuss best practices around Protocol Buffers or Avro, and leverage modern compression systems such as Snappy. In our experience, the best choice of tools and strategies is highly dependent on the use case and on characteristics such as system load, tolerance for latency, and burden of maintenance.

Let us know how you think Composite Message types may help you and we can work through the ideas with you – just “Don’t Panic” (Douglas Adams).

Kudos: I’d like to give some credit to Cory Isaacson, CTO of Risk Management Solutions, for prioritizing the issue of very large messaging about a decade ago. He was looking at 10-20 megabyte structured financial products (i.e. CMOs, CDOs), as well as HL7 and XML documents being passed around, and made us thoroughly explore distributed shared memory and cache, optimized parsers, and compression. There was no perfect general solution: every use case and customer had a different tolerance for trading off latency and agility. Today the options and alternatives have improved; send us an email if this is something you want to discuss, within the context of AMPS or otherwise.

How can a Pure Software Messaging Solution Whip the NIC off a Hardware Messaging Appliance?


Fast cars, fast software, and beautiful scenery.

Why is it that the latest Bugatti Veyron doesn’t have the fastest time at the Top Gear test track? How could a highly regulated and constrained 2004 Renault R24 Formula One car outperform it? The Bugatti outperforms the F1 car in many tests, but not on that prestigious Top Gear track, due to the design trade-offs its designers made. We also don’t know if the driver of the Bugatti was really in tune with the vehicle. Martin Thompson provides an excellent metaphor from racing to software development in his blog Mechanical Sympathy.


"The name "Mechanical Sympathy" comes from the great racing driver Jackie Stewart, who was a 3 times world Formula 1 champion. He believed the best drivers had enough understanding of how a machine worked so they could work in harmony with it." -Martin Thompson


In the software world, “Mechanical Sympathy” comes down to building software with an intimate knowledge of the underlying hardware. That is, if you know your hardware platform, how that platform works, and where it may be advancing, you can optimize your software to realize greater efficiency today and scale along with the advancements.

In this post, we will investigate this principle further and answer the question of how AMPS, running on best-of-breed commodity hardware, can be 2.5 times faster in message processing and delivery throughput than a specialized hardware appliance.

Losing the first Heat but Winning in Record Time in the Final:

Admittedly, a while back AMPS was in a similar situation, outperformed on a test that was like a Top Gear track. The use case was a niche one that relied mostly on our state-of-the-world cache and not many of our other capabilities. A specialized software vendor made us humble as their product outperformed what we thought was the Bugatti of messaging, AMPS. We started the race by out-lapping our opponent, 30X faster on inserts, but after a few more laps we noted we were significantly slower on queries. The AMPS approach is optimized for frequently-updated, changing data. Most AMPS applications use subscriptions as a “continuous query” that delivers results as the row is inserted. For other applications, AMPS employs a sophisticated parallel divide-and-conquer approach to querying. This use case was more of a ‘write once, then distribute’ model. The test provided a priori knowledge of keys and begged for an index-based approach. Divide and conquer wasn’t good enough in this case. We wanted to win!

In a few days, we revamped AMPS’ indexing model to add hash indexes, and we were able to outpace the competitor in terms of queries while still maintaining the 30X insert advantage. To elaborate, indexing schemes in most database systems rightfully optimize reads at the expense of maintaining indices during writes (updating and possibly rebalancing B-Trees, etc.). The result is slow writes/updates but faster reads – a good tradeoff, as most use cases are ‘write once, read many’. By adding hash indexes, AMPS included the ability to perform extremely fast lookups for this scenario while keeping inserts extremely efficient. In the last laps, AMPS was fueled with a combination of statically maintained indexes and on-demand indexes, which allowed AMPS to zoom past the checkered flag in record time.

In summary, AMPS’ hash index model allows one to very rapidly find data for queries that are fully covered by a hash index, as well as to take advantage of the divide-and-conquer on-demand indexing traditionally used by AMPS. Hash indexes brought us back to the Bugatti level, as they dramatically improve AMPS performance for query-heavy (NoSQL) scenarios.

Hardware Messaging Appliances Have a Place

Once in a while we see an analogy to a Formula 1 in the guise of a hardware appliance that claims superior throughput, predictable latency, and management simplicity. Their literature tells us there are scenarios where offloading or redirecting workloads to a messaging appliance makes sense, and if you are doing everything you need to do on the NIC, then a hardware messaging appliance is something that could be useful. By avoiding disk paging and CPU interrupts alone, they argue that they can predictably provide low latency with minimal interruption.

It’s fairly understandable that some people will still think that hardware vendors will always perform faster - it’s a dedicated box with minimal contention and can have registers aligned to the actual message types in the use case. The software driving the hardware can be finely tuned and optimized to both the use case and the customized hardware. The vendor can optimize their TCP stack, employ zero-copy processing and hand off results without leaving the NIC - nothing could be better at receiving, processing and pushing out messages… right?

The premium costs for dedicated appliances make sense only if they do something better (i.e. faster) than best-of-breed commodity hardware and software, if they are simpler to manage, and if you can afford to scale out with more or advanced hardware. That is a lot of “ifs”… perhaps the Formula 1 cars should go race on their own special track.

What if a software solution proves these assertions wrong?


Dedicated hardware is fast - until you need to turn the other direction!

When your needs change, hardware appliances can make you feel like you’re spinning your wheels.


A View to A Trade

So let’s take a real-world example: a View Server system where front office equity traders are constantly changing the data they want to see. With even a dozen traders, let alone 100, this can tax the server technology during peak loads, when the server has to process incoming market streams while quickly serving data requests from the traders, and potentially enriching or validating the structured and unstructured data.

The hardware appliance was able to drive throughput to 1.6 million 1KB messages per second and the AMPS software (running on commodity hardware) was able to realize 4.5 million messages per second – only being bound by the memory, network and CPU.

If we upgraded any part of the hardware, we would hit better numbers – in fact, when we switched from a 10Gb network to 40Gb, we realized 6.5 million messages per second. To read more about this, see our ‘Shock Absorber’ white paper, or if you want particular fan-out details and CPU/network saturation numbers, please get in touch with us. If you need to see it in action, even our evaluation offering on a VM will demonstrate its inherent high throughput (though AMPS works even better on real hardware).

Defending COTS is Hard Work!

AMPS is implemented in regular programming languages and works best on best-of-breed Commercial Off The Shelf (COTS) hardware. It will even scale down to work on a VM, which is great for dev-ops testing and development. When embracing COTS, however, you can’t simply leave your software at the whim of the compiler optimizations, the tuned VM, the OS, and the hardware capacity. Much of the performance gain on COTS comes from understanding each level of the system and spending real effort measuring and fine-tuning every stage of processing.

Measuring Success:

Many of our successful real-world customers are also driven by measurements, and give us throughput targets as well as latency tolerance goals expressed as minimum, median, and maximum times. In our deployments and in our proofs of concept, we actually work toward predicting how well we will perform. The challenging part is calculating the processing step, but we have been doing this long enough that we understand the dynamics and can intelligently inform our estimates of AMPS’ maximum throughput by incorporating the type and size of the memory scheme, the CPU type, the networking capacity, and so on. By ensuring highly concurrent processing, intelligent memory usage strategies, and a lot of common sense, we hit our predicted rates.

When we hear that AMPS provided 2.5-4X more throughput in different cases, we aren’t simply ecstatic. Instead, we reassess whether that 2.5X was a theoretical bound, or whether we were doing something wrong that prevented us from hitting the real bound.

It is due to this philosophy that AMPS is deployed in some of the largest and most successful high-volume trading systems in the US, and AMPS scales with best-of-breed COTS. Given AMPS’ effective use of resources, you can do much with one server, which reduces the need to scale AMPS out. It’s so well behaved that it is often hosted on a multi-tenant server. If you upgrade CPUs, memory, or network, AMPS scales to take advantage of the new capacity. It can scale down to run on a VM for development and testing, and up to 400 CPUs. We set aggressive goals for ourselves, because that is what both our clients and we have come to expect.

The Secret Sauce

The secret sauce is simple: don’t follow all of Google’s C++ standards :), implement amazingly fast XML, FIX, and JSON parsers, exploit concurrency with lock-free data structures, and build for the enterprise. That last point, ‘the enterprise’, is vital: AMPS has a simple model for deployment, configuration, and updates, along with enterprise-level high availability, monitoring, and admin tools. Due to the low cost of AMPS, we have seen deployments in the thousands with minimal dev-ops support requirements.

Still not convinced? Our many-trick pony also has rich data selection capabilities in its multi-language APIs that ensure AMPS only sends the data that needs to be sent. This reduces the risk of broadcast storms or network contention, and minimizes CPU waste due to over-subscription. Developers can leverage a state-of-the-world data cache of the most recent values, complex aggregations and calculations, conflated messages that deliver data updates at an interval, out-of-focus notifications, and the greatest invention of all time: a transaction log that can be used for message replay, failover, and resuming subscriptions at the exact point where a subscriber restarted. None of these capabilities is implemented by trading off throughput performance – quite the opposite. They increase holistic system performance and resilience.

We are fortunate to say that we win a lot of races, but we know we have to keep getting better and fully embrace the “Mechanical Sympathy” principle. The hardware messaging appliance did a great job serving 1.6 million messages per second, but the simple yet finely tuned software vanquished all doubt by realizing 4.5 million 1KB messages per second in its ‘slowest case’. When we added the 40Gb network, we hit over 6.5 million messages per second. A little COTS can go a long way!


The NBA of Data Science


We love basketball: especially where data is involved! I’ve been pondering for a while how to showcase the replay functionality in our AMPS product in a way that’s general enough that everyone understands the concept, yet provides a solution metaphor that easily translates into other domains. Ideally, the data would come from some real-world system where time-series, ordering, and content filtering could be useful (again, leveraging the features of the product I’m trying to explain).

One thing is for certain: whether you’re building a demo for your product or just trying to practice skills in big data, YOU NEED DATA. While Fisher’s Iris data set from the 1930s is great for textbook explanations of clustering, k-means, and some machine learning concepts, it doesn’t really exercise the cutting-edge technologies designed for today’s large-scale problem domains.

Earlier this month, I came across one of the most fascinating datasets I’ve seen to date: the NBA motion tracking database. Since the 2013-2014 season, every NBA court has had 6 cameras elevated above the playing field that sample the 10 players and the ball (in 3 DIMENSIONS!) 25 times per second. That’s HUGE! You have a list of distinct players, teams, and shot locations, all packaged into a beautiful time-series-accessible data set. All game events combined come to around 6 BILLION events from 2013 through 2015, including the standard games, playoffs, and all-star games. The ball metrics include x, y, and radius, which together track the ball in all 3 dimensions throughout the game.
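If you want to poke at the raw feed yourself, the general shape of a request is sketched below. Since there is no documented public API (see [1]), the endpoint path and parameter names here are guesses reconstructed from watching what the stats.nba.com pages request; treat all of them as assumptions:

# Hypothetical sketch: the endpoint path, parameter names, and values are
# assumptions based on watching stats.nba.com pages, not a documented API.
import requests

resp = requests.get(
    "http://stats.nba.com/stats/locations_getmoments/",
    params={"eventid": "308", "gameid": "0041400235"},
    headers={"User-Agent": "Mozilla/5.0"},  # the site tends to reject bare clients
)
moments = resp.json()  # large JSON object with player and ball coordinates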

Without a doubt, knowing that the NBA is tracking all of this data and making it accessible [1] has made BASKETBALL MY FAVORITE SPORT (sorry, Soccer!)

Why would the NBA be doing this? Just look at the stats on http://stats.nba.com, which track certain player performance metrics: speed, shot percentages, court coverage, etc. With this data, team owners could even do their own data science projects determining the rate at which certain players fatigue, becoming less effective. Or, imagine A/B testing combinations of players on your team to find the best lineup for the playoff games.

For my purpose, I just downloaded a subset of the data and assembled a visualization where I used AMPS for high performance replay of the messages to my browser using the AMPS Javascript client API and then ran visualizations through D3.js. Given that the data is tracking physical objects over time, it makes for a fun visualization that requires little explanation. If you want to check it out, it’s here: http://replay.demo.crankuptheamps.com.

Bottom Line: If you’re searching for cool data to explore data science and/or visualization techniques, you should totally swing on over to http://stats.nba.com and take a gander – it’s so much fun. Don’t let the lack of documentation discourage you; there are some great resources available, as people are already making use of this dataset. Below are some resources to get you started [2][3] if you’re new to this.

Plug: If you’re interested in the fastest message replay engine (which happens to support pub/sub, message queueing, SQL queries, historical queries, content filtering, aggregation and conflation… and so much MORE), then please check out AMPS at http://www.crankuptheamps.com.

Thanks!

[1] “accessible” is used loosely here, since I couldn’t find any public APIs or documentation on what the data contains. Luckily, it’s large JSON objects, and they’re used by most of the pages on stats.nba.com, making it easy to understand what the data contains.

[2] How to scrape the NBA data: http://www.gregreda.com/2015/02/15/web-scraping-finding-the-api/

[3] For info on how to build (and get inspired by others!) visualizations, check out the d3js.org gallery.

Comparison: AMPS and RabbitMQ


I'm late!

The Advanced Message Processing System (AMPS) from 60East Technologies is used in production for thousands of enterprise messaging applications. These applications use AMPS because they have the most demanding throughput and latency requirements for publish/subscribe messaging. These applications also take advantage of the AMPS durable message storage, historical replay and audit, global replication and high availability, sophisticated aggregation and analytics, and more. The 5.0 release of AMPS adds durable message queues, built on the proven AMPS engine for the highest levels of reliability and performance.

To illustrate the performance of AMPS queues, our engineers ran a head-to-head comparison against RabbitMQ, the most popular message queueing product in use today. This document captures the performance testing results and compares features and functionality. While AMPS supports multi-messaging paradigms and provides extensive capabilities beyond message queueing, this paper focuses on the features of AMPS that are most relevant to solutions benefiting from message queueing. A majority of the AMPS user base deploys AMPS into enterprises that cannot tolerate message loss or low performance. AMPS Queues are durable by default and are able to achieve 40x the throughput of RabbitMQ with better durability and message delivery guarantees.


Performance Comparison

In this comparison, durable queues were used in both AMPS and RabbitMQ. The AMPS test used the AMPS C++ client library, and the RabbitMQ test used the alanxz/rabbitmq-c library. For both products, the consumers were set to acknowledge consumption in batches of 80 messages. For RabbitMQ, 60East set the prefetch count for each subscriber to 100 messages in an attempt to achieve higher performance and reduce the time waiting for new messages to arrive. For AMPS, the maximum backlog of the subscription was set to 100 messages and the proportional distribution method was used in order to optimize for subscriber efficiency. This approach is the most compute-intensive distribution method for the server.
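To give a feel for the consumer setup (the benchmark harness itself used the AMPS C++ client; the full test code is in the repository linked at the end of this article), a Python consumer along the same lines might look like the minimal sketch below. The queue topic name is invented, and the max_backlog option string and the client.ack() call reflect our reading of the AMPS queue API, so treat them as assumptions:

# Minimal sketch of an AMPS queue consumer (Python client). The topic name
# "orders-queue" is hypothetical; the benchmark itself used the C++ client.
import AMPS

client = AMPS.Client("queue-consumer")
client.connect("tcp://localhost:9007/amps/json")
client.logon()

def on_message(message):
    # ... process the message here ...
    client.ack(message)  # acknowledgments are delivered to the server in batches

command = AMPS.Command("subscribe") \
              .set_topic("orders-queue") \
              .set_options("max_backlog=100")  # cap on unacknowledged messages
client.execute_async(command, on_message)

The backlog cap is what allows a consumer to acknowledge and process concurrently, without polling for the next message.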

All tests were run in the 60East Engineering Lab. Both products were tested on the same system, a Supermicro SYS-1028U-TNR4T+ with 2 Intel Xeon E5-2690 v3 processors (12 cores each) and 128GB of memory. All tests were run over the loopback interface, with both publishers and subscribers on the same system.

Because AMPS is able to completely saturate the storage on most systems, our engineers ran the test persisting the queues to two different storage devices. The first test used a mid-range, enterprise-class, 7200RPM spinning drive for the queue persistence.

HDD Publish Performance

HDD Consume Performance

For all message sizes, AMPS consistently outperforms RabbitMQ, by factors ranging from 2X to 15X for both publishers and subscribers. More interesting, though, is that while AMPS performs better with smaller messages – as would be expected, since the storage device can handle more messages in the same amount of I/O bandwidth – RabbitMQ provides consistent performance regardless of the message size, which indicates that its performance is a result of the RabbitMQ implementation rather than I/O constraints.

Most production installations of AMPS are for applications that require the highest levels of throughput and lowest levels of latency. To test performance under these conditions, our engineers moved the persistence to an Intel DC P3700 NVMe PCIe solid state storage device.

SSD Publish Performance

SSD Consume Performance

As in the hard disk scenario, AMPS performed significantly better, with results 24x to 40x better for publishers and 16x to 30x better for subscribers. The most surprising result was that RabbitMQ seemed to gain little or no performance benefit from higher-bandwidth I/O. Notice that, for all tests, RabbitMQ provides consistent performance regardless of the amount of traffic being delivered to the drive. AMPS, in contrast, fully uses the I/O capacity of the drive, and scales with the usage of that capacity.

There are two notable differences in performance. The first is magnitude: AMPS was faster across the board in testing. The second is device scalability: AMPS performance dramatically improved with a faster device, while RabbitMQ’s low performance remained constant, providing no material benefit from the upgraded storage device.


This behavior is a result of the difference in persistence guarantees between the products. AMPS fully persists messages before providing them to subscribers, which means that AMPS performance scales in proportion to the bandwidth of the storage device. In contrast, RabbitMQ uses a strategy that batches calls to fsync [1], which means that RabbitMQ does not depend on I/O throughput and has constant performance regardless of the amount of bandwidth made available.
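The two strategies are easy to picture with a toy sketch. This is illustrative only; it is not how either product’s storage engine is actually written:

# Toy illustration of the two durability strategies described above.
import os
import time

def persist_each(fd, messages):
    # Durable-before-delivery: every message is synced before a consumer can
    # see it, so a faster storage device directly raises throughput.
    for m in messages:
        os.write(fd, m)
        os.fsync(fd)

def persist_batched(fd, messages, interval=0.2):
    # Interval-batched fsync: durability lags by up to `interval` seconds, and
    # the device sits mostly idle between syncs, so a faster device doesn't
    # change the delivered rate.
    last_sync = time.time()
    for m in messages:
        os.write(fd, m)
        if time.time() - last_sync >= interval:
            os.fsync(fd)
            last_sync = time.time()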

This explains the difference in scalability between RabbitMQ and AMPS. But that’s not the most striking result in the research.

Why are the differences in performance so large?

What’s Going On?

The difference is not a result of the test or the configuration. The difference is a result of how AMPS is engineered. AMPS was designed to achieve peak performance and scale on modern multi-core hardware. The software architecture consists of a super-pipelined processing engine that allows maximum concurrency, leveraging every ounce of CPU to complete tasks. The core engine has been thoroughly NUMA-optimized to ensure low jitter and effective use of memory bandwidth. Finally, the entire system has been coded with exacting attention to detail: every data structure, every cache line, every thread, every algorithm, every point where threads interact has been carefully considered to produce a finely tuned processing engine. The result is a system built for “forward scaling”: as new processors, memory technologies, storage devices, and networking devices come to market, the performance delta between RabbitMQ and AMPS will continue to increase. The differences don’t stop at performance, though. The feature-rich queues are a story unto themselves.

Message Queues Reinvented

Through the looking glass, to a better world of queueing.

AMPS takes a completely different approach to message queuing than RabbitMQ. With AMPS queues, all messages are recorded into the AMPS transaction log, a durable, high-performance, fully-queryable sequential message store engineered to support applications that require sustained throughput of millions of messages a second. However, a queue is only one of the ways the AMPS engine can distribute a message. In effect, each queue is an independent view of the messages within the AMPS transaction log. As messages are delivered and consumed, AMPS simply tracks the state of that message in the view. The producer of a message needs no knowledge of either the distribution model or the consumer of the message. New queues, new consumers, and new applications can easily be added with no changes to the producer.

In addition to blazing-fast performance, AMPS provides other benefits:

Ultimate Routing Flexibility AMPS provides the ultimate flexibility in decoupling producers from consumers. The full power of AMPS topic matching (including regular expressions in the topic) and content filtering (a combination of XPath and SQL-92 that includes regular expressions, calculated expressions, and user-defined functions) is available for message routing. With AMPS queues, applications are no longer limited to topics, routing keys, or matching on simple key/value pairs to determine delivery. AMPS completely decouples publishers from the details of how the queues are arranged or the ways consumers will use messages. In fact, AMPS can populate queues from its transaction log, so messages can later appear in queues that did not exist at the time the message was published.
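As a quick hypothetical illustration of that flexibility (the topic pattern and field names here are invented), a single subscription can combine a regular-expression topic with a content filter:

# Hedged sketch: regex topic matching plus a content filter on the body.
import AMPS

client = AMPS.Client("routing-example")
client.connect("tcp://localhost:9007/amps/json")
client.logon()

def on_message(message):
    print(message.get_data())

# matches topics "orders-us" and "orders-emea", filtered by message content
client.subscribe(on_message, "orders-(us|emea)", "/region = 'EMEA' AND /qty > 100")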

Multiple Distribution Models AMPS fully supports multiple distribution models, even for the same message. From a single publish, AMPS can provide the message to any number of pub/sub subscriptions, archive it for audit or back testing, aggregate it into a view, replicate it to a disaster recovery site and enqueue the message for distribution to workers in queues across multiple regions. The publisher does not need to know how AMPS or any of the consumers will use the message.

Aggregation and Complex Event Processing AMPS is content-aware and includes a sophisticated aggregation and event processing engine. AMPS allows the user to create a real-time aggregated view of the contents of the queue as messages are added to and consumed from the queue. With AMPS, the user’s view of the messages in the queue isn’t limited to queue depth or consumption rate. Aggregation provides true insight: for example, a monitoring application can track the total value of orders in the queue automatically, in real-time, and bring your key business metrics and risk into focus.

High-Performance Delivery and Fairness Models Message distribution in AMPS is optimized for high performance and processing efficiency. Traditional queues require consumers to poll the queue periodically, which adds latency and network overhead. AMPS works on a subscription model that provides messages to a consumer as soon as the message is available.

Consumers can declare a backlog, which sets the maximum number of unacknowledged messages provided to a consumer at a given time. In addition, consumers can concurrently acknowledge and process messages. This smart pipelining allows the consumer to run at full capacity without blocking to wait for the next message from the queue.

Delivery fairness models allow the user to tune each individual queue for lowest overall latency, equal balance across processors, or most efficient use of client resources. Fairness is applied across all consumers for each message, and AMPS manages the consumer backlog according to the queue fairness model. Unlike systems that allow a consumer that requests a large batch size to starve other processors, AMPS distributes each message according to the fairness model. This helps ensure that work reaches each consumer, and prevents a single “greedy” consumer from consuming all of the available messages, thus slowing down the overall message processing system.

Many existing queuing products present an interface that is similar to AMPS by polling in a background thread and allowing a client to retrieve multiple messages at the same time. AMPS provides the programming advantages of that model, while solving the underlying problems in a unique way that provides dramatically better performance.

Distributed Queuing and High Availability The AMPS high-availability features fully support distributed queueing using AMPS replication. Replication in AMPS is fault-tolerant and resilient even over WAN connections or if one of the instances is restarted. AMPS uses a unique, patent-pending method to manage message distribution and distributed queueing without partitioning. AMPS instances automatically use remote instances to absorb processing load when processing on a local instance is unable to keep up with the rate at which messages are enqueued. If a replicated instance goes offline and comes back online, AMPS replication automatically synchronizes messages between that instance and its replication peers. With AMPS, there’s no need to trade-off between highly-scalable and highly-available. AMPS provides both in the same deployment.

Content-Based Entitlement Control The AMPS entitlement system is built from the ground up to be content-aware, for precise control over entitlements. The entitlement system provides the ability to grant permissions to both publishers and subscribers based both on the topic and the content of the message, using the full power of AMPS expressions (including custom functions) to specify the content to which a producer or consumer is entitled.

Beyond Queueing

AMPS is built from the ground up to be a complete platform for data-intensive applications that demand the highest levels of performance and throughput. In addition to message queues, AMPS provides other unique capabilities which make it easy to build high-performance applications:

State of the World Database AMPS state of the world databases provide quick access to the current value of each distinct message on a topic. For pub/sub systems, this capability provides a way for a subscriber to quickly retrieve current values. The state of the world database provides full query capability, and also offers an optional key/value index for retrieval times that meet or exceed popular NoSQL key/value store databases. AMPS also provides the ability to atomically retrieve current state and enter a subscription for updates, ensuring no duplication or message loss.
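In the AMPS Python client, the atomic query-then-subscribe described above is a single call. A minimal sketch, assuming a JSON topic named "orders" with a /symbol field (both invented for illustration):

# Hedged sketch: atomically query current values, then receive live updates.
import AMPS

client = AMPS.Client("sow-example")
client.connect("tcp://localhost:9007/amps/json")
client.logon()

def on_message(message):
    # 'sow' records arrive first, followed by live publishes
    print(message.get_command(), message.get_data())

client.sow_and_subscribe(on_message, "orders", "/symbol = 'TSLA'")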

Message Recording and Replay As mentioned earlier, the AMPS transaction log is fully queryable. Applications can start a subscription from any point in the transaction log. AMPS replays messages from the log, in the original order. Once replay is complete, the subscriber receives any new messages that arrive. AMPS allows subscribers to pause and resume replay, as well as set a maximum rate for the replay.
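Continuing the sketch above, a replay subscription needs only a starting bookmark; "0" (the EPOCH bookmark) replays the topic from the beginning of the transaction log and then rolls into live messages. (The bookmark store article later in this document shows a fuller example using an HAClient.)

# Hedged sketch: replay "orders" from EPOCH ("0"), then continue live.
client.bookmark_subscribe(on_message, "orders", "0")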

AMPS provides even more advanced features such as Out of Focus notifications for detecting when a message is no longer relevant, incremental (delta) updates to messages, and more.

If you’re ready to explore the diverse features of the fastest messaging product, please try an evaluation of AMPS, available from http://www.crankuptheamps.com/evaluate.

60East provides the full details of the test and the code we used at the github repository at https://github.com/60East/comparison-rabbitmq.

Just to keep things fun, we’ll send a free Version 5.0 T-shirt to the first 50 people that download AMPS 5.0 and email the full version number to support@crankuptheamps.com with their mailing address and shirt size.


[1] As described at https://www.rabbitmq.com/confirms.html, "The RabbitMQ message store persists messages to disk in batches after an interval (a few hundred milliseconds) to minimise the number of fsync(2) calls, or when a queue is idle. This means that under a constant load, latency for basic.ack can reach a few hundred milliseconds." Further, applications must be aware of this. After a restart or failover, when consuming messages from a durable queue "... the client could reasonably assume that the message will be delivered again. This is not the case: the restart has caused the broker to lose the message. In order to guarantee persistence, a client should use confirms."


Copyright © 2016 All rights reserved. 60East, AMPS, and Advanced Message Processing System are trademarks of 60East Technologies, Inc. RabbitMQ is a trademark of Pivotal Software, Inc. All other trademarks are the property of their respective owners.

AMPS 5.0: Finely-Tuned Messaging


AMPS 5.0 is now available - image by mitch huang

AMPS 5.0 is now available.

This version of AMPS builds on the technology in previous releases to refine existing features and bring all new capability to AMPS.

With this release, AMPS provides extremely high performance persistent message queues. The message queues include a variety of fairness models, and an innovative approach to message delivery that eliminates polling and keeps processors running at full capacity while there is work to do. Message queues work seamlessly with the AMPS entitlement system. Replication is queue-aware, and AMPS includes a patent-pending method for distributed queueing while maintaining the highest levels of performance and strict delivery guarantees.

AMPS 5.0 also includes:

  • Replication Validation. To help troubleshoot replication configuration, AMPS now confirms that replication connections reach the expected instances and that those instances have the expected configuration.
  • Google Protocol Buffers. AMPS now provides full support for Google Protocol Buffers, including content filtering and delta messaging.
  • Rate Control on Bookmark Replay Subscriptions. When replaying messages from the transaction log, subscribers can now specify a maximum rate for bookmark replay.
  • Pause and Resume on Bookmark Subscriptions. Subscribers can now pause subscriptions and resume them at a later time.
  • Improved Slow Client Protection. Slow client protection now provides per-instance, per-transport, and per-client limits. Self-tuning defaults make configuration simpler and make it less likely for a large query to trigger slow client protection.
  • Usability and Performance Improvements. AMPS 5.0 includes many other improvements that both increase performance and make AMPS easier than ever to configure and use.

AMPS 5.0 is available now. Download an evaluation copy today, and let us know how the new features work for your applications!

bFAT: Messaging with Extra Cheese


bFAT is like a thick, juicy hamburger for your messaging apps. image by Jeffrey Bary -- CC BY 2.0 Here at 60East, we’re dedicated to messaging.

All messaging. All the time. Day in and day out.

And that’s why we’re thrilled to bring you the newest, best, hottest, juiciest new message format.

We call it bFAT. Check it out at bfat.io.

Would you like fries with that?

Keeping State in AMPS, Rebooted


keep track of where your subscription is at with this simple bookmark store -- image by Simon Cocks -- CC BY 2.0 In this post, we will revisit the topic of extending the AMPS client to provide a bookmark store for stateless clients. This post is in response to requests for a simple stateless store that does not provide all of the functionality of the local stores, but instead just makes it possible for an application with very simple needs to pick up a subscription after failover. The implementation that we will discuss here is fairly limited, but should provide a starting point for less restrictive implementations. Before we begin, I would like to encourage you to read No Filesystem? No Problem! Keeping State in AMPS in order to gain a deeper understanding of bookmark stores.

Maintaining a Bookmark Store in AMPS

This implementation of the bookmark store will do two things:

  1. Load state from AMPS during subscription or re-subscription.
  2. Update the bookmark store as messages are processed by the client.

These two operations maintain the point at which a particular subscription should resume on recovery.

As with the bookmark store in the previous post, we can avoid state on the client by updating a State-of-the-World (SOW) topic with the bookmark of a message when it is processed by a subscriber. It is worth noting that the instance that contains the store SOW topic does not need to be the instance that the subscription is placed on.

Restrictions of this Implementation

As mentioned before, the implementation discussed in this post will be quite limited. The following restrictions will apply to this store:

  • Bookmark Live subscriptions are not supported.
  • Messages are required to be discarded in the exact order that they are received.
  • Replicated topics should not be used with this store.
  • You must use this store with an HAClient.
  • The client provided to the bookmark store MUST NOT be the client placing the subscription.

Bookmark Store Messages

The messages that will be sent to the SOW bookmark store are of the following form:

{"clientName":"trackedClient","subId":"1","bookmark":"13948331409633568391|1465596069899000013|"}

For this example we are using JSON, but any message type can be used. For a simple bookmark store, we only need these 3 pieces of information because AMPS enforces the following 3 rules:

  1. The client name must be unique, as is the case anytime you are using a transaction log.
  2. A subId must be unique to each client. The same subId can be used with different clients. If no subId is provided to AMPS, one is automatically assigned.
  3. The bookmark corresponds to a unique message in the transaction log. The bookmark value that we record in the bookmark SOW represents the last message processed by the subscriber.

A more complex bookmark store may require message fields to indicate if a message has been discarded, or if a persisted ack has been sent. Since we have designed this bookmark store for a very limited use case, we don’t need to worry about that here.

Now that we know what our messages will look like, it’s time to configure AMPS!

Configuring AMPS

The SOW needs to be configured so that we only have one record per clientName and subId pair. We do this by making these our Key values.

<SOW>
  <TopicDefinition>
    <Topic>/sample/bookmarkStore</Topic>
    <FileName>./data/sow/bookmark.sow</FileName>
    <MessageType>json</MessageType>
    <Key>/clientName</Key>
    <Key>/subId</Key>
  </TopicDefinition>
</SOW>

Since we are using a JSON message type, the server must have a transport configured that is able to accept JSON messages. Note that just because we are using JSON messages for the bookmark store does not mean that the message type must be JSON for the bookmark subscription.

Working with the Bookmark Store Interface

Unlike the previous blog article, we will not be calling method implementations used in other stores. Instead, we will define each method in the sow_bookmark_store class. The following methods will be implemented:

  • set_server_version(version) Internally used to set the server version so that the store knows how to deal with persisted acks and calls to get_most_recent(subid).
    • Though this method is required, we will not be using it because this implementation requires that messages be processed in-order. As such, acks will not be processed.
  • get_most_recent(subid) returns the most recent bookmark from the store. This bookmark should be used for (re-)subscriptions.
  • is_discarded(message) is called for each arriving message to determine if the application has already seen this bookmark. If it has, then the message should not be reprocessed.
    • Since we are requiring that messages are processed in-order, and this store does not provide any duplicate detection, this will always return False.
  • log(message) is used to log a bookmark into the store and return the bookmark sequence number. The bookmark sequence number is the internal location where the store recorded the bookmark for this message.
    • Since we will only ever have one record per clientName and subId pair, the sequence number does not matter. Thus, we will always return 1.
  • persisted(subid, bookmark) marks all bookmarks up to the provided one as replicated to all replication destinations for the given subscription.
    • This is only used for Bookmark Live subscriptions and Replication. Since our sample bookmark store will support neither, we can leave this unimplemented.
  • discard_message(message) marks a message as seen by the subscriber.
  • discard(subid, seqnumber) is deprecated, so we will not be implementing it.

Defining the sow_bookmark_store Class

Now we will implement the class for the SOW bookmark store. The bulk of the work is done when we initialize the class. The __init__ method takes 3 arguments, in the following order:

  1. bookmark_client: This is the client that will become our internal client for the bookmark store. It must be connected and logged on.
  2. topic: The SOW topic defined in the config that will function as our bookmark store.
  3. tracked_client_name: The name of the client whose bookmarks this store manages.

Again, the bookmark_client does not need to be connected to the same AMPS server as the subscriptions being tracked. The client corresponding to the tracked_client_name must be an HAClient.

import json
import AMPS

class sow_bookmark_store(object):
    def __init__(self, bookmark_client, topic, tracked_client_name):
        """ Class for creating and managing a SOW bookmark store

        :param bookmark_client: The client that will become the bookmark store internal client
        :type bookmark_client: AMPS.HAClient
        :param topic: The SOW topic defined in the config that will be used for the bookmark store.
        :type topic: string
        :param tracked_client_name: The name of the client whose bookmarks we will be storing.
        :type tracked_client_name: string
        :raises AMPS.AMPSException: if the internal client fails to query the SOW.
        """
        self._internalClient = bookmark_client
        self._trackedName = tracked_client_name
        self._topic = topic
        self._mostRecentBookmark = {}
        try:
            # one SOW query builds the subId -> bookmark map for this client
            for message in self._internalClient.sow(self._topic,
                                                    "/clientName = '%s'" % self._trackedName):
                if message.get_command() != 'sow':
                    continue
                data = message.get_data()
                bookmark_data = json.loads(data)
                if 'bookmark' in bookmark_data and 'subId' in bookmark_data:
                    self._mostRecentBookmark[bookmark_data['subId']] = bookmark_data['bookmark']
        except AMPS.AMPSException as aex:
            raise AMPS.AMPSException("Error reading bookmark store", aex)

The __init__ method is responsible for getting the most recent bookmark for all subIds corresponding to the tracked_client_name. This operation is performed in __init__ as opposed to get_most_recent(subid) as a performance enhancement. Instead of issuing a SOW query for each subId, we can issue one SOW query and create a dictionary from the results. Doing the work in __init__ also allows us to throw an exception if the _internalClient is not able to reach the server.

Subscribing and Recovering

Upon subscribing or recovering, get_most_recent(subid) will be called. This method is responsible for returning the last bookmark processed for the corresponding subscription identifier.

def get_most_recent(self, subid):
    """ Returns the most recent bookmark from the store that ought to be used for (re-)subscriptions.

    :param subid: The id of the subscription to check.
    :type subid: string
    :returns: mostRecentBookmark[subid] or '0'
    """
    # if we have a most recent value for that subId, then we'll return it;
    # if not, we return EPOCH
    if subid in self._mostRecentBookmark:
        return self._mostRecentBookmark[subid]
    else:
        return '0'

This method will simply check the _mostRecentBookmark dictionary for the subid. If we find it, we will return the bookmark stored in the dictionary. If we do not find that key, then we assume that this is a brand new subscription. As such, we return EPOCH.

Publishing to the Bookmark Store

Since one of the requirements is that messages are processed in the order they are received, we can tell the bookmark store that a message was processed by the subscriber any time discard(message) is called. To mark a message as processed, all we need to do is publish the bookmark of that message to the store using the message format mentioned above.

def discard_message(self, message):
    """ Mark a message as seen by the application.

    :param message: The message to mark as seen.
    :type message: AMPS.Message
    :raises AMPS.AMPSException: if the internal client cannot publish to the server.
    """
    subid = message.get_sub_id()
    bookmark = message.get_bookmark()
    # don't record messages that lack a subscription id or a bookmark
    if bookmark is None or subid is None:
        return
    msg = '{"clientName": "%s", "subId": "%s", "bookmark": "%s"}' % \
          (self._trackedName, subid, bookmark)
    try:
        self._internalClient.publish(self._topic, msg)
        self._mostRecentBookmark[subid] = bookmark
    except AMPS.AMPSException as aex:
        raise AMPS.AMPSException("Error updating bookmark store", aex)

The subscription identifier and bookmark for the message passed to this method can be retrieved by calling get_sub_id() and get_bookmark(), respectively, on the message object. You will notice that this method is called discard_message(message), not discard(message), but the client will still call discard(message). Before publishing, we check that subid and bookmark are set: this prevents messages that shouldn’t be there from entering our bookmark store. For example, a subscriber calling discard on every message that it sees could update the bookmark store with a message that does not contain a bookmark; if we were to save that message to the store, we might store an empty or invalid bookmark and recover from the wrong point.

The log Method

For bookmark stores that will support out-of-order message processing, log(message) is responsible for assigning a sequence number to a bookmark, then publishing this information to the bookmark store. The method then returns the sequence number that it assigned to the message.

def log(self, message):
    """ Log a bookmark to the store.

    :param message: The message to log in the store
    :type message: AMPS.Message
    :returns: The corresponding bookmark sequence number for this bookmark.
    """
    # since we only ever have one SOW record per _trackedName and subId pair,
    # this can always return '1'
    return '1'

In this bookmark store, we require that messages are processed in order. Since this is the case, we do not need to assign a unique sequence number to each message. Instead, we can return ‘1’, since there will be at most one message for each _trackedName and subscription identifier pair.

Checking if a Message is Discarded

In a bookmark store that supports out-of-order message processing, is_discarded(message) returns a boolean value indicating whether a message has been discarded. During replay, AMPS checks whether a message is discarded before delivering it, to prevent messages that have already been processed by the application from being delivered again.

def is_discarded(self, message):
    """ Called for each arriving message to determine if the application has already
    seen this bookmark and should not reprocess it. Returns True if the bookmark
    should not be re-processed, False otherwise.

    :param message: The message to check
    :type message: AMPS.Message
    :returns: True or False
    """
    # since messages are processed in order, we never see a discarded message
    return False

As mentioned before, the bookmark store that we are building is designed to process messages in the order that they are received. As such, we will never have a situation where a message that is discarded will be sent to a subscriber. This being the case, we can always return False.

This concludes the methods that need real implementations. However, there are two more methods that need to exist.

Unimplemented Methods

The first of these methods is set_server_version(version). This method is used to tell AMPS how to handle persisted acks. Based on the in-order processing of messages, we will not need to concern ourselves with acking. For this reason we can leave this method unimplemented.

def set_server_version(self, version):
    """ Internally used to set the server version so the store knows how to deal
    with persisted acks and calls to get_most_recent().

    :param version: The version of the server being used.
    :type version: int
    """
    pass

The next method is persisted(subid, bookmark). This method is used to mark all bookmarks prior to the provided one as persisted. Persisted acks are necessary for Bookmark Live subscriptions, but our implementation will not support this functionality. With that in mind, we do not need to concern ourselves with implementing this method.

def persisted(self, subid, bookmark):
    """ Mark all bookmarks up to the provided one as replicated to all replication
    destinations for the given subscription.

    :param subid: The subscription Id to which the bookmark applies
    :type subid: string
    :param bookmark: The most recent bookmark replicated everywhere.
    :type bookmark: string
    """
    # Bookmark Live and Replication are not supported, so this does nothing.
    pass

It is also worth noting here that there was previously an option to discard based on subscription identifier and sequence number. This method has been deprecated and should not be implemented.

Using the Store

To use the bookmark store we will need to follow these simple steps:

  1. Create a client for the bookmark store to use. This client must be connected and logged on to the instance that contains the SOW topic for the bookmark store. This need not be the instance that contains the topic the application will subscribe to.
  2. Construct an HAClient for your application to use.
  3. Set the bookmark store for the HAClient.
  4. Call discard(message) on each message when your application is done with it.
class handler:
    def __init__(self, client):
        self._client = client

    def __call__(self, message):
        print message.get_data()
        self._client.discard(message)

bkmrkchooser = AMPS.DefaultServerChooser()
bkmrkchooser.add("tcp://localhost:9007/amps/json")
bkmrkclient = AMPS.HAClient("bkmrk")
bkmrkclient.set_server_chooser(bkmrkchooser)
bkmrkclient.connect_and_logon()

chooser = AMPS.DefaultServerChooser()
chooser.add("tcp://localhost:9007/amps/json")
haclient = AMPS.HAClient("haclient")
haclient.set_bookmark_store(sow_bookmark_store(bkmrkclient, "/sample/bookmarkStore", "haclient"))
haclient.set_server_chooser(chooser)
haclient.connect_and_logon()
haclient.bookmark_subscribe(handler(haclient), "orders", "recent")

Every time haclient processes a message via the handler class, self._client.discard(message) will be called. This will keep the bookmark store up-to-date with the messages processed by the subscriber.

Closing Thoughts

Most bookmark stores protect against duplicate message delivery. The store that we created in this post does not, and it can cause duplicate messages to be sent to your application on recovery. One way this can happen is when the AMPS instance containing the bookmark store becomes unavailable. This would result in the store implementation being unable to update the SOW that contains the bookmarks.

If, at this point, the subscriber were to restart, the store would be one message behind. If more than one message has been processed before the subscriber restarts, the store would be further behind.

With that in mind, if your application can tolerate duplicate messages, then this simple implementation should work for you!

Get the code

The bookmark store implementation, a configuration file that includes the SOW configuration, and a simple sample program can be downloaded here.

Try AMPS NOW With Cloud Evaluation Beta


Power of the CLOUD

The Advanced Message Processing System (AMPS) from 60East Technologies is a state-of-the-art technology that powers many of the Fortune 500 companies. Developers who have tried AMPS love it and use it in their products.

But is there an easy way to test the features of AMPS without having access to a Linux machine and installing the AMPS Server? There is NOW! We introduce a new, quick, and convenient way to evaluate AMPS: the Cloud Evaluation Dashboard. We’ll take care of the Linux server hosting and let you start trying AMPS in your products right away.

Cloud Evaluation Dashboard

Once you have signed up for the Cloud Evaluation Dashboard account and gotten an email from us containing the link to your dashboard, you are just 4 steps away from being amazed by AMPS features and its blazing-fast performance.

Step 1: Download Client files

Our first step is the Downloads page. Here you can select and download AMPS client libraries to use in your code:

Downloads Page

We offer evaluation kits for the AMPS client libraries for the most popular programming languages:

  • C#
  • Java
  • C++
  • Python

and operating systems:

  • Windows
  • Linux

Don’t see your favorite combination of language and operating system in the prebuilt evaluation kits? Visit our Developer Center for the full selection of clients!

You have your client files downloaded? Don’t worry about unzipping and setting them up yet. Let’s proceed to the next step!

Step 2: Turn ON your AMPS Instance

Your personal cloud instance of AMPS Server is waiting for you! Hit the Turn ON button and let us do all the installation and configuration of the AMPS Server for you. It can take a few minutes for the virtual machine to be provisioned and start running, so be patient. You’re almost ready to crank it up!

You can even browse the dashboard pages while it’s starting. Within a few minutes the instance will be up and running:

Instance ControlsInstance: Time Left

This page also serves as the control center of your cloud experience. From here you can start, stop, or reset your instance, and see various statistics about performance and hardware. This is your personal instance of AMPS in the cloud – this instance is not shared with others.

Instance: StatsInstance: Hardware

The instance is ready? Let’s move to the next step!

Step 3: Configure your Client Libraries

Once you have the AMPS Server instance up and running, you will have all the pieces required to set up the client libraries downloaded in Step 1. Navigate to the Configure page to get personal instructions on how to set up the AMPS client libraries for your preferred language. The IP of your cloud instance is already in the instructions, so there’s no need to go back and forth to copy and paste it:

Configure Client

However, since the instance changes its IP address every time it starts or stops, you will need to update the IP address in your client settings. The Configure page will always contain up-to-date settings.

Configure Client: Repeat
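As a quick smoke test once you have the current address, a minimal Python sketch looks something like this (the client name and topic are arbitrary, and the address placeholder must be replaced with the IP shown on your Configure page):

# Hypothetical sketch; substitute the instance address from the Configure page.
import AMPS

client = AMPS.Client("cloud-eval-test")
client.connect("tcp://<your-instance-ip>:9007/amps/json")
client.logon()
client.publish("greetings", '{"message": "Hello from the cloud!"}')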

At this point you are ready to start playing with AMPS! Proceed to the next step!

Step 4: Get a Taste of AMPS features

We’ve prepared simple and clear examples of a few of the main features of AMPS:

Try AMPS: Functionality

On these pages you’ll find a quick introduction to each feature, along with code examples that are ready to be copied and pasted into your code:

Try AMPS: Code Example

Code examples are available in all supported languages and are tailored to work with your AMPS instance and update their content automatically.

Further Steps

After you’ve gotten a sample of AMPS with the Cloud AMPS Server, the next step is to try AMPS on your own hardware.

The table below shows the main differences between the Cloud and Local Evaluation modes:

Cloud Instance

Pros:

  • Easy to start
  • Fully configured
  • Runs on our servers
  • Works well with the code examples

Cons:

  • Sessions are limited to 3 hours
  • IP address is dynamic
  • Can't change AMPS configuration
  • Hardware is limited
  • Network latency to the cloud

Local Instance

Pros:

  • No limits of use within the evaluation period (14 days)
  • Instance location controlled by you
  • Runs on your hardware
  • Can have any configuration, including replication

Cons:

  • Linux server is required
  • Writing an AMPS configuration file requires additional knowledge/support
  • Hardware and system maintenance

Want to know more about how AMPS can help you build great applications? Still having trouble getting AMPS to play your tune? For help or questions, send us a note at support@crankuptheamps.com.


Copyright © 2016 All rights reserved. 60East, AMPS, and Advanced Message Processing System are trademarks of 60East Technologies, Inc.

AMPS 5.2: More Power


AMPS 5.2 is now available

AMPS 5.2 is now available.

This release of AMPS includes new features designed to help manage extremely complex, high-volume data flows that require data transformation in the AMPS server — while maintaining the performance and ease of use that AMPS is known for. The release also includes a set of features designed to make AMPS easier to configure and administer, and a wide variety of other usability and performance improvements.

The new functionality in AMPS 5.2 includes:

  • Inline Message Enrichment. AMPS can now modify and enrich messages as they are published, without requiring that a separate view be configured. When a message is enriched, AMPS persists the enriched version of the message into the transaction log and topic SOW file.
  • Aggregated Subscriptions. A subscriber can now request that aggregation and analytics occur for an individual subscription, without requiring that a view be defined in the AMPS configuration file.
  • Conflated Subscriptions. A subscriber can now request that conflation occur for an individual subscription, without requiring that a conflated topic be defined in the AMPS configuration file.
  • Monitoring Interface. AMPS includes an enhanced monitoring interface, the Galvanometer, that displays information about a set of instances in an easy-to-understand graphical format.
  • SSL Support. The AMPS server now fully supports SSL/TLS connections.
  • Order Chaining. AMPS now includes the ability to update SOW records with chained, or hierarchical, SOW keys. This is most commonly used with FIX order chaining, but can also be used any time a hierarchical document structure should update a single record in the SOW.
  • Expanded Set of Functions. This release of AMPS includes more functions for working with data. Expanded functions include string construction functions such as CONCAT(), numeric functions such as ROUND(), and aggregation functions such as STDDEV_POP() and STDDEV_SAMP(). (See the sketch after this list.)
  • Configuration File Simplification. AMPS now supports the ability to easily compose configuration files from a library of common definitions with the new Include configuration file directive.
  • File System Threshold Actions. The AMPS action set now includes the ability to conditionally run actions based on the capacity of the file system, to provide finer-grained control for instances with limited filesystem capacity.
  • Local Queues. AMPS 5.2 adds the ability to explicitly declare that a queue is maintained only on the local instance, even in cases where the data in the underlying topics for the queue is replicated.
  • Performance Improvements. AMPS 5.2 includes many other improvements that both increase performance and make AMPS easier than ever to configure and use.
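As a hypothetical taste of the expanded function set mentioned in the list above (the topic and field names are invented, and the filter syntax assumes that these functions can be used in content filter expressions), given a connected Python client:

# Hedged sketch: the expanded functions used inside a content filter.
client.subscribe(on_message, "orders",
                 "CONCAT(/region, '-', /desk) = 'EMEA-rates' AND ROUND(/qty) >= 10")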

AMPS 5.2 is available now. Download it, or start an evaluation today, and let us know how the new features work for your applications!


Get more AMPS with Galvanometer


Galvanometer: Get MORE AMPS!

Every system needs control, and AMPS is no exception. We already have a pretty powerful and flexible Admin module that provides a wide range of information about the AMPS instance and the host system, in several formats such as XML, JSON, CSV, and plain text. That is very convenient for applications, scripts, and services… but what about humans? We, the people, prefer our information processed and visualized. Among the many cool features introduced in AMPS 5.2, this one is literally the easiest to see. Ladies and gentlemen, allow me to introduce the new admin interface: Galvanometer!

What is Galvanometer?

Galvanometer is the new graphical AMPS admin interface. It is included and enabled by default in all AMPS instances starting from 5.2. It doesn’t require any additional installation steps, simply go to the admin address and it’s already there! Don’t worry, the classic admin module is there as well, and will always be.

Some of the features of Galvanometer:

  • Live Instance stats, such as Messaging, Clients, Lifetime, and so forth;
  • Live Host stats, such as Memory, CPU utilization, Networking, Storage;
  • Time Machine;
  • Built-in SOW/Subscribe functionality (special thanks to our new JavaScript client);
  • Details about Views, Topics, SOW;
  • Replication monitoring;
  • Transaction Log monitoring;
  • Graph Builder.

Are you excited to take a look? Let’s do it right now!

Know your Instance

Instance is the main page of Galvanometer. It visualizes and updates in real time the most important information about a running AMPS instance:

Galvanometer Instance Page

On this page you can find:

  • The Lifetime widget, a bar that represents the life span of the instance, including shutdown periods, minidumps that occurred, and so forth;
  • The Messaging widget that visualizes the messaging stream of the instance. In real time you can see how many messages are being processed by AMPS;
  • The Clients table with the list of connected clients and their status and activity. If you click on a listed client, you will see its contribution to the total messaging stream on the Messaging widget. Double-clicking, or clicking the Details link, will provide full information about the client, including its subscriptions, messaging, etc.:

Galvanometer: Client Details

Host: Hardware Monitoring

It is important to know not only what the AMPS instance is doing, but how its host is doing as well. Galvanometer has a dedicated page for monitoring the host, including its CPU core utilization, memory, storage, and networking:

Galvanometer: Host Page

All these widgets refresh automatically in real time.

Great Scott! Time Machine!

Ever wonder how many messages AMPS was processing 30 minutes ago? Do you want to see how the CPU was utilized during the holiday season? All it takes is to travel back in time and see with your own eyes… But it is impossible without a time machine, you might say. Not for us! Our engineers hacked time itself and now we proudly present a working Time Machine that is shipped with every Galvanometer:

Galvanometer: Time Machine Widget

Select the date and time you want to travel to and hit Go!. Time travel won’t take long, and soon you’ll see the same page, but in the past. It’s even paused, so you can take your time and look around. Once you’ve adjusted to the past, hit the Play button again to resume the flow of time. Be careful: you’re still in the past!

Galvanometer: Time Machine Mode

Currently, our time machine only works with AMPS and Galvanometer. Stay tuned for updates!

SQL: SOW and Subscribe without leaving Galvanometer

Wouldn’t it be nice to be able to subscribe to regular topics, query SOW topics, and see the results right away without leaving Galvanometer? We thought so too, and that’s why every Galvanometer contains a built-in AMPS client, allowing users to get real-time information about the data flow:

Galvanometer: Host Page

This uses our new JavaScript Client: stay tuned for more details in a future blog post!

Replication: See the Big Picture

Replication allows you to build a distributed yet synchronized system of AMPS instances. Now it is very easy to visualize your replication fabric and get information about the data flow between replicated instances!

Galvanometer provides three ways of representing replication:

  • Chorded replication Graph;
  • Replication Matrix Table;
  • Force-directed replication Graph.

Galvanometer replication views: Chorded Graph, Replication Matrix, and Force-directed Graph

Fine-grained Stats with the Graph Builder

Graph Builder is a handy tool that allows you to watch and monitor a particular metric over time. Let’s say you want to see how AMPS was consuming memory for the last twenty minutes, or during peak hours last Thursday.

It’s never been easier - just search for a metric, pick one from the dropdown menu, select a time range, and voilà:

Galvanometer: Graph Builder

The generated graph can be saved as an image or a PDF file.

Try it today!

Galvanometer is available and enabled by default in the newly released AMPS 5.2. It has a metric ton of cool features — give it a try! Please share your feedback and comments with us. Let us know how the Galvanometer is making your life easier :)


Introducing the AMPS JavaScript Client


AMPS is a very robust system, offering amazing performance, flexibility, and reliability. 60East already provides client libraries for C/C++, C#, Java, and Python, and today we introduce the first version of our official JavaScript client, which powers both modern front-end web applications and Node.js back-end applications.

Features available in this version of the JavaScript client:

  • Minimal external dependencies (WebSocket and Promise)
  • A fully asynchronous Promise-based interface
  • Convenience methods for common commands, such as publish, subscribe, sow, deltaSubscribe
  • A heartbeat mode for quickly detecting connection failures
  • A Command interface for fine-grained control over commands and their options
  • Support of enterprise authentication systems, such as Kerberos, NTLM, and Basic Auth
  • Support of Authenticators for custom authentication scenarios

Let’s take a closer look and see it in action!

Easy to Get, Easy to Use

The client is designed and built to have as few dependencies as possible. It is compatible with both Browser and Node.js environments.

For example, in order to use it in the browser environment, here’s all it takes to get a working AMPS client in your web application:

<!-- Optional import support for obsolete browsers -->
<script src="es6-promise.min.js"></script>
<script src="amps.js"></script>

The distribution already contains ES6-Promise polyfill to support obsolete browsers.

Born Asynchronous to Live Free (From Blocking)

The client adopts JavaScript’s style and design approach in order to deliver the most intuitive, fully asynchronous interface for JavaScript developers. The client is very easy to use thanks to its heavy use of the Promise feature.

Promises are objects that encapsulate values that may be available now, in the future, or never. In the asynchronous world of JavaScript this is a convenient way of structuring and organizing actions that may take an uncertain amount of time to execute.

To demonstrate the idea of the Promise, check out how the client connects to the AMPS server:

let client = new amps.Client()

client.connect('wss://fortune500company.com:9100/amps/json')
  .then(() => client.publish('test-topic', {id: 7}))
  .catch(err => console.error('Connection Error: ', err))
In code examples, we use the new JavaScript ES6 syntax

Even though the above code is executed in the non-blocking asynchronous mode, the syntax is very compact and resembles the traditional synchronous style, thanks to Promises!

Get Used to Convenience

The client includes a full set of convenience methods that make common tasks easier to do. Here’s how we can subscribe to topics, and then publish to them using subscribe() and publish():

let onMessage = (message) => console.log(message.data)
let client = new amps.Client()

client.connect('wss://fortune500company.com:9100/amps/json')
  // connected, subscribe for the first topic
  .then(() => client.subscribe(onMessage, 'orders', '/qty > 0'))
  // second subscription
  .then(() => client.subscribe(onMessage, 'reservations'))
  // third subscription
  .then(() => client.subscribe(onMessage, 'notifications'))
  // now we can publish messages to these topics
  .then(() => {
    client.publish('notifications', {note: 'Ordered Tesla 3'})
    client.publish('orders', {order: 'Tesla 3', qty: 10})
    client.publish('reservations', {res: 'Tesla 3', qty: 10})
  })
  // if any subscription failed, the chain will end up here
  .catch(console.error)

In the example above we subscribe to three different topics, and make the subscriptions sequentially. That is, we wait for the first subscription to be processed, then the second, and finally the third. However, we can make our application a bit faster by parallelizing this process:

client.connect('wss://fortune500company.com:9100/amps/json')
  // connected, subscribe to topics in parallel
  .then(() => Promise.all([
    client.subscribe(onMessage, 'orders', '/qty > 0'),
    client.subscribe(onMessage, 'reservations'),
    client.subscribe(onMessage, 'notifications')
  ]))
  // subscribed to all topics at this point
  .then(subscriptionIds => {
    console.log('ids of subscriptions: ', subscriptionIds)
    client.publish('notifications', {note: 'Ordered Tesla 3'})
    client.publish('orders', {order: 'Tesla 3', qty: 10})
    client.publish('reservations', {res: 'Tesla 3', qty: 10})
  })

All convenience methods return a Promise object, which allows us to chain and parallelize them. The publish() method is an exception, since publish() does not wait for confirmation of processing from the server by default. (It is possible to confirm the publish by making a custom Command object.)

I Measure the Moment in the Heartbeats I Skip

The heartbeat feature is a quick and reliable way of detecting connection failures. It’s simple to set up, and convenient to use:

let onError = err => console.log('Heartbeat: ', err)

let client = new amps.Client()
  .heartbeat(5)  // 5 seconds between beats
  .errorHandler(onError)

In the above example the AMPS server will send a heartbeat message to the client every 5 seconds and will expect the client to respond with a heartbeat message of its own. If the client does not provide a heartbeat within the time specified, the server logs an error and closes the connection. The client reports the heartbeat error to its dedicated error handler. The heartbeat() command can be used again in order to refresh the timer and/or change the heartbeat interval.
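
A natural use of the error handler is a simple reconnect policy. Here's a minimal sketch; the fixed 5-second retry delay is our own choice, and we assume the client object can simply be reconnected after a failure:

const uri = 'wss://fortune500company.com:9100/amps/json'
let client = new amps.Client().heartbeat(5)

client.errorHandler(err => {
  console.error('Connection problem: ', err)
  // wait a moment, then try to re-establish the connection
  setTimeout(() => client.connect(uri).catch(console.error), 5000)
})

client.connect(uri).catch(console.error)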

Command Interface: You Have Full Control

The client includes a low-level interface for constructing AMPS commands: the Command interface. Compared to the convenience methods, Commands expose the full range of options and controls:

  • command name
  • message handler
  • options, such as ackType, orderBy, bookmark and many more
  • flags (a comma separated list of values available to a command)

Here’s an example of a publish command that asks the server to confirm processing of the published message:

let client = new amps.Client()

client.connect('wss://fortune500company.com:9100/amps/json')
  // connected, let's publish a message with confirmation
  .then(() => {
    let publishCommand = new amps.Command('publish')
      .topic('orders')
      .data({order: 'Tesla 3', qty: 10})
      .ackType('persisted')

    return client.execute(publishCommand, ack => {
      console.log('message persisted: ', ack)
      client.disconnect()
    })
  })
  // connection or command execution error
  .catch(console.error)

Try it Today!

The new JavaScript client is available starting today for all our customers. Get the latest version from the Downloads page. The API reference and the quick-start guide are available here.

It’s Not What JavaScript Can Do for AMPS, It’s What AMPS Can Do for JavaScript

Behind the scenes, AMPS offers rich features to create the most flexible, scalable, and responsive view servers. By combining database, messaging, and aggregation technology into one integrated engine, one can reduce data movement and exploit concurrency. More importantly, this design allows us to offer extremely valuable capabilities. For example, imagine being able to populate your GUIs with a query to the AMPS State of the World database and subscribe to the real-time message flow in one single atomic step!

Here are just some of the other ways that AMPS amplifies your JavaScript systems:

  • The power of Content Filtering: AMPS allows you to flexibly reduce the message flow to clients based on the content of each message, mitigating network congestion when scaling up to thousands of users or more while reducing the processing burden on each client.
  • With Delta Messaging, your applications can publish or subscribe to only the parts of a message with changed data.
  • With Out of Focus Notification, applications receive a notification when a previously-received message is no longer relevant to a subscription (see the sketch just after this list).
  • With Aggregations with Joins, AMPS offers real-time aggregation of streams of data, including the ability to join and aggregate streams of different message types.
  • A powerful Replay Engine supports back testing and rapid recovery.
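
As a taste of how these features surface in the JavaScript client, here is a hedged sketch of a sow_and_subscribe command that combines a content filter with out-of-focus notifications. The topic and filter are made up for illustration, and we assume the Command interface exposes filter() and options() headers, as the other AMPS clients do:

let client = new amps.Client()

client.connect('wss://fortune500company.com:9100/amps/json')
  .then(() => {
    // 'oof' asks AMPS to deliver out-of-focus notifications when a
    // previously-delivered message no longer matches the subscription
    let command = new amps.Command('sow_and_subscribe')
      .topic('orders')
      .filter('/qty > 0')
      .options('oof')

    // matching messages and out-of-focus notifications
    // both arrive at this handler
    return client.execute(command, message => console.log(message.data))
  })
  .catch(console.error)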

Please share your feedback and comments with us. We also want to express our gratitude for those customers who worked with us while the JavaScript client was in beta testing stages. Who knew that there were that many browsers to test on??? :)

Want to see the JavaScript client in action now, right from your browser? Check out the Basketball Replay Demo and the related blog post.

Time-Based Triggers



Imagine, if you will, the following scenario. You have an AMPS SOW topic representing the state of orders in your business. You have a well tuned AMPS configuration, you have optimized your client applications and your network is at optimal capacity. You are processing millions of messages and everybody seems happy.

But you have discovered a nefarious problem. A consuming client is behind and there are orders that are not getting processed! Orders come into AMPS, but never leave the “Pending” state. Seconds go by and they still don’t update. They are stuck, like that last bit of jam that won’t come out of the jar. You don’t know which client is the culprit and you can’t keep track of which messages are being left behind!

What do you do? Hire a nanny to sit and watch your orders? Of course not!

You build a robot nanny!

AMPS to the rescue!


Action On Message Condition Timeout

Let me introduce you to one of the new, simple but powerful actions included in AMPS 5.2. Action on Message Condition Timeout allows AMPS to run an action when a message in a SOW topic meets a specific condition for longer than a specified period of time.

This module uses the Out-of-Focus notification (OOF) mechanism to do its magic. When a message matches the specified topic and filter, the module begins tracking that message. If no OOF notification is received for that message within the specified timeout, the action runs for that message.

Back to our story, how do you save the day?

You use this action to configure AMPS to send an alert for messages in your Orders SOW topic with a state of “Pending” and a duration of 5 seconds.

Any message that still has a state of “Pending” after 5 seconds will trigger the action.

Configure the action to publish a message to an “Alerts” topic, set up a “nanny-bot” to listen on the Alerts topic, and then boom!

Send an email, buzz your pager, turn on your toaster, or fire the death ray! You name it!

All at AMPS speed!

Working Example

Here are the steps to a basic working example that you can run on your own AMPS instance.

Note: This example assumes that you are running the AMPS server on your local machine using the loopback address. If you are running your AMPS server on a different machine or network configuration, the example may not work as written.

1. Copy the following AMPS server configuration and save it to a file called sample-on-message-condition-timeout-config.xml inside your AMPS installation directory.

Note: This is a bare bones demo configuration. Use for anything else at your own risk!

<?xml version="1.0" encoding="UTF-8"?>
<AMPSConfig>
  <!-- Name of the AMPS instance -->
  <Name>AMPS-On-Message-Condition-Timeout-Demo</Name>

  <!-- Configure the administrative HTTP server on port 8085.
       This HTTP server provides admin functions and statistics
       for the instance. -->
  <Admin>
    <InetAddr>localhost:8085</InetAddr>
  </Admin>

  <!-- Configure a transport that accepts any known message type over
       TCP port 9007 using the amps protocol. -->
  <Transports>
    <Transport>
      <Name>any-tcp</Name>
      <Type>tcp</Type>
      <InetAddr>9007</InetAddr>
      <Protocol>amps</Protocol>
    </Transport>
  </Transports>

  <Logging>
    <Target>
      <Protocol>stdout</Protocol>
      <Level>error</Level>
      <IncludeErrors>00-0015</IncludeErrors>
    </Target>
  </Logging>

  <SOW>
    <Topic>
      <Name>Orders</Name>
      <MessageType>json</MessageType>
      <Key>/id</Key>
      <Durability>transient</Durability>
    </Topic>
    <Topic>
      <Name>Alerts</Name>
      <MessageType>json</MessageType>
      <Key>/id</Key>
      <Durability>transient</Durability>
    </Topic>
  </SOW>

  <Actions>
    <Action>
      <On>
        <Module>amps-action-on-message-condition-timeout</Module>
        <Options>
          <MessageType>json</MessageType>
          <Topic>Orders</Topic>
          <Filter>/status = 'PENDING'</Filter>
          <Duration>5s</Duration>
        </Options>
      </On>
      <Do>
        <Module>amps-action-do-publish-message</Module>
        <Options>
          <MessageType>json</MessageType>
          <Topic>Alerts</Topic>
          <Data></Data>
        </Options>
      </Do>
    </Action>
  </Actions>
</AMPSConfig>

2. Open a terminal and navigate to your AMPS installation directory.

3. Start AMPS using the configuration file that you just created:

bin/ampServer sample-on-message-condition-timeout-config.xml

4. Open a second terminal and navigate to your AMPS installation directory.

5. Run the following command to listen on the Alerts topic using spark, the AMPS reference client:

bin/spark subscribe -topic Alerts -server 127.0.0.1:9007/amps/json

6. Open a third terminal and navigate to your AMPS installation directory (last one, I promise).

7. Run the following command to send a message to the Orders topic using spark, the simple AMPS command-line client:

echo '{"id":1,"status":"PENDING","message":"Take over the world!"}' | bin/spark publish -topic Orders -server 127.0.0.1:9007/amps/json

8. Bring your second terminal into focus.

9. Wait 5 seconds; the longest time you have ever had to wait for anything AMPS related. Voila! Your message has appeared!

% bin/spark subscribe -topic Alerts -server 127.0.0.1:9007/amps/json
    {"id":1,"status":"PENDING","message":"Take over the world!"}

I will leave the “firing of lasers” portion of the demo as an exercise for the reader ;-)
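
If you'd rather build the nanny-bot in code than watch the Alerts topic with spark, here is a minimal sketch using the JavaScript client introduced earlier. It assumes two things beyond the demo configuration above: a WebSocket transport added to the config (say, on port 9100), since the JavaScript client connects over WebSockets, and the ws package supplying a WebSocket implementation under Node.js:

// a minimal "nanny-bot" sketch; transport and alerting logic are assumptions
global.WebSocket = require('ws')
const amps = require('./amps.js')

let client = new amps.Client()

client.connect('ws://localhost:9100/amps/json')
  .then(() => client.subscribe(message => {
    // send the email, buzz the pager, fire the death ray...
    console.log('Stuck order detected: ', message.data)
  }, 'Alerts'))
  .catch(console.error)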

Identifiers, Changes and Chains




In some applications, unique identifiers for objects change through the object’s lifetime. It can be difficult to decide how to model this in systems where an identifier is necessary, such as topics in the AMPS State-of-the-World (SOW).

A prime example of this problem is FIX Order ID chaining. The FIX specification allows systems to declare that a previous Order is canceled and that a new Order, with a new ID, replaces it. A system does this by using the /41 (OrigClOrdID) field of the message. Applications frequently want to model this replacement order as the same Order object as the previous message. In this case, though, there is no shared identifier in a common field that exists across the “chain” of orders, so there’s no common field that can be used as a Key:

fig 1

In the diagram, three messages representing one Order are seen, but neither the /11 nor the /41 field is suitable as a unique ID. For any given combination of /41 or /11 present in an incoming message, previously seen values for those identifiers must be consulted to determine whether the incoming message is an update to a previous record, or whether the message constitutes a new Order.

Introducing AMPS ID Chaining

AMPS 5.2 provides new functionality that correctly identifies messages that follow this chaining pattern. The functionality is called the ID Chaining module, and it is configured in your SOW topic configuration. Here’s an example:

<Modules>
  <Module>
    <Name>libamps-id-chaining-key-generator</Name>
    <Library>libamps_id_chaining_key_generator.so</Library>
  </Module>
</Modules>
...
<SOW>
  <Topic>
    <Name>Orders</Name>
    <FileName>./sow/%n.sow</FileName>
    <MessageType>nvfix</MessageType>
    <KeyGenerator>
      <Module>libamps-id-chaining-key-generator</Module>
      <Options>
        <Primary>/11</Primary>
        <Secondary>/41</Secondary>
        <FileName>./sow/order.chaining.data</FileName>
      </Options>
    </KeyGenerator>
  </Topic>
</SOW>

Let’s dissect this configuration file a bit.

The first section, Modules, loads an optional module that ships with AMPS called libamps_id_chaining_key_generator.so. This module is now available for use as we declare SOW topics.

Later in the configuration, we declare a SOW topic called Orders of message type nvfix. Unlike most SOW topics, we never specify a <Key>; instead, we specify a KeyGenerator element referring to the libamps-id-chaining-key-generator module we loaded above. This results in AMPS invoking the ID Chaining module to create a SOW key for each incoming record.

We pass Options to the ID Chaining module to configure it for our data. The Primary and Secondary options are used to indicate the message fields which serve as the current/primary ID, and the previous or secondary ID. If you’re configuring the module for use with standard FIX data, these will most typically be set to /11 and /41, but they may be set to any fields you’d like. The FileName option specifies a file you want the module to store its state in. The module will read this file on startup and keep it updated with the data it needs to preserve the linkage between every valid identifier for each chain.

See It In Action

With this configuration, let’s publish some sample data and see what happens. Here’s our sample data set:

$ cat -v orders.nvfix 
11=A-1111^A55=MSFT^A44=60.10^A38=100^A
11=A-1121^A41=A-1111^A44=60.10^A38=50^A
11=A-1131^A41=A-1121^A44=60.10^A38=75^A
11=B-1111^A55=IBM^A44=120.01^A38=100^A
11=B-1121^A41=B-1111^A44=120.01^A38=50^A
11=B-1131^A41=B-1111^A44=120.01^A38=75^A

Notice that two order chains are present: the first (MSFT) is identical to the example in the first diagram. The second order chain (IBM) has a significant difference: the third publish refers to the first ID used in that chain, B-1111. Let’s publish this data and see how AMPS resolves these ID chains:

$ ~/spark publish -delta -server localhost:9007 -topic Orders -type nvfix -file orders.nvfix
total messages published: 6 (3000.00/s)

$ ~/spark sow -server localhost:9007 -type nvfix -topic Orders | cat -v
11=A-1131^A41=A-1121^A44=60.10^A38=75^A55=MSFT^A
11=B-1131^A41=B-1111^A44=120.01^A38=75^A55=IBM^A
Total messages received: 2 (Infinity/s)

As expected, AMPS ID Chaining resolved these 6 messages into two distinct Orders. Note that even though we used an older ID (B-1111) in the 3rd publish on the IBM chain, AMPS was still able to resolve this publish to the correct chain. This is because AMPS ID Chaining tracks every distinct ID ever used in the chain, not just the most recent. Doing so frees systems from being concerned that an older order ID might still be used by upstream systems.
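
To make the bookkeeping concrete, here is a simplified illustration, written in JavaScript, of how every identifier in a chain can resolve to a single key. This is emphatically not AMPS’s implementation, just a sketch of the idea:

// every ID ever seen in a chain maps to the same generated key
let idToKey = new Map()
let nextKey = 1

function resolveKey(primary, secondary) {
  // an update that names a previous ID joins that ID's chain
  let key = idToKey.get(secondary)
  if (key === undefined) key = idToKey.get(primary)
  if (key === undefined) key = nextKey++   // a brand-new chain

  // remember both identifiers so that older IDs still resolve later
  idToKey.set(primary, key)
  if (secondary !== undefined) idToKey.set(secondary, key)
  return key
}

// the IBM chain from the sample data resolves to a single key,
// even though the third update names the oldest ID, B-1111
console.log(resolveKey('B-1111'))            // 1
console.log(resolveKey('B-1121', 'B-1111'))  // 1
console.log(resolveKey('B-1131', 'B-1111'))  // 1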

Failure Detection

AMPS ID chaining requires that two distinct chains are never merged by a later message. For example, this sequence of publishes cannot resolve to a single record, because the 3rd publish attempts to resolve two existing order ID chains into one:

fig 2

If publishers cannot be prevented from publishing data which creates this scenario, the ID Chaining module includes a Validation configuration option which detects ID Chaining sequencing errors. This option requires extra space and processing time, but can be very helpful in tracking down publisher errors. To enable this option, update the configuration with an entry like the following:

<KeyGenerator>
  <Module>libamps-id-chaining-key-generator</Module>
  <Options>
    <Primary>/11</Primary>
    <Secondary>/41</Secondary>
    <FileName>order.chaining.data</FileName>
    <Validation>true</Validation>
  </Options>
</KeyGenerator>

When a publisher publishes data with a sequencing error that AMPS detects, errors are emitted to the AMPS log and the message is rejected:

2017-03-07T15:01:53.5953200-08:00 [32] warning: 29-0104 Sequencing error: An attempt to map id [A-1121] to SOW key 15073404310751987725 failed, because it was already mapped to SOW key 12015654067891347767. This indicates a sequencing error in upstream publishers.
2017-03-07T15:01:53.5953220-08:00 [32] error: 02-0040 client[my-publisher] published a message which could not be processed by the SOW KeyGenerator:
 topic='Orders'
 client seq= 0
 data=[11=A-1121^A41=A-1111^A44=60.10^A38=50^A]

Conclusion

Systems that process FIX orders are often faced with the need to track “chains” of order IDs that change through time, but this problem isn’t confined to FIX orders. Many systems face the challenge of identifiers that are not constant. AMPS ID Chaining provides an important tool for working with data that lacks a consistent unique ID per object. For more information on using this feature, consult the User Guide.

Same Data, Unique View: Aggregated Subscriptions



AMPS 5.2 introduces a powerful new capability that lets subscribers create custom aggregations and projections of AMPS SOW topics, with no configuration necessary! We call this functionality Aggregated Subscriptions. Aggregated subscriptions are like private views for an individual subscription. You no longer have to reconfigure and restart AMPS to test a different calculation, or to add a full view for a subscriber that needs different data for only a few days at the close of the month. When a subscriber has unique needs, aggregated subscriptions can give that subscriber a unique view.

Aggregated Subscriptions can be used with any command that queries a State of the World topic (for those of you familiar with AMPS, this includes the sow, sow_and_subscribe, and sow_and_delta_subscribe commands).

To use Aggregated Subscriptions, configure one or more SOW topics on your AMPS instance, for example:

<SOW>
  <Topic>
    <Name>Orders</Name>
    <MessageType>json</MessageType>
    <Key>/order_key</Key>
    ...
  </Topic>
</SOW>

No additional configuration is required to support aggregated subscriptions; any topic in the SOW may be used with these options.

With Aggregated Subscriptions, you specify a set of grouping fields and a set of projection fields when placing the subscription or issuing the SOW query. These serve the same purpose as the Grouping and Projection elements used when defining a View in the AMPS configuration. However, instead of specifying these fields in a server configuration file, you provide them through the AMPS client you use, in the options field of the command.

The AMPS command-line tool spark supports providing an options field via the -opts argument, so we can use spark to quickly test new aggregations.

Examples

Suppose our Orders topic above has been seeded with a few sample messages:

{"order_key":1,"symbol":"MSFT","price":62.30,"qty":100}{"order_key":2,"symbol":"MSFT","price":62.28,"qty":150}{"order_key":3,"symbol":"IBM","price":180.20,"qty":16}{"order_key":4,"symbol":"FIZZ","price":61.77,"qty":4000}{"order_key":5,"symbol":"YUM","price":64.07,"qty":123}

We can use the sow command with projection and grouping options to ask for custom aggregations to be built and returned. Suppose we’d like to know the average order price for each symbol, for example. We can use the command line utility spark to easily execute this query:

$ ~/spark sow -server localhost:9007 -topic Orders -opts "projection=[/symbol,avg(/price) as /avg_price],grouping=[/symbol]"
{"symbol":"FIZZ","avg_price":61.77}
{"symbol":"YUM","avg_price":64.07}
{"symbol":"MSFT","avg_price":62.29}
{"symbol":"IBM","avg_price":180.2}

Note the syntax of the projection and grouping options in the -opts argument. Both options take a list of fields. For the grouping option, this is a list of one or more fields you’d like to group your results by. The list of fields in projection is more flexible, and allows you to simply project a field through (e.g. /symbol), or use AMPS SQL-like syntax to compute a value you’d like projected (e.g. avg(/price) as /avg_price).

Customizing Output

The projection syntax allows us to do arbitrary computation and to call User Defined Functions as well. Imagine we’d like to compute and return the average order total by symbol, for example:

$ ~/spark sow -server localhost:9007 -topic Orders -opts "projection=[lower(/symbol) as /symbol,avg(/price*/qty) as /avg_total],grouping=[/symbol]" -orderby "/avg_total desc"
{"symbol":"fizz","avg_total":247080.0}
{"symbol":"yum","avg_total":7880.61}
{"symbol":"msft","avg_total":7786.0}
{"symbol":"ibm","avg_total":2883.2}

In this example we use an AMPS built-in function lower to convert the symbol names to lowercase; we also average on the order’s price multiplied by the order’s quantity, and sort the results on this new /avg_total field using -orderby.

Subscriptions

In addition to a one-time query, an aggregated subscription can be placed, which allows your application to see ongoing updates to the results of the aggregation as changes to the underlying data arrive. For fast-moving underlying data, this can be combined with subscription conflation to reduce update frequency.

As an example, imagine we place this subscription in one session:

$ ~/spark sow_and_subscribe -server localhost:9007 -topic Orders -opts "projection=[lower(/symbol) as /symbol,avg(/price*/qty) as /avg_total],grouping=[/symbol],conflation=5s" -orderby "/avg_total desc"
{"symbol":"fizz","avg_total":247080.0}
{"symbol":"yum","avg_total":7880.61}
{"symbol":"msft","avg_total":7786.0}
{"symbol":"ibm","avg_total":2883.2}

spark keeps running, listening for more data. Our use of conflation=5s means AMPS will conflate the messages it sends us on a 5-second interval. In another window, we quickly publish 4 new Orders for YUM:

{"order_key":10,"symbol":"YUM","price":70,"qty":10000}{"order_key":11,"symbol":"YUM","price":70,"qty":8000}{"order_key":12,"symbol":"YUM","price":70,"qty":9000}{"order_key":13,"symbol":"YUM","price":70,"qty":7000}

Because we’ve specified conflation=5s, we see just one additional message published to our subscriber, a few seconds later:

{"symbol":"yum","avg_total":477576.122}

In addition to conflation, aggregated subscriptions may be combined with both content filters and delta subscriptions to further reduce the amount of data your subscriber must process.
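
Although the examples above use spark, the same options string can be supplied from any AMPS client. Here is a hedged sketch using the JavaScript client's Command interface, assuming a WebSocket transport is configured (here on port 9100) and that Command exposes an options() header, as the other AMPS clients do:

let client = new amps.Client()

client.connect('ws://localhost:9100/amps/json')
  .then(() => {
    let aggregated = new amps.Command('sow_and_subscribe')
      .topic('Orders')
      .options('projection=[lower(/symbol) as /symbol,' +
               'avg(/price*/qty) as /avg_total],' +
               'grouping=[/symbol],conflation=5s')

    // the handler sees the initial aggregates, then conflated updates
    return client.execute(aggregated, message => console.log(message.data))
  })
  .catch(console.error)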

Aggregated subscriptions are unique to each client, even when they contain the same projection fields and grouping clause, so note that additional system resources are used for each client that requests an ongoing aggregated subscription. The resources used by a client’s aggregations count against the configurable byte limits for that client; the server may disconnect a client that exceeds the configured limit.

Conclusion

AMPS 5.2’s Aggregated Subscriptions make AMPS much more flexible, allowing you to build more responsive, customizable applications. This feature lets you build richer experiences for your users and change the aggregations you offer without reconfiguring AMPS. For more information on these new capabilities, consult the User Guide.
