Data transforms in Apache Camel with BeanIO

Apache Camel has so many ways of making your life easier; here’s one.

I needed to import a fixed-format file, the kind of thing that reminds you to hug XML and even give JSON a break every so often. In this case, I was importing the Yale "Bright Star Catalogue", featuring a load of numbers about all the visible stars, some 9,000 all told. Not a huge database, but a pain to parse, like all fixed-format data, and I needed output in XML.

I looked at what Camel had to offer and came across the BeanIO component, which handles CSV, fixed-length and XML formats. This immediately made life easier: for a start, there's an external XML mapping file to tell the parser what fields to expect and what to do with them (for all the options, see here). Here are the first few fields in my star data:

<?xml version="1.0" encoding="UTF-8"?>
<beanio xmlns="http://www.beanio.org/2012/03" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.beanio.org/2012/03 
http://www.beanio.org/2012/03/mapping.xsd">

<stream name="stars" format="fixedlength">
     <record name="star" class="java.util.HashMap">
         <field name="HR" length="4" trim="true" />
         <field name="NAME" length="10" trim="true" />
         <field name="DM" length="11" trim="true" />
         <field name="HD" length="6" trim="true" />
     </record>
 </stream>
</beanio>

I’m using vanilla Spring XML for my Camel setup, so I’ve no class to map my data to, hence the HashMap; if you do have one, your class goes in the record element. Also, as I’m aiming for XML output, I’m trimming each field so I don’t end up with a file full of spaces.
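
For instance, if I did have a bean, the record would look something like this (the Star class and its property names here are hypothetical; you'd match the field names to your own getters and setters):

<record name="star" class="org.example.Star">
    <field name="hr" length="4" trim="true" />
    <field name="name" length="10" trim="true" />
    <field name="dm" length="11" trim="true" />
    <field name="hd" length="6" trim="true" />
</record>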

BeanIO can run as either a DataFormat or as a Component; I'm using the former. Now all I needed was a folder to put the file in and a bit of Camel:

<dataFormats>
 <beanio id="stars" mapping="mapping.xml" streamName="stars" 
ignoreUnexpectedRecords="true"/>
</dataFormats>
<route>
    <from uri="file:inbox?noop=true"/>
    <split streaming="true" parallelProcessing="true">
        <tokenize token="\r\n|\n" xml="false" trim="true" />
        <to uri="direct:stars"/>
    </split>
</route>
<route>
    <from uri="direct:stars"/>
    <unmarshal ref="stars"/>
</route>

This is standard stuff: the dataFormats section points to the mapping file and tells it which stream definition I want from it; then I split the file, send each line on, and "unmarshal" it into a HashMap using that definition.

Now, at this point I was fairly happy: the split was simple, but I was still faced with having to create some sort of Groovy bean to assemble the HashMap into the XML I wanted. I actually started down that road and then came across the following in the docs:

Our original mapping file from Section 2.1 can now be updated to parse XML instead of CSV with only two minor changes. First, the stream format is changed to xml. And second, the hire date field format is removed…

Lightbulb moment. All I needed was to add a second stream format using the fields I wanted in my XML, and BeanIO would "marshal" it for me. No bean, no mess, no fuss. Again there's a load of options: you can rename elements, make some things attributes, change formats. I just needed the plain version with a couple of tweaks to suppress the <?xml… header and indent the output, just for readability's sake:

<stream name="stars2" format="xml" xmlType="none">
 <parser>
    <property name="suppressHeader" value="true" />
    <property name="indentation" value="2" />
</parser>
 <record name="star" class="java.util.HashMap">
    <field name="HR" />
    <field name="NAME" />
    <field name="DM" />
    <field name="HD" />
</record>
</stream>
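
As an aside, renaming a field or demoting it to an attribute is just more of the same on the field element; a hypothetical tweak (not what I used) would be:

<field name="HR" xmlType="attribute" xmlName="hr" />
<field name="NAME" xmlName="designation" />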

Now I just need to add in the DataFormat and mod my route a little so my filename comes from the data:

<dataFormats>
  <beanio id="stars" mapping="mapping.xml" streamName="stars" 
  ignoreUnexpectedRecords="true"/>
  <beanio id="stars2" mapping="mapping.xml" streamName="stars2"/>
 </dataFormats>
 
<route>
    <from uri="file:inbox?noop=true"/>
    <split streaming="true" parallelProcessing="true">
        <tokenize token="\r\n|\n" xml="false" trim="true" />
        <to uri="direct:stars"/>
    </split>
</route>

<route>
    <from uri="direct:stars"/>
    <unmarshal ref="stars"/>
    <setHeader headerName="CamelFileName">
        <simple>${body[0].get('HD')}.xml</simple>
    </setHeader>
    <marshal ref="stars2"/>
    <to uri="file:filebox"/>
</route>

That’s it, 9000 XML files in a few lines of configuration:

<star>
  <HR>3</HR>
  <NAME>33 Psc</NAME>
  <DM>BD-06 6357</DM>
  <HD>28</HD>
</star>

Now the neat thing about this is that this file is one of dozens of astronomy data files, many in fixed format. So with the same code, you add a new stream to the mapping file and you're parsing out the "ACT Reference Catalog" of 100,000 stars.
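
In other words, something along these lines in the same mapping file would do it (the field names and lengths here are made up for illustration; the real ones come from the ACT format description):

<stream name="act" format="fixedlength">
    <record name="act-star" class="java.util.HashMap">
        <field name="ACT" length="6" trim="true" />
        <field name="RA" length="12" trim="true" />
        <field name="DEC" length="12" trim="true" />
        <field name="MAG" length="6" trim="true" />
    </record>
</stream>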

 

 

 

 


CheerLights by Camel


It’s that time of year again, when, up and down the country, people are sticking together electronics and lights for the CheerLights project. If you don’t know of it, it’s a wheeze from ioBridge Labs to connect us at this festive season. Essentially, if you tweet one or more of a set of colours using the #cheerlights tag, their server will pick it up and publish it to a ThingSpeak channel. Once there, a simple API makes it possible to construct something that sets your lights to the current colour. It’s a simple idea, but very powerful when you think of the thousands of lights and gadgets, all changing colour simultaneously across the world in response to a tweet.

Last year, I went virtual with a bit of Processing, but this year I'm looking to do a light based on a Ciseco RFµ328 board. It's basically a tiny Arduino, but with an SRF radio. So, it's CheerLights API -> Raspberry Pi (SRF dongle) -> RFµ328 + RGB LED. What could be simpler?

Well, it started out OK. I wrote a Tcl script that polled the ThingSpeak API for the current colour every 10 seconds, spat that out to the RFµ, and wrote a little bit of code on that side to set the RGB LED. The problem then is that you have to wait 10 seconds for it to notice changes, by which time it might have missed some tweets if it's busy; or you are constantly sending 'red' over SRF when it's quiet. Plus, some clever folk send out things like "#cheerlights red blue green red" and of course you'll only get the last one. That's the problem with REST: it's a polling technology.

Now, they do have a longer feed which gives you a bit of history, but you're going to have to parse it and work out where in the list your light is, plus store some sort of status between polls, etc. It's getting more complex, and with a fixed poll interval it's still not ideal, as the other end, the Twitter end, is an unknown. You might of course be thinking "Get a life, it's a light" and you'd be right in some ways. However, as an engineer, it's an interesting problem, and to be honest, you never know when you might want to use Twitter to control or inform some other process where you've little control over the tweeters.

Let’s start by bringing the problem under our control, by looking at Twitter ourselves. Now the steps are:

  • Tell the Twitter API what we're searching for, i.e. the #cheerlights hashtag. It's an event API, so we'll get results only as they're tweeted. That neatly fixes the polling issue, whilst still getting us tweets as they happen.
  • Pull any colours out of the tweet – a bit of regex here perhaps.
  • Send those colours out to the widget. That part doesn't change.

OK, it's a bit more complex, especially the Twitter side, but we've got a Camel on standby, so let's ride!

Using Camel Routes

Now, Apache Camel has a Twitter component and a very nice example of its use, so I won't go into the process of creating Twitter keys. Suffice to say, they're in a properties file and I can use them in a route to get the tweets.
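
For the record, the keys get into the routes via Camel's properties component; assuming a twitter.properties file on the classpath, the camelContext just needs something like this:

<camelContext xmlns="http://camel.apache.org/schema/spring">
    <propertyPlaceholder id="properties" location="classpath:twitter.properties"/>
    <!-- routes go here -->
</camelContext>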

Our starting route is therefore:

<route id="twitter-feed">
  <from uri="twitter://streaming/filter?type=event&amp;keywords=#cheerlights&amp;consumerKey={{twitter.consumerKey}}&amp;consumerSecret={{twitter.consumerSecret}}&amp;accessToken={{twitter.accessToken}}&amp;accessTokenSecret={{twitter.accessTokenSecret}}" />
<!-- Body is Twitter4J Status Object -->
  <log message="${body.user.screenName} tweeted: ${body.text}" />
<!-- Queue them up -->
  <to uri="seda:statusFeed"/>
</route>

One of the things to like about Camel is the ability to build up a problem in pieces; it's 'loosely coupled', which is good. This route watches for #cheerlights and returns the tweet; it does just one job. Notice the body isn't a string, but a tweet object with full data like author, geo-reference, replies and so on. Here I've dropped the results into a queue, but I could have started with a file, or simply printed them out. And, once the route works, I can go on to the next part with confidence.

The next step is to pull out any colours. Time for a bit of Groovy here.

<route id="colours">
  <from uri="seda:statusFeed"/>
<!-- Find the colours and create delimited string as new body. Groovy rocks for this! -->
  <setBody>
    <groovy>request.getBody().getText().toLowerCase().findAll(/(purple|cyan|red|green|blue|white|warmwhite|yellow|orange|magenta|pink|oldlace)/).join(',')</groovy>
  </setBody>
  <log message="colours ${body}" />
<!-- Drop each colour into the colour queue -->
  <split>
    <tokenize token=","/>
    <to uri="seda:colourFeed"/>
  </split>
</route>

Here I replace the body of the message with a delimited string of any colours in it, e.g. the tweet "#cheerlights set everything to blue. Now some red and green" becomes "blue,red,green" via a bit of Groovy regex magic. Since I might get one colour or ten in a given tweet, I then use the Splitter to drop each colour as a separate message into a new queue, to be consumed by the widget driver. Note that because of the queues, each route doesn't know anything about, or depend on, the others, apart from there needing to be consumers. This is pretty handy as I can, for instance, feed the colours from a file rather than test-tweeting (a sketch of such a test route follows). And, because the original full-fat tweet is preserved in the initial queue, I can pick out other facts, process them and reuse the information if I want to: there could be a database of tweet lat/lon pairs, or an analysis of tweeters, or a mashup of colours picked. All just by altering the routes slightly to tap into the information flow at the right point.
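
For example, a throwaway route like this (the colours.txt test file is hypothetical: one line of comma-separated colour names) could stand in for Twitter while testing the widget end:

<route id="test-colours">
    <from uri="file:testbox?fileName=colours.txt&amp;noop=true"/>
    <split>
        <tokenize token=","/>
        <to uri="seda:colourFeed"/>
    </split>
</route>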

The last bit of the puzzle is outputting the right data over SRF. Now the folks at Ciseco have made it pretty easy. You send serial data to the USB dongle on the Pi, and it turns up on the RFµ328. But they also have a neat protocol called LLAP that's ideal for this sort of stuff and handles a lot of the housekeeping for you. It uses 12-character messages, which is fine for us if we send an RGB string. So, I'll create a new message type called PWM and send an RGB colour in it to my RFµ, which has the address "AC". All LLAP messages start with an 'a', so the message for blue would be:

aACPWM0000FF

All the final route needs to do is read a colour, turn it into RGB via a smidgen more Groovy, and then send it via the Stream component to the USB port the dongle is on.

 <route id="changer">
   <from uri="seda:colourFeed"/>
     <!-- throttle 1 messages per sec -->
     <throttle>
       <constant>1</constant>
       <log message="switching light to ${body}"/>
       <transform>
         <groovy> def cmap = [red:"FF0000",green:"008000",blue:"0000FF",cyan:"00FFFF",white:"#FFFFFF",warmwhite:"FDF5E6",purple:"800080",magenta:"FF00FF",yellow:"FFFF00",orange:"FFA500",pink:"FFC0CB"]
 "aACPWM" + cmap.get(request.getBody())
         </groovy>
       </transform>
       <log message="Sending LLAP Msg ${body}" />
       <to uri="stream:file?fileName=/dev/tty.usbmodem000001"/>
     </throttle>
 </route>

Notice I've wrapped the route steps in the Throttler EIP so that the colour doesn't change more than once a second. This makes sure that tweets of "red green blue" don't end up as just a flicker and then blue. The input route could be throttled in a similar way, so only so many colours end up in the queue if there's a flurry. See RoutePolicy for details.
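
A rough sketch of that, using Camel's ThrottlingInflightRoutePolicy (untested here, so treat it as a starting point):

<bean id="colourPolicy" class="org.apache.camel.impl.ThrottlingInflightRoutePolicy">
    <property name="scope" value="Route"/>
    <property name="maxInflightExchanges" value="10"/>
</bean>

<route id="colours" routePolicyRef="colourPolicy">
    <!-- colour-extraction steps as before -->
</route>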

Wrap up.

I've left the Arduino/RFµ328 side out of this post; it's easy enough to get something like this going with a few lines of code and a bit of soldering.

All the Groovy is inline in this example. It's not the most efficient approach; really it should be a Bean, so that things like the colour map are only initialised once.

The point is more that Camel is a fantastic environment for the IoT’er 🙂

Camel and CSV when you need XML

It’s no secret that the public sector are in love with CSV. You only have to look at sites like data.gov.uk to see that. If it’s not CSV then it’s Excel, which comes down to pretty much the same thing. The reason is simple: there’s loads of CSV about and you can create and consume it easily with office-level tools. However, in the IT world CSV tends to be an intermediate format on the way to something like SQL, or in my case XML. I often get situations where the ‘seed’ data in a project comes in as CSV from which N XML files need to be made, one from each row, e.g. people.csv into N <person/> files. The follow-on problem is that some of the original CSV files can be big. That’s not big as in Big Data big, but too large to load into an editor or process with the whole thing in memory, i.e. “We’ll have to send you the CSV, zipped, it’s 2 million rows” irritatingly big.

Now, of course, most of the platforms you might want to use to consume this stuff come with tools, but you need to know them, and if, like me, you want to turn the CSV into XML as well, there may be a couple of places you need to explain this, plus specific idioms from last time that you didn't write down. All these things tend to come to a head when you've a day to create some whizzy demo site from a file someone emailed you.

If I get a file even vaguely in XML and I want another XML then I tend to use XQuery or XSLT. If not I tend to use Ant or Apache Camel. These days Camel is my favourite as it neatly merges the modern need for both transport and transformation into one system. So, I’ve a CSV file on the system, what to do next?

First choice is whether you can consume your file whole or whether you need to read it line by line or in chunks. The latter is the normal situation; it's not often you get just a few hundred lines to read, and streaming it in allows you to read any size of file. Whichever way you go, you can use the CSV data format as your base (there's also the heavy-hitter Bindy module, which I'm ignoring for this post). This adds the ability to marshal (or transform) data between CSV and a Java object in a <route/>. At its simplest, it means you can read a file from a folder into memory and unmarshal it into a Java List (actually a List inside a List) like so:

<route id="csvfileone">
    <from uri="file:inbox" />
    <unmarshal><csv delimiter="|" skipFirstLine="true"/></unmarshal>
    <log message="First Line is ${body[0][0]}"/>
    <to uri="seda:mlfeed"/>
</route>

Here I've used the options to ignore the header line in my file and to use a pipe as the delimiter rather than a comma. The whole thing is sent to a SEDA queue, and I'll assume something is processing the List at that end. Just to prove it really is a List (you can talk to Lists in <simple/>, which is also cool), I've logged the first line. Now, if you want to read a small file and pick out, say, the first, second and fourth field from a given line, this might be all you need. The problem with this approach is that you don't need a huge file before memory and performance become issues.

If you're looking at a big file, what you can do is use the Splitter to, well, split it into lines (or blocks of lines) first, and then unmarshal each line afterwards. This is ideal if, as here, each line is to become a separate file in your output later. Now the route looks like this:

<route id="csvfilereader">
    <from uri="file:inbox" />
    <split streaming="true" parallelProcessing="true">
       <tokenize token="\r\n|\n" xml="false" trim="true" />
           <filter>
               <simple>${property.CamelSplitIndex} &gt; 0</simple>
               <unmarshal><csv/></unmarshal>
               <to uri="seda:mlfeed"/>
           </filter>
    </split> 
</route>

To reduce memory use, the splitter is told to stream the file in chunks. Note that because the chunks are also processed in parallel (which speeds things up), the lines won't necessarily turn up in the order they were in the input file. The Tokenize language is used to tell the splitter how to perform the split; in this case, to split on either Windows or Unix line endings (got to love that) and to trim the results. Each line is then unmarshalled and fed into our queue as before. Note I couldn't use skipFirstLine here, as each entry is only one line, so I've added a <filter> based on the counter from the split instead. One of the things I like about Camel is the way you can start off with a simple route and then add complexity incrementally.

Now that I've a simple and robust way to suck up my CSV file, I just need to turn each record into XML by transforming the data with a bit of fairly self-explanatory Groovy:

import org.apache.camel.Exchange

class handler {
    public void makeXML(Exchange exchange) {
        def response = ""
        def crn = ""

        /* Example data
           CRN, Forename, Surname, DoB, PostCode
           [11340, David, Wright, 1977-10-06, CV6 1LT]
        */

        // The CSV unmarshaller gives us a List of Lists; here it is normally one row per message
        def csvdata = exchange.getIn().getBody(List.class)
        csvdata.each {
            crn = it.get(0)
            response = "<case>\n"
            response += "<crn>" + crn + "</crn>\n"
            response += "<forename>" + it.get(1) + "</forename>\n"
            response += "<surname>" + it.get(2) + "</surname>\n"
            response += "<dob>" + it.get(3) + "</dob>\n"
            response += "<postcode>" + it.get(4) + "</postcode>\n"
            response += "</case>"
        }
        exchange.getIn().setBody(response)
        exchange.getIn().setHeader(Exchange.FILE_NAME, crn + '.xml')
    }
}

As a bonus, I've dropped the unique id (CRN) field into a header so that it will get used as the filename, and each output file will be called something like 11340.xml. Last of all, I need to wrap the code up in a route to read the queue, create the file and spit it out into a folder:

 <route id="xmlfilewriter">
     <from uri="seda:mlfeed"/>
     <log message="hello ${body[0][0]}"/>
     <to uri="bean:gmh?method=makeXML"/>
     <to uri="file:outbox"/>
 </route>

Of course, in the real world you'd probably not store the files this way; they'd go straight to Hadoop, or MarkLogic, etc. Equally, the data could stay in CSV and you could do other cool things with it. That's what I like about Camel: flexibility.

Review: Apache Camel Developer’s Cookbook

A week or so ago, the nice people at Packt Publishing offered me the chance to review “Apache Camel Developer’s Cookbook”. I’m always happy to read another book on my favourite integration framework (and get a free ebook 🙂) as you always learn something new. I’m also glad to report that this is a fine effort by Scott Cranton (@scottcranton) and Jakub Korab (@jakekorab), and well worth getting to go alongside the canonical “Camel in Action” by Claus Ibsen (@davsclaus).

Where CiA dives deep into the guts of Camel, ACDC is presented as a recipe book. You can read it section by section, starting with the basics of routes and endpoints, and moving on through various message patterns to security, testing and performance. Or you can drop into the section you want to pick out a given recipe, as long as you have some Camel already under your belt.

It was heartening to see most of the recipes described not only using the Java DSL, but also in Spring XML. It might be more verbose, but it made it a lot easier to read for people like myself coming from the XML document side with only a smattering of Java and using Camel more as a tool.

Each recipe is arranged identically:

  • The goal of the recipe is described.
  • Getting Ready. Prerequisites and how to set up the example code.
  • How to do it. Detailed steps for the recipe.
  • How it works. The background/architecture description in Camel.
  • There’s more. Further steps/more advanced use.
  • See also. Links to resources.

It’s a neat layout that reads easily, with only a couple of places where the material felt a little coerced, and each recipe is backed up with code ready to run via a ‘drop-in’ maven project.

The recipes are grouped into themed chapters:

  1. Structuring Routes
  2. Message Routing
  3. Routing to your code
  4. Transformation
  5. Splitting and Aggregating
  6. Parallel Processing
  7. Error Handling and Compensation
  8. Transactions and Idempotency
  9. Testing
  10. Monitoring and Debugging
  11. Security
  12. Web Services

These were all informative, and showcase how a wide variety of problems can be addressed in Camel, with some background on the EIP message patterns they represent. The chapters on error handling, testing and monitoring are excellent and provide a practical balance, while the chapter on Parallel Processing addresses some of the issues of scale. If I had a complaint, and it's probably just my take on Camel use, it would be that some of the recipes go straight for the more complex offering, e.g. Bindy for CSV handling rather than starting with a data format and a POJO. It shows that Camel is ready for the big time, and it is, but I think it obscures Camel's great flexibility as a framework not only for complex problems but for doing simpler, perhaps mundane, things really well.

All in all, it's a good, informative book. If you've used Camel before, there'll be a few things you haven't seen and some good examples of best practice. If you haven't, it's got a good mixture of background and drop-in code to get you started.

Reading usb serial data with Apache Camel

I’ve done a couple of posts on using Apache Camel as part of my home-monitoring system, and why I think it’s a good fit as an IoT switchboard. However, one of the flaws in my master plan is that there isn’t a component for serial communications (Camel MINA does socket comms, but not serial). Note I’m not talking actual RS232 here, but the version that comes in USB form. I’m mostly using JeeNodes, which talk RF12 between themselves, but even in this modern era, without a Wi-Fi shield or some similar way to get onto the LAN and TCP/IP, at some point you’re talking to a serial port on some box, into which, in my case, a controlling JeeLink is plugged.

Now, I've got around this like I guess most people have: by creating a serial-to-TCP/IP bridge in code and getting on with life. But it isn't, well, elegant, and it bugs the engineer in me. It would be nice to link directly to Camel and use the considerable advantages it brings, like throttling and queues and all the other goodies. But sadly, my Java isn't up to creating components, and it's a fairly niche use case. So, up to now, I'd pretty much given up on it.

What I hadn't realised was that the Stream component not only lets you talk to the standard input and output streams, but also to files and URLs. Reading further, Stream has support for reading a file continually, as if you were doing 'tail -f' on it. Now, in Unix, famously "everything is a file", so in theory you should be able to read a serial device as if it were a file, and if you can, Stream should be able to as well. Cue 'Road to Damascus' moment.

For my test I quickly grabbed a JeeNode and plugged it into my MacBook. Then I wrote a short sketch that spat out a clock to the serial port:

#include <stdio.h>
#include <Time.h>

char buffer[9] = "";

void setup()
{
    Serial.begin(9600); // Tried 19200 but didn't work!
    setTime(0, 0, 0, 0, 0, 0); // reset starting time
}

void loop()
{
    digitalClockDisplay();
    delay(5000);
}

void digitalClockDisplay() {
    // Send a whole line at a time
    sprintf(buffer, "%02d:%02d:%02d", hour(), minute(), second());
    Serial.println(buffer);
}

I could see it running in the Serial Monitor in the Arduino IDE, and I could see the device in /dev, but nothing if I tried "tail -f /dev/cu.usbserial-A900acSz".

Turns out the default baud rate is 9600 on Macs, and also that you can't alter it using the Unix stty commands either! After reading a whole lot of internet, it seemed that, at least for my demo, I'd just have to stick to 9600 or get pulled into a whole pile of Mac mess. Also note that, at least on a Mac, I got both /dev/cu.* and /dev/tty.* devices. Only the cu.* ones seemed to work. I think this is something to do with the tty.* ones expecting handshaking, but I'm happy to be corrected.

Once I'd altered the baud rate, everything worked fine. I could tail -f onto my JeeNode device and it would happily spit out stuff like this:

00:00:20
00:00:25
00:00:30
00:00:35

On the Camel side, I made a copy of camel-example-console from the /examples folder and modified the camel-context.xml to hold my new routes. First a route to read from my JeeNode – nothing fancy, just get the data and put it in a SEDA queue as a string. Then a route to get it off the queue and send it to the screen via stdout:

<route>
    <from uri="stream:file?fileName=/dev/cu.usbserial-A900acSz&amp;scanStream=true&amp;scanStreamDelay=1000"/>
    <convertBodyTo type="java.lang.String"/>
    <to uri="seda:myfeed"/>
</route>
<route>
    <from uri="seda:myfeed"/>
    <to uri="stream:out"/>
</route>

Note: scanStream tells it to scan continuously (to get the tail -f effect) and the scanStreamDelay tells it to wait a second between scans.

A quick mvn clean compile exec:java later and it works! So, it seems that I can have my cake after all.

Now, there are caveats. For a start Camel gets pretty upset if you unplug the JeeNode and the device disappears. Also, it seems I’m stuck with 9600, at least on my Mac. But it does work and means a complete Camel solution is possible. Time for more experiments, but at least for the moment my inner engineer is quiet.

PS. It works the other way as well 🙂

<route>
    <from uri="stream:in?promptMessage=Enter something: "/>
    <transform>
        <simple>${body.toUpperCase()}</simple>
    </transform>
    <to uri="stream:file?fileName=/dev/tty.usbserial-A900acSz"/>
</route>

Creating MarkLogic content with Apache Camel

MarkLogic with JMS?

A few weeks back, we started looking at MarkLogic at work as a possible replacement for our mix of Cocoon and eXist-db systems. One of the side issues that's been raised is how we would get messages to and from our ActiveMQ EIP system. Now, MarkLogic doesn't have a JMS connector, although, to be fair, it seems to have a pretty good system for slurping up content from URLs, files, etc. However, there is a Java API, which gave me the idea of using my old friend Apache Camel.

If I could get Camel to talk to MarkLogic then, not only could I talk to any sort of queue, I could also pull content into MarkLogic from the huge range of other things Camel will talk to, and I would get all the EIP magic thrown in for good measure.

The Camel Routes.

The easiest way to test this was, of course, to set up a simple Camel project to slurp up some data. The most expedient producer I could think of was an RSS feed from somewhere; JIRA being the most obvious, as it would reliably produce something new at reasonably short intervals. This would need transforming into XML and pushing into MarkLogic via a queue. The MarkLogic side would be handled by their Java API mounted as a Bean, in my case written in Groovy so I could work with it as a script. So much for the basic plan. Here's the start route as Camel sees it:

<route id="jira">
    <from uri="rss:https://issues.apache.org/jira/sr/jira.issueviews:searchrequest-xml
    /temp/SearchRequest.xml?jqlQuery=&amp;tempMax=100&amp;delay=10s"/>
    <marshal><rss /></marshal>
    <setHeaderheaderName="ml_doc">
        <simple>/twitter/${exchangeId}</simple>
    </setHeader>
   <to uri="seda:mlfeed"/>
</route>

The Camel RSS module calls a basic Jira RSS feed, in this case, polling every 10 seconds. I’ve used the module defaults, so each entry is separated out of the feed and passed down the route one at a time. At this point the message body is a Java SyndFeed object, not XML, so it has to be ‘marshalled‘. Now the message body is an XML string ready for upload, but before I can send it I need to make a URI for MarkLogic to use. Each run of the route or ‘exchange’ has a unique id, so I’ve used that via the inbuilt <simple/> language. Alternatively, I could have also parsed something out of the content, like the Jira id or made something up like the current date-time. Finally, the message is dropped into a queue via the SEDA module.
Note: this in-memory queue type isn't persistent like JMS or ActiveMQ, but it's built into camel-core, so it was just handy.
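
If persistence did matter, swapping SEDA for a real broker is just a different endpoint plus a component definition, something like this (assuming the activemq-camel dependency is on the classpath and a broker is listening on the default port):

<bean id="activemq" class="org.apache.activemq.camel.component.ActiveMQComponent">
    <property name="brokerURL" value="tcp://localhost:61616"/>
</bean>

<!-- then in the routes, replace seda:mlfeed with -->
<to uri="activemq:queue:mlfeed"/>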

There is another route to pull messages from the queue and into MarkLogic.

<route id="marklogic">
    <from uri="seda:mlfeed"/>
    <to uri="bean:marklogic?method=upload_xml"/>
    <!-- <to uri="file:outbox"/> -->
</route>

This route takes messages off the queue and passes them to a Bean written using Camel’s Groovy support. Lastly there’s an optional entry to put the message into a file in an /outbox folder. This is handy if you can’t get the MarkLogic bit working and want to look at the input: comment out the bean instead and just drop the data into files.

The Groovy Code.

The Groovy Bean is mounted in the configuration file, along with some parameters needed to connect to MarkLogic.
Note. To get this working, you’ll need to supply your own parameters, and have a MarkLogic REST server listening, as REST is the basis of their API. You can get instructions here.

<lang:groovy id="marklogic" script-source="classpath:groovy/marklogic.groovy">
<lang:property name="host" value="YOURHOST" />
<lang:property name="port" value="YOURPORT" />
<lang:property name="user" value="YOURNAME" />
<lang:property name="password" value="YOURPASSWORD" />
</lang:groovy>

Once the Bean is running, you simply call its methods in the route. You get the entire Exchange as input, so you have access to everything, as well as the ability to alter it as you like. In this case, I've simply written the data out and not altered the message at all. In real life it would probably be more complex. The salient bit of Groovy code (the getters for the parameters are not shown) is below. It's the basic MarkLogic example with a couple of mods to a) get the header that has the URI in it, and b) get the body of the input Message as an InputStream:

// (This needs the usual imports for org.apache.camel.Exchange and the com.marklogic.client classes,
// plus the getters for host, port, user and password, none of which are shown here.)
public void upload_xml(Exchange exchange) {
    // Get the doc url from Camel
    String docId = exchange.getIn().getHeader("ml_doc");
    if (docId == null) docId = "/twitter/" + exchange.getExchangeId();

    // create the client
    DatabaseClient client = DatabaseClientFactory.newClient(host, port, user, password,
            Authentication.DIGEST);

    // try to make use of the client connection
    try {
        XMLDocumentManager XMLDocMgr = client.newXMLDocumentManager();

        // Get an InputStream from the Message body
        InputStreamHandle handle = new InputStreamHandle(exchange.getIn().getBody(InputStream.class));

        // Write out the XML doc
        XMLDocMgr.write(docId, handle);
    } catch (Exception e) {
        System.out.println("Exception : " + e.toString());
    } finally {
        // release the client
        client.release();
    }
}

Note: I've connected to and disconnected from the MarkLogic database each time. I'm sure this can't be efficient in anything but a basic use case, but it will do for the present. There's nothing to stop me creating an init() method, called as the Bean starts, to create a persistent connection if that's better, but all the examples I could find seem to do it this way. [If I've made any MarkLogic Java API gurus out there wince, I'm sorry. Happy to do it a better way.]

Putting it all together.

If you’ve got a handy MarkLogic server, you can try this all out. I’ve put the code here on GitHub as a Maven project, and all you need to do is pull it and run “mvn compile exec:java”. Ten seconds or so after it starts, you should see something similar to this on the console:

[l-1) thread #1 – seda://mlfeed] DocumentManagerImpl INFO Writing content for /jira/ID-twiglet-53205-1398451322665-0-2
[l-1) thread #1 – seda://mlfeed] DatabaseClientImpl INFO Releasing connection

On the MarkLogic side, if you go to the Query Console you can use Explore to look at your database. You should see the files in the database – query them to your heart’s content.

  • I’m using MarkLogic 7 and Java API 2.0-2 with Camel 2.12.0.
  • If you want to change the routes, you’ll find them in src/resources/camel-context.xml.
  • The Groovy script is in resources/groovy/marklogic.groovy.
  • Remember, if you want to use modules outside of camel-core, you’ll need them in the pom.xml!

Bells and Whistles.

Now that I've got the basic system working, there are a couple of other things I could do. As the MarkLogic route reads from the end of a queue, I could, for instance, add another route that puts messages into the same queue from another source, for example Twitter (for which there's a component), assuming I had the appropriate Twitter OAuth keys, like so:

<route id="twitter-demo">
    <from uri="twitter://search?type=polling&amp;delay=60&amp;keywords=marklogic&amp;consumerKey={{twitter.consumerKey}}&amp;consumerSecret={{twitter.consumerSecret}}&amp;accessToken={{twitter.accessToken}}&amp;accessTokenSecret={{twitter.accessTokenSecret}}" />
    <setHeader headerName="ml_doc">
        <simple>/twitter/${body.id}</simple>
    </setHeader>
    <log message="${body.user.screenName} tweeted: ${body.text}" />
    <to uri="seda:mlfeed"/>
</route>

Of course, once you start doing that, you need some way to throttle the speed at which things get added to the queue, to avoid overwhelming the consumer. Camel has several strategies for this, but my favourite is RoutePolicy. With this you can specify rules that allow the input routes to be shut down and restarted as necessary to throttle the number of in-flight exchanges. You simply add the Bean, with an appropriate configuration, like so:

<bean id="myPolicy" class="org.apache.camel.impl.ThrottlingInflightRoutePolicy">
 <property name="scope" value="Context"/>
<property name="maxInflightExchanges" value="20"/>
<property name="loggingLevel" value="WARN"/>
</bean>

and then add this policy to any route you wish to control, like so:

<route routePolicyRef="myPolicy">

Once there are more than 20 messages in-flight (‘context’ means all routes/queues) the inbound routes will be suspended. Once the activity drops below 70% (you can configure this) they’ll start up again – neat.

This only really skims the surface. Camel is a marvellous system, and being able to use it to push content into MarkLogic is very handy (once I polish the code a bit). Wiring routes up in Camel is so much easier, more flexible and more maintainable than writing custom, one-off code.

Finally, of course, there's nothing to say you couldn't have a route that went off and got some key, sent it to MarkLogic via a Bean to retrieve some data instead, and then added the result to the body (Content Enrichment, in EIP terms). That'll have to be the subject of another day.
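
In outline, though, it might be no more than something like this, with a hypothetical lookup_xml method on the same Groovy bean doing the fetch and replacing the body:

<route id="enricher">
    <from uri="seda:keys"/>
    <!-- hypothetical method: looks the key up in MarkLogic and puts the result in the body -->
    <to uri="bean:marklogic?method=lookup_xml"/>
    <to uri="seda:enriched"/>
</route>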

Resources.

  • MarkLogic Developer. http://developer.marklogic.com
  • Apache Camel. http://camel.apache.org
  • Enterprise Integration Patterns (nice hardback book) http://www.eaipatterns.com

Camel, Groovy and Beans

Last year I did an article on using Apache Camel as a switchboard for home monitoring, and a bit of a nod to IoT perhaps. One of the decisions I'd made was that, as far as possible, I'd use Spring XML to configure rather than compile my solution, as I was interested in whether I could use Camel as a general tool. [More on artisan use and tools here.] So far, it's worked out pretty well, until I wanted to upload some files via HTTP POST.

The plan.

I’ve a Raspberry Pi with a camera module, to take stop-motion images. There’s not much room on the Pi, so it’s attached to the Wi-Fi and uploads each photo after taking it, every minute or so. My Camel engine (2.12) is sitting on the server as a servlet inside a copy of Tomcat 7.

Now, you might say that all I need is a bit of PHP, or a servlet or similar, to just dump the file. But if I did that, not only would I get a 'point solution' just for this need, I'd also be reducing my choices as to what can be done with the data afterwards. What if I want to send every 100th image to Flickr, or SMS my phone if the Pi stops sending? If I can pull the image (and its metadata) into a Camel route, not only can I just save them, they're ready for anything else I might want or think of to do with them later.

The technical problem is that the Camel Servlet component I'm using works fine for GET, as you, well, get the parameters as headers. If, however, you POST, as you need to in order to upload a file, then you get one long stream of data in the message body, with everything mashed together as "multipart/form-data" (RFC 1867). What I need is a way to parse out the image file and headers myself, and there's even a Java library to do it, called Commons FileUpload. In the normal scheme of things you would create a Java Bean which would be called in the pipeline to do the work for you. But that seems a little against the configure-only theme, so I need a way to write code without writing "code", i.e. to script it in.

Note: This Bean doesn't need to save the image, just move it into the body in a format where I can use modules like File to save it, or route it somewhere else later.

Going Groovy

In my previous article, I'd already used a bit of JavaScript in my route, just inline with the XML. Another Camel-supported script language is Groovy. If you've not come across Groovy before, it's worth a look: a Java engine underneath, but with the rough edges taken off and a rather nicer syntax. Happily, it also understands straight Java as well as cool Groovy constructs like closures, so you can simply drop code in and it works. You can use Groovy inline in predicates, filters, expressions, etc. in routes, and everything will be lovely, but it's also supported by Spring (if you want, you can read about dynamic language beans here), so you can create a Bean with it, which seems just the ticket.
Dynamic, flexible and easily shared, drop-in Beans. That’s more like it.

I’m using the Camel Servlet-Tomcat example as the basis for my current engine. To use Groovy, you have to add the following to your pom.xml:

<dependency>
 <groupId>org.apache.camel</groupId>
 <artifactId>camel-script</artifactId>
</dependency>

 <dependency>
 <groupId>org.apache.camel</groupId>
 <artifactId>camel-groovy</artifactId>
</dependency>

Build the WAR file and deploy it onto Tomcat. Try the built-in example to make sure it's all working. Next, add the following route in camel-config.xml (in WEB-INF/classes):

<route>
    <from uri="servlet:///gmh" />
    <setBody>
        <groovy>
            def props = exchange.getIn().getHeaders()
            response = "&lt;doc&gt;"
            props.entrySet().each {
                response += "&lt;header key='" + it.getKey() + "'&gt;" +
                    it.getValue() + "&lt;/header&gt;"
            }
            response += "&lt;/doc&gt;"
            response
        </groovy>
    </setBody>
    <setHeader headerName="Content-Type">
        <simple>text/xml</simple>
    </setHeader>
</route>

Now try http://localhost:8080/[YOUR SERVLET]/camel/gmh and you should get a nice XML list of headers, and a warm feeling that Groovy is working.

Adding Beans

The problem with adding code inline is that not only does it become unwieldy pretty quickly, you're also limited in how you can put it together. This is where Beans come in. They not only allow you to hide, reuse and even share Groovy code, but they work at the Exchange level, giving you far more options. Setting up the code snippet above as a Bean is pretty easy: a) create a folder in /classes called /groovy; b) in that, create a file (or download it) called gmh.groovy; c) into that file drop the following code:

import org.apache.camel.Exchange
class handler {
    public void gmh(Exchange exchange) {
        def props = exchange.getIn().getHeaders()
        def response = "<doc>"
        props.entrySet().each {
            response += "<header key='" + it.getKey() + "'>" + it.getValue() + "</header>"
        }
        response += "</doc>" 
        exchange.getIn().setBody(response)
    }
}

Notice that you now need to declare things that were previously provided for inline code, like the exchange and the response. Also, there is now a class wrapped around the code, which itself now sits in a method, and I've had to set the body explicitly. But you don't need all that tedious XML escaping, and the whole Exchange is available to play with.

To wire gmh.groovy up into the route you need to add the following above the camelContext entry:

<lang:groovy id="gmh" script-source="classpath:groovy/gmh.groovy"/>

Note you may need to declare the "lang" namespace at the top of the file before it will work, in which case add the following:

xmlns:lang="http://www.springframework.org/schema/lang"

Lastly, the route can be altered to get rid of the in-line code and use the bean instead:

<route>
 <from uri="servlet:///gmh" />
 <to uri="bean:gmh?method=gmh"/>
 <setHeader headerName="Content-Type"><simple>text/xml</simple></setHeader>
 </route>

Note it's the bean id that gets used in the uri, not the class name; only the method name appears in the route.

Now if you restart the servlet and retry the URL, you should get the same answer as before. There's of course a lot more you can do with this, there always is, but that's the basics and it works. So, where does that leave me with my POST problem?

Getting the POST

Suffice to say, it's not much more difficult; once Groovy is running, it's just a bit more script. I used the Apache Commons FileUpload library and a bit of code (60 lines). Basically, it reads the encoded data in the body and does one of two things: a) if it's a header, it creates an Exchange property called groovy.[header]; b) if it's a file, it creates a property with the file name and turns the stream into a byte array, which gets put in the body.
That gave me a new postMe.groovy script, which I wired into this route (here's the camel-config as well):

<route id="POSTTest">
 <from uri="servlet:///posttest" />
 <to uri="bean:PostMe?method=getPost"/>
 <to uri="file:outbox"/>
 </route>

If you set this up and call curl with a file to upload like so:

curl -i -F filedata=@image.jpg http://localhost:8080/[SERVLET]/camel/posttest

Then you should see your file in the /outbox folder. You can also add headers like -F stuff=foo, and throw in the previous bean to show them in the list as groovy.stuff=foo.

Last thoughts

Groovy is, well, groovy, and fixes my POST issue nicely. However, it's proved that running Camel without doing some coding may be optimistic. Having said that, once you have a Groovy-enabled WAR deployed, it's a great way to add code snippets into your route XML, and to create Beans that open up a much wider range of possibilities and can be shared. I can see it would be fairly easy to create a "toolkit" Camel with all these things in it, perhaps SQL, SEDA (for queues) and so on, plus a few of these Beans as ready-made 'recipes'.