URLRewriting in SimpleXist

Earlier in the year I set up a SimpleXist system as a basis for training in XQuery and the XRX method on the eXist database. The simpler “code in the file system, data in the database” idiom the guys were used to, together with a clean, minimal setup, has proved pretty successful. Now that we’re doing more complex things, having all our URLs as project/something.xql has become a bit of a pain. It would be nice to do something prettier and more “REST”-like, such as getting project/123 turned into project/projects.xql?id=123 automatically. What we need is a URL rewriter.

Originally I didn’t use rewriting in SimpleXist as I was only after the basics, and I never cared for the complexities of either the internal eXist rewriter or the various XProc offerings. [Actually, I was seriously tempted to set up SimpleXist using Cocoon, which is excellent, but sadly it’s a bit old now, needs eXist as a block and could become obsolete at a stroke in a future eXist version.]

What I went looking for was a rewriter that would do its job and then get out of the way, with a simple but flexible syntax. As with everything else in this project, if possible I wanted to leverage something people might already know, a bit like mod_rewrite in fact. A bit of research led me to a neat little web filter for servlet containers called UrlRewriteFilter by Paul Tuckey, which seemed to fit the bill.

There’s a short, well-documented install guide on the site, but to reprise for SimpleXist:

  1. Put urlrewritefilter-4.0.3.jar into ROOT/WEB-INF/lib.
  2. Add the following to ROOT/WEB-INF/web.xml just after
    <display-name>SimpleXist</display-name>

    <filter>
      <filter-name>UrlRewriteFilter</filter-name>
      <filter-class>org.tuckey.web.filters.urlrewrite.UrlRewriteFilter</filter-class>
      <init-param>
        <param-name>logLevel</param-name>
        <!-- Go to DEBUG if you want chatty -->
        <param-value>INFO</param-value>
      </init-param>
    </filter>
    <filter-mapping>
      <filter-name>UrlRewriteFilter</filter-name>
      <url-pattern>/*</url-pattern>
      <dispatcher>REQUEST</dispatcher>
      <dispatcher>FORWARD</dispatcher>
    </filter-mapping>
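
While you are still tinkering with rules, restarting Jetty after every change gets tedious. As a sketch only, the filter documents a couple of further init-params that could sit alongside logLevel inside the <filter> element above; the 60 second interval is just an illustrative value, not a recommendation.

```xml
<!-- Optional extras, placed inside the existing <filter> element -->
<init-param>
  <!-- Seconds between checks for a changed urlrewrite.xml; 60 is an arbitrary choice -->
  <param-name>confReloadCheckInterval</param-name>
  <param-value>60</param-value>
</init-param>
<init-param>
  <!-- Turns the /rewrite-status page on or off; consider false on a public server -->
  <param-name>statusEnabled</param-name>
  <param-value>true</param-value>
</init-param>
```

With a reload interval set, edits to the rules file should be picked up without a restart, which is handy in a training setup.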

  3. Create a urlrewrite.xml file in ROOT/WEB-INF to hold our rules and put in the following:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE urlrewrite PUBLIC "-//tuckey.org//DTD UrlRewrite 4.0//EN"
  "http://www.tuckey.org/res/dtds/urlrewrite4.0.dtd">
<urlrewrite>
  <!-- Rules go here. Default matcher is regexp. -->
  <rule>
    <name>Project resources passthrough</name>
    <from>^.*/resources/.*$</from>
    <to last="true">-</to>
  </rule>
  <rule>
    <name>Neat URL</name>
    <from>^/project1/([0-9]+)$</from>
    <to>/project1/index.xql?id=$1</to>
  </rule>
  <rule match-type="wildcard">
    <name>Redirect oldproject</name>
    <from>/oldproject/**</from>
    <to type="redirect">/newproject/$1</to>
  </rule>
</urlrewrite>

Restart Jetty and try http://127.0.0.1:8080/rewrite-status: you should get status messages and the current ruleset. There is a comprehensive manual and a long list of examples, but the above is a starter for three:

  1. Static resources are passed through as is (to dest = -). Stop matching here if fired (last=true).
  2. Using the default regexp matcher, invisibly forward /project1/22 to /project1/index.xql?id=22.
  3. Using the simpler wildcard matcher, redirect anything under /oldproject to /newproject.

Note: Rule 1 is useful because it stops requests for static resources being run through the rest of the rule set; by default, every rule is tried in order, even after an earlier one has matched.

With just these three, we’ve got the basis for project-wide rules, feeds, simpler URLs etc. Of course this is just the start; we can use <condition>, <set> and a host of other features to make more complex rules. I’m wary of going too far down this road, though, as it’s easy to end up putting more and more of the application logic in here when it should be in the code, or to create a tight coupling to passed-in values.
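
As a rough sketch of where this could go next, here are two further rules that would sit inside <urlrewrite>, ahead of the Neat URL rule. The first uses <condition> and <set> to send clients asking for Atom to a feed query; feed.xql and the format attribute name are made up for illustration. The second generalises the neat URL to any project folder, on the assumption that every project keeps its entry point in index.xql.

```xml
<!-- Hypothetical: GET requests that ask for Atom go to a feed query -->
<rule>
  <name>Atom feed</name>
  <condition type="method">GET</condition>
  <condition name="accept">application/atom\+xml</condition>
  <from>^/project1/([0-9]+)$</from>
  <!-- Sets a request attribute called "format" before forwarding -->
  <set name="format">atom</set>
  <to last="true">/project1/feed.xql?id=$1</to>
</rule>
<!-- Assumes every project folder has its own index.xql -->
<rule>
  <name>Neat URL for any project</name>
  <!-- $1 = project folder, $2 = numeric id -->
  <from>^/([a-z0-9_-]+)/([0-9]+)$</from>
  <to last="true">/$1/index.xql?id=$2</to>
</rule>
```

On the XQuery side the attribute should be readable with request:get-attribute("format"), though the fewer of these hand-offs there are, the less of the coupling mentioned above we take on.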

All in all, UrlRewriteFilter has been really useful. It’s allowed us to write cleaner URLs without a lot of extra know-how, and it is capable enough to do more if we need it to. Result!
