A peek into WebEngage’s security layer – super cool use of Java annotations

Note: If you are new to attribute programming, we recommend giving Attribute based programming or Java Annotations a read first.

To begin with, let us show you a small code snippet from one of our controllers. The saveSurveyResponse method underneath gets called every time some user takes an in-site survey on a customer’s website.

/**
 * This method is invoked every time someone takes a survey on a customer's website.
 * It performs two main tasks -
 * 1. saves the user's response in our database
 * 2. refreshes the stats and data graphs for the corresponding survey
 */
public ... saveSurveyResponse(...) throws IOException {
  //mundane code here to compute a response object "surveyResponseDto"
 
  //first save the response in database
  this.publisherBc.saveSurveyResponse(licenseCode, surveyResponseDto);
 
  //then, refresh all the analytics associated with this survey
  this.publisherBc.refreshSurveyStatusOnResponse(licenseCode, surveyId);
}

As you can see above, the code performs two major tasks – #1. save the response in the database and #2. refresh the analytics associated with the corresponding survey. We pre-compute a lot of data graphs and collate the information into a bunch of Maps, so that stats can be presented to our customers in real time. That’s a lot of computing work. The method refreshSurveyStatusOnResponse usually takes about half a second (and much more at times) to complete. We can’t keep the end user on a third party website waiting because (s)he took a simple survey and we got into some crazy computing business!

Solution? Simple – make that method call asynchronous.
Yes, one would either use a batch process or make that call asynchronous. However, it is a much bigger problem to address. There are several such methods in any application stack which should be executed asynchronously to keep the user experience intact. E.g. in any action that needs to send out an email, the sending email part can easily be made asynchronous because SMTP relays can be painful at times leading to huge lag; in the process, it keeps your end user waiting for a response.

Implementation? We created a cool Java annotation called Asynch.
We first created an annotation interface called Asynch and then implemented a method interceptor to look for this annotation on the callee. If the annotation is present, the interceptor executes the method in a new thread. Snippet from the implementation below.

/**
 * Defining the Asynch interface
 */
@Retention(RetentionPolicy.RUNTIME)
public @interface Asynch {}
 
/**
 * Implementation of the Asynch interface. Every method in our controllers
 * goes through this interceptor. If the Asynch annotation is present,
 * this implementation invokes a new Thread to execute the method. Simple!
 */
public class AsynchInterceptor implements MethodInterceptor {
  public Object invoke(final MethodInvocation invocation) throws Throwable {
    Method method = invocation.getMethod();
    Annotation[] declaredAnnotations = method.getDeclaredAnnotations(); 
    if(declaredAnnotations != null && declaredAnnotations.length > 0) {
      for (Annotation annotation : declaredAnnotations) {
        if(annotation instanceof Asynch) {
          //start the requested task in a new thread and immediately
          //return back control to the caller
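          new Thread(invocation.getMethod().getName()) {
            @Override
            public void run() {
              try {
                invocation.proceed();
              } catch (Throwable t) {
                //no caller is waiting on this thread, so report the failure here
                t.printStackTrace();
              }
            }
          }.start();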
          return null;
        }
      }
    }
    return invocation.proceed();
  }
}

Done? Just that much?
Oh yes, pretty much. Here’s what the declaration of the heavyweight method, refreshSurveyStatusOnResponse, looks like –

/**
 * So, earlier we had a simple method in our interface which we later
 * annotated with the Asynch @interface. Bang! The caller doesn't need
 * to worry about it now. This method (no matter who the caller is)
 * gets executed asynchronously. Ain't that awesome? 
 */
@Asynch
public void refreshSurveyStatusOnResponse(String licenseCode, Integer surveyId);

How did we use annotations to build our security layer?
Now that we have given you a fair idea of how we use annotations to our advantage, let’s dive a bit deeper. You are about to see how we built our entire Authorization and Authentication stacks using Java annotations.

We are security freaks. Be it the web layer or the data exchange layer, access to all our code is protected. The caller of a method is denied entry to the method if it does not have the right privileges. Underneath is a small snippet from one of our web layer controllers. This method is invoked when someone tries to edit a survey from the WebEngage dashboard.

/**
 * This is a public method invoked via a URL on the site. Once a user on the site
 * tries to reach this method, the "rules" specified below (via annotations) are
 * evaluated. If it matches with the criterion specified for the UserAuth class,
 * the user is allowed an entry into the method; otherwise is shown the exit door!
 *
 * In the specific example below, only "signed-in" users who are "authorized 
 * publishers" (our terminology for WebEngage customers) AND have access to 
 * "survey configuration" related features, are allowed entry into this method.
 */
@UserAuth (
  userTypes = {
     UserType.SIGNED, 
     UserType.AUTHORIZED_PUBLISHER
  },
  publisherUserFeatures = {
     Feature.SURVEY_CONFIGURATION
  }
)
public ... edit(...) throws IOException{
 
}

Pretty nice. Right? So just by annotating the edit method with UserAuth, we made sure that survey edit URLs return a sweet nothing to those who are not supposed to use them. Where’s the beauty? This piece of annotation is reusable; we use it in a variety of ways on pretty much all the code that needs to be protected behind the concepts of users and their corresponding roles.
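To make the idea concrete, here is a minimal sketch of how an annotation like UserAuth could be declared and checked through reflection. The real declaration is not shown in this post, so the element types, the User class, the SurveyController class and the isAllowed helper are assumptions made purely for illustration. The rule it applies (every listed user type AND every listed feature is required) follows the comment on the edit method above.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Set;

// Hypothetical enums; only the constant names used in the post are known.
enum UserType { SIGNED, AUTHORIZED_PUBLISHER }
enum Feature { SURVEY_CONFIGURATION, SURVEY_STYLING }

// Assumed shape of the annotation: two array-valued elements.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface UserAuth {
  UserType[] userTypes() default {};
  Feature[] publisherUserFeatures() default {};
}

// Hypothetical stand-in for the current user.
final class User {
  final Set<UserType> types;
  final Set<Feature> features;

  User(Set<UserType> types, Set<Feature> features) {
    this.types = types;
    this.features = features;
  }
}

final class UserAuthChecker {
  // Returns true when the method carries no UserAuth annotation, or when
  // the user holds every user type and every feature the annotation lists.
  static boolean isAllowed(Method method, User user) {
    UserAuth rule = method.getAnnotation(UserAuth.class);
    if (rule == null) {
      return true;
    }
    for (UserType type : rule.userTypes()) {
      if (!user.types.contains(type)) {
        return false;
      }
    }
    for (Feature feature : rule.publisherUserFeatures()) {
      if (!user.features.contains(feature)) {
        return false;
      }
    }
    return true;
  }
}

// Hypothetical controller used only to exercise the checker.
final class SurveyController {
  @UserAuth(
    userTypes = {UserType.SIGNED, UserType.AUTHORIZED_PUBLISHER},
    publisherUserFeatures = {Feature.SURVEY_CONFIGURATION}
  )
  public void edit() {}
}
```

An interceptor in front of the controllers could call isAllowed before letting a request proceed, which keeps the rule next to the method it protects.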

Of course, there’s a lot of application specific code behind understanding the UserAuth annotation. However, that’s a one-time effort and we have managed to reuse it in a much more sophisticated manner inside our Business Layer as well. Take a look at this usage in one of the methods below:

/**
 * This is our business component method for customers who intend to change
 * the styling of their surveys to match the CSS with their site's look and
 * feel [oh, we have a sexy CSS editor inside dashboard for them to do so ;)]
 *
 * This is a nested annotation. A list of @Authorize methods can be specified
 * as rules. Each of them specifying the method to be called (for authorization)
 * and the argument to be passed to it. For the caller to get access, these
 * values should meet the AuthRules criteria. If it does not, an 
 * AuthorizationException is thrown.
 */
@AuthRules(
  authRules = {
    @Authorize(
      method = "hasPublisherUserFeatureAccess", 
      sargs = {"$0", "SURVEY_STYLING"}
    ), 
    @Authorize(
      method = "hasPublisherFeatureAccess", 
      sargs = {"$0", "WE_SURVEY_CUSTOM_CSS"}
    )
  }
)
public void saveSurveyStyleCss(
  Integer publisherId, 
  AdhocAttributeName cssAdhocAttributeName, 
  String css
) throws AuthorizationException;
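How might such rules be evaluated? The evaluator itself is not shown in this post, so what follows is only a sketch under stated assumptions: that "$0" stands for the first argument of the guarded call, that every rule method takes String arguments and returns a boolean, and that the rule methods live on a single object. AuthChecks, StyleService and AuthRuleEvaluator are hypothetical names, and the annotation declarations are guessed from the usage above.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.Arrays;

@Retention(RetentionPolicy.RUNTIME)
@interface Authorize {
  String method();
  String[] sargs() default {};
}

@Retention(RetentionPolicy.RUNTIME)
@interface AuthRules {
  Authorize[] authRules();
}

// The post mentions an AuthorizationException; its definition is assumed.
class AuthorizationException extends Exception {
  AuthorizationException(String message) { super(message); }
}

// Hypothetical rule implementations: publisher "42" has both features.
class AuthChecks {
  public boolean hasPublisherUserFeatureAccess(String publisherId, String feature) {
    return "42".equals(publisherId) && "SURVEY_STYLING".equals(feature);
  }
  public boolean hasPublisherFeatureAccess(String publisherId, String feature) {
    return "42".equals(publisherId) && "WE_SURVEY_CUSTOM_CSS".equals(feature);
  }
}

// Hypothetical guarded interface, shortened from the method in the post.
interface StyleService {
  @AuthRules(authRules = {
    @Authorize(method = "hasPublisherUserFeatureAccess", sargs = {"$0", "SURVEY_STYLING"}),
    @Authorize(method = "hasPublisherFeatureAccess", sargs = {"$0", "WE_SURVEY_CUSTOM_CSS"})
  })
  void saveSurveyStyleCss(Integer publisherId, String css) throws AuthorizationException;
}

final class AuthRuleEvaluator {
  // Replaces "$n" with the n-th argument of the guarded call; any other
  // value is passed through as a literal.
  static String[] resolve(String[] sargs, Object[] callArgs) {
    String[] resolved = new String[sargs.length];
    for (int i = 0; i < sargs.length; i++) {
      String arg = sargs[i];
      if (arg.matches("\\$\\d+")) {
        resolved[i] = String.valueOf(callArgs[Integer.parseInt(arg.substring(1))]);
      } else {
        resolved[i] = arg;
      }
    }
    return resolved;
  }

  // Runs every @Authorize rule declared on the target; all must pass.
  static void check(Method target, Object[] callArgs, Object checks)
      throws AuthorizationException {
    AuthRules rules = target.getAnnotation(AuthRules.class);
    if (rules == null) {
      return;
    }
    for (Authorize rule : rules.authRules()) {
      String[] args = resolve(rule.sargs(), callArgs);
      Class<?>[] types = new Class<?>[args.length];
      Arrays.fill(types, String.class);
      boolean allowed;
      try {
        Method checkMethod = checks.getClass().getMethod(rule.method(), types);
        allowed = (Boolean) checkMethod.invoke(checks, (Object[]) args);
      } catch (ReflectiveOperationException e) {
        throw new AuthorizationException("Cannot evaluate rule " + rule.method());
      }
      if (!allowed) {
        throw new AuthorizationException("Denied by rule " + rule.method());
      }
    }
  }
}
```

Passing arguments as strings keeps the annotation values simple, at the cost of converting them back inside each rule method.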

This explains how powerful annotations can get. At WebEngage, we use them to the fullest. Hope this article helps you build some cool stuff with attributes. Do let us know!

Note: Use the Asynch annotation idea with care. Spawning new threads without being in control can be fatal. If you plan to use it, make sure the threads are fetched from a pre-created thread pool.
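One way to follow the advice in the note above is to hand the work to an ExecutorService instead of creating a Thread per call. The sketch below is not our production interceptor: it swaps the MethodInterceptor for a plain JDK dynamic proxy so that it runs without any framework, and SurveyStats, PooledAsynchHandler and wrap are hypothetical names.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.concurrent.ExecutorService;

@Retention(RetentionPolicy.RUNTIME)
@interface Asynch {}

// Hypothetical service interface, modelled on the method in the post.
interface SurveyStats {
  @Asynch
  void refreshSurveyStatusOnResponse(String licenseCode, Integer surveyId);
}

final class PooledAsynchHandler implements InvocationHandler {
  private final Object target;
  private final ExecutorService pool;

  PooledAsynchHandler(Object target, ExecutorService pool) {
    this.target = target;
    this.pool = pool;
  }

  @Override
  public Object invoke(Object proxy, final Method method, final Object[] args)
      throws Throwable {
    if (method.isAnnotationPresent(Asynch.class)) {
      // Queue the call on the shared pool instead of spawning a thread.
      pool.execute(new Runnable() {
        @Override
        public void run() {
          try {
            method.invoke(target, args);
          } catch (ReflectiveOperationException e) {
            e.printStackTrace(); // no caller is waiting, so report here
          }
        }
      });
      return null; // only sensible for void methods
    }
    try {
      return method.invoke(target, args);
    } catch (InvocationTargetException e) {
      throw e.getCause(); // rethrow what the target actually threw
    }
  }

  // Wraps target so that @Asynch methods of iface run on the pool.
  static <T> T wrap(Class<T> iface, T target, ExecutorService pool) {
    return iface.cast(Proxy.newProxyInstance(
        iface.getClassLoader(), new Class<?>[] {iface},
        new PooledAsynchHandler(target, pool)));
  }
}
```

A caller would create the pool once, for example with Executors.newFixedThreadPool, wrap the real service with it, and shut the pool down when the application stops. A bounded pool also caps how many refreshes can run at the same time.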

Stay tuned. We love you!

How not to do URL redirects (… the way Quora does) [Update: Quora has fixed the issue]

If you run an online platform or a service that deals with user generated content, you will often want to keep track of the links your users post and the corresponding clicks on those. We, as webmasters, love analytics of any sort because it can be put to great use – to make better products and to make lives simpler. Trouble begins when we start doing it the wrong way.

Some WebEngage customers complained that website referrer based rules in targeting for surveys were not working when they specified referring links from Quora (#get-more-context-here). We investigated the problem and realized it was one of those bad engineering practices that people use while choosing to do URL redirects to external websites from their web applications.

What’s the wrongdoing?
Quora redirects in such a way that the destination website doesn’t come to know what the original site referrer was. Take this Quora thread for example – http://www.quora.com/What-are-everyday-apps-that-use-cloud-computing. As you can see, the answer by Mat Ellis contains a link to Gigaom – http://gigaom.com/2010/06/08/how-zynga-survived-farmville/. Quora overrides it and converts the link to http://www.quora.com/_/redirect?url=http%3A%2F%2Fgigaom.com%2F2010%2F06%2F08%2Fhow-zynga-survived-farmville%2F&sig=4f01ab. Overriding links is absolutely okay.

This is where the problem starts. Underneath is a snippet of the response (header and body) sent by Quora for such redirect requests –

Response Headers
HTTP/1.1 200 OK
Server: PasteWSGIServer/0.5 Python/2.7.2
Date: Thu, 19 Jan 2012 12:37:52 GMT
Content-Type: text/html; charset=utf-8
Pragma: no-cache
Cache-Control: no-cache
Content-Encoding: gzip
Content-Length: 135
 
Response Body
<html>
  <head>
    <meta http-equiv="refresh" 
     content="0; url=http://gigaom.com/2010/06/08/how-zynga-survived-farmville/">
  </head>
</html>

As you can see, Quora emits a response body for those requests. This means the actual redirect to Gigaom happened as a page refresh from within the browser, when the current location was http://www.quora.com/_/redirect..., instead of as an old school 302 redirect. Now, if the Gigaom site or any script on Gigaom’s pages wanted to know which URLs the users on their site were coming from, all they would get is the URL http://www.quora.com/_/redirect... They would never come to know where the user actually came from, which in this case should have been http://www.quora.com/What-are-everyday-apps-that-use-cloud-computing.

A simple mistake and such loss of precious information.

If Quora had simply redirected to the destination site from their backend, there would have been no problem at all. We understand that at times you don’t have a choice but to redirect via an HTML response body to the destination site. In that case, you should do it the way Google does it for its search results (Oh, you know that Google tracks your search clicks, right?). They changed their redirection mechanism a bit after the launch of the new suggest feature. As you keep searching on Google with their suggest feature on, you only change the fragment (the portion after #) in the URL. Browsers (read User Agents) don’t include such fragments, or anchors as they are also called, in the HTTP request header called Referer. To counter this, Google composes their redirect URLs in such a way that the original search query is part of the redirect URL. A lot of products thrive on this particular parameter – the biggest being Google Analytics, which tells you about the search queries that led to your website. The point is simple – preserve and pass on the context for downstream applications to work as expected.
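For reference, a backend redirect of the old school kind can be sketched in a few lines. This is an illustration only, not Quora's code: it uses the HTTP server that ships with the JDK, assumes Java 10 or later for URLDecoder.decode(String, Charset), and RedirectHandler and targetFrom are hypothetical names. The response carries a 302 status and a Location header and no HTML body, so the browser never renders an intermediate page.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import java.io.IOException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class RedirectHandler implements HttpHandler {
  // Pulls the decoded "url" parameter out of a raw query string such as
  // "url=http%3A%2F%2Fexample.com%2F&sig=4f01ab"; returns null if absent.
  static String targetFrom(String rawQuery) {
    if (rawQuery == null) {
      return null;
    }
    for (String pair : rawQuery.split("&")) {
      int eq = pair.indexOf('=');
      if (eq > 0 && pair.substring(0, eq).equals("url")) {
        return URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
      }
    }
    return null;
  }

  @Override
  public void handle(HttpExchange exchange) throws IOException {
    String target = targetFrom(exchange.getRequestURI().getRawQuery());
    if (target == null) {
      exchange.sendResponseHeaders(400, -1); // bad request, no body
    } else {
      // A real service would verify the signature and log the click here.
      exchange.getResponseHeaders().set("Location", target);
      exchange.sendResponseHeaders(302, -1); // redirect, no HTML body
    }
    exchange.close();
  }
}
```

The handler could be mounted with HttpServer.create(...).createContext("/_/redirect", new RedirectHandler()). Click tracking still happens on the server, before the response is sent.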

Why are we complaining?

For the uninitiated, WebEngage is an in-site short survey tool. We let you create surveys and display those on your website in a “targeted” manner – basically, we let you filter based on multiple things like the visitor’s geography, first time visitors, pages on your website, the user’s browser etc. One of the targeting parameters is Referring site (images below).

Specifying referrer based targeting for a WebEngage survey

Short survey in action

For users trying to specify Quora URLs in the referring site section, it wouldn’t work as expected. We found out why and hence this blog post.

As online content sharing platforms keep growing, users are being offered innovative ways to share information. It becomes all the more important for these platforms to realize that there are hundreds and thousands of applications which empower their users and others in a variety of ways. Every single engineering decision they make affects not only their own application but also the ecosystem around it. Moreover, getting it right is no rocket science either. Just stick to the basics; there’s no need to re-invent, because Sir Tim Berners-Lee did all the hard work for us decades ago.

We rest our case here.

Update (20th Jan, 2012): After an uproar on Hacker News and massive support for the cause, someone finally asked this question on Quora – http://www.quora.com/Why-does-Quora-redirect-to-URLs-in-a-way-that-loses-the-original-referrer. We’ll update this post when an official reply comes from Quora.

Update (28th Jan, 2012): We are pleased to announce that Quora has fixed its redirection logic to the old school way. This means no more broken web. We truly appreciate this gesture from Quora and thank the developer community for standing behind us.

Try a demo feature – why and how we did it

For the uninitiated, WebEngage is a simple customer feedback and short survey tool for websites. You can read more about the tool on WebEngage.com. We have a cool feature inside WebEngage called “try a demo”. In brief, the feature lets you see beforehand how the WebEngage Feedback tab and the Survey window will appear on your site once the integration is done.

Slightly ambitious, but if Google planned to use WebEngage, this is how it would have looked on their home page

We have received innumerable requests from developers asking us to disclose how we did it. The interest is such that there is a whole discussion thread on Stack Overflow around this feature.

http://stackoverflow.com/questions/7849466/showing-a-demo-of-my-css-on-any-website

This feature is a huge hit. Not only with developers but also with people willing to use WebEngage on their websites. In most cases we have seen that an online demo precedes a sign-up for WebEngage. Rightly so, because you want to see how it looks on your site before you go ahead with the integration. We have built an entire online demo application on top of this feature which we share with our bigger prospects. Before you read further, give the demo a try on your website – demo.webengage.com

How did we do it?
There are three components to make such a demo functional –
  1. A Javascript widget that works cross-domain on a third party website
  2. A Web crawling bot that can fetch responses from any public URL
  3. A Web page parser to sanitize the response (#2) and modify it by inserting the widget (#1)

1. The JS widget

<webengage license="your_license_code">
  <script id="_webengage_script_tag" type="text/javascript">
    (function(){       
      var _we = document.createElement('script');       
      _we.type = 'text/javascript';       
      _we.async = true;       
      _we.src = "//widgets.webengage.com/js/widget/webengage-min-v-2.0.js";       
      var _sNode = document.getElementById('_webengage_script_tag');          
      _sNode.parentNode.insertBefore(_we, _sNode);     
    })();
  </script>
</webengage>

The code sample above is what we give out to our customers. The idea is to do all communication from a third party website via JSONP requests – a technique in which you create dynamic <script> tags on a third party site to fetch data in real time. This kind of Javascript widget can also qualify as a bookmarklet. Once the static JS above loads, you’d see that we make some dynamic requests thereafter to display the feedback tab and the survey window.
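On the server side, this dynamic <script> technique (commonly known as JSONP) boils down to wrapping the JSON payload in a call to a function that the page names in the request. The widget's real endpoints and callback names are not part of this post, so the Jsonp class and the _we.onData callback below are hypothetical. Restricting callback names to plain identifiers is a precaution we would suggest, since the value is echoed back as script.

```java
import java.util.regex.Pattern;

final class Jsonp {
  // Accept only plain JavaScript identifiers (optionally dotted) as callback
  // names, so that the parameter cannot be used to inject script.
  private static final Pattern CALLBACK =
      Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*(\\.[A-Za-z_$][A-Za-z0-9_$]*)*");

  // Wraps an already-serialized JSON document in a call to the callback.
  static String wrap(String callback, String json) {
    if (callback == null || !CALLBACK.matcher(callback).matches()) {
      throw new IllegalArgumentException("Invalid callback name");
    }
    return callback + "(" + json + ");";
  }
}
```

Because the browser loads the response as a script, it executes the callback with the data as its argument, which is how the data crosses domains.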

2. The crawling bot

With robust libraries like Apache’s HTTPClient (JAVA), cURL (PHP) and PycURL (Python), it is easy enough to fetch responses from URLs. Here’s a snippet of how we did it in JAVA –

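//uses Apache Commons HttpClient 3.x (org.apache.commons.httpclient and its methods package)
public String getPageByURL(URI uri) throws IOException {
  HttpClient client = new HttpClient();
  client.getHostConfiguration().setHost(uri);
  GetMethod method = new GetMethod(uri.toString());
  try {
    int statusCode = client.executeMethod(method);
    if (statusCode == HttpStatus.SC_OK) {
      //process the response
      return method.getResponseBodyAsString();
    }
    return null; //anything other than 200 is not processed
  } finally {
    //always hand the connection back
    method.releaseConnection();
  }
}

3. Response processor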

There are multiple things to do here.

First, you need to make sure that all the resources that are needed to render the page are downloaded from the right location. So, if the webpage has relative URLs for CSS, JS, images etc., the browser should be asked to fetch them from the right location. The simplest way to do it is to add a <base> tag in the <head> tag of the page. Underneath is how –

<base href="url_of_the_resource_that_was_fetched_through_the_bot">

However, there is a small twist – if the site already has a <base> tag defined in its page, you have to make sure that you are not overwriting it.
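The rule itself is easy to state in code. The sketch below is a naive, string-level illustration and not the parser described further down in this post: it assumes the page is already in memory as a String, and BaseTagInjector and addBase are hypothetical names. It leaves the page untouched when a <base> with an href is already present, and otherwise inserts one right after the opening <head> tag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BaseTagInjector {
  private static final Pattern BASE_TAG =
      Pattern.compile("<base\\b[^>]*\\bhref\\s*=", Pattern.CASE_INSENSITIVE);
  private static final Pattern HEAD_OPEN =
      Pattern.compile("<head\\b[^>]*>", Pattern.CASE_INSENSITIVE);

  // Adds <base href="pageUrl"> right after the opening <head> tag, unless
  // the page already declares its own base. Pages without a <head> get the
  // tag prepended.
  static String addBase(String html, String pageUrl) {
    if (BASE_TAG.matcher(html).find()) {
      return html; // respect the site's own <base>
    }
    String baseTag = "<base href=\"" + pageUrl.replace("\"", "%22") + "\">";
    Matcher head = HEAD_OPEN.matcher(html);
    if (head.find()) {
      return html.substring(0, head.end()) + baseTag + html.substring(head.end());
    }
    return baseTag + html;
  }
}
```

A regex like this can be fooled, for example by a <base> tag that sits inside an HTML comment, which is one reason a proper parse of the page is the safer route.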

Second, you have to add the JS widget code at the end of the page.

To do both of the above, we needed to parse the page so that we could add these at the right location. Think ugly regex patterns and string manipulation. We decided to do it the easy way and Tidy our html response. Once the response was sanitized by this framework, it was very easy for us to identify the nodes where we wanted to insert the above mentioned code snippets. But, to our surprise, a large number of users started complaining that the demo was not working accurately: their websites looked drastically different in our demo compared to how they looked when viewed in the browser directly.

We scratched our head and almost everywhere in the body to figure out that these sites themselves were the culprit. They had malformed or untidy html markup. Tidy was removing or adding code to make it look tidy! And then we realized that the browser and Tidy behaved very differently for untidy markup. For example, while the browser is okay with 2 <body> tags in an html response, Tidy is not – it will eat one up. (Un)fortunately, the browser renders _as_per_user_expectations. And we had to do the same thing.

Sounds easy, right? Don’t parse the page at all. Pass it on to the browser the way it is. Well, yes and no. Remember, we need to insert the two code snippets mentioned above. So, we wrote a sweet and simple html parser which was completely fault tolerant – as in, it knew that html coders are drunk fellas who can choose not to start with <html>, who can choose not to close any tag they wish and who can have a <head> tag inside the <body> tag!

It’s an entire package we wrote and not all of it could be shared here. Here’s some pseudo code from a subroutine that you might find interesting.

/**
 * core parsing function which returns a Map of html Node name
 * and a List of HTMLTag's within it. The HTMLTag data structure looks
 * like this -
 *
 * public class HTMLTag{
 *   int indexStart;
 *   int indexEnd;
 *   int indexStartTagName;
 *   int indexEndTagName;
 *   String tagName;
 *   Map<String, HTMLAttributeValue> attributeValues = 
 *               new LinkedHashMap<String, HTMLAttributeValue>();
 * }
 */
public static Map<String, List<HTMLTag>> parseHTML(char[] html){
  Map<String, List<HTMLTag>> map = new LinkedHashMap<String, List<HTMLTag>>();
  int i=0;
  while(i < html.length){
    if(html[i]=='<'){
      HTMLTag htmlTag = new HTMLTag();
      htmlTag.setIndexStart(i);
      i++;
      //findToken is a recursive function which binary searches for the 
      //start and close of a particular tag with certain prefix and suffix
      //(the third and fourth parameters). It returns a
      //Token object (int startIndex; int endIndex; String token;) 
      Token tagToken = findToken(html, i, whiteSpaceChars, whiteSpaceChars);
      String tag = tagToken.token;
      if(html[tagToken.endIndex] != '>'){
        //parse for all kind of tags here
        //and store the data in map
      }else{
        i = tagToken.endIndex+1;
      }
    }else{
      i++;
    }
  }
  return map;
}
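Since findToken and the HTMLTag class are not shared here, the following is a much smaller, self-contained sketch of the same idea and not the parser itself. It scans the raw characters once, records where every opening tag starts and ends, and never rejects or repairs anything. TagScanner and scan are hypothetical names, and attributes are ignored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class TagScanner {
  // Maps each lower-cased tag name to the [start, end) offsets of its
  // opening tags, in document order. Closing tags, comments and doctype
  // declarations are skipped. Nothing is validated or repaired.
  static Map<String, List<int[]>> scan(char[] html) {
    Map<String, List<int[]>> tags = new LinkedHashMap<String, List<int[]>>();
    int i = 0;
    while (i < html.length) {
      if (html[i] != '<') {
        i++;
        continue;
      }
      int start = i;
      int nameStart = i + 1;
      int nameEnd = nameStart;
      while (nameEnd < html.length && Character.isLetterOrDigit(html[nameEnd])) {
        nameEnd++;
      }
      if (nameEnd == nameStart) {
        i++; // "</x>", "<!--", "<!doctype" or a stray '<'
        continue;
      }
      // Look for the closing '>' of this tag.
      int end = nameEnd;
      while (end < html.length && html[end] != '>') {
        end++;
      }
      // An unterminated tag simply runs to the end of the input.
      int stop = Math.min(end + 1, html.length);
      String name = new String(html, nameStart, nameEnd - nameStart)
          .toLowerCase(Locale.ROOT);
      List<int[]> positions = tags.get(name);
      if (positions == null) {
        positions = new ArrayList<int[]>();
        tags.put(name, positions);
      }
      positions.add(new int[] {start, stop});
      i = stop;
    }
    return tags;
  }
}
```

With offsets like these in hand, inserting a snippet becomes a matter of splicing text at a known index, and a page with two <body> tags simply yields two entries under "body" instead of an error. A '>' inside a quoted attribute value would cut a tag short in this sketch.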

We have made sure to give you a wow experience with our demo feature. Hope this post gives you enough insight to build your own.

Stay tuned. We love you!

Hello world!

/**
 * Hello people! Welcome to our engineering blog.
 * We, at WebEngage, are die-hard believers and evangelists of open-source software.
 * Some amazing open-source frameworks we actively build or contribute to, include
 *
 * UrlRewriteFilter - A JAVA port of Apache's famous URL rewriting module
 *                    mod_rewrite. URF works as a JAVA web filter
 *                    Project URL - http://code.google.com/p/urlrewritefilter/
 *
 * ResponseHeaderFilter - A cool JAVA filter which makes it super easy and abstract
 *                    to set response headers. No more writing, for example,
 *                    response.setContentType("application/json") in all your
 *                    controllers that serve responses to URLs starting with /json/
 *                    Project URL - http://code.google.com/p/responseheaderfilter/
 *
 * AntWebTasks - Apache Ant is a popular build tool. AntWebTasks is an extension
 *                    of Ant which contains some targets specific to web
 *                    application build and deployment. Cache-busting of URLs for
 *                    static resources like JS, CSS, images etc can elegantly be
 *                    handled outside code using this framework
 *                    Project URL - http://code.google.com/p/ant-web-tasks/
 *
 *
 * On this blog, we'll talk about some of the cool things we have done at
 * WebEngage. We'll also talk about some difficult to make engineering
 * decisions and our beautiful solutions to some of the complex problems
 * we faced. We'll also discuss our upcoming API with you here.
 *
 * If you have any interest in Java, Javascript, NoSQL and
 * Attribute oriented programming, stay hooked - we have some kickass
 * stuff coming your way in days to come. Happy subscribing :)
 */
 
 public class Main{
   public static void main(String[] args){
     System.out.println("Hello World! We are programmers with emotions :)");
   }
 }