5.08.2015

String Encoding for Talend/JaspersoftETL for Basic Authentication/md5/sha-256

What is a convenient way to encode usernames/passwords for use in my ETL jobs?

First, copy the Java code from this GitHub Gist into your clipboard:



Second, create a routine called hashingUtils (the name is important!). Paste the Java code from above and save the routine.

Third, you can now use the following methods in any component:

hashingUtils.basicAuth("username","password")
hashingUtils.md5encode("string")
hashingUtils.sha256encode("string")
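
If the embedded Gist above doesn't render for you, here is a minimal sketch of a routine exposing those three methods - my own approximation using only JDK classes (java.security.MessageDigest and javax.xml.bind.DatatypeConverter, available on Java 7/8), so the actual Gist code may differ:

package routines;

import java.security.MessageDigest;
import javax.xml.bind.DatatypeConverter;

public class hashingUtils {

    // Base64-encodes "username:password" for use in an HTTP Basic Authorization header
    public static String basicAuth(String username, String password) {
        return DatatypeConverter.printBase64Binary((username + ":" + password).getBytes());
    }

    // Lower-case hex MD5 digest of the input string
    public static String md5encode(String input) {
        return digest("MD5", input);
    }

    // Lower-case hex SHA-256 digest of the input string
    public static String sha256encode(String input) {
        return digest("SHA-256", input);
    }

    private static String digest(String algorithm, String input) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            byte[] hash = md.digest(input.getBytes("UTF-8"));
            return DatatypeConverter.printHexBinary(hash).toLowerCase();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}

Note that basicAuth in this sketch returns only the encoded username:password pair, so depending on how your component builds the header you may still need to prepend "Basic " yourself.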

For example, in a tHttpRequest Component:


6.06.2014

Dealing with Nested Documents in MongoDB and Talend (aka Baking a Cake)

Abstract:

MongoDB is a very popular document database that gives you a huge amount of flexibility when it comes to storing data. The traditional relational model, on the other hand, is far less flexible in how data is stored - you're limited to columns and rows. Sometimes you want to go from a flexible model like MongoDB to a relational one, and that's what this post attempts to explain using Talend/JaspersoftETL.

I do not want to get into the relational vs non-relational model argument in this post - it's only an example if you need to do this for some reason...I'm pro choice :)

Scenario:

We have a JSON document in MongoDB that looks like this:
 {  
      "id": "0001",  
      "type": "donut",  
      "name": "Cake",  
      "ppu": 0.55,  
      "batters":  
           {  
                "batter":  
                     [  
                          { "id": "1001", "type": "Regular" },  
                          { "id": "1002", "type": "Chocolate" },  
                          { "id": "1003", "type": "Blueberry" },  
                          { "id": "1004", "type": "Devil's Food" }  
                     ]  
           },  
      "topping":  
           [  
                { "id": "5001", "type": "None" },  
                { "id": "5002", "type": "Glazed" },  
                { "id": "5005", "type": "Sugar" },  
                { "id": "5007", "type": "Powdered Sugar" },  
                { "id": "5006", "type": "Chocolate with Sprinkles" },  
                { "id": "5003", "type": "Chocolate" },  
                { "id": "5004", "type": "Maple" }  
           ]  
 }  

We want to split the document into three different tables for use in a relational database:

Main Table:

ID TYPE NAME PPU
0001 donut cake 0.55

Batter Table:

BATTERID BATTERTYPE CAKEID
1001 regular 0001
1002 chocolate 0001
1003 blueberry 0001
1004 devil's food 0001

Topping Table:

TOPPINGID TOPPINGTYPE CAKEID
5001 None 0001
5002 Glazed 0001
5005 Sugar 0001
5007 Powdered Sugar 0001
5006 Chocolate with Sprinkles 0001
5003 Chocolate 0001
5004 Maple 0001

Talend tMongoDBInput Component

The tMongoDBInput component is what we'll use to extract the data from MongoDB and bring it into Talend. This is fairly straightforward: you configure the connection and then map some fields. We map id, type, name, and ppu as regular fields. The batters and topping fields are somewhat special.

Toppings:
The topping field will be mapped as a string and will contain a JSON object in each row that we'll need to deal with later.

Batters:
Batters is a nested document within another nested document, so we'll need to define a parent node path. tMongoDBInput does not allow you to use dot notation to select fields (like batters.batter), so instead you'll define a parent node path (here, "batters"), like this:
In the end the schema for tMongoDBInput will look like this:

tMap Component - Splitting

Now we'll use a tMap component to split the output into three flows (main, topping, batter).



tExtractJSONFields - Expanding Documents
So now we have to deal with the special fields, batter and topping - each one contains a list that we want to denormalize into its own rows. We can use the tExtractJSONFields component for this. The component has a few key settings: the schema, the Loop XPath query, and the XPath queries. Together these determine what fields to map.

Schema:

Then set the XPath queries:

And finally glue it all together:
Note that we have 3 input rows, and then a row is extracted for each batter and each topping.
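
Conceptually, what tExtractJSONFields does for the topping column boils down to the plain-Java sketch below (shown with the org.json library purely for illustration - the exact string the column carries depends on your tMongoDBInput mapping, and the component handles all of this for you):

import org.json.JSONArray;
import org.json.JSONObject;

public class ToppingExtractDemo {
    public static void main(String[] args) throws Exception {
        // Roughly what the "topping" column might carry for our cake document (a JSON string)
        String topping = "[{\"id\":\"5001\",\"type\":\"None\"},{\"id\":\"5002\",\"type\":\"Glazed\"}]";
        String cakeId = "0001";

        // One output row per array element - the same denormalization the loop query performs
        JSONArray arr = new JSONArray(topping);
        for (int i = 0; i < arr.length(); i++) {
            JSONObject t = arr.getJSONObject(i);
            System.out.println(t.getString("id") + " | " + t.getString("type") + " | " + cakeId);
        }
    }
}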

If you are interested in this example, you can download it here:

  • Data: This is a JSON object you can import with mongoimport (like  mongoimport --jsonArray --collection nestedSample < doc.json)
  • JaspersoftETL Job: This job works with JaspersoftETL 5.4.1 Big Data Edition (or presumably the Open Source Big Data edition of Talend)

4.04.2014

Deleting Old Scheduled Report Outputs Using the Jaspersoft REST API

I get this request quite a bit - how do I clean up old artifacts from the Jaspersoft repository? Old artifacts are typically reports that are scheduled and then saved into the repository. I explain this in a short video:



Here are some specifics about the Talend/Jaspersoft ETL job used:
  • tHttpRequest - Does a GET to the URI "http://localhost:8080/jasperserver-pro/rest_v2/resources?recursive=true&type=file&folderUri=/public/ScheduledReports" and writes the response to an XML file. API reference: see "Searching the Repository"
  • tFileInputXML - Reads the XML response from the file written by tHttpRequest
  • tFilterRow - Filters the rows on the field "creationDate" with the value: TalendDate.parseDate("yyyy-MM-dd", "2014-03-04")
  • tFlowToIterate - Takes each matching row and iterates through them
  • tREST - For each file (URI) we run a DELETE against: "http://localhost:8080/jasperserver-pro/rest_v2/resources" + ((String)globalMap.get("row2.uri")). API reference: see "Deleting Resources"
You can find my example job here - it is for Talend 5.4.1.
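
For reference, the two REST calls the job makes can be sketched in plain Java roughly like this (HttpURLConnection only, with a placeholder Authorization header; the XML parsing and creationDate filtering that tFileInputXML/tFilterRow handle are omitted, and the resource URI in step 2 is hypothetical):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RepoCleanupSketch {

    static final String BASE = "http://localhost:8080/jasperserver-pro/rest_v2/resources";
    static final String AUTH = "Basic <base64 of user:password>"; // placeholder credentials

    public static void main(String[] args) throws Exception {
        // 1. Search the repository for saved report outputs (what tHttpRequest does in the job)
        URL search = new URL(BASE + "?recursive=true&type=file&folderUri=/public/ScheduledReports");
        HttpURLConnection get = (HttpURLConnection) search.openConnection();
        get.setRequestMethod("GET");
        get.setRequestProperty("Authorization", AUTH);
        BufferedReader in = new BufferedReader(new InputStreamReader(get.getInputStream()));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line); // XML listing - parse out each resource's uri and creationDate
        }
        in.close();

        // 2. Delete a single resource by its repository URI (what tREST does per iteration)
        String resourceUri = "/public/ScheduledReports/some_old_output"; // hypothetical URI from step 1
        HttpURLConnection del = (HttpURLConnection) new URL(BASE + resourceUri).openConnection();
        del.setRequestMethod("DELETE");
        del.setRequestProperty("Authorization", AUTH);
        System.out.println("DELETE " + resourceUri + " -> HTTP " + del.getResponseCode());
    }
}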

12.01.2013

Geospatial Features in MongoDB with Jaspersoft

Lately I have been posting technical examples to the Jaspersoft community wiki instead of this blog. I'll cross-post this one because there are not many examples of doing this. When working with geospatial data, I found the $near operator to be pretty sweet. Let me show you by example.

In the screenshots below you'll see that I constructed a report that shows "cases" on a map. When you click on a specific case you will be taken to a detail level report. The detail level report will show you cases nearby for that service. So for example, if someone reports a broken street light on Market Street, this might help you locate similar cases.

So how is it done?
Quite simple really - take a look at this MongoDB query:




  • Line 8: The $P{location} parameter is a java.util.List that contains longitude and latitude coordinates, something like [-122.4194200,37.7749300]
  • Line 11: How many meters from the center (the point specified in line 8) the query should search
  • Line 13: The $P{service} parameter is a string containing the name of the Service, in the screenshot above it's Missing Sign
  • You will need to have an index on your location field, like so: ensureIndex({"location" : "2dsphere" } )

That's it! I've never really done anything with geospatial data before, and this seemed like a piece of cake.
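
To make the structure of the query concrete, here is roughly what the same kind of $near filter looks like through the MongoDB Java driver - purely my own illustration (the database name, collection name, and distance are made up), not the report's actual query, which uses Jaspersoft's MongoDB query language with $P{location} and $P{service} substituted in:

import java.util.Arrays;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.MongoClient;

public class NearbyCasesSketch {
    public static void main(String[] args) throws Exception {
        MongoClient client = new MongoClient("localhost", 27017);
        DB db = client.getDB("cases311");                 // hypothetical database name
        DBCollection cases = db.getCollection("cases");   // hypothetical collection name

        // GeoJSON point standing in for the $P{location} parameter ([longitude, latitude])
        BasicDBObject point = new BasicDBObject("type", "Point")
                .append("coordinates", Arrays.asList(-122.41942, 37.77493));

        // $near with $maxDistance in meters, plus the service filter ($P{service})
        BasicDBObject query = new BasicDBObject("location",
                new BasicDBObject("$near",
                        new BasicDBObject("$geometry", point).append("$maxDistance", 500)))
                .append("service", "Missing Sign");

        DBCursor cursor = cases.find(query);
        try {
            while (cursor.hasNext()) {
                System.out.println(cursor.next());
            }
        } finally {
            cursor.close();
        }
        client.close();
    }
}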

6.27.2013

New MongoDB Components in Talend/JaspersoftETL

I uploaded two videos with examples of using the tMongoDBInput and tMongoDBRow components that ship with JaspersoftETL 5.2.2 (Plus edition) and Talend 5.2.2+.

I recommend switching the playback quality to 720p so the videos are easier to watch. I hope you find them useful!

Part 1: Using tMongoDBInput (and using Talend's built-in aggregation engine):



Part 2: Using tMongoDBRow (and the aggregation framework):


1.02.2013

Report Bursting in JasperReports Server

What is report bursting?

Take this example: you have a report that shows all the accounts that a particular employee owns in your company. The employee can log into your reporting system and ask for the report; they'll be asked to enter their employee ID, or maybe it will be filled in automatically based on who they are.

Now your boss asks you (a BI professional) to send each employee their list of accounts on a weekly basis... yikes! You need to automate this. What do you do?

As of Jaspersoft version 5.0 there is no UI built into the web application to handle the complex task of report bursting. Designing a UI that lets you iterate through any possible input, read any possible value, and output to any possible medium is tricky, so for now you have to build your own utilities using the rich SOAP or REST APIs that are available in JasperReports Server. The easiest way I have found to do this is using JaspersoftETL (Talend) and a component called tJasperServerExec.

The custom component calls the Jaspersoft API to handle the report-execution end. There are more components like this coming one day soon that will let you use the scheduler built into Jaspersoft; for this example we use the mail-sending capabilities of Talend.

What the video doesn't cover is the scheduling of this job. The community editions of the ETL tool rely on you to schedule jobs yourself using cron or the Windows scheduler. The commercial editions allow you to publish the job to the Administration Server, which has the ability to schedule it from a web UI.

So here's how it's done: