Guaranteed Delivery – Design Patterns


When we face integration scenarios where we cannot afford to lose a single message because of connectivity, availability or any other kind of problem, we say that the integration must guarantee the delivery of the messages it carries. There are a few design patterns that address this concern, and two of the most common ones are:

Guaranteed Delivery:

This pattern is closely tied to the store-and-forward approach: to keep the message from being lost, we persist it to disk or to a database before moving on to the next step of the process, which could potentially fail. The integration or service code can take on the responsibility of persisting the messages, but that is not ideal, because integration tools and messaging servers usually provide this capability out of the box.

In Tibco, for example, besides using a checkpoint, where BW itself persists the process state (and obviously the messages being carried), there is a more elegant way to implement this pattern: an EMS feature that gives every publisher and subscriber a local queue and makes EMS responsible for copying the message from the publisher’s local queue to the subscriber’s local queue. Until it manages to do that, the message stays in the queue and EMS keeps retrying the delivery, until the message expires (if an expiration is configured). This is done by setting the Delivery mode property of the JMS Queue Sender task to Persistent, as the image below suggests:
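Outside of the Designer, the same idea can be expressed directly against the JMS API. The sketch below (plain JMS with the EMS client library; the server URL, credentials and queue name are made up for illustration) shows a sender marking its messages as persistent, which is what the Delivery mode = Persistent property does: the server stores the message before acknowledging the send, so it survives a broker restart.

import javax.jms.*;
import com.tibco.tibjms.TibjmsConnectionFactory; // EMS client library (tibjms.jar)

public class PersistentSender {
    public static void main(String[] args) throws JMSException {
        // Hypothetical local EMS server, credentials and queue name
        ConnectionFactory factory = new TibjmsConnectionFactory("tcp://localhost:7222");
        Connection connection = factory.createConnection("user", "password");
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

        Queue queue = session.createQueue("orders.inbound");
        MessageProducer producer = session.createProducer(queue);

        // Equivalent of Delivery mode = Persistent on the JMS Queue Sender task:
        // the message is persisted by the server before the send is acknowledged.
        producer.setDeliveryMode(DeliveryMode.PERSISTENT);

        producer.send(session.createTextMessage("<order id=\"42\"/>"));
        connection.close();
    }
}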

Durable Subscriber:

When the integration scenario involves several subscribers and all of them must receive every published message, we prefer a topic over a queue, precisely because of the difference between them: the messaging server keeps the message in the topic until every subscriber connected to it has received it, and only then removes the message from the topic.

But this management is based on the subscriber connections that are active at the moment the message is published, and when we think about guaranteed delivery that is an opening for failures. Imagine a scenario where the N subscribers have to stay connected to the topic, waiting for messages 24/7, but for whatever reason a connection is lost and re-established a few seconds later, or a subscriber’s machine even goes down and comes back after a while, and a message is published during that very period. If that happens, the subscriber that was unavailable will lose the messages published during that period.

The Durable Subscriber pattern aims to close this gap by having the messaging server register the subscribers, so that delivery no longer depends on whether their connections to it are active or not.

In Tibco, to make a subscriber durable you only need to check the Durable Subscription checkbox in the subscriber’s configuration, as the image below suggests:
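Again as a sketch of what happens under the hood, this is roughly what a durable subscriber looks like in plain JMS code (client ID, topic and subscription names are illustrative). The client ID plus the subscription name is what the server uses to keep the subscription registered while the client is offline:

import javax.jms.*;
import com.tibco.tibjms.TibjmsConnectionFactory; // EMS client library (tibjms.jar)

public class DurableSubscriberExample {
    public static void main(String[] args) throws JMSException {
        ConnectionFactory factory = new TibjmsConnectionFactory("tcp://localhost:7222");
        Connection connection = factory.createConnection("user", "password");

        // The client ID identifies this subscriber to the server across reconnections
        connection.setClientID("billing-system");

        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic topic = session.createTopic("orders.updates");

        // Durable subscription: messages published while this client is offline
        // are kept by the server and delivered on the next connection.
        TopicSubscriber subscriber = session.createDurableSubscriber(topic, "orders-updates-durable");

        connection.start();
        Message message = subscriber.receive(); // blocks until a message arrives
        System.out.println("Received: " + ((TextMessage) message).getText());
        connection.close();
    }
}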


RESTful web services: When to use HTTP PATCH method


HTTP is a very interesting protocol, but just a few application, SOA or Enterprise Integration developers use all of its capabilities. POST and GET are the most used of all the HTTP methods, followed by PUT. Although not widely used, the PATCH method is very useful, and it plays such a specific role that it is worth understanding its semantics.

In order to better understand HTTP’s PATCH method, let’s look at a real-world inspired example. The following figure is a simplified version of an integration project that I’ve been working on.

Users connect to the cloud-based Payroll System in order to do many activities such as employee data management. This system holds the “master data” for employees.

Now imagine that every time an employee’s data is created or edited in the Payroll System we have to send this new/edited data (the “1” in the figure) to other systems such as a Sales/Remuneration system (the “2” in the figure), a Performance Appraisal System (the “3” in the figure) and so on (the “4” in the figure).

As described in one of my latest posts, When to use HTTP Post and HTTP PUT, both POST and PUT create or edit a full resource representation, i.e., they are meant for when you have all of its data.

Bringing it to our Payroll example, it means we would create a RESTful web service – in our Tibco ESB – based on the POST/PUT methods if the Payroll sent not only the edited Employee data but all of its data (all attributes). But that was not the case of the Payroll system I was integrating with. The Payroll sends only the edited data; I mean, if an employee moves to another apartment in the same building, the payroll sends just the address complement, without the street name, city, postal code and so on. That’s what HTTP PATCH was created for!

In a PUT/POST request, the enclosed entity is considered to be a modified version of the resource stored on the origin server, and the client is requesting that the stored version be replaced. With PATCH, however, the enclosed entity contains a set of instructions describing how a resource currently residing on the origin server should be modified to produce a new version.

Therefore, the final result of the RESTful web services created in Tibco ESB for this project was:

  • POST /employee/id: Creates an employee in the Payroll system;
  • PUT /employee/id: Broadcasts a message so that all other systems replace the full employee data. Actually, this will only be used in the future in my project and we did not create it;
  • PATCH /employee/id: Broadcasts a message so that all other systems update just a subset of an employee’s data, as in the sketch below.
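To make the PATCH case concrete, here is a minimal client-side sketch using Java 11’s HttpClient (the host and the JSON payload are hypothetical; in the real project the request comes from the Payroll system and is handled by the Tibco ESB). Note that only the changed attribute travels in the body:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PatchEmployeeExample {
    public static void main(String[] args) throws Exception {
        // Only the attribute that changed is sent (hypothetical payload)
        String partialUpdate = "{ \"address\": { \"complement\": \"Apt 302\" } }";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://esb.example.com/employee/1234"))
                .header("Content-Type", "application/json")
                // HttpClient has no PATCH shortcut, so the verb is passed explicitly
                .method("PATCH", HttpRequest.BodyPublishers.ofString(partialUpdate))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println("Status: " + response.statusCode());
    }
}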

That’s all for today. I hope you are now able to make a better decision on when to use POST, PUT and PATCH when designing RESTful web services.

Tchau!

noSQL and SQL databases: Not an exclusive decision


When trying to find real-world noSQL modeling cases to get some inspiration, I ran into the NoSQL Data Modeling Techniques post from the Highly Scalable Blog. After reading the post and its comments I remembered the discussions around a Semantic Web class I attended last year at NCE-UFRJ. During that class we discussed the future of Triple Stores and that maybe OLTP systems would never leave the relational model and start using Triple Stores. Maybe representing data as RDF would be just another way of representing data, the way OLAP systems do.

Almost every time people talk about noSQL databases they end up comparing them to SQL databases and the relational model. My opinion at the moment is that it’s not an exclusive (XOR) decision. Since everybody says noSQL databases are better in high traffic/load environments and SQL databases are better in OLTP environments (where data changes very frequently), we should take advantage of both at the same time. I mean, let’s have a noSQL database be another representation of the data that also lives in relational databases.

I will try to find out how Facebook, Foursquare, Twitter etc. use noSQL and SQL/relational databases. If you see any article, presentation or post that talks about how they use such database types, or if you have used both together, please share it here by posting a comment. It would be better if it shows an end-to-end architecture using both and not just a piece of it, ok?!

Keep in touch!

RESTful Web Services: When to use POST or PUT?


From time to time I have a discussion with my workmate Carlos Filho and our Architecture team at Infoglobo about the use of POST and PUT when developing RESTful web services. Such discussions happen because we always get confused about when to use POST and when to use PUT.

The convention I’m going to explain here was not stated by us. It’s a definition made by Leonard Richardson and Sam Ruby in RESTful Web Services (Web services for the real world) published in 2007.

When to use POST

Scenario 1) If the client wants the service to create a resource but the client is not sure (or doesn’t know) how the resource’s URI is going to look in advance
For example, let’s say you are designing a RESTful web service to deal with Products. A possible solution for a product creation would be…
POST /product
…where you would, let’s say, send an XML representation with all the product’s specific attributes.
This way the service could take either a product ID or an auto-generated code to compose this product’s URI and return this URI in the Location HTTP response header. Therefore, the response header to the POST /product request would have something like…
...
Location: /product/1234
...
…where 1234 is the product’s identifier we talked about.
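Just to illustrate the round trip, a client issuing this POST and picking up the generated URI from the Location header could look like the sketch below (Java 11’s HttpClient; host and payload are made up):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateProductExample {
    public static void main(String[] args) throws Exception {
        String productXml = "<product><name>Surf board</name></product>"; // hypothetical payload

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://api.example.com/product"))
                .header("Content-Type", "application/xml")
                .POST(HttpRequest.BodyPublishers.ofString(productXml))
                .build();

        HttpResponse<Void> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.discarding());

        // The service answers with the URI it chose for the new resource
        String location = response.headers().firstValue("Location").orElseThrow();
        System.out.println("Created resource at: " + location); // e.g. /product/1234
    }
}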
Scenario 2) If the client wants the service to append a resource to a collection but the client is not sure (or doesn’t know) how the resource’s URI is going to look in advance
This scenario is very similar to the previous one. Considering you have a product catalog, you would issue a…
POST /surf-boards-catalog
…and the service would return a Location header such as…
...
Location: /surf-boards-catalog/product/1234
...

That’s pretty much the same as the first scenario.

When to use PUT

Scenario 1) If the client wants the service to update a resource identified by a URI that the client knows in advance

This is the case where the client is going to use the URI provided by the service. This URI is the one in the Location header provided by the service in the response to the POST request that created the resource.

Scenario 2) If the client wants the service to create a resource identified by a URI that the client knows in advance

When it’s the client who decides what the resource’s URI will be, and not the service, the client should use PUT. In this scenario the service of course knows the URI’s pattern, but it’s the client who tells the service the value of each one of the variable URI parts.

We had an example of such a scenario at Infoglobo. We created a RESTful web service that will be consumed by iPhone, iPad and Android applications developed by Infoglobo. The first time the application is opened, it issues a request to this RESTful service to tell it there is a new device where the application was installed.

Each device is identified by a token and each application has a specific identification depending on the device’s operating system. This way we defined that each application would issue a PUT request such as…

PUT /application/[app-id]/platform/[platform-id]/device/[device-token]

In this scenario the service doesn’t know all device tokens in advance, and we decided not to register or create all applications and platforms in advance. All of it is done in this single request. Although the service knows the URL pattern, it’s up to the client to define how the final URI looks.
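As an illustration of the client building the whole URI itself, here is a sketch with Java 11’s HttpClient (host and identifiers are made up; the real clients are native iPhone, iPad and Android applications):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterDeviceExample {
    public static void main(String[] args) throws Exception {
        // The client knows every variable part of the URI in advance (hypothetical values)
        String appId = "news-app";
        String platformId = "ios";
        String deviceToken = "a94c1e7f0b2d4c6e";

        URI uri = URI.create(String.format(
                "http://api.example.com/application/%s/platform/%s/device/%s",
                appId, platformId, deviceToken));

        HttpRequest request = HttpRequest.newBuilder()
                .uri(uri)
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();

        HttpResponse<Void> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.discarding());

        System.out.println("Status: " + response.statusCode());
    }
}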

Hope that helps you when designing your next RESTful web service.

How to make your web application the Tomcat default application


One would say you have to change your application’s war file name to ROOT.war. But this is not the best thing to do.

Following the instructions of the Tomcat Wiki and reading about the Host and Context elements in server.xml, I came to the following elegant solution:

<Host name="localhost" appBase="webapps"
      unpackWARs="true" autoDeploy="false" deployOnStartup="false"
      xmlValidation="false" xmlNamespaceAware="false">

    <Context docBase="C:\\Documents and Settings\\mfernandes\\Documents\\workspace-sts-2.9.0.RELEASE\\xilya\\target\\xilya-0.1.war"
             debug="0" crossContext="true" path="">
        <Realm className="org.apache.catalina.realm.UserDatabaseRealm" resourceName="UserDatabase"/>
    </Context>

    <Context docBase="C:\\apache-solr-3.5.0\\example\\solr\\solr.war"
             debug="0" crossContext="true" path="/solr">
        <Environment name="solr/home" type="java.lang.String" value="C:\\apache-solr-3.5.0\\example\\multicore" override="true"/>
        <Realm className="org.apache.catalina.realm.UserDatabaseRealm" resourceName="UserDatabase"/>
    </Context>
</Host>


As you can see in this real example, I have a war file of a Grails application and a Solr war file. While the former has the attribute path=”” which means it is the default application, the latter will be accessed by the /solr URL.

Amazon Linux EC2: Running Tomcat on port 80


I thought it would be as easy as editing Tomcat’s server.xml to have Tomcat bind to port 80, but that was not the case. Non-root users cannot bind to low port numbers and I wouldn’t like to have my Tomcat running as root.

I resorted to iptables to redirect connections arriving on port 80 to port 8080. It’s just as easy as running two commands:

  • iptables -t nat -I PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 8080
  • service iptables save

While the first one does the redirection, the second one guarantees the redirection will still be in place after a system reboot.

Don’t change any default Tomcat configuration to change port numbers, and stay away from authbind and from running your Tomcat as the super user!

Loading dbPedia data into a local Virtuoso installation


My next step within a semantic web project is to bring dbPedia data of the types Person, Organization and Place into a local Virtuoso installation.

dbPedia gives us its data for download in many different sets and formats. In my case I’m working with version 3.7.

I started with the Person type, where I plan to make it possible for data entered by journalists and reporters to be “automatically” linked to dbPedia data. For example, when the user says a news article talks about “Lula”, I will execute a SPARQL query such as…

SELECT DISTINCT ?s ?label 
WHERE {
 ?s rdf:type <http://dbpedia.org/ontology/Person> .
 ?s rdfs:label ?label .
 FILTER (REGEX(STR(?label), "lula", "i"))
}
LIMIT 100

… in my local Virtuoso installation. The result of such a query would be presented to the user for his/her decision about which “lula” the news article talks about. The result of the previous query is…

(Figure: the result of the “lula” SPARQL query)

This way the person responsible for writing the article will make the decision about whom the article is about. After that I will create a sameAs link between my local data and the dbPedia data.
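Just as a sketch of how the application could run this lookup programmatically, assuming Apache Jena on the client side and Virtuoso’s default SPARQL endpoint at http://localhost:8890/sparql:

import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;

public class PersonLookup {
    public static void main(String[] args) {
        String sparql =
                "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> " +
                "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> " +
                "SELECT DISTINCT ?s ?label WHERE { " +
                "  ?s rdf:type <http://dbpedia.org/ontology/Person> . " +
                "  ?s rdfs:label ?label . " +
                "  FILTER (REGEX(STR(?label), \"lula\", \"i\")) " +
                "} LIMIT 100";

        Query query = QueryFactory.create(sparql);
        // Virtuoso exposes its SPARQL endpoint on port 8890 by default
        try (QueryExecution qexec =
                     QueryExecutionFactory.sparqlService("http://localhost:8890/sparql", query)) {
            ResultSet results = qexec.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.next();
                System.out.println(row.get("s") + " -> " + row.get("label"));
            }
        }
    }
}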

Well, before doing this I discovered it would be a challenge to load dbPedia’s Person data into the Virtuoso installation on my 4GB RAM notebook. That’s because, as stated in Setting up a local DBpedia mirror with Virtuoso, loading all of dbPedia on an 8-core machine with 32GB RAM would take 7 hours!

Setting aside the question of how much time it would take to load the Person data into my Virtuoso, I had another challenge, which was how to load the dbPedia data into it in the first place. The problem is that the Quad Store Upload in Virtuoso’s Conductor doesn’t seem to be able to deal with files with more than 51MB of triples. So… how to import the 531MB of triples in the persondata_en.nt file?

First of all I had to split the persondata_en.nt file into chunk files of 100,000 lines each. Since I couldn’t do it with either Notepad++ or Replace Pioneer, I had to resort to Linux’s split built-in program. The command split -l 100000 persondata_en.nt solved my first problem.

The second one was how to load each 12MB chunk file into Virtuoso. I chose Virtuoso’s Bulk data loader. There are two very important things to pay attention to when following the instructions of this documentation.

The first one is that there seems to be an error in the load_grdf procedure of the loader script. I had to change the while condition from while (line <> 0) to while (line <> ''). The second is that it was difficult to figure out the folder where the chunk files should be placed. After executing the SQL select server_root (), virtuoso_ini_path (); I discovered that C:\virtuoso-opensource\database was my server root folder and that was the place where the chunk files should be placed.
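For the record, this is roughly what the whole bulk-load sequence looks like when driven from Java through Virtuoso’s JDBC driver. It is only a sketch, assuming the stock dba/dba account, the default SQL port 1111, a made-up target graph and the split chunk files (xaa, xab, …) sitting in the server root folder; the same three calls can simply be typed into isql, which is what the Bulk data loader documentation shows.

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class BulkLoadPersons {
    public static void main(String[] args) throws Exception {
        // Virtuoso's JDBC driver; dba/dba and port 1111 are the out-of-the-box defaults
        Class.forName("virtuoso.jdbc4.Driver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:virtuoso://localhost:1111", "dba", "dba")) {

            // Register every chunk produced by split (xaa, xab, ...) for loading.
            // The folder must be the server root (see server_root()) and must be
            // allowed by the DirsAllowed setting in virtuoso.ini.
            try (CallableStatement ldDir = conn.prepareCall("{call DB.DBA.ld_dir(?, ?, ?)}")) {
                ldDir.setString(1, "C:/virtuoso-opensource/database");
                ldDir.setString(2, "x*");
                ldDir.setString(3, "http://dbpedia.org"); // hypothetical target graph
                ldDir.execute();
            }

            // Load everything that was registered, then persist with a checkpoint
            try (CallableStatement run = conn.prepareCall("{call DB.DBA.rdf_loader_run()}")) {
                run.execute();
            }
            try (Statement stmt = conn.createStatement()) {
                stmt.execute("checkpoint");
            }
        }
    }
}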

I started the rdf_loader_run(); command at 7:41PM.

It’s 9:47PM now and there are 8 (out of 41) files remaining. I won’t wait another hour to write more in this post. See you in the next one!