Friday, May 22, 2009

Experience with using multiple datasources in JEE application

We started with multiple data sources in one of our project. One of the reasons why it was decided to use multiple data sources was to reduce performance impact. It was SOA so we have started assigning a dedicated data source with dedicated authorization id to each service. The very first problem what we encountered here was maintenance of authorization ids. As different data sources were using different authorization id, we had to keep track of what data source uses which authorization id. After having around 8-10 data sources this maintenance became bizarre. We learn from that experience and made all data sources using same authorization id in a given environment. So now we have all data sources using same id in DEV environment and other id into QA environment and all. This model works pretty well with us.

Second problem in using multiple data sources was mysterious dead lock issue . We were using all these data sources for accessing same DB. When multiple data sources are nvolved into single logical unit of work (i.e. logical transaction), each data source starts its own transactional branch. If this transaction with isolation level set to READ_COMMITTED involves multiple tables and some of them are having foreign key relationship then it might happen that one transaction branch insert a record into table x and other transaction branch also tries to insert a record into table y which has foreign key relationship to table x. In this scenario table y will time out if record to be inserted into table y is related to a record inserted into table x. Reason behind this is, record inserted into table x is locked by transaction A so record to be inserted into table y can not read the record from table x for checking referential integrity as its in other transaction and isolation level is set to READ_COMMITTED.

So important points to be considered in using multiple data sources are:
  1. Please try to stick to same authorization id for a given environment, if not required.
  2. Multiple data sources should not be pointing to same database, otherwise it might lead to dead lock issue quite often. Use multiple data sources if its required to point to different databases.
  3. In case of multiple data source and remote EJB call scenario one should use XA data source as it might lead to a XA transaction.