Hibernate Batch Processing

Batch processing allows you to group related SQL statements into a batch and submit them to the database in a single call. Sending several SQL statements at once reduces communication overhead and thereby improves performance. Since we are working with an ORM, the key generation strategy of our entities plays a vital role in these performance gains.

@GeneratedValue strategy in an Entity: There are different strategies for generating primary keys for entities, and the chosen strategy has a huge impact on performance as the amount of data to be processed grows.
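For example, a SEQUENCE generator with an allocationSize lets Hibernate reserve a block of identifiers in a single round trip instead of querying the database for every row. A minimal sketch of this batch-friendly option (the entity and sequence names are hypothetical):

import javax.persistence.*;

@Entity
public class Order {

	// Hibernate reserves ids in blocks of 50: one round trip per block
	@Id
	@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "order_seq")
	@SequenceGenerator(name = "order_seq", sequenceName = "order_seq", allocationSize = 50)
	private Long id;
}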


To reduce interactions with RDS, we added Redis as a caching middleware. Redis is an open-source, in-memory data structure store that supports data structures such as strings, hashes, lists, sets, and sorted sets.
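The two Redis structures used below map naturally onto our two kinds of data. A minimal sketch, assuming the Jedis client and hypothetical key and variable names:

import redis.clients.jedis.Jedis;

try (Jedis jedis = new Jedis("localhost", 6379)) {
	// save-only payloads: appended to a list, drained in order by the background engine
	jedis.rpush("pending:saves", payloadJson);

	// save-and-update payloads: a hash keyed by entity id, so the latest write wins
	jedis.hset("pending:upserts", entityId, payloadJson);
}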

Every piece of data requested from RDS was fetched once and served from the cache afterwards. Every request to process (save/update) data was first added to the cache, and a background engine was responsible for syncing this data in batches from Redis to RDS. We thereby divided our data processing into two major parts:

Save-only data:

We used the Redis list data structure. Since multiple threads insert data into the database concurrently, we preferred letting Hibernate handle key generation. The primary key generation technique used for such entities:

@GeneratedValue(strategy = GenerationType.IDENTITY).
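In entity form, a minimal sketch (the entity name is hypothetical); with IDENTITY, the database assigns the key on each insert:

import javax.persistence.*;

@Entity
public class AuditEvent {

	// the database generates the id (e.g. an auto-increment column) on insert
	@Id
	@GeneratedValue(strategy = GenerationType.IDENTITY)
	private Long id;

	private String payload;
}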

By default, every database interaction in Hibernate caches data in its first-level (default) cache. When dealing with a huge amount of data, keeping it all in Hibernate's cache requires an external cleanup task. A StatelessSession, on the other hand, removes this overhead by the principle of no caching at all!

SessionFactory sessionFactory = entityManagerFactory.unwrap(SessionFactory.class);
StatelessSession session = sessionFactory.openStatelessSession();
session.beginTransaction();
int i = 0;
for (T t : entities) {
	session.insert(t);
	i++;
	// commit and start a fresh transaction after every full batch
	if (i % batchSize == 0) {
		session.getTransaction().commit();
		session.beginTransaction();
	}
}
// commit the final, possibly partial, batch
session.getTransaction().commit();
session.close();

Batch insertion configuration in the application.yml file:

jpa:
  properties:
    hibernate:
      jdbc:
        batch_size: 2000
      order_inserts: true   # group statements by entity so they can be batched
      order_updates: true

Save and update data:

The EntityManager first executes a select against the database to check whether the tuple exists, based on its primary key. If the primary key is null, the data is inserted directly without any check; otherwise, the result of the select decides between an insert and an update. Since each tuple insertion carries this safeguard select, no default key generation technique was used; we generated the primary keys manually (without any generation strategy). If the data had been fetched from RDS, its predefined key was reused; otherwise a key was created manually. We used the Redis hash data structure for this implementation. As mentioned, Hibernate maintains a first-level cache by default, so we need to flush() and clear() it after a certain number of records have been processed.

int i = 0;
for (T t : entities) {
	// merge issues the safeguard select, then an insert or an update
	entityManager.merge(t);
	i++;
	// flush the batch to the database and detach the entities,
	// keeping the first-level cache from growing unbounded
	if (i % batchSize == 0) {
		entityManager.flush();
		entityManager.clear();
	}
}
// flush the final, possibly partial, batch
entityManager.flush();
entityManager.clear();
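Since no generation strategy is declared on these entities, the @Id field is plain and the key is assigned before merge(). A minimal sketch of that assignment (the entity, its getters, and the Redis counter key are hypothetical):

// reuse the key fetched from RDS when present; otherwise mint one from
// an atomic Redis counter, which is safe across concurrent threads
Long id = (customer.getId() != null)
		? customer.getId()
		: jedis.incr("customer:id:seq");
customer.setId(id);
entityManager.merge(customer);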

The optimal batch size was determined by trial and error. There must be a fine balance between the number of background-engine threads, the batch size processed by each thread, and the number of HikariCP connections (its maximumPoolSize).
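For reference, the pool size knob in a Spring Boot application.yml; the value here is purely illustrative and should be tuned together with the thread count and batch size:

spring:
  datasource:
    hikari:
      maximum-pool-size: 32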
