Recently my team was developing an application in Java that needed to insert a large number of records (around one million) into a relational database (AWS Aurora in MySQL mode). We used Spring Boot 2 together with a recent version of Spring Data JPA, with Hibernate 5.3 underneath. Records were read from an input file (XML) and persisted in chunks of up to 250. Each chunk was persisted in a separate database transaction and the whole process was handled by a single thread. It quickly turned out we were running out of JVM heap memory.
We were surprised, because we were convinced that Hibernate was clearing its cache at the end of each transaction (commit or rollback). A close examination with a memory profiler (jvisualvm) revealed that plenty of entity objects were being held by… the Hibernate session (its first-level cache).
A closer look at the “Session batching” section of the Hibernate documentation quickly explained what we had experienced and made it clear that no Hibernate bug was involved.
Hibernate caches all newly inserted entities in the session-level cache, and they are not removed from this cache when the transaction ends. So in the context of batch processing one either has to repeatedly create and close Hibernate sessions for parts of the job (which may be inefficient), or explicitly clear the session cache at regular intervals. The latter is the solution recommended by the Hibernate documentation.
After persisting each chunk of records we added flushing and clearing of the Hibernate session, as presented in the “Batch inserts” section of the Hibernate documentation. The out-of-memory issue was gone.
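The pattern can be sketched as below. To keep the example self-contained and runnable, a minimal stand-in `Session` interface replaces Hibernate's real `org.hibernate.Session` (which offers the same `persist`, `flush`, and `clear` calls); the loop, the chunk size of 250, and the flush/clear cadence mirror what we did in the actual application.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedInsertSketch {

    // Stand-in for Hibernate's Session (or a JPA EntityManager); in the
    // real application these calls go to org.hibernate.Session.
    interface Session {
        void persist(Object entity);
        void flush();   // push pending INSERTs to the database
        void clear();   // detach all entities, emptying the first-level cache
    }

    static final int CHUNK_SIZE = 250;

    // Persist all records, flushing and clearing the session after every
    // CHUNK_SIZE entities so the session cache cannot grow without bound.
    static void persistAll(Session session, List<?> records) {
        int i = 0;
        for (Object record : records) {
            session.persist(record);
            if (++i % CHUNK_SIZE == 0) {
                session.flush();
                session.clear();
            }
        }
        session.flush(); // write the final, possibly partial, chunk
        session.clear();
    }

    public static void main(String[] args) {
        // Fake session that only counts calls, to show the cadence.
        final int[] counts = new int[2]; // [persisted, flushes]
        Session fake = new Session() {
            public void persist(Object e) { counts[0]++; }
            public void flush() { counts[1]++; }
            public void clear() { /* nothing cached in this fake */ }
        };
        List<Object> records = new ArrayList<>();
        for (int n = 0; n < 1000; n++) records.add(new Object());
        persistAll(fake, records);
        System.out.println(counts[0] + " persisted, " + counts[1] + " flushes");
        // → 1000 persisted, 5 flushes (one per full chunk of 250, plus the final flush)
    }
}
```

Without the `clear()` calls, every persisted entity would stay attached to the session until it is closed, which is exactly how our heap filled up.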