Friday, August 15, 2014

Use Spring batch to load csv files into Oracle database

Use Spring batch to load csv files into Oracle database:
1. Need set the below for oracle database (to create the tables job repository needs, may check the contents in core spring batch jar)
    <jdbc:initialize-database data-source="dataSource">
 <jdbc:script location="org/springframework/batch/core/schema-drop-oracle10g.sql" />
 <jdbc:script location="org/springframework/batch/core/schema-oracle10g.sql" />
    </jdbc:initialize-database>
2. Need set <property name="isolationLevelForCreate" value="ISOLATION_READ_COMMITTED" /> so that there's no below error:
Spring Batch ORA-08177: can't serialize access for this transaction when running single job, SERIALIZED isolation level
3. May set the skip lines for csv files:
<property name="linesToSkip" value="1"/>
4. csv file location could be specified by classpath or file way:
<property name="resource" value="classpath:...">
<property name="resource" value="file:c:/...">
5. Load 2G csv files with 41 million lines, it took 221 minutes. Much slower than SQL*Loader but should be good enough for most projects

The performance for Spring Batch is about 2700/s while Oracle SQL*Loader is above 100k/s. Oracle SQL*Loader (direct path) is 40 time faster than Spring Batch.

No comments:

Post a Comment