Thursday, August 28, 2014

Spring integration upload file

The framework is sometimes stupid.
For example, when Spring Integration uploads a file over FTP, it first writes the file under a temporary name and renames it once the upload completes, so that nobody picks up a partially written file.
The problem is that our FTP server only grants us permission to upload, not to rename.
In that case the upload simply fails; the framework should have a mode to write directly to the final file name. The code in question:
String remoteFilePath = remoteDirectory + file.getName() + FileWritingMessageHandler.TEMPORARY_FILE_SUFFIX;
// write remote file first with .writing extension
session.write(fileInputStream, remoteFilePath);
fileInputStream.close();
// then rename it to its final name
session.rename(remoteFilePath, pathTo);

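After some investigation, it turns out this is fixed after version 2.2.6: set use-temporary-file-name="false" on the FTP outbound channel adapter and the file is written under its final name, with no rename.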
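
To make the difference between the two modes concrete, here is a minimal sketch in plain Java. It is not Spring Integration code: the local filesystem stands in for the FTP session, and UploadSketch, upload and the useTemporaryFileName parameter are names made up for this illustration. Only the .writing suffix comes from the snippet above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustration only: a local directory plays the role of the remote FTP directory.
class UploadSketch {
    static final String TEMPORARY_FILE_SUFFIX = ".writing";

    static Path upload(InputStream content, Path remoteDirectory, String fileName,
                       boolean useTemporaryFileName) throws IOException {
        Path finalPath = remoteDirectory.resolve(fileName);
        if (!useTemporaryFileName) {
            // Write straight to the final name: only "upload" permission is needed,
            // but a reader may see the file while it is still being written.
            Files.copy(content, finalPath, StandardCopyOption.REPLACE_EXISTING);
            return finalPath;
        }
        // Write under a temporary name first, then rename: needs "rename" permission.
        Path tempPath = remoteDirectory.resolve(fileName + TEMPORARY_FILE_SUFFIX);
        Files.copy(content, tempPath, StandardCopyOption.REPLACE_EXISTING);
        return Files.move(tempPath, finalPath, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("upload-sketch");
        try (InputStream in = new ByteArrayInputStream("a,b,c\n".getBytes())) {
            System.out.println("Wrote " + upload(in, dir, "data.csv", false));
        }
    }
}
```

The trade-off is visible in the two branches: the direct write works with upload-only permission, while the rename protects readers from half-written files.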

Monday, August 25, 2014

Oracle SQL*Loader does not support the sequence generation for direct loading.

Oracle SQL*Loader does not support generating ids from a database sequence in direct-path loads.
In that case, the id can be added to the file first:
grep -n '^' file1.csv > file2.csv
sed "s/:/,/" file2.csv > file3.csv
Then load file3.csv with direct path, which should give the best performance.

Or use SQL*Loader's own SEQUENCE(MAX,1) in the control file.

If direct load is not used, an expression such as "seq_....nextval" can be used instead.
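
If the pre-numbering step has to run somewhere without grep and sed, the same idea can be sketched in Java. This is only an illustration: CsvIdPrefixer and addIds are made-up names, and the sketch streams line by line so a large file never has to fit in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

class CsvIdPrefixer {
    // Copies every line from in to out, prefixed with its 1-based line number
    // and a comma. Returns the number of lines written.
    static long addIds(BufferedReader in, Writer out) throws IOException {
        long id = 0;
        String line;
        while ((line = in.readLine()) != null) {
            out.write(++id + "," + line + "\n");
        }
        return id;
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        // Colons inside the data are left alone; only the id column is added.
        addIds(new BufferedReader(new StringReader("a,10:30\nb,11:45\n")), out);
        System.out.print(out);
    }
}
```

For real files, the reader and writer would come from Files.newBufferedReader and Files.newBufferedWriter instead of the in-memory strings used here.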

Wednesday, August 20, 2014

Spring Batch in Action

1. based on Spring Batch 2.1
2. source code http://code.google.com/p/springbatch-in-action/
3. Spring Batch processes items in chunks. Chunk processing
allows streaming data instead of loading all the data in memory. By default, chunk
processing is single threaded and usually performs well.
4. If you return null from the ItemProcessor method process, processing for
that item stops and Spring Batch won’t insert the item in the database. (Filter)
5. First, the size of a chunk and the commit interval are the same thing! Second, there’s
no definitive value to choose. Our recommendation is a value between 10 and 200.
6. Spring Batch needs two infrastructure components:
■ Job repository—To store the state of jobs (finished or currently running)
■ Job launcher—To create the state of a job before launching it
7. the job repository stores job execution metadata in a database
to provide Spring Batch reliable monitoring and restart features.
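
Notes 3 to 5 can be pictured with a small plain-Java sketch. It is a simplification and does not use the Spring Batch API: Processor is a made-up stand-in that only mirrors the shape of ItemProcessor, and each inner list stands for one chunk that would be written and committed together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ChunkSketch {
    // Hypothetical stand-in with the same shape as Spring Batch's ItemProcessor.
    interface Processor<I, O> {
        O process(I item);
    }

    // Reads up to commitInterval items per chunk, processes each one,
    // drops items the processor maps to null, and collects what would be written.
    static <I, O> List<List<O>> run(Iterable<I> reader, Processor<I, O> processor,
                                    int commitInterval) {
        List<List<O>> writtenChunks = new ArrayList<>();
        Iterator<I> it = reader.iterator();
        while (it.hasNext()) {
            List<O> toWrite = new ArrayList<>();
            for (int i = 0; i < commitInterval && it.hasNext(); i++) {
                O result = processor.process(it.next());
                if (result != null) {
                    toWrite.add(result); // null means "filter this item out"
                }
            }
            writtenChunks.add(toWrite); // one write per chunk in this sketch
        }
        return writtenChunks;
    }

    public static void main(String[] args) {
        List<Integer> items = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        // Filter out even numbers, with a commit interval of 3.
        System.out.println(run(items, n -> n % 2 == 0 ? null : n, 3));
    }
}
```

Only one chunk is held in memory at a time, which is the streaming behaviour described in note 3.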

Friday, August 15, 2014

Use Spring batch to load csv files into Oracle database

Use Spring batch to load csv files into Oracle database:
1. For an Oracle database, set the following to create the tables the job repository needs (the scripts can be found in the core Spring Batch jar):
    <jdbc:initialize-database data-source="dataSource">
        <jdbc:script location="org/springframework/batch/core/schema-drop-oracle10g.sql" />
        <jdbc:script location="org/springframework/batch/core/schema-oracle10g.sql" />
    </jdbc:initialize-database>
2. Set <property name="isolationLevelForCreate" value="ISOLATION_READ_COMMITTED" /> on the job repository to avoid the error below:
Spring Batch ORA-08177: can't serialize access for this transaction when running single job, SERIALIZED isolation level
3. Header lines in the csv file can be skipped:
<property name="linesToSkip" value="1"/>
4. The csv file location can be given as a classpath or file resource:
<property name="resource" value="classpath:..."/>
<property name="resource" value="file:c:/..."/>
5. Loading 2 GB of csv files with 41 million lines took 221 minutes. That is much slower than SQL*Loader but should be good enough for most projects.

Spring Batch throughput is about 2700 lines/s, while Oracle SQL*Loader is above 100k lines/s. Oracle SQL*Loader (direct path) is about 40 times faster than Spring Batch.

Thursday, August 14, 2014

Integer.parseInt difference between JDK6 and JDK8

Today QA reported a bug that turned out to be a NumberFormatException (NumberFormatException.forInputString) for the input "+2".
But when I tested it, it worked perfectly, and the documentation states that both - and + are supported by Integer.parseInt.
Further investigation showed it is a JDK difference: my test used JDK8 while the QA test environment runs JDK6.
Integer.parseInt in JDK6 does not accept a leading plus sign; support for it was added in Java 7.
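
If the code has to behave the same on both JDKs, one option is to strip the plus sign before parsing. This is a minimal sketch, and SignedIntParser and parseSigned are made-up names for the illustration.

```java
class SignedIntParser {
    // Drops a single leading '+' before delegating to Integer.parseInt, so "+2"
    // parses the same way on runtimes whose parseInt rejects the plus sign.
    static int parseSigned(String text) {
        if (text != null && text.length() > 1 && text.charAt(0) == '+') {
            char next = text.charAt(1);
            if (next == '+' || next == '-') {
                // Inputs such as "+-2" must stay invalid after the '+' is removed.
                throw new NumberFormatException("For input string: \"" + text + "\"");
            }
            return Integer.parseInt(text.substring(1));
        }
        return Integer.parseInt(text);
    }

    public static void main(String[] args) {
        System.out.println(parseSigned("+2"));
        System.out.println(parseSigned("-2"));
    }
}
```

Inputs without a leading plus sign go straight to Integer.parseInt, so null, empty strings and a lone "+" still end in a NumberFormatException.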