Enable the development of robust batch applications
The problem: running long-running operations for many rows in a table, such as tax calculations for 300,000 employees. Easy, right? Create one big selection, pick each employee, calculate the tax and update the record.
Well, it’s not that easy. Some operations might fail for some employees, and we want to rerun the operation only for the failed rows. Some operations might depend on external web services that are not idempotent, so calls that already succeeded must not be repeated, and restarting the process only for the employees with failed operations is not simple.
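To make the restart problem concrete, here is a minimal plain-Java sketch of what "rerun only the failed rows" means. It is not Spring Batch code: the class RestartableRun and its in-memory status map are hypothetical names made up for this illustration, and a real solution would keep the status in a database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical helper, not part of Spring Batch: remembers which rows already succeeded.
class RestartableRun {
    enum Status { COMPLETED, FAILED }

    // In-memory bookkeeping for the sketch; assumed to be persistent in real life.
    private final Map<Long, Status> statusById = new HashMap<>();

    // Runs the operation for every id that has not completed yet
    // and returns the ids that failed during this pass.
    List<Long> run(List<Long> employeeIds, Consumer<Long> operation) {
        List<Long> failed = new ArrayList<>();
        for (Long id : employeeIds) {
            if (statusById.get(id) == Status.COMPLETED) {
                continue; // never repeat a non-idempotent call that already succeeded
            }
            try {
                operation.accept(id);
                statusById.put(id, Status.COMPLETED);
            } catch (RuntimeException e) {
                statusById.put(id, Status.FAILED);
                failed.add(id);
            }
        }
        return failed;
    }
}
```

Calling run a second time with the same list of ids only invokes the operation for the employees that failed the first time. This bookkeeping is the kind of work we would rather leave to a framework.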
Luckily we have Spring Batch 3.0, which takes care of splitting our work into chunks, restarting operations that failed, distributing the work across several machines and a lot of other functionality surrounding batch operations. All we need to do is provide our domain functionality.
Spring Batch 3.0 is also compatible with JSR-352 1.0.0, the new batch standard that is part of Java EE 7. We chose the Spring Batch packages because their APIs use generics, which makes the code a little bit cleaner. With some minor changes, the same code should also run on any Java EE 7 compliant application container.
Spring Batch components
The main component of Spring Batch is the Job. A Job is a concrete task we want to perform. Each Job has a series of Steps, and a Step is an independent phase of the Job. Each job may run several times, and for that we have JobExecutions.
In our example the job is a process that calculates the taxes for a list of employees. This job is made up of three steps: the tax calculation, a web service call that actually performs the tax payment, and the generation of a summary tax payment PDF file. A job execution is a run of the job with a number of parameters. In our case we run the same job every month, and the parameters are the year and month for which we do the calculation.
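The idea that the parameters identify a run can be sketched in a few lines of plain Java. This is only an illustration under our own assumptions: MonthlyTaxJobLauncher is a hypothetical class, not the Spring Batch launcher, and it keeps the completed runs in memory instead of in a job repository.

```java
import java.time.YearMonth;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a job launcher; not the Spring Batch API.
class MonthlyTaxJobLauncher {
    // Year/month combinations for which the job already finished successfully.
    private final Set<YearMonth> completedRuns = new HashSet<>();

    // Returns true if the job ran, false if this year/month was already completed.
    boolean launch(int year, int month, Runnable job) {
        YearMonth parameters = YearMonth.of(year, month);
        if (completedRuns.contains(parameters)) {
            return false; // same parameters, same run: do not pay the taxes twice
        }
        job.run(); // an exception here leaves the run open for a later retry
        completedRuns.add(parameters);
        return true;
    }
}
```

Launching with 2014 and 5 twice runs the job once, while launching with 2014 and 6 starts a new run for the next month.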
This might seem complicated but the actual configuration is done in pure Java code and it’s simple to read and understand. No more verbose XML!
    @Bean
    public Job employeeJob() {
        // Three steps, executed one after the other
        return jobBuilders.get(EMPLOYEE_JOB)
                .start(taxCalculationStep())
                .next(wsCallAndGenerateAndSendPaycheckStep())
                .next(jobResultsPdf())
                .build();
    }
Let’s dig a little deeper: what is a step? How do we configure it? So far this is just configuration, and the application does not know anything about our domain. The next Spring Batch objects, ItemReader, ItemProcessor and ItemWriter, are the ones that actually work with our domain objects. The item reader reads the items one by one and passes each of them to the item processor; the processed result is then passed to the item writer. By decoupling the reading, processing and writing, Spring Batch makes it easy to group item writes in chunks instead of writing item by item.
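The read, process, write flow can be sketched in plain Java as follows. This is a simplified illustration and not how Spring Batch is implemented: ChunkedStep is a hypothetical class, and the reader, processor and writer are modelled with standard java.util types instead of the Spring Batch interfaces.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Plain-Java illustration of chunk-oriented processing; the names are hypothetical.
class ChunkedStep {
    // Reads items one by one, processes each of them and hands the results
    // to the writer in chunks. Returns the number of chunks written.
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        int chunksWritten = 0;
        List<O> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            chunk.add(processor.apply(reader.next()));
            if (chunk.size() == chunkSize) {
                writer.accept(new ArrayList<>(chunk)); // one write per full chunk
                chunk.clear();
                chunksWritten++;
            }
        }
        if (!chunk.isEmpty()) {
            writer.accept(new ArrayList<>(chunk)); // last, partially filled chunk
            chunksWritten++;
        }
        return chunksWritten;
    }
}
```

With 12 items and a chunk size of 5, the writer is called three times, with 5, 5 and 2 items. The writer never sees a single item on its own unless the chunk size is 1.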
Here is some more configuration code:
    protected Step taxCalculationStep() {
        return stepBuilders.get(TAX_CALCULATION_STEP)
                .<Employee, TaxCalculation>chunk(5) // write in chunks of 5 items
                .reader(taxCalculatorItemReader)
                .processor(calculateTaxProcessor)
                .writer(taxCalculatorItemWriter)
                .build();
    }
By configuring the chunk size (5 in our case) we choose how many processed items get written in one chunk write, and therefore in one transaction. If a chunk fails, the entire transaction is rolled back; when skipping is configured for the step, the item that failed is removed from the chunk and the chunk is then reprocessed. Thanks to this, we can retry or restart the job in order to process the failed items again.
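The rollback and reprocess behaviour can be illustrated with a small plain-Java sketch. It rests on our own assumptions and is not the Spring Batch implementation: SkippingChunkWriter is a hypothetical class, and the wrapped writer is assumed to be all-or-nothing per call, the way a transactional write would be.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

// Simplified illustration of skip handling for a failed chunk;
// hypothetical names, not Spring Batch classes.
class SkippingChunkWriter<T> {
    private final Consumer<List<T>> transactionalWriter; // assumed all-or-nothing per call
    private final List<T> skipped = new ArrayList<>();

    SkippingChunkWriter(Consumer<List<T>> transactionalWriter) {
        this.transactionalWriter = transactionalWriter;
    }

    void write(List<T> chunk) {
        try {
            transactionalWriter.accept(chunk); // normal case: one write for the whole chunk
        } catch (RuntimeException chunkFailure) {
            // The whole chunk was rolled back; write item by item to isolate the bad one.
            for (T item : chunk) {
                try {
                    transactionalWriter.accept(Collections.singletonList(item));
                } catch (RuntimeException itemFailure) {
                    skipped.add(item); // remember it so it can be processed again later
                }
            }
        }
    }

    // Items that could not be written and are candidates for a retry or restart.
    List<T> skippedItems() {
        return skipped;
    }
}
```

The healthy items of the chunk still get written, and the list of skipped items is exactly what a later retry or restart has to pick up.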
There is still something we did not explain in the step configuration above: what is <Employee, TaxCalculation>chunk(5)? The item reader reads objects of type Employee, the processor turns objects of type Employee into objects of type TaxCalculation, and the writer writes objects of type TaxCalculation. This looks a lot like UNIX pipes. If the reader reads Employee and the writer writes TaxCalculation, then the step has Employee as input and TaxCalculation as output. The configuration of the reader, processor and writer will make this a lot clearer:
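    // Reads Employee entities page by page through JPA
    @Autowired
    private JpaPagingItemReader<Employee> taxCalculatorItemReader;

    // Turns each Employee into a TaxCalculation
    @Autowired
    private ItemProcessor<Employee, TaxCalculation> calculateTaxProcessor;

    // Persists the TaxCalculation results through JPA
    @Autowired
    private JpaItemWriter<TaxCalculation> taxCalculatorItemWriter;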
As you can infer from the code above, Spring Batch offers some default implementations of ItemReader and ItemWriter. We are using the JPA versions since we use JPA for persistence.
Now the last class we need to check out is the ItemProcessor. The code is also quite self-explanatory:
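    import org.springframework.batch.item.ItemProcessor;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Component;

    @Component
    public class CalculateTaxProcessor implements ItemProcessor<Employee, TaxCalculation> {

        @Autowired
        private TaxCalculatorService taxCalculatorService;

        // Called once per Employee that the reader delivers
        @Override
        public TaxCalculation process(Employee employee) {
            return taxCalculatorService.calculateTax(employee);
        }
    }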
Of course, there are a number of gotchas you need to know. When using Spring Batch, make sure your configuration extends the DefaultBatchConfigurer from Spring Batch. We ran into a problem because we had defined our own JpaTransactionManager in our PersistenceConfig. If you just add @EnableBatchProcessing and do not inherit from DefaultBatchConfigurer, Spring Batch creates its own TransactionManager, which results in a bunch of nasty side effects.
This concludes our short intro to how Spring Batch works and how it is configured. This is just a simplified version of our code; I left out some parts for clarity. Please feel free to check out our application, since it is open source, and look out for our future blog post about job executions, retrying and exception handling in Spring Batch.
The post Why use Spring Batch? appeared first on Cegeka Blog.