HAPI-FHIR 5.1.0 introduced support for batch processing using the Spring Batch framework. However since its introduction, we discovered Spring Batch jobs do not recover well from ungraceful server shutdowns, which are increasingly common as implementors move to containerized deployment technologies such as Kubernetes.
HAPI-FHIR 6.0.0 has begun the process of replacing Spring Batch with a custom batch framework, called "batch2". This new "batch2" framework is designed to scale well across multiple processes sharing the same message broker, and most importantly, to robustly recover from unexpected server restarts.
A HAPI-FHIR batch job definition consists of a job name, version, parameter json input type, and a chain of job steps. Each step of the chain declares the json output type it produces, which will be the input type for the following step. The final step in the chain does not declare an output type as the final step will typically do the work of the job, e.g. reindex resources, export data to disk, etc.
After a job has been defined, instances of that job can be submitted for batch processing by populating a
JobInstanceStartRequest with the job name and job parameters json and then submitting that request to the Batch Job Coordinator.
The Batch Job Coordinator will then store two records in the database:
Lastly the Batch Job Coordinator publishes a message to the Batch Notification Message Channel (named
batch2-work-notification) to inform worker threads that this first chunk of work is now ready for processing.
HAPI-FHIR Batch Jobs run based on job notification messages. The process is kicked off by the first chunk of work. When this notification message arrives, the message handler makes a single call to the first step defined in the job definition, passing in the job parameters as input.
The handler then does the following:
Middle Steps in the job definition are executed in the same way, except instead of only using the Job Parameters as input, they use both the Job Parameters and the Work Chunk data produced from the previous step.
The final step operates the same way as the middle steps, except it does not produce any new work chunks.
If a Job Definition is set to having Gated Execution, then all work chunks for one step must be COMPLETED before any work chunks for the next step may begin.
A Batch Job Maintenance Service runs every minute to monitor the status of all Job Instances and the Job Instance is transitioned to either COMPLETED, ERRORED or FAILED according to the status of all outstanding work chunks for that job instance. If the job instance is still IN_PROGRESS this maintenance service also estimates the time remaining to complete the job.