7.0.1 Partitioning and Multitenancy

HAPI FHIR 5.0.0 introduced Partitioning, a new feature of the HAPI FHIR JPA server.

Partitioning allows each resource on the server to be placed in a partition, which is essentially just an arbitrary identifier grouping a set of resources together.

Partitioning is designed to be very flexible, and can be used to achieve different outcomes. For example:

  • Partitioning could be used to achieve multitenancy, where there are multiple logically separate pools of resources on the server. Traditionally this kind of setup is desired when each of these pools belongs to a distinct user group / organization / customer / etc. (a "tenant"), and each of these tenants should not be able to access or modify data belonging to another tenant.

  • Partitioning could also be used to logically separate data coming from distinct sources within an organization. For example, patient records might be placed in one partition, lab data sourced from a lab system might be placed in a second partition and patient surveys from a survey app might be placed in another. In this situation data does not need to be completely segregated (lab Observation records may have references to Patient records in the patient partition) but these partitions might be used to support security groups, retention policies, etc.

  • Partitioning could be used for geographic sharding, keeping data in a partition that is geographically closest to where it is likely to be used.

These examples each have different properties in terms of security rules, and how data is organized and searched.

7.0.2 Architecture

7.0.2.1 Conceptual Architecture

Partitioning in HAPI FHIR JPA means that every resource has a partition identity. This identity consists of the following attributes:

  • Partition Name: This is a short textual identifier for the partition that the resource belongs to. This might be a customer ID, a description of the type of data in the partition, or something else. There is no restriction on the text used aside from a maximum length of 200 characters, but generally it makes sense to limit the text to URL-friendly characters.

  • Partition ID: This is an integer ID that corresponds 1:1 with the partition Name. It is used in the database as the partition identifier.

  • Partition Date: This is an additional partition discriminator that can be used to implement partitioning strategies using a date axis.

Mappings between the Partition Name and the Partition ID are maintained using the Partition Management Operations.
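
The 1:1 relationship between Partition Name and Partition ID can be pictured with a small in-memory model. This is only an illustrative sketch: the class PartitionRegistry and its methods are hypothetical and are not part of HAPI FHIR, where the mappings are managed through the Partition Management Operations.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the name <-> ID mapping; not a HAPI FHIR class.
final class PartitionRegistry {
    static final int MAX_NAME_LENGTH = 200;

    private final Map<String, Integer> idsByName = new HashMap<String, Integer>();
    private final Map<Integer, String> namesById = new HashMap<Integer, String>();

    // Registers a partition, rejecting anything that would break the 1:1 mapping.
    void register(int id, String name) {
        if (name == null || name.length() > MAX_NAME_LENGTH) {
            throw new IllegalArgumentException("Partition name must be at most 200 characters");
        }
        if (idsByName.containsKey(name) || namesById.containsKey(id)) {
            throw new IllegalStateException("Partition name or ID already in use");
        }
        idsByName.put(name, id);
        namesById.put(id, name);
    }

    // Returns the ID for a name, or null if the name is unknown.
    Integer idForName(String name) {
        return idsByName.get(name);
    }

    // Returns the name for an ID, or null if the ID is unknown.
    String nameForId(int id) {
        return namesById.get(id);
    }
}
```

Keeping two maps makes lookups cheap in both directions and makes it easy to refuse a registration that reuses either the name or the ID.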

7.0.2.2 Logical Architecture

At the database level, partitioning involves the addition of two dedicated columns to many tables within the HAPI FHIR JPA database schema:

  • PARTITION_ID – This is an integer indicating the specific partition that a given resource is placed in. This column can also be NULL, meaning that the given resource is in the Default Partition.
  • PARTITION_DATE – This is a date/time column that can be assigned an arbitrary value depending on your use case. Typically, this would be used for use cases where data should be automatically dropped after a certain time period using native database partition drops.

When partitioning is used, these two columns will be populated with the same value for a given resource on all resource-specific tables (this includes HFJ_RESOURCE and all tables that have a foreign key relationship to it including HFJ_RES_VER, HFJ_RESLINK, HFJ_SPIDX_*, etc.)

When a new resource is created, an interceptor hook is invoked to request the partition ID and date to be assigned to the resource.

When a resource is updated, the partition ID and date from the previous version will be used.

When a read operation is being performed (e.g. a read, search, history, etc.), a separate interceptor hook is invoked in order to determine whether the operation should target a specific partition. The outcome of this hook determines how the partitioning manifests itself to the end user:

  • The system can be configured to operate as a multitenant solution by configuring the partition interceptor to scope all read operations to read data only from the partition that the request has access to.
  • The system can be configured to operate with logical segments by configuring the partition interceptor to scope read operations to access all partitions.
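
The difference between the two configurations comes down to the scope that the read hook returns. The sketch below models this with plain Java. All types here (PartitionedStore, Row, ReadScope) are hypothetical stand-ins and do not correspond to HAPI FHIR classes; a null partition ID stands for the default partition, mirroring a NULL PARTITION_ID column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of partition-scoped reads; not a HAPI FHIR class.
final class PartitionedStore {

    // A stored resource and the partition assigned to it at create time.
    static final class Row {
        final String resourceId;
        final Integer partitionId;

        Row(String resourceId, Integer partitionId) {
            this.resourceId = resourceId;
            this.partitionId = partitionId;
        }
    }

    // The outcome of the "which partitions may this read see?" decision.
    static final class ReadScope {
        final boolean allPartitions;
        final Set<Integer> partitionIds;

        private ReadScope(boolean allPartitions, Set<Integer> partitionIds) {
            this.allPartitions = allPartitions;
            this.partitionIds = partitionIds;
        }

        // Logical-segment style: the read may see every partition.
        static ReadScope all() {
            return new ReadScope(true, new HashSet<Integer>());
        }

        // Multitenant style: the read may see only the listed partitions.
        static ReadScope only(Integer... ids) {
            return new ReadScope(false, new HashSet<Integer>(Arrays.asList(ids)));
        }
    }

    private final List<Row> rows = new ArrayList<Row>();

    void create(String resourceId, Integer partitionId) {
        rows.add(new Row(resourceId, partitionId));
    }

    // Returns the IDs of all stored resources visible within the given scope.
    List<String> search(ReadScope scope) {
        List<String> matches = new ArrayList<String>();
        for (Row row : rows) {
            if (scope.allPartitions || scope.partitionIds.contains(row.partitionId)) {
                matches.add(row.resourceId);
            }
        }
        return matches;
    }
}
```

With this model, a tenant-scoped request would search with ReadScope.only(...) for its own partition, while a segmented-data deployment would search with ReadScope.all().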

7.0.3 Partitioning and Resource IDs

In a partitioned repository, it is important to understand that only a single pool of resource IDs exists. In other words, only one resource with the ID Patient/1 can exist across all partitions, and it must be in a single partition.

This fact can have security implications:

  • A client might be blocked from creating Patient/ABC in the partition they have access to because this ID is already in use in another partition.

  • In a server using the default configuration of SEQUENTIAL_NUMERIC Server ID Strategy, a client may be able to infer the IDs of resources in other partitions based on the ID they were assigned.

These considerations can be addressed by using UUID Server ID Strategy, and disallowing client-assigned IDs.
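
To make the single ID pool concrete, the following sketch models it as one map shared by every partition. The class SharedIdPool and its methods are hypothetical and only illustrate the behaviour described above; they are not how HAPI FHIR stores or assigns IDs internally.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical model of a single resource ID pool shared by all partitions.
final class SharedIdPool {
    // Maps a resource ID such as "Patient/ABC" to the one partition that holds it.
    private final Map<String, Integer> partitionByResourceId = new HashMap<String, Integer>();

    // Creates a resource with a client-assigned ID.
    // Fails if the ID already exists in ANY partition, not just the caller's.
    void createWithClientId(String resourceId, int partitionId) {
        if (partitionByResourceId.containsKey(resourceId)) {
            throw new IllegalStateException(resourceId + " is already in use");
        }
        partitionByResourceId.put(resourceId, partitionId);
    }

    // Creates a resource with a random server-assigned UUID. Unlike a sequential
    // number, the returned ID says nothing about IDs issued in other partitions.
    String createWithUuid(String resourceType, int partitionId) {
        String resourceId = resourceType + "/" + UUID.randomUUID();
        partitionByResourceId.put(resourceId, partitionId);
        return resourceId;
    }
}
```

Here a second createWithClientId("Patient/ABC", ...) call fails even when it targets a different partition, which is the first implication listed above. Server-assigned UUIDs sidestep both the collision and the ID-inference concern.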

7.0.4 Partition Interceptors

In order to implement partitioning, an interceptor must be registered against the interceptor registry (either the REST Server registry, or the JPA Server registry will work).

This interceptor can implement the hooks shown below.

7.0.4.1 Identify Partition for Create (Required)

A hook against the Pointcut.STORAGE_PARTITION_IDENTIFY_CREATE pointcut must be registered, and this hook method will be invoked every time a resource is created in order to determine the partition the resource is assigned to.

The criteria for determining the partition will depend on your use case. For example:

  • If you are implementing multi-tenancy the partition might be determined by using the Request Tenant ID. It could also be determined by looking at request headers, or the authorized user/session context, etc.

  • If you are implementing segmented data partitioning, the partition might be determined by examining the actual resource being created, by the identity of the sending system, etc.

7.0.4.2 Identify Partition for Read (Optional)

A hook against the Pointcut.STORAGE_PARTITION_IDENTIFY_READ pointcut must be registered, and this hook method will be invoked every time a resource is read in order to determine the partition to read the resource from.

As of HAPI FHIR 5.3.0, the Identify Partition for Read hook method may return multiple partition names or IDs. If more than one partition is identified, the server will search in all identified partitions.

7.0.4.3 Non-Partitionable Resources

Some resource types cannot be placed in any partition other than the DEFAULT partition. When a resource of one of these types is being created, the STORAGE_PARTITION_IDENTIFY_CREATE pointcut is still invoked, but the hook method must return defaultPartition(). A partition date may optionally be included.

The following resource types may not be placed in any partition except the default partition:

  • CapabilityStatement
  • CodeSystem
  • CompartmentDefinition
  • ConceptMap
  • Library
  • NamingSystem
  • OperationDefinition
  • Questionnaire
  • SearchParameter
  • StructureDefinition
  • StructureMap
  • ValueSet
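
The decision a create hook has to make for these types can be sketched as a plain function. The class CreatePartitionChooser below is a hypothetical helper, not a HAPI FHIR API: in a real server this logic would sit inside a hook registered against STORAGE_PARTITION_IDENTIFY_CREATE and would return a partition identity rather than a string. Falling back to the default partition when no tenant is supplied is an assumption made for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper illustrating the choice a create hook must make.
final class CreatePartitionChooser {
    static final String DEFAULT_PARTITION = "DEFAULT";

    // Resource types that may only live in the default partition (from the list above).
    static final Set<String> NON_PARTITIONABLE = Collections.unmodifiableSet(
        new HashSet<String>(Arrays.asList(
            "CapabilityStatement", "CodeSystem", "CompartmentDefinition", "ConceptMap",
            "Library", "NamingSystem", "OperationDefinition", "Questionnaire",
            "SearchParameter", "StructureDefinition", "StructureMap", "ValueSet")));

    // Returns the partition name for a new resource: the default partition for
    // non-partitionable types or when no tenant is given (an assumption of this
    // sketch), otherwise the tenant supplied with the request.
    static String choose(String resourceType, String requestTenantId) {
        if (NON_PARTITIONABLE.contains(resourceType)) {
            return DEFAULT_PARTITION;
        }
        if (requestTenantId == null || requestTenantId.isEmpty()) {
            return DEFAULT_PARTITION;
        }
        return requestTenantId;
    }
}
```

For example, choose("ValueSet", "P1") yields the default partition even though a tenant was supplied, while choose("Patient", "P1") yields P1.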

7.0.4.4 Examples

See Partition Interceptor Examples for various samples of how partitioning interceptors can be set up.

7.0.5 Complete Example: Using Request Tenants

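In order to achieve a multitenant configuration, the following configuration steps must be taken:

  • Partitioning must be enabled via the PartitionSettings.
  • A tenant identification strategy must be set on the RestfulServer.
  • A partitioning interceptor (here, RequestTenantPartitionInterceptor) must be registered.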

Additionally, indexes will likely need to be tuned in order to support the partition-aware queries.

The following snippet shows a server with this configuration.

public class MultitenantServer extends RestfulServer {

   @Autowired
   private PartitionSettings myPartitionSettings;

   @Override
   protected void initialize() {

      // Enable partitioning
      myPartitionSettings.setPartitioningEnabled(true);

      // Set the tenant identification strategy
      setTenantIdentificationStrategy(new UrlBaseTenantIdentificationStrategy());

      // Use the tenant ID supplied by the tenant identification strategy
      // to serve as the partitioning ID
      registerInterceptor(new RequestTenantPartitionInterceptor());

      // ....Register some providers and other things....

   }
}

Once enabled, HTTP requests to the FHIR server must include the name of the partition so that the tenant can be identified. Without multitenancy, a request to create a Patient could look like this:

POST www.example.com/fhir/Patient

With partitioning enabled, a request to create a patient in the P1 partition would look like this:

POST www.example.com/fhir/P1/Patient

If a tenant name is not provided in the request path, the request will use the 'DEFAULT' partition.
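
The mapping from request path to tenant can be illustrated with a deliberately simplified sketch. It is not the algorithm used by UrlBaseTenantIdentificationStrategy: the class TenantPathSketch, the idea of passing in a set of known partition names, and the input format (the path after the server base, such as "P1/Patient") are all assumptions made for this example.

```java
import java.util.Set;

// Hypothetical, simplified illustration of URL-based tenant selection.
final class TenantPathSketch {
    static final String DEFAULT_PARTITION = "DEFAULT";

    // Returns the tenant named by the first path segment after the server base,
    // or DEFAULT when that segment is not one of the known partition names.
    static String tenantFor(String pathAfterBase, Set<String> knownPartitions) {
        // Tolerate a leading slash so "/P1/Patient" and "P1/Patient" behave alike.
        String trimmed = pathAfterBase.startsWith("/") ? pathAfterBase.substring(1) : pathAfterBase;
        int slash = trimmed.indexOf('/');
        String firstSegment = slash < 0 ? trimmed : trimmed.substring(0, slash);
        return knownPartitions.contains(firstSegment) ? firstSegment : DEFAULT_PARTITION;
    }
}
```

Under these assumptions, the path "P1/Patient" resolves to tenant P1, while "Patient" alone resolves to the DEFAULT partition.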

7.0.6 Limitations

Partitioning is a relatively new feature in HAPI FHIR (added in HAPI FHIR 5.0.0) and has a number of known limitations. If you are intending to use partitioning for achieving a multi-tenant architecture it is important to consider these limitations.

None of the limitations listed here are considered permanent. Over time the HAPI FHIR team is hoping to make all of these features partition aware.

  • The following resource types may not be partitioned and must be placed in the default partition:

    • CapabilityStatement
    • CodeSystem
    • CompartmentDefinition
    • ConceptMap
    • Library
    • NamingSystem
    • OperationDefinition
    • Questionnaire
    • SearchParameter
    • StructureDefinition
    • StructureMap
    • ValueSet
  • Server Capability Statement is not partition aware: The server creates and exposes a single server capability statement, covering all partitions. This can be misleading when partitioning is used as a multitenancy strategy.

  • Conformance resources may not be partitioned: Conformance resources must be placed in the default partition, and will be shared for any validation activities across all partitions.

  • Search Parameters are not partitioned: There is only one set of SearchParameter resources for the entire system, and any search parameters will apply to resources in all partitions. All SearchParameter resources must be stored in the default partition.

  • Cross-partition History Operations are not supported: It is not possible to perform a _history operation that spans all partitions (_history does work when applied to a single partition however).

  • Package Operations are not partition aware: Package operations will only create, update and query resources in the default partition.

  • Advanced Elasticsearch indexing is not partition optimized: The results are correctly partitioned, but the extended indexing is not optimized to account for partitions.

  • Subscriptions are partition aware: Subscriptions can be placed on any partition and will deliver matching resources from the same partition. A subscription can deliver resources from all partitions if it is placed in the default partition with the cross-partition extension and the server allows cross-partition subscriptions.