HAPI MDM rules are defined in a single JSON document.
Note that in all the following configuration, valid options for resourceType include any supported resource, such as Organization, Patient, Practitioner, and *. Use * if the criteria are identical across all resource types and you would like to apply them to all resources.
Here is an example of a full HAPI MDM rules JSON document:
{
  "version": "1",
  "mdmTypes" : ["Organization", "Patient", "Practitioner"],
  "candidateSearchParams": [
    {
      "resourceType": "Patient",
      "searchParams": ["phone"]
    },
    {
      "resourceType": "Patient",
      "searchParams": ["birthdate"]
    },
    {
      "resourceType": "*",
      "searchParams": ["identifier"]
    }
  ],
  "candidateFilterSearchParams": [],
  "matchFields": [
    {
      "name": "birthday",
      "resourceType": "Patient",
      "resourcePath": "birthDate",
      "matcher": {
        "algorithm": "STRING"
      }
    },
    {
      "name": "phone",
      "resourceType": "Patient",
      "resourcePath": "telecom.value",
      "matcher": {
        "algorithm": "STRING"
      }
    },
    {
      "name": "firstname-meta",
      "resourceType": "Patient",
      "resourcePath": "name.given",
      "matcher": {
        "algorithm": "METAPHONE"
      }
    },
    {
      "name": "lastname-meta",
      "resourceType": "Patient",
      "resourcePath": "name.family",
      "matcher": {
        "algorithm": "METAPHONE"
      }
    },
    {
      "name": "firstname-jaro",
      "resourceType": "Patient",
      "resourcePath": "name.given",
      "similarity": {
        "algorithm": "JARO_WINKLER",
        "matchThreshold": 0.80
      }
    },
    {
      "name": "lastname-jaro",
      "resourceType": "Patient",
      "resourcePath": "name.family",
      "similarity": {
        "algorithm": "JARO_WINKLER",
        "matchThreshold": 0.80
      }
    },
    {
      "name": "org-name",
      "resourceType": "Organization",
      "resourcePath": "name",
      "matcher": {
        "algorithm": "STRING"
      }
    }
  ],
  "matchResultMap": {
    "firstname-meta,lastname-meta,birthday": "MATCH",
    "firstname-meta,lastname-meta,phone": "MATCH",
    "firstname-jaro,lastname-jaro,birthday": "POSSIBLE_MATCH",
    "firstname-jaro,lastname-jaro,phone": "POSSIBLE_MATCH",
    "lastname-jaro,phone,birthday": "POSSIBLE_MATCH",
    "firstname-jaro,phone,birthday": "POSSIBLE_MATCH",
    "org-name": "MATCH"
  }
}
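Because matchResultMap keys refer to matchFields by name, it can be worth checking that every name used in a key is actually defined. The following Python sketch is a hypothetical helper (undefined_match_field_names is not part of HAPI FHIR) that parses a trimmed-down rules document and reports names that appear in matchResultMap but not in matchFields.

```python
import json


def undefined_match_field_names(rules: dict) -> set:
    """Return names used in matchResultMap keys that no matchField defines.

    Hypothetical helper for illustration; it is not part of HAPI FHIR.
    """
    defined = {field["name"] for field in rules.get("matchFields", [])}
    used = set()
    for combination in rules.get("matchResultMap", {}):
        # Each key is a comma-separated list of matchField names.
        used.update(name.strip() for name in combination.split(","))
    return used - defined


rules = json.loads("""
{
  "matchFields": [
    {"name": "birthday", "resourceType": "Patient",
     "resourcePath": "birthDate", "matcher": {"algorithm": "STRING"}},
    {"name": "phone", "resourceType": "Patient",
     "resourcePath": "telecom.value", "matcher": {"algorithm": "STRING"}}
  ],
  "matchResultMap": {
    "birthday,phone": "MATCH",
    "birthday,email": "POSSIBLE_MATCH"
  }
}
""")
print(undefined_match_field_names(rules))  # {'email'}
```

Here the second entry refers to an "email" matchField that was never defined, so it is reported.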
Here is a description of how each section of this document is configured.
The candidateSearchParams entries define fields that must have at least one exact match before two resources are considered for matching. They act as a list of "pre-searches" that find potential match candidates, which avoids the expensive operation of calculating a match score against every resource in the system. E.g. you may only wish to consider matching two Patients if they share at least one identifier, have the same birthday, or have the same phone number. The HAPI FHIR server executes each of these searches separately and then takes the union of the results, so you can think of these as OR criteria that cast a wide net for potential candidates. In some MDM systems, these "pre-searches" are called "blocking" searches (since they identify "blocks" of candidates that will be searched for matches).
If a list of searchParams is specified in a given candidateSearchParams item, then these search parameters are treated as AND parameters. In the following candidateSearchParams definition, HAPI FHIR will extract the given name, family name and identifiers from the incoming Patient and perform two separate searches: first for all Patient resources that have the same given AND the same family name as the incoming Patient, and second for all Patient resources that share at least one identifier with the incoming Patient. Note that if the incoming Patient is missing any of the searchParam values for an item, then that search is skipped. E.g. if the incoming Patient had a given name but no family name, then only a search for matching identifiers would be performed.
{
  "candidateSearchParams" : [
    {
      "resourceType" : "Patient",
      "searchParams" : ["given", "family"]
    }, {
      "resourceType" : "Patient",
      "searchParams" : ["identifier"]
    }
  ]
}
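The OR-across-searches, AND-within-a-search semantics can be illustrated with a toy in-memory model. The Python sketch below assumes a simplified representation in which each resource is a dict mapping a search parameter name to the set of values it carries. HAPI FHIR itself issues real FHIR searches, and find_candidates is a hypothetical name.

```python
# Toy model for illustration: a resource is a dict mapping a search parameter
# name to the set of values it carries. HAPI FHIR itself runs FHIR searches.
def find_candidates(incoming, candidate_search_params, stored):
    """Union (OR) of all candidate searches; within one search every listed
    parameter must match (AND)."""
    found = set()
    for params in candidate_search_params:
        # Skip the whole search if the incoming resource lacks any parameter.
        if any(not incoming.get(p) for p in params):
            continue
        for res_id, res in stored.items():
            # A parameter matches when both resources share at least one value.
            if all(incoming[p] & res.get(p, set()) for p in params):
                found.add(res_id)
    return found


search_params = [["given", "family"], ["identifier"]]
stored = {
    "p1": {"given": {"Peter"}, "family": {"Chalmers"}},
    "p2": {"given": {"Peter"}, "family": {"Smith"}},
    "p3": {"identifier": {"urn:oid:1.2.36.146.595.217.0.1|12345"}},
}
incoming = {"given": {"Peter", "James"}, "family": {"Chalmers"},
            "identifier": {"urn:oid:1.2.36.146.595.217.0.1|12345"}}
print(sorted(find_candidates(incoming, search_params, stored)))  # ['p1', 'p3']
```

p2 shares a given name but not the family name, so the AND search excludes it. p3 is found only through the separate identifier search.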
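The candidateFilterSearchParams entries restrict the candidate searches: only resources that also satisfy every filter are considered, so each filter is effectively ANDed with each candidateSearchParams search. E.g. the following filter only considers active Patients:
{
  "candidateFilterSearchParams" : [
    {
      "resourceType" : "Patient",
      "searchParam" : "active",
      "fixedValue" : "true"
    }
  ]
}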
For example, if the incoming patient looked like this:
{
  "resourceType": "Patient",
  "id": "example",
  "identifier": [{
    "system": "urn:oid:1.2.36.146.595.217.0.1",
    "value": "12345"
  }],
  "name": [
    {
      "family": "Chalmers",
      "given": [
        "Peter",
        "James"
      ]
    }
  ]
}
then the above candidateSearchParams and candidateFilterSearchParams would result in the following two consecutive searches for candidates:
Patient?given=Peter,James&family=Chalmers&active=true
Patient?identifier=urn:oid:1.2.36.146.595.217.0.1|12345&active=true
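The derivation of those two searches can be sketched in Python. This is a hypothetical illustration, not HAPI FHIR's code. values_for only understands the three element paths used in this example, and, like the searches shown above, the output is not URL-encoded as a real HTTP client would require. Multiple values for one parameter are joined with commas (FHIR's OR syntax). Parameters within an item and every matching filter are joined with & (AND).

```python
def values_for(patient, search_param):
    """Extract search values from a Patient dict (only the paths used here)."""
    names = patient.get("name", [])
    if search_param == "given":
        return [g for n in names for g in n.get("given", [])]
    if search_param == "family":
        return [n["family"] for n in names if "family" in n]
    if search_param == "identifier":
        return [f"{i.get('system', '')}|{i['value']}"
                for i in patient.get("identifier", []) if "value" in i]
    raise ValueError(f"unsupported search parameter: {search_param}")


def candidate_queries(patient, candidate_search_params, filter_search_params):
    queries = []
    for item in candidate_search_params:
        parts = []
        for sp in item["searchParams"]:
            values = values_for(patient, sp)
            if not values:
                break  # a missing value skips this whole search
            parts.append(f"{sp}={','.join(values)}")  # comma = OR within one param
        else:
            # Every filter for this resource type is ANDed onto the search.
            parts += [f"{flt['searchParam']}={flt['fixedValue']}"
                      for flt in filter_search_params
                      if flt["resourceType"] in (item["resourceType"], "*")]
            queries.append(f"{item['resourceType']}?{'&'.join(parts)}")
    return queries


patient = {
    "resourceType": "Patient",
    "identifier": [{"system": "urn:oid:1.2.36.146.595.217.0.1", "value": "12345"}],
    "name": [{"family": "Chalmers", "given": ["Peter", "James"]}],
}
candidate_search_params = [
    {"resourceType": "Patient", "searchParams": ["given", "family"]},
    {"resourceType": "Patient", "searchParams": ["identifier"]},
]
filter_search_params = [
    {"resourceType": "Patient", "searchParam": "active", "fixedValue": "true"},
]
for query in candidate_queries(patient, candidate_search_params, filter_search_params):
    print(query)
```

Running it prints the same two queries listed above.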
Once the match candidates have been found, each one is compared to the incoming Patient resource. This comparison is made across a list of matchFields. Each matchField returns true or false, indicating whether the candidate and the incoming Patient match on that field. There are two types of matchFields: matcher and similarity. matcher matchFields return true or false directly, whereas similarity matchFields return a score between 0.0 (no match) and 1.0 (exact match), and this score is translated to true/false via a matchThreshold. E.g. if a JARO_WINKLER matchField is configured with a matchThreshold of 0.8, then that matchField will only return true if the JARO_WINKLER similarity evaluates to a score >= 0.8.
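The translation from a similarity score to true/false can be sketched as follows. This Python code is a textbook Jaro-Winkler implementation with a prefix scale of 0.1, at most 4 prefix characters, and the prefix boost applied only when the Jaro score exceeds 0.7. It is not the tdebatty library code that HAPI uses, so scores may differ in edge cases. similarity_matches is a hypothetical name for the threshold check.

```python
def jaro(s1: str, s2: str) -> float:
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1 = [False] * len(s1)
    used2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        # Characters match only if equal and within `window` positions.
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    seq1 = [c for c, u in zip(s1, used1) if u]
    seq2 = [c for c, u in zip(s2, used2) if u]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str) -> float:
    sim = jaro(s1, s2)
    if sim <= 0.7:  # the prefix boost only applies above this threshold
        return sim
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):  # common prefix, capped at 4 characters
        if a != b:
            break
        prefix += 1
    return sim + prefix * 0.1 * (1 - sim)


def similarity_matches(s1: str, s2: str, match_threshold: float) -> bool:
    # A similarity matchField returns true when score >= matchThreshold.
    return jaro_winkler(s1, s2) >= match_threshold


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))     # 0.9611
print(similarity_matches("MARTHA", "MARHTA", 0.80))   # True
print(similarity_matches("PETER", "CHALMERS", 0.80))  # False
```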
By default, all matchFields have exact=false, which means that all diacritical marks are removed and all letters are converted to upper case before matching. exact=true can be added to any matchField to compare the strings as they are originally capitalized and accented.
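A minimal Python sketch of the exact=false behaviour described above, assuming Unicode NFD decomposition as the way diacritical marks are removed. HAPI's own normalization routine may differ, and letters such as "ø" that do not decompose are left unchanged here. normalize and string_matches are hypothetical names.

```python
import unicodedata


def normalize(value: str, exact: bool = False) -> str:
    """Sketch of exact=false: drop diacritical marks, then upper-case."""
    if exact:
        return value
    # NFD splits e.g. "É" into "E" plus a combining accent, which is dropped.
    decomposed = unicodedata.normalize("NFD", value)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_marks.upper()


def string_matches(left: str, right: str, exact: bool = False) -> bool:
    return normalize(left, exact) == normalize(right, exact)


print(string_matches("MCTAVISH", "McTavish"))              # True
print(string_matches("MCTAVISH", "McTavish", exact=True))  # False
print(normalize("Émile Zoë"))                              # EMILE ZOE
```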
Here is a matcher matchField that uses the SOUNDEX matcher to determine whether two family names match.
{
  "name": "familyname-soundex",
  "resourceType": "*",
  "resourcePath": "name.family",
  "matcher": {
    "algorithm": "SOUNDEX"
  }
}
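To make the SOUNDEX matcher concrete, here is a Python sketch of classic American Soundex. Two names match when their four-character codes are equal. This is a textbook version written for illustration, not Apache Commons Codec's implementation, though it agrees with the table's examples below (Jon = John, Thomas != Tom).

```python
CODES = {**dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
         **dict.fromkeys("DT", "3"), "L": "4", **dict.fromkeys("MN", "5"),
         "R": "6"}


def soundex(name: str) -> str:
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    last = CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = CODES.get(c, "")
        # Append a digit unless it repeats the previous letter's code.
        if digit and digit != last:
            code += digit
        # Vowels reset the "previous code"; H and W do not.
        if c not in "HW":
            last = digit
        if len(code) == 4:
            break
    return code.ljust(4, "0")


def soundex_matches(left: str, right: str) -> bool:
    return soundex(left) == soundex(right)


print(soundex("Jon"), soundex("John"))    # J500 J500
print(soundex("Thomas"), soundex("Tom"))  # T520 T500
```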
Here is a matcher matchField that only matches when two family names are identical.
{
  "name": "familyname-exact",
  "resourceType": "*",
  "resourcePath": "name.family",
  "matcher": {
    "algorithm": "STRING",
    "exact": true
  }
}
Special identifier matching is also available if you need to match on a particular identifier system:
{
  "name": "identifier-ssn",
  "resourceType": "*",
  "resourcePath": "identifier",
  "matcher": {
    "algorithm": "IDENTIFIER",
    "identifierSystem": "http://hl7.org/fhir/sid/us-ssn"
  }
}
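The IDENTIFIER matcher's rule (system and value must both be identical, optionally restricted to one identifierSystem) can be sketched in Python over FHIR identifier lists represented as dicts. identifier_matches is a hypothetical name. The MRN system URI and the sample values are made up for illustration.

```python
def identifier_matches(left, right, identifier_system=None):
    """True if both identifier lists share a (system, value) pair; when
    identifier_system is given, only identifiers in that system count."""
    def keys(identifiers):
        return {(i.get("system"), i.get("value")) for i in identifiers
                if i.get("value") is not None
                and (identifier_system is None
                     or i.get("system") == identifier_system)}
    return bool(keys(left) & keys(right))


SSN = "http://hl7.org/fhir/sid/us-ssn"
MRN = "urn:example:mrn"  # hypothetical system, for illustration only
a = [{"system": SSN, "value": "123-45-6789"}, {"system": MRN, "value": "A1"}]
b = [{"system": MRN, "value": "A1"}]
print(identifier_matches(a, b))       # True: both carry the same MRN
print(identifier_matches(a, b, SSN))  # False: b has no SSN identifier
```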
Here is a similarity matchField that matches when two given names match with a JARO_WINKLER threshold >= 0.8.
{
  "name": "firstname-jaro",
  "resourceType": "*",
  "resourcePath": "name.given",
  "similarity": {
    "algorithm": "JARO_WINKLER",
    "matchThreshold": 0.80
  }
}
The following algorithms are currently supported:
Algorithm | Type | Description | Example |
---|---|---|---|
CAVERPHONE1 | matcher | Apache Caverphone1 | Gail = Gael, Gail != Gale, Thomas != Tom |
CAVERPHONE2 | matcher | Apache Caverphone2 | Gail = Gael, Gail = Gale, Thomas != Tom |
COLOGNE | matcher | Apache Cologne Phonetic | |
DOUBLE_METAPHONE | matcher | Apache Double Metaphone | Dury = Durie, Allsop = Allsob, Smith != Schmidt |
MATCH_RATING_APPROACH | matcher | Apache Match Rating Approach Encoder | |
METAPHONE | matcher | Apache Metaphone | Dury = Durie, Allsop != Allsob, Smith != Schmidt |
NYSIIS | matcher | Apache Nysiis | |
REFINED_SOUNDEX | matcher | Apache Refined Soundex | |
SOUNDEX | matcher | Apache Soundex | Jon = John, Thomas != Tom |
STRING | matcher | Match the values as strings. This matcher should be used with tokens (e.g. gender). | MCTAVISH = McTavish when exact = false, MCTAVISH != McTavish when exact = true |
SUBSTRING | matcher | True if one string starts with the other. | Bill = Billy, Egbert != Bert |
DATE | matcher | Reduce the precision of the dates to the lowest precision of the two, then compare them as strings. | 2019-12,Month = 2019-12-19,Day |
NAME_ANY_ORDER | matcher | Match names as strings in any order | John Henry = Henry JOHN when exact = false |
NAME_FIRST_AND_LAST | matcher | Match first and last names as strings, in order | John Henry = John HENRY when exact=false, John Henry != Henry John |
IDENTIFIER | matcher | Matches when the system and value of the identifier are identical. | If an optional "identifierSystem" is provided, then the identifiers only match when they belong to that system |
JARO_WINKLER | similarity | tdebatty Jaro Winkler | |
COSINE | similarity | tdebatty Cosine Similarity | |
JACCARD | similarity | tdebatty Jaccard Index | |
LEVENSCHTEIN | similarity | tdebatty Normalized Levenshtein | |
SORENSEN_DICE | similarity | tdebatty Sorensen-Dice coefficient | |
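Two of the simpler matchers in the table can be sketched directly from their descriptions. In the Python below, the DATE sketch assumes ISO 8601 date strings (YYYY, YYYY-MM or YYYY-MM-DD) and ignores any time-of-day component. HAPI FHIR's own matchers work on FHIR datatypes and may differ in such details.

```python
def substring_matches(left: str, right: str) -> bool:
    # SUBSTRING: true if either string starts with the other.
    return left.startswith(right) or right.startswith(left)


# String lengths of year, year-month and year-month-day precision.
_PRECISION_LENGTHS = (4, 7, 10)


def date_matches(left: str, right: str) -> bool:
    # DATE: cut both values to the coarser of the two precisions, then compare
    # as strings (time-of-day components are ignored in this sketch).
    def precision(value):
        return max(n for n in _PRECISION_LENGTHS if len(value) >= n)
    cut = min(precision(left), precision(right))
    return left[:cut] == right[:cut]


print(substring_matches("Bill", "Billy"))     # True
print(substring_matches("Egbert", "Bert"))    # False
print(date_matches("2019-12", "2019-12-19"))  # True
print(date_matches("2019-12", "2019-11-30"))  # False
```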
The matchResultMap entries convert combinations of successful matchFields into an MDM Match Result for the overall matching of a given pair of resources. MATCH results take precedence over POSSIBLE_MATCH results. If the incoming resource matches ALL of the matchFields named in an entry, then a new match link is created with that entry's matchResult (MATCH or POSSIBLE_MATCH).
{
  "matchResultMap": {
    "firstname-meta,lastname-meta,birthday": "MATCH",
    "firstname-jaro,lastname-jaro,birthday": "POSSIBLE_MATCH"
  }
}
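The evaluation rule just described can be sketched in Python. An entry applies only when every matchField it names returned true, and a satisfied MATCH entry wins over a satisfied POSSIBLE_MATCH entry. match_result is a hypothetical name, and returning the string "NO_MATCH" when no entry applies is an assumption of this sketch.

```python
def match_result(match_result_map: dict, matched_fields: set) -> str:
    """Sketch: MATCH wins over POSSIBLE_MATCH; NO_MATCH if no entry applies."""
    satisfied = {result for combo, result in match_result_map.items()
                 # An entry applies only if ALL of its named fields matched.
                 if {name.strip() for name in combo.split(",")} <= matched_fields}
    for result in ("MATCH", "POSSIBLE_MATCH"):
        if result in satisfied:
            return result
    return "NO_MATCH"


rules_map = {
    "firstname-meta,lastname-meta,birthday": "MATCH",
    "firstname-jaro,lastname-jaro,birthday": "POSSIBLE_MATCH",
}
print(match_result(rules_map, {"firstname-jaro", "lastname-jaro", "birthday"}))
# POSSIBLE_MATCH
```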
The eidSystem setting is the external EID system that the HAPI MDM system should expect to see on incoming Patient resources. It must be a valid URI. See MDM EID for details on how EIDs are managed by HAPI MDM.