Config Schema
Overview
All Monarch configurations are stored in the `monarch_config` schema.
Setup
The Monarch config schema is set up as part of DAG 01_setup_provider_schema, which runs `dbt seed` to create the base config. Once that DAG has been run, the config schema is ready for use.
Updating the Configs
Updates can be made in Monarch's config interface.
Summary of Each Table's Purpose
This section describes the purpose of each table at a high-level.
⭐ Core Tables
⭐ source_ingestion
Used in: core
Defines the sources that Monarch will look at. They come from CSV files, JSON files, or Salesforce instances.
- `source_name`: the name of the source
- `type`: either `salesforce`, `csv`, or `json`
- `import_dag_id`: the name of the DAG that will import the raw data and transform it into the common data model
- `sf_url`: the URL for the Salesforce instance, in the form `MyDomainName.my.salesforce.com`. Only applies to sources of type `salesforce`.
- `sf_token`: the token for Salesforce access. Generate one by following the steps here. If pushing to Salesforce, the token must grant read and write access. Only applies to sources of type `salesforce`.
- `protected`: if this flag is set, Monarch will not write (data load or delete) to this Salesforce instance. We recommend setting it for production sources. Only applies to sources of type `salesforce`.
- `transformation_project`: the name of the DBT project that will transform the data to the common data model
- `transform_schema`: the name of the schema where the DBT transformation models will live
- `copy_script`: the path to the script used by DAG 03_import_data_into_landing to copy data from the `transform_schema` to `landing`
- `bulk_query_max_records`: the maximum number of records a Salesforce query will fetch in a batch. Only applies to sources of type `salesforce`.
- `target`: the environment the data load and data seed will push to. This must be a Salesforce source, and there must be exactly one target set.
- `target_type`: determines the type of target system for data operations. Can be `SF` (Salesforce), `FHIR` (FHIR endpoints and resources), or `NULL` (not a target). This field determines whether target operations are directed at FHIR endpoints and resources or at Salesforce uploads and deletes.
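As a concrete illustration, two hypothetical `source_ingestion` rows are sketched below as Python dicts: one CSV source and one Salesforce target. All values (DAG names, schemas, URLs) are invented for the example; the column names follow the list above.

```python
# Hypothetical source_ingestion rows; column names follow the docs,
# but every value is made up for illustration.
csv_source = {
    "source_name": "legacy_csv",
    "type": "csv",
    "import_dag_id": "02_import_legacy_csv",
    "transformation_project": "legacy_csv_transform",
    "transform_schema": "transform_legacy_csv",
    "copy_script": "scripts/copy_legacy_csv.sql",
}

sf_target = {
    "source_name": "prod_sf",
    "type": "salesforce",
    "sf_url": "MyDomainName.my.salesforce.com",
    "protected": True,            # never write to this production instance
    "bulk_query_max_records": 10000,
    "target": True,               # exactly one row may have target set
    "target_type": "SF",
}

def validate(row):
    """Minimal sanity checks mirroring the rules described above."""
    assert row["type"] in ("salesforce", "csv", "json")
    if row.get("target"):
        # Only a Salesforce source can be the target.
        assert row["type"] == "salesforce"
    return True
```

This is only a shape sketch; the real rows live in the `monarch_config` schema and are seeded via dbt, not built in Python.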
The following tables are used to track source status:
Raw refers to raw source data
- `raw_schema`: the schema name for the raw source data
- `raw_tin_list_table`: the table name containing raw TIN (Tax Identification Number) list data
- `raw_tin_column`: the column name containing TIN values in the raw data
Transformed refers to the raw source data after it's transformed into the V12 data model
- `transformed_tin_list_table`: the name of the table in the `transform_schema` containing the TIN list
- `transformed_tin_column`: the name of the TIN column in `transformed_tin_list_table`
Landed means data that's been imported to the landing schema
- `landed_tin_list_table`: the name of the table in the `landing` schema containing the TIN list
- `landed_tin_column`: the name of the TIN column in `landed_tin_list_table`
Merged means records that have been merged and loaded into the staging schema
- `merged_tin_list_table`: the name of the table in the `staging` schema containing the TIN list
- `merged_tin_column`: the name of the TIN column in `merged_tin_list_table`
Reviewing means the data is in the staging_to_upload schema and is under final review before it gets loaded into Salesforce
- `reviewing_tin_list_table`: the name of the table in the `staging_to_upload` schema containing the TIN list
- `reviewing_tin_column_name`: the name of the TIN column in `reviewing_tin_list_table`
Monarch can handle incremental uploads so it's possible a given migration only focuses on a subset of TINs. In Scope refers to the subset of TINs relevant to a given migration round. This TIN list generally consists of a hand-picked set of TINs and is manually imported into the database.
- `in_scope_tin_list_schema`: the schema name for the list of in-scope TINs
- `in_scope_tin_list_table`: the name of the table in `in_scope_tin_list_schema` containing the TIN list
- `in_scope_tin_column_name`: the name of the TIN column in `in_scope_tin_list_table`
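The stage columns above naturally support a funnel check: count distinct in-scope TINs at each stage and watch for drop-off. A minimal sketch, with the database layer replaced by in-memory lists and all TIN values invented:

```python
# Sketch of a TIN funnel across the pipeline stages described above.
# Stage names are Monarch's; the sample TINs are invented.
stages = {
    "raw": ["111111111", "222222222", "333333333"],
    "transformed": ["111111111", "222222222", "333333333"],
    "landed": ["111111111", "222222222"],
    "merged": ["111111111", "222222222"],
    "reviewing": ["111111111"],
}
in_scope = {"111111111", "222222222"}

def funnel(stages, in_scope):
    """Distinct in-scope TIN count per stage, in pipeline order."""
    return {name: len(set(tins) & in_scope) for name, tins in stages.items()}
```

In practice the per-stage lists would come from querying the configured `*_tin_list_table`/`*_tin_column` pairs; this sketch only shows the comparison logic.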
⭐ object
Used in: core, data load, salesforce
Defines the objects in each source that Monarch will look at. If an object's enabled flag is set to false, it will not be pulled from Salesforce. This is useful if your source data doesn't include one of the Monarch base objects (e.g. fee schedule).
Similarly, if the data_load_enabled flag or smart_delete_enabled flag is set to false, this object will not be pushed to or deleted from the target Salesforce environment during data loading (DAG 06_upload_data_to_salesforce) and smart deleting (DAG smart_delete) respectively. This also applies to data seeding (DAG full_data_seed).
When sf_use_source_column is enabled, this object's ID and foreign keys will be suffixed with the name of the source. For example, if the salesforce id for a record in the source sandbox2 is 001E100001LX1gZIAT, then it becomes 001E100001LX1gZIAT_sandbox2. Generally you want to enable this for each object but there are a few exceptions. For example, you might not enable it for RecordType and User because those ID's are shared across environments.
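The suffixing behavior can be sketched as a small helper (the function name is hypothetical; only the suffix format comes from the description above):

```python
def suffix_id(sf_id: str, source_name: str, use_source_column: bool) -> str:
    """Append the source name to a Salesforce ID when sf_use_source_column
    is enabled. Objects with cross-environment IDs (e.g. RecordType, User)
    would have the flag disabled and keep their IDs unchanged."""
    return f"{sf_id}_{source_name}" if use_source_column else sf_id
```

For example, `suffix_id("001E100001LX1gZIAT", "sandbox2", True)` yields `001E100001LX1gZIAT_sandbox2`, matching the example in the text.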
⭐ object_field
Used in: core, data load, merge, salesforce
Defines the fields in each object that Monarch will look at. If a field's enabled flag is set to false, it will still appear in the pipeline but will be omitted during data loading (DAG 06_upload_data_to_salesforce) and data seeding (DAG full_data_seed).
The sort_order column defines the order in which the fields will be listed on the merge report.
The group_id column is used to define a set of fields that should be treated as one group. It's used during merge to ensure if one field in a given group gets overwritten, then the other fields must also be overwritten. For example, ShippingStreet, ShippingState, ShippingPostalCode, and ShippingCity belong to one group because it doesn't make sense to only pull in the ShippingCity field when merging two records. All four fields make up an address so they should be handled together as a group.
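The all-or-nothing group rule can be sketched as follows: when merge decides to take any field of a group from the winning record, every sibling field in that group comes along. This helper and its data structures are illustrative, not Monarch's actual merge code:

```python
from collections import defaultdict

def expand_to_groups(fields_to_overwrite, field_groups):
    """Given the fields merge wants to overwrite and a mapping of
    field -> group_id (None for ungrouped fields), return the expanded
    set where choosing one field from a group pulls in all its siblings."""
    by_group = defaultdict(set)
    for field, group_id in field_groups.items():
        if group_id is not None:
            by_group[group_id].add(field)
    expanded = set(fields_to_overwrite)
    for field in fields_to_overwrite:
        group_id = field_groups.get(field)
        if group_id is not None:
            expanded |= by_group[group_id]
    return expanded

# The shipping-address group from the example above; "Phone" is ungrouped.
groups = {
    "ShippingStreet": "ship_addr",
    "ShippingCity": "ship_addr",
    "ShippingState": "ship_addr",
    "ShippingPostalCode": "ship_addr",
    "Phone": None,
}
```

Overwriting just `ShippingCity` expands to all four address fields, while an ungrouped field like `Phone` stays alone.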
⭐ lookup
Used in: core, data load, salesforce
Similar to object_field, but defines which fields in each object are foreign keys and which tables they point to. Used to build tree structures (e.g. the DAG graph for data loading, the object hierarchy for smart copying).
A few key definitions:
- `relation`: the options are `parent`, `child`, and `self`. They refer to the order in which the objects are reached when traversing the hierarchy tree. `parent` means the other table is traversed before this one (higher node in the tree), while `child` means the other table is traversed after this one (lower node in the tree). `self` is a way to defer loading a specific field to a later point (see `defer` below).
  - Note that this is different from table parentage relationships in a database. For example, when looking at Account, Contact, and AccountContactRelation (ACR for short), one might think that ACR has `accountId` and `contactId` set as `parent` relations because both the account and contact records must exist before the ACR record can be created. When building the hierarchy tree, however, the parentage order is Account -> ACR -> Contact, so ACR has `accountId` set as `parent` and `contactId` set as `child`.
- `defer`: this flag is used when the `target_object` for a given lookup is the same object type as `object_name`. When this flag is set, the field will be left blank during the initial data load step for the object, then updated at a later step. For example, the account object has a `parentId` field which references another account. If a location account is uploaded before its parent account, it won't be able to populate the `parentId` field with the parent Salesforce ID because that object won't exist yet. Deferring the field ensures all objects are uploaded and given Salesforce IDs before the lookup IDs are populated.
  - When `defer` is set and the relation is set to `self`, it means to defer loading the field on the object itself. The main example that uses this is the `status` field of the contract object. When a contract is created, the status can't immediately be set to `Executed` even if that's the value of the data, due to a Salesforce limitation. To get around this, we set the `status` field to `defer` and give it a relation of `self`, which allows Monarch to upload the object without the `status` field, then update it in a later step.
- `enabled`: if this flag is set to `false`, this lookup will be ignored when building the DAG graphs for data loading (DAG 06_upload_data_to_salesforce) and smart deleting (DAG smart_delete).
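Conceptually, the lookup rows define a dependency graph: each non-deferred lookup adds an edge, and deferred lookups are dropped from the graph so the load order stays acyclic, with the deferred fields patched in a later pass. A simplified sketch of that ordering using the ACR example above (this is not Monarch's actual DAG builder):

```python
from graphlib import TopologicalSorter

# Each lookup row, simplified: (object, target_object, relation, defer).
lookups = [
    ("AccountContactRelation", "Account", "parent", False),
    ("AccountContactRelation", "Contact", "child", False),
    ("Account", "Account", "parent", True),  # parentId: deferred self-reference
]

def load_order(objects, lookups):
    """Topologically order objects for data loading, skipping deferred edges."""
    ts = TopologicalSorter({obj: set() for obj in objects})
    for obj, target, relation, defer in lookups:
        if defer:
            continue  # deferred fields are populated in a later update pass
        if relation == "parent":
            ts.add(obj, target)   # target must load before obj
        elif relation == "child":
            ts.add(target, obj)   # obj must load before target
    return list(ts.static_order())
```

With the rows above, the order comes out Account -> AccountContactRelation -> Contact, matching the parentage order described in the `relation` note.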
Supplementary Tables
confidence_functions
Used in: merge
Defines which function a given source's object will use and what schema that function is in. DAG provider_merge uses this to generate confidence scores.
data_quality_exclusion
Used in: reports
Defines which tests are excluded from DAG 04_data_quality_and_mastering. This is useful if there are built-in Monarch tests that don't apply to your data.
default_airflow_vars
Utility
A previous version of monarch utilized airflow variables but we've since moved away from those to this new config setup. This table exists to house the variables previously stored there. It contains key-value pairs so it can be used as a general purpose table for any other config variables you might need to add.
export_db_config
Used in: db exports
Used by DAG export_project_database to export specific tables and schemas to a client's on-premise database so they have a record of data that was moved. See exporting project database for more details.
fuzzy_match
Used in: core, reports
Defines what methods and fields we want to fuzzy match on for the specified object. The table_name column must be formatted to include the schema and table name like <schema>.<tablename>. The config column is a json string where the keys are methods and the values are field configurations for that method. The available methods are name, phone, npi_column, and string_match. Fuzzy matching happens during DAG 04_data_quality_and_mastering.
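To illustrate the shape of the `config` column, a hypothetical row is sketched below. The method names come from the list above; the field configurations under each method are invented for the example:

```python
import json

# Hypothetical fuzzy_match row. table_name must include the schema,
# and config maps method name -> field configuration for that method.
row = {
    "table_name": "staging.account",
    "config": json.dumps({
        "name": {"field": "name"},
        "phone": {"field": "phone"},
        "npi_column": {"field": "npi"},
        "string_match": {"fields": ["billingstreet", "billingcity"]},
    }),
}

ALLOWED_METHODS = {"name", "phone", "npi_column", "string_match"}

def parse_config(row):
    """Decode the JSON config and reject malformed rows."""
    schema, _, table = row["table_name"].partition(".")
    assert schema and table, "table_name must be <schema>.<tablename>"
    config = json.loads(row["config"])
    unknown = set(config) - ALLOWED_METHODS
    assert not unknown, f"unknown fuzzy match methods: {unknown}"
    return config
```

The exact per-method field configuration keys depend on your Monarch version; treat the inner dicts here as placeholders.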
demographic_report
Used in: reports
Defines the configuration for the demographic report feature, which allows counting distinct records for a specified aggregate column while filtering by state and category. The report can point to any table in any schema and provides interactive filtering capabilities.
Key Configuration Fields:
- `schema_name`: the database schema containing the target table
- `object_name`: the table name to analyze
- `aggregate_col`: the column to count distinct values of (e.g., practice tax ID numbers)
- `state_col`: the column containing state information for filtering
- `category_col`: the column containing category information for filtering
- `name_col`: the column containing names for navigation links to source explorer
- `city_col`: the column containing city information
- `street_col`: the column containing street address information
- `zip_col`: the column containing zip code information
Advanced Usage with DBT Views: Migration developers can leverage their transformation DBT project to create custom views that join multiple staging object tables. This enables the use of categories that are stored on other object tables. For example, if you want to categorize accounts by their associated contact types or contract statuses, you can create a DBT view that joins the account table with contact and contract tables, then reference this view in the demographic report configuration.
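The core of the report is a filtered distinct count over the configured columns, roughly `COUNT(DISTINCT aggregate_col) ... WHERE state = ? AND category = ?` in SQL terms. A sketch over in-memory rows (column names follow the config fields above; the data is invented):

```python
def demographic_count(rows, aggregate_col, state_col, category_col,
                      state=None, category=None):
    """Count distinct aggregate values, optionally filtered by state/category."""
    values = {
        row[aggregate_col]
        for row in rows
        if (state is None or row[state_col] == state)
        and (category is None or row[category_col] == category)
    }
    return len(values)

rows = [
    {"tin": "111111111", "state": "CA", "category": "clinic"},
    {"tin": "111111111", "state": "CA", "category": "clinic"},  # duplicate record
    {"tin": "222222222", "state": "CA", "category": "hospital"},
    {"tin": "333333333", "state": "NY", "category": "clinic"},
]
```

With these rows, the unfiltered count is 3 distinct TINs, filtering to CA gives 2, and CA clinics give 1; the real report runs the equivalent query against the configured table.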
Source Explorer Integration:
The demographic report provides navigation links from the name column to source explorer for detailed record inspection. Important: To enable this navigation functionality, users must ensure the staging environment is properly indexed in source explorer. This requires configuring the source_explorer table to include the staging schema with the source name 'staging' and running the source_explorer_reload DAG to build the search index.
remap_table
Used in: merge
Defines the objects and fields that will need to be remapped during merge (DAG provider_merge). If a row's enabled flag is set to false, remap instructions will not be generated for that object and field.
report_master
Used in: reports
se_object
Used in: source explorer
Similar to the object config table but used by source explorer. This table defines the objects that will show up for each indexed source. Indexing occurs in DAG source_explorer_reload. facet_fields are fields source explorer uses to cluster objects together like a sql "group by". It's used as an index to optimize searching. If a row's enabled flag is set to false, the object will not be indexed. Note that this can cause issues depending on the object.
se_object_field
Used in: source explorer
Similar to the object_field config table but used by source explorer. This table defines the fields per object that will show up for each indexed source. Just like se_object, if a row's enabled flag is set to false, the field will not be indexed, which can cause issues depending on the field.
se_object_lookup
Used in: source explorer
Similar to the lookup config table but used by source explorer. This table defines the lookups per object that will show up for each indexed source. They're called "refs" in solr. Note that if use_default_lookups is set to false in the source_explorer config table, this table will be ignored.
sf_fields_to_update
Used in: core, data load, salesforce
Defines how an object's field maps to a Salesforce field. The most common use of this is record types. In some data sources, the value of the recordtypeid field will be the developername rather than the id, so Monarch uses this table to update that column in DAG 04_data_quality_and_mastering. Also used in record_merge to auto-generate ui-unmap table (see ui_unmap for more details).
source_explorer
Used in: source explorer
Defines each source to index and display in source explorer. Indexing occurs in DAG source_explorer_reload. If use_default_lookups is enabled, source explorer will default to using the lookup config instead of se_object_lookup. See source-explorer-setup for more details.
table_options
Used in: db exports
Used in conjunction with export_db_config for db exporting. Defines which columns in a given table should be included or excluded. See exporting project database for more details.
test_to_report_mapping
Used in: reports
ui_unmap
Used in: merge
This table is used to determine how to "unmap" fields that have been remapped during the record merge phase. These "unmaps" are done in the merge screen in order to make fields in the merge screen more human readable - for example, if a field contains data in the form of a salesforce ID, the user may wish to "unmap" this field in the UI so that it points instead to the name of the object the ID represents. It contains the following fields:
- `object`: the object containing the field to unmap (e.g. `account`)
- `field_to_unmap`: the field to unmap (e.g. `recordtypeid`)
- `reference_table`: the table used to pull the new, "unmapped" data from. Should be in the format `schema.tablename` (e.g. `staging.recordtype`)
- `unmap_value`: the value to be used in the UI. Can be either a column from `reference_table` or a more complex SQL statement. Any references to `reference_table` should be in the format `<object>_<field_to_unmap>`. A simple unmap example might be `name`, and a more complex example is `concat(contact_ownerid.firstname, ' ', contact_ownerid.lastname)`
- `match_value`: the column in `reference_table` that matches `object.field_to_unmap`. This column will be used to join the two tables.
- `custom_unmap_value`: false when `unmap_value` is a simple column name, true when it is a more complex SQL statement
- `enabled`: when false, the row will be ignored and not used to update the merge UI
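Taken together, a ui_unmap row effectively describes a join. A sketch of how the merge UI might render the lookup SQL from one row (string construction only; the function and the exact SQL shape are illustrative, not Monarch's implementation):

```python
def build_unmap_sql(row):
    """Render the join implied by a ui_unmap row. The alias follows the
    <object>_<field_to_unmap> convention described above."""
    alias = f"{row['object']}_{row['field_to_unmap']}"
    return (
        f"SELECT {row['unmap_value']} "
        f"FROM {row['reference_table']} AS {alias} "
        f"WHERE {alias}.{row['match_value']} = "
        f"{row['object']}.{row['field_to_unmap']}"
    )

# The recordtype example from the text: show the record type's name
# instead of its Salesforce ID.
row = {
    "object": "account",
    "field_to_unmap": "recordtypeid",
    "reference_table": "staging.recordtype",
    "unmap_value": "name",
    "match_value": "id",
}
```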
When record_merge is run, ui_unmap rows will be auto-generated for every row in sf_fields_to_update. The mappings are as follows:
- model_col_name as field_to_unmap
- concat('staging.', sf_table) as reference_table
- replace(regexp_substr(where_clause, 'sf.[\w]*[ =]'), 'sf.','') as unmap_value
- 'id' as match_value
- false as custom_unmap_value
- true as enabled
Any existing rows will not be updated by this auto-generation. If you wish to delete an auto-generated row, simply mark the enabled column as false, and it will not be re-generated in the future.
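The `unmap_value` expression in the auto-generation list extracts the first `sf.<column>` reference from the where_clause and strips the `sf.` prefix. A loose Python translation of that SQL expression (it additionally trims the trailing space or `=` that `regexp_substr` would leave in the match):

```python
import re

def derive_unmap_value(where_clause):
    """Roughly equivalent to
    replace(regexp_substr(where_clause, 'sf.[\\w]*[ =]'), 'sf.', ''):
    grab the first sf.<column> reference and drop the 'sf.' prefix.
    Returns None when the where_clause has no such reference."""
    match = re.search(r"sf\.[\w]*[ =]", where_clause)
    if match is None:
        return None
    return match.group(0).replace("sf.", "").rstrip(" =")
```

For a where_clause like `sf.developername = src.recordtype`, this yields `developername` as the auto-generated `unmap_value`.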