Databricks
Supports:
- ✅ Models
- ✅ Model sync destination
- ✅ Bulk sync source
- ✅ Bulk sync destination
 
Connection
Configuration
| NAME | TYPE | DESCRIPTION | REQUIRED | READONLY | 
|---|---|---|---|---|
| databricks_auth_mode | string | Accepted values: access_token, oauth_service_principal | true | false | 
| access_token | string | (required if databricks_auth_mode is "access_token") | false | false |
| service_principal_id | string | (required if databricks_auth_mode is "oauth_service_principal") | false | false |
| service_principal_secret | string | (required if databricks_auth_mode is "oauth_service_principal") | false | false |
| server_hostname | string | | true | false |
| port | integer | | true | false |
| http_path | string | | true | false |
| cloud_provider | string | Accepted values: aws, azure | false | false |
| auth_mode | string | How to authenticate with AWS. Defaults to Access Key and Secret. Accepted values: access_key_and_secret, iam_role | true | false |
| iam_role_arn | string | (required if auth_mode is "iam_role") | false | false |
| storage_credential_name | string | | false | false |
| external_id | string | External ID for the IAM role | false | false |
| aws_access_key_id | string | See https://docs.polytomic.com/docs/databricks-connections#writing-to-databricks (required if auth_mode is "access_key_and_secret") | false | false |
| aws_secret_access_key | string | (required if auth_mode is "access_key_and_secret") | false | false |
| aws_user | string | | false | false |
| s3_bucket_name | string | Name of the bucket used for staging data load files (required if cloud_provider is "aws") | false | false |
| s3_bucket_region | string | Region of the bucket (required if cloud_provider is "aws") | false | false |
| azure_account_name | string | The account name of the storage account (required if cloud_provider is "azure") | false | false |
| azure_access_key | string | The access key associated with this storage account (required if cloud_provider is "azure") | false | false |
| container_name | string | The container in which files will be staged (required if cloud_provider is "azure") | false | false |
| unity_catalog_enabled | boolean | | false | false |
| enable_delta_uniform | boolean | | false | false |
| enforce_query_limit | boolean | | false | false |
| concurrent_queries | integer | | false | false |
| set_retention_properties | boolean | | false | false |
| log_file_retention_days | integer | | false | false |
| deleted_file_retention_days | integer | | false | false |
| use_bulk_sync_staging_schema | boolean | | false | false |
| bulk_sync_staging_schema | string | | false | false |
Example
```json
{
  "name": "Databricks connection",
  "type": "databricks",
  "configuration": {
    "access_token": "isoz8af6zvp8067gu68gvrp0oftevn",
    "auth_mode": "access_key_and_secret",
    "aws_access_key_id": "AKIAIOSFODNN7EXAMPLE",
    "aws_secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    "aws_user": "",
    "azure_access_key": "abcdefghijklmnopqrstuvwxyz0123456789/+ABCDEabcdefghijklmnopqrstuvwxyz0123456789/+ABCDE==",
    "azure_account_name": "account",
    "bulk_sync_staging_schema": "",
    "cloud_provider": "aws",
    "concurrent_queries": 0,
    "container_name": "container",
    "databricks_auth_mode": "access_token",
    "deleted_file_retention_days": 0,
    "enable_delta_uniform": false,
    "enforce_query_limit": false,
    "external_id": "",
    "http_path": "/sql",
    "iam_role_arn": "",
    "log_file_retention_days": 0,
    "port": 443,
    "s3_bucket_name": "s3://polytomic-databricks-results/customer-dataset",
    "s3_bucket_region": "us-east-1",
    "server_hostname": "dbc-1234dsafas-d0001.cloud.databricks.com",
    "service_principal_id": "sp-1234abcd",
    "service_principal_secret": "abcdefghijklmnopqrstuvwxyz0123456789/+ABCDEabcdefghijklmnopqrstuvwxyz0123456789/+ABCDE==",
    "set_retention_properties": false,
    "storage_credential_name": "",
    "unity_catalog_enabled": false,
    "use_bulk_sync_staging_schema": false
  }
}
```
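The conditional requirements in the configuration table can be sketched as a small validator. `validate_connection` is an illustrative helper, not part of the Polytomic API; it assumes that AWS authentication falls back to access key and secret when `auth_mode` is unset, per the table's default.

```python
# Illustrative validator for the conditional rules in the configuration table.
# Not Polytomic code; a sketch of which fields are required when.

def validate_connection(config: dict) -> list[str]:
    """Collect missing-field errors for a Databricks connection config."""
    errors: list[str] = []

    def need(field: str, reason: str) -> None:
        if not config.get(field):
            errors.append(f"{field} is required {reason}")

    # Unconditionally required fields.
    for field in ("databricks_auth_mode", "server_hostname", "port", "http_path"):
        need(field, "for every connection")

    # Databricks-side authentication.
    if config.get("databricks_auth_mode") == "access_token":
        need("access_token", 'when databricks_auth_mode is "access_token"')
    elif config.get("databricks_auth_mode") == "oauth_service_principal":
        need("service_principal_id", 'when databricks_auth_mode is "oauth_service_principal"')
        need("service_principal_secret", 'when databricks_auth_mode is "oauth_service_principal"')

    # Staging-storage requirements depend on the cloud provider.
    if config.get("cloud_provider") == "aws":
        need("s3_bucket_name", 'when cloud_provider is "aws"')
        need("s3_bucket_region", 'when cloud_provider is "aws"')
        if config.get("auth_mode") == "iam_role":
            need("iam_role_arn", 'when auth_mode is "iam_role"')
        else:  # defaults to access key and secret
            need("aws_access_key_id", "when authenticating with an access key")
            need("aws_secret_access_key", "when authenticating with an access key")
    elif config.get("cloud_provider") == "azure":
        for field in ("azure_account_name", "azure_access_key", "container_name"):
            need(field, 'when cloud_provider is "azure"')

    return errors
```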
Model Sync
Source
Configuration
| NAME | TYPE | DESCRIPTION | REQUIRED | READONLY | 
|---|---|---|---|---|
| catalog | string | Catalog | false | false | 
| schema | string | Schema | false | false | 
| table | string | Table | false | false | 
| query | string | Query | false | false | 
Example
```json
{
  ...
  "configuration": {
    "catalog": "samples",
    "query": "SELECT * FROM samples.nyctaxi.trips",
    "schema": "nyctaxi",
    "table": "trips"
  }
}
```
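All four source fields are optional, so a configuration can point at a fully qualified table or supply a free-form query. A minimal builder sketch; the helper name is illustrative and the either/or rule is an assumption, not a documented Polytomic constraint:

```python
# Illustrative builder for a model-sync source configuration. Assumes a
# source must name either a catalog.schema.table or a SQL query.

def source_configuration(catalog: str = "", schema: str = "", table: str = "",
                         query: str = "") -> dict:
    if not query and not (catalog and schema and table):
        raise ValueError("provide either a query or catalog + schema + table")
    return {"catalog": catalog, "schema": schema, "table": table, "query": query}
```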
Target
Databricks connections may be used as the destination in a model sync.
All targets
Configuration
| NAME | TYPE | DESCRIPTION | REQUIRED | READONLY | 
|---|---|---|---|---|
| preserve_table_on_resync | boolean | Preserve destination table when resyncing | false | false | 
| write_record_timestamps | boolean | Write row timestamp metadata | false | false | 
| created_column | string | 'Created at' timestamp column | false | false |
| updated_column | string | 'Updated at' timestamp column | false | false |
Example
```json
{
  ...
  "target": {
    "configuration": {
      "created_column": "",
      "preserve_table_on_resync": false,
      "updated_column": "",
      "write_record_timestamps": false
    }
  }
}
```
Bulk Sync
Destination
Configuration
| NAME | TYPE | DESCRIPTION | REQUIRED | READONLY | 
|---|---|---|---|---|
| advanced | object | | false | false |
| catalog | string | Catalog | false | false | 
| schema | string | Output schema | false | false | 
| mirror_schemas | boolean | Mirror schemas | false | false | 
| external_location_name | string | External location | false | false | 
Example
```json
{
  ...
  "destination_configuration": {
    "advanced": {
      "deleted_file_retention_days": 0,
      "empty_strings_null": false,
      "hard_deletes": false,
      "log_file_retention_days": 0,
      "set_retention_properties": false,
      "table_prefix": "",
      "truncate_existing": false
    },
    "catalog": "samples",
    "external_location_name": "",
    "mirror_schemas": false,
    "schema": "nyctaxi"
  }
}
```
Type handling
Destination types
| POLYTOMIC TYPE | DATABRICKS TYPE | 
|---|---|
| array<> | ARRAY<> |
| bigint | BIGINT |
| boolean | BOOLEAN |
| date | DATE |
| datetime | TIMESTAMP |
| decimal(precision, scale) | DECIMAL(precision,scale) |
| double | DOUBLE |
| int | INT |
| json | STRING |
| jsonarray | STRING |
| number | DECIMAL(38,18) |
| object{} | STRUCT<> |
| single | FLOAT |
| smallint | SMALLINT |
| string | STRING |
| time | TIMESTAMP |
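For scalar types the destination mapping above is a direct lookup, with the parameterized decimal type handled separately. A sketch of the table, not Polytomic's actual implementation:

```python
# Sketch of the Polytomic -> Databricks destination type mapping above.
# Parameterized and composite types (decimal, array<>, object{}) need
# structural handling; only decimal is shown here.
import re

SCALAR_TYPES = {
    "bigint": "BIGINT", "boolean": "BOOLEAN", "date": "DATE",
    "datetime": "TIMESTAMP", "double": "DOUBLE", "int": "INT",
    "json": "STRING", "jsonarray": "STRING", "number": "DECIMAL(38,18)",
    "single": "FLOAT", "smallint": "SMALLINT", "string": "STRING",
    "time": "TIMESTAMP",
}

def databricks_type(polytomic_type: str) -> str:
    """Map a Polytomic scalar or decimal type name to its Databricks type."""
    m = re.fullmatch(r"decimal\((\d+),\s*(\d+)\)", polytomic_type)
    if m:
        return f"DECIMAL({m.group(1)},{m.group(2)})"
    return SCALAR_TYPES[polytomic_type]
```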
Source types
| DATABRICKS TYPE | POLYTOMIC TYPE | 
|---|---|
| ARRAY<> | array<> |
| BIGINT | bigint |
| DATE | date |
| DECIMAL(precision, scale) | decimal(precision, scale) |
| DOUBLE | double |
| FLOAT | single |
| INT | int |
| INTERVAL | string |
| MAP<> | object{} |
| SMALLINT | smallint |
| STRUCT<> | object{} |
| TIMESTAMP | datetime_tz |
| TIMESTAMP_NTZ | datetime |
| TINYINT | smallint |
| VARCHAR | string |
