# Databricks
Supports:
- ✅ Models
- ✅ Model sync destination
- ✅ Bulk sync source
- ✅ Bulk sync destination
## Connection

### Configuration
| Name | Type | Description | Required |
|---|---|---|---|
| access_token | string | Access Token (required if `databricks_auth_mode` is `access_token`) | false |
| auth_mode | string | AWS Authentication Method. How to authenticate with AWS; defaults to Access Key and Secret. Accepted values: `access_key_and_secret`, `iam_role` | true |
| aws_access_key_id | string | AWS Access Key ID (destinations only). See https://docs.polytomic.com/docs/databricks-connections#writing-to-databricks (required if `auth_mode` is `aws_access_key_id`) | false |
| aws_secret_access_key | string | AWS Secret Access Key (destinations only) (required if `auth_mode` is `aws_access_key_id`) | false |
| azure_access_key | string | Storage Account Access Key (destination support only). The access key associated with this storage account (required if `cloud_provider` is `azure`) | false |
| azure_account_name | string | Storage Account Name (destination support only). The account name of the storage account (required if `cloud_provider` is `azure`) | false |
| bulk_sync_staging_schema | string | Staging schema name | false |
| cloud_provider | string | Cloud Provider (destination support only). Accepted values: `aws`, `azure` | false |
| concurrent_queries | integer | Concurrent query limit | false |
| container_name | string | Storage Container Name (destination support only). The container in which files are staged (required if `cloud_provider` is `azure`) | false |
| databricks_auth_mode | string | Authentication Method. Accepted values: `access_token`, `oauth_service_principal` | true |
| deleted_file_retention_days | integer | Deleted file retention | false |
| enable_delta_uniform | boolean | Enable Delta UniForm tables | false |
| enforce_query_limit | boolean | Limit concurrent queries | false |
| http_path | string | HTTP Path | true |
| iam_role_arn | string | IAM Role ARN (required if `auth_mode` is `iam_role`) | false |
| log_file_retention_days | integer | Log retention | false |
| port | integer | Port | true |
| s3_bucket_name | string | S3 Bucket Name (destinations only). Name of the bucket used for staging data load files (required if `cloud_provider` is `aws`) | false |
| s3_bucket_region | string | S3 Bucket Region (destinations only). Region of the bucket (required if `cloud_provider` is `aws`) | false |
| server_hostname | string | Server Hostname | true |
| service_principal_id | string | Service Principal ID (required if `databricks_auth_mode` is `oauth_service_principal`) | false |
| service_principal_secret | string | Service Principal Secret (required if `databricks_auth_mode` is `oauth_service_principal`) | false |
| set_retention_properties | boolean | Configure data retention for tables | false |
| storage_credential_name | string | Storage credential name | false |
| unity_catalog_enabled | boolean | Unity Catalog enabled | false |
| use_bulk_sync_staging_schema | boolean | Use custom bulk sync staging schema | false |
### Example

```json
{
  "name": "Databricks connection",
  "type": "databricks",
  "configuration": {
    "access_token": "isoz8af6zvp8067gu68gvrp0oftevn",
    "auth_mode": "access_key_and_secret",
    "aws_access_key_id": "AKIAIOSFODNN7EXAMPLE",
    "aws_secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    "azure_access_key": "abcdefghijklmnopqrstuvwxyz0123456789/+ABCDEabcdefghijklmnopqrstuvwxyz0123456789/+ABCDE==",
    "azure_account_name": "account",
    "bulk_sync_staging_schema": "",
    "cloud_provider": "aws",
    "concurrent_queries": 0,
    "container_name": "container",
    "databricks_auth_mode": "access_token",
    "deleted_file_retention_days": 0,
    "enable_delta_uniform": false,
    "enforce_query_limit": false,
    "http_path": "/sql",
    "iam_role_arn": "",
    "log_file_retention_days": 0,
    "port": 443,
    "s3_bucket_name": "s3://polytomic-databricks-results/customer-dataset",
    "s3_bucket_region": "us-east-1",
    "server_hostname": "dbc-1234dsafas-d0001.cloud.databricks.com",
    "service_principal_id": "sp-1234abcd",
    "service_principal_secret": "abcdefghijklmnopqrstuvwxyz0123456789/+ABCDEabcdefghijklmnopqrstuvwxyz0123456789/+ABCDE==",
    "set_retention_properties": false,
    "storage_credential_name": "",
    "unity_catalog_enabled": false,
    "use_bulk_sync_staging_schema": false
  }
}
```
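The example above uses an access token for Databricks authentication and an access key for AWS staging. As a sketch only (all values below are placeholders), the alternative modes described in the configuration table combine OAuth service-principal authentication (`databricks_auth_mode: "oauth_service_principal"`) with IAM-role staging (`auth_mode: "iam_role"`):

```json
{
  "name": "Databricks connection (service principal)",
  "type": "databricks",
  "configuration": {
    "databricks_auth_mode": "oauth_service_principal",
    "service_principal_id": "sp-1234abcd",
    "service_principal_secret": "<secret>",
    "server_hostname": "dbc-1234dsafas-d0001.cloud.databricks.com",
    "http_path": "/sql",
    "port": 443,
    "auth_mode": "iam_role",
    "iam_role_arn": "arn:aws:iam::123456789012:role/polytomic-staging",
    "cloud_provider": "aws",
    "s3_bucket_name": "my-staging-bucket",
    "s3_bucket_region": "us-east-1"
  }
}
```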
### Read-only properties

| Name | Type | Description | Required |
|---|---|---|---|
| aws_user | string | User ARN (destinations only) | false |
| external_id | string | External ID for the IAM role | false |
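These properties are returned by the API rather than supplied by you. A sketch of how they might appear in a connection's configuration when read back (values are placeholders):

```json
{
  "configuration": {
    "aws_user": "arn:aws:iam::123456789012:user/polytomic",
    "external_id": "0a1b2c3d-4e5f-6789-abcd-ef0123456789"
  }
}
```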
## Model Sync

### Source

#### Configuration
| Name | Type | Description | Required |
|---|---|---|---|
| catalog | string | Catalog | false |
| query | string | Query | false |
| schema | string | Schema | false |
| table | string | Table | false |
#### Example

```json
{
  ...
  "configuration": {
    "catalog": "samples",
    "query": "SELECT * FROM samples.nyctaxi.trips",
    "schema": "nyctaxi",
    "table": "trips"
  }
}
```
### Target

Databricks connections may be used as the destination in a model sync.

#### All targets

##### Configuration
| Name | Type | Description | Required |
|---|---|---|---|
| created_column | string | 'Created at' timestamp column | false |
| preserve_table_on_resync | boolean | Preserve destination table when resyncing | false |
| updated_column | string | 'Updated at' timestamp column | false |
| write_record_timestamps | boolean | Write row timestamp metadata | false |
##### Example

```json
{
  ...
  "target": {
    "configuration": {
      "created_column": "",
      "preserve_table_on_resync": false,
      "updated_column": "",
      "write_record_timestamps": false
    }
  }
}
```
## Bulk Sync

### Destination

#### Configuration
| Name | Type | Description | Required |
|---|---|---|---|
| advanced | object |  | false |
| catalog | string | Catalog | false |
| external_location_name | string | External location | false |
| mirror_schemas | boolean | Mirror schemas | false |
| schema | string | Output schema | false |
#### Example

```json
{
  ...
  "destination_configuration": {
    "advanced": {
      "deleted_file_retention_days": 0,
      "empty_strings_null": false,
      "hard_deletes": false,
      "log_file_retention_days": 0,
      "set_retention_properties": false,
      "table_prefix": "",
      "truncate_existing": false
    },
    "catalog": "samples",
    "external_location_name": "",
    "mirror_schemas": false,
    "schema": "nyctaxi"
  }
}
```
## Type handling

### Destination types
| POLYTOMIC TYPE | DATABRICKS TYPE |
|---|---|
array<> | ARRAY<> |
bigint | BIGINT |
boolean | BOOLEAN |
date | DATE |
datetime | TIMESTAMP |
decimal(precision, scale) | DECIMAL(precision,scale) |
double | DOUBLE |
int | INT |
json | STRING |
jsonarray | STRING |
number | DECIMAL(38,18) |
object{} | STRUCT<> |
single | FLOAT |
smallint | SMALLINT |
string | STRING |
time | TIMESTAMP |
### Source types
| DATABRICKS TYPE | POLYTOMIC TYPE |
|---|---|
ARRAY<> | array<> |
BIGINT | bigint |
DATE | date |
DECIMAL(precision, scale) | decimal(precision, scale) |
DOUBLE | double |
FLOAT | single |
INT | int |
INTERVAL | string |
MAP<> | object{} |
SMALLINT | smallint |
STRUCT<> | object{} |
TIMESTAMP | datetime_tz |
TIMESTAMP_NTZ | datetime |
TINYINT | smallint |
VARCHAR | string |
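As an illustration only (not part of the connector itself), the source-type table above can be captured as a plain lookup. Two details worth noting: `TIMESTAMP` and `TIMESTAMP_NTZ` map to distinct Polytomic types (timezone-aware versus not), and the mapping is not reversible, since `MAP<>` and `STRUCT<>` both collapse to `object{}` and `TINYINT` widens to `smallint`.

```python
# Illustrative sketch of the source-type mappings documented above;
# this is reference material, not Polytomic code.
DATABRICKS_TO_POLYTOMIC = {
    "ARRAY<>": "array<>",
    "BIGINT": "bigint",
    "DATE": "date",
    "DECIMAL(precision, scale)": "decimal(precision, scale)",
    "DOUBLE": "double",
    "FLOAT": "single",
    "INT": "int",
    "INTERVAL": "string",
    "MAP<>": "object{}",
    "SMALLINT": "smallint",
    "STRUCT<>": "object{}",
    "TIMESTAMP": "datetime_tz",    # timezone-aware
    "TIMESTAMP_NTZ": "datetime",   # no timezone
    "TINYINT": "smallint",         # widened; no tinyint on the Polytomic side
    "VARCHAR": "string",
}

# The two timestamp variants deliberately map to different Polytomic types.
assert DATABRICKS_TO_POLYTOMIC["TIMESTAMP"] != DATABRICKS_TO_POLYTOMIC["TIMESTAMP_NTZ"]
```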
