Databricks

Supports:

  • ✅ Models
  • ✅ Model sync destination
  • ✅ Bulk sync source
  • ✅ Bulk sync destination

Connection

Configuration

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| access_token | string | Access Token. Required if databricks_auth_mode is "access_token". | false |
| auth_mode | string | AWS Authentication Method. How to authenticate with AWS; defaults to Access Key and Secret. Accepted values: access_key_and_secret, iam_role. | true |
| aws_access_key_id | string | AWS Access Key ID (destinations only). See https://docs.polytomic.com/docs/databricks-connections#writing-to-databricks. Required if auth_mode is "access_key_and_secret". | false |
| aws_secret_access_key | string | AWS Secret Access Key (destinations only). Required if auth_mode is "access_key_and_secret". | false |
| azure_access_key | string | Storage Account Access Key (destination support only). The access key associated with this storage account. Required if cloud_provider is "azure". | false |
| azure_account_name | string | Storage Account Name (destination support only). The account name of the storage account. Required if cloud_provider is "azure". | false |
| bulk_sync_staging_schema | string | Staging schema name | false |
| cloud_provider | string | Cloud Provider (destination support only). Accepted values: aws, azure. | false |
| concurrent_queries | integer | Concurrent query limit | false |
| container_name | string | Storage Container Name (destination support only). The container in which staged files are written. Required if cloud_provider is "azure". | false |
| databricks_auth_mode | string | Authentication Method. Accepted values: access_token, oauth_service_principal. | true |
| deleted_file_retention_days | integer | Deleted file retention | false |
| enable_delta_uniform | boolean | Enable Delta UniForm tables | false |
| enforce_query_limit | boolean | Limit concurrent queries | false |
| http_path | string | HTTP Path | true |
| iam_role_arn | string | IAM Role ARN. Required if auth_mode is "iam_role". | false |
| log_file_retention_days | integer | Log retention | false |
| port | integer | Port | true |
| s3_bucket_name | string | S3 Bucket Name (destinations only). Name of the bucket used for staging data load files. Required if cloud_provider is "aws". | false |
| s3_bucket_region | string | S3 Bucket Region (destinations only). Region of the bucket. Required if cloud_provider is "aws". | false |
| server_hostname | string | Server Hostname | true |
| service_principal_id | string | Service Principal ID. Required if databricks_auth_mode is "oauth_service_principal". | false |
| service_principal_secret | string | Service Principal Secret. Required if databricks_auth_mode is "oauth_service_principal". | false |
| set_retention_properties | boolean | Configure data retention for tables | false |
| storage_credential_name | string | Storage credential name | false |
| unity_catalog_enabled | boolean | Unity Catalog enabled | false |
| use_bulk_sync_staging_schema | boolean | Use custom bulk sync staging schema | false |

Example

```json
{
  "name": "Databricks connection",
  "type": "databricks",
  "configuration": {
    "access_token": "isoz8af6zvp8067gu68gvrp0oftevn",
    "auth_mode": "access_key_and_secret",
    "aws_access_key_id": "AKIAIOSFODNN7EXAMPLE",
    "aws_secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    "azure_access_key": "abcdefghijklmnopqrstuvwxyz0123456789/+ABCDEabcdefghijklmnopqrstuvwxyz0123456789/+ABCDE==",
    "azure_account_name": "account",
    "bulk_sync_staging_schema": "",
    "cloud_provider": "aws",
    "concurrent_queries": 0,
    "container_name": "container",
    "databricks_auth_mode": "access_token",
    "deleted_file_retention_days": 0,
    "enable_delta_uniform": false,
    "enforce_query_limit": false,
    "http_path": "/sql",
    "iam_role_arn": "",
    "log_file_retention_days": 0,
    "port": 443,
    "s3_bucket_name": "s3://polytomic-databricks-results/customer-dataset",
    "s3_bucket_region": "us-east-1",
    "server_hostname": "dbc-1234dsafas-d0001.cloud.databricks.com",
    "service_principal_id": "sp-1234abcd",
    "service_principal_secret": "abcdefghijklmnopqrstuvwxyz0123456789/+ABCDEabcdefghijklmnopqrstuvwxyz0123456789/+ABCDE==",
    "set_retention_properties": false,
    "storage_credential_name": "",
    "unity_catalog_enabled": false,
    "use_bulk_sync_staging_schema": false
  }
}
```
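The server_hostname, http_path, and access_token fields take the same values accepted by the official databricks-sql-connector Python package, so a quick connectivity probe is an easy way to validate them before saving the connection. A minimal sketch, reusing the placeholder values from the example above:

```python
# pip install databricks-sql-connector
from databricks import sql

# Placeholder credentials copied from the example configuration above;
# substitute your real warehouse hostname, HTTP path, and token.
with sql.connect(
    server_hostname="dbc-1234dsafas-d0001.cloud.databricks.com",
    http_path="/sql",
    access_token="isoz8af6zvp8067gu68gvrp0oftevn",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT 1")  # trivial query; succeeds only if the credentials work
        print(cursor.fetchall())
```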

Read-only properties

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| aws_user | string | User ARN (destinations only) | false |
| external_id | string | External ID for the IAM role | false |
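These read-only values matter when auth_mode is "iam_role": the role you create must trust Polytomic's aws_user ARN and require the external_id, the standard cross-account pattern for avoiding the confused-deputy problem. A sketch with boto3; the ARN, external ID, and role name below are placeholders, and the exact policy Polytomic expects is described in the docs linked above:

```python
import json
import boto3

# Placeholders: copy these from the connection's read-only properties.
aws_user = "arn:aws:iam::111111111111:user/polytomic"  # hypothetical ARN
external_id = "example-external-id"

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": aws_user},
            "Action": "sts:AssumeRole",
            # Requiring the external ID prevents other parties from
            # tricking Polytomic into assuming this role.
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }
    ],
}

iam = boto3.client("iam")
role = iam.create_role(
    RoleName="polytomic-databricks-staging",  # hypothetical name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
print(role["Role"]["Arn"])  # use this as iam_role_arn in the configuration
```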

Model Sync

Source

Configuration

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| catalog | string | Catalog | false |
| query | string | Query | false |
| schema | string | Schema | false |
| table | string | Table | false |

Example

```json
{
  ...
  "configuration": {
    "catalog": "samples",
    "query": "SELECT * FROM samples.nyctaxi.trips",
    "schema": "nyctaxi",
    "table": "trips"
  }
}
```
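Since query is ordinary Databricks SQL, you can preview exactly what the model source will read by running it yourself, for example with the connector from the connectivity sketch above (the LIMIT is added here only for brevity):

```python
from databricks import sql

connection = sql.connect(  # same placeholder credentials as above
    server_hostname="dbc-1234dsafas-d0001.cloud.databricks.com",
    http_path="/sql",
    access_token="isoz8af6zvp8067gu68gvrp0oftevn",
)
with connection.cursor() as cursor:
    # samples.nyctaxi.trips is one of Databricks' built-in sample datasets.
    cursor.execute("SELECT * FROM samples.nyctaxi.trips LIMIT 5")
    for row in cursor.fetchall():
        print(row)
connection.close()
```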

Target

Databricks connections may be used as the destination in a model sync.

All targets

Configuration

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| created_column | string | "Created at" timestamp column | false |
| preserve_table_on_resync | boolean | Preserve destination table when resyncing | false |
| updated_column | string | "Updated at" timestamp column | false |
| write_record_timestamps | boolean | Write row timestamp metadata | false |

Example

```json
{
  ...
  "target": {
    "configuration": {
      "created_column": "",
      "preserve_table_on_resync": false,
      "updated_column": "",
      "write_record_timestamps": false
    }
  }
}
```
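If created_column and updated_column are set to actual column names, the sync presumably stamps each row with its creation and update times, which makes recency checks on the destination straightforward. A sketch, assuming a target table my_catalog.crm.accounts configured with "updated_column": "updated_at" (both names are hypothetical):

```python
from databricks import sql

connection = sql.connect(  # same placeholder credentials as above
    server_hostname="dbc-1234dsafas-d0001.cloud.databricks.com",
    http_path="/sql",
    access_token="isoz8af6zvp8067gu68gvrp0oftevn",
)
with connection.cursor() as cursor:
    # Count rows the sync touched in the last day; the table and column
    # names are hypothetical and depend on your target configuration.
    cursor.execute(
        "SELECT COUNT(*) FROM my_catalog.crm.accounts "
        "WHERE updated_at > current_timestamp() - INTERVAL 1 DAY"
    )
    print(cursor.fetchone())
connection.close()
```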

Bulk Sync

Destination

Configuration

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| advanced | object |  | false |
| catalog | string | Catalog | false |
| external_location_name | string | External location | false |
| mirror_schemas | boolean | Mirror schemas | false |
| schema | string | Output schema | false |

Example

```json
{
  ...
  "destination_configuration": {
    "advanced": {
      "deleted_file_retention_days": 0,
      "empty_strings_null": false,
      "hard_deletes": false,
      "log_file_retention_days": 0,
      "set_retention_properties": false,
      "table_prefix": "",
      "truncate_existing": false
    },
    "catalog": "samples",
    "external_location_name": "",
    "mirror_schemas": false,
    "schema": "nyctaxi"
  }
}
```
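The retention fields under advanced line up with Delta Lake's standard table properties delta.deletedFileRetentionDuration and delta.logRetentionDuration, which is presumably what deleted_file_retention_days, log_file_retention_days, and set_retention_properties control. The equivalent manual statement, against a hypothetical table:

```python
from databricks import sql

connection = sql.connect(  # same placeholder credentials as above
    server_hostname="dbc-1234dsafas-d0001.cloud.databricks.com",
    http_path="/sql",
    access_token="isoz8af6zvp8067gu68gvrp0oftevn",
)
with connection.cursor() as cursor:
    # delta.* properties are standard Delta Lake settings; the 7-day
    # durations and the table name are arbitrary examples.
    cursor.execute(
        "ALTER TABLE my_catalog.crm.accounts SET TBLPROPERTIES ("
        "'delta.deletedFileRetentionDuration' = 'interval 7 days', "
        "'delta.logRetentionDuration' = 'interval 7 days')"
    )
connection.close()
```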

Type handling

Destination types

| Polytomic type | Databricks type |
| --- | --- |
| array<> | ARRAY<> |
| bigint | BIGINT |
| boolean | BOOLEAN |
| date | DATE |
| datetime | TIMESTAMP |
| decimal(precision, scale) | DECIMAL(precision,scale) |
| double | DOUBLE |
| int | INT |
| json | STRING |
| jsonarray | STRING |
| number | DECIMAL(38,18) |
| object{} | STRUCT<> |
| single | FLOAT |
| smallint | SMALLINT |
| string | STRING |
| time | TIMESTAMP |
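To make the mapping concrete, the sketch below applies a subset of the table above to a made-up model schema and prints the Databricks DDL it implies (all names are hypothetical; this illustrates the type table, not Polytomic's actual DDL generation):

```python
# A subset of the destination type mapping from the table above.
POLYTOMIC_TO_DATABRICKS = {
    "bigint": "BIGINT",
    "boolean": "BOOLEAN",
    "datetime": "TIMESTAMP",
    "json": "STRING",
    "number": "DECIMAL(38,18)",
    "string": "STRING",
}

# Hypothetical model schema: field name -> Polytomic type.
model_fields = {
    "id": "bigint",
    "email": "string",
    "balance": "number",
    "active": "boolean",
    "updated": "datetime",
    "metadata": "json",
}

columns = ",\n  ".join(
    f"{name} {POLYTOMIC_TO_DATABRICKS[ptype]}"
    for name, ptype in model_fields.items()
)
print(f"CREATE TABLE my_catalog.my_schema.accounts (\n  {columns}\n)")
```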

Source types

| Databricks type | Polytomic type |
| --- | --- |
| ARRAY<> | array<> |
| BIGINT | bigint |
| DATE | date |
| DECIMAL(precision, scale) | decimal(precision, scale) |
| DOUBLE | double |
| FLOAT | single |
| INT | int |
| INTERVAL | string |
| MAP<> | object{} |
| SMALLINT | smallint |
| STRUCT<> | object{} |
| TIMESTAMP | datetime_tz |
| TIMESTAMP_NTZ | datetime |
| TINYINT | smallint |
| VARCHAR | string |
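The TIMESTAMP vs. TIMESTAMP_NTZ split mirrors Databricks semantics: TIMESTAMP values are interpreted in the session time zone (hence datetime_tz), while TIMESTAMP_NTZ carries no zone (hence datetime). A quick way to see the two side by side, assuming a runtime recent enough to support TIMESTAMP_NTZ:

```python
from databricks import sql

connection = sql.connect(  # same placeholder credentials as above
    server_hostname="dbc-1234dsafas-d0001.cloud.databricks.com",
    http_path="/sql",
    access_token="isoz8af6zvp8067gu68gvrp0oftevn",
)
with connection.cursor() as cursor:
    # The same literal cast both ways: one session-zone-aware, one naive.
    cursor.execute(
        "SELECT CAST('2024-01-01 00:00:00' AS TIMESTAMP) AS tz_aware, "
        "CAST('2024-01-01 00:00:00' AS TIMESTAMP_NTZ) AS tz_naive"
    )
    print(cursor.fetchone())
connection.close()
```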