Databricks

Supports:

  • ✅ Models
  • ✅ Model sync destination
  • ✅ Bulk sync source
  • ✅ Bulk sync destination

Connection

Configuration

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| `cloud_provider` | string | Cloud provider (destination support only). Accepted values: `aws`, `azure` | false |
| `databricks_auth_mode` | string | Authentication method. Accepted values: `access_token`, `oauth_service_principal` | true |
| `enable_delta_uniform` | boolean | Enable Delta UniForm tables | false |
| `enforce_query_limit` | boolean | Limit concurrent queries | false |
| `http_path` | string | HTTP path | true |
| `port` | integer | Port | true |
| `server_hostname` | string | Server hostname | true |
| `unity_catalog_enabled` | boolean | Unity Catalog enabled | false |
| `use_bulk_sync_staging_schema` | boolean | Use custom bulk sync staging schema | false |

databricks_auth_mode

When `databricks_auth_mode` is `access_token`:

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| `access_token` | string | Access token | true |

When `databricks_auth_mode` is `oauth_service_principal`:

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| `service_principal_id` | string | Service principal ID | true |
| `service_principal_secret` | string | Service principal secret | true |
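The conditional requirements above can be expressed as a small lookup. The helper below is a hypothetical sketch (not part of any Polytomic SDK) showing which credential fields must accompany each `databricks_auth_mode`:

```python
# Hypothetical validator (not a Polytomic SDK function): map each
# databricks_auth_mode to the credential fields the tables above require.
REQUIRED_BY_AUTH_MODE = {
    "access_token": ["access_token"],
    "oauth_service_principal": ["service_principal_id", "service_principal_secret"],
}

def missing_auth_fields(config: dict) -> list:
    """Return the required credential fields absent from config."""
    mode = config.get("databricks_auth_mode")
    if mode not in REQUIRED_BY_AUTH_MODE:
        raise ValueError(f"unknown databricks_auth_mode: {mode!r}")
    return [f for f in REQUIRED_BY_AUTH_MODE[mode] if not config.get(f)]

print(missing_auth_fields({"databricks_auth_mode": "access_token"}))
# → ['access_token']
```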

cloud_provider

When `cloud_provider` is `aws`:

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| `auth_mode` | string | AWS authentication method. How to authenticate with AWS; defaults to access key and secret. Accepted values: `access_key_and_secret`, `iam_role` | true |
| `s3_bucket_name` | string | S3 bucket name (destinations only). Name of the bucket used for staging data load files | true |
| `s3_bucket_region` | string | S3 bucket region (destinations only). Region of the bucket | true |
| `set_retention_properties` | boolean | Configure data retention for tables | false |
When `cloud_provider` is `azure`:

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| `azure_access_key` | string | Storage account access key (destination support only). The access key associated with this storage account | true |
| `azure_account_name` | string | Storage account name (destination support only). The account name of the storage account | true |
| `container_name` | string | Storage container name (destination support only). The container in which files will be staged | true |

use_bulk_sync_staging_schema

When `use_bulk_sync_staging_schema` is `true`:

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| `bulk_sync_staging_schema` | string | Staging schema name | false |

enforce_query_limit

When `enforce_query_limit` is `true`:

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| `concurrent_queries` | integer | Concurrent query limit | false |

auth_mode

When `auth_mode` is `access_key_and_secret`:

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| `aws_access_key_id` | string | AWS access key ID (destinations only). See https://docs.polytomic.com/docs/databricks-connections#writing-to-databricks | false |
| `aws_secret_access_key` | string | AWS secret access key (destinations only) | false |

When `auth_mode` is `iam_role`:

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| `iam_role_arn` | string | IAM role ARN | true |
| `storage_credential_name` | string | Storage credential name | false |

set_retention_properties

When `set_retention_properties` is `true`:

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| `deleted_file_retention_days` | integer | Deleted file retention | false |
| `log_file_retention_days` | integer | Log retention | false |

Example

```json
{
  "name": "Databricks connection",
  "type": "databricks",
  "configuration": {
    "access_token": "isoz8af6zvp8067gu68gvrp0oftevn",
    "auth_mode": "access_key_and_secret",
    "aws_access_key_id": "AKIAIOSFODNN7EXAMPLE",
    "aws_secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    "cloud_provider": "aws",
    "databricks_auth_mode": "access_token",
    "enable_delta_uniform": false,
    "enforce_query_limit": false,
    "http_path": "/sql",
    "port": 443,
    "s3_bucket_name": "s3://polytomic-databricks-results/customer-dataset",
    "s3_bucket_region": "us-east-1",
    "server_hostname": "dbc-1234dsafas-d0001.cloud.databricks.com",
    "set_retention_properties": false,
    "unity_catalog_enabled": false,
    "use_bulk_sync_staging_schema": false
  }
}
```
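Before submitting a payload like the one above, the always-required top-level configuration fields from the table can be checked locally. This is an illustrative sketch (`build_connection_payload` is a hypothetical helper, not a Polytomic API call):

```python
# Fields marked required regardless of auth mode or cloud provider,
# per the configuration table above.
ALWAYS_REQUIRED = ("databricks_auth_mode", "http_path", "port", "server_hostname")

def build_connection_payload(name: str, configuration: dict) -> dict:
    """Assemble a Databricks connection payload, failing fast on
    missing always-required fields. Hypothetical helper for illustration."""
    missing = [f for f in ALWAYS_REQUIRED if f not in configuration]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return {"name": name, "type": "databricks", "configuration": configuration}

payload = build_connection_payload(
    "Databricks connection",
    {
        "databricks_auth_mode": "access_token",
        "access_token": "isoz8af6zvp8067gu68gvrp0oftevn",
        "http_path": "/sql",
        "port": 443,
        "server_hostname": "dbc-1234dsafas-d0001.cloud.databricks.com",
    },
)
print(payload["type"])  # → databricks
```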

Read-only properties

All read-only properties are conditional on `auth_mode`.

auth_mode

When `auth_mode` is `access_key_and_secret`:

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| `aws_user` | string | User ARN (destinations only) | false |

When `auth_mode` is `iam_role`:

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| `external_id` | string | External ID for the IAM role | false |

Model Sync

Source

Configuration

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| `catalog` | string | Catalog | false |
| `query` | string | Query | false |
| `schema` | string | Schema | false |
| `table` | string | Table | false |

Example

```json
{
  ...
  "configuration": {
    "catalog": "samples",
    "query": "SELECT * FROM samples.nyctaxi.trips",
    "schema": "nyctaxi",
    "table": "trips"
  }
}
```

Target

Databricks connections may be used as the destination in a model sync.

All targets

Configuration

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| `created_column` | string | 'Created at' timestamp column | false |
| `preserve_table_on_resync` | boolean | Preserve destination table when resyncing | false |
| `updated_column` | string | 'Updated at' timestamp column | false |
| `write_record_timestamps` | boolean | Write row timestamp metadata | false |
Example

```json
{
  ...
  "target": {
    "configuration": {
      "created_column": "",
      "preserve_table_on_resync": false,
      "updated_column": "",
      "write_record_timestamps": false
    }
  }
}
```

Bulk Sync

Destination

Configuration

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| `advanced` | object |  | false |
| `catalog` | string | Catalog | false |
| `external_location_name` | string | External location | false |
| `mirror_schemas` | boolean | Mirror schemas | false |
| `schema` | string | Output schema | false |

Example

```json
{
  ...
  "destination_configuration": {
    "advanced": {
      "deleted_file_retention_days": 0,
      "empty_strings_null": false,
      "hard_deletes": false,
      "log_file_retention_days": 0,
      "set_retention_properties": false,
      "table_prefix": "",
      "truncate_existing": false
    },
    "catalog": "samples",
    "external_location_name": "",
    "mirror_schemas": false,
    "schema": "nyctaxi"
  }
}
```

Type handling

Destination types

| Polytomic type | Databricks type |
| --- | --- |
| `array<>` | `ARRAY<>` |
| `bigint` | `BIGINT` |
| `boolean` | `BOOLEAN` |
| `date` | `DATE` |
| `datetime` | `TIMESTAMP` |
| `decimal(precision, scale)` | `DECIMAL(precision,scale)` |
| `double` | `DOUBLE` |
| `int` | `INT` |
| `json` | `STRING` |
| `jsonarray` | `STRING` |
| `number` | `DECIMAL(38,18)` |
| `object{}` | `STRUCT<>` |
| `single` | `FLOAT` |
| `smallint` | `SMALLINT` |
| `string` | `STRING` |
| `time` | `TIMESTAMP` |
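The scalar part of the mapping above, expressed as a lookup table (parameterized types such as `array<>`, `decimal(precision, scale)`, and `object{}` carry their parameters through and are omitted here):

```python
# Polytomic-to-Databricks destination type mapping from the table above
# (scalar types only).
DEST_TYPE_MAP = {
    "bigint": "BIGINT",
    "boolean": "BOOLEAN",
    "date": "DATE",
    "datetime": "TIMESTAMP",
    "double": "DOUBLE",
    "int": "INT",
    "json": "STRING",       # JSON is serialized to a string column
    "jsonarray": "STRING",
    "number": "DECIMAL(38,18)",
    "single": "FLOAT",
    "smallint": "SMALLINT",
    "string": "STRING",
    "time": "TIMESTAMP",
}

print(DEST_TYPE_MAP["number"])  # → DECIMAL(38,18)
```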

Source types

| Databricks type | Polytomic type |
| --- | --- |
| `ARRAY<>` | `array<>` |
| `BIGINT` | `bigint` |
| `DATE` | `date` |
| `DECIMAL(precision, scale)` | `decimal(precision, scale)` |
| `DOUBLE` | `double` |
| `FLOAT` | `single` |
| `INT` | `int` |
| `INTERVAL` | `string` |
| `MAP<>` | `object{}` |
| `SMALLINT` | `smallint` |
| `STRUCT<>` | `object{}` |
| `TIMESTAMP` | `datetime_tz` |
| `TIMESTAMP_NTZ` | `datetime` |
| `TINYINT` | `smallint` |
| `VARCHAR` | `string` |
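The source mapping as a lookup (scalar types only). Note that several Databricks types collapse onto one Polytomic type, e.g. `TINYINT` and `SMALLINT` both become `smallint` and `INTERVAL` becomes `string`, so the mapping is not invertible:

```python
# Databricks-to-Polytomic source type mapping from the table above
# (scalar types only; ARRAY<>, MAP<>, STRUCT<>, and DECIMAL omitted).
SOURCE_TYPE_MAP = {
    "BIGINT": "bigint",
    "DATE": "date",
    "DOUBLE": "double",
    "FLOAT": "single",
    "INT": "int",
    "INTERVAL": "string",
    "SMALLINT": "smallint",
    "TIMESTAMP": "datetime_tz",   # timezone-aware timestamp
    "TIMESTAMP_NTZ": "datetime",  # no time zone
    "TINYINT": "smallint",
    "VARCHAR": "string",
}

print(SOURCE_TYPE_MAP["TINYINT"] == SOURCE_TYPE_MAP["SMALLINT"])  # → True
```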