CDC streaming from databases
When syncing untransformed data from PostgreSQL or MySQL into a data warehouse, database, or cloud storage, it’s preferred (though not required) for Polytomic to utilize CDC (change data capture) replication. This will avoid Polytomic running table scans to figure out changes since the last sync. Rather, Polytomic will process changes in real-time without scanning source tables.
This section covers setup instructions to enable this on PostgreSQL and MySQL.
PostgreSQL
MySQL
PostgreSQL
To enable CDC streaming from PostgreSQL, Polytomic requires the following of the PostgreSQL instance:
- PostgreSQL 10 or later (Polytomic uses the
pgoutput
plugin). - Set the PostgreSQL
wal_level
setting tological
.
You can determine the value of the wal_level
setting by running this query:
- Available replication slot.
Polytomic requires a replication slot for each bulk sync; the same slot will be used for all tables synced as part of that bulk sync configuration. The total number of replication slots is set using the max_replication_slots
PostgreSQL setting.
You can inspect current replication slots using the pg_replication_slots
system table:
- User with
REPLICATION
property set.
The REPLICATION
property signals that a PostgreSQL user (role) is able to manage replication slots. Polytomic will use this property to create a replication slot for each bulk sync configured. You can set it with the following query:
- A publication for the tables you wish to replicate.
PostgreSQL uses a publication to track changes to one or more tables. Only a user with the owner role for a table may add it to the publication. After the publication is created, however, any user with the REPLICATION
property may use it.
You can create a publication for specific tables or for all tables (including those created in the future). Note that creating a publication for all tables may result in increased storage consumption.
The publication name can be anything; you’ll enter it in Polytomic’s PostgreSQL connection configuration.
Set your Polytomic PostgreSQL connection configuration
Once the previous steps are done, be sure to set the following arguments in Polytomic’s PostgreSQL connection configuration:
change_detection
:true
publication
: the name of the publication generated in the previous section