# Kafka to Database Sync Application

## Summary

Ingests string messages separated by '|' from a configured Kafka topic and writes each message as a record in a database. This application uses `PojoEvent` as an example schema; it can be customized to use a custom schema based on specific needs.
The source code is available at: https://github.com/DataTorrent/app-templates/tree/master/kafka-to-database-sync.
Please send feedback or feature requests to: feedback@datatorrent.com
This document has a step-by-step guide to configure, customize, and launch this application.
## Steps to launch application
1. Click on the **AppHub** tab from the top navigation bar.

2. A page listing the applications available on AppHub is displayed. Search for "Database" to see all applications related to databases. Click on the import button for the **Kafka to Database Sync App**.

3. A notification is displayed on the top right corner after the application package is successfully imported.

4. Click on the link in the notification, which navigates to the page for this application package. Detailed information about the application package, such as version, last modified time, and short description, is available on this page. Click on the launch button for the **Kafka-to-Database-Sync** application.

5. The **Launch Kafka-to-Database-Sync** dialogue is displayed. You can configure the name of this instance of the application from this dialogue.

6. Select the **Use saved configuration** option to display a list of pre-saved configurations. Select `sandbox-memory-conf.xml` or `cluster-memory-conf.xml` depending on whether your environment is the DataTorrent sandbox or another cluster.

7. Select the **Specify custom properties** option. Click on the **add default properties** button.

8. This expands a key-value editor pre-populated with mandatory properties for this application. Change values as needed. For example, suppose we wish to write string messages separated by '|' to the table `test_event_output_table` in a PostgreSQL database named `testdb`, accessible at `target-database.node.com` on port `5432` with credentials username=`postgres` and password=`postgres`. Properties should be set as follows:

    | Name | Value |
    | --- | --- |
    | `dt.operator.kafkaInput.prop.clusters` | `kafka-source.node.com:9092` |
    | `dt.operator.kafkaInput.prop.topics` | `transactions` |
    | `dt.operator.kafkaInput.prop.initialOffset` | `EARLIEST` |
    | `dt.operator.JdbcOutput.prop.store.databaseDriver` | `org.postgresql.Driver` |
    | `dt.operator.JdbcOutput.prop.store.databaseUrl` | `jdbc:postgresql://target-database.node.com:5432/testdb` |
    | `dt.operator.JdbcOutput.prop.store.password` | `postgres` |
    | `dt.operator.JdbcOutput.prop.store.userName` | `postgres` |
    | `dt.operator.JdbcOutput.prop.tableName` | `test_event_output_table` |

    Details about configuration options are available in the Configuration options section.
9. Click on the **Launch** button at the lower right corner of the dialogue to launch the application. A notification is displayed on the top right corner after the application is launched successfully; it includes the Application ID, which can be used to monitor this instance and to find its logs.

10. Click on the **Monitor** tab from the top navigation bar.

11. A page listing all running applications is displayed. Search for the current instance based on name, application id, or any other relevant field. Click on the application name or id to navigate to the details page.

12. The application instance details page shows key metrics for monitoring the application status. The **logical** tab shows the application DAG, StrAM events, operator status based on logical operators, stream status, and a chart with key metrics. Click on the **physical** tab to look at the status of physical instances of operators, containers, etc.
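The property values from the example in step 8 can also be kept in a configuration file instead of being entered in the dialogue. A minimal sketch, assuming the standard Hadoop-style `<configuration>` layout used by the `properties.xml` files in this template (host names, ports, and credentials are the example values from above, not defaults):

```xml
<configuration>
  <!-- Kafka input: broker list, topic, and starting offset -->
  <property>
    <name>dt.operator.kafkaInput.prop.clusters</name>
    <value>kafka-source.node.com:9092</value>
  </property>
  <property>
    <name>dt.operator.kafkaInput.prop.topics</name>
    <value>transactions</value>
  </property>
  <property>
    <name>dt.operator.kafkaInput.prop.initialOffset</name>
    <value>EARLIEST</value>
  </property>

  <!-- JDBC output: driver, connection URL, credentials, and target table -->
  <property>
    <name>dt.operator.JdbcOutput.prop.store.databaseDriver</name>
    <value>org.postgresql.Driver</value>
  </property>
  <property>
    <name>dt.operator.JdbcOutput.prop.store.databaseUrl</name>
    <value>jdbc:postgresql://target-database.node.com:5432/testdb</value>
  </property>
  <property>
    <name>dt.operator.JdbcOutput.prop.store.userName</name>
    <value>postgres</value>
  </property>
  <property>
    <name>dt.operator.JdbcOutput.prop.store.password</name>
    <value>postgres</value>
  </property>
  <property>
    <name>dt.operator.JdbcOutput.prop.tableName</name>
    <value>test_event_output_table</value>
  </property>
</configuration>
```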
## Configuration options

### Prerequisites
- Kafka configured with version 0.9.

A meta-data table is required by the JDBC output operator for transactional data and application consistency.
| Table Name | Column Names |
| --- | --- |
| `dt_meta` | `dt_app_id` (VARCHAR), `dt_operator_id` (INT), `dt_window` (BIGINT) |
Query for meta-data table creation:

```sql
CREATE TABLE dt_meta (
  dt_app_id varchar(100) NOT NULL,
  dt_operator_id int NOT NULL,
  dt_window bigint NOT NULL,
  CONSTRAINT dt_app_id UNIQUE (dt_app_id, dt_operator_id, dt_window)
);
```
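The output table named by `dt.operator.JdbcOutput.prop.tableName` must also exist before the application is launched. A possible definition for `test_event_output_table` is sketched below; the column names and types are illustrative assumptions for a hypothetical three-field `PojoEvent` and are not taken from this template — they must mirror the fields of your actual tuple class:

```sql
-- Hypothetical output table; columns are illustrative and must match
-- the fields of the tuple class configured via the TUPLE_CLASS attribute.
CREATE TABLE test_event_output_table (
  accountNumber int,
  name varchar(255),
  amount int
);
```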
### Mandatory properties

The end user must specify values for these properties.
| Property | Description | Type | Example |
| --- | --- | --- | --- |
| `dt.operator.kafkaInput.prop.clusters` | List of brokers for Kafka input | String | `node1.company.com:9098, node2.company.com:9098, node3.company.com:9098` |
| `dt.operator.kafkaInput.prop.topics` | Kafka topics for input | String | `transactions` |
| `dt.operator.kafkaInput.prop.initialOffset` | Kafka input offset | String | `EARLIEST` |
| `dt.operator.JdbcOutput.prop.store.databaseDriver` | JDBC driver class. This has to be on the CLASSPATH. The PostgreSQL driver is added as a dependency. | String | `org.postgresql.Driver` |
| `dt.operator.JdbcOutput.prop.store.databaseUrl` | JDBC connection URL | String | `jdbc:postgresql://target-database.node.com:5432/testdb` |
| `dt.operator.JdbcOutput.prop.store.password` | Password for database credentials | String | `postgres` |
| `dt.operator.JdbcOutput.prop.store.userName` | Username for database credentials | String | `postgres` |
| `dt.operator.JdbcOutput.prop.tableName` | Table name for output records | String | `test_event_output_table` |
### Advanced properties

Pre-saved configurations are available based on the application environment: recommended settings for the DataTorrent sandbox edition are in `sandbox-memory-conf.xml`, and for a cluster environment in `cluster-memory-conf.xml`.

The Java class of the messages or records emitted is specified by the value of the `TUPLE_CLASS` attribute in the configuration file, namely `PojoEvent` in this case.
| Property | Description | Default for cluster-memory-conf.xml | Default for sandbox-memory-conf.xml |
| --- | --- | --- | --- |
| `dt.operator.csvParser.port.out.attr.TUPLE_CLASS` | Fully qualified class name for the tuple class (POJO, plain old Java object) emitted by the CSV parser | `com.datatorrent.apps.PojoEvent` | `com.datatorrent.apps.PojoEvent` |
| `dt.operator.JdbcOutput.port.input.attr.TUPLE_CLASS` | Fully qualified class name for the tuple class (POJO) expected on the JDBC output's input port | `com.datatorrent.apps.PojoEvent` | `com.datatorrent.apps.PojoEvent` |
| `dt.operator.csvParser.prop.schema` | Schema for CSV parser | `{"separator": "|",` … | `{"separator": "|",` … |
You can override default values for these advanced properties by specifying custom values in the **Specify custom properties** step mentioned in the steps to launch an application.
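A custom tuple class follows the same shape as `PojoEvent`: a plain Java bean with a getter and setter per delimited field. A minimal sketch is shown below; the field names (`accountNumber`, `name`, `amount`) are illustrative assumptions, not taken from this template, and must match the columns of your CSV schema and output table:

```java
// Hypothetical tuple class in the style of com.datatorrent.apps.PojoEvent.
// Field names here are assumptions for illustration only; replace them
// with the fields your '|'-separated messages actually carry.
public class PojoEvent {
    private int accountNumber;
    private String name;
    private int amount;

    public int getAccountNumber() { return accountNumber; }
    public void setAccountNumber(int accountNumber) { this.accountNumber = accountNumber; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public int getAmount() { return amount; }
    public void setAmount(int amount) { this.amount = amount; }
}
```

The getter/setter pairs matter: the platform populates and reads these tuples through the bean convention, so each parsed field needs a matching accessor.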
## Steps to customize the application
1. Make sure you have the following utilities installed on your machine and available on `PATH`: at minimum `git` and `mvn` (Apache Maven), which are used in the steps below.

2. Use the following command to clone the examples repository:

    ```shell
    git clone git@github.com:DataTorrent/app-templates.git
    ```

3. Change directory to `examples/tutorials/kafka-to-database-sync`:

    ```shell
    cd examples/tutorials/kafka-to-database-sync
    ```

4. Import this maven project in your favorite IDE (e.g. Eclipse).

5. Change the source code as per your requirements. Some tips are given as commented blocks in `Application.java` for this project.

6. Make the respective changes in the test case and `properties.xml` based on your environment.

7. Compile this project using maven:

    ```shell
    mvn clean package
    ```

    This will generate the application package file with the `.apa` extension in the `target` directory.

8. Go to the DataTorrent UI management console in a web browser. Click on the **Develop** tab from the top navigation bar.

9. Click on the **upload package** button and upload the generated `.apa` file.

10. The application package page is shown with a listing of all packages. Click on the **Launch** button for the uploaded application package. Follow the steps for launching an application.