“Most places/article show we can use (Py)Spark/Kafka to load data into TigerGraph, but not the other way around.”
@interesting_ideas I’ll bring this feedback to the Product Team. In most cases you can use TigerGraph’s REST endpoints to egress data into other tools/products.
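For example, here is a minimal sketch of pulling vertices out of TigerGraph over its REST++ API from Python; the host, graph name, vertex type, and token are all placeholders you would substitute:
import requests

# Placeholders -- substitute your own host, graph, vertex type, and token
TG_HOST = "https://your.tigergraph.server:9000"  # assumes REST++ on its default port
GRAPH = "your_graph"
VERTEX_TYPE = "Person"  # assumed vertex type for illustration
TOKEN = "your_api_token"

# Built-in REST++ endpoint that returns vertices of a given type as JSON
resp = requests.get(
    f"{TG_HOST}/graph/{GRAPH}/vertices/{VERTEX_TYPE}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"limit": 10},
)
resp.raise_for_status()
print(resp.json()["results"])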
Unfortunately, Apache Kafka itself does not natively support ingesting data directly from REST endpoints, because Kafka is fundamentally a distributed streaming platform that uses its own wire protocol for data transmission. However, you can get data from REST endpoints into Kafka topics through a few different approaches using auxiliary tools and services. Here are some popular methods:
1. Kafka Connect with REST APIs
Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems using source and sink connectors. It can ingest data from REST APIs into Kafka via an HTTP Source Connector. The steps include:
- Set up Kafka Connect: Install and configure Kafka Connect on your Kafka cluster.
- Configure an HTTP Source Connector: Use an existing connector such as the Confluent HTTP Source Connector, or another community connector capable of calling REST APIs at scheduled intervals and writing the responses to a Kafka topic.
Example configuration snippet for a Kafka Connect HTTP Source Connector:
name=http-source-connector
connector.class=io.confluent.connect.http.HttpSourceConnector
tasks.max=1
http.api.url=https://api.example.com/data
kafka.topic=ingest-topic
confluent.topic.bootstrap.servers=localhost:9092
confluent.topic.replication.factor=1
This configuration defines a source connector that fetches data from https://api.example.com/data and publishes it to the ingest-topic Kafka topic.
2. Kafka REST Proxy
As previously mentioned, the Confluent REST Proxy provides a RESTful interface to Kafka. It allows producers and consumers to publish and retrieve Kafka data using HTTP/HTTPS protocols. This method is typically used for scenarios where the clients cannot use the native Kafka protocol.
To ingest data using the Kafka REST Proxy, you would need a separate process that polls or listens to your REST endpoints and then sends the fetched data to Kafka through the REST Proxy.
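For illustration, here is a minimal sketch of producing a JSON record through the Confluent REST Proxy from Python, assuming the proxy runs on localhost:8082; the topic and record are placeholders:
import requests

REST_PROXY = "http://localhost:8082"  # assumed REST Proxy address
TOPIC = "ingest-topic"

# The v2 produce API expects this content type for JSON records
headers = {"Content-Type": "application/vnd.kafka.json.v2+json"}
payload = {"records": [{"value": {"id": 1, "name": "example"}}]}

resp = requests.post(f"{REST_PROXY}/topics/{TOPIC}", json=payload, headers=headers)
resp.raise_for_status()
print(resp.json())  # partition/offset of each written record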
3. Custom Producer Application
Develop a custom application that acts as middleware between your REST endpoints and Kafka (see the TigerGraph -(Python)-> Kafka walkthrough below). This application would:
- Poll the REST API periodically or react to webhooks.
- Fetch the data and serialize it into a suitable format.
- Produce the data to a Kafka topic using the Kafka client libraries available for languages like Java, Python, or Node.js.
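For the webhook-driven variant, here is a minimal sketch using Flask and kafka-python; the /webhook route, port, and topic are illustrative assumptions, not part of any fixed API:
import json
from flask import Flask, request
from kafka import KafkaProducer

app = Flask(__name__)
producer = KafkaProducer(bootstrap_servers=["localhost:9092"],
                         value_serializer=lambda x: json.dumps(x).encode("utf-8"))

# Hypothetical webhook receiver: forwards each POSTed JSON payload to Kafka
@app.route("/webhook", methods=["POST"])
def webhook():
    producer.send("ingest-topic", value=request.get_json())
    return "", 204

if __name__ == "__main__":
    app.run(port=8080)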
4. Stream Processing Frameworks
Use a stream processing framework such as Apache NiFi or StreamSets, both of which can ingest data from various sources, including REST APIs. These frameworks offer robust data ingestion, transformation, and delivery to Kafka.
5. Using Scheduled Scripts
A simple but less scalable approach involves scheduled scripts (e.g., written in Python or shell) that run at specific intervals (using cron jobs, for example), fetch data from REST endpoints, and push it to Kafka using Kafka’s client libraries or the REST Proxy; a sketch follows.
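A minimal sketch of such a script; the API URL, topic, and broker address are placeholders, and a cron entry like */5 * * * * would run it every five minutes:
import json
import requests
from kafka import KafkaProducer

API_URL = "https://api.example.com/data"  # placeholder endpoint
TOPIC = "ingest-topic"

producer = KafkaProducer(bootstrap_servers=["localhost:9092"],
                         value_serializer=lambda x: json.dumps(x).encode("utf-8"))

# Fetch the current payload and publish each record to Kafka
for record in requests.get(API_URL, timeout=30).json():
    producer.send(TOPIC, value=record)
producer.flush()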
TigerGraph -(Python)-> Kafka
Step 1: Set Up Your Kafka Environment
Before you begin ingesting data, ensure that your Kafka cluster is properly set up. This includes setting up Kafka brokers, creating the necessary Kafka topics, and configuring Kafka producers. If you haven’t already set up Kafka, you can follow these general steps:
- Install Kafka: Download and install Apache Kafka from the Apache website, following the installation instructions for your operating system.
- Start ZooKeeper and the Kafka server:
# Start Zookeeper
bin/zookeeper-server-start.sh config/zookeeper.properties
# Start Kafka server
bin/kafka-server-start.sh config/server.properties
- Create a Kafka topic:
bin/kafka-topics.sh --create --topic tigergraph_data --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1
Step 2: Configure TigerGraph
Ensure that your TigerGraph system is correctly configured and that you have the necessary queries or APIs set up to extract data.
- Install the GSQL client (if not already installed): Download and configure the GSQL client to connect to your TigerGraph server.
- Create a graph query (if not already created): Develop a GSQL query that outputs your desired data. Ensure the query is optimized for large data exports.
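For instance, here is a sketch of creating and installing such a query via pyTigerGraph’s gsql() helper; the Person vertex type and the @@result_variable_name accumulator are assumptions chosen to match the Python script below:
import pyTigerGraph as tg

conn = tg.TigerGraphConnection(host="http://your.tigergraph.server",
                               graphname="your_graph", apiToken="your_api_token")

# Assumed schema: a Person vertex type. The accumulator name matches the
# "@@result_variable_name" key read by the streaming script below.
print(conn.gsql("""
USE GRAPH your_graph
CREATE QUERY export_person_data() FOR GRAPH your_graph {
  ListAccum<VERTEX> @@result_variable_name;
  start = {Person.*};
  result = SELECT p FROM start:p
           POST-ACCUM @@result_variable_name += p;
  PRINT @@result_variable_name;
}
INSTALL QUERY export_person_data
"""))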
Step 3: Develop a Data Extraction and Streaming Application
Write a custom application that executes your TigerGraph query, fetches the results, and sends them to Kafka. Here’s how you can do this in Python, using pyTigerGraph for TigerGraph connectivity and kafka-python for Kafka integration.
- Install the required libraries:
pip install pyTigerGraph kafka-python
- Sample Python script:
import json

import pyTigerGraph as tg  # note: the package is pyTigerGraph, not pytigergraph
from kafka import KafkaProducer

# Configuration
tg_host = 'http://your.tigergraph.server'
tg_graphname = 'your_graph'
tg_token = 'your_api_token'  # if you only have a secret, exchange it via conn.getToken(secret)
kafka_bootstrap_servers = ['localhost:9092']
kafka_topic = 'tigergraph_data'

# Initialize TigerGraph connection
conn = tg.TigerGraphConnection(host=tg_host, graphname=tg_graphname, apiToken=tg_token)

# Initialize Kafka producer with a JSON serializer
producer = KafkaProducer(bootstrap_servers=kafka_bootstrap_servers,
                         value_serializer=lambda x: json.dumps(x).encode('utf-8'))

# Name of the installed query to execute
query_name = "your_query_here"

# Execute the query (pyTigerGraph's timeout is in milliseconds)
results = conn.runInstalledQuery(query_name, timeout=30000)

# Send data to Kafka; "@@result_variable_name" is the accumulator
# your query PRINTs -- replace it with your own
for result in results[0]["@@result_variable_name"]:
    producer.send(kafka_topic, value=result)
producer.flush()
print("Data ingestion to Kafka topic is complete.")