Kafka Loader Failing

Hi all,

I am trying to get a very simple Kafka Loader job to load a record from a Kafka topic.
My Zookeeper and Kafka containers are running fine, and I can put a message on the topic without any problem.
However, when the Kafka Loader job tries to read from the topic, it seems to connect to the broker
but is not able to read any messages.

I attached the log file. I believe everything is fine up until line 82, where the Kafka Loader error appears.
The log does not give much information about why anything failed.

I am using a basic Kafka producer project to put the message on the topic, with a String-typed key and value.
My TigerGraph instance is the tigergraph-dev 2.4 container, and it is connected to the
streams Docker network I created.

My Kafka containers and configs are as follows:

docker network create streams

docker run -d \
    --net=streams \
    --name=zookeeper \
    -e ZOOKEEPER_CLIENT_PORT=2181 \
    confluentinc/cp-zookeeper:5.0.0


docker run -d \
    --net=streams \
    --name=kafka \
    -e KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 \
    -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092 \
    -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
    confluentinc/cp-kafka:5.0.0


ssh into the kafka broker

kafka-topics --create --topic tg-topic --partitions 1 --replication-factor 1 --zookeeper zookeeper:2181

kafka-console-consumer --bootstrap-server kafka:9092 --topic tg-topic --from-beginning

kafka configs:

kafka_config.json
{
  "broker": "kafka:9092",
  "kafka_config": {
    "group.id": "tigergraph"
  }
}


tg_topic_config.json
{
  "topic": "tg-topic"
}

My load job file is:

USE GRAPH data
DROP JOB kafka_vertex_load
DROP JOB kafka_edge_load
DROP DATA_SOURCE k1
 
CREATE DATA_SOURCE KAFKA k1 FOR GRAPH data
SET k1 = "/home/tigergraph/mydata/data/kafka_config.json"
 
# define the loading jobs
CREATE LOADING JOB kafka_vertex_load FOR GRAPH data {
  DEFINE FILENAME f1 = "$k1:/home/tigergraph/mydata/data/tg_topic_config.json";
  LOAD f1
    TO VERTEX SomeVertex(some things),
    TO VERTEX SomeOtherVertex(some other things)
    USING SEPARATOR="|";
}

 
# load the data
RUN LOADING JOB kafka_vertex_load

Hi Brett,

By default, the Kafka loader consumes the topic from OFFSET_END, so it only loads messages that arrive after the loading job is launched. Please try writing some new messages and see whether they are loaded.

Or you can try to modify tg_topic_config.json to have it read from the beginning:
{
  "topic": "tg-topic",
  "default_start_offset": 0
}

Thanks,
Chengbiao

Hi Brett,

Besides what Chengbiao mentioned, I saw "broker": "kafka:9092" in kafka_config.json. Is "kafka" the hostname of the Kafka broker machine? If not, you should use the IP address or hostname of the Kafka broker machine.

For example, if you are consuming from a Kafka broker whose IP is 192.168.1.100 and whose port is 9092, the correct "broker" entry in kafka_config.json would be:

       "broker": "192.168.1.100:9092"

Thanks,

Dadong

Thanks for the replies,

The logs I attached were from a run where I put the message on the topic after I had started the load job.
Interestingly, when I started the loader via the gsql client, the job status said that one message had been read.
But it was never added to the database. There is no trace of it, and if it had been added, the logs should contain
a load summary, as the documentation says.

Regarding "kafka" as a hostname: when Docker containers are part of the same Docker network,
each one can be reached by its container name. I did add the TigerGraph container to the network, and
it is at least connecting to the Kafka broker, because the load job status said one message was read.

Is there some nuance where TigerGraph can only get so far when reading a message from a topic
inside a Docker network? I would think not, but I'm at a loss here.

Hi Brett,

Please use "default_start_offset": -2, which actually reads from the beginning.

{
  "topic": "tg-topic",
  "default_start_offset": -2
}

If you want to check the log for the processed message, please read $tigergraph/logs/KAFKA-LOADER_1_1/log.INFO, which has detailed information. Please also check that your message format is consistent with your loading job; any issue with the data format will be logged.

Please run the Kafka loader with EOF="true", so that the loader loads all messages in the topic until it hits EOF and then exits. Once it finishes, it prints the streaming loading summary, which gives detailed information about parsing errors and how many vertices and edges were created. In non-EOF mode, the detailed logs are sent to our internal Kafka topic, which is currently for internal use only.
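
As a quick way to do the log check described above, here is a minimal shell sketch. The helper name find_loader_errors is hypothetical, and matching on the word "error" is only an assumption about how problems show up in the log; adjust the pattern to whatever your log actually contains.

```shell
# Hypothetical helper: print every line of a log file that mentions
# "error" (case-insensitive), prefixed with its line number.
# Assumption: data format problems are logged with the word "error".
find_loader_errors() {
  # $1 is the path to the log file to scan
  grep -n -i 'error' "$1"
}
```

Call it with the path to the loader log on your installation, for example the KAFKA-LOADER_1_1/log.INFO file under your TigerGraph logs directory. It prints nothing when no line matches.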

Best Wishes,

Dan

Hi Brett,

You said the messages are consumed by the Kafka loader but not posted to the database. A possible reason is that the field format of the message does not match the loading job. You can check the related log in /home/tigergraph/tigergraph/logs/RESTPP_1_1/log.INFO to see whether any parsing errors occurred.

Could you please also share the loading job and a sample of the Kafka message it consumed?

Thanks,

Dadong

Hi all,

Turns out user error is the culprit…

My Kafka message only has 4 fields. My problem was that my load job referenced $1 - $4 when it should have referenced $0 - $3.
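
To make the off-by-one concrete: the load job's field tokens are numbered from zero, so a four-field message exposes $0 through $3 and there is no $4. The sketch below is only an illustration in plain shell, not part of TigerGraph. The helper name print_tokens and the sample message are made up; it splits a pipe-separated message (matching SEPARATOR="|") and prints each field next to the token index it would get.

```shell
# Hypothetical helper: split a "|"-separated message and label each
# field with the zero-based token index a load job would use for it.
print_tokens() (
  # Runs in a subshell, so the IFS and globbing changes do not leak out.
  set -f          # disable filename globbing while splitting
  IFS='|'         # split on the same separator as the load job
  i=0
  for field in $1; do
    printf '$%d = %s\n' "$i" "$field"
    i=$((i + 1))
  done
)
```

For a made-up message such as 'v1|Alice|30|NYC', print_tokens labels the fields $0 = v1, $1 = Alice, $2 = 30, and $3 = NYC, which is why a job written against $1 - $4 skips the first field and reaches for one that does not exist.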

So I have it working now and I can search the data. :slight_smile:

Thank you all for your timely responses. I definitely learned where to look when other issues arise. :slight_smile: