Hi, Karol,
From the error message, we suppose the problem is related to zk connection issues.
Can you please upload some related logs for us to do some further investigation? Once you upload the logs, we can make sure.
The related file/dir we need are:
-
/home/tigergraph/tigergraph/logs/dict/
-
/home/tigergraph/tigergraph/zk/zookeeper.out*
Zookeeper logs are pretty monotonous: a couple of warnings and repetitive error.
Warning:
2018-06-21 14:20:10,580 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:19999:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x164224971c40000, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:748)
2018-06-21 14:20:10,582 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:19999:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:53438 which had sessionid 0x164224971c40000
Errors:
2018-06-21 14:20:46,638 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x164224971c40008
2018-06-21 14:20:46,638 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:19999:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:53522 which had sessionid 0x164224971c40008
2018-06-21 14:20:46,639 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
2018-06-21 14:21:01,511 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:19999:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:53524
Quite a few dict logs (gdict_client_.ERROR*), but most errors/messages are like this:
Running on machine: xxx
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0621 14:18:19.428499 29430 address_resolver.cpp:68] AddressResolver-Resolve:Can not connect to ZK server, rc: operation timeout
E0621 14:18:19.430287 29430 gdict.cpp:311] Dictionary initialize failed.Cannot initialize resolver. Zk server: 127.0.0.1:19999, root: /tigergraph/dict/objects/__services/DICT/addresses, rc: kTimeout. Took 2058 milliseconds
…
Log file created at: 2018/06/21 14:20:38
Running on machine: xxx
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0621 14:20:38.280681 31923 address_resolver.cpp:96] AddressResolver cannot resolve path:/tigergraph/dict/objects/__services/DICT/addresses/DICT, rc:kNotFound
E0621 14:20:39.280998 31923 heartbeat_client.cpp:454] CLIENT: Cannot set up client session: can not resolve server , rc: kNotFound, retried: 0
E0621 14:20:39.281621 31923 address_resolver.cpp:96] AddressResolver cannot resolve path:/tigergraph/dict/objects/__services/DICT/addresses/DICT, rc:kNotFound
E0621 14:20:40.281798 31923 heartbeat_client.cpp:454] CLIENT: Cannot set up client session: can not resolve server , rc: kNotFound, retried: 1
…
Running on machine: xxx
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0621 14:20:44.635870 31957 address_resolver.cpp:96] AddressResolver cannot resolve path:/tigergraph/dict/objects/__services/DICT/addresses/DICT, rc:kNotFound
E0621 14:20:45.636169 31957 heartbeat_client.cpp:454] CLIENT: Cannot set up client session: can not resolve server , rc: kNotFound, retried: 0
E0621 14:20:45.637459 31957 address_resolver.cpp:96] AddressResolver cannot resolve path:/tigergraph/dict/objects/__services/DICT/addresses/DICT, rc:kNotFound
E0621 14:20:46.637643 31957 heartbeat_client.cpp:454] CLIENT: Cannot set up client session: can not resolve server , rc: kNotFound, retried: 1
E0621 14:21:27.449321 31957 gdict.cpp:635] Error found in RPC, code: 14, error msg: Endpoint read failed
E0621 14:21:27.449367 31957 gdict.cpp:635] Retry on error, retry times: 0, channel state: 3
E0621 14:21:28.435075 31957 gdict.cpp:635] Error found in RPC, code: 14, error msg: Endpoint read failed
E0621 14:21:28.435132 31957 gdict.cpp:635] Retry on error, retry times: 1, channel state: 3
E0621 14:21:29.432013 31957 gdict.cpp:635] Error found in RPC, code: 14, error msg: Endpoint read failed
E0621 14:21:29.432065 31957 gdict.cpp:635] Retry on error, retry times: 2, channel state: 3
E0621 14:21:30.431991 31957 gdict.cpp:635] Error found in RPC, code: 14, error msg: Endpoint read failed
E0621 14:21:30.432024 31957 gdict.cpp:635] Retry on error, retry times: 3, channel state: 3
E0621 14:21:31.434494 31957 gdict.cpp:635] Error found in RPC, code: 14, error msg: Endpoint read failed
E0621 14:21:31.434597 31957 gdict.cpp:635] Retry on error, retry times: 4, channel state: 3
There are no gdict_server.ERROR logs, though.
Some warnings found:
~/tigergraph/logs/dict$ cat WARNING.20180621-142125.32976
Log file created at: 2018/06/21 14:21:25
Running on machine: xxx
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
W0621 14:21:25.271512 32976 dict_core.cpp:112] Initialize Dictionary Service finished, rc: ok , dictionary root: /tigergraph/dict/objects, root node created: 0
W0621 14:21:25.329078 32976 dict_service.cpp:81] DictService started, rc: kOk
W0621 14:21:25.329097 32976 dict_service.cpp:82] Service is binded to address: 127.0.0.1:17797
And some info:
tigergraph@xxx:~/tigergraph/logs/dict$ cat gdict_server.INFO
Log file created at: 2018/06/21 14:21:25
Running on machine: xxx
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0621 14:21:25.212453 32976 gconfig.cpp:28] Engine_GConfig|GInfo_v set to 21845
I0621 14:21:25.212574 32976 gconfig.cpp:28] Engine_GConfig|GASSERT_v set to 0
I0621 14:21:25.212586 32976 completion_queue_manager.cpp:12] Server CQManager(0x7ffc166b92f8) constructing
I0621 14:21:25.212767 32976 dict_service.cpp:72] DictService starting…
I0621 14:21:25.215740 32981 zookeeper_context.cpp:230] Root Watcher SESSION_EVENT state = CONNECTED_STATE for path: NA
I0621 14:21:25.215749 32981 zookeeper_context.cpp:81] ZooKeeper Connection is setup. Session id: 164224971c4000b, previous client id:0
I0621 14:21:25.215752 32981 zookeeper_watcher.cpp:314] Zk Session connected, notifying watchers
I0621 14:21:25.215755 32981 zookeeper_watcher.cpp:321] → Number of watchers: 0
I0621 14:21:25.215759 32981 zookeeper_watcher.cpp:322] → Callback time used(us): 3
I0621 14:21:25.269830 32976 brain_daemon.cpp:273] Brain daemon initialized. rc: ok, zkserver: 127.0.0.1:19999, timeoutMS: 3000
W0621 14:21:25.271512 32976 dict_core.cpp:112] Initialize Dictionary Service finished, rc: ok , dictionary root: /tigergraph/dict/objects, root node created: 0
I0621 14:21:25.278863 32976 brain_daemon.cpp:443] Comm_Daemon|Election node:/tigergraph/dict/objects/__services/DICT/_election_nodes/DICT@0000000000
I0621 14:21:25.278890 32976 brain_daemon.cpp:498] Starting Daemon, type: DICT, seq: /tigergraph/dict/objects/__services/DICT/_election_nodes/DICT@0000000000 @127.0.0.1:17797
I0621 14:21:25.279127 32981 brain_daemon.cpp:17] BRAINDAEMON Handling Leader watcher event, watch path: /tigergraph/dict/objects/__services/DICT/_election_nodes, added size: 1, removed size: 0, children count: 1, context: 0x26dfe70
I0621 14:21:25.283201 32981 brain_daemon.cpp:60] Daemon list for type DICT has changed. There are now 1 of us, original 0. I am /tigergraph/dict/objects/__services/DICT/_election_nodes/DICT@0000000000 my leader is 127.0.0.1:17797
I0621 14:21:25.285337 32981 brain_daemon.cpp:89] BRAINDAEMON Election found myself: 127.0.0.1:17797, in node: DICT@0000000000 term:0
I0621 14:21:25.294715 32981 zookeeper_context.cpp:789] Parents to create: /gdbms/addresses
I0621 14:21:25.299448 32981 brain_daemon.cpp:591] Comm_Daemon|BRAINDAEMON Assigned Leader: Brain Daemon with seq ID: /tigergraph/dict/objects/__services/DICT/_election_nodes/DICT@0000000000, became the leader of type: DICT, my id: 127.0.0.1:17797, updated to: 127.0.0.1:17797, rc: kOk
I0621 14:21:25.299458 32981 brain_daemon.cpp:576] Comm_Daemon|BRAINDAEMON Become Leader: Brain Daemon with seq ID: /tigergraph/dict/objects/__services/DICT/_election_nodes/DICT@0000000000, became the leader of type: DICT, my id: 127.0.0.1:17797, updated to: 127.0.0.1:17797
I0621 14:21:25.299460 32981 dict_core.cpp:370] Becoming Dictionary Service primary, my id: /tigergraph/dict/objects/__services/DICT/_election_nodes/DICT@0000000000, my address: 127.0.0.1:17797, my leader: 127.0.0.1:17797
I0621 14:21:25.329062 32976 brain_daemon.cpp:515] Daemon @127.0.0.1:17797 of type DICT with ID /tigergraph/dict/objects/__services/DICT/_election_nodes/DICT@0000000000 started with leader 127.0.0.1:17797
I0621 14:21:25.329074 32976 dict_core.cpp:361] Dictionary Daemon is starting…
W0621 14:21:25.329078 32976 dict_service.cpp:81] DictService started, rc: kOk
W0621 14:21:25.329097 32976 dict_service.cpp:82] Service is binded to address: 127.0.0.1:17797
I0621 14:21:25.330451 32987 completion_queue_manager.cpp:89] Server CompletionQueue (0x26dbf00) begin processing.
I0621 14:21:25.330471 32976 async_server.cpp:39] Server listening on 127.0.0.1:17797
I0621 14:21:25.330559 32988 completion_queue_manager.cpp:89] Server CompletionQueue (0x26d98f0) begin processing.
I0621 14:21:25.330600 32989 single_thread_worker.cpp:85] SingleThreadWorker start: HeartbeatSessionManager