GPE crashes at same time every day

I’m running a TigerGraph DB instance on a single VM, and it started acting up a few days ago. Most of the services (everything but ADMIN, CTRL, ETCD, EXE, IMF, KAFKA, and ZK) go down at the same time every day (I’ve copied a portion of the log file for the GPE service below). To the best of my knowledge, we are not actively running any queries when these services crash. I’ve set up a cron job to restart the DB after it crashes, but I wanted to see if anyone had any insights into why this might be happening.

Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0610 22:32:47.073527 27432 zookeeper_context.cpp:254] ZooKeeper Error code connection loss. Cannot read path /tigergraph/dict/objects/__services/DICT/addresses/DICT
E0610 22:32:47.073699 27432 address_resolver.cpp:88] AddressResolver cannot resolve path:/tigergraph/dict/objects/__services/DICT/addresses/DICT, rc:kZkError
E0610 22:32:47.601683 27962 gbrain_active_address_resolver.cpp:124] [RefreshGSE] QueryLeader failed. error code: 14, error msg: Socket closed
E0610 22:32:47.602030 27962 gbrain_active_address_resolver.cpp:151] [RefreshGSE] failed to get leaders for all GSE partitions.
E0610 22:32:48.573808 27432 heartbeat_client.cpp:451] CLIENT: Cannot set up client session: can not resolve server , rc: kZkError, retried: 0
E0610 22:32:48.574110 27432 zookeeper_context.cpp:254] ZooKeeper Error code connection loss. Cannot read path /tigergraph/dict/objects/__services/DICT/addresses/DICT
E0610 22:32:48.574146 27432 address_resolver.cpp:88] AddressResolver cannot resolve path:/tigergraph/dict/objects/__services/DICT/addresses/DICT, rc:kZkError
E0610 22:32:50.074249 27432 heartbeat_client.cpp:451] CLIENT: Cannot set up client session: can not resolve server , rc: kZkError, retried: 1
E0610 22:32:50.074615 27432 zookeeper_context.cpp:254] ZooKeeper Error code connection loss. Cannot read path /tigergraph/dict/objects/__services/DICT/addresses/DICT
E0610 22:32:50.074651 27432 address_resolver.cpp:88] AddressResolver cannot resolve path:/tigergraph/dict/objects/__services/DICT/addresses/DICT, rc:kZkError
E0610 22:32:51.574764 27432 heartbeat_client.cpp:451] CLIENT: Cannot set up client session: can not resolve server , rc: kZkError, retried: 2
E0610 22:32:51.575121 27432 zookeeper_context.cpp:254] ZooKeeper Error code connection loss. Cannot read path /tigergraph/dict/objects/__services/DICT/addresses/DICT
E0610 22:32:51.575156 27432 address_resolver.cpp:88] AddressResolver cannot resolve path:/tigergraph/dict/objects/__services/DICT/addresses/DICT, rc:kZkError
E0610 22:32:52.601850 27962 gbrain_active_address_resolver.cpp:124] [RefreshGSE] QueryLeader failed. error code: 14, error msg: failed to connect to all addresses
E0610 22:32:52.601956 27962 gbrain_active_address_resolver.cpp:151] [RefreshGSE] failed to get leaders for all GSE partitions.
...
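
In case it helps anyone confirm the "same time every day" pattern from the logs themselves, here is a minimal sketch in Python that parses lines in the format shown above and reports the first E/F-level entry per calendar day. The function names (`parse_line`, `first_error_per_day`) are made up for this example, and because the log prefix carries no year, the `year` argument is an assumption you would need to supply yourself.

```python
import re
from datetime import datetime

# Prefix documented in the log header:
# [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
LOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6}) "
    r"(?P<thread>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def parse_line(line, year=2000):
    """Parse one log line into a dict, or return None if it does not match.

    The log format has no year, so `year` is an assumed value.
    """
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    stamp = f"{year}{m['month']}{m['day']} {m['time']}"
    return {
        "level": m["level"],
        "time": datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f"),
        "thread": int(m["thread"]),
        "file": m["file"],
        "line": int(m["line"]),
        "msg": m["msg"],
    }


def first_error_per_day(lines, year=2000):
    """Map each calendar date to the earliest E/F-level timestamp seen on it."""
    first = {}
    for raw in lines:
        rec = parse_line(raw, year)
        if rec is None or rec["level"] not in ("E", "F"):
            continue
        day = rec["time"].date()
        if day not in first or rec["time"] < first[day]:
            first[day] = rec["time"]
    return first
```

Feeding several days of the GPE log through `first_error_per_day` and comparing the time-of-day of each result should show whether the failures really cluster around one clock time, which can then be checked against anything scheduled on the VM.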

Hi @Ellen, welcome to the community!

Can you provide a few additional details to help us better understand what’s happening? Any details on the hosted environment? Are you using an on-prem installation (I’m assuming)? Which version of TigerGraph DB are you using? What OS are you running on?

Thanks, @Jon_Herke!

We are running TigerGraph Enterprise (v3.5.1) hosted on Azure. The VM is a Standard E4-2as v4 (2 vCPUs, 32 GiB memory) running Ubuntu.

While I don’t think anyone is actively querying the graph when it goes down, I have a strong suspicion that it goes down on days when someone has run a query.

Hi @Ellen - Is TG installed in the default/simple config? Meaning, is everything related to TG located under /home/tigergraph/tigergraph…? Those "Cannot read path /tigergraph/dict/objects/…" errors suggest you have a link to a different disk (??)

Hi @Robert_Hardaway, we deployed TG on Azure without any modifications. The DICT service is running, but /tigergraph/dict does not exist. What is the best way to remedy this issue?

Hi @Ellen, I was just checking in to see if this was resolved. It sounded like it might have been a VM issue and not a TigerGraph issue (the crashes correspond to an update/cleanup service starting up on the VM).