Hi Camilo,
In the developer version, the TigerGraph system needs enough memory to hold all vertices and edges in memory. According to the logs, the engine reached its memory limit, so it put most of its jobs to sleep and held all ongoing loading requests until the previous loading requests had been processed. The held requests eventually timed out, which caused the loading to fail. You can check the memory used by each TigerGraph component with “gadmin status -v”, or check overall system memory with “free -g”. For best performance, we usually recommend that system memory be at least twice the disk space used by the engine storage. To check the disk usage of the engine store, use “gadmin status -v graph”.
Here is a summary of the relevant logs.
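As a rough illustration of the "memory at least twice the engine store" guideline above, here is a minimal Python sketch. It is not a TigerGraph tool. The helper names are made up for this example, the factor of 2 comes from the recommendation above, and reading physical memory through os.sysconf assumes a Linux host.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of all regular files under root (0 if root is missing)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # The file disappeared or is unreadable; skip it.
                pass
    return total


def total_memory_bytes():
    """Physical memory in bytes. Assumes Linux, where these sysconf names exist."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def memory_is_sufficient(memory_bytes, store_bytes, factor=2.0):
    """True if memory is at least `factor` times the engine store size."""
    return memory_bytes >= factor * store_bytes
```

For example, memory_is_sufficient(total_memory_bytes(), dir_size_bytes(store_path)) gives a quick yes/no answer, where store_path is the engine store directory on your machine (the GPE log in this thread suggests something under /tiger/tigergraph/gstore, but please confirm the path for your installation).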
RESTPP-LOADER LOG
Dans-MacBook-Pro-2:Downloads danhu$ grep -rn "98631:RESTPP-LOADER_1_1:1533613116574" RESTPP-LOADER_1_1_log.INFO
RESTPP-LOADER_1_1_log.INFO:1187773:I0806 20:38:36.575000 34978 handler.cpp:225] Engine_req|RawRequest|98631:RESTPP-LOADER_1_1:1533613116574|POST|url = restpp/onlineloader?|payload_data.size() = 183017|api = v2
RESTPP-LOADER_1_1_log.INFO:1188719:I0806 20:48:36.815372 34970 dispatcher.cpp:1175] Comp_Dispatcher|20:48:36.815393|98631:RESTPP-LOADER_1_1:1533613116574: Code 1, num_expectresults_ 1, holder result 1, ready result 1: 0,$1$0, ready result 2: 0
RESTPP-LOADER_1_1_log.INFO:1188720:I0806 20:48:36.869263 34979 worker.cpp:1070] Engine_req|OnError|98631:RESTPP-LOADER_1_1:1533613116574|gle|dispatcher|dispatcher timeout
RESTPP-LOADER_1_1_log.INFO:1188721:I0806 20:48:36.904887 34979 requestrecord.cpp:221] Engine_req|ReturnResult|98631:RESTPP-LOADER_1_1:1533613116574|214
These are the relevant log entries for the first aborted loading request “98631:RESTPP-LOADER_1_1:1533613116574” in the RESTPP-LOADER log. They show that the loader started the request at 20:38:36 and that it timed out at 20:48:36. This is expected, because our default system timeout is 10 minutes.
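If you want to verify the timing yourself, the following Python sketch computes the elapsed time between two log lines. It assumes the glog-style prefix seen above (a severity letter, MMDD, then HH:MM:SS.microseconds) and that both lines are from the same day. The function names are made up for this example, and the two sample lines are abbreviated copies of the entries above.

```python
import re
from datetime import timedelta

# Severity letter, MMDD, then HH:MM:SS.microseconds, e.g. "I0806 20:38:36.575000".
GLOG_TIME = re.compile(r"[IWEF]\d{4} (\d{2}):(\d{2}):(\d{2})\.(\d{6})")


def glog_time_of_day(line):
    """Return the time of day in a glog-style line as a timedelta."""
    match = GLOG_TIME.search(line)
    if match is None:
        raise ValueError("no glog timestamp found in line")
    hours, minutes, seconds, micros = (int(g) for g in match.groups())
    return timedelta(hours=hours, minutes=minutes,
                     seconds=seconds, microseconds=micros)


def elapsed_seconds(start_line, end_line):
    """Seconds between two log lines, assuming both are from the same day."""
    return (glog_time_of_day(end_line) - glog_time_of_day(start_line)).total_seconds()


# Abbreviated copies of the RawRequest and OnError entries above.
start = "I0806 20:38:36.575000 34978 handler.cpp:225] Engine_req|RawRequest"
end = "I0806 20:48:36.869263 34979 worker.cpp:1070] Engine_req|OnError"
print(elapsed_seconds(start, end))  # about 600.29 seconds, i.e. the 10-minute timeout
```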
GPE LOG
However, on the engine side, system memory was already unhealthy during the same period, and the same request was put on hold as soon as the engine received it:
I0806 20:43:42.424974 19471 globalmemoryallocator.cpp:265] Comp_Memory|/tiger/tigergraph/gstore/0/part//.mv/661/1533613421358/_vertexpack force disk mode because System Memory not Health
I0806 20:43:42.425074 19470 globalmemoryallocator.cpp:265] Comp_Memory|/tiger/tigergraph/gstore/0/part//.mv/661/1533613421358/edgelist.bin force disk mode because System Memory not Health
I0806 20:43:42.425150 19469 globalmemoryallocator.cpp:265] Comp_Memory|/tiger/tigergraph/gstore/0/part//.mv/661/1533613421358/vertex.bin force disk mode because System Memory not Health
I0806 20:43:42.427094 19494 post_listener.cpp:138] Request|gle, **98631:RESTPP-LOADER_1_1:1533613116574** :NNAC,294,,0|Post_listener|getDeltaMessage|552717|552716|56204
I0806 20:43:42.432273 19492 gtimer.cpp:134] (11.679 ms) Rebuild _661_ RunOneSegment reload done
I0806 20:43:42.433750 19496 enginejobrunner.cpp:450] Engine_PullDelta|Pull|154600|552717|552717|1ms
I0806 20:43:42.434532 19496 enginejobrunner.cpp:494] Engine_PullDelta|System Memory not health. Sleeping.
I0806 20:43:42.437981 19492 gtimer.cpp:134] (5.717 ms) Rebuild _661_ RunOneSegment switch done
I0806 20:43:42.438215 19492 gtimer.cpp:134] (0.141 ms) DeltaSegmentRecords|Rebuild Compact_Until 661 cleaned_attr_count 145 remain_att_count 23 cleaned_edge_count 0 remain_edge_count 0
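To see how often the engine reported memory pressure, you can scan the GPE log for the two message types shown above. The Python sketch below is only an illustration. It matches the exact wording of the lines in this excerpt ("System Memory not Health", "force disk mode", "Sleeping"), which I am assuming is stable across your log files, and the function name is made up for this example.

```python
def summarize_memory_pressure(lines):
    """Count memory-pressure messages in an iterable of GPE log lines."""
    counts = {"force_disk_mode": 0, "sleeping": 0}
    for line in lines:
        lower = line.lower()
        # Both message variants above contain this phrase (case differs).
        if "system memory not health" not in lower:
            continue
        if "force disk mode" in lower:
            counts["force_disk_mode"] += 1
        elif "sleeping" in lower:
            counts["sleeping"] += 1
    return counts
```

You could call it on an open log file, for example summarize_memory_pressure(open(gpe_log_path)), where gpe_log_path is the path to your GPE log. For the excerpt above it would count three "force disk mode" lines and one "Sleeping" line.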
Best Wishes,
Dan